# Citation validity assessment with local Graph RAG (Ollama + langchain)

Years of serving as a reviewer have taught me that people tend to be rather sloppy with citations. It is common both to omit essential citations (because the authors are not sufficiently familiar with the literature, or for other reasons) and to cite irrelevant papers just to collect some self-citations or boost a colleague's citation record. Moreover, cited papers are sometimes relevant but do not actually support the authors' conclusions, and that's the worst case in my opinion, simply because a proper review would in principle require the referee to read this entire body of literature, which can be tedious (especially if the field does not fully match the referee's expertise). In this project I try to deal with this last case; the goals are to:

1) extract citations from a given (arbitrary) article
2) determine why they are cited by the authors (according to authors)
3) check whether the reasons for citation align with the conclusions of the cited papers themselves (sentiment analysis)

The project is educational, so I'd like to do all of this with local software, both to avoid paying for services (even if only a few cents) and to check how good small local models can be. Obviously, a production solution would probably tap into larger LLMs and more sophisticated techniques and prompts. It would also require changes to the agentic part, as I'm tailoring the project to astrophysical papers and hence use NASA's ADS system, which is not that great for other fields.

To start with, let us import some libraries, define some functions to work with PDFs from ADS, and specify the paper to be analyzed:

In [1]:
# install dependencies if needed
#!python -m pip install ads chromadb langchain langchain-community langchain-experimental langchain-chroma langchain-text-splitters pymupdf pdfplumber python-dotenv pandas tqdm

In [3]:
import ads
# load environment variables with the API keys. Don't forget to put those in a .env file
# specifically OPENAI_API_KEY, PINECONE_API_KEY, and ADS_API_KEY
from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())

import requests, urllib.request, tempfile, os
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import PDFPlumberLoader
from functools import reduce

pdf_priority = ['ads_pdf','eprint_pdf','pub_pdf'] # try ADS-stored pdf, then arxiv, then publishers (they have captchas)


import pymupdf as fitz  # PyMuPDF

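def extract_text_from_pdf_fitz(pdf_path):
    """Open a PDF with PyMuPDF and return its pages (call .get_text() on each page to get the text)."""
    document = fitz.open(pdf_path)
    return [document.load_page(i) for i in range(len(document))]


def download_file(bibcode, priority):
    """Download the PDF of `bibcode` from the source with the given priority; return a temp file path or None."""
    request = f"https://api.adsabs.harvard.edu/v1/resolver/{bibcode}/{pdf_priority[priority]}"
    response = requests.get(request, headers={'Authorization': 'Bearer ' + os.getenv('ADS_API_KEY')})
    if not response.ok:
        return None
    url = response.json()['link']
    # only create the temporary file once we know there is something to download
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_filename = temp_file.name
    urllib.request.urlretrieve(url, temp_filename)
    return temp_filename


def get_fulltext(bibcode):
    """Try the PDF sources in order of priority and return the pages of the first one that can be parsed."""
    pages = []
    for i in range(len(pdf_priority)):
        pdf = None
        try:
            pdf = download_file(bibcode, i)
            if pdf is None:
                continue
            pages = extract_text_from_pdf_fitz(pdf)
            break
        except Exception:
            # download or parsing failed, fall back to the next source
            continue
        finally:
            if pdf is not None:
                os.remove(pdf)
    return pages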

bibcode = '2017MNRAS.466.2143D' # ADS bibcode of the paper to analyze
docs = get_fulltext(bibcode)
# fulltext = reduce(lambda x,y: f"{x}\n{y}\n", [x.page_content for x in docs]) # pdfplumber
fulltext = "\n".join(page.get_text() for page in docs)

As the output below shows, feeding the full text is not feasible for smaller models: most papers will not fit in the context window, and even when they do, the output is mixed. Models like gemma2 resort to creative-writing-style output, whereas llama gives something closer to what we want, but still not exactly that.
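To get a feeling for the numbers, here is a rough sketch of the size check. The ratio of about four characters per token is only a rule of thumb for English text, not a real tokenizer, and the 8192-token window as well as the names `estimate_tokens` and `fits_in_context` are assumptions made for illustration; the actual limit depends on the model and on how Ollama is configured.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token count, assuming ~4 characters per token (rule of thumb)."""
    return int(len(text) / chars_per_token)


def fits_in_context(text: str, context_window: int = 8192, reserve_for_answer: int = 512) -> bool:
    """True if the estimated prompt size still leaves room for the answer."""
    return estimate_tokens(text) + reserve_for_answer <= context_window


# stand-in for the text of a paper of about 100k characters
sample = "word " * 20000
print(estimate_tokens(sample), fits_in_context(sample))
```

With the real `fulltext` in place of `sample`, this gives a quick indication of whether truncation is to be expected before the model is even called.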

In [4]:
# one can experiment with various models/settings
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4o",temperature=0)
llm = ChatOllama(model="gemma2", temperature=0)
# llm = ChatOllama(model="mistral", temperature=0)
# llm = ChatOllama(model="llama3", temperature=0)


In [5]:
prompt = ChatPromptTemplate.from_template("extract full list of citations from the scientific article text below. Each citation appears in format author, year in the text. Only output list of references in format (author; year) without further analysis or other text included in the output. The text is: {topic}")
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"topic": fulltext}))

This text appears to be an excerpt from an astronomy research paper published in the Monthly Notices of the Royal Astronomical Society (MNRAS). 

Here's a breakdown of the content and key takeaways:

**Subject:** The paper focuses on the study of a neutron star, likely a low-mass X-ray binary (LMXB), based on observations from various space telescopes like INTEGRAL, NuSTAR, and Swift.

**Key Findings:**

* **Rapid Evolution of Magnetic Field:** The authors observed significant changes in the energy of the cyclotron resonance scattering feature (CRSF) during outbursts of the neutron star. This implies a rapid evolution of the neutron star's magnetic field, potentially faster than previously observed.
* **Geometry of Emission Region:** The authors propose that the observed changes in CRSF energy are not solely due to magnetic field variations but are also linked to changes in the geometry of the emission region, possibly influenced by the accretion process.
* **Transition to Sub-critical

If we ask about a specific citation, the output is not any better:

In [6]:
prompt = ChatPromptTemplate.from_template("Based on the following text, why citation Mushtukov et al. (2015a) is cited? Respond concisely in one sentence. The text: {topic}")
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"topic": fulltext}))

This text appears to be an excerpt from an astronomy research paper published in the Monthly Notices of the Royal Astronomical Society (MNRAS). 

Here's a breakdown of the content and key takeaways:

**Subject:** The paper focuses on the study of a neutron star, likely a low-mass X-ray binary (LMXB), based on observations from various space telescopes like INTEGRAL, NuSTAR, and Swift.

**Key Findings:**

* **Rapid Evolution of Magnetic Field:** The authors observed significant changes in the energy of the cyclotron resonance scattering feature (CRSF) during outbursts of the neutron star. This implies a rapid evolution of the neutron star's magnetic field, potentially faster than previously observed.
* **Geometry of Emission Region:** The authors propose that the observed changes in CRSF energy are not solely due to magnetic field variations but are also linked to changes in the geometry of the emission region, possibly influenced by the accretion process.
* **Transition to Sub-critical

But the answer improves if we supply the relevant context, i.e. we need RAG:

In [7]:
text="On the other hand, analysis of the luminosity dependence of the observed properties of X-ray pulsars might help to constrain the critical luminosity observationally (Tsygankov et al. 2006; Staubert et al. 2007; Klochkov et al. 2012). Indeed, in low luminous sources, the CRSF energy typically increases with the flux, whereas at higher accretion rates, an anticorrelation is observed. As discussed by Staubert et al. (2007), Becker et al. (2012), Mushtukov et al. (2015a)"
prompt = ChatPromptTemplate.from_template("Based on the following text, why citation Mushtukov et al. (2015a) is cited? Respond concisely in one sentence. The text: {topic}")
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"topic": text}))

Mushtukov et al. (2015a) is cited because they discuss the anticorrelation observed between CRSF energy and accretion rates in high luminosity sources.  
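The snippet used above was picked by hand. As a baseline to compare the vector search against, such snippets can also be collected with a plain regular expression. This is only a sketch: the function name `citation_contexts`, the window size, and the assumption that a citation looks like "Surname ... year" with at most about 40 characters in between are mine, and unusual citation formats will be missed.

```python
import re


def citation_contexts(text: str, author: str, year: str, window: int = 300) -> list[str]:
    """Return snippets of +-`window` characters around each 'Author ... year' mention."""
    # allow 'et al.', '&' or co-author names (but no digits or brackets) between
    # the surname and the year, with an optional opening bracket before the year
    pattern = re.compile(rf"{re.escape(author)}[^()\d]{{0,40}}\(?{re.escape(year)}")
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets


example = ("As discussed by Staubert et al. (2007), Becker et al. (2012), "
           "Mushtukov et al. (2015a) the behaviour depends on the accretion rate.")
for snippet in citation_contexts(example, "Mushtukov", "2015a", window=40):
    print(snippet)
```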



We use *langchain* and *Chroma DB* for the implementation:

In [8]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma

# use nomic embeddings from Ollama (the model must be pulled beforehand)
embedder = OllamaEmbeddings(model = 'nomic-embed-text')

# split the text into chunks
# text_splitter = SemanticChunker(embedder)


text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600,
    chunk_overlap=300,
)

documents = text_splitter.split_text(fulltext.replace('\n',' '))

# create a vector store for similarity lookup
vector = Chroma.from_texts(documents, embedder)
print(len(documents))

142
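To make the role of `chunk_size` and `chunk_overlap` more tangible, here is a simplified stand-in that cuts the text with a fixed sliding window. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter tries to break at separators such as paragraphs and spaces first); the function `sliding_chunks` is a hypothetical helper that only illustrates how overlapping windows cover the text.

```python
def sliding_chunks(text: str, chunk_size: int = 600, chunk_overlap: int = 300) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # stop once the remaining tail is already covered by the previous window
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


chunks = sliding_chunks("x" * 1500)
print(len(chunks), [len(c) for c in chunks])
```

With an overlap of half the chunk size, every sentence appears in roughly two chunks, which makes it less likely that a citation is separated from the sentence it belongs to, at the price of a larger store.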


Test that the search works:

In [32]:
retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("Mushtukov et al 2015a")
retrieved_docs

[Document(page_content='and Technology, Moscow region, Dolgoprudnyi, Russia 7 Nordita, KTH Royal Institute of Technology and Stockholm University, Roslagstullsbacken 23, SE-10691 Stockholm, Sweden Accepted XXX. Received YYY; in original form ZZZ ABSTRACT We report on the analysis of NuSTAR observations of the Be-transient X-ray pulsar V 0332+53 during the giant outburst in 2015 and another minor outburst in 2016. We conﬁrm the cyclotron-line energy – luminosity correlation previously reported in the source and the line energy decrease during the giant outburst. Based on 2016 observations we ﬁnd that a year later the'),
 Document(page_content='However, we were able to do it for three observations where the SPI spectrometer was operating (detectors annealing was performed during revolution 1586). The INTEGRAL/SPI 1 http://www.swift.ac.uk/user_objects/ data were screened and reduced in accordance with the procedures described by Churazov et al. (2011, 2014). The broadband spectrum of the 
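Neither of the two hits visible above obviously mentions the queried citation, which is perhaps not surprising: a bare author name carries little meaning for an embedding model. Conceptually, the retriever ranks chunks by how close their embedding vectors are to the query vector. The sketch below shows that ranking with cosine similarity on toy two-dimensional vectors; the helper names are hypothetical, and the distance metric Chroma actually applies depends on the collection settings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return ranked[:k]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.1], toy_docs, k=2))
```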

Now we can use the store to build the RAG itself using langchain. The first task is to extract the citations themselves and determine why they are cited. The first part is actually most easily done without LLMs, i.e. just by looking at the references section of the paper:

In [33]:
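import numpy as np
import re
# index of the first chunk that contains the REFERENCES heading
ref_start = next(i for i, x in enumerate(documents) if 'REFERENCES' in x)
# newlines were replaced by spaces before chunking, so split('\n') leaves each chunk as a single entry
candidate_refs = [line for chunk in documents[ref_start:] for line in chunk.split('\n')]
# keep only entries that mention a plausible publication year
references = [x for x in candidate_refs if re.search(r'\b(19|20)\d{2}\b', x)]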

In [34]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt_template = (
    "Based on the following text, why is the following reference cited?\n"
    "Reference: {input}\n"
    "Text: {context}\n"
    "Prefer concrete reasons to general reasons in your answer, if no concrete reasons can be identified just answer NaN.\n"
    "Respond concisely in one sentence and only include the answer to the question posed itself without introduction.\n"
    "Do not include the reference itself explicitly or implicitly in your answer.\n"
    "If you don't know the answer, answer NaN."
)

retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 5}) # effectively k = k-1 as one reference will always be in the reference section
prompt = ChatPromptTemplate.from_template(prompt_template)
question_answer_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)

i=0
print(references[i],
      chain.invoke({"input": references[i]})['answer']
     )

(VD), Deutsche Forschungsgemeinschaft (DFG) through WE 1312/48-1 (VS), the Russian Science Foundation through grant 14-12-01287 (AAM, SST, AAL), the Academy of Finland through grant 268740 and the Foundations’ Professor Pool, the Finnish Cultural Foun- dation (JP). REFERENCES Basko M. M., Sunyaev R. A., 1976, MNRAS, 175, 395 Becker P. A., et al., 2012, A&A, 544, A123 Choudhuri A. R., Konar S., 2002, MNRAS, 332, 933 Churazov E., et al., 2011, MNRAS, 411, 1727 Churazov E., et al., 2014, Nature, 512, 406 Cusumano G., La Parola V., D’A` ı A., Segreto A., Tagliaferri G., Barthelmy S. D., Gehrels The reference is cited to provide funding information for the research. 
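The entry printed above is a whole chunk rather than a single reference, because the line breaks were removed before chunking. For a flattened list in this particular style (authors, year, journal, volume, page), the individual entries could be recovered with a regular expression. The sketch below makes exactly that assumption, so books, preprints and other entries without a volume are skipped, and multi-word surnames are truncated to their first word; `split_references` is a hypothetical helper.

```python
import re

# authors, year, journal, volume, page; the authors group is non-greedy so that
# each match stops at the first plausible year
REF_PATTERN = re.compile(
    r"(?P<authors>.+?),\s(?P<year>(?:19|20)\d{2}),\s(?P<journal>[^,]+),\s(?P<volume>\d+),\s(?P<page>\w+)"
)


def split_references(flat: str) -> list[dict]:
    """Split a flattened reference list into entries with first author, year and journal."""
    # keep only what follows the (last) REFERENCES heading, if there is one
    _, _, tail = flat.rpartition("REFERENCES")
    return [
        {
            "first_author": m["authors"].strip().split()[0],
            "year": m["year"],
            "journal": m["journal"].strip(),
        }
        for m in REF_PATTERN.finditer(tail)
    ]


flat = ("the Finnish Cultural Foundation (JP). REFERENCES Basko M. M., Sunyaev R. A., 1976, MNRAS, 175, 395 "
        "Becker P. A., et al., 2012, A&A, 544, A123 Churazov E., et al., 2014, Nature, 512, 406")
for entry in split_references(flat):
    print(entry)
```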



As one can see, the results are better but still not fully adequate. Moreover, the citation parsing does not work reliably, and there is no way to do sentiment analysis without the content of the cited papers themselves (or at least their abstracts) from ADS, i.e. we need the bibcodes of the papers in question anyway. The easiest way to get them is from ADS itself, i.e. by searching for the references of a given paper. This also lets us extract the first author and year more cleanly than is possible with LLMs working on the parsed PDF:

In [35]:
from urllib.parse import urlencode

def get_refs(bibcode):
    """Return a (key, abstract, bibcode) tuple for every reference of `bibcode` known to ADS."""
    encoded_query = urlencode({"q": f"references(bibcode:{bibcode})",
                               "fq": "database:astronomy",
                               "fl": "author, year, abstract, bibcode",
                               "rows": 1000})
    results = requests.get(f"https://api.adsabs.harvard.edu/v1/search/query?{encoded_query}",
                           headers={'Authorization': 'Bearer ' + os.getenv('ADS_API_KEY')})
    docs = results.json()['response']['docs']
    # the key is "<first author surname> et al. <year>"; a missing abstract becomes an empty string
    return [(f"{d['author'][0].split(',')[0]} et al. {d['year']}", d.get('abstract', ''), d['bibcode'])
            for d in docs]

In [36]:
refs_ads = get_refs(bibcode)

In [37]:
refs_ads[0]

('Gehrels et al. 1986',
 'The calculation of limits for small numbers of astronomical counts is based on standard equations derived from Poisson and binomial statistics; although the equations are straightforward, their direct use is cumbersome and involves both table-interpolations and several mathematical operations. Convenient tables and approximate formulae are here presented for confidence limits which are based on such Poisson and binomial statistics. The limits in the tables are given for all confidence levels commonly used in astrophysics.',
 '1986ApJ...303..336G')

In [38]:
i=0
print(refs_ads[i][0],
      chain.invoke({"input": refs_ads[i][0]})['answer']
     )

Gehrels et al. 1986 The reference is cited because it describes the use of Gehrels weighting in spectral fitting. 



We can also create a retriever corresponding to a given reference based on these results. This is not really useful for sentiment analysis, but can help to enhance explanation generation (additional context). It also makes sense to use a somewhat more advanced prompt (multi-shot prompting) and to store the results in a pandas dataframe for future use:

In [39]:
import pandas as pd

In [65]:
abstract_dic = {x[0]:(x[1],x[2]) for x in refs_ads}

prompt_template = (
    "Based on the context below, why is the following reference (followed by its abstract) cited and is it relevant?\n"
    "Reference: {input}\n"
    "Context: {context}\n"
    "Here are several examples on how you can answer the question given the context:\n"
    "Example 1:\n"
    "reference: Chernyakova et al. 2015\n"
    "context: In the X-ray regime, PSR B1259-63/LS 2883 is detected during its entire orbit with a non-thermal, non-pulsed spectrum (Marino et al. 2023). While the X-ray flux level is minimal around apastron, close to the periastron passage the keV light curve is typically characterised by two maxima roughly coinciding with the times of the disappearance and re-appearance of pulsed radio emission (see e.g. Chernyakova et al. 2015).\n"
    "answer: coincidence of peaks in X-ray curve with disappearance and re-appearance of radio emission. The reference is relevant as it discusses multiwavelength properties of the source (in this case radio) which are relevant for discussion of physics of emission in X-ray band in the paper\n"
    "Example 2:\n"
    "reference: Cao et al. (2020)\n"
    "context: Insight-HXMT include three collimated telescopes: the High Energy X-ray telescope (HE, NaI/CsI, 20-250 keV), the Medium Energy X-ray telescope (ME, Si pin detector, 5–30 keV), and the Low Energy X-ray telescope (LE, Swept Charge Device detector, 0.7–13 keV), working in scanning and pointing observational modes and Gamma Ray Burst (GRB) mode. For details about the Insight-HXMT mission see Zhang et al. (2019), Cao et al. (2020), Chen et al. (2020), and Liu et al. (2020).\n"
    "answer: details of Insight-HXMT mission, in particular, the medium energy (ME) telescope. The reference is relevant as it discusses properties of the instrumentation used in the paper.\n"
    "Example 3:\n"
    "reference: Tsygankov et al. 2019a\n"
    "context: From an observational point of view, the spectra of XRPs at high luminosities (>10^36 erg s−1 ) have similar shapes, which can be well fitted by a power law with an exponential cut-off at high energies (e.g. Nagase 1989; Filippova et al. 2005). However, it was recently discovered that the decrease of the observed luminosity below this value is accompanied by dramatic changes of the energy spectra in several XRPs (Tsygankov et al. 2019a,b; Doroshenko et al. 2021; Lutovinov et al. 2021), pointing to varied physical and geometrical properties of the emission region.\n"
    "answer: paper is cited because it reports dramatic deviation of the observed shape of spectra of X-ray pulsar from typical cutoff power law spectra below critical luminosity of ~10^36 erg/s. The reference is relevant as it reports discovery of spectral transition which the paper aims to explain from physical perspective.\n"
    "Besides examples, please only respond with the actual answer omitting introduction and repetitions.\n"
    "Be sure to verify that your answer is related to reference abstract in some way.\n"
    "Be sure to discuss whether citation is relevant for the context and why.\n"
    "Only consider concrete reasons and avoid general statements. If no concrete reason can be identified or you don't know the answer just answer NaN."
)



prompt = ChatPromptTemplate.from_template(prompt_template)
question_answer_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)

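def why(reference_key):
    """Ask the chain why `reference_key` is cited, passing the key followed by its ADS abstract."""
    abstract = abstract_dic.get(reference_key, ('', ''))[0]
    return chain.invoke({"input": f"{reference_key}\n{abstract}"})['answer'].strip()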

In [66]:
why('Evans et al. 2009')

'The reference is cited because it presents the largest published sample of X-ray GRB data and details the methods used to analyze this data by the Swift-XRT team. This is relevant to the context as the authors use the Swift/XRT light curve of the source in July-August 2016 and mention using the Swift data products service provided by the UK Swift Science Data Centre, which is described in Evans et al. (2009).'
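One caveat of using "Author et al. year" as a dictionary key: two references with the same first author and year would collapse into a single entry, so one of the abstracts may be silently lost. A possible workaround is to append a letter to repeated labels before building the dictionary. The sketch below uses a hypothetical helper `unique_keys`; note that the letters follow the order in which ADS returns the records, which need not match the a/b labels used in the paper itself.

```python
from collections import Counter
from string import ascii_lowercase


def unique_keys(labels: list[str]) -> list[str]:
    """Append a, b, c, ... to labels that occur more than once; leave unique labels untouched."""
    totals = Counter(labels)
    seen = Counter()
    keys = []
    for label in labels:
        if totals[label] == 1:
            keys.append(label)
        else:
            # assumes fewer than 27 references share the same first author and year
            keys.append(label + ascii_lowercase[seen[label]])
            seen[label] += 1
    return keys


labels = ["Tsygankov et al. 2006", "Mushtukov et al. 2015", "Mushtukov et al. 2015"]
print(unique_keys(labels))
```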

The easiest way to run the chain for all citations is to create a pandas dataframe and apply the function above to compute a motivation column. Note that local LLMs are not particularly quick on a typical laptop, so we use tqdm to monitor progress:

In [67]:
df = pd.DataFrame({'Author year': abstract_dic.keys(), 'bibcode': [abstract_dic[x][1] for x in abstract_dic.keys()],'abstract': [abstract_dic[x][0] for x in abstract_dic.keys()]})

In [68]:
from tqdm import tqdm
tqdm.pandas()
df['motivation'] = df['Author year'].progress_apply(why)

100%|██████████| 38/38 [08:47<00:00, 13.87s/it]
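At roughly 14 seconds per reference, re-running the cell after every prompt tweak or kernel restart gets tedious. One option is to keep the answers in a small JSON file and only call the model for keys that are not there yet. This is a sketch under the assumption that answers are plain strings keyed by the reference label; `cached` and the file name in the usage comment are hypothetical, and the cache must be deleted by hand whenever the prompt or the model changes.

```python
import json
from pathlib import Path


def cached(fn, cache_path):
    """Wrap `fn(key) -> str` so that results are stored in, and reused from, a JSON file."""
    path = Path(cache_path)
    store = json.loads(path.read_text()) if path.exists() else {}

    def wrapper(key: str) -> str:
        if key not in store:
            store[key] = fn(key)
            # write after every new answer so an interrupted run loses nothing
            path.write_text(json.dumps(store, indent=2))
        return store[key]

    return wrapper


# usage with the function defined above (file name is arbitrary):
# df['motivation'] = df['Author year'].progress_apply(cached(why, "motivation_cache.json"))
```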


In [69]:
pd.set_option('display.max_colwidth', None)
df

Unnamed: 0,Author year,bibcode,abstract,motivation
0,Gehrels et al. 1986,1986ApJ...303..336G,"The calculation of limits for small numbers of astronomical counts is based on standard equations derived from Poisson and binomial statistics; although the equations are straightforward, their direct use is cumbersome and involves both table-interpolations and several mathematical operations. Convenient tables and approximate formulae are here presented for confidence limits which are based on such Poisson and binomial statistics. The limits in the tables are given for all confidence levels commonly used in astrophysics.","The reference is cited because it provides convenient tables and approximate formulae for confidence limits based on Poisson and binomial statistics, which are used in the context to calculate confidence limits for astronomical counts."
1,Evans et al. 2009,2009MNRAS.397.1177E,"We present a homogeneous X-ray analysis of all 318 gamma-ray bursts detected by the X-ray telescope (XRT) on the Swift satellite up to 2008 July 23; this represents the largest sample of X-ray GRB data published to date. In Sections 2-3, we detail the methods which the Swift-XRT team has developed to produce the enhanced positions, light curves, hardness ratios and spectra presented in this paper. Software using these methods continues to create such products for all new GRBs observed by the Swift-XRT. We also detail web-based tools allowing users to create these products for any object observed by the XRT, not just GRBs. In Sections 4-6, we present the results of our analysis of GRBs, including probability distribution functions of the temporal and spectral properties of the sample. We demonstrate evidence for a consistent underlying behaviour which can produce a range of light-curve morphologies, and attempt to interpret this behaviour in the framework of external forward shock emission. We find several difficulties, in particular that reconciliation of our data with the forward shock model requires energy injection to continue for days to weeks.","The reference is cited because it presents the largest published sample of X-ray GRB data and details the methods used to analyze this data by the Swift-XRT team. This is relevant to the context as the authors use the Swift/XRT light curve of the source in July-August 2016 and mention using the Swift data products service provided by the UK Swift Science Data Centre, which is described in Evans et al. (2009)."
2,Ghosh et al. 1979,1979ApJ...234..296G,"The solutions of the two-dimensional hydromagnetic equations are used to calculate the torque on a magnetic neutron star accreting from a Keplerian disk. It is found that the magnetic coupling between the star and the plasma in the outer transition zone is appreciable; that as a result, the spin-up torque on fast rotators is substantially less than that on slow rotators, and that for sufficiently high stellar angular velocities or sufficiently low mass accretion rates, the rotation of the star can be braked while accretion continues. These results are applied to pulsating X-ray sources, revealing that at high luminosities a star of given spin period rotating in the same direction as the disk can experience either spin-up or spin-down, depending on its luminosity. Also considered are the general problem of interpreting period changes in pulsating X-ray sources, and the dipole magnetic moments of nine pulsating X-ray sources are estimated by fitting the theoretical spin-up equation to estimates of the average luminosity and spin-up rate of each source.","The reference is cited because it discusses the torque on a magnetic neutron star accreting from a Keplerian disk, and how the magnetic coupling between the star and the plasma can lead to spin-up or spin-down depending on the stellar angular velocity and mass accretion rate. This is relevant to the context because the paper is discussing the spin evolution of a neutron star during an outburst, which is influenced by the balance of accelerating and braking torques."
3,Titarchuk et al. 1994,1994ApJ...434..570T,"The theory of spectral formation in thermal X-ray sources, where the effects of Comptonization and Klein-Nishina corrections are important, is presented. Analytical expressions are obtained for the produced spectrum as a function of such input parameters as the plasma temperature, the optical depth of the plasma cloud and the injected soft photon spectrum. The analytical theory developed here takes into account the dependence of the scattering opacity on the photon energy. It is shown that the plasma temperature as well as the asymptotic rate of photon escape from the plasma cloud determine the shape of the upscattered hard tail in the emergent spectra, even in the case of very small optical depths. The escape distributions of photons are given for any optical depth of the plasma cloud and their asymptotic dependence for very small and large optical depths are examined. It is shown that this new generalized approach can fit spectra for a large variety of hard X-ray sources and determine the plasma temperature in the region of main energy release in Cyg X-1 and the Seyfert galaxy NGC 4151.","The reference is cited because it presents a theoretical model for X-ray spectral formation in thermal sources, taking into account Comptonization and Klein-Nishina corrections. This is relevant to the context as the paper discusses the transition between different accretion regimes in X-ray pulsars, which likely involves changes in the X-ray spectral shape."
4,Protassov et al. 2002,2002ApJ...571..545P,"The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, &amp; Bowyer, in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ<SUP>2</SUP> and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ<SUP>2</SUP> distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). 
Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive scientific results into question.","The reference is cited because it points out that the likelihood ratio test (LRT) and the F-test, commonly used in astrophysics, can fail to adhere to their nominal χ² and F-distributions in certain statistical tests. This is relevant to the context because the paper discusses the use of the MLR test (which is related to the LRT) to compare two models of the data. The authors acknowledge that the assumptions of the MLR test might be violated, and the cited reference highlights the potential issues with these tests when applied inappropriately."
5,Basko et al. 1976,1976MNRAS.175..395B,"Accretion on to a magnetized neutron star for high accretion rates, when one can no longer ignore the back-reaction of emergent light on the infalling material, is discussed in detail. The equations of hydrodynamics and radiative diffusion are solved analytically in a one-dimensional approximation. The luminosity is evaluated beyond which one should allow for the dynamic effect of emergent light on the infalling material. The limiting X-ray luminosity of accreting magnetized neutron stars is shown to depend crucially on the geometry of the accretion channel. The effects connected with the gas flow along the magnetospheric surface are discussed in detail. The plasma layer on the Alfven surface is shown to be optically thick with respect to Thomson scattering and to reradiate in soft X-rays a considerable fraction of the primary X-ray flux. A necessary condition for the X-ray luminosity to exceed the Eddington limit is a certain degree of asymmetry in the distribution of matter over the Alfven surface.","The reference is cited because it discusses the effects of radiative pressure on accreting neutron stars at high accretion rates, a topic relevant to understanding the observed luminosities of bright pulsars which exceed the local Eddington limit. The paper explains how the plasma must be stopped above the neutron star surface by radiative pressure, leading to emission from an extended ""accretion column"". This is relevant to the context as it discusses the transition between different accretion regimes and the resulting changes in emission properties."
6,Becker et al. 2012,2012A&A...544A.123B,"Context. Accretion-powered X-ray pulsars exhibit significant variability of the cyclotron resonance scattering feature (CRSF) centroid energy on pulse-to-pulse timescales, and also on much longer timescales. Two types of spectral variability are observed. For sources in group 1, the CRSF energy is negatively correlated with the variable source luminosity, and for sources in group 2, the opposite behavior is observed. The physical basis for this bimodal behavior is currently not well understood. <BR /> Aims: We explore the hypothesis that the accretion dynamics in the group 1 sources is dominated by radiation pressure near the stellar surface, and that Coulomb interactions decelerate the gas to rest in the group 2 sources. <BR /> Methods: We derive a new expression for the critical luminosity, L<SUB>crit</SUB>, such that radiation pressure decelerates the matter to rest in sources with X-ray luminosity L<SUB>X</SUB> &gt; L<SUB>crit</SUB>. The formula for L<SUB>crit</SUB> is based on a simple physical model for the structure of the accretion column in luminous X-ray pulsars that takes into account radiative deceleration, the energy dependence of the cyclotron cross section, the thermodynamics of the accreting gas, the dipole structure of the pulsar magnetosphere, and the diffusive escape of radiation through the column walls. We show that for typical neutron star parameters, Lcrit = 1.5 × 10<SUP>37</SUP> B<SUB>12<SUP>16/15</SUP> erg s</SUB><SUP>-1</SUP>, where B<SUB>12</SUB> is the surface magnetic field strength in units of 10<SUP>12</SUP> G. Results: The formula for the critical luminosity is evaluated for five sources, using the maximum value of the CRSF centroid energy to estimate the surface magnetic field strength B<SUB>12</SUB>. 
The results confirm that the group 1 sources are supercritical (L<SUB>X</SUB> &gt; L<SUB>crit</SUB>) and the group 2 sources are subcritical (L<SUB>X</SUB> &lt; L<SUB>crit</SUB>), although the situation is less clear for those highly variable sources that cross over the line L<SUB>X</SUB> = L<SUB>crit</SUB>. We also explain the variation of the CRSF energy with luminosity as a consequence of the variation of the characteristic emission height. The sign of this dependence is opposite in the supercritical and subcritical cases, hence creating the observed bimodal behavior. Conclusions: We have developed a new model for the critical luminosity in accretion-powered X-ray pulsars that explains the bimodal dependence of the CRSF centroid energy on the X-ray luminosity L<SUB>X</SUB>. Our model provides a physical basis for the observed variation of the CRSF energy as a function of L<SUB>X</SUB> for both the group 1 (supercritical) and the group 2 (subcritical) sources as a result of the variation of the emission height in the column.","The reference is cited because it discusses the bimodal dependence of the CRSF centroid energy on the X-ray luminosity in a sample of X-ray pulsars, which is the central topic of the paper. The paper by Becker et al. (2012) observes this bimodal behavior and proposes that it could be related to different accretion regimes (sub- and super-critical) in the sources."
7,Mushtukov et al. 2015,2015MNRAS.454.2714M,"Cyclotron resonance scattering features observed in the spectra of some X-ray pulsars show significant changes of the line centroid energy with the pulsar luminosity. Whereas for bright sources above the so-called critical luminosity, these variations are established to be connected with the appearance of the high-accretion column above the neutron star surface, at low, sub-critical luminosities the nature of the variations (but with the opposite sign) has not been discussed widely. We argue here that the cyclotron line is formed when the radiation from a hotspot propagates through the plasma falling with a mildly relativistic velocity on to the neutron star surface. The position of the cyclotron resonance is determined by the Doppler effect. The change of the cyclotron line position in the spectrum with luminosity is caused by variations of the velocity profile in the line-forming region affected by the radiation pressure force. The presented model has several characteristic features: (i) the line centroid energy is positively correlated with the luminosity; (ii) the line width is positively correlated with the luminosity as well; (iii) the position and the width of the cyclotron absorption line are variable over the pulse phase; (iv) the line has a more complicated shape than widely used Lorentzian or Gaussian profiles; (v) the phase-resolved cyclotron line centroid energy and the width are negatively and positively correlated with the pulse intensity, respectively. 
The predictions of the proposed theory are compared with the variations of the cyclotron line parameters in the X-ray pulsar GX 304-1 over a wide range of sub-critical luminosities as seen by the INTEGRAL observatory.","The reference is cited because it proposes a model for the observed changes in the cyclotron line centroid energy with luminosity in X-ray pulsars, particularly at sub-critical luminosities where the nature of these variations has not been widely discussed. The model attributes the variations to changes in the velocity profile of the plasma in the line-forming region due to radiation pressure force. This is relevant to the context as the paper also discusses the luminosity dependence of the cyclotron line parameters in the X-ray pulsar GX 304-1 and aims to understand the physical processes behind these variations."
8,Poutanen et al. 2013,2013ApJ...777..115P,"Cyclotron resonance scattering features observed in the spectra of some X-ray pulsars show significant changes of the line energy with the pulsar luminosity. At high luminosities, these variations are often associated with the onset and growth of the accretion column, which is believed to be the origin of the observed emission and of the cyclotron lines. However, this scenario inevitably implies a large gradient of the magnetic field strength within the line-forming region, which makes the formation of the observed line-like features problematic. Moreover, the observed variation of the cyclotron line energy is much smaller than could be anticipated for the corresponding luminosity changes. We argue here that a more physically realistic situation is that the cyclotron line forms when the radiation emitted by the accretion column is reflected from the neutron star surface, where the gradient of the magnetic field strength is significantly smaller. Here we develop a reflection model and apply it to explain the observed variations of the cyclotron line energy in a bright X-ray pulsar V 0332+53 over a wide range of luminosities.","The reference is cited because it proposes a reflection model for cyclotron resonance scattering features (CRSF) formation, where radiation from the accretion column is reflected from the neutron star surface. This model is relevant to the context because the paper discusses the observed changes in CRSF energy in X-ray pulsar V 0332+53 during outbursts and suggests that these changes could be due to alterations in the emission region geometry, potentially supported by the reflection model proposed by Poutanen et al. (2013)."
9,Tsygankov et al. 2016,2016A&A...593A..16T,"<BR /> Aims: We present the results of the monitoring programmes performed with the Swift/XRT telescope and aimed specifically to detect an abrupt decrease of the observed flux associated with a transition to the propeller regime in two well-known X-ray pulsars 4U 0115+63 and V 0332+53. <BR /> Methods: Both sources form binary systems with Be optical companions and undergo so-called giant outbursts every 3-4 years. The current observational campaigns were performed with the Swift/XRT telescope in the soft X-ray band (0.5-10 keV) during the declining phases of the outbursts exhibited by both sources in 2015. <BR /> Results: The transitions to the propeller regime were detected at the threshold luminosities of (1.4 ± 0.4) × 10<SUP>36</SUP> erg s<SUP>-1</SUP> and (2.0 ± 0.4) × 10<SUP>36</SUP> erg s<SUP>-1</SUP> for 4U 0115+63 and V 0332+53, respectively. Spectra of the sources are shown to be significantly softer during the low state. In both sources, the accretion at rates close to the aforementioned threshold values briefly resumes during the periastron passage following the transition into the propeller regime. The strength of the dipole component of the magnetic field required to inhibit the accretion agrees well with estimates based on the position of the cyclotron lines in their spectra, thus excluding presence of a strong multipole component of the magnetic field in the vicinity of the neutron star.","The reference is cited because it reports on the detection of transitions to the propeller regime in X-ray pulsars 4U 0115+63 and V 0332+53, which is relevant to the context as it discusses the complex evolution of the line energy in V 0332+53 and suggests a change of the emission region geometry rather than accretion-induced decay of the neutron stars magnetic field."


As one can see, the results are overall quite impressive even if not 100% spot on, as it may be difficult to extract sufficiently specific context for some references and writing styles. One can try to improve the situation by using a larger model (although the difference between gemma2 and GPT-4o is smaller than what can be achieved by other means), better prompt engineering, employing Graph RAGs, etc. That's quite a bit of work, however, so I skip it here and continue with what we have to get to the third point mentioned in the beginning, i.e. sentiment analysis. Basically, we'd like the LLM to judge whether citations are relevant in the context of the paper and whether some may be omitted. The justification should already be there as we engineered the prompt to include it, so the main task for the LLM is just to quantify it. In fact, this step should have been included in the original chain to begin with, but let us do it separately using the already obtained dataframe to avoid re-running everything:

In [78]:
# Adjacent string literals are concatenated as-is, so each piece ends with "\n"
# to keep sentences and few-shot examples from running into each other.
prompt = ChatPromptTemplate.from_template(
    "You are an expert scientist assessing whether citation of a reference is justified.\n"
    "You critically analyze the text below where reasons why a reference was cited are described and check for logical inconsistencies.\n"
    "You carefully reason and start by considering a possibility that the reference was not cited, what would happen then?\n"
    "Would the original paper lack some bit of information then?\n"
    "You carefully reason and finish with a short conclusion, wrapped in double asterisks, characterizing how strongly the reference is justified, i.e. one of the following:\n"
    "not justified, weakly justified, modestly justified, strongly justified, essential.\n"
    "Some examples are provided below:\n"
    "Example 1:\n"
    "input text: The reference is cited because it proposes a model for the observed changes in the cyclotron line centroid energy with luminosity in X-ray pulsars, particularly at sub-critical luminosities where the nature of these variations has not been widely discussed. The model attributes the variations to changes in the velocity profile of the plasma in the line-forming region due to radiation pressure force. This is relevant to the context as the paper also discusses the luminosity dependence of the cyclotron line parameters in the X-ray pulsar GX 304-1 and aims to understand the physical processes behind these variations:\n"
    "your answer: if the reference was not cited, the paper would not be able to discuss observed changes in cyclotron line centroid energy in framework of the model by Mushtukov et al 2015. In this case another model would be needed and would need to be either developed or cited. If this would happen, comparison of the results with Mushtukov et al 2015 would likely still be required, and it would still need to be cited.  **essential**\n"
    "Example 2:\n"
    "input text: The reference is cited because it provides a direct measurement of the surface magnetic field strength of Her X-1 using the energy of a cyclotron feature in its X-ray spectrum. This is relevant to the context as it discusses the magnetic field strength of neutron stars and its influence on X-ray emission.\n"
    "your answer: if the reference was not cited, discussion of the cyclotron line properties as observational probes of magnetic fields of neutron stars would not be feasible. **essential**\n"
    "Example 3:\n"
    "input text: The reference is cited because it discusses how nice it would be if we could see a rabbit in the sky. This is essential for discussion of physics behind bird's flights\n"
    "your answer: if the reference would not be cited, discussions of flying rabbits would not be possible. This would still allow, however, discussion of bird flight physics as birds are unrelated to rabbits: **not justified**\n"
    "Example 4:\n"
    "input text: The reference is cited because it reports on the observation of a meta-stable state in two Be/X-ray transients (V0332+53 and 4U 0115+63) after giant outbursts. This meta-stable state, characterized by a luminosity a factor of ~10 above quiescent levels and a softening of the spectra over time, is discussed in the context of the paper's investigation of the complex outburst development in V 0332+53. The reference is relevant as it presents a similar phenomenon in other Be/X-ray transients, potentially supporting the interpretation of the observed behavior in V 0332+53.\n"
    "your answer: if the paper would not be cited, discussion of the meta-stable accretion state would not be possible. This might be relevant to overall evolution of the outbursts, but not directly related to luminosity dependence of the cyclotron line discussed in the paper. **modestly justified**\n"
    "Finally, the text which you need to analyze: {topic}")
chain = prompt | llm | StrOutputParser()
print("Text analyzed:", df.iloc[13,2])
print(chain.invoke({"topic": df.iloc[13,2]}))

Text analyzed: A type Ia supernova is thought to be a thermonuclear explosion of either a single carbon-oxygen white dwarf or a pair of merging white dwarfs. The explosion fuses a large amount of radioactive <SUP>56</SUP>Ni (refs 1-3). After the explosion, the decay chain from <SUP>56</SUP>Ni to <SUP>56</SUP>Co to <SUP>56</SUP>Fe generates γ-ray photons, which are reprocessed in the expanding ejecta and give rise to powerful optical emission. Here we report the detection of <SUP>56</SUP>Co lines at energies of 847 and 1,238 kiloelectronvolts and a γ-ray continuum in the 200-400 kiloelectronvolt band from the type Ia supernova 2014J in the nearby galaxy M82. The line fluxes suggest that about 0.6 +/- 0.1 solar masses of radioactive <SUP>56</SUP>Ni were synthesized during the explosion. The line broadening gives a characteristic mass-weighted ejecta expansion velocity of 10,000 +/- 3,000 kilometres per second. The observed γ-ray properties are in broad agreement with the canonical model 

Next, let us wrap this into a Python function that parses the output, and use it to extend the dataframe:

In [86]:
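import re

def judge(text):
    """Return the verdict the LLM wrapped in **...**, or 'n/a' if none is found."""
    try:
        answer = chain.invoke({"topic": text})
        # the prompt examples end with a verdict in double asterisks, e.g. **essential**
        return re.findall(r'\*\*(.*?)\*\*', answer)[0].lower()
    except Exception:
        # no verdict in the answer, or the LLM call itself failed
        return 'n/a'

# progress_apply assumes tqdm.pandas() was called earlier in the notebook
df['judgement'] = df['motivation'].progress_apply(judge)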

100%|██████████| 38/38 [02:12<00:00,  3.49s/it]
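
The parsing step takes the first bold span in the answer, which could in principle be a word the model merely emphasised while reasoning. A slightly more defensive variant is sketched below; it is not what was used to produce the table in this notebook. The helper name `parse_verdict` and the `ALLOWED` tuple are introduced here for illustration only. The idea is to scan bold spans from the end of the answer and accept only those matching one of the five labels listed in the prompt:

```python
import re

# the five labels offered to the LLM in the prompt (assumed to stay in sync with it)
ALLOWED = ("not justified", "weakly justified", "modestly justified",
           "strongly justified", "essential")

def parse_verdict(answer, default="n/a"):
    """Return the last **bold** span that is an allowed label, else `default`."""
    for span in reversed(re.findall(r"\*\*(.*?)\*\*", answer)):
        # normalise case and drop stray trailing punctuation inside the asterisks
        label = span.strip().lower().rstrip(".:")
        if label in ALLOWED:
            return label
    return default

print(parse_verdict("This is **important** context. Verdict: **Essential**"))
```

Inside `judge` one could then return `parse_verdict(chain.invoke({"topic": text}))` instead of indexing the `re.findall` result directly.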


In [87]:
df

Unnamed: 0,Author year,bibcode,abstract,motivation,judgement
0,Gehrels et al. 1986,1986ApJ...303..336G,"The calculation of limits for small numbers of astronomical counts is based on standard equations derived from Poisson and binomial statistics; although the equations are straightforward, their direct use is cumbersome and involves both table-interpolations and several mathematical operations. Convenient tables and approximate formulae are here presented for confidence limits which are based on such Poisson and binomial statistics. The limits in the tables are given for all confidence levels commonly used in astrophysics.","The reference is cited because it provides convenient tables and approximate formulae for confidence limits based on Poisson and binomial statistics, which are used in the context to calculate confidence limits for astronomical counts.",strongly justified
1,Evans et al. 2009,2009MNRAS.397.1177E,"We present a homogeneous X-ray analysis of all 318 gamma-ray bursts detected by the X-ray telescope (XRT) on the Swift satellite up to 2008 July 23; this represents the largest sample of X-ray GRB data published to date. In Sections 2-3, we detail the methods which the Swift-XRT team has developed to produce the enhanced positions, light curves, hardness ratios and spectra presented in this paper. Software using these methods continues to create such products for all new GRBs observed by the Swift-XRT. We also detail web-based tools allowing users to create these products for any object observed by the XRT, not just GRBs. In Sections 4-6, we present the results of our analysis of GRBs, including probability distribution functions of the temporal and spectral properties of the sample. We demonstrate evidence for a consistent underlying behaviour which can produce a range of light-curve morphologies, and attempt to interpret this behaviour in the framework of external forward shock emission. We find several difficulties, in particular that reconciliation of our data with the forward shock model requires energy injection to continue for days to weeks.","The reference is cited because it presents the largest published sample of X-ray GRB data and details the methods used to analyze this data by the Swift-XRT team. This is relevant to the context as the authors use the Swift/XRT light curve of the source in July-August 2016 and mention using the Swift data products service provided by the UK Swift Science Data Centre, which is described in Evans et al. (2009).",strongly justified
2,Ghosh et al. 1979,1979ApJ...234..296G,"The solutions of the two-dimensional hydromagnetic equations are used to calculate the torque on a magnetic neutron star accreting from a Keplerian disk. It is found that the magnetic coupling between the star and the plasma in the outer transition zone is appreciable; that as a result, the spin-up torque on fast rotators is substantially less than that on slow rotators, and that for sufficiently high stellar angular velocities or sufficiently low mass accretion rates, the rotation of the star can be braked while accretion continues. These results are applied to pulsating X-ray sources, revealing that at high luminosities a star of given spin period rotating in the same direction as the disk can experience either spin-up or spin-down, depending on its luminosity. Also considered are the general problem of interpreting period changes in pulsating X-ray sources, and the dipole magnetic moments of nine pulsating X-ray sources are estimated by fitting the theoretical spin-up equation to estimates of the average luminosity and spin-up rate of each source.","The reference is cited because it discusses the torque on a magnetic neutron star accreting from a Keplerian disk, and how the magnetic coupling between the star and the plasma can lead to spin-up or spin-down depending on the stellar angular velocity and mass accretion rate. This is relevant to the context because the paper is discussing the spin evolution of a neutron star during an outburst, which is influenced by the balance of accelerating and braking torques.",strongly justified
3,Titarchuk et al. 1994,1994ApJ...434..570T,"The theory of spectral formation in thermal X-ray sources, where the effects of Comptonization and Klein-Nishina corrections are important, is presented. Analytical expressions are obtained for the produced spectrum as a function of such input parameters as the plasma temperature, the optical depth of the plasma cloud and the injected soft photon spectrum. The analytical theory developed here takes into account the dependence of the scattering opacity on the photon energy. It is shown that the plasma temperature as well as the asymptotic rate of photon escape from the plasma cloud determine the shape of the upscattered hard tail in the emergent spectra, even in the case of very small optical depths. The escape distributions of photons are given for any optical depth of the plasma cloud and their asymptotic dependence for very small and large optical depths are examined. It is shown that this new generalized approach can fit spectra for a large variety of hard X-ray sources and determine the plasma temperature in the region of main energy release in Cyg X-1 and the Seyfert galaxy NGC 4151.","The reference is cited because it presents a theoretical model for X-ray spectral formation in thermal sources, taking into account Comptonization and Klein-Nishina corrections. This is relevant to the context as the paper discusses the transition between different accretion regimes in X-ray pulsars, which likely involves changes in the X-ray spectral shape.",modestly justified
4,Protassov et al. 2002,2002ApJ...571..545P,"The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, &amp; Bowyer, in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ<SUP>2</SUP> and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ<SUP>2</SUP> distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). 
Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive scientific results into question.","The reference is cited because it points out that the likelihood ratio test (LRT) and the F-test, commonly used in astrophysics, can fail to adhere to their nominal χ² and F-distributions in certain statistical tests. This is relevant to the context because the paper discusses the use of the MLR test (which is related to the LRT) to compare two models of the data. The authors acknowledge that the assumptions of the MLR test might be violated, and the cited reference highlights the potential issues with these tests when applied inappropriately.",modestly justified
5,Basko et al. 1976,1976MNRAS.175..395B,"Accretion on to a magnetized neutron star for high accretion rates, when one can no longer ignore the back-reaction of emergent light on the infalling material, is discussed in detail. The equations of hydrodynamics and radiative diffusion are solved analytically in a one-dimensional approximation. The luminosity is evaluated beyond which one should allow for the dynamic effect of emergent light on the infalling material. The limiting X-ray luminosity of accreting magnetized neutron stars is shown to depend crucially on the geometry of the accretion channel. The effects connected with the gas flow along the magnetospheric surface are discussed in detail. The plasma layer on the Alfven surface is shown to be optically thick with respect to Thomson scattering and to reradiate in soft X-rays a considerable fraction of the primary X-ray flux. A necessary condition for the X-ray luminosity to exceed the Eddington limit is a certain degree of asymmetry in the distribution of matter over the Alfven surface.","The reference is cited because it discusses the effects of radiative pressure on accreting neutron stars at high accretion rates, a topic relevant to understanding the observed luminosities of bright pulsars which exceed the local Eddington limit. The paper explains how the plasma must be stopped above the neutron star surface by radiative pressure, leading to emission from an extended ""accretion column"". This is relevant to the context as it discusses the transition between different accretion regimes and the resulting changes in emission properties.",strongly justified
6,Becker et al. 2012,2012A&A...544A.123B,"Context. Accretion-powered X-ray pulsars exhibit significant variability of the cyclotron resonance scattering feature (CRSF) centroid energy on pulse-to-pulse timescales, and also on much longer timescales. Two types of spectral variability are observed. For sources in group 1, the CRSF energy is negatively correlated with the variable source luminosity, and for sources in group 2, the opposite behavior is observed. The physical basis for this bimodal behavior is currently not well understood. <BR /> Aims: We explore the hypothesis that the accretion dynamics in the group 1 sources is dominated by radiation pressure near the stellar surface, and that Coulomb interactions decelerate the gas to rest in the group 2 sources. <BR /> Methods: We derive a new expression for the critical luminosity, L<SUB>crit</SUB>, such that radiation pressure decelerates the matter to rest in sources with X-ray luminosity L<SUB>X</SUB> &gt; L<SUB>crit</SUB>. The formula for L<SUB>crit</SUB> is based on a simple physical model for the structure of the accretion column in luminous X-ray pulsars that takes into account radiative deceleration, the energy dependence of the cyclotron cross section, the thermodynamics of the accreting gas, the dipole structure of the pulsar magnetosphere, and the diffusive escape of radiation through the column walls. We show that for typical neutron star parameters, Lcrit = 1.5 × 10<SUP>37</SUP> B<SUB>12<SUP>16/15</SUP> erg s</SUB><SUP>-1</SUP>, where B<SUB>12</SUB> is the surface magnetic field strength in units of 10<SUP>12</SUP> G. Results: The formula for the critical luminosity is evaluated for five sources, using the maximum value of the CRSF centroid energy to estimate the surface magnetic field strength B<SUB>12</SUB>. 
The results confirm that the group 1 sources are supercritical (L<SUB>X</SUB> &gt; L<SUB>crit</SUB>) and the group 2 sources are subcritical (L<SUB>X</SUB> &lt; L<SUB>crit</SUB>), although the situation is less clear for those highly variable sources that cross over the line L<SUB>X</SUB> = L<SUB>crit</SUB>. We also explain the variation of the CRSF energy with luminosity as a consequence of the variation of the characteristic emission height. The sign of this dependence is opposite in the supercritical and subcritical cases, hence creating the observed bimodal behavior. Conclusions: We have developed a new model for the critical luminosity in accretion-powered X-ray pulsars that explains the bimodal dependence of the CRSF centroid energy on the X-ray luminosity L<SUB>X</SUB>. Our model provides a physical basis for the observed variation of the CRSF energy as a function of L<SUB>X</SUB> for both the group 1 (supercritical) and the group 2 (subcritical) sources as a result of the variation of the emission height in the column.","The reference is cited because it discusses the bimodal dependence of the CRSF centroid energy on the X-ray luminosity in a sample of X-ray pulsars, which is the central topic of the paper. The paper by Becker et al. (2012) observes this bimodal behavior and proposes that it could be related to different accretion regimes (sub- and super-critical) in the sources.",strongly justified
7,Mushtukov et al. 2015,2015MNRAS.454.2714M,"Cyclotron resonance scattering features observed in the spectra of some X-ray pulsars show significant changes of the line centroid energy with the pulsar luminosity. Whereas for bright sources above the so-called critical luminosity, these variations are established to be connected with the appearance of the high-accretion column above the neutron star surface, at low, sub-critical luminosities the nature of the variations (but with the opposite sign) has not been discussed widely. We argue here that the cyclotron line is formed when the radiation from a hotspot propagates through the plasma falling with a mildly relativistic velocity on to the neutron star surface. The position of the cyclotron resonance is determined by the Doppler effect. The change of the cyclotron line position in the spectrum with luminosity is caused by variations of the velocity profile in the line-forming region affected by the radiation pressure force. The presented model has several characteristic features: (i) the line centroid energy is positively correlated with the luminosity; (ii) the line width is positively correlated with the luminosity as well; (iii) the position and the width of the cyclotron absorption line are variable over the pulse phase; (iv) the line has a more complicated shape than widely used Lorentzian or Gaussian profiles; (v) the phase-resolved cyclotron line centroid energy and the width are negatively and positively correlated with the pulse intensity, respectively. 
The predictions of the proposed theory are compared with the variations of the cyclotron line parameters in the X-ray pulsar GX 304-1 over a wide range of sub-critical luminosities as seen by the INTEGRAL observatory.","The reference is cited because it proposes a model for the observed changes in the cyclotron line centroid energy with luminosity in X-ray pulsars, particularly at sub-critical luminosities where the nature of these variations has not been widely discussed. The model attributes the variations to changes in the velocity profile of the plasma in the line-forming region due to radiation pressure force. This is relevant to the context as the paper also discusses the luminosity dependence of the cyclotron line parameters in the X-ray pulsar GX 304-1 and aims to understand the physical processes behind these variations.",strongly justified
8,Poutanen et al. 2013,2013ApJ...777..115P,"Cyclotron resonance scattering features observed in the spectra of some X-ray pulsars show significant changes of the line energy with the pulsar luminosity. At high luminosities, these variations are often associated with the onset and growth of the accretion column, which is believed to be the origin of the observed emission and of the cyclotron lines. However, this scenario inevitably implies a large gradient of the magnetic field strength within the line-forming region, which makes the formation of the observed line-like features problematic. Moreover, the observed variation of the cyclotron line energy is much smaller than could be anticipated for the corresponding luminosity changes. We argue here that a more physically realistic situation is that the cyclotron line forms when the radiation emitted by the accretion column is reflected from the neutron star surface, where the gradient of the magnetic field strength is significantly smaller. Here we develop a reflection model and apply it to explain the observed variations of the cyclotron line energy in a bright X-ray pulsar V 0332+53 over a wide range of luminosities.","The reference is cited because it proposes a reflection model for cyclotron resonance scattering features (CRSF) formation, where radiation from the accretion column is reflected from the neutron star surface. This model is relevant to the context because the paper discusses the observed changes in CRSF energy in X-ray pulsar V 0332+53 during outbursts and suggests that these changes could be due to alterations in the emission region geometry, potentially supported by the reflection model proposed by Poutanen et al. (2013).",modestly justified
9,Tsygankov et al. 2016,2016A&A...593A..16T,"<BR /> Aims: We present the results of the monitoring programmes performed with the Swift/XRT telescope and aimed specifically to detect an abrupt decrease of the observed flux associated with a transition to the propeller regime in two well-known X-ray pulsars 4U 0115+63 and V 0332+53. <BR /> Methods: Both sources form binary systems with Be optical companions and undergo so-called giant outbursts every 3-4 years. The current observational campaigns were performed with the Swift/XRT telescope in the soft X-ray band (0.5-10 keV) during the declining phases of the outbursts exhibited by both sources in 2015. <BR /> Results: The transitions to the propeller regime were detected at the threshold luminosities of (1.4 ± 0.4) × 10<SUP>36</SUP> erg s<SUP>-1</SUP> and (2.0 ± 0.4) × 10<SUP>36</SUP> erg s<SUP>-1</SUP> for 4U 0115+63 and V 0332+53, respectively. Spectra of the sources are shown to be significantly softer during the low state. In both sources, the accretion at rates close to the aforementioned threshold values briefly resumes during the periastron passage following the transition into the propeller regime. The strength of the dipole component of the magnetic field required to inhibit the accretion agrees well with estimates based on the position of the cyclotron lines in their spectra, thus excluding presence of a strong multipole component of the magnetic field in the vicinity of the neutron star.","The reference is cited because it reports on the detection of transitions to the propeller regime in X-ray pulsars 4U 0115+63 and V 0332+53, which is relevant to the context as it discusses the complex evolution of the line energy in V 0332+53 and suggests a change of the emission region geometry rather than accretion-induced decay of the neutron stars magnetic field.",modestly justified


Now one can go through the references and review conclusions of the LLM, specifically focussing on citations which are not all that well justified according to our LLM. Here there are no such cases, but the example above analyzes my paper, so that is to be expected :)
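
To make that review pass a bit more systematic, one could order the references from the least to the most justified and only look at those at or below some threshold. The snippet below is a minimal sketch of this idea on plain Python records; the function name `weakest_first`, the `SCALE` list and the toy rows are made up for illustration (the rows mimic three entries of the table above), and the ordering of labels is assumed to follow the order in which they appear in the prompt:

```python
# labels from the prompt, weakest first; anything else (e.g. 'n/a') is ranked below all of them
SCALE = ["not justified", "weakly justified", "modestly justified",
         "strongly justified", "essential"]

def weakest_first(records, at_most="modestly justified"):
    """Return records whose judgement is no stronger than `at_most`, weakest first."""
    rank = {label: i for i, label in enumerate(SCALE)}
    limit = rank[at_most]
    flagged = [r for r in records if rank.get(r["judgement"], -1) <= limit]
    # sorted() is stable, so records with equal rank keep their original order
    return sorted(flagged, key=lambda r: rank.get(r["judgement"], -1))

# toy records mimicking three rows of the table above
rows = [
    {"Author year": "Gehrels et al. 1986", "judgement": "strongly justified"},
    {"Author year": "Titarchuk et al. 1994", "judgement": "modestly justified"},
    {"Author year": "Poutanen et al. 2013", "judgement": "modestly justified"},
]
for r in weakest_first(rows):
    print(r["Author year"], "->", r["judgement"])
```

With the actual dataframe, the records could be obtained with `df.to_dict("records")`. Entries where parsing failed ('n/a') end up at the top of the list, which seems reasonable since those need a manual look anyway.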
Next steps could include further modifications to search for cases where a citation would be appropriate but is missing. That is, however, a much more challenging task, and I leave it for next time.