## Installation

The LangChain integration for Vertex AI lives in the `langchain-google-vertexai` package:

In [1]:
%pip install -qU langchain-google-vertexai


## Instantiation

Now we can instantiate our model object and generate chat completions:

In [2]:
import vertexai

# Set up the VertexAI client
vertexai.init(
    project="disco-direction-454210-k6",
    location="europe-central2",
)

In [3]:
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model="gemini-2.0-flash-001",
    temperature=0,  # deterministic output; randomness adds nothing to a yes/no classifier
    max_tokens=256,
    max_retries=6,
    stop=None,
)

## Invocation

We evaluate the model with four prompt variants. Each prompt is sent as the system message, and the sentence to classify is sent as the human message.

In [1]:
prompts = ["Is the given sentence from a scientific paper missing a citation marker?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           "Does the following sentence require a citation marker?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           "Should I add a citation marker to this sentence?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           "Does the given sentence reference a different scientific paper?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           ]

In [5]:
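def generate_prediction(prompt, sentence):
    """Return True if the model answers "yes" for the sentence under the given prompt."""
    messages = [
        ("system", prompt),   # classification instruction
        ("human", sentence),  # sentence to classify
    ]

    ai_msg = llm.invoke(messages)

    # Any reply other than an exact "yes" (after trimming whitespace) counts as False
    return ai_msg.content.strip() == "yes"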
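
The exact string comparison in `generate_prediction` treats replies such as `Yes.` or `- yes` as negative answers. A minimal sketch of a more tolerant parser follows; `parse_yes_no` is a hypothetical helper that is not part of the notebook, and the reported results were produced with the exact comparison, not with this function.

```python
def parse_yes_no(text: str) -> bool:
    """Return True if the reply is "yes" after light normalization.

    Hypothetical helper: ignores case, surrounding whitespace,
    a leading list dash, and trailing punctuation.
    """
    cleaned = text.strip().lower().strip("-.!* \n\t")
    return cleaned == "yes"


print(parse_yes_no("Yes.\n"))            # True
print(parse_yes_no("- yes"))             # True
print(parse_yes_no("no"))                # False
print(parse_yes_no("yes, because ..."))  # False: extra text is still rejected
```

Swapping it in would mean replacing the last line of `generate_prediction` with `return parse_yes_no(ai_msg.content)`. Whether that changes any prediction depends on how strictly the model follows the "only print the answer" instruction.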

In [6]:
import pandas as pd

# Change to the path where the dataset is stored
DATASET_PATH = "C:\\Users\\Adrian\\Documents\\datasets\\citing_test.parquet"

# Load the dataset into a pandas DataFrame
df = pd.read_parquet(DATASET_PATH)

# Keep only the first 500 rows
df = df.head(500)

df.describe()

| | sentence | citing |
|---|---|---|
| count | 500 | 500 |
| unique | 500 | 2 |
| top | Under these assumptions, we have the following... | False |
| freq | 1 | 470 |
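
The `describe()` output shows that `False` is the most frequent label, with 470 of the 500 rows, so the sample is heavily imbalanced. A short sketch of what that implies for accuracy: a classifier that always answers `False` already scores 0.94. `majority_baseline_accuracy` is a hypothetical helper, and the label list is rebuilt from the counts above rather than read from the dataset.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    most_common_count = max(labels.count(value) for value in set(labels))
    return most_common_count / len(labels)


# Label counts taken from the describe() output: 470 False, 30 True
labels = [False] * 470 + [True] * 30
print(majority_baseline_accuracy(labels))  # 0.94
```

With the real data the same function can be called as `majority_baseline_accuracy(df["citing"])`. Accuracy alone is therefore a weak yardstick here; precision and recall on the `True` class say more.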


In [None]:
# Evaluate each prompt on every sentence; preds[i] holds the predictions for prompts[i]
preds = []

for prompt in prompts:
    predictions = df["sentence"].apply(lambda x: generate_prediction(prompt, x))
    preds.append(predictions)
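
The loop above issues one blocking request per sentence and prompt, which is 4 × 500 = 2000 sequential calls. One possible way to shorten the wall-clock time, assuming the project's API quota tolerates concurrent requests, is a thread pool. `predict_all` and its `max_workers` default are hypothetical and not part of the notebook; the stub predictor below only stands in for `generate_prediction` so the sketch runs without credentials.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_all(predict, prompt, sentences, max_workers=8):
    """Apply predict(prompt, sentence) to every sentence concurrently.

    Results are returned in the same order as the input sentences.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: predict(prompt, s), sentences))


# Stub standing in for generate_prediction (no API call is made here)
def fake_predict(prompt, sentence):
    return "shown" in sentence


print(predict_all(fake_predict, "unused prompt", ["As shown previously", "We define x"]))
# [True, False]
```

In the notebook this could replace the `apply` call, for example `predictions = pd.Series(predict_all(generate_prediction, prompt, df["sentence"]), index=df.index)`. The sequential loop is simpler and less likely to hit rate limits, so this is a trade-off and not a required change.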

## Results

Of the four prompts, the last one (`Does the given sentence reference a different scientific paper?`) is the clear winner. Judged strictly by these metrics, however, it still performs worse than the fine-tuned SciBERT model.
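
For reference, the `True` row that `classification_report` prints can be reproduced by hand. The sketch below computes precision, recall, and F1 for the positive class from plain Python lists; `positive_class_metrics` is a hypothetical helper, and the five labels are made-up toy data, not values from the dataset.

```python
def positive_class_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the True class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # cited, predicted cited
    fp = sum(1 for t, p in pairs if not t and p)    # not cited, predicted cited
    fn = sum(1 for t, p in pairs if t and not p)    # cited, predicted not cited
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 1 true positive, 2 false positives, 1 false negative
y_true = [True, True, False, False, False]
y_pred = [True, False, True, True, False]
precision, recall, f1 = positive_class_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.33 recall=0.50 f1=0.40
```

A prompt that answers "yes" too readily gets high recall but low precision on `True`, which is why F1 on the positive class is a more useful comparison than accuracy for this imbalanced sample.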

In [9]:
from sklearn.metrics import classification_report

for prompt, predictions in zip(prompts, preds):
    print(prompt)

    report = classification_report(df["citing"], predictions)

    print(report)
    print()

Is the given sentence from a scientific paper missing a citation marker?
- yes
- no

Please only print the answer without anything else.
              precision    recall  f1-score   support

       False       0.98      0.24      0.39       470
        True       0.07      0.93      0.14        30

    accuracy                           0.28       500
   macro avg       0.53      0.59      0.26       500
weighted avg       0.93      0.28      0.37       500


Does the following sentence require a citation marker?
- yes
- no

Please only print the answer without anything else.
              precision    recall  f1-score   support

       False       0.96      0.57      0.72       470
        True       0.09      0.63      0.15        30

    accuracy                           0.57       500
   macro avg       0.52      0.60      0.43       500
weighted avg       0.91      0.57      0.68       500


Should I add a citation marker to this sentence?
- yes
- no

Please only print the answe