# Gemini 2.0 Flash for citing sentence classification

This notebook tests whether simply prompting Gemini can produce good results in detecting citing sentences.

We will be evaluating multiple system prompts with low temperature on a dataset of sentences extracted from scientific papers and labeled as citing or non-citing.

For simplicity, we'll use LangChain to call the model.

## Instantiation

Using Google AI products requires the Google Cloud SDK to be installed on your system.

The following code initializes the Vertex AI project (any project should work, since we are only prompting a base model that should be available in every project) and selects a region.

In [2]:
import vertexai

# Set up the Vertex AI client
vertexai.init(
    project="disco-direction-454210-k6", # any project should work
    location="europe-central2", # adjust based on your location
)

We'll prompt Gemini with a temperature of 0 to minimize randomness, since we want a direct yes/no answer.

In [3]:
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model="gemini-2.0-flash-001",
    temperature=0, # not much reason to have randomness in this kind of classifier
    max_tokens=256, # we'll see immediately that our system prompts request a yes or no answer, so we don't need a lot of tokens
    max_retries=6,
    stop=None,
)

## Invocation

We'll try several system prompts; each one asks the model for a plain yes/no answer with no further reasoning.

In [4]:
prompts = ["Is the given sentence from a scientific paper missing a citation marker?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           "Does the following sentence require a citation marker?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           "Should I add a citation marker to this sentence?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           "Does the given sentence reference a different scientific paper?\n- yes\n- no\n\nPlease only print the answer without anything else.",
           ]

In [5]:
def generate_prediction(sys_prompt, sentence):
    messages = [
        (
            "system",
            sys_prompt,
        ),
        (
            "human",
            sentence,
        )
    ]

    ai_msg = llm.invoke(messages)

    # Only an exact, lowercase "yes" (after trimming whitespace) counts as a positive prediction
    return ai_msg.content.strip() == "yes"
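
The exact-match check above treats replies such as `Yes`, `yes.` or `- yes` as negatives. Below is a minimal sketch of a more tolerant parser, assuming the model may vary capitalization or add punctuation. `parse_yes_no` is a hypothetical helper that is not part of the original notebook, and the metrics reported below were produced with the exact-match check, not with this function.

```python
import re


def parse_yes_no(text):
    """Return True for "yes", False for "no", and None if neither is recognizable.

    Hypothetical helper: only the first alphabetic word of the reply is inspected.
    """
    match = re.search(r"[a-z]+", text.lower())
    if match is None:
        return None  # empty reply or no letters at all
    word = match.group(0)
    if word == "yes":
        return True
    if word == "no":
        return False
    return None  # the model answered with something else
```

Returning `None` for unrecognized replies makes it possible to count how often the model ignores the requested format, instead of silently folding those cases into the negative class.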

## Predictions

In [6]:
import pandas as pd

# Change to the path where the dataset is stored
DATASET_PATH = "C:\\Users\\Adrian\\Documents\\datasets\\citing_test.parquet"

# Load the dataset into a pandas DataFrame
df = pd.read_parquet(DATASET_PATH)

# get first 500 rows
df = df.head(500)

df.describe()

| | sentence | citing |
|---|---|---|
| count | 500 | 500 |
| unique | 500 | 2 |
| top | Under these assumptions, we have the following... | False |
| freq | 1 | 470 |
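
The `df.describe()` output shows that 470 of the 500 sentences are labeled `False`, so the sample is heavily imbalanced. The short sketch below works out what that implies for a trivial classifier that always answers "no"; the counts are copied from the output above, and the variable names are only illustrative.

```python
# Class counts taken from the df.describe() output:
# 500 sentences in total, 470 of them labeled False (non-citing).
total = 500
non_citing = 470
citing = total - non_citing

# Accuracy of a trivial classifier that always predicts "non-citing".
majority_baseline_accuracy = non_citing / total
positive_rate = citing / total

print(f"citing sentences: {citing} ({positive_rate:.0%})")
print(f"always-'no' baseline accuracy: {majority_baseline_accuracy:.2f}")
```

Because this baseline already reaches 0.94 accuracy without detecting a single citing sentence, the precision, recall and F1 of the `True` class are more informative than overall accuracy when comparing prompts.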


In [8]:
# evaluate each prompt
sys_prompt_predictions = []

for prompt in prompts:
    # generate predictions for every sentence (sequentially, one API call per row)
    predictions = df["sentence"].apply(lambda x: generate_prediction(prompt, x))
    sys_prompt_predictions.append(predictions)
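
`Series.apply` issues one blocking request per sentence, so the loop above runs sequentially. If wall-clock time matters, the calls could be overlapped with a thread pool. This is a rough sketch rather than what the notebook ran: `predict_concurrently` is a hypothetical helper, `max_workers=8` is an arbitrary choice, and API quota limits may require a lower value.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_concurrently(predict, sys_prompt, sentences, max_workers=8):
    """Call predict(sys_prompt, sentence) for every sentence using a thread pool.

    Results are returned in the same order as the input sentences.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even when calls finish out of order
        return list(pool.map(lambda s: predict(sys_prompt, s), sentences))
```

Inside the loop, this could replace the `apply` call with something like `pd.Series(predict_concurrently(generate_prediction, prompt, df["sentence"]), index=df.index)`, which keeps the predictions aligned with the original rows.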

## Results

Across the evaluated prompts, the clear winner is the last one (`Does the given sentence reference a different scientific paper?`). Unfortunately, going strictly by the metrics, it is still worse than the fine-tuned SciBERT model, so it is hard to justify, especially given the slower inference.

In [9]:
from sklearn.metrics import classification_report

for prompt, predictions in zip(prompts, sys_prompt_predictions):
    print(prompt)

    report = classification_report(df["citing"], predictions)

    print(report)
    print()

Is the given sentence from a scientific paper missing a citation marker?
- yes
- no

Please only print the answer without anything else.
              precision    recall  f1-score   support

       False       0.98      0.25      0.39       470
        True       0.07      0.93      0.14        30

    accuracy                           0.29       500
   macro avg       0.53      0.59      0.27       500
weighted avg       0.93      0.29      0.38       500


Does the following sentence require a citation marker?
- yes
- no

Please only print the answer without anything else.
              precision    recall  f1-score   support

       False       0.96      0.56      0.71       470
        True       0.08      0.63      0.15        30

    accuracy                           0.57       500
   macro avg       0.52      0.60      0.43       500
weighted avg       0.91      0.57      0.68       500


Should I add a citation marker to this sentence?
- yes
- no

Please only print the answe