# Imports

In [1]:
import os
import pandas as pd
import time

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

import random
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

from tqdm import tqdm

import ollama

In [2]:
# Force CUDA usage
os.environ["OLLAMA_BACKEND"] = "cuda"
os.environ["OLLAMA_NUM_THREADS"] = "16"

### Parameters

In [3]:
# List of models
ml_models = [
    ('Logistic Regression', LogisticRegression(random_state=42, max_iter=1000)), # 2 sec
    ('Random Forest', RandomForestClassifier(random_state=42)), # 2 min
    ('SVM', SVC(random_state=42)), # 30 min
    ('KNN', KNeighborsClassifier()), # 30 sec
    ('Gradient Boosting', GradientBoostingClassifier(random_state=42)) # 30 sec
]

In [4]:
# LLMs
models = ['llama3.2:1b', 'llama3.2:3b', 'llama3.1']

# Temperatures for LLMs
temperatures = [5, 10, 15]

In [5]:
# Number of rows to synthesise (n_rows positive, n_rows negative)
n_rows = 500

In [6]:
# Vectorize (Bag of words representation)
vectorizer = CountVectorizer(stop_words='english', max_features=10000, ngram_range=(1, 2))

# Load data

In [7]:
# Load the CSV file into a pandas DataFrame
df = pd.read_csv('IMB_preprocessed_2025_04_06.csv')

# We want to be able to read the full reviews
pd.set_option('display.max_colwidth', None) 

# Display the first 5 records
df.head()

Unnamed: 0,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.\nThe first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.\nIt is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.\nI would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",1
1,"A wonderful little production. \nThe filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. \nThe actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life. \nThe realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",1
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.\nThis was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.\nThis may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",1
3,"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.\nThis movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.\nOK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.\n3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",0
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter. \nThis being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.\nThe only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.\nThe acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.\nWe wish Mr. Mattei good luck and await anxiously for his next work.",1


### Meassure similarity of reviews
We will also do this later for the synthetic data

We check the similarity for atleast 50% of the data so this is an approximative method

In [8]:
def calculate_average_similarity(df, sample_size=25000):
    # Only sample if needed
    if sample_size > len(df):
        sample_size = len(df)

    # Set seed
    random.seed(42)
    
    # Sample reviews
    sample = random.sample(list(df['review']), sample_size)
    
    # Vectorize and calculate similarities
    vectorizer = TfidfVectorizer()
    tfidf_sample = vectorizer.fit_transform(sample)
    sample_sim = cosine_similarity(tfidf_sample)
    
    # Calculate average similarity (excluding self-similarities on diagonal)
    average_sim = (sample_sim.sum() - sample_size) / (sample_size * (sample_size - 1))
    print(f"Average cosine similarity: {average_sim:.4f}")

In [9]:
calculate_average_similarity(df)

Average cosine similarity: 0.1055


# Experimential setup

In [10]:
def experimental_setup(df):
    # Split into X and y
    X = df['review']
    y = df['sentiment']
    
    # First split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Transform text reviews to Bag of Words representation
    X_train = vectorizer.fit_transform(X_train)
    X_test = vectorizer.transform(X_test)
    
    # Print the shapes of the sets
    print(f"Training Set: X_train shape = {X_train.shape}, y_train shape = {y_train.shape}")
    print(f"Test Set: X_test shape = {X_test.shape}, y_test shape = {y_test.shape}")
    print()

    return X_train, y_train, X_test, y_test

# Modeling and Performance metrics

In [11]:
# Function to train and evaluate models with multiple metrics and measure time
def evaluate_models_with_metrics(models, X_train, y_train, X_test, y_test):
    results = []
    
    for name, model in models:
        # Train the model
        model.fit(X_train, y_train)
        
        # Predict on test set
        y_test_pred = model.predict(X_test)

        # Store results for the model
        model_results = {
            'Model': name,
            'Test Accuracy': accuracy_score(y_test, y_test_pred),
            'Test Precision': precision_score(y_test, y_test_pred),
            'Test Recall': recall_score(y_test, y_test_pred),
            'Test F1-Score': f1_score(y_test, y_test_pred)
        }
        
        results.append(model_results)

    # Convert results to a pandas DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df

In [12]:
# Create the split
X_train_base, y_train_base, X_test_base, y_test_base = experimental_setup(df)

# Evaluate models with multiple metrics and print results
base_results = evaluate_models_with_metrics(ml_models, X_train_base, y_train_base, X_test_base, y_test_base)
display(base_results)

Training Set: X_train shape = (40000, 10000), y_train shape = (40000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)



Unnamed: 0,Model,Test Accuracy,Test Precision,Test Recall,Test F1-Score
0,Logistic Regression,0.8692,0.868021,0.873189,0.870598
1,Random Forest,0.8542,0.861937,0.8462,0.853996
2,SVM,0.8816,0.861836,0.911093,0.88578
3,KNN,0.6,0.609947,0.571939,0.590332
4,Gradient Boosting,0.8089,0.778987,0.86664,0.820479


# Data Synthesis

### Baseline setup

In [13]:
# Simulate a low data availability environment
limited_data_df = df[0:199]

# Split into X and y
X_train_limited_data = limited_data_df['review']
y_train_limited_data = limited_data_df['sentiment']

# Use the same test set as used in base experiment
_, X_test_limited_data, _, y_test_limited_data = train_test_split(df['review'], df['sentiment'], test_size=0.2, random_state=42)

# Update the vocabulary
# Convert text reviews to Bag of Words representation
X_train_limited_data = vectorizer.fit_transform(X_train_limited_data)
X_test_limited_data = vectorizer.transform(X_test_limited_data)

# Print the shapes of the sets
print(f"Training Set: X_train shape = {X_train_limited_data.shape}, y_train shape = {y_train_limited_data.shape}")
print(f"Test Set: X_test shape = {X_test_limited_data.shape}, y_test shape = {y_test_limited_data.shape}")
    
# Train on limited data, Test on original data
limited_data_results = evaluate_models_with_metrics(ml_models, X_train_limited_data, y_train_limited_data, X_test_limited_data, y_test_limited_data)

Training Set: X_train shape = (199, 10000), y_train shape = (199,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)


In [14]:
display(limited_data_results)

Unnamed: 0,Model,Test Accuracy,Test Precision,Test Recall,Test F1-Score
0,Logistic Regression,0.7146,0.797441,0.581266,0.672406
1,Random Forest,0.5596,0.807357,0.165509,0.274704
2,SVM,0.5102,0.676692,0.053582,0.099301
3,KNN,0.5603,0.536043,0.94741,0.68469
4,Gradient Boosting,0.6398,0.728895,0.454058,0.55955


In [15]:
display(base_results)

Unnamed: 0,Model,Test Accuracy,Test Precision,Test Recall,Test F1-Score
0,Logistic Regression,0.8692,0.868021,0.873189,0.870598
1,Random Forest,0.8542,0.861937,0.8462,0.853996
2,SVM,0.8816,0.861836,0.911093,0.88578
3,KNN,0.6,0.609947,0.571939,0.590332
4,Gradient Boosting,0.8089,0.778987,0.86664,0.820479


# Let's generate some data

In [16]:
# Prompts
prompt_positive_sample = 'Generate a 200 to 300 word negative movie review, do not return any other text, just return the review.'
prompt_negative_sample = 'Generate a 200 to 300 word positive movie review, do not return any other text, just return the review.'

### Non-Qualitative loop

In [36]:
# Initialize a dictionary to store results per model and temperature
model_temperatures_list = {}

# Iterate through models
for index, model in enumerate(models):
    print("Processing Model: " + model + " (Model " + str(index + 1) + "/" + str(len(models)) + ")")
    model_temperatures_list[model] = {}

    # Loop through the different temperatures
    for temperature in temperatures:
        print(f"  Using Temperature: {temperature}")
        model_temperatures_list[model][temperature] = []

        # Generate n_rows positive samples
        for i in tqdm(range(n_rows), desc="Generating positive samples"):
            # Generate data with the specified temperature
            response = ollama.generate(model=model, prompt=prompt_positive_sample, options={"temperature": temperature})['response']

            # Store the response and positive class as sentiment (1) 
            model_temperatures_list[model][temperature].append((response, 1))
            
        # Generate n_rows negative samples
        for i in tqdm(range(n_rows), desc="Generating negative samples"):
            # Generate data with the specified temperature
            response = ollama.generate(model=model, prompt=prompt_negative_sample, options={"temperature": temperature})['response']

            # Store the response and negative class as sentiment (0) 
            model_temperatures_list[model][temperature].append((response, 0))

Processing Model: llama3.2:1b (Model 1/2)
  Using Temperature: 5


Generating positive samples: 100%|███████████████████████████████████████████████████| 500/500 [45:38<00:00,  5.48s/it]
Generating negative samples: 100%|███████████████████████████████████████████████████| 500/500 [57:52<00:00,  6.94s/it]


  Using Temperature: 10


Generating positive samples: 100%|███████████████████████████████████████████████████| 500/500 [44:46<00:00,  5.37s/it]
Generating negative samples: 100%|███████████████████████████████████████████████████| 500/500 [55:33<00:00,  6.67s/it]


  Using Temperature: 15


Generating positive samples: 100%|███████████████████████████████████████████████████| 500/500 [39:55<00:00,  4.79s/it]
Generating negative samples: 100%|███████████████████████████████████████████████████| 500/500 [43:20<00:00,  5.20s/it]


Processing Model: llama3.2:3b (Model 2/2)
  Using Temperature: 5


Generating positive samples: 100%|█████████████████████████████████████████████████| 500/500 [1:20:27<00:00,  9.66s/it]
Generating negative samples: 100%|█████████████████████████████████████████████████| 500/500 [1:18:31<00:00,  9.42s/it]


  Using Temperature: 10


Generating positive samples: 100%|█████████████████████████████████████████████████| 500/500 [1:22:15<00:00,  9.87s/it]
Generating negative samples: 100%|█████████████████████████████████████████████████| 500/500 [1:17:49<00:00,  9.34s/it]


  Using Temperature: 15


Generating positive samples: 100%|█████████████████████████████████████████████████| 500/500 [1:19:20<00:00,  9.52s/it]
Generating negative samples: 100%|█████████████████████████████████████████████████| 500/500 [1:16:33<00:00,  9.19s/it]


# Comparison synthetic data vs real data

In [38]:
def compare(base_df, limited_df, synth_df):
    # Set 'Model' as the index in DataFrames
    base_results_tmp = base_df.set_index('Model')
    limited_data_tmp = limited_df.set_index('Model')
    synth_data_results_tmp = synth_df.set_index('Model')
    
    # Rename evaluation metrics columns
    base_results_tmp = base_results_tmp.rename(columns={
        'Test Accuracy': 'Test Accuracy base', 'Test F1-Score': 'Test F1-Score base',
        'Test Precision': 'Test Precision base', 'Test Recall': 'Test Recall base'
    })

    limited_data_tmp = limited_data_tmp.rename(columns={
        'Test Accuracy': 'Test Accuracy limited base', 'Test F1-Score': 'Test F1-Score limited base',
        'Test Precision': 'Test Precision limited base', 'Test Recall': 'Test Recall limited base'
    })
    
    synth_data_results_tmp = synth_data_results_tmp.rename(columns={
        'Test Accuracy': 'Test Accuracy synthetic', 'Test F1-Score': 'Test F1-Score synthetic',
        'Test Precision': 'Test Precision synthetic', 'Test Recall': 'Test Recall synthetic'
    })
    
    # Merge dfs by index
    final_results = base_results_tmp.join(limited_data_tmp)
    final_results = final_results.join(synth_data_results_tmp)
    
    # Show results
    display(final_results)
    print()

In [39]:
# For each model
for model in models:
    # For each temperature
    for temp in temperatures:
        # Get data and make a DataFrame
        synth_df = pd.DataFrame(model_temperatures_list[model][temp], columns = ['review', 'sentiment'])

        # Split into X and y
        X_train_synth = synth_df['review']
        y_train_synth = synth_df['sentiment']

        # Same test set as base data
        # First split data into training and test set
        _, X_test_synth, _, y_test_synth = train_test_split(df['review'], df['sentiment'], test_size=0.2, random_state=42)

        # Update vocabulary
        # Convert text reviews to Bag of Words representation
        X_train_synth = vectorizer.fit_transform(X_train_synth)
        X_test_synth = vectorizer.transform(X_test_synth)

        # Print the shapes of the sets
        print("Model: " + model + ", Temperature = " + str(temp))
        print(f"Training Set: X_train shape = {X_train_synth.shape}, y_train shape = {y_train_synth.shape}")
        print(f"Test Set: X_test shape = {X_test_synth.shape}, y_test shape = {y_test_synth.shape}")
        print()
        
        # Similarity metric for reviews
        calculate_average_similarity(synth_df)
        
        # Train on synthetic data, Test on original (full) data
        synth_data_results = evaluate_models_with_metrics(ml_models, X_train_synth, y_train_synth, X_test_synth, y_test_synth)
        
        # Compare and display results
        compare(base_results, limited_data_results, synth_data_results)

Model: llama3.2:1b, Temperature = 5
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1296


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.4837,0.493621,0.952173,0.65018
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.4929,0.498381,0.977376,0.660143
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.4754,0.488354,0.861282,0.623295
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.5037,0.503801,0.999603,0.669947
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4946,0.499241,0.979559,0.661396



Model: llama3.2:1b, Temperature = 10
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1177


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.4877,0.495713,0.963683,0.654668
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.4885,0.496132,0.967255,0.655857
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.4757,0.488529,0.862076,0.623645
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.5039,0.503901,0.999802,0.67008
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.475,0.488966,0.927763,0.640411



Model: llama3.2:1b, Temperature = 15
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1155


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.4903,0.497055,0.971423,0.657621
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.4897,0.496749,0.970232,0.657079
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.4769,0.489369,0.87696,0.62819
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.5038,0.50385,0.999802,0.670036
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4875,0.495605,0.962294,0.654254



Model: llama3.2:3b, Temperature = 5
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1388


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.3412,0.39301,0.564596,0.463431
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.3525,0.253603,0.146656,0.185842
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.3283,0.377375,0.512403,0.434644
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.4492,0.302443,0.071244,0.115323
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4208,0.308202,0.120064,0.172808



Model: llama3.2:3b, Temperature = 10
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1330


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.3535,0.405763,0.609248,0.487108
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.3566,0.239447,0.127208,0.166148
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.3618,0.41074,0.613217,0.49196
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.4601,0.479757,0.846597,0.612447
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4202,0.288344,0.1026,0.151347



Model: llama3.2:3b, Temperature = 15
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1288


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.3279,0.38037,0.530661,0.443119
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.3604,0.254789,0.139909,0.18063
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.3419,0.391102,0.549514,0.456968
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.4597,0.4759,0.713237,0.570884
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4111,0.30629,0.13336,0.185815



Model: llama3.1, Temperature = 5
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1344


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.2816,0.299195,0.317126,0.3079
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.3839,0.201596,0.075213,0.109553
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.2935,0.303606,0.310776,0.307149
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.457,0.2918,0.054376,0.091669
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4556,0.227456,0.033538,0.058457



Model: llama3.1, Temperature = 10
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1271


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.2943,0.335185,0.407224,0.367709
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.4053,0.161194,0.042866,0.067722
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.312,0.348328,0.419528,0.380627
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.4522,0.3251,0.080968,0.129647
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4559,0.197289,0.025997,0.045941



Model: llama3.1, Temperature = 15
Training Set: X_train shape = (1000, 10000), y_train shape = (1000,)
Test Set: X_test shape = (10000, 10000), y_test shape = (10000,)

Average cosine similarity: 0.1241


Unnamed: 0_level_0,Test Accuracy base,Test Precision base,Test Recall base,Test F1-Score base,Test Accuracy limited base,Test Precision limited base,Test Recall limited base,Test F1-Score limited base,Test Accuracy synthetic,Test Precision synthetic,Test Recall synthetic,Test F1-Score synthetic
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Logistic Regression,0.8692,0.868021,0.873189,0.870598,0.7146,0.797441,0.581266,0.672406,0.2796,0.303218,0.331018,0.316509
Random Forest,0.8542,0.861937,0.8462,0.853996,0.5596,0.807357,0.165509,0.274704,0.39,0.145625,0.043263,0.066707
SVM,0.8816,0.861836,0.911093,0.88578,0.5102,0.676692,0.053582,0.099301,0.3017,0.23515,0.171264,0.198186
KNN,0.6,0.609947,0.571939,0.590332,0.5603,0.536043,0.94741,0.68469,0.4449,0.46762,0.733677,0.571186
Gradient Boosting,0.8089,0.778987,0.86664,0.820479,0.6398,0.728895,0.454058,0.55955,0.4554,0.215385,0.030562,0.053528



