# Final Project

This final project can be collaborative. A group may have at most 3 members; you can also work by yourself. Please respect academic integrity. **Remember: if you are caught cheating, you will receive an F.**

## An Introduction to the competition

<img src="news-sexisme-EN.jpg" alt="drawing" width="380"/>

Sexism is a growing problem online. It can inflict harm on women who are targeted, make online spaces inaccessible and unwelcoming, and perpetuate social asymmetries and injustices. Automated tools are now widely deployed to find and assess sexist content at scale, but most only give classifications for generic, high-level categories, with no further explanation. Flagging what is sexist content and also explaining why it is sexist improves interpretability, trust and understanding of the decisions that automated tools make, empowering both users and moderators.

This project is based on SemEval 2023 - Task 10 - Explainable Detection of Online Sexism (EDOS). [Here](https://codalab.lisn.upsaclay.fr/competitions/7124#learn_the_details-overview) you can find a detailed introduction to this task.

You only need to complete **TASK A - Binary Sexism Detection: a two-class (or binary) classification where systems have to predict whether a post is sexist or not sexist**. To cut down training time, we only use a subset of the original dataset (5k out of 20k). The dataset can be found in the same folder. 

Unlike our previous homework, this competition gives you great flexibility (and very few hints). You can freely determine every component of your workflow, including but not limited to:
-  **Preprocessing the input text**: You may decide how to clean or transform the text. For example, removing emojis or URLs, lowercasing, removing stopwords, applying stemming or lemmatization, correcting spelling, or performing tokenization and sentence segmentation.
-  **Feature extraction and encoding**: You can choose any method to convert text into numerical representations, such as TF-IDF, Bag-of-Words, N-grams, Word2Vec, GloVe, FastText, contextual embeddings (e.g., BERT, RoBERTa, or other transformer-based models), Part-of-Speech (POS) tagging, dependency-based features, sentiment or emotion features, readability metrics, or even embeddings or features generated by large language models (LLMs).
-  **Data augmentation and enrichment**: You may expand or balance your dataset by incorporating other related corpora or using techniques like synonym replacement, random deletion/insertion, or LLM-assisted augmentation (e.g., generating paraphrased or synthetic examples to improve model robustness).
-  **Model selection**: You are free to experiment with different models, from traditional machine learning algorithms (e.g., Logistic Regression, SVM, Random Forest, XGBoost) to deep learning architectures (e.g., CNNs, RNNs, Transformers), or even hybrid/ensemble approaches that combine multiple models or leverage LLM-generated predictions or reasoning.

## Requirements
-  **Input**: the text for each instance.
-  **Output**: the binary label for each instance.
-  **Feature engineering**: use at least 2 different methods to extract features and encode text into numerical values. You may explore both traditional and AI-assisted techniques. Data augmentation is optional.
-  **Model selection**: implement at least 3 different models and compare their performance.
-  **Evaluation**: create a dataframe whose rows are feature+model combinations and whose columns are Precision (P), Recall (R) and F1-score (using weighted average). Your results should have at least 6 rows (2 feature engineering methods x 3 models). Report the best performance together with (1) the feature engineering method and (2) the model that produced it. Here is an example illustrating how the experimental results table should be presented.

| Feature + Model | Sexist (P) | Sexist (R) | Sexist (F1) | Non-Sexist (P) | Non-Sexist (R) | Non-Sexist (F1) | Weighted (P) | Weighted (R) | Weighted (F1) |
|-----------------|:----------:|:----------:|:------------:|:---------------:|:---------------:|:----------------:|:-------------:|:--------------:|:---------------:|
| TF-IDF + Logistic Regression | ... | ... | ... | ... | ... | ... | ... | ... | ... |
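
One possible way to fill a row of this table is to flatten the dictionary returned by `classification_report(..., output_dict=True)`. The sketch below is only an illustration on toy labels: the helper name `report_to_row` is hypothetical, and the label strings `"sexist"` and `"not sexist"` are assumptions that should be changed to match the labels in your data.

```python
from sklearn.metrics import classification_report

def report_to_row(name, y_true, y_pred):
    """Flatten a classification report into one table row (hypothetical helper)."""
    rep = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    row = {"Feature + Model": name}
    # Map report keys (assumed label strings) to the column prefixes of the table
    for key, col in [("sexist", "Sexist"),
                     ("not sexist", "Non-Sexist"),
                     ("weighted avg", "Weighted")]:
        row[f"{col} (P)"] = rep[key]["precision"]
        row[f"{col} (R)"] = rep[key]["recall"]
        row[f"{col} (F1)"] = rep[key]["f1-score"]
    return row

# Toy labels, not taken from the dataset
y_true = ["sexist", "not sexist", "not sexist", "sexist", "not sexist"]
y_pred = ["sexist", "not sexist", "sexist", "not sexist", "not sexist"]

row = report_to_row("Toy features + toy model", y_true, y_pred)
print(row)
```

Collecting one such dictionary per experiment in a list and passing the list to `pandas.DataFrame` gives a table with the required columns.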

- **Format of the report**: add explanations for each step (you can add markdown cells). At the end of the report, write a summary for each of the following sections:
    - Data Preprocessing
    - Feature Engineering
    - Model Selection and Architecture
    - Training and Validation
    - Evaluation and Results
    - Use of Generative AI (if used)

## Rules 
Violations will result in a grade of 0 points:
-   `Rule 1 - No test set leakage`: You must not use any instance from the test set during training, feature engineering, or model selection.
-   `Rule 2 - Responsible AI use`: You may use generative AI, but you must clearly document how it was used. If you have used genAI, include a section titled “Use of Generative AI” describing:
    -   What parts of the project you used AI for
    -   What was implemented manually vs. with AI assistance
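
For Rule 1, one common way to reduce the risk of leakage is to wrap the vectorizer and the classifier in a single scikit-learn `Pipeline`, so that vocabulary and IDF statistics are learned from training texts only. The following is a minimal sketch under that assumption; the toy texts and labels are placeholders, not data from the assignment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder data (hypothetical); real posts and labels come from the dataset
train_texts = ["alpha beta", "gamma delta", "alpha gamma", "delta beta"]
train_labels = ["sexist", "not sexist", "sexist", "not sexist"]
test_texts = ["zebra alpha"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),                 # fitted on training texts only
    ("clf", LogisticRegression(max_iter=1000)),
])

# fit() sees only the training split; the test texts are never used here
pipe.fit(train_texts, train_labels)

# predict() only transforms the test texts with the already-fitted vocabulary
pred = pipe.predict(test_texts)

vocab = pipe.named_steps["tfidf"].vocabulary_
print(sorted(vocab))   # "zebra" is absent because it only occurs in the test text
```

The same idea applies to any step that learns from data (scaling, feature selection, resampling): fit it on the training split, then only apply it to the test split.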

## Grading

Performance should be evaluated only on the test set (a total of 1086 instances). Please split the original dataset into a train set and a test set. The test set should NEVER be used in the training process. The evaluation metric is a combination of precision, recall, and F1-score (use `classification_report` in sklearn).

The total is 10.0 points. Each team will compete with the other teams in the class on their best performance. Points will be deducted for not following the requirements above.

If ALL the requirements are met:
- Top 25\% teams: 10.0 points.
- Top 25\% - 50\% teams: 8.5 points.
- Top 50\% - 75\% teams: 7.0 points.
- Top 75\% - 100\% teams: 6.0 points.

If your best performance reaches **0.82** or above (weighted F1-score) and follows all the requirements and rules, you will also get full points (10.0 points). 

## Submission
As with the homework, submit both a PDF and an .ipynb version of the report, including:
- code and experimental results with details explained
- combined results table, report and best performance
- a summary at the end of the report (please follow the format above)

Missing any part of the above requirements will result in point deductions.

The due date is **Dec 11, Thursday by 11:59pm**.

## Experimental Results

(A table detailing model performance on the test set with at least 6 rows. Report the best performance.)


## Project Summary
### 1. Data Preprocessing


### 2. Feature Engineering
 

### 3. Model Selection and Architecture


### 4. Training and Validation


### 5. Evaluation and Results


### 6. Use of Generative AI (if used)

Data preprocessing

In [1]:
# data preprocessing

import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Load dataset
df = pd.read_csv('edos_labelled_data.csv')

def advanced_clean_text(text):
    text = text.lower()
    text = re.sub(r"\[user\]|\[url\]", "", text)
    text = re.sub(r"http\S+|www\S+", "", text)
    text = re.sub(r"#\w+", "", text)
    
    # Keep some punctuation that might be meaningful for sexism detection
    text = re.sub(r"[^a-z\s!?.]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    
    # Optional: lemmatization (can help or hurt, test it)
    # lemmatizer = WordNetLemmatizer()
    # text = " ".join([lemmatizer.lemmatize(word) for word in text.split()])
    
    return text

# Clean every post, then use the dataset's predefined train/test split column
df["clean_text"] = df["text"].apply(advanced_clean_text)
train_df = df[df["split"] == "train"]
test_df = df[df["split"] == "test"]

y_train = train_df["label"]
y_test = test_df["label"]
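
To make the effect of the cleaning steps concrete, the sketch below repeats `advanced_clean_text` from the cell above (without the commented-out lemmatization) and applies it to one made-up post. The sample string is hypothetical and only serves to show which characters survive.

```python
import re

def advanced_clean_text(text):
    text = text.lower()
    text = re.sub(r"\[user\]|\[url\]", "", text)   # drop anonymization placeholders
    text = re.sub(r"http\S+|www\S+", "", text)     # drop raw URLs
    text = re.sub(r"#\w+", "", text)               # drop hashtags entirely
    text = re.sub(r"[^a-z\s!?.]", "", text)        # keep letters, whitespace and ! ? .
    text = re.sub(r"\s+", " ", text).strip()       # collapse repeated whitespace
    return text

sample = "[USER] Check this out!! http://t.co/abc #Hashtag WOW, 100% true?"
cleaned = advanced_clean_text(sample)
print(cleaned)  # check this out!! wow true?
```

Note that digits, commas and the whole hashtag (not just the `#` sign) are removed, which may or may not be desirable for this task.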


Feature Engineering

In [2]:
# Method 1: word-level TF-IDF (unigrams to trigrams)
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import hstack
word_vectorizer = TfidfVectorizer(
    analyzer='word',
    ngram_range=(1,3),
    max_features=20000,
    min_df=2,
    max_df=0.85,
    sublinear_tf=True
)
# Method 2: character-level TF-IDF (3- to 5-character n-grams)
char_vectorizer = TfidfVectorizer(
    analyzer='char',
    ngram_range=(3,5),
    max_features=15000,
    min_df=3,
    max_df=0.90,
    sublinear_tf=True
)

# Fit both
X_train_word = word_vectorizer.fit_transform(train_df["clean_text"])
X_test_word = word_vectorizer.transform(test_df["clean_text"])

X_train_char = char_vectorizer.fit_transform(train_df["clean_text"])
X_test_char = char_vectorizer.transform(test_df["clean_text"])

# Combine word and char features column-wise into one sparse matrix
X_train_combined = hstack([X_train_word, X_train_char])
X_test_combined = hstack([X_test_word, X_test_char])

print(f"Word features: {X_train_word.shape[1]}")
print(f"Char features: {X_train_char.shape[1]}")
print(f"Combined features: {X_train_combined.shape[1]}")


Word features: 16211
Char features: 15000
Combined features: 31211
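
The difference between the two analyzers, and why the combined width is simply the sum of the two (16211 + 15000 = 31211 above), can be seen on a toy corpus. This is a small sketch with made-up documents and default TF-IDF settings, not the configuration used in the cell above.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["hello world", "hello there", "world peace"]  # toy corpus (hypothetical)

word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))

# What each analyzer extracts from a single string
word_ngrams = word_vec.build_analyzer()("hello world")
char_ngrams = char_vec.build_analyzer()("hello")
print(word_ngrams)
print(char_ngrams)

# Fit each vectorizer separately, then stack the matrices side by side
X_word = word_vec.fit_transform(docs)
X_char = char_vec.fit_transform(docs)
X_combined = hstack([X_word, X_char]).tocsr()

print(X_word.shape[1], X_char.shape[1], X_combined.shape[1])
```

Character n-grams can still match misspelled or obfuscated words that a word-level vocabulary would miss, which is one plausible reason the combined features help here.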


In [3]:

# Note: generative AI assisted with this part, so it has to be cited in the "Use of Generative AI" section.
# SMOTE oversamples the minority class: for each minority sample it finds the k nearest
# minority-class neighbors and interpolates new synthetic points between the sample and a neighbor.
# It operates in the numeric feature space, so it assumes that interpolated vectors are meaningful examples.
# Balancing the classes reduces bias towards the majority class and can improve recall and F1
# for the minority class.
# Using this helped me get to 0.81 weighted F1 score.

from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42, k_neighbors=5)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train_combined, y_train)

print(f"\nOriginal class distribution:")
print(y_train.value_counts())
print(f"\nBalanced class distribution:")
print(pd.Series(y_train_balanced).value_counts())


Original class distribution:
label
not sexist    2934
sexist        1259
Name: count, dtype: int64

Balanced class distribution:
label
not sexist    2934
sexist        2934
Name: count, dtype: int64
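
The interpolation step described in the comments can be sketched in a few lines of NumPy. This is only an illustration of the idea for a single synthetic point, not the `imblearn` implementation; the two vectors are made up, and the neighbor search is skipped.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, 0.0, 2.0])          # a minority-class sample (toy vector)
neighbor = np.array([3.0, 4.0, 2.0])   # one of its nearest minority-class neighbors

lam = rng.uniform(0.0, 1.0)            # random interpolation weight in [0, 1)
synthetic = x + lam * (neighbor - x)   # point on the segment between x and neighbor
print(synthetic)

# Number of synthetic samples needed to balance the classes shown above
n_majority, n_minority = 2934, 1259
n_synthetic = n_majority - n_minority
print(n_synthetic)  # 1675
```

Because the new point is a blend of two TF-IDF vectors, it does not correspond to any real post, which is the assumption the comment in the cell refers to.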


In [4]:
'''
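#THIS CODE ONLY NEEDED TO BE RUN ONCE TO FIND THE OPTIMAL PARAMETERS FOR XGBoost; IT TOOK AROUND 6 HOURS
#It depends on `le`, `y_train_balanced_encoded` and `classification_report` from the model cell below
#Before we turn it in I can run it overnight to make the output look nice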


#GridSearch
from sklearn.model_selection import GridSearchCV
import xgboost as xgb
from sklearn.metrics import f1_score, make_scorer

# define the parameters to search
param_grid = {
    'n_estimators': [200, 300, 400],  
    'max_depth': [6, 7, 8],          
    'learning_rate': [0.1, 0.05],     
    'subsample': [0.8, 0.9],          
    'colsample_bytree': [0.8, 1.0]   
}

#inits XGBoost model
xgb_base = xgb.XGBClassifier(
    random_state=42,
    eval_metric='logloss',
    use_label_encoder=False
)

#makes weighted f1 the scoring metric
f1_scorer = make_scorer(f1_score, average='weighted')

#init GridSearchCV
grid_search = GridSearchCV(
    estimator=xgb_base,
    param_grid=param_grid,
    scoring=f1_scorer,
    cv=3,
    verbose=1,
    n_jobs=-1
)

print("starting gridsearch (this takes a very long time)")

grid_search.fit(X_train_balanced, y_train_balanced_encoded)

print(f"Best CV Weighted F1 Score: {grid_search.best_score_:.2f}")
print(f"Best Parameters Found: {grid_search.best_params_}")


#get the best estimator
best_xgb = grid_search.best_estimator_

y_pred_xgb_best_encoded = best_xgb.predict(X_test_combined)
y_pred_xgb_best = le.inverse_transform(y_pred_xgb_best_encoded)

# bestXGB f1score
final_f1 = f1_score(y_test, y_pred_xgb_best, average='weighted')

print(classification_report(y_test, y_pred_xgb_best))
'''

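
The long runtime of the grid search follows from the size of the grid. As a rough check, the sketch below counts the candidate settings in `param_grid` (copied from the cell above) and multiplies by the number of folds; it does not run any training.

```python
from math import prod

# Same grid as in the commented-out cell above
param_grid = {
    "n_estimators": [200, 300, 400],
    "max_depth": [6, 7, 8],
    "learning_rate": [0.1, 0.05],
    "subsample": [0.8, 0.9],
    "colsample_bytree": [0.8, 1.0],
}
cv_folds = 3

# An exhaustive grid search tries every combination of values
n_candidates = prod(len(values) for values in param_grid.values())
total_fits = n_candidates * cv_folds
print(n_candidates, total_fits)  # 72 216
```

With the default `refit=True`, `GridSearchCV` adds one final fit of the best setting on the full training data, so 217 XGBoost models are trained in total on roughly 31k sparse features.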

Model use

In [None]:
from sklearn.metrics import classification_report, f1_score
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
import xgboost as xgb

le = LabelEncoder()
le.fit(np.unique(y_train)) 
y_train_balanced_encoded = le.transform(y_train_balanced)
y_test_encoded = le.transform(y_test)

results_list = []

#Best Tracker
best_result = [0.0, "None", None, None] 

# Define feature sets
feature_sets = [
    ("Word TF-IDF", X_train_word, X_test_word, y_train, y_test),
    ("Char TF-IDF", X_train_char, X_test_char, y_train, y_test),
    ("Combined (Word+Char)", X_train_combined, X_test_combined, y_train, y_test),
    ("Combined + SMOTE", X_train_balanced, X_test_combined, y_train_balanced, y_test),
]

# Define base models
models = [
    ("Logistic Regression", LogisticRegression(max_iter=5000, C=1.0, random_state=42, solver='saga')), 
    ("SVM", LinearSVC(C=0.5, max_iter=2000, random_state=42)),
]

print("running model combinations")

# LR and SVM Loop
for feat_name, X_tr, X_te, y_tr, y_te in feature_sets:
    for model_name, model in models:
        full_name = f"{feat_name} + {model_name}"

        print(f"Started on: {full_name}")
        
        y_true_final = y_test # The true labels are always y_test
        try:
            model.fit(X_tr, y_tr)
            y_pred = model.predict(X_te)
            
            f1_weighted = f1_score(y_true_final, y_pred, average='weighted')

            #Print the current report
            report = classification_report(y_true_final, y_pred)
            
            
            #This is for the dataframe
            reportDict = classification_report(y_true_final, y_pred, output_dict=True)
            results_list.append({
                "Feature + Model": f"{feat_name} + {model_name}",
                "Sexist (P)": f"{reportDict['sexist']['precision']:.2f}",
                "Sexist (R)": f"{reportDict['sexist']['recall']:.2f}",
                "Sexist (F1)": f"{reportDict['sexist']['f1-score']:.2f}",
                "Not Sexist (P)": f"{reportDict['not sexist']['precision']:.2f}",
                "Not Sexist (R)": f"{reportDict['not sexist']['recall']:.2f}",
                "Not Sexist (F1)": f"{reportDict['not sexist']['f1-score']:.2f}",
                "Weighted (P)": f"{reportDict['weighted avg']['precision']:.2f}",
                "Weighted (R)": f"{reportDict['weighted avg']['recall']:.2f}",
                "Weighted (F1)": f"{reportDict['weighted avg']['f1-score']:.2f}"
            })
            
            print(f"Finished with: {full_name}")

            #Not sure about format so I am not printing this
            #print(report)
            
            # Update best F1 score tracker
            if f1_weighted > best_result[0]:
                best_result[0] = f1_weighted
                best_result[1] = full_name
                best_result[2] = y_pred 
                best_result[3] = y_true_final 
            
        except Exception as e:
            print(f"Error with {full_name}: {e}")
            

# XGBOOST loop
for feat_name, X_tr, X_te, y_tr, y_te in feature_sets:
    full_name = f"{feat_name} + XGBoost"

    print(f"Started on: {full_name}")
    
    y_true_final = y_test # The true labels are always y_test
    try:
        # y_tr already holds the SMOTE-balanced labels for "Combined + SMOTE",
        # so a single encoding step covers every feature set
        y_tr_enc = le.transform(y_tr)

        
        #I got parameters from the Grid Search
        xgb_temp = xgb.XGBClassifier(
            n_estimators=400, max_depth=7, learning_rate=0.05, random_state=42, eval_metric='logloss'
        )
        
        xgb_temp.fit(X_tr, y_tr_enc)
        y_pred_enc = xgb_temp.predict(X_te)
        y_pred = le.inverse_transform(y_pred_enc)
        
        f1_weighted = f1_score(y_true_final, y_pred, average='weighted')

        #Print report
        report = classification_report(y_true_final, y_pred)
        

        
        #This is for the dataframe
        reportDict = classification_report(y_true_final, y_pred, output_dict=True)
        
        results_list.append({
                "Feature + Model": f"{feat_name} + XGBoost",
                "Sexist (P)": f"{reportDict['sexist']['precision']:.2f}",
                "Sexist (R)": f"{reportDict['sexist']['recall']:.2f}",
                "Sexist (F1)": f"{reportDict['sexist']['f1-score']:.2f}",
                "Not Sexist (P)": f"{reportDict['not sexist']['precision']:.2f}",
                "Not Sexist (R)": f"{reportDict['not sexist']['recall']:.2f}",
                "Not Sexist (F1)": f"{reportDict['not sexist']['f1-score']:.2f}",
                "Weighted (P)": f"{reportDict['weighted avg']['precision']:.2f}",
                "Weighted (R)": f"{reportDict['weighted avg']['recall']:.2f}",
                "Weighted (F1)": f"{reportDict['weighted avg']['f1-score']:.2f}"
            })
        
        print(f"Finished with: {full_name}")
        #Not sure about format so I am not printing this
        #print(report)  

        
        # Update best F1 score tracker
        if f1_weighted > best_result[0]:
            best_result[0] = f1_weighted
            best_result[1] = full_name
            best_result[2] = y_pred
            best_result[3] = y_true_final 

    except Exception as e:
        print(f"Error with {full_name}: {e}")
        
print("Finished with all combos")

#Report Best
best_f1, best_name, y_pred_best, y_true_best = best_result

results_df = pd.DataFrame(results_list)
display(results_df)

print(f"Best Combo: {best_name}")

if y_pred_best is not None:
    
    final_report = classification_report(y_true_best, y_pred_best)
    
    print(final_report)
    

running model combinations
Started on: Word TF-IDF + Logistic Regression
Finished with: Word TF-IDF + Logistic Regression
Started on: Word TF-IDF + SVM
Finished with: Word TF-IDF + SVM
Started on: Char TF-IDF + Logistic Regression
Finished with: Char TF-IDF + Logistic Regression
Started on: Char TF-IDF + SVM
Finished with: Char TF-IDF + SVM
Started on: Combined (Word+Char) + Logistic Regression
Finished with: Combined (Word+Char) + Logistic Regression
Started on: Combined (Word+Char) + SVM
Finished with: Combined (Word+Char) + SVM
Started on: Combined + SMOTE + Logistic Regression
Finished with: Combined + SMOTE + Logistic Regression
Started on: Combined + SMOTE + SVM
Finished with: Combined + SMOTE + SVM
Started on: Word TF-IDF + XGBoost
Finished with: Word TF-IDF + XGBoost
Started on: Char TF-IDF + XGBoost
Finished with: Char TF-IDF + XGBoost
Started on: Combined (Word+Char) + XGBoost
Finished with: Combined (Word+Char) + XGBoost
Started on: Combined + SMOTE + XGBoost
Finished with: 

|   | Feature + Model | Sexist (P) | Sexist (R) | Sexist (F1) | Not Sexist (P) | Not Sexist (R) | Not Sexist (F1) | Weighted (P) | Weighted (R) | Weighted (F1) |
|---|-----------------|:----------:|:----------:|:-----------:|:--------------:|:--------------:|:---------------:|:------------:|:------------:|:-------------:|
| 0 | Word TF-IDF + Logistic Regression | 0.82 | 0.24 | 0.37 | 0.77 | 0.98 | 0.86 | 0.79 | 0.78 | 0.73 |
| 1 | Word TF-IDF + SVM | 0.73 | 0.48 | 0.58 | 0.83 | 0.93 | 0.88 | 0.80 | 0.81 | 0.80 |
| 2 | Char TF-IDF + Logistic Regression | 0.84 | 0.31 | 0.45 | 0.79 | 0.98 | 0.87 | 0.80 | 0.79 | 0.76 |
| 3 | Char TF-IDF + SVM | 0.73 | 0.45 | 0.56 | 0.82 | 0.94 | 0.87 | 0.79 | 0.80 | 0.79 |
| 4 | Combined (Word+Char) + Logistic Regression | 0.79 | 0.38 | 0.52 | 0.81 | 0.96 | 0.88 | 0.80 | 0.80 | 0.78 |
| 5 | Combined (Word+Char) + SVM | 0.74 | 0.54 | 0.62 | 0.84 | 0.93 | 0.88 | 0.81 | 0.82 | 0.81 |
| 6 | Combined + SMOTE + Logistic Regression | 0.64 | 0.66 | 0.65 | 0.87 | 0.86 | 0.86 | 0.81 | 0.80 | 0.80 |
| 7 | Combined + SMOTE + SVM | 0.64 | 0.61 | 0.62 | 0.86 | 0.87 | 0.86 | 0.79 | 0.80 | 0.80 |
| 8 | Word TF-IDF + SVM | 0.76 | 0.47 | 0.58 | 0.83 | 0.94 | 0.88 | 0.81 | 0.81 | 0.80 |
| 9 | Char TF-IDF + SVM | 0.76 | 0.48 | 0.59 | 0.83 | 0.94 | 0.88 | 0.81 | 0.82 | 0.80 |


Best Combo: Combined + SMOTE + XGBoost
              precision    recall  f1-score   support

  not sexist       0.84      0.95      0.89       789
      sexist       0.79      0.52      0.62       297

    accuracy                           0.83      1086
   macro avg       0.81      0.73      0.76      1086
weighted avg       0.82      0.83      0.82      1086
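
The `weighted avg` row is the support-weighted mean of the per-class scores. As a quick sanity check, the sketch below recomputes the weighted F1 from the per-class values printed in the report above; because those inputs are already rounded to two decimals, the result is only expected to agree after rounding.

```python
# Per-class F1 and support copied from the report above (rounded values)
per_class = {
    "not sexist": {"f1": 0.89, "support": 789},
    "sexist": {"f1": 0.62, "support": 297},
}

total = sum(c["support"] for c in per_class.values())

# Each class contributes in proportion to its number of test instances
weighted_f1 = sum(c["f1"] * c["support"] for c in per_class.values()) / total
print(total, f"{weighted_f1:.2f}")  # 1086 0.82
```

Since the majority class makes up about 73% of the test set, the weighted F1 is dominated by the `not sexist` score, which is why the much lower `sexist` F1 of 0.62 still yields 0.82 overall.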



In [6]:
# Old approach, kept only in case it is reused later; ignore this cell.
# It trains and evaluates SVM and Logistic Regression with cross-validation.

'''
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, confusion_matrix, f1_score
import seaborn as sns

svm_model_cv = LinearSVC(class_weight="balanced", C=0.5)
cv_scores = cross_val_score(svm_model_cv, X_train, y_train, cv=5)
print(f"SVM Cross-Validation Accuracy: {np.mean(cv_scores):.4f} ± {np.std(cv_scores):.4f}")


log_model_cv = LogisticRegressionCV(max_iter=1000, class_weight="balanced", cv=5)
log_cv_scores = cross_val_score(log_model_cv, X_train, y_train, cv=5)
print(f"Logistic Regression Cross-Validation Accuracy: {np.mean(log_cv_scores):.4f} ± {np.std(log_cv_scores):.4f}")
y_pred_log = log_reg.predict(X_test)
print("Weighted F1:", f1_score(y_test, y_pred_log, average="weighted"))
y_pred_svm = svm_model.predict(X_test)
print("Logistic Regression Report:\n", classification_report(y_test, y_pred_log))
print("SVM Report:\n", classification_report(y_test, y_pred_svm))
"""
rf_model_cv = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)
rf_cv_scores = cross_val_score(rf_model_cv, X_train, y_train, cv=5)
print(f"Random Forest Cross-Validation Accuracy: {np.mean(rf_cv_scores):.4f} ± {np.std(rf_cv_scores):.4f}")
y_pred_rf = rf_model.predict(X_test)
print("Random Forest Report:\n", classification_report(y_test, y_pred_rf))
"""
# Confusion matrix for visualization
cm = confusion_matrix(y_test, y_pred_log, labels=log_reg.classes_)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=log_reg.classes_, yticklabels=log_reg.classes_)
plt.title("Confusion Matrix - Logistic Regression")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
# get dict form of the report
rep_lr = classification_report(y_test, y_pred_log, output_dict=True)
weighted_lr = rep_lr['weighted avg']   # dict with precision, recall, f1-score, support

rep_svm = classification_report(y_test, y_pred_svm, output_dict=True)
weighted_svm = rep_svm['weighted avg']
results = pd.DataFrame({
    "Model": ["Logistic Regression", "SVM"],
    "Accuracy": [log_reg_score, svm_score],
    "CV Mean": [np.nan, np.mean(cv_scores)],
    "CV Std": [np.nan, np.std(cv_scores)],
    "LogReg weighted P/R/F1:": [f"{weighted_lr['precision']:.4f}/{weighted_lr['recall']:.4f}/{weighted_lr['f1-score']:.4f}", ""],
    "SVM weighted P/R/F1:": ["", f"{weighted_svm['precision']:.4f}/{weighted_svm['recall']:.4f}/{weighted_svm['f1-score']:.4f}"]
})
results
'''
