# Introduction
This notebook uses the helper functions defined in `train_svm.py` to train an SVM classifier on the Sentiment140 dataset with TF-IDF features.

### Setup

In [8]:
import sys
sys.path.append('../../src/models/')  # Make train_svm.py importable from this notebook

In [9]:
from train_svm import (
    load_data, vectorize_text, train_svm,
    evaluate_model, save_model_and_vectorizer
)

### Load the cleaned data

In [10]:
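# Load the preprocessed tweets, then drop rows whose cleaned text is missing
# (NaN entries in 'clean_text' can cause text vectorizers to fail).
df = load_data('../../data/processed/cleaned_data.csv')
df = df.dropna(subset=['clean_text'])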

### Feature Engineering: TF-IDF Vectorization

In [11]:
# X is the feature matrix; tfidf is the fitted vectorizer, kept so it can be saved with the model.
X, tfidf = vectorize_text(df, max_features=1000)
y = df['label']
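
The body of `vectorize_text` is not shown in this notebook. As a rough illustration of what a `max_features` cap does, here is a minimal sketch that assumes the helper wraps scikit-learn's `TfidfVectorizer`; the sample sentences are made up and do not come from Sentiment140.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for df['clean_text']; not real Sentiment140 rows.
texts = [
    "love this phone",
    "hate this phone",
    "love love the battery",
    "battery died again",
]

# max_features keeps only the terms with the highest corpus-wide frequency.
vectorizer = TfidfVectorizer(max_features=5)
X_demo = vectorizer.fit_transform(texts)

print(X_demo.shape)                    # (n_documents, n_kept_terms)
print(sorted(vectorizer.vocabulary_))  # the terms that survived the cap
```

The corpus contains eight distinct terms, so three are dropped. A small cap such as 1000 keeps the matrix compact but discards rarer words that may carry sentiment.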

### Train SVM Model

In [12]:
model = train_svm(X, y)

[LibLinear]
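
The `[LibLinear]` line above suggests that `train_svm` fits a liblinear-backed estimator with verbose output enabled. scikit-learn's `LinearSVC` is one such estimator, but the script is not shown here, so the sketch below is an assumption about its shape, with toy data and default-like hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy corpus and labels (1 = positive, 0 = negative); purely illustrative.
texts = [
    "love this phone",
    "great battery life",
    "hate this phone",
    "terrible battery life",
]
labels = [1, 1, 0, 0]

X_demo = TfidfVectorizer().fit_transform(texts)

# verbose=1 makes the liblinear solver announce itself on stdout.
clf = LinearSVC(C=1.0, verbose=1, random_state=0)
clf.fit(X_demo, labels)

preds = clf.predict(X_demo)
print(preds)
```

A linear SVM learns one weight per TF-IDF term, which scales well to sparse matrices with over a million rows like the one used here.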

### Evaluate the Model

In [13]:
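# X and y were also passed to train_svm above, so unless train_svm holds out
# a split internally, this measures fit on data the model may already have seen.
accuracy, report = evaluate_model(model, X, y)
print(f"Model Accuracy on Full Dataset: {accuracy}")
print("\nClassification Report:\n", report)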

Model Accuracy on Full Dataset: 0.7497189017380981

Classification Report:
               precision    recall  f1-score   support

           0       0.77      0.72      0.74    796302
           1       0.73      0.78      0.76    795668

    accuracy                           0.75   1591970
   macro avg       0.75      0.75      0.75   1591970
weighted avg       0.75      0.75      0.75   1591970
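
The report above scores the model on the same `X` and `y` that were passed to `train_svm`, so the 0.75 figure may reflect training fit rather than generalization. A held-out estimate could be obtained along these lines. This is a sketch on synthetic data, assuming a `LinearSVC`-style model; the variable names are illustrative and do not come from `train_svm.py`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the TF-IDF matrix X and the labels y.
X_demo, y_demo = make_classification(n_samples=400, n_features=20, random_state=0)

# Hold out 20% of the rows, preserving the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

clf = LinearSVC(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, held-out accuracy: {test_acc:.3f}")
```

With text data, the vectorizer would ideally be fit on the training split only and then applied to the test split, so that test vocabulary statistics do not leak into the features.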



### Visualization (e.g., Confusion Matrix, ROC Curve)
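
This section has no cell yet. As a starting point, the numbers behind both plots can be computed with `sklearn.metrics`. The sketch below uses synthetic data and assumes a `LinearSVC`-style model, which exposes `decision_function` rather than `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.svm import LinearSVC

# Synthetic stand-in for the notebook's X and y.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
clf = LinearSVC(random_state=0).fit(X_demo, y_demo)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_demo, clf.predict(X_demo))
print(cm)

# Signed distances to the hyperplane serve as ranking scores for the ROC curve.
scores = clf.decision_function(X_demo)
fpr, tpr, thresholds = roc_curve(y_demo, scores)
auc = roc_auc_score(y_demo, scores)
print(f"ROC AUC: {auc:.3f}")
```

For the plots themselves, recent scikit-learn versions provide `ConfusionMatrixDisplay.from_estimator` and `RocCurveDisplay.from_estimator`, which draw with matplotlib.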

### Save the model and TF-IDF transformer

In [14]:
save_model_and_vectorizer(
    model, tfidf,
    '../../models/svm_model.pkl',
    '../../models/tfidf_vectorizer_svm.pkl'
)
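
How `save_model_and_vectorizer` serializes its inputs is not shown. The `.pkl` extension suggests `pickle` or `joblib`. Assuming plain `pickle`, a round trip that reloads both artifacts and classifies new text might look like this. The toy data and temporary paths are illustrative only.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy model and vectorizer standing in for `model` and `tfidf`.
texts = ["love this phone", "great battery life",
         "hate this phone", "terrible battery life"]
labels = [1, 1, 0, 0]
vec = TfidfVectorizer().fit(texts)
clf = LinearSVC(random_state=0).fit(vec.transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "svm_model.pkl"
    vec_path = Path(tmp) / "tfidf_vectorizer_svm.pkl"

    # Save both objects; the vectorizer must travel with the model.
    model_path.write_bytes(pickle.dumps(clf))
    vec_path.write_bytes(pickle.dumps(vec))

    # Reload them as a later inference script would.
    loaded_clf = pickle.loads(model_path.read_bytes())
    loaded_vec = pickle.loads(vec_path.read_bytes())

# New text must go through the same fitted vectorizer before prediction.
new_pred = loaded_clf.predict(loaded_vec.transform(["love this battery"]))
print(new_pred)
```

Pickle files should only be loaded from trusted sources, and it is safest to reload them with the same scikit-learn version that wrote them.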