# **NLP Classification: Human vs. Machine Translation**
## **Objective:**
This project aims to classify Spanish sentences as either:
- **0 (Machine-translated)**
- **1 (Human-translated)**

We will preprocess text, extract features, and train a classifier.

## **1. Install & Import Libraries**

In [None]:
%pip install numpy pandas matplotlib seaborn scikit-learn
%pip install nltk spacy unidecode transformers sentence-transformers torch torchvision torchaudio
%pip install tensorflow keras fastapi uvicorn
!python -m spacy download es_core_news_md

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
import spacy
import torch
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

## **2. Load & Explore Dataset**

In [None]:
# Load dataset (update the filename; the cells below expect 'text' and 'label' columns)
df = pd.read_csv('dataset.csv')
df.head()

In [None]:
# Check dataset info
df.info()
# Check label distribution
sns.countplot(x=df['label'])
plt.show()

## **3. Text Preprocessing**

In [None]:
# Load SpaCy Spanish model
nlp = spacy.load('es_core_news_md')
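def preprocess_text(text):
    # Lowercase, lemmatize, and keep only alphabetic tokens that are not stop words
    doc = nlp(text.lower())
    tokens = [token.lemma_ for token in doc if not token.is_stop and token.is_alpha]
    return ' '.join(tokens)

# Fill missing values first so .lower() does not fail on NaN entries
df['clean_text'] = df['text'].fillna('').astype(str).apply(preprocess_text)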
df.head()

## **4. Feature Extraction**

In [None]:
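# Split first so the TF-IDF vocabulary and IDF weights are learned from training data only
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df['clean_text'], df['label'], test_size=0.2, random_state=42, stratify=df['label']
)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)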

## **5. Model Training & Evaluation**

In [None]:
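# Raise max_iter above the default of 100 to avoid convergence warnings on sparse TF-IDF features
model = LogisticRegression(max_iter=1000)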
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

In [None]:
plt.figure(figsize=(5,5))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

## **6. Save Model for Deployment**

In [None]:
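import pickle
# Save the fitted vectorizer with the model: new text must go through the same transform at inference time
with open('translation_classifier.pkl', 'wb') as f:
    pickle.dump({'vectorizer': vectorizer, 'model': model}, f)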
print('Model saved!')
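
At inference time, new text has to pass through the same TF-IDF transform before the classifier sees it. The sketch below shows one way to keep the two steps together and reload them. It is an illustration under assumptions, not the notebook's own method: it swaps in a scikit-learn `Pipeline`, uses a tiny made-up dataset in place of `df['clean_text']` and `df['label']`, and introduces a hypothetical helper, `predict_label`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (made up for illustration); in the notebook this would be
# df['clean_text'] and df['label']
texts = [
    "gato dormir sofá tarde",
    "yo ser teniendo hambre ahora",
    "llover mucho ayer noche",
    "él hacer un decisión rápido",
]
labels = [1, 0, 1, 0]

# One object holds both the vectorizer and the classifier
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(texts, labels)

# Round trip through pickle in memory; pickle.dump / pickle.load on a file behaves the same
restored = pickle.loads(pickle.dumps(pipeline))


def predict_label(clean_text):
    """Return 0 (machine-translated) or 1 (human-translated) for one preprocessed sentence."""
    return int(restored.predict([clean_text])[0])


print(predict_label("gato dormir sofá"))
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.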

## **Next Steps 🚀**
- Try different models (SVM, Random Forest, BERT).
- Fine-tune hyperparameters.
- Deploy using **FastAPI** or **Flask**.
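
The first two items can be combined in a single grid search. The following is a minimal sketch, assuming a made-up eight-sentence toy dataset and an illustrative parameter grid (the `C` values, n-gram ranges, and the `f1_macro` metric are placeholders, not tuned recommendations). It compares `LogisticRegression` with `LinearSVC` while tuning the TF-IDF n-gram range.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (made up for illustration); replace with df['clean_text'] and df['label']
texts = [
    "gato dormir sofá tarde", "yo ser teniendo hambre ahora",
    "llover mucho ayer noche", "él hacer un decisión rápido",
    "niño jugar parque amigo", "ella tener veinte año viejo",
    "libro estar mesa cocina", "nosotros mirar para adelante reunión",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each dict is one sub-grid; the "clf" entry swaps the final estimator
param_grid = [
    {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf": [LogisticRegression(max_iter=1000)],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf": [LinearSVC()],
        "clf__C": [0.1, 1.0, 10.0],
    },
]

# cv=2 only because the toy dataset is tiny; 5 or more folds is more typical on real data
search = GridSearchCV(pipe, param_grid, cv=2, scoring="f1_macro")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Because the vectorizer sits inside the pipeline, it is refit on the training portion of every fold, so the cross-validation scores are not inflated by vocabulary leaking from held-out text.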

Happy coding! 🎯