# Training a Text Classifier with TF-IDF and Logistic Regression

## Introduction
Let's build a simple text classifier from two beginner-friendly building blocks: TF-IDF, which turns text into numbers, and logistic regression, which learns from those numbers.

### What is TF-IDF?
- **TF (Term Frequency):** Measures how often a word appears in a document.
- **IDF (Inverse Document Frequency):** Down-weights words that appear in many documents, such as "the" or "and".
- **Together (TF-IDF):** Gives high scores to words that are frequent in one document but rare across the rest of the collection.
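
The three bullets above can be written as formulas. This is one common textbook form, given as a sketch rather than the exact definition used by any particular library; the symbols $N$ (number of documents) and $\mathrm{df}(t)$ (number of documents containing the word $t$) are introduced here for illustration only.

$$
\mathrm{tf}(t, d) = \frac{\text{count of } t \text{ in } d}{\text{number of words in } d},
\qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)},
\qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
$$

A word that appears in every document has $\mathrm{df}(t) = N$, so its IDF is $\log 1 = 0$ and its TF-IDF score is zero. Note that scikit-learn's `TfidfVectorizer`, with its default settings, uses a smoothed IDF and scales each document's vector to unit length, so its numbers will not match this formula exactly.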

### What is Logistic Regression?
- A model that computes a weighted sum of the input features and converts it into a probability between 0 and 1. That probability is then used to predict a category, such as a positive or negative review.
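
To make that idea concrete, here is a minimal sketch of how a logistic regression model could turn word features into a probability. The weights, the bias, and the helper name `predict_positive_probability` are made up for illustration; a real model learns its weights from training data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, invented for this example (not learned from data).
# Positive weights push toward "positive review", negative weights away from it.
weights = {"loved": 2.0, "excellent": 1.5, "terrible": -2.5}
bias = 0.0


def predict_positive_probability(features):
    """features maps each word to a numeric value, for example its TF-IDF score."""
    score = bias + sum(weights.get(word, 0.0) * value
                       for word, value in features.items())
    return sigmoid(score)


p_good = predict_positive_probability({"loved": 1.0})
p_bad = predict_positive_probability({"terrible": 1.0})
print(f"P(positive | 'loved')    = {p_good:.2f}")
print(f"P(positive | 'terrible') = {p_bad:.2f}")
```

A probability above 0.5 is usually read as "positive" and one below 0.5 as "negative".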


## Real-World Example
Imagine you have 10,000 movie reviews, half positive and half negative.
- We convert the reviews into numerical features using TF-IDF.
- Then, we train a logistic regression model to learn which words are associated with positive and negative reviews.
- After training, the model can predict the sentiment of new reviews.


## Training Code Demo

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Example training data
train_texts = ["I loved this movie!", "This film was terrible.", "An excellent experience!", "Not good."]
train_labels = [1, 0, 1, 0]  # 1: positive, 0: negative

# Step 1: Convert text to TF-IDF vectors
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Step 2: Train the classifier
classifier = LogisticRegression()
classifier.fit(X_train, train_labels)

# Example test data
test_texts = ["What an amazing movie!", "It was a horrible experience."]
X_test = vectorizer.transform(test_texts)

# Step 3: Make predictions
predictions = classifier.predict(X_test)
print("Predicted labels:", predictions)
```


<a href="https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/python_easy/4/concept_2.ipynb" target="_blank" class="colab-button">🚀 Open in Colab</a>

## Simplified Explanation
- Show the model examples like "This is great!" → Positive.
- Let the model learn word patterns.
- Test with new text: "Amazing experience!" → Model predicts: Positive.

It's like teaching by example!

## Different Perspective
Imagine teaching a friend to recognize movie genres:
- Show them 100 action movies and 100 romance movies.
- They learn: "explosion", "chase" → Action; "love", "heart" → Romance.
- Now, they can classify new movie descriptions!

*Hope this clarifies how training works!*

## Quick Check
Why do we use TF-IDF instead of just counting words?
- Raw counts treat every word the same, so very frequent words can dominate even when they say little about the topic.
- TF-IDF down-weights words that are common across all documents and highlights the distinctive ones, which often helps the model make better predictions.

> 💭 Consider how common words like "the" or "and" can be less helpful for classification.
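
The difference is easy to see on a tiny example. The sketch below uses a made-up three-sentence corpus and a textbook TF-IDF formula written in plain Python (the helper `tf_idf` is hypothetical, and scikit-learn's defaults use a slightly different, smoothed formula), so treat it as an illustration rather than a reference implementation.

```python
import math
from collections import Counter

# Tiny made-up corpus, for illustration only.
documents = [
    "the movie was great and the cast was great",
    "the movie was boring and the plot was thin",
    "the book was fine",
]
tokenized = [doc.split() for doc in documents]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))


def tf_idf(word, tokens):
    """Textbook TF-IDF: (count / document length) * log(N / document frequency)."""
    tf = tokens.count(word) / len(tokens)
    idf = math.log(n_docs / doc_freq[word])
    return tf * idf


first = tokenized[0]
for word in ["the", "great"]:
    print(f"{word!r}: raw count = {first.count(word)}, "
          f"tf-idf = {tf_idf(word, first):.3f}")
```

Both words appear twice in the first sentence, so raw counts cannot tell them apart. Because "the" occurs in every document, its TF-IDF score is 0, while "great" occurs in only one document and gets a clearly positive score.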