# Concept 3: Using TfidfVectorizer from sklearn

In this notebook, we will learn about `TfidfVectorizer` from scikit-learn (sklearn), which makes calculating TF-IDF scores easy and automatic!

### What is sklearn?
- sklearn is a popular Python library for machine learning.
- It provides many built-in tools to analyze data and build models.


### What is TfidfVectorizer?
- A ready-made tool in sklearn that computes TF-IDF scores for text data.
- By default it lowercases the text, splits it into words, and calculates a TF-IDF score for every word in every document.
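To make the text handling concrete, here is a small sketch that asks the vectorizer how it would split a sentence into words. The sample sentence is made up for illustration, and the behaviour shown assumes the default settings (no custom `token_pattern` or `stop_words`).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# A made-up sentence, used only for illustration.
sample = "I LOVE Machine-Learning, and I love Python!"

vectorizer = TfidfVectorizer()

# build_analyzer() returns the function the vectorizer uses
# to turn raw text into a list of tokens.
analyzer = vectorizer.build_analyzer()
tokens = analyzer(sample)

print(tokens)
# → ['love', 'machine', 'learning', 'and', 'love', 'python']
```

Notice that everything is lowercased, punctuation is gone, and the one-letter word "I" is dropped, because the default pattern only keeps tokens with two or more characters. Common words such as "and" stay in unless you pass `stop_words`.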


### Think Like Using a Power Tool
- **Manual Way:** Count words yourself using loops and dictionaries.
- **sklearn Way:** Create a vectorizer and call a single method!
- It handles tokenizing and scoring, and returns ready-to-use numbers.
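For comparison, here is roughly what the manual way looks like. This is a simplified sketch, not sklearn's actual code: it splits on whitespace only (so it keeps the word "i", which sklearn would drop), and the smoothed IDF formula and length normalisation are written to match what sklearn's defaults are understood to be.

```python
import math
from collections import Counter

documents = [
    "I love machine learning",
    "Python is great for AI",
    "Machine learning with Python",
]

# Lowercase and split on whitespace (a simplification of real tokenization).
tokenized = [doc.lower().split() for doc in documents]

# Document frequency: in how many documents does each word appear?
doc_freq = Counter()
for tokens in tokenized:
    doc_freq.update(set(tokens))

n_docs = len(documents)
manual_scores = []
for tokens in tokenized:
    counts = Counter(tokens)
    # TF * smoothed IDF, where IDF = ln((1 + n) / (1 + df)) + 1
    weights = {
        word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
        for word, count in counts.items()
    }
    # Scale each document so its scores have length 1 (L2 normalisation).
    norm = math.sqrt(sum(value * value for value in weights.values()))
    manual_scores.append({word: value / norm for word, value in weights.items()})

for doc, scores in zip(documents, manual_scores):
    print(doc, {word: round(score, 3) for word, score in scores.items()})
```

That is about twenty lines for something the vectorizer does in one call, and it still skips the finer points of tokenization.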

_Like using a calculator instead of counting on fingers! 🧮_

### Real-World Example: Content Recommendation
- Imagine a music app like Spotify analyzing song lyrics to find similar songs.
- It extracts a TF-IDF vector from each song's lyrics.
- It compares the songs a user liked with other songs and recommends music that shares important words!

That's how you find your new favorite songs! 🎵
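Here is a toy version of that idea. The song titles and lyrics are invented for illustration, and comparing vectors with cosine similarity is one common choice assumed here, not a description of how any real music service works.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up "lyrics", purely for illustration.
lyrics = {
    "Song A": "dancing all night under the city lights",
    "Song B": "city lights shine while we keep dancing",
    "Song C": "rain falls softly on the empty road",
}
titles = list(lyrics)

# One TF-IDF vector per song.
tfidf = TfidfVectorizer().fit_transform(list(lyrics.values()))

# Pairwise similarity between every pair of songs (1.0 means identical).
similarity = cosine_similarity(tfidf)

liked = "Song A"
scores = similarity[titles.index(liked)]

# Rank the other songs from most to least similar to the liked one.
ranked = sorted(
    ((title, score) for title, score in zip(titles, scores) if title != liked),
    key=lambda pair: pair[1],
    reverse=True,
)
best_match = ranked[0][0]

for title, score in ranked:
    print(f"{title}: {score:.3f}")
print("Recommended:", best_match)
```

Song B shares "dancing", "city", and "lights" with Song A, so it ranks above Song C, which only shares the word "the".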

### Let's Use the Professional Tool!
Time to see `TfidfVectorizer` in action.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
documents = [
    "I love machine learning",
    "Python is great for AI",
    "Machine learning with Python"
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)

print(vectorizer.get_feature_names_out())
print(tfidf_matrix.toarray())

🚀 [Open this example in Colab](https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/python_easy/2/concept_3.ipynb)

### TfidfVectorizer Made Simple
- **Create:** `vectorizer = TfidfVectorizer()`
- **Fit and transform:** `tfidf_matrix = vectorizer.fit_transform(your_texts)` learns the vocabulary and scores your texts in one step.
- **Use:** Get numbers that represent your text!


### Sklearn from a Different Angle
- sklearn is like having a team of data scientists - they've already solved the hard problems for you!
- It's a simple way to do complex text analysis.

### Quick Check
- **Question:**
  *What advantage do you see in using sklearn instead of writing TF-IDF from scratch?*