# Concept 2: Bag of Words & TF-IDF

### Bag of Words
- Count how many times each word appears in a text, ignoring word order

### TF-IDF
- A weighted counting method that scores how important a word is to a document
- Combines how often a word appears in a specific document (TF) with how rare that word is across all documents (IDF)
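
The combination described above can be written down as a formula. This is one common textbook form, shown as a sketch rather than the only definition; the symbols $t$ (a term), $d$ (a document), $N$ (the number of documents), and $\mathrm{df}(t)$ (the number of documents containing $t$) are notation introduced here for illustration. Libraries often use smoothed variants of it.

$$
\mathrm{tf}(t, d) = \frac{\text{count of } t \text{ in } d}{\text{total words in } d},
\qquad
\mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)},
\qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t)
$$

A word that appears in every document has $\mathrm{df}(t) = N$, so its IDF is $\log 1 = 0$ and its score drops to zero no matter how often it occurs.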

![Bag of Words Illustration](images/bag_of_words.png)

### Think Like a Word Counter

- Example sentence: "I love cats, I love dogs"

- Word counts:
  - "I" appears 2 times
  - "love" appears 2 times
  - "cats" appears 1 time
  - "dogs" appears 1 time

- This is similar to counting ingredients in a recipe! 👨‍🍳
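
The counts above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `bag_of_words` is made up for this example, and it assumes English text where a "word" is simply a run of letters. It also shows why the model is called a *bag*: word order is thrown away.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return word counts for `text`, ignoring case and punctuation."""
    # Keep only runs of letters so that "cats," is counted as "cats".
    return Counter(re.findall(r"[a-z]+", text.lower()))

bag_a = bag_of_words("I love cats, I love dogs")
bag_b = bag_of_words("I love dogs, I love cats")

print(bag_a)
# Same words, different order: the two bags compare as equal.
print(bag_a == bag_b)
```

Because both sentences contain the same words the same number of times, the comparison prints `True`.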

### Real-World Application: Google Search

- **TF (Term Frequency):** How often does "Python" appear in an article?
- **IDF (Inverse Document Frequency):** Is "Python" rare across many articles?
- **Result:** Words that appear often in one article but rarely across the others are considered very important!

This helps search engines rank the most relevant results! 🔍
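
To make TF and IDF concrete, here is a small from-scratch sketch. The three "articles" are invented for illustration, the function names (`tokenize`, `tf`, `idf`, `tf_idf`) are hypothetical, and the unsmoothed formula used is one common textbook variant. Real search engines and libraries use more elaborate versions.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus, invented purely for illustration.
documents = [
    "Python is a programming language. Python is popular.",
    "A snake is a reptile.",
    "Cooking is a fun hobby.",
]

def tokenize(text):
    """Lowercase the text and return its words without punctuation."""
    return re.findall(r"[a-z]+", text.lower())

tokenized = [tokenize(doc) for doc in documents]

def tf(term, tokens):
    """Term frequency: share of the document's words that are `term`."""
    return Counter(tokens)[term] / len(tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / documents containing `term`).

    Assumes `term` occurs in at least one document of `corpus`.
    """
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / containing)

def tf_idf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)

first = tokenized[0]
for term in ("python", "is"):
    print(term, round(tf_idf(term, first, tokenized), 3))
```

In the first article, "python" and "is" both appear twice, so their TF is identical. The difference comes from IDF: "is" occurs in every article, so its IDF (and therefore its score) is zero, while "python" occurs in only one article and gets a positive score.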

### Let's Count Some Words!

We'll build a simple word counter with Python's built-in `collections.Counter`.

In [None]:
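import re
from collections import Counter

text = "I love Python. Python is amazing."
# Lowercase the text and keep only runs of letters, so that
# "Python." and "Python" are counted as the same word.
words = re.findall(r"[a-z]+", text.lower())
word_counts = Counter(words)

print(word_counts)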

🚀 [Open in Colab](https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/python_easy/2/concept_2.ipynb)

### Bag of Words Made Simple

- It's just smart counting:
  - Bag of Words counts every word in your text
  - TF-IDF gives extra weight to words that are frequent in one document but rare across the others
  - Common words like "the" and "a" appear everywhere, so they get low weight

### TF-IDF from a Different Angle

**Whiteboard Time! 📝**
- Think of TF-IDF as a highlighter for the meaningful words in a text: words that appear often in this document but rarely elsewhere stand out the most!


### Quick Check

- TF-IDF helps find the most meaningful words by balancing how often they appear in a document with how rare they are across the whole collection.
- If analyzing movie reviews, which words do you think TF-IDF would consider most important?