# Cosine Similarity Basics

## Introduction to Cosine Similarity

Cosine similarity measures how closely two vectors, such as word vectors, point in the same direction. It gives a value between -1 and 1:

- **1** means the vectors point in the same direction (very similar)
- **0** means they are perpendicular (unrelated)
- **-1** means they point in opposite directions (as dissimilar as possible)
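
To make the three cases concrete, here is a minimal sketch in plain Python. The helper name `cosine` and the toy vectors are assumptions chosen for illustration; they are not part of any library.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same = cosine([1, 2], [2, 4])           # same direction, different length
perpendicular = cosine([1, 0], [0, 3])  # at a right angle
opposite = cosine([1, 2], [-1, -2])     # opposite direction

print(round(same, 3), round(perpendicular, 3), round(opposite, 3))
# prints: 1.0 0.0 -1.0
```

The sketch assumes neither vector is all zeros; a zero vector has no direction, so the division would fail.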

Here's a visual to understand this concept:

![Cosine Similarity Illustration](images/cosine_similarity.png)

## Think Like Measuring Friendship

Imagine two friends pointing in the same or opposite directions:

- If they point in the same direction, they have similar interests (high similarity)
- If they point in opposite directions, their interests differ (low similarity)

Cosine similarity is the cosine of the angle between two word vectors, so it captures how aligned they are, regardless of their length.
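
The 'angle' picture corresponds to the standard definition of cosine similarity. The symbols $a$, $b$, and $\theta$ are chosen here for illustration:

$$
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
           = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
$$

As a small worked example, take $a = (1, 2)$ and $b = (2, 4)$. Then $a \cdot b = 1 \cdot 2 + 2 \cdot 4 = 10$, $\lVert a \rVert = \sqrt{5}$, and $\lVert b \rVert = \sqrt{20}$, so $\cos\theta = 10 / \sqrt{100} = 1$. The vector $b$ is twice as long as $a$, yet the score is still 1, because dividing by the lengths leaves only the direction.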

## Real-World Application: Document Comparison

Cosine similarity can help detect plagiarism:

- Convert each document into a vector (for example, with TF-IDF)
- Calculate the cosine similarity between these vectors
- Treat a high similarity as a sign of possible copying that deserves a closer look
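
The steps above can be sketched roughly as follows. To keep the example dependency-free, it swaps TF-IDF for plain word counts; the submissions, the helper names, and the 0.8 cutoff are all made up for illustration and would need tuning on real data.

```python
import math
from collections import Counter

def term_counts(text):
    """Turn a document into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

THRESHOLD = 0.8  # assumed cutoff, not a standard value

submissions = {  # made-up example texts
    "alice": "the cell is the basic unit of life",
    "bob": "the cell is the basic unit of all life",
    "carol": "photosynthesis converts light into chemical energy",
}

vectors = {name: term_counts(text) for name, text in submissions.items()}
names = sorted(vectors)

# Compare every pair once and keep the pairs at or above the cutoff
flagged = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if cosine(vectors[a], vectors[b]) >= THRESHOLD
]
print(flagged)
# prints: [('alice', 'bob')]
```

A flagged pair is only a prompt for a human to look closer; two honest answers to the same question can also share most of their words.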

Similarity checks of this kind are commonly used to screen student submissions!

## Let's Measure Text Similarity!

We'll compare different sentences using cosine similarity.

```python
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "I love programming in Python",
    "Python programming is amazing",
    "I hate cooking vegetables"
]

# Turn each sentence into a TF-IDF vector (one row per sentence)
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(texts)
```

```python
# Compare the first two texts; slicing keeps each row two-dimensional
similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
print(f"Similarity: {similarity[0][0]:.3f}")
```
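
To compare every pair of sentences at once, one option is to pass the whole matrix to `cosine_similarity`, which then returns a square matrix of scores. The sketch below repeats the setup so it runs on its own; the name `similarity_matrix` is just a choice made here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "I love programming in Python",
    "Python programming is amazing",
    "I hate cooking vegetables",
]

tfidf_matrix = TfidfVectorizer().fit_transform(texts)

# With a single argument, every row is compared against every other row
similarity_matrix = cosine_similarity(tfidf_matrix)

for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        print(f"Text {i} vs text {j}: {similarity_matrix[i, j]:.3f}")
```

The first and third sentences both contain "I", but the default `TfidfVectorizer` tokenizer ignores single-character tokens, so they share no terms and score 0. The two Python sentences share "python" and "programming", so they score higher.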

[Open in Colab](https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/python_easy/2/advanced.ipynb)

## Cosine Similarity Made Simple

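TF-IDF weights are never negative, so scores for text fall between 0 and 1:

- 1.0 = the vectors point the same way (identical texts score 1.0)
- 0.8+ = very similar texts (a rule of thumb, not a fixed threshold)
- 0.0 = the texts share no terms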

It's a handy tool to find related content!

## Whiteboard Time! 📝

Think of it as measuring how much two texts 'agree' with each other mathematically.

I hope this is clear now!

## Quick Check

Cosine similarity helps us quantify how similar or different any two pieces of text are.

In what scenarios would measuring text similarity be crucial for your applications?