**Name:- Pranjal Godse ------------------- Batch:- 6**

# Problem 3 - End-to-End NLP Application
## News Similarity Search

This notebook demonstrates:
- Loading the 20 Newsgroups dataset
- Converting the text into TF-IDF vectors
- Computing cosine similarity between articles
- Retrieving the top 3 most similar news articles for a query article


In [1]:
!pip install scikit-learn



In [2]:
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Load Dataset

In [3]:
dataset = fetch_20newsgroups(remove=('headers', 'footers', 'quotes'))
documents = dataset.data[:2000]

print("Total Documents:", len(documents))

Total Documents: 2000


## Convert Text to TF-IDF

In [4]:
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
tfidf_matrix = vectorizer.fit_transform(documents)

print("TF-IDF Matrix Shape:", tfidf_matrix.shape)

TF-IDF Matrix Shape: (2000, 5000)
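To make the vectorization step less of a black box, here is a minimal pure-Python sketch of what `TfidfVectorizer` computes under its default settings (`smooth_idf=True`, `norm='l2'`, raw term counts): each term count is multiplied by a smoothed inverse document frequency, `ln((1 + n) / (1 + df(t))) + 1`, and each row is then scaled to unit length. It is a sketch under stated simplifications, not scikit-learn's implementation: it skips the English stop-word list and the `max_features=5000` cap used above (the cap keeps only the 5000 terms with the highest frequency across the corpus, which is why the matrix has exactly 5000 columns). The helper name `tfidf` and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn: runs of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    # Lowercase first, as TfidfVectorizer does by default (lowercase=True)
    return TOKEN_RE.findall(text.lower())

def tfidf(docs):
    """Hypothetical helper: return (idf, rows), each row an L2-normalized dict term -> weight."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf, as with smooth_idf=True: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Raw term count times idf
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Scale to unit length; an empty document stays an all-zero (empty) row
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return idf, rows

docs = [
    "The bike leaks oil",
    "Repaint the bike pipes",
    "The shuttle launch was delayed",
]
idf, rows = tfidf(docs)
print(len(idf))     # 10 distinct terms
print(idf["the"])   # 1.0, since "the" occurs in every document
```

A term that appears in every document ends up with the minimum idf of 1.0, while rarer terms such as "oil" are weighted more heavily; this is the same effect that lets distinctive words dominate the similarity scores in the notebook.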


## Define Similarity Function

In [5]:
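def find_similar_articles(query_index, top_n=3):
    # Cosine similarity between the query row and every document in the corpus
    query_vector = tfidf_matrix[query_index]
    similarities = cosine_similarity(query_vector, tfidf_matrix).flatten()
    # Exclude the query explicitly instead of assuming it is always ranked first
    similarities[query_index] = -1.0
    # Indices of the top_n highest scores, in descending order of similarity
    similar_indices = np.argsort(similarities)[::-1][:top_n]
    return similar_indices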
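Ranking by slicing off the last element of `argsort()`, as in `argsort()[-(top_n+1):-1]`, assumes the query is always the single top-scoring document. If another article ties with it (an exact duplicate, say), the slice may drop the duplicate and return the query itself. The sketch below, in plain Python with hypothetical names (`cosine`, `top_n_similar`) and toy sparse vectors stored as dicts, computes cosine similarity as `dot(a, b) / (|a| * |b|)`, scores zero vectors (for example, posts left empty after removing headers, footers and quotes) as 0 the way `cosine_similarity` does, and removes the query index before ranking. Because TF-IDF rows are already L2-normalized, the cosine here is effectively just the dot product.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts term -> weight."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # zero vectors get similarity 0, matching scikit-learn's convention
    return dot / (norm_a * norm_b)

def top_n_similar(rows, query_index, top_n=3):
    """Indices of the top_n rows most similar to rows[query_index], query excluded."""
    query = rows[query_index]
    candidates = (i for i in range(len(rows)) if i != query_index)
    return heapq.nlargest(top_n, candidates, key=lambda i: cosine(query, rows[i]))

rows = [
    {"bike": 0.6, "oil": 0.8},    # 0: query
    {"bike": 0.8, "paint": 0.6},  # 1: shares "bike" with the query
    {"shuttle": 1.0},             # 2: unrelated
    {"bike": 0.6, "oil": 0.8},    # 3: exact duplicate of the query
    {},                           # 4: empty document
]
print(top_n_similar(rows, 0, top_n=2))  # [3, 1]
```

The duplicate at index 3 is returned first and the query itself never appears. `heapq.nlargest` also avoids fully sorting every score when only a handful of neighbours are needed.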

## Test Similarity Search

In [6]:
query_index = 10

print("Original Article:\n")
print(documents[query_index][:500])

similar_articles = find_similar_articles(query_index)

print("\n\nTop 3 Similar Articles:\n")

for idx in similar_articles:
    print("\n--- Similar Article ---\n")
    print(documents[idx][:500])

Original Article:

I have a line on a Ducati 900GTS 1978 model with 17k on the clock.  Runs
very well, paint is the bronze/brown/orange faded out, leaks a bit of oil
and pops out of 1st with hard accel.  The shop will fix trans and oil 
leak.  They sold the bike to the 1 and only owner.  They want $3495, and
I am thinking more like $3K.  Any opinions out there?  Please email me.
Thanks.  It would be a nice stable mate to the Beemer.  Then I'll get
a jap bike and call myself Axis Motors!

-- 
----------------------


Top 3 Similar Articles:


--- Similar Article ---

it seems the 200 miles of trailering in the rain has rusted my bike's headers.
the metal underneath is solid, but i need to sand off the rust coating and
repaint the pipes black.  any recommendations for paint and application
of said paint?

thanks!

--- Similar Article ---

However, this has nothing to do with motorcycling, unless you consider
the Amazona a bike.

--- Similar Article ---


: ... I think they should rename 