# News Sentiment Analysis with VADER + Google News RSS

## Objective
In this notebook, you will collect recent news articles from Google News using an RSS feed, analyze their sentiment using NLTK’s VADER lexicon, and summarize your findings.

You will:
- Choose 3 topics of interest
- Fetch recent headlines and summaries for each topic
- Analyze their sentiment (positive, negative, neutral)
- Create a summary table



## Step 0: Scrape Headlines with `requests` + `BeautifulSoup`

If you'd like to explore **another way to get news**, try scraping headlines directly from a news website.

Choose a simple website like:
- https://www.bbc.com/news
- https://edition.cnn.com/
- https://elpais.com/

Use `requests` to download the HTML, and `BeautifulSoup` to extract the titles (look for `<h2>` or `<a>` tags inside news containers).

Store at least 10 headlines in a list. You can later analyze these with the same sentiment pipeline.
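
As a starting point, here is a minimal sketch of the download-and-extract flow. The helper names `fetch_html` and `extract_headlines`, the User-Agent string, and the choice of `<h2>` as the default tag are assumptions for illustration; each site uses its own markup, so inspect the page source and adjust the tags before relying on the output.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    # Some sites reject requests without a browser-like User-Agent (assumed value).
    headers = {"User-Agent": "Mozilla/5.0 (news-sentiment-notebook)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_headlines(html, tags=("h2",), limit=10):
    """Return up to `limit` unique, non-empty headline strings from the given tags."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for tag in soup.find_all(list(tags)):
        text = tag.get_text(" ", strip=True)
        if text and text not in headlines:  # skip blanks and duplicates
            headlines.append(text)
        if len(headlines) >= limit:
            break
    return headlines
```

You could then call `extract_headlines(fetch_html("https://www.bbc.com/news"))` and check that the list really contains headlines rather than navigation labels. If it does not, try passing `tags=("h2", "h3")` or narrowing the search to a specific container.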


## Step 1: Clean Headlines with NLP Tools

Before analyzing the headlines, clean the text using NLP tools:

- Lowercase all text
- Tokenize into words
- Remove stopwords (with `nltk` or `spacy`)
- Lemmatize (optional but recommended)

Show a sample of your cleaned tokens.
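
The cleaning steps above can be sketched as a single function. This is one possible layout, not the required one: `clean_headline` and `load_nltk_tools` are hypothetical names, and the tokenizer is a simple regular expression swapped in for `nltk.word_tokenize` so that the function has no data dependencies. The stopword list and lemmatizer are passed in, so you can plug in either NLTK or spaCy.

```python
import re

# Simple regex tokenizer (an assumption; swap in nltk.word_tokenize if you prefer).
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def clean_headline(text, stopwords, lemmatize=None):
    """Lowercase, tokenize, drop stopwords, and optionally lemmatize a headline."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    tokens = [token for token in tokens if token not in stopwords]
    if lemmatize is not None:
        tokens = [lemmatize(token) for token in tokens]
    return tokens


def load_nltk_tools():
    """Return (stopword set, lemmatize function) built from NLTK resources."""
    # Imported here because these resources need a one-time download.
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    nltk.download("stopwords", quiet=True)
    nltk.download("wordnet", quiet=True)
    return set(stopwords.words("english")), WordNetLemmatizer().lemmatize
```

A typical call would be `stop_set, lemmatize = load_nltk_tools()` followed by `clean_headline(headline, stop_set, lemmatize)` for each headline. Printing the first few results gives you the sample of cleaned tokens.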


## Step 1.5: Check and Correct Spelling with SymSpell

Using your scraped or fetched headlines, introduce a few **intentional spelling mistakes**, or analyze them as-is if they come from social media.

Then:
- Load the SymSpell dictionary
- Detect and suggest corrections for at least 5 tokens
- Display original and corrected terms

Briefly explain whether any real mistakes were found.
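
A possible sketch of this step, assuming the `symspellpy` package and the English frequency dictionary it ships with. The helper names (`introduce_typo`, `load_symspell`, `suggest_corrections`) are hypothetical, and the adjacent-letter swap is just one easy way to create a controlled mistake.

```python
def introduce_typo(token, position=1):
    """Swap the characters at `position` and `position + 1` to fake a typo."""
    if position < 0 or position + 1 >= len(token):
        return token  # too short to swap at this position
    chars = list(token)
    chars[position], chars[position + 1] = chars[position + 1], chars[position]
    return "".join(chars)


def load_symspell(max_edit_distance=2):
    """Build a SymSpell index from the dictionary bundled with symspellpy."""
    # Imported lazily so the typo helper works without symspellpy installed.
    from importlib.resources import files  # Python 3.9+
    from symspellpy import SymSpell

    sym_spell = SymSpell(max_dictionary_edit_distance=max_edit_distance, prefix_length=7)
    dictionary_path = files("symspellpy") / "frequency_dictionary_en_82_765.txt"
    sym_spell.load_dictionary(str(dictionary_path), term_index=0, count_index=1)
    return sym_spell


def suggest_corrections(sym_spell, tokens, max_edit_distance=2):
    """Return one row per token with the closest dictionary suggestion."""
    from symspellpy import Verbosity

    rows = []
    for token in tokens:
        suggestions = sym_spell.lookup(
            token, Verbosity.CLOSEST, max_edit_distance=max_edit_distance
        )
        # Keep the token unchanged when the dictionary has no candidate.
        corrected = suggestions[0].term if suggestions else token
        rows.append({"original": token, "corrected": corrected, "changed": corrected != token})
    return rows
```

For example, build a list such as `[introduce_typo(t) for t in tokens[:5]]`, pass it to `suggest_corrections(load_symspell(), ...)`, and display the rows with `pd.DataFrame(rows)`. Proper nouns and brand names are often missing from the dictionary, so expect some "corrections" that are not real mistakes.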


## 🌐 Step 2: Fetch News from Google News RSS

Use the helper function `fetch_news_items()` to retrieve at least 5 articles per keyword.

Make sure each news item contains:
- Title
- Summary
- Published date
- Link

Display the articles in a DataFrame.
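
If the notebook's own `fetch_news_items()` helper is not available to you, the following stand-in shows roughly what such a function might do. It is a sketch under assumptions, not the course's implementation: the function names are hypothetical, the feed URL pattern is the commonly used Google News search feed, and the field names (`title`, `description`, `pubDate`, `link`) are the standard RSS 2.0 item elements.

```python
import html
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus
from urllib.request import Request, urlopen


def build_feed_url(keyword, lang="en-US", country="US"):
    """Build a Google News RSS search URL for one keyword (assumed URL pattern)."""
    short_lang = lang.split("-")[0]
    return (
        f"https://news.google.com/rss/search?q={quote_plus(keyword)}"
        f"&hl={lang}&gl={country}&ceid={country}:{short_lang}"
    )


def parse_rss(xml_data, keyword, limit=5):
    """Turn RSS XML (str or bytes) into a list of dicts, one per news item."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        # The description usually holds HTML; strip tags and collapse whitespace.
        raw_summary = html.unescape(item.findtext("description", default=""))
        summary = " ".join(re.sub(r"<[^>]+>", " ", raw_summary).split())
        items.append({
            "keyword": keyword,
            "title": item.findtext("title", default="").strip(),
            "summary": summary,
            "published": item.findtext("pubDate", default=""),
            "link": item.findtext("link", default=""),
        })
        if len(items) >= limit:
            break
    return items


def fetch_news_items_sketch(keyword, limit=5, timeout=10):
    """Download and parse the feed for one keyword (requires network access)."""
    request = Request(build_feed_url(keyword), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return parse_rss(response.read(), keyword, limit)
```

To display the results, collect the items for all three keywords into one list and pass it to `pd.DataFrame(...)`. Keeping the `keyword` column in every row makes the grouping in Step 4 straightforward.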


## Step 3: Analyze Sentiment with VADER

Use `SentimentIntensityAnalyzer` from NLTK to evaluate the sentiment of each article.

You should:
- Combine the title and summary
- Compute the sentiment scores
- Classify each article as:
  - positive (compound ≥ 0.05)
  - negative (compound ≤ -0.05)
  - neutral (otherwise)

Store the results in a DataFrame.
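
The scoring and labeling logic could look like the sketch below. The helper names (`make_analyzer`, `classify`, `score_article`) are hypothetical; the thresholds are the ones listed above, and `polarity_scores()` is the NLTK method that returns the `neg`, `neu`, `pos`, and `compound` values.

```python
def make_analyzer():
    """Create a VADER analyzer, downloading the lexicon on first use."""
    # Imported lazily because the lexicon needs a one-time download.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def classify(compound):
    """Map a compound score to a label using the thresholds from this step."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def score_article(analyzer, title, summary):
    """Score the combined title and summary and attach a sentiment label."""
    text = f"{title}. {summary}".strip()
    scores = analyzer.polarity_scores(text)  # dict with neg, neu, pos, compound
    return {**scores, "label": classify(scores["compound"])}
```

One way to build the results table is to apply `score_article` to each row of the Step 2 DataFrame and join the returned dictionaries back as new columns, for instance with `pd.DataFrame(rows)` followed by `pd.concat(..., axis=1)`.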


## Step 4: Summary by Keyword

Group the results by keyword and create a summary that includes:
- Mean compound score
- Percentage of positive, negative, and neutral articles
- Total number of articles

You can use `groupby()` and `agg()` for this part.
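
A minimal sketch of the aggregation, assuming your results DataFrame has columns named `keyword`, `compound`, and `label` (these names are an assumption; rename them to match your own table). Boolean flag columns are created first so that each percentage is simply the mean of a flag.

```python
import pandas as pd


def summarize_by_keyword(df):
    """Return mean compound score, label percentages, and article count per keyword."""
    flags = df.assign(
        is_positive=df["label"].eq("positive"),
        is_negative=df["label"].eq("negative"),
        is_neutral=df["label"].eq("neutral"),
    )
    summary = flags.groupby("keyword").agg(
        mean_compound=("compound", "mean"),
        pct_positive=("is_positive", "mean"),
        pct_negative=("is_negative", "mean"),
        pct_neutral=("is_neutral", "mean"),
        n_articles=("compound", "size"),
    )
    # The mean of a boolean column is a fraction; convert it to a percentage.
    pct_columns = ["pct_positive", "pct_negative", "pct_neutral"]
    summary[pct_columns] = summary[pct_columns] * 100
    return summary
```

Calling `summarize_by_keyword(results).round(2)` gives one row per keyword. With only 5 articles per keyword the percentages move in steps of 20, so treat differences between topics with caution.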


## Bonus: Keyword or Entity Recognition (Regex or FlashText)

Use `FlashText` or `re.findall()` to search for keywords or patterns in your headlines or summaries.

Some ideas:
- Extract keywords like "AI", "Bitcoin", "climate"
- Use regex to extract years (`\d{4}`), dates, or email-like patterns

Display the matches in a table or dictionary format.
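
Both approaches can be sketched in a few lines. The function names and the exact patterns here are illustrative assumptions: the year pattern is the `\d{4}` idea from above with word boundaries added, and the email pattern is deliberately loose rather than a full address validator.

```python
import re

YEAR_PATTERN = re.compile(r"\b\d{4}\b")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def find_keywords(text, keywords):
    """Return keyword matches as they appear in the text (case-insensitive)."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.findall(text)


def find_patterns(text):
    """Collect regex matches for years and email-like strings in a dictionary."""
    return {
        "years": YEAR_PATTERN.findall(text),
        "emails": EMAIL_PATTERN.findall(text),
    }


def extract_with_flashtext(text, keywords):
    """Same keyword search using FlashText's KeywordProcessor."""
    # Imported lazily so the regex helpers work without flashtext installed.
    from flashtext import KeywordProcessor

    processor = KeywordProcessor()  # case-insensitive by default
    for keyword in keywords:
        processor.add_keyword(keyword)
    return processor.extract_keywords(text)
```

Applying `find_keywords(headline, ["AI", "Bitcoin", "climate"])` to each headline and storing the result in a new DataFrame column gives you the table view. Note that `\b\d{4}\b` matches any four-digit number, so a figure such as a price can be mistaken for a year.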
