<center><font color='Black' style='font-family:verdana; font-size:25px'>News Summarization & Hindi TTS - Project Overview</font></center>
<hr style="color: black; height: 1px;">

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Introduction</font></b><br>
<font color="black" style="font-family:Cambria; font-size:16px">
This project is designed to extract news articles related to a given company, summarize them, analyze their sentiment, translate the summary into Hindi, and generate a Hindi text-to-speech (TTS) output. The application is built using Streamlit for the frontend, while NLP techniques and API integrations are handled in the backend.
</font>
<br>

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Objectives</font></b><br>
<ul>
    <li>Fetch the latest news articles for a given company using an API.</li>
    <li>Generate a concise summary of the news articles.</li>
    <li>Perform sentiment analysis to determine whether the news is positive, negative, or neutral.</li>
    <li>Translate the summary into Hindi for better accessibility.</li>
    <li>Convert the Hindi text into speech using Google TTS.</li>
    <li>Provide an interactive UI for users to input company names and retrieve results.</li>
</ul>
<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Workflow</font></b><br>
<ul>
    <li><b>User Input:</b> The user enters the name of a company.</li>
    <li><b>Fetching News:</b> The system retrieves the latest news articles from NewsAPI.</li>
    <li><b>Summarization:</b> The text of the news articles is summarized using NLP models.</li>
    <li><b>Sentiment Analysis:</b> The summarized text is analyzed to determine sentiment polarity.</li>
    <li><b>Translation:</b> The summary is translated into Hindi using Google Translate API.</li>
    <li><b>Text-to-Speech:</b> The translated summary is converted into an audio file using Google TTS.</li>
    <li><b>User Output:</b> The summarized text, sentiment, and Hindi audio output are displayed in the UI.</li>
</ul>
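
<font color="black" style="font-family:Cambria; font-size:16px">The workflow steps above can be sketched as a single function that receives each stage as a callable. This is a minimal sketch, not the project's actual wiring: <code>run_pipeline</code> and its parameter names are hypothetical, and it assumes the fetch stage already returns summarized articles as dictionaries with <code>title</code>, <code>url</code>, and <code>summary</code> keys.</font>

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    company: str,
    fetch: Callable[[str], List[Dict[str, str]]],
    analyze: Callable[[str], Tuple[str, float]],
    translate: Callable[[str], str],
    speak: Callable[[str], str],
) -> Dict[str, object]:
    """Run the workflow stages in order and collect what the UI would display."""
    articles = fetch(company)
    if not articles:
        # Nothing to translate or speak when no news was found
        return {"company": company, "articles": [], "audio": None}

    results = []
    for article in articles:
        label, score = analyze(article["summary"])
        results.append({
            "title": article["title"],
            "url": article["url"],
            "summary": article["summary"],
            "sentiment": label,
            "score": score,
            "summary_hi": translate(article["summary"]),
        })

    # One numbered Hindi script for all articles, spoken as a single audio file
    script = "\n".join(f"{i}. {r['summary_hi']}" for i, r in enumerate(results, 1))
    return {"company": company, "articles": results, "audio": speak(script)}
```

<font color="black" style="font-family:Cambria; font-size:16px">With the functions defined later in this notebook, a call might look like <code>run_pipeline("TCS", fetch_news, analyze_sentiment, translate_to_hindi, text_to_speech_google)</code>. Passing the stages in as arguments also makes it easy to substitute stubs when testing without network access.</font>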
<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Tech Stack</font></b><br>
<ul>
    <li><b>Frontend:</b> Streamlit (for interactive UI)</li>
    <li><b>Backend:</b> Python (Flask APIs, NLP models)</li>
    <li><b>APIs:</b>
        <ul>
            <li>NewsAPI (for fetching news articles)</li>
            <li>Google Translate, accessed through the unofficial googletrans library (for language translation)</li>
            <li>Google Translate's text-to-speech service, accessed through gTTS (for generating speech output)</li>
        </ul>
    </li>
    <li><b>Libraries Used:</b>
        <ul>
            <li>requests (for making API calls)</li>
            <li>nltk (for sentiment analysis using VADER)</li>
            <li>transformers (for text summarization using NLP models)</li>
            <li>googletrans (for translation)</li>
            <li>gTTS (for text-to-speech conversion)</li>
            <li>beautifulsoup4 (for web scraping, if needed in future enhancements)</li>
        </ul>
    </li>
</ul>
<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Implementation Details</font></b><br>
<ul>
    <li><b>1. Fetching News:</b>
        <ul>
            <li>The function <code>fetch_news(company_name)</code> retrieves the latest 5 news articles related to the given company.</li>
            <li>Uses the <code>requests</code> library to make API calls to NewsAPI.</li>
        </ul>
    </li>
    <li><b>2. Text Summarization:</b>
        <ul>
            <li>Uses a transformer-based model for text summarization.</li>
            <li>Function <code>summarize_text(text)</code> processes the news content and returns a short summary.</li>
        </ul>
    </li>
    <li><b>3. Sentiment Analysis:</b>
        <ul>
            <li>Utilizes NLTK's VADER sentiment analysis tool.</li>
            <li>Function <code>analyze_sentiment(text)</code> categorizes news sentiment into Positive, Neutral, or Negative.</li>
        </ul>
    </li>
    <li><b>4. Translation to Hindi:</b>
        <ul>
            <li>Uses Google Translate API to translate English summaries into Hindi.</li>
            <li>Function <code>translate_to_hindi(text)</code> performs the translation.</li>
        </ul>
    </li>
    <li><b>5. Text-to-Speech (TTS):</b>
        <ul>
            <li>Converts Hindi text summaries into speech using Google TTS.</li>
            <li>Function <code>text_to_speech_google(text)</code> generates an MP3 audio file.</li>
        </ul>
    </li>
</ul>
<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Expected Output</font></b><br>
After processing the news articles, the system will display:
<ul>
    <li>News Title & Link</li>
    <li>Summarized Content</li>
    <li>Sentiment Analysis Result</li>
    <li>Hindi Summary</li>
    <li>Audio Output of Hindi Summary (with download option)</li>
</ul>
<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Future Enhancements</font></b><br>
<ul>
    <li>Support for Multiple Languages: Extend translation support beyond Hindi.</li>
    <li>Enhanced Summarization: Fine-tune summarization models for better results.</li>
    <li>Advanced Sentiment Analysis: Implement deep learning models for improved accuracy.</li>
    <li>Real-Time News Feeds: Fetch news in real time rather than being constrained by API request limits.</li>
</ul>
<hr>

<font color="black" style="font-family:Cambria; font-size:16px">
This project effectively combines news extraction, NLP techniques, and text-to-speech synthesis to make news content more accessible and insightful. 🚀
</font>

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import requests
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from transformers import pipeline
from collections import Counter
from gtts import gTTS  # Google TTS
import os
from googletrans import Translator  # Google Translate API




In [3]:
#  Download the NLTK resources (the VADER lexicon)
nltk.download('vader_lexicon')

#  Load the transformer-based summarization model
summarizer = pipeline("summarization")

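#  NewsAPI key, read from the environment instead of being hard-coded
#  (NEWSAPI_KEY is an assumed variable name; set it before running the notebook)
API_KEY = os.environ.get("NEWSAPI_KEY", "")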

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\shaik\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


<b><font color='#067F7D' style='font-family:cambria; font-size:20px'>Functions Overview</font></b><br>

<font color="black" style="font-family:Cambria; font-size:16px">This section explains the key functions used in the News Summarization & Hindi TTS project. Each function is designed to handle a specific task, ensuring smooth execution from fetching news to generating audio output.</font>

<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:18px'>1. Fetch News: <code>fetch_news(company_name)</code></font></b><br>
<font color="black" style="font-family:Cambria; font-size:16px">This function retrieves the latest news articles related to the specified company from NewsAPI and summarizes each one.</font>

<ul>
    <li><b>Input:</b> Company name (string)</li>
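    <li><b>Process:</b> Requests up to five English-language articles from NewsAPI and summarizes each article's description (or its title when the description is empty).</li>
    <li><b>Output:</b> Returns a list of dictionaries, each containing the article's title, URL, and summary. The list is empty if the request fails or no articles are found.</li>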
</ul>
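
<font color="black" style="font-family:Cambria; font-size:16px">Company names can contain spaces or characters such as <code>&amp;</code>, which must be percent-encoded before they are placed in a query string. The sketch below shows one way to build the request URL with the standard library. The helper name <code>build_newsapi_url</code> and the placeholder key are assumptions for illustration; the endpoint and parameters are the ones used in this notebook.</font>

```python
from urllib.parse import urlencode

NEWSAPI_ENDPOINT = "https://newsapi.org/v2/everything"


def build_newsapi_url(query: str, api_key: str, page_size: int = 5) -> str:
    """Return the request URL with every parameter percent-encoded."""
    params = {
        "q": query,
        "language": "en",
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWSAPI_ENDPOINT}?{urlencode(params)}"


# "YOUR_KEY" is a placeholder, not a real credential
url = build_newsapi_url("Johnson & Johnson", api_key="YOUR_KEY")
print(url)
```

<font color="black" style="font-family:Cambria; font-size:16px">Passing the same dictionary to <code>requests.get(NEWSAPI_ENDPOINT, params=params)</code> achieves the same encoding without building the string by hand.</font>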

<hr>

<b><font color='#067F7D' style='font-family:cambria; font-size:18px'>2. Summarization: <code>summarize_text(text)</code></font></b><br>
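<font color="black" style="font-family:Cambria; font-size:16px">This step generates a concise summary of each news article using a transformer-based model. In this notebook's code cells the summarization call runs inside <code>fetch_news</code>; no standalone <code>summarize_text</code> function is defined.</font>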

<ul>
    <li><b>Input:</b> News article text (string)</li>
    <li><b>Process:</b> Uses a transformer-based NLP model to generate a summary.</li>
    <li><b>Output:</b> Returns the summarized text (shortened version of the article).</li>
</ul>
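
<font color="black" style="font-family:Cambria; font-size:16px">A standalone <code>summarize_text</code> could look like the sketch below. It is an illustration under stated assumptions, not the notebook's code: the <code>summarizer</code> argument is assumed to behave like the object returned by <code>pipeline("summarization")</code>, the <code>max_words</code> and <code>min_words</code> parameters are hypothetical, and word count is used only as a rough proxy for the token counts the model actually works with.</font>

```python
from typing import Callable, Dict, List


def summarize_text(
    text: str,
    summarizer: Callable[..., List[Dict[str, str]]],
    max_words: int = 50,
    min_words: int = 10,
) -> str:
    """Summarize text, returning very short inputs unchanged."""
    text = (text or "").strip()
    n_words = len(text.split())
    if n_words <= min_words:
        # Too short to compress meaningfully; skip the model call
        return text

    # Cap the summary at about half the input's word count, but keep the cap
    # above min_words so the two length limits never contradict each other
    max_len = min(max_words, max(min_words + 1, n_words // 2))
    result = summarizer(text, max_length=max_len, min_length=min_words, do_sample=False)
    return result[0]["summary_text"].strip()
```

<font color="black" style="font-family:Cambria; font-size:16px">With the model loaded earlier in the notebook, usage would be <code>summarize_text(article["description"], summarizer)</code>. Deriving the cap from the input length is one way to avoid the "max_length is set to 50, but your input_length is only ..." warning that the pipeline prints for short inputs.</font>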

In [4]:
def fetch_news(query):
    """
    Given a query, fetch recent news from NewsAPI and summarize each article.
    """
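    url = "https://newsapi.org/v2/everything"
    # Pass the query via params so requests URL-encodes it (names with spaces, "&", etc.)
    params = {"q": query, "apiKey": API_KEY, "language": "en", "pageSize": 5}

    response = requests.get(url, params=params, timeout=10)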
    print("DEBUG: Response Status Code:", response.status_code)

    if response.status_code != 200:
        print("❌ Request failed!")
        return []

    data = response.json()
    if "articles" not in data or not data["articles"]:
        print("🚨 No articles found!")
        return []

    news_list = []
    for article in data["articles"]:
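        # Use the description, falling back to the title; skip articles that have neither
        full_text = article.get("description") or article.get("title")
        if not full_text:
            continue

        # 🔹 Summarization. Note: len(full_text) counts characters, not tokens, so this
        # cap is only a rough heuristic and can exceed the input length for short texts.
        summary = summarizer(full_text, max_length=min(50, len(full_text)//2), min_length=10, do_sample=False)[0]['summary_text']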

        news_list.append({
            "title": article["title"],
            "url": article["url"],
            "summary": summary
        })

    return news_list

<hr>


<b><font color='#067F7D' style='font-family:cambria; font-size:18px'>3. Sentiment Analysis: <code>analyze_sentiment(text)</code></font></b><br>
<font color="black" style="font-family:Cambria; font-size:16px">This function determines the sentiment of the summarized news text, categorizing it as positive, negative, or neutral.</font>

<ul>
    <li><b>Input:</b> Summarized news text (string)</li>
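    <li><b>Process:</b> Uses VADER sentiment analysis from the NLTK library and classifies the compound score: 0.05 or higher is Positive, -0.05 or lower is Negative, and anything in between is Neutral.</li>
    <li><b>Output:</b> Returns a tuple of the sentiment category (Positive, Neutral, or Negative) and the compound score.</li>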
</ul>
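
<font color="black" style="font-family:Cambria; font-size:16px">The classification rule can be isolated from the VADER call so that it is easy to check on its own. The sketch below applies the same ±0.05 cut-offs used in this notebook to a compound score, which VADER reports on a scale from -1 to 1. The function and constant names are hypothetical.</font>

```python
from typing import Tuple

POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(compound: float) -> Tuple[str, float]:
    """Map a VADER compound score to a (label, score) pair."""
    if compound >= POSITIVE_THRESHOLD:
        return "Positive", compound
    if compound <= NEGATIVE_THRESHOLD:
        return "Negative", compound
    # Scores strictly between the two thresholds are treated as Neutral
    return "Neutral", compound
```

<font color="black" style="font-family:Cambria; font-size:16px">For example, a compound score of 0.4215 maps to Positive, while 0.0 maps to Neutral. Keeping the thresholds as named constants makes them simple to tune if the Neutral band proves too narrow for news text.</font>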

In [5]:
def analyze_sentiment(text):
    """
    Analyze sentiment of a given text using NLTK VADER.
    """
    sia = SentimentIntensityAnalyzer()
    sentiment_score = sia.polarity_scores(text)

    if sentiment_score['compound'] >= 0.05:
        return "Positive", sentiment_score['compound']
    elif sentiment_score['compound'] <= -0.05:
        return "Negative", sentiment_score['compound']
    else:
        return "Neutral", sentiment_score['compound']


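<b><font color='#067F7D' style='font-family:cambria; font-size:18px'>4. Comparative Analysis: <code>comparative_analysis(articles)</code></font></b><br>
<font color="black" style="font-family:Cambria; font-size:16px">This function compares the fetched articles with one another, reporting the sentiment distribution, coverage differences between consecutive articles, and topic overlap.</font>

<ul>
    <li><b>Input:</b> A list of article dictionaries, as returned by <code>fetch_news</code>.</li>
    <li><b>Process:</b> Analyzes the sentiment of each summary, tallies the labels, takes the first three words of each summary as a rough topic keyword, and contrasts each article with the next one.</li>
    <li><b>Output:</b> Returns a dictionary with the keys "Sentiment Distribution", "Coverage Differences", and "Topic Overlap".</li>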
</ul>
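
<font color="black" style="font-family:Cambria; font-size:16px">The tallying part of such an analysis can be sketched with <code>collections.Counter</code>. The two helpers below are hypothetical names introduced for illustration; they operate on precomputed sentiment labels and topic strings, so they run without any model or network call.</font>

```python
from collections import Counter
from typing import Dict, Iterable, List


def sentiment_distribution(labels: Iterable[str]) -> Dict[str, int]:
    """Count labels, always reporting all three categories."""
    counts = Counter(labels)
    return {name: counts.get(name, 0) for name in ("Positive", "Negative", "Neutral")}


def split_topics(topics: List[str]) -> Dict[str, List[str]]:
    """Separate topics seen more than once from topics seen exactly once."""
    freq = Counter(topics)
    return {
        "Common Topics": [t for t, n in freq.items() if n > 1],
        "Unique Topics": [t for t, n in freq.items() if n == 1],
    }
```

<font color="black" style="font-family:Cambria; font-size:16px">To compare two companies, one option is to call <code>sentiment_distribution</code> once per company and place the two dictionaries side by side.</font>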

In [6]:

def comparative_analysis(articles):
    """
    Compare sentiment distribution, coverage differences, and topic overlap.
    """
    sentiment_counts = {"Positive": 0, "Negative": 0, "Neutral": 0}
    topic_keywords = []
    coverage_differences = []

    for article in articles:
        sentiment, _ = analyze_sentiment(article["summary"])
        sentiment_counts[sentiment] += 1

        words = article["summary"].split()[:3]  
        topic_keywords.append(" ".join(words))

    topic_freq = Counter(topic_keywords)
    common_topics = [topic for topic, freq in topic_freq.items() if freq > 1]
    unique_topics = [topic for topic, freq in topic_freq.items() if freq == 1]

    for i in range(len(articles) - 1):
        coverage_differences.append({
            "Comparison": f"Article {i+1} vs Article {i+2}",
            "Impact": f"{articles[i]['title']} focuses on {topic_keywords[i]}, "
                      f"while {articles[i+1]['title']} focuses on {topic_keywords[i+1]}."
        })

    return {
        "Sentiment Distribution": sentiment_counts,
        "Coverage Differences": coverage_differences,
        "Topic Overlap": {
            "Common Topics": common_topics,
            "Unique Topics": unique_topics
        }
    }



<b><font color='#067F7D' style='font-family:cambria; font-size:18px'>5. Translate to Hindi: <code>translate_to_hindi(text)</code></font></b><br>
<font color="black" style="font-family:Cambria; font-size:16px">This function translates the summarized news from English to Hindi using Google Translate, accessed through the googletrans library.</font>

<ul>
    <li><b>Input:</b> English text (string)</li>
    <li><b>Process:</b> Creates a googletrans <code>Translator</code> and translates the text from English (<code>en</code>) to Hindi (<code>hi</code>).</li>
    <li><b>Output:</b> Returns the Hindi-translated text.</li>
</ul>
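
<font color="black" style="font-family:Cambria; font-size:16px">Because googletrans is an unofficial client, a translation call can fail on network errors or rate limits. The sketch below wraps any translation callable so that a failure falls back to the English text instead of stopping the run. <code>translate_with_fallback</code> is a hypothetical helper, and falling back to English is an assumed design choice rather than the project's stated behavior.</font>

```python
from typing import Callable


def translate_with_fallback(text: str, translate: Callable[[str], str]) -> str:
    """Translate text, returning the original text if translation fails."""
    if not text or not text.strip():
        # Nothing to translate
        return ""
    try:
        translated = translate(text)
    except Exception as exc:  # e.g. network errors, rate limits, parsing failures
        print(f"Translation failed, keeping English text: {exc}")
        return text
    # Some failures surface as an empty result rather than an exception
    return translated or text
```

<font color="black" style="font-family:Cambria; font-size:16px">With the function defined in this notebook, usage would be <code>translate_with_fallback(summary, translate_to_hindi)</code>.</font>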

In [7]:
def translate_to_hindi(text):
    """
    Translate English text to Hindi using Google Translate API.
    """
    translator = Translator()
    translated_text = translator.translate(text, src="en", dest="hi")
    return translated_text.text

<b><font color='#067F7D' style='font-family:cambria; font-size:18px'>6. Text-to-Speech (TTS): <code>text_to_speech_google(text)</code></font></b><br>
<font color="black" style="font-family:Cambria; font-size:16px">This function converts the translated Hindi text into an MP3 audio file using Google TTS.</font>

<ul>
    <li><b>Input:</b> Hindi text (string)</li>
    <li><b>Process:</b> Uses Google TTS API to generate speech from the text.</li>
    <li><b>Output:</b> Saves an MP3 audio file containing the spoken version of the news summary and returns its filename.</li>
</ul>
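
<font color="black" style="font-family:Cambria; font-size:16px">For a web UI it can be convenient to keep the audio in memory rather than writing a file. The sketch below uses gTTS's <code>write_to_fp</code> method to fill a <code>BytesIO</code> buffer. The function name <code>synthesize_to_bytes</code> and the <code>tts_factory</code> parameter are hypothetical; the factory exists only so the function can be exercised with a stand-in class when gTTS or network access is unavailable.</font>

```python
import io
from typing import Callable, Optional


def synthesize_to_bytes(
    text: str,
    tts_factory: Optional[Callable[..., object]] = None,
    lang: str = "hi",
) -> bytes:
    """Return MP3 audio for the text as bytes, without touching the disk."""
    if not text or not text.strip():
        raise ValueError("text is empty; nothing to synthesize")
    if tts_factory is None:
        # Imported lazily so the module loads even when gTTS is not installed
        from gtts import gTTS
        tts_factory = gTTS

    buffer = io.BytesIO()
    tts_factory(text=text, lang=lang, slow=False).write_to_fp(buffer)
    return buffer.getvalue()
```

<font color="black" style="font-family:Cambria; font-size:16px">In a Streamlit app, the returned bytes could be passed to <code>st.audio(audio_bytes, format="audio/mp3")</code> for playback and to <code>st.download_button</code> for the download option.</font>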

In [8]:
def text_to_speech_google(text, filename="news_summary.mp3"):
    """
    Convert text to speech in Hindi using Google TTS and save it as an MP3 file.
    """
    tts = gTTS(text=text, lang="hi", slow=False)
    tts.save(filename)
    print(f"🔊 Audio saved as {filename}")
    return filename

<font color="Brown" style="font-family:Cambria; font-size:16px">These functions work together to form the backbone of the News Summarization & Hindi TTS project, ensuring efficient data retrieval, processing, and output generation.</font>


In [9]:
#  Fetch news for a company
company_name = "TCS"
articles = fetch_news(company_name)

if not articles:
    print(" No news found for", company_name)
else:
    print(f"✅ Found {len(articles)} news articles for {company_name}.\n")

    for i, article in enumerate(articles, 1):
        sentiment, score = analyze_sentiment(article['summary'])

        print(f"{i}. {article['title']} ({article['url']})")
        print(f"Summary: {article['summary']}")
        print(f"Sentiment: {sentiment} (Score: {score})\n")
    
    # 🚀 Run comparative analysis
    analysis_result = comparative_analysis(articles)
    print("📊 Comparative Analysis:\n", analysis_result)

    # 🎙 Generate Hindi Audio using Google TTS
    final_summary = "\n".join([f"{i+1}. {translate_to_hindi(art['summary'])}" for i, art in enumerate(articles)])
    tts_filename = text_to_speech_google(final_summary, "news_summary.mp3")
    
    print('-'*50)

    print(f" Hindi TTS Generated: {tts_filename}")


Your max_length is set to 50, but your input_length is only 36. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=18)


DEBUG: Response Status Code: 200
✅ Found 5 news articles for TCS.

1. Top US Marathon Charity Programs Break Fundraising Records (https://www.forbes.com/sites/davidhessekiel/2025/02/26/top-us-marathon-charity-programs-break-fundraising-records/)
Summary:  The TCS New York City Marathon made it official today: the charity programs of all three of America’s most prestigious marathons raised record amounts in 2024 .
Sentiment: Positive (Score: 0.4215)

2. Brit supermarket finds breaking up is hard to do as Walmart-Asda divorce stretches into fourth year (https://www.theregister.com/2025/03/19/asda_walmart_tech_divorce/)
Summary:  'Three-year' tech support deal still running from Walmart . UK's third-largest grocery retailer is set to finish its "three-year" tech divorce project . Most project staff have been moved on .
Sentiment: Positive (Score: 0.4019)

3. Xiaomi expands its AIoT lineup with new accessories (https://phandroid.com/2025/03/03/xiaomi-expands-its-aiot-lineup-with-new-access