**Project Overview**

---


This project is a smart email assistant that helps users manage their inboxes more efficiently.
It ranks emails by their relevance to keywords provided by the user, creates short summaries
of the top-ranked messages, and can read those summaries aloud in a selected language.
The system combines TF-IDF ranking, TextRank summarization, and text-to-speech conversion
to make email triage faster and easier.

**Key Features of the System**


---
1) *File Upload:* Upload a CSV file containing email data.  
2) *Email Display:* View the subject and preview of each uploaded email.  
3) *Priority Ranking:* Sorts and highlights important emails based on user keywords using TF-IDF.  
4) *Summarization:* Generates short summaries of the top-ranked emails using the TextRank algorithm.  
5) *Multi-Language Support:* Converts summaries into speech in different languages (English, Hindi, Telugu, Tamil, etc.).  
6) *Speech Output:* Plays the generated audio directly in Colab for easy listening.  
7) *Download Option:* Lets users download the generated audio file for later use.
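
To make the priority-ranking feature concrete, here is a minimal, standard-library sketch of TF-IDF scoring with cosine similarity. It is a simplified illustration, not the notebook's implementation: the notebook relies on scikit-learn's `TfidfVectorizer`, which also removes English stop words. The helper names (`tokenize`, `tfidf_vectors`, `rank_emails`) and the sample emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf, so every known term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_emails(emails, keywords):
    """Return (index, score) pairs, highest score first."""
    vectors, idf = tfidf_vectors(emails)
    # Keywords that never occur in the emails are ignored.
    query_counts = Counter(t for kw in keywords for t in tokenize(kw) if t in idf)
    query = {t: c * idf[t] for t, c in query_counts.items()}
    scores = [cosine(query, v) for v in vectors]
    order = sorted(range(len(emails)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


emails = [
    "Meeting schedule for project discussion tomorrow at 10 AM",
    "Lunch invitation for this weekend",
    "Urgent: server maintenance tonight",
]
ranking = rank_emails(emails, ["meeting", "urgent"])
for index, score in ranking:
    print(f"{score:.3f}  {emails[index]}")
```

Emails that contain none of the keywords receive a score of zero and sink to the bottom of the ranking.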

**Tech Stack**


---
1) *Python* – Core programming language used for building the project.  
2) *Pandas* – For reading and processing the CSV file containing email data.  
3) *Scikit-learn (TF-IDF Vectorizer)* – To identify and prioritize important emails based on keywords.  
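4) *Sumy (with NLTK tokenizers)* – For text summarization using the TextRank algorithm.  
5) *googletrans* – To translate summaries into the selected language before speech synthesis.  
6) *gTTS (Google Text-to-Speech)* – To convert summaries into speech.  
7) *IPython.display* – To play the generated audio directly in Colab.  
8) *google.colab.files* – For file upload and download; user input is read with Python's `input()`.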
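
The summarization step relies on TextRank. As a rough illustration of the idea, the sketch below scores sentences by running a PageRank-style iteration over a word-overlap similarity graph and keeps the top sentences in their original order. It uses only the standard library; the notebook itself calls Sumy's `TextRankSummarizer`, whose internals may differ. The function names, the naive sentence splitter, and the sample text are assumptions made for this example.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by log sentence lengths (TextRank-style)."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, sentences_count=3, damping=0.85, iterations=50):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= sentences_count:
        return sentences
    # Weighted, undirected similarity graph between sentences.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:sentences_count]
    return [sentences[i] for i in sorted(top)]


text = (
    "The server will be under maintenance tonight. "
    "Please save your work before the maintenance starts. "
    "The maintenance will take the server offline for four hours. "
    "Lunch is on Friday."
)
for sentence in textrank_summary(text, sentences_count=2):
    print(sentence)
```

A sentence that shares no words with the rest of the email, such as the lunch note here, receives only the baseline score and is left out of the summary.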

**How to Use the Notebook**

---



1. *Upload Email Data:*  
   Use the file upload option to upload your CSV file containing email data.
2. *View Emails:*  
   The notebook will display the subject lines and short previews of all uploaded emails.
3. *Enter Keywords:*  
   Provide keywords that represent your priority topics (e.g., "meeting", "invoice", "urgent").
4. *Email Prioritization:*  
   The system ranks the emails based on their relevance to the entered keywords.
5. *Summarization:*  
   Top-ranked emails are summarized using the TextRank algorithm.
6. *Text-to-Speech Conversion:*  
   Select a language and convert the generated summaries into audio.
7. *Play or Download Audio:*  
   Listen to the audio output directly in Colab or download it for later use.
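
For step 1, the CSV file needs a column holding the email text (for example `Content` or `Body`) and, optionally, a `Subject` column. The sketch below shows a minimal input file and the kind of case-insensitive column detection the notebook performs. It uses the standard `csv` module instead of pandas to stay self-contained; the `load_emails` helper and the sample rows are illustrative assumptions.

```python
import csv
import io

# Illustrative input: a header row followed by one email per line.
SAMPLE_CSV = """Subject,Content
Meeting Schedule,"Hello team, we have a meeting tomorrow at 10 AM."
Lunch Invitation,"Hey everyone, let's grab lunch this weekend."
"""

# Column names accepted as the email body, checked in this order.
TEXT_COLUMNS = ["content", "body", "message", "email text", "text"]


def load_emails(csv_text):
    """Parse CSV text into a list of {'subject', 'text'} dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        # Normalize header names; treat missing values as empty strings.
        rows.append({
            key.strip().lower(): (value or "")
            for key, value in raw.items()
            if key is not None
        })
    headers = [name.strip().lower() for name in (reader.fieldnames or [])]
    text_col = next((c for c in TEXT_COLUMNS if c in headers), None)
    if text_col is None:
        raise ValueError("No email content column found (e.g. 'Content' or 'Body').")
    return [
        {"subject": row.get("subject", ""), "text": row.get(text_col, "")}
        for row in rows
    ]


for email in load_emails(SAMPLE_CSV):
    print(f"{email['subject']}: {email['text'][:40]}")
```

Quoting the content field, as in the sample, keeps commas inside the email body from being read as column separators.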

In [8]:
# Priority-Based Email Summarizer with Multi-Language Speech & Download

!pip install pandas scikit-learn gtts googletrans==4.0.0-rc1 sumy nltk

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer
from gtts import gTTS
from googletrans import Translator
import nltk
import os
from google.colab import files
from IPython.display import Audio, display

# Download NLTK tokenizer
nltk.download('punkt')
nltk.download('punkt_tab')

# Step 1: Upload and Read CSV File
print("Upload your email CSV file (must include 'Subject' and 'Content' columns):")
uploaded = files.upload()
file_name = list(uploaded.keys())[0]
emails_df = pd.read_csv(file_name)

# Normalize column names
emails_df.columns = [c.strip().lower() for c in emails_df.columns]

# Detect main content column
possible_text_cols = ["content", "body", "message", "email text", "text"]
text_col = None
for col in possible_text_cols:
    if col in emails_df.columns:
        text_col = col
        break
if text_col is None:
    raise ValueError("Could not find an email content column. Include a column like 'Content' or 'Body'.")

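# Combine subject and content (missing values become empty strings so slicing and concatenation are safe)
emails_df["subject"] = emails_df["subject"].fillna("").astype(str) if "subject" in emails_df.columns else ""
emails_df[text_col] = emails_df[text_col].fillna("").astype(str)
emails_df["combined"] = emails_df["subject"] + " " + emails_df[text_col]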

# Step 2: Display Uploaded Emails
print("\nUploaded Emails:\n")
for i, row in emails_df.iterrows():
    snippet = row[text_col][:100].replace('\n', ' ')
    print(f"{i+1}. Subject: {row['subject'][:60]}")
    print(f"   Preview: {snippet}...\n")

# Step 3: Input Priority Keywords
priority_keywords = input("Enter priority keywords (comma-separated): ").lower()
# Drop empty entries, such as those produced by trailing commas
priority_list = [kw.strip() for kw in priority_keywords.split(",") if kw.strip()]

# Step 4: Rank Emails Based on Keywords
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(emails_df["combined"])
priority_vector = vectorizer.transform([" ".join(priority_list)])
scores = cosine_similarity(priority_vector, tfidf_matrix).flatten()
emails_df["score"] = scores
emails_df_sorted = emails_df.sort_values(by="score", ascending=False).reset_index(drop=True)

print("\nTop 10 priority emails based on your keywords:\n")
print(emails_df_sorted[["subject", "score"]].head(10))

# Step 5: Summarize Top N Emails
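num_to_summarize = int(input("\nEnter how many top emails to summarize: "))
# Clamp to the number of available emails to avoid an IndexError below
num_to_summarize = max(0, min(num_to_summarize, len(emails_df_sorted)))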
summaries = []
summarizer = TextRankSummarizer()

for i in range(num_to_summarize):
    email_text = emails_df_sorted.iloc[i]["combined"]
    parser = PlaintextParser.from_string(email_text, Tokenizer("english"))
    summary_sentences = summarizer(parser.document, sentences_count=3)
    summary = " ".join(str(sentence) for sentence in summary_sentences)
    summaries.append(summary)
    print(f"\nSummary of Email {i+1}:")
    print(summary)

# Step 6: Choose Number of Summaries to Convert to Speech
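num_to_speak = int(input(f"\nHow many of these {num_to_summarize} summaries to convert to speech? "))
# Clamp to the number of summaries actually generated
num_to_speak = max(0, min(num_to_speak, len(summaries)))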

# Step 7: Choose Language for Speech
print("\nSelect language for speech output:")
languages = {
    "1": ("en", "English"),
    "2": ("hi", "Hindi"),
    "3": ("ta", "Tamil"),
    "4": ("ml", "Malayalam"),
    "5": ("kn", "Kannada"),
    "6": ("fr", "French"),
    "7": ("te", "Telugu"),
    "8": ("es", "Spanish")
}
for k, v in languages.items():
    print(f"{k}. {v[1]}")

lang_choice = input("\nEnter your choice (1-8): ").strip()
language_code, language_name = languages.get(lang_choice, ("en", "English"))
print(f"\nSelected language: {language_name}")

# Step 8: Convert Summaries to Speech
translator = Translator()
for i in range(num_to_speak):
    print(f"\nEmail {i+1} Summary:\n")
    summary_text = summaries[i]
    print(summary_text, "\n")

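    # Translate only when the target language is not English (summaries are assumed to be in English)
    if language_code != "en":
        translated_text = translator.translate(summary_text, dest=language_code).text
    else:
        translated_text = summary_text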

    # Convert to speech and save
    tts = gTTS(text=translated_text, lang=language_code)
    output_file = f"email_summary_{i+1}_{language_code}.mp3"
    tts.save(output_file)

    # Play and optionally download
    display(Audio(output_file, autoplay=True))
    download_choice = input("Download this audio file? (y/n): ").strip().lower()
    if download_choice == "y":
        files.download(output_file)

print("\nAll selected emails summarized and converted successfully.")


Upload your email CSV file (must include 'Subject' and 'Content' columns):


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


Saving emails.csv to emails (10).csv

Uploaded Emails:

1. Subject: Meeting Schedule for Project Discussion
   Preview: Hello team, we have a meeting scheduled tomorrow at 10 AM to discuss the project updates and next st...

2. Subject: Important: Deadline Extended
   Preview: The submission deadline for the project has been extended to next Friday. Kindly utilize this time t...

3. Subject: Lunch Invitation
   Preview: Hey everyone, let's grab lunch together this weekend to relax and catch up after a busy week....

4. Subject: Urgent: Server Maintenance
   Preview: Attention all team members, the main server will be under maintenance tonight from 11 PM to 3 AM. Pl...

5. Subject: Weekly Progress Report
   Preview: Please submit your weekly progress reports by end of the day. This helps us keep track of ongoing de...

6. Subject: Security Alert: Password Expiration
   Preview: Your account password will expire in 2 days. Please reset it immediately to avoid access issues....

7. Subjec

Download this audio file? (y/n): y




All selected emails summarized and converted successfully.
