# GenAI Internship Project: Automated Text Summarization and Sentiment Analysis
**Author:** [Souraj Saha]
**Domain:** Generative AI & NLP

In [1]:
pip install transformers torch pandas



In [5]:
results = []

print("Processing texts...")

for text in data:
    # 1. Generate Summary (The GenAI part)
    # Calculate dynamic max_length and min_length to avoid summaries longer than input
    word_count = len(text.split())
    dynamic_max_length = min(30, max(1, word_count - 1)) # Cap at 30, ensure at least 1, and shorter than original if possible
    dynamic_min_length = max(10, min(1, word_count // 2)) # At least 10, but not more than half the original, and at least 1.

    # Ensure max_length is always greater than or equal to min_length
    if dynamic_max_length < dynamic_min_length:
        dynamic_max_length = dynamic_min_length

    summary_result = summarizer(text, max_length=dynamic_max_length, min_length=dynamic_min_length, do_sample=False)
    summary_text = summary_result[0]['summary_text']

    # 2. Analyze Sentiment
    sentiment_result = sentiment_analyzer(text)
    sentiment_label = sentiment_result[0]['label']
    sentiment_score = sentiment_result[0]['score']

    # Store results
    results.append({
        "Original Text": text,
        "AI Summary": summary_text,
        "Sentiment": sentiment_label,
        "Confidence Score": f"{sentiment_score:.2f}"
    })

# Convert results to a clean table
final_df = pd.DataFrame(results)

print("Processing Complete! Here are the results:")
display(final_df)

Processing texts...


Your max_length is set to 30, but your input_length is only 24. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=12)


Processing Complete! Here are the results:


Unnamed: 0,Original Text,AI Summary,Sentiment,Confidence Score
0,I absolutely love this new laptop! The screen ...,I absolutely love this new laptop! The screen...,POSITIVE,1.0
1,The product was a total disappointment. It arr...,The product was a total disappointment . It a...,NEGATIVE,1.0
2,It's a decent phone for the price. The camera ...,"The camera is good, but the software is a bit...",NEGATIVE,0.98
3,The customer service was amazing. They helped ...,The customer service was amazing. They helped...,POSITIVE,1.0
4,I waited three weeks for delivery. When it arr...,I waited three weeks for delivery. When it ar...,NEGATIVE,0.99


In [4]:
data = [
    "I absolutely love this new laptop! The screen is incredibly sharp, and the battery life lasts me an entire day of work. I would highly recommend this to anyone.",
    "The product was a total disappointment. It arrived late, the packaging was damaged, and the device overheats within 10 minutes of use. I am returning it immediately.",
    "It's a decent phone for the price. The camera is good, but the software is a bit buggy. I think it's okay for a budget device, but don't expect premium features.",
    "The customer service was amazing. They helped me solve my issue in less than 5 minutes. The product itself works exactly as advertised. Very happy!",
    "I waited three weeks for delivery. When it arrived, it didn't even turn on. Terrible experience."
]

df = pd.DataFrame(data, columns=["Original Text"])
print("Data loaded:")
display(df)

Data loaded:


Unnamed: 0,Original Text
0,I absolutely love this new laptop! The screen ...
1,The product was a total disappointment. It arr...
2,It's a decent phone for the price. The camera ...
3,The customer service was amazing. They helped ...
4,I waited three weeks for delivery. When it arr...


## Methodology
We are utilizing Hugging Face `transformers` library to implement two NLP tasks:
1. **Summarization:** Using the `distilbart-cnn-12-6` model to generate concise summaries of long reviews.
2. **Sentiment Analysis:** Using `distilbert` to classify feedback as Positive or Negative.

In [3]:
print("Loading models... this might take a minute.")
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print("Models loaded successfully!")

Loading models... this might take a minute.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json: 0.00B [00:00, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

Device set to use cuda:0


config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Device set to use cuda:0


Models loaded successfully!


In [2]:
from transformers import pipeline
import pandas as pd

## Conclusion
This project demonstrates the capability of Large Language Models (LLMs) to automate customer feedback analysis. By combining summarization and sentiment analysis, businesses can quickly process large volumes of text data to derive actionable insights.