# Unit 1 Project: Chat Log Summarizer (33)
**Student Name**: A Shri Karthik  
**Category**: Business & Education Summarization

---

## 1. Setup and Imports

In [3]:
import textwrap

from transformers import pipeline

def print_project_header(title):
    """Print the title centered between two 80-character rules."""
    print(f"\n{'=' * 80}")
    print(title.center(80))
    print("=" * 80)

## 2. Initialize the Specialized Chat Summarizer and Simulated Input Chat Log

In [6]:
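# BART-large checkpoint fine-tuned for dialogue summarization on SAMSum
model_name = "philschmid/bart-large-cnn-samsum"
# device=-1 runs inference on the CPU
chat_summarizer = pipeline("summarization", model=model_name, device=-1)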

chat_log = """
Abhishek: bro did you do the assignment

Karthik: which one

Abhishek: the Generative AI one

Karthik: there are FOUR assignments with AI in the name

Abhishek: ok fair. the one due today

Karthik: today as in… today today?

Abhishek: yes. midnight. scary midnight.

Karthik: im just starting it.

Abhishek: lol. same.

Karthik: what even is it about

Abhishek: something about encoders, decoders, and why transformers ruined my sleep schedule

Karthik: oh that. i watched half the lecture at 2x speed

Abhishek: and understood?

Karthik: emotionally, no. academically, also no.

Abhishek: do we have notes?

Karthik: Aditya said he’ll upload them “soon”

Abhishek: define soon

Karthik: soon like “after dinner” but dinner never ends

Abhishek: classic.

Karthik: should we just submit vibes

Abhishek: i think that’s what the model would want
"""

Device set to use cpu


## 3. Generate the Summary

In [7]:
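# max_length and min_length are measured in tokens, not words.
# do_sample=False disables random sampling, so repeated runs give the same summary.
summary_output = chat_summarizer(
    chat_log,
    max_length=50,
    min_length=15,
    do_sample=False
)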

print_project_header("PROJECT #33: CHAT LOG SUMMARIZER")
print(f"\n[ORIGINAL CHAT LOG]:\n{chat_log}")

print("-" * 80)
print("[GENERATED SUMMARY]:")
final_summary = summary_output[0]['summary_text']
print(textwrap.fill(final_summary, width=80))
print("-" * 80)


                        PROJECT #33: CHAT LOG SUMMARIZER                        

[ORIGINAL CHAT LOG]:

Abhishek: bro did you do the assignment

Karthik: which one

Abhishek: the Generative AI one

Karthik: there are FOUR assignments with AI in the name

Abhishek: ok fair. the one due today

Karthik: today as in… today today?

Abhishek: yes. midnight. scary midnight.

Karthik: im just starting it.

Abhishek: lol. same.

Karthik: what even is it about

Abhishek: something about encoders, decoders, and why transformers ruined my sleep schedule

Karthik: oh that. i watched half the lecture at 2x speed

Abhishek: and understood?

Karthik: emotionally, no. academically, also no.

Abhishek: do we have notes?

Karthik: Aditya said he’ll upload them “soon”

Abhishek: define soon

Karthik: soon like “after dinner” but dinner never ends

Abhishek: classic.

Karthik: should we just submit vibes

Abhishek: i think that’s what the model would want

-------------------------------------------------



## 1. Project Overview
This project addresses the challenge of summarizing informal, multi-turn dialogue. Chat logs are inherently "noisy": they contain sentence fragments, slang, and rapidly alternating speakers. The goal is to take such a log as input and generate a shorter, coherent summary that retains the core meaning.
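
To make the "multi-turn" structure concrete, here is a minimal sketch that splits a raw log into speaker turns before summarization. The helpers `parse_turns` and `to_dialogue` are hypothetical (they are not part of the notebook), and the sketch assumes every turn starts with `Name:` and that a line without a name continues the previous turn.

```python
import re

# Hypothetical helpers, not part of the notebook.
# Assumption: each turn starts with "Name:"; the first colon ends the name.
TURN_PATTERN = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.*)$")


def parse_turns(chat_log):
    """Split a raw chat log into (speaker, message) tuples."""
    turns = []
    for line in chat_log.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no content
        match = TURN_PATTERN.match(line)
        if match:
            turns.append((match.group(1), match.group(2).strip()))
        elif turns:
            # Assumption: a line without "Name:" continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line.strip()}")
    return turns


def to_dialogue(turns):
    """Re-join turns as one 'Speaker: message' line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


sample = """
Abhishek: bro did you do the assignment

Karthik: which one
"""
turns = parse_turns(sample)
print(len(turns), "turns from", sorted({speaker for speaker, _ in turns}))
print(to_dialogue(turns))
```

The compact output of `to_dialogue` could be passed to the summarizer in place of the raw string. Whether that changes the summary was not measured here.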

## 2. Methodology
We implemented a **Sequence-to-Sequence (seq2seq)** pipeline using the Hugging Face `transformers` library.
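
As a rough sketch of what the `pipeline` call wraps, the three steps (tokenize, generate, decode) can be written out explicitly. The function names `summarize_explicit` and `load_model` are hypothetical. The calls themselves (`AutoTokenizer`, `AutoModelForSeq2SeqLM`, `generate`, `decode`) are standard `transformers` APIs, but the pipeline may apply additional defaults, so treat this as an approximation rather than an exact equivalent.

```python
def summarize_explicit(text, tokenizer, model, max_length=50, min_length=15):
    """Tokenize -> generate -> decode, spelled out step by step."""
    # 1. Convert the text to token IDs, truncating to the model's input limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # 2. Generate summary token IDs conditioned on the encoded input.
    output_ids = model.generate(
        **inputs, max_length=max_length, min_length=min_length, do_sample=False
    )
    # 3. Map the generated IDs back to text, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def load_model(model_name="philschmid/bart-large-cnn-samsum"):
    """Load the tokenizer and seq2seq model (downloads weights on first use)."""
    # Imported lazily so summarize_explicit can be tried with stand-in objects.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model
```

With the notebook's `chat_log` in scope, usage would look like `tokenizer, model = load_model()` followed by `summarize_explicit(chat_log, tokenizer, model)`.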

### Model Selection: `philschmid/bart-large-cnn-samsum`
* **Architecture**: This is an **Encoder-Decoder** model. The encoder reads the entire input with bidirectional attention, so each token's representation reflects both earlier and later context. The decoder then generates the summary autoregressively, one token at a time, attending to the encoder's output.
* **Specialization**: Unlike the "base" BART used in our benchmark, this model is fine-tuned on the **SAMSum dataset**, a corpus of messenger-style dialogues paired with human-written summaries. This lets the model handle speaker turns and conversational phrasing rather than producing incoherent output ("rubbish").
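
The "encode once, then decode one token at a time" loop can be illustrated with a toy example. This is not BART: `encode` and `decoder_scores` are made-up stand-ins for the neural encoder and decoder, and the loop uses greedy selection (always taking the highest-scoring token), whereas the real model's generation settings may use beam search instead.

```python
from collections import Counter

EOS = "<eos>"  # end-of-sequence marker


def encode(tokens):
    """Toy 'encoder': summarizes the whole input at once (every token is visible)."""
    return Counter(tokens)


def decoder_scores(context, generated):
    """Toy 'decoder' step: score every candidate next token.

    Frequent input tokens that were not emitted yet score highest. EOS has a
    fixed score of 1.5, so generation stops once only tokens seen once remain.
    """
    scores = {tok: count for tok, count in context.items() if tok not in generated}
    scores[EOS] = 1.5
    return scores


def greedy_decode(tokens, max_length=10):
    context = encode(tokens)  # the encoder runs once over the full input
    generated = []
    while len(generated) < max_length:
        # Each step is conditioned on the encoder output and the prefix so far.
        scores = decoder_scores(context, generated)
        next_token = max(scores, key=scores.get)  # greedy: take the argmax
        if next_token == EOS:
            break
        generated.append(next_token)
    return generated


print(greedy_decode("the notes the lecture the notes".split()))
# -> ['the', 'notes']
```

The point of the sketch is the control flow: the input is encoded a single time, while the decoding step is repeated until an end-of-sequence token or the `max_length` cap is reached.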


## 3. Observations
* **Semantic Understanding**: The model distinguished between the participants and identified the "action items" in the chat.
* **Coherence**: The output is a grammatically correct narrative, in contrast to the incoherent outputs from the base models used in previous hands-on experiments.
* **Efficiency**: The `pipeline` abstraction handles tokenization, generation, and decoding in a single call, and inference on this short chat log ran on a standard CPU (`device=-1`) without a GPU.
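
The first and third observations can be spot-checked with a few lines of code. The sketch below is a hedged illustration: `speakers_in`, `speakers_mentioned`, and `timed` are hypothetical helpers, and the demo strings are invented placeholders, not the model's actual output.

```python
import re
import time


def speakers_in(chat_log):
    """Collect speaker names from 'Name: message' lines, in order of first use."""
    names = []
    for line in chat_log.splitlines():
        match = re.match(r"\s*([^:\n]+?)\s*:", line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def speakers_mentioned(chat_log, summary):
    """Map each speaker to whether their name appears in the summary."""
    return {name: name in summary for name in speakers_in(chat_log)}


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Placeholder data for illustration only (not produced by the model).
demo_log = "Ana: are the notes up?\n\nBen: not yet"
demo_summary = "Ana asks about the notes."
print(speakers_mentioned(demo_log, demo_summary))
# -> {'Ana': True, 'Ben': False}
```

In the notebook, the summarizer call could be wrapped as `summary_output, seconds = timed(chat_summarizer, chat_log, max_length=50, min_length=15, do_sample=False)` to record how long CPU inference takes on a given machine.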