# Reddit Summarizer

This version of the Reddit summarizer prints a summary of a given post, identified by its URL.

*   Reference-based metrics such as ROUGE and BERTScore could not be used for this project because the posts have no reference summaries to compare the output against.
*   Instead, the summary is evaluated with reference-free measures: semantic similarity to the source text, extractive fraction, and readability.
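
To make the semantic similarity measure concrete: it is the cosine of the angle between two embedding vectors, one for the summary and one for the source text. The sketch below computes it in plain Python on toy vectors. The function name `cosine_similarity` and the example vectors are illustrative assumptions; the notebook's code obtains real embeddings from a sentence-embedding model instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings (hypothetical values, not model output)
summary_vec = [0.2, 0.7, 0.1]
source_vec = [0.25, 0.6, 0.3]
print(f"{cosine_similarity(summary_vec, source_vec):.4f}")
```

A value near 1 means the two texts point in a similar direction in embedding space; it says nothing about fluency or factual accuracy, which is why the other two measures are reported alongside it.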






In [1]:
!pip install --upgrade asyncpraw
!pip install transformers
!pip install sentence-transformers
!pip install sentencepiece
!pip install torch
!pip install textstat
!pip install nest_asyncio



In [3]:
import asyncio
import asyncpraw
import textstat
import nest_asyncio
from transformers import BartTokenizer, pipeline
from sentence_transformers import SentenceTransformer, util
from difflib import SequenceMatcher


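async def setup_reddit():
    # Placeholders: replace with the credentials of your own Reddit app
    return asyncpraw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="YOUR_USER_AGENT",
    )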

def preprocess_text(title, selftext, comments):
    text = f"{title}\n{selftext}\n" + "\n".join(
        c.body for c in comments if isinstance(c, asyncpraw.models.Comment)
    )
    return text.strip()

def summarize_text(text):
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    summarizer = pipeline("summarization",
                         model="facebook/bart-large-cnn",
                         tokenizer=tokenizer)

    inputs = tokenizer(
        text,
        max_length=1024,
        truncation=True,
        return_tensors="pt"
    )

    summary_ids = summarizer.model.generate(
        inputs.input_ids,
        max_length=130,
        min_length=30,
        num_beams=4,
        early_stopping=True
    )

    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

def semantic_evaluation(summary, original_text):
    model = SentenceTransformer('all-mpnet-base-v2')

    # Generate embeddings
    summary_embedding = model.encode(summary, convert_to_tensor=True)
    original_embedding = model.encode(original_text, convert_to_tensor=True)

    # Calculate cosine similarity
    similarity = util.pytorch_cos_sim(summary_embedding, original_embedding).item()
    return similarity

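def extractive_fraction(summary, original_text):
    # Share of summary characters covered by blocks that also occur in the source
    if not summary:
        return 0.0  # avoid division by zero on an empty summary

    matcher = SequenceMatcher(None, summary.lower(), original_text.lower())
    matching_blocks = matcher.get_matching_blocks()

    total_matching = sum(block.size for block in matching_blocks)
    return total_matching / len(summary)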

def readability_metrics(summary):
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(summary),
        "smog_index": textstat.smog_index(summary),
        "coleman_liau": textstat.coleman_liau_index(summary)
    }

def comprehensive_evaluation(summary, original_text):
    print("\n=== Automated Evaluation ===")

    sim_score = semantic_evaluation(summary, original_text)
    print(f"Semantic Similarity: {sim_score:.4f}")

    extr_score = extractive_fraction(summary, original_text)
    print(f"Extractive Fragment Score: {extr_score:.4f}")

    readability = readability_metrics(summary)
    print("\nReadability:")
    for metric, value in readability.items():
        print(f"- {metric}: {value:.2f}")

async def process_post(url):
    reddit = await setup_reddit()
    try:
        submission = await reddit.submission(url=url)
        await submission.load()

        # Get comments with error handling
        try:
            await submission.comments.replace_more(limit=0)
            comments = submission.comments.list()[:50]  # Top 50 comments
        except Exception as e:
            print(f"Error loading comments: {str(e)}")
            comments = []

        # Build text content
        text = preprocess_text(submission.title, submission.selftext, comments)

        if not text.strip():
            return None, "No content to summarize"

        summary = summarize_text(text)
        return summary, text

    finally:
        await reddit.close()

async def main():
    url = "https://www.reddit.com/r/india/comments/1jqe90g/i_am_a_trade_professional_in_the_government_of/"

    # Get results
    summary, original_text = await process_post(url)

    if summary:
        print("\nGenerated Summary:")
        print("-"*40)
        print(summary)

        comprehensive_evaluation(summary, original_text)

        print("\nOriginal Text Preview:")
        print("-"*40)
        print(original_text[:500] + "...")  # Show first 500 characters
    else:
        print("Failed to generate summary")

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except RuntimeError:
        nest_asyncio.apply()
        asyncio.run(main())
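
Note that `summarize_text` truncates its input at 1024 tokens, so for long threads the later comments never reach the model. One possible workaround is to split the text into chunks, summarize each chunk, and join the partial summaries. The following is a minimal sketch under stated assumptions: the helper names `split_into_chunks` and `summarize_long_text` are hypothetical, and the default of 400 words per chunk is a rough guess at staying below the token limit, not a measured value.

```python
from typing import Callable, List

def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_long_text(text: str,
                        summarize: Callable[[str], str],
                        max_words: int = 400) -> str:
    """Summarize each chunk with the given callable and join the non-empty results."""
    partials = [summarize(chunk) for chunk in split_into_chunks(text, max_words)]
    return " ".join(p.strip() for p in partials if p.strip())

# Demonstration with a stand-in summarizer that keeps the first three words of each chunk
demo = summarize_long_text("one two three four five six seven eight",
                           lambda chunk: " ".join(chunk.split()[:3]),
                           max_words=4)
print(demo)
```

In the notebook this could be called as `summarize_long_text(text, summarize_text)`. Because `summarize_text` loads the tokenizer and model on every call, loading them once outside the function would likely be worthwhile before summarizing many chunks.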


Device set to use cpu



Generated Summary:
----------------------------------------
I am a trade professional in the government of India. Here's what you need to know about how US tariffs will affect you. US is the top location for remittances to India (NRIs sending money back to families in India), this will see a decline.

=== Automated Evaluation ===


Semantic Similarity: 0.7802
Extractive Fragment Score: 1.0000

Readability:
- flesch_reading_ease: 64.71
- smog_index: 12.50
- coleman_liau: 6.60

Original Text Preview:
----------------------------------------
I am a trade professional in the government of India. Here's what you need to know about how US tariffs will affect you. AMA
[deleted]
Thank you for sharing your insights. Post like this are why I love Reddit.
What happens when the prices sky rockets for US consumers suddenly?
- Reduced spending
- Reduced demand
- Possible recession



What happens when the US market goes into recession?
- Services (like IT) which haven’t been touched yet, will be affected (due to lack of demand)
- Can lead the ...
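
The extractive score above comes from `extractive_fraction`, which compares the two texts character by character. A word-level measure can serve as a cross-check on how much of the summary is copied. The sketch below is one possible complement, not the notebook's method: it reports the fraction of word trigrams in the summary that also occur in the source. The names `tokenize` and `ngram_copy_rate`, the simple regex tokenizer, and the default `n=3` are all illustrative assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngram_copy_rate(summary, source, n=3):
    """Fraction of the summary's word n-grams that also appear in the source."""
    summary_tokens = tokenize(summary)
    source_tokens = tokenize(source)
    summary_ngrams = [tuple(summary_tokens[i:i + n])
                      for i in range(len(summary_tokens) - n + 1)]
    if not summary_ngrams:
        return 0.0  # summary shorter than n words: nothing to compare
    source_ngrams = {tuple(source_tokens[i:i + n])
                     for i in range(len(source_tokens) - n + 1)}
    copied = sum(1 for gram in summary_ngrams if gram in source_ngrams)
    return copied / len(summary_ngrams)

# Half of the summary's trigrams are copied from the source in this toy example
print(ngram_copy_rate("the cat sat on a rug", "the cat sat on the mat"))
```

A rate close to 1 suggests the summary is mostly stitched together from source phrases, while a lower rate indicates more paraphrasing.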
