GitHub - Nityasav/Textsummarizer: Summarizes a large chunk of text into a smaller part which helps with saving time and also condensing information.

This project uses Python and the Hugging Face transformers library to summarize online articles. It leverages a pre-trained summarization pipeline and web scraping with BeautifulSoup to extract text from a URL, process it, and generate concise summaries.

Features

Scrapes the text content from an article using BeautifulSoup. Processes and organizes the text into manageable chunks for summarization. Summarizes text efficiently using Hugging Face's summarization pipeline. Outputs the summary to a text file (blogsummry.txt). Requirements Python 3.7+ Required Python Libraries: transformers beautifulsoup4 requests Install dependencies with:

Copy code:

pip install transformers beautifulsoup4 requests

How It Works:

Initialize Summarizer The Hugging Face pipeline for summarization is initialized to handle the summarization tasks.

Web Scraping:

The program fetches content from a specified URL using requests. BeautifulSoup parses the HTML and extracts text from h1 and p tags. Text Processing

The extracted text is processed into chunks of less than 500 words. This ensures compatibility with the summarizer’s input limits. Summarization

Each chunk is summarized using the pre-trained summarization pipeline. All summaries are combined into a single cohesive summary. Output The final summary is saved to a text file named blogsummry.txt.

How to Run:

Replace the URL variable with the link to the article you want to summarize:

python Copy code URL = "https://example.com/article"

Copy code:

python summarizer.py View the summary in the generated blogsummry.txt file.

Customization:

Change Summarization Model: Modify the summarization pipeline to use different models from Hugging Face’s library if needed. Be aware of storage and compatibility.

Adjust Summary Length:

Update the max_length and min_length parameters in the summarizer call to control the length of the summary.

Limitations: Scraper Specificity:

This script targets articles with h1 and p tags. It might need adjustments for differently structured websites.

Text Length:

Articles exceeding the maximum token limit may require further splitting or alternate summarization methods.

Acknowledgments

Built with Hugging Face's transformers library. Uses BeautifulSoup for HTML parsing and requests for fetching web content.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
README.md		README.md
textsummarizing.py		textsummarizing.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

Nityasav/Textsummarizer

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages