Skip to content

Efter-26/Scraping-News-Website-using-FastAPI-with-Python-language

Repository files navigation

Project Title: News Scraping and Summarization

Table of Contents

1. Project Overview

This project is designed to scrape news articles from specified websites, store the data in a MySQL database, generate concise summaries using a Large Language Model (LLM), and provide a user-friendly interface via Streamlit. It aims to make news consumption easier by summarizing articles and presenting them in a simple, digestible format.

2. Features

  • Web Scraping: Collect news articles from various websites using FastAPI.
  • Database Management: Store and manage news articles in a MySQL database.
  • Summary Generation: Use an LLM to generate concise summaries of the articles.
  • User Interface: Provide a simple UI for viewing news and summaries using Streamlit.
  • Real-Time Updates: Periodically update the news database with the latest articles.

3. Tech Stack

  • Backend:
    • FastAPI: For creating the RESTful API using Python.
  • Database
    • MySQL: For storing news articles.
  • Machine Learning:
    • LLM Model: For generating summaries of the articles.
  • Frontend:
    • Streamlit: For creating the user interface.
    • Python: For scripting and automation.

4. Requirements

Software:

  • Python 3.8 or higher
  • MySQL Server
  • Docker (optional, for containerization)

Python Packages:

  • fastapi
  • uvicorn
  • mysql-connector-python
  • sqlalchemy
  • groq (depending on the LLM)
  • streamlit
  • requests
  • pydantic

System:

  • OS: Windows
  • Hardware: Minimum 8 GB RAM, 4 CPU cores

Installation and Setup

Follow the steps below to set up the project on your local machine:

Step 1: Create the Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Set Up MySQL Database

Install MySQL Server and create a database. Create a .env file in the root directory with your MySQL credentials:

DB_HOST=localhost
DB_USER=youruser
DB_PASSWORD=yourpassword
DB_NAME=newsdb

Step 4: Run the Application

Backend (FastAPI)

run main.py

Frontend (Streamlit)

streamlit run Home.py

Step 5: Access the Application

Open your browser and navigate to http://localhost:8501 to view the Streamlit UI. But as it set on local server it can't be view in publicly that's why I have attached the screenshots in below. For Api Documentation run code and navigate to http://localhost/8011/endpoints#

6. Screenshots

Homepage: Display the news feed and summaries.

Home

Scraping Status: Show the current scraping process.

scrapping

Database View: Overview of stored articles.

database

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages