News Scraper

Overview

News Scraper is a Python script that collects the latest headlines from Google News, follows each link to the original source to extract the full article content, title, images, and other metadata, and stores the results in a CSV file for easy analysis.

Features

Fetches the latest news from Google News
Extracts the original article's content, title, image, and metadata
Saves the collected data into a CSV file
Uses a fake user-agent to reduce the chance of bot detection
Retries failed requests to make scraping more reliable
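The user-agent and retry features listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in getarticles.py: the hard-coded `USER_AGENTS` list, the `fetch_with_retries` helper, and its exponential-backoff parameters are all assumptions made for this example.

```python
import random
import time
import urllib.error
import urllib.request

# Hypothetical pool of user-agent strings; the project itself may generate
# these with a library instead of a hard-coded list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def fetch_with_retries(url, opener=urllib.request.urlopen, max_attempts=3,
                       base_delay=1.0, timeout=10):
    """Fetch `url`, picking a random user-agent on every attempt and
    waiting base_delay * 2**attempt seconds between failed attempts."""
    last_error = None
    for attempt in range(max_attempts):
        request = urllib.request.Request(
            url, headers={"User-Agent": random.choice(USER_AGENTS)}
        )
        try:
            with opener(request, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc
            # Back off before the next attempt (no sleep after the last one).
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Failed to fetch {url} after {max_attempts} attempts") from last_error
```

The `opener` parameter exists only so the function can be exercised without a network connection; in normal use the default `urllib.request.urlopen` is used.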

Installation

Prerequisites

Make sure Python is installed (Python 3.7 or later is recommended).

Install Dependencies

First, clone this repository and navigate into the project folder:

git clone https://github.com/ashwin549/NewsScraper.git
cd NewsScraper

Then install the required dependencies:

pip install -r requirements.txt

Then run getarticles.py:

python3 getarticles.py

The script will:

Fetch the latest news from Google News
Extract article details (title, content, images, etc.)
Save the results to news_data.csv
Upload the results to your Firestore database (only if you have replaced the Firebase credentials with your own)
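As a rough sketch of the first and last steps (fetching headlines and saving them to CSV), the snippet below reads Google News through its public RSS feed and writes one row per item. The RSS URL, the `FIELDS` column set, and the helper names are assumptions for illustration; getarticles.py may fetch Google News differently, and it also stores the extracted article content, which this sketch omits.

```python
import csv
import xml.etree.ElementTree as ET

# Assumption: headlines are read from the Google News RSS feed.
GOOGLE_NEWS_RSS = "https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en"
# Hypothetical column set for the output CSV.
FIELDS = ["title", "link", "published", "source"]


def parse_news_feed(rss_xml):
    """Turn an RSS document into a list of dicts, one per <item>.
    Missing child elements become empty strings."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "source": item.findtext("source", default=""),
        })
    return articles


def save_to_csv(articles, path="news_data.csv"):
    """Write article dicts to `path` with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(articles)
```

In practice the XML passed to `parse_news_feed` would come from an HTTP GET on `GOOGLE_NEWS_RSS`. The `link` values in the feed may point to Google redirect URLs, which have to be followed to reach the original article before its content can be extracted.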

Notes

I have included a Jupyter notebook, webscrapingtrials.ipynb, which contains example output from the code. It can be viewed directly on GitHub for reference.
Ensure you have an internet connection while running the script.
Some articles may require JavaScript rendering; this script only extracts static HTML content.
A sample text summarizer I experimented with is also included (summariser.ipynb); credit to colombomf.
The scripts also include code to upload the data to Firebase. You can either replace the credentials with your own Firebase credentials or remove that code; the news will still be extracted either way.
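To illustrate the static-HTML limitation mentioned above, here is a minimal sketch of extracting a title, lead image, description, and paragraph text using only Python's built-in `html.parser`. It assumes the page exposes Open Graph `og:title`/`og:image` and a `description` meta tag; the `ArticleMetaParser` class and `extract_article` helper are hypothetical and not taken from this project. Anything a page injects with JavaScript after loading never appears in the raw HTML, so it is not captured.

```python
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collects the <title>, <meta> name/property values and <p> text from static HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buffer = []
        elif tag == "meta":
            # Open Graph tags use `property`, standard ones use `name`.
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside <script>, <div>, etc. is ignored; only title and <p> text is kept.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buffer.append(data)


def extract_article(html):
    """Return title, image, description and paragraph content from raw HTML."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.meta.get("og:title") or parser.title.strip(),
        "image": parser.meta.get("og:image", ""),
        "description": parser.meta.get("description", ""),
        "content": "\n".join(parser.paragraphs),
    }
```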
