News Scraper is a Python script that collects the latest news from Google News, extracts the full article content, title, images, and other metadata from the referenced source, and stores the data in a CSV file for easy analysis.
- Fetches the latest news from Google News
- Extracts the original article's content, title, image, and metadata
- Saves the collected data into a CSV file
- Uses a fake user-agent to avoid bot detection
- Implements retry logic for reliable scraping
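The last two features can be illustrated with a minimal sketch. This is not the code from getarticles.py: it uses only the standard library, a hard-coded `USER_AGENT` string stands in for whatever fake user-agent the script generates, and the name `fetch_with_retry` and its parameters are hypothetical.

```python
import time
import urllib.request

# Hypothetical browser-like User-Agent; the real script may choose one differently.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def fetch_with_retry(url, retries=3, backoff=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body as bytes, retrying on network errors.

    `opener` and `sleep` are injectable so the function can be exercised offline.
    """
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    last_error = None
    for attempt in range(retries):
        try:
            with opener(request, timeout=10) as response:
                return response.read()
        except OSError as error:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            last_error = error
            if attempt < retries - 1:
                # Exponential backoff: backoff, 2 * backoff, 4 * backoff, ...
                sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"Giving up on {url} after {retries} attempts") from last_error
```

Doubling the delay between attempts gives a briefly unavailable server time to recover without hammering it, and the final `RuntimeError` keeps the original error attached for debugging.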
Ensure you have Python installed (preferably Python 3.7+).
First, clone this repository and navigate into the project folder:
git clone https://github.com/ashwin549/NewsScraper.git
cd NewsScraper
Then install the required dependencies:
pip install -r requirements.txt
Then run getarticles.py:
python3 getarticles.py
The script will:
- Fetch the latest news from Google News
- Extract article details (title, content, images, etc.)
- Save the results to news_data.csv
- Upload the results to your Firestore database, if you have replaced the Firebase credentials with your own
- A Jupyter notebook, webscrapingtrial.ipynb, is included and contains example output from the code. It can be viewed directly on GitHub for reference.
- Ensure you have an internet connection while running the script.
- Some articles may require JavaScript rendering; this script only extracts static HTML content.
- A sample text summarizer I tried is also included; credit to colombomf.
- The programs also include code to upload to Firebase. You can either replace the credentials with your own or remove that code; the news is extracted either way.
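For readers who remove the Firebase code and keep only the CSV output, the saving step might look roughly like the sketch below. The column names in `FIELDS` and the function `save_articles` are assumptions made for illustration; the actual layout of news_data.csv may differ.

```python
import csv

# Hypothetical column names; the real news_data.csv may use different ones.
FIELDS = ["title", "url", "content", "image"]


def save_articles(articles, path="news_data.csv"):
    """Write a list of article dicts to a CSV file and return the row count."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for article in articles:
            # Keys missing from an article become empty cells;
            # keys not listed in FIELDS are ignored.
            writer.writerow(article)
    return len(articles)
```

Using `csv.DictWriter` rather than joining strings by hand means commas, quotes, and line breaks inside article text are escaped correctly.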