
Sagyam/Nepali-News-Scraper


Nepali News Scraper

A crawler that scrapes news from various Nepali news portals: Online Khabar, Ratopati, Setopati, Gorkhapatra Online and Ekantipur.

Run Locally

Clone the project

  git clone https://github.com/Sagyam/Nepali-News-Scraper

Go to the project directory

  cd Nepali-News-Scraper

Create a virtual environment

  virtualenv venv

Activate the virtual environment

For Windows

  venv\Scripts\activate

For Linux / OSX

  source venv/bin/activate
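virtualenv is a third-party package. If it is not installed, Python's standard-library venv module creates an equivalent environment. A minimal sketch, assuming Python 3; the helper names create_env and activation_command are hypothetical, and the activation commands are the ones listed above for each OS:

```python
import sys
import venv
from pathlib import Path


def create_env(path: str = "venv", with_pip: bool = True) -> Path:
    """Create a virtual environment using the standard-library venv module."""
    env_dir = Path(path)
    # with_pip=True bootstraps pip inside the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def activation_command(path: str = "venv", os_name: str = sys.platform) -> str:
    """Return the activation command from this README for the given OS."""
    if os_name.startswith("win"):
        return f"{path}\\Scripts\\activate"
    return f"source {path}/bin/activate"
```

For example, calling create_env() from the project directory creates venv/, and activation_command() returns the command to run in your shell for the current OS.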

Install dependencies

  pip install -r requirements.txt

Open the respective config.py file and change the parameters as needed.

Start Crawling Online Khabar

  cd scraper
  scrapy crawl online_khabar

Start Crawling Ratopati

  cd ratopati
  scrapy crawl ratopati_spider

Start Crawling Setopati

  cd setopati
  scrapy crawl setopati_spider

Start Crawling Gorkhapatra Online

⚠️ Caution: Read the Gorkhapatra config file before scraping. ⚠️

  cd gorkhapatra
  scrapy crawl gorkhapatra_spider

Start Crawling Ekantipur

  cd ekantipur
  scrapy crawl ekantipur_spider
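The same crawls can also be started from a Python script with Scrapy's CrawlerProcess instead of the scrapy crawl command. A minimal sketch, assuming the script is run from the directory you would cd into above (the one containing that spider's scrapy.cfg), so get_project_settings() can find the project settings. The SPIDERS mapping and run_spider are hypothetical names; the spider names come from the commands above:

```python
# Hypothetical mapping from portal to the spider names used in this README.
SPIDERS = {
    "onlinekhabar": "online_khabar",
    "ratopati": "ratopati_spider",
    "setopati": "setopati_spider",
    "gorkhapatra": "gorkhapatra_spider",
    "ekantipur": "ekantipur_spider",
}


def run_spider(portal: str) -> None:
    """Run one spider in-process, like `scrapy crawl <name>`."""
    spider_name = SPIDERS[portal]  # raises KeyError for an unknown portal
    # Imported lazily so the mapping can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    # A string is resolved to a spider class through the project's spider loader.
    process.crawl(spider_name)
    process.start()  # blocks until the crawl finishes
```

For example, run_spider("ratopati") from inside the ratopati directory. Only the spiders defined in the current project directory can be resolved, and the Gorkhapatra caution above still applies.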

Caution ⚠️

  • To pause crawling gracefully, press Ctrl+C once.
  • To force the crawler to stop, press Ctrl+C twice.
  • For Online Khabar, scraping 2000 pages per run is recommended.
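Scrapy can only resume a crawl stopped with a single Ctrl+C if the JOBDIR setting points at a directory where it persists its scheduler state; this README does not say whether the project's settings already set it. A minimal sketch, assuming it runs from the project directory containing scrapy.cfg; the crawls/<spider name> layout and the helper names are hypothetical:

```python
from pathlib import Path


def jobdir_for(spider_name: str, base: str = "crawls") -> str:
    """Hypothetical layout: one state directory per spider, e.g. crawls/online_khabar."""
    return str(Path(base) / spider_name)


def run_resumable(spider_name: str) -> None:
    """Start (or resume) a crawl whose state is kept in a job directory."""
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    # With JOBDIR set, Scrapy persists pending requests and the seen-request
    # filter, so running again with the same JOBDIR continues the crawl.
    settings.set("JOBDIR", jobdir_for(spider_name))
    process = CrawlerProcess(settings)
    process.crawl(spider_name)
    process.start()
```

Keep one job directory per spider and per job; Scrapy's documentation warns against sharing a job directory between different spiders or different runs.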

Tech Stack

Tested using:

  • Python 3.7.3
  • Scrapy 2.5.0
  • BeautifulSoup 4.9.3
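To compare your environment with the tested versions above, a small standard-library check can help. It is a sketch: it uses importlib.metadata, which requires Python 3.8 or newer (newer than the tested 3.7.3), and the function names are hypothetical. A mismatch only flags a difference from the tested setup; it does not mean the spiders will fail.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Tested using" in this README.
TESTED = {"Scrapy": "2.5.0", "beautifulsoup4": "4.9.3"}
TESTED_PYTHON = "3.7.3"


def installed_versions() -> dict:
    """Map each tested distribution to (installed version or None, tested version)."""
    result = {}
    for dist, tested in TESTED.items():
        try:
            result[dist] = (version(dist), tested)
        except PackageNotFoundError:
            result[dist] = (None, tested)
    return result


def report() -> None:
    """Print the installed versions next to the tested ones."""
    python = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {python} (tested: {TESTED_PYTHON})")
    for dist, (found, tested) in installed_versions().items():
        status = "match" if found == tested else "differs"
        print(f"{dist}: {found or 'not installed'} (tested: {tested}) -> {status}")


if __name__ == "__main__":
    report()
```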

Related

Here are some related projects

Feedback

If you have any feedback, please reach out to me at sagyamthapa32@gmail.com