A crawler that scrapes news from various Nepali news portals.
Clone the project
git clone https://github.com/Sagyam/Nepali-News-Scraper
Go to the project directory
cd Nepali-News-Scraper
Create a virtual environment
virtualenv venv
Activate the virtual environment
For Windows
venv\Scripts\activate
For Linux / OSX
source venv/bin/activate
Install dependencies
pip install -r requirements.txt
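The virtualenv command used above is a third-party tool. If it is not installed, the standard library's venv module (Python 3.3 and later) creates an equivalent environment; from a shell the one-liner is python -m venv venv. The sketch below does the same from Python and returns the folder that holds the activate script, matching the Windows and Linux / OSX paths in the activation step. The helper names create_env and scripts_dir are hypothetical and not part of this repository.

```python
import sys
import venv
from pathlib import Path


def scripts_dir(env_path, platform=sys.platform):
    """Return the directory that holds the activate script for this OS."""
    # Windows puts executables in Scripts\, Linux and macOS use bin/.
    return Path(env_path) / ("Scripts" if platform.startswith("win") else "bin")


def create_env(env_path="venv", with_pip=True):
    """Create a virtual environment with the standard-library venv module."""
    venv.create(env_path, with_pip=with_pip)
    return scripts_dir(env_path)
```

Calling create_env() from the project root creates ./venv; activate it and install the requirements as described above.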
Open the respective config.py file and change the parameters as needed.
Start Crawling Online Khabar
cd scraper
scrapy crawl online_khabar
Start Crawling Ratopati
cd ratopati
scrapy crawl ratopati_spider
Start Crawling Setopati
cd setopati
scrapy crawl setopati_spider
Start Crawling Gorkhapatra Online
cd gorkhapatra
scrapy crawl gorkhapatra_spider
Start Crawling Ekantipur
cd ekantipur
scrapy crawl ekantipur_spider
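Each crawl above is started by hand from its own directory. A small Python runner can start them one after another. This is a sketch, not part of the repository: the spider names come from the commands above, but the mapping of each spider to a directory relative to the project root is an assumption based on the cd commands and may need adjusting to the actual layout. The helper names crawl_command and run_all are hypothetical. The optional -O flag is Scrapy's built-in feed export (Scrapy 2.1 and later), which overwrites the given file; if the project already stores items through its own pipelines, leave export off.

```python
import subprocess
from pathlib import Path

# Assumed layout: spider name -> directory (relative to the project root)
# from which "scrapy crawl" is run. Adjust to match your checkout.
SPIDERS = {
    "online_khabar": "scraper",
    "ratopati_spider": "ratopati",
    "setopati_spider": "setopati",
    "gorkhapatra_spider": "gorkhapatra",
    "ekantipur_spider": "ekantipur",
}


def crawl_command(spider, output=None):
    """Build the argument list for one 'scrapy crawl' run."""
    cmd = ["scrapy", "crawl", spider]
    if output:
        # -O overwrites the output file; -o would append to it instead.
        cmd += ["-O", output]
    return cmd


def run_all(root=".", export=False):
    """Run every spider in SPIDERS sequentially, stopping at the first failure."""
    for spider, subdir in SPIDERS.items():
        cmd = crawl_command(spider, f"{spider}.json" if export else None)
        print("Running:", " ".join(cmd), "in", subdir)
        proc = subprocess.Popen(cmd, cwd=str(Path(root) / subdir))
        try:
            proc.wait()
        except KeyboardInterrupt:
            # Ctrl+C also reaches scrapy; wait so its graceful shutdown can finish.
            proc.wait()
            raise
        if proc.returncode != 0:
            raise RuntimeError(f"{spider} exited with code {proc.returncode}")
```

From the project root, with the virtual environment active, call run_all() or run_all(export=True). With export on, each JSON file is written inside that spider's directory, since the output path is relative to the directory the crawl runs in.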
- To pause crawling gracefully, press Ctrl+C once.
- To stop crawling immediately, press Ctrl+C twice.
- For Online Khabar, scraping 2000 pages per run is recommended.
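Pressing Ctrl+C once makes Scrapy finish in-flight requests and close cleanly, but a later run starts from scratch unless the crawl state is persisted. Scrapy's built-in way to do that is the JOBDIR setting: rerunning the same command with the same JOBDIR should resume the crawl. Separately, the CLOSESPIDER_PAGECOUNT setting from Scrapy's CloseSpider extension closes a run after a given number of responses, which is one way to cap each run at the 2000 pages recommended for Online Khabar. Whether these particular spiders resume correctly has not been verified; Scrapy requires every request to be serializable with pickle for JOBDIR to work. The helper names below are hypothetical. They only build and print the command so you can run it directly in a shell, where Ctrl+C reaches Scrapy without a wrapper in between.

```python
import shlex


def resumable_crawl_command(spider, jobdir, page_limit=None):
    """Build a 'scrapy crawl' command whose state is saved to jobdir."""
    # JOBDIR keeps the scheduler queue and seen-request fingerprints on disk.
    cmd = ["scrapy", "crawl", spider, "-s", f"JOBDIR={jobdir}"]
    if page_limit is not None:
        # Close the spider after this many responses in the current run.
        cmd += ["-s", f"CLOSESPIDER_PAGECOUNT={page_limit}"]
    return cmd


def as_shell(cmd):
    """Quote the arguments so the command can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


print(as_shell(resumable_crawl_command("online_khabar", "crawls/online_khabar-1", 2000)))
# → scrapy crawl online_khabar -s JOBDIR=crawls/online_khabar-1 -s CLOSESPIDER_PAGECOUNT=2000
```

Use a separate job directory for each spider; a job directory holds the state of a single crawl job, so it should not be shared between spiders or concurrent runs.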
Tested with:
- Python 3.7.3
- Scrapy 2.5.0
- BeautifulSoup 4.9.3
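To see how your environment differs from the tested setup, a short script can read the installed versions. This is an illustrative sketch, not a script shipped with the project; it imports scrapy and bs4 (the BeautifulSoup package) and reads their __version__ attributes. A different version is not necessarily a problem, only a useful hint when something breaks.

```python
import importlib
import platform

# Versions this README lists as tested (bs4 is the BeautifulSoup package).
TESTED = {"python": "3.7.3", "scrapy": "2.5.0", "bs4": "4.9.3"}


def installed_versions():
    """Return the running Python version and the installed package versions."""
    found = {"python": platform.python_version()}
    for module in ("scrapy", "bs4"):
        try:
            found[module] = getattr(importlib.import_module(module), "__version__", "unknown")
        except ImportError:
            found[module] = None  # not installed in this environment
    return found


def differences(found, tested=TESTED):
    """Map each name whose version differs to (found, tested)."""
    return {
        name: (found.get(name), want)
        for name, want in tested.items()
        if found.get(name) != want
    }


for name, (have, want) in differences(installed_versions()).items():
    print(f"{name}: installed {have}, tested with {want}")
```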
Here are some related projects
If you have any feedback, please reach out to me at sagyamthapa32@gmail.com