NewsPanda Weekly Pipeline

NewsPanda is a collaboration between Carnegie Mellon University and the World Wide Fund for Nature. The project aims to make conservation-related news articles easier to extract: it scrapes articles, classifies them, and runs post-processing analysis on the results.

The entire pipeline runs weekly. The project currently focuses on news articles from India and Nepal; support for other countries and languages is in progress.

Quickstart

1. Set up database.yaml.
2. Download the pretrained model from this link and save it as ./model/model_v0.pt. See this README for more details.
3. Set up the Selenium driver and Google Drive authentication. (See instructions below.)
4. Run pipeline.sh.
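
The Quickstart steps can be checked before a run with a small preflight script. The following is a minimal sketch, not part of the repository: the helper names `missing_files` and `run_pipeline` are hypothetical, and it assumes that pipeline.sh is a bash script started from the repository root.

```python
from pathlib import Path
import subprocess

# Paths named in the Quickstart, relative to the repository root.
REQUIRED_FILES = ("database.yaml", "model/model_v0.pt", "pipeline.sh")


def missing_files(repo_root="."):
    """Return the Quickstart files that are not present under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def run_pipeline(repo_root="."):
    """Run pipeline.sh only if every prerequisite file is in place."""
    missing = missing_files(repo_root)
    if missing:
        raise FileNotFoundError(
            "Missing before running pipeline.sh: " + ", ".join(missing)
        )
    # Assumption: pipeline.sh is a bash script run from the repository root.
    subprocess.run(["bash", "pipeline.sh"], cwd=repo_root, check=True)
```

Calling `run_pipeline()` from the repository root either starts the pipeline or fails early with the list of files that still need to be set up.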

Setup Selenium Driver

src/parivesh_downloader.py needs a driver executable for either Chrome or Firefox. (Note: The browser itself must also be installed, and the driver version must be compatible with the browser version.)

Chrome: Download chromedriver here
Firefox: Download geckodriver here

Once you have downloaded chromedriver or geckodriver, pass its path to src/parivesh_downloader.py with the --driver_path argument. (The default is ./geckodriver.)
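
As an illustration of how a driver path like this is typically consumed, here is a minimal sketch assuming Selenium 4. Only the `--driver_path` argument and its `./geckodriver` default come from this README; the helper names (`parse_args`, `browser_for`, `build_driver`) and the file-name heuristic are hypothetical and may differ from what src/parivesh_downloader.py actually does.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the driver path; the flag name and default follow the README."""
    parser = argparse.ArgumentParser(description="Driver path sketch")
    parser.add_argument("--driver_path", default="./geckodriver")
    return parser.parse_args(argv)


def browser_for(driver_path):
    """Guess the browser from the driver file name (a hypothetical heuristic)."""
    name = Path(driver_path).name.lower()
    if "chromedriver" in name:
        return "chrome"
    if "geckodriver" in name:
        return "firefox"
    raise ValueError(f"Unrecognised driver executable: {driver_path}")


def build_driver(driver_path):
    """Start a Selenium 4 WebDriver for the given driver executable."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    if browser_for(driver_path) == "chrome":
        from selenium.webdriver.chrome.service import Service
        return webdriver.Chrome(service=Service(executable_path=driver_path))
    from selenium.webdriver.firefox.service import Service
    return webdriver.Firefox(service=Service(executable_path=driver_path))
```

With this sketch, `build_driver(parse_args().driver_path)` opens Firefox by default and Chrome when the path points at a chromedriver executable.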

Setup Google Drive Authentication

src/google_drive_uploader.py requires Google Drive authentication. Place client_secrets.json, credentials.json, and settings.yaml in your project directory. More detailed instructions can be found here.
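
The three file names match the conventions of the PyDrive library, although this README does not say which library src/google_drive_uploader.py uses. Under that assumption, a minimal sketch of the authentication step could look like the following; the function name `connect_to_drive` is hypothetical.

```python
from pathlib import Path

# File names listed in the README.
AUTH_FILES = ("client_secrets.json", "credentials.json", "settings.yaml")


def connect_to_drive(project_dir="."):
    """Check the auth files, then return an authenticated Drive client."""
    root = Path(project_dir)
    missing = [name for name in AUTH_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(
            "Missing Google Drive auth files: " + ", ".join(missing)
        )

    # Assumption: PyDrive is the client library. It is imported lazily so
    # the file check above works even when PyDrive is not installed.
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth(settings_file=str(root / "settings.yaml"))
    # Reuses saved credentials if settings.yaml enables them; otherwise
    # this opens a browser window for the OAuth consent flow.
    gauth.LocalWebserverAuth()
    return GoogleDrive(gauth)
```

Failing early on a missing file gives a clearer error than the one raised partway through the OAuth flow.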
