em-quem-votar

🔎 Web scraping to help Brazilian voters make well-informed decisions

The idea:

The idea is to find news stories about politicians running in the 2018 Brazilian election,
- give each story a rating based on how strongly it suggests that the politician is involved in anything illegal,
- and present this 'dossier' as clearly as possible to a regular citizen who wants to catch up on that politician's activity.

The scraper
- Write Scrapy spiders for each trustworthy news website (i.e. a site that won't publish fake news)
- Run each spider with a candidate name as input
  - Each spider will produce a candidate_newspaper.json file with the scraped material
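The output step above could be sketched as follows. This is a minimal sketch that assumes nothing about the spiders themselves: the helper names (`slugify`, `output_path`, `save_stories`), the record fields, and the example candidate and newspaper names are all hypothetical. It only shows one way to derive a `candidate_newspaper.json` filename and write the scraped stories to it.

```python
import json
import re
import unicodedata
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase, strip accents, and replace non-alphanumeric runs with underscores."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


def output_path(candidate: str, newspaper: str, out_dir: str = ".") -> Path:
    """Build the <candidate>_<newspaper>.json path for one spider run."""
    return Path(out_dir) / f"{slugify(candidate)}_{slugify(newspaper)}.json"


def save_stories(stories: list[dict], candidate: str, newspaper: str, out_dir: str = ".") -> Path:
    """Write the scraped stories for one candidate/newspaper pair as UTF-8 JSON."""
    path = output_path(candidate, newspaper, out_dir)
    # ensure_ascii=False keeps Portuguese accents readable in the output file.
    path.write_text(json.dumps(stories, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


# Hypothetical record shape: the real fields depend on what each spider extracts.
example_story = {"title": "...", "url": "...", "date": "...", "body": "..."}
```

If the spiders are run through the Scrapy command line, the candidate name can be passed as a spider argument with `-a` and the output file chosen with `-o`, so a helper like this may only be needed when the spiders are launched from a script.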

How to rate stories as positive or negative?
- This is where it gets tricky. When scraping for stories on a candidate, how can we be sure that:
  1. This story is actually about that candidate and the candidate is not just mentioned.
  2. This story tells something good about that candidate.
  3. This story tells something bad about that candidate.
- An NLP-based approach (e.g. tagging) may make this feasible. Simply detecting a politician's name alongside 'corruption' or other incriminating words leads to too many false positives.
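To make the false-positive problem concrete, here is a small sketch of the naive keyword check described above, next to a rough relevance heuristic for question 1. The word list, the function names, the `min_mentions` threshold, and the sample sentence are all assumptions made for illustration. Neither function is proposed as the actual rating method.

```python
import re

# Illustrative word list only; a real list would need careful curation.
INCRIMINATING_WORDS = ("corrupção", "propina", "lavagem de dinheiro")


def naive_flag(text: str, candidate: str, words=INCRIMINATING_WORDS) -> bool:
    """Flag a story if it contains the candidate's name and any incriminating word."""
    lowered = text.lower()
    return candidate.lower() in lowered and any(w in lowered for w in words)


def mention_count(text: str, candidate: str) -> int:
    """Count case-insensitive occurrences of the candidate's full name."""
    return len(re.findall(re.escape(candidate), text, flags=re.IGNORECASE))


def is_probably_about(title: str, body: str, candidate: str, min_mentions: int = 3) -> bool:
    """Rough relevance check: name in the title, or several mentions in the body."""
    return candidate.lower() in title.lower() or mention_count(body, candidate) >= min_mentions


# The hypothetical candidate here criticises corruption, yet the naive check flags the story.
story = "José Exemplo criticou a corrupção no governo e pediu investigação."
flagged = naive_flag(story, "José Exemplo")
relevant = is_probably_about("Governo sob pressão", story, "José Exemplo")
```

In this sample, `naive_flag` marks the story even though the candidate is the one denouncing corruption, which is the kind of false positive noted above. The mention-count heuristic only addresses whether a story is about the candidate. Deciding whether it says something good or bad still needs a proper NLP approach.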

The presentation
- A web interface will be used to present the data as clearly as possible.
- I am not sure whether an MVC framework is necessary. Even though a large amount of data must be presented, it is not going to be created or modified by the viewer.
- Flask (Python) is being considered as the framework for the web interface.
- Mockup
Home page showing list of candidates
