Scraping data from our federal colleges, UFRN and IFRN, analysing their expenses and producing some cool information for research.
Download the already-scraped datasets (on Kaggle): https://www.kaggle.com/mazuh69/midas-project
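The core technique here is web scraping. This is a minimal sketch, not the project's actual scraper, of pulling table rows out of an HTML page using only the standard library's `html.parser`. The class name `TableRowParser`, the helper `parse_table` and the sample table are hypothetical; the real pages scraped by midas_scraper.py may be laid out differently and the project may use other libraries.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collects the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html):
    """Return the rows of every table in `html` as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page; not real UFRN/IFRN data.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Salary</th></tr>
  <tr><td>Employee A</td><td>1000.00</td></tr>
  <tr><td>Employee B</td><td>2500.50</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE_HTML):
        print(row)
```

In a real run the HTML would come from downloading a transparency page instead of a hardcoded string.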
Our Constitution, our laws and our people demand easy access to this kind of information, presented in a way that is easy to understand.
The initial idea is for me to learn and practice data science with Python and R. My current objective is to compute statistics on government employees' salaries. Later, more kinds of public expenses will be made available for analysis.
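As an illustration of the salary statistics goal, here is a minimal sketch using Python's `statistics` module on a CSV file. It assumes a hypothetical column named `salary` holding plain decimal numbers; check the header and number format of the actual Kaggle dataset before reusing it. The function names are also hypothetical.

```python
import csv
import io
import statistics


def load_salaries(csv_file, column="salary"):
    """Read one numeric column from a CSV file object.

    "salary" is a hypothetical column name; the real dataset may differ.
    Empty cells are skipped.
    """
    reader = csv.DictReader(csv_file)
    return [float(row[column]) for row in reader if row[column].strip()]


def salary_summary(salaries):
    """Summarise a list of salaries with basic descriptive statistics."""
    if not salaries:
        raise ValueError("need at least one salary")
    return {
        "count": len(salaries),
        "mean": statistics.mean(salaries),
        "median": statistics.median(salaries),
        # Population standard deviation: the list is treated as the whole payroll.
        "stdev": statistics.pstdev(salaries),
        "min": min(salaries),
        "max": max(salaries),
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not real UFRN/IFRN data.
    sample = io.StringIO("name,salary\nA,2000\nB,3000\nC,4000\nD,11000\n")
    for key, value in salary_summary(load_salaries(sample)).items():
        print("{}: {:.2f}".format(key, value))
```

The median is reported next to the mean because a few very high salaries can pull the mean far away from what a typical employee earns.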
I developed it on Ubuntu 17 with python3 (3.5), pip and virtualenv (installed via pip3).
After cloning the repository, I changed directory to it and ran:
virtualenv .
source ./bin/activate
pip install -r ./requirements.txt
Check the end of the midas_scraper.py file to see which functions are being called. For a first installation, all of them must be uncommented. To run it:
./midas_scraper.py
That's it. Since there's no release version yet, be careful with what you're doing and try to understand the code first.
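The "functions called at the end of the file" setup can be pictured with the hypothetical sketch below. The function names are placeholders, not the real ones in midas_scraper.py. Running the script as `./midas_scraper.py` relies on a shebang line like the first one here plus the executable bit (`chmod +x`); `python3 midas_scraper.py` works either way.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of an entry point like the one at the end of midas_scraper.py."""


def scrape_ufrn_salaries():
    # Placeholder step: the real scraping logic lives in midas_scraper.py.
    print("scraping UFRN salaries...")
    return "ufrn"


def scrape_ifrn_salaries():
    # Placeholder step for the second college.
    print("scraping IFRN salaries...")
    return "ifrn"


def main():
    done = []
    # For a first installation every step below must stay uncommented.
    # On later runs, comment out the steps you don't want to repeat.
    done.append(scrape_ufrn_salaries())
    done.append(scrape_ifrn_salaries())
    return done


if __name__ == "__main__":
    main()
```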
Inspired by:
- Professor Masanori and his students' project, a data scraper for USP salaries;
- an NPI/UFC project named Dados Abertos producing general statistics about federal data; and
- a Data Science Brigade project called Serenata de Amor, which monitors congressmen's expenses paid with public money (possibly similar to my future goal).
A related song:
- "Ambrosia" by Alesana.
Data sources for this prototype:
I'm already comfortable with Python basics and Object Oriented Programming, so here's my reading list:
- Web Scraping with Python by Ryan Mitchell;
- What you need to know about R by Dipanjan Sarkar & Raghav Bali;
- Data Analysis with R by Tony Fischetti.
After reading it all, I'll think about more features.
King Midas was a king in Greek mythology who had the gift (and curse) of turning everything he touched into gold, even his own food and family.
So be careful and don't be greedy with public money: you're being watched.
Here in Brazil, since 2011, Law 12.527/11 (the Access to Information Law) has regulated the constitutional right of every citizen to access information about public expenses. Therefore, this project is entirely legal.
This project is under MIT License.