Disclosed.ca is an open data initiative for the Canadian Government. In 2004 the Government announced a new policy on the mandatory publication of contracts over $10,000. Each government agency publishes this data on a quarterly basis. Here is an example for Environment Canada: http://www.ec.gc.ca/contracts-contrats/index.cfm?lang=En&state=reports.
This project scrapes third-party contract information from the Proactive Disclosure websites of all 80 government agencies.
The goal is to promote transparency and accountability in the Canadian Government. We make it easy for journalists and academics to access third-party contract information by aggregating the proactive disclosure data on one website.
There are 3 ways to access the data:
What data is available?
The format of the contract data is dictated by these guidelines
What other data are you planning to make available?
The Proactive Disclosure Act requires every agency to publish:
- Grants and Contribution Awards over $25,000
- Completed Access to Information Requests (titles only, not the actual report :( )
- Travel and Hospitality Expenses for Employees
- Annual Expenditures for Travel, Hospitality and Conferences
- Position Reclassifications
Wait, I thought the government already has an Open Data initiative!
Yes, the open data website currently publishes 209,183 data sets. But there are a few problems:
- Incomplete data sets. For example, searching for 'contracts' yields data for only 3 of the 80 agencies.
- Too many data formats. Data is served as CSV, PDF, XML, XLS, TXT and even JPEG.
- Difficult for non-technical people to view the data.
Help Wanted: Adopt a Scraper
You can help out by writing a scraper for the contracts data. Here is a list of all the scrapers that need to be written: https://github.com/disclosed/disclosed_app/milestones/Kickstart%20Ruby%20scrapers
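If you are wondering what a scraper has to do once it has pulled a row out of an agency's report table, here is a rough sketch of the normalization step. It is not the project's actual code: the `Contract` struct, the column names ("Vendor Name", "Contract Value", etc.), the `YYYY-MM-DD` date format and the helper names are all assumptions made for illustration, and each agency's page will likely need its own tweaks.

```ruby
require "date"

# Hypothetical record shape; the real schema in this project may differ.
Contract = Struct.new(:vendor_name, :reference_number, :effective_date,
                      :value_cents, keyword_init: true)

# Convert a scraped currency string such as "$12,345.67" to integer cents.
# Storing cents as an Integer avoids floating point rounding issues.
def parse_value_cents(text)
  cleaned = text.to_s.gsub(/[^\d.]/, "")
  raise ArgumentError, "no amount in #{text.inspect}" unless cleaned =~ /\d/

  dollars, fraction = cleaned.split(".", 2)
  # Pad or truncate the fractional part to exactly two digits.
  dollars.to_i * 100 + fraction.to_s.ljust(2, "0")[0, 2].to_i
end

# Build a Contract from a hash of raw strings scraped from one table row.
# The keys are assumed column headings, not a documented format.
def normalize_row(row)
  Contract.new(
    vendor_name: row.fetch("Vendor Name").strip,
    reference_number: row.fetch("Reference Number").strip,
    effective_date: Date.strptime(row.fetch("Contract Date").strip, "%Y-%m-%d"),
    value_cents: parse_value_cents(row.fetch("Contract Value"))
  )
end
```

Using `fetch` rather than `[]` means a missing column raises a `KeyError` right away, which is usually what you want when an agency changes its page layout without notice.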
Running the tests
bundle exec guard
Running the scraper
You will be prompted for the agency name, report, etc.
Backing up the entire data set
Creates a .sql file in
Loading a data dump
.sql dump file into the
tmp folder. Your file name must end in
This will show you a list of all dump files available to be loaded from the