Disclosed.ca is an open data initiative for the Canadian Government. In 2004 the Government announced a new policy on the mandatory publication of contracts over $10,000. Each government agency publishes this data on a quarterly basis. Here is an example for Environment Canada: http://www.ec.gc.ca/contracts-contrats/index.cfm?lang=En&state=reports.
This project scrapes third-party contract information from the Proactive Disclosure websites of all 80 government agencies.
The goal is to promote transparency and accountability in the Canadian Government. We make it easy for journalists and academics to access third-party contract information by aggregating the proactive disclosure data on one website.
You can access the data in the following ways:
- Search engine: http://disclosed.ca
- CSV downloads (coming soon): http://disclosed.ca/datasets
The format of the contract data is dictated by these guidelines
The Proactive Disclosure Act requires every agency to publish:
- Grants and Contribution Awards over $25,000
- Completed Access to Information Requests (titles only, not the actual report)
- Travel and Hospitality Expenses for Employees
- Annual Expenditures for Travel, Hospitality and Conferences
- Position Reclassifications
Yes, the open data website currently publishes 209,183 data sets. But there are a few problems:
- Incomplete data sets. For example, searching for 'contracts' only yields data for 3 of the 80 agencies.
- Too many data formats. Data is served as CSV, PDF, XML, XLS, TXT and even JPEG.
- Difficult for non-technical people to view the data.
You can help out by writing a scraper for the contracts data. Here is a list of all the scrapers that need to be written: https://github.com/disclosed/disclosed_app/milestones/Kickstart%20Ruby%20scrapers
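As a rough illustration of the kind of cleanup a scraper has to do, here is a minimal sketch that turns one scraped table row into a normalized record. The helper names (`parse_amount_cents`, `normalize_row`), the field names, and the assumption that dates arrive in ISO `YYYY-MM-DD` form are all hypothetical; they are not taken from the project's code, and each agency's pages may need different handling.

```ruby
require "date"

# Hypothetical helper: convert a scraped dollar string such as
# "$12,500.00" into integer cents, so no float rounding is involved.
def parse_amount_cents(text)
  cleaned = text.gsub(/[^\d.]/, "")
  raise ArgumentError, "no amount in #{text.inspect}" if cleaned.empty?

  dollars, cents = cleaned.split(".", 2)
  dollars.to_i * 100 + cents.to_s.ljust(2, "0")[0, 2].to_i
end

# Hypothetical helper: normalize one row of raw strings.
# The field names are assumptions, not the project's schema.
def normalize_row(vendor, date_text, value_text)
  {
    vendor: vendor.strip.squeeze(" "),
    contract_date: Date.iso8601(date_text.strip), # assumes YYYY-MM-DD
    value_cents: parse_amount_cents(value_text)
  }
end

row = normalize_row("  ACME  Consulting Inc. ", "2013-04-15", "$12,500.00")
puts row[:vendor]
puts row[:value_cents]
```

Storing the value as integer cents keeps later sums and comparisons exact, for example when checking a contract against the $10,000 publication threshold.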
bundle exec guard
rake contracts:scrape
You will be prompted for the agency name, report, etc.
Creates a .sql file in tmp.
rake db:data:dump
Download a .sql dump file into the tmp folder. Your file name must end in *_disclosed_backup.sql
rake db:data:load
This will show you a list of all dump files available to be loaded from the tmp folder.
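If a dump file does not show up in that list, the file name is the first thing to check. The sketch below is one way to see which files in a folder follow the `*_disclosed_backup.sql` naming rule. It only illustrates the convention; the helper names are hypothetical and this is not how the rake task itself is implemented.

```ruby
# Hypothetical helper: true when the file name ends with the
# required suffix, regardless of which folder it is in.
def valid_dump_name?(path)
  File.basename(path).end_with?("_disclosed_backup.sql")
end

# Hypothetical helper: list files in a folder (default: tmp) whose
# names match the *_disclosed_backup.sql convention.
def loadable_dumps(dir = "tmp")
  Dir.glob(File.join(dir, "*_disclosed_backup.sql")).sort
end

puts loadable_dumps
```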