Crowdsourcing the liberation of data trapped in documents!
CrowData is a tool for collaborating on the verification or release of data that would otherwise be hard or impossible to obtain with automated tools.
But Crowdata does more than extract data. With Crowdata you can work with your community on a data set: members can browse the data, help extract it through a game, and comment on information that journalists may find worth a closer look.
When to use Crowdata?
- VozData is a website from La Nacion in Argentina that converts scanned PDF documents of Senate spending into a usable dataset by collaborating to free data from PDFs.
.. toctree::
   :maxdepth: 2

   technical
   install
   schema
   how to use it
Crowdata was inspired by ProPublica's Free the Files project and by The Guardian's MP’s Expenses and Sarah Palin’s Emails projects. It was born from La Nacion's need to transform scanned image PDFs into a comprehensible, structured dataset and to ask its community for help cataloging the expenses that catch their attention.
Here are some projects that do the same for specific cases.
Crowdata is an open source project that began when Manuel Aristaran was an Open News fellow at La Nacion in 2013. It was finally released as free software when Gabriela Rodriguez continued its development for VozData in 2014. Thanks to Cristian Bertelegni and La Nacion for contributing to the code.
It now relies on contributions from people and organizations. Please use it, comment on it, and contribute improvements through pull requests on GitHub.
- Fork the repo
- Clone your fork
- Create a branch for your changes
- Open a pull request on GitHub and clearly describe your changes