

Python scripts to process IRS 990 XML data

Work in progress. Background on the project


Read about the 990 data on the IRS's Amazon page:

In short, as of March 1, 2017, the IRS has posted about 59 gigabytes of XML files representing tax-exempt organizations' Form 990 filings.

The filings are inventoried by year in JSON index files, with names like index_2012.json. The filings themselves have names like 201017793492000000_public.xml.
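The naming patterns above can be captured in a few helpers. This is a minimal sketch based only on the two example names shown; the function names are hypothetical, and calling the numeric prefix of a filing an "object id" is an assumption rather than something these scripts define.

```python
import re

# Patterns inferred from the example names "index_2012.json" and
# "201017793492000000_public.xml"; the group name "object_id" is an assumption.
FILING_RE = re.compile(r"^(?P<object_id>\d+)_public\.xml$")


def index_name(year):
    """Return the index file name for a year, e.g. index_2012.json."""
    return f"index_{year}.json"


def filing_name(object_id):
    """Return the XML file name for a filing's numeric identifier."""
    return f"{object_id}_public.xml"


def parse_filing_name(name):
    """Return the numeric prefix of a filing file name, or None if it does not match."""
    match = FILING_RE.match(name)
    return match.group("object_id") if match else None
```

For example, parse_filing_name("201017793492000000_public.xml") returns the string "201017793492000000", while a name that does not follow the pattern returns None.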

The filings use seventeen separate schemas. Check out the mirror of the data to download just the .xsd schema files -- it also has HTML-formatted diffs.


First, download the CSV of Ledger organizations into this directory.

pip install -r requirements.txt

To output a CSV, run OneWayToGetData() in ipython after run

Alternatively, see AnotherWayToGetData() for an example of parsing a single index json as a stream.
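To illustrate what parsing an index as a stream can look like, here is a minimal standard-library sketch. It is not the code behind AnotherWayToGetData(). It assumes the index is a JSON object whose first array holds one object per filing, and the name iter_filings and the "URL" field in the usage note are hypothetical.

```python
import json


def iter_filings(fp, chunk_size=65536):
    """Yield each object in the first JSON array of fp, reading chunk by chunk.

    Assumes the array elements are JSON objects and that no "[" appears
    before the array itself (for example inside an earlier key).
    """
    decoder = json.JSONDecoder()
    buf = ""
    # Read until the opening bracket of the array shows up.
    while "[" not in buf:
        chunk = fp.read(chunk_size)
        if not chunk:
            return
        buf += chunk
    buf = buf[buf.index("[") + 1:]

    while True:
        # Drop whitespace and the comma separating array elements.
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return  # end of the array
        try:
            record, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # The buffer ends mid-record: read more and retry.
            chunk = fp.read(chunk_size)
            if not chunk:
                return
            buf += chunk
            continue
        yield record
        buf = buf[end:]
```

A caller would open an index such as index_2012.json in text mode and loop over iter_filings(fp), handling one record at a time (for instance reading record.get("URL"), if the records carry such a field) instead of loading the whole file into memory.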


Include tests alongside your modules, naming each test file by adding _test to the module's name.

Run tests with nosetests.

Get coverage reports for all modules by running:

nosetests --with-coverage --cover-package=`find . -name '*.py' | sed 's/^\.\///' | sed 's/\.py$//' | grep -v | paste -s -d, -`
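The shell pipeline above finds every .py file under the current directory, strips the leading ./ and the .py suffix, filters the list, and joins the names with commas. A rough Python equivalent is sketched below. The function name cover_packages is hypothetical, and because the grep -v pattern is not shown, the exclude parameter is only a placeholder for whatever filter is intended.

```python
from pathlib import Path


def cover_packages(root=".", exclude=None):
    """Build a comma-separated list of module paths for --cover-package.

    exclude is a hypothetical substring filter standing in for the
    unspecified grep -v pattern; None keeps every module.
    """
    names = []
    for path in Path(root).rglob("*.py"):
        # Drop the root prefix and the .py suffix, keeping "/" separators
        # as the sed commands in the shell version do.
        name = path.relative_to(root).with_suffix("").as_posix()
        if exclude and exclude in name:
            continue
        names.append(name)
    return ",".join(sorted(names))
```

The returned string can be passed as the value of --cover-package when invoking nosetests.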

Related tools