IRS 990 2019 index file discrepancy #651

Closed
briancherron opened this issue Aug 28, 2020 · 6 comments

Comments

@briancherron

While doing some analysis on 990 filings, I noticed a discrepancy between the number of filings in the 2019 CSV and JSON index files. It appears that the CSV index file has 416,880 filings while the JSON index file has 396,217. The CSV file also looks to have been updated much more recently than the JSON file (4/2020 vs 12/2019). I have not checked the index files for other years, though there may be conflicting counts there as well.

Wasn't sure if this is the best place to report it, but the ~20K difference seemed pretty significant. I haven't done any additional analysis yet to rule out something like duplicate records - figured I'd start here. Happy to lend a hand if I can help in any way.
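
For readers who want to reproduce this kind of count and rule out duplicate records, here is a minimal Python sketch. It is not the script used in this thread. It assumes the CSV index has an `OBJECT_ID` column and that the JSON index is a single top-level key holding a list of filings, each with an `ObjectId` field; those names are assumptions and should be checked against the actual files.

```python
import csv
import io
import json


def csv_object_ids(csv_text, id_column="OBJECT_ID"):
    """Return the ID of every row in a CSV index (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader]


def json_object_ids(json_text, id_field="ObjectId"):
    """Return the ID of every filing in a JSON index.

    Assumes the document is an object with exactly one top-level key
    whose value is the list of filings.
    """
    data = json.loads(json_text)
    (filings,) = data.values()  # raises ValueError if there is not exactly one key
    return [filing[id_field] for filing in filings]


def summarize(ids):
    """Total record count versus distinct IDs; a gap between them means duplicates."""
    return {"total": len(ids), "unique": len(set(ids))}
```

If `total` and `unique` match in both files, the difference in counts cannot be explained by duplicate records within either index.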

@pschmied
Contributor

Hi Brian, thank you for reporting! The majority of datasets here are managed by individual third parties (in this case the IRS). For most datasets there is a contact listed. In this case, however, it may be a little tricky for you to report to the IRS.

I have contact information, and will make sure that your report gets to them. I'm not sure if the data provider is set up to interact here on Github, but I'll follow up in this thread with what I hear back.

@briancherron
Author

Right on, thank you Peter!

We will likely do some more analysis of the other 2013-2020 index files to see if this issue arises in other years. Would it be helpful at all to share a summary of those findings here?

@pschmied
Contributor

I'll happily pass along any other issues that you find!

@briancherron
Author

Quick update - I ran some scripts against the index files. It looks like the ~20K extra returns are found only in the 2019 CSV index file and were submitted between 12/11/2019 and 12/30/2019. The JSON index file, however, stops at 12/10/2019.

This does not seem to be an issue for prior years, though I did notice that the 2020 CSV index file was updated today (8/31/2020) but the date on the JSON file is from 8/11/2020.
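
The comparison described in this comment could be sketched roughly as follows; this is an illustration, not the scripts that were actually run. The field names (`OBJECT_ID`, `SUB_DATE`, `ObjectId`) and the `%m/%d/%Y` date format are assumptions, as is the single-top-level-key layout of the JSON index, so adjust them to match the real files.

```python
import csv
import io
import json
from datetime import datetime


def extra_csv_filings(csv_text, json_text, id_column="OBJECT_ID",
                      date_column="SUB_DATE", json_id_field="ObjectId",
                      date_format="%m/%d/%Y"):
    """Map each ID found only in the CSV index to its submission date.

    All column/field names and the date format are hypothetical defaults.
    """
    (filings,) = json.loads(json_text).values()  # assumes one top-level key
    json_ids = {filing[json_id_field] for filing in filings}

    extra = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[id_column] not in json_ids:
            extra[row[id_column]] = datetime.strptime(
                row[date_column], date_format
            ).date()
    return extra


def date_range(dates):
    """Earliest and latest date, or None when there are no dates."""
    dates = list(dates)
    return (min(dates), max(dates)) if dates else None
```

Calling `date_range(extra_csv_filings(csv_text, json_text).values())` gives the submission window of the filings missing from the JSON index, which is the kind of evidence that points to the JSON file simply not having been regenerated after a certain date.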

@pschmied
Contributor

Much obliged, I'm sure that will be useful for them! Sounds like maybe the two processes just run at slightly different times and the JSON file for the end of 2019 just didn't get updated. Will pass this along.

@briancherron
Author

Looks like the IRS finally caught the 2019 JSON index file up with the 2019 CSV index file. Our weekly import process picked up 20,693 new filings from the 2019 JSON index this week.


2 participants