R
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
2013-07-15
2014-03-16
.gitignore
README.Rmd
README.html
README.md

README.md

Inventory of USAspending.gov Data

Location of Archive Files

Start here: http://www.usaspending.gov/data
Select: Archives tab

I normally search for All Agencies for a specific fiscal year (2000-2014) and spending type (Contracts, Direct Payments, Grants, Insurance, Loans, Other).

Files are normally updated near the middle of the month. While most additions and updates are for the most recent fiscal year, a number of changes are made to fiscal years sometimes back to the beginning of the archive.

The Archives have complete files and incremental files with changes made during the last month. The goal is to analyze the monthly incremental files, but for now I'm studying the "big picture" using the complete files captured every few months.

In recent months, files from 2003 to 2014 have some updates every month. As recently as July 2013, all files back to 2000 were updated.

My original intent was to obtain 10 years of data to look a trends, but when I found the data archive only went back to 2000, I decided to grab and study all years of data.

Scripts

A pattern in the filenames makes writing a script to download the files fairly easy.

Studying the filenames on the Archives tab reveals the pattern. For example, the file with FY 2014 Contracts data is:

2014_All_Contracts_Full_20140316.csv.zip  

The filename pattern can be deduced to be:

<fiscalyear>_All_<spendingtype>_Full_<releasedate>.csv.zip 

with fiscalyear ranging from 2000 to 2014, spendingtype one of the six spending types, and releasedate in the format YYYYMMDD, which is usually near the middle of the month.

Each of the downloads below have slightly different versions of these scripts. Look for comments like ##### m of n along the right margin of the R scripts to identify lines that possibly need to be changed with a new update.

The scripts must often be restarted to get around a variety of problems in the data or with the Internet connection. The goal is to develop a script that is a defined and repeatable process for working with the data.

0-graball.R

Downloading the current release of all files results in about 11 GB of .zip files that expand into about 72 GB of .csv raw data files. The complete files with changed records had a total of over 62 million records.

The script records md5sums for all .zip and .csv files to determine easily if files have changed from the past.

I run the script only during late evening or early morning hours when the server likely has extra capacity. The download time for me is about 6 hours, but that time can vary considerably.

1-FedSpendingFirstLook.R

The "first look" uses R's count.fields function to verify that all records in a file have the same number of fields in each line. The script writes two summary files showing the number of fields and number of records in each file.

The script for the current release takes over 2 hours.

The parsing check was added after encountering a number of problems in past downloads because of poor data quality. The run from March 16 shows a possible problem in parsing the 2011 Loans data. This has not yet been investigated.

16 March 2014 Download

An R script can be used to download archive data for a specified set of years and spending types.

Data for 2003-2014

d1 <- read.csv("2014-03-16/FedSpending-RecordCounts.csv")
names(d1)[1] <- "FiscalYear"
library(xtable)
xd1 <- xtable(d1, digits = 0)
print(xd1, format.args = list(big.mark = ","), type = "html")
FiscalYear Contracts DirectPayments Grants Insurance Loans Others
1 2003 1,183,739 588,868 714,741 14,734 0 1,359
2 2004 2,001,677 578,146 746,182 14,895 0 724
3 2005 2,921,828 493,433 708,084 14,456 0 1,609
4 2006 3,795,532 658,747 482,701 14,516 0 161
5 2007 4,110,469 292,406 379,270 11,109 230,058 3,024
6 2008 4,503,746 1,050,682 395,640 12,095 220,531 5,028
7 2009 3,493,168 1,542,045 580,429 18,512 416,629 3,835
8 2010 3,531,404 2,902,512 654,930 21,724 970,503 17,785
9 2011 3,383,438 2,505,670 543,568 84,009 1,252,483 45,208
10 2012 3,100,519 2,155,291 512,335 90,864 320,795 83,863
11 2013 2,480,526 2,142,292 557,882 99,110 259,712 68,178
12 2014 597,330 1,152,954 152,964 49,918 121,851 26,047
13 TOTAL 35,103,376 16,063,046 6,428,726 445,942 3,792,562 256,821

15 July 2013 Download

Data for 2000-2013 -- the last time Fiscal Years 2000-2002 were updated.

d2 <- read.csv("2013-07-15/FedSpending-RecordCounts.csv")
names(d2)[1] <- "FiscalYear"
xd2 <- xtable(d2, digits = 0)
print(xd2, format.args = list(big.mark = ","), type = "html")
FiscalYear Contracts DirectPayments Grants Insurance Loans Others
1 2000 594,426 486,313 552,428 20,359 0 729
2 2001 641,840 573,818 549,379 16,833 1 295
3 2002 830,363 532,286 591,117 15,325 0 2,593
4 2003 1,183,512 588,868 714,729 14,734 0 1,359
5 2004 2,001,477 578,146 746,136 14,895 0 724
6 2005 2,921,547 493,433 708,042 14,456 0 1,609
7 2006 3,795,297 658,747 482,618 14,516 0 161
8 2007 4,110,093 292,416 379,059 11,109 230,050 3,024
9 2008 4,501,806 1,050,641 394,974 12,095 220,508 5,025
10 2009 3,491,122 1,542,020 579,687 18,512 416,614 3,835
11 2010 3,531,586 2,902,484 651,679 21,724 970,530 17,785
12 2011 3,381,808 2,505,691 539,568 84,007 1,255,491 45,206
13 2012 3,092,747 2,177,933 511,077 90,862 323,030 83,858
14 2013 1,398,174 1,703,244 338,248 66,144 188,960 33,224
15 TOTAL 35,475,798 16,086,040 7,738,741 415,571 3,605,184 199,427

Change from July 2013 to March 2014

We need to modify both data.frames to have the same year rows:

d1 <- d1[-nrow(d1), ]
d1 <- rbind(d2[1:3, ], d1)

d2 <- d2[-nrow(d2), ]  # get rid of total row  
d2[15, ] <- c(2014, rep(0, 6))  # add zeroes for 2014
## Warning: invalid factor level, NA generated
Diffs <- d1 - d2
## Warning: - not meaningful for factors
Diffs$FiscalYear <- d1$FiscalYear

This table shows the changes in the number of records by fiscal year by type of spending from July 15, 2013 to March 16, 2014 in USAspending.gov:

xDiffs <- xtable(Diffs, digits = 0)
print(xDiffs, format.args = list(big.mark = ","), type = "html")
FiscalYear Contracts DirectPayments Grants Insurance Loans Others
1 2000 0 0 0 0 0 0
2 2001 0 0 0 0 0 0
3 2002 0 0 0 0 0 0
4 2003 227 0 12 0 0 0
5 2004 200 0 46 0 0 0
6 2005 281 0 42 0 0 0
7 2006 235 0 83 0 0 0
8 2007 376 -10 211 0 8 0
9 2008 1,940 41 666 0 23 3
10 2009 2,046 25 742 0 15 0
11 2010 -182 28 3,251 0 -27 0
12 2011 1,630 -21 4,000 2 -3,008 2
13 2012 7,772 -22,642 1,258 2 -2,235 5
14 2013 1,082,352 439,048 219,634 32,966 70,752 34,954
15 2014 597,330 1,152,954 152,964 49,918 121,851 26,047

Observations:

  1. The number of records for all spending types has not changed for fiscal years 2000 to 2002 since July 2013.
  2. The reason for a net decrease in certain records is unclear, e.g., 22,642 fewer DirectPayments in FY 2012, or over 5000 fewer loans in FY 2011 and FY 2012.
  3. The reason for changes in past fiscal years more than a few years back is unclear. There were hundreds of changes in contract records for FY 2003 to 2007, and with the exception of 2010, there were thousands of changes for FY 2008 to 2012 in contract counts.
  4. A chart of the changes by spending type by month might be interesting.
  5. It's unclear what changes will be happening in the next few months due to the OMB mandate to improve data quality in USAspending.gov.

WatchdogLabs.org Article

Inventory of USAspending.gov Data, WatchdogLabs.org, March 23, 2014.