# SBA DATA Act Pilot: JAAMS + Prism

The code below uses the JAAMS and Prism test data supplied by SBA to create a single file that contains the required DATA Act elements. This is a first pass, and this notebook tracks the process, assumptions, and outstanding questions.

In [184]:
#style up the notebook
#from IPython.core.display import HTML
#css_file = 'assets/css/notebook-style.css'
#HTML(open(css_file, "r").read())

In [185]:
import pandas as pd
import numpy as np
import glob
pd.options.mode.chained_assignment = None
pd.set_option('expand_frame_repr', True)

## Joining JAAMS and Prism

This work hinges on the ability to match records in SBA's financial system (JAAMS) to records in its grants system (Prism). The starting point was the award ID identified in the mapping document, JAAMS PO_HEADERS_ALL.SEGMENT1, which we matched to Prism HEADER.DOCNUM.

Once JAAMS and Prism were joined, we built out from there, pulling in other financial and grants tables using the provided files and SQL join statements.

The image below is a rough sketch of how we joined the tables to gather the required DATA Act information.

![SBA DATA Act Mappings](assets/images/jaams-prism-data-act-mapping.png)

## Parse all JAAMS & Prism Extracts

This process expects comma-delimited, quoted .txt files.
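
As a minimal sketch of the expected input format, the snippet below parses a small quoted, comma-delimited extract and applies the same table-name prefix to its columns. The sample rows, the `sample_txt` name, and the choice to read every field as a string are assumptions made for illustration; they are not taken from the SBA files.

```python
import io
import pandas as pd

# Hypothetical two-row extract; the real files live under data/prism/ and data/jaams/.
sample_txt = io.StringIO(
    '"DOCKEY","DOCNUM","NAME"\n'
    '"101","SBAHQ-14-0001","Acme, Inc."\n'
    '"102","SBAHQ-14-0002","Widgets LLC"\n'
)

key = 'header'  # in the real loop this comes from the file name

# dtype=str is an assumption here: it keeps identifiers exactly as written.
df = pd.read_csv(sample_txt, quotechar='"', dtype=str)

# Prefix each column with the table name, lower-cased, as the loader below does.
df = df.rename(columns=lambda col: '{}.{}'.format(key, col.lower()))

print(df.columns.tolist())
```

Because the fields are quoted, the comma inside a value such as a vendor name stays within a single column.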

In [186]:
prism_files = glob.glob('data/prism/*.txt')
prism = {}
for file in prism_files:
    key = file.split('/')[-1][:-4].lower()
    prism[key] = pd.read_csv(file)
    prism[key].rename(columns=lambda x: '{}.'.format(key) + x.lower(), inplace = True)
jaams = {}
jaams_files = glob.glob('data/jaams/*.txt')
for file in jaams_files:
    key = file.split('/')[-1][:-4].lower()
    jaams[key] = pd.read_csv(file, index_col = False)
    jaams[key].rename(columns=lambda x: '{}.'.format(key) + x.lower(), inplace = True)

## JAAMS PO_HEADERS_ALL to Prism Header.docnum

In [187]:
header = prism['header']
header['header.dockey'] = header['header.dockey'].astype(np.int64)
header['header.verkey'] = header['header.verkey'].astype(np.int64)

po_headers_all = jaams['po_headers_all']

jp_merge = pd.merge(
    po_headers_all,
    header,
    left_on = 'po_headers_all.segment1',
    right_on = 'header.docnum'
)
print ('Number of po_headers_all records matched to prism header: {}'.format(len(jp_merge.index)))

Number of po_headers_all records matched to prism header: 862
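
The count above only reports the matches. One way to see what did not match, sketched here on made-up award IDs, is an outer merge with pandas' `indicator=True` option, which adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`. The two small frames are hypothetical stand-ins for `po_headers_all` and `header`.

```python
import pandas as pd

# Hypothetical stand-ins for jaams['po_headers_all'] and prism['header'].
po_headers_all = pd.DataFrame({
    'po_headers_all.segment1': ['A-001', 'A-002', 'A-003'],
})
header = pd.DataFrame({
    'header.docnum': ['A-002', 'A-003', 'A-004'],
})

# An outer merge keeps unmatched rows from both sides; indicator=True labels them.
audit = pd.merge(
    po_headers_all, header,
    left_on='po_headers_all.segment1',
    right_on='header.docnum',
    how='outer',
    indicator=True,
)

match_counts = audit['_merge'].value_counts()
print(match_counts)
```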


## Merge in Other Required Data

### GRANTHEADER (POP congressional district)

Join columns = dockey, verkey  
1:1?  
via data/prism/joins.txt

In [188]:
grantheader = prism['grantheader']
grantheader['grantheader.dockey'] = grantheader['grantheader.dockey'].astype(np.int64)
grantheader['grantheader.verkey'] = grantheader['grantheader.verkey'].astype(np.int64)
jp_merge = pd.merge(
    jp_merge, grantheader,
    left_on = ['header.dockey', 'header.verkey'],
    right_on = ['grantheader.dockey', 'grantheader.verkey']
)
len(jp_merge.index)

862
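
The "1:1?" questions in this notebook could be checked directly rather than inferred from row counts. A possible approach, shown on invented keys, is the `validate` argument of `pd.merge`, which raises `pandas.errors.MergeError` when the stated cardinality does not hold. The sample frames and the `is_one_to_one` flag are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: document 2 appears twice in grantheader.
header = pd.DataFrame({
    'header.dockey': [1, 2],
    'header.verkey': [0, 0],
})
grantheader = pd.DataFrame({
    'grantheader.dockey': [1, 2, 2],
    'grantheader.verkey': [0, 0, 0],
    'grantheader.sba1222congdistno': ['01', '02', '03'],
})

try:
    pd.merge(
        header, grantheader,
        left_on=['header.dockey', 'header.verkey'],
        right_on=['grantheader.dockey', 'grantheader.verkey'],
        validate='one_to_one',  # fail loudly if either side repeats a key
    )
    is_one_to_one = True
except pd.errors.MergeError as err:
    is_one_to_one = False
    print('Not 1:1: {}'.format(err))
```

The same argument accepts `'one_to_many'` and `'many_to_one'`, which would suit the PO_LINES_ALL join.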

### Vendor (Prism)
According to the mapping document, the first address line of the awardee/recipient comes from SAM. The remaining portions of the address are mapped to the Prism table docvendor.

Join columns = dockey, verkey     
1:1?  
via data/prism/joins.txt  

In [189]:
docvendor = prism['docvendor']
jp_merge = pd.merge(
    jp_merge, docvendor,
    left_on = ['header.dockey', 'header.verkey'],
    right_on = ['docvendor.dockey', 'docvendor.verkey']
)
len(jp_merge.index)

862

### JAAMS Vendor (awardee/recipient location)

Join columns: vendor_id, vendor_site_id  
1:1?  
This join wasn't explicitly stated in data/jaams/sql/basic table joins.sql, so the columns used here are a guess.

In [190]:
ap_supplier_sites_all = jaams['ap_supplier_sites_all']
jp_merge = pd.merge(
    jp_merge, ap_supplier_sites_all,
    left_on = ['po_headers_all.vendor_id', 'po_headers_all.vendor_site_id'],
    right_on = ['ap_supplier_sites_all.vendor_id', 'ap_supplier_sites_all.vendor_site_id'] 
)
len(jp_merge.index)

862

### PO_LINES_ALL (for award amount info)

Join columns = po_header_id  
1:n  
via data/jaams/sql/basic table joins.sql

In [191]:
po_lines_all = jaams['po_lines_all']
jp_merge = pd.merge(
    jp_merge, po_lines_all,
    left_on = ['po_headers_all.po_header_id'],
    right_on = ['po_lines_all.po_header_id']
)
len(jp_merge.index)

2548

### PO_DISTRIBUTIONS_ALL (for type of transaction)

Join columns = po_header_id, po_line_id  
1:1 (i.e., in this sample data, at least, there was not more than one po distribution per po line)  
via data/jaams/sql/basic table joins.sql

In [192]:
po_distributions_all = jaams['po_distributions_all']
jp_merge = pd.merge(
    jp_merge, po_distributions_all,
    left_on = ['po_lines_all.po_header_id', 'po_lines_all.po_line_id'],
    right_on = ['po_distributions_all.po_header_id', 'po_distributions_all.po_line_id']
)
len(jp_merge.index)

2548

### GL_CODE_COMBINATIONS (funding office, object class, appropriations account)

Join columns = code_combination_id  
1:1 (i.e., in this sample data, there was one gl_code_combination per po_line)  
via data/jaams/sql/basic table joins.sql

In [193]:
gl_code_combinations = jaams['gl_code_combinations']
jp_merge = pd.merge(
    jp_merge, gl_code_combinations,
    left_on = 'po_distributions_all.code_combination_id',
    right_on = 'gl_code_combinations.code_combination_id'
)
len(jp_merge.index)

2548

### FV_FUND_PARAMETERS and FV_TREASURY_SYMBOLS (TAS)

In [194]:
fv_fund_parameters = jaams['fv_fund_parameters']
fv_treasury_symbols = jaams['fv_treasury_symbols']
jp_merge = pd.merge(
    pd.merge(fv_fund_parameters, fv_treasury_symbols, 
    left_on = 'fv_fund_parameters.treasury_symbol_id',
    right_on = 'fv_treasury_symbols.treasury_symbol_id'),
    jp_merge,
    left_on = 'fv_fund_parameters.fund_value',
    right_on = 'gl_code_combinations.segment2'
)
len(jp_merge.index)

2548
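
The nested merge above can also be read as two steps: first attach the Treasury symbol to each fund, then attach that result to the merged award data. The sketch below does this on invented values. The column `fv_treasury_symbols.treasury_symbol` and all sample data are hypothetical, and the merged frame is kept on the left so its columns come first.

```python
import pandas as pd

# Hypothetical fund and Treasury symbol tables.
fv_fund_parameters = pd.DataFrame({
    'fv_fund_parameters.fund_value': ['0100', '0200'],
    'fv_fund_parameters.treasury_symbol_id': [1, 2],
})
fv_treasury_symbols = pd.DataFrame({
    'fv_treasury_symbols.treasury_symbol_id': [1, 2],
    'fv_treasury_symbols.treasury_symbol': ['73X0100', '73X0200'],  # assumed column
})
# Hypothetical slice of the merged award data.
jp_merge = pd.DataFrame({
    'gl_code_combinations.segment2': ['0100', '0100', '0200'],
})

# Step 1: fund -> Treasury symbol.
funds = pd.merge(
    fv_fund_parameters, fv_treasury_symbols,
    left_on='fv_fund_parameters.treasury_symbol_id',
    right_on='fv_treasury_symbols.treasury_symbol_id',
)

# Step 2: award rows -> fund, matching on the fund segment of the GL code.
with_tas = pd.merge(
    jp_merge, funds,
    left_on='gl_code_combinations.segment2',
    right_on='fv_fund_parameters.fund_value',
)
print(len(with_tas.index))
```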

### FAADSCIV (record type, place of performance info)

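faadsciv joins to the header table through a crosswalk table (Association).  
See data/prism/joins.txt for more information.  
Expected 1:1, but the row count below grows from 2548 to 2552, so a few header records match more than one Association/faadsciv record.  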

In [195]:
#first merge in Association so we can map faadsciv back to header
association = prism['association']
jp_merge = pd.merge(
    jp_merge, association,
    left_on = ['header.dockey', 'header.verkey'],
    right_on = ['association.dockey', 'association.verkey']
)
#then use Association as the crosswalk
faadsciv = prism['faadsciv']
jp_merge = pd.merge(
    jp_merge, faadsciv,
    left_on = ['association.assocdockey', 'association.assocverkey'],
    right_on = ['faadsciv.dockey', 'faadsciv.verkey']
)
len(jp_merge.index)

2552
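
The count here is 2552, up from 2548 after the previous step, so some join key must repeat. A simple way to locate the repeats, sketched on a made-up crosswalk, is `DataFrame.duplicated` with `keep=False`, which flags every row that shares its key with another row. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical crosswalk: document 2 is associated with two FAADS records.
association = pd.DataFrame({
    'association.dockey': [1, 2, 2, 3],
    'association.verkey': [0, 0, 0, 0],
    'association.assocdockey': [11, 12, 13, 14],
    'association.assocverkey': [0, 0, 0, 0],
})

keys = ['association.dockey', 'association.verkey']

# keep=False marks all members of a duplicated group, not just the later ones.
repeated = association[association.duplicated(subset=keys, keep=False)]
print(repeated)
```

Running the same check on `faadsciv` with its own `dockey` and `verkey` columns would show whether the extra rows come from the crosswalk or from the FAADS table itself.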

### Prism: Docaddr (awarding office name)

Join columns: header.issuingdocaddresskey = docaddr.docaddrkey  
1:?  
via data/prism/joins.txt

#### Docaddr is missing, so hard-coding it based on the provided file: data_act_prism_grants_fy14.csv

In [196]:
jp_merge['docaddr.name'] = 'Office of Grants Management'

### Prism: Vendor2 (recipient type)

Join columns: ??  
1:??  
data/prism/joins.txt does not include Vendor2

#### Vendor2 is missing, so hard-coding it based on the provided file: data_act_prism_grants_fy14.csv

In [197]:
jp_merge['vendor2.businesstype'] = 'Other nonprofit'

### JAAMS: FIND_FLEX_VALUES_VL (funding office code)

Join columns: ??  
1:??  
data/jaams/sql/basic table joins.sql does not include find_flex_values_vl

#### FIND_FLEX_VALUES_VL is missing, so hard-coding based on provided file: data_act_prism_grants_fy14.csv

In [198]:
jp_merge['find_flex_values_vl'] = '602001'

## Add Calculated Fields and Various Hard-Coding

In [199]:
jp_merge['po_lines_all.total_amount'] = jp_merge['po_lines_all.quantity'] * jp_merge['po_lines_all.unit_price']
jp_merge['funding_agency_name'] = 'Small Business Administration'
jp_merge['funding_agency_code'] = '073'
jp_merge['funding_sub_tier_agency_name'] = 'Small Business Administration'
jp_merge['funding_sub_tier_agency_code'] = '073'
jp_merge['awarding_agency_name'] = 'Small Business Administration'
jp_merge['awarding_agency_code'] = '073'
jp_merge['awarding_sub_tier_agency_name'] = 'Small Business Administration'
jp_merge['awarding_sub_tier_agency_code'] = '073'
jp_merge['federal_agency'] = 'Small Business Administration'
jp_merge['tas'] = '730100' #this is what was listed in itemacct.tas#
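
To make the calculated amount concrete, here is a small sketch of the `quantity * unit_price` calculation on invented purchase order lines, followed by a per-award sum. The sample values and the `award_totals` name are assumptions, and the sum ignores cancellations and other adjustments, as the calculated field itself does.

```python
import pandas as pd

# Hypothetical PO lines: award A-001 has two lines, A-002 has one.
lines = pd.DataFrame({
    'po_headers_all.segment1': ['A-001', 'A-001', 'A-002'],
    'po_lines_all.quantity': [2, 1, 10],
    'po_lines_all.unit_price': [100.0, 50.0, 5.0],
})

# Line-level amount, computed the same way as the calculated field above.
lines['po_lines_all.total_amount'] = (
    lines['po_lines_all.quantity'] * lines['po_lines_all.unit_price']
)

# Roll the line amounts up to one total per award ID.
award_totals = lines.groupby('po_headers_all.segment1')['po_lines_all.total_amount'].sum()
print(award_totals)
```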

## Reduce the Huge Merged File to DATA Act Elements

In [200]:
jp_merge = jp_merge.drop_duplicates()
data_act = jp_merge[[
    'po_lines_all.item_description', #award description
    'po_headers_all.segment1', #award id
    'header.issuingdocaddresskey', #awarding office code
    'header.awarddate', #action date
    'docvendor.name', #Recipient name
    'po_distributions_all.attribute10', #period of performance start date
    'po_distributions_all.attribute11', #period of performance end date
    'docvendor.duns', #awardee/recipient legal business DUNS
    'docvendor.dunsplus4', #awardee/recipient legal business DUNS+4
    'docvendor.address1', #awardee/recipient legal business street address line 1
    'ap_supplier_sites_all.address_line1', #just including this in to see if it matches address above
    'ap_supplier_sites_all.address_line2', #awardee/recipient legal business street address line 2
    'ap_supplier_sites_all.address_line3', #awardee/recipient legal business street address line 3
    'ap_supplier_sites_all.city', #awardee/recipient legal business city
    'ap_supplier_sites_all.state', #awardee/recipient state
    'ap_supplier_sites_all.zip', #awardee/recipient us zip code + 4; awardee/recipient postal code
    'grantheader.sba1222countyname', #recipient county name
    'grantheader.sba1222countycode', #recipient county code
    'po_lines_all.quantity',
    'po_lines_all.unit_price',
    'po_lines_all.total_amount', #funding action obligation (does not account for cancellations etc.)
    'funding_agency_name', #funding agency name
    'funding_agency_code', #funding agency code
    'funding_sub_tier_agency_name', #funding sub-tier agency name
    'funding_sub_tier_agency_code', #funding sub-tier agency code
    'po_distributions_all.attribute_category', #type of award code
    'header.obligatedamt', #amount of ba appropriated; obligation
    'po_distributions_all.quantity_billed', #outlay
    'gl_code_combinations.segment3', #funding office name
    'gl_code_combinations.segment5', #object class woo!
    'gl_code_combinations.code_combination_id', #appropriations account
    'gl_code_combinations.segment4', #program activity
    'grantheader.sba1222congdistno', #place of performance congressional district
    'faadsciv.recordtype', #record type
    'faadsciv.countycityname', #primary place of performance city code (note: there were no census_code columns)
    'faadsciv.principalstatecode', #primary place of performance state code
    'faadsciv.principalstatename', #primary place of performance state name
    'faadsciv.placeofperfzip', #primary place of performance zip code + 4
    'faadsciv.placeofperfcountrycode', #primary location of performance country code
    'faadsciv.placeofperfcountryname', #primary location of performance country name
    'faadsciv.cfdaprogramnumber', #cfda program number
    'faadsciv.cfdaprogramtitle', #cfda program title
    'docaddr.name', #awarding office name
    'vendor2.businesstype', #recipient type
    'awarding_agency_name',
    'awarding_agency_code',
    'awarding_sub_tier_agency_name',
    'awarding_sub_tier_agency_code',
    'federal_agency'
    ]]

#write out the data act file
data_act.to_csv('data/data_act.csv', index = False)
#also write out the entire merged file, so we can look for interesting things
jp_merge.to_csv('data/jp_merge.csv', index = False)
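
With a column list this long, it may be worth checking it before selecting. The sketch below, on a tiny hypothetical frame, reports names that appear more than once in the list (which would produce repeated columns in the output file) and names that are absent from the merged data (which would raise a `KeyError`). The `wanted` list and sample frame are invented for illustration.

```python
import pandas as pd

# Hypothetical merged frame and requested column list.
jp_merge = pd.DataFrame({
    'funding_agency_name': ['Small Business Administration'],
    'funding_agency_code': ['073'],
    'tas': ['730100'],
})
wanted = [
    'funding_agency_name',
    'funding_agency_code',
    'funding_agency_name',  # listed twice
    'docaddr.name',         # not present in this sample frame
]

wanted_idx = pd.Index(wanted)

# Names requested more than once.
repeated = wanted_idx[wanted_idx.duplicated()].tolist()

# Names that the merged frame does not contain.
missing = [col for col in wanted if col not in jp_merge.columns]

print('repeated:', repeated)
print('missing:', missing)
```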