# SBA DATA Act Pilot: JAAMS + Prism

The code below uses the JAAMS and Prism test data supplied by SBA to create a single file that contains the required DATA Act elements. This is a first pass, and this notebook tracks the process, assumptions, and outstanding questions.

```python
# style up the notebook
# from IPython.display import HTML
# css_file = 'assets/css/notebook-style.css'
# HTML(open(css_file, "r").read())
```

```python
import glob
import numpy as np
import pandas as pd

# silence SettingWithCopyWarning when adding columns to slices of merged frames
pd.options.mode.chained_assignment = None
pd.set_option('display.expand_frame_repr', True)
```

## Joining JAAMS and Prism

This work hinges on the ability to match SBA's financial system (JAAMS) with its grants system (Prism). The starting point was the award ID identified in the mapping document, JAAMS PO_HEADERS_ALL.SEGMENT1, which we matched to Prism HEADER.DOCNUM.

Once JAAMS and Prism were joined, we built out from there, pulling in other financial and grants tables using the provided files and SQL join statements.

The image shows how we joined the tables to gather the required DATA Act information.

![SBA DATA Act Mappings](assets/images/jaams-prism-data-act-mapping.png)

## Parse all JAAMS & Prism Extracts

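This process expects comma-delimited, quoted .txt files in the directory structure below. Table names are lowercased when the files are loaded, so case in the base file name doesn't matter. The .txt extension itself should be lowercase, because glob matching is case-sensitive on Linux and macOS.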
```
        data  
            jaams  
                FV_FUND_PARAMETERS.txt  
                FV_TREASURY_SYMBOLS.txt  
                etc.  
            prism  
                association.txt  
                docaddr.txt  
                etc.  
```
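
Because the extracts are found with glob('*.txt'), a file saved as HEADER.TXT would be skipped on Linux or macOS. Below is a minimal sketch of a case-insensitive alternative. It assumes the same data/jaams and data/prism layout, and table_key and list_extracts are hypothetical helper names that are not part of this notebook:

```python
import os


def table_key(path):
    """Map a path such as 'data/jaams/FV_FUND_PARAMETERS.txt' to 'fv_fund_parameters'."""
    return os.path.splitext(os.path.basename(path))[0].lower()


def list_extracts(folder):
    """Return {table_key: path} for every .txt file in folder, ignoring case in the extension."""
    extracts = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # compare the extension in lowercase so .txt, .TXT and .Txt all qualify
        if os.path.isfile(path) and name.lower().endswith('.txt'):
            extracts[table_key(path)] = path
    return extracts
```

For example, a folder holding FV_FUND_PARAMETERS.txt and association.TXT would yield the keys fv_fund_parameters and association, which match the dictionary keys used in the merges below.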
                

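```python
import os


def load_extracts(folder, **read_csv_kwargs):
    """Read every .txt extract in folder into a dict keyed by lowercase table name.

    Each column is renamed to '<table>.<column>' (lowercase) so that columns from
    different tables stay distinguishable after merging.
    """
    tables = {}
    for path in glob.glob(os.path.join(folder, '*.txt')):
        # os.path handles both / and \ separators, unlike split('/')
        key = os.path.splitext(os.path.basename(path))[0].lower()
        df = pd.read_csv(path, **read_csv_kwargs)
        df.columns = ['{}.{}'.format(key, col.lower()) for col in df.columns]
        tables[key] = df
    return tables


prism = load_extracts(os.path.join('data', 'prism'))
# index_col=False stops pandas from using the first column as the index
# (for example, when rows end with a trailing delimiter)
jaams = load_extracts(os.path.join('data', 'jaams'), index_col=False)
```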
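
Every column is prefixed with its table name (header.docnum, po_headers_all.segment1, and so on). That prefix lets the merges below name columns unambiguously, even when several tables share a column such as DOCKEY. Here is a minimal illustration on two made-up frames. prefix_columns is a hypothetical helper that mirrors the rename in the loading code:

```python
import pandas as pd


def prefix_columns(df, table):
    """Return a copy of df with columns renamed to '<table>.<column>' in lowercase."""
    return df.rename(columns=lambda col: '{}.{}'.format(table, col.lower()))


# made-up stand-ins for two extracts that share DOCKEY/VERKEY column names
header = pd.DataFrame({'DOCKEY': [1, 2], 'VERKEY': [1, 1], 'DOCNUM': ['SBA-001', 'SBA-002']})
docvendor = pd.DataFrame({'DOCKEY': [1, 2], 'VERKEY': [1, 1], 'NAME': ['Acme', 'Globex']})

header = prefix_columns(header, 'header')
docvendor = prefix_columns(docvendor, 'docvendor')

# with prefixes, both copies of the join keys survive under distinct names
merged = pd.merge(
    header, docvendor,
    left_on=['header.dockey', 'header.verkey'],
    right_on=['docvendor.dockey', 'docvendor.verkey'],
)
print(list(merged.columns))
```

Without the prefixes, the shared DOCKEY and VERKEY names would collide, and pandas would fall back to its default _x/_y suffixes.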

## JAAMS PO_HEADERS_ALL to Prism Header.docnum

```python
# inner join (the pandas default): keep only awards present in both systems
jp_merge = pd.merge(
    jaams['po_headers_all'],
    prism['header'],
    left_on='po_headers_all.segment1',
    right_on='header.docnum'
)
print('Number of po_headers_all records matched to prism header: {}'.format(len(jp_merge.index)))
```

```
Number of po_headers_all records matched to prism header: 862
```
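
The inner merge above reports only the matches. To see which award IDs exist in one system but not the other, an outer merge with indicator=True labels each row as both, left_only, or right_only. Below is a minimal sketch on made-up keys. It also normalizes both key columns to trimmed strings, on the assumption that SEGMENT1 and DOCNUM may be read with different dtypes or stray whitespace; recent pandas versions raise a ValueError when asked to merge an integer key with a string key. normalize_key and the award_id column are hypothetical:

```python
import pandas as pd

# made-up stand-ins for the two extracts (column names follow the notebook's prefixing)
po_headers_all = pd.DataFrame({'po_headers_all.segment1': [1001, 1002, 1003]})
header = pd.DataFrame({'header.docnum': ['1001', '1003 ', '2001']})


def normalize_key(series):
    """Compare keys as trimmed strings so 1001 and '1001 ' line up."""
    return series.astype(str).str.strip()


left = po_headers_all.assign(award_id=normalize_key(po_headers_all['po_headers_all.segment1']))
right = header.assign(award_id=normalize_key(header['header.docnum']))

# outer join keeps unmatched rows from both sides; indicator=True records where each row came from
check = pd.merge(left, right, on='award_id', how='outer', indicator=True)
print(check['_merge'].value_counts())

unmatched_jaams = check.loc[check['_merge'] == 'left_only', 'award_id'].tolist()
unmatched_prism = check.loc[check['_merge'] == 'right_only', 'award_id'].tolist()
```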


## Merge in Other Required Data

### GRANTHEADER (place of performance congressional district)

```python
jp_merge = pd.merge(
    jp_merge, prism['grantheader'],
    left_on=['header.dockey', 'header.verkey'],
    right_on=['grantheader.dockey', 'grantheader.verkey']
)
len(jp_merge.index)
```

```
862
```

### Vendor (Prism)
According to the mapping document, the first address line of the awardee/recipient comes from SAM. The remaining portions of the address are mapped to the Prism table docvendor.
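
The output file currently takes docvendor.address1 for address line 1. If a SAM extract becomes available, one way to follow the mapping document is to prefer the SAM value and fall back to Prism only where SAM has none. Below is a minimal sketch. The SAM frame, its sam.duns and sam.address1 columns, and the fallback rule are all hypothetical assumptions, not something the mapping document or the pilot data specify:

```python
import pandas as pd

# made-up Prism vendor rows, using the notebook's prefixed column names
docvendor = pd.DataFrame({
    'docvendor.duns': ['111111111', '222222222'],
    'docvendor.address1': ['1 Prism St', '2 Prism Ave'],
    'docvendor.city': ['Boston', 'Denver'],
})
# hypothetical SAM extract keyed on DUNS: only the first vendor has an address line 1
sam = pd.DataFrame({
    'sam.duns': ['111111111'],
    'sam.address1': ['100 SAM Blvd'],
})

# left join so vendors missing from SAM are kept
vendor = pd.merge(docvendor, sam, left_on='docvendor.duns', right_on='sam.duns', how='left')
# address line 1 from SAM when present; otherwise keep the Prism value
vendor['recipient_address_line_1'] = vendor['sam.address1'].fillna(vendor['docvendor.address1'])
print(vendor[['docvendor.duns', 'recipient_address_line_1', 'docvendor.city']])
```

DUNS values are kept as strings here so that leading zeros survive the join.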

```python
jp_merge = pd.merge(
    jp_merge, prism['docvendor'],
    left_on=['header.dockey', 'header.verkey'],
    right_on=['docvendor.dockey', 'docvendor.verkey']
)
len(jp_merge.index)
```

```
862
```

### Item, Deliverylocdate, and Itemacct (Prism)

```python
jp_merge = pd.merge(
    jp_merge, prism['item'],
    left_on=['header.dockey', 'header.verkey'],
    right_on=['item.hdrdockey', 'item.hdrverkey']
)
len(jp_merge.index)
```

```
865
```

```python
jp_merge = pd.merge(
    jp_merge, prism['deliverylocdate'],
    left_on=['item.dockey', 'item.verkey'],
    right_on=['deliverylocdate.dockey', 'deliverylocdate.verkey']
)
len(jp_merge.index)
```

```
869
```

```python
jp_merge = pd.merge(
    jp_merge, prism['itemacct'],
    left_on='deliverylocdate.deliverylocdatekey',
    right_on='itemacct.deliverylocdatekey'
)
len(jp_merge.index)
```

```
1022
```
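
Row counts rise from 862 to 1,022 across these three merges. An inner join can only add rows when some keys match more than one row, for example an award with several items or accounting lines. pandas' merge accepts validate='one_to_one', 'one_to_many', 'many_to_one', or 'many_to_many' and raises MergeError when the data break that relationship, so an unexpected fan-out fails loudly. Below is a minimal sketch on made-up keys, with a hypothetical duplicated_keys helper that lists which right-hand keys repeat:

```python
import pandas as pd
from pandas.errors import MergeError

# made-up header and item rows: header (2, 1) has two items
headers = pd.DataFrame({'header.dockey': [1, 2], 'header.verkey': [1, 1]})
items = pd.DataFrame({
    'item.hdrdockey': [1, 2, 2],
    'item.hdrverkey': [1, 1, 1],
    'item.dockey': [10, 20, 21],
})


def duplicated_keys(df, cols):
    """Return the key combinations in df that occur more than once, with their counts."""
    counts = df.groupby(cols).size()
    return counts[counts > 1]


print(duplicated_keys(items, ['item.hdrdockey', 'item.hdrverkey']))

try:
    pd.merge(
        headers, items,
        left_on=['header.dockey', 'header.verkey'],
        right_on=['item.hdrdockey', 'item.hdrverkey'],
        validate='one_to_one',
    )
except MergeError as err:
    print('fan-out detected:', err)

# declaring the expected relationship documents the fan-out and lets it through
merged = pd.merge(
    headers, items,
    left_on=['header.dockey', 'header.verkey'],
    right_on=['item.hdrdockey', 'item.hdrverkey'],
    validate='one_to_many',
)
```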

### NAICS

```python
jp_merge = pd.merge(
    jp_merge, prism['naicssicdata'],
    left_on='header.primarysiccode',
    right_on='naicssicdata.naics',
    how='left'  # left join because NAICS may not apply to every award (open question)
)
len(jp_merge.index)
```

```
1022
```
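
A left join never drops rows. Awards whose header.primarysiccode has no match in naicssicdata stay in the file, with NaN in the naicssicdata columns. Below is a minimal sketch on made-up codes that contrasts this with an inner join and lists the awards that found no NAICS entry. The naicssicdata.descr column is made up for the example:

```python
import pandas as pd

# made-up awards: A2 has no code, and A3's code is not in the lookup
awards = pd.DataFrame({
    'header.docnum': ['A1', 'A2', 'A3'],
    'header.primarysiccode': ['541611', None, '999999'],
})
naics = pd.DataFrame({
    'naicssicdata.naics': ['541611'],
    'naicssicdata.descr': ['Administrative Management Consulting'],
})

inner = pd.merge(awards, naics, left_on='header.primarysiccode', right_on='naicssicdata.naics')
left = pd.merge(awards, naics, left_on='header.primarysiccode', right_on='naicssicdata.naics', how='left')

# rows the left join kept but could not match
no_naics = left.loc[left['naicssicdata.naics'].isna(), 'header.docnum'].tolist()
print(len(inner), len(left), no_naics)
```

Unlike SQL, pandas matches null keys on both sides to each other. It is worth checking both frames for missing codes before trusting the match.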

### JAAMS Vendor (awardee/recipient location)

```python
jp_merge = pd.merge(
    jp_merge, jaams['ap_supplier_sites_all'],
    left_on=['po_headers_all.vendor_id', 'po_headers_all.vendor_site_id'],
    right_on=['ap_supplier_sites_all.vendor_id', 'ap_supplier_sites_all.vendor_site_id']
)
len(jp_merge.index)
```

```
1022
```

### PO_LINES_ALL (for award amount info)

```python
jp_merge = pd.merge(
    jp_merge, jaams['po_lines_all'],
    left_on=['po_headers_all.po_header_id'],
    right_on=['po_lines_all.po_header_id']
)
len(jp_merge.index)
```

```
3148
```
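
Joining PO_LINES_ALL roughly triples the row count (1,022 to 3,148). Any header-level amount such as header.amount now repeats once per line, so summing it across the merged file would overcount. If the award amount is meant to come from the lines, one option is to aggregate before merging. Below is a minimal sketch. It assumes the amount is the sum of quantity × unit_price per po_header_id, which is an assumption since the mapping document's exact rule isn't shown here, and line_total_amount is a hypothetical column name:

```python
import pandas as pd

# made-up line data using the notebook's prefixed column names
po_lines_all = pd.DataFrame({
    'po_lines_all.po_header_id': [1, 1, 2],
    'po_lines_all.quantity': [2, 1, 5],
    'po_lines_all.unit_price': [100.0, 50.0, 10.0],
})

# assumed rule: line amount = quantity * unit price, summed per header
line_amount = po_lines_all['po_lines_all.quantity'] * po_lines_all['po_lines_all.unit_price']
award_amounts = (
    line_amount.groupby(po_lines_all['po_lines_all.po_header_id'])
    .sum()
    .rename('line_total_amount')
    .reset_index()
)
print(award_amounts)
```

The result has one row per header, so it can be merged onto po_headers_all.po_header_id without fanning out.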

### PO_DISTRIBUTIONS_ALL (for type of transaction)

```python
jp_merge = pd.merge(
    jp_merge, jaams['po_distributions_all'],
    left_on=['po_lines_all.po_header_id', 'po_lines_all.po_line_id'],
    right_on=['po_distributions_all.po_header_id', 'po_distributions_all.po_line_id']
)
len(jp_merge.index)
```

```
3148
```

### GL_CODE_COMBINATIONS (funding office, object class, appropriations account)

```python
jp_merge = pd.merge(
    jp_merge, jaams['gl_code_combinations'],
    left_on='po_distributions_all.code_combination_id',
    right_on='gl_code_combinations.code_combination_id'
)
len(jp_merge.index)
```

```
3148
```

### FV_FUND_PARAMETERS and FV_TREASURY_SYMBOLS (TAS)

```python
# build the fund -> Treasury symbol lookup first
tas_lookup = pd.merge(
    jaams['fv_fund_parameters'], jaams['fv_treasury_symbols'],
    left_on='fv_fund_parameters.treasury_symbol_id',
    right_on='fv_treasury_symbols.treasury_symbol_id'
)
# then attach it via gl_code_combinations.segment2, which holds the fund value
jp_merge = pd.merge(
    tas_lookup, jp_merge,
    left_on='fv_fund_parameters.fund_value',
    right_on='gl_code_combinations.segment2'
)
len(jp_merge.index)
```

```
3148
```
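
Both systems carry a TAS. JAAMS supplies it through fv_treasury_symbols.treasury_symbol (merged in here), and Prism through itemacct.tas#, which the output file uses. A quick consistency check compares the two on the merged rows. Below is a minimal sketch on made-up values. Trimming and uppercasing is only an assumption about how the two formats might differ, and real values may need more normalization (for example, different separators):

```python
import pandas as pd

# made-up merged rows carrying both TAS sources
merged = pd.DataFrame({
    'fv_treasury_symbols.treasury_symbol': ['TAS-A', 'TAS-B', 'TAS-C'],
    'itemacct.tas#': ['tas-a ', 'TAS-B', 'TAS-X'],
})


def normalize_tas(series):
    """Trim and uppercase so trivial formatting differences don't count as mismatches."""
    return series.astype(str).str.strip().str.upper()


mismatch = (
    normalize_tas(merged['fv_treasury_symbols.treasury_symbol'])
    != normalize_tas(merged['itemacct.tas#'])
)
print('{} of {} rows disagree'.format(mismatch.sum(), len(merged)))
print(merged[mismatch])
```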

### FAADSCIV (record type, place of performance info)

```python
# first merge in Association so we can map faadsciv back to header
jp_merge = pd.merge(
    jp_merge, prism['association'],
    left_on=['header.dockey', 'header.verkey'],
    right_on=['association.dockey', 'association.verkey']
)
# then use Association as the crosswalk
jp_merge = pd.merge(
    jp_merge, prism['faadsciv'],
    left_on=['association.assocdockey', 'association.assocverkey'],
    right_on=['faadsciv.dockey', 'faadsciv.verkey']
)
len(jp_merge.index)
```

```
3152
```
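
The count rises from 3,148 to 3,152 here, so the crosswalk duplicated some rows. Either a header has several ASSOCIATION rows, or an associated key matches several FAADSCIV rows. The earlier line-level merges already repeat each header, so it is easier to look for the cause in ASSOCIATION itself, before merging. Below is a minimal sketch on made-up rows that counts how many distinct documents each header version is associated with. ASSOCIATION may also link document types other than FAADSCIV, which the inner join filters out, so a count above one is a lead rather than proof:

```python
import pandas as pd

# made-up ASSOCIATION rows: header (2, 1) points at two different documents
association = pd.DataFrame({
    'association.dockey': [1, 2, 2],
    'association.verkey': [1, 1, 1],
    'association.assocdockey': [100, 200, 201],
    'association.assocverkey': [1, 1, 1],
})

# number of distinct associated documents per header version
linked_docs = (
    association
    .groupby(['association.dockey', 'association.verkey'])['association.assocdockey']
    .nunique()
)
multi_linked = linked_docs[linked_docs > 1].reset_index(name='linked_doc_count')
print(multi_linked)
```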

### Prism: Docaddr (awarding office name)

```python
jp_merge = pd.merge(
    jp_merge, prism['docaddr'],
    left_on='header.issuingdocaddresskey',
    right_on='docaddr.docaddrkey'
)
len(jp_merge.index)
```

```
3152
```

## Add Calculated Fields and Various Hard-Coding

```python
jp_merge['funding_agency_name'] = 'Small Business Administration'
jp_merge['funding_agency_code'] = '073'
jp_merge['funding_sub_tier_agency_name'] = 'Small Business Administration'
jp_merge['funding_sub_tier_agency_code'] = '073'
jp_merge['awarding_agency_name'] = 'Small Business Administration'
jp_merge['awarding_agency_code'] = '073'
jp_merge['awarding_sub_tier_agency_name'] = 'Small Business Administration'
jp_merge['awarding_sub_tier_agency_code'] = '073'
```
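
The ten grantheader non-federal budget columns (sba1222personalservicenf through sba1222othercostnf) are carried into the output as inputs to a non-federal funding amount calculation that isn't done yet. Below is a minimal sketch of that calculated field. It assumes the amount is simply the row-wise sum of those columns, with blanks treated as zero, which should be confirmed against the mapping document. add_non_federal_amount and non_federal_funding_amount are hypothetical names:

```python
import pandas as pd

NF_COLUMNS = [
    'grantheader.sba1222personalservicenf',
    'grantheader.sba1222fringebenefitsnf',
    'grantheader.sba1222consultantsnf',
    'grantheader.sba1222travelnf',
    'grantheader.sba1222equipmentnf',
    'grantheader.sba1222suppliesnf',
    'grantheader.sba1222contractualnf',
    'grantheader.sba1222othernf',
    'grantheader.sba1222indcostnf',
    'grantheader.sba1222othercostnf',
]


def add_non_federal_amount(df, columns=NF_COLUMNS):
    """Return a copy of df with a hypothetical 'non_federal_funding_amount' column.

    Assumption: the amount is the row-wise sum of the budget columns, with
    non-numeric or missing values treated as zero.
    """
    values = df[columns].apply(pd.to_numeric, errors='coerce').fillna(0)
    out = df.copy()
    out['non_federal_funding_amount'] = values.sum(axis=1)
    return out


# made-up example: one award with two non-federal budget lines filled in
example = pd.DataFrame([{col: None for col in NF_COLUMNS}])
example.loc[0, 'grantheader.sba1222personalservicenf'] = 1000
example.loc[0, 'grantheader.sba1222travelnf'] = '250.50'
print(add_non_federal_amount(example)['non_federal_funding_amount'].iloc[0])
```

Applied to the merged data, this would be jp_merge = add_non_federal_amount(jp_merge), with the new column then added to the selection below.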

## Reduce the Huge Merged File to DATA Act Elements

```python
# drop exact duplicate rows from the full merge, then keep only the DATA Act elements
jp_merge = jp_merge.drop_duplicates()
data_act = jp_merge[[
    'po_headers_all.segment1', #award id
    'header.versionnum', #award modification
    'po_lines_all.item_description', #award description
    #'header.shortdescr', #award description
    'header.awarddate', #action date
    'docvendor.name', #recipient name
    'po_distributions_all.attribute10', #period of performance start date
    #'header.startdate', #period of performance start date
    'po_distributions_all.attribute11', #period of performance end date
    #'header.enddate', #period of performance end date
    'docvendor.duns', #awardee/recipient legal business DUNS
    'docvendor.dunsplus4', #awardee/recipient legal business DUNS+4
    'docvendor.address1', #awardee/recipient legal business street address line 1
    'docvendor.address2', #awardee/recipient legal business street address line 2
    'docvendor.address3', #awardee/recipient legal business street address line 3
    'docvendor.city', #awardee/recipient legal business city
    'docvendor.state', #awardee/recipient state
    'docvendor.zip', #awardee/recipient U.S. ZIP code + 4; awardee/recipient postal code
    'header.amount', #current total value of award/potential total value of award
    'header.awardtype', #type of award
    'header.primarysiccode', #NAICS code
    'grantheader.sba1222countyname', #recipient county name
    'grantheader.sba1222countycode', #recipient county code
    'grantheader.recipientcountrycode', #awardee/recipient legal business country code
    'grantheader.recipientcountryname', #awardee/recipient legal business country name
    'grantheader.sba1222personalservicenf', #used for non-fed funding amt calculation
    'grantheader.sba1222fringebenefitsnf', #used for non-fed funding amt calculation
    'grantheader.sba1222consultantsnf', #used for non-fed funding amt calculation
    'grantheader.sba1222travelnf', #used for non-fed funding amt calculation
    'grantheader.sba1222equipmentnf', #used for non-fed funding amt calculation
    'grantheader.sba1222suppliesnf', #used for non-fed funding amt calculation
    'grantheader.sba1222contractualnf', #used for non-fed funding amt calculation
    'grantheader.sba1222othernf', #used for non-fed funding amt calculation
    'grantheader.sba1222indcostnf', #used for non-fed funding amt calculation
    'grantheader.sba1222othercostnf', #used for non-fed funding amt calculation
    #'fv_treasury_symbols.treasury_symbol', #TAS
    'itemacct.tas#', #TAS
    'po_lines_all.quantity', #award amount info
    'po_lines_all.unit_price', #award amount info
    'funding_agency_name', #funding agency name
    'funding_agency_code', #funding agency code
    'funding_sub_tier_agency_name', #funding sub-tier agency name
    'funding_sub_tier_agency_code', #funding sub-tier agency code
    'po_distributions_all.attribute_category', #type of award code
    'header.obligatedamt', #amount of ba appropriated; obligation
    'po_distributions_all.quantity_billed', #outlay
    'gl_code_combinations.segment3', #funding office name
    'itemacct.acctfield3', #funding office name and funding office code
    'gl_code_combinations.segment5', #object class woo!
    #'itemacct.acctfield5', #object class
    #'gl_code_combinations.code_combination_id', #appropriations account
    'itemacct.acctfield6', #appropriations account
    'gl_code_combinations.segment4', #program activity
    #'itemacct.acctfield4', #program activity
    'grantheader.sba1222congdistno', #place of performance congressional district
    'faadsciv.recordtype', #record type
    'faadsciv.countycityname', #primary place of performance county name/primary place of performance city
    'faadsciv.countycitycode', #primary place of performance county code
    'faadsciv.principalstatecode', #primary place of performance state code
    'faadsciv.principalstatename', #primary place of performance state name
    'faadsciv.placeofperfzip', #primary place of performance ZIP code + 4
    'faadsciv.placeofperfcountrycode', #primary place of performance country code
    'faadsciv.placeofperfcountryname', #primary place of performance country name
    'faadsciv.cfdaprogramnumber', #CFDA program number
    'faadsciv.cfdaprogramtitle', #CFDA program title
    'docaddr.name', #awarding office name
    'header.issuingdocaddresskey', #awarding office code
    'faadsciv.recipienttype', #recipient type
    'awarding_agency_name', #awarding agency name
    'awarding_agency_code', #awarding agency code
    'awarding_sub_tier_agency_name', #awarding sub-tier agency name
    'awarding_sub_tier_agency_code' #awarding sub-tier agency code
    ]]

#write out the DATA Act file
data_act.to_csv('data/data_act.csv', index=False)
#also write out the entire merged file, so we can look for interesting things
jp_merge.to_csv('data/jp_merge.csv', index=False)
```
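
Two checks are cheap before writing the file. First, a column listed twice in the selection appears twice in the CSV. Second, drop_duplicates() on the full merge won't remove rows that become identical once only the DATA Act columns remain. Below is a minimal sketch on a made-up frame. Whether duplicate output rows should actually be dropped is a judgment call for the pilot, so report_output_issues (a hypothetical helper) only reports them:

```python
import pandas as pd

# made-up output: one column name listed twice and two identical rows
data_act = pd.DataFrame(
    [['SBA-001', '073', '073'], ['SBA-001', '073', '073'], ['SBA-002', '073', '073']],
    columns=['po_headers_all.segment1', 'funding_agency_code', 'funding_agency_code'],
)


def report_output_issues(df):
    """Return (duplicated column names, number of fully duplicated rows)."""
    dup_columns = df.columns[df.columns.duplicated()].tolist()
    # check rows on the de-duplicated column set to avoid ambiguous labels
    unique_cols = df.loc[:, ~df.columns.duplicated()]
    dup_rows = int(unique_cols.duplicated().sum())
    return dup_columns, dup_rows


print(report_output_issues(data_act))
```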