# 3001_Create Analytic Dataset

This notebook wrangles the Urban Ministries of Durham (UMD) homeless shelter data into an analytic dataset.

## Import Data

In [3]:
import pandas as pd
import numpy as np

### CLIENT_191102.tsv

In [8]:
client = pd.read_csv("../data/client_191102.tsv", delimiter='\t', encoding='utf-8')
client.head()

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Client Age at Entry,Client Age at Exit,Client Gender,Client Primary Race,Client Ethnicity,Client Veteran Status
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,60.0,61.0,Female,White (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)
1,Urban Ministries of Durham - Durham County - S...,687902,kdaf01071967k400d635,130335,48.0,48.0,Female,Black or African American (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)
2,Urban Ministries of Durham - Durham County - S...,687903,smrf06211973s620m640,188933,42.0,42.0,Female,Black or African American (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)
3,Urban Ministries of Durham - Durham County - S...,687904,abrm07251958a416b600,168290,57.0,57.0,Male,White (HUD),Hispanic/Latino (HUD),No (HUD)
4,Urban Ministries of Durham - Durham County - S...,687905,wbom01251964w450b620,123122,51.0,51.0,Male,White (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)


In [3]:
client.groupby("Client ID").size().max()

37

This file contains multiple records per Client ID; the largest number of records for a single client is 37.

In [4]:
client_records=client.groupby("Client ID").size().reset_index(name='Size')
client_records[client_records.Size==37]

Unnamed: 0,Client ID,Size
773,320781,37


In [5]:
client.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    4319
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Entry- ESG(1970)          721
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Forward- ESG(5694)         65
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Outreach- ESG(4515)        51
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Recovery- ESG(1932)        67
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Tech- ESG(4516)            61
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Veterans- ESG(5069)        15
dtype: int64

I will limit the analyses to records with EE Provider ID = Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838). The other providers are all prefixed "XXXClosed2015", and I don't know what those programs are.

In [6]:
client=client[client["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
client.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    4319
dtype: int64
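
The same provider filter is applied to each table in this notebook. A minimal sketch of a reusable helper, shown on toy data: the names `SHELTER_PROVIDER` and `keep_shelter_records` are hypothetical and not part of the original notebook. Returning a copy is a design choice here; it avoids pandas' SettingWithCopyWarning when new columns are later assigned on the filtered frame.

```python
import pandas as pd

# Provider label copied from the filter used in this notebook.
SHELTER_PROVIDER = (
    "Urban Ministries of Durham - Durham County - "
    "Singles Emergency Shelter - Private(5838)"
)

def keep_shelter_records(df, provider=SHELTER_PROVIDER):
    """Return only the rows for one provider, as an independent copy."""
    return df[df["EE Provider ID"] == provider].copy()

# Toy frame standing in for any of the .tsv tables (hypothetical values).
toy = pd.DataFrame({
    "EE Provider ID": [SHELTER_PROVIDER, "XXXClosed2015 example program", SHELTER_PROVIDER],
    "EE UID": [1, 2, 3],
})
shelter_only = keep_shelter_records(toy)
print(shelter_only["EE UID"].tolist())
```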

I'm also not sure what the difference is between "Client Unique ID" and "Client ID", so I will use only "Client ID". I'm dropping "Client Unique ID" and "EE Provider ID": after the filter above, every remaining record comes from the Urban Ministries of Durham Singles Emergency Shelter and none from the XXXClosed2015 programs.

In [7]:
client = client.drop(['Client Unique ID', 'EE Provider ID'], axis=1)
client.head()

Unnamed: 0,EE UID,Client ID,Client Age at Entry,Client Age at Exit,Client Gender,Client Primary Race,Client Ethnicity,Client Veteran Status
0,687901,397941,60.0,61.0,Female,White (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)
1,687902,130335,48.0,48.0,Female,Black or African American (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)
2,687903,188933,42.0,42.0,Female,Black or African American (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)
3,687904,168290,57.0,57.0,Male,White (HUD),Hispanic/Latino (HUD),No (HUD)
4,687905,123122,51.0,51.0,Male,White (HUD),Non-Hispanic/Non-Latino (HUD),No (HUD)


In [8]:
client.groupby("Client Gender").size()
client.groupby("Client Primary Race").size()
client.groupby("Client Ethnicity").size()
client.groupby("Client Veteran Status").size()

Client Veteran Status
Data not collected (HUD)       2
No (HUD)                    3848
Yes (HUD)                    461
dtype: int64

In [9]:
# Set "Trans Female (MTF or Male to Female)" to missing to reduce the risk of identifying individual clients
client['Client Gender'] = client['Client Gender'].replace('Trans Female (MTF or Male to Female)', np.nan)
client.groupby("Client Gender").size()

Client Gender
Female    1036
Male      3268
dtype: int64

In [10]:
# Remove the " (HUD)" suffix from this response and convert don't know / refused / not collected to missing.
# str.replace is used because str.rstrip(" (HUD)") strips a set of characters, not the literal suffix.
client['Client Primary Race']=client['Client Primary Race'].str.replace(" (HUD)", "", regex=False).replace(["Client doesn't know", "Client refused", "Data not collected"], np.nan)
client.groupby("Client Primary Race").size()

Client Primary Race
American Indian or Alaska Native               74
Asian                                           3
Black or African American                    3133
Native Hawaiian or Other Pacific Islander      11
White                                        1086
dtype: int64

In [11]:
# Remove the " (HUD)" suffix from this response and convert don't know / refused / not collected to missing
client['Client Ethnicity']=client['Client Ethnicity'].str.replace(" (HUD)", "", regex=False).replace(["Client doesn't know", "Client refused", "Data not collected"], np.nan)
client.groupby("Client Ethnicity").size()

Client Ethnicity
Hispanic/Latino             163
Non-Hispanic/Non-Latino    4144
dtype: int64

In [12]:
# Remove the " (HUD)" suffix from this response and convert "Data not collected" to missing
client['Client Veteran Status']=client['Client Veteran Status'].str.replace(" (HUD)", "", regex=False).replace("Data not collected", np.nan)
client.groupby("Client Veteran Status").size()

Client Veteran Status
No     3848
Yes     461
dtype: int64

### ENTRY_EXIT_191102.tsv

In [13]:
entry_exit = pd.read_csv("../data/entry_exit_191102.tsv", delimiter='\t', encoding='utf-8')
entry_exit.head()

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Entry Exit Group Id,Entry Exit Household Id,Unnamed: 6,Entry Date,Housing Move-in Date(5584),Exit Date,Destination,Reason for Leaving,Entry Exit Type,Entry Exit Date Added,Entry Exit Date Updated
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,,,,8/15/2015,4/20/2015,7/11/2016,"Rental by client, with other ongoing housing s...",Completed program,HUD,8/19/2015,7/20/2016
1,Urban Ministries of Durham - Durham County - S...,687902,kdaf01071967k400d635,130335,,,,8/15/2015,,8/31/2015,Data not collected (HUD),Needs could not be met,HUD,8/19/2015,9/3/2015
2,Urban Ministries of Durham - Durham County - S...,687903,smrf06211973s620m640,188933,,,,8/15/2015,,9/19/2015,"Staying or living with friends, temporary tenu...",Other,HUD,8/19/2015,9/22/2015
3,Urban Ministries of Durham - Durham County - S...,687904,abrm07251958a416b600,168290,,,,8/15/2015,,3/7/2016,Hospital or other residential non-psychiatric ...,Other,HUD,8/19/2015,3/8/2016
4,Urban Ministries of Durham - Durham County - S...,687905,wbom01251964w450b620,123122,,,,8/15/2015,,8/24/2015,"Staying or living with friends, temporary tenu...",Other,HUD,8/19/2015,8/25/2015


In [14]:
entry_exit.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    4319
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Entry- ESG(1970)          721
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Forward- ESG(5694)         65
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Outreach- ESG(4515)        51
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Recovery- ESG(1932)        67
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Tech- ESG(4516)            61
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Veterans- ESG(5069)        15
dtype: int64

In [15]:
entry_exit=entry_exit[entry_exit["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
entry_exit.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    4319
dtype: int64

In [16]:
entry_exit = entry_exit[['EE UID', 'Entry Date', 'Exit Date', 'Destination']]
entry_exit.head()

Unnamed: 0,EE UID,Entry Date,Exit Date,Destination
0,687901,8/15/2015,7/11/2016,"Rental by client, with other ongoing housing s..."
1,687902,8/15/2015,8/31/2015,Data not collected (HUD)
2,687903,8/15/2015,9/19/2015,"Staying or living with friends, temporary tenu..."
3,687904,8/15/2015,3/7/2016,Hospital or other residential non-psychiatric ...
4,687905,8/15/2015,8/24/2015,"Staying or living with friends, temporary tenu..."


In [17]:
entry_exit[['Entry Date', 'Exit Date']] = entry_exit[['Entry Date', 'Exit Date']].apply(pd.to_datetime)
entry_exit.head()

Unnamed: 0,EE UID,Entry Date,Exit Date,Destination
0,687901,2015-08-15,2016-07-11,"Rental by client, with other ongoing housing s..."
1,687902,2015-08-15,2015-08-31,Data not collected (HUD)
2,687903,2015-08-15,2015-09-19,"Staying or living with friends, temporary tenu..."
3,687904,2015-08-15,2016-03-07,Hospital or other residential non-psychiatric ...
4,687905,2015-08-15,2015-08-24,"Staying or living with friends, temporary tenu..."


In [18]:
entry_exit['LOS']=entry_exit['Exit Date'] - entry_exit['Entry Date']
entry_exit.head()

Unnamed: 0,EE UID,Entry Date,Exit Date,Destination,LOS
0,687901,2015-08-15,2016-07-11,"Rental by client, with other ongoing housing s...",331 days
1,687902,2015-08-15,2015-08-31,Data not collected (HUD),16 days
2,687903,2015-08-15,2015-09-19,"Staying or living with friends, temporary tenu...",35 days
3,687904,2015-08-15,2016-03-07,Hospital or other residential non-psychiatric ...,205 days
4,687905,2015-08-15,2015-08-24,"Staying or living with friends, temporary tenu...",9 days


In [19]:
# Convert the LOS timedelta to a whole number of days
entry_exit["LOS"] = entry_exit["LOS"].dt.days
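
The length-of-stay steps above can be sketched end to end on toy data. This is an illustration under assumptions, not the notebook's exact code: it assumes the dates are always written month/day/year (as in the rows shown above) and passes an explicit `format`, and the third row with no exit date is a hypothetical open stay added to show that a missing date yields a missing LOS.

```python
import pandas as pd

stays = pd.DataFrame({
    "Entry Date": ["8/15/2015", "8/15/2015", "8/15/2015"],
    "Exit Date": ["7/11/2016", "8/24/2015", None],  # None: hypothetical open stay
})

# Parse both date columns with an explicit month/day/year format.
date_cols = ["Entry Date", "Exit Date"]
stays[date_cols] = stays[date_cols].apply(pd.to_datetime, format="%m/%d/%Y")

# Length of stay in whole days; a missing exit date gives NaN.
stays["LOS"] = (stays["Exit Date"] - stays["Entry Date"]).dt.days
print(stays)
```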

### DISABILITY_ENTRY_191102.tsv

In [20]:
disab_entry = pd.read_csv("../data/disability_entry_191102.tsv", delimiter='\t', encoding='utf-8')
disab_entry.head()

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Disability Determination (Entry),Disability Type (Entry),Disability Start Date (Entry),Disability End Date (Entry),Provider (417-provider),Recordset ID (417-recordset_id),Date Added (417-date_added)
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No (HUD),Alcohol Abuse (HUD),4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261529,7/16/2015
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No (HUD),Both Alcohol and Drug Abuse (HUD),4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261530,7/16/2015
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No (HUD),Chronic Health Condition (HUD),4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261524,7/16/2015
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No (HUD),Developmental (HUD),4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261527,7/16/2015
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No (HUD),Drug Abuse (HUD),4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261528,7/16/2015


In [21]:
disab_entry.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    36396
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Entry- ESG(1970)          3258
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Forward- ESG(5694)         490
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Outreach- ESG(4515)        409
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Recovery- ESG(1932)        264
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Tech- ESG(4516)            458
XXXClosed2015 Urban Ministries of Durham- Durham County- Journey Veterans- ESG(5069)        118
dtype: int64

In [22]:
disab_entry = disab_entry[disab_entry["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
disab_entry.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    36396
dtype: int64

In [23]:
disab_entry = disab_entry[['EE UID', 'Client ID', 'Disability Determination (Entry)', 'Disability Type (Entry)', 'Date Added (417-date_added)']]
disab_entry.head()

Unnamed: 0,EE UID,Client ID,Disability Determination (Entry),Disability Type (Entry),Date Added (417-date_added)
0,687901,397941,No (HUD),Alcohol Abuse (HUD),7/16/2015
1,687901,397941,No (HUD),Both Alcohol and Drug Abuse (HUD),7/16/2015
2,687901,397941,No (HUD),Chronic Health Condition (HUD),7/16/2015
3,687901,397941,No (HUD),Developmental (HUD),7/16/2015
4,687901,397941,No (HUD),Drug Abuse (HUD),7/16/2015


In [24]:
disab_entry.groupby("Disability Determination (Entry)").size()

Disability Determination (Entry)
Client doesn't know (HUD)       28
Data not collected (HUD)         8
No (HUD)                     32214
Yes (HUD)                     4047
dtype: int64

In [25]:
# Remove the "(HUD)" from this response and combine "Client doesn't know" and "Data not collected" into 'Unk'
disab_deter_map = {"Client doesn't know (HUD)":'Unk', "Data not collected (HUD)":'Unk', "No (HUD)":"No", "Yes (HUD)":"Yes"}
disab_entry['Disab Determination'] = disab_entry['Disability Determination (Entry)'].map(disab_deter_map)

# change 'Unk' to NaN
disab_entry['Disab Determination'] = disab_entry["Disab Determination"].replace('Unk', np.nan)
disab_entry.groupby("Disab Determination").size()

Disab Determination
No     32214
Yes     4047
dtype: int64
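
One property of `Series.map` with a dictionary is worth keeping in mind for this step: any value that is not a key in the dictionary becomes NaN. A minimal sketch on toy data, assuming only "Yes" and "No" are meaningful answers; the names `yes_no_map` and `unmapped` are hypothetical. Listing the unmapped values is a cheap way to check that no unexpected category was silently set to missing.

```python
import pandas as pd

responses = pd.Series([
    "Yes (HUD)",
    "No (HUD)",
    "Client doesn't know (HUD)",
    "Data not collected (HUD)",
])

# Only the meaningful answers are mapped; everything else becomes NaN.
yes_no_map = {"Yes (HUD)": "Yes", "No (HUD)": "No"}
determination = responses.map(yes_no_map)

# Values that were present in the raw data but had no entry in the map.
unmapped = responses[determination.isna() & responses.notna()].unique()
print(determination.tolist())
print(list(unmapped))
```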

In [26]:
disab_entry.groupby("Disability Type (Entry)").size()

Disability Type (Entry)
Alcohol Abuse (HUD)                  4468
Both Alcohol and Drug Abuse (HUD)    4476
Chronic Health Condition (HUD)       4512
Developmental (HUD)                  4472
Drug Abuse (HUD)                     4494
Dual Diagnosis                          2
HIV/AIDS (HUD)                       4485
Hearing Impaired                        2
Mental Health Problem (HUD)          4621
Other                                   3
Other: Learning                         3
Other: Speech                           2
Physical (HUD)                       4552
Physical/Medical                      301
Vision Impaired                         3
dtype: int64

In [27]:
# Remove the " (HUD)" suffix from this response (literal replacement; str.rstrip would strip a character set)
disab_entry['Disability Type']=disab_entry['Disability Type (Entry)'].str.replace(" (HUD)", "", regex=False)
disab_entry.groupby("Disability Type").size()

Disability Type
Alcohol Abuse                  4468
Both Alcohol and Drug Abuse    4476
Chronic Health Condition       4512
Developmental                  4472
Drug Abuse                     4494
Dual Diagnosis                    2
HIV/AIDS                       4485
Hearing Impaired                  2
Mental Health Problem          4621
Other                             3
Other: Learning                   3
Other: Speech                     2
Physical                       4552
Physical/Medical                301
Vision Impaired                   3
dtype: int64

In [28]:
# Drop old variables.
disab_entry=disab_entry.drop(['Disability Determination (Entry)', 'Disability Type (Entry)'], axis=1)
disab_entry.head()

Unnamed: 0,EE UID,Client ID,Date Added (417-date_added),Disab Determination,Disability Type
0,687901,397941,7/16/2015,No,Alcohol Abuse
1,687901,397941,7/16/2015,No,Both Alcohol and Drug Abuse
2,687901,397941,7/16/2015,No,Chronic Health Condition
3,687901,397941,7/16/2015,No,Developmental
4,687901,397941,7/16/2015,No,Drug Abuse


In [29]:
# sort by EE UID, Client ID, disability type, and date added
disab_entry.sort_values(by=['EE UID', 'Client ID', 'Disability Type', 'Date Added (417-date_added)'], inplace=True)
disab_entry.head()

Unnamed: 0,EE UID,Client ID,Date Added (417-date_added),Disab Determination,Disability Type
0,687901,397941,7/16/2015,No,Alcohol Abuse
1,687901,397941,7/16/2015,No,Both Alcohol and Drug Abuse
2,687901,397941,7/16/2015,No,Chronic Health Condition
3,687901,397941,7/16/2015,No,Developmental
4,687901,397941,7/16/2015,No,Drug Abuse


In [30]:
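# Drop duplicate values: duplicates look like updates, so keep one record per EE UID / Client ID / Disability Type.
# Caution: keep='first' retains the first row in the sort order above, and the date is still a string, so this is not guaranteed to be the last dated record.
disab_entry.drop_duplicates(subset=['EE UID', 'Client ID', 'Disability Type'], keep='first',inplace=True)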
disab_entry.head()

Unnamed: 0,EE UID,Client ID,Date Added (417-date_added),Disab Determination,Disability Type
0,687901,397941,7/16/2015,No,Alcohol Abuse
1,687901,397941,7/16/2015,No,Both Alcohol and Drug Abuse
2,687901,397941,7/16/2015,No,Chronic Health Condition
3,687901,397941,7/16/2015,No,Developmental
4,687901,397941,7/16/2015,No,Drug Abuse
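
The stated goal of this step is to keep the last dated record for each combination. A minimal sketch of one way to do that, on toy data with shortened, hypothetical column names: the dates are parsed before sorting, because as strings "12/22/2016" sorts before "4/21/2015", and `keep="last"` then retains the most recent row. This is an illustration of the technique, not the code that produced the output above.

```python
import pandas as pd

records = pd.DataFrame({
    "EE UID": [1, 1, 2],
    "Disability Type": ["Physical", "Physical", "Physical"],
    "Disab Determination": ["No", "Yes", "No"],
    "Date Added": ["4/21/2015", "12/22/2016", "7/16/2015"],
})

# Parse the dates so that sorting is chronological rather than alphabetical.
records["Date Added"] = pd.to_datetime(records["Date Added"], format="%m/%d/%Y")

# Sort oldest to newest within each group, then keep the newest row.
latest = (
    records.sort_values(["EE UID", "Disability Type", "Date Added"])
    .drop_duplicates(subset=["EE UID", "Disability Type"], keep="last")
    .reset_index(drop=True)
)
print(latest)
```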


In [31]:
# drop date
disab_entry=disab_entry.drop(['Date Added (417-date_added)'], axis=1)
disab_entry.head()

Unnamed: 0,EE UID,Client ID,Disab Determination,Disability Type
0,687901,397941,No,Alcohol Abuse
1,687901,397941,No,Both Alcohol and Drug Abuse
2,687901,397941,No,Chronic Health Condition
3,687901,397941,No,Developmental
4,687901,397941,No,Drug Abuse


In [32]:
#Transform data so 1 column for each disability type and disab determination as the values.
disab_entry_t = disab_entry.pivot(index='EE UID', columns='Disability Type', values='Disab Determination')
disab_entry_t.head()

EE UID,Alcohol Abuse,Both Alcohol and Drug Abuse,Chronic Health Condition,Developmental,Drug Abuse,Dual Diagnosis,HIV/AIDS,Hearing Impaired,Mental Health Problem,Other,Other: Learning,Other: Speech,Physical,Physical/Medical,Vision Impaired
687901,No,No,No,No,No,,No,,Yes,,,,No,Yes,
687902,No,No,Yes,No,No,,No,,No,,,,No,Yes,
687903,No,No,No,No,Yes,,No,,Yes,,,,Yes,,
687904,No,No,Yes,No,No,,No,,No,,,,No,,
687905,No,No,Yes,Yes,No,,No,,Yes,,,,No,Yes,


In [33]:
# Flag a stay as "Yes" if any disability type was recorded as "Yes"; otherwise "No" (missing values count as "No").
# Vectorized to avoid row-by-row chained assignment, which pandas may apply to a temporary copy.
disab_entry_t['Any Disability'] = np.where((disab_entry_t == "Yes").any(axis=1), "Yes", "No")
disab_entry_t.head()

EE UID,Alcohol Abuse,Both Alcohol and Drug Abuse,Chronic Health Condition,Developmental,Drug Abuse,Dual Diagnosis,HIV/AIDS,Hearing Impaired,Mental Health Problem,Other,Other: Learning,Other: Speech,Physical,Physical/Medical,Vision Impaired,Any Disability
687901,No,No,No,No,No,,No,,Yes,,,,No,Yes,,Yes
687902,No,No,Yes,No,No,,No,,No,,,,No,Yes,,Yes
687903,No,No,No,No,Yes,,No,,Yes,,,,Yes,,,Yes
687904,No,No,Yes,No,No,,No,,No,,,,No,,,Yes
687905,No,No,Yes,Yes,No,,No,,Yes,,,,No,Yes,,Yes
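
The reshape-and-flag steps above can be sketched on a small toy table. The data and the names `long_form` and `wide` are hypothetical. Note that `pivot` needs each (EE UID, Disability Type) pair to appear only once, which is why duplicates are dropped beforehand, and that a stay whose determinations are all missing ends up flagged "No" under this rule.

```python
import numpy as np
import pandas as pd

# One row per stay and disability type (long form).
long_form = pd.DataFrame({
    "EE UID": [1, 1, 2, 2, 3],
    "Disability Type": ["Physical", "Developmental", "Physical", "Developmental", "Physical"],
    "Disab Determination": ["No", "Yes", "No", "No", np.nan],
})

# One row per stay, one column per disability type (wide form).
wide = long_form.pivot(index="EE UID", columns="Disability Type", values="Disab Determination")

# "Yes" if any disability column is "Yes"; NaN cells never match.
wide["Any Disability"] = np.where((wide == "Yes").any(axis=1), "Yes", "No")
print(wide)
```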


### EE_UDES_191102.tsv 

In [34]:
ee_udes = pd.read_csv("../data/ee_udes_191102.tsv", delimiter='\t', encoding='utf-8')
ee_udes.head()

Unnamed: 0,EE Provider ID,Entry Exit Provider Program Type Code,EE UID,Client Unique ID,Client ID,Client Location(4378),"Zip Code (of Last Permanent Address, if known)(1932)",Relationship to Head of Household(4374),Prior Living Situation(43),Length of Stay in Previous Place(1934),...,Did you stay less than 90 days?(5163),"On the night before did you stay on the streets, ES or SH?(5165)","Regardless of where they stayed last night - Number of times the client has been on the streets, in ES, or SH in the past three years including today(5167)","Total number of months homeless on the street, in ES or SH in the past three years(5168)",Housing Status(2703),Does the client have a disabling condition?(1935),Covered by Health Insurance(4376),Domestic violence victim/survivor(341),"If yes for Domestic violence victim/survivor, when experience occurred(1917)",Date of Birth(893)
0,Urban Ministries of Durham - Durham County - S...,Emergency Shelter (HUD),687901,pbkf09291954p610b236,397941,NC-502 Durham City and County CoC,27701,Self (head of household),"Staying or living in a friend's room, apartmen...",One year or longer (HUD),...,,,Two times (HUD),2,Category 1 - Homeless (HUD),Yes (HUD),Yes (HUD),No (HUD),,9/29/1954
1,Urban Ministries of Durham - Durham County - S...,Emergency Shelter (HUD),687902,kdaf01071967k400d635,130335,NC-502 Durham City and County CoC,29033,Self (head of household),"Staying or living in a family member's room, a...","One month or more, but less than 90 days",...,,,Four or more times (HUD),More than 12 months (HUD),Category 1 - Homeless (HUD),Yes (HUD),Yes (HUD),Yes (HUD),More than a year ago (HUD),1/7/1967
2,Urban Ministries of Durham - Durham County - S...,Emergency Shelter (HUD),687903,smrf06211973s620m640,188933,NC-502 Durham City and County CoC,27703,Self (head of household),"Staying or living in a friend's room, apartmen...","One month or more, but less than 90 days",...,,,Four or more times (HUD),More than 12 months (HUD),Category 1 - Homeless (HUD),Yes (HUD),No (HUD),No (HUD),,6/21/1973
3,Urban Ministries of Durham - Durham County - S...,Emergency Shelter (HUD),687904,abrm07251958a416b600,168290,NC-502 Durham City and County CoC,27603,Self (head of household),"Staying or living in a friend's room, apartmen...",One year or longer (HUD),...,,,Four or more times (HUD),Data not collected (HUD),Category 1 - Homeless (HUD),Yes (HUD),No (HUD),No (HUD),,7/25/1958
4,Urban Ministries of Durham - Durham County - S...,Emergency Shelter (HUD),687905,wbom01251964w450b620,123122,NC-502 Durham City and County CoC,27510,Self (head of household),"Staying or living in a friend's room, apartmen...","One week or more, but less than one month",...,,,Four or more times (HUD),More than 12 months (HUD),Category 1 - Homeless (HUD),Yes (HUD),No (HUD),No (HUD),,1/25/1964


In [35]:
ee_udes = ee_udes[ee_udes["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
ee_udes.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    4319
dtype: int64

In [36]:
ee_udes.groupby("Prior Living Situation(43)").size()

Prior Living Situation(43)
Client doesn't know (HUD)                                                                                4
Client refused (HUD)                                                                                     1
Data not collected (HUD)                                                                                 3
Emergency shelter, incl. hotel/motel paid for w/ ES voucher, or RHY-funded Host Home shelter (HUD)     731
Foster care home or foster care group home (HUD)                                                         4
Hospital or other residential non-psychiatric medical facility (HUD)                                   133
Hotel or motel paid for without emergency shelter voucher (HUD)                                        172
Interim Housing (HUD) (Retired)                                                                          7
Jail, prison or juvenile detention facility (HUD)                                                      165
Long-term 

In [37]:
ee_udes['temp prior living']=ee_udes['Prior Living Situation(43)'].fillna("0")

# Keyword rules in priority order: the first matching pattern wins, so the order matters.
# pd.np was removed from pandas, so numpy is used directly.
prior_living_rules = [
    ("doesn't know|0|refused|not collected", "UNK"),
    ("hospital|nursing|treatment", "HOSPITAL"),
    ("rental", "RENTAL"),
    ("friend|family", "FRIEND or FAMILY"),
    ("jail", "PRISON"),
    ("owned|permanent", "PERMANENT"),
    ("habitation", "NOT HABITABLE"),
    ("transition|halfway|safe|interim|foster", "INTERIM"),
]
conditions = [ee_udes['temp prior living'].str.contains(pattern, case=False) for pattern, _ in prior_living_rules]
conditions.append(ee_udes['temp prior living'].str.contains("Host Home shelter"))  # this last rule is case-sensitive
choices = [label for _, label in prior_living_rules] + ["SHELTER"]
ee_udes['Prior Living'] = np.select(conditions, choices, default="OTHER")

In [38]:
ee_udes.groupby("Prior Living").size()

Prior Living
FRIEND or FAMILY    1163
HOSPITAL             246
INTERIM              196
NOT HABITABLE       1313
OTHER                176
PERMANENT             38
PRISON               165
RENTAL               248
SHELTER              731
UNK                   43
dtype: int64
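
Keyword-based recoding like this is easy to audit by cross-tabulating the raw responses against the assigned category. A minimal sketch using four raw values from the listing above and only three of the rules; the names `raw`, `category`, `audit`, and `others` are hypothetical. It shows, for example, that "Hotel or motel paid for without emergency shelter voucher (HUD)" matches none of the keywords and so falls into "OTHER".

```python
import numpy as np
import pandas as pd

raw = pd.Series([
    "Jail, prison or juvenile detention facility (HUD)",
    "Foster care home or foster care group home (HUD)",
    "Emergency shelter, incl. hotel/motel paid for w/ ES voucher, or RHY-funded Host Home shelter (HUD)",
    "Hotel or motel paid for without emergency shelter voucher (HUD)",
], name="raw")

# A subset of the keyword rules, checked in order (first match wins).
conditions = [
    raw.str.contains("jail", case=False),
    raw.str.contains("foster", case=False),
    raw.str.contains("Host Home shelter"),
]
category = pd.Series(
    np.select(conditions, ["PRISON", "INTERIM", "SHELTER"], default="OTHER"),
    index=raw.index,
    name="category",
)

# Rows: raw response; columns: assigned category; cells: record counts.
audit = pd.crosstab(raw, category)

# Raw responses that no rule captured.
others = raw[category == "OTHER"].unique()
print(audit)
print(list(others))
```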

In [39]:
list(ee_udes.columns.values)

['EE Provider ID',
 'Entry Exit Provider Program Type Code',
 'EE UID',
 'Client Unique ID',
 'Client ID',
 'Client Location(4378)',
 'Zip Code (of Last Permanent Address, if known)(1932)',
 'Relationship to Head of Household(4374)',
 'Prior Living Situation(43)',
 'Length of Stay in Previous Place(1934)',
 'Did you stay less than 7 nights?(5164)',
 'Did you stay less than 90 days?(5163)',
 'On the night before did you stay on the streets, ES or SH?(5165)',
 'Regardless of where they stayed last night - Number of times the client has been on the streets, in ES, or SH in the past three years including today(5167)',
 'Total number of months homeless on the street, in ES or SH in the past three years(5168)',
 'Housing Status(2703)',
 'Does the client have a disabling condition?(1935)',
 'Covered by Health Insurance(4376)',
 'Domestic violence victim/survivor(341)',
 'If yes for Domestic violence victim/survivor, when experience occurred(1917)',
 'Date of Birth(893)',
 'temp prior living',

In [40]:
ee_udes.groupby('Domestic violence victim/survivor(341)').size()

Domestic violence victim/survivor(341)
Client doesn't know (HUD)       7
Client refused (HUD)            1
No (HUD)                     3854
Yes (HUD)                     413
dtype: int64

In [41]:
# Remove the "(HUD)" from this response and combine "Client doesn't know" and "Client refused" into 'Unk', then set 'Unk' to NaN
dv_deter_map = {"Client doesn't know (HUD)":'Unk', "Client refused (HUD)":'Unk', "No (HUD)":"No", "Yes (HUD)":"Yes"}
ee_udes['Domestic violence victim/survivor'] = ee_udes['Domestic violence victim/survivor(341)'].map(dv_deter_map)
ee_udes['Domestic violence victim/survivor'] = ee_udes['Domestic violence victim/survivor'].replace('Unk', np.nan)
ee_udes.groupby('Domestic violence victim/survivor').size()

Domestic violence victim/survivor
No     3854
Yes     413
dtype: int64

In [42]:
# select columns of interest
ee_udes= ee_udes[['EE UID', 'Prior Living', 'Domestic violence victim/survivor']]
ee_udes.head()

Unnamed: 0,EE UID,Prior Living,Domestic violence victim/survivor
0,687901,FRIEND or FAMILY,No
1,687902,FRIEND or FAMILY,Yes
2,687903,FRIEND or FAMILY,No
3,687904,FRIEND or FAMILY,No
4,687905,FRIEND or FAMILY,No


### HEALTH_INS_ENTRY_191102.tsv

In [43]:
health_ins_entry = pd.read_csv("../data/health_ins_entry_191102.tsv", delimiter='\t', encoding='utf-8')
health_ins_entry.head()

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Covered (Entry),Health Insurance Type (Entry),Health Insurance Start Date (Entry),Health Insurance End Date (Entry),Provider (4307-provider),Recordset ID (4307-recordset_id),Date Added (4307-date_added)
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Employer - Provided Health Insurance,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261535,7/16/2015
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Health Insurance obtained through COBRA,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261536,7/16/2015
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Indian Health Services Program,6/9/2015,,Urban Ministries of Durham - Durham County - S...,4677504,12/22/2016
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,MEDICAID,4/20/2014,,Urban Ministries of Durham - Durham County(1562),1959563,4/21/2015
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,MEDICARE,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261533,7/16/2015


In [44]:
health_ins_entry = health_ins_entry[health_ins_entry["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
health_ins_entry.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    40583
dtype: int64

In [45]:
health_ins_entry.groupby("Health Insurance Type (Entry)").size()

Health Insurance Type (Entry)
Employer - Provided Health Insurance              4451
Health Insurance obtained through COBRA           4452
Indian Health Services Program                    2413
MEDICAID                                          4551
MEDICARE                                          4465
Other                                             2410
Private Pay Health Insurance                      4455
State Children's Health Insurance Program         4453
State Health Insurance for Adults                 4434
Veteran's Administration (VA) Medical Services    4483
dtype: int64

In [46]:
health_ins_entry.groupby("Covered (Entry)").size()

Covered (Entry)
Data Not Collected       10
No                    38402
Yes                    2166
dtype: int64

In [47]:
# change "Data Not Collected" to NaN
health_ins_entry['Covered'] = health_ins_entry["Covered (Entry)"].replace('Data Not Collected', np.nan)
health_ins_entry.groupby("Covered").size()

Covered
No     38402
Yes     2166
dtype: int64

In [48]:
# sorting 
health_ins_entry.sort_values(by=['EE UID', 'Client ID', 'Health Insurance Type (Entry)', 'Date Added (4307-date_added)'], inplace=True)
health_ins_entry

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Covered (Entry),Health Insurance Type (Entry),Health Insurance Start Date (Entry),Health Insurance End Date (Entry),Provider (4307-provider),Recordset ID (4307-recordset_id),Date Added (4307-date_added),Covered
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Employer - Provided Health Insurance,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261535,7/16/2015,No
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Health Insurance obtained through COBRA,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261536,7/16/2015,No
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Indian Health Services Program,6/9/2015,,Urban Ministries of Durham - Durham County - S...,4677504,12/22/2016,No
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,MEDICAID,4/20/2014,,Urban Ministries of Durham - Durham County(1562),1959563,4/21/2015,No
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,MEDICARE,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261533,7/16/2015,No
5,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other,6/9/2015,,Urban Ministries of Durham - Durham County - S...,4677505,12/22/2016,No
6,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Private Pay Health Insurance,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261538,7/16/2015,No
7,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,State Children's Health Insurance Program,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261532,7/16/2015,No
8,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,State Health Insurance for Adults,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261537,7/16/2015,No
9,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Veteran's Administration (VA) Medical Services,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261534,7/16/2015,No


In [49]:
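# dropping duplicate values - keep='first' retains the first record per EE UID, Client ID and insurance type in the sort order above.
# 'Date Added' is sorted as text, so this is not necessarily the most recently added record.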
health_ins_entry.drop_duplicates(subset=['EE UID', 'Client ID', 'Health Insurance Type (Entry)'], keep='first',inplace=True)
health_ins_entry

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Covered (Entry),Health Insurance Type (Entry),Health Insurance Start Date (Entry),Health Insurance End Date (Entry),Provider (4307-provider),Recordset ID (4307-recordset_id),Date Added (4307-date_added),Covered
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Employer - Provided Health Insurance,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261535,7/16/2015,No
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Health Insurance obtained through COBRA,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261536,7/16/2015,No
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Indian Health Services Program,6/9/2015,,Urban Ministries of Durham - Durham County - S...,4677504,12/22/2016,No
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,MEDICAID,4/20/2014,,Urban Ministries of Durham - Durham County(1562),1959563,4/21/2015,No
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,MEDICARE,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261533,7/16/2015,No
5,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other,6/9/2015,,Urban Ministries of Durham - Durham County - S...,4677505,12/22/2016,No
6,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Private Pay Health Insurance,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261538,7/16/2015,No
7,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,State Children's Health Insurance Program,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261532,7/16/2015,No
8,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,State Health Insurance for Adults,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261537,7/16/2015,No
9,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Veteran's Administration (VA) Medical Services,4/20/2015,,Urban Ministries of Durham - Durham County(1562),2261534,7/16/2015,No
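
Because `Date Added (4307-date_added)` is read in as text, the sort above is alphabetical, so `12/22/2016` sorts before `7/16/2015`. If the intent is to keep the most recently added record, one option is to parse the dates before sorting. The following is a minimal sketch on made-up rows (the values are illustrative, not taken from the data), assuming every date is in month/day/year form:

```python
import pandas as pd

# Toy rows: two records for the same stay and insurance type (illustrative values)
records = pd.DataFrame({
    "EE UID": [687902, 687902, 687903],
    "Health Insurance Type (Entry)": ["MEDICAID", "MEDICAID", "MEDICAID"],
    "Covered": ["No", "Yes", "No"],
    "Date Added (4307-date_added)": ["9/3/2015", "12/22/2015", "1/5/2016"],
})

# Parse the text dates so that sorting is chronological rather than alphabetical
records["Date Added (4307-date_added)"] = pd.to_datetime(
    records["Date Added (4307-date_added)"], format="%m/%d/%Y"
)

keys = ["EE UID", "Health Insurance Type (Entry)"]
latest = (
    records.sort_values(keys + ["Date Added (4307-date_added)"])
           .drop_duplicates(subset=keys, keep="last")  # last row = most recent date
)
print(latest)
```

With parsed dates, `keep="last"` picks the 12/22/2015 record for stay 687902, whereas a text sort would place that record first.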


In [50]:
# keep variables of interest
health_ins_entry=health_ins_entry[['EE UID', 'Covered', 'Health Insurance Type (Entry)']]
health_ins_entry.head()

Unnamed: 0,EE UID,Covered,Health Insurance Type (Entry)
0,687901,No,Employer - Provided Health Insurance
1,687901,No,Health Insurance obtained through COBRA
2,687901,No,Indian Health Services Program
3,687901,No,MEDICAID
4,687901,No,MEDICARE


In [51]:
# drop rows where the health insurance type is NaN - all of these have Covered = NaN too
health_ins_entry=health_ins_entry.dropna(subset=['Health Insurance Type (Entry)'])

In [52]:
# Transform data so there is 1 column for each insurance type, with the Covered entry as the values.
health_ins_entry_t = health_ins_entry.pivot(index='EE UID', columns='Health Insurance Type (Entry)', values='Covered')
health_ins_entry_t.head()

Health Insurance Type (Entry),Employer - Provided Health Insurance,Health Insurance obtained through COBRA,Indian Health Services Program,MEDICAID,MEDICARE,Other,Private Pay Health Insurance,State Children's Health Insurance Program,State Health Insurance for Adults,Veteran's Administration (VA) Medical Services
EE UID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
687901,No,No,No,No,No,No,No,No,No,No
687902,No,No,,Yes,No,,No,No,No,No
687903,No,No,,No,No,,No,No,No,No
687904,No,No,,No,No,,No,No,No,No
687905,No,No,,No,No,,No,No,No,No
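
The earlier `drop_duplicates` step matters here: `pivot` needs each (`EE UID`, insurance type) pair to appear only once and raises an error otherwise. A small sketch on invented rows, intended only to illustrate that behavior:

```python
import pandas as pd

# Toy long-format data with a duplicated (EE UID, insurance type) pair
long_df = pd.DataFrame({
    "EE UID": [687901, 687901, 687902, 687902],
    "Health Insurance Type (Entry)": ["MEDICAID", "MEDICARE", "MEDICAID", "MEDICAID"],
    "Covered": ["No", "No", "Yes", "No"],
})

try:
    long_df.pivot(index="EE UID", columns="Health Insurance Type (Entry)", values="Covered")
    pivot_failed = False
except ValueError:
    # pandas cannot reshape when an index/column pair is repeated
    pivot_failed = True

# After removing the duplicate pair the reshape succeeds
deduped = long_df.drop_duplicates(subset=["EE UID", "Health Insurance Type (Entry)"], keep="first")
wide = deduped.pivot(index="EE UID", columns="Health Insurance Type (Entry)", values="Covered")
print(pivot_failed)
print(wide)
```

Pairs that never occur in the long data, such as MEDICARE for stay 687902 here, come out as NaN in the wide table, which is why some cells above are empty.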


In [53]:
# "Yes" if the stay has at least one insurance type marked "Yes", otherwise "No"
has_any_ins = health_ins_entry_t.eq("Yes").any(axis=1)
health_ins_entry_t['Any Health Insurance'] = np.where(has_any_ins, "Yes", "No")
health_ins_entry_t.head()

Health Insurance Type (Entry),Employer - Provided Health Insurance,Health Insurance obtained through COBRA,Indian Health Services Program,MEDICAID,MEDICARE,Other,Private Pay Health Insurance,State Children's Health Insurance Program,State Health Insurance for Adults,Veteran's Administration (VA) Medical Services,Any Health Insurance
EE UID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
687901,No,No,No,No,No,No,No,No,No,No,No
687902,No,No,,Yes,No,,No,No,No,No,Yes
687903,No,No,,No,No,,No,No,No,No,No
687904,No,No,,No,No,,No,No,No,No,No
687905,No,No,,No,No,,No,No,No,No,No


### INCOME_ENTRY_191102.tsv

In [54]:
income_entry = pd.read_csv("../data/income_entry_191102.tsv", delimiter='\t', encoding='utf-8')
income_entry.head()

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Receiving Income (Entry),Income Source (Entry),Monthly Amount (Entry),Income Start Date (Entry),Income End Date (Entry),Recordset ID (140-recordset_id),Provider (140-provider),Date Added (140-date_added)
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Alimony or Other Spousal Support (HUD),,4/20/2015,,3263585,Urban Ministries of Durham - Durham County - S...,12/31/2015
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Child Support (HUD),,4/20/2015,,3263586,Urban Ministries of Durham - Durham County - S...,12/31/2015
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Earned Income (HUD),,4/20/2015,,3263590,Urban Ministries of Durham - Durham County - S...,12/31/2015
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,General Assistance (HUD),,4/20/2015,,3263587,Urban Ministries of Durham - Durham County - S...,12/31/2015
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other (HUD),,4/20/2015,,3263599,Urban Ministries of Durham - Durham County - S...,12/31/2015


In [55]:
income_entry = income_entry[income_entry["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
income_entry.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    68153
dtype: int64

In [56]:
income_entry.groupby("Income Source (Entry)").size()

Income Source (Entry)
Alimony or Other Spousal Support (HUD)                 4530
Child Support (HUD)                                    4530
Earned Income (HUD)                                    4613
General Assistance (HUD)                               4529
No Financial Resources                                   25
Other (HUD)                                            4515
Pension or retirement income from another job (HUD)    4547
Private Disability Insurance (HUD)                     4531
Retirement Income From Social Security (HUD)           4531
SSDI (HUD)                                             4573
SSI (HUD)                                              4559
TANF (HUD)                                             4527
Unemployment Insurance (HUD)                           4533
VA Non-Service Connected Disability Pension (HUD)      4537
VA Service Connected Disability Compensation (HUD)     4530
Worker's Compensation (HUD)                            4540
dtype: int64

In [57]:
# Remove the trailing "(HUD)" from this response (matched as a literal suffix)
income_entry['Income Source']=income_entry['Income Source (Entry)'].str.replace(r"\s*\(HUD\)$", "", regex=True)
income_entry.groupby('Income Source').size()

Income Source
Alimony or Other Spousal Support                 4530
Child Support                                    4530
Earned Income                                    4613
General Assistance                               4529
No Financial Resources                             25
Other                                            4515
Pension or retirement income from another job    4547
Private Disability Insurance                     4531
Retirement Income From Social Security           4531
SSDI                                             4573
SSI                                              4559
TANF                                             4527
Unemployment Insurance                           4533
VA Non-Service Connected Disability Pension      4537
VA Service Connected Disability Compensation     4530
Worker's Compensation                            4540
dtype: int64

In [58]:
income_entry.groupby('Receiving Income (Entry)').size()

Receiving Income (Entry)
Data Not Collected       33
No                    65942
Yes                    2164
dtype: int64

In [59]:
# change data not collected to NaN
income_entry['Receiving Income'] = income_entry["Receiving Income (Entry)"].replace('Data Not Collected', np.nan)
income_entry.groupby("Receiving Income").size()

Receiving Income
No     65942
Yes     2164
dtype: int64

In [60]:
# sorting 
income_entry.sort_values(by=['EE UID', 'Client ID', 'Income Source', 'Date Added (140-date_added)'], inplace=True)
income_entry

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Receiving Income (Entry),Income Source (Entry),Monthly Amount (Entry),Income Start Date (Entry),Income End Date (Entry),Recordset ID (140-recordset_id),Provider (140-provider),Date Added (140-date_added),Income Source,Receiving Income
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Alimony or Other Spousal Support (HUD),,4/20/2015,,3263585,Urban Ministries of Durham - Durham County - S...,12/31/2015,Alimony or Other Spousal Support,No
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Child Support (HUD),,4/20/2015,,3263586,Urban Ministries of Durham - Durham County - S...,12/31/2015,Child Support,No
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Earned Income (HUD),,4/20/2015,,3263590,Urban Ministries of Durham - Durham County - S...,12/31/2015,Earned Income,No
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,General Assistance (HUD),,4/20/2015,,3263587,Urban Ministries of Durham - Durham County - S...,12/31/2015,General Assistance,No
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other (HUD),,4/20/2015,,3263599,Urban Ministries of Durham - Durham County - S...,12/31/2015,Other,No
5,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Pension or retirement income from another job ...,,4/20/2015,,3263588,Urban Ministries of Durham - Durham County - S...,12/31/2015,Pension or retirement income from another job,No
6,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Private Disability Insurance (HUD),,4/20/2015,,3263589,Urban Ministries of Durham - Durham County - S...,12/31/2015,Private Disability Insurance,No
7,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Retirement Income From Social Security (HUD),,4/20/2015,,3263591,Urban Ministries of Durham - Durham County - S...,12/31/2015,Retirement Income From Social Security,No
8,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,SSDI (HUD),,4/20/2015,,3263592,Urban Ministries of Durham - Durham County - S...,12/31/2015,SSDI,No
9,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,SSI (HUD),,4/20/2015,,3263593,Urban Ministries of Durham - Durham County - S...,12/31/2015,SSI,No


In [61]:
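# dropping duplicate values - keep='first' retains the first record per EE UID, Client ID and income source in the sort order above.
# 'Date Added' is sorted as text, so this is not necessarily the most recently added record.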
income_entry.drop_duplicates(subset=['EE UID', 'Client ID', 'Income Source'], keep='first',inplace=True)
income_entry

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Receiving Income (Entry),Income Source (Entry),Monthly Amount (Entry),Income Start Date (Entry),Income End Date (Entry),Recordset ID (140-recordset_id),Provider (140-provider),Date Added (140-date_added),Income Source,Receiving Income
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Alimony or Other Spousal Support (HUD),,4/20/2015,,3263585,Urban Ministries of Durham - Durham County - S...,12/31/2015,Alimony or Other Spousal Support,No
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Child Support (HUD),,4/20/2015,,3263586,Urban Ministries of Durham - Durham County - S...,12/31/2015,Child Support,No
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Earned Income (HUD),,4/20/2015,,3263590,Urban Ministries of Durham - Durham County - S...,12/31/2015,Earned Income,No
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,General Assistance (HUD),,4/20/2015,,3263587,Urban Ministries of Durham - Durham County - S...,12/31/2015,General Assistance,No
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other (HUD),,4/20/2015,,3263599,Urban Ministries of Durham - Durham County - S...,12/31/2015,Other,No
5,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Pension or retirement income from another job ...,,4/20/2015,,3263588,Urban Ministries of Durham - Durham County - S...,12/31/2015,Pension or retirement income from another job,No
6,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Private Disability Insurance (HUD),,4/20/2015,,3263589,Urban Ministries of Durham - Durham County - S...,12/31/2015,Private Disability Insurance,No
7,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Retirement Income From Social Security (HUD),,4/20/2015,,3263591,Urban Ministries of Durham - Durham County - S...,12/31/2015,Retirement Income From Social Security,No
8,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,SSDI (HUD),,4/20/2015,,3263592,Urban Ministries of Durham - Durham County - S...,12/31/2015,SSDI,No
9,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,SSI (HUD),,4/20/2015,,3263593,Urban Ministries of Durham - Durham County - S...,12/31/2015,SSI,No


In [62]:
# keep variables of interest
income_entry=income_entry[['EE UID', 'Receiving Income', 'Income Source']]
income_entry.head()

Unnamed: 0,EE UID,Receiving Income,Income Source
0,687901,No,Alimony or Other Spousal Support
1,687901,No,Child Support
2,687901,No,Earned Income
3,687901,No,General Assistance
4,687901,No,Other


In [63]:
income_entry=income_entry.dropna(subset=['Income Source'])

In [64]:
# Transform data so there is 1 column for each income source, with the Receiving Income entry as the values.
income_entry_t = income_entry.pivot(index='EE UID', columns='Income Source', values='Receiving Income')
income_entry_t.head()

Income Source,Alimony or Other Spousal Support,Child Support,Earned Income,General Assistance,No Financial Resources,Other,Pension or retirement income from another job,Private Disability Insurance,Retirement Income From Social Security,SSDI,SSI,TANF,Unemployment Insurance,VA Non-Service Connected Disability Pension,VA Service Connected Disability Compensation,Worker's Compensation
EE UID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
687901,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No
687902,No,No,No,No,,No,No,No,No,No,Yes,No,No,No,No,No
687903,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No
687904,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No
687905,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No


In [65]:
# "Yes" if the stay has at least one income source marked "Yes", otherwise "No"
has_any_income = income_entry_t.eq("Yes").any(axis=1)
income_entry_t['Any Income Source'] = np.where(has_any_income, "Yes", "No")
income_entry_t.head()

Income Source,Alimony or Other Spousal Support,Child Support,Earned Income,General Assistance,No Financial Resources,Other,Pension or retirement income from another job,Private Disability Insurance,Retirement Income From Social Security,SSDI,SSI,TANF,Unemployment Insurance,VA Non-Service Connected Disability Pension,VA Service Connected Disability Compensation,Worker's Compensation,Any Income Source
EE UID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
687901,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No,No
687902,No,No,No,No,,No,No,No,No,No,Yes,No,No,No,No,No,Yes
687903,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No,No
687904,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No,No
687905,No,No,No,No,,No,No,No,No,No,No,No,No,No,No,No,No


### NONCASH_ENTRY_191102.tsv

In [66]:
noncash_entry = pd.read_csv("../data/noncash_entry_191102.tsv", delimiter='\t', encoding='utf-8')
noncash_entry.head()

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Receiving Benefit (Entry),Non-Cash Source (Entry),Non-Cash Start Date (Entry),Non-Cash End Date (Entry),Recordset ID (2704-recordset_id),Provider (2704-provider),Date Added (2704-date_added)
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other Source (HUD),4/20/2015,,2261552,Urban Ministries of Durham - Durham County(1562),7/16/2015
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other TANF-Funded Services (HUD),4/20/2015,,2261546,Urban Ministries of Durham - Durham County(1562),7/16/2015
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,"Section 8, Public Housing, or other ongoing re...",4/20/2015,,2261550,Urban Ministries of Durham - Durham County(1562),7/16/2015
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Special Supplemental Nutrition Program for WIC...,4/20/2015,,2261549,Urban Ministries of Durham - Durham County(1562),7/16/2015
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,TANF Child Care Services (HUD),4/20/2015,,2261547,Urban Ministries of Durham - Durham County(1562),7/16/2015


In [67]:
noncash_entry = noncash_entry[noncash_entry["EE Provider ID"]=='Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)']
noncash_entry.groupby("EE Provider ID").size()

EE Provider ID
Urban Ministries of Durham - Durham County - Singles Emergency Shelter - Private(5838)    33412
dtype: int64

In [68]:
noncash_entry.groupby("Non-Cash Source (Entry)").size()

Non-Cash Source (Entry)
Other Source (HUD)                                                     4407
Other TANF-Funded Services (HUD)                                       4408
Section 8, Public Housing, or other ongoing rental assistance (HUD)    3391
Special Supplemental Nutrition Program for WIC (HUD)                   4405
Supplemental Nutrition Assistance Program (Food Stamps) (HUD)          4582
TANF Child Care Services (HUD)                                         4409
TANF Transportation Services (HUD)                                     4408
Temporary rental assistance (HUD)                                      3391
dtype: int64

In [69]:
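# Remove the "(HUD)" from this response
# NOTE: str.rstrip strips a set of trailing characters, not the literal suffix, so "(Food Stamps) (HUD)" also loses its closing parenthesis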
noncash_entry['Noncash Source']=noncash_entry['Non-Cash Source (Entry)'].str.rstrip(" (HUD)")
noncash_entry.groupby('Noncash Source').size()

Noncash Source
Other Source                                                     4407
Other TANF-Funded Services                                       4408
Section 8, Public Housing, or other ongoing rental assistance    3391
Special Supplemental Nutrition Program for WIC                   4405
Supplemental Nutrition Assistance Program (Food Stamps           4582
TANF Child Care Services                                         4409
TANF Transportation Services                                     4408
Temporary rental assistance                                      3391
dtype: int64
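
The label `Supplemental Nutrition Assistance Program (Food Stamps` above is missing its closing parenthesis. `str.rstrip(" (HUD)")` treats its argument as a set of characters to strip from the right, so the `)` that precedes ` (HUD)` is removed as well. A hedged sketch of one alternative, using an anchored regular expression on a few of the labels shown above:

```python
import pandas as pd

sources = pd.Series([
    "Other Source (HUD)",
    "Supplemental Nutrition Assistance Program (Food Stamps) (HUD)",
    "TANF Child Care Services (HUD)",
])

# rstrip removes any trailing run of the characters ' ', '(', 'H', 'U', 'D', ')'
stripped = sources.str.rstrip(" (HUD)")

# An anchored pattern removes only a literal "(HUD)" at the end of the string
cleaned = sources.str.replace(r"\s*\(HUD\)$", "", regex=True)

print(stripped[1])
print(cleaned[1])
```

Switching to this approach would change the resulting column name, so any later code that refers to the truncated label would need to be updated with it.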

In [70]:
noncash_entry.groupby("Receiving Benefit (Entry)").size()

Receiving Benefit (Entry)
Data Not Collected       24
No                    31802
Yes                    1510
dtype: int64

In [71]:
# change data not collected to NaN
noncash_entry['Receiving Benefit'] = noncash_entry["Receiving Benefit (Entry)"].replace('Data Not Collected', np.nan)
noncash_entry.groupby("Receiving Benefit").size()

Receiving Benefit
No     31802
Yes     1510
dtype: int64

In [72]:
# sorting 
noncash_entry.sort_values(by=['EE UID', 'Client ID', 'Noncash Source', 'Date Added (2704-date_added)'], inplace=True)
noncash_entry

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Receiving Benefit (Entry),Non-Cash Source (Entry),Non-Cash Start Date (Entry),Non-Cash End Date (Entry),Recordset ID (2704-recordset_id),Provider (2704-provider),Date Added (2704-date_added),Noncash Source,Receiving Benefit
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other Source (HUD),4/20/2015,,2261552,Urban Ministries of Durham - Durham County(1562),7/16/2015,Other Source,No
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other TANF-Funded Services (HUD),4/20/2015,,2261546,Urban Ministries of Durham - Durham County(1562),7/16/2015,Other TANF-Funded Services,No
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,"Section 8, Public Housing, or other ongoing re...",4/20/2015,,2261550,Urban Ministries of Durham - Durham County(1562),7/16/2015,"Section 8, Public Housing, or other ongoing re...",No
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Special Supplemental Nutrition Program for WIC...,4/20/2015,,2261549,Urban Ministries of Durham - Durham County(1562),7/16/2015,Special Supplemental Nutrition Program for WIC,No
7,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,Yes,Supplemental Nutrition Assistance Program (Foo...,4/20/2014,,1959562,Urban Ministries of Durham - Durham County(1562),4/21/2015,Supplemental Nutrition Assistance Program (Foo...,Yes
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,TANF Child Care Services (HUD),4/20/2015,,2261547,Urban Ministries of Durham - Durham County(1562),7/16/2015,TANF Child Care Services,No
5,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,TANF Transportation Services (HUD),4/20/2015,,2261548,Urban Ministries of Durham - Durham County(1562),7/16/2015,TANF Transportation Services,No
6,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Temporary rental assistance (HUD),4/20/2015,,2261551,Urban Ministries of Durham - Durham County(1562),7/16/2015,Temporary rental assistance,No
8,Urban Ministries of Durham - Durham County - S...,687902,kdaf01071967k400d635,130335,No,Other Source (HUD),6/13/2015,,3235617,Urban Ministries of Durham - Durham County - S...,12/22/2015,Other Source,No
9,Urban Ministries of Durham - Durham County - S...,687902,kdaf01071967k400d635,130335,No,Other Source (HUD),7/10/2015,,2521599,Urban Ministries of Durham - Durham County(1562),9/3/2015,Other Source,No


In [73]:
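# dropping duplicate values - keep='first' retains the first record per EE UID, Client ID and noncash source in the sort order above.
# 'Date Added' is sorted as text, so this is not necessarily the most recently added record.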
noncash_entry.drop_duplicates(subset=['EE UID', 'Client ID', 'Noncash Source'], keep='first',inplace=True)
noncash_entry

Unnamed: 0,EE Provider ID,EE UID,Client Unique ID,Client ID,Receiving Benefit (Entry),Non-Cash Source (Entry),Non-Cash Start Date (Entry),Non-Cash End Date (Entry),Recordset ID (2704-recordset_id),Provider (2704-provider),Date Added (2704-date_added),Noncash Source,Receiving Benefit
0,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other Source (HUD),4/20/2015,,2261552,Urban Ministries of Durham - Durham County(1562),7/16/2015,Other Source,No
1,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Other TANF-Funded Services (HUD),4/20/2015,,2261546,Urban Ministries of Durham - Durham County(1562),7/16/2015,Other TANF-Funded Services,No
2,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,"Section 8, Public Housing, or other ongoing re...",4/20/2015,,2261550,Urban Ministries of Durham - Durham County(1562),7/16/2015,"Section 8, Public Housing, or other ongoing re...",No
3,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Special Supplemental Nutrition Program for WIC...,4/20/2015,,2261549,Urban Ministries of Durham - Durham County(1562),7/16/2015,Special Supplemental Nutrition Program for WIC,No
7,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,Yes,Supplemental Nutrition Assistance Program (Foo...,4/20/2014,,1959562,Urban Ministries of Durham - Durham County(1562),4/21/2015,Supplemental Nutrition Assistance Program (Foo...,Yes
4,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,TANF Child Care Services (HUD),4/20/2015,,2261547,Urban Ministries of Durham - Durham County(1562),7/16/2015,TANF Child Care Services,No
5,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,TANF Transportation Services (HUD),4/20/2015,,2261548,Urban Ministries of Durham - Durham County(1562),7/16/2015,TANF Transportation Services,No
6,Urban Ministries of Durham - Durham County - S...,687901,pbkf09291954p610b236,397941,No,Temporary rental assistance (HUD),4/20/2015,,2261551,Urban Ministries of Durham - Durham County(1562),7/16/2015,Temporary rental assistance,No
8,Urban Ministries of Durham - Durham County - S...,687902,kdaf01071967k400d635,130335,No,Other Source (HUD),6/13/2015,,3235617,Urban Ministries of Durham - Durham County - S...,12/22/2015,Other Source,No
10,Urban Ministries of Durham - Durham County - S...,687902,kdaf01071967k400d635,130335,No,Other TANF-Funded Services (HUD),6/13/2015,,3235615,Urban Ministries of Durham - Durham County - S...,12/22/2015,Other TANF-Funded Services,No


In [74]:
# keep variables of interest
noncash_entry=noncash_entry[['EE UID', 'Receiving Benefit', 'Noncash Source']]
noncash_entry.head()

Unnamed: 0,EE UID,Receiving Benefit,Noncash Source
0,687901,No,Other Source
1,687901,No,Other TANF-Funded Services
2,687901,No,"Section 8, Public Housing, or other ongoing re..."
3,687901,No,Special Supplemental Nutrition Program for WIC
7,687901,Yes,Supplemental Nutrition Assistance Program (Foo...


In [75]:
noncash_entry=noncash_entry.dropna(subset=['Noncash Source'])

In [76]:
# Transform data so there is 1 column for each noncash source, with the Receiving Benefit entry as the values.
noncash_entry_t = noncash_entry.pivot(index='EE UID', columns='Noncash Source', values='Receiving Benefit')
noncash_entry_t.head()

Noncash Source,Other Source,Other TANF-Funded Services,"Section 8, Public Housing, or other ongoing rental assistance",Special Supplemental Nutrition Program for WIC,Supplemental Nutrition Assistance Program (Food Stamps,TANF Child Care Services,TANF Transportation Services,Temporary rental assistance
EE UID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
687901,No,No,No,No,Yes,No,No,No
687902,No,No,No,No,No,No,No,No
687903,No,No,No,No,,No,No,No
687904,No,No,No,No,No,No,No,No
687905,No,No,No,No,Yes,No,No,No


In [77]:
# "Yes" if the stay has at least one noncash source marked "Yes", otherwise "No"
has_any_noncash = noncash_entry_t.eq("Yes").any(axis=1)
noncash_entry_t['Any Noncash Source'] = np.where(has_any_noncash, "Yes", "No")
noncash_entry_t.head()

Noncash Source,Other Source,Other TANF-Funded Services,"Section 8, Public Housing, or other ongoing rental assistance",Special Supplemental Nutrition Program for WIC,Supplemental Nutrition Assistance Program (Food Stamps,TANF Child Care Services,TANF Transportation Services,Temporary rental assistance,Any Noncash Source
EE UID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
687901,No,No,No,No,Yes,No,No,No,Yes
687902,No,No,No,No,No,No,No,No,No
687903,No,No,No,No,,No,No,No,No
687904,No,No,No,No,No,No,No,No,No
687905,No,No,No,No,Yes,No,No,No,Yes
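
The "Any ..." flags treat a missing value the same as "No", so a stay whose sources are all missing is still reported as "No". If that distinction turns out to matter for the analysis, a three-valued flag is one possibility. This is only a sketch on invented values, not part of the original workflow:

```python
import numpy as np
import pandas as pd

# Toy wide table: the last stay has no recorded answer for any source
wide = pd.DataFrame(
    {"MEDICAID": ["No", "Yes", np.nan], "MEDICARE": ["No", np.nan, np.nan]},
    index=pd.Index([687901, 687902, 687903], name="EE UID"),
)

any_yes = wide.eq("Yes").any(axis=1)      # at least one "Yes" in the row
all_missing = wide.isna().all(axis=1)     # nothing recorded at all

flag = pd.Series(np.where(any_yes, "Yes", "No"), index=wide.index, dtype="object")
flag[all_missing] = np.nan                # keep "unknown" separate from "No"
print(flag)
```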


## Merge to create analytic dataset

In [78]:
from functools import reduce

In [79]:
data_frames = [client, entry_exit, ee_udes, disab_entry_t, health_ins_entry_t, income_entry_t, noncash_entry_t]
anl = reduce(lambda  left,right: pd.merge(left,right,on=['EE UID'], how='left'), data_frames)

In [80]:
anl.head()

Unnamed: 0,EE UID,Client ID,Client Age at Entry,Client Age at Exit,Client Gender,Client Primary Race,Client Ethnicity,Client Veteran Status,Entry Date,Exit Date,...,Any Income Source,Other Source,Other TANF-Funded Services,"Section 8, Public Housing, or other ongoing rental assistance",Special Supplemental Nutrition Program for WIC,Supplemental Nutrition Assistance Program (Food Stamps,TANF Child Care Services,TANF Transportation Services,Temporary rental assistance,Any Noncash Source
0,687901,397941,60.0,61.0,Female,White,Non-Hispanic/Non-Latino,No,2015-08-15,2016-07-11,...,No,No,No,No,No,Yes,No,No,No,Yes
1,687902,130335,48.0,48.0,Female,Black or African American,Non-Hispanic/Non-Latino,No,2015-08-15,2015-08-31,...,Yes,No,No,No,No,No,No,No,No,No
2,687903,188933,42.0,42.0,Female,Black or African American,Non-Hispanic/Non-Latino,No,2015-08-15,2015-09-19,...,No,No,No,No,No,,No,No,No,No
3,687904,168290,57.0,57.0,Male,White,Hispanic/Latino,No,2015-08-15,2016-03-07,...,No,No,No,No,No,No,No,No,No,No
4,687905,123122,51.0,51.0,Male,White,Non-Hispanic/Non-Latino,No,2015-08-15,2015-08-24,...,No,No,No,No,No,Yes,No,No,No,Yes
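A left merge keeps every row of the left table, but it silently duplicates rows when the right table repeats a key. The sketch below shows one way to guard against that with the `validate` and `indicator` arguments of `pd.merge`. The frames `left_toy`, `right_ok`, and `right_dup` are made-up stand-ins, not the UMD tables, and whether the real tables have one row per `EE UID` is an assumption to check.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are invented for illustration).
left_toy = pd.DataFrame({"EE UID": [1, 2, 3], "Client ID": [10, 11, 10]})
right_ok = pd.DataFrame({"EE UID": [1, 2], "Any Disability": ["Yes", "No"]})
right_dup = pd.DataFrame({"EE UID": [1, 1], "Any Disability": ["Yes", "No"]})

# validate="many_to_one" requires EE UID to be unique in the right table;
# indicator=True adds a _merge column recording where each row found a match.
checked = pd.merge(left_toy, right_ok, on="EE UID", how="left",
                   validate="many_to_one", indicator=True)
unmatched = int((checked["_merge"] == "left_only").sum())
print(len(checked), "rows,", unmatched, "without a match in the right table")

# A right table with a repeated key is rejected instead of inflating the row count.
try:
    pd.merge(left_toy, right_dup, on="EE UID", how="left", validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True
print("duplicate keys detected:", duplicate_keys_detected)
```

Rows counted as `left_only` are the ones that end up with missing values in the merged columns.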


In [81]:
for col in anl.columns:
    print(col)

EE UID
Client ID
Client Age at Entry
Client Age at Exit
Client Gender
Client Primary Race
Client Ethnicity
Client Veteran Status
Entry Date
Exit Date
Destination
LOS
Prior Living
Domestic violence victim/survivor
Alcohol Abuse
Both Alcohol and Drug Abuse
Chronic Health Condition
Developmental
Drug Abuse
Dual Diagnosis
HIV/AIDS
Hearing Impaired
Mental Health Problem
Other_x
Other: Learning
Other: Speech
Physical
Physical/Medical
Vision Impaired
Any Disability
Employer - Provided Health Insurance
Health Insurance obtained through COBRA
Indian Health Services Program
MEDICAID
MEDICARE
Other_y
Private Pay Health Insurance
State Children's Health Insurance Program
State Health Insurance for Adults
Veteran's Administration (VA) Medical Services
Any Health Insurance
Alimony or Other Spousal Support
Child Support
Earned Income
General Assistance
No Financial Resources
Other
Pension or retirement income from another job
Private Disability Insurance
Retirement Income From Social Security
SSDI
SS
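The `Other_x`, `Other_y`, and `Other` names in the listing above come from pandas' default merge suffixes: a column name present on both sides of a merge is renamed with `_x` and `_y`, and a later `Other` no longer collides, so it stays unsuffixed. The sketch below reproduces this on made-up frames and shows one possible fix, prefixing each table's columns before merging. The helper `label` and the prefix strings are hypothetical, not part of the original notebook.

```python
from functools import reduce

import pandas as pd

# Toy stand-ins: three wide tables that each carry a column called "Other".
base_toy = pd.DataFrame({"EE UID": [1, 2]})
disab_toy = pd.DataFrame({"EE UID": [1, 2], "Other": ["No", "Yes"]})
health_toy = pd.DataFrame({"EE UID": [1, 2], "Other": ["Yes", "No"]})
income_toy = pd.DataFrame({"EE UID": [1, 2], "Other": ["No", "No"]})

def left_merge(left, right):
    return pd.merge(left, right, on="EE UID", how="left")

# Same chained merge as in the notebook: the source of each "Other" is lost.
plain = reduce(left_merge, [base_toy, disab_toy, health_toy, income_toy])
print(list(plain.columns))

def label(frame, prefix):
    # Hypothetical helper: prefix every column except the merge key.
    return frame.rename(columns=lambda c: c if c == "EE UID" else f"{prefix}: {c}")

labelled = reduce(left_merge, [
    base_toy,
    label(disab_toy, "Disability"),
    label(health_toy, "Health Insurance"),
    label(income_toy, "Income"),
])
print(list(labelled.columns))
```

Prefixing every non-key column is a blunt choice. Renaming only the colliding `Other` columns would keep the remaining names unchanged.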

In [82]:
anl.sort_values(by=['Client ID', 'Entry Date'], inplace=True)

In [83]:
# Keep only each client's first record (earliest Entry Date, given the sort above)
anl_first = anl.drop_duplicates(subset='Client ID', keep='first')
anl_first.head()

Unnamed: 0,EE UID,Client ID,Client Age at Entry,Client Age at Exit,Client Gender,Client Primary Race,Client Ethnicity,Client Veteran Status,Entry Date,Exit Date,...,Any Income Source,Other Source,Other TANF-Funded Services,"Section 8, Public Housing, or other ongoing rental assistance",Special Supplemental Nutrition Program for WIC,Supplemental Nutrition Assistance Program (Food Stamps,TANF Child Care Services,TANF Transportation Services,Temporary rental assistance,Any Noncash Source
1982,822088,1096,61.0,61.0,Male,Black or African American,Non-Hispanic/Non-Latino,Yes,2016-10-31,2016-11-09,...,No,No,No,No,No,No,No,No,No,No
3088,943804,1097,58.0,58.0,Male,Black or African American,Non-Hispanic/Non-Latino,No,2018-01-12,2018-02-07,...,Yes,No,No,,No,Yes,No,No,,Yes
1948,818207,1555,31.0,31.0,Male,Black or African American,Non-Hispanic/Non-Latino,No,2016-10-17,2017-02-27,...,No,No,No,No,No,Yes,No,No,No,Yes
507,715037,1616,43.0,43.0,Male,Black or African American,Non-Hispanic/Non-Latino,No,2015-11-25,2015-12-24,...,No,No,No,No,No,No,No,No,No,No
291,700819,2024,46.0,46.0,Male,Black or African American,Non-Hispanic/Non-Latino,No,2015-10-05,2015-10-06,...,No,No,No,No,No,No,No,No,No,No
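Selecting the first stay per client depends on the rows being ordered by entry date within each client. The sketch below runs the same sort-then-deduplicate step on a made-up frame. It converts `Entry Date` with `pd.to_datetime` first (whether the real column is already a datetime is an assumption), and it also attaches a hypothetical `n_stays` column so the number of stays per client is not lost when the later records are dropped.

```python
import pandas as pd

# Toy stays table: two clients with two stays each (values are invented).
stays = pd.DataFrame({
    "EE UID": [101, 102, 103, 104],
    "Client ID": [7, 7, 8, 8],
    "Entry Date": ["2016-03-01", "2015-11-25", "2018-01-12", "2018-02-07"],
})
stays["Entry Date"] = pd.to_datetime(stays["Entry Date"])

# Sort within client by entry date, then keep the first row per client.
first_stays = (
    stays.sort_values(["Client ID", "Entry Date"])
         .drop_duplicates(subset="Client ID", keep="first")
)

# Count stays per client and attach the count to the first-stay rows.
n_stays = stays.groupby("Client ID").size().reset_index(name="n_stays")
first_stays = first_stays.merge(n_stays, on="Client ID", how="left")
print(first_stays)
```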


In [84]:
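# Write all records and the first-record-per-client subset as tab-separated files.
# The DataFrame index is written too, as an unnamed first column.
anl.to_csv("../data/analytic.tsv", sep='\t')
anl_first.to_csv("../data/analytic_first.tsv", sep='\t')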
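With the default `index=True`, `to_csv` writes the DataFrame index as a first column with an empty header, and `read_csv` then loads that column as `Unnamed: 0`. The sketch below shows the round trip through an in-memory buffer, so no files are touched. The two-row frame is a small illustrative sample, and dropping the index is only one option (reading back with `index_col=0` is another).

```python
import io

import pandas as pd

# Small frame with a non-sequential index, as left behind by sort_values.
df = pd.DataFrame(
    {"EE UID": [822088, 943804], "Client ID": [1096, 1097]},
    index=[1982, 3088],
)

# Default: the index is written as a first column with an empty header.
buf_default = io.StringIO()
df.to_csv(buf_default, sep="\t")
buf_default.seek(0)
back_default = pd.read_csv(buf_default, sep="\t")
print(list(back_default.columns))

# index=False: only the data columns are written.
buf_noindex = io.StringIO()
df.to_csv(buf_noindex, sep="\t", index=False)
buf_noindex.seek(0)
back_noindex = pd.read_csv(buf_noindex, sep="\t")
print(list(back_noindex.columns))
```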