
# Mapping function for alphanumeric code 

### Table of Contents
[1.  Introduction](#1_Introduction) <br>
[2. Imports](#2_Imports) <br>
[3. Loading Data from census.gov](#3_Load_census)<br>
[4. Mapping function to get the description of alphanumeric code](#4_map_function)<br>
[5. Loading census 2010_GROUP_QUARTERS_POPULATION file](#5_group_quarters_pop)<br>
[6. Get the description of Alphanumeric code](#6_call_function)<br>
&emsp;[6.1 Replace the column header with desciption](#6.1_to_df)<br>
&emsp;[6.2 Saving the result to text file](#6.2_to_save)<br>
[7. Loading census 2010 Total population file](#7_load_total)<br>
[8. Get the description for Alphanumeric code in total population file](#8_load_total)<br>
[9. Save the result to total population file](#9_df_saveto)







### 1. Introduction<a id='1_Introduction'></a>

We have queried census data into multiple files.  This data has a headers that consists of alphanumeric codes. The full description of the alphanumeric code could be found in the table which is located here in this html page: https://api.census.gov/data/2010/dec/sf1/variables.html. This project is to create a mapping function that gives the descirption of the alphanumeric code in the Cencus data files.

### 2. Imports <a id='2_Imports'></a>

In [2]:
import pandas as pd


### 3. Loading Data from cencus.gov <a id='3_Load_census'></a>

In [3]:
# Census data URL
url = 'https://api.census.gov/data/2010/dec/sf1/variables.html'

# URL to DataFrame
data_frame = pd.read_html(url)

In [4]:
census_df = data_frame[0].copy()

In [5]:
#Final census data dataframe
census_df.head(5)

Unnamed: 0,Name,Label,Concept,Required,Attributes,Limit,Predicate Type,Group,Unnamed: 8
0,AIANHH,American Indian Area/Alaska Native Area/Hawaii...,,not required,,0,(not a predicate),,
1,AIHHTLI,American Indian Area (Off-Reservation Trust La...,,not required,,0,(not a predicate),,
2,AITSCE,American Indian Tribal Subdivision (Census),,not required,,0,(not a predicate),,
3,ANRC,Alaska Native Regional Corporation,,not required,,0,(not a predicate),,
4,BLKGRP,Census Block Group,,not required,,0,(not a predicate),,


In [6]:
census_df.shape

(9001, 9)

In [7]:
census_df.drop(columns='Unnamed: 8',inplace=True)

In [8]:
census_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9001 entries, 0 to 9000
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Name            9001 non-null   object
 1   Label           9001 non-null   object
 2   Concept         8964 non-null   object
 3   Required        9001 non-null   object
 4   Attributes      68 non-null     object
 5   Limit           9001 non-null   object
 6   Predicate Type  9001 non-null   object
 7   Group           8961 non-null   object
dtypes: object(8)
memory usage: 562.7+ KB


### 4. Mapping function to get the description of alpha numeric code <a id='4_map_function'></a>

In [9]:
#Mapping function to return the column label
def column_label(col):
    cond = census_df['Name'] == col
    result = census_df[cond]['Label']
    return result

In [10]:
#Just testing the function
column_label('AIANHH')


0    American Indian Area/Alaska Native Area/Hawaii...
Name: Label, dtype: object

### 5. Loading census 2010_GROUP_QUARTERS_POPULATION<a id='5_group_quarters_pop'></a>

In [105]:
from io import StringIO 

with open("C:/Users/Gayathri/Desktop/Nexstep_data/census2010_GROUP_QUARTERS_POPULATION.txt", 'r') as infile:
    data = infile.read()
    data = data.replace("[", "")
    data = data.replace("]","")
df = pd.read_csv(StringIO(data), error_bad_lines=False)



b'Skipping line 1182: expected 47 fields, saw 92\nSkipping line 1349: expected 47 fields, saw 92\nSkipping line 2875: expected 47 fields, saw 92\nSkipping line 3561: expected 47 fields, saw 92\nSkipping line 11618: expected 47 fields, saw 92\nSkipping line 12867: expected 47 fields, saw 92\nSkipping line 13700: expected 47 fields, saw 92\nSkipping line 13918: expected 47 fields, saw 92\nSkipping line 14097: expected 47 fields, saw 92\n'
b'Skipping line 18342: expected 47 fields, saw 92\nSkipping line 20311: expected 47 fields, saw 92\nSkipping line 20662: expected 47 fields, saw 92\nSkipping line 20960: expected 47 fields, saw 92\nSkipping line 24083: expected 47 fields, saw 92\nSkipping line 25594: expected 47 fields, saw 92\nSkipping line 26419: expected 47 fields, saw 92\nSkipping line 27189: expected 47 fields, saw 92\nSkipping line 28304: expected 47 fields, saw 92\nSkipping line 29452: expected 47 fields, saw 92\nSkipping line 29810: expected 47 fields, saw 92\nSkipping line 3121

In [104]:
df.head(4)

Unnamed: 0,NAME,GEO_ID,GEO_ID.1,PCO010001,PCO010002,PCO010003,PCO010004,PCO010005,PCO010006,PCO010007,...,PCO010035,PCO010036,PCO010037,PCO010038,PCO010039,NAME.1,state,county,tract,Unnamed: 46
0,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,,,,,,,,...,,,,,,"Census Tract 201, Autauga County, Alabama",1,1,20100,
1,"Census Tract 205, Autauga County, Alabama",1400000US01001020500,1400000US01001020500,,,,,,,,...,,,,,,"Census Tract 205, Autauga County, Alabama",1,1,20500,
2,"Census Tract 203, Autauga County, Alabama",1400000US01001020300,1400000US01001020300,,,,,,,,...,,,,,,"Census Tract 203, Autauga County, Alabama",1,1,20300,
3,"Census Tract 204, Autauga County, Alabama",1400000US01001020400,1400000US01001020400,,,,,,,,...,,,,,,"Census Tract 204, Autauga County, Alabama",1,1,20400,


In [107]:
df.columns

Index(['NAME', 'GEO_ID', 'GEO_ID.1', 'PCO010001', 'PCO010002', 'PCO010003',
       'PCO010004', 'PCO010005', 'PCO010006', 'PCO010007', 'PCO010008',
       'PCO010009', 'PCO010010', 'PCO010011', 'PCO010012', 'PCO010013',
       'PCO010014', 'PCO010015', 'PCO010016', 'PCO010017', 'PCO010018',
       'PCO010019', 'PCO010020', 'PCO010021', 'PCO010022', 'PCO010023',
       'PCO010024', 'PCO010025', 'PCO010026', 'PCO010027', 'PCO010028',
       'PCO010029', 'PCO010030', 'PCO010031', 'PCO010032', 'PCO010033',
       'PCO010034', 'PCO010035', 'PCO010036', 'PCO010037', 'PCO010038',
       'PCO010039', 'NAME.1', 'state', 'county', 'tract', 'Unnamed: 46'],
      dtype='object')

In [110]:
lis = ['PCO010001', 'PCO010002', 'PCO010003',
       'PCO010004', 'PCO010005', 'PCO010006', 'PCO010007', 'PCO010008',
       'PCO010009', 'PCO010010', 'PCO010011', 'PCO010012', 'PCO010013',
       'PCO010014', 'PCO010015', 'PCO010016', 'PCO010017', 'PCO010018',
       'PCO010019', 'PCO010020', 'PCO010021', 'PCO010022', 'PCO010023',
       'PCO010024', 'PCO010025', 'PCO010026', 'PCO010027', 'PCO010028',
       'PCO010029', 'PCO010030', 'PCO010031', 'PCO010032', 'PCO010033',
       'PCO010034', 'PCO010035', 'PCO010036', 'PCO010037', 'PCO010038',
       'PCO010039']

### 6. Get the description of Alphanumeric code <a id='6_call_function'></a>

In [113]:
# For each alphanumeric column calling column_lable function
# Storing the result in Dictionary
dic = {}
dic = {}
for col in lis:
    descript = column_label(col)
    dic[col] = str(descript.values)




### 6.1 Replace the column header with desciption <a id='6.1_to_df'></a>

In [116]:
df.rename(columns=dic,inplace=True)

In [118]:
df.head(4)

Unnamed: 0,NAME,GEO_ID,GEO_ID.1,"['Total (701-702, 704, 706, 801-802, 900-901, 903-904)']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Male']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Male!!Under 5 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Male!!5 to 9 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Male!!10 to 14 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Male!!15 to 19 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Male!!20 to 24 years']",...,"['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Female!!65 to 69 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Female!!70 to 74 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Female!!75 to 79 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Female!!80 to 84 years']","['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Female!!85 years and over']",NAME.1,state,county,tract,Unnamed: 46
0,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,,,,,,,,...,,,,,,"Census Tract 201, Autauga County, Alabama",1,1,20100,
1,"Census Tract 205, Autauga County, Alabama",1400000US01001020500,1400000US01001020500,,,,,,,,...,,,,,,"Census Tract 205, Autauga County, Alabama",1,1,20500,
2,"Census Tract 203, Autauga County, Alabama",1400000US01001020300,1400000US01001020300,,,,,,,,...,,,,,,"Census Tract 203, Autauga County, Alabama",1,1,20300,
3,"Census Tract 204, Autauga County, Alabama",1400000US01001020400,1400000US01001020400,,,,,,,,...,,,,,,"Census Tract 204, Autauga County, Alabama",1,1,20400,


### 6.2 Saving the result to text file <a id='6.2_to_save'></a>

In [121]:
df.to_csv(r'C:/Users/Gayathri/Desktop/census2010_GROUP_QUARTERS_POPULATION_mapped.txt', index=False,na_rep='NULL')



### 7. Loading cencus 2010 total population file<a id='7_load_total'></a>

In [122]:
# Reading the text file
file = "C:/Users/Gayathri/Desktop/Nexstep_data/census2010_TOTAL POPULATION.txt"

with open(file, 'r') as infile:
    data = infile.read()
    data = data.replace("[", "")
    data = data.replace("]","")
df_1 = pd.read_csv(StringIO(data), error_bad_lines=False)


b'Skipping line 14751: expected 12 fields, saw 22\nSkipping line 19146: expected 12 fields, saw 22\nSkipping line 51580: expected 12 fields, saw 22\nSkipping line 61162: expected 12 fields, saw 22\n'
b'Skipping line 284353: expected 12 fields, saw 22\nSkipping line 307297: expected 12 fields, saw 22\nSkipping line 323000: expected 12 fields, saw 22\nSkipping line 326861: expected 12 fields, saw 22\n'
b'Skipping line 329655: expected 12 fields, saw 22\n'
b'Skipping line 412041: expected 12 fields, saw 22\nSkipping line 446983: expected 12 fields, saw 22\n'
b'Skipping line 459625: expected 12 fields, saw 22\nSkipping line 464312: expected 12 fields, saw 22\nSkipping line 515403: expected 12 fields, saw 22\n'
b'Skipping line 534877: expected 12 fields, saw 22\nSkipping line 544103: expected 12 fields, saw 22\nSkipping line 556148: expected 12 fields, saw 22\nSkipping line 568086: expected 12 fields, saw 22\nSkipping line 583162: expected 12 fields, saw 22\nSkipping line 585794: expected 1

In [123]:
df_1.head(4)

Unnamed: 0,NAME,GEO_ID,GEO_ID.1,NAME.1,PCT001001,PCT001001ERR,POPGROUP,POPGROUP_TTL,state,county,tract,Unnamed: 11
0,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",1912,,1,Total population,1,1,20100,
1,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",1622,,2,White alone,1,1,20100,
2,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",1645,,3,White alone or in combination with one or more...,1,1,20100,
3,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",229,,5,Black or African American alone or in combinat...,1,1,20100,


### 8. Get the description for Alphanumeric code in total population file <a id='8_load_total'></a>


In [124]:
dict1 = {}
# Calling the function column_lable and storing the result in dict
descript1 = column_label('PCT001001')
dict1['PCT001001'] = str(descript.values)

In [126]:
df_1.rename(columns=dict1,inplace=True)

In [127]:
df_1.head(4)

Unnamed: 0,NAME,GEO_ID,GEO_ID.1,NAME.1,"['Total (701-702, 704, 706, 801-802, 900-901, 903-904)!!Female!!85 years and over']",PCT001001ERR,POPGROUP,POPGROUP_TTL,state,county,tract,Unnamed: 11
0,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",1912,,1,Total population,1,1,20100,
1,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",1622,,2,White alone,1,1,20100,
2,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",1645,,3,White alone or in combination with one or more...,1,1,20100,
3,"Census Tract 201, Autauga County, Alabama",1400000US01001020100,1400000US01001020100,"Census Tract 201, Autauga County, Alabama",229,,5,Black or African American alone or in combinat...,1,1,20100,


### 9. Save the result to total population file <a id='9_df_saveto'></a>


In [129]:
df_1.to_csv(r'C:/Users/Gayathri/Desktop/census2010_TOTAL POPULATION_mapped.txt', index=False,na_rep='NULL')