# Leveraging U.S. Census Data

## I will be doing the following:
1. Decide which state to invest in.
2. Decide which city to invest in within that state.
3. After choosing the city, obtain a custom dataset from the U.S. Census website with 2018 population data per zip code.
4. Sort through the U.S. Census data to find the top zip codes to analyze.
    
    
## Qualitative Decisions and Assumptions:
 - Focus on where people are moving. Which states are people leaving, and which states are gaining the most people?
 - The investment firm is a smaller firm looking to expand into a new area.
 - The firm will want clusters of nearby zip codes for ease of management.
 - The firm will not be outsourcing work to other property managers. Work will be done in-house.
 - We will not be buying apartment buildings for now but are open to doing so in the future.
 - As a bonus, we will look for areas where laws are favorable to landlords.
 - Since the given dataset runs through April 2018, we will only use data from on or before that date to simulate a real-time decision.

## 1. Deciding on which state to invest in:


### We first look at U.S. Census Data insights from 2017. This article was published on December 20, 2017:

https://www.census.gov/newsroom/press-releases/2017/estimates-idaho.html#:~:text=DEC.,state%20population%20estimates%20released%20today

### Where are the people moving to?

#### Per the article, we have the following: 

<img src="Notebook Pics\US_Census_Data_Cleaning_and_Sorting\US_Census_Top10_1.png" width=600 height=600 /> 
<img src="Notebook Pics\US_Census_Data_Cleaning_and_Sorting\US_Census_Top10_2.png" width=600 height=600 />
<img src="Notebook Pics\US_Census_Data_Cleaning_and_Sorting\US_Census_Top10_3.png" width=600 height=600 />

### I took this data and color-coded the states that overlap across all 3 lists:
 - From this overview, I placed priority on "Most Populous" and "Numerical Growth".
 - We see that Texas was present in all 3 columns.
 - There are close runners-up that I could investigate further if I had more time.
    
<img src="Notebook Pics\US_Census_Data_Cleaning_and_Sorting\US_Census_excel_analysis.png" width=600 height=600 />


    

## 2. Deciding which city to invest in

### Going back to the Zillow dataset, I will use pandas to keep only the rows for Texas and rank the cities by the number of zip codes in each one.
 - Use the value_counts() method to rank the top 5 cities.
 - We will choose the city with the most zip codes to invest in.

In [2]:
## Import the data

import pandas as pd
df_zillow = pd.read_csv('Data Files/zillow_data.csv')
df_zillow.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14723 entries, 0 to 14722
Columns: 272 entries, RegionID to 2018-04
dtypes: float64(219), int64(49), object(4)
memory usage: 30.6+ MB


In [4]:
# Narrow down to a Texas-only dataframe before ranking cities with value_counts()

# Boolean mask: True for rows whose State is 'TX'
df_zillow_1_bool = df_zillow['State'].isin(['TX'])
df_zillow_1 = df_zillow[df_zillow_1_bool]
df_zillow_1

|  | RegionID | RegionName | City | State | Metro | CountyName | SizeRank | 1996-04 | 1996-05 | 1996-06 | ... | 2017-07 | 2017-08 | 2017-09 | 2017-10 | 2017-11 | 2017-12 | 2018-01 | 2018-02 | 2018-03 | 2018-04 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 90668 | 75070 | McKinney | TX | Dallas-Fort Worth | Collin | 2 | 235700.0 | 236900.0 | 236700.0 | ... | 308000 | 310000 | 312500 | 314100 | 315000 | 316600 | 318100 | 319600 | 321100 | 321800 |
| 2 | 91982 | 77494 | Katy | TX | Houston | Harris | 3 | 210400.0 | 212200.0 | 212200.0 | ... | 321000 | 320600 | 320200 | 320400 | 320800 | 321200 | 321200 | 323000 | 326900 | 329900 |
| 4 | 93144 | 79936 | El Paso | TX | El Paso | El Paso | 5 | 77300.0 | 77300.0 | 77300.0 | ... | 119100 | 119400 | 120000 | 120300 | 120300 | 120300 | 120300 | 120500 | 121000 | 121500 |
| 5 | 91733 | 77084 | Houston | TX | Houston | Harris | 6 | 95000.0 | 95200.0 | 95400.0 | ... | 157900 | 158700 | 160200 | 161900 | 162800 | 162800 | 162800 | 162900 | 163500 | 164300 |
| 8 | 91940 | 77449 | Katy | TX | Houston | Harris | 9 | 95400.0 | 95600.0 | 95800.0 | ... | 166800 | 167400 | 168400 | 169600 | 170900 | 172300 | 173300 | 174200 | 175400 | 176200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14372 | 91640 | 76941 | Mertzon | TX | San Angelo | Irion | 14373 |  |  |  | ... | 122500 | 121800 | 121600 | 122200 | 123500 | 124700 | 124300 | 122600 | 121600 | 121600 |
| 14472 | 92897 | 79313 | Anton | TX | Levelland | Hockley | 14473 |  |  |  | ... | 55400 | 56300 | 55700 | 56300 | 58900 | 61500 | 63000 | 63600 | 63500 | 63300 |
| 14492 | 92921 | 79355 | Plains | TX |  | Yoakum | 14493 |  |  |  | ... | 100500 | 100500 | 101000 | 100700 | 99700 | 97700 | 95800 | 94600 | 94000 | 93500 |
| 14599 | 92929 | 79366 | Ransom Canyon | TX | Lubbock | Lubbock | 14600 | 134500.0 | 134500.0 | 134400.0 | ... | 252100 | 251600 | 251600 | 251500 | 251300 | 251500 | 251700 | 252500 | 255000 | 257500 |
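
The city ranking planned for this section (value_counts() on the Texas rows) could look roughly like the sketch below. It runs on a tiny illustrative frame rather than the full Zillow file: the `sample` frame, the Chicago row, and the names `texas` and `top_cities` are assumptions made only for this example, so the counts it prints say nothing about the real ranking.

```python
import pandas as pd

# Illustrative stand-in for df_zillow (the real file has 14,723 rows).
# The Chicago row is a made-up non-Texas entry so the filter has something to drop.
sample = pd.DataFrame({
    "RegionName": [75070, 77494, 79936, 77084, 77449, 60657],
    "City": ["McKinney", "Katy", "El Paso", "Houston", "Katy", "Chicago"],
    "State": ["TX", "TX", "TX", "TX", "TX", "IL"],
})

# Keep only Texas rows, then count how many zip codes each city has.
texas = sample[sample["State"] == "TX"]
top_cities = texas["City"].value_counts().head(5)
print(top_cities)
```

On the real data, the same two lines applied to `df_zillow_1` would give the number of zip codes per Texas city, sorted from most to fewest.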


### Now that we have decided that Texas is a good fit, we can pull the corresponding population data from the U.S. Census database for all the zip codes in Texas. The steps are as follows:

1. First, separate out only the Texas zip codes from the 2018 dataset that I was given.
2. Make a list of the zip codes that belong to Texas.
3. The Census API can be called from the browser; all we have to do is enter the proper URL.
     - Pass the list of zip codes into the Census API request.
     - This is a workaround for manually selecting every zip code.
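
The steps above can be sketched as a small URL builder. Everything specific here is an assumption rather than something taken from the notebook: the endpoint (2018 ACS 5-year estimates), the variable `B01003_001E` (total population), the geography name, and the helper `build_census_url` are illustrative choices that should be checked against the Census API documentation. Note also that the Census reports by ZIP Code Tabulation Area, which approximates but does not exactly match postal zip codes.

```python
from urllib.parse import urlencode, quote

# Assumed endpoint and variable: 2018 ACS 5-year estimates, total population.
BASE_URL = "https://api.census.gov/data/2018/acs/acs5"
POPULATION_VAR = "B01003_001E"


def build_census_url(zip_codes):
    """Return a URL that can be pasted into the browser (hypothetical helper)."""
    # Zero-pad to five digits so integer zip codes keep any leading zeros.
    zips = ",".join(str(z).zfill(5) for z in zip_codes)
    params = {
        "get": f"NAME,{POPULATION_VAR}",
        "for": f"zip code tabulation area:{zips}",
    }
    # Encode spaces as %20 but leave ':' and ',' readable.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote, safe=':,')}"


# With the real data, this list would come from the Texas-only dataframe,
# e.g. df_zillow_1['RegionName'].tolist().
texas_zips = [75070, 77494, 79936, 77084, 77449]
url = build_census_url(texas_zips)
print(url)
```

With several hundred zip codes the URL can get very long, so it may be necessary to split the list into smaller batches and combine the responses.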
