---
title: "Data Gathering"
format: html
---

In [11]:
import requests
import json
import re
import pandas as pd

I will attempt to gather all the data I need to answer my 10 questions. Most of it will come from census.gov, specifically data tables from the U.S. Census Bureau's American Community Survey (ACS), a nationwide survey that produces social, economic, housing, and demographic information about the nation's population each year.

I do not yet know exactly how I will do my analysis or which variables will be most useful, so to cover my bases I will import the DP02-DP05 tables for 2017-2022, excluding 2020 because the Census Bureau did not release standard ACS 1-year estimates for that year due to COVID-19 disruptions to data collection. I may not need every table or column, but it will be convenient to have easy access to them in my future analysis.

Here is what the ACS tables contain:<br>
DP02: Selected Social Characteristics in the United States <br>
DP03: Selected Economic Characteristics in the United States <br>
DP04: Selected Housing Characteristics <br>
DP05: ACS Demographic and Housing Estimates

Here is a link to the webpage: https://www.census.gov/data/developers/data-sets/ACS-supplemental-data.html

I decided to focus on real estate growth potential in every state. Starting broad at the state level lets me later apply the same study to other geographic levels, for example comparing counties or cities within a single state; the methodology will be the same. My observational unit is therefore a U.S. state.

# Python API

I will start by using an API in Python to pull one table and see what we are working with.

In [42]:
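# Request every DP02 variable for all states from the 2017 ACS 1-year profile
DP02_URL_2017 = "https://api.census.gov/data/2017/acs/acs1/profile?get=group(DP02)&for=state:*"
response = requests.get(DP02_URL_2017)
# The API returns a list of rows; the first row holds the variable codes
DP02_2017 = pd.DataFrame(response.json())
DP02_2017.head()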

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1209 | 1210 | 1211 | 1212 | 1213 | 1214 | 1215 | 1216 | 1217 | 1218 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DP02_0001E | DP02_0001EA | DP02_0001M | DP02_0001MA | DP02_0001PE | DP02_0001PEA | DP02_0001PM | DP02_0001PMA | DP02_0002E | DP02_0002EA | ... | DP02_0152EA | DP02_0152M | DP02_0152MA | DP02_0152PE | DP02_0152PEA | DP02_0152PM | DP02_0152PMA | GEO_ID | NAME | state |
| 1 | 1091980 |  | 9693 |  | 1091980 |  | -888888888 | (X) | 716451 |  | ... |  | 9854 |  | 73.4 |  | 0.6 |  | 0400000US28 | Mississippi | 28 |
| 2 | 2385135 |  | 13054 |  | 2385135 |  | -888888888 | (X) | 1527260 |  | ... |  | 15052 |  | 81.3 |  | 0.4 |  | 0400000US29 | Missouri | 29 |
| 3 | 423091 |  | 4068 |  | 423091 |  | -888888888 | (X) | 262726 |  | ... |  | 4698 |  | 81.3 |  | 0.9 |  | 0400000US30 | Montana | 30 |
| 4 | 754490 |  | 4583 |  | 754490 |  | -888888888 | (X) | 484989 |  | ... |  | 6206 |  | 84.4 |  | 0.5 |  | 0400000US31 | Nebraska | 31 |


As you can see, the variable codes in the first row do not tell us much on their own, because each code maps to a variable label in the census data. For example, DP02_0001E maps to the total household count. In the data cleaning section, we will give each column a proper label that describes what it represents, and we will keep only a select few of the more than 1,200 columns.
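
The relabeling described above could look something like the following minimal sketch. It works on a few cells copied from the output rather than calling the API, and it assumes the values arrive as strings. The helper `rows_to_frame`, the `labels` mapping, and the name `total_households` are hypothetical names chosen for illustration; the actual data cleaning section may do this differently.

```python
import pandas as pd

# A few cells copied from the output above (the full response has 1,219 columns).
# The values are assumed to arrive as strings.
sample_rows = [
    ["DP02_0001E", "DP02_0001M", "NAME", "state"],
    ["1091980", "9693", "Mississippi", "28"],
    ["2385135", "13054", "Missouri", "29"],
]


def rows_to_frame(rows):
    """Build a DataFrame that uses the first row as the column names."""
    header, *records = rows
    return pd.DataFrame(records, columns=header)


# Hypothetical label mapping: only the meaning of DP02_0001E comes from the text above.
labels = {"DP02_0001E": "total_households"}

labeled = rows_to_frame(sample_rows).rename(columns=labels)
# Convert the household count from string to a numeric type
labeled["total_households"] = pd.to_numeric(labeled["total_households"])
print(labeled)
```

Promoting the first row to the header before renaming means each column can later be selected by its code or label instead of by its position.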

I will use the following code to request each table I want and save it as a CSV file for easy retrieval during the rest of my analysis.

In [81]:
# URL template: the year and table code are filled in for each request
BASE_URL = 'https://api.census.gov/data/{year}/acs/acs1/profile?get=group({table})&for=state:*'
years = ['2017', '2018', '2019', '2021', '2022']  # 2020 is excluded
tables = ['DP02', 'DP03', 'DP04', 'DP05']

for year in years:
    for table in tables:
        url = BASE_URL.format(year=year, table=table)
        response = requests.get(url)
        response.raise_for_status()  # stop early if a request fails
        df = pd.DataFrame(response.json())
        # Saved as, for example, ./data/2017DP02.csv (the ./data folder must already exist)
        df.to_csv(f'./data/{year}{table}.csv', index=False)
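
Because each response is saved with pandas' default integer column names, the variable codes end up on the second line of every CSV file. The sketch below shows one way the files might be read back so that the codes become the column names. It uses an in-memory buffer as a stand-in for a saved file, with values copied from the DP02 output shown earlier; the `year` column is a hypothetical addition, not part of the original workflow.

```python
import io

import pandas as pd

# Stand-in for one saved file, written the same way as in the loop above.
# The values are copied from the DP02 output shown earlier.
rows = [
    ["DP02_0001E", "NAME", "state"],
    ["1091980", "Mississippi", "28"],
    ["2385135", "Missouri", "29"],
]
buffer = io.StringIO()
pd.DataFrame(rows).to_csv(buffer, index=False)
buffer.seek(0)

# header=1 skips the "0,1,2" line and uses the variable codes as column names.
# Reading the state column as a string preserves leading zeros in FIPS codes.
reloaded = pd.read_csv(buffer, header=1, dtype={"state": str})
reloaded["year"] = 2017  # hypothetical column to tell years apart after combining
print(reloaded)
```

With a real file, the same call would take a path such as `./data/2017DP02.csv` in place of the buffer.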


# R API

Here I will use an API in R to retrieve text data that can give me insight into real estate trends and which states people are talking about.

# Download

https://www.zillow.com/research/data/