# ‘Affordable Places to Raise a Family’ Analysis

by Alex Mahadevan, Age 29

In [2]:
import pandas as pd

### Population, Migration and Births

<p>Let's read in the population dataset covering 2010 to 2016. As a bonus, it also includes births and migration data we can use in the analysis.</p>
<p>Data is courtesy of the [U.S. Census Bureau](https://www2.census.gov/programs-surveys/popest/datasets/2010-2016/metro/totals/cbsa-est2016-alldata.csv).</p>

In [3]:
df = pd.read_csv("/Users/alexmahadevan/Desktop/Projects/Raise a Family/cbsa-est2016-alldata.csv")

<p>If people are moving to a county or metropolitan area, that probably means it's a good place to live.</p>

<p>So we'll go ahead and create a new variable called AVG_NETMIG that captures the average annual net migration to a region. The higher the number, the greater the average migration to the area.</p>
<p>But, wait. Won't way more people be moving to a huge city like Miami than a smaller one like Sarasota?</p>
<p>To control for population size, let's find the net migration per capita by dividing net migration in each year by the population estimate for that year.</p>

In [4]:
# Average of the yearly per-capita net migration rates, 2010 through 2016
df['AVG_NETMIG'] = (df['NETMIG2010']/df['POPESTIMATE2010'] + df['NETMIG2011']/df['POPESTIMATE2011'] +
                    df['NETMIG2012']/df['POPESTIMATE2012'] + df['NETMIG2013']/df['POPESTIMATE2013'] +
                    df['NETMIG2014']/df['POPESTIMATE2014'] + df['NETMIG2015']/df['POPESTIMATE2015'] +
                    df['NETMIG2016']/df['POPESTIMATE2016'])/7
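
<p>The same average can also be built with a loop over the years, which avoids retyping each column name by hand. The sketch below is only an illustration: the <code>toy</code> frame and its numbers are made up, and it covers two years instead of seven. It assumes the columns follow the NETMIG&lt;year&gt; and POPESTIMATE&lt;year&gt; naming used above.</p>

```python
import pandas as pd

# Toy data standing in for the Census file; every number here is invented.
toy = pd.DataFrame({
    "NAME": ["Big Metro", "Small Metro"],
    "NETMIG2010": [1000, 100],
    "POPESTIMATE2010": [1_000_000, 50_000],
    "NETMIG2011": [3000, 200],
    "POPESTIMATE2011": [1_000_000, 50_000],
})

# Two years for the toy example; the full dataset would use range(2010, 2017).
years = [2010, 2011]

# One column of per-capita net migration per year.
rates = pd.DataFrame(
    {year: toy[f"NETMIG{year}"] / toy[f"POPESTIMATE{year}"] for year in years}
)

# Row-wise mean across the yearly rates.
toy["AVG_NETMIG"] = rates.mean(axis=1)
print(toy[["NAME", "AVG_NETMIG"]])
```

<p>In this toy case the small metro ends up with the higher rate even though far fewer people moved there, which is the point of dividing by population.</p>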

<p>But we also want to take into account the overall change in population between 2010 and 2016.</p>

<p>So we will create a new variable called POP_CHANGE, the 2016 population minus the 2010 population, divided by the 2010 population.</p>

In [5]:
df['POP_CHANGE'] = (df['POPESTIMATE2016'] - df['POPESTIMATE2010']) / df['POPESTIMATE2010']

<p>You probably want to raise a family where other people are raising a family, right?</p>
<p>This statistic may not be the best (since sometimes high birth rates correlate with poverty), but let's create a new variable for births per capita for 2016.</p>
<p>We shall call it BIRTHS_PER_CAPITA.</p>

In [6]:
df['BIRTHS_PER_CAPITA'] = df['BIRTHS2016']/df['POPESTIMATE2016']

<p>Are there any other variables we can create that would be important to someone looking to raise a family?</p>

<p>Not from this dataset, but while we're thinking about it, let's narrow it down to the 382 metropolitan statistical areas.</p>

<p>First, let's count how many rows fall under each value of the LSAD variable to make sure we have 382.</p>

In [7]:
df.LSAD.value_counts()

County or equivalent             1825
Micropolitan Statistical Area     551
Metropolitan Statistical Area     382
Metropolitan Division              31
Name: LSAD, dtype: int64

<p>Success! Now, let's narrow it down by creating a new dataframe called MSA_POP.</p>

In [8]:
MSA_POP = df[df.LSAD == 'Metropolitan Statistical Area']
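
<p>One optional refinement, sketched here on made-up data: adding <code>.copy()</code> after the filter gives a frame that is independent of the original, so adding columns to it later should not trigger pandas' SettingWithCopyWarning. The <code>toy</code> frame, its names, and the <code>expected_rows</code> check are assumptions for illustration only.</p>

```python
import pandas as pd

# Toy stand-in for the Census table; names and values are invented.
toy = pd.DataFrame({
    "NAME": ["Metro A", "Micro B", "County C", "Metro D"],
    "LSAD": [
        "Metropolitan Statistical Area",
        "Micropolitan Statistical Area",
        "County or equivalent",
        "Metropolitan Statistical Area",
    ],
})

# The boolean mask selects the matching rows; .copy() detaches the
# result from the original frame.
msa = toy[toy["LSAD"] == "Metropolitan Statistical Area"].copy()

# Fail loudly if the row count is not what we expect
# (2 in this toy data, 382 in the real file).
expected_rows = 2
assert len(msa) == expected_rows, f"expected {expected_rows} rows, got {len(msa)}"
```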

<p>Let's just leave that there for now, and explore other variables we can add to this analysis.</p>

### Museums

<p>I found a data set that lists museums by city. Let's have a look at that.</p>

In [9]:
MUSEUM = pd.read_csv("/Users/alexmahadevan/Desktop/Projects/Raise a Family/Workbook1.csv")

In [10]:
MUSEUM.head()

Unnamed: 0,ADCITY,ADSTATE,Unnamed: 2
0,SITKA,AK,"SITKA, AK"
1,KING SALMON,AK,"KING SALMON, AK"
2,FAIRBANKS,AK,"FAIRBANKS, AK"
3,PALMER,AK,"PALMER, AK"
4,KODIAK,AK,"KODIAK, AK"


<p>As you can see, we need to concatenate the city and state variables, since many cities in different U.S. states share the same name.</p>

In [11]:
MUSEUM['CITY'] = MUSEUM['ADCITY'] + ", " + MUSEUM['ADSTATE']
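
<p>If the city or state fields contain stray spaces or mixed capitalization, the same place can end up under several different keys. Whether the real file has that problem is an assumption; the sketch below uses invented rows to show one way to normalize both fields before joining them.</p>

```python
import pandas as pd

# Invented rows with inconsistent spacing and capitalization.
toy = pd.DataFrame({
    "ADCITY": [" Sitka", "SITKA ", "Springfield", "SPRINGFIELD"],
    "ADSTATE": ["AK", "ak", "IL", "MO"],
})

# Trim whitespace and upper-case both parts so equal places compare equal.
city = toy["ADCITY"].str.strip().str.upper()
state = toy["ADSTATE"].str.strip().str.upper()

# Same "CITY, ST" key as in the notebook, built from the cleaned parts.
toy["CITY"] = city + ", " + state
print(toy["CITY"].value_counts())
```

<p>The two Sitka rows collapse into one key, while the two Springfields stay separate because their states differ. This does not repair entries where the state was typed into the city field.</p>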

In [12]:
MUSEUM.CITY.value_counts()

NEW YORK, NY            297
CHICAGO, IL             193
WASHINGTON, DC          188
PHILADELPHIA, PA        179
LOS ANGELES, CA         162
HOUSTON, TX             133
BALTIMORE, MD           120
SAN FRANCISCO, CA       119
SEATTLE, WA             103
AUSTIN, TX               96
PORTLAND, OR             96
SAN DIEGO, CA            96
DALLAS, TX               94
BOSTON, MA               93
DENVER, CO               89
ATLANTA, GA              87
PITTSBURGH, PA           84
BROOKLYN, NY             77
LOUISVILLE, KY           76
CINCINNATI, OH           76
NEW ORLEANS, LA          72
MILWAUKEE, WI            70
RICHMOND, VA             69
SAN ANTONIO, TX          69
ALBUQUERQUE, NM          68
TUCSON, AZ               68
HUGO, OK                 67
CLEVELAND, OH            65
HONOLULU, HI             64
MIAMI, FL                64
                       ... 
HOUSTON TX, TX            1
FOREST PARK, GA           1
READS LANDING, MN         1
FREEPORT, MI              1
MOULTRIE, GA        

In [13]:
MUSEUM_CITIES = MUSEUM.CITY.value_counts()

In [14]:
MUSEUM_CITIES.head()

NEW YORK, NY        297
CHICAGO, IL         193
WASHINGTON, DC      188
PHILADELPHIA, PA    179
LOS ANGELES, CA     162
Name: CITY, dtype: int64
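
<p>MUSEUM_CITIES is a Series, with the city as the index and the count as the value. If a two-column table is more convenient later on, one possible conversion is sketched below on made-up data. The column name <code>MUSEUM_COUNT</code> is a hypothetical choice, not something from the notebook.</p>

```python
import pandas as pd

# Invented city keys standing in for MUSEUM.CITY.
cities = pd.Series(
    ["NEW YORK, NY", "NEW YORK, NY", "NEW YORK, NY",
     "CHICAGO, IL", "CHICAGO, IL", "SITKA, AK"],
    name="CITY",
)

# Count museums per city, sorted from most to fewest.
counts = cities.value_counts()

# Name the index explicitly, then turn it into a regular column.
# Passing name= sets the count column's label regardless of pandas version.
museum_counts = counts.rename_axis("CITY").reset_index(name="MUSEUM_COUNT")
print(museum_counts)
```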