## Migration of Angelenos Out of LA

#### Objective: Use IPUMS data to better understand the migration of residents out of Los Angeles (2006-2016)
\*Specifically, how many residents move out, where they move to, and what their demographic and economic characteristics are.

### Relevant Terms:
- **PUMA**: Public Use Microdata Area (geographic unit comprising between 100,000 & 200,000 people)
- **IPUMS**: Integrated Public Use Microdata Series (collection of high-precision samples of the American population drawn from fifteen federal censuses and from the American Community Surveys of 2000-2012)

### Summary of IPUMS Data:
- Judging by the number of records available, PUMA appears to have become the standard geographic unit in the data after 2011, while metropolitan areas were the unit of choice before that. As a result, the 2006-2011 & 2012-2016 migration data can't be analyzed in exactly the same way.

### Choice of Variables:
- For each set of years, I chose the geographic variables that returned significantly more migration records.
- I relied on metropolitan variables to analyze migration out of the LA *metro area* from 2006-2011 & PUMA data to analyze migration out of LA *county* from 2012-2016 (*Note:* PUMA variables indicate the county a resident moved out of but provide the precise PUMA a resident moved to).

### Key Variables (geographic & migration):
- **METAREAD**: Current metropolitan area of residence (https://usa.ipums.org/usa-action/variables/METAREA#description_section)
- **MIGMET1**: Metropolitan area of residence 1 year ago (https://usa.ipums.org/usa-action/variables/MIGMET1#description_section)
- **PUMA**: Current PUMA of residence (https://usa.ipums.org/usa-action/variables/PUMA#description_section)
- **MIGPUMA1**: PUMA of residence 1 year ago (https://usa.ipums.org/usa-action/variables/MIGPUMA1#description_section)
- **MIGPLAC1**: State of residence 1 year ago (https://usa.ipums.org/usa-action/variables/MIGPLAC1#description_section)

*All variables can be found here (HOUSEHOLD-GEOGRAPHIC & PERSON-MIGRATION): https://usa.ipums.org/usa-action/variables/group

------------------------------------------------------------------------

## 2006-2011 Data
#### Choosing the Best IPUMS Geographic Variables 

#### Options:
(1) PUMA/CPUMA0010 & MIGPUMA1

(2) COUNTY 

(3) METAREAD & MIGMET1

#### Findings:
- PUMA variables for 2006-2011 data only yield about 2.5k records in total of residents moving out of LA county (MIGPUMA1 actually records the county a resident moved out of, not the PUMA).

- The COUNTY variable (which indicates current county of residence) isn't an option since there is no corresponding variable that indicates the county a resident moved from (other than MIGPUMA1, which doesn't yield enough data).

#### Conclusion:
Tracking migration by **metropolitan area is the best choice** since it yields the most total records (about 16k), though it's still a relatively low number of records.

------------------------------------------------------------------------

### Comparing the Total Records Returned From Different Variables (2006-2011)

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Combining data for 2006-2011
combined1 = pd.concat(
    [pd.read_csv('../ipums/ipums_' + str(year) + '.csv') for year in range(2006, 2012)],
    sort = False, ignore_index = True)

In [3]:
#Total before filtering to only include LA data
len(combined1)

18169497
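With roughly 18 million rows, it may help to read only the columns the analysis uses. The following is a minimal sketch, not the notebook's actual loading code: `load_years` is a hypothetical helper, and the two in-memory CSVs are made-up stand-ins for the yearly files.

```python
import io
import pandas as pd

def load_years(sources, usecols):
    """Read each CSV with only the needed columns, then concatenate once."""
    frames = [pd.read_csv(src, usecols=usecols) for src in sources]
    return pd.concat(frames, sort=False, ignore_index=True)

# Stand-ins for the yearly files (made-up rows, not IPUMS data)
csv_2006 = io.StringIO("MIGMET1,METAREAD,MIGPLAC1,AGE\n4480,6780,6,30\n")
csv_2007 = io.StringIO("MIGMET1,METAREAD,MIGPLAC1,AGE\n4480,4482,6,41\n0,4480,6,25\n")

# AGE is left out, so it is never loaded into memory
small = load_years([csv_2006, csv_2007],
                   usecols=['MIGMET1', 'METAREAD', 'MIGPLAC1'])
print(small.shape)
```

With the real files, `sources` would be the list of `'../ipums/ipums_' + str(year) + '.csv'` paths, and `usecols` would list whichever geographic and migration variables the extract contains.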

#### (1) MIGMET1/METAREAD - 16,110 total records
*Note: 68,075 was the total count of LA metro residents who moved during 2006-2011, but only 16,110 moved out of LA metro

In [4]:
#Filtering to only include records of residents who lived in LA metro (code: 4480) one year earlier and whose current METAREAD is neither 0 nor 4480 (for any year between 2006-2011)
metro = combined1[(combined1.MIGMET1 == 4480) & (combined1.MIGPLAC1 == 6) & ~combined1.METAREAD.isin([0, 4480])]

In [5]:
#Total number of records of residents moving out of LA metro (2006-2011)
len(metro)

16110
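To see how the count of all movers from LA metro relates to the smaller out-migration count, the movers can be split by their current `METAREAD` value. This is a sketch on made-up rows: `classify` is a hypothetical helper, and reading code 0 as "not identifiable or not in a metro area" is my understanding of the IPUMS codebook rather than something verified here.

```python
import pandas as pd

LA_METRO = 4480  # LA metro code used in the filter above

# Made-up records for illustration only (not IPUMS data)
toy = pd.DataFrame({
    'MIGMET1':  [4480, 4480, 4480, 4480, 6780],
    'METAREAD': [4480, 4480, 0,    6780, 4480],
})

# Everyone who lived in LA metro one year earlier
from_la = toy[toy.MIGMET1 == LA_METRO]

def classify(code):
    """Label a mover by current metro area (code 0 meaning is an assumption)."""
    if code == LA_METRO:
        return 'stayed in LA metro'
    if code == 0:
        return 'metro not identified'
    return 'left LA metro'

breakdown = from_la.METAREAD.map(classify).value_counts()
print(breakdown)
```

Only the 'left LA metro' group corresponds to the records kept by the filter above; the other two groups account for the gap between the two totals.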

In [6]:
#Counts of top 20 metro destinations
metro.METAREAD.value_counts()[:20]

6780    3515
4482    2172
680      747
7320     714
8730     660
4120     642
6200     388
7361     377
7360     276
3360     251
7470     242
2840     241
5600     228
6920     227
520      210
7600     201
6440     187
8840     180
1920     180
7400     171
Name: METAREAD, dtype: int64

In [7]:
#Assigning name of metro to top 20 codes from the METAREAD variable above (from: https://usa.ipums.org/usa-action/variables/METAREA#codes_section)
destination_metros = {6780:'Riverside-San Bernardino, CA', 4482:'Orange County, CA', 680:'Bakersfield, CA', 7320:'San Diego, CA', 
                      8730:'Ventura-Oxnard-Simi Valley, CA', 4120:'Las Vegas, NV', 6200:'Phoenix, AZ', 7361:'Oakland, CA', 
                      7360:'San Francisco-Oakland-Vallejo, CA', 3360:'Houston-Brazoria, TX', 7470:'Santa Barbara-Santa Maria-Lompoc, CA', 
                      2840:'Fresno, CA', 5600:'New York, NY-Northeastern NJ', 6920:'Sacramento, CA', 520:'Atlanta, GA', 
                      7600:'Seattle-Everett, WA', 6440:'Portland-Vancouver, OR', 8840:'Washington, DC/MD/VA', 1920:'Dallas-Fort Worth, TX',
                      7400:'San Jose, CA'}

In [8]:
#Mapping name of top 20 metro areas to associated METAREAD code
metro_names = metro.copy()
metro_names['Metro'] = metro_names.METAREAD.map(destination_metros)

In [9]:
#Total number of records for each metro destination
metro_names.Metro.value_counts()

Riverside-San Bernardino, CA            3515
Orange County, CA                       2172
Bakersfield, CA                          747
San Diego, CA                            714
Ventura-Oxnard-Simi Valley, CA           660
Las Vegas, NV                            642
Phoenix, AZ                              388
Oakland, CA                              377
San Francisco-Oakland-Vallejo, CA        276
Houston-Brazoria, TX                     251
Santa Barbara-Santa Maria-Lompoc, CA     242
Fresno, CA                               241
New York, NY-Northeastern NJ             228
Sacramento, CA                           227
Atlanta, GA                              210
Seattle-Everett, WA                      201
Portland-Vancouver, OR                   187
Washington, DC/MD/VA                     180
Dallas-Fort Worth, TX                    180
San Jose, CA                             171
Name: Metro, dtype: int64
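The counts above are sample records, not population estimates. IPUMS USA provides a person weight (`PERWT`) for that purpose; the sketch below assumes the extract includes it and uses made-up rows and weights to show how weighted totals could be computed alongside raw counts.

```python
import pandas as pd

# Made-up records for illustration only; PERWT is assumed to be in the extract
toy = pd.DataFrame({
    'Metro': ['Riverside-San Bernardino, CA', 'Orange County, CA',
              'Riverside-San Bernardino, CA', 'Las Vegas, NV'],
    'PERWT': [120, 95, 80, 150],
})

# Raw number of sample records per destination
record_counts = toy.Metro.value_counts()

# Sum of person weights per destination, largest first
weighted = toy.groupby('Metro').PERWT.sum().sort_values(ascending=False)
print(weighted)
```

The ranking of destinations can differ between the two views, since a destination with few records may still carry large weights.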

------------------------------------------------------------------------

### Other Variables:
*Info on PUMA variables found here: https://usa.ipums.org/usa/volii/cpuma0010.shtml

#### (2) MIGPUMA1/PUMA - 2,517 total records

In [10]:
#Filtering to only include records of residents who moved from LA county (PUMA code: 3700)
puma2006_2011 = combined1[(combined1.MIGPUMA1 == 3700) & (combined1.MIGPLAC1 == 6) & (combined1.PUMA != 3700)]

In [11]:
len(puma2006_2011)

2517

------------------------------------------------------------------------

#### (3) MIGPUMA1/CPUMA0010 (LA county) - 2,445 total records
*CPUMA0010 (https://usa.ipums.org/usa-action/variables/CPUMA0010#description_section)

In [12]:
#CPUMA0010 codes for LA county comprise codes 71-99
cpuma0010_county = combined1[(combined1.MIGPUMA1 == 3700) & (combined1.MIGPLAC1 == 6) & (combined1.CPUMA0010.isin(range(71,100)) == False)]

In [13]:
len(cpuma0010_county)

2445

------------------------------------------------------------------------

#### (4) MIGPUMA1/CPUMA0010 (LA city) - 2,517 total records

In [14]:
#File containing all CPUMA0010 codes
cpuma_codes = pd.read_csv('../ipums/PUMA2000_PUMA2010_crosswalk.csv')

In [15]:
#Returning the unique codes in the crosswalk's PUMA10 column whose PUMA name contains 'LA City' (these are compared against CPUMA0010 below)
la_city_PUMA_codes = set(cpuma_codes.PUMA10[cpuma_codes.PUMA10_Name.str.contains('LA City')])

In [16]:
#Filtering out CPUMA0010 codes corresponding to LA city codes (above)
cpuma0010_city = combined1[(combined1.MIGPUMA1 == 3700) & (combined1.MIGPLAC1 == 6) & (combined1.CPUMA0010.isin(la_city_PUMA_codes) == False)]

In [17]:
len(cpuma0010_city)

2517

*Question to look into: Why do (2) & (4) yield exactly the same record count? (I don't see a mistake, but maybe someone reading this can point it out if there is one.)
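One way to investigate is to count how many rows each third condition actually removes from the shared base filter (`MIGPUMA1 == 3700` and `MIGPLAC1 == 6`). If both remove zero rows, (2) and (4) would necessarily match. In particular, In [15] builds its set from the crosswalk's `PUMA10` column, so it may be worth confirming those values are on the same scale as `CPUMA0010`. The sketch below uses made-up rows; `rows_removed` and `candidate_codes` are hypothetical names.

```python
import pandas as pd

def rows_removed(base_mask, extra_mask):
    """Count rows that pass base_mask but are dropped by adding extra_mask."""
    return int((base_mask & ~extra_mask).sum())

# Made-up records for illustration only (not IPUMS data)
toy = pd.DataFrame({
    'MIGPUMA1':  [3700, 3700, 3700, 100],
    'MIGPLAC1':  [6, 6, 6, 6],
    'PUMA':      [5401, 6102, 4500, 3700],
    'CPUMA0010': [75, 80, 120, 90],
})
# Hypothetical code set on a different scale from CPUMA0010
candidate_codes = {3701, 3702}

base = (toy.MIGPUMA1 == 3700) & (toy.MIGPLAC1 == 6)
removed_by_puma = rows_removed(base, toy.PUMA != 3700)
removed_by_codes = rows_removed(base, ~toy.CPUMA0010.isin(candidate_codes))
print(int(base.sum()), removed_by_puma, removed_by_codes)
```

In this toy case neither extra condition removes anything, so both filters return the base count. Running the same check on `combined1` would show whether that is what happens with the real data.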

------------------------------------------------------------------------

## 2012-2016 Data
##### PUMA was the only viable geographic variable (since metro data wasn't available after 2011)

In [18]:
#Combining data for 2012-2016
combined2 = pd.concat(
    [pd.read_csv('../ipums/ipums_' + str(year) + '.csv') for year in range(2012, 2017)],
    sort = False, ignore_index = True)

In [19]:
#Total number of records before filtering out non-LA migration data
len(combined2)

15681927

In [20]:
#Total number of records after filtering out non-LA migration data
combined2 = combined2[(combined2.MIGPUMA1 == 3700) & (combined2.MIGPLAC1 == 6)]
len(combined2)

59110

#### Filtering Out LA City vs LA County Records...

In [21]:
#Filtering out migration within LA city
la_city_PUMA_codes = set(cpuma_codes.PUMA10[cpuma_codes.PUMA10_Name.str.contains('LA City')])
la_city = combined2[combined2.PUMA.isin(la_city_PUMA_codes) == False]
len(la_city)

41333

In [22]:
#Filtering out migration within LA county (returns far fewer records than just filtering out LA city)
la_county = combined2[combined2.CPUMA0010.isin(range(71,100)) == False]
len(la_county)

14686

#### Mapping PUMA Codes to Names of Areas

In [23]:
#File identifying names of areas encompassed by a particular PUMA
puma_codes = pd.read_csv('../ipums/puma_codes.csv')

In [24]:
#Keeping relevant variables before merging with combined2
puma_names = puma_codes.copy()[['State_FIPS', 'State_Name', 'PUMA', 'PUMA_Name']]

In [25]:
#Merging puma_names file with combined2 to assign PUMA name (i.e. parts of cities/counties encompassed by PUMA) to respective PUMA code
puma_codes_names = pd.merge(puma_names, combined2, how='inner', left_on=['State_FIPS', 'PUMA'], right_on=['STATEFIP', 'PUMA'])
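Because the merge above is an inner join, any record whose state/PUMA pair is missing from the lookup file is dropped silently. The sketch below shows one way to check for that on made-up data, using the `validate` and `indicator` options of `pd.merge`; the lookup rows and PUMA names are invented for illustration.

```python
import pandas as pd

# Made-up lookup and records for illustration only
names = pd.DataFrame({
    'State_FIPS': [6, 6],
    'PUMA': [3701, 5901],
    'PUMA_Name': ['Example PUMA A', 'Example PUMA B'],
})
records = pd.DataFrame({
    'STATEFIP': [6, 6, 6],
    'PUMA': [3701, 3701, 9999],   # 9999 has no entry in the lookup
})

# how='right' keeps every record; validate checks the lookup keys are unique
merged = pd.merge(names, records, how='right',
                  left_on=['State_FIPS', 'PUMA'], right_on=['STATEFIP', 'PUMA'],
                  validate='one_to_many', indicator=True)

# Records that found no matching PUMA name
unmatched = int((merged['_merge'] == 'right_only').sum())
print(len(merged), unmatched)
```

If `unmatched` comes back as zero on the real data, the inner join loses nothing and the PUMA name counts below cover every record in `combined2`.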

#### Names of PUMAs: Including LA City vs Excluding LA City vs Excluding LA County

In [26]:
#Top 10 PUMA destinations before filtering out PUMAs encompassing part of LA city
incl_la_city = puma_codes_names.PUMA_Name
incl_la_city.value_counts()[:10]

Los Angeles County (West Central)--LA City (West Central/Westwood & West Los Angeles)    1550
Los Angeles County (Central)--LA City (East Central/Central City & Boyle Heights)        1244
Los Angeles County--LA City (East Central/Silver Lake, Echo Park & Westlake)             1086
Los Angeles County (Northwest)--Santa Clarita City                                       1038
Los Angeles County--LA City (Mount Washington, Highland Park & Glassell Park)             994
Los Angeles County (North)--LA City (Northwest/Chatsworth & Porter Ranch)                 933
Los Angeles County (Central)--LA City (East Central/Hollywood)                            927
Los Angeles County (North/Unincorporated)--Castaic                                        916
Los Angeles County--LA City (Northwest/Canoga Park, Winnetka & Woodland Hills)            908
Los Angeles County (North Central)--Palmdale City                                         905
Name: PUMA_Name, dtype: int64

In [27]:
#Top 10 PUMA destinations after filtering out PUMAs encompassing part of LA city
excl_la_city = puma_codes_names.PUMA_Name[puma_codes_names.PUMA_Name.str.contains('LA City') == False]
excl_la_city.value_counts()[:10]

Los Angeles County (Northwest)--Santa Clarita City                                      1038
Los Angeles County (North/Unincorporated)--Castaic                                       916
Los Angeles County (North Central)--Palmdale City                                        905
Los Angeles County--LA (Southwest/Marina del Rey & Westchester) & Culver City Cities     887
Los Angeles County (South)--South Gate & Lynwood Cities                                  835
Los Angeles County (East Central)--Pomona City                                           808
Los Angeles County (Central)--Glendale City                                              767
Los Angeles County (Central)--Pasadena City                                              764
Los Angeles County (North Central)--Lancaster City                                       725
Los Angeles County--Baldwin Park, Azusa, Duarte & Irwindale Cities                       688
Name: PUMA_Name, dtype: int64

In [28]:
#Top 10 PUMA destinations after filtering out PUMAs encompassing part of LA county
excl_la_county = puma_codes_names.PUMA_Name[puma_codes_names.PUMA_Name.str.contains('Los Angeles County') == False]
excl_la_county.value_counts()[:10]

Kern County (West)--Delano, Wasco & Shafter Cities                    318
Orange County (Central)--Irvine City (Central)                        291
Orange County (Northwest)--Buena Park, Cypress & Seal Beach Cities    286
San Bernardino County (Southwest)--Chino & Chino Hills Cities         281
San Bernardino County (Southwest)--Upland & Montclair Cities          208
Orange County (North Central)--Fullerton & Placentia Cities           199
Santa Barbara County--South Coast Region                              190
San Bernardino County (Southwest)--Rancho Cucamonga City              178
Ventura County (Southeast)--Thousand Oaks City                        168
San Bernardino County (Southwest)--Ontario City                       167
Name: PUMA_Name, dtype: int64
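Since several of these PUMAs belong to the same county, it can also be useful to roll destinations up to the county level. The sketch below assumes each PUMA name begins with a single county name ending in "County", as the names above do; names spanning several counties or lacking the word "County" would need separate handling. The repetition counts are made up.

```python
import pandas as pd

# A few PUMA names copied from the output above, with made-up repetition counts
names = pd.Series(
    ['Kern County (West)--Delano, Wasco & Shafter Cities'] * 3
    + ['Orange County (Central)--Irvine City (Central)'] * 2
    + ['Orange County (Northwest)--Buena Park, Cypress & Seal Beach Cities'] * 2
    + ['Santa Barbara County--South Coast Region']
)

# Capture everything up to and including the first "County"
county = names.str.extract(r'^(.*?County)', expand=False)
by_county = county.value_counts()
print(by_county)
```

Applied to `excl_la_county`, the same two lines would give a county-level ranking of destinations outside LA county.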