# Introduction
In this notebook, I examine the neighborhoods of Los Angeles and cluster them by location and nearby services. With this information, I then want to determine which part of Los Angeles is the best place to live. I will extract the postal codes of Los Angeles and use Foursquare to collect location data for these neighborhoods.

# Data
The data I will use include:

- Foursquare location data
- A list of postal codes in Los Angeles: https://www.geonames.org/postal-codes/US/CA/california.html

**Postal Code Data**

In [None]:
import pandas as pd

url = 'https://www.geonames.org/postal-codes/US/CA/california.html'
dfs = pd.read_html(url)

In [None]:
dfs[2]

Unnamed: 0.1,Unnamed: 0,Place,Code,Country,Admin1,Admin2,Admin3
0,1.0,Beverly Hills,90210,United States,California,Los Angeles,
1,,34.09/-118.406,34.09/-118.406,34.09/-118.406,34.09/-118.406,34.09/-118.406,34.09/-118.406
2,2.0,Los Angeles,90002,United States,California,Los Angeles,
3,,33.95/-118.246,33.95/-118.246,33.95/-118.246,33.95/-118.246,33.95/-118.246,33.95/-118.246
4,3.0,Los Angeles,90003,United States,California,Los Angeles,
...,...,...,...,...,...,...,...
396,199.0,Santa Monica,90406,United States,California,Los Angeles,
397,,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491
398,200.0,Santa Monica,90407,United States,California,Los Angeles,
399,,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491
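
The postal code table sits at position 2 in the list returned by pd.read_html, and that position could shift if the page layout changes. As a hedged alternative, the sketch below picks the table by its column names instead. The helper pick_table and the toy tables are hypothetical stand-ins and are not part of the original notebook.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include every required name."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table contains columns {sorted(required)}")


# Toy stand-ins for the list that pd.read_html returns (made-up data).
tables = [
    pd.DataFrame({"Menu": ["Home", "About"]}),
    pd.DataFrame({"Place": ["Beverly Hills"], "Code": ["90210"]}),
]

postal_table = pick_table(tables, ["Place", "Code"])
print(postal_table)
```

In the notebook this would be called as pick_table(dfs, ['Place', 'Code']), assuming the scraped table keeps those column names.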


**Data Wrangling**

We can see that pandas puts the latitude and longitude of each location into a separate row directly below the place it belongs to. It also reads in two redundant columns, Unnamed: 0 and Admin3, which we will drop. Lastly, we want to rename the Admin1 and Admin2 columns to more descriptive titles. Let's fix these issues.
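
To make the interleaved layout concrete, here is a minimal sketch on a made-up two-place sample with the same structure as the scraped table. It assumes, as in the output above, that place records sit at even positions and the "lat/lon" strings at odd positions. The names raw, places, and coords are illustrative only.

```python
import pandas as pd

# Made-up sample in the same interleaved layout as the scraped table.
raw = pd.DataFrame(
    {
        "Place": ["Beverly Hills", "34.09/-118.406", "Los Angeles", "33.95/-118.246"],
        "Code": ["90210", "34.09/-118.406", "90002", "33.95/-118.246"],
    }
)

# Even positions hold the place records, odd positions hold "lat/lon" strings.
places = raw.iloc[0::2].reset_index(drop=True)
coords = raw.iloc[1::2]["Place"].reset_index(drop=True)

# Split each "lat/lon" string into two columns (the values stay as text here).
lat_lon = coords.str.split("/", expand=True)
places["latitude"] = lat_lon[0]
places["longitude"] = lat_lon[1]

print(places)
```

Resetting both indexes before combining keeps each coordinate pair aligned with the place row that preceded it.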

In [None]:
dfs[2][dfs[2].index % 2 != 0]

Unnamed: 0.1,Unnamed: 0,Place,Code,Country,Admin1,Admin2,Admin3
1,,34.09/-118.406,34.09/-118.406,34.09/-118.406,34.09/-118.406,34.09/-118.406,34.09/-118.406
3,,33.95/-118.246,33.95/-118.246,33.95/-118.246,33.95/-118.246,33.95/-118.246,33.95/-118.246
5,,33.965/-118.273,33.965/-118.273,33.965/-118.273,33.965/-118.273,33.965/-118.273,33.965/-118.273
7,,34.076/-118.303,34.076/-118.303,34.076/-118.303,34.076/-118.303,34.076/-118.303,34.076/-118.303
9,,34.049/-118.292,34.049/-118.292,34.049/-118.292,34.049/-118.292,34.049/-118.292,34.049/-118.292
...,...,...,...,...,...,...,...
391,,33.962/-118.353,33.962/-118.353,33.962/-118.353,33.962/-118.353,33.962/-118.353,33.962/-118.353
393,,33.962/-118.353,33.962/-118.353,33.962/-118.353,33.962/-118.353,33.962/-118.353,33.962/-118.353
395,,34.035/-118.503,34.035/-118.503,34.035/-118.503,34.035/-118.503,34.035/-118.503,34.035/-118.503
397,,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491,34.019/-118.491


In [None]:
coor_list = dfs[2][dfs[2].index % 2 != 0]['Place']

coor_list

1       34.09/-118.406
3       33.95/-118.246
5      33.965/-118.273
7      34.076/-118.303
9      34.049/-118.292
            ...       
391    33.962/-118.353
393    33.962/-118.353
395    34.035/-118.503
397    34.019/-118.491
399    34.019/-118.491
Name: Place, Length: 200, dtype: object

In [None]:
dfs[2] = dfs[2].iloc[::2]

dfs[2]

Unnamed: 0.1,Unnamed: 0,Place,Code,Country,Admin1,Admin2,Admin3
0,1.0,Beverly Hills,90210,United States,California,Los Angeles,
2,2.0,Los Angeles,90002,United States,California,Los Angeles,
4,3.0,Los Angeles,90003,United States,California,Los Angeles,
6,4.0,Los Angeles,90004,United States,California,Los Angeles,
8,5.0,Los Angeles,90006,United States,California,Los Angeles,
...,...,...,...,...,...,...,...
392,197.0,Inglewood,90312,United States,California,Los Angeles,
394,198.0,Santa Monica,90402,United States,California,Los Angeles,
396,199.0,Santa Monica,90406,United States,California,Los Angeles,
398,200.0,Santa Monica,90407,United States,California,Los Angeles,


In [None]:
dfs[2] = dfs[2].reset_index(drop=True)

dfs[2]

Unnamed: 0.1,Unnamed: 0,Place,Code,Country,Admin1,Admin2,Admin3
0,1.0,Beverly Hills,90210,United States,California,Los Angeles,
1,2.0,Los Angeles,90002,United States,California,Los Angeles,
2,3.0,Los Angeles,90003,United States,California,Los Angeles,
3,4.0,Los Angeles,90004,United States,California,Los Angeles,
4,5.0,Los Angeles,90006,United States,California,Los Angeles,
...,...,...,...,...,...,...,...
196,197.0,Inglewood,90312,United States,California,Los Angeles,
197,198.0,Santa Monica,90402,United States,California,Los Angeles,
198,199.0,Santa Monica,90406,United States,California,Los Angeles,
199,200.0,Santa Monica,90407,United States,California,Los Angeles,


In [None]:
# Drop the two redundant columns in a single call
dfs[2] = dfs[2].drop(columns=['Unnamed: 0', 'Admin3'])

dfs[2]

Unnamed: 0,Place,Code,Country,Admin1,Admin2
0,Beverly Hills,90210,United States,California,Los Angeles
1,Los Angeles,90002,United States,California,Los Angeles
2,Los Angeles,90003,United States,California,Los Angeles
3,Los Angeles,90004,United States,California,Los Angeles
4,Los Angeles,90006,United States,California,Los Angeles
...,...,...,...,...,...
196,Inglewood,90312,United States,California,Los Angeles
197,Santa Monica,90402,United States,California,Los Angeles
198,Santa Monica,90406,United States,California,Los Angeles
199,Santa Monica,90407,United States,California,Los Angeles


In [None]:
dfs[2] = dfs[2].drop(dfs[2].tail(1).index)

dfs[2]

Unnamed: 0,Place,Code,Country,Admin1,Admin2
0,Beverly Hills,90210,United States,California,Los Angeles
1,Los Angeles,90002,United States,California,Los Angeles
2,Los Angeles,90003,United States,California,Los Angeles
3,Los Angeles,90004,United States,California,Los Angeles
4,Los Angeles,90006,United States,California,Los Angeles
...,...,...,...,...,...
195,Inglewood,90311,United States,California,Los Angeles
196,Inglewood,90312,United States,California,Los Angeles
197,Santa Monica,90402,United States,California,Los Angeles
198,Santa Monica,90406,United States,California,Los Angeles


In [None]:
latitudes, longitudes = zip(*(s.split('/') for s in coor_list))

The longitudes are negative, which suggests that the website expresses longitude in degrees East, so locations west of the prime meridian, such as Los Angeles, get negative values. Let's keep this in mind and check later whether we need to reformat the data.
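
Because the coordinates come out of str.split as text, it may be worth converting them to numbers and checking the sign convention before relying on them. The sketch below is one way to do that on three sample values taken from the output above. The variable names lat, lon, and all_west are assumptions for illustration.

```python
import pandas as pd

# Sample values taken from the coordinate rows above; they are still strings.
latitudes = ("34.09", "33.95", "33.965")
longitudes = ("-118.406", "-118.246", "-118.273")

# pd.to_numeric raises on malformed text by default (errors="raise").
lat = pd.to_numeric(pd.Series(latitudes))
lon = pd.to_numeric(pd.Series(longitudes))

# Sanity checks for signed decimal degrees.
assert lat.between(-90, 90).all()
assert lon.between(-180, 180).all()

# With east-positive longitudes, Los Angeles (about 118 degrees W) is negative.
all_west = bool((lon < 0).all())
print(lat.dtype, lon.dtype, all_west)
```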

In [None]:
dfs[2]['latitudes(degree North)'] = latitudes
dfs[2]['longitudes(degree East)'] = longitudes

dfs[2]

Unnamed: 0,Place,Code,Country,Admin1,Admin2,latitudes(degree North),longitudes(degree East)
0,Beverly Hills,90210,United States,California,Los Angeles,34.09,-118.406
1,Los Angeles,90002,United States,California,Los Angeles,33.95,-118.246
2,Los Angeles,90003,United States,California,Los Angeles,33.965,-118.273
3,Los Angeles,90004,United States,California,Los Angeles,34.076,-118.303
4,Los Angeles,90006,United States,California,Los Angeles,34.049,-118.292
...,...,...,...,...,...,...,...
195,Inglewood,90311,United States,California,Los Angeles,33.962,-118.353
196,Inglewood,90312,United States,California,Los Angeles,33.962,-118.353
197,Santa Monica,90402,United States,California,Los Angeles,34.035,-118.503
198,Santa Monica,90406,United States,California,Los Angeles,34.019,-118.491


Awesome! Now we have all the geographical data we need. Let's rename the Admin1 and Admin2 columns and store the result in a more descriptively named variable.

In [None]:
la_neighborhoods = dfs[2].rename(columns={'Admin1': 'State', 'Admin2': 'City'})

la_neighborhoods

Unnamed: 0,Place,Code,Country,State,City,latitudes(degree North),longitudes(degree East)
0,Beverly Hills,90210,United States,California,Los Angeles,34.09,-118.406
1,Los Angeles,90002,United States,California,Los Angeles,33.95,-118.246
2,Los Angeles,90003,United States,California,Los Angeles,33.965,-118.273
3,Los Angeles,90004,United States,California,Los Angeles,34.076,-118.303
4,Los Angeles,90006,United States,California,Los Angeles,34.049,-118.292
...,...,...,...,...,...,...,...
195,Inglewood,90311,United States,California,Los Angeles,33.962,-118.353
196,Inglewood,90312,United States,California,Los Angeles,33.962,-118.353
197,Santa Monica,90402,United States,California,Los Angeles,34.035,-118.503
198,Santa Monica,90406,United States,California,Los Angeles,34.019,-118.491


**Foursquare Data**

In [None]:
%pip install geocoder geopy



In [None]:
import geocoder
from geopy.geocoders import Nominatim 

address = 'Los Angeles, California'

geolocator = Nominatim(user_agent="la_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Los Angeles are 34.0536909, -118.242766.
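
geopy's geocode call returns None when it finds no match, in which case reading location.latitude would fail with an AttributeError. The sketch below wraps the lookup in a small guard. The helper coordinates_or_fail is hypothetical, and the FakeLocation stub only stands in for the Nominatim service so the example runs without network access.

```python
from typing import Callable, Optional, Tuple


def coordinates_or_fail(
    geocode: Callable[[str], Optional[object]], address: str
) -> Tuple[float, float]:
    """Look up an address and return (latitude, longitude), failing loudly on no match."""
    location = geocode(address)
    if location is None:
        raise LookupError(f"no geocoding result for {address!r}")
    return location.latitude, location.longitude


class FakeLocation:
    """Offline stand-in for a geopy Location, using the values printed above."""

    latitude = 34.0536909
    longitude = -118.242766


def fake_geocode(address):
    # Pretend that only one address is known to the service.
    return FakeLocation() if address == "Los Angeles, California" else None


print(coordinates_or_fail(fake_geocode, "Los Angeles, California"))
```

In the notebook, geolocator.geocode could be passed in place of fake_geocode.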
