# Finding the next NBA City

### Introduction

The sports entertainment industry is worth billions of dollars. It is anchored by four major leagues (the NFL, the NBA, MLB and the NHL), each with roughly 30 teams, and their impact is felt around the country. Yet, despite this apparent saturation, there is continual talk of adding teams in new cities to expand each sport's reach.

One league that has grown in popularity of late is the National Basketball Association (NBA), and talk has been swirling of adding an expansion team in a new market. Fueling these rumors, the NBA has fewer teams than its two fall/winter rivals: 30, versus 32 for the NFL and (soon to be) 32 for the NHL.

Against this backdrop, the NBA is looking to grow its fan and revenue base through expansion. The challenge is finding a city that is suitable for a new team and will increase revenue without diluting the current product.

The goal of this study is to help the NBA narrow down its search for suitable expansion cities. This will be done by looking at both city-level data (population, TV market size, income) and the interests of the locals, as reflected in the popular venues within each city.

In [189]:
# make the imports
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import sys
#!{sys.executable} -m pip install lxml
#import lxml

### Data

In addition to the Foursquare venue data, we need four data sources:

1. City Population Data - https://en.m.wikipedia.org/wiki/List_of_United_States_cities_by_population
2. TV Market Data - https://en.wikipedia.org/wiki/List_of_United_States_television_markets
3. Metro Area income data - https://en.wikipedia.org/wiki/List_of_United_States_metropolitan_areas_by_per_capita_income
4. List of current NBA teams - https://www.basketball-reference.com/teams/ 

The first three come from Wikipedia and will need to be scraped separately, then merged for later use.

The last one was exported as a .csv file from the basketball-reference website and will be read directly into a pandas dataframe

#### Population Data

First, we'll pull in the population data from Wikipedia

In [266]:
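# widen the display limits so large tables are shown in full
# ('display.height' is omitted: it is no longer a valid option in current pandas releases)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)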

In [286]:
pop_wiki = requests.get("https://en.m.wikipedia.org/wiki/List_of_United_States_cities_by_population")
pop_soup = BeautifulSoup(pop_wiki.content,'html')
#pop_soup

In [296]:
# get the table reference
# By inspection of HTML, table we are interested in is stored in index 3
table = pop_soup.find_all('table')[3]
#table

We will define a function for extracting the table into a dataframe for the population data wiki page

In [297]:
def get_table_data(table):
    
    # Extract the column names from the header cells
    columns = [c.get_text().strip() for c in table.find_all('th')]
    num_cols = len(columns)
    
    rows = []
    row = []
    
    for td in table.find_all('td'):
        text = td.get_text()
        # skip the cells that report values in square miles; only the metric values are kept
        if ('sq' in text) and ('mi' in text):
            continue
        row.append(text.strip())
        # once the row has as many elements as there are columns, store it and start a new row
        if len(row) == num_cols:
            rows.append(row)
            row = []
    
    # build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
    return pd.DataFrame(rows, columns=columns)
   

In [300]:
df = get_table_data(table)
cols = df.columns.values
cols[0]='Rank'
df.columns = cols
df.set_index('Rank', inplace=True) 
df

Rank,City,State[c],2018estimate,2010Census,Change,2016 land area,2016 population density,Location
1,New York[d],New York,8398748,8175133,+2.74%,780.9 km2,"10,933/km2",40°39′49″N 73°56′19″W﻿ / ﻿40.6635°N 73.9387°W﻿...
2,Los Angeles,California,3990456,3792621,+5.22%,"1,213.9 km2","3,276/km2",34°01′10″N 118°24′39″W﻿ / ﻿34.0194°N 118.4108°...
3,Chicago,Illinois,2705994,2695598,+0.39%,588.7 km2,"4,600/km2",41°50′15″N 87°40′54″W﻿ / ﻿41.8376°N 87.6818°W﻿...
4,Houston[3],Texas,2325502,2100263,+10.72%,"1,651.1 km2","1,395/km2",29°47′12″N 95°23′27″W﻿ / ﻿29.7866°N 95.3909°W﻿...
5,Phoenix,Arizona,1660272,1445632,+14.85%,"1,340.6 km2","1,200/km2",33°34′20″N 112°05′24″W﻿ / ﻿33.5722°N 112.0901°...
6,Philadelphia[e],Pennsylvania,1584138,1526006,+3.81%,347.6 km2,"4,511/km2",40°00′34″N 75°08′00″W﻿ / ﻿40.0094°N 75.1333°W﻿...
7,San Antonio,Texas,1532233,1327407,+15.43%,"1,194.0 km2","1,250/km2",29°28′21″N 98°31′30″W﻿ / ﻿29.4724°N 98.5251°W﻿...
8,San Diego,California,1425976,1307402,+9.07%,842.3 km2,"1,670/km2",32°48′55″N 117°08′06″W﻿ / ﻿32.8153°N 117.1350°...
9,Dallas,Texas,1345047,1197816,+12.29%,882.9 km2,"1,493/km2",32°47′36″N 96°45′59″W﻿ / ﻿32.7933°N 96.7665°W﻿...
10,San Jose,California,1030119,945942,+8.90%,459.7 km2,"2,231/km2",37°17′48″N 121°49′08″W﻿ / ﻿37.2967°N 121.8189°...
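
The row-building step inside `get_table_data` can be tried in isolation, without any HTML. The sketch below is a minimal illustration, not the notebook's own code: `group_cells` is a hypothetical helper, and the cell values (in particular the square-mile figures) are illustrative sample data.

```python
def group_cells(cells, num_cols, skip=lambda text: False):
    """Group a flat list of cell texts into rows of num_cols items each."""
    rows, row = [], []
    for text in cells:
        if skip(text):
            continue  # ignore unwanted cells, e.g. the square-mile variants
        row.append(text.strip())
        if len(row) == num_cols:
            rows.append(row)
            row = []
    return rows


# illustrative sample: two table rows flattened into one list of cells
cells = ['1', 'New York', '301.5 sq mi', '780.9 km2',
         '2', 'Los Angeles', '468.7 sq mi', '1,213.9 km2']
rows = group_cells(cells, 3, skip=lambda t: 'sq' in t and 'mi' in t)
print(rows)
```

Because a row is emitted only when it is full, a trailing partial row is silently dropped, which is worth checking if the scraped row count looks too low.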


Now we need to simplify this data to only keep what we need (City, State, 2018 estimated population, 2016 population density)

In [301]:
df_pop = df[ ['City','State[c]','2018estimate','2016 population density'] ]
# rename the columns
df_pop.columns = ['City','State','Population','Density']
df_pop

Rank,City,State,Population,Density
1,New York[d],New York,8398748,"10,933/km2"
2,Los Angeles,California,3990456,"3,276/km2"
3,Chicago,Illinois,2705994,"4,600/km2"
4,Houston[3],Texas,2325502,"1,395/km2"
5,Phoenix,Arizona,1660272,"1,200/km2"
6,Philadelphia[e],Pennsylvania,1584138,"4,511/km2"
7,San Antonio,Texas,1532233,"1,250/km2"
8,San Diego,California,1425976,"1,670/km2"
9,Dallas,Texas,1345047,"1,493/km2"
10,San Jose,California,1030119,"2,231/km2"


Lastly, we will remove the /km2 suffix and the thousands separators from the density column, and strip the footnote markers (such as [d]) from the city names

In [302]:
# work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
df_pop = df_pop.copy()
# strip the '/km2' suffix and the thousands separators; the values remain strings
df_pop['Density'] = df_pop['Density'].replace('/km2|,', '', regex=True)
# strip Wikipedia footnote markers such as [d] from the city names
df_pop['City'] = df_pop['City'].replace(r'\[.*\]', '', regex=True)
df_pop


Rank,City,State,Population,Density
1,New York,New York,8398748,10933
2,Los Angeles,California,3990456,3276
3,Chicago,Illinois,2705994,4600
4,Houston,Texas,2325502,1395
5,Phoenix,Arizona,1660272,1200
6,Philadelphia,Pennsylvania,1584138,4511
7,San Antonio,Texas,1532233,1250
8,San Diego,California,1425976,1670
9,Dallas,Texas,1345047,1493
10,San Jose,California,1030119,2231
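
If the density values are needed as numbers rather than strings, the conversion could look like the sketch below. `parse_density` is a hypothetical helper introduced here for illustration; it assumes every value follows the `'10,933/km2'` pattern seen in the table above.

```python
def parse_density(text):
    """Convert a density string such as '10,933/km2' to a float."""
    cleaned = text.replace('/km2', '').replace(',', '').strip()
    return float(cleaned)


print(parse_density('10,933/km2'))
```

Under that assumption it could be applied to the whole column with `df_pop['Density'].apply(parse_density)`.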


#### Media Market Data

Next, we'll pull in the TV media market data from Wikipedia

In [304]:
tv_wiki = requests.get("https://en.wikipedia.org/wiki/List_of_United_States_television_markets")
tv_soup = BeautifulSoup(tv_wiki.content,'html')
tv_soup


In [305]:
# get the table reference
# By inspection of HTML, table we are interested in is stored in index 1
table = tv_soup.find_all('table')[1]
#table

In [306]:
def get_tv_table_data(table):
    
    # column headers are the <th> cells without a 'scope' attribute;
    # market names are the row-header <th> cells that carry one
    columns = []
    markets = []
    for th in table.find_all('th'):
        if 'scope' in th.attrs:
            markets.append(th.get_text().strip())
        else:
            columns.append(th.get_text().strip())
   
    num_cols = len(columns)
    rows = []
    row = []
    i = 0   # position within the current row
    j = 0   # index of the next market name
       
    for td in table.find_all('td'):
        # the market name comes from a <th>, so insert it as the second element of the row
        if i == 1 and j < len(markets):
            row.append(markets[j])
            j += 1
            i += 1

        # if the cell contains a link, keep only the text of the first link; otherwise keep the cell text
        link = td.find('a')
        if link is not None:
            row.append(link.get_text())
        else:
            row.append(td.get_text().strip())
        i += 1
        # once the row has as many elements as there are columns, store it and start a new row
        if i == num_cols:
            rows.append(row)
            row = []
            i = 0

    # build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
    df = pd.DataFrame(rows, columns=columns)
    df.set_index('Rank[1]', inplace=True)
    return df


In [309]:
df_tv = get_tv_table_data(table)
df_tv.head()

['Rank[1]', 'Market', 'State', 'Counties  (or county-equivalents)  covered', 'TV households (2018–19)', 'Local ABC affiliate', 'Local CBS affiliate', 'Local CW affiliate', 'Local Fox affiliate', 'Local NBC affiliate', 'Other significant stations[2]']


Rank[1],Market,State,Counties (or county-equivalents) covered,TV households (2018–19),Local ABC affiliate,Local CBS affiliate,Local CW affiliate,Local Fox affiliate,Local NBC affiliate,Other significant stations[2]
1,New York,New York,Bronx,"7,100,300 (6.441%)",WABC-TV,WCBS-TV,WPIX,WNYW,WNBC,WFTY-DT
2,Los Angeles,California,Inyo,"5,276,600 (4.786%)",KABC-TV,KCBS-TV,KTLA,KTTV,KNBC,KCAL-TV
3,Chicago,Illinois,Cook,"3,251,370 (2.949%)",WLS-TV,WBBM-TV,WCIU-TV,WFLD,WMAQ-TV,WGN-TV
4,Philadelphia,Pennsylvania,Berks,"2,816,850 (2.555%)",WPVI-TV,KYW-TV,WPSG,WTXF-TV,WCAU,WFPA-CD
5,Dallas-Fort Worth,Texas,Anderson,"2,622,070 (2.378%)",WFAA,KTVT,KDAF,KDFW,KXAS-TV,KDFI


Now we'll remove the extraneous information and keep only what we need

In [310]:
df_tv_market = df_tv[ ['Market', 'State', 'TV households (2018–19)']]
df_tv_market

Rank[1],Market,State,TV households (2018–19)
1,New York,New York,"7,100,300 (6.441%)"
2,Los Angeles,California,"5,276,600 (4.786%)"
3,Chicago,Illinois,"3,251,370 (2.949%)"
4,Philadelphia,Pennsylvania,"2,816,850 (2.555%)"
5,Dallas-Fort Worth,Texas,"2,622,070 (2.378%)"
6,Washington (Hagerstown),District of Columbia,"2,482,480 (2.252%)"
7,Houston,Texas,"2,423,360 (2.198%)"
8,San Francisco-Oakland-San Jose,California,"2,414,470 (2.19%)"
9,Boston (Manchester),Massachusetts,"2,364,870 (2.145%)"
10,Atlanta,Georgia,"2,341,390 (2.124%)"
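
The household column mixes a count and a national share in one string, so it has to be split before it can be used numerically. The following is a minimal sketch under the assumption that every value looks like `'7,100,300 (6.441%)'`; `parse_households` is a hypothetical helper, not part of the notebook.

```python
import re

# count with thousands separators, followed by a percentage in parentheses
_HOUSEHOLDS = re.compile(r'^\s*([\d,]+)\s*\(([\d.]+)%\)\s*$')


def parse_households(text):
    """Split a string such as '7,100,300 (6.441%)' into (count, share)."""
    match = _HOUSEHOLDS.match(text)
    if match is None:
        return None, None  # value does not follow the expected pattern
    count = int(match.group(1).replace(',', ''))
    share = float(match.group(2))
    return count, share


print(parse_households('7,100,300 (6.441%)'))
```

Returning `(None, None)` for unexpected input keeps a single malformed cell from stopping the whole conversion.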


#### Per Capita Income Data

Next, we'll pull in the per capita income data for each metro area

In [361]:
inc_wiki = requests.get("https://en.wikipedia.org/wiki/List_of_United_States_metropolitan_areas_by_per_capita_income")
inc_soup = BeautifulSoup(inc_wiki.content,'html')
#inc_soup

In [362]:
# get the table reference
# By inspection of HTML, table we are interested in is stored in index 2
table = inc_soup.find_all('table')[2]
#table

In [366]:
df_inc = get_table_data(table)
df_inc.set_index('Rank', inplace=True)
df_inc.rename(columns={'Metropolitan statistical area':'Metro'}, inplace=True)
df_inc.drop('Population', axis=1,inplace=True)
df_inc

Rank,Metro,Per capitaincome
1,"Washington-Arlington-Alexandria, D.C-Virginia-...","$47,411"
2,"San Jose-Santa Clara-Sunnyvale, California MSA","$40,392"
3,"Seattle-Tacoma-Bellevue, Washington MSA","$39,322"
4,"San Francisco-Oakland-Hayward, California MSA","$38,355"
5,"Boston–Worcester–Lawrence, Massachusetts–New H...","$37,311"
6,"Honolulu, Hawaii MSA","$36,339"
7,"Minneapolis-St. Paul-Bloomington, Minnesota MSA","$35,388"
8,"Hartford, Connecticut MSA","$34,310"
9,"Denver-Aurora-Lakewood, Colorado MSA","$32,399"
10,"Portland-Vancouver-Hillsboro, Oregon MSA","$31,377"


#### NBA Team Cities

Finally, we'll read the list of NBA team cities from the CSV file

In [320]:
df_teams = pd.read_csv('NBA_cities.csv')

# filter by current teams (where To = 2020)
df_teams = df_teams[ df_teams['To']==2020 ]
# get list of unique NBA cities
nba_cities = list(df_teams.Franchise.unique())
nba_cities

['Atlanta Hawks',
 'Boston Celtics',
 'Brooklyn Nets',
 'Charlotte Hornets',
 'Chicago Bulls',
 'Cleveland Cavaliers',
 'Dallas Mavericks',
 'Denver Nuggets',
 'Detroit Pistons',
 'Golden State Warriors',
 'Houston Rockets',
 'Indiana Pacers',
 'Los Angeles Clippers',
 'Los Angeles Lakers',
 'Memphis Grizzlies',
 'Miami Heat',
 'Milwaukee Bucks',
 'Minnesota Timberwolves',
 'New Orleans Pelicans',
 'New York Knicks',
 'Oklahoma City Thunder',
 'Orlando Magic',
 'Philadelphia 76ers',
 'Phoenix Suns',
 'Portland Trail Blazers',
 'Sacramento Kings',
 'San Antonio Spurs',
 'Toronto Raptors',
 'Utah Jazz',
 'Washington Wizards']

### Merge Data Sets for Use
Now we will merge all of the datasets into a usable dataframe

First we'll add the per capita income

In [421]:
# store metro and pci information
metros = list(df_inc.Metro)
pci = list(df_inc['Per capitaincome'])

# create a function to extract it
def get_merged_value(city,options,values):
    # return the value paired with the first option whose text contains the city name
    for option, value in zip(options, values):
        if city in option:
            return value
    return np.nan
            

# assign() returns a new dataframe, which avoids the SettingWithCopyWarning raised when a column is set on a slice
df_pop = df_pop.assign(PCI=df_pop['City'].apply(lambda x: get_merged_value(x,metros,pci)))
df_pop


Rank,City,State,Population,Density,PCI
1,New York,New York,8398748,10933,"$24,581"
2,Los Angeles,California,3990456,3276,"$21,170"
3,Chicago,Illinois,2705994,4600,
4,Houston,Texas,2325502,1395,"$21,701"
5,Phoenix,Arizona,1660272,1200,"$21,907"
6,Philadelphia,Pennsylvania,1584138,4511,"$22,874"
7,San Antonio,Texas,1532233,1250,"$18,518"
8,San Diego,California,1425976,1670,"$22,926"
9,Dallas,Texas,1345047,1493,"$23,616"
10,San Jose,California,1030119,2231,"$40,392"
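
Substring matching of a city against a metro label can pick the wrong row: 'Washington', for example, also appears in 'Seattle-Tacoma-Bellevue, Washington MSA'. One possible refinement, sketched below, compares the city only against the city names in front of the comma. `metro_cities` and `match_city` are hypothetical helpers, and the approach assumes the labels follow the 'City-City-City, State MSA' pattern shown in the income table.

```python
import re


def metro_cities(metro):
    """Return the city names in a label such as 'A-B-C, State MSA'."""
    city_part = metro.split(',')[0]
    # cities are separated by hyphens or en dashes in the labels
    return [c.strip() for c in re.split(r'[-–]', city_part) if c.strip()]


def match_city(city, metros, values):
    """Return the value of the first metro that lists the city, else None."""
    for metro, value in zip(metros, values):
        if city in metro_cities(metro):
            return value
    return None


metros = ['Seattle-Tacoma-Bellevue, Washington MSA',
          'Washington-Arlington-Alexandria, D.C-Virginia-...']
values = ['$39,322', '$47,411']
print(match_city('Washington', metros, values))
```

The trade-off is that a city whose own name contains a hyphen would be split into pieces and missed, so such cases would still need manual handling.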


In [422]:
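# fix the NaN: sort by state, then forward-fill each gap with the value from the preceding row
# (an approximation: the filled value belongs to a different city, usually one in the same state)
df_data = df_pop.sort_values('State').ffill()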
df_data.index = df_data.index.astype(int)
df_data.sort_index(inplace=True)
df_data

Rank,City,State,Population,Density,PCI
1,New York,New York,8398748,10933,"$24,581"
2,Los Angeles,California,3990456,3276,"$21,170"
3,Chicago,Illinois,2705994,4600,"$23,074"
4,Houston,Texas,2325502,1395,"$21,701"
5,Phoenix,Arizona,1660272,1200,"$21,907"
6,Philadelphia,Pennsylvania,1584138,4511,"$22,874"
7,San Antonio,Texas,1532233,1250,"$18,518"
8,San Diego,California,1425976,1670,"$22,926"
9,Dallas,Texas,1345047,1493,"$23,616"
10,San Jose,California,1030119,2231,"$40,392"
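
A plain forward fill after sorting can borrow a value from a neighbouring state when the gap is the first row of its state. A stricter variant, sketched here on a small illustrative frame (the income figures are sample values, not scraped data), restricts the fill to rows that share a state.

```python
import numpy as np
import pandas as pd

# illustrative sample data with two gaps in the PCI column
sample = pd.DataFrame({
    'City': ['Houston', 'Dallas', 'Chicago', 'Aurora'],
    'State': ['Texas', 'Texas', 'Illinois', 'Illinois'],
    'PCI': [21701.0, np.nan, np.nan, 23074.0],
})

# fill each gap only from rows in the same state (forward first, then backward)
sample['PCI'] = sample.groupby('State')['PCI'].transform(lambda s: s.ffill().bfill())
print(sample)
```

A state with no known value at all keeps its NaN under this approach, which makes the remaining gaps visible instead of hiding them.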


Adding the TV market in a similar fashion

In [424]:
market = list(df_tv_market['Market'])
households = list(df_tv_market['TV households (2018–19)'])

df_data['TV'] = df_data['City'].apply(lambda x: get_merged_value(x,market,households))
df_data

Rank,City,State,Population,Density,PCI,TV
1,New York,New York,8398748,10933,"$24,581","7,100,300 (6.441%)"
2,Los Angeles,California,3990456,3276,"$21,170","5,276,600 (4.786%)"
3,Chicago,Illinois,2705994,4600,"$23,074","3,251,370 (2.949%)"
4,Houston,Texas,2325502,1395,"$21,701","2,423,360 (2.198%)"
5,Phoenix,Arizona,1660272,1200,"$21,907","1,864,420 (1.691%)"
6,Philadelphia,Pennsylvania,1584138,4511,"$22,874","2,816,850 (2.555%)"
7,San Antonio,Texas,1532233,1250,"$18,518","923,990 (0.838%)"
8,San Diego,California,1425976,1670,"$22,926","987,760 (0.896%)"
9,Dallas,Texas,1345047,1493,"$23,616","2,622,070 (2.378%)"
10,San Jose,California,1030119,2231,"$40,392","2,414,470 (2.19%)"


In [None]:
# fix the NaN by 

Finally, determine if the city already has an NBA team

In [426]:
# define a function to add it
def nba_city(city,nba_cities):
    for n in nba_cities:
        if city in n:
            return 1
    return 0

df_data['NBA']=df_data['City'].apply(lambda x: nba_city(x,nba_cities))
df_data

Rank,City,State,Population,Density,PCI,TV,NBA
1,New York,New York,8398748,10933,"$24,581","7,100,300 (6.441%)",1
2,Los Angeles,California,3990456,3276,"$21,170","5,276,600 (4.786%)",1
3,Chicago,Illinois,2705994,4600,"$23,074","3,251,370 (2.949%)",1
4,Houston,Texas,2325502,1395,"$21,701","2,423,360 (2.198%)",1
5,Phoenix,Arizona,1660272,1200,"$21,907","1,864,420 (1.691%)",1
6,Philadelphia,Pennsylvania,1584138,4511,"$22,874","2,816,850 (2.555%)",1
7,San Antonio,Texas,1532233,1250,"$18,518","923,990 (0.838%)",1
8,San Diego,California,1425976,1670,"$22,926","987,760 (0.896%)",0
9,Dallas,Texas,1345047,1493,"$23,616","2,622,070 (2.378%)",1
10,San Jose,California,1030119,2231,"$40,392","2,414,470 (2.19%)",0


In [438]:
df_data.NBA.value_counts()

# set the exception manually (the Golden State Warriors play in San Francisco)
mask = df_data['City']=='San Francisco'
df_data.loc[mask,'NBA']=1
df_data

Rank,City,State,Population,Density,PCI,TV,NBA
1,New York,New York,8398748,10933,"$24,581","7,100,300 (6.441%)",1
2,Los Angeles,California,3990456,3276,"$21,170","5,276,600 (4.786%)",1
3,Chicago,Illinois,2705994,4600,"$23,074","3,251,370 (2.949%)",1
4,Houston,Texas,2325502,1395,"$21,701","2,423,360 (2.198%)",1
5,Phoenix,Arizona,1660272,1200,"$21,907","1,864,420 (1.691%)",1
6,Philadelphia,Pennsylvania,1584138,4511,"$22,874","2,816,850 (2.555%)",1
7,San Antonio,Texas,1532233,1250,"$18,518","923,990 (0.838%)",1
8,San Diego,California,1425976,1670,"$22,926","987,760 (0.896%)",0
9,Dallas,Texas,1345047,1493,"$23,616","2,622,070 (2.378%)",1
10,San Jose,California,1030119,2231,"$40,392","2,414,470 (2.19%)",0
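
Golden State is not the only franchise whose name omits its home city: the Minnesota, Utah and Indiana teams are named after states, and the Nets after a borough. A small lookup table could cover these cases in one place. The sketch below is an illustration only; `FRANCHISE_HOME_CITY` and `has_nba_team` are hypothetical names, not part of the notebook.

```python
# franchises whose names do not contain their home city
FRANCHISE_HOME_CITY = {
    'Golden State Warriors': 'San Francisco',
    'Minnesota Timberwolves': 'Minneapolis',
    'Utah Jazz': 'Salt Lake City',
    'Indiana Pacers': 'Indianapolis',
    'Brooklyn Nets': 'New York',
}


def has_nba_team(city, franchises):
    """Return 1 if any franchise is named after, or mapped to, the city."""
    for franchise in franchises:
        if city in franchise or FRANCHISE_HOME_CITY.get(franchise) == city:
            return 1
    return 0


franchises = ['Utah Jazz', 'Chicago Bulls', 'Golden State Warriors']
print(has_nba_team('Salt Lake City', franchises))
```

It could stand in for `nba_city` in the `apply` call, for example `df_data['City'].apply(lambda x: has_nba_team(x, nba_cities))`.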


### Get the City Longitude and Latitude

In [441]:
!{sys.executable} -m pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library



In [446]:
# create a function to add the latitude and longitude for each city
def get_long_lat(city,state,long=1):
    address = str(city)+', '+str(state)
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    # geocode returns None when the address cannot be resolved
    if location is None:
        return np.nan
    if long==1:
        return location.longitude
    else:
        return location.latitude
   
# get_long_lat('San Jose','California',0)

In [None]:
# add the long and lat to the dataframe
df_data['Long'] = df_data.apply(lambda x: get_long_lat(x['City'],x['State'],1),axis=1)
df_data['Lat'] = df_data.apply(lambda x: get_long_lat(x['City'],x['State'],0),axis=1)
df_data.head()
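
The two `apply` calls above geocode every city twice, and the public Nominatim service asks clients to stay at or below one request per second. A possible alternative, sketched below, looks each city up once and pauses between requests. `geocode_cities` is a hypothetical helper; the geocoding function is passed in as an argument so the logic can be tried without network access.

```python
import time


def geocode_cities(cities, geocode, delay_seconds=1.0):
    """Return {(city, state): (latitude, longitude)} with one lookup per city."""
    coords = {}
    for city, state in cities:
        location = geocode('{}, {}'.format(city, state))
        if location is None:
            # the geocoder could not resolve the address
            coords[(city, state)] = (None, None)
        else:
            coords[(city, state)] = (location.latitude, location.longitude)
        time.sleep(delay_seconds)  # pause between requests
    return coords
```

In the notebook this might be called as `geocode_cities(zip(df_data['City'], df_data['State']), Nominatim(user_agent="ny_explorer").geocode)`, after which the coordinates can be mapped back onto the dataframe.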

### Adding the Foursquare Data

In [None]:
'''
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan
'''

In [None]:
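import os

# read the credentials from environment variables instead of hard-coding them in the notebook
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholders; use whatever names you export)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# confirm that both values were found without printing the secret itself
print('Foursquare credentials loaded:', bool(CLIENT_ID and CLIENT_SECRET))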

In [None]:
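# build an example request URL for the first city in df_data
LIMIT = 100 # maximum number of venues returned per request
radius = 500 # search radius in meters
city_latitude = df_data['Lat'].iloc[0]
city_longitude = df_data['Long'].iloc[0]
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    radius, 
    LIMIT)
url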

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
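
The list comprehension in `getNearbyVenues` raises an `IndexError` if a venue comes back with an empty `categories` list. A more defensive extraction step could look like the sketch below. `extract_venues` is a hypothetical helper, and the sample items are made-up data that only mimic the response fields used above.

```python
def extract_venues(name, lat, lng, items):
    """Flatten venue items into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# made-up sample items shaped like the fields accessed in getNearbyVenues
sample_items = [
    {'venue': {'name': 'Court A',
               'location': {'lat': 37.33, 'lng': -121.89},
               'categories': [{'name': 'Basketball Court'}]}},
    {'venue': {'name': 'Unlabeled Spot',
               'location': {'lat': 37.34, 'lng': -121.88},
               'categories': []}},
]
rows = extract_venues('San Jose', 37.3, -121.8, sample_items)
print(rows)
```

Venues without a category are kept with `None` in the last position, so they can be filtered or labeled later instead of aborting the whole request loop.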

In [None]:
# collect the venues near each city in df_data
city_venues = getNearbyVenues(names=df_data['City'],
                              latitudes=df_data['Lat'],
                              longitudes=df_data['Long']
                             )

