# Coursera Capstone

## Disc Golf and Brewery Location Analysis

Disc golf courses in the United States are popular places for small groups to enjoy the outdoors. After a round, players often want refreshments, and the number of small local breweries in the United States continues to grow. This analysis finds established disc golf venues that do not have a brewery within walking distance and proposes those sites as candidates for a new brewery. The goal is to give prospective brewery owners a list of potential locations on which to conduct further analysis.

## Data

We will use city population and location data attributed to the US Census Bureau (retrieved here from worldpopulationreview.com) to select cities to analyze, and we will match that data with Foursquare’s venue data.

### Obtain Population and Location data from the US Census Bureau

In [3]:
import requests
import pandas

In [4]:
census_source = requests.get('https://worldpopulationreview.com/us-cities/#cities').text

In [5]:
census_rawdata = pandas.read_html(census_source)

In [6]:
census_moddata = census_rawdata[0]

In [7]:
censusdf = pandas.DataFrame.from_records(census_moddata)

In [8]:
censusdf.head()

Unnamed: 0,Rank,Name,State,2020 Population,2010 Census,Change,2020 Density,Latitude/Longitude,Area (kmÂ²)
0,1,New York,New York,8622357,8175133.0,0.25%,"11,084/kmÂ²",40.66/-73.94,778
1,2,Los Angeles,California,4085014,3792621.0,0.67%,"3,365/kmÂ²",34.02/-118.41,1214
2,3,Chicago,Illinois,2670406,2695598.0,-0.32%,"4,535/kmÂ²",41.84/-87.68,589
3,4,Houston,Texas,2378146,2099451.0,0.79%,"1,443/kmÂ²",29.79/-95.39,1649
4,5,Phoenix,Arizona,1743469,1445632.0,1.88%,"1,300/kmÂ²",33.57/-112.09,1341


### Clean the Census Bureau data

In [9]:
censusdf = censusdf.rename(columns={"Area (kmÂ²)":"Area(km)"})

In [10]:
censusdf = censusdf.replace(to_replace='/kmÂ²', value='/km2', regex=True)
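
The `Â²` in the column names and density values is the usual signature of UTF-8 text that was decoded as Latin-1. The following is a minimal sketch of that effect using only the standard library. Whether this is what actually happened when the page was downloaded is an assumption; the notebook output does not confirm it.

```python
# Illustration only: how "km²" can turn into "kmÂ²".
original = "km²"

# Encode as UTF-8, then (wrongly) decode the bytes as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)   # kmÂ²

# Reversing the two steps recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # km²
```

If that assumption holds, setting `encoding = 'utf-8'` on the `requests` response before reading `.text` might avoid the garbling at the source, in which case the strings used in the cleanup cells would need to match the repaired text instead.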

### Population Analysis

In [11]:
censusdf['2020 Population'].describe()

count    2.000000e+02
mean     4.162974e+05
std      7.315589e+05
min      1.350430e+05
25%      1.764738e+05
50%      2.255200e+05
75%      3.945832e+05
max      8.622357e+06
Name: 2020 Population, dtype: float64

#### *The middle 50% of cities have populations ranging from 176,473 to 394,583.*    
#### *We will therefore conduct our analysis on cities with populations between 175,000 and 400,000.*
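
As a small illustration of where the 25% and 75% figures come from, the sketch below computes quartiles with the standard library's `statistics.quantiles`. The population values are made up for the example, and the `inclusive` method is chosen on the assumption that it matches the linear interpolation pandas uses by default.

```python
import statistics

# Hypothetical city populations (not taken from the census table).
populations = [100_000, 200_000, 300_000, 400_000, 500_000]

# n=4 returns the three cut points: 25%, 50% and 75%.
q1, median, q3 = statistics.quantiles(populations, n=4, method="inclusive")

# Keep only the cities whose population lies in the middle 50%.
middle = [p for p in populations if q1 <= p <= q3]
print(q1, median, q3)  # 200000.0 300000.0 400000.0
print(middle)          # [200000, 300000, 400000]
```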

In [12]:
censusdf_trimmed = censusdf[(censusdf['2020 Population'] >= 175000) & (censusdf['2020 Population'] <= 400000)].copy()  # copy so later column assignments do not write to a slice of censusdf

In [13]:
censusdf_trimmed.head()

Unnamed: 0,Rank,Name,State,2020 Population,2010 Census,Change,2020 Density,Latitude/Longitude,Area(km)
49,50,New Orleans,Louisiana,398523,343829.0,0.44%,908/km2,30.05/-89.93,439
50,51,Wichita,Kansas,393270,382368.0,0.21%,945/km2,37.69/-97.35,416
51,52,Bakersfield,California,391996,347483.0,1.01%,"1,011/km2",35.35/-119.04,388
52,53,Cleveland,Ohio,377797,396815.0,-0.53%,"1,877/km2",41.48/-81.68,201
53,54,Aurora,Colorado,375326,325078.0,0.92%,944/km2,39.69/-104.69,398


#### Split the Latitude/Longitude column into two separate columns

In [14]:
censusdf_trimmed[['Latitude', 'Longitude']] = censusdf_trimmed['Latitude/Longitude'].str.split("/", expand=True)


In [15]:
censusdf_final = censusdf_trimmed[['Name','State','2020 Population','Latitude','Longitude','Area(km)','2020 Density']]

In [16]:
censusdf_final.head()

Unnamed: 0,Name,State,2020 Population,Latitude,Longitude,Area(km),2020 Density
49,New Orleans,Louisiana,398523,30.05,-89.93,439,908/km2
50,Wichita,Kansas,393270,37.69,-97.35,416,945/km2
51,Bakersfield,California,391996,35.35,-119.04,388,"1,011/km2"
52,Cleveland,Ohio,377797,41.48,-81.68,201,"1,877/km2"
53,Aurora,Colorado,375326,39.69,-104.69,398,944/km2
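
Note that `str.split` produces strings, so `Latitude` and `Longitude` are still text at this point. A minimal sketch of converting one `lat/lng` value to numbers follows; `parse_lat_lng` is a hypothetical helper, not part of the notebook.

```python
def parse_lat_lng(value):
    """Split a 'lat/lng' string such as '40.66/-73.94' into two floats."""
    lat_text, lng_text = value.split("/")
    return float(lat_text), float(lng_text)


lat, lng = parse_lat_lng("30.05/-89.93")  # the New Orleans value from the census table
print(lat, lng)  # 30.05 -89.93
```

In pandas, the same conversion could be applied to the two new columns with `.astype(float)` if numeric coordinates are needed later.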


In [17]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
CATEGORY_ID = '52e81612bcbc57f1066b79e8' # Disc Golf Category

In [18]:
def getVenues(names, latitudes, longitudes):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url ='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            CATEGORY_ID)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat, 
            lng, 
            v['venue']['name'],
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    venuesdf = pandas.DataFrame([item for venue_list in venues_list for item in venue_list])
    venuesdf.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue',
                  'VenueID',
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Venue Category']
    return(venuesdf)
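
The list comprehension in `getVenues` assumes every response contains `response.groups[0].items` and that each venue has at least one category. The sketch below shows one way to extract the same fields more defensively. The helper name `extract_venue_rows` and the sample payload are hypothetical; the payload only mimics the keys that `getVenues` reads and is not real API output.

```python
def extract_venue_rows(city, city_lat, city_lng, payload):
    """Return one tuple per venue, skipping entries that lack expected keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if "lat" not in location or "lng" not in location or not categories:
            continue  # incomplete entry; skip rather than raise
        rows.append((city, city_lat, city_lng,
                     venue.get("name"), venue.get("id"),
                     location["lat"], location["lng"],
                     categories[0].get("name")))
    return rows


# Made-up payload with the same shape that getVenues indexes into.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Disc Golf Course", "id": "abc123",
               "location": {"lat": 37.70, "lng": -97.35},
               "categories": [{"name": "Disc Golf"}]}},
    {"venue": {"name": "No Category Venue", "id": "def456",
               "location": {"lat": 37.71, "lng": -97.36},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Wichita", "37.69", "-97.35", sample)
print(rows)  # only the first venue; the second has no category
print(extract_venue_rows("Nowhere", "0", "0", {"response": {}}))  # []
```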

### Obtain Foursquare data using the Cities we are interested in and focusing on the Disc Golf category

In [19]:
city_venues = getVenues(names=censusdf_final['Name'],
                                   latitudes=censusdf_final['Latitude'],
                                   longitudes=censusdf_final['Longitude']
                                  )

New Orleans
Wichita
Bakersfield
Cleveland
Aurora
Anaheim
Honolulu
Riverside
Santa Ana
Lexington
Corpus Christi
Henderson
Stockton
St. Paul
Irvine
Orlando
Pittsburgh
Cincinnati
Anchorage
St. Louis
Plano
Lincoln
Greensboro
Durham
Newark
St. Petersburg
Chula Vista
Toledo
Scottsdale
Fort Wayne
Madison
Lubbock
Laredo
Jersey City
Reno
Chandler
North Las Vegas
Glendale
Gilbert
Buffalo
Winston Salem
Chesapeake
Irving
Norfolk
Fremont
Richmond
Boise
Hialeah
Garland
Spokane
Tacoma
Baton Rouge
Fontana
Modesto
San Bernardino
Des Moines
Moreno Valley
Oxnard
Mckinney
Birmingham
Fayetteville
Port St. Lucie
Rochester
Amarillo
Grand Prairie
Salt Lake City
Yonkers
Frisco
Grand Rapids
Huntsville
Tempe
Overland Park
Little Rock
Cape Coral
Huntington Beach
Augusta
Montgomery
Tallahassee
Akron
Mobile
Knoxville
Brownsville
Shreveport
Sioux Falls
Worcester
Santa Clarita
Vancouver
Rancho Cucamonga
Elk Grove
Fort Lauderdale
Ontario
Chattanooga
Newport News
Salem
Providence
Corona
Peoria
Eugene
Santa Rosa
Oceansi

In [20]:
city_venues.head()

Unnamed: 0,City,City Latitude,City Longitude,Venue,VenueID,Venue Latitude,Venue Longitude,Venue Category
0,New Orleans,30.05,-89.93,City Park Disc Golf Course,4d6033c7338bb60cdf8f25bd,29.991785,-90.093531,Disc Golf
1,New Orleans,30.05,-89.93,Lafreiere Park Disc Golf Course,4e542bbb1495ac3f02313e42,30.000182,-90.208831,Disc Golf
2,Wichita,37.69,-97.35,Disc Golf Mania,4bb9eb01b35776b0b247ca01,37.704375,-97.347938,Disc Golf
3,Wichita,37.69,-97.35,Oak Park Disc Golf Course,4e651cebb0fb188e8ed6d8c3,37.711936,-97.359596,Disc Golf
4,Wichita,37.69,-97.35,Herman Hill Park,4cee5623d29b2d435383edbb,37.64878,-97.33891,Park


In [21]:
city_venues['Venue Category'].value_counts()

Disc Golf              422
Park                    41
Golf Course              4
Sporting Goods Shop      3
Trail                    1
Campground               1
Athletics & Sports       1
Name: Venue Category, dtype: int64

#### Combine our Foursquare and Census data

In [22]:
combined_df = pandas.merge(city_venues, censusdf_final[['Name','State','2020 Population','Area(km)', '2020 Density']], left_on='City', right_on='Name', how='left')

In [23]:
combined_df.drop('Name', axis=1, inplace=True)

In [24]:
combined_df.head()

Unnamed: 0,City,City Latitude,City Longitude,Venue,VenueID,Venue Latitude,Venue Longitude,Venue Category,State,2020 Population,Area(km),2020 Density
0,New Orleans,30.05,-89.93,City Park Disc Golf Course,4d6033c7338bb60cdf8f25bd,29.991785,-90.093531,Disc Golf,Louisiana,398523,439,908/km2
1,New Orleans,30.05,-89.93,Lafreiere Park Disc Golf Course,4e542bbb1495ac3f02313e42,30.000182,-90.208831,Disc Golf,Louisiana,398523,439,908/km2
2,Wichita,37.69,-97.35,Disc Golf Mania,4bb9eb01b35776b0b247ca01,37.704375,-97.347938,Disc Golf,Kansas,393270,416,945/km2
3,Wichita,37.69,-97.35,Oak Park Disc Golf Course,4e651cebb0fb188e8ed6d8c3,37.711936,-97.359596,Disc Golf,Kansas,393270,416,945/km2
4,Wichita,37.69,-97.35,Herman Hill Park,4cee5623d29b2d435383edbb,37.64878,-97.33891,Park,Kansas,393270,416,945/km2


#### Clean the data by removing categories not specific to Disc Golf

In [25]:
dgolfdf = combined_df[combined_df['Venue Category'].str.contains('Disc Golf')]

In [26]:
dgolfdf

Unnamed: 0,City,City Latitude,City Longitude,Venue,VenueID,Venue Latitude,Venue Longitude,Venue Category,State,2020 Population,Area(km),2020 Density
0,New Orleans,30.05,-89.93,City Park Disc Golf Course,4d6033c7338bb60cdf8f25bd,29.991785,-90.093531,Disc Golf,Louisiana,398523,439,908/km2
1,New Orleans,30.05,-89.93,Lafreiere Park Disc Golf Course,4e542bbb1495ac3f02313e42,30.000182,-90.208831,Disc Golf,Louisiana,398523,439,908/km2
2,Wichita,37.69,-97.35,Disc Golf Mania,4bb9eb01b35776b0b247ca01,37.704375,-97.347938,Disc Golf,Kansas,393270,416,945/km2
3,Wichita,37.69,-97.35,Oak Park Disc Golf Course,4e651cebb0fb188e8ed6d8c3,37.711936,-97.359596,Disc Golf,Kansas,393270,416,945/km2
5,Wichita,37.69,-97.35,Stone Creek Disc Golf Course,4be347a7f07b0f47689af743,37.577079,-97.262254,Disc Golf,Kansas,393270,416,945/km2
...,...,...,...,...,...,...,...,...,...,...,...,...
468,Garden Grove,33.78,-117.96,La Mirada Disc Golf Course,4c3ce304a97bbe9aca21fbdd,33.905737,-118.006026,Disc Golf,California,175618,47,"3,776/km2"
469,Garden Grove,33.78,-117.96,El Dorado Disc Golf Course,4b48d7f0f964a5209c5926e3,33.808632,-118.098754,Disc Golf,California,175618,47,"3,776/km2"
470,Garden Grove,33.78,-117.96,Liberty Park Disc Golf Course,527fcf5711d242393c093f4b,33.857988,-118.089426,Disc Golf,California,175618,47,"3,776/km2"
471,Garden Grove,33.78,-117.96,Cerritos Disc Golf Course,4df3e3a5d1649c8a28e4c758,33.857028,-118.100967,Disc Golf,California,175618,47,"3,776/km2"


#### Foursquare has returned duplicate venues, as evidenced by repeated VenueID values. We now remove those duplicates

In [27]:
dgolfdf = dgolfdf.drop_duplicates(subset='VenueID', keep='first')  # reassign instead of modifying a slice in place


In [28]:
dgolfdf

Unnamed: 0,City,City Latitude,City Longitude,Venue,VenueID,Venue Latitude,Venue Longitude,Venue Category,State,2020 Population,Area(km),2020 Density
0,New Orleans,30.05,-89.93,City Park Disc Golf Course,4d6033c7338bb60cdf8f25bd,29.991785,-90.093531,Disc Golf,Louisiana,398523,439,908/km2
1,New Orleans,30.05,-89.93,Lafreiere Park Disc Golf Course,4e542bbb1495ac3f02313e42,30.000182,-90.208831,Disc Golf,Louisiana,398523,439,908/km2
2,Wichita,37.69,-97.35,Disc Golf Mania,4bb9eb01b35776b0b247ca01,37.704375,-97.347938,Disc Golf,Kansas,393270,416,945/km2
3,Wichita,37.69,-97.35,Oak Park Disc Golf Course,4e651cebb0fb188e8ed6d8c3,37.711936,-97.359596,Disc Golf,Kansas,393270,416,945/km2
5,Wichita,37.69,-97.35,Stone Creek Disc Golf Course,4be347a7f07b0f47689af743,37.577079,-97.262254,Disc Golf,Kansas,393270,416,945/km2
...,...,...,...,...,...,...,...,...,...,...,...,...
457,Oceanside,33.22,-117.31,Google Terrace Park Connecticut disc golf hole...,5181cb77498eb457e312c9a9,33.203842,-117.230995,Disc Golf,California,176800,107,"1,655/km2"
458,Oceanside,33.22,-117.31,Brengle Terrace Disc Golf Course,4ff84fe3e4b0f1c618fbbbe8,33.209847,-117.221780,Disc Golf,California,176800,107,"1,655/km2"
459,Oceanside,33.22,-117.31,Montiel Park Disc Golf Course,4bdb5e653904a5934c0e4a9e,33.131542,-117.114023,Disc Golf,California,176800,107,"1,655/km2"
460,Oceanside,33.22,-117.31,CSUSM Disc Golf,4c3e44d080bc20a164ccaa58,33.131623,-117.159290,Disc Golf,California,176800,107,"1,655/km2"
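
The de-duplication keeps the first row seen for each `VenueID`. The same rule can be sketched in plain Python; the records below are invented for illustration.

```python
def drop_duplicate_ids(records, key="VenueID"):
    """Keep the first record for each key value, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] in seen:
            continue
        seen.add(record[key])
        unique.append(record)
    return unique


# Invented example: the same venue returned by searches for two nearby cities.
records = [
    {"City": "City A", "Venue": "Example Course", "VenueID": "id-1"},
    {"City": "City A", "Venue": "Other Course", "VenueID": "id-2"},
    {"City": "City B", "Venue": "Example Course", "VenueID": "id-1"},
]
unique = drop_duplicate_ids(records)
print([r["City"] for r in unique])  # ['City A', 'City A']
```

One consequence of keeping the first occurrence is that a venue returned for two cities stays attributed to whichever city was queried first.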


#### The definition of walking distance is from Wikipedia:  
https://en.wikipedia.org/wiki/Walking_distance_measure  
#### It assumes a walking speed of 80 meters per minute and a commonly accepted 10-15 minute walk. We will therefore limit searches to within 1000 meters of each disc golf venue; 1000 meters would be a 12.5 minute walk.
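
The 12.5 minute figure follows directly from the 80 meters per minute assumption. The sketch below reproduces that arithmetic and adds a great-circle (haversine) distance check that could be used to verify a 1000 meter radius locally. The function names and the spherical Earth radius of 6,371,000 meters are choices made for this example, not part of the notebook.

```python
import math

WALKING_SPEED_M_PER_MIN = 80   # from the walking distance definition
EARTH_RADIUS_M = 6_371_000     # mean Earth radius; spherical approximation


def walking_minutes(distance_m):
    """Minutes needed to walk distance_m at 80 meters per minute."""
    return distance_m / WALKING_SPEED_M_PER_MIN


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_walking_distance(lat1, lng1, lat2, lng2, radius_m=1000):
    """True when the two points are no more than radius_m apart."""
    return haversine_m(lat1, lng1, lat2, lng2) <= radius_m


print(walking_minutes(1000))  # 12.5
```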


In [29]:
# Brewery category ID
CATEGORY_ID = '50327c8591d4c4b30a586d5d'

In [30]:
def getNearbyBreweries(names, latitudes, longitudes, radius=1000):
    
    potential_locations_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url ='https://api.foursquare.com/v2/venues/search/?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&intent=checkin'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            radius,
            CATEGORY_ID)

        # make the GET request
        results = requests.get(url).json()['response']['venues']
        # record disc golf venues that don't have a brewery within walking distance
        if len(results) == 0:
            potential_locations_list.append((name, lat, lng))

    # build the dataframe once, after the loop, so it exists even if no venue qualifies
    potential_locations_df = pandas.DataFrame(potential_locations_list,
                                              columns=['Venue', 'Venue Latitude', 'Venue Longitude'])
    return potential_locations_df
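
Because `getNearbyBreweries` calls the API directly, its filtering logic is hard to check without network access. One possible restructuring, shown here only as a sketch, passes the lookup in as a function so the "no brewery nearby" rule can be exercised with fake data. `venues_without_brewery` and `fake_lookup` are hypothetical names introduced for this example.

```python
def venues_without_brewery(venues, find_breweries, radius=1000):
    """Return the (name, lat, lng) tuples for which the lookup finds nothing.

    find_breweries(lat, lng, radius) must return a list of nearby breweries.
    """
    return [(name, lat, lng)
            for name, lat, lng in venues
            if len(find_breweries(lat, lng, radius)) == 0]


def fake_lookup(lat, lng, radius):
    """Stand-in for the API call: pretends one location has a brewery."""
    return ["Example Brewery"] if (lat, lng) == (1.0, 1.0) else []


venues = [("Course A", 1.0, 1.0), ("Course B", 2.0, 2.0)]
print(venues_without_brewery(venues, fake_lookup))  # [('Course B', 2.0, 2.0)]
```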

#### We now filter our list of Disc Golf venues by querying Foursquare to find those that do not have an established Brewery within walking distance, defined as within 1000 meters

In [31]:
potential_venues = getNearbyBreweries(names=dgolfdf['Venue'],
                                      latitudes=dgolfdf['Venue Latitude'],
                                      longitudes=dgolfdf['Venue Longitude']
                                  )

City Park Disc Golf Course
Lafreiere Park Disc Golf Course
Disc Golf Mania
Oak Park Disc Golf Course
Stone Creek Disc Golf Course
Hart Park Disc Golf (The Flats)
Shark Tooth Mountain Disc Golf Course
Parma Disc Golf Course
Lakeview Church Disc Golf Course
Tri-C Disc Golf Course
Parma Disc Golf
Expo Disc Golf
Greenwood Village Disc Golf Course
'The Dock' Disc Golf Course - Dry Dock North
Green Valley Ranch Disc Golf Course
Westcreek Disc Golf Course
Lighthouse Disc Golf Course
Lakewood Dry Gulch Disc Golf Course
CHU Disc Golf
Arapahoe Community College Disc Golf Course
Badlands Disc Golf Course
Adams Hollow Disc Golf Course
Springvale Park Disc Golf Course
Schaefer Disc Golf
Disc Golf At Matney Park
Twila Reid Park (Disc Golf Course)
La Mirada Disc Golf Course
El Dorado Disc Golf Course
Liberty Park Disc Golf Course
Cerritos Disc Golf Course
Huntington Beach Disc Golf Park
University of Hawaii - Manoa Disc Golf Course
Waahila Ridge "Hila Monster" Disc Golf Course
Roy's Backyard Disc Gol

In [32]:
potential_venues

Unnamed: 0,Venue,Venue Latitude,Venue Longitude
0,City Park Disc Golf Course,29.991785,-90.093531
1,Lafreiere Park Disc Golf Course,30.000182,-90.208831
2,Oak Park Disc Golf Course,37.711936,-97.359596
3,Stone Creek Disc Golf Course,37.577079,-97.262254
4,Hart Park Disc Golf (The Flats),35.446677,-118.912993
...,...,...,...
254,crane creek disc golf course,38.341822,-122.698517
255,Kinetic Disc Golf - Brengle Terrace Park,33.209775,-117.222538
256,Brengle Terrace Disc Golf Course,33.209847,-117.221780
257,Montiel Park Disc Golf Course,33.131542,-117.114023


#### Merge the Disc Golf venues with no nearby Brewery with our Census data

In [33]:
finaldf = pandas.merge(potential_venues, dgolfdf)

In [34]:
finaldf.sort_values(['State', '2020 Population'])

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,City,City Latitude,City Longitude,VenueID,Venue Category,State,2020 Population,Area(km),2020 Density
214,MUNI disc Golf Course,30.707131,-88.161564,Mobile,30.67,-88.10,4c2bab18ae6820a1a52e1843,Disc Golf,Alabama,190948,361,529/km2
215,Down South Disc Golf,30.599817,-87.902813,Mobile,30.67,-88.10,535a3f3b498e175cb0c0c8a4,Disc Golf,Alabama,190948,361,529/km2
216,Steele Creek Disc Golf Course,30.857668,-88.039450,Mobile,30.67,-88.10,510d9e63e4b00423da003658,Disc Golf,Alabama,190948,361,529/km2
217,Fairhope Disc Golf Course,30.541020,-87.887968,Mobile,30.67,-88.10,4dd2f0eae4cd1b19130d71d7,Disc Golf,Alabama,190948,361,529/km2
210,AUM Disc Golf,32.365859,-86.173768,Montgomery,32.35,-86.27,4b92a988f964a520f90e34e3,Disc Golf,Alabama,196442,414,475/km2
...,...,...,...,...,...,...,...,...,...,...,...,...
94,Token Creek Disc Golf,43.176738,-89.315491,Madison,43.09,-89.43,4dd93ed82271c5d36d671fb5,Disc Golf,Wisconsin,268303,199,"1,345/km2"
95,Glide Disc Golf,43.099062,-89.311073,Madison,43.09,-89.43,4c5595cd06901b8dc1d4bb4d,Disc Golf,Wisconsin,268303,199,"1,345/km2"
96,Capital Springs Recreational Area Disc Golf Co...,43.022267,-89.350753,Madison,43.09,-89.43,51d57123498e621174d5ff41,Disc Golf,Wisconsin,268303,199,"1,345/km2"
97,Vallarta-Ast Disc Golf Course,43.176758,-89.315520,Madison,43.09,-89.43,4c14d3bf7f7f2d7fdf6ae168,Disc Golf,Wisconsin,268303,199,"1,345/km2"


#### The final dataframe gives us 259 potential locations to build a Brewery near