Your tasks are as follows:
- Connect to the Foursquare API
- Connect to the Yelp API. This API offers services similar to Foursquare's.
- For each of the bike stations in Part 1, query both APIs to retrieve information for the following in that location:
    - Restaurants or bars
    - Various POIs (points of interest) of your choice
 - Create a DataFrame for the Yelp results and Foursquare results.
 - Compare the quality of the Yelp and Foursquare API. For your location, which API gives you the most complete information/better coverage? NOTE: Your definition of 'coverage' is up to you. It could be simple 'number of POIs in the area', but it could also be something more specific like 'number of reviews per POI', or 'number of different attributes of each POI'.


In [52]:
# imports
import pandas as pd

import os
import requests
import ast
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

In [53]:
#From the data folder, get a dataframe of all the stations we want yelp/foursquare/usgs data for
stations_df=pd.read_csv('data/initial_stations_df.csv')

In [54]:
#load the empty station table from csv and convert it into a dictionary
stations_empty_df=pd.read_csv('data/stations_empty_df.csv')
stations_to_venues_dict_empty=stations_empty_df.to_dict()

# Connect to the Foursquare API

Send a request to Foursquare with a small radius (100m) for all the bike stations in your city of choice.
- I chose 100m rather than 1000m so that the venues reflect the "hilliness" in the immediate vicinity of each station. With a larger radius, the spread of venue distances from the station may be too wide to give useful information on local hilliness.
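
To make the 100m radius concrete, here is a minimal sketch of a great-circle (haversine) distance check between a station and a venue. The function name `haversine_m` and the coordinates are made up for illustration, and the Earth is treated as a sphere, so this is only an approximation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres (spherical approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical station and venue, 0.0005 degrees of latitude apart
station = (37.7700, -122.4200)
venue = (37.7705, -122.4200)

distance = haversine_m(*station, *venue)
print(round(distance, 1), distance <= 100)
```

A check like this could be used to confirm that the venues an API returns really fall inside the requested radius.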

In [55]:
def get_nearby_venues_fs(names, latitudes, longitudes,empty_station_dict):
    '''Takes the name, latitude and longitude columns (Series) of a stations dataframe, plus an empty dictionary keyed by station id, and returns a dictionary mapping each station to a list of (fsq_id, latitude, longitude) tuples for the venues within the search radius.'''
    # Name of the environment variable that holds the Foursquare API key
    fs_key_name = 'fourquare_auth'
    # Read the key from the environment (None if the variable is not set)
    fs_key_value = os.environ.get(fs_key_name)
    # Work on a copy so the caller's empty dictionary is left unchanged
    filled_station_dict=empty_station_dict.copy()
    # Iterate through the provided names, latitudes, and longitudes
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Print the current station name
        print(name)
        
        # Part 1: Creating the API request URL
        url = "https://api.foursquare.com/v3/places/search"
        latlong = str(lat) + ',' + str(lng)
        params = {
            "ll": latlong,
            "radius": 100  # Search radius in meters
        }

        headers = {
            "Accept": "application/json",
            # Use the key read from the environment instead of hard-coding a secret in the notebook
            "Authorization": fs_key_value
        }
        
        # Part 2: Making the GET request
        response = requests.request("GET", url, params=params, headers=headers)
        response_dict = response.json()
        list_of_venues=[]
        list_of_lats=[]
        list_of_longs=[]
        # Part 3: Processing nearby venues and appending relevant information to the list
        for result in response_dict['results']:
            if result['fsq_id'] is not None:
                list_of_venues.append(result['fsq_id'])
                list_of_lats.append(result['geocodes']['main']['latitude'])
                list_of_longs.append(result['geocodes']['main']['longitude'])
        venue_lat_long_l=list(zip(list_of_venues, list_of_lats, list_of_longs)) 
        # Store the list of venues in filled_station_dict using the station name as key
        filled_station_dict[name] = venue_lat_long_l
    # Return the dictionary containing nearby venues for each station
    return filled_station_dict

In [56]:
#run get_nearby_venues_fs on the first 10 stations, returning a dictionary with stations as keys and lists of (id, lat, long) tuples as items (a station can have several venues or none)
station_venues_FS = get_nearby_venues_fs(names = stations_df['id'].head(10),
                                   latitudes = stations_df['latitude'].head(10),
                                   longitudes = stations_df['longitude'].head(10),
                                    
                                   empty_station_dict=stations_to_venues_dict_empty)

d0e8f4f1834b7b33a3faf8882f567ab8
983514094dd808b1604da2dcfc2d09af
da17603652106fda93da4e255a5b0a22
7a21c92b3b4cd2f7759107b4fdebf869
ce34d38fb230a23c1ced12d1e16df294
a3b487ad4ac93ab3e9f9654f87ed8c1e
b4b0088fb4fbb4587cad9d89ddc092cd
d576652cc151c23d6ec52b8454429d47
0a24b6ab9ca6684780b6682901b3c680
a4b234ab072402cbfcbd5e306588d9a9


Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc)

In [57]:
#at this point I only grab each venue's id, latitude and longitude; in the future I would likely want other info, such as ratings or distances
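
As a sketch of what pulling extra details could look like, the snippet below flattens one Foursquare-style result into a single record. The sample dictionary is made up; `fsq_id` and `geocodes` are the keys used in the function above, while `name`, `distance` and `categories` are assumed to be present in the response, so every lookup tolerates a missing key. The helper name `flatten_fs_result` is hypothetical.

```python
def flatten_fs_result(result):
    """Pull a few fields out of one Foursquare-style result dict, tolerating missing keys."""
    main = result.get("geocodes", {}).get("main", {})
    categories = result.get("categories") or []
    return {
        "fsq_id": result.get("fsq_id"),
        "name": result.get("name"),              # assumed field
        "distance_m": result.get("distance"),    # assumed field
        "latitude": main.get("latitude"),
        "longitude": main.get("longitude"),
        "category": categories[0].get("name") if categories else None,  # assumed field
    }

# Made-up sample shaped like one entry of response_dict['results']
sample = {
    "fsq_id": "example-id",
    "name": "Example Cafe",
    "distance": 42,
    "geocodes": {"main": {"latitude": 37.7705, "longitude": -122.42}},
    "categories": [{"name": "Coffee Shop"}],
}

record = flatten_fs_result(sample)
print(record)
```

A list of such records can be passed straight to `pd.DataFrame(...)` to get one row per venue.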

Put your parsed results into a DataFrame

In [58]:
#orients the dictionary so the rows are the keys (stations)
df_fs = pd.DataFrame.from_dict(station_venues_FS, orient='index') 
#inserts index into a new column 'station_id'
df_fs.insert(0, "station_id", df_fs.index) 
#names the venue columns venue_1 through venue_n
new_columns = ["station_id"] + ["venue_" + str(i+1) for i in range(df_fs.shape[1]-1)] 
df_fs.columns=new_columns
#resets the index to row numbers
df_fs.reset_index(drop=True, inplace=True)
df_fs
#this was already run for all stations, so rather than saving this dataframe I load the previously saved result
df_fs=pd.read_csv('data/sanfran_venues_latlong_fs.csv')
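
The reshaping step above can be seen on a small made-up dictionary. This is only an illustration with toy station names and venue tuples: stations with fewer venues are padded with missing values, which is why counting non-null cells later gives the number of venues per station.

```python
import pandas as pd

# Toy stand-in for the station -> venues dictionary (all values are made up)
toy = {
    "station_a": [("v1", 37.77, -122.42), ("v2", 37.78, -122.41)],
    "station_b": [("v3", 37.76, -122.43)],
    "station_c": [],
}

# One row per station, one column per venue slot
wide = pd.DataFrame.from_dict(toy, orient="index")
wide.insert(0, "station_id", wide.index)
wide.columns = ["station_id"] + ["venue_" + str(i + 1) for i in range(wide.shape[1] - 1)]
wide.reset_index(drop=True, inplace=True)

# Venues per station, ignoring the station_id column
venue_counts = wide.drop(columns="station_id").count(axis=1)
print(wide)
print(venue_counts.tolist())
```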

# Yelp

Send a request to Yelp with a small radius (100m) for all the bike stations in San Fran

In [59]:
def get_nearby_venues_yelp(names, latitudes, longitudes,empty_station_dict):
    '''Takes the name, latitude and longitude columns (Series) of a stations dataframe, plus an empty dictionary keyed by station id, and returns a dictionary mapping each station to a list of (id, latitude, longitude) tuples for the venues within the search radius.'''
    filled_station_dict=empty_station_dict.copy()
    #get environmental variable for yelp key
    yelp_key_name = 'yelp_api_auth'
    # Use the os.environ dictionary to access the value of the environmental variable
    yelp_key_value = os.environ.get(yelp_key_name)    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
    
        url = "https://api.yelp.com/v3/businesses/search"
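        headers = {
            "accept": "application/json",
            # Yelp expects "Bearer <API key>"; this assumes the environment variable already includes the "Bearer " prefix
            "Authorization": yelp_key_value
        }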
        params = {
        "latitude":lat,
        "longitude":lng,
        "radius":100
        } 
        response = requests.request("GET", url, params=params, headers=headers)
        response_dict=response.json()

    
        # Part 3: keep only the relevant information for each nearby venue and append it to the lists
        list_of_venues=[]
        list_of_lats=[]
        list_of_longs=[]
        for result in response_dict['businesses']:
            if result['id'] is not None:
                list_of_venues.append(result['id'])
                list_of_lats.append(result['coordinates']['latitude'])
                list_of_longs.append(result['coordinates']['longitude'])
        venue_lat_long_l=list(zip(list_of_venues, list_of_lats, list_of_longs)) 
        filled_station_dict[name] = venue_lat_long_l
    return filled_station_dict

Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc)

In [60]:
#at this point I only grab each venue's id, latitude and longitude; in the future I would likely want other info, such as ratings or distances

In [61]:
#returns a dictionary station_Library_Yelp with stations as keys and lists of (id, lat, long) tuples as items (a station can have several venues or none)
station_Library_Yelp = get_nearby_venues_yelp(names = stations_df['id'],
                              latitudes = stations_df['latitude'],
                            longitudes = stations_df['longitude'], 
                            empty_station_dict=stations_to_venues_dict_empty)

Put your parsed results into a DataFrame

In [None]:
#orients the dictionary so the rows are the keys (stations)
df_yelp = pd.DataFrame.from_dict(station_Library_Yelp, orient='index') 
#inserts index into a new column 'station_id'
df_yelp.insert(0, "station_id", df_yelp.index) 
#names the venue columns store_1 through store_n
new_columns = ["station_id"] + ["store_" + str(i+1) for i in range(df_yelp.shape[1]-1)] 
df_yelp.columns=new_columns
#resets the index to row numbers
df_yelp.reset_index(drop=True, inplace=True)
df_yelp

In [None]:
df_yelp=pd.read_csv('data/df_yelp_head_tail_250.csv')

df_yelp.head(5)

# Grab Elevations from USGS 
### https://epqs.nationalmap.gov/v1/json
Since neither Yelp nor Foursquare provides elevation data, I used another source: the USGS Elevation Point Query Service.

I need to extract only the unique venue ids with their lat/longs and request elevations for them from the USGS API.

In [None]:
# Function to extract the unique "(id, lat, long)" entries
def extract_unique_ids_lat_longs(dataframe):
    '''takes a dataframe whose values are either strings exactly like "('pn-KO9C7bLiDqfkZiTFqdA', 37.77064, -122.4771)" (as read back from csv) or NaN, and returns a set of the unique strings'''
    unique_ids_lat_longs=set()
    for index, row in dataframe.iterrows():
        for column in dataframe.columns:
            # NaN cells are floats, so keeping only strings skips the empty venue slots
            if isinstance(row[column], str):
                unique_ids_lat_longs.add(row[column])
    return unique_ids_lat_longs

# Apply the function to the venue columns only (everything after station_id)
venues_only_df=df_yelp[df_yelp.columns[1:]]
venues_only_df
yelp_unique_ids_lat_longs=extract_unique_ids_lat_longs(venues_only_df)
yelp_unique_ids_lat_longs
yelp_unique_ids_lat_longs_list=list(yelp_unique_ids_lat_longs)


df_unique_yelp_ids_latlong = pd.DataFrame({'ID_LAT_LONG': yelp_unique_ids_lat_longs_list})
df_unique_yelp_ids_latlong.head(5)
#df_unique_yelp_ids_latlong.to_csv('data/df_unique_yelp_ids_latlong.csv')

In [None]:
df_unique_yelp_ids_latlong=pd.read_csv('data/df_unique_yelp_ids_latlong.csv')
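
Because the entries are stored in the csv as text, each `ID_LAT_LONG` value comes back as a string rather than a tuple. A minimal sketch of turning such strings into separate columns with `ast.literal_eval` is shown below; the ids and coordinates are made up, and the column names `venue_id`, `latitude` and `longitude` are only illustrative.

```python
import ast
import pandas as pd

# Made-up rows shaped like the ID_LAT_LONG column after a csv round trip
raw = pd.DataFrame({
    "ID_LAT_LONG": [
        "('abc123', 37.77064, -122.4771)",
        "('def456', 37.80412, -122.2711)",
    ]
})

# Safely evaluate each string literal into an (id, lat, long) tuple
parsed = raw["ID_LAT_LONG"].apply(ast.literal_eval)

# Spread the tuples into one column per field
venues = pd.DataFrame(parsed.tolist(), columns=["venue_id", "latitude", "longitude"])
print(venues)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.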

In [None]:
def get_elev(names, latitudes, longitudes):
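    '''takes venue ids (strings) and their latitudes and longitudes (floats), queries the USGS elevation service for each point, and returns a list of dicts with Station_ID, Longitude, Latitude and Elevation'''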
    # Create an empty list to hold the entries
    lat_long_list = []
    response = requests.get("https://epqs.nationalmap.gov/v1/json?x=-122&y=30&wkid=4326&units=Meters&includeDate=false")
    
    if response.status_code == 200:
        for name, lat, lng in zip(names, latitudes, longitudes):
            print(name)
            url = "https://epqs.nationalmap.gov/v1/json"
            params = {
                "x": lng,
                "y": lat,
                "wkid": 4326,
                "units": "Meters",
                "includeDate": True
            }

            response = requests.get(url, params=params)
            response_dict = response.json()


            entry = {
                'Station_ID': name,
                'Longitude': response_dict['location']['x'],
                'Latitude': response_dict['location']['y'],
                'Elevation': response_dict['value']
            }
            lat_long_list.append(entry)
            print(response_dict['location']['y'], response_dict['location']['x'], response_dict['value'])
            #saving each request in case there are errors during the request, so we can start over from where the error occured
            #pd.DataFrame(lat_long_list).to_csv('to416df_unique_yelp_ids_latlong_elev.csv', index=False)
    else:
        print("Request failed with status code:", response.status_code)
    
    return lat_long_list

In [None]:
#apply get_elev to the first 416 rows (the next row was causing errors); repeat on later slices until all venues have elevations
#ID_LAT_LONG is a string after the csv round trip, so parse each entry into an (id, lat, long) tuple before indexing it
id_lat_long = df_unique_yelp_ids_latlong['ID_LAT_LONG'][:416].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
df_unique_yelp_ids_latlong_elev = get_elev(names = id_lat_long.apply(lambda x: x[0]),
                             latitudes = id_lat_long.apply(lambda x: x[1]),
                            longitudes = id_lat_long.apply(lambda x: x[2]))
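
Since a single bad row can interrupt a long run of requests, one option is to record failures and carry on. The sketch below is not the notebook's method: `collect_elevations` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real USGS request so the example runs offline.

```python
def collect_elevations(points, fetch):
    """Call fetch(lat, lng) for each (name, lat, lng); return (results, failed)."""
    results, failed = [], []
    for name, lat, lng in points:
        try:
            elevation = fetch(lat, lng)
        except Exception as err:  # broad on purpose: any failure is recorded, not fatal
            failed.append((name, str(err)))
            continue
        results.append({"Station_ID": name, "Latitude": lat,
                        "Longitude": lng, "Elevation": elevation})
    return results, failed

def fake_fetch(lat, lng):
    """Stand-in for the real elevation request; fails for an impossible latitude."""
    if lat > 90:
        raise ValueError("latitude out of range")
    return 10.0

# Made-up points; the second one is deliberately invalid
points = [("a", 37.77, -122.42), ("bad", 123.0, -122.42), ("c", 37.80, -122.27)]
results, failed = collect_elevations(points, fake_fetch)
print(len(results), failed)
```

The `failed` list can then be inspected or retried later instead of restarting from the row that raised the error.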

Example of combining the results of multiple requests:

In [None]:
union_df = pd.concat([to416_df_unique_yelp_ids_latlong_elev_df, to2525_df_unique_yelp_ids_latlong_elev_df], ignore_index=True)
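
For illustration, here is the same kind of combination on two small made-up batches. The frame names and values are hypothetical; the extra `drop_duplicates` step is an assumption about what may be wanted when batches overlap, not something the notebook does.

```python
import pandas as pd

# Two made-up batches of elevation results; venue "b" appears in both
batch_1 = pd.DataFrame({"Station_ID": ["a", "b"], "Elevation": [12.0, 30.5]})
batch_2 = pd.DataFrame({"Station_ID": ["b", "c"], "Elevation": [30.5, 4.2]})

# Stack the batches, then keep the first row for each venue id
combined = (
    pd.concat([batch_1, batch_2], ignore_index=True)
    .drop_duplicates(subset="Station_ID")
    .reset_index(drop=True)
)
print(combined)
```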

In [None]:
df_unique_yelp_ids_latlong_elev=pd.read_csv('data/df_unique_yelp_ids_latlong_elev.csv')
df_unique_yelp_ids_latlong_elev.head(5)

# Which API provided you with more complete data? Provide an explanation. 

Yelp has a greater number of businesses and other venues in its database that are accessible through its API. Some of these locations might be out of business.

# Comparing Results

In [None]:
df_yelp = pd.read_csv('data/df_yelp_head_tail_250.csv')

In [None]:
df_yelp.head(5)

In [None]:
df_fs=pd.read_csv('data/sanfran_venues_latlong_fs.csv')
df_fs.head(5)

In [None]:
#Count how many venues are found within 100m of each station for foursquare (station_id is dropped so it is not counted as a venue)
df_fs['NonNullCount'] = df_fs.drop(columns=['station_id', 'NonNullCount'], errors='ignore').count(axis=1)
#Count how many venues are found within 100m of each station for yelp
df_yelp['NonNullCount'] = df_yelp.drop(columns=['station_id', 'NonNullCount'], errors='ignore').count(axis=1)
df_yelp[['station_id', 'NonNullCount']].head(5)

In [None]:
total_fs_hits=df_fs['NonNullCount'].head(250).sum()
total_yelp_hits=df_yelp['NonNullCount'].head(250).sum()
print(f"total yelp hits: {total_yelp_hits}")
print(f"total foursquare hits: {total_fs_hits}")

Yelp found more than double the number of venues within 100m of a station in San Francisco. I will therefore ignore the Foursquare results for the remainder of the project and focus on the Yelp results.
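
Totals can hide per-station differences, so a side-by-side view may also be useful. The sketch below uses two tiny made-up tables in the same wide layout (a `station_id` column followed by venue columns); all values are invented and the column name `yelp_minus_fs` is only illustrative.

```python
import pandas as pd

# Made-up wide tables: one row per station, one column per venue slot
fs = pd.DataFrame({
    "station_id": ["s1", "s2", "s3"],
    "venue_1": ["x", "y", None],
    "venue_2": ["z", None, None],
})
yelp = pd.DataFrame({
    "station_id": ["s1", "s2", "s3"],
    "store_1": ["a", "b", "c"],
    "store_2": ["d", "e", None],
    "store_3": ["f", None, None],
})

# Non-null venue cells per station, with station_id moved to the index
fs_counts = fs.set_index("station_id").count(axis=1)
yelp_counts = yelp.set_index("station_id").count(axis=1)

summary = pd.DataFrame({"foursquare": fs_counts, "yelp": yelp_counts})
summary["yelp_minus_fs"] = summary["yelp"] - summary["foursquare"]
print(summary)
print(summary[["foursquare", "yelp"]].sum())
```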

In [None]:
stat_elevs=pd.read_csv('data/Station_Elevations.csv')

stat_elevs.rename(columns={'Station_ID':'station_id'}, inplace=True)
stat_elevs.head(5)

In [None]:
stat_elevs['elevation'].hist()

In [None]:
stat_elevs['elevation'].describe()

This histogram shows that station elevation is not normally distributed; most stations are concentrated at the low end of the range (left-weighted).

There is, however, a significant range of elevations.
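
One way to put a number on the shape of such a distribution is the sample skewness. The values below are made up to mimic many low elevations and a few high ones; with data like this the skewness is positive and the median sits below the mean.

```python
import pandas as pd

# Made-up elevations in metres: mostly low, with a long tail of high values
elev = pd.Series([2, 3, 3, 4, 5, 6, 8, 12, 40, 95], name="elevation")

skewness = elev.skew()  # positive when the tail extends toward high values
print(round(skewness, 2), elev.median() < elev.mean())
```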

In [None]:
df_yelp.merge(stat_elevs, on='station_id', how='left').head(5)

In [None]:
yelp_venue_elev=pd.read_csv('data/yelp_venue_elev.csv')
yelp_venue_elev.head(5)

## Get the top 10 highest venues in terms of elevation (modified question)

In [None]:
top_10_elevations = yelp_venue_elev.nlargest(10, 'Elevation')
top_10_elevations

The highest venue is "Roli Roti Gourmet Rotisserie" (business id 7riqoD4pIgG3mTcUPbe4iA), at Moraga Ave and La Salle Ave in the east of Oakland, at 185m.

In [None]:
stations_df.head(5)

In [None]:
stat_elevs.head(5)

In [None]:
def get_marker_color(value, min_value, max_value):
    norm = colors.Normalize(vmin=min_value, vmax=max_value)
    cmap = cm.viridis  # viridis colormap; cm.get_cmap was removed in recent matplotlib releases
    rgba_color = cmap(norm(value))
    hex_color = colors.rgb2hex(rgba_color)
    return hex_color
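
A quick standalone check of the colour mapping idea: values are normalised to the 0 to 1 range and then looked up in a colormap. The function name `elevation_to_hex` and the 0 to 185 range are only illustrative.

```python
import matplotlib.cm as cm
import matplotlib.colors as colors

def elevation_to_hex(value, min_value, max_value):
    """Map a value in [min_value, max_value] to a hex colour on the viridis colormap."""
    norm = colors.Normalize(vmin=min_value, vmax=max_value)
    rgba = cm.viridis(norm(value))
    return colors.rgb2hex(rgba)

# Colours for the lowest, middle and highest elevation in a made-up 0 to 185 m range
low = elevation_to_hex(0, 0, 185)
mid = elevation_to_hex(92.5, 0, 185)
high = elevation_to_hex(185, 0, 185)
print(low, mid, high)
```

On viridis, low values come out dark purple and high values yellow, so higher stations should stand out on the map.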


# Folium Map of Stations

In [None]:
latitude = 37.766137
longitude = -122.347527

map_san_fran = folium.Map(location=[latitude, longitude], zoom_start=12)
min_value = stat_elevs['elevation'].min()
max_value = stat_elevs['elevation'].max()
# add markers to map
for station, lat, lng, elev in zip(stat_elevs['station_id'], stat_elevs['latitude'], stat_elevs['longitude'],stat_elevs['elevation']):
    label = '{}'.format(station)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color=get_marker_color(elev, min_value, max_value),
        fill=True,
        fill_color=get_marker_color(elev, min_value, max_value),
        fill_opacity=1,
        parse_html=False).add_to(map_san_fran)  
    
map_san_fran