# Foursquare

### Part 2: Connecting to Foursquare and Yelp APIs

Tasks are as follows:
1. Connect to the [Foursquare](https://developer.foursquare.com/places) API.
2. Connect to the [Yelp](https://docs.developer.yelp.com/docs/fusion-intro) API. This API offers services similar to Foursquare's.
3. For each of the bike stations in Part 1, query both APIs to retrieve information for the following in that location:
 - Restaurants or bars
 - Various POIs (points of interest) of your choice
4. Create a DataFrame for the Yelp results and Foursquare results. 
5. Compare the quality of the Yelp and Foursquare APIs. For your location, which API gives you the most complete information or the better coverage? *NOTE:* Your definition of 'coverage' is up to you. It could be as simple as 'number of POIs in the area', but it could also be something more specific like 'number of reviews per POI' or 'number of different attributes of each POI'.
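
One way to make 'coverage' concrete is to compute a few numbers per API from a table with one row per POI. The sketch below is only an illustration: the function name `coverage_summary` and the toy columns are assumptions, not part of the assignment or of either API.

```python
import pandas as pd

def coverage_summary(pois: pd.DataFrame, station_col: str, attribute_cols: list[str]) -> dict:
    """Summarise coverage for a table with one row per (station, POI) pair."""
    pois_per_station = pois.groupby(station_col).size()
    return {
        "total_pois": len(pois),
        "mean_pois_per_station": float(pois_per_station.mean()),
        # share of non-missing values per attribute, in percent
        "attribute_completeness_pct": (
            pois[attribute_cols].notna().mean() * 100
        ).round(1).to_dict(),
    }

# toy data, made up for illustration
toy = pd.DataFrame({
    "station": ["A", "A", "B"],
    "name": ["Cafe 1", "Bar 2", "Cafe 3"],
    "rating": [4.5, None, 4.0],
    "price": ["$$", None, None],
})
summary = coverage_summary(toy, "station", ["rating", "price"])
print(summary)
```

Running the same function on the Foursquare and Yelp tables would give directly comparable figures, provided both tables hold one POI per row.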

**TASK**: Send a request to Foursquare with a small radius (1000m) for all the bike stations in your city of choice. 

In [1]:
# importing required libraries
import requests
import pandas as pd
import os
import time

In [2]:
# accessing environmental variables
FOURSQUARE_API_KEY = os.environ['FOURSQUAREAPIKEY']
YELP_API_KEY = os.environ['YELPAPIKEY']

In [3]:
# getting info from saved csv of Oslo bike stations
df = pd.read_csv("oslo_bikes_data.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 263 entries, 0 to 262
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       263 non-null    int64  
 1   name             263 non-null    object 
 2   latitude         263 non-null    float64
 3   longitude        263 non-null    float64
 4   bikes_available  263 non-null    int64  
 5   slots            263 non-null    int64  
dtypes: float64(2), int64(3), object(1)
memory usage: 12.5+ KB


Creating a request for restaurants and bars and parsing results:

In [8]:
results = []

for index, row in df.iterrows():
    latitude = row['latitude']
    longitude = row['longitude']

    url = 'https://api.foursquare.com/v3/places/search'
    params = {
        # 'location': 'Oslo, Norway',
        'll': f'{latitude},{longitude}',
        'categoryId': '13000,12013',    # for restaurants and bars
        'radius': 1000,
        'fields': 'name,location,rating,categories,distance,description,popularity,price',
        'sort': 'RATING'
    }
    headers = {
        'Accept': 'application/json',
        'Authorization': FOURSQUARE_API_KEY
    }

    # rate-limit API calls to avoid restriction
    time.sleep(1.0)

    response = requests.get(url, params=params, headers=headers)
    if response.status_code != 200:
        print("Request failed. Status code:", response.status_code)
        continue  # skip this station instead of reusing stale (or undefined) `data`
    data = response.json()

    res = data.get('results', [])
    frame = pd.DataFrame(res)
    results.append({
        'Latitude': latitude,
        'Longitude': longitude,
        'Foursquare Name': frame.get('name', None),
        'Distance': frame.get('distance', None),
        'Popularity': frame.get('popularity', None),
        'Rating': frame.get('rating', None),
        'Price': frame.get('price', None)
        })

**TASK**: Put your parsed results into a DataFrame

In [10]:
df_foursquare_restobars = pd.DataFrame(results)

df_foursquare_restobars

| | Latitude | Longitude | Foursquare Name | Distance | Popularity | Rating | Price |
|---|---|---|---|---|---|---|---|
| 0 | 59.908055 | 10.747998 | 0 Torggata Botaniske 1 ... | 0 920 1 224 2 264 3 874 4 720 5... | 0 0.986438 1 0.999839 2 0.999927 3 ... | 0 9.4 1 9.3 2 9.2 3 9.2 4 9.2 5... | 0 2.0 1 NaN 2 3.0 3 2.0 4 2.0 5... |
| 1 | 59.913720 | 10.735887 | 0 Vinmonopolet 1 ... | 0 654 1 944 2 867 3 919 4 432 5... | 0 0.992030 1 0.986438 2 0.990885 3 ... | 0 9.4 1 9.4 2 9.2 3 9.2 4 9.1 5... | 0 NaN 1 2.0 2 2.0 3 1.0 4 2.0 5... |
| 2 | 59.903989 | 10.740627 | 0 Opera Roof (Operataket) 1 ... | 0 737 1 778 2 682 3 824 4 683 5... | 0 0.999839 1 0.999927 2 0.997006 3 ... | 0 9.3 1 9.2 2 9.1 3 9.0 4 9.0 5... | 0 NaN 1 3.0 2 NaN 3 NaN 4 NaN 5... |
| 3 | 59.912711 | 10.735595 | 0 Vinmonopolet 1 ... | 0 592 1 899 2 841 3 536 4 702 5... | 0 0.992030 1 0.990885 2 0.987832 3 ... | 0 9.4 1 9.2 2 9.2 3 9.1 4 9.1 5... | 0 NaN 1 2.0 2 1.0 3 2.0 4 2.0 5... |
| 4 | 59.920852 | 10.733357 | 0 Tunco 1 ... | 0 502 1 520 2 582 3 756 4 557 5... | 0 0.918818 1 0.984368 2 0.989740 3 ... | 0 9.3 1 9.1 2 9.1 3 9.1 4 9.1 5... | 0 2.0 1 NaN 2 2.0 3 NaN 4 2.0 5... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 258 | 59.921768 | 10.730476 | 0 Tunco 1 ... | 0 605 1 592 2 766 3 782 4 747 5... | 0 0.918818 1 0.984368 2 0.989740 3 ... | 0 9.3 1 9.1 2 9.1 3 9.1 4 9.1 5... | 0 2.0 1 NaN 2 2.0 3 NaN 4 2.0 5... |
| 259 | 59.916331 | 10.716349 | 0 Vinmonopolet 1 ... | 0 796 1 928 2 630 3 772 4 309 5... | 0 0.992030 1 0.987832 2 0.959299 3 ... | 0 9.4 1 9.2 2 9.0 3 9.0 4 8.9 5... | 0 NaN 1 1.0 2 2.0 3 NaN 4 2.0 5... |
| 260 | 59.911392 | 10.747282 | 0 Torggata Botaniske 1 ... | 0 586 1 502 2 519 3 840 4 733 5... | 0 0.986438 1 0.999839 2 0.999927 3 ... | 0 9.4 1 9.3 2 9.2 3 9.2 4 9.2 5... | 0 2.0 1 NaN 2 3.0 3 2.0 4 2.0 5... |
| 261 | 59.910924 | 10.736215 | 0 Vinmonopolet 1 Oper... | 0 591 1 951 2 926 3 786 4 704 5... | 0 0.992030 1 0.999839 2 0.990885 3 ... | 0 9.4 1 9.3 2 9.2 3 9.2 4 9.1 5... | 0 NaN 1 NaN 2 2.0 3 1.0 4 2.0 5... |


In [10]:
# creating a function to save dataframes to csv files
def save_dataframe_to_csv(dataframe, file_path, index=False):
    try:
        dataframe.to_csv(file_path, index=index)
        print(f"DataFrame successfully saved to {file_path}")
    except Exception as e:
        print(f"An error occurred while saving the DataFrame to a CSV file: {e}")

In [None]:
# saving the DataFrame to CSV
save_dataframe_to_csv(df_foursquare_restobars, 'foursquare_restobars.csv')

**Other POIs (points of interest)**: museums

> As an experiment, I compare the proximity of bike stations to museums in Oslo. To do that, I loop through the stations and request the closest museum for each one, with the results sorted by distance. The radius limit remains the same.
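
To sanity-check the distances the API reports, the straight-line distance between a station and a museum can be recomputed from their coordinates. The sketch below uses the haversine formula; the function name `haversine_m` is an assumption for illustration, and the coordinates are those of the Langkaia station and Filmmuseet as they appear in this notebook's saved results. The value should land in the same range as the API's `distance` field, though the two need not match exactly.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Langkaia bike station -> Filmmuseet
d = haversine_m(59.908055, 10.747998, 59.909726, 10.745849)
print(f"{d:.0f} m")
```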

In [11]:
# function for making a call to Foursquare API

def get_fsq(latitude, longitude, radius, api_key, categories):
    url = "https://api.foursquare.com/v3/places/search"

    params = {
        "ll": f"{latitude},{longitude}",
        "radius": radius,
        "categories": categories,
        "sort": "DISTANCE",
        "limit": 1 # adjustable
    }

    headers = {
        "Accept": "application/json",
        "Authorization": api_key  # use the key passed in, not the global variable
    }
    response = requests.get(url, params=params, headers=headers)
    return response.json()

In [15]:
from pprint import pprint

# testing the function
res = get_fsq(latitude=59.915451, longitude=10.75833, radius=1000, api_key=FOURSQUARE_API_KEY, categories=10027)
if res:
    pprint(res)

{'context': {'geo_bounds': {'circle': {'center': {'latitude': 59.915451,
                                                  'longitude': 10.75833},
                                       'radius': 1000}}},
 'results': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/museum_history_',
                                       'suffix': '.png'},
                              'id': 10030,
                              'name': 'History Museum',
                              'plural_name': 'History Museums',
                              'short_name': 'History Museum'}],
              'chains': [],
              'closed_bucket': 'LikelyOpen',
              'distance': 188,
              'fsq_id': '535cdede498e088d7fd4be72',
              'geocodes': {'main': {'latitude': 59.916521,
                                    'longitude': 10.755344},
                           'roof': {'latitude': 59.916521,
                                    'longitude': 

Parsing through the response to get the POI details

In [12]:
# function to process each bike station:

def get_museums_near_station(row, api_key, radius=1000, categories='10027'):
    latitude = row['latitude']
    longitude = row['longitude']
    
    response = get_fsq(latitude, longitude, radius, api_key, categories)
    
    museums = []
    for result in response.get('results', []):
        museums.append({
            'bike_station_name': row['name'],
            'bike_station_lat': latitude,
            'bike_station_lon': longitude,
            'museum_name': result['name'],
            'museum_lat': result['geocodes']['main']['latitude'],
            'museum_lon': result['geocodes']['main']['longitude'],
            'distance': result['distance']
        })
    
    return museums

In [13]:
# reading the bike stations data
df = pd.read_csv("oslo_bikes_data.csv")

In [14]:
# getting museums near all bike stations from Foursquare
all_museums = []
total_stations = len(df)

for index, row in df.iterrows():
    museums = get_museums_near_station(row, FOURSQUARE_API_KEY)
    all_museums.extend(museums)
    print(f"Processed station {index + 1} of {total_stations}") 
    time.sleep(1)  # to avoid hitting rate limits

Processed station 1 of 263
Processed station 2 of 263
Processed station 3 of 263
Processed station 4 of 263
Processed station 5 of 263
Processed station 6 of 263
Processed station 7 of 263
Processed station 8 of 263
Processed station 9 of 263
Processed station 10 of 263
Processed station 11 of 263
Processed station 12 of 263
Processed station 13 of 263
Processed station 14 of 263
Processed station 15 of 263
Processed station 16 of 263
Processed station 17 of 263
Processed station 18 of 263
Processed station 19 of 263
Processed station 20 of 263
Processed station 21 of 263
Processed station 22 of 263
Processed station 23 of 263
Processed station 24 of 263
Processed station 25 of 263
Processed station 26 of 263
Processed station 27 of 263
Processed station 28 of 263
Processed station 29 of 263
Processed station 30 of 263
Processed station 31 of 263
Processed station 32 of 263
Processed station 33 of 263
Processed station 34 of 263
Processed station 35 of 263
Processed station 36 of 263
P

In [15]:
# now we convert results to df
museums_df = pd.DataFrame(all_museums)

In [16]:
# saving the DataFrame to CSV
save_dataframe_to_csv(museums_df, 'foursquare_museums.csv')

DataFrame successfully saved to foursquare_museums.csv


In [18]:
# read the CSV file to verify its contents
verification_df = pd.read_csv('foursquare_museums.csv')
print(verification_df.head())
print(f"Total rows: {len(verification_df)}")

    bike_station_name  bike_station_lat  bike_station_lon  \
0            Langkaia         59.908055         10.747998   
1    Spikersuppa Vest         59.913720         10.735887   
2    Vippetangen vest         59.903989         10.740627   
3    Kjeld Stubs gate         59.912711         10.735595   
4  Studentparlamentet         59.920852         10.733357   

                                 museum_name  museum_lat  museum_lon  distance  
0                                 Filmmuseet   59.909726   10.745849       236  
1     The Viking Planet Oslo: Hours, Address   59.913194   10.734102       115  
2                             Forsvarsmuseet   59.904487   10.740562        96  
3     The Viking Planet Oslo: Hours, Address   59.913194   10.734102        98  
4  Litteraturhuset - The House of Literature   59.920356   10.728641       268  
Total rows: 232


# Yelp

**TASK**: Send a request to Yelp with a small radius (1000m) for all the bike stations in your city of choice. 

Creating a request for restaurants and bars and parsing results:

In [28]:
# encountered errors, troubleshooting by testing a single request:

# Yelp API endpoint
url = 'https://api.yelp.com/v3/businesses/search'

# Yelp API key
YELP_API_KEY = os.environ['YELPAPIKEY']

# test a single request
test_lat, test_lon = 59.915451, 10.75833  # Example coordinates
test_params = {
    'latitude': test_lat,
    'longitude': test_lon,   
    'radius': 1000, 
    'categories': 'restaurants,bars',
    'sort_by': 'rating',
    'limit': 50
}
headers = {
    'Authorization': f'Bearer {YELP_API_KEY}'
}
test_response = requests.get(url, params=test_params, headers=headers)
print(f"Test response status: {test_response.status_code}")
print(f"Test response content: {test_response.text[:500]}...") # printing first 500 characters

Test response status: 200
Test response content: {"businesses": [{"id": "rfcAeM2bG8yHljvyH5Rflw", "alias": "happolati-oslo", "name": "Happolati", "image_url": "https://s3-media1.fl.yelpcdn.com/bphoto/hw--HczuWFI9BTrMSaKONw/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/happolati-oslo?adjust_creative=U4uVkVdIYTm8WBRF9rpxgw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=U4uVkVdIYTm8WBRF9rpxgw", "review_count": 9, "categories": [{"alias": "asianfusion", "title": "Asian Fusion"}], "rating": 5.0, "coordinates": ...


In [29]:
# test was successful, so we can proceed with the loop:

results = []

for index, row in df.iterrows():
    latitude = row['latitude']
    longitude = row['longitude']

    params = {
        # 'location': 'Oslo, Norway',
        'latitude': latitude,
        'longitude': longitude,   
        'radius': 1000, 
        'categories': 'restaurants,bars',
        'sort_by': 'rating',
        'limit': 50
    }

    headers = {
        'Authorization': f'Bearer {YELP_API_KEY}'
    }

    # rate-limit API calls to avoid restriction
    time.sleep(1.0) # optionally, increase delay between requests to 2.0

    response = requests.get(url, params=params, headers=headers)
    if response.status_code != 200:
        print("Request failed. Status code:", response.status_code)
        continue  # skip this station instead of re-appending the previous station's results
    data = response.json()

    yelp_results = data.get('businesses', [])
    for business in yelp_results:
        results.append({
            'Bike Station Latitude': latitude,
            'Bike Station Longitude': longitude,
            'Yelp Name': business.get('name'),
            'Distance': business.get('distance'),
            'Review count': business.get('review_count'),
            'Rating': business.get('rating'),
            'Price': business.get('price')
        })

    print(f"Processed station {index + 1} of {len(df)}")

Processed station 1 of 263
Processed station 2 of 263
Processed station 3 of 263
Processed station 4 of 263
Processed station 5 of 263
Processed station 6 of 263
Processed station 7 of 263
Processed station 8 of 263
Processed station 9 of 263
Processed station 10 of 263
Processed station 11 of 263
Processed station 12 of 263
Processed station 13 of 263
Processed station 14 of 263
Processed station 15 of 263
Processed station 16 of 263
Processed station 17 of 263
Processed station 18 of 263
Processed station 19 of 263
Processed station 20 of 263
Processed station 21 of 263
Processed station 22 of 263
Processed station 23 of 263
Processed station 24 of 263
Processed station 25 of 263
Processed station 26 of 263
Processed station 27 of 263
Processed station 28 of 263
Processed station 29 of 263
Processed station 30 of 263
Processed station 31 of 263
Processed station 32 of 263
Processed station 33 of 263
Processed station 34 of 263
Processed station 35 of 263
Processed station 36 of 263
P

**TASK**: Put your parsed results into a DataFrame

In [30]:
df_yelp_restobars = pd.DataFrame(results)
df_yelp_restobars

| | Bike Station Latitude | Bike Station Longitude | Yelp Name | Distance | Review count | Rating | Price |
|---|---|---|---|---|---|---|---|
| 0 | 59.908055 | 10.747998 | Einer | 257.816462 | 5 | 5.0 | |
| 1 | 59.908055 | 10.747998 | Statholderens Mat & Vinkjeller | 303.633698 | 7 | 4.9 | $$$$ |
| 2 | 59.908055 | 10.747998 | Benares Indisk Restaurant and Bar | 818.268331 | 5 | 4.8 | $$ |
| 3 | 59.908055 | 10.747998 | Statholdergaarden og Statholderens Mat og Vink... | 294.249673 | 19 | 4.7 | $$$$ |
| 4 | 59.908055 | 10.747998 | Girotondo | 794.542512 | 7 | 4.7 | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12910 | 59.920956 | 10.714056 | Oslo Mikrobryggeri | 693.117781 | 31 | 4.2 | $$ |
| 12911 | 59.920956 | 10.714056 | Café Elise | 505.255705 | 5 | 4.2 | $$ |
| 12912 | 59.920956 | 10.714056 | Kaffebrenneriet | 527.618735 | 5 | 4.2 | $ |
| 12913 | 59.920956 | 10.714056 | Sawan | 268.458155 | 17 | 4.2 | $$$$ |


In [31]:
# saving df to csv
save_dataframe_to_csv(df_yelp_restobars, 'yelp_restobars.csv')

DataFrame successfully saved to yelp_restobars.csv


**Other POIs (points of interest)**: museums

In [4]:
# function for making a call to Yelp API

def get_yelp(latitude, longitude, radius, api_key, categories):
    url = "https://api.yelp.com/v3/businesses/search"

    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {api_key}"  # use the key passed in, not the global variable
    }
    params = {
        "latitude": latitude,
        "longitude": longitude,  # key must be lowercase
        "radius": radius,
        "categories": categories,
        "sort_by": "distance",
        "limit": 1  # only the closest museum, due to rate limits on the Yelp API
    }

    # testing returned a validation error for latitude and longitude,
    # so the part below was added to report more detailed error info
    try:
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()  # raises an HTTPError for bad responses
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        # e.response is None when no HTTP response was received (e.g. a connection error)
        if e.response is not None:
            print(f"Response content: {e.response.text}")
        return None

In [46]:
# testing the function
test = get_yelp(latitude=59.915451, longitude=10.75833, radius=1000, api_key=YELP_API_KEY, categories="museums")
if test:
    pprint(test)
else: 
    print("Failed to retrieve data from Yelp API")

{'businesses': [{'alias': 'jødisk-museum-i-oslo-oslo',
                 'attributes': {'business_temp_closed': None,
                                'waitlist_reservation': None},
                 'business_hours': [{'hours_type': 'REGULAR',
                                     'is_open_now': True,
                                     'open': [{'day': 1,
                                               'end': '1500',
                                               'is_overnight': False,
                                               'start': '1000'},
                                              {'day': 2,
                                               'end': '1500',
                                               'is_overnight': False,
                                               'start': '1000'},
                                              {'day': 3,
                                               'end': '1900',
                                               'is_overnight': False,
   

Getting same POI details, this time from Yelp

In [6]:
# function to process each bike station:

def get_museums_near_station(row, api_key, radius=1000, categories='museums'):
    latitude = row['latitude']
    longitude = row['longitude']
    
    # get_yelp returns None on failure, so fall back to an empty dict
    response = get_yelp(latitude, longitude, radius, api_key, categories) or {}
    
    museums = []
    for business in response.get('businesses', []):  # Yelp uses 'businesses' instead of 'results'
        museums.append({
            'bike_station_name': row['name'],
            'bike_station_lat': latitude,
            'bike_station_lon': longitude,
            'museum_name': business['name'],
            'museum_lat': business['coordinates']['latitude'],  # Yelp uses 'coordinates' instead of 'geocodes'
            'museum_lon': business['coordinates']['longitude'],
            'distance': business['distance']
        })
    
    return museums

In [7]:
# getting museums near all bike stations from Yelp
all_museums = []
total_stations = len(df)

for index, row in df.iterrows():
    museums = get_museums_near_station(row, YELP_API_KEY)
    all_museums.extend(museums)
    print(f"Processed station {index + 1} of {total_stations}")
    time.sleep(1)  # to avoid hitting rate limits

Processed station 1 of 263
Processed station 2 of 263
Processed station 3 of 263
Processed station 4 of 263
Processed station 5 of 263
Processed station 6 of 263
Processed station 7 of 263
Processed station 8 of 263
Processed station 9 of 263
Processed station 10 of 263
Processed station 11 of 263
Processed station 12 of 263
Processed station 13 of 263
Processed station 14 of 263
Processed station 15 of 263
Processed station 16 of 263
Processed station 17 of 263
Processed station 18 of 263
Processed station 19 of 263
Processed station 20 of 263
Processed station 21 of 263
Processed station 22 of 263
Processed station 23 of 263
Processed station 24 of 263
Processed station 25 of 263
Processed station 26 of 263
Processed station 27 of 263
Processed station 28 of 263
Processed station 29 of 263
Processed station 30 of 263
Processed station 31 of 263
Processed station 32 of 263
Processed station 33 of 263
Processed station 34 of 263
Processed station 35 of 263
Processed station 36 of 263
P

The API access limit was reached, probably because of the number of calls per station or because of redundant calls in the code. Steps taken to mitigate this issue:

- **code review** - I checked the `get_museums_near_station` function to make sure it makes only one API call per station
- **handling errors** - I believe there are no issues besides the rate limit, but I will add code to catch and log specific errors
- **temporary solution** - given the project deadline and the daily limit of 300 calls having been reached, I would save the results for the processed stations and skip those stations when I restart
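
The save-and-skip idea from the last bullet could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `collect_with_resume` and `fake_fetch` are hypothetical names, the checkpoint is a CSV with a `bike_station_name` column, and `fetch` is any function that returns a list of records for one station row, or `None` when the request fails.

```python
import os
import tempfile

import pandas as pd

def collect_with_resume(stations, fetch, checkpoint_path):
    """Call fetch(row) once per station, skipping stations already in the checkpoint CSV."""
    done = set()
    if os.path.exists(checkpoint_path):
        done = set(pd.read_csv(checkpoint_path)["bike_station_name"])

    for _, row in stations.iterrows():
        if row["name"] in done:
            continue  # already processed in an earlier run
        records = fetch(row)
        if records is None:
            # e.g. the daily limit was hit; stop and resume on the next run
            print(f"Stopping at {row['name']}; rerun later to resume.")
            break
        if records:
            # append right away so finished stations survive an interruption
            pd.DataFrame(records).to_csv(
                checkpoint_path, mode="a", index=False,
                header=not os.path.exists(checkpoint_path),
            )
    return pd.read_csv(checkpoint_path) if os.path.exists(checkpoint_path) else pd.DataFrame()

def fake_fetch(row):
    # stand-in for a real API call; returns one made-up record per station
    return [{"bike_station_name": row["name"], "museum_name": f"Museum near {row['name']}"}]

stations = pd.DataFrame({"name": ["A", "B"], "latitude": [59.91, 59.92], "longitude": [10.74, 10.75]})
path = os.path.join(tempfile.mkdtemp(), "museums_checkpoint.csv")
first = collect_with_resume(stations.head(1), fake_fetch, path)  # processes A only
second = collect_with_resume(stations, fake_fetch, path)         # skips A, processes B
print(len(first), len(second))
```

One limitation of this sketch: stations that return no results are not written to the checkpoint, so they would be queried again on the next run.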

**Note**: To normalize the data for easier comparison, I set a result limit on both the Foursquare and Yelp requests. Initially, Foursquare returned 3,921 museum rows, a heavily inflated figure caused by duplicates. After rerunning the query with the limit, both APIs returned the same results for museums.
As for restaurants & bars, a different approach was used, which yielded 263 rows in the Foursquare output and 12,915 in the Yelp output. More on that in "Comparing Results".
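
Because the same museum can be the nearest one for several stations, row counts overstate the number of distinct museums. A minimal sketch of counting unique museums is shown below; the toy rows are made up for illustration, and treating name plus coordinates as the identity of a museum is an assumption.

```python
import pandas as pd

museums = pd.DataFrame({
    "bike_station_name": ["A", "B", "C"],
    "museum_name": ["Filmmuseet", "Filmmuseet", "Forsvarsmuseet"],
    "museum_lat": [59.909726, 59.909726, 59.904487],
    "museum_lon": [10.745849, 10.745849, 10.740562],
})

# rows are (station, museum) pairs; a unique museum is a distinct name + coordinates
unique_museums = museums.drop_duplicates(subset=["museum_name", "museum_lat", "museum_lon"])
print(f"{len(museums)} rows, {len(unique_museums)} unique museums")
```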

In [None]:
# now we convert results to df
museums_df = pd.DataFrame(all_museums)

In [19]:
# saving the DataFrame to CSV
save_dataframe_to_csv(museums_df, 'yelp_museums.csv')

DataFrame successfully saved to yelp_museums.csv


In [20]:
# read the CSV file to verify its contents
verification_df = pd.read_csv('yelp_museums.csv')
print(verification_df.head())
print(f"Total rows: {len(verification_df)}")

    bike_station_name  bike_station_lat  bike_station_lon  \
0            Langkaia         59.908055         10.747998   
1    Spikersuppa Vest         59.913720         10.735887   
2    Vippetangen vest         59.903989         10.740627   
3    Kjeld Stubs gate         59.912711         10.735595   
4  Studentparlamentet         59.920852         10.733357   

                                 museum_name  museum_lat  museum_lon  distance  
0                                 Filmmuseet   59.909726   10.745849       236  
1     The Viking Planet Oslo: Hours, Address   59.913194   10.734102       115  
2                             Forsvarsmuseet   59.904487   10.740562        96  
3     The Viking Planet Oslo: Hours, Address   59.913194   10.734102        98  
4  Litteraturhuset - The House of Literature   59.920356   10.728641       268  
Total rows: 232


> **Stats Break!**
>
> Every day, Copenhagen’s cyclists covered a total of 1.4 million km in 2016.

# Comparing Results

**QUESTION**: Which API provided you with more complete data? Provide an explanation. 

*Both Foursquare and Yelp provided complete data for museums, with zero missing values.*
*However, the restaurants & bars data differ.*
*Yelp has **12,915 entries**, while Foursquare returned only 263 rows (one per bike station, each bundling several places).*

*See the table below for details:*


| API | ENTRIES | MISSING VALUES | ADDITIONAL INFO |
|-----|---------|----------------|-----------------|
| Foursquare | 263 | 0.38% for "Rating"<br>0.76% for "Price" | provides "Popularity" |
| Yelp | 12,915 | 32.82% for "Price" | provides "Review count" and "Rating" for all entries |

*At first glance, Yelp seems to have broader coverage in Oslo, but the data should be cleaned before drawing further conclusions. Yelp is missing price info for about a third of its entries.*

In [21]:
# first, load the data to check the differences

df_foursquare_museums = pd.read_csv('foursquare_museums.csv')
df_foursquare_restobars = pd.read_csv('foursquare_restobars.csv')
df_yelp_museums = pd.read_csv('yelp_museums.csv')
df_yelp_restobars = pd.read_csv('yelp_restobars.csv')
df_oslo_bikes = pd.read_csv('oslo_bikes_data.csv')

# printing basic info for each dataset
# (the loop variable is named `frame` so it does not overwrite the bike stations `df`)
for name, frame in [('Foursquare Museums', df_foursquare_museums), 
                    ('Foursquare Restobars', df_foursquare_restobars),
                    ('Yelp Museums', df_yelp_museums),
                    ('Yelp Restobars', df_yelp_restobars)]:
    print(f"\n{name}:")
    print(frame.info())
    print(f"Shape: {frame.shape}")
    print(f"Percentage of missing values:\n{frame.isnull().sum() / len(frame) * 100}")
    
    print("\nDescriptive statistics:")
    print(frame.describe())


Foursquare Museums:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 232 entries, 0 to 231
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   bike_station_name  232 non-null    object 
 1   bike_station_lat   232 non-null    float64
 2   bike_station_lon   232 non-null    float64
 3   museum_name        232 non-null    object 
 4   museum_lat         232 non-null    float64
 5   museum_lon         232 non-null    float64
 6   distance           232 non-null    int64  
dtypes: float64(4), int64(1), object(2)
memory usage: 12.8+ KB
None
Shape: (232, 7)
Percentage of missing values:
bike_station_name    0.0
bike_station_lat     0.0
bike_station_lon     0.0
museum_name          0.0
museum_lat           0.0
museum_lon           0.0
distance             0.0
dtype: float64

Descriptive statistics:
       bike_station_lat  bike_station_lon  museum_lat  museum_lon    distance
count        232.000000        232.0

Comparing number of results

**Data volume**: *Initially, we encountered a significant discrepancy in the number of results. Foursquare returned 3,921 results for museums, while Yelp returned 12,915 results for restaurants and bars. This prompted us to limit the results to one per bike station for a more balanced comparison.*

In [22]:
print(f"Number of Foursquare Museums: {len(df_foursquare_museums)}")
print(f"Number of Yelp Museums: {len(df_yelp_museums)}")
print(f"Number of Foursquare Restobars: {len(df_foursquare_restobars)}")
print(f"Number of Yelp Restobars: {len(df_yelp_restobars)}")

Number of Foursquare Museums: 232
Number of Yelp Museums: 232
Number of Foursquare Restobars: 263
Number of Yelp Restobars: 12915


Get the top 10 restaurants according to their rating

In [26]:
# first, let's check the column names in both dfs:

print("Foursquare DataFrame columns:")
print(df_foursquare_restobars.columns)

print("\nYelp DataFrame columns:")
print(df_yelp_restobars.columns)

Foursquare DataFrame columns:
Index(['Latitude', 'Longitude', 'Foursquare Name', 'Distance', 'Popularity',
       'Rating', 'Price'],
      dtype='object')

Yelp DataFrame columns:
Index(['Bike Station Latitude', 'Bike Station Longitude', 'Yelp Name',
       'Distance', 'Review count', 'Rating', 'Price'],
      dtype='object')


In [27]:
# now that we know that `Rating` and `Distance` in both APIs begin with a cap letter, we can proceed
# for Foursquare
top_10_foursquare = df_foursquare_restobars.sort_values('Rating', ascending=False).head(10)
print("Top 10 Restaurants (Foursquare):")
print(top_10_foursquare[['Foursquare Name', 'Rating', 'Distance']])

# for Yelp
top_10_yelp = df_yelp_restobars.sort_values('Rating', ascending=False).head(10)
print("\nTop 10 Restaurants (Yelp):")
print(top_10_yelp[['Yelp Name', 'Rating', 'Distance']])

Top 10 Restaurants (Foursquare):
                                       Foursquare Name  \
187  0          Supreme Roastworks\n1    Le Benjami...   
161  0          Supreme Roastworks\n1    Le Benjami...   
77   0          Supreme Roastworks\n1    Le Benjami...   
97   0            Supreme Roastworks\n1      Le Ben...   
117  0            Supreme Roastworks\n1      Le Ben...   
74   0            Supreme Roastworks\n1      Le Ben...   
81   0            Supreme Roastworks\n1      Le Ben...   
248  0          Supreme Roastworks\n1    Le Benjami...   
84   0          Supreme Roastworks\n1    Le Benjami...   
177  0            Supreme Roastworks\n1      Le Ben...   

                                                Rating  \
187  0    9.4\n1    9.4\n2    9.4\n3    9.3\n4    9...   
161  0    9.4\n1    9.4\n2    9.4\n3    9.3\n4    9...   
77   0    9.4\n1    9.4\n2    9.4\n3    9.3\n4    9...   
97   0    9.4\n1    9.4\n2    9.4\n3    9.3\n4    9...   
117  0    9.4\n1    9.4\n2    9.4\n3  

The Foursquare data has multiple entries combined into single rows, which produces this strange output. The cause is visible in the request loop above: for each station, every cell received an entire column of results (a pandas Series) rather than a single value, and saving to CSV turned those columns into strings. As a next step, the data can be cleaned further to allow a better comparison. For now, we can generate a **List of Top 10 Restaurants & Bars in Oslo** (bike-accessible) using Yelp data:
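
For a later rerun, the combined rows could be avoided by storing one record per place instead of one per station. The sketch below shows the idea on a made-up response; `flatten_places` is a hypothetical helper, and the sample values are illustrative rather than real API output. It assumes `places` has the shape of the `results` list returned by the Foursquare request.

```python
import pandas as pd

def flatten_places(latitude, longitude, places):
    """Turn one station's list of place dicts into one flat record per place."""
    return [
        {
            "Latitude": latitude,
            "Longitude": longitude,
            "Foursquare Name": place.get("name"),
            "Distance": place.get("distance"),
            "Popularity": place.get("popularity"),
            "Rating": place.get("rating"),
            "Price": place.get("price"),  # None when the field is missing
        }
        for place in places
    ]

# toy response in the shape of the `results` list (values are made up)
sample_results = [
    {"name": "Place A", "distance": 120, "rating": 9.1, "price": 2},
    {"name": "Place B", "distance": 340, "rating": 8.7},
]
records = flatten_places(59.908055, 10.747998, sample_results)
flat_df = pd.DataFrame(records)
print(flat_df.shape)
```

Inside the request loop, `results.extend(flatten_places(latitude, longitude, res))` would then take the place of the `results.append({...})` call, giving a table that can be sorted by `Rating` in the same way as the Yelp one.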

In [36]:
# rename columns for display
top_10_display = top_10_yelp.rename(columns={
    'Yelp Name': 'Restaurant Name',
    'Review count': 'Reviews'
})

# format the `Distance` column to show only 2 decimal places
top_10_display['Distance'] = top_10_display['Distance'].round(2)

# show the table
print("Top 10 Restaurants and Bars in Oslo (Yelp Data):")
print(top_10_display.to_string(index=False))

Top 10 Restaurants and Bars in Oslo (Yelp Data):
 Bike Station Latitude  Bike Station Longitude                 Restaurant Name  Distance  Reviews  Rating Price
             59.908055               10.747998                           Einer    257.82        5     5.0   NaN
             59.920259               10.760629                         Bon Lío    223.40        3     5.0   NaN
             59.920259               10.760629 Stortorvet Charlies Kebab House   1229.78        2     5.0   NaN
             59.920259               10.760629                     Erlik Kaffe   1179.34        2     5.0   NaN
             59.917085               10.712880                       Silk Road    523.40        2     5.0   NaN
             59.917085               10.712880            De La Casa Pasta Bar    372.99        2     5.0   NaN
             59.917085               10.712880                      Norð & Vin    451.49        2     5.0    $$
             59.917085               10.712880         

In [37]:
from IPython.display import display

# set pandas options for display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

# display the DataFrame
display(top_10_display)

| | Bike Station Latitude | Bike Station Longitude | Restaurant Name | Distance | Reviews | Rating | Price |
|---|---|---|---|---|---|---|---|
| 0 | 59.908055 | 10.747998 | Einer | 257.82 | 5 | 5.0 | |
| 4173 | 59.920259 | 10.760629 | Bon Lío | 223.4 | 3 | 5.0 | |
| 4202 | 59.920259 | 10.760629 | Stortorvet Charlies Kebab House | 1229.78 | 2 | 5.0 | |
| 4206 | 59.920259 | 10.760629 | Erlik Kaffe | 1179.34 | 2 | 5.0 | |
| 10700 | 59.917085 | 10.71288 | Silk Road | 523.4 | 2 | 5.0 | |
| 10699 | 59.917085 | 10.71288 | De La Casa Pasta Bar | 372.99 | 2 | 5.0 | |
| 10695 | 59.917085 | 10.71288 | Norð & Vin | 451.49 | 2 | 5.0 | $$ |
| 10694 | 59.917085 | 10.71288 | Gioia | 453.52 | 2 | 5.0 | |
| 10683 | 59.917085 | 10.71288 | Wu Sushi | 525.03 | 3 | 5.0 | |
| 10681 | 59.917085 | 10.71288 | Emilio's Vinbar | 247.59 | 3 | 5.0 | $$ |


> **DID YOU KNOW?**
>
> The [highest average bicycle price](https://www.statista.com/statistics/395884/bicycle-average-prices-in-the-european-union-eu-by-country/) in 2016 was in the Netherlands (1,010 EUR), the 2nd highest in Denmark at 700 EUR, and the 3rd highest in Austria at 660 EUR. That's a FORTUNE!