<h1>Report</h1>
<ul>
<li>Introduction, where you discuss the business problem and who would be interested in this project.</li>
<li>Data, where you describe the data that will be used to solve the problem and its source.</li>
<li>Methodology, the main component of the report, where you describe any exploratory data analysis you did, any inferential statistical tests you performed, and which machine learning methods you used and why.</li>
<li>Results, where you discuss the results.</li>
<li>Discussion, where you discuss any observations you noted and any recommendations you can make based on the results.</li>
<li>Conclusion, where you conclude the report.</li>
</ul>

This project is based on the second suggested idea: location recommendation for a restaurant.<br>
More specifically, we will recommend locations in Athens, Greece, for late-night food carts and food trucks that serve customers after hours.<br>
Our audience is food cart and food truck owners, or fast-food managers, looking for the best locations to open or set up their business.<br>
Because food carts are mobile, they could also move to busier locations during the day; those locations can be found by running the same algorithm with modified queries.

The data will be queried from Foursquare through its API. Specifically, the requests will concern venue locations (in Athens, Greece), categories (bars, clubs and other non-food venues), opening hours (open after midnight) and trends (busier venues are preferred).
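A minimal sketch of the selection criteria described above, assuming each venue has already been reduced to a simple record with a category, a closing hour and a popularity score (for example a check-in count). The field names, the category set and the "24 + h" convention for closing hours after midnight are hypothetical choices for illustration, not Foursquare's schema.

```python
from typing import Dict, List

# Hypothetical set of nightlife categories that generate late-night foot traffic
NIGHTLIFE = {'Bar', 'Nightclub', 'Cocktail Bar', 'Pub', 'Lounge'}

def late_night_candidates(venues: List[Dict], min_close_hour: int = 24) -> List[Dict]:
    """Keep nightlife venues that close at or after midnight, busiest first.

    Closing hours after midnight are written as 24 + h (e.g. 26 means 02:00).
    """
    selected = [
        v for v in venues
        if v['category'] in NIGHTLIFE and v['close_hour'] >= min_close_hour
    ]
    # Busier venues are preferred, so sort by popularity in descending order
    return sorted(selected, key=lambda v: v['popularity'], reverse=True)

# Invented records, only to show the filtering and ranking
venues = [
    {'name': 'A', 'category': 'Bar', 'close_hour': 26, 'popularity': 120},
    {'name': 'B', 'category': 'Café', 'close_hour': 27, 'popularity': 300},
    {'name': 'C', 'category': 'Nightclub', 'close_hour': 29, 'popularity': 450},
    {'name': 'D', 'category': 'Pub', 'close_hour': 23, 'popularity': 200},
]
print([v['name'] for v in late_night_candidates(venues)])  # ['C', 'A']
```

The café is excluded by category and the pub because it closes before midnight; the remaining venues are ranked by popularity.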

In [1]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude
!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library
print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.17.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


In [2]:
import numpy as np
import pandas as pd
import requests # library to handle requests
try:
    from pandas import json_normalize  # pandas >= 1.0: transform JSON into a pandas dataframe
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions
print('Libraries imported.')

Libraries imported.


In [3]:
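address = 'Athens, Greece'

# Nominatim asks each application to identify itself with a user_agent string
geolocator = Nominatim(user_agent='athens_foodcart_explorer')
location = geolocator.geocode(address)
# Fixed coordinates for central Athens are used below;
# location.latitude and location.longitude hold the geocoded values instead
latitude = 37.976386
longitude = 23.726139
print('The geographical coordinates of Athens, Greece are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Athens, Greece are 37.976386, 23.726139.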


In [4]:
# Replace the placeholders with your own Foursquare credentials; never publish real ones
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: (hidden)')

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: (hidden)


In [5]:
# function that extracts the primary category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']        # rows built from venue objects
    except KeyError:
        categories_list = row['venue.categories']  # rows from flattened 'explore' results

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
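The fallback between `categories` and `venue.categories` exists because a row can come either from a venue object or from a flattened `explore` result. A small self-contained check, with invented rows and a copy of the function so the cell runs on its own, shows both paths and the empty-category case:

```python
import pandas as pd

def get_category_type(row):
    """Return the primary category name of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']        # rows built from venue objects
    except KeyError:
        categories_list = row['venue.categories']  # rows from flattened 'explore' results
    return categories_list[0]['name'] if categories_list else None

# Two invented rows in the flattened 'explore' shape; the second has no category
flat = pd.DataFrame({
    'venue.name': ['Some Bar', 'Unlabelled Spot'],
    'venue.categories': [[{'name': 'Bar'}], []],
})
print(flat.apply(get_category_type, axis=1).tolist())

# A row in the venue-object shape takes the first branch
print(get_category_type(pd.Series({'categories': [{'name': 'Museum'}]})))
```

Catching only `KeyError` (rather than a bare `except`) keeps unrelated errors, such as a malformed category entry, visible.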

In [6]:
LIMIT = 100       # maximum number of venues returned per request
radius = 50000    # search radius in metres (50 km) around the Athens coordinates
section = 'sights'
query = 'coffee'
# explore: recommended venues around Athens
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, LIMIT)
# explore restricted to one section (here 'sights')
url_id = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, LIMIT, section)
# full hierarchy of Foursquare venue categories
url_cat = 'https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION)
# explore filtered by a free-text query (here 'coffee')
url_mu = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, LIMIT, query)

# only the query-based request is sent here
results = requests.get(url_mu).json()
results

{'meta': {'code': 429,
  'errorDetail': 'Quota exceeded',
  'errorType': 'quota_exceeded',
  'requestId': '5bd03172db04f55c3f02339b'},
 'response': {}}
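The request URLs above are assembled by string formatting. As a sketch of an alternative (not what this notebook does), `requests` can encode the query string from a dictionary, which avoids mistakes with `&` separators and escapes special characters in `query`. The helper name `explore_request` is hypothetical and the credentials are placeholders; the request is only prepared, not sent.

```python
import requests

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def explore_request(client_id, client_secret, version, lat, lng, radius, limit, **extra):
    """Build (but do not send) a GET request to the Foursquare explore endpoint."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    params.update(extra)  # optional filters such as query='coffee' or section='sights'
    return requests.Request('GET', EXPLORE_URL, params=params).prepare()

req = explore_request('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                      37.976386, 23.726139, 50000, 100, query='coffee')
print(req.url)
```

`requests.Session().send(req)` would send the prepared request; `requests.get(EXPLORE_URL, params=...)` does both steps at once.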

In [7]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.id']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the primary category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# sorted list of distinct categories; dropna() skips venues without a category,
# which np.unique cannot sort together with strings
np.unique(nearby_venues['categories'].dropna().values)
#nearby_venues

KeyError: 'groups'
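The `KeyError: 'groups'` is a symptom rather than the cause: the previous request returned code 429 (`quota_exceeded`), so `response` is empty. A minimal sketch of a guard, using a hypothetical helper `explore_items`, that checks `meta.code` before reading the venue list and reports the API's own error detail:

```python
def explore_items(results):
    """Return the venue items of a Foursquare explore response, or raise a clear error.

    `results` is the decoded JSON dict returned by requests.get(...).json().
    """
    meta = results.get('meta', {})
    if meta.get('code') != 200:
        raise RuntimeError('Foursquare request failed ({}): {}'.format(
            meta.get('code'), meta.get('errorDetail', 'unknown error')))
    groups = results.get('response', {}).get('groups', [])
    # An empty 'groups' list simply means no venues matched the request
    return groups[0]['items'] if groups else []

# The quota-exceeded response recorded above
quota_response = {'meta': {'code': 429, 'errorDetail': 'Quota exceeded',
                           'errorType': 'quota_exceeded'},
                  'response': {}}
try:
    explore_items(quota_response)
except RuntimeError as err:
    print(err)  # Foursquare request failed (429): Quota exceeded
```

With this helper, `venues = explore_items(results)` would replace the direct `results['response']['groups'][0]['items']` lookup.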

In [None]:
# Alternative list of nightlife and leisure categories:
#approvedCats = ['Art Gallery', 'Art Museum', 'Bar', 'Bistro', 'Café', 'Cocktail Bar', 'Coffee Shop', 
#                'Event Space', 'Gastropub', 'Hotel', 'Hookah Bar', 'Historic Site', 'Hotel Bar', 
#                'Indie Movie Theater', 'Irish Pub', 'Lounge', 'Movie Theater', 'Museum', 
#                'Music Venue', 'Nightclub', 'Ouzeri', 'Plaza', 'Pub', 'Park', 'Other Nightlife', 
#                'Performing Arts Venue', 'Roof Deck', 'Theater', 'Whisky Bar', 'Wine Bar']
# Categories used in this run: historic and cultural sites
approvedCats = ['Historic Site', 'History Museum', 'Art Museum', 'Monument / Landmark', 'Museum']
# reset_index() on the filtered result avoids modifying a slice of nearby_venues in place
approved_venues = nearby_venues[nearby_venues['categories'].isin(approvedCats)].reset_index()
#np.unique(approved_venues.categories.values)

#pd.get_dummies(approved_venues.categories)#.value_counts())

<h1>Scrape the coordinates of world capitals</h1>

In [None]:
# Load the page source and keep only the text inside the <pre> block that holds the table
page_source = requests.get('https://www.jasom.net/list-of-capital-cities-with-latitude-and-longitude').text
table_text = page_source.split('<pre>')[1].split('</pre>')[0]

# Split the table into rows, and each row into its comma-separated cells
website_list = [row.split(',') for row in table_text.split('\n')]

# Place the cells into a dataframe and drop the first two rows, which precede the data
website_df = pd.DataFrame(website_list, columns=['Country', 'Capital', 'Latitude', 'Longitude'])
website_df.drop([0, 1], inplace=True)

# Convert the coordinates to numbers and drop incomplete rows (e.g. a trailing empty line)
for col in ['Latitude', 'Longitude']:
    website_df[col] = pd.to_numeric(website_df[col].str.strip(), errors='coerce')
website_df.dropna(subset=['Latitude', 'Longitude'], inplace=True)
website_df.reset_index(drop=True, inplace=True)

# Label each capital as "Country, Capital" and drop the separate country column
website_df['Capital'] = website_df['Country'] + ', ' + website_df['Capital']
website_df.drop(columns=['Country'], inplace=True)
website_df.head()
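Parsing the page by hand ties the scraping logic to a live download. A sketch that separates the two, under the assumption (not verified here) that the `<pre>` block holds one `Country,Capital,Latitude,Longitude` line per city; the helper `parse_capitals` is hypothetical and the sample text is invented to imitate that layout:

```python
import pandas as pd

def parse_capitals(page_source):
    """Extract capital coordinates from the first <pre> block of an HTML page."""
    table_text = page_source.split('<pre>', 1)[1].split('</pre>', 1)[0]
    rows = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.split(',')]
        if len(cells) != 4:
            continue  # skip blank or malformed lines
        country, capital, lat, lng = cells
        try:
            rows.append((country + ', ' + capital, float(lat), float(lng)))
        except ValueError:
            continue  # header line: the coordinates are not numbers
    return pd.DataFrame(rows, columns=['Capital', 'Latitude', 'Longitude'])

sample = ('<html><pre>\nCountry,Capital,Latitude,Longitude\n'
          'Greece,Athens,37.98,23.73\nFrance,Paris,48.86,2.35\n</pre></html>')
print(parse_capitals(sample))
```

Skipping rows that do not split into exactly four cells means a country name containing a comma is dropped rather than silently mis-assigned, and the parser can be tested offline on a saved copy of the page.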

In [None]:
def getSites(names, latitudes, longitudes, query):
    """Query the Foursquare explore endpoint around each location and return the venues found."""
    venues_list = []
    LIMIT = 100    # maximum number of venues requested per location
    radius = 5000  # search radius in metres around each capital

    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        urlmu = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT, query)

        # make the GET request; stop on API errors such as 429 (quota exceeded),
        # keeping the venues collected so far
        response = requests.get(urlmu).json()
        if response['meta']['code'] != 200:
            print('Request failed:', response['meta'].get('errorDetail'))
            break
        groups = response['response'].get('groups', [])
        results = groups[0]['items'] if groups else []

        # keep only the relevant information for each nearby venue;
        # venues without a category get None instead of raising an IndexError
        venues_list.extend(
            (name,
             lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results)

    return pd.DataFrame(venues_list, columns=['Capital', 'CptLat', 'CptLon',
                                              'Shop', 'ShpLat', 'ShpLon', 'ShpCat'])

In [None]:
capital_cof = getSites(names=website_df['Capital'],  # add .head(5) to test on the first five capitals
                       latitudes=website_df['Latitude'],
                       longitudes=website_df['Longitude'],
                       query='coffee')

In [None]:
#np.unique(capital_cof['ShpCat'].values)
# Number of coffee venues found around each capital, as a one-column frame named 'ShpCat'
capital_sum_cof = capital_cof.groupby('Capital').size().to_frame('ShpCat')

In [None]:
capital_tea = getSites(names=website_df['Capital'],  # add .head(5) to test on the first five capitals
                       latitudes=website_df['Latitude'],
                       longitudes=website_df['Longitude'],
                       query='tea')

In [None]:
#np.unique(capital_tea['ShpCat'].values)
# Number of tea venues found around each capital, as a one-column frame named 'ShpCat'
capital_sum_tea = capital_tea.groupby('Capital').size().to_frame('ShpCat')

In [None]:
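# Keep capitals with both coffee and tea results; RatioCT = coffee venues per tea venue.
# Each count is capped by the per-request limit (LIMIT = 100 in getSites), so busy cities may saturate.
capital_sum = capital_sum_cof.join(capital_sum_tea, how='inner', lsuffix='_coffee', rsuffix='_tea')
capital_sum['RatioCT'] = capital_sum['ShpCat_coffee'] / capital_sum['ShpCat_tea']
capital_sum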
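The count, join and ratio steps can be checked on toy data. The sketch below uses invented capitals and venue lists; it shows that the inner join keeps only capitals present in both results, so a capital with no tea venues never produces a division by zero, but it also disappears from the table without warning.

```python
import pandas as pd

# Invented search results: one row per venue found near each capital
coffee = pd.DataFrame({'Capital': ['Greece, Athens'] * 3 + ['France, Paris'] * 2,
                       'ShpCat': ['Coffee Shop'] * 5})
tea = pd.DataFrame({'Capital': ['Greece, Athens', 'Japan, Tokyo'],
                    'ShpCat': ['Tea Room'] * 2})

# Number of venues per capital, kept as a one-column frame named 'ShpCat'
cof_counts = coffee.groupby('Capital').size().to_frame('ShpCat')
tea_counts = tea.groupby('Capital').size().to_frame('ShpCat')

# Inner join on the Capital index; the suffixes disambiguate the two 'ShpCat' columns
ratio = cof_counts.join(tea_counts, how='inner', lsuffix='_coffee', rsuffix='_tea')
ratio['RatioCT'] = ratio['ShpCat_coffee'] / ratio['ShpCat_tea']
print(ratio)
```

Only Athens survives the join (3 coffee venues, 1 tea venue, ratio 3.0); Paris has no tea results and Tokyo no coffee results. An outer join with `fillna(0)` would keep them, at the cost of handling infinite ratios.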