# Purpose

This notebook is a submission for the IBM Data Science Professional Certificate. It is neither an academic resource nor a complete product.

# Introduction

You are in a new, unfamiliar area and looking to grab a coffee. Maybe you want a quiet chat, maybe you want to work, maybe you want to enjoy some good tunes; but how do you decide where to go?
With the recommender model I propose, you simply input a location and a distance range, and you are recommended one, and only one, place to go. This is great for indecisive people looking for something new to try.


# Data gathering

Since the product is supposed to help users based on their location, age, gender, and what they are looking for, we would ideally ask the user for these details.
For the purpose of this notebook, however, let us use the persona of a 22-year-old female at the Manhattan Mall, looking for somewhere quiet within 500 meters.

In [1]:
user_Location = "Manhattan mall"
user_Preference = "quiet"
user_Age = 22
user_Range = 500  # in meters

## Begin by finding the coordinates of the Manhattan Mall, then find all the cafes within the specified range

In [2]:
#First transform the location into coordinates
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="UCafe")
location = geolocator.geocode(user_Location)
coordinates = (location.latitude, location.longitude)
coordinates

(40.74915154999999, -73.98927991787946)

In [3]:
#Setup foursquare credentials
CLIENT_ID = 'insert your id here as a string' # your Foursquare ID
CLIENT_SECRET = 'insert your secret here as a string' # your Foursquare Secret
VERSION = '20200506'
LIMIT = 500


In [4]:
#Set up search query
search_query= 'Cafe,Coffee'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, location.latitude, location.longitude, VERSION, search_query, user_Range, LIMIT)
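The query string above is assembled by hand with `str.format`. As a minimal sketch of an alternative, the same URL could be built with the standard library's `urlencode`, which escapes values such as the comma in `Cafe,Coffee`. The helper name `build_search_url` and the placeholder credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import urlencode

# Hypothetical helper (not in the original notebook): builds the same
# venues/search URL and lets urlencode escape the parameter values.
def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    base = "https://api.foursquare.com/v2/venues/search"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",  # latitude,longitude of the search centre
        "v": version,
        "query": query,
        "radius": radius,      # in meters
        "limit": limit,
    }
    return f"{base}?{urlencode(params)}"

# Placeholder credentials, for illustration only
example_url = build_search_url("ID", "SECRET", 40.7491, -73.9893,
                               "20200506", "Cafe,Coffee", 500, 500)
print(example_url)
```

Keeping the parameters in a dictionary also makes it harder to pass them in the wrong order than with a long positional `format` call.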


In [5]:
#Search for results
import requests
results = requests.get(url).json()
#results

In [6]:
# assign relevant part of JSON to venues
# (pandas.io.json.json_normalize is deprecated since pandas 1.0; import it from pandas directly)
from pandas import json_normalize
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()




Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,...,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,location.crossStreet,venuePage.id,location.neighborhood
0,5d03930e1c0b34002cd41e53,Le Cafe Coffee,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",v-1588780820,False,40th Street,40.754242,-73.986532,"[{'label': 'display', 'lat': 40.754242, 'lng':...",612,...,"[40th Street, New York, NY 10018, United States]",1961430.0,https://www.seamless.com/menu/le-cafe-coffee-1...,seamless,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_seamless_20180129.png,,,
1,4da4881522a5f04d67b03a55,Cafe R,"[{'id': '4bf58dd8d48988d120951735', 'name': 'F...",v-1588780820,False,116 W 32nd St,40.748753,-73.989755,"[{'label': 'display', 'lat': 40.74875252113379...",59,...,"[116 W 32nd St (btwn 6th & 7th Ave), New York,...",,,,,,,btwn 6th & 7th Ave,,
2,4bf5663994af2d7f15a93b72,Andrews Coffee Shop,"[{'id': '4bf58dd8d48988d147941735', 'name': 'D...",v-1588780820,False,463 7th Ave,40.751499,-73.990182,"[{'label': 'display', 'lat': 40.75149915407228...",272,...,"[463 7th Ave (at W 35th St.), New York, NY 100...",,,,,,,at W 35th St.,58938570.0,
3,4b9fa61df964a520943137e3,FCB Coffee Bar,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1588780820,False,100 W 33rd St,40.748792,-73.988575,"[{'label': 'display', 'lat': 40.74879242943224...",71,...,"[100 W 33rd St, New York, NY 10001, United Sta...",,,,,,,,,
4,4ff2459fe4b0a148f89af102,Gregorys Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1588780820,False,874 Avenue of the Americas,40.747728,-73.989083,"[{'label': 'display', 'lat': 40.7477283, 'lng'...",159,...,"[874 Avenue of the Americas (at W 31st St), Ne...",,,,,,,at W 31st St,,
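Column names such as `location.distance` come from flattening the nested venue JSON into dotted keys. The following is a rough, pure-Python sketch of that idea, not the pandas implementation; the `flatten` helper is hypothetical and the sample venue is a trimmed version of the "Cafe R" row above.

```python
# Hypothetical helper illustrating how nested dictionaries can be
# flattened into dotted column names (a sketch, not pandas' own code).
def flatten(record, prefix=""):
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested dicts
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat

# Trimmed sample based on the "Cafe R" row in the output above
venue = {
    "id": "4da4881522a5f04d67b03a55",
    "name": "Cafe R",
    "location": {"address": "116 W 32nd St", "distance": 59},
}
flat_venue = flatten(venue)
print(flat_venue)
```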


### Clean the data 

In [7]:
#Use only relevant columns (copy so that later column assignments do not act on a view)
dataframe = dataframe[['id','name','location.distance']].copy()
dataframe.head()

Unnamed: 0,id,name,location.distance
0,5d03930e1c0b34002cd41e53,Le Cafe Coffee,612
1,4da4881522a5f04d67b03a55,Cafe R,59
2,4bf5663994af2d7f15a93b72,Andrews Coffee Shop,272
3,4b9fa61df964a520943137e3,FCB Coffee Bar,71
4,4ff2459fe4b0a148f89af102,Gregorys Coffee,159
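Note that Le Cafe Coffee is listed at 612, which is beyond the requested 500 meter range. Assuming `location.distance` is the distance in meters from the query point, this can be sanity-checked with a haversine calculation. The sketch below uses only the standard library; the Earth radius constant and the function name `haversine_m` are assumptions of this example.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed for this sketch)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

mall = (40.74915154999999, -73.98927991787946)  # geocoded coordinates from cell 2
le_cafe = (40.754242, -73.986532)               # Le Cafe Coffee, from the venue table
distance_m = haversine_m(*mall, *le_cafe)
print(round(distance_m))
```

The result lands within a few meters of the 612 reported by the API, so a stricter filter on `location.distance` could be applied if venues outside the range should be excluded.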


In [8]:
#Remove rows with null values (dropna returns a new dataframe, so reassign it)
dataframe = dataframe.dropna()
dataframe

Unnamed: 0,id,name,location.distance
0,5d03930e1c0b34002cd41e53,Le Cafe Coffee,612
1,4da4881522a5f04d67b03a55,Cafe R,59
2,4bf5663994af2d7f15a93b72,Andrews Coffee Shop,272
3,4b9fa61df964a520943137e3,FCB Coffee Bar,71
4,4ff2459fe4b0a148f89af102,Gregorys Coffee,159
5,4aa52d50f964a520834720e3,Stumptown Coffee Roasters,382
6,4c20b685132f0f47860ea796,Coffee Bagel Roll Cart,46
7,50707408e4b0dc1abfeddd39,Herald Square Café,164
8,4b6c8e66f964a520dc422ce3,Cafe Rustico II,329
9,49e8b08bf964a5206c651fe3,Captain Cafe,477


In [9]:
#Add placeholder columns to the dataframe for tips and the most frequent age group
import numpy as np
dataframe["Tips"]=np.nan
dataframe["MostFrequentAge"]=np.nan
dataframe.head()

Unnamed: 0,id,name,location.distance,Tips,MostFrequentAge
0,5d03930e1c0b34002cd41e53,Le Cafe Coffee,612,,
1,4da4881522a5f04d67b03a55,Cafe R,59,,
2,4bf5663994af2d7f15a93b72,Andrews Coffee Shop,272,,
3,4b9fa61df964a520943137e3,FCB Coffee Bar,71,,
4,4ff2459fe4b0a148f89af102,Gregorys Coffee,159,,


In [10]:
#For each venue id, get the tips
for index,row in dataframe.iterrows():
    venue_id=row['id']
    url = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, 1)
    results = requests.get(url).json()
    print(results)
    tips = results['response']['tips']['items']
    print(row['name'])
    print("Tips\n",tips)


{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5eb2dd35dd0f85001b3ab0f0'}, 'response': {}}


KeyError: 'tips'

**Since the API only lets me access two tips per day in total, and does not provide demographic statistics, I decided to mock them.**
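The `KeyError` above comes from indexing `results['response']['tips']` on an error payload whose `response` is empty. A more defensive way to read the payload might look like the sketch below. The helper name `extract_tips` is hypothetical, and treating `meta.code == 200` as the success signal is an assumption based on the error payload shown above.

```python
# Hypothetical helper: read tips defensively instead of indexing blindly.
def extract_tips(payload):
    """Return the list of tip items, or an empty list if the call failed."""
    if payload.get("meta", {}).get("code") != 200:  # assumed success code
        return []
    return payload.get("response", {}).get("tips", {}).get("items", [])

# The quota error returned in the cell above
quota_error = {
    'meta': {'code': 429, 'errorType': 'quota_exceeded',
             'errorDetail': 'Quota exceeded',
             'requestId': '5eb2dd35dd0f85001b3ab0f0'},
    'response': {},
}
# A made-up successful payload, for illustration only
ok_payload = {'meta': {'code': 200},
              'response': {'tips': {'items': [{'text': 'Good music'}]}}}

print(extract_tips(quota_error))  # prints []
print(extract_tips(ok_payload))
```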

In [11]:
from random import randint, choice
mock_tips=('Good music','great food', 'quiet', 'bad service','quick service','authentic coffee','original brews', 'family fun', 'work atmosphere')
df_tips =[] #to be added to the dataframe
df_age=[] #To be added to data frame
num_rows = len(dataframe.index)
for i in range(0,num_rows):
    df_tips.append(choice(mock_tips))
    temp_age=randint(10,91)
    df_age.append(temp_age-(temp_age%10)) #round down to the decade, giving age groups from 10 to 90
    
dataframe.drop('Tips',axis=1,inplace=True)
dataframe.drop('MostFrequentAge',axis=1,inplace=True)
dataframe['tips']=df_tips
dataframe['Age']=df_age


In [12]:
dataframe.head()

Unnamed: 0,id,name,location.distance,tips,Age
0,5d03930e1c0b34002cd41e53,Le Cafe Coffee,612,work atmosphere,40
1,4da4881522a5f04d67b03a55,Cafe R,59,Good music,70
2,4bf5663994af2d7f15a93b72,Andrews Coffee Shop,272,bad service,10
3,4b9fa61df964a520943137e3,FCB Coffee Bar,71,authentic coffee,80
4,4ff2459fe4b0a148f89af102,Gregorys Coffee,159,quiet,20
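Because the mock values are drawn at random, every run of the notebook produces different tips and age groups, and therefore potentially a different recommendation. A minimal sketch of making the mock data reproducible with a seeded generator follows; the function name `make_mock_rows` and the seed value are assumptions of this example.

```python
import random

mock_tips = ('Good music', 'great food', 'quiet', 'bad service', 'quick service',
             'authentic coffee', 'original brews', 'family fun', 'work atmosphere')

# Hypothetical helper: same mocking logic as above, but with a seeded generator
def make_mock_rows(num_rows, seed):
    rng = random.Random(seed)  # private generator, so the global state is untouched
    rows = []
    for _ in range(num_rows):
        age = rng.randint(10, 91)
        rows.append((rng.choice(mock_tips), age - age % 10))  # (tip, age group)
    return rows

rows_a = make_mock_rows(10, seed=42)
rows_b = make_mock_rows(10, seed=42)
print(rows_a == rows_b)  # prints True: the same seed gives the same mock data
```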


# Data analytics
Now we can fit a classifier to recommend a venue based on tips and age. Different algorithms, such as KNN, could be used here; the cells below use a decision tree.

In [13]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import preprocessing
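#Preprocess data: the features are the tip label and the age group, the target is the venue name
X = dataframe[['tips', 'Age']].values
Y = dataframe['name']
#Encode the tip labels as integers so the classifier can use them
le_tips = preprocessing.LabelEncoder()
le_tips.fit(mock_tips)
X[:,0] = le_tips.transform(X[:,0])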

In [14]:
reccomender = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
reccomender.fit(X,Y)


DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',
                       max_depth=4, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

**Finally, recommend a place to the user**

In [15]:
#Map each tip label to its integer code
le_name_mapping = dict(zip(le_tips.classes_, le_tips.transform(le_tips.classes_)))
#Encode the user's preference (an unknown preference silently falls back to code 0)
processed_user_pref=le_name_mapping.get(user_Preference,0)
reccomendation = reccomender.predict([[processed_user_pref,user_Age]])
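The mapping used here is a lookup from tip label to integer code. As far as I know, `LabelEncoder` assigns codes in sorted label order, so the same mapping can be sketched in plain Python. The version below also raises an error for an unknown preference instead of falling back to code 0; the names `tip_codes` and `encode_preference` are hypothetical.

```python
mock_tips = ('Good music', 'great food', 'quiet', 'bad service', 'quick service',
             'authentic coffee', 'original brews', 'family fun', 'work atmosphere')

# Sorted labels mapped to consecutive integers (uppercase sorts before lowercase)
tip_codes = {label: code for code, label in enumerate(sorted(set(mock_tips)))}

# Hypothetical helper: fail loudly rather than defaulting to code 0
def encode_preference(preference):
    try:
        return tip_codes[preference]
    except KeyError:
        raise ValueError(f"Unknown preference: {preference!r}") from None

print(encode_preference("quiet"))  # prints 7
```

Failing on an unknown label avoids quietly treating a typo such as "quite" as if the user had asked for 'Good music'.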


**This finally gives us a recommendation of**

In [16]:
reccomendation[0]

'Garden Cafe'
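As a point of comparison with the decision tree, the KNN idea mentioned in the Data analytics section can be sketched as a one-nearest-neighbour lookup in plain Python. This is not the method used in this notebook: the distance function, the mismatch penalty, and the names below are all assumptions, and the sample rows are only the five mocked rows shown by `dataframe.head()` earlier, so the result differs from the 'Garden Cafe' output above.

```python
# The five mocked rows shown earlier: (name, tip, age group)
sample_rows = [
    ("Le Cafe Coffee", "work atmosphere", 40),
    ("Cafe R", "Good music", 70),
    ("Andrews Coffee Shop", "bad service", 10),
    ("FCB Coffee Bar", "authentic coffee", 80),
    ("Gregorys Coffee", "quiet", 20),
]

TIP_MISMATCH_PENALTY = 100  # arbitrary weight, assumed for illustration

# Hypothetical helper: return the venue closest to the user's preference and age
def nearest_venue(preference, age, rows):
    def distance(row):
        _, tip, age_group = row
        mismatch = 0 if tip == preference else TIP_MISMATCH_PENALTY
        return mismatch + abs(age_group - age)
    return min(rows, key=distance)[0]

nearest = nearest_venue("quiet", 22, sample_rows)
print(nearest)  # prints Gregorys Coffee
```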