## Business Problem

Suppose you enjoy going to cafés, malls and restaurants. When you arrive in a new city or neighborhood, you usually like to receive suggestions for nearby venues that may interest you. This time, however, you don't want to visit yet another café, mall or restaurant: you want to discover other venues that people with preferences similar to yours usually go to, and that you therefore have a good chance of enjoying too.

In this project we will try to solve this problem by developing a recommendation system using the APIs provided by Foursquare.

In [894]:
#!conda install -c conda-forge geopy --yes
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab

In [2]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#import folium # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors
from datetime import datetime
from math import sqrt
print("**************** imported *****************")

pd.set_option('display.max_columns', None)

**************** imported *****************


## Data

To get the data we will use the APIs provided by Foursquare. These APIs give us relevant information about the venues, such as name, address and category (the kind of venue). They also allow us to explore the lists of a particular user, which users create to group places of interest.

By the end of the data collection we want to have a list of users and their places of interest, which will serve as the basis for our recommendation system. To get there we will execute the following steps:

- Search for the venues in a region - the same region where you just arrived and want to receive recommendation of venues to visit;

- From these venues we will search for the users who liked these venues or added them to their lists;

- In the last step we will analyze the lists of each user to discover their interests, preferences and build our recommendation system.

### Foursquare API call functions

In [23]:
OAUTHTOKEN = ''
VERSION = '20180605'

In [16]:
def search_venues_by_address(address):
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    lat = location.latitude
    lng = location.longitude
    print('The geograpical coordinate of {} are {}, {}.'.format(address, lat, lng))

    LIMIT = 999999
    radius = 250

    url = 'https://api.foursquare.com/v2/venues/explore?&oauth_token={}&v={}&ll={},{}&radius={}&limit={}'.format(
        OAUTHTOKEN,
        VERSION, 
        lat, 
        lng, 
        radius, 
        LIMIT)

    # make the GET request
    return requests.get(url).json()

def get_venues_result_toDf(results):
    venues=[]

    for item in results["response"]["groups"][0]["items"]:
        for category in item["venue"]["categories"]:
            venues.append((
                item["venue"]['id'],
                item["venue"]['name'],
                item["venue"]["location"].get("address"),
                item["venue"]['location'].get('postalCode'),
                item["venue"]['location'].get('distance'),
                item["venue"]['location'].get('cc'),
                item["venue"]['location'].get('city'),
                item["venue"]['location'].get('state'),
                item["venue"]['location'].get('country'),
                item["venue"]['location'].get('formattedAddress'),
                item["venue"]['location'].get('lat'),
                item["venue"]['location'].get('lng'),
                category['id'],
                category.get('name'),
                category.get('pluralName'),
                category.get('shortName'),
                category.get('primary')
            ))
            
    columns = [ 'ID', 
                'Name',
                'Address',
                'Postal Code',
                'Distance',
                'CC',
                'City',
                'State',
                'Country',
                'Formatted Address',
                'Latitude',
                'Longitude',
                'Category Id',
                'Category Name',
                'Category Plural Name',
                'Category Short Name',
                'Category Primary']

    return venues, columns

def get_venues_id(results):
    venues=[]

    for item in results["response"]["groups"][0]["items"]:
        for category in item["venue"]["categories"]:
            venues.extend((
                item["venue"]['id'],
            ))
            
    return venues

def get_people_who_like_venues_df(venues_id_list):
    
    people_who_liked_a_venue = []

    for id in venues_id_list:
        url = 'https://api.foursquare.com/v2/venues/{}/likes?&oauth_token={}&v={}'.format(
            id,
            OAUTHTOKEN,
            VERSION)

        results = requests.get(url).json()

        #items contains users who liked the venue
        if ("items" in results["response"]["likes"]):
            for people in results["response"]["likes"]["items"]:
                people_who_liked_a_venue.append((
                    id,
                    people.get('id'),
                    people.get('firstName'),
                    people.get('lastName'),
                    people.get('gender')
                ))

    columns = [ 'Venue Id',
                'People Id',
                'First Name',
                'Last Name',
                'Gender']

    return people_who_liked_a_venue, columns

def get_lists_venue_is_on(venues):
    listed = []
    for id in venues:
        url = 'https://api.foursquare.com/v2/venues/{}/listed?&oauth_token={}&v={}'.format(
            id,
            OAUTHTOKEN,
            VERSION)

        results = requests.get(url).json()

        for group in results["response"]["lists"]["groups"]:
            for item in group["items"]:
                listed.append((
                    id,
                    group.get("type"),
                    group.get("name"),
                    item.get("id"),
                    item.get("name"),
                    item.get("description"),
                    item.get("type"),
                    item.get("public"),
                    item.get("collaborative"),
                    item.get("canonicalUrl"),
                    item.get("createdAt"),
                    item.get("updatedAt"),
                    item["followers"].get("count"),
                    item["user"].get("id"),
                    item["user"].get("firstName"),
                    item["user"].get("lastName"),
                    item["user"].get("gender"),

                ))
                
    columns= [  'Venue Id',
                'Group Type',
                'Group Name',
                'List ID',
                'List Name',
                'List Description',
                'List Type',
                'List is Public',
                'List is Collaborative',
                'List URL',
                'List Created At',
                'List Updated At',
                'List Followers Count',
                'List Creator ID',
                'List Creator First Name',
                'List Creator Last Name',
                'List Creator Gender']
    
    return listed, columns

def get_lists_details(listId):
    lists_details = []
    list_followers = []

    url = 'https://api.foursquare.com/v2/lists/{}?&oauth_token={}&v={}'.format(
        listId,
        OAUTHTOKEN,
        VERSION)

    results = requests.get(url).json()

    if ("items" in results["response"]["list"]["followers"]):
        for follower in results["response"]["list"]["followers"]["items"]:
            list_followers.append((
                listId,
                follower.get("id"),
                follower.get("firstName"),
                follower.get("lastName"),
                follower.get("gender"),
                follower.get("homeCity"),
                follower.get("bio")
            ))

    listName = results["response"]["list"].get("name")
    listDescription = results["response"]["list"].get("description")
    listType = results["response"]["list"].get("type")
    listUserId = results["response"]["list"]["user"].get("id")
    listUserFirstName = results["response"]["list"]["user"].get("firstName")
    listUserLastName = results["response"]["list"]["user"].get("lastName")
    listUserGender = results["response"]["list"]["user"].get("gender")
    listIsPublic = results["response"]["list"].get("public")
    listUrl = results["response"]["list"].get("canonicalUrl")
    listCreatedAt = results["response"]["list"].get("createdAt")
    listUpdatedAt = results["response"]["list"].get("updatedAt")

    if listCreatedAt is not None:
        listCreatedAt = datetime.fromtimestamp(listCreatedAt)

    if listUpdatedAt is not None:
        listUpdatedAt = datetime.fromtimestamp(listUpdatedAt)

    for item in results["response"]["list"]["listItems"]["items"]:
        if ("venue" in item):
            for category in item["venue"]["categories"]:
                if ("tip" in item):
                    tipId = item["tip"].get("id")
                    tipText = item["tip"].get("text")
                    tipAgreeCount = item["tip"].get("agreeCount")
                    tipDisagreeCount = item["tip"].get("disagreeCount")
                    tipUserId = item["tip"]["user"].get("id")
                    tipUserFirstName = item["tip"]["user"].get("firstName")
                    tipUserGender = item["tip"]["user"].get("gender")
                    tipUserType = item["tip"]["user"].get("type")
                else:
                    tipId = 0
                    tipText = ""
                    tipAgreeCount = 0
                    tipDisagreeCount = 0
                    tipUserId = 0
                    tipUserFirstName = ""
                    tipUserGender = ""
                    tipUserType = ""
                lists_details.append((
                    listId,
                    listName,
                    listDescription,
                    listType,
                    listUserId,
                    listUserFirstName,
                    listUserLastName,
                    listUserGender,
                    listIsPublic,
                    listUrl,
                    listCreatedAt,
                    listUpdatedAt,
                    item["venue"].get('id'),
                    item["venue"].get('name'),
                    item["venue"]["location"].get("address"),
                    item["venue"]['location'].get('postalCode'),
                    item["venue"]['location'].get('distance'),
                    item["venue"]['location'].get('cc'),
                    item["venue"]['location'].get('city'),
                    item["venue"]['location'].get('state'),
                    item["venue"]['location'].get('country'),
                    item["venue"]['location'].get('formattedAddress'),
                    item["venue"]['location'].get('lat'),
                    item["venue"]['location'].get('lng'),
                    category['id'],
                    category.get('name'),
                    category.get('pluralName'),
                    category.get('shortName'),
                    category.get('primary'),
                    tipId,
                    tipText,
                    tipAgreeCount,
                    tipDisagreeCount,
                    tipUserId,
                    tipUserFirstName,
                    tipUserGender,
                    tipUserType
                ))

    list_followers_columns= ['List Id',
                            'Follower ID',
                            'Follower First Name',
                            'Follower Last Name',
                            'Follower Gender',
                            'Follower Home City',
                            'Follower Bio']

    lists_details_columns = ['List Id',
                            'List Name',
                            'List Description',
                            'List Type',
                            'List User ID',
                            'List User First Name',
                            'List User Last Name',
                            'List User Gender',
                            'List Is Public',
                            'List URL',
                            'List Created At',
                            'List Updated At',
                            'Venue ID',
                            'Venue Name',
                            'Venue Address',
                            'Venue Postal Code',
                            'Venue Distance',
                            'Venue CC',
                            'Venue City',
                            'Venue State',
                            'Venue Country',
                            'Venue Formatted Address',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category ID',
                            'Venue Category Name',
                            'Venue Category Plural Name',
                            'Venue Category Short Name',
                            'Venue Category Primary',
                            'Tip ID',
                            'Tip Text',
                            'Tip Agree Count',
                            'Tip Disagree Count',
                            'Tip User ID',
                            'Tip User First Name',
                            'Tip User Gender',
                            'Tip User Type']

    return list_followers, list_followers_columns, lists_details, lists_details_columns


def get_users_lists(users):
    listed = []
    for id in users:
        url = 'https://api.foursquare.com/v2/users/{}/lists?&oauth_token={}&v={}'.format(
            id,
            OAUTHTOKEN,
            VERSION)

        results = requests.get(url).json()

        for group in results["response"]["lists"]["groups"]:
            for item in group["items"]:
                listed.append((
                    id,
                    group.get("type"),
                    group.get("name"),
                    item.get("id"),
                    item.get("name"),
                    item.get("description"),
                    item.get("type")
                ))
                
    columns= [  'User Id',
                'Group Type',
                'Group Name',
                'List ID',
                'List Name',
                'List Description',
                'List Type']
    
    return listed, columns
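
All of the functions above build their request URLs by hand with `str.format`. As a minimal sketch of an alternative, the query string can be assembled and encoded with the standard library. The helper name `build_foursquare_url` and the token `MY_TOKEN` are hypothetical placeholders, not part of the original notebook; only the endpoint path and the parameter names (`oauth_token`, `v`, `ll`, `radius`, `limit`) are taken from the code above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2"


def build_foursquare_url(path, token, version, **params):
    """Return a Foursquare v2 URL with an encoded query string (hypothetical helper)."""
    query = {"oauth_token": token, "v": version}
    query.update(params)
    return "{}/{}?{}".format(BASE_URL, path.strip("/"), urlencode(query))


# Same request shape as search_venues_by_address, with placeholder credentials
explore_url = build_foursquare_url(
    "venues/explore", "MY_TOKEN", "20180605",
    ll="-19.9352,-43.9344", radius=250, limit=50,
)
print(explore_url)
```

Keeping the parameters in a dictionary avoids the stray `?&` in the hand-built URLs and makes it harder to mix up the positional arguments of `format`.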

#### Searching for existing venues in the region

Let's suppose you have just arrived at your destination and ask Foursquare to recommend nearby venues that may interest you.

Our first step is to get from the Foursquare API a list of all nearby venues within a specific radius of the address, 250 meters in this case.

In [5]:
results = search_venues_by_address('Savassi, BH')
list_venues_id = get_venues_id(results)
venues_details, columns = get_venues_result_toDf(results)
venues_result_details_Df = pd.DataFrame(venues_details)
venues_result_details_Df.columns = columns
print("Found places: ", venues_result_details_Df.shape[0])
venues_result_details_Df.head()

The geograpical coordinate of Savassi, BH are -19.9352205, -43.934446340197454.
Found places:  68


Unnamed: 0,ID,Name,Address,Postal Code,Distance,CC,City,State,Country,Formatted Address,Latitude,Longitude,Category Id,Category Name,Category Plural Name,Category Short Name,Category Primary
0,527a61ee11d270266b7786af,Casa Amora,"R. Paraíba, 941",30130-141,159,BR,Belo Horizonte,MG,Brasil,"[R. Paraíba, 941, Belo Horizonte, MG, 30130-141]",-19.934666,-43.933046,4bf58dd8d48988d16b941735,Brazilian Restaurant,Brazilian Restaurants,Brazilian,True
1,4c432d315faf76b0efe04820,Brilhantina Brechó,R Tomé de Souza 821 lj 3,30140-130,209,BR,Belo Horizonte,MG,Brasil,"[R Tomé de Souza 821 lj 3, Belo Horizonte, MG,...",-19.937087,-43.934669,4bf58dd8d48988d116951735,Antique Shop,Antique Shops,Antiques,True
2,54f5e636498eb1d2a634d987,Santa Rita,"R. Santa Rita Durão, 999",,97,BR,Belo Horizonte,MG,Brasil,"[R. Santa Rita Durão, 999 (R. Pernambuco), Bel...",-19.934479,-43.934937,52e81612bcbc57f1066b79f4,Buffet,Buffets,Buffet,True
3,57437527498e2cd167c53dbf,O Vegano,"R. Sta. Rita Durão, 985A",,96,BR,Belo Horizonte,MG,Brasil,"[R. Sta. Rita Durão, 985A (R. Pernambuco), Bel...",-19.934432,-43.934831,4bf58dd8d48988d1d3941735,Vegetarian / Vegan Restaurant,Vegetarian / Vegan Restaurants,Vegetarian / Vegan,True
4,5479c66a498e141c4cd00d2d,The Box CrossFit,"R. dos Inconfidentes, 911",30140-120,77,BR,Belo Horizonte,MG,Brasil,"[R. dos Inconfidentes, 911, Belo Horizonte, MG...",-19.935889,-43.934237,4bf58dd8d48988d175941735,Gym / Fitness Center,Gyms or Fitness Centers,Gym / Fitness,True


#### Finding the users who liked the venues found previously

After getting the list of venues in the region, we now need to find the users who have shown interest in them.

First we will discover the users who liked these venues.

This is valuable information, but we can only see the likes of users who have made this information public, and only a few users do so.
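
Because many venues expose no public likes, the parsing step has to cope with responses where the `items` key is missing. The sketch below shows one defensive way to do that with chained `.get` calls. The function name `extract_likers` and the three sample payloads are hypothetical and only mimic the response shape that `get_people_who_like_venues_df` reads.

```python
def extract_likers(venue_id, payload):
    """Return (venue_id, user_id, first_name, last_name, gender) tuples from a likes payload."""
    likes = payload.get("response", {}).get("likes", {})
    return [
        (venue_id, user.get("id"), user.get("firstName"),
         user.get("lastName"), user.get("gender"))
        for user in likes.get("items", [])
    ]


# Hypothetical payloads, shaped like the responses parsed above
public_payload = {"response": {"likes": {"count": 1, "items": [{"id": "1", "firstName": "Ana"}]}}}
private_payload = {"response": {"likes": {"count": 3}}}   # likes exist but are not public
empty_payload = {"response": {}}                          # e.g. a failed request

for payload in (public_payload, private_payload, empty_payload):
    print(extract_likers("venue-1", payload))
```

Unlike direct indexing with `results["response"]["likes"]`, this version returns an empty list instead of raising `KeyError` when the response carries no likes at all.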

In [6]:
list_users, columns = get_people_who_like_venues_df(list_venues_id)

In [7]:
people_who_liked_a_venue_Df = pd.DataFrame(list_users)
people_who_liked_a_venue_Df.columns = columns
print("Total of people: ", people_who_liked_a_venue_Df.shape[0])
people_who_liked_a_venue_Df.head(10)

Total of people:  33


Unnamed: 0,Venue Id,People Id,First Name,Last Name,Gender
0,4e78da431495f00a427569a5,58828058,Gesmari Zen,Taborda,female
1,4e78da431495f00a427569a5,72125320,Humberto,De Souza Faria,male
2,4e78da431495f00a427569a5,45737462,Cris,Oya,female
3,4bf6a42dabdaef3b324fa184,50957912,Sirley,Rufino,female
4,4bf6a42dabdaef3b324fa184,20580510,Augusto,Ribeiro,male
5,4bf6a42dabdaef3b324fa184,46820502,Robson,Paiva,male
6,5479d8fb498e61d2bc6d8269,93708804,Jose,carlos,male
7,5479d8fb498e61d2bc6d8269,35593244,Tharik,Ursine,male
8,5479d8fb498e61d2bc6d8269,128786548,Priscila,CnrCnr,female
9,4c485ef2972c0f4724c12621,25527245,Euri,Cruz,male


#### Finding the users who added these venues to their lists

Next we will discover the users who added these venues to their lists.

In [8]:
lists, columns = get_lists_venue_is_on(list_venues_id)

In [9]:
list_venue_is_on_Df = pd.DataFrame(lists)
list_venue_is_on_Df.columns= columns
print("Number of lists: ", list_venue_is_on_Df.shape[0])
list_venue_is_on_Df.head()

Number of lists:  95


Unnamed: 0,Venue Id,Group Type,Group Name,List ID,List Name,List Description,List Type,List is Public,List is Collaborative,List URL,List Created At,List Updated At,List Followers Count,List Creator ID,List Creator First Name,List Creator Last Name,List Creator Gender
0,527a61ee11d270266b7786af,others,Lists from other people,4e88d23a754a51bcbc7d424c,"Top 10 favorites places in Belo Horizonte, Brasil",,others,True,False,https://foursquare.com/litzamattos/list/top-10...,1317589562,1480038483,2,6635090,Litza,Mattos,female
1,527a61ee11d270266b7786af,others,Lists from other people,50c3af5be4b076c8e409f5f4,Lugares para não esquecer de ir!!,,others,True,False,https://foursquare.com/user/8320936/list/lugar...,1355001691,1402628261,10,8320936,Larissa,Amaral Giori,female
2,4c432d315faf76b0efe04820,others,Lists from other people,569aa7eb498e703a36353d5f,Belo Horizonte,,others,True,False,https://foursquare.com/cepriana/list/belo-hori...,1452976107,1455412573,1,19059492,Maria Felisbela,Cepriana,female
3,4c432d315faf76b0efe04820,others,Lists from other people,56a8c9e8498eaa3e5a03483d,Guia Slow BH | Moda,,others,True,False,https://foursquare.com/reviewslow/list/guia-sl...,1453902312,1457527908,0,152455785,Review,Slow Living,female
4,54f5e636498eb1d2a634d987,others,Lists from other people,54f5e614498e891f99598203,Almoço Savassi,,others,True,False,https://foursquare.com/arturhoo/list/almo%C3%A...,1425401364,1456146126,0,591078,Artur,Rodrigues,male


At this point we have the list of people who liked the venues in the region and the list of people who added these venues to their lists.

Now we are going to analyze the tastes and preferences of these people using their lists.

Foursquare allows us to explore the lists created by users, and from each list we will retrieve the venues and each venue's category (the category tells us whether the venue is a bar, restaurant, pub and so on).

In [10]:
users_list = list(list_venue_is_on_Df["List Creator ID"])
users_list.extend((list(people_who_liked_a_venue_Df["People Id"])))
users_list = list(set(users_list))
print("Total users collected: ", len(users_list))

Total users collected:  109


In [17]:
user_lists, columns = get_users_lists(users_list)

In [18]:
user_lists_Df = pd.DataFrame(user_lists)
user_lists_Df.columns = columns
print("Total user lists: ", user_lists_Df.shape[0])
user_lists_Df.head(10)

Total user lists:  1218


Unnamed: 0,User Id,Group Type,Group Name,List ID,List Name,List Description,List Type
0,9347429,yours,Your Places,9347429/todos,Willian's Saved Places,,todos
1,9347429,yours,Your Places,9347429/venuelikes,Willian’s Liked Places,,liked
2,9347429,created,Lists Willian Created,53606e7f11d2ce653fbe44ca,Willian Max Corp.,,
3,9347429,followed,Lists Willian Follows,538a50fc498e73ff6f0ac488,Locais,Sugestões de locais para a realização de um Ua...,
4,44594,yours,Your Places,44594/todos,Leonardo's Saved Places,,todos
5,44594,yours,Your Places,44594/venuelikes,Leonardo’s Liked Places,,liked
6,44594,created,Lists Leonardo Created,51d7a444498e6fad4b747484,Food,,
7,44594,created,Lists Leonardo Created,51d7a47d498e9d28e2a8c0e7,Coffee,,
8,44594,created,Lists Leonardo Created,545e5e15498ec969ba7bf6ed,Conhecer,,
9,44594,created,Lists Leonardo Created,56254d7f498e6e3ca5adfc88,Burger,,


Now we are going to analyze the details of each list.

In [None]:
lists_ids = list(set(user_lists_Df["List ID"]))
print(len(lists_ids))
list_followers = []
list_followers_columns = []
lists_details = []
lists_details_columns = []
for list_id in lists_ids:
    a, b, c, d = get_lists_details(list_id)
    list_followers.extend(a)
    list_followers_columns = b
    lists_details.extend(c)
    lists_details_columns = d
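
The loop above makes one API call per list, so a single unexpected response (for example one without a `list` key) stops the whole collection with a `KeyError`. A minimal sketch of a more tolerant loop follows. The name `collect_list_details` and the stub `fake_fetch` are hypothetical; in the notebook the `fetch_details` argument would be `get_lists_details`. Network errors raised by `requests` are not handled here.

```python
def collect_list_details(list_ids, fetch_details):
    """Call fetch_details for every list ID, recording IDs whose response cannot be parsed."""
    followers, details, failed = [], [], []
    follower_cols, detail_cols = [], []
    for position, list_id in enumerate(list_ids, start=1):
        try:
            f_rows, f_cols, d_rows, d_cols = fetch_details(list_id)
        except (KeyError, TypeError, ValueError) as error:
            # Remember the failure and keep going with the next list
            failed.append((list_id, repr(error)))
            continue
        followers.extend(f_rows)
        details.extend(d_rows)
        follower_cols, detail_cols = f_cols, d_cols
        if position % 100 == 0:
            print("Processed {} of {} lists".format(position, len(list_ids)))
    return followers, follower_cols, details, detail_cols, failed


def fake_fetch(list_id):
    """Hypothetical stand-in for get_lists_details, used only to exercise the loop."""
    if list_id == "bad":
        raise KeyError("list")
    return [(list_id, "follower")], ["List Id", "Follower ID"], [(list_id, "venue")], ["List Id", "Venue ID"]


followers, follower_cols, details, detail_cols, failed = collect_list_details(
    ["a", "bad", "b"], fake_fetch
)
print(len(details), "detail rows,", len(failed), "failed list(s)")
```

Returning the failed IDs separately makes it possible to retry only those lists later instead of repeating every request.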

In [15]:
list_followers_Df = pd.DataFrame(list_followers)
list_followers_Df.columns = list_followers_columns

lists_Df = pd.DataFrame(lists_details)
lists_Df.columns = lists_details_columns
print("Venues found: ", lists_Df.shape[0])
lists_Df.head()

Venues found:  17459


Unnamed: 0,List Id,List Name,List Description,List Type,List User ID,List User First Name,List User Last Name,List User Gender,List Is Public,List URL,List Created At,List Updated At,Venue ID,Venue Name,Venue Address,Venue Postal Code,Venue Distance,Venue CC,Venue City,Venue State,Venue Country,Venue Formatted Address,Venue Latitude,Venue Longitude,Venue Category ID,Venue Category Name,Venue Category Plural Name,Venue Category Short Name,Venue Category Primary,Tip ID,Tip Text,Tip Agree Count,Tip Disagree Count,Tip User ID,Tip User First Name,Tip User Gender,Tip User Type
0,50d1dac4e4b0a239c751811b,Estilo,,others,28496588,Luiza,Sá,female,True,https://foursquare.com/formeness/list/estilo,2012-12-19 12:18:28,2013-01-05 14:47:28,4e78da431495f00a427569a5,Mercado,"R. Pernambuco, 767",30130-151,,BR,Belo Horizonte,MG,Brasil,"[R. Pernambuco, 767, Belo Horizonte, MG, 30130...",-19.934062,-43.934296,4bf58dd8d48988d108951735,Women's Store,Women's Stores,Women's Store,True,0,,0,0,0,,,
1,50d1dac4e4b0a239c751811b,Estilo,,others,28496588,Luiza,Sá,female,True,https://foursquare.com/formeness/list/estilo,2012-12-19 12:18:28,2013-01-05 14:47:28,4be9cfc418389521c4c30acf,Mercado,"R. Paraíba, 1385",,,BR,Belo Horizonte,MG,Brasil,"[R. Paraíba, 1385, Belo Horizonte, MG]",-19.938463,-43.934237,4bf58dd8d48988d102951735,Accessories Store,Accessories Stores,Accessories,True,0,,0,0,0,,,
2,51f347b1498e12f51615d469,Comida Ogra | BH,,others,19533969,Frank,Martins,male,True,https://foursquare.com/frankmartins/list/comid...,2013-07-27 01:08:17,2014-10-04 12:17:16,50e89d57e4b064546cd83f38,Elvis King Pub,R. Santa Rita Durão 309,30140-110,,BR,Belo Horizonte,MG,Brasil,"[R. Santa Rita Durão 309 (Av. Afonso Pena), Be...",-19.936165,-43.9284,4bf58dd8d48988d155941735,Gastropub,Gastropubs,Gastropub,True,0,,0,0,0,,,
3,51f347b1498e12f51615d469,Comida Ogra | BH,,others,19533969,Frank,Martins,male,True,https://foursquare.com/frankmartins/list/comid...,2013-07-27 01:08:17,2014-10-04 12:17:16,4cf58967665854814d66c498,Beco Do Vinil,Perto da PUC,,,BR,Belo Horizonte,MG,Brasil,"[Perto da PUC, Belo Horizonte, MG]",-19.925293,-43.991481,4bf58dd8d48988d116941735,Bar,Bars,Bar,True,0,,0,0,0,,,
4,51f347b1498e12f51615d469,Comida Ogra | BH,,others,19533969,Frank,Martins,male,True,https://foursquare.com/frankmartins/list/comid...,2013-07-27 01:08:17,2014-10-04 12:17:16,4cde231eaba88cfa55f13fd7,"Nonô "" Rei Do Caldo De Mocotó""","Av. Amazonas, 840 - centro",,,BR,Belo Horizonte,MG,Brasil,"[Av. Amazonas, 840 - centro, Belo Horizonte, MG]",-19.920964,-43.94225,4bf58dd8d48988d16b941735,Brazilian Restaurant,Brazilian Restaurants,Brazilian,True,0,,0,0,0,,,


The DataFrame above gives us detailed information about each user's preferences. In our recommendation system we will mainly use the user and the venue category.

This concludes our data collection process. Now we are going to start developing our recommendation system.


## Methodology

To solve the proposed problem, let's build a collaborative recommendation system. One of the main advantages of this approach is that it takes into account the preferences of other users with similar tastes (similarity between users), and that it adapts to the user's own preferences, which can change over time.

In the first step we explored the nearby venues and, based on these venues, discovered the preferences of the users who frequent them.

The next step is to build the recommendation system on top of the data collected previously.

As the last step we will show the nearby venues suggested by the recommendation system.


## Analysis

Now let's define the preferences of the user who will receive the recommendation. Suppose this user likes bars, Italian restaurants, pubs, coffee shops and steakhouses. We give each of these kinds of venue (categories) a rating between 1 and 5, where 5 means the user likes it the most.

In [59]:
userInput = pd.DataFrame([("Coffee Shop", 3.5), 
                          ("Italian Restaurant", 5.0),
                          ("Steakhouse", 4.0),
                          ("Pub", 5.0),
                          ("Bar", 4.0)
                         ])
userInput.columns = ["Venue Category Name","Total"]
userInput

Unnamed: 0,Venue Category Name,Total
0,Coffee Shop,3.5
1,Italian Restaurant,5.0
2,Steakhouse,4.0
3,Pub,5.0
4,Bar,4.0


Now we need to know which venue categories each user likes the most. For this we will count how many times each category appears in the lists the user created.

In [19]:
# Work on an explicit copy so that adding the "Total" column does not raise SettingWithCopyWarning
user_categories = lists_Df[['List User ID','Venue Category Name']].copy()
user_categories["Total"] = 1
user_categories_rating = user_categories.groupby(["List User ID", "Venue Category Name"]).count().reset_index()
user_categories_rating.head(20)


Unnamed: 0,List User ID,Venue Category Name,Total
0,1087955,Fruit & Vegetable Store,1
1,1087955,Shopping Mall,8
2,110781253,Argentinian Restaurant,5
3,110781253,Art Gallery,4
4,110781253,Art Museum,10
5,110781253,Asian Restaurant,1
6,110781253,BBQ Joint,1
7,110781253,Bakery,4
8,110781253,Bar,15
9,110781253,Beer Bar,5


If we look at the following user, we can see that they mostly favor bars and restaurants.

In [20]:
user_categories_rating[user_categories_rating["List User ID"] == "9677872"].sort_values(by="Total", ascending=False)

Unnamed: 0,List User ID,Venue Category Name,Total
5617,9677872,Restaurant,20
5601,9677872,Italian Restaurant,18
5581,9677872,Brazilian Restaurant,15
5578,9677872,Bar,14
5614,9677872,Pizza Place,9
5583,9677872,Burger Joint,7
5593,9677872,French Restaurant,7
5595,9677872,Gastropub,5
5588,9677872,Deli / Bodega,4
5616,9677872,Pub,4


We can also see that there is a wide range between the most and the least frequent categories, which can distort the similarity index we calculate later. To address this we apply a simple normalization that rescales each user's counts so that their most frequent category gets the maximum value of 5.
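
Written as a formula, with notation introduced here only for illustration ($n_{u,c}$ is the number of times category $c$ appears in the lists of user $u$, and $r_{u,c}$ is the resulting rating):

$$
r_{u,c} = 5 \cdot \frac{n_{u,c}}{\max_{c'} n_{u,c'}}
$$

For the user shown above, the most frequent category is Restaurant with 20 entries, so Italian Restaurant with 18 entries gets $5 \cdot 18 / 20 = 4.5$ and Pub with 4 entries gets $5 \cdot 4 / 20 = 1.0$.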

In [21]:
def normalize(df):
    df = df.sort_values(by="Total", ascending=False)
    maxTotal = df.iloc[0]["Total"]
    df["Total"] = df["Total"] * 5 / maxTotal
    return df

In [22]:
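users_distincts = list(user_categories_rating["List User ID"].unique())

# Cast to float first so the scaled ratings are not written into an integer column
user_categories_rating["Total"] = user_categories_rating["Total"].astype(float)

for user in users_distincts:
    # Rescale this user's counts so that their most frequent category is rated 5
    mask = user_categories_rating["List User ID"] == user
    user_categories_rating.loc[mask, "Total"] = normalize(user_categories_rating[mask])["Total"]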
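
The per-user loop works, but it filters the whole DataFrame once for every user. As a sketch of an alternative, the same scaling can be done in one pass with `groupby(...).transform("max")`. The small `ratings` frame below is made-up sample data with the same column names as `user_categories_rating`, and the result is written to a separate, hypothetical `Rating` column so the raw counts stay visible.

```python
import pandas as pd

# Made-up sample with the same columns as user_categories_rating
ratings = pd.DataFrame({
    "List User ID": ["a", "a", "a", "b", "b"],
    "Venue Category Name": ["Restaurant", "Bar", "Pub", "Bar", "Park"],
    "Total": [20, 14, 4, 3, 6],
})

# For every row, look up the largest count of the user that row belongs to
max_per_user = ratings.groupby("List User ID")["Total"].transform("max")

# Scale so that each user's most frequent category is rated 5
ratings["Rating"] = ratings["Total"] * 5 / max_per_user
print(ratings)
```

`transform` returns a Series aligned with the original rows, which is what allows the division to be written as a single column operation.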


In [34]:
user_categories_rating[user_categories_rating["List User ID"] == "9677872"].sort_values(by="Total", ascending=False)

Unnamed: 0,List User ID,Venue Category Name,Total
5617,9677872,Restaurant,5.0
5601,9677872,Italian Restaurant,4.5
5581,9677872,Brazilian Restaurant,3.75
5578,9677872,Bar,3.5
5614,9677872,Pizza Place,2.25
5583,9677872,Burger Joint,1.75
5593,9677872,French Restaurant,1.75
5595,9677872,Gastropub,1.25
5588,9677872,Deli / Bodega,1.0
5616,9677872,Pub,1.0


To better visualize the data we have so far, let's check the 10 most common categories for each user.

In [24]:
user_categories_onehot = pd.get_dummies(user_categories[['Venue Category Name']], prefix="", prefix_sep="")
user_categories_onehot["List User ID"] = user_categories["List User ID"]
user_categories_onehot_grouped = user_categories_onehot.groupby('List User ID').mean().reset_index()

In [25]:
def return_most_common_categories(row, num_top_categories):
    # Skip the first entry ('List User ID') and sort the categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    top_categories = row_categories_sorted.iloc[:num_top_categories]

    # Use "-" as a placeholder for categories that never appear in the user's lists
    return [category if frequency != 0 else "-" for category, frequency in top_categories.items()]

In [26]:
num_top_categories = 10

columns = ['List User ID']
for ind in np.arange(num_top_categories):
    columns.append('#{} Most Common Categories'.format(ind+1))

# create a new dataframe
user_categories_onehot_sorted = pd.DataFrame(columns=columns)
user_categories_onehot_sorted['List User ID'] = user_categories_onehot_grouped['List User ID']

for ind in np.arange(user_categories_onehot_sorted.shape[0]):
    user_categories_onehot_sorted.iloc[ind, 1:] = return_most_common_categories(user_categories_onehot_grouped.iloc[ind, :], num_top_categories)

user_categories_onehot_sorted.head(20)

Unnamed: 0,List User ID,#1 Most Common Categories,#2 Most Common Categories,#3 Most Common Categories,#4 Most Common Categories,#5 Most Common Categories,#6 Most Common Categories,#7 Most Common Categories,#8 Most Common Categories,#9 Most Common Categories,#10 Most Common Categories
0,1087955,Shopping Mall,Fruit & Vegetable Store,-,-,-,-,-,-,-,-
1,110781253,Coffee Shop,Café,Cocktail Bar,Bar,Art Museum,Brewery,Sandwich Place,Ice Cream Shop,Restaurant,Pizza Place
2,113020858,Restaurant,Coffee Shop,Italian Restaurant,Brazilian Restaurant,Deli / Bodega,Baiano Restaurant,Burger Joint,Salad Place,Dessert Shop,Bar
3,11327576,Vegetarian / Vegan Restaurant,Hotel,Plaza,Shopping Mall,Brazilian Restaurant,Campground,Park,Neighborhood,Other Great Outdoors,Bakery
4,11442102,Bar,Brazilian Restaurant,Neighborhood,Fast Food Restaurant,Shopping Mall,Pharmacy,Snack Place,Bookstore,Restaurant,Clothing Store
5,115099,Bar,Coffee Shop,Ice Cream Shop,Brazilian Restaurant,Middle Eastern Restaurant,Burger Joint,French Restaurant,Dessert Shop,Cultural Center,Breakfast Spot
6,11607917,Café,Bakery,Brazilian Restaurant,Coffee Shop,Gourmet Shop,Creperie,Tapiocaria,Bistro,-,-
7,11823338,Italian Restaurant,Café,Coffee Shop,Brazilian Restaurant,Burger Joint,Pizza Place,Bar,Park,Dessert Shop,Restaurant
8,12019100,Japanese Restaurant,Italian Restaurant,Restaurant,Brazilian Restaurant,Bar,Sushi Restaurant,Ice Cream Shop,Asian Restaurant,Hotel,Cocktail Bar
9,12050756,Café,Coffee Shop,Italian Restaurant,Pizza Place,Restaurant,Brazilian Restaurant,Bistro,Snack Place,Mineiro Restaurant,Pastry Shop


Now let's compute the correlation between the input user and every other user, and use it as a similarity index.

In [42]:
userSubset = user_categories_rating[user_categories_rating['Venue Category Name'].isin(userInput['Venue Category Name'])]
userSubset.head()

Unnamed: 0,List User ID,Venue Category Name,Total
8,110781253,Bar,1.25
19,110781253,Burger Joint,0.083333
38,110781253,Ice Cream Shop,0.666667
56,110781253,Pizza Place,0.5
71,113020858,Bar,1.0


In [43]:
#Group by a single key (not a list) so each group name is the plain user ID rather than a 1-tuple
userSubsetGroup = userSubset.groupby('List User ID')
userSubsetGroup = sorted(userSubsetGroup,  key=lambda x: len(x[1]), reverse=True)

In [44]:
#Store the Pearson Correlation in a dictionary, where the key is the user Id and the value is the coefficient
pearsonCorrelationDict = {}

#For every user group in our subset
for name, group in userSubsetGroup:
    #Let's start by sorting the input and current user group so the values aren't mixed up later on
    group = group.sort_values(by='Venue Category Name')
    userInput = userInput.sort_values(by='Venue Category Name')
    #Get the N for the formula
    nRatings = len(group)
    #Get the input user's scores for the venue categories that both users have in common
    temp_df = userInput[userInput['Venue Category Name'].isin(group['Venue Category Name'].tolist())]
    #And then store them in a temporary buffer variable in a list format to facilitate future calculations
    tempRatingList = temp_df['Total'].tolist()
    #Let's also put the current user group's scores in a list format
    tempGroupList = group['Total'].tolist()
    #Now let's calculate the pearson correlation between two users, so called, x and y
    Sxx = sum([i**2 for i in tempRatingList]) - pow(sum(tempRatingList),2)/float(nRatings)
    Syy = sum([i**2 for i in tempGroupList]) - pow(sum(tempGroupList),2)/float(nRatings)
    Sxy = sum( i*j for i, j in zip(tempRatingList, tempGroupList)) - sum(tempRatingList)*sum(tempGroupList)/float(nRatings)
    
    #If both variances are positive, divide; otherwise the correlation is undefined and we store 0.
    if Sxx > 0 and Syy > 0:
        pearsonCorrelationDict[name] = Sxy/sqrt(Sxx*Syy)
    else:
        pearsonCorrelationDict[name] = 0
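
As a sanity check on the loop above, here is a minimal sketch that computes the same Sxx/Syy/Sxy sums for a single pair of users and compares the result with NumPy's `np.corrcoef`. The helper name `pearson_from_sums` and the score values are hypothetical, chosen only for illustration; they are not taken from the dataset.

```python
from math import sqrt

import numpy as np


def pearson_from_sums(x, y):
    """Pearson correlation from the Sxx, Syy and Sxy sums (hypothetical helper)."""
    n = len(x)
    sxx = sum(i ** 2 for i in x) - sum(x) ** 2 / n
    syy = sum(j ** 2 for j in y) - sum(y) ** 2 / n
    sxy = sum(i * j for i, j in zip(x, y)) - sum(x) * sum(y) / n
    # A constant vector has zero variance, so the correlation is undefined;
    # return 0 in that case, as the notebook does.
    if sxx > 0 and syy > 0:
        return sxy / sqrt(sxx * syy)
    return 0.0


# Illustrative scores for four shared venue categories (made-up values)
input_scores = [1.0, 0.5, 2.0, 1.5]
other_scores = [1.25, 0.083333, 0.666667, 0.5]

r_manual = pearson_from_sums(input_scores, other_scores)
r_numpy = np.corrcoef(input_scores, other_scores)[0, 1]
print(round(r_manual, 6), round(r_numpy, 6))
```

The two values should agree up to floating-point error. Note that with very few shared categories the coefficient is unstable; with only two shared categories it is always -1, 0 or 1.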


In [45]:
pearsonDF = pd.DataFrame.from_dict(pearsonCorrelationDict, orient='index')
pearsonDF.columns = ['similarityIndex']
pearsonDF['userId'] = pearsonDF.index
pearsonDF.index = range(len(pearsonDF))

In [46]:
topUsers=pearsonDF[pearsonDF["similarityIndex"] > 0.75].sort_values(by='similarityIndex', ascending=False)

In [47]:
topUsersRating=topUsers.merge(user_categories_rating, left_on='userId', right_on='List User ID', how='inner')

In [48]:
topUsersRating['weightedRating'] = topUsersRating['similarityIndex']*topUsersRating['Total']

In [49]:
tempTopUsersRating = topUsersRating.groupby('Venue Category Name').sum()[['similarityIndex','weightedRating']]
tempTopUsersRating.columns = ['sum_similarityIndex','sum_weightedRating']

In [50]:
recommendation_df = pd.DataFrame()

recommendation_df['weighted average recommendation score'] = tempTopUsersRating['sum_weightedRating']/tempTopUsersRating['sum_similarityIndex']
recommendation_df['Venue Category Name'] = tempTopUsersRating.index
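# Sort by score and drop the index, which shares its name with the 'Venue Category Name' column,
# so that the merge in the next step is unambiguous
recommendation_df = recommendation_df.sort_values(by='weighted average recommendation score', ascending=False).reset_index(drop=True)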

At this point we have all recommended venue categories and their "weighted average recommendation score", which ranks them from most to least recommended.
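
To make the score concrete, the following is a small sketch of the same weighted-average calculation on toy data. The users, similarity values and category scores are hypothetical; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical toy data: two similar users and their category scores
toy = pd.DataFrame({
    "userId": ["u1", "u1", "u2", "u2"],
    "similarityIndex": [0.9, 0.9, 0.8, 0.8],
    "Venue Category Name": ["Park", "Bar", "Park", "Museum"],
    "Total": [2.0, 1.0, 1.0, 3.0],
})

# Weight each score by how similar its user is to the input user
toy["weightedRating"] = toy["similarityIndex"] * toy["Total"]

# Per category: sum of weighted scores divided by sum of similarities
sums = toy.groupby("Venue Category Name")[["similarityIndex", "weightedRating"]].sum()
scores = (sums["weightedRating"] / sums["similarityIndex"]).sort_values(ascending=False)
print(scores)
```

Here "Park" gets (0.9 × 2.0 + 0.8 × 1.0) / (0.9 + 0.8) ≈ 1.53, while "Museum", listed by a single user, simply inherits that user's score of 3.0. A category backed by only one similar user can therefore outrank categories backed by many, which is worth keeping in mind when reading the ranking.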

Let's keep only the venues near the user who will receive the recommendation.

In [51]:
venues_recommended_in_region = venues_result_details_Df.loc[venues_result_details_Df["Category Name"].isin(recommendation_df['Venue Category Name'].tolist())]
venues_recommended_in_region = venues_recommended_in_region.merge(recommendation_df, left_on='Category Name', right_on='Venue Category Name', how='inner')
venues_recommended_in_region = venues_recommended_in_region.sort_values(by="weighted average recommendation score", ascending=False)
venues_recommended_in_region[["Name", "Category Name", "Address", "Latitude", "Longitude"]]


Unnamed: 0,Name,Category Name,Address,Latitude,Longitude
27,Caprices de Paris,Café,"R Alagoas, 777",-19.933651,-43.935675
29,Café Club,Café,"R. Paraíba, 1096",-19.9361,-43.933675
28,Rossignol Patisserie,Café,"R. Alagoas, 777",-19.933945,-43.935722
42,Burger King,Fast Food Restaurant,R. Pernambuco,-19.937151,-43.93539
41,Napolitano Pães e Pasteis,Fast Food Restaurant,"R. Pernambuco, 971",-19.93579,-43.9346
15,La Sanha,Italian Restaurant,"R. Santa Rita Durão, 941",-19.934545,-43.934347
14,Go Pasta - Fresh & Gourmet,Italian Restaurant,"R. Tomé de Souza, 912, Savassi",-19.936876,-43.935272
13,Pastificio Primo,Italian Restaurant,"R. Alagoas, 957",-19.935383,-43.936087
11,Fazenda de Minas,Coffee Shop,"R. Sta. Rita Durão, 941",-19.934424,-43.934614
12,Café Concert,Coffee Shop,"R. Alagoas, 1000",-19.93589,-43.936269


## Results and discussions

As a result, we have a simple recommendation system, using only the APIs provided by Foursquare.

To develop this project I used the APIs available to a free account, which limits the data available. Because of that, I relied on some assumptions that still need to be checked against reality, such as the assumption that the number of times a category appears in a user's lists indicates how much they like that type of place. Ideally we would also use other signals, such as the number of likes or the comments about the places (which would require sentiment analysis), in order to be more accurate.

Even so, the project demonstrates that it is possible to generate recommendations from the available APIs.

## Conclusion

This project shows that it is possible to build a recommendation system using only the APIs provided by Foursquare. It can be useful when you are going somewhere new and want to find out which places are worth visiting.