# Place Details

Detailed information about a place listed on Google can be accessed through the Google Places API. Besides address components (country, city, street, latitude, longitude, phone number, etc.), type of business, opening hours and images, **ratings** and **reviews** can be obtained as well. The latter two are essential for the analysis of the Restaurant dataset.

## Place Details API Request

Requesting detailed information about a place requires an API key and a 'place_id' like the ones obtained in the previous notebook. Google lets you choose which output parameters are returned and bills the request according to that choice. The full documentation of all output parameters can be viewed [here](https://developers.google.com/places/web-service/details).

Since my 90-day trial was still active, I've decided to go with the default settings and include all available output parameters. In addition, there are two available output types: json and xml. Python's built-in 'json' package makes json a convenient choice.
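
To make the two choices above (output parameters and output type) more concrete, here is a minimal sketch of how a Place Details request URL could be assembled by hand. The helper `build_details_url` and the placeholder key are my own illustration and not part of this notebook, which uses the `googlemaps` client instead; the `fields` parameter is assumed to take a comma-separated list of output parameters.

```python
from urllib.parse import urlencode

# Base URL of the Place Details web service; the suffix selects json or xml.
BASE_URL = "https://maps.googleapis.com/maps/api/place/details"


def build_details_url(place_id, api_key, fields=None, output="json"):
    """Return the request URL for one place (hypothetical helper)."""
    if output not in ("json", "xml"):
        raise ValueError("output must be 'json' or 'xml'")
    params = {"place_id": place_id, "key": api_key}
    if fields:  # leaving 'fields' out asks for all available parameters
        params["fields"] = ",".join(fields)
    # keep the commas readable instead of percent-encoding them
    return f"{BASE_URL}/{output}?{urlencode(params, safe=',')}"


url = build_details_url(
    "ChIJow5SxuxRqEcRpk680jewg50",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    fields=["name", "rating", "user_ratings_total"],
)
print(url)
```

Restricting the fields in this way is what would keep the cost down once the trial period is over.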

### 1) Choosing a 'place_id' for each restaurant

Let's have another look at the results of the previous notebook:

In [None]:
import ast
from fuzzywuzzy import fuzz
import googlemaps
import json
import numpy as np
import pandas as pd

# Load modified dataframe
data = pd.read_pickle(r'data/restaurants_with_google_id.pkl')
df = pd.DataFrame(data, columns=[
    'name',
    'fon_place_id',
    'name_place_id',
])

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(df)

Remember that we used two approaches to obtain the ids. Therefore, some restaurants have multiple ids while others have none at all. Since it is not possible to tell in advance which id is the right one for each place, I've decided to run API requests for all available ids and match the output afterwards:

In [None]:
# Creating a list with all available ids
place_ids = []

for column in ['fon_place_id', 'name_place_id']:
    for ids in df[column]:
        id_list = ast.literal_eval(ids)  # parse the stringified list stored in the dataframe
        for place_id in id_list:
            if place_id not in place_ids:  # list shall only contain unique ids
                place_ids.append(place_id)

print("list of ids: ")
print(f"{len(place_ids)} entries")
print("\n")
print(place_ids)
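
As a small illustration of what happens in the cell above, the following sketch parses stringified lists with `ast.literal_eval` and keeps every id once, in the order it was first seen. The two toy columns and the helper `unique_ids` are made up for this example and only mimic the shape of the real dataframe columns.

```python
import ast

# Toy stand-ins for the two stringified id columns (values are made up)
fon_column = ["['id_a']", "[]", "['id_b', 'id_c']"]
name_column = ["['id_a', 'id_d']", "['id_b']", "[]"]


def unique_ids(*columns):
    """Parse each stringified list and keep ids in first-seen order."""
    seen = {}  # dict keys are unique and keep insertion order
    for column in columns:
        for raw in column:
            for place_id in ast.literal_eval(raw):
                seen.setdefault(place_id, None)
    return list(seen)


toy_ids = unique_ids(fon_column, name_column)
print(toy_ids)
```

Using a dictionary for the bookkeeping avoids scanning the whole list for every new id, which starts to matter once there are many more than a few hundred ids.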

### 2) Requesting Place Details

Now that we have decided on the input, the API request can be specified:

In [None]:
import os

# API specification
api_key = input("Enter your API-Key here: ")
gmaps = googlemaps.Client(key=api_key)

# Executing the API request
os.makedirs('place_details', exist_ok=True)  # make sure the output folder exists
for place_id in place_ids:
    places_result = gmaps.place(place_id=place_id)  # output parameters can be specified here - default used
    with open(f'place_details/{place_id}.json', 'w') as outfile:  # file name = place_id
        json.dump(places_result, outfile)  # json package for saving each output

The requests returned **812 json files**, one for each id in the list, so every request produced a response.
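
Counting the files only shows that every request returned something. A slightly stricter check is sketched below: it tallies the top-level 'status' value of each saved response. This assumes that the saved files keep a 'status' field next to 'result'; the helper `summarize_responses` and the two demo files are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def summarize_responses(folder):
    """Count saved responses per 'status' value (hypothetical helper)."""
    counts = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r") as infile:
            response = json.load(infile)
        status = response.get("status", "MISSING")  # assumed top-level field
        counts[status] = counts.get(status, 0) + 1
    return counts


# Demonstration on two made-up files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "id_a.json").write_text(json.dumps({"status": "OK", "result": {}}))
    Path(tmp, "id_b.json").write_text(json.dumps({"status": "NOT_FOUND"}))
    demo_counts = summarize_responses(tmp)

print(demo_counts)
```

On the real data the call would be `summarize_responses('place_details')`.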

So what does the output look like?

In [None]:
with open('data/ChIJow5SxuxRqEcRpk680jewg50.json', 'r') as infile:
    example = json.load(infile)
print(json.dumps(example, indent=5, sort_keys=True))

<br>As mentioned above, the file contains detailed address ('address_components', 'geometry', 'international_phone_number') and business ('name', 'types', 'business_status', 'permanently_closed', 'website', 'photos') information. Of interest for the analysis are the **'rating'** and **'user_ratings_total'** fields. Furthermore, up to five user reviews are included.
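
The fields of interest can be read with plain dictionary access. The sketch below uses a made-up response with the same nesting as the output above; the helper `summarize_place` is hypothetical, and the review keys 'author_name', 'rating' and 'text' are assumed to match the real output.

```python
# Made-up response with the same nesting as a Place Details result
example_response = {
    "result": {
        "name": "Example Restaurant",
        "rating": 4.4,
        "user_ratings_total": 213,
        "reviews": [
            {"author_name": "A. Guest", "rating": 5, "text": "Great food."},
            {"author_name": "B. Guest", "rating": 3, "text": "Slow service."},
        ],
    }
}


def summarize_place(response):
    """Pull out the rating fields, tolerating missing keys (hypothetical helper)."""
    result = response.get("result", {})
    reviews = result.get("reviews", [])
    return {
        "rating": result.get("rating"),  # None if the place has no rating
        "user_ratings_total": result.get("user_ratings_total", 0),
        "review_ratings": [review.get("rating") for review in reviews],
    }


summary = summarize_place(example_response)
print(summary)
```

Using `.get` instead of square brackets means that a place without ratings yields `None` or `0` rather than raising a `KeyError`.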

## Merging API Results and Restaurants Data

The data from the 812 json files shall be appended to the original Restaurants dataset. We make use of the fact that each json file is named after its corresponding 'place_id'. Json files are structured like nested dictionaries, which makes it easy to access each element of the file.

### 1) Structuring the JSON output

We start by creating a pandas dataframe with all relevant output from the API request:

In [None]:
# Creating empty dataframe with relevant output for all 'place_id's
details_df = pd.DataFrame(index=place_ids, columns=['name',
                                                    'city',
                                                    'bezirk',
                                                    'street_nr',
                                                    'lat',
                                                    'lng',
                                                    'types',
                                                    'business_status',
                                                    'price_level',
                                                    'rating',
                                                    'user_ratings_total'])

# Fill dataframe with information by iterating over json files
for id in place_ids:
    try:
        with open(f'place_details/{id}.json', 'r') as f:  # there is a corresponding json file for each id
            data = json.load(f)  # the file is closed again as soon as it is read
        
        # add name
        name = data['result']['name']  # path in dictionary / json file
        details_df.loc[id, 'name'] = name
        # add city
        city = data['result']['address_components'][3]['long_name']
        details_df.loc[id, 'city'] = city
        # add bezirk
        bezirk = data['result']['address_components'][2]['short_name']
        details_df.loc[id, 'bezirk'] = bezirk
        # add street_nr
        street = data['result']['address_components'][1]['long_name']
        number = data['result']['address_components'][0]['short_name']
        street_nr = street + " " + number
        details_df.loc[id, 'street_nr'] = street_nr
        # add lat
        lat = data['result']['geometry']['location']['lat']
        details_df.loc[id, 'lat'] = lat
        # add lng
        lng = data['result']['geometry']['location']['lng']
        details_df.loc[id, 'lng'] = lng
        # add types
        types = data['result']['types']
        details_df.loc[id, 'types'] = types
        # add business_status
        business_status = data['result']['business_status']
        details_df.loc[id, 'business_status'] = business_status
        # add price_level
        price_level = data['result']['price_level']
        details_df.loc[id, 'price_level'] = price_level
        # add rating
        rating = data['result']['rating']
        details_df.loc[id, 'rating'] = rating
        # add user_ratings
        user_ratings_total = data['result']['user_ratings_total']
        details_df.loc[id, 'user_ratings_total'] = user_ratings_total
 
    except KeyError:  # price_level isn't specified for all places
        try:
            # add rating
            rating = data['result']['rating']
            details_df.loc[id, 'rating'] = rating
            # add num_ratings
            user_ratings_total = data['result']['user_ratings_total']
            details_df.loc[id, 'user_ratings_total'] = user_ratings_total
        except KeyError:  # without a 'rating' there is no 'user_ratings_total' either, so nothing is left to add
            pass
    except IndexError:  # fewer address components than expected; the remaining fields stay empty
        continue

# Replacing NaN with zero (assigning back instead of inplace on a column selection)
details_df['user_ratings_total'] = details_df['user_ratings_total'].fillna(0)

display(details_df)

# Saving results
details_df.to_pickle(r'data/details_df.pkl')
details_df.to_csv(r'data/details_df.csv', sep=';', index=True)

<br> Each row is now indexed by its 'place_id' and all column information originates from the API request.
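
The cell above reads the address by position (index 0 to 3 of 'address_components'), which only works when every place lists its components in the same order. A possible alternative, sketched here under the assumption that each component carries a 'types' list with tags such as 'street_number', 'route', 'sublocality' and 'locality', is to look the component up by its tag. The helper `component_by_type` and the sample address are made up.

```python
def component_by_type(components, wanted_type, key="long_name"):
    """Return the first component tagged with wanted_type, else None."""
    for component in components:
        if wanted_type in component.get("types", []):
            return component.get(key)
    return None


# Made-up address in the shape of 'address_components'
components = [
    {"long_name": "12", "short_name": "12", "types": ["street_number"]},
    {"long_name": "Beispielstraße", "short_name": "Beispielstr.", "types": ["route"]},
    {"long_name": "Mitte", "short_name": "Mitte",
     "types": ["sublocality_level_1", "sublocality", "political"]},
    {"long_name": "Berlin", "short_name": "Berlin", "types": ["locality", "political"]},
]

city = component_by_type(components, "locality")
bezirk = component_by_type(components, "sublocality", key="short_name")
street = component_by_type(components, "route")
number = component_by_type(components, "street_number", key="short_name")
print(city, bezirk, f"{street} {number}")
```

With this lookup a missing street number simply yields `None` instead of shifting all other components by one position.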

### 2) Merging Dataframes

The Restaurants dataframe from the previous notebook has 720 entries, while the API results dataframe has 812. Since there are more API results than restaurants, we need to select the right API result for each restaurant. The 'place_id' is a convenient matching key, since it appears in both dataframes. However, as already mentioned, some places have multiple ids. In that case, other matching criteria are needed.

We start by reading both dataframes. Since the ids within the Restaurants dataframe are not recognized as lists but as a sequence of strings, we need to convert them using the 'ast' module:

In [None]:
# Load dataframes
data = pd.read_pickle(r'data/restaurants_with_google_id.pkl')
restaurants_df = pd.DataFrame(data)

data = pd.read_pickle(r'data/details_df.pkl')
details_df = pd.DataFrame(data)

# Convert string with ids to list
restaurants_df.insert(loc=8, column='clean_name_place_id', value=None)
for idx, ids in enumerate(restaurants_df['name_place_id']):
    clean_ids = ast.literal_eval(ids)
    restaurants_df.at[idx, 'clean_name_place_id']=clean_ids

restaurants_df.insert(loc=8, column='clean_fon_place_id', value=None)
for idx, ids in enumerate(restaurants_df['fon_place_id']):
    clean_ids = ast.literal_eval(ids)
    restaurants_df.at[idx, 'clean_fon_place_id']=clean_ids

# Create list with unique ids for each restaurant
restaurants_df['place_ids'] = restaurants_df['clean_fon_place_id']+restaurants_df['clean_name_place_id']  # combine place ids
restaurants_df.insert(loc=11, column='clean_place_ids', value=None)  # create new column
restaurants_df['clean_place_ids'] = restaurants_df['clean_place_ids'].astype('object')  # set type
for idx, ids in enumerate(restaurants_df['place_ids']):
    unique_ids = list(set(ids))  # remove duplicates
    restaurants_df.at[idx, 'clean_place_ids'] = unique_ids

**Dataframes to be merged:**

In [None]:
display(restaurants_df)
display(details_df)

The information from the API requests shall now be appended to the Restaurants dataframe. We start by iterating over the Restaurants dataframe. Each row holds zero, one or multiple place ids. When there is only one id, we can simply look it up in the details dataframe and append the corresponding information. For multiple ids, a benchmark for choosing the right API result is needed:

In [None]:
# Prepare dataframe for appending place details
restaurants_df.insert(loc=12, column='place_id', value=None)
restaurants_df.insert(loc=13, column='bezirk', value=None)
restaurants_df.insert(loc=14, column='lat', value=None)
restaurants_df.insert(loc=15, column='lng', value=None)
restaurants_df.insert(loc=16, column='types', value=None)
restaurants_df.insert(loc=17, column='price_level', value=None)
restaurants_df.insert(loc=18, column='rating', value=None)
restaurants_df.insert(loc=19, column='user_ratings_total', value=None)

# Initialize criteria lists on which the API results are matched
benchmark_name = []
benchmark_address = []
benchmark_number_of_ratings = []
benchmark_types = []
benchmark_city = []

# Appending place details
for index, restaurant in restaurants_df.iterrows():
    for id in restaurant['clean_place_ids']:
        
        # name benchmark
        name1 = restaurant['name']
        name2 = details_df.loc[id, 'name']
        try:
            levenshtein_distance_name = fuzz.ratio(name1.lower(), name2.lower())  # similarity score from 0 to 100 (higher = more similar)
        except AttributeError: # names with integers cannot be lowered
            try:
                levenshtein_distance_name = fuzz.ratio(name1, name2)
            except TypeError:  # check for missing values (NaN = integer)
                levenshtein_distance_name = 0
        except TypeError:  # check for missing values (NaN = integer)
            levenshtein_distance_name = 0
        benchmark_name.append(levenshtein_distance_name)
        
        # address benchmark
        address1 = restaurant['strasse_nr']
        address2 = details_df.loc[id, 'street_nr']
        try:
            levenshtein_distance_address = fuzz.ratio(address1, address2)
        except TypeError:  # check for missing values (NaN = integer)
            levenshtein_distance_address = 0
        benchmark_address.append(levenshtein_distance_address)
       
        # user ratings benchmark
        user_ratings_total = details_df.loc[id, 'user_ratings_total']
        benchmark_number_of_ratings.append(user_ratings_total)
        
        # types benchmark
        types = details_df.loc[id, 'types']
        try:
            if 'restaurant' in types:
                type_score = 1
            elif 'food' in types:
                type_score = 0.75
            elif 'cafe' in types:
                type_score = 0.75
            elif 'bar' in types:
                type_score = 0.5
            else:
                type_score = 0
        except TypeError:  # check for missing values (NaN = integer)
            type_score = 0
        benchmark_types.append(type_score)
        
        # city benchmark
        city = details_df.loc[id, 'city']
        try:
            if city == "Berlin":
                city_score = 1
            elif city == "Germany":
                city_score = 0.5
            else:
                city_score = 0
        except TypeError:  # check for missing values (NaN = integer)
            city_score = 0
        benchmark_city.append(city_score)

    # Choosing the right place_id
    bname = np.array(benchmark_name)
    baddress = np.array(benchmark_address)
    bnumber_of_ratings = np.array(benchmark_number_of_ratings)
    btypes = np.array(benchmark_types)
    bcity = np.array(benchmark_city)

    benchmark = (bname + baddress/2 + bnumber_of_ratings/10)*btypes*bcity
    try:
        highest_ranked = np.argmax(benchmark)
        best_id = restaurant['clean_place_ids'][highest_ranked]
    except ValueError:  # some places are without id
        best_id = None

    # Appending information to restaurant dataframe
    try:
        restaurants_df.at[index, 'place_id'] = best_id
        restaurants_df.at[index, 'bezirk'] = details_df.loc[best_id, 'bezirk']
        restaurants_df.at[index, 'lat'] = details_df.loc[best_id, 'lat']
        restaurants_df.at[index, 'lng'] = details_df.loc[best_id, 'lng']
        restaurants_df.at[index, 'types'] = details_df.loc[best_id, 'types']
        restaurants_df.at[index, 'price_level'] = details_df.loc[best_id, 'price_level']
        restaurants_df.at[index, 'rating'] = details_df.loc[best_id, 'rating']
        restaurants_df.at[index, 'user_ratings_total'] = details_df.loc[best_id, 'user_ratings_total']
    except KeyError:  # some values are missing
        continue

    # Prepare next loop
    del benchmark_name[:]
    del benchmark_address[:]
    del benchmark_number_of_ratings[:]
    del benchmark_types[:]
    del benchmark_city[:]

# Delete redundant columns
restaurants_df.drop(columns=[
    "fon_place_id",
    "clean_fon_place_id",
    "name_place_id",
    "clean_name_place_id",
    "place_ids",
    "clean_place_ids",
    "name_street",
], inplace=True)

Let's have a closer look at the benchmarks that decided which output to use when there were multiple ids:

*1. Name Benchmark*
<br>Uses the `fuzz.ratio` string similarity score (0 to 100, higher means more similar), a Levenshtein-style measure, to compare the name of the restaurant in the original dataframe and in the API output.

*2. Address Benchmark*
<br>Uses the same `fuzz.ratio` similarity score to compare the address of the restaurant in the original dataframe and in the API output.

*3. User ratings Benchmark*
<br>Places with more reviews are more likely to be restaurants.

*4. Types Benchmark*
<br>Checks if the place is a restaurant, something similar or totally different.

*5. City Benchmark*
<br>Checks if the place is located in Berlin.

<br> Based on the *Total Benchmark*, which combines all five benchmarks, the most promising 'place_id' output is chosen.
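
To see how the five benchmarks interact, here is a small worked example of the *Total Benchmark* formula from the merging cell, `(name + address/2 + number_of_ratings/10) * types * city`. The three candidates and their scores are invented for illustration, and plain Python is used in place of the numpy arrays from the cell above.

```python
def total_benchmark(name, address, n_ratings, type_score, city_score):
    """Combine the five benchmark scores as in the merging cell."""
    return (name + address / 2 + n_ratings / 10) * type_score * city_score


# Made-up candidates for a single restaurant
candidates = {
    "id_a": dict(name=90, address=80, n_ratings=120, type_score=1, city_score=1),
    "id_b": dict(name=95, address=100, n_ratings=500, type_score=0.5, city_score=1),
    "id_c": dict(name=100, address=100, n_ratings=900, type_score=1, city_score=0),
}

scores = {pid: total_benchmark(**values) for pid, values in candidates.items()}
best_id = max(scores, key=scores.get)  # first highest score wins, like np.argmax

print(scores)
print(best_id)
```

Here 'id_a' wins with 142.0: 'id_b' is halved to 97.5 because it is only a bar, and 'id_c' drops to zero because it is not located in Berlin or Germany. Note that the number of ratings has no upper limit while both similarity scores stop at 100, so a place with several thousand ratings can outweigh a poor name match.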

In [None]:
print(">>> merged dataframe <<<")
display(restaurants_df)

# Saving results
restaurants_df.to_pickle(r'data/detailed_restaurants.pkl')
restaurants_df.to_csv(r'data/detailed_restaurants.csv', sep=';', encoding='utf-8', index=True)

<br>Now the information of the place details API request is appended to the Restaurants data.