# Zomato API 

In this notebook we gather our restaurant data from the Zomato API. We ran one set of requests for each of five major Australian cities (Sydney, Melbourne, Brisbane, Perth and Adelaide) to collect data on 500 restaurants.

We use the pyzomato package to work with the API.

In [1]:
# Dependencies and Setup
import pandas as pd
import requests
import json
from pprint import pprint
from pyzomato import Pyzomato

# Zomato API Key
from config import z_key

# Retrieving the information from the Zomato API:
We created empty lists to hold the information pulled from the API.
We then loop to collect 100 restaurants in batches of 20, the maximum the API allowed us to access.
We tried looping over the five cities to avoid repeating ourselves, but the API did not return the expected data:
it returned the same 100 restaurants five times instead of five batches of 100 restaurants, one per city.
The first cell is for Sydney, using its coordinates lat="-33.8688", lon="151.2094".
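
As a minimal sketch of the batching described above: the start offsets 0, 20, 40, 60 and 80 each request 20 results, giving 100 per location. The names `fetch_batches` and `fake_search` are hypothetical and introduced only for illustration; `fake_search` stands in for `p.search` so the sketch runs without an API key.

```python
def fetch_batches(search_fn, lat, lon, total=100, batch_size=20):
    """Call search_fn once per offset and collect the returned restaurant entries."""
    entries = []
    for start in range(0, total, batch_size):
        response = search_fn(lat=lat, lon=lon, count=batch_size, start=start)
        # .get() guards against a response that has no "restaurants" key
        entries.extend(response.get("restaurants", []))
    return entries


def fake_search(lat, lon, count, start):
    """Hypothetical offline stand-in for p.search."""
    return {"restaurants": [{"restaurant": {"id": start + i}} for i in range(count)]}


entries = fetch_batches(fake_search, lat="-33.8688", lon="151.2094")
offsets = list(range(0, 100, 20))
print(offsets)       # [0, 20, 40, 60, 80]
print(len(entries))  # 100
```

Passing the search function in as an argument keeps the paging logic separate from the API client, which makes it easy to check offline.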

In [2]:
p = Pyzomato(z_key)

# set up other lists to hold data info
restaurant_name = []
restaurant_id = []
restaurant_address = []
restaurant_locality = []
restaurant_city = []
latitude = []
longitude = []
zip_code = []
cuisines = []
price_range = []
average_cost_for_two = []
user_rating = []
rating_text = []
votes = []
all_reviews_count = []
    
# Create a loop that iterates five times starting from 0th record and increases by increments of 20
for a in range(0,100,20):
    response = p.search(lat="-33.8688",lon="151.2094",
                           count=20,
                           sort='rating',
                           order='desc',
                           category=(10),
                           start=a)
    for i in range(0,20,1):
        try:
            restaurant_id.append(response["restaurants"][i]["restaurant"]['id'])
            restaurant_name.append(response["restaurants"][i]["restaurant"]['name'])
            restaurant_locality.append(response["restaurants"][i]["restaurant"]["location"]["locality"])
            restaurant_address.append(response["restaurants"][i]["restaurant"]["location"]["address"])
            restaurant_city.append(response["restaurants"][i]["restaurant"]["location"]["city"])
            latitude.append(response["restaurants"][i]["restaurant"]["location"]["latitude"])
            longitude.append(response["restaurants"][i]["restaurant"]["location"]["longitude"])
            zip_code.append(response["restaurants"][i]["restaurant"]["location"]["zipcode"])
            cuisines.append(response["restaurants"][i]["restaurant"]['cuisines'])
            price_range.append(response["restaurants"][i]["restaurant"]['price_range'])
            average_cost_for_two.append(response["restaurants"][i]["restaurant"]['average_cost_for_two'])
            user_rating.append(response["restaurants"][i]["restaurant"]["user_rating"]['aggregate_rating'])
            rating_text.append(response["restaurants"][i]["restaurant"]["user_rating"]['rating_text'])
            votes.append(response["restaurants"][i]["restaurant"]["user_rating"]['votes'])
            all_reviews_count.append(response["restaurants"][i]["restaurant"]["all_reviews_count"])
        except (KeyError, IndexError):
            # Skip entries with a missing field, or batches that return fewer than 20 results
            print("restaurant not found! Skipping")
    
print("---------------------------")
print("Data Retrieval Complete")

---------------------------
Data Retrieval Complete
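
One caveat with appending to fifteen separate lists: if a `KeyError` is raised partway through a restaurant, the lists filled before the error are one item longer than the rest, and `pd.DataFrame` raises a `ValueError` when its columns have different lengths. A possible alternative, sketched here with only a subset of the fields, is to build one complete record per restaurant and keep it only if every field was present. `extract_record` and the `sample` entries are hypothetical and used only for illustration.

```python
def extract_record(entry):
    """Return one flat dict per restaurant, or None if an expected key is missing."""
    try:
        r = entry["restaurant"]
        loc = r["location"]
        rating = r["user_rating"]
        return {
            "Restaurant_id": r["id"],
            "Name": r["name"],
            "Locality": loc["locality"],
            "City": loc["city"],
            "User Rating": rating["aggregate_rating"],
            "Votes": rating["votes"],
        }
    except KeyError:
        return None


# Hypothetical sample entries shaped like the fields accessed in the notebook
sample = [
    {"restaurant": {"id": 1, "name": "A",
                    "location": {"locality": "CBD", "city": "Sydney"},
                    "user_rating": {"aggregate_rating": "4.9", "votes": "10"}}},
    {"restaurant": {"id": 2, "name": "B",
                    "location": {"locality": "CBD", "city": "Sydney"}}},  # no user_rating
]

records = [rec for rec in map(extract_record, sample) if rec is not None]
print(len(records))  # 1
```

A list of such dicts can be passed directly to `pd.DataFrame(records)`, so every column always has the same number of rows.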


In [4]:
#Convert Raw data to DataFrame
sydney_df = pd.DataFrame({
    'Restaurant_id': restaurant_id,
    'Name': restaurant_name,
    'Locality':restaurant_locality,
    'Address':restaurant_address,
    'City': restaurant_city,
    'Latitude': latitude,
    'Longitude': longitude,
    'Zip Code': zip_code,
   'Cuisines' : cuisines,
    'Price Range' :price_range,
    'Average Cost for two' :average_cost_for_two,
    'User Rating' : user_rating,
    'Rating Text' : rating_text,
    "Votes" : votes,
    "all_reviews_count": all_reviews_count
})

sydney_df.head()

Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Zip Code,Cuisines,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count
0,15547004,Restaurant Hubert,CBD,"15 Bligh Street, CBD, Sydney",Sydney,-33.865348331,151.2106238678,2000.0,"French, European",4,150,4.9,Excellent,592,225
1,16558798,Quay,Circular Quay,"Upper Level, Overseas Passenger Terminal 5 Hic...",Sydney,-33.8580292558,151.2099704146,2000.0,Modern Australian,4,500,4.9,Excellent,1366,454
2,16559171,Tetsuya's,CBD,"529 Kent Street, CBD, Sydney",Sydney,-33.8751428662,151.2049315497,,Japanese,4,440,4.9,Excellent,1235,329
3,16569454,LuMi Bar & Dining,Pyrmont,"56 Pirrama Road, \tPyrmont, Pyrmont, Sydney",Sydney,-33.8671367304,151.1975169182,2009.0,"Italian, Japanese",4,190,4.9,Excellent,452,196
4,15545439,Manpuku,Chatswood,"226 Victoria Avenue, Chatswood, Sydney",Sydney,-33.7944174758,151.1895420402,2067.0,"Japanese, Ramen",2,40,4.9,Excellent,486,190


# Melbourne data

In [5]:
#The Zomato API gave us at most 20 restaurants per call, and looping over the five cities returned the same restaurants each time.
#We therefore repeat our code five times to generate five different dataframes.

p = Pyzomato(z_key)

# set up other lists to hold data info
restaurant_name = []
restaurant_id = []
restaurant_address = []
restaurant_locality = []
restaurant_city = []
latitude = []
longitude = []
zip_code = []
cuisines = []
price_range = []
average_cost_for_two = []
user_rating = []
rating_text = []
votes = []
all_reviews_count = []

# Create a loop that iterates five times starting from 0th record and increases by increments of 20
for a in range(0,100,20):
    response = p.search(lat="-37.8136",lon="144.9631",
                           count=20,
                           sort='rating',
                           order='desc',
                           category=(10),
                           start=a)
    for i in range(0,20,1):
        try:
            restaurant_id.append(response["restaurants"][i]["restaurant"]['id'])
            restaurant_name.append(response["restaurants"][i]["restaurant"]['name'])
            restaurant_locality.append(response["restaurants"][i]["restaurant"]["location"]["locality"])
            restaurant_address.append(response["restaurants"][i]["restaurant"]["location"]["address"])
            restaurant_city.append(response["restaurants"][i]["restaurant"]["location"]["city"])
            latitude.append(response["restaurants"][i]["restaurant"]["location"]["latitude"])
            longitude.append(response["restaurants"][i]["restaurant"]["location"]["longitude"])
            zip_code.append(response["restaurants"][i]["restaurant"]["location"]["zipcode"])
            cuisines.append(response["restaurants"][i]["restaurant"]['cuisines'])
            price_range.append(response["restaurants"][i]["restaurant"]['price_range'])
            average_cost_for_two.append(response["restaurants"][i]["restaurant"]['average_cost_for_two'])
            user_rating.append(response["restaurants"][i]["restaurant"]["user_rating"]['aggregate_rating'])
            rating_text.append(response["restaurants"][i]["restaurant"]["user_rating"]['rating_text'])
            votes.append(response["restaurants"][i]["restaurant"]["user_rating"]['votes'])
            all_reviews_count.append(response["restaurants"][i]["restaurant"]["all_reviews_count"])
        except (KeyError, IndexError):
            # Skip entries with a missing field, or batches that return fewer than 20 results
            print("restaurant not found! Skipping")
    
print("---------------------------")
print("Data Retrieval Complete")

#Convert Raw data to DataFrame

melbourne_df = pd.DataFrame({
    'Restaurant_id': restaurant_id,
    'Name': restaurant_name,
    'Locality':restaurant_locality,
    'Address':restaurant_address,
    'City': restaurant_city,
    'Latitude': latitude,
    'Longitude': longitude,
    'Zip Code': zip_code,
   'Cuisines' : cuisines,
    'Price Range' :price_range,
    'Average Cost for two' :average_cost_for_two,
    'User Rating' : user_rating,
    'Rating Text' : rating_text,
    "Votes" : votes,
    "all_reviews_count": all_reviews_count
})


melbourne_df.head()

---------------------------
Data Retrieval Complete


Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Zip Code,Cuisines,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count
0,16585905,Tipo 00,CBD,"361 Little Bourke Street, CBD, Melbourne",Melbourne,-37.8135277429,144.9619733915,3000.0,Italian,4,150,4.9,Excellent,1927,717
1,16586014,Minamishima,Richmond,"4 Lord Street, Richmond, Melbourne, VIC",Melbourne,-37.8198314176,145.005193837,3121.0,"Japanese, Sushi",4,450,4.9,Excellent,748,290
2,17881527,Dexter,Preston,"456 High Street, Preston, Melbourne",Melbourne,-37.736195646,145.0044562295,,"American, BBQ",4,110,4.9,Excellent,1477,685
3,16572612,Vue de monde,CBD,"Level 55, Rialto, 525 Collins Street, CBD, Mel...",Melbourne,-37.8189544974,144.9579336494,3000.0,"Australian, Contemporary",4,600,4.9,Excellent,3226,987
4,16574138,Suzuran,Camberwell,"1025 Burke Road, Camberwell, Melbourne",Melbourne,-37.8217645249,145.0584645197,3124.0,"Japanese, Sushi",2,35,4.9,Excellent,807,195
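
If the repeated results we saw when looping over the cities came from state carried over between iterations rather than from the API itself (an assumption we have not verified), one way to avoid copying the cell is to loop over a table of coordinates and start a fresh result list for every city. The coordinates below are the ones used in this notebook; `collect_all` and `fake_search` are hypothetical names, and `fake_search` replaces `p.search` so the sketch runs offline.

```python
CITY_COORDS = {
    "Sydney": ("-33.8688", "151.2094"),
    "Melbourne": ("-37.8136", "144.9631"),
    "Brisbane": ("-27.4698", "153.0251"),
    "Perth": ("-31.9505", "115.8605"),
    "Adelaide": ("-34.9285", "138.6007"),
}


def collect_all(search_fn, coords=CITY_COORDS, total=100, batch_size=20):
    """Return {city: [restaurant entries]} with a fresh list for every city."""
    results = {}
    for city, (lat, lon) in coords.items():
        entries = []  # new list per city, so nothing carries over
        for start in range(0, total, batch_size):
            response = search_fn(lat=lat, lon=lon, count=batch_size, start=start)
            entries.extend(response.get("restaurants", []))
        results[city] = entries
    return results


def fake_search(lat, lon, count, start):
    """Hypothetical offline stand-in for p.search; tags each entry with its coordinates."""
    return {"restaurants": [{"restaurant": {"id": f"{lat},{lon}:{start + i}"}}
                            for i in range(count)]}


by_city = collect_all(fake_search)
print({city: len(entries) for city, entries in by_city.items()})
```

Comparing the first few ids of each city's list is a quick way to confirm that the five result sets really differ.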


# Brisbane data


In [6]:
#Brisbane data

p = Pyzomato(z_key)

# set up other lists to hold data info
restaurant_name = []
restaurant_id = []
restaurant_address = []
restaurant_locality = []
restaurant_city = []
latitude = []
longitude = []
zip_code = []
cuisines = []
price_range = []
average_cost_for_two = []
user_rating = []
rating_text = []
votes = []
all_reviews_count = []


# Create a loop that iterates five times starting from 0th record and increases by increments of 20
for a in range(0,100,20):
    response = p.search(lat="-27.4698",lon="153.0251",
                           count=20,
                           sort='rating',
                           order='desc',
                           category=(10),
                           start=a)
    for i in range(0,20,1):
        try:
            restaurant_id.append(response["restaurants"][i]["restaurant"]['id'])
            restaurant_name.append(response["restaurants"][i]["restaurant"]['name'])
            restaurant_locality.append(response["restaurants"][i]["restaurant"]["location"]["locality"])
            restaurant_address.append(response["restaurants"][i]["restaurant"]["location"]["address"])
            restaurant_city.append(response["restaurants"][i]["restaurant"]["location"]["city"])
            latitude.append(response["restaurants"][i]["restaurant"]["location"]["latitude"])
            longitude.append(response["restaurants"][i]["restaurant"]["location"]["longitude"])
            zip_code.append(response["restaurants"][i]["restaurant"]["location"]["zipcode"])
            cuisines.append(response["restaurants"][i]["restaurant"]['cuisines'])
            price_range.append(response["restaurants"][i]["restaurant"]['price_range'])
            average_cost_for_two.append(response["restaurants"][i]["restaurant"]['average_cost_for_two'])
            user_rating.append(response["restaurants"][i]["restaurant"]["user_rating"]['aggregate_rating'])
            rating_text.append(response["restaurants"][i]["restaurant"]["user_rating"]['rating_text'])
            votes.append(response["restaurants"][i]["restaurant"]["user_rating"]['votes'])
            all_reviews_count.append(response["restaurants"][i]["restaurant"]["all_reviews_count"])
        except (KeyError, IndexError):
            # Skip entries with a missing field, or batches that return fewer than 20 results
            print("restaurant not found! Skipping")
    
print("---------------------------")
print("Data Retrieval Complete")

#Convert Raw data to DataFrame

brisbane_df = pd.DataFrame({
    'Restaurant_id': restaurant_id,
    'Name': restaurant_name,
    'Locality':restaurant_locality,
    'Address':restaurant_address,
    'City': restaurant_city,
    'Latitude': latitude,
    'Longitude': longitude,
    'Zip Code': zip_code,
   'Cuisines' : cuisines,
    'Price Range' :price_range,
    'Average Cost for two' :average_cost_for_two,
    'User Rating' : user_rating,
    'Rating Text' : rating_text,
    "Votes" : votes,
    "all_reviews_count": all_reviews_count
})

brisbane_df.head()

---------------------------
Data Retrieval Complete


Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Zip Code,Cuisines,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count
0,16593535,Rogue Bistro,Newstead,"14 Austin Street, Newstead, Brisbane",Brisbane,-27.4459601711,153.0439606681,4006,Modern Australian,4,140,4.9,Excellent,492,206
1,16595540,Julius Pizzeria,South Brisbane,"77 Grey Street, South Brisbane, Brisbane",Brisbane,-27.473622216,153.0179620162,4101,"Pizza, Italian",3,80,4.9,Excellent,494,181
2,16590678,Oishii Sushi Bar,Sunnybank Hills,"Shop 2, 70 Pinelands Road, Sunnybank Hills, Br...",Brisbane,-27.5910676128,153.0605843291,4109,"Japanese, Sushi",2,40,4.8,Excellent,722,194
3,16590663,Grill'd - Rosalie,Rosalie Village,"Rosalie Village, Shop 19, 21 Nash Street, Padd...",Brisbane,-27.4654874434,152.9970585555,4064,Burger,2,40,4.8,Excellent,263,55
4,16594075,Bird's Nest Yakitori,South Brisbane,"Shop 5, 220 Melbourne Street, South Brisbane, ...",Brisbane,-27.4769737009,153.0130639672,4101,"Asian, Japanese, Tapas",3,80,4.7,Excellent,490,188


# Perth data

In [7]:
#Perth data

p = Pyzomato(z_key)

# set up other lists to hold data info
restaurant_name = []
restaurant_id = []
restaurant_address = []
restaurant_locality = []
restaurant_city = []
latitude = []
longitude = []
zip_code = []
cuisines = []
price_range = []
average_cost_for_two = []
user_rating = []
rating_text = []
votes = []
all_reviews_count = []

# Create a loop that iterates five times starting from 0th record and increases by increments of 20
for a in range(0,100,20):
    response = p.search(lat="-31.9505",lon="115.8605",
                           count=20,
                           sort='rating',
                           order='desc',
                           category=(10),
                           start=a)
    for i in range(0,20,1):
        try:
            restaurant_id.append(response["restaurants"][i]["restaurant"]['id'])
            restaurant_name.append(response["restaurants"][i]["restaurant"]['name'])
            restaurant_locality.append(response["restaurants"][i]["restaurant"]["location"]["locality"])
            restaurant_address.append(response["restaurants"][i]["restaurant"]["location"]["address"])
            restaurant_city.append(response["restaurants"][i]["restaurant"]["location"]["city"])
            latitude.append(response["restaurants"][i]["restaurant"]["location"]["latitude"])
            longitude.append(response["restaurants"][i]["restaurant"]["location"]["longitude"])
            zip_code.append(response["restaurants"][i]["restaurant"]["location"]["zipcode"])
            cuisines.append(response["restaurants"][i]["restaurant"]['cuisines'])
            price_range.append(response["restaurants"][i]["restaurant"]['price_range'])
            average_cost_for_two.append(response["restaurants"][i]["restaurant"]['average_cost_for_two'])
            user_rating.append(response["restaurants"][i]["restaurant"]["user_rating"]['aggregate_rating'])
            rating_text.append(response["restaurants"][i]["restaurant"]["user_rating"]['rating_text'])
            votes.append(response["restaurants"][i]["restaurant"]["user_rating"]['votes'])
            all_reviews_count.append(response["restaurants"][i]["restaurant"]["all_reviews_count"])
        except (KeyError, IndexError):
            # Skip entries with a missing field, or batches that return fewer than 20 results
            print("restaurant not found! Skipping")
    
print("---------------------------")
print("Data Retrieval Complete")

#Convert Raw data to DataFrame

perth_df = pd.DataFrame({
    'Restaurant_id': restaurant_id,
    'Name': restaurant_name,
    'Locality':restaurant_locality,
    'Address':restaurant_address,
    'City': restaurant_city,
    'Latitude': latitude,
    'Longitude': longitude,
    'Zip Code': zip_code,
   'Cuisines' : cuisines,
    'Price Range' :price_range,
    'Average Cost for two' :average_cost_for_two,
    'User Rating' : user_rating,
    'Rating Text' : rating_text,
    "Votes" : votes,
    "all_reviews_count": all_reviews_count
})

perth_df.head()

---------------------------
Data Retrieval Complete


Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Zip Code,Cuisines,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count
0,16596036,Ha-Lu,"Oxford Street, Leederville","4/401 Oxford Street, Mount Hawthorn, Perth",Perth,-31.9233773227,115.8411462978,6016,"Japanese, Tapas",3,60,4.9,Excellent,1067,241
1,16598837,Run Amuk,"Orient Street, South Fremantle","386A South Terrace, South Fremantle, Fremantle...",Perth,-32.0722768677,115.7530652359,6162,Fast Food,2,50,4.9,Excellent,1038,340
2,16597513,Pacific Rim Mix Plate,Applecross,"Shop B, 755 Canning Highway, Applecross, Melvi...",Perth,-32.0219001116,115.8322872967,6153,"Hawaiian, Japanese",2,40,4.9,Excellent,731,211
3,16598976,Marumo,Nedlands,"22/145 Stirling Hwy, Nedlands, Nedlands & Dalk...",Perth,-31.9802157847,115.7966588438,6907,"Japanese, Seafood, Modern Australian",1,0,4.9,Excellent,437,140
4,16598168,Nobu Perth,Burswood,"Crown Metropol Perth, Great Eastern Highway, B...",Perth,-31.9605255258,115.8940253779,6100,"Japanese, Sushi",4,410,4.8,Excellent,1822,659


# Adelaide data

In [8]:
#Adelaide data

p = Pyzomato(z_key)

# set up other lists to hold data info
restaurant_name = []
restaurant_id = []
restaurant_address = []
restaurant_locality = []
restaurant_city = []
latitude = []
longitude = []
zip_code = []
cuisines = []
price_range = []
average_cost_for_two = []
user_rating = []
rating_text = []
votes = []
all_reviews_count = []

# Create a loop that iterates five times starting from 0th record and increases by increments of 20
for a in range(0,100,20):
    response = p.search(lat="-34.9285",lon="138.6007",
                           count=20,
                           sort='rating',
                           order='desc',
                           category=(10),
                           start=a)
    for i in range(0,20,1):
        try:
            restaurant_id.append(response["restaurants"][i]["restaurant"]['id'])
            restaurant_name.append(response["restaurants"][i]["restaurant"]['name'])
            restaurant_locality.append(response["restaurants"][i]["restaurant"]["location"]["locality"])
            restaurant_address.append(response["restaurants"][i]["restaurant"]["location"]["address"])
            restaurant_city.append(response["restaurants"][i]["restaurant"]["location"]["city"])
            latitude.append(response["restaurants"][i]["restaurant"]["location"]["latitude"])
            longitude.append(response["restaurants"][i]["restaurant"]["location"]["longitude"])
            zip_code.append(response["restaurants"][i]["restaurant"]["location"]["zipcode"])
            cuisines.append(response["restaurants"][i]["restaurant"]['cuisines'])
            price_range.append(response["restaurants"][i]["restaurant"]['price_range'])
            average_cost_for_two.append(response["restaurants"][i]["restaurant"]['average_cost_for_two'])
            user_rating.append(response["restaurants"][i]["restaurant"]["user_rating"]['aggregate_rating'])
            rating_text.append(response["restaurants"][i]["restaurant"]["user_rating"]['rating_text'])
            votes.append(response["restaurants"][i]["restaurant"]["user_rating"]['votes'])
            all_reviews_count.append(response["restaurants"][i]["restaurant"]["all_reviews_count"])
        except (KeyError, IndexError):
            # Skip entries with a missing field, or batches that return fewer than 20 results
            print("restaurant not found! Skipping")
    
print("---------------------------")
print("Data Retrieval Complete")

#Convert Raw data to DataFrame

adelaide_df = pd.DataFrame({
    'Restaurant_id': restaurant_id,
    'Name': restaurant_name,
    'Locality':restaurant_locality,
    'Address':restaurant_address,
    'City': restaurant_city,
    'Latitude': latitude,
    'Longitude': longitude,
    'Zip Code': zip_code,
   'Cuisines' : cuisines,
    'Price Range' :price_range,
    'Average Cost for two' :average_cost_for_two,
    'User Rating' : user_rating,
    'Rating Text' : rating_text,
    "Votes" : votes,
    "all_reviews_count": all_reviews_count
})
adelaide_df.head()

---------------------------
Data Retrieval Complete


Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Zip Code,Cuisines,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count
0,16588873,Peel St,"Peel Street, City Centre","9 Peel Street, City Centre, Adelaide",Adelaide,-34.9233805556,138.5979472222,5000,"Asian, Middle Eastern, Modern Australian",4,110,4.9,Excellent,391,156
1,16588993,Orana,"Rundle Street, City Centre","285 Rundle Street, Adelaide",Adelaide,-34.922626,138.610128,5000,Australian,4,350,4.9,Excellent,153,78
2,16587626,Mandoo,"Bank Street, City Centre","3/26 Bank Street, Adelaide, SA",Adelaide,-34.922517543,138.5974917188,5000,"Korean, Dumplings",2,45,4.8,Excellent,764,290
3,16587409,Indian Temptations,"Main North Road, Enfield","490 Main North Road, Blair Athol",Adelaide,-34.8526305556,138.6004333333,5085,Indian,2,50,4.8,Excellent,403,156
4,16587463,Parwana Afghan Restaurant,"Henley Beach Road, Torrensville","124b Henley Beach Road, Torrensville, Adelaide",Adelaide,-34.9237361111,138.5674583333,5031,"Middle Eastern, Afghan",3,80,4.7,Excellent,606,187


# Combining the five dataframes together

In [9]:
#Concatenate the five dataframes into one called "master"

frames = [sydney_df, melbourne_df, brisbane_df, perth_df, adelaide_df]

master = pd.concat(frames)
master.to_csv('output_data/restaurants_category_10_combined_cities.csv')
master.count()

Restaurant_id           500
Name                    500
Locality                500
Address                 500
City                    500
Latitude                500
Longitude               500
Zip Code                500
Cuisines                500
Price Range             500
Average Cost for two    500
User Rating             500
Rating Text             500
Votes                   500
all_reviews_count       500
dtype: int64
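
By default `pd.concat` keeps each frame's own index, so the combined frame repeats the labels of its parts, and `to_csv` writes that index as an unnamed first column, which `read_csv` later reads back as `Unnamed: 0`. The toy frames below are illustrative only; they show the default behaviour next to `ignore_index=True`, which renumbers the rows.

```python
import pandas as pd

a = pd.DataFrame({"Name": ["A", "B"]})
b = pd.DataFrame({"Name": ["C", "D"]})

stacked = pd.concat([a, b])                        # index: 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)  # index: 0, 1, 2, 3

print(list(stacked.index))
print(list(renumbered.index))
```

Passing `index=False` to `to_csv` is another option if the index column is not wanted in the file at all.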

# Cleaning the data

In [10]:
five_cities = pd.read_csv("output_data/restaurants_category_10_combined_cities.csv")

# Steps:
Dropping irrelevant columns,
removing duplicates,
removing any row with missing values in critical fields using dropna.
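
The three steps can be sketched on a toy frame (the data here is made up for illustration). Each pandas call returns a new DataFrame rather than modifying the original, so the result has to be assigned to a name to take effect.

```python
import pandas as pd

raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "Restaurant_id": [1, 1, 2, 3],
    "Address": ["x", "x", "y", "z"],
    "Votes": [10, 10, None, 5],
})

# Step 1: keep only the relevant columns
step1 = raw[["Restaurant_id", "Address", "Votes"]]
# Step 2: drop repeated restaurants
step2 = step1.drop_duplicates(subset=["Restaurant_id", "Address"])
# Step 3: drop rows missing a critical field
step3 = step2.dropna(subset=["Votes"], how="any")

print(len(raw), len(step1), len(step2), len(step3))  # 4 4 3 2
```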

In [11]:
# Selecting the columns to keep, which drops the irrelevant ones (the unnamed index column and Zip Code)
combined = five_cities[['Restaurant_id', "Name", "Locality", "Address", "City", "Latitude", "Longitude", "Price Range",
                      "Average Cost for two", "User Rating", "Rating Text", "Votes", "all_reviews_count", "Cuisines"]]

# Removing duplicate restaurants, identified by Restaurant_id and Address
final_clean = combined.drop_duplicates(subset=['Restaurant_id', 'Address'], keep="first")

# Dropping rows with empty cells in critical fields
# (dropna returns a new DataFrame, so the result must be assigned)
final_clean = final_clean.dropna(subset=['Price Range', 'Average Cost for two', 'User Rating', 'Rating Text', 'Votes', 'all_reviews_count'],
                                 how='any')

final_clean.head()

# Removing irrelevant values

In [12]:
# Excluding rows whose average cost for two is one of the invalid values 0 or 25000017
cleaned_df = final_clean.loc[(final_clean["Average Cost for two"] != 0) &
                        (final_clean["Average Cost for two"] != 25000017)]
cleaned_df.to_csv('output_data/Top_497.csv')
cleaned_df.head()

Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count,Cuisines
0,15547004,Restaurant Hubert,CBD,"15 Bligh Street, CBD, Sydney",Sydney,-33.865348,151.210624,4,150,4.9,Excellent,589,224,"French, European"
1,16558798,Quay,Circular Quay,"Upper Level, Overseas Passenger Terminal 5 Hic...",Sydney,-33.858029,151.20997,4,500,4.9,Excellent,1366,454,Modern Australian
2,16559171,Tetsuya's,CBD,"529 Kent Street, CBD, Sydney",Sydney,-33.875143,151.204932,4,440,4.9,Excellent,1235,329,Japanese
3,16569454,LuMi Bar & Dining,Pyrmont,"56 Pirrama Road, \tPyrmont, Pyrmont, Sydney",Sydney,-33.867137,151.197517,4,190,4.9,Excellent,452,196,"Italian, Japanese"
4,15545439,Manpuku,Chatswood,"226 Victoria Avenue, Chatswood, Sydney",Sydney,-33.794417,151.189542,2,40,4.9,Excellent,486,190,"Japanese, Ramen"
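
The same filter can be written with `isin`, which keeps the excluded values together in one list. This is only an equivalent sketch on made-up rows; `INVALID_COSTS` is a hypothetical name holding the two values excluded in the cell above (a cost of 0 appears, for example, in the Marumo row of the Perth output).

```python
import pandas as pd

INVALID_COSTS = [0, 25000017]  # the two values excluded in the notebook

toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Average Cost for two": [150, 0, 25000017],
})

# ~ negates the mask, keeping rows whose cost is NOT in the invalid list
valid = toy[~toy["Average Cost for two"].isin(INVALID_COSTS)]
print(valid["Name"].tolist())  # ['A']
```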


# Printing how many rows were lost during the process

In [13]:
# Total number of restaurants before cleaning
total_restaurants = len(combined)
# Row count after dropping exact duplicate rows
# (note: this compares whole rows, while final_clean was deduplicated on Restaurant_id and Address)
dup_result = len(combined.drop_duplicates())
# Number of duplicate rows
dup_amount = total_restaurants - dup_result
new_rest_count1 = dup_result
# Row count after dropping rows with missing values in critical fields
dropna_count = len(final_clean.dropna(subset = ['Price Range','Average Cost for two','User Rating','Rating Text','Votes',
                                             'all_reviews_count'], how = 'any'))
items_dropped = new_rest_count1 - dropna_count
new_rest_count2 = dropna_count

# Rows removed because of invalid values in average cost for two
drop_invalid = new_rest_count2 - len(cleaned_df)

# Tallying invalid totals
invalid_count = drop_invalid + items_dropped

# Total rows removed
rows_removed = dup_amount + invalid_count
# Showing how many rows were removed during the cleaning process
print(f'Following the cleaning process, {rows_removed} rows were removed as there was {dup_amount} duplicates and {invalid_count} invalid values')


Following the cleaning process, 3 rows were removed as there was 0 duplicates and 3 invalid values
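
The bookkeeping above amounts to comparing the row count before and after each cleaning step. A compact sketch of that idea, on made-up data and with a hypothetical helper `count_removed`, could look like this:

```python
import pandas as pd


def count_removed(steps):
    """Given ordered (label, DataFrame) pairs, return rows removed at each step after the first."""
    removed = {}
    for (_, before), (label, after) in zip(steps, steps[1:]):
        removed[label] = len(before) - len(after)
    return removed


raw = pd.DataFrame({
    "Restaurant_id": [1, 1, 2, 3, 4],
    "Address": ["x", "x", "y", "z", "w"],
    "Average Cost for two": [50, 50, None, 0, 80],
})
deduped = raw.drop_duplicates(subset=["Restaurant_id", "Address"])
no_missing = deduped.dropna(subset=["Average Cost for two"])
valid = no_missing[no_missing["Average Cost for two"] != 0]

removed = count_removed([("raw", raw), ("duplicates", deduped),
                         ("missing", no_missing), ("invalid", valid)])
print(removed)  # {'duplicates': 1, 'missing': 1, 'invalid': 1}
```

Because each count is taken from the frames actually produced by the cleaning steps, the per-step numbers always add up to the total difference between the first and last frame.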


# Creating a DataFrame of the top 100 restaurants across all cities

In [14]:
#Selecting the top 100 restaurants in Australia based on User ratings and number of votes
ranked_df = cleaned_df.sort_values(["User Rating", "Votes"], ascending = False)

#Reset the index
new_index_ranked = ranked_df.reset_index(drop=True)

#Select the top 100 restaurants
top_100 = new_index_ranked.head(100)
top_100.to_csv('output_data/Top_100.csv')
top_100

Unnamed: 0,Restaurant_id,Name,Locality,Address,City,Latitude,Longitude,Price Range,Average Cost for two,User Rating,Rating Text,Votes,all_reviews_count,Cuisines
0,16572612,Vue de monde,CBD,"Level 55, Rialto, 525 Collins Street, CBD, Mel...",Melbourne,-37.818954,144.957934,4,600,4.9,Excellent,3225,987,"Australian, Contemporary"
1,16585905,Tipo 00,CBD,"361 Little Bourke Street, CBD, Melbourne",Melbourne,-37.813528,144.961973,4,150,4.9,Excellent,1927,717,Italian
2,17881527,Dexter,Preston,"456 High Street, Preston, Melbourne",Melbourne,-37.736196,145.004456,4,110,4.9,Excellent,1475,685,"American, BBQ"
3,16558798,Quay,Circular Quay,"Upper Level, Overseas Passenger Terminal 5 Hic...",Sydney,-33.858029,151.209970,4,500,4.9,Excellent,1366,454,Modern Australian
4,16559171,Tetsuya's,CBD,"529 Kent Street, CBD, Sydney",Sydney,-33.875143,151.204932,4,440,4.9,Excellent,1235,329,Japanese
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,16590137,Zum Kaiser,Woolloongabba,"416 Vulture Street, Woolloongabba, Brisbane",Brisbane,-27.484477,153.036863,3,60,4.7,Excellent,296,83,German
96,16561357,Ormeggio At The Spit,"D'Albora Marinas, Mosman","D'Albora Marinas The Spit, Spit Road, Mosman, ...",Sydney,-33.804225,151.245839,4,300,4.7,Excellent,286,131,Italian
97,16574463,Katik Take Away Food,Campbellfield,"349 Barry Road, Campbellfield, Melbourne, VIC",Melbourne,-37.666874,144.948233,2,40,4.7,Excellent,284,109,"Middle Eastern, Turkish"
98,16589254,127 Days,Croydon Park,"127 Days Road, Croydon Park, Adelaide",Adelaide,-34.875846,138.566375,2,50,4.7,Excellent,267,123,"American, Burger, Sandwich"
