<h3 style="text-align: center">The capstone project for the final course in the Applied Data Science specialization</h3>


## Import required modules

In [1]:
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import json
import requests
import folium
import docx

## Introduction/Business Problem

I'll decide which of two cities, **City of Toronto** or **New York City**, is the better place to start a restaurant business.

This matters because there is little point in starting a business that is likely to fail.

This analysis is for anyone who wants to open a restaurant in one of these two cities.

## Data

I'll use the **Foursquare API.**

I'll compare the venues in the two cities. The city with fewer restaurants among its top venues is the better one to open a restaurant in, because a new restaurant there faces fewer competitors.

I'll use the **explore** endpoint for doing this comparison.

The endpoint group is **venues.**

## Methodology

I'll get the data for the two cities from the **Foursquare API**, exploring the top 100 venues within a 5000m radius of each city. To do that I first need each city's geographic coordinates, which I'll get with the **geopy** library. I'll use **pandas** to build the dataframes and **folium** to map each city's data, which helps a lot in deciding which city is the better place to open a restaurant.

I chose a 5000m radius because there were no restaurants within a 500m radius of **City of Toronto**.
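To make the radius concrete, here is a minimal sketch that computes the great-circle (haversine) distance between the Toronto coordinates and the first Toronto restaurant row, both copied from outputs further down in this notebook. The helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions for illustration; Foursquare's own distance figures may differ slightly.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Coordinates copied from this notebook's outputs
toronto_center = (43.7170226, -79.41978303501344)
tutto_pronto = (43.728235, -79.418086)

distance = haversine_m(*toronto_center, *tutto_pronto)
print(f"{distance:.0f} m from the Toronto coordinates")
print("inside 500m radius:", distance <= 500)
print("inside 5000m radius:", distance <= 5000)
```

Under these assumptions the venue lies a little over 1.2 km from the geocoded point, so a 500m search would miss it while a 5000m search would include it.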

### First, let's get the coordinates

In [2]:
# Get Toronto location coordinates (geocode once and reuse the result)
t_address = "City of Toronto, ON"
t_locater = Nominatim(user_agent="tr_explorer")
t_location = t_locater.geocode(t_address)
t_latitude, t_longitude = t_location.latitude, t_location.longitude

# Get New York City location coordinates
ny_address = "New York City, NY"
n_locater = Nominatim(user_agent="ny_explorer")
n_location = n_locater.geocode(ny_address)
n_latitude, n_longitude = n_location.latitude, n_location.longitude
print(f"New York City coordinates are: [{n_latitude}, {n_longitude}]")
print(f"City of Toronto coordinates are: [{t_latitude}, {t_longitude}]")

New York City coordinates are: [40.7127281, -74.0060152]
City of Toronto coordinates are: [43.7170226, -79.41978303501344]


### Now, let's define our URLs

In [3]:
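# Important credentials: read them from the environment instead of hard-coding them.
# The environment variable names below are placeholders; set them to your own keys.
import os
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]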
VERSION = "20180608"
LIMIT = 100
RADIUS = 5000

# The URLs
ny_url = f"https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={n_latitude},{n_longitude}&radius={RADIUS}&limit={LIMIT}"

t_url = f"https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={t_latitude},{t_longitude}&radius={RADIUS}&limit={LIMIT}"
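
Since both URLs differ only in their coordinates, they could also be built from one parameter dictionary. The following is a minimal sketch using the standard library's `urlencode`; the function name `build_explore_url` and the placeholder credentials are assumptions for illustration, not part of the notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180608", radius=5000, limit=100):
    """Return an explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials, not real keys
url = build_explore_url("MY_ID", "MY_SECRET", 40.7127281, -74.0060152)
print(url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which is the standard encoding for a comma inside a query value.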

### Making the requests to the API

In [4]:
ny_results = requests.get(ny_url).json()
t_results = requests.get(t_url).json()
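
Before digging into the results, it can help to confirm that the request actually succeeded. The sketch below assumes the response JSON carries a `meta` object with a numeric `code` next to `response`; the helper name `explore_items` and the two sample dictionaries are hand-written stand-ins, not real API output.

```python
def explore_items(results):
    """Return the venue items from an explore response, or raise on failure."""
    meta = results.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError(f"Foursquare request failed: {meta}")
    groups = results.get("response", {}).get("groups", [])
    if not groups:
        return []
    return groups[0].get("items", [])


# Hand-written stand-ins for a successful and a failed response
ok = {"meta": {"code": 200},
      "response": {"groups": [{"items": [{"venue": {"name": "Kura"}}]}]}}
bad = {"meta": {"code": 400, "errorType": "invalid_auth"}, "response": {}}

print(len(explore_items(ok)))
try:
    explore_items(bad)
except RuntimeError as err:
    print(err)
```

With a check like this, a bad key or an exhausted quota surfaces as a clear error instead of a `KeyError` several cells later.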

In [5]:
ny_results

: [{'id': '4bf58dd8d48988d1dc931735',
         'name': 'Tea Room',
         'pluralName': 'Tea Rooms',
         'shortName': 'Tea Room',
         'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/tearoom_',
          'suffix': '.png'},
         'primary': True}],
       'photos': {'count': 0, 'groups': []}},
      'referralId': 'e-0-56292858498e5a271a288948-84'},
     {'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '510c85e7e4b0056826b88297',
       'name': 'Kura',
       'location': {'address': '130 Saint Marks Pl',
        'crossStreet': 'btwn 1st Ave & Ave A',
        'lat': 40.726802644699376,
        'lng': -73.98344407523645,
        'labeledLatLngs': [{'label': 'display',
          'lat': 40.726802644699376,
          'lng': -73.98344407523645}],
        'distance': 2466,
        'postalCode': '10009',
        'cc': 'US',
        'cit

### Define a helper function for later use

In [6]:
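def get_category_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # Rows built from the explore endpoint use the prefixed column name
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']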
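
A quick way to see what the helper does is to call it on a few hand-written rows. The rows below are plain dictionaries invented for illustration (only the keys the helper reads are present), and the function is repeated so the block runs on its own.

```python
def get_category_type(row):
    """Same logic as the notebook helper, repeated so this block runs alone."""
    try:
        categories_list = row["categories"]
    except KeyError:
        categories_list = row["venue.categories"]
    if len(categories_list) == 0:
        return None
    return categories_list[0]["name"]


# Hand-written rows covering the three cases the helper handles
flat_row = {"categories": [{"name": "Tea Room", "primary": True}]}
prefixed_row = {"venue.categories": [{"name": "Italian Restaurant"}]}
empty_row = {"venue.categories": []}

print(get_category_type(flat_row))
print(get_category_type(prefixed_row))
print(get_category_type(empty_row))
```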

### Get New York City venues

In [7]:
# Flatten the explore response into a dataframe of venues
ny_venues = ny_results['response']['groups'][0]['items']
ny_nearby_venues = pd.json_normalize(ny_venues)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
ny_nearby_venues = ny_nearby_venues.loc[:, filtered_columns]
ny_nearby_venues['venue.categories'] = ny_nearby_venues.apply(get_category_type, axis=1)
ny_nearby_venues.columns = [col.split(".")[-1] for col in ny_nearby_venues.columns]

# Keep venues whose category ends with "Restaurant"; copy to avoid SettingWithCopyWarning
is_restaurant = ny_nearby_venues.categories.str.endswith("Restaurant", na=False)
ny_restaurants = ny_nearby_venues[is_restaurant].copy()
ny_restaurants.reset_index(drop=True, inplace=True)
# Venues with the generic "Restaurant" category are relabelled as "Coffee shop"
ny_restaurants.replace("Restaurant", np.nan, inplace=True)
ny_restaurants.fillna("Coffee shop", inplace=True)
ny_restaurants.to_csv('ny_restaurants.csv')
ny_restaurants.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Crown Shy | Coffee shop | 40.706187 | -74.00749 |
| 1 | Le Coucou | French Restaurant | 40.719114 | -74.000202 |
| 2 | Kiki's | Greek Restaurant | 40.714476 | -73.992036 |
| 3 | Wayla | Thai Restaurant | 40.718291 | -73.992584 |
| 4 | CAVA | Mediterranean Restaurant | 40.721928 | -73.996512 |


### Get Toronto venues

In [8]:
# Flatten the explore response into a dataframe of venues
t_venues = t_results['response']['groups'][0]['items']
t_nearby_venues = pd.json_normalize(t_venues)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
t_nearby_venues = t_nearby_venues.loc[:, filtered_columns]
t_nearby_venues['venue.categories'] = t_nearby_venues.apply(get_category_type, axis=1)
t_nearby_venues.columns = [col.split(".")[-1] for col in t_nearby_venues.columns]

# Keep venues whose category ends with "Restaurant"; copy to avoid SettingWithCopyWarning
is_restaurant = t_nearby_venues.categories.str.endswith("Restaurant", na=False)
t_restaurants = t_nearby_venues[is_restaurant].copy()
t_restaurants.reset_index(drop=True, inplace=True)
# Venues with the generic "Restaurant" category are relabelled as "Coffee shop"
t_restaurants.replace("Restaurant", np.nan, inplace=True)
t_restaurants.fillna("Coffee shop", inplace=True)
t_restaurants.to_csv('t_restaurants.csv')
t_restaurants.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Tutto Pronto | Italian Restaurant | 43.728235 | -79.418086 |
| 1 | La Vecchia Ristorante | Italian Restaurant | 43.710167 | -79.399086 |
| 2 | Cibo Wine Bar | Italian Restaurant | 43.711464 | -79.39957 |
| 3 | Grazie Ristorante | Italian Restaurant | 43.709329 | -79.398823 |
| 4 | Balsamico | Italian Restaurant | 43.701505 | -79.397162 |
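
The New York City and Toronto cells repeat the same steps, so the filtering could live in one shared function. Below is a minimal pure-Python sketch of that idea (it swaps pandas for plain lists and dictionaries, and it does not relabel the generic "Restaurant" category); the function name `restaurants_from_items` and the sample venues are invented for illustration.

```python
def restaurants_from_items(items):
    """Flatten explore items into rows for venues categorised as restaurants."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        if category and category.endswith("Restaurant"):
            rows.append({
                "name": venue["name"],
                "categories": category,
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
            })
    return rows


# Invented sample items that mimic the shape of the explore response
sample_items = [
    {"venue": {"name": "Venue A",
               "categories": [{"name": "Italian Restaurant"}],
               "location": {"lat": 43.71, "lng": -79.40}}},
    {"venue": {"name": "Venue B",
               "categories": [{"name": "Tea Room"}],
               "location": {"lat": 43.72, "lng": -79.41}}},
    {"venue": {"name": "Venue C",
               "categories": [],
               "location": {"lat": 43.73, "lng": -79.42}}},
]

for row in restaurants_from_items(sample_items):
    print(row)
```

The same function would then be called once per city, for example on `ny_venues` and on `t_venues`.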


### Get the number of restaurants in each city

In [9]:
print(f'Number of restaurants in New York City: {ny_restaurants.shape[0]}')
print(f'Number of restaurants in City of Toronto: {t_restaurants.shape[0]}')

Number of restaurants in New York City: 19
Number of restaurants in City of Toronto: 22


To be more specific, let's count the restaurants in each category for the two cities.

#### New York City:

In [10]:
ny_restaurants.categories.value_counts()

Italian Restaurant          5
Mediterranean Restaurant    2
Thai Restaurant             2
French Restaurant           2
Coffee shop                 1
Falafel Restaurant          1
Greek Restaurant            1
Seafood Restaurant          1
Japanese Restaurant         1
Udon Restaurant             1
New American Restaurant     1
Moroccan Restaurant         1
Name: categories, dtype: int64

#### City of Toronto:

In [11]:
t_restaurants.categories.value_counts()

Italian Restaurant       10
Indian Restaurant         3
Japanese Restaurant       2
Coffee shop               1
Chinese Restaurant        1
Indonesian Restaurant     1
Fast Food Restaurant      1
Sushi Restaurant          1
Ramen Restaurant          1
French Restaurant         1
Name: categories, dtype: int64
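
The two listings are easier to compare side by side. The sketch below copies the counts from the two `value_counts()` outputs above into `Counter` objects and reports, for each category, which city has fewer restaurants of that type; the helper name `fewer_competitors` is an assumption for illustration.

```python
from collections import Counter

# Counts copied from the two value_counts() outputs above
ny_counts = Counter({
    "Italian Restaurant": 5, "Mediterranean Restaurant": 2,
    "Thai Restaurant": 2, "French Restaurant": 2, "Coffee shop": 1,
    "Falafel Restaurant": 1, "Greek Restaurant": 1, "Seafood Restaurant": 1,
    "Japanese Restaurant": 1, "Udon Restaurant": 1,
    "New American Restaurant": 1, "Moroccan Restaurant": 1,
})
t_counts = Counter({
    "Italian Restaurant": 10, "Indian Restaurant": 3,
    "Japanese Restaurant": 2, "Coffee shop": 1, "Chinese Restaurant": 1,
    "Indonesian Restaurant": 1, "Fast Food Restaurant": 1,
    "Sushi Restaurant": 1, "Ramen Restaurant": 1, "French Restaurant": 1,
})


def fewer_competitors(category):
    """Name the city with fewer restaurants of this category, or 'tie'."""
    ny, t = ny_counts[category], t_counts[category]  # missing keys count as 0
    if ny < t:
        return "New York City"
    if t < ny:
        return "City of Toronto"
    return "tie"


print(f"{'category':<26}{'NYC':>4}{'TOR':>4}  fewer competitors")
for category in sorted(set(ny_counts) | set(t_counts)):
    print(f"{category:<26}{ny_counts[category]:>4}{t_counts[category]:>4}"
          f"  {fewer_competitors(category)}")
```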

I don't think that's enough to determine which city is better, so let's create a map for each city.

### New York City map

In [12]:
# Centre the map on New York City and add one marker per restaurant
ny_map = folium.Map(location=[n_latitude, n_longitude], zoom_start=13)
for lat, lng, name, cat in zip(ny_restaurants.lat, ny_restaurants.lng, ny_restaurants.name, ny_restaurants.categories):
    label = folium.Popup(f'{name}, {cat}', parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="red",
        fill=True,
        fill_color="#650D1B",
        fill_opacity=0.8).add_to(ny_map)

ny_map

### City of Toronto map

In [13]:
# Centre the map on City of Toronto and add one marker per restaurant
t_map = folium.Map(location=[t_latitude, t_longitude], zoom_start=11)
for lat, lng, name, cat in zip(t_restaurants.lat, t_restaurants.lng, t_restaurants.name, t_restaurants.categories):
    label = folium.Popup(f'{name}, {cat}', parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="blue",
        fill=True,
        fill_color="#2364AA",
        fill_opacity=0.8).add_to(t_map)

t_map

## Results

To determine which city is better, we have to look at several findings.

* There are no restaurants within a 500m radius of **City of Toronto**, but there are 19 restaurants within a 500m radius of **New York City** in the top 100 venues.

* There are 22 restaurants within a 5000m radius of **City of Toronto** in the top 100 venues.

* There are 10 Italian restaurants around **City of Toronto** but only 5 around **New York City**.

* There are more distinct restaurant categories around **New York City** than around **City of Toronto**.

* The top 5 restaurants around **City of Toronto** are all Italian restaurants.

* There are 3 Indian restaurants around **City of Toronto** but no Indian restaurants around **New York City.**

## Discussion

**City of Toronto and New York City** are big cities, and many people will want to open a restaurant in one of them. Others may already own a restaurant in one of the two cities, want to move to the other, and fear that their business will fail there.

They have to know who their competitors are, what they serve, and where their businesses are located.

If they want their business to succeed in the city they move to, they have to make their choices carefully.

## Conclusion

Anyone who wants to open a restaurant in one of the two cities should keep these points in mind:

  * If they want to open the restaurant within a 500m radius, the better city is **City of Toronto**, because there are no restaurants within a 500m radius of it in the top 100 venues.

  * If distance doesn't matter and they could open it anywhere around the city, the choice depends on the type of restaurant:
  
    1. If they want to open an Italian or a Japanese restaurant, the better city is **New York City**, because there are fewer Italian and Japanese restaurants around **New York City** than around **City of Toronto** in the top 100 venues.

    2. If they want to open an Indian, Chinese, Sushi, Fast Food, Ramen or Indonesian restaurant, the better city is **New York City**, because there are fewer restaurants of these types around **New York City** than around **City of Toronto** in the top 100 venues.

    3. If they want to open a French restaurant, the better city is **City of Toronto**, because there are fewer French restaurants around **City of Toronto** than around **New York City** in the top 100 venues.

    4. If they want to open a Mediterranean, Thai, Seafood, Falafel, Moroccan, New American, Udon, or Greek restaurant, the better city is **City of Toronto**, because none of these types of restaurants appear around **City of Toronto** in the top 100 venues.

    5. If they want to open a coffee shop, neither city is better, because there is one coffee shop around each of the two cities in the top 100 venues.

    6. If they want to open any other type of restaurant, either city is fine.

So they have to decide what kind of restaurant they want before opening one in either of the two cities.