# Capstone Project - The Battle of the Neighborhoods

## Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [1. Introduction](#introduction)
    * [1.1 Background](#Background)
    * [1.2 Problem](#Problem)
    * [1.3 Interest](#Interest)
* [2. Data](#data)
    * [2.1 Data sources](#data_sources)
    * [2.2 Data acquisition and cleaning](#Data_acquisition_and_cleaning)
    * [2.3 Feature selection](#feature_selection)
* [3. Methodology](#methodology)
* [4. Analysis](#analysis)
* [5. Results and Discussion](#results)
* [6. Conclusion](#conclusion)

## 1. Introduction <a name="introduction"></a>

### 1.1 Background <a name="Background"></a>

&emsp; New entertainment venues, such as nightclubs, are constantly opening around the world. A nightclub is a place to relax, have fun, and meet interesting people. Such establishments can be quite profitable, and there is wide scope for creativity in choosing the design and theme of a club.

### 1.2 Problem <a name="Problem"></a>

&emsp; Competition in this market is intense. Despite the potential benefits, many establishments close because of one of the main mistakes made at the start: choosing the wrong location. In this project, we will choose the optimal location for a **nightclub in Moscow.**

### 1.3 Interest <a name="Interest"></a>

&emsp; This project will be of interest to entrepreneurs who have decided to open a nightclub in Moscow. It will also be of interest to landlords, who can use it to find nightclub tenants, set a reasonable price, and build a long-term relationship with the tenant.

## 2. Data <a name="data"></a>

### 2.1 Data sources <a name="data_sources"></a>

&emsp; Let's determine the factors that influence the choice of a good location for a nightclub:
* Number of nightclubs in the immediate vicinity
* Distance to the city center
* Ratings of clubs in the neighborhood

&emsp; Data sources for the project:
1. Coordinates are determined with the **geopy** library.
2. All nightclubs, their locations and ratings are obtained using the **Foursquare API**.

&emsp; We will limit the search to the area inside the Third Transport Ring, which is approximately 7 km in radius.

### 2.2 Data acquisition and cleaning <a name="Data_acquisition_and_cleaning"></a>

Let's import the libraries we need.

In [2]:
import pandas as pd 
import numpy as np
import json
import requests 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from geopy.geocoders import Nominatim
import folium

In [3]:
geolocator = Nominatim(user_agent="mos_explorer")
address = 'Red Square Moscow, Russia'

location = geolocator.geocode(address)

latitude = location.latitude
longitude = location.longitude

In [4]:
CLIENT_ID = 'ZXYIVVBGMBCQP204I5NPB3BVWWK3ORXAZWGOCNAE0N4BTEVX' # your Foursquare ID
CLIENT_SECRET = 'WVD2ZXL1M0P3NHNNMKMNMUTTRYVM53SDIPEACYQZIRJDBQAJ' # your Foursquare Secret
ACCESS_TOKEN = 'UXQXQ3NXUD1U5QI5OGG4EHHZ33NX1WBKBJJS4ZSC5JJHIH2B' # your FourSquare Access Token
VERSION = '20211001'
LIMIT = 50
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ZXYIVVBGMBCQP204I5NPB3BVWWK3ORXAZWGOCNAE0N4BTEVX
CLIENT_SECRET:WVD2ZXL1M0P3NHNNMKMNMUTTRYVM53SDIPEACYQZIRJDBQAJ
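
Printing API keys in a shared notebook exposes them to every reader. One common alternative is to read them from environment variables. The sketch below is a minimal illustration under that assumption; the function name `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_ACCESS_TOKEN` are hypothetical and not part of the original notebook or of the Foursquare API.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
CREDENTIAL_VARS = ('FOURSQUARE_CLIENT_ID',
                   'FOURSQUARE_CLIENT_SECRET',
                   'FOURSQUARE_ACCESS_TOKEN')

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret, access_token) read from a mapping of variables."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)

# Usage with an explicit mapping (placeholder values, not real keys):
example_env = {'FOURSQUARE_CLIENT_ID': 'id-placeholder',
               'FOURSQUARE_CLIENT_SECRET': 'secret-placeholder',
               'FOURSQUARE_ACCESS_TOKEN': 'token-placeholder'}
client_id, client_secret, access_token = load_foursquare_credentials(example_env)
print('CLIENT_ID loaded:', bool(client_id))
```

With this approach the notebook can confirm that the keys were found without echoing their values.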


Since the Foursquare API returns at most 50 venues per request, we place 4 points equidistant from the center and run radius searches around them that overlap each other. Because the distances are small, we use a simplification: 1 degree of latitude = 111.1 km, and 1 degree of longitude is approximately 64 km.

In [5]:
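rad = 7  # radius of the search area, kilometres
def four_points_mos(lat, lon):
    """Return four points offset by rad/2 km to the north, east, south and west of (lat, lon)."""
    lat_one_min = 111.1/60  # km per minute of latitude
    lon_one_min = 64/60     # km per minute of longitude near Moscow (approximation)
    # convert the km offset to minutes of arc, then to degrees
    new_lat_1 = lat +(((rad/2)/lat_one_min)/60) 
    new_lon_1 = lon +(((rad/2)/lon_one_min)/60)
    new_lat_2 = lat -(((rad/2)/lat_one_min)/60) 
    new_lon_2 = lon -(((rad/2)/lon_one_min)/60) 
    points = [[new_lat_1, lon],
              [lat, new_lon_1],
              [new_lat_2 , lon],
              [lat, new_lon_2]]
    return points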
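
The degree-to-kilometre simplification can be checked with a short calculation. The sketch below assumes a spherical Earth and reuses the notebook's 111.1 km per degree of latitude; the helper names `km_per_degree_longitude` and `offset_degrees` are illustrative and do not appear in the original code. At the latitude of Red Square the cosine rule gives roughly 62 to 63 km per degree of longitude, slightly below the rounded 64 km used here.

```python
import math

KM_PER_DEG_LAT = 111.1  # value used in this notebook

def km_per_degree_longitude(lat_deg, km_per_deg_lat=KM_PER_DEG_LAT):
    """Approximate length of one degree of longitude at a given latitude."""
    return km_per_deg_lat * math.cos(math.radians(lat_deg))

def offset_degrees(distance_km, lat_deg):
    """Return (delta_lat, delta_lon) in degrees for an offset of distance_km."""
    delta_lat = distance_km / KM_PER_DEG_LAT
    delta_lon = distance_km / km_per_degree_longitude(lat_deg)
    return delta_lat, delta_lon

red_square_lat = 55.7536  # approximate latitude of Red Square
print(km_per_degree_longitude(red_square_lat))
print(offset_degrees(3.5, red_square_lat))
```

The difference between the two longitude factors shifts the east and west points by well under 100 metres, which is small compared with the 4 km search radius.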

In [6]:
new_coordinate = four_points_mos(latitude, longitude)
new_coordinate.append([latitude, longitude])
new_coordinate

[[55.785131450315035, 37.62137960067377],
 [55.7536283, 37.67606710067377],
 [55.72212514968497, 37.62137960067377],
 [55.7536283, 37.56669210067377],
 [55.7536283, 37.62137960067377]]

To ensure overlap, we take a search radius 500 metres larger than the offset of the points (3.5 km + 0.5 km = 4 km). The nightclub category ID for the search is taken from the Foursquare website. Because the search zones overlap, we also remove duplicate clubs.

In [7]:
radius = 4000
category_id = '4bf58dd8d48988d11f941735'
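
How much of the 7 km area these overlapping searches reach can be estimated numerically. The sketch below is a rough check under stated assumptions: it works in a flat plane measured in kilometres, places search circles at the center and at 3.5 km to the north, east, south and west, and counts grid cells. The function `coverage_fraction` and its parameters are illustrative names, not part of the original notebook.

```python
import numpy as np

def coverage_fraction(area_radius_km=7.0, search_radius_km=4.0,
                      offset_km=3.5, step_km=0.05):
    """Share of a disc of area_radius_km that lies inside at least one search circle."""
    centres = [(0.0, 0.0),
               (0.0, offset_km), (offset_km, 0.0),
               (0.0, -offset_km), (-offset_km, 0.0)]
    axis = np.arange(-area_radius_km, area_radius_km + step_km, step_km)
    xx, yy = np.meshgrid(axis, axis)
    inside_area = xx**2 + yy**2 <= area_radius_km**2
    covered = np.zeros_like(inside_area)
    for cx, cy in centres:
        # mark grid cells that fall within this search circle
        covered |= (xx - cx)**2 + (yy - cy)**2 <= search_radius_km**2
    return (covered & inside_area).sum() / inside_area.sum()

print(f'covered share of the 7 km disc: {coverage_fraction():.2f}')
```

Under these assumptions the circles do not reach the outer parts of the disc along the diagonals, so a venue count near the ring in those directions may be incomplete.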

In [8]:
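def table_venues(radius, category, locations, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION, LIMIT=50):
    """Search venues of one category around each location and return them without duplicates."""
    frames = []
    for lat, long in locations:
        # pass query parameters via `params` so no stray whitespace ends up in the URL
        params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'oauth_token': ACCESS_TOKEN,
                  'v': VERSION, 'll': f'{lat},{long}', 'categoryId': category,  # use the argument, not the global
                  'radius': radius, 'limit': LIMIT}
        response = requests.get('https://api.foursquare.com/v2/venues/search', params=params)
        result = pd.json_normalize(response.json()['response']['venues'])
        print(result.shape)
        frames.append(result)
    # the search zones overlap, so the same venue can appear more than once
    df = pd.concat(frames, axis=0).drop_duplicates(subset='id')
    return df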

In [9]:
nightclubs = table_venues(radius, category_id, new_coordinate, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION)

  


(50, 19)
(49, 19)
(48, 19)
(49, 19)
(49, 19)


Let's examine the resulting dataset.

In [10]:
nightclubs.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.address,location.postalCode,location.city,location.state,location.formattedAddress,location.crossStreet,location.neighborhood,venuePage.id
0,4f4f4800e4b09d63cc3b57bd,Shanti Chiilout,"[{'id': '4bf58dd8d48988d11f941735', 'name': 'N...",v-1610543437,False,55.789855,37.634923,"[{'label': 'display', 'lat': 55.78985484610564...",997.0,RU,Россия,,,,,,,,
1,4fdcc6e3e4b09473e194b08e,"Клуб ""Ленинград""","[{'id': '4bf58dd8d48988d11f941735', 'name': 'N...",v-1610543437,False,55.819021,37.649485,"[{'label': 'display', 'lat': 55.81902078048416...",4162.0,RU,Россия,,,,,,,,
2,5bbe29c7deb4950025f20840,Клуб Город,"[{'id': '5032792091d4c4b30a586d5c', 'name': 'C...",v-1610543437,False,55.783303,37.596547,"[{'label': 'display', 'lat': 55.78330271134569...",1567.0,RU,Россия,Лесная ул. 30а,125047.0,Москва,Москва,"[Лесная ул. 30а, 125047, Москва]",,,
3,5e63e29d29ef8200088b8dd4,Moskova,"[{'id': '4bf58dd8d48988d11f941735', 'name': 'N...",v-1610543437,False,55.755825,37.6173,"[{'label': 'display', 'lat': 55.755825, 'lng':...",3272.0,RU,Россия,,,,,,,,
4,5d2759425459f20023a4b94f,ТехникаБезОпасности,"[{'id': '4bf58dd8d48988d116941735', 'name': 'B...",v-1610543437,False,55.782595,37.600707,"[{'label': 'display', 'lat': 55.782595, 'lng':...",1324.0,RU,Россия,21с10 Сущёвская улица,127030.0,Москва,Москва,"[21с10 Сущёвская улица (Бар), 127030, Москва]",Бар,,


### 2.3 Feature selection <a name="feature_selection"></a>

Let's check the number of nightclubs and features obtained.

In [11]:
nightclubs.shape

(94, 19)

We ended up with 94 clubs and 19 features.

Let's select only the features we need for further analysis.

In [12]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in nightclubs.columns if col.startswith('location.')] + ['id']
nightclubs_filtered = nightclubs.loc[:, filtered_columns].copy()  # copy to avoid SettingWithCopyWarning below

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
nightclubs_filtered['categories'] = nightclubs_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
nightclubs_filtered.columns = [column.split('.')[-1] for column in nightclubs_filtered.columns]

nightclubs_filtered.head()

Unnamed: 0,name,categories,lat,lng,labeledLatLngs,distance,cc,country,address,postalCode,city,state,formattedAddress,crossStreet,neighborhood,id
0,Shanti Chiilout,Nightclub,55.789855,37.634923,"[{'label': 'display', 'lat': 55.78985484610564...",997.0,RU,Россия,,,,,,,,4f4f4800e4b09d63cc3b57bd
1,"Клуб ""Ленинград""",Nightclub,55.819021,37.649485,"[{'label': 'display', 'lat': 55.81902078048416...",4162.0,RU,Россия,,,,,,,,4fdcc6e3e4b09473e194b08e
2,Клуб Город,Concert Hall,55.783303,37.596547,"[{'label': 'display', 'lat': 55.78330271134569...",1567.0,RU,Россия,Лесная ул. 30а,125047.0,Москва,Москва,"[Лесная ул. 30а, 125047, Москва]",,,5bbe29c7deb4950025f20840
3,Moskova,Nightclub,55.755825,37.6173,"[{'label': 'display', 'lat': 55.755825, 'lng':...",3272.0,RU,Россия,,,,,,,,5e63e29d29ef8200088b8dd4
4,ТехникаБезОпасности,Bar,55.782595,37.600707,"[{'label': 'display', 'lat': 55.782595, 'lng':...",1324.0,RU,Россия,21с10 Сущёвская улица,127030.0,Москва,Москва,"[21с10 Сущёвская улица (Бар), 127030, Москва]",Бар,,5d2759425459f20023a4b94f


In [13]:
need_columns = ['name', 'lat', 'lng', 'address']
final_nightclubs = nightclubs_filtered[need_columns]

In [14]:
final_nightclubs.head()

| | name | lat | lng | address |
|---|---|---|---|---|
| 0 | Shanti Chiilout | 55.789855 | 37.634923 | |
| 1 | Клуб "Ленинград" | 55.819021 | 37.649485 | |
| 2 | Клуб Город | 55.783303 | 37.596547 | Лесная ул. 30а |
| 3 | Moskova | 55.755825 | 37.6173 | |
| 4 | ТехникаБезОпасности | 55.782595 | 37.600707 | 21с10 Сущёвская улица |


We now have a prepared dataset with the information we need. Let's display all the clubs on a map for clarity.

In [15]:
mos_map = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, name in zip(final_nightclubs['lat'], final_nightclubs['lng'], final_nightclubs['name']):
    label = f'{name}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(mos_map) 

# mark the city center once, outside the loop
folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    popup='Red Square',
    color='red',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(mos_map)
    
mos_map

In [16]:
# !pip install haversine

*Let's create a checkpoint for convenience.*

In [17]:
nc = final_nightclubs.copy() #Checkpoint

Let's compute the distance from each venue to the center in kilometres.

In [18]:
from haversine import haversine, Unit

distance_center = []
for lat, long in zip(nc['lat'], nc['lng']):
    distance_center.append(haversine([latitude, longitude], [lat, long]))

nc['distance_to_the_center'] = distance_center

In [19]:
nc.head()

| | name | lat | lng | address | distance_to_the_center |
|---|---|---|---|---|---|
| 0 | Shanti Chiilout | 55.789855 | 37.634923 | | 4.116315 |
| 1 | Клуб "Ленинград" | 55.819021 | 37.649485 | | 7.480641 |
| 2 | Клуб Город | 55.783303 | 37.596547 | Лесная ул. 30а | 3.646983 |
| 3 | Moskova | 55.755825 | 37.6173 | | 0.353312 |
| 4 | ТехникаБезОпасности | 55.782595 | 37.600707 | 21с10 Сущёвская улица | 3.470835 |
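
For readers without the `haversine` package, the distance calculation can be reproduced with the standard library. The sketch below is a plain implementation of the haversine formula; the Earth radius constant is an assumption of this sketch, and `haversine_km` is an illustrative name rather than the library function. It uses the Red Square coordinates and the rounded coordinates of the "Moskova" row from the table.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch

def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

red_square = (55.7536283, 37.62137960067377)
moskova = (55.755825, 37.6173)
print(haversine_km(red_square, moskova))
```

The result should be close to the `distance_to_the_center` value shown for that row, with small differences because the table coordinates are rounded.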


In [None]:
# Venue details endpoint: https://api.foursquare.com/v2/venues/VENUE_ID

In [57]:
def venues_rating(venue_ids, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION):
    """Return a list with the rating of each venue id (0 when no rating is available)."""
    ratings = []
    for venue_id in venue_ids:
        url = f'https://api.foursquare.com/v2/venues/{venue_id}?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&oauth_token={ACCESS_TOKEN}&v={VERSION}'
        try:
            # 'rating' is a plain number, so it is read directly rather than passed to json_normalize
            rating = requests.get(url).json()['response']['venue']['rating']
            ratings.append(rating)
        except (KeyError, ValueError, requests.RequestException):
            # no rating in the response (for example, an error response) or the request failed
            ratings.append(0)
    return ratings

In [56]:
ratings = venues_rating(nightclubs_filtered['id'], CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION)

  import sys
From cffi callback <function _verify_callback at 0x000001CFC6210F78>:
Traceback (most recent call last):
  File "C:\Users\volon\anaconda3\lib\site-packages\OpenSSL\SSL.py", line 305, in wrapper
    @wraps(callback)
KeyboardInterrupt


In [21]:
nightclubs_filtered['id']

0     4f4f4800e4b09d63cc3b57bd
1     4fdcc6e3e4b09473e194b08e
2     5bbe29c7deb4950025f20840
3     5e63e29d29ef8200088b8dd4
4     5d2759425459f20023a4b94f
                ...           
17    5d86c85c6c23c00007954932
20    4ff7346ee4b055f897147ffd
41    59a1b78c47f8765b4377c3f9
42    4d75305194985481f7d8337e
48    4d2b7b3f888af04daf54e0af
Name: id, Length: 94, dtype: object

In [None]:
ratings = []

In [65]:
url = f'https://api.foursquare.com/v2/venues/4d2b7b3f888af04daf54e0af?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&oauth_token={ACCESS_TOKEN}&v={VERSION}'

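try:
    # 'rating' is a plain number, so it is read directly rather than passed to json_normalize
    result = requests.get(url).json()['response']['venue']['rating']
    ratings.append(result)
except KeyError:
    # no 'venue' or 'rating' key in the response (for example, an error response)
    ratings.append(0)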

In [59]:
result

Unnamed: 0,notifications,meta.code,meta.requestId,response.venue.id,response.venue.name,response.venue.location.lat,response.venue.location.lng,response.venue.location.labeledLatLngs,response.venue.location.cc,response.venue.location.country,...,response.venue.attributes.groups,response.venue.bestPhoto.id,response.venue.bestPhoto.createdAt,response.venue.bestPhoto.source.name,response.venue.bestPhoto.source.url,response.venue.bestPhoto.prefix,response.venue.bestPhoto.suffix,response.venue.bestPhoto.width,response.venue.bestPhoto.height,response.venue.bestPhoto.visibility
0,"[{'type': 'notificationTray', 'item': {'unread...",200,5ffef3b51c80e062c5e8ab7c,4f4f4800e4b09d63cc3b57bd,Shanti Chiilout,55.789855,37.634923,"[{'label': 'display', 'lat': 55.78985484610564...",RU,Россия,...,"[{'type': 'price', 'name': 'Price', 'summary':...",5b29e25882a750002c143ddd,1529471576,Swarm for iOS,https://www.swarmapp.com,https://fastly.4sqi.net/img/general/,/25053138_gd45lEr60Xa7UpsdmkvUYigmkXDRa_ZA9xEc...,1440,1920,public


In [66]:
ratings

[0, 0, 0, 0]

In [67]:
requests.get(url).json()

{'meta': {'code': 429,
  'errorType': 'quota_exceeded',
  'errorDetail': 'Quota exceeded',
  'requestId': '5ffef5733253c51e9af3fb08'},
 'response': {}}
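
The response above shows why every rating came back as 0: once the quota is exhausted, the response body is empty and a lookup of the rating fails silently. A possible way to tell "no rating" apart from "quota exceeded" is to inspect the `meta` block first. The sketch below is a minimal illustration that works on already decoded JSON; `QuotaExceeded` and `extract_rating` are hypothetical names, and the successful payload is made up for the example.

```python
class QuotaExceeded(Exception):
    """Raised when the API reports that the request quota is used up."""

def extract_rating(payload):
    """Return the venue rating from a decoded response, or None if it has no rating."""
    meta = payload.get('meta', {})
    if meta.get('code') == 429 or meta.get('errorType') == 'quota_exceeded':
        raise QuotaExceeded(meta.get('errorDetail', 'Quota exceeded'))
    return payload.get('response', {}).get('venue', {}).get('rating')

# The error response shown in this notebook
quota_payload = {'meta': {'code': 429,
                          'errorType': 'quota_exceeded',
                          'errorDetail': 'Quota exceeded',
                          'requestId': '5ffef5733253c51e9af3fb08'},
                 'response': {}}

# A made-up successful response, for illustration only
ok_payload = {'meta': {'code': 200},
              'response': {'venue': {'id': 'example-id', 'rating': 7.5}}}

print(extract_rating(ok_payload))
try:
    extract_rating(quota_payload)
except QuotaExceeded as err:
    print('stopping early:', err)
```

Raising on a quota error lets the loop over venue ids stop early instead of filling the remaining ratings with zeros.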