# What is the best location for a new coffee shop, bookstore, or book café in Baku?
## Vusala Shikhaliyeva
### This notebook is based on the Capstone Project in the Coursera courses
#### Week 2

## Launching a new coffee shop, bookshop, or book café in Baku

- Build a dataframe of neighborhoods in Baku by reading HTML data from a website
- Get the geographical coordinates of the neighborhoods
- Show the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new coffeeshop, bookshop or bookcafe

In [1]:
import pandas as pd
import numpy as np
#!pip install folium
import folium
import requests
import json
#!pip install geocoder
import geocoder
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

### Get data about neighborhoods in Baku

In [2]:
data = requests.get("https://en.wikipedia.org/wiki/Category:Baku_geography_stubs").text
soup = BeautifulSoup(data, 'html.parser')

regionList = []

# Collect the text of every list item in the category listing
for row in soup.find_all("div", class_="mw-content-ltr")[0].find_all("li"):
    regionList.append(row.text)
# Skip the first three list items
del regionList[:3]

regions = pd.DataFrame({"Neighborhood": regionList})

regions.head()

Unnamed: 0,Neighborhood
0,Badamdar
1,Bakıxanov
2,Balaxanı
3,Baş Ələt
4,Bibiheybət


### Getting geolocations of the regions

In [4]:
def get_latlng(neighborhood):
    # Query the ArcGIS geocoder until it returns coordinates for the neighborhood
    coordinats = None
    while(coordinats is None):
        i = geocoder.arcgis('{}, Baku, Azerbaijan'.format(neighborhood))
        coordinats = i.latlng
    return coordinats

coordinats = [ get_latlng(neighborhood) for neighborhood in regions["Neighborhood"].tolist() ]
coordinats

[[40.361599648164855, 49.81508025040292],
 [40.43198000000007, 49.95330000000007],
 [40.46255072907915, 49.9343836061088],
 [40.410660000000064, 49.87222000000003],
 [40.33624099687983, 49.82380512464341],
 [40.44440000000003, 49.805660000000046],
 [40.410660000000064, 49.87222000000003],
 [40.410660000000064, 49.87222000000003],
 [40.466731277528304, 49.843503490449706],
 [40.35000000000008, 49.833330000000046],
 [40.372021668998926, 49.84465435423397],
 [40.51789000000008, 50.11390000000006],
 [40.49750000000006, 50.21222000000006],
 [40.283060000000035, 49.28074000000004],
 [40.320350000000076, 50.59199000000007],
 [40.320350000000076, 50.59199000000007],
 [40.316670000000045, 50.583330000000046],
 [40.44053000000008, 50.27612000000005],
 [40.433520000000044, 50.276380000000074],
 [40.38513000000006, 49.94174000000004],
 [40.410660000000064, 49.87222000000003],
 [40.410660000000064, 49.87222000000003],
 [40.410660000000064, 49.87222000000003],
 [40.401200000000074, 50.33402000000006
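
The `while` loop in `get_latlng` keeps querying until the geocoder answers, so it would never finish for a name the service cannot resolve. A minimal sketch of a bounded variant follows. The function name `get_latlng_bounded`, the injected `geocode` callable, and the `max_tries` and `delay` parameters are assumptions for illustration and are not part of the original notebook.

```python
import time


def get_latlng_bounded(place, geocode, max_tries=3, delay=0.0):
    """Return [lat, lng] for `place`, or None after `max_tries` failed lookups.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None. It is injected so that the retry logic can be
    exercised without network access.
    """
    query = '{}, Baku, Azerbaijan'.format(place)
    for _ in range(max_tries):
        latlng = geocode(query)
        if latlng is not None:
            return latlng
        time.sleep(delay)  # brief pause before the next attempt
    return None


# Stand-in geocoder for illustration only: fails twice, then succeeds.
calls = []


def flaky_geocode(query):
    calls.append(query)
    return None if len(calls) < 3 else [40.3616, 49.81508]


result = get_latlng_bounded('Badamdar', flaky_geocode)
never = get_latlng_bounded('Nowhere', lambda q: None, max_tries=2)
```

In the notebook itself, the `geocode` argument could be `lambda q: geocoder.arcgis(q).latlng`, which is the same call the original function makes. Neighborhoods that come back as `None` can then be inspected or dropped instead of blocking the cell.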

### Assigning geolocations to regions

In [5]:
coordinats = pd.DataFrame(coordinats, columns=['Latitude', 'Longitude'])
regions['Latitude'] = coordinats['Latitude']
regions['Longitude'] = coordinats['Longitude']
regions.shape
regions.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Badamdar,40.3616,49.81508
1,Bakıxanov,40.43198,49.9533
2,Balaxanı,40.462551,49.934384
3,Baş Ələt,40.41066,49.87222
4,Bibiheybət,40.336241,49.823805
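
The geocoder output above repeats the pair (40.41066, 49.87222) for several neighborhoods, including Baş Ələt. This may mean the service fell back to a generic location for names it could not resolve. The sketch below shows one possible way to flag such shared coordinates before querying venues. The helper name `find_shared_coordinates` and the row labelled "Placeholder A" are made up for illustration.

```python
from collections import defaultdict


def find_shared_coordinates(rows, ndigits=5):
    """Group names by rounded (lat, lng) and keep groups with more than one name."""
    groups = defaultdict(list)
    for name, lat, lng in rows:
        groups[(round(lat, ndigits), round(lng, ndigits))].append(name)
    return {point: names for point, names in groups.items() if len(names) > 1}


# Illustrative rows: the first three come from the table above;
# "Placeholder A" is a made-up name that reuses the Baş Ələt point.
sample = [
    ('Badamdar', 40.3616, 49.81508),
    ('Bakıxanov', 40.43198, 49.9533),
    ('Baş Ələt', 40.41066, 49.87222),
    ('Placeholder A', 40.41066, 49.87222),
]
shared = find_shared_coordinates(sample)
```

With the notebook's dataframe, the rows could be supplied as `zip(regions['Neighborhood'], regions['Latitude'], regions['Longitude'])`. Neighborhoods that share a point would return identical venue lists from Foursquare, so they are worth checking by hand.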


### Creating the map of Baku

In [7]:
address = 'Baku, Azerbaijan'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Baku, the capital of Azerbaijan, are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Baku, the capital of Azerbaijan, are 40.3754434, 49.8326748.


In [10]:
mapBaku = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, neighborhood in zip(regions['Latitude'], regions['Longitude'], regions["Neighborhood"]):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(mapBaku)  
    
mapBaku

### Using the Foursquare API to explore the neighborhoods

In [12]:
import os

# Read the Foursquare credentials from the environment instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholders.
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]
VERSION = '20191204'

# Confirm that the credentials are set without printing their values
print('Credentials loaded.')

Credentials loaded.


In [20]:
radius = 2000
LIMIT = 100

region = []

for lat, long, neighborhood in zip(regions['Latitude'], regions['Longitude'], regions['Neighborhood']):

    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for i in results:
        region.append((
            neighborhood,
            lat, 
            long, 
            i['venue']['name'], 
            i['venue']['location']['lat'], 
            i['venue']['location']['lng'],  
            i['venue']['categories'][0]['name']))

region_df = pd.DataFrame(region)

region_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(region_df.shape)
region_df.head()

(2125, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Badamdar,40.3616,49.81508,Beerbaşa,40.363921,49.818902,Brewery
1,Badamdar,40.3616,49.81508,Nakhchivan Restaurant,40.363588,49.819427,Restaurant
2,Badamdar,40.3616,49.81508,Şəki Restoranı,40.357243,49.811641,Restaurant
3,Badamdar,40.3616,49.81508,Bouquet,40.363285,49.820232,Flower Shop
4,Badamdar,40.3616,49.81508,Buxara,40.362943,49.816071,Comfort Food Restaurant
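
The loop above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a failed request or a venue without a category would raise an exception. The sketch below separates the flattening step into a helper that tolerates both cases. The name `venues_from_response` and the hand-written sample payload are assumptions for illustration; the payload mimics only the fields the notebook reads.

```python
def venues_from_response(payload, neighborhood, lat, lng):
    """Flatten one explore response into tuples matching the region_df columns."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload for illustration; "Unlabelled venue" is made up.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Bouquet',
               'location': {'lat': 40.363285, 'lng': 49.820232},
               'categories': [{'name': 'Flower Shop'}]}},
    {'venue': {'name': 'Unlabelled venue',
               'location': {'lat': 40.36, 'lng': 49.81},
               'categories': []}},
]}]}}

rows = venues_from_response(sample_payload, 'Badamdar', 40.3616, 49.81508)
empty = venues_from_response({'meta': {'code': 400}}, 'Badamdar', 40.3616, 49.81508)
```

Inside the notebook loop, `region.extend(venues_from_response(requests.get(url).json(), neighborhood, lat, long))` would take the place of the inner `for` loop.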


In [21]:
region_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Badamdar,100,100,100,100,100,100
Bakıxanov,24,24,24,24,24,24
Balaxanı,5,5,5,5,5,5
Baş Ələt,69,69,69,69,69,69
Bibiheybət,57,57,57,57,57,57
Bilgəh,69,69,69,69,69,69
Biləcəri,11,11,11,11,11,11
"Binə, Baku",69,69,69,69,69,69
Binəqədi raion,11,11,11,11,11,11
Bukhta-Ilicha,100,100,100,100,100,100


In [22]:
print('There are {} unique categories.'.format(len(region_df['VenueCategory'].unique())))

There are 181 unique categories.


In [23]:
"Neighborhood" in region_df['VenueCategory'].unique()

False
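
One way to connect the category data to the business question is to count how many coffee shops, cafés, and bookstores each neighborhood already has. The sketch below does this with plain Python. The category labels in `TARGET_CATEGORIES` are assumptions and should be checked against `region_df['VenueCategory'].unique()`. Apart from the Badamdar brewery, the sample pairs are invented.

```python
from collections import Counter

# Assumed category labels; verify them against the actual venue data.
TARGET_CATEGORIES = {'Coffee Shop', 'Café', 'Bookstore'}


def count_target_venues(pairs, targets=TARGET_CATEGORIES):
    """Count venues per neighborhood whose category is in `targets`."""
    return Counter(hood for hood, category in pairs if category in targets)


# Illustrative (neighborhood, category) pairs, not real results.
sample_pairs = [
    ('Badamdar', 'Brewery'),
    ('Badamdar', 'Coffee Shop'),
    ('Badamdar', 'Bookstore'),
    ('Bakıxanov', 'Café'),
    ('Balaxanı', 'Restaurant'),
]
counts = count_target_venues(sample_pairs)
```

With the notebook's data, the pairs could come from `zip(region_df['Neighborhood'], region_df['VenueCategory'])`. A neighborhood with many venues overall but few in the target categories may be a candidate location, although that reading is only a heuristic.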

### Results

The list of neighborhoods was read from Wikipedia. As of today, the capital of Azerbaijan has 12 neighborhoods in total. We scraped the list with the Python packages requests and BeautifulSoup. We then geocoded each neighborhood in Baku to obtain its location. We used the Foursquare API to get venue data for the neighborhoods. This API also provides the venue categories, which help to address the business problem of this project. The project involves web scraping, working with the Foursquare API, data cleaning and wrangling, machine learning, and map visualization.