# Coursera Capstone Project

## Table of contents
1. [Introduction](#Introduction)

2. [Data extraction](#Data)

3. [Data preprocessing](#Preprocessing)

4. [Analysis](#GeoAnalysis)

5. [Visualisation](#Visualisation)

6. [Results](#Results)

### Introduction
I was born in Langepas, a small town in Western Siberia, and have not been there for a long time. Suppose I want to return home and open a coffee shop. The question is: "**Is it possible to compete with the other cafes and coffee shops in the town?**"

To answer this question, I will use the methods learnt in the previous Coursera IBM courses.

In [108]:
import pandas as pd
import folium
from folium import plugins
import requests

### GeoAnalysis
The `GeoAnalysis` class retrieves the required information about venues in any city. A class-based design is chosen because it greatly reduces the effort needed to investigate different locations. Although this project analyzes only my hometown, the same methods apply to any other city: it is enough to initialize another object with the new address.

In [114]:
class GeoAnalysis(object):
 
    def __init__(self, adress, CLIENT_ID, CLIENT_SECRET, version, radius, limit):
        """Constructor"""
        self.adress = adress
        self.CLIENT_ID = CLIENT_ID
        self.CLIENT_SECRET = CLIENT_SECRET
        self.version = version
        self.radius = radius
        self.limit = limit
    
    def get_data(self, query):
        request_parameters = {
            "client_id": self.CLIENT_ID,
            "client_secret": self.CLIENT_SECRET,
            "v": self.version,
            "section": query,
            "near": self.adress,
            "radius": self.radius,
            "limit": self.limit}
        d = requests.get("https://api.foursquare.com/v2/venues/explore", params=request_parameters)
        data = d.json()["response"]
        return data 
    
    # Basic query info
    def query_info(self, data):
        print('Query consists of: ', data.keys())
        print('Number of venues: ',data['totalResults'])
        print('Easter egg, yo')
        center = data['geocode']['center']
        print('Coordinates of the center: ', center)
        return center
        
    def get_dataframe(self, data): # including all NaN values and their rows!
        items = data['groups'][0]['items']
        df_raw = []
        for item in items:
            venue = item['venue']
            categories, uid, name, location = venue['categories'], venue['id'], venue['name'], venue['location']
            assert len(categories) == 1
            shortname = categories[0]['shortName']
            # Missing address or postal code is recorded as the string 'NaN'
            address = location.get('address', 'NaN')
            postalcode = location.get('postalCode', 'NaN')
            lat = location['lat']
            lng = location['lng']
            datarow = (uid, name, shortname, address, postalcode, lat, lng)
            df_raw.append(datarow)
        df = pd.DataFrame(df_raw, columns=['uid', 'name', 'shortname', 'address', 'postalcode', 'lat', 'lng'])
        print('Found %i cafes' % len(df))
        return df
    
    
    def get_map(self, df, center):
        folium_map = folium.Map(location=[center["lat"], center["lng"]], zoom_start=14)
        print(folium_map, 'INITIALIZED')
        def add_markers(df):
            for (j, row) in df.iterrows():
                label = folium.Popup(row["name"], parse_html=True)
                folium.CircleMarker(
                    [row["lat"], row["lng"]],
                    radius=5,
                    popup=label,
                    color='blue',
                    fill=True,
                    fill_color='#3186cc',
                    fill_opacity=0.7,
                    parse_html=False).add_to(folium_map)

        add_markers(df)
        # DataFrame.as_matrix() was removed in pandas 1.0; .values is the equivalent
        hm_data = df[["lat", "lng"]].values.tolist()
        folium_map.add_child(plugins.HeatMap(hm_data))
        return folium_map
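
The `get_dataframe` method asserts that every venue has exactly one category, so a single venue with zero or several categories stops the whole run. A more tolerant way to flatten one item might look like the sketch below. The helper name `venue_to_row` and the sample item are hypothetical, and the item shape is assumed to match what the class reads from `data['groups'][0]['items']`.

```python
def venue_to_row(item):
    """Flatten one 'explore' item into a tuple of plain values."""
    venue = item["venue"]
    location = venue.get("location", {})
    categories = venue.get("categories", [])
    # Take the first category if there is one instead of asserting exactly one.
    shortname = categories[0]["shortName"] if categories else None
    return (
        venue["id"],
        venue["name"],
        shortname,
        location.get("address"),     # None when the key is absent
        location.get("postalCode"),  # None when the key is absent
        location.get("lat"),
        location.get("lng"),
    )


# Hypothetical item, shaped like the entries the class iterates over.
sample_item = {
    "venue": {
        "id": "abc123",
        "name": "Example Cafe",
        "categories": [{"shortName": "Café"}],
        "location": {"lat": 61.25, "lng": 75.19},
    }
}
row = venue_to_row(sample_item)
```

Returning `None` rather than the string `'NaN'` lets pandas treat the gaps as real missing values, which is a design choice rather than something the original class does.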
        
        
        

### Foursquare client information

In [115]:
# Credentials should not be published with the notebook; fill in your own values
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
version = '20200711'
radius = 15000
limit = 50
address = 'Langepas, Russia'
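
One way to keep the client information out of the notebook itself is to read it from environment variables. The sketch below is only an illustration: the function `load_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions, not something Foursquare or the course prescribes.

```python
import os


def load_credentials(env=os.environ):
    """Read the client id and secret from a mapping (the process environment by default)."""
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # hypothetical names
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Demonstration with a plain dict standing in for os.environ.
demo_id, demo_secret = load_credentials(
    {"FOURSQUARE_CLIENT_ID": "id-placeholder",
     "FOURSQUARE_CLIENT_SECRET": "secret-placeholder"}
)
```

In the notebook, the two returned values could then be passed to `GeoAnalysis` in place of hard-coded strings.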


### Data
We get the required information by creating an object for our city and calling its methods. The object must be given the Foursquare client information, the name of the location (address), the search radius, and the limit on the number of items. The kind of query must also be specified when calling the *get_data* method. In the following cells we search for cafes and coffee shops in Langepas.

In [121]:
Langepas = GeoAnalysis(address, CLIENT_ID, CLIENT_SECRET, version, radius, limit)
data = Langepas.get_data('cafe')
center = Langepas.query_info(data)
cafe = Langepas.get_dataframe(data)
cafe

Query consists of:  dict_keys(['geocode', 'headerLocation', 'headerFullLocation', 'headerLocationGranularity', 'totalResults', 'suggestedBounds', 'groups'])
Number of venues:  4
Easter egg, yo
Coordinates of the center:  {'lat': 61.25439, 'lng': 75.2124}
Found 4 cafes


| | uid | name | shortname | address | postalcode | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | 4f2ceec7e4b040eafeb39112 | Cherry | Eastern European | ул. Мира 32 | 628672.0 | 61.253353 | 75.191354 |
| 1 | 500784b2c84c614d5a6f1f73 | Олимпия | Hotel | Солнечная | NaN | 61.255913 | 75.180334 |
| 2 | 50fbad05e4b0396365f45535 | Ж/Д станция Лангепасовский | Train Station | NaN | 628672.0 | 61.275138 | 75.220189 |
| 3 | 505ab903e4b0279819cd48db | ЖД Вокзал | Platform | NaN | NaN | 61.27547 | 75.219677 |


In [122]:
data1 = Langepas.get_data('coffee')
Langepas.query_info(data1)
coffee = Langepas.get_dataframe(data1)
coffee

Query consists of:  dict_keys(['geocode', 'headerLocation', 'headerFullLocation', 'headerLocationGranularity', 'query', 'totalResults', 'suggestedBounds', 'groups'])
Number of venues:  6
Easter egg, yo
Coordinates of the center:  {'lat': 61.25439, 'lng': 75.2124}
Found 6 cafes


| | uid | name | shortname | address | postalcode | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | 51251615e4b088917cc635cc | Анапа | Café | NaN | NaN | 61.259221 | 75.190859 |
| 1 | 4fdd5dbfe4b094b0d1901f4d | Кофейня В Универмаге Лагнепаса | Café | NaN | NaN | 61.24792 | 75.183689 |
| 2 | 53aeb314498eaefb4f1c435a | Кафе "Олимп" | Café | NaN | NaN | 61.255863 | 75.180224 |
| 3 | 5131e74ae4b008f2628d269b | Ресторан "Юбилей" | Café | NaN | NaN | 61.246075 | 75.178075 |
| 4 | 5b49e073e96d0c0039627ab0 | Тендер, Кофейня | Coffee Shop | NaN | 628671.0 | 61.251934 | 75.17398 |
| 5 | 53aed31a498e345e4a238d34 | 1001 ночь | Café | NaN | NaN | 61.252789 | 75.167664 |


### Preprocessing
The main goal of this project is the analysis of venues in Langepas. Since the numbers of cafes and coffee shops found are small, the two dataframes are merged. The 'address' and 'postalcode' columns contain many NaN values, so these columns are dropped.

In [137]:
df = pd.concat([cafe, coffee])
df.drop(['address','postalcode'], axis=1, inplace=True)
df = df.reset_index(drop=True)
df


| | uid | name | shortname | lat | lng |
|---|---|---|---|---|---|
| 0 | 4f2ceec7e4b040eafeb39112 | Cherry | Eastern European | 61.253353 | 75.191354 |
| 1 | 500784b2c84c614d5a6f1f73 | Олимпия | Hotel | 61.255913 | 75.180334 |
| 2 | 50fbad05e4b0396365f45535 | Ж/Д станция Лангепасовский | Train Station | 61.275138 | 75.220189 |
| 3 | 505ab903e4b0279819cd48db | ЖД Вокзал | Platform | 61.27547 | 75.219677 |
| 4 | 51251615e4b088917cc635cc | Анапа | Café | 61.259221 | 75.190859 |
| 5 | 4fdd5dbfe4b094b0d1901f4d | Кофейня В Универмаге Лагнепаса | Café | 61.24792 | 75.183689 |
| 6 | 53aeb314498eaefb4f1c435a | Кафе "Олимп" | Café | 61.255863 | 75.180224 |
| 7 | 5131e74ae4b008f2628d269b | Ресторан "Юбилей" | Café | 61.246075 | 75.178075 |
| 8 | 5b49e073e96d0c0039627ab0 | Тендер, Кофейня | Coffee Shop | 61.251934 | 75.17398 |
| 9 | 53aed31a498e345e4a238d34 | 1001 ночь | Café | 61.252789 | 75.167664 |
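
The merged table also contains a hotel, a train station and a platform, which the 'cafe' query returned alongside real cafes. If only direct competitors are of interest, an optional extra step could drop duplicate venues and keep selected categories, roughly as sketched below. The helper `keep_coffee_venues` and its default category list are assumptions made for illustration, and the sample rows are a subset of the table above.

```python
import pandas as pd


def keep_coffee_venues(df, categories=("Café", "Coffee Shop")):
    """Drop repeated venues and keep only rows whose shortname is in `categories`."""
    deduped = df.drop_duplicates(subset="uid")
    mask = deduped["shortname"].isin(list(categories))
    return deduped[mask].reset_index(drop=True)


# A few rows copied from the merged dataframe.
sample = pd.DataFrame(
    [
        ("4f2ceec7e4b040eafeb39112", "Cherry", "Eastern European", 61.253353, 75.191354),
        ("50fbad05e4b0396365f45535", "Ж/Д станция Лангепасовский", "Train Station", 61.275138, 75.220189),
        ("51251615e4b088917cc635cc", "Анапа", "Café", 61.259221, 75.190859),
        ("5b49e073e96d0c0039627ab0", "Тендер, Кофейня", "Coffee Shop", 61.251934, 75.17398),
    ],
    columns=["uid", "name", "shortname", "lat", "lng"],
)
filtered = keep_coffee_venues(sample)
```

Whether a restaurant such as Cherry counts as a competitor is a judgment call, so the category list is left as a parameter.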


### Visualisation
The Folium library is used to visualize the data. Calling the *get_map* method with the appropriate arguments produces a map of the city with venue markers on it.

In [133]:
foli = Langepas.get_map(df,center)
foli

<folium.folium.Map object at 0x000001E3DC5C0F60> INITIALIZED
<folium.folium.Map object at 0x000001E3DC5C0F60>




### Results
The map shows that the density of cafes in Langepas is low and that they are spread fairly uniformly. Some of the cafes also stand apart from the rest, as they are close to the city bus station.
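
The visual impression can be backed by a simple number, for example the distance from each venue to its nearest neighbour. The sketch below uses the haversine formula on the coordinates from the merged table. It is one possible measure, not part of the original analysis, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_neighbour_km(points):
    """For every point, return the distance to the closest other point."""
    result = []
    for i, (lat_i, lng_i) in enumerate(points):
        others = [
            haversine_km(lat_i, lng_i, lat_j, lng_j)
            for j, (lat_j, lng_j) in enumerate(points)
            if j != i
        ]
        result.append(min(others))
    return result


# (lat, lng) pairs copied from the merged dataframe.
points = [
    (61.253353, 75.191354), (61.255913, 75.180334), (61.275138, 75.220189),
    (61.27547, 75.219677), (61.259221, 75.190859), (61.24792, 75.183689),
    (61.255863, 75.180224), (61.246075, 75.178075), (61.251934, 75.17398),
    (61.252789, 75.167664),
]
distances = nearest_neighbour_km(points)
mean_km = sum(distances) / len(distances)
```

A large spread in `distances` would argue against a uniform distribution, while similar values would support it.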

