# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

##### The main challenge when opening a business is finding a good location. The same question arises for a restaurant or a grocery store, but our task is to find the best place for a pet store. This segment is underdeveloped in Moscow, although the city has a large pet population. Residents are increasingly switching to specialized pet food and toys, and we therefore assume that demand for specialized pet stores is growing.

##### Moscow is a large city with a population of more than 14 million people, and the metro is the backbone of each district's transport infrastructure. More than 4 million people travel by metro every day.

##### We are therefore looking for a location with high transport accessibility and no competitors nearby. The population density of the district serves as an additional decision factor.

## Data <a name="data"></a>

We use four open data sources for the analysis:
* Wikipedia, for the list of city districts and their population.
* A GeoJSON file with district boundaries, for building a map of the city.
* A table with the coordinates of the metro stations.
* Foursquare data on pet stores near metro stations.

#### STEP 1. Prepare data for analysis

##### We load the data from the sources and process it.

In [2]:
import numpy as np
import pandas as pd
import folium
import requests

##### We load the data on the coordinates of Moscow districts

In [3]:
url = 'http://gis-lab.info/data/mos-adm/mo.geojson'
r = requests.get(url)
data_moscow = r.json()

##### We load the district table from Wikipedia

In [4]:
page = 'https://ru.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%BE%D0%B2_%D0%B8_%D0%BF%D0%BE%D1%81%D0%B5%D0%BB%D0%B5%D0%BD%D0%B8%D0%B9_%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D1%8B'    
wikitables = pd.read_html(page, index_col=0, attrs={"class":'standard sortable'}, header=0)
data_district = wikitables[0]

##### We need to remove the extra columns and special characters from the table.

In [5]:
data_district = data_district.drop(['Флаг','Герб','Адми-нистра-тивныйокруг','Пло-щадьжилого фонда(01.01.2010)[9],тыс. м²','Жил-площадьначело-века(01.01.2010),м²/чел.', 'Название района[2]/поселения[3][4]'], axis=1)
data_district.columns = ['area', 'square', 'population', 'density']
data_district['area'] = data_district['area'].apply(lambda x: x.split(', ')).apply(lambda x: x[0])
data_district['population'] = data_district['population'].apply(lambda x: x.replace("↗", "")).apply(lambda x: x.replace("\xa0", "")).astype(int)
data_district = data_district.set_index('area').reset_index()
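
The population column on the Wikipedia page carries trend arrows and non-breaking spaces, which is why the cell above strips `↗` and `\xa0`. A slightly more defensive variant, sketched below with only the standard library, keeps just the digits, so any other decoration is dropped as well (a `↘` arrow is used as an example; that the page contains one is an assumption). The helper name `parse_population` is hypothetical.

```python
import re

def parse_population(raw: str) -> int:
    """Return the integer formed by all digits in `raw`, ignoring arrows and spaces."""
    digits = re.sub(r"\D", "", raw)  # drop every non-digit character
    if not digits:
        raise ValueError(f"no digits found in {raw!r}")
    return int(digits)

print(parse_population("↗109\xa0231"))  # 109231
```

This approach would also absorb digits from a footnote marker such as `[5]`, so it suits only columns without such markers.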

##### After cleaning, the table has four columns.

In [6]:
data_district.head()

| | area | square | population | density |
|---|---|---|---|---|
| 0 | Академический | 583 | 109231 | 18736.02 |
| 1 | Алексеевский | 529 | 80391 | 15196.79 |
| 2 | Алтуфьевский | 325 | 57408 | 17664.0 |
| 3 | Арбат | 211 | 35529 | 16838.39 |
| 4 | Аэропорт | 458 | 79294 | 17313.1 |
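
The source does not state units for `square` and `density`, but the sample rows are consistent with `square` being in hectares and `density` in people per km²: dividing the population by the area converted to km² reproduces the `density` column. The check below uses only rows shown above; the function name is illustrative.

```python
def density_per_km2(population: int, square_ha: float) -> float:
    """People per km2, assuming the area is given in hectares (100 ha = 1 km2)."""
    return population / (square_ha / 100.0)

# (area, square, population, density) copied from the first rows of the table
rows = [
    ("Академический", 583, 109231, 18736.02),
    ("Алексеевский", 529, 80391, 15196.79),
    ("Алтуфьевский", 325, 57408, 17664.0),
]
for area, square, population, density in rows:
    # computed value next to the value reported in the table
    print(area, round(density_per_km2(population, square), 2), density)
```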


##### We check the result by visualizing it: the map shades each district of Moscow by population density.

In [7]:
moscow_map = folium.Map(location=[55.7537, 37.6198], zoom_start=10)

folium.Choropleth(
    geo_data=data_moscow, 
    data=data_district,
    columns=['area', 'density'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.5
).add_to(moscow_map)

moscow_map
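
`folium.Choropleth` joins the table to the map by matching the `area` column against `feature.properties.NAME`, so a district whose name is spelled differently in the two sources will not be matched to its density value. A small check, sketched here on plain dictionaries, lists the names that appear on only one side. The helper `unmatched_names` and the sample data are hypothetical; in the notebook it would be called with `data_moscow` and `data_district['area']`.

```python
def unmatched_names(geojson: dict, table_names) -> tuple:
    """Return (names only in the GeoJSON, names only in the table), both sorted."""
    geo_names = {f["properties"]["NAME"] for f in geojson["features"]}
    table_names = set(table_names)
    return sorted(geo_names - table_names), sorted(table_names - geo_names)

# Tiny made-up sample with the same structure as the district GeoJSON
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "Арбат"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Аэропорт"}, "geometry": None},
    ],
}
only_geo, only_table = unmatched_names(sample_geojson, ["Арбат", "Академический"])
print(only_geo)    # names present only in the GeoJSON
print(only_table)  # names present only in the table
```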

##### Getting started with the Foursquare API

In [7]:
CLIENT_ID = '...' # your Foursquare ID
CLIENT_SECRET = '...' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
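
To keep the credentials out of the notebook itself, they can be read from the environment instead of being typed into the cell. The sketch below is one way to do that; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper name are assumptions, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ) -> tuple:
    """Read (client_id, client_secret) from environment variables.

    The variable names are assumptions chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.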

##### We import data on the coordinates of metro stations

In [9]:
path = 'list_of_moscow_metro_stations.csv'
metro_stat = pd.read_csv(path)
metro_stat.head()

| | Line | LineColor | Name | Latitude | Longitude | Order |
|---|---|---|---|---|---|---|
| 0 | Калининская | FFCD1C | Новокосино | 55.745113 | 37.864052 | 0 |
| 1 | Калининская | FFCD1C | Новогиреево | 55.752237 | 37.814587 | 1 |
| 2 | Калининская | FFCD1C | Перово | 55.75098 | 37.78422 | 2 |
| 3 | Калининская | FFCD1C | Шоссе энтузиастов | 55.75809 | 37.751703 | 3 |
| 4 | Калининская | FFCD1C | Авиамоторная | 55.751933 | 37.717444 | 4 |


In [10]:
# using coordinates for the previous view
map_metro = folium.Map(location=[55.7537, 37.6198], zoom_start=10)

In [11]:
folium.Choropleth(
    geo_data=data_moscow, 
    data=data_district,
    columns=['area', 'density'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    fill_opacity=0.8,
    line_opacity=0.4
).add_to(map_metro)

for lat, lng, label in zip(metro_stat['Latitude'], metro_stat['Longitude'], metro_stat['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_metro)  
    
map_metro

#### STEP 2. Find pet stores

##### Create a function that collects all venues around a given point.

In [12]:
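LIMIT = 1000  # maximum number of venues requested per call
radius = 1000  # NOTE: not used below; getNearbyVenues falls back to its own default of 500 m unless radius is passed

def getNearbyVenues(names, latitudes, longitudes, radius=500):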
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
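
The list comprehension in `getNearbyVenues` assumes that every response has at least one group and that every venue has at least one category; otherwise it raises `IndexError` or `KeyError`. The sketch below separates the parsing into a function that tolerates those cases. It works on an already decoded JSON dictionary with the same layout the function above reads (`["response"]["groups"][0]["items"]`, each item holding a `venue`); the name `extract_venues` and the sample payload are hypothetical.

```python
def extract_venues(payload: dict, name: str, lat: float, lng: float) -> list:
    """Flatten one explore-style response into rows, skipping incomplete venues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

# Made-up payload: the second venue has no category and is skipped
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Shop A", "location": {"lat": 55.75, "lng": 37.86},
               "categories": [{"name": "Pet Store"}]}},
    {"venue": {"name": "Shop B", "location": {"lat": 55.74, "lng": 37.85},
               "categories": []}},
]}]}}
rows = extract_venues(sample_payload, "Новокосино", 55.745113, 37.864052)
print(rows)
```

The rows have the same seven fields, in the same order, as the columns assigned to `nearby_venues`.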

In [13]:
moscow_neighborhoods = metro_stat
moscow_venues = getNearbyVenues(names=moscow_neighborhoods['Name'],
                                   latitudes=moscow_neighborhoods['Latitude'],
                                   longitudes=moscow_neighborhoods['Longitude']
                                  )

Новокосино
Новогиреево
Перово
Шоссе энтузиастов
Авиамоторная
Площадь Ильича
Марксистская
Третьяковская
Деловой центр
Парк Победы
Речной вокзал
Водный стадион
Войковская
Сокол
Аэропорт
Динамо
Белорусская
Маяковская
Тверская
Театральная
Новокузнецкая
Павелецкая
Автозаводская
Технопарк
Коломенская
Каширская
Кантемировская
Царицыно
Орехово
Домодедовская
Красногвардейская
Алма-Атинская
Медведково
Бабушкинская
Свиблово
Ботанический сад
ВДНХ
Алексеевская
Рижская
Проспект Мира
Сухаревская
Тургеневская
Китай-город
Третьяковская
Октябрьская
Шаболовская
Ленинский проспект
Академическая
Профсоюзная
Новые Черемушки
Калужская
Беляево
Коньково
Теплый Стан
Ясенево
Новоясеневская
Бульвар Рокоссовского
Черкизовская
Преображенская площадь
Сокольники
Красносельская
Комсомольская
Красные ворота
Чистые пруды
Лубянка
Охотный ряд
Библиотека им.Ленина
Кропоткинская
Парк культуры
Фрунзенская
Спортивная
Воробьевы горы
Университет
Проспект Вернадского
Юго-Западная
Тропарево
Румянцево
Саларьево
Щелковская
Первомай

##### Keep only venues with Venue Category = Pet Store and mark the corresponding stations in white on the map

In [14]:
moscow_venues_pet_shops = moscow_venues[moscow_venues['Venue Category']=='Pet Store']

for lat, lng, label in zip(moscow_venues_pet_shops['Neighborhood Latitude'], moscow_venues_pet_shops['Neighborhood Longitude'], moscow_venues_pet_shops['Neighbourhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        color='white',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_metro) 
    
map_metro
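
The filter above marks stations that already have a pet store nearby; the candidates for a new store are the remaining stations. Below is a minimal sketch of that complement, written on plain tuples rather than on `moscow_venues`, with made-up venue rows and a hypothetical helper name. Stations that share a name (Третьяковская appears twice in the station list) are merged by this name-based comparison.

```python
def stations_without_category(station_names, venue_rows, category="Pet Store"):
    """Return stations, in original order, with no nearby venue of `category`.

    venue_rows: iterable of (station_name, venue_name, venue_category) tuples.
    """
    served = {station for station, _, cat in venue_rows if cat == category}
    seen = set()
    result = []
    for station in station_names:
        if station not in served and station not in seen:
            result.append(station)
            seen.add(station)
    return result

stations = ["Новокосино", "Новогиреево", "Перово"]
venues = [
    ("Новогиреево", "Venue 1", "Pet Store"),
    ("Перово", "Venue 2", "Café"),
]
print(stations_without_category(stations, venues))
```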

## Analysis <a name="analysis"></a>

##### Unfortunately, the analysis is complicated by the fact that many metro stations do not belong to a single district but lie on the border between districts. A purely mathematical choice of metro station therefore cannot be relied on to give the correct result, and our choices are based on visual inspection of the map.
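
One possible way to reduce the reliance on visual inspection is to test, for each station, which district polygon contains its coordinates. The sketch below uses the standard ray-casting test on the outer ring of each polygon. It assumes GeoJSON `Polygon` or `MultiPolygon` geometries with `[longitude, latitude]` pairs and the `NAME` property used earlier; holes are ignored, and a station exactly on a border may land in either neighbour or in none, so the ambiguity described above does not go away. Function names and the sample square are hypothetical.

```python
def point_in_ring(lon: float, lat: float, ring) -> bool:
    """Ray casting: count how many ring edges a ray going east from the point crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        if (yi > lat) != (yj > lat):  # edge spans the point's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

def district_of(lon: float, lat: float, geojson: dict):
    """Return the NAME of the first feature whose outer ring contains the point, else None."""
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            continue
        for polygon in polygons:
            if point_in_ring(lon, lat, polygon[0]):  # polygon[0] is the outer ring
                return feature["properties"]["NAME"]
    return None

# Made-up square "district" for illustration only
sample = {"type": "FeatureCollection", "features": [{
    "type": "Feature",
    "properties": {"NAME": "A"},
    "geometry": {"type": "Polygon", "coordinates": [[
        [37.0, 55.0], [38.0, 55.0], [38.0, 56.0], [37.0, 56.0], [37.0, 55.0],
    ]]},
}]}
print(district_of(37.5, 55.5, sample))
print(district_of(39.0, 55.5, sample))
```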

In [15]:
selected_areas = ['Зябликово', 'Новогиреево', 'Новокосино', 'Бибирево', 'Савёловский', 'Коньково', 'Преображенское']
data_district_result = data_district[data_district['area'].isin(selected_areas)]
data_district_result

| | area | square | population | density |
|---|---|---|---|---|
| 9 | Бибирево | 645 | 160053 | 24814.42 |
| 32 | Зябликово | 438 | 133096 | 30387.21 |
| 36 | Коньково | 718 | 156211 | 21756.41 |
| 67 | Новогиреево | 445 | 98382 | 22108.31 |
| 68 | Новокосино | 360 | 107646 | 29901.67 |
| 79 | Преображенское | 561 | 90017 | 16045.81 |
| 86 | Савёловский | 270 | 59184 | 21920.0 |
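
Since population density is the additional decision factor, the shortlisted districts can be ordered by it. The values below are copied from the table above; the sort is plain Python and is only a sketch of how the shortlist might be prioritized.

```python
# (area, population, density) copied from the result table
shortlist = [
    ("Бибирево", 160053, 24814.42),
    ("Зябликово", 133096, 30387.21),
    ("Коньково", 156211, 21756.41),
    ("Новогиреево", 98382, 22108.31),
    ("Новокосино", 107646, 29901.67),
    ("Преображенское", 90017, 16045.81),
    ("Савёловский", 59184, 21920.0),
]

# Highest density first
ranked = sorted(shortlist, key=lambda row: row[2], reverse=True)
for area, population, density in ranked:
    print(f"{area}: {density:.2f}")
```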


## Results and Discussion <a name="results"></a>

##### Our analysis provides only a starting point for a full business analysis when choosing a site for a pet store. We selected the more suitable places, where there is no competition, good transport accessibility, and likely demand. Rental rates, the affluence of the area, and the convenience of the retail space itself still have to be taken into account. However, we chose places where the consumer would have no alternative to our store, so an inconvenient store location is compensated by the lack of choice.

In [16]:
map_result = folium.Map(location=[55.7537, 37.6198], zoom_start=10)

folium.Choropleth(
    geo_data=data_moscow, 
    data=data_district_result,
    columns=['area', 'density'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.5
).add_to(map_result)

map_result

## Conclusion <a name="conclusion"></a>

##### The customer receives a shortlist of 7 districts of the city where the need for a pet store within walking distance of a metro station is most acute. The analysis takes into account population density, transport infrastructure, and competitors.