# Capstone Project - The Battle of the Neighborhoods (Week 1)
### -- Osaka vs Manhattan--

## Purpose
This document describes my final peer-reviewed assignment for the Coursera Capstone of the IBM Data Science Professional Certificate program. The project compares the neighborhoods of Osaka and Manhattan to determine how similar or dissimilar they are.


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data Acquisition](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

 **Osaka** is one of the most famous cities in Japan. My friend is moving from **Manhattan**, NY to **Osaka** for a career change.
 
 The goal is to help her find a place to live in Osaka where the environment is similar to her previous home in Manhattan. In this project, I will cluster the venues of all neighborhoods in both cities and compare them to understand their similarities and differences. I will also collect data and provide a data-driven recommendation about where to eat or visit in Osaka.

## Data Acquisition <a name="data"></a>

#### Osaka neighborhood names
* Osaka district names will be retrieved from [Wikipedia](https://en.wikipedia.org/wiki/Osaka)

#### Locations of Osaka, Manhattan, and their neighborhoods
* Coordinates of the Osaka and Manhattan neighborhoods will be retrieved using the Google API.

#### Osaka top venue recommendations
(Foursquare Category:  https://developer.foursquare.com/docs/resources/categories)
* The neighborhoods of Osaka and Manhattan are explored using the Foursquare API. The following information is retrieved:
  
  - Venue ID
  - Venue name
  - Coordinates: Latitude and Longitude
  - Category names 
  - Venue ratings (due to Foursquare access limits, only 2 types of ratings were retrieved in this project)
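
As a rough illustration of how the fields listed above could be pulled out of an API response, here is a minimal sketch. It assumes a payload nested as `response` → `groups` → `items` → `venue`, which is how Foursquare "explore" results are commonly shaped; the sample payload, the venue values, and the helper name `flatten_venues` are all hypothetical and not taken from this project's data.

```python
# Hypothetical payload, shaped like a Foursquare "explore" response (assumption).
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "id": "v1",
                            "name": "Example Ramen",
                            "location": {"lat": 34.70, "lng": 135.50},
                            "categories": [{"name": "Ramen Restaurant"}],
                        }
                    },
                    {
                        "venue": {
                            "id": "v2",
                            "name": "Example Park",
                            "location": {"lat": 34.69, "lng": 135.51},
                            "categories": [],
                        }
                    },
                ]
            }
        ]
    }
}


def flatten_venues(payload):
    """Return one flat dict per venue (hypothetical helper)."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "Venue ID": venue["id"],
                "Venue name": venue["name"],
                "Latitude": venue["location"]["lat"],
                "Longitude": venue["location"]["lng"],
                # Use the first category name, or None when no category is listed.
                "Category": categories[0]["name"] if categories else None,
            })
    return rows


venue_rows = flatten_venues(sample_response)
```

A list of flat dicts like `venue_rows` can be passed straight to `pd.DataFrame(...)` for the cleaning step.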


## Methodology <a name="methodology"></a>

1. Website information will be retrieved with **BeautifulSoup**.

2. **Pandas** will be used to clean the data and build a dataframe.

3. Coordinates of the places will be obtained via the **Geocoding API** from Google so that the locations can be marked on a map.

4. The **K-means clustering** algorithm will be used to analyze the similarity or dissimilarity between the two cities.
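
To make the clustering step more concrete, here is a minimal pure-Python sketch of the k-means idea (Lloyd's algorithm). It is not the scikit-learn `KMeans` implementation that the notebook imports; the function name `kmeans`, the deterministic seeding from the first `k` points, and the feature values are all assumptions chosen for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical venue-category frequencies per neighborhood:
# (restaurants, parks, bars) as fractions of all venues.
neighborhoods = ["A", "B", "C", "D"]
features = [
    (0.7, 0.1, 0.2),
    (0.1, 0.8, 0.1),
    (0.6, 0.1, 0.3),
    (0.2, 0.7, 0.1),
]
labels, centroids = kmeans(features, k=2)
```

Neighborhoods that share a label have similar venue profiles, which is the sense in which two neighborhoods from different cities can be called similar.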

Let's download all the dependencies that we will need.

In [1]:
!conda install -c conda-forge folium=0.5.0 --yes  # comment out if already installed
!conda install -c conda-forge geopy --yes         # comment out if already installed

import numpy as np   # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis

# Show every column and row when displaying dataframes.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
# Transform JSON into a pandas dataframe (pandas >= 1.0 exposes this as pd.json_normalize).
from pandas.io.json import json_normalize
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means for the clustering stage
from sklearn.cluster import KMeans
import folium  # map rendering library

import requests  # library to handle HTTP requests
import lxml.html as lh
import bs4 as bs
import urllib.request

print('Libraries imported.')


### Exploring Osaka
Osaka has 24 neighborhoods. To segment and explore these neighborhoods, we need a dataset that contains them. I scraped the following Wikipedia page to get this information.

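In [3]:
from bs4 import BeautifulSoup
# Download the Wikipedia page and parse its HTML.
source = requests.get('https://en.wikipedia.org/wiki/Osaka#Neighborhoods').text
soup = BeautifulSoup(source, 'html5lib')
# The ward table is selected by position, so this index depends on the page layout.
table = soup.find_all('table')[4]
# read_html returns a list of dataframes; the ward table is df[0].
df = pd.read_html(str(table))
#df[0]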

In [8]:
# Clean up the table: drop the first four rows, rename the columns,
# and shorten the Kita-ku label.
col_rename = {0:'index',1:'Neighborhood', 2:'Neighborhood (Kanji)'}
df_ward = df[0].drop([0,1,2,3]).rename(columns = col_rename).set_index('index')
Osaka_ku = df_ward.replace({'Kita-ku (administrative center)':'Kita-ku'})
Osaka_ku.head()

| index | Neighborhood | Neighborhood (Kanji) |
|---|---|---|
| 1 | Abeno-ku | 阿倍野区 |
| 2 | Asahi-ku | 旭区 |
| 3 | Chūō-ku | 中央区 |
| 4 | Fukushima-ku | 福島区 |
| 5 | Higashinari-ku | 東成区 |


In [9]:
# Retrieve the coordinates of each ward with geopy's Nominatim geocoder.
Osaka_ku['Latitude'] = float(0)
Osaka_ku['Longitude'] = float(0)

geolocator = Nominatim(user_agent="nj_explorer")
# Series.items() replaces iteritems(), which was removed in pandas 2.0.
for index, Place_name in Osaka_ku['Neighborhood'].items():
    # The bare ward name is ambiguous: names such as Asahi-ku and Chūō-ku
    # also exist in other Japanese cities.
    location = geolocator.geocode(Place_name)
    if location is None:  # skip names the geocoder cannot resolve
        continue
    Osaka_ku.loc[index, 'Latitude'] = location.latitude
    Osaka_ku.loc[index, 'Longitude'] = location.longitude

Osaka_ku.head()

| index | Neighborhood | Neighborhood (Kanji) | Latitude | Longitude |
|---|---|---|---|---|
| 1 | Abeno-ku | 阿倍野区 | 34.627501 | 135.514095 |
| 2 | Asahi-ku | 旭区 | 35.476018 | 139.53192 |
| 3 | Chūō-ku | 中央区 | 35.666255 | 139.775565 |
| 4 | Fukushima-ku | 福島区 | 34.692104 | 135.474812 |
| 5 | Higashinari-ku | 東成区 | 34.672912 | 135.550567 |
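
The coordinates returned for Asahi-ku and Chūō-ku lie around latitude 35.5 and longitude 139.5 to 139.8, far from the other wards, which suggests the geocoder matched wards with the same name in other cities. One possible guard, sketched below, is to qualify each query with the city and country and to reject results outside a bounding box. The helper names, the stand-in geocoder, and the approximate bounding-box values are assumptions for illustration, not part of the original notebook.

```python
from types import SimpleNamespace

# Approximate bounding box for Osaka city (assumed values, for illustration only).
OSAKA_BBOX = {"lat": (34.58, 34.78), "lon": (135.35, 135.62)}


def build_query(ward, city="Osaka", country="Japan"):
    """Qualify a ward name so that same-named wards elsewhere are less likely to match."""
    return f"{ward}, {city}, {country}"


def in_osaka(lat, lon, bbox=OSAKA_BBOX):
    """Return True when the point falls inside the assumed bounding box."""
    return (bbox["lat"][0] <= lat <= bbox["lat"][1]
            and bbox["lon"][0] <= lon <= bbox["lon"][1])


def geocode_ward(geocode, ward):
    """Geocode a ward with any callable that maps a query string to an object
    with .latitude/.longitude or to None, e.g. Nominatim(...).geocode."""
    location = geocode(build_query(ward))
    if location is None or not in_osaka(location.latitude, location.longitude):
        return None
    return location.latitude, location.longitude


def fake_geocode(query):
    # Stand-in for a real geocoder so the sketch runs offline. The second entry
    # deliberately simulates a wrong match, reusing coordinates from the table above.
    lookup = {
        "Abeno-ku, Osaka, Japan": SimpleNamespace(latitude=34.627501, longitude=135.514095),
        "Asahi-ku, Osaka, Japan": SimpleNamespace(latitude=35.476018, longitude=139.53192),
    }
    return lookup.get(query)


abeno = geocode_ward(fake_geocode, "Abeno-ku")
asahi = geocode_ward(fake_geocode, "Asahi-ku")
missing = geocode_ward(fake_geocode, "Nowhere-ku")
```

With a real geocoder, `geocode_ward(geolocator.geocode, Place_name)` could replace the direct `geolocator.geocode(Place_name)` call, leaving rejected wards for manual review.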
