# Find your desired community/neighborhood in a German city
## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

Suppose you live in Germany and want to explore the neighborhoods around you, or you would like to move to a particular neighborhood in a German city. For example, you are not from Aachen, but you will work there next year and want to find neighborhoods that suit you based on the venues they offer. You enter your venue priorities for a specific city and get the areas, with their zip codes, that match your demands.

### Target audience
People who live in Germany and want to explore their neighborhood, move to another neighborhood, or move to a new city.


## Data <a name="data"></a>
### Description of the Data

#### The following data is required to address the problem:

- A list of zip codes with the corresponding geodata (latitude and longitude) and the population of each area.
- The venues of the city and their locations.
- Foursquare API credentials.
- Venue keywords from the Foursquare API. You can find a list [here](https://developer.foursquare.com/docs/build-with-foursquare/categories/).

### How the data will be used to solve the problem

#### The data will be used as follows:

- Use GeoJSON data of Germany to locate the centroid of each zip code area.
- Use Foursquare and geopy data to get the desired venues.
- Convert addresses to geodata (latitude, longitude) using geopy and Nominatim.
- Cluster the venues based on their location.
- Create a list of keyword venues.
- Use Folium to plot the data.
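
The first step in the list above reads zip code centroids from a GeoJSON file. As a minimal sketch of that step, the snippet below flattens one feature into a record and derives the population density. The sample feature and its values are made up for illustration; the property names (`plz`, `note`, `qkm`, `einwohner`) are the ones read in the Methodology code, and `feature_to_record` is a hypothetical helper, not part of the notebook.

```python
import json

# One made-up feature; the values are illustrative, not real data.
SAMPLE_GEOJSON = '''
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"plz": "00001", "note": "00001 Exampletown",
                     "qkm": 2.5, "einwohner": 10000},
      "geometry": {"type": "Point", "coordinates": [6.08, 50.77]}
    }
  ]
}
'''


def feature_to_record(feature):
    """Flatten one GeoJSON point feature into a plain dict (hypothetical helper)."""
    props = feature["properties"]
    # GeoJSON orders coordinates as [longitude, latitude]
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "PLZ": props["plz"],
        "City": props["note"],
        "Latitude": lat,
        "Longitude": lon,
        # inhabitants per square kilometer
        "population_density": props["einwohner"] / props["qkm"],
    }


records = [feature_to_record(f) for f in json.loads(SAMPLE_GEOJSON)["features"]]
print(records[0])
```

Building a list of plain records first and creating the dataframe once at the end is usually cheaper than growing a dataframe row by row.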

In [7]:
# Enter Foursquare credential
# foursquare_credential = ['', '']

## Methodology <a name="methodology"></a>

Import libs

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files 
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values 

import requests # library to handle requests 
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules 
import matplotlib.cm as cm 
import matplotlib.colors as colors 

# import k-means from clustering stage 
from sklearn.cluster import KMeans 

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab 
import folium # map rendering library 
from folium.plugins import MarkerCluster 

import geopandas as gpd 
import os 
from scipy.spatial import ConvexHull 

### The main function, **find_desired_community_germany**, finds the desired community in the selected city for the selected keyword venues.


### Function to find the desired community in a German city
### `find_desired_community_germany(city, foursquare_credential, keywordlist,limit=100, radius=500)`

In [2]:
def find_desired_community_germany(city, foursquare_credential, keywordlist,limit=100, radius=500):
        '''
        @Param:
            city, str, name of the city;
            foursquare_credential, list-like, [CLIENT_ID, CLIENT_SECRET];
            keywordlist, list-like, keywords of the venues of interest, such as Supermarket, Bus Stop;
            limit, int, maximum number of venues Foursquare returns per point;
            radius, int, search radius in meters (also used for the circles drawn on the map).
        @Return:
            a folium map of the desired community.
        '''
        with open('../data/plz-5stellig-centroid.geojson', encoding='utf8') as json_data:
            germany_data = json.load(json_data)
        zipcode_data = germany_data['features']
    
        # collect one record per zip code area, then build the dataframe once
        # (DataFrame.append was removed in pandas 2.0)
        rows = []
        for data in zipcode_data:
            properties = data['properties']
            # GeoJSON stores coordinates as [longitude, latitude]
            zipcode_lon, zipcode_lat = data['geometry']['coordinates'][:2]
            rows.append({'PLZ': properties['plz'],
                         'City': properties['note'],
                         'Latitude': zipcode_lat,
                         'Longitude': zipcode_lon,
                         'qkm': properties['qkm'],
                         'population': properties['einwohner']})
        column_names = ['PLZ', 'City', 'Latitude', 'Longitude', 'qkm', 'population']
        zipcodes = pd.DataFrame(rows, columns=column_names)

        # keep only the zip codes of the requested city;
        # .copy() avoids pandas' SettingWithCopyWarning when the new column is added
        City_zipcodes = zipcodes[zipcodes['City'].str.contains(city, na=False)].copy()
        City_zipcodes.reset_index(drop=True, inplace=True)
        City_zipcodes['population_density'] = City_zipcodes.population / City_zipcodes.qkm
            
        
        #map city center
        address = '{}, Germany'.format(city)
        geolocator = Nominatim(user_agent='my-application')
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
    
        #foursquare API
        global CLIENT_ID, CLIENT_SECRET, VERSION
        CLIENT_ID = foursquare_credential[0] # your Foursquare ID
        CLIENT_SECRET = foursquare_credential[1] # your Foursquare Secret
        VERSION = '20180605' # Foursquare API version
    
        #city venues
        City_venues = getNearbyVenues(names=City_zipcodes['PLZ'], latitudes=City_zipcodes['Latitude'], longitudes=City_zipcodes['Longitude'], LIMIT=limit, radius=radius)
    
        #desired venues
        my_desired_venues = find_desired_venues(keywordlist, City_venues)
    #     display(my_desired_venues)
        n = my_desired_venues.PLZ.nunique()
        X=my_desired_venues.loc[:,['Venue Latitude','Venue Longitude']]
        kmeans = KMeans(n_clusters=n, random_state=0).fit(X)
        my_desired_venues['Location Cluster'] = kmeans.labels_
    
        #make map
        map_my_desired = folium.Map(location=[latitude, longitude], zoom_start=12)
        loc_kclusters = n
    
        # set color scheme for the clusters
        x = np.arange(loc_kclusters)
        ys = [i + x + (i*x)**2 for i in range(loc_kclusters)]
        loc_colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
        loc_rainbow = [colors.rgb2hex(i) for i in loc_colors_array]
    
        for i in range(loc_kclusters):
            tmp = my_desired_venues[my_desired_venues['Location Cluster'] == i]
            pts = [[lat, lng] for lat, lng in zip(tmp['Venue Latitude'], tmp['Venue Longitude'])]
            # keywords found in this cluster, shown in the popup
            found = ', '.join(keywordlist[k] for k in tmp['Category Label'].unique())
            try:
                conhu = ConvexHull(pts)
                polygon = [pts[v] for v in conhu.vertices]
                folium.Polygon(locations=polygon, fill_color='blue', popup=found).add_to(map_my_desired)
                # get centroid (mean of the hull vertices)
                cx = np.mean(conhu.points[conhu.vertices,0])
                cy = np.mean(conhu.points[conhu.vertices,1])
                folium.Circle([cx, cy], fill=True, radius=radius, parse_html=False, popup=found).add_to(map_my_desired)
            except Exception:
                # ConvexHull needs at least three non-collinear points;
                # fall back to a circle around the first venue of the cluster
                print('Points not enough for a convexhull',i, pts)
                folium.Circle(pts[0], fill=True, radius=radius, parse_html=False, popup=found).add_to(map_my_desired)

        # add markers to map,
        for lat, lng, venue, my_desired_venue, cluster in zip(my_desired_venues['Venue Latitude'], my_desired_venues['Venue Longitude'], my_desired_venues['Venue'], my_desired_venues['Venue Category'], my_desired_venues['Location Cluster']):
            label = '{},{}'.format(venue, my_desired_venue)
            label = folium.Popup(label, parse_html=True)

            folium.CircleMarker(
                [lat, lng],
                radius=5,
                popup=label,
                fill=True,
                fill_opacity=0.7,
                # cluster labels start at 0, so they index the color list directly
                color=loc_rainbow[cluster],
                parse_html=False).add_to(map_my_desired)
        # make sure the output folder exists before saving
        os.makedirs('results', exist_ok=True)
        map_my_desired.save(os.path.join('results', '{}_desired_community.html'.format(city)))
    
    
        return map_my_desired
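
The function above outlines each cluster with its convex hull, places a circle at the mean of the hull vertices, and falls back to a circle around the first venue when no hull can be built. The sketch below reproduces that decision in pure Python so it can be run without SciPy. It swaps in Andrew's monotone chain algorithm for `scipy.spatial.ConvexHull`, and the names `convex_hull` and `community_shape` are hypothetical, not part of the notebook. Like the notebook, it treats latitude and longitude as planar coordinates.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # drop the last vertex while it would make a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def community_shape(points):
    """Return ('polygon', hull, center) or, as a fallback, ('circle', None, first_point)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        # too few (or collinear) points: circle around the first venue
        return "circle", None, tuple(points[0])
    center = (sum(p[0] for p in hull) / len(hull),
              sum(p[1] for p in hull) / len(hull))
    return "polygon", hull, center


square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(community_shape(square))
print(community_shape([(50.79, 6.15)]))
```

The first call yields a polygon with the four corner points and the center `(1.0, 1.0)`; the second has a single point and therefore takes the circle fallback, which corresponds to the "Points not enough for a convexhull" messages in the case studies.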

#### Helper functions used by `find_desired_community_germany`
- `getNearbyVenues(names, latitudes, longitudes, LIMIT=100, radius=500)` to get the nearby venues.

In [3]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT=100, radius=500):
       
        venues_list=[]
        for name, lat, lng in zip(names, latitudes, longitudes):
               
            # create the API request URL,
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID,
                CLIENT_SECRET,
                VERSION,
                lat,
                lng,
                radius,
                LIMIT
            )
               
            # make the GET request,
            results = requests.get(url).json()["response"]['groups'][0]['items']
           
            # return only relevant information for each nearby venue,
            venues_list.append([(
                name,
                lat,
                lng,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'], 
                v['venue']['categories'][0]['name']) for v in results]
            )
   
        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['PLZ',
                      'PLZ centroid Latitude',
                      'PLZ centroid Longitude',
                      'Venue',
                      'Venue Latitude',
                      'Venue Longitude',
                      'Venue Category']
       
        return nearby_venues
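
`getNearbyVenues` formats the request URL by hand and indexes straight into the response, so a venue without a category or a response without groups would raise an error. The sketch below shows one possible way to escape the query parameters and to parse a decoded response defensively. The endpoint, the parameter names and the response keys are taken from the function above; the helper names, the `"Unknown"` placeholder and the sample payload are assumptions made for illustration.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the request URL with escaped query parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def parse_explore_items(payload, plz, plz_lat, plz_lng):
    """Turn one decoded response into rows; tolerate missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # assumption: venues without a category get the placeholder "Unknown"
        categories = venue.get("categories") or [{}]
        rows.append((
            plz, plz_lat, plz_lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name", "Unknown"),
        ))
    return rows


# Made-up payload with the keys the notebook reads; not a real API response.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Shop A", "location": {"lat": 50.0, "lng": 6.0},
                   "categories": [{"name": "Supermarket"}]}},
        {"venue": {"name": "Place B", "location": {"lat": 50.1, "lng": 6.1},
                   "categories": []}},
    ]}]}
}

url = build_explore_url("ID", "SECRET", "20180605", 50.77, 6.08)
rows = parse_explore_items(sample_payload, "00001", 50.77, 6.08)
print(url)
print(rows)
```

Passing the credentials as arguments, as in this sketch, would also remove the need for the global `CLIENT_ID`, `CLIENT_SECRET` and `VERSION` variables.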

- `find_desired_venues(keywordlist, City_venues)` to find the desired venues with keywords in the keywordlist.

In [4]:
def find_desired_venues(keywordlist, City_venues):
        frames = []
        for label, keyword in enumerate(keywordlist):
            # venues whose category name contains the keyword; .copy() so the new
            # column is not written to a slice of City_venues
            df = City_venues[City_venues['Venue Category'].str.contains(keyword)].copy()
            if df.empty:
                print("No %s found" % keyword)
            # label each venue with the position of its keyword in keywordlist
            df['Category Label'] = label
            frames.append(df)
        return pd.concat(frames)
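
Because the keywords are matched as substrings of the category name, a short keyword can match more categories than intended. The sketch below illustrates this with plain Python lists. The venues are made up, and `label_venues` is a hypothetical stand-in for `find_desired_venues` that behaves the same way for keywords without regular expression characters.

```python
def label_venues(keywords, venues):
    """Pair each (name, category) venue with the index of every keyword its category contains."""
    labelled = []
    for label, keyword in enumerate(keywords):
        for name, category in venues:
            if keyword in category:  # plain, case-sensitive substring test
                labelled.append((name, category, label))
    return labelled


# Made-up venues for illustration only
venues = [
    ("Stop A", "Bus Stop"),
    ("Office B", "Business Center"),
    ("Shop C", "Supermarket"),
    ("Diner D", "Italian Restaurant"),
]
labelled = label_venues(["Supermarket", "Restaurant", "Bus"], venues)
for row in labelled:
    print(row)
```

Here the keyword `'Bus'` matches both `'Bus Stop'` and `'Business Center'`, so a more specific keyword such as `'Bus Stop'` may be the safer choice.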

## Results and Discussion <a name="results"></a>


### How to interpret the map
- Colored markers show the venue clusters, grouped by location.
- A polygon shows a desired community as the convex hull of a cluster.
- A circle shows a desired community around the centroid of a convex hull, with the user-supplied radius in meters. When a cluster has too few points for a convex hull, the circle is drawn around its first venue instead.
- A marker's popup shows the venue name and its category.
- The popup of a polygon or circle lists the keyword venues contained in that community.
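
Each cluster gets its own marker color. As a minimal sketch of how one distinct color per cluster can be produced, the snippet below spreads hues evenly with the standard library's `colorsys` module. This is a swapped-in technique: the notebook itself uses matplotlib's `cm.rainbow`, and `cluster_colors` is a hypothetical helper.

```python
import colorsys


def cluster_colors(n_clusters):
    """Return n_clusters hex colors spread evenly around the hue circle."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_colors(4)
# cluster labels run from 0 to n_clusters - 1, so a label indexes the palette directly
for cluster_label, color in enumerate(palette):
    print(cluster_label, color)
```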

### Three case studies with different cities and different venues

### Case Study I
I study at RWTH Aachen in Germany, so I would like to find my desired community in Aachen. As a student, my selected keywords are 'Supermarket', 'Restaurant', 'Café', 'Bus', etc.
- [Desired Community in Aachen](https://nbviewer.jupyter.org/github/RuikunLi/Capstone_Coursera/blob/master/project/results/Aachen_desired_community.html)

In [30]:
keywordlist = ['Supermarket', 'Restaurant', 'Café', 'Bus']
aachen_map = find_desired_community_germany('Aachen', foursquare_credential , keywordlist, radius=1000)


Points not enough for a convexhull 6 [[50.79666367568123, 6.155273604022005]]


In [31]:
aachen_map

### Case Study II
My girlfriend and I lived in Köln and we love this city. If we move back to Köln in the future, we would like to know where our desired community is. Our selected keywords are 'Movie', 'Supermarket', 'Restaurant', 'Pharmacy', etc.
- [Desired Community in Köln](https://nbviewer.jupyter.org/github/RuikunLi/Capstone_Coursera/blob/master/project/results/K%C3%B6ln_desired_community.html)

In [32]:
keywordlist = ['Movie', 'Supermarket', 'Restaurant', 'Pharmacy']
CGN_map = find_desired_community_germany('Köln', foursquare_credential , keywordlist, radius=1000)


Points not enough for a convexhull 8 [[51.06024, 6.86099], [51.06366677, 6.86100064]]
Points not enough for a convexhull 24 [[50.9715761220114, 6.860773326765303], [50.97386686035939, 6.863760814270557]]
Points not enough for a convexhull 28 [[51.001669, 7.041959]]
Points not enough for a convexhull 37 [[50.90471558, 7.06345782]]
Points not enough for a convexhull 38 [[51.02523645776217, 6.889133155345917]]


In [33]:
CGN_map

### Case Study III
I lived in Duisburg for a while, and a friend of mine would like to move there. He studies very hard, so he wants to find a community that contains college facilities. The selected keywords are 'Supermarket', 'College', 'Café', 'Zoo', etc.
- [Desired Community in Duisburg](https://nbviewer.jupyter.org/github/RuikunLi/Capstone_Coursera/blob/master/project/results/Duisburg_desired_community.html)

In [6]:
keywordlist = ['Supermarket', 'College', 'Café', 'Zoo']
duisburg_map = find_desired_community_germany('Duisburg', foursquare_credential , keywordlist, radius=1000)


Points not enough for a convexhull 0 [[51.40284081198879, 6.72380372883606], [51.4103414, 6.7205844]]
Points not enough for a convexhull 2 [[51.462040479088145, 6.732797704183127], [51.45385401897169, 6.731302280384661]]
Points not enough for a convexhull 6 [[51.52083, 6.7427], [51.5299806, 6.7420491]]
Points not enough for a convexhull 8 [[51.453322114192, 6.698827533729514], [51.44884439, 6.68986671]]
Points not enough for a convexhull 9 [[51.49731566, 6.80016608], [51.4967514, 6.8019409]]
Points not enough for a convexhull 11 [[51.35900848, 6.78019303], [51.35168848, 6.78476836]]
Points not enough for a convexhull 13 [[51.39788691046552, 6.664576688781381], [51.39775097316572, 6.663723382144318]]
Points not enough for a convexhull 14 [[51.39177293, 6.79749219]]
Points not enough for a convexhull 15 [[51.47734747, 6.70211353]]
Points not enough for a convexhull 16 [[51.435842514038086, 6.718547344207764]]
Points not enough for a convexhull 17 [[51.50320653512187, 6.765172541668107]]



In [9]:
duisburg_map

## Conclusion <a name="conclusion"></a>

The purpose of this project is to find the desired community/neighborhood in a German city. Given a selection of venue keywords, it explores venues through the Foursquare API, clusters them by location, and draws the convex hull of each cluster as a polygon to show the desired community/neighborhood.

A future version would add a keyword filter, for example to show only the desired communities that fulfill all of my requirements (that is, contain every keyword).
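
The keyword filter is not implemented in this notebook. As a minimal sketch of one way it could work, assume each venue is reduced to a pair of its cluster number and its category label (the columns `Location Cluster` and `Category Label` in the notebook). The function name `communities_with_all_keywords` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def communities_with_all_keywords(rows, n_keywords):
    """rows: iterable of (cluster_id, category_label) pairs, one per venue."""
    labels_per_cluster = defaultdict(set)
    for cluster_id, category_label in rows:
        labels_per_cluster[cluster_id].add(category_label)
    wanted = set(range(n_keywords))
    # keep the clusters whose label set covers every keyword index
    return sorted(c for c, labels in labels_per_cluster.items() if labels >= wanted)


# Made-up pairs: cluster 0 contains all three keywords, clusters 1 and 2 do not.
rows = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 2), (2, 1)]
print(communities_with_all_keywords(rows, 3))  # [0]
```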