## Introduction

Belgium is known worldwide for its chocolate and is a paradise for chocolate lovers. The country has a long and illustrious history of chocolate making. With around 2,000 chocolate companies and shops, Belgium remains one of the world's leading producers and exporters of chocolate. Based on available figures, Belgium exports more than 400,000 tons of chocolate, with an annual turnover of over 4 billion euros.  
Behind every top chocolate brand stands a team of top chocolatiers. They use their knowledge, experience and craftsmanship to create the finest and most sophisticated pralines from the best ingredients: high-quality Belgian chocolate. They do not shy away from the latest innovations and technological developments in the chocolate sector, which has made them award winners in several international competitions such as the Patisserie World Cup.

## Business Problem

A successful Belgian chocolatier plans to expand his business into the United States. Los Angeles has been chosen as the starting point for a new Belgian coffee shop combined with a chocolate shop. Because Los Angeles is so large and already has many coffee shops and chocolate shops run by famous brands, my client needs deeper insight from the available data in order to decide where to establish his first Belgian coffee shop in the US. Another problem is that LA has very high lease rents for retail property.  
To solve this business problem, we are going to cluster LA neighborhoods and report the venues and the current average lease rent in each, so that the business owner can decide where to open the coffee shop. For this purpose we will try to find the optimal solution in terms of competitive location, affordable lease rents, and surrounding venues.

## Discussion

Let's discuss the problem stated above. First, we know that our client, a famous Belgian chocolatier, wants to lease a retail space for his unique coffee shop combined with a chocolate shop. He also needs to find out the level of competition: how many coffee shops and restaurants there are in each neighborhood. If a neighborhood already has more than 2-5 coffee shops and restaurants, opening a new coffee shop there would be a considerable risk. A neighborhood with few or no coffee shops and restaurants would be a good choice, provided its lease rent is also taken into account. Proximity to places such as Downtown, movie theaters, parks, malls and gas stations would help his business.
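The screening rule described above can be sketched in a few lines. This is only an illustration: the cut-off of 5 competitors (the upper end of the 2-5 range), the function name `competition_risk` and the sample counts are assumptions, not values taken from the project data.

```python
# Hypothetical competitor counts per neighborhood (illustrative values only)
competitor_counts = {
    "Neighborhood A": 1,
    "Neighborhood B": 4,
    "Neighborhood C": 9,
}

# Assumed cut-off: more coffee shops / restaurants than this counts as high risk
MAX_COMPETITORS = 5


def competition_risk(count, max_competitors=MAX_COMPETITORS):
    """Label a neighborhood by how crowded it already is."""
    return "high" if count > max_competitors else "acceptable"


risk_by_neighborhood = {
    name: competition_risk(count) for name, count in competitor_counts.items()
}
print(risk_by_neighborhood)
```

In practice the counts would come from the venue data per neighborhood, and the cut-off would be weighed against the lease rent rather than applied on its own.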

The target audience is broad: it ranges from any company planning to open a new business in LA to tourists and those who are passionate about coffee shops with a wide range of Belgian chocolate.

## Data Description

This project will rely on public data from real estate agencies and Foursquare.

For this project we only need to analyze the current range of lease rents. I therefore collected lease rent data by neighborhood from open sources such as https://www.rentcafe.com/average-rent-market-trends/us/ca/san-francisco/ and https://www.zillow.com/research/data/, so that the rent for each neighborhood is easy to look up. I have uploaded the prepared data to my GitHub repository.

In [1]:
import numpy as np # library for vectorized computation

import pandas as pd # library to process data as dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim     # convert an address into latitude and longitude values

!pip -q install geocoder
import geocoder

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.19.0-py_0 conda-forge

geographiclib- 100% |################################| Time: 0:00:00  30.28 MB/s
geopy-1.19.0-p 100% |################################| Time: 0:00:00  50.83 MB/s
Libraries imported.


After importing the necessary libraries, we download the data from my GitHub repository as follows:

In [2]:
git = 'https://raw.githubusercontent.com/tarastsukarev/Coursera_Capstone/master/rent_data.csv'
LA_rentdata = pd.read_csv(git)
LA_rentdata.head(10)

Unnamed: 0,State,City,Neighborhood,Average Rent (per SqFoot)
0,CA,Los Angeles,Jefferson Park,1.59
1,CA,Los Angeles,El Sereno,1.77
2,CA,Los Angeles,Winnetka,1.83
3,CA,Los Angeles,Glassell Park,1.83
4,CA,Los Angeles,Cypress Park,1.83
5,CA,Los Angeles,Vermont Vista,1.85
6,CA,Los Angeles,Vermont Knolls,1.85
7,CA,Los Angeles,Panorama City,1.89
8,CA,Los Angeles,Leimert Park,1.9
9,CA,Los Angeles,North Hills,1.91


To obtain the location data, the geocoder package is used with its ArcGIS provider to look up the latitude and longitude of each neighborhood.

These coordinates will be used to create a new dataframe for the LA neighborhoods, which is used in the subsequent steps.

In [3]:
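# Geocode a neighborhood name with the ArcGIS provider of the geocoder package

def get_latlng(neighborhood, max_attempts=5):

    # Initialize the location (lat. and long.) to None
    lat_lng_coords = None
    attempts = 0

    # Retry until the neighborhood is geocoded, but give up after max_attempts
    # so that a name ArcGIS cannot resolve does not loop forever
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.arcgis('{}, Los Angeles, United States'.format(neighborhood))
        lat_lng_coords = g.latlng
        attempts += 1

    # [None, None] keeps the list rectangular for the dataframe built from it
    return lat_lng_coords if lat_lng_coords is not None else [None, None]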

We then store the location data (latitude and longitude) as follows. The obtained coordinates are joined to LA_rentdata to create a new dataframe.

In [4]:
neighborhoods = LA_rentdata['Neighborhood'].tolist()
coordinates = [get_latlng(name) for name in neighborhoods]

# Copy the rent data so that adding coordinates does not modify LA_rentdata
df_LA = LA_rentdata.copy()
df_LA_coordinates = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])
df_LA['Latitude'] = df_LA_coordinates['Latitude']
df_LA['Longitude'] = df_LA_coordinates['Longitude']
df_LA.head(10)

Unnamed: 0,State,City,Neighborhood,Average Rent (per SqFoot),Latitude,Longitude
0,CA,Los Angeles,Jefferson Park,1.59,34.03264,-118.31802
1,CA,Los Angeles,El Sereno,1.77,34.07685,-118.17934
2,CA,Los Angeles,Winnetka,1.83,34.20329,-118.57098
3,CA,Los Angeles,Glassell Park,1.83,34.11896,-118.23099
4,CA,Los Angeles,Cypress Park,1.83,34.09448,-118.22678
5,CA,Los Angeles,Vermont Vista,1.85,34.092407,-118.29101
6,CA,Los Angeles,Vermont Knolls,1.85,34.092407,-118.29101
7,CA,Los Angeles,Panorama City,1.89,34.22301,-118.44875
8,CA,Los Angeles,Leimert Park,1.9,34.00838,-118.33045
9,CA,Los Angeles,North Hills,1.91,34.23563,-118.4847


In [5]:
# Keep only Neighborhood, Average Rent (per SqFoot) and the coordinates
df_LA = df_LA[['Neighborhood','Average Rent (per SqFoot)', 'Latitude', 'Longitude']]
df_LA.head(10)

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Latitude,Longitude
0,Jefferson Park,1.59,34.03264,-118.31802
1,El Sereno,1.77,34.07685,-118.17934
2,Winnetka,1.83,34.20329,-118.57098
3,Glassell Park,1.83,34.11896,-118.23099
4,Cypress Park,1.83,34.09448,-118.22678
5,Vermont Vista,1.85,34.092407,-118.29101
6,Vermont Knolls,1.85,34.092407,-118.29101
7,Panorama City,1.89,34.22301,-118.44875
8,Leimert Park,1.9,34.00838,-118.33045
9,North Hills,1.91,34.23563,-118.4847
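In the output above, Vermont Vista and Vermont Knolls share exactly the same coordinates, which may mean the geocoder fell back to a coarser match for one or both names. A minimal sketch of how such rows could be flagged before further analysis is shown below; the `sample` frame is a three-row stand-in copied from the table, not the full dataset.

```python
import pandas as pd

# Three rows copied from the output above
sample = pd.DataFrame({
    "Neighborhood": ["Cypress Park", "Vermont Vista", "Vermont Knolls"],
    "Latitude": [34.09448, 34.092407, 34.092407],
    "Longitude": [-118.22678, -118.29101, -118.29101],
})

# keep=False marks every row that shares its (Latitude, Longitude) with another row
shared = sample[sample.duplicated(subset=["Latitude", "Longitude"], keep=False)]
print(shared["Neighborhood"].tolist())
```

Rows flagged this way could then be checked by hand or geocoded again with a more specific address.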


Los Angeles is a very large city (it has more than 100 neighborhoods), and because of the limit on the number of calls to the Foursquare API we are going to analyze only 50 neighborhoods, excluding locations known in advance to be the most expensive, such as Santa Monica, North of Montana, Pacific Palisades, etc.
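One way this selection could be expressed with pandas is sketched below. The exclusion list contains only the three neighborhoods named above (the full list is not given), and keeping the 50 lowest-rent rows is an assumption about how the subset is chosen. The function name `select_neighborhoods` and the `demo` frame are illustrative.

```python
import pandas as pd

# Neighborhoods named in the text as known to be expensive; the full list is not given
EXCLUDED = ["Santa Monica", "North of Montana", "Pacific Palisades"]


def select_neighborhoods(df, excluded=EXCLUDED, limit=50):
    """Drop excluded neighborhoods, then keep the `limit` lowest-rent rows."""
    kept = df[~df["Neighborhood"].isin(excluded)]
    kept = kept.sort_values("Average Rent (per SqFoot)", kind="stable")
    return kept.head(limit).reset_index(drop=True)


# Small stand-in for df_LA; the Santa Monica rent is a made-up placeholder
demo = pd.DataFrame({
    "Neighborhood": ["Jefferson Park", "Santa Monica", "El Sereno"],
    "Average Rent (per SqFoot)": [1.59, 9.99, 1.77],
})
selected = select_neighborhoods(demo, limit=2)
print(selected)
```

Applied to the real data, this would be called as `select_neighborhoods(df_LA)` with the default limit of 50.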

The Foursquare API will be used to obtain geographical location data for venues in Los Angeles, which will be used to explore the venues in the neighborhoods of LA. The venues will provide the categories needed for the analysis, and eventually these will be used to determine the viability of the selected locations for the Belgian coffee shop.
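The request code is not shown in this section, so the following is only a sketch of how a venue query could be built. It assumes the legacy Foursquare v2 `venues/explore` endpoint; the credentials, the version date, the 500 m radius and the helper name `build_explore_url` are placeholders or assumptions. The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Placeholders: real credentials come from a Foursquare developer account
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # assumed API version date

# Assumed endpoint: legacy Foursquare v2 venue exploration
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius_m=500, limit=100):
    """Build the query URL for venues around one neighborhood center."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": "{},{}".format(lat, lng),
        "radius": radius_m,  # search radius in meters (assumed value)
        "limit": limit,      # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Jefferson Park coordinates from the table above
url = build_explore_url(34.03264, -118.31802)
print(url)
```

One such URL would be requested per neighborhood, which is why the number of neighborhoods directly determines the number of API calls.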

### How data will be used to solve the problem

The lease rent dataset, the location data and the Foursquare data will be explored by considering the venues within each LA neighborhood. The coffee shops and restaurants in these neighborhoods will be examined in terms of the types of coffee shops and restaurants within a certain radius (in miles) and the level of lease rent. Due to Foursquare restrictions, the number of venues will be limited to 100. Proximity to Downtown, movie theaters, parks, malls, gas stations and other amenities will also be considered.
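The radius check mentioned above can be illustrated with a great-circle (haversine) distance. This is a sketch under stated assumptions, not the method used in the project: the venue names and their coordinates are hypothetical, and the 1-mile radius is an arbitrary example value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def venues_within(center, venues, radius_miles):
    """Return names of venues at most radius_miles away from center."""
    return [
        name for name, lat, lng in venues
        if haversine_miles(center[0], center[1], lat, lng) <= radius_miles
    ]


# Jefferson Park coordinates from the table above; the venues are hypothetical
center = (34.03264, -118.31802)
venues = [
    ("Hypothetical Cafe", 34.03300, -118.31900),
    ("Hypothetical Diner", 34.20329, -118.57098),
]
nearby = venues_within(center, venues, radius_miles=1.0)
print(nearby)
```

The same helper could be used to measure how far a neighborhood center is from amenities such as parks or malls once their coordinates are known.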