<h1>New restaurant in Charleston, SC</h1>
<p><h2>Table of contents</h2>
<ol>
<li><a href=#desc>Introduction</a></li>
<li><a href=#data>Data section</a></li>
    <ol><li><a href=#rentlist><i>List of potential addresses available for rent</i></a></li>
        <li><a href=#listvenues><i>List of nearby venues</i></a></li></ol>
    <li><a href=#metod>Methodology section</a></li>
    <ol><li><a href=#lexploratory><i>Exploratory data analysis</i></a></li>
        <li><a href=#ML><i>Machine Learning K-means</i></a></li></ol>
<li>Results section</li>
<li>Discussion section</li>
<li>Conclusion section</li>
</ol>    
<h3><a id="desc"></a>Introduction. Problem and the background</h3>
<img src="CharlestonSymbol.jpg" align="right" alt="Charleston city symbol" width="160">
<p>Charleston is the oldest and largest city in the U.S. state of South Carolina. It was founded in 1670 as Charles Town, honoring King Charles II of England. Known for its rich history, well-preserved architecture, distinguished restaurants, and hospitable people, Charleston is a popular tourist destination. It has received numerous accolades, including being named <a href="https://www.travelandleisure.com/slideshows/americas-friendliest-cities#charleston">"America's Most Friendly City"</a> by Travel + Leisure in 2016. The city is known for its unique culture, which blends traditional Southern U.S., English, French, and West African elements. The downtown peninsula has gained a reputation for its art, music, local cuisine, and fashion. Also of note, in 2013 the Milken Institute ranked the Charleston region as the ninth-best performing economy in the US because of its growing IT sector.</p>
<p>Not surprisingly, finding and renting a place for a restaurant is not an easy task. The investor who wants to open the restaurant is a friend of mine, so I decided to help him make the decision using Foursquare venue data and some data science magic. <b>Starting from the list of available properties, I clustered them into categories based on a separate list of surrounding venues</b>. To do that, I had to do at least the following:</p>
<ol>
<li>Get the geo-coordinates for the given addresses</li>
<li>Get the list of nearby venues with additional information</li>
<li>Visualize and explore the data</li>
<li>Preprocess the datasets so that an ML algorithm can be applied</li>
<li>Run an unsupervised ML algorithm (K-means) to find clusters</li>
<li>Visualize the results and draw conclusions</li>
</ol> 
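
Steps 4 and 5 of the list above can be sketched on a toy dataset. The rows below are made up for illustration and the column names simply mirror the venue table used later in this notebook; the choice of two clusters is an assumption for this example only, not the setting used in the analysis.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue list (made-up rows): one row per venue found near an address
toy_venues = pd.DataFrame({
    "Address": ["A St", "A St", "A St", "B St", "B St", "C St", "C St", "C St"],
    "Venue Category": ["Park", "Historic Site", "Park", "Bar", "Restaurant",
                       "Park", "Historic Site", "Historic Site"],
})

# One-hot encode the categories, then average per address
# to get the share of each category around that address
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot["Address"] = toy_venues["Address"]
profiles = onehot.groupby("Address").mean()

# Cluster the addresses by their category profile (k=2 is an assumption here)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
profiles["Cluster"] = kmeans.labels_
print(profiles)
```

In this toy case the two addresses surrounded by parks and historic sites end up in one cluster and the address surrounded by a bar and a restaurant in the other.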

In [3]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
from pandas import json_normalize # transform JSON into a flat pandas dataframe (pandas >= 1.0)
import json # library to handle JSON files
import requests # library to handle requests
import folium # map rendering library
import matplotlib.pyplot as plt # Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans # k-means clustering
print("Libraries imported.")

Libraries imported.


<a id=data></a><h3>Data section. The data and how it was used</h3>
<p>I have worked on two datasets: 
<ol>
<li><b>List of potential addresses available for rent.</b> Since the available automatic geocoding methods either do not work well or are paid services, I decided to look up and enter the geo-coordinates manually, using <a href="https://www.latlong.net/">www.latlong.net</a>. I prepared the dataset as a .CSV file in MS Windows Notepad and loaded it into a pandas dataframe. The <b>16 given locations</b> are spread across the Charleston area. The dataset contains 7 features:</li>
<ol>
<li>address,</li>
<li>city (street),</li>
<li>state,</li>
<li>country,</li>
<li>latitude,</li>
<li>longitude (stored in the column <code>Longtitude</code>),</li>
<li>source (of data).</li>
</ol>     
<li><b>List of nearby venues.</b> The dataset was created by downloading a list of nearby venues <b>within a 500 m radius</b> of each address through the <a href="https://www.foursquare.com">Foursquare.com</a> API. It contains:</li>
<ol>
<li>venue name,</li>
<li>venue latitude,</li>
<li>venue longitude,</li>
<li>venue category.</li>
</ol>
</ol>
Both datasets were visually explored using the Folium Python library or Matplotlib charts and preprocessed so that the K-means ML algorithm could be applied.
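
Because the coordinates were typed in by hand, a quick sanity check after loading is cheap insurance. The sketch below reads two rows copied from the address table in this notebook out of an in-memory string; the file layout (including the `Longtitude` spelling of the column) is assumed to match that table, and the range checks are only an illustration.

```python
import io
import pandas as pd

# Two rows copied from the address table; the layout is assumed to match the real file
csv_text = """Address,City,State,Country,Latitude,Longtitude,Source
10 Murry Blvd,Charleston,South Carolina,USA,32.76994,-79.93324,https://www.latlong.net/
8 Queen St,Charleston,South Carolina,USA,32.77869,-79.92785,https://www.latlong.net/
"""

df = pd.read_csv(io.StringIO(csv_text))

# Flag rows whose manually entered coordinates fall outside the valid ranges
bad_rows = df[~df["Latitude"].between(-90, 90) | ~df["Longtitude"].between(-180, 180)]
print("Rows loaded:", len(df))
print("Rows with out-of-range coordinates:", len(bad_rows))
```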

<h3><i><a id=rentlist></a>List of potential addresses available for rent</i></h3>

In [9]:
df_addresses = pd.read_csv("CharlestonAddresses.csv")
df_addresses

Unnamed: 0,Address,City,State,Country,Latitude,Longtitude,Source
0,10 Murry Blvd,Charleston,South Carolina,USA,32.76994,-79.93324,https://www.latlong.net/
1,8 Queen St,Charleston,South Carolina,USA,32.77869,-79.92785,https://www.latlong.net/
2,12 Huger St,Charleston,South Carolina,USA,32.79939,-79.94998,https://www.latlong.net/
3,67 Line St,Charleston,South Carolina,USA,32.794708,-79.943268,https://www.latlong.net/
4,5 Columbus St,Charleston,South Carolina,USA,32.79433,-79.94071,https://www.latlong.net/
5,8 Mount Pleasant St,Charleston,South Carolina,USA,32.8124,-79.9549,https://www.latlong.net/
6,9 Davis St,Charleston,South Carolina,USA,32.8401,-79.95685,https://www.latlong.net/
7,2 Carr St,Charleston,South Carolina,USA,32.78522,-79.87342,https://www.latlong.net/
8,18 Ocean Boulevard,Charleston,South Carolina,USA,32.7808,-79.79824,https://www.latlong.net/
9,4 Middle Street,Charleston,South Carolina,USA,32.77874,-79.86881,https://www.latlong.net/


<h3><i><a id="listvenues"></a>List of nearby venues</i></h3>
<p>Passing credentials to the Foursquare API.

In [89]:
CLIENT_ID = "***" # my Foursquare ID
CLIENT_SECRET = "***" # my Foursquare Secret
VERSION = '20181122' # Foursquare API version
LIMIT = 100 # maximum number of venues requested per address

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: ***
CLIENT_SECRET:***
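
As an alternative to typing the credentials into a cell, they could be read from environment variables and masked before printing. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `mask` are hypothetical and not part of the original notebook.

```python
import os

def mask(secret, visible=2):
    """Keep the first `visible` characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Hypothetical environment variable names; an empty string is used when they are unset
env_client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "")
env_client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

print("CLIENT_ID: " + mask(env_client_id))
print("CLIENT_SECRET: " + mask(env_client_secret))
```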


Preparing the API query function. 

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of Foursquare venues within `radius` metres of each address."""

    venues_list = []
    # iterate over the function arguments rather than the global df_addresses
    for address, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(address, lat, lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-address lists into one dataframe with one row per venue
    df_nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    df_nearby_venues.columns = ['Address',
                  'Latitude',
                  'Longitude',
                  'Venue Name',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return df_nearby_venues
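
The function above assumes every venue has at least one category and that the response always contains a group. A more defensive parsing step could look like the sketch below. The helper `parse_explore_items`, the `"Unknown"` placeholder, and the second mock venue are hypothetical; the mock payload is hand-written to mirror only the fields read in `getNearbyVenues`, so it can be run without calling the API.

```python
def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to a placeholder when a venue has no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows

# Hand-written mock payload with the same nesting as the fields used above
mock = {"response": {"groups": [{"items": [
    {"venue": {"name": "The Battery",
               "location": {"lat": 32.770012, "lng": -79.92946},
               "categories": [{"name": "Scenic Lookout"}]}},
    {"venue": {"name": "Uncategorised place",  # made-up venue without a category
               "location": {"lat": 32.77, "lng": -79.93},
               "categories": []}},
]}]}}

print(parse_explore_items(mock))
print(parse_explore_items({"response": {}}))
```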

Executing the function and getting the data into a pandas dataframe.

In [12]:
df_charleston_venues = getNearbyVenues(names=df_addresses.Address, latitudes=df_addresses.Latitude, 
                                        longitudes=df_addresses.Longtitude) 
print("Et voila! The nearby venues dataset.")
df_charleston_venues.head()

Et voila! The nearby venues dataset.


Unnamed: 0,Address,Latitude,Longitude,Venue Name,Venue Latitude,Venue Longitude,Venue Category
0,10 Murry Blvd,32.76994,-79.93324,The Battery,32.770012,-79.92946,Scenic Lookout
1,10 Murry Blvd,32.76994,-79.93324,White Point Gardens,32.769963,-79.930176,Park
2,10 Murry Blvd,32.76994,-79.93324,The Gazebo At The Battery,32.769864,-79.93022,Historic Site
3,10 Murry Blvd,32.76994,-79.93324,Calhoun Mansion,32.771461,-79.930224,Historic Site
4,10 Murry Blvd,32.76994,-79.93324,Two Meeting Street,32.7705,-79.930099,Bed & Breakfast
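
One way to check that the returned venues really lie within the requested 500 m radius is to compute the great-circle distance between each address and its venues. The sketch below applies the haversine formula to the first three rows of the output above; the helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions of this example.

```python
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * np.arcsin(np.sqrt(a))

# First three rows of the venue table shown above
sample = pd.DataFrame({
    "Venue Name": ["The Battery", "White Point Gardens", "The Gazebo At The Battery"],
    "Latitude": [32.76994] * 3,
    "Longitude": [-79.93324] * 3,
    "Venue Latitude": [32.770012, 32.769963, 32.769864],
    "Venue Longitude": [-79.92946, -79.930176, -79.93022],
})

sample["Distance (m)"] = haversine_m(sample["Latitude"], sample["Longitude"],
                                     sample["Venue Latitude"], sample["Venue Longitude"])
print(sample[["Venue Name", "Distance (m)"]].round(1))
```

All three sample venues come out well inside the 500 m radius passed to the API.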
