## Importing The Necessary Libraries

In [1]:
import pandas as pd
import folium
import numpy as np
import matplotlib.pyplot as plt
import requests

#### Reading the data provided on the government's website

In [2]:
cars_df = pd.read_csv("daily_cars.csv" , encoding="ISO-8859-1",sep=";")
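A side note on the `encoding` argument: some sensor names later show up with a leading `Ý` (for example `Ýstanbul`), which is what the Turkish letter `İ` looks like when ISO-8859-9 bytes are decoded as ISO-8859-1. Whether the file really is ISO-8859-9 is an assumption on my part, so the sketch below only demonstrates the effect on a single made-up string and does not touch the real file.

```python
# The same byte (0xDD) is a different letter in the two code pages.
raw = "İstanbul".encode("iso-8859-9")      # bytes as a Turkish (Latin-5) file would store them

as_latin1 = raw.decode("iso-8859-1")       # what encoding="ISO-8859-1" produces
as_latin5 = raw.decode("iso-8859-9")       # what encoding="ISO-8859-9" produces

print(as_latin1, as_latin5)
```

If the names matter for the analysis, it may be worth trying `encoding="ISO-8859-9"` in `read_csv` and comparing the result.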

#### Setting the column names (they are originally in Turkish)

In [3]:
cars_df.columns = ["date","sensor_name","lon","lat","num_cars"]

In [4]:
cars_df.head()

Unnamed: 0,date,sensor_name,lon,lat,num_cars
0,1.01.2020,ciragan Cad.,"29,016617","41,044845",86521
1,1.01.2020,Kco cekmekoy Kavsagi,"29,19354","41,051371",9451
2,1.01.2020,Gunesli 2 Basin Ekspres yolu,"28,811125","41,024099",53991
3,1.01.2020,Buyukdere 1.Levent,"29,015483","41,073533",102531
4,1.01.2020,Cevizlibag,"28,914282","41,018157",129090


Dataset:
This dataset contains the number of cars that each sensor detected per day. <p>
The date column contains the day of the measurement. <p>
The sensor_name column contains the name of the place where the sensor is located. <p>
lon stands for longitude. <p>
lat stands for latitude. <p>
num_cars contains the number of cars.

In [5]:
cars_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47398 entries, 0 to 47397
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   date         47398 non-null  object
 1   sensor_name  47398 non-null  object
 2   lon          47101 non-null  object
 3   lat          47101 non-null  object
 4   num_cars     47398 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 1.8+ MB


1- From that output, we see that some rows have no location (lon and lat have 47101 non-null values out of 47398 rows), and without coordinates those rows are useless. We could fill them in by searching for the sensor's name on Google and looking up the coordinates, but there's no need for that, so we simply drop them. <p>

2- The lat and lon columns are objects because in Turkey we use ',' as the decimal separator. We need to replace each ',' with '.' before we can convert those columns to floats.
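As an alternative to replacing the commas by hand, `pd.read_csv` has a `decimal` parameter that handles this while reading. The sketch below uses a tiny made-up sample (not the real file) with the same `;` separator, including one row with missing coordinates.

```python
import io
import pandas as pd

# Made-up sample in the same layout: ';' separated, ',' as the decimal separator
sample = io.StringIO(
    "sensor;lon;lat\n"
    "A;29,016617;41,044845\n"
    "B;;\n"
)

# decimal="," tells pandas to read 29,016617 as the float 29.016617
df = pd.read_csv(sample, sep=";", decimal=",")
print(df.dtypes)
print(df)
```

With this, lon and lat arrive as float64 directly and the missing coordinates become NaN, so `dropna` still works as before.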

In [6]:
cars_df.dropna(inplace=True)

In [7]:
cars_df.lat = cars_df.lat.apply(lambda x: str(x).replace(",","."))

In [8]:
cars_df.lon = cars_df.lon.apply(lambda x: str(x).replace(",","."))

In [9]:
cars_df.lat = cars_df.lat.astype("float64")
cars_df.lon = cars_df.lon.astype("float64")

In [10]:
cars_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47101 entries, 0 to 47397
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         47101 non-null  object 
 1   sensor_name  47101 non-null  object 
 2   lon          47101 non-null  float64
 3   lat          47101 non-null  float64
 4   num_cars     47101 non-null  int64  
dtypes: float64(2), int64(1), object(2)
memory usage: 2.2+ MB


Now our 'lat' and 'lon' columns are in the form we want. Next, we need to convert the date column to a datetime so we can extract the day and the month from it. After that, we can get the average number of cars passing each sensor per day.

In [11]:
# The dates look like 1.01.2020 (day.month.year), so parse them day-first
cars_df["date"] = pd.to_datetime(cars_df["date"], dayfirst=True)

In [12]:
cars_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47101 entries, 0 to 47397
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   date         47101 non-null  datetime64[ns]
 1   sensor_name  47101 non-null  object        
 2   lon          47101 non-null  float64       
 3   lat          47101 non-null  float64       
 4   num_cars     47101 non-null  int64         
dtypes: datetime64[ns](1), float64(2), int64(1), object(1)
memory usage: 2.2+ MB


Now we can extract the day and the month.

In [13]:
cars_df["day"] = cars_df.date.apply(lambda x: x.day)

In [14]:
cars_df["month"] = cars_df.date.apply(lambda x: x.month)
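The order of day and month matters when parsing dates like `1.02.2020`. Here is a small sketch, assuming the dates are written as day.month.year (which is how they look in the table above), that passes an explicit format and then uses the `.dt` accessor instead of `apply`. The sample dates are made up.

```python
import pandas as pd

# Made-up sample dates in day.month.year order
dates = pd.Series(["1.01.2020", "1.02.2020", "15.03.2020"])

# An explicit format removes any ambiguity between day and month
parsed = pd.to_datetime(dates, format="%d.%m.%Y")

# The .dt accessor extracts the parts for the whole column at once
days = parsed.dt.day
months = parsed.dt.month
print(days.tolist(), months.tolist())
```

Using `.dt.day` and `.dt.month` gives the same values as the `apply` calls, just without a Python-level loop.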

We don't need the date column anymore so let's drop it to save some memory space

In [15]:
cars_df.drop("date" , axis=1 , inplace=True)

In [16]:
cars_df.head()

Unnamed: 0,sensor_name,lon,lat,num_cars,day,month
0,ciragan Cad.,29.016617,41.044845,86521,1,1
1,Kco cekmekoy Kavsagi,29.19354,41.051371,9451,1,1
2,Gunesli 2 Basin Ekspres yolu,28.811125,41.024099,53991,1,1
3,Buyukdere 1.Levent,29.015483,41.073533,102531,1,1
4,Cevizlibag,28.914282,41.018157,129090,1,1


#### Now we can group our dataset by sensor name to get the average number of cars. It's OK to take the average of lat and lon as well, because every row of a sensor contains the same values for them.

In [17]:
gby = cars_df.groupby("sensor_name").mean()[["num_cars","lon","lat"]]

In [18]:
gby.head()

sensor_name,num_cars,lon,lat
Alemdag,11515.674797,29.2282,41.0449
15 Temmuz sehitler Koprusu Anadolu,124149.208054,29.0439,41.035517
15 Temmuz sehitler Koprusu Yildiz Katilimi,104345.873333,29.018432,41.057702
Akom onu,88500.40411,28.961073,41.090599
Alemdag Kavsagi,28806.20155,29.27,41.0289
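The grouping relies on every sensor having a single pair of coordinates. A minimal sketch on made-up data that checks this assumption and then uses named aggregation, taking the mean of num_cars and the first lon/lat of each sensor:

```python
import pandas as pd

# Made-up rows: sensor A appears twice with identical coordinates
df = pd.DataFrame({
    "sensor_name": ["A", "A", "B"],
    "lon": [29.0, 29.0, 28.9],
    "lat": [41.0, 41.0, 41.1],
    "num_cars": [100, 300, 50],
})

# True if every sensor has exactly one distinct lon and one distinct lat
coords_are_constant = bool(
    df.groupby("sensor_name")[["lon", "lat"]].nunique().eq(1).all().all()
)

# Named aggregation: average the counts, keep the coordinates as they are
summary = df.groupby("sensor_name").agg(
    num_cars=("num_cars", "mean"),
    lon=("lon", "first"),
    lat=("lat", "first"),
)
print(coords_are_constant)
print(summary)
```

If the check came back False on the real data, averaging the coordinates would place a sensor somewhere between its reported positions.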


Let's visualize our data on the map and see if we have made any mistakes.

In [19]:
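gmap = folium.Map(location=(gby["lat"].iloc[0], gby["lon"].iloc[0]))

In [20]:
# Center the map on the first sensor (positional access, since the index holds sensor names)
map_osm = folium.Map(location=[gby["lat"].iloc[0], gby["lon"].iloc[0]], zoom_start=10)
# Draw one circle per sensor; the popup shows the average number of cars per day
gby.apply(lambda row: folium.CircleMarker(location=[row["lat"], row["lon"]], popup=str(round(row["num_cars"])),
                                              radius=10)
                                             .add_to(map_osm), axis=1)
map_osm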

It's almost perfect. We need to get rid of the sensors that end up in the sea, because we can't get any information about them from Foursquare, but it's not necessary right now. We will do it later.

Now we're going to divide the number of cars into a few density levels so we can visualize them easily (I couldn't find a way to visualize continuous values with folium).
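For what it's worth, one possible way to get continuous colours is to convert each value to a hex colour with a matplotlib colormap and pass that string as the marker colour. This is only a sketch: `value_to_hex` is a hypothetical helper name, and the `RdYlGn_r` colormap is just one choice.

```python
from matplotlib import colors as mcolors
import matplotlib.pyplot as plt

def value_to_hex(value, vmin, vmax, cmap_name="RdYlGn_r"):
    """Map a number in [vmin, vmax] to a hex colour string (hypothetical helper)."""
    norm = mcolors.Normalize(vmin=vmin, vmax=vmax, clip=True)  # scale to 0..1
    cmap = plt.get_cmap(cmap_name)                             # green (low) to red (high)
    return mcolors.to_hex(cmap(norm(value)))

low = value_to_hex(0, 0, 170000)
high = value_to_hex(170000, 0, 170000)
print(low, high)
```

The returned strings can be used wherever a colour name is used in the maps, for example as the `color` of a `CircleMarker`.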

In [21]:
gby["num_cars"].max()

166731.7232142857

In [22]:
gby["density"] = 0

In [23]:
# Assign through gby.loc to avoid chained assignment (and the SettingWithCopyWarning it triggers)
gby.loc[gby["num_cars"] > 100000, "density"] = 3
gby.loc[(gby["num_cars"] > 50000) & (gby["num_cars"] <= 100000), "density"] = 2
gby.loc[gby["num_cars"] <= 50000, "density"] = 1


In [24]:
gby.head()

sensor_name,num_cars,lon,lat,density
Alemdag,11515.674797,29.2282,41.0449,1
15 Temmuz sehitler Koprusu Anadolu,124149.208054,29.0439,41.035517,3
15 Temmuz sehitler Koprusu Yildiz Katilimi,104345.873333,29.018432,41.057702,3
Akom onu,88500.40411,28.961073,41.090599,2
Alemdag Kavsagi,28806.20155,29.27,41.0289,1
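The same three levels can also be produced in one step with `pd.cut`. A minimal sketch on made-up values, assuming a value of exactly 50000 belongs to the lowest level and exactly 100000 to the middle one:

```python
import pandas as pd

# Made-up averages, including the two boundary values
num_cars = pd.Series([11515.7, 50000.0, 88500.4, 100000.0, 124149.2])

# Bins are right-inclusive: (-inf, 50000], (50000, 100000], (100000, inf)
density = pd.cut(
    num_cars,
    bins=[-float("inf"), 50000, 100000, float("inf")],
    labels=[1, 2, 3],
).astype(int)
print(density.tolist())
```

With `pd.cut` every value falls into exactly one bin, so no row can be left with the initial 0.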


Perfect, we now have a density column that describes the amount of traffic. <p>
Now we need to create a dictionary that contains a color for every density value we have.

In [25]:
colors = {1:"Green",2:"Orange",3:"Red"}

In [26]:
map_osm = folium.Map(location=[gby["lat"].iloc[0], gby["lon"].iloc[0]], zoom_start=10)
gby.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lon"]], popup = row.name , color = colors[row["density"]] ,  fill_color = colors[row["density"]],
                                              radius=10)
                                             .add_to(map_osm), axis=1)
map_osm

Looks great. Now we can get some data from Foursquare to see the relationship between traffic and venues.

In [27]:
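import os
# Read the Foursquare credentials from the environment instead of hard-coding them
# (FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholder variable names)
client_id = os.environ["FOURSQUARE_CLIENT_ID"]
client_secret = os.environ["FOURSQUARE_CLIENT_SECRET"]
version = "20200710"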

In [28]:
top100 = pd.DataFrame()
for sensor_name, row in gby.iterrows():
    # Ask for up to 100 venues within 500 m of the sensor
    params = {"client_id": client_id, "client_secret": client_secret,
              "ll": "{},{}".format(row.lat, row.lon), "v": version,
              "radius": 500, "limit": 100}
    venues = requests.get("https://api.foursquare.com/v2/venues/search", params=params).json()["response"]["venues"]
    cx = []
    for venue in venues:
        try:
            cx.append(venue["categories"][0]["name"])
        except (KeyError, IndexError):
            # The venue has no category
            cx.append(np.nan)
    # Pad with NaN so every column has exactly 100 entries
    cx += [np.nan] * (100 - len(cx))
    top100[sensor_name] = cx
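The parsing part of that loop can be tried without calling the API. Below is a sketch with a hypothetical helper, `category_names`, run on a made-up payload that only mimics the `response` / `venues` / `categories` nesting used above. It is not an official description of the Foursquare response.

```python
import math

def category_names(payload, size=100):
    """Return the first category name of each venue, padded with NaN to `size` entries."""
    venues = payload.get("response", {}).get("venues", [])
    names = []
    for venue in venues[:size]:
        categories = venue.get("categories") or []
        # Venues without a category get NaN, as in the loop above
        names.append(categories[0].get("name", math.nan) if categories else math.nan)
    names += [math.nan] * (size - len(names))
    return names

# Made-up payload with one categorized and one uncategorized venue
fake_payload = {"response": {"venues": [
    {"categories": [{"name": "Bridge"}]},
    {"categories": []},
]}}
result = category_names(fake_payload)
print(result[:3])
```

Keeping the parsing in its own function makes it possible to test it on saved responses, which helps when the API quota is limited.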


In [29]:
top100.head()

Unnamed: 0,Alemdag,15 Temmuz sehitler Koprusu Anadolu,15 Temmuz sehitler Koprusu Yildiz Katilimi,Akom onu,Alemdag Kavsagi,Alibeykoy TEM Katilimi,Altunizade,Altunizade umraniye Katilimi,Anadolu Feneri,Anadolu Feneri ust Gecit,...,umraniye Kavsagi,umraniye Kucuksu,umraniye Otopazari,uskudar Sahil Yolu,Ýhsaniye Kavsagi,Ýkitelli Basin Ekspres yolu,Ýshakli,Ýstac Kati Atik Tesisi,Ýstanbul Havalimani cikis,Ýstoc
0,Bridge,Bridge,Bridge,Bridge,Summer Camp,Trail,Bridge,Scenic Lookout,Restaurant,Beach,...,Cemetery,Elementary School,Diner,Waterfront,Lake,Plaza,Farm,Buffet,Airport,Industrial Estate
1,Factory,Seafood Restaurant,,Bridge,Toll Plaza,Housing Development,Shoe Store,Bridge,Cruise,Cruise,...,Soccer Stadium,High School,Furniture / Home Store,Bridge,Airport,Building,Tunnel,Moving Target,Zoo,Shopping Mall
2,Toll Plaza,Bus Stop,Coworking Space,Nightclub,Park,Garden,Office,Karaoke Bar,Lighthouse,Transportation Service,...,Shopping Mall,College Academic Building,Plaza,Mosque,Coworking Space,Other Great Outdoors,Wedding Hall,Island,Airport Service,Factory
3,Residential Building (Apartment / Condo),Park,Rest Area,Shopping Mall,Housing Development,Historic Site,Mini Golf,Island,Bridge,Bridge,...,Office,Auto Dealership,Other Great Outdoors,Waterfront,Forest,Office,Water Park,Golf Course,Airport Service,Motorcycle Shop
4,Farm,Arcade,Cruise,Bar,Kofte Place,Park,University,Pool,Beach,Paintball Field,...,Residential Building (Apartment / Condo),Bank,Scenic Lookout,Waterfront,Housing Development,Building,Coworking Space,Coffee Shop,Airport Terminal,Tech Startup


Perfect. Now we can get the most frequent value of each column with the mode() function.

In [30]:
most_freq = top100.mode()

In [31]:
most_freq.head()

Unnamed: 0,Alemdag,15 Temmuz sehitler Koprusu Anadolu,15 Temmuz sehitler Koprusu Yildiz Katilimi,Akom onu,Alemdag Kavsagi,Alibeykoy TEM Katilimi,Altunizade,Altunizade umraniye Katilimi,Anadolu Feneri,Anadolu Feneri ust Gecit,...,umraniye Kavsagi,umraniye Kucuksu,umraniye Otopazari,uskudar Sahil Yolu,Ýhsaniye Kavsagi,Ýkitelli Basin Ekspres yolu,Ýshakli,Ýstac Kati Atik Tesisi,Ýstanbul Havalimani cikis,Ýstoc
0,Factory,Coworking Space,Office,Residential Building (Apartment / Condo),College Residence Hall,Residential Building (Apartment / Condo),Office,Office,Farm,Farm,...,Office,Office,Residential Building (Apartment / Condo),Office,Factory,Factory,Farm,Café,Airport Service,Factory
1,,,,,,,,,,,...,,,,,,,,Coworking Space,,
2,,,,,,,,,,,...,,,,,,,,Residential Building (Apartment / Condo),,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
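The extra rows in that output come from ties: `mode()` returns every value that shares the highest count, one per row, and pads the other columns with NaN. A small made-up example:

```python
import pandas as pd

# Column "a" has a single most frequent value, column "b" has a tie
df = pd.DataFrame({
    "a": ["x", "x", "y"],
    "b": ["p", "q", None],
})

modes = df.mode()        # one row per tied value, NaN where a column has fewer modes
first = modes.iloc[0]    # the first mode of each column
print(modes)
print(first.to_dict())
```

Taking row 0, as the next cells do, picks the first of the tied values in sorted order, so for a sensor with a tie the choice is somewhat arbitrary.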


In [32]:
most_freq = most_freq.T

In [33]:
most_freq = pd.DataFrame(most_freq[0])

In [34]:
most_freq.columns = ["most_freq"]

In [35]:
most_freq

Unnamed: 0,most_freq
Alemdag,Factory
15 Temmuz sehitler Koprusu Anadolu,Coworking Space
15 Temmuz sehitler Koprusu Yildiz Katilimi,Office
Akom onu,Residential Building (Apartment / Condo)
Alemdag Kavsagi,College Residence Hall
...,...
Ýkitelli Basin Ekspres yolu,Factory
Ýshakli,Farm
Ýstac Kati Atik Tesisi,Café
Ýstanbul Havalimani cikis,Airport Service


Perfect. Now we can merge our two datasets into one.

In [36]:
final_df = pd.merge(gby , most_freq , left_on = gby.index , right_on = most_freq.index)

In [37]:
final_df.head()

Unnamed: 0,key_0,num_cars,lon,lat,density,most_freq
0,Alemdag,11515.674797,29.2282,41.0449,1,Factory
1,15 Temmuz sehitler Koprusu Anadolu,124149.208054,29.0439,41.035517,3,Coworking Space
2,15 Temmuz sehitler Koprusu Yildiz Katilimi,104345.873333,29.018432,41.057702,3,Office
3,Akom onu,88500.40411,28.961073,41.090599,2,Residential Building (Apartment / Condo)
4,Alemdag Kavsagi,28806.20155,29.27,41.0289,1,College Residence Hall


In [38]:
final_df.columns = ["sensor_name","num_cars","lon","lat","density","most_freq"]
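Since both frames are indexed by sensor name, the merge and the renaming can also be written as an index join. A minimal sketch on made-up frames, assuming the index of the left frame is named `sensor_name` as it is after the groupby:

```python
import pandas as pd

# Made-up stand-ins for gby and most_freq, both indexed by sensor name
left = pd.DataFrame(
    {"num_cars": [100.0, 200.0]},
    index=pd.Index(["A", "B"], name="sensor_name"),
)
right = pd.DataFrame({"most_freq": ["Office", "Farm"]}, index=["A", "B"])

# join() aligns on the index; reset_index() turns the sensor names into a column
merged = left.join(right).reset_index()
print(merged)
```

This keeps the `sensor_name` column name automatically, so no `key_0` column appears.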

Now let's create a map that shows the most common venue type around a sensor in a popup when the user clicks on it.

In [39]:
map_osm = folium.Map(location=[final_df["lat"][0],final_df["lon"][0]], zoom_start=10)
final_df.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lon"]], popup = row["most_freq"] , color = colors[row["density"]] ,  fill_color = colors[row["density"]],
                                              radius=10)
                                             .add_to(map_osm), axis=1)
map_osm

Perfect. Now let's cluster our sensors by location to get a few central places.

In [40]:
lat_lons = final_df[["lat","lon"]]

In [41]:
lat_lons

Unnamed: 0,lat,lon
0,41.044900,29.228200
1,41.035517,29.043900
2,41.057702,29.018432
3,41.090599,28.961073
4,41.028900,29.270000
...,...,...
300,41.056482,28.810554
301,41.130788,29.286221
302,41.203025,28.856191
303,41.246301,28.737081


In [42]:
from sklearn.cluster import KMeans

In [43]:
model = KMeans()

We keep the default of 8 clusters (n_clusters=8), which is enough here.

In [44]:
model.fit(lat_lons)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=8, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)
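Because `random_state` is None, the cluster numbers can change between runs. A small sketch on made-up coordinates that fixes the seed and states the number of clusters explicitly (2 here only because the toy data has two obvious groups):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (lat, lon) points forming two well-separated groups
points = np.array([
    [41.00, 29.00],
    [41.01, 29.01],
    [41.20, 28.80],
    [41.21, 28.81],
])

# random_state makes the labels reproducible; n_init is set explicitly
toy_model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = toy_model.fit_predict(points)   # fit and predict on the same data in one call
print(labels)
```

Note that KMeans treats lat and lon as plain Euclidean coordinates, which is a reasonable approximation over an area the size of one city.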

In [45]:
model.predict(lat_lons)

array([5, 0, 7, 7, 6, 7, 0, 0, 5, 5, 7, 4, 7, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 0, 1, 1, 1, 1, 1, 1, 1, 7, 1, 7, 3, 4, 2, 3, 7, 7, 7, 7, 7, 7,
       0, 0, 0, 3, 1, 0, 0, 1, 7, 1, 0, 6, 6, 7, 7, 6, 6, 0, 0, 1, 1, 0,
       0, 7, 1, 7, 6, 6, 1, 1, 3, 7, 1, 3, 6, 0, 0, 7, 0, 1, 1, 0, 3, 3,
       7, 5, 4, 7, 7, 6, 0, 1, 3, 7, 7, 2, 2, 0, 2, 2, 2, 4, 1, 1, 1, 7,
       3, 3, 7, 7, 1, 1, 1, 1, 1, 5, 5, 0, 7, 6, 6, 6, 6, 6, 6, 5, 0, 0,
       5, 0, 7, 7, 7, 2, 0, 2, 4, 0, 2, 0, 0, 0, 7, 4, 1, 7, 0, 1, 7, 7,
       7, 7, 1, 7, 3, 0, 0, 0, 5, 7, 1, 1, 1, 1, 2, 2, 7, 7, 0, 7, 1, 1,
       6, 6, 2, 7, 4, 5, 5, 5, 5, 5, 0, 4, 6, 1, 7, 7, 7, 7, 7, 7, 7, 5,
       6, 1, 7, 0, 0, 6, 0, 3, 1, 3, 3, 3, 3, 0, 3, 0, 1, 1, 7, 7, 1, 7,
       7, 7, 7, 7, 6, 6, 0, 0, 0, 6, 7, 1, 1, 1, 1, 1, 6, 6, 6, 6, 3, 6,
       6, 6, 6, 1, 1, 3, 0, 0, 0, 1, 3, 3, 7, 0, 7, 1, 6, 7, 7, 4, 4, 4,
       7, 1, 2, 4, 6, 4, 7, 1, 1, 0, 1, 1, 7, 1, 7, 7, 7, 0, 0, 0, 0, 5,
       2, 7, 2, 2, 5, 0, 0, 0, 0, 0, 0, 0, 7, 2, 1,

In [46]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
lat_lons = lat_lons.assign(cluster=model.predict(lat_lons))


In [47]:
lat_lons

Unnamed: 0,lat,lon,cluster
0,41.044900,29.228200,5
1,41.035517,29.043900,0
2,41.057702,29.018432,7
3,41.090599,28.961073,7
4,41.028900,29.270000,6
...,...,...,...
300,41.056482,28.810554,1
301,41.130788,29.286221,5
302,41.203025,28.856191,2
303,41.246301,28.737081,2


Great. Now we can add cluster column to final_df

In [48]:
final_df = pd.concat([final_df , lat_lons["cluster"]],axis=1)

In [49]:
final_df.head()

Unnamed: 0,sensor_name,num_cars,lon,lat,density,most_freq,cluster
0,Alemdag,11515.674797,29.2282,41.0449,1,Factory,5
1,15 Temmuz sehitler Koprusu Anadolu,124149.208054,29.0439,41.035517,3,Coworking Space,0
2,15 Temmuz sehitler Koprusu Yildiz Katilimi,104345.873333,29.018432,41.057702,3,Office,7
3,Akom onu,88500.40411,28.961073,41.090599,2,Residential Building (Apartment / Condo),7
4,Alemdag Kavsagi,28806.20155,29.27,41.0289,1,College Residence Hall,6


Now let's create a dictionary for cluster colors just like we did before. We are going to pick a color for every unique cluster we have

In [50]:
final_df.cluster.unique()

array([5, 0, 7, 6, 4, 1, 3, 2])

In [51]:
cluster_colors = {0:"Blue" , 1: "Green" , 2: "Yellow" , 3: "Red" , 4: "Purple" , 5: "Orange" , 6: "Pink" , 7: "White" }

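Let's visualize the clusters. The border color of each circle shows its cluster and the fill color shows its density.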

In [53]:
map_osm = folium.Map(location=[final_df["lat"][0],final_df["lon"][0]], zoom_start=10)
final_df.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lon"]], popup = row["sensor_name"] , color = cluster_colors[row["cluster"]] ,  fill_color = colors[row["density"]],
                                              radius=10)
                                             .add_to(map_osm), axis=1)
map_osm

It seems to be working well.

In [54]:
final_df.head()

Unnamed: 0,sensor_name,num_cars,lon,lat,density,most_freq,cluster
0,Alemdag,11515.674797,29.2282,41.0449,1,Factory,5
1,15 Temmuz sehitler Koprusu Anadolu,124149.208054,29.0439,41.035517,3,Coworking Space,0
2,15 Temmuz sehitler Koprusu Yildiz Katilimi,104345.873333,29.018432,41.057702,3,Office,7
3,Akom onu,88500.40411,28.961073,41.090599,2,Residential Building (Apartment / Condo),7
4,Alemdag Kavsagi,28806.20155,29.27,41.0289,1,College Residence Hall,6


Let's get rid of the points that are in the sea.

In [58]:
on_sea = final_df[final_df["sensor_name"].isin(["TEM Kartal", "Kasimpasa Tunel"])]

In [59]:
on_sea

Unnamed: 0,sensor_name,num_cars,lon,lat,density,most_freq,cluster
129,Kasimpasa Tunel,0.0,29.4578,41.4587,1,,5
224,TEM Kartal,98028.890845,29.152771,40.777727,2,Boat or Ferry,6


In [60]:
final_df = final_df.drop(on_sea.index , axis=0)

In [62]:
map_osm = folium.Map(location=[final_df["lat"][0],final_df["lon"][0]], zoom_start=10)
final_df.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lon"]], popup = row["sensor_name"] , color = cluster_colors[row["cluster"]] ,  fill_color = colors[row["density"]],
                                              radius=10)
                                             .add_to(map_osm), axis=1)
map_osm

It looks like it worked.

Now we can cluster our points again to get the place clusters without the points in the sea.

In [63]:
model.fit(final_df[["lat","lon"]])

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=8, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)

In [64]:
model.predict(final_df[["lat","lon"]])

array([7, 0, 6, 6, 7, 6, 0, 0, 7, 7, 6, 5, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2,
       2, 0, 2, 2, 2, 2, 2, 2, 2, 6, 2, 6, 4, 5, 1, 4, 6, 6, 6, 6, 6, 6,
       0, 0, 0, 4, 2, 0, 0, 2, 6, 2, 0, 3, 3, 6, 6, 3, 3, 0, 0, 2, 2, 0,
       0, 6, 2, 6, 3, 3, 2, 2, 4, 6, 2, 4, 3, 0, 0, 6, 0, 2, 2, 0, 4, 4,
       6, 7, 5, 6, 6, 3, 0, 2, 4, 6, 6, 1, 1, 0, 1, 1, 1, 5, 2, 2, 2, 6,
       4, 4, 6, 6, 2, 2, 2, 2, 2, 7, 7, 0, 6, 3, 3, 3, 3, 3, 3, 0, 0, 7,
       7, 6, 6, 6, 1, 0, 1, 5, 0, 1, 0, 0, 0, 6, 5, 2, 6, 0, 2, 6, 6, 6,
       6, 2, 6, 4, 0, 0, 0, 7, 6, 2, 2, 2, 2, 1, 1, 6, 6, 0, 6, 2, 2, 3,
       3, 1, 6, 5, 7, 7, 7, 7, 7, 0, 5, 3, 2, 6, 6, 6, 6, 6, 6, 6, 7, 3,
       2, 6, 0, 0, 3, 0, 4, 2, 4, 4, 4, 4, 0, 4, 0, 2, 2, 6, 6, 2, 6, 6,
       6, 6, 6, 3, 0, 0, 0, 3, 6, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 3, 3, 3,
       3, 2, 2, 4, 0, 0, 0, 2, 4, 4, 6, 7, 6, 2, 3, 6, 6, 5, 5, 5, 6, 2,
       1, 5, 3, 5, 6, 2, 2, 0, 2, 2, 6, 2, 6, 6, 6, 0, 0, 0, 0, 7, 1, 6,
       1, 1, 7, 0, 0, 0, 0, 0, 0, 0, 6, 1, 2, 7, 1,

In [65]:
final_df["cluster"] = model.predict(final_df[["lat","lon"]])

In [66]:
map_osm = folium.Map(location=[final_df["lat"][0],final_df["lon"][0]], zoom_start=10)
final_df.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lon"]], popup = row["sensor_name"] , color = cluster_colors[row["cluster"]] ,  fill_color = colors[row["density"]],
                                              radius=10)
                                             .add_to(map_osm), axis=1)
map_osm

Perfect. Now we can group our data by place cluster and get the most common venue type of each cluster.

In [67]:
gby_2 = final_df.groupby("cluster")[["density","lat","lon"]].mean()

In [68]:
gby_2["cluster"] = gby_2.index

In [69]:
gby_2.head()

cluster,density,lat,lon,cluster
0,2.389831,41.016325,29.096421,0
1,1.0,41.203215,28.843207,1
2,2.15942,41.021847,28.826788,2
3,1.806452,40.923332,29.250214,3
4,1.52381,41.04455,28.594061,4


In [70]:
gby_2.index.name="index"

In [71]:
gby_2.head()

index,density,lat,lon,cluster
0,2.389831,41.016325,29.096421,0
1,1.0,41.203215,28.843207,1
2,2.15942,41.021847,28.826788,2
3,1.806452,40.923332,29.250214,3
4,1.52381,41.04455,28.594061,4


Let's get the most frequent venue type for each cluster

In [72]:
from scipy import stats
final_df.groupby("cluster").agg(lambda x: stats.mode(x))[["most_freq"]]

cluster,most_freq
0,"([Office], [26])"
1,"([Factory], [8])"
2,"([Office], [15])"
3,"([Factory], [11])"
4,"([Housing Development], [5])"
5,"([Beach], [4])"
6,"([Office], [27])"
7,"([Farm], [10])"
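As far as I know, recent SciPy releases no longer accept non-numeric data in `stats.mode`, so the cell above may fail on a newer installation. A possible alternative is pandas' own `Series.mode`, sketched here on made-up data. `most_common` is a hypothetical helper name, and it returns the plain value, not a (value, count) pair.

```python
import pandas as pd

def most_common(values):
    """Return the most frequent non-missing value, or None if there is none."""
    modes = values.mode()          # NaN values are ignored by default
    return modes.iloc[0] if not modes.empty else None

# Made-up stand-in for final_df
df = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1],
    "most_freq": ["Office", "Office", "Farm", "Factory", None],
})

per_cluster = df.groupby("cluster")["most_freq"].agg(most_common)
print(per_cluster.to_dict())
```

With this variant the popup text would be `row["most_freq"]` directly, without the `[0][0]` indexing that the SciPy result needs.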


Now we can concatenate the two results.

In [73]:
gby_2 = pd.concat([gby_2 , final_df.groupby("cluster").agg(lambda x: stats.mode(x))[["most_freq"]]],axis=1)

In [74]:
gby_2

Unnamed: 0,density,lat,lon,cluster,most_freq
0,2.389831,41.016325,29.096421,0,"([Office], [26])"
1,1.0,41.203215,28.843207,1,"([Factory], [8])"
2,2.15942,41.021847,28.826788,2,"([Office], [15])"
3,1.806452,40.923332,29.250214,3,"([Factory], [11])"
4,1.52381,41.04455,28.594061,4,"([Housing Development], [5])"
5,1.0,41.21467,29.046572,5,"([Beach], [4])"
6,2.014085,41.061036,28.977095,6,"([Office], [27])"
7,1.0,41.119136,29.242382,7,"([Farm], [10])"


In [75]:
map_osm = folium.Map(location=[gby_2["lat"][0],gby_2["lon"][0]], zoom_start=10)
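# int(density)+1 turns the average density into a color key; min() keeps the key within 1..3
gby_2.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lon"]], popup = row["most_freq"][0][0] , color = cluster_colors[row["cluster"]] ,  fill_color = colors[min(int(row["density"])+1, 3)],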
                                              radius=80)
                                             .add_to(map_osm), axis=1)
map_osm

From that map we can see a relationship between traffic and venues. In particular, traffic is denser in the clusters where offices are the most common venue type (average density between 2.0 and 2.4) and lighter in the clusters dominated by farms or beaches (average density 1.0).