In [1]:
import pandas as pd
import numpy as np

### Prepare the location-dataset
First, we need to prepare the dataset, since we won't need most of its columns. To decide whether an entry (a location) is of interest, we consider only its geographic data and its "amenity". Later on, we will also filter out locations whose attributes have no logical connection to our bike-rental data.

In [2]:
# We create two versions of the dataset: POI1 keeps only entries without missing values (in particular,
# with a name and an amenity), while POI2 keeps entries that may lack a name but must have an amenity.
# Let's compare the two:
LA = pd.read_csv("csv/POIs.csv", dtype=str)
POI = LA[["name", "lon", "lat", "amenity"]]
# Pass the axis by keyword; the positional form dropna(0) is deprecated in newer pandas versions
POI1 = POI.dropna(axis=0).drop_duplicates().copy()
POI2 = POI[POI["amenity"].notna()].drop_duplicates().copy()


In [3]:
# We divide by 4 because each entry has 4 columns/attributes
print("POI1-Size:", POI1.size/4)
print("POI2-Size:", POI2.size/4)

POI1-Size: 34701.0
POI2-Size: 77125.0
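
Dividing `size` by the number of columns works, but the row count can also be read directly. The following is a minimal sketch on a made-up toy frame (the values are hypothetical and not taken from `POIs.csv`) showing that both approaches agree:

```python
import pandas as pd

# Toy frame standing in for POI1/POI2; all values are invented for illustration.
toy = pd.DataFrame({
    "name": ["Cafe A", None, "Bank B"],
    "lon": ["-118.25", "-118.30", "-118.27"],
    "lat": ["34.05", "34.07", "34.01"],
    "amenity": ["cafe", "parking", "bank"],
})

rows_via_size = toy.size / toy.shape[1]  # number of cells divided by number of columns
rows_via_len = len(toy)                  # direct row count, no division needed

print(rows_via_size, rows_via_len)
```

Using `len(df)` or `df.shape[0]` avoids hard-coding the column count, which would silently give a wrong result if a column were added or removed later.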


Interestingly, POI2 has far more rows than POI1. Let's check why that is.

In [13]:
poi1amenity = POI1["amenity"].to_numpy()
poi2amenity = POI2["amenity"].to_numpy()
print("POI1-Amenity:", poi1amenity,
      "\nPOI2-Amenity:", poi2amenity)

POI1-Amenity: ['studio' 'fast_food' 'fast_food' ... 'ice_cream' 'bank' 'fast_food'] 
POI2-Amenity: ['toilets' 'parking' 'studio' ... 'university' 'social_facility'
 'place_of_worship']


As we can see, POI2 includes entries with amenities such as "toilets" or "parking". These additional entries might give us some insight into, for example, how many of our bike renters also use cars. It would be unwise to simply discard them, so I will continue working with POI2 instead.
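
To confirm that the extra rows are the ones without a name, one could count them directly instead of eyeballing the printed arrays. This is a small sketch on hypothetical toy data (the frame and its values are assumptions, not the real dataset):

```python
import pandas as pd

# Toy data: two entries have an amenity but no name (values are invented).
toy = pd.DataFrame({
    "name": ["Cafe A", None, "Bank B", None],
    "lon": ["-118.25", "-118.30", "-118.27", "-118.29"],
    "lat": ["34.05", "34.07", "34.01", "34.03"],
    "amenity": ["cafe", "parking", "bank", "toilets"],
})

# Same filter as used for POI2: an amenity is required, a name is not.
with_amenity = toy[toy["amenity"].notna()]

# Rows that the stricter POI1 filter would drop because the name is missing.
unnamed = with_amenity[with_amenity["name"].isna()]

print("rows without a name:", len(unnamed))
print(unnamed["amenity"].value_counts())
```

Applied to the real `POI2`, the same two lines would show how many rows lack a name and which amenities they belong to.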

Although I am not going to remove any entries from my table for now, I still want to know what kinds of locations are stored in my DataFrame, so I will inspect the "amenity" column for interesting data. Later on, we can filter out any data we don't consider useful for our analysis.

In [15]:
# Here I extract the "amenity" column, convert it into a NumPy array, and remove the duplicate values with np.unique
amenity = np.unique(POI2["amenity"].to_numpy())
print(amenity)

['Addiction Treatment' 'Aviation Laboratory' 'Casting_Center'
 'Comfort Shoe Store' 'Commercial Printer' 'Elementary School'
 'Festival_Grounds' 'Lockers' 'Open Green Area' 'Skincare'
 'Solar Panel Installation' 'Website Designer' 'Window Treatments'
 'acting_school' 'amphitheater' 'amphitheatre' 'animal_boarding'
 'animal_breeding' 'animal_shelter' 'animal_training' 'apartment' 'arcade'
 'archive' 'art_gallery' 'art_school' 'art_work' 'arts_centre' 'ash_tray'
 'assisted_living;skilled_nursing_facility' 'atm' 'audiologist'
 'auditorium' 'auto_towing' 'baby_hatch' 'bandstand' 'bank'
 'bank;fire_station;fast_food' 'bar' 'batting_cage' 'bbq' 'bear_box'
 'bench' 'bicycle_parking' 'bicycle_rental' 'bicycle_repair_station'
 'biergarten' 'bikelane' 'binoculars' 'bleachers' 'boat_rental'
 'boat_sharing' 'boat_storage' 'book_dropoff' 'book_return'
 'bowling alley' 'brothel' 'bureau_de_change' 'bus_station'
 'business_center' 'cafe' 'canteen' 'car_pooling' 'car_rental'
 'car_sharing' 'car_wash' 
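
`np.unique` lists which amenities exist but not how often each one occurs. If the frequencies matter for deciding what to keep, `value_counts` is one option. A minimal sketch on an invented Series (not the real column):

```python
import pandas as pd

# Hypothetical amenity values, for illustration only.
amenities = pd.Series(
    ["cafe", "parking", "cafe", "bank", "parking", "cafe"], name="amenity"
)

# Count occurrences per distinct value, most frequent first.
counts = amenities.value_counts()

print(counts.head(3))
```

On the real data this would be `POI2["amenity"].value_counts()`, which makes rare one-off labels easy to spot.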

#### Extended Cleansing
As some of the amenity entries are not exactly identical but very similar, we are going to merge some of the values, as listed below. I am also creating a "relevants" list for later use.
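
Some of these near-duplicates differ only in case or in spaces versus underscores, so part of the merging could be done by a general normalization step. The helper `normalize_label` below is a hypothetical name introduced for this sketch and is not part of the notebook; the labels are examples only:

```python
import pandas as pd


def normalize_label(label: str) -> str:
    """Lowercase a label and join its whitespace-separated parts with underscores."""
    return "_".join(label.strip().lower().split())


# Example labels that differ only in case or in space vs. underscore.
labels = pd.Series(["juice bar", "juice_bar", "Lockers", "lockers", "mail room"])
normalized = labels.map(normalize_label)

print(sorted(normalized.unique()))
```

Note the limits of this approach: it turns "mail room" into "mail_room" rather than "mailroom", and it cannot merge genuine spelling variants such as "amphitheater" and "amphitheatre", so an explicit mapping is still needed for those.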

In [27]:
# For now, I store the especially interesting amenities in a list, so that later on we can look at the most relevant data in isolation.
relevants = ['bicycle_parking', 'bicycle_rental', 'bicycle_repair_station', 'bikelane', 'car_rental', 'car_sharing', 'car_pooling', 'motorcycle_parking', 'motorcycle_rental', 'student_accommodation']
# Merge near-duplicate amenity values (the variant on the left is replaced by the value on the right):
# 'amphitheater'   -> 'amphitheatre'
# 'juice bar'      -> 'juice_bar'
# 'hooklah lounge' -> 'hookah lounge'
# 'locker'         -> 'lockers'
# 'mail room'      -> 'mailroom'
# 'strip club'     -> 'stripclub'
POI2.loc[POI2['amenity'] == 'amphitheater', 'amenity'] = 'amphitheatre'
POI2.loc[POI2['amenity'] == 'juice bar', 'amenity'] = 'juice_bar'
POI2.loc[POI2['amenity'] == 'hooklah lounge', 'amenity'] = 'hookah lounge'
POI2.loc[POI2['amenity'] == 'locker', 'amenity'] = 'lockers'
POI2.loc[POI2['amenity'] == 'mail room', 'amenity'] = 'mailroom'
POI2.loc[POI2['amenity'] == 'strip club', 'amenity'] = 'stripclub'

amenity = np.unique(POI2["amenity"].to_numpy())
print(amenity)
