# Achievement 6 - Relative Distance Calculation

#### The purpose of this exercise is to generate a value for each listing's distance to Amsterdam city center. The reason is that the only location-related values we currently have are subjective user scores and categorical neighbourhood values. Whilst these neighbourhood categories are great for plotting on a map and give us a visual sense of each location relative to the rest of Amsterdam, they don't allow us to perform any statistical tests.

#### As a result, with the help of the geopy library and some YouTube tutorials (https://www.youtube.com/watch?v=fItaMyy7874&ab_channel=TheWhiz), this code creates a new variable to be added to our main dataset for use during the rest of our research.

### Sections:

- 1. Importing Libraries & Data
- 2. Creating user-defined functions to calculate distance & append results onto main dataframe
- 3. Results review
- 4. Export

### Section 1: Importing Libraries & Data

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import os

from geopy.geocoders import Nominatim
import geopy.distance
geolocator = Nominatim(user_agent='DistanceCalculator')
path = r'C:\Users\mojos\Documents\Career Foundry Course\Data Immersion\Section 6\AirBnB Data'
df = pd.read_csv(os.path.join(path,'prepared','listings_sub1000.csv'),index_col = 0)

In [6]:
df.columns

Index(['id', 'name', 'summary', 'host_id', 'host_is_superhost',
       'neighbourhood_cleansed', 'latitude', 'longitude', 'property_type',
       'room_type', 'accommodates', 'bedrooms', 'beds', 'bed_type',
       'amenities', 'price', 'guests_included', 'minimum_nights',
       'maximum_nights', 'availability_365', 'number_of_reviews',
       'review_scores_rating', 'review_scores_location', 'review_scores_value',
       'instant_bookable', 'reviews_per_month', 'rated', 'pricing_tier',
       'group_size', 'groupsize_pricing_combined'],
      dtype='object')

In [7]:
# Test geopy.distance.geodesic in isolation, using the lat/long position of the first row in the DF
coord1 = (52.365755, 4.941419)  # (latitude, longitude) of the first row in the DF
coord2 = (52.3676, 4.9041)      # (latitude, longitude) used for Amsterdam city center
geopy.distance.geodesic(coord1, coord2).km

2.5502782210274013
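As a quick sanity check on the figure above, the same pair of points can be run through the haversine formula, which treats the Earth as a sphere rather than the ellipsoid that `geodesic` uses. This is only a sketch: `haversine_km` and `EARTH_RADIUS_KM` are illustrative names that are not part of the notebook, and the mean radius is an assumption of the spherical model. The two results should agree to within a fraction of a percent over distances this short.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of the spherical model


def haversine_km(coord_a, coord_b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, coord_a)
    lat2, lon2 = map(math.radians, coord_b)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


coord1 = (52.365755, 4.941419)  # first listing in the DF
coord2 = (52.3676, 4.9041)      # city-center reference point
approx_km = haversine_km(coord1, coord2)

geodesic_km = 2.5502782210274013  # value returned by geopy in the cell above
relative_gap = abs(approx_km - geodesic_km) / geodesic_km
```

If `relative_gap` came out large (say, several percent), the most likely cause would be a swapped latitude/longitude order rather than a problem with either formula.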

### Section 2: Creating user-defined functions to calculate distance & append results onto main dataframe

In [8]:
# main_coord is the (latitude, longitude) position used for the center of Amsterdam
main_coord = (52.3676, 4.9041)

def get_dist(row):
    """Return the geodesic distance in km from main_coord to the listing in this row."""
    coord = (row["latitude"], row["longitude"])
    return geopy.distance.geodesic(main_coord, coord).km


# Apply the function to each row and store the result as a new column
df["distance_from_center"] = df.apply(get_dist, axis=1)

### Section 3: Review

In [16]:
df.head()

Unnamed: 0,id,name,summary,host_id,host_is_superhost,neighbourhood_cleansed,latitude,longitude,property_type,room_type,...,review_scores_rating,review_scores_location,review_scores_value,instant_bookable,reviews_per_month,rated,pricing_tier,group_size,groupsize_pricing_combined,distance_from_center
0,2818,Quiet Garden View Room & Super Fast WiFi,Quiet Garden View Room & Super Fast WiFi,3159,t,Oostelijk Havengebied - Indische Buurt,52.365755,4.941419,Apartment,Private room,...,97.0,9.0,10.0,t,2.1,True,Low,Small,Small Low,2.550299
1,3209,"Quiet apt near center, great view",You will love our spacious (90 m2) bright apar...,3806,f,Westerpark,52.390225,4.873924,Apartment,Entire home/apt,...,96.0,9.0,9.0,f,1.03,True,Mid-High-End,Large,Large Mid-High-End,3.249737
2,20168,100%Centre-Studio 1 Private Floor/Bathroom,"Cozy studio on your own private floor, 100% in...",59484,f,Centrum-Oost,52.365087,4.893541,Townhouse,Entire home/apt,...,87.0,10.0,9.0,f,2.18,True,Low,Small,Small Low,0.771682
3,25428,Lovely apt in City Centre (Jordaan),,56142,f,Centrum-West,52.373114,4.883668,Apartment,Entire home/apt,...,100.0,10.0,10.0,f,0.09,True,Mid-Low-End,Medium,Medium Mid-Low-End,1.520884
4,27886,"Romantic, stylish B&B houseboat in canal district",Stylish and romantic houseboat on fantastic hi...,97647,t,Centrum-West,52.386727,4.892078,Houseboat,Private room,...,99.0,10.0,10.0,t,2.03,True,Mid-High-End,Small,Small Mid-High-End,2.280408


In [18]:
df['distance_from_center'].size # Matches the number of rows; note .size also counts NaN, so the isnull() check further down is what confirms every row has a value

17341
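The difference between `.size` and a true non-null count is easy to miss, so here is a small illustration on toy data. The Series below is hypothetical and not taken from the listings file. It only shows that `.size` includes missing values while `.count()` does not.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (hypothetical data, not from the listings file)
s = pd.Series([2.55, np.nan, 0.77], name="distance_from_center")

total_rows = s.size       # counts every element, NaN included
non_null = s.count()      # counts only non-missing values
missing = s.isna().sum()  # number of NaN entries

print(total_rows, non_null, missing)
```

On the real DataFrame, comparing `df['distance_from_center'].count()` with `len(df)` would give the same assurance in a single step.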

In [22]:
df.info() # New variable is the correct type, float

<class 'pandas.core.frame.DataFrame'>
Index: 17341 entries, 0 to 20007
Data columns (total 31 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   id                          17341 non-null  int64  
 1   name                        17314 non-null  object 
 2   summary                     16908 non-null  object 
 3   host_id                     17341 non-null  int64  
 4   host_is_superhost           17339 non-null  object 
 5   neighbourhood_cleansed      17341 non-null  object 
 6   latitude                    17341 non-null  float64
 7   longitude                   17341 non-null  float64
 8   property_type               17341 non-null  object 
 9   room_type                   17341 non-null  object 
 10  accommodates                17341 non-null  int64  
 11  bedrooms                    17341 non-null  float64
 12  beds                        17341 non-null  float64
 13  bed_type                    17341 no

In [20]:
df.isnull().sum()
#No missing values in our new variable

id                              0
name                           27
summary                       433
host_id                         0
host_is_superhost               2
neighbourhood_cleansed          0
latitude                        0
longitude                       0
property_type                   0
room_type                       0
accommodates                    0
bedrooms                        0
beds                            0
bed_type                        0
amenities                       0
price                           0
guests_included                 0
minimum_nights                  0
maximum_nights                  0
availability_365                0
number_of_reviews               0
review_scores_rating            0
review_scores_location          0
review_scores_value             0
instant_bookable                0
reviews_per_month               0
rated                           0
pricing_tier                    0
group_size                      0
groupsize_pric

### Section 4: Exporting

In [23]:
df.to_csv(os.path.join(path,'prepared','listings_sub1K.csv'))
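Because `to_csv` writes the DataFrame index as an unnamed first column by default, the exported file needs `index_col = 0` when it is read back, as was done when loading the data in Section 1. The sketch below shows this round trip on a two-row toy frame held in memory. `df_demo`, `buffer`, `raw` and `restored` are illustrative names, and the values are copied from the preview in Section 3.

```python
import io

import pandas as pd

# Two rows copied from the preview above; the non-contiguous index mimics the real data
df_demo = pd.DataFrame(
    {"id": [2818, 20168], "distance_from_center": [2.550299, 0.771682]},
    index=[0, 2],
)

buffer = io.StringIO()
df_demo.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
raw = pd.read_csv(buffer)  # without index_col, the index comes back as a data column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # with index_col=0, the original index is restored

print(list(raw.columns))
print(list(restored.columns), list(restored.index))
```

Reading without `index_col` is where an `Unnamed: 0` column typically comes from in later notebooks.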