# Data Engineering 1: Lab 03
---------------

## **The Dataset**

The dataset to be used in this lab is a CSV file named `airbnb.csv`, which contains data on airbnb listings in the state of New York. It contains the following columns:

- `listing_id`: The unique identifier for a listing
- `description`: The description used on the listing
- `host_id`: Unique identifier for a host
- `host_name`: Name of host
- `neighbourhood_full`: Name of boroughs and neighbourhoods
- `coordinates`: Coordinates of listing _(latitude, longitude)_
- `Listing added`: Date of added listing
- `room_type`: Type of room 
- `rating`: Rating from 0 to 5.
- `price`: Price per night for listing
- `number_of_reviews`: Amount of reviews received 
- `last_review`: Date of last review
- `reviews_per_month`: Number of reviews per month
- `availability_365`: Number of days available per year
- `Number of stays`: Total number of stays thus far


## **Getting started**

In [68]:
import re

# Import libraries
import pandas as pd
import numpy as np
import datetime as dt

In [69]:
# Read in the dataset
airbnb = pd.read_csv('DE1_Lab03_airbnb.csv', index_col = 'Unnamed: 0')

## **Diagnosing data cleaning problems using simple `pandas`** 

Some important and common methods needed to get a better understanding of DataFrames and diagnose potential data problems are the following: 

- `.head()` prints the header of a DataFrame
- `.dtypes` prints datatypes of all columns in a DataFrame
- `.info()` provides a bird's eye view of column data types and missing values in a DataFrame
- `.describe()` returns a distribution of numeric columns in your DataFrame
- `.isna().sum()` allows us to break down the number of missing values per column in our DataFrame
- `.unique()` finds the number of unique values in a DataFrame column

## **Our task list:**

_Data type problems:_

- **Task 1:** Split `coordinates` into 2 columns (`latitude` and `longitude`) and convert them to `float`.
- **Task 2**: Remove `$` from `price` and convert it to `float`
- **Task 3**: Convert `listing_added` and `last_review` to `datetime`

<br>

_Text/categorical data problems:_

- **Task 4**: We need to collapse `room_type` into correct categories
- **Task 5**: Divide `neighbourhood_full` into 2 columns and making sure they are clean

<br>

_Data range problems:_

- **Task 6**: Make sure we set the correct maximum for `rating` column out of range values

<br>

_Dealing with date problems:_

- **Task 7**: Check consitent date format in the date columns.

<br>


_Dealing with duplicate data:_

- **Task 8**: Check for duplicate data.

## **Tasks** 

##### **Task 1:** Split `coordinates` into 2 columns (`latitude` and `longitude`) and convert them to `float`.

To perform this task, we will use the following methods:

- `.str.replace("","")` replaces one string in each row of a column with another
- `.str.split("")` takes in a string and lets you split a column into two based on that string
- `.astype()` lets you convert a column from one type to another

In [70]:
# Remove "(" and ")" from coordinates
airbnb.coordinates = airbnb.coordinates.str.replace("(", "").str.replace(")", "")

# Print the header of the column
airbnb.coordinates.head()

0    40.63222, -73.93398
1    40.78761, -73.96862
2     40.7007, -73.99517
3    40.79169, -73.97498
4    40.71884, -73.98354
Name: coordinates, dtype: object

In [71]:
# Split column into two
two_columns = airbnb.coordinates.str.split(", ", expand = True)
two_columns

Unnamed: 0,0,1
0,40.63222,-73.93398
1,40.78761,-73.96862
2,40.7007,-73.99517
3,40.79169,-73.97498
4,40.71884,-73.98354
...,...,...
10014,40.80379,-73.95257
10015,40.79531,-73.9333
10016,40.68266,-73.96743000000002
10017,40.68832,-73.96366


In [72]:
# Assign correct columns to latitude and longitude columns in airbnb
airbnb['lat'] = two_columns[0]
airbnb['long'] = two_columns[1]

# Print the header and confirm new column creation
airbnb

Unnamed: 0,listing_id,name,host_id,host_name,neighbourhood_full,coordinates,room_type,price,number_of_reviews,last_review,reviews_per_month,availability_365,rating,number_of_stays,5_stars,listing_added,lat,long
0,13740704,"Cozy,budget friendly, cable inc, private entra...",20583125,Michel,"Brooklyn, Flatlands","40.63222, -73.93398",Private room,45$,10,2018-12-12,0.70,85,4.100954,12.0,0.609432,2018-06-08,40.63222,-73.93398
1,22005115,Two floor apartment near Central Park,82746113,Cecilia,"Manhattan, Upper West Side","40.78761, -73.96862",Entire home/apt,135$,1,2019-06-30,1.00,145,3.367600,1.2,0.746135,2018-12-25,40.78761,-73.96862
2,21667615,Beautiful 1BR in Brooklyn Heights,78251,Leslie,"Brooklyn, Brooklyn Heights","40.7007, -73.99517",Entire home/apt,150$,0,,,65,,,,2018-08-15,40.7007,-73.99517
3,6425850,"Spacious, charming studio",32715865,Yelena,"Manhattan, Upper West Side","40.79169, -73.97498",Entire home/apt,86$,5,2017-09-23,0.13,0,4.763203,6.0,0.769947,2017-03-20,40.79169,-73.97498
4,22986519,Bedroom on the lively Lower East Side,154262349,Brooke,"Manhattan, Lower East Side","40.71884, -73.98354",Private room,160$,23,2019-06-12,2.29,102,3.822591,27.6,0.649383,2020-10-23,40.71884,-73.98354
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10014,22307861,Lovely 1BR Harlem apartment,16004068,Rachel,"Manhattan, Harlem","40.80379, -73.95257",Entire home/apt,105$,4,2018-05-28,0.21,0,4.757555,4.8,0.639223,2017-11-22,40.80379,-73.95257
10015,953275,Apartment For Your Holidays in NYC!,4460034,Alain,"Manhattan, East Harlem","40.79531, -73.9333",Entire home/apt,125$,50,2018-05-06,0.66,188,4.344704,60.0,0.648778,2017-10-31,40.79531,-73.9333
10016,3452835,"Artsy, Garden Getaway in Central Brooklyn",666862,Amy,"Brooklyn, Clinton Hill","40.68266, -73.96743000000002",Entire home/apt,100$,45,2016-11-27,0.98,0,3.966214,54.0,0.631713,2016-05-24,40.68266,-73.96743000000002
10017,23540194,"Immaculate townhouse in Clinton Hill, Brooklyn",67176930,Sophie,"Brooklyn, Clinton Hill","40.68832, -73.96366",Entire home/apt,450$,2,2019-05-31,0.17,99,4.078581,2.4,0.703360,2018-11-25,40.68832,-73.96366


In [73]:
# Print out dtypes again
airbnb.lat.dtypes
airbnb.long.dtypes
airbnb.coordinates.dtypes

dtype('O')

In [74]:
# Convert latitude and longitude to float
airbnb.lat = airbnb.lat.astype(float)
airbnb.long = airbnb.long.astype(float)
# Print dtypes again
airbnb.long.dtypes

dtype('float64')

In [75]:
# Drop coordinates column
airbnb.drop(columns = ['coordinates'], inplace = True)
airbnb

Unnamed: 0,listing_id,name,host_id,host_name,neighbourhood_full,room_type,price,number_of_reviews,last_review,reviews_per_month,availability_365,rating,number_of_stays,5_stars,listing_added,lat,long
0,13740704,"Cozy,budget friendly, cable inc, private entra...",20583125,Michel,"Brooklyn, Flatlands",Private room,45$,10,2018-12-12,0.70,85,4.100954,12.0,0.609432,2018-06-08,40.63222,-73.93398
1,22005115,Two floor apartment near Central Park,82746113,Cecilia,"Manhattan, Upper West Side",Entire home/apt,135$,1,2019-06-30,1.00,145,3.367600,1.2,0.746135,2018-12-25,40.78761,-73.96862
2,21667615,Beautiful 1BR in Brooklyn Heights,78251,Leslie,"Brooklyn, Brooklyn Heights",Entire home/apt,150$,0,,,65,,,,2018-08-15,40.70070,-73.99517
3,6425850,"Spacious, charming studio",32715865,Yelena,"Manhattan, Upper West Side",Entire home/apt,86$,5,2017-09-23,0.13,0,4.763203,6.0,0.769947,2017-03-20,40.79169,-73.97498
4,22986519,Bedroom on the lively Lower East Side,154262349,Brooke,"Manhattan, Lower East Side",Private room,160$,23,2019-06-12,2.29,102,3.822591,27.6,0.649383,2020-10-23,40.71884,-73.98354
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10014,22307861,Lovely 1BR Harlem apartment,16004068,Rachel,"Manhattan, Harlem",Entire home/apt,105$,4,2018-05-28,0.21,0,4.757555,4.8,0.639223,2017-11-22,40.80379,-73.95257
10015,953275,Apartment For Your Holidays in NYC!,4460034,Alain,"Manhattan, East Harlem",Entire home/apt,125$,50,2018-05-06,0.66,188,4.344704,60.0,0.648778,2017-10-31,40.79531,-73.93330
10016,3452835,"Artsy, Garden Getaway in Central Brooklyn",666862,Amy,"Brooklyn, Clinton Hill",Entire home/apt,100$,45,2016-11-27,0.98,0,3.966214,54.0,0.631713,2016-05-24,40.68266,-73.96743
10017,23540194,"Immaculate townhouse in Clinton Hill, Brooklyn",67176930,Sophie,"Brooklyn, Clinton Hill",Entire home/apt,450$,2,2019-05-31,0.17,99,4.078581,2.4,0.703360,2018-11-25,40.68832,-73.96366


##### **Task 2:** Remove `$` from `price` and convert it to `float`

To perform this task, we will be using the following methods:

- `.str.strip()` which removes a specified string from each row in a column
- `.astype()`

In [76]:
# Remove $ from price before conversion to float
airbnb.price = airbnb.price.str.replace("$", "")
# Print header to make sure change was done
airbnb.price

0         45
1        135
2        150
3         86
4        160
        ... 
10014    105
10015    125
10016    100
10017    450
10018     90
Name: price, Length: 10019, dtype: object

In [77]:
# Convert price to float
airbnb.price = airbnb.price.astype(float)
# Calculate mean of price after conversion
airbnb.price.mean()

150.90512217564665

##### **Task 3:** Convert `listing_added` and `last_review` columns to `datetime`

To perform this task, we will use the following functions:

- `pd.to_datetime(format = "")`
  - `format` takes in the desired date format `"%Y-%m-%d"`

In [167]:
# Print header of two columns
airbnb[['listing_added', 'last_review']]

Unnamed: 0,listing_added,last_review
0,2018-06-08,2018-12-12
1,2018-12-25,2019-06-30
2,2018-08-15,
4,2020-10-23,2019-06-12
5,2018-12-15,2019-06-20
...,...,...
8532,2018-08-08,2019-02-11
8600,2018-12-05,2019-06-10
8646,2018-10-15,2019-04-20
8869,2018-04-11,


In [169]:
# Convert both columns to datetime
airbnb.listing_added = pd.to_datetime(airbnb.listing_added, format = '%Y-%M-%d')
airbnb.last_review = pd.to_datetime(airbnb.last_review, format='%Y-%M-%d')
airbnb[['listing_added', 'last_review']]

Unnamed: 0,listing_added,last_review
0,2018-01-08 00:06:00,2018-01-12 00:12:00
1,2018-01-25 00:12:00,2019-01-30 00:06:00
2,2018-01-15 00:08:00,NaT
4,2020-01-23 00:10:00,2019-01-12 00:06:00
5,2018-01-15 00:12:00,2019-01-20 00:06:00
...,...,...
8532,2018-01-08 00:08:00,2019-01-11 00:02:00
8600,2018-01-05 00:12:00,2019-01-10 00:06:00
8646,2018-01-15 00:10:00,2019-01-20 00:04:00
8869,2018-01-11 00:04:00,NaT


In [None]:
# Print header and datatypes of both columns again
airbnb[['listing_added', 'last_review']].head()
airbnb[['listing_added', 'last_review']].dtypes

### Text and categorical data problems

##### **Task 4:** We need to collapse `room_type` into correct categories

To perform this task, we will be using the following methods:

- `.str.lower()` to lowercase all rows in a string column
- `.str.strip()` to remove all white spaces of each row in a string column
- `.replace()` to replace values in a column with another

In [110]:
# Print unique values of `room_type`
airbnb.room_type.unique()

array(['Private room', 'Entire home/apt', 'Private', 'Shared room',
       'PRIVATE ROOM', 'home'], dtype=object)

In [126]:
# Deal with capitalized values
airbnb.room_type = airbnb.room_type.str.lower()
airbnb.room_type.unique()

array(['private room', 'entire home/apt', 'private', 'shared room',
       'home'], dtype=object)

In [127]:
# Deal with trailing spaces
airbnb.room_type = airbnb.room_type.str.strip()
airbnb.room_type.unique()

array(['private room', 'entire home/apt', 'private', 'shared room',
       'home'], dtype=object)

In [150]:
# Replace values to 'Shared room', 'Entire place', 'Private room' and 'Hotel room' (if applicable).
airbnb.room_type = airbnb.room_type.str.replace(r"^private$", "private room")
airbnb.room_type = airbnb.room_type.str.replace(r"^entire home/apt$", "entire place")
airbnb.room_type = airbnb.room_type.str.replace(r"^home$", "entire place")
airbnb.loc[airbnb['room_type'].str.contains("private"), "room_type"] = "private room"
airbnb.loc[airbnb['room_type'].str.contains("home"), "room_type"] = "entire place"
airbnb.loc[airbnb['room_type'].str.contains("shared"), "room_type"] = "shared room"
airbnb.room_type.unique()

array(['private room', 'entire place', 'shared room'], dtype=object)

##### **Task 5:** Divide `neighbourhood_full` into 2 columns and making sure they are clean

In [151]:
# Print header of column
airbnb.neighbourhood_full.head()

0           Brooklyn, Flatlands
1    Manhattan, Upper West Side
2    Brooklyn, Brooklyn Heights
3    Manhattan, Upper West Side
4    Manhattan, Lower East Side
Name: neighbourhood_full, dtype: object

In [152]:
# Split neighbourhood_full
two_columns = airbnb.neighbourhood_full.str.split(", ", expand=True)

In [153]:
# Create borough and neighbourhood columns
airbnb['borough'] = two_columns[0]
airbnb['neighbourhood'] = two_columns[1]
# Print header of columns
print(airbnb.borough.head())
airbnb.neighbourhood.head()

0     Brooklyn
1    Manhattan
2     Brooklyn
3    Manhattan
4    Manhattan
Name: borough, dtype: object


0           Flatlands
1     Upper West Side
2    Brooklyn Heights
3     Upper West Side
4     Lower East Side
Name: neighbourhood, dtype: object

In [154]:
# Drop neighbourhood_full column
airbnb.drop_duplicates(subset=['neighbourhood'], inplace=True)

In [157]:
# Print out unique values of borough and neighbourhood
airbnb.borough.unique()
airbnb.neighbourhood.unique()

array(['Flatlands', 'Upper West Side', 'Brooklyn Heights',
       'Lower East Side', 'Greenwich Village', 'Harlem', 'Sheepshead Bay',
       'Theater District', 'Bushwick', 'Laurelton', 'Mott Haven',
       'Flushing', 'Crown Heights', 'Midtown', 'Financial District',
       'East Village', 'Park Slope', 'Washington Heights', 'Williamsburg',
       'Chelsea', 'Bedford-Stuyvesant', 'Gowanus', 'Upper East Side',
       'Ditmars Steinway', 'Cypress Hills', "Hell's Kitchen", 'Ridgewood',
       'Marble Hill', 'Kips Bay', 'Prospect Heights', 'East New York',
       'Concord', 'Stapleton', 'Astoria', 'East Harlem', 'Sunnyside',
       'Gramercy', 'Prospect-Lefferts Gardens', 'Sunset Park',
       'Forest Hills', 'Windsor Terrace', 'Clinton Hill', 'Murray Hill',
       'Flatiron District', 'Greenpoint', 'East Flatbush', 'Tribeca',
       'Woodhaven', 'Fort Greene', 'Inwood', 'Chinatown',
       'Rockaway Beach', 'Woodside', 'Bayside', 'Bensonhurst', 'SoHo',
       'Red Hook', 'West Village', 

In [160]:
# Strip white space from neighbourhood column
airbnb.borough = airbnb.borough.str.strip()

# Print unique values again
airbnb.borough.unique()

array(['Brooklyn', 'Manhattan', 'Queens', 'Bronx', 'Staten Island'],
      dtype=object)

##### **Task 6:** Make sure we set the correct maximum for `rating` column out of range values

In [164]:
# Isolate rows of rating > 5.0
airbnb.rating[airbnb.rating > 5.0]

Series([], Name: rating, dtype: float64)

In [89]:
# Drop these rows and make sure we have effected changes

In [165]:
# Get the maximum
airbnb.rating.max()

4.984868192025561

##### **Task 7:** Check consitent date format in the date columns.

In [91]:
# Doing some sanity checks on date data

In [173]:
# Are there reviews in the future?
import datetime
airbnb.last_review[airbnb.last_review > datetime.datetime.now()]

Series([], Name: last_review, dtype: datetime64[ns])

In [175]:
# Are there listings in the future?
airbnb.listing_added[airbnb.listing_added > datetime.datetime.now()]

Series([], Name: listing_added, dtype: datetime64[ns])

In [94]:
# Drop these rows since they are only 4 rows
# Nope

In [176]:
# Are there any listings with listing_added > last_review
airbnb.listing_added[airbnb.listing_added > airbnb.last_review]

4      2020-01-23 00:10:00
50     2020-01-17 00:02:00
5587   2018-01-31 00:05:00
Name: listing_added, dtype: datetime64[ns]

In [180]:
# Drop these rows since they are only 2 rows
airbnb.drop(index=airbnb.listing_added[airbnb.listing_added > airbnb.last_review].index) 

Unnamed: 0,listing_id,name,host_id,host_name,neighbourhood_full,room_type,price,number_of_reviews,last_review,reviews_per_month,availability_365,rating,number_of_stays,5_stars,listing_added,lat,long,borough,neighbourhood
0,13740704,"Cozy,budget friendly, cable inc, private entra...",20583125,Michel,"Brooklyn, Flatlands",private room,45.0,10,2018-01-12 00:12:00,0.70,85,4.100954,12.0,0.609432,2018-01-08 00:06:00,40.63222,-73.93398,Brooklyn,Flatlands
1,22005115,Two floor apartment near Central Park,82746113,Cecilia,"Manhattan, Upper West Side",entire place,135.0,1,2019-01-30 00:06:00,1.00,145,3.367600,1.2,0.746135,2018-01-25 00:12:00,40.78761,-73.96862,Manhattan,Upper West Side
2,21667615,Beautiful 1BR in Brooklyn Heights,78251,Leslie,"Brooklyn, Brooklyn Heights",entire place,150.0,0,NaT,,65,,,,2018-01-15 00:08:00,40.70070,-73.99517,Brooklyn,Brooklyn Heights
4,22986519,Bedroom on the lively Lower East Side,154262349,Brooke,"Manhattan, Lower East Side",private room,160.0,23,2019-01-12 00:06:00,2.29,102,3.822591,27.6,0.649383,NaT,40.71884,-73.98354,Manhattan,Lower East Side
5,271954,Beautiful brownstone apartment,1423798,Aj,"Manhattan, Greenwich Village",entire place,150.0,203,2019-01-20 00:06:00,2.22,300,4.478396,243.6,0.743500,2018-01-15 00:12:00,40.73388,-73.99452,Manhattan,Greenwich Village
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8532,24624154,1 Bedroom Cozy Apartment,139756559,Maria,"Staten Island, Great Kills",entire place,75.0,3,2019-01-11 00:02:00,0.22,129,3.872523,3.6,0.707174,2018-01-08 00:08:00,40.54878,-74.13908,Staten Island,Great Kills
8600,20136828,2 BD cozy house,143199593,Val,"Staten Island, Midland Beach",entire place,105.0,20,2019-01-10 00:06:00,1.05,358,4.477754,24.0,0.665771,2018-01-05 00:12:00,40.57509,-74.09606,Staten Island,Midland Beach
8646,7791636,**Dominique's NYC couch bed **sleep*shower & ...,310670,Vie,"Bronx, Eastchester",shared room,45.0,1,2019-01-20 00:04:00,0.37,318,4.479847,1.2,0.614649,2018-01-15 00:10:00,40.88211,-73.83625,Bronx,Eastchester
8869,11656721,3 bedroom near Park,46502890,Jackie,"Queens, Jamaica Estates",private room,750.0,0,NaT,,0,,,,2018-01-11 00:04:00,40.72191,-73.78207,Queens,Jamaica Estates


##### **Task 8:** Let's deal with duplicate data


There are two notable types of duplicate data:

- Identical duplicate data across all columns
- Identical duplicate data cross most or some columns

To diagnose, and deal with duplicate data, we will be using the following methods and functions:

- `.duplicated(subset = , keep = )`
  - `subset` lets us pick one or more columns with duplicate values.
  - `keep` returns lets us return all instances of duplicate values.
- `.drop_duplicates(subset = , keep = )`
  

In [181]:
# Print the header of the DataFrame again
airbnb.head()

Unnamed: 0,listing_id,name,host_id,host_name,neighbourhood_full,room_type,price,number_of_reviews,last_review,reviews_per_month,availability_365,rating,number_of_stays,5_stars,listing_added,lat,long,borough,neighbourhood
0,13740704,"Cozy,budget friendly, cable inc, private entra...",20583125,Michel,"Brooklyn, Flatlands",private room,45.0,10,2018-01-12 00:12:00,0.7,85,4.100954,12.0,0.609432,2018-01-08 00:06:00,40.63222,-73.93398,Brooklyn,Flatlands
1,22005115,Two floor apartment near Central Park,82746113,Cecilia,"Manhattan, Upper West Side",entire place,135.0,1,2019-01-30 00:06:00,1.0,145,3.3676,1.2,0.746135,2018-01-25 00:12:00,40.78761,-73.96862,Manhattan,Upper West Side
2,21667615,Beautiful 1BR in Brooklyn Heights,78251,Leslie,"Brooklyn, Brooklyn Heights",entire place,150.0,0,NaT,,65,,,,2018-01-15 00:08:00,40.7007,-73.99517,Brooklyn,Brooklyn Heights
4,22986519,Bedroom on the lively Lower East Side,154262349,Brooke,"Manhattan, Lower East Side",private room,160.0,23,2019-01-12 00:06:00,2.29,102,3.822591,27.6,0.649383,NaT,40.71884,-73.98354,Manhattan,Lower East Side
5,271954,Beautiful brownstone apartment,1423798,Aj,"Manhattan, Greenwich Village",entire place,150.0,203,2019-01-20 00:06:00,2.22,300,4.478396,243.6,0.7435,2018-01-15 00:12:00,40.73388,-73.99452,Manhattan,Greenwich Village


In [185]:
# Find duplicates
airbnb[airbnb.duplicated()]

Unnamed: 0,listing_id,name,host_id,host_name,neighbourhood_full,room_type,price,number_of_reviews,last_review,reviews_per_month,availability_365,rating,number_of_stays,5_stars,listing_added,lat,long,borough,neighbourhood


In [99]:
# Find duplicates

In [100]:
# Remove identical duplicates

In [101]:
# Find non-identical duplicates

In [102]:
# Show all duplicates

To treat identical duplicates across some columns, we will chain the `.groupby()` and `.agg()` methods where we group by the column used to find duplicates (`listing_id`) and aggregate across statistical measures for `price`, `rating` and `list_added`. The `.agg()` method takes in a dictionary with each column's aggregation method - we will use the following aggregations:

- `mean` for `price` and `rating` columns
- `max` for `listing_added` column
- `first` for all remaining column

*A note on dictionary comprehensions:*

Dictionaries are useful data structures in Python with the following format
`my_dictionary = {key: value}` where a `key` is mapped to a `value` and whose `value` can be returned with `my_dictionary[key]` - dictionary comprehensions allow us to programmatically create dicitonaries using the structure:

```
{x: x*2 for x in [1,2,3,4,5]} 
{1:2, 2:4, 3:6, 4:8, 5:10}
```

In [103]:
# Get column names from airbnb

In [104]:
# Create dictionary comprehension with 'first' as value for all columns not being aggregated

In [105]:
# Remove non-identical duplicates

In [106]:
# Make sure no duplication happened

In [107]:
# Print header of DataFrame

## **Record Linkage**

Some selected examples will be presented in the on-site meeting.

In [108]:
# Task 1: Choose 3 different examples of word pairs, draw a matrix for each example, and calculate the levensthein distance manually without any digital support.

In [109]:
# Task 2: Calculate the levensthein distance for the three examples by using a python levensthein distance library of your choice.