# Data Cleaning and Column Information

## Action Summary
This code performs a series of data cleaning and transformation tasks on a DataFrame containing the data scraped from **Hostelworld.com**. The aim is to make the data more consistent and suitable for analysis.

## Column Information and Transformations

During scraping, it was noticed that Hostelworld returned separate results for "cusco" and "cuzco", two spellings of the same city, so the same hostels could appear twice. As the first cleaning step, "cusco" was replaced with "cuzco" in the `city` column for consistency, and all duplicate rows were then removed from the DataFrame.
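
A minimal sketch of this first step on a few hypothetical rows (the hostel names are made up): once both spellings are mapped to "cuzco", the listing that was scraped twice becomes an exact duplicate, which `drop_duplicates()` then removes.

```python
import pandas as pd

# Hypothetical rows: the same hostel listed under both spellings of the city
raw = pd.DataFrame({
    "country": ["peru", "peru", "peru"],
    "city": ["cusco", "cuzco", "lima"],
    "name": ["Example Hostel", "Example Hostel", "Other Hostel"],
})

# Normalise the spelling; regex=False treats "cusco" as a literal string
raw["city"] = raw["city"].str.replace("cusco", "cuzco", regex=False)

# Rows that are now identical in every column are dropped (the first one is kept)
deduped = raw.drop_duplicates()
print(deduped)
```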

The content of each column and the transformations applied to it are described below:

- **country**: The 'country' column holds the name of the country where the hostel is located. The first letter of each word was capitalized for presentation, and the column was then converted to the category type for efficiency and memory savings.

- **city**: This column contains the name of the city where the hostel is located, or the nearest city. Hyphens were replaced with spaces and the first letter of each word was capitalized for presentation. The column was then converted to the category type for efficiency and memory savings.

- **name**: The 'name' column holds the name of the hostel. No transformations were required for this column.

- **description**: The 'description' column holds a brief description of the hostel. No transformations were required for this column.

- **rating**: The 'rating' column stores the average rating a hostel has received. Ratings of 0 were replaced with missing values, for the reason explained in the section on numerical columns.

- **reviews**: The 'reviews' column stores the number of reviews that a hostel has received. Parentheses were removed and missing values were filled with 0 before converting the data to integers.

- **km_to_centre**: This column contains the distance in kilometers to the city centre. All non-numerical text was removed before converting the data to floats.

- **min_private_price** and **min_dorm_price**: These columns hold the prices in euros of the cheapest private room and the cheapest dorm offered by each hostel. The euro symbol (€) was removed before converting the data to floats.
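
The parsing steps above can be tried on a couple of raw values in the format shown in the first rows of the scraped data. This is a sketch, not the notebook's exact code: the second row (including the hyphenated `santa-marta`) is made up, and the distance is parsed with a regular expression (`str.extract`) rather than by stripping characters, which assumes the first number in each entry is the distance.

```python
import numpy as np
import pandas as pd

# Raw values in the scraped format; the second row is hypothetical
raw = pd.DataFrame({
    "city": ["medellin", "santa-marta"],
    "reviews": ["(3170)", np.nan],
    "km_to_centre": ["- 5.39km from city centre", "- 14.65km from city centre"],
    "min_private_price": ["€19.69", np.nan],
})

clean = pd.DataFrame({
    # Hyphens become spaces, each word is capitalised, stored as a category
    "city": raw["city"].str.replace("-", " ", regex=False).str.title().astype("category"),
    # Drop the parentheses and treat a missing review count as 0
    "reviews": raw["reviews"].str.strip("()").fillna("0").astype(int),
    # Take the first (decimal) number in the string as the distance in km
    "km_to_centre": raw["km_to_centre"]
        .str.extract(r"(\d+(?:\.\d+)?)", expand=False)
        .astype(float),
    # Remove the euro sign; missing prices stay NaN
    "min_private_price": raw["min_private_price"].str.strip("€").astype(float),
})
print(clean)
print(clean.dtypes)
```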


```python
# Import pandas
import pandas as pd

# Import the dataframe and check data types and missing values
df = pd.read_csv("backpacking_hostel_data.csv")
df.info()  # info() prints directly and returns None, so it is not wrapped in display()
display(df.head())

# Replace "cusco" with "cuzco" in the 'city' column (literal match, no regex)
df["city"] = df["city"].str.replace("cusco", "cuzco", regex=False)

# Remove duplicate rows from the DataFrame
df = df.drop_duplicates()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1763 entries, 0 to 1762
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   country            1763 non-null   object 
 1   city               1763 non-null   object 
 2   name               1763 non-null   object 
 3   description        1743 non-null   object 
 4   rating             1426 non-null   float64
 5   reviews            1426 non-null   object 
 6   km_to_centre       1763 non-null   object 
 7   min_private_price  1653 non-null   object 
 8   min_dorm_price     1157 non-null   object 
dtypes: float64(1), object(8)
memory usage: 124.1+ KB
```



| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | colombia | medellin | Black Sheep Hostel Medellin | A travellers favourite in Medellin for over fi... | 9.2 | (3170) | - 5.39km from city centre | €19.69 | €15.75 |
| 1 | colombia | medellin | Oasis Hostel | Oasis Hostel is a unique hostel in the heart o... | 9.1 | (236) | - 2.21km from city centre | | |
| 2 | colombia | medellin | Secret Buddha | Home away from home. That Simple. If you wish ... | 9.6 | (160) | - 14.65km from city centre | €15.64 | €14.63 |
| 3 | colombia | medellin | Baku laureles Hostel | A perfect place for travellers who want to imm... | 10.0 | (6) | - 1.85km from city centre | | €11.25 |
| 4 | colombia | medellin | Secret Buddha | Home away from home. That Simple. If you wish ... | 9.6 | (160) | - 14.65km from city centre | €15.64 | €14.63 |


```python
# Capitalize the first letter of each word in the 'country' column and convert it to a categorical variable
df["country"] = df["country"].str.title().astype("category")

# Replace hyphens with spaces in the 'city' column, capitalize each word and convert it to a categorical variable
df["city"] = df["city"].str.replace("-", " ", regex=False)
df["city"] = df["city"].str.title().astype("category")

# Replace ratings of 0 with missing values (mask() sets the matching entries to NaN)
df['rating'] = df['rating'].mask(df['rating'] == 0)

# Remove parentheses from the 'reviews' column, fill missing values with "0", and convert to integer
df['reviews'] = df['reviews'].str.strip('()')
df['reviews'] = df['reviews'].fillna("0")
df['reviews'] = df['reviews'].astype('int')

# Extract the number from strings such as '- 5.39km from city centre' and convert to float
df['km_to_centre'] = df['km_to_centre'].str.extract(r'(\d+(?:\.\d+)?)', expand=False)
df['km_to_centre'] = df['km_to_centre'].astype('float')

# Remove the euro symbol '€' from the 'min_private_price' and 'min_dorm_price' columns and convert to float
for column in ["min_private_price", "min_dorm_price"]:
    df[column] = df[column].str.strip('€')
    df[column] = df[column].astype('float')

# Check the updated DataFrame information
df.info()
display(df.head())
```


```text
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1580 entries, 0 to 1762
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   country            1580 non-null   category
 1   city               1580 non-null   category
 2   name               1580 non-null   object  
 3   description        1560 non-null   object  
 4   rating             1185 non-null   float64 
 5   reviews            1580 non-null   int32   
 6   km_to_centre       1580 non-null   float64 
 7   min_private_price  1476 non-null   float64 
 8   min_dorm_price     1028 non-null   float64 
dtypes: category(2), float64(4), int32(1), object(2)
memory usage: 102.9+ KB
```


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Colombia | Medellin | Black Sheep Hostel Medellin | A travellers favourite in Medellin for over fi... | 9.2 | 3170 | 5.39 | 19.69 | 15.75 |
| 1 | Colombia | Medellin | Oasis Hostel | Oasis Hostel is a unique hostel in the heart o... | 9.1 | 236 | 2.21 | | |
| 2 | Colombia | Medellin | Secret Buddha | Home away from home. That Simple. If you wish ... | 9.6 | 160 | 14.65 | 15.64 | 14.63 |
| 3 | Colombia | Medellin | Baku laureles Hostel | A perfect place for travellers who want to imm... | 10.0 | 6 | 1.85 | | 11.25 |
| 5 | Colombia | Medellin | Hostel Metro Floresta | Our hostel is always full of lovely vibes. Her... | 9.6 | 28 | 2.44 | 10.35 | 10.35 |


## Checking Columns with Numerical Values

After ensuring that every column had the correct data type and completing the initial cleaning, the columns with numerical values were examined in more detail for outliers and negative values. The following observations and changes were made:

- **rating**: Some hostels had a rating of 0. Since this seemed unusual, these hostels were looked up on the Hostelworld website. It appears that Hostelworld automatically assigns a rating of 0 to hostels that have not received a review in the last 12 months, so all 0 values were replaced with missing values.

- **reviews**: No outliers or negative values were observed in this column. The large gap between the mean (about 264) and the median (24) can probably be explained by a few exceptionally popular hostels receiving a very large number of reviews, while most hostels have far fewer.

- **km_to_centre**: A significant number of hostels had exceptionally high values in this column (up to 8428.64 km). The distance is measured between the city centre assigned by Hostelworld and the location of the hostel on the map, and in some cases it was disproportionately large. Hostels more than 25 km from the city centre were treated as outliers and assigned missing values. This threshold was based on the size of the largest cities in the dataset, such as Lima and Buenos Aires, where hostels beyond a 25 km radius were considered to be outside the city limits.

- **min_private_price** and **min_dorm_price**: Some significant outliers were observed in these two columns, with prices ranging from 0.01€ to more than 2,000,000€. Since private rooms and dorms in hostels usually do not cost more than 250€ and 100€ respectively, values above these thresholds were replaced with the median private room or dorm price of the country where the hostel is located. The same procedure was applied to private rooms and dorms priced below 4€ and 2€ respectively. This adjustment aimed to limit the impact of extreme values on the overall distribution of the dataset.
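
The three rules above can be sketched with `Series.mask` and a grouped median. All rows below are hypothetical; some extreme values are borrowed from the outputs in this notebook. Unlike the notebook's `price_corrector`, the hypothetical helper `replace_with_country_median` applies the lower and upper thresholds in a single pass, so each country median is computed once; applying the bounds in separate calls, as the notebook does, can give slightly different medians in the second call.

```python
import pandas as pd

# Hypothetical rows illustrating each rule
df = pd.DataFrame({
    "country": ["Peru", "Peru", "Peru", "Chile", "Chile", "Chile"],
    "rating": [9.0, 0.0, 8.5, 7.0, 0.0, 9.5],
    "km_to_centre": [1.2, 8428.64, 3.0, 24.13, 0.5, 2.0],
    "min_private_price": [10.0, 2294223.0, 14.0, 20.0, 0.01, 18.0],
})

# Rule 1: a rating of 0 means "no recent reviews", so treat it as missing
df["rating"] = df["rating"].mask(df["rating"] == 0)

# Rule 2: distances above 25 km are treated as unreliable and set to NaN
df["km_to_centre"] = df["km_to_centre"].mask(df["km_to_centre"] > 25.0)


def replace_with_country_median(frame, column, low, high):
    """Hypothetical helper: replace values outside [low, high] with the
    median of the same country, computed over all rows before replacing."""
    country_median = frame.groupby("country")[column].transform("median")
    outside = (frame[column] < low) | (frame[column] > high)  # NaN never matches
    frame.loc[outside, column] = country_median[outside]


# Rule 3: private room prices outside 4-250 EUR get the country median
replace_with_country_median(df, "min_private_price", low=4, high=250)
print(df)
```

Using `transform("median")` keeps the medians aligned with the original rows, so no explicit loop over countries is needed.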

While exploring the hostels with the most reviews, it was noticed that some hostels were duplicated, differing mainly in the city, which was set to Puerto Narino, and in a missing description. Research on the Hostelworld website showed that the link that should lead to the hostels in Puerto Narino instead led to the main Colombia page, so the most popular Colombian hostels were scraped a second time as if they were located in Puerto Narino (this time without descriptions, since descriptions are not displayed on the main Colombia page). All hostels with Puerto Narino as the city were therefore removed from the DataFrame. To confirm that the same issue did not affect other cities, it was checked that all remaining hostels have a description.
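
A sketch of the removal and the follow-up check, on hypothetical rows modelled on the Puerto Narino duplicates (descriptions shortened): the filter drops the mislabelled rows, and the assertion fails loudly if any remaining hostel still lacks a description.

```python
import pandas as pd

# Hypothetical rows: a genuine listing and its mislabelled copy without description
df = pd.DataFrame({
    "city": ["Medellin", "Puerto Narino", "Lima"],
    "name": ["Black Sheep Hostel Medellin", "Black Sheep Hostel Medellin", "Pariwana Hostel Lima"],
    "description": ["A travellers favourite...", None, "Pariwana is located..."],
})

# Drop every row scraped under the faulty Puerto Narino page
df = df[df["city"] != "Puerto Narino"]

# Sanity check: no remaining hostel should lack a description
missing = df[df["description"].isna()]
assert missing.empty, f"{len(missing)} rows still lack a description"
```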

Exploring the review counts also revealed hostels that were duplicated with only a slight difference in the number of reviews or in the rating. Some hostels are featured and therefore repeated at the top of the hostel list on every page, and a review was probably posted while the pages were being scraped, so the values differed from page to page. Because of these small differences, these rows were not caught when exact duplicates were removed at the start; they were removed by dropping duplicates on the country, city and name columns, keeping the last occurrence.
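
The two Wild Rover La Paz rows from the review listing show why a plain `drop_duplicates()` misses these near-duplicates, while deduplicating on `country`, `city` and `name` catches them. Only the columns needed for the illustration are kept, in the order the rows appear in the DataFrame. With `keep="last"`, the row that appears later is retained.

```python
import pandas as pd

# Same hostel, scraped on different pages with slightly different values
df = pd.DataFrame({
    "country": ["Bolivia", "Bolivia"],
    "city": ["La Paz", "La Paz"],
    "name": ["Wild Rover La Paz", "Wild Rover La Paz"],
    "rating": [8.8, 8.9],
    "reviews": [4550, 4551],
})

# A full-row drop_duplicates() keeps both rows because the values differ
assert len(df.drop_duplicates()) == 2

# Deduplicating on the identifying columns keeps only the last occurrence
df = df.drop_duplicates(subset=["country", "city", "name"], keep="last")
print(df)
```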

After all the cleaning and modifications, the index of the DataFrame was reset so that it is sequential again, avoiding potential errors when accessing the data.

The summary of the data types of all columns was displayed again to confirm that the changes had been applied correctly. These validation and cleaning steps produced a consistent dataset with correct data types, no duplicate listings and outliers handled, providing a solid basis for analysis. Missing values remain in the rating, km_to_centre and price columns where the information was unavailable or unreliable. The dataset was exported as a CSV file to be used in other notebooks for exploratory data analysis.
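
The checks described in this section could also be written as explicit assertions and run just before exporting, so that a violated rule stops the export. The function `validate_hostels` below is hypothetical and not part of the original notebook; its thresholds restate the rules above, with the lower dorm threshold of 2€ taken from the code. Missing values are allowed and only present values are range-checked.

```python
import pandas as pd


def validate_hostels(df):
    """Hypothetical checks mirroring the cleaning rules of this notebook."""
    assert df.index.equals(pd.RangeIndex(len(df))), "index is not sequential"
    assert not df.duplicated(subset=["country", "city", "name"]).any(), "duplicate hostels"
    assert df["description"].notna().all(), "missing descriptions"
    assert (df["city"] != "Puerto Narino").all(), "Puerto Narino rows present"
    rating = df["rating"].dropna()
    assert ((rating > 0) & (rating <= 10)).all(), "rating out of range"
    assert (df["reviews"] >= 0).all(), "negative review count"
    assert (df["km_to_centre"].dropna() <= 25).all(), "distance above 25 km"
    assert df["min_private_price"].dropna().between(4, 250).all(), "private price out of range"
    assert df["min_dorm_price"].dropna().between(2, 100).all(), "dorm price out of range"


# Usage on a one-row frame built from values shown in this notebook
clean = pd.DataFrame({
    "country": ["Peru"], "city": ["Lima"], "name": ["Pariwana Hostel Lima"],
    "description": ["Pariwana is located..."], "rating": [9.0], "reviews": [5138],
    "km_to_centre": [8.08], "min_private_price": [28.96], "min_dorm_price": [14.48],
})
validate_hostels(clean)  # raises AssertionError if any rule is violated
```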

```python
# Display summary statistics of the numerical columns in the DataFrame
display(df.describe())

# Inspect the lowest ratings and replace any remaining 0 with a missing value
display(df.sort_values(by='rating', ascending=True).head(5))
df['rating'] = df['rating'].mask(df['rating'] == 0)

# Inspect the hostels with the most reviews
display(df.sort_values(by='reviews', ascending=False).head(20))

# Inspect hostels listed under Puerto Narino and hostels without a description, then remove the Puerto Narino rows
display(df[df["city"] == "Puerto Narino"].head())
display(df[df["description"].isna()].head())
df.drop(df[df['city'] == 'Puerto Narino'].index, inplace=True)

# Remove near-duplicate listings of the same hostel, keeping the last occurrence
df.drop_duplicates(subset=['country', 'city', 'name'], keep='last', inplace=True)

# Check the hostels with the most reviews after these changes
display(df.sort_values(by='reviews', ascending=False).head(20))
```


| | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|
| count | 1185.0 | 1580.0 | 1580.0 | 1476.0 | 1028.0 |
| mean | 8.569705 | 264.206329 | 13.611816 | 3839.243 | 2011.12893 |
| std | 1.460257 | 688.226715 | 226.165836 | 74580.0 | 43771.174973 |
| min | 2.0 | 0.0 | 0.03 | 0.01 | 0.01 |
| 25% | 8.1 | 1.0 | 0.71 | 9.0 | 7.85 |
| 50% | 9.0 | 24.0 | 1.475 | 13.14 | 10.93 |
| 75% | 9.5 | 198.25 | 3.6725 | 19.575 | 14.6575 |
| max | 10.0 | 8756.0 | 8428.64 | 2294223.0 | 994005.92 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 1103 | Peru | Ollantaytambo | Pachakusi Hostel | The hostel is a wonderful place with a nice ga... | 2.0 | 67 | 1.33 | 8.26 | |
| 439 | Colombia | Cali | Hotel Plaza Real | Hotel Plaza Real offers a variety of private r... | 2.0 | 1 | 1.9 | 12.56 | |
| 1702 | Argentina | Rosario | La Casona de Don Jaime 2 | La Casona de Don Jaime is THE backpackers meet... | 2.0 | 244 | 0.24 | 9.44 | 8.26 |
| 241 | Colombia | Santa Marta | Historic House Hotel | Located in Santa Marta, less than 1 km from Ba... | 2.0 | 6 | 1.4 | 9.0 | |
| 935 | Peru | Cuzco | Peruvian Hotel | The Grasshopper Hostel offer you guys a small ... | 2.0 | 408 | 0.94 | 1.91 | |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 781 | Colombia | Puerto Narino | Viajero Cartagena Hostel | | 9.1 | 8756 | 1.85 | 22.84 | 16.82 |
| 275 | Colombia | Cartagena | Viajero Cartagena Hostel | Cartagena's walled city is the pride of Colomb... | 9.1 | 8756 | 1.85 | 22.92 | 16.88 |
| 883 | Peru | Cuzco | Pariwana Hostel Cusco | Live an incredibly fun experience while enjoyi... | 9.0 | 6599 | 0.22 | 26.55 | 14.48 |
| 1494 | Argentina | Buenos Aires | America del Sur Hostel Buenos Aires | Located in the famous district of San Telmo, w... | 8.9 | 6076 | 3.74 | 17.89 | 13.77 |
| 154 | Colombia | Santa Marta | Dreamer Santa Marta | VOTED MOST POPULAR HOSTEL 2019 and BEST HOSTEL... | 9.0 | 5607 | 3.9 | 16.88 | 11.25 |
| 827 | Peru | Lima | Pariwana Hostel Lima | Pariwana is located in the center of Miraflore... | 9.0 | 5138 | 8.08 | 28.96 | 14.48 |
| 1245 | Bolivia | La Paz | Wild Rover La Paz | Wild Rover La Paz is a low-cost social experie... | 8.9 | 4551 | 2.0 | 21.31 | 8.39 |
| 1241 | Bolivia | La Paz | Wild Rover La Paz | Wild Rover La Paz is a low-cost social experie... | 8.8 | 4550 | 2.0 | 21.26 | 8.37 |
| 1568 | Argentina | El Calafate | America del Sur Hostel | America del Sur Hostel is located in the city ... | 9.6 | 4107 | 0.73 | 20.05 | 17.44 |
| 1319 | Chile | Santiago | La Chimba Hostel | The city's liveliest area, the best nightlife ... | 8.5 | 3956 | 1.43 | 19.77 | 19.31 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 769 | Colombia | Puerto Narino | Black Sheep Hostel Medellin | | 9.2 | 3170 | 5.39 | 19.62 | 15.69 |
| 770 | Colombia | Puerto Narino | Arcadia Hostel | | 8.9 | 1246 | 5.13 | 12.22 | 10.99 |
| 771 | Colombia | Puerto Narino | Secret Buddha | | 9.6 | 160 | 14.65 | 15.58 | 14.57 |
| 772 | Colombia | Puerto Narino | Hostel Metro Floresta | | 9.6 | 28 | 2.44 | 10.31 | 10.31 |
| 773 | Colombia | Puerto Narino | The Cranky Croc Hostel | | 9.6 | 3247 | 1.5 | 22.11 | 17.68 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 769 | Colombia | Puerto Narino | Black Sheep Hostel Medellin | | 9.2 | 3170 | 5.39 | 19.62 | 15.69 |
| 770 | Colombia | Puerto Narino | Arcadia Hostel | | 8.9 | 1246 | 5.13 | 12.22 | 10.99 |
| 771 | Colombia | Puerto Narino | Secret Buddha | | 9.6 | 160 | 14.65 | 15.58 | 14.57 |
| 772 | Colombia | Puerto Narino | Hostel Metro Floresta | | 9.6 | 28 | 2.44 | 10.31 | 10.31 |
| 773 | Colombia | Puerto Narino | The Cranky Croc Hostel | | 9.6 | 3247 | 1.5 | 22.11 | 17.68 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 275 | Colombia | Cartagena | Viajero Cartagena Hostel | Cartagena's walled city is the pride of Colomb... | 9.1 | 8756 | 1.85 | 22.92 | 16.88 |
| 883 | Peru | Cuzco | Pariwana Hostel Cusco | Live an incredibly fun experience while enjoyi... | 9.0 | 6599 | 0.22 | 26.55 | 14.48 |
| 1494 | Argentina | Buenos Aires | America del Sur Hostel Buenos Aires | Located in the famous district of San Telmo, w... | 8.9 | 6076 | 3.74 | 17.89 | 13.77 |
| 154 | Colombia | Santa Marta | Dreamer Santa Marta | VOTED MOST POPULAR HOSTEL 2019 and BEST HOSTEL... | 9.0 | 5607 | 3.9 | 16.88 | 11.25 |
| 827 | Peru | Lima | Pariwana Hostel Lima | Pariwana is located in the center of Miraflore... | 9.0 | 5138 | 8.08 | 28.96 | 14.48 |
| 1245 | Bolivia | La Paz | Wild Rover La Paz | Wild Rover La Paz is a low-cost social experie... | 8.9 | 4551 | 2.0 | 21.31 | 8.39 |
| 1568 | Argentina | El Calafate | America del Sur Hostel | America del Sur Hostel is located in the city ... | 9.6 | 4107 | 0.73 | 20.05 | 17.44 |
| 1319 | Chile | Santiago | La Chimba Hostel | The city's liveliest area, the best nightlife ... | 8.5 | 3956 | 1.43 | 19.77 | 19.31 |
| 1521 | Argentina | Buenos Aires | Milhouse Hostel Hipo | There´s no better place to spend your holidays... | 7.1 | 3698 | 3.34 | 18.31 | 12.02 |
| 863 | Peru | Cuzco | Hospedaje Turistico Recoleta | Hospedaje Turistico Recoleta is situated very ... | 9.3 | 3425 | 1.05 | 11.03 | 10.11 |


```python
import numpy as np

# Inspect the hostels with the largest distance to the city centre
display(df.sort_values(by='km_to_centre', ascending=False).head(5))

# Set 'km_to_centre' to NaN where it is greater than 25.0; using np.nan keeps the column float
df['km_to_centre'] = np.where(df['km_to_centre'] > 25.0, np.nan, df['km_to_centre'])
display(df.sort_values(by='km_to_centre', ascending=False).head(5))
```


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 332 | Colombia | Cartagena | Pasada playa blanca | 'Posada Playa Blanca' is located on the majest... | | 0 | 8428.64 | 9.56 | |
| 1740 | Argentina | Esquel | My Pod Capsule Hostel | My Pod Capsule Hostel is where you can enjoy t... | 6.6 | 18 | 2804.14 | | 23.91 |
| 1729 | Argentina | San Pedro | Hostel del centro II | Hostel del centro comfortable dorms and rooms ... | 7.8 | 11 | 948.88 | 13.14 | 13.14 |
| 1742 | Argentina | San Ignacio | SIHOSTEL - Adventure San Ignacio | SIHOSTEL - Adventure San Ignacio is located ri... | 8.9 | 308 | 757.19 | 8.93 | 11.71 |
| 727 | Colombia | Ipiales | Xantico Hostal | Tulcán is 9 km from Xantico Hostal. You can fi... | 8.9 | 42 | 559.7 | 11.14 | 11.14 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 762 | Colombia | Nuqui | El Vijo Surf Bungalows | . | 10.0 | 2 | 24.13 | 77.63 | |
| 205 | Colombia | Santa Marta | Finca del Tayrona La Gordita | Backpackers hostel and economic lodgings near ... | 8.7 | 13 | 23.85 | 6.75 | 7.88 |
| 637 | Colombia | Riohacha | Hostal CQ Camarones | Hostal CQ Camarones, situated in the Los Flame... | | 0 | 23.5 | 7.66 | 6.44 |
| 270 | Colombia | Cartagena | Cabana Baru Hostel Club | Caribbean cabin by the sea, a good place to sh... | 9.7 | 6 | 23.31 | 11.25 | 9.0 |
| 763 | Colombia | Nuqui | Chowa Lodge | Chowa Lodge provides cozy accommodation made f... | | 0 | 22.87 | 20.25 | |


```python
# Display the 5 most and least expensive private rooms and dorms
display(df.sort_values(by='min_private_price', ascending=False).head())
display(df.sort_values(by='min_private_price', ascending=True).head())
display(df.sort_values(by='min_dorm_price', ascending=False).head())
display(df.sort_values(by='min_dorm_price', ascending=True).head())

# Replace price outliers with the median price of the country where the hostel is located
def price_corrector(column, threshold_price, above=True):
    # Median of `column` per country, aligned to each row (computed before any replacement)
    country_median = df.groupby("country", observed=True)[column].transform("median")
    # Rows whose price lies above (or below) the threshold; NaN prices never match
    if above:
        outliers = df[column] > threshold_price
    else:
        outliers = df[column] < threshold_price
    df.loc[outliers, column] = country_median[outliers]

# Upper thresholds: 250€ for private rooms, 100€ for dorms
price_corrector("min_private_price", 250, above=True)
price_corrector("min_dorm_price", 100, above=True)

# Lower thresholds: 4€ for private rooms, 2€ for dorms
price_corrector("min_private_price", 4, above=False)
price_corrector("min_dorm_price", 2, above=False)

# Display the sorted DataFrame again after the updates
display(df.sort_values(by='min_private_price', ascending=False).head())
display(df.sort_values(by='min_private_price', ascending=True).head())
display(df.sort_values(by='min_dorm_price', ascending=False).head())
display(df.sort_values(by='min_dorm_price', ascending=True).head())
```


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 953 | Peru | Puno | Uros Titicaca lodge Puno Peru | Our rooms are loaded with traditional stylish ... | | 0 | 11.71 | 179.3 | |
| 982 | Peru | Iquitos | Antares Amazon Lodge | Antares Amazon Lodge is set in Pampa Caño, onl... | | 0 | | 160.6 | |
| 1417 | Argentina | El Chalten | Estancia La Quinta | Estancia la Quinta is a one-hundred-year-old c... | | 2 | 4.16 | 130.56 | |
| 1277 | Chile | Easter Island | Easter Island Ecolodge | Easter Island Ecolodge - It has large terraces... | | 2 | 6.68 | 122.59 | |
| 978 | Peru | Iquitos | Lupuna jungle tours | We are committed to sustainable harmony betwee... | | 0 | 0.7 | 107.95 | 8.645 |

| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 631 | Colombia | Popayan | Hostal Antonio | Hostal Antonio is just few blocks away from Pa... | | 0 | 1.14 | 4.13 | 3.67 |
| 86 | Colombia | Bogota | Viajero Bogota Hostel & Spa | CULTURE AND RELAXATION IN ONE PLACE! Welcome t... | 9.0 | 312 | 1.8 | 4.15 | 12.78 |
| 1411 | Argentina | Puerto Iguazu | Hotel Apart Naipi | Hotel Apart Naipi has 8 private and spacious a... | | 0 | 1.46 | 4.2 | |
| 803 | Peru | Cuzco | Pachamas Hostel | A terrace overlooking the city of Cusco. Priva... | 3.0 | 3 | 0.11 | 4.24 | 6.36 |
| 922 | Peru | Huaraz | El Jacal Backpacker | El Jacal Backpacker comfortable rooms and dorm... | 7.4 | 20 | 0.49 | 4.29 | 5.05 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 1288 | Argentina | Buenos Aires | Hostel Casa Basilico | Hostel Casa Basilico is the best place for a r... | 9.1 | 16 | 1.7 | | 78.92 |
| 1229 | Chile | Puerto Varas | Compass del Sur | The 'Compass del Sur' gives you a mixture of C... | 9.9 | 537 | 0.33 | 17.53 | 75.1 |
| 281 | Colombia | Cartagena | Hotel CL Getsemani con Wi-Fi y aire acondicionado | Cartagena de Indias is probably the most beaut... | | 0 | 1.44 | 15.26 | 55.13 |
| 275 | Colombia | Cartagena | 4C2-R Cabaña en Isla de Barú | This beachfront property offers access to a te... | | 0 | | | 53.1 |
| 768 | Peru | Cuzco | Marlon's House Cusco | Marlon’s House is your house for as long as yo... | 10.0 | 6 | 0.61 | 9.19 | 41.38 |


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 1446 | Argentina | Cordoba | Rivera Hostel Córdoba | This is one of the most important journeys of ... | 7.3 | 344 | 2.99 | | 2.1 |
| 1383 | Argentina | Mendoza | Hostel Internacional Mendoza | Best Wifi – Best Price – Wonderful Location – ... | 8.1 | 2275 | 3.21 | 8.28 | 2.36 |
| 792 | Peru | Cuzco | Puka Packers | Attractively set in the Cusco City Centre dist... | 8.6 | 8 | 0.83 | 8.58 | 2.76 |
| 997 | Peru | Ica | Ica Wasi Hospedaje | Welcome to Ica Wasi-your home in Ica! We offer... | 9.7 | 839 | 1.18 | 5.98 | 2.99 |
| 1295 | Argentina | Buenos Aires | Franca City Hostel | Franca City Hostel has everything you need to ... | 8.5 | 21 | 2.75 | 25.66 | 3.04 |




```python
# Reset the index of the DataFrame after the row removals so that it is sequential
df.reset_index(drop=True, inplace=True)

# Check the updated DataFrame information
df.info()
display(df.head())

# Export the final DataFrame as a CSV file
df.to_csv('backpacking_hostel_data_cleaned.csv', index=False)
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1542 entries, 0 to 1541
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   country            1542 non-null   category
 1   city               1542 non-null   category
 2   name               1542 non-null   object  
 3   description        1542 non-null   object  
 4   rating             1147 non-null   float64 
 5   reviews            1542 non-null   int32   
 6   km_to_centre       1463 non-null   float64 
 7   min_private_price  1440 non-null   float64 
 8   min_dorm_price     991 non-null    float64 
dtypes: category(2), float64(4), int32(1), object(2)
memory usage: 88.7+ KB
```


| | country | city | name | description | rating | reviews | km_to_centre | min_private_price | min_dorm_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Colombia | Medellin | Secret Buddha | Home away from home. That Simple. If you wish ... | 9.6 | 160 | 14.65 | 15.64 | 14.63 |
| 1 | Colombia | Medellin | Baku laureles Hostel | A perfect place for travellers who want to imm... | 10.0 | 6 | 1.85 | | 11.25 |
| 2 | Colombia | Medellin | Hostel Metro Floresta | Our hostel is always full of lovely vibes. Her... | 9.6 | 28 | 2.44 | 10.35 | 10.35 |
| 3 | Colombia | Medellin | Yellow House Hostel | We are Yellow House Hostel, a very quiet, nice... | 9.5 | 5 | 2.43 | 7.88 | 7.88 |
| 4 | Colombia | Medellin | Viajero Medellin Hostel | 360 ° view of Medellín in the coolest neighb... | 9.4 | 901 | 4.79 | 21.8 | 18.0 |
