# Practice DataFrame Mutations using Airbnb Data

## Introduction

Welcome to the Pandas DataFrame Mutations Lab: Exploring Airbnb Listings!

In this coding lab, you will learn how to create new columns, delete rows and columns, modify a DataFrame's structure, add new rows, and use the <code>inplace</code> parameter. We will work with the Airbnb Listings dataset, which contains information about Airbnb listings in the United States. This dataset is the foundation for every activity in the lab.

### Dataset Description:

The Airbnb Listings dataset describes short-term vacation rentals. It includes the listing ID, name, host ID, host name, neighbourhood, geographical coordinates, room type, price, minimum nights required, and other attributes that define each listing. With this dataset we can extract information, uncover patterns, and gain insight into the Airbnb market in the United States.

By working through this coding lab, you will develop a solid understanding of pandas and gain hands-on experience with data manipulation, analysis, and transformation tasks. These skills are valuable in many data-driven domains, from business analytics to data science.

Now it's time to roll up your sleeves, fire up your Jupyter notebook, and start exploring the Airbnb Listings dataset with pandas!

> IMPORTANT NOTE: Please make sure to complete the activities in the order they are presented, as some activities depend on the results of previous activities. If you encounter difficulties, you can refer to the solution code provided for each activity and also attempt to run all the activities again from the beginning before reporting an issue.

In [1]:
import pandas as pd 
path_to_csv = "../../data/AB_US_2023.csv"

In [4]:
df = pd.read_csv(path_to_csv, low_memory=False, parse_dates=['last_review'])
df.head()

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,city
0,958,"Bright, Modern Garden Unit - 1BR/1BTH",1169,Holly,,Western Addition,37.77028,-122.43317,Entire home/apt,202,2,383,2023-02-19,2.31,1,128,59,San Francisco
1,5858,Creative Sanctuary,8904,Philip And Tania,,Bernal Heights,37.74474,-122.42089,Entire home/apt,235,30,111,2017-08-06,0.66,1,365,0,San Francisco
2,8142,Friendly Room Apt. Style -UCSF/USF - San Franc...,21994,Aaron,,Haight Ashbury,37.76555,-122.45213,Private room,56,32,9,2022-10-27,0.09,13,365,1,San Francisco
3,8339,Historic Alamo Square Victorian,24215,Rosy,,Western Addition,37.77564,-122.43642,Entire home/apt,575,9,28,2019-06-28,0.17,2,365,0,San Francisco
4,8739,"Mission Sunshine, with Private Bath",7149,Ivan & Wendy,,Mission,37.7603,-122.42197,Private room,110,1,770,2023-02-25,4.65,2,159,34,San Francisco


In [5]:
df.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', 'number_of_reviews_ltm', 'city'],
      dtype='object')

In [6]:
df.shape

(232147, 18)

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 232147 entries, 0 to 232146
Data columns (total 18 columns):
 #   Column                          Non-Null Count   Dtype         
---  ------                          --------------   -----         
 0   id                              232147 non-null  int64         
 1   name                            232131 non-null  object        
 2   host_id                         232147 non-null  int64         
 3   host_name                       232134 non-null  object        
 4   neighbourhood_group             96500 non-null   object        
 5   neighbourhood                   232147 non-null  object        
 6   latitude                        232147 non-null  float64       
 7   longitude                       232147 non-null  float64       
 8   room_type                       232147 non-null  object        
 9   price                           232147 non-null  int64         
 10  minimum_nights                  232147 non-null  int64  

In [8]:
df.describe()

,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
count,232147.0,232147.0,232147.0,232147.0,232147.0,232147.0,232147.0,183062.0,232147.0,232147.0,232147.0
mean,2.58458e+17,158224800.0,36.610585,-98.301436,259.468001,13.495867,40.91523,1.638348,29.879055,180.985686,11.689701
std,3.465985e+17,158716400.0,5.126523,19.706929,1024.645918,27.920631,80.649152,1.910812,106.013665,134.715299,20.599954
min,6.0,23.0,25.957323,-123.08913,0.0,1.0,0.0,0.01,1.0,0.0,0.0
25%,26388960.0,22992420.0,33.976225,-118.315111,91.0,2.0,1.0,0.31,1.0,52.0,0.0
50%,48963070.0,100578300.0,36.190556,-97.72767,149.0,3.0,9.0,1.0,2.0,175.0,3.0
75%,6.633014e+17,268693000.0,40.71744,-77.026222,250.0,30.0,43.0,2.42,10.0,321.0,16.0
max,8.581014e+17,506938400.0,47.73401,-70.996,100000.0,1250.0,3091.0,101.42,1003.0,365.0,1314.0


In [10]:
df.isnull().sum()

id                                     0
name                                  16
host_id                                0
host_name                             13
neighbourhood_group               135647
neighbourhood                          0
latitude                               0
longitude                              0
room_type                              0
price                                  0
minimum_nights                         0
number_of_reviews                      0
last_review                        49085
reviews_per_month                  49085
calculated_host_listings_count         0
availability_365                       0
number_of_reviews_ltm                  0
city                                   0
dtype: int64

## Activities

### 1. Create a New Column <code>price_per_night</code>

Create a new column called <code>price_per_night</code> that holds the price per night for each Airbnb listing, computed in this lab as <code>price</code> divided by <code>minimum_nights</code>.

In [11]:
df['price_per_night'] = df['price'] / df['minimum_nights']
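To see what the division above does row by row, here is a minimal sketch on a small made-up frame. The name `toy` and its values are assumptions for illustration only; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data, not rows from the Airbnb dataset
toy = pd.DataFrame({'price': [200, 90, 120], 'minimum_nights': [2, 30, 1]})

# Dividing two columns works element-wise, matching rows by index label
toy['price_per_night'] = toy['price'] / toy['minimum_nights']

print(toy)
```

The result is a float column even though both inputs are integers, because `/` is true division in pandas.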

### 2. Delete all rows where the price is greater than $500

> Modify the original DataFrame df.

In [15]:
df.drop(df.loc[df['price'] > 500].index, inplace=True)
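The same filtering can also be written as a boolean selection that keeps the rows we want instead of dropping the ones we do not. The sketch below compares both forms on a hypothetical `toy` frame; the two agree here because the column has no missing values (a missing price would be kept by the drop and discarded by the `<=` selection).

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({'price': [100, 750, 500, 501]})

# Option A: drop the index labels of the matching rows, in place
a = toy.copy()
a.drop(a[a['price'] > 500].index, inplace=True)

# Option B: keep only rows at or below the threshold (returns a new object)
b = toy[toy['price'] <= 500]

print(a.equals(b))
```

Note that neither option renumbers the index: the surviving rows keep their original labels.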

### 3. Delete the <code>host_name</code> and <code>neighbourhood_group</code> columns from the DataFrame df

> Modify the original DataFrame df.

In [16]:
df.drop(['neighbourhood_group', 'host_name'], axis=1, inplace=True)
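As a small aside, `drop` also accepts a `columns=` keyword, which some readers find clearer than `axis=1`. A minimal sketch on a made-up `toy` frame:

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    'id': [1, 2],
    'host_name': ['A', 'B'],
    'neighbourhood_group': [None, None],
})

# columns=[...] is equivalent to passing the labels together with axis=1
toy.drop(columns=['neighbourhood_group', 'host_name'], inplace=True)

print(toy.columns.tolist())
```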

### 4. Rename the column <code>number_of_reviews</code> to <code>reviews_count</code>

In [18]:
df = df.rename(columns={'number_of_reviews' : 'reviews_count'})
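Because the activities may be re-run from the top, it helps to know that `rename` silently ignores labels it cannot find. The sketch below, on a hypothetical `toy` frame, shows that behaviour and the `errors='raise'` option that turns a missing label into a `KeyError`.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({'number_of_reviews': [3, 0]})
toy = toy.rename(columns={'number_of_reviews': 'reviews_count'})

# Running the same rename again is a silent no-op by default
again = toy.rename(columns={'number_of_reviews': 'reviews_count'})

# errors='raise' reports the missing label instead of ignoring it
try:
    toy.rename(columns={'number_of_reviews': 'reviews_count'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(again.columns.tolist(), raised)
```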

### 5. Convert the price column from integer to float data type

In [20]:
df['price'] = df['price'].astype(float)

### 6. Replace all occurrences of Private room in the <code>room_type</code> column with Private

In [23]:
df['room_type'] = df['room_type'].apply(lambda x: 'Private' if x == 'Private room' else x)

A vectorized alternative that avoids calling a Python function on every row is:

<code>df.loc[df['room_type'] == 'Private room', 'room_type'] = 'Private'</code>

In this solution, df['room_type'] == 'Private room' creates a boolean mask that identifies the rows where the 'room_type' column has the value 'Private room'. Then, df.loc[boolean_mask, 'room_type'] selects the subset of the 'room_type' column where the boolean mask is True. Finally, we assign the value 'Private' to this subset, effectively replacing 'Private room' with 'Private'.
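The boolean-mask assignment described above can be checked on a small made-up frame. The sketch also shows `Series.replace`, a third option that matches whole values (not substrings); the `toy` data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({'room_type': ['Private room', 'Entire home/apt', 'Shared room']})

# Boolean mask + .loc assignment
via_loc = toy.copy()
via_loc.loc[via_loc['room_type'] == 'Private room', 'room_type'] = 'Private'

# Series.replace swaps exact matches of the whole cell value
via_replace = toy.copy()
via_replace['room_type'] = via_replace['room_type'].replace('Private room', 'Private')

print(via_loc['room_type'].tolist())
print(via_loc.equals(via_replace))
```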

### 7. Add a new row with the given details

Add a new row to the DataFrame df at the end with the following details:

<code>new_row_data = {'id': 851792795339743534, 'name': 'Tony Stark Apartment', 'host_id': 67890, 'room_type': 'Entire home/apt',
                'price': 150, 'minimum_nights': 3, 'reviews_count': 10}</code>

In [24]:
new_row_data = {'id': 851792795339743534, 'name': 'Tony Stark Apartment', 'host_id': 67890, 'room_type': 'Entire home/apt',
                'price': 150, 'minimum_nights': 3, 'reviews_count': 10}
new_row = pd.DataFrame(new_row_data, index=[len(df)])
df = pd.concat([df, new_row])

In this solution, we create a new DataFrame new_row with the new row data and index. Then, we use pd.concat() to concatenate the two DataFrames df and new_row along the row axis.
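One thing worth checking with this approach: once rows have been dropped, `len(df)` is no longer guaranteed to be an unused index label, so the new row's label can coincide with an existing one. The sketch below reproduces that on a hypothetical `toy` frame and shows `ignore_index=True` as one possible way to get a fresh 0..n-1 index. Whether a duplicate label matters depends on what later steps expect, so treat this as an illustration rather than a replacement for the solution above.

```python
import pandas as pd

# Hypothetical toy data; filtering leaves index labels 0, 2, 3
toy = pd.DataFrame({'id': [1, 2, 3, 4], 'price': [100.0, 900.0, 80.0, 120.0]})
toy = toy[toy['price'] <= 500]

# len(toy) is 3, and label 3 already exists in the filtered frame
new_row = pd.DataFrame({'id': 5, 'price': 150}, index=[len(toy)])
dup = pd.concat([toy, new_row])
print(dup.index.is_unique)

# ignore_index=True discards the old labels and renumbers from 0
clean = pd.concat([toy, pd.DataFrame([{'id': 5, 'price': 150}])], ignore_index=True)
print(clean.index.tolist())
```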

### 8. Remove the <code>availability_365</code> column from the DataFrame without creating a new DataFrame

In [None]:
df.drop(['availability_365'], axis=1, inplace=True)
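A common slip with `inplace=True` is assigning the result back to the variable. With `inplace=True` the method modifies the object and returns `None`, as this minimal sketch on a hypothetical `toy` frame shows:

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({'id': [1, 2], 'availability_365': [128, 365]})

# With inplace=True, drop mutates `toy` and returns None
result = toy.drop(['availability_365'], axis=1, inplace=True)

print(result)                 # None
print(toy.columns.tolist())
```

So writing `df = df.drop(..., inplace=True)` would replace `df` with `None`.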

### 9. Sort the DataFrame by the price column in descending order

Sort the DataFrame df by the <code>price</code> column in descending order and assign the result to <code>sorted_df</code>.

In [25]:
sorted_df = df.sort_values(by='price', ascending=False)
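Since `sort_values` returns a new DataFrame by default, the original order of `df` is untouched, and the sorted copy keeps the original index labels. A minimal sketch on a hypothetical `toy` frame, with an optional `reset_index(drop=True)` if a fresh index is wanted:

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({'price': [56.0, 235.0, 110.0]})

# Returns a sorted copy; `toy` itself keeps its original order
sorted_toy = toy.sort_values(by='price', ascending=False)
print(sorted_toy.index.tolist())

# Optional: renumber the sorted rows from 0
reset = sorted_toy.reset_index(drop=True)
print(reset['price'].tolist())
```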

### 10. Convert all prices from US dollars to Euros

Create a new column <code>price_eur</code> in the DataFrame df that contains the prices in Euros. The conversion rate is 1 US dollar = 0.85 euros.

In [26]:
df['price_eur'] = df['price'] * 0.85

### 11. Modify the <code>price_per_night</code> column by doubling the rates

Modify the <code>price_per_night</code> column by doubling the rates. For example, if the <code>price_per_night</code> column contains the value 50, it should be modified to 100.

> Modify the original DataFrame df.

In [27]:
df['price_per_night'] = df['price_per_night'] * 2

### 12. Create a new column named year that contains the year information from the last_review column

Create a new column <code>year</code> in the DataFrame df that contains the year information from the <code>last_review</code> column.
For example, if the <code>last_review</code> column contains the date 2019-05-21, the <code>year</code> column should contain the value 2019. You can use the <code>dt</code> accessor to access the datetime properties of a column. For example, <code>df['last_review'].dt.year</code> returns the year information from the <code>last_review</code> column.

In [28]:
df['year'] = df['last_review'].dt.year
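The null counts earlier in the notebook show that `last_review` has missing values, so it is worth seeing how `.dt.year` treats them. In the sketch below (hypothetical `toy` data), a missing date becomes `NaN` and the column is stored as floats; converting to the nullable `Int64` dtype is one optional way to keep whole-number years. The `year_int` column is an illustrative name, not part of the lab.

```python
import pandas as pd

# Hypothetical toy data with one missing date
toy = pd.DataFrame({'last_review': pd.to_datetime(['2023-02-19', None, '2017-08-06'])})

# A missing date (NaT) yields NaN, which forces a float column
toy['year'] = toy['last_review'].dt.year
print(toy['year'].dtype)

# Optional: nullable integer dtype keeps integer years and marks gaps as <NA>
toy['year_int'] = toy['year'].astype('Int64')
print(toy)
```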