# iNeuron Internship
##### Submitted by - Mohammad Salman & Mohammad Tippu

## Project Title:    Airbnb Data Analysis
## Technologies:  Business Intelligence
## Domain:  Travel Data Analysis
## Difficulty Level:  Advanced
#### PROBLEM STATEMENT

Since Airbnb's inception in 2008, the platform has captivated hosts and travelers, infusing travel experiences with a personal touch. This data analysis focuses on Amsterdam, aiming to identify top earners among hosts, understand the relationship between monthly earnings and prices, pinpoint high-demand neighborhoods, analyze the interplay between price and location, and delve into the connections between quality, price, amenities, and location.

Find key metrics and factors and show the meaningful relationships between attributes.

#### DATA COLLECTION
The dataset is taken from the iNeuron portal: https://drive.google.com/drive/folders/1ANkgtAT0Pdp2r86IxFKv9vKYmnsYjJDO
#### Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import os

In [None]:
file_path = os.path.join("Dataset", "airbnb prices.csv")

In [None]:
df=pd.read_csv(file_path)

In [3]:
df.head()

Unnamed: 0,room_id,survey_id,host_id,room_type,country,city,borough,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,bathrooms,price,minstay,name,last_modified,latitude,longitude,location
0,10176931,1476,49180562,Shared room,,Amsterdam,,De Pijp / Rivierenbuurt,7,4.5,2,1.0,,156.0,,Red Light/ Canal view apartment (Shared),2017-07-23 13:06:27.391699,52.356209,4.887491,0101000020E610000033FAD170CA8C13403BC5AA41982D...
1,8935871,1476,46718394,Shared room,,Amsterdam,,Centrum West,45,4.5,4,1.0,,126.0,,Sunny and Cozy Living room in quite neighbours,2017-07-23 13:06:23.607187,52.378518,4.89612,0101000020E6100000842A357BA095134042791F477330...
2,14011697,1476,10346595,Shared room,,Amsterdam,,Watergraafsmeer,1,0.0,3,1.0,,132.0,,Amsterdam,2017-07-23 13:06:23.603546,52.338811,4.943592,0101000020E6100000A51133FB3CC613403543AA285E2B...
3,6137978,1476,8685430,Shared room,,Amsterdam,,Centrum West,7,5.0,4,1.0,,121.0,,Canal boat RIDE in Amsterdam,2017-07-23 13:06:22.689787,52.376319,4.890028,0101000020E6100000DF180280638F134085EE92382B30...
4,18630616,1476,70191803,Shared room,,Amsterdam,,De Baarsjes / Oud West,1,0.0,2,1.0,,93.0,,One room for rent in a three room appartment,2017-07-23 13:06:19.681469,52.370384,4.852873,0101000020E6100000CD902A8A57691340187B2FBE682F...


In [13]:
df.shape

(18723, 20)

In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   room_id               18723 non-null  int64  
 1   survey_id             18723 non-null  int64  
 2   host_id               18723 non-null  int64  
 3   room_type             18723 non-null  object 
 4   country               0 non-null      float64
 5   city                  18723 non-null  object 
 6   borough               0 non-null      float64
 7   neighborhood          18723 non-null  object 
 8   reviews               18723 non-null  int64  
 9   overall_satisfaction  18723 non-null  float64
 10  accommodates          18723 non-null  int64  
 11  bedrooms              18723 non-null  float64
 12  bathrooms             0 non-null      float64
 13  price                 18723 non-null  float64
 14  minstay               0 non-null      float64
 15  name               

<h3><span style="color: hsl(200, 80%, 70%); font-weight:bold;">NO DUPLICATES

</span></h3>

In [35]:
df[df.duplicated()]

Unnamed: 0,room_id,survey_id,host_id,room_type,country,city,borough,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,bathrooms,price,minstay,name,last_modified,latitude,longitude,location



<span style="color: Green; font-weight:bold;">
This dataset has 4 columns (country, borough, bathrooms, minstay) that contain only NaN values, so we can drop them.
</span>
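
As a minimal sketch, the fully empty columns can also be found programmatically instead of being read off the `df.info()` output. The small frame `toy` below is made-up illustration data, not the Airbnb dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "room_id": [1, 2, 3],
    "country": [np.nan, np.nan, np.nan],
    "price": [156.0, 126.0, 132.0],
    "minstay": [np.nan, np.nan, np.nan],
})

# Columns in which every value is NaN
empty_cols = toy.columns[toy.isna().all()].tolist()
print(empty_cols)

# Drop them; toy.dropna(axis=1, how="all") gives the same result in one call
cleaned = toy.drop(columns=empty_cols)
print(cleaned.columns.tolist())
```

Listing the columns first makes it easy to review what will be removed before dropping anything.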


In [40]:
#dropping the columns that contain only NaN values
df=df.drop(['country','borough','bathrooms','minstay'],axis=1)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   room_id               18723 non-null  int64  
 1   survey_id             18723 non-null  int64  
 2   host_id               18723 non-null  int64  
 3   room_type             18723 non-null  object 
 4   city                  18723 non-null  object 
 5   neighborhood          18723 non-null  object 
 6   reviews               18723 non-null  int64  
 7   overall_satisfaction  18723 non-null  float64
 8   accommodates          18723 non-null  int64  
 9   bedrooms              18723 non-null  float64
 10  price                 18723 non-null  float64
 11  name                  18671 non-null  object 
 12  last_modified         18723 non-null  object 
 13  latitude              18723 non-null  float64
 14  longitude             18723 non-null  float64
 15  location           

In [37]:
df.sample()

Unnamed: 0,room_id,survey_id,host_id,room_type,country,city,borough,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,bathrooms,price,minstay,name,last_modified,latitude,longitude,location
13133,15592311,1476,100462143,Entire home/apt,,Amsterdam,,De Pijp / Rivierenbuurt,5,4.0,2,1.0,,102.0,,Spacious apartment in a beautiful neighborhood,2017-07-22 17:36:50.068079,52.344849,4.897584,0101000020E6100000C6A695422097134077871403242C...


<span style="color: Green; font-weight:bold;">We can use smaller datatypes for columns that hold only small values, such as reviews, overall_satisfaction, accommodates and bedrooms.
This reduces the memory the DataFrame occupies.</span>
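
The effect of a smaller datatype can be measured directly. The sketch below uses a hypothetical column of small integers (not the real `reviews` data) and lets `pd.to_numeric` choose the narrowest integer type, which is an alternative to picking the type by hand with `astype`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with small integer values stored as int64
reviews = pd.Series(np.arange(1000) % 533, dtype="int64")
before = reviews.memory_usage(index=False)

# downcast="integer" picks the smallest signed integer dtype that holds all values
small = pd.to_numeric(reviews, downcast="integer")
after = small.memory_usage(index=False)

print(reviews.dtype, before)  # bytes used by the values as int64
print(small.dtype, after)     # bytes used after downcasting
```

The saving applies to the in-memory DataFrame; a CSV file stores the values as text, so its size does not depend on the dtype.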

<h2><span style="color: hsl(200, 80%, 70%); font-weight:bold;">Changing the Datatypes of columns in Dataframe

</span></h2>

In [41]:
df.head()

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,last_modified,latitude,longitude,location
0,10176931,1476,49180562,Shared room,Amsterdam,De Pijp / Rivierenbuurt,7,4.5,2,1.0,156.0,Red Light/ Canal view apartment (Shared),2017-07-23 13:06:27.391699,52.356209,4.887491,0101000020E610000033FAD170CA8C13403BC5AA41982D...
1,8935871,1476,46718394,Shared room,Amsterdam,Centrum West,45,4.5,4,1.0,126.0,Sunny and Cozy Living room in quite neighbours,2017-07-23 13:06:23.607187,52.378518,4.89612,0101000020E6100000842A357BA095134042791F477330...
2,14011697,1476,10346595,Shared room,Amsterdam,Watergraafsmeer,1,0.0,3,1.0,132.0,Amsterdam,2017-07-23 13:06:23.603546,52.338811,4.943592,0101000020E6100000A51133FB3CC613403543AA285E2B...
3,6137978,1476,8685430,Shared room,Amsterdam,Centrum West,7,5.0,4,1.0,121.0,Canal boat RIDE in Amsterdam,2017-07-23 13:06:22.689787,52.376319,4.890028,0101000020E6100000DF180280638F134085EE92382B30...
4,18630616,1476,70191803,Shared room,Amsterdam,De Baarsjes / Oud West,1,0.0,2,1.0,93.0,One room for rent in a three room appartment,2017-07-23 13:06:19.681469,52.370384,4.852873,0101000020E6100000CD902A8A57691340187B2FBE682F...


In [42]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   room_id               18723 non-null  int64  
 1   survey_id             18723 non-null  int64  
 2   host_id               18723 non-null  int64  
 3   room_type             18723 non-null  object 
 4   city                  18723 non-null  object 
 5   neighborhood          18723 non-null  object 
 6   reviews               18723 non-null  int64  
 7   overall_satisfaction  18723 non-null  float64
 8   accommodates          18723 non-null  int64  
 9   bedrooms              18723 non-null  float64
 10  price                 18723 non-null  float64
 11  name                  18671 non-null  object 
 12  last_modified         18723 non-null  object 
 13  latitude              18723 non-null  float64
 14  longitude             18723 non-null  float64
 15  location           

In [44]:
max(df.reviews.unique())

532

<span style="color: Green; font-weight:bold;">Changing the reviews column from int64 to int32</span>

In [45]:
df['reviews']=df['reviews'].astype('int32')

In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   room_id               18723 non-null  int64  
 1   survey_id             18723 non-null  int64  
 2   host_id               18723 non-null  int64  
 3   room_type             18723 non-null  object 
 4   city                  18723 non-null  object 
 5   neighborhood          18723 non-null  object 
 6   reviews               18723 non-null  int32  
 7   overall_satisfaction  18723 non-null  float64
 8   accommodates          18723 non-null  int64  
 9   bedrooms              18723 non-null  float64
 10  price                 18723 non-null  float64
 11  name                  18671 non-null  object 
 12  last_modified         18723 non-null  object 
 13  latitude              18723 non-null  float64
 14  longitude             18723 non-null  float64
 15  location           

In [48]:
df.sample()

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,last_modified,latitude,longitude,location
6149,6504693,1476,34000821,Entire home/apt,Amsterdam,Bos en Lommer,27,4.5,3,1.0,150.0,Cosy house with cats & garden,2017-07-22 22:58:33.894409,52.381778,4.854617,0101000020E61000002B6C06B8206B1340882AFC19DE30...


In [50]:
max(df['overall_satisfaction'].unique())

5.0

<span style="color: Green; font-weight:bold;">Changing the overall_satisfaction column from float64 to float16</span>

In [51]:
df['overall_satisfaction']=df['overall_satisfaction'].astype('float16')

In [52]:
max(df['accommodates'].unique())

17

<span style="color: Green; font-weight:bold;">Changing the accommodates column from int64 to int16</span>

In [53]:
df['accommodates']=df['accommodates'].astype('int16')

In [56]:
df['bedrooms'].value_counts()
#Can convert into int from float

bedrooms
1.0     11101
2.0      4456
3.0      1444
0.0      1154
4.0       473
5.0        62
6.0        19
10.0        5
7.0         4
8.0         3
9.0         2
Name: count, dtype: int64

<span style="color: Green; font-weight:bold;">Changing the bedrooms column from float64 to int16</span>
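
Before casting a float column to an integer type, it can help to confirm that nothing is lost. The sketch below assumes a hypothetical `bedrooms` series and checks for missing values, fractional parts, and the `int16` range.

```python
import numpy as np
import pandas as pd

# Hypothetical bedroom counts stored as floats
bedrooms = pd.Series([1.0, 2.0, 0.0, 10.0, 3.0])

# Safe to cast only if there are no NaNs and no fractional parts
is_integral = bedrooms.notna().all() and (bedrooms % 1 == 0).all()

# int16 must also be wide enough for the largest value
fits = bedrooms.max() <= np.iinfo(np.int16).max

if is_integral and fits:
    bedrooms = bedrooms.astype("int16")

print(bedrooms.dtype)
```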

In [57]:
df['bedrooms']=df['bedrooms'].astype('int16')

<span style="color: Green; font-weight:bold;">Changing the price column from float64 to int32</span>

In [58]:
df['price']=df['price'].astype('int32')

In [59]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   room_id               18723 non-null  int64  
 1   survey_id             18723 non-null  int64  
 2   host_id               18723 non-null  int64  
 3   room_type             18723 non-null  object 
 4   city                  18723 non-null  object 
 5   neighborhood          18723 non-null  object 
 6   reviews               18723 non-null  int32  
 7   overall_satisfaction  18723 non-null  float16
 8   accommodates          18723 non-null  int16  
 9   bedrooms              18723 non-null  int16  
 10  price                 18723 non-null  int32  
 11  name                  18671 non-null  object 
 12  last_modified         18723 non-null  object 
 13  latitude              18723 non-null  float64
 14  longitude             18723 non-null  float64
 15  location           

In [61]:
#segregating the last modified column
df['last_modified'][1]

'2017-07-23 13:06:23.607187'

In [63]:
df['last_modified'].value_counts()

last_modified
2017-07-23 13:06:27.391699    1
2017-07-22 17:50:54.562470    1
2017-07-22 17:50:37.799843    1
2017-07-22 17:50:37.804073    1
2017-07-22 17:50:37.808374    1
                             ..
2017-07-22 22:38:38.462314    1
2017-07-22 22:38:38.480154    1
2017-07-22 22:38:42.933102    1
2017-07-22 22:38:45.217573    1
2017-07-22 16:05:12.257054    1
Name: count, Length: 18723, dtype: int64

In [83]:
[i[11:16] for i in df['last_modified']]

['13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:06',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:05',
 '13:02',
 '12:58',
 '12:52',
 '12:30',
 '12:30',
 '12:30',
 '12:30',
 '12:29',
 '12:27',
 '12:22',
 '12:09',
 '12:08',
 '12:01',
 '11:55',
 '11:53',
 '11:52',
 '11:40',
 '11:40',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:36',
 '11:35',
 '11:25',
 '11:25',
 '11:25',
 '11:09',
 '11:06',
 '10:59',
 '10:44',
 '10:38',
 '10:31',
 '10:27',
 '10:07',
 '10:07',
 '10:06',
 '10:00',
 '09:56',
 '09:55',
 '09:48',
 '09:48',
 '09:48',
 '09:44',
 '09:41',
 '09:41',
 '09:40',
 '09:38',
 '09:36',
 '09:28',
 '09:22',
 '09:11',
 '09:09',
 '09:08',
 '08:58',
 '08:57',
 '08:44',
 '08:38',


<span style="color: Green; font-weight:bold;">splitting datetime format into individual columns</span>
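
As an alternative to slicing the timestamp string by position, the components can be read from a parsed datetime. This is only a sketch, not the approach used in the notebook cells; it reuses two timestamps from the output above as toy input and keeps the notebook's column names.

```python
import pandas as pd

# Two timestamps copied from the output above, used as toy input
toy = pd.DataFrame({"last_modified": [
    "2017-07-23 13:06:27.391699",
    "2017-07-22 17:50:54.562470",
]})

# Parse once, then read each component from the .dt accessor
ts = pd.to_datetime(toy["last_modified"])
toy["last_modified_hour"] = ts.dt.hour
toy["last_modified_minutes"] = ts.dt.minute
toy["last_modified_Day"] = ts.dt.day
toy["last_modified_Month"] = ts.dt.month
toy["last_modified_Year"] = ts.dt.year

print(toy.drop(columns="last_modified"))
```

With this approach the new columns are already integers, so no later conversion from strings is needed.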

In [85]:
df['last_modified_hour']=[i[11:13] for i in df['last_modified']]

In [88]:
df['last_modified_minutes']=[i[14:16] for i in df['last_modified']]

In [89]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   room_id                18723 non-null  int64  
 1   survey_id              18723 non-null  int64  
 2   host_id                18723 non-null  int64  
 3   room_type              18723 non-null  object 
 4   city                   18723 non-null  object 
 5   neighborhood           18723 non-null  object 
 6   reviews                18723 non-null  int32  
 7   overall_satisfaction   18723 non-null  float16
 8   accommodates           18723 non-null  int16  
 9   bedrooms               18723 non-null  int16  
 10  price                  18723 non-null  int32  
 11  name                   18671 non-null  object 
 12  last_modified          18723 non-null  object 
 13  latitude               18723 non-null  float64
 14  longitude              18723 non-null  float64
 15  lo

In [91]:
#parse the date part (first 10 characters) of last_modified into a datetime column
df['last_modified_date']=pd.to_datetime(df['last_modified'].str[:10])

In [97]:
df['last_modified_Day']=df['last_modified_date'].dt.day

In [98]:
df['last_modified_Month']=df['last_modified_date'].dt.month

In [100]:
df['last_modified_Year']=df['last_modified_date'].dt.year

In [101]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 22 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   room_id                18723 non-null  int64         
 1   survey_id              18723 non-null  int64         
 2   host_id                18723 non-null  int64         
 3   room_type              18723 non-null  object        
 4   city                   18723 non-null  object        
 5   neighborhood           18723 non-null  object        
 6   reviews                18723 non-null  int32         
 7   overall_satisfaction   18723 non-null  float16       
 8   accommodates           18723 non-null  int16         
 9   bedrooms               18723 non-null  int16         
 10  price                  18723 non-null  int32         
 11  name                   18671 non-null  object        
 12  last_modified          18723 non-null  object        
 13  l

In [102]:
df.sample(4)

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,...,last_modified,latitude,longitude,location,last_modified_date,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year
12331,8759643,1476,45936182,Entire home/apt,Amsterdam,Oostelijk Havengebied / Indische Buurt,19,5.0,2,1,...,2017-07-22 17:51:48.495366,52.365386,4.939245,0101000020E6100000E0B9F770C9C113404835ECF7C42E...,2017-07-22,17,51,22,7,2017
13135,1952614,1476,3145131,Entire home/apt,Amsterdam,De Pijp / Rivierenbuurt,17,4.5,4,2,...,2017-07-22 17:36:45.181825,52.345876,4.894144,0101000020E610000093A8177C9A931340533C2EAA452C...,2017-07-22,17,36,22,7,2017
7333,15259140,1476,55104081,Entire home/apt,Amsterdam,Westerpark,6,5.0,2,1,...,2017-07-22 22:22:07.846413,52.38361,4.878589,0101000020E61000005A80B6D5AC8313409335EA211A31...,2017-07-22,22,22,22,7,2017
7147,15364579,1476,19247187,Entire home/apt,Amsterdam,De Pijp / Rivierenbuurt,2,0.0,2,1,...,2017-07-22 22:32:44.897487,52.352499,4.904872,0101000020E6100000B343FCC3969E1340B56FEEAF1E2D...,2017-07-22,22,32,22,7,2017


In [105]:
df=df.drop(['last_modified','last_modified_date'],axis=1)

In [106]:
df.sample(4)

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,latitude,longitude,location,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year
12310,5731536,1476,4760663,Entire home/apt,Amsterdam,De Pijp / Rivierenbuurt,5,5.0,2,1,119,"Lovely apartment, classic building in the Pijp",52.352572,4.893783,0101000020E610000011E2CAD93B9313401FBE4C14212D...,17,57,22,7,2017
16133,1087866,1476,5979580,Private room,Amsterdam,Centrum West,43,4.5,2,1,133,Bed & Breakfast of Art,52.382568,4.888254,0101000020E61000002C2AE274928D1340813FFCFCF730...,16,33,22,7,2017
6114,1435255,1476,7672132,Entire home/apt,Amsterdam,De Baarsjes / Oud West,34,5.0,2,2,155,Light Ap. (10 mins to city centre),52.369539,4.863679,0101000020E6100000CBF78C44687413405CCCCF0D4D2F...,22,59,22,7,2017
8723,8065426,1476,41920155,Entire home/apt,Amsterdam,De Baarsjes / Oud West,14,4.5,3,2,127,Modern apartment in central location,52.363772,4.864392,0101000020E610000035B22B2D23751340AA99B514902E...,20,28,22,7,2017


In [107]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 20 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   room_id                18723 non-null  int64  
 1   survey_id              18723 non-null  int64  
 2   host_id                18723 non-null  int64  
 3   room_type              18723 non-null  object 
 4   city                   18723 non-null  object 
 5   neighborhood           18723 non-null  object 
 6   reviews                18723 non-null  int32  
 7   overall_satisfaction   18723 non-null  float16
 8   accommodates           18723 non-null  int16  
 9   bedrooms               18723 non-null  int16  
 10  price                  18723 non-null  int32  
 11  name                   18671 non-null  object 
 12  latitude               18723 non-null  float64
 13  longitude              18723 non-null  float64
 14  location               18723 non-null  object 
 15  la

In [108]:
df['last_modified_hour']=df['last_modified_hour'].astype('int16')
df['last_modified_minutes']=df['last_modified_minutes'].astype('int16')
df['last_modified_Day']=df['last_modified_Day'].astype('int16')
df['last_modified_Month']=df['last_modified_Month'].astype('int16')

In [109]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18723 entries, 0 to 18722
Data columns (total 20 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   room_id                18723 non-null  int64  
 1   survey_id              18723 non-null  int64  
 2   host_id                18723 non-null  int64  
 3   room_type              18723 non-null  object 
 4   city                   18723 non-null  object 
 5   neighborhood           18723 non-null  object 
 6   reviews                18723 non-null  int32  
 7   overall_satisfaction   18723 non-null  float16
 8   accommodates           18723 non-null  int16  
 9   bedrooms               18723 non-null  int16  
 10  price                  18723 non-null  int32  
 11  name                   18671 non-null  object 
 12  latitude               18723 non-null  float64
 13  longitude              18723 non-null  float64
 14  location               18723 non-null  object 
 15  la

In [117]:
df['host_id'].value_counts()

host_id
48703385     93
113977564    88
1464510      71
107745142    64
84453740     61
             ..
41902443      1
8305721       1
9901234       1
23343555      1
29724632      1
Name: count, Length: 15943, dtype: int64

In [118]:
df['name'].value_counts()


name
Amsterdam                                             36
Lovely apartment near Vondelpark                      10
Magnificent panoramic city view                        8
Beautiful apartment in Amsterdam                       8
Cosy apartment in Amsterdam                            8
                                                      ..
Bright and trendy apt, sunny balcony -De Pijp, RAI     1
Bright & Cozy Apartment in the Pijp                    1
NEW! Monumental Apartment In The Heart of the City     1
A great apartment in Amsterdam’s vibrant ‘de Pijp’     1
I have a room available for rent                       1
Name: count, Length: 18150, dtype: int64

In [120]:
#Handling missing values
df['name'].isna().sum()

52

In [142]:
df[df['name'].isnull()]['neighborhood'].value_counts()

neighborhood
De Baarsjes / Oud West                    13
Centrum West                               9
De Pijp / Rivierenbuurt                    7
Bos en Lommer                              5
Watergraafsmeer                            3
Centrum Oost                               3
Noord-West / Noord-Midden                  3
Oud Oost                                   3
Westerpark                                 2
Ijburg / Eiland Zeeburg                    1
Oostelijk Havengebied / Indische Buurt     1
Noord West                                 1
Slotervaart                                1
Name: count, dtype: int64

<h2><span style="color: hsl(200, 80%, 70%); font-weight:bold;">Handling NAN values

</span></h2>

In [147]:
df[df['name'].isnull()]

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,latitude,longitude,location,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year
151,17557382,1476,33583311,Entire home/apt,Amsterdam,Centrum West,2,0.0,2,0,75,,52.370532,4.881324,0101000020E610000039B874CC798613408C0FB3976D2F...,6,22,23,7,2017
241,9020907,1476,19122863,Entire home/apt,Amsterdam,De Baarsjes / Oud West,0,0.0,3,2,599,,52.361747,4.858919,0101000020E6100000F0A65B76886F13408B36C7B94D2E...,6,21,23,7,2017
355,10152802,1476,24138541,Entire home/apt,Amsterdam,Watergraafsmeer,0,0.0,6,3,359,,52.357266,4.937529,0101000020E6100000D234289A07C0134083FC6CE4BA2D...,6,2,23,7,2017
668,9808194,1476,18014889,Entire home/apt,Amsterdam,De Baarsjes / Oud West,11,5.0,4,2,324,,52.367432,4.857651,0101000020E6100000DB8651103C6E13407FC00303082F...,5,58,23,7,2017
927,11247641,1476,7023284,Entire home/apt,Amsterdam,Centrum Oost,21,4.5,4,2,359,,52.371198,4.911863,0101000020E6100000D4BA0D6ABFA51340992B836A832F...,5,54,23,7,2017
1448,11876707,1476,63326612,Entire home/apt,Amsterdam,Noord-West / Noord-Midden,9,4.5,6,4,300,,52.355063,4.845677,0101000020E6100000E6EAC726F9611340848252B4722D...,4,0,23,7,2017
1892,3771282,1476,10529642,Entire home/apt,Amsterdam,De Pijp / Rivierenbuurt,6,4.5,4,2,256,,52.35492,4.896281,0101000020E6100000CE1ABCAFCA9513402159C0046E2D...,3,22,23,7,2017
1908,12965190,1476,71299676,Entire home/apt,Amsterdam,De Pijp / Rivierenbuurt,18,4.5,4,2,240,,52.353727,4.900497,0101000020E61000009E95B4E21B9A1340295B24ED462D...,3,21,23,7,2017
2353,11579476,1476,14184874,Entire home/apt,Amsterdam,De Pijp / Rivierenbuurt,11,4.0,4,2,240,,52.351938,4.902831,0101000020E61000008942CBBA7F9C13401D1CEC4D0C2D...,3,15,23,7,2017
3812,12397008,1476,400638,Entire home/apt,Amsterdam,Centrum West,0,0.0,2,1,240,,52.377718,4.897159,0101000020E61000008E3F51D9B0961340268E3C105930...,3,1,23,7,2017


In [158]:
#neighborhoods of the listings with a missing name, with their counts
neighborhood_names_in_nanvalues=df[df['name'].isnull()]['neighborhood'].value_counts()

Since the name column has 52 NaN values spread across several neighborhoods, we should not apply the traditional mode method directly to the whole column.

In [216]:
df.sample(5)

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,latitude,longitude,location,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year
17373,16471741,1476,102878924,Private room,Amsterdam,Osdorp,9,3.5,2,1,66,Private Room in the house for short stay!,52.359578,4.796075,0101000020E61000001EA7E8482E2F1340382EE3A6062E...,16,11,22,7,2017
15102,13092845,1476,72689311,Private room,Amsterdam,Centrum Oost,17,4.5,4,0,300,Katja's City Apartment,52.360532,4.892451,0101000020E61000004721C9ACDE911340AB949EE9252E...,16,38,22,7,2017
9108,13150862,1476,73357312,Entire home/apt,Amsterdam,De Baarsjes / Oud West,10,5.0,2,1,132,A lovely bright cozy design home,52.36506,4.861936,0101000020E6100000F0FACC599F721340F4893C49BA2E...,20,13,22,7,2017
4278,3419746,1476,8949635,Entire home/apt,Amsterdam,De Baarsjes / Oud West,14,5.0,4,2,192,"Light spacious apt, child-friendly",52.372323,4.860108,0101000020E6100000A41CCC26C07013403946B247A82F...,2,36,23,7,2017
17122,7299294,1476,10449806,Private room,Amsterdam,Buitenveldert / Zuidas,28,4.5,2,1,102,Private room near Amsterdam Rai,52.325462,4.887639,0101000020E6100000D89DEE3CF18C13409E0B23BDA829...,16,18,22,7,2017


In [223]:
df['name'].value_counts()

name
Amsterdam                                             36
Lovely apartment near Vondelpark                      10
Magnificent panoramic city view                        8
Beautiful apartment in Amsterdam                       8
Cosy apartment in Amsterdam                            8
                                                      ..
Bright and trendy apt, sunny balcony -De Pijp, RAI     1
Bright & Cozy Apartment in the Pijp                    1
NEW! Monumental Apartment In The Heart of the City     1
A great apartment in Amsterdam’s vibrant ‘de Pijp’     1
I have a room available for rent                       1
Name: count, Length: 18150, dtype: int64

In [220]:
df[df[['latitude','longitude']].duplicated()]

Unnamed: 0,room_id,survey_id,host_id,room_type,city,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,latitude,longitude,location,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year


In [231]:
df[df['neighborhood']=='Centrum West'][['name','latitude','longitude']]

Unnamed: 0,name,latitude,longitude
1,Sunny and Cozy Living room in quite neighbours,52.378518,4.896120
3,Canal boat RIDE in Amsterdam,52.376319,4.890028
9,"CANAL BOATTOUR AMSTERDAM covered boat 1,5 hour",52.386610,4.890128
35,Prinsengracht Appartement,52.373078,4.884269
37,Great 4 bedroom apartment at the flower market,52.367890,4.889273
...,...,...,...
18596,Bright and cozy room in the Jordaan,52.375517,4.881505
18653,Seventies boatstay in the middle of Old Amsterdam,52.376518,4.887572
18671,Single zolderkamer,52.367228,4.891235
18721,City Center studio in Touristic Amsterdam 1,52.372120,4.890982




Here the mode can be applied per group instead: for each neighborhood we take the most frequently occurring listing name and use it to replace the NaN values in the name column for listings in that neighborhood.
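
A minimal sketch of this per-neighborhood mode fill, using `groupby` on a made-up toy frame rather than the real data. It assumes every neighborhood has at least one non-null name; ties are resolved by `Series.mode()`, which returns its values sorted, so a tie may be broken differently than in a `value_counts()` based loop.

```python
import numpy as np
import pandas as pd

# Hypothetical toy listings; names and neighborhoods are made up
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "B", "B", "B"],
    "name": ["Canal flat", "Canal flat", np.nan, "Loft", np.nan, "Loft"],
})

# Most frequent non-null name within each neighborhood
mode_by_hood = toy.groupby("neighborhood")["name"].agg(
    lambda s: s.mode().iloc[0]
)

# Fill only the missing names, looking up each row's neighborhood
toy["name"] = toy["name"].fillna(toy["neighborhood"].map(mode_by_hood))
print(toy)
```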

In [51]:
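k=[]  #most frequent listing name for each neighborhood, in index order
for i in neighborhood_names_in_nanvalues.index:
    p=df[df['neighborhood']==i]['name'].value_counts().sort_values(ascending=False)
    #p.iloc[0] is the top count; positional p[0] on a label-indexed Series is deprecated in recent pandas
    print(f'for neighborhood - {i} : {p.index[0]} and  (count {p.iloc[0]})')
    k.append(p.index[0])
    print()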

for neighborhood - De Baarsjes / Oud West : Lovely apartment near Vondelpark and  (count 7)
for neighborhood - Centrum West : Hotel in the heart of Amsterdam 2p and  (count 5)
for neighborhood - De Pijp / Rivierenbuurt : Amsterdam and  (count 5)
for neighborhood - Bos en Lommer : Amsterdam and  (count 3)
for neighborhood - Watergraafsmeer : Amsterdam and  (count 4)
for neighborhood - Centrum Oost : Light, brand new canal house studio and  (count 2)
for neighborhood - Noord-West / Noord-Midden : Lovely apartment near Vondelpark and  (count 3)
for neighborhood - Oud Oost : Amsterdam and  (count 3)
for neighborhood - Westerpark : Magnificent panoramic city view and  (count 8)
for neighborhood - Ijburg / Eiland Zeeburg : Family home with parking and  (count 2)
for neighborhood - Oostelijk Havengebied / Indische Buurt : Beautiful designed apartment close to City Centre and  (count 3)
for neighborhood - Noord West : Central and rustic situated B&B and  (count 2)
for neighborhood - Slotervaar

In [60]:
neighborhood_names_in_nanvalues=list(neighborhood_names_in_nanvalues.index)

In [13]:
df_null_indexes=df[df['name'].isnull()].index

In [20]:
df_null_indexes=list(df_null_indexes)

In [77]:
df1=df['name'].copy()

In [81]:
for i in df_null_indexes:
    q=df.loc[i,'neighborhood']  #neighborhood of the row with a missing name
    if q in neighborhood_names_in_nanvalues:
        j=neighborhood_names_in_nanvalues.index(q)  #position of its most frequent name in k
        print(j,q,':',k[j])
        df1[i]=k[j]

1 Centrum West : Hotel in the heart of Amsterdam 2p
0 De Baarsjes / Oud West : Lovely apartment near Vondelpark
4 Watergraafsmeer : Amsterdam
0 De Baarsjes / Oud West : Lovely apartment near Vondelpark
5 Centrum Oost : Light, brand new canal house studio
6 Noord-West / Noord-Midden : Lovely apartment near Vondelpark
2 De Pijp / Rivierenbuurt : Amsterdam
2 De Pijp / Rivierenbuurt : Amsterdam
2 De Pijp / Rivierenbuurt : Amsterdam
1 Centrum West : Hotel in the heart of Amsterdam 2p
2 De Pijp / Rivierenbuurt : Amsterdam
6 Noord-West / Noord-Midden : Lovely apartment near Vondelpark
0 De Baarsjes / Oud West : Lovely apartment near Vondelpark
2 De Pijp / Rivierenbuurt : Amsterdam
1 Centrum West : Hotel in the heart of Amsterdam 2p
7 Oud Oost : Amsterdam
9 Ijburg / Eiland Zeeburg : Family home with parking
0 De Baarsjes / Oud West : Lovely apartment near Vondelpark
2 De Pijp / Rivierenbuurt : Amsterdam
5 Centrum Oost : Light, brand new canal house studio
1 Centrum West : Hotel in the heart of

In [93]:
df['name']=df1

In [5]:
df.head()

Unnamed: 0,room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,latitude,longitude,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year
0,10176931,49180562,Shared room,De Pijp / Rivierenbuurt,7,4.5,2,1,156,Red Light/ Canal view apartment (Shared),52.356209,4.887491,13,6,23,7,2017
1,8935871,46718394,Shared room,Centrum West,45,4.5,4,1,126,Sunny and Cozy Living room in quite neighbours,52.378518,4.89612,13,6,23,7,2017
2,14011697,10346595,Shared room,Watergraafsmeer,1,0.0,3,1,132,Amsterdam,52.338811,4.943592,13,6,23,7,2017
3,6137978,8685430,Shared room,Centrum West,7,5.0,4,1,121,Canal boat RIDE in Amsterdam,52.376319,4.890028,13,6,23,7,2017
4,18630616,70191803,Shared room,De Baarsjes / Oud West,1,0.0,2,1,93,One room for rent in a three room appartment,52.370384,4.852873,13,6,23,7,2017


Next we add one column to the DataFrame.

The column is named Earned and is the product of the reviews and price columns, which serves as a rough estimate of each listing's earnings.
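
The same product can be computed in one vectorized step. The sketch below uses the reviews and price values of the first three rows from the `df.head()` output above as toy input.

```python
import pandas as pd

# First three rows of reviews and price from the df.head() output above
toy = pd.DataFrame({"reviews": [7, 45, 1], "price": [156, 126, 132]})

# Element-wise product of two columns, no Python-level loop needed
toy["Earned"] = toy["reviews"] * toy["price"]
print(toy["Earned"].tolist())  # [1092, 5670, 132]
```

On a frame with thousands of rows this is typically much faster than building the list row by row.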

In [8]:
(df['reviews'][0])*(df['price'][0])

1092

In [11]:
k=[]
for i in range(df.shape[0]):
    a=(df['reviews'][i])*(df['price'][i])
    k.append(a)

In [12]:
len(k)

18723

In [13]:
df.shape[0]

18723

In [14]:
df['Earned']=k

In [15]:
df.head()

Unnamed: 0,room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,name,latitude,longitude,last_modified_hour,last_modified_minutes,last_modified_Day,last_modified_Month,last_modified_Year,Earned
0,10176931,49180562,Shared room,De Pijp / Rivierenbuurt,7,4.5,2,1,156,Red Light/ Canal view apartment (Shared),52.356209,4.887491,13,6,23,7,2017,1092
1,8935871,46718394,Shared room,Centrum West,45,4.5,4,1,126,Sunny and Cozy Living room in quite neighbours,52.378518,4.89612,13,6,23,7,2017,5670
2,14011697,10346595,Shared room,Watergraafsmeer,1,0.0,3,1,132,Amsterdam,52.338811,4.943592,13,6,23,7,2017,132
3,6137978,8685430,Shared room,Centrum West,7,5.0,4,1,121,Canal boat RIDE in Amsterdam,52.376319,4.890028,13,6,23,7,2017,847
4,18630616,70191803,Shared room,De Baarsjes / Oud West,1,0.0,2,1,93,One room for rent in a three room appartment,52.370384,4.852873,13,6,23,7,2017,93


In [None]:
df.to_csv('airbnb_cleaned_dataset.csv')