# AirBnB Analytics Exploratory Analysis task
by Chuck



***

**Introduction**

This notebook will accomplish the following task:

**Overall goal**:

You've recently been hired by a company as a data analyst, and your boss has asked you to analyse the following dataset and present your results visually to the team at the next meeting. You need to analyse the dataset to understand the problem and propose data-driven solutions.

**Section 01: Exploratory Data Analysis**

* Are there any null values or outliers? If so, delete them or fill in the null values
* Is there any data that we don't need?
* Create a summary of the dataset

**Section 02: Further Data Analysis**

Please focus on a specific area such as Neighborhood, Price, Reviews or Overall Satisfaction to look for patterns or correlations. Also answer the following questions:

* What is the most popular Airbnb location in Amsterdam?
* What is the overall satisfaction?
* Which neighborhood has the best overall satisfaction?
* What is the average price of a house/apartment in Amsterdam?
* Total number of accommodations?

Add more if required

**Data Visualization**

Please plot and visualize your results



**IMPORTING RELEVANT DATASETS AND LIBRARIES**

In [None]:
%matplotlib inline

import glob
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns


from cycler import cycler
plt.style.use('ggplot')
data = np.random.randn(50)

In [None]:
print(plt.style.available)

In [None]:
df = pd.read_csv("../input/amsterdam-airbnb-2017/airbnb.csv")
df.head(10)

Let's look through the data and see what we can drop. We're looking for columns that are irrelevant to our analysis and columns that contain only 'NaN' values.
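A quick way to spot columns containing 'NaN' is to count the missing values per column. The sketch below runs on a small hypothetical frame called `sample` rather than the real dataset; the column names mirror ones used in this notebook, but the values are invented for illustration, and filling with the median is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Airbnb data; all values are made up.
sample = pd.DataFrame({
    "room_id": [1, 2, 3, 4],
    "price": [120.0, 95.0, np.nan, 210.0],
    "borough": [np.nan, np.nan, np.nan, np.nan],
    "reviews": [10, 0, 3, 7],
})

# Count missing values per column, largest first.
null_counts = sample.isnull().sum().sort_values(ascending=False)
print(null_counts)

# Columns that are entirely empty carry no information and can be dropped.
empty_cols = [col for col in sample.columns if sample[col].isnull().all()]
cleaned = sample.drop(columns=empty_cols)

# Remaining gaps can be filled (here with the column median) or the rows dropped.
cleaned["price"] = cleaned["price"].fillna(cleaned["price"].median())
print(cleaned)
```

Running the same `isnull().sum()` call on `df` would show which of its columns are candidates for dropping or filling.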

In [None]:
#These Columns are irrelevant to our analysis, so we need to drop them
#Also dropping 'shared room' as the data in that category is not enough for analysis
df.drop(['country', 'borough', 'bathrooms', 'minstay'], axis=1, inplace=True)
df = df[df.room_type != 'Shared room']

In [None]:
df.shape

In [None]:
#Let's check what columns we have now
df.columns

In [None]:
#Checking the data types
df.dtypes

In [None]:
#calculating some statistical data
df.describe()

OK, now that we understand the dataset a bit better, let's put together a brief overview of our data.
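As an alternative to a series of `print` calls, the same kind of figures can be gathered into a single pandas Series, which is easier to reuse later. This is only a sketch on a hypothetical frame `sample` with invented values; the keys of `summary` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample with the same column names as the notebook's data.
sample = pd.DataFrame({
    "room_id": [101, 102, 103, 104, 105],
    "host_id": [1, 1, 2, 3, 3],
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Entire home/apt"],
    "neighborhood": ["A", "A", "B", "C", "C"],
    "price": [100, 80, 150, 0, 120],
    "reviews": [5, 0, 12, 3, 8],
})

# Collect the headline figures in one place.
summary = pd.Series({
    "properties": len(sample),
    "unique_hosts": sample["host_id"].nunique(),
    "room_types": sample["room_type"].nunique(),
    "neighborhoods": sample["neighborhood"].nunique(),
    "average_price": round(sample["price"].mean(), 2),
    "zero_price_listings": int((sample["price"] == 0).sum()),
})
print(summary)
```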

**Brief Insight**

In [None]:
print("Number of properties:")
print(len(df["room_id"]))

print("")
print("Number of unique hosts:")
print(len(df["host_id"].unique()))


print("")
print("Number of Room Type:")
print(len(df["room_type"].unique()))


print("")
print("Number of Neighborhoods:")
print(len(df["neighborhood"].unique()))


print("")
print("Average Price for All Amsterdam:")
print(round(df.price.mean(),2))


print("")
print("Maximum Price for All Amsterdam:")
print(round(df.price.max(),2))


print("")
print("Minimum Price for All Amsterdam:")
print(round(df.price.min(),2))


print("")
print("Number of 0 (Zero) Price:")
print(len(df[df["price"]==0]))

print("")
print("Average Number of Reviews for All Amsterdam:")
print(round(df.reviews.mean(),2))

In [None]:

#Dropping rows where the number of reviews is zero

df = df[(df["reviews"]>0)]
len(df)

In [None]:
df.hist(figsize = (16,9), color='#32a88d')
plt.show()

Now let's focus on a particular section of our dataset and analyse it in more depth! How about answering the question:
**What is the most popular Airbnb location in Amsterdam**
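Percentage shares for a pie chart can be computed from the data instead of being typed in by hand. The sketch below is a minimal example on a hypothetical `sample` frame; `top_n` is an assumed parameter that controls how many neighborhoods are shown before the rest are merged into an "Other" slice.

```python
import pandas as pd

# Hypothetical neighborhood column; the letters stand in for real names.
sample = pd.DataFrame(
    {"neighborhood": ["A"] * 5 + ["B"] * 3 + ["C"] * 1 + ["D"] * 1}
)

# Share of listings per neighborhood, in percent, largest first.
shares = sample["neighborhood"].value_counts(normalize=True) * 100

# Keep the top_n neighborhoods and merge the remainder into "Other".
top_n = 2  # assumed cut-off, chosen for illustration
top = shares.head(top_n)
other = shares.iloc[top_n:].sum()
pie_sizes = pd.concat([top, pd.Series({"Other": other})]).round(2)
print(pie_sizes)
```

The index and values of `pie_sizes` could then be passed to `ax.pie` as `labels` and `sizes`, so the chart stays in sync if the filtering steps change.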

In [None]:
nb = df.value_counts('neighborhood')

n = nb.plot(kind='bar',figsize=(15,8), color='#ff0000')
plt.xlabel('Neighborhood')
plt.ylabel('Number of Properties')
plt.title('Most popular AirBnB location in Amsterdam', fontsize=24)
for p in n.patches:
        n.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.4, p.get_height()),
                    ha='center', va='bottom',weight="bold",
                    color= 'black')
plt.show()

#1 = De Baarsjes

In [None]:
bc = df.groupby("neighborhood").survey_id.count().sort_values(ascending=False)
bc = bc.reset_index()
bc.rename(columns={"survey_id":"count"}, inplace=True)
bc["percentage"]=round(bc["count"]/bc["count"].sum()*100,2)
bc

In [None]:
labels = 'De Baarsjes/ Oud West','De Pijp/ Rivierenbuurt','Centrum West', 'Centrum Oost', 'Westerpark', 'Noord-West/Midden', 'Oud Oost','Other'
sizes = [17.99, 12.74, 12.14, 9.25, 7.89, 7.25, 6.34, 26.4]
explode = (0.1,0,0,0,0,0,0,0) 

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')
plt.title('Count of AirBnB location in Amsterdam', fontsize=15)

plt.show()

In [None]:
import geopandas as gpd
from shapely.geometry import Point, Polygon

BBox = (df.longitude.min(), df.longitude.max(),
         df.latitude.min(), df.latitude.max())

ruh_m = plt.imread("../input/map-airbnb/map.png")

fig, ax = plt.subplots(figsize = (15,9))
ax.scatter(df.longitude, df.latitude, zorder=1, alpha= 0.1, c='b', s=10)
ax.set_title('Popular AirBnB area in Amsterdam')
ax.set_xlim(BBox[0],BBox[1])
ax.set_ylim(BBox[2],BBox[3])
ax.imshow(ruh_m, zorder=0, extent = BBox, aspect= 'equal')

#West side of Amsterdam is the most popular area to stay at AirBnB

Awesome! Let's look at the **overall satisfaction**
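When comparing neighborhoods, note that a sum of ratings grows with the number of listings, so the mean rating may give a fairer picture of which neighborhood has the best overall satisfaction. The sketch below shows both side by side on a hypothetical `sample` frame with invented ratings; the output column names `total`, `average` and `listings` are illustrative.

```python
import pandas as pd

# Hypothetical ratings: "A" has more listings, "B" has higher ratings.
sample = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "A", "B", "B"],
    "overall_satisfaction": [4.0, 4.0, 4.5, 4.5, 5.0, 5.0],
})

# Sum, mean and count per neighborhood, ranked by the mean rating.
by_hood = (
    sample.groupby("neighborhood")["overall_satisfaction"]
    .agg(total="sum", average="mean", listings="count")
    .sort_values("average", ascending=False)
)
print(by_hood)
```

In this toy example "A" has the larger total simply because it has more listings, while "B" ranks first on the average.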

In [None]:
happiness = df.value_counts('overall_satisfaction')
plt.figure(figsize=(6,4))
ax = sns.barplot(x=happiness.index, y=happiness.values, alpha=0.8)  # recent seaborn versions require x= and y= keywords
plt.title('Overall Satisfaction', fontsize=18)
plt.ylabel('Number of rating', fontsize=12)
plt.xlabel('Rating', fontsize=12)
plt.xticks(rotation=0)

for p in ax.patches:
        ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.4, p.get_height()),
                    ha='center', va='bottom',weight="bold",
                    color= 'black')
plt.show();

In [None]:
bos = df.groupby("neighborhood").overall_satisfaction.sum().sort_values(ascending=False)
bos = bos.reset_index()
bos["percentage"] = round(bos["overall_satisfaction"] / bos["overall_satisfaction"].sum()*100,2)
bos

In [None]:
labels = 'De Baarsjes/ Oud West', 'Centrum West', 'De Pijp/ Rivierenbuurt', 'Centrum Oost', 'Westerpark', 'Noord-West/Midden', 'Oud Oost','Other'
sizes = [18.05, 12.62, 12.59, 9.58, 8.13, 7.01, 6.31, 25.71]
explode = (0.1,0,0,0,0,0,0,0) 

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')
plt.title('Overall Satisfaction in Different Neighborhoods of Amsterdam', fontsize=15)

plt.show()

Answering: **Average price of the house/apartment in Amsterdam**
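There are two reasonable readings of "average price": the mean over all listings, and the mean of the per-neighborhood means. They differ whenever neighborhoods have different numbers of listings. A minimal sketch on hypothetical data (the frame `sample` and its values are invented):

```python
import pandas as pd

# Hypothetical prices: three listings in "A", one in "B".
sample = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "B"],
    "price": [100.0, 100.0, 100.0, 200.0],
})

# Mean over all listings: every listing counts equally.
listing_mean = sample["price"].mean()

# Mean of neighborhood means: every neighborhood counts equally.
hood_mean = sample.groupby("neighborhood")["price"].mean().mean()

print(listing_mean, hood_mean)
```

Here the listing-level mean is 125.0 and the neighborhood-level mean is 150.0, so it is worth stating which one a chart or headline figure refers to.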

In [None]:
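#Mean of the per-neighborhood average prices; selecting 'price' first avoids averaging non-numeric columns
av = df.groupby('neighborhood')['price'].mean().mean()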
av

In [None]:
br = df.groupby("neighborhood").price.mean().sort_values(ascending=False)
br = br.reset_index()
br

In [None]:
npp = df.groupby('neighborhood')['price'].mean().sort_values(ascending=False)
av = npp.mean()  #mean of the per-neighborhood averages

plt.figure(figsize=(20,10))
ax = sns.barplot(x=npp.index, y=npp.values, alpha=0.8)
plt.title('Average House/Apt price of Amsterdam neighborhood', fontsize=18)
plt.ylabel('House/Apt Price', fontsize=12)
plt.xlabel('Neighborhood', fontsize=12)
plt.xticks(rotation=90)
for p in ax.patches:
        ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.4, p.get_height()),
                    ha='center', va='bottom',weight="bold",
                    color= 'black')

x_coordinates = [0,23]
y_average = [av, av]  #horizontal line at the average price

plt.plot(x_coordinates, y_average, linestyle = '--', c="gray")
plt.text(17.5,149,f'Average Price / Night = £{av:.2f}',fontsize = 13,backgroundcolor = 'gray',color = 'white')

plt.show();

Answering: **Total number of accommodations**
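Assuming the `accommodates` column holds the guest capacity of each listing, its sum per neighborhood is a total capacity, which is a different quantity from the number of listings. The sketch below computes both on a hypothetical `sample` frame; the output column names `listings` and `guest_capacity` are illustrative.

```python
import pandas as pd

# Hypothetical listings with invented capacities.
sample = pd.DataFrame({
    "neighborhood": ["A", "A", "B"],
    "accommodates": [2, 4, 6],
})

# Number of listings and total guest capacity per neighborhood.
totals = sample.groupby("neighborhood")["accommodates"].agg(
    listings="count", guest_capacity="sum"
)
print(totals)
```

In this example both neighborhoods offer the same capacity even though "A" has twice as many listings.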

In [None]:
ba = df.groupby('neighborhood').accommodates.sum().sort_values(ascending=False)
ba = ba.reset_index()
ba

In [None]:
ba = df.groupby('neighborhood').accommodates.sum().sort_values(ascending=False)
ax = ba.plot(kind='bar',figsize=(13,8), color='#407294')
plt.xlabel('Locations')
plt.ylabel('Number of Accommodations')
plt.title('Total number of Accommodations', fontsize=18)

for p in ax.patches:
        ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.4, p.get_height()),
                    ha='center', va='bottom',weight="normal", rotation=45,
                    color= 'black')
plt.show()

# Statistical analysis
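A box plot draws its whiskers, by default, at 1.5 times the interquartile range (IQR) beyond the quartiles and shows points outside them as outliers. The same rule can be applied numerically to list those points. This is a minimal sketch on a hypothetical `prices` Series with invented values, not on the notebook's data.

```python
import pandas as pd

# Hypothetical nightly prices with one extreme value.
prices = pd.Series([80, 90, 100, 110, 120, 130, 2000], name="price")

# Quartiles and interquartile range.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers)
```

Whether flagged listings should be removed, capped or kept depends on the question being asked; an expensive listing is not necessarily an error.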

In [None]:
sns.boxplot(x=df['price'])

In [None]:
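#Keep only numeric columns so that corr() does not fail on text columns such as room_type
cor = df.drop(['latitude','longitude','survey_id'], axis=1).select_dtypes(include='number')
sns.heatmap(cor.corr())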

In [None]:

import seaborn as sns; sns.set()
from matplotlib.pyplot import figure

figure(num=None, figsize=(20, 12), dpi=80, facecolor='w', edgecolor='k')
sns.violinplot(y='price',x='neighborhood',data=df[df.price < df['price'].quantile(.98)],)

plt.xticks(rotation=90)

plt.show()

In [None]:
figure(num=None, figsize=(16, 9), dpi=80, facecolor='w', edgecolor='k')
#cmap = sns.cubehelix_palette(as_cmap=True)
sns.scatterplot(x='reviews', y='price', data=df, alpha=0.5,
                hue='room_type',
                #palette=cmap,
                legend="full")

plt.ylim(0, 2000)
#plt.xlim(-10, 400)