# Analysis of Airbnb New York

### Contents
+ Introduction: Airbnb New York
+ Data description and objectives
+ Data manipulation: cleaning and shaping
+ Data scraping


## 1. Introduction: Airbnb New York

Airbnb is one of the world's largest accommodation booking services. Apart from apartments, Airbnb's listings include unusual properties such as castles, estates, yachts, horse ranches, trailers, and even private islands. The site pays close attention to the comfort and safety of travel, so it provides a reliable payment system, round-the-clock support, and a review system covering every host and guest. Since 2008, more than 25 million travelers have used Airbnb's services, embracing the philosophy of the sharing economy: exchanging goods and services on the basis of human relationships and trust.


The company was founded in San Francisco in 2008 by Brian Chesky, Joe Gebbia, and Nathan Blecharczyk, and became one of the first peer-to-peer services specializing in housing accommodation. The founders were traveling to a conference in 2007 but couldn't pay for their housing, so two of them decided to rent out part of their apartment to help cover the cost of the trip. This sparked the idea: they wanted to change the way people think about travel. In 2009 the company partnered with Y Combinator and expanded its limited offerings. It continued to expand and raise capital, and eventually grew its operations internationally by acquiring Accoleo. Airbnb now operates in 191 countries, where guests can rent a room, a home, or even a castle for a night or longer.


With a presence in 191 countries, Airbnb continues to expand internationally. It has even established operations in Cuba and in other countries where its legality might be in question. In doing so it has created a niche that caters to all kinds of travelers: high spenders, penny pinchers, and everyone in between.

## 2. Data description and objectives

The Airbnb dataset contains many fields that can be analyzed.
The following columns are used in this analysis:
+ Id - unique identifier of the listing
+ Name - name and short description of the listing
+ Host_id - unique identifier of the listing's owner (host)
+ Host_name - host's name
+ Neighbourhood_group - New York region
+ Neighbourhood - name of the area inside the neighbourhood_group
+ Latitude - geographic latitude of the listing
+ Longitude - geographic longitude of the listing
+ Room_type - type of accommodation offered for rent
+ Price - price of the listing
+ Minimum_nights - minimum number of nights per stay
+ Number_of_reviews - total number of reviews for the listing
+ Last_review - date of the most recent review of the listing on the website
+ Reviews_per_month - average number of reviews per month
+ Calculated_host_listings_count - number of listings per host
+ Availability_365 - number of days per year on which the listing is available for booking

### Questions

1. Which hosts have only a small number of reviews?
2. Which region of New York is the most expensive?
3. Which area within the New York regions is the most expensive?
4. Which room_type is the cheapest?
5. Which host is the most preferred?
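
The questions above map fairly directly onto pandas filtering and `groupby` operations. The following is only a minimal sketch of one possible approach, run on five rows copied from the dataset preview rather than on the full data. The review threshold of 10 and the use of total review count as a proxy for the "most preferred" host are assumptions made for illustration, not definitions taken from this notebook.

```python
import pandas as pd

# Five rows copied from the first rows of AB_NYC_2019.csv shown in this notebook.
# A real analysis would use the full cleaned DataFrame instead of this sample.
sample = pd.DataFrame({
    "host_name": ["John", "Jennifer", "Elisabeth", "LisaRoxanne", "Laura"],
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan", "Brooklyn", "Manhattan"],
    "neighbourhood": ["Kensington", "Midtown", "Harlem", "Clinton Hill", "East Harlem"],
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Entire home/apt", "Entire home/apt"],
    "price": [149, 225, 150, 89, 80],
    "number_of_reviews": [9, 45, 0, 270, 9],
})

# Question 1: listings with few reviews (the threshold 10 is a hypothetical choice).
few_reviews = sample[sample["number_of_reviews"] < 10]

# Question 2: mean price per region, most expensive first.
price_by_region = (
    sample.groupby("neighbourhood_group")["price"].mean().sort_values(ascending=False)
)

# Question 3: mean price per area within each region, most expensive first.
price_by_area = (
    sample.groupby(["neighbourhood_group", "neighbourhood"])["price"]
    .mean()
    .sort_values(ascending=False)
)

# Question 4: room type with the lowest mean price.
cheapest_room_type = sample.groupby("room_type")["price"].mean().idxmin()

# Question 5: host with the most reviews in total (an assumed proxy for "preferred").
most_reviewed_host = sample.groupby("host_name")["number_of_reviews"].sum().idxmax()

print(price_by_region)
print(cheapest_room_type, most_reviewed_host)
```

With only five rows the answers say nothing about New York as a whole; the point is the shape of the queries, which should carry over unchanged to the full DataFrame.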

## 3. Data manipulation: cleaning and shaping

In [1]:
# Libraries
import pandas as pd  # data processing; mainly needed to read the CSV file with pd.read_csv()
from bs4 import BeautifulSoup  # represents a document as a nested data structure; mostly used for pulling data out of HTML pages

In [23]:
# Read the CSV file and display it
df = pd.read_csv('IDA_project/AB_NYC_2019.csv')
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.94190,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2


In [24]:
# Re-read the CSV file and drop every row that contains at least one null value
df = pd.read_csv('IDA_project/AB_NYC_2019.csv')
df = df.dropna(axis=0, how='any')
df


Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0
5,5099,Large Cozy 1 BR Apartment In Midtown East,7322,Chris,Manhattan,Murray Hill,40.74767,-73.97500,Entire home/apt,200,3,74,2019-06-22,0.59,1,129
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48782,36425863,Lovely Privet Bedroom with Privet Restroom,83554966,Rusaa,Manhattan,Upper East Side,40.78099,-73.95366,Private room,129,1,1,2019-07-07,1.00,1,147
48790,36427429,No.2 with queen size bed,257683179,H Ai,Queens,Flushing,40.75104,-73.81459,Private room,45,1,1,2019-07-07,1.00,6,339
48799,36438336,Seas The Moment,211644523,Ben,Staten Island,Great Kills,40.54179,-74.14275,Private room,235,1,1,2019-07-07,1.00,1,87
48805,36442252,1B-1B apartment near by Metro,273841667,Blaine,Bronx,Mott Haven,40.80787,-73.92400,Entire home/apt,100,1,2,2019-07-07,2.00,1,40
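
One side effect of `dropna(how='any')` is visible in the output: the listing with id 3647, which has 0 reviews and therefore an empty `last_review` and `reviews_per_month`, is gone. Every listing without reviews is removed this way, which matters for the question about hosts with few reviews. The sketch below contrasts the strict drop with a more lenient alternative on three rows taken from the preview. Filling `reviews_per_month` with 0 and requiring only `name` to be present are assumptions for illustration, not steps taken in this notebook.

```python
import numpy as np
import pandas as pd

# Three rows from the dataset preview; id 3647 has no reviews yet.
sample = pd.DataFrame({
    "id": [2539, 3647, 3831],
    "name": [
        "Clean & quiet apt home by the park",
        "THE VILLAGE OF HARLEM....NEW YORK !",
        "Cozy Entire Floor of Brownstone",
    ],
    "number_of_reviews": [9, 0, 270],
    "last_review": ["2018-10-19", np.nan, "2019-07-05"],
    "reviews_per_month": [0.21, np.nan, 4.64],
})

# Strict cleaning, as in the cell above: any null value removes the row.
strict = sample.dropna(axis=0, how="any")

# Lenient alternative (assumption): treat a missing review rate as 0
# and drop a row only when the listing name itself is missing.
lenient = sample.copy()
lenient["reviews_per_month"] = lenient["reviews_per_month"].fillna(0)
lenient = lenient.dropna(subset=["name"])

print(len(strict), len(lenient))
```

Which variant is appropriate depends on the question: the strict version suits statistics over reviewed listings, while the lenient one keeps unreviewed listings available for analysis.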


In [25]:
# Drop fully duplicated rows, keeping the first occurrence of each
df = df.drop_duplicates()
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0
5,5099,Large Cozy 1 BR Apartment In Midtown East,7322,Chris,Manhattan,Murray Hill,40.74767,-73.97500,Entire home/apt,200,3,74,2019-06-22,0.59,1,129
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48782,36425863,Lovely Privet Bedroom with Privet Restroom,83554966,Rusaa,Manhattan,Upper East Side,40.78099,-73.95366,Private room,129,1,1,2019-07-07,1.00,1,147
48790,36427429,No.2 with queen size bed,257683179,H Ai,Queens,Flushing,40.75104,-73.81459,Private room,45,1,1,2019-07-07,1.00,6,339
48799,36438336,Seas The Moment,211644523,Ben,Staten Island,Great Kills,40.54179,-74.14275,Private room,235,1,1,2019-07-07,1.00,1,87
48805,36442252,1B-1B apartment near by Metro,273841667,Blaine,Bronx,Mott Haven,40.80787,-73.92400,Entire home/apt,100,1,2,2019-07-07,2.00,1,40
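
Dropping the index labels flagged by `duplicated()` and calling `drop_duplicates()` should give the same result whenever the index labels are unique. The small check below illustrates this on made-up data (the `listings` frame is hypothetical and not part of the dataset).

```python
import pandas as pd

# Hypothetical toy data: the row at index 2 repeats the row at index 1.
listings = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": [100, 80, 80, 120],
})

# Index-based form: find rows flagged as duplicates and drop their labels.
flags = listings.duplicated()
via_index = listings.drop(flags[flags].index.values)

# Built-in form: keeps the first occurrence of each duplicated row.
via_method = listings.drop_duplicates()

print(via_index.equals(via_method))
```

Both forms compare entire rows. Since the output has the same rows before and after this step, the dataset appears to contain no fully duplicated rows.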


In [26]:
df.describe()  # show count, mean, std, min, quartiles, and max for each numeric column


Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
count,38821.0,38821.0,38821.0,38821.0,38821.0,38821.0,38821.0,38821.0,38821.0,38821.0
mean,18100810.0,64245820.0,40.728129,-73.951149,142.332526,5.86922,29.290255,1.373229,5.166611,114.886299
std,10693720.0,75897520.0,0.054991,0.046693,196.994756,17.389026,48.1829,1.680328,26.302954,129.52995
min,2539.0,2438.0,40.50641,-74.24442,0.0,1.0,1.0,0.01,1.0,0.0
25%,8721444.0,7029525.0,40.68864,-73.98246,69.0,1.0,3.0,0.19,1.0,0.0
50%,18872860.0,28370920.0,40.72171,-73.95481,101.0,2.0,9.0,0.72,1.0,55.0
75%,27567460.0,101890500.0,40.76299,-73.93502,170.0,4.0,33.0,2.02,2.0,229.0
max,36455810.0,273841700.0,40.91306,-73.71299,10000.0,1250.0,629.0,58.5,327.0,365.0
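
The summary shows a very wide price range: the minimum is 0 and the maximum is 10000, while three quarters of the listings cost 170 or less. One common rule of thumb for flagging such extremes is Tukey's fences, which place cut-offs 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule to the quartiles printed above. It is not part of the original analysis, and the helper name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) cut-offs placed k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 25% and 75% price quantiles taken from the describe() output above.
lower, upper = tukey_fences(69.0, 170.0)
print(lower, upper)
```

Under this rule, prices above the upper fence would be treated as candidates for closer inspection, not automatically as errors. The lower fence is negative here, so it flags nothing, and zero-priced listings would need a separate check.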


In [27]:
df.isnull().sum().sort_values(ascending=False)  # count the missing values left in each column


availability_365                  0
calculated_host_listings_count    0
reviews_per_month                 0
last_review                       0
number_of_reviews                 0
minimum_nights                    0
price                             0
room_type                         0
longitude                         0
latitude                          0
neighbourhood                     0
neighbourhood_group               0
host_name                         0
host_id                           0
name                              0
id                                0
dtype: int64

## 4. Data scraping

In [7]:
import pandas as pd        
import requests
from bs4 import BeautifulSoup

In [16]:
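names = []
locations = []
costs = []

# Walk through 50 result pages; items_offset advances by 20 listings per page
for offset in range(0, 1000, 20):
    req = requests.get("https://ru.airbnb.com/s/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&search_type=pagination&federated_search_session_id=85f2a593-69e1-4907-b0f4-2c5b32e28794&items_offset=" + str(offset) + "&section_offset=7")
    soup = BeautifulSoup(req.content, 'lxml')
    # These class names are tied to the page markup at the time of scraping
    name_tags = soup.find_all("div", class_="_qf0valo")
    location_tags = soup.find_all("ol", class_="_194e2vt2")
    cost_tags = soup.find_all("span", class_="_1p7iugi")
    # Assumes every page holds at least 20 listings; fewer would raise an IndexError
    for i in range(0, 20):
        names.append(name_tags[i].text)
        # Keep the part after the "· " separator as the location
        locations.append(location_tags[i].text.split("· ")[1])
        # Strip the Russian price label ("Цена" means "Price") and the non-breaking space
        costs.append(cost_tags[i].text.replace(u'Цена:$\xa0', u''))
    print(str(offset) + " PAGE")

# Collect the three lists into a DataFrame and save it as CSV
dictionary = {"name": names, 'location': locations, "cost": costs}
ds = pd.DataFrame(dictionary)
ds.to_csv('IDA_project/scrapZerdeli.csv')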

0 PAGE
20 PAGE
40 PAGE
60 PAGE
80 PAGE
100 PAGE
120 PAGE
140 PAGE
160 PAGE
180 PAGE
200 PAGE
220 PAGE
240 PAGE
260 PAGE
280 PAGE
300 PAGE
320 PAGE
340 PAGE
360 PAGE
380 PAGE
400 PAGE
420 PAGE
440 PAGE
460 PAGE
480 PAGE
500 PAGE
520 PAGE
540 PAGE
560 PAGE
580 PAGE
600 PAGE
620 PAGE
640 PAGE
660 PAGE
680 PAGE
700 PAGE
720 PAGE
740 PAGE
760 PAGE
780 PAGE
800 PAGE
820 PAGE
840 PAGE
860 PAGE
880 PAGE
900 PAGE
920 PAGE
940 PAGE
960 PAGE
980 PAGE
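
The scraping loop does two small pieces of string work: it builds a paginated URL by stepping `items_offset` in increments of 20, and it strips a price label from the scraped text. The sketch below shows one way to express both using only the standard library. The helper names `build_search_url` and `parse_cost` are hypothetical, and the session id from the original URL is left out for brevity; whether the site requires it has not been verified.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://ru.airbnb.com/s/homes"


def build_search_url(offset):
    """Build the search URL for one result page (query parameters taken from the notebook URL)."""
    params = {
        "tab_id": "home_tab",
        "refinement_paths[]": "/homes",
        "search_type": "pagination",
        "items_offset": offset,
        "section_offset": 7,
    }
    # urlencode percent-escapes the brackets and the slash
    return BASE_URL + "?" + urlencode(params)


def parse_cost(raw):
    """Return the first run of digits in a price label as an int, or None if there is none."""
    match = re.search(r"\d+", raw)
    return int(match.group()) if match else None


# One URL per page: offsets 0, 20, ..., 980
urls = [build_search_url(offset) for offset in range(0, 1000, 20)]

print(len(urls))
print(parse_cost("Цена:$\xa012"))
```

Extracting the digits instead of removing a fixed prefix means the parsing does not depend on the page language. It only reads the first run of digits, though, so prices with thousands separators would need extra handling.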


In [20]:
ds = pd.read_csv('IDA_project/scrapZerdeli.csv') 
ds

Unnamed: 0.1,Unnamed: 0,name,location,cost
0,0,"One Bedroom, Cheap Pricing",Уиллистон,12
1,1,Belga Apartments,Ulcinj,19
2,2,LOFT古运河畔温馨的家,Yangzhou,23
3,3,Venice Every Day 3,Венеция,17
4,4,"Budget double room in London near Regents Park,",Большой Лондон,27
...,...,...,...,...
995,995,Loft nuevo a sólo 5 minutos caminando del mar!,Mazatlán,26
996,996,"Hermoso apartamento super ubicado,",Канкун,21
997,997,【悦曼民宿】1号房ins北欧风-万达商圈-市中心-凯德广场-飞机场-火车站-小区楼下美食街-...,Mianyang,22
998,998,Brand new studion flat London NW2,Большой Лондон,52
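
The `Unnamed: 0` column in the output above most likely comes from the CSV round trip: by default `to_csv` writes the DataFrame index as a first column without a header, and `read_csv` then reads it back as an ordinary column. The sketch below reproduces this with an in-memory buffer and shows that passing `index=False` when saving avoids it. The one-row `scraped` frame reuses a row from the output and stands in for the real file.

```python
import io

import pandas as pd

# One row from the scraped output, standing in for the full file.
scraped = pd.DataFrame({"name": ["Belga Apartments"], "location": ["Ulcinj"], "cost": [19]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
scraped.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# With index=False the index is not written, so no extra column comes back.
without_index = io.StringIO()
scraped.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

An alternative that keeps the saved file unchanged is to read it with `pd.read_csv(path, index_col=0)`, which turns the first column back into the index.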


In [21]:
ds = ds.dropna(axis = 0, how ='any') 
ds

Unnamed: 0.1,Unnamed: 0,name,location,cost
0,0,"One Bedroom, Cheap Pricing",Уиллистон,12
1,1,Belga Apartments,Ulcinj,19
2,2,LOFT古运河畔温馨的家,Yangzhou,23
3,3,Venice Every Day 3,Венеция,17
4,4,"Budget double room in London near Regents Park,",Большой Лондон,27
...,...,...,...,...
995,995,Loft nuevo a sólo 5 minutos caminando del mar!,Mazatlán,26
996,996,"Hermoso apartamento super ubicado,",Канкун,21
997,997,【悦曼民宿】1号房ins北欧风-万达商圈-市中心-凯德广场-飞机场-火车站-小区楼下美食街-...,Mianyang,22
998,998,Brand new studion flat London NW2,Большой Лондон,52


In [22]:
# Drop fully duplicated rows, keeping the first occurrence of each
ds = ds.drop_duplicates()
ds

Unnamed: 0.1,Unnamed: 0,name,location,cost
0,0,"One Bedroom, Cheap Pricing",Уиллистон,12
1,1,Belga Apartments,Ulcinj,19
2,2,LOFT古运河畔温馨的家,Yangzhou,23
3,3,Venice Every Day 3,Венеция,17
4,4,"Budget double room in London near Regents Park,",Большой Лондон,27
...,...,...,...,...
995,995,Loft nuevo a sólo 5 minutos caminando del mar!,Mazatlán,26
996,996,"Hermoso apartamento super ubicado,",Канкун,21
997,997,【悦曼民宿】1号房ins北欧风-万达商圈-市中心-凯德广场-飞机场-火车站-小区楼下美食街-...,Mianyang,22
998,998,Brand new studion flat London NW2,Большой Лондон,52
