# **Project 3 - Data Science**
______

## Analysis of Airbnb listings in the largest US cities

**Team members:** Bruno Kaczelnik, Guilherme Lotaif, Renato Tajima and Thiago Verardo.

**Dataset source:** www.kaggle.com/rudymizrahi/airbnb-listings-in-major-us-cities-deloitte-ml

### Objective:
___

```python
# Importing libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, probplot
plt.style.use('ggplot')

from sklearn.model_selection import train_test_split
```

___
### Importing the dataset:

```python
# Importing the file:
df = pd.read_csv('train.csv')
```

```python
# Checking the size of the DataFrame:
linhas, colunas = df.shape
print("The DataFrame has {0} rows and {1} columns.".format(linhas, colunas))
```

```
The DataFrame has 74111 rows and 29 columns.
```


#### Cleaning the DataFrame before the analysis:
This step prepares the DataFrame for analysis and prevents complications or errors that would get in the way of our classifier. We will therefore clean up the column names and drop the rows that have null values in the columns we use.

```python
# Removing surrounding whitespace from the column names:
df.columns = [espaços.strip() for espaços in df.columns.tolist()]
```

```python
# Dropping rows with null values that could cause problems later:
df = df.dropna(axis=0, subset=['bathrooms', 'first_review', 'host_has_profile_pic', 'host_identity_verified',
                               'host_response_rate', 'host_since', 'last_review', 'neighbourhood',
                               'review_scores_rating', 'thumbnail_url', 'zipcode', 'bedrooms', 'beds'])
```
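A minimal sketch of what the two cleaning cells do, run on a small invented DataFrame (the values are hypothetical; only the column names mimic the dataset). It shows that `dropna(subset=...)` drops a row only when one of the *listed* columns is null, while nulls in other columns are kept:

```python
import numpy as np
import pandas as pd

# Invented toy data: one column name has stray spaces, and some values are missing
raw = pd.DataFrame({
    " bathrooms ": [1.0, np.nan, 2.0, 1.5],
    "beds": [1.0, 2.0, np.nan, 1.0],
    "cleaning_fee": [True, False, False, None],
})

# Step 1: strip surrounding whitespace from every column name
raw.columns = [col.strip() for col in raw.columns]

# Step 2: drop rows that have a null in any of the listed columns only
clean = raw.dropna(axis=0, subset=["bathrooms", "beds"])

print(clean.columns.tolist())               # ['bathrooms', 'beds', 'cleaning_fee']
print(clean.index.tolist())                 # [0, 3]: rows 1 and 2 had a null bathrooms/beds value
print(clean["cleaning_fee"].isna().sum())   # 1: the null outside the subset survives
```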

```python
df.sample(2)
```

| | id | log_price | property_type | room_type | amenities | accommodates | bathrooms | bed_type | cancellation_policy | cleaning_fee | ... | latitude | longitude | name | neighbourhood | number_of_reviews | review_scores_rating | thumbnail_url | zipcode | bedrooms | beds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 67293 | 15287794 | 3.688879 | Apartment | Private room | {TV,Internet,"Wireless Internet","Air conditio... | 1 | 1.0 | Real Bed | strict | True | ... | 40.700764 | -73.922819 | Great Room, Great Hosts, Great Hood | Bushwick | 84 | 97.0 | https://a0.muscache.com/im/pictures/20600188/b... | 11237 | 1.0 | 1.0 |
| 32146 | 994884 | 5.686975 | Apartment | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","A... | 3 | 1.0 | Real Bed | moderate | True | ... | 40.733165 | -73.998126 | Stunning 1 Br, West Village Luxury w/ great views | Greenwich Village | 7 | 97.0 | https://a0.muscache.com/im/pictures/d8b2fd87-5... | 10011 | 1.0 | 1.0 |


#### Converting qualitative (categorical) variables into quantitative ones:

The method used for this conversion is called **One-Hot Encoding**, which turns a categorical variable into binary vectors: each category becomes its own column, and in every row all of these columns are 0 except the one matching that row's category, which is 1. This makes it possible to run numerical analyses on these data.

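```python
# Helper that calls get_dummies and drops the original (base) column:
def dummify(data, column_name):
    df1 = data.copy()
    # One 0/1 column per category, named "<column_name>_<category>"
    dummies = pd.get_dummies(data[column_name], prefix=column_name)
    # Replace the original column with its dummy columns
    df2 = pd.concat([df1.drop(column_name, axis=1), dummies], axis=1)
    return df2
```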
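To illustrate the encoding, here is a condensed copy of `dummify` applied to a few invented rows (the `.copy()` call is omitted because `drop` already returns a new DataFrame). The categories are the three `room_type` values found in the dataset; the prices are made up:

```python
import pandas as pd

def dummify(data, column_name):
    """Replace `column_name` by one 0/1 indicator column per category."""
    dummies = pd.get_dummies(data[column_name], prefix=column_name)
    return pd.concat([data.drop(column_name, axis=1), dummies], axis=1)

# Invented rows; only the room_type categories come from the dataset
toy = pd.DataFrame({
    "log_price": [3.69, 5.69, 4.10],
    "room_type": ["Private room", "Entire home/apt", "Shared room"],
})

encoded = dummify(toy, "room_type")
print(encoded.columns.tolist())
# ['log_price', 'room_type_Entire home/apt', 'room_type_Private room', 'room_type_Shared room']

# get_dummies returns bool columns in recent pandas (uint8 in older versions); cast for display
print(encoded.drop("log_price", axis=1).astype(int).values.tolist())
# [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Each row has exactly one 1, in the column of its own category, which is the property described above.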

# Exploratory analysis

After filtering, the next step is an exploratory analysis of the data, aimed at finding the variables that most influence our target and can therefore help us predict the rating of a randomly chosen listing. It is carried out with the help of pandas_profiling and seaborn.
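A simple complement to the profile report is to rank the numeric columns by their correlation with the target. The sketch below assumes the target is `review_scores_rating`; the helper name `rank_by_correlation` is hypothetical and the toy values are invented. Pearson correlation only captures linear relationships, so this is a first screening, not a final variable selection:

```python
import pandas as pd

def rank_by_correlation(data, target):
    """Return numeric columns ordered by |Pearson r| with `target`, strongest first."""
    numeric = data.select_dtypes(include="number")   # non-numeric columns are ignored
    corr = numeric.corr()[target].drop(target)      # correlation of each column with the target
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Invented toy rows; the column names mimic the dataset, the values do not come from it
toy = pd.DataFrame({
    "review_scores_rating": [97, 90, 100, 85, 95],
    "number_of_reviews": [84, 10, 120, 3, 40],
    "accommodates": [1, 3, 2, 4, 2],
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Shared room", "Entire home/apt"],
})

ranking = rank_by_correlation(toy, "review_scores_rating")
print(ranking)
```

Categorical columns such as `room_type` are skipped here; they would first need to be one-hot encoded (for instance with `dummify`) to enter the ranking.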


```python
# Importing the libraries
# (newer releases of pandas_profiling are published under the name ydata-profiling)
import pandas_profiling
import seaborn as sns
```

```python
# Running pandas_profiling
# df is the DataFrame after filtering
pandas_profiling.ProfileReport(df)
```

0,1
Number of variables,30
Number of observations,38502
Total Missing (%),0.0%
Total size in memory,8.6 MiB
Average record size in memory,233.0 B

0,1
Numeric,11
Categorical,18
Boolean,1
Date,0
Text (Unique),0
Rejected,0
Unsupported,0

0,1
Distinct count,16
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.2338
Minimum,1
Maximum,16
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,1
Q1,2
Median,2
Q3,4
95-th percentile,8
Maximum,16
Range,15
Interquartile range,2

0,1
Standard deviation,2.1377
Coef of variation,0.66105
Kurtosis,7.0113
Mean,3.2338
MAD,1.5774
Skewness,2.1697
Sum,124506
Variance,4.5697
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
2,16896,43.9%,
4,6545,17.0%,
3,4217,11.0%,
1,3821,9.9%,
6,2748,7.1%,
5,1810,4.7%,
8,980,2.5%,
7,538,1.4%,
10,367,1.0%,
9,147,0.4%,

Value,Count,Frequency (%),Unnamed: 3
1,3821,9.9%,
2,16896,43.9%,
3,4217,11.0%,
4,6545,17.0%,
5,1810,4.7%,

Value,Count,Frequency (%),Unnamed: 3
12,136,0.4%,
13,20,0.1%,
14,64,0.2%,
15,26,0.1%,
16,138,0.4%,

0,1
Distinct count,36204
Unique (%),94.0%
Missing (%),0.0%
Missing (n),0

0,1
{},58
"{""translation missing: en.hosting_amenity_49"",""translation missing: en.hosting_amenity_50""}",27
"{""Family/kid friendly""}",17
Other values (36201),38400

Value,Count,Frequency (%),Unnamed: 3
{},58,0.2%,
"{""translation missing: en.hosting_amenity_49"",""translation missing: en.hosting_amenity_50""}",27,0.1%,
"{""Family/kid friendly""}",17,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,""Buzzer/wireless intercom"",Heating,""Family/kid friendly"",""Smoke detector"",""Carbon monoxide detector"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",15,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,""Free parking on premises"",Heating,""Family/kid friendly"",Washer,Dryer,""Smoke detector"",""Carbon monoxide detector"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",14,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,""Free parking on premises"",Heating,""Family/kid friendly"",Washer,Dryer,""Smoke detector"",""Carbon monoxide detector"",""First aid kit"",""Safety card"",""Fire extinguisher"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",14,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Washer,Dryer,""Smoke detector"",""Carbon monoxide detector"",""First aid kit"",""Safety card"",""Fire extinguisher"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",14,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Washer,Dryer,""Smoke detector"",""Carbon monoxide detector"",""Fire extinguisher"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",13,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,""Free parking on premises"",Heating,""Family/kid friendly"",Washer,Dryer,""Smoke detector"",""Carbon monoxide detector"",""First aid kit"",""Fire extinguisher"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",10,0.0%,
"{TV,""Cable TV"",Internet,""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Washer,Dryer,""Smoke detector"",""Carbon monoxide detector"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace""}",10,0.0%,

0,1
Distinct count,17
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.2152
Minimum,0
Maximum,8
Zeros (%),0.3%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,1
Q3,1
95-th percentile,2
Maximum,8
Range,8
Interquartile range,0

0,1
Standard deviation,0.55135
Coef of variation,0.45371
Kurtosis,25.427
Mean,1.2152
MAD,0.35321
Skewness,3.8981
Sum,46788
Variance,0.30399
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
1.0,30751,79.9%,
2.0,3922,10.2%,
1.5,2014,5.2%,
2.5,673,1.7%,
3.0,513,1.3%,
3.5,176,0.5%,
4.0,132,0.3%,
0.0,101,0.3%,
0.5,83,0.2%,
4.5,45,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,101,0.3%,
0.5,83,0.2%,
1.0,30751,79.9%,
1.5,2014,5.2%,
2.0,3922,10.2%,

Value,Count,Frequency (%),Unnamed: 3
6.0,10,0.0%,
6.5,4,0.0%,
7.0,3,0.0%,
7.5,2,0.0%,
8.0,22,0.1%,

0,1
Distinct count,5
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
Real Bed,37405
Futon,434
Pull-out Sofa,342
Other values (2),321

Value,Count,Frequency (%),Unnamed: 3
Real Bed,37405,97.2%,
Futon,434,1.1%,
Pull-out Sofa,342,0.9%,
Airbed,213,0.6%,
Couch,108,0.3%,

0,1
Distinct count,11
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.2524
Minimum,0
Maximum,10
Zeros (%),9.5%

0,1
Minimum,0
5-th percentile,0
Q1,1
Median,1
Q3,1
95-th percentile,3
Maximum,10
Range,10
Interquartile range,0

0,1
Standard deviation,0.83648
Coef of variation,0.6679
Kurtosis,7.1984
Mean,1.2524
MAD,0.57581
Skewness,1.9121
Sum,48220
Variance,0.6997
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
1.0,25767,66.9%,
2.0,6036,15.7%,
0.0,3658,9.5%,
3.0,2172,5.6%,
4.0,618,1.6%,
5.0,171,0.4%,
6.0,45,0.1%,
7.0,22,0.1%,
8.0,7,0.0%,
10.0,4,0.0%,

Value,Count,Frequency (%),Unnamed: 3
0.0,3658,9.5%,
1.0,25767,66.9%,
2.0,6036,15.7%,
3.0,2172,5.6%,
4.0,618,1.6%,

Value,Count,Frequency (%),Unnamed: 3
6.0,45,0.1%,
7.0,22,0.1%,
8.0,7,0.0%,
9.0,2,0.0%,
10.0,4,0.0%,

0,1
Distinct count,18
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7593
Minimum,0
Maximum,18
Zeros (%),0.0%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,1
Q3,2
95-th percentile,4
Maximum,18
Range,18
Interquartile range,1

0,1
Standard deviation,1.2879
Coef of variation,0.73207
Kurtosis,18.289
Mean,1.7593
MAD,0.89175
Skewness,3.2413
Sum,67735
Variance,1.6587
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
1.0,22608,58.7%,
2.0,9056,23.5%,
3.0,3672,9.5%,
4.0,1712,4.4%,
5.0,714,1.9%,
6.0,359,0.9%,
7.0,111,0.3%,
8.0,105,0.3%,
10.0,56,0.1%,
9.0,43,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,1,0.0%,
1.0,22608,58.7%,
2.0,9056,23.5%,
3.0,3672,9.5%,
4.0,1712,4.4%,

Value,Count,Frequency (%),Unnamed: 3
13.0,8,0.0%,
14.0,1,0.0%,
15.0,2,0.0%,
16.0,19,0.0%,
18.0,1,0.0%,

0,1
Distinct count,5
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
strict,19734
moderate,11386
flexible,7343
Other values (2),39

Value,Count,Frequency (%),Unnamed: 3
strict,19734,51.3%,
moderate,11386,29.6%,
flexible,7343,19.1%,
super_strict_30,33,0.1%,
super_strict_60,6,0.0%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
NYC,17856
LA,10068
SF,3601
Other values (3),6977

Value,Count,Frequency (%),Unnamed: 3
NYC,17856,46.4%,
LA,10068,26.1%,
SF,3601,9.4%,
DC,2409,6.3%,
Boston,2365,6.1%,
Chicago,2203,5.7%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
Mean,0.81978

0,1
True,31563
(Missing),6939

Value,Count,Frequency (%),Unnamed: 3
True,31563,82.0%,
(Missing),6939,18.0%,

0,1
Distinct count,38193
Unique (%),99.2%
Missing (%),0.0%
Missing (n),0

0,1
"Private room in the heart of Little Italy with FREE PARKING :) You will have your own personal door code and may arrive at ANY time you would like and as late as you wish. You can store your luggage in the closet before check-in and after check-out times :) Price adapts to demand, my rates change constantly - every night is different! Put your dates in the calendar, with the number of guests, and what you see is what you'll pay :) The neighborhood is extremely safe even at night :) Restaurants and bars are right round the corner, both upscale and casual. This room is 6 minutes away by car or uber to Chicago's famous Willis (Sears) tower. Uber costs about $7. If you are arriving by car, I am providing free resident parking passes for anywhere in the area. (Yes, that means you can drive a few blocks and still park anywhere you want!) The room will be solely yours for the time you'll stay in the Windy City. Shall you have any concerns or need any tips from the local, I'll always be happy",7
"Hello, I've been running guest house for Koreans visiting U.S. for 3years, and recently decided to run this place for other travelers also. There are 10 room in the house. They are mostly dormitory rooms and couple of couple room and family room. This places are our women's dormitory in third floor. There are three rooms, but no doors. It is basically open space. There are 2 beds in two rooms and 4 in one room. I do not have closet in this room but there are hangers and mini shelves. My travelers usually put their baggage on the floor. There is one full bathroom only for women in 2nd floor, which you will be sharing with other women guests. Right next that bathroom, there is unisex half bathroom. All bathrooms have hair dryers. You cannot use kitchen, but you can use refrigerator. I offer breakfast every morning from 7-10 am. Bread, cereal, fruits, coffee, milk and juice will be served. You can eat take-out food in the kitchen, but please wash dishes that you used and put trash in the",6
"Welcome to RMH, a co-ed hostel vibe home for exploring travelers or working individuals needing temporary housing. (Guests access is from 3pm to 11am daily). NON-SMOKERS ONLY! Host & small dog live on property. NOTE: Guests' bedroom is dog free. GUARANTEED In Our Home: *You receive a clean home in a safe neighborhood, clean sheets, pillow, towel and covers. **You MUST Read House Rules BEFORE Booking :) ***IMPORTANT ***SMOKERS who book: reservation cancelled upon arrival + NO Refund. The bedroom is very large with enough space for everyone (6 guests) to have peace and quiet. The kitchen and bath are very spacious. Guests will have their own keys and have access to the bedroom in which they stay, kitchen, bathroom, living room and deck. OPTIONAL: (Normal Check IN is 3pm, Check OUT is 10am) If you would like an EARLY CHECK IN before 3pm ($25 fee) Guests trying to check in after 10pm will need to get pre-approval from host and pay the late check in fee of $25. There is no key box and no gu",6
Other values (38190),38483

Value,Count,Frequency (%),Unnamed: 3
"Private room in the heart of Little Italy with FREE PARKING :) You will have your own personal door code and may arrive at ANY time you would like and as late as you wish. You can store your luggage in the closet before check-in and after check-out times :) Price adapts to demand, my rates change constantly - every night is different! Put your dates in the calendar, with the number of guests, and what you see is what you'll pay :) The neighborhood is extremely safe even at night :) Restaurants and bars are right round the corner, both upscale and casual. This room is 6 minutes away by car or uber to Chicago's famous Willis (Sears) tower. Uber costs about $7. If you are arriving by car, I am providing free resident parking passes for anywhere in the area. (Yes, that means you can drive a few blocks and still park anywhere you want!) The room will be solely yours for the time you'll stay in the Windy City. Shall you have any concerns or need any tips from the local, I'll always be happy",7,0.0%,
"Hello, I've been running guest house for Koreans visiting U.S. for 3years, and recently decided to run this place for other travelers also. There are 10 room in the house. They are mostly dormitory rooms and couple of couple room and family room. This places are our women's dormitory in third floor. There are three rooms, but no doors. It is basically open space. There are 2 beds in two rooms and 4 in one room. I do not have closet in this room but there are hangers and mini shelves. My travelers usually put their baggage on the floor. There is one full bathroom only for women in 2nd floor, which you will be sharing with other women guests. Right next that bathroom, there is unisex half bathroom. All bathrooms have hair dryers. You cannot use kitchen, but you can use refrigerator. I offer breakfast every morning from 7-10 am. Bread, cereal, fruits, coffee, milk and juice will be served. You can eat take-out food in the kitchen, but please wash dishes that you used and put trash in the",6,0.0%,
"Welcome to RMH, a co-ed hostel vibe home for exploring travelers or working individuals needing temporary housing. (Guests access is from 3pm to 11am daily). NON-SMOKERS ONLY! Host & small dog live on property. NOTE: Guests' bedroom is dog free. GUARANTEED In Our Home: *You receive a clean home in a safe neighborhood, clean sheets, pillow, towel and covers. **You MUST Read House Rules BEFORE Booking :) ***IMPORTANT ***SMOKERS who book: reservation cancelled upon arrival + NO Refund. The bedroom is very large with enough space for everyone (6 guests) to have peace and quiet. The kitchen and bath are very spacious. Guests will have their own keys and have access to the bedroom in which they stay, kitchen, bathroom, living room and deck. OPTIONAL: (Normal Check IN is 3pm, Check OUT is 10am) If you would like an EARLY CHECK IN before 3pm ($25 fee) Guests trying to check in after 10pm will need to get pre-approval from host and pay the late check in fee of $25. There is no key box and no gu",6,0.0%,
"The Treat Street Clubhouse is a home you'll never forget. It's more than an Airbnb -- it's a collective of adventurers and unique individuals. We're creating a place that you can't wait to come back to. You'll have your own bunk bed in a shared room. This is a friendly atmosphere with many people and privacy will be a little sparse. With so many people, it may get rowdy at times. Make yourself at home here. There's a stocked kitchen, comfy loft, fridge, speakers, couches, sous vide machines, and the usual stuff. You'll be sharing the home with Duncan, a rugged cartographer, and Zain, a high-powered business tycoon -- both of whom have a proclivity for hyperbole. We also have a resident mascot: Roux, our loving, courageous, part-dingo pooch. We love going on adventures with guests and frequently organize events and outings. We're in the best part of the Mission with a rock climbing gym, incredible food, and the best coffee, all under a 5 minute walk. We're happy to show you the best loc",4,0.0%,
"Come stay in my converted townhome in the heart of Bedford-Stuyvesant. Close to great restaurants, coffeeshops and juice bars, a great place to lay your head after a day or night out on the town. You will have a private room on the middle floor with a full sized bed and clean sheets and towels. There are 4 bathrooms, one on each floor, a living room and a fully equipped kitchen to use. I live on site and will be available to help with anything that you may need or neighborhood suggestions. A great mix of the hip and gentrified new Brooklyn, and the old 'real' Brooklyn. Im just a stones throw form great Jamaican food, cold brew coffee and juice shops. Just a 7 min walk to the G or AC trains and just 30 minutes into downtown manhattan.",4,0.0%,
"本地方是个两层楼townhouse温馨之家,一楼我们自己住,二楼五个房间做出租房,有两个洗手间和浴室供您们使用,每个房间都配备双人床 被子枕头床头柜 台灯 书桌椅 电视机Wi-Fi 热水壶 毛巾牙刷 牙膏 拖鞋 餐巾纸 房间干净卫生。大门进出有密码锁 每个房间也都有密码锁 走廊有安装电眼探头,厨房可以煮简单的食物,可以说是麻雀虽小五脏俱全。 地方位于纽约布鲁克林八大道华人集聚地,附近各种餐馆林立,地区治安好交通发达,走路2分钟到小巴站25分钟车程到达曼哈顿唐人街,门口有大巴站坐两站到地铁站也可以走路8分钟到N地铁站或走12分钟到D或R地铁站。 如果您开车过来白天很容易找到车位,晚上8点后如果找不到车位,您可以把车停在我们旁边8大道meter车位上直到第二天早晨7点半后就有很多车位了,欢迎您的到来!",4,0.0%,
"Located at Laurel Hights close to USF, Golden Gate Park, Golden Gate Bridge and Shopping Center. We are steps from MUNI station and 15 minutes Bus ride or Lyft/Uber ride to downtown. The room has a queen sized bed with incredible comfortable mattress which is perfect for a restful night after a day in the city. There is free parking on the street. Free fast Wi-Fi.",4,0.0%,
"Spacious 2 story 4000 sqft home Enjoy fruit trees swimming pool fountain & gardens lWalking distance to mall, grocery store, movie theater & restaurants. Centrally located to Malibu, Santa Monica,Venice,Hollywood, Beverly Hills, West LA This spacious 2 story 4000 sqft home is fully furnished and beautifully decorated! Coffee and donuts are served every morning;) Please keep in mind that this is a shared space, and that your large room consists of 2 bunk beds, your own dresser, hamper, and closet space. You will have full access to the rest of the home. We provide a safe, clean, and positive environment, perfect for people relocating from out of state, students, and anyone else who needs a landing place while traveling. Near public transportation; ORANGELINE, 405, and 101 FREEWAYS. You will have full access to the rest of the home. We do have private and 2 person and or couples rooms available in the same property. Full kitchen privileges All Premium Cable and movie channels Full se",4,0.0%,
"Brand New 2017 Construction, Featuring 2 Bedrooms and 2 bathrooms . comfortable couch, coffee table, 50-inch TV in living room, with free Satellite Cable channel and internet Wi-Fi. Washer and Dryer in the building on each floor.This lovely Apartment is perfect for families, couples or four close friends! With over 900 square feet of living space.A short freeway ride to Westwood, Hollywood Bowl, Griffith Park, LA Zoo, and downtown venues such as Chinatown, LA Live, Staples Center and Olivera St. Enjoy the open entertainer floor plan featuring a gourmet kitchen with quartz counters w/natural mosaic stone and quartz back splash, Stainless Steel appliances, and designer cabinetry. All baths are tastefully done with tub/showers and sinks. Awesome living area, Enjoy those high ceilings! Terrific master suite with closet space. This is a ""smart home"" featuring an LED recessed lighting. Gated Garage is ready for two full sized cars all side by side. A cozy retreat for your stay in Los Angele",4,0.0%,
"HUGE loft in converted factory, with in building: -COFFEE SHOP -YOGA STUDIO -ROOF DECK -FOOD CO/OP -TATTOO PARLOR -VAPE SHOP -CLOTHING STORES -SHIPPING AND PRINTING STORE -Located in vibrant Bushwick Brooklyn within walking distance to some of the best street art in the US. -Just 3 blocks to subway with easy access to Manhattan, Williamsburg and JFK airport. At your disposal is a comfortable lounge and dining area with a kitchen, and 2 bathrooms, that is all cleaned daily. There is also a HUGE ROOF DECK, as well as mini mall downstairs containing a yoga studio, coffee shop, art gallery, skate shop, and clothing store. Right across the street are 2 of the biggest vintage shops in Brooklyn. Your room has a brand new super comfortable PILLOW TOP double bed, as well as your own air conditioner and space heater to combat extreme NYC temperatures. Free WIFI :) We pay for extra for the really fast internet so stream video at ease. We also provide professionally laundered fresh sheets and to",4,0.0%,

0,1
Distinct count,2448
Unique (%),6.4%
Missing (%),0.0%
Missing (n),0

0,1
2017-01-01,200
2017-09-04,187
2017-01-22,137
Other values (2445),37978

Value,Count,Frequency (%),Unnamed: 3
2017-01-01,200,0.5%,
2017-09-04,187,0.5%,
2017-01-22,137,0.4%,
2017-01-02,136,0.4%,
2017-04-16,123,0.3%,
2017-03-19,110,0.3%,
2017-04-09,109,0.3%,
2016-09-05,104,0.3%,
2016-01-03,103,0.3%,
2016-01-02,101,0.3%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
t,38445
f,57

Value,Count,Frequency (%),Unnamed: 3
t,38445,99.9%,
f,57,0.1%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
t,27921
f,10581

Value,Count,Frequency (%),Unnamed: 3
t,27921,72.5%,
f,10581,27.5%,

0,1
Distinct count,76
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
100%,31082
90%,1691
80%,749
Other values (73),4980

Value,Count,Frequency (%),Unnamed: 3
100%,31082,80.7%,
90%,1691,4.4%,
80%,749,1.9%,
70%,333,0.9%,
99%,273,0.7%,
50%,271,0.7%,
97%,238,0.6%,
96%,234,0.6%,
94%,232,0.6%,
98%,227,0.6%,

0,1
Distinct count,2981
Unique (%),7.7%
Missing (%),0.0%
Missing (n),0

0,1
2014-02-14,140
2015-03-30,89
2014-09-02,51
Other values (2978),38222

Value,Count,Frequency (%),Unnamed: 3
2014-02-14,140,0.4%,
2015-03-30,89,0.2%,
2014-09-02,51,0.1%,
2016-01-18,50,0.1%,
2016-08-23,50,0.1%,
2013-08-07,48,0.1%,
2015-03-05,47,0.1%,
2012-08-27,46,0.1%,
2014-07-29,46,0.1%,
2014-03-17,45,0.1%,

0,1
Distinct count,38502
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,11232000
Minimum,941
Maximum,21230903
Zeros (%),0.0%

0,1
Minimum,941
5-th percentile,825490
Q1,6233500
Median,12188000
Q3,16381000
95-th percentile,20027000
Maximum,21230903
Range,21229962
Interquartile range,10148000

0,1
Standard deviation,6084100
Coef of variation,0.54166
Kurtosis,-1.1373
Mean,11232000
MAD,5291000
Skewness,-0.25158
Sum,432468244021
Variance,37016000000000
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
18812926,1,0.0%,
21038410,1,0.0%,
2137823,1,0.0%,
11822406,1,0.0%,
9076037,1,0.0%,
15430980,1,0.0%,
2735657,1,0.0%,
15659073,1,0.0%,
13346110,1,0.0%,
12041533,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
941,1,0.0%,
2515,1,0.0%,
2864,1,0.0%,
3152,1,0.0%,
3662,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
21215451,1,0.0%,
21218973,1,0.0%,
21227461,1,0.0%,
21228356,1,0.0%,
21230903,1,0.0%,

0,1
Distinct count,38502
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,37093
Minimum,1
Maximum,74110
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,3828.1
Q1,18494.0
Median,36962.0
Q3,55696.0
95-th percentile,70411.0
Maximum,74110.0
Range,74109.0
Interquartile range,37202.0

0,1
Standard deviation,21406
Coef of variation,0.57709
Kurtosis,-1.2066
Mean,37093
MAD,18556
Skewness,0.0011925
Sum,1428168201
Variance,458230000
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
2047,1,0.0%,
11535,1,0.0%,
72200,1,0.0%,
5384,1,0.0%,
27911,1,0.0%,
25862,1,0.0%,
32005,1,0.0%,
17666,1,0.0%,
71603,1,0.0%,
36091,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.0%,
2,1,0.0%,
5,1,0.0%,
7,1,0.0%,
8,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
74102,1,0.0%,
74103,1,0.0%,
74107,1,0.0%,
74108,1,0.0%,
74110,1,0.0%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
f,26648
t,11854

Value,Count,Frequency (%),Unnamed: 3
f,26648,69.2%,
t,11854,30.8%,

0,1
Distinct count,984
Unique (%),2.6%
Missing (%),0.0%
Missing (n),0

0,1
2017-09-24,1230
2017-09-17,1151
2017-04-30,945
Other values (981),35176

Value,Count,Frequency (%),Unnamed: 3
2017-09-24,1230,3.2%,
2017-09-17,1151,3.0%,
2017-04-30,945,2.5%,
2017-09-18,789,2.0%,
2017-09-25,774,2.0%,
2017-04-23,714,1.9%,
2017-10-01,691,1.8%,
2017-09-16,650,1.7%,
2017-09-28,624,1.6%,
2017-09-04,622,1.6%,

0,1
Distinct count,38502
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,38.759
Minimum,33.706
Maximum,42.39
Zeros (%),0.0%

0,1
Minimum,33.706
5-th percentile,33.995
Q1,34.185
Median,40.683
Q3,40.76
95-th percentile,42.314
Maximum,42.39
Range,8.6846
Interquartile range,6.5753

0,1
Standard deviation,3.0077
Coef of variation,0.0776
Kurtosis,-1.1491
Mean,38.759
MAD,2.6475
Skewness,-0.70651
Sum,1492300
Variance,9.0462
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
41.81833221096686,1,0.0%,
40.73415223894808,1,0.0%,
41.88249523785852,1,0.0%,
40.774624455847146,1,0.0%,
40.73411114434017,1,0.0%,
40.819494752240246,1,0.0%,
40.69602112695013,1,0.0%,
34.1008177118698,1,0.0%,
40.637253425982536,1,0.0%,
42.34402116983361,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
33.70583529438538,1,0.0%,
33.70832107320805,1,0.0%,
33.708497599094855,1,0.0%,
33.7086020517297,1,0.0%,
33.7096642361058,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
42.3884156457878,1,0.0%,
42.388483496611904,1,0.0%,
42.38977244945219,1,0.0%,
42.390247544263616,1,0.0%,
42.39043717872241,1,0.0%,

0,1
Distinct count,608
Unique (%),1.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,4.7455
Minimum,0
Maximum,7.6004
Zeros (%),0.0%

0,1
Minimum,0.0
5-th percentile,3.7612
Q1,4.3041
Median,4.7005
Q3,5.1648
95-th percentile,5.8579
Maximum,7.6004
Range,7.6004
Interquartile range,0.86072

0,1
Standard deviation,0.65802
Coef of variation,0.13866
Kurtosis,0.58232
Mean,4.7455
MAD,0.51946
Skewness,0.37997
Sum,182710
Variance,0.43298
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
5.0106352940962555,1278,3.3%,
4.605170185988092,1196,3.1%,
4.31748811353631,1042,2.7%,
4.59511985013459,954,2.5%,
4.174387269895637,919,2.4%,
4.0943445622221,908,2.4%,
4.8283137373023015,900,2.3%,
3.912023005428147,874,2.3%,
5.298317366548036,834,2.2%,
4.248495242049359,825,2.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,1,0.0%,
1.6094379124341005,1,0.0%,
2.302585092994046,11,0.0%,
2.4849066497880004,1,0.0%,
2.6390573296152584,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
7.506591780070841,1,0.0%,
7.543802867501509,1,0.0%,
7.575584651557793,3,0.0%,
7.598399329323964,1,0.0%,
7.6004023345004,1,0.0%,

0,1
Distinct count,38502
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-90.895
Minimum,-122.51
Maximum,-71
Zeros (%),0.0%

0,1
Minimum,-122.51
5-th percentile,-122.43
Q1,-118.33
Median,-74.002
Q3,-73.949
95-th percentile,-71.115
Maximum,-71.0
Range,51.511
Interquartile range,44.377

0,1
Standard deviation,21.447
Coef of variation,-0.23595
Kurtosis,-1.6244
Mean,-90.895
MAD,20.255
Skewness,-0.55152
Sum,-3499600
Variance,459.96
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
-77.02341087104014,1,0.0%,
-73.91283087682852,1,0.0%,
-118.35599908694776,1,0.0%,
-73.87587054211056,1,0.0%,
-73.91239933468918,1,0.0%,
-87.67186359339479,1,0.0%,
-73.93490943886229,1,0.0%,
-118.26430778963156,1,0.0%,
-118.17678950244192,1,0.0%,
-73.89862958770388,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-122.51149998987214,1,0.0%,
-122.50963522313911,1,0.0%,
-122.50936476530724,1,0.0%,
-122.50933564108672,1,0.0%,
-122.50913745067112,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-71.00618210560536,1,0.0%,
-71.00574428525941,1,0.0%,
-71.00461976225016,1,0.0%,
-71.0021227970042,1,0.0%,
-71.00046158748074,1,0.0%,

0,1
Distinct count,38290
Unique (%),99.4%
Missing (%),0.0%
Missing (n),0

0,1
"Location, Location, Location",5
Spacious Private Room in Brooklyn,5
Bunk bed in the Treat Street Clubhouse,5
Other values (38287),38487

Value,Count,Frequency (%),Unnamed: 3
"Location, Location, Location",5,0.0%,
Spacious Private Room in Brooklyn,5,0.0%,
Bunk bed in the Treat Street Clubhouse,5,0.0%,
SHARED ROOM in VENICE BEACH HOSTEL,4,0.0%,
Brooklyn Oasis,4,0.0%,
Your home away from home,4,0.0%,
Make的小屋（地理位置好，交通方便，洛杉矶市中心，提供机场名牌店景点等接送，包车游玩等服务）,4,0.0%,
Kanmore Guest House,4,0.0%,
WOMEN SHARED ROOM in VENICE HOSTEL,3,0.0%,
East Village Studio,3,0.0%,

0,1
Distinct count,586
Unique (%),1.5%
Missing (%),0.0%
Missing (n),0

0,1
Williamsburg,1523
Bedford-Stuyvesant,1337
Bushwick,867
Other values (583),34775

Value,Count,Frequency (%),Unnamed: 3
Williamsburg,1523,4.0%,
Bedford-Stuyvesant,1337,3.5%,
Bushwick,867,2.3%,
Mid-Wilshire,846,2.2%,
Harlem,813,2.1%,
Hell's Kitchen,807,2.1%,
Hollywood,766,2.0%,
Upper West Side,663,1.7%,
Venice,660,1.7%,
Upper East Side,653,1.7%,

0,1
Distinct count,365
Unique (%),0.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,32.975
Minimum,1
Maximum,542
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,1
Q1,5
Median,16
Q3,42
95-th percentile,125
Maximum,542
Range,541
Interquartile range,37

0,1
Standard deviation,45.42
Coef of variation,1.3774
Kurtosis,13.091
Mean,32.975
MAD,30.664
Skewness,2.9589
Sum,1269604
Variance,2063
Memory size,300.9 KiB

Value,Count,Frequency (%),Unnamed: 3
1,2748,7.1%,
2,2349,6.1%,
3,2021,5.2%,
4,1670,4.3%,
5,1520,3.9%,
6,1249,3.2%,
7,1171,3.0%,
8,1084,2.8%,
9,962,2.5%,
10,868,2.3%,

Value,Count,Frequency (%),Unnamed: 3
1,2748,7.1%,
2,2349,6.1%,
3,2021,5.2%,
4,1670,4.3%,
5,1520,3.9%,

Value,Count,Frequency (%),Unnamed: 3
505,1,0.0%,
525,1,0.0%,
530,1,0.0%,
532,1,0.0%,
542,1,0.0%,

0,1
Distinct count,31
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
Apartment,24752
House,8903
Condominium,1421
Other values (28),3426

Value,Count,Frequency (%),Unnamed: 3
Apartment,24752,64.3%,
House,8903,23.1%,
Condominium,1421,3.7%,
Townhouse,940,2.4%,
Loft,719,1.9%,
Guesthouse,335,0.9%,
Other,331,0.9%,
Bed & Breakfast,286,0.7%,
Bungalow,216,0.6%,
Guest suite,97,0.3%,

**review_scores_rating**

Statistic,Value
Distinct count,52
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

Statistic,Value
Mean,94.332
Minimum,20
Maximum,100
Zeros (%),0.0%

Statistic,Value
Minimum,20
5th percentile,81
Q1,92
Median,96
Q3,99
95th percentile,100
Maximum,100
Range,80
Interquartile range,7

Statistic,Value
Standard deviation,6.8767
Coef of variation,0.072899
Kurtosis,21.814
Mean,94.332
MAD,4.6731
Skewness,-3.3247
Sum,3632000
Variance,47.289
Memory size,300.9 KiB

Value,Count,Frequency (%)
100.0,9261,24.1%
98.0,3373,8.8%
97.0,3092,8.0%
96.0,3057,7.9%
95.0,2707,7.0%
93.0,2494,6.5%
99.0,2131,5.5%
94.0,2009,5.2%
90.0,1691,4.4%
92.0,1550,4.0%

Minimum 5 values:
Value,Count,Frequency (%)
20.0,42,0.1%
27.0,1,0.0%
35.0,1,0.0%
40.0,27,0.1%
47.0,5,0.0%

Maximum 5 values:
Value,Count,Frequency (%)
96.0,3057,7.9%
97.0,3092,8.0%
98.0,3373,8.8%
99.0,2131,5.5%
100.0,9261,24.1%

**room_type**

Statistic,Value
Distinct count,3
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

Value,Count
Entire home/apt,21495
Private room,16045
Shared room,962

Value,Count,Frequency (%)
Entire home/apt,21495,55.8%
Private room,16045,41.7%
Shared room,962,2.5%

**thumbnail_url**

Statistic,Value
Distinct count,38496
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0

Value,Count
https://a0.muscache.com/im/pictures/70087089/bc66229a_original.jpg?aki_policy=small,2
https://a0.muscache.com/im/pictures/28563531/1000de61_original.jpg?aki_policy=small,2
https://a0.muscache.com/im/pictures/104667326/a7a2b145_original.jpg?aki_policy=small,2
Other values (38493),38496

Value,Count,Frequency (%)
https://a0.muscache.com/im/pictures/70087089/bc66229a_original.jpg?aki_policy=small,2,0.0%
https://a0.muscache.com/im/pictures/28563531/1000de61_original.jpg?aki_policy=small,2,0.0%
https://a0.muscache.com/im/pictures/104667326/a7a2b145_original.jpg?aki_policy=small,2,0.0%
https://a0.muscache.com/im/pictures/109405834/9a555e66_original.jpg?aki_policy=small,2,0.0%
https://a0.muscache.com/im/pictures/61042471/5543b0e0_original.jpg?aki_policy=small,2,0.0%
https://a0.muscache.com/im/pictures/23033013/54d62516_original.jpg?aki_policy=small,2,0.0%
https://a0.muscache.com/im/pictures/5c35cb1d-fd25-47ec-a09d-1ec39a2284c7.jpg?aki_policy=small,1,0.0%
https://a0.muscache.com/im/pictures/6023b05a-a8c5-4951-93c5-280bd604a69a.jpg?aki_policy=small,1,0.0%
https://a0.muscache.com/im/pictures/0cce1a16-eb99-4773-9aec-53eaad0b9d36.jpg?aki_policy=small,1,0.0%
https://a0.muscache.com/im/pictures/63048380/7227aae8_original.jpg?aki_policy=small,1,0.0%
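A few thumbnail URLs appear twice, which is why the distinct count (38496) is slightly below the number of rows. Two listings that share one thumbnail might be the same property listed twice, or a host reusing a photo; the counts alone cannot tell which. The sketch below lists every row involved so the cases can be inspected by hand. It uses `DataFrame.duplicated` with `keep=False`, which flags *all* copies rather than only the later ones. The toy frame is made up and merely stands in for `df`.

```python
import pandas as pd

# Toy rows standing in for df (only the columns used here)
toy = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "name": ["Room A", "Room A (copy)", "Loft B", "Studio C"],
    "thumbnail_url": ["u1.jpg", "u1.jpg", "u2.jpg", "u3.jpg"],
})

# keep=False marks every row whose URL occurs more than once, not just the repeats
shared = toy[toy.duplicated(subset="thumbnail_url", keep=False)]
print(shared.sort_values("thumbnail_url"))
```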

**zipcode**

Statistic,Value
Distinct count,653
Unique (%),1.7%
Missing (%),0.0%
Missing (n),0

Value,Count
11221,727
11211.0,713
90291,618
Other values (650),36444

Value,Count,Frequency (%)
11221,727,1.9%
11211.0,713,1.9%
90291,618,1.6%
94110,576,1.5%
20002,464,1.2%
10019,442,1.1%
90046,427,1.1%
10002,413,1.1%
10009.0,404,1.0%
10036,401,1.0%
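The `zipcode` values are stored inconsistently. Some appear as plain text (`11221`), while others carry a float suffix (`11211.0`, `10009.0`). If the column really holds both forms, a single ZIP code could be split across two labels such as `11211` and `11211.0`, which would undercount it. The sketch below normalizes values to 5-digit strings. `normalize_zipcode` is a hypothetical helper, and the ZIP+4 and leading-zero handling are assumptions about what the raw column may contain.

```python
from typing import Optional

import pandas as pd

def normalize_zipcode(z) -> Optional[str]:
    """Hypothetical clean-up: map '11211.0', 11211.0 or ' 11211 ' to '11211'."""
    if pd.isna(z):
        return None
    text = str(z).strip()
    if text.endswith(".0"):        # artefact of the value having been parsed as a float
        text = text[:-2]
    digits = text.split("-")[0]    # keep only the 5-digit part of a ZIP+4 code
    # restore leading zeros lost in numeric parsing (e.g. 2138 -> 02138); reject junk
    return digits.zfill(5) if digits.isdigit() else None

zips = pd.Series(["11221", "11211.0", 11211.0, "10009.0", "94110-1234", "n/a"])
print(zips.map(normalize_zipcode).value_counts())
```

On the real data this would be applied as `df['zipcode'] = df['zipcode'].map(normalize_zipcode)` before counting or grouping by ZIP code.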

Unnamed: 0,id,log_price,property_type,room_type,amenities,accommodates,bathrooms,bed_type,cancellation_policy,cleaning_fee,city,description,first_review,host_has_profile_pic,host_identity_verified,host_response_rate,host_since,instant_bookable,last_review,latitude,longitude,name,neighbourhood,number_of_reviews,review_scores_rating,thumbnail_url,zipcode,bedrooms,beds
1,6304928,5.129899,Apartment,Entire home/apt,"{""Wireless Internet"",""Air conditioning"",Kitche...",7,1.0,Real Bed,strict,True,NYC,Enjoy travelling during your stay in Manhattan...,2017-08-05,t,f,100%,2017-06-19,t,2017-09-23,40.766115,-73.98904,Superb 3BR Apt Located Near Times Square,Hell's Kitchen,6,93.0,https://a0.muscache.com/im/pictures/348a55fe-4...,10019,3.0,3.0
2,7919400,4.976734,Apartment,Entire home/apt,"{TV,""Cable TV"",""Wireless Internet"",""Air condit...",5,1.0,Real Bed,moderate,True,NYC,The Oasis comes complete with a full backyard ...,2017-04-30,t,t,100%,2016-10-25,t,2017-09-14,40.80811,-73.943756,The Garden Oasis,Harlem,10,92.0,https://a0.muscache.com/im/pictures/6fae5362-9...,10027,1.0,3.0
5,12422935,4.442651,Apartment,Private room,"{TV,""Wireless Internet"",Heating,""Smoke detecto...",2,1.0,Real Bed,strict,True,SF,Beautiful private room overlooking scenic view...,2017-08-27,t,t,100%,2017-06-07,t,2017-09-05,37.753164,-122.429526,Comfort Suite San Francisco,Noe Valley,3,100.0,https://a0.muscache.com/im/pictures/82509143-4...,94131,1.0,1.0
7,13971273,4.787492,Condominium,Entire home/apt,"{TV,""Cable TV"",""Wireless Internet"",""Wheelchair...",2,1.0,Real Bed,moderate,True,LA,Arguably the best location (and safest) in dow...,2016-12-16,t,t,100%,2013-05-18,f,2017-04-12,34.046737,-118.260439,"Near LA Live, Staple's. Starbucks inside. OWN ...",Downtown,9,93.0,https://a0.muscache.com/im/pictures/61bd05d5-c...,90015,1.0,1.0
8,180792,4.787492,House,Private room,"{TV,""Cable TV"",""Wireless Internet"",""Pets live ...",2,1.0,Real Bed,moderate,True,SF,Garden Studio with private entrance from the s...,2016-02-13,t,f,100%,2015-06-04,f,2017-09-24,37.781128,-122.501095,Cozy Garden Studio - Private Entry,Richmond District,159,99.0,https://a0.muscache.com/im/pictures/0ed6c128-7...,94121,1.0,1.0


In [20]:
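# Using seaborn to plot the correlation matrix
# df is the dataframe after the filtering above
import seaborn as sns

# Keep only numeric and boolean columns: text columns (city, name, ...) cannot be correlated
corre = df.select_dtypes(include=['number', 'bool']).corr(method='spearman')
sns.heatmap(corre, xticklabels=corre.columns.values, yticklabels=corre.columns.values)
plt.show()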

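The heatmap uses Spearman's rank correlation. The small toy frame below (made-up values) illustrates two points. First, text columns must be removed before calling `DataFrame.corr`; recent pandas versions raise an error on them instead of silently skipping them. Second, Spearman's rho is simply Pearson's r computed on the ranks of each column (`DataFrame.rank`, which assigns average ranks to ties). This makes it less sensitive than Pearson to the heavy right tail seen in `number_of_reviews`.

```python
import numpy as np
import pandas as pd

# Toy frame with one text column, mimicking the mixed dtypes of df (values are made up)
toy = pd.DataFrame({
    "accommodates": [1, 2, 4, 6, 8],
    "number_of_reviews": [84, 10, 6, 3, 9],
    "log_price": [3.7, 4.4, 4.8, 5.0, 5.1],
    "city": ["NYC", "SF", "LA", "NYC", "SF"],
})

numeric = toy.select_dtypes(include="number")  # drop text columns before correlating
spearman = numeric.corr(method="spearman")

# Spearman's rho equals Pearson's r computed on the ranks of each column
by_ranks = numeric.rank().corr(method="pearson")
print(spearman.round(3))
print(np.allclose(spearman, by_ranks))
```

Here `log_price` increases monotonically with `accommodates`, so their Spearman correlation is exactly 1 even though the relationship is not linear.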

## Conclusion of the exploratory analysis