# Prerequisite

Install the pandas library by running the following cell.

In [27]:
!pip install pandas



# Exploratory data analysis (EDA)

You will learn how to approach an unknown dataset systematically while keeping a creative, open mind as you search for insights.

## Context
Airbnb is an online marketplace for people to rent places to stay. 

Airbnb has rolled out a new service to help listers set prices. Airbnb earns a percentage commission on the listings, so it is incentivized to help listers price optimally, that is, at the highest price at which they will still close a deal. You are an Airbnb consultant helping with this new pricing service.

## Goal

We will focus on one question: which features help determine an appropriate listing price?

## Load Data

In [28]:
import pandas as pd

In [29]:
listings = pd.read_csv('data/airbnb_nyc.csv', delimiter=',')

Please check out the data dictionary [here](https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit#gid=982310896) for a description of each column.
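If you want `id` as the index from the start, `read_csv` can do this at load time through its `index_col` parameter. The sketch below swaps `data/airbnb_nyc.csv` for a tiny hypothetical CSV held in memory (via `io.StringIO`) so it runs without the real file. The rows and prices are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/airbnb_nyc.csv (values are made up).
csv_text = """id,name,price,self_check_in
2539,Clean & quiet apt,149,1
3647,Harlem room,150,-1
"""

# index_col="id" turns the id column into the index while loading,
# so no separate set_index("id") call is needed afterwards.
sample = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(sample)
```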

## Activities

**Q:** Can you view/print the data?

In [8]:
listings

| | id | name | summary | description | experiences_offered | neighborhood_overview | transit | house_rules | host_id | host_since | ... | hot_tub_sauna_or_pool | internet | long_term_stays | pets_allowed | private_entrance | secure | self_check_in | smoking_allowed | accessible | event_suitable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | Renovated apt home in elevator building. | Renovated apt home in elevator building. Spaci... | none | Close to Prospect Park and Historic Ditmas Park | Very close to F and G trains and Express bus i... | -The security and comfort of all our guests is... | 2787 | 39698.0 | ... | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | 1 | 1 |
| 1 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | | WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY T... | none | | | Upon arrival please have a legibile copy of yo... | 4632 | 39777.0 | ... | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 2 | 7750 | Huge 2 BR Upper East Cental Park | | Large Furnished 2BR one block to Central Park... | none | | | | 17985 | 39953.0 | ... | -1 | 1 | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 3 | 8505 | Sunny Bedroom Across Prospect Park | Just renovated sun drenched bedroom in a quiet... | Just renovated sun drenched bedroom in a quiet... | none | Quiet and beautiful Windsor Terrace. The apart... | Ten minutes walk to the 15th sheet F&G train s... | - No shoes in the house - Quiet hours after 11... | 25326 | 40006.0 | ... | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 4 | 8700 | Magnifique Suite au N de Manhattan - vue Cloitres | Suite de 20 m2 a 5 min des 2 lignes de metro a... | Suite de 20 m2 a 5 min des 2 lignes de metro a... | none | | Metro 1 et A | | 26394 | 40014.0 | ... | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30174 | 36484363 | QUIT PRIVATE HOUSE | THE PUBLIC TRANSPORTATION: THE TRAIN STATION I... | THE PUBLIC TRANSPORTATION: THE TRAIN STATION I... | none | QUIT QUIT QUIT !!!!!! | TRAIN STATION 5 MINUTE UBER OR 15 MINUTE WALK ... | Guest should not wear shoes, no smoking mariju... | 107716952 | 42722.0 | ... | -1 | 1 | -1 | -1 | -1 | -1 | 1 | -1 | -1 | 1 |
| 30175 | 36484665 | Charming one bedroom - newly renovated rowhouse | This one bedroom in a large, newly renovated r... | This one bedroom in a large, newly renovated r... | none | There's an endless number of new restaurants, ... | We are three blocks from the G subway and abou... | | 8232441 | 41504.0 | ... | -1 | 1 | -1 | -1 | 1 | -1 | 1 | -1 | -1 | -1 |
| 30176 | 36485057 | Affordable room in Bushwick/East Williamsburg | | | none | | | | 6570630 | 41419.0 | ... | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 30177 | 36485609 | 43rd St. Time Square-cozy single bed | | | none | | | | 30985759 | 42104.0 | ... | -1 | 1 | -1 | -1 | -1 | -1 | 1 | -1 | -1 | -1 |
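The `...` column in the output above comes from pandas hiding columns to keep the display narrow. Here is a minimal sketch of widening the display on a hypothetical 30-column frame. Setting `pd.set_option("display.max_columns", None)` removes the column limit, and `head()` / `tail()` show only the first or last rows.

```python
import pandas as pd

# Hypothetical wide frame: 3 rows x 30 columns named c0..c29.
wide = pd.DataFrame(
    [[i * 30 + j for j in range(30)] for i in range(3)],
    columns=[f"c{j}" for j in range(30)],
)

# None means "no limit": every column is printed instead of an ellipsis.
pd.set_option("display.max_columns", None)
print(wide.head(2))  # first two rows, all 30 columns

# Restore the default once the wide view is no longer needed.
pd.reset_option("display.max_columns")
```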


**Q:** How large is this data?

In [9]:
listings.shape

(30179, 81)

**Q:** What is the type of each variable? Use the ```info()``` method.

In [10]:
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30179 entries, 0 to 30178
Data columns (total 81 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            30179 non-null  int64  
 1   name                                          30166 non-null  object 
 2   summary                                       28961 non-null  object 
 3   description                                   29575 non-null  object 
 4   experiences_offered                           30179 non-null  object 
 5   neighborhood_overview                         18113 non-null  object 
 6   transit                                       18190 non-null  object 
 7   house_rules                                   16623 non-null  object 
 8   host_id                                       30179 non-null  int64  
 9   host_since                                    30170 non-null 
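`info()` shows a mix of numeric columns (`int64`, `float64`) and text columns (`object`). When you are hunting for price predictors, it can help to separate the two, and `select_dtypes` does exactly that. The toy frame below is hypothetical and only borrows a few column names from the listing data.

```python
import pandas as pd

# Hypothetical mini-listing table with a mix of dtypes.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "accommodates": [2, 4, 1],
    "review_scores_rating": [95.0, None, 88.0],
})

# Numeric columns (ints and floats) versus everything else (text here).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)  # ['accommodates', 'review_scores_rating']
print(text_cols)     # ['name', 'room_type']
```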

**Q:** Can you list all column names?

In [12]:
listings.columns

Index(['id', 'name', 'summary', 'description', 'experiences_offered',
       'neighborhood_overview', 'transit', 'house_rules', 'host_id',
       'host_since', 'host_response_time', 'host_response_rate',
       'host_is_superhost', 'host_listings_count', 'host_identity_verified',
       'street', 'neighbourhood', 'latitude', 'longitude', 'property_type',
       'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds',
       'bed_type', 'amenities', 'price', 'guests_included', 'extra_people',
       'minimum_nights', 'calendar_updated', 'has_availability',
       'availability_30', 'availability_60', 'availability_90',
       'availability_365', 'number_of_reviews', 'number_of_reviews_ltm',
       'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'instant_bookable', 'cancellation_policy',
       'calculated_host_listings_count',
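Picking related columns out of 81 by hand is tedious. `DataFrame.filter(like=...)` keeps the columns whose names contain a given substring, such as all the `review_scores_*` columns. A plain list comprehension works too when you need a prefix match. Both approaches are shown below on a hypothetical empty frame that reuses some column names from the listing data.

```python
import pandas as pd

# Hypothetical empty frame reusing a few column names from the listing data.
toy = pd.DataFrame(columns=[
    "host_id", "price", "review_scores_rating",
    "review_scores_accuracy", "review_scores_value", "number_of_reviews",
])

# Keep only the columns whose name contains "review_scores".
review_cols = toy.filter(like="review_scores").columns.tolist()

# Prefix match with a list comprehension.
host_cols = [c for c in toy.columns if c.startswith("host_")]

print(review_cols)
print(host_cols)
```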


**Q:** Can you print the columns ```number_of_reviews```, ```number_of_reviews_ltm```, ```review_scores_rating```, and ```review_scores_accuracy``` for rows 10000 to 10020?

In [14]:
listings.loc[10000:10020, ['number_of_reviews', 'number_of_reviews_ltm', 'review_scores_rating', 'review_scores_accuracy']]

| | number_of_reviews | number_of_reviews_ltm | review_scores_rating | review_scores_accuracy |
|---|---|---|---|---|
| 10000 | 10 | 0 | 100.0 | 10.0 |
| 10001 | 2 | 0 | 80.0 | 4.0 |
| 10002 | 3 | 1 | 100.0 | 10.0 |
| 10003 | 18 | 3 | 96.0 | 10.0 |
| 10004 | 0 | 0 | | |
| 10005 | 11 | 0 | 96.0 | 10.0 |
| 10006 | 12 | 2 | 95.0 | 10.0 |
| 10007 | 5 | 1 | 96.0 | 10.0 |
| 10008 | 13 | 0 | 98.0 | 10.0 |
| 10009 | 7 | 2 | 89.0 | 8.0 |
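Note that `loc` slices by index *label* and includes both endpoints, so `loc[10000:10020]` selects 21 rows (only the first ten appear above). `iloc` slices by *position* and excludes the end, just like a plain Python slice. A small hypothetical frame makes the difference concrete:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..9.
df = pd.DataFrame({"number_of_reviews": range(10)})

by_label = df.loc[2:5]      # labels 2, 3, 4, 5: end label included
by_position = df.iloc[2:5]  # positions 2, 3, 4: end position excluded
print(len(by_label), len(by_position))  # 4 3
```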


**Q:** Can you filter the listings that have more than 10 reviews (```number_of_reviews``` > 10) AND a rating below 50 (```review_scores_rating``` < 50)?

In [16]:
listings[(listings['number_of_reviews'] > 10) & (listings['review_scores_rating'] < 50)]

| | id | name | summary | description | experiences_offered | neighborhood_overview | transit | house_rules | host_id | host_since | ... | hot_tub_sauna_or_pool | internet | long_term_stays | pets_allowed | private_entrance | secure | self_check_in | smoking_allowed | accessible | event_suitable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
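The mask above has two details worth knowing. First, each condition needs its own parentheses, because in Python `&` binds more tightly than `>` and `<`. Second, any comparison with `NaN` evaluates to `False`, so listings without a rating are never selected. `DataFrame.query` can express the same filter as a string. The sketch below uses a hypothetical frame to check that both forms return the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; the NaN row mimics a listing with no rating yet.
df = pd.DataFrame({
    "number_of_reviews": [5, 12, 30, 15],
    "review_scores_rating": [40.0, 45.0, 90.0, np.nan],
})

# Boolean mask: parentheses are required around each comparison.
mask_result = df[(df["number_of_reviews"] > 10) & (df["review_scores_rating"] < 50)]

# The same filter with query(); "and" is allowed inside the query string.
query_result = df.query("number_of_reviews > 10 and review_scores_rating < 50")

print(mask_result)
print(len(query_result))  # 1
```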


**Q:** Can you set the index to ```listings['id']```?

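In [31]:
listings = listings.set_index('id')

KeyError: "None of ['id'] are in the columns"

This `KeyError` appears when the cell is run a second time. The first run already moved `id` into the index, so `id` is no longer a column. Reload the data before re-running this cell.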
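One way to make this step safe to re-run is to call `set_index` only while `id` is still a regular column. Below is a minimal sketch on a made-up frame. `ensure_id_index` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Hypothetical frame standing in for listings.
listings_demo = pd.DataFrame({"id": [2539, 3647], "name": ["A", "B"]})


def ensure_id_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: move 'id' into the index only if it is still a column."""
    if "id" in df.columns:
        return df.set_index("id")
    return df


listings_demo = ensure_id_index(listings_demo)
listings_demo = ensure_id_index(listings_demo)  # second call is a no-op, no KeyError
print(listings_demo.index.name)  # id
```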

**Q:** Can you tabulate the ```self_check_in``` feature?

Hint: use ```value_counts()``` method.

In [32]:
listings.self_check_in.value_counts()

-1    24878
 1     5301
Name: self_check_in, dtype: int64
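`value_counts(normalize=True)` returns shares instead of raw counts, which makes features easier to compare. From the counts above, about 17.6% of listings (5301 / 30179) have `self_check_in` equal to 1. Here is a sketch on a hypothetical series that uses the same -1/1 coding:

```python
import pandas as pd

# Hypothetical column using the same -1/1 coding as self_check_in.
self_check_in = pd.Series([-1, -1, -1, 1], name="self_check_in")

counts = self_check_in.value_counts()                # raw counts per value
shares = self_check_in.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
print(shares)
```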

**Q:** Which features have NaN values?

Hint: you can use the ```any()``` method :)

In [33]:
# any() must be called (note the parentheses) to get one True/False per column;
# without them, pandas returns the bound method itself instead of a result.
listings.isna().any()
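`isna().any()` returns one True/False per column. To answer the question directly, you can use that Boolean Series to select column names, and `isna().sum()` shows how many values are missing in each column. Shown on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'summary' and 'transit' contain missing values.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "summary": ["nice", np.nan, np.nan],
    "transit": [np.nan, "F train", "G train"],
    "host_id": [1, 2, 3],
})

has_nan = df.isna().any()                   # True/False per column
nan_columns = df.columns[has_nan].tolist()  # names of columns with any NaN
missing_counts = df.isna().sum()            # number of NaNs per column

print(nan_columns)  # ['summary', 'transit']
print(missing_counts[missing_counts > 0])
```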

**Q:** List your insights/takeaways from exploring this data.

Feel free to explore further! Bonus points for adding more questions :)
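A rough starting point for the pricing question is to rank numeric features by their correlation with `price`. This is only a linear screen, not a pricing model. The sketch allows for `price` being stored as text such as `"$150.00"`, which would need converting to a number first. Whether that is true for this file is an assumption you can check with `listings['price'].dtype`. All data below is hypothetical.

```python
import pandas as pd

# Hypothetical listings; price deliberately stored as text with "$" and ",".
toy = pd.DataFrame({
    "price": ["$100.00", "$150.00", "$1,200.00", "$80.00"],
    "accommodates": [2, 3, 10, 1],
    "bedrooms": [1, 1, 4, 1],
    "number_of_reviews": [10, 3, 0, 25],
})

# Strip "$" and "," and parse; also works if price is already numeric.
toy["price"] = pd.to_numeric(
    toy["price"].astype(str).str.replace(r"[$,]", "", regex=True)
)

# Pearson correlation of every other numeric column with price,
# sorted by absolute strength (strongest first).
numeric = toy.select_dtypes(include="number")
price_corr = numeric.corr()["price"].drop("price")
print(price_corr.sort_values(key=abs, ascending=False))
```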

## References

"New York", Inside Airbnb, http://insideairbnb.com/get-the-data.html

