## Predicting Airbnb Listing Price | Cleaning

The data had been collected and combined into two files, `los-angeles_listings.csv` and `los-angeles_reviews.csv`, but it still had to be cleaned.

Major cleaning considerations included:
- size of data
- data types
- redundancy in features
- data entry errors (e.g., capitalization and spelling)
- null values
- combining listings and reviews into a single dataframe

This notebook covers the data cleaning process for this project, in which I altered the data significantly: I dropped redundant or unnecessary columns, optimized datatypes, one-hot encoded categorical variables, and processed the text data.

---

In [1]:
import os
import shutil
import warnings
import numpy as np
import pandas as pd
import multiprocessing as mp
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
pd.set_option('display.max_columns', None)
warnings.filterwarnings("ignore")
sns.set_style('white')
%matplotlib inline

In [3]:
directory = '/Users/limesncoconuts2/datasets/airbnb/'
city = 'los-angeles'
listing = pd.read_csv(directory + city + '_listings.csv')
review = pd.read_csv(directory + city + '_reviews.csv')

I started by getting a sense of the size of both dataframes. `listing` has about 750 thousand rows and 106 features, while `review` has over 2 million rows but only 6 features. All the data is of integer, float, or object datatype. `listing` takes up much more memory than `review` (607.2+ MB versus 95.9+ MB) despite having far fewer rows.

In [4]:
listing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 750771 entries, 0 to 750770
Columns: 106 entries, access to zipcode
dtypes: float64(21), object(85)
memory usage: 607.2+ MB


In [5]:
review.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2095642 entries, 0 to 2095641
Data columns (total 6 columns):
listing_id       int64
id               int64
date             object
reviewer_id      int64
reviewer_name    object
comments         object
dtypes: int64(3), object(3)
memory usage: 95.9+ MB
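
The gap between row count and memory footprint comes from the number of columns and their dtypes: `listing` holds 85 object columns, and the `+` in the reported memory usage means pandas did not count the strings those columns point to. A minimal sketch of measuring the full footprint on small stand-in frames (the helper name `memory_mb` and the toy data are illustrative, not part of this notebook):

```python
import pandas as pd

def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MB, including the contents of object columns."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2

# Toy stand-ins: a wide frame of strings vs. a tall frame of integers.
wide = pd.DataFrame({f"col_{i}": ["some text value"] * 100 for i in range(50)})
tall = pd.DataFrame({"listing_id": range(1000)})

shallow = wide.memory_usage(deep=False).sum()  # pointers only
deep = wide.memory_usage(deep=True).sum()      # pointers plus string contents
print(memory_mb(wide), memory_mb(tall))
```

On the real data, `listing.info(memory_usage='deep')` should report the same deep figure directly, at the cost of a slower call.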


In [6]:
listing.head(1)

Unnamed: 0,access,accommodates,amenities,availability_30,availability_365,availability_60,availability_90,bathrooms,bed_type,bedrooms,beds,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,calendar_last_scraped,calendar_updated,cancellation_policy,city,cleaning_fee,country,country_code,description,experiences_offered,extra_people,first_review,guests_included,has_availability,host_about,host_acceptance_rate,host_has_profile_pic,host_id,host_identity_verified,host_is_superhost,host_listings_count,host_location,host_name,host_neighbourhood,host_picture_url,host_response_rate,host_response_time,host_since,host_thumbnail_url,host_total_listings_count,host_url,host_verifications,house_rules,id,instant_bookable,interaction,is_business_travel_ready,is_location_exact,jurisdiction_names,last_review,last_scraped,latitude,license,listing_url,longitude,market,maximum_maximum_nights,maximum_minimum_nights,maximum_nights,maximum_nights_avg_ntm,medium_url,minimum_maximum_nights,minimum_minimum_nights,minimum_nights,minimum_nights_avg_ntm,monthly_price,name,neighborhood_overview,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,notes,number_of_reviews,number_of_reviews_ltm,picture_url,price,property_type,require_guest_phone_verification,require_guest_profile_picture,requires_license,review_scores_accuracy,review_scores_checkin,review_scores_cleanliness,review_scores_communication,review_scores_location,review_scores_rating,review_scores_value,reviews_per_month,room_type,scrape_id,security_deposit,smart_location,space,square_feet,state,street,summary,thumbnail_url,transit,weekly_price,xl_picture_url,zipcode
0,,2,"{Internet,""Wireless Internet"",""Air Conditionin...",30.0,365,60,90.0,1,Real Bed,1,1,1,,,,2015-07-28,6 months ago,moderate,Arcadia,$10.00,United States,US,安靜舒適的独立大套房. 阳光明媚的大卧室，(附帶冰箱)一个超大衣帽間和浴室. 提供免费无线高...,none,$20.00,2014-09-15,1,t,,100%,t,20130073,f,f,1.0,PH,Cici,,https://a2.muscache.com/ac/users/20130073/prof...,100%,within a day,2014-08-16,https://a2.muscache.com/ac/users/20130073/prof...,1,https://www.airbnb.com/users/show/20130073,"['email', 'phone', 'reviews']",,3890624,f,,,t,,2014-11-02,2015-07-28,34.1328,,https://www.airbnb.com/rooms/3890624,-118.025147,Los Angeles,,,1125.0,,https://a0.muscache.com/ac/pictures/52102815/8...,,,1,,"$2,500.00","安靜舒適的独立大套房,非常适合专业商務人士，学生，实习，渡假遊學...",风景秀丽，豪宅林立，闹中取静，交通方便，华人超市!,,Arcadia,,无合约要求，适合短期旅游，探亲访友，随时拎包入住，,2.0,,https://a0.muscache.com/ac/pictures/52102815/8...,$90.00,House,f,f,f,10.0,10.0,8.0,10.0,10.0,90.0,8.0,0.19,Private room,20150730000000.0,,,安靜舒適的独立大套房. 阳光明媚的大卧室，(附帶冰箱)一个超大衣帽間和浴室.,,CA,"South 2nd Avenue, Arcadia, CA 91006, United St...","我們j位於Arcadia市中心,非常适合专业商務人士，学生，渡假遊學...鄰近高速公路,餐厅...",https://a0.muscache.com/ac/pictures/52102815/8...,聚集高尔夫球场，网球场，大型购物中心，影院，图书馆，大型街心公园，跑马场，医院，诊所。非常便...,$650.00,https://a0.muscache.com/ac/pictures/52102815/8...,91006


### Dropping Columns
Looking through the first row of `listing`, many columns repeat information or are not relevant to this project's analysis. For example, all columns related to location (neighbourhood, city, state) can be removed in favor of zipcode. Also, for this project I am intentionally not using any non-categorical text data apart from that in `review`.

Therefore, I put the names of all columns I did not want to keep into a list called `drop` and used the pandas drop function to remove those columns in place.

In [7]:
drop = ['access', 'availability_30','availability_60','availability_90',
'calculated_host_listings_count', 'calculated_host_listings_count_entire_homes',
'calculated_host_listings_count_private_rooms', 'calculated_host_listings_count_shared_rooms',
'calendar_last_scraped', 'calendar_updated', 'city', 'country', 'country_code', 'description',
'first_review', 'has_availability', 'host_about', 'host_id', 'host_location', 
'host_listings_count', 'host_name', 'host_neighbourhood', 'host_picture_url', 
'host_thumbnail_url', 'host_since', 'host_url', 'host_verifications','house_rules',
'interaction', 'jurisdiction_names','last_review', 'last_scraped', 'license', 
'listing_url', 'market', 'maximum_maximum_nights', 'maximum_minimum_nights',
'maximum_nights_avg_ntm', 'medium_url', 'minimum_maximum_nights', 'minimum_minimum_nights',
'minimum_nights_avg_ntm', 'monthly_price', 'name', 'neighborhood_overview', 'neighbourhood',
'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'notes', 'number_of_reviews_ltm',
'picture_url', 'review_scores_accuracy', 'review_scores_checkin', 'review_scores_cleanliness',
'review_scores_communication','review_scores_location','review_scores_value','scrape_id', 
'security_deposit','smart_location','space', 'state', 'street','summary', 'thumbnail_url',
'transit', 'weekly_price', 'xl_picture_url']

listing.drop(columns=drop, inplace=True)
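
With a hand-written list of almost 70 names, a single typo makes `drop` raise a `KeyError`. A small sketch of a more defensive variant, assuming it is acceptable to skip names that are absent (the toy frame and the `to_drop` list are made up for illustration):

```python
import pandas as pd

# Toy frame standing in for `listing`; only the column names matter here.
toy = pd.DataFrame({"id": [1], "price": ["$90.00"], "city": ["Arcadia"], "state": ["CA"]})
to_drop = ["city", "state", "host_url"]  # "host_url" is absent on purpose

# Report names that are not in the frame instead of raising a KeyError.
missing = [c for c in to_drop if c not in toy.columns]
trimmed = toy.drop(columns=to_drop, errors="ignore")
print(missing, list(trimmed.columns))
```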

### Inconsistencies in Categorical Data

It's important to correct inconsistencies in categorical data to prevent more categories being created than necessary. By converting all text in `listing` to lowercase, I immediately got rid of all capitalization-related data entry inconsistencies and could focus on spelling and formatting errors.

I used the `value_counts()` function to look for such inconsistencies in the `bed_type` column and found that some entries are actually people's names! These entry errors are a tiny minority of the data, so I decided to remove those rows from `listing`.

I ran `value_counts()` on the other categorical variables (omitted because the code wasn't necessary for the final analysis) and found that removing the rows flagged by the `bed_type` analysis got rid of most inconsistencies in the other columns as well.

One minor flaw I found in the `accommodates` column was an entry containing a '%' (the value `50%`), which prevented the column from being converted to a numerical datatype. I removed that row from `listing`.

Finally, I fixed minor entry inconsistencies in the `property_type` column in which property types of the same name had slightly different entry formats.

In [8]:
listing = listing.apply(lambda x: x.astype(str).str.lower())

In [9]:
print(len(listing.bed_type))
listing.bed_type.value_counts()

750771


real bed         734244
futon              6151
pull-out sofa      4127
airbed             3616
couch              2588
jay                  14
arya                 14
eliza                12
vitaly                3
newsha                2
Name: bed_type, dtype: int64

In [10]:
listing = listing[listing.accommodates != '50%']

In [11]:
listing = listing[(listing.bed_type == 'real bed') |
                  (listing.bed_type == 'futon') |
                  (listing.bed_type == 'pull-out sofa') |
                  (listing.bed_type == 'airbed') |
                  (listing.bed_type == 'couch')]

In [12]:
listing['property_type'] = listing['property_type'].str.replace('bed & breakfast', 'bed and breakfast', regex=False)\
                                                    .str.replace(r'casa particular \(cuba\)', 'casa particular', regex=True)
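
One side effect of `astype(str)` is that missing values become the literal string `'nan'`, which is likely where columns such as `has_nan` and `property_type_nan` in the final dataframe come from. A hedged sketch of an alternative on toy data: lowercase through the `.str` accessor so NaN stays NaN, and keep valid categories with `isin` rather than chained `|` comparisons (the toy frame and `valid_bed_types` are illustrative names):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"bed_type": ["Real Bed", "Futon", "jay", np.nan]})

# Lowercase string values only; NaN is left as NaN.
toy["bed_type"] = toy["bed_type"].str.lower()

# Keep rows whose bed_type is one of the known categories.
valid_bed_types = ["real bed", "futon", "pull-out sofa", "airbed", "couch"]
kept = toy[toy["bed_type"].isin(valid_bed_types)]
print(kept)
```

Rows with a missing `bed_type` are dropped by the `isin` filter here, just as rows holding the string `'nan'` are dropped by the filter above.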

### The Amenities Feature

One interesting feature in `listing` is the `amenities` column, which is a string containing a list of the amenities offered in a listing. Although the format wasn't machine learning-compatible, I saw this as a great opportunity to do some feature creation.

From the row of `listing` above, you can see that `amenities` consists of strings wrapped in curly braces containing lists of amenities, and some amenity names are enclosed in quotes. I stripped the braces and quotes, normalized a few amenity names that appeared under slightly different spellings, and split each string on commas.

After removing the extraneous characters, I created a list of all unique amenities represented in the dataset. I iterated through this list of unique amenities to create new feature-specific boolean columns in `listing` using my function `has_feature()`.

The output of `has_feature()` was an individual boolean column for each unique amenity in the dataset, indicating whether a listing provided that amenity. After creating these columns I was able to drop the original `amenities` column from `listing`.

In [13]:
listing['amenities'] = listing.amenities.str.replace('{','', regex=False).str.replace('}','', regex=False).str.replace('"','', regex=False)
listing['amenities'] = listing['amenities'].str.replace('doorman entry', 'doorman')\
                                                    .str.replace('elevator in building', 'elevator')\
                                                    .str.replace('ski in/ski out', 'ski-in/ski-out')\
                                                    .str.replace('smart lock', 'smartlock')\
                                                    .str.replace('ceiling fan', 'ceiling fans')\
                                                    .str.replace('translation missing: en.hosting_amenity_49', '')\
                                                    .str.replace('translation missing: en.hosting_amenity_50', '')
listing['amenities'] = listing['amenities'].str.split(',')
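
The final column listing contains both `has_ceiling_fans` and `has_ceiling_fanss`, which suggests the `'ceiling fan'` replacement also matched entries that were already spelled `ceiling fans`. A minimal sketch of one way to guard against that with a negative lookahead (toy data; not the code that produced the outputs in this notebook):

```python
import pandas as pd

amenities = pd.Series(["tv,ceiling fan,wifi", "ceiling fans,kitchen"])

# Plain substring replacement also rewrites entries that are already plural.
naive = amenities.str.replace("ceiling fan", "ceiling fans", regex=False)

# A negative lookahead leaves "ceiling fans" untouched.
guarded = amenities.str.replace(r"ceiling fan(?!s)", "ceiling fans", regex=True)
print(naive.tolist(), guarded.tolist())
```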

In [14]:
all_amenities = []
for entry in listing.amenities:
    try:
        all_amenities.extend(entry)
    except TypeError: # if nan value
        pass
unique_amenities = list(set(all_amenities))
unique_amenities.sort()

In [15]:
def has_feature(series, feature):
    """ returns a pandas series that includes True or False based
        on whether that entry in the dataframe has that amenity
    """
    col = []
    for ls in series:
        try:
            if feature in ls:
                col.append(True)
            else:
                col.append(False)
        except TypeError: # entry is not iterable (e.g. nan)
            col.append(False)
    col = pd.Series(col)
    return col

In [16]:
# create a new column in listings for each amenity
for amenity in unique_amenities:
    col_name = 'has_' + amenity.replace(' ', '_')
    
    # if that listing has that amenity, True, if not, False
    try:
        listing[col_name] = has_feature(listing.amenities, amenity)
    except TypeError: # is a null value
        pass

In [17]:
# drop blank column and old amenities column
listing = listing.drop(columns=['has_', 'amenities'])
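
The same kind of indicator columns can probably be produced without explicit loops. A sketch using pandas' `Series.str.get_dummies`, shown on toy data; it is an alternative under the assumption that amenities are comma-separated after cleaning, not the code that generated the columns in this notebook:

```python
import pandas as pd

toy = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": ['{TV,"Wireless Internet",Kitchen}', "{TV}", "{}"],
})

# Strip braces and quotes, then lowercase.
cleaned = (toy["amenities"]
           .str.replace(r'[{}"]', "", regex=True)
           .str.lower())

# One indicator column per distinct amenity; empty strings yield no column.
flags = cleaned.str.get_dummies(sep=",").astype(bool)
flags.columns = ["has_" + c.replace(" ", "_") for c in flags.columns]

# The result shares the original index, so the join lines up row by row.
result = toy.drop(columns="amenities").join(flags)
print(result)
```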

### Changing Datatypes

Since machine learning algorithms in scikit-learn only take numeric data, I needed to make sure that all columns in `listing` were either integers, floats, booleans, or categorical. Going manually through the columns, I separated them into lists based on which data type they should be, and used these lists in the `change_datatypes()` function.

Additionally, some columns required specific processing because they included a special symbol: prices ($) and percentages (%). These columns were put into their own lists for `change_datatypes()` as well.

`change_datatypes()` used these lists to run data type conversion functions on `listing`.

In [18]:
def change_datatypes(df):
    """ Changes the datatypes of specified columns in the dataframe,
        depending on if the dataframe passed to the function has
        listings data or reviews data. Has try/except blocks just in case
        a certain column is not found in a dataframe.
    """
    # lists of columns that need to be converted in listings data
    to_numeric = ['accommodates', 'availability_365', 'bathrooms', 'bedrooms', 
                  'beds', 'guests_included','host_total_listings_count', 'id',
                  'latitude', 'longitude', 'maximum_nights', 'minimum_nights',
                 'number_of_reviews', 'review_scores_rating','reviews_per_month',
                  'square_feet', 'zipcode']
    to_bool = ['experiences_offered', 'host_has_profile_pic', 'host_identity_verified',
               'host_is_superhost','instant_bookable', 'is_business_travel_ready',
               'is_location_exact', 'require_guest_phone_verification',
               'require_guest_profile_picture', 'requires_license']
    to_category = ['bed_type', 'cancellation_policy', 'host_response_time',
                   'property_type', 'room_type']
    prices = ['cleaning_fee', 'extra_people', 'price',]
    percentages = ['host_acceptance_rate', 'host_response_rate']
    
    ## to numeric (integer or float)
    for col_name in to_numeric:
        try:
            df[col_name] = pd.to_numeric(df[col_name], errors='coerce')
        except KeyError:
            pass

    ## to bool: booleans start out as 't' or 'f' strings
    for col_name in to_bool:
        try:
            df[col_name] = df[col_name] == 't'
        except:
            pass
    
    ## to category
    for col_name in to_category:
        try:
            df[col_name] = df[col_name].astype('category')
        except KeyError:
            pass
       
    ## change price columns
    for col_name in prices:
        try:
            df[col_name] = pd.to_numeric(df[col_name].str.replace('$','', regex=False),
                                         downcast='float', errors='coerce')
        except:
            pass
        # rename column
        df.rename(columns={col_name: col_name + '_USD'}, inplace=True)

    ## change percentage columns
    for col_name in percentages:
        try:
            df[col_name] = pd.to_numeric(df[col_name].str.replace('%','', regex=False),
                                         downcast='float', errors='coerce')
        except:
            pass
        # rename column
        df.rename(columns={col_name: col_name + '_percentage'}, inplace=True)
    
    return df

In [19]:
listing = change_datatypes(listing)
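
The sample row shows `monthly_price` formatted as `$2,500.00`. If `price` contains values formatted the same way, removing only the `$` leaves the comma in place and `errors='coerce'` turns those values into NaN. A hedged sketch of a parser that strips both characters (the helper name `parse_price` is hypothetical):

```python
import pandas as pd

def parse_price(series: pd.Series) -> pd.Series:
    """Convert strings such as '$2,500.00' to floats; unparseable values become NaN."""
    stripped = series.str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce")

prices = pd.Series(["$90.00", "$2,500.00", None, "n/a"])
parsed = parse_price(prices)
print(parsed)
```

Whether this matters for the analysis depends on how many listings are priced at $1,000 or more per night, which I did not check here.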

### Target Variable Null Values

For the final analysis, rows without a value for the target variable `price_USD` are not useful. Therefore, I dropped all rows where `price_USD` was null.

In [20]:
# drop rows where target column is null
listing = listing[listing.price_USD.notna()]

### One Hot Encoding Categorical Variables

Before one hot encoding categorical variables, I saved and exported a small file containing columns of interest for plotting later in the analysis.

Then, I created a list with the names of the columns that I wanted to one hot encode, and used the pandas `get_dummies()` function to replace those columns in the dataframe with their encoded versions.

In [21]:
# save columns for plotting
destination = '/Users/limesncoconuts2/datasets/airbnb/'
filename = 'for_plotting.csv'
current_dir = os.getcwd() + '/' + filename
df_plot = listing[['price_USD', 'property_type', 'accommodates', 'square_feet']]
df_plot.to_csv(filename, index=False)
shutil.move(os.path.join(current_dir), os.path.join(destination, filename))

'/Users/limesncoconuts2/datasets/airbnb/for_plotting.csv'

In [22]:
# encode categories
encode = ['bed_type','host_response_time','property_type','room_type', 'cancellation_policy']
listing = pd.get_dummies(listing, columns=encode, drop_first=True)
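
To make the effect of `drop_first=True` concrete, here is a small sketch on toy data: the first category in sorted order gets no column of its own and acts as the baseline, which avoids perfectly collinear indicator columns.

```python
import pandas as pd

toy = pd.DataFrame({
    "price_USD": [90.0, 120.0, 75.0],
    "room_type": pd.Categorical(["private room", "entire home/apt", "shared room"]),
})

# "entire home/apt" sorts first, so it is dropped and becomes the baseline.
encoded = pd.get_dummies(toy, columns=["room_type"], drop_first=True)
print(list(encoded.columns))
```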

### Incorporate Review Data

The `review` dataframe contains a row for each review written for an Airbnb listing. I aggregated all the review text for each individual listing and added it to `listing` as an additional column.

I dropped all columns from the data other than `listing_id` and `comments`, the two columns that would help me append the review text to the final dataframe.

Using the reduced `review` dataframe, I created the `aggregate_reviews()` function, which groups together all review text for each listing. Then I added that grouped data as another feature to the dataframe and named the merged dataframe `df`.

In [23]:
review.drop(columns=['id', 'date', 'reviewer_id', 'reviewer_name'], inplace=True)
review['comments'] = review['comments'].fillna('')

In [24]:
def aggregate_reviews(review):
    '''Groups together all review text for each listing and stores in a dataframe'''
    rows = []
    unique_id = list(review.listing_id.unique())
    
    for i in unique_id:
        subset = review[i == review.listing_id]
        # collect every comment for this listing into one list
        all_comments = list(subset.comments)
        rows.append({'id': i, 'reviews': all_comments})
    
    # build the dataframe once; DataFrame.append was removed in pandas 2.0
    review_combined = pd.DataFrame(rows, columns=['id', 'reviews'])
    review_combined.id = pd.to_numeric(review_combined.id, errors='coerce')
    return review_combined

In [25]:
cores = mp.cpu_count()
df_split = np.array_split(review, cores, axis=0)
pool = mp.Pool(cores)
review_combined = np.vstack(pool.map(aggregate_reviews, df_split))
pool.close()

In [26]:
review_combined = pd.DataFrame(review_combined, columns=['id', 'reviews'])
review_combined.id = review_combined.id.astype('int')

In [27]:
# add as a column in the df
df = listing.merge(review_combined, on='id', how='left')
df['reviews'] = df['reviews'].fillna('')
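
For comparison, the aggregation and merge can be sketched in a single process with `groupby`. This is an alternative under my own assumptions, shown on toy data, and not the multiprocessing approach used above. Because it groups over the whole frame, a listing whose reviews would otherwise land in two different chunks still ends up as a single row.

```python
import pandas as pd

reviews = pd.DataFrame({
    "listing_id": [10, 10, 20],
    "comments": ["Great stay", "Clean room", "Nice host"],
})
listings = pd.DataFrame({"id": [10, 20, 30], "price_USD": [90.0, 120.0, 75.0]})

# One row per listing, with all of its comments collected into a list.
grouped = (reviews.groupby("listing_id")["comments"]
                  .apply(list)
                  .reset_index()
                  .rename(columns={"listing_id": "id", "comments": "reviews"}))

# A left merge keeps listings that have no reviews; their `reviews` value is NaN.
merged = listings.merge(grouped, on="id", how="left")
print(merged)
```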

### Missing Values

Because machine learning algorithms generally can't handle missing values, those values needed to be filled in a way that makes sense. First I created a list of column names that had null values and weren't booleans, and another list of column names that had null values and were booleans (the `has_` amenity columns). I filled the NaN values differently for different columns:

For `zipcode` I filled missing values with the mode.

For other non-boolean columns, I filled missing values with the median.

For boolean columns, I filled missing values with `False`.

In [28]:
nans = df.isna().sum()
nans = nans[nans != 0].sort_values(ascending=False)

In [29]:
has_cols = list(nans[11: len(nans)-1].index)

In [30]:
nans = [n for n in nans.index if n not in has_cols]

In [31]:
for col in nans:
    if col == 'zipcode':
        mode = df.zipcode.mode().values[0]
        df[col] = df[col].fillna(mode)
    else:
        median = np.nanmedian(df[col].values)
        df[col] = df[col].fillna(median)

for col in has_cols:
    df[col] = df[col].fillna(False)
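
The positional slice `nans[11: len(nans)-1]` depends on the sort order of the null counts, so it may pick different columns if the data changes. A minimal sketch of the same three fill rules that selects columns by name instead, assuming the amenity columns can be recognized by their `has_` prefix (toy data, illustrative only):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "zipcode": [91006.0, 91006.0, np.nan, 90210.0],
    "bedrooms": [1.0, np.nan, 3.0, 2.0],
    "has_wifi": [True, np.nan, False, True],
})

# Only visit columns that actually contain missing values.
cols_with_nans = toy.columns[toy.isna().any().to_numpy()]

for col in cols_with_nans:
    if col.startswith("has_"):
        # Comparison with True maps NaN to False and yields a bool column.
        toy[col] = toy[col].eq(True)
    elif col == "zipcode":
        toy[col] = toy[col].fillna(toy[col].mode().iloc[0])
    else:
        toy[col] = toy[col].fillna(toy[col].median())
print(toy)
```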

### Final Summary of Data and Save

I ended up with a single cleaned dataframe with about 1.17 million entries and 353 features of numeric type (except for the review text data, which will be turned into a numeric feature in the machine learning analysis). I saved the dataframe as the csv `df_clean.csv` for later use.

In [32]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1166164 entries, 0 to 1166163
Columns: 353 entries, accommodates to reviews
dtypes: bool(259), float32(5), float64(16), int64(1), object(1), uint8(71)
memory usage: 558.3+ MB


In [33]:
df.head(1)

Unnamed: 0,accommodates,availability_365,bathrooms,bedrooms,beds,cleaning_fee_USD,experiences_offered,extra_people_USD,guests_included,host_acceptance_rate_percentage,host_has_profile_pic,host_identity_verified,host_is_superhost,host_response_rate_percentage,host_total_listings_count,id,instant_bookable,is_business_travel_ready,is_location_exact,latitude,longitude,maximum_nights,minimum_nights,number_of_reviews,price_USD,require_guest_phone_verification,require_guest_profile_picture,requires_license,review_scores_rating,reviews_per_month,square_feet,zipcode,has__toilet,has_24-hour_check-in,has_accessible-height_bed,has_accessible-height_toilet,has_air_conditioning,has_air_purifier,has_alfresco_bathtub,has_alfresco_shower,has_amazon_echo,has_apple_tv,has_baby_bath,has_baby_monitor,has_babysitter_recommendations,has_balcony,has_bath_towel,has_bathroom_essentials,has_bathtub,has_bathtub_with_bath_chair,has_bbq_grill,has_beach_chairs,has_beach_essentials,has_beach_view,has_beachfront,has_bed_linens,has_bedroom_comforts,has_bidet,has_body_soap,has_breakfast,has_breakfast_bar,has_breakfast_table,has_brick_oven,has_building_staff,has_buzzer/wireless_intercom,has_cable_tv,has_carbon_monoxide_detector,has_cat(s),has_ceiling_fans,has_ceiling_fanss,has_ceiling_hoist,has_central_air_conditioning,has_changing_table,has_chef's_kitchen,has_children’s_books_and_toys,has_children’s_dinnerware,has_cleaning_before_checkout,has_coffee_maker,has_computer,has_convection_oven,has_cooking_basics,has_crib,has_day_bed,has_decorative_fireplace,has_dining_area,has_disabled_parking_spot,has_dishes_and_silverware,has_dishwasher,has_dog(s),has_doorman,has_double_oven,has_dryer,has_dvd_player,has_electric_profiling_bed,has_elevator,has_en_suite_bathroom,has_espresso_machine,has_essentials,has_ethernet_connection,has_ev_charger,has_exercise_equipment,has_extra_pillows_and_blankets,has_family/kid_friendly,has_fax_machine,has_fire_extinguisher,has_fire_pit,has_fireplace_guards,has_firm_mattress,has_first_aid_kit,has_fixed_grab_bars_for_shower,has_fixed_grab_bars_for_toilet,has_flat_path_to_front_door,has_foosball_table,has_formal_dining_area,has_free_parking_on_premises,has_free_parking_on_street,has_free_street_parking,has_front_desk/doorperson,has_full_kitchen,has_game_console,has_games,has_garage_parking,has_garden,has_garden_or_backyard,has_gas_fireplace,has_gas_grill,has_gas_oven,has_gated_property,has_ground_floor_access,has_gym,has_hair_dryer,has_hammock,has_handheld_shower_head,has_hangers,has_hbo_go,has_heat_lamps,has_heated_floors,has_heated_infinity_pool,has_heated_pool,has_heated_towel_rack,has_heating,has_high_chair,has_high-resolution_computer_monitor,has_home_theater,has_host_greets_you,has_hot_tub,has_hot_water,has_hot_water_kettle,has_ice_machine,has_indoor_fireplace,has_infinity_pool,has_internet,has_iron,has_ironing_board,has_jetted_tub,has_keypad,has_kitchen,has_kitchenette,has_lake_access,has_lap_pool,has_laptop_friendly_workspace,has_library,has_lock_on_bedroom_door,has_lockbox,has_long_term_stays_allowed,has_lounge_area,has_lounge_chairs,has_luggage_dropoff_allowed,has_media_room,has_memory_foam_mattress,has_microwave,has_mini_fridge,has_misting_system,has_mobile_hoist,has_mountain_view,has_mudroom,has_murphy_bed,has_nan,has_natural_gas_barbeque,has_nespresso_machine,has_netflix,has_office,has_other,has_other_pet(s),has_outdoor_kitchen,has_outdoor_parking,has_outdoor_seating,has_outlet_covers,has_oven,has_pack_’n_play/travel_crib,has_paid_parking_off_premises,has_paid_parking_on_premises,has_parking,has_patio_or_balcony,has_pets_allowed,has_pets_live_on_this_property,has_piano,has_pillow-top_mattress,has_ping_pong_table,has_play_room,has_pocket_wifi,has_pond,has_pool,has_pool_cover,has_pool_house,has_pool_table,has_pool_toys,has_pool_with_pool_hoist,has_printer,has_private_bathroom,has_private_entrance,has_private_gym,has_private_hot_tub,has_private_living_room,has_private_pool,has_projector,has_projector_and_screen,has_propane_barbeque,has_putting_green,has_rain_shower,has_refrigerator,has_roll-in_shower,has_roll-in_shower_with_chair,has_room-darkening_shades,has_safety_card,has_satellite_tv,has_sauna,has_security_cameras,has_security_system,has_self_check-in,has_shampoo,has_shared_gym,has_shared_hot_tub,has_shared_pool,has_shower_chair,has_single_level_home,has_ski-in/ski-out,has_smart_tv,has_smartlock,has_smoke_detector,has_smoking_allowed,has_soaking_tub,has_sonos_sound_system,has_sound_system,has_stair_gates,has_stand_alone_steam_shower,has_standing_valet,has_steam_oven,has_step-free_access,has_stove,has_suitable_for_events,has_sun_deck,has_sun_loungers,has_swimming_pool,has_table_corner_guards,has_tennis_court,has_terrace,has_toaster,has_toilet_paper,has_touchless_faucets,has_tv,has_video_games,has_walk-in_shower,has_warming_drawer,has_washer,has_washer_/_dryer,has_waterfront,has_well-lit_path_to_entrance,has_wet_bar,has_wheelchair_accessible,has_wide_clearance_to_bed,has_wide_clearance_to_shower,has_wide_doorway,has_wide_entryway,has_wide_hallway_clearance,has_wifi,has_window_guards,has_wine_cellar,has_wine_cooler,has_wine_storage,has_wireless_internet,bed_type_couch,bed_type_futon,bed_type_pull-out sofa,bed_type_real bed,host_response_time_nan,host_response_time_within a day,host_response_time_within a few hours,host_response_time_within an hour,property_type_apartment,property_type_barn,property_type_bed and breakfast,property_type_boat,property_type_boutique hotel,property_type_bungalow,property_type_bus,property_type_cabin,property_type_camper/rv,property_type_campsite,property_type_casa particular,property_type_castle,property_type_cave,property_type_chalet,property_type_condominium,property_type_cottage,property_type_dome house,property_type_dorm,property_type_earth house,property_type_entire floor,property_type_farm stay,property_type_guest suite,property_type_guesthouse,property_type_hostel,property_type_hotel,property_type_house,property_type_houseboat,property_type_hut,property_type_in-law,property_type_island,property_type_lighthouse,property_type_loft,property_type_minsu (taiwan),property_type_nan,property_type_nature lodge,property_type_other,property_type_parking space,property_type_plane,property_type_resort,property_type_serviced apartment,property_type_tent,property_type_timeshare,property_type_tiny house,property_type_tipi,property_type_townhouse,property_type_train,property_type_treehouse,property_type_vacation home,property_type_villa,property_type_yurt,room_type_nan,room_type_private room,room_type_shared room,cancellation_policy_luxury_moderate,cancellation_policy_luxury_no_refund,cancellation_policy_luxury_super_strict_95,cancellation_policy_moderate,cancellation_policy_nan,cancellation_policy_no_refunds,cancellation_policy_strict,cancellation_policy_strict_14_with_grace_period,cancellation_policy_super_strict_30,cancellation_policy_super_strict_60,reviews
0,2.0,365.0,1.0,1.0,1.0,10.0,False,20.0,1,100.0,True,False,False,100.0,1.0,3890624.0,False,False,True,34.132812,-118.025147,1125.0,1.0,2.0,90.0,False,False,False,90.0,0.19,700.0,91006.0,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,[Very friendly host and nice room. Really enj...


In [34]:
filename = 'df_clean.csv'
current_dir = os.getcwd() + '/' + filename
df.to_csv(filename, index=False)
shutil.move(os.path.join(current_dir), os.path.join(destination, filename))

'/Users/limesncoconuts2/datasets/airbnb/df_clean.csv'