### Data Cleaning & Wrangling
- [1.1 Overview](#1_1)
- [1.2 Loading and Imports](#1_2)
- [1.3 Data Exploration and Reshaping](#1_3)
- [1.4 Listing Attributes](#1_4)
   - [1.4.1 ](#1_4_1)
   - [1.4.2  ](#1_4_2)
   - [1.4.3 ](#1_4_3)
   - [1.4.4 ](#1_4_4)
   - [1.4.5 ](#1_4_5)
   - [1.4.6 ](#1_4_6)
- [1.5 Reviews](#1_5)
- [1.6 Exporting](#1_6)

### 1.1 Overview <a id = "1_1"></a>
- The goal of this notebook is to get a sense of the data we're working with and prepare it for exploratory data analysis
- More broadly, we want to see how hosts can optimize the pricing of their listings (and suggest ideal prices based on particular listing attributes)

The listing data we will consider comes from Inside Airbnb (https://insideairbnb.com/get-the-data/) and includes:
  - Number of rooms
  - Amenities (gym, pool, etc)
  - Neighborhood attributes

We will also use the ZIP codes found in the listing info to pull US Census data on:
  - Neighborhood Conditions
  - Income
  - Demographics
  - Crime rates
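
A minimal sketch of how the ZIP-code join described above might look. The frames, the `zipcode` and `median_income` column names, and the income values are all placeholders for illustration; the real listing and census files may be shaped differently. ZIP codes are kept as strings so that leading zeros are not lost.

```python
import pandas as pd

# Hypothetical toy frames; real column names and values will differ.
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "zipcode": ["11209", "11220", "10453"],  # strings, so leading zeros survive
})
census = pd.DataFrame({
    "zipcode": ["11209", "11220"],
    "median_income": [1.0, 2.0],  # placeholder values, not real census figures
})

# A left join keeps every listing, even when no census row matches its ZIP code.
merged = listings.merge(census, on="zipcode", how="left", indicator=True)

# Listings without a census match are flagged as "left_only".
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(f"{len(unmatched)} listing(s) without census data")
```

Checking the `_merge` column right after the join is a cheap way to see how many listings would end up with missing neighborhood attributes.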

### 1.2 Loading and Imports <a id = "1_2"></a>

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plot

In [3]:
df_listing = pd.read_csv('NYC_listings.csv')
df_listing.head(3)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,739333866230665371,https://www.airbnb.com/rooms/739333866230665371,20240904164109,2024-09-04,city scrape,Lovely room 2 windows tv work desk ac included,"Lovely vocation room, has work desk , tv, 2 wi...",,https://a0.muscache.com/pictures/miso/Hosting-...,3013025,...,,,,,f,1,0,1,0,
1,572612125615500056,https://www.airbnb.com/rooms/572612125615500056,20240904164109,2024-09-04,city scrape,Room by Sunny & Bay! Sunset Park & Bay Ridge,Cozy room in a charming Sunset Park apartment....,,https://a0.muscache.com/pictures/5f44a178-6043...,358089614,...,4.83,4.67,4.67,,t,2,0,2,0,0.21
2,45267941,https://www.airbnb.com/rooms/45267941,20240904164109,2024-09-04,city scrape,Private Room in Luxury Apartment,,,https://a0.muscache.com/pictures/3c15a88e-b08a...,39162543,...,,,,,f,3,2,1,0,


In [4]:
# Fetch all the zip codes

zip_codes = pd.read_csv

### 1.3 Data Exploration and Reshaping <a id = "1_3"></a>

In [5]:
# There are 75 columns and we don't need all of them, so let's see what we can drop
df_listing.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name',
       'description', 'neighborhood_overview', 'picture_url', 'host_id',
       'host_url', 'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'ca

In [10]:
columns_to_keep = [
    'neighbourhood_cleansed', 'price',
    'room_type', 'property_type', 'accommodates', 'bedrooms', 'beds', 
    'bathrooms', 'bathrooms_text', 'amenities','review_scores_rating', 'review_scores_cleanliness', 
    'review_scores_location', 'review_scores_value', 
    'host_is_superhost', 'neighbourhood_group_cleansed', 
    'calculated_host_listings_count',  'description', 'neighborhood_overview', 
]

# Columns left out for now: 'latitude', 'longitude', 'number_of_reviews', 'reviews_per_month',
# 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'minimum_nights',
# 'maximum_nights', 'host_listings_count'
df_listing = df_listing[columns_to_keep].copy()  # .copy() avoids SettingWithCopyWarning on later edits
df_listing

Unnamed: 0,neighbourhood_cleansed,price,room_type,property_type,accommodates,bedrooms,beds,bathrooms,bathrooms_text,amenities,review_scores_rating,review_scores_cleanliness,review_scores_location,review_scores_value,host_is_superhost,neighbourhood_group_cleansed,calculated_host_listings_count,description,neighborhood_overview
0,Fort Hamilton,$89.00,Private room,Private room in rental unit,1,1.0,1.0,1.0,1 shared bath,"[""Kitchen"", ""Dedicated workspace"", ""TV"", ""Smok...",,,,,f,Brooklyn,1,"Lovely vocation room, has work desk , tv, 2 wi...",
1,Sunset Park,$45.00,Private room,Private room in rental unit,1,1.0,1.0,1.0,1 shared bath,"[""Single level home"", ""Cleaning products"", ""St...",4.83,4.67,4.67,4.67,t,Brooklyn,2,Cozy room in a charming Sunset Park apartment....,
2,Morris Heights,$107.00,Private room,Private room in rental unit,2,1.0,1.0,1.0,1 shared bath,"[""Kitchen"", ""Hair dryer"", ""Hot water"", ""Dryer""...",,,,,f,Bronx,3,,
3,East Harlem,$140.00,Entire home/apt,Entire rental unit,8,3.0,3.0,2.0,2 baths,"[""Building staff"", ""Elevator"", ""Dedicated work...",,,,,f,Manhattan,5,,
4,South Slope,$340.00,Entire home/apt,Entire home,5,4.0,4.0,2.5,2.5 baths,"[""BBQ grill"", ""Kitchen"", ""Dedicated workspace""...",,,,,f,Brooklyn,3,425 10th Street is what dreams are made of! S...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37536,Jamaica,$65.00,Private room,Private room in home,2,1.0,1.0,1.0,1 bath,"[""Free street parking"", ""Hot water"", ""Dining t...",3.80,4.20,3.80,3.80,,Queens,6,"Please read the amenities we offer, doing as t...",
37537,East Elmhurst,$367.00,Entire home/apt,Entire rental unit,7,3.0,3.0,2.0,2 baths,"[""Mosquito net"", ""Cleaning products"", ""Clothin...",4.88,4.84,4.78,4.77,t,Queens,1,"Welcome to Fly-Inn, a stylish retreat in the h...",
37538,East Elmhurst,$89.00,Private room,Private room in home,2,1.0,1.0,1.0,1 shared bath,"[""Dishwasher"", ""Cleaning products"", ""Dining ta...",4.32,4.39,4.40,4.30,f,Queens,4,"Clean, quiet, safe, comfortable and easily acc...",
37539,Throgs Neck,$185.00,Entire home/apt,Entire home,2,1.0,1.0,1.0,1 bath,"[""Coffee maker: pour-over coffee"", ""Cleaning p...",,,,,f,Bronx,1,"Private driveway, wash machine, dryer, utiliti...","The Bronx has the best pizza, Italian restaura..."
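
The preview above shows `price` stored as a string such as `$89.00` and `amenities` stored as a list encoded in a string. A possible cleaning step is sketched below on a toy frame. It assumes the amenities strings are valid JSON lists, and the `$1,250.00` row is a made-up value added only to show the thousands separator; the new column names (`price_usd`, `amenities_list`, `has_kitchen`, `n_amenities`) are hypothetical.

```python
import json

import pandas as pd

# Toy stand-in for df_listing; the third row is invented for illustration.
sample = pd.DataFrame({
    "price": ["$89.00", "$340.00", "$1,250.00"],
    "amenities": ['["Kitchen", "TV"]', '["BBQ grill", "Kitchen"]', "[]"],
})

# Strip the dollar sign and thousands separators, then convert to float.
sample["price_usd"] = (
    sample["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Assumption: each amenities cell is a JSON-encoded list of strings.
sample["amenities_list"] = sample["amenities"].apply(json.loads)

# Example derived features: a flag for one amenity and a simple count.
sample["has_kitchen"] = sample["amenities_list"].apply(lambda items: "Kitchen" in items)
sample["n_amenities"] = sample["amenities_list"].apply(len)

print(sample[["price_usd", "has_kitchen", "n_amenities"]])
```

Keeping the parsed list in its own column leaves the raw `amenities` string untouched, which makes it easy to add more amenity flags (gym, pool, etc.) later.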
