### Data Cleaning & Wrangling
- [1.1 Overview](#1_1)
- [1.2 Loading and Imports](#1_2)
- [1.3 Data Exploration and Cleaning](#1_3)
- [1.4 Listing Attributes](#1_4)
   - [1.4.1 ](#1_4_1)
   - [1.4.2  ](#1_4_2)
   - [1.4.3 ](#1_4_3)
   - [1.4.4 ](#1_4_4)
   - [1.4.5 ](#1_4_5)
   - [1.4.6 ](#1_4_6)
- [1.5 Reviews](#1_5)
- [1.6 Exporting](#1_6)

### 1.1 Overview <a id = "1_1"></a>
- The goal of this notebook is to understand the data we're working with and prepare it for exploratory data analysis (EDA)
- At a high level, we want to see how hosts can optimize the pricing of their listings, and recommend ideal prices based on particular listing attributes
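Because pricing is the end goal, the `price` column has to be numeric before any analysis. The sketch below assumes it is stored as text such as `"$1,200.00"` (worth confirming against the actual file); `parse_price` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def parse_price(price: pd.Series) -> pd.Series:
    """Convert strings such as "$1,200.00" to floats; unparseable entries become NaN."""
    if pd.api.types.is_numeric_dtype(price):
        return price.astype(float)  # already numeric, nothing to strip
    # Remove the dollar sign and thousands separators, then trim whitespace
    cleaned = price.str.replace(r"[$,]", "", regex=True).str.strip()
    # errors="coerce" turns anything still non-numeric (e.g. "free") into NaN
    return pd.to_numeric(cleaned, errors="coerce")

prices = parse_price(pd.Series(["$150.00", "$1,200.00", None]))
print(prices)
```

Coercing bad values to NaN, rather than raising, lets the missing prices be counted and handled explicitly in a later cleaning step.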

The listing attributes we will be considering come from the listing data found here ():
  - Number of rooms
  - Amenities (gym, pool, etc.)
  - Neighborhood attributes
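Amenities such as a gym or pool become usable features once they are turned into boolean columns. This is a minimal sketch that assumes the `amenities` column holds JSON-style list strings like `'["Wifi", "Gym"]'`; the helper names and the keyword list are hypothetical.

```python
import json
import pandas as pd

def parse_amenities(value) -> list:
    """Parse one JSON-style amenities string into lowercase names; anything else becomes []."""
    if not isinstance(value, str):
        return []  # missing values (None/NaN)
    try:
        items = json.loads(value)
    except json.JSONDecodeError:
        return []  # malformed strings are treated as "no amenities listed"
    return [str(item).lower() for item in items] if isinstance(items, list) else []

def amenity_flags(amenities: pd.Series, keywords=("gym", "pool")) -> pd.DataFrame:
    """One boolean has_<keyword> column per keyword, via case-insensitive substring match."""
    parsed = amenities.map(parse_amenities)
    return pd.DataFrame(
        {f"has_{kw}": parsed.map(lambda items, kw=kw: any(kw in item for item in items))
         for kw in keywords},
        index=amenities.index,
    )

sample = pd.Series(['["Wifi", "Gym", "Kitchen"]', '["Shared outdoor pool", "Wifi"]', None])
print(amenity_flags(sample))
```

Substring matching is deliberately loose (it catches "Shared outdoor pool"), but it can also over-match, e.g. "pool table", so the matched amenity names are worth spot-checking.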

We will also use the zip codes found in the listing info to pull US Census data on:
  - Neighborhood Conditions
  - Income
  - Demographics
  - Crime rates
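Attaching the census attributes amounts to a join on zip code. Below is a minimal sketch; the column names `zipcode` and `median_income` are assumptions, and the income figure is a made-up toy value, not real Census data.

```python
import pandas as pd

def attach_census(listings: pd.DataFrame, census: pd.DataFrame, key: str = "zipcode") -> pd.DataFrame:
    """Left-join census attributes onto listings by zip code (hypothetical column names)."""
    # how="left" keeps every listing even when its zip code has no census row;
    # validate="many_to_one" raises MergeError if the census table repeats a zip code
    return listings.merge(census, on=key, how="left", validate="many_to_one")

# Toy inputs: zip codes kept as strings so both sides compare equal
listings = pd.DataFrame({"id": [1, 2, 3], "zipcode": ["10001", "11211", "10001"]})
census = pd.DataFrame({"zipcode": ["10001"], "median_income": [50000]})
merged = attach_census(listings, census)
print(merged)
```

A left join keeps the listing count unchanged, so zip codes that fail to match show up as NaN census values instead of silently dropping listings.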

### 1.2 Loading and Imports <a id = "1_2"></a>

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plot

In [14]:
df_listing = pd.read_csv('NYC_listings.csv')
df_listing.head(3)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,739333866230665371,https://www.airbnb.com/rooms/739333866230665371,20240904164109,2024-09-04,city scrape,Lovely room 2 windows tv work desk ac included,"Lovely vocation room, has work desk , tv, 2 wi...",,https://a0.muscache.com/pictures/miso/Hosting-...,3013025,...,,,,,f,1,0,1,0,
1,572612125615500056,https://www.airbnb.com/rooms/572612125615500056,20240904164109,2024-09-04,city scrape,Room by Sunny & Bay! Sunset Park & Bay Ridge,Cozy room in a charming Sunset Park apartment....,,https://a0.muscache.com/pictures/5f44a178-6043...,358089614,...,4.83,4.67,4.67,,t,2,0,2,0,0.21
2,45267941,https://www.airbnb.com/rooms/45267941,20240904164109,2024-09-04,city scrape,Private Room in Luxury Apartment,,,https://a0.muscache.com/pictures/3c15a88e-b08a...,39162543,...,,,,,f,3,2,1,0,
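The `Unnamed: 0` column at the start of the preview is most likely a row index that was written into the CSV when it was saved. A minimal sketch for dropping such columns (the helper name is hypothetical):

```python
import io
import pandas as pd

def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that read_csv labelled "Unnamed: N" because their header cell was blank."""
    leftovers = [col for col in df.columns if str(col).startswith("Unnamed:")]
    return df.drop(columns=leftovers)

# Toy CSV whose first header cell is blank, as DataFrame.to_csv() writes it by default (index=True)
demo = pd.read_csv(io.StringIO(",id,price\n0,101,$10.00\n1,102,$20.00\n"))
print(demo.columns.tolist())                        # ['Unnamed: 0', 'id', 'price']
print(drop_unnamed_columns(demo).columns.tolist())  # ['id', 'price']
```

Alternatively, `pd.read_csv('NYC_listings.csv', index_col=0)` reads that first column as the index rather than as a data column.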


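In [12]:
# Fetch all the zip codes
# NOTE: the source file is not named in the original; 'nyc_zip_codes.csv' is a hypothetical placeholder
zip_codes = pd.read_csv('nyc_zip_codes.csv', dtype=str)  # read as text so codes are not parsed as numbers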
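Whichever source the zip codes come from, they need a consistent format before they can be matched against census tables. This sketch assumes raw values may include ZIP+4 suffixes or float artifacts such as `"11215.0"` (typical when a zip column with missing values is parsed as numbers); `normalize_zip` is a hypothetical helper.

```python
import pandas as pd

def normalize_zip(zips: pd.Series) -> pd.Series:
    """Reduce raw zip values to 5-digit strings; values without a leading 5-digit run become missing."""
    as_text = zips.astype("string").str.strip()
    # Keep only the leading five digits: drops "-1234" suffixes and ".0" float artifacts
    return as_text.str.extract(r"^(\d{5})", expand=False)

raw = pd.Series(["10001", "10001-1234", " 11211 ", "11215.0", None, "NY"])
print(normalize_zip(raw))
```

Keeping zip codes as strings, not integers, avoids losing leading zeros for zip codes outside NYC and keeps the join key type identical on both sides of a merge.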

### 1.3 Data Exploration and Cleaning <a id = "1_3"></a>

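Many of the raw columns, such as the URL and scrape-metadata fields, are unlikely to carry pricing signal, and some (e.g. `license` in the preview) are mostly empty. A minimal sketch of dropping both kinds; the column selection and the 90% missing-value threshold are hypothetical choices, not decisions from the original analysis.

```python
import pandas as pd

# Hypothetical choice: URL and scrape-metadata columns from the listings file
URL_AND_METADATA = ["listing_url", "scrape_id", "last_scraped", "source",
                    "picture_url", "host_url", "host_thumbnail_url", "host_picture_url"]

def drop_unneeded_columns(df: pd.DataFrame, max_missing: float = 0.9) -> pd.DataFrame:
    """Drop known-irrelevant columns plus any column whose share of missing values exceeds max_missing."""
    missing_share = df.isna().mean()  # fraction of missing values per column
    too_sparse = missing_share[missing_share > max_missing].index.tolist()
    to_drop = list(dict.fromkeys(URL_AND_METADATA + too_sparse))  # de-duplicate, keep order
    # errors="ignore" skips names that are absent, e.g. if a later scrape renames a column
    return df.drop(columns=to_drop, errors="ignore")

toy = pd.DataFrame({
    "id": [1, 2, 3],
    "listing_url": ["u1", "u2", "u3"],
    "license": [None, None, None],
    "price": ["$10.00", "$20.00", None],
})
print(drop_unneeded_columns(toy).columns.tolist())  # ['id', 'price']
```

Inspecting `df.isna().mean().sort_values()` on the real data first is a cheap way to pick a threshold that fits this dataset.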
In [13]:
# There are 75 columns; we obviously don't need all of them, so let's see which ones we can drop
df_listing.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name',
       'description', 'neighborhood_overview', 'picture_url', 'host_id',
       'host_url', 'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'ca