# Airbnb Listings
This dataset consists of six files of Airbnb rental listings, one for each of six cities: Austin, Bangkok, Buenos Aires, Cape Town, Istanbul, and Melbourne. Each row represents a listing, with details such as coordinates, neighborhood, host id, price per night, and number of reviews.

Not sure where to begin? Scroll to the bottom to find challenges!

## Other cities

The file names for the other cities are `listings_austin.csv`, `listings_bangkok.csv`, `listings_buenoes_aires.csv`, `listings_cape_town.csv`, and `listings_istanbul.csv`. If you want data on other locations, visit the source of the dataset, [InsideAirbnb](http://insideairbnb.com), and upload it to your workspace.
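
Because the files share one naming pattern, they can be read in a loop and stacked into a single DataFrame. The following is a minimal sketch, assuming all files sit in one directory; the function name `load_all_listings` and the added `city` column are illustrative and not part of the dataset.

```python
from pathlib import Path

import pandas as pd


def load_all_listings(data_dir):
    """Read every listings_<city>.csv in data_dir into one DataFrame."""
    prefix = "listings_"
    frames = []
    for path in sorted(Path(data_dir).glob(f"{prefix}*.csv")):
        # Derive the city label from the file name, e.g. "cape_town".
        city = path.stem[len(prefix):]
        frames.append(pd.read_csv(path).assign(city=city))
    if not frames:
        raise FileNotFoundError(f"no {prefix}*.csv files found in {data_dir}")
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling `load_all_listings("data")` would return one frame covering every city file in that folder. Since `price` is given in local currency, values from different cities are not directly comparable without conversion.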

## Data Dictionary

| Column                            | Explanation                                                                                                                                                                                        |
| --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| id                                | Airbnb's unique identifier for the listing                                                                                                                                                         |
| name                              | Name of the listing                                                                                                                                                                                |
| host\_id                          | Airbnb's unique identifier for the host                                                                                                                                                            |
| host\_name                        | Name of the host                                                                                                                                                                                   |
| neighbourhood\_group              | The neighbourhood group as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.                                                        |
| neighbourhood                     | The neighbourhood as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.                                                              |
| latitude                          | Uses the World Geodetic System (WGS84) projection for latitude and longitude.                                                                                                                      |
| longitude                         | Uses the World Geodetic System (WGS84) projection for latitude and longitude.                                                                                                                      |
| room\_type                        | The type of space offered, for example `Entire home/apt` or `Private room`                                                                                                                         |
| price                             | Daily price in local currency. Note that a $ sign may be used regardless of locale.                                                                                                                |
| minimum\_nights                   | Minimum number of nights per stay for the listing (calendar rules may differ)                                                                                                                      |
| number\_of\_reviews               | The number of reviews the listing has                                                                                                                                                              |
| last\_review                      | The date of the last/newest review                                                                                                                                                                 |
| calculated\_host\_listings\_count | The number of listings the host has in the current scrape, in the city/region geography.                                                                                                           |
| availability\_365                 | The availability of the listing over the next 365 days, as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.       |
| number\_of\_reviews\_ltm          | The number of reviews the listing has (in the last 12 months)                                                                                                                                      |
| license                           |                                                                                                                                                                                                    |

The data for each city was compiled by [InsideAirbnb](http://insideairbnb.com) between October and November 2021.

[Source](http://insideairbnb.com/get-the-data.html) and [license](https://creativecommons.org/licenses/by/4.0/) of dataset. 
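
The data dictionary notes that `price` may carry a `$` sign and that `last_review` is a date. The sketch below shows one way to normalise both columns after loading. The three sample rows are made up for illustration, and the actual files may already store `price` as a plain number, in which case the cleaning step is unnecessary.

```python
import io

import pandas as pd

# Hypothetical sample rows; the price formats here are assumptions for illustration.
sample = io.StringIO(
    "id,price,last_review\n"
    '1,"$1,250.00",2021-07-02\n'
    "2,$114.00,\n"
    "3,39,2017-02-24\n"
)

# parse_dates converts last_review to datetime; empty cells become NaT.
listings = pd.read_csv(sample, parse_dates=["last_review"])

# Keep only digits and the decimal point, then convert to float.
# This assumes "." is the decimal separator and prices are never negative.
listings["price"] = (
    listings["price"]
    .astype(str)
    .str.replace(r"[^0-9.]", "", regex=True)
    .astype(float)
)
```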

## Don't know where to start?

**Challenges are brief tasks designed to help you practice specific skills:**

- 🗺️ **Explore**: What is the distribution of prices across a city's neighborhoods? How does it change when you segment it further by `room_type`?
- 📊 **Visualize**: Create a map with a dot for each listing in a city and add a color scale based on `price` on the dots.
- 🔎 **Analyze**: How do listings that require a minimum stay of a week or longer differ from those that don't?
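
As a possible starting point for the Explore and Analyze challenges, the sketch below groups prices by neighbourhood and room type, then compares listings by minimum stay. It runs on a tiny made-up sample that reuses the dataset's column names; the `stay` column and its labels are illustrative choices.

```python
import pandas as pd

# Tiny made-up sample with the same column names as the dataset.
listings = pd.DataFrame({
    "neighbourhood": [78702, 78702, 78702, 78704, 78704, 78729],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [179, 114, 108, 109, 60, 39],
    "minimum_nights": [7, 30, 2, 3, 1, 1],
})

# Explore: summary statistics of price per neighbourhood.
by_hood = listings.groupby("neighbourhood")["price"].describe()

# Explore further: median price per neighbourhood and room type, one column per room type.
by_hood_room = (
    listings.groupby(["neighbourhood", "room_type"])["price"].median().unstack()
)

# Analyze: label listings by whether they require a stay of a week or longer.
listings["stay"] = listings["minimum_nights"].ge(7).map(
    {True: "week_or_longer", False: "shorter"}
)
stay_summary = listings.groupby("stay")["price"].agg(["count", "median"])
```

On the real data, replace the sample frame with the result of `pd.read_csv` for the city of interest.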

**Scenarios are broader questions to help you develop an end-to-end project for your portfolio:**

An international real estate firm has hired you to research professional hosting on Airbnb. These are hosts that have multiple listings, make considerable income from their listings, and often manage teams to operate their listings. Examples include property managers and hospitality business owners.

Using the data from all six cities, you'll have to infer listings by professional hosts based on the distribution of `calculated_host_listings_count`. The lead consultant is interested in whether you can identify trends across listings operated by inferred professional hosts, as well as an estimation of the percentage of listings on Airbnb operated by professional hosts.

You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
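
One way to approach the scenario is to look at the per-host distribution of `calculated_host_listings_count` and then apply a cutoff. The sketch below uses made-up rows, and `PRO_THRESHOLD = 3` is a hypothetical cutoff chosen only for illustration; the real value should come from inspecting the distribution in each city.

```python
import pandas as pd

# Made-up sample: one row per listing, as in the dataset.
listings = pd.DataFrame({
    "id": range(1, 11),
    "host_id": [10, 10, 10, 10, 10, 20, 20, 30, 40, 50],
    "calculated_host_listings_count": [5, 5, 5, 5, 5, 2, 2, 1, 1, 1],
})

# Hypothetical cutoff: hosts with at least this many listings count as professional.
PRO_THRESHOLD = 3

# Distribution of listings per host (one value per host, not per listing).
per_host = listings.drop_duplicates("host_id")["calculated_host_listings_count"]
distribution = per_host.value_counts().sort_index()

# Flag listings operated by inferred professional hosts.
listings["professional_host"] = (
    listings["calculated_host_listings_count"] >= PRO_THRESHOLD
)

# Percentage of listings, and of hosts, that fall on the professional side of the cutoff.
share_of_listings = listings["professional_host"].mean() * 100
share_of_hosts = (per_host >= PRO_THRESHOLD).mean() * 100
```

The two percentages can differ widely, because a small number of hosts may operate a large share of listings, so it helps to report both.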

# Importing data with read_csv()

In [3]:
import pandas as pd

# Read the CSV file
airbnb_data = pd.read_csv("data/listings_austin.csv")

# Display the DataFrame (pandas truncates the output to the first and last rows)
airbnb_data

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
0,2265,Zen-East in the Heart of Austin (monthly rental),2466,Paddy,,78702,30.277520,-97.713770,Entire home/apt,179,7,26,2021-07-02,0.36,3,35,2,
1,5245,"Eco friendly, Colorful, Clean, Cozy monthly share",2466,Paddy,,78702,30.276140,-97.713200,Private room,114,30,9,2017-02-24,0.21,3,0,0,
2,5456,"Walk to 6th, Rainey St and Convention Ctr",8028,Sylvia,,78702,30.260570,-97.734410,Entire home/apt,108,2,575,2021-09-25,24.16,1,324,39,
3,5769,NW Austin Room,8186,Elizabeth,,78729,30.456970,-97.784220,Private room,39,1,264,2021-07-03,5.95,1,0,7,
4,6413,Gem of a Studio near Downtown,13879,Todd,,78704,30.248850,-97.735870,Entire home/apt,109,3,117,2021-04-02,1.27,1,0,4,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11264,52772517,"Perfect for F1 | Modern, Cozy 1B Gem near Down...",243684594,Shirley,,78756,30.319251,-97.732620,Entire home/apt,128,1,0,,,39,357,0,
11265,52772519,"Perfect for F1 | Modern, Cozy 1B Gem near Down...",243684594,Shirley,,78756,30.319457,-97.730823,Entire home/apt,120,1,0,,,39,364,0,
11266,52773211,South Austin Duplex,29154315,David,,78747,30.154323,-97.758275,Entire home/apt,257,2,0,,,2,80,0,
11267,52775433,Upscale apartment home | 1 BR in Austin,359036978,Casey,,78758,30.399703,-97.708126,Entire home/apt,157,90,0,,,293,365,0,


## Selecting a column as index

In [4]:
# Setting the id column as the index
airbnb_data = pd.read_csv("data/listings_austin.csv", index_col="id")
# Alternatively, select the index column by position:
# airbnb_data = pd.read_csv("data/listings_austin.csv", index_col=0)

# Preview first 5 rows
airbnb_data.head()

id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
2265,Zen-East in the Heart of Austin (monthly rental),2466,Paddy,,78702,30.27752,-97.71377,Entire home/apt,179,7,26,2021-07-02,0.36,3,35,2,
5245,"Eco friendly, Colorful, Clean, Cozy monthly share",2466,Paddy,,78702,30.27614,-97.7132,Private room,114,30,9,2017-02-24,0.21,3,0,0,
5456,"Walk to 6th, Rainey St and Convention Ctr",8028,Sylvia,,78702,30.26057,-97.73441,Entire home/apt,108,2,575,2021-09-25,24.16,1,324,39,
5769,NW Austin Room,8186,Elizabeth,,78729,30.45697,-97.78422,Private room,39,1,264,2021-07-03,5.95,1,0,7,
6413,Gem of a Studio near Downtown,13879,Todd,,78704,30.24885,-97.73587,Entire home/apt,109,3,117,2021-04-02,1.27,1,0,4,
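
One practical reason to use `id` as the index is label-based lookup with `.loc`. The sketch below illustrates this with three rows copied from the preview above, trimmed to a few columns and read from an in-memory string instead of the file.

```python
import io

import pandas as pd

# Three rows from the preview above, trimmed to a few columns.
csv_text = io.StringIO(
    "id,name,room_type,price\n"
    "2265,Zen-East in the Heart of Austin (monthly rental),Entire home/apt,179\n"
    "5769,NW Austin Room,Private room,39\n"
    "6413,Gem of a Studio near Downtown,Entire home/apt,109\n"
)
by_id = pd.read_csv(csv_text, index_col="id")

# With id as the index, a listing can be looked up directly by its id.
row = by_id.loc[5769]
price = by_id.loc[6413, "price"]

# reset_index() turns id back into a regular column if needed.
flat = by_id.reset_index()
```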


## Selecting specific columns to read into memory

In [5]:
# Defining the columns to read 
usecols = ["id", "name", "host_id", "neighbourhood", "room_type", "price", "minimum_nights"]

# Read data with subset of columns
airbnb_data = pd.read_csv("data/listings_austin.csv", index_col="id", usecols=usecols)

# Preview first 5 rows
airbnb_data.head()

id,name,host_id,neighbourhood,room_type,price,minimum_nights
2265,Zen-East in the Heart of Austin (monthly rental),2466,78702,Entire home/apt,179,7
5245,"Eco friendly, Colorful, Clean, Cozy monthly share",2466,78702,Private room,114,30
5456,"Walk to 6th, Rainey St and Convention Ctr",8028,78702,Entire home/apt,108,2
5769,NW Austin Room,8186,78729,Private room,39,1
6413,Gem of a Studio near Downtown,13879,78704,Entire home/apt,109,3


# Reading data from a URL

In [6]:
# Webpage URL 
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Define the column names
col_names = ["sepal_length_in_cm",
             "sepal_width_in_cm", 
             "petal_length_in_cm", 
             "petal_width_in_cm", 
             "class"]

# Read data from URL
iris_data = pd.read_csv(url, names=col_names)

iris_data.head() 

,sepal_length_in_cm,sepal_width_in_cm,petal_length_in_cm,petal_width_in_cm,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


# Methods of the dataframe structure

## .head() and .tail()

In [7]:
# See the first N rows (N defaults to 5)
iris_data.head()

,sepal_length_in_cm,sepal_width_in_cm,petal_length_in_cm,petal_width_in_cm,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [8]:
# See the last N rows (N defaults to 5)
iris_data.tail()

,sepal_length_in_cm,sepal_width_in_cm,petal_length_in_cm,petal_width_in_cm,class
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica


In [9]:
# Discover the column names
iris_data.columns

Index(['sepal_length_in_cm', 'sepal_width_in_cm', 'petal_length_in_cm',
       'petal_width_in_cm', 'class'],
      dtype='object')

In [10]:
# Get information on the dataframe
iris_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   sepal_length_in_cm  150 non-null    float64
 1   sepal_width_in_cm   150 non-null    float64
 2   petal_length_in_cm  150 non-null    float64
 3   petal_width_in_cm   150 non-null    float64
 4   class               150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [11]:
# Get descriptive statistics 
iris_data.describe()

,sepal_length_in_cm,sepal_width_in_cm,petal_length_in_cm,petal_width_in_cm
count,150.0,150.0,150.0,150.0
mean,5.843333,3.054,3.758667,1.198667
std,0.828066,0.433594,1.76442,0.763161
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


# Exporting the DataFrame to a CSV file

In [12]:
# Export the file to the current working directory
iris_data.to_csv("cleaned_iris_data.csv")

In [13]:
# Change the delimiter to a tab
iris_data.to_csv("tab_seperated_iris_data.csv", sep="\t")

In [14]:
# Export data without the index
iris_data.to_csv("tab_seperated_iris_data.csv", sep="\t", index=False)

# If you get UnicodeEncodeError use this...  
# iris_data.to_csv("tab_seperated_iris_data.csv", sep="\t", index=False, encoding='utf-8')

In [15]:
# Replace missing values with "Unknown"
iris_data.to_csv("tab_seperated_iris_data.csv", sep="\t", na_rep="Unknown")

In [16]:
# Do not include headers (column names) when exporting the data
iris_data.to_csv("tab_seperated_iris_data.csv", sep="\t", na_rep="Unknown", header=False)
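
To see what each option does without writing files, note that `to_csv()` returns the CSV text as a string when no path is given. The sketch below compares the options used above on a two-row made-up frame with one missing value.

```python
import pandas as pd

# Two made-up rows; the second has a missing sepal length.
df = pd.DataFrame({
    "sepal_length_in_cm": [5.1, None],
    "class": ["Iris-setosa", "Iris-virginica"],
})

# Defaults: comma-separated, index written as an unnamed first column.
default_text = df.to_csv()

# Tab-separated, no index, missing values written as "Unknown".
tab_text = df.to_csv(sep="\t", index=False, na_rep="Unknown")

# Same as above, but without the header row.
no_header_text = df.to_csv(sep="\t", index=False, na_rep="Unknown", header=False)

# splitlines() makes the rows easy to inspect regardless of line endings.
for label, text in [("default", default_text),
                    ("tab", tab_text),
                    ("no header", no_header_text)]:
    print(label, text.splitlines())
```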