<p align="center">
<img src="https://github.com/adelnehme/python-crash-course-an-introduction-to-spreadsheet-users/blob/main/assets/dc_logo.png?raw=true" alt = "DataCamp Amazon icon" width="65%">
</p>


## **Python Crash Course—An Introduction to Spreadsheet Users**


### **Key session takeaways**

* Import data into Python using `pandas` — Python’s most popular data analysis package.
* Filter, add new columns, and analyse datasets using pandas.
* Present data visualizations using `matplotlib` and `seaborn` — Python's most popular data visualization packages.

### **The Dataset**

The dataset used in this session is a CSV file named `airbnb.csv`, which contains data on Airbnb listings in the state of New York. It contains the following columns:

- `listing_id`: The unique identifier for a listing
- `description`: The description used on the listing
- `host_id`: Unique identifier for a host
- `neighbourhood_full`: Name of boroughs and neighbourhoods
- `coordinates`: Coordinates of listing _(latitude, longitude)_
- `listing_added`: Date of added listing
- `room_type`: Type of room
- `rating`: Rating from 0 to 5
- `price`: Price per night for listing
- `number_of_reviews`: Number of reviews received
- `reviews_per_month`: Number of reviews per month
- `availability_365`: Number of days available per year
- `number_of_stays`: Total number of stays thus far

### **Questions to answer**

- **Question 1:** What is the distribution of price per room type?
- **Question 2:** What is the number of listings per borough?
- **Question 3:** What is the number of listings per year?
- **Question 4:** What is the number of listings per year in each borough?


## **Getting started**

In [1]:
# Import libraries


<font color=#03ef62>*To import a CSV file into* `pandas`*, we use*</font> `data = pd.read_csv(file_path)`. <font color=#03ef62>*Check out this [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) for importing other file formats.*</font>
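
As a minimal sketch of the import step, the snippet below reads a CSV with `pd.read_csv`. So that it runs without `airbnb.csv` on disk, it reads a small made-up CSV string through `io.StringIO`; the rows are illustrative assumptions, not values from the real dataset. In the session you would pass the path to `airbnb.csv` instead.

```python
import io

import pandas as pd

# Made-up CSV text standing in for airbnb.csv (illustrative values only)
csv_text = """listing_id,room_type,price
1,Private room,$45
2,Entire home/apt,$135
"""

# pd.read_csv accepts a file path or a file-like object
data = pd.read_csv(io.StringIO(csv_text))
print(data)
```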


In [2]:
# Read in the dataset


<font color=03ef62>*Some common methods for getting a better understanding of your DataFrames:*</font>

- `data.head()` <font color=03ef62>*prints the first 5 rows*</font>
- `data.describe()` <font color=03ef62>*prints summary statistics (count, mean, standard deviation, quartiles) of numeric columns*</font>
- `data.info()` <font color=03ef62>*prints the non-null count and data type of each column*</font>
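
The three inspection methods above can be tried on a small DataFrame. This is only a sketch: the six rows below are made up for illustration and do not come from `airbnb.csv`.

```python
import pandas as pd

# Made-up listings (illustrative values only)
data = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Shared room",
                  "Private room", "Entire home/apt", "Private room"],
    "price": [50.0, 100.0, 150.0, 50.0, 100.0, 150.0],
    "rating": [4.5, None, 3.9, 4.8, 4.1, 4.0],
})

first_rows = data.head()    # first 5 rows of the DataFrame
summary = data.describe()   # summary statistics of the numeric columns
print(first_rows)
print(summary)
data.info()                 # prints non-null counts and data types
```

Note how the `count` row of `describe()` and the non-null counts of `info()` both reveal the missing `rating` value.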

In [None]:
# Inspect header of dataset


In [None]:
# Check description of dataset


In [None]:
# Check information of dataset


## **Data Cleaning**

### **Data cleaning to-do list!**

_Data type problems:_

- _**Task 1**: Remove_ `$` _from_ `price` _and convert it to_ `float`
- _**Task 2**: Convert_ `listing_added` _to_ `datetime`

<br>

_Text/categorical data problems:_

- _**Task 3**: Extract borough from_ `neighbourhood_full`

<br>

_Dealing with missing data:_

- _**Task 4**: Deal with missing data in_ `host_id` _and_ `description` _columns_

<br>


##### **Task 1:** Remove `$` from `price` and convert it to `float`


<font color="03ef62"> _To remove a character from the start and end of each value in a string column, we can use:_ </font>

```
data['column_name'] = data['column_name'].str.strip(character)
```

<font color="03ef62"> _To convert a numeric column to a float, we can use:_ </font>

```
data['column_name'] = data['column_name'].astype('float')
```
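
Put together, the two steps might look like the sketch below. The three price strings are made up, and the snippet assumes every value carries the `$` as a prefix.

```python
import pandas as pd

# Made-up prices stored as text (illustrative values only)
data = pd.DataFrame({"price": ["$45", "$135", "$150"]})

# Strip the "$" from the ends of each value
data["price"] = data["price"].str.strip("$")

# Convert the cleaned strings to floats
data["price"] = data["price"].astype("float")
print(data["price"])
```

Because `.str.strip()` only trims the ends of each string, a character in the middle of a value (a thousands separator, for example) would need `.str.replace()` instead.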

In [None]:
# Strip $ from price column

# Convert price column to float


# Print header again


##### **Task 2:** Convert `listing_added` to `datetime`

<font color="03ef62"> _To convert a date column to_ </font> `datetime`<font color="03ef62">_, we can use:_</font>


```
data['column_name'] = pd.to_datetime(data['column_name'])
```
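
A short sketch of the conversion on made-up dates follows. The `YYYY-MM-DD` format used here is an assumption; the real `listing_added` column may be formatted differently.

```python
import pandas as pd

# Made-up dates stored as text (format is an assumption)
data = pd.DataFrame(
    {"listing_added": ["2018-06-08", "2018-12-25", "2019-01-17"]}
)

# Parse the strings into datetime values
data["listing_added"] = pd.to_datetime(data["listing_added"])
print(data.dtypes)
```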

In [None]:
# Convert listing_added to datetime

# Print info again


##### **Task 3:** Extract borough from `neighbourhood_full` column

<font color="03ef62"> _To split a column into multiple columns, we can use:_</font>

```
split_data = data['column_name'].str.split(',', expand = True)
```
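
The whole of Task 3 could be sketched as below, assuming `neighbourhood_full` holds values of the form `Borough, Neighbourhood` (the two rows are made up). The column names `borough` and `neighbourhood` follow the exercise comments.

```python
import pandas as pd

# Made-up values; the "Borough, Neighbourhood" layout is an assumption
data = pd.DataFrame(
    {"neighbourhood_full": ["Brooklyn, Flatlands", "Manhattan, Harlem"]}
)

# Split on the comma; expand=True returns one column per piece
split_data = data["neighbourhood_full"].str.split(",", expand=True)

# Assign the pieces, trimming the space left after the comma
data["borough"] = split_data[0]
data["neighbourhood"] = split_data[1].str.strip()

# Drop the original combined column
data = data.drop(columns=["neighbourhood_full"])
print(data)
```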


In [None]:
# Create new DataFrame with split column



In [None]:
# Create borough and neighbourhood columns


# Print header of columns


In [None]:
# Drop original neighbourhood_full column


##### **Task 4:** Deal with missing values in `host_id` and `description` columns


<font color="03ef62"> _To count and drop missing values in a DataFrame, you can use the following:_</font>

- `data.isna().sum()` <font color="03ef62">_to count missing values_</font>
- `data.dropna()` <font color="03ef62">_returns a copy with the rows containing missing values dropped_</font>
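
As a sketch on made-up rows, counting and dropping missing values might look like this. Restricting the drop to `host_id` and `description` through the `subset` argument is one possible choice here, not necessarily the one used in the session.

```python
import pandas as pd

# Made-up rows with gaps in host_id and description
data = pd.DataFrame({
    "host_id": [101.0, None, 103.0],
    "description": ["Cozy studio", "Sunny loft", None],
    "price": [45.0, 135.0, 150.0],
})

# Count missing values per column
missing_counts = data.isna().sum()
print(missing_counts)

# Drop rows where host_id or description is missing
data = data.dropna(subset=["host_id", "description"])

# Count missing values again
print(data.isna().sum())
```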

In [None]:
# Find missing values


In [None]:
# Drop missing values

# Count missing values again




---

<center><h1> Q&A</h1> </center>

---





## **Data Analysis and Visualization**

##### **Question 1:** What is the distribution of price per room type?


<p align="center">
<img src="https://github.com/adelnehme/python-crash-course-an-introduction-to-spreadsheet-users/blob/main/assets/boxplot_image.png?raw=true" alt = "boxplot" width="65%">
</p>


<font color="03ef62"> _To create a boxplot using_</font> `seaborn`<font color="03ef62"> _we can use:_</font>

- `sns.boxplot(x = , y = , data = )`
  - `x`: <font color="03ef62"> _column name on x-axis_</font> 
  - `y`: <font color="03ef62"> _column name on y-axis_</font> 
  - `data`: <font color="03ef62"> _data being used_</font> 
- `plt.title()`: <font color="03ef62"> _sets plot title_</font> 
- `plt.xlabel()`: <font color="03ef62"> _sets x-axis label_</font> 
- `plt.ylabel()`: <font color="03ef62"> _sets y-axis label_</font> 
- `plt.ylim()`: <font color="03ef62"> _sets y-axis limits_</font> 
- `plt.show()`: <font color="03ef62"> _shows plot_</font>  
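
A minimal sketch of such a boxplot on made-up prices is shown below. The titles, labels, and the y-axis limit of 500 (set with `plt.ylim()`, as the exercise cell asks) are illustrative choices, not values taken from the session.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up listings (illustrative values only)
data = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Private room",
                  "Entire home/apt", "Entire home/apt", "Entire home/apt"],
    "price": [40.0, 55.0, 70.0, 120.0, 150.0, 400.0],
})

# Create plot: one box per room type
ax = sns.boxplot(x="room_type", y="price", data=data)

# Set titles and labels of plot
plt.title("Price by room type")
plt.xlabel("Room type")
plt.ylabel("Price per night")

# Set y-axis limit to keep extreme prices from squashing the boxes
plt.ylim(0, 500)

# Show plot
plt.show()
```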


In [None]:
# Visualize price by room type

# Create plot

# Set titles and labels of plot

# Set y-axis limit

# Show plot


##### **Question 2:** What is the number of listings per borough?

<font color="03ef62"> _To easily show the count of observations in each category using_</font> `seaborn`<font color="03ef62"> _we can use:_</font>

- `sns.countplot(x = , data = )`
  - `x`: <font color="03ef62"> _categorical column name on x-axis_</font> 
  - `data`: <font color="03ef62"> _data being used_</font> 
- `plt.title()`: <font color="03ef62"> _sets plot title_</font> 
- `plt.xlabel()`: <font color="03ef62"> _sets x-axis label_</font> 
- `plt.ylabel()`: <font color="03ef62"> _sets y-axis label_</font> 
- `plt.show()`: <font color="03ef62"> _shows plot_</font>  
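
The count plot can be sketched on a handful of made-up boroughs as follows; the title and axis labels are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up borough values (illustrative only)
data = pd.DataFrame({
    "borough": ["Brooklyn", "Manhattan", "Manhattan",
                "Queens", "Manhattan", "Brooklyn"],
})

# Count the number of listings per borough: one bar per category
ax = sns.countplot(x="borough", data=data)

# Set titles and labels of plot
plt.title("Number of listings per borough")
plt.xlabel("Borough")
plt.ylabel("Number of listings")

# Show plot
plt.show()
```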

In [None]:
# Count the number of listings per borough

# Set titles and labels of plot


# Show plot


##### **Question 3:** What is the number of listings per year?
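
One way to approach this question, sketched on made-up dates: pull the year out of a `datetime` column with the `.dt.year` accessor, then count listings per year. The column name `listing_year` and the figure size are hypothetical choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up listing dates, already converted to datetime
data = pd.DataFrame({
    "listing_added": pd.to_datetime(
        ["2017-03-01", "2018-06-08", "2018-12-25", "2019-01-17"]
    ),
})

# Extract the year (listing_year is a hypothetical column name)
data["listing_year"] = data["listing_added"].dt.year

# Set figure size (width, height in inches)
plt.figure(figsize=(8, 4))

# Count the number of listings per year
ax = sns.countplot(x="listing_year", data=data)

# Set titles and labels of plot
plt.title("Number of listings per year")
plt.xlabel("Year")
plt.ylabel("Number of listings")

# Show plot
plt.show()
```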

In [None]:
# Extract listing year column from listing_added column


In [None]:
# Set figure size


# Count the number of listings per year


# Set titles and labels of plot


# Show plot


##### **Question 4:** What is the number of listings per year in each borough?
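
A possible sketch for this question passes the `hue` argument to `sns.countplot`, which splits each year's bar by borough. The data is made up, and `listing_year` is a hypothetical column name.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up listings (illustrative values only)
data = pd.DataFrame({
    "listing_year": [2018, 2018, 2018, 2019, 2019, 2019],
    "borough": ["Brooklyn", "Manhattan", "Manhattan",
                "Brooklyn", "Brooklyn", "Manhattan"],
})

# Set figure size (width, height in inches)
plt.figure(figsize=(8, 4))

# Count the number of listings per year for each borough
ax = sns.countplot(x="listing_year", hue="borough", data=data)

# Set titles and labels of plot
plt.title("Number of listings per year in each borough")
plt.xlabel("Year")
plt.ylabel("Number of listings")

# Show plot
plt.show()
```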

In [None]:
# Set figure size

# Count the number of listings per year for each borough

# Set titles and labels of plot


# Show plot




---

<center><h1> Q&A</h1> </center>

---



