<p align="center">
<img src="https://github.com/datacamp/python-live-training-template/blob/master/assets/datacamp.svg?raw=True" alt = "DataCamp icon" width="50%">
</p>
<br><br>

## **Python PySpark Live Training Template**

_Enter a brief description of your session, here's an example below:_

Welcome to this hands-on training where you will immerse yourself in data visualization in Python. Using both `matplotlib` and `seaborn`, we'll learn how to create visualizations that are presentation-ready.

In this session, you will learn how to:

* Create various types of plots, including bar-plots, distribution plots, box-plots and more using Seaborn and Matplotlib.
* Format and stylize your visualizations to make them report-ready.
* Use sub-plots to build clearer visualizations and supercharge your workflow.

## **The Dataset**

_Enter a brief description of your dataset and its columns, here's an example below:_


The dataset used in this webinar is a CSV file named `airbnb.csv`, which contains data on Airbnb listings in the state of New York. It contains the following columns:

- `listing_id`: The unique identifier for a listing
- `description`: The description used on the listing
- `host_id`: Unique identifier for a host
- `host_name`: Name of host
- `neighbourhood_full`: Name of boroughs and neighbourhoods
- `coordinates`: Coordinates of listing _(latitude, longitude)_
- `Listing added`: Date of added listing
- `room_type`: Type of room 
- `rating`: Rating from 0 to 5
- `price`: Price per night for listing
- `number_of_reviews`: Number of reviews received
- `last_review`: Date of last review
- `reviews_per_month`: Number of reviews per month
- `availability_365`: Number of days available per year
- `Number of stays`: Total number of stays thus far
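
Since `coordinates` packs two values into one column, it usually needs splitting before analysis. The following is a minimal sketch, assuming each value is stored as a string of the form `(latitude, longitude)`; the helper name `parse_coordinates` and the sample value are hypothetical and not part of the dataset documentation.

```python
def parse_coordinates(raw):
    """Split a '(latitude, longitude)' string into a pair of floats.

    Returns (None, None) when the value is missing or malformed.
    """
    if not raw:
        return (None, None)
    try:
        # Drop the surrounding parentheses, then convert both parts to float.
        lat, lon = (float(part) for part in raw.strip().strip("()").split(","))
    except ValueError:
        # Raised for non-numeric parts and for the wrong number of parts.
        return (None, None)
    return (lat, lon)


# Illustrative value only; the real column may be formatted differently.
print(parse_coordinates("(40.5, -73.25)"))
```

Once a Spark session is available, a function like this could be wrapped with `pyspark.sql.functions.udf` to derive separate latitude and longitude columns.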


## **Setting up a PySpark session**

The following code sets up a PySpark session in Google Colab. Run each snippet in order to enable PySpark.

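```python
# Just run this code: it installs Java 8, downloads Spark 2.4.5, and installs findspark
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
# If this download fails, the release may have moved to the Apache archive (archive.apache.org/dist/spark/)
!wget -q https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
!tar xf spark-2.4.5-bin-hadoop2.7.tgz
!pip install -q findspark
```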

```python
# Just run this code too!
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.5-bin-hadoop2.7"
```
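
If the session later fails to start, a common cause is that one of these two paths does not exist, for example because the download step did not complete. The sketch below is an optional sanity check using only the standard library; the helper name `missing_spark_paths` is hypothetical.

```python
import os


def missing_spark_paths(env=os.environ):
    """Return the required variables that are unset or do not point to a directory."""
    required = ("JAVA_HOME", "SPARK_HOME")
    # os.path.isdir("") is False, so unset variables are reported as well.
    return [name for name in required if not os.path.isdir(env.get(name, ""))]


problems = missing_spark_paths()
if problems:
    print("Check these variables before calling findspark.init():", problems)
else:
    print("JAVA_HOME and SPARK_HOME both point to existing directories.")
```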

```python
# Set up a Spark session
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
```

## **Getting started**

```python
# Import other relevant libraries
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
```

```python
# Get dataset into local environment
!wget -O /tmp/airbnb.csv 'https://github.com/datacamp/python-live-training-template/blob/master/data/airbnb.csv?raw=True'
airbnb = spark.read.csv('/tmp/airbnb.csv', inferSchema=True, header=True)
```
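
As an illustration of how the `VectorAssembler` and `LinearRegression` imports might be used with this dataset, here is a minimal sketch. It assumes that `price` and the chosen feature columns were inferred as numeric types; the helper name `fit_price_model` and the choice of features are hypothetical, not part of the template.

```python
# Illustrative feature choice, taken from the column list in "The Dataset".
FEATURE_COLS = ["number_of_reviews", "reviews_per_month", "availability_365"]


def fit_price_model(df, feature_cols, label_col="price"):
    """Fit a linear regression of label_col on feature_cols and return the fitted model."""
    # Imported inside the function so it can be defined before the setup cells have run.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    # VectorAssembler raises an error on null values by default, so drop incomplete rows.
    clean = df.select(*feature_cols, label_col).dropna()

    # Combine the feature columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    assembled = assembler.transform(clean)

    lr = LinearRegression(featuresCol="features", labelCol=label_col)
    return lr.fit(assembled)


# Example call, once `airbnb` has been loaded as shown above:
# model = fit_price_model(airbnb, FEATURE_COLS)
# print(model.coefficients, model.intercept)
```

If `inferSchema` reads any of these columns as strings, they would need to be cast to a numeric type before fitting.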

### **Examples on use of markdown**

#### **Images**

To add images, GIFs, or similar assets, use the HTML `<img>` tag as in the following snippet:
```
<p align="">
<img src="" alt = "alt-text" width="100%">
</p>
<br><br>
```

- The `align` attribute takes `"center"`, `"left"`, or `"right"`.
- The `src` attribute takes the raw link to your image.
- The `width` attribute takes a percentage of the available width, so `100%` makes the image span the full width of its container.


#### **Formulas**

To write formulas, use LaTeX notation as follows:

$y = ax + b$

You can even use colors, as in this example, where the coefficients are colored in red:

$y = {\color{red}a} x + {\color{red}b}$
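
For a formula that should sit on its own line, renderers that support the inline `$...$` syntax generally also accept a `$$...$$` block. The following is a sketch, assuming MathJax-style rendering as found in Colab and Jupyter notebooks; the braces around each `\color` command limit the color to a single coefficient.

```latex
$$
y = {\color{red}a}\,x + {\color{red}b}
$$
```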

#### **Changing font color and size**

To change the color of specific text or highlight it, you can use the following:

```
<font color="#00AAFF">**Example text**</font>
```

The result will look like <font color="#00AAFF">**Example text**</font>.

- The `color` attribute takes a HEX code for your color, prefixed with `#`.