# About this notebook

This notebook was created by Bella Ratmelia (bellar@smu.edu.sg) for SMU Libraries' bite-sized workshop "Python 101: Visualizing your data in Python" on 10 Feb 2023.

Datasets used in this workshop:

* Netflix stock data (we will retrieve it from Yahoo Finance with the help of the `yfinance` library)
* Survey data: `youth-survey.csv` contains the raw responses of a survey of 1,010 young people from the Czech Republic. This data is available on Kaggle.
* TikTok download data: `tiktok-downloads-by-country.csv` contains TikTok download figures for several countries. This data is available from Statista.

## Preparing our data

* install and import all the necessary packages
* load our data into DataFrames

In [None]:
%pip install yfinance

In [None]:
%pip install matplotlib

In [None]:
%pip install pandas

In [None]:
%pip install seaborn

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import yfinance

In [None]:
# load the survey data
survey = pd.read_csv("youth-survey.csv")

# load TikTok data
tiktok = pd.read_csv("tiktok-downloads-by-country.csv")

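# retrieve our time series data: Netflix (NFLX) daily stock prices for Jan-Mar 2020
# note: yfinance treats `end` as exclusive, so "2020-04-01" still includes 31 March
nflx = yfinance.download("NFLX", start="2020-01-01", end="2020-04-01")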
nflx.info()

# Using `matplotlib`

In [None]:
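fig, ax = plt.subplots()  # create a figure with a single Axes

# plot your data on `ax` here, e.g. ax.plot(x, y, label="my data")

ax.grid()    # show grid lines
ax.legend()  # show legend (needs at least one plot with a label, otherwise matplotlib warns)
plt.show()   # render the figure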

## Anatomy of a matplotlib graph

<img src="https://matplotlib.org/stable/_images/anatomy.png" alt="Drawing" style="width: 400px;"/>

## Line Chart
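
A minimal line-chart sketch using the object-oriented `Axes` interface. It uses a small made-up price series (obviously fake values, not real Netflix data) so it runs without a download; with the workshop data you would pass `nflx.index` and `nflx["Close"]` instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily closing prices, for illustration only (NOT real NFLX data)
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0],
    index=pd.date_range("2020-01-06", periods=5, freq="D"),
    name="Close",
)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(prices.index, prices.values, marker="o", label="Close")  # one line, a marker per point
ax.set_title("Closing price (illustrative data)")
ax.set_xlabel("Date")
ax.set_ylabel("Price (USD)")
ax.grid()
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they don't overlap
plt.show()
```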

## Bar Charts
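
A minimal bar-chart sketch, e.g. one bar per country as you might draw for the TikTok data. The categories and counts below are hypothetical placeholders; `ax.bar_label()` requires matplotlib 3.4 or newer.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and counts, for illustration only
categories = ["A", "B", "C", "D"]
counts = [23, 45, 12, 30]

fig, ax = plt.subplots()
bars = ax.bar(categories, counts, color="#69b3a2")  # vertical bars; ax.barh() draws horizontal ones
ax.bar_label(bars)  # write each bar's value above it
ax.set_xlabel("Category")
ax.set_ylabel("Count")
ax.set_title("Counts per category (illustrative data)")
plt.show()
```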

## Scatterplots
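
A minimal scatterplot sketch for two numeric variables. The points are randomly generated stand-ins (not survey responses); `alpha` makes overlapping points easier to see.

```python
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated data standing in for two numeric columns (illustrative only)
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=170, scale=10, size=100)
y = 0.5 * x + rng.normal(loc=0, scale=5, size=100)  # loosely correlated with x

fig, ax = plt.subplots()
points = ax.scatter(x, y, alpha=0.6, edgecolors="white")  # semi-transparent markers
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatterplot of two numeric variables")
plt.show()
```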

## Histogram
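
A minimal histogram sketch on made-up ages. `ax.hist()` returns the count in each bin, the bin edges, and the drawn bars, which is handy if you want to inspect the numbers behind the chart.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
ages = rng.integers(15, 31, size=200)  # made-up ages from 15 to 30 inclusive (illustrative only)

fig, ax = plt.subplots()
# returns: counts per bin, bin edges (bins + 1 values), and the bar patches
counts, edges, patches = ax.hist(ages, bins=8, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of ages (illustrative data)")
plt.show()
```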

## Subplots
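
A minimal subplots sketch: `plt.subplots(nrows, ncols)` returns one figure plus an array of `Axes`, and each `Axes` is drawn on independently. The sine/cosine curves are just convenient example data.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

# 1 row, 2 columns; sharey=True makes both panels use the same y-axis range
fig, (ax_left, ax_right) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4), sharey=True)

ax_left.plot(x, np.sin(x))
ax_left.set_title("sin(x)")

ax_right.plot(x, np.cos(x), color="tab:orange")
ax_right.set_title("cos(x)")

fig.suptitle("Two plots side by side")
fig.tight_layout()  # reduce overlap between titles and tick labels
plt.show()
```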

# Using `seaborn`

* seaborn is built on top of matplotlib
* advantages: cleaner default output, simpler syntax, and good integration with pandas DataFrames
* both are visualization libraries, but each library has its own "rules" (syntax and conventions)
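
To make the DataFrame point concrete, here is a small sketch drawing the same grouped scatterplot both ways on a tiny made-up DataFrame: matplotlib needs you to split the data and build the legend yourself, while seaborn takes column names and handles colours and the legend via `hue=`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tiny made-up DataFrame, for illustration only
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 3, 5], "group": ["a", "a", "b", "b"]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# matplotlib: loop over the groups, pass arrays, add the legend yourself
for name, sub in df.groupby("group"):
    ax1.scatter(sub["x"], sub["y"], label=name)
ax1.legend()
ax1.set_title("matplotlib")

# seaborn (axes-level): refer to columns by name; hue= colours by group and adds a legend
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax2)
ax2.set_title("seaborn")
plt.show()
```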

In [None]:
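# setup
# Create a list with the colors you want to use
colors = ["#69b3a2", "#4374B3"]
sns.set_palette(sns.color_palette(colors))

# set the overall style (background and grid)
sns.set_style("darkgrid")

# Set the default figure size for the plots that follow
# (figure-level functions such as relplot() use height= and aspect= instead)
plt.rcParams["figure.figsize"] = (10, 10)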

## How Seaborn is structured

* **`relplot()`** - shows the relationship between two variables
    * `scatterplot()` (with kind="scatter"; the default)
    * `lineplot()` (with kind="line")

Documentation: https://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn.relplot

* **`catplot()`** - shows the relationship between a numerical variable and one or more categorical variables
    * Categorical scatterplots:
        * `stripplot()` (with kind="strip"; the default)
        * `swarmplot()` (with kind="swarm")
    * Categorical distribution plots:
        * `boxplot()` (with kind="box")
        * `violinplot()` (with kind="violin")
        * `boxenplot()` (with kind="boxen")
    * Categorical estimate plots:
        * `pointplot()` (with kind="point")
        * `barplot()` (with kind="bar")
        * `countplot()` (with kind="count")

Documentation: https://seaborn.pydata.org/generated/seaborn.catplot.html#seaborn.catplot

* **`displot()`** - shows the distribution of data points across a range
    * `histplot()` (with kind="hist"; the default)
    * `kdeplot()` (with kind="kde")
    * `ecdfplot()` (with kind="ecdf"; univariate-only)
    * `rugplot()` (with `rug=True`) can be added to any kind of plot to show individual observations.

Documentation: https://seaborn.pydata.org/generated/seaborn.displot.html#seaborn.displot

## Relplots

## Catplots
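
A minimal `catplot()` sketch comparing a numeric rating across two categories. The ratings and group names are hypothetical; changing `kind` switches between the plot types listed in the structure overview above.

```python
import pandas as pd
import seaborn as sns

# Made-up 1-5 ratings from two hypothetical groups, for illustration only
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "rating": [5, 4, 4, 3, 5, 3, 2, 4, 3, 3],
})

# kind="box" draws boxplot() under the hood;
# try "violin", "strip", "swarm", "bar" or "point" on the same data to compare
g = sns.catplot(data=df, x="group", y="rating", kind="box", height=4)
g.set_axis_labels("Group", "Rating (1-5)")
```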

## Displots

## Seaborn themes

More on color palettes: https://seaborn.pydata.org/tutorial/color_palettes.html

In [None]:
sns.color_palette("colorblind")

In [None]:
sns.color_palette("rocket", as_cmap=True)

In [None]:
# set the default color palette for all following plots:
sns.set_palette("colorblind")
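
As a sketch of an alternative, `sns.set_theme()` (seaborn 0.11 or newer) sets the style, palette and context in one call. Running it overrides the style and palette chosen earlier in this notebook; `context="talk"` enlarges fonts and lines, which suits slides.

```python
import seaborn as sns

# one call for style, palette and context
sns.set_theme(style="whitegrid", palette="colorblind", context="talk")

# color_palette() with no argument returns the palette that is currently active
current = sns.color_palette()
print(len(current))
```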

# Extra: Candlestick chart

In [None]:
# this may take a while to install!
%pip install plotly

In [None]:
import plotly.graph_objects as go
import plotly.io as pio
# render Plotly figures inside an iframe so they display in the notebook
pio.renderers.default = "iframe"

In [None]:
# build a candlestick trace from the daily Open/High/Low/Close columns
candlestick = go.Candlestick(
    x=nflx.index,
    open=nflx["Open"],
    high=nflx["High"],
    low=nflx["Low"],
    close=nflx["Close"],
)

fig = go.Figure(data=[candlestick])

fig.show()

## Exercises
- Try to recreate some of the charts above with your own data! What kind of chart do you think is best for your data?