# Intro to Pandas and Matplotlib

#### Preparing, cleaning, and visualising data using `pandas` and `Matplotlib`

Setting up the dataset:
- Read in the CSV and rename the columns to descriptive names
- Parse the Month column so that it has a `datetime` type
- Set Month as the index of the dataframe
- Build the sorted list of states used in the later cells

In [None]:
from dotenv import load_dotenv
import os

import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact

load_dotenv()
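# directory containing the CSV, read from .env (raises KeyError if unset)
DATASET_PATH = os.environ["DATASET_PATH"]

# load in CSV and reformat the dataframe
# os.path.join works whether or not DATASET_PATH ends with a path separator
df = pd.read_csv(os.path.join(DATASET_PATH, "Day 1 Conversion Rate Motor.csv"))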
df.rename(columns={"Grouping": "State", "Selected Measure1": "Day 1 Conversion Rate"}, inplace=True)
df["Month"] = pd.to_datetime(df["Month"], format="%d/%m/%y")
df.set_index("Month", inplace=True)

# sorted list of unique states, reused by the plots below
states = sorted(df["State"].unique())
df.head()
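
The same setup steps can be tried without the real CSV. The following is a minimal sketch on a small in-memory file; the rows, the values, and the `b. VIC` label are made up for illustration, and only the column names and the date format are taken from the cell above.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the real CSV; all values are made up.
raw = io.StringIO(
    "Month,Grouping,Selected Measure1\n"
    "01/01/21,a. NSW,0.10\n"
    "01/02/21,a. NSW,0.12\n"
    "01/01/21,b. VIC,0.08\n"
    "01/02/21,b. VIC,0.09\n"
)

sample = pd.read_csv(raw)
sample = sample.rename(
    columns={"Grouping": "State", "Selected Measure1": "Day 1 Conversion Rate"}
)

# %d/%m/%y is day first with a two-digit year: "01/02/21" is 1 February 2021.
sample["Month"] = pd.to_datetime(sample["Month"], format="%d/%m/%y")
sample = sample.set_index("Month")

sample_states = sorted(sample["State"].unique())

# Check that the index really is a datetime type rather than plain strings.
print(pd.api.types.is_datetime64_any_dtype(sample.index))
print(sample_states)
```

Passing an explicit `format` avoids the day/month ambiguity that automatic parsing can introduce for dates such as `01/02/21`.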

Plot each state's conversion rate as a separate line on the same axes:

In [None]:
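for state in states:
    # Month is already the index, so it is used as the x-axis automatically
    df.loc[df["State"] == state, "Day 1 Conversion Rate"].plot(
        label=state,  # label each line so the legend matches it by name
        ylabel="Day 1 Conversion Rate",
        title="Day 1 Conversion Rate against Month"
    )

plt.legend();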
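
An alternative to the loop above is to reshape the data so that each state becomes its own column and let pandas draw every line in one call. This is only a sketch on made-up data shaped like `df`; the names `long_df` and `wide_df` are hypothetical, and it assumes there is at most one row per month and state, since `pivot` raises an error on duplicates.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data shaped like df above; all values are made up.
months = pd.to_datetime(["2021-01-01", "2021-02-01"] * 2)
long_df = pd.DataFrame(
    {
        "State": ["a. NSW", "a. NSW", "b. VIC", "b. VIC"],
        "Day 1 Conversion Rate": [0.10, 0.12, 0.08, 0.09],
    },
    index=pd.Index(months, name="Month"),
)

# Reshape to one column per state and one row per month.
wide_df = long_df.pivot(columns="State", values="Day 1 Conversion Rate")

# DataFrame.plot draws one line per column and builds the legend itself.
ax = wide_df.plot(
    ylabel="Day 1 Conversion Rate",
    title="Day 1 Conversion Rate against Month",
)
```

The wide layout also makes it easy to compare states numerically, for example with `wide_df.describe()`.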

Here I experimented with using `ipywidgets` to create interactive plots in Jupyter. This turned out to be unnecessary, as the final product wouldn't run in a Jupyter notebook anyway. Nevertheless, it's useful for visualising data during development.

In [None]:
widget = widgets.Dropdown(
    options=states,
    value="a. NSW",
    description="State:",
    disabled=False
)

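def plot_conversion_rate(state):
    # called by interact each time the dropdown selection changes
    df.loc[df["State"] == state, "Day 1 Conversion Rate"].plot(
        ylabel="Day 1 Conversion Rate",
        title=f"Day 1 Conversion Rate against Month ({state})"
    )
    plt.show()  # render the figure inside the widget's output area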

interact(plot_conversion_rate, state=widget);
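
Since the final product runs outside Jupyter, the same per-state plot can also be written to an image file instead of being shown in a widget. The sketch below is one possible approach, not the method used in the final product: `save_conversion_plot`, the file naming scheme, and the demo data are all hypothetical.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd


def save_conversion_plot(data, state, out_dir):
    """Hypothetical helper: plot one state's series and save it as a PNG."""
    series = data.loc[data["State"] == state, "Day 1 Conversion Rate"]
    fig, ax = plt.subplots()
    series.plot(
        ax=ax,
        ylabel="Day 1 Conversion Rate",
        title=f"Day 1 Conversion Rate against Month ({state})",
    )
    # Made-up naming scheme: "a. NSW" becomes "conversion_rate_a_NSW.png".
    safe_name = state.replace(". ", "_")
    out_path = Path(out_dir) / f"conversion_rate_{safe_name}.png"
    fig.savefig(out_path)
    plt.close(fig)  # release the figure; matters when looping over many states
    return out_path


# Made-up data shaped like df above, used only to exercise the helper.
demo = pd.DataFrame(
    {
        "State": ["a. NSW", "a. NSW", "b. VIC", "b. VIC"],
        "Day 1 Conversion Rate": [0.10, 0.12, 0.08, 0.09],
    },
    index=pd.Index(
        pd.to_datetime(["2021-01-01", "2021-02-01"] * 2), name="Month"
    ),
)

out_dir = tempfile.mkdtemp()
saved = [
    save_conversion_plot(demo, state, out_dir)
    for state in sorted(demo["State"].unique())
]
print([path.name for path in saved])
```

Creating an explicit figure with `plt.subplots()` and closing it afterwards keeps each state's plot separate, whereas repeated calls without `ax` would draw onto the same current axes.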