# Example Notebook

(Last updated: May 22, 2023)

You can also create content with Jupyter Notebooks, which lets you include code blocks and their outputs in your book.
This notebook shows some examples of loading and plotting data.
See [this documentation](https://jupyterbook.org/en/stable/content/executable/index.html) for how to write executable content.

In [33]:
# Import packages
import pandas as pd
import plotly.express as px
import seaborn as sns

## Load data

You can put your data in the same directory as the notebook file and then load it with `pandas`.
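
As a minimal sketch of this layout, the snippet below writes a tiny CSV file into a temporary directory and reads it back with `pd.read_csv`. The file name `example.csv` and its columns are hypothetical stand-ins, not files from this repository; in your own notebook you would pass a path relative to the notebook file instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data: a tiny CSV standing in for a file kept next to the notebook.
csv_text = "name,score\nalpha,1.5\nbeta,2.5\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = Path(tmp_dir) / "example.csv"  # hypothetical file name
    data_path.write_text(csv_text)

    # pd.read_csv accepts either a string path or a pathlib.Path.
    example_df = pd.read_csv(data_path)

print(example_df.shape)
```

Using `pathlib.Path` here is only one option; a plain string path works the same way.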

In [35]:
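# Print summary statistics for selected columns of a CSV file
def description(data_path, list_vars):
    df = pd.read_csv(data_path)
    for col in list_vars:
        median = df[col].median()
        mode = df[col].mode()  # a Series, since a column can have several modes
        print(f"median_{col}: {median}, mode_{col}: {mode}")
        print(df[col].describe())      # count, mean, std, min, quartiles, max
        print(df[col].isnull().sum())  # number of missing values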
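
If you would rather collect these statistics in one table than print them, one possible variant is sketched below. The helper name `summarize` and the toy data are assumptions made for illustration and are not part of this notebook; the helper takes a DataFrame rather than a file path so that it can be tried without any data file.

```python
import pandas as pd


def summarize(df, columns):
    """Return one row per column with its median, first mode, and missing count."""
    rows = []
    for col in columns:
        modes = df[col].mode()  # Series; holds several values when there are ties
        rows.append({
            "column": col,
            "median": df[col].median(),
            "mode": modes.iloc[0] if not modes.empty else None,
            "n_missing": int(df[col].isnull().sum()),
        })
    return pd.DataFrame(rows).set_index("column")


# Toy data (hypothetical), including one missing value in column "a".
toy = pd.DataFrame({"a": [1.0, 2.0, 2.0, None], "b": [4, 5, 6, 6]})
summary = summarize(toy, ["a", "b"])
print(summary)
```

Returning a DataFrame keeps the output compact and makes it easy to compare columns side by side, at the cost of reporting only the first mode when several values tie.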

In [36]:
description("resources/beer_reviews.csv",['review_overall', 'review_aroma', 'review_time', 'beer_abv', 'review_palate'])

median_review_overall: 4.0, mode_review_overall: 0    4.0
dtype: float64
count    1.586614e+06
mean     3.815581e+00
std      7.206219e-01
min      0.000000e+00
25%      3.500000e+00
50%      4.000000e+00
75%      4.500000e+00
max      5.000000e+00
Name: review_overall, dtype: float64
0
median_review_aroma: 4.0, mode_review_aroma: 0    4.0
dtype: float64
count    1.586614e+06
mean     3.735636e+00
std      6.976167e-01
min      1.000000e+00
25%      3.500000e+00
50%      4.000000e+00
75%      4.000000e+00
max      5.000000e+00
Name: review_aroma, dtype: float64
0
median_review_time: 1239202881.5, mode_review_time: 0    1101772800
dtype: int64
count    1.586614e+06
mean     1.224089e+09
std      7.654427e+07
min      8.406720e+08
25%      1.173224e+09
50%      1.239203e+09
75%      1.288568e+09
max      1.326285e+09
Name: review_time, dtype: float64
0
median_beer_abv: 6.5, mode_beer_abv: 0    5.0
dtype: float64
count    1.518829e+06
mean     7.042387e+00
std      2.322526e+00
min      1

In [41]:
description("resources/cleaned_ingredients.csv", ['Energy_kcal', 'Fat_g', 'Iron_mg', 'Sugar_g', 'VitB12_mcg'])

median_Energy_kcal: 183.0, mode_Energy_kcal: 0    884.0
dtype: float64
count    9318.000000
mean      220.897683
std       169.034417
min         0.000000
25%        86.000000
50%       183.000000
75%       330.582000
max       902.000000
Name: Energy_kcal, dtype: float64
0
median_Fat_g: 4.8, mode_Fat_g: 0    0.0
dtype: float64
count    9318.000000
mean       10.202202
std        15.575583
min         0.000000
25%         0.820000
50%         4.800000
75%        13.210000
max       100.000000
Name: Fat_g, dtype: float64
0
median_Iron_mg: 1.3, mode_Iron_mg: 0    0.0
dtype: float64
count    9318.000000
mean        2.664721
std         5.631967
min         0.000000
25%         0.490000
50%         1.300000
75%         2.597500
max       123.600000
Name: Iron_mg, dtype: float64
0
median_Sugar_g: 0.46, mode_Sugar_g: 0    0.0
dtype: float64
count    9318.000000
mean        6.522893
std        13.496676
min         0.000000
25%         0.000000
50%         0.460000
75%         5.487500
max   

## Plot data

This repository is set up so that you can use [Plotly](https://plotly.com/python/) for interactive visualizations.
For more information, check [this documentation](https://jupyterbook.org/en/stable/interactive/interactive.html).

In [14]:
# Plot the Iris dataset using Plotly
# Load the Iris sample data bundled with Plotly; the cells above do not define it
df = px.data.iris()
g1 = px.scatter_3d(df,
                   x="sepal_width",
                   y="sepal_length",
                   z="petal_width",
                   color="species",
                   size="petal_length",
                   opacity=0.6,
                   size_max=30,
                   height=700)
g1

You can also create static visualizations of the data, for example with the [seaborn](https://seaborn.pydata.org/#) library.

In [6]:
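# Plot the Iris dataset using seaborn
# Imports are repeated so that this cell runs on its own
import plotly.express as px
import seaborn as sns
df = px.data.iris()  # Iris sample data bundled with Plotly; includes `species_id`
g2 = sns.pairplot(df.drop("species_id", axis=1),
                  hue='species')
g2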