# Banknotes dataset using pandas

This brief Jupyter notebook uses tools _other than_ the `datascience` library to visualize the banknotes dataset from Section 17.4 in the textbook.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
path_data = 'http://personal.psu.edu/drh20/200DS/assets/data/'

The code below uses the `read_csv` function in the `pandas` library to read the `banknote.csv` dataset (the same one from Section 17.4) and then displays the resulting object.

In [None]:
banknotes = pd.read_csv(path_data + 'banknote.csv')
banknotes
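
As a minimal sketch of what `read_csv` does, the snippet below parses a small in-memory CSV instead of the remote file. The column names mimic the banknote dataset, but the two rows of values are made up for illustration and are not taken from the real data.

```python
import io
import pandas as pd

# Hypothetical CSV text: column names mimic the banknote data, values are invented.
csv_text = "WaveletVar,WaveletSkew,Class\n1.5,2.0,0\n-3.0,4.5,1\n"

# read_csv accepts a file-like object as well as a path or URL.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy.shape)
print(list(toy.columns))
```

The first line of the text is treated as the header row, so it supplies the column names rather than a row of data.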

The `banknotes` object created by `read_csv` is of the `DataFrame` type:

In [None]:
type(banknotes)

The `pandas` method `groupby` operates on `DataFrame` objects and groups the rows according to the values in one of the columns. In our dataset, `Class` is the obvious grouping variable, so for instance we can use `groupby` to find the mean of each variable within each `Class`:

In [None]:
banknotes.groupby('Class').mean()
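
To see the grouping arithmetic on numbers small enough to check by hand, here is a sketch using a hypothetical four-row table. The column names are borrowed from the banknote data, but the values are invented.

```python
import pandas as pd

# Hypothetical toy data: two rows per class, values chosen for easy averaging.
toy = pd.DataFrame({
    'WaveletVar': [1.0, 3.0, -2.0, -4.0],
    'WaveletSkew': [2.0, 4.0, 0.0, 2.0],
    'Class': [0, 0, 1, 1],
})

# One row per distinct Class value; each entry is the mean of that column
# over the rows belonging to the class.
class_means = toy.groupby('Class').mean()
print(class_means)
```

The result is indexed by the values of `Class`, so `class_means.loc[0, 'WaveletVar']` is the average of 1.0 and 3.0.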

The code that follows creates a 3-dimensional scatterplot very much like the one seen in Subsection 17.4.2.

In [None]:
fig = plt.figure(figsize=(8, 6))
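# Create the 3D axes through the figure; recent matplotlib versions no longer
# attach an Axes3D(fig) object to the figure automatically.
ax = fig.add_subplot(projection='3d')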

for grp_name, grp_idx in banknotes.groupby('Class').groups.items():
    x = banknotes.loc[grp_idx,'WaveletVar']
    y = banknotes.loc[grp_idx,'WaveletSkew']
    z = banknotes.loc[grp_idx,'WaveletCurt']
    ax.scatter(x,y,z, label=grp_name)

ax.legend(labels=['Genuine', 'Counterfeit'])
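
The loop above relies on the `groups` attribute of a grouped `DataFrame`, which maps each group name to the row labels belonging to that group. The sketch below shows that mapping on a hypothetical four-row table (invented values) without any plotting.

```python
import pandas as pd

# Hypothetical toy data with the two classes interleaved.
toy = pd.DataFrame({
    'WaveletVar': [1.0, 3.0, -2.0, -4.0],
    'Class': [0, 1, 0, 1],
})

group_rows = {}
for grp_name, grp_idx in toy.groupby('Class').groups.items():
    # grp_idx holds the row labels for this class; .loc selects those rows.
    group_rows[int(grp_name)] = toy.loc[grp_idx, 'WaveletVar'].tolist()

print(group_rows)
```

Each pass through the loop therefore sees one class at a time, which is why each call to `ax.scatter` in the plotting code draws a single class.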