![Erudio logo](img/erudio-logo-small.png)
---
![Pandas logo](img/pandas-logo-small.png)

# Basic plots

The introduction to Pandas showed a couple of simple plots. Let's look at a few others. More advanced plotting is covered in the lessons on Matplotlib.


In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from src.training import *

In [None]:
cancer = pd.read_csv('data/wisconsin.csv')
cancer

## Histograms

In [None]:
cancer['mean area'].plot(kind='hist', title="Histogram of Mean Area", bins=15);

## Bar charts

Basic bar charts are simple in Pandas. To fine-tune the results (for example, to use dual Y axes), you need to work with the underlying Matplotlib objects directly rather than through Pandas alone.

In [None]:
(cancer.loc[:19, ["mean perimeter", "mean radius"]]
     .plot(kind="bar", title="Attributes for 20 patients")
);

In [None]:
(cancer.loc[:19, ["mean perimeter", "mean radius"]]
     .plot(kind="bar", title="Attributes for 20 patients",
           subplots=True)
);
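
For the dual-Y-axis case mentioned above, one possible approach is Matplotlib's `twinx()`. The following is a minimal sketch rather than part of the original lesson: it uses made-up data in place of `cancer`, and the names `sample`, `ax_left`, and `ax_right` are purely illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for 20 rows of the cancer data (values are made up)
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "mean perimeter": rng.normal(92, 24, size=20),
    "mean radius": rng.normal(14, 3.5, size=20),
})

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second Y axis sharing the same X axis

# position=1 / position=0 shift the two bar series to either side of each tick
sample["mean perimeter"].plot(kind="bar", ax=ax_left, color="tab:blue",
                              width=0.4, position=1)
sample["mean radius"].plot(kind="bar", ax=ax_right, color="tab:orange",
                           width=0.4, position=0)

ax_left.set_ylabel("mean perimeter")
ax_right.set_ylabel("mean radius")
ax_left.set_xlim(-1, len(sample))  # keep the outermost bars inside the frame
ax_left.set_title("Attributes for 20 patients (dual Y axes)")
```

Each series gets its own scale, so the much smaller radius values are no longer dwarfed by the perimeter values.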

## Scatter plots

In [None]:
cancer.plot(
    kind="scatter", 
    x="mean radius", 
    y="mean smoothness",
    title="Comparing mean radius to mean smoothness",
);

In [None]:
cancer.plot(
    kind="scatter", 
    x="mean radius", 
    y="mean smoothness",
    title="Comparing mean radius to mean smoothness",
    s=3,  # marker size smaller than default
    color="red"
);

### Hex binning

In this example, at the default resolution (`gridsize=100`), the hexagons are so small that the plot looks largely like a scatter plot.

In [None]:
cancer.plot(
    kind="hexbin", 
    x="mean radius", 
    y="mean smoothness",
    title="Comparing mean radius to mean smoothness",
);

By using a somewhat coarser grid, we can see meaningful distinctions among regions.

In [None]:
cancer.plot(
    kind="hexbin", 
    x="mean radius", 
    y="mean smoothness",
    title="Comparing mean radius to mean smoothness",
    gridsize=8,
);
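
To make the effect of `gridsize` concrete, the sketch below compares a fine and a coarse grid on made-up data (the `points` frame is a synthetic stand-in, not the `cancer` data). It assumes the hexagons are the first collection on each Axes, which is how Matplotlib's `hexbin` draws them.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the two cancer columns (values are made up)
rng = np.random.default_rng(0)
points = pd.DataFrame({
    "mean radius": rng.normal(14, 3.5, size=500),
    "mean smoothness": rng.normal(0.1, 0.014, size=500),
})

fig, (ax_fine, ax_coarse) = plt.subplots(1, 2, figsize=(10, 4))
points.plot.hexbin(x="mean radius", y="mean smoothness", gridsize=100, ax=ax_fine)
points.plot.hexbin(x="mean radius", y="mean smoothness", gridsize=8, ax=ax_coarse)

# get_array() on the hexagon collection holds the per-hexagon point counts
fine_counts = ax_fine.collections[0].get_array()
coarse_counts = ax_coarse.collections[0].get_array()

print("hexagons:     ", len(fine_counts), "vs", len(coarse_counts))
print("largest count:", fine_counts.max(), "vs", coarse_counts.max())
```

With the fine grid, almost every occupied hexagon holds a single point; with the coarse grid, far fewer hexagons each hold many points, which is what gives the color scale something to show.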

Most kinds of plots can also be spelled as methods of the `.plot` accessor, named after the kind, e.g.:

In [None]:
cancer.plot.hexbin(
    x="mean radius",
    y="mean smoothness",
    title="Comparing mean radius to mean smoothness",
    gridsize=15
);
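
As a quick check that the two spellings are interchangeable, this sketch draws the same histogram both ways on made-up data (the `area` series is synthetic, not the real `mean area` column) and compares the bar heights.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a single numeric column (values are made up)
rng = np.random.default_rng(0)
area = pd.Series(rng.normal(650, 350, size=500), name="mean area")

fig, (ax_kind, ax_method) = plt.subplots(1, 2, figsize=(10, 4))
area.plot(kind="hist", bins=15, ax=ax_kind, title="kind='hist'")
area.plot.hist(bins=15, ax=ax_method, title=".plot.hist()")

# Both calls produce the same 15 bars
heights_kind = [bar.get_height() for bar in ax_kind.patches]
heights_method = [bar.get_height() for bar in ax_method.patches]
print(heights_kind == heights_method)
```

The method form has the practical advantage that editors and notebooks can tab-complete the available kinds and show each one's specific parameters.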

## Statistical distributions

Box plots are a quick way to visualize the distributions of features. If the optional `scipy` package is installed, further plot kinds such as kernel density estimates (KDEs) are also available.

In [None]:
# Find a few features with similar numeric size
m = cancer.mean(numeric_only=True)  # skip any non-numeric columns
m[(m > 10) & (m < 20)]

In [None]:
# Plot the features selected above
cancer[m[(m > 10) & (m < 20)].index].boxplot();
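
A KDE plot can be sketched the same way. Because it depends on `scipy`, the example below checks for the package first and falls back to a box plot otherwise. The data is made up (the `features` frame is a stand-in for `cancer`), and the fallback is just one way to handle a missing dependency.

```python
import importlib.util

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for two features of similar magnitude (values are made up)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "mean radius": rng.normal(14, 3.5, size=500),
    "mean texture": rng.normal(19, 4.3, size=500),
})

# kind="kde" needs scipy; check for it without importing the whole package
have_scipy = importlib.util.find_spec("scipy") is not None

fig, ax = plt.subplots()
if have_scipy:
    features.plot(kind="kde", ax=ax, title="Kernel density estimates")
else:
    features.plot(kind="box", ax=ax, title="Box plots (scipy not installed)")
```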


---

Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors