Examples of data exploration, using [this dataset](https://catalogue.data.govt.nz/dataset/traffic-counts-hamilton-city-council) from the huge range of NZ central/local government datasets at [catalogue.data.govt.nz](https://catalogue.data.govt.nz). (Thanks to Peter for pointing me at that resource.)

Useful links:
* Pandas: https://pandas.pydata.org/
* Matplotlib: https://matplotlib.org/

Recap/update of my previous notes on Matplotlib:

  * every part of a chart is an “Artist” object <https://matplotlib.org/stable/api/artist_api.html>.
  * each chart within a figure (each “subplot”) is an “Axes” object <https://matplotlib.org/stable/api/axes_api.html>,
    the axes of which are “Axis” objects <https://matplotlib.org/stable/api/axis_api.html>.
  * pyplot is a whole separate thing <https://matplotlib.org/stable/api/pyplot_summary.html>.
    I think this is meant to provide a more MATLAB-like charting API. But what
    do I know...
  * 3D plotting is a separate API layer, beginning here
    <https://matplotlib.org/stable/api/toolkits/mplot3d.html>.
  * Third-party toolkits like Seaborn are listed here:
    <https://matplotlib.org/mpl-third-party/>.
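To make the object hierarchy in the recap concrete, here is a minimal sketch that builds a figure without going through pyplot and checks what each part is. The data values and the `parts` dictionary are made up purely for illustration.

```python
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import Axis
from matplotlib.figure import Figure

fig = Figure(figsize = (4, 3))   # built directly, without pyplot
ax = fig.add_subplot()           # one "subplot", i.e. one Axes object
(line,) = ax.plot([0, 1, 2], [0, 1, 4])   # arbitrary illustrative data

# hypothetical lookup table, just to loop over the pieces of the chart
parts = {"figure" : fig, "axes" : ax, "x-axis" : ax.xaxis, "y-axis" : ax.yaxis, "line" : line}
for name, part in parts.items() :
    # show each part's class, and whether it is an Artist
    print(name, type(part).__name__, isinstance(part, Artist))
#end for
```

Every entry should report as an Artist, with `ax` being an Axes and `ax.xaxis`/`ax.yaxis` being Axis objects.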


In [None]:
from ipywidgets import interact

# use common short names
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

Load in the sample data set and do some initial exploration.

In [None]:
dats1 = pd.read_csv("Traffic Counts - Hamilton City Council.csv")
dats1

In [None]:
dats1.columns

The X and Y values look like positions in some kind of coordinate system. Units seem to be metres.

In [None]:
xmean = dats1["X"].mean()
ymean = dats1["Y"].mean()
xmean, ymean

Create a local coordinate system, with origin in about the middle of the points.

In [None]:
dats1.insert(len(dats1.columns), "dX", dats1["X"] - xmean)
dats1.insert(len(dats1.columns), "dY", dats1["Y"] - ymean)
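As a quick check of the centring idea, here is a sketch on made-up coordinates (the `sample` frame is hypothetical, not taken from the Hamilton dataset). It uses `DataFrame.assign` as an alternative to `insert`; `assign` returns a new frame, so re-running the cell does not fail on an already-existing column.

```python
import pandas as pd

# made-up coordinates, purely for illustration
sample = pd.DataFrame({"X" : [1000.0, 1200.0, 1400.0], "Y" : [500.0, 700.0, 600.0]})

# subtract the mean of each column to get offsets from the centre of the points
centred = sample.assign \
  (
    dX = sample["X"] - sample["X"].mean(),
    dY = sample["Y"] - sample["Y"].mean(),
  )
print(centred)
print(centred[["dX", "dY"]].mean())   # offsets from the mean average out to zero
```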

In [None]:
dats1[["dX", "dY"]]

Easy reference to those “Yearnnnn” columns:

In [None]:
year_cols = list("Year%4d" % y for y in range(2000, 2019))
min_year = 2000
year_cols, min_year

In [None]:
dats1.groupby("Direction").count()

In [None]:
sorted(list(dats1["Site_Name"]))

In [None]:
dats1.loc[list("WAIRERE" in d for d in dats1["Site_Name"])][["Site_Name", "Site_Locat", "Direction", "dX", "dY"]]
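The filter above builds its boolean mask with a Python-level loop. A possible alternative is pandas' vectorised `Series.str.contains`, sketched here on invented rows (the `sites` frame, its site names and its directions are made up, not taken from the dataset). Passing `na = False` treats missing names as non-matches, and `regex = False` asks for a plain substring test.

```python
import pandas as pd

# invented rows, purely for illustration
sites = pd.DataFrame \
  (
    {
        "Site_Name" : ["WAIRERE DRIVE", "GREENWOOD STREET", "WAIRERE DRIVE", None],
        "Direction" : ["East", "North", "West", "South"],
    }
  )

# True where the name contains the substring, False elsewhere (including missing names)
mask = sites["Site_Name"].str.contains("WAIRERE", regex = False, na = False)
wairere = sites.loc[mask]
print(wairere)
```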

In [None]:
dats2 = dats1[["Site_Name", "Site_Locat"] + year_cols]

In [None]:
dats2.sort_values(by = "Year2004", ascending = False)

In [None]:
dats3 = dats2[dats2["Site_Name"] == "KAHIKATEA DRIVE"]

dats3

In [None]:
# 66 is a row label read off the table displayed above
dats4 = dats3.loc[66, ["Site_Name"] + year_cols]

dats4

In [None]:
dats4[1:].plot(title = "Kahikatea Drive West")
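The `[1:]` slice above drops the leading `Site_Name` entry so that only the yearly counts get plotted. A sketch of a more explicit preparation step follows, on an invented row (the site name and counts are made up): select the year columns by name, convert them to floats, and relabel the index with integer years so the horizontal axis is numeric.

```python
import pandas as pd

year_cols = ["Year%4d" % y for y in range(2000, 2003)]   # shortened range for the example

# one invented row, shaped like a row of the traffic table
row = pd.Series(["EXAMPLE ROAD", 100, 120, 150], index = ["Site_Name"] + year_cols)

counts = row[year_cols].astype(float)              # keep only the counts, as numbers
counts.index = [int(c[len("Year"):]) for c in year_cols]   # "Year2000" -> 2000
print(counts)
```

From there, `counts.plot(title = ...)` should behave like the cell above, but with real year values along the axis.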

In [None]:
dats3.loc[115, ["Site_Name"] + year_cols][1:].plot(title = "Kahikatea Drive East")

In [None]:
dats5 = dats2[dats2["Site_Name"] == "GREENWOOD STREET"]

dats5

In [None]:
dats2.loc[146, ["Site_Name"] + year_cols][1:].plot()

In [None]:
plot_data = dats1.loc[list("WAIRERE" in d for d in dats1["Site_Name"])][["Site_Name", "Site_Locat"] + year_cols]

plot_data.index

In [None]:
def format_sites(tickval, tickpos) :
    # tick formatter: label whole-number tick values that match a row of plot_data
    tickval = float(tickval)
      # plain Python float, so the comparisons below give Python bools, not numpy bools
    if tickval == int(tickval) and 0 <= tickval < len(plot_data.index) :
        return str(plot_data["Site_Locat"].index[int(tickval)])
    #end if
    return ""
#end format_sites

@interact(elev = (-90, 90, 10), azi = (-90, 90, 10))
def plotit(elev, azi) :
    fig = plt.figure(figsize = (10, 12))
    fig.suptitle(plot_data["Site_Name"].iloc[3], y = 0.7)
      # docs: <https://matplotlib.org/stable/api/figure_api.html>
    ax = fig.add_axes((0, 0, 1, 1), projection = '3d')
    # Not sure why, but collapsing above two lines into single line
    #     ax = plt.figure(figsize = (10, 12)).add_subplot(projection = '3d')
    # (as per the example code here <https://matplotlib.org/stable/gallery/mplot3d/polys3d.html#sphx-glr-gallery-mplot3d-polys3d-py>)
    # produces a slightly smaller plot.
    verts = \
        [
            [
                (i + min_year, plot_data.loc[u, y])
                for i, y in enumerate(year_cols)
            ]
            for u in plot_data.index
        ]
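    for row in verts :
        # pin both ends of each curve to the baseline so the polygon closes
        # along zero (this overwrites the first and last years' counts)
        row[0] = (row[0][0], 0)
        row[-1] = (row[-1][0], 0)
    #end for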
    #print(verts)
    poly = matplotlib.collections.PolyCollection \
      (
        verts,
        facecolors = list
          (
            matplotlib.colors.hsv_to_rgb((i / len(plot_data.index), 0.5, 0.8))
            for i in range(len(plot_data.index))
          )
      )
    poly.set_alpha(0.7)
    spacing = 3
      # fudge factor to get positioning of curves roughly correct
    ax.add_collection3d \
      (
        poly,
        zs = list(spacing * i for i in range(len(plot_data["Site_Name"]))),
        zdir = "x"
      )
    ax.xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(format_sites))
    ax.set \
      (
        xlim = (-1, len(plot_data["Site_Name"])),
        ylim = (min_year - 1, min_year + len(year_cols) + 1),
        zlim = (0, max(max(c[1] for c in row) for row in verts)),
        xlabel = "PLACE",
        ylabel = "YEAR",
        zlabel = "TRAFFIC"
      )
    ax.view_init(elev, azi)

    plt.show()
#end plotit
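For trying out the polygon technique without the CSV file, here is a cut-down sketch on synthetic data (the `years` and `counts` arrays are invented, not real traffic figures). It differs from the cell above in two ways, both my own assumptions rather than anything the dataset requires: it adds extra baseline points at each end instead of overwriting the first and last counts, and it builds the figure directly and renders it to an in-memory PNG instead of using pyplot and `interact`.

```python
import io

import numpy as np
from matplotlib.collections import PolyCollection
from matplotlib.figure import Figure

years = np.arange(2000, 2005)
# synthetic counts for three hypothetical sites
counts = np.array \
  (
    [
        [100, 120, 130, 125, 140],
        [ 80,  85,  90, 110, 105],
        [ 60,  70,  65,  75,  90],
    ],
    dtype = float
  )

# one polygon per site: the curve, plus a point on the baseline at each end
verts = \
    [
        [(years[0], 0.0)] + list(zip(years, row)) + [(years[-1], 0.0)]
        for row in counts
    ]
poly = PolyCollection(verts, alpha = 0.7)

fig = Figure(figsize = (6, 6))
ax = fig.add_subplot(projection = "3d")
# stand each 2D polygon upright at its own position along the x axis
ax.add_collection3d(poly, zs = list(range(len(verts))), zdir = "x")
ax.set \
  (
    xlim = (-1, len(verts)),
    ylim = (years[0], years[-1]),
    zlim = (0, counts.max()),
    xlabel = "PLACE",
    ylabel = "YEAR",
    zlabel = "TRAFFIC"
  )

# render to memory; in a notebook, ending the cell with "fig" displays it instead
buf = io.BytesIO()
fig.savefig(buf, format = "png")
```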
