# statdepth: An Interactive Guide

In this notebook we'll explore `statdepth`, a Python package for computing the statistical depth of univariate functional data, multivariate functional data, and point cloud data drawn from distributions in $\mathbb{R}^d$.

We'll begin by importing the libraries we need.

In [3]:
import numpy as np
import pandas as pd

from statdepth import FunctionalDepth, PointwiseDepth
from statdepth.testing import generate_noisy_pointcloud, generate_noisy_univariate

We'll now generate some random univariate functions with similar shape and some noise.

In [13]:
df = generate_noisy_univariate(data=[2,3,3.4,4,5,3.1,3,3,2]*2)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,1.797507,1.191882,1.798602,1.873556,1.827389,1.710321,0.069087,1.454418,0.79263,1.963514,1.101313,0.020255,1.36717,0.733254,1.417815,1.283236,1.975659,0.310327,0.361743,0.402736
1,2.69626,1.787823,2.697903,2.810334,2.741083,2.565481,0.10363,2.181627,1.188945,2.945271,1.65197,0.030382,2.050755,1.099881,2.126722,1.924854,2.963489,0.465491,0.542614,0.604104
2,3.055761,2.026199,3.057623,3.185045,3.106561,2.907546,0.117447,2.47251,1.347471,3.337974,1.872233,0.034433,2.324189,1.246532,2.410285,2.181502,3.358621,0.527556,0.614962,0.684651
3,3.595013,2.383764,3.597204,3.747112,3.654778,3.420642,0.138173,2.908835,1.58526,3.927028,2.202627,0.040509,2.73434,1.466508,2.835629,2.566472,3.951319,0.620654,0.723485,0.805472
4,4.493767,2.979705,4.496505,4.68389,4.568472,4.275802,0.172716,3.636044,1.981574,4.908785,2.753283,0.050637,3.417926,1.833135,3.544537,3.20809,4.939148,0.775818,0.904357,1.00684


Now we'll use our library to calculate band depth, using standard containment in $\mathbb{R}^2$.

In [14]:
bd = FunctionalDepth([df], J=2, relax=False)
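
To make the idea of band depth concrete, here is a minimal, dependency-free sketch for $J=2$. The depth of a curve is taken to be the fraction of pairs of other curves whose band (the region between their pointwise minimum and maximum) contains it at every time point. The function name `band_depth_j2` and the toy curves are illustrative assumptions, and excluding a curve from its own bands is also an assumption. This is not `statdepth`'s implementation, and its numbers need not match the library's.

```python
from itertools import combinations

def band_depth_j2(curves):
    """Band depth (J=2): for each curve, the fraction of pairs of *other*
    curves whose band contains it at every sampled time point."""
    depths = []
    for i, x in enumerate(curves):
        # Assumption: a curve does not help form the bands it is tested against.
        others = [c for j, c in enumerate(curves) if j != i]
        pairs = list(combinations(others, 2))
        inside = sum(
            all(min(a, b) <= v <= max(a, b) for v, a, b in zip(x, y1, y2))
            for y1, y2 in pairs
        )
        depths.append(inside / len(pairs))
    return depths

# Toy data: four flat curves and one spiky curve, all on the same time grid.
curves = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0],
    [0.5, 4.0, 0.5],  # spikes above every other curve at the middle time point
]
depths = band_depth_j2(curves)
print(depths)
```

In this toy set the two middle flat curves come out deepest, while the extreme flat curves and the spiky curve get depth zero because no band formed by the other curves contains them everywhere.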

First, we can look at the $n$ deepest and the $n$ most outlying curves.

In [44]:
bd.deepest(n=5)

29    0.047619
23    0.038095
17    0.033333
14    0.033333
13    0.033333
dtype: float64

In [45]:
bd.outlying(n=5)

30    0.0
28    0.0
27    0.0
26    0.0
0     0.0
dtype: float64

But this is much more meaningful with visuals!

In [16]:
bd.plot_deepest(n=3)

We can also plot the most outlying functions.

In [17]:
bd.plot_outlying(n=3)

Now, let's try calculating band depth for some point cloud data. Maybe you've sampled $n$ points from some distribution in $\mathbb{R}^d$, and you'd like to understand which points are the most "central".

First, let's try this for some points sampled in $\mathbb{R}^2$.

In [31]:
df = generate_noisy_pointcloud(n=50, d=2)
bd = PointwiseDepth(df, K=8, containment='simplex')
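
For intuition about what simplex containment means in $\mathbb{R}^2$, the sketch below computes an exact depth by brute force: the depth of a point is the fraction of triangles, formed from triples of the other points, that contain it. The helper names (`in_triangle`, `simplicial_depth_2d`) and the toy points are assumptions made for illustration. This is a plain-Python sketch of the general idea and is not taken from `statdepth`.

```python
from itertools import combinations

def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign says which side of line oa b lies on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is outside exactly when it is on different sides of two edges.
    return not (has_neg and has_pos)

def simplicial_depth_2d(points):
    """Depth of each point: fraction of triangles of other points containing it."""
    depths = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        triangles = list(combinations(others, 3))
        hits = sum(in_triangle(p, a, b, c) for a, b, c in triangles)
        depths.append(hits / len(triangles))
    return depths

# Four corners of a square plus one interior point.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 1.5)]
depths = simplicial_depth_2d(points)
print(depths)
```

The interior point is contained in two of the four corner triangles, so it is the deepest point, while each corner lies outside every triangle formed by the remaining points.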

We can look at the deepest points.

In [32]:
bd.deepest(n=5)

22    0.158333
13    0.140476
1     0.123810
32    0.123810
41    0.120238
dtype: float64

Again, we can plot our data. Here, the lighter the color the deeper (more central) the point.

In [46]:
bd.plot_depths(invert_colors=False)

We can also just plot the $n$ deepest points. 

In [34]:
bd.plot_deepest(n=3)

Or even the $n$ most outlying points, since it's often useful to know which data we should consider to be outliers.

In [37]:
bd.plot_outlying(n=8)

But of course, if we're just defining depth using a certain measure of containment, there is no reason it shouldn't generalize to arbitrary dimensions. And indeed, this is the case. Let's take a look at some data in $\mathbb{R}^3$.

Notice that we're using sample depth: if we were to compute depth exactly, we'd be evaluating about 500k simplices for each of our 50 data points, which can become unwieldy fast.

However, it turns out that sample band depth is quite accurate for $K \ll n$, where $n$ is our number of data points, so this is definitely worth it.
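
The trade-off between exact and sample depth can be sketched in plain Python. The version below assumes that sample depth means splitting the reference points into `K` random blocks, computing an exact depth within each block, and averaging the results. That reading of `K` is an assumption about the general technique, not a confirmed description of `statdepth`'s internals, and all function names here are hypothetical.

```python
import random
from itertools import combinations

def _signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron abcd (a 3x3 determinant)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def in_tetrahedron(p, a, b, c, d):
    """True if p lies inside or on the (non-degenerate) tetrahedron abcd."""
    ref = _signed_volume(a, b, c, d)
    if ref == 0:
        return False  # flat tetrahedron: treat as containing nothing
    # Swap p in for each vertex in turn; p is inside when no sub-volume flips sign.
    vols = (_signed_volume(p, b, c, d), _signed_volume(a, p, c, d),
            _signed_volume(a, b, p, d), _signed_volume(a, b, c, p))
    return all(v * ref >= 0 for v in vols)

def exact_depth(p, reference):
    """Fraction of all tetrahedra built from `reference` that contain p."""
    tets = list(combinations(reference, 4))
    return sum(in_tetrahedron(p, *t) for t in tets) / len(tets)

def block_depth(p, reference, K, rng):
    """Assumed sample depth: average the exact depth over K random blocks."""
    shuffled = list(reference)
    rng.shuffle(shuffled)
    blocks = [shuffled[k::K] for k in range(K)]  # each block needs >= 4 points
    return sum(exact_depth(p, block) for block in blocks) / K

rng = random.Random(0)
cloud = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(30)]
center, far = (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)

print("exact :", exact_depth(center, cloud))              # 27,405 tetrahedra
print("blocks:", block_depth(center, cloud, K=3, rng=rng))  # 3 * 210 tetrahedra
print("far   :", block_depth(far, cloud, K=3, rng=rng))
```

With 30 reference points the exact computation checks 27,405 tetrahedra, while three blocks of ten points check only 630 in total, which is where the savings come from.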

In [47]:
df = generate_noisy_pointcloud(n=50, d=3)
bd = PointwiseDepth(df, K=8, containment='simplex')

In [48]:
bd.deepest(n=5)

26    0.042857
33    0.041270
49    0.038095
27    0.033333
41    0.028571
dtype: float64

Well, looking at the 5 deepest points is interesting, but it's a lot more meaningful visually.

In [49]:
bd.plot_depths()

Or, we could just plot the $n$ deepest points.

In [50]:
bd.plot_deepest(n=3)

It turns out this generalizes to higher dimensions, but in that case we're forced to plot on parallel axes.