# EDA Visualizations


## 1D Visualizations

When there are many units of measurement (users, members, and so on), we often have
many time series in parallel. It can be useful to stack these visually, emphasizing
the individual units of analysis and their respective time frames. Here we ignore the
measured values and instead treat the existence of data over a given range as the
information of interest, so the time span itself becomes the unit of analysis. We use
R’s timevis package here, but many other options are available.

In [1]:
library(timevis)
library(data.table)


In [2]:
donations <- fread('donations.csv')
d <- donations[, .(min(timestamp), max(timestamp)), user]
names(d) <- c('content', 'start', 'end')
d <- d[start != end]


In [3]:
timevis(d[sample(1:nrow(d), 20)])


The resulting chart, drawn for a random sample of 20 members, suggests that there are
“busy” periods shared across the member population. It also gives some sense of the
distribution of active donation spans within a member’s “lifetime” in our organization.

Gantt charts have been used for over a century, most often for project management
tasks. They came about independently in many different industries, and the idea is
intuitive as soon as you see one. Despite the project management origins, Gantt
charts can be useful in time series analysis where there are many independent actors,
rather than a single process being measured.
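For readers without timevis installed, the same idea can be sketched in base R. This is only an illustration under stated assumptions: the `toy` data frame is made up and stands in for `donations.csv`, and `plot_spans` is a hypothetical helper, not part of any package. It collapses each user's timestamps to a (start, end) span, drops zero-length spans as in the cell above, and draws one horizontal bar per user.

```r
# Hypothetical toy data standing in for donations.csv (not the original data)
toy <- data.frame(
  user = c("u1", "u1", "u1", "u2", "u2", "u3", "u3", "u3", "u3", "u4"),
  timestamp = as.Date(c("2017-01-05", "2017-03-20", "2017-06-11",
                        "2017-02-14", "2017-02-28",
                        "2017-04-01", "2017-07-15", "2017-09-30", "2017-12-24",
                        "2017-08-08")),
  stringsAsFactors = FALSE
)

# Collapse each user's observations to a single (start, end) span
by_user <- split(toy$timestamp, toy$user)
spans <- data.frame(
  content = names(by_user),
  start = as.Date(sapply(by_user, function(x) as.numeric(min(x))),
                  origin = "1970-01-01"),
  end = as.Date(sapply(by_user, function(x) as.numeric(max(x))),
                origin = "1970-01-01"),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Drop users with a single observation, whose span has zero length
spans <- spans[spans$start != spans$end, ]

# Hypothetical helper: one horizontal bar per unit, Gantt style
plot_spans <- function(spans) {
  n <- nrow(spans)
  plot(NA, xlim = as.numeric(range(c(spans$start, spans$end))),
       ylim = c(0.5, n + 0.5), xaxt = "n", yaxt = "n",
       xlab = "Date", ylab = "")
  axis.Date(1, x = c(spans$start, spans$end))
  axis(2, at = seq_len(n), labels = spans$content, las = 1)
  segments(x0 = as.numeric(spans$start), y0 = seq_len(n),
           x1 = as.numeric(spans$end), y1 = seq_len(n), lwd = 6)
}
```

Calling `plot_spans(spans)` draws the chart. Unlike timevis, this static version is not interactive, but it shows the same information: when each unit was active, and for how long.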

## 2D Visualizations

Now we’ll use the AirPassengers data to see the seasonality and the trend, but we
shouldn’t think of time as linear. In particular, time happens on more than one axis.
There is, of course, the axis of time going forward from day to day and year to year,
but we can also consider laying time out along the axis of hour of the day or day of
the week, and so on. In this way, we can more easily think about seasonality, such as
certain behaviors happening at a certain time of the day or month of the year.
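As a rough sketch of what "more than one axis" can mean in practice, each observation in a monthly series can be labeled with both its calendar year and its position within the yearly cycle, and the values can then be summarized along each axis separately. The variable names below (`month`, `year`, `monthly_mean`, `yearly_mean`) are illustrative choices, and the year calculation assumes the series starts in the first position of its cycle, as AirPassengers does.

```r
# AirPassengers ships with base R's datasets package (monthly totals, 1949-1960)
x <- AirPassengers

# Position within the yearly cycle (1 = January, ..., 12 = December)
month <- as.integer(cycle(x))

# Calendar year of each observation; assumes the series starts at cycle position 1
year <- start(x)[1] + (seq_along(x) - 1L) %/% frequency(x)

# Average along each axis separately
monthly_mean <- tapply(as.numeric(x), month, mean)  # seasonal profile across years
yearly_mean <- tapply(as.numeric(x), year, mean)    # year-over-year trend

print(round(monthly_mean, 1))
print(round(yearly_mean, 1))
```

Averaging over years for each month isolates the seasonal shape, while averaging over months for each year isolates the trend. These are the two directions a year-by-month matrix lays out side by side.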

We extract the data from the AirPassengers ts object and put it into appropriate
matrix form:


In [12]:
t(matrix(AirPassengers, nrow = 12, ncol = 12))

0,1,2,3,4,5,6,7,8,9,10,11
112,118,132,129,121,135,148,148,136,119,104,118
115,126,141,135,125,149,170,170,158,133,114,140
145,150,178,163,172,178,199,199,184,162,146,166
171,180,193,181,183,218,230,242,209,191,172,194
196,196,236,235,229,243,264,272,237,211,180,201
204,188,235,227,234,264,302,293,259,229,203,229
242,233,267,269,270,315,364,347,312,274,237,278
284,277,317,313,318,374,413,405,355,306,271,306
315,301,356,348,355,422,465,467,404,347,305,336
340,318,362,348,363,435,491,505,404,359,310,337
