# Alignment

![](banner_alignment.jpg)

In [1]:
cat("Open code cell here for notebook apparatus.")

options(warn=-1)

# Load some required functions
library(lubridate, verbose=FALSE, warn.conflicts=FALSE, quietly=TRUE)

Open code cell here for notebook apparatus.

## Introduction

Text, text ...

## Discourse

### Data

Consider this pedagogical data.  Variables x1 and x2 are aligned at the same resolution, with one observation of each per day.

In [14]:
data = data.frame(date=ymd(c("2017-12-30","2017-12-31","2018-01-01","2018-01-02","2018-01-03","2018-01-04")),
                  x1=c(1000, 1100, 900, 950, 800, 1050),
                  x2=c(72, 86, 76, 80, 82, 74))
data

date,x1,x2
2017-12-30,1000,72
2017-12-31,1100,86
2018-01-01,900,76
2018-01-02,950,80
2018-01-03,800,82
2018-01-04,1050,74


Now consider this pedagogical data.  Variables x1 and x2 are not aligned at the same resolution.  x1 is relatively coarse, with one observation per day.  x2 is relatively fine, with one observation every 12 hours.

In [15]:
data.1 = data.frame(date=ymd(c("2017-12-30","2017-12-31","2018-01-01","2018-01-02","2018-01-03","2018-01-04")),
                    x1=c(1000, 1100, 900, 950, 800, 1050))

data.2 = data.frame(date=ymd_h(c("2017-12-30 00","2017-12-30 12","2017-12-31 00","2017-12-31 12","2018-01-01 00","2018-01-01 12","2018-01-02 00","2018-01-02 12","2018-01-03 00","2018-01-03 12","2018-01-04 00","2018-01-04 12")),
                    x2=c(72, 56, 86, 60, 76, 63, 80, 68, 82, 59, 74, 61))
data.1
data.2

date,x1
2017-12-30,1000
2017-12-31,1100
2018-01-01,900
2018-01-02,950
2018-01-03,800
2018-01-04,1050


date,x2
2017-12-30 00:00:00,72
2017-12-30 12:00:00,56
2017-12-31 00:00:00,86
2017-12-31 12:00:00,60
2018-01-01 00:00:00,76
2018-01-01 12:00:00,63
2018-01-02 00:00:00,80
2018-01-02 12:00:00,68
2018-01-03 00:00:00,82
2018-01-03 12:00:00,59
2018-01-04 00:00:00,74
2018-01-04 12:00:00,61


### Align Variables by Compression

Mark observations with sequence numbers.  The finer resolution table will adopt the coarser resolution table's sequence numbers, repeating as necessary.

In [16]:
data.1$step = 1:nrow(data.1)
data.1

date,x1,step
2017-12-30,1000,1
2017-12-31,1100,2
2018-01-01,900,3
2018-01-02,950,4
2018-01-03,800,5
2018-01-04,1050,6


In [20]:
1:nrow(data.1)
sort(rep(1:nrow(data.1), 2))

data.2$step = sort(rep(1:nrow(data.1), 2))
data.2

date,x2,step
2017-12-30 00:00:00,72,1
2017-12-30 12:00:00,56,1
2017-12-31 00:00:00,86,2
2017-12-31 12:00:00,60,2
2018-01-01 00:00:00,76,3
2018-01-01 12:00:00,63,3
2018-01-02 00:00:00,80,4
2018-01-02 12:00:00,68,4
2018-01-03 00:00:00,82,5
2018-01-03 12:00:00,59,5
2018-01-04 00:00:00,74,6
2018-01-04 12:00:00,61,6
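
Assigning steps with `sort(rep(...))` relies on every day having exactly two fine-resolution rows, in order. As a minimal sketch of an alternative, assuming the timestamps are in UTC, the step can instead be looked up from each timestamp's calendar day. The tables `coarse` and `fine` are hypothetical stand-ins for `data.1` and `data.2`, built with base R so the example runs on its own.

```r
# Hypothetical stand-ins for data.1 and data.2 (first three days only)
coarse = data.frame(date=as.Date(c("2017-12-30","2017-12-31","2018-01-01")),
                    x1=c(1000, 1100, 900))
fine = data.frame(date=as.POSIXct(c("2017-12-30 00:00","2017-12-30 12:00",
                                    "2017-12-31 00:00","2017-12-31 12:00",
                                    "2018-01-01 00:00","2018-01-01 12:00"), tz="UTC"),
                  x2=c(72, 56, 86, 60, 76, 63))

coarse$step = 1:nrow(coarse)

# Truncate each timestamp to its calendar day, then look up that day's step
fine$step = coarse$step[match(as.Date(fine$date, tz="UTC"), coarse$date)]
fine
```

With this approach, a day with one or three fine-resolution rows would still receive the right step, and a timestamp with no matching day would receive `NA` rather than a wrong step.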


Consolidate observations.  Choose an aggregator function that makes sense for the variable(s).

In [21]:
data.2.consolidated = aggregate(x2 ~ step, data.2, mean)
data.2.consolidated

step,x2
1,64.0
2,73.0
3,69.5
4,74.0
5,70.5
6,67.5
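
The choice of aggregator changes the consolidated values considerably. The sketch below, which uses a hypothetical table `fine` standing in for `data.2` with its step numbers already assigned, places three candidate summaries side by side. Which one makes sense depends on what the variable measures, which the example data does not specify.

```r
# Hypothetical stand-in for data.2 after step numbers are assigned
fine = data.frame(step=c(1, 1, 2, 2, 3, 3),
                  x2=c(72, 56, 86, 60, 76, 63))

# Consolidate the same variable with three different aggregator functions
x2.mean = aggregate(x2 ~ step, fine, mean)
x2.max  = aggregate(x2 ~ step, fine, max)
x2.sum  = aggregate(x2 ~ step, fine, sum)

# Collect the results in one table for comparison
summaries = data.frame(step=x2.mean$step,
                       mean=x2.mean$x2,
                       max=x2.max$x2,
                       sum=x2.sum$x2)
summaries
```

As a rough guide, a mean often suits a level-like measurement, a sum often suits a count or total, and a max or min suits an extreme of interest.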


Join the tables.

In [22]:
merge(data.1, data.2.consolidated, by="step")[,-1]

date,x1,x2
2017-12-30,1000,64.0
2017-12-31,1100,73.0
2018-01-01,900,69.5
2018-01-02,950,74.0
2018-01-03,800,70.5
2018-01-04,1050,67.5
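
By default, `merge` keeps only the steps present in both tables, so a coarse observation with no fine-resolution counterpart would silently disappear. The sketch below uses hypothetical tables `coarse` and `fine.consolidated`, where step 3 has no consolidated value, to compare the default join with one that sets `all.x=TRUE`.

```r
coarse = data.frame(step=1:3, x1=c(1000, 1100, 900))

# Hypothetical consolidated table with no fine observations for step 3
fine.consolidated = data.frame(step=1:2, x2=c(64, 73))

# Default join: step 3 is dropped
inner = merge(coarse, fine.consolidated, by="step")

# all.x=TRUE keeps every row of the first table; the missing x2 becomes NA
left = merge(coarse, fine.consolidated, by="step", all.x=TRUE)

nrow(inner)
left
```

Comparing the row count before and after the join is a quick way to notice that observations were lost.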


### Align Variables by Expansion

Expand observations, repeating as necessary.  Mark observations with sequence numbers.  The coarser resolution table will adopt the finer resolution table's sequence numbers.

In [23]:
1:nrow(data.1)
sort(rep(1:nrow(data.1), 2))

In [24]:
data.1.dilated = data.1[sort(rep(1:nrow(data.1), 2)),]
data.1.dilated$step = 1:nrow(data.2)
data.1.dilated

Unnamed: 0,date,x1,step
1.0,2017-12-30,1000,1
1.1,2017-12-30,1000,2
2.0,2017-12-31,1100,3
2.1,2017-12-31,1100,4
3.0,2018-01-01,900,5
3.1,2018-01-01,900,6
4.0,2018-01-02,950,7
4.1,2018-01-02,950,8
5.0,2018-01-03,800,9
5.1,2018-01-03,800,10
6.0,2018-01-04,1050,11
6.1,2018-01-04,1050,12
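
The repetition count of 2 is specific to this data. As a minimal sketch, assuming the finer table has a whole number of rows per coarse row, the count can be computed from the two row counts. The names `coarse`, `n.fine`, `k`, and `dilated` are hypothetical.

```r
coarse = data.frame(date=as.Date(c("2017-12-30","2017-12-31","2018-01-01")),
                    x1=c(1000, 1100, 900))
n.fine = 6   # hypothetical row count of the finer resolution table

# Number of fine rows per coarse row; only valid when it divides evenly
k = n.fine / nrow(coarse)
stopifnot(k == round(k))

# rep(..., each=k) yields 1,1,2,2,3,3 directly, with no need to sort
dilated = coarse[rep(1:nrow(coarse), each=k),]
rownames(dilated) = NULL          # discard the 1, 1.1, 2, 2.1, ... row names
dilated$step = 1:nrow(dilated)
dilated
```

Resetting the row names is optional. It only avoids carrying the `1.1`, `2.1` labels that R generates when rows are repeated.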


In [25]:
data.2$step = 1:nrow(data.2)
data.2

date,x2,step
2017-12-30 00:00:00,72,1
2017-12-30 12:00:00,56,2
2017-12-31 00:00:00,86,3
2017-12-31 12:00:00,60,4
2018-01-01 00:00:00,76,5
2018-01-01 12:00:00,63,6
2018-01-02 00:00:00,80,7
2018-01-02 12:00:00,68,8
2018-01-03 00:00:00,82,9
2018-01-03 12:00:00,59,10
2018-01-04 00:00:00,74,11
2018-01-04 12:00:00,61,12


Disaggregate with a disaggregation function that makes sense for the variable(s).  Here, each daily x1 value is split evenly between its two half-day observations.

In [26]:
data.1.dilated$x1 = data.1.dilated$x1 / 2
data.1.dilated

Unnamed: 0,date,x1,step
1.0,2017-12-30,500,1
1.1,2017-12-30,500,2
2.0,2017-12-31,550,3
2.1,2017-12-31,550,4
3.0,2018-01-01,450,5
3.1,2018-01-01,450,6
4.0,2018-01-02,475,7
4.1,2018-01-02,475,8
5.0,2018-01-03,400,9
5.1,2018-01-03,400,10
6.0,2018-01-04,525,11
6.1,2018-01-04,525,12
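
Dividing by 2 treats x1 as a total that is shared among the finer observations. The sketch below contrasts that with simply repeating the value, which may suit a level-like variable better. The vector `x1.daily` and the names `as.total` and `as.level` are hypothetical, and which treatment fits depends on what the variable measures.

```r
x1.daily = c(1000, 1100, 900)   # hypothetical daily values
k = 2                           # fine rows per coarse row

# Total-like variable: split evenly so the parts sum back to the whole
as.total = rep(x1.daily / k, each=k)

# Level-like variable: repeat the value unchanged
as.level = rep(x1.daily, each=k)

# Check that the even split preserves each daily total
day = rep(seq_along(x1.daily), each=k)
recovered = as.vector(tapply(as.total, day, sum))
recovered
```

Checking that the disaggregated values aggregate back to the originals is a simple safeguard when the variable is a total.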


Join the tables.

In [27]:
merge(data.1.dilated[-1], data.2, by="step")[,c(3,2,4)]

date,x1,x2
2017-12-30 00:00:00,500,72
2017-12-30 12:00:00,500,56
2017-12-31 00:00:00,550,86
2017-12-31 12:00:00,550,60
2018-01-01 00:00:00,450,76
2018-01-01 12:00:00,450,63
2018-01-02 00:00:00,475,80
2018-01-02 12:00:00,475,68
2018-01-03 00:00:00,400,82
2018-01-03 12:00:00,400,59
2018-01-04 00:00:00,525,74
2018-01-04 12:00:00,525,61


## Code Templates

| Key Functions | Reference Documentation |
|--|--|
| `aggregate` function from `stats` package | https://www.rdocumentation.org/packages/stats/versions/3.5.2/topics/aggregate |
| `merge` function from `base` package | https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/merge |


## Expectations

Know about this:
* How to align variables of differing resolutions, conceptually & using R

## Further Reading

Further reading coming soon ...

<font size=1;>
<p style="text-align: left;">
Copyright (c) Berkeley Data Analytics Group, LLC
<span style="float: right;">
document revised May 6, 2019
</span>
</p>
</font>