# Rendered Variables

![](banner_calculated_variables.jpg)

In [3]:
cat("Open code cell here for notebook apparatus.")

options(warn=-1)   # suppress warnings in notebook output

# Load psych, which provides geometric.mean() used below
library(psych, verbose=FALSE, warn.conflicts=FALSE, quietly=TRUE)

Open code cell here for notebook apparatus.

## Introduction


Change or enhance how data is represented without adding any unobserved measurements.

Transform one or more variables into a new variable by applying a function one observation at a time, possibly with offsets to neighboring observations. The transform function can be a unit conversion; a linear recombination; a non-linear recombination such as a power, log, exponential, or trigonometric function; a recombination with observation offsets; a descriptive statistic; or any arbitrary function.
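
As a minimal sketch of the "any arbitrary function" case, the cell below applies a function to one observation at a time with base R `mapply`. The data frame `trips`, the function `label.trip`, and the 6 mph threshold are hypothetical and only serve as illustration.

```r
# Hypothetical data: one row per trip (values made up for illustration)
trips = data.frame(distance.mi=c(1.7, 26.1, 4.6), minutes=c(20, 240, 35))

# Hypothetical arbitrary function of two variables for a single observation
label.trip = function(distance, minutes) {
  speed = distance / (minutes / 60)   # miles per hour
  if (speed > 6) "run" else "walk"
}

# Apply the function one observation at a time; the result is a new variable
trips$kind = mapply(label.trip, trips$distance.mi, trips$minutes)
trips
```

Because `label.trip` uses `if`, it handles only one observation per call, which is why `mapply` is used to step through the rows.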

## Discourse

### By Unit Conversion

In [1]:
data = data.frame(distance.mi=c(1.7,2.3,3.1,4.6,26.1))
data

distance.mi
1.7
2.3
3.1
4.6
26.1


In [2]:
# 1 mile is approximately 1.609 km
data$distance.km = data$distance.mi * 1.609
data

distance.mi,distance.km
1.7,2.7353
2.3,3.7007
3.1,4.9879
4.6,7.4014
26.1,41.9949
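
The factor 1.609 used above is rounded. As a sketch, the same conversion can be wrapped in a small helper that uses the exact definition of the international mile (1.609344 km). The names `km.per.mile` and `mi.to.km` are hypothetical and not part of the notebook.

```r
# One international mile is defined as exactly 1.609344 km
km.per.mile = 1.609344

# Hypothetical helper (not from the notebook): miles to kilometers
mi.to.km = function(mi) mi * km.per.mile

data = data.frame(distance.mi=c(1.7, 2.3, 3.1, 4.6, 26.1))
data$distance.km = mi.to.km(data$distance.mi)
data
```

With the exact factor, the results differ from the table above only from the third decimal place on.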


### By Linear Recombination

In [12]:
data = data.frame(age=c(45, 47, 55, 18, 22), student=c(FALSE, TRUE, FALSE, TRUE, TRUE))
data

age,student
45,False
47,True
55,False
18,True
22,True


In [15]:
# as.numeric() converts FALSE/TRUE to 0/1, so students receive a 50-point bonus
data$score = 0.5*(100-data$age) + 50*as.numeric(data$student)
as.numeric(data$student)
data

age,student,score
45,False,27.5
47,True,76.5
55,False,22.5
18,True,91.0
22,True,89.0
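
A linear recombination can also be written as a matrix product of the variables and a weight vector. The sketch below, under that assumption, reproduces the score above; the names `weights` and `X` are illustrative.

```r
data = data.frame(age=c(45, 47, 55, 18, 22), student=c(FALSE, TRUE, FALSE, TRUE, TRUE))

# Expanding 0.5*(100 - age) + 50*student gives 50 - 0.5*age + 50*student
weights = c(intercept=50, age=-0.5, student=50)

# One column of 1s for the intercept, then the variables as numbers
X = cbind(intercept=1, age=data$age, student=as.numeric(data$student))

# The matrix product is a one-column matrix; as.vector() flattens it
data$score = as.vector(X %*% weights)
data
```

Writing the weights out explicitly makes it easy to see, or change, how much each variable contributes to the new one.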


### By Non-Linear Recombination

In [49]:
data = data.frame(x=c(3,4,5,4,5,5), y=c(25,42,53,47,51,65))
data

x,y
3,25
4,42
5,53
4,47
5,51
5,65


In [50]:
data$square.x = data$x^2
data$square.y = data$y^2
data$xy = data$x * data$y
data$ln.x = log(data$x)
data$ln.y = log(data$y)
data$sin.x = sin(data$x)
data

x,y,square.x,square.y,xy,ln.x,ln.y,sin.x
3,25,9,625,75,1.098612,3.218876,0.14112
4,42,16,1764,168,1.386294,3.73767,-0.7568025
5,53,25,2809,265,1.609438,3.970292,-0.9589243
4,47,16,2209,188,1.386294,3.850148,-0.7568025
5,51,25,2601,255,1.609438,3.931826,-0.9589243
5,65,25,4225,325,1.609438,4.174387,-0.9589243
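
As an alternative sketch, base R `transform` can add several derived columns in one call, referring to columns by name rather than with `data$`. Only a subset of the columns above is shown.

```r
data = data.frame(x=c(3,4,5,4,5,5), y=c(25,42,53,47,51,65))

# transform() evaluates each expression against the columns of data
# and returns a copy of the data frame with the new columns appended
data = transform(data,
                 square.x = x^2,
                 xy       = x * y,
                 ln.y     = log(y))
data
```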


### By Recombination with Observation Offsets

#### Example: Determine the average periodic return.

In [30]:
data = data.frame(year=c(2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019),
                  start_balance=c(100,77.90,100.26,111.19,116.63,135.06,142.49,89.77,113.56,130.70,133.45))
data

year,start_balance
2009,100.0
2010,77.9
2011,100.26
2012,111.19
2013,116.63
2014,135.06
2015,142.49
2016,89.77
2017,113.56
2018,130.7


In [31]:
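data$start_balance[2:nrow(data)]          # balances from the 2nd observation onward
c(data$start_balance[2:nrow(data)], NA)   # pad with NA so the length matches nrow(data)

# Each year's end balance is the next year's start balance (an offset of one observation)
data$end_balance = c(data$start_balance[2:nrow(data)], NA)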
data

year,start_balance,end_balance
2009,100.0,77.9
2010,77.9,100.26
2011,100.26,111.19
2012,111.19,116.63
2013,116.63,135.06
2014,135.06,142.49
2015,142.49,89.77
2016,89.77,113.56
2017,113.56,130.7
2018,130.7,133.45
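
The shift-and-pad pattern used for `end_balance` generalizes to any offset. The helper `lead.by` below is a hypothetical sketch, not part of the notebook, and assumes `k` is at least 1.

```r
# Hypothetical helper: shift a vector k observations earlier, padding the end with NA
# (assumes k >= 1)
lead.by = function(v, k=1) c(tail(v, -k), rep(NA, k))

start_balance = c(100, 77.90, 100.26, 111.19)
lead.by(start_balance)      # value from the next observation
lead.by(start_balance, 2)   # value from two observations ahead
```

The trailing `NA` values mark observations for which no later value exists, which is why the last row has no end balance.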


In [32]:
data$return = (data$end_balance - data$start_balance) / data$start_balance
data

year,start_balance,end_balance,return
2009,100.0,77.9,-0.221
2010,77.9,100.26,0.28703466
2011,100.26,111.19,0.10901656
2012,111.19,116.63,0.04892526
2013,116.63,135.06,0.15802109
2014,135.06,142.49,0.05501259
2015,142.49,89.77,-0.36999088
2016,89.77,113.56,0.26501058
2017,113.56,130.7,0.15093343
2018,130.7,133.45,0.02104055


In [33]:
data$growth = 1 + data$return
data

year,start_balance,end_balance,return,growth
2009,100.0,77.9,-0.221,0.779
2010,77.9,100.26,0.28703466,1.2870347
2011,100.26,111.19,0.10901656,1.1090166
2012,111.19,116.63,0.04892526,1.0489253
2013,116.63,135.06,0.15802109,1.1580211
2014,135.06,142.49,0.05501259,1.0550126
2015,142.49,89.77,-0.36999088,0.6300091
2016,89.77,113.56,0.26501058,1.2650106
2017,113.56,130.7,0.15093343,1.1509334
2018,130.7,133.45,0.02104055,1.0210406


In [45]:
# a nonsense statistic sometimes mistaken for the average annual return
data.frame(wrong.average.annual.return=mean(data$return[1:10]))

wrong.average.annual.return
0.05040038


In [39]:
# geometric.mean() comes from the psych package; growth factors compound multiplicatively
data.frame(average.annual.return=geometric.mean(data$growth[1:10]) - 1)

average.annual.return
0.02927603
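
To see why the geometric mean is the appropriate average here, the sketch below recomputes it in base R (as the exponential of the mean of the logs) and checks that compounding the starting balance at that rate for 10 years reproduces the final balance. The variable names `n`, `growth`, and `g` are illustrative.

```r
start_balance = c(100,77.90,100.26,111.19,116.63,135.06,142.49,89.77,113.56,130.70,133.45)
n = length(start_balance) - 1                      # 10 annual periods
growth = start_balance[-1] / start_balance[1:n]    # same values as 1 + return

# Geometric mean in base R: exponentiate the mean of the logs
g = exp(mean(log(growth)))

# Compounding at the geometric mean reproduces the final balance of 133.45 ...
100 * g^n
# ... while compounding at the arithmetic mean overstates it
100 * mean(growth)^n

g - 1                                              # average annual return
```

The arithmetic mean of the returns ignores that each year's return applies to a different balance, so it does not describe how the balance actually grew.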


### By Descriptive Statistic

Treat the high-tech industry return as the mean of the Apple, Dell, IBM, and Microsoft returns.

In [47]:
data = read.csv("High-Tech Stocks.csv", header=TRUE)
data

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1990.042,-0.035461,-0.15909090,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34.000,98.625,92.500,19900131
1990.125,0.003235,0.35135135,0.06550063,0.06756756,0.014901,0.008539,6.250,34.000,103.875,98.750,19900228
1990.208,0.183824,0.22000000,0.02166065,0.12151898,0.024140,0.024255,7.625,40.250,106.125,110.750,19900330
1990.292,-0.021739,0.11475410,0.02709069,0.04740406,-0.028286,-0.026887,8.500,39.375,109.000,58.000,19900430
1990.375,0.050413,0.29411766,0.11201835,0.25862068,0.088936,0.091989,11.000,41.250,120.000,73.000,19900531
1990.458,0.084848,0.14772727,-0.02083330,0.04109589,-0.004196,-0.008886,12.625,44.750,117.500,76.000,19900629
1990.542,-0.061453,-0.06930690,-0.05106380,-0.12500000,-0.009405,-0.005223,11.750,42.000,111.500,66.500,19900731
1990.625,-0.116429,0.00000000,-0.07547090,-0.07518800,-0.091896,-0.094314,11.750,37.000,101.875,61.500,19900831
1990.708,-0.216216,-0.25531910,0.04417178,0.02439024,-0.053843,-0.051184,8.750,29.000,106.375,63.000,19900928
1990.792,0.060345,0.21428572,-0.00940070,0.01190476,-0.012504,-0.006698,10.625,30.750,105.375,63.750,19901031


To inspect the return columns of `data` with `SP.500.Return` first, we reference the column positions we want in the order we want: `data[,c(7, 2:5)]`. We assign the resulting table the new name `datax`. Note that the 1st column of `datax` matches the 7th column of `data`, the 2nd column of `datax` matches the 2nd column of `data`, and so on. Taken all together, it looks like this:
`datax = data[,c(7, 2:5)]`

To add a new column `Mean.Tech.Return` to `datax`, we assign a vector of mean tech return values to that column - the column will be created automatically.

To get the vector of mean tech return values, we compute the mean tech return for each row and combine the resulting values into a vector. The `rowMeans` function does this in one call: it returns one mean per row, taken across the selected columns.

In [2]:
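# Column 7 is SP.500.Return; columns 2 to 5 are the four tech stock returns
datax = data[,c(7, 2:5)]
# One mean per row, taken across the four tech return columns
datax$Mean.Tech.Return = rowMeans(data[,2:5])
datax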

SP.500.Return,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Mean.Tech.Return
-0.068817,-0.035461,-0.15909090,0.04780877,0.06321839,-0.02088119
0.008539,0.003235,0.35135135,0.06550063,0.06756756,0.12191364
0.024255,0.183824,0.22000000,0.02166065,0.12151898,0.13675091
-0.026887,-0.021739,0.11475410,0.02709069,0.04740406,0.04187746
0.091989,0.050413,0.29411766,0.11201835,0.25862068,0.17879242
-0.008886,0.084848,0.14772727,-0.02083330,0.04109589,0.06320947
-0.005223,-0.061453,-0.06930690,-0.05106380,-0.12500000,-0.07670593
-0.094314,-0.116429,0.00000000,-0.07547090,-0.07518800,-0.06677197
-0.051184,-0.216216,-0.25531910,0.04417178,0.02439024,-0.10074327
-0.006698,0.060345,0.21428572,-0.00940070,0.01190476,0.06928370
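
As a sketch of what `rowMeans` does, the cell below compares it with an explicit row-by-row `apply`, using the first three rows of the four tech return columns shown above. The names `returns`, `by.apply`, and `by.rowMeans` are illustrative.

```r
# First three rows of the four tech return columns shown above
returns = data.frame(Apple.Return=c(-0.035461, 0.003235, 0.183824),
                     Dell.Return=c(-0.15909090, 0.35135135, 0.22000000),
                     IBM.Return=c(0.04780877, 0.06550063, 0.02166065),
                     Microsoft.Return=c(0.06321839, 0.06756756, 0.12151898))

# Explicit row-by-row version: apply mean() across each row (MARGIN = 1)
by.apply = apply(returns, 1, mean)

# Vectorized equivalent used in the notebook
by.rowMeans = rowMeans(returns)

all.equal(unname(by.apply), unname(by.rowMeans))
```

Both approaches give the same values; `rowMeans` is shorter and typically faster on large tables.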


## Code

### Useful Functions

## Expectations


## Further Reading

* References to be added here ...


<font size=1;>
<p style="text-align: left;">
Copyright (c) Berkeley Data Analytics Group, LLC
<span style="float: right;">
document revised May 6, 2019
</span>
</p>
</font>