# Data Manipulation in R $\,\, \tiny\text{Primer}$
<img src="images/banner_dm.jpg" align=left>

<br>
This primer assumes that you are familiar with Jupyter, but not yet familiar with R.

In this lesson ...
* Using R Code
* Apparatus
* Retrieve Data
* Slice the Table by Rows
* Slice the Table by Rows and Columns (positions)
* Slice the Table by Rows and Columns (names)
* Slice the Table by Rows and One Column
* Slice the Table by Rows and Columns (criterion)
* Slice the Table by First, Last, or Random Rows, and by Columns
* More About R

<br>
___
### Using R Code $\, \tiny\text{(video 1A)}$

Here are a few key concepts about using R code.  Subsequent discussions will elaborate on them.

R is a programming language well-suited to data analysis.

R makes use of **tables** and **vectors** to hold collections of values: 
- **Table:** Holds a (2-dimensional) table of values, indexed by row and column.  Each row has a row number.  Each column has a column number and name, and holds values of a specific type (e.g., continuous values, character string values, or categorical values).  All values in a single column must be of the same type, but any specific column need not have values of the same type as any other column.  Note, a table is sometimes also known as a data frame.


- **Vector:** Holds a (1-dimensional) vector of values, like a column of values, indexed by position.  Each position has a position number.  A vector holds values of a specific type (e.g., continuous values, character string values, or categorical values).  All values contained by a vector must be of the same type.

R also makes use of **functions** to make calculations and/or perform actions (e.g., `max(...)`, `min(...)`).  Each function generally takes some parameters and returns a table, vector, or single value.  A function distinguishes parameters by their order and/or by their names. A function's specific behavior depends on the types of its parameters.

R provides a special function, `c`, to **concatenate** multiple values or vectors into a single vector.
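
As a minimal sketch of these ideas, the following builds a vector, concatenates vectors with `c`, and builds a small table with `data.frame`.  The names `prices`, `more_prices`, `stocks`, and `mixed`, and the ticker values, are made up for illustration and are not part of the primer's dataset.

```r
# A vector: every value has the same type (numeric here)
prices = c(34.0, 34.0, 40.25)

# c() also concatenates existing vectors into one longer vector
more_prices = c(prices, c(39.375, 41.25))
length(more_prices)    # 5

# A table (data frame): each column is a vector, and columns may differ in type
stocks = data.frame(
  ticker = c("AAPL", "DELL", "IBM"),   # character column (hypothetical values)
  price  = prices,                     # numeric column
  stringsAsFactors = FALSE
)
class(stocks$ticker)   # "character"
class(stocks$price)    # "numeric"

# Mixing types in one vector forces all values to a common type
mixed = c(1, "a")
class(mixed)           # "character"
```

The last two lines show why a vector is said to hold values of a single type: R converts the number to a string rather than keep two types side by side.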

<br>
___
### Apparatus $\, \tiny\text{(video 1B)}$

Load function libraries, define additional useful functions, and set defaults here.

The R system provides us a specific set of commonly used functions.  We can use the `library` function to load additional packages of functions.  Each time we use the `library` function, the 1st parameter is the name of the package to be loaded; the other parameters (here `verbose`, `warn.conflicts`, and `quietly`) have to do with system messages and are optional.  There are thousands of packages available.  

Here we load the following additional packages of functions:  
* `dplyr` to enable sampling operations 

Preceding the `library` function is a comment.

In [1]:
options(warn=-1)

# Load some required functions
library(dplyr, verbose=FALSE, warn.conflicts=FALSE, quietly=TRUE)

<br>
___
### Retrieve Data $\, \tiny\text{(video 1B)}$

The dataset that we use here is called **`DATASET_High-Tech_Stocks.csv`**.

To retrieve a dataset, we use the `read.csv` function.  This function assumes the dataset is in CSV format. (There are other functions for retrieving datasets that assume different formats.)  The 1st parameter is the path of the file in which the dataset resides, in this case `../Data/DATASET_High-Tech_Stocks.csv`.  The `header` parameter indicates whether or not the 1st row of the dataset should be interpreted as column headings, in this case it should.

The result of the function is a table, with each row corresponding to 1 observation in the dataset. We assign it the name `data` so that we can use it later.

In [2]:
data = read.csv("../Data/DATASET_High-Tech_Stocks.csv", header=TRUE)

<br>
To confirm that the dataset has been retrieved and assigned, we use the `dim` function, which returns the table's number of rows and the number of columns.  In this case, the table has 261 rows and 12 columns.

In [3]:
dim(data)

<br>
To further confirm that the dataset has been retrieved and assigned, we use the `colnames` function, which returns the table's column names.  Note, column names have been automatically renamed to be more convenient for us (certain characters have been substituted with the `.` character).

In [4]:
colnames(data)
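
To see the retrieval steps and the automatic renaming in isolation, here is a small sketch that writes a made-up CSV file to a temporary location and reads it back.  The file contents, the headers, and the names `path`, `toy`, and `raw` are assumptions for illustration; the headers are not claimed to be those of the original dataset file.

```r
# Write a tiny, made-up CSV to a temporary file (not the primer's dataset)
path = tempfile(fileext = ".csv")
writeLines(c(
  'Date,Apple Return,"Price, Dell"',
  '1990.042,-0.035461,4.625',
  '1990.125,0.003235,6.25'
), path)

# Read it back; header=TRUE treats the 1st line as column headings
toy = read.csv(path, header = TRUE)
dim(toy)        # 2 3
colnames(toy)   # "Date" "Apple.Return" "Price..Dell"

# check.names=FALSE keeps the headings exactly as written in the file
raw = read.csv(path, header = TRUE, check.names = FALSE)
colnames(raw)   # "Date" "Apple Return" "Price, Dell"
```

Spaces and commas in the headings are each replaced with `.`, which is why a heading such as `Price, Dell` becomes `Price..Dell`.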

<br>
___
### Slice the Table by Rows $\, \tiny\text{(video 1C)}$
Indicate rows by their positions.

To inspect all or part of the dataset, we reference the `data` table retrieved earlier.  We indicate which rows and columns of the table we want with `[...]` notation - specific row positions and column positions separated by a comma and enclosed within brackets.  We start counting row positions and column positions at 1.  If we want all rows, then we can leave the row positions blank and all rows will be assumed.  Similarly, if we want all columns, then we can leave the column positions blank and all columns will be assumed.

So, to inspect the 1st row of data, we reference `data[1,]`.  The row position (in this case just 1 row) is 1.  The column positions are blank and so assumed to be all columns.  Note, the comma is required even though we left the column positions blank.  The output is presented as a table with 12 columns, with headings, each with 1 value per column corresponding to the 1st observation in the dataset.

In [5]:
data[1,]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34,98.625,92.5,19900131


<br>
To inspect the 5th row of data, we reference `data[5,]`.  The row position (in this case just 1 row) is 5.

In [6]:
data[5,]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
5,1990.375,0.050413,0.2941177,0.1120184,0.2586207,0.088936,0.091989,11,41.25,120,73,19900531


<br>
To inspect the first 3 rows of data, we reference `data[1:3,]`.  We indicate a sequence of positions with the `:` notation - the left side is the start position, the right side is the stop position.  So, the row positions indicated by `1:3` are 1, 2, and 3.  The output is presented as a table with 12 columns, with headings, each with 3 values per column corresponding to the first 3 observations in the dataset.

In [7]:
data[1:3,]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34.0,98.625,92.5,19900131
1990.125,0.003235,0.3513514,0.06550063,0.06756756,0.014901,0.008539,6.25,34.0,103.875,98.75,19900228
1990.208,0.183824,0.22,0.02166065,0.12151898,0.02414,0.024255,7.625,40.25,106.125,110.75,19900330


<br>
To inspect the 2nd through 5th rows of data, we reference `data[2:5,]`.  The row positions indicated by `2:5` are 2, 3, 4, and 5.

In [8]:
data[2:5,]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
2,1990.125,0.003235,0.3513514,0.06550063,0.06756756,0.014901,0.008539,6.25,34.0,103.875,98.75,19900228
3,1990.208,0.183824,0.22,0.02166065,0.12151898,0.02414,0.024255,7.625,40.25,106.125,110.75,19900330
4,1990.292,-0.021739,0.1147541,0.02709069,0.04740406,-0.028286,-0.026887,8.5,39.375,109.0,58.0,19900430
5,1990.375,0.050413,0.2941177,0.11201835,0.25862068,0.088936,0.091989,11.0,41.25,120.0,73.0,19900531


<br>
To inspect the 1st and 3rd rows of the data, we reference `data[c(1,3),]`.  We indicate a set of positions with the `c` function.  So, the row positions described by `c(1,3)` are 1 and 3.  Note, the `:` notation would not be appropriate here, because it does not allow skipped positions.  Note also, it would not be appropriate here to describe the row positions without the `c` function because the row positions would not otherwise be distinguishable from the column positions.  The output is presented as a table with 12 columns, with headings, each with 2 values per column corresponding to the 1st and 3rd observations in the dataset.

In [9]:
data[c(1,3),]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1,1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34.0,98.625,92.5,19900131
3,1990.208,0.183824,0.22,0.02166065,0.12151898,0.02414,0.024255,7.625,40.25,106.125,110.75,19900330
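
The row-slicing forms above can be tried on a small stand-in table.  This sketch assumes a 2-column table named `toy` (a hypothetical name) holding the first 5 `Date` and `Apple.Return` values shown in the outputs above.

```r
# A small stand-in table with 5 rows and 2 columns
toy = data.frame(
  Date         = c(1990.042, 1990.125, 1990.208, 1990.292, 1990.375),
  Apple.Return = c(-0.035461, 0.003235, 0.183824, -0.021739, 0.050413)
)

nrow(toy[1, ])         # 1   (just the 1st row)
nrow(toy[2:4, ])       # 3   (rows 2, 3, and 4)
toy[c(1, 3), "Date"]   # 1990.042 1990.208   (rows 1 and 3 only)

# Without c(), the 2nd number is read as a column position, not a row position
toy[1, 2]              # -0.035461   (row 1, column 2)
```

The last line is the reason `c` is required for skipped rows: `toy[1, 3]` would ask for column 3 of row 1 rather than rows 1 and 3.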


<br>
___
### Slice the Table by Rows and Columns $\, \tiny\text{(video 1D)}$
Indicate rows and columns by their positions.

<br>
To inspect the data at the intersection of row 1 and column 2, we reference `data[1,2]`.  Because this references a single value, association with a table is dropped, and the output is presented as a single value.

In [10]:
data[1,2]

<br>
To inspect only the first 5 columns of the 1st row of data, we reference `data[1,1:5]`.

In [11]:
data[1,1:5]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839


<br>
To inspect only the first 5 columns of the first 3 rows of data, we reference `data[1:3,1:5]`.

In [12]:
data[1:3,1:5]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839
1990.125,0.003235,0.3513514,0.06550063,0.06756756
1990.208,0.183824,0.22,0.02166065,0.12151898


<br>
To inspect the first 5 columns and the 7th column of the 2nd and 4th rows of data, we reference `data[c(2,4),c(1:5,7)]`.  Note, the `c` function can take a combination of `:`-style sequences and single positions as parameters.

In [13]:
data[c(2,4),c(1:5,7)]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,SP.500.Return
2,1990.125,0.003235,0.3513514,0.06550063,0.06756756,0.008539
4,1990.292,-0.021739,0.1147541,0.02709069,0.04740406,-0.026887
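
The dropped association with a table can be checked directly.  The sketch below uses a hypothetical 3-row table `toy` built from values shown above, and the `drop` parameter of the `[...]` notation, which this primer does not otherwise cover; treat it as an optional aside.

```r
# A small stand-in table with 3 rows and 3 columns
toy = data.frame(
  Date         = c(1990.042, 1990.125, 1990.208),
  Apple.Return = c(-0.035461, 0.003235, 0.183824),
  Dell.Return  = c(-0.1590909, 0.3513514, 0.22)
)

# A single cell comes back as a plain value, not a table
one_value = toy[1, 2]
is.data.frame(one_value)      # FALSE

# drop=FALSE asks R to keep the result as a 1-row, 1-column table
one_cell = toy[1, 2, drop = FALSE]
is.data.frame(one_cell)       # TRUE
dim(one_cell)                 # 1 1

# A single column also drops to a vector, but several columns stay a table
is.data.frame(toy[1:3, 2])    # FALSE
is.data.frame(toy[1, 1:2])    # TRUE
```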


<br>
___
### Slice the Table by Rows and Columns $\, \tiny\text{(video 1D)}$
Indicate columns by their names.

As a sometimes convenient alternative to referencing columns by their positions, we can indicate them by their names.  Note, column names are described as strings and so must each be enclosed within `"..."`.

In [14]:
data[1,"Apple.Return"]

<br>

In [15]:
data[1,c("Date","Apple.Return","Dell.Return","IBM.Return","Microsoft.Return")]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839


<br>

In [16]:
data[1:3,c("Date","Apple.Return","Dell.Return","IBM.Return","Microsoft.Return")]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839
1990.125,0.003235,0.3513514,0.06550063,0.06756756
1990.208,0.183824,0.22,0.02166065,0.12151898


<br>

In [17]:
data[c(1,3),c("Date","Apple.Return","Dell.Return","IBM.Return","Microsoft.Return","SP.500.Return")]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,SP.500.Return
1,1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.068817
3,1990.208,0.183824,0.22,0.02166065,0.12151898,0.024255
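
As a quick check that names and positions select the same data, the sketch below compares the two forms on a hypothetical stand-in table `toy`.  It also shows what happens when a column name is misspelled; the names `by_name`, `by_position`, and `outcome` are made up for illustration.

```r
# A small stand-in table with 3 rows and 3 columns
toy = data.frame(
  Date         = c(1990.042, 1990.125, 1990.208),
  Apple.Return = c(-0.035461, 0.003235, 0.183824),
  Dell.Return  = c(-0.1590909, 0.3513514, 0.22)
)

# Selecting columns by name or by position gives the same table
by_name     = toy[1:2, c("Date", "Dell.Return")]
by_position = toy[1:2, c(1, 3)]
identical(by_name, by_position)        # TRUE

# Look up the position that belongs to a name
match("Dell.Return", colnames(toy))    # 3

# Names are case-sensitive: a misspelled name is an error, not an empty result
outcome = tryCatch(toy[1, "apple.return"], error = function(e) "error")
outcome                                # "error"
```

Names tend to be the safer choice when the column order of a dataset might change.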


<br>
___
### Slice the Table by Rows and One Column $\, \tiny\text{(video 1D)}$
Indicate the column by its name using \$ notation.

As a sometimes convenient alternative to referencing columns using the `[...]` notation, if we want just one column, then we can use the `$` notation - a table name followed by the `$` symbol, followed by a column name, followed by row positions enclosed within `[...]`.

So, to inspect the `Apple.Return` column of the 1st row of data, we reference `data$Apple.Return[1]`.  Note, in this notation, the column name is not described as a string, so it is not enclosed within `"..."`.  Note also, because this references a single value, association with a table is dropped, and the output is presented as a single value.  (We cannot force the reference to keep an association with a table.)

In [18]:
data$Apple.Return[1]

<br>
To inspect the `Apple.Return` column of the first 3 rows of data, we reference `data$Apple.Return[1:3]`.  Note, because this references a vector of values, association with a table is dropped, and the output is presented as a vector. (We cannot force this reference to keep an association with a table.)

In [19]:
data$Apple.Return[1:3]

<br>
Similarly, to inspect the `Apple.Return` column of the 1st and 3rd rows of data, we reference `data$Apple.Return[c(1,3)]`.

In [20]:
data$Apple.Return[c(1,3)]
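
The following sketch confirms, on a hypothetical stand-in table `toy`, that the `$` notation yields a plain vector and that it matches the `[...]` notation for a single named column.  The name `v` is made up for illustration.

```r
# A small stand-in table with 4 rows and 2 columns
toy = data.frame(
  Date         = c(1990.042, 1990.125, 1990.208, 1990.292),
  Apple.Return = c(-0.035461, 0.003235, 0.183824, -0.021739)
)

# The $ notation picks one column, then [...] picks positions within it
v = toy$Apple.Return[1:3]
is.data.frame(v)                          # FALSE
is.vector(v)                              # TRUE
length(v)                                 # 3

# Same values as selecting rows and one named column with [...]
identical(v, toy[1:3, "Apple.Return"])    # TRUE
```

Because `toy$Apple.Return` is already a vector, the brackets that follow it take only positions, with no comma.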

<br>
___
### Slice the Table by Rows and Columns $\, \tiny\text{(video 1E)}$
Indicate the rows by a criterion.

To inspect the part of the dataset that satisfies a specific criterion, we again use the `[...]` notation, but indicate the rows we want by TRUE/FALSE expressions involving the columns.  (This is similar to, but more general than, Excel's filtering functionality.)

To inspect only rows where the `Date` column value is less than 1991 (i.e., observations from earlier than the year 1991), we reference `data[data$Date < 1991,]`.  The row positions are indicated by the expression `data$Date < 1991` - every value in the `Date` column is compared to 1991, producing one TRUE/FALSE value per row, and the rows marked TRUE are selected.  The column positions are blank and so assumed to be all columns.

In [21]:
data[data$Date < 1991,]

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34.0,98.625,92.5,19900131
1990.125,0.003235,0.3513514,0.06550063,0.06756756,0.014901,0.008539,6.25,34.0,103.875,98.75,19900228
1990.208,0.183824,0.22,0.02166065,0.12151898,0.02414,0.024255,7.625,40.25,106.125,110.75,19900330
1990.292,-0.021739,0.1147541,0.02709069,0.04740406,-0.028286,-0.026887,8.5,39.375,109.0,58.0,19900430
1990.375,0.050413,0.2941177,0.11201835,0.25862068,0.088936,0.091989,11.0,41.25,120.0,73.0,19900531
1990.458,0.084848,0.1477273,-0.0208333,0.04109589,-0.004196,-0.008886,12.625,44.75,117.5,76.0,19900629
1990.542,-0.061453,-0.0693069,-0.0510638,-0.125,-0.009405,-0.005223,11.75,42.0,111.5,66.5,19900731
1990.625,-0.116429,0.0,-0.0754709,-0.075188,-0.091896,-0.094314,11.75,37.0,101.875,61.5,19900831
1990.708,-0.216216,-0.2553191,0.04417178,0.02439024,-0.053843,-0.051184,8.75,29.0,106.375,63.0,19900928
1990.792,0.060345,0.2142857,-0.0094007,0.01190476,-0.012504,-0.006698,10.625,30.75,105.375,63.75,19901031


<br>
To inspect only rows where the `Date` column is between 1991 (inclusive) and 1992 (exclusive), (i.e., observations from the year 1991), we reference `data[(data$Date >= 1991) & (data$Date < 1992),]`.  The row positions are again indicated by an expression, this time a complex expression using the AND operator (`&`) to get the intersection of 2 sets of row positions.  The column positions are blank and so assumed to be all columns.

In [22]:
data[(data$Date >= 1991) & (data$Date < 1992),]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
13,1991.042,0.290698,0.22297297,0.12168141,0.3039867,0.049078,0.041518,22.625,55.5,126.75,98.125,19910131
14,1991.125,0.033694,0.1160221,0.02532544,0.05732484,0.075847,0.067281,25.25,57.25,128.75,103.75,19910228
15,1991.208,0.187773,0.12871288,-0.115534,0.02289157,0.028923,0.022203,28.5,68.0,113.875,106.125,19910328
16,1991.292,-0.191176,-0.1798246,-0.0954994,-0.0671378,0.003322,0.000346,23.375,55.0,103.0,99.0,19910430
17,1991.375,-0.143273,0.05882353,0.04208738,0.10858586,0.040732,0.038577,24.75,47.0,106.125,109.75,19910531
18,1991.458,-0.117021,-0.010101,-0.0848057,-0.0689066,-0.044029,-0.047893,24.5,41.5,97.125,68.125,19910628
19,1991.542,0.114458,0.17346939,0.04247104,0.07889909,0.046795,0.044859,28.75,46.25,101.25,73.5,19910731
20,1991.625,0.148541,0.13478261,-0.0312593,0.15986395,0.026819,0.019649,32.625,53.0,96.875,85.25,19910830
21,1991.708,-0.066038,0.02298851,0.06967742,0.04398827,-0.010975,-0.019144,33.375,49.5,103.625,89.0,19910930
22,1991.792,0.040404,-0.2546816,-0.0518697,0.05477528,0.017789,0.01186,24.875,51.5,98.25,93.875,19911031


<br>
As an alternative way to inspect only observations from the year 1991, we reference `data[!((data$Date < 1991) | (data$Date >= 1992)),]`.  The row positions are again indicated by an expression, this time a complex expression using the NOT operator (`!`) and the OR operator (`|`).  The column positions are blank and so assumed to be all columns.

In [23]:
data[!((data$Date < 1991) | (data$Date >= 1992)),]

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
13,1991.042,0.290698,0.22297297,0.12168141,0.3039867,0.049078,0.041518,22.625,55.5,126.75,98.125,19910131
14,1991.125,0.033694,0.1160221,0.02532544,0.05732484,0.075847,0.067281,25.25,57.25,128.75,103.75,19910228
15,1991.208,0.187773,0.12871288,-0.115534,0.02289157,0.028923,0.022203,28.5,68.0,113.875,106.125,19910328
16,1991.292,-0.191176,-0.1798246,-0.0954994,-0.0671378,0.003322,0.000346,23.375,55.0,103.0,99.0,19910430
17,1991.375,-0.143273,0.05882353,0.04208738,0.10858586,0.040732,0.038577,24.75,47.0,106.125,109.75,19910531
18,1991.458,-0.117021,-0.010101,-0.0848057,-0.0689066,-0.044029,-0.047893,24.5,41.5,97.125,68.125,19910628
19,1991.542,0.114458,0.17346939,0.04247104,0.07889909,0.046795,0.044859,28.75,46.25,101.25,73.5,19910731
20,1991.625,0.148541,0.13478261,-0.0312593,0.15986395,0.026819,0.019649,32.625,53.0,96.875,85.25,19910830
21,1991.708,-0.066038,0.02298851,0.06967742,0.04398827,-0.010975,-0.019144,33.375,49.5,103.625,89.0,19910930
22,1991.792,0.040404,-0.2546816,-0.0518697,0.05477528,0.017789,0.01186,24.875,51.5,98.25,93.875,19911031


<br>
To inspect only rows where the `Date` column is between 1991 (inclusive) and 1992 (exclusive), and only the `Date` and `Apple.Return` columns, we reference `data[(data$Date >= 1991) & (data$Date < 1992), c("Date", "Apple.Return")]`.

In [24]:
data[(data$Date >= 1991) & (data$Date < 1992), c("Date", "Apple.Return")]

Unnamed: 0,Date,Apple.Return
13,1991.042,0.290698
14,1991.125,0.033694
15,1991.208,0.187773
16,1991.292,-0.191176
17,1991.375,-0.143273
18,1991.458,-0.117021
19,1991.542,0.114458
20,1991.625,0.148541
21,1991.708,-0.066038
22,1991.792,0.040404
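
It can help to look at the TRUE/FALSE values themselves before using them to slice.  This sketch assumes a hypothetical 5-row table `toy` with made-up values, and stores the criterion under the made-up names `keep` and `alt`.

```r
# A small stand-in table spanning three years (made-up values)
toy = data.frame(
  Date         = c(1990.875, 1990.958, 1991.042, 1991.125, 1992.042),
  Apple.Return = c(0.10, -0.20, 0.30, 0.05, -0.15)
)

# The criterion is a vector with one TRUE/FALSE value per row
keep = (toy$Date >= 1991) & (toy$Date < 1992)
keep           # FALSE FALSE TRUE TRUE FALSE
which(keep)    # 3 4   (positions of the TRUE values)
sum(keep)      # 2     (number of rows that satisfy the criterion)

# The NOT/OR form describes exactly the same rows
alt = !((toy$Date < 1991) | (toy$Date >= 1992))
identical(keep, alt)    # TRUE

# Use the stored criterion to slice rows, and names to slice columns
toy[keep, c("Date", "Apple.Return")]
```

Storing the criterion under a name first makes a long condition easier to read and to reuse.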


<br>
___
### Slice the Table by First, Last, or Random Rows, and by Columns $\, \tiny\text{(video 1F)}$

As a sometimes convenient alternative to referencing rows by their positions or using criteria, we can use the `head` and `tail` functions.

To inspect the first 6 rows of data, we use `head(data)`.  The first parameter is the dataset, in this case `data`.  If no other parameters are provided, then the first 6 rows are assumed.

In [25]:
head(data)

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34.0,98.625,92.5,19900131
1990.125,0.003235,0.3513514,0.06550063,0.06756756,0.014901,0.008539,6.25,34.0,103.875,98.75,19900228
1990.208,0.183824,0.22,0.02166065,0.12151898,0.02414,0.024255,7.625,40.25,106.125,110.75,19900330
1990.292,-0.021739,0.1147541,0.02709069,0.04740406,-0.028286,-0.026887,8.5,39.375,109.0,58.0,19900430
1990.375,0.050413,0.2941177,0.11201835,0.25862068,0.088936,0.091989,11.0,41.25,120.0,73.0,19900531
1990.458,0.084848,0.1477273,-0.0208333,0.04109589,-0.004196,-0.008886,12.625,44.75,117.5,76.0,19900629


<br>
To inspect the first 3 rows of data we use `head(data, 3)`.  The 2nd parameter is the number of rows.

In [26]:
head(data, 3)

Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
1990.042,-0.035461,-0.1590909,0.04780877,0.06321839,-0.070115,-0.068817,4.625,34.0,98.625,92.5,19900131
1990.125,0.003235,0.3513514,0.06550063,0.06756756,0.014901,0.008539,6.25,34.0,103.875,98.75,19900228
1990.208,0.183824,0.22,0.02166065,0.12151898,0.02414,0.024255,7.625,40.25,106.125,110.75,19900330


<br>
To inspect the first 6 rows of the 2nd **through** 5th columns of data we use `head(data[,2:5])`.

In [27]:
head(data[,2:5])

Apple.Return,Dell.Return,IBM.Return,Microsoft.Return
-0.035461,-0.1590909,0.04780877,0.06321839
0.003235,0.3513514,0.06550063,0.06756756
0.183824,0.22,0.02166065,0.12151898
-0.021739,0.1147541,0.02709069,0.04740406
0.050413,0.2941177,0.11201835,0.25862068
0.084848,0.1477273,-0.0208333,0.04109589


<br>
To inspect the first 6 rows of the 2nd **and** 5th columns of data we use `head(data[,c(2,5)])`.

In [28]:
head(data[,c(2,5)])

Apple.Return,Microsoft.Return
-0.035461,0.06321839
0.003235,0.06756756
0.183824,0.12151898
-0.021739,0.04740406
0.050413,0.25862068
0.084848,0.04109589


<br>
To inspect the first 6 positions of the `Apple.Return` column of data we use `head(data$Apple.Return)`.

In [29]:
head(data$Apple.Return)

<br>
To inspect the last 6 rows of data we use `tail(data)`.

In [30]:
tail(data)

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
256,2011.292,0.004656,0.066161,0.046054,0.020874,0.028683,0.028495,15.47,350.13,170.58,25.92,20110428
257,2011.375,-0.006569,0.039431,-0.005276,-0.028935,-0.014934,-0.013501,16.08,347.83,168.93,25.01,20110531
258,2011.458,-0.03496,0.036692,0.015509,0.039584,-0.018391,-0.018258,16.67,335.67,171.55,26.0,20110630
259,2011.542,0.163285,-0.025795,0.060041,0.053846,-0.022448,-0.021474,16.24,390.48,181.85,27.4,20110731
260,2011.625,-0.014469,-0.084667,-0.050536,-0.023358,-0.05747,-0.056791,14.865,384.83,171.91,26.6,20110831
261,2011.708,-0.009121,-0.048772,0.017218,-0.064286,-0.084872,-0.071762,14.14,381.32,174.87,24.89,20110929


<br>
To inspect a random sample of 6 rows of data we use `sample_n(data, 6)`.

In [31]:
sample_n(data, 6)

Unnamed: 0,Date,Apple.Return,Dell.Return,IBM.Return,Microsoft.Return,Value.weighted.Market.Return,SP.500.Return,Price..Dell,Price..Apple,Price..IBM,Price..Microsoft,Calendar.Date
228,2008.958,-0.07899,-0.083259,0.031373,-0.038576,0.022149,0.007822,10.24,85.35,84.16,19.44,20081229
47,1993.875,0.028293,0.3395062,0.1766304,-0.0015601,-0.017606,-0.012911,27.125,31.5,53.875,80.0,19931130
69,1995.708,-0.133721,0.1038961,-0.0858525,-0.0216216,0.036362,0.040098,85.0,37.25,94.5,90.5,19950929
227,2008.875,-0.138675,-0.084426,-0.116919,-0.08867,-0.084613,-0.074849,11.17,92.67,81.6,20.22,20081130
235,2009.542,0.14716,-0.025492,0.129381,-0.010517,0.081715,0.074142,13.38,163.39,117.93,23.52,20090731
115,1999.542,0.202429,0.1047297,-0.0275629,-0.04851,-0.030635,-0.032046,40.875,55.6875,125.6875,85.8125,19990730
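
The sketch below exercises `head` and `tail` on a hypothetical 10-row table `toy`.  For the random sample it swaps in base R's `sample` function in place of `sample_n`, so that it runs without the `dplyr` package; `set.seed` is added only to make the random choice repeatable, and the seed value is arbitrary.

```r
# A small stand-in table with 10 rows (made-up values)
toy = data.frame(id = 1:10, value = (1:10)^2)

head(toy, 3)$id    # 1 2 3
tail(toy, 3)$id    # 8 9 10

# Base R alternative to sample_n: draw 4 distinct row positions at random
set.seed(42)                          # arbitrary seed, for repeatability
picked = toy[sample(nrow(toy), 4), ]
nrow(picked)                          # 4
```

Each run of `sample` without a fixed seed generally picks different rows, which is why the output of a random sample changes from run to run.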


<br>
___
### More About R

_ About strings:<br>
A string is a sequence of characters meant to be interpreted as such, and not to be interpreted as a name.  A string is described by enclosing a sequence of characters within <code>"&#8230;"</code>. _

_ About comments:<br>
A comment is text ignored by the R system, but perhaps useful to us.  A comment is described by the `#` symbol, followed by the comment text, through the end of the line. _

_ About functions:<br>
A function is described by a function name, followed by parentheses that enclose the function's parameter values, separated by commas.  The parameters are distinguished either by their position or by explicitly naming them.  Named parameters are described by a parameter name, followed by the `=` symbol, followed by the parameter value.  An example function looks like this:<br>
`melt(data, id=x)` _
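
As a small illustration of positional versus named parameters, the sketch below calls the built-in `seq` function three ways.  The names `by_position`, `by_name`, and `reordered` are made up for illustration.

```r
# Parameters given by position: from, to, by
by_position = seq(1, 10, 3)
by_position                        # 1 4 7 10

# The same parameters given by name
by_name = seq(from = 1, to = 10, by = 3)

# Named parameters can appear in any order
reordered = seq(by = 3, to = 10, from = 1)

identical(by_position, by_name)    # TRUE
identical(by_name, reordered)      # TRUE
```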

_ About TRUE/FALSE values:<br>
The R system recognizes the values `TRUE` and `FALSE`, and their abbreviations `T` and `F`.  Note, `T` and `F` are ordinary names that can be reassigned, so the full forms are safer. _

_ About function names; parameter names; and table, vector, and value names:<br>
A name can include upper case and lower case letters - the upper case version of a letter is considered different from the lower case version of that letter.  A name can include digits, but cannot start with a digit.  A name can include the `.` and _ `_` _ characters, but cannot start with the underscore character. _

_ About output:<br>
A table, vector, or value referenced on a line by itself will output the value assigned to it. _

$\tiny \text{Copyright (c) Berkeley Data Analytics Group, LLC}$