# R Basic Tutorial

## 1.0 Assignment
Use `<-`, not `=`, for assignment.
The symbol `=` is better reserved for naming arguments in function calls. In addition, `=` cannot be used for assignment in some syntactic contexts. More information [here](https://stackoverflow.com/questions/1741820/what-are-the-differences-between-and-assignment-operators-in-r).
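The difference between the two operators shows up inside a function call. The sketch below assumes a fresh session and uses a hypothetical helper, `double_it`, that is not part of the tutorial: `=` only names the argument, while `<-` also creates a variable in the calling environment.

```r
# Hypothetical helper used only for this illustration
double_it <- function(val) val * 2

# `=` inside a call names the argument; no variable `val` is created
res_named <- double_it(val = 3)
created_by_eq <- exists("val", inherits = FALSE)

# `<-` inside a call assigns `val` in the calling environment,
# then passes the assigned value to the function
res_arrow <- double_it(val <- 3)
created_by_arrow <- exists("val", inherits = FALSE)

print(c(created_by_eq, created_by_arrow))
```

Both calls return the same result; only the second one leaves `val` behind in the workspace.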

In [None]:
a = 1
b <- 1

In [None]:
b

In [None]:
c <- b <- 11

In [None]:
c

In [None]:
c <- b <- 11
print(c)

## 2.0 Objects

### Classes

R is a functional programming language: programs are built by applying and composing functions. However, it also supports object-oriented programming, which is used to build tools for data analysis. A class is a blueprint for creating objects; it defines the attributes an object holds and the methods that act on them. R has two main class systems: S3 and S4. [info](https://www.datacamp.com/community/tutorials/r-objects-and-classes)
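As a minimal sketch of the S3 system, the example below builds an object by attaching a class attribute to a list and then dispatches a generic on it. The names `new_student` and `describe` are hypothetical and chosen only for illustration.

```r
# Constructor: an S3 object is an ordinary list with a class attribute
new_student <- function(name, gpa) {
  structure(list(name = name, gpa = gpa), class = "student")
}

# Generic function: dispatches on the class of its first argument
describe <- function(obj, ...) UseMethod("describe")

# Method for objects of class "student"
describe.student <- function(obj, ...) {
  paste0(obj$name, " (GPA ", obj$gpa, ")")
}

# Fallback method for any other class
describe.default <- function(obj, ...) "unknown object"

s <- new_student("Amy", 3.26)
print(class(s))
print(describe(s))
print(describe(1))
```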

### Objects
An object is an instance of a class, so it has methods that can act on its attributes. In R, everything is an object. We will cover four main objects: vectors, matrices, lists, and data frames. Each of them belongs to a class.

#### Data Types
We will use four basic data types: `character`, `numeric`, `integer` and `logical` (boolean).

In [None]:
a <- 'Hola'
print(class( a ))   # type of object
print(typeof( a )) # how object is stored in memory

In [None]:
b <- 20.5
print(class( b ))
b

In [None]:
c <- as.integer(20.5)
c
print(class( c ))


In [None]:
# typeof shows how the object is stored in memory.
typeof(b)
# class shows the type of the object
class(b)

In [None]:
c1 <- "Machine Learning"
c2 <- "Causal inference"
cat(c1,"  ",c2)

In [None]:
install.packages("glue")   # glue is the package loaded below
library(glue)
glue('{c1} and {c2} course')

In [None]:
# Boolean variables

log_true <- TRUE
print(class( log_true ))

z1 <- (1==1)

z2 <- (10 > 20)

z3 <- (1==1)

cat(z1,'\n',z2,'\n',z3)

class(as.integer(z3))

typeof(as.integer(z3))

## 3.0 List

A list in R can contain many different data types inside it. A list is a collection of data which is ordered and changeable.

#### Lists
This data structure does not require that all members be of the same data type.
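A short sketch of what "ordered and changeable" means in practice: list elements keep their position, and they can be modified, added, or removed after creation. The list `item` and its fields are made up for this illustration.

```r
# A named list mixing character, numeric, and logical values
item <- list(name = "apples", qty = 3, organic = TRUE)

item$qty <- 5            # modify an existing element
item$store <- "market"   # add a new element at the end
item$organic <- NULL     # assigning NULL removes an element

print(names(item))
print(length(item))
```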

In [8]:
# install the package that provides list functions
install.packages("rlist")
library(rlist)


In [9]:
list_1 <- list( "good", "bad", "ugly","good", "bad", 1)
list_1

In [10]:
list_1[1:5]

In [11]:
list_1 <- append(list_1, "bad")
list_1

In [12]:
# a factor stores categorical variables

fac_2 <- factor( c( "good", "bad", "ugly","good", "bad", "ugly", 5 ) )
fac_2

In [13]:
fac <- factor( c( "good", "bad", "ugly","good", "bad", "ugly" ) )
print( fac )
# Type of class
typeof( fac )
class( fac )

[1] good bad  ugly good bad  ugly
Levels: bad good ugly


In [14]:
# Levels or categories
levels( fac )
# Number of Levels
nlevels( fac )

In [None]:
# Check the variables that you defined
ls()

Basic data structures in R include the vector, list, matrix, data frame, and factors. Some of these structures require that all members be of the same data type (e.g. vectors, matrices) while others permit multiple data types (e.g. lists, data frames).
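When indexing a list, the kind of bracket matters. As a small sketch with a made-up list `lst`: single brackets return a sub-list, while double brackets and `$` return the element itself.

```r
lst <- list(a = 1:3, b = "x")

sub_list <- lst["a"]     # single brackets: still a list, of length 1
element  <- lst[["a"]]   # double brackets: the integer vector itself
by_name  <- lst$a        # `$` is shorthand for [["a"]]

print(class(sub_list))
print(class(element))
print(identical(element, by_name))
```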

In [None]:
mylist <- list( num_UEFA = c( 13 , 7 , 6 ) , clubs = c( "Real Madrid" , "AC Milan" , "Liverpool FC" ) , 
               last_year = c( 2018 , 2007 , 2019 ) )

mylist

In [None]:
mylist$num_UEFA

In [None]:
# Indexing vectors
mylist[[3]]

In [None]:
# Indexing group of vectors
mylist[[3]][2:3]

In [None]:
# Single brackets return a sub-list, not the vector itself
mylist[3][1]

In [None]:
list.remove(mylist,"last_year")

In [None]:
Postal_code <- append(mylist, 4000)
Postal_code

#### Vectors
This data structure requires that all members be of the same data type.
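To see the single-type rule in action, the sketch below combines values of different types with `c()`: R coerces them all to one common type, whereas a list keeps each element's type. The variable names are illustrative only.

```r
# c() coerces mixed inputs to one common type (here: character)
mixed_vec <- c(1, "a", TRUE)
print(class(mixed_vec))
print(mixed_vec)

# A list keeps the original type of each element
mixed_list <- list(1, "a", TRUE)
list_classes <- sapply(mixed_list, class)
print(list_classes)
```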

In [None]:
vec_str <-  c( "good", "bad", "ugly","good", "bad")
print( class( vec_str ) )
print( is.vector( vec_str ) )
print( length( vec_str ) )

In [None]:
vec <- c( 2 , 3 , 4 , 5 , 6 , 4 )
vec 

In [None]:
print( class( vec ) )
print( is.vector( vec ) )
print( length( vec ) )

In [None]:
sec_1_20 <- seq(1,20,2)
sec_1_20

In [None]:
sec_1_9 <- seq(1,10)
sec_1_9

#### Indexing

In [None]:
index_vec  <- vec[ c( 3, 1, 6, 2 ) ]
class(index_vec)

In [None]:
print(index_vec)
print(index_vec[ -3 ])

In [None]:
print( index_vec[ index_vec < 3 ] )

In [None]:
print( index_vec[ 2:4 ] )

#### Matrix
This data structure requires that all members be of the same data type.

In [None]:
A = matrix( c(3, 5, 2, 6, 7, 1, 6, 3, 7 ) , nrow = 3, ncol = 3 , byrow = FALSE, 
            dimnames = list( c( "rowA" , "rowB" , "rowC" ), c( "colA" , "colB" , "colC" ) ) )
A

In [None]:
A = matrix( c(3, 5, 2, 6, 7, 1, 6, 3, 7 ) , nrow = 3, ncol = 3 , byrow = TRUE)
A

In [None]:
A = matrix( c(3, 5, 2, 6, 7, 1, 6, 3, 7 ) , nrow = 3, ncol = 3 , byrow = TRUE, 
            dimnames = list( c( "rowA" , "rowB" , "rowC" ), c( "colA" , "colB" , "colC" ) ) )
print( A )

In [None]:
class(A)
typeof(A)

In [None]:
dim(A)

In [None]:
cat("rows: ", dim(A)[1], '\n', "Columns: ", dim(A)[2])

In [None]:
B = A %*% solve( A )
B

In [None]:
# Solve for getting inverse matrix
# %*% Matrix Multiplication
# round output of the matrix
B = round( A %*% solve( A ) , 2 )
B

In [None]:
print( diag( B ) )  


In [None]:
print( t( A ) )

In [None]:
A <- matrix(c(seq(0, 9), seq( 10, 19), seq( 30, 39), seq( -20, -11), seq( 2, 20,2)), nrow = 5, byrow =TRUE)
print(A)

In [None]:
A[2:4,] # row selection

In [None]:
A[,1:6]  # column selection

In [None]:
M1 <- matrix(0,8,2)

print(M1)

In [None]:
M2 <- matrix(1,8, 4) 
print(M2)

In [None]:
M3 <- cbind(M1,M2)
M3

In [None]:
M4 = matrix(c(2,2,3,4,5,1,1,5,5,9,8,2), nrow =2, byrow = TRUE)
print(M4)

In [None]:
M5 <- rbind(M3,M4)
print(M5)

#### DataFrame
This data structure does not require that all members be of the same data type.
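In a data frame each column is a vector with its own type, so types can differ across columns but not within one. A minimal sketch with a made-up data frame `df_demo`:

```r
df_demo <- data.frame(
  name   = c("Amy", "Bob"),
  age    = c(27, 55),
  passed = c(TRUE, FALSE),
  stringsAsFactors = FALSE   # keep strings as character in every R version
)

# One class per column
col_classes <- sapply(df_demo, class)
print(col_classes)
```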

In [1]:
Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank", 
                  "George", "Helen")

Age <- c(27, 55, 34, 42, 20, 27, 34, 42)

Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")

GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)

nsc <- data.frame(Student_Name, Age, Gender, GPA)   # Naming the data frame
nsc # Generates the data frame

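| Student_Name | Age | Gender | GPA |
|---|---|---|---|
| `<chr>` | `<dbl>` | `<chr>` | `<dbl>` |
| Amy | 27 | F | 3.26 |
| Bob | 55 | M | 3.75 |
| Chuck | 34 | M | 2.98 |
| Daisy | 42 | F | 3.4 |
| Ellie | 20 | F | 2.75 |
| Frank | 27 | M | 3.32 |
| George | 34 | M | 3.68 |
| Helen | 42 | F | 3.97 |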


In [2]:
# Lists variables
names(nsc)   

In [3]:
select_col <- c(1,3)
select_row <- c(1,5)

In [4]:
nsc[select_row, select_col]

| | Student_Name | Gender |
|---|---|---|
| | `<chr>` | `<chr>` |
| 1 | Amy | F |
| 5 | Ellie | F |


In [5]:
# indexing dataframes
nsc[3:5 , 2:4]   

| | Age | Gender | GPA |
|---|---|---|---|
| | `<dbl>` | `<chr>` | `<dbl>` |
| 3 | 34 | M | 2.98 |
| 4 | 42 | F | 3.4 |
| 5 | 20 | F | 2.75 |


In [6]:
# indexing dataframes
nsc[ c( 1 , 2 , 4, 6 ) , c( 3 , 2 ) ]

| | Gender | Age |
|---|---|---|
| | `<chr>` | `<dbl>` |
| 1 | F | 27 |
| 2 | M | 55 |
| 4 | F | 42 |
| 6 | M | 27 |


In [7]:
nsc$Student_Name

## Load files: CSV, Excel
## Working with DataFrames

In [None]:
#install.packages("readxl")
library(readxl)

In [None]:
base <- read_excel("../../data/base1.xlsx", sheet = "data")
head(base)

In [1]:
base <- read.csv(file = "../../../data/base0.csv")
head(base)

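| | X | wage | lwage | sex | shs | hsg | scl | clg | ad | mw | ... | we | ne | exp1 | exp2 | exp3 | exp4 | occ | occ2 | ind | ind2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ... | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` |
| 1 | 10 | 9.615385 | 2.263364 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 7 | 0.49 | 0.343 | 0.2401 | 3600 | 11 | 8370 | 18 |
| 2 | 12 | 48.076923 | 3.872802 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 31 | 9.61 | 29.791 | 92.3521 | 3050 | 10 | 5070 | 9 |
| 3 | 15 | 11.057692 | 2.403126 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 18 | 3.24 | 5.832 | 10.4976 | 6260 | 19 | 770 | 4 |
| 4 | 18 | 13.942308 | 2.634928 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 25 | 6.25 | 15.625 | 39.0625 | 420 | 1 | 6990 | 12 |
| 5 | 19 | 28.846154 | 3.361977 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 22 | 4.84 | 10.648 | 23.4256 | 2015 | 6 | 9470 | 22 |
| 6 | 30 | 11.730769 | 2.462215 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 1 | 0.01 | 0.001 | 0.0001 | 1650 | 5 | 7460 | 14 |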


In [2]:
sapply(base, typeof)

In [3]:
str(base)

'data.frame':	5150 obs. of  21 variables:
 $ X    : int  10 12 15 18 19 30 43 44 47 71 ...
 $ wage : num  9.62 48.08 11.06 13.94 28.85 ...
 $ lwage: num  2.26 3.87 2.4 2.63 3.36 ...
 $ sex  : int  1 0 0 1 1 1 1 0 1 1 ...
 $ shs  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ hsg  : int  0 0 1 0 0 0 1 1 1 0 ...
 $ scl  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ clg  : int  1 1 0 0 1 1 0 0 0 1 ...
 $ ad   : int  0 0 0 1 0 0 0 0 0 0 ...
 $ mw   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ so   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ we   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ne   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ exp1 : num  7 31 18 25 22 1 42 37 31 4 ...
 $ exp2 : num  0.49 9.61 3.24 6.25 4.84 ...
 $ exp3 : num  0.343 29.791 5.832 15.625 10.648 ...
 $ exp4 : num  0.24 92.35 10.5 39.06 23.43 ...
 $ occ  : num  3600 3050 6260 420 2015 ...
 $ occ2 : int  11 10 19 1 6 5 17 17 13 10 ...
 $ ind  : num  8370 5070 770 6990 9470 7460 7280 5680 8590 8190 ...
 $ ind2 : int  18 9 4 12 22 14 14 9 19 18 ...


In [4]:
#install.packages("dplyr")
library(dplyr)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




In [5]:
base1 <- base %>% rename(salario = wage, id = X)

head(base1)

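| | id | salario | lwage | sex | shs | hsg | scl | clg | ad | mw | ... | we | ne | exp1 | exp2 | exp3 | exp4 | occ | occ2 | ind | ind2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ... | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` |
| 1 | 10 | 9.615385 | 2.263364 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 7 | 0.49 | 0.343 | 0.2401 | 3600 | 11 | 8370 | 18 |
| 2 | 12 | 48.076923 | 3.872802 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 31 | 9.61 | 29.791 | 92.3521 | 3050 | 10 | 5070 | 9 |
| 3 | 15 | 11.057692 | 2.403126 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 18 | 3.24 | 5.832 | 10.4976 | 6260 | 19 | 770 | 4 |
| 4 | 18 | 13.942308 | 2.634928 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 25 | 6.25 | 15.625 | 39.0625 | 420 | 1 | 6990 | 12 |
| 5 | 19 | 28.846154 | 3.361977 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 22 | 4.84 | 10.648 | 23.4256 | 2015 | 6 | 9470 | 22 |
| 6 | 30 | 11.730769 | 2.462215 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 1 | 0.01 | 0.001 | 0.0001 | 1650 | 5 | 7460 | 14 |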


In [6]:
base1 <- base1 %>% select(-c(id,ind2))

In [7]:
head(base1)

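| | salario | lwage | sex | shs | hsg | scl | clg | ad | mw | so | we | ne | exp1 | exp2 | exp3 | exp4 | occ | occ2 | ind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` |
| 1 | 9.615385 | 2.263364 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 7 | 0.49 | 0.343 | 0.2401 | 3600 | 11 | 8370 |
| 2 | 48.076923 | 3.872802 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 31 | 9.61 | 29.791 | 92.3521 | 3050 | 10 | 5070 |
| 3 | 11.057692 | 2.403126 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 18 | 3.24 | 5.832 | 10.4976 | 6260 | 19 | 770 |
| 4 | 13.942308 | 2.634928 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 25 | 6.25 | 15.625 | 39.0625 | 420 | 1 | 6990 |
| 5 | 28.846154 | 3.361977 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 22 | 4.84 | 10.648 | 23.4256 | 2015 | 6 | 9470 |
| 6 | 11.730769 | 2.462215 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0.01 | 0.001 | 0.0001 | 1650 | 5 | 7460 |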


In [8]:
head(base1 %>%   slice(-c(1,3)) )

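| | salario | lwage | sex | shs | hsg | scl | clg | ad | mw | so | we | ne | exp1 | exp2 | exp3 | exp4 | occ | occ2 | ind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` |
| 1 | 48.07692 | 3.872802 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 31 | 9.61 | 29.791 | 92.3521 | 3050 | 10 | 5070 |
| 2 | 13.94231 | 2.634928 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 25 | 6.25 | 15.625 | 39.0625 | 420 | 1 | 6990 |
| 3 | 28.84615 | 3.361977 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 22 | 4.84 | 10.648 | 23.4256 | 2015 | 6 | 9470 |
| 4 | 11.73077 | 2.462215 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0.01 | 0.001 | 0.0001 | 1650 | 5 | 7460 |
| 5 | 19.23077 | 2.956512 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 42 | 17.64 | 74.088 | 311.1696 | 5120 | 17 | 7280 |
| 6 | 19.23077 | 2.956512 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 37 | 13.69 | 50.653 | 187.4161 | 5240 | 17 | 5680 |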


## Slicing a DataFrame

In [None]:
base1[1:10,]

In [None]:
base1[,1:5]

In [None]:
base1[1:10,] %>% select(salario, lwage)


In [None]:
select(base1[1:10,], salario,lwage)

In [None]:
base1 %>% filter(sex == 1)

In [None]:
head(base1 %>% filter(exp1 > 10))

In [None]:
head(base1 %>% filter(sex == 1 & exp1 > 10))

head(base1  %>% filter(sex == 1 | exp1 > 10))

In [None]:
head(filter(base1, sex == 1 & exp1 > 10))
head(filter(base1, sex == 1 | exp1 > 10))

In [9]:
head(arrange(base1, exp1, salario))

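| | salario | lwage | sex | shs | hsg | scl | clg | ad | mw | so | we | ne | exp1 | exp2 | exp3 | exp4 | occ | occ2 | ind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` |
| 1 | 4.25 | 1.446919 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1860 | 5 | 7870 |
| 2 | 5.128205 | 1.634756 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9620 | 22 | 4970 |
| 3 | 9.135769 | 2.212197 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2310 | 8 | 7860 |
| 4 | 9.615385 | 2.263364 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2430 | 8 | 7860 |
| 5 | 9.615385 | 2.263364 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3160 | 10 | 8090 |
| 6 | 10.025 | 2.305082 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2540 | 8 | 7860 |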


In [None]:
base1$dummy <- NA   # a real missing value, not the string "NA"
base1

In [None]:
base1$dummy[base1$exp1 > 10] <- 1
base1

## If condition
The body of the if condition is executed if the `test_expression` is `TRUE`. The output of the test expression should be a `boolean` variable.

<img src="if-statement.jpg" alt="image info" />


The structure of the code is the following: <br><br>

<font size="4">
if <font color='green'>(test expression)</font>{<br>
&nbsp;&nbsp;&nbsp;&nbsp;Code to execute<br>
}</font>

The **if** statement tests whether a logical expression is true. The result of the test expression should be a **<font color='red'>boolean</font>**. In other words, the output of the test expression must be **<font color='red'>TRUE</font>** or **<font color='red'>FALSE</font>**. In short, any function whose output is **boolean** can be used as a test expression in an **if** statement.
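As a small sketch of this idea, the example below uses the base function `is.numeric()` and a comparison as test expressions; both return a single `TRUE` or `FALSE`. The variable names are illustrative.

```r
value <- 42
checked <- "not checked"

# is.numeric() returns TRUE or FALSE, so it can serve as the test
if (is.numeric(value)) {
  checked <- "value is numeric"
}

# A comparison also returns a logical value that can be stored and reused
is_big <- value > 100
if (is_big) {
  checked <- "value is big"
}

print(checked)
print(class(is_big))
```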

In [None]:
x <- -4

if(x > 0){
print("Positive number")
}

### Testing more than one expression

We will use **else if**. This clause allows us to add more test expressions.  <br><br>

<font size="3">
if <font color='green'>(Test expression 1)</font>{ <br>
&nbsp;&nbsp;&nbsp;&nbsp;Code1<br><br>
} else if <font color='green'>(Test expression 2)</font>{<br>
&nbsp;&nbsp;&nbsp;&nbsp;Code2<br><br>
} else if <font color='green'>(Test expression 3)</font>{<br>
&nbsp;&nbsp;&nbsp;&nbsp;Code3<br><br>
} else if <font color='green'>(Test expression 4)</font>{<br>
&nbsp;&nbsp;&nbsp;&nbsp;Code4<br><br>
} else if <font color='green'>(Test expression 5)</font>{<br>
&nbsp;&nbsp;&nbsp;&nbsp;Code5<br><br>
}else{<br>&nbsp;&nbsp;&nbsp;&nbsp;Code6<br><br>
}</font>

R evaluates the test expressions in order. If **Test expression 1** is `TRUE`, R executes **Code1** and the remaining test expressions are not evaluated. <br>

If no test expression is `TRUE`, **Code6** is executed.
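The ordering rule can be sketched with a hypothetical function, `grade_label`, whose conditions overlap: a score of 95 satisfies the first two tests, but only the first matching branch runs.

```r
# Hypothetical example: the first TRUE condition wins
grade_label <- function(score) {
  if (score >= 90) {
    label <- "A"
  } else if (score >= 80) {
    label <- "B"
  } else if (score >= 70) {
    label <- "C"
  } else {
    label <- "below C"   # runs only when no test is TRUE
  }
  label
}

print(grade_label(95))
print(grade_label(85))
print(grade_label(10))
```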

In [None]:
if (3 > 9) {
    print('xsd')
}

In [None]:
if ((z <- 12) >= 9){
    print('xsd')
}

In [None]:
x <- 5

if (x < 0) {
print("Negative number")
} else if (x > 0) {
print("Positive number")
} else {
print("Zero")}

## Loops
A for loop is used for iterating over a sequence. It has the following structure:

<img src="for_loop.jpg" alt="image info" />
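A minimal sketch of that structure, using made-up data: the result vector is created first, and the loop then fills it one position at a time. `seq_along()` generates the index sequence and is safe even when the input is empty.

```r
prices <- c(10, 20, 30)

# Pre-allocate the output vector with the same length as the input
doubled <- numeric(length(prices))

# Iterate over the positions 1, 2, 3
for (i in seq_along(prices)) {
  doubled[i] <- prices[i] * 2
}

print(doubled)
```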



In [None]:
age = c( 20 , 27 , 31 , 25 , 28 )
years = c( 2023 , 2021 , 2022 , 2026 , 2027, 2028 )
length( age )

In [None]:
age_finish <- numeric( length = length( age ) )
age_finish

In [None]:
for ( i in c(1:length(age)) ) {
  print(i + 1)
}

In [None]:
for ( i in c( 1:length( age ) ) ) {
  age_finish[ i ] = age[ i ] + years[ i ] - 2021
}
print( age_finish )

## Functions in R

- `Arguments` − Elements that the function will use to make operations. Arguments are optional and can have default values.

- `Function Body` − This defines what your function does.
- `return` − Specifies the variable that will be the output of the function.

Return Value − The return value of a function is the last expression in the function body to be evaluated.
<br><br>
<font size="4">
function <font color='green'>(arg_1, arg_2, ...) </font>{<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;Function body <br>
    <br> &nbsp;&nbsp;&nbsp;&nbsp;return <br>
}</font>
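The pieces listed above can be sketched with a hypothetical function, `scale_by`, that has one required argument and one argument with a default value. It has no explicit `return`, so the last evaluated expression is the return value.

```r
# `factor` is optional because it has a default value
scale_by <- function(x, factor = 2) {
  scaled <- x * factor
  scaled   # last expression evaluated: this is the return value
}

print(scale_by(3))               # uses the default factor
print(scale_by(3, factor = 10))  # overrides the default
print(scale_by(c(1, 2, 3)))      # works element-wise on vectors
```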


In [None]:
demean<- function(x){ 
    new_var = x - mean(x)
    new_var_2 = new_var^4
    return( new_var_2 )
}

In [None]:
vector_2 = c(2 , 3, 4)

In [None]:
demean( vector_2 )

## Linear Regressions

## Packages

In [None]:
install.packages( "glmnet" )

In [None]:
library(glmnet)

In [None]:

# Reading data and converting to dataframe
Penn <- as.data.frame(read.table("../../data/penn_jae.dat", header = TRUE ))
dim(Penn)

In [None]:
Penn

In [None]:
#Number of rows
n <- dim(Penn)[1]

# Number of columns
p_1 <- dim(Penn)[2]

In [None]:
# Filtering data to tg==4 | tg==0
Penn<- subset(Penn, tg==4 | tg==0)

# attach() makes the columns of the data frame available as vectors under their own names
attach(Penn)

In [None]:
T4 <- (tg==4)
typeof(T4)

In [None]:
summary(T4)

### Formula
For more information [link1](https://thomasleeper.com/Rcourse/Tutorials/formulae.html), [link2](https://www.datacamp.com/community/tutorials/r-formula-tutorial).
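As a quick sketch of how R expands formula operators, the example below inspects the term labels of a few formulas. The variable names (`mpg`, `wt`, `hp`) are placeholders borrowed from the built-in `mtcars` data set and are not part of this tutorial's data. Note that `^` is a formula operator, so a squared regressor has to be wrapped in `I()`.

```r
# Helper (hypothetical name): list the terms R extracts from a formula
term_labels <- function(f) attr(terms(f), "term.labels")

labels_plus  <- term_labels(mpg ~ wt + hp)         # main effects only
labels_star  <- term_labels(mpg ~ wt * hp)         # main effects and interaction
labels_colon <- term_labels(mpg ~ wt:hp)           # interaction only
labels_power <- term_labels(mpg ~ wt + wt^2)       # ^ is NOT arithmetic here
labels_I     <- term_labels(mpg ~ wt + I(wt^2))    # I() protects the arithmetic

print(labels_star)
print(labels_power)
print(labels_I)
```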

In [None]:
install.packages( "lmtest" )
install.packages( "sandwich" )
library(lmtest)
library(sandwich)
# If the installation fails, consider updating your R version

In [None]:
# Formula regression
# Generation of formulas for regressions
# The class formula

formula1 <- T4 ~ female

formula2 <-  T4 ~ female + black

# We include the two terms and their interaction, written out explicitly

formula3 <-  T4 ~ female + black + female:black

# Shorthand: female*black expands to the two terms and their interaction

formula4 <-  T4 ~ female*black

# Drop the intercept (-1 and 0 are equivalent)

formula5 <-  T4 ~ -1 + female + black + female:black

formula5 <-  T4 ~ 0 + female + black + female:black

# Polynomial independent variables: wrap the power in I(), otherwise ^ is read as a formula operator

formula6 <- T4 ~ -1 + female + black + female:black + lusd + I(lusd^2)

# interaction effects & categorical variables 

formula7 <-  T4~(female+black+othrace+factor(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd)^2

In [None]:
all.vars(formula1)

In [None]:
# We can use multiple independent variables by simply separating them with the plus symbol (+)

formula2 <- (T4 ~ female + black )
# See the variables in the formula
all.vars( formula2 )

In [None]:
# The minus sign removes a term from the formula
formula3 <- (T4 ~ female - black )
print( terms( formula3 ) )

In [None]:
# Interactions terms
# We include the two terms and their interaction
formula5  <- (T4 ~ female * black)
print( terms( formula5 ) )


In [None]:

# We include only the interaction
formula6  <- (T4 ~ female:black)
print( terms( formula6 ) )

In [None]:
# Factor in regression Analysis
# dep is a categorical variable, but by default it is stored as a plain vector

class(dep)

# factor( dep ) is equivalent to as.factor( dep )

# we need to wrap dep in factor() in the regression formula so that it is not treated as a numeric vector
## by default, the factor's first level is treated as the baseline

formula6  <- (T4 ~  female * black + factor( dep ) )
print( terms( formula6 ) )

formula7  <- (T4 ~  female * black + dep )
print( terms( formula7 ) )

In [None]:
# regression

reg <- lm(formula1, Penn)
#summary(reg )

summary( reg )$coefficients
summary( reg )$r.squared

In [None]:
names(reg)
typeof(reg$residuals)

In [None]:
summary(reg)$coefficients

In [None]:
reg <- lm(formula3, Penn)

summary( reg )$coefficients


In [None]:
reg <- lm(formula7, Penn)

summary( reg )$coefficients


In [None]:
# We can update formulas
# I() is needed so that lusd^2 is read as an arithmetic square
formula_modelA  <- (T4 ~  female + black + lusd + I( lusd ^ 2 ) )
print( terms( formula_modelA ) )
class( formula_modelA )

# formula for model B
formula_modelB  <- update( formula_modelA, ~ . + factor(dep))
print( terms( formula_modelB ) )

## Regressions Objects
This section examines the output of a regression.

In [None]:
key_columns  <-  c('female', 'black', 'lusd' , 'dep', 'T4')

In [None]:
data2  <-  data.frame( Penn[(names( Penn ) %in% key_columns ) ] )

In [None]:
data2

In [None]:
# Regression
reg1 <- lm(formula_modelB , data2 )

# The output is a list of elements
typeof(reg1)
is.list( reg1 )
# The elements of the fitted model are detailed in the table below.
names( reg1 )
# The summary is a separate list with its own elements
names(summary( reg1 ))

In [None]:
summary(reg1)$aliased

In [None]:
cat("Parameters: ")
summary(reg1)$coefficients

cat("R2:", summary(reg1)$r.squared,'\n')

print("Predicted values:")

predict(reg1)

In [None]:
# Summary of regression is a list
is.list(summary( reg1 ))
is.data.frame(summary( reg1 ))
is.matrix(summary( reg1 )$coefficients)


| Elements 	| Definition 	|
|---	|---	|
| coefficients 	| Vector of coefficients. 	|
| residuals 	| Vector of residuals. 	|
| effects 	| Vector of the uncorrelated single-degree-of-freedom <br>values obtained by projecting the data onto the successive <br>orthogonal subspaces generated by the QR decomposition <br>during the fitting process. 	|
| rank 	| Number of independent columns. 	|
| fitted.values 	| Vector of fitted values. 	|
| qr 	| The QR decomposition. 	|
| df.residual 	| The degrees of freedom of the residuals. 	|
| contrasts 	| A contrast is a linear combination of variables <br>that allows comparison of different treatments. 	|
| xlevels 	| Levels of the factor variables. 	|
| call 	| The matched function call, including the formula. 	|
| terms 	| The `terms` object describing the variables of the regression. 	|
| model 	| DataFrame of the variables in the model. 	|
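To make the table concrete, the sketch below fits a small regression on the built-in `mtcars` data set (an assumption for illustration; it is not this tutorial's data) and reads a few of the listed elements from the fitted object.

```r
# Illustrative fit on a built-in data set: 32 cars, one regressor
fit <- lm(mpg ~ wt, data = mtcars)

print(names(fit))          # the elements described in the table
print(fit$coefficients)    # named vector: intercept and slope
print(fit$rank)            # number of independent columns
print(fit$df.residual)     # observations minus estimated coefficients

# Fitted values plus residuals reproduce the dependent variable
reconstructed <- unname(fit$fitted.values + fit$residuals)
print(all.equal(reconstructed, mtcars$mpg))
```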

#### Predictions
We can get the vector of predictions using the function `predict`.
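As a sketch of what `predict` computes, the example below again uses the built-in `mtcars` data set (an assumption for illustration) and compares the predictions for new observations with the same values computed by hand from the coefficients. The new data frame must contain columns named like the regressors in the formula.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# New observations: column names must match the regressors
new_cars <- data.frame(wt = c(2.5, 3.5))

pred <- predict(fit, newdata = new_cars)

# Same calculation by hand: intercept + slope * wt
manual <- coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * new_cars$wt

print(pred)
print(all.equal(unname(pred), manual))
```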

In [None]:
new_obs  <- data.frame(matrix( c(0, 0, 1 , 1), ncol= 4, dimnames = list( c() , c( "female" , "black" , "dep", "lusd"  ) ) ))

In [None]:
new_obs

In [None]:
y_hat  <- predict(reg1,  new_obs )
y_hat