## 1. Data frame

* A data frame is R's built-in structure for tabular data
* It works like an ordinary table with rows and columns
* Each column has a name (its header), and all values in one column share the same type
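
To make the points above concrete, here is a minimal sketch that builds a small data frame by hand and inspects it. The column names and values are made up for illustration and are not taken from the course data.

```r
# Hypothetical toy data: three respondents and two questionnaire items.
toy <- data.frame(
  Q1 = c(4, 2, 5),
  Q2 = c(3, 3, 1),
  finnish = c("yes", "no", "no")
)

names(toy)   # column headers
nrow(toy)    # number of rows
ncol(toy)    # number of columns
str(toy)     # type of each column
toy$Q1       # a single column, returned as a vector
```

`str()` is usually the quickest way to check what R thinks each column contains.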

### Reading data

In [None]:
setwd(".")

In [None]:
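# Let's read the same data as in HW 1
# data.csv is just an example file name. It could be anything
# stringsAsFactors=TRUE reads text columns as factors, which levels() needs later
# (since R 4.0.0 the default is FALSE)

data<-read.csv("data.csv", stringsAsFactors=TRUE)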

In [None]:
head(data,20)
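
If `data.csv` is not at hand, the same steps can be tried on a throwaway file. The sketch below writes a small made-up table to a temporary file and reads it back; the file contents and the name `toy_read` are assumptions for illustration only.

```r
# Write a small, made-up table to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(Q1 = c(4, 2, 5), website = c("A", "B", "A"))
write.csv(toy, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE turns text columns into factors.
# Since R 4.0.0 the default is FALSE, so text columns would stay character.
toy_read <- read.csv(csv_path, stringsAsFactors = TRUE)

head(toy_read, 2)          # first two rows only
str(toy_read)              # Q1 is read as a number, website as a factor
levels(toy_read$website)   # the distinct website values
```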

### Subsetting data

#### Need some help? Google it first. Or, type ?name_of_function

In [None]:
?subset

In [None]:
numeric_columns<-c("Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10")
int_students=subset(data,finnish=="no",select=numeric_columns)

#### or

In [None]:
int_students=subset(data,finnish=="no",select=c(1:10))

#### or 

In [None]:
int_students=subset(data,finnish=="no",select=c(Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10))

#### yet another way

In [None]:
int_students=data[data$finnish=="no",1:10]

In [None]:
int_students
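
The variants above should all select the same rows and columns. A small sketch on made-up data (the `toy` data frame is hypothetical) can confirm that:

```r
toy <- data.frame(
  Q1 = c(4, 2, 5, 1),
  Q2 = c(3, 3, 1, 2),
  finnish = c("yes", "no", "no", "yes")
)

by_name     <- subset(toy, finnish == "no", select = c("Q1", "Q2"))
by_position <- subset(toy, finnish == "no", select = 1:2)
by_bracket  <- toy[toy$finnish == "no", 1:2]

by_name
all.equal(by_name, by_position)
all.equal(by_name, by_bracket)
```

One difference worth knowing: when the condition contains missing values, `subset()` treats them as false and drops those rows, while bracket indexing returns rows filled with `NA`.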

### Some statistics

In [None]:
correlation_between_Q1_Q2=cor(int_students$Q1,int_students$Q2)
correlation_between_Q1_Q2

In [None]:
mean_Q6=mean(int_students$Q6)
mean_Q6
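
To get a feel for what `cor()` and `mean()` return, it can help to run them on vectors where the answer is easy to predict. The vectors below are made up for illustration.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)   # y is exactly 2 * x
z <- c(8, 6, 4, 2)   # z decreases as x increases

cor(x, y)   # perfect positive linear relationship
cor(x, z)   # perfect negative linear relationship
mean(x)     # (1 + 2 + 3 + 4) / 4
```

By default `cor()` computes the Pearson correlation, which ranges from -1 to 1.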

## 2. Functions

Functions are the building blocks of R, just like in any other language 

A function is written as follows: 

      myfunction <- function(arg1, arg2, ... ){
            statements
            return(object)
      } 

A function in R is called as:

**myfunction(args)**
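
As a small illustration of this template, the sketch below defines and calls a helper. The name `rescale01` is hypothetical and not part of the course material.

```r
# Hypothetical helper: rescale a numeric vector to the range 0..1.
rescale01 <- function(x) {
  lo <- min(x)
  hi <- max(x)
  return((x - lo) / (hi - lo))
}

rescale01(c(2, 4, 6))
```

The variables `lo` and `hi` exist only inside the function; they are not visible after the call returns.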

Let's implement some functions
* Standard Deviation 
* Frequency Table
* Function that uses factors
* Function to merge two data frames

### Standard deviation  

In [None]:
standard_deviation <- function(sample){
    xbar<-mean(sample)
    sumXminusXbar_sqrd=0
    for(x in  sample){
        diff_from_mean=x-xbar
        sumXminusXbar_sqrd=sumXminusXbar_sqrd+(diff_from_mean*diff_from_mean)
    }
    denominator=length(sample)-1
    return(sqrt(sumXminusXbar_sqrd/denominator))
}

#### Let's test the function using all the data

In [None]:
# Standard deviation of Q1 responses
# (stored as sd_Q1 so the variable does not shadow R's built-in sd())
sd_Q1<-standard_deviation(data$Q1)
sd_Q1

#### Let's validate our function with R's built-in function for standard deviation: sd()

In [None]:
sd(data$Q1)
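
The loop version spells out every step of the sample standard deviation: sum the squared differences from the mean, divide by n - 1, and take the square root. Because arithmetic in R works on whole vectors, the same calculation can also be written without a loop. The sketch below is one possible alternative; the name `standard_deviation_vec` and the test vector are made up for illustration.

```r
standard_deviation_vec <- function(sample) {
  # (sample - mean(sample)) subtracts the mean from every element at once
  squared_diffs <- (sample - mean(sample))^2
  sqrt(sum(squared_diffs) / (length(sample) - 1))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
standard_deviation_vec(x)
all.equal(standard_deviation_vec(x), sd(x))
```

`all.equal()` is used instead of `==` because floating-point results can differ in the last digits.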

### Frequency table

In [None]:
frequency_table <- function(dataframe){
    res <- NULL
    
    for(columnName in names(dataframe)){
        
        sample<-dataframe[,c(columnName)]
        xbar<-mean(sample)
        
        # using our own standard_deviation() defined above
        std_dev<-standard_deviation(sample)
        count<-length(sample)
        
        res <- rbind(res,c(columnName,count,xbar,std_dev))
        
    }
    colnames(res) <- c("response_for","count","mean","standard_deviation")
    res<-data.frame(res)
    return(res)

}

What do the **names**, **rbind**, and **colnames** functions do?
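
One way to find out is to try the three functions on a tiny example. The sketch below uses made-up values and mirrors the way `frequency_table` builds its result row by row.

```r
toy <- data.frame(Q1 = c(4, 2), Q2 = c(3, 5))
names(toy)                            # "Q1" "Q2"

res <- NULL
res <- rbind(res, c("Q1", 2, 3.0))    # first row
res <- rbind(res, c("Q2", 2, 4.0))    # second row appended below the first
colnames(res) <- c("response_for", "count", "mean")

res                      # a matrix with named columns
class(res[, "mean"])     # "character"
```

Note that `c()` with a mix of text and numbers converts everything to text, so the numeric columns end up as character strings. They would need `as.numeric()` before any further arithmetic.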

#### Now let's test our function

In [None]:
# subsetting column 1 to 10 of the data 
#   - you may have used subset() before, this is just another way

all_questions=data[,1:10]

In [None]:
freq_table<-frequency_table(all_questions)
freq_table

### Function that uses factors

Let's look at the data again

In [None]:
data

In [None]:
# levels() only works on factors; factor() converts a text column first
levels(factor(data$website))
factor(data$website)
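
A factor stores a categorical variable together with its set of possible values, the levels. The sketch below uses made-up website labels to show what `levels()` and `factor()` return and how the levels can drive a loop.

```r
# Hypothetical values; the real data may contain different websites.
website <- c("A", "B", "A", "C", "B")

levels(website)              # NULL: a plain character vector has no levels
website_f <- factor(website)
levels(website_f)            # the distinct values, sorted
table(website_f)             # number of rows per level

# Looping over the levels visits each category once.
for (w in levels(website_f)) {
  cat(w, ":", sum(website_f == w), "\n")
}
```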

In [None]:
library("hcitools")
news_score<-function(d){
    res <- NULL
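    # factor() makes sure levels() works even if the column was read as text
    for (news_firm in levels(factor(d$website))){ 
        for (gender in levels(factor(d$gender))){
            
            # rows for this website/gender pair, questionnaire columns only
            group_rows<-d[d$website==news_firm&d$gender==gender,1:10]
            result=questionnaire.analyse(group_rows, name="SUS")
            res <- rbind(res,c(news_firm,gender,result))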
        }
    }
    colnames(res) <- c("Website","Gender","Sus_score")
    res<-data.frame(res)
    return(res)
}

In [None]:
news_score(data)

In [None]:
?rnorm



### Function to merge two data frames

#### Let's write a function to generate a data frame with 10 rows
* The x column is the sequence from 1 to 10
* The y column holds 10 random draws from a normal distribution with mean=0 and standard deviation=1


In [None]:
generate_data<-function(){
    x<-seq(from=1,to=10)
    # 10 random draws from a normal distribution
    y<-rnorm(10,mean=0,sd=1)
    # data.frame() names the columns directly, so no cbind()/colnames() step is needed
    return(data.frame(x=x,y=y))
}

In [None]:
gen_data1<-generate_data()
gen_data1

gen_data2<-generate_data()
gen_data2
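
Because `rnorm()` draws random numbers, the y values differ on every run. When reproducible output is wanted, for example to compare results with a classmate, fixing the seed first is a common approach. A minimal sketch (the seed value 42 is arbitrary):

```r
set.seed(42)
first <- rnorm(10, mean = 0, sd = 1)

set.seed(42)
second <- rnorm(10, mean = 0, sd = 1)

identical(first, second)   # same seed, same draws
```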

#### rbind ("row bind") stacks the rows of the second data frame under the first, giving one data frame with 20 rows. Both data frames must have the same column names.

In [None]:
merged_data<-rbind(gen_data1,gen_data2)

In [None]:
merged_data
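
Stacking rows is only one meaning of "merging". R also has a `merge()` function, which joins two data frames on a shared key column instead. The sketch below contrasts the two on small made-up data frames; `a`, `b`, and the suffixes are chosen for illustration.

```r
a <- data.frame(x = 1:3, y = c(0.5, -1.2, 0.3))
b <- data.frame(x = 2:4, y = c(1.1, 0.0, -0.7))

# rbind: all rows of a followed by all rows of b
stacked <- rbind(a, b)

# merge: one row per x value that appears in both a and b
joined <- merge(a, b, by = "x", suffixes = c("_a", "_b"))

nrow(stacked)
joined
```

Here `stacked` keeps the two columns and gains rows, while `joined` keeps only the matching x values and gains a column for each source.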