# R practical for Data Science


---

## Introduction:

This guide covers the basics and good practices that will allow you to analyze data using R. You will learn how to use Jupyter notebooks and libraries to explore and analyze your data in a straightforward, clear and transparent manner.

## R Language:

R is a programming language for statistical computing and graphics. At its core is an interpreted language that supports branching and looping as well as modular programming using functions. For efficiency, R can be combined with procedures written in C, C++, .Net, Python or FORTRAN.

## Features of R 

The most important features of R are:

* R is a simple yet powerful programming language that includes conditionals, loops, recursive functions and customizable input/output functions.
* R has efficient data handling and storage facilities.
* R offers a suite of operators for calculations on arrays, lists, vectors and matrices.
* R provides a comprehensive and coherent set of tools for data analysis.
* R provides graphical tools for data analysis and display directly on screen.

## Data Types

The most frequently used structures are: 

* Scalars 
* Vectors 
* Lists
* Matrices 
* Arrays
* Factors


### 1. Scalars :

A scalar can be of integer, real, logical or string type. Values are assigned to objects with the operators `<-` or `=`.
The `ls()` function lists the variables in the workspace, and the `rm()` function deletes one or more variables (objects).

**As examples:**


In [0]:
2+2 #is the sum of two integers
exp(10) #gives the exponential of 10
a = log(2) #assign log(2) to object a
b <- cos(10) #assign cos(10) to object b
a+b # give the sum of two objects a and b
a # display the value of a
b = 2 # assign the value "2" to object b
ls() # list the objects already created
rm(a) # delete the object "a" created above
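
The four scalar types mentioned above can be inspected with `class()`. The following is a minimal sketch; the variable names `n`, `r`, `flag` and `s` are illustrative and do not come from the guide.

```r
# Illustrative values; the variable names are hypothetical.
n <- 5L        # integer (the L suffix requests an integer)
r <- 3.14      # real number, stored as "numeric"
flag <- TRUE   # logical
s <- "hello"   # character string

class(n)     # "integer"
class(r)     # "numeric"
class(flag)  # "logical"
class(s)     # "character"
length(n)    # 1: in R a scalar is a vector of length one
```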

### 2. Vectors :

To create a vector with more than one item, use the `c()` function, which combines items into a vector.

**As examples:**

In [0]:
# Create a vector.
Color <- c('red','green','blue')

In [0]:
weight = c(13.5, 20.7, 30.5, 38.1, 41.5) # the weight vector
weight # displays the entered values
length(weight) # counts the number of measurements
order(weight) # returns the positions that would sort the values
sort(weight) # returns the values in increasing order
sum(weight) # returns the sum of the elements
min(weight) # minimum weight
max(weight) # maximum weight
prod(weight) # product of the elements
weightskg = weight * 0.001; weightskg # converts the measurements into kg
weight[3] # value at position 3
weight[c(1,3,5)] # the values in first, third and fifth position
average = sum(weight)/length(weight) # the average, same as mean(weight)
weight_median = median(weight) # the median
variance = var(weight) # the variance
seq1 = 1:12 # generates the sequence from 1 to 12
seq2 = seq(-5,3,by=1); seq2 # displays -5 -4 -3 -2 -1 0 1 2 3
seq3 = seq(1,2,by=0.1); seq3 # displays 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
seq4 = seq(10.00000,50.00000,by=4.4444444) # steps of 4.4444444 starting at 10
fix(weight) # opens an editor window to correct values (needs an interactive session)
weight # to check the correction
# the weeks corresponding to the weights:
time = c(1,2,2,3,3,4,5) # creates a time vector by hand
time # displays its content
time = seq(1,5,by=1); time # creates the time vector another way, using "seq"
rev(time) # reverses the order of the sequence
rep(time,3) # repeats the sequence 3 times
sum(time) # sum of the sequence
length(time) # sequence length
names(time) <- c("week1", "week2", "week3", "week4", "week5"); time # names the elements of the time vector
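
The difference between `order()` and `sort()` is easy to miss, so here is a small sketch. It reuses the weight values from the cell above, shuffled for illustration; the names `sorted_values` and `sort_index` are hypothetical.

```r
# Same weights as above, deliberately shuffled for illustration.
weight <- c(30.5, 13.5, 41.5, 20.7, 38.1)

sorted_values <- sort(weight)   # the values in increasing order
sort_index <- order(weight)     # the positions that would sort the vector

sorted_values                   # 13.5 20.7 30.5 38.1 41.5
sort_index                      # 2 4 1 5 3
weight[sort_index]              # same result as sort(weight)
mean(weight)                    # same as sum(weight) / length(weight)
```

Indexing with the result of `order()` is useful when a second vector has to be rearranged in the same way as the first.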

### 3. Lists : 

A list is an R object that can contain elements of different types, including vectors, functions and even other lists.

**As an example:**

In [0]:
# Create a list of items.
list1 <- list(c(2,5,3),21.3)
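
To illustrate how elements of different types are stored and retrieved, here is a short sketch. The list `list2` and its element names are assumptions made for this example only.

```r
# A list mixing a vector, a number, a function and a nested list (illustrative content).
list2 <- list(values = c(2, 5, 3), number = 21.3, f = sqrt, inner = list("a", TRUE))

list2[[1]]         # first element: the vector 2 5 3
list2$number       # access by name: 21.3
list2$f(16)        # the stored function is callable: 4
list2$inner[[2]]   # element of the nested list: TRUE
length(list2)      # 4 elements
str(list2)         # compact overview of the structure
```

Double brackets `[[ ]]` return the element itself, whereas single brackets `[ ]` return a sub-list.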

### 4. Matrices : 

A matrix is a 2-dimensional dataset. It can be created using a vector input to the matrix function.

**As examples:**

In [0]:
# Create a matrix with 2 rows and 3 columns, filled row by row.
M = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)

In [0]:
x <- 1:5 # creates a sequence named x
y <- x*2 # creates an object y holding twice each value of x
cbind(x,y) # binds the vectors as columns to form a 5 x 2 matrix
xy <- rbind(x,y) # binds the vectors as rows to form a 2 x 5 matrix
xy
matrix(1:20, nrow=5, byrow=TRUE) # creates a 5 x 4 matrix of 20 elements, filled row by row
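
Once a matrix exists, its elements are addressed by row and column. The sketch below uses a small matrix `m` chosen for illustration; it is not one of the matrices built above.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # rows are 1 2 3 and 4 5 6

dim(m)     # 2 3
m[2, 3]    # element in row 2, column 3: 6
m[1, ]     # first row: 1 2 3
m[, 2]     # second column: 2 5
t(m)       # transpose: a 3 x 2 matrix
```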

Now assume that the text file "student.txt", located in C:/, contains the following student records:

| sex | weight | size |
|-----|--------|------|
| f   | 55     | 166  |
| f   | 53     | 135  |
| m   | 56     | 169  |
| f   | 55     | 161  |
| m   | 56     | 187  |
| f   | 67     | 166  |
| f   | 67     | 169  |



In [0]:
# Read the student.txt file
study <- read.table("C:/student.txt", header = TRUE)
# Display the first five rows of the table
study[1:5, ]
# Extract the size of the students
size <- study[, "size"]
# Extract the sex of the students
sex <- study[, "sex"]
# Extract the size of all "f" students
tf <- study[study$sex == "f", "size"]
tf
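
If the file is not available at `C:/student.txt`, the same steps can be tried on an in-memory copy of the table. This is a sketch that assumes the file is whitespace-separated with a header row; the names `student_text` and `size_f` are hypothetical.

```r
# The records are embedded in a string so the example runs without C:/student.txt.
student_text <- "sex weight size
f 55 166
f 53 135
m 56 169
f 55 161
m 56 187
f 67 166
f 67 169"

study <- read.table(text = student_text, header = TRUE)
str(study)                                  # 7 observations of 3 variables
size_f <- study[study$sex == "f", "size"]   # sizes of the "f" students
size_f                                      # 166 135 161 166 169
mean(study$weight)                          # average weight over all students
```

`read.table()` returns a data frame, so columns can be selected by name and rows filtered with a logical condition.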

### 5. Arrays :

Matrices are limited to two dimensions, while arrays can have any number of dimensions. The `array()` function takes a `dim` argument that sets the size of each dimension.

**As an example:**

In [0]:
# Create an array made of two 3x3 matrices, filled by recycling the two values.
a <- array(c('red','blue'),dim = c(3,3,2))
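
Elements of a three-dimensional array are addressed with three indices (row, column, layer). A minimal sketch, using a numeric array `arr` introduced here only for illustration:

```r
arr <- array(1:18, dim = c(3, 3, 2))  # two 3 x 3 layers, filled column by column

dim(arr)        # 3 3 2
arr[2, 3, 1]    # row 2, column 3, layer 1: 8
arr[1, 1, 2]    # first element of the second layer: 10
arr[, , 2]      # the whole second layer as a 3 x 3 matrix
```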

### 6. Factors :

Factors are R objects that are created from a vector. A factor stores the vector together with its distinct values as labels (levels). The labels are always stored as character strings, whether the input vector is numeric, character or Boolean. Factors are useful in statistical analysis.

**As an example:**

In [0]:
# Create a vector.
color <- c('red','green','blue','grey','black','white')
# Create a Factor object.
Color_factor<- factor(color)
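
To see what a factor stores, the sketch below inspects its levels and internal codes. It uses the `sex` values from the student table shown earlier; the name `sex_factor` is hypothetical.

```r
sex <- c("f", "f", "m", "f", "m", "f", "f")  # values from the student table
sex_factor <- factor(sex)

levels(sex_factor)       # distinct labels, sorted: "f" "m"
nlevels(sex_factor)      # 2
as.integer(sex_factor)   # internal integer codes: 1 1 2 1 2 1 1
table(sex_factor)        # counts per level: f = 5, m = 2
```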

## R Software 

R is free, open-source software available from the [CRAN website](http://cran.r-project.org/).

It is available for three platforms: [Windows](https://cran.r-project.org/bin/windows/), [MacOS](https://cran.r-project.org/bin/macosx/) and [Linux](https://cran.r-project.org/bin/linux/).

![](http://pndar.ir/wp-content/uploads/2019/02/rlogo-382x226.jpg)

R provides:

* An object-oriented programming language
* Basic functions
* Additional libraries/packages (more than 1800 on the [CRAN site](http://cran.r-project.org/))

To use the R help, type the following commands:

In [0]:
help("rm") # get help on the rm() function
help.search("rm") # search the help pages for the term "rm"

### 1. Basic operations

The basic operations on scalars are: `*`, `-`, `/`, `+`, `^`.
As mentioned above, values are assigned to objects with the operators `<-` or `=`.

**As examples:**

In [0]:
# create a variable a which contains the value 42
a <- 42

# The content of the variable is displayed
a
# Change the content of the variable
a <- 8
# Display its content
a
# The assignment also works in the other direction
5 -> a
a
5 -> b
a + b # sum of a and b

In [0]:
11*13 # a multiplication operation
pi # R gives the value of pi
x = 2*3 # stores the result in x without displaying it
x # displays the value of x
log(x)/x # calculation with x
y <- 13 # y<-13 is equivalent to y=13
x*x^y # exponent calculation
ls() # displays the names of the created variables
5*3+4 # is not the same calculation as 5*(3+4)
5*(3+4)
x*x^y # is not the same calculation as (x*x)^y
(x*x)^y

In R, the logical comparison operators are: `==`, `<`, `>`, `<=`, `>=`, `!=`. The result is either `TRUE` (abbreviated `T`) or `FALSE` (abbreviated `F`):

**As an example:**

In [0]:
3 < 5 # the answer is TRUE
x > y # FALSE, since x is 6 and y is 13
(x+7) == y # TRUE, since 6 + 7 equals 13
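
Comparison operators also work element by element on vectors, which makes them useful for filtering. A small sketch with an illustrative vector `v` (not defined in the guide):

```r
v <- c(2, 7, 4, 9)   # illustrative values

v > 3            # FALSE TRUE TRUE TRUE
v > 3 & v < 8    # FALSE TRUE TRUE FALSE
v[v > 3]         # keeps 7 4 9
sum(v > 3)       # TRUE counts as 1, so this is 3
any(v == 4)      # TRUE
```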

In the previous examples, we created numeric variables because we assigned numbers to them. As shown below, you can also assign a string to a variable.

**As examples:**

In [0]:
# assign a character string to the variable a
a <-'Virgilio'
#Display its content
a

We will now look at some functions that apply to strings. In R, concatenation is done with the `paste()` function.

**As an example:**

In [0]:
# create variables containing the information
age <- 3
name <- 'Virgilio'
# The paste() function is called by giving it the different items of the final sentence in the following order
paste('Hello my name is', name,'and I am', age,'months', sep=' ')
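
The `sep` argument used above is one of several ways to control how `paste()` joins its pieces. The sketch below shows a few common variants; the strings are illustrative.

```r
name <- "Virgilio"

paste0("user_", name)                      # no separator: "user_Virgilio"
paste("a", "b", "c", sep = "-")            # "a-b-c"
paste(c("x", "y", "z"), collapse = ", ")   # joins a vector: "x, y, z"
paste("week", 1:3, sep = "")               # vectorized: "week1" "week2" "week3"
```

`sep` separates the arguments of one call, while `collapse` joins the elements of the resulting vector into a single string.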

The `nchar()` function counts the number of characters in a string, as shown in the following example:

In [0]:
# This function counts all characters, spaces included
nchar("Virgilio")

The functions `toupper()` and `tolower()` convert the character string given as an argument to all upper case or all lower case, respectively.

**As an example:**

In [0]:
toupper("ViRGilio")
tolower("virGilio")

We will now discover a number of functions for performing simple mathematical operations.

**As examples:**

In [0]:
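# The floor function returns the largest integer not greater than its argument
floor(2.4)
# The ceiling function returns the smallest integer not less than its argument
ceiling(2.4)
# The round function rounds to the nearest integer
round(2.4)
round(2.6)
# The trigonometric functions take their argument in radians, so 90 here means 90 radians
# The cos function
cos(90)
# The sin function
sin(90)
# The tangent function
tan(90)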
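
R's trigonometric functions expect angles in radians, so an angle given in degrees has to be converted first. A minimal sketch follows; `deg2rad` is a hypothetical helper defined here, not a base R function.

```r
deg2rad <- function(deg) deg * pi / 180   # hypothetical helper, not part of base R

sin(deg2rad(90))              # 1
cos(deg2rad(90))              # about 6.1e-17, i.e. zero up to floating-point error
round(cos(deg2rad(90)), 10)   # 0
cos(pi)                       # -1
```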


We will now see how to manipulate vectors.
Note that the `vector()` function creates a vector of a given type and length.

**As examples:**

In [0]:
# Create a vector containing 10 numerical elements
vector("numeric", 10)
# For example, you can create a vector containing character strings of length 5
vector("character", 5)
# Create a vector containing 8 logical elements (Boolean, default value FALSE)
vector("logical", 8)
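
Base R also offers the shortcuts `numeric()`, `character()` and `logical()`, which build the same empty vectors. The sketch below shows them and how an element is filled afterwards; the variable names are illustrative.

```r
num <- numeric(10)    # same as vector("numeric", 10): ten zeros
chr <- character(5)   # five empty strings
lgl <- logical(8)     # eight FALSE values

num[3] <- 42          # elements can then be filled by position
num
identical(chr, vector("character", 5))   # TRUE
```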

The `scan()` function reads values typed at the keyboard.

**As an example:**

In [0]:
# Create a vector of size 3 using the scan() function
scan(nmax=3)
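
When no keyboard input is available, for example when a notebook is run non-interactively, `scan()` can read from a string through its `text` argument instead. A small sketch with made-up input values:

```r
# Reads three numbers from the string instead of the keyboard.
values <- scan(text = "4.5 7 12", quiet = TRUE)

values           # 4.5 7.0 12.0
length(values)   # 3
```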