# Data Structures in R

<font color='blue'>A data structure is a specific way of organizing and storing data.</font>

R supports five basic data structures: vector, matrix, array, data frame and list.

- **Vectors**: one-dimensional arrays used to store collections of data of the same data type
    - Numeric vectors (data type: numeric)
    - Complex vectors (data type: complex)
    - Logical vectors (data type: logical)
    - Character vectors, or text strings (data type: character)

- **Matrices**: two-dimensional arrays used to store collections of data of the same mode. Elements are accessed by two integer indices.
- **Arrays**: similar to matrices, but they can have more than two dimensions
- **Data Frames**: a generalization of matrices in which different columns can store data of different modes
- **Lists**: ordered collections of objects whose elements can be of different types


|Dimensions|Homogeneous|Heterogeneous|
|---|---|---|
|1d|Vector|List|
|2d|Matrix|Data Frame|
|nd|Array||

## Vectors

- Vectors are one-dimensional arrays that can hold numeric, character, or logical data.

**Declare variables of different types**
- In R, you create a vector with the combine function `c()`

**Vector:** a sequence of data elements of the same basic type.

> Vectors can be combined using the function `c()`.

**Value Coercion:** When elements of different basic types are combined into one vector, R converts them all to a single type so that the vector stays homogeneous.

**Recycling rule:** If two vectors of unequal length are used in an arithmetic operation, the shorter one is recycled (repeated) to match the length of the longer one; R issues a warning if the longer length is not a multiple of the shorter one.
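A minimal sketch of both rules (the variable names `mixed`, `long`, `short` and `odd` are illustrative only):

```r
# Coercion: mixing types in c() converts every element to one common type
mixed <- c(1, "a", TRUE)
print(mixed)          # "1" "a" "TRUE"
print(class(mixed))   # "character"

# Recycling: the shorter vector is repeated to match the longer one
long  <- c(1, 2, 3, 4)
short <- c(10, 20)
print(long + short)   # 11 22 13 24

# A length that is not a multiple still recycles, but R warns:
# "longer object length is not a multiple of shorter object length"
odd <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(odd)            # 11 22 13
```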

### Numeric Vector

In [1]:
vec1 <- c(1,2,3,4,5) # use of combine function
vec1

In [2]:
class(vec1)

In [3]:
print(vec1[2])         #Accessing vector element
print(vec1[c(1,3)])    # Accessing first and third element

[1] 2
[1] 1 3


In [4]:
vec2 <- c('a','b','c')
class(vec2)

**Vector Operations:**

In [5]:
vec1 * 2

In [6]:
sum(vec1)  # adding the elements of vector

In [7]:
mean(vec1)

In [8]:
vec1 * vec1

In [9]:
vec5=c(1L, 2L, 4.5) # mixing integer and double values converts the whole vector to double (class "numeric")
print(vec5)
class(vec5)

[1] 1.0 2.0 4.5


In [11]:
vec1^2

In [12]:
# assigning names to the values
names(vec1) <- c('a','b','c','d','e')   # one name per element (vec1 has 5 elements)
vec1

### Character Vector

Character vectors are created, indexed and named in the same way as numeric vectors.

In [10]:
vec6 = c(1L, 4.5, "Abhijeet") #if we mix character and numeric then all the values will be converted to character
print(vec6)
class(vec6)

[1] "1"        "4.5"      "Abhijeet"


### Logical Vector

In [11]:
vec7 = c(TRUE, 1, 4.5) # Numeric given preference
print(vec7)
class(vec7)

[1] 1.0 1.0 4.5


In [12]:
vec8= c(TRUE, 1, 5, "Sudhir") # Character given preference
print(vec8)
class(vec8)

[1] "TRUE"   "1"      "5"      "Sudhir"


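### Order of preference

When types are mixed, every element is coerced to the most general type present:

character > complex > numeric (double) > integer > logical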
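The hierarchy can be checked with `typeof()`, which reports the internal storage type (note that `class()` calls a double vector "numeric"). A small sketch, one mixed pair per step:

```r
# Each c() call mixes two neighbouring types; the result takes the more general one
print(typeof(c(TRUE, FALSE)))  # "logical"
print(typeof(c(TRUE, 2L)))     # "integer"   (logical -> integer)
print(typeof(c(2L, 3.5)))      # "double"    (integer -> double)
print(typeof(c(3.5, 2i)))      # "complex"   (double -> complex)
print(typeof(c(2i, "a")))      # "character" (complex -> character)
```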

In [13]:
# paste(): combine character strings into one string
name = c ("Uma", "maheshwar")
print(name)
name = paste("Uma", "maheshwar")
print(name)

[1] "Uma"       "maheshwar"
[1] "Uma maheshwar"
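`paste()` also has a `sep` argument, and `paste0()` is the variant with no separator. A short sketch with illustrative variable names:

```r
first <- "Uma"
last  <- "maheshwar"
print(paste(first, last))             # "Uma maheshwar" (default sep = " ")
print(paste(first, last, sep = "_"))  # "Uma_maheshwar"
print(paste0(first, last))            # "Umamaheshwar" (no separator)
# collapse joins the elements of a single vector into one string
print(paste(c("a", "b", "c"), collapse = "-"))  # "a-b-c"
```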


## Matrix
A matrix is a collection of data elements of the same type arranged in a two-dimensional layout of rows and columns.

In [13]:
#?matrix
#matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,dimnames = NULL) 
#A ’dimnames’ attribute for the matrix: a ’list’ of length 2.
#MATRICES: two-dimensional arrays
x<-matrix(data=c(1:8),nrow=4,ncol=2)   # by default the data is filled column-wise; set byrow = TRUE to fill row-wise
x

     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8


In [14]:
x[1,]  #Accessing first row

In [15]:
x[,2]  # Accessing the 2nd column

In [16]:
x[2,1]  #Accessing the specific element in a matrix

In [18]:
colnames(x) <- c('col1','col2')
rownames(x) <- c('row1','row2','row3','row4')
x

     col1 col2
row1    1    5
row2    2    6
row3    3    7
row4    4    8


In [19]:
x['row1','col1']

In [20]:
m1 <- matrix(1:25, byrow=TRUE, nrow=5)  # Fill the values by rows
m1

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25


In [21]:
m2 <- matrix(1:25, nrow=5)  # Fill the value by columns
m2

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25


In [22]:
is.matrix(x)   # Check whether x is a matrix
# as.matrix(x)   --> converts x from another data structure to a matrix

In [23]:
nrow(x)    #Tells number of rows

In [24]:
ncol(x)    #Tells number of columns

In [25]:
class(x)

__Properties of Matrices__

In [26]:
dim(x)   #dimension of matrix

In [27]:
mode(x)   #Informs the type or storage mode of an object, e.g., numerical, logical etc.

In [28]:
attributes(x)    # lists all attributes of an object; for this matrix, $dim and $dimnames

Note: __data.matrix__ --> attempts to convert into a __numeric__ matrix

__Matrix Operations:__

In [29]:
#diagonal matrix
d <- diag(1, nrow=2, ncol=2)
d

     [,1] [,2]
[1,]    1    0
[2,]    0    1


In [30]:
diag(x)  # if x is a matrix (even a non-square one), diag() returns its main-diagonal elements, here x[1,1] and x[2,2]

In [31]:
#Transpose of a matrix X
xt <- t(x)
xt

     row1 row2 row3 row4
col1    1    2    3    4
col2    5    6    7    8


In [32]:
#Multiplication of a matrix with a constant
4*x

     col1 col2
row1    4   20
row2    8   24
row3   12   28
row4   16   32


In [33]:
# Element wise multiplication
x * x

     col1 col2
row1    1   25
row2    4   36
row3    9   49
row4   16   64


In [34]:
#Matrix multiplication(%*%)
xtx <- t(x) %*% x
xtx

     col1 col2
col1   30   70
col2   70  174


In [35]:
#Cross product of a matrix X
xtx2 <- crossprod(x)
xtx2

     col1 col2
col1   30   70
col2   70  174


__Note:__

1) `crossprod(x)` is formally equivalent to `t(x) %*% x`, but usually slightly faster.

2) Addition and subtraction of matrices (of the same dimensions!) use the usual operators `+` and `-`.
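A small sketch of element-wise addition and subtraction; the matrices `a` and `b` are illustrative, and the last lines show (via `tryCatch()`) the error R raises when the dimensions differ:

```r
a <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)   # columns (5, 6) and (7, 8)
print(a + b)                 # element-wise sum: columns (6, 8) and (10, 12)
print(b - a)                 # element-wise difference: every entry is 4

# Matrices with different dimensions cannot be added
err <- tryCatch(a + matrix(1:6, nrow = 2),
                error = function(e) conditionMessage(e))
print(err)                   # "non-conformable arrays"
```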

In [36]:
mat<-matrix(data=c(1:9),nrow=3,ncol=3)
mat

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


In [37]:
#Inverse of the matrix
print(det(mat))     # det(mat) = 0, so mat is singular and has no inverse
solve(mat)

[1] 0


ERROR: Error in solve.default(mat): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
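For comparison, a sketch with a matrix that *is* invertible (the matrix `A` and right-hand side `rhs` are made-up examples). `solve(A)` returns the inverse, and `solve(A, rhs)` solves the linear system A x = rhs directly:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # det = 2*3 - 1*1 = 5, so A is invertible
print(det(A))
A_inv <- solve(A)
print(A_inv)
# A %*% A_inv equals the 2x2 identity up to floating-point rounding
print(all.equal(A %*% A_inv, diag(2)))  # TRUE

# Solve 2x + y = 1 and x + 3y = 2 without forming the inverse
rhs <- c(1, 2)
sol <- solve(A, rhs)
print(sol)                              # 0.2 0.6
```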


In [38]:
eigen(mat)   # eigenvalues and eigenvectors of a square matrix (it need not be positive definite; mat is singular, so one eigenvalue is ~0)

eigen() decomposition
$values
[1]  1.611684e+01 -1.116844e+00 -1.576734e-16

$vectors
           [,1]       [,2]       [,3]
[1,] -0.4645473 -0.8829060  0.4082483
[2,] -0.5707955 -0.2395204 -0.8164966
[3,] -0.6770438  0.4038651  0.4082483
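A quick sanity check of the decomposition above, as a sketch: each eigenpair should satisfy mat %*% v = λ v up to rounding, and the eigenvalues should add up to the trace of `mat` (1 + 5 + 9 = 15):

```r
mat <- matrix(1:9, nrow = 3, ncol = 3)
e <- eigen(mat)
lambda1 <- e$values[1]     # largest eigenvalue
v1 <- e$vectors[, 1]       # its eigenvector
# mat %*% v1 and lambda1 * v1 agree up to floating-point rounding
print(all.equal(as.vector(mat %*% v1), lambda1 * v1))  # TRUE
# The eigenvalues of a square matrix sum to its trace
print(all.equal(sum(e$values), sum(diag(mat))))        # TRUE
```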


## Array

An array is a data object that can store data in more than two dimensions. Arrays are created with the `array()` function (lowercase, since R is case-sensitive). Like a matrix, an array can store only one data type. `array()` takes a vector as input and uses the values in the `dim` argument to arrange it into an array.

In [39]:
#ARRAY : Similar to matrices but can have more than two dimensions
?array  #array(data = NA, dim = length(data), dimnames = NULL)

In [40]:
Arr <- array (c(1:27), dim = c(3,3,3))
Arr

In [41]:
print(Arr[1,1,1])
print(Arr[2,1,1])
print(Arr[3,1,2])

[1] 1
[1] 2
[1] 12


In [42]:
z <- array(1:24, dim=c(2,3,4))
z

`as.array(z)`  --> converts an object into an array

`is.array(z)` --> checks whether an object is an array
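A short sketch of working with the 2 × 3 × 4 array `z` from above: `dim()` reports its extents, leaving indices empty selects a whole slice, and `is.array()`/`as.array()` behave as described (the vector `v` is illustrative):

```r
z <- array(1:24, dim = c(2, 3, 4))
print(dim(z))            # 2 3 4
print(z[, , 1])          # first 2 x 3 slice, returned as a matrix
print(z[1, 2, 1])        # 3: values fill rows first, then columns, then slices
print(is.array(z))       # TRUE

v <- 1:3
print(is.array(v))           # FALSE: a plain vector has no dim attribute
print(is.array(as.array(v))) # TRUE: as.array() gives it a 1-d dim attribute
```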

## Data Frame (designed for data sets)

A data frame is a table-like, two-dimensional structure in which each column holds the values of one variable and each row holds one observation (one value from each column).

1) In a data frame, we can combine variables of equal length, with each row containing observations on the same unit.

2) Like `matrix()` and `cbind()`, `data.frame()` builds a new object from its inputs, so changes made to the data frame do not affect the original vectors.

3) One can combine numeric variables, character strings and factors in one data frame (`cbind()` and `matrix()` on plain vectors cannot keep different types; they coerce all values to a single type).

4) Columns contain variables and rows contain observations.

5) Data frames often hold complete data sets imported from other programs (spreadsheets, SPSS files, Excel files, etc.).

6) Variables in a data frame may be numeric (numbers) or categorical (characters or factors).

7) A variable can be extracted with the `$` operator followed by the name of the variable, e.g. `painters$School` (`painters` is a data set from the MASS package).

8) Data can also be extracted with matrix-style `[row, column]` indexing, e.g. `painters["Da Udine", "Composition"]`.

9) Within a column all values must have the same data type; different columns may hold different types.

10) Use the `data.frame()` command to combine vectors into a data frame.
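Points 3, 7, 8 and 9 can be seen in a small sketch (the vectors are illustrative; `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version):

```r
num  <- c(2, 3, 5)
char <- c("aa", "bb", "cc")

# cbind() on plain vectors builds a matrix, so every value is coerced to one type
m <- cbind(num, char)
print(typeof(m))          # "character": the numbers became strings

# data.frame() keeps one type per column
d <- data.frame(num, char, stringsAsFactors = FALSE)
print(sapply(d, class))   # num: "numeric", char: "character"
print(d$char)             # extract a column with $
print(d[2, "num"])        # matrix-style [row, column] indexing: 3
```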

In [1]:
# DATAFRAME: similar to a matrix, but its columns can contain
# different modes of data, such as numeric and character

num <- c(2, 3, 5) 
char <- c("aa", "bb", "cc") 
log <- c(TRUE, FALSE, TRUE) 
df = data.frame(num, char, log)       # df is a data frame
 
df

| num | char | log |
|---|---|---|
| `<dbl>` | `<fct>` | `<lgl>` |
| 2 | aa | TRUE |
| 3 | bb | FALSE |
| 5 | cc | TRUE |


In [2]:
#inbuilt dataframe in R - mtcars
mtcars
#data description
#https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html

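| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |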


In [3]:
head(mtcars) # head()- first several rows

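| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |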


In [4]:
tail(mtcars)  # tail()-last several rows

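| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |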


In [5]:
str(mtcars) # structure of the dataset

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...


In [6]:
mtcars[1:2,1:4] # first 4 columns (variables) of the first 2 rows (cars)

| | mpg | cyl | disp | hp |
|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21 | 6 | 160 | 110 |
| Mazda RX4 Wag | 21 | 6 | 160 | 110 |


In [9]:
summary(mtcars)

      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000  

**Different ways to access the data:**

1. `mtcars$mpg`
2. `mtcars[,'mpg']`
3. `mtcars[,1]`
4. `mtcars[['mpg']]`

These commands return the column as a vector.

5. `mtcars['mpg']`
6. `mtcars[c('mpg','cyl')]`

These commands return a data frame.
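A sketch that checks the return types of the access methods listed above; `drop = FALSE` is an extra option (not in the list) that keeps a single column as a data frame:

```r
v1 <- mtcars$mpg
v2 <- mtcars[, "mpg"]
v3 <- mtcars[, 1]
v4 <- mtcars[["mpg"]]
print(c(is.vector(v1), is.vector(v2), is.vector(v3), is.vector(v4)))  # all TRUE
print(identical(v1, v2) && identical(v1, v3) && identical(v1, v4))    # TRUE

d1 <- mtcars["mpg"]
d2 <- mtcars[c("mpg", "cyl")]
d3 <- mtcars[, "mpg", drop = FALSE]  # keeps a one-column data frame
print(c(is.data.frame(d1), is.data.frame(d2), is.data.frame(d3)))    # all TRUE
print(dim(d3))                                                        # 32 1
```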

In [10]:
mtcars$mpg #Accessing column - Return Vector

In [11]:
mtcars$disp #Accessing column  - Return Vector

In [12]:
mtcars$gear

In [13]:
df<-mtcars
#Using $ 
print(df$wt)
print(df$disp)          # Access column
print(df$disp[2])

# Using []
print(df[2,])           # access 2nd row
print(df[,"disp"])      # access disp column
print(df[2, "disp"])    # access disp column of 2nd row
print(df[2,3])          # access 3rd column of 2nd row

 [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
[13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
[25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780
 [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
[25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
[1] 160
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
 [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
[25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
[1] 160
[1] 160


In [14]:
# Dropping a column
df[,-3] # drop the 3rd column (disp)

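| | mpg | cyl | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |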


In [15]:
df[,-c(2,3)] # Drop 2nd and 3rd column

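| | mpg | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |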


In [16]:
# Subset ()
car1 <- subset (df, cyl > 6)
print(car1)
print(car1$cyl)

car2 <- subset (df, hp >50)
print(car2)
print(car2$hp)

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1   
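`subset()` also takes a `select` argument to keep only some columns. A sketch (the name `fuel_savers` and the filter are illustrative) comparing it with plain bracket indexing:

```r
# Rows where cyl == 4 and mpg > 30, keeping only the mpg and hp columns
fuel_savers <- subset(mtcars, cyl == 4 & mpg > 30, select = c(mpg, hp))
print(fuel_savers)

# The same selection with logical row indexing and a vector of column names
same <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "hp")]
print(identical(rownames(fuel_savers), rownames(same)))  # TRUE
```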

In [17]:
# rbind(): combine data frames by rows (the column names must match)
str(df)
df1 <- df[1:20,]
str(df1)
 
df2 <- df[21:32,]
str(df2)
 
df_full <- rbind(df1,df2)
str(df_full)

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
'data.frame':	20 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 

In [18]:
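# cbind(): combine by columns
df3 <- df$mpg    # a numeric vector (despite its name, not a data frame)
print(df3)
df4 <- df$cyl    # another numeric vector
print(df4)
df_full <- cbind(df3,df4)   # cbind() of two plain vectors returns a matrix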
print(df_full) 

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
       df3 df4
 [1,] 21.0   6
 [2,] 21.0   6
 [3,] 22.8   4
 [4,] 21.4   6
 [5,] 18.7   8
 [6,] 18.1   6
 [7,] 14.3   8
 [8,] 24.4   4
 [9,] 22.8   4
[10,] 19.2   6
[11,] 17.8   6
[12,] 16.4   8
[13,] 17.3   8
[14,] 15.2   8
[15,] 10.4   8
[16,] 10.4   8
[17,] 14.7   8
[18,] 32.4   4
[19,] 30.4   4
[20,] 33.9   4
[21,] 21.5   4
[22,] 15.5   8
[23,] 15.2   8
[24,] 13.3   8
[25,] 19.2   8
[26,] 27.3   4
[27,] 26.0   4
[28,] 30.4   4
[29,] 15.8   8
[30,] 19.7   6
[31,] 15.0   8
[32,] 21.4   4
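Whether `cbind()` returns a matrix or a data frame depends on its arguments: plain vectors give a matrix, while a data frame argument gives a data frame. A sketch using columns of `mtcars` (the object names are illustrative):

```r
# cbind() on two plain vectors returns a matrix
as_matrix <- cbind(mtcars$mpg, mtcars$cyl)
print(class(as_matrix))   # "matrix" "array" (R >= 4.0.0)

# If an argument is a data frame, cbind() returns a data frame
as_df <- cbind(mtcars["mpg"], cyl = mtcars$cyl)
print(class(as_df))       # "data.frame"
print(head(as_df, 3))     # row names (car models) are kept
```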


In [19]:
colnames(df)  #Gives columns names for df dataframe

In [20]:
rownames(df) #Gives row names for df dataframe

In [21]:
t(df)   # t() returns a matrix; use as.data.frame(t(df)) to get a data frame back

| | Mazda RX4 | Mazda RX4 Wag | Datsun 710 | Hornet 4 Drive | Hornet Sportabout | Valiant | Duster 360 | Merc 240D | Merc 230 | Merc 280 | ... | AMC Javelin | Camaro Z28 | Pontiac Firebird | Fiat X1-9 | Porsche 914-2 | Lotus Europa | Ford Pantera L | Ferrari Dino | Maserati Bora | Volvo 142E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 21.0 | 21.0 | 22.8 | 21.4 | 18.7 | 18.1 | 14.3 | 24.4 | 22.8 | 19.2 | ... | 15.2 | 13.3 | 19.2 | 27.3 | 26.0 | 30.4 | 15.8 | 19.7 | 15.0 | 21.4 |
| cyl | 6.0 | 6.0 | 4.0 | 6.0 | 8.0 | 6.0 | 8.0 | 4.0 | 4.0 | 6.0 | ... | 8.0 | 8.0 | 8.0 | 4.0 | 4.0 | 4.0 | 8.0 | 6.0 | 8.0 | 4.0 |
| disp | 160.0 | 160.0 | 108.0 | 258.0 | 360.0 | 225.0 | 360.0 | 146.7 | 140.8 | 167.6 | ... | 304.0 | 350.0 | 400.0 | 79.0 | 120.3 | 95.1 | 351.0 | 145.0 | 301.0 | 121.0 |
| hp | 110.0 | 110.0 | 93.0 | 110.0 | 175.0 | 105.0 | 245.0 | 62.0 | 95.0 | 123.0 | ... | 150.0 | 245.0 | 175.0 | 66.0 | 91.0 | 113.0 | 264.0 | 175.0 | 335.0 | 109.0 |
| drat | 3.9 | 3.9 | 3.85 | 3.08 | 3.15 | 2.76 | 3.21 | 3.69 | 3.92 | 3.92 | ... | 3.15 | 3.73 | 3.08 | 4.08 | 4.43 | 3.77 | 4.22 | 3.62 | 3.54 | 4.11 |
| wt | 2.62 | 2.875 | 2.32 | 3.215 | 3.44 | 3.46 | 3.57 | 3.19 | 3.15 | 3.44 | ... | 3.435 | 3.84 | 3.845 | 1.935 | 2.14 | 1.513 | 3.17 | 2.77 | 3.57 | 2.78 |
| qsec | 16.46 | 17.02 | 18.61 | 19.44 | 17.02 | 20.22 | 15.84 | 20.0 | 22.9 | 18.3 | ... | 17.3 | 15.41 | 17.05 | 18.9 | 16.7 | 16.9 | 14.5 | 15.5 | 14.6 | 18.6 |
| vs | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| am | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| gear | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | ... | 3.0 | 3.0 | 3.0 | 4.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 4.0 |


In [22]:
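# Before R 4.0.0, data.frame() converted character vectors into factors by default.
# Since R 4.0.0, stringsAsFactors defaults to FALSE, so it is set explicitly here
# to reproduce the factor columns shown in the output.
name <- c("joe","john","nancy")
print(class(name))
sex <- c("M","M","F")
print(class(sex))
age <- c(27,26,26)
print(class(age))
 
df <- data.frame(name,sex,age, stringsAsFactors = TRUE)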
print(df)
print(class(df))
print(class(df$name))
print(class(df$sex))
print(class(df$age))

[1] "character"
[1] "character"
[1] "numeric"
   name sex age
1   joe   M  27
2  john   M  26
3 nancy   F  26
[1] "data.frame"
[1] "factor"
[1] "factor"
[1] "numeric"


In [23]:
object.size(df)  #check the size of the object 

2144 bytes

__Note:__ A matrix usually needs less memory than a data frame holding the same values.

__attach(df)__ adds the data frame to R's search path, so its columns can be referenced directly by name (for example `age` instead of `df$age`). Use it with care: an object of the same name in the global environment masks the attached column, so code can silently use the wrong data.

__detach(df)__ removes the data frame from the search path again; afterwards the columns must be accessed with `df$` once more.
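A minimal sketch of `attach()`/`detach()` on a small, made-up data frame `people`, followed by `with()`, a base R alternative that evaluates an expression using the columns without changing the search path:

```r
people <- data.frame(height = c(150, 160), weight = c(50, 60))

attach(people)
avg_height <- mean(height)   # 'height' is found via the search path
detach(people)
print(avg_height)            # 155
# FALSE after detaching, assuming no other object called 'height' exists
print(exists("height"))

# with() evaluates the expression using the columns of people directly
avg_weight <- with(people, mean(weight))
print(avg_weight)            # 55
```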

In [24]:
# stack(df) stacks the vector columns of df into one 'values' column, plus a factor 'ind'
# giving the name of the column each value came from; factor columns (name, sex) are skipped with a warning
stack(df)

“non-vector columns will be ignored”

| values | ind |
|---|---|
| `<dbl>` | `<fct>` |
| 27 | age |
| 26 | age |
| 26 | age |


## Factor

In R versions before 4.0.0, `data.frame()` converted character variables into factors by default; since R 4.0.0 this happens only when `stringsAsFactors = TRUE` is given. The number of levels of a factor is the number of distinct values in the vector.

A factor takes a limited number of distinct values; such variables are called categorical variables. Factors represent categorical data, can be ordered or unordered, and are an important class for statistical analysis and plotting (for example, `boxplot(y ~ f)` draws one box per level of the factor `f`).

Storing data as factors ensures that modeling functions treat such data correctly, i.e. as categorical. A factor can be built from integers or strings. Factors are most useful for columns with a limited number of unique values, such as “Male, Female” or “True, False”.

Factors in R come in two varieties:

1) ordered

2) unordered.

Factors are stored as a vector of integer values, with a corresponding set of character values to use when the factor is shown. `factor()` function is used to create a factor. The required argument to factor is a vector of values, which will be returned as a vector of factor values. Numeric and Character variables both can be made into factors, but a factor’s levels will always be character values.

1) R's term for a categorical variable is a factor.

2) In R, each possible value of a categorical variable is called a level.

3) A vector whose values are restricted to a set of levels is called a factor.

4) A categorical variable is characterized by a (here: finite) number of levels, called factor levels.

`factor(x = character(), levels, labels = levels, exclude = NA, …)`

__levels  :__  Determines the categories of the factor variable. Default is the sorted list of all the distinct values of x.

__labels  :__  (Optional) Vector of values that will be the labels of the categories in the levels argument.

__exclude :__  (Optional) Values to be excluded when forming the levels; elements of x that match them become NA. With the default `exclude = NA`, missing values do not form a level.

In [25]:
y <-c(1,4,3,5,4,2,4)
possible.dieface<- c(1,2,3,4,5,6)
labels.dieface <-c('one','two','three','four','five','six')
fac <-factor(y,levels=possible.dieface,labels=labels.dieface)
fac
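A short sketch of what the `levels` argument does in the die example above: declared levels are kept even when they never occur, and values outside the levels become `NA`:

```r
y <- c(1, 4, 3, 5, 4, 2, 4)
fac <- factor(y, levels = c(1, 2, 3, 4, 5, 6),
              labels = c('one', 'two', 'three', 'four', 'five', 'six'))
print(levels(fac))      # all six labels, even though 6 never occurs
print(nlevels(fac))     # 6
print(table(fac))       # counts per level; 'six' has count 0
print(as.integer(fac))  # underlying integer codes: 1 4 3 5 4 2 4

# A value that is not among the levels becomes NA
print(factor(c(1, 7), levels = 1:6))
```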

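__unclass()__ returns the object with its class attribute removed. For a factor this temporarily strips the factor behaviour and shows the underlying integer codes together with the `levels` attribute.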

In [26]:
x<- factor(c('juice','juice','lemonade','juice','water'))
x

In [27]:
unclass(x)

In [28]:
x<-factor(c('juice','juice','lemonade','juice','water'),levels=c('water','juice','lemonade'))
x

In [29]:
# ordered factor
income=ordered(c('high','medium','low','medium','medium'),levels=c('low','medium','high'))
income

In [30]:
unclass(income)
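Ordered factors support comparisons, `max()` and `sort()` according to the level order; unordered factors do not. A sketch reusing the `income` values from above (`plain` is an illustrative unordered factor):

```r
income <- ordered(c('high', 'medium', 'low', 'medium', 'medium'),
                  levels = c('low', 'medium', 'high'))
print(income > 'low')   # TRUE TRUE FALSE TRUE TRUE: follows the level order
print(max(income))      # high
print(sort(income))     # low medium medium medium high

# For an unordered factor, '>' is not meaningful: R warns and returns NA
plain <- factor(c('high', 'low'))
print(suppressWarnings(plain > 'low'))  # NA NA
```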

In [31]:
# A vector can be turned into a factor with the command as.factor
x<-c(4,5,1,2,3,4,4,5,6)
x<-as.factor(x)
x

In [32]:
# Example  - Change the names of the factor levels using levels()
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
gender_vector
class(gender_vector)

In [33]:
# Convert gender_vector to a factor
factor_gender_vector <- as.factor(gender_vector)
print(factor_gender_vector)    # factor_gender has two levels - Male and Female
 
# Change the names of the levels using levels()
levels(factor_gender_vector) <- c("F", "M")
print(factor_gender_vector) 

[1] Male   Female Female Male   Male  
Levels: Female Male
[1] M F F M M
Levels: F M


## List

Lists are the most flexible (and most complex) data structure. A list may contain any combination of vectors, matrices, data frames and even other lists. Lists are created with the `list()` function (lowercase, since R is case-sensitive). Technically, a list is a generic vector: a vector whose elements all share one basic type is called an atomic vector, whereas a list is a vector whose elements can be of different types.

In [34]:
b<-list('India','USA')
b

In [35]:
my_vector<-1:10
my_matrix<-matrix(data=c(1:9),ncol=3)
my_df<-mtcars[1:3,]

my_list<-list(my_vector,my_matrix,my_df)
my_list

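     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |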
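A sketch of naming and accessing list elements (the names `vec`, `mat`, `df` and the `nested` list are illustrative): `[[ ]]` and `$` return an element itself, while `[ ]` returns a smaller list:

```r
my_list <- list(vec = 1:10,
                mat = matrix(1:9, ncol = 3),
                df  = mtcars[1:3, ])
print(names(my_list))            # "vec" "mat" "df"
print(length(my_list))           # 3

# [[ ]] and $ return the element itself
print(my_list[["mat"]][2, 3])    # 8
print(class(my_list$df))         # "data.frame"

# [ ] returns a sub-list, even for a single element
print(class(my_list["df"]))      # "list"

# Lists can contain other lists
nested <- list(a = 1, inner = list(b = "x", c = TRUE))
print(nested$inner$b)            # "x"
```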
