Conditioning and Subsetting
    Syntax: dataframe[rows, columns]

New columns/rows

Exercise:
    Create a clone of an existing data frame (or create your own if time permits)
    Manipulate the data frame by adding new and old columns/rows
    Sorting your data frame
    
Helpful functions
    head(), summary(), glimpse(), is.na(), ...
    
Exercise:
Use these newly learned functions to find specific information in your data frame

# Week 4: Data Frames

This week, we will explore data frames, the fundamental data structure for working with data sets in R!

A data frame is closely related to a list. In layman's terms, a data frame is a special kind of list that obeys a few extra restrictions: it is a list with the class "data.frame".

**Restrictions/Requirements for a Data Frame**

1) Components must consist of vectors, factors, numeric matrices, lists, or other data frames.

2) Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.

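3) Numeric vectors, logicals, and factors are left alone. Before R 4.0.0, character vectors were coerced to factors by default; since R 4.0.0 they stay character unless you pass <code>stringsAsFactors = TRUE</code>, in which case the levels are the unique values appearing in the vector.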

4) If you are adding vector structures to the data frame, it is important to note that *all vector structures* must be the SAME LENGTH! Similarly, *matrix structures* must all have the SAME ROW SIZE. 
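The length and factor rules above can be checked directly. This is a minimal sketch; the column names (`day`, `low`, `flag`) and values are made up for illustration.

```r
# Equal-length vectors line up row by row.
ok <- data.frame(day = 1:3, low = c(46, 34, 44))
nrow(ok)  # 3

# A length-1 value is recycled to fill every row.
recycled <- data.frame(day = 1:3, flag = TRUE)
nrow(recycled)  # 3

# Lengths 3 and 2 cannot be lined up, so data.frame() signals an error.
# tryCatch() captures the error message instead of stopping the script.
mismatch <- tryCatch(
  data.frame(day = 1:3, low = c(46, 34)),
  error = function(e) conditionMessage(e)
)
mismatch

# Asking for factors explicitly behaves the same in every R version.
f <- data.frame(forecast = c("rain", "cloudy", "rain"),
                stringsAsFactors = TRUE)
levels(f$forecast)  # "cloudy" "rain"
```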

**Why not use a matrix? Isn't this what a data frame basically is?**

Not quite! A data frame is so useful for data sets because its columns can have different classes: numerics, characters, logicals, and more. A matrix lacks that versatility: every element of a matrix must have the *same* type, so mixing numbers with text turns the numbers into text.

**Why?** 
A data frame may for many purposes be regarded as a matrix with columns possibly of
differing modes and attributes. It may be displayed in matrix form, and its rows and columns
extracted using matrix indexing conventions.
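A short sketch of the difference, using two small made-up vectors: binding them into a matrix forces one common type, while a data frame keeps each column's own type and still accepts matrix-style indexing.

```r
# A matrix holds a single type, so the numbers are converted to text.
m <- cbind(name = c("a", "b"), value = c(1, 2))
class(m[, "value"])   # "character"

# A data frame keeps each column's own type.
df <- data.frame(name = c("a", "b"), value = c(1, 2))
class(df$value)       # "numeric"

# Matrix indexing conventions still work: row 2, column "value".
df[2, "value"]        # 2
```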




## Creating a Data Frame


Keep in mind that a data frame is basically just a data set! Data sets are commonly imported from CSV files, relational databases, and software packages.

However, what if you want to create a data frame from scratch using the amazing R programming language?
**Use the function** <code>data.frame()</code>.


Now, consider a scenario where you have 3 vectors and you want to compose a data frame from them. Below, you will notice three vectors: forecast (a description of the weather in words), low_temperature (the expected lowest temperature), and precipitation (whether or not it rains).

**Notice how each vector has a different class: character, numeric, and logical, respectively. Onward to making a data frame...**

<code> 
> forecast <- c("showers", "cloudy", "cloudy", "rain", "rain")

> low_temperature <- c(46, 34, 44, 55, 53)

> precipitation <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

> data.frame(forecast, low_temperature, precipitation) 

      forecast low_temperature precipitation
    1  showers              46          TRUE
    2   cloudy              34         FALSE
    3   cloudy              44         FALSE
    4     rain              55          TRUE
    5     rain              53          TRUE 
</code>


## Vocabulary

-The top horizontal line, which starts with "forecast", is called the **header**.

-Each horizontal row underneath the header is called a **data row**.

-Each element of a data row is called a **cell**.
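Each of these terms maps onto something you can ask R for. The sketch below rebuilds the weather data frame from the previous section under the assumed name `weather` (the original example did not store it in a variable).

```r
weather <- data.frame(
  forecast = c("showers", "cloudy", "cloudy", "rain", "rain"),
  low_temperature = c(46, 34, 44, 55, 53),
  precipitation = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

names(weather)                  # the header: the three column names
weather[2, ]                    # the second data row
weather[4, "low_temperature"]   # a single cell: 55
```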



### Subscripting and Subsetting
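The general form is `data_frame[rows, columns]`; leaving either side of the comma empty means "all of them". The following is a minimal sketch on the weather example, stored here under the assumed name `weather`; the result names (`first_two`, `rainy`, `cold`) are also illustrative.

```r
weather <- data.frame(
  forecast = c("showers", "cloudy", "cloudy", "rain", "rain"),
  low_temperature = c(46, 34, 44, 55, 53),
  precipitation = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Rows 1 and 2, all columns.
first_two <- weather[1:2, ]

# All rows, two named columns.
weather[, c("forecast", "precipitation")]

# One column as a plain vector.
weather$low_temperature

# Conditioning: a logical vector in the row position keeps the TRUE rows.
rainy <- weather[weather$precipitation, ]

# Condition on rows and pick a column at the same time.
cold <- weather[weather$low_temperature < 45, "forecast"]
```

Here `rainy` keeps rows 1, 4, and 5, and `cold` holds the forecasts for the two days whose low was below 45.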




### Some Useful Functions To Make Data Frames Easier To Deal With

**head(data_frame, n = number of rows returned)**
- head() ==> will return first 'n' rows starting from the beginning of the data frame.


**tail(data_frame, n = number of rows returned)**
- tail() ==> will return the last 'n' rows of the data frame.
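As a quick illustration on the built-in `mtcars` data frame (the variable names `first_three` and `last_two` are just for this sketch):

```r
# First three rows and last two rows of mtcars.
first_three <- head(mtcars, n = 3)
last_two <- tail(mtcars, n = 2)

rownames(first_three)  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
rownames(last_two)     # "Maserati Bora" "Volvo 142E"

# Without 'n', both functions return 6 rows.
nrow(head(mtcars))     # 6
```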


**summary(data_frame)**
- summary() ==> a function that produces a statistical summary of each column. For numeric columns, the summary includes:
    - Minimum
    - 1st Quartile
    - Median
    - Mean
    - 3rd Quartile
    - Maximum

- However, the function invokes particular methods which depend on the class of the first argument.
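To see how the result depends on the class of the argument, compare a numeric vector with a factor. The values are borrowed from the weather example; the variable names are illustrative.

```r
# Numeric input: the six statistics listed above.
num_summary <- summary(c(46, 34, 44, 55, 53))
names(num_summary)  # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

# Factor input: a count of how often each level appears.
fac_summary <- summary(factor(c("showers", "cloudy", "cloudy", "rain", "rain")))
fac_summary[["cloudy"]]  # 2
```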


**str(data_frame)**
- str() ==> an alternative to 'summary()'. Displays the internal **str**ucture of a data frame, or any other object placed in parentheses.
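A small sketch on two columns of `mtcars`: `str()` prints its report rather than returning it, so `capture.output()` is used here only to hold the printed lines in a variable (`report` is an illustrative name).

```r
# Print the structure: one header line, then one line per column.
str(mtcars[, c("mpg", "cyl")])

# Capture the same printed lines as a character vector.
report <- capture.output(str(mtcars[, c("mpg", "cyl")]))
length(report)  # 3: the header line plus one line for each of the 2 columns
```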


**glimpse(data_frame)**
- glimpse() ==> available once the dplyr package is loaded. Prints the data frame sideways: each column gets its own line showing its name, its type, and its first few values.

- The header is therefore laid out vertically.


**nrow(data_frame)**
- nrow() ==> gives you the **n**umber of **row**s in the data frame


**ncol(data_frame)**
- ncol() ==> gives you the **n**umber of **col**umns in the data frame
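For example, on the built-in `mtcars` data frame (`dim()` is a base R function not covered above; it returns both counts at once):

```r
nrow(mtcars)  # 32
ncol(mtcars)  # 11
dim(mtcars)   # 32 11 (rows first, then columns)
```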


**help(data_frame)**
- help() ==> opens the R documentation page for a topic, such as a function or a built-in data set (for example, help(mtcars)). A data frame you created yourself has no documentation page.










## Exercise (Try some on your own using RStudio!): 
- Create a clone of an existing data frame (or create your own if time permits)
- Manipulate the data frame by adding new and old columns/rows
- Sort your data frame
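If you are unsure where to start, here is one possible sketch using base R only. It works on a copy of the weather example; the `high_temperature` values and the extra "sunny" day are invented purely for illustration.

```r
weather <- data.frame(
  forecast = c("showers", "cloudy", "cloudy", "rain", "rain"),
  low_temperature = c(46, 34, 44, 55, 53),
  precipitation = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Clone: assigning to a new name leaves the original untouched.
w <- weather

# New column: assign a vector with one value per row.
w$high_temperature <- c(58, 47, 56, 66, 61)  # made-up values

# New row: rbind() a one-row data frame with the same column names.
new_day <- data.frame(
  forecast = "sunny", low_temperature = 50,
  precipitation = FALSE, high_temperature = 70,
  stringsAsFactors = FALSE
)
w <- rbind(w, new_day)

# Sort: order() gives the row positions that put a column in ascending order.
w_sorted <- w[order(w$low_temperature), ]
w_sorted$low_temperature  # 34 44 46 50 53 55
```

Use `order(w$low_temperature, decreasing = TRUE)` to sort from highest to lowest instead.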

You can find more data frames to practice with in RStudio by installing packages (mtcars, used below, comes built in with R):
<code> install.packages("dplyr")
install.packages("ggplot2")

library("ggplot2")
library("dplyr")

car <- mtcars

class(car)
class(11)
class(c(1,2,3,4,5))
class("11")

head(car)
tail(car)
summary(car)
glimpse(car)

car$mpg
hp <- car$hp
car
head(car) 
hp
car$hp

plot( x=car$mpg, y=hp)
# car[row, columns] 
car[car$disp > 235, ]
car[car$disp > mean(car$disp), ]
car[car$disp > mean(car$disp) & (car$hp > 180), ] </code>


In [None]:
# install.packages() ==> Download and install packages from CRAN-like 
## repositories or from local files.
install.packages("dplyr")
install.packages("ggplot2")

# library() ==> load an installed package so its functions can be used
library("ggplot2")
library("dplyr")

# mtcars is a built-in data frame in R. This data frame holds information
## about various cars and their specifications such as miles per gallon, 
### horsepower, etc.
car <- mtcars # now you can write car instead of mtcars

# class() reports the class of whatever object you give it.
class(car)
class(11)
class(c(1,2,3,4,5))
class("11")

# useful functions
head(car)
tail(car)
summary(car)
glimpse(car)
nrow(car)
ncol(car)
help(mtcars) # documentation is filed under the original name, mtcars, not our copy 'car'

# format: data_frame$vector
car$mpg
hp <- car$hp
car
head(car) 
hp
car$hp

car[car$disp > 235, ]
car[car$disp > mean(car$disp), ]
car[car$disp > mean(car$disp) & (car$hp > 180), ] 



In [None]:
# If you inputted everything above correctly, you should see something like
## this in the console!

> car <- mtcars

> 

> class(car)
[1] "data.frame"
> class(11)
[1] "numeric"
> class(c(1,2,3,4,5))
[1] "numeric"
> class("11")
[1] "character"

> 

> head(car)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


> tail(car)
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2


> summary(car)
      mpg             cyl             disp             hp             drat      
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930  
       wt             qsec             vs               am              gear      
 Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000  
 1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000  
 Median :3.325   Median :17.71   Median :0.0000   Median :0.0000   Median :4.000  
 Mean   :3.217   Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688  
 3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000  
 Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000  
      carb      
 Min.   :1.000  
 1st Qu.:2.000  
 Median :2.000  
 Mean   :2.812  
 3rd Qu.:4.000  
 Max.   :8.000  


> glimpse(car)
Observations: 32
Variables: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4...
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8...
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, ...
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205...
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07...
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, ...
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, ...
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0...
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0...
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3...
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4...

> 

> car$mpg
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4

> hp <- car$hp

> car
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2


> head(car) 
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


> hp
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97
[22] 150 150 245 175  66  91 113 264 175 335 109


> car$hp
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97
[22] 150 150 245 175  66  91 113 264 175 335 109

> 

> car[car$disp > 235, ]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8


> car[car$disp > mean(car$disp), ]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8


> car[car$disp > mean(car$disp) & (car$hp > 180), ]
                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

## Congratulations!
You're done with tonight's exercises! Check back to [the syllabus](https://github.com/JasonFreeberg/R_Tutorials/blob/master/README.md) for this week's homework. And remember... *if you're going through hell, you keep going.*

Also, don't be afraid to ask questions if you need guidance! We are here to help you learn!