Conditioning and Subsetting
    Syntax: dataframe[rows, columns]

New columns/rows

Exercise:
    Create a clone of an existing data frame (or create your own if time permits)
    Manipulate the data frame by adding new and old columns/rows
    Sort your data frame
    
Helpful functions
    head(), summary(), glimpse(), is.na(), ...
    
Exercise:
    Use these newly learned functions to find specific information in your data frame

# Week 4: Data Frames

This week, we will explore data frames, the fundamental structure for working with data sets in R!

A data frame is closely related to a list. In layman's terms, a data frame is a special kind of list: it must satisfy several restrictions on its contents, and it carries the class "data.frame".
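The list relationship can be checked directly. This is a minimal sketch using a made-up two-column data frame (the names `df`, `x`, and `y` are illustrative only):

```r
# A small data frame built from two equal-length vectors (illustrative data).
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

is.list(df)   # TRUE: a data frame is stored as a list of columns
class(df)     # "data.frame"
length(df)    # 2: one list element per column
df[["x"]]     # list-style access returns the column as a plain vector
```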

**Restrictions/Requirements for a Data Frame**

1) Components must consist of vectors, factors, numeric matrices, lists, or other data frames.

2) Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.

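3) Numeric vectors, logicals, and factors are included as is. Character vectors are coerced to factors (whose levels are the unique values appearing in the vector) only when stringsAsFactors = TRUE. Since R 4.0.0 the default is FALSE, so character vectors stay character vectors unless you ask otherwise; in earlier versions the default was TRUE.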

4) If you are adding vector structures to the data frame, it is important to note that *all vector structures* must be the SAME LENGTH! Similarly, *matrix structures* must all have the SAME ROW SIZE. 
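Two of these restrictions are easy to see in action. The sketch below uses made-up data (`city`, `a`, and `b` are illustrative names) and sets `stringsAsFactors` explicitly so that the result does not depend on your R version:

```r
# Character columns: be explicit about factor conversion.
as_chr <- data.frame(city = c("Oslo", "Lima"), stringsAsFactors = FALSE)
as_fct <- data.frame(city = c("Oslo", "Lima"), stringsAsFactors = TRUE)
class(as_chr$city)    # "character"
class(as_fct$city)    # "factor"
levels(as_fct$city)   # the unique values, sorted

# Vectors whose lengths do not fit together are rejected with an error.
# (Note: data.frame() will recycle a shorter atomic vector if its length
# divides the longer one evenly, so 3 vs. 2 is used here to force the error.)
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
result   # the error message, captured as a string
```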

**Why not use a matrix? Isn't this what a data frame basically is?**

Not quite! A data frame is so useful for working with data sets because its columns can have different classes: numerics, characters, logicals, and so on. A matrix lacks that versatility because every element must be of the *same type*. If you mix numbers with text in a matrix, the numbers are coerced to character strings.
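A quick way to see the difference, sketched with made-up data (`name` and `score` are illustrative names):

```r
# In a matrix, mixing text and numbers forces everything to one type.
m <- cbind(name = c("a", "b"), score = c(1, 2))
typeof(m)        # "character": the scores were coerced to strings

# In a data frame, each column keeps its own class.
df <- data.frame(name = c("a", "b"), score = c(1, 2),
                 stringsAsFactors = FALSE)
sapply(df, class)   # name is "character", score is "numeric"
mean(df$score)      # arithmetic still works on the numeric column
```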

**Why?** 
A data frame may for many purposes be regarded as a matrix with columns possibly of
differing modes and attributes. It may be displayed in matrix form, and its rows and columns
extracted using matrix indexing conventions.
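As a small illustration of those matrix indexing conventions, here is a sketch on a made-up data frame (`id` and `label` are illustrative names):

```r
df <- data.frame(id = 1:3, label = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

dim(df)          # 3 2: rows and columns, just like a matrix
df[2, 1]         # the element in row 2, column 1
df[2, ]          # the whole second row (a one-row data frame)
df[, "label"]    # the whole "label" column as a vector
```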




## Creating a Data Frame


Keep in mind that a data frame is basically just a data set! Data sets are commonly imported from CSV files, relational databases, and other software packages.

However, what if you want to create a data frame from scratch using the amazing R programming language?
**Use the function** <code>data.frame()</code>.


Now, consider a scenario where you have three vectors and you want to compose a data frame from them. Below, you will notice three vectors: forecast (a description of the weather in words), low_temperature (the expected lowest temperature), and precipitation (whether or not it rains).

**Notice how the vectors have different classes: character, numeric, and logical, respectively. Onward to making a data frame...**

<code> 
> forecast <- c("showers", "cloudy", "cloudy", "rain", "rain")

> low_temperature <- c(46, 34, 44, 55, 53)

> precipitation <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

> data.frame(forecast, low_temperature, precipitation) 

      forecast low_temperature precipitation
    1  showers              46          TRUE
    2   cloudy              34         FALSE
    3   cloudy              44         FALSE
    4     rain              55          TRUE
    5     rain              53          TRUE 
</code>


## Vocabulary

-The top horizontal line, which starts with "forecast", is called the **header**.

-Each horizontal line underneath the header is called a **data row**.

-Each individual element of a data row is called a **cell**.
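These three terms map onto simple indexing operations. The sketch below rebuilds the forecast example and stores it under the name `weather` (a name chosen here for illustration; the original does not assign the data frame to a variable):

```r
forecast <- c("showers", "cloudy", "cloudy", "rain", "rain")
low_temperature <- c(46, 34, 44, 55, 53)
precipitation <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
weather <- data.frame(forecast, low_temperature, precipitation)

names(weather)                  # the header: the column names
weather[2, ]                    # one data row (the second)
weather[4, "low_temperature"]   # one cell: row 4 of low_temperature, i.e. 55
```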



## Exercise (Try some on your own using RStudio!):
- Create a clone of an existing data frame (or create your own if time permits)
- Manipulate the data frame by adding new and old columns/rows
- Sort your data frame
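If you get stuck, here is one possible way to approach the three tasks. It is only a sketch: the names `cars_copy`, `kpl`, `new_row`, and "My Car" are made up for illustration, and there are other valid ways to do each step.

```r
# 1. Clone an existing data frame (mtcars is built into R).
cars_copy <- mtcars

# 2a. Add a new column: fuel economy in km per litre (1 mpg is about 0.425 km/L).
cars_copy$kpl <- cars_copy$mpg * 0.425144

# 2b. Add a new row: reuse the first row as a template and rename it.
new_row <- cars_copy[1, ]
rownames(new_row) <- "My Car"
cars_copy <- rbind(cars_copy, new_row)

# 3. Sort by mpg, highest first. order() returns the row positions in sorted order.
sorted <- cars_copy[order(cars_copy$mpg, decreasing = TRUE), ]
head(sorted, 3)
```

The original `mtcars` is untouched throughout, because every change is made to the copy.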

You can find data frames to practice with in RStudio, either built in (like mtcars) or bundled with packages you install:
<code> install.packages("dplyr")
install.packages("ggplot2")

library("ggplot2")
library("dplyr")

car <- mtcars

class(car)
class(11)
class(c(1,2,3,4,5))
class("11")

head(car)
tail(car)
summary(car)
glimpse(car)

car$mpg
hp <- car$hp
car
head(car) 
hp
car$hp

plot( x=car$mpg, y=hp)
# car[row, columns] 
car[car$disp > 235, ]
car[car$disp > mean(car$disp), ]
car[car$disp > mean(car$disp) & (car$hp > 180), ] </code>
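The `car[rows, columns]` pattern also lets you pick columns at the same time as filtering rows. The following sketch builds on the last filter above; the name `powerful` is illustrative, and `subset()` is a base-R alternative that is not used elsewhere in this lesson:

```r
car <- mtcars

# Rows: above-average displacement AND more than 180 hp.
# Columns: only the three we care about.
powerful <- car[car$disp > mean(car$disp) & car$hp > 180,
                c("mpg", "disp", "hp")]
powerful

# subset() expresses the same filter without repeating "car$".
same <- subset(car, disp > mean(disp) & hp > 180,
               select = c(mpg, disp, hp))
all.equal(powerful, same)
```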


### Some Useful Functions To Make Data Frames More Fun!

**head(data_frame, n = number of rows returned)**
- head() ==> returns the first 'n' rows of the data frame (6 by default).


**tail(data_frame, n = number of rows returned)**
- tail() ==> returns the last 'n' rows of the data frame (6 by default).
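A short sketch of both functions on the built-in mtcars data frame:

```r
head(mtcars, n = 3)   # the first three rows
tail(mtcars, n = 3)   # the last three rows

# With no n, both functions return six rows.
nrow(head(mtcars))    # 6
```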


**summary(data_frame)**
- summary() ==> produces a statistical summary of each column. For numeric columns, the summary includes:
    - Minimum
    - 1st Quartile
    - Median
    - Mean
    - 3rd Quartile
    - Maximum

- However, summary() is a generic function: the method it invokes, and therefore the output, depends on the class of its first argument. A factor, for example, is summarized by counts per level.
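To see how the output changes with the class of the argument, here is a sketch on mtcars (the variable names `s_num` and `s_fct` are illustrative):

```r
# Numeric vector: the six statistics listed above.
s_num <- summary(mtcars$mpg)
s_num

# Factor: a count for each level instead.
s_fct <- summary(factor(mtcars$cyl))
s_fct

# Data frame: one summary per column, printed side by side.
summary(mtcars[, c("mpg", "hp")])
```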


**glimpse(data_frame)**
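- glimpse() ==> (from the dplyr package) prints a transposed view of the data frame: each column is shown on its own line, with its type and its first few values. The data frame itself is not changed.

- the header becomes vertically oriented, so every column name stays visible even in a wide data frame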
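A rough sketch of the transposed view. Because dplyr may not be installed on every machine, the call is guarded here. Base R's `str()` is not part of this lesson's function list; it is included only as a comparable built-in view.

```r
# glimpse() lives in dplyr, so only call it if the package is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(mtcars)
}

# str() is a base-R function that prints a similar one-line-per-column view.
str(mtcars)

# Neither call modifies the data: mtcars still has 32 rows and 11 columns.
dim(mtcars)
```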


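**nrow(data_frame)**
- nrow() ==> returns the number of rows in the data frame.


**ncol(data_frame)**
- ncol() ==> returns the number of columns in the data frame.


**help(data_frame)**
- help() ==> opens the documentation page for a topic, e.g. help(mtcars) for a built-in data set. A data frame you created yourself has no help page.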
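A minimal sketch of the size functions on mtcars. `dim()` is not listed above; it is shown only because it returns both numbers at once.

```r
nrow(mtcars)   # 32 rows (one per car)
ncol(mtcars)   # 11 columns (one per measurement)
dim(mtcars)    # both at once: 32 11

# In an interactive session, this opens the documentation for the data set:
# help(mtcars)
```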





In [None]:
# install.packages() ==> Download and install packages from CRAN-like 
## repositories or from local files.
install.packages("dplyr")
install.packages("ggplot2")

# library("pkg") ==> load an installed package so its functions can be used
library("ggplot2")
library("dplyr")

# mtcars is a built-in data frame in R. This data frame holds information
## about various cars and their specifications such as miles per gallon, 
### horsepower, etc.
car <- mtcars # now you can write car instead of mtcars

# class() reports the class of any object, from a whole data frame down to a single value.
class(car)
class(11)
class(c(1,2,3,4,5))
class("11")

# useful functions
head(car)
tail(car)
summary(car)
glimpse(car)
nrow(car)
ncol(car)
help(mtcars) # help() looks up a topic by name, so use mtcars: the copy "car" has no help page

# data_frame$vector
car$mpg
hp <- car$hp
car
head(car) 
hp
car$hp

plot( x=car$mpg, y=hp)
# car[row, columns] 
car[car$disp > 235, ]
car[car$disp > mean(car$disp), ]
car[car$disp > mean(car$disp) & (car$hp > 180), ] 

## Congratulations!
You're done with tonight's exercises! Check back to [the syllabus](https://github.com/JasonFreeberg/R_Tutorials/blob/master/README.md) for this week's homework. And remember... *if you're going through hell, you keep going.*

And remember, don't be afraid to ask questions if you need guidance! We are here to help you learn!