diff --git a/Week 05/pre-class-05.Rmd b/Week 05/pre-class-05.Rmd
index 0f4015a..483d292 100644
--- a/Week 05/pre-class-05.Rmd
+++ b/Week 05/pre-class-05.Rmd
@@ -95,16 +95,83 @@ bmi Body Mass Index
### Question 1: Standardize Function
-A. Create a function called standardize.me() that takes a numeric vector as an argument, and returns the standardized version of the vector.
+A. Create a function called standardize.me() that takes a numeric vector as an argument, and returns the standardized version of the vector.
+```{r}
+#The standardize.me function takes each element of a numeric vector x, subtracts the mean of x from it and then divides by the standard deviation of x. This creates a standardized version of the vector x.
+standardize.me <- function(x){
+ standardized <- (x - mean(x))/sd(x)
+ standardized
+}
+```
B. Assign all the numeric columns of the original WCGS dataset to a new dataset called WCGS.new.
+```{r}
+#Using the dplyr function "select_if", the columns of the original dataset that are numeric are assigned to a new dataset WCGS.new.
+library(dplyr)
+WCGS.new <- select_if(wcgs, is.numeric)
+
+```
C. Using a loop and your new function, standardize all the variables WCGS.new dataset.
+```{r}
+#This loop standardizes each column of the new dataset containing only numeric columns.
+for(i in seq_along(WCGS.new)){
+ WCGS.new[,i] <- standardize.me(WCGS.new[,i])
+}
+#Notice that since the "chol" column contains NA values, the whole column becomes NA when the standardize function is applied to it, because mean() and sd() return NA when NAs are present. The function could be extended to handle NAs; since this is not explicitly asked for, I'll leave it out of the main answer, but a possible variant is sketched below.
+```
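+A minimal NA-tolerant variant might look like the following sketch (not required by the question); it simply passes na.rm = TRUE to mean() and sd().
+```{r}
+#A possible NA-tolerant version of the function: na.rm = TRUE drops missing values
+#when computing the mean and standard deviation, so a column such as chol would
+#keep its non-missing entries after standardizing.
+standardize.me.na <- function(x){
+  (x - mean(x, na.rm = TRUE))/sd(x, na.rm = TRUE)
+}
+```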
D. What should the mean and standard deviation of all your new standardized variables be? Test your prediction by running a loop
-
+```{r}
+#After being standardized, the mean should be 0 and the standard deviation should be 1 for each of the standardized variables. Let's check this:
+means <- numeric(length(WCGS.new))
+sds <- numeric(length(WCGS.new))
+for(i in seq_along(WCGS.new)){
+ means[i] <- mean(WCGS.new[,i])
+ sds[i] <- sd(WCGS.new[,i])
+}
+means
+sds
+#In fact, apart from columns that became entirely NA (such as chol), the mean is extremely close to 0 for each of the standardized variables, and the standard deviations are all exactly 1.
+```
### Question 2: Looping to Calculate
A. Using a loop, calculate the mean weight of the subjects separated by the type of CHD they have.
+```{r}
+#First we initialize vectors that will hold the number of people with each type of CHD and the sum of their weights. Dividing the weight sums by the counts then gives the mean weight of the subjects by the type of CHD they have.
+num_type <- numeric(4)
+names(num_type) <- c("no CHD", "MI or SD", "silent MI", "angina")
+sum_weight <- numeric(4)
+names(sum_weight) <- c("no CHD", "MI or SD", "silent MI", "angina")
+#This for loop looks at each row, determines the type of CHD the patient has, increments the count for that type, and adds the patient's weight to the corresponding running sum.
+for(i in 1:nrow(wcgs)){
+ if(wcgs$typchd69[i] == "no CHD"){
+ num_type[1] <- num_type[1] + 1
+ sum_weight[1] <- sum_weight[1] + wcgs$weight[i]
+ }
+ else if(wcgs$typchd69[i] == "MI or SD"){
+ num_type[2] <- num_type[2] + 1
+ sum_weight[2] <- sum_weight[2] + wcgs$weight[i]
+ }
+ else if(wcgs$typchd69[i] == "silent MI"){
+ num_type[3] <- num_type[3] + 1
+ sum_weight[3] <- sum_weight[3] + wcgs$weight[i]
+ }
+ else if(wcgs$typchd69[i] == "angina"){
+ num_type[4] <- num_type[4] + 1
+ sum_weight[4] <- sum_weight[4] + wcgs$weight[i]
+ }
+}
+#Once the loop is finished, the sum of the weights is divided by the number of people with each type of CHD, giving the mean weight of the subjects separated by the type of CHD they have.
+mean_by_type <- sum_weight/num_type
+names(mean_by_type) <- c("no CHD", "MI or SD", "silent MI", "angina")
+mean_by_type
+```
B. Now do the same thing, but now don’t use a loop
+```{r}
+#Now, using dplyr functions and piping, we group by the type of CHD and summarise by the mean weight.
+wcgs %>%
+ group_by(typchd69) %>%
+ summarise(mean_weight = mean(weight))
+```
diff --git a/Week 06/README.md b/Week 06/README.md
new file mode 100644
index 0000000..2f8715c
--- /dev/null
+++ b/Week 06/README.md
@@ -0,0 +1,27 @@
+
+# Functions Pre-Class Work
+
+Please complete the following work.
+
+
+### Objectives
+
+1. Gain further practice on functions
+
+
+
+
+## Required Reading
+
+
+You should read all of these:
+
+- [R For Data Science: Functions Chapter](http://r4ds.had.co.nz/functions.html)
+- [Functional Programming - Advanced R](http://adv-r.had.co.nz/Functional-programming.html)
+- [Functionals](http://adv-r.had.co.nz/Functionals.html)
+- [Function Operators](http://adv-r.had.co.nz/Function-operators.html)
+
+
+
+
+Then proceed to the Pre Week 06 RMarkdown file, complete it, and commit your work often to begin learning how to make small changes and commit them as you go.
diff --git a/Week 06/pre-class-06.Rmd b/Week 06/pre-class-06.Rmd
new file mode 100644
index 0000000..72c0c84
--- /dev/null
+++ b/Week 06/pre-class-06.Rmd
@@ -0,0 +1,70 @@
+# pre-class
+
+
+Make sure you commit this often with meaningful messages.
+
+
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+
+1. Read the source code for each of the following three functions, puzzle out what they do, and then brainstorm better names.
+
+```
+f1 <- function(string, prefix) {
+ substr(string, 1, nchar(prefix)) == prefix
+}
+```
+*This function takes the inputs "string" and "prefix" and checks whether "prefix" is a prefix of "string"; e.g. "ab" is a prefix of "abc", while "e" is not. A better name for the function might be "check_prefix()".*
+```
+f2 <- function(x) {
+ if (length(x) <= 1) return(NULL)
+ x[-length(x)]
+}
+```
+*This function takes a vector and checks whether its length is greater than 1; if it is not, the function returns NULL. Otherwise it returns the vector with its last element removed. A good name might be "remove_last()".*
+
+```
+f3 <- function(x, y) {
+ rep(y, length.out = length(x))
+}
+```
+*This function takes two input vectors x and y and repeats the vector y up to the length of x. For example, if y = 1:5 and x = 1:20, the function returns 1:5 repeated 4 times; if y = 1:7 and x = 1:20, it returns 1:7 repeated twice followed by 1:6, since the length of x is 20. A better name might be "repeat_y_with_length_x()".*
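+
+*As a quick sanity check, the snippet below (a sketch, assuming f1, f2 and f3 have been defined as above) reproduces the behaviour just described:*
+```
+f1("abc", "ab")   # TRUE:  "ab" is a prefix of "abc"
+f1("abc", "e")    # FALSE: "e" is not a prefix of "abc"
+f2(1:5)           # 1 2 3 4 (last element removed)
+f3(1:20, 1:5)     # 1:5 recycled out to length 20
+```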
+
+2. Compare and contrast rnorm() and MASS::mvrnorm(). How could you make them more consistent?
+
+*Firstly, the MASS::mvrnorm() function produces samples from a multivariate normal distribution, whereas the rnorm() function only simulates from the univariate normal distribution. If the MASS::mvrnorm() function is given single values rather than vectors, it should produce similar results to the rnorm() function.*
+
+*The function rnorm() has no default value for n, whereas mvrnorm() has default value n = 1. On the other hand, rnorm() has default "mean = 0", whereas mvrnorm() has no default value for the mean "mu"; similarly rnorm() has default "sd = 1", whereas mvrnorm() has no default for "Sigma" (which is a covariance matrix rather than a standard deviation). In other words, rnorm() defaults to the standard normal distribution, whereas mvrnorm() needs to have the mean and covariance specified. Finally, mvrnorm() returns a matrix whereas rnorm() returns a vector of values.*
+
+*It makes sense for mvrnorm() to keep returning a matrix so that it can also handle samples from a genuinely multivariate distribution. The best way to make these two functions more consistent would be to give both functions the same argument names and default values, e.g. n = 1, mean = 0, sd = 1.*
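+
+*A minimal sketch of this comparison (assuming the recommended MASS package is available): with a scalar mu and Sigma, mvrnorm() draws from the same univariate normal as rnorm(), remembering that Sigma is a variance rather than a standard deviation.*
+```{r}
+#Both calls sample from a normal distribution with mean 5 and standard deviation 2;
+#note that mvrnorm() takes the variance (2^2), while rnorm() takes the sd.
+x1 <- rnorm(10000, mean = 5, sd = 2)
+x2 <- MASS::mvrnorm(n = 10000, mu = 5, Sigma = 2^2)
+c(mean(x1), sd(x1))
+c(mean(x2), sd(x2))
+```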
+
+3. Use `lapply()` and an anonymous function to find the coefficient of variation (the standard deviation divided by the mean) for all columns in the mtcars dataset.
+```{r}
+#mtcars is built into R; this assignment just makes a local copy of the data set.
+mtcars <- mtcars
+#Using lapply with an anonymous function that divides the standard deviation by the mean gives the coefficient of variation for each column, since a data frame is treated as a list of its columns, so the anonymous function is applied to each column of mtcars.
+lapply(mtcars, function(x) sd(x)/mean(x))
+```
+4. Use vapply() to:
+ a. Compute the standard deviation of every column in a numeric data frame.
+```{r}
+#This function uses vapply to calculate the standard deviation of each column, with FUN.VALUE = numeric(1) to ensure that each result is a single number and the output is a numeric vector.
+column_sd <- function(df){
+ vapply(df, sd, FUN.VALUE = numeric(1))
+}
+```
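+A quick check of the function (mtcars is built into R and contains only numeric columns):
+```{r}
+#Each column of mtcars should get a single standard deviation.
+column_sd(mtcars)
+```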
+ b. Compute the standard deviation of every numeric column in a mixed data frame. (Hint: you’ll need to use vapply() twice.)
+```{r}
+#This function works similarly to the one in part a, but first uses vapply to check which columns are numeric, then returns the standard deviation of only those columns; vapply is therefore used twice, as the hint suggests.
+num_column_sd <- function(df){
+  num_columns <- vapply(df, is.numeric, logical(1))
+  vapply(df[num_columns], sd, numeric(1))
+}
+```
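+A quick check on a mixed data frame (iris is built into R and its Species column is a factor):
+```{r}
+#Only the four numeric columns of iris should get a standard deviation.
+num_column_sd(iris)
+```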
diff --git a/Week 07/README.md b/Week 07/README.md
new file mode 100644
index 0000000..d0ded87
--- /dev/null
+++ b/Week 07/README.md
@@ -0,0 +1,23 @@
+# Simulating Gamblers Ruin
+
+## Description of Problem
+
+[Wikipedia]() describes the Gambler's Ruin as follows:
+
+The term gambler's ruin is a statistical concept expressed in a variety of forms:
+
+- The original meaning is that a persistent gambler who raises his bet to a fixed fraction of bankroll when he wins, but does not reduce it when he loses, will eventually and inevitably go broke, even if he has a positive expected value on each bet.
+- Another common meaning is that a persistent gambler with finite wealth, playing a fair game (that is, each bet has expected value zero to both sides) will eventually and inevitably go broke against an opponent with infinite wealth. Such a situation can be modeled by a random walk on the real number line. In that context it is provable that the agent will return to his point of origin or go broke and is ruined an infinite number of times if the random walk continues forever.
+- The result above is a corollary of a general theorem by Christiaan Huygens which is also known as gambler's ruin. That theorem shows how to compute the probability of each player winning a series of bets that continues until one's entire initial stake is lost, given the initial stakes of the two players and the constant probability of winning. This is the oldest mathematical idea that goes by the name gambler's ruin, but not the first idea to which the name was applied.
+- The most common use of the term today is that a gambler playing a negative expected value game will eventually go broke, regardless of betting system. This is another corollary to Huygens' result.
+- The concept may be stated as an ironic paradox: Persistently taking beneficial chances is never beneficial at the end. This paradoxical form of gambler's ruin should not be confused with the gambler's fallacy, a different concept.
+
+The concept has specific relevance for gamblers; however it also leads to mathematical theorems with wide application and many related results in probability and statistics. Huygens' result in particular led to important advances in the mathematical theory of probability.
+
+
+
+
+## This project
+
+
+Many times an advantage of computer science is that even without knowing how to accomplish a math problem we can actually simulate the result to as much precision as we would like. We will work through simulating the answer to this problem in this project.
diff --git a/Week 07/pre-class-07.Rmd b/Week 07/pre-class-07.Rmd
new file mode 100644
index 0000000..fa4997d
--- /dev/null
+++ b/Week 07/pre-class-07.Rmd
@@ -0,0 +1,44 @@
+---
+title: "Simulations Pre-Class Project"
+date: "Due March 13, 2017 at 5:00pm"
+output:
+ html_document
+
+
+---
+
+
+```{r,setup, echo=FALSE, cache=TRUE}
+## numbers >= 10^5 will be denoted in scientific notation,
+## and rounded to 2 digits
+options(scipen = 3, digits = 3)
+```
+
+
+
+
+# Project Goals:
+
+
+With this project we will simulate a famous probability problem. This will not require knowledge of probability or statistics, only the logic to follow the steps needed to simulate the problem. This is one way to use the computer to solve problems.
+
+ 1. **Gambler's Ruin**: Suppose you have a bankroll of $1000 and make bets of $100 on a fair game. By simulating the outcome directly for at most 5000 iterations of the game (or hands), estimate:
+ a. the probability that you have "busted" (lost all your money) by the time you have placed your one hundredth bet.
+ b. the probability that you have busted by the time you have placed your five hundredth bet by simulating the outcome directly.
+ c. the mean time you go bust, given that you go bust within the first 5000 hands.
+ d. the mean and variance of your bankroll after 100 hands (including busts).
+ e. the mean and variance of your bankroll after 500 hands (including busts).
+
+Note: you *must* stop playing if your player has gone bust. How will you handle this in the `for` loop?
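+
+One possible pattern (just a sketch, not the full solution) is to check the bankroll inside the loop and `break` as soon as it reaches zero:
+```{r, eval=FALSE}
+#Sketch: simulate fair $100 bets and stop the hand-by-hand loop once the
+#bankroll hits zero (the player has gone bust).
+bankroll <- 1000
+for (hand in 1:5000) {
+  bankroll <- bankroll + sample(c(-100, 100), 1)
+  if (bankroll <= 0) break
+}
+```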
+
+2. **Markov Chains**. Suppose you have a game where the probability of winning on your first hand is 48%; each time you win, that probability goes up by one percentage point for the next game (to a maximum of 100%, where it must stay), and each time you lose, it goes back down to 48%. Assume you cannot go bust and that the size of your wager is a constant $100. (The update rule is sketched in the chunk after part c.)
+ a. Is this a fair game? Simulate one hundred thousand sequential hands to determine the size of your return. Then repeat this simulation 99 more times to get a range of values to calculate the expectation.
+ b. Repeat this process but change the starting probability to a new value within 2% either way. Get the expected return after 100 repetitions. Keep exploring until you have a return value that is as fair as you can make it. Can you do this automatically?
+ c. Repeat again, keeping the initial probability at 48%, but this time change the probability increment to a value different from 1%. Get the expected return after 100 repetitions. Keep changing this value until you have a return value that is as fair as you can make it.
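+
+A sketch of the win-probability update rule described in question 2 (assumed: +1 percentage point after a win, capped at 100%, reset to 48% after a loss):
+```{r, eval=FALSE}
+#Sketch of the probability update only, not the full simulation.
+p <- 0.48
+for (hand in 1:10) {
+  win <- runif(1) < p
+  p <- if (win) min(p + 0.01, 1) else 0.48
+}
+```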
diff --git a/Week 09/README.md b/Week 09/README.md
new file mode 100644
index 0000000..a4a5111
--- /dev/null
+++ b/Week 09/README.md
@@ -0,0 +1,27 @@
+# This Project
+
+## Flipped Material
+
+- Sign into [Datacamp](https://www.datacamp.com/)
+- Complete [Working with Web Data in R](https://campus.datacamp.com/courses/working-with-web-data-in-r/downloading-files-and-using-api-clients?ex=1)
+- Complete [Webscraping in R from PHP 2560](https://campus.datacamp.com/courses/php-15602560-statistical-programming-in-r).
+
+
+## Exercises
+
+1. Read the HTML content of the following URL with a variable called webpage: https://money.cnn.com/data/us_markets/ At this point, it will also be useful to open this web page in your browser.
+2. Get the session details (status, type, size) of the above mentioned URL.
+3. Extract all of the sector names from the “Stock Sectors” table (bottom left of the web page.)
+4. Extract all of the “3 Month % Change” values from the “Stock Sectors” table.
+5. Extract the table “What’s Moving” (top middle of the web page) into a data-frame.
+6. Re-construct all of the links from the first column of the “What’s Moving” table.
+Hint: the base URL is “https://money.cnn.com”
+7. Extract the titles under the “Latest News” section (bottom middle of the web page.)
+8. To understand the structure of the data in a web page, it is often useful to know what the underlying attributes are of the text you see.
+Extract the attributes (and their values) of the HTML element that holds the timestamp underneath the “What’s Moving” table.
+9. Extract the values of the blue percentage-bars from the “Trending Tickers” table (bottom right of the web page.)
+Hint: in this case, the values are stored under the “class” attribute.
+10. Get the links of all of the “svg” images on the web page.
diff --git a/Week 09/pre-class-09.Rmd b/Week 09/pre-class-09.Rmd
new file mode 100644
index 0000000..5383cea
--- /dev/null
+++ b/Week 09/pre-class-09.Rmd
@@ -0,0 +1,89 @@
+---
+title: "Basic Webscraping"
+---
+
+
+```{r,setup, echo=FALSE, cache=TRUE}
+## numbers >= 10^5 will be denoted in scientific notation,
+## and rounded to 2 digits
+options(scipen = 3, digits = 3)
+```
+
+
+
+## Exercises
+
+1. Read the HTML content of the following URL with a variable called webpage: https://money.cnn.com/data/us_markets/ At this point, it will also be useful to open this web page in your browser.
+```{r}
+#After installing the 'rvest' package, we can use the read_html function on the given URL to save the page content.
+
+library(rvest)
+url <- "https://money.cnn.com/data/us_markets/"
+webpage <- read_html(url)
+```
+2. Get the session details (status, type, size) of the above mentioned URL.
+```{r}
+#Running the html_session function on the given URL will give the session details.
+html_session(url)
+```
+3. Extract all of the sector names from the “Stock Sectors” table (bottom left of the web page.)
+```{r}
+#Running html_nodes on the webpage with the "table" selector and then html_table on those nodes gives a list of data frames containing the tables on the page.
+tables <- html_table(html_nodes(webpage, "table"))
+#Three tables are then saved from the page. The second table is the "Stock Sectors" table, so we can then select that one.
+Stock_Sectors <- tables[[2]]
+#The first column of the Stock_Sectors table contains the names.
+Stock_Sectors[,1]
+```
+4. Extract all of the “3 Month % Change” values from the “Stock Sectors” table.
+```{r}
+#The second column of the Stock_Sectors table contains the "3 Month % Change" values.
+Stock_Sectors[,2]
+```
+5. Extract the table “What’s Moving” (top middle of the web page) into a data-frame.
+```{r}
+#We saved the "What's Moving" table in the "tables" list of data frames above. It is the first table in the list.
+Whats_Moving <- tables[[1]]
+Whats_Moving
+```
+6. Re-construct all of the links from the first column of the “What’s Moving” table.
+Hint: the base URL is “https://money.cnn.com”
+```{r}
+#I used SelectorGadget to pick out the appropriate CSS selector - in this case "tr .wsod_symbol" - and the "href" attribute combined with the base URL recreates the links.
+url_suffixes <- html_attr(html_nodes(webpage, css = "tr .wsod_symbol"), "href")
+paste("https://money.cnn.com", url_suffixes, sep = "")
+```
+7. Extract the titles under the “Latest News” section (bottom middle of the web page.)
+```{r}
+#This bit of code extracts the nodes from the "Latest News" section, and the html_text function extracts the titles.
+html_text(html_nodes(webpage, css = ".HeadlineList a"))
+```
+8. To understand the structure of the data in a web page, it is often useful to know what the underlying attributes are of the text you see.
+Extract the attributes (and their values) of the HTML element that holds the timestamp underneath the “What’s Moving” table.
+```{r}
+#The html_attrs function gives the attributes (and their values) of the HTML element, which is selected using the SelectorGadget app to find the appropriate CSS selector.
+html_attrs(html_node(webpage, css = ".wsod_disclaimer span"))
+
+```
+9. Extract the values of the blue percentage-bars from the “Trending Tickers” table (bottom right of the web page.)
+Hint: in this case, the values are stored under the “class” attribute.
+```{r}
+#This time, we want the "class" attribute of the nodes selected by ".bars" (found using SelectorGadget). This returns a vector of strings of the form "bars pctX", where X is the percentage value of the bar. Removing the "bars pct" prefix leaves just the numeric values.
+values <- html_attr(html_nodes(webpage, ".bars"), "class")
+as.numeric(gsub("bars pct", "", values))
+```
+10. Get the links of all of the “svg” images on the web page.
+
+```{r}
+#We use html_nodes to select the image nodes, and html_attr to collect their "src" URL paths. The images with a .svg file type are then selected and their full URLs reconstructed.
+images <- html_attr(html_nodes(webpage, "img"), "src")
+svg_images <- images[grep("svg", images)]
+paste("https://money.cnn.com", svg_images, sep = "")
+```
\ No newline at end of file