cohortMLAssignment.Rmd

---
title: "Machine Learning on Your Cohort"
author: "Ted Laderas (and you!)"
date: "May 16, 2017"
output: html_document
---

In this assignment, you will run multiple machine learning algorithms on your cohort to see if you can improve on the regular logistic regression process. You will use this Markdown document to record your observations and code.

The first step is to load the data (in the `/data/` folder). Since we are training first, we'll load our training data here.

```{r}
##Write your data loading code here
##Do some EDA as well and show any relevant plots
library(caret)
library(dplyr)
library(rpart)
library(ranger)
library(e1071)
library(data.table)
trainDataFull <- fread("data/fullDataTrainSet.csv", stringsAsFactors = TRUE)
testDataFull <- fread("data/fullDataTestSet.csv", stringsAsFactors = TRUE)

##put your cohort selection filter here
trainDataFullCohort <- trainDataFull %>% filter(age > 55)
testDataFullCohort <- testDataFull %>% filter(age > 55)


##Some questions to think about: What is the distribution of cvd in the training set and the test set? Are there any concerns?
```

## Initial Model

Specify your model (outcome variable and predictor variables). Think very carefully about whether variables have a relationship with each other before you add them to the model.

Now you can train your learner on the training data. What is the class that we're training on? How do we do this? Adapt the code that we used on the Iris dataset here. 

```{r}
##Write your training code here.
```

## Initial Model Predictions

Now predict on the test set.

```{r}
##Write your lda code here.
```

## Which method did the best?

Compare accuracies if you are running multiple methods here.

```{r}
##write any plotting code here.

```

## Get Confused

Look at the confusion matrix here and compare the kinds of mistakes each method makes here.

Evaluate your learner on the test set. How well did it do? Does the test data give you any confidence that you built a good learner? What is your interpretation of the results? Do you think the problem is an easy or hard one? How did the learner do?

```{r}
##write your confusion matrix code here.
```

## Score Your Learners and Cohort

Add your findings to the Google Sheet here: http://bit.ly/2qOmwSl