# Exploratory Data Analysis

What visualizations did you use to initially look at your data? What insights did you gain? How did these insights inform your design?

## Factor Effects

To detail the factor effects, we used two approaches. Overall effects came from the scikit-learn [Random Forest](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) classifier, specifically its `feature_importances_` attribute. To extract effects per school, we fit a generalized linear mixed model (a logistic regression with a random intercept and slope for each school) using the following R code:

```
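library(lmerTest)  # attaches lme4, which provides glmer()
library(RJSONIO)

## This is just one example of how a mixed model may be used to
## investigate college-by-college differences.

datanormed <- read.csv("../client/data/collegedata_normalized.csv")

factors <- c("canAfford", "female", "MinorityGender", "MinorityRace",
             "international", "sports", "earlyAppl")
effects <- list()

for (i in factors) {
  ## Logistic mixed model: fixed effect of the factor plus a random
  ## intercept and random slope for each school (name).
  f <- paste("acceptStatus ~", i, "+ (1 +", i, "| name)")
  print(f)
  model <- glmer(as.formula(f), family = "binomial", data = datanormed)

  ## coef() combines fixed and random effects; column 2 is the
  ## per-school slope for the current factor. Sizing the data frame
  ## from the model avoids hard-coding the number of schools.
  schoolCoefs <- coef(model)$name
  effects[[i]] <- data.frame(names = row.names(schoolCoefs),
                             vals = schoolCoefs[, 2],
                             stringsAsFactors = FALSE)
}

outJS <- toJSON(effects)
write(outJS, "../client/data/factors.json")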
```
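
The overall effects from the Random Forest step could be extracted along the following lines. This is only a minimal sketch, not the project's actual script: the helper name `rank_factor_importances`, the `n_estimators` and `seed` defaults, and the use of the same seven factor names as classifier inputs are all assumptions made for illustration, and the demo runs on synthetic data rather than the real CSV.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Factor names mirror the R script above; treating them as the classifier's
# input columns is an assumption for this sketch.
FACTORS = ["canAfford", "female", "MinorityGender", "MinorityRace",
           "international", "sports", "earlyAppl"]


def rank_factor_importances(X, y, names=FACTORS, n_estimators=200, seed=0):
    """Fit a random forest and return (name, importance) pairs, largest first.

    X is an (n_samples, n_factors) array and y the 0/1 acceptance outcome.
    n_estimators and seed are illustrative defaults, not the project's settings.
    """
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    # feature_importances_ holds one non-negative weight per input column,
    # normalized so the weights sum to 1.
    return sorted(zip(names, forest.feature_importances_),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Synthetic stand-in data: only the first column drives the outcome.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, len(FACTORS)))
    y = X[:, 0]
    for name, weight in rank_factor_importances(X, y):
        print(f"{name}: {weight:.3f}")
```

With the real data, `X` would presumably be the factor columns of `collegedata_normalized.csv` and `y` the `acceptStatus` column. Note that these importances describe the whole dataset, which is why the per-school effects needed the separate mixed-model analysis.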

We sketched several ways of comparing factor effects, including bar charts, maps, and treemaps, and ultimately decided that a heatmap offered the best contrast and placed the least interpretive load on our audience.

## Predictions

For the predictions, we started with a line graph so that an applicant could easily see their relative standing across the different schools. After only a little user testing, we switched to a more intuitive scatterplot.