# CROSS VALIDATION

Cross validation = splitting the data into training and testing data:
- Gives an estimate of performance on an independent dataset
- Serves as a check on overfitting

We want the model to generalize.  Generalization is being able to predict accurately not only on the data we trained on but also on new data we haven't seen before.  This held-out data is called 'test data', but if we are using it to choose hyper-parameters or a model we call it 'validation data'.

We usually split the data into train/test sets to get an idea of how well a model will generalize.


## Train/Test Split
Train/Test Split >>>PCA >>>SVM  
Training, Transforms, Predicting  

**Train/Test Split**   
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


**Train** 

PCA
- pca.fit(training_features) # only finding the PC's   
- pca.transform(training_features) # transforming data into the PC representation  

SVC
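- svc.fit(pca.transform(training_features), training_labels)  # sklearn's SVC is trained with fit(), on the PCA-transformed features plus their labels  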


**Test**   

PCA  
- **NO PCA.FIT FOR TEST FEATURES**: Use the SAME PC's found in the training features  
- pca.transform(test_features)   # transform test data into PC representation using same PC's found in training data    

SVC
- svc.predict(pca.transform(test_features))  # predict on the PCA-transformed test features  
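
The Train/Test Split >>> PCA >>> SVM flow above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation: the iris dataset, `n_components=2`, and the default `SVC()` settings are assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: iris stands in for whatever dataset the notes had in mind.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Train: find the PCs on the training features only, then transform them.
pca = PCA(n_components=2)  # assumption: 2 components, for illustration
X_train_pca = pca.fit_transform(X_train)

# Test: NO fit here, reuse the PCs found on the training features.
X_test_pca = pca.transform(X_test)

# Fit the classifier on the transformed training data, predict on transformed test data.
svc = SVC()
svc.fit(X_train_pca, y_train)
predictions = svc.predict(X_test_pca)
accuracy = svc.score(X_test_pca, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```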


### Problem with Train/Test Split
The problem with a single train/test split is a tradeoff: every sample held out for testing is a sample the model cannot learn from, so you cannot maximize both the training set and the test set.

Variance Problem: When you test the model on one test set you get one accuracy, but if you test it again on another test set you can get a very different accuracy.  So accuracy on a single test set is not a reliable way to evaluate model performance.    
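
One way to see the variance problem is to repeat the split with different random seeds and compare the resulting accuracies. The sketch below assumes the iris dataset and a default `SVC`; how much the scores spread depends on the data and the model, so no particular numbers are implied.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset

scores = []
for seed in range(10):
    # Same data, same model, only the random train/test partition changes.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed
    )
    model = SVC().fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print("min:", min(scores), "max:", max(scores), "std:", np.std(scores))
```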


## K-FOLD CV

Cross validation for evaluating algorithm performance  
Partition dataset into k bins of equal size  
Run k separate learning experiments  
- pick one of those k bins as your testing set  
- remaining k-1 bins are put together into the training set
- test on testing set 
- Repeat for each of the k bins and average the k testing set performances   

The idea is to train and evaluate the model using different train/test split combinations (cross-validation) within the training data, and tune hyper-parameters accordingly until you get an acceptable mean accuracy.  Then apply the best model to the test set to see how it performs on 'general' data.  

### **K-Fold Cross Validation Overview**
- Split data into K parts (typical values for K = 5, 8, 10)  
- Loop K times  
- In each iteration, take 1 part out (use it for validation), use the rest for training  
- Returns K different scores (accuracies)    
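
The loop described above can be written out by hand with sklearn's `KFold` splitter. This is a sketch under assumptions: iris as the dataset, a default `SVC` as the model, and K = 5 with shuffling.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset
K = 5
kf = KFold(n_splits=K, shuffle=True, random_state=42)

fold_scores = []
for train_idx, val_idx in kf.split(X):
    # Take 1 part out for validation, train on the remaining K-1 parts.
    model = SVC()
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))

# K different accuracies, plus their mean.
print(fold_scores, np.mean(fold_scores))
```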

### **10-Fold Cross Validation Example**
- Splitting the training set into 10 folds (most of the time k = 10)  
- Train model on 9 folds and test it on the last remaining fold  
- With each iteration can use different combinations of the 9 training folds and 1 test fold  
- Can train and test the model on 10 combinations of training and test sets  
- Take an average of the different accuracies of the 10 evaluations   
- and also compute the standard deviation to look at the variance to get a much better idea of model performance  
    
### **Simpler Variation of K-Folds Cross Validation**  
- Instead of evaluating the model on one train/test split, evaluate it on multiple train/test split combinations and take the overall mean accuracy.  
- Pass in the model and all of your data  
- Runs the model 5 times using 5 different train/test split combinations and returns the accuracy for each individual result (fold)  
- Average the accuracies together to get an overall performance metric  
- This evaluates the model against the entire dataset split up 5 different ways and gives back the individual results  
- In practice, you need to try different variations of your model and measure the mean accuracy using K-Fold Cross Validation until you find a sweet spot  

### **K-Fold Cross Validation in sklearn**  
from sklearn.model_selection import cross_val_score  

#Get the 10 accuracies for each one of the 10 combinations that will be created through k-fold cross validation  
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)   # most common choice is 10 folds  
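
The snippet above assumes `classifier`, `X_train`, and `y_train` already exist. A self-contained sketch, assuming iris data and a default `SVC` as the classifier, that also computes the mean and standard deviation mentioned in the 10-fold example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

classifier = SVC()  # assumption: any sklearn estimator would work here

# One accuracy per fold, computed within the training set only.
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)

mean_accuracy = accuracies.mean()  # overall estimate of performance
std_accuracy = accuracies.std()    # spread across folds (the variance check)
print(f"{mean_accuracy:.3f} +/- {std_accuracy:.3f}")
```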

---


# IMPROVING PERFORMANCE 

## PARAMETER TUNING
Two types of parameters:    

### **1. Parameters that the model learned**   

### **2. Hyper-parameters**  
Parameters that we choose ourselves   
Find optimal values of these parameters (Grid Search)  
A popular method for choosing hyper-parameters is k-fold cross validation  

### GridSearchCV 
Cross validation for parameter tuning.  
Way of systematically working through multiple combinations of hyper-parameter values, cross-validating as it goes, to determine which combination gives the best performance.  

**GridSearchCV in sklearn**  
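from sklearn import datasets, svm    
from sklearn.model_selection import GridSearchCV    
iris = datasets.load_iris()    
parameters = {'kernel':('linear','rbf'),'C':[1,10]}    
svc = svm.SVC()    
clf = GridSearchCV(svc, parameters)   # cross-validates every kernel/C combination    
clf.fit(iris.data, iris.target)    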
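
A fuller sketch of the same grid search, showing how the tuned model might then be checked on a held-out test set. The split sizes, `cv=5`, and the variable names `search` and `test_accuracy` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

parameters = {"kernel": ("linear", "rbf"), "C": [1, 10]}

# Cross-validate every combination (2 kernels x 2 C values) on the training set.
search = GridSearchCV(SVC(), parameters, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", search.best_score_)

# By default the best combination is refit on all of X_train,
# so the search object can be scored directly on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```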


## **XG BOOST**  
A popular, highly optimized implementation of gradient boosting, known for strong model performance and execution speed  
1.	High performance  
2.	Fast execution speed  
3.	Keeps the interpretation of the problem and the model (tree-based, so no feature scaling is needed)  
 

## **ENSEMBLE LEARNING**  
Using different models to try to solve the same problem and let them vote on the results (Random Forests)  

Ensembling techniques take a number of weak learners (classifiers/regressors that are barely better than guessing) and combine them (through averaging or majority vote) to create a strong learner that can make accurate predictions  
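
The "let them vote" idea can be sketched with sklearn's `VotingClassifier`. The three base models below are assumptions picked for illustration (and they are not especially weak learners); `voting="hard"` means each model casts one vote and the majority class wins.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

voter = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote over the predicted class labels
)
voter.fit(X_train, y_train)
vote_accuracy = voter.score(X_test, y_test)
print("voting ensemble accuracy:", vote_accuracy)
```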

### **1. Bagging (Bootstrap aggregating)**  
- Take random subsets (bootstrap samples) of the training data and feed them into different versions of the same model and let them all vote on the final result  
- Bootstrapping is a type of resampling where large numbers of smaller samples of the same size are repeatedly drawn, with replacement, from a single original sample  
- Random forest uses bagging to implement ensemble learning  
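
A minimal sketch of bagging in sklearn, assuming iris data and decision trees as the repeated model. `BaggingClassifier` trains each copy on a bootstrap sample; `RandomForestClassifier` is included for comparison since it builds on the same idea.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset

# 50 decision trees, each fit on a bootstrap sample (drawn with replacement).
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    random_state=42,
)

# Random forest: bagged trees plus random feature selection at each split.
forest = RandomForestClassifier(n_estimators=50, random_state=42)

bagging_scores = cross_val_score(bagging, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)
print("bagging:", bagging_scores.mean(), "random forest:", forest_scores.mean())
```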

### **2. Boosting**   
- Alternative technique where models are trained in sequence and each subsequent model focuses on the data misclassified by the previous model   
- Instances misclassified by the previous models are boosted (given more weight) so that subsequent models give more focus to them during training  
- Keep refining the model based on the weaknesses of the previous one  
- Each learner is trained on all the data, but with these instance weights applied  
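
AdaBoost is one boosting algorithm that follows this reweighting scheme. The sketch below assumes iris data and sklearn's `AdaBoostClassifier` with its default base learner (a depth-1 decision tree); it is one example of boosting, not the only form.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset

# Learners are fit one after another; after each round, misclassified
# instances get more weight so the next learner focuses on them.
booster = AdaBoostClassifier(n_estimators=50, random_state=42)

boost_scores = cross_val_score(booster, X, y, cv=5)
print("AdaBoost mean accuracy:", boost_scores.mean())
```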

### **3. A Bucket of Models:**  
- Trains several different models using the training data and picks the one that works best on held-out test data  
- Take entirely different models (for example: Kmeans, decision tree, and regression), run all of them on the same set of training data, and compare their results  
- Pick the model that wins    
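
A sketch of the bucket-of-models idea, under stated assumptions: the candidates are a decision tree, logistic regression, and k-nearest neighbors (swapped in here because K-means is a clustering algorithm and does not predict class labels directly), and the winner is chosen by cross-validation on the training set so the test set stays untouched until the end.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Hypothetical bucket of candidate models.
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
}

# Score every candidate the same way, then pick the winner.
cv_means = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(cv_means, key=cv_means.get)

# Refit the winner on the full training set and check it on the test set.
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print(best_name, cv_means[best_name], test_accuracy)
```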

### **4. Stacking**  
- Runs multiple models at once on the data and combines the results of all those models, typically by training a final model on their predictions, to arrive at a final result  
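
Stacking can be sketched with sklearn's `StackingClassifier` (available in recent sklearn versions). The base models and the logistic regression used as the final combiner are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # assumption: illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    # The final estimator learns how to combine the base models' predictions.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
print("stacking accuracy:", stack_accuracy)
```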