# Feature Selection Lab

In this lab we will explore feature selection on the [Titanic Dataset](https://www.kaggle.com/c/titanic/data).

We encourage you to perform exploratory data analysis (EDA) before building a logistic regression model that predicts whether a given passenger survived.

You'll then experiment with various feature selection techniques to improve your model's performance. You'll need the scikit-learn documentation: http://scikit-learn.org/stable/modules/feature_selection.html

```python
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
```


## 1. Import the data and EDA

We'll be working with the Titanic dataset. Go ahead and import it from the "assets" folder. While you're at it, take some time to do EDA and see what the data looks like!
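
As a starting point for EDA, here is a minimal sketch of three common checks: missing values per column, the target rate by group, and one-hot encoding of a categorical column. The column names follow the Kaggle Titanic data, but the rows in `toy` are made up for illustration; swap in the DataFrame you loaded from the "assets" folder.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are NOT real Titanic records.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
})

# How many values are missing in each column?
missing_per_column = toy.isnull().sum()

# Mean of the 0/1 target per group gives the survival rate for that group.
survival_by_sex = toy.groupby("Sex")["Survived"].mean()

# Logistic regression needs numeric inputs, so encode categorical columns.
toy_encoded = pd.get_dummies(toy, columns=["Sex"], drop_first=True)

print(missing_per_column)
print(survival_by_sex)
print(toy_encoded.columns.tolist())
```

Missing values (for example in `Age`) need to be imputed or dropped before fitting, because scikit-learn's `LogisticRegression` does not accept NaNs.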

## 2. Feature selection

Let's use the `SelectKBest` class in scikit-learn to find the top 5 features.

- What are the top 5 features for `Xt`?

=> store them in a variable called `kbest_columns`
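
To show the mechanics of `SelectKBest`, here is a small sketch on synthetic data. The data, the column names `f0` to `f7`, and the choice of `f_classif` as the scoring function are assumptions for illustration; on the lab data you would pass your own feature matrix and target instead.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the feature matrix and target (not the Titanic data).
X_arr, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Score each feature against the target with an ANOVA F-test and keep the best 5.
selector = SelectKBest(score_func=f_classif, k=5)
selector.fit(X_demo, y_demo)

# get_support() returns a boolean mask over the input columns.
kbest_demo_columns = X_demo.columns[selector.get_support()].tolist()

# The raw scores are also available for inspection.
kbest_scores = pd.Series(selector.scores_, index=X_demo.columns).sort_values(
    ascending=False
)
print(kbest_demo_columns)
print(kbest_scores)
```

`SelectKBest` scores each feature on its own, so it can keep several features that carry the same information.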

## 3. Recursive Feature Elimination

scikit-learn also offers recursive feature elimination with cross-validation as a class named `RFECV`. Use it in combination with a logistic regression model to see which features would be kept with this method.

=> store them in a variable called `rfecv_columns`
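
The following sketch shows one way to wire `RFECV` to a logistic regression. The synthetic data, the column names, and the settings (`step=1`, `cv=5`, accuracy scoring) are assumptions chosen for illustration, not values prescribed by the lab.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the feature matrix and target (not the Titanic data).
X_arr, y_rfe = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_rfe = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# RFECV repeatedly drops the weakest feature (by coefficient magnitude) and
# uses cross-validation to decide how many features to keep.
rfecv = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=5,
    scoring="accuracy",
)
rfecv.fit(X_rfe, y_rfe)

# support_ is a boolean mask of the retained columns.
rfecv_demo_columns = X_rfe.columns[rfecv.support_].tolist()
print(rfecv.n_features_, rfecv_demo_columns)
```

Unlike `SelectKBest`, the number of retained features is chosen by cross-validation rather than fixed in advance.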

## 4. Logistic regression coefficients

Let's see whether the logistic regression coefficients agree with the selections above.

- Create a logistic regression model
- Perform grid search over the penalty type and the inverse regularization strength `C` in order to find the best parameters
- Sort the logistic regression coefficients by absolute value. Do the top 5 correspond to those above?

=> choose which ones you would keep and store them in a variable called `lr_columns`
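
The steps above can be sketched as follows. The synthetic data, the parameter grid values, and the `liblinear` solver (picked because it supports both `l1` and `l2` penalties) are assumptions for illustration; choose a grid that suits your data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the feature matrix and target (not the Titanic data).
X_arr, y_gs = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_gs = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Illustrative grid: smaller C means stronger regularization.
param_grid = {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid, cv=5)
grid.fit(X_gs, y_gs)

best_lr = grid.best_estimator_

# For binary classification coef_ has shape (1, n_features).
coefs = pd.Series(best_lr.coef_[0], index=X_gs.columns)

# Rank features by absolute coefficient size and take the top 5.
top5_by_abs_coef = coefs.abs().sort_values(ascending=False).head(5).index.tolist()
print(grid.best_params_)
print(top5_by_abs_coef)
```

Coefficient magnitudes are only comparable across features when the features are on similar scales, so consider standardizing the inputs before ranking.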

## 5. Compare feature sets

Use the `best_estimator_` from section 4 on the 4 different feature sets:

- `kbest_columns`
- `rfecv_columns`
- `lr_columns`
- `all_columns`

Questions:

- Which scores the highest? (use `cross_val_score`)
- Is the difference significant?

Discuss results.
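
One possible way to organize the comparison is a dictionary of column lists looped over with `cross_val_score`. The data, the model settings, and the two feature sets below are placeholders for illustration; in the lab you would use your tuned estimator and the four lists named above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the feature matrix and target (not the Titanic data).
X_arr, y_cmp = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_cmp = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Placeholder feature sets; replace with kbest_columns, rfecv_columns, etc.
feature_sets = {
    "first_three": ["f0", "f1", "f2"],
    "all_columns": list(X_cmp.columns),
}

model = LogisticRegression(max_iter=1000)  # stand-in for the tuned estimator

results = {}
for name, cols in feature_sets.items():
    fold_scores = cross_val_score(model, X_cmp[cols], y_cmp, cv=5)
    # Keep the mean and the spread across folds for each feature set.
    results[name] = (fold_scores.mean(), fold_scores.std())

for name, (mean, std) in results.items():
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

As a rough check on significance, compare the gap between two means with the fold-to-fold standard deviation: a gap much smaller than the spread is probably noise.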

## Bonus 1

Use a bar chart to display the logistic regression coefficients. Start from the most negative on the left.
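
A minimal plotting sketch, assuming the coefficients are already in a pandas Series indexed by feature name. The values below are hypothetical and only serve to show the sorting and plotting calls.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical coefficient values, for illustration only.
demo_coefs = pd.Series({"f0": 0.8, "f1": -1.2, "f2": 0.1, "f3": -0.4})

# Ascending sort puts the most negative coefficient on the left.
ordered = demo_coefs.sort_values()

fig, ax = plt.subplots()
ax.bar(list(ordered.index), ordered.values)
ax.axhline(0, color="black", linewidth=0.8)  # reference line at zero
ax.set_xlabel("feature")
ax.set_ylabel("coefficient")
ax.set_title("Logistic regression coefficients")
```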

## Bonus 2

Use Sebastian Raschka's [MLxtend library](http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/) to implement a feature selection tactic discussed in class: sequential forward or backward search or floating sequential forward/backward search.
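
To illustrate the idea of sequential forward search before you try MLxtend, the sketch below uses scikit-learn's own `SequentialFeatureSelector` as a stand-in. This is not the MLxtend class: its parameter names differ and it has no floating variant, so consult the MLxtend user guide linked above for the real exercise. The data and the choice of 3 features are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the feature matrix and target (not the Titanic data).
X_arr, y_sfs = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_sfs = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Forward search: start with no features and greedily add the one that
# improves the cross-validated score the most, until 3 are selected.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    cv=5,
)
sfs.fit(X_sfs, y_sfs)

sfs_demo_columns = X_sfs.columns[sfs.get_support()].tolist()
print(sfs_demo_columns)
```

Setting `direction="backward"` starts from all features and removes one at a time instead.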