# Feature Selection

1. Use intuition: which features do you think will carry useful information?
2. Code up the feature.
3. Visualize the feature: does it have discriminatory power?
4. Repeat.

### BEWARE OF BUGS WHEN CREATING FEATURES

#### Why get rid of features?

- it is strongly related to another feature that is already present (multicollinearity)
- it is noisy
- it causes overfitting
- additional features slow down the train/test process
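
The multicollinearity case can be checked with a correlation matrix. The following is a minimal sketch on made-up data: the feature names (`height_cm`, `height_in`, `weight_kg`) and the 0.95 cutoff are assumptions chosen for illustration, not values from these notes.

```python
import numpy as np

# Hypothetical features: height_in is almost a rescaled copy of height_cm.
rng = np.random.RandomState(0)
height_cm = rng.normal(170, 10, size=100)
height_in = height_cm / 2.54 + rng.normal(0, 0.1, size=100)
weight_kg = rng.normal(70, 8, size=100)

X = np.column_stack([height_cm, height_in, weight_kg])

# Pairwise Pearson correlation between columns (features).
corr = np.corrcoef(X, rowvar=False)

# Flag feature pairs whose absolute correlation exceeds an assumed cutoff.
cutoff = 0.95
n_features = corr.shape[0]
redundant_pairs = [
    (i, j)
    for i in range(n_features)
    for j in range(i + 1, n_features)
    if abs(corr[i, j]) > cutoff
]
print(redundant_pairs)
```

One feature from each flagged pair is a candidate for removal, since the other already carries nearly the same information.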

##### Takeaway: features != information

We want the bare minimum number of features that still carry the information. Get rid of features that do not add much value.

#### 2 important univariate feature selection tools in sklearn: 

- SelectPercentile: selects the X% of features that are most powerful (where X is a parameter)
- SelectKBest: selects the K features that are most powerful (where K is a parameter).

Text learning is a good candidate for feature selection, since the data has such high dimensionality.

In [5]:
from sklearn.feature_selection import f_classif, SelectPercentile, SelectKBest

In [6]:
# f_classif scores each feature by its ANOVA F-value against the labels
selector = SelectPercentile(f_classif, percentile=10)  # keep the top 10 percent of features
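
To see both selectors in action, here is a minimal sketch on synthetic data. The dataset from `make_classification`, its sizes, and `k=5` are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic data (assumed for illustration): 20 features, 4 of them informative.
X, y = make_classification(
    n_samples=200,
    n_features=20,
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

# Keep the top 10% of features, ranked by ANOVA F-value.
percentile_selector = SelectPercentile(f_classif, percentile=10)
X_top_percent = percentile_selector.fit_transform(X, y)

# Keep exactly the 5 highest-scoring features.
kbest_selector = SelectKBest(f_classif, k=5)
X_top_k = kbest_selector.fit_transform(X, y)

print(X_top_percent.shape)
print(X_top_k.shape)
# Column indices of the features that SelectKBest kept.
print(kbest_selector.get_support(indices=True))
```

The selector should be fit on training data only and then applied to test data with `transform`, so that no information from the test set leaks into the choice of features.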

### TfidfVectorizer

Key Terms:
- <b>max_df</b>: ignores words whose document frequency is higher than the threshold set here
    - these words are treated as corpus-specific stopwords
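
A small sketch of `max_df` at work. The four-sentence corpus and the 0.5 threshold are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (assumed for illustration): "the" appears in every document.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird flew over the house",
    "the fish swam in the pond",
]

# A float max_df is a proportion of documents: with 0.5, any word that
# appears in more than half of the documents is dropped from the vocabulary.
vectorizer = TfidfVectorizer(max_df=0.5)
tfidf = vectorizer.fit_transform(docs)

# vocabulary_ maps each kept word to its column index in the tf-idf matrix.
print(sorted(vectorizer.vocabulary_))
print(tfidf.shape)
```

Here "the" is dropped because it appears in all four documents, while "cat" survives because it appears in exactly half of them, which does not exceed the threshold.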
    
    
### Bias-Variance Dilemma Revisited

#### <b>High Bias Algorithm:</b>
- pays little attention to the data, oversimplified
- over-generalizes the data
- does the same thing over and over again
- high error on the training set (low r^2, high SSE for regression)
 
#### <b>High Variance Algorithm:</b>
- pays too much attention to the data, does not generalize
- overfits
- memorizes training examples
- low error on the training set (high r^2, low SSE for regression), high error on the test set
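
The two regimes can be compared by looking at r^2 on the training and test sets. This is a rough sketch under assumed conditions: the noisy sine data is synthetic, and the choice of a straight line for the high-bias model and an unpruned decision tree for the high-variance model is illustrative, not prescribed by these notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed for illustration): a noisy sine curve.
rng = np.random.RandomState(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# High bias: a straight line is too simple to follow a sine curve.
high_bias = LinearRegression().fit(x_train, y_train)

# High variance: an unpruned tree can memorize every training point.
high_variance = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)

for name, model in [("high bias", high_bias), ("high variance", high_variance)]:
    train_r2 = model.score(x_train, y_train)  # r^2 on data the model has seen
    test_r2 = model.score(x_test, y_test)     # r^2 on held-out data
    print(f"{name}: train r^2 = {train_r2:.2f}, test r^2 = {test_r2:.2f}")
```

The pattern to look for is a low training r^2 for the high-bias model, and a near-perfect training r^2 with a noticeably lower test r^2 for the high-variance model.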

##### Example 1: too few features can lead to a high-bias regime

You need several features but use only 2 or 3, so the model pays too little attention to the data and oversimplifies things == high bias
 
##### Example 2: carefully minimized SSE

There is a tradeoff between goodness of fit and simplicity of fit. We want to fit the algorithm with the optimal number of features.

### Regularization: 

#### Process of finding the point at which model quality is maximized: the optimal trade-off between number of features and error

##### Regularization in Regression: automatically penalizing extra features

<b>Lasso Regression</b>:
- in addition to minimizing SSE, it also tries to minimize the number of features used
- adds a penalty parameter to the objective, which reveals which features have the highest importance
- sets the coefficients of features with little or no importance to 0


In [7]:
from sklearn import linear_model

In [10]:
lasso = linear_model.Lasso()

In [12]:
# fit on the training split, then predict on the held-out test split
lasso.fit(x_train, y_train)
pred = lasso.predict(x_test)
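
The cells above assume `x_train`, `y_train`, and `x_test` already exist. Below is a self-contained sketch that builds synthetic data and inspects which coefficients Lasso drives to zero. The data, the true weights (3 and -2), and `alpha=0.5` are assumptions chosen for illustration. In scikit-learn, `alpha` controls the strength of the L1 penalty and defaults to 1.0.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): 5 features, but only the
# first two actually drive the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A larger alpha means a stronger penalty and more coefficients set to 0.
lasso = Lasso(alpha=0.5)
lasso.fit(x_train, y_train)
pred = lasso.predict(x_test)

# Features whose coefficient is exactly 0 have been dropped by the model.
print(lasso.coef_)
kept_features = np.flatnonzero(lasso.coef_)
print(kept_features)
```

The penalty also shrinks the coefficients that survive, so their values are smaller in magnitude than the true weights used to generate the data.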