<b>Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection mechanisms. Popular examples are LASSO (L1) and Ridge (L2) regression, which add a penalty term to the loss to reduce overfitting; the L1 penalty can additionally shrink some coefficients to exactly zero, which removes those features from the model.</b>
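
To make the difference between the two penalties concrete, here is a minimal sketch on synthetic data (not the wine data used below). The data, the `alpha` values, and the variable names are assumptions chosen for illustration. Only two of the ten features drive the target, so the L1 penalty is expected to zero out the rest, while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression data (an assumption for illustration):
# only the first two of ten features influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# alpha sets the penalty strength; 0.1 and 1.0 are arbitrary choices here.
lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

# L1 drives uninformative coefficients to exactly zero; L2 only shrinks them.
lasso_kept = np.flatnonzero(lasso.coef_)
ridge_kept = np.flatnonzero(ridge.coef_)
print("Features kept by Lasso:", lasso_kept)
print("Features kept by Ridge:", ridge_kept)
```

This is why LASSO doubles as a feature selector, whereas Ridge on its own regularizes without discarding anything.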

In [1]:
import numpy as np
import pandas as pd

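# Load the wine data; 'Wine' is the class label (1, 2 or 3)
df = pd.read_csv('https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv')
X = df.drop('Wine', axis=1)  # features: every column except the label
y = df['Wine']               # target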
df.head()

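| | Wine | Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |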


### LASSO Regularization (L1)

In [2]:
from sklearn.linear_model import LogisticRegression
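# L1-penalized logistic regression; a smaller C means a stronger penalty and more zeroed coefficients
lgs = LogisticRegression(C=1, penalty='l1', solver='liblinear').fit(X, y)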
lgs.feature_names_in_

array(['Alcohol', 'Malic.acid', 'Ash', 'Acl', 'Mg', 'Phenols',
       'Flavanoids', 'Nonflavanoid.phenols', 'Proanth', 'Color.int',
       'Hue', 'OD', 'Proline'], dtype=object)

In [3]:
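# coef_ has one row per class (one-vs-rest); row 0 is the model for class 1 against the rest
coef = lgs.coef_[0]
# Keep the features whose coefficient was not shrunk to zero
imp_features = pd.Series(X.columns)[coef != 0]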
imp_features

0        Alcohol
1     Malic.acid
2            Ash
3            Acl
4             Mg
6     Flavanoids
9      Color.int
11            OD
12       Proline
dtype: object
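
Because `coef_[0]` covers only the first one-vs-rest problem, the list above reflects class 1 alone. As a hedged sketch of one way to cover every class, the loop below fits one L1-penalized binary model per class and keeps a feature if any of those models gives it a nonzero coefficient. It assumes scikit-learn's bundled Wine dataset (`load_wine`) as a stand-in for the CSV, so the column names differ; the standardization step, `C=0.1`, and all variable names are likewise assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

wine = load_wine()
feature_names = np.array(wine.feature_names)

# Standardize so the L1 penalty treats all features on the same scale.
X_std = StandardScaler().fit_transform(wine.data)

# Fit one L1-penalized binary model per class (explicit one-vs-rest).
kept_any = np.zeros(X_std.shape[1], dtype=bool)
for cls in np.unique(wine.target):
    is_cls = (wine.target == cls).astype(int)
    clf = LogisticRegression(C=0.1, penalty='l1', solver='liblinear').fit(X_std, is_cls)
    # For a binary problem coef_ has shape (1, n_features).
    kept_any |= (clf.coef_[0] != 0)

print(feature_names[kept_any])
```

Taking the union is a deliberately conservative choice: a feature is dropped only if no class needs it. Lowering `C` strengthens the penalty and shortens the list.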

### Random Forest Importance

In [4]:
from sklearn.ensemble import RandomForestClassifier
# No random_state is set, so the importances change slightly from run to run
rfc = RandomForestClassifier(n_estimators=18).fit(X, y)
rfc.feature_importances_

array([0.14960138, 0.02992175, 0.03338813, 0.05153613, 0.03315087,
       0.04510897, 0.15396848, 0.01546707, 0.02875178, 0.16939538,
       0.01460131, 0.16932549, 0.10578326])

In [5]:
# Label each importance with its feature name
final = pd.Series(rfc.feature_importances_, index=X.columns)
final

Alcohol                 0.149601
Malic.acid              0.029922
Ash                     0.033388
Acl                     0.051536
Mg                      0.033151
Phenols                 0.045109
Flavanoids              0.153968
Nonflavanoid.phenols    0.015467
Proanth                 0.028752
Color.int               0.169395
Hue                     0.014601
OD                      0.169325
Proline                 0.105783
dtype: float64

In [6]:
important = rfc.feature_importances_
# X is already a DataFrame, so its columns can be used directly
final_df = pd.DataFrame({"Features": X.columns, "Importances": important})
final_df

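| | Features | Importances |
|---|---|---|
| 0 | Alcohol | 0.149601 |
| 1 | Malic.acid | 0.029922 |
| 2 | Ash | 0.033388 |
| 3 | Acl | 0.051536 |
| 4 | Mg | 0.033151 |
| 5 | Phenols | 0.045109 |
| 6 | Flavanoids | 0.153968 |
| 7 | Nonflavanoid.phenols | 0.015467 |
| 8 | Proanth | 0.028752 |
| 9 | Color.int | 0.169395 |
| 10 | Hue | 0.014601 |
| 11 | OD | 0.169325 |
| 12 | Proline | 0.105783 |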
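
The importances are only scores; turning them into a selection needs a cutoff. A minimal sketch, assuming scikit-learn's `SelectFromModel` with the median importance as the threshold. The threshold, `random_state=0`, and the use of `load_wine` in place of the CSV are assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

wine = load_wine()
feature_names = np.array(wine.feature_names)

# random_state fixes the forest so repeated runs give the same importances.
forest = RandomForestClassifier(n_estimators=18, random_state=0)

# Keep the features whose importance is at least the median importance.
selector = SelectFromModel(forest, threshold="median").fit(wine.data, wine.target)
mask = selector.get_support()

# Print the kept features from most to least important.
importances = selector.estimator_.feature_importances_
for idx in np.argsort(importances)[::-1]:
    if mask[idx]:
        print(f"{feature_names[idx]:<30} {importances[idx]:.4f}")
```

With 13 features, a median threshold keeps roughly half of them. A numeric threshold or a string such as `"mean"` could be used instead if a stricter or looser cut is wanted.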


### K-Nearest Neighbors (KNN)

In [7]:
from sklearn.neighbors import KNeighborsClassifier
knc = KNeighborsClassifier(n_neighbors=4).fit(X, y)
knc.classes_

array([1, 2, 3])
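
`KNeighborsClassifier` exposes neither `coef_` nor `feature_importances_`, so there is nothing built in to read a selection from. One model-agnostic option is permutation importance, sketched below under several assumptions: `load_wine` stands in for the CSV, the features are standardized because KNN is distance-based, and the train/test split, `n_repeats=10`, and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0, stratify=wine.target
)

# Scale inside a pipeline so the scaler is fit on the training split only.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=4))
knn.fit(X_train, y_train)

# Shuffle one feature at a time and record the drop in held-out accuracy.
result = permutation_importance(knn, X_test, y_test, n_repeats=10, random_state=0)

for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"{wine.feature_names[idx]:<30} {result.importances_mean[idx]:.4f}")
```

Permutation importance is computed after training rather than during it, so it is not an embedded method in the sense used above; it is shown only as a way to get comparable scores from a model that has none of its own.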