### Try-it 9.2: Predicting Wages

This activity summarizes your work with regularized regression models. You will combine your earlier work on data preparation and pipelines with what you have learned about grid searches to determine an optimal model. In addition to the prior strategies, this example is a good opportunity to use the `TransformedTargetRegressor` estimator in scikit-learn.
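
To make the role of `TransformedTargetRegressor` concrete, here is a minimal sketch on synthetic data (not the wage data). The names `X_demo`, `y_demo`, and `log_ridge` are illustrative, and the choice of `np.log` with `np.exp` as its inverse is an assumption about how the logarithm might be applied here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration): a strictly positive
# target that grows roughly exponentially in a single feature.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 5, size=(200, 1))
y_demo = np.exp(0.5 * X_demo[:, 0] + rng.normal(scale=0.1, size=200))

# fit() trains the Ridge model on log(y); predict() applies exp to the
# regressor's output, so predictions come back on the original scale.
log_ridge = TransformedTargetRegressor(
    regressor=Ridge(alpha=1.0),
    func=np.log,
    inverse_func=np.exp,
)
log_ridge.fit(X_demo, y_demo)
demo_preds = log_ridge.predict(X_demo)
```

Because the inverse transform is applied automatically, metrics such as `mean_squared_error` computed on these predictions are in the target's original units.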

### The Data

This dataset is loaded from the OpenML repository. Originally drawn from census data, it contains wage and demographic information on 534 individuals. From the dataset documentation [here](https://www.openml.org/d/534):

```
The Current Population Survey (CPS) is used to supplement census information between census years. These data consist of a random sample of 534 persons from the CPS, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. 
```

In [14]:
from sklearn.datasets import fetch_openml
import warnings

warnings.filterwarnings("ignore")

In [15]:
wages = fetch_openml(data_id=534, as_frame=True)

In [16]:
wages.frame.head()

| | EDUCATION | SOUTH | SEX | EXPERIENCE | UNION | WAGE | AGE | RACE | OCCUPATION | SECTOR | MARR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | no | female | 21 | not_member | 5.1 | 35 | Hispanic | Other | Manufacturing | Married |
| 1 | 9 | no | female | 42 | not_member | 4.95 | 57 | White | Other | Manufacturing | Married |
| 2 | 12 | no | male | 1 | not_member | 6.67 | 19 | White | Other | Manufacturing | Unmarried |
| 3 | 12 | no | male | 4 | not_member | 4.0 | 22 | White | Other | Other | Unmarried |
| 4 | 12 | no | male | 17 | not_member | 7.5 | 35 | White | Other | Other | Married |


#### Task

Build regression models to predict `WAGE`. Incorporate the categorical features and transform the target using a logarithm. Build `Ridge` models and compare several amounts of regularization.
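
One possible shape for such a model is sketched below. It is not a reference solution: it runs on a small synthetic stand-in (`X_toy`, `y_toy`) that borrows a few of the dataset's column names, and the step names, the split into `categorical` and `numeric` columns, and the `alpha` grid are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor, make_column_transformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (an assumption; substitute X_train / y_train).
rng = np.random.default_rng(42)
n = 120
X_toy = pd.DataFrame({
    "EDUCATION": rng.integers(6, 19, size=n),
    "EXPERIENCE": rng.integers(0, 45, size=n),
    "SEX": rng.choice(["female", "male"], size=n),
    "UNION": rng.choice(["member", "not_member"], size=n),
})
y_toy = pd.Series(
    np.exp(1.0 + 0.08 * X_toy["EDUCATION"].to_numpy()
           + rng.normal(scale=0.3, size=n)),
    name="WAGE",
)

categorical = ["SEX", "UNION"]
numeric = ["EDUCATION", "EXPERIENCE"]

# One-hot encode the categorical columns and scale the numeric ones.
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical),
    (StandardScaler(), numeric),
)
pipe = Pipeline([("preprocess", preprocessor), ("ridge", Ridge())])

# Fit on log(WAGE); predictions are mapped back with exp.
model = TransformedTargetRegressor(regressor=pipe, func=np.log, inverse_func=np.exp)

# Parameter names pass through the wrapper: regressor -> pipeline step -> parameter.
param_grid = {"regressor__ridge__alpha": np.logspace(-3, 3, 7)}
search = GridSearchCV(model, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_toy, y_toy)
best_alpha = search.best_params_["regressor__ridge__alpha"]
```

Wrapping the whole pipeline in `TransformedTargetRegressor` keeps the cross-validated error on the original wage scale, which makes the scores for different `alpha` values directly comparable.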

After fitting your model, interpret it and try to understand which features are associated with higher wages. Consider using `permutation_importance`, which you encountered in module 8. Discuss your findings in the class forum.
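
As a rough illustration of `permutation_importance`, the sketch below fits a log-target `Ridge` model on synthetic data in which, by construction, only `EDUCATION` drives the target. The data, the variable names, and the fixed `alpha=1.0` are assumptions for demonstration, not results from the wage dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor, make_column_transformer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data: the target depends on EDUCATION only; AGE and UNION are noise.
rng = np.random.default_rng(0)
n = 300
X_syn = pd.DataFrame({
    "EDUCATION": rng.integers(6, 19, size=n),
    "AGE": rng.integers(18, 65, size=n),
    "UNION": rng.choice(["member", "not_member"], size=n),
})
y_syn = np.exp(1.0 + 0.1 * X_syn["EDUCATION"].to_numpy()
               + rng.normal(scale=0.2, size=n))

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=0)

fitted = TransformedTargetRegressor(
    regressor=make_pipeline(
        make_column_transformer(
            (OneHotEncoder(handle_unknown="ignore"), ["UNION"]),
            (StandardScaler(), ["EDUCATION", "AGE"]),
        ),
        Ridge(alpha=1.0),
    ),
    func=np.log,
    inverse_func=np.exp,
).fit(X_tr, y_tr)

# Shuffle each original column on held-out data and record the mean drop
# in the model's default score (R^2) over the repeats.
result = permutation_importance(fitted, X_te, y_te, n_repeats=10, random_state=0)
importances = pd.Series(result.importances_mean, index=X_te.columns)
print(importances.sort_values(ascending=False))
```

Because the columns are permuted before preprocessing, each importance refers to an original feature such as `UNION` rather than to an individual one-hot column.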

For an in-depth discussion of the perils of interpreting coefficients, see the scikit-learn example [here](https://scikit-learn.org/stable/auto_examples/inspection/plot_linear_model_coefficient_interpretation.html).

In [26]:
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, PredictionErrorDisplay
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer, TransformedTargetRegressor
from sklearn.inspection import permutation_importance
from sklearn.feature_selection import SequentialFeatureSelector
import numpy as np
import plotly.express as px
import pandas as pd
import warnings

In [27]:
X = wages.data
y = wages.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [35]:
wages

{'data':      EDUCATION SOUTH     SEX  EXPERIENCE       UNION  AGE      RACE  \
 0            8    no  female          21  not_member   35  Hispanic   
 1            9    no  female          42  not_member   57     White   
 2           12    no    male           1  not_member   19     White   
 3           12    no    male           4  not_member   22     White   
 4           12    no    male          17  not_member   35     White   
 ..         ...   ...     ...         ...         ...  ...       ...   
 529         18    no    male           5  not_member   29     White   
 530         12    no  female          33  not_member   51     Other   
 531         17    no  female          25      member   48     Other   
 532         12   yes    male          13      member   31     White   
 533         16    no    male          33  not_member   55     White   
 
        OCCUPATION         SECTOR       MARR  
 0           Other  Manufacturing    Married  
 1           Other  Manufacturin

In [33]:
X_train

| | EDUCATION | SOUTH | SEX | EXPERIENCE | UNION | AGE | RACE | OCCUPATION | SECTOR | MARR |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 13 | no | male | 9 | member | 28 | White | Other | Other | Unmarried |
| 116 | 11 | no | male | 11 | not_member | 28 | White | Other | Construction | Unmarried |
| 45 | 7 | yes | female | 15 | not_member | 28 | White | Other | Manufacturing | Married |
| 444 | 16 | yes | male | 13 | not_member | 35 | Other | Professional | Other | Married |
| 298 | 12 | no | female | 0 | not_member | 18 | White | Clerical | Other | Unmarried |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71 | 14 | no | male | 20 | member | 40 | White | Other | Other | Married |
| 106 | 14 | no | male | 21 | member | 41 | White | Other | Other | Married |
| 270 | 12 | no | female | 38 | not_member | 56 | White | Clerical | Other | Married |
| 435 | 18 | no | male | 8 | not_member | 32 | White | Professional | Other | Married |


In [34]:
y_train

5      13.07
116     3.75
45      6.00
444    17.50
298     5.00
       ...  
71     16.00
106    26.00
270     9.65
435    22.20
102     6.50
Name: WAGE, Length: 357, dtype: float64