## Elastic Net Regression

Elastic net regression combines ridge and lasso regression into one algorithm. With elastic net, the algorithm can remove weak variables altogether, as lasso does, or shrink them toward zero, as ridge does. All three algorithms are examples of regularized regression.
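To make the contrast concrete, here is a small sketch that counts how many coefficients each penalty leaves non-zero. It uses synthetic data from scikit-learn's `make_regression`, not the yeast data analysed below, and the `alpha` values are arbitrary illustration choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic problem: 20 features, only 5 of which actually drive y.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

models = {
    "ridge": Ridge(alpha=1.0),                           # L2 only: shrinks, rarely zeroes
    "lasso": Lasso(alpha=1.0),                           # L1 only: can set coefficients to 0
    "elastic net": ElasticNet(alpha=1.0, l1_ratio=0.5),  # mix of L1 and L2
}

nonzero = {}
for name, model in models.items():
    model.fit(X, y)
    nonzero[name] = int(np.count_nonzero(model.coef_))
    print(f"{name:12s} non-zero coefficients: {nonzero[name]} / {X.shape[1]}")
```

Exact counts depend on `alpha` and `l1_ratio`; the point is only that the L1 term can produce exact zeros while the pure L2 penalty does not.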

This post will provide an example of elastic net regression in Python. Below are the steps of the analysis.

- Data preparation

- Baseline model development

- Elastic net model development

To accomplish this, we will use a yeast dataset of 4,390 samples, each genotyped at 28,220 markers and measured for 20 phenotypes. Our goal will be to predict the phenotypes from the genotypes. Below is some initial code to begin the analysis.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing

## Data Preparation

We will now load our data: a tab-separated genotype file and a comma-separated phenotype file, both indexed by sample. The genotypes will become our X dataset and the phenotypes our y dataset. Below is the code.

In [2]:
genotypeFile = 'genotype_full_1_2.txt'
genotype = pd.read_csv(genotypeFile, sep = '\t', index_col = 0)
print('genotypeFile shape:', genotype.shape )

phenotypeFile = 'phenotype.csv'
multi_pheno = pd.read_csv(phenotypeFile, sep = ',', index_col = 0)
print('Phenotype_Multi shape:', multi_pheno.shape )

genotypeFile shape: (4390, 28220)
Phenotype_Multi shape: (4390, 20)


**NB:** The samples descend from a cross between a laboratory strain (BY) and an isolate from a vineyard (RM), and each genotype field records which parent's allele a sample carries at that marker. The original data fields in the yeast genotype profiles were encoded as -1 for BY and 1 for RM. Because the loss function used in the proposed model requires non-negative data fields, we replaced all -1 values with 2 when preprocessing the genotype data.

**Hence: 1 = vineyard strain (RM) and 2 = laboratory strain (BY)**
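The recoding described above is a one-line replacement in pandas. The sketch below applies it to a tiny made-up frame (the sample and marker names are hypothetical). The `_1_2` suffix of the file loaded earlier suggests it already holds the recoded values, but that is an inference from the filename.

```python
import pandas as pd

# Hypothetical 3-sample x 4-marker excerpt in the original encoding:
# -1 = BY (laboratory) allele, 1 = RM (vineyard) allele.
raw = pd.DataFrame(
    [[-1, 1, 1, -1],
     [1, 1, -1, -1],
     [-1, -1, 1, 1]],
    index=["s1", "s2", "s3"],
    columns=["m1", "m2", "m3", "m4"],
)

# Recode BY from -1 to 2 so every field is non-negative (1 = RM, 2 = BY).
recoded = raw.replace(-1, 2)
print(recoded)
```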

In [5]:
pheno_df = pd.DataFrame(multi_pheno)
print("The Columns of the Phenotype Dataset:\n",pheno_df.columns,'\n')
pheno_df_3 = pheno_df[[
    '1_CobaltChloride_1', '1_CopperSulfate_1', '1_Diamide_1',
    '1_E6-Berbamine_1', '1_Ethanol_1', '1_Formamide_1',
    '1_Hydroxyurea_1', '1_IndolaceticAcid_1', '1_Lactate_1',
    '1_Lactose_1', '1_MagnesiumChloride_1', '1_ManganeseSulfate_1',
    '1_Menadione_1', '1_Neomycin_1', '1_Raffinose_1', '1_Trehalose_1',
    '1_Xylose_1', '1_YNB_1', '1_YPD_1', '1_Zeocin_1'
]]
print('\nChecking if Any Column has missing Values in the Phenotype Dataset:\n{0}\n\n'.format(pheno_df_3.isnull().sum()))
print("The first 10 records of the Phenotype Dataset:\n")
pheno_df_3.head(10)

The Columns of the Phenotype Dataset:
 Index(['1_CobaltChloride_1', '1_CopperSulfate_1', '1_Diamide_1',
       '1_E6-Berbamine_1', '1_Ethanol_1', '1_Formamide_1', '1_Hydroxyurea_1',
       '1_IndolaceticAcid_1', '1_Lactate_1', '1_Lactose_1',
       '1_MagnesiumChloride_1', '1_ManganeseSulfate_1', '1_Menadione_1',
       '1_Neomycin_1', '1_Raffinose_1', '1_Trehalose_1', '1_Xylose_1',
       '1_YNB_1', '1_YPD_1', '1_Zeocin_1'],
      dtype='object') 


Checking if Any Column has missing Values in the Phenotype Dataset:
1_CobaltChloride_1       222
1_CopperSulfate_1        114
1_Diamide_1               81
1_E6-Berbamine_1          80
1_Ethanol_1              129
1_Formamide_1            122
1_Hydroxyurea_1           93
1_IndolaceticAcid_1      104
1_Lactate_1              628
1_Lactose_1              574
1_MagnesiumChloride_1    127
1_ManganeseSulfate_1      67
1_Menadione_1             92
1_Neomycin_1              86
1_Raffinose_1            227
1_Trehalose_1             79
1_Xylose_1   

| | 1_CobaltChloride_1 | 1_CopperSulfate_1 | 1_Diamide_1 | 1_E6-Berbamine_1 | 1_Ethanol_1 | 1_Formamide_1 | 1_Hydroxyurea_1 | 1_IndolaceticAcid_1 | 1_Lactate_1 | 1_Lactose_1 | 1_MagnesiumChloride_1 | 1_ManganeseSulfate_1 | 1_Menadione_1 | 1_Neomycin_1 | 1_Raffinose_1 | 1_Trehalose_1 | 1_Xylose_1 | 1_YNB_1 | 1_YPD_1 | 1_Zeocin_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01_01 | -2.253831 | -1.588146 | 0.19493 | -1.055938 | -0.25037 | 0.498227 | -0.213244 | -0.181865 | NaN | -0.847586 | -0.352481 | 1.212162 | 0.335224 | -0.665269 | -0.37047 | -0.674826 | -0.816972 | 17.711068 | 25.871208 | 0.763908 |
| 01_02 | -1.887746 | 0.542872 | 0.45154 | 0.011593 | 0.103719 | 0.82866 | 0.639112 | 0.66082 | NaN | -0.62046 | 0.394129 | -1.942857 | 1.397952 | -0.313936 | 1.007102 | 0.493351 | -1.412415 | 18.286691 | 26.218803 | 1.272112 |
| 01_03 | 1.047185 | 0.453067 | 0.721835 | 1.645301 | 0.427616 | -0.326177 | -0.141772 | -0.611875 | -0.797737 | -0.219193 | -0.108411 | 0.750178 | -0.913395 | 0.419907 | -0.072188 | -0.346773 | 0.169568 | 15.499536 | 24.49684 | 0.072323 |
| 01_04 | 2.417437 | 0.747427 | 0.454517 | 1.856809 | -0.135731 | 0.556514 | 0.197233 | 0.371108 | NaN | 0.666068 | 0.021487 | -0.917218 | -0.239386 | 0.744319 | 0.033719 | 1.774186 | 0.6684 | 17.301076 | 25.827809 | 0.676447 |
| 01_06 | -1.041743 | 0.180384 | 0.464474 | -0.966225 | -0.33803 | -0.728221 | 0.543498 | -1.833931 | -0.170299 | 0.08603 | 0.10812 | -1.25163 | -0.038772 | -0.670791 | -0.233617 | -0.199903 | -0.283471 | 15.308695 | 25.513351 | 0.996027 |
| 01_07 | 1.73438 | 0.440941 | 0.380474 | -0.049762 | 0.262329 | -1.005624 | 0.527123 | -0.656915 | -0.398994 | -0.670894 | -0.003434 | -1.141673 | -0.92093 | 0.834907 | -0.827282 | -0.433795 | 0.938031 | 15.437198 | 24.154409 | -0.812026 |
| 01_08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19.621357 | NaN | NaN |
| 01_09 | 0.940961 | 0.797739 | 0.224067 | 0.542497 | 0.623649 | 0.460779 | 0.245576 | -0.194188 | -0.040061 | 0.217974 | 0.058288 | 0.125721 | -0.795422 | 0.783061 | 1.515246 | 0.215446 | 0.155263 | 19.441247 | 28.519919 | 0.263971 |
| 01_10 | 0.106811 | 0.249607 | 0.384673 | -1.374385 | 0.17104 | -0.223995 | -0.145539 | 0.136811 | 0.959766 | 0.262849 | 0.30471 | -1.112984 | 0.121261 | 0.218337 | 0.088029 | 0.727231 | 0.134565 | 16.961701 | 26.664183 | -0.03476 |
| 01_11 | -1.349368 | -0.124124 | 0.593057 | 1.097218 | -0.362058 | 0.770479 | 0.745277 | 0.560757 | 0.73657 | -0.017763 | 0.42636 | 0.559189 | -0.764494 | -0.12131 | 0.468228 | -0.184995 | -0.009737 | 16.54327 | 23.18163 | 1.259029 |


### Summary of Data Loading

The genotype and phenotype datasets have been loaded successfully. The **genotypes** are the features that will be used to predict the twenty **phenotypes** with an **elastic net model**.

- The genotype dataset has no missing values.
- The phenotype dataset has missing values in every phenotype we want to predict.

Hence, the data must be pre-processed to clean it before modeling.

## **PART 1: DATA PRE-PROCESSING**

We remove NaN values from the phenotypes. Because each phenotype row and its matching genotype row describe the same sample, every sample removed for a missing phenotype must also be removed from the genotype data (its row, not a column).

Steps:

1.   Build a randomly masked copy of the genotypes (about 10% of entries set to 0) to simulate missing data. This copy is split alongside the others but is not used by the elastic net models below.
2.   Normalize the outputs (the phenotypes) with min-max scaling and attempt to set IQR outliers to NaN.
3.   Split the samples into training, validation, and test sets; within each split, concatenate the genotype and phenotype dataframes and drop every row that has a NaN in any phenotype column.
4.   Separate the genotypes, which are the inputs (x), from the phenotypes, which are the outputs (y). One-hot encoding of the inputs is commented out in the code, so the raw 1/2 genotype codes are used.
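Because the genotype and phenotype tables share the same sample index, the row filtering described above can also be written with a single boolean mask instead of concatenating and splitting. A minimal sketch on toy data (all names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 samples, 3 markers, 2 phenotypes.
geno = pd.DataFrame(
    [[1, 2, 1], [2, 2, 1], [1, 1, 2], [2, 1, 1]],
    index=["a", "b", "c", "d"], columns=["m1", "m2", "m3"],
)
pheno = pd.DataFrame(
    {"trait1": [0.3, np.nan, 0.7, 0.1], "trait2": [1.2, 0.4, np.nan, 0.9]},
    index=["a", "b", "c", "d"],
)

# Keep only samples with every phenotype observed; selecting with the same
# boolean mask on both frames keeps genotype and phenotype rows aligned.
complete = pheno.notna().all(axis=1)
X_clean = geno.loc[complete]
y_clean = pheno.loc[complete]
print(X_clean.index.tolist(), y_clean.index.tolist())
```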

In [6]:
# take a small part to test code
# genotype
X = genotype
# X = genotype.iloc[0:1000:, 0:5000]
# single_pheno
Y = multi_pheno.iloc[:, :]#index=2 --> 1_E6-Berbamine_1
# Y = multi_pheno.iloc[0:1000, pheno_i]


# # Add noise
# random missing masker: zero out about 10% of genotype entries to simulate missing data
missing_perc = 0.1
nonmissing_ones = np.random.binomial(
    1, 1 - missing_perc, size=X.shape[0] * X.shape[1])
nonmissing_ones = nonmissing_ones.reshape(X.shape[0], X.shape[1])

corrupted_X = X * nonmissing_ones
# corrupted_X.head()

# # Prepare data
# ## One-hot encoding (disabled: the masked genotype codes are used as-is)
from sklearn.model_selection import train_test_split
# X_onehot = keras.utils.to_categorical(X)
corrupted_X_onehot = corrupted_X

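# min-max normalization of the 20 phenotypes (MinMaxScaler ignores NaNs when fitting and keeps them in the output)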
cols_to_norm = ['1_CobaltChloride_1', '1_CopperSulfate_1', '1_Diamide_1',
       '1_E6-Berbamine_1', '1_Ethanol_1', '1_Formamide_1', '1_Hydroxyurea_1',
       '1_IndolaceticAcid_1', '1_Lactate_1', '1_Lactose_1',
       '1_MagnesiumChloride_1', '1_ManganeseSulfate_1', '1_Menadione_1',
       '1_Neomycin_1', '1_Raffinose_1', '1_Trehalose_1', '1_Xylose_1',
       '1_YNB_1', '1_YPD_1', '1_Zeocin_1']
x_ = multi_pheno[cols_to_norm]
 
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x_)
df = pd.DataFrame(x_scaled,columns=cols_to_norm)
scaled_Y =df


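def detect_outliers(df):
    # Intended: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    # NOTE: np.percentile pools all columns and does not skip NaN. Because the
    # phenotypes contain NaNs, Q1 and Q3 come out as NaN, no value is flagged,
    # and the returned index covers every row.
    outlier_indices = []

    Q1 = np.percentile(df, 25)
    Q3 = np.percentile(df, 75)
    IQR = Q3 - Q1
    outlier_step = 1.5 * IQR

    outlier_indices = df[(df < Q1 - outlier_step) |
                         (df > Q3 + outlier_step)].index

    return outlier_indices


# NOTE: indexing with a boolean frame behaves like .where, so temp_Y still contains the NaNs
temp_Y = scaled_Y[~scaled_Y.isna()]
outliers_index = detect_outliers(temp_Y)


# set outliers as NaN
# NOTE: outliers_index holds row labels, but df[labels] = value targets columns, so this
# adds those labels as new all-NaN columns; the .iloc[:, 0:20] calls below discard them
scaled_Y_ = scaled_Y.copy()
scaled_Y_[outliers_index] = np.nan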


# ## Split train and test
train_X, test_X, corrupted_train_X, corrupted_test_X, train_Y, test_Y = train_test_split(
    X, corrupted_X_onehot, scaled_Y_.iloc[:], test_size=0.1, random_state = 42)

# split df to train and valid
train_X, valid_X, corrupted_train_X, corrupted_valid_X, train_Y, valid_Y = train_test_split(
    train_X, corrupted_train_X, train_Y, test_size=0.1,random_state = 42)


train_X = train_X.reset_index().drop('SAMID',axis=1)
train_Y = train_Y.iloc[:,0:20].reset_index().drop('index',axis=1)

valid_X = valid_X.reset_index().drop('SAMID',axis=1)
valid_Y = valid_Y.iloc[:,0:20].reset_index().drop('index',axis=1)

test_X = test_X.reset_index().drop('SAMID',axis=1)
test_Y = test_Y.iloc[:,0:20].reset_index().drop('index',axis=1)

x_train = pd.concat([train_X,train_Y],axis=1).dropna().iloc[:,0:-20]
y_train = train_Y.dropna().to_numpy()

x_valid = pd.concat([valid_X,valid_Y],axis=1).dropna().iloc[:,0:-20]
y_valid = valid_Y.dropna().to_numpy()

x_test = pd.concat([test_X,test_Y],axis=1).dropna().iloc[:,0:-20]
y_test  = test_Y.dropna().to_numpy()


In [7]:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(2826, 28220)
(342, 28220)
(2826, 20)
(342, 20)
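The IQR rule in `detect_outliers` pools all twenty phenotypes into one percentile computation, and `np.percentile` returns NaN whenever missing phenotypes are present. A per-column version that skips missing values could look like the sketch below; the function name `iqr_outlier_mask` and the toy data are hypothetical. Applying it to the real phenotypes would drop additional rows, so the shapes printed above would change.

```python
import numpy as np
import pandas as pd

def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR] of its own column."""
    q1 = df.quantile(0.25)  # computed per column; NaNs are skipped
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparisons with NaN are False, so missing values are never flagged.
    return df.lt(q1 - k * iqr) | df.gt(q3 + k * iqr)

# Toy phenotype frame with one missing value and one obvious outlier.
toy = pd.DataFrame({
    "trait1": [0.40, 0.50, 0.45, 0.55, 5.00],
    "trait2": [0.20, np.nan, 0.25, 0.30, 0.22],
})
mask = iqr_outlier_mask(toy)
cleaned = toy.mask(mask)  # outliers become NaN and can be dropped with the other NaNs
print(cleaned)
```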


## Elastic Net

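Elastic net, just like ridge and lasso regression, is sensitive to the scale of its features, so the data are normally standardized first. Older scikit-learn releases handled this through a `normalize` argument of the ElasticNet function; that argument was deprecated in version 1.0 and removed in 1.2, so scaling is now done as a separate step. Here the genotype features all share the same 1/2 coding, and the phenotypes were min-max scaled above. The second thing we need to do is create our grid. This is the same grid as we created for ridge and lasso in prior posts. The only new argument is `l1_ratio`.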
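In scikit-learn versions where ElasticNet has no `normalize` argument, one way to standardize features is to put a `StandardScaler` in front of the model in a pipeline. A minimal sketch on synthetic data (not the yeast data), with one feature deliberately put on a much larger scale:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=1)
X[:, 0] *= 1000.0  # give one feature a much larger scale than the rest

# With the scaler inside the pipeline, GridSearchCV (if given this pipeline)
# refits the scaler on each training fold only.
model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))
model.fit(X, y)

scaled_X = model.named_steps["standardscaler"].transform(X)
print("column means after scaling:", np.round(scaled_X.mean(axis=0), 6))
print("R^2 on training data:", round(model.score(X, y), 3))
```

When tuning such a pipeline with GridSearchCV, the grid keys take the step-name prefix, e.g. `elasticnet__alpha` and `elasticnet__l1_ratio`.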

When `l1_ratio` is set to 0, the penalty is pure L2, as in ridge regression; when it is set to 1, the penalty is pure L1, as in lasso. Values between 0 and 1 give an elastic net, so our grid needs several values of this argument. The hyperparameter `alpha` sets the overall strength of the combined penalty, while `l1_ratio` decides how that weight is split between the L1 and L2 terms. Below is the code.
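For reference, scikit-learn's documentation writes the ElasticNet objective as follows, with $n$ samples, $\alpha$ = `alpha` and $\rho$ = `l1_ratio`:

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

Setting $\rho = 1$ leaves only the L1 term (lasso), and $\rho = 0$ leaves only the L2 term (a ridge penalty; scikit-learn's `Ridge` scales its loss differently, so its `alpha` is not directly comparable).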



##### Hyperparameter tuning of alpha and l1_ratio using GridSearchCV

In [8]:
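# small grid: 2 values of alpha x 2 values of l1_ratio = 4 candidates
enet = ElasticNet()
param_grid = [{
    "alpha": [1.0, 0.5],
    "l1_ratio": [0.0001, 0.001]
}]

grid_search = GridSearchCV(enet, param_grid, cv=10, scoring='neg_mean_squared_error',
                           return_train_score=True, verbose=10)
# tune on a single phenotype: column 1 of y_train (1_CopperSulfate_1)
grid_search.fit(x_train, y_train[:, 1:2])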

Fitting 10 folds for each of 4 candidates, totalling 40 fits
[CV 1/10; 1/4] START alpha=1.0, l1_ratio=0.0001.................................
[CV 1/10; 1/4] END alpha=1.0, l1_ratio=0.0001;, score=(train=-0.005, test=-0.006) total time=  29.9s
[CV 2/10; 1/4] START alpha=1.0, l1_ratio=0.0001.................................
[CV 2/10; 1/4] END alpha=1.0, l1_ratio=0.0001;, score=(train=-0.005, test=-0.004) total time=  29.8s
[CV 3/10; 1/4] START alpha=1.0, l1_ratio=0.0001.................................
[CV 3/10; 1/4] END alpha=1.0, l1_ratio=0.0001;, score=(train=-0.005, test=-0.006) total time=  27.6s
[CV 4/10; 1/4] START alpha=1.0, l1_ratio=0.0001.................................
[CV 4/10; 1/4] END alpha=1.0, l1_ratio=0.0001;, score=(train=-0.005, test=-0.007) total time=  27.1s
[CV 5/10; 1/4] START alpha=1.0, l1_ratio=0.0001.................................
[CV 5/10; 1/4] END alpha=1.0, l1_ratio=0.0001;, score=(train=-0.005, test=-0.007) total time=  27.1s
[CV 6/10; 1/4] START alpha=1.
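Once a search finishes, GridSearchCV exposes the winning combination through `best_params_` and `best_score_`, and the scores of every candidate through `cv_results_`. A sketch on synthetic data (not the yeast data) using the same grid; note that `best_score_` is a negated MSE because of the `neg_mean_squared_error` scoring.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=150, n_features=30, noise=10.0, random_state=0)

param_grid = {"alpha": [1.0, 0.5], "l1_ratio": [0.0001, 0.001]}
search = GridSearchCV(ElasticNet(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

# Flip the sign of the negated score to report an MSE.
print("best params:", search.best_params_)
print("best CV MSE:", -search.best_score_)

# One row per candidate, ordered by rank.
results = pd.DataFrame(search.cv_results_)
print(results[["param_alpha", "param_l1_ratio", "mean_test_score", "rank_test_score"]]
      .sort_values("rank_test_score"))
```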

In [12]:
best_enet = grid_search.best_estimator_
best_enet
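An alternative, not used in this notebook, is scikit-learn's `ElasticNetCV`, which cross-validates `alpha` along a regularization path for each candidate `l1_ratio` and is often cheaper than a full grid search. It accepts a single target only, so it would be run once per phenotype. A sketch on synthetic data with arbitrary candidate values:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=150, n_features=30, noise=10.0, random_state=0)

l1_ratios = [0.1, 0.5, 0.9]
alphas = [0.01, 0.1, 1.0]
cv_model = ElasticNetCV(l1_ratio=l1_ratios, alphas=alphas, cv=5)
cv_model.fit(X, y)

# Selected hyperparameters; the model is refit on all of X with them.
print("alpha:", cv_model.alpha_, "l1_ratio:", cv_model.l1_ratio_)
```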

In [11]:
# one list of test-set MSEs per phenotype, in the same order as cols_to_norm
phenotype_enet_mse = {pheno: [] for pheno in cols_to_norm}
pheno_list = list(phenotype_enet_mse.keys())
pheno_list

['1_CobaltChloride_1',
 '1_CopperSulfate_1',
 '1_Diamide_1',
 '1_E6-Berbamine_1',
 '1_Ethanol_1',
 '1_Formamide_1',
 '1_Hydroxyurea_1',
 '1_IndolaceticAcid_1',
 '1_Lactate_1',
 '1_Lactose_1',
 '1_MagnesiumChloride_1',
 '1_ManganeseSulfate_1',
 '1_Menadione_1',
 '1_Neomycin_1',
 '1_Raffinose_1',
 '1_Trehalose_1',
 '1_Xylose_1',
 '1_YNB_1',
 '1_YPD_1',
 '1_Zeocin_1']

In [14]:
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

# The fit is deterministic (selection='cyclic', no randomness), so both passes
# of the outer loop produce identical MSEs.
for itr in range(2):
    print(itr)
    for i in range(20):
        # alpha=0.5 and l1_ratio=0.001 are fixed here; every other argument is the scikit-learn default
        enet = ElasticNet(alpha=0.5, l1_ratio=0.001)
        # fit one phenotype (column i) at a time and score it on the test set
        enet.fit(x_train, y_train[:, i:i+1])
        y_pred = enet.predict(x_test)
        mse = mean_squared_error(y_test[:, i:i+1], y_pred)
        phenotype_enet_mse[pheno_list[i]].append(mse)

0
1
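Because `ElasticNet` (unlike `MultiTaskElasticNet`) fits each target column independently when given a 2-D `y`, the per-phenotype loop could be replaced by a single `fit` call plus `mean_squared_error(..., multioutput="raw_values")`. A sketch on synthetic data (a stand-in for the genotype matrix and 20 phenotypes, not the real data) that checks both routes agree:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 50 features, 20 targets.
X, Y = make_regression(n_samples=300, n_features=50, n_targets=20,
                       noise=5.0, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.1, random_state=42)

# One call fits all 20 targets, each with the same alpha and l1_ratio.
joint = ElasticNet(alpha=0.5, l1_ratio=0.001).fit(X_tr, Y_tr)
mse_joint = mean_squared_error(Y_te, joint.predict(X_te), multioutput="raw_values")

# Column-by-column fits, as in the loop above, for comparison.
mse_loop = np.array([
    mean_squared_error(
        Y_te[:, i],
        ElasticNet(alpha=0.5, l1_ratio=0.001).fit(X_tr, Y_tr[:, i]).predict(X_te),
    )
    for i in range(Y.shape[1])
])
print(np.round(mse_joint, 3))
```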


In [15]:
phenotype_enet_mse_df = pd.DataFrame(phenotype_enet_mse)
phenotype_enet_mse_df

| | 1_CobaltChloride_1 | 1_CopperSulfate_1 | 1_Diamide_1 | 1_E6-Berbamine_1 | 1_Ethanol_1 | 1_Formamide_1 | 1_Hydroxyurea_1 | 1_IndolaceticAcid_1 | 1_Lactate_1 | 1_Lactose_1 | 1_MagnesiumChloride_1 | 1_ManganeseSulfate_1 | 1_Menadione_1 | 1_Neomycin_1 | 1_Raffinose_1 | 1_Trehalose_1 | 1_Xylose_1 | 1_YNB_1 | 1_YPD_1 | 1_Zeocin_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.008818 | 0.006675 | 0.00851 | 0.013871 | 0.008788 | 0.004235 | 0.004863 | 0.008419 | 0.005189 | 0.004772 | 0.006643 | 0.011365 | 0.007803 | 0.008224 | 0.006639 | 0.007602 | 0.006004 | 0.006653 | 0.006712 | 0.015164 |
| 1 | 0.008818 | 0.006675 | 0.00851 | 0.013871 | 0.008788 | 0.004235 | 0.004863 | 0.008419 | 0.005189 | 0.004772 | 0.006643 | 0.011365 | 0.007803 | 0.008224 | 0.006639 | 0.007602 | 0.006004 | 0.006653 | 0.006712 | 0.015164 |
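These MSEs are on the min-max scaled phenotypes. Min-max scaling maps y to (y - min) / (max - min), so every prediction error shrinks by the factor (max - min); a scaled MSE therefore converts back to original units by multiplying by (max - min)². `MinMaxScaler` stores max - min per column in `data_range_`. A check on synthetic values (the phenotypes and predictions here are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
y = rng.normal(loc=20.0, scale=3.0, size=(100, 2))   # two hypothetical phenotypes
y_pred = y + rng.normal(scale=0.5, size=y.shape)     # imperfect "predictions"

scaler = MinMaxScaler().fit(y)
mse_scaled = ((scaler.transform(y) - scaler.transform(y_pred)) ** 2).mean(axis=0)

# Undo the scaling on the metric itself: multiply by the squared column range.
mse_original = mse_scaled * scaler.data_range_ ** 2
print(mse_original)
```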
