# ML Project on Higgs Dataset

Plan:
- We only use imported data without any modification
- We then cleaned it by removing all samples containing at least one Nan value
- We then partioned our dataset into 4 subgroups, depending on the value of feature PRI_jet_num. We then standardized each subgroup individually.

## Import useful commands

In [9]:
# Useful starting lines
%matplotlib inline
import datetime
import numpy as np
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Load the training data into feature matrix, class labels, and event ids:

In [10]:
from proj1_helpers import *
from implementations import*
DATA_TRAIN_PATH = '../data/train.csv/train.csv'
y, x, ids = load_csv_data(DATA_TRAIN_PATH)

## Pre-processing of data

### Removing Nan values in dataset

By analysing the Higgs dataset, we saw that there were a lot of missing values, corresponding to the -999.0 value. We thus decided to remove the samples that had at least one -999.0 outlier value.

In [11]:
selector = np.all(x != -999.0, axis=1)

x_clean = x[selector]
y_clean = y[selector]
print("We removed", (1-x_clean.shape[0]/x.shape[0])*100, "% of our training dataset.")

We removed 72.7544 % of our training dataset.


We see that too many data samples are lost in x_clean. So it is not a good idea to remove these rows. 

### Replacing Nan values in dataset 

Thus we chose an alternative way to deal with the Nan values. We decided to replace each NaN value by the mean of the feature it is in. The mean was computed without taking into account the Nan values in the feature. This is a standardization concept.

In [72]:
def standardize_NAN(x):
    x_nan = x.copy()
    x_nan[x_nan==-999.0] = np.nan
    return (standardize(x_nan))

# All the Nan (corresponding to unknown values) were replaced by the mean value of the feature it is in.
def replace_mean(x_nan):
    means_cols = np.nanmean(x_nan, axis=1)
    is_nan = np.isnan(x_nan)
    for col in range(x_nan.shape[1]):
        x_nan[is_nan[:, col], col] = means_cols[col]
    return (x_nan)

Below is the corresponding dataset:

In [73]:
x_nan, mean_x_nan, std_x_nan = standardize_NAN(x)
x_nan = replace_mean(x_nan)

### Dealing with the outliers of the dataset

We also assumed that the dataset can have some ouliers. So to deal with them we implemented some methods that can remove the datasamples where some percentile is trespassed.

In [14]:
def get_ind_percentiles(tX, tX_clean, i, percentile):
    arguments = []
    a = np.percentile(tX_clean[:,i],percentile)
    tX_perc = tX.copy()
    arguments = np.argwhere(tX_perc[tX[:,i] > round(a, 2)])
    return list(set(arguments[:,0]))

def remove_rows_by_percentiles(tX,tX_clean):
    args = []
    for i in range(tX.shape[1]):
        args= args+get_ind_percentiles(tX,tX_clean,i,99.97)
    flat_list = [item for item in args]
    mylist = list(set(flat_list))
    return mylist

Below is the corresponding dataset:

In [15]:
arg = remove_rows_by_percentiles(x,x_clean)
x_perc = np.delete(x, arg, axis=0)
y_perc = np.delete(y, arg, axis=0)

### Partitionning of dataset, based on PRI_jet_num

While doing some data analysis, we saw that a specific column was only composed of 4 discrete values. This column was the "PRI_jet_num" feature.

In [16]:
#Feature names and their respective indices
string_features = 'DER_mass_MMC,DER_mass_transverse_met_lep,DER_mass_vis,DER_pt_h,DER_deltaeta_jet_jet,DER_mass_jet_jet,DER_prodeta_jet_jet,DER_deltar_tau_lep,DER_pt_tot,DER_sum_pt,DER_pt_ratio_lep_tau,DER_met_phi_centrality,DER_lep_eta_centrality,PRI_tau_pt,PRI_tau_eta,PRI_tau_phi,PRI_lep_pt,PRI_lep_eta,PRI_lep_phi,PRI_met,PRI_met_phi,PRI_met_sumet,PRI_jet_num,PRI_jet_leading_pt,PRI_jet_leading_eta,PRI_jet_leading_phi,PRI_jet_subleading_pt,PRI_jet_subleading_eta,PRI_jet_subleading_phi,PRI_jet_all_pt'
features = string_features.split(",")
dict = {}
for ind, feat in enumerate(features):
    dict[feat] = ind

We thus partitionned our dataset in the 4 different subgroups, corresponding to the value of the PRI_jet_num. This way, when doing the standardization on the dataset, we didn't bias our samples: indeed, the 4 groups have different kind of means and standard deviations values, which we used when standardizing each subgroup.

In [108]:
#Subgrouping
def subgrouping(x,ids,dict):
    x_0=x[x[:,dict['PRI_jet_num']]==0]
    x_1=x[x[:,dict['PRI_jet_num']]==1]
    x_2=x[x[:,dict['PRI_jet_num']]==2]
    x_3=x[x[:,dict['PRI_jet_num']]==3]
    x_0 = np.delete(x_0,dict['PRI_jet_num'],1)
    x_1 = np.delete(x_1,dict['PRI_jet_num'],1)
    x_2 = np.delete(x_2,dict['PRI_jet_num'],1)
    x_3 = np.delete(x_3,dict['PRI_jet_num'],1)
    x_list = [x_0, x_1, x_2, x_3]

    ids_0=ids[x[:,dict['PRI_jet_num']]==0]
    ids_1=ids[x[:,dict['PRI_jet_num']]==1]
    ids_2=ids[x[:,dict['PRI_jet_num']]==2]
    ids_3=ids[x[:,dict['PRI_jet_num']]==3]
    ids_list = [ids_0]
    ids_list.append(ids_1)
    ids_list.append(ids_2)
    ids_list.append(ids_3)
    

    #Standardization of subgroups
    mean = []
    std = []
    x_nan_replaced = []
    for i in range(4):
        x_arr,m,s = standardize_NAN(x_list[i])
        print(i, m, s)
        x_nan_replaced.append(replace_mean(x_arr))
        mean.append(m)
        std.append(s)
    return x_nan_replaced, ids_list
    
#Grouping them back again
def group(l,ids,dict):
    ls = l.copy()
    for i in range(4):
        ls[i] = np.insert(ls[i],dict['PRI_jet_num'],np.ones((len(ids),1))*i,axis=1)
    data_ord = np.insert(ls[0],0,ids[0], axis=1)
    for i in range(1,4):
        a = np.insert(ls[i],0,ids[i], axis=1)
        data_ord = np.concatenate((data_ord, a))
    x_new = data_ord[data_ord[:,0].argsort()]
    
    print(f"groups: tx {np.mean(x_new, axis=0)}{np.std(x_new,axis=0)}")
    
    return x_new[:,1:]


Below is the corresponding dataset:

In [109]:
x_subgroups_list, ids_list = subgrouping(x,ids,dict)
x_subgroups = group(x_subgroups_list, ids_list,dict)

0 [ 1.20667654e+02  5.87862388e+01  8.18703092e+01  1.38238670e+01
  0.00000000e+00  0.00000000e+00  0.00000000e+00  2.66496128e+00
  1.38238669e+01  7.63770107e+01  1.39276344e+00 -9.10076857e-01
  0.00000000e+00  3.40127231e+01 -2.48576361e-02 -1.56573619e-02
  4.23642875e+01 -5.23114410e-02  4.23519862e-02  3.15367606e+01
 -2.44434558e-02  1.25860810e+02  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00] [51.74971089 32.003391   38.04347886 16.67462352  0.          0.
  0.          0.69329161 16.67462348 23.56093879  0.5815929   0.93670201
  0.         15.2294643   1.23342423  1.81734217 14.58590597  1.31084863
  1.81783473 20.29443907  1.81099743 53.0863344   0.          0.
  0.          0.          0.          0.          0.        ]
1 [ 1.22182109e+02  4.60536000e+01  8.22190327e+01  6.59030903e+01
  0.00000000e+00  0.00000000e+00  0.00000000e+00  2.33968637e+00
  1.66449998e+01  1.50368035e+02  1.44418464e+00  2.356

### Feature engineering

When displaying all features that had Nan Values, we saw that many sample of our dataset had to replace their Nan value ba the mean value of the feature, as explained above in the standardization part. The problem is that having a mean value for 75% of our datasample is bad for some features. That's why we decided to remove some of them, and play only with the features that are essential to our model.

Indeed, the columns that were removed were all features that derived from some primitive ones. And as most of them had many Nan values, we assumed it was a good idea to remove them and see how our model will predict.

In [83]:
selected_columns0 = [1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 29]
selected_columns1 = [1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 29]
selected_columns_ideal = [0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 29]

def selected_non_nan_columns(x):
    x_selected = np.zeros((len(x), len(selected_columns0)))
    for i in range(len(x)):
        s = np.take(x[i], indices=selected_columns0, axis=0)
        x_selected[i] = s
    return x_selected

Below is the corresponding dataset:

In [84]:
x_s = selected_non_nan_columns(x)

We then build polynomial features, meaning we expanded the number of features we had by adding features with an incremeneted degree.

In [85]:
def build_poly(x, degree):
    nb_features = x.shape[1]
    nb_samples = x.shape[0]
    x_poly = np.ones((nb_samples, 1))
    for d in range(1, degree+1):
        x_d = x**d
        x_poly = np.hstack((x_poly, x_d))
    return x_poly

Below is the corresponding dataset:

In [86]:
x_poly = build_poly(x_s, 2)

## Implementations of the different ML algorithms

### 1. Least Squares Gradient Descent

In [24]:
def cross_validation_LS_GD_demo(x_LS,y_LS,K): 
    #Adding constant term
    tX_LS = np.c_[np.ones((y_LS.shape[0], 1)), x_LS]

    max_iters = 100
    gammas = np.logspace(-4,0,20)

    # Initialization
    w_initial = np.zeros(tX_LS.shape[1])

    list_tX_LS = np.split(tX_LS,K)
    list_y_LS = np.split(y_LS,K)

    gen_opt_w=[]
    gen_mse =[]

    #gamma selection
    for ind, gamma in enumerate(gammas):
        weights=[]
        mse_errors = []
        #K-fold crossvalidation
        for ind, tX_bloc in enumerate(list_tX_LS):
            tX_test = tX_bloc
            y_test = list_y_LS[ind]
            tX_train= list_tX_LS[:ind] + list_tX_LS[ind+1:]
            tX_train= np.concatenate(tX_train)
            y_train= list_y_LS[:ind] + list_y_LS[ind+1:]
            y_train=np.concatenate(y_train)
        
            mse, opt_w = least_squares_GD(y_train, tX_train, w_initial, max_iters, gamma)
            mse_errors.append(compute_mse(y_test, tX_test,opt_w))
            weights.append(opt_w)
        gen_mse.append(np.mean(mse_errors))
        gen_opt_w.append(np.mean(weights, axis=0))
    optimal_gamma_LS_GD = gammas[np.nanargmin(gen_mse)]
    optimal_weights_LS_GD = gen_opt_w[np.nanargmin(gen_mse)]
    mse_LS_GD = np.nanmin(gen_mse)
    print("   gamma={l:.3f}, mse={mse:.3f}".format(mse = mse_LS_GD, l = optimal_gamma_LS_GD))

    #Training Accuracy
    y_predicted = predict_labels(optimal_weights_LS_GD, tX_LS)
    accuracy = (list(y_LS == y_predicted).count(True))/len(y_LS)
    print("   accuracy={acc:.3f}".format(acc=accuracy))
    #return accuracy,optimal_gamma_LS_GD, optimal_wights_LS_GD,mse_LS_GD


### 2. Least Squares Stochastic Gradient Descent

In [25]:
def cross_validation_LS_SGD_demo(x_LS,y_LS,K):
    #Adding constant term
    tX_LS = np.c_[np.ones((y_LS.shape[0], 1)), x_LS]

    max_iters = 50
    max_batch_size = 32
    gammas = np.logspace(-4,0,20)
    batch_sizes = np.array([2,4,6,8])

    # Initialization
    w_initial = np.zeros(tX_LS.shape[1])
    list_tX_LS = np.split(tX_LS,K)
    list_y_LS = np.split(y_LS,K)


    result_mse =[]
    result_opt_w=[]
    result_gamma=[]
    for ind_batch,batch_size in enumerate(batch_sizes):  
        result_mse_gamma = []
        result_opt_w_gamma = []
        for ind_gamma,gamma in enumerate(gammas):
            mse_errors=[]
            weights=[]
            #K-fold crossvalidation
            for ind, tX_bloc in enumerate(list_tX_LS):
                tX_test = tX_bloc
                y_test = list_y_LS[ind]
                tX_train= list_tX_LS[:ind] + list_tX_LS[ind+1:]
                tX_train= np.concatenate(tX_train)
                y_train= list_y_LS[:ind] + list_y_LS[ind+1:]
                y_train=np.concatenate(y_train)
        
                sgd_mse, opt_w = least_squares_SGD(y_train, tX_train, w_initial, batch_size, max_iters, gamma)
                mse_errors.append(compute_mse(y_test, tX_test,opt_w))
                weights.append(opt_w)
    
            result_mse_gamma.append(np.mean(mse_errors))
            result_opt_w_gamma.append(np.mean(weights,axis=0))
        result_mse.append(np.min(result_mse_gamma))
        result_gamma.append(gammas[np.argmin(result_mse_gamma)])
        result_opt_w.append(result_opt_w_gamma[np.argmin(result_mse_gamma)])

    print("   gamma={l:.3f}, batch={b:.2f}, mse={mse:.3f}".format(mse = np.nanmin(result_mse), l =result_gamma[np.nanargmin(result_mse)], b=np.nanargmin(result_mse)))

    optimal_weights_LS_SGD = result_opt_w[np.nanargmin(result_mse)]
    

    #Training Accuracy
    y_predicted = predict_labels(optimal_weights_LS_SGD, tX_LS)
    accuracy = (list(y_LS == y_predicted).count(True))/len(y_LS)
    print("   accuracy={acc:.3f}".format(acc=accuracy))

### 3. Least Squares

In [26]:
def compute_least_squares(x, y):
    x_LS= x.copy()
    y_LS= y.copy()
    weights=[]
    mse_errors = []
    opt_w = []
    K_values = return_factors(len(x_LS))
    #K-fold crossvalidation
    for K in K_values:
        #Initialization
        list_x_LS = np.split(x_LS,K)
        list_y_LS = np.split(y_LS,K)
        for ind, x_bloc in enumerate(list_x_LS):
            x_test = x_bloc
            y_test = list_y_LS[ind]
            x_train = np.concatenate(list_x_LS[:ind] + list_x_LS[ind+1:])
            y_train = np.concatenate(list_y_LS[:ind] + list_y_LS[ind+1:])
            mse_LS, optimal_weights_LS = least_squares(y_train,x_train)
            mse_errors.append(compute_mse(y_test, x_test, optimal_weights_LS))
            weights.append(optimal_weights_LS)

    opt_w = weights[np.argmin(mse_errors)]
    y_model = predict_labels(opt_w, x_LS)

    #Computing accuracy
    print("   mse={mse}".format(mse = mse_LS))
    accuracy = (list(y_model == y_LS).count(True))/len(y_model)
    print("   accuracy={acc:.3f}".format(acc=accuracy))

### 4. Ridge Regression

In [None]:
def cross_validation_demo_RR(x,y,K=4):
    seed = 1
    degree = 4
    k_fold = K
    lambdas = np.logspace(-4, 0, 20)
    
    # split data in k fold
    k_indices = build_k_indices(y, k_fold, seed)
    x_test = x[k_indices[0]]
    x_train = np.delete(x, [k_indices[0]], axis=0)
    
    # define lists to store the loss of training data and test data
    rmse_tr = []
    rmse_te = []
    
    for i in range(len(lambdas)):
        l = lambdas[i]
        avg_err_tr = 0
        avg_err_te = 0
        for k in range(k_fold):
            err = cross_validation_rr(y, x, k_indices, k, l, degree)
            avg_err_tr += err[0]
            avg_err_te += err[1]
        rmse_tr.append(np.sqrt(2 * avg_err_tr / k_fold))
        rmse_te.append(np.sqrt(2 * avg_err_te / k_fold))
    cross_validation_visualization(lambdas, rmse_tr, rmse_te)
    
    min_err_index = 0
    for i in range(1, len(rmse_te)):
        if rmse_te[i] < rmse_te[min_err_index]:
            min_err_index = i
            
    print('Best lambda is: {0}'.format(lambdas[min_err_index]))       
    return lambdas[min_err_index]

In [None]:
def ridge_regression_demo(x_s,x,y,degree_opt=4):
    x_poly = build_poly(x_s, degree_opt)
    lambda_opt = cross_validation_demo_RR(x,y,K)
    w_rr_opt, loss_tr = ridge_regression_s(y_s, x_poly, lambda_opt)
    print("Training set mse: {}".format(loss_tr))

    #Training Accuracy
    y_predicted = predict_labels(w_rr_opt, x_poly)
    accuracy = []
    accuracy.append((list(y_clean == y_predicted).count(True))/len(y_clean))
    print("accuracy = {val}".format(val=accuracy))

### 5. Logistic Regression

In [94]:
def cross_validation_LR_demo(x_LR,y_LR,K):
    #Adding constant term
    tX_LR = np.c_[np.ones((y_LR.shape[0], 1)), x_LR]
    max_iters = 100
    gammas = np.logspace(-4,0,20)

    # Initialization
    w_initial = np.zeros(tX_LR.shape[1])

    list_tX_LR = np.split(tX_LR,K)
    list_y_LR = np.split(y_LR,K)

    gen_opt_w=[]
    gen_loss =[]

    #gamma selection
    for ind, gamma in enumerate(gammas):
        weights=[]
        loss_errors = []
        #K-fold crossvalidation
        for ind, tX_bloc in enumerate(list_tX_LR):
            tX_test = tX_bloc
            y_test = list_y_LR[ind]
            tX_train= list_tX_LR[:ind] + list_tX_LR[ind+1:]
            tX_train= np.concatenate(tX_train)
            y_train= list_y_LR[:ind] + list_y_LR[ind+1:]
            y_train=np.concatenate(y_train)
            loss, opt_w = logistic_regression(y_train,tX_train,w_initial, max_iters, gamma)
            loss_errors.append(calculate_loss_logistic_reg(y_test, tX_test,opt_w))
            weights.append(opt_w)
        
        gen_loss.append(np.mean(loss_errors))
        gen_opt_w.append(np.mean(weights, axis=0))

    optimal_gamma_LR = gammas[np.nanargmin(gen_loss)]
    optimal_weights_LR = gen_opt_w[np.nanargmin(gen_loss)]
    print("   gamma={l:.3f},loss={loss:.3f}".format(loss = np.min(gen_loss), l = optimal_gamma_LR))

     #Training Accuracy
    y_predicted = predict_labels(optimal_weights_LR, tX_LR)
    accuracy = (list(y_predicted == y_LR).count(True))/len(y_LR)
    print("   accuracy={acc:.3f}".format(acc=accuracy))


### 6. Regularized Logistic Regression

In [98]:
def cross_validation_LRR_demo(x_LRR,y_LRR,K):
    #Adding constant term
    tX_LRR = np.c_[np.ones((y_LRR.shape[0], 1)), x_LRR]

    max_iters = 50
    #lambdas = np.logspace(-4,0,10)
    lamdas = [0.006]
    #gammas = np.logspace(-4,0,20)
    gammas = [1e-4]
    # Initialization
    w_initial = np.zeros(tX_LRR.shape[1])
    list_tX_LRR = np.split(tX_LRR,K)
    list_y_LRR = np.split(y_LRR,K)

    result_loss =[]
    result_opt_w=[]
    result_gamma=[]
    for ind,lambda_ in enumerate(lambdas):  
        result_loss_gamma = []
        result_opt_w_gamma = []
        for ind_gamma,gamma in enumerate(gammas):
            loss_errors=[]
            weights=[]
            #K-fold crossvalidation
            for ind, tX_bloc in enumerate(list_tX_LRR):
                
                tX_test = tX_bloc
                y_test = list_y_LRR[ind]
                tX_train= list_tX_LRR[:ind] + list_tX_LRR[ind+1:]
                tX_train= np.concatenate(tX_train)
                y_train= list_y_LRR[:ind] + list_y_LRR[ind+1:]
                y_train=np.concatenate(y_train)
        
                loss, opt_w = reg_logistic_regression(y_train,tX_train,lambda_,w_initial,max_iters,gamma)
                loss_errors.append(calculate_loss_logistic_reg(y_test, tX_test,opt_w))
                weights.append(opt_w)
    
            result_loss_gamma.append(np.mean(loss_errors))
            result_opt_w_gamma.append(np.mean(weights,axis=0))
        result_loss.append(np.min(result_loss_gamma))
        result_gamma.append(np.argmin(result_loss_gamma))
        result_opt_w.append(result_opt_w_gamma[np.argmin(result_loss_gamma)])

    del result_loss_gamma
    del result_opt_w_gamma
    del loss_errors
    del weights
    print(np.min(result_loss))
    print(result_gamma[np.argmin(result_loss)])
    print(np.argmin(result_loss))
    print("   gamma={l:.3f}, batch={b:.0f}, mse={mse:.3f}".format(mse = np.min(result_loss), l =result_gamma[np.argmin(result_loss)], b=np.argmin(result_loss)))

    optimal_weights_LRR = result_opt_w[np.argmin(result_loss)]
    print(optimal_weights_LRR)

    #Training Accuracy
    y_predicted = predict_labels(optimal_weights_LRR, tX_LRR)
    accuracy = (list(y_predicted == y_LRR).count(True))/len(y_LRR)
    print("   accuracy={acc:.3f}".format(acc=accuracy))

## Test Part

### Accuracy for cleaned dataset (with removed samples containing Nans)

In [101]:
x1, m_X, s = standardize(x_clean.copy())
x1 = x1[0:68110]
y1 = y_clean[0:68110].copy()
print("Least-Square-GD")
#cross_validation_LS_GD_demo(x1,y1,5)
print("Least-Square-SDG")
#cross_validation_LS_SGD_demo(x1,y1,5)
print("Least-Square")
compute_least_squares(x1,y1)
#print("Ridge Regression")
#cross_validation_demo_RR(x1,y1,5)
print("Logistic Regression")
cross_validation_LR_demo(x1,y1,5)
print("Regularized Logistic Regression")
cross_validation_LRR_demo(x1,y1,5)

Least-Square-GD
Least-Square-SDG
Least-Square
   mse=0.739442671577914
   accuracy=0.720
Logistic Regression
logistic: tx [ 1.00000000e+00  9.06558806e-04  1.33780382e-03  6.24933869e-04
 -1.47443966e-03  5.53671848e-04  9.28986397e-04 -1.25193717e-03
  2.42491307e-03 -9.12795964e-04  2.09847364e-04  1.30510123e-03
  6.61380400e-04  2.33657654e-03 -1.37187462e-03 -1.50744978e-03
 -1.49209454e-03  1.77545260e-03  1.33345938e-03  1.57040564e-03
 -5.80787133e-04  3.67445661e-04 -7.04917159e-04 -7.03525780e-04
  5.14341003e-04  4.75896075e-03  5.11197086e-04  9.63971033e-04
  3.38677589e-04  5.21022467e-04  1.56228046e-04][0.         1.00304992 1.00221484 1.00121771 1.00136823 0.99989288
 1.00146039 0.99988482 0.99986202 0.99967504 0.99994954 0.99992292
 1.00201317 0.99956155 1.00242439 1.0017459  1.00002298 1.00257573
 0.99955735 0.99960208 0.99983199 0.99929841 0.99974946 0.99969941
 1.00010881 0.99988517 1.00145354 1.00136824 0.99949904 0.99990404
 0.99863095]
logistic: g 0.809333647080

logistic: g 0.7959413867018137
logistic: g 0.7956967516356956
logistic: g 0.7954521993876391
logistic: g 0.7952077299299217
logistic: g 0.7949633432348362
logistic: g 0.7947190392746818
logistic: g 0.7944748180217688
logistic: g 0.7942306794484166
logistic: g 0.7939866235269524
logistic: g 0.7937426502297136
logistic: g 0.7934987595290458
logistic: g 0.7932549513973075
logistic: g 0.793011225806862
logistic: g 0.792767582730083
logistic: g 0.7925240221393565
logistic: g 0.7922805440070744
logistic: g 0.7920371483056363
logistic: g 0.7917938350074577
logistic: g 0.7915506040849564
logistic: g 0.7913074555105629
logistic: g 0.7910643892567165
logistic: g 0.7908214052958665
logistic: g 0.7905785036004689
logistic: g 0.7903356841429895
logistic: g 0.7900929468959076
logistic: g 0.7898502918317057
logistic: g 0.7896077189228795
logistic: g 0.7893652281419322
logistic: g 0.7891228194613759
logistic: g 0.7888804928537344
logistic: g 0.7886382482915382
logistic: g 0.7883960857473271
logistic: 

logistic: g 0.7912628994384009
logistic: g 0.7910181789393678
logistic: g 0.7907735419361815
logistic: g 0.7905289884006775
logistic: g 0.7902845183046971
logistic: g 0.7900401316200928
logistic: g 0.7897958283187275
logistic: g 0.7895516083724742
logistic: g 0.7893074717532125
logistic: g 0.7890634184328331
logistic: g 0.7888194483832385
logistic: g 0.7885755615763348
logistic: g 0.7883317579840438
logistic: g 0.7880880375782952
logistic: g 0.7878444003310242
logistic: g 0.7876008462141804
logistic: g 0.7873573751997208
logistic: g 0.7871139872596117
logistic: g 0.7868706823658299
logistic: g 0.78662746049036
logistic: g 0.7863843216051977
logistic: g 0.786141265682348
logistic: g 0.7858982926938237
logistic: g 0.7856554026116488
logistic: g 0.785412595407856
logistic: g 0.785169871054488
logistic: g 0.784927229523597
logistic: g 0.7846846707872436
logistic: g 0.7844421948174986
logistic: g 0.7841998015864419
logistic: g 0.7839574910661634
logistic: g 0.7837152632287621
logistic: g 0.

logistic: g 0.7785727043215696
logistic: g 0.7781807450110727
logistic: g 0.7777890038087323
logistic: g 0.7773974805946268
logistic: g 0.7770061752489013
logistic: g 0.776615087651766
logistic: g 0.7762242176834983
logistic: g 0.7758335652244428
logistic: g 0.7754431301550048
logistic: g 0.7750529123556597
logistic: g 0.7746629117069497
logistic: g 0.7742731280894802
logistic: g 0.7738835613839206
logistic: g 0.7734942114710124
logistic: g 0.7731050782315555
logistic: g 0.7727161615464212
logistic: g 0.772327461296542
logistic: g 0.771938977362918
logistic: g 0.7715507096266173
logistic: g 0.771162657968769
logistic: g 0.7707748222705698
logistic: g 0.7703872024132792
logistic: g 0.7699997982782286
logistic: g 0.769612609746808
logistic: g 0.7692256367004768
logistic: g 0.7688388790207566
logistic: tx [ 1.00000000e+00 -6.17667785e-04 -1.35492644e-03  3.19907365e-04
 -3.03187482e-04 -2.82965135e-03 -3.86490780e-03  3.59155767e-03
  2.80796563e-04  2.02963357e-03  1.18072159e-03  2.6625

logistic: g 0.7659696977353744
logistic: g 0.765586894561219
logistic: g 0.765204304048185
logistic: g 0.7648219260790408
logistic: g 0.7644397605366232
logistic: tx [ 1.00000000e+00 -6.65530356e-04  1.67055068e-03 -1.13005664e-03
 -5.74115484e-04 -2.55934401e-03 -1.71518530e-03  2.19162347e-03
 -1.35585764e-03 -5.86601822e-04 -1.53019046e-03 -9.52356514e-04
  8.71761465e-04 -1.14888174e-03 -3.42257373e-04 -1.46675886e-03
 -3.14031149e-04 -7.94004329e-04 -1.68363846e-03 -8.71825114e-04
 -3.42247404e-06  5.55385840e-04 -1.08012129e-03 -6.63616496e-04
 -1.95736909e-03 -3.84011200e-03 -8.17904894e-04 -1.36124376e-03
  3.46463271e-03  3.63191642e-04 -1.53515330e-03][0.         1.00160631 1.00526527 0.99982048 0.99677444 0.99900542
 1.00212897 0.99999627 0.9992884  0.99965283 0.99879903 0.99769387
 0.99983978 1.0004812  0.99710382 1.00121027 0.99999426 0.99991747
 1.00173758 0.99833994 0.99785845 0.99820961 0.99794524 0.99971648
 0.99759224 0.99953285 1.00110103 1.0009125  0.99888897 0.9990

logistic: g 0.7937384968528366
logistic: g 0.7930883463535946
logistic: g 0.7924387831623826
logistic: g 0.7917898067548369
logistic: g 0.791141416607065
logistic: g 0.79049361219564
logistic: g 0.7898463929976048
logistic: g 0.7891997584904699
logistic: g 0.7885537081522119
logistic: g 0.7879082414612764
logistic: g 0.7872633578965735
logistic: g 0.7866190569374805
logistic: g 0.7859753380638397
logistic: g 0.7853322007559602
logistic: g 0.7846896444946134
logistic: g 0.7840476687610367
logistic: g 0.7834062730369313
logistic: g 0.7827654568044609
logistic: g 0.7821252195462534
logistic: g 0.7814855607453998
logistic: g 0.7808464798854511
logistic: g 0.7802079764504206
logistic: g 0.7795700499247883
logistic: g 0.7789326997934862
logistic: g 0.7782959255419141
logistic: g 0.7776597266559291
logistic: g 0.7770241026218482
logistic: g 0.7763890529264466
logistic: g 0.7757545770569608
logistic: g 0.7751206745010849
logistic: g 0.7744873447469706
logistic: g 0.7738545872832261
logistic: g

logistic: g 0.7675235811354022
logistic: g 0.7669005981909406
logistic: g 0.7662781772138244
logistic: g 0.7656563177010148
logistic: g 0.7650350191499269
logistic: g 0.7644142810584315
logistic: g 0.7637941029248527
logistic: g 0.7631744842479721
logistic: g 0.7625554245270172
logistic: g 0.7619369232616783
logistic: g 0.7613189799520907
logistic: g 0.7607015940988461
logistic: g 0.7600847652029866
logistic: g 0.7594684927660056
logistic: g 0.7588527762898472
logistic: g 0.7582376152769073
logistic: g 0.7576230092300318
logistic: g 0.7570089576525146
logistic: g 0.7563954600481003
logistic: g 0.7557825159209816
logistic: g 0.755170124775801
logistic: g 0.7545582861176475
logistic: g 0.7539469994520569
logistic: g 0.7533362642850159
logistic: g 0.7527260801229544
logistic: g 0.7521164464727474
logistic: g 0.7515073628417217
logistic: g 0.7508988287376436
logistic: g 0.7502908436687276
logistic: g 0.749683407143631
logistic: g 0.7490765186714586
logistic: g 0.7484701777617533
logistic: 

logistic: g 0.7157748601905899
logistic: g 0.7148364793241058
logistic: g 0.7138994722185416
logistic: g 0.712963836888302
logistic: g 0.7120295713506627
logistic: g 0.7110966736257653
logistic: g 0.7101651417366187
logistic: tx [ 1.00000000e+00 -8.36393404e-04  2.35163884e-04 -6.42871926e-04
  1.11764349e-03  2.92874922e-03  2.25754715e-03 -2.85253191e-03
 -2.10109443e-03 -1.48951792e-03 -9.47539980e-04 -6.50183185e-04
 -7.94379646e-05 -1.36045245e-03  2.90461756e-03  1.53842546e-03
  7.12947749e-04 -3.96140830e-04  8.70225460e-05 -8.72957340e-04
  1.02740870e-03 -7.03838656e-04 -6.48032120e-04 -3.65681278e-03
  6.50408490e-04  5.34282947e-04  2.57624667e-03 -3.17054402e-03
 -2.99510827e-04  3.62781946e-03 -1.83005183e-03][0.         1.00321375 1.00243494 1.00199688 1.0002725  1.002381
 1.00250382 1.00295822 0.99924906 0.99984387 1.00062074 1.00143746
 0.99992998 0.99970427 1.00697699 0.99886387 0.99996758 0.99858649
 0.99916288 1.00004498 0.99915047 1.000325   1.00019551 0.99843119
 

logistic: g 0.7833715154312635
logistic: g 0.7823367337625495
logistic: g 0.7813034675458781
logistic: g 0.7802717145779591
logistic: g 0.7792414726587464
logistic: g 0.7782127395914298
logistic: g 0.7771855131824267
logistic: g 0.7761597912413855
logistic: g 0.775135571581173
logistic: g 0.7741128520178733
logistic: g 0.773091630370781
logistic: g 0.772071904462397
logistic: g 0.7710536721184255
logistic: g 0.7700369311677641
logistic: g 0.7690216794425052
logistic: g 0.7680079147779265
logistic: g 0.766995635012487
logistic: g 0.7659848379878225
logistic: g 0.7649755215487413
logistic: g 0.7639676835432203
logistic: g 0.7629613218223964
logistic: g 0.7619564342405654
logistic: g 0.7609530186551736
logistic: g 0.7599510729268166
logistic: g 0.7589505949192343
logistic: g 0.757951582499303
logistic: g 0.756954033537032
logistic: g 0.7559579459055592
logistic: g 0.7549633174811471
logistic: g 0.7539701461431751
logistic: g 0.7529784297741395
logistic: g 0.7519881662596418
logistic: g 0.

logistic: g 0.6995824307858426
logistic: g 0.6980972486839899
logistic: g 0.6966155983479637
logistic: g 0.6951374714888795
logistic: g 0.693662859837308
logistic: g 0.6921917551432298
logistic: g 0.6907241491759898
logistic: g 0.6892600337242452
logistic: g 0.6877994005959284
logistic: g 0.6863422416181925
logistic: g 0.68488854863737
logistic: g 0.6834383135189249
logistic: g 0.6819915281474065
logistic: g 0.6805481844264053
logistic: g 0.6791082742785032
logistic: g 0.677671789645235
logistic: g 0.6762387224870378
logistic: g 0.6748090647832048
logistic: g 0.6733828085318425
logistic: g 0.6719599457498276
logistic: g 0.6705404684727567
logistic: g 0.6691243687549041
logistic: g 0.6677116386691782
logistic: g 0.6663022703070762
logistic: g 0.6648962557786379
logistic: g 0.6634935872124007
logistic: g 0.662094256755359
logistic: g 0.6606982565729173
logistic: g 0.659305578848846
logistic: g 0.657916215785238
logistic: g 0.6565301596024613
logistic: g 0.6551474025391244
logistic: tx [ 

logistic: g 0.6750423980240013
logistic: g 0.6736266423521761
logistic: g 0.6722142264458458
logistic: g 0.6708051425333067
logistic: g 0.6693993828609022
logistic: g 0.6679969396929754
logistic: g 0.66659780531183
logistic: g 0.6652019720176859
logistic: g 0.6638094321286399
logistic: g 0.6624201779806189
logistic: g 0.6610342019273435
logistic: g 0.6596514963402784
logistic: g 0.6582720536085972
logistic: g 0.6568958661391392
logistic: g 0.6555229263563646
logistic: g 0.6541532267023149
logistic: g 0.6527867596365724
logistic: g 0.6514235176362156
logistic: g 0.6500634931957823
logistic: g 0.6487066788272235
logistic: g 0.6473530670598645
logistic: g 0.6460026504403659
logistic: tx [ 1.00000000e+00  1.08987999e-03 -1.79093511e-03  8.28760116e-04
  1.28458807e-03  2.19364116e-03  2.53943813e-03 -1.77881923e-03
  5.86543847e-04  9.92232673e-04  9.49820464e-04 -2.42780413e-03
 -7.40405980e-04  1.57601828e-03  1.27069729e-04 -6.19858553e-04
 -3.75801281e-04 -3.09606706e-03  2.85102984e-0

logistic: g 0.6688549613679455
logistic: g 0.667444435288657
logistic: g 0.6660372628784631
logistic: g 0.6646334362725004
logistic: g 0.6632329476243155
logistic: g 0.6618357891058209
logistic: g 0.6604419529072529
logistic: g 0.659051431237127
logistic: g 0.657664216322195
logistic: g 0.6562803004073988
logistic: g 0.6548996757558291
logistic: g 0.653522334648682
logistic: g 0.6521482693852148
logistic: tx [ 1.00000000e+00  9.06558806e-04  1.33780382e-03  6.24933869e-04
 -1.47443966e-03  5.53671848e-04  9.28986397e-04 -1.25193717e-03
  2.42491307e-03 -9.12795964e-04  2.09847364e-04  1.30510123e-03
  6.61380400e-04  2.33657654e-03 -1.37187462e-03 -1.50744978e-03
 -1.49209454e-03  1.77545260e-03  1.33345938e-03  1.57040564e-03
 -5.80787133e-04  3.67445661e-04 -7.04917159e-04 -7.03525780e-04
  5.14341003e-04  4.75896075e-03  5.11197086e-04  9.63971033e-04
  3.38677589e-04  5.21022467e-04  1.56228046e-04][0.         1.00304992 1.00221484 1.00121771 1.00136823 0.99989288
 1.00146039 0.999

logistic: g 0.7508662571247828
logistic: g 0.748277277465456
logistic: g 0.7456982000691535
logistic: g 0.7431289875241324
logistic: g 0.7405696025599625
logistic: g 0.7380200080469882
logistic: g 0.7354801669957859
logistic: g 0.7329500425566227
logistic: g 0.730429598018928
logistic: g 0.7279187968107543
logistic: g 0.7254176024982483
logistic: g 0.7229259787851205
logistic: g 0.720443889512118
logistic: g 0.7179712986564978
logistic: g 0.7155081703315049
logistic: g 0.7130544687858504
logistic: g 0.7106101584031919
logistic: g 0.7081752037016145
logistic: g 0.7057495693331157
logistic: g 0.7033332200830954
logistic: g 0.7009261208698371
logistic: g 0.6985282367440059
logistic: g 0.6961395328881341
logistic: g 0.6937599746161213
logistic: g 0.6913895273727227
logistic: g 0.6890281567330584
logistic: g 0.6866758284020993
logistic: g 0.6843325082141818
logistic: g 0.6819981621325026
logistic: g 0.6796727562486292
logistic: g 0.6773562567820052
logistic: g 0.6750486300794581
logistic: g

KeyboardInterrupt: 

### Accuracy for standardized dataset 

In [102]:
x2= x_nan.copy()
y2= y.copy()
print("Least-Square-GD")
#cross_validation_LS_GD_demo(x2,y2,5)
print("Least-Square-SDG")
#cross_validation_LS_SGD_demo(x2,y2,5)
print("Least-Square")
compute_least_squares(x2,y2)
#print("Ridge Regression")
#cross_validation_demo_RR(x2,y2,5)
print("Logistic Regression")
cross_validation_LR_demo(x2,y2,5)
print("Regularized Logistic Regression")
cross_validation_LRR_demo(x2,y2,5)

Least-Square-GD
Least-Square-SDG
Least-Square
   mse=0.6987273652535154
   accuracy=0.742
Logistic Regression
logistic: tx [ 1.00000000e+00  2.72906387e-02 -1.11526814e-03 -1.32354424e-04
 -2.24835306e-04 -2.58126024e-01  1.86308253e-01  2.87995263e-01
  6.63724179e-04 -3.22792125e-04  2.07847053e-04  7.35909364e-04
  1.17624719e-03  5.80138054e-02 -5.68243246e-04  6.61344805e-04
 -3.61934550e-04  4.20637353e-04  7.76560338e-04  1.26857884e-04
 -2.09183674e-04 -6.90654908e-04  1.82969563e-04  1.26864038e-04
  1.40882526e-01  2.02694997e-02 -1.82871816e-01  3.08289105e-02
  2.11931630e-01 -1.33554024e-01  2.80601849e-04][0.         0.92327414 0.99910185 1.00058822 0.99695026 0.56339917
 0.55224503 0.56933956 0.99916684 0.96535855 0.99975604 1.00085028
 1.00071346 0.53986142 1.00180488 1.00043027 0.99935549 1.00115283
 0.99940366 1.00039042 0.9861732  0.99922396 0.99952522 1.00000165
 0.79413268 0.77591968 0.80759923 0.53980828 0.55491645 0.54563248
 0.99957361]
logistic: g 0.75505215469

logistic: g 0.751827195322328
logistic: g 0.7516066146705432
logistic: g 0.7513861563735698
logistic: g 0.7511658203439003
logistic: g 0.7509456064940956
logistic: g 0.750725514736767
logistic: g 0.7505055449845921
logistic: g 0.7502856971503035
logistic: g 0.750065971146696
logistic: g 0.7498463668866215
logistic: g 0.7496268842829906
logistic: g 0.7494075232487738
logistic: g 0.7491882836970022
logistic: g 0.7489691655407642
logistic: g 0.7487501686932065
logistic: g 0.7485312930675356
logistic: g 0.7483125385770184
logistic: g 0.7480939051349788
logistic: g 0.7478753926547983
logistic: g 0.7476570010499223
logistic: g 0.7474387302338501
logistic: g 0.7472205801201416
logistic: g 0.747002550622414
logistic: g 0.7467846416543493
logistic: g 0.746566853129676
logistic: g 0.746349184962196
logistic: g 0.7461316370657591
logistic: g 0.7459142093542775
logistic: g 0.7456969017417217
logistic: g 0.7454797141421196
logistic: g 0.7452626464695629
logistic: g 0.7450456986381938
logistic: g 0.

logistic: g 0.7524798523802887
logistic: g 0.7522588053958259
logistic: g 0.7520378806055555
logistic: g 0.7518170779223843
logistic: g 0.7515963972592904
logistic: g 0.7513758385292987
logistic: g 0.751155401645499
logistic: g 0.7509350865210408
logistic: g 0.750714893069128
logistic: g 0.750494821203027
logistic: g 0.7502748708360615
logistic: g 0.7500550418816159
logistic: g 0.7498353342531298
logistic: g 0.7496157478641026
logistic: g 0.7493962826280959
logistic: g 0.7491769384587235
logistic: g 0.7489577152696651
logistic: g 0.7487386129746527
logistic: g 0.748519631487479
logistic: g 0.748300770721996
logistic: g 0.7480820305921141
logistic: g 0.7478634110118009
logistic: g 0.747644911895087
logistic: g 0.74742653315605
logistic: g 0.7472082747088399
logistic: g 0.7469901364676566
logistic: g 0.7467721183467572
logistic: g 0.7465542202604636
logistic: g 0.7463364421231534
logistic: g 0.7461187838492571
logistic: g 0.7459012453532695
logistic: g 0.7456838265497424
logistic: g 0.74

logistic: g 0.7419944098649908
logistic: g 0.7416443325381874
logistic: g 0.7412945695960569
logistic: g 0.7409451206714439
logistic: g 0.7405959853976032
logistic: g 0.7402471634081905
logistic: g 0.7398986543372735
logistic: g 0.7395504578193292
logistic: g 0.7392025734892351
logistic: g 0.7388550009822831
logistic: g 0.7385077399341672
logistic: g 0.7381607899809862
logistic: g 0.7378141507592465
logistic: g 0.7374678219058599
logistic: g 0.7371218030581397
logistic: g 0.7367760938538087
logistic: g 0.7364306939309916
logistic: g 0.7360856029282179
logistic: g 0.7357408204844178
logistic: g 0.7353963462389264
logistic: g 0.7350521798314829
logistic: g 0.7347083209022297
logistic: g 0.7343647690917066
logistic: g 0.7340215240408601
logistic: g 0.7336785853910364
logistic: g 0.7333359527839814
logistic: g 0.7329936258618442
logistic: g 0.732651604267174
logistic: g 0.7323098876429197
logistic: g 0.7319684756324274
logistic: g 0.7316273678794487
logistic: g 0.7312865640281275
logistic:

logistic: g 0.7426937103861355
logistic: g 0.7423411112999494
logistic: g 0.7419888305539051
logistic: g 0.7416368677755761
logistic: g 0.7412852225929495
logistic: g 0.7409338946344353
logistic: g 0.7405828835288377
logistic: g 0.7402321889053951
logistic: g 0.7398818103937425
logistic: g 0.739531747623937
logistic: g 0.7391820002264412
logistic: g 0.7388325678321318
logistic: g 0.738483450072297
logistic: g 0.7381346465786335
logistic: g 0.737786156983256
logistic: g 0.7374379809186764
logistic: g 0.7370901180178262
logistic: g 0.7367425679140441
logistic: g 0.7363953302410781
logistic: g 0.7360484046330811
logistic: g 0.7357017907246216
logistic: g 0.7353554881506693
logistic: g 0.7350094965466063
logistic: g 0.7346638155482162
logistic: g 0.7343184447917
logistic: g 0.7339733839136519
logistic: g 0.7336286325510838
logistic: g 0.7332841903414069
logistic: g 0.73294005692244
logistic: g 0.7325962319324111
logistic: g 0.7322527150099489
logistic: g 0.7319095057940842
logistic: g 0.73

logistic: g 0.727878995621463
logistic: g 0.7273327107943717
logistic: g 0.7267872097864476
logistic: g 0.726242491116365
logistic: g 0.7256985533054892
logistic: g 0.7251553948778701
logistic: g 0.724613014360239
logistic: g 0.724071410282014
logistic: g 0.7235305811752774
logistic: g 0.7229905255747939
logistic: g 0.722451242017992
logistic: g 0.7219127290449635
logistic: g 0.7213749851984625
logistic: g 0.7208380090239004
logistic: g 0.7203017990693398
logistic: g 0.7197663538854939
logistic: g 0.719231672025723
logistic: g 0.7186977520460305
logistic: g 0.7181645925050565
logistic: g 0.7176321919640756
logistic: g 0.7171005489869957
logistic: g 0.7165696621403498
logistic: g 0.7160395299932949
logistic: g 0.7155101511176102
logistic: g 0.7149815240876917
logistic: g 0.7144536474805463
logistic: g 0.7139265198757903
logistic: g 0.7134001398556487
logistic: g 0.7128745060049413
logistic: g 0.7123496169110931
logistic: g 0.7118254711641202
logistic: g 0.7113020673566335
logistic: g 0.

KeyboardInterrupt: 

### Accuracy for standardized subgroups

In [99]:
x3 = x_subgroups
#x3 = x3 / x3.max(axis=0)
y3 = y.copy()
print("Least-Square-GD")
#cross_validation_LS_GD_demo(x3,y3,5)
print("Least-Square-SDG")
#cross_validation_LS_SGD_demo(x3,y3,5)
print("Least-Square")
compute_least_squares(x3,y3)
#print("Ridge Regression")
#cross_validation_demo_RR(x3,y3,5)
print("Logistic Regression")
cross_validation_LR_demo(x3,y3,2)
print("Regularized Logistic Regression")
cross_validation_LRR_demo(x3,y3,5)

Least-Square-GD
Least-Square-SDG
Least-Square


  return (compute_mse(y,tx,opt_w), opt_w)


   mse=0.6980640368286709
   accuracy=0.743
Logistic Regression
logistic: tx [ 1.00000000e+00  2.87499500e+05 -9.18525143e-03 -4.80303490e-03
 -2.33681188e-03  2.01804823e-03  7.68831412e-04  6.44060568e-04
 -1.20109232e-03 -2.74561223e-03  2.20210210e-04  1.95818087e-03
  4.90876828e-04  2.54135243e-03  2.12014156e-04  1.82406071e-03
  8.96160612e-04  1.00630071e-03  1.10527821e-03  1.41038895e-03
 -6.09465049e-04 -1.02380990e-05  7.89812234e-04  3.88481005e-03
  9.79216000e-01  9.79216000e-01  9.79216000e-01  9.79216000e-01
  2.40939776e-03  1.05320187e-03  5.89070010e-04 -4.69114204e-04
 -1.77472527e-04  1.15291623e-03 -6.63960953e-02][0.00000000e+00 3.60843918e+04 9.24876896e-01 9.97959269e-01
 1.00036848e+00 9.44443185e-01 5.39608712e-01 5.38853815e-01
 5.39563916e-01 9.97572918e-01 9.37261043e-01 1.00915294e+00
 1.00225880e+00 1.00100449e+00 5.38250993e-01 1.01103291e+00
 1.00041342e+00 9.99221799e-01 1.00717477e+00 9.98462089e-01
 9.99754693e-01 9.64434005e-01 9.99398502e-01 1.0

  if len(losses) > 1 and np.abs(losses[-1] - losses[-2]) < threshold:


logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: tx [ 1.00000000e+00  2.87499500e+05 -9.18525143e-03 -4.80303490e-03
 -2.33681188e-03  2.01804823e-03  7.68831412e-04  6.44060568e-04
 -1.20109232e-03 -2.74561223e-03  2.20210210e-04  1.95818087e-03
  4.90876828e-04  2.54135243e-03  2.12014156e-04  1.82406071e-03
  8.96160612e-04  1.00630071e-03  1.10527821e-03  1.41038895e-03
 -6.09465049e-04 -1.02380990e-05  7.89812234e-04  3.88481005e-03
  9.7

logistic: g 1.1460594146234004e+71
logistic: g 2.537009343784639e+78
logistic: g 5.616128036926936e+85
logistic: g 1.2432312953213215e+93
logistic: g 2.752116838333476e+100
logistic: g 6.092307296592816e+107
logistic: g 1.3486421680626633e+115
logistic: g 2.9854628286625645e+122
logistic: g 6.608860758172395e+129
logistic: g 1.4629905990314296e+137
logistic: g 3.2385937170905334e+144
logistic: g 7.169211662277403e+151
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
lo

 5.37888076e-01 5.38237010e-01 7.76431045e-01]
logistic: g 51511.9056053919
logistic: g 611074813151.0148
logistic: g 7.249050931123945e+18
logistic: g 8.599395404805032e+25
logistic: g 1.0201280420127453e+33
logistic: g 1.210156264612828e+40
logistic: g 1.435582715569915e+47
logistic: g 1.7030013342140114e+54
logistic: g 2.0202343709490393e+61
logistic: g 2.3965611955597945e+68
logistic: g 2.8429897276546655e+75
logistic: g 3.3725784288441706e+82
logistic: g 4.000818275234585e+89
logistic: g 4.746085883297529e+96
logistic: g 5.630181043480489e+103
logistic: g 6.678964384930799e+110
logistic: g 7.923113823636086e+117
logistic: g 9.399021920812888e+124
logistic: g 1.1149860400134813e+132
logistic: g 1.3226842961947556e+139
logistic: g 1.5690723332994098e+146
logistic: g 1.8613572371037986e+153
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logi

logistic: tx [ 1.00000000e+00  2.87499500e+05 -9.18525143e-03 -4.80303490e-03
 -2.33681188e-03  2.01804823e-03  7.68831412e-04  6.44060568e-04
 -1.20109232e-03 -2.74561223e-03  2.20210210e-04  1.95818087e-03
  4.90876828e-04  2.54135243e-03  2.12014156e-04  1.82406071e-03
  8.96160612e-04  1.00630071e-03  1.10527821e-03  1.41038895e-03
 -6.09465049e-04 -1.02380990e-05  7.89812234e-04  3.88481005e-03
  9.79216000e-01  9.79216000e-01  9.79216000e-01  9.79216000e-01
  2.40939776e-03  1.05320187e-03  5.89070010e-04 -4.69114204e-04
 -1.77472527e-04  1.15291623e-03 -6.63960953e-02][0.00000000e+00 3.60843918e+04 9.24876896e-01 9.97959269e-01
 1.00036848e+00 9.44443185e-01 5.39608712e-01 5.38853815e-01
 5.39563916e-01 9.97572918e-01 9.37261043e-01 1.00915294e+00
 1.00225880e+00 1.00100449e+00 5.38250993e-01 1.01103291e+00
 1.00041342e+00 9.99221799e-01 1.00717477e+00 9.98462089e-01
 9.99754693e-01 9.64434005e-01 9.99398502e-01 1.00444640e+00
 9.76704677e-01 9.76704677e-01 9.76704677e-01 9.7670

logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic

logistic: g 7.495976830457405e+83
logistic: g 6.181876113545681e+91
logistic: g 5.098147065763363e+99
logistic: g 4.204403813140166e+107
logistic: g 3.4673404270068457e+115
logistic: g 2.85948975671219e+123
logistic: g 2.3581998482336544e+131
logistic: g 1.9447898042493233e+139
logistic: g 1.6038536282432003e+147
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g n

logistic: g 4.839125354377439e+66
logistic: g 3.1883503345952956e+75
logistic: g 2.100705625845855e+84
logistic: g 1.3840900978093363e+93
logistic: g 9.119342449908885e+101
logistic: g 6.008453268348297e+110
logistic: g 3.958784405369717e+119
logistic: g 2.6083208553449556e+128
logistic: g 1.7185421047934224e+137
logistic: g 1.1322943494071087e+146
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic:

logistic: g 18187735492329.57
logistic: g 6.421694528160354e+21
logistic: g 2.2673609163931492e+30
logistic: g 8.00555912873656e+38
logistic: g 2.8265891195499802e+47
logistic: g 9.980072499969468e+55
logistic: g 3.523746922244736e+64
logistic: g 1.2441585341255956e+73
logistic: g 4.392853664563054e+81
logistic: g 1.5510212556494874e+90
logistic: g 5.476319311255252e+98
logistic: g 1.9335694523586023e+107
logistic: g 6.827013938742307e+115
logistic: g 2.41047040037421e+124
logistic: g 8.510847646153494e+132
logistic: g 3.0049955247237887e+141
logistic: g 1.0609986782798471e+150
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan

logistic: g 3.5991307031501743e+80
logistic: g 1.0152563847777775e+90
logistic: g 2.8638735623851326e+99
logistic: g 8.078522730121766e+108
logistic: g 2.2788202090437633e+118
logistic: g 6.428182130110799e+127
logistic: g 1.8132858982857218e+137
logistic: g 5.114985360355956e+146
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g na

logistic: g 2.689722418619845e+41
logistic: g 4.0659068473811365e+50
logistic: g 6.146209875464953e+59
logistic: g 9.290890630609134e+68
logistic: g 1.4044533209730412e+78
logistic: g 2.123035572385036e+87
logistic: g 3.2092772143466437e+96
logistic: g 4.851289527360134e+105
logistic: g 7.333430086084177e+114
logistic: g 1.1085546744671172e+124
logistic: g 1.6757417086648657e+133
logistic: g 2.533127448593268e+142
logistic: g 3.829190762297799e+151
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
log

logistic: g 1.3094437518928883e+25
logistic: g 1.5814079078413284e+35
logistic: g 1.9098575004599825e+45
logistic: g 2.3065242395570588e+55
logistic: g 2.7855764455635863e+65
logistic: g 3.3641251199548536e+75
logistic: g 4.062835123672759e+85
logistic: g 4.906663293893948e+95
logistic: g 5.925749863529373e+105
logistic: g 7.156495023576676e+115
logistic: g 8.642859081462077e+125
logistic: g 1.0437932655010557e+136
logistic: g 1.2605832986936226e+146
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
l

logistic: g 1.5854001322473638e+83
logistic: g 1.026048286216326e+93
logistic: g 6.64043773072677e+102
logistic: g 4.2975963069210835e+112
logistic: g 2.7813428521134847e+122
logistic: g 1.8000453063831257e+132
logistic: g 1.1649635723872756e+142
logistic: g 7.539477590796088e+151
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g na

logistic: g 2.4001851394443574e+26
logistic: g 1.241025623107395e+37
logistic: g 6.416774155870505e+47
logistic: g 3.317819535776385e+58
logistic: g 1.7154922714411987e+69
logistic: g 8.870023524910668e+79
logistic: g 4.586282237597684e+90
logistic: g 2.371356141709418e+101
logistic: g 1.2261194709570512e+112
logistic: g 6.339701281547238e+122
logistic: g 3.27796868831057e+133
logistic: g 1.694887226440761e+144
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic:

logistic: g 4.958819793974831e+98
logistic: g 1.3739982600189797e+109
logistic: g 3.807097851849782e+119
logistic: g 1.0548771767264893e+130
logistic: g 2.9228717024905868e+140
logistic: g 8.098742846756435e+150
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g inf
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logistic: g nan
logi

ValueError: All-NaN slice encountered

### Accuracy with removed features

In [None]:
x4 =x_s.copy()
y4 =y_s.copy()
print("Least-Square-GD")
cross_validation_LS_GD_demo(x4,y4,4)
print("Least-Square-SDG")
cross_validation_LS_SGD_demo(x4,y4,4)
print("Least-Square")
compute_least_squares(x4,y4)
#print("Ridge Regression")
#cross_validation_demo_RR(x4,y4,4)
print("Logistic Regression")
cross_validation_LR_demo(x4,y4,4)
print("Regularized Logistic Regression")
cross_validation_LRR_demo(x4,y4,4)

## Generate predictions and Save output in CSV format for submission

In [None]:
DATA_TEST_PATH = '../data/test.csv/test.csv' 
_, tX_test, ids_test = load_csv_data(DATA_TEST_PATH)

In [None]:
OUTPUT_PATH = './logisticRegression_x_te_s' # TODO: fill in desired name of output file for submission
tX_test = np.c_[np.ones((tX_test.shape[0], 1)), tX_test]
tX_test = selected_non_nan_columns(tX_test)
y_pred = predict_labels(optimal_weights_LR, tX_test)
create_csv_submission(ids_test, y_pred, OUTPUT_PATH)