Logistic regression is a classification method. In its basic form it handles binary classification, that is, a target variable with two possible labels. That is not our case: the `class_` target column has three possible values (star, galaxy, and quasar), so we need a multinomial logistic regression model. Multinomial logistic regression extends logistic regression by replacing the loss function with cross-entropy loss and the predicted probability distribution with a multinomial probability distribution, so that it natively supports multi-class classification problems.

The approach changes the logistic regression model so that it predicts multiple class labels directly: for each input example, the model predicts the probability that the example belongs to each known class label.
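To make the two changes described above concrete, here is a minimal NumPy sketch of a softmax function, which turns raw class scores into a probability for each class, and of the cross-entropy loss computed from those probabilities. The weights, inputs, and integer label encoding (`W_toy`, `X_toy`, `labels_toy`) are hypothetical values chosen only for illustration; they are not taken from the dataset.

```python
import numpy as np


def softmax(scores):
    """Turn each row of raw class scores into probabilities that sum to 1."""
    # Subtracting the row maximum avoids overflow and does not change the result.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class of each row."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))


# Hypothetical toy values: 2 examples, 1 feature, 3 classes.
X_toy = np.array([[0.5], [-1.0]])
W_toy = np.array([[1.0, 0.0, -1.0]])  # shape (n_features, n_classes)
b_toy = np.zeros(3)
labels_toy = np.array([0, 2])  # assumed integer encoding of the classes

probs_toy = softmax(X_toy @ W_toy + b_toy)  # one probability per class, per row
loss_toy = cross_entropy(probs_toy, labels_toy)
print(probs_toy)
print(loss_toy)
```

Training a multinomial logistic regression amounts to choosing the weights and biases that minimize this loss over the training examples.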

In [1]:
# import dependencies
import pandas as pd
import numpy as np
from numpy import mean
from numpy import std
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LinearRegression

# import dataset
DR_df = pd.read_csv("../Resources/DR_complete_clean_scaled.csv")
DR_df.head()

Unnamed: 0,class_,u_,g_,r_,i_,z_,redshift_
0,2,-1.06772,-1.123867,-0.878764,-0.036867,-0.558164,-0.713568
1,2,-0.030126,-0.487516,-0.660599,-0.035391,-0.6683,-0.71327
2,2,-0.429575,-0.452186,-0.317882,-0.009295,-0.131955,-0.713123
3,2,0.489953,0.523653,0.40981,0.026467,0.430269,-0.713607
4,2,-1.644429,-1.964525,-2.027211,-0.111112,-1.892926,-0.713444


In [3]:
X = np.array(DR_df['redshift_']).reshape((-1, 1)) # independent feature
y = np.array(DR_df['class_']) # Target

# fit an ordinary least-squares linear regression on this single feature
# (note: LinearRegression treats the class_ labels as numeric values)
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)

# report the model performance
print('coefficient of determination:', r_sq)

coefficient of determination: 0.0026964603242726204


In [4]:
X = np.array(DR_df['u_']).reshape((-1, 1)) # independent feature

# fit an ordinary least-squares linear regression on this single feature
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)

# report the model performance
print('coefficient of determination:', r_sq)

coefficient of determination: 0.0836319348775989


In [5]:
X = np.array(DR_df['g_']).reshape((-1, 1)) # independent feature

# fit an ordinary least-squares linear regression on this single feature
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)

# report the model performance
print('coefficient of determination:', r_sq)

coefficient of determination: 0.055502689913690895


In [6]:
X = np.array(DR_df['r_']).reshape((-1, 1)) # independent feature

# fit an ordinary least-squares linear regression on this single feature
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)

# report the model performance
print('coefficient of determination:', r_sq)

coefficient of determination: 0.007710028984193706


In [7]:
X = np.array(DR_df['i_']).reshape((-1, 1)) # independent feature

# fit an ordinary least-squares linear regression on this single feature
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)

# report the model performance
print('coefficient of determination:', r_sq)

coefficient of determination: 2.5637780663889664e-05


In [8]:
X = np.array(DR_df['z_']).reshape((-1, 1)) # independent feature

# fit an ordinary least-squares linear regression on this single feature
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)

# report the model performance
print('coefficient of determination:', r_sq)

coefficient of determination: 0.002572019270146053
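The cells above fit `LinearRegression` one feature at a time, which treats `class_` as a numeric value. For comparison, the multinomial logistic regression described at the top of this section could be sketched with scikit-learn as follows, using the `cross_val_score` and `RepeatedStratifiedKFold` helpers that are already imported. This is only a sketch under stated assumptions: the helper `evaluate_multinomial` is a hypothetical name, the fold and repeat counts are arbitrary, and the data is a synthetic three-class stand-in generated with `make_classification`, not the real dataset, so the printed accuracy says nothing about the star, galaxy, and quasar data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score


def evaluate_multinomial(X, y, n_splits=5, n_repeats=3, seed=1):
    """Return the per-fold accuracy of a multinomial logistic regression."""
    # With the lbfgs solver and more than two classes, recent scikit-learn
    # versions fit a single multinomial (softmax) model by default.
    model = LogisticRegression(solver="lbfgs", max_iter=1000)
    cv = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=seed
    )
    return cross_val_score(model, X, y, scoring="accuracy", cv=cv)


# Synthetic stand-in data (assumption): 6 features and 3 classes, mirroring
# only the shape of the real table, not its values.
X_demo, y_demo = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

scores = evaluate_multinomial(X_demo, y_demo)
print("mean accuracy: %.3f (std %.3f)" % (np.mean(scores), np.std(scores)))
```

With the real data, the same helper could presumably be called with `X = DR_df[['u_', 'g_', 'r_', 'i_', 'z_', 'redshift_']]` and `y = DR_df['class_']`. Accuracy is used here instead of the coefficient of determination because the target is a class label rather than a continuous quantity.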
