In [1]:
from sklearn import linear_model # for model
from sklearn.datasets import load_iris # the iris dataset 

import numpy
numpy.set_printoptions(suppress=True) # disable scientific notation

In [2]:
iris = load_iris() # load iris dataset

In [3]:

model = linear_model.LogisticRegression() # make logistic regression model 
model.fit(iris.data, iris.target) # fit model to dataset 

print('Intercept: {0}  Coefficients: {1}'.format(model.intercept_, model.coef_)) # print one intercept and one row of coefficients per class

Intercept: [  9.84028024   2.21683511 -12.05711535]  Coefficients: [[-0.41874027  0.96699274 -2.52102832 -1.08416599]
 [ 0.53123044 -0.31473365 -0.20002395 -0.94866082]
 [-0.11249017 -0.65225909  2.72105226  2.03282681]]


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
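
The warning above says the solver stopped at its iteration limit before it converged. Below is a minimal sketch of the two remedies the message itself suggests. It assumes that `max_iter=1000` is enough for this dataset and uses `StandardScaler` inside `make_pipeline` for the scaling; neither choice comes from the original notebook, and the coefficients either model learns will differ from the ones printed above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Remedy 1 (assumed value): allow more iterations than the default of 100
more_iter_model = LogisticRegression(max_iter=1000)
more_iter_model.fit(iris.data, iris.target)
print('Iterations used:', more_iter_model.n_iter_)

# Remedy 2: standardize each feature (zero mean, unit variance) before fitting
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(iris.data, iris.target)
print('Training accuracy (scaled): {:.3f}'.format(scaled_model.score(iris.data, iris.target)))
```

Note that coefficients learned on standardized data are expressed per standard deviation of each feature rather than per centimetre, so they are not directly comparable to the unscaled ones.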


In [4]:
start_class_two = list(iris.target).index(1)
start_class_three = list(iris.target).index(2)
# Use the first input from each class
inputs = [iris.data[0], iris.data[start_class_two], iris.data[start_class_three]]

print('Class predictions: {0}'.format(model.predict(inputs))) # predict which class, should be [0, 1, 2]
print('Probabilities:\n{0}'.format(model.predict_proba(inputs))) # get probability of each class

Class predictions: [0 1 2]
Probabilities:
[[0.98180291 0.01819708 0.00000001]
 [0.00211747 0.87434557 0.12353696]
 [0.00000089 0.00392306 0.99607606]]
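
Each row of the probability output holds one probability per class, and the predicted class is the one with the largest value in its row. A short sketch that checks both properties over the whole dataset follows; the names `row_sums` and `most_probable` are illustrative, and `max_iter=1000` is an assumption added only to avoid the convergence warning.

```python
import numpy
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=1000)  # max_iter raised as an assumption
model.fit(iris.data, iris.target)

probabilities = model.predict_proba(iris.data)  # shape: (150 samples, 3 classes)
predictions = model.predict(iris.data)

# each row should sum to 1 because the classes are mutually exclusive
row_sums = probabilities.sum(axis=1)
# the class with the highest probability in each row
most_probable = model.classes_[numpy.argmax(probabilities, axis=1)]

print('Rows sum to 1:', numpy.allclose(row_sums, 1.0))
print('argmax matches predict:', numpy.array_equal(most_probable, predictions))
```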


In [5]:
# use only two features to train second logistic regression model, the first and fourth column 
x1_feature = 0
x2_feature = 3

partial_data = iris.data[:,[x1_feature, x2_feature]] # get the first and fourth column

partial_model = linear_model.LogisticRegression() # make model 
partial_model.fit(partial_data, iris.target) # fit model 

partial_inputs = [partial_data[0], partial_data[start_class_two], partial_data[start_class_three]] # make new inputs with only two features each

print('Class predictions: {0}'.format(partial_model.predict(partial_inputs))) # predict which class
print('Probabilities:\n{0}'.format(partial_model.predict_proba(partial_inputs))) # get probability of each class

Class predictions: [0 1 2]
Probabilities:
[[0.92831857 0.07157087 0.00011056]
 [0.00559652 0.62519423 0.36920925]
 [0.00001365 0.03313084 0.96685551]]
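
Three test inputs say little about how the two-feature model behaves on the rest of the data. One way to look further, sketched here under the assumption that accuracy on the training data is informative enough for a first impression, is a confusion matrix over all 150 samples. `confusion_matrix` is from `sklearn.metrics`; `max_iter=1000` is an added assumption.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

iris = load_iris()
partial_data = iris.data[:, [0, 3]]  # sepal length and petal width

partial_model = LogisticRegression(max_iter=1000)  # max_iter raised as an assumption
partial_model.fit(partial_data, iris.target)

# rows are the true classes, columns are the predicted classes
matrix = confusion_matrix(iris.target, partial_model.predict(partial_data))
print(matrix)
print('Training accuracy: {:.3f}'.format(partial_model.score(partial_data, iris.target)))
```

Off-diagonal entries show which classes the model confuses with each other.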


# Exercise Option #1 - Standard Difficulty

Answer the following questions. You can also use the graph below if seeing the data visually helps you understand it.
1. In the above cell, the expected class predictions are [0, 1, 2], because the first datapoint of each class was used. If the model did not give the expected output, possible reasons are that the datapoints chosen for testing were outliers, or that logistic regression does not predict this data well.
2. How do the probabilities output by the above cell relate to the class predictions? Why do you think the model might be more or less confident in its predictions?
3. Looking at the intercept and coefficient output further above, if a coefficient is negative, what has the model learned about this feature? In other words, if you took a datapoint and you increased the value of a feature that has a negative coefficient, what would you expect to happen to the probabilities the model gives this datapoint?
4. Do these two features allow you to predict the iris type well? How do you know? Explain using both the text output in the cells above and the graph below.
5. Using all the different feature pair combinations, the best pair was petal length and petal width. I know this because, for each combination, I calculated how confident the model was in the expected outputs (the mean probability it assigned to the correct class of the three test inputs). This aligns with what I found about the iris dataset with a decision tree: most nodes split the data on petal length and petal width, which suggests they were the best predictors.
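
For question 3, the effect of a negative coefficient can be checked by hand. The sketch below assumes the model combines its per-class linear scores with a softmax, which reproduces the first row of probabilities printed in cell 4 to within rounding. The intercepts and coefficients are copied from the cell 3 output, `first_flower` is `iris.data[0]`, and `longer_petal` is a hypothetical datapoint with 1 cm added to petal length.

```python
import math

# Intercepts and coefficients copied from the output of cell 3
intercepts = [9.84028024, 2.21683511, -12.05711535]
coefficients = [
    [-0.41874027, 0.96699274, -2.52102832, -1.08416599],
    [0.53123044, -0.31473365, -0.20002395, -0.94866082],
    [-0.11249017, -0.65225909, 2.72105226, 2.03282681],
]

def class_probabilities(features):
    """Softmax over the per-class linear scores (assumed model form)."""
    scores = [b + sum(w * x for w, x in zip(ws, features))
              for b, ws in zip(intercepts, coefficients)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

first_flower = [5.1, 3.5, 1.4, 0.2]  # iris.data[0]
longer_petal = [5.1, 3.5, 2.4, 0.2]  # hypothetical: petal length increased by 1 cm

before = class_probabilities(first_flower)
after = class_probabilities(longer_petal)
print('Before:', [round(p, 4) for p in before])
print('After: ', [round(p, 4) for p in after])
```

With three classes, the change in a class's probability depends on the coefficients of all classes, not only its own. Here the petal length coefficient of class 0 is the lowest of the three, so its probability must fall when petal length increases.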

In [6]:
feature_pairs = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]
feature_pair_probabilities = []

# generate model and probabilities for each feature pair 
for feature_pair in feature_pairs:
    feature_pair_data = iris.data[:,feature_pair] # get data of feature pair 
    feature_pair_model = linear_model.LogisticRegression() # make model 
    feature_pair_model.fit(feature_pair_data, iris.target) # fit model to feature pair 
    feature_pair_inputs = [feature_pair_data[0], feature_pair_data[start_class_two], feature_pair_data[start_class_three]] # make new inputs with only the feature pair
    feature_pair_probabilities.append(feature_pair_model.predict_proba(feature_pair_inputs)) # push class probabilities for specific feature pair to array
    
best_pair_score = 0 # scale from 0-1: mean probability given to the expected class of each test input
best_pair = [] # feature pair with best score

for feature_pair, feature_pair_probability in zip(feature_pairs, feature_pair_probabilities):
    # print probabilities for feature pair
    print('Probabilities for {} & {}:\n{}'.format(iris.feature_names[feature_pair[0]], iris.feature_names[feature_pair[1]], feature_pair_probability))
    # calculate score: average of the diagonal, i.e. the probability of the correct class for each input
    feature_pair_score = (feature_pair_probability[0][0] + feature_pair_probability[1][1] + feature_pair_probability[2][2]) / 3
    # if it's better than current best feature pair score, update it
    if feature_pair_score > best_pair_score:
        best_pair_score = feature_pair_score
        best_pair = feature_pair
    
# print info on the best feature pair 
print('Best pair: {} & {}, with score: {}'.format(iris.feature_names[best_pair[0]], iris.feature_names[best_pair[1]], best_pair_score))

Probabilities for sepal length (cm) & sepal width (cm):
[[0.92347315 0.0585081  0.01801875]
 [0.00176572 0.1981595  0.80007478]
 [0.05009604 0.37235578 0.57754818]]
Probabilities for sepal length (cm) & petal length (cm):
[[0.97521958 0.02478036 0.00000005]
 [0.00105972 0.7765676  0.22237268]
 [0.00000087 0.01201376 0.98798537]]
Probabilities for sepal length (cm) & petal width (cm):
[[0.92831857 0.07157087 0.00011056]
 [0.00559652 0.62519423 0.36920925]
 [0.00001365 0.03313084 0.96685551]]
Probabilities for sepal width (cm) & petal length (cm):
[[0.98200697 0.01799299 0.00000004]
 [0.00633308 0.66738949 0.32627743]
 [0.0000065  0.01584795 0.98414555]]
Probabilities for sepal width (cm) & petal width (cm):
[[0.95767161 0.04223211 0.00009628]
 [0.10787733 0.65224682 0.23987585]
 [0.00009767 0.02361994 0.97628239]]
Probabilities for petal length (cm) & petal width (cm):
[[0.97983058 0.02016939 0.00000003]
 [0.0024148  0.77883567 0.21874952]
 [0.00000027 0.00463449 0.99536524]]
Best pair:
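
The score above rests on only three datapoints. A different technique, not the one used in this notebook, is to rank the feature pairs by cross-validated accuracy over all 150 samples. The sketch below assumes 5 folds and `max_iter=1000`; both values are illustrative choices.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
pair_scores = {}

# every unordered pair of the four feature columns
for pair in combinations(range(iris.data.shape[1]), 2):
    pair_model = LogisticRegression(max_iter=1000)  # assumed iteration limit
    # accuracy on each of 5 held-out folds, using only the two chosen columns
    fold_scores = cross_val_score(pair_model, iris.data[:, list(pair)], iris.target, cv=5)
    pair_scores[pair] = fold_scores.mean()

# print the pairs from best to worst mean accuracy
for pair, score in sorted(pair_scores.items(), key=lambda item: item[1], reverse=True):
    print('{} & {}: {:.3f}'.format(iris.feature_names[pair[0]], iris.feature_names[pair[1]], score))
```

Because each fold is scored on data the model did not see during fitting, this ranking is less sensitive to which three datapoints happen to be chosen as test inputs.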

# Exercise Option #2 - Advanced Difficulty

The plot above shows only the data, not what the model learned. Come up with some ideas for how to show the model fit and implement one of them in code. Remember, we are here to help if you are not sure how to write the code for your ideas!
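
One possible idea, offered as a sketch rather than the intended solution, is to shade the class the model predicts at every point of a grid and draw the datapoints on top. It assumes the two-feature model from cell 5 (sepal length and petal width), a 200 by 200 grid, and `max_iter=1000`; the output filename is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
x1_feature, x2_feature = 0, 3
partial_data = iris.data[:, [x1_feature, x2_feature]]
partial_model = LogisticRegression(max_iter=1000).fit(partial_data, iris.target)

# build a grid that covers the range of both features with a small margin
x1_values = numpy.linspace(partial_data[:, 0].min() - 0.5, partial_data[:, 0].max() + 0.5, 200)
x2_values = numpy.linspace(partial_data[:, 1].min() - 0.5, partial_data[:, 1].max() + 0.5, 200)
grid_x1, grid_x2 = numpy.meshgrid(x1_values, x2_values)
grid_points = numpy.column_stack([grid_x1.ravel(), grid_x2.ravel()])

# predict a class for every grid point, then reshape back to the grid
grid_classes = partial_model.predict(grid_points).reshape(grid_x1.shape)

plt.contourf(grid_x1, grid_x2, grid_classes, alpha=0.3)  # shaded decision regions
plt.scatter(partial_data[:, 0], partial_data[:, 1], c=iris.target, edgecolors='k')
plt.xlabel(iris.feature_names[x1_feature])
plt.ylabel(iris.feature_names[x2_feature])
plt.savefig('decision_regions.png')  # hypothetical filename; use plt.show() in a notebook
```

The borders between shaded regions are the model's decision boundaries, so datapoints that sit in a region of a different colour are the ones it misclassifies.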