# EECS 545 (WN 2025) Homework 2 Q2: Softmax Regression via Gradient Ascent

<span class="instruction">Before starting the assignment, please fill in the following cell.</span>

In [1]:
###################################################################
# Enter your first and last name, e.g. "John Doe"                 #
# for example                                                     #
__NAME__ = "Andrew Mayo"                                        #
__UNIQID__ = "acmayo"                                          #
###################################################################
###################################################################
#                        END OF YOUR CODE                         #
###################################################################

print(f"Your name and email: {__NAME__} <{__UNIQID__}@umich.edu>")
assert __NAME__ and __UNIQID__

Your name and email: Andrew Mayo <acmayo@umich.edu>


# Softmax Regression via Gradient Ascent
In this notebook you will implement gradient ascent for softmax regression on a given dataset.

Among the various ways of computing the gradient, we will use the gradient ascent update rule derived in the homework.

After implementing it, you will report the accuracy of your implementation.

## Setup code
Before getting started, we need to run some boilerplate code to set up our environment. You'll need to rerun this setup code each time you start the notebook. Let's start by checking whether we are using Python 3.11 or higher.

In [2]:
import sys
if sys.version_info[0] < 3:
    raise Exception("You must use Python 3")

if sys.version_info[1] < 11:
    print("Autograder will execute your code based on Python 3.11 environment. Please use Python 3.11 or higher to prevent any issues")
    print("You can create a conda environment with Python 3.11 like 'conda create --name eecs545 python=3.11'")
    raise Exception("Python 3 version is too low: {}".format(sys.version))
else:
    print("You are good to go")

You are good to go


First, run this cell to load the [autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html) extension. This allows us to edit `.py` source files and re-import them into the notebook for a seamless editing and debugging experience.

In [3]:
%load_ext autoreload
%autoreload 2

Once `softmax_regression.py` is in the right location, run the following cell to import from it. If it works correctly, it should print the message:
```Hello from softmax_regression.py```

In [4]:
from softmax_regression import hello
hello()

Hello from softmax_regression.py


Then, we run some setup code for this notebook: import some useful packages and define the CSS style used to highlight instructions.

In [5]:
# install required libraries
# !pip install numpy==1.24.1 scikit-learn==1.2.0

# import libraries
import math
import numpy as np

In [6]:
from IPython.display import display_html, HTML

display_html(HTML('''
<style type="text/css">
  .instruction { background-color: yellow; font-weight:bold; padding: 3px; }
</style>
'''));

## Load the dataset
The following code loads the dataset and prints the shape and the first three rows of each array.

In [7]:
import os
filename = 'data/q2_data.npz'

assert os.path.exists(filename), f"{filename} cannot be found."

q2_data = np.load(filename)
for k, v in q2_data.items():
    print(f'Key "{k}" has shape {v.shape}')
    print(f'First three rows of {k} are {v[:3]}')

num_classes = len(np.unique(q2_data['q2y_test']))
print(f'We have {num_classes} different classes in our dataset')

Key "q2x_train" has shape (100, 4)
First three rows of q2x_train are [[6.3 3.3 4.7 1.6]
 [5.1 3.8 1.9 0.4]
 [5.2 3.4 1.4 0.2]]
Key "q2x_test" has shape (50, 4)
First three rows of q2x_test are [[5.8 2.6 4.  1.2]
 [6.9 3.1 5.4 2.1]
 [5.  3.2 1.2 0.2]]
Key "q2y_train" has shape (100, 1)
First three rows of q2y_train are [[2.]
 [1.]
 [1.]]
Key "q2y_test" has shape (50, 1)
First three rows of q2y_test are [[2.]
 [3.]
 [1.]]
We have 3 different classes in our dataset


Note: In this problem, the labels in y start from 1. We have three classes here, but please be aware that the autograder will test with varying numbers of classes.
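Because the labels start from 1 while NumPy indices start from 0, it can help to shift them before building one-hot targets. The following is a minimal sketch on made-up labels; the names `y_col`, `num_classes_demo`, `zero_based`, and `one_hot` are hypothetical and not part of the assignment code.

```python
import numpy as np

# Toy labels with the same layout as q2y_train: shape (N, 1), values starting at 1.
y_col = np.array([[2.0], [1.0], [3.0]])
num_classes_demo = 3  # hypothetical; in the notebook this comes from the data

# Flatten to shape (N,), cast to int, and shift to 0-based class indices.
zero_based = y_col.reshape(-1).astype(int) - 1

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(num_classes_demo)[zero_based]
print(one_hot)
```

Deriving the number of columns from a `num_classes` argument rather than hard-coding 3 keeps the same code working when the number of classes changes.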

print(np.unique(q2_data['q2y_train']))
print(np.unique(q2_data['q2y_test']))

In [8]:
num_classes

3

## Fit W to the train set of the data

Now that we have prepared our data, it is time to implement the gradient ascent algorithm for softmax regression. As the first step, we will implement the softmax probability computation. <span class="instruction">In the file `softmax_regression.py`, implement the function `compute_softmax_probs` that computes softmax for the data X and weight W.</span> You should double-check the numerical stability of your `compute_softmax_probs` implementation.
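One common way to keep softmax numerically stable is to subtract each row's maximum logit before exponentiating, which leaves the probabilities unchanged. The sketch below illustrates that idea only; the function name `stable_softmax_probs` is hypothetical, and the assumed shapes (X is `(N, d)`, W is `(K, d)`) may differ from what the assignment expects.

```python
import numpy as np

def stable_softmax_probs(X, W):
    """Row-wise softmax of the logits X @ W.T.

    Assumption: X has shape (N, d) and W has shape (K, d).
    """
    logits = X @ W.T                                      # shape (N, K)
    # Shift each row so its largest logit is 0; softmax is invariant to this.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp_shifted = np.exp(shifted)                         # every entry is at most 1
    return exp_shifted / exp_shifted.sum(axis=1, keepdims=True)

# Logits this large would overflow a naive np.exp(X @ W.T).
X_big = np.array([[1000.0, 0.0], [0.0, 1000.0]])
W_big = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
probs_big = stable_softmax_probs(X_big, W_big)
print(probs_big.sum(axis=1))
```

After the shift, the largest term in each row is `exp(0) = 1`, so the denominator is at least 1 and no division by zero or overflow can occur.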

After implementing the softmax function, we will compute the weight W by fitting the train set. In this problem, we will use the gradient ascent algorithm. <span class="instruction">Please implement gradient ascent in `gradient_ascent_train` of `softmax_regression.py`.</span>
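As a rough illustration of the idea, the sketch below runs batch gradient ascent on the mean softmax log-likelihood of a small synthetic dataset. It is not the update rule derived in the homework, which may differ (for example, by fixing one class's weights). The function names, the learning rate `lr`, and the iteration count `num_iters` are hypothetical choices made for this example.

```python
import numpy as np

def softmax_rows(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def gradient_ascent_sketch(X, y, num_classes, lr=0.1, num_iters=500):
    """Batch gradient ascent on the mean log-likelihood.

    Assumptions: X is (N, d); y is (N, 1) with labels in 1..num_classes;
    the returned W is (num_classes, d). lr and num_iters are illustrative.
    """
    N, d = X.shape
    labels = y.reshape(-1).astype(int) - 1      # 0-based class indices
    Y = np.eye(num_classes)[labels]             # (N, K) one-hot targets
    W = np.zeros((num_classes, d))
    for _ in range(num_iters):
        P = softmax_rows(X @ W.T)               # (N, K) predicted probabilities
        grad = (Y - P).T @ X / N                # gradient of the mean log-likelihood
        W += lr * grad                          # ascent step: move along the gradient
    return W

# Synthetic, well-separated 2-D clusters (made-up data, not the homework set).
rng = np.random.default_rng(0)
centers = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
X_toy = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in centers])
y_toy = np.repeat([1.0, 2.0, 3.0], 20).reshape(-1, 1)

W_toy = gradient_ascent_sketch(X_toy, y_toy, num_classes=3)
train_preds = np.argmax(X_toy @ W_toy.T, axis=1) + 1
train_acc = float(np.mean(train_preds == y_toy.reshape(-1)))
print(W_toy.shape, train_acc)
```

The sign matters here: because the objective is a log-likelihood to be maximized, the step adds `lr * grad` instead of subtracting it as gradient descent on a loss would.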

We then measure the accuracy with respect to W. You need to implement the `compute_accuracy` function in `softmax_regression.py`. Once you have implemented all of the code correctly, you should be able to get an accuracy above 90%.
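Accuracy here means the fraction of examples whose most probable class equals the true label. The sketch below shows one way this could be computed, assuming W has shape `(K, d)` and labels start from 1; `accuracy_sketch` is a hypothetical name with a simpler signature than the assignment's `compute_accuracy`.

```python
import numpy as np

def accuracy_sketch(X, y, W):
    """Fraction of rows whose highest-scoring class matches the label.

    Assumptions: X is (N, d), W is (K, d), y is (N, 1) with labels from 1.
    """
    # The argmax of the logits equals the argmax of the softmax probabilities.
    preds = np.argmax(X @ W.T, axis=1) + 1      # shift back to 1-based labels
    # Flatten y first: comparing (N,) with (N, 1) would broadcast to (N, N).
    return float(np.mean(preds == y.reshape(-1)))

X_acc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [0.1, 1.0]])
y_acc = np.array([[1.0], [2.0], [2.0], [2.0]])
W_acc = np.eye(2)
acc_demo = accuracy_sketch(X_acc, y_acc, W_acc)
print(acc_demo)  # 0.75: three of the four toy predictions match
```

The shape mismatch noted in the comment is easy to miss, because the broadcasted comparison still produces a number rather than an error.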

In [9]:
from softmax_regression import gradient_ascent_train, compute_accuracy
import numpy as np

np.random.seed(0)
your_accuracy = None
W = gradient_ascent_train(q2_data['q2x_train'], q2_data['q2y_train'], num_classes)
print(W.shape)

your_accuracy = compute_accuracy(q2_data['q2x_test'], q2_data['q2y_test'], W, num_classes) * 100
print(f'The accuracy of Softmax Regression - our implementation: {your_accuracy:.2f}%')

(3, 4)
The accuracy of Softmax Regression - our implementation: 94.00%


## Bonus: Performance comparison with SciKit-Learn

At the end of this question, we would like to check whether our performance is reasonable. Here, we use [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) from [SciKit-Learn](https://scikit-learn.org/stable/). You should be able to get performance similar to (or even better than) scikit-learn's.

In [10]:
# !pip install scikit-learn if you haven't done so
from sklearn.linear_model import LogisticRegression

# check the previous cell output
assert your_accuracy is not None

# Note: accuracy varies depending on the solver
MLR = LogisticRegression(multi_class='multinomial', solver='newton-cg')
MLR.fit(q2_data['q2x_train'], np.reshape(q2_data['q2y_train'], -1) - 1)

# Generate predictions and compute accuracy
preds = MLR.predict(q2_data['q2x_test']) + 1  # the shape is (50, )
preds = preds[:, np.newaxis]

# Compute the percentage of predictions that match the labels
accuracy = 100 * np.mean((preds == q2_data['q2y_test']).astype(np.float32))

print(f'The accuracy of Sklearn Logistic Regression is: {accuracy:.2f}%')
print(f'Please compare with the accuracy of your implementation: {your_accuracy:.2f}%')

The accuracy of Sklearn Logistic Regression is: 92.00%
Please compare with the accuracy of your implementation: 94.00%




In [25]:
from softmax_regression import compute_softmax_probs
probabilities = compute_softmax_probs(q2_data['q2x_train'], W)
for i, row in enumerate(probabilities):
    row_max = np.max(row)
    probabilities[i] -= row_max
probabilities

array([[-0.44375801,  0.        , -0.03545823],
       [ 0.        , -0.83209423, -0.88381706],
       [ 0.        , -0.86884278, -0.91634737],
       [-0.41670439,  0.        , -0.07483739],
       [-0.59573972, -0.21862136,  0.        ],
       [-0.3966987 ,  0.        , -0.00126313],
       [-0.4902449 ,  0.        , -0.17525551],
       [-0.62366592, -0.26283152,  0.        ],
       [-0.46554324,  0.        , -0.12097762],
       [-0.46159591,  0.        , -0.06812118],
       [ 0.        , -0.87967603, -0.92150401],
       [-0.66136505, -0.34841438,  0.        ],
       [-0.5522322 , -0.13028032,  0.        ],
       [-0.54785281, -0.11963302,  0.        ],
       [-0.4903248 , -0.09383233,  0.        ],
       [ 0.        , -0.72429103, -0.81887054],
       [-0.51091148,  0.        , -0.21522078],
       [-0.52665562, -0.08238027,  0.        ],
       [-0.59176198, -0.20574606,  0.        ],
       [-0.50224776, -0.0269573 ,  0.        ],
       [-0.45408198,  0.        , -0.103

In [22]:
probabilities

array([[4.93140739e-02, 4.93072079e-01, 4.57613847e-01],
       [9.05303763e-01, 7.32095319e-02, 2.14867047e-02],
       [9.28396716e-01, 5.95539340e-02, 1.20493496e-02],
       [8.04762063e-02, 4.97180594e-01, 4.22343200e-01],
       [9.04730893e-03, 3.86165667e-01, 6.04787024e-01],
       [6.92885751e-02, 4.65987275e-01, 4.64724150e-01],
       [6.49218998e-02, 5.55166804e-01, 3.79911297e-01],
       [5.16655844e-03, 3.66000960e-01, 6.28832482e-01],
       [6.32970498e-02, 5.28840286e-01, 4.07862664e-01],
       [4.83097836e-02, 5.09905697e-01, 4.41784519e-01],
       [9.33726680e-01, 5.40506468e-02, 1.22226734e-02],
       [8.56142631e-03, 3.21512099e-01, 6.69926474e-01],
       [8.60530743e-03, 4.30557188e-01, 5.60837505e-01],
       [7.97580189e-03, 4.36195589e-01, 5.55828609e-01],
       [3.77275784e-02, 4.34220047e-01, 5.28052374e-01],
       [8.47720523e-01, 1.23429489e-01, 2.88499880e-02],
       [6.44659386e-02, 5.75377419e-01, 3.60156642e-01],
       [9.68967467e-03, 4.53965