Build a CNN to classify galaxy morphology from images. Given 64x64 color images from the Galaxy Zoo survey, your model classifies galaxies into 4 types: smooth round, smooth cigar, edge-on disk, and spiral.
- Design and implement a CNN architecture in Flax
- Implement a training step with cross-entropy loss
- Understand how architecture choices affect accuracy on image data
- Exploit data symmetries to improve generalization
Click the GitHub Classroom link shared by the instructor. This creates your own private copy of this repository under your GitHub account.
Open a terminal and run:
git clone https://github.com/bu-ds595/lab-03-galaxy-cnn-YOUR_USERNAME.git
Replace YOUR_USERNAME with your actual GitHub username.
Then navigate into the folder:
cd lab-03-galaxy-cnn-YOUR_USERNAME
From inside the lab folder, run:
pip install -r requirements.txt
If you get permission errors, try:
pip install --user -r requirements.txt
Option A: VS Code
- Open VS Code
- File -> Open Folder -> select the lab folder
- Open lab-03-galaxy-cnn.ipynb
- If prompted, install the Python and Jupyter extensions
Option B: JupyterLab
jupyter lab
Then click on lab-03-galaxy-cnn.ipynb in the file browser.
Option C: Google Colab
Upload the notebook, cnn.py, and galaxy_data.npz to Google Colab. Add a cell at the top:
! pip install jax jaxlib flax optax
Complete the TODO sections in cnn.py:
- CNN class: Design your own CNN. It must accept inputs of shape (batch, 64, 64, 3) and return logits of shape (batch, 4).
- train_step: Perform a single gradient descent step with cross-entropy loss.
- Save your model: After training, call save_model(params) to save model_params.pkl. The autograder loads this file to evaluate your model.
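The train_step item above asks for one gradient descent step on a cross-entropy loss. The following is a minimal sketch of that pattern in plain JAX, not the lab's solution: `toy_apply`, `toy_train_step`, `LEARNING_RATE`, and the random batch are all hypothetical stand-ins, and the real train_step in cnn.py may have a different signature or use an optimizer library instead of a hand-written update.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 1e-4  # assumed value, small enough for this toy problem


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    picked = jnp.take_along_axis(log_probs, labels[:, None], axis=-1)
    return -jnp.mean(picked)


def toy_apply(params, images):
    """Hypothetical stand-in for the CNN: flatten, then one linear layer."""
    flat = images.reshape(images.shape[0], -1)
    return flat @ params["w"] + params["b"]


@jax.jit
def toy_train_step(params, images, labels):
    """One plain gradient descent step; returns new params and the loss
    evaluated before the update."""
    def loss_fn(p):
        return cross_entropy(toy_apply(p, images), labels)

    loss, grads = jax.value_and_grad(loss_fn)(params)
    new_params = jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads
    )
    return new_params, loss


# Fake batch with the lab's shapes: (batch, 64, 64, 3) images, 4 classes.
k_img, k_lab = jax.random.split(jax.random.PRNGKey(0))
images = jax.random.uniform(k_img, (8, 64, 64, 3))
labels = jax.random.randint(k_lab, (8,), 0, 4)
params = {"w": jnp.zeros((64 * 64 * 3, 4)), "b": jnp.zeros(4)}

losses = []
for _ in range(5):
    params, loss = toy_train_step(params, images, labels)
    losses.append(float(loss))
```

Repeating the step on the same batch should make the recorded losses go down, which is the same property the grading checks for train_step.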
The notebook includes a self-check cell that replicates the autograder.
pytest test_cnn.py -v
Save your notebook, cnn.py, and trained model, then commit and push:
git add lab-03-galaxy-cnn.ipynb cnn.py model_params.pkl
git commit -m "Complete lab 3"
git push
If git push asks for credentials, enter your GitHub username and a personal access token (not your password).
You can push multiple times — only the final version at the deadline will be graded.
- 1 pt: train_step correctly reduces loss
- 1 pt: Saved model achieves >70% test accuracy
- 2 pts: Saved model achieves >80% test accuracy — think about what symmetries the data has
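One possible reading of the symmetry hint: a galaxy's morphology class does not depend on how it is oriented on the sky, so flipped or rotated copies of a training image keep the same label. The sketch below applies a random flip and a random multiple-of-90-degree rotation to each image in a batch. The function names `random_dihedral` and `augment_batch` are hypothetical and not part of the lab code, and whether this is the intended approach is an assumption.

```python
import jax
import jax.numpy as jnp


def random_dihedral(key, image):
    """Randomly flip and rotate one square image of shape (H, W, C).

    Flips and 90-degree rotations map the pixel grid onto itself,
    so no interpolation is needed.
    """
    k_flip, k_rot = jax.random.split(key)
    flip = jax.random.bernoulli(k_flip)
    image = jnp.where(flip, image[:, ::-1, :], image)  # horizontal flip
    n_rot = jax.random.randint(k_rot, (), 0, 4)        # 0, 1, 2, or 3 quarter turns
    branches = [
        lambda x, k=k: jnp.rot90(x, k=k, axes=(0, 1)) for k in range(4)
    ]
    return jax.lax.switch(n_rot, branches, image)


def augment_batch(key, images):
    """Apply an independent random transform to each image in the batch."""
    keys = jax.random.split(key, images.shape[0])
    return jax.vmap(random_dihedral)(keys, images)


# Fake batch with the lab's image shape; labels would be left unchanged.
images = jax.random.uniform(jax.random.PRNGKey(1), (8, 64, 64, 3))
augmented = augment_batch(jax.random.PRNGKey(2), images)
```

Calling augment_batch with a fresh key on every training batch shows the model a different orientation of each galaxy each time, without changing the dataset on disk.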
GalaxyMNIST (Walmsley et al.), from Galaxy Zoo DECaLS Campaign A.