<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/main/Assigment_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Assignment 1: Convolutional Neural Networks (CNN) for Computer Vision**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Integrative Biology](https://sciences.utsa.edu/integrative-biology/), [UTSA](https://www.utsa.edu/)



# **The Purpose of Assignments**

In this course, **_Assignments_** are designed to help me (and you) assess your ability to transfer knowledge gained in completing class coding exercises to solving more realistic problems.

Assignments play a pivotal role in reinforcing your learning, as they require you to apply theoretical concepts to practical scenarios. This helps solidify your understanding and enhances your problem-solving skills. By tackling these assignments independently, you develop critical thinking and the ability to synthesize information from various sources. Moreover, assignments encourage you to explore topics more deeply, fostering intellectual curiosity and promoting a deeper engagement with the subject matter. Ultimately, these assignments are not just a measure of your learning, but a means to equip you with the skills needed for real-world applications and future challenges.

### Google CoLab Instructions

The following code checks that you are running in Google CoLab and signed in with your Google account.
Running it will also map your GDrive to ```/content/drive```.

In [None]:
# YOU MUST RUN THIS CELL FIRST

try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("Note: not using Google CoLab")
    print("THIS ASSIGNMENT WILL **NOT** BE GRADED")
    COLAB = False

Your GMAIL address **must** appear in the output in order for your work to be graded.

### Define functions

The cell below creates several functions that are needed for this assignment. If you don't run this cell, you will receive errors later when you try to run some cells.

In [None]:
# Create functions for this lesson

import psutil
import os

def check_current_ram():
    ram = psutil.virtual_memory()
    print(f"Available RAM: {ram.available / (1024 ** 3):.2f} GB")

def list_files():
    files = os.listdir('.')
    print(f"Current files: {files}")

def list_extract():
    files = os.listdir(EXTRACT_TARGET)
    print(f"Current files in EXTRACT_TARGET: {files}")

# Simple function to print out elapsed time
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

# List files in current directory
list_files()
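
As a quick illustration of `hms_string`, the sketch below times a short pause and then formats two fixed durations. The `time.sleep` call is only a stand-in for real work such as model training and is an assumption made for illustration; the helper is repeated so the sketch runs on its own.

```python
import time

# Same helper as defined in the cell above, repeated so this sketch is self-contained
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

start_time = time.time()
time.sleep(0.1)  # stand-in for a long-running step (illustration only)
elapsed = time.time() - start_time
print(f"Elapsed time: {hms_string(elapsed)}")

# Fixed durations make the hours:minutes:seconds format easy to see
short_run = hms_string(75.5)      # 1 minute, 15.5 seconds
long_run = hms_string(3725.25)    # 1 hour, 2 minutes, 5.25 seconds
print(short_run)
print(long_run)
```

Wrapping a slow cell with `time.time()` in this way is a convenient way to report how long a step took.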

### Record your specific GPU/TPU accelerator

You will need to record what hardware you will be using in this Assignment by entering the appropriate key from `my_GPU_dict` in the cell below.

In [None]:
# Record your current Runtime GPU/TPU


# List of Current GPU/TPUs
my_GPU_dict = {
    1: 'CPU',
    2: 'A100 GPU',
    3: 'L4 GPU',
    4: 'T4 GPU',
    5: 'TPU v2-8'
}

# Enter the correct key number in the square brackets [ ]
my_GPU = my_GPU_dict[ ]

# Print selection
print(f"My current runtime GPU/TPU is: {my_GPU}")

If your code is correct, you should see something like the following:
~~~text
My current runtime GPU/TPU is: L4 GPU
~~~

In some situations, it will be helpful to the Instructor to know your hardware environment when trying to help you resolve coding problems.

# **Assignment 1: Keras Neural Networks for Medical MNIST**

**Assignment_01** is specifically designed to assess your ability to write the Python/TensorFlow/Keras code necessary to classify image data in a MedMNIST dataset. This assignment is designed so that you can re-use the code in **Class_06_1**. The same series of steps used in both the Example and the **Exercise** in Class_06_1 is provided in this assignment.

**NOTE:** Do _not_ turn in `Class_06_1` as your assignment.

For the most part, the code in Class_06_1 should be reusable here. However, since the MedMNIST datafiles vary in scale (see below), it will be up to you to troubleshoot errors when they come up. Your Instructor is more than happy to help you when you encounter an error that you can't figure out. The best way to get help is to COME TO CLASS!



## **MedMNIST Datasets**

**MedMNIST** offers a collection of 12 pre-processed 2D datasets designed for various biomedical image classification tasks. These datasets cover primary data modalities such as **X-Ray**, **OCT (Optical Coherence Tomography)**, **Ultrasound**, **CT (Computed Tomography)**, and **Electron Microscope** images.

The datasets are diverse, ranging from binary/multi-class classification to ordinal regression and multi-label tasks. They also vary in scale, with data sizes ranging from 100 to 100,000 images.

Here's a list of the 12 2D datasets offered by MedMNIST, along with their names and the classes they contain:

| Dataset Name       | Classes                          | Datafile Name         |
|--------------------|----------------------------------|-----------------------|
| DermaMNIST         | 7 (skin conditions)              | dermamnist_64.npz     |
| OCTMNIST           | 10 (retinal layers)              | octmnist.npz          |
| PneumoniaMNIST     | 2 (normal, pneumonia)            | pneumoniamnist_64.npz |
| RetinaMNIST        | 5 (retinal diseases)             | retinamnist_128.npz   |
| MammographyMNIST   | 2 (benign, malignant)            | breastmnist_224.npz   |
| PathMNIST          | 9 (histopathological conditions) | pathmnist_128.npz     |
| BloodMNIST         | 8 (blood cell types)             | bloodmnist_128.npz    |
| TissueMNIST        | 7 (tissue types)                 | tissuemnist.npz       |
| OrganMNIST - A     | 9 (organs - axial view)          | organamnist.npz       |
| OrganMNIST - C     | 9 (organs - coronal view)        | organcmnist.npz       |
| OrganMNIST - S     | 9 (organs - sagittal view)       | organsmnist.npz       |
| CellMNIST          | 5 (cell types)                   |                       |
| UltrasoundMNIST    | 3 (ultrasound views)             |                       |

# **Your Assignment 1 Dataset**

The last digit in your myUTSA ID (e.g. `abc123`) will determine which MedMNIST dataset you are to analyze for this assignment.

**---WARNING------WARNING------WARNING------WARNING------WARNING------WARNING---**

You are **not** free to choose any dataset for this assignment. If you analyze the wrong dataset, **_30 points_** will be immediately deducted from your score!

If you are uncertain which dataset you should be working on, contact your Instructor for help.

Remember, your score in this assignment will have a large impact on your course grade so please be careful.


| Last Digit in my UTSA ID | MedMNIST Dataset to Analyze |
|--------------------------|-----------------------------|
| 0                        | breastmnist_224.npz         |
| 1                        | chestmnist.npz              |
| 2                        | octmnist.npz                |
| 3                        | organamnist.npz             |
| 4                        | organcmnist.npz             |
| 5                        | organsmnist.npz             |
| 6                        | pathmnist_128.npz           |
| 7                        | pneumoniamnist_64.npz       |
| 8                        | retinamnist_128.npz         |
| 9                        | tissuemnist.npz             |



### **Step - 1: Setup Environmental Variables**

In the cell below, create environmental variables so you can download your specific MedMNIST dataset that has been assigned to you in the cell above.

If you don't use the code provided in Class_06_1 as a template, you will make an unacceptable number of coding errors. Based on past experience, students who tried to use AI to help with their coding errors turned easily corrected errors into HORRIBLE coding errors that couldn't be fixed. In short, use the code provided for you in `Class_06_1` for this assignment.

If you use the code from Step 1 in Class_06_1, you will only need to make changes to the `DOWNLOAD_SOURCE` and the `EXTRACT_TARGET`.

For your `EXTRACT_TARGET` you should use the file name of the MedMNIST assigned to you in the cell above, **exactly** as it is written.

For example, if the last digit of your myUTSA ID was `6`, your assigned datafile in the table above is `pathmnist_128.npz`, so your `DOWNLOAD_SOURCE` would be:
~~~text
DOWNLOAD_SOURCE = URL+"/pathmnist_128.npz"
~~~
and your `EXTRACT_TARGET` would be:
~~~text
EXTRACT_TARGET = os.path.join(PATH,"pathmnist_128")
~~~

Be careful when you cut-and-paste that you don't accidentally include any spaces.
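
The pattern above can be sketched as follows. The values of `URL` and `PATH` are placeholders invented for illustration, and `DATAFILE` is a hypothetical helper variable that is not part of `Class_06_1`; use the actual values from `Class_06_1` in your own cell. Deriving the folder name from the file name is one way to keep the two consistent and free of stray spaces.

```python
import os

# Placeholder values (assumptions for illustration; use those from Class_06_1)
URL = "https://example.com/medmnist"
PATH = "/content"

# File name copied from the assignment table; strip() removes accidental spaces
DATAFILE = " pathmnist_128.npz ".strip()

DOWNLOAD_SOURCE = URL + "/" + DATAFILE

# The folder name is the file name without its .npz extension
EXTRACT_TARGET = os.path.join(PATH, os.path.splitext(DATAFILE)[0])

print(DOWNLOAD_SOURCE)
print(EXTRACT_TARGET)
```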


In [None]:
# Step - 1: Setup Environmental Variables



### **Step - 2: Download and Extract Data**

If your code in Step 1 is correct, you should be ready to download and extract your dataset.

In the cell below, write the code to download your datafile, make the appropriate file folders and then extract (unzip) your datafile into the file folders you created.

**Please Note:** There are considerable differences in the size of these MedMNIST datasets. The larger ones (e.g. `pathmnist_128.npz`) are more than 3GB in size and will require several minutes to download to Colab and then extract. As long as the "little wheel" at the top left of the code cell keeps spinning, your code is working correctly, so be patient.

In [None]:
# Step 2: Download and Extract Data



### **Step - 3: Load and Shuffle Images and Labels into Numpy arrays**

In the cell below, write the Python code to read (load) and shuffle the image and label data into Numpy arrays. In total, you should create the following 6 numpy arrays: `train_X`, `train_Y`, `test_X`, `test_Y`, `val_X` and `val_Y`. The `X` arrays will have the images, the `Y` arrays will have their corresponding labels.

Make sure to print out the `shape` of each numpy array.   
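
When shuffling, each image must stay paired with its label. A minimal sketch of one way to do this applies the same random permutation to both arrays. The small synthetic arrays and the fixed seed below are assumptions made for illustration, not real MedMNIST data, and the loading code from `Class_06_1` remains the template for this step.

```python
import numpy as np

# Synthetic stand-ins: 10 tiny grayscale "images" and their labels
images = np.arange(10 * 4 * 4).reshape(10, 4, 4)
labels = np.arange(10).reshape(10, 1)

rng = np.random.default_rng(seed=42)   # fixed seed so the shuffle is repeatable
perm = rng.permutation(len(images))    # one permutation shared by both arrays

shuffled_X = images[perm]
shuffled_Y = labels[perm]

print("shuffled_X shape:", shuffled_X.shape)
print("shuffled_Y shape:", shuffled_Y.shape)

# Image i was filled with values starting at i * 16, so dividing its first
# pixel by 16 recovers the label it was originally paired with
pairs_intact = np.array_equal(shuffled_X[:, 0, 0] // 16, shuffled_Y[:, 0])
print("Image/label pairs intact:", pairs_intact)
```

Shuffling the two arrays with separate random calls would break the pairing, which is why a single shared permutation is used here.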

In [None]:
# Step - 3: Load and Shuffle Images and Labels into Numpy arrays



Take a good look at your output. Make a note of the `shape` value for the array called `train_X`.

The `shape` should have 4 numbers. The first number is the number of images in your particular dataset. The next 2 numbers are the dimensions (in pixels) of the image and the last number specifies the number of color channels. For example, the number `3` means a color image (RGB).

**You will need to know these 4 numbers later in your analysis.** Please note that these values vary significantly between the different datasets.
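
The four numbers can be unpacked directly from the `shape` attribute. The array below is a zero-filled placeholder whose dimensions are made up for illustration; your own `train_X` will report different values.

```python
import numpy as np

# Placeholder array: 100 RGB images of 28 x 28 pixels (made-up dimensions)
example_X = np.zeros((100, 28, 28, 3), dtype=np.uint8)

# Unpack the shape into its four components
n_images, height, width, n_channels = example_X.shape

print(f"Number of images: {n_images}")
print(f"Image size: {height} x {width} pixels")
print(f"Color channels: {n_channels}")
```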

### **Step 4 - Add Color Channel and Resize Images**
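
Some image arrays are stored without a channel axis, as (images, height, width). A minimal sketch of adding that axis with NumPy is shown below. The array is a made-up placeholder, resizing is not shown, and whether your dataset needs this step depends on the shape you noted in Step 3; the code in `Class_06_1` remains the template.

```python
import numpy as np

# Made-up grayscale batch: 5 images of 28 x 28 pixels, no channel axis
gray_X = np.zeros((5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis: (5, 28, 28) -> (5, 28, 28, 1)
gray_X_1ch = np.expand_dims(gray_X, axis=-1)

# Optionally copy the single channel three times: (5, 28, 28, 3)
gray_X_3ch = np.repeat(gray_X_1ch, 3, axis=-1)

print(gray_X.shape, "->", gray_X_1ch.shape, "->", gray_X_3ch.shape)
```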

In [None]:
# Step 4 - Add Color Channel and Resize Images



### **Step 5 - Check Available Memory**

In [None]:
# Step 5: Check available memory



### **Step - 6: Augment Training Image Set**


In [None]:
# Step - 6: Augment Training Image Set


Available memory (43.33 GB) should be enough to augment train_X
Augmenting the number of images in train_X... done
Original number of train_X images: 44998
Augmented number of train_X images: 134994
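
The output above shows `train_X` growing from 44998 to 134994 images, a threefold increase. A rough way to judge whether an augmented array will fit in RAM is to multiply the number of values by the bytes per value. The image size and data type below are assumptions made for illustration; substitute the shape and type of your own `train_X`.

```python
import numpy as np

n_original = 44998      # from the output above
n_augmented = 134994    # from the output above

# Assumed image shape and data type (illustration only)
height, width, channels = 64, 64, 3
dtype = np.float32      # 4 bytes per value

factor = n_augmented / n_original
bytes_needed = n_augmented * height * width * channels * np.dtype(dtype).itemsize
gb_needed = bytes_needed / (1024 ** 3)

print(f"Augmentation factor: {factor:.1f}x")
print(f"Estimated size of augmented train_X: {gb_needed:.2f} GB")
```

Comparing this estimate with the value reported by `check_current_ram()` gives a quick sanity check before starting a large augmentation.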


### **Step - 7: One-Hot Encode Labels**
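
One-hot encoding turns each integer class label into a vector with a single 1 at the index of that class. The sketch below does this with plain NumPy on made-up labels so the result is easy to inspect. Keras offers `to_categorical` for the same purpose, and the approach used in `Class_06_1` remains the template for this step.

```python
import numpy as np

# Made-up integer labels with shape (N, 1)
labels = np.array([[0], [2], [1], [2]])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels.flatten()]

print(one_hot)
print("Shape:", one_hot.shape)
```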


In [None]:
# Step 7: One-Hot Encode Labels



### **Step - 8: Create and Compile CNN neural network model**



In [None]:
# Step - 8: Create and Compile CNN neural network model


### **Step - 9: Train the Neural Network**




In [None]:
# Step - 9: Train the Neural Network



## **Evaluating Model's Training**

Now that we have trained our model, let's look at how it changed during its training.

### **Step 10: Plot `accuracy` and `val_accuracy`**



In [None]:
# Step 10: Plot `accuracy` and `val_accuracy`



### **Step 11: Compute Accuracy Score with Validation Data**
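
An accuracy score is the fraction of images whose predicted class matches the true class. The sketch below computes it with NumPy from made-up prediction probabilities and one-hot labels. In the assignment, the probabilities would come from your trained model and the labels from your validation arrays; the variable names here are hypothetical.

```python
import numpy as np

# Made-up softmax outputs for 4 images and 3 classes
pred_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
])

# Made-up one-hot true labels for the same 4 images
true_one_hot = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

pred_classes = np.argmax(pred_probs, axis=1)    # most probable class per image
true_classes = np.argmax(true_one_hot, axis=1)  # index of the 1 in each row

# Fraction of images where the prediction matches the true class
accuracy = np.mean(pred_classes == true_classes)
print(f"Accuracy score: {accuracy:.2f}")
```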


In [None]:
# Step 11: Compute Accuracy Score with Validation Data



### **Step 12: Plot Image with Label**

Make sure you use a meaningful label, **not** the label used in Class_06_1, Step 12. For example, if your datafile was `pathmnist_128`, your label might say `Organ type` or `Pathology type`. Do not use `Blood Cell type` unless you are analyzing the blood cell datafile, `bloodmnist_224`.

In [None]:
# Step 12: Plot Image with Label

### **Step 13: Plot 4 Frames with Label**

Again, make sure you use a meaningful label, **not** the label used in Class_06_1, Step 12. For example, if your datafile was `pathmnist_128`, your label might say `Organ type` or `Pathology type`. Do not use `Blood Cell type` unless you are analyzing the blood cell datafile, `bloodmnist_224`.

## **Assignment Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Copy of Assignment_01.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.