# Test analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Bioinformatics-Research-Network/Scientific-Image-Search/blob/main/analysis/test/test_analysis.ipynb)


This analysis checks that the development environment works as expected. The CI workflow should execute it with the required packages installed and run a basic classification analysis on the scikit-learn handwritten digits dataset, a small MNIST-like set of 8x8 images.

The `torch_env.yml` file defines the dependencies. The cell below installs them into the current environment, which covers the use case where only the current environment is exposed to the Jupyter notebook. For local development, consider creating and activating a separate `torch_env` environment instead:

```shell
conda env create -n torch_env -f ../torch_env.yml
conda activate torch_env
```

In [1]:
# Skip this cell if running on colab...
! conda env update --file ../torch_env.yml

Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 4.12.0
  latest version: 4.13.0

Please update conda by running

    $ conda update -n base -c defaults conda


#
# To activate this environment, use
#
#     $ conda activate torch_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate



In [2]:
from sklearn.datasets import load_digits

In [3]:
# Load the 8x8 handwritten digits dataset bundled with scikit-learn
# (similar in spirit to MNIST, but much smaller: 1797 samples, 64 features)
digits = load_digits()

In [4]:
# Train a logistic regression model on the data
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2, random_state=0)
logreg = LogisticRegression(max_iter=5000).fit(X_train, y_train)

In [5]:
# Evaluate the model on the test data
y_pred = logreg.predict(X_test)
print("Accuracy:", logreg.score(X_test, y_test))

Accuracy: 0.9583333333333334
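
To put the accuracy in concrete terms, the arithmetic below recovers the split sizes and the number of correct predictions. It is a sketch that assumes the digits dataset has its usual 1797 samples and that `train_test_split` rounds a fractional test size up, which is scikit-learn's documented behaviour as far as I know.

```python
import math

n_samples = 1797   # size of the scikit-learn digits dataset
test_size = 0.2    # same value passed to train_test_split above

# With a float test_size, scikit-learn rounds the test count up
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

accuracy = 0.9583333333333334   # value printed above
n_correct = round(accuracy * n_test)

print(n_train, n_test, n_correct)   # 1437 360 345
```

With only 360 test samples, each additional error moves the accuracy by roughly 0.3 percentage points.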


## Test the PyTorch nn implementation

In [6]:
# Repeat the analysis with a PyTorch model: a single linear layer
# (multinomial logistic regression), not a deep network
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

# Create a dataset class that inherits from torch.utils.data.Dataset
class MNISTDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

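# Create a model class that inherits from torch.nn.Module
# Note: this name shadows sklearn's LogisticRegression imported in an earlier
# cell, so that cell must be re-run from its import if it is executed again
class LogisticRegression(nn.Module):
    def __init__(self):
        super().__init__()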
        self.linear = nn.Linear(64, 10)
    def forward(self, x):
        return F.log_softmax(self.linear(x), dim=1)
    def predict(self, x):
        return self.forward(x).argmax(dim=1)
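
A map-style dataset such as `MNISTDataset` only has to provide `__len__` and `__getitem__`. The plain-Python sketch below mirrors that protocol and batches it by hand to show roughly what a data loader does with it. `ListDataset` and `iter_batches` are hypothetical names for illustration; the real `DataLoader` additionally shuffles and collates samples into tensors.

```python
class ListDataset:
    """Hypothetical stand-in that follows the same protocol as MNISTDataset."""
    def __init__(self, X, y):
        self.X = X
        self.y = y
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

def iter_batches(dataset, batch_size):
    """Yield lists of (features, label) pairs, in order, without shuffling."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]

toy = ListDataset([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]], [0, 1, 0])
batches = list(iter_batches(toy, batch_size=2))
print([len(b) for b in batches])   # [2, 1]
```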

In [7]:
# Create a dataset object
dataset = MNISTDataset(X_train, y_train)

# Create a dataloader object
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# Create a model object
model = LogisticRegression()

# Create an optimizer object
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Create a loss function object
criterion = nn.NLLLoss()
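
The model returns log-probabilities from `log_softmax`, and `nn.NLLLoss` then picks out the negative log-probability of the true class, so the pair together computes the cross-entropy loss. The sketch below reproduces that calculation for a single sample in plain Python; the helper names are illustrative, and the real `nn.NLLLoss` also averages over the batch by default.

```python
import math

def log_softmax(logits):
    """Log-probabilities for one sample, computed stably by subtracting the max."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the true class for one sample."""
    return -log_probs[target]

# With 10 equal logits every class has probability 1/10
uniform = [0.0] * 10
loss = nll_loss(log_softmax(uniform), target=3)
print(loss)   # equals ln(10), about 2.3026
```

A loss near ln(10) therefore corresponds to guessing uniformly among the ten digit classes.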

In [8]:
# Train the model
for epoch in range(10):
    for i, (data, target) in enumerate(dataloader):
        optimizer.zero_grad()
        output = model(data.float())
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print("Epoch:", epoch, "Iteration:", i, "Loss:", loss.item())

Epoch: 0 Iteration: 0 Loss: 6.30604887008667
Epoch: 1 Iteration: 0 Loss: 0.508837103843689
Epoch: 2 Iteration: 0 Loss: 0.45972394943237305
Epoch: 3 Iteration: 0 Loss: 0.24492105841636658
Epoch: 4 Iteration: 0 Loss: 0.18297936022281647
Epoch: 5 Iteration: 0 Loss: 0.20093871653079987
Epoch: 6 Iteration: 0 Loss: 0.21299907565116882
Epoch: 7 Iteration: 0 Loss: 0.11195831745862961
Epoch: 8 Iteration: 0 Loss: 0.08718807995319366
Epoch: 9 Iteration: 0 Loss: 0.10586725920438766
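
Only one line is printed per epoch because an epoch contains fewer than 100 batches. The arithmetic below shows why, assuming the 1437 training samples that result from the 80/20 split of the 1797 digits and the default `DataLoader` behaviour of keeping the final partial batch.

```python
import math

n_train = 1437      # 1797 digits minus the 360 held out for testing
batch_size = 64     # same value passed to DataLoader above
log_every = 100     # the loop prints when i % 100 == 0

batches_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (batches_per_epoch - 1) * batch_size
printed = [i for i in range(batches_per_epoch) if i % log_every == 0]

print(batches_per_epoch, last_batch_size, printed)   # 23 29 [0]
```

Each printed loss comes from a single shuffled batch of 64 samples, which is why the values fluctuate between epochs instead of decreasing monotonically.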


In [9]:
# Evaluate the model on the test data
# The class's predict() does not cast its input, and X_test is float64,
# so convert the test data to a float32 tensor first
test_data = torch.tensor(X_test).float()
# Predict the labels without tracking gradients
with torch.no_grad():
    y_pred = model.predict(test_data)
# Calculate the accuracy; y_pred is a tensor and y_test is a NumPy array
print("Accuracy:", (y_pred.numpy() == y_test).mean())

Accuracy: 0.9416666666666667
