<a href="https://colab.research.google.com/github/CS222-UIUC/course-project-mri-detector/blob/ViT/MRI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [12]:
!pip install flask-ngrok

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting flask-ngrok
  Downloading flask_ngrok-0.0.25-py3-none-any.whl (3.1 kB)
Installing collected packages: flask-ngrok
Successfully installed flask-ngrok-0.0.25


In [13]:
from flask import Flask
from flask_ngrok import run_with_ngrok

In [14]:
app = Flask(__name__)
run_with_ngrok(app)

In [15]:
@app.route("/")
def home():
    return "<h1>MRI Detector</h1>"

In [16]:
app.run()

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit


 * Running on http://fda0-35-224-109-166.ngrok.io
 * Traffic stats available on http://127.0.0.1:4040


# Choosing A Model

In [17]:
from tabulate import tabulate

In [18]:
table = [['Model', 'Type', 'Function', 'Assumptions'],
         ['kNN', 'Supervised', 'Finds nearest neighbors of an image and uses class'+'\n'+'labels to predict label of new image', 'Assumes Euclidean distance is an appropriate metric'],
         ['Hidden Markov', 'Unsupervised', 'Models relationship between sequence of extracted'+'\n'+'features and corresponding class label', 'Current state depends only on the previous state'],
         ['CNN', 'Supervised', 'Puts pixels into pixel array, applies filters'+'\n'+'to create feature map, combines with activation,'+'\n'+'pooling, etc, to predict image class', 'Sufficient data; features are translation-invariant'],
         ['ViT', 'Supervised', 'Splits images into token patches, maps each'+'\n'+'patch to a feature space with positional encoding,'+'\n'+'before passing into a Transformer', 'Fixed resolution']]

In [19]:
print(tabulate(table, headers='firstrow', tablefmt='fancy_grid'))

╒═══════════════╤══════════════╤════════════════════════════════════════════════════╤═════════════════════════════════════════════════════╕
│ Model         │ Type         │ Function                                           │ Assumptions                                         │
╞═══════════════╪══════════════╪════════════════════════════════════════════════════╪═════════════════════════════════════════════════════╡
│ kNN           │ Supervised   │ Finds nearest neighbors of an image and uses class │ Assumes Euclidean distance is an appropriate metric │
│               │              │ labels to predict label of new image               │                                                     │
├───────────────┼──────────────┼────────────────────────────────────────────────────┼─────────────────────────────────────────────────────┤
│ Hidden Markov │ Unsupervised │ Models relationship between sequence of extracted  │ Current state depends only on the previous state    │
│               │   

In [20]:
table2 = [['Model', 'Pros', 'Cons'],
         ['kNN', 'Simple and intuitive; non-parametric; has no \ntraining step; easy to implement; can be used for \nboth regression and classification', 'Slow runtime; does not work well with \nincreased number of variables; sensitive to outliers'],
         ['Hidden Markov', 'Can model complex temporal dependencies in data; \nused for both supervised and unsupervised learning; \nhandles missing data; computationally efficient', 'Limited to modeling linear dependencies in data; \nrequires large dataset for training; sensitive to \nchoice of initial parameters; does not work well \nwith increased number of variables'],
         ['CNN', 'Automatically learns hierarchical features from raw \ndata; can handle inputs of different shapes and \nsizes; reduces number of parameters needed for the model', 'Computationally intensive; prone to overfitting; \nrequires large amount of training data'],
         ['ViT', 'Can handle input images of different sizes; can learn \nglobal representations of images; has a variety of \napplications; can be fine-tuned on small datasets', 'Computationally expensive to train; requires more \ntraining data than CNNs when trained from scratch']]

In [21]:
print(tabulate(table2, headers='firstrow', tablefmt='fancy_grid'))

╒═══════════════╤══════════════════════════════════════════════════════════╤══════════════════════════════════════════════════════╕
│ Model         │ Pros                                                     │ Cons                                                 │
╞═══════════════╪══════════════════════════════════════════════════════════╪══════════════════════════════════════════════════════╡
│ kNN           │ Simple and intuitive; non-parametric; has no             │ Slow runtime; does not work well with                │
│               │ training step; easy to implement; can be used for        │ increased number of variables; sensitive to outliers │
│               │ both regression and classification                       │                                                      │
├───────────────┼──────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ Hidden Markov │ Can model complex temporal dependencies in data;         │

**Chosen**: Vision Transformers (ViT)

# About Vision Transformers
First introduced in 2020 by Google Brain, the Vision Transformer (ViT) adapts the Transformer architecture, which is widely used in NLP, to computer vision.

## Model Architecture

![image.png](https://viso.ai/wp-content/uploads/2021/09/vision-transformer-vit.png)

As the figure shows, an image is split into fixed-size patches, such as 16x16 or 32x32 pixels (hence the title of the original paper, "An Image is Worth 16x16 Words").
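To make the patch arithmetic concrete, here is a small worked example. The 224x224 RGB input size and the helper name `patch_grid` are assumptions chosen for illustration; they are not taken from this notebook.

```python
def patch_grid(image_size, patch_size, channels=3):
    """Return (number of patches, values per flattened patch) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    patch_dim = patch_size * patch_size * channels  # length of one flattened patch
    return num_patches, patch_dim


# Assumed input: a 224x224 image with 3 color channels.
print(patch_grid(224, 16))  # (196, 768)
print(patch_grid(224, 32))  # (49, 3072)
```

Smaller patches give a longer token sequence (196 versus 49 here), which increases the cost of self-attention in the encoder.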

The patches are flattened and mapped to embedding vectors by a linear projection. To keep track of where each patch is, a positional embedding is added to each patch embedding before the sequence enters the encoder.
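The flatten, project, and add-position steps can be sketched with NumPy as follows. This is a minimal illustration, not the project's implementation: the function name `embed_patches`, the toy 32x32 image, the embedding size of 8, and the random weights (which would be learned in a real model) are all assumptions.

```python
import numpy as np


def embed_patches(image, patch_size, weight, pos_embedding):
    """Split an (H, W, C) image into flattened patches, project them, add positions."""
    h, w, c = image.shape
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c).swapaxes(1, 2)
    flat = patches.reshape(rows * cols, patch_size * patch_size * c)
    tokens = flat @ weight         # linear projection to the embedding size
    return tokens + pos_embedding  # one position vector per patch


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                  # toy image, not an MRI scan
patch_size, embed_dim = 16, 8                    # assumed sizes for illustration
weight = rng.normal(size=(patch_size * patch_size * 3, embed_dim))
pos_embedding = rng.normal(size=(4, embed_dim))  # 4 patches in a 2x2 grid
tokens = embed_patches(image, patch_size, weight, pos_embedding)
print(tokens.shape)  # (4, 8)
```

Adding the position vectors, rather than passing them as a separate input, keeps the sequence length equal to the number of patches.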

![image1.png](https://github.com/lucidrains/vit-pytorch/raw/main/images/vit.gif)

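The first token in the sequence is special: it is a learnable class token that is prepended to the patch embeddings.

The encoder then applies self-attention across all tokens, so each token can draw on information from every patch. The encoder's output for the class token is passed into an MLP head to obtain the classification output.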
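The role of the first token can be illustrated with a heavily simplified NumPy sketch. It assumes a single self-attention layer with one head and a single linear layer as the classification head; a real ViT stacks many encoder blocks with layer normalization and MLP sub-layers. The function `classify`, the sizes, the two-class output, and the random weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def classify(patch_tokens, cls_token, w_q, w_k, w_v, w_head):
    """Prepend the class token, apply one self-attention layer, classify from token 0."""
    tokens = np.vstack([cls_token, patch_tokens])        # (1 + N, D)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attention = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # each token attends to all tokens
    encoded = tokens + attention @ v                     # residual connection
    logits = encoded[0] @ w_head                         # only the first token feeds the head
    return softmax(logits)


rng = np.random.default_rng(0)
num_patches, dim, num_classes = 4, 8, 2  # assumed sizes for illustration
patch_tokens = rng.normal(size=(num_patches, dim))
cls_token = rng.normal(size=(1, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
w_head = rng.normal(size=(dim, num_classes))
probs = classify(patch_tokens, cls_token, w_q, w_k, w_v, w_head)
print(probs.shape)  # (2,)
```

Because the first token attends to every patch, its encoded vector can summarize the whole image, which is why only that vector is passed to the head.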

In [22]:
table3 = [['Library', 'Pros', 'Cons'],
         ['TensorFlow', 'well documented with a large community; \ngood framework for neural networks; \nhas a dedicated medical imaging library called TensorFlow Medical Imaging ', 'is more difficult to use/learn; \nis slower when training larger data models; \nrequires more code to do simple tasks;'],
         ['PyTorch', 'beginner friendly library (easy to use); \nhas dynamic computational graph which allows for more flexibility in building models', 'fewer resources and features than TensorFlow; \nnot as much connection into apps (used more for research)'],
         ['JAX', 'built for high-speed numerical computing; \nhas built-in engine for building deep learning models; \nsupports Python and NumPy', 'fewer resources than PyTorch and TensorFlow']]

In [23]:
print(tabulate(table3, headers='firstrow', tablefmt='fancy_grid'))

╒════════════╤══════════════════════════════════════════════════════════════════════════════════════╤═══════════════════════════════════════════════════════════╕
│ Library    │ Pros                                                                                 │ Cons                                                      │
╞════════════╪══════════════════════════════════════════════════════════════════════════════════════╪═══════════════════════════════════════════════════════════╡
│ TensorFlow │ well documented with a large community;                                              │ is more difficult to use/learn;                           │
│            │ good framework for neural networks;                                                  │ is slower when training larger data models;               │
│            │ has a dedicated medical imaging library called TensorFlow Medical Imaging            │ requires more code to do simple tasks;                    │
├────────────┼──────────────

In [24]:
table4 = [['Pre-train vs. Self-train', 'Pros', 'Cons'],
         ['Pre-train', 'Needs less labeled data; trains faster;\nreuses features learned on large datasets', 'Pre-training data may differ from MRI scans;\narchitecture is fixed by the checkpoint'],
         ['Self-train', 'Full control over architecture and data;\nfeatures are learned directly from MRI scans', 'Requires a large labeled dataset;\ncomputationally expensive; prone to\noverfitting on small datasets']]

In [25]:
print(tabulate(table4, headers='firstrow', tablefmt='fancy_grid'))

╒════════════════════════════╤══════════════════════════════════════════════╤══════════════════════════════════════════════╕
│ Pre-train vs. Self-train   │ Pros                                         │ Cons                                         │
╞════════════════════════════╪══════════════════════════════════════════════╪══════════════════════════════════════════════╡
│ Pre-train                  │ Needs less labeled data; trains faster;      │ Pre-training data may differ from MRI scans; │
│                            │ reuses features learned on large datasets    │ architecture is fixed by the checkpoint      │
├────────────────────────────┼──────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ Self-train                 │ Full control over architecture and data;     │ Requires a large labeled dataset;            │
│                            │ features are learned directly from MRI scans │ computationally expensive; prone to          │
│                            │                                              │ overfitting on small datasets                │
╘════════════════════════════╧══════════════════════════════════════════════╧══════════════════════════════════════════════╛