In [0]:
# Imports cell
import os
from pathlib import Path  # used below to build the dataset path

import numpy as np
import pandas as pd

### Mount Google Drive in Google Colab

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


### Download the dataset from the link provided by Stanford University upon registration (this cell only needs to be executed once)

In [0]:
!wget http://download.cs.stanford.edu/deep/MRNet-v1.0.zip
!unzip -qq "MRNet-v1.0.zip" -d "/content/gdrive/My Drive/Dataset/"

  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  4294967296
  (attempting to re-compensate)
file #2547:  bad zipfile offset (local header sig):  1353202
  (attempting to re-compensate)


### Understanding the dataset


*   The MRNet dataset contains 1,370 knee MRI (magnetic resonance imaging) exams.
*   1,104 (80.6%) are abnormal.
*   319 (23.3%) were diagnosed with ACL (anterior cruciate ligament) tears.
*   508 (37.1%) were diagnosed with meniscal tears.
*   For model training, the dataset is split into a training set of 1,130 exams, a validation set of 120 exams, and a hidden test set of 120 exams.

*  The MRNet Dataset was originally used to develop MRNet, a deep learning model that can “rapidly generate accurate clinical pathology classifications of knee MRI exams.”  Trained on the MRNet dataset, MRNet is a CNN-based model that maps a 3-dimensional MRI series to a probability, in order to predict abnormalities in knee MRI exams.
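
As a quick sanity check, the percentages and split sizes quoted above can be recomputed from the raw counts. This is only an arithmetic sketch; the variable names are illustrative and not part of the dataset.

```python
# Counts quoted in the dataset description above
total_exams = 1370
counts = {"abnormal": 1104, "acl_tear": 319, "meniscal_tear": 508}

# Share of each finding among all exams, rounded to one decimal place
percentages = {name: round(100 * n / total_exams, 1) for name, n in counts.items()}
print(percentages)

# The three splits should add up to the full dataset
splits = {"train": 1130, "valid": 120, "test": 120}
split_total = sum(splits.values())
print(split_total == total_exams)
```

Note that the percentages do not sum to 100%, because one exam can carry more than one label.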

![MRNet Model](https://cdn-images-1.medium.com/max/1000/0*8Q-qUL4GFJqHj-q1.png)



### Basic medical imaging terminology


*   Magnetic resonance imaging (MRI) is a cross-sectional imaging modality, meaning that 2D images are acquired more-or-less sequentially in different imaging planes. 

* The standard planes of imaging included in the MRNet data set are: **axial, coronal and sagittal.**

* MR images are acquired by sequences of **radiofrequency (RF) pulses.**

* Different sequences are designed to produce a signal that can be acquired and processed to reveal different patterns of signal intensity in biologic tissues. 

* **T1-weighting, T2-weighting, and proton density (PD)-weighting** are the 3 core pulse sequence types used in musculoskeletal imaging.

* **Slice**: an individual image is often referred to as a slice

* **Series**: a full stack of images acquired with a given pulse sequence in a given plane

* **Field-of-view (FOV)**: the spatial coverage of the image
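
To make the terms slice and series concrete, here is a minimal sketch that models a series as a NumPy array, assuming the `(slices, x, y)` layout used by the dataset's `.npy` files. The array is filled with random values and the slice count of 30 is an arbitrary choice, not real MRI data.

```python
import numpy as np

# A synthetic "series": 30 slices of 256 x 256 pixels (random values, not real MRI data)
rng = np.random.default_rng(0)
series = rng.integers(0, 256, size=(30, 256, 256), dtype=np.uint8)

num_slices = series.shape[0]            # how many images the series contains
first_slice = series[0]                 # one slice is a single 2D image
middle_slice = series[num_slices // 2]  # the slice halfway through the stack

print(series.shape)        # (30, 256, 256)
print(middle_slice.shape)  # (256, 256)
```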

![Coronal T1-weighted images without fat saturation](https://cdn-images-1.medium.com/max/800/1*0EDKWMoEClKKGoj-7OXRkA.png)

![alt text](https://cdn-images-1.medium.com/max/800/1*T2Rqq_cOXaWctLj9ThcexQ.png)

![alt text](https://cdn-images-1.medium.com/max/800/1*l1ecYeukNKKzYHBYUl5YAA.png)



### Understanding the file structure of the dataset


*   The *.csv files contain the labels for the cases.

*   The *.npy files in the subdirectories of train and valid are NumPy arrays of dimension (slices, x, y). The x and y dimensions are consistently 256 x 256 across all exams, with integer values ranging from 0 to 255. This suggests that the pixel data has already been normalized by the Stanford ML Group.

* The number of images in a stack varies between exams, so for any given plane each exam may have a different number of slices. This is completely normal for medical imaging data.
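
The properties listed above (fixed 256 x 256 frames, values between 0 and 255, varying slice counts) can be checked per file along the following lines. This is a sketch only: `describe_stack` is a hypothetical helper, and the two files it inspects are synthetic stand-ins written to a temporary directory, with made-up names and slice counts, rather than real MRNet exams.

```python
import os
import tempfile

import numpy as np

def describe_stack(npy_path):
    """Return the shape, dtype, and value range of one saved image stack."""
    stack = np.load(npy_path)
    return {"shape": stack.shape, "dtype": str(stack.dtype),
            "min": int(stack.min()), "max": int(stack.max())}

# Synthetic stand-ins for two exams with different slice counts (not real MRNet data)
rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
for name, n_slices in (("0000.npy", 26), ("0001.npy", 44)):
    fake = rng.integers(0, 256, size=(n_slices, 256, 256), dtype=np.uint8)
    np.save(os.path.join(tmp_dir, name), fake)

summaries = {name: describe_stack(os.path.join(tmp_dir, name))
             for name in sorted(os.listdir(tmp_dir))}
for name, info in summaries.items():
    print(name, info)
```

Pointing `describe_stack` at files under one of the real plane directories should give the same kind of summary for actual exams.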



### Exploring the dataset further


*   There are many abnormal exams that don’t contain an ACL or meniscal tear.

*   There are far more exams that contain both an ACL tear and a meniscal tear than there are cases with just an ACL tear.  (Due to the mechanisms of injury and forces required to tear the ACL, the menisci are very often injured as well)

* **Per case, there are three sequences:**
1.   Coronal T1-weighted images without fat saturation
2.   Sagittal T2-weighted images with fat-sat (fluid-sensitive)
3.   Axial PD-weighted images with fat-sat (fluid-sensitive)


* More from here: https://towardsdatascience.com/a-radiologists-exploration-of-the-stanford-ml-group-s-mrnet-data-8e2374e11bfb
 
 And here: https://medium.com/syncedreview/stanford-ml-releases-mrnet-knee-mri-dataset-9f44d7621131
 
 And here: https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002699
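
The label overlaps noted in this section (abnormal exams without either tear, and ACL tears that co-occur with meniscal tears) can be tabulated with pandas once the labels sit in one table. The sketch below uses a small hand-made table; the column names and all eight rows are hypothetical and do not reflect the real MRNet label files.

```python
import pandas as pd

# Hypothetical labels for eight exams (1 = finding present); not real MRNet labels
labels = pd.DataFrame({
    "abnormal": [1, 1, 1, 1, 1, 1, 0, 0],
    "acl":      [1, 1, 1, 0, 0, 0, 0, 0],
    "meniscus": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Rows: ACL tear absent/present; columns: meniscal tear absent/present
overlap = pd.crosstab(labels["acl"], labels["meniscus"])
print(overlap)

# Abnormal exams with neither an ACL tear nor a meniscal tear
abnormal_other = labels[(labels["abnormal"] == 1)
                        & (labels["acl"] == 0)
                        & (labels["meniscus"] == 0)]
print(len(abnormal_other))
```

In the cross-tabulation, the cell at row 1, column 1 counts exams with both tears, and the cell at row 1, column 0 counts exams with an ACL tear only.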









# Loading dataset into Pandas Tables

In [0]:
!mkdir -p /content/gdrive/My\ Drive/Dataset/MRNet_3_Middle_Slices

In [0]:
data_path = Path('/content/gdrive/My Drive/Dataset/MRNet-v1.0')

In [0]:
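dilation = 3  # offset (in slices) between the middle slice and its two neighbours
base_dirpath = '/content/gdrive/My Drive/Dataset'
for d in ('train', 'valid'):
  for plane in ('axial', 'coronal', 'sagittal'):
    data_dirpath = os.path.join(base_dirpath, 'MRNet-v1.0', d, plane)
    new_dirpath = os.path.join(base_dirpath, 'MRNet_3_Middle_Slices', 'MRNet-v1.0', d, plane)
    os.makedirs(new_dirpath, exist_ok=True)  # np.save does not create missing directories
    npy_files = [f for f in os.listdir(data_dirpath) if f.endswith('.npy')]
    for file in npy_files:
      orig_stack = np.load(os.path.join(data_dirpath, file))
      mid_slice_idx = orig_stack.shape[0] // 2
      # Clamp the neighbour indices so short stacks neither wrap around nor go out of range
      lower_idx = max(mid_slice_idx - dilation, 0)
      upper_idx = min(mid_slice_idx + dilation, orig_stack.shape[0] - 1)
      # np.stack keeps the (slices, x, y) layout, giving a (3, 256, 256) array;
      # np.concatenate would glue the slices into a single (768, 256) image
      new_stack = np.stack((orig_stack[lower_idx], orig_stack[mid_slice_idx], orig_stack[upper_idx]))
      np.save(os.path.join(new_dirpath, file), new_stack)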
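
When three 2D slices are combined into one array, the resulting layout depends on the NumPy function used. The sketch below contrasts `np.concatenate` and `np.stack` on synthetic, constant-valued slices (not real MRI data); only a `(3, 256, 256)` result matches the `(slices, x, y)` layout of the original files.

```python
import numpy as np

# Three synthetic 2D slices of 256 x 256 pixels (constant values, not real MRI data)
lower_slice = np.zeros((256, 256), dtype=np.uint8)
mid_slice = np.ones((256, 256), dtype=np.uint8)
upper_slice = np.full((256, 256), 2, dtype=np.uint8)

# concatenate joins along the existing first axis: the slices become one tall 2D image
concatenated = np.concatenate((lower_slice, mid_slice, upper_slice))
# stack adds a new leading axis: the result keeps the (slices, x, y) layout
stacked = np.stack((lower_slice, mid_slice, upper_slice))

print(concatenated.shape)  # (768, 256)
print(stacked.shape)       # (3, 256, 256)
```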
      