In [2]:
import pandas as pd
import numpy as np
from fastai.data.all import *
from fastai.vision.all import *
import os

Magnetic resonance imaging (MRI) is a medical imaging technique that uses a strong magnetic field and radio waves to produce detailed images of the body's internal structures. Because it can produce precise images of the brain's soft tissue, it is frequently employed in the diagnosis of brain cancers.

Machine learning techniques can analyse large volumes of imaging data and spot patterns that may be difficult for human experts to see. To improve the accuracy and efficiency of brain tumour diagnosis, researchers are training machine learning models on large datasets of MRI scans.

This project uses the *fastai* deep learning package (https://github.com/fastai/fastai), which is built on top of the *PyTorch* library. I am currently studying deep learning through the accompanying course and would like to supplement my learning by building a project.

In [13]:
# Load dataset
data_dir1 = 'Datasets/brain_tumor_dataset'
data_dir2 = 'Datasets/brain_tumor_dataset2'

# List the class folders (labels) present in each dataset
print(os.listdir(data_dir1))
print(os.listdir(data_dir2))

# Map each dataset's folder names to a common final label
labels = {'yes': 'Tumour', 'Brain Tumor': 'Tumour', 'no': 'No Tumour', 'Healthy': 'No Tumour'}

# Merge the two datasets and relabel each image from its parent folder name (fastai's parent_label)
df = pd.DataFrame({'image': [*get_image_files(data_dir1), *get_image_files(data_dir2)]})
df['label'] = df['image'].map(lambda p: labels[parent_label(p)])

df.head()

['no', 'yes']
['Brain Tumor', 'Healthy']


   image                                      label
0  Datasets\brain_tumor_dataset\no\1 no.jpeg  No Tumour
1  Datasets\brain_tumor_dataset\no\10 no.jpg  No Tumour
2  Datasets\brain_tumor_dataset\no\11 no.jpg  No Tumour
3  Datasets\brain_tumor_dataset\no\12 no.jpg  No Tumour
4  Datasets\brain_tumor_dataset\no\13 no.jpg  No Tumour


We have two separate datasets of MRI images, each classified as showing a brain tumour (the yes/Brain Tumor folders) or not (the no/Healthy folders). We merge them into a single dataset, which is used to train and validate the deep learning model.
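To make the relabelling step concrete, here is a minimal standard-library sketch of the idea. The file paths are made-up placeholders, not files from the real datasets, and `folder_label` is a hypothetical helper that mirrors what *fastai*'s `parent_label` does: it reads the name of the folder that directly contains each image. The sketch then counts how many images end up in each merged class.

```python
from collections import Counter
from pathlib import PurePosixPath

# Same mapping as in the notebook: folder name -> harmonised label
labels = {'yes': 'Tumour', 'Brain Tumor': 'Tumour',
          'no': 'No Tumour', 'Healthy': 'No Tumour'}

# Hypothetical example paths (placeholders for illustration only)
example_paths = [
    'Datasets/brain_tumor_dataset/no/a.jpg',
    'Datasets/brain_tumor_dataset/yes/b.jpg',
    'Datasets/brain_tumor_dataset2/Healthy/c.jpg',
    'Datasets/brain_tumor_dataset2/Brain Tumor/d.jpg',
    'Datasets/brain_tumor_dataset2/Brain Tumor/e.jpg',
]

def folder_label(path):
    """Return the name of the folder that directly contains the file."""
    return PurePosixPath(path).parent.name

# Translate every folder name into its merged label, then count per class
merged = [labels[folder_label(p)] for p in example_paths]
class_counts = Counter(merged)
print(class_counts)
```

Running the same count on the real merged data (for example with `df['label'].value_counts()`) is a quick way to check whether the two classes are reasonably balanced.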

To utilise *fastai* for image classification, we need to build a *DataBlock*, which describes how the inputs and labels are obtained and fed into the deep learning algorithm. As part of the DataBlock, we also specify how the data is split into training and validation sets, including how large each set should be.

In [None]:
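data = DataBlock(blocks=(ImageBlock, CategoryBlock),               # image inputs, categorical targets
                 get_items=get_image_files,                        # collect image paths from a folder
                 get_y=lambda p: labels[parent_label(p)],          # assumption: reuse the label mapping defined above
                 splitter=RandomSplitter(valid_pct=0.2, seed=42))  # assumption: hold out 20% for validation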
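
Choosing the size of the validation set can be illustrated without *fastai*. The sketch below is a plain-Python illustration of a random percentage split, not *fastai*'s own implementation; the function name `random_split` and the values `valid_pct=0.2` and `seed=42` are illustrative assumptions. In *fastai* itself, a splitter such as `RandomSplitter(valid_pct=0.2)` plays this role inside the DataBlock.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle the indices 0..n_items-1 and hold out a fraction for validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # seeded so the split is reproducible
    cut = int(valid_pct * n_items)         # number of validation items
    return indices[cut:], indices[:cut]    # (training indices, validation indices)

# Example: 10 items with 20% held out for validation
train_idx, valid_idx = random_split(10, valid_pct=0.2)
print(len(train_idx), len(valid_idx))
```

Fixing the seed keeps the same images in the validation set across runs, which makes results from different training runs easier to compare.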