# **Resorting Images from the "Brain Tumor Classification (MRI)" Kaggle Dataset**

This Kaggle dataset's file structure suits a multiclass classification problem, but I wanted one better suited to binary classification, so I reorganized it. First, I downloaded the Kaggle data, which arrived in a folder called "archive" that I renamed to "Brain_MRI_Tumor_Images". The file structure of the downloaded data looks like this:

![](images/FileStructure.png)

To make this a binary classification problem, I combined the three tumor types into a single folder in both the Training and Testing folders. Each of these folders now has only two subfolders: "AllTumorsTrain" or "AllTumorsTest" (matching the parent folder) and "no_tumor".

Additionally, some images in the Testing folder shared a file name across different subfolders (never within the same subfolder), so I renamed these images before combining them into one folder. The renaming code is shown in the cells below. To run it, I placed the downloaded folder in the same directory as this notebook.
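As a side check before renaming, the name collisions described above can be listed programmatically. The following is a minimal sketch, not code from the original notebook: `find_duplicate_names` is a hypothetical helper, and the path in the usage comment assumes the folder layout described above.

```python
import os
from collections import defaultdict


def find_duplicate_names(parent_dir):
    """Map each file name found in more than one subfolder to those subfolders."""
    seen = defaultdict(list)
    for subfolder in sorted(os.listdir(parent_dir)):
        subfolder_path = os.path.join(parent_dir, subfolder)
        if not os.path.isdir(subfolder_path):
            continue  # skip stray files sitting next to the class folders
        for name in os.listdir(subfolder_path):
            seen[name].append(subfolder)
    # Keep only the names that occur in at least two subfolders
    return {name: folders for name, folders in seen.items() if len(folders) > 1}


# Example usage (assumed path):
# duplicates = find_duplicate_names(os.path.join('Brain_MRI_Tumor_Images', 'Testing'))
# print(len(duplicates), 'file names appear in more than one subfolder')
```

An empty result would mean the subfolders could be merged without renaming anything.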

In [2]:
# Importing libraries that will be needed for resorting the images amongst the different folders
import pandas as pd
import os
import glob

In [2]:
!ls Brain_MRI_Tumor_Images/Testing

glioma_tumor     meningioma_tumor no_tumor         pituitary_tumor

In [5]:
# Rename images in the glioma_tumor subfolder so that every name is unique and the folders can be combined
# Adapted from this Stack Overflow question:
# https://stackoverflow.com/questions/37695215/how-to-rename-jpg-files-with-running-order-using-python
# Full paths are used instead of os.chdir so this cell does not depend on the current working directory
folder = os.path.join('Brain_MRI_Tumor_Images', 'Testing', 'glioma_tumor')
for index, oldfile in enumerate(sorted(glob.glob(os.path.join(folder, '*.jpg'))), start=1):
    newfile = os.path.join(folder, 'Glioma{}.jpg'.format(index))
    os.rename(oldfile, newfile)

In [4]:
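# Rename images in the meningioma_tumor subfolder
# Full paths are used instead of os.chdir so this cell does not depend on the current working directory
folder = os.path.join('Brain_MRI_Tumor_Images', 'Testing', 'meningioma_tumor')
for index, oldfile in enumerate(sorted(glob.glob(os.path.join(folder, '*.jpg'))), start=1):
    newfile1 = os.path.join(folder, 'Meningioma{}.jpg'.format(index))
    os.rename(oldfile, newfile1)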

In [3]:
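# Rename images in the no_tumor subfolder
# Full paths are used instead of os.chdir so this cell does not depend on the current working directory
folder = os.path.join('Brain_MRI_Tumor_Images', 'Testing', 'no_tumor')
for index, oldfile in enumerate(sorted(glob.glob(os.path.join(folder, '*.jpg'))), start=1):
    newfile2 = os.path.join(folder, 'NoTumor{}.jpg'.format(index))
    os.rename(oldfile, newfile2)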

In [3]:
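# Rename images in the pituitary_tumor subfolder
# Full paths are used instead of os.chdir so this cell does not depend on the current working directory
folder = os.path.join('Brain_MRI_Tumor_Images', 'Testing', 'pituitary_tumor')
for index, oldfile in enumerate(sorted(glob.glob(os.path.join(folder, '*.jpg'))), start=1):
    newfile3 = os.path.join(folder, 'Pituitary{}.jpg'.format(index))
    os.rename(oldfile, newfile3)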
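The four rename cells above follow the same pattern, so they could be folded into a single function. The sketch below is one possible way to do that, not the notebook's original code: `rename_with_prefix` is a hypothetical helper. It also checks for name clashes before touching any file, since `os.rename` can silently overwrite an existing file on some platforms.

```python
import glob
import os


def rename_with_prefix(folder, prefix):
    """Rename the .jpg files in folder to prefix1.jpg, prefix2.jpg, and so on.

    Returns the list of new paths. Hypothetical helper for illustration.
    """
    old_paths = sorted(glob.glob(os.path.join(folder, '*.jpg')))
    new_paths = [os.path.join(folder, '{}{}.jpg'.format(prefix, i))
                 for i in range(1, len(old_paths) + 1)]
    # Check every target first so nothing is renamed if a clash exists
    for old, new in zip(old_paths, new_paths):
        if old != new and os.path.exists(new):
            raise FileExistsError(new)
    for old, new in zip(old_paths, new_paths):
        os.rename(old, new)
    return new_paths


# Example usage (assumed path and the prefixes used in the cells above):
# testing_dir = os.path.join('Brain_MRI_Tumor_Images', 'Testing')
# for subfolder, prefix in [('glioma_tumor', 'Glioma'), ('meningioma_tumor', 'Meningioma'),
#                           ('no_tumor', 'NoTumor'), ('pituitary_tumor', 'Pituitary')]:
#     rename_with_prefix(os.path.join(testing_dir, subfolder), prefix)
```

Building full paths rather than changing the working directory means the function can be called for several folders in a row from the same notebook session.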

After executing this code, I combined all of the tumor types into one folder for both the training and testing data. I then uploaded the data back to Kaggle under the name 'Resorted "BrainTumorClassification(MRI)" Data'. From there, I was able to create a new Kaggle notebook using this data. (The resorted data can be accessed on Kaggle with [this link](https://www.kaggle.com/brookejudithsmyth/resortedbraintumorclassificationmridata).)
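The combining step is not shown in code here, and it may well have been done by hand. As a minimal sketch of how it could be scripted, assuming the folder names described earlier, the hypothetical helper `combine_tumor_folders` below moves every file from the tumor subfolders into one new subfolder and leaves "no_tumor" untouched.

```python
import os
import shutil


def combine_tumor_folders(split_dir, combined_name, keep=('no_tumor',)):
    """Move all files from the tumor subfolders of split_dir into one subfolder.

    Subfolders listed in keep are left as they are. Hypothetical helper.
    """
    combined_dir = os.path.join(split_dir, combined_name)
    os.makedirs(combined_dir, exist_ok=True)
    for subfolder in sorted(os.listdir(split_dir)):
        source_dir = os.path.join(split_dir, subfolder)
        if (not os.path.isdir(source_dir) or subfolder == combined_name
                or subfolder in keep):
            continue
        for name in os.listdir(source_dir):
            target = os.path.join(combined_dir, name)
            if os.path.exists(target):
                # File names must already be unique across subfolders
                raise FileExistsError(target)
            shutil.move(os.path.join(source_dir, name), target)
        os.rmdir(source_dir)  # the source folder is empty at this point
    return combined_dir


# Example usage (assumed paths):
# combine_tumor_folders(os.path.join('Brain_MRI_Tumor_Images', 'Training'), 'AllTumorsTrain')
# combine_tumor_folders(os.path.join('Brain_MRI_Tumor_Images', 'Testing'), 'AllTumorsTest')
```

The explicit existence check makes the merge stop rather than overwrite an image if two subfolders still contain a file with the same name.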

I developed the neural networks mostly on Kaggle, but I switched to Google Colab to get more GPU time. I was able to upload the data I had downloaded from Kaggle and resorted by following the instructions in this [great blog](https://towardsdatascience.com/an-informative-colab-guide-to-load-image-datasets-from-github-kaggle-and-local-machine-75cae89ffa1e). For details on how I did this, please see the notebook titled 'Colab Binary Brain Tumor Classification.ipynb'.