# PP5 - ML Brain Tumor Detector

## Notebook 1 - Data Collection

### Objectives

* Fetch the dataset from Kaggle and prepare it for further processing.


### Inputs

* Kaggle JSON file (kaggle.json) - the authentication token.


### Outputs

| **output**      |          |       |
|-----------------|----------|-------|
| **train/**      | no_tumor | tumor |
| **test/**       | no_tumor | tumor |
| **validation/** | no_tumor | tumor |


### Additional Comments

* Dataset: [Kaggle](https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri?select=Training)
* License: [MIT](https://www.mit.edu/~amini/LICENSE.md)

---

### Import packages

In [1]:
%pip install -r ../requirements.txt

Note: you may need to restart the kernel to use updated packages.


## Change working directory

* We assume the notebooks are stored in a subfolder, so the working directory needs to change when a notebook is run from the editor.

We need to change the working directory from the current folder to its parent folder.
* We access the current directory with os.getcwd()

In [2]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\tobis\\Documents\\GitHub\\ml-brain-tumor-detection\\jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [3]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [4]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\tobis\\Documents\\GitHub\\ml-brain-tumor-detection'
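
Re-running the cell that calls os.chdir() moves the working directory up one more level each time. As a minimal sketch, assuming the notebooks live in a subfolder named jupyter_notebooks (as the output above suggests), the change can be made idempotent. The helper name `move_to_project_root` is hypothetical and not part of the original notebook.

```python
import os
from pathlib import Path


def move_to_project_root(notebook_subfolder: str = "jupyter_notebooks") -> Path:
    """Change to the parent directory only while still inside the notebook subfolder."""
    current = Path.cwd()
    # Assumption: the notebooks are stored in a folder with this exact name
    if current.name == notebook_subfolder:
        os.chdir(current.parent)
    return Path.cwd()
```

Calling `move_to_project_root()` a second time leaves the working directory unchanged, because the current folder is then no longer named jupyter_notebooks.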

## Get data from Kaggle

**Install Kaggle**

In [5]:
%pip install kaggle==1.6.8

Note: you may need to restart the kernel to use updated packages.


**Change the Kaggle configuration directory to the current working directory and set permissions for the Kaggle authentication JSON**

In [6]:
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
os.chmod("kaggle.json", 0o600)
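
The os.chmod() call fails with a FileNotFoundError if kaggle.json is missing from the working directory. The following is a minimal sketch of a check that could run first, assuming the token file uses the standard Kaggle layout with a `username` and a `key` entry. The helper name `check_kaggle_token` is hypothetical.

```python
import json
import os


def check_kaggle_token(config_dir: str) -> bool:
    """Return True if kaggle.json exists in config_dir and holds both expected keys."""
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        return False
    with open(token_path, "r", encoding="utf-8") as token_file:
        try:
            token = json.load(token_file)
        except json.JSONDecodeError:
            # The file exists but is not valid JSON
            return False
    # Assumption: a usable token is a JSON object with 'username' and 'key'
    return isinstance(token, dict) and {"username", "key"} <= token.keys()
```

A call such as `check_kaggle_token(os.getcwd())` returns False when the token is missing or incomplete, which gives a clearer signal than the download step failing later.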

**Set the Kaggle dataset and download it**

In [13]:
KaggleDatasetPath = "sartajbhuvaji/brain-tumor-classification-mri"
DestinationFolder = "input/"
! kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}

Downloading brain-tumor-classification-mri.zip to input

**Unzip the file and delete the zip file.**

In [14]:
import zipfile

# os.path.join avoids the doubled slash produced by "input/" + "/..."
zip_path = os.path.join(DestinationFolder, 'brain-tumor-classification-mri.zip')
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(DestinationFolder)

os.remove(zip_path)
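
If the download was interrupted, the archive may be incomplete and the extraction raises an error. The following is a minimal sketch that validates the archive before extracting it and deletes it only afterwards. The function name `extract_and_remove` and its parameters are hypothetical and not part of the original notebook.

```python
import os
import zipfile


def extract_and_remove(zip_path: str, target_folder: str) -> list:
    """Extract zip_path into target_folder, delete the archive, return the member names."""
    # is_zipfile also returns False when the path does not exist
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        members = zip_ref.namelist()
        zip_ref.extractall(target_folder)
    # Delete the archive only after extraction finished without raising
    os.remove(zip_path)
    return members
```

The returned member list can be used to confirm that the expected Training and Testing folders were part of the archive.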

---

## Prepare the Data

**Merge pre-split folders**

* Move all files from input/Testing into the matching class folders of input/Training, then rename the merged folder to brain-mri-scans

In [16]:
import shutil

def merge_and_rename(source_folder, destination_folder, new_folder_name):
    """Move the files of every class subfolder into destination_folder, then rename it."""
    # List the class folders up front instead of deleting directories
    # while os.walk is still traversing them
    for folder in os.listdir(source_folder):
        source_subfolder = os.path.join(source_folder, folder)
        if not os.path.isdir(source_subfolder):
            continue
        destination_subfolder = os.path.join(destination_folder, folder)
        os.makedirs(destination_subfolder, exist_ok=True)

        for file in os.listdir(source_subfolder):
            source_file = os.path.join(source_subfolder, file)
            destination_file = os.path.join(destination_subfolder, file)

            # Keep both files when the name is already taken in the destination
            if os.path.exists(destination_file):
                name, extension = os.path.splitext(file)
                destination_file = os.path.join(destination_subfolder, name + "_renamed" + extension)
            shutil.move(source_file, destination_file)

    # Remove the emptied source tree, then give the merged folder its new name
    shutil.rmtree(source_folder)
    os.rename(destination_folder, os.path.join(os.path.dirname(destination_folder), new_folder_name))

testing_folder = 'input/Testing'
training_folder = 'input/Training'
new_folder_name = 'brain-mri-scans'

merge_and_rename(testing_folder, training_folder, new_folder_name)
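
The "_renamed" suffix resolves a single name clash per file; a third file with the same name would clash again. As a minimal sketch of one alternative, a counter can be appended until the name is free. The helper name `unique_destination` is hypothetical and not part of the original notebook.

```python
import os


def unique_destination(folder: str, filename: str) -> str:
    """Return a path in folder for filename that does not clash with an existing file."""
    name, extension = os.path.splitext(filename)
    candidate = os.path.join(folder, filename)
    counter = 1
    # Try name_1.ext, name_2.ext, ... until an unused path is found
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{name}_{counter}{extension}")
        counter += 1
    return candidate
```

The result could be passed as the destination argument of shutil.move() in place of the hand-built path.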

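**Merge the tumor classes**

* Move the files from the glioma_tumor, meningioma_tumor and pituitary_tumor folders into a single tumor folder, leaving no_tumor untouched

In [20]: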
def merge_tumor_folders(input_folder):
    """Merge the three tumor class folders into a single 'tumor' folder."""
    tumor_folders = ["glioma_tumor", "meningioma_tumor", "pituitary_tumor"]
    tumor_destination = os.path.join(input_folder, "tumor")

    # Create the 'tumor' folder if it doesn't exist
    os.makedirs(tumor_destination, exist_ok=True)

    for tumor_folder in tumor_folders:
        source_folder = os.path.join(input_folder, tumor_folder)
        if os.path.exists(source_folder):
            for file in os.listdir(source_folder):
                # The class folders reuse file names such as 'image(1).jpg', so
                # prefix each file with its source folder name to avoid
                # "Destination path ... already exists" errors
                destination_file = os.path.join(tumor_destination, f"{tumor_folder}_{file}")
                shutil.move(os.path.join(source_folder, file), destination_file)
            # Remove the now empty tumor folder
            os.rmdir(source_folder)

# Define the path to the 'input/brain-mri-scans' directory
input_folder = 'input/brain-mri-scans'

# Merge the tumor folders and remove empty folders
merge_tumor_folders(input_folder)
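
To check what the merge actually produced, one option is to count the files in each class folder. This is a minimal sketch; the helper name `count_files_per_class` is hypothetical and not part of the original notebook.

```python
import os


def count_files_per_class(dataset_folder: str) -> dict:
    """Map each class subfolder name to the number of files it contains."""
    counts = {}
    for entry in sorted(os.listdir(dataset_folder)):
        class_folder = os.path.join(dataset_folder, entry)
        # Only subfolders count as classes; loose files are ignored
        if os.path.isdir(class_folder):
            counts[entry] = sum(
                os.path.isfile(os.path.join(class_folder, name))
                for name in os.listdir(class_folder)
            )
    return counts
```

Running `count_files_per_class('input/brain-mri-scans')` after the merge should list only the no_tumor and tumor folders; any leftover glioma_tumor, meningioma_tumor or pituitary_tumor entry indicates that the merge did not complete.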

---

NOTE

* You may add as many sections as you want, as long as they support your project workflow.
* All notebook cells should be run top-down (you can't create a workflow in which, at a given point, you need to go back to a previous cell to execute some task, such as refreshing a variable's content)

---

## Push files to Repo

* In case you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
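import os
try:
    # Create your folder here; the name is intentionally left blank
    # os.makedirs(name='')
    pass  # keeps the try block valid while the call above is commented out
except Exception as e:
    print(e)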
