# Tutorial dcm-classifier

## Background

This tutorial shows developers how to adapt the dcm-classifier package for development with new DICOM data.

## Setup

If you have not already done so, clone the GitHub repository by running the following in the terminal.

In [None]:
!git clone git@github.com:BRAINSia/dcm-classifier.git

Next, to install the necessary development packages, run the following.

In [None]:
!pip install -r ../requirements_dev.txt

## Data Curation 

### Field Sheet Creation

Given a DICOM session directory, the first step is to create a volume-level DICOM field sheet containing the metadata of each image's files. The `generate_dicom_dataframe` function can be called directly from Python or from the command line. The cell below generates a small dataframe to show the function's output.

In [None]:
from pathlib import Path

from create_dicom_fields_sheet import *

# __file__ is undefined inside a notebook; there, derive root_directory from Path.cwd() instead.
current_file_path: Path = Path(__file__).absolute()

root_directory = current_file_path.parent.parent
test_file = "test_file.xlsx"

model: Path = root_directory / "models" / "rf_classifier.onnx"

# make the inferer object
inferer = ImageTypeClassifierBase(classification_model_filename=model)

# make the DICOM field sheet
session_dir = (
    root_directory / "test" / "testing_data" / "anonymized_testing_data" / "anonymized_data"
)
generate_dicom_dataframe(
    session_dirs=session_dir,
    output_file="./" + test_file,
    inferer=inferer,
)

**Note:** The `create_dicom_fields_sheet.py` script can also be automated using the `run_all_dicom_data_sheets.sh` script.

In [None]:
!python3 create_dicom_fields_sheet.py --dicom_path <path_to_dicom_session> --out <output_dicom_field_sheet>
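
To run the script over several session directories, one option is a small driver that builds one command per session. The sketch below is an assumption about how such automation could look, not the contents of `run_all_dicom_data_sheets.sh`; the `build_commands` helper, the example session paths, and the output naming scheme are hypothetical. Only the `--dicom_path` and `--out` flags come from the command above.

```python
from pathlib import Path


def build_commands(session_dirs: list[Path], out_dir: Path) -> list[list[str]]:
    """Build one create_dicom_fields_sheet.py command per session directory."""
    commands = []
    for session in sorted(session_dirs):
        # Hypothetical naming scheme: one field sheet per session directory.
        out_file = out_dir / f"{session.name}_field_sheet.xlsx"
        commands.append(
            [
                "python3",
                "create_dicom_fields_sheet.py",
                "--dicom_path",
                str(session),
                "--out",
                str(out_file),
            ]
        )
    return commands


# Hypothetical session directories; nothing is read from disk here.
example_sessions = [Path("data/ses-02"), Path("data/ses-01")]
for command in build_commands(example_sessions, Path("field_sheets")):
    print(" ".join(command))
```

Each command list can then be passed to `subprocess.run(command, check=True)` to execute it.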

### Field Sheet Combination

If you are working with multiple field sheets, the `combine_excel_spreadsheets.py` script combines them into a single field sheet.

In [None]:
from pathlib import Path
import pandas as pd
from combine_excel_spreadsheets import get_all_column_names

excel_files: list[Path] = list(Path(".").glob("*.xls*"))

# create combined dataframe
all_column_names: pd.DataFrame = get_all_column_names(excel_files)

# save the combined frame to an excel file
all_column_names.to_excel("./all_dicom_combined_data.xlsx")
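
Conceptually, combining field sheets means stacking the rows of every sheet while taking the union of their columns. The sketch below illustrates that idea with `pandas.concat` on in-memory frames. It is not the implementation in `combine_excel_spreadsheets.py`, and the `combine_frames` helper and the column names are hypothetical.

```python
import pandas as pd


def combine_frames(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack frames row-wise, keeping the union of all columns."""
    # Columns missing from a given frame are filled with NaN.
    return pd.concat(frames, ignore_index=True, sort=False)


# Two hypothetical field sheets with partially overlapping columns.
sheet_a = pd.DataFrame({"SeriesNumber": [1, 2], "Modality": ["MR", "MR"]})
sheet_b = pd.DataFrame({"SeriesNumber": [3], "EchoTime": [30.0]})

combined = combine_frames([sheet_a, sheet_b])
print(combined.columns.tolist())
print(len(combined))
```

In practice, each frame would come from `pd.read_excel(path)` for one of the files matched by the glob above.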

### Feature Creation

Feature creation is the step in which developers choose the features used in the model. The `parse_useful_column_headers` script lets developers select the fields they believe will be useful as model inputs.

#### Header Dictionary

A header dictionary is a spreadsheet listing the fields taken from DICOM images. For each field, you choose whether to keep it in the training file or remove it. Setting the training flag to 1 includes the field in training; any other value excludes it. You can also set an action for each header: "drop" to drop the field or "keep" to keep it. The header dictionary also controls one-hot encoding of headers that contain arrays or strings.

[Placeholder: example picture of the header data dictionary]
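
The selection rules described for the header dictionary can be sketched in a few lines of pandas. The column names (`header_name`, `training`, `action`, `one_hot`) and the `select_training_columns` helper are hypothetical and may not match the real header dictionary. The sketch also assumes a field is used only when its training flag is 1 and its action is "keep"; how the two settings interact in the actual script is not confirmed here.

```python
import pandas as pd

# Hypothetical header dictionary: one row per DICOM field.
header_dict = pd.DataFrame(
    {
        "header_name": ["EchoTime", "PatientName", "ImageType"],
        "training": [1, 0, 1],
        "action": ["keep", "drop", "keep"],
        "one_hot": [False, False, True],
    }
)

# Hypothetical field sheet: one row per volume.
field_sheet = pd.DataFrame(
    {
        "EchoTime": [30.0, 90.0],
        "PatientName": ["anon", "anon"],
        "ImageType": ["ORIGINAL", "DERIVED"],
    }
)


def select_training_columns(sheet: pd.DataFrame, headers: pd.DataFrame) -> pd.DataFrame:
    """Keep flagged fields and one-hot encode the ones marked for it."""
    # Assumption: a field is used only if training == 1 and action == "keep".
    keep = headers[(headers["training"] == 1) & (headers["action"] == "keep")]
    selected = sheet[keep["header_name"].tolist()]
    encode = keep.loc[keep["one_hot"], "header_name"].tolist()
    # get_dummies expands each listed column into one indicator column per value.
    return pd.get_dummies(selected, columns=encode)


training_frame = select_training_columns(field_sheet, header_dict)
print(training_frame.columns.tolist())
```

Here `PatientName` is excluded, `EchoTime` passes through unchanged, and `ImageType` is expanded into one indicator column per distinct value.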

#### Running the Script

Running the `parse_useful_column_headers` script requires the header dictionary.

In [None]:
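from parse_useful_column_headers import *

tutorial_training_file = "tutorial_training_file.xlsx"

# clean_header_df is assumed to be the header dictionary, already loaded as a DataFrame;
# test_file is the field sheet generated in the Field Sheet Creation step.

# create the training file
parse_column_headers(
    header_data_df=clean_header_df,
    input_file=test_file,
    output_path=tutorial_training_file,
)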

## Training the Model

## Test the Model