# Intensity variance issues across datasets: an exploration

Intensity may vary across datasets, and it may vary differently for different tissues. In the simplest scenario of intensity variance across datasets, some datasets are simply lighter than others. What we should expect in practice is more complicated: some MRI machines may match each other in intensity on some materials (e.g. air) but not on others (e.g. certain tissues).
We can automatically set the "air" around a brain MRI to zero; however, the question of matching intensities in the tissues remains.
This notebook represents initial approaches to the problem. An augmented group of datasets that do not match in intensity distribution can be created and then remapped.
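
To make "remapped" concrete, here is a minimal sketch of one common option, quantile (histogram) matching, using NumPy only. It is an illustration under assumptions, not necessarily the method this project will adopt; the helper name `match_intensity_distribution` and the toy volumes are hypothetical.

```python
import numpy as np

def match_intensity_distribution(source, reference):
    """Remap `source` so its intensity distribution follows `reference`.

    Hypothetical helper for illustration: classic CDF (quantile) matching.
    """
    # unique source intensities, the position of each voxel among them, and their counts
    src_values, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # empirical cumulative distribution of each image
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # for each source quantile, look up the reference intensity at that quantile
    remapped_values = np.interp(src_cdf, ref_cdf, ref_values)
    return remapped_values[src_index].reshape(source.shape)

# toy example: "scanner B" is a brighter, stretched version of "scanner A"
scanner_a = np.arange(24, dtype=float).reshape(2, 3, 4)
scanner_b = scanner_a * 2.0 + 10.0
remapped = match_intensity_distribution(scanner_a, scanner_b)
```

In this toy case the remapped volume reproduces `scanner_b`, because the two volumes differ only by a monotonic change. Tissue-dependent differences between real scanners may not be this well behaved.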

### Imports
The data will be processed using the libraries and modules below:

In [None]:
import os       # using operating system dependent functionality (folders)
import glob
import pandas as pd # data analysis and manipulation
import numpy as np    # numerical computing (manipulating and performing operations on arrays of data)
import copy     # Can Copy and Deepcopy files so original file is untouched.
from ipywidgets import IntSlider, Output
import ipywidgets as widgets
from IPython.display import display
import matplotlib.pyplot as plt
import SimpleITK as sitk
import skimage.exposure  # load the exposure submodule explicitly; used below as skimage.exposure
#import hashlib
import sys
sys.path.insert(0, '../') # path to functions
from cvasl import file_handler as fh
from cvasl import mold
from cvasl import carve
from cvasl.file_handler import Config

### Load image files
Use the config pathways for the different datasets, then view one image as an example.

In [None]:
config = Config.from_file()
root_mri_directory = config.get_directory('raw_data')

In [None]:
mri_pattern = os.path.join(root_mri_directory, '**/*.gz')
gz_files = glob.glob(mri_pattern, recursive=True)

In [None]:
gz_files

In [None]:
# an example path to an MRI brain .nii image:
t1_fn = gz_files[0]
# read the .nii image containing the volume with SimpleITK:
sitk_t1 = sitk.ReadImage(t1_fn)
# and access the numpy array:
t1 = sitk.GetArrayFromImage(sitk_t1)
# now display it
mold.SliceViewer(t1)

### Create augmented datasets
Here we copy our base dataset to create two separate datasets, which we will then change in terms of intensity values.

In [None]:
# build two identical but independent sets of arrays
arrays_dataset_1 = []
arrays_dataset_2 = []
names = []
together = []
together_2 = []
for file in gz_files:
    read_file = sitk.ReadImage(file)
    arrayed_file = sitk.GetArrayFromImage(read_file)
    # copy for the second dataset so in-place edits to one do not alter the other
    arrayed_copy = arrayed_file.copy()
    arrays_dataset_1.append(arrayed_file)
    arrays_dataset_2.append(arrayed_copy)
    names.append(file)
    together.append((file, arrayed_file))
    together_2.append((file, arrayed_copy))
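
One thing to keep in mind when building two datasets from the same arrays: appending the same array object to two lists does not duplicate the data, so an in-place edit through one list is visible through the other. A minimal sketch with a toy array (all names here are hypothetical) shows the difference a `.copy()` makes:

```python
import numpy as np

volume = np.zeros((2, 2, 2), dtype=float)

# both lists hold a reference to the very same array
shared_a, shared_b = [], []
shared_a.append(volume)
shared_b.append(volume)

# the second list holds its own copy of the data
independent_a, independent_b = [], []
independent_a.append(volume)
independent_b.append(volume.copy())

# an in-place edit through one "shared" list shows up in the other
shared_b[0][0, 0, 0] = 99.0

shares = np.shares_memory(shared_a[0], shared_b[0])
independent = not np.shares_memory(independent_a[0], independent_b[0])
```

`np.shares_memory` is a quick way to check whether two arrays are backed by the same data.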

In [None]:
# show the histogram of the first image in the dataset
plt.hist(together[0][1].ravel(), bins=425, range=(-175, 252))
plt.title(together[0][0])
plt.show()


OK, but let's see what scale these were all on before we go further.

In [None]:
for image in arrays_dataset_1:
    print(image.min(), image.max(), image.shape[0]*image.shape[1]*image.shape[2])

So our pixel values are floating-point numbers ranging from -177 to over 4000, and some images are very large. This richness of information is something we probably want to keep.
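
Minimum and maximum alone can be dominated by a few extreme voxels, so it may help to look at percentiles as well. The sketch below is one possible summary, run on a synthetic volume; the helper name `summarize_intensities` and the choice of the 1st and 99th percentiles are assumptions for illustration.

```python
import numpy as np

def summarize_intensities(volume):
    """Return dtype, voxel count, min, max and the 1st/99th percentiles."""
    low, high = np.percentile(volume, [1, 99])
    return {
        "dtype": str(volume.dtype),
        "voxels": volume.size,  # same as multiplying the three shape entries
        "min": float(volume.min()),
        "max": float(volume.max()),
        "p01": float(low),
        "p99": float(high),
    }

# synthetic stand-in for one MRI volume
rng = np.random.default_rng(0)
toy_volume = rng.normal(loc=100.0, scale=30.0, size=(4, 5, 6)).astype(np.float32)
summary = summarize_intensities(toy_volume)
```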

### Creating artificially darker and/or transformed datasets

Create a group of transformer equations:

In [None]:
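def transformer_equasion1(pixvals):
    # min-max rescale into the range 0-100 (assumes the image is not constant)
    pixvals = ((pixvals - pixvals.min()) / (pixvals.max() - pixvals.min())) * 100
    return pixvals


def transformer_equasion2(pixvals):
    # subtract 100 from every value above 125, on a copy so the input is not modified
    pixvals = pixvals.copy()
    pixvals[pixvals > 125] -= 100
    return pixvals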
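
To see what these two transforms do, here is a small worked example on a four-value array. The functions below are self-contained stand-ins with hypothetical names (`rescale_to_0_100`, `shift_bright_values`), written so that neither modifies its input.

```python
import numpy as np

def rescale_to_0_100(pixvals):
    # min-max rescale: smallest value -> 0, largest value -> 100
    return (pixvals - pixvals.min()) / (pixvals.max() - pixvals.min()) * 100

def shift_bright_values(pixvals):
    # work on a copy so the caller's array is left untouched
    shifted = pixvals.copy()
    shifted[shifted > 125] -= 100
    return shifted

toy = np.array([0.0, 64.0, 128.0, 256.0])
rescaled = rescale_to_0_100(toy)    # [0, 25, 50, 100]
shifted = shift_bright_values(toy)  # [0, 64, 28, 156]
```

Note that the second transform is not monotonic: 128 becomes 28, which is darker than the untouched 64. A single global rescale therefore cannot undo it, unlike the first transform.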

Apply the transformer equations to create different groups of images:

In [None]:
darker_images = []
original_images = []
new_vals_images = []
for name, image in together_2:
    # bring every image onto a common 0-256 scale first
    image = skimage.exposure.rescale_intensity(image, out_range=(0, 256))
    # give each transform its own copy so neither sees the other's changes
    new_vals = transformer_equasion2(image.copy())
    pixvals = transformer_equasion1(image.copy())
    new_vals_images.append((name, new_vals))
    darker_images.append((name, pixvals))
    original_images.append((name, image))

In [None]:
# dropdown options: the index of each image, as a string
list_of_numbers_strung = [str(i) for i in range(len(darker_images))]


btn = widgets.Dropdown(
    options=list_of_numbers_strung,
    value='0',
    description='Picked File:',
    disabled=False,
)
display(btn)

In [None]:

number_chosen = int(btn.value)
darker_chosen = darker_images[number_chosen]
original_chosen = original_images[number_chosen]
new_vals_chosen = new_vals_images[number_chosen]
print("The files you chose are based on:\n", original_chosen[0])

In [None]:
plt.hist(darker_chosen[1].ravel(), bins=100, range=(0, 255), alpha=0.5, color="orange")
plt.hist(new_vals_chosen[1].ravel(), bins=100, range=(0, 255), alpha=0.5, color="blue")
plt.hist(original_chosen[1].ravel(), bins=100, range=(0, 255), alpha=0.5, color="red")
plt.title("Comparing histograms, original in red")
plt.show()


In [None]:
# display original
mold.SliceViewer(original_chosen[1])

In [None]:
# display first transformed
mold.SliceViewer(darker_chosen[1])

In [None]:
# display second transformed
mold.SliceViewer(new_vals_chosen[1])

Now we need to save our files so we can do a group analysis.
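
One way this could look, as a sketch only: write each `(name, array)` pair to a compressed NumPy file in a per-group folder. The folder layout, the `save_group` helper and the choice of `.npz` are assumptions for illustration. Writing back to NIfTI with the original spacing and orientation would need the metadata from the SimpleITK image, which this sketch does not handle.

```python
import os
import tempfile
import numpy as np

def save_group(pairs, out_dir):
    """Save (source_path, array) pairs as .npz files in out_dir; return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, (source_path, array) in enumerate(pairs):
        # prefix with the index so files sharing a base name do not collide
        base = os.path.basename(source_path).replace(".nii.gz", "")
        target = os.path.join(out_dir, f"{index:03d}_{base}.npz")
        # keep the source path next to the voxel data for traceability
        np.savez_compressed(target, image=array, source=source_path)
        saved.append(target)
    return saved

# toy stand-in for a list such as darker_images
toy_pairs = [
    ("scan_a.nii.gz", np.ones((2, 2, 2))),
    ("scan_b.nii.gz", np.zeros((2, 2, 2))),
]
out_dir = os.path.join(tempfile.mkdtemp(), "darker")
paths = save_group(toy_pairs, out_dir)
reloaded = np.load(paths[0])["image"]
```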