# An Introduction to Playing with XV Data
## Made By Ronan Smith

This is a very simple introduction to playing with XV data. There are many more, and better, ways of looking at the data; all this notebook aims to do is reproduce the analysis available in the PDF reports from 4D medical.

Once we understand that, going beyond is easier.

### Step 1: Load some data

In [9]:
# load some libraries

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import os
import scipy.stats  # import the submodule explicitly; scipy.stats.iqr is used in Step 4

import matplotlib.cm as cmx
import matplotlib

In [10]:
# path to location
baseline_folder = r"../../University of Adelaide/Mouse MPS Study/csv/"

files = os.listdir(baseline_folder)

# load a test dataset

print(baseline_folder+files[0])

data = pd.read_csv(baseline_folder + files[0], encoding="utf-8")


print(data.shape)

# the data is in the format [value, X, Y, Z]
data = data.to_numpy()

../../University of Adelaide/Mouse MPS Study/csv/448.specificVentilation.csv
(3577, 4)


As you should be able to see, the CSV files provided contain specific ventilation values for points in the lung, along with the X, Y and Z coordinates showing where each point is.
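
To make the column layout easier to keep track of, the array can be wrapped in a labelled table. The sketch below is illustrative only: the column names and the `summarise_xv_points` helper are assumptions, not part of the 4D medical export, and the points are synthetic random numbers standing in for the real `data` array.

```python
import numpy as np
import pandas as pd

def summarise_xv_points(points):
    """Label the four columns and return a few summary statistics per column."""
    # Column order assumed from the notebook: [value, X, Y, Z]
    frame = pd.DataFrame(points, columns=["specific_ventilation", "x", "y", "z"])
    return frame.describe().loc[["count", "min", "max", "mean"]]

# Synthetic stand-in for the real `data` array (random numbers, not measurements)
rng = np.random.default_rng(0)
demo_points = np.column_stack([
    rng.normal(0.35, 0.1, 500),     # specific ventilation values
    rng.uniform(0, 20, (500, 3)),   # X, Y, Z coordinates
])

summary = summarise_xv_points(demo_points)
print(summary)
```

With the real file loaded, `summarise_xv_points(data)` would give the coordinate ranges and the spread of specific ventilation values at a glance.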

### Step 2: Plotting and Looking

We can plot these on a 3D scatter plot and use a colormap to give each point a colour. Taking slices through this volume would produce the images seen in the PDF reports from 4D medical.
Note that the reports 4D medical produces use interpolation/smoothing to make the images appear less pixellated, and also use CT data to add in bones etc. for context. That data is available in DICOM format for most experiments; ask Ronan if you would like access, as the files are large.

In [11]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cmx
import matplotlib.colors as colors
import numpy as np

%matplotlib notebook

# Assuming 'data' is your data array
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')

# Color mapping
cm = plt.get_cmap('jet')
cNorm = colors.Normalize(vmin=np.min(data[:, 0]), vmax=np.max(data[:, 0]))
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=cm)

# Scatter plot
scatter = ax.scatter(data[:, 1], -data[:, 2], data[:, 3], c=scalarMap.to_rgba(data[:, 0]))

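# Add colorbar (built from scalarMap, because the scatter was given explicit RGBA colours)
scalarMap.set_array(data[:, 0])
cbar = fig.colorbar(scalarMap, ax=ax)
cbar.set_label('Specific ventilation')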

plt.show()


In [21]:
import plotly.express as px
import numpy as np

# Assuming 'data' is your data array
fig = px.scatter_3d(data, x=data[:, 1], y=-data[:, 2], z=data[:, 3], color=data[:, 0],
                    color_continuous_scale='jet')

fig.update_traces(marker=dict(size=4))  # Adjust the 'size' parameter to make points smaller

# Use the data's own aspect ratio, then hide the grid lines and hover spikes on each axis
fig.update_layout(scene=dict(aspectmode='data'))
fig.update_layout(scene_xaxis=dict(showgrid=False, showspikes=False),
                  scene_yaxis=dict(showgrid=False, showspikes=False),
                  scene_zaxis=dict(showgrid=False, showspikes=False))

fig.show()


This image is interesting, but it contains a lot of data, which makes it challenging to interpret quickly.
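
One way to cut the 3D cloud down is to look at a single slice, as mentioned in Step 2. The following is a minimal sketch, not the method used for the PDF reports (which also interpolate and overlay CT data). The `select_slab` helper, the slab thickness and the choice of the Z axis are all assumptions, and the points are synthetic stand-ins for the real `data` array.

```python
import numpy as np
import matplotlib.pyplot as plt

def select_slab(points, column, centre, thickness):
    """Return the rows whose coordinate in `column` lies within a slab."""
    half = thickness / 2.0
    mask = np.abs(points[:, column] - centre) <= half
    return points[mask]

# Synthetic stand-in for the real `data` array (random numbers, not measurements)
rng = np.random.default_rng(0)
demo_points = np.column_stack([
    rng.normal(0.35, 0.1, 2000),     # specific ventilation values
    rng.uniform(0, 20, (2000, 3)),   # X, Y, Z coordinates
])

# Take a slab around the median Z position (thickness is in coordinate units)
z_mid = np.median(demo_points[:, 3])
slab = select_slab(demo_points, column=3, centre=z_mid, thickness=2.0)

# Plot the slab as a 2D scatter, coloured by specific ventilation
fig, ax = plt.subplots(figsize=(6, 6))
sc = ax.scatter(slab[:, 1], -slab[:, 2], c=slab[:, 0], cmap="jet", s=10)
ax.set_aspect("equal")
ax.set_xlabel("X")
ax.set_ylabel("-Y")
fig.colorbar(sc, ax=ax, label="Specific ventilation")
plt.show()
```

Changing `column` to 1 or 2 would slice along X or Y instead; which of these corresponds to which anatomical plane depends on how the animal was oriented during imaging.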

### Step 3: Histograms

One way to reduce the data is to plot a histogram of the specific ventilation values. The shape of the histogram can tell us a lot about what is going on. Normal (healthy) lungs generally follow a normal(ish) distribution, and deviations from this indicate that something worth investigating is occurring.


In [13]:
plt.hist(data[:,0], bins = 50)


(array([  2.,   1.,   2.,   0.,   1.,   1.,   6.,   1.,   0.,  10.,  24.,
         38.,  55.,  84., 156., 214., 439., 491., 507., 572., 433., 289.,
        108.,  39.,  20.,  14.,  14.,  10.,   8.,   6.,   5.,   2.,   3.,
          4.,   4.,   5.,   1.,   1.,   0.,   0.,   2.,   2.,   1.,   0.,
          0.,   0.,   0.,   0.,   0.,   2.]),
 array([-0.420772  , -0.37942576, -0.33807952, -0.29673328, -0.25538704,
        -0.2140408 , -0.17269456, -0.13134832, -0.09000208, -0.04865584,
        -0.0073096 ,  0.03403664,  0.07538288,  0.11672912,  0.15807536,
         0.1994216 ,  0.24076784,  0.28211408,  0.32346032,  0.36480656,
         0.4061528 ,  0.44749904,  0.48884528,  0.53019152,  0.57153776,
         0.612884  ,  0.65423024,  0.69557648,  0.73692272,  0.77826896,
         0.8196152 ,  0.86096144,  0.90230768,  0.94365392,  0.98500016,
         1.0263464 ,  1.06769264,  1.10903888,  1.15038512,  1.19173136,
         1.2330776 ,  1.27442384,  1.31577008,  1.35711632,  1.39846256,
 

These histograms can tell us a lot about the data, and we can fit curves to them for quantitative analysis, but we can also get a few simple parameters from the specific ventilation data for easy comparison.
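
As one possible example of curve fitting, a normal distribution can be fitted to the values and the skewness used as a rough measure of how far the histogram departs from a symmetric shape. This is only a sketch: the `fit_normal` helper is hypothetical, the choice of a normal model is an assumption, and the values are synthetic rather than real specific ventilation data.

```python
import numpy as np
from scipy import stats

def fit_normal(values):
    """Fit a normal distribution and report the sample skewness."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]   # drop NaN/inf before fitting
    mu, sigma = stats.norm.fit(values)     # maximum-likelihood mean and std
    skewness = stats.skew(values)          # close to 0 for a symmetric histogram
    return mu, sigma, skewness

# Synthetic stand-in for data[:, 0] (random numbers, not measurements)
rng = np.random.default_rng(0)
demo_values = rng.normal(0.35, 0.1, 3000)

mu, sigma, skewness = fit_normal(demo_values)
print("mu = {:.3f}, sigma = {:.3f}, skewness = {:.3f}".format(mu, sigma, skewness))
```

With the real data, `fit_normal(data[:, 0])` could be compared between animals or time points; a long tail on one side of the histogram would show up as a skewness well away from zero.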

### Step 4: Key Values

These values can all be found at the top of the 4D medical PDF reports; here is how they are defined. Note that what counts as a 'healthy' or 'normal' value is not fully known and may change with age and species.

In [14]:
def mean_specific_ventillation(data):
    '''
    The mean of the specific ventilation values
    '''
    return np.mean(data)

def volume_defect_percentage(data):
    '''
    The percentage of voxels below 60% of the mean
    '''
    mean = mean_specific_ventillation(data)
    sixty = mean * 0.6
    
    defect = np.count_nonzero(data < sixty)
    total = len(data)
    
    return (defect/total) * 100

def ventillation_heterogeneity(data):
    '''
    Defined as interquartile range divided by the mean
    '''
    iqr = scipy.stats.iqr(data, nan_policy='omit')
    mean = mean_specific_ventillation(data)
    
    return iqr/mean

print('Mean Specific Ventilation is: {}'.format(mean_specific_ventillation(data[:,0])))
print('Volume Defect Percentage is: {}'.format(volume_defect_percentage(data[:,0])))
print('Ventilation Heterogeneity is: {}'.format(ventillation_heterogeneity(data[:,0])))



Mean Specific Ventilation is: 0.3429625278725189
Volume Defect Percentage is: 11.462119094213028
Ventilation Heterogeneity is: 0.4122666137233401


### Step 5: Next Steps

All of these things are useful for looking at data, but we really want to go further. 

By inspecting the images and forming hypotheses about how the lungs are expected to change in a given study, a custom analysis pipeline can be built around the principles shown here. For example, the lung volume can be split into sub-volumes if something is expected, or seen, to affect only part of a lung. Histograms and key values can then be calculated for just that part of the lung.
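
A sub-volume analysis along these lines might look like the sketch below. It assumes the simplest possible split, a single threshold on one coordinate, which is not an anatomical segmentation of left and right lungs. The helper names `key_values` and `split_by_coordinate` are hypothetical, and the points are synthetic stand-ins for the real `data` array.

```python
import numpy as np
from scipy import stats

def key_values(values):
    """Mean, volume defect percentage and heterogeneity, as defined in Step 4."""
    mean = np.mean(values)
    vdp = np.count_nonzero(values < 0.6 * mean) / len(values) * 100
    vh = stats.iqr(values) / mean
    return {"mean": mean, "vdp": vdp, "vh": vh}

def split_by_coordinate(points, column, threshold):
    """Split the points into two sub-volumes either side of a threshold."""
    below = points[points[:, column] < threshold]
    above = points[points[:, column] >= threshold]
    return below, above

# Synthetic stand-in for the real `data` array (random numbers, not measurements)
rng = np.random.default_rng(0)
demo_points = np.column_stack([
    rng.normal(0.35, 0.1, 2000),     # specific ventilation values
    rng.uniform(0, 20, (2000, 3)),   # X, Y, Z coordinates
])

# Split at the median X position and report the key values for each half
x_split = np.median(demo_points[:, 1])
low_x, high_x = split_by_coordinate(demo_points, column=1, threshold=x_split)

for name, region in [("low X", low_x), ("high X", high_x)]:
    print(name, key_values(region[:, 0]))
```

The same pattern extends to any region that can be described by a mask on the coordinates, for example a box or a sphere around an area of interest.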

The richest information is in the volumetric data itself. Reducing it to histograms or single numbers for easy interpretation throws away much of what makes it interesting, so exploring new ways of analysing the volumetric data is key.
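
As one small illustration of keeping the spatial information, the points can be binned onto a coarse 3D grid and the mean specific ventilation computed per block. This is a sketch of one possible approach, not an established XV analysis: the `regional_mean_grid` helper and the grid size are assumptions, and the points are synthetic stand-ins for the real `data` array.

```python
import numpy as np
from scipy import stats

def regional_mean_grid(points, bins=4):
    """Mean specific ventilation in each block of a regular 3D grid."""
    coords = points[:, 1:4]   # X, Y, Z
    values = points[:, 0]     # specific ventilation
    result = stats.binned_statistic_dd(coords, values, statistic="mean", bins=bins)
    # Blocks that contain no points come back as NaN
    return result.statistic, result.bin_edges

# Synthetic stand-in for the real `data` array (random numbers, not measurements)
rng = np.random.default_rng(0)
demo_points = np.column_stack([
    rng.normal(0.35, 0.1, 2000),     # specific ventilation values
    rng.uniform(0, 20, (2000, 3)),   # X, Y, Z coordinates
])

grid, edges = regional_mean_grid(demo_points, bins=4)
print(grid.shape)
print("lowest block mean:", np.nanmin(grid))
print("highest block mean:", np.nanmax(grid))
```

For a real lung many blocks will fall outside the tissue and stay empty, so the NaN entries need to be ignored (for example with `np.nanmin` and `np.nanmax`, as above) when comparing regions.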