# Processing & plotting data pipeline example

This notebook provides the functions and scripts for parsing simulation files (`.json`) into pickled numpy arrays (`.pkl`) and subsequently selecting, plotting, and analyzing all or various subsets of the data. It follows the logical order in which data is processed to obtain the results in the paper.

---

1. Realistic co-culture `dish`, `tissue`, and vasculature graph imaging
2. Heuristic data plotting
3. Simulated data processing & plotting
    - 3.1 Parse sample co-culture `dish` data
    - 3.2 Analyze parsed sample co-culture `dish` data
    - 3.3 Subset sample co-culture `dish` data
    - 3.4 Plot full subsetted co-culture `dish` and `tissue` data
    - 3.5 Multi-feature & outcome analysis of full co-culture `dish` and `tissue` data
4. Experimental literature data plotting
5. Ranked data plotting

---

## 1. Realistic co-culture `dish`, `tissue`, and vasculature graph imaging

Images of the simulations in each context are generated to highlight the differences in cancer and healthy cell spatial distributions over time.

The main function `image` takes in a `.json` file output from the simulation and produces an image of either the populations, cell states, volume density, or graphs. For this analysis, only the population and graph figures were used. 

The population images were generated for the following files:
    
    VITRO_DISH_TREAT_CH_0_NA_NA_1000_100_00.json
    VIVO_TISSUE_TREAT_CH_0_NA_NA_1000_100_00.json

The graph images were generated for the following file:

    VIVO_TISSUE_TREAT_CH_0_NA_NA_1000_100_00.GRAPH.json
    
These represent the untreated realistic co-culture `dish` and `tissue` simulations.

#### Workspace variables

+ `DATA_PATH` variables are the path to subsetted data files (`.json` files generated from simulation output)
+ `...TIMES` variables indicate which time points to make images at
+ `SIZE` variable indicates the size to make the image
+ `POPS_TO_IGNORE` indicates which cell population numbers to ignore (the CAR T-cell populations are listed, but are not present in these untreated simulations)
+ `BGCOL` indicates what color to make the background of the image
+ `RADIUS` indicates the simulation radius out to which to draw (cells stop at radius 36, but the graph exists within the margins and extends out to the full radius of 40)

In [None]:
# Untreated co-culture dish images
DATA_PATH_IMAGE_UNTREATED_COCULTURE_CELLS = 'examples/files/full/coculture/jsons/'
DATA_PATH_IMAGE_UNTREATED_COCULTURE_SVG = 'examples/figures/coculture/images/'
DISH_TIMES = '0,4,7'

# Untreated tissue images
DATA_PATH_IMAGE_UNTREATED_TISSUE_CELLS = 'examples/files/full/tissue/jsons/cells/'
DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG = 'examples/figures/tissue/images/'
TISSUE_TIMES = '1,16,31'

# Population image specifications
SIZE = '5'
POPS_TO_IGNORE = '2,3'
BGCOL = '#FFFFFF'
RADIUS = '40'

# Tissue graph image specifications
DATA_PATH_IMAGE_UNTREATED_TISSUE_GRAPH = 'examples/files/full/tissue/jsons/graph/'
DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG = 'examples/figures/tissue/images/'

#### Image full untreated realistic co-culture `dish` and `tissue` data

In [None]:
from scripts.image.image import image

In [None]:
image(DATA_PATH_IMAGE_UNTREATED_COCULTURE_CELLS, DATA_PATH_IMAGE_UNTREATED_COCULTURE_SVG, size=SIZE, time=DISH_TIMES, ignore=POPS_TO_IGNORE, radius=RADIUS, bgcol=BGCOL, pops=True)
image(DATA_PATH_IMAGE_UNTREATED_TISSUE_CELLS, DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG, size=SIZE, time=TISSUE_TIMES, ignore=POPS_TO_IGNORE, radius=RADIUS, bgcol=BGCOL, pops=True)
image(DATA_PATH_IMAGE_UNTREATED_TISSUE_GRAPH, DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG, size=SIZE, time=TISSUE_TIMES, radius=RADIUS, graph=True)

---

## 2. Heuristic data plotting

The main function (`plot_heuristics_data`) plots the probability of binding and/or killing, based on the CAR-antigen and PD1-PDL1 binding heuristics used in the paper, across various values of ligand/receptor number and/or binding affinity. Each time this function is run, the same output is produced.

#### Workspace variables

+ `RESULTS_PATH_HEURISTICS` variable indicates where to save the heuristic plots (`.svg` files as a result of plotting)

In [None]:
RESULTS_PATH_HEURISTICS = 'examples/figures/heuristics/'

#### Plot heuristic data

In [None]:
from scripts.plot.plot_data import plot_heuristics_data

In [None]:
plot_heuristics_data(RESULTS_PATH_HEURISTICS)

---

## 3. Simulated data processing & plotting

This data processing and plotting pipeline is used for both `dish` and `tissue` data. To explore the pipeline, a small set of co-culture `dish` data is provided. This toy data is not used in the paper and covers a shorter time scale than the paper simulations, so that sections 3.1-3.3 of this notebook run in a reasonable amount of time. Toy `tissue` files are not used in this pipeline but are provided for exploration. For plotting, the full sets of co-culture `dish` and `tissue` data used in the paper are provided so that the figure outputs can be seen. Only a subset of `tissue` simulations was generated for the paper and is used in this example: those that showed effective treatment in the realistic co-culture `dish` context.

---

### 3.1 Parse sample co-culture `dish` data

The main parsing function (`parse`) iterates through each file in the data path and parses each simulation instance, extracting fields from the simulation setup, cells, and environment.

The parsed arrays are organized as:

`{
    "setup": {
        "radius": R,
        "height": H,
        "time": [],
        "pops": [],
        "types": [],
        "coords": []
    },
    "agents": (N seeds) x (T timepoints) x (H height) x (C coordinates) x (P positions),
    "environments": {
        "glucose": (N seeds) x (T timepoints) x (H height) x (R radius),
        "oxygen": (N seeds) x (T timepoints) x (H height) x (R radius),
        "tgfa": (N seeds) x (T timepoints) x (H height) x (R radius),
        "IL-2": (N seeds) x (T timepoints) x (H height) x (R radius)
    }
}
`

where each entry in the agents array is a structured entry of the shape:

`
"pop"       int8    population code
"type"      int8    cell type code
"volume"    int16   cell volume (rounded)
"cycle"     int16   average cell cycle length (rounded)
`
The `parse.py` file contains general parsing functions.
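
As a rough illustration of the agents array layout described above, the structured entry can be expressed as a numpy structured dtype. This is only a sketch: the field names and integer widths come from the description, while the array sizes, the sample values, and the variable names (`AGENT_DTYPE`, `agents`) are assumptions made for illustration and are not taken from `parse.py`.

```python
import numpy as np

# Field names and types follow the structured entry described above.
AGENT_DTYPE = np.dtype([
    ("pop", np.int8),      # population code
    ("type", np.int8),     # cell type code
    ("volume", np.int16),  # cell volume (rounded)
    ("cycle", np.int16),   # average cell cycle length (rounded)
])

# Hypothetical sizes, chosen only for illustration.
N_SEEDS, T_TIMEPOINTS, H_HEIGHT, C_COORDS, P_POSITIONS = 2, 3, 1, 4, 6

agents = np.zeros(
    (N_SEEDS, T_TIMEPOINTS, H_HEIGHT, C_COORDS, P_POSITIONS),
    dtype=AGENT_DTYPE,
)

# Write one cell at seed 0, timepoint 1, height 0, coordinate 2, position 0.
agents[0, 1, 0, 2, 0] = (1, 3, 2250, 1140)

# Indexing by field name returns an array with the same five dimensions.
volumes = agents["volume"]
```

Selecting a field by name, as in `agents["volume"]`, keeps the (seeds, timepoints, height, coordinates, positions) axes, which makes per-timepoint summaries straightforward.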

Parsing can take some time.

#### Workspace variables

Set up workspace variables for parsing simulations. 

+ `DATA_PATH` variables are the path to simulation output data files (`.tar.xz` files of compressed simulation outputs)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by parsing)
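
For readers curious how compressed simulation outputs of this kind can be read, the sketch below walks a `.tar.xz` archive and loads each `.json` member using only the Python standard library. The helper name `iter_json_members` is hypothetical, and this is not necessarily how `parse` reads its inputs.

```python
import json
import tarfile


def iter_json_members(tar_path):
    """Yield (member name, parsed JSON) for each .json file in a .tar.xz archive."""
    with tarfile.open(tar_path, mode="r:xz") as archive:
        for member in archive.getmembers():
            # Skip directories and any non-JSON files in the archive.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            with archive.extractfile(member) as handle:
                yield member.name, json.load(handle)
```

Reading members directly from the archive avoids extracting every simulation file to disk before parsing.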

In [None]:
# Toy data workspace variables
DATA_PATH_TOY_COCULTURE_TAR = 'examples/files/toy/coculture/tars/'
RESULTS_PATH_TOY_COCULTURE_PARSED = 'examples/files/toy/coculture/parsed/'

#### Parse sample co-culture `dish` simulations

In [None]:
from scripts.parse.parse import parse

In [None]:
parse(DATA_PATH_TOY_COCULTURE_TAR, RESULTS_PATH_TOY_COCULTURE_PARSED)

---

### 3.2 Analyze parsed sample co-culture `dish` data

Each main analyzing function (`analyze_cells`, `analyze_env`, `analyze_spatial`, and `analyze_lysis`) iterates through each parsed file (`.pkl`) in the data path and analyzes each simulation instance, extracting fields from the simulation setup, cells, and environment depending on the function.

`analyze_cells` collects cell counts for each population and state over time (files produced will end with `ANALYSED`).
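
To make the idea of counting cells per population over time concrete, here is a minimal sketch that operates on an agents array with the structured layout from section 3.1. The `EMPTY` marker for unoccupied positions and the helper name `count_population` are assumptions for illustration; `analyze_cells` may handle empty positions and states differently.

```python
import numpy as np

AGENT_DTYPE = np.dtype([("pop", np.int8), ("type", np.int8),
                        ("volume", np.int16), ("cycle", np.int16)])
EMPTY = -1  # assumed marker for an unoccupied position


def count_population(agents, pop_code):
    """Return an (N seeds) x (T timepoints) array of cell counts for one population."""
    occupied = agents["pop"] == pop_code
    # Sum over the height, coordinate, and position axes.
    return occupied.sum(axis=(2, 3, 4))


# Tiny illustrative array: 1 seed, 2 timepoints, 1 height, 2 coordinates, 2 positions.
agents = np.zeros((1, 2, 1, 2, 2), dtype=AGENT_DTYPE)
agents["pop"] = EMPTY
agents["pop"][0, 0, 0, 0, 0] = 0   # one cell of population 0 at the first timepoint
agents["pop"][0, 1, 0, 0, :] = 0   # two cells of population 0 at the second timepoint
agents["pop"][0, 1, 0, 1, 0] = 1   # one cell of population 1 at the second timepoint

counts = count_population(agents, 0)
```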

When the `sharedLocs` flag is set to `True`, only the data for CAR T-cells that share a location with at least one cancer cell is collected, and the files produced will end with `SHAREDLOCS`. This option was used to further analyze the spatial differences between the `dish` and `tissue` contexts: for the effective treatments, it shows the cell state dynamics over time for only those CAR T-cells that share a location with at least one cancer cell.
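
The shared-location filter can be pictured with the short sketch below. It assumes a simplified, hypothetical input (a list of `(location, pop)` pairs for a single timepoint) and assumes that population code 0 is the cancer population; the CAR T-cell codes 2 and 3 follow `POPS_TO_IGNORE` from section 1. The actual `sharedLocs` logic in `analyze_cells` may be organized differently.

```python
from collections import defaultdict

CANCER_POPS = {0}      # assumed population code for cancer cells
CAR_T_POPS = {2, 3}    # CAR T-cell population codes, as in POPS_TO_IGNORE


def car_t_sharing_location(cells):
    """Keep CAR T-cells whose location also holds at least one cancer cell.

    `cells` is a hypothetical list of (location, pop) pairs for one timepoint.
    """
    pops_at = defaultdict(set)
    for location, pop in cells:
        pops_at[location].add(pop)
    return [
        (location, pop)
        for location, pop in cells
        if pop in CAR_T_POPS and pops_at[location] & CANCER_POPS
    ]


cells = [
    ((0, 0, 0), 0),   # cancer cell
    ((0, 0, 0), 2),   # CAR T-cell sharing the cancer cell's location
    ((1, -1, 0), 3),  # CAR T-cell alone at its location
    ((2, -2, 0), 1),  # assumed healthy cell
    ((2, -2, 0), 3),  # CAR T-cell sharing a location with a healthy cell only
]
shared = car_t_sharing_location(cells)
```

Only the CAR T-cell at `(0, 0, 0)` is kept, since it is the only one whose location also contains a cancer cell.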

`analyze_env` collects information on environmental species concentrations over time (files produced will end with `ENVIRONMENT`).

`analyze_spatial` collects cell counts for each population across simulation radii over time (files produced will end with `SPATIAL`).

`analyze_lysis` collects lysed cell information over time (files produced will end with `LYSED`).

#### Workspace variables

Set up workspace variables for analyzing simulations.

+ `DATA_PATH...PARSED` variables are the path to parsed data files (`.pkl` files generated by parsing), where for the `co-culture dish` data, one may need to manually put the effective treatment files into a folder separate from the rest of the data.
+ `DATA_PATH...LYSIS` variables are the path to `LYSIS` data files (`.LYSIS.json` files generated directly from the simulation outputs)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by analyzing)
+ `SHARED_LOCATIONS` indicates to only collect data for CAR T-cells that share a location with at least one cancer cell.

In [None]:
# Toy data workspace variables
DATA_PATH_TOY_COCULTURE_PARSED = 'examples/files/toy/coculture/parsed/'
DATA_PATH_TOY_COCULTURE_LYSIS = 'examples/files/toy/coculture/lysis/'
RESULTS_PATH_TOY_COCULTURE_CELLS = 'examples/files/toy/coculture/analyzed/cells/'
RESULTS_PATH_TOY_COCULTURE_ENVIRONMENT = 'examples/files/toy/coculture/analyzed/environment/'
RESULTS_PATH_TOY_COCULTURE_SPATIAL = 'examples/files/toy/coculture/analyzed/spatial/'
RESULTS_PATH_TOY_COCULTURE_LYSED = 'examples/files/toy/coculture/analyzed/lysed/'
RESULTS_PATH_TOY_COCULTURE_SHAREDLOCS = 'examples/files/toy/coculture/analyzed/sharedlocs/'

# Shared locations workspace variables
SHARED_LOCATIONS = True

#### Analyze parsed sample co-culture `dish` simulations

In [None]:
from scripts.analyze.analyze_cells import analyze_cells
from scripts.analyze.analyze_env import analyze_env
from scripts.analyze.analyze_spatial import analyze_spatial
from scripts.analyze.analyze_lysis import analyze_lysis

In [None]:
# Analyze co-culture dish data
analyze_cells(DATA_PATH_TOY_COCULTURE_PARSED, RESULTS_PATH_TOY_COCULTURE_CELLS)
analyze_env(DATA_PATH_TOY_COCULTURE_PARSED, RESULTS_PATH_TOY_COCULTURE_ENVIRONMENT)
analyze_spatial(DATA_PATH_TOY_COCULTURE_PARSED, RESULTS_PATH_TOY_COCULTURE_SPATIAL)
analyze_lysis(DATA_PATH_TOY_COCULTURE_LYSIS, RESULTS_PATH_TOY_COCULTURE_LYSED)
analyze_cells(DATA_PATH_TOY_COCULTURE_PARSED, RESULTS_PATH_TOY_COCULTURE_SHAREDLOCS, sharedLocs=SHARED_LOCATIONS)

---

### 3.3 Subset analyzed sample co-culture `dish` data

The main subsetting function (`subset_data`) takes in a desired subset of data, iterates through each analyzed file (`.pkl`) in the data path, and adds simulations matching the subset requirements to a single data file. Each subset will also automatically include the untreated control if it is in the directory from which the data is being pulled.

Since we are working with a limited toy example for this exercise, we will select a few small subsets of data, but they will not be exhaustive collections of the full data set as in the paper.

For this example, we will select all simulations where indicated features meet the following requirements:

+ `CAR AFFINITY` : 1e-7

This means all data within this subset will have the value of `CAR AFFINITY` listed above, while all values of `DOSE`, `TREAT RATIO`, `ANTIGENS CANCER`, and, if applicable, `ANTIGENS HEALTHY` will be included. All subsets will be saved in the following format:

`XML_NAME` + `DOSE` + `TREAT RATIO` + `CAR AFFINITY` + `ANTIGENS CANCER` + `ANTIGENS HEALTHY`

where values are separated by a `_`, features specified in the subset are replaced with the desired value, and features not specified in the subset are replaced by an `X` to indicate that all values of that feature are present.

Thus, the above example will produce the following name for the toy data:
    
    VITRO_DISH_TREAT_CH_2D_X_X_1e-07_X_X_DATATYPE.pkl


where `DATATYPE` is `ANALYZED` for cells, `ENVIRONMENT` for environment, `SPATIAL` for spatial, or `LYSED` for lysed analyses.
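
The naming rule can be sketched in a few lines. The helper `subset_file_name` and the `FEATURE_ORDER` list are hypothetical names introduced for illustration; they mirror the format described above and are not taken from the `subset` script.

```python
FEATURE_ORDER = ["DOSE", "TREAT RATIO", "CAR AFFINITY",
                 "ANTIGENS CANCER", "ANTIGENS HEALTHY"]


def subset_file_name(xml_name, data_type, subset=None):
    """Build a subset file name; features left out of `subset` become 'X'."""
    subset = subset or {}
    fields = [str(subset.get(feature, "X")) for feature in FEATURE_ORDER]
    return "_".join([xml_name, *fields, data_type]) + ".pkl"


# Python renders the float 1e-7 as '1e-07', matching the file name shown above.
name = subset_file_name("VITRO_DISH_TREAT_CH_2D", "ANALYZED", {"CAR AFFINITY": 1e-7})
```

Here `name` is `VITRO_DISH_TREAT_CH_2D_X_X_1e-07_X_X_ANALYZED.pkl`.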

We will also make a subset with the following requirements:

+ `CAR AFFINITY` : 1e-6
+ `ANTIGENS CANCER` : 1000

This will produce the following toy data subset:

    VITRO_DISH_TREAT_CH_2D_X_X_1e-06_1000_X_DATATYPE.pkl
    
And finally, we will collect all of the cell data into a single file by not selecting a subset, producing data with the following name:

    VITRO_DISH_TREAT_CH_2D_X_X_X_X_X_DATATYPE.pkl

We will additionally do this with and without the `states` flag. With the flag, only the cell population and state counts over time are collected, and the volume and cell cycle distribution data are excluded. The file with only the `states` information will be named:


    VITRO_DISH_TREAT_CH_2D_X_X_X_X_X_STATES_ANALYZED.pkl

For the `tissue`, only the subset of all data is required. Additionally, for the sharedLocs data, only the subset of all data is required for both effective treatments in realistic co-culture `dish` and `tissue` contexts.

#### Workspace variables

Set up workspace variables for subsetting simulations.

+ `DATA_PATH` variables are the path to analyzed data files (`.pkl` files generated by analyzing data)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by subsetting)
+ `...XML_NAME` variables are strings (that vary based on data setup) that will precede the file name extension (which varies based on the subset selected)
+ `...SUBSET...` variables are the requested data subsets to make (`;` separated lists with tuples containing keys to specific feature values that simulations in subset must include)
+ `...ALL` variables indicate to collect all data without subsetting
+ `DATA_TYPE` variables indicate which type of analyzed data is being fed into the subsetting function
+ `STATES` variable enables collecting only the relevant cell count and states data (thus excluding cell distribution data such as volumes and average cell cycle lengths)
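
To illustrate the subset request format (`;`-separated lists of `(KEY:VALUE)` tuples), the sketch below turns such a string into a list of dictionaries. The parser `parse_subsets` is a hypothetical helper, and treating an empty string as a single unconstrained selection is an assumption based on the `...ALL` variables; `subset_data` may interpret the string differently.

```python
import re

# One "(KEY:VALUE)" tuple; keys and values may contain spaces.
PAIR = re.compile(r"\(([^:()]+):([^()]+)\)")


def parse_subsets(requested):
    """Turn '[(KEY:VALUE),...];[...]' into a list of {feature: value} dicts."""
    if not requested.strip():
        return [{}]  # no subset requested: a single, unconstrained selection
    subsets = []
    for chunk in requested.split(";"):
        pairs = PAIR.findall(chunk)
        subsets.append({key.strip(): value.strip() for key, value in pairs})
    return subsets


parsed = parse_subsets(
    "[(CAR AFFINITY:1e-7)];[(CAR AFFINITY:1e-6),(ANTIGENS CANCER:1000)]"
)
```

Values are kept as strings here; converting them to numbers is left to whatever code compares them against the simulation features.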

In [None]:
# Co-culture dish workspace variables
DATA_PATH_TOY_COCULTURE_CELLS = 'examples/files/toy/coculture/analyzed/cells/'
DATA_PATH_TOY_COCULTURE_ENVIRONMENT = 'examples/files/toy/coculture/analyzed/environment/'
DATA_PATH_TOY_COCULTURE_SPATIAL = 'examples/files/toy/coculture/analyzed/spatial/'
DATA_PATH_TOY_COCULTURE_LYSED = 'examples/files/toy/coculture/analyzed/lysed/'
DATA_PATH_TOY_COCULTURE_SHAREDLOCS = 'examples/files/toy/coculture/analyzed/sharedlocs/'

TOY_COCULTURE_XML_NAME = 'VITRO_DISH_TREAT_CH_2D'

RESULTS_PATH_TOY_COCULTURE_SUBSET_CELLS = 'examples/files/toy/coculture/subset/cells/'
RESULTS_PATH_TOY_COCULTURE_SUBSET_ENVIRONMENT = 'examples/files/toy/coculture/subset/environment/'
RESULTS_PATH_TOY_COCULTURE_SUBSET_SPATIAL = 'examples/files/toy/coculture/subset/spatial/'
RESULTS_PATH_TOY_COCULTURE_SUBSET_LYSED = 'examples/files/toy/coculture/subset/lysed/'
RESULTS_PATH_TOY_COCULTURE_SUBSET_SHAREDLOCS = 'examples/files/toy/coculture/subset/sharedlocs/'

TOY_COCULTURE_SUBSETS = '[(CAR AFFINITY:1e-7)];[(CAR AFFINITY:1e-6),(ANTIGENS CANCER:1000)]'
TOY_COCULTURE_ALL = ''

# Types of analyses
DATA_TYPE_CELLS = 'ANALYZED'
DATA_TYPE_ENVIRONMENT = 'ENVIRONMENT'
DATA_TYPE_SPATIAL = 'SPATIAL'
DATA_TYPE_LYSED = 'LYSED'
DATA_TYPE_SHAREDLOCS = 'SHAREDLOCS'

STATES = True

#### Subset sample co-culture `dish` simulations

In [None]:
from scripts.subset.subset import subset_data

In [None]:
# Subset co-culture dish data
subset_data(DATA_PATH_TOY_COCULTURE_CELLS, TOY_COCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_TOY_COCULTURE_SUBSET_CELLS, TOY_COCULTURE_SUBSETS)
subset_data(DATA_PATH_TOY_COCULTURE_CELLS, TOY_COCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_TOY_COCULTURE_SUBSET_CELLS, TOY_COCULTURE_ALL, states=STATES)
subset_data(DATA_PATH_TOY_COCULTURE_ENVIRONMENT, TOY_COCULTURE_XML_NAME, DATA_TYPE_ENVIRONMENT, RESULTS_PATH_TOY_COCULTURE_SUBSET_ENVIRONMENT, TOY_COCULTURE_SUBSETS)
subset_data(DATA_PATH_TOY_COCULTURE_SPATIAL, TOY_COCULTURE_XML_NAME, DATA_TYPE_SPATIAL, RESULTS_PATH_TOY_COCULTURE_SUBSET_SPATIAL, TOY_COCULTURE_SUBSETS)
subset_data(DATA_PATH_TOY_COCULTURE_LYSED, TOY_COCULTURE_XML_NAME, DATA_TYPE_LYSED, RESULTS_PATH_TOY_COCULTURE_SUBSET_LYSED, TOY_COCULTURE_SUBSETS)
subset_data(DATA_PATH_TOY_COCULTURE_SHAREDLOCS, TOY_COCULTURE_XML_NAME, DATA_TYPE_SHAREDLOCS, RESULTS_PATH_TOY_COCULTURE_SUBSET_SHAREDLOCS, subsetsRequested=TOY_COCULTURE_ALL, states=STATES)

---

### 3.4 Plot full subsetted co-culture `dish` and `tissue` data

The main plotting function (`plot_data`) iterates through each subsetted file (`.pkl`) in the data path and plots relevant data for each subset instance.

The function enables choosing which feature to color the data by. Choosing the color to be `X` will enable the function to automatically color the data based on whichever features are not held constant in the subset. 
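
One way to picture the automatic `X` coloring is to read the varying features back out of a subset file name, using the naming format from section 3.3. The sketch below is an assumption about how this could be done, with hypothetical names (`varying_features`, `FEATURE_ORDER`); it is not the logic used inside `plot_data`.

```python
FEATURE_ORDER = ["DOSE", "TREAT RATIO", "CAR AFFINITY",
                 "ANTIGENS CANCER", "ANTIGENS HEALTHY"]


def varying_features(file_name):
    """List the features marked 'X' (not held constant) in a subset file name."""
    parts = file_name.rsplit(".", 1)[0].split("_")
    parts.pop()                      # drop the DATATYPE field
    if parts[-1] == "STATES":
        parts.pop()                  # drop the optional STATES field
    # The five feature fields sit at the end; the XML name may itself contain '_'.
    values = parts[-len(FEATURE_ORDER):]
    return [feature for feature, value in zip(FEATURE_ORDER, values) if value == "X"]


features = varying_features("VITRO_DISH_TREAT_CH_2D_X_X_1e-07_X_X_ANALYZED.pkl")
```

For this file name, every feature except `CAR AFFINITY` is returned, so those are the candidates for coloring.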

For the `tissue` data only: since not all possible combinations of the `tissue` data were collected, each feature to color by will need to be specified (and the results stored in different locations, as the feature color is not stored in the file name). For this example, we use `ANTIGENS CANCER` as the sample variable and put all resulting figures in the `antigens_cancer` subfolders. The same code can be run for the `DOSE`, `CAR AFFINITY`, and `TREAT RATIO` variables, which would go in the `dose`, `car_affinity`, and `treat_ratio` subfolders, respectively.

For the sharedLocs data, since not all possible combinations of the `tissue` data were collected and only the effective treatments in the realistic co-culture `dish` context are desired, each feature to color by will need to be specified (and the results stored in different locations, as the feature color is not stored in the file name). In this analysis, only `ANTIGENS CANCER` was used.

For the purpose of this example, instead of using the toy data, we have provided a few full, pre-computed subsets of the entire co-culture `dish` and `tissue` datasets used in the paper for exploring.

#### Workspace variables

Set up workspace variables for plotting simulations.

+ `DATA_PATH` variables are the path to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULT_PATH` variables are the path for result files (`.svg` files generated by plotting)
+ `...COLOR` variables indicate which feature to color the variables by (choosing the color to be `X` will enable the function to automatically color the data based on whichever features are not held constant in the subset)
+ `TISSUE_PARTIAL` indicates that only a partial set of a full combinatorial set of features is present and is a flag used for selecting which plots to make.
+ `SHAREDLOCS_COLOR` variables indicate which feature to color the variables by
+ `SHAREDLOCS_PARTIAL` indicates that only a partial set of a full combinatorial set of features is present and is a flag used for selecting which plots to make.

In [None]:
# Co-culture dish workspace variables
DATA_PATH_DISH_COCULTURE_SUBSET_CELLS = 'examples/files/full/coculture/subset/cells/'
DATA_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT = 'examples/files/full/coculture/subset/environment/'
DATA_PATH_DISH_COCULTURE_SUBSET_SPATIAL = 'examples/files/full/coculture/subset/spatial/'
DATA_PATH_DISH_COCULTURE_SUBSET_LYSED = 'examples/files/full/coculture/subset/lysed/'
DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SUBSET_SHAREDLOCS = 'examples/files/full/coculture/subset/sharedlocs/'

DISH_COCULTURE_COLOR = 'X'

RESULTS_PATH_DISH_COCULTURE_FIGURES_CELLS = 'examples/figures/coculture/cells/'
RESULTS_PATH_DISH_COCULTURE_FIGURES_ENVIRONMENT = 'examples/figures/coculture/environment/'
RESULTS_PATH_DISH_COCULTURE_FIGURES_SPATIAL = 'examples/figures/coculture/spatial/'
RESULTS_PATH_DISH_COCULTURE_FIGURES_LYSED = 'examples/figures/coculture/lysed/'
RESULTS_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_FIGURES_SHAREDLOCS = 'examples/figures/coculture/sharedlocs/'

# Tissue workspace variables
DATA_PATH_TISSUE_SUBSET_CELLS = 'examples/files/full/tissue/subset/cells/'
DATA_PATH_TISSUE_SUBSET_ENVIRONMENT = 'examples/files/full/tissue/subset/environment/'
DATA_PATH_TISSUE_SUBSET_SPATIAL = 'examples/files/full/tissue/subset/spatial/'
DATA_PATH_TISSUE_SUBSET_LYSED = 'examples/files/full/tissue/subset/lysed/'
DATA_PATH_TISSUE_SUBSET_SHAREDLOCS = 'examples/files/full/tissue/subset/sharedlocs/'


TISSUE_COLOR_ANTIGENS_CANCER = 'ANTIGENS CANCER'

RESULTS_PATH_TISSUE_FIGURES_CELLS_ANTIGENS_CANCER = 'examples/figures/tissue/cells/antigens_cancer/'
RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_ANTIGENS_CANCER = 'examples/figures/tissue/environment/antigens_cancer/'
RESULTS_PATH_TISSUE_FIGURES_SPATIAL_ANTIGENS_CANCER = 'examples/figures/tissue/spatial/antigens_cancer/'
RESULTS_PATH_TISSUE_FIGURES_LYSED_ANTIGENS_CANCER = 'examples/figures/tissue/lysed/antigens_cancer/'
RESULTS_PATH_TISSUE_FIGURES_SHAREDLOCS = 'examples/figures/tissue/sharedlocs/'

TISSUE_PARTIAL = True

# Shared locations workspace variables
SHAREDLOCS_COLOR = 'ANTIGENS CANCER'
SHAREDLOCS_PARTIAL = True

#### Plot full co-culture `dish` and `tissue` simulations

In [None]:
from scripts.plot.plot_data import plot_data

In [None]:
# Plot coculture dish data
plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_CELLS, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_CELLS)
plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_ENVIRONMENT)
plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_SPATIAL, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_SPATIAL)
plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_LYSED, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_LYSED)
plot_data(DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SUBSET_SHAREDLOCS, SHAREDLOCS_COLOR, RESULTS_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_FIGURES_SHAREDLOCS, partial=SHAREDLOCS_PARTIAL)

# Plot tissue data
plot_data(DATA_PATH_TISSUE_SUBSET_CELLS, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_CELLS_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)
plot_data(DATA_PATH_TISSUE_SUBSET_ENVIRONMENT, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)
plot_data(DATA_PATH_TISSUE_SUBSET_SPATIAL, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_SPATIAL_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)
plot_data(DATA_PATH_TISSUE_SUBSET_LYSED, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_LYSED_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)
plot_data(DATA_PATH_TISSUE_SUBSET_SHAREDLOCS, SHAREDLOCS_COLOR, RESULTS_PATH_TISSUE_FIGURES_SHAREDLOCS, partial=SHAREDLOCS_PARTIAL)

---

### 3.5 Multi-feature & outcome analysis of full co-culture `dish` and `tissue` data

The main outcome analysis function (`stats`) iterates through each subsetted file (`.pkl`) in the data path and plots and analyzes relevant data for each subset instance.

For these analyses, only full data (analyzed by cell counts) subsets were used. We provided the following real, full co-culture `dish` subset for this analysis:

    VITRO_DISH_TREAT_CH_2D_X_X_X_X_X_STATES_ANALYZED.pkl
    
and the following real, full `tissue` subset:

    VIVO_TISSUE_TREAT_C_2D_X_X_X_X_X_STATES_ANALYZED.pkl

We will analyze this data both in full and averaged across replicates.

#### Workspace variables

Set up workspace variables for the outcome analysis.

+ `DATA_PATH` variables are the path to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULT_PATH` variables are the path for result files (`.svg` or `.pdf` or `.csv` files generated by analyzing the data)
+ `AVERAGE` and `NOT_AVERAGE` indicate whether or not to average the data across replicates.
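
Averaging across replicates amounts to reducing over the seed axis. The sketch below uses made-up counts purely to show the operation; the values and variable names are hypothetical, and `stats` may compute additional or different summaries.

```python
import numpy as np

# Hypothetical cancer cell counts: 3 replicates (seeds) x 4 timepoints.
counts = np.array([
    [100, 80, 60, 40],
    [100, 90, 70, 50],
    [100, 70, 50, 30],
])

mean_counts = counts.mean(axis=0)         # average across replicates
std_counts = counts.std(axis=0, ddof=1)   # sample standard deviation across replicates
```

With `average=False`, each replicate would instead be kept as its own row, as in `counts`.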

In [None]:
# Co-culture dish workspace variables
DATA_PATH_DISH_COCULTURE_FULL = 'examples/files/full/coculture/subset/cells/VITRO_DISH_TREAT_CH_2D_X_X_X_X_X_STATES_ANALYZED.pkl'

RESULTS_PATH_DISH_COCULTURE_STATS = 'examples/figures/coculture/stats/'
RESULTS_PATH_DISH_COCULTURE_STATS_AVERAGE = 'examples/figures/coculture/stats/average/'

# Tissue workspace variables
DATA_PATH_TISSUE_SUBSET_CELLS = 'examples/files/full/tissue/subset/cells/'

RESULTS_PATH_TISSUE_STATS = 'examples/figures/tissue/stats/'
RESULTS_PATH_TISSUE_STATS_AVERAGE = 'examples/figures/tissue/stats/average/'

# Average workspace variables
AVERAGE = True
NOT_AVERAGE = False

#### Analyze full co-culture `dish` and `tissue` simulations

In [None]:
from scripts.stats.stats import stats

In [None]:
# Co-culture dish outcome analysis
stats(DATA_PATH_DISH_COCULTURE_FULL, RESULTS_PATH_DISH_COCULTURE_STATS, average=NOT_AVERAGE)
stats(DATA_PATH_DISH_COCULTURE_FULL, RESULTS_PATH_DISH_COCULTURE_STATS_AVERAGE, average=AVERAGE)

# Tissue outcome analysis
stats(DATA_PATH_TISSUE_SUBSET_CELLS, RESULTS_PATH_TISSUE_STATS, average=NOT_AVERAGE)
stats(DATA_PATH_TISSUE_SUBSET_CELLS, RESULTS_PATH_TISSUE_STATS_AVERAGE, average=AVERAGE)

---

## 4. Experimental literature data plotting

The main function (`plot_kill_curve_exp_data`) will plot experimental data kill curves using data extracted from a variety of reference papers. Each time this function is run the same output will be produced.

#### Workspace variables

+ `RESULTS_PATH_EXPERIMENTAL_LITERATURE_KILL_CURVES` variable indicates where to save plots based on extracted data from literature (`.svg` files as a result of plotting)

In [None]:
RESULTS_PATH_EXPERIMENTAL_LITERATURE_KILL_CURVES = 'examples/figures/kill_curves/'

#### Plot experimental literature data

In [None]:
from scripts.plot.plot_data import plot_kill_curve_exp_data

In [None]:
plot_kill_curve_exp_data(RESULTS_PATH_EXPERIMENTAL_LITERATURE_KILL_CURVES)

---

## 5. Ranked data plotting

After finding the effective treatments from the realistic co-culture `dish` context and analyzing them in `tissue`, a score and rank for each simulation (averaged across replicates) were provided in both contexts and combined into a single `.xlsx` file for analysis.

The main function `plot_dish_tissue_compare_data` takes this `.xlsx` file as an input and generates parity and ladder plots for the rank and score of these simulations in the `dish` context compared to the `tissue` context.
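
As a small illustration of the data behind a parity or ladder plot, the sketch below ranks a few treatments in each context and pairs the ranks. The scores are invented, the helper `rank_by_score` is hypothetical, and ranking the highest score first is an assumption; the scoring used in the paper is not reproduced here.

```python
def rank_by_score(scores):
    """Rank simulations from 1 (highest score) downward; ties keep input order."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical scores for three treatments in each context.
dish_scores = {"A": 0.9, "B": 0.5, "C": 0.7}
tissue_scores = {"A": 0.4, "B": 0.6, "C": 0.8}

dish_rank = rank_by_score(dish_scores)
tissue_rank = rank_by_score(tissue_scores)

# Each pair is one point on a parity plot or one line on a ladder plot.
pairs = {name: (dish_rank[name], tissue_rank[name]) for name in dish_scores}
```

A treatment that keeps its rank in both contexts falls on the diagonal of the parity plot and appears as a horizontal line on the ladder plot.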

#### Workspace variables

+ `DATA_PATH` variables are the path to the Excel data file (`.xlsx` file generated for this analysis)
+ `RESULT_PATH` variables are the path for result files (`.svg` files generated by plotting the data)
+ `COMPARE_COLOR` variables indicate which feature to color the variables by (choosing the color to be `X` will enable the function to automatically color the data based on each feature)

In [None]:
DATA_PATH_COMPARE_DISH_TISSUE_EXCEL = 'examples/files/full/ranked/RANK_SCORE_COMPARE_DISH_TISSUE.xlsx'
RESULTS_PATH_COMPARE_DISH_TISSUE_FIGURES = 'examples/figures/ranked/'

COMPARE_COLOR = 'X'

#### Compare full effective treatment realistic co-culture `dish` and `tissue` simulations

In [None]:
from scripts.plot.plot_data import plot_dish_tissue_compare_data

In [None]:
plot_dish_tissue_compare_data(DATA_PATH_COMPARE_DISH_TISSUE_EXCEL, COMPARE_COLOR, RESULTS_PATH_COMPARE_DISH_TISSUE_FIGURES)