# PROCESSING & PLOTTING DATA

This notebook provides the functions and scripts for parsing simulation files (`.json`) into pickled numpy arrays (`.pkl`) and subsequently selecting, plotting, and analyzing subsets or all of the data. The notebook follows the logical order in which data is processed to obtain the results in the paper.

---

1. HEURISTIC DATA PLOTTING
2. MONOCULTURE AND CO-CULTURE DISH DATA PROCESSING & PLOTTING
    - 2.1 PARSE MONOCULTURE AND CO-CULTURE DISH DATA
    - 2.2 ANALYZE PARSED MONOCULTURE AND CO-CULTURE DISH DATA
    - 2.3 SUBSET ANALYZED MONOCULTURE AND CO-CULTURE DISH DATA
    - 2.4 PLOT SUBSETTED MONOCULTURE AND CO-CULTURE DISH DATA
    - 2.5 MULTI-FEATURE & OUTCOME ANALYSIS MONOCULTURE AND CO-CULTURE DISH DATA
3. EXPERIMENTAL LITERATURE DATA PLOTTING
4. TISSUE DATA PROCESSING & PLOTTING
    - 4.1 PARSE TISSUE DATA
    - 4.2 ANALYZE PARSED TISSUE DATA
    - 4.3 SUBSET ANALYZED TISSUE DATA
    - 4.4 PLOT SUBSETTED TISSUE DATA
    - 4.5 MULTI-FEATURE & OUTCOME ANALYSIS TISSUE
5. RANKED DATA PLOTTING
6. CO-CULTURE DISH AND TISSUE SHARED LOCS DATA PROCESSING & PLOTTING
    - 6.1 ANALYZE PARSED CO-CULTURE DISH AND TISSUE SHARED LOCS DATA
    - 6.2 SUBSET ANALYZED CO-CULTURE DISH AND TISSUE SHARED LOCS DATA
    - 6.3 PLOT CO-CULTURE DISH AND TISSUE SHARED LOCS DATA
7. CO-CULTURE, TISSUE, AND GRAPH IMAGING

---

## 1. HEURISTIC DATA PLOTTING 

The main function (`plot_heuristics_data`) plots the probability of binding and/or killing based on the CAR-antigen and PD1-PDL1 binding heuristics used in the paper across various values of ligand/receptor and/or binding affinity. The function produces the same output each time it is run.

#### WORKSPACE VARIABLES

+ `RESULTS_PATH_HEURISTICS` variable indicates where to save the heuristic plots (`.svg` files as a result of plotting)

In [None]:
RESULTS_PATH_HEURISTICS = 'path/to/figures/heuristics/'

#### PLOT HEURISTIC DATA

In [None]:
import scripts.plot.plot_data

In [None]:
scripts.plot.plot_data.plot_heuristics_data(RESULTS_PATH_HEURISTICS)

#### EXAMPLE FIGURE

---

## 2. MONOCULTURE AND CO-CULTURE DISH DATA PROCESSING & PLOTTING

A full combinatorial set of `monoculture dish` and `co-culture dish` data was generated. The main dataset tests all combinations of the following features:

+ `DOSE` : [250, 500, 1000]
+ `TREAT RATIO` : [0:100, 25:75, 50:50, 75:25, 100:0]
+ `CAR AFFINITY` : [1e-6, 1e-7, 1e-8, 1e-9]
+ `ANTIGENS CANCER` : [100, 500, 1000, 5000, 10000]

In the `co-culture dish` dataset, an additional feature was varied:

+ `ANTIGENS HEALTHY` : [0, 100]

which produced the `ideal` (`ANTIGENS HEALTHY` = 0) and `realistic` (`ANTIGENS HEALTHY` = 100) `co-culture dish` data.

In extended datasets, `DOSE` or `TREAT RATIO` were extended to the following:

+ `DOSE` : [250, 500, 1000, 5000, 10000]
+ `TREAT RATIO` : [0:100, 10:90, 25:75, 50:50, 75:25, 90:10, 100:0]
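
The size of the main combinatorial sets follows directly from the feature lists above. The sketch below enumerates them with `itertools.product`. The names `MAIN_FEATURES` and `enumerate_conditions` are illustrative and not part of the repository's scripts, and the counts are feature combinations only (replicate seeds are not included).

```python
from itertools import product

# Feature values copied from the lists above (main dataset).
MAIN_FEATURES = {
    "DOSE": [250, 500, 1000],
    "TREAT RATIO": ["0:100", "25:75", "50:50", "75:25", "100:0"],
    "CAR AFFINITY": [1e-6, 1e-7, 1e-8, 1e-9],
    "ANTIGENS CANCER": [100, 500, 1000, 5000, 10000],
}
ANTIGENS_HEALTHY = [0, 100]  # varied only in the co-culture dish dataset


def enumerate_conditions(features):
    """Return one dict per combination of feature values."""
    names = list(features)
    return [dict(zip(names, values)) for values in product(*features.values())]


monoculture = enumerate_conditions(MAIN_FEATURES)
coculture = enumerate_conditions({**MAIN_FEATURES, "ANTIGENS HEALTHY": ANTIGENS_HEALTHY})

print(len(monoculture))  # 3 * 5 * 4 * 5 = 300
print(len(coculture))    # 300 * 2 = 600
```

The extended datasets can be enumerated the same way by swapping in the extended `DOSE` or `TREAT RATIO` list.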

---

### 2.1 PARSE MONOCULTURE AND CO-CULTURE DISH DATA

The main parsing function (`parse`) iterates through each file in the data path and parses each simulation instance, extracting fields from the simulation setup, cells, and environment.

The parsed arrays are organized as:

    {
        "setup": {
            "radius": R,
            "height": H,
            "time": [],
            "pops": [],
            "types": [],
            "coords": []
        },
        "agents": (N seeds) x (T timepoints) x (H height) x (C coordinates) x (P positions),
        "environments": {
            "glucose": (N seeds) x (T timepoints) x (H height) x (R radius),
            "oxygen": (N seeds) x (T timepoints) x (H height) x (R radius),
            "tgfa": (N seeds) x (T timepoints) x (H height) x (R radius),
            "IL-2": (N seeds) x (T timepoints) x (H height) x (R radius)
        }
    }

where each entry in the agents array is a structured entry of the shape:

    "pop"       int8    population code
    "type"      int8    cell type code
    "volume"    int16   cell volume (rounded)
    "cycle"     int16   average cell cycle length (rounded)

The `parse.py` file contains general parsing functions.

Parsing can take some time.
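
As an illustration of the structured entries described above, the sketch below builds a small agents array with a numpy structured dtype. The field names and integer widths come from the description; the array dimensions and the values written into it are hypothetical, and this is not the code in `parse.py`.

```python
import numpy as np

# Field names and integer widths follow the description above.
AGENT_DTYPE = np.dtype([
    ("pop", np.int8),
    ("type", np.int8),
    ("volume", np.int16),
    ("cycle", np.int16),
])

# Toy dimensions (hypothetical): 2 seeds, 3 timepoints, 1 height level,
# 4 coordinates, 6 positions per coordinate.
agents = np.zeros((2, 3, 1, 4, 6), dtype=AGENT_DTYPE)

# Fill one entry: seed 0, timepoint 0, height 0, coordinate 1, position 2.
agents[0, 0, 0, 1, 2] = (1, 3, 2250, 1200)

# Selecting a field returns a plain integer array with the same five axes.
volumes = agents["volume"]
print(volumes.shape)
print(int(volumes.max()))
```

Storing each cell as a compact 6-byte record keeps the five-dimensional array small enough to pickle, which is presumably why the values are rounded to integers.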

#### WORKSPACE VARIABLES

Set up workspace variables for parsing simulations.

+ `DATA_PATH` variables are the path to simulation output data files (`.tar.xz` files of compressed simulation outputs)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by parsing)
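
For orientation, the sketch below shows one way to read `.json` members out of a `.tar.xz` archive using only the standard library. The helper `iter_json_members`, the archive name, and the JSON keys are hypothetical; the actual `parse` function may read the archives differently.

```python
import io
import json
import os
import tarfile
import tempfile


def iter_json_members(tar_path):
    """Yield (member name, decoded JSON object) for each .json file in a .tar.xz archive."""
    with tarfile.open(tar_path, "r:xz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                with archive.extractfile(member) as handle:
                    yield member.name, json.load(handle)


# Demonstration on a throwaway archive (contents are made up for the example).
with tempfile.TemporaryDirectory() as tmp:
    tar_path = os.path.join(tmp, "example.tar.xz")
    payload = json.dumps({"seed": 0, "timepoints": []}).encode("utf-8")
    with tarfile.open(tar_path, "w:xz") as archive:
        info = tarfile.TarInfo(name="example_00.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    # Consume the generator while the temporary archive still exists.
    loaded = list(iter_json_members(tar_path))

print(loaded)
```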

In [None]:
# Monoculture dish data workspace variable
DATA_PATH_DISH_MONOCULTURE_TAR = 'path/to/dish/monoculture/files/tars/'
RESULTS_PATH_DISH_MONOCULTURE_PARSED = 'path/to/dish/monoculture/files/parsed/'

# Co-culture dish data workspace variable
DATA_PATH_DISH_COCULTURE_TAR = 'path/to/dish/coculture/files/tars/'
RESULTS_PATH_DISH_COCULTURE_PARSED = 'path/to/dish/coculture/files/parsed/'

#### PARSE `MONOCULTURE DISH` AND `CO-CULTURE DISH` SIMULATIONS

In [None]:
import scripts.parse.parse

In [None]:
scripts.parse.parse.parse(DATA_PATH_DISH_MONOCULTURE_TAR, RESULTS_PATH_DISH_MONOCULTURE_PARSED)
scripts.parse.parse.parse(DATA_PATH_DISH_COCULTURE_TAR, RESULTS_PATH_DISH_COCULTURE_PARSED)

---

### 2.2 ANALYZE PARSED MONOCULTURE AND CO-CULTURE DISH DATA

Each main analyzing function (`analyze_cells`, `analyze_env`, `analyze_spatial`, and `analyze_lysis`) iterates through each parsed file (`.pkl`) in the data path and analyzes each simulation instance, extracting fields from the simulation setup, cells, and environment depending on the function.

`analyze_cells` collects cell counts for each population and state over time (files produced will end with `ANALYZED`).

`analyze_env` collects information on environmental species concentrations over time (files produced will end with `ENVIRONMENT`).

`analyze_spatial` collects cell counts for each population across simulation radii over time (files produced will end with `SPATIAL`).

`analyze_lysis` collects lysed cell information over time (files produced will end with `LYSED`).
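
The "iterate through each `.pkl` file in the data path" pattern shared by these functions can be sketched as follows. The helper `load_pickles` and the demo file names are hypothetical and stand in for whatever the analysis scripts do internally.

```python
import os
import pickle
import tempfile


def load_pickles(data_path, suffix=".pkl"):
    """Return {file name: unpickled object} for each matching file, in sorted order."""
    results = {}
    for name in sorted(os.listdir(data_path)):
        if name.endswith(suffix):
            with open(os.path.join(data_path, name), "rb") as handle:
                results[name] = pickle.load(handle)
    return results


# Demonstration on a throwaway directory; the non-.pkl file is skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("B.pkl", "A.pkl", "notes.txt"):
        with open(os.path.join(tmp, name), "wb") as handle:
            pickle.dump({"source": name}, handle)
    loaded = load_pickles(tmp)

print(list(loaded))
```

Pickle files can execute arbitrary code when loaded, so only files generated by your own parsing runs should be opened this way.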

#### WORKSPACE VARIABLES

Set up workspace variables for analyzing simulations.

+ `DATA_PATH...PARSED` variables are the path to parsed data files (`.pkl` files generated by parsing)
+ `DATA_PATH...LYSIS` variables are the path to `LYSIS` data files (`.LYSIS.json` files generated directly from the simulation outputs)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by analyzing)

In [1]:
# Monoculture dish workspace variables
DATA_PATH_DISH_MONOCULTURE_PARSED = 'path/to/dish/monoculture/files/parsed/'
DATA_PATH_DISH_MONOCULTURE_LYSIS = 'path/to/dish/monoculture/files/lysis/'
RESULTS_PATH_DISH_MONOCULTURE_CELLS = 'path/to/dish/monoculture/files/cells/'
RESULTS_PATH_DISH_MONOCULTURE_ENVIRONMENT = 'path/to/dish/monoculture/files/environment/'
RESULTS_PATH_DISH_MONOCULTURE_SPATIAL = 'path/to/dish/monoculture/files/spatial/'
RESULTS_PATH_DISH_MONOCULTURE_LYSED = 'path/to/dish/monoculture/files/lysed/'

# Co-culture dish workspace variables
DATA_PATH_DISH_COCULTURE_PARSED = 'path/to/dish/coculture/files/parsed/'
DATA_PATH_DISH_COCULTURE_LYSIS = 'path/to/dish/coculture/files/lysis/'
RESULTS_PATH_DISH_COCULTURE_CELLS = 'path/to/dish/coculture/files/cells/'
RESULTS_PATH_DISH_COCULTURE_ENVIRONMENT = 'path/to/dish/coculture/files/environment/'
RESULTS_PATH_DISH_COCULTURE_SPATIAL = 'path/to/dish/coculture/files/spatial/'
RESULTS_PATH_DISH_COCULTURE_LYSED = 'path/to/dish/coculture/files/lysed/'

#### ANALYZE `MONOCULTURE DISH` AND `CO-CULTURE DISH` SIMULATIONS

In [1]:
import scripts.analyze.analyze_cells
import scripts.analyze.analyze_env
import scripts.analyze.analyze_spatial
import scripts.analyze.analyze_lysis

In [None]:
# Analyze monoculture dish data
scripts.analyze.analyze_cells.analyze_cells(DATA_PATH_DISH_MONOCULTURE_PARSED, RESULTS_PATH_DISH_MONOCULTURE_CELLS)
scripts.analyze.analyze_env.analyze_env(DATA_PATH_DISH_MONOCULTURE_PARSED, RESULTS_PATH_DISH_MONOCULTURE_ENVIRONMENT)
scripts.analyze.analyze_spatial.analyze_spatial(DATA_PATH_DISH_MONOCULTURE_PARSED, RESULTS_PATH_DISH_MONOCULTURE_SPATIAL)
scripts.analyze.analyze_lysis.analyze_lysis(DATA_PATH_DISH_MONOCULTURE_LYSIS, RESULTS_PATH_DISH_MONOCULTURE_LYSED)

# Analyze co-culture dish data
scripts.analyze.analyze_cells.analyze_cells(DATA_PATH_DISH_COCULTURE_PARSED, RESULTS_PATH_DISH_COCULTURE_CELLS)
scripts.analyze.analyze_env.analyze_env(DATA_PATH_DISH_COCULTURE_PARSED, RESULTS_PATH_DISH_COCULTURE_ENVIRONMENT)
scripts.analyze.analyze_spatial.analyze_spatial(DATA_PATH_DISH_COCULTURE_PARSED, RESULTS_PATH_DISH_COCULTURE_SPATIAL)
scripts.analyze.analyze_lysis.analyze_lysis(DATA_PATH_DISH_COCULTURE_LYSIS, RESULTS_PATH_DISH_COCULTURE_LYSED)

---

### 2.3 SUBSET ANALYZED MONOCULTURE AND CO-CULTURE DISH DATA

The main subsetting function (`subset_data`) takes a desired subset of data, iterates through each analyzed file (`.pkl`) in the data path, and adds simulations matching the subset requirements to a single data file. Each subset also automatically includes the untreated control if it is present in the directory the data is pulled from.

For example, a data subset might be all simulations where the indicated features meet the following requirements:

+ `TREAT RATIO` : 50-50
+ `CAR AFFINITY` : 1e-7
+ `ANTIGENS CANCER` : 1000

This means all data within this subset will have the specific values of the features listed above, while all values of `DOSE` (and, if applicable, `ANTIGENS HEALTHY`) will be included. All subsets will be saved in the following format:

`XML_NAME` + `DOSE` + `TREAT RATIO` + `CAR AFFINITY` + `ANTIGENS CANCER` + `ANTIGENS HEALTHY`

where values are separated by `_`, features specified in the subset are replaced with the desired value, and features not specified in the subset are replaced by an `X` to indicate that all values of that feature are present.

Thus, the above example would produce the following name for the `monoculture dish` data:
    
    VITRO_DISH_TREAT_C_2D_X_50-50_1e-07_1000_NA_DATATYPE.pkl

where `DATATYPE` is `ANALYZED` for cell, `ENVIRONMENT` for environment, `SPATIAL` for spatial, or `LYSED` for lysis analyses.

And would produce the following name for the `co-culture dish` data:


    VITRO_DISH_TREAT_CH_2D_X_50-50_1e-07_1000_X_DATATYPE.pkl

where the `X` in place of the `ANTIGENS HEALTHY` would be replaced by a value if a value were instead specified.
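
The naming convention above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `FEATURE_ORDER`, `format_value`, and `subset_file_name` are illustrative names, the `NA` token for monoculture data is passed in explicitly by the caller, and the `STATES` variant of the file name is not covered. It is not the code used by `subset_data`.

```python
# Order of the feature tokens in the file name, as given in the format above.
FEATURE_ORDER = ["DOSE", "TREAT RATIO", "CAR AFFINITY", "ANTIGENS CANCER", "ANTIGENS HEALTHY"]


def format_value(feature, value):
    """Format one feature value the way it appears in the example file names."""
    if feature == "CAR AFFINITY":
        return format(float(value), ".0e")  # 1e-7 -> '1e-07'
    return str(value)


def subset_file_name(xml_name, subset, data_type):
    """Join the XML name, one token per feature ('X' when unspecified), and the data type."""
    tokens = [format_value(f, subset[f]) if f in subset else "X" for f in FEATURE_ORDER]
    return "_".join([xml_name, *tokens, data_type]) + ".pkl"


example = {"TREAT RATIO": "50-50", "CAR AFFINITY": 1e-7, "ANTIGENS CANCER": 1000}

# Monoculture data has no healthy cells, so the caller supplies the 'NA' token.
mono_name = subset_file_name("VITRO_DISH_TREAT_C_2D", {**example, "ANTIGENS HEALTHY": "NA"}, "ANALYZED")
co_name = subset_file_name("VITRO_DISH_TREAT_CH_2D", example, "ANALYZED")

print(mono_name)
print(co_name)
```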

We can also collect all the data together into one file by not specifying a subset. We only need to do this for the cell (`ANALYZED`) data for the `monoculture dish` and `co-culture dish` data. Additionally, we can collect this with or without specifying the `states` flag, which if `True` will collect only the cell population and state counts over time and exclude the volume and cell cycle distribution data. This is useful because files that include the distribution data are very large. The `states` flag is only relevant to data produced by the `analyze_cells` function.

#### WORKSPACE VARIABLES

Set up workspace variables for subsetting simulations.

+ `DATA_PATH` variables are the path to analyzed data files (`.pkl` files generated by analyzing data)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by subsetting)
+ `...XML_NAME` variables are strings (which vary based on data setup) that precede the file name extension (which varies based on the subset selected)
+ `...SUBSET...` variables are the requested data subsets to make (`;` separated lists with tuples containing keys to specific feature values that simulations in subset must include)
+ `...ALL` variables indicate to collect all data without subsetting
+ `DATA_TYPE` variables indicate which type of analyzed data is being fed into the subsetting function
+ `STATES` variable enables collecting only the relevant cell count and states data (thus excluding cell distribution data such as volumes and average cell cycle lengths)
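
To make the `...SUBSET...` string format concrete, the sketch below splits such a string into one dictionary per requested subset. The function `parse_subsets` is a hypothetical helper written for illustration; how `subset_data` interprets the string internally is not shown in this notebook.

```python
import re

# Matches one "(FEATURE:value)" pair and captures the feature name and its value.
PAIR_PATTERN = re.compile(r"\(([^:()]+):([^()]+)\)")


def parse_subsets(subset_string):
    """Split a ';'-separated subset string into a list of {feature: value} dicts."""
    subsets = []
    for group in subset_string.split(";"):
        pairs = PAIR_PATTERN.findall(group)
        if pairs:
            subsets.append({feature.strip(): value.strip() for feature, value in pairs})
    return subsets


requested = parse_subsets("[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7)];[(DOSE:500)]")
print(requested)

# An empty string (the "...ALL" case) yields no subset requirements.
print(parse_subsets(""))
```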

In [None]:
# Monoculture dish workspace variables
DATA_PATH_DISH_MONOCULTURE_CELLS = 'path/to/dish/monoculture/files/cells/'
DATA_PATH_DISH_MONOCULTURE_ENVIRONMENT = 'path/to/dish/monoculture/files/environment/'
DATA_PATH_DISH_MONOCULTURE_SPATIAL = 'path/to/dish/monoculture/files/spatial/'
DATA_PATH_DISH_MONOCULTURE_LYSED = 'path/to/dish/monoculture/files/lysed/'

DISH_MONOCULTURE_XML_NAME = 'VITRO_DISH_TREAT_C_2D'

RESULTS_PATH_DISH_MONOCULTURE_SUBSET_CELLS = 'path/to/dish/monoculture/files/subset/cells/'
RESULTS_PATH_DISH_MONOCULTURE_SUBSET_ENVIRONMENT = 'path/to/dish/monoculture/files/subset/environment/'
RESULTS_PATH_DISH_MONOCULTURE_SUBSET_SPATIAL = 'path/to/dish/monoculture/files/subset/spatial/'
RESULTS_PATH_DISH_MONOCULTURE_SUBSET_LYSED = 'path/to/dish/monoculture/files/subset/lysed/'

DISH_MONOCULTURE_SUBSET_CELLS = '[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000)];[(DOSE:500),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000)];[(DOSE:500),(TREAT RATIO:50-50),(ANTIGENS CANCER:1000)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7)];[(TREAT RATIO:50-50),(DOSE:500)]'
DISH_MONOCULTURE_SUBSET_OTHER = '[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000)];[(DOSE:500),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000)];[(DOSE:500),(TREAT RATIO:50-50),(ANTIGENS CANCER:1000)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7)]'
DISH_MONOCULTURE_ALL = ''

# Co-culture dish workspace variables
DATA_PATH_DISH_COCULTURE_CELLS = 'path/to/dish/coculture/files/cells/'
DATA_PATH_DISH_COCULTURE_ENVIRONMENT = 'path/to/dish/coculture/files/environment/'
DATA_PATH_DISH_COCULTURE_SPATIAL = 'path/to/dish/coculture/files/spatial/'
DATA_PATH_DISH_COCULTURE_LYSED = 'path/to/dish/coculture/files/lysed/'

DISH_COCULTURE_XML_NAME = 'VITRO_DISH_TREAT_CH_2D'

RESULTS_PATH_DISH_COCULTURE_SUBSET_CELLS = 'path/to/dish/coculture/files/subset/cells/'
RESULTS_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT = 'path/to/dish/coculture/files/subset/environment/'
RESULTS_PATH_DISH_COCULTURE_SUBSET_SPATIAL = 'path/to/dish/coculture/files/subset/spatial/'
RESULTS_PATH_DISH_COCULTURE_SUBSET_LYSED = 'path/to/dish/coculture/files/subset/lysed/'

DISH_COCULTURE_SUBSET_CELLS_IDEAL_PLUS_HA = '[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:0)];[(DOSE:500),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:0)];[(DOSE:500),(TREAT RATIO:50-50),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:0)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS HEALTHY:0)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000)];[(TREAT RATIO:50-50),(DOSE:500),(ANTIGENS HEALTHY:0)]'
DISH_COCULTURE_SUBSET_CELLS_REALISTIC = '[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)];[(DOSE:500),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)];[(DOSE:500),(TREAT RATIO:50-50),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS HEALTHY:100)];[(TREAT RATIO:50-50),(DOSE:500),(ANTIGENS HEALTHY:100)]'
DISH_COCULTURE_SUBSET_OTHER_IDEAL_PLUS_HA = '[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:0)];[(DOSE:500),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:0)];[(DOSE:500),(TREAT RATIO:50-50),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:0)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS HEALTHY:0)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000)]'
DISH_COCULTURE_SUBSET_OTHER_REALISTIC = '[(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)];[(DOSE:500),(CAR AFFINITY:1e-7),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)];[(DOSE:500),(TREAT RATIO:50-50),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)];[(DOSE:500),(TREAT RATIO:50-50),(CAR AFFINITY:1e-7),(ANTIGENS HEALTHY:100)];[(TREAT RATIO:50-50),(ANTIGENS CANCER:1000),(ANTIGENS HEALTHY:100)]'
DISH_COCULTURE_ALL = ''

# Types of analyses
DATA_TYPE_CELLS = 'ANALYZED'
DATA_TYPE_ENVIRONMENT = 'ENVIRONMENT'
DATA_TYPE_SPATIAL = 'SPATIAL'
DATA_TYPE_LYSED = 'LYSED'

STATES = True

#### SUBSET `MONOCULTURE DISH` AND `CO-CULTURE DISH` SIMULATIONS

In [None]:
import scripts.subset.subset

In [None]:
# Subset monoculture dish data
scripts.subset.subset.subset_data(DATA_PATH_DISH_MONOCULTURE_CELLS, DISH_MONOCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_DISH_MONOCULTURE_SUBSET_CELLS, DISH_MONOCULTURE_SUBSET_CELLS)
scripts.subset.subset.subset_data(DATA_PATH_DISH_MONOCULTURE_CELLS, DISH_MONOCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_DISH_MONOCULTURE_SUBSET_CELLS, DISH_MONOCULTURE_ALL, states=STATES)
scripts.subset.subset.subset_data(DATA_PATH_DISH_MONOCULTURE_ENVIRONMENT, DISH_MONOCULTURE_XML_NAME, DATA_TYPE_ENVIRONMENT, RESULTS_PATH_DISH_MONOCULTURE_SUBSET_ENVIRONMENT, DISH_MONOCULTURE_SUBSET_OTHER)
scripts.subset.subset.subset_data(DATA_PATH_DISH_MONOCULTURE_SPATIAL, DISH_MONOCULTURE_XML_NAME, DATA_TYPE_SPATIAL, RESULTS_PATH_DISH_MONOCULTURE_SUBSET_SPATIAL, DISH_MONOCULTURE_SUBSET_OTHER)
scripts.subset.subset.subset_data(DATA_PATH_DISH_MONOCULTURE_LYSED, DISH_MONOCULTURE_XML_NAME, DATA_TYPE_LYSED, RESULTS_PATH_DISH_MONOCULTURE_SUBSET_LYSED, DISH_MONOCULTURE_SUBSET_OTHER)

# Subset co-culture dish data
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_CELLS, DISH_COCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_DISH_COCULTURE_SUBSET_CELLS, DISH_COCULTURE_SUBSET_CELLS_IDEAL_PLUS_HA)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_CELLS, DISH_COCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_DISH_COCULTURE_SUBSET_CELLS, DISH_COCULTURE_SUBSET_CELLS_REALISTIC)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_CELLS, DISH_COCULTURE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_DISH_COCULTURE_SUBSET_CELLS, DISH_COCULTURE_ALL, states=STATES)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_ENVIRONMENT, DISH_COCULTURE_XML_NAME, DATA_TYPE_ENVIRONMENT, RESULTS_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT, DISH_COCULTURE_SUBSET_OTHER_IDEAL_PLUS_HA)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_ENVIRONMENT, DISH_COCULTURE_XML_NAME, DATA_TYPE_ENVIRONMENT, RESULTS_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT, DISH_COCULTURE_SUBSET_OTHER_REALISTIC)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_SPATIAL, DISH_COCULTURE_XML_NAME, DATA_TYPE_SPATIAL, RESULTS_PATH_DISH_COCULTURE_SUBSET_SPATIAL, DISH_COCULTURE_SUBSET_OTHER_IDEAL_PLUS_HA)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_SPATIAL, DISH_COCULTURE_XML_NAME, DATA_TYPE_SPATIAL, RESULTS_PATH_DISH_COCULTURE_SUBSET_SPATIAL, DISH_COCULTURE_SUBSET_OTHER_REALISTIC)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_LYSED, DISH_COCULTURE_XML_NAME, DATA_TYPE_LYSED, RESULTS_PATH_DISH_COCULTURE_SUBSET_LYSED, DISH_COCULTURE_SUBSET_OTHER_IDEAL_PLUS_HA)
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_LYSED, DISH_COCULTURE_XML_NAME, DATA_TYPE_LYSED, RESULTS_PATH_DISH_COCULTURE_SUBSET_LYSED, DISH_COCULTURE_SUBSET_OTHER_REALISTIC)

---

### 2.4 PLOT SUBSETTED MONOCULTURE AND CO-CULTURE DISH DATA

The main plotting function (`plot_data`) iterates through each subsetted file (`.pkl`) in the data path and plots relevant data for each subset instance.

The function enables choosing which feature to color the data by. Setting the color to `X` makes the function automatically color the data by whichever features are not held constant in the subset. For example, for the `monoculture dish` subset file `VITRO_DISH_TREAT_C_2D_X_50-50_1e-07_1000_NA.pkl` from the example above, plots would be colored by `DOSE` if `X` were selected in place of a specific feature.
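
One way to picture the automatic choice is to read the `X` tokens back out of a subset file name. The sketch below does that under the assumption that the feature tokens follow the XML name in the fixed order given in section 2.3; `free_features` is a hypothetical helper and not necessarily how `plot_data` decides.

```python
# Order of the feature tokens in a subset file name (see section 2.3).
FEATURE_ORDER = ["DOSE", "TREAT RATIO", "CAR AFFINITY", "ANTIGENS CANCER", "ANTIGENS HEALTHY"]


def free_features(file_name, xml_name):
    """Return the features marked 'X' (not held constant) in a subset file name."""
    if not file_name.startswith(xml_name + "_"):
        raise ValueError("file name does not start with the given XML name")
    stem = file_name[: -len(".pkl")] if file_name.endswith(".pkl") else file_name
    tokens = stem[len(xml_name) + 1:].split("_")
    # zip stops after the five feature tokens, so a trailing data type is ignored.
    return [feature for feature, token in zip(FEATURE_ORDER, tokens) if token == "X"]


mono_free = free_features("VITRO_DISH_TREAT_C_2D_X_50-50_1e-07_1000_NA.pkl", "VITRO_DISH_TREAT_C_2D")
co_free = free_features("VITRO_DISH_TREAT_CH_2D_X_50-50_1e-07_1000_X_ANALYZED.pkl", "VITRO_DISH_TREAT_CH_2D")

print(mono_free)
print(co_free)
```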

#### WORKSPACE VARIABLES


Set up workspace variables for plotting simulations.

+ `DATA_PATH` variables are the path to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULT_PATH` variables are the path for result files (`.svg` files generated by plotting)
+ `...COLOR` variables indicate which feature to color the data by (setting the color to `X` makes the function automatically color the data by whichever features are not held constant in the subset)

In [None]:
# Monoculture dish workspace variables
DATA_PATH_DISH_MONOCULTURE_SUBSET_CELLS = 'path/to/dish/monoculture/files/subset/cells/'
DATA_PATH_DISH_MONOCULTURE_SUBSET_ENVIRONMENT = 'path/to/dish/monoculture/files/subset/environment/'
DATA_PATH_DISH_MONOCULTURE_SUBSET_SPATIAL = 'path/to/dish/monoculture/files/subset/spatial/'
DATA_PATH_DISH_MONOCULTURE_SUBSET_LYSED = 'path/to/dish/monoculture/files/subset/lysed/'

DISH_MONOCULTURE_COLOR = 'X'

RESULTS_PATH_DISH_MONOCULTURE_FIGURES_CELLS = 'path/to/dish/monoculture/figures/cells/'
RESULTS_PATH_DISH_MONOCULTURE_FIGURES_ENVIRONMENT = 'path/to/dish/monoculture/figures/environment/'
RESULTS_PATH_DISH_MONOCULTURE_FIGURES_SPATIAL = 'path/to/dish/monoculture/figures/spatial/'
RESULTS_PATH_DISH_MONOCULTURE_FIGURES_LYSED = 'path/to/dish/monoculture/figures/lysed/'

# Co-culture dish workspace variables
DATA_PATH_DISH_COCULTURE_SUBSET_CELLS = 'path/to/dish/coculture/files/subset/cells/'
DATA_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT = 'path/to/dish/coculture/files/subset/environment/'
DATA_PATH_DISH_COCULTURE_SUBSET_SPATIAL = 'path/to/dish/coculture/files/subset/spatial/'
DATA_PATH_DISH_COCULTURE_SUBSET_LYSED = 'path/to/dish/coculture/files/subset/lysed/'

DISH_COCULTURE_COLOR = 'X'

RESULTS_PATH_DISH_COCULTURE_FIGURES_CELLS = 'path/to/dish/coculture/figures/cells/'
RESULTS_PATH_DISH_COCULTURE_FIGURES_ENVIRONMENT = 'path/to/dish/coculture/figures/environment/'
RESULTS_PATH_DISH_COCULTURE_FIGURES_SPATIAL = 'path/to/dish/coculture/figures/spatial/'
RESULTS_PATH_DISH_COCULTURE_FIGURES_LYSED = 'path/to/dish/coculture/figures/lysed/'

#### PLOT `MONOCULTURE DISH` AND `CO-CULTURE DISH` SIMULATIONS

In [None]:
import scripts.plot.plot_data

In [None]:
# Plot monoculture dish data
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_MONOCULTURE_SUBSET_CELLS, DISH_MONOCULTURE_COLOR, RESULTS_PATH_DISH_MONOCULTURE_FIGURES_CELLS)
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_MONOCULTURE_SUBSET_ENVIRONMENT, DISH_MONOCULTURE_COLOR, RESULTS_PATH_DISH_MONOCULTURE_FIGURES_ENVIRONMENT)
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_MONOCULTURE_SUBSET_SPATIAL, DISH_MONOCULTURE_COLOR, RESULTS_PATH_DISH_MONOCULTURE_FIGURES_SPATIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_MONOCULTURE_SUBSET_LYSED, DISH_MONOCULTURE_COLOR, RESULTS_PATH_DISH_MONOCULTURE_FIGURES_LYSED)

# Plot coculture dish data
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_CELLS, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_CELLS)
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_ENVIRONMENT, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_ENVIRONMENT)
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_SPATIAL, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_SPATIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_COCULTURE_SUBSET_LYSED, DISH_COCULTURE_COLOR, RESULTS_PATH_DISH_COCULTURE_FIGURES_LYSED)

#### EXAMPLE FIGURES

Example `MONOCULTURE DISH` cell counts figure.

Example `MONOCULTURE DISH` environment figure.

Example `MONOCULTURE DISH` spatial figure.

Example `MONOCULTURE DISH` lysis figure.

---

### 2.5 MULTI-FEATURE & OUTCOME ANALYSIS MONOCULTURE AND CO-CULTURE DISH DATA

The main outcome analysis function (`stats`) iterates through each subsetted file (`.pkl`) in the data path and plots and analyzes relevant data for each subset instance.

For these analyses, only the full-data subsets (analyzed by cell counts) were used, so the only relevant files are as follows:

    VITRO_DISH_TREAT_C_2D_X_X_X_X_NA_STATES_ANALYZED.pkl
    VITRO_DISH_TREAT_CH_2D_X_X_X_X_0_STATES_ANALYZED.pkl
    VITRO_DISH_TREAT_CH_2D_X_X_X_X_100_STATES_ANALYZED.pkl
    VITRO_DISH_TREAT_CH_2D_X_X_X_X_X_STATES_ANALYZED.pkl
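
If the subset directory also contains other files, the full-data files listed above can be picked out by pattern. The sketch below uses `fnmatch.fnmatchcase` from the standard library; the pattern and the helper name `full_state_files` are assumptions based only on the four file names listed here.

```python
from fnmatch import fnmatchcase

# All four tunable features are 'X'; the healthy-antigen token may be NA, 0, 100, or X.
FULL_PATTERN = "*_X_X_X_X_*_STATES_ANALYZED.pkl"


def full_state_files(file_names):
    """Return the sorted file names that match the full-data STATES pattern."""
    return sorted(name for name in file_names if fnmatchcase(name, FULL_PATTERN))


candidates = [
    "VITRO_DISH_TREAT_C_2D_X_X_X_X_NA_STATES_ANALYZED.pkl",
    "VITRO_DISH_TREAT_CH_2D_X_X_X_X_0_STATES_ANALYZED.pkl",
    "VITRO_DISH_TREAT_CH_2D_X_X_X_X_100_STATES_ANALYZED.pkl",
    "VITRO_DISH_TREAT_CH_2D_X_X_X_X_X_STATES_ANALYZED.pkl",
    "VITRO_DISH_TREAT_C_2D_X_50-50_1e-07_1000_NA_ANALYZED.pkl",
]
selected = full_state_files(candidates)
print(len(selected))
```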

#### WORKSPACE VARIABLES

Set up workspace variables for multi-feature and outcome analysis.

+ `DATA_PATH` variables are the path to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULT_PATH` variables are the path for result files (`.svg` or `.pdf` or `.xlsx` files generated by analyzing the data)
+ `AVERAGE` and `NOT_AVERAGE` indicate whether or not to average the data across replicates.
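
As a rough picture of what averaging across replicates means, the sketch below collapses per-seed time series into one mean series. The data layout (one list of counts per replicate) and the function `summarize` are assumptions made for illustration; the `stats` function may aggregate differently.

```python
from statistics import mean


def summarize(counts_by_seed, average):
    """Return per-timepoint means across replicates, or the raw replicates unchanged.

    counts_by_seed holds one list per replicate, each with one count per timepoint.
    """
    if not average:
        return counts_by_seed
    # zip(*...) regroups the values so each tuple holds one timepoint across seeds.
    return [mean(timepoint) for timepoint in zip(*counts_by_seed)]


replicates = [[10, 20, 40], [14, 24, 44]]  # made-up counts for two seeds
averaged = summarize(replicates, average=True)
unaveraged = summarize(replicates, average=False)

print(averaged)
print(unaveraged)
```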

In [None]:
# Monoculture dish workspace variables
DATA_PATH_DISH_MONOCULTURE_FULL = 'path/to/dish/monoculture/subset/cells/all/'

RESULTS_PATH_DISH_MONOCULTURE_STATS = 'path/to/dish/monoculture/stats/figures/'
RESULTS_PATH_DISH_MONOCULTURE_STATS_AVERAGE = 'path/to/dish/monoculture/stats/figures/average/'

# Co-culture dish workspace variables
DATA_PATH_DISH_COCULTURE_FULL = 'path/to/dish/coculture/subset/cells/all/'
DATA_PATH_DISH_COCULTURE_DEFINED_HA = 'path/to/dish/coculture/subset/cells/defined/healthy/antigens/'

RESULTS_PATH_DISH_COCULTURE_STATS = 'path/to/dish/coculture/stats/figures/'
RESULTS_PATH_DISH_COCULTURE_STATS_AVERAGE = 'path/to/dish/coculture/stats/figures/average/'

# Average workspace variables
AVERAGE = True
NOT_AVERAGE = False

#### ANALYZE `MONOCULTURE DISH` AND `CO-CULTURE DISH` SIMULATIONS

In [None]:
import scripts.stats.stats

In [None]:
scripts.stats.stats.stats(DATA_PATH_DISH_MONOCULTURE_FULL, RESULTS_PATH_DISH_MONOCULTURE_STATS, average=NOT_AVERAGE)
scripts.stats.stats.stats(DATA_PATH_DISH_MONOCULTURE_FULL, RESULTS_PATH_DISH_MONOCULTURE_STATS_AVERAGE, average=AVERAGE)
scripts.stats.stats.stats(DATA_PATH_DISH_COCULTURE_FULL, RESULTS_PATH_DISH_COCULTURE_STATS, average=NOT_AVERAGE)
scripts.stats.stats.stats(DATA_PATH_DISH_COCULTURE_FULL, RESULTS_PATH_DISH_COCULTURE_STATS_AVERAGE, average=AVERAGE)
scripts.stats.stats.stats(DATA_PATH_DISH_COCULTURE_DEFINED_HA, RESULTS_PATH_DISH_COCULTURE_STATS, average=NOT_AVERAGE)
scripts.stats.stats.stats(DATA_PATH_DISH_COCULTURE_DEFINED_HA, RESULTS_PATH_DISH_COCULTURE_STATS_AVERAGE, average=AVERAGE)

#### EXAMPLE FIGURE

Example `MONOCULTURE DISH` HEATMAP

---

## 3. EXPERIMENTAL LITERATURE DATA PLOTTING

The main function (`plot_kill_curve_exp_data`) will plot experimental data kill curves using data extracted from a variety of reference papers. Each time this function is run the same output will be produced.

#### WORKSPACE VARIABLES

+ `RESULTS_PATH_EXPERIMENTAL_LITERATURE_KILL_CURVES` variable indicates where to save plots based on extracted data from literature (`.svg` files as a result of plotting)

In [None]:
RESULTS_PATH_EXPERIMENTAL_LITERATURE_KILL_CURVES = 'path/to/figures/experimental/literature/kill/curves/'

#### PLOT EXPERIMENTAL LITERATURE DATA

In [None]:
import scripts.plot.plot_data

In [None]:
scripts.plot.plot_data.plot_kill_curve_exp_data(RESULTS_PATH_EXPERIMENTAL_LITERATURE_KILL_CURVES)

#### EXAMPLE FIGURE

---

## 4. TISSUE DATA PROCESSING & PLOTTING

Only a subset of `tissue` simulations was generated: those conditions that showed effective treatment in the `realistic co-culture dish` context.

---

### 4.1 PARSE TISSUE DATA

The main parsing function (`parse`) iterates through each file in the data path and parses each simulation instance, extracting fields from the simulation setup, cells, and environment.

The parsed arrays are organized as:

    {
        "setup": {
            "radius": R,
            "height": H,
            "time": [],
            "pops": [],
            "types": [],
            "coords": []
        },
        "agents": (N seeds) x (T timepoints) x (H height) x (C coordinates) x (P positions),
        "environments": {
            "glucose": (N seeds) x (T timepoints) x (H height) x (R radius),
            "oxygen": (N seeds) x (T timepoints) x (H height) x (R radius),
            "tgfa": (N seeds) x (T timepoints) x (H height) x (R radius),
            "IL-2": (N seeds) x (T timepoints) x (H height) x (R radius)
        }
    }

where each entry in the agents array is a structured entry of the shape:

    "pop"       int8    population code
    "type"      int8    cell type code
    "volume"    int16   cell volume (rounded)
    "cycle"     int16   average cell cycle length (rounded)

The `parse.py` file contains general parsing functions.

Parsing can take some time.

#### WORKSPACE VARIABLES

Set up workspace variables for parsing simulations.

+ `DATA_PATH` variables are the path to simulation output data files (`.tar.xz` files of compressed simulation outputs)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by parsing)

In [None]:
DATA_PATH_TISSUE_TAR = 'path/to/tissue/files/tars/'
RESULTS_PATH_TISSUE_PKL = 'path/to/tissue/files/parsed/'

#### PARSE `TISSUE` SIMULATIONS

In [None]:
import scripts.parse.parse

In [None]:
scripts.parse.parse.parse(DATA_PATH_TISSUE_TAR, RESULTS_PATH_TISSUE_PKL)

---

### 4.2 ANALYZE PARSED TISSUE DATA

Each main analyzing function (`analyze_cells`, `analyze_env`, `analyze_spatial`, and `analyze_lysis`) iterates through each parsed file (`.pkl`) in the data path and analyzes each simulation instance, extracting fields from the simulation setup, cells, and environment depending on the function.

`analyze_cells` collects cell counts for each population and state over time (files produced will end with `ANALYZED`).

`analyze_env` collects information on environmental species concentrations over time (files produced will end with `ENVIRONMENT`).

`analyze_spatial` collects cell counts for each population across simulation radii over time (files produced will end with `SPATIAL`).

`analyze_lysis` collects lysed cell information over time (files produced will end with `LYSED`).

#### WORKSPACE VARIABLES

Set up workspace variables for analyzing simulations.

+ `DATA_PATH...PARSED` variables are the path to parsed data files (`.pkl` files generated by parsing)
+ `DATA_PATH...LYSIS` variables are the path to `LYSIS` data files (`.LYSIS.json` files generated directly from the simulation outputs)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by analyzing)

In [None]:
# Tissue workspace variables
DATA_PATH_TISSUE_PARSED = 'path/to/tissue/files/parsed/'
DATA_PATH_TISSUE_LYSIS = 'path/to/tissue/files/lysis/'
RESULTS_PATH_TISSUE_CELLS = 'path/to/tissue/files/cells/'
RESULTS_PATH_TISSUE_ENVIRONMENT = 'path/to/tissue/files/environment/'
RESULTS_PATH_TISSUE_SPATIAL = 'path/to/tissue/files/spatial/'
RESULTS_PATH_TISSUE_LYSED = 'path/to/tissue/files/lysed/'

#### ANALYZE `TISSUE` SIMULATIONS

In [None]:
import scripts.analyze.analyze_cells
import scripts.analyze.analyze_env
import scripts.analyze.analyze_spatial
import scripts.analyze.analyze_lysis

In [None]:
# Analyze tissue data
scripts.analyze.analyze_cells.analyze_cells(DATA_PATH_TISSUE_PARSED, RESULTS_PATH_TISSUE_CELLS)
scripts.analyze.analyze_env.analyze_env(DATA_PATH_TISSUE_PARSED, RESULTS_PATH_TISSUE_ENVIRONMENT)
scripts.analyze.analyze_spatial.analyze_spatial(DATA_PATH_TISSUE_PARSED, RESULTS_PATH_TISSUE_SPATIAL)
scripts.analyze.analyze_lysis.analyze_lysis(DATA_PATH_TISSUE_LYSIS, RESULTS_PATH_TISSUE_LYSED)

---

### 4.3 SUBSET ANALYZED TISSUE DATA

The main subsetting function (`subset_data`) takes a desired subset of data, iterates through each analyzed file (`.pkl`) in the data path, and adds simulations matching the subset requirements to a single data file. Since not all possible combinations of the `tissue` data were collected, only the subset containing all data is required.

#### WORKSPACE VARIABLES

Set up workspace variables for subsetting simulations.

+ `DATA_PATH` variables are the path to analyzed data files (`.pkl` files generated by analyzing data)
+ `RESULT_PATH` variables are the path for result files (`.pkl` files generated by subsetting)
+ `TISSUE_XML_NAME` variable is the string (which varies based on data setup) that precedes the file name extension (which varies based on the subset selected)
+ `TISSUE_ALL` variable indicates to collect all data without subsetting
+ `DATA_TYPE` variables indicate which type of analyzed data is being fed into the subsetting function
+ `STATES` variable enables collecting only the relevant cell count and states data (thus excluding cell distribution data such as volumes and average cell cycle lengths)

In [None]:
# Tissue workspace variables
DATA_PATH_TISSUE_CELLS = 'path/to/tissue/files/cells/'
DATA_PATH_TISSUE_ENVIRONMENT = 'path/to/tissue/files/environment/'
DATA_PATH_TISSUE_SPATIAL = 'path/to/tissue/files/spatial/'
DATA_PATH_TISSUE_LYSED = 'path/to/tissue/files/lysed/'

TISSUE_XML_NAME = 'VIVO_TISSUE_TREAT_CH_2D'

RESULTS_PATH_TISSUE_SUBSET_CELLS = 'path/to/tissue/files/subset/cells/'
RESULTS_PATH_TISSUE_SUBSET_ENVIRONMENT = 'path/to/tissue/files/subset/environment/'
RESULTS_PATH_TISSUE_SUBSET_SPATIAL = 'path/to/tissue/files/subset/spatial/'
RESULTS_PATH_TISSUE_SUBSET_LYSED = 'path/to/tissue/files/subset/lysed/'

TISSUE_ALL = ''

# Types of analyses
DATA_TYPE_CELLS = 'ANALYZED'
DATA_TYPE_ENVIRONMENT = 'ENVIRONMENT'
DATA_TYPE_SPATIAL = 'SPATIAL'
DATA_TYPE_LYSED = 'LYSED'

STATES = True

#### SUBSET `TISSUE` SIMULATIONS

In [None]:
import scripts.subset.subset

In [None]:
# Subset tissue data
scripts.subset.subset.subset_data(DATA_PATH_TISSUE_CELLS, TISSUE_XML_NAME, DATA_TYPE_CELLS, RESULTS_PATH_TISSUE_SUBSET_CELLS, TISSUE_ALL, states=STATES)
scripts.subset.subset.subset_data(DATA_PATH_TISSUE_ENVIRONMENT, TISSUE_XML_NAME, DATA_TYPE_ENVIRONMENT, RESULTS_PATH_TISSUE_SUBSET_ENVIRONMENT, TISSUE_ALL)
scripts.subset.subset.subset_data(DATA_PATH_TISSUE_SPATIAL, TISSUE_XML_NAME, DATA_TYPE_SPATIAL, RESULTS_PATH_TISSUE_SUBSET_SPATIAL, TISSUE_ALL)
scripts.subset.subset.subset_data(DATA_PATH_TISSUE_LYSED, TISSUE_XML_NAME, DATA_TYPE_LYSED, RESULTS_PATH_TISSUE_SUBSET_LYSED, TISSUE_ALL)

---

### 4.4 PLOT SUBSETTED TISSUE DATA

The main plotting function (`plot_data`) iterates through each subsetted file (`.pkl`) in the data path and plots relevant data for each subset instance.

The function enables choosing which feature to color the data by. Choosing the color to be `X` will enable the function to automatically color the data based on whichever features are not held constant in the subset. Since not all possible combinations of the `tissue` data were collected, each possible feature needs to be specified (with results stored in different locations, as the feature color is not stored in the file name).
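
To make the idea of "features not held constant" concrete, the following minimal sketch finds which features take more than one value within a subset. The helper `varying_features` and the example records are hypothetical and only illustrate the concept; they do not reflect how `plot_data` is implemented.

```python
def varying_features(simulations, features):
    """Return the features that take more than one value across a subset."""
    return [
        feature for feature in features
        if len({sim[feature] for sim in simulations}) > 1
    ]


# Made-up records for illustration only.
subset = [
    {'DOSE': 500, 'TREAT RATIO': '50:50', 'CAR AFFINITY': 1e-7},
    {'DOSE': 1000, 'TREAT RATIO': '50:50', 'CAR AFFINITY': 1e-7},
    {'DOSE': 1000, 'TREAT RATIO': '50:50', 'CAR AFFINITY': 1e-8},
]

# Only DOSE and CAR AFFINITY vary here, so they are the candidates to color by.
color_by = varying_features(subset, ['DOSE', 'TREAT RATIO', 'CAR AFFINITY'])
print(color_by)
```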

#### WORKSPACE VARIABLES


Set up workspace variables for plotting simulations.

+ `DATA_PATH` variables are the paths to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULTS_PATH` variables are the paths for result files (`.svg` files generated by plotting); one is listed for each color feature and each type of data plotted
+ `TISSUE_COLOR` variables indicate which feature to color the data by
+ `TISSUE_PARTIAL` is a flag indicating that only part of the full combinatorial set of features is present; it is used to select which plots to make

In [None]:
# Tissue workspace variables
DATA_PATH_TISSUE_SUBSET_CELLS = 'path/to/tissue/files/subset/cells/'
DATA_PATH_TISSUE_SUBSET_ENVIRONMENT = 'path/to/tissue/files/subset/environment/'
DATA_PATH_TISSUE_SUBSET_SPATIAL = 'path/to/tissue/files/subset/spatial/'
DATA_PATH_TISSUE_SUBSET_LYSED = 'path/to/tissue/files/subset/lysed/'

TISSUE_COLOR_DOSE = 'DOSE'
TISSUE_COLOR_TREAT_RATIO = 'TREAT RATIO'
TISSUE_COLOR_CAR_AFFINITY = 'CAR AFFINITY'
TISSUE_COLOR_ANTIGENS_CANCER = 'ANTIGENS CANCER'

RESULTS_PATH_TISSUE_FIGURES_CELLS_DOSE = 'path/to/tissue/figures/cells/dose/'
RESULTS_PATH_TISSUE_FIGURES_CELLS_TREAT_RATIO = 'path/to/tissue/figures/cells/treat/ratio/'
RESULTS_PATH_TISSUE_FIGURES_CELLS_CAR_AFFINITY = 'path/to/tissue/figures/cells/car/affinity/'
RESULTS_PATH_TISSUE_FIGURES_CELLS_ANTIGENS_CANCER = 'path/to/tissue/figures/cells/antigens/cancer/'

RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_DOSE = 'path/to/tissue/figures/environment/dose/'
RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_TREAT_RATIO = 'path/to/tissue/figures/environment/treat/ratio/'
RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_CAR_AFFINITY = 'path/to/tissue/figures/environment/car/affinity/'
RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_ANTIGENS_CANCER = 'path/to/tissue/figures/environment/antigens/cancer/'

RESULTS_PATH_TISSUE_FIGURES_SPATIAL_DOSE = 'path/to/tissue/figures/spatial/dose/'
RESULTS_PATH_TISSUE_FIGURES_SPATIAL_TREAT_RATIO = 'path/to/tissue/figures/spatial/treat/ratio/'
RESULTS_PATH_TISSUE_FIGURES_SPATIAL_CAR_AFFINITY = 'path/to/tissue/figures/spatial/car/affinity/'
RESULTS_PATH_TISSUE_FIGURES_SPATIAL_ANTIGENS_CANCER = 'path/to/tissue/figures/spatial/antigens/cancer/'

RESULTS_PATH_TISSUE_FIGURES_LYSED_DOSE = 'path/to/tissue/figures/lysed/dose/'
RESULTS_PATH_TISSUE_FIGURES_LYSED_TREAT_RATIO = 'path/to/tissue/figures/lysed/treat/ratio/'
RESULTS_PATH_TISSUE_FIGURES_LYSED_CAR_AFFINITY = 'path/to/tissue/figures/lysed/car/affinity/'
RESULTS_PATH_TISSUE_FIGURES_LYSED_ANTIGENS_CANCER = 'path/to/tissue/figures/lysed/antigens/cancer/'

TISSUE_PARTIAL = True

#### PLOT `TISSUE` SIMULATIONS

In [None]:
import scripts.plot.plot_data

In [None]:
# Plot tissue data
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_CELLS, TISSUE_COLOR_DOSE, RESULTS_PATH_TISSUE_FIGURES_CELLS_DOSE, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_CELLS, TISSUE_COLOR_TREAT_RATIO, RESULTS_PATH_TISSUE_FIGURES_CELLS_TREAT_RATIO, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_CELLS, TISSUE_COLOR_CAR_AFFINITY, RESULTS_PATH_TISSUE_FIGURES_CELLS_CAR_AFFINITY, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_CELLS, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_CELLS_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)

scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_ENVIRONMENT, TISSUE_COLOR_DOSE, RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_DOSE, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_ENVIRONMENT, TISSUE_COLOR_TREAT_RATIO, RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_TREAT_RATIO, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_ENVIRONMENT, TISSUE_COLOR_CAR_AFFINITY, RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_CAR_AFFINITY, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_ENVIRONMENT, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_ENVIRONMENT_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)

scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_SPATIAL, TISSUE_COLOR_DOSE, RESULTS_PATH_TISSUE_FIGURES_SPATIAL_DOSE, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_SPATIAL, TISSUE_COLOR_TREAT_RATIO, RESULTS_PATH_TISSUE_FIGURES_SPATIAL_TREAT_RATIO, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_SPATIAL, TISSUE_COLOR_CAR_AFFINITY, RESULTS_PATH_TISSUE_FIGURES_SPATIAL_CAR_AFFINITY, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_SPATIAL, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_SPATIAL_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)

scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_LYSED, TISSUE_COLOR_DOSE, RESULTS_PATH_TISSUE_FIGURES_LYSED_DOSE, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_LYSED, TISSUE_COLOR_TREAT_RATIO, RESULTS_PATH_TISSUE_FIGURES_LYSED_TREAT_RATIO, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_LYSED, TISSUE_COLOR_CAR_AFFINITY, RESULTS_PATH_TISSUE_FIGURES_LYSED_CAR_AFFINITY, partial=TISSUE_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_LYSED, TISSUE_COLOR_ANTIGENS_CANCER, RESULTS_PATH_TISSUE_FIGURES_LYSED_ANTIGENS_CANCER, partial=TISSUE_PARTIAL)
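
The calls above pair each of the four data types with each of the four color features. As an optional alternative, the sketch below builds the same sixteen argument combinations with a loop. The root paths, the `COLOR_FOLDERS` mapping, and the helpers `build_plot_jobs` and `run_plot_jobs` are hypothetical names introduced for this example; adjust the placeholder roots to match your workspace variables.

```python
from itertools import product

# Placeholder roots chosen for this sketch; adjust to your own layout.
SUBSET_ROOT = 'path/to/tissue/files/subset/'
FIGURES_ROOT = 'path/to/tissue/figures/'

DATA_KINDS = ['cells', 'environment', 'spatial', 'lysed']

# Color feature -> sub-folder used for its figures.
COLOR_FOLDERS = {
    'DOSE': 'dose/',
    'TREAT RATIO': 'treat/ratio/',
    'CAR AFFINITY': 'car/affinity/',
    'ANTIGENS CANCER': 'antigens/cancer/',
}


def build_plot_jobs():
    """Return (data_path, color, results_path) for every kind/color pair."""
    return [
        (SUBSET_ROOT + kind + '/', color, FIGURES_ROOT + kind + '/' + folder)
        for kind, (color, folder) in product(DATA_KINDS, COLOR_FOLDERS.items())
    ]


def run_plot_jobs(plot_fn, partial=True):
    """Call plot_fn once per job with the same arguments as the manual calls."""
    for data_path, color, results_path in build_plot_jobs():
        plot_fn(data_path, color, results_path, partial=partial)
```

With the plotting module imported, this could be run as `run_plot_jobs(scripts.plot.plot_data.plot_data, partial=TISSUE_PARTIAL)`.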

#### EXAMPLE FIGURES

Example `TISSUE` cell counts figure.

Example `TISSUE` environment figure.

Example `TISSUE` spatial figure.

Example `TISSUE` lysis figure.

---

### 4.5 MULTI-FEATURE & OUTCOME ANALYSIS TISSUE DATA

The main outcome analysis function (`stats`) iterates through each subsetted file (`.pkl`) in the data path and plots and analyzes relevant data for each subset instance.

For these analyses, only the full-data subsets (analyzed by cell counts) were used, so the only relevant file is:

    VIVO_TISSUE_TREAT_CH_2D_X_X_X_X_X_STATES_ANALYZED.pkl

#### WORKSPACE VARIABLES

Set up workspace variables for analyzing simulations.

+ `DATA_PATH` variables are the paths to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULTS_PATH` variables are the paths for result files (`.svg`, `.pdf`, or `.xlsx` files generated by analyzing the data)
+ `TISSUE_PARTIAL` is a flag indicating that only part of the full combinatorial set of features is present; it is used to select which plots to make
+ `AVERAGE` and `NOT_AVERAGE` indicate whether or not to average the data across replicates

In [None]:
# Tissue workspace variables
DATA_PATH_TISSUE_PARTIAL = 'path/to/tissue/subset/cells/partial/'

RESULTS_PATH_TISSUE_STATS = 'path/to/tissue/stats/figures/'
RESULTS_PATH_TISSUE_STATS_AVERAGE = 'path/to/tissue/stats/figures/average/'

TISSUE_PARTIAL = True

# Average workspace variables
AVERAGE = True
NOT_AVERAGE = False

#### ANALYZE `TISSUE` SIMULATIONS

In [None]:
import scripts.stats.stats

In [None]:
scripts.stats.stats.stats(DATA_PATH_TISSUE_PARTIAL, RESULTS_PATH_TISSUE_STATS, partial=TISSUE_PARTIAL, average=NOT_AVERAGE)
scripts.stats.stats.stats(DATA_PATH_TISSUE_PARTIAL, RESULTS_PATH_TISSUE_STATS_AVERAGE, partial=TISSUE_PARTIAL, average=AVERAGE)
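
The two calls differ only in whether data are averaged across replicates. As a minimal sketch of what such averaging means, the example below takes the element-wise mean of several replicate time series. The list-of-lists layout, the helper `average_replicates`, and the counts are assumptions for illustration; the arrays used by `stats` may be shaped differently.

```python
from statistics import fmean


def average_replicates(replicates):
    """Average time series element-wise across replicates.

    `replicates` is a list of equal-length lists, one per replicate
    (hypothetical layout used only for this sketch).
    """
    return [fmean(values) for values in zip(*replicates)]


# Made-up cell counts for three replicates at four time points.
counts = [
    [100, 80, 60, 40],
    [100, 90, 70, 50],
    [100, 70, 50, 30],
]
averaged = average_replicates(counts)
print(averaged)
```

Without averaging, each replicate would be treated as its own data point; with averaging, each condition contributes a single time series.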

#### EXAMPLE FIGURE

Example `TISSUE` heatmap.

---

## 5. RANKED DATA PLOTTING

After the effective treatments from the `realistic co-culture dish` context were identified and analyzed in `tissue`, a score and rank for each simulation (averaged across replicates) were assigned in both contexts and combined into a single `.xlsx` file for analysis.

The main function `plot_dish_tissue_compare_data` takes this `.xlsx` file as an input and generates parity and ladder plots for the rank and score of these simulations in the `dish` compared to the `tissue` context.
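
The quantities compared in these plots can be illustrated with a small, self-contained sketch. The treatment names and scores below are made up, the helper `rank_by_score` is hypothetical, and ranking the highest score as rank 1 is an assumption; none of this reflects the actual scoring used in the paper.

```python
def rank_by_score(scores):
    """Map each name to its rank (1 = highest score, by assumption)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Made-up scores for three hypothetical treatments in each context.
dish_scores = {'A': 0.9, 'B': 0.7, 'C': 0.4}
tissue_scores = {'A': 0.5, 'B': 0.8, 'C': 0.2}

dish_rank = rank_by_score(dish_scores)
tissue_rank = rank_by_score(tissue_scores)

# A parity plot places each treatment at (dish value, tissue value); points on
# the diagonal have equal values in both contexts. A ladder plot connects the
# two values per treatment, so crossing lines correspond to changes in rank.
rank_shift = {name: tissue_rank[name] - dish_rank[name] for name in dish_rank}
print(rank_shift)
```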

#### WORKSPACE VARIABLES

+ `DATA_PATH` variable is the path to the Excel data file (`.xlsx` file generated for this analysis)
+ `RESULTS_PATH` variable is the path for result files (`.svg` files generated by plotting the data)
+ `COMPARE_COLOR` variable indicates which feature to color the data by (choosing the color to be `X` will enable the function to automatically color the data based on each feature)

In [None]:
DATA_PATH_COMPARE_DISH_TISSUE_EXCEL = 'path/to/dish/tissue/compare/excel/file/file.xlsx'
RESULTS_PATH_COMPARE_DISH_TISSUE_FIGURES = 'path/to/dish/tissue/compare/figures/'

COMPARE_COLOR = 'X'

#### COMPARE `REALISTIC CO-CULTURE DISH` AND `TISSUE` DATA

In [None]:
import scripts.plot.plot_data

In [None]:
scripts.plot.plot_data.plot_dish_tissue_compare_data(DATA_PATH_COMPARE_DISH_TISSUE_EXCEL, COMPARE_COLOR, RESULTS_PATH_COMPARE_DISH_TISSUE_FIGURES)

#### EXAMPLE FIGURE

Example rank comparison figure.

---

## 6. CO-CULTURE DISH AND TISSUE SHARED LOCS DATA PROCESSING & PLOTTING

To further analyze the spatial differences between the dish and tissue contexts, an additional analysis of the effective treatments shows the cell state dynamics over time for only those CAR T-cells that share a location with at least one cancer cell. This analysis pipeline parallels that of the full analyses above, but a different type of data is collected at the analysis stage.

---

### 6.1 ANALYZE PARSED CO-CULTURE DISH AND TISSUE SHARED LOCS DATA

The main analyzing function used in this analysis (`analyze_cells`) iterates through each parsed file (`.pkl`) in the data path and analyzes each simulation instance, extracting fields from the simulation setup, cells, and environment depending on the function.

`analyze_cells` collects cell counts for each population and state over time (files produced will end with `ANALYZED`).

When the `sharedLocs` flag is set to `True`, only the data for CAR T-cells that share a location with at least one cancer cell is collected. When this flag is used, files produced will end with `SHAREDLOCS`.
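
The shared-location filter can be pictured with the following minimal sketch, which keeps only the CAR T-cells found at a location that also holds at least one cancer cell. The cell record layout, the population numbers, and the helper `cart_cells_sharing_locations` are assumptions for illustration and are not taken from `analyze_cells`.

```python
def cart_cells_sharing_locations(cells, cart_pops, cancer_pops):
    """Return CAR T-cells located where at least one cancer cell is present.

    Each cell is a dict with 'location' (a hashable coordinate tuple) and
    'pop' (population number); this layout is assumed for illustration.
    """
    cancer_locations = {
        cell['location'] for cell in cells if cell['pop'] in cancer_pops
    }
    return [
        cell for cell in cells
        if cell['pop'] in cart_pops and cell['location'] in cancer_locations
    ]


# Made-up example: population 0 = cancer, population 2 = CAR T-cell.
cells = [
    {'location': (0, 0, 0), 'pop': 0},
    {'location': (0, 0, 0), 'pop': 2},
    {'location': (1, -1, 0), 'pop': 2},
]
shared = cart_cells_sharing_locations(cells, cart_pops={2}, cancer_pops={0})
print(len(shared))
```

Only the CAR T-cell at `(0, 0, 0)` is kept, because the one at `(1, -1, 0)` has no cancer cell at its location.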

#### WORKSPACE VARIABLES

Set up workspace variables for analyzing simulations.

+ `DATA_PATH...PARSED` variables are the paths to parsed data files (`.pkl` files generated by parsing); for the `co-culture dish` data, the effective treatment files may need to be manually placed in a folder separate from the rest of the data
+ `RESULTS_PATH` variables are the paths for result files (`.pkl` files generated by analyzing)
+ `SHARED_LOCATIONS` indicates that data should be collected only for CAR T-cells that share a location with at least one cancer cell

In [None]:
# Co-culture dish effective treatment workspace variables
DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS = 'path/to/dish/coculture/files/effective/treatments/parsed/'
RESULTS_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SHAREDLOCS = 'path/to/dish/coculture/files/effective/treatments/sharedlocs/'

# Tissue effective treatment workspace variables
DATA_PATH_TISSUE_PARSED = 'path/to/tissue/files/parsed/'
RESULTS_PATH_TISSUE_SHAREDLOCS = 'path/to/tissue/files/sharedlocs/'

# Shared locations workspace variables
SHARED_LOCATIONS = True

#### ANALYZE EFFECTIVE TREATMENT `REALISTIC CO-CULTURE DISH` AND `TISSUE` SIMULATIONS

In [None]:
import scripts.analyze.analyze_cells

In [None]:
scripts.analyze.analyze_cells.analyze_cells(DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS, RESULTS_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SHAREDLOCS, sharedLocs=SHARED_LOCATIONS)
scripts.analyze.analyze_cells.analyze_cells(DATA_PATH_TISSUE_PARSED, RESULTS_PATH_TISSUE_SHAREDLOCS, sharedLocs=SHARED_LOCATIONS)

---

### 6.2 SUBSET ANALYZED CO-CULTURE DISH AND TISSUE SHARED LOCS DATA

The main subsetting function (`subset_data`) takes a requested subset, iterates through each analyzed file (`.pkl`) in the data path, and adds the simulations matching the subset requirements to a single data file. Since not all possible combinations of the `tissue` data were collected and only the effective treatment `realistic co-culture dish` simulations are desired, only the subset containing all data is required.

#### WORKSPACE VARIABLES

Set up workspace variables for subsetting simulations.

+ `DATA_PATH` variables are the paths to analyzed data files (`.pkl` files generated by analyzing data)
+ `RESULTS_PATH` variables are the paths for result files (`.pkl` files generated by subsetting)
+ `..._XML_NAME` variables are the strings (which vary based on the data setup) that precede the file name extension (which varies based on the subset selected)
+ `SHAREDLOCS_ALL` variable indicates that all data should be collected without subsetting
+ `DATA_TYPE` variables indicate which type of analyzed data is being fed into the subsetting function
+ `STATES` variable enables collecting only the relevant cell count and state data (thus excluding cell distribution data such as volumes and average cell cycle lengths)

In [None]:
# Co-culture dish effective treatment workspace variables
DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SHAREDLOCS = 'path/to/dish/coculture/files/effective/treatments/sharedlocs/'

DISH_COCULTURE_XML_NAME = 'VITRO_DISH_TREAT_CH_2D'

RESULTS_PATH_COCULTURE_EFFECTIVE_TREATMENTS_SUBSET_SHAREDLOCS = 'path/to/dish/coculture/subset/effective/treatments/sharedlocs/'

# Tissue effective treatment workspace variables
DATA_PATH_TISSUE_SHAREDLOCS = 'path/to/tissue/files/sharedlocs/'

TISSUE_XML_NAME = 'VIVO_TISSUE_TREAT_CH_2D'

RESULTS_PATH_TISSUE_SUBSET_SHAREDLOCS = 'path/to/tissue/subset/sharedlocs/'

# Shared locations workspace variables
DATA_TYPE_SHAREDLOCS = 'SHAREDLOCS'
SHAREDLOCS_ALL = ''
STATES = True

#### SUBSET EFFECTIVE TREATMENT `REALISTIC CO-CULTURE DISH` AND `TISSUE` SIMULATIONS

In [None]:
import scripts.subset.subset

In [None]:
scripts.subset.subset.subset_data(DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SHAREDLOCS, DISH_COCULTURE_XML_NAME, DATA_TYPE_SHAREDLOCS, RESULTS_PATH_COCULTURE_EFFECTIVE_TREATMENTS_SUBSET_SHAREDLOCS, subsetsRequested=SHAREDLOCS_ALL, states=STATES)
scripts.subset.subset.subset_data(DATA_PATH_TISSUE_SHAREDLOCS, TISSUE_XML_NAME, DATA_TYPE_SHAREDLOCS, RESULTS_PATH_TISSUE_SUBSET_SHAREDLOCS, subsetsRequested=SHAREDLOCS_ALL, states=STATES)

---

### 6.3 PLOT CO-CULTURE DISH AND TISSUE SHARED LOCS DATA

The main plotting function (`plot_data`) iterates through each subsetted file (`.pkl`) in the data path and plots relevant data for each subset instance.

The function enables choosing which feature to color the data by. Choosing the color to be `X` will enable the function to automatically color the data based on whichever features are not held constant in the subset. Since not all possible combinations of the `tissue` data were collected and only the effective treatment `realistic co-culture dish` simulations are desired, each feature to color by needs to be specified (with results stored in different locations, as the feature color is not stored in the file name). In this analysis, only `ANTIGENS CANCER` was used.

#### WORKSPACE VARIABLES


Set up workspace variables for plotting simulations.

+ `DATA_PATH` variables are the paths to subsetted data files (`.pkl` files generated by subsetting data)
+ `RESULTS_PATH` variables are the paths for result files (`.svg` files generated by plotting); one is listed for each color feature and each type of data plotted
+ `SHAREDLOCS_COLOR` variable indicates which feature to color the data by
+ `SHAREDLOCS_PARTIAL` is a flag indicating that only part of the full combinatorial set of features is present; it is used to select which plots to make

In [None]:
# Co-culture dish effective treatment workspace variables
DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SUBSET_SHAREDLOCS = 'path/to/dish/coculture/subset/effective/treatments/sharedlocs/'
RESULTS_PATH_COCULTURE_EFFECTIVE_TREATMENTS_FIGURES_SHAREDLOCS = 'path/to/dish/coculture/figures/effective/treatments/sharedlocs/'

# Tissue effective treatment workspace variables
DATA_PATH_TISSUE_SUBSET_SHAREDLOCS = 'path/to/tissue/subset/sharedlocs/'
RESULTS_PATH_TISSUE_FIGURES_SHAREDLOCS = 'path/to/tissue/figures/sharedlocs/'

# Shared locations workspace variables
SHAREDLOCS_COLOR = 'ANTIGENS CANCER'
SHAREDLOCS_PARTIAL = True

#### PLOT EFFECTIVE TREATMENT `REALISTIC CO-CULTURE DISH` AND `TISSUE` SIMULATIONS

In [None]:
import scripts.plot.plot_data

In [None]:
scripts.plot.plot_data.plot_data(DATA_PATH_DISH_COCULTURE_EFFECTIVE_TREATMENTS_SUBSET_SHAREDLOCS, SHAREDLOCS_COLOR, RESULTS_PATH_COCULTURE_EFFECTIVE_TREATMENTS_FIGURES_SHAREDLOCS, partial=SHAREDLOCS_PARTIAL)
scripts.plot.plot_data.plot_data(DATA_PATH_TISSUE_SUBSET_SHAREDLOCS, SHAREDLOCS_COLOR, RESULTS_PATH_TISSUE_FIGURES_SHAREDLOCS, partial=SHAREDLOCS_PARTIAL)

#### EXAMPLE FIGURE

Example `TISSUE` shared locations states data.

---

## 7. CO-CULTURE, TISSUE, AND GRAPH IMAGING

Images of the simulations in each context are generated to highlight the differences in cancer and healthy cell spatial distributions over time.

The main function `image` takes in a `.json` file output from the simulation and produces an image of either the populations, cell states, volume density, or graphs. For this analysis, only the population and graph figures were used. 

The population images were generated for the following files:
    
    VITRO_DISH_TREAT_CH_0_NA_NA_1000_100_00.json
    VIVO_TISSUE_TREAT_CH_0_NA_NA_1000_100_00.json

The graph images were generated for the following file:

    VIVO_TISSUE_TREAT_CH_0_NA_NA_1000_100_00.GRAPH.json
    
These represent the untreated `realistic co-culture dish` and `tissue` simulations.

#### WORKSPACE VARIABLES

+ `DATA_PATH` variables are the paths to the simulation output files to image (`.json` files) and to the folders where the resulting `.svg` images are saved
+ `...TIMES` variables indicate the time points at which to make images
+ `SIZE` variable indicates the size of the image
+ `POPS_TO_IGNORE` indicates which cell population numbers to ignore (the CAR T-cell populations are listed, although they are not present in these untreated simulations)
+ `BGCOL` indicates the background color of the image
+ `RADIUS` indicates the simulation radius out to which to draw (cells stop at radius 36, but the graph exists within the margins and extends out to the full radius of 40)
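
Note that these workspace variables are given as strings, with multiple values separated by commas. The sketch below shows one way such strings could be turned into lists of integers; the helper `parse_int_list` is hypothetical, and whether `image` parses its arguments this way is an assumption.

```python
def parse_int_list(text):
    """Convert a comma-separated string such as '0,4,7' into a list of ints."""
    return [int(item) for item in text.split(',') if item.strip()]


# Example values matching the format used for the workspace variables.
DISH_TIMES = '0,4,7'
POPS_TO_IGNORE = '2,3'

times = parse_int_list(DISH_TIMES)
ignored = parse_int_list(POPS_TO_IGNORE)
print(times, ignored)
```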

In [None]:
# Untreated co-culture dish images
DATA_PATH_IMAGE_UNTREATED_COCULTURE_CELLS = 'path/to/untreated/coculture/cells/json/to/image/'
DATA_PATH_IMAGE_UNTREATED_COCULTURE_SVG = 'path/to/untreated/coculture/svg/to/save/'
DISH_TIMES = '0,4,7'

# Untreated tissue images
DATA_PATH_IMAGE_UNTREATED_TISSUE_CELLS = 'path/to/untreated/tissue/cells/json/to/image/'
DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG = 'path/to/untreated/tissue/svg/to/save/'
TISSUE_TIMES = '1,16,31'

# Population image specifications
SIZE = '5'
POPS_TO_IGNORE = '2,3'
BGCOL = '#FFFFFF'
RADIUS = '40'

# Tissue graph image specifications (graph images are saved to DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG, defined above)
DATA_PATH_IMAGE_UNTREATED_TISSUE_GRAPH = 'path/to/untreated/tissue/graph/json/to/image/'

#### IMAGE UNTREATED `REALISTIC CO-CULTURE DISH` AND `TISSUE` DATA

In [None]:
import scripts.image.image

In [None]:
scripts.image.image.image(DATA_PATH_IMAGE_UNTREATED_COCULTURE_CELLS, DATA_PATH_IMAGE_UNTREATED_COCULTURE_SVG, size=SIZE, time=DISH_TIMES, ignore=POPS_TO_IGNORE, radius=RADIUS, pops=True)
scripts.image.image.image(DATA_PATH_IMAGE_UNTREATED_TISSUE_CELLS, DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG, size=SIZE, time=TISSUE_TIMES, ignore=POPS_TO_IGNORE, radius=RADIUS, pops=True)
scripts.image.image.image(DATA_PATH_IMAGE_UNTREATED_TISSUE_GRAPH, DATA_PATH_IMAGE_UNTREATED_TISSUE_SVG, size=SIZE, time=TISSUE_TIMES, radius=RADIUS, graph=True)

#### EXAMPLE FIGURE

Example untreated `REALISTIC CO-CULTURE DISH` image.