# Decoding Example

*Note: this example was generated using data and a Jupyter notebook that are freely available at the [SeqFISHSyndromeDecoding GitHub repository](https://github.com/CaiGroup/SeqFISHSyndromeDecoding).*

In [None]:
using DataFrames
using CSV
using SeqFISHSyndromeDecoding
using GLPK

This notebook demonstrates how to use SeqFISHSyndromeDecoding. The example data was taken from the 561 channel of cell number 8 in position 4 of replicate 2 of the 2019 SeqFISH+ NIH3T3 cell experiment. This particular subset of the data was chosen for its small size.

First load the codebook that we will use to decode our sample data.

In [None]:
cb = DataFrame(CSV.File("../example_data/codebook_ch_561.csv"))
first(cb, 5)

Define the [parity check matrix](https://en.wikipedia.org/wiki/Parity-check_matrix) for the codebook.

In [None]:
H = [1 1 -1 -1;]

We can verify that H is actually the parity check matrix of the codebook.

In [None]:
all(H * Matrix(cb[:,2:end])' .% 20 .== 0)
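
To make the check above concrete, here is a minimal sketch on a hand-made toy codebook. The names `H_toy`, `toy_codewords`, `syndromes`, and `satisfies_parity` are hypothetical and not part of the package; the modulus 20 is taken from the cell above. Each row is one codeword, and it passes when its first two entries minus its last two entries sum to a multiple of 20.

```julia
# Same form as H above; the codewords below are made up for illustration.
H_toy = [1 1 -1 -1;]

# Hypothetical codewords, one per row (values in 0:19):
#   row 1:  3 + 5 - 2 - 6 =  0  -> passes
#   row 2: 19 + 4 - 1 - 2 = 20  -> passes (a multiple of 20)
#   row 3:  1 + 1 - 1 - 2 = -1  -> fails
toy_codewords = [3 5 2 6;
                 19 4 1 2;
                 1 1 1 2]

# Syndrome of each codeword: H * c reduced modulo q.
# `mod` always returns a value in 0:q-1, whereas `%` keeps the sign of its left argument.
syndromes(H, C, q) = mod.(H * C', q)

# A codeword is valid when every entry of its syndrome is zero.
satisfies_parity(H, C, q) = vec(all(syndromes(H, C, q) .== 0, dims=1))

valid = satisfies_parity(H_toy, toy_codewords, 20)
```

Either `mod` or `%` works for a zero test like the one in the notebook, since both return zero exactly when the product is a multiple of 20.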

Next we can load the aligned points from each hybridization for our example cell.

In [None]:
pnts = DataFrame(CSV.File("../example_data/example_cell_points.csv"))
first(pnts, 5)

The SeqFISHSyndromeDecoding package requires that the hybridization column be of type UInt8 (to increase efficiency), and that there be a z column (for generality to 3D data).

In [None]:
pnts.z = zeros(Float64, nrow(pnts))
pnts.hyb = UInt8.(pnts.hyb);
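
The conversion above can fail if a hybridization index does not fit in a `UInt8`. The sketch below shows the same two steps on plain vectors with an explicit range check first. The variable names and values are hypothetical, and it assumes the indices were parsed as ordinary integers.

```julia
# Hypothetical hybridization indices, as they might look after being read from a CSV.
hyb_raw = [1, 2, 17, 80]

# UInt8 holds 0 through 255. Julia throws an InexactError for values outside that
# range, so checking first gives a clearer error message.
@assert all(h -> 0 <= h <= typemax(UInt8), hyb_raw) "hyb values must fit in a UInt8"

hyb_u8 = UInt8.(hyb_raw)                  # element-wise conversion, as in the cell above
z_col = zeros(Float64, length(hyb_raw))   # flat z coordinate for 2D data
```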

Next, we initialize a ```DecodeParams``` object and set its parameters.

In [None]:
params = DecodeParams()

set_lat_var_cost_coeff(params, 8.0)
set_z_var_cost_coeff(params, 0.0)
set_lw_var_cost_coeff(params, 3.2)
set_s_var_cost_coeff(params, 0.0)
set_free_dot_cost(params, 1.0)

set_xy_search_radius(params, sqrt(size(H)[2]/6.0)*3)
set_z_search_radius(params, 0.0);
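
As a quick sanity check on the search radius expression used above, the sketch below evaluates it for the 1×4 parity check matrix of this example. The names `H_demo`, `n_cols`, and `xy_radius` are illustrative only, and the result is in whatever units the point coordinates use.

```julia
H_demo = [1 1 -1 -1;]

# Number of columns of the parity check matrix (4 here).
n_cols = size(H_demo, 2)

# Same expression as passed to set_xy_search_radius above.
xy_radius = sqrt(n_cols / 6.0) * 3   # about 2.45 for a 4-column H

println("xy search radius: ", round(xy_radius, digits=3))
```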

We can then decode

In [None]:
barcodes = decode_syndromes!(pnts, cb, H, params)
first(barcodes, 5)

Alternatively, if we aren't sure what parameters we want to use, we can save time by splitting ```decode_syndromes!``` into its two steps. First, we can identify barcode candidates with the ```get_codepaths``` function (named for the paths that candidate barcodes take through the decoding graph in figure 1a), using the least strict parameter set that we are interested in.

In [None]:
candidates = get_codepaths(pnts, cb, H, params)
first(candidates, 5)

We can then use the ```choose_optimal_codepaths``` function to find the same barcodes that we found earlier.

In [None]:
barcodes_again = choose_optimal_codepaths(pnts, cb, H, params, candidates, GLPK.Optimizer)
barcodes == barcodes_again

We can now also try choosing candidates using stricter parameters. This saves computation time by reducing the number of times that we have to run ```get_codepaths```.

In [None]:
strict_params = DecodeParams()
set_lat_var_cost_coeff(strict_params, 12.0)
set_z_var_cost_coeff(strict_params, 0.0)
set_lw_var_cost_coeff(strict_params, 4.8)
set_s_var_cost_coeff(strict_params, 0.0)
set_free_dot_cost(strict_params, 1.0)

set_xy_search_radius(strict_params, sqrt(size(H)[2]/6.0)*3)
set_z_search_radius(strict_params, 0.0);


stricter_barcodes = choose_optimal_codepaths(pnts, cb, H, strict_params, candidates, GLPK.Optimizer)
first(stricter_barcodes, 5)

We can compare the decoding results using the two different sets of parameters.

In [None]:
println("Number of gene encoding barcodes: ", sum(barcodes.gene_name .!= "negative_control"))
estimated_false_positive_rate = sum(barcodes.gene_name .== "negative_control")*sum(cb.gene_name .!= "negative_control")/sum(cb.gene_name .== "negative_control")/sum(barcodes.gene_name .!= "negative_control")
println("Estimated False Positive rate: ", estimated_false_positive_rate)

In [None]:
println("Number of gene encoding barcodes: ", sum(stricter_barcodes.gene_name .!= "negative_control"))
estimated_false_positive_rate = sum(stricter_barcodes.gene_name .== "negative_control")*sum(cb.gene_name .!= "negative_control")/sum(cb.gene_name .== "negative_control")/sum(stricter_barcodes.gene_name .!= "negative_control")
println("Estimated False Positive rate: ", estimated_false_positive_rate)

The less strict parameter set decodes about 40% more gene-encoding barcodes, at the cost of twice the estimated false positive rate. Since the estimated false positive rate is still small, this is probably an acceptable trade-off.
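
The estimate computed in the two comparison cells above can be wrapped in a small helper so that the expression is written only once. This is a sketch, not part of the package: the function name `estimated_fpr`, its `control` keyword, and the toy labels are hypothetical, and the formula simply mirrors the expression used above.

```julia
# Estimated false positive rate, following the expression in the cells above:
#   (decoded negative controls) * (gene barcodes in codebook)
#     / (negative-control barcodes in codebook) / (decoded gene barcodes)
function estimated_fpr(decoded_names, codebook_names; control="negative_control")
    n_decoded_ctrl = count(==(control), decoded_names)
    n_decoded_gene = count(!=(control), decoded_names)
    n_cb_ctrl = count(==(control), codebook_names)
    n_cb_gene = count(!=(control), codebook_names)
    return n_decoded_ctrl * n_cb_gene / n_cb_ctrl / n_decoded_gene
end

# Made-up labels for illustration only (not results from the example data).
toy_codebook = vcat(["gene$i" for i in 1:90], fill("negative_control", 10))
toy_decoded = vcat(["gene$(mod1(i, 90))" for i in 1:99], ["negative_control"])

fpr = estimated_fpr(toy_decoded, toy_codebook)
```

With the data frames from this notebook, the helper would be called as `estimated_fpr(barcodes.gene_name, cb.gene_name)` and `estimated_fpr(stricter_barcodes.gene_name, cb.gene_name)`.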

To save your results, use the ```CSV.write``` command.

In [None]:
CSV.write("example_results.csv", barcodes)