# Decoding Reed-Solomon Codes Example

This notebook contains an example of how to decode Reed-Solomon encoded seqFISH data, as described in our [preprint](https://doi.org/10.1101/2025.06.10.658913).

## Instructions
To run this notebook, you will need to change to the Julia runtime environment. To do that, select from the drop-down menu:

Runtime > Change runtime type

In the "Change runtime type" prompt window, select "Julia" (not julia x.xx.xx) from the Runtime type drop-down menu. Click Save.

*Note: this example was generated using data and a Jupyter notebook that are freely available at the [SeqFISHSyndromeDecoding GitHub repository](https://github.com/CaiGroup/SeqFISHSyndromeDecoding) and on [Google Colab](https://colab.research.google.com/github/CaiGroup/SeqFISHSyndromeDecoding.jl/blob/master/example_notebook/colab/example_decode_RS_colab.jl.ipynb).*


In [None]:
using Pkg
# CSV is loaded below with `using CSV`, so make sure it is installed too.
Pkg.add(name="CSV")
Pkg.add(name="DataFrames")
Pkg.add(name="GLPK")
Pkg.add(url="https://github.com/CaiGroup/SeqFISHSyndromeDecoding.jl")
using DataFrames
using CSV
using SeqFISHSyndromeDecoding
using GLPK
using DelimitedFiles
using Downloads

This notebook demonstrates how to use SeqFISHSyndromeDecoding.jl. The example uses the smallest cell, the one with the fewest dots in our Reed-Solomon encoded experiment, chosen for computational convenience. We also reduce computation time by using the highest lateral position variance penalty reported in our manuscript with half the search radius used in the manuscript computations. The larger positional variance penalty would prohibit most additional candidate barcodes found with a larger search radius.

First, load the codebook that we will use to decode our sample data.

In [None]:
cb = DataFrame(CSV.File(Downloads.download("https://raw.githubusercontent.com/CaiGroup/SeqFISHSyndromeDecoding.jl/refs/heads/master/example_data/full_RS_q11_k7_half_pool_cb.csv")))
println(first(cb, 5))

Define the [parity check matrix](https://en.wikipedia.org/wiki/Parity-check_matrix) for the codebook.

In [None]:
H = readdlm(Downloads.download("https://raw.githubusercontent.com/CaiGroup/SeqFISHSyndromeDecoding.jl/refs/heads/master/example_data/RS_q11_k7_H.csv"), ',', UInt8)

We can verify that H is actually the parity check matrix of the codebook.

In [None]:
all(H * Matrix(cb[:,2:end])' .% 11 .== 0)
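
The check above relies on the defining property of a parity check matrix: multiplying it by any codeword gives the zero vector modulo 11. As a minimal sketch of that property, the following cell uses a small toy matrix ```H_toy``` (an assumption chosen for illustration, not the matrix loaded above) to enumerate the words with zero syndrome and to show that altering one symbol of a codeword produces a nonzero syndrome.

```julia
# Toy example over the integers modulo 11. H_toy is a hypothetical 2x4
# parity check matrix chosen for illustration; it is NOT the notebook's H.
q = 11
H_toy = [1 1 1 1;
         1 2 3 4]

# The syndrome of a word w is H * w reduced modulo q.
syndrome(H, w, q) = (H * w) .% q

# Enumerate all 11^4 words of length 4 and keep those with an all-zero syndrome.
all_words = vec([collect(w) for w in Iterators.product(0:q-1, 0:q-1, 0:q-1, 0:q-1)])
codewords = filter(w -> all(syndrome(H_toy, w, q) .== 0), all_words)
println("Number of codewords: ", length(codewords))

# Changing a single symbol of a codeword gives a nonzero syndrome.
corrupted = copy(codewords[2])
corrupted[1] = (corrupted[1] + 1) % q
println("Syndrome after one error: ", syndrome(H_toy, corrupted, q))
```

Because the two rows of ```H_toy``` are linearly independent modulo 11, the toy code contains 11^2 = 121 codewords out of the 11^4 possible words.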

Next we can load the aligned points from each hybridization for our example cell.

In [None]:
pnts = DataFrame(CSV.File(Downloads.download("https://raw.githubusercontent.com/CaiGroup/SeqFISHSyndromeDecoding.jl/refs/heads/master/example_data/example_RS_cell_points.csv")))
filter!(pnt -> !ismissing(pnt.pseudocolor), pnts)
pnts.block = UInt8.(pnts.block)
select!(pnts, Not([:ch,:hyb]))
SeqFISHSyndromeDecoding.sort_readouts!(pnts)
println(first(pnts, 5))

Next, we initialize a ```DecodeParams``` object and set the parameters.

In [None]:
params = DecodeParams()

set_zeros_probed(params, false)
set_lat_var_cost_coeff(params, 7.0)
set_z_var_cost_coeff(params, 0.0)
set_lw_var_cost_coeff(params, 0.0)
set_s_var_cost_coeff(params, 0.0)
set_free_dot_cost(params, 1.0)
set_n_allowed_drops(params, 0)

set_xy_search_radius(params, 2)
set_z_search_radius(params, 0.0);

We can then decode.

In [None]:
barcodes = decode_syndromes!(pnts, cb, H, params);
println(first(barcodes, 5))

Alternatively, if we aren't sure what parameters we want to use, we can save time by splitting ```decode_syndromes!``` into its two steps. First, we identify barcode candidates with the ```get_codepaths``` function (named for the paths that candidate barcodes take through the decoding graph in figure 1a), using the least strict parameter set that we are interested in.

In [None]:
candidates = get_codepaths(pnts, cb, H, params);
println(first(candidates, 5))


We can then use the ```choose_optimal_codepaths``` function to find the same barcodes that we found earlier.

In [None]:
barcodes_again = choose_optimal_codepaths(pnts, cb, H, params, candidates, GLPK.Optimizer)
barcodes == barcodes_again

We can now also try choosing candidates using stricter parameters. This saves computation time by reducing the number of times that we have to run ```get_codepaths```.

In [None]:
strict_params = DecodeParams()

set_zeros_probed(strict_params, false)
set_lat_var_cost_coeff(strict_params, 10.0)
set_z_var_cost_coeff(strict_params, 0.0)
set_lw_var_cost_coeff(strict_params, 0.0)
set_s_var_cost_coeff(strict_params, 0.0)
set_free_dot_cost(strict_params, 1.0)
set_n_allowed_drops(strict_params, 0)

set_xy_search_radius(strict_params, 2)
set_z_search_radius(strict_params, 0.0);

stricter_barcodes = choose_optimal_codepaths(pnts, cb, H, strict_params, candidates, GLPK.Optimizer)
println(first(stricter_barcodes, 5))
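
The two parameter cells above differ only in the lateral variance cost coefficient, so the setup can be factored into a helper. The following is a minimal sketch: ```make_params``` and ```sweep_lat_var_cost``` are hypothetical names introduced here, not part of SeqFISHSyndromeDecoding.jl, and the only package calls used are the ones that already appear in this notebook. It assumes the earlier cells have been run, so that the package is loaded and ```pnts```, ```cb```, ```H``` and ```candidates``` exist.

```julia
# Hypothetical helper (not part of the package): build a DecodeParams object
# with the settings used in this notebook, varying only the lateral variance
# cost coefficient.
function make_params(lat_var_cost_coeff)
    p = DecodeParams()
    set_zeros_probed(p, false)
    set_lat_var_cost_coeff(p, lat_var_cost_coeff)
    set_z_var_cost_coeff(p, 0.0)
    set_lw_var_cost_coeff(p, 0.0)
    set_s_var_cost_coeff(p, 0.0)
    set_free_dot_cost(p, 1.0)
    set_n_allowed_drops(p, 0)
    set_xy_search_radius(p, 2)
    set_z_search_radius(p, 0.0)
    return p
end

# Hypothetical helper: reuse one set of candidates for several coefficients.
# Returns a Dict mapping each coefficient to its decoded barcodes.
function sweep_lat_var_cost(pnts, cb, H, candidates, coeffs, optimizer)
    results = Dict{Float64,Any}()
    for coeff in coeffs
        params = make_params(coeff)
        results[coeff] = choose_optimal_codepaths(pnts, cb, H, params, candidates, optimizer)
    end
    return results
end

# Example call, after the cells above have been run:
# results = sweep_lat_var_cost(pnts, cb, H, candidates, [7.0, 10.0], GLPK.Optimizer)
```

As in the cells above, the coefficients passed to the sweep should be at least as strict as the parameters used when ```get_codepaths``` generated ```candidates```.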

We can compare the decoding results using the two different sets of parameters. For brevity, we use gene-encoding barcodes found in decoding runs that include searches for negative control barcodes. This differs from the procedure described in our manuscript, in which datasets are also decoded with the negative control codewords omitted from the codebook.

In [None]:
println("Number of gene encoding barcodes: ", sum(barcodes.gene .!= "negative_control"))
estimated_false_discovery_rate = sum(barcodes.gene .== "negative_control")*sum(cb.gene .!= "negative_control")/sum(cb.gene .== "negative_control")/sum(barcodes.gene .!= "negative_control")
println("Estimated False Discovery rate: ", estimated_false_discovery_rate)

In [None]:
println("Number of gene encoding barcodes: ", sum(stricter_barcodes.gene .!= "negative_control"))
estimated_false_discovery_rate = sum(stricter_barcodes.gene .== "negative_control")*sum(cb.gene .!= "negative_control")/sum(cb.gene .== "negative_control")/sum(stricter_barcodes.gene .!= "negative_control")
println("Estimated False Discovery rate: ", estimated_false_discovery_rate)

The less strict parameter set decodes about 40% more gene-encoding barcodes at the cost of twice the estimated false discovery rate. Since the estimated false discovery rate is still small, this is probably an acceptable trade-off.
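
The false discovery rate estimate computed in the cells above can be written as a small function. This is a sketch of the same arithmetic on made-up gene labels: ```estimate_fdr``` is a hypothetical name, and the toy vectors are not real results. The number of decoded negative control barcodes is scaled by the ratio of gene codewords to negative control codewords in the codebook, then divided by the number of decoded gene-encoding barcodes.

```julia
# Hypothetical helper reproducing the arithmetic used above:
#   (# decoded negative controls) * (# gene codewords) /
#   (# negative control codewords) / (# decoded gene-encoding barcodes)
function estimate_fdr(decoded_genes, codebook_genes; control="negative_control")
    n_decoded_control = sum(decoded_genes .== control)
    n_decoded_gene = sum(decoded_genes .!= control)
    n_cb_control = sum(codebook_genes .== control)
    n_cb_gene = sum(codebook_genes .!= control)
    return n_decoded_control * n_cb_gene / n_cb_control / n_decoded_gene
end

# Made-up labels for illustration only: 8 gene codewords and 2 negative
# controls in the codebook; 20 gene-encoding barcodes and 1 negative
# control among the decoded barcodes.
toy_codebook_genes = vcat(["gene$i" for i in 1:8], fill("negative_control", 2))
toy_decoded_genes = vcat(repeat(["gene1", "gene2"], 10), ["negative_control"])

println("Estimated false discovery rate: ",
        estimate_fdr(toy_decoded_genes, toy_codebook_genes))
```

With the notebook's data, the same function could be called as ```estimate_fdr(barcodes.gene, cb.gene)``` or ```estimate_fdr(stricter_barcodes.gene, cb.gene)```.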

To save your results, use the ```CSV.write``` command.

In [None]:
CSV.write("example_RS_results.csv", barcodes)