# Extension-field Reed-Solomon codebook generation notebook (Colab version)

This notebook runs on [Google Colab](https://colab.research.google.com/github/CaiGroup/UntanglingBarcodes/blob/master/codebook_generation/get_RS_codebooks/colab/gen_extended_codes_ext_fields_q9_n9-10_nmk4.ipynb) and generates Reed-Solomon codebooks for seqFISH experiments, with symbols drawn from finite extension fields.

To run this notebook, you need to switch to a Julia runtime. To do so, select the following from the drop-down menu:

Runtime > Change runtime type

In the "Change runtime type" prompt window, select "Julia" from the Runtime type drop-down menu, then click Save.

In [None]:
using Pkg
# Pin Nemo to 0.36.2: this notebook does not work with newer versions of Nemo
Pkg.add(name="Nemo", version="0.36.2");
Pkg.add("Combinatorics")
Pkg.add("DataFrames")
Pkg.add("CSV")

In [None]:
using Nemo
using LinearAlgebra
using Test
using Combinatorics
using DataFrames
using CSV


Welcome to Nemo version 0.36.2

Nemo comes with absolutely no warranty whatsoever



# Introduction

This notebook shows how to generate codebooks for seqFISH experiments using [Reed-Solomon Codes](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction). Reed-Solomon codes belong to a special class of error-correcting codes called [Maximum Distance Separable codes](https://en.wikipedia.org/wiki/Singleton_bound#MDS_codes) (MDS codes), which meet the [Singleton bound](https://en.wikipedia.org/wiki/Singleton_bound) with equality: a code of length $n$ with $k$ message symbols has minimum distance $d = n - k + 1$. In other words, every redundant parity-check symbol adds the largest possible amount (one) to the minimum distance between codewords, so MDS codes gain the most possible robustness to errors for the extra cost of encoding information with more symbols.

For an MDS code of length $n$ and minimum distance $d$ over an alphabet of $q$ symbols, the number of codewords of weight $w$ (for $d \le w \le n$) is given by

$A_w = (q-1)\binom{n}{w}\sum_{i=0}^{w-d}(-1)^i \binom{w-1}{i}q^{w-d-i}$

(MacWilliams and Sloane)



In [3]:
function get_num_codewords_of_weight_vn(q,n,w,d)
    i = collect(0:(w-d))
    (q-1)*binomial(n,w)*sum(((-1) .^ i) .* (binomial.(w-1, i)) .* (q .^ ((w - d) .- i)))
end

get_num_codewords_of_weight_vn (generic function with 1 method)
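As a quick sanity check of the formula, the counts $A_w$ for $w = d, \dots, n$ must add up to $q^k - 1$, since every nonzero codeword has exactly one weight in that range. The sketch below re-implements the formula in plain Julia (the name `mds_weight_count` is ours, not part of the notebook) and evaluates it for the parameters of the singly extended code built later ($q = 9$, $n = 9$, $k = 5$, $d = 5$). The parameters are kept inside a `let` block so the notebook's global `q`, `n` and `k` are not overwritten.

```julia
# Weight distribution of an [n, k, d = n - k + 1] MDS code over a q-symbol alphabet;
# same formula as get_num_codewords_of_weight_vn, written with a generator expression.
function mds_weight_count(q::Int, n::Int, w::Int, d::Int)
    w < d && return 0
    s = sum((-1)^i * binomial(w - 1, i) * q^(w - d - i) for i in 0:(w - d))
    return (q - 1) * binomial(n, w) * s
end

# Parameters of the singly extended code (local, so the notebook's q, n, k are untouched)
ext1_weight_counts = let q = 9, n = 9, k = 5
    d = n - k + 1
    [mds_weight_count(q, n, w, d) for w in d:n]
end

println(ext1_weight_counts)                  # [1008, 2688, 12096, 22752, 20504]
@assert sum(ext1_weight_counts) == 9^5 - 1   # every nonzero codeword is counted once
```

The first two entries, 1008 and 2688, are the weight-5 and weight-6 counts that the codebook search reports for the singly extended code.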

This example shows how to generate a seqFISH codebook from a Reed-Solomon code over an extension field. Here the code uses an alphabet of 9 symbols, provided by GF(3²), the degree-2 extension of the finite field with 3 elements.
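To make the extension field concrete, the sketch below implements GF(9) by hand. Elements are pairs $(a_0, a_1)$ standing for $a_0 + a_1\alpha$ with $a_i \in \{0, 1, 2\}$, and products are reduced with $\alpha^2 = \alpha + 1$, i.e. the polynomial $x^2 + 2x + 2$ over GF(3). We assume this is the defining polynomial Nemo picks for `FiniteField(3, 2, "α")`; it is consistent with the power table `powdict` printed further down. The helper names `gf9_add`, `gf9_mul` and `gf9_powers` are hypothetical.

```julia
# Hypothetical hand-rolled GF(9) = GF(3)[α]/(α^2 + 2α + 2), so that α^2 = α + 1.
# An element a0 + a1*α is stored as the tuple (a0, a1), with a0, a1 in 0:2.
gf9_add(a, b) = (mod(a[1] + b[1], 3), mod(a[2] + b[2], 3))

function gf9_mul(a, b)
    # (a0 + a1 α)(b0 + b1 α) = a0 b0 + (a0 b1 + a1 b0) α + a1 b1 α^2, then substitute α^2 = α + 1
    c0 = a[1] * b[1]
    c1 = a[1] * b[2] + a[2] * b[1]
    c2 = a[2] * b[2]
    return (mod(c0 + c2, 3), mod(c1 + c2, 3))
end

# Successive powers α^1, ..., α^8 of the generator α = (0, 1)
gf9_powers = let alpha = (0, 1), acc = (1, 0), out = Tuple{Int,Int}[]
    for _ in 1:8
        acc = gf9_mul(acc, alpha)
        push!(out, acc)
    end
    out
end

println(gf9_powers)   # [(0, 1), (1, 1), (1, 2), (2, 0), (0, 2), (2, 2), (2, 1), (1, 0)]
# α is primitive: its 8 powers are exactly the 8 nonzero elements, and α^8 = 1
@assert length(unique(gf9_powers)) == 8 && gf9_powers[end] == (1, 0)
# α^4 = 2 = -1, so α^4 + 1 = 0 (characteristic 3)
@assert gf9_add(gf9_powers[4], (1, 0)) == (0, 0)
```

Reading each pair as $a_0 + a_1\alpha$, the list matches `powdict` (for example $\alpha^2 = \alpha + 1$ and $\alpha^7 = \alpha + 2$).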

# Specify code parameters
Specify the parameters of the code that you want to generate.

In [4]:
p = 3 # the number of elements in the base field (must be prime)
deg = 2 # the degree of the extension field; here 3^2 gives 9 elements
nmk = 4 # the number of parity-check symbols (n - k)
wmax = 6 # the maximum weight of codewords to save
wmin = 5 # the minimum weight of codewords to save

5
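The remaining code parameters follow from these three numbers. The helper below (a hypothetical name, not used by the notebook) spells out the relationships: with $q = p^{deg}$, the base Reed-Solomon code has length $n = q - 1$, dimension $k = n - nmk$ and minimum distance $d = nmk + 1$. Each extension adds one to both $n$ and $k$, so the number of parity symbols, and hence $d$, stays the same.

```julia
# Derived parameters of the base, singly extended and doubly extended RS codes (illustrative helper).
function rs_code_params(p::Int, deg::Int, nmk::Int)
    q = p^deg
    n = q - 1          # length of the base (primitive) Reed-Solomon code
    k = n - nmk        # number of message symbols
    d = nmk + 1        # MDS: d = n - k + 1
    return (q = q,
            base = (n = n, k = k, d = d),
            ext1 = (n = n + 1, k = k + 1, d = d),
            ext2 = (n = n + 2, k = k + 2, d = d))
end

params_9 = rs_code_params(3, 2, 4)
println(params_9)
```

For $p = 3$, $deg = 2$, $nmk = 4$ the extended codes have $(n, k, d) = (9, 5, 5)$ and $(10, 6, 5)$, which matches the `n9_k5` and `n10_k6` file names written at the end of the notebook.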

In [5]:
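# Define the base Reed-Solomon code of length n = q - 1 over GF(p^deg) with nmk parity symbols.
# Side effect: sets the globals q, q_uint8, n and k used throughout the notebook.
function def_RS_code(p :: Int64, deg :: Int64, nmk :: Int64)
    @assert is_prime(p)
    @assert deg > 0
    @assert nmk > 0
    global q = p^deg
    global q_uint8 = UInt8(q)
    global n = q-1
    global k = (q-1) - nmk
    F, α = FiniteField(p, deg, "α")      # GF(p^deg) and its generator α
    R, x = PolynomialRing(F, "x")
    RR = ResidueRing(R, x^(q-1)-1)       # cyclic codes of length q-1 live in F[x]/(x^(q-1) - 1)
    # generator polynomial g(x) = (x - α)(x - α^2)...(x - α^nmk)
    gp = 1
    for i = 1:nmk
        gp = gp*RR(x - α^i)
    end
    return F, RR, R, gp, x, α
end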


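# Map a field element to an integer label.
# Note: the labels are only meaningful for prime fields (q == p). For elements outside the
# prime subfield, such as α, no integer i satisfies i + x == 0 and the function returns `nothing`.
# It is not used later in this notebook.
function cvt_fq_nmod_2_int(x :: Nemo.fq_nmod)
    if iszero(x)
        return 0
    end
    for i = 1:(q-1)
        if iszero(i+x)
            return q-i
        end
    end
end

# Coefficients c_0, ..., c_{q-2} of a codeword polynomial, converted with cvt_fq_nmod_2_int
function get_cw_array(cw)
    coeffs = Array{Union{Int8, Nothing}}(nothing, q-1)
    for i = 0:(q-2)
        coeffs[i+1] = cvt_fq_nmod_2_int(coeff(cw.data,i))
    end
    
    return coeffs
end

# Number of weight-w codewords in the base (length q-1) code with minimum distance d
function get_num_codewords_of_weight(q,w,d)
    i = collect(0:(w-d))
    (q-1)*binomial(q-1,w)*sum(((-1) .^ i) .* (binomial.(w-1, i)) .* (q .^ ((w - d) .- i)))
end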

get_num_codewords_of_weight (generic function with 1 method)

In [6]:
F, RR, R, gp, x, α = def_RS_code(p,deg,nmk)

(Finite field of degree 2 over GF(3), Residue ring of univariate polynomial ring modulo x^8 + 2, Univariate polynomial ring in x over GF(3^2), x^4 + (2*α + 2)*x^3 + x^2 + (2*α + 1)*x + α + 1, x, α)

In [7]:
gp

x^4 + (2*α + 2)*x^3 + x^2 + (2*α + 1)*x + α + 1

In [8]:
typeof(gp)

AbstractAlgebra.Generic.ResidueRingElem{fqPolyRepPolyRingElem}

In [9]:
F.([0,1,2,3])

4-element Vector{fqPolyRepFieldElem}:
 0
 1
 2
 0

In [10]:
F.(Matrix(I(2)))

2×2 Matrix{fqPolyRepFieldElem}:
 1  0
 0  1

In [11]:
k

4

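Build the generator and parity-check matrices of the base Reed-Solomon code and print the parity-check matrix. Row $i$ of the generator matrix holds the coefficients of $x^{i-1}g(x)$, and the parity-check matrix has entries $H_{ij} = (\alpha^i)^{j-1}$.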

In [12]:
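# Band-form generator matrix: row i holds the coefficients of x^(i-1) * g(x)
gm_fq_band_form = fill(α-α, k, q-1)    # α-α is the zero element of F
for i in 1:k
    shift = gp*x^(i-1)
    gm_fq_band_form[i,:] = coeff.(shift.data, 0:(q-2))
end
gm_fqn = matrix_space(F, k,q-1)(gm_fq_band_form)
# Parity-check matrix: H[i, j] = (α^i)^(j-1) for i = 1..n-k, j = 1..n
H_fq = [(α^(i[1]))^(i[2]-1) for i in CartesianIndices((q-k-1, q-1))]
Matrix(H_fq)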

4×8 Matrix{fqPolyRepFieldElem}:
 1  α        α + 1    2*α + 1  2  2*α    2*α + 2  α + 2
 1  α + 1    2        2*α + 2  1  α + 1  2        2*α + 2
 1  2*α + 1  2*α + 2  α        2  α + 2  α + 1    2*α
 1  2        1        2        1  2      1        2
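The band-form construction and the orthogonality of $G$ and $H$ can be reproduced without Nemo over a prime field, where arithmetic is plain integer arithmetic mod $p$. The sketch below assumes GF(7) with primitive element 3 (its powers $3, 2, 6, 4, 5, 1$ cover every nonzero element) and two parity symbols, so $n = 6$ and $k = 4$; this is an illustration, not the notebook's GF(9) code. It builds $g(x) = (x - 3)(x - 3^2)$, the band-form generator matrix and $H_{ij} = (3^i)^{j-1}$, then computes $GH^T$ mod 7. The names `polymul_mod` and `rs_gf7` are hypothetical.

```julia
# Multiply two polynomials over GF(p), given as ascending coefficient vectors.
function polymul_mod(a::Vector{Int}, b::Vector{Int}, p::Int)
    c = zeros(Int, length(a) + length(b) - 1)
    for i in eachindex(a), j in eachindex(b)
        c[i + j - 1] = mod(c[i + j - 1] + a[i] * b[j], p)
    end
    return c
end

# Illustrative RS code over GF(7): primitive element 3, nmk = 2 parity symbols, n = 6, k = 4.
rs_gf7 = let p = 7, a = 3, nmk = 2
    n = p - 1
    k = n - nmk
    # generator polynomial g(x) = (x - a)(x - a^2)...(x - a^nmk)
    g = [1]
    for i in 1:nmk
        g = polymul_mod(g, [mod(-powermod(a, i, p), p), 1], p)
    end
    # band-form generator matrix: row i holds the coefficients of x^(i-1) * g(x)
    G = zeros(Int, k, n)
    for i in 1:k
        G[i, i:(i + nmk)] = g
    end
    # parity-check matrix H[i, j] = (a^i)^(j-1)
    H = [powermod(a, i * (j - 1), p) for i in 1:nmk, j in 1:n]
    (g = g, G = G, H = H, syndrome = mod.(G * transpose(H), p))
end

println(rs_gf7.g)          # [6, 2, 1], i.e. g(x) = x^2 + 2x + 6
println(rs_gf7.syndrome)   # G * Hᵀ mod 7: all zeros
```

Every row of $G$ is a shifted copy of $g(x)$, so every codeword is a multiple of $g(x)$ and vanishes at $3$ and $3^2$; row $i$ of $H$ evaluates a codeword polynomial at $3^i$, which is why $GH^T = 0$.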

Check that the generator and parity-check matrices are indeed orthogonal to each other.

In [13]:
@test all(iszero.(Matrix(gm_fqn)*transpose(H_fq)))

[32m[1mTest Passed[22m[39m

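Build the parity-check matrices of the singly and doubly extended codes and print them. The singly extended code adds the column $(0, \dots, 0, 1)^T$ in front of $H$, and the doubly extended code adds $(1, 0, \dots, 0)^T$ in front of that.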

In [14]:
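# Extra column (0, ..., 0, 1)ᵀ for the singly extended code
ext1col = fill(F(0), q-k-1)
ext1col[end] = F(1)
# Extra column (1, 0, ..., 0)ᵀ for the doubly extended code
ext2col = fill(F(0), q-k-1)
ext2col[1] = F(1)
H_fq_ext1 = hcat(ext1col, H_fq)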

4×9 Matrix{fqPolyRepFieldElem}:
 0  1  α        α + 1    2*α + 1  2  2*α    2*α + 2  α + 2
 0  1  α + 1    2        2*α + 2  1  α + 1  2        2*α + 2
 0  1  2*α + 1  2*α + 2  α        2  α + 2  α + 1    2*α
 1  1  2        1        2        1  2      1        2

In [15]:
H_fq_ext2 = hcat(ext2col, H_fq_ext1)

4×10 Matrix{fqPolyRepFieldElem}:
 1  0  1  α        α + 1    2*α + 1  2  2*α    2*α + 2  α + 2
 0  0  1  α + 1    2        2*α + 2  1  α + 1  2        2*α + 2
 0  0  1  2*α + 1  2*α + 2  α        2  α + 2  α + 1    2*α
 0  1  1  2        1        2        1  2      1        2
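Why do these particular extra columns keep the code MDS? A code whose parity-check matrix has $r$ rows has minimum distance $r + 1$ exactly when every $r$ columns of $H$ are linearly independent. The sketch below checks this for a doubly extended code in a small stand-in setting (GF(7), primitive element 3, $r = 2$ parity symbols; an assumption for illustration, not the notebook's GF(9) code). With two rows, two columns are independent exactly when their $2 \times 2$ determinant is nonzero mod 7. The name `mds_check_gf7` is hypothetical.

```julia
# Illustrative check over GF(7): primitive element a = 3, r = 2 parity symbols, base length n = 6.
mds_check_gf7 = let p = 7, a = 3, r = 2
    n = p - 1
    H = [powermod(a, i * (j - 1), p) for i in 1:r, j in 1:n]   # H[i, j] = (a^i)^(j-1)
    ext1col = [0, 1]                      # (0, ..., 0, 1)ᵀ, the singly extended column
    ext2col = [1, 0]                      # (1, 0, ..., 0)ᵀ, the doubly extended column
    H_ext2 = hcat(ext2col, ext1col, H)    # same column order as H_fq_ext2 in the notebook
    ncols = size(H_ext2, 2)
    # with r = 2 rows, two columns are independent iff their 2×2 determinant is nonzero mod p
    minors = [mod(H_ext2[1, i] * H_ext2[2, j] - H_ext2[2, i] * H_ext2[1, j], p)
              for i in 1:ncols for j in (i + 1):ncols]
    (H_ext2 = H_ext2, nminors = length(minors), all_independent = all(!iszero, minors))
end

println(mds_check_gf7.all_independent)   # true: every pair of the 8 columns is independent
```

Because every pair of columns is independent, the length-8 ($= q + 1$) doubly extended code still has $d = r + 1 = 3$ and meets the Singleton bound.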

Find the row-reduced echelon form (linear algebra terminology) or systematic form (coding theory terminology) of the parity check matrices, then print them.

In [16]:
# rref returns (rank, reduced matrix); keep only the reduced matrix
H_fq_ext1_rref = rref(matrix_space(F, q-k-1, q)(H_fq_ext1))[2]
H_fq_ext2_rref = rref(matrix_space(F, q-k-1, q+1)(H_fq_ext2))[2]
Matrix(H_fq_ext1_rref)

4×9 Matrix{fqPolyRepFieldElem}:
 1  0  0  0  2*α + 1  2*α      2*α + 1  α + 1  2*α
 0  1  0  0  2*α + 2  2*α      2*α      α + 2  2*α + 1
 0  0  1  0  2*α      2*α + 1  2*α + 2  α + 2  2*α
 0  0  0  1  α + 2    α + 2    α        2*α    α + 1

In [17]:
Matrix(H_fq_ext2_rref)

4×10 Matrix{fqPolyRepFieldElem}:
 1  0  0  0  α + 1  α      α        2*α + 1  α + 2    2
 0  1  0  0  2*α    2*α    2*α + 2  α        2*α + 2  2
 0  0  1  0  α      2*α    2*α + 1  1        1        α + 2
 0  0  0  1  2      α + 1  α + 2    α + 2    2*α + 2  α + 2

In [18]:
Matrix(H_fq_ext1_rref)[:, (n-k+1):end]

4×5 Matrix{fqPolyRepFieldElem}:
 2*α + 1  2*α      2*α + 1  α + 1  2*α
 2*α + 2  2*α      2*α      α + 2  2*α + 1
 2*α      2*α + 1  2*α + 2  α + 2  2*α
 α + 2    α + 2    α        2*α    α + 1

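Because the systematic parity-check matrices have the form $H = [I \mid A]$, the generator matrix $G = [-A^T \mid I]$ satisfies $GH^T = -A^T + A^T = 0$. Build these generator matrices for each extended code and print them.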

In [19]:
# From H = [I | A] in systematic form, G = [-Aᵀ | I]
gm_ext1 = hcat(transpose(-Matrix(H_fq_ext1_rref)[:, (n-k+1):end]), F.(Matrix(I(k+1))))
gm_ext2 = hcat(transpose(-Matrix(H_fq_ext2_rref)[:, (n-k+1):end]), F.(Matrix(I(k+2))))

6×10 Matrix{fqPolyRepFieldElem}:
 2*α + 2  α      2*α      1        1  0  0  0  0  0
 2*α      α      α        2*α + 2  0  1  0  0  0  0
 2*α      α + 1  α + 2    2*α + 1  0  0  1  0  0  0
 α + 2    2*α    2        2*α + 1  0  0  0  1  0  0
 2*α + 1  α + 1  2        α + 1    0  0  0  0  1  0
 1        1      2*α + 1  2*α + 1  0  0  0  0  0  1

In [20]:
gm_ext1

5×9 Matrix{fqPolyRepFieldElem}:
 α + 2    α + 1    α        2*α + 1  1  0  0  0  0
 α        α        α + 2    2*α + 1  0  1  0  0  0
 α + 2    α        α + 1    2*α      0  0  1  0  0
 2*α + 2  2*α + 1  2*α + 1  α        0  0  0  1  0
 α        α + 2    α        2*α + 2  0  0  0  0  1
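In the notebook, Nemo's `rref` does the row reduction. To show the individual steps, the sketch below performs Gauss-Jordan elimination mod a prime using Base's `invmod`, then forms $G = [-A^T \mid I]$ from $H = [I \mid A]$ and checks orthogonality. As a stand-in it uses the singly extended code over GF(7) with primitive element 3 and two parity symbols (an assumption for illustration); `rref_mod` and `sys_gf7` are hypothetical names.

```julia
using LinearAlgebra

# Gauss-Jordan elimination over GF(p), p prime: returns the reduced row echelon form of A mod p.
function rref_mod(A::AbstractMatrix{Int}, p::Int)
    M = mod.(Matrix(A), p)
    nrows, ncols = size(M)
    row = 1
    for col in 1:ncols
        row > nrows && break
        offset = findfirst(r -> M[r, col] != 0, row:nrows)
        offset === nothing && continue                  # no pivot in this column
        piv = row + offset - 1
        M[[row, piv], :] = M[[piv, row], :]             # move the pivot row into place
        M[row, :] = mod.(M[row, :] .* invmod(M[row, col], p), p)   # scale the pivot to 1
        for r in 1:nrows                                # clear the rest of the column
            if r != row && M[r, col] != 0
                M[r, :] = mod.(M[r, :] .- M[r, col] .* M[row, :], p)
            end
        end
        row += 1
    end
    return M
end

# Singly extended RS code over GF(7) (primitive element 3, r = 2 parity symbols), for illustration.
sys_gf7 = let p = 7, a = 3, r = 2, n = 6
    H = [powermod(a, i * (j - 1), p) for i in 1:r, j in 1:n]
    H_ext1 = hcat([0, 1], H)          # prepend (0, ..., 0, 1)ᵀ as in the notebook
    Hs = rref_mod(H_ext1, p)          # systematic form [I | A]; here the first r columns are pivots
    N = size(Hs, 2)
    A = Hs[:, (r + 1):N]
    G = hcat(mod.(-transpose(A), p), Matrix{Int}(I, N - r, N - r))   # G = [-Aᵀ | I]
    (Hs = Hs, G = G,
     ok_sys = all(iszero, mod.(G * transpose(Hs), p)),
     ok_orig = all(iszero, mod.(G * transpose(H_ext1), p)))
end

println(sys_gf7.Hs)
println(sys_gf7.ok_sys && sys_gf7.ok_orig)   # true
```

As in the notebook, parity symbols come first and message symbols last, so a message $m$ encoded as $mG$ appears verbatim in the last $k + 1$ positions. Row reduction does not change the row space of $H$, which is why $G$ is orthogonal to the original extended matrix as well as to its systematic form.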

Check that the generator matrices of the extended Reed-Solomon codes are indeed orthogonal to their parity check matrices.

In [21]:
transpose(gm_ext1) 

9×5 transpose(::Matrix{fqPolyRepFieldElem}) with eltype fqPolyRepFieldElem:
 α + 2    α        α + 2  2*α + 2  α
 α + 1    α        α      2*α + 1  α + 2
 α        α + 2    α + 1  2*α + 1  α
 2*α + 1  2*α + 1  2*α    α        2*α + 2
 1        0        0      0        0
 0        1        0      0        0
 0        0        1      0        0
 0        0        0      1        0
 0        0        0      0        1

In [22]:
Matrix(H_fq_ext1_rref)

4×9 Matrix{fqPolyRepFieldElem}:
 1  0  0  0  2*α + 1  2*α      2*α + 1  α + 1  2*α
 0  1  0  0  2*α + 2  2*α      2*α      α + 2  2*α + 1
 0  0  1  0  2*α      2*α + 1  2*α + 2  α + 2  2*α
 0  0  0  1  α + 2    α + 2    α        2*α    α + 1

In [23]:
H_fq_ext1_rref_mat = Matrix(H_fq_ext1_rref)
H_fq_ext2_rref_mat = Matrix(H_fq_ext2_rref)
@testset begin
    @test all(iszero.(H_fq_ext1_rref_mat * transpose(gm_ext1)))
    @test all(iszero.(H_fq_ext2_rref_mat * transpose(gm_ext2)))
end;

[0m[1mTest Summary: | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
test set      | [32m   2  [39m[36m    2  [39m[0m0.2s


Define a function that enumerates the codewords with weights between `wmin` and `wmax`, then checks that the number found for each weight matches the MDS weight-distribution formula above.

In [24]:
function find_codewords_of_desired_weights(G, H, q, k, n, n_extended, wmin, wmax)
    # One codebook per target weight; each row of cbs[i] is a codeword
    cbs = [zeros(F, 0, n+n_extended) for w in wmin:wmax]
    # Parity part P of the systematic generator matrix G = [P | I]
    P = G[:, 1:(n-k)]
    println(H)
    # A message with m nonzero symbols gives a codeword of weight between m and m + (n-k),
    # so only these values of m can produce codewords with weight in wmin:wmax
    for m = max(1, wmin-(n-k)):min(wmax, k+n_extended)
        nrows = (q-1)^m
        # All (q-1)^m assignments of the nonzero elements α^1..α^(q-1) to m positions, enumerated
        # like the digits of a base-(q-1) counter (column i changes every (q-1)^(i-1) rows)
        message_nonzeros = fill(zero(F), nrows, m)
        for i in 1:m
            message_nonzeros[:, i] .= vcat(repeat(fill.(α.^(1:(q-1)), (q-1)^(i-1)), (q-1)^(m-i))...)
        end
        message_array = fill(zero(F), nrows, k+n_extended)
        # Choose which m of the k+n_extended message positions are nonzero
        for comb in combinations(1:(k+n_extended), m)
            message_array .= F(0)
            message_array[:, comb] .= message_nonzeros

            pcs = message_array*P               # parity-check symbols
            cws = hcat(pcs, message_array)      # systematic codewords [parity | message]
            @assert all(H * transpose(cws) .== 0)

            ws = vec(sum(.!iszero.(cws), dims=2))   # Hamming weight of each codeword

            for (i, wi) in enumerate(wmin:wmax)
                cbs[i] = vcat(cbs[i], cws[findall(==(wi), ws), :])
            end
        end
    end

    # Compare the number of codewords found with the MDS weight-distribution formula
    @testset begin
        for (i, w) in enumerate(wmin:wmax)
            theoretical_ncws = get_num_codewords_of_weight_vn(q,q-1+n_extended,w,n-k+1)
            if theoretical_ncws > 0
                @test theoretical_ncws == size(cbs[i])[1]
                println("w$w: ", size(cbs[i])[1])
            else
                @test length(cbs[i]) == 0
            end
        end
    end

    return cbs
end

find_codewords_of_desired_weights (generic function with 1 method)
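`find_codewords_of_desired_weights` avoids encoding all $q^{k}$ messages by visiting only messages whose number of nonzero symbols can reach the target weights. For a very small code, plain brute force is affordable and shows the link to the weight formula directly. The sketch below (a hypothetical stand-in, not the notebook's code) encodes all $7^4 = 2401$ messages of the length-6 Reed-Solomon code over GF(7) with $g(x) = (x - 3)(x - 3^2)$ and tallies the codeword weights.

```julia
# Brute-force weight distribution of the [6, 4, 3] RS code over GF(7) (illustrative example).
function weight_tally_gf7()
    p, n, k = 7, 6, 4
    g = [6, 2, 1]    # g(x) = (x - 3)(x - 3^2) = x^2 + 2x + 6 over GF(7), ascending coefficients
    G = zeros(Int, k, n)
    for i in 1:k
        G[i, i:(i + length(g) - 1)] = g   # row i = coefficients of x^(i-1) * g(x)
    end
    tally = zeros(Int, n + 1)             # tally[w + 1] = number of codewords of weight w
    for msg in Iterators.product(ntuple(_ -> 0:(p - 1), k)...)
        cw = mod.(transpose(G) * collect(msg), p)
        tally[count(!iszero, cw) + 1] += 1
    end
    return tally
end

tally_gf7 = weight_tally_gf7()
println(tally_gf7)   # [1, 0, 0, 120, 360, 972, 948] for weights 0..6
```

The counts for weights 3 to 6 are what the MDS formula gives for $q = 7$, $n = 6$, $d = 3$, and there are no codewords of weight 1 or 2. Brute force costs $q^k$ encodings, which is why the notebook's function restricts the enumeration by the number of nonzero message symbols instead.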

Find codebooks for the singly extended code.

In [25]:
cbs_ext1 = find_codewords_of_desired_weights(gm_ext1, H_fq_ext1_rref_mat, q, k, n, 1, wmin, wmax) 

fqPolyRepFieldElem[1 0 0 0 2*α + 1 2*α 2*α + 1 α + 1 2*α; 0 1 0 0 2*α + 2 2*α 2*α α + 2 2*α + 1; 0 0 1 0 2*α 2*α + 1 2*α + 2 α + 2 2*α; 0 0 0 1 α + 2 α + 2 α 2*α α + 1]
w5: 1008
w6: 2688
[0m[1mTest Summary: | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
test set      | [32m   2  [39m[36m    2  [39m[0m0.1s


2-element Vector{Matrix{fqPolyRepFieldElem}}:
 [1 2*α + 1 … 0 0; α 2 … 0 0; … ; 0 0 … 2*α α + 2; 0 0 … 2*α + 2 1]
 [α + 2 2 … 0 0; 2*α + 1 α … 0 0; … ; 0 2*α + 1 … 1 1; α + 1 0 … 1 1]

In [26]:
cbs_ext1

2-element Vector{Matrix{fqPolyRepFieldElem}}:
 [1 2*α + 1 … 0 0; α 2 … 0 0; … ; 0 0 … 2*α α + 2; 0 0 … 2*α + 2 1]
 [α + 2 2 … 0 0; 2*α + 1 α … 0 0; … ; 0 2*α + 1 … 1 1; α + 1 0 … 1 1]

In [27]:
cbs_ext2 = find_codewords_of_desired_weights(gm_ext2, H_fq_ext2_rref_mat, q, k, n, 2, wmin, wmax) 

fqPolyRepFieldElem[1 0 0 0 α + 1 α α 2*α + 1 α + 2 2; 0 1 0 0 2*α 2*α 2*α + 2 α 2*α + 2 2; 0 0 1 0 α 2*α 2*α + 1 1 1 α + 2; 0 0 0 1 2 α + 1 α + 2 α + 2 2*α + 2 α + 2]
w5: 2016
w6: 6720
[0m[1mTest Summary: | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
test set      | [32m   2  [39m[36m    2  [39m[0m0.0s


2-element Vector{Matrix{fqPolyRepFieldElem}}:
 [α + 2 α + 1 … 0 0; 1 2*α + 1 … 0 0; … ; 0 0 … 2*α α + 2; 0 0 … 2*α + 2 1]
 [2*α 2 … 0 0; 2*α + 1 2*α … 0 0; … ; 0 0 … 2*α 1; 0 0 … α + 2 1]

In [28]:
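# Label every field element by its power of α (the notation used in the saved CSV files).
# The list below is hard-coded for q = 9, i.e. α^1 through α^7.
α_powers = ["α", "α²", "α³", "α⁴", "α⁵", "α⁶", "α⁷"]
powdict = Dict([α^i => α_powers[i] for i in 1:(q-2)])
powdict[α^(q-1)] = "1"    # α^(q-1) = 1
powdict[α - α] = "0"      # the zero element
get_cw(p) = map(fqelem -> powdict[fqelem], coeff.(p.data,0:(q-2)))   # labels for a codeword polynomial (not used below)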
powdict

Dict{fqPolyRepFieldElem, String} with 9 entries:
  α + 1   => "α²"
  0       => "0"
  2*α + 1 => "α³"
  2       => "α⁴"
  α       => "α"
  α + 2   => "α⁷"
  1       => "1"
  2*α     => "α⁵"
  2*α + 2 => "α⁶"
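`α_powers` above is written out by hand for $q = 9$. If `p` or `deg` changes, the labels must cover $\alpha^1, \dots, \alpha^{q-2}$; the sketch below generates them with Unicode superscript digits. The names `SUPERSCRIPT_DIGITS`, `alpha_power_label` and `alpha_power_labels` are hypothetical.

```julia
# Unicode superscript digits 0-9 (a tuple, so re-running the cell does not warn about redefinition)
const SUPERSCRIPT_DIGITS = ('⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹')

# "α" for i == 1, otherwise "α" followed by i written in superscript digits (e.g. 12 -> "α¹²")
alpha_power_label(i::Integer) =
    i == 1 ? "α" : "α" * join(SUPERSCRIPT_DIGITS[d + 1] for d in reverse(digits(i)))

# Labels for α^1, ..., α^(q-2); α^(q-1) = 1 and 0 are labelled separately, as in powdict
alpha_power_labels(q::Integer) = [alpha_power_label(i) for i in 1:(q - 2)]

println(alpha_power_labels(9))
```

With this helper, `α_powers = alpha_power_labels(q)` would reproduce the hard-coded list for $q = 9$.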

In [None]:
# Save one CSV per weight for the singly extended code (length q, dimension k+1),
# with field elements written as powers of α via powdict
for (i, w) in enumerate(wmin:wmax)
    if length(cbs_ext1[i]) > 0
        cbdf = DataFrame(map(s -> powdict[s], cbs_ext1[i]), "block".*string.(1:(n+1)))
        insertcols!(cbdf, 1, ("gene_name" => 1:nrow(cbdf)))
        fname = "RS_q$(q)_n$(q)_k$(k+1)_w$(w)cb.csv"
        CSV.write(fname, cbdf)
        println("saved ", fname)
    end
end;

saved RS_q9_n9_k5_w5cb.csv
saved RS_q9_n9_k5_w6cb.csv
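Any two distinct codewords of these codes differ in at least $d = n - k + 1 = 5$ positions, so a saved codebook can be checked directly from its CSV file. The helper below (a hypothetical name, not part of the notebook) computes the smallest Hamming distance between rows of a matrix of symbol labels. The commented lines sketch how it could be applied to a saved file with `CSV.read` and DataFrames' `select`/`Not`.

```julia
# Smallest Hamming distance between any two rows of M (typemax(Int) if M has fewer than 2 rows).
function min_pairwise_distance(M::AbstractMatrix)
    best = typemax(Int)
    for i in 1:(size(M, 1) - 1), j in (i + 1):size(M, 1)
        best = min(best, count(c -> M[i, c] != M[j, c], axes(M, 2)))
    end
    return best
end

# Tiny example with three rows written as α-power labels
toy = ["0" "1" "α"; "0" "1" "α²"; "1" "0" "α"]
println(min_pairwise_distance(toy))   # 1: rows 1 and 2 differ only in the last position

# Applying it to a saved codebook (expected to be at least 5 for these codes):
# cb = CSV.read("RS_q9_n9_k5_w5cb.csv", DataFrame)
# min_pairwise_distance(Matrix(select(cb, Not(:gene_name))))
```

The loop compares all pairs, so for the roughly 1000 to 7000 codewords per file it is quadratic but still quick enough for a one-off check.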


In [None]:
# Save one CSV per weight for the doubly extended code (length q+1, dimension k+2)
for (i, w) in enumerate(wmin:wmax)
    if length(cbs_ext2[i]) > 0
        cbdf = DataFrame(map(s -> powdict[s], cbs_ext2[i]), "block".*string.(1:(n+2)))
        insertcols!(cbdf, 1, ("gene_name" => 1:nrow(cbdf)))
        fname = "RS_q$(q)_n$(q+1)_k$(k+2)_w$(w)cb.csv"
        CSV.write(fname, cbdf)
        println("saved ", fname)
    end
end;

saved RS_q9_n10_k6_w5cb.csv
saved RS_q9_n10_k6_w6cb.csv


In [31]:
using DelimitedFiles
# Save the systematic parity-check matrix of the singly extended code, labelled by powers of α
fname = "RS_q$(q)_n$(q)_k$(k+1)_H.csv"
println(fname)
open(fname, "w") do io
    writedlm(io, map(s -> powdict[s], H_fq_ext1_rref_mat), ",")
end

RS_q9_n9_k5_H.csv


In [32]:
using DelimitedFiles
# Save the systematic parity-check matrix of the doubly extended code, labelled by powers of α
fname = "RS_q$(q)_n$(q+1)_k$(k+2)_H.csv"
println(fname)
open(fname, "w") do io
    writedlm(io, map(s -> powdict[s], H_fq_ext2_rref_mat), ",")
end

RS_q9_n10_k6_H.csv
