Structural representations of DNA regulatory substrates can enhance sequence-based algorithms by associating functional sequence variants
Link to paper: https://doi.org/10.1145/3388440.3412482
Figure. Schematic depiction of the (A) construction and (B) usage of structural representations. In a structural representation of a given DNA sequence, each central nucleotide position, together with its neighboring regions, defines a k-mer of 3 to 9 bp in length. Each k-mer is encoded as an s-mer with n structural dimensions (S. dim.), so the representation can be defined as a sequence of s-mer cluster centroids.
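The construction described in the figure caption can be illustrated with a minimal sketch. It is not the code from this repository. The property table `TOY_PROPS`, the centroids, the averaging used to form an s-mer, the restriction to odd k, and all function names are hypothetical and chosen only for illustration. In the actual analysis, structural properties would come from a dedicated predictor and centroids from clustering real s-mers.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-nucleotide values for n = 2 structural dimensions.
# Real structural properties are NOT defined this way; this is a toy table.
TOY_PROPS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 1.0),
    "C": (1.0, 0.0),
    "G": (1.0, 1.0),
    "T": (0.0, 0.0),
}


def extract_kmers(seq: str, k: int) -> List[Tuple[int, str]]:
    """Return (center index, k-mer) for each position with a full window.

    Assumes odd k so that the window is symmetric around the center.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("this sketch assumes an odd, positive k")
    half = k // 2
    seq = seq.upper()
    return [(i, seq[i - half:i + half + 1])
            for i in range(half, len(seq) - half)]


def encode_smer(kmer: str) -> Tuple[float, ...]:
    """Encode a k-mer as an n-dimensional s-mer (here: mean of toy values)."""
    rows = [TOY_PROPS[base] for base in kmer]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def nearest_centroid(smer: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to the s-mer (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(smer, c)) for c in centroids]
    return dists.index(min(dists))


def structural_representation(seq: str, k: int,
                              centroids: Sequence[Sequence[float]]) -> List[int]:
    """Represent a sequence as the list of centroid indices of its s-mers."""
    return [nearest_centroid(encode_smer(kmer), centroids)
            for _, kmer in extract_kmers(seq, k)]


if __name__ == "__main__":
    toy_centroids = [(0.0, 1.0), (0.0, 0.0)]  # hypothetical cluster centroids
    print(structural_representation("AAATTT", 3, toy_centroids))
```

Because the output is a sequence of centroid indices, two sequences with different nucleotides but similar structural properties can map to the same representation.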
Supplementary Table S1 and Figure S1 are available here.
This repository contains scripts to reproduce the figures and analysis. The data is available at .
Requires MATLAB R2017b (Bioinformatics Toolbox v4.9), Python v3.6 (numpy v1.15.4, pandas v0.24.2, pyfaidx v0.5.5.2, seaborn v0.9.0, scipy v1.1.0, biopython v1.72), and R v3.5 (DNAshapeR v1.10.0), or higher versions of each.
See also the repositories DNA_structural_variables predictor for prediction of DNA structural properties, Non-parametric_multivariate ANOVA_with bootstraps and oriT-Strast for the alignment algorithm.