Several modules are provided for sequence analysis. The function call for each is shown below, using two representative sample datasets. In line with the examples in the associated manuscript, a statistical copolymer and a multiblock copolymer are included as sample polymer sequences for each section. Below, we read in the sequences of a blocky copolymer:

In [None]:
import numpy as np
blocky = np.loadtxt('sample_data/blocky_copolymer_sequences.csv', delimiter=' ')

To generate a monomer frequency plot, use:

In [None]:
from analysis_functions.sequence_statistics import MonomerFrequency
m = MonomerFrequency(blocky, 2) # the first argument is the name of the data variable, the second argument is the number of monomers in the system
m.plot_frequency() # show monomer frequency plot
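
For readers who want to see what a monomer frequency profile contains, the following is a minimal NumPy sketch and not the implementation behind `MonomerFrequency`. It assumes the sequences form a 2D array (one chain per row, one chain position per column) of integer labels running from 1 to the number of monomers; that layout, the helper name `monomer_frequency`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def monomer_frequency(sequences, num_monomers):
    """Fraction of chains carrying each monomer at each chain position.

    Assumes integer labels 1..num_monomers in a (chains x positions) array;
    this encoding is an assumption, not confirmed by the package.
    """
    sequences = np.asarray(sequences, dtype=int)
    labels = np.arange(1, num_monomers + 1)
    # Compare every entry with every label, then average over the chain axis.
    matches = sequences[None, :, :] == labels[:, None, None]
    return matches.mean(axis=1)  # shape: (num_monomers, positions)

# Hypothetical toy ensemble: three chains, four positions, two monomers.
toy = np.array([[1, 1, 2, 2],
                [1, 2, 2, 2],
                [1, 1, 1, 2]])
freq = monomer_frequency(toy, 2)
print(freq)
```

Each column of `freq` sums to one, so a row can be plotted against position to obtain a profile of the kind described above.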

To generate an adjacency matrix (graph representation), use the following call:

In [None]:
from analysis_functions.kmer_representation import ConstructGraph
e = ConstructGraph(blocky, 2) # the first argument is the name of the data variable, the second argument is the number of monomers in the system
g = e.get_graph_as_heatmap(num_seq=1000, segment_size=2) # construct the adjacency matrix, passing the number of sequences as the first argument and the desired segment size as the second
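
One way to picture a segment-level adjacency matrix is sketched below. This is an illustration under stated assumptions, not the algorithm used by `ConstructGraph`: it assumes that chains are cut into consecutive, non-overlapping segments of length `segment_size`, that each distinct segment is a node, and that an edge weight counts how often one segment directly follows another. The helper name `kmer_adjacency`, the 1-based integer labels, and the toy data are hypothetical.

```python
import numpy as np

def kmer_adjacency(sequences, num_monomers, segment_size):
    """Count transitions between consecutive, non-overlapping segments.

    Assumes integer labels 1..num_monomers in a (chains x positions) array.
    """
    sequences = np.asarray(sequences, dtype=int) - 1  # shift to 0-based labels
    n_chains, length = sequences.shape
    n_segments = length // segment_size
    # Drop trailing positions that do not fill a whole segment.
    trimmed = sequences[:, :n_segments * segment_size]
    segments = trimmed.reshape(n_chains, n_segments, segment_size)
    # Encode each segment as a single integer in base num_monomers.
    weights = num_monomers ** np.arange(segment_size - 1, -1, -1)
    codes = segments @ weights
    n_kmers = num_monomers ** segment_size
    adjacency = np.zeros((n_kmers, n_kmers), dtype=int)
    # Accumulate one count per (segment, next segment) pair along each chain.
    np.add.at(adjacency, (codes[:, :-1].ravel(), codes[:, 1:].ravel()), 1)
    return adjacency

# Hypothetical toy ensemble: two chains of length six, two monomers.
toy_chains = np.array([[1, 1, 2, 2, 1, 1],
                       [1, 2, 1, 2, 2, 2]])
adjacency = kmer_adjacency(toy_chains, num_monomers=2, segment_size=2)
print(adjacency)
```

Rows and columns are indexed by the base-`num_monomers` code of each segment, so with two monomers and a segment size of 2 the matrix is 4 x 4.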

For global patterning metrics, we have used multiblock copolymers as a sample system. Data for the blocks d1 and d2 shown in the manuscript are included as sample data.

In [None]:
d1 = np.loadtxt('sample_data/d1_sequences.csv', delimiter=' ')
d2 = np.loadtxt('sample_data/d2_sequences.csv', delimiter=' ')

Pairwise comparison metrics are demonstrated below. First, the single-monomer comparisons are implemented as follows:

In [None]:
from analysis_functions.sequence_statistics import EnsembleSimilarity
e = EnsembleSimilarity(d1, d2, num_monomers=3) # first two arguments are the sequence ensembles to compare, the third specifies the number of monomers

s1, s2, s3 = e.global_difference() # similarity between distributions of all three monomers
s = e.global_difference(k=4) # similarity between coarse-grained representations with a segment length of k = 4
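
The metric that `global_difference` computes is not stated in this walkthrough. As a rough illustration of comparing two ensembles monomer by monomer, the sketch below substitutes a different, simple measure: the total variation distance between histograms of the per-chain fraction of each monomer. The helper name `composition_distance`, the bin count, the 1-based integer labels, and the toy ensembles are all assumptions made for the example.

```python
import numpy as np

def composition_distance(ens_a, ens_b, num_monomers, bins=10):
    """Total variation distance between per-chain composition histograms.

    Returns one value per monomer: 0 for identical histograms, 1 for
    histograms with no overlap. Assumes integer labels 1..num_monomers.
    """
    ens_a = np.asarray(ens_a, dtype=int)
    ens_b = np.asarray(ens_b, dtype=int)
    edges = np.linspace(0.0, 1.0, bins + 1)
    distances = []
    for label in range(1, num_monomers + 1):
        # Fraction of each chain occupied by this monomer.
        frac_a = (ens_a == label).mean(axis=1)
        frac_b = (ens_b == label).mean(axis=1)
        hist_a, _ = np.histogram(frac_a, bins=edges)
        hist_b, _ = np.histogram(frac_b, bins=edges)
        p = hist_a / hist_a.sum()
        q = hist_b / hist_b.sum()
        distances.append(0.5 * np.abs(p - q).sum())
    return distances

# Hypothetical toy ensembles: one rich in monomer 1, one rich in monomer 2.
toy_a = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])
toy_b = np.array([[2, 2, 2, 2], [2, 2, 2, 2]])
same = composition_distance(toy_a, toy_a, num_monomers=2)
different = composition_distance(toy_a, toy_b, num_monomers=2)
print(same, different)
```

Returning one value per monomer mirrors the unpacking `s1, s2, s3 = ...` used above, although the numbers themselves are not comparable to the package's output.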

Correlation functions are also offered:

In [None]:
e = EnsembleSimilarity(d1, d2, num_monomers=3)
d1_correlation, d2_correlation = e.correlation(3) # pass the index of a single monomer to obtain its autocorrelation function
d1_pair_correlation, d2_pair_correlation = e.correlation([1,3], corr_type='pair') # pass a list of two monomer indexes and set corr_type='pair' for a pairwise correlation function; the results are stored under new names so that the d1 and d2 sequence data are not overwritten
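
To make the idea of a monomer autocorrelation function concrete, here is a minimal sketch under stated assumptions; it is not the code behind `EnsembleSimilarity.correlation`, and the normalization used there may differ. The sketch builds an indicator that equals 1 wherever the chosen monomer sits and computes C(r) = <x_i x_{i+r}> - <x>^2, averaged over all chains and positions. The helper name `indicator_autocorrelation`, the `max_lag` parameter, and the toy chain are hypothetical.

```python
import numpy as np

def indicator_autocorrelation(sequences, monomer, max_lag):
    """Autocorrelation of the occupancy indicator for one monomer.

    Assumes a (chains x positions) array of integer labels and
    max_lag smaller than the chain length.
    """
    sequences = np.asarray(sequences, dtype=int)
    x = (sequences == monomer).astype(float)  # 1 where the monomer sits
    mean = x.mean()
    length = x.shape[1]
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Average product of indicator pairs separated by `lag` positions.
        corr[lag] = (x[:, :length - lag] * x[:, lag:]).mean() - mean ** 2
    return corr

# Hypothetical toy chain: strictly alternating monomers 1 and 2.
alternating = np.array([[1, 2, 1, 2, 1, 2]])
corr = indicator_autocorrelation(alternating, monomer=1, max_lag=2)
print(corr)
```

For a strictly alternating chain the result changes sign at every lag, whereas a blocky chain would stay positive over lags shorter than the block length.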

Finally, chemical patterning can be mapped out as follows:

In [None]:
m = MonomerFrequency(d1, 3)
pattern3 = m.chemical_patterning(features=[-0.4, 4.5, -4.5], method='mean')
# the feature list gives the chemical feature (e.g., hydropathy) of each of the three monomers, in monomer order
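
The sketch below illustrates one plausible reading of chemical patterning with `method='mean'`: every monomer label is replaced by its feature value, and the values are averaged over chains at each position. This interpretation is an assumption and not a description of what `chemical_patterning` does internally; the helper name `mean_feature_profile`, the 1-based integer labels, and the toy ensemble are hypothetical.

```python
import numpy as np

def mean_feature_profile(sequences, features):
    """Average feature value at each chain position across the ensemble.

    Assumes integer labels 1..N in a (chains x positions) array and a
    feature list ordered by monomer label.
    """
    sequences = np.asarray(sequences, dtype=int)
    features = np.asarray(features, dtype=float)
    # Labels 1..N index into the feature list at 0..N-1.
    mapped = features[sequences - 1]
    return mapped.mean(axis=0)

# Hypothetical toy ensemble: two chains, three positions, three monomers.
toy_ensemble = np.array([[1, 2, 3],
                         [1, 3, 3]])
profile = mean_feature_profile(toy_ensemble, [-0.4, 4.5, -4.5])
print(profile)
```

Positions where chains disagree average toward intermediate values, which is why the middle entry of `profile` falls between the features of monomers 2 and 3.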