# Combinatorial Polymer Design

The goal of this study is to **enumerate all possible logical combinations of monomeric sugars to build mock-up SRUs**. Their structural parameters, such as surface area and polarizability, will then be characterized theoretically by *first-principles* analysis.

For now, we will deal only with linear chains.

https://docs.python.org/3/library/itertools.html#itertools-recipes

### Vocabulary

- $M =$ multiplicity, i.e. the number of unique monomers in a structure
- $B =$ blocks, or total number of monomers in a structure


### Constraints

- $B \geqslant M$, because a structure cannot contain fewer total blocks than unique monomers.


### Remarks
- Combinations with $M=B=1$ can be discarded, because an SRU is intrinsically a repeating pattern.


- For $M=B=2$, without repetition, the reordered combination can be discarded, because 1D spatial expansion (repeating the block along the chain) maps both orderings onto the same chain.

>*E.g.* $AB = BA$, because $[ABAB]-AB-[ABAB] = [BABA]-BA-[BABA]$


- Combinations when $M=B\geqslant3$, without repetition, **cannot** be discarded, because 1D spatial expansion results in different combinations, and symmetry is broken.


>*E.g.* $ABC \not= ACB$, because $[ABCABC]-ABC-[ABCABC]$ yields:
>
>\begin{cases}
ABC \\
BCA \\
CAB
\end{cases}
>
> ... and $[ACBACB]-ACB-[ACBACB]$ yields:
>
>\begin{cases}
ACB \\
CBA \\
BAC
\end{cases}
>
> which are interconvertible only through a symmetry-breaking operation, that is, by breaking covalent bonds and rearranging the glycosidic linkages.

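- Reordered combinations when $M = B-1$, shown here for $B=3$, can be discarded when they are cyclic rotations of one another, because 1D spatial expansion then yields the same chain. For $B\geqslant4$ not every reordering is a rotation (e.g. $AABC$ versus $ABAC$), so those cases have to be checked individually.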

>*E.g.* $AAB = ABA = BAA$, because spatial expansion of $\color{blue}{AAB}$ yields:
>
>\begin{cases}
AAB \\
ABA \\
BAA
\end{cases}
>
> spatial expansion of $\color{red}{ABA}$ yields:
>
>\begin{cases}
ABA \\
BAA \\
AAB
\end{cases}
>
> and spatial expansion of $\color{orange}{BAA}$ yields:
>
>\begin{cases}
BAA \\
AAB \\
ABA
\end{cases}
>
> which are equivalent from the point of view of neighboring monomers influencing each other's surface area, polarizability and water interaction. In fact, all three combinations can be found in the same polymer chain, regardless of the order of observation (denoted by |).
>
>$$...AAB|\color{blue}{AAB}|AAB|AAB|A\color{red}{AB}|\color{red}{A}AB|AA\color{orange}{B}|\color{orange}{AA}B|AAB|AAB...$$
>$$......A\color{orange}{BA}|\color{orange}{A}BA|ABA|\color{red}{ABA}|AB\color{blue}{A}|\color{blue}{AB}A|ABA|ABA|ABA...$$
>$$.........BA\color{red}{A}|\color{red}{BA}A|BAA|\color{orange}{BAA}|BAA|B\color{blue}{AA}|\color{blue}{B}AA|BAA...$$
>
>Notice that the left and right neighboring sequences for AAB are identical, regardless of which triplet block (AAB, ABA or BAA) was used for chain construction. This logic applies to ABA and BAA as well, thus supporting the idea that they are fully interconvertible by symmetry operations.

----------------------
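The equivalence rules in the Remarks above amount to treating two equal-length blocks as the same SRU when one is a cyclic rotation of the other. A minimal sketch of that check follows; the helper names `rotations`, `canonical` and `same_chain` are hypothetical and not part of the original notebook, and choosing the lexicographically smallest rotation as the representative is just one convenient convention.

```python
def rotations(seq):
    """Return all cyclic rotations of a monomer sequence as tuples."""
    seq = tuple(seq)
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical(seq):
    """Pick the lexicographically smallest rotation as the class representative."""
    return min(rotations(seq))


def same_chain(a, b):
    """True when two equal-length blocks are cyclic rotations of each other."""
    return canonical(a) == canonical(b)


print(same_chain("AB", "BA"))    # M = B = 2: both orderings fall in one class
print(same_chain("ABC", "ACB"))  # M = B = 3: the two orderings stay distinct
print({canonical(s) for s in ("AAB", "ABA", "BAA")})  # M = 2, B = 3: one class
```

Sequences are converted to tuples so that multi-character monomer names such as `GlcN` can be used in place of single letters.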

In [14]:
import numpy as np
import pandas as pd
import itertools 

In [41]:
# DATA IMPORT
data = pd.read_excel(
    "data/mda18feb2021.xlsm", sheet_name=3)

monomers = pd.read_excel(
    "data/mda18feb2021.xlsm", sheet_name=4)

# keep only the finished rows (first 102) and columns (first 158)
data = data.iloc[:102, :158]

# create subsets of data
identity = data.iloc[:, :19]
biometrics = data.iloc[:, 19:43]
polymer = data.iloc[:, 43:112]
properties = data.iloc[:, 112:143]
outcome = data.iloc[:, 143:147]
calculations = data.iloc[:, 147:]

# Group monomers into cationic, anionic, and uncharged lists for composition radar charts
cationic, anionic, neutral = [], [], []

# walk the name and charge columns in parallel instead of indexing row by row
for monomer, charge in zip(monomers['sugar'], monomers['Physiological Charge']):
    if charge == -1:
        anionic.append(monomer)
    elif charge == 1:
        cationic.append(monomer)
    else:
        neutral.append(monomer)

print('Positively charged monomers:', ', '.join(cationic))
print('Negatively charged monomers:', ', '.join(anionic))
print('Uncharged monomers:', ', '.join(neutral))

Positively charged monomers: GalN, GlcN, ManN
Negatively charged monomers: GalA, GlcA
Uncharged monomers: Ara, Rib, Xyl, Rha, Fuc, Fru, Gal, Glc, Man, GalNAc, GlcNAc, Tre, Alt, QuiNAc, Kdo


In [42]:
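# full monomer pool (named to avoid shadowing the built-in `input`)
monomer_pool = cationic + anionic + neutral

# every subset of the pool, from size 0 up to len(monomer_pool),
# without repetition and ignoring order
output = [
    list(combo)
    for size in range(len(monomer_pool) + 1)
    for combo in itertools.combinations(monomer_pool, size)
]

outputDF = pd.DataFrame(output)
outputDF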

,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,,,,,,,,,,,,,,,,,,,,
1,GalN,,,,,,,,,,,,,,,,,,,
2,GlcN,,,,,,,,,,,,,,,,,,,
3,ManN,,,,,,,,,,,,,,,,,,,
4,GalA,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048571,GalN,GlcN,ManN,GlcA,Ara,Rib,Xyl,Rha,Fuc,Fru,Gal,Glc,Man,GalNAc,GlcNAc,Tre,Alt,QuiNAc,Kdo,
1048572,GalN,GlcN,GalA,GlcA,Ara,Rib,Xyl,Rha,Fuc,Fru,Gal,Glc,Man,GalNAc,GlcNAc,Tre,Alt,QuiNAc,Kdo,
1048573,GalN,ManN,GalA,GlcA,Ara,Rib,Xyl,Rha,Fuc,Fru,Gal,Glc,Man,GalNAc,GlcNAc,Tre,Alt,QuiNAc,Kdo,
1048574,GlcN,ManN,GalA,GlcA,Ara,Rib,Xyl,Rha,Fuc,Fru,Gal,Glc,Man,GalNAc,GlcNAc,Tre,Alt,QuiNAc,Kdo,
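As a sanity check on the table size: the cell above enumerates every subset of the $n=20$ monomers (3 cationic, 2 anionic, 15 uncharged), so the expected number of rows is

$$\sum_{i=0}^{20}\binom{20}{i} = 2^{20} = 1\,048\,576,$$

that is, row labels running from 0 to 1048575, with the empty subset in row 0. Each row is an unordered subset without repetition, so neither $AAB$-type blocks nor reorderings such as $ACB$ appear at this stage.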


In [43]:
# what's missing: with repetition, with shuffling
# then subtract the cases that the constraints allow us to discard
# then restrict to realistic cases; for instance, there hasn't been an SRU with average B>10 and M>6
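One possible way to cover the missing steps listed above is sketched here, under stated assumptions: `itertools.product` supplies repetition and every ordering, the trivial $M=B=1$ case is dropped, and rotation-equivalent blocks are collapsed to a single representative. The function `enumerate_srus`, its parameters, and the three-monomer `toy_pool` are hypothetical illustrations. Brute force over all 20 monomers at $B=10$ would mean $20^{10}\approx10^{13}$ sequences, so this approach is only practical for small pools or small $B$.

```python
import itertools


def canonical(seq):
    """Lexicographically smallest cyclic rotation of a sequence."""
    seq = tuple(seq)
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def enumerate_srus(pool, max_blocks, max_unique):
    """Yield one representative per rotation class, for B <= max_blocks and M <= max_unique."""
    seen = set()
    for b in range(1, max_blocks + 1):
        # product(..., repeat=b) covers repetition and every ordering
        for seq in itertools.product(pool, repeat=b):
            m = len(set(seq))
            if m > max_unique or (m == 1 and b == 1):
                continue  # over the M limit, or the trivial M = B = 1 case
            rep = canonical(seq)
            if rep not in seen:
                seen.add(rep)
                yield rep


toy_pool = ["GlcN", "GalA", "Glc"]  # hypothetical three-monomer subset
srus = list(enumerate_srus(toy_pool, max_blocks=3, max_unique=3))
print(len(srus))
```

Periodic blocks such as `AA` or `ABAB`, which merely repeat a shorter block, are kept in this sketch; whether they should also be discarded is left open. The limits suggested in the notes would correspond to `max_blocks=10` and `max_unique=6`.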