# Monomer Input Validation Testing
### Requirements:
A good input validation step should require that the input monomer SMARTS:
1. Be unambiguous. This means no "or" conditionals
2. Have all atoms with a single element contain:
    1. query connectivity (X?) that matches the atom's explicit connectivity (even for H)
    2. formal charge (even if 0)
    3. NO other info/conditionals (send a warning if this happens, but allow it). However, if the user supplies a property that isn't supported, such as atomic mass, don't allow that
3. Require that all atoms:
    1. be connected
    2. have an atom map number
4. Require that all bonds have a single discrete bond order

Note that there is currently no restriction on where wild-type atoms exist or how many of them there are. Wild-type atoms are treated as variable nodes that are not used for information/chemistry assignment, but they must be connected to all other nodes and have an atom map number.
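
As a rough illustration of requirement 3.2, the sketch below checks at the string level that every bracket atom carries an atom map number. The helper name `atoms_missing_map_number` and both regular expressions are assumptions made for this example, not part of RDKit. It only inspects bracket atoms, so an atom written without brackets (for example a bare `C`) is not detected, and nested brackets from recursive SMARTS are not handled.

```python
import re

# Hypothetical helpers for illustration only (not RDKit API).
BRACKET_ATOM = re.compile(r"\[[^\[\]]*\]")  # any [...] group without nested brackets
MAP_NUMBER = re.compile(r":(\d+)\]$")       # ':<digits>' just before the closing bracket


def atoms_missing_map_number(smarts):
    """Return the bracket atoms in `smarts` that lack a ':<n>' atom map number."""
    return [
        atom
        for atom in BRACKET_ATOM.findall(smarts)
        if not MAP_NUMBER.search(atom)
    ]


# The middle atom has no map number, so it is the only one reported.
print(atoms_missing_map_number("[C&X4&+0:1]-[C&X4&+0]-[*&X4&+0:3]"))
```

Within RDKit itself, the same information is available per atom through `GetAtomMapNum()`, which is the call used in the cells further down.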

### Approach
RDKit's internal atom query logic provides many of the tools needed for validation, so it is explored first. See the final few cells in this notebook for a working input validation example and a few example cases.


## 1. RDKit testing and query examples
From https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html:
Valid logical operators can be:
1. "not" -> `!`
2. "and" (high precedence) -> `&`
3. "or" -> `,`
4. "and" (low precedence) -> `;`

The only two operators allowed in this format are the two "and" operators. RDKit's internal query logic represents each atom's operators as a block of text that can be parsed:


In [51]:
from rdkit import Chem

In [79]:
smarts = "[C;D4;+0:1]-[C&D4&+0:3](-[C,D4,+0:2])(-[C&!D4&+0:4])-[*&X4&+0:5]"
qmol = Chem.MolFromSmarts(smarts)
qmol.GetAtomWithIdx(0).GetDegree()

1
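
Before inspecting RDKit's query descriptions, the operator restriction can also be approximated with a plain string scan. The following is a minimal sketch under that assumption: `find_disallowed_operators` is a hypothetical helper, not an RDKit function, and its simple bracket pattern does not handle recursive SMARTS or operators used in bond expressions.

```python
import re

# Hypothetical helper for illustration only (not RDKit API).
# SMARTS operators that the monomer format does not allow.
DISALLOWED = {",": "or", "!": "not"}


def find_disallowed_operators(smarts):
    """Return (bracket_atom, operator_name) pairs for atoms using ',' or '!'."""
    hits = []
    # Bracket atoms are the [...] groups; nested brackets are not supported.
    for match in re.finditer(r"\[[^\[\]]*\]", smarts):
        atom = match.group(0)
        for symbol, name in DISALLOWED.items():
            if symbol in atom:
                hits.append((atom, name))
    return hits


smarts = "[C;D4;+0:1]-[C&D4&+0:3](-[C,D4,+0:2])(-[C&!D4&+0:4])-[*&X4&+0:5]"
for atom, name in find_disallowed_operators(smarts):
    print(f"{atom} uses the disallowed '{name}' operator")
```

For the example SMARTS above, this flags the atoms with map numbers 2 and 4, the same two atoms whose RDKit query descriptions contain `AtomOr` and `!=`.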

In [88]:
def print_atom_info(a):
    print("=====================================================")
    print(f"          Atom with Idx {a.GetIdx()} and map num {a.GetAtomMapNum()}")
    print("=====================================================")
    print(f"degree is {a.GetDegree()}")
    print(f"formal charge is {a.GetFormalCharge()}")
    print(f"has query is {a.HasQuery()}")
    print(f"Query is desicribed as: \n{a.DescribeQuery()}")

In [89]:
print_atom_info(qmol.GetAtomWithIdx(0))

          Atom with Idx 0 and map num 1
degree is 1
formal charge is 0
has query is True
Query is desicribed as: 
AtomAnd
  AtomAnd
    AtomType 6 = val
    AtomExplicitDegree 4 = val
  AtomFormalCharge 0 = val



In [82]:

print_atom_info(qmol.GetAtomWithIdx(1))

          Atom with Idx 1 and map num 3
degree is 4
formal charge is 0
has query is True
Query is desicribed as: 
AtomAnd
  AtomAnd
    AtomType 6 = val
    AtomExplicitDegree 4 = val
  AtomFormalCharge 0 = val



In [83]:
print_atom_info(qmol.GetAtomWithIdx(2))

          Atom with Idx 2 and map num 2
degree is 1
formal charge is 0
has query is True
Query is desicribed as: 
AtomOr
  AtomOr
    AtomType 6 = val
    AtomExplicitDegree 4 = val
  AtomFormalCharge 0 = val



In [84]:
print_atom_info(qmol.GetAtomWithIdx(3))

          Atom with Idx 3 and map num 4
degree is 1
formal charge is 0
has query is True
Query is desicribed as: 
AtomAnd
  AtomAnd
    AtomType 6 = val
    AtomExplicitDegree 4 != val
  AtomFormalCharge 0 = val



In [85]:
print_atom_info(qmol.GetAtomWithIdx(4))

          Atom with Idx 4 and map num 5
degree is 1
formal charge is 0
has query is True
Query is desicribed as: 
AtomAnd
  AtomTotalDegree 4 = val
  AtomFormalCharge 0 = val



SMARTS requirements can then be enforced by looking for specific keywords in the query description, such as "!=" (a negated term) and "AtomOr" (an "or" conditional). "AtomTotalDegree" (the X? format) will be required on all atoms to ensure that atoms with an explicit element have their connectivity explicitly specified. 
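
The keyword approach can be sketched without RDKit by working directly on description strings like those printed above. This is an illustrative sketch only: `check_query_description` and its messages are assumptions made for this example, and the sample strings are copied from the outputs for atoms 0, 2, 3, and 4.

```python
# Hypothetical helper for illustration only (not RDKit API).
def check_query_description(description):
    """Return a list of problems found in an atom query description string."""
    problems = []
    if "AtomOr" in description:
        problems.append("contains an 'or' conditional")
    if "!= val" in description:
        problems.append("contains a negated term")
    if "AtomTotalDegree" not in description:
        problems.append("missing total connectivity (X)")
    if "AtomFormalCharge" not in description:
        problems.append("missing formal charge")
    return problems


# Descriptions copied from the DescribeQuery() outputs shown above.
ATOM_0 = "AtomAnd\n  AtomAnd\n    AtomType 6 = val\n    AtomExplicitDegree 4 = val\n  AtomFormalCharge 0 = val\n"
ATOM_2 = "AtomOr\n  AtomOr\n    AtomType 6 = val\n    AtomExplicitDegree 4 = val\n  AtomFormalCharge 0 = val\n"
ATOM_3 = "AtomAnd\n  AtomAnd\n    AtomType 6 = val\n    AtomExplicitDegree 4 != val\n  AtomFormalCharge 0 = val\n"
ATOM_4 = "AtomAnd\n  AtomTotalDegree 4 = val\n  AtomFormalCharge 0 = val\n"

for label, text in [("0", ATOM_0), ("2", ATOM_2), ("3", ATOM_3), ("4", ATOM_4)]:
    print(f"atom {label}: {check_query_description(text) or 'ok'}")
```

Returning a list of reasons rather than a single boolean makes it easier to tell the user which requirement a rejected atom violates. Note that the carbon atoms in the example use `D4`, which RDKit reports as `AtomExplicitDegree`, so they are flagged as missing the `X` connectivity.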

## Setting queries
Setting queries is most useful when handling transitions from user-supplied monomer formats to the internal JSON representation of monomers

In [92]:
a = qmol.GetAtomWithIdx(0)
# print(a.GetSmarts())
# new_a = Chem.AtomFromSmarts("[C&X3&+1:4]")
# a.SetQuery(new_a)
# print(a.GetSmarts())
a.ExpandQuery()

ArgumentError: Python argument types in
    QueryAtom.ExpandQuery(QueryAtom)
did not match C++ signature:
    ExpandQuery(RDKit::QueryAtom* self, RDKit::QueryAtom const* other, Queries::CompositeQueryType how=rdkit.Chem.rdchem.CompositeQueryType.COMPOSITE_AND, bool maintainOrder=True)

## 2. Working example

In [None]:
def is_connected(rdmol):
    # determine if the rdmol object has all atoms connected
    # input: rdkit.Chem molecule object
    # output: True if graph is connected, False if not
    found_atom_ids = set()
    # perform a simple graph search, starting from the first atom (if any);
    # an empty starting queue would never visit any atom
    queue = [0] if rdmol.GetNumAtoms() > 0 else []
    while len(queue) != 0:
        atom_id = queue.pop()
        found_atom_ids.add(atom_id)
        atom = rdmol.GetAtomWithIdx(atom_id)
        for neighbor in atom.GetNeighbors():
            n_idx = neighbor.GetIdx()
            if n_idx not in queue and n_idx not in found_atom_ids:
                queue.append(n_idx)
    # the graph is connected only if the search reached every atom
    return len(found_atom_ids) == rdmol.GetNumAtoms()

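def is_valid_monomer(smarts):
    def is_valid_query(query_str):
        # reject negations ("!=") and "or" conditionals ("AtomOr")
        if "!=" in query_str:
            return False
        elif "AtomOr" in query_str:
            return False
        # require element, total connectivity (X?), and formal charge.
        # Note: a wild-type atom such as [*&X4&+0:5] has no AtomType entry
        # in its query description, so it is rejected by this check.
        if "AtomType" not in query_str:
            return False
        elif "AtomTotalDegree" not in query_str:
            return False
        elif "AtomFormalCharge" not in query_str:
            return False
        # all checks passed
        return True

    qmol = Chem.MolFromSmarts(smarts)
    # MolFromSmarts returns None when the SMARTS cannot be parsed
    if qmol is None:
        return False

    if not is_connected(qmol):
        return False

    for atom in qmol.GetAtoms():
        # an atom map number of 0 means no map number was given
        if atom.GetAtomMapNum() == 0:
            return False
        if not is_valid_query(atom.DescribeQuery()):
            return False
    return True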


## 3. Examples