# GLOSIMMultiLevel

The files `level1.dat`, `level2.dat`, etc. hold proteins grouped at different levels of the protein hierarchy, as in CATH. In level1 the proteins are in distinct classes; in level2 they share a class but have different architectures; in level3 they share an architecture but belong to different superfamilies; and in level4 they are in the same superfamily. The hypothesis is that SOAP tells us something useful about the protein, so if we compute the similarity matrices at each level, proteins in the same superfamily should be the most similar, proteins that only share an architecture less so, and so on.
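One simple way to compare levels is to reduce each similarity matrix to a single number, such as the mean of its off-diagonal entries. The sketch below is only an illustration of that idea: `mean_offdiagonal` is a hypothetical helper, and the two matrices are made-up values, not glosim output.

```python
def mean_offdiagonal(similarity):
    """Average of all entries of a square matrix except the diagonal.

    The diagonal is skipped because every structure is trivially
    similar to itself.
    """
    n = len(similarity)
    if n < 2 or any(len(row) != n for row in similarity):
        raise ValueError("expected a square matrix with at least two rows")
    total = sum(similarity[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Made-up 3x3 matrices, for illustration only (not real results).
toy_same_superfamily = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.7], [0.8, 0.7, 1.0]]
toy_distinct_classes = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]]
```

If the hypothesis holds, this score should be highest for a level4 group and drop as we move towards level1.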

### Algo

- import datafiles
- for each level:
    - choose a group randomly (as level2, level3, level4 are lists of groups)
    - for each protein, pull the pdb file and create an Atoms object
    - create an AtomsList from the Atoms
    - write out a .xyz file
    - run glosim on the xyz file
    - save the glosim output
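The per-protein step relies on two small string operations: splitting an identifier into a PDB code and a chain letter, and trimming the PDB text to that chain. A minimal sketch of both follows, assuming the first four characters of an identifier are the PDB code and the fifth is the chain, and that the chain ID sits at index 21 of each `ATOM` record. The helper names `split_domain_id` and `keep_chain` and the identifier `1abcA00` are hypothetical.

```python
def split_domain_id(domain_id):
    """Split an identifier into (PDB code, chain letter).

    Assumption: characters 0-3 are the PDB code and character 4 is the chain.
    """
    if len(domain_id) < 5:
        raise ValueError("identifier too short: {!r}".format(domain_id))
    return domain_id[:4], domain_id[4]


def keep_chain(pdb_lines, chain):
    """Drop ATOM records that belong to other chains.

    Every line that does not start with "ATOM" (headers, HETATM, TER, ...)
    is kept unchanged, whichever chain it refers to.
    """
    return [
        line for line in pdb_lines
        if not line.startswith("ATOM") or line[21:22] == chain
    ]


# Hypothetical identifier, for illustration only.
pdb_code, chain = split_domain_id("1abcA00")
```

Using the slice `line[21:22]` rather than `line[21]` means a truncated `ATOM` line is dropped instead of raising an `IndexError`.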

In [16]:
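import ast
# Keys are 0-3: dataFiles[0] holds level1.dat, ..., dataFiles[3] holds level4.dat
dataFiles = dict()
for i, dataFile in enumerate(["testproteins/level1.dat", "testproteins/level2.dat", "testproteins/level3.dat", "testproteins/level4.dat"]):
    with open(dataFile) as flines:
        # literal_eval parses Python literals without executing arbitrary code
        # (assumes each .dat file contains only a Python literal such as a list)
        dataFiles[i] = ast.literal_eval(flines.read())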
print(dataFiles)

In [None]:
import random
import quippy
import ase.io  # import the submodule explicitly; ase.io.read is used below
import requests

# Index 0 (level1.dat) is special and is skipped here.
# Indices 1-3 are level2-level4, which are lists of groups.
for i in [1,2,3]:
    inputProteins = dataFiles[i]
    # choose a sublist randomly
    inputProteinsSpecific = random.choice(inputProteins)
    print(inputProteinsSpecific)
    listOfAtoms = []
    for proteinId in inputProteinsSpecific:
        # Pull the pdb file
        pdb = proteinId[:4]
        chain = proteinId[4]
        print(pdb)
        url = "http://www.rcsb.org/pdb/files/{}.pdb".format(pdb)
        data = requests.get(url).text.split("\n")
        newData = []
        # Trim so it's single-chain
        for line in data:
            # Keep every non-ATOM line, plus ATOM lines for the wanted chain
            if line[:4] != "ATOM" or line[21] == chain:
                newData.append(line)
        with open("temp.pdb", 'w') as outflines:
            outflines.write("\n".join(newData))
        # Create an Atoms object
        protein = quippy.Atoms(ase.io.read("temp.pdb", format='proteindatabank'))
        listOfAtoms.append(protein)
    
    # Make the AtomsList
    listOfAtoms = quippy.AtomsList(listOfAtoms)
    # Write an xyz file
    listOfAtoms.write("temp.xyz")
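    # run glosim (the /root/temp.xyz path assumes the notebook runs from /root)
    # index i corresponds to level{i+1}.dat, so name the output to match
    !python /usr/local/src/glosim/glosim.py /root/temp.xyz --kernel match --np 4 --prefix level{i+1}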
    # save output?
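
The `# save output?` step is left open above. One possible approach is to copy whatever files the run produced into a per-run directory before the next level overwrites `temp.xyz`. The sketch below assumes, without having checked glosim's behaviour, that its output files are written to the working directory with names starting with the `--prefix` value. The function `collect_outputs` and the destination directory are hypothetical.

```python
import glob
import os
import shutil


def collect_outputs(prefix, dest_dir, src_dir="."):
    """Copy every regular file in src_dir whose name starts with prefix.

    Assumption: the output files of interest all begin with the prefix.
    Returns the list of destination paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    pattern = os.path.join(glob.escape(src_dir), glob.escape(prefix) + "*")
    copied = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):  # skip directories such as dest_dir itself
            copied.append(shutil.copy2(path, dest_dir))
    return copied
```

Inside the loop this could be called as `collect_outputs("level2", "results/level2")`, using whichever prefix was passed to glosim for that level.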
