### Creating the seed data set

Starting from complete trEMBL dataset <span style='background:#f7f3f7;padding:0.4em;border-radius:2px; border:solid bgrey 1px'>arwen:/mobi/group/NOX_CH/data/uniprot_trembl.fasta.gz</span> which is a symbolic link for `arwen:/mobi/group/databases/flat/uniprot_trembl_2019_02.fasta.gz`
 *  Split the dataset in small volumes
     * script: <span style="color:green">**split.py**</span>
     * Usage:
     Create and go to the `/mobi/group/NOX_GL/volumes` 
```console
    ROOT_DIR=/mobi/group/NOX_CH
    SCRIPT_DIR=/mobi/group/NOX_CH/nox-analysis/scripts
    $SCRIPT_DIR/split.py $ROOT_DIR/data/uniprot_trembl.fasta.gz
```

 * Run the HMMR and TMHMM annotations
    * script: <span style="color:green">**runHMMR_slurm.sh**</span>
    * Usage:  
  
```console
    mkdir $ROOT_DIR/seedSet
    mkdir $ROOT_DIR/seedSet/work
    $SCRIPT_DIR/runHMMR_slurm.sh $ROOT_DIR/volumes $ROOT_DIR/seedSet/work $ROOT_DIR/data/profiles
```

 * Use this notebook to parse the _work_ folder (see **Parsing all data files** section)

    * Filter-out non eukaryotic entries and dump the corresponding fasta sequence in folder <span style='background:#f7f3f7;padding:0.4em;border-radius:2px; border:solid bgrey 1px'>/mobi/group/NOX_CH/seedSet/NOX_noEukaryota</span> (create directory before)
         


 * Preparing folders/sbatch scripts for pairwise N&W across the set of __NOX_noEukaryota__ fasta sequences
    * script: <span style="color:green">**runEMBOSS_slurm.sh**</span>
    * Usage:
```console
mkdir $ROOT_DIR/seedSet/NOX_noEukaryota_needlePairwise_work
$SCRIPT_DIR/runEMBOSS_slurm.sh $ROOT_DIR/seedSet/NOX_noEukaryota NOX_noEukaryota $ROOT_DIR/seedSet/NOX_noEukaryota_needlePairwise_work
```

 * Concatenate all fasta sequences in a single file, clusters redundant sequences
```console
     cat $ROOT_DIR/seedSet/NOX_noEukaryota/*.fasta > $ROOT_DIR/seedSet/NOX_noEukaryota.mfasta
    mmseqs createdb /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota.mfasta /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb 
    mmseqs cluster /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100 /Volumes/arwen$ROOT_DIR/seedSet/tmp_NOX_noEukaryota_clust100 -c 1 
    mmseqs createtsv /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb  /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100  /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100.tsv --full-header
    mmseqs result2repseq /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100 /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100_seq 
    mmseqs result2flat /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_mmseqsdb /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100_seq  /Volumes/arwen$ROOT_DIR/seedSet/NOX_noEukaryota_clust100.fasta --use-fasta-header
 ```
 
* Enrich the datacontainer with redundant information, see **Add redundant informations** section   


* Perform full Pfam annotation

```console
     sbatch $SCRIPT_DIR/runHMMSCAN.sbatch /mobi/group/databases/hmmr/Pfam-A.hmm $ROOT_DIR/seedSet/NOX_noEukaryota_clust100.fasta $ROOT_DIR/seedSet/NOX_noEukaryota_clust100_hmmscan.out
```


 * Enrich the datacontainer with these new annotation, see **Read in additional PFAM annotations // Erase previous** section


 * Use the [Taxonomy notebook](http://localhost:8888/notebooks/NOX/Taxonomy.ipynb) to output a hierarchal tree
     * link the output json file as $latest.json$


 * Start adhoc http server
   
   Go to `~/work/projects/NOX`
```console
node index.js
``` 

* Visualize w/ D3 at `localhost:9615`
 
### Creating the extended data set


* Perform a psiblast on fasta files present in <span style='background:#f7f3f7;padding:0.4em;border-radius:2px; border:solid bgrey 1px'>arwen:/mobi/group/NOX_GL/seedSet/NOX_noEukaryota</span>

    * Create the `extendedSet` folder

    * script: <span style="color:green">**runPsiBlast_slurm.sh**</span>
    * Usage:
```console
$SCRIPT_DIR/runPsiBlast_slurm.sh $ROOT_DIR/seedSet/NOX_noEukaryota $ROOT_DIR/extendedSet/psiblastWork
```

* Browse all the psiblast workfolder and eliminate strictly identical proteins
    * Go to `$ROOT/extendedSet`
    * script:<span style="color:green">**makePsiBlastNR.py**</span>
    * Usage:
```console
python $SCRIPT_DIR/makePsiBlastNR.py ./psiblastWork ./NOX_noEukaryota_PB_NR.fasta > makePsiBlastNR.log
```
* Perform a full PFAM annotation
```console
hmmscan NOX_noEukaryota_PB_NR.fasta /mobi/group/databases/hmmr/Pfam-A.hmm > NOX_noEukaryota_PB_NR_hmmscan.out
```

In [2]:
%matplotlib inline
import sys, os
sys.path.append("/Users/chilpert/Work/pyproteinsExt/src")
sys.path.append("/Users/chilpert/Work/pyproteins/src")
%load_ext autoreload

In [8]:
import gzip, io
import urllib.request

def mFastaParseZip(inputFile):
    data = None
    with io.TextIOWrapper(gzip.open(inputFile, 'r')) as f:
        data = mFastaParseStream(f)
    return data

def mFastaParseUrl(url):
    fp = urllib.request.urlopen(url)
    mybytes = fp.read()
    #mFastaParseStream(fp)
    mystr = mybytes.decode("utf8")
    fp.close()
    data = mFastaParseStream(mystr.split('\n'))
    
#    print(mystr)
    return data

def mFastaParseStream(stream):
    
    data = {}    
    headPtr = ''
    for line in stream:
        #print (line)
        if line == '':
            continue
        s = line.replace('\n','')
        if s.startswith('>'):
            headPtr = s.split()[0][1:]
            
            if headPtr in data:
                raise ValueError('Smtg wrong')
            data[headPtr] = {'header': s, 'sequence' : '' }
            
            continue
        data[headPtr]['sequence'] += s
    return data

#mFastaParseUrl('http://www.uniprot.org/uniprot/S4Z6V5.fasta')
#data = mFastaParse('/Volumes/arwen/home/ygestin/prositetask-backup/alignTrembl/bibl/Trembl_47/Trembl_47.fasta.gz')
#test=None
#with open('/Volumes/arwen/mobi/group/NOX_GL/work/uniprot_trembl_v11/hmmsearch.fasta', 'r') as f:
#    test = mFastaParseStream(f)

In [9]:
import re

def num(s):
    try:
        return int(s)
    except ValueError:
        return float(s)
    
    
reTMH = re.compile('^(\# ){0,1}([\S]+)[\s]+([\S].*)[\s]+([\d\.]+)$')
def loadTMHMM(lDir):
    
    fastaContainer = None
    with open( lDir+ '/hmmsearch.fasta', 'r') as f:
        fastaContainer = mFastaParseStream(f)
    
    file = lDir+ '/tmhmm.out'
    data = {}
    with open(file, 'r') as f:
        for l in f:
            m = reTMH.search(l)
            if m:
                _id = m.groups()[1] 
                if _id not in data:
                    if _id not in fastaContainer:
                        raise ValueError("Misisng fasta for tmhmm prediction")
                    data[_id] = {'hCount':0 ,
                                'helix':[], 'fasta' : fastaContainer[_id],
                                'mask': '-' * len(fastaContainer[_id]['sequence'])
                                }
                
                if not m.groups()[2].startswith('TMHMM2'):
                    data[_id][re.sub('[\s]*:[\s]*$', '',m.groups()[2])] = num(m.groups()[3])
                    continue
                
                
                m2 = m.groups()[2].split('\t')
                if not m2:
                    raise ValueError('could not parse helix line')
                helixCoor =  {'volume' : m2[1], 
                              'start'  : num(m2[2].replace(' ', '')),
                              'stop'   : num(m.groups()[3]) 
                            }
                data[_id]['helix'].append(helixCoor)
                
                
                data[_id]['helix'].append(helixCoor)
                #print (data[_id]['mask']) 
                l_1 = len(data[_id]['mask'])
                buf = list(data[_id]['mask'])
                symbol = None
                if helixCoor['volume'] == 'TMhelix':
                    data[_id]['hCount'] += 1
                    #symbol = 'H'
                    symbol = str(data[_id]['hCount']) if data[_id]['hCount'] < 10 else str(data[_id]['hCount'])[-1]
                elif helixCoor['volume'] == 'inside':
                    symbol = 'i'
                elif helixCoor['volume'] == 'outside':
                    symbol = 'e'
                else :
                    raise ValueError("unknown symbol " + helixCoor['volume'])

                i=helixCoor['start'] - 1
                j=helixCoor['stop']
                #print(i,j,len(buf))
                toAdd = symbol * (j - i)
                buf[i:j] =  list(toAdd)#helixCoor['stop'] - helixCoor['start'] + 1
                data[_id]['mask'] = ''.join(buf)
                if len(data[_id]['mask']) != l_1:
                    print("ERROR ", _id, l_1, len(data[_id]['mask']), '>>', i, j, '<<')
                    print (len(buf[i:j]), len(list(toAdd)), symbol, '-->', toAdd )
                #print(data[_id]['mask'])
    
    #        Hcluster(data)
    return data
#d = loadTMHMM('/Volumes/arwen/home/ygestin/prositetask-backup/alignTrembl/bibl/Trembl_47')
#d = loadTMHMM('/Volumes/arwen/mobi/group/NOX_GL/work_sample/uniprot_trembl_v11')
#d

In [10]:
def HIS_clust(data, min=2, max=7):
    for _id in data:
        data[_id]['Htest'] = {'status' : False, 'data' : [] }

        #Discard unwanted numbe of helices
        if data[_id]['hCount'] < min or data[_id]['hCount'] > max:
            #print('Wrong helices number ', _id, data[_id]['hCount'])
            continue
        
        H_status = []
        iMax = len(data[_id]['mask'])
        # internal error check
        if len(data[_id]['mask']) != len(data[_id]['fasta']['sequence']) :
            print( len(data[_id]['mask']), len(data[_id]['fasta']['sequence']) )
            print(_id, data[_id])
            raise ValueError("")
        # Select only residues that are Histidine within TMH
        for i in range(0, iMax):
            if data[_id]['mask'][i] == "i" or  data[_id]['mask'][i] == "e":
                continue
            if not data[_id]['fasta']['sequence'][i] == "H":
                continue
            H_status.append( [i, data[_id]['mask'][i], False] )
        # Pairwise comparaison between Histidine of the same helix, marking pairs separated by 12 to 14 residues
        for i in range (0, len(H_status) - 1):
            for j in range (i + 1, len(H_status)):
                if H_status[i][1] != H_status[j][1]:
                    continue
                d = H_status[i][0] - H_status[j][0]
                if d >= 12 or d <= 14:
                    H_status[i][2] = True
                    H_status[j][2] = True
        
        #print(H_status)
        # Only keep marked histidine
        H_status = [ x for x in H_status if x[2] ]
        # Create a dicitinary where keys are Helices numbers
        H_groups = {}
        for x in H_status:
            if not x[2]:
                continue
            if x[1] not in H_groups:
                H_groups[x[1]]=[]
            H_groups[x[1]].append(x)
        
        # The test is passed if at least two distinct helices feature at least one correctly spaced histidine pair
        # ie : if the helice dictionary has more than 1 entrie
        #print(H_status)
        #print("-->", H_groups)
        HisTestBool = True if len(H_groups) > 1 else False
        
        data[_id]['Htest']['status'] = HisTestBool
        data[_id]['Htest']['data'] = H_groups
    return data

#m = HIS_clust(d)
#print(len([ m[x] for x in m if m[x]['Htest']['status'] ]), len(m))

# Parsing all data files 

### Parsing HMMR data
NB: There are stdout of 3 consecutive hmmr calls

All in a single **data** container

In [117]:
import pyproteinsExt.hmmrContainerFactory as hm
import glob
dataDir=glob.glob('/Volumes/arwen/mobi/group/NOX_CH/seedSet/work/uniprot_trembl_v*')

data = hm.parse(inputFile=dataDir[0] + '/hmmsearch.out')
i=0

for iDir in dataDir[1:]:
    #print(iDir)
    data += hm.parse(inputFile=iDir + '/hmmsearch.out')
    i += 1
    #if i == 1:
     #   break   

   [No individual domains that satisfy reporting thresholds (although complete target did)]




## Loading TMHMM data

In [12]:
dataTMHMM = {}
for lDir in dataDir:
    d = loadTMHMM(lDir)
    if set( dataTMHMM.keys() ) & set( d.keys() ):
        print('doublons')
    dataTMHMM.update(d)

dataTMHMM = HIS_clust(dataTMHMM)

##### Transform a PFAM domain indexed data structure in a protein indexed data structure
Then filter out the protein that feature the 3 domains


In [126]:
T = data.T()
D = {}
fad=0
nad=0
ferric=0
for protein in T:
    if len(T[protein]) == 3:
           D[protein] = T[protein]
    for dom in T[protein]: 
        if dom == "PF08022_full":
            fad+=1
        elif dom == "PF01794_full": 
            ferric+=1
        elif dom == "PF08030_full": 
            nad+=1
        else: 
            print("OOOO")
        #if dom == "PF08022_full":
            
print('Number of proteins entries featuring FAD',fad)
print('Number of proteins entries featuring NAD',nad)
print('Number of proteins entries featuring Ferric reductase',ferric)

Number of proteins entries featuring FAD 77203
Number of proteins entries featuring NAD 121386
Number of proteins entries featuring Ferric reductase 59209


3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
2
3
3
2
3
3
3
2
2
2
2
3
1
2
3
2
1
2
2
3
2
2
2
1
3
3
3
2
1
2
1
2
2
2
2
2
2
2
3
2
1
2
2
2
2
2
2
3
2
1
2
2
2
1
2
2
2
2
1
2
2
1
2
2
1
2
2
2
1
2
2
2
2
2
1
2
1
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
2
1
2
1
1
2
2
2
2
1
2
2
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
1
1
2
1
2
2
1
2
1
1
2
2
2
1
1
2
1
2
2
2
2
2
2
1
1
1
1
2
1
2
1
1
1
1
1
2
2
2
1
1
1
2
1
2
2
2
2
2
2
1
2
1
2
2
1
1
2
2
2
3
2
2
2
2
1
1
1
2
2
1
1
1
1
1
2
2
2
1
2
2
1
2
1
1
1
2
1
1
2
1
2
2
1
2
2
2
1
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
2
2
1
1
1
1
2
1
2
2
1
1
2
1
1
1
1
2
2
2
2
1
1
1
1
1
2
2
2
2
2
2
1
2
1
2
1
1
2
2
1
2
2
1
2
1
1
2
1
1
1
2
1
1
2
1
1
1
2
1
2
2
1
2
1
2
2
2
2
1
2
2
1
1
1
2
2
2
1
1
2
2
2
2
2
1
2
1
2
2
1
1
1
2
1
1
1
2
1
1
2
1
2
2
2
2
1
2
1
1
2
2
2
1
2
2
2
2
2
2
1
1
1
2
2
2
1
2
2
1
2
2
1
2
2
1
1
1
1
1
2
2
2
2
2
1
2
1
1
1
2
1
2
2
1
1
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
2
2
3
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
1
2
2
1
2
2
1
1
1
1
1
1
1
1
2
2
1
1
1
2
1
1
2
2
2
2
1
2
2
1
1
1
2
2
1
2
2
2
1
2
1
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
2
1
1
1
1
2
2
2
2
2
2
2
2
2
1
2
1
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
1
1
2
1
2
2
2
1
2
2
2
2
2
2
1
1
2
2
1
1
2
1
2
2
1
1
2
2
1
1
1
1
1
1
2
2
2
2
2
2


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
2
3
3
3
3
2
2
3
3
2
2
2
2
2
3
3
3
2
2
2
2
1
1
2
2
2
2
2
1
3
2
2
2
2
1
1
2
2
2
2
2
2
2
2
3
2
2
2
1
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
1
2
2
1
1
2
2
1
2
2
2
1
1
2
2
1
2
1
2
3
2
1
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
1
2
2
1
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
1
2
2
2
2
2
2
1
2
1
1
2
2
2
2
2
2
2
2
2
1
2
2
2
1
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
1
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
1
1
2
2
2
2
2
2
2
2
2
2
1
1
1
1
1
1
2
1
2
2
1
2
2
2
2
2
2
2
2
1
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
2
1
2
2
1
2
1
2
2
2
2
1
1
2
2
1
2
2
2
1
1
1
2
1
2
2
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
2
2
2
1
1
2
1
2
1
1
1
1
1
1
1
2
2
2
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
2
2
2
2
2
2
2
3
2
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
2
2
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
2
3
3
3
3
3
3
3
3
3
3
2
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
2
2
2
2
2
3
2
2
2
2
3
3
2
3
2
2
3
3
2
2
2
3
3
3
2
3
3
3
2
2
3
3
3
2
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
2
3
2
3
2
3
2
3
3
2
1
3
2
1
2
2
3
3
2
3
2
2
2
3
2
3
3
3
3
3
1
1
3
1
1
1
1
1
1
2
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
2
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
3
3
3
3
2
3
3
1
2
3
3
3
3
3
3
1
2
2
2
3
2
1
1
1
1
1
1
2
1
1
1
1
1
2
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3


3
2
3
3
3
3
3
3
3
2
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
3
2
3
3
1
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
1
3
3
3
3
2
2
3
1
1
1
2
2
2
2
2
3
2
2
2
3
2
2
2
2
2
2
1
2
2
2
3
2
2
2
1
2
2
2
3
2
2
2
2
2
2
1
3
3
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
1
2
2
2
2
2
1
2
1
2
1
1
2
2
1
1
3
2
2
1
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
1
2
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
2
2
2
1
1
1
2
2
1
2
2
2
2
2
2
2
3
2
2
2
2
2
1
2
1
2
2
1
2
1
2
2
2
2
2
2
2
2
1
2
1
1
2
2
2
1
2
2
2
1
2
1
2
2
2
2
2
1
1
1
2
2
2
1
1
2
2
2
1
2
2
2
1
2
1
1
2
2
2
1
1
1
1
1
2
2
2
2
2
2
2
1
1
1
1
2
1
2
2
1
2
2
2
2
2
2
2
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
1
2
3
3
2
3
3
3
2
1
1
3
2
2
2
2
2
2
2
2
2
2
2
1
1
2
1
2
2
2
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
1
2
1
2
1
2
1
2
2


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
2
3
2
2
3
3
3
3
3
2
2
2
2
2
1
2
2
1
2
2
2
1
1
1
1
1
2
2
1
1
1
1
1
2
1
2
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
2
3
2
1
2
3
2
3
1
3
2
1
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
2
2
2
1
2
2
2
2
2
1
2
2
2
2
2
1
2
2
2
2
2
2
1
1
2
1
2
2
2
2
1
2
1
2
2
2
2
2
2
2
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
1
2
1
2
1
2
2
1
2
2
2
1
2
1
2
1
2
2
2
2
2
1
1
2
2
1
2
1
2
2
1
1
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
1
2
2
2
2
2
2
2
2
2
1
2
1
2
1
2
2
2
2
2


2
2
2
1
1
2
2
2
2
2
2
2
2
1
1
2
1
2
2
2
1
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
2
2
2
2
2
2
2
1
1
1
1
2
1
2
1
2
2
2
2
1
2
1
2
2
2
1
2
2
2
2
2
1
2
2
2
2
2
2
2
2
1
1
1
1
2
2
2
2
2
1
2
2
2
2
2
1
2
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
1
1
1
2
2
1
2
2
2
2
1
2
2
2
2
2
2
1
2
1
2
1
2
2
2
2
2
1
2
2
1
2
2
2
2
2
1
1
2
1
1
2
1
2
2
2
1
2
2
2
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
2
2
2
2
1
1
2
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
1
2
1
2
2
2
2
2
2
1
1
2
2
2
2
2
1
1
2
1
1
1
1
2
2
2
2
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
2
2
1
1
2
1
2
2
2
1
2
1
1
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
1
2
1
2
2
2
2
2
2
2
2
2
1
1
1
1
2
2
1
1
1
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
2
2
2
1
1
2
1
1
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
1
1
1
2
2
1
2
1
1
1
1
1
2
2
1
1
1
1
1
2
2
2
2
2
1
2
2
1
2
2
2
2
2
1
1
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
2
2
2
1
2
2
2
2
2
2
1
2
2
2
2
2
1
2
2
1
2
1
2
2
2
1
1
1
2
2
2
2
2
2
1
1
2
2
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
2
2
1
1
2
2
2
2
2
2
2
3
2
2
2
2
2
1
1
1
2
2
2
1
3
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
1
1
2
1
2
1
2
2
2
1
1
1
2
2
1
2
2
2
1
1
2
1
1
1
2
2
1
2
1
2
2
1
1
2
1
1
2
1
2
1
2
2
1
2
1
1
1
2
1
2
1
1
1
3
1
1
2
1
1
1
1
2
1
2
2
1
1
2
2
1
1
1
2
1
2
2
2
1
1
1
1
1
1
2
2
2
1
1
1
2
1
2
2
2
2
1


2
2
1
1
2
1
1
2
2
2
2
2
1
1
2
2
2
2
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
2
3
3
2
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
2
3
3
2
3
3
2
3
2
2
3
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
3
1
1
1
1
1
2
1
2
2
2
3
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
1
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
1
2
2
2
2
2
2
2
1
2


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
1
3
3
3
2
3
3
3
3
3
3
3
3
2
2
2
2
3
2
1
1
2
2
1
2
2
2
2
1
1
1
2
3
3
2
2
2
1
1
1
2
1
2
2
2
1
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
1
2
1
1
2
1
2
2
2
2
1
1
1
2
2
2
1
2
2
2
2
2
1
2
2
2
2
1
1
2
2
2
2
2
2
2
1
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
1
2
2
2
2
1
2
2
2
1
1
2
1
2
2
2
2
2
1
2
1
2
1
2
2
2
2
1
2
2
2
1
2
2
1
1
1
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
2
1
1
2
2
2
2
1
2
1
2
2
2
2
2
2
2
1
2
2
2
2
1
2
2
1
2
2
2
2
2
2
1
1
1
1


3
1
1
2
2
2
2
2
2
2
2
2
2
2
1
2
2
3
2
2
2
1
2
3
2
1
2
1
2
1
2
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
2
2
1
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3


2
1
2
1
1
1
1
1
2
1
2
2
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
2
1
1
2
1
1
1
2
1
1
1
1
1
1
1
2
2
1
1
1
2
1
1
1
1
1
1
1
1
1
2
2
1
2
1
1
1
1
1
1
1
1
2
1
2
1
2
2
1
2
2
1
2
2
1
2
1
1
2
2
2
2
2
1
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
2
3
3
3
3
3
1
2
3
3
3
3
3
3
3
2
1
2
3
2
2
2
1
2
3
1
2
3
2
1
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
2
1
1
2
2
1
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
1
2
1
2
2
1
2
2
2
2
2
2
1
2
1
2
1
2
2
2
2
1
2
1
1
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
1
2
2
2
2
1
2
1
1
1
1
2
2
2
2
2
2
2
2
1
2
1
2
1
2
1
1
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
1
2
1
1
1
2
2
2
2
2
2
2
2
2
2
1
1
2
2
1
1
1
1
1
2
1
1
2
1
1
2
1
1
1
1
1
2
2
2
2
2
1
1
2
2
2
2
1
2
1
2
2
1
2
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
2
2
2
2
2
3
1
2
1
2
2
2
2
2
2
2
2
2
1
1
1
2
1
1
1
1
2
1
1
2
2
1
2
2
2
2
1
1
2
2
2
1
2
1
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
2
2
2
2
1
1
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
2
2
2
1
2
1
2
2
1
1
2
2
1
1
2
2
2
2
2
2
2
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
1
2
2
2
2
2
2
2
2
1
2
2
2
1
1
2
2
1
2
2
1
2
2
2
2
2
2
2
1
1
2
2
1
1
2
2
2
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
2
3
3
2
2
2
3
3
3
3
3
3
3
2
3
2
2
2
3
3
3
3
3
3
2
3
2
2
3
3
3
3
3
3
3
2
3
2
2
3
3
3
3
2
2
3
2
3
3
2
2
2
2
2
2
3
2
3
3
2
3
3
3
3
2
2
2
2
3
2
3
2
2
2
3
3
2
3
3
2
2
3
1
2
2
2
2
2
2
3
2
3
1
2
1
2
1
2
2
1
2
2
1
2
2
3
3
2
2
3
1
2
2
2
1
2
2
2
2
3
2
2
1
2
2
2
2
2
1
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
2
2
2
2
2
2
2
2
2
2
1
2
1
1
1
1
1
1
2
2
1
2
2
1
1
1
1
1
1
1
1
1
2
1
1
1
1
2
1
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
2
2
1
1
2
2
2
1
2
2
1
1
1
2
2
2
2
1
2
1
2
2
2
2
1
1
2
2
2
2
2
1
2
2
2
1
1
1
2
1
2
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
2
3
3
3
3
3
3
2
2
2
3
3
2
2
2
2


2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
2
2
1
2
2
2
1
2
2
1
2
2
1
2
1
2
2
2
1
2
2
1
1
2
2
2
2
2
2
1
2
1
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
1
1
1
1
1
2
2
2
1
1
2
2
2
2
1
2
2
2
2
2
2
2
1
1
2
2
2
2
1
2
2
1
2
1
2
2
2
2
2
2
1
2
1
1
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
1
1
2
2
2
2
2
2
2
2
1
2
2
1
2
1
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
1
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
1
2
1
2
2
1
1
2
2
1
2
1
2
2
2
1
2
2
1
2
2
1
1
2
1
2
2
1
1
2
2
2
1
2
1
2
2
2
2
1
2
2
2
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
1
2
1
2
1
1
1
2
1
2
2
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
2
1
2
1
2
2
2
1
2
2
2
1
1
2
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
2
3
2
2
3
3
3
3
3
3
2
3
3
2
3
3
2
2
3
3
1
2
2
2
1
1
2
2
2
2
1
2
2
2
2
2
1
1
1
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
2
1
2
2
1
2
2
1
2
2
3
2
2
2
1
2
2
2
1
2
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
2
3
3
2
3
3
3
3
3
3
3
3
3
2
2
2
1
2
2
2
2
3
1
2
2
1
2
2
3
2
2
2
2
2
2
2
1
2
2
2
2
2
1
1
2
2
2
2
2
1
2
2
2
1
1
2
2
2
1
2
2
2
1
1
1
2
2
2
2
2
2
1
2
2
1
2
2
2
1
2
1
1
1
2
2
2
2
1
1
2
2
1
2
2
2
1
2
2
2
2
1
2
2
2
2
2
1
1
1
2
2
2
2
2
2
2
1
1
2
2
2
2
1
1
2
1
2
1
2
2
2
2
1
1
2
2
1
1
2
1
1
2
2
1
1
2
2
1
2
2
2
1
2
2
2
2
2
1
2
2
2
1
2
2
2
1
2
2
2
1
1
2
2
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
2
1
1
1
1
1
1
1
2
1
1
1
1
2
2
2
2
1
2
1
1
1
2
2
2
2
1
1
2
2
1
2
1
2
2
1
2
2
1
1
2
2
2
1
2
1
2
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
2
3
2
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
3
2
3
3
3
3
2
1
2
2
2
1
2
1
2
2
2
1
2
2
2
2
2
1
1
1
2
2
1
1
2
1
1
2
2
2
1
2
2
2
2
2
2
2
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
3
3
2
3
2
3
3
3
3
3
3
3
2
3
3
3
3
2
3
3
2
3
1
3
2
3
2
2
2
2
1
2
2
3
2
3
2
1
2
2
1
2
2
2
2
2
1
1
2
1
1
2
2
1
2
2
1
3
2
2
2
1
1
2
2
2
2
2
3
1
2
2
2
1
2
2
2
2
2
1
2
2
2
1
2
1
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
1
1
2
2
2
2
1
1
2
2
2
1
2
2
2
1
1
2
2
1
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
1
1
2
2
1
2
2
2
2
2
2
1
2
2
1
2
2
2
2
1
1
2
1
2
2
2
2
2
2
2
1
2
2
2
2
1
2
1
2
1
2
2
1
2
2
2
1
2
1
2
1
2
1
1
2
2
2
1
1
2
1
2
2
2
1
2
2
1
2
1
2
1
2
1
2
2
2
2
1
2
2
2
1
2
2
2
1
2
1
1
1
2
2
2
2
2
2
2
1
2
1
1
1
2
1
1
2
2
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
2
3
3
2
3
2
3
2
3
1
1
2
1
2
1
2
2
2
2
2
1
3
2
2
2
2
2
1
2
2
2
1
1
2
2
2
2
2
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
2
1
1
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
1
2
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
2
2
1
1
2
2
2
2
2
2
2
1
2
1
1
2
2
2
2
2
2
2
1
2
2
1
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
1
1
2
2
2
2
1
2
2
2
2
2
2
1
2
2
2
2
2
1
2
2
2
2
2
2
2
1
1
1
1
1
2
2
2
1
2
1
1
1
2
2
2
2
2
1
2
1
1
1
1
2
2
2
1
2
2
2
2
2
2
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
2
2
2
2
1
2
2
2
2
1
2
2
1
2
2
2
2
2
2
2
2
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
1
2
1
1
2
2
1
2
2
2
2
2
2
2
2
2
2
1
1
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
1
1
2
2
2
1
2
2
2
2
2
2
1
2
2
1
2
1
2
2
2
2
1
1
1
1
2
2
2
1
2
2
1
1
2
1
2
1
1
2
2
2
2
1
1
2
2
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
1
1
2
2
1
2
2
1
2
2
2
1
2
2
2
2
2
2
1
1
2
2
1
1
1
1
1
1
2
2
2
2
1
2
2
1
2
2
2
1
2
2
2
2
1
2
2
1
2
2
1
2
2
1
2
1
2
1
2
1
1
1
1
2
2
2
1
2
2
2
2
1
2
1
2
2
2
1
2
1
1
2
2
2
1
2
2
1
1
2
2
2
2
1
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
2
1
2
2
1
2
2
2
1
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
1
2
1
1
2
2
2
2
1
2
2
2
2
2
1
1
2
1
2
1
2
2
1
2
2
2
2
2
1
2
1
2
2
2
2
2
2
2
2
1
2
2
1
2
2
2
2
2
2
2
2
1
2
2
2
1
2
2
2
1
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


3
3
3
1
1
1
3
2
2
1
2
1
2
2
2
2
2
1
1
1
2
1
1
2
1
2
2
1
1
1
2
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
1
1
2
2
2
2
2
2
1
2
2
1
2
2
2
2
1
2
1
2
2
2
2
2
2
2
1
2
1
2
1
2
2
2
2
2
2
1
2
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
1
1
2
2
1
2
2
1
1
1
2
2
1
2
1
2
2
1
1
2
2
2
2
2
2
1
1
1
1
1
2
2
1
1
2
2
2
2
1
2
2
2
1
1
2
1
2
2
2
2
2
2
1
2
1
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
3
3
3
2
2
2
3
3
3
2
2
2
2
3
3
2
3
2
2
2
2
1
2
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
3
2
1
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
1
2
2
2
2
1
2
2
2
2
2
2
1
2
2
1
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
2
1
2
2
2
2
1
2
1
1
1
1
1
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
2
3
2
1
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
1
2
1
2
2
1
1
2
1
2
2
2
2
2
2
1
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
2
2
1
2
1
2
2
2
2
1
2
2
1
2
1
1
2
2
2
2
2
1
2
2
1
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
1
2
2
1
2
2
1
2
2
1
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
1
1
2
2
1
1
1
1
1
1
1


2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
2
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
2
3
2
2
2
3
2
2
3
2
2
1
2
2
2
1
2
2
2
2
3
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
2
1
2
1
1
2
2
2
3
2
2
2
2
2
2
1
2
2
1
1
2
2
2
1
1
1
2
2
1
1
2
2


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
3
3
3
3
3
3
3
3
2
3
3
3
3
3
3
1
3
3
3
3
3
3
3
2
1
2
1
2
2
2
2
2
2
2
3
2
2
1
1
2
2
1
2
2
1
2
2
1
2
2
2
2
2
2
2
1
1
1
1
2
1
2
2
2
1
2
2
2
2
2
2
2
2
1
2
2
1
2
1
1
1
2
2
2
1
1
1
1
1
1
2
2
1
1
2
2
2
2
2
2
2
1
2
2
1
2
2
1
2
2
2
2
1
2
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
1
2
1
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
1
1
2
2
2
1
2
1
1
1
2
2
2
2
1
2
2
1
1
2
2
2
2
2
2
2
2
2
1
2
2
1
2
1
1
2
2
1
1
1
1
2
2
1
2
2
2
2
2
2
2
2
1
2
1
1
1
2
1
2
2
1
2
1
2
2
2
1
2
2
2
1
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


2
2
2
2
1
1
1
2
2
1
2
2
2
1
2
2
2
2
1
2
1
2
2
2
2
1
1
1
2
1
1
1
1
1
2
1
2
1
1
2
2
1
1
1
2
1
1
2
1
2
2
2
1
1
1
2
2
1
2
2
2
1
2
2
2
2
2
2
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
3
3
3
3
3
2
3
3
3
2
3
3
1
2
3
3
3
2
1
2
1
2
2
2
2
2
2
2
2


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


## Merge TMHMM & HMMR data

  * Proteins with the 3 domain types
  * Their TMHMM status


In [14]:
merged = {}
for _id in D:
    if _id not in dataTMHMM:
        print('Missing protein ID' + _id)
    if not dataTMHMM[_id]['Htest']['status']:
        continue
    merged[_id] = {
        'hmmr' : D[_id],
        'tmhmm' : dataTMHMM[_id]
    }
    
print('Number of protein entries featuring FAD,NAD and Ferric transferase domains', len(D))
print('Number of protein featuring 2 to 7 TMH and 2 bi-histine', len(dataTMHMM))
print('Size of their intersection', len(merged))

Number of protein entries featuring FAD,NAD and Ferric transferase domains 18020
Number of protein featuring 2 to 7 TMH and 2 bi-histine 178540
Size of their intersection 5972


#### Inspect NCBI Taxonomy

In [15]:
import pyproteinsExt.ontology
taxonTree = pyproteinsExt.ontology.Ontology(file='/Users/chilpert/flatFiles/ncbitaxon.owl')

#### Extract TaxonID

In [16]:
def getTaxID(datum):
    reTaxID = re.compile('OX=([\d]+)')
    m = reTaxID.search(datum['tmhmm']['fasta']['header'])
    if not m:
        raise ValueError('Cant parse taxid from', datum['tmhmm']['fasta']['header'])
    datum['taxid'] = m.groups()[0]
    
for _id in merged:
    getTaxID(merged[_id])

###### Flag Non Eukaryota phylum members

In [17]:
cnt = 0
cnt_not_found=0
u = 0
for _id in merged:
    u += 1
    taxid=merged[_id]['taxid']
    n = taxonTree.onto.search(iri='http://purl.obolibrary.org/obo/NCBITaxon_' + taxid)
    if not n:
        cnt_not_found+=1
        continue

    bool=True
    for t in taxonTree._getLineage(n[0]):
        if not t.label:
            continue
        if t.label[0] == 'Eukaryota':
            bool=False
            break
    if bool:
        cnt += 1
    merged[_id]['isNoEukaryota'] = bool


print("Total number of bacterial (and archae) sequences", cnt, u)
print("Number of taxo node not found",cnt_not_found)

Total number of bacterial (and archae) sequences 832 5972
Number of taxo node not found 25


In [None]:
#### Cull for prokaryotic proteins (original 996)

#### Use proteins as seeds for blast ()

#### --> Tree reconstruction

#### Additional PFAM annotation

#### Sequence clustering

#### Profile génétique



###### Save non Eukaryota sequences in given directory

In [18]:
import re
saveDir="/Volumes/arwen/mobi/group/NOX_CH/seedSet/NOX_noEukaryota"
def mFastaSplitDump(data, saveDir, fileTag='default' ,distinct=True):
    c = 1
    f = None
    if not distinct:
        f = open(saveDir + '/'+ fileTag + '_all.fasta', 'w')
        
    for _id in data:
        if distinct:
            f = open(saveDir + '/'+ fileTag + '_' + str(c) + '.fasta', 'w')
        c += 1
        f.write(data[_id]['tmhmm']['fasta']['header'])
        f.write(re.sub("(.{81})", "\\1\n", data[_id]['tmhmm']['fasta']['sequence'], 0, re.DOTALL))
        if distinct:
            f.close()
    if not distinct:    
        f.close()

d = {}
for k in merged:
    if not 'isNoEukaryota' in merged[k]:
        continue
    if merged[k]['isNoEukaryota']:
        d[k] = merged[k]
merged=d         
mFastaSplitDump(d, saveDir, 'NOX_noEukaryota')

##### De/Serialize the data structure


In [5]:
import pickle, time
import time

def save(data, tag=None):
    saveDir="/Volumes/arwen/mobi/group/NOX_CH/pickle_saved"
    timestr = time.strftime("%Y%m%d-%H%M%S")
    fTag = "NOX_annotation_" + tag + "_" if tag else "NOX_annotation_"
    fSerialDump = fTag + timestr + ".pickle"
    with open(saveDir + '/' + fSerialDump, 'wb') as f:
        pickle.dump(data, f)
    print('data structure saved to', saveDir + '/' + fSerialDump)

def load(fileName):
    saveDir="/Volumes/arwen/mobi/group/NOX_CH/pickle_saved"
    d = pickle.load( open(saveDir + "/" + fileName, "rb" ) )
    print("restore a annotated container of ", len(d), "elements")
    return d

In [38]:
save(merged)

data structure saved to /Volumes/arwen/mobi/group/NOX_CH/pickle_saved/NOX_annotation_20190411-174237.pickle


### Add redundant information


In [5]:
merged_restore = load('NOX_annotation_20190411-152144.pickle')

restore a annotated container of  832 elements


* Add clusters information 

In [24]:
f=open("/Volumes/arwen/mobi/group/NOX_CH/seedSet/NOX_noEukaryota_clust100.tsv","r")
new_data={}
for l in f : 
    l_split=l.rstrip().split("\t")
    representative=l_split[0].strip('"').split(" ")[0]
    seq=l_split[1].strip('"').split(" ")[0]
    if representative not in new_data: 
        new_data[representative]=merged_restore[representative]
        new_data[representative]['clusters']=set()
    if representative != seq : 
        new_data[representative]['clusters'].add(seq)
f.close()

* Save new datacontainer 

In [27]:
save(new_data,'withClusters')

data structure saved to /Volumes/arwen/mobi/group/NOX_CH/pickle_saved/NOX_annotation_withClusters_20190426-105523.pickle


### Read in additional PFAM annotations // Erase previous
  1. restore annotated data structure
  1. import a complete PFAM scan of "isNoEukaryota" entries
  2. replace the 'hmmr' field w/ this one
  3. pickle it

In [28]:
merged_restore = load('NOX_annotation_withClusters_20190426-105523.pickle')

restore a annotated container of  377 elements


In [11]:
%autoreload 2
import pyproteinsExt.hmmrContainerFactory as hm
fileName="/Volumes/arwen/mobi/group/NOX_CH/seedSet/NOX_noEukaryota_clust100_hmmscan.out"
#fileName="/tmp/hmmscan.out"
hscan = hm.parse(inputFile=fileName)
print( len(hscan.T()), 'proteins to reannotate' )
for e in hscan.T():
    merged_restore[e]['hmmr'] = hscan.T()[e]

377 proteins to reannotate


NameError: name 'merged_restore' is not defined

In [12]:
fileName="/Volumes/arwen/mobi/group/NOX_CH/seedSet/NOX_noEukaryota_clust100_hmmscan.out"
#fileName="/tmp/hmmscan.out"
hscan = hm.parse(inputFile=fileName)

In [23]:
for o in hscan.details: 
    #print(o)
    print(len(o.data))
    if len(o.data)==2: 
        print(o)

1
1
2
{'aliLongID': 'NAD_binding_6  Ferric reductase NAD binding domain', 'aliShortID': 'tr|A0A2M7JHP3|A0A2M7JHP3_9DELT', 'hmmID': 'NAD_binding_6', 'queryID': 'tr|A0A2M7JHP3|A0A2M7JHP3_9DELT ', 'data': [{'hmmID': 'NAD_binding_6', 'aliID': 'tr|A0A2M7JHP3|A0A2M7JHP3_9DELT', 'header': '1  score: 20.9 bits;  conditional E-value: 1.2e-07', 'score': '20.9', 'bias': '0.0', 'cEvalue': '1.2e-07', 'iEvalue': '0.00028', 'hmmFrom': '4', 'hmmTo': '49', 'aliFrom': '218', 'aliTo': '258', 'envFrom': '217', 'envTo': '264', 'acc': '0.92', 'hmmStringLetters': 'vllvagGiGitpfisilkdllkkskkealktkkikliwvvresssl', 'matchString': '++++agGiGitpf+s+l  l  + +     +++i+l w+++ ++++', 'aliStringLetters': 'IVFLAGGIGITPFLSMLAWLADRGR-----NRHITLLWANKTKEDI', 'hmmSymbolStuff': {'RF': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'}, 'aliSymbolStuff': {'PP': '799*****************99997.....9**********98876'}}, {'hmmID': 'NAD_binding_6', 'aliID': 'tr|A0A2M7JHP3|A0A2M7JHP3_9DELT', 'header': '2  score: 2.3 bits;  conditional E

{'aliLongID': 'Ferric_reduct  Ferric reductase like transmembrane component', 'aliShortID': 'tr|A0A1H1PJH5|A0A1H1PJH5_9ACTN', 'hmmID': 'Ferric_reduct', 'queryID': 'tr|A0A2M7JHP3|A0A2M7JHP3_9DELT ', 'data': [{'hmmID': 'Ferric_reduct', 'aliID': 'tr|A0A1H1PJH5|A0A1H1PJH5_9ACTN', 'header': '1  score: 44.2 bits;  conditional E-value: 6.2e-15', 'score': '44.2', 'bias': '9.5', 'cEvalue': '6.2e-15', 'iEvalue': '1.7e-11', 'hmmFrom': '39', 'hmmTo': '124', 'aliFrom': '1', 'aliTo': '86', 'envFrom': '1', 'envTo': '87', 'acc': '0.96', 'hmmStringLetters': 'wlgvlafllallHvilyllnflrfsaldeerlldsllkrpynllGvlalllliilaitSlkfirrrlsyelFyylHhllyvaylll', 'matchString': 'w+g ++f+la+lH ++ ++ f r++      +   l ++  +llG++a+ l++++a +S+++ rrrlsye++ ++H llyv+++l+', 'aliStringLetters': 'WTGFTVFWLAVLHPAFVVVGFARYDRVPVFTTAVALSRQIPVLLGLIAVGLIVVIAGLSVRIARRRLSYETWHAVHLLLYVVLVLG', 'hmmSymbolStuff': {}, 'aliSymbolStuff': {'PP': '9**********************9998999999999999999*****************************************997'}}, {'hmmID':

##### Comparing to regExp detection

In [33]:
import re
reMotifNADPH = re.compile('G[ISVL]G[VIAF][TAS][PYTA]')
reMotifFAD = re.compile('H[PSA]F[TS][LIMV]')

NAD_miss = 0
FAD_miss = 0
Both_miss = 0
for p in merged_restore:
    seq = merged_restore[p]['tmhmm']['fasta']['sequence']
    m = reMotifNADPH.search(seq)
    n = reMotifFAD.search(seq)
    merged_restore[p]['NADPH_reg'] = True if m else False
    merged_restore[p]['FAD_reg']   = True if n else False

    if not m:
        NAD_miss += 1
        if not n:
            Both_miss += 1
    if not n:
        FAD_miss += 1

print('Total Number of filtered sequence', len(merged_restore))
print('Number of negative to:')
print('*The NAD pattern',str(NAD_miss), '\n*The FAD pattern', str(FAD_miss), '\n*Both patterns ', Both_miss)

Total Number of filtered sequence 377
Number of negative to:
*The NAD pattern 52 
*The FAD pattern 146 
*Both patterns  17


In [34]:
save(merged_restore, tag='fullPFAM')

data structure saved to /Volumes/arwen/mobi/group/NOX_CH/pickle_saved/NOX_annotation_fullPFAM_20190426-112439.pickle


In [78]:
data=load("NOX_annotation_fullPFAM_20190426-112439.pickle")

restore a annotated container of  377 elements


#### Delete domains with evalue > 1e-3

In [115]:
new_data={}
for p in data :
    new_data[p]={}
    new_data[p]['hmmr']={}
    for d in data[p]['hmmr']:
        deleted_hits=0
        for h in data[p]['hmmr'][d][0].data:
            evalue=h.iEvalue
            if float(evalue)>1e-3: 
                deleted_hits+=1  
        if deleted_hits < len(data[p]['hmmr'][d][0].data): 
            keep=True
            new_data[p]['hmmr'][d]=data[p]['hmmr'][d]
        if keep: 
            new_data[p]['taxid']=data[p]['taxid']
            new_data[p]['clusters']=data[p]['clusters']
        else: 
            print("OOOO")
            del new_data[p]

In [82]:
print(len(new_data))

377


In [111]:
all_domains=set()
for p in new_data : 
    for d in new_data[p]['hmmr']: 
        all_domains.add(d)

In [112]:
print(len(all_domains))

14


In [113]:
print(all_domains)

{'EF-hand_1', 'EF-hand_7', 'EF-hand_5', 'Fer2', 'EF-hand_8', 'DUF4405', 'FAD_binding_8', 'Ferric_reduct', 'NAD_binding_6', 'NAD_binding_1', 'FAD_binding_6', 'SdpI', 'EF-hand_6', 'DUF2339'}


In [116]:
save(new_data,"fullPfam_filteredDomains")

data structure saved to /Volumes/arwen/mobi/group/NOX_CH/pickle_saved/NOX_annotation_fullPfam_filteredDomains_20190502-151754.pickle
