### TODO

* Randomly sample SwissProt to create a 100-sequence FASTA file.
* Use this FASTA file in the analyses.

### Results

* Embedding 100: 13.147 s ± 60 ms per loop (mean ± std. dev. of 10 runs, 1 loop each) - 2 cores + GPU
* Embedding 1000: 71.882 s ± 347.8 ms per loop (mean ± std. dev. of 10 runs, 1 loop each) - 2 cores + GPU
* Searching 100x100: 519 ms ± 6.3 ms per loop (mean ± std. dev. of 10 runs, 1 loop each) - 1 core
    The search itself takes only 25.58 ms ± 4.96 ms; the rest is spent on startup overhead, reading the databases from disk, and writing the results to disk.
* Searching 1000x1000: 976 ms ± 10.6 ms per loop (mean ± std. dev. of 10 runs, 1 loop each) - 1 core
* Searching SwissProt: 1.02 s ± 7.3 ms per loop (mean ± std. dev. of 10 runs, 1 loop each) - 1 core
* GPU: Nvidia A100 40GB
* CPU: AMD EPYC 7543 (1 core used for searching, 2 cores for embedding)
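
The means reported above allow a rough breakdown of where the time goes. The sketch below is plain arithmetic on those numbers. The linear model `time = fixed + per_seq * n` is an assumption made here for illustration, since only two database sizes were measured, and the search-only time comes from separate instrumented runs, so the resulting figures are indicative only.

```python
# Reported means from the results list (seconds).
embed_100 = 13.147
embed_1000 = 71.882
search_100_total = 0.519    # wall time of `prost search` on 100x100
search_100_core = 0.02558   # search-only time from the instrumented runs

# Two-point linear estimate (assumption): time = fixed + per_seq * n
per_seq = (embed_1000 - embed_100) / (1000 - 100)
fixed = embed_100 - per_seq * 100

# Share of the 100x100 wall time that is not the search itself.
overhead_fraction = 1 - search_100_core / search_100_total

print(f"embedding: ~{per_seq * 1000:.1f} ms per sequence, ~{fixed:.1f} s fixed cost")
print(f"100x100 search: {overhead_fraction:.1%} of wall time is overhead")
```

Under these assumptions, embedding costs roughly 65 ms per sequence on top of about 6.6 s of fixed cost, and about 95% of the 100x100 search wall time is overhead rather than search.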

In [6]:
from contactGroups import commp as cp
from random import randint

#load SwissProt fasta file
spe = {}
for f in cp.fasta_iter('../data/spe.fa'):
    spe[f[0]] = f[1]

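#sample 1000 sequences (with replacement, so duplicates are possible)
sampled = []
keys = list(spe.keys())
for i in range(1000):
    # randint is inclusive on both ends, so the upper bound must be len(keys) - 1
    r = randint(0, len(keys) - 1)
    r = keys[r]
    sampled.append((r,spe[r]))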

with open('spe.sampled.1000.fa','w') as f:
    for h,fasta in sampled:
        f.write('>%s\n%s\n'%(h,fasta))
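
If distinct sequences are wanted, for example for the 100-sequence file used in the timings, sampling without replacement is one option. The following is a minimal sketch, not the code that produced the files in this notebook: `sample_fasta`, `write_fasta`, and the `seed` parameter are hypothetical names introduced here, and the toy dictionary stands in for the real `spe` records.

```python
import random

def sample_fasta(records, n, seed=None):
    """Return n distinct (header, sequence) pairs from a dict of records."""
    rng = random.Random(seed)                 # a fixed seed makes the draw repeatable
    headers = rng.sample(sorted(records), n)  # sampling without replacement
    return [(h, records[h]) for h in headers]

def write_fasta(pairs, path):
    """Write (header, sequence) pairs as a two-line-per-record FASTA file."""
    with open(path, "w") as handle:
        for header, seq in pairs:
            handle.write(f">{header}\n{seq}\n")

# Toy stand-in for the SwissProt dictionary (hypothetical data).
toy = {f"seq{i}": "ACDEFGHIK" for i in range(20)}
subset = sample_fasta(toy, 5, seed=0)
print(len(subset), len({h for h, _ in subset}))
```

With the `spe` dictionary loaded above, `write_fasta(sample_fasta(spe, 100, seed=0), 'spe.sampled.100.fa')` would produce a 100-sequence file; the seed value is arbitrary. `random.sample` raises `ValueError` if `n` exceeds the number of records.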

In [10]:
!multitime -n10 -q prost makedb -n spe.sampled.100.fa spe.100.prdb

===> multitime results
1: -q prost makedb -n spe.sampled.100.fa spe.100.prdb
            Mean                Std.Dev.    Min         Median      Max
real        13.147+/-0.0600      0.060       13.035      13.144      13.241      
user        12.649+/-0.1115      0.111       12.461      12.659      12.826      
sys         1.533+/-0.0346      0.035       1.497       1.520       1.596       


In [9]:
!multitime -n10 -q prost makedb -n spe.sampled.1000.fa spe.1000.prdb

===> multitime results
1: -q prost makedb -n spe.sampled.1000.fa spe.1000.prdb
            Mean                Std.Dev.    Min         Median      Max
real        71.882+/-0.3478      0.347       71.525      71.758      72.713      
user        71.018+/-0.3010      0.300       70.648      70.905      71.522      
sys         1.613+/-0.0350      0.035       1.542       1.618       1.669       


In [18]:
!multitime -n10 prost search spe.100.prdb spe.100.prdb test.out

===> multitime results
1: prost search spe.100.prdb spe.100.prdb test.out
            Mean                Std.Dev.    Min         Median      Max
real        0.519+/-0.0063      0.006       0.506       0.520       0.530       
user        0.422+/-0.0133      0.013       0.400       0.423       0.449       
sys         0.210+/-0.0112      0.011       0.187       0.209       0.227       


In [11]:
!multitime -n10 prost search spe.1000.prdb spe.1000.prdb test.out

===> multitime results
1: prost search spe.1000.prdb spe.1000.prdb test.out
            Mean                Std.Dev.    Min         Median      Max
real        0.976+/-0.0106      0.011       0.960       0.974       0.995       
user        0.867+/-0.0111      0.011       0.841       0.866       0.883       
sys         0.209+/-0.0135      0.013       0.176       0.212       0.230       


In [19]:
!multitime -n10 prost searchsp hpo30.prdb test.out

===> multitime results
1: prost searchsp hpo30.prdb test.out
            Mean                Std.Dev.    Min         Median      Max
real        1.017+/-0.0073      0.007       1.008       1.016       1.032       
user        0.814+/-0.0170      0.017       0.781       0.812       0.840       
sys         0.454+/-0.0192      0.019       0.412       0.456       0.480       


In [10]:
!time ./prostTimed searchsp hpo30.prdb test.out

Import 0.3434717655181885
Read databases in: 0.5418472290039062
PROST search time: 0.18520712852478027
Write results in: 0.0036516189575195312

real	0m1.168s
user	0m0.807s
sys	0m0.676s


In [16]:
!multitime -n10 ./prostTimed search spe.100.prdb spe.100.prdb test.out

Import 0.5359923839569092
Click 0.0010104179382324219
Read databases in: 0.0004634857177734375
PROST search time: 0.024248361587524414
Write results in: 0.0029523372650146484
All 0.5670270919799805
Import 0.46673035621643066
Click 0.0005948543548583984
Read databases in: 0.0003085136413574219
PROST search time: 0.024837255477905273
Write results in: 0.003104686737060547
All 0.49733972549438477
Import 0.31944727897644043
Click 0.0005812644958496094
Read databases in: 0.0003006458282470703
PROST search time: 0.0238950252532959
Write results in: 0.0035011768341064453
All 0.3490102291107178
Import 0.3151369094848633
Click 0.0005614757537841797
Read databases in: 0.00029921531677246094
PROST search time: 0.023575544357299805
Write results in: 0.003637075424194336
All 0.3444786071777344
Import 0.31830835342407227
Click 0.000568389892578125
Read databases in: 0.00028252601623535156
PROST search time: 0.025324583053588867
Write results in: 0.0032224655151367188
All 0.34899067878723145
Import 0

In [17]:
from statistics import stdev, mean
prostTime=[0.024248361587524414,0.024837255477905273,0.0238950252532959,0.023575544357299805,0.025324583053588867,0.023729324340820312,0.02361273765563965,0.03959488868713379,0.0234224796295166,0.0235598087310791]
print('%.2fms +- %.2fms'%(mean(prostTime)*1000,stdev(prostTime)*1000))

25.58ms +- 4.96ms
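
Instead of copying the search times by hand, they could be pulled out of the captured `prostTimed` output. The sketch below assumes the line format seen in the output above (`PROST search time: <seconds>`); `search_times` and the `log` string are hypothetical, and the sample contains only the first two runs.

```python
import re
from statistics import mean, stdev

def search_times(log_text):
    """Extract every 'PROST search time: <seconds>' value as a float."""
    pattern = re.compile(
        r"^PROST search time:\s*([0-9.]+(?:e-?\d+)?)\s*$", re.MULTILINE
    )
    return [float(value) for value in pattern.findall(log_text)]

# First two runs copied from the output above.
log = """Import 0.5359923839569092
Click 0.0010104179382324219
Read databases in: 0.0004634857177734375
PROST search time: 0.024248361587524414
Write results in: 0.0029523372650146484
All 0.5670270919799805
Import 0.46673035621643066
Click 0.0005948543548583984
Read databases in: 0.0003085136413574219
PROST search time: 0.024837255477905273
Write results in: 0.003104686737060547
All 0.49733972549438477
"""

times = search_times(log)
print("%.2fms +- %.2fms" % (mean(times) * 1000, stdev(times) * 1000))
```

`stdev` needs at least two values, so the log must contain at least two runs.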


In [1]:
!nvidia-smi

Wed Jan 18 11:55:06 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-PCI...  Off  | 00000000:C4:00.0 Off |                    0 |
| N/A   26C    P0    34W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!cat /proc/cpuinfo | head -n 20

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 1
model name	: AMD EPYC 7543 32-Core Processor
stepping	: 1
microcode	: 0xa00115d
cpu MHz		: 2800.000
cache size	: 512 KB
physical id	: 0
siblings	: 64
core id		: 0
cpu cores	: 32
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec 

In [3]:
!nproc

2
