Project rename#48 (#55)
* Initial commit

* Removed

* UPD: Replaced orgs_ids ref pickle with csv. Depends on dtypes from apis.Columns

* Initial commit

* UPD: Read csv not pickle. Prepare GENES and ORG cols

* UPD: Replaced pickled KEGG name-ID dicts with hard-coded ones.

* Rename

* Rename

* UPD: Read csv not pickle. Prepare the dtypes.

* UPD: Extension change

* FIX: Typo

* Initial commit

* Deleted

* UPD: Read csv not pickle

* UPD: Read csv not pickle
dizak committed Jul 18, 2018
1 parent 846797a commit bbf8090
Showing 13 changed files with 1,082 additions and 22 deletions.
3 changes: 3 additions & 0 deletions prowler/apis.py
@@ -19,6 +19,7 @@ class Columns:
     TAXON_ID = "TAXON_ID"
     KEGG_ID = "KEGG_ID"
     ORF_ID = "ORF_ID"
+    dtypes = {TAXON_ID: "uint32"}


 class KEGG_API(Columns):
@@ -88,6 +89,8 @@ def get_organisms_ids(self,
                                       regex=True,
                                       inplace=True)
         self.organisms_ids_df.dropna(inplace=True)
+        self.organisms_ids_df = self.organisms_ids_df.astype({k: v for k, v in self.dtypes.items()
+                                                              if k in self.organisms_ids_df.columns})

     def org_name_2_kegg_id(self,
                            organism,
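The change above casts only those columns that appear both in `Columns.dtypes` and in the frame, so a frame without `TAXON_ID` passes through unchanged. A minimal sketch of that pattern follows. The helper name `apply_known_dtypes` and the sample frame are hypothetical and are not part of prowler; the float input with a missing value is an assumption about why `dropna` precedes the cast.

```python
import pandas as pd


class Columns:
    # Column names and dtype map as they appear in the diff above.
    TAXON_ID = "TAXON_ID"
    KEGG_ID = "KEGG_ID"
    ORF_ID = "ORF_ID"
    dtypes = {TAXON_ID: "uint32"}


def apply_known_dtypes(df, dtypes):
    """Cast only the columns that are listed in `dtypes` and present in `df`."""
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    return df.astype(present)


# Hypothetical input: a missing value forces TAXON_ID to float64.
raw = pd.DataFrame({"KEGG_ORG_ID": ["hin", "mge", "xxx"],
                    "TAXON_ID": [71421.0, 243273.0, None]})

# Drop incomplete rows first; NaN cannot be represented in an unsigned integer column.
typed = apply_known_dtypes(raw.dropna(), Columns.dtypes)

# A frame without TAXON_ID is returned with its dtypes untouched.
untouched = apply_known_dtypes(raw[["KEGG_ORG_ID"]], Columns.dtypes)
```

Filtering the mapping before calling `astype` matters because `DataFrame.astype` raises a `KeyError` when the mapping names a column that the frame does not have.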
269 changes: 269 additions & 0 deletions test_data/AnyNetworkTests/ref_anynetwork.csv

Large diffs are not rendered by default.

Binary file removed test_data/AnyNetworkTests/ref_anynetwork.pickle
Binary file not shown.
101 changes: 101 additions & 0 deletions test_data/ApisTests/test_orgs_ids_out.csv
@@ -0,0 +1,101 @@
GENOME_ID DESCRIPTION KEGG_ORG_ID NAME TAXON_ID
0 gn:T00001 Haemophilus influenzae Rd KW20 (serotype d) hin HAEIN 71421
1 gn:T00002 Mycoplasma genitalium G37 mge MYCGE 243273
2 gn:T00003 Methanocaldococcus jannaschii DSM 2661 mja METJA 243232
3 gn:T00004 Synechocystis sp. PCC 6803 syn SYNY3 1148
4 gn:T00005 Saccharomyces cerevisiae S288c sce YEAST 559292
5 gn:T00006 Mycoplasma pneumoniae M129 mpn MYCPN 272634
6 gn:T00007 Escherichia coli K-12 MG1655 eco ECOLI 511145
7 gn:T00008 Helicobacter pylori 26695 hpy HELPY 85962
8 gn:T00009 Methanothermobacter thermautotrophicus Delta H mth METTH 187420
9 gn:T00010 Bacillus subtilis subsp. subtilis 168 bsu BACSU 224308
10 gn:T00011 Archaeoglobus fulgidus DSM 4304 (VC-16) afu ARCFU 224325
11 gn:T00012 Borrelia burgdorferi B31 bbu BORBU 224326
12 gn:T00013 Aquifex aeolicus VF5 aae AQUAE 224324
13 gn:T00014 Pyrococcus horikoshii OT3 pho PYRHO 70601
14 gn:T00015 Mycobacterium tuberculosis H37Rv, laboratory strain mtu MYCTU 83332
15 gn:T00016 Treponema pallidum subsp. pallidum Nichols tpa TREPA 243276
16 gn:T00017 Chlamydia trachomatis D/UW-3/CX ctr CHLTR 272561
17 gn:T00018 Rickettsia prowazekii Madrid E rpr RICPR 272947
18 gn:T00019 Caenorhabditis elegans (nematode) cel CAEEL 6239
19 gn:T00020 Helicobacter pylori J99 hpj HELPJ 85963
20 gn:T00021 Chlamydophila pneumoniae CWL029 cpn CHLPN 115713
21 gn:T00022 Thermotoga maritima MSB8 tma THEMA 243274
22 gn:T00023 Aeropyrum pernix K1 ape AERPE 272557
23 gn:T00024 Pyrococcus abyssi GE5 pab PYRAB 272844
24 gn:T00025 Deinococcus radiodurans R1 dra DEIRA 243230
25 gn:T00026 Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 cje CAMJE 192222
26 gn:T00027 Neisseria meningitidis MC58 (serogroup B) nme NEIMB 122586
27 gn:T00028 Chlamydia muridarum Nigg (Chlamydia trachomatis MoPn) cmu CHLMU 243161
28 gn:T00029 Chlamydophila pneumoniae AR39 cpa CHLPN 115711
29 gn:T00030 Drosophila melanogaster (fruit fly) dme DROME 7227
30 gn:T00031 Neisseria meningitidis Z2491 (serogroup A) nma NEIMA 122587
31 gn:T00032 Chlamydophila pneumoniae J138 cpj CHLPN 138677
32 gn:T00033 Xylella fastidiosa 9a5c xfa XYLFA 160492
33 gn:T00034 Vibrio cholerae O1 biovar El Tor N16961 vch VIBCH 243277
34 gn:T00035 Pseudomonas aeruginosa PAO1 pae PSEAE 208964
35 gn:T00036 Buchnera aphidicola APS, endosymbiont of Acyrthosiphon pisum (pea aphid) buc BUCAI 107806
36 gn:T00037 Thermoplasma acidophilum DSM 1728 tac THEAC 273075
37 gn:T00038 Halobacterium salinarum NRC-1 (Halobacterium sp. NRC-1) hal HALSA 64091
38 gn:T00039 Bacillus halodurans C-125 bha BACHD 272558
39 gn:T00040 Ureaplasma parvum serovar 3 ATCC 700970 uur UREPA 273119
40 gn:T00041 Arabidopsis thaliana (thale cress) ath ARATH 3702
41 gn:T00042 Thermoplasma volcanium GSS1 tvo THEVO 273116
42 gn:T00043 Mesorhizobium japonicum MAFF 303099 (Mesorhizobium loti MAFF303099) mlo RHILO 266835
43 gn:T00044 Escherichia coli O157:H7 EDL933 (EHEC) ece ECOLX 155864
44 gn:T00045 Lactococcus lactis subsp. lactis Il1403 lla LACLA 272623
45 gn:T00046 Pasteurella multocida subsp. multocida Pm70 pmu PASMU 272843
46 gn:T00047 Mycobacterium leprae TN mle MYCLE 272631
47 gn:T00048 Escherichia coli O157:H7 Sakai (EHEC) ecs ECO57 386585
48 gn:T00049 Caulobacter crescentus CB15 ccr CAUCR 190650
49 gn:T00050 Streptococcus pyogenes M1 GAS (serotype M1) spy STRP1 160490
50 gn:T00051 Staphylococcus aureus subsp. aureus N315, hospital-acquired meticillin-resistant, vancomycin-susceptible sau STAAN 158879
51 gn:T00052 Staphylococcus aureus subsp. aureus Mu50, MRSA strain with vancomycin-intermediate resistance sav STAAM 158878
52 gn:T00053 Mycobacterium tuberculosis CDC1551, clinical strain mtc MYCTU 83331
53 gn:T00054 Sulfolobus solfataricus P2 sso SULSO 273057
54 gn:T00055 Mycoplasma pulmonis UAB CTIP mpu MYCPU 272635
55 gn:T00056 Clostridium acetobutylicum ATCC 824 cac CLOAB 272562
56 gn:T00057 Streptococcus pneumoniae TIGR4 (virulent serotype 4) spn STRPN 170187
57 gn:T00058 Sinorhizobium meliloti 1021 sme RHIME 266834
58 gn:T00060 Streptococcus pneumoniae R6 (avirulent, laboratory-adapted D39 derivative) spr STRR6 171101
59 gn:T00061 Rickettsia conorii Malish 7 rco RICCN 272944
60 gn:T00062 Sulfolobus tokodaii 7 sto SULTO 273063
61 gn:T00063 Yersinia pestis CO92 (biovar Orientalis) ype YERPE 214092
62 gn:T00064 Salmonella enterica subsp. enterica serovar Typhi CT18 (Salmonella typhi CT18) sty SALTI 220341
63 gn:T00065 Salmonella enterica subsp. enterica serovar Typhimurium LT2 (Salmonella typhimurium LT2) stm SALTY 99287
64 gn:T00066 Listeria monocytogenes EGD-e (serotype 1/2a) lmo LISMO 169963
65 gn:T00067 Listeria innocua Clip11262 (serotype 6a) lin LISIN 272626
66 gn:T00068 Escherichia coli K-12 W3110 ecj ECOLI 316407
67 gn:T00069 Nostoc sp. PCC 7120 (Anabaena sp. PCC 7120) ana ANASP 103690
68 gn:T00070 Agrobacterium fabrum C58 (Agrobacterium tumefaciens C58) atu AGRT5 176299
69 gn:T00071 Ralstonia solanacearum GMI1000 rso RALSO 267608
70 gn:T00072 Brucella melitensis bv. 1 16M bme BRUME 224914
71 gn:T00073 Pyrobaculum aerophilum IM2 pai PYRAE 178306
72 gn:T00074 Clostridium perfringens 13 cpe CLOPE 195102
73 gn:T00075 Pyrococcus furiosus DSM 3638 pfu PYRFU 186497
74 gn:T00076 Schizosaccharomyces pombe 972h- spo SCHPO 284812
75 gn:T00077 Fusobacterium nucleatum subsp. nucleatum ATCC 25586 fnu FUSNU 190304
76 gn:T00078 Methanopyrus kandleri AV19 mka METKA 190192
77 gn:T00079 Streptococcus pyogenes MGAS8232 (serotype M18) spm STRP8 186103
78 gn:T00080 Methanosarcina acetivorans C2A mac METAC 188937
79 gn:T00081 Thermoanaerobacter tengcongensis MB4(T) tte THETN 273068
80 gn:T00082 Methanosarcina mazei Go1 mma METMA 192952
81 gn:T00083 Xanthomonas campestris pv. campestris ATCC 33913 xcc XANCP 190485
82 gn:T00084 Xanthomonas citri pv. citri 306 (Xanthomonas axonopodis pv. citri 306) xac XANAC 190486
83 gn:T00085 Streptomyces coelicolor A3(2) sco STRCO 100226
84 gn:T00086 Staphylococcus aureus subsp. aureus MW2, community-acquired MRSA sam STAAW 196620
85 gn:T00087 Buchnera aphidicola Sg, endosymbiont of Schizaphis graminum (greenbug) bas BUCAP 198804
86 gn:T00088 Chlorobium tepidum TLS cte CHLTE 194439
87 gn:T00089 Streptococcus pyogenes MGAS315 (serotype M3) spg STRP3 198466
88 gn:T00090 Thermosynechococcus elongatus BP-1 tel SYNEL 197221
89 gn:T00091 Streptococcus agalactiae 2603 (serotype V) sag STRA5 208435
90 gn:T00092 Yersinia pestis KIM10+ (biovar Mediaevalis) ypk YERPE 187410
91 gn:T00093 Oceanobacillus iheyensis HTE831 oih OCEIH 221109
92 gn:T00094 Bifidobacterium longum NCC2705 blo BIFLO 206672
93 gn:T00095 Plasmodium falciparum 3D7 pfa PLAF7 36329
94 gn:T00096 Brucella suis 1330 bms BRUSU 204722
95 gn:T00097 Shigella flexneri 301 (serotype 2a) sfl SHIFL 198214
96 gn:T00098 Leptospira interrogans serovar Lai 56601 lil LEPIN 189518
97 gn:T00099 Shewanella oneidensis MR-1 son SHEON 211586
98 gn:T00100 Streptococcus mutans UA159 (serotype C) smu STRMU 210007
99 gn:T00101 Wigglesworthia glossinidia (Wigglesworthia brevipalpis), endosymbiont of Glossina brevipalpis (tsetse fly) wbr WIGBR 36870
Binary file removed test_data/ApisTests/test_orgs_ids_out.pickle
Binary file not shown.
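Unlike a pickle, a CSV file does not store column dtypes, which is presumably why the commit message notes a dependency on the dtypes from `apis.Columns`. The sketch below shows how a reference file like the one above might be read back with the intended dtype. It assumes the file is tab-separated with an unnamed index column; the rendering above does not preserve the separator, so treat `sep="\t"` as a guess. The two rows are copied from the fixture.

```python
import io

import pandas as pd

# Two rows copied from test_orgs_ids_out.csv; the tab separator is an assumption.
sample = (
    "\tGENOME_ID\tDESCRIPTION\tKEGG_ORG_ID\tNAME\tTAXON_ID\n"
    "0\tgn:T00001\tHaemophilus influenzae Rd KW20 (serotype d)\thin\tHAEIN\t71421\n"
    "1\tgn:T00002\tMycoplasma genitalium G37\tmge\tMYCGE\t243273\n"
)

# Without a dtype hint, pandas infers int64 for TAXON_ID.
default_df = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)

# Passing the dtype map restores the type declared in Columns.dtypes.
ref_df = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0,
                     dtype={"TAXON_ID": "uint32"})
```

A test that compares frames with `pandas.testing.assert_frame_equal` checks dtypes by default, so reading the reference with the same dtype map as the code under test keeps the comparison from failing on `int64` versus `uint32` alone.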
101 changes: 101 additions & 0 deletions test_data/BioprocessesTests/ref_bioproc_100r.csv
@@ -0,0 +1,101 @@
ORF GENE BIOPROC
0 YAL004W YAL004W unknown
1 YAL005C SSA1 unknown
2 YAL007C ERP2 ER<->Golgi traffic
3 YAL008W FUN14 metabolism/mitochondria
4 YAL010C MDM10 metabolism/mitochondria
5 YAL011W SWC3 chromatin/transcription
6 YAL012W CYS3 metabolism/mitochondria;amino acid biosynth&transport/nitrogen utilization
7 YAL013W DEP1 chromatin/transcription
8 YAL014C SYN8 Golgi/endosome/vacuole/sorting
9 YAL015C NTG1 metabolism/mitochondria;DNA replication/repair/HR/cohesion
10 YAL017W PSK1 metabolism/mitochondria
11 YAL018C YAL018C unknown
12 YAL019W FUN30 unknown
13 YAL020C ATS1 ribosome/translation
14 YAL021C CCR4 chromatin/transcription;RNA processing
15 YAL021C_damp CCR4_damp chromatin/transcription;RNA processing
16 YAL022C FUN26 drug/ion transport
17 YAL023C PMT2 protein folding/protein glycosylation/cell wall biogenesis&integrity
18 YAL024C LTE1 chromosome segregation/kinetochore/spindle/microtubule
19 YAL025C_damp MAK16_damp ribosome/translation
20 YAL027W SAW1 unknown
21 YAL028W FRT2 unknown
22 YAL029C MYO4 cell polarity/morphogenesis
23 YAL030W SNC1 Golgi/endosome/vacuole/sorting
24 YAL031C GIP4 signaling/stress response;chromosome segregation/kinetochore/spindle/microtubule
25 YAL034C FUN19 unknown
26 YAL034W-A_tsq235 mtw1-ts chromosome segregation/kinetochore/spindle/microtubule
27 YAL036C RBG1 ribosome/translation
28 YAL037W YAL037W unknown
29 YAL038W_tsq26 cdc19-1 metabolism/mitochondria
30 YAL040C CLN3 G1/S and G2/M cell cycle progression/meiosis;signaling/stress response
31 YAL041W_tsq148 cdc24-11 cell polarity/morphogenesis
32 YAL041W_tsq149 cdc24-4 cell polarity/morphogenesis
33 YAL041W_tsq412 cdc24-3 cell polarity/morphogenesis
34 YAL042W ERV46 ER<->Golgi traffic
35 YAL043C_damp PTA1_damp chromatin/transcription;RNA processing
36 YAL043C-A YAL043C-A unknown
37 YAL045C YAL045C unknown
38 YAL046C AIM1 unknown
39 YAL048C GEM1 metabolism/mitochondria
40 YAL049C AIM2 unknown
41 YAL051W OAF1 metabolism/mitochondria;chromatin/transcription
42 YAL053W FLC2 protein folding/protein glycosylation/cell wall biogenesis&integrity
43 YAL054C ACS1 metabolism/mitochondria
44 YAL056W GPB2 signaling/stress response
45 YAL058C-A YAL058C-A protein folding/protein glycosylation/cell wall biogenesis&integrity
46 YAL058W CNE1 protein folding/protein glycosylation/cell wall biogenesis&integrity;protein degradation/proteosome
47 YAL059W ECM1 ribosome/translation
48 YAL060W BDH1 metabolism/mitochondria
49 YAL061W BDH2 unknown
50 YAL062W GDH3 metabolism/mitochondria
51 YAL063C FLO9 protein folding/protein glycosylation/cell wall biogenesis&integrity;cell polarity/morphogenesis
52 YAL064C-A YAL064C-A unknown
53 YAL065C YAL065C unknown
54 YAL066W YAL066W unknown
55 YAL067C SEO1 drug/ion transport
56 YAL068C PAU8 unknown
57 YAR002C-A ERP1 ER<->Golgi traffic
58 YAR002W NUP60 nuclear-cytoplasic transport
59 YAR003W SWD1 chromatin/transcription
60 YAR014C BUD14 cell polarity/morphogenesis
61 YAR015W ADE1 metabolism/mitochondria;amino acid biosynth&transport/nitrogen utilization
62 YAR018C KIN3 unknown
63 YAR020C PAU7 metabolism/mitochondria
64 YAR023C YAR023C unknown
65 YAR027W UIP3 unknown
66 YAR028W YAR028W unknown
67 YAR029W YAR029W unknown
68 YAR030C YAR030C unknown
69 YAR031W PRM9 unknown
70 YAR035W YAT1 metabolism/mitochondria
71 YAR037W YAR037W unknown
72 YAR040C YAR040C unknown
73 YAR042W SWH1 lipid/sterol/fatty acid biosynth
74 YAR043C YAR043C unknown
75 YAR044W YAR044W unknown
76 YAR047C YAR047C unknown
77 YAR050W FLO1 cell polarity/morphogenesis;signaling/stress response
78 YAR071W PHO11 protein folding/protein glycosylation/cell wall biogenesis&integrity;metabolism/mitochondria
79 YBL001C ECM15 unknown
80 YBL003C HTA2 chromatin/transcription
81 YBL005W PDR3 drug/ion transport
82 YBL007C SLA1 cell polarity/morphogenesis
83 YBL008W HIR1 chromatin/transcription
84 YBL009W ALK2 unknown
85 YBL010C YBL010C unknown
86 YBL011W SCT1 lipid/sterol/fatty acid biosynth
87 YBL013W FMT1 ribosome/translation
88 YBL015W ACH1 metabolism/mitochondria
89 YBL017C PEP1 Golgi/endosome/vacuole/sorting
90 YBL019W APN2 DNA replication/repair/HR/cohesion
91 YBL021C HAP3 metabolism/mitochondria;chromatin/transcription
92 YBL023C_tsq111 mcm2-1 DNA replication/repair/HR/cohesion
93 YBL024W NCL1 ribosome/translation
94 YBL027W RPL19B ribosome/translation
95 YBL028C YBL028C unknown
96 YBL029W YBL029W unknown
97 YBL031W SHE1 chromosome segregation/kinetochore/spindle/microtubule
98 YBL032W HEK2 RNA processing
99 YBL034C_tsq274 stu1-5 chromosome segregation/kinetochore/spindle/microtubule
Binary file removed test_data/BioprocessesTests/ref_bioproc_100r.pickle
Binary file not shown.