Mina Li edited this page Feb 17, 2015 · 18 revisions

Setup

Python on Windows

Python 2.x needs to be set up so that it can be invoked from the command line as python. Try it now: type python into the command prompt. If you get a Python prompt, you are all set. If you get an error saying that python does not exist, keep reading.

You need to locate the directory where you installed Python. If you used the installer's default path, then it probably looks something like C:\Python27. This directory needs to be added to your PATH environment variable. To do that, open Windows Explorer, right-click on Computer, and click on Properties in the drop-down menu. In the left-hand pane, click Advanced system settings. In the dialog box, click the Environment Variables... button.

In that dialog box, find the upper box labeled "User variables for" followed by your user name, and click the New... button that belongs to that box.

In the New User Variable dialog box:

  • Variable name: PATH
  • Variable value: %PATH%;C:\Python27

Be sure to enter the percent signs, the semicolon, and the backslash exactly as shown.

Then click OK on all the dialog boxes, save all your open documents, and log out. Log back in, open the command prompt, and try python again.
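If python still is not found after logging back in, it can help to confirm that the install directory really ended up in PATH. The following is only a minimal sketch for that check; the helper name directory_on_path is made up for this page, and the comparison is case-insensitive on the assumption that you are on Windows.

```python
import os


def directory_on_path(directory, path_value=None, sep=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    `directory_on_path` is a hypothetical helper, not part of Clotho.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    if sep is None:
        sep = os.pathsep  # ";" on Windows, ":" elsewhere
    # Ignore trailing slashes and letter case, as Windows does.
    wanted = directory.rstrip("\\/").lower()
    entries = [e.strip().rstrip("\\/").lower() for e in path_value.split(sep)]
    return wanted in entries


# Example call, using the installer's default location:
#   directory_on_path(r"C:\Python27")
```

Run it with any Python interpreter you can start by its full path, for example C:\Python27\python.exe.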

Biopython

Biopython comes packaged along with Clotho.

Installing on Windows

Type python --version in the command prompt to see your Python version. If it is 2.7.x, then use this installer.

Command bar examples

Note that you can return a dict from Python and it will be converted to a JSON object. A Python list will be converted to a JSON array.
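To see what that mapping looks like, here is a small sketch that uses the standard json module. It only illustrates the dict-to-object and list-to-array correspondence; Clotho's own conversion code is not shown on this page and may work differently. The function name build_result and its keys are invented for the example.

```python
import json


def build_result():
    # A dict becomes a JSON object; the list inside it becomes a JSON array.
    return {"name": "example", "length": 5, "bases": ["a", "t", "c", "g", "c"]}


# sort_keys makes the key order predictable when printing the string.
as_json = json.dumps(build_result(), sort_keys=True)
```

Keeping return values to dicts, lists, strings, numbers, booleans and None avoids the "not JSON serializable" failures noted further down.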

Wrapping external Python code

Reverse complement via BioPython

clotho.run2("py_biorc", ["atcgc"])
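The source of py_biorc is not shown here. Biopython offers Seq.reverse_complement() for this; the sketch below does the same thing in plain Python so that it runs without Biopython installed. It is an illustration, not the wrapped function itself, and it only handles the four unambiguous bases.

```python
# Complement table for the four unambiguous DNA bases, in both cases.
_COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c",
               "A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    Raises KeyError for any character outside a/t/c/g (either case).
    """
    return "".join(_COMPLEMENT[base] for base in reversed(seq))
```

For the input used above, reverse_complement("atcgc") gives "gcgat".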

Importing a test Python module in src/main/python/lib/hello.py

clotho.run("org.andersonlab.py_greet", [])

Bill Cao's PCR predictor (please fill in a better example here)

clotho.run2("py_pcr", ["CGCTCCAAGCTGGGCTGTGTG", "CGATAGTTACCGGATAAGGC", "CGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCG"])
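Until a better example is filled in, the following sketch shows the kind of calculation a PCR predictor performs. It is not Bill Cao's implementation: it assumes exact matches only, a linear template, and a single binding site per primer. The function names are made up for this page.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def predict_pcr(forward, reverse, template):
    """Return the predicted amplicon, or None if a primer has no exact match.

    Hypothetical helper: the forward primer must appear in the template as
    written, and the reverse primer must appear as its reverse complement.
    """
    forward = forward.upper()
    reverse = reverse.upper()
    template = template.upper()

    start = template.find(forward)
    if start == -1:
        return None

    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement on the strand we were given.
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start)
    if site == -1:
        return None

    return template[start:site + len(reverse_site)]
```

With the three sequences from the py_pcr call above, the primers sit at the two ends of the template, so the predicted product is the whole template.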

Mina Li's NCBI NucSeq fetcher

clotho.run2("py_nucseqfetch", ['123746834', '1322'])

Accessing ServerSideAPI

Returns its own code via a clotho.get call

clotho.run("org.andersonlab.py_selfie", [])

Modifies itself via a clotho.set call

clotho.run("org.andersonlab.py_selfsetter", [])

Makes a clotho.run call which fails, and the Python function catches the resulting ClothoError

clotho.run2("py_error_recover", [])
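The pattern inside py_error_recover is an ordinary try/except around a clotho.run call. The sketch below imitates it with stand-ins: the ClothoError class and the FakeClotho object defined here are placeholders so the example can run by itself, and the real ones come from Clotho's Python environment.

```python
class ClothoError(Exception):
    """Stand-in for the error type raised when a Clotho call fails."""


class FakeClotho(object):
    """Placeholder for the clotho object; every run call fails."""

    def run(self, function_id, args):
        raise ClothoError("no such function: " + function_id)


def run_with_fallback(clotho, function_id, args, fallback=None):
    """Call clotho.run and recover if it raises ClothoError."""
    try:
        return clotho.run(function_id, args)
    except ClothoError as err:
        # Recover instead of letting the whole Python function fail.
        return {"recovered": True, "error": str(err), "result": fallback}
```

Catching only ClothoError keeps genuine bugs in the Python code visible, because any other exception still propagates.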

Handling crashes

Dies at various stages of execution (expect a run error and a clotho.say message)

clotho.run2("py_die_early", [])
clotho.run2("py_die_run", [])

Simulates the Python process crashing really hard (shows that Clotho will clean up properly)

clotho.run2("py_die_abrupt", [])

EXISTING PYTHON FUNCTIONS

(Sans formatting for the time being.)

Author: Mina Li

Contact: li.mina888@berkeley.edu

Timeline: January 2014 - present

Summary: This is a document that outlines all the work I've done on Clotho.

***In src/main/python/lib

**BlastN, BlastP, BlastX, TBlastN, TBlastX

These are probably the most recent additions to the functions here. Essentially, they do what you can do at http://blast.ncbi.nlm.nih.gov/Blast.cgi, but from Python instead.

Input: array ([string sequence, int no_of_alignments] or [string sequence, null])

Output: Blast_Record (defined in ClothoPy)

**convertfile, convertgb, convertid, nucseqfetch, filenucseqfetch, id_to_poly, convertpoly

Note: I expect that, once they are revised, all of the functions that currently return JSON strings will return Polynucleotide objects instead.

*convertfile - This isn't exactly functional yet, but I think it might be replaced by Max's page where you can drag and drop files into Clotho (but he then converts them with convertgb).

*convertgb - It expects to see a string representation of the contents of a Genbank file (like http://www.ncbi.nlm.nih.gov/nuccore/M98350.1), and turns it into a JSON string representation of a Polynucleotide.

*convertid - The input is an accession number (NCBI) in the form of a string, and it should return the string representation of the Genbank file. (Basically what convertgb would be expecting.)

*nucseqfetch - This particular function expects a Genbank object and returns a Polynucleotide JSON string representation of its contents. (I'm almost 100% certain this is going to need to change; it was one of the first functions written.)

*filenucseqfetch - As with convertfile, it would expect a path to a Genbank file and return a Polynucleotide JSON string representation of its contents.

*id_to_poly - To go straight from an accession number (NCBI) to a Polynucleotide JSON string representation of the Genbank file, you would use this function.

*convertpoly - This function is the only function that goes the other way around, and turns a Polynucleotide object into a Genbank string.
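As a rough picture of the Genbank-to-JSON direction that convertgb handles, here is a minimal sketch that pulls three fields out of Genbank text and returns them as a JSON string. It is not the convertgb code, and the keys accession, definition and sequence are placeholders rather than the Polynucleotide schema. It also keeps only the first line of a multi-line DEFINITION.

```python
import json


def genbank_to_json(text):
    """Extract a few fields from Genbank-formatted text as a JSON string."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if in_origin:
            # Sequence lines look like "  61 gcgtccggac gcaacttctt";
            # keep the letters, drop the position numbers and spaces.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif line.startswith("ACCESSION"):
            parts = line.split()
            if len(parts) > 1:
                record["accession"] = parts[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return json.dumps(record, sort_keys=True)
```

The real functions rely on the modified BioPython parsers in ClothoPy, which also handle features, references and the other sections this sketch skips.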

**oligo_to_poly, poly_to_fasta

*oligo_to_poly - This function takes an Oligo object (defined elsewhere in Clotho) and turns it into a Polynucleotide in the form of a JSON string.

*poly_to_fasta - The input is a Polynucleotide object, which gets turned into a FASTA string.
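For reference, a FASTA string is just a header line starting with ">" followed by the sequence, usually wrapped at a fixed width. The sketch below builds one from a plain dict; the keys name, description and sequence are assumptions made for the example and are not the actual Polynucleotide fields.

```python
def poly_to_fasta_sketch(poly, width=60):
    """Format a dict with 'name' and 'sequence' keys as a FASTA string.

    Hypothetical stand-in for poly_to_fasta; `poly` is a plain dict here.
    """
    header = ">" + poly["name"]
    if poly.get("description"):
        header += " " + poly["description"]
    seq = poly["sequence"]
    # Wrap the sequence into lines of at most `width` characters.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"
```

Calling poly_to_fasta_sketch({"name": "demo", "sequence": "atcgc"}) returns the two lines ">demo" and "atcgc".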

**registry_collector, reg_to_poly

*registry_collector - This takes in an ID for the iGEM database and returns a JSON string representation of the Part (defined elsewhere in Clotho).

*reg_to_poly - Takes in a Part JSON string and outputs a Polynucleotide JSON string.

**protein_by_name, protein_by_gene, protein_to_orf

Note: In part because of the addition of Polypeptide, anything labeled with "poly" is referring to Polynucleotide, and everything labeled with "protein" is referring to Polypeptide.

*protein_by_name, protein_by_gene - These two are almost the same, so I'm putting them together. The input is in the form of an array ([string organism, string protein_name OR string gene_name, int retmax]), where retmax is null if you don't want to set your own, and will return all results. The output is going to need to change when we reassess, but currently I'm returning a string representation of an array of Polypeptide objects.

*protein_to_orf - The function takes in a Polypeptide as input and returns its ORF (if it has one).
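To make the [organism, name, retmax] input of protein_by_name and protein_by_gene concrete, here is a sketch that turns such an array into search parameters. It assumes the lookup goes through an NCBI Entrez search on the protein database, which this page does not confirm; build_protein_query is a made-up name, and the field tags are passed in by the caller.

```python
def build_protein_query(args, field):
    """Turn [organism, name, retmax] into a dict of search parameters.

    Hypothetical helper. `field` is an Entrez field tag such as
    "Gene Name" or "Protein Name".
    """
    organism, name, retmax = args
    term = '"{0}"[Organism] AND "{1}"[{2}]'.format(organism, name, field)
    params = {"db": "protein", "term": term}
    if retmax is not None:
        params["retmax"] = int(retmax)
    # When retmax is None it is simply left out here. NCBI then applies its
    # own default; how the real functions return all results is not shown.
    return params
```

For example, build_protein_query(["Salmonella enterica", "tyrB", 5], "Gene Name") produces a term that restricts the search to that organism and gene name, with at most 5 results.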

**act_parser, act_query, fetch_uniprot

*act_query - This function is used to obtain a chemical from Act.20n. The input is a string and the output is a JSON dictionary directly from the API.

*act_parser - This takes the dictionary returned by act_query (the output from Act.20n) and converts it into a SinglePathway schema.

*fetch_uniprot - This fetches a protein from the UniProt database using the ID as its input.

***In src/main/python/lib/ClothoPy

**accn_retrieval, protein_retrieval

They are essentially the same protocol: they both define a class call_accn which has a method retrieve_gb that takes in a list of accessions and grabs them from NCBI. To actually get them, since they're stored and not returned, there's another method returnGB for a specific accession you might be looking for, or you can grab all of them using the attribute .records.

The only difference is that accn_retrieval stores the records as Genbank (Polynucleotide) and protein_retrieval stores the records as Polypeptide.
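The store-then-look-up protocol described above can be sketched as follows. The class and method names match the ones mentioned (call_accn, retrieve_gb, returnGB, .records), but the body is a guess at the shape rather than the ClothoPy code. In particular, the fetch argument is an invention of this sketch that stands in for the NCBI download, so the example runs without network access.

```python
class call_accn(object):
    """Fetch records for a list of accessions and keep them for later."""

    def __init__(self, fetch):
        # `fetch` is hypothetical: a callable that takes one accession
        # string and returns the corresponding record.
        self._fetch = fetch
        self._by_accession = {}
        self.records = []  # every record retrieved so far

    def retrieve_gb(self, accessions):
        """Fetch each accession and store the result; returns nothing."""
        for accn in accessions:
            record = self._fetch(accn)
            self._by_accession[accn] = record
            self.records.append(record)

    def returnGB(self, accession):
        """Return the stored record for one accession, or None."""
        return self._by_accession.get(accession)
```

A typical session would call retrieve_gb once with all the accessions of interest, then read individual records with returnGB or loop over .records.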

**genbank_holder, new_genbank_holder, protein_holder, blast_holder

These all define classes that hold representations of Genbank, New_Genbank, Polypeptide, and Blast_Record.

**ClothoAlignIO, ClothoInsdcIO, ClothoSeqIO, ClothoGenBankScanner

You will probably never need to know what's happening in these, but the general gist is that BioPython was insufficient for our needs, so I took their code and modified it to suit Clotho better.

EXAMPLES (incomplete)

  • testing all the functions

    • tested:

      • convertID: output String

        clotho.run("org.andersonlab.py_convertID", ["1234890"])

        clotho.run('org.andersonlab.py_convertID', ['19203732'])

      • convertGB: output String

        clotho.run("org.andersonlab.py_convertGB", ['LOCUS CV961319 921 bp DNA EST 07-FEB-2011\nDEFINITION PYrpcy_2963 mycelium, Plich medium Phytophthora infestans cDNA, mRNA\n sequence.\nACCESSION CV961319\nVERSION CV961319.1 GI:58151110\nDBLINK BioSample:LIBEST_016732\nKEYWORDS EST.\nSOURCE Phytophthora infestans (potato late blight agent)\n ORGANISM Phytophthora infestans\n Eukaryota; Stramenopiles; Oomycetes; Peronosporales; Phytophthora.\nREFERENCE 1 (bases 1 to 921)\n AUTHORS Randall,T., Dwyer,R.A., Huitema,E., Beyer,K., Cvitanich,C.,\n Kelkar,H., Fong,A.M., Gates,K., Roberts,S., Yatzkan,E., Gaffney,T.,\n Law,M., Testa,A., Torto-Alalibo,A., Zhang,M., Zheng,L., Mueller,E.,\n Windass,J., Binder,A., Birch,P.R.J., Gisi,U., Govers,F., Gow,N.A.,\n Mauch,F., van West,P., Waugh,M.E., Yu,J., Boller,T., Kamoun,S.,\n Lam,S.T. and Judelson,H.S.\n TITLE Large-scale gene discovery in the oomycete Phytophthora infestans\n reveals likely components of phytopathogenicity shared with true\n fungi\n JOURNAL Mol. Plant Microbe Interact. 
18 (3), 229-243 (2005)\n PUBMED 15782637\nCOMMENT Contact: Judelson HS\n Department of Plant Pathology\n University of California\n Weber Hall, Riverside, CA 92521, USA\n Tel: 909 787 4199\n Fax: 909 787 4294\n Email: howard.judelson@ucr.edu.\nFEATURES Location/Qualifiers\n source 1..921\n /mol_type="mRNA"\n /db_xref="taxon:4787"\n /sex="A1"\n /note="Vector: pSPORT1"\n /strain="88069"\n /organism="Phytophthora infestans"\n /clone_lib="LIBEST_016732 mycelium, Plich medium"\nORIGIN\n 1 tcactatagg gaaagctggt acgcctgcag gtaccggtcc ggaattcccg gtcgacccac\n 61 gcgtccggac gcaacttctt ttcgcaatgt tggccgctaa gtctctctct cgttgccggt\n 121 gttggacgtc gcttgctcgt agcgtcacgt ggcatggctg gaggccgtgc tgcctttaat\n 181 tggcgtgatc cgcttatgct ggatggccag ctgacggacg aggaggccat gattcaaaaa\n 241 tcggccaacg actactgcca ggggcaactg ctgccgcgca ttggagaagc gaaccgtaag\n 301 ggcaagtttg accgctccat tatgaaggaa atgggcgaaa tgggcttcct tggtcccacg\n 361 gtccagggct acggctgcgc cggtgtgggc tacgtgtcct atggactcat tgcgaacgca\n 421 gtggagcgtg ttgacagcgc ctacaggtcg gcgatgagtg tgcagtcgtc tctggtaatg\n 481 cacccaatta accaattcgg atctgacgag cagaaggaaa agtacctccc tcgtcttggc\n 541 actggcgaac tcattggctg gttcggcttg acggagccga aacacggatc agaccctgga\n 601 tcaatggaga cgcgtgctag actcaaagga gacaagtaca tcctcaacgg ctccaagaac\n 661 tggatcacca acgctccgat cgctgacgtg ttcctcgtct gggccaagga cgacgagggc\n 721 gacatccgtg gtttcattct ggagaaggtg ggtttactta ctcttcactt gtcacttgca\n 781 gttgactcac gtaacttcac ctgcatctac aggacttccc tggcctatca gctccctaca\n 841 tcgaaggcaa ggcgacgttg ttggcatctg ctactggtat gatcttcctg gaagacgtcg\n 901 aaagttccca aggagaacat g\n//\n'])

      • nucseqfetch: output String

        clotho.run("org.andersonlab.py_nucseqfetch", ["1234890"])

      • protein_by_gene: output String

        clotho.run("org.andersonlab.py_proteinbygene", ["Salmonella enterica", "tyrB", 5])

      • protein_by_name: output String

        clotho.run("org.andersonlab.py_proteinbyname", ["Halobacterium salinarum", "NAD-specific glutamate dehydrogenase A", 2])

      • poly_to_fasta: output String

        clotho.run("org.andersonlab.py_polytoFASTA", ['195591271'])

      • registry_collector: output String

        clotho.run('org.andersonlab.py_fetchRegistry', ['BBa_K189004'])

      • fetch_uniprot: output dict

        clotho.run("org.andersonlab.py_fetchuniprot", ['Q6GZX3'])

      • convertpoly: output String

        clotho.run("org.andersonlab.py_convertpoly", ["374333820"])

      • reg_to_poly

        clotho.run("org.andersonlab.py_registryToPoly", ["org.registry.part.BBa_K189004"])

      • oligo_to_poly

        clotho.run("org.andersonlab.py_oligoToPoly", ["ca581F"])

    • testing:

      • ActToOperon: output String

        clotho.run("org.andersonlab.py_act_to_operon", ["1-Butanol"])

      • id_to_poly: output String

        clotho.run("org.andersonlab.py_IDToPoly", ["BBa_B0000"])

        (not JSON serializable)

      • protein_to_orf

        clotho.run("org.andersonlab.py_ProteinToORF", ['ABW97930.1'])

        (not JSON serializable)

      • Blast functions (BlastN, BlastP, BlastX, TBlastN, TBlastX)

        clotho.run("org.andersonlab.py_blastp", ['mikkkkkmrkiiyfdfknfskfckkkfykyffnl', 3])

        clotho.run("org.andersonlab.py_blastx", ['aagatggagaggcaaaattaaaatctatgaaaaattacaaaaaatttat', 3])

        clotho.run("org.andersonlab.py_tblastn", ["mevrrggadgftvtlpslalargevaaltgqsgcgkstllemigailrpdtlgeyrlhqpevdiaaplmaanevamsairarelgfvlqhggllpwltvidnivlprrlagmdihshwlr", 3])

        clotho.run("org.andersonlab.py_tblastx", ['aagatggagaggcaaaattaaaatctatgaaaaattacaaaaaatttat', 3])

        (not JSON serializable)

    • to test:

      • convertfile, filenucseqfetch