__1. Math Overload__

Use the **math** module to compute the following:

* $4^8$
* $\log_{10}(3)$
* $e^3$
* $\cos(\pi)$
* $\ln(e^3)$

In [2]:
import math

print math.pow(4,8)
print math.log10(3)
print math.exp(3)
print math.cos(math.pi)
## math.log uses e as the default base, so this is the natural log
print math.log(math.exp(3))

65536.0
0.47712125472
20.0855369232
-1.0
3.0
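
The comment in the cell above notes that `math.log` defaults to base $e$. As a small illustrative sketch (not part of the original solution), the same function also accepts an optional second argument giving the base, so $\log_{10}(3)$ can be computed two ways. The variable names below are made up for this example, and `print(...)` with a single argument is used so the block should run under both Python 2.7 and Python 3.

```python
import math

# math.log(x) is the natural log; math.log(x, base) uses the given base.
natural = math.log(math.exp(3))
base10_direct = math.log10(3)
base10_via_log = math.log(3, 10)

# The ** operator keeps integers as integers, whereas math.pow(4, 8)
# always returns a float.
power = 4 ** 8

print(natural)
print(abs(base10_direct - base10_via_log) < 1e-12)
print(power)
```

The two base-10 results are compared with a tolerance rather than `==` because they are computed by different floating-point routes and may differ in the last digits.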


__2. Making Directories__

Bioinformatics projects require an organized system of files and directories. Use the **subprocess** module to create a mock file system by doing the following (all within a Jupyter Notebook):

* (A) Create a *Mock_Project* directory within your /2017_Winter_Python/4.2_Modules_Numpy_Scipy folder
* (B) Create *data*, *processed_data*, *scripts*, and *plots* directories inside *Mock_Project*.
* (C) Print a list of the files in the *Mock_Project* directory
* (D) Save the list of all files in the Mock_Project directory to a file called 'directory.txt'

In [1]:
##(A)
import subprocess as sp
sp.check_output('mkdir Mock_Project', shell=True)

''

In [2]:
##(B)
sp.check_output('mkdir Mock_Project/data', shell=True)
sp.check_output('mkdir Mock_Project/processed_data', shell=True)
sp.check_output('mkdir Mock_Project/scripts', shell=True)
sp.check_output('mkdir Mock_Project/plots', shell=True)

''

In [3]:
##(C)
out = sp.check_output('ls Mock_Project', shell=True)
print out

data
plots
processed_data
scripts



In [5]:
##(D)
sp.check_output('ls Mock_Project > Mock_Project/directory.txt',
                shell=True)

''
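
Steps (A) through (D) can also be collected into one function. The sketch below is one possible variation, not the original solution: the names `make_project`, `SUBDIRS`, and `demo_root` are hypothetical, it passes each command as a list instead of using `shell=True`, and it works in a temporary directory so it does not touch a real project folder. It assumes a Unix-like system where `mkdir` and `ls` are available.

```python
import os
import subprocess as sp
import tempfile

SUBDIRS = ['data', 'processed_data', 'scripts', 'plots']

def make_project(root, subdirs=SUBDIRS):
    """Create root and its subdirectories, then save a listing of root."""
    for sub in subdirs:
        # -p creates missing parent directories and does not fail
        # if the directory already exists.
        sp.check_call(['mkdir', '-p', os.path.join(root, sub)])

    # Capture the listing first, then write it to directory.txt.
    listing = sp.check_output(['ls', root]).decode()
    with open(os.path.join(root, 'directory.txt'), 'w') as fh:
        fh.write(listing)
    return listing.split()

# Try it out in a throwaway location.
demo_root = os.path.join(tempfile.mkdtemp(), 'Mock_Project')
created = make_project(demo_root)
print(created)
```

One difference from the shell redirect in (D): with `ls Mock_Project > Mock_Project/directory.txt` the shell creates *directory.txt* before `ls` runs, so the file lists itself. Here the listing is captured before the file is written, so the first call records only the four directories.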

__3. BLAST a Fasta File__

Write a function that takes the name of a fasta file as input, and prints the results of a BLAST search of every sequence in the file. Use the __blastn__ option __-outfmt 6__ to give the output in summary tabular form. Test this function with the file *test.fasta* in this directory.

Feel free to modify BLAST code from the above lecture, and to re-use your Fasta parser from this morning's solutions. Bonus: Allow an option to save the output to a file.

In [17]:
#%%writefile python_blast.py
import subprocess as sp

#Use our old fasta parser
def parse_fasta(infile):
    genes = {}

    # Appending to a list is much, much faster than
    # concatenating strings.
    # The with block closes the file once parsing is done.
    with open(infile, 'r') as fh:
        for line in fh:
            line = line.strip()
            if line.startswith('>'):
                ID = line[1:]
                genes[ID] = []
            else:
                genes[ID].append(line)
            
    # Turn the dictionary of lists into a dictionary of
    # strings with the join method.
    for gene in genes:
        genes[gene] = ''.join(genes[gene])
    
    return genes
        
# This will blast a specific sequence. If no outfh is given, it
# returns the output; otherwise, it writes the output to outfh.
def blastn(querySeq, outfh = None):
    # Make the input file
    fh = open('query.seq', 'w')
    fh.write(querySeq)
    fh.close()
    
    command = 'blastn -db yeast_genome -query query.seq -outfmt 6'
    out = sp.check_output(command, shell=True)
    
    if outfh:
        # Write the output to the given filehandle
        outfh.write(out)
        return None
    else:
        # Simply return the output
        return out

# This wrapper function calls the parser
# and blasts all sequences to the db.
def blast_fasta(fasta_file, output_file = None):
    fasta_dict = parse_fasta(fasta_file)
    
    if output_file:
        outfh = open(output_file, 'w')
    else:
        # We will still need something to pass blastn,
        # but that something can be Nothing.
        outfh = None
    
    results = {name: blastn(seq, outfh)
               for name, seq in fasta_dict.items()}
    # Or, equivalently:
    #results = {}
    #for name, seq in fasta_dict.items():
    #    results[name] = blastn(seq, outfh)

    if outfh:
        outfh.close()
    
    return results

if __name__ == '__main__':
    # This special line makes the following code not run when imported.
    print blast_fasta('test.fasta', output_file = 'test_blastn.txt')
    print blast_fasta('test.fasta')

{'test1': None, 'test3': None, 'test2': None}
{'test1': 'Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0\t1\t80\t255037\t254958\t1e-36\t148\n', 'test3': 'Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0\t1\t80\t254877\t254798\t1e-36\t148\n', 'test2': 'Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0\t1\t80\t254957\t254878\t1e-36\t148\n'}
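
The strings returned above are raw tab-separated lines. With `-outfmt 6` and no extra format specifiers, BLAST+ writes twelve columns in a fixed order, so each line can be turned into a dictionary. The sketch below is an illustration only: `parse_outfmt6` and the constant names are hypothetical helpers, and the example line is copied from the *test1* result shown above. Under Python 3, `check_output` returns bytes, so the output would need `.decode()` before being passed in.

```python
# Default column order for blastn -outfmt 6.
OUTFMT6_COLUMNS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch',
                   'gapopen', 'qstart', 'qend', 'sstart', 'send',
                   'evalue', 'bitscore']
INT_COLUMNS = ['length', 'mismatch', 'gapopen',
               'qstart', 'qend', 'sstart', 'send']
FLOAT_COLUMNS = ['pident', 'evalue', 'bitscore']

def parse_outfmt6(text):
    """Return one dictionary per hit line in tabular BLAST output."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_COLUMNS, line.split('\t')))
        # Convert the numeric columns from strings.
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits

example = ('Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0'
           '\t1\t80\t255037\t254958\t1e-36\t148\n')
hits = parse_outfmt6(example)
print(hits[0]['sseqid'])
print(hits[0]['pident'])
```

In this hit `sstart` is larger than `send`, which is how tabular BLAST output reports a match on the reverse strand of the subject. Every query is labeled `Query_1`, most likely because *query.seq* is written without a FASTA header line; writing `'>' + name` before the sequence should make the first column carry the sequence name instead.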


__4. Your own BLAST module__

Turn the code you wrote above into your own BLAST module, which should include functions to BLAST a specific sequence, as well as to BLAST all sequences in a given Fasta file.

Save these functions to a file, then add the folder containing your module to your PYTHONPATH by modifying your **~/.bashrc** file.

Finally, try importing and using your new Python BLAST module.

In [16]:
# You should have saved the module from above as something
# (e.g. python_blast.py) in some directory (e.g.
# ~/Dropbox/2018_winter_python/my_modules). Next, you should have added
# that directory to your PYTHONPATH by adding:
#
# PYTHONPATH=$PYTHONPATH:~/Dropbox/2018_winter_python/my_modules
# export PYTHONPATH
#
# or similar to your ~/.bashrc file.
#
# If you have done that correctly, the following should work:
import python_blast

print python_blast.blast_fasta('test.fasta')

{'test1': 'Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0\t1\t80\t255037\t254958\t1e-36\t148\n', 'test3': 'Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0\t1\t80\t254877\t254798\t1e-36\t148\n', 'test2': 'Query_1\tgi|6321039|ref|NC_001138.1|\t100.00\t80\t0\t0\t1\t80\t254957\t254878\t1e-36\t148\n'}
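
If the import fails, it can help to check whether Python actually sees the module directory. Entries from PYTHONPATH end up in `sys.path` when the interpreter starts, so a notebook launched before *~/.bashrc* was edited will not have the new entry. The sketch below is a hedged workaround for the current session only, not a replacement for setting PYTHONPATH: `ensure_on_path` is a hypothetical helper, and the directory is the example path from the comments above.

```python
import os
import sys

def ensure_on_path(directory):
    """Append directory to sys.path if it is not already there."""
    directory = os.path.abspath(os.path.expanduser(directory))
    if directory not in sys.path:
        sys.path.append(directory)
    return directory

# Example location from the comments above; adjust to your own folder.
module_dir = ensure_on_path('~/Dropbox/2018_winter_python/my_modules')
print(module_dir in sys.path)
```

After this runs, `import python_blast` should find the module for the rest of the session, provided the file really is in that folder.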
