## Hands-On 10: PWM_One - Constructing PWM [Updated: March 13, 2023]

### Problem A: [5' Splice Sites]

From Hands-On Ten, Question 7: <br><br>
7) To double-check the values of Table 4, run the Python program, create_pwm.py, that:

* Takes as input the seven 9-mers created in part 1) of this hands-on exercise, saved in the file:  <font color=blue>**mog_9mers.txt**</font>
* Computes the probability of occurrence of each base at each position. 
* Computes the 36 log-odds scores (log base 2 of observed/expected), where the expected frequency is 0.28 for A and T and 0.22 for C and G, using a pseudocount of 1 for each base.
* Writes the 36 entries of the PWM to the file <font color=blue>**mog_9mers_pwm.txt**</font>
  - Note: <font color=blue>**mog_9mers_pwm.txt**</font> has 4 rows with 9 values in each row, as in Table 4.
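
The log-odds step in the list above can be written as a single formula. This is a sketch inferred from the stated inputs (seven 9-mers, a pseudocount of 1 per base, and the base-2 logarithm used in create_pwm.py); the symbols $n_{b,i}$, $N$, and $q_b$ are notation introduced here for illustration and are not taken from Table 4:

```latex
s_{b,i} = \log_2 \frac{(n_{b,i} + 1)/(N + 4)}{q_b},
\qquad N = 7, \quad q_A = q_T = 0.28, \quad q_C = q_G = 0.22
```

Here $n_{b,i}$ is the number of 9-mers with base $b$ at position $i$, and the 4 in the denominator is one pseudocount per base. For example, all seven 9-mers have G at position 4, so $s_{G,4} = \log_2\big((8/11)/0.22\big) \approx 1.725$, while A never occurs there, so $s_{A,4} = \log_2\big((1/11)/0.28\big) \approx -1.623$.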


The seven 9-mers in <font color=blue>**mog_9mers.txt**</font>:
* CAGGTAAGA 
* AAGGTGAGT 
* GAGGTACAG
* TAGGTGAGT
* TTGGTAAGT
* CAGGTGCAG
* TACGTAAGT
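
As an independent cross-check of Table 4, the counts and log-odds scores can also be computed directly from the seven 9-mers listed above, without reading a file. The following is a minimal sketch, not the course program: the names `NINE_MERS`, `EXPECTED`, `column_counts`, and `log_odds` are hypothetical and chosen here for illustration.

```python
from collections import Counter
import math

# The seven 9-mers listed above (the contents of mog_9mers.txt)
NINE_MERS = [
    "CAGGTAAGA", "AAGGTGAGT", "GAGGTACAG", "TAGGTGAGT",
    "TTGGTAAGT", "CAGGTGCAG", "TACGTAAGT",
]

# Expected (background) frequencies given in the exercise
EXPECTED = {"A": 0.28, "C": 0.22, "G": 0.22, "T": 0.28}


def column_counts(seqs):
    """Return one Counter per position, tallying the bases in that column."""
    return [Counter(column) for column in zip(*seqs)]


def log_odds(count, n_seqs, expected, pseudocount=1):
    """log2 of (pseudocount-adjusted probability / expected frequency)."""
    prob = (count + pseudocount) / (n_seqs + 4 * pseudocount)
    return math.log2(prob / expected)


counts = column_counts(NINE_MERS)

# Print one row per base, rounded to 3 decimals like create_pwm.py
for base in "ACGT":
    row = [round(log_odds(col[base], len(NINE_MERS), EXPECTED[base]), 3)
           for col in counts]
    print(base, row)
```

A `Counter` returns 0 for a base that never appears in a column, so absent bases need no special handling before the pseudocount is added. The printed rows should agree with the values the program writes to the output file, up to rounding.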

# create_pwm.py
# Author: Sami Khuri
# Last updated: March 1, 2023
# Purpose: Program to create a position weight matrix from the 9-mers of MOG
# Program uses the built-in function open(), the file methods readlines(),
# write(), and close(), the string method strip(), and math.log() from the math module
#
 
import math

# Initialize the PWM with four rows and nine columns [i.e., 4 lists of zeros]
a = [0]*9
c = [0]*9
g = [0]*9
t = [0]*9

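# Take mog_9mers.txt as input
# Read line by line, stripping surrounding whitespace (including the end of line
# character) and updating the PWM with the counts of each base at the 9 positions
input_file = open("mog_9mers.txt","r")
for line in input_file.readlines():
    line = line.strip()
    if line == "":
        continue    # skip blank lines, which would otherwise raise an IndexError
    for i in range(9):
        if line[i] == 'A':
            a[i] = a[i]+1
        elif line[i] == 'C':
            c[i] = c[i]+1
        elif line[i] == 'G':
            g[i] = g[i]+1
        elif line[i] == 'T':
            t[i] = t[i]+1
        else:
            # Do not silently count an unexpected character as T
            raise ValueError("Unexpected character: " + line[i])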
            
input_file.close()

# Compute the probability of occurrence of each base after adding the
#    Laplace pseudocount, i.e., +1 added to the count of each base.
#    The denominator is 11: 7 sequences + 4 pseudocounts (one per base).

# Compute the 36 log-odds scores (log of observed/expected) where expected is 0.22
#     for C and G and 0.28 for A and T, and the log is taken in base 2

for i in range(9):
    a[i] = round(math.log(((a[i] + 1)/11)/0.28,2),3)
    c[i] = round(math.log(((c[i] + 1)/11)/0.22,2),3)
    g[i] = round(math.log(((g[i] + 1)/11)/0.22,2),3)
    t[i] = round(math.log(((t[i] + 1)/11)/0.28,2),3)

# Write the 36 entries of the PWM to the file mog_9mers_pwm.txt:
# 4 rows (A, C, G, T in that order) with 9 tab-separated values in each row

output_file = open("mog_9mers_pwm.txt","w")
for i in range(9):
    output_file.write(str(a[i]) + '\t')
output_file.write("\n")
for i in range(9):
    output_file.write(str(c[i]) + '\t')
output_file.write("\n")
for i in range(9):
    output_file.write(str(g[i]) + '\t')
output_file.write("\n")
for i in range(9):
    output_file.write(str(t[i]) + '\t')
output_file.write("\n")

output_file.close()

Note: The output file of this hands-on exercise, <font color=blue>**mog_9mers_pwm.txt**</font>, is the input file for the program of the next hands-on exercise, score_9mers_MOG.py, which computes the scores of the 9-mers of the MOG gene.
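
To illustrate how a PWM file in this layout might be consumed, here is a minimal sketch of reading the four rows and scoring a sequence. score_9mers_MOG.py is not shown in this exercise, so this rests on two assumptions: the rows are in the order A, C, G, T, and the score of a k-mer is the sum of the log-odds entries for the base observed at each position. The names `parse_pwm` and `score_kmer` are hypothetical, and the small matrix uses made-up numbers rather than MOG values.

```python
def parse_pwm(text):
    """Parse 4 rows (assumed order A, C, G, T) of tab-separated values."""
    # split() with no argument ignores the trailing tab at the end of each row
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 4:
        raise ValueError("expected 4 rows (A, C, G, T)")
    return {base: [float(v) for v in row] for base, row in zip("ACGT", rows)}


def score_kmer(pwm, kmer):
    """Sum the PWM entry for the base observed at each position."""
    return sum(pwm[base][i] for i, base in enumerate(kmer))


# Tiny made-up 2-column PWM, for illustration only (not MOG values)
demo_text = "1.0\t-1.0\t\n-1.0\t0.5\t\n0.0\t0.0\t\n-2.0\t1.5\t\n"
demo_pwm = parse_pwm(demo_text)

print(score_kmer(demo_pwm, "AT"))  # 1.0 (A at position 1) + 1.5 (T at position 2) = 2.5
```

With the real file, the same functions could be applied as `parse_pwm(open("mog_9mers_pwm.txt").read())` followed by `score_kmer(pwm, "CAGGTAAGA")`.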
