# 1. Partial Correlations

Modify the code of the `partial_cor()` function from Class 9d so that it takes as input a correlation matrix and the name of one variable, and returns the partial correlation for each pair of the remaining variables, controlling for the variable you passed to the function.

For example, with the correlation matrices that we calculated in Class 9d, you could call `partial_cor(cor1, 'CAI')`. This would return the partial correlation of each pair of variables (excluding CAI), controlling for CAI.
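For reference, the first-order partial correlation between two variables $i$ and $j$, controlling for a third variable $x$, is commonly written as follows. The symbols $r_{ij}$, $r_{ix}$ and $r_{jx}$ are notation introduced here for the corresponding entries of the correlation matrix; they do not appear in the assignment text.

```latex
r_{ij \cdot x} = \frac{r_{ij} - r_{ix}\, r_{jx}}{\sqrt{\left(1 - r_{ix}^{2}\right)\left(1 - r_{jx}^{2}\right)}}
```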

To do this, please save the `result` dataframe from Class 9d to a file, copy the file to the homework directory, read the file into a dataframe, calculate the correlation matrix, and finally test your function.
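The save, read, and correlate steps can be sketched on a toy dataframe. This is only an illustration: the values, the `gene` column, and the names `toy_result`, `reloaded`, and `toy_cor` are invented here, and a temporary directory stands in for the homework directory.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the Class 9d `result` dataframe; all values are invented.
toy_result = pd.DataFrame({
    "gene": ["g1", "g2", "g3", "g4", "g5"],  # hypothetical text column
    "dS": [0.40, 0.55, 0.31, 0.62, 0.48],
    "dN": [0.04, 0.09, 0.02, 0.08, 0.07],
    "CAI": [0.61, 0.35, 0.72, 0.30, 0.44],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result_df.csv")
    # index=False keeps the row index from being written as an extra column
    toy_result.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# numeric_only=True skips the text column when computing Pearson correlations
toy_cor = reloaded.corr(numeric_only=True)
print(toy_cor.round(3))
```

Writing with `index=False` matters here: otherwise the saved index may come back as an extra numeric column and show up in the correlation matrix.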

In [9]:
import pandas as pd
from math import sqrt


result = pd.read_csv("result_df.csv")
cor1 = result.corr(numeric_only=True)

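def partial_cor(cor, x):
    """Return the partial correlations of all variable pairs in `cor`, controlling for `x`."""
    # Start from a copy with the control variable's row and column removed
    partialCor = cor.copy()
    partialCor = partialCor.drop(x, axis=0)
    partialCor = partialCor.drop(x, axis=1)

    for i in partialCor.columns:
        for j in partialCor.columns:
            if i != j:  # the diagonal stays at 1
                # r_ij.x = (r_ij - r_ix * r_jx) / sqrt((1 - r_ix^2) * (1 - r_jx^2))
                num = cor.loc[i, j] - cor.loc[i, x] * cor.loc[j, x]
                denom = sqrt((1 - cor.loc[i, x] ** 2) * (1 - cor.loc[j, x] ** 2))
                partialCor.loc[i, j] = num / denom

    return partialCor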

print(partial_cor(cor1, 'CAI'))

                      dS        dN     dN/dS  dS adjusted  dN/dS adjusted  \
dS              1.000000  0.334624  0.081485     1.000000        0.145174   
dN              0.334624  1.000000  0.942190     0.334624        0.967828   
dN/dS           0.081485  0.942190  1.000000     0.081485        0.992111   
dS adjusted     1.000000  0.334624  0.081485     1.000000        0.145175   
dN/dS adjusted  0.145174  0.967828  0.992111     0.145175        1.000000   
fitness         0.072793  0.171000  0.174507     0.072793        0.173285   
fdS             0.922889  0.227405 -0.007260     0.922889        0.056628   
fdN             0.370816  0.827892  0.818125     0.370816        0.812879   
fdNdS           0.136501  0.798871  0.854336     0.136502        0.829747   
fdSadj          0.994316  0.322453  0.067914     0.994316        0.132763   
fdNdSadj        0.216480  0.824645  0.858034     0.216480        0.842410   

                 fitness       fdS       fdN     fdNdS    fdSadj  fdNdSadj 
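One way to sanity-check a partial correlation is to compare the formula with the correlation of regression residuals: regress each of the two variables on the control variable and correlate what is left over. The sketch below does this on synthetic data. The variables `a`, `b`, `z` and the helper `residuals` are hypothetical and are not part of the Class 9d data.

```python
from math import sqrt

import numpy as np
import pandas as pd

# Synthetic data: a and b both depend on z (all names are illustrative)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
a = 0.8 * z + rng.normal(size=n)
b = -0.5 * z + rng.normal(size=n)
toy = pd.DataFrame({"a": a, "b": b, "z": z})
cor = toy.corr()

# Partial correlation of a and b controlling for z, from the correlation matrix
r_formula = (cor.loc["a", "b"] - cor.loc["a", "z"] * cor.loc["b", "z"]) / sqrt(
    (1 - cor.loc["a", "z"] ** 2) * (1 - cor.loc["b", "z"] ** 2)
)

def residuals(y, x):
    """Residuals of a simple least-squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Same quantity, computed as the correlation between the two residual series
res_a = residuals(toy["a"].to_numpy(), toy["z"].to_numpy())
res_b = residuals(toy["b"].to_numpy(), toy["z"].to_numpy())
r_resid = np.corrcoef(res_a, res_b)[0, 1]

print(r_formula, r_resid)
```

The two printed values should agree up to floating-point error, which gives an independent check on a function such as `partial_cor()`.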

# 2. Results from Wall et al.

Download the HTML page for the Wall et al. paper from the PNAS website (https://www.pnas.org/content/102/15/5483/) using your browser. Then use BeautifulSoup (not pandas) to parse the HTML and scrape Table 1. Finally, display the contents of the BeautifulSoup variable containing the table using the `IPython.display.Markdown` function.

(Unfortunately, we can no longer download the HTML for the page automatically: the PNAS site uses Cloudflare to prevent scraping, so we cannot use Python's `requests` package to download the page.)

In [13]:
from bs4 import BeautifulSoup
import pandas as pd
from IPython.display import display, Markdown


# Read the saved page; the with-block closes the file automatically
with open("wall.html") as file:
    html = file.read()

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")  # first <table> on the page (Table 1)
rows = table.find_all('tr')
totalTable = []

for tr in rows:
    row = []
    # header cells first, then data cells, as plain stripped text
    for th in tr.find_all('th'):
        row.append(th.get_text().strip())
    for td in tr.find_all('td'):
        row.append(td.get_text().strip())
    totalTable.append(row)

finalTable = pd.DataFrame(totalTable)

finalDataTable = finalTable.to_markdown(index=False)

display(Markdown(finalDataTable))

| 0              | 1                | 2        | 3              | 4         | 5        | 6         |
|:---------------|:-----------------|:---------|:---------------|:----------|:---------|:----------|
| Evolution rate | Dispensability   | rdk      | Expression     | rxk       | rdk\|x   | xk\|d     |
| dN/dS′         | Warringer et al. | 0.239 np | mRNA abundance | -0.368 np | 0.183 np | -0.328 np |
|                |                  |          | CAI            | -0.528 np | 0.190 np | -0.513 np |
| dN             | Warringer et al. | 0.237 np | mRNA abundance | -0.363 np | 0.181 np | -0.324 np |
|                |                  |          | CAI            | -0.493 np | 0.189 np | -0.478 np |
| dN/dS′         | SGTC             | 0.230 np | mRNA abundance | -0.368 np | 0.166 np | -0.330 np |
|                |                  |          | CAI            | -0.528 np | 0.187 np | -0.516 np |
| dN             | SGTC             | 0.227 np | mRNA abundance | -0.363 np | 0.163 np | -0.325 np |
|                |                  |          | CAI            | -0.493 np | 0.185 np | -0.479 np |
| dN/dS′         | Warringer et al. | 0.274    | mRNA abundance | -0.279    | 0.259    | -0.256    |
|                |                  |          | CAI            | -0.522    | 0.241    | -0.505    |
| dN             | Warringer et al. | 0.274    | mRNA abundance | -0.282    | 0.259    | -0.259    |
|                |                  |          | CAI            | -0.509    | 0.241    | -0.491    |
| dN/dS′         | SGTC             | 0.264    | mRNA abundance | -0.279    | 0.252    | -0.258    |
|                |                  |          | CAI            | -0.522    | 0.232    | -0.505    |
| dN             | SGTC             | 0.264    | mRNA abundance | -0.282    | 0.251    | -0.262    |
|                |                  |          | CAI            | -0.509    | 0.232    | -0.491    |
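Cells scraped from the page can contain a literal `|` (for example `rdk|x` in the header row), which Markdown treats as a column separator. A minimal sketch of one way to handle this is to escape the character while building the table text. The helper `to_markdown_table` and the `sample_rows` list are hypothetical, written with the standard library only, and are not part of the solution above.

```python
def to_markdown_table(rows):
    """Build a Markdown table from a list of rows; the first row is the header."""
    def escape(cell):
        # A backslash keeps "|" from being read as a column separator
        return str(cell).replace("|", "\\|")

    header, *body = rows
    lines = [
        "| " + " | ".join(escape(c) for c in header) + " |",
        "|" + "|".join(":---" for _ in header) + "|",
    ]
    for row in body:
        # Pad short rows so every line has the same number of columns
        padded = list(row) + [""] * (len(header) - len(row))
        lines.append("| " + " | ".join(escape(c) for c in padded) + " |")
    return "\n".join(lines)

# A few cells taken from the scraped table above
sample_rows = [
    ["Expression", "rxk", "rdk|x"],
    ["CAI", "-0.528 np", "0.190 np"],
]
md = to_markdown_table(sample_rows)
print(md)
```

In a notebook, the resulting string could be passed to `display(Markdown(md))` in the same way as `finalDataTable`.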