# 1. Partial Correlations

Modify the code of the `partial_cor()` function from Class 09d, so that it takes as input a correlation matrix and the name of one variable, and returns the partial correlation for each pair of variables controlled for the variable you passed to the function.

For example, with the correlation matrices that we calculated in Class 9d, you could call `partial_cor(cor1, 'CAI')`. This would return the partial correlation of each pair of variables (excluding CAI), controlling for CAI.

To do this, please save the `result` dataframe from Class 9d to a file, copy the file to the homework directory, read the file into a dataframe, calculate the correlation matrix, and finally test your function.
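
For reference, the quantity the function needs for each pair is the standard first-order partial correlation. The symbols are notation introduced here, not taken from the class material: $x$ and $y$ are the pair of variables, $z$ is the variable being controlled for, and each $r$ is an entry of the correlation matrix.

$$
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
$$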

In [1]:
import pandas as pd
import numpy as np

# Read the data from a CSV file
data = pd.read_csv('result_df.csv')

# Describe the data to check for issues
data.describe()

# Strip white-space function
def stripper(x):
    return x.strip()

data['ORF'] = data['ORF'].apply(stripper)

cor1 = data.corr(numeric_only=True)
cor2 = data.corr(method='spearman', numeric_only=True)

# Save the Pearson correlation matrix (cor1) to a CSV file
cor1.to_csv("correlation_matrix.csv")

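# Compute the partial correlation of every pair of variables, controlling for var_name
def partial_cor(cor, var_name):
    partial_correlation = {}
    for col1 in cor.columns:
        for col2 in cor.columns:
            # Skip the diagonal and any pair that involves the control variable
            if col1 != col2 and col1 != var_name and col2 != var_name:
                # Local names chosen so they do not shadow the global cor1 and cor2
                r_xy = cor.loc[col1, col2]      # correlation between the pair
                r_xz = cor.loc[col1, var_name]  # first variable vs. the control
                r_yz = cor.loc[col2, var_name]  # second variable vs. the control
                partial_correlation[(col1, col2)] = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    # Reshape the {(row, column): value} dictionary into a square dataframe
    return pd.Series(partial_correlation).unstack()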

# Load the correlation matrix dataframe from file
df = pd.read_csv("correlation_matrix.csv", index_col=0)

# Test the partial_cor function
partial_correlation_matrix = partial_cor(df, 'CAI')
print("Partial Correlation Matrix:")
print(partial_correlation_matrix)



Partial Correlation Matrix:
                      dN     dN/dS  dN/dS adjusted        dS  dS adjusted  \
dN                   NaN  0.942190        0.967828  0.334624     0.334624   
dN/dS           0.942190       NaN        0.992111  0.081485     0.081485   
dN/dS adjusted  0.967828  0.992111             NaN  0.145174     0.145175   
dS              0.334624  0.081485        0.145174       NaN     1.000000   
dS adjusted     0.334624  0.081485        0.145175  1.000000          NaN   
fdN             0.827892  0.818125        0.812879  0.370816     0.370816   
fdNdS           0.798871  0.854336        0.829747  0.136501     0.136502   
fdNdSadj        0.824645  0.858034        0.842410  0.216480     0.216480   
fdS             0.227405 -0.007260        0.056628  0.922889     0.922889   
fdSadj          0.322453  0.067914        0.132763  0.994316     0.994316   
fitness         0.171000  0.174507        0.173285  0.072793     0.072793   

                     fdN     fdNdS  fdNdSadj   
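
The nested loop in `partial_cor()` can also be written with array operations. The following is a minimal sketch, not part of the assignment: the name `partial_cor_vectorized` and the toy correlation matrix are hypothetical, and the code assumes the matrix has identical row and column labels. Unlike the loop version, it returns 1 on the diagonal instead of NaN.

```python
import numpy as np
import pandas as pd


def partial_cor_vectorized(cor, var_name):
    """Partial correlation of every pair of variables, controlling for var_name.

    Hypothetical helper: assumes `cor` is a square correlation matrix whose
    index and columns carry the same labels.
    """
    others = cor.columns.drop(var_name)                  # every variable except the control
    r = cor.loc[others, others].to_numpy(dtype=float)    # pairwise correlations
    z = cor.loc[others, var_name].to_numpy(dtype=float)  # correlations with the control
    numerator = r - np.outer(z, z)
    denominator = np.sqrt(np.outer(1 - z**2, 1 - z**2))
    return pd.DataFrame(numerator / denominator, index=others, columns=others)


# Toy correlation matrix (made-up values, for illustration only)
labels = ["x", "y", "z"]
toy_cor = pd.DataFrame(
    [[1.0, 0.5, 0.6],
     [0.5, 1.0, 0.4],
     [0.6, 0.4, 1.0]],
    index=labels,
    columns=labels,
)

toy_partial = partial_cor_vectorized(toy_cor, "z")
print(toy_partial)
```

On the real data, the call would be `partial_cor_vectorized(df, 'CAI')`. Its off-diagonal entries should match the output of the loop version.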

# 2. Results from Wall et al.

Download the HTML page for the Wall et al. paper from the PNAS website (https://www.pnas.org/content/102/15/5483/) using your browser. Then use BeautifulSoup (not pandas) to parse the HTML and scrape Table 1. Finally, display the BeautifulSoup variable containing the table using the `IPython.display.Markdown` function.

(Unfortunately, we can no longer download the HTML for the page automatically: the PNAS site uses Cloudflare to block scraping, so Python's `requests` package cannot fetch the page.)

In [3]:
from bs4 import BeautifulSoup
import pandas as pd
from IPython.display import Markdown

# Load the HTML file (assuming it's saved as 'wall_et_al_paper.html')
with open('wall_et_al_paper.html', 'r', encoding='utf-8') as file:
    html_content = file.read()

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

table = soup.find("table")
fullTable2 = []

rows = table.find_all('tr')
for tr in rows:
    line = []
    if tr.find_all('th'):
        columnNames = []
        for th in tr.find_all('th'):
            columnNames.append(th.get_text().strip())
    else:
        for td in tr.find_all('td'):
            line.append(td.get_text().strip())
        fullTable2.append(line)

newTable = pd.DataFrame(fullTable2, columns=columnNames)
# Convert DataFrame to Markdown format
markdown_table = newTable.to_markdown(index=False)

# Display the Markdown table
Markdown(markdown_table)


| Evolution rate   | Dispensability   | rdk      | Expression     | rxk       | rdk\|x   | xk\|d     |
|:-----------------|:-----------------|:---------|:---------------|:----------|:---------|:----------|
| dN/dS′           | Warringer et al. | 0.239 np | mRNA abundance | -0.368 np | 0.183 np | -0.328 np |
|                  |                  |          | CAI            | -0.528 np | 0.190 np | -0.513 np |
| dN               | Warringer et al. | 0.237 np | mRNA abundance | -0.363 np | 0.181 np | -0.324 np |
|                  |                  |          | CAI            | -0.493 np | 0.189 np | -0.478 np |
| dN/dS′           | SGTC             | 0.230 np | mRNA abundance | -0.368 np | 0.166 np | -0.330 np |
|                  |                  |          | CAI            | -0.528 np | 0.187 np | -0.516 np |
| dN               | SGTC             | 0.227 np | mRNA abundance | -0.363 np | 0.163 np | -0.325 np |
|                  |                  |          | CAI            | -0.493 np | 0.185 np | -0.479 np |
| dN/dS′           | Warringer et al. | 0.274    | mRNA abundance | -0.279    | 0.259    | -0.256    |
|                  |                  |          | CAI            | -0.522    | 0.241    | -0.505    |
| dN               | Warringer et al. | 0.274    | mRNA abundance | -0.282    | 0.259    | -0.259    |
|                  |                  |          | CAI            | -0.509    | 0.241    | -0.491    |
| dN/dS′           | SGTC             | 0.264    | mRNA abundance | -0.279    | 0.252    | -0.258    |
|                  |                  |          | CAI            | -0.522    | 0.232    | -0.505    |
| dN               | SGTC             | 0.264    | mRNA abundance | -0.282    | 0.251    | -0.262    |
|                  |                  |          | CAI            | -0.509    | 0.232    | -0.491    |
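
Two features of the scraped table can make it harder to read or reuse. Header cells such as `rdk|x` contain a literal pipe, which Markdown treats as a column separator unless it is escaped. The CAI rows also leave the first three cells empty. The sketch below shows one possible clean-up step to run before building the dataframe. The helper names `escape_pipes` and `fill_down` are hypothetical, and the sketch assumes that an empty leading cell means "same value as the row above", which the scraped HTML does not state.

```python
def escape_pipes(text):
    """Escape literal pipes so Markdown does not read them as column separators."""
    return text.replace("|", r"\|")


def fill_down(rows, n_cols):
    """Copy the first n_cols cells from the previous row when they are all empty.

    Assumption: a blank leading cell repeats the value from the row above.
    """
    filled = []
    previous = None
    for row in rows:
        row = list(row)  # work on a copy so the input rows stay unchanged
        if previous is not None and all(cell == "" for cell in row[:n_cols]):
            row[:n_cols] = previous[:n_cols]
        filled.append(row)
        previous = row
    return filled


# Header and the first two data rows, copied from the scraped table
header = ["Evolution rate", "Dispensability", "rdk", "Expression", "rxk", "rdk|x", "xk|d"]
rows = [
    ["dN/dS′", "Warringer et al.", "0.239 np", "mRNA abundance", "-0.368 np", "0.183 np", "-0.328 np"],
    ["", "", "", "CAI", "-0.528 np", "0.190 np", "-0.513 np"],
]

safe_header = [escape_pipes(name) for name in header]
filled = fill_down(rows, n_cols=3)

print(safe_header)
for row in filled:
    print(row)
```

In the notebook, the same two steps would be applied to `columnNames` and `fullTable2` before the `pd.DataFrame(...)` call.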