# Plot data on structures

The goal of this notebook is to plot the data on various CHIKV structures using [`dms-viz`](https://dms-viz.github.io/v0/). There are two structures of CHIKV in complex with Mxra8: [6JO8](https://www.rcsb.org/structure/6JO8) from [this paper](https://www.sciencedirect.com/science/article/pii/S0092867419303940?via%3Dihub), and [6NK6](https://www.rcsb.org/structure/6nk6) from [this paper](https://www.cell.com/cell/pdf/S0092-8674(19)30392-7.pdf). We'll plot the data on the monomeric and trimeric forms of both structures.

[Here's](https://dms-viz.github.io/dms-viz-docs/) the documentation for `dms-viz`. Unfortunately, it's a little slow on larger structures and big datasets like this. You might have to wait a bit for your interactions to register.


In [2]:
import pandas as pd
import os
import sys

## Functional Scores

First, combine the average functional scores (the effect of mutations on cell entry) for each cell type into a single dataset.

In [3]:
# Average *observed* effect on cell entry for each cell line
TIM1_func_effects = pd.read_csv('../results/func_effects/averages/293T-TIM1_entry_func_effects.csv')
TIM1_func_effects["condition"] = 'TIM1'
MXRA8_func_effects = pd.read_csv('../results/func_effects/averages/293T-Mxra8_entry_func_effects.csv')
MXRA8_func_effects["condition"] = 'MXRA8'
C636_func_effects = pd.read_csv('../results/func_effects/averages/C636_entry_func_effects.csv')
C636_func_effects["condition"] = 'C636'

In [4]:
# Annotations for each site in CHIKV E
CHIKV_E_annotations = pd.read_csv('../data/site_numbering_map.csv').rename(columns={'sequential_site': 'site'})

In [5]:
# Combine all functional effects
combined_func_effects = (
    pd.concat([TIM1_func_effects, MXRA8_func_effects, C636_func_effects])
    .merge(
        CHIKV_E_annotations[['site', 'wildtype', 'region']], 
        on=['site', 'wildtype'],
        how='left'
    )
) 
combined_func_effects.head()

Unnamed: 0,site,wildtype,mutant,effect,effect_std,times_seen,n_selections,condition,region
0,1,M,I,-5.765,0.008958,17.25,4,TIM1,E3
1,1,M,L,-1.297,0.1075,0.5,2,TIM1,E3
2,1,M,M,0.0,0.0,,4,TIM1,E3
3,1,M,T,-5.745,0.0483,5.0,4,TIM1,E3
4,1,M,V,-5.748,0.01202,1.5,2,TIM1,E3
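A left merge on `site` and `wildtype` silently leaves `region` empty wherever the annotation table has no matching row. The sketch below, on made-up toy data rather than the real files, shows one way to surface such rows with the `indicator` option of `DataFrame.merge`. The frame names `effects` and `annotations` are placeholders for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are illustrative only)
effects = pd.DataFrame({
    "site": [1, 1, 2, 3],
    "wildtype": ["M", "M", "S", "L"],
    "mutant": ["I", "L", "A", "P"],
    "effect": [-5.7, -1.3, 0.1, -2.0],
})
annotations = pd.DataFrame({
    "site": [1, 2],
    "wildtype": ["M", "S"],
    "region": ["E3", "E3"],
})

# indicator=True adds a `_merge` column recording where each row matched
checked = effects.merge(
    annotations, on=["site", "wildtype"], how="left", indicator=True
)

# Rows with no annotation are flagged "left_only" and have a missing region
unmatched = checked.query('_merge == "left_only"')
print(unmatched[["site", "wildtype"]].drop_duplicates())

# A left merge only changes the row count if the annotation table
# contains duplicate (site, wildtype) keys
assert len(checked) == len(effects)
```

If `unmatched` is non-empty on the real data, the site numbering or wildtype identities in the two tables probably disagree for those sites.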


In [6]:
# Write to file
combined_func_effects.to_csv('./dms-viz/input/combined_functional_effects.csv', index=False)

### Make `dms-viz` JSONs

Now use `configure-dms-viz` to make a `dms-viz` JSON visualization file of the functional scores on both structures.

#### [6JO8](https://www.rcsb.org/structure/6JO8) Monomer w/ Mxra8

First, plot the functional scores for all three cell types on the monomeric CHIKV E structure. This structure contains the E1, E2, and E3 domains.

Currently, I'm applying a default filter of `times_seen > 2` to the data. We might want to explore more filters.
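To get a feel for other `times_seen` cutoffs before settling on one, it can help to count how many measurements survive each threshold. This is a minimal sketch on toy values, not the real dataset. The helper `rows_retained` is hypothetical, and it reports both the strict (`>`) and inclusive (`>=`) conventions because the choice changes the count at the boundary.

```python
import pandas as pd

# Toy measurements (values are illustrative only)
toy = pd.DataFrame({
    "mutant": ["I", "L", "T", "V", "A"],
    "times_seen": [17.25, 0.5, 5.0, 1.5, 2.0],
})

def rows_retained(df, thresholds, column="times_seen", inclusive=True):
    """Count rows that pass each threshold on `column`."""
    counts = {}
    for t in thresholds:
        mask = df[column] >= t if inclusive else df[column] > t
        counts[t] = int(mask.sum())
    return counts

print("inclusive:", rows_retained(toy, [0, 2, 5]))
print("strict:   ", rows_retained(toy, [0, 2, 5], inclusive=False))
```

Running the same helper on `combined_func_effects`, grouped by `condition`, would show how much data each cell line loses at a given cutoff.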

In [7]:
!configure-dms-viz format \
    --input ./dms-viz/input/combined_functional_effects.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6JO8_monomer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6JO8_monomer_functional_scores.json \
    --name "CHIKV Func. Scores" \
    --metric "effect" \
    --metric-name "Functional Effect" \
    --exclude-amino-acids "*, -" \
    --included-chains "A B" \
    --excluded-chains "C E D F M N" \
    --condition "condition" \
    --condition-name "Cell Line" \
    --tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.', 'region': 'Region'}" \
    --filter-cols "{'n_selections': '# of Selections', 'times_seen': 'Times Seen'}" \
    --filter-limits "{'times_seen': [0, 2, 25]}" \
    --structure "6JO8" \
    --colors "#0072B2,#CC79A7,#4C3549"

Formatting data for visualization using the 'effect' column from './dms-viz/input/combined_functional_effects.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6JO8_monomer_sitemap.csv'.
About 97.19% (762 of 784) of the wildtype residues in the data match the corresponding residues in the structure.
About 17.21% (163 of 947) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6JO8_monomer_functional_scores.json'


#### [6JO8](https://www.rcsb.org/structure/6JO8) Trimer w/ a single Mxra8

Next, plot the same functional scores on the trimer. I'm choosing to show only a single Mxra8 so we can see the functional scores with and without the receptor.

In [8]:
!configure-dms-viz format \
    --input ./dms-viz/input/combined_functional_effects.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6JO8_trimer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6JO8_trimer_functional_scores.json \
    --name "CHIKV Func. Scores" \
    --metric "effect" \
    --metric-name "Functional Effect" \
    --exclude-amino-acids "*, -" \
    --included-chains "A B C E D F" \
    --excluded-chains "M N" \
    --condition "condition" \
    --condition-name "Cell Line" \
    --tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.', 'region': 'Region'}" \
    --filter-cols "{'n_selections': '# of Selections', 'times_seen': 'Times Seen'}" \
    --filter-limits "{'times_seen': [0, 2, 25]}" \
    --structure "6JO8" \
    --colors "#0072B2,#CC79A7,#4C3549"

Formatting data for visualization using the 'effect' column from './dms-viz/input/combined_functional_effects.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6JO8_trimer_sitemap.csv'.
About 97.19% (762 of 784) of the wildtype residues in the data match the corresponding residues in the structure.
About 17.21% (163 of 947) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6JO8_trimer_functional_scores.json'


#### [6NK6](https://www.rcsb.org/structure/6nk6) Trimer w/ Mxra8

Also, plot the functional scores for all three cell types on the trimeric CHIKV E structure from cryo-EM of the VLP. This structure contains E1, E2, and the capsid.

In [9]:
!configure-dms-viz format \
    --input ./dms-viz/input/combined_functional_effects.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6NK6_trimer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6NK6_trimer_functional_scores.json \
    --name "CHIKV Func. Scores" \
    --metric "effect" \
    --metric-name "Functional Effect" \
    --exclude-amino-acids "*, -" \
    --included-chains "A B C D E F G H" \
    --excluded-chains "M O P" \
    --condition "condition" \
    --condition-name "Cell Line" \
    --tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.', 'region': 'Region'}" \
    --filter-cols "{'n_selections': '# of Selections', 'times_seen': 'Times Seen'}" \
    --filter-limits "{'times_seen': [0, 2, 25]}" \
    --structure "6NK6" \
    --colors "#0072B2,#CC79A7,#4C3549"

Formatting data for visualization using the 'effect' column from './dms-viz/input/combined_functional_effects.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6NK6_trimer_sitemap.csv'.
About 94.64% (812 of 858) of the wildtype residues in the data match the corresponding residues in the structure.
About 4.67% (42 of 900) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6NK6_trimer_functional_scores.json'


#### [6NK6](https://www.rcsb.org/structure/6nk6) Monomer w/ Mxra8

In [20]:
!configure-dms-viz format \
    --input ./dms-viz/input/combined_functional_effects.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6NK6_monomer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6NK6_monomer_functional_scores.json \
    --name "CHIKV Func. Scores" \
    --metric "effect" \
    --metric-name "Functional Effect" \
    --exclude-amino-acids "*, -" \
    --included-chains "A E" \
    --excluded-chains "N P B C D F G H J K L" \
    --condition "condition" \
    --condition-name "Cell Line" \
    --tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.', 'region': 'Region'}" \
    --filter-cols "{'n_selections': '# of Selections', 'times_seen': 'Times Seen'}" \
    --filter-limits "{'times_seen': [0, 2, 25]}" \
    --structure "6NK6" \
    --colors "#0072B2,#CC79A7,#4C3549"

Formatting data for visualization using the 'effect' column from './dms-viz/input/combined_functional_effects.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6NK6_monomer_sitemap.csv'.
About 94.64% (812 of 858) of the wildtype residues in the data match the corresponding residues in the structure.
About 4.67% (42 of 900) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6NK6_monomer_functional_scores.json'
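The four `configure-dms-viz format` calls in this section differ only in the sitemap, output path, structure, and chain selections. As a sketch of how that repetition could be factored out, the hypothetical helper `build_format_command` below assembles the argument list for one view. It uses only flags that already appear in the cells above and does not run anything itself.

```python
import shlex

def build_format_command(input_csv, sitemap, output_json, structure,
                         included_chains, excluded_chains,
                         name, metric, metric_name,
                         condition, condition_name, colors,
                         extra_args=None):
    """Assemble one `configure-dms-viz format` call as an argument list."""
    cmd = [
        "configure-dms-viz", "format",
        "--input", input_csv,
        "--sitemap", sitemap,
        "--output", output_json,
        "--name", name,
        "--metric", metric,
        "--metric-name", metric_name,
        "--exclude-amino-acids", "*, -",
        "--included-chains", included_chains,
        "--excluded-chains", excluded_chains,
        "--condition", condition,
        "--condition-name", condition_name,
        "--structure", structure,
        "--colors", colors,
    ]
    return cmd + list(extra_args or [])

# Per-view settings copied from the cells above:
# label -> (structure, included chains, excluded chains)
views = {
    "6JO8_monomer": ("6JO8", "A B", "C E D F M N"),
    "6JO8_trimer": ("6JO8", "A B C E D F", "M N"),
    "6NK6_trimer": ("6NK6", "A B C D E F G H", "M O P"),
    "6NK6_monomer": ("6NK6", "A E", "N P B C D F G H J K L"),
}

# Tooltip and filter options shared by all functional-score views
shared_extras = [
    "--tooltip-cols", "{'times_seen': '# Obsv', 'effect': 'Func Eff.', 'region': 'Region'}",
    "--filter-cols", "{'n_selections': '# of Selections', 'times_seen': 'Times Seen'}",
    "--filter-limits", "{'times_seen': [0, 2, 25]}",
]

commands = {
    label: build_format_command(
        input_csv="./dms-viz/input/combined_functional_effects.csv",
        sitemap=f"./dms-viz/sitemap/CHIKV_{label}_sitemap.csv",
        output_json=f"./dms-viz/output/CHIKV_{label}_functional_scores.json",
        structure=structure,
        included_chains=included,
        excluded_chains=excluded,
        name="CHIKV Func. Scores",
        metric="effect",
        metric_name="Functional Effect",
        condition="condition",
        condition_name="Cell Line",
        colors="#0072B2,#CC79A7,#4C3549",
        extra_args=shared_extras,
    )
    for label, (structure, included, excluded) in views.items()
}

# Print one command in shell-quoted form for inspection
print(shlex.join(commands["6JO8_monomer"]))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as lists avoids shell-quoting problems with the dictionary-style arguments.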


## Functional Scores Difference

Next, we'll plot the difference in functional scores between each pair of cell lines.

In [11]:
# Variables for filtering the raw functional effects
min_times_seen = 2

# Pivot to wide format: one row per (site, wildtype, mutant), one effect column per condition
functional_selection_difference = (
    combined_func_effects
    .query('mutant not in ["*", "-"]')
    .query(f'times_seen > {min_times_seen}')
    .pivot(index=['site', 'wildtype', 'mutant'], columns='condition', values='effect')
    .rename_axis(None, axis=1)
    .reset_index()
)

# Get the condition columns (excluding 'site', 'wildtype', and 'mutant')
condition_cols = [col for col in functional_selection_difference.columns 
                 if col not in ['site', 'wildtype', 'mutant']]

# Calculate all pairwise differences
for col1 in condition_cols:
    for col2 in condition_cols:
        if col1 != col2:
            new_col_name = f"{col1}_v_{col2}"
            functional_selection_difference[new_col_name] = (
                functional_selection_difference[col1] - functional_selection_difference[col2]
            )

# Melt the comparisons into a long format
functional_selection_difference = functional_selection_difference.melt(
    id_vars=["site", "wildtype", "mutant"],
    value_vars=["C636_v_MXRA8", "C636_v_TIM1", "MXRA8_v_C636", "MXRA8_v_TIM1", "TIM1_v_C636", "TIM1_v_MXRA8"],
    var_name="comparison",
    value_name="difference"
)
            
# Write to file
functional_selection_difference.to_csv('./dms-viz/input/functional_selection_difference.csv', index=False)
functional_selection_difference.head()

Unnamed: 0,site,wildtype,mutant,comparison,difference
0,1,M,I,C636_v_MXRA8,0.001
1,1,M,T,C636_v_MXRA8,0.043
2,2,S,A,C636_v_MXRA8,0.0814
3,2,S,C,C636_v_MXRA8,0.0958
4,2,S,D,C636_v_MXRA8,0.3251


#### 6JO8 Trimer w/ Mxra8

In [12]:
!configure-dms-viz format \
    --input ./dms-viz/input/functional_selection_difference.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6JO8_trimer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6JO8_trimer_functional_diff.json \
    --name "CHIKV Cell Entry Difference" \
    --metric "difference" \
    --metric-name "Effect Difference" \
    --exclude-amino-acids "*, -" \
    --included-chains "A B C E D F" \
    --excluded-chains "M N" \
    --condition "comparison" \
    --condition-name "Comparison (Left - Right)" \
    --structure "6JO8" \
    --colors "#0072B2,#CC79A7,#4C3549,#009E73,#E69F00,#56B4E9"

Formatting data for visualization using the 'difference' column from './dms-viz/input/functional_selection_difference.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6JO8_trimer_sitemap.csv'.
About 97.19% (762 of 784) of the wildtype residues in the data match the corresponding residues in the structure.
About 17.21% (163 of 947) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6JO8_trimer_functional_diff.json'


#### 6JO8 Monomer w/ Mxra8

In [13]:
!configure-dms-viz format \
    --input ./dms-viz/input/functional_selection_difference.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6JO8_monomer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6JO8_monomer_functional_diff.json \
    --name "CHIKV Cell Entry Difference" \
    --metric "difference" \
    --metric-name "Effect Difference" \
    --exclude-amino-acids "*, -" \
    --included-chains "A B" \
    --excluded-chains "C E D F M N" \
    --condition "comparison" \
    --condition-name "Comparison (Left - Right)" \
    --structure "6JO8" \
    --colors "#0072B2,#CC79A7,#4C3549,#009E73,#E69F00,#56B4E9"

Formatting data for visualization using the 'difference' column from './dms-viz/input/functional_selection_difference.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6JO8_monomer_sitemap.csv'.
About 97.19% (762 of 784) of the wildtype residues in the data match the corresponding residues in the structure.
About 17.21% (163 of 947) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6JO8_monomer_functional_diff.json'


#### [6NK6](https://www.rcsb.org/structure/6nk6) Trimer w/ Mxra8

In [14]:
!configure-dms-viz format \
    --input ./dms-viz/input/functional_selection_difference.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6NK6_trimer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6NK6_trimer_functional_diff.json \
    --name "CHIKV Cell Entry Difference" \
    --metric "difference" \
    --metric-name "Effect Difference" \
    --exclude-amino-acids "*, -" \
    --included-chains "A B C D E F G H" \
    --excluded-chains "M O P" \
    --condition "comparison" \
    --condition-name "Comparison (Left - Right)" \
    --structure "6NK6" \
    --colors "#0072B2,#CC79A7,#4C3549,#009E73,#E69F00,#56B4E9"

Formatting data for visualization using the 'difference' column from './dms-viz/input/functional_selection_difference.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6NK6_trimer_sitemap.csv'.
About 94.64% (812 of 858) of the wildtype residues in the data match the corresponding residues in the structure.
About 4.67% (42 of 900) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6NK6_trimer_functional_diff.json'


#### [6NK6](https://www.rcsb.org/structure/6nk6) Monomer w/ Mxra8

In [19]:
!configure-dms-viz format \
    --input ./dms-viz/input/functional_selection_difference.csv \
    --sitemap ./dms-viz/sitemap/CHIKV_6NK6_monomer_sitemap.csv \
    --output ./dms-viz/output/CHIKV_6NK6_monomer_functional_diff.json \
    --name "CHIKV Cell Entry Difference" \
    --metric "difference" \
    --metric-name "Effect Difference" \
    --exclude-amino-acids "*, -" \
    --included-chains "A E" \
    --excluded-chains "N P B C D F G H J K L" \
    --condition "comparison" \
    --condition-name "Comparison (Left - Right)" \
    --structure "6NK6" \
    --colors "#0072B2,#CC79A7,#4C3549,#009E73,#E69F00,#56B4E9"

Formatting data for visualization using the 'difference' column from './dms-viz/input/functional_selection_difference.csv'...
Using sitemap from './dms-viz/sitemap/CHIKV_6NK6_monomer_sitemap.csv'.
About 94.64% (812 of 858) of the wildtype residues in the data match the corresponding residues in the structure.
About 4.67% (42 of 900) of the data sites are missing from the structure.
Success! The visualization JSON was written to './dms-viz/output/CHIKV_6NK6_monomer_functional_diff.json'
