<style>
.analysis-title {
    color: #2563eb !important;
    font-size: 2.8rem;
    font-weight: 700;
    text-align: center;
    border-bottom: 4px solid #dbeafe;
    padding-bottom: 15px;
    margin-bottom: 25px;
    text-shadow: 2px 2px 4px rgba(0,0,0,0.1);
}
.metadata-box {
    background: linear-gradient(135deg, #f8fafc 0%, #e2e8f0 100%);
    border-left: 5px solid #3b82f6;
    padding: 20px;
    border-radius: 10px;
    margin: 20px 0;
    box-shadow: 0 4px 15px rgba(0,0,0,0.1);
}
.metadata-text {
    color: #1e293b;
    font-size: 1.1rem;
    line-height: 1.8;
    margin: 0;
}
.overview-header {
    color: #1d4ed8 !important;
    font-size: 2rem;
    font-weight: 600;
    border-left: 5px solid #3b82f6;
    padding-left: 15px;
    margin-top: 30px;
    margin-bottom: 15px;
}
.section-text {
    color: #374151;
    font-size: 1.05rem;
    line-height: 1.7;
    text-align: justify;
}
.subsection-header {
    color: #4338ca !important;
    font-size: 1.4rem;
    font-weight: 600;
    margin-top: 25px;
    margin-bottom: 10px;
    border-bottom: 2px solid #e0e7ff;
    padding-bottom: 5px;
}
.data-list {
    background-color: #f8fafc;
    border: 1px solid #e2e8f0;
    border-radius: 8px;
    padding: 15px;
    margin: 15px 0;
}
.data-list ul {
    margin: 0;
    color: #475569;
    font-size: 1rem;
}
.data-list li {
    margin-bottom: 8px;
    padding-left: 5px;
}
.data-list code {
    background-color: #f1f5f9;
    color: #dc2626;
    padding: 2px 6px;
    border-radius: 4px;
    font-weight: 500;
}
</style>

<h1 class="analysis-title">Damaging Mutation Sums</h1>

<div class="metadata-box">
<p class="metadata-text">
<strong style="color: #1e40af;">Project:</strong> Computational Biology DMV Petri Dish<br>
<strong style="color: #1e40af;">Author:</strong> Chris Indorf<br>
<strong style="color: #1e40af;">Date:</strong> August 18, 2025<br>
<strong style="color: #1e40af;">Language:</strong> Python
</p>
</div>

<h2 class="overview-header">Overview</h2>

<p class="section-text">This notebook retrieves and visualizes damaging mutation data for cancer cell lines. It first queries a PostgreSQL data warehouse for the sum of damaging mutation values in each cell line, then summarizes the results with pandas. Finally, it builds an interactive histogram that can be filtered by tissue type, mutation value, and number of histogram bins.</p>

<h3 class="subsection-header">Data Sources</h3>
<div class="data-list">
<ul>
<li><strong>Database:</strong> data_warehouse (PostgreSQL)</li>
<li><strong>Tables:</strong>
    <ul>
        <li><code>im_dep_sprime_damaging_mutations</code></li>
        <li><code>im_dep_raw_secondary_dose_curve</code></li>
        </ul>
</li>
</ul>
</div>

<h3 class="subsection-header">Libraries Used</h3>
<div class="data-list">
<ul>
<li><code>psycopg2</code> - PostgreSQL database connectivity</li>
<li><code>dotenv</code> - Loads environment variables from a <code>.env</code> file</li>
<li><code>os</code> - Reads environment variables</li>
<li><code>pandas</code> - Data manipulation and analysis</li>
<li><code>altair</code> - Statistical data visualization</li>
<li><code>numpy</code> - Numerical functions</li>
<li><code>warnings</code> - Suppresses warning messages</li>
</ul>
</div>

<h3 class="subsection-header">Change Log</h3>
<div class="data-list">
<ul>
<li><code>August 18, 2025</code> - Version 1</li>
</ul>
</div>

In [1]:
# Import required libraries
import psycopg2
from dotenv import load_dotenv
import os
import pandas as pd
import altair as alt
import numpy as np
import warnings
# Ignore warnings
warnings.filterwarnings('ignore')

<style>
.section-header {
    color: #1d4ed8 !important;
    font-size: 2rem;
    font-weight: 600;
    border-left: 5px solid #3b82f6;
    padding-left: 15px;
    margin-top: 30px;
    margin-bottom: 15px;
    background: #f8fafc;
    padding-top: 10px;
    padding-bottom: 10px;
}
.description-text {
    color: #374151;
    font-size: 1.05rem;
    line-height: 1.7;
    margin-bottom: 20px;
}
.process-box {
    background: #f8fafc;
    border: 1px solid #e2e8f0;
    border-radius: 10px;
    padding: 20px;
    margin: 20px 0;
}
.process-list {
    color: #475569;
    font-size: 1rem;
    margin: 0;
}
.process-list li {
    margin-bottom: 10px;
    font-weight: 500;
}
.code-highlight {
    background-color: #f8fafc;
    color: #dc2626;
    padding: 2px 6px;
    border-radius: 4px;
    font-family: 'Monaco', 'Consolas', monospace;
    font-weight: 600;
}
</style>

<h2 class="section-header">Data Retrieval</h2>

<p class="description-text">Connects to the PostgreSQL database and executes queries that return, for each cancer cell line with damaging mutations, the sum of its damaging mutation values. The queries perform the following operations:</p>

<div class="process-box">
<ol class="process-list">
<li><strong>Filters</strong> on <span class="code-highlight">mutation_value</span>: any damaging mutation (value &gt; 0), monoallelic mutations only (value 1), or biallelic mutations only (value 2)</li>
<li><strong>Restricts</strong> the lung queries to cell lines whose <span class="code-highlight">ccle_name</span> ends in LUNG (<span class="code-highlight">ccle_name LIKE '%LUNG'</span>); the all-tissue queries keep every cell line</li>
<li><strong>Groups</strong> by <span class="code-highlight">cell_line</span> and sums the mutation values for each cell line</li>
<li><strong>Orders</strong> results by <span class="code-highlight">cell_line</span></li>
</ol>
</div>
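To make the query logic concrete, the sketch below runs the same filter, restrict, group, and order pattern against a tiny in-memory SQLite database using Python's built-in `sqlite3` module. The table and column names mirror those above, but the rows are invented toy data, the tables are reduced to the columns the queries use, and SQLite stands in for PostgreSQL (SQLite's `LIKE` is case-insensitive for ASCII, whereas PostgreSQL's is case-sensitive). The helper `mutation_sums` is hypothetical, not part of the notebook.

```python
import sqlite3

# Hypothetical helper mirroring the notebook's filter / restrict / group / order pattern.
# The operator is checked against a whitelist so it can be formatted into the SQL safely;
# the threshold itself is passed as a bound parameter.
ALLOWED_OPS = {">", "=", ">="}


def mutation_sums(conn, op, threshold, lung_only=False):
    """Return [(cell_line, sum)] for rows whose mutation_value satisfies `op threshold`."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    sql = (
        "SELECT cell_line, SUM(mutation_value) "
        "FROM im_dep_sprime_damaging_mutations "
        f"WHERE mutation_value {op} ? "
    )
    if lung_only:
        # Keep only cell lines whose CCLE name ends in LUNG
        sql += (
            "AND cell_line IN (SELECT depmap_id FROM im_dep_raw_secondary_dose_curve "
            "WHERE ccle_name LIKE '%LUNG') "
        )
    sql += "GROUP BY cell_line ORDER BY cell_line"
    return conn.execute(sql, (threshold,)).fetchall()


# Toy in-memory tables with invented rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE im_dep_sprime_damaging_mutations (cell_line TEXT, mutation_value INTEGER)")
conn.execute("CREATE TABLE im_dep_raw_secondary_dose_curve (depmap_id TEXT, ccle_name TEXT)")
conn.executemany(
    "INSERT INTO im_dep_sprime_damaging_mutations VALUES (?, ?)",
    [("ACH-A", 2), ("ACH-A", 1), ("ACH-A", 0), ("ACH-B", 1), ("ACH-B", 1)],
)
conn.executemany(
    "INSERT INTO im_dep_raw_secondary_dose_curve VALUES (?, ?)",
    [("ACH-A", "CELLA_LUNG"), ("ACH-B", "CELLB_BREAST")],
)

print(mutation_sums(conn, ">", 0))                  # all damaging mutations, all tissues
print(mutation_sums(conn, "=", 1))                  # monoallelic only
print(mutation_sums(conn, ">", 0, lung_only=True))  # all damaging mutations, lung only
```

If this pattern were ported to the notebook's psycopg2 connection, the placeholder would be `%s` rather than `?`, and a literal `%` in the SQL (as in `'%LUNG'`) would have to be written as `%%` whenever query parameters are passed.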

In [None]:
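# Retrieve the database password from the .env file
load_dotenv()
password = os.getenv('DB_PASSWORD')
if password is None:
    raise RuntimeError('DB_PASSWORD is not set; add it to the .env file')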

# Establish database connection
conn = psycopg2.connect(
    host='dmvpetridishdatastore.dev',
    port=5432,
    database='data_warehouse',
    user='comp_bio_u2',
    password=password # password from .env
    # password='ENTER PASSWORD HERE' ### IN THIS DEVELOPMENT VERSION CAN REPLACE 'ENTER PASSWORD HERE' WITH ACTUAL PASSWORD
)
# Part 1. Sum damaging mutations in lung tissue cell lines.

# SQL query to sum damaging mutations in lung tissue cell lines
# Version 1. All mutations
query = """
SELECT cell_line, SUM(mutation_value) 
FROM public.im_dep_sprime_damaging_mutations
WHERE mutation_value > 0 
  AND cell_line IN (
    SELECT depmap_id 
    FROM public.im_dep_raw_secondary_dose_curve
    WHERE ccle_name LIKE '%LUNG')  
GROUP BY cell_line
ORDER BY cell_line;
"""
# Execute query and create DataFrame
df_lung_all = pd.read_sql_query(query, conn)

# SQL query to sum damaging mutations in lung tissue cell lines
# Version 2. Biallelic mutations
query = """
SELECT cell_line, SUM(mutation_value) 
FROM public.im_dep_sprime_damaging_mutations
WHERE mutation_value = 2 
  AND cell_line IN (
    SELECT depmap_id 
    FROM public.im_dep_raw_secondary_dose_curve
    WHERE ccle_name LIKE '%LUNG')  
GROUP BY cell_line
ORDER BY cell_line;
"""
# Execute query and create DataFrame
df_lung_2 = pd.read_sql_query(query, conn)

# SQL query to sum damaging mutations in lung tissue cell lines
# Version 3. Monoallelic mutations
query = """
SELECT cell_line, SUM(mutation_value) 
FROM public.im_dep_sprime_damaging_mutations
WHERE mutation_value = 1 
  AND cell_line IN (
    SELECT depmap_id 
    FROM public.im_dep_raw_secondary_dose_curve
    WHERE ccle_name LIKE '%LUNG')  
GROUP BY cell_line
ORDER BY cell_line;
"""
# Execute query and create DataFrame
df_lung_1 = pd.read_sql_query(query, conn)

# Part 2. Sum damaging mutations in cell lines for all tissues.

# SQL query to sum damaging mutations in cell lines for all tissues
# Version 1. All mutations
query = """
SELECT cell_line, SUM(mutation_value) 
FROM public.im_dep_sprime_damaging_mutations
WHERE mutation_value > 0 
GROUP BY cell_line
ORDER BY cell_line;
"""
# Execute query and create DataFrame
df_all = pd.read_sql_query(query, conn)

# SQL query to sum damaging mutations in cell lines for all tissues
# Version 2. Biallelic mutations
query = """
SELECT cell_line, SUM(mutation_value) 
FROM public.im_dep_sprime_damaging_mutations
WHERE mutation_value >= 2 
GROUP BY cell_line
ORDER BY cell_line;
"""
# Execute query and create DataFrame
df_all_2 = pd.read_sql_query(query, conn)

# SQL query to sum damaging mutations in cell lines for all tissues
# Version 3. Monoallelic mutations
query = """
SELECT cell_line, SUM(mutation_value) 
FROM public.im_dep_sprime_damaging_mutations
WHERE mutation_value = 1
GROUP BY cell_line
ORDER BY cell_line;
"""
# Execute query and create DataFrame
df_all_1 = pd.read_sql_query(query, conn)

# SQL queries to retrieve CCLE names (format CELLLINE_TISSUE, e.g. A549_LUNG),
# whose suffix identifies the tissue type: all rows, then distinct values.

query = """
    SELECT ccle_name FROM public.im_dep_raw_secondary_dose_curve;
"""
# Execute query and create DataFrame
df_tissue_types = pd.read_sql_query(query, conn)

query = """
    SELECT DISTINCT ccle_name FROM public.im_dep_raw_secondary_dose_curve;
"""
# Execute query and create DataFrame
df_distinct_tissue_types = pd.read_sql_query(query, conn)
# Close database connection
conn.close()

# Display results
print("Lung tissue cell line damaging mutations - all mutations:")
print(df_lung_all)
print("Lung tissue cell line damaging mutations - biallelic mutations:")
print(df_lung_2)
print("Lung tissue cell line damaging mutations - monoallelic mutations:")
print(df_lung_1)
print("Cell line damaging mutations - all tissues - all mutations:")
print(df_all)
print("All tissues - cell line damaging mutations - biallelic mutations:")
print(df_all_2)
print("All tissues -  cell line damaging mutations - monoallelic mutations:")
print(df_all_1)
print("Tissue types:")
print(df_tissue_types)
print("Distinct tissue types:")
with pd.option_context('display.max_rows', None):
    tissue_list = df_distinct_tissue_types['ccle_name'].tolist()
    for i in range(0, len(tissue_list), 4):
        row = tissue_list[i:i+4]
        while len(row) < 4:
            row.append("")
        print(f"{str(row[0]):<45} {str(row[1]):<45} {str(row[2]):<45} {str(row[3]):<45}")


Lung tissue cell line damaging mutations - all mutations:
     cell_line  sum
0   ACH-000012   14
1   ACH-000015   58
2   ACH-000021   63
3   ACH-000030   32
4   ACH-000035   12
..         ...  ...
89  ACH-000924  206
90  ACH-000929  289
91  ACH-000945  345
92  ACH-001075   20
93  ACH-001113   58

[94 rows x 2 columns]
Lung tissue cell line damaging mutations - biallelic mutations:
     cell_line  sum
0   ACH-000012    2
1   ACH-000015   12
2   ACH-000021   30
3   ACH-000030    4
4   ACH-000035    4
..         ...  ...
88  ACH-000921   94
89  ACH-000924   68
90  ACH-000929   88
91  ACH-000945   30
92  ACH-001113   16

[93 rows x 2 columns]
Lung tissue cell line damaging mutations - monoallelic mutations:
     cell_line  sum
0   ACH-000012   12
1   ACH-000015   46
2   ACH-000021   33
3   ACH-000030   28
4   ACH-000035    8
..         ...  ...
89  ACH-000924  138
90  ACH-000929  201
91  ACH-000945  315
92  ACH-001075   20
93  ACH-001113   42

[94 rows x 2 columns]
Cell line damaging muta
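
The `ccle_name` values retrieved above are CCLE cell line names rather than bare tissue labels. CCLE names generally take the form `CELLLINE_TISSUE` (for example `A549_LUNG`), which is why the lung filter matches on the suffix. Assuming that convention holds for every row, a tissue list could be derived from the text after the first underscore. The helper `tissue_from_ccle_name` below is a hypothetical standard-library sketch, and the example names are illustrative rather than taken from the query output.

```python
from collections import Counter


def tissue_from_ccle_name(ccle_name):
    """Return the tissue suffix of a CCLE name, e.g. 'A549_LUNG' -> 'LUNG'.

    Assumes the cell line part contains no underscore, so everything after
    the first underscore is the tissue (which may itself contain underscores).
    """
    _, sep, tissue = ccle_name.partition("_")
    return tissue if sep else None  # None when the name has no underscore


# Illustrative names only
names = ["A549_LUNG", "NCIH460_LUNG", "MCF7_BREAST", "SW480_LARGE_INTESTINE"]
tissue_counts = Counter(tissue_from_ccle_name(n) for n in names)
print(sorted(tissue_counts.items()))
```

With pandas, a vectorized equivalent would be `df_distinct_tissue_types['ccle_name'].str.split('_', n=1).str[1]`.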

<style>
.filtering-header {
    color: #1d4ed8 !important;
    font-size: 2rem;
    font-weight: 600;
    border-left: 5px solid #3b82f6;
    padding-left: 15px;
    margin-top: 30px;
    margin-bottom: 15px;
    background: #f8fafc;
    padding-top: 10px;
    padding-bottom: 10px;
}
.filtering-explanation {
    background: #f8fafc;
    border: 1px solid #e2e8f0;
    border-radius: 10px;
    padding: 20px;
    margin: 20px 0;
    color: #374151;
    font-size: 1.05rem;
    line-height: 1.7;
}
.highlight-number {
    background-color: #f8fafc;
    color: #475569;
    padding: 3px 8px;
    border-radius: 4px;
    font-weight: 700;
    font-size: 1.1rem;
}
</style>

<h2 class="filtering-header">Data Analysis</h2>

<div class="filtering-explanation">
<p>Basic summary statistics for each cell line dataframe, used to verify the resulting histogram.</p>
</div>
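The per-dataframe summary computed in the following cell (number of cell lines, smallest and largest DepMap ID in string order, total mutation sum, and a frequency table of sums) can be expressed without pandas as below. This is a hypothetical standard-library sketch on invented rows; `summarize` is not part of the notebook. In the frequency table, each key is a damaging mutation sum and each value is the number of cell lines with that sum, which is also how the left and right columns of `value_counts()` read.

```python
from collections import Counter


def summarize(rows):
    """Summarize (cell_line, mutation_sum) pairs like the notebook's analysis cell.

    Returns the number of cell lines, the smallest and largest IDs (string order),
    the total of all sums, and a Counter mapping each sum to how many cell lines have it.
    """
    ids = [cell_line for cell_line, _ in rows]
    sums = [total for _, total in rows]
    return {
        "n_cell_lines": len(rows),
        "min_id": min(ids),
        "max_id": max(ids),
        "total": sum(sums),
        "sum_counts": Counter(sums),
    }


# Invented example rows
rows = [("ACH-X1", 14), ("ACH-X2", 58), ("ACH-X3", 14)]
stats = summarize(rows)
print(stats["n_cell_lines"], stats["min_id"], stats["max_id"], stats["total"])
print(stats["sum_counts"].most_common())
```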

In [None]:
# Summarize each query result with pandas
summaries = [
    ("Lung cell line damaging mutations - all mutations:", df_lung_all),
    ("Lung cell line damaging mutations - biallelic mutations:", df_lung_2),
    ("Lung cell line damaging mutations -monoallelic mutations:", df_lung_1),
    ("All tissues - cell line damaging mutations - all mutations:", df_all),
    ("All tissues - cell line damaging mutations - biallelic mutations:", df_all_2),
    ("All tissues - cell line damaging mutations -monoallelic mutations:", df_all_1),
]

for label, frame in summaries:
    print(label)
    print('Number of cell lines:', frame.cell_line.count(),
          'Smallest cell line number:', frame.cell_line.min(),
          'Largest cell_line number:', frame.cell_line.max(),
          'Total damaging mutations:', frame['sum'].sum())

    # Distribution of mutation sums across cell lines:
    # left column = damaging mutation sum, right column = number of cell lines with that sum
    print('\n')
    print(frame['sum'].value_counts())

# Print the number of rows for each CCLE name
print('\n')
print('Tissue types value counts:')
with pd.option_context('display.max_rows', None):
    print(df_tissue_types['ccle_name'].value_counts())
print('\n')



Lung cell line damaging mutations - all mutations:
Number of cell lines: 94 Smallest cell line number: ACH-000012 Largest cell_line number: ACH-001113 Total damaging mutations: 7939


sum
51     4
45     3
63     3
65     3
66     3
      ..
246    1
250    1
206    1
289    1
345    1
Name: count, Length: 67, dtype: int64
Lung cell line damaging mutations - biallelic mutations:
Number of cell lines: 93 Smallest cell line number: ACH-000012 Largest cell_line number: ACH-001113 Total damaging mutations: 2334


sum
12    8
4     8
16    8
18    7
24    7
28    6
8     6
10    4
14    4
30    4
2     3
68    3
32    2
20    2
22    2
46    2
38    2
50    2
76    1
6     1
52    1
40    1
42    1
44    1
26    1
58    1
78    1
48    1
74    1
94    1
88    1
Name: count, dtype: int64
Lung cell line damaging mutations -monoallelic mutations:
Number of cell lines: 94 Smallest cell line number: ACH-000012 Largest cell_line number: ACH-001113 Total damaging mutations: 5605


sum
29     4
46 


<style>
.viz-header {
    color: #1d4ed8 !important;
    font-size: 2rem;
    font-weight: 600;
    border-left: 5px solid #3b82f6;
    padding-left: 15px;
    margin-top: 30px;
    margin-bottom: 15px;
    background: #f8fafc;
    padding-top: 10px;
    padding-bottom: 10px;
}
.viz-description {
    color: #374151;
    font-size: 1.05rem;
    line-height: 1.7;
    margin-bottom: 20px;
}
.features-box {
    background: #f8fafc;
    border: 1px solid #e2e8f0;
    border-radius: 10px;
    padding: 20px;
    margin: 20px 0;
}
.features-title {
    color: #374151;
    font-size: 1.3rem;
    font-weight: 600;
    margin-top: 0;
    margin-bottom: 15px;
}
.features-list {
    color: #475569;
    font-size: 1rem;
    margin: 0;
}
.features-list li {
    margin-bottom: 12px;
    font-weight: 500;
}
.feature-highlight {
    background-color: #f8fafc;
    color: #374151;
    padding: 2px 6px;
    border-radius: 4px;
    font-weight: 600;
}
</style>

<h2 class="viz-header">Data Preparation</h2>

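<p class="viz-description">Prepares the data for an interactive Altair histogram showing how many cell lines fall into each range of damaging mutation sums. Because the bin edges are computed in Python, the data are pre-binned for every combination of tissue filter, mutation category, and bin count (10 to 30), and the chart widgets select one of these precomputed histograms.</p>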

<div class="features-box">
<h3 class="features-title">Chart Features:</h3>
<ul class="features-list">
<li><span class="feature-highlight">Slider widget:</span> Lets the viewer select the number of histogram bins (10 to 30)</li>
<li><span class="feature-highlight">Tool tips:</span> Hovering the mouse cursor over a bar shows the bin details</li>
<li><span class="feature-highlight">Drop down menu:</span> Lets the viewer display lung cell lines ('LUNG') or all tissues ('ALL')</li>
<li><span class="feature-highlight">Radio buttons:</span> Lets the viewer display all damaging mutations (value ≥ 1), monoallelic mutations only (value 1), or biallelic mutations only (value 2)</li>
</ul>
</div>
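The binning step in the next cell can be sketched with the standard library as follows: equal-width edges from the minimum to the maximum sum (the edges `np.linspace(min, max, n_bins + 1)` produces), right-closed intervals, and the lowest value placed in the first bin (the behavior of `pd.cut(..., include_lowest=True)`). `equal_width_histogram` is a hypothetical name, and its floating-point edges may differ from NumPy's in the last digit.

```python
from bisect import bisect_left


def equal_width_histogram(values, n_bins):
    """Count values in n_bins equal-width, right-closed bins spanning [min, max].

    A value equal to the minimum falls in the first bin; every other value v lands
    in the bin with left < v <= right. Returns a list of (left, right, count) tuples.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    edges = [lo + i * step for i in range(n_bins)] + [hi]
    counts = [0] * n_bins
    for v in values:
        # bisect_left finds the first edge >= v; subtract 1 for the bin index,
        # clamping so v == lo goes to bin 0 and v == hi stays in the last bin
        idx = min(max(bisect_left(edges, v) - 1, 0), n_bins - 1)
        counts[idx] += 1
    return [(edges[i], edges[i + 1], counts[i]) for i in range(n_bins)]


# Invented mutation sums, split into 4 bins of width 5 over [0, 20]
for left, right, count in equal_width_histogram([0, 3, 5, 12, 20, 20], 4):
    print(f"({left:g}, {right:g}]: {count}")
```

This sketch keeps empty bins with a count of 0; the notebook drops them (`count > 0`) before charting.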

In [30]:
# Tissue dropdown: only all tissues ('ALL') and lung ('LUNG') are available
tissue_dropdown = alt.param(
    name='tissue_type',
    value='ALL',
    bind=alt.binding_select(
        options=['ALL', 'LUNG'],
        name='Tissue: '
    )
)

# Mutation value filter (radio buttons)
mutation_filter = alt.param(
    name='mutation_category',
    value='all',
    bind=alt.binding_radio(
        options=['all', 'exactly_1', '2_or_more'],
        labels=['All damaging (value ≥ 1)', 'Monoallelic (value = 1)', 'Biallelic (value ≥ 2)'],
        name='Mutation Value: '
    )
)

# Slider for the number of histogram bins (range must match the precomputed bin counts below)
bin_param = alt.param(
    name='num_bins',
    value=20,  # initial value
    bind=alt.binding_range(min=10, max=30, step=1, name='Number of Bins: ')  # name = slider label
)

# Label the ALL-tissues results with mutation category and tissue filter
df_all_labeled = df_all.copy()
df_all_labeled['mutation_category'] = 'all'

df_all_2_labeled = df_all_2.copy()
df_all_2_labeled['mutation_category'] = '2_or_more'

df_all_1_labeled = df_all_1.copy()
df_all_1_labeled['mutation_category'] = 'exactly_1'

all_tissues_data = pd.concat([df_all_labeled, df_all_2_labeled, df_all_1_labeled], ignore_index=True)
all_tissues_data['tissue_filter'] = 'ALL'

# Label the LUNG results the same way
df_lung_combined = df_lung_all.copy()
df_lung_combined['mutation_category'] = 'all'
df_lung_combined['tissue_filter'] = 'LUNG'

df_lung_2_labeled = df_lung_2.copy()
df_lung_2_labeled['mutation_category'] = '2_or_more'
df_lung_2_labeled['tissue_filter'] = 'LUNG'

df_lung_1_labeled = df_lung_1.copy()
df_lung_1_labeled['mutation_category'] = 'exactly_1'
df_lung_1_labeled['tissue_filter'] = 'LUNG'

# Combine lung data
lung_tissues_data = pd.concat([df_lung_combined, df_lung_2_labeled, df_lung_1_labeled], ignore_index=True)

# Combine ALL and LUNG datasets
all_and_lung_data = pd.concat([all_tissues_data, lung_tissues_data], ignore_index=True)

# Precompute a histogram for every tissue, mutation category, and bin count (10 to 30)
all_binned_data_simple = []

for mutation_cat in ['all', 'exactly_1', '2_or_more']:
    for tissue in ['ALL', 'LUNG']:
        # Filter data for this combination
        df_filtered = all_and_lung_data[
            (all_and_lung_data['mutation_category'] == mutation_cat) & 
            (all_and_lung_data['tissue_filter'] == tissue)
        ].copy()
        
        if df_filtered.empty:
            continue
            
        for n_bins in range(10, 31):
            df_temp = df_filtered.copy()
            df_temp['num_bins'] = n_bins
            
            # Equal-width bin edges from the minimum to the maximum mutation sum
            bin_edges = np.linspace(df_filtered['sum'].min(), df_filtered['sum'].max(), n_bins + 1)
            df_temp['bin_assignment'] = pd.cut(df_temp['sum'], bins=bin_edges, include_lowest=True)
            
            binned = df_temp.groupby(['bin_assignment', 'num_bins', 'mutation_category', 'tissue_filter']).agg({
                'sum': 'sum',
                'cell_line': 'count'
            }).reset_index()
            
            binned.columns = ['bin_assignment', 'num_bins', 'mutation_category', 'tissue_filter', 'sum_value', 'count']
            
            # Calculate bin positions
            binned['bin_left'] = pd.to_numeric(binned['bin_assignment'].apply(lambda x: x.left))
            binned['bin_right'] = pd.to_numeric(binned['bin_assignment'].apply(lambda x: x.right))
            binned['bin_center'] = (binned['bin_left'] + binned['bin_right']) / 2
            
            binned = binned.drop('bin_assignment', axis=1)
            all_binned_data_simple.append(binned)

# Combine all data
combined_simple = pd.concat(all_binned_data_simple, ignore_index=True)
combined_simple = combined_simple[combined_simple['count'] > 0].reset_index(drop=True)


<h2 class="viz-header">Data Visualization</h2>

<p class="viz-description">Creates and displays an interactive Altair histogram of the number of cell lines in each range of damaging mutation sums.</p>


In [31]:
# Create chart
chart = alt.Chart(combined_simple).add_params(    
    tissue_dropdown,
    mutation_filter,
    bin_param  
).transform_filter(
    'datum.count > 0'
).transform_filter(
    'datum.tissue_filter == tissue_type'  
).transform_filter(
    'datum.mutation_category == mutation_category'
).transform_filter(
    'datum.num_bins == num_bins'
).mark_bar(
    color='steelblue',
    stroke='white',
    strokeWidth=0.1
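).encode(
    alt.X('bin_left:Q',
          title='Damaging Mutation Sum'),
    alt.X2('bin_right'),  # each bar spans its bin from the left to the right edge
    alt.Y('count:Q',
          title='Number of Cell Lines',
          scale=alt.Scale(type='symlog')),
    # Tooltips shown when hovering over a bar
    tooltip=[
        alt.Tooltip('bin_left:Q', title='Bin start', format='.1f'),
        alt.Tooltip('bin_right:Q', title='Bin end', format='.1f'),
        alt.Tooltip('count:Q', title='Cell lines'),
        alt.Tooltip('sum_value:Q', title='Total damaging mutations in bin')
    ]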
).properties(
    width=600,
    height=400,
    title='Damaging Mutation Sums'
)

chart.show()
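
Each widget state selects exactly one precomputed histogram: the chart's `transform_filter` calls keep only rows whose `tissue_filter`, `mutation_category`, and `num_bins` match the current dropdown, radio, and slider values and whose `count` is positive. The plain-Python sketch below illustrates that selection on invented rows; `select_histogram` is hypothetical and only mirrors the filter logic, not how Vega-Lite evaluates it.

```python
def select_histogram(rows, tissue_type, mutation_category, num_bins):
    """Return the rows the chart would draw for one widget state, sorted by bin position."""
    selected = [
        r for r in rows
        if r["count"] > 0
        and r["tissue_filter"] == tissue_type
        and r["mutation_category"] == mutation_category
        and r["num_bins"] == num_bins
    ]
    return sorted(selected, key=lambda r: r["bin_left"])


# Invented pre-binned rows in the shape of combined_simple
rows = [
    {"tissue_filter": "LUNG", "mutation_category": "all", "num_bins": 10, "bin_left": 50.0, "count": 4},
    {"tissue_filter": "LUNG", "mutation_category": "all", "num_bins": 10, "bin_left": 10.0, "count": 7},
    {"tissue_filter": "LUNG", "mutation_category": "all", "num_bins": 11, "bin_left": 10.0, "count": 6},
    {"tissue_filter": "ALL", "mutation_category": "all", "num_bins": 10, "bin_left": 10.0, "count": 90},
    {"tissue_filter": "LUNG", "mutation_category": "exactly_1", "num_bins": 10, "bin_left": 10.0, "count": 5},
]

for r in select_histogram(rows, "LUNG", "all", 10):
    print(r["bin_left"], r["count"])
```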


<h2 class="filtering-header">Chart Storage</h2>

<div class="filtering-explanation">
<p>Saves the chart as HTML and PDF files and generates a shareable Vega Editor URL for it.</p>
</div>

In [32]:
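# Save the interactive chart as HTML
chart.save('Damaging Mutation Sums.html')
# Give the static export a white background and padding, then save it as PDF
# (PDF export needs the vl-convert-python package; parameters render at their initial values)
chart = chart.configure(background='white', padding=30).configure_view(fill='white')
chart.save('Damaging Mutation Sums.pdf')
chart.to_url()  # Vega Editor URL that encodes the chart specification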

'https://vega.github.io/editor/#/url/vega-lite/N4Igxg9gdgZglgcxALlANzgUwO4tJKAFzigFcJSBnAdTgBNCALFAZgAY2AacaYsiygAlMiRoVYdu8ADbSUIbIziFMIAL7cARgEMwAawQAnClDrzFy1dwAO2unRJJk7DSDrbC2vCCjaAtqrIbh7aALTamABsmJoAHADsMPEATGwwAIzx8WCRkbHaAKwALGAAnHFxBWx0sercftqGet6EAJ7WgSA6hiDckNIQPUGUKpjSmtKkViAjxnqdFiq9M4RzmLQMzMhsAHTprphQkA5QTqAAHt7wY2ZBmiQA+mCHKj3cxITSnQAi-toIjgABABZUieYjQQEAZVIfmWbQ68gAjqRtERlB44GhVK5WlcsNJbjxSERlpQwNovi12p1KK0-AMkK4PlSggA5WGaTCGQEQGCAgDCY2kgIAMiRMJR4TTkaj0eCsTjXIwRAgxCgipIQLZDP4pcgANqgXwBeTESiUKYPBHTe6mbwkaxg+SUMaYMDibgQawQqD6g0gACCotFy1FAFU2QBxEAAXW4Js6ABU4BapshAXUQGhKVN5MHQxpjf5On4wZjoE8PJgEIM8VoSLdQI7nUFdQ4IMtvb7-SBKXJuJhzrpPq0HullskHoMHn5Bqp4yBpNoudJe4HZICywroJRAQAKQCmROkAJTLACiw490lagPSW-LvsnvJ5c8MmAfO79cYTJfkoK-QEADVc0wDMsxzSZOn7Opi1NIIyD8B47SlBt7VQEAW3ENs0QQaYGkuZwuBAPwSBQdJiMTeQOT8LkeT5QEACESEocDuBGTBrHI1xILzZBUjURcWR+P4AVOEFHzgSEYT8VCQFWNFKBgQY4UNUAZFeeR3EIWEdkgElCEBAA+QE2FgkANO5LSPF081LUwB5LJ5ABeZzATsq0bXMpzrJ0vwdm3CsoCrFRa0MW9XM-IKQprOtvLgaRNKCbTdMQ5CWMBSK0pQ9

<style>
.results-header {
    color: #2563eb !important;
    font-size: 2.2rem;
    font-weight: 700;
    text-align: center;
    border: 3px solid #dbeafe;
    background: #f8fafc;
    padding: 20px;
    border-radius: 15px;
    margin: 30px 0 25px 0;
    text-shadow: 1px 1px 3px rgba(0,0,0,0.1);
}
.summary-intro {
    background: #f8fafc;
    color: black;
    padding: 25px;
    border-radius: 12px;
    margin: 25px 0;
    box-shadow: 0 8px 25px rgba(0,0,0,0.15);
}
.summary-text {
    font-size: 1.1rem;
    line-height: 1.7;
    margin: 0;
}
.highlight-stat {
    background-color: #f8fafc;
    color: #4338ca;
    padding: 4px 10px;
    border-radius: 6px;
    font-weight: 700;
    font-size: 1.15rem;
}
.key-findings {
    background-color: #f8fafc;
    border: 2px solid #dbeafe;
    border-radius: 10px;
    padding: 20px;
    margin: 20px 0;
}
.findings-title {
    color: #4338ca !important;
    font-size: 1.4rem;
    font-weight: 600;
    margin-top: 0;
    margin-bottom: 15px;
    border-bottom: 2px solid #dbeafe;
    padding-bottom: 8px;
}
.findings-list {
    color: #1e3a8a;
    font-size: 1rem;
    margin: 0;
}
.findings-list li {
    margin-bottom: 10px;
    padding-left: 8px;
    border-left: 3px solid #60a5fa;
    padding-top: 5px;
    padding-bottom: 5px;
    background-color: #f8fafc;
    margin-left: 0;
    padding-left: 12px;
}
.data-distribution {
    background: #f8fafc;
    border: 2px solid #dbeafe;
    border-radius: 10px;
    padding: 20px;
    margin: 20px 0;
}
.distribution-title {
    color: #4338ca !important;
    font-size: 1.4rem;
    font-weight: 600;
    margin-top: 0;
    margin-bottom: 15px;
}
.distribution-list {
    color: #1e3a8a;
    font-size: 1.1rem;
    font-weight: 500;
    margin: 0;
}
.distribution-list li {
    background-color: #f8fafc;
    margin-bottom: 8px;
    padding: 10px 15px;
    border-radius: 6px;
    border-left: 4px solid #60a5fa;
}
.conclusion-box {
    background: #f8fafc;
    border: 2px solid #dbeafe;
    border-radius: 12px;
    padding: 25px;
    margin: 25px 0;
    color: #1e3a8a;
    font-size: 1.05rem;
    line-height: 1.8;
    font-style: italic;
}
.emphasis-text {
    background-color: #f8fafc;
    color: black;
    padding: 3px 8px;
    border-radius: 4px;
    font-weight: 600;
    font-style: normal;
}
</style>

<h2 class="results-header">Results Summary</h2>

<div class="summary-intro">
<p class="summary-text">This notebook sums the damaging mutation values of each cancer cell line, counts the cell lines that fall into each range of those sums, and displays the counts as an interactive histogram. The viewer can filter the displayed data by tissue type (all tissues or lung), mutation value (all damaging, monoallelic, or biallelic), and number of histogram bins. Because new filters can be added as further precomputed categories and chart parameters, the notebook provides a flexible foundation for future analysis.</p>
</div>

