# cCRE Scatter Plot Example Usage

This notebook demonstrates how to use the `plot_ccres` function from the `ccre_scatter` package to create interactive scatter plots with JScatter.

In [None]:
import pandas as pd
import numpy as np
from ccre_scatter import plot_ccres

## Create Sample Data

The cell below builds three synthetic DataFrames that share a `cCRE` identifier column: two measurement datasets (`x` and `y`) and a metadata table.

In [2]:
# Create sample cCRE identifiers
n_points = 1000
ccre_ids = [f"cCRE_{i:04d}" for i in range(n_points)]

# Dataset A (for Y-axis) - could be gene expression data
x = pd.DataFrame(
    {
        "cCRE": ccre_ids,
        "expression_level": np.random.lognormal(mean=2, sigma=1, size=n_points),
        "tissue_type": np.random.choice(
            ["brain", "liver", "heart", "lung"], size=n_points
        ),
    }
)

# Dataset B (for X-axis) - could be chromatin accessibility data
y = pd.DataFrame(
    {
        "cCRE": ccre_ids,
        "accessibility_score": np.random.beta(a=2, b=5, size=n_points) * 100,
        "cell_line": np.random.choice(
            ["K562", "HeLa", "HEK293", "MCF7"], size=n_points
        ),
    }
)

# Metadata describing the cCREs
chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
# Draw start coordinates first so that "end" can be derived from them
starts = np.random.randint(1000000, 200000000, size=n_points)
metadata = pd.DataFrame(
    {
        "rDHS": [f"rDHS_{i:06d}" for i in range(n_points)],
        "cCRE": ccre_ids,
        "chrom": np.random.choice(chromosomes, size=n_points),
        "start": starts,
        # Each element spans 200 to 1999 bp
        "end": starts + np.random.randint(200, 2000, size=n_points),
        "class": np.random.choice(
            ["promoter", "enhancer", "insulator", "silencer"], size=n_points
        ),
    }
)

print(f"Created datasets with {n_points} cCREs")
print(f"Dataset A shape: {x.shape}")
print(f"Dataset B shape: {y.shape}")
print(f"Metadata shape: {metadata.shape}")

Created datasets with 1000 cCREs
Dataset A shape: (1000, 3)
Dataset B shape: (1000, 3)
Metadata shape: (1000, 6)
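
The sample data above is drawn without a seed, so the values change on every run. As a minimal sketch, a seeded variant could use NumPy's `Generator` API. The helper name `make_sample_frames` and the `seed` parameter are hypothetical and not part of `ccre_scatter`.

```python
import numpy as np
import pandas as pd


def make_sample_frames(n_points=1000, seed=0):
    """Build seeded stand-ins for the `x` and `y` frames (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ccre_ids = [f"cCRE_{i:04d}" for i in range(n_points)]
    x = pd.DataFrame(
        {
            "cCRE": ccre_ids,
            "expression_level": rng.lognormal(mean=2, sigma=1, size=n_points),
            "tissue_type": rng.choice(
                ["brain", "liver", "heart", "lung"], size=n_points
            ),
        }
    )
    y = pd.DataFrame(
        {
            "cCRE": ccre_ids,
            "accessibility_score": rng.beta(a=2, b=5, size=n_points) * 100,
            "cell_line": rng.choice(
                ["K562", "HeLa", "HEK293", "MCF7"], size=n_points
            ),
        }
    )
    return x, y


# Two calls with the same seed produce identical frames
x_seeded, y_seeded = make_sample_frames(seed=42)
x_again, y_again = make_sample_frames(seed=42)
print(x_seeded.equals(x_again), y_seeded.equals(y_again))
```

With a fixed seed, printed ranges and shapes stay stable between runs, which makes recorded outputs easier to compare.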


## Create Interactive Scatter Plot

Now let's create the interactive scatter plot with the metadata table.

In [None]:
# Create the scatter plot
result = plot_ccres(
    x=x,
    y=y,
    metadata=metadata,
    join_column="cCRE",  # This is the default
    x_name="Gene Expression",
    y_name="Chromatin Accessibility",
    x_label="Accessibility Score",
    y_label="Expression Level",
)

Data ranges: x=[1.18, 85.69], y=[0.40, 161.80]
Shared axis range: [-7.67, 169.87]
Connecting to scatter.widget.selection trait


VBox(children=(HTML(value='<h3>cCRE Scatter Plot: Gene Expression vs Chromatin Accessibility</h3>'), HBox(chil…
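
The printed ranges above are consistent with a shared axis that spans both datasets plus 5% padding on each side. The sketch below reproduces the printed numbers under that assumption. The padding rule is inferred from the output alone, not from the `plot_ccres` source, and `padded_shared_range` is a hypothetical helper.

```python
def padded_shared_range(x_min, x_max, y_min, y_max, pad_fraction=0.05):
    """Return one (low, high) range covering both axes, widened by a fraction of the span."""
    lo = min(x_min, y_min)
    hi = max(x_max, y_max)
    pad = (hi - lo) * pad_fraction  # assumed padding rule
    return lo - pad, hi + pad


# Inputs are the rounded values from the "Data ranges" line above
lo, hi = padded_shared_range(1.18, 85.69, 0.40, 161.80)
print(f"Shared axis range: [{lo:.2f}, {hi:.2f}]")
```

Using the same range on both axes keeps the diagonal at 45 degrees, which makes it easier to compare the two measurements directly.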

## Accessing the Results

The function returns a dictionary. The cells below use the keys `scatter`, `metadata_table`, `merged_data`, `container`, and `update_metadata_callback`:

In [4]:
# Access the individual components
scatter_plot = result["scatter"]
metadata_table = result["metadata_table"]
merged_data = result["merged_data"]
container = result["container"]

print(f"Merged data shape: {merged_data.shape}")
print(f"Available columns: {list(merged_data.columns)}")

Merged data shape: (1000, 10)
Available columns: ['cCRE', 'expression_level', 'tissue_type', 'accessibility_score', 'cell_line', 'rDHS', 'chrom', 'start', 'end', 'class']
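
The merged shape and column order above match what a join of the three inputs on `cCRE` would give: one shared key column, plus 2 + 2 + 5 remaining columns. As a rough illustration with plain pandas (how `plot_ccres` actually merges its inputs is an assumption here), using tiny stand-in frames:

```python
import pandas as pd

# Tiny stand-ins for the three inputs; column names mirror the notebook
x_small = pd.DataFrame(
    {"cCRE": ["c1", "c2", "c3"], "expression_level": [1.0, 2.0, 3.0]}
)
y_small = pd.DataFrame(
    {"cCRE": ["c1", "c2", "c3"], "accessibility_score": [10.0, 20.0, 30.0]}
)
meta_small = pd.DataFrame(
    {"cCRE": ["c1", "c2", "c3"], "chrom": ["chr1", "chr2", "chrX"]}
)

# Inner joins keep only identifiers present in every frame
merged_small = x_small.merge(y_small, on="cCRE").merge(meta_small, on="cCRE")
print(merged_small.shape)
print(list(merged_small.columns))
```

The key column appears once in the result, followed by the remaining columns of each frame in merge order.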


## Manual Selection Update

You can also manually update the metadata table to show specific points:

In [7]:
# Example: Show metadata for the first 100 points
update_callback = result["update_metadata_callback"]
update_callback(selected_indices=list(range(100)))
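
Indices do not have to be a contiguous range. The sketch below derives them from a filter on a small stand-in for `merged_data`. It assumes that `selected_indices` refers to row positions in the merged data, which this notebook does not confirm.

```python
import numpy as np
import pandas as pd

# Small stand-in for result["merged_data"]
merged_small = pd.DataFrame(
    {
        "cCRE": ["cCRE_0000", "cCRE_0001", "cCRE_0002", "cCRE_0003"],
        "chrom": ["chr1", "chr2", "chr1", "chrX"],
        "class": ["promoter", "enhancer", "enhancer", "promoter"],
    }
)

# Row positions of promoters on chr1
mask = (merged_small["chrom"] == "chr1") & (merged_small["class"] == "promoter")
selected_indices = np.flatnonzero(mask.to_numpy()).tolist()
print(selected_indices)
```

With the real objects, the resulting list could then be passed as `update_callback(selected_indices=selected_indices)`, under the same assumption about positional indices.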

## Using Different Join Columns

To join on a column other than the default `cCRE`, pass its name as `join_column`. In this example the column is added to all three DataFrames:

In [None]:
# Create datasets with a different join column
x_alt = x.copy()
y_alt = y.copy()
metadata_alt = metadata.copy()

# Add a custom join column
custom_ids = [f"custom_{i}" for i in range(n_points)]
x_alt["custom_id"] = custom_ids
y_alt["custom_id"] = custom_ids
metadata_alt["custom_id"] = custom_ids

# Create plot with custom join column
result_custom = plot_ccres(
    x=x_alt,
    y=y_alt,
    metadata=metadata_alt,
    join_column="custom_id",  # Custom join column
    x_name="Expression (Custom)",
    y_name="Accessibility (Custom)",
)

Data ranges: x=[1.18, 85.69], y=[0.40, 161.80]
Shared axis range: [-7.67, 169.87]
Connecting to scatter.widget.selection trait


VBox(children=(HTML(value='<h3>cCRE Scatter Plot: Expression (Custom) vs Accessibility (Custom)</h3>'), HBox(c…
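
Before plotting with a custom join column, it can help to confirm that every input has the column and that its values are unique. The sketch below is one way to do that with plain pandas. `check_join_column` is a hypothetical helper, and whether `plot_ccres` performs similar checks itself is not stated in this notebook.

```python
import pandas as pd


def check_join_column(frames, join_column):
    """Return a list of problems found; an empty list means the frames look joinable."""
    problems = []
    for name, frame in frames.items():
        if join_column not in frame.columns:
            problems.append(f"{name}: missing column {join_column!r}")
        elif frame[join_column].duplicated().any():
            problems.append(f"{name}: duplicate values in {join_column!r}")
    return problems


good = pd.DataFrame({"custom_id": ["custom_0", "custom_1"], "value": [1.0, 2.0]})
dupes = pd.DataFrame({"custom_id": ["custom_0", "custom_0"], "value": [3.0, 4.0]})
no_key = pd.DataFrame({"cCRE": ["cCRE_0000", "cCRE_0001"]})

problems = check_join_column(
    {"x": good, "y": dupes, "metadata": no_key}, "custom_id"
)
print(problems)
```

Duplicate keys matter because a join on a repeated key multiplies rows, so the merged data would no longer have one row per point.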