# cCRE Scatter Plot Example Usage

This notebook demonstrates how to use the `scatterplot` function to create interactive scatter plots with JScatter.

In [3]:
import numpy as np
import polars as pl

from scatterplot import kde, scatterplot

## Create Sample Data

Let's create three synthetic tables keyed by `cCRE`: values for the x-axis, values for the y-axis, and per-cCRE metadata.

In [4]:
# Create sample cCRE identifiers
n_points = 10000
ccre_ids = [f"cCRE_{i:04d}" for i in range(n_points)]

# Dataset A (for X-axis) - could be gene expression data
# Mix positive and negative values to make the diagonal line more meaningful
x = pl.DataFrame(
    {
        "cCRE": ccre_ids,
        "expression_level": np.random.normal(
            loc=5, scale=15, size=n_points
        ),  # Normal distribution with negatives
        "tissue_type": np.random.choice(
            ["brain", "liver", "heart", "lung"], size=n_points
        ),
    }
)

# Dataset B (for Y-axis) - could be chromatin accessibility data
# Also allow negative values to create more interesting scatter patterns
y = pl.DataFrame(
    {
        "cCRE": ccre_ids,
        "accessibility_score": np.random.normal(
            loc=10, scale=20, size=n_points
        ),  # Normal distribution with negatives
        "cell_line": np.random.choice(
            ["K562", "HeLa", "HEK293", "MCF7"], size=n_points
        ),
    }
)

# Metadata describing the cCREs
chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
starts = np.random.randint(1000000, 200000000, size=n_points)
ends = starts + np.random.randint(200, 2000, size=n_points)

metadata = pl.DataFrame(
    {
        "rDHS": [f"rDHS_{i:06d}" for i in range(n_points)],
        "cCRE": ccre_ids,
        "chr": np.random.choice(chromosomes, size=n_points),
        "start": starts,
        "end": ends,
        "class": np.random.choice(
            ["promoter", "enhancer", "insulator", "silencer", "CTCF-only"],
            size=n_points,
        ),
    }
)

print(f"Created datasets with {n_points} cCREs")
print(f"Dataset X shape: {x.shape}")
print(f"Dataset Y shape: {y.shape}")
print(f"Metadata shape: {metadata.shape}")
print(f"Unique classes: {sorted(metadata['class'].unique().to_list())}")
print(
    f"Expression level range: [{x['expression_level'].min():.2f}, {x['expression_level'].max():.2f}]"
)
print(
    f"Accessibility score range: [{y['accessibility_score'].min():.2f}, {y['accessibility_score'].max():.2f}]"
)

Created datasets with 10000 cCREs
Dataset X shape: (10000, 3)
Dataset Y shape: (10000, 3)
Metadata shape: (10000, 6)
Unique classes: ['CTCF-only', 'enhancer', 'insulator', 'promoter', 'silencer']
Expression level range: [-54.71, 59.86]
Accessibility score range: [-62.25, 84.33]
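
The cell above draws from NumPy's global random state without setting a seed, so the printed ranges change on every run. If reproducible numbers matter, a seeded generator is one option. This is a minimal sketch; `SEED` and `make_values` are hypothetical names, not part of the notebook.

```python
import numpy as np

# Hypothetical seed value; any fixed integer works.
SEED = 42
n_points = 10000


def make_values(seed: int) -> np.ndarray:
    """Draw the expression-level column from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=5, scale=15, size=n_points)


first = make_values(SEED)
second = make_values(SEED)

# Same seed, same draws: the printed range no longer changes between runs.
print(np.array_equal(first, second))
print(f"range: [{first.min():.2f}, {first.max():.2f}]")
```

Passing one `rng` object to every column in the cell would make all three tables reproducible from a single seed.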


## Create Interactive Scatter Plot

Now let's create the interactive scatter plot. The function automatically builds an interpolated colormap sized to the number of unique classes and draws a diagonal reference line.
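
How `scatterplot` builds its colormap is not shown in this notebook. As an illustration of what an interpolated colormap can look like, the sketch below samples `n` evenly spaced colors along a piecewise-linear path through a few anchor colors. The function name `interpolate_colors` and the anchor colors are assumptions made for this example.

```python
def interpolate_colors(anchors, n):
    """Return n RGB tuples sampled evenly along a path through the anchors.

    anchors: list of at least two (r, g, b) tuples with 0-255 channels.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two anchor colors")
    if n < 1:
        raise ValueError("n must be positive")
    if n == 1:
        return [anchors[0]]

    last = len(anchors) - 1
    colors = []
    for i in range(n):
        pos = i / (n - 1) * last          # position in units of segments
        seg = min(int(pos), last - 1)     # segment that contains pos
        t = pos - seg                     # fraction along that segment
        start, end = anchors[seg], anchors[seg + 1]
        colors.append(
            tuple(round(a + (b - a) * t) for a, b in zip(start, end))
        )
    return colors


# Hypothetical anchors: blue -> yellow -> red.
anchors = [(0, 0, 255), (255, 255, 0), (255, 0, 0)]
classes = ["CTCF-only", "enhancer", "insulator", "promoter", "silencer"]

palette = interpolate_colors(anchors, len(classes))
class_colors = {
    name: "#{:02x}{:02x}{:02x}".format(*rgb)
    for name, rgb in zip(classes, palette)
}
print(class_colors)
```

Because the palette is sampled rather than hard-coded, the same anchors work for any number of categories.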

In [5]:
# Create the scatter plot with class-based coloring
result = scatterplot(
    x=x,
    y=y,
    metadata=metadata,
    join_column="cCRE",
    category_column="class",  # Specify the category column
    x_label="Expression Level",  # x holds expression_level
    y_label="Accessibility Score",  # y holds accessibility_score
    title="cCRE Expression vs Accessibility",
    default_category="All",  # Show all classes initially
)

Data ranges: x=[-54.71, 59.86], y=[-62.25, 84.33]
Shared axis range: [-69.58, 91.65]


VBox(children=(HTML(value='<h3>cCRE Expression vs Accessibility</h3>'), HBox(children=(Dropdown(description='C…
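
The "Shared axis range" line in the output is consistent with taking the combined minimum and maximum of both axes and padding by 5% of the span. That rule is inferred from the printed numbers, not from the function's source, so treat the sketch below as an assumption. `shared_axis_range` and `padding` are hypothetical names.

```python
def shared_axis_range(x_min, x_max, y_min, y_max, padding=0.05):
    """Return one (low, high) range that covers both axes with padding."""
    low = min(x_min, y_min)
    high = max(x_max, y_max)
    pad = (high - low) * padding  # padding as a fraction of the span
    return low - pad, high + pad


# Data ranges copied from the output above.
low, high = shared_axis_range(-54.71, 59.86, -62.25, 84.33)
print(f"Shared axis range: [{low:.2f}, {high:.2f}]")

# With both axes on the same range, y = x runs corner to corner.
diagonal = [(low, low), (high, high)]
```

The result agrees with the logged range up to the last digit, which is expected because the inputs here are the rounded values from the printout.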

## Accessing the Results

The function returns a `ScatterplotResult` named tuple whose fields expose the plot, the merged data, the widget container, the selection accessor, and the class dropdown:

In [6]:
# Access the individual components
scatter_plot = result.scatter
merged_data = result.merged_data
container = result.container
selection_func = result.selection
class_dropdown = result.class_dropdown

print(f"Merged data shape: {merged_data.shape}")
print(f"Available columns: {list(merged_data.columns)}")
print(f"Current selection: {len(selection_func())} points")

# You can change the class filter programmatically
class_dropdown.value = "enhancer"

Merged data shape: (10000, 10)
Available columns: ['cCRE', 'expression_level', 'tissue_type', 'accessibility_score', 'cell_line', 'rDHS', 'chr', 'start', 'end', 'class']
Current selection: 0 points
Filtering by class: enhancer
Data ranges: x=[-44.70, 52.81], y=[-59.46, 76.43]
Shared axis range: [-66.26, 83.22]
Updated plot in-place for class: enhancer


## Using Density-Based Coloring

You can also use density-based coloring instead of categorical coloring:

> NOTE: Density computation is slow; for large datasets, use `radius()` instead.

In [7]:
# Create plot with KDE-based density coloring
result_kde = scatterplot(
    x=x,
    y=y,
    metadata=metadata,
    join_column="cCRE",
    category_column="class",
    x_label="Expression Level",  # x holds expression_level
    y_label="Accessibility Score",  # y holds accessibility_score
    title="cCRE Expression vs Accessibility (Density Colored)",
    colormap=kde(bandwidth=10.0),  # Use KDE with custom bandwidth
)

Data ranges: x=[-54.71, 59.86], y=[-62.25, 84.33]
Shared axis range: [-69.58, 91.65]


VBox(children=(HTML(value='<h3>cCRE Expression vs Accessibility (Density Colored)</h3>'), HBox(children=(Dropd…
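
To see why density computation gets slow, here is a minimal sketch of a 2D Gaussian kernel density estimate evaluated at every point. It is not the implementation behind `kde()`; it assumes the bandwidth is a kernel width in data units, and `gaussian_kde_density` is a hypothetical name.

```python
import numpy as np


def gaussian_kde_density(points: np.ndarray, bandwidth: float) -> np.ndarray:
    """Evaluate an isotropic 2D Gaussian KDE at each input point.

    points has shape (n, 2); the result has shape (n,).
    The full n x n distance matrix is built, so time and memory
    grow with n squared.
    """
    diff = points[:, None, :] - points[None, :, :]  # shape (n, n, 2)
    sq_dist = np.sum(diff**2, axis=-1)              # shape (n, n)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth**2))
    # Each 2D Gaussian kernel integrates to 2*pi*h^2; average over n points.
    norm = 2.0 * np.pi * bandwidth**2 * len(points)
    return kernel.sum(axis=1) / norm


# Small synthetic sample with the same distributions as the notebook data.
rng = np.random.default_rng(0)
points = np.column_stack(
    [
        rng.normal(loc=5, scale=15, size=500),
        rng.normal(loc=10, scale=20, size=500),
    ]
)
density = gaussian_kde_density(points, bandwidth=10.0)
print(density.shape)
```

At 10,000 points this naive approach needs 10,000 × 10,000 = 10^8 pairwise distances, which is the kind of cost the note above warns about.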

## Using Different Category Columns

You can use any categorical column in your metadata for coloring:

In [8]:
# Add a custom category column to metadata
metadata = metadata.with_columns(
    [
        pl.when(pl.col("class").is_in(["promoter", "enhancer"]))
        .then(pl.lit("active"))
        .when(pl.col("class").is_in(["silencer", "insulator"]))
        .then(pl.lit("regulatory"))
        .otherwise(pl.lit("other"))
        .alias("regulatory_type")
    ]
)

# Create plot using the new category column
result_custom = scatterplot(
    x=x,
    y=y,
    metadata=metadata,
    join_column="cCRE",
    category_column="regulatory_type",  # Use custom category column
    x_label="Expression Level",  # x holds expression_level
    y_label="Accessibility Score",  # y holds accessibility_score
    title="cCRE Expression vs Accessibility (by Regulatory Type)",
    default_category="All",
)

Data ranges: x=[-54.71, 59.86], y=[-62.25, 84.33]
Shared axis range: [-69.58, 91.65]


VBox(children=(HTML(value='<h3>cCRE Expression vs Accessibility (by Regulatory Type)</h3>'), HBox(children=(Dr…