# Baysor Cell Segmentation

This notebook provides a workflow for running Baysor cell segmentation on spatial transcriptomics data.

## 1. Load Packages

In [2]:
import Pkg; Pkg.add("DataFrames")

[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.10/Project.toml`
[32mâŒƒ[39m [90m[a93c6f00] [39m[92m+ DataFrames v1.7.1[39m
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`


In [1]:
using Baysor
using DataFrames
using CSV

LoadError: ArgumentError: Package DataFrames not found in current path.
- Run `import Pkg; Pkg.add("DataFrames")` to install the DataFrames package.
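The error above is what `using` reports when a package has not yet been added to the active environment. A minimal sketch of checking for this up front is shown here; `missing_packages` is a hypothetical helper, not part of Pkg or Baysor, and the list of required names is an assumption based on the imports in this notebook. Baysor is left out of the list because its installation route is not covered here.

```julia
import Pkg

# Hypothetical helper: names in `required` that are absent from `installed`.
missing_packages(required, installed) = [p for p in required if !(p in installed)]

# Assumed list, taken from the `using` statements in this notebook.
required = ["DataFrames", "CSV"]

# Direct dependencies recorded in the active environment's Project.toml.
installed = keys(Pkg.project().dependencies)
to_add = missing_packages(required, installed)

if isempty(to_add)
    println("All required packages are available.")
else
    println("Missing packages: ", join(to_add, ", "))
    # Pkg.add(to_add)  # uncomment to install them into the active environment
end
```

Running a check like this before the `using` cell avoids depending on the order in which the cells were executed.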

## 2. Configuration

Set your input/output paths and parameters here.

In [None]:
# Input file path - CSV with columns: gene, x, y (and optionally z for 3D)
input_file = "path/to/your/transcripts.csv"

# Output directory
output_dir = "./output"

# Create output directory if it doesn't exist
mkpath(output_dir)
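Since the input path above is a placeholder, it may help to confirm the configuration before starting a long segmentation run. The following is a sketch only; `check_config` is a hypothetical helper introduced for illustration, and the paths repeat the placeholder values from the cell above.

```julia
# Hypothetical helper: return a list of problems found, empty if none.
function check_config(input_file::AbstractString, output_dir::AbstractString)
    problems = String[]
    isfile(input_file) || push!(problems, "input file not found: $input_file")
    isdir(output_dir) || push!(problems, "output directory does not exist: $output_dir")
    return problems
end

# Placeholder values copied from the configuration cell.
input_file = "path/to/your/transcripts.csv"
output_dir = "./output"

for problem in check_config(input_file, output_dir)
    @warn problem
end
```

Returning the problems as a list, rather than throwing on the first one, lets all configuration issues be reported in a single pass.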

## 3. Preview Data (Optional)

Before running segmentation, you can preview your data to check the scale and distribution.

In [None]:
# Load and inspect your data
# df = CSV.read(input_file, DataFrame)
# first(df, 5)
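To make "scale and distribution" concrete, the sketch below summarises the coordinate extent and the number of distinct genes. `summarize_transcripts` is a hypothetical helper, and the toy vectors stand in for `df.x`, `df.y`, and `df.gene`, assuming the column names listed in the configuration section.

```julia
# Hypothetical helper: summarise coordinate extent and gene count
# from plain column vectors.
function summarize_transcripts(x, y, gene)
    xmin, xmax = extrema(x)
    ymin, ymax = extrema(y)
    return (
        n_molecules = length(gene),
        n_genes = length(unique(gene)),
        x_range = (xmin, xmax),
        y_range = (ymin, ymax),
        width = xmax - xmin,
        height = ymax - ymin,
    )
end

# Toy data standing in for df.x, df.y, df.gene.
x = [0.0, 12.5, 30.0, 48.0]
y = [5.0, 5.5, 40.0, 22.0]
gene = ["Actb", "Gapdh", "Actb", "Sox2"]

stats = summarize_transcripts(x, y, gene)
println(stats)
```

With a loaded table, the same call would be `summarize_transcripts(df.x, df.y, df.gene)`. The width and height give a quick sense of whether the coordinates are in microns or pixels, which matters when choosing the scale parameter.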

## 4. Run Baysor Segmentation

### Option A: Using the command-line interface from the notebook

This is the simplest way to run Baysor with full control over parameters.

In [None]:
# Example: Run Baysor segmentation via CLI
# Uncomment and modify the parameters as needed

#=
run(`julia -e '
    using Baysor
    Baysor.command_main()
' -- run 
    -x x 
    -y y 
    -g gene 
    -s 30 
    -o $(output_dir)/segmentation 
    $(input_file)
`)
=#

### Option B: Run directly using Baysor.command_main()

You can also call Baysor's main function directly with arguments.

In [None]:
# Run Baysor with command_main
# Modify these arguments according to your data

#=
args = [
    "run",
    input_file,
    "-x", "x",           # x coordinate column name
    "-y", "y",           # y coordinate column name  
    "-g", "gene",        # gene column name
    "-s", "30",          # scale (expected cell radius in coordinate units)
    "-o", joinpath(output_dir, "segmentation")  # output prefix
]

Baysor.command_main(args)
=#
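When the same run is repeated with different settings, the argument vector can be assembled by a small function instead of edited by hand. This is a sketch under assumptions: `baysor_run_args` is a hypothetical helper, and it only reuses the flags already shown in Option B.

```julia
# Hypothetical helper: assemble the argument vector from keyword parameters.
# Only the flags already used in Option B appear here.
function baysor_run_args(input_file, output_prefix; scale, x="x", y="y", gene="gene")
    return String[
        "run",
        input_file,
        "-x", x,
        "-y", y,
        "-g", gene,
        "-s", string(scale),   # command-line arguments must be strings
        "-o", output_prefix,
    ]
end

args = baysor_run_args(
    "path/to/your/transcripts.csv",
    joinpath("./output", "segmentation");
    scale=30,
)
println(join(args, " "))
# Baysor.command_main(args)  # as in Option B, once the paths point at real data
```

Converting `scale` with `string` keeps numeric settings convenient to pass while still producing the all-string vector that Option B builds manually.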

## 5. Key Parameters

Important Baysor parameters to consider:

| Parameter | Description |
|-----------|-------------|
| `-s, --scale` | Expected cell radius in coordinate units (REQUIRED) |
| `-x, -y, -z` | Column names for coordinates |
| `-g, --gene` | Column name for gene names |
| `--prior-segmentation` | Path to prior segmentation (e.g., from DAPI staining) |
| `--n-clusters` | Number of molecule clusters (approximate cell types) used for cell type segmentation |
| `--min-molecules-per-cell` | Minimum molecules for a valid cell |
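
Because the scale is given in coordinate units, a radius known in microns has to be converted when the coordinates are stored in pixels. A minimal worked sketch follows; `scale_in_units` is a hypothetical helper, and the radius and pixel size are illustrative numbers, not recommendations.

```julia
# Hypothetical helper: convert an expected cell radius in microns into
# coordinate units. `um_per_unit` is how many microns one coordinate unit
# spans (for pixel coordinates, the pixel size).
scale_in_units(radius_um, um_per_unit) = radius_um / um_per_unit

# Illustrative values only.
println(scale_in_units(5.0, 1.0))   # coordinates already in microns
println(scale_in_units(5.0, 0.1))   # pixel coordinates, 0.1 micron per pixel
```

The result is the value to pass to `-s`.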

## 6. Load and Inspect Results

In [None]:
# After running segmentation, load the results
# results = CSV.read(joinpath(output_dir, "segmentation_segmentation.csv"), DataFrame)
# first(results, 10)

In [None]:
# Check cell statistics
# cell_stats = CSV.read(joinpath(output_dir, "segmentation_cell_stats.csv"), DataFrame)
# println("Number of cells: ", nrow(cell_stats))
# first(cell_stats, 10)
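
As a quick sanity check on the segmentation table, molecules can be counted per cell and compared against a minimum size. The sketch below works on a plain vector of labels; it assumes the results contain a per-molecule cell label column (called `cell` here, which may differ between Baysor versions) and that unassigned molecules appear as `missing`. `molecules_per_cell` is a hypothetical helper, and the toy labels are made up for illustration.

```julia
# Hypothetical helper: count molecules per cell label,
# skipping `missing` (unassigned) entries.
function molecules_per_cell(cell_labels)
    counts = Dict{Any,Int}()
    for c in cell_labels
        ismissing(c) && continue
        counts[c] = get(counts, c, 0) + 1
    end
    return counts
end

# Toy labels standing in for a per-molecule label column such as results.cell.
labels = ["cell-1", "cell-1", missing, "cell-2", "cell-1", "cell-3", "cell-2"]
counts = molecules_per_cell(labels)

min_molecules = 2  # illustrative threshold
kept = sort([c for (c, n) in counts if n >= min_molecules])
println("Cells with at least $min_molecules molecules: ", kept)
```

Comparing the number of kept cells with `nrow(cell_stats)` is one way to see how much the minimum-molecules setting affects the output.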