## Notebook demonstrating the addition of data segmented with [proseg](https://github.com/dcjones/proseg)

In [8]:
# Reload all functions and init files automatically before each execution.
%load_ext autoreload
%autoreload 2

In [1]:
from pathlib import Path
from insitupy import InSituData, CACHE

## Load data

In [2]:
insitupy_project = Path(CACHE / "out/demo_insitupy_project")
xd = InSituData.read(insitupy_project)
xd.load_all()

In [3]:
xd

[1m[31mInSituData[0m
[1mMethod:[0m		Xenium
[1mSlide ID:[0m	0001879
[1mSample ID:[0m	Replicate 1
[1mPath:[0m		C:\Users\Anna Chernysheva\.cache\InSituPy\out\demo_insitupy_project
[1mMetadata file:[0m	.ispy
    ➤ [34m[1mimages[0m
       [1mnuclei:[0m	(25778, 35416)
       [1mCD20:[0m	(25778, 35416)
       [1mHER2:[0m	(25778, 35416)
       [1mHE:[0m	(25778, 35416, 3)
    ➤[32m[1m cells[0m
       [1mMultiCellData with main layer[0m 'main'
           [1mmatrix[0m
               AnnData object with n_obs × n_vars = 157600 × 297
               obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'n_genes_by_counts', 'n_genes', 'leiden', 'cell_type_dc'
               var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells'
               uns: 'cell_type_dc_colors', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'um

## Select small region for demonstration

In [4]:
xdcrop = xd.crop(xlim=(2700,3000), ylim=(2700,3000))

## Export transcripts for proseg

In [5]:
transcripts_out_path = Path(CACHE / "out/transcripts_for_proseg.csv")
# create the output directory (and any missing parent directories) if needed
transcripts_out_path.parent.mkdir(parents=True, exist_ok=True)

In [6]:
# export transcripts as csv
xdcrop.transcripts.to_csv(transcripts_out_path, single_file=True)

['C:\\Users\\Anna Chernysheva\\.cache\\InSituPy\\out\\transcripts_for_proseg.csv']
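Before handing the file to proseg, it can be useful to confirm that the export produced a single readable CSV and to note how many transcript rows it contains. The following is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical helper (not part of InSituPy), and the demo file with its column names is made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)             # first row holds the column names
        n_rows = sum(1 for _ in reader)   # remaining rows are data
    return header, n_rows


# Tiny stand-in file; the column names here are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = Path(tmp) / "demo_transcripts.csv"
    demo_csv.write_text("gene,x,y\nA,1.0,2.0\nB,3.5,4.5\n")
    header, n_rows = summarize_csv(demo_csv)

print(header, n_rows)
```

In the notebook, calling `summarize_csv(transcripts_out_path)` would give a row count that can be compared with the number of transcripts proseg reports when it starts. The two numbers may legitimately differ if proseg filters transcripts while reading.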

## Install proseg

To install proseg, follow the installation instructions in the [proseg GitHub repository](https://github.com/dcjones/proseg?tab=readme-ov-file#installing). In brief, proseg is a [Rust](https://www.rust-lang.org/) package and can be installed using:

```Bash
cargo install proseg
```
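After installing, a quick way to check from within the notebook that the `proseg` executable is visible is to look it up on the `PATH`. This is a small sketch using `shutil.which` from the standard library; `find_tool` is a hypothetical helper name.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


proseg_path = find_tool("proseg")
if proseg_path is None:
    print("proseg was not found on PATH.")
else:
    print(f"proseg found at {proseg_path}")
```

If the lookup fails right after `cargo install`, a likely cause is that Cargo's binary directory (by default `~/.cargo/bin`) is not on the `PATH` of the environment in which the notebook kernel runs.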

## Run proseg

In [7]:
output_path = transcripts_out_path.parent / "proseg_results"
output_path.mkdir(exist_ok=True)

In [9]:
import subprocess

# Start proseg and capture its standard output
process = subprocess.Popen(
    [
        'proseg',
        '--xenium', str(transcripts_out_path),
        '--output-path', str(output_path),
    ],
    stdout=subprocess.PIPE,
)

# Print the output line by line until the stream is closed
for output in process.stdout:
    print(output.decode('utf-8', errors='replace').strip())

# Wait for the process to terminate
process.wait()

Using 16 threads
Read 144377 transcripts
590 cells
471 genes
Estimated full area: 90017.61
Full volume: 549952.6
Using grid size 123.504875. Chunks: 9
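The loop above reads only standard output and does not check whether the process exited successfully. A slightly more defensive variant is sketched below: it merges standard error into the same stream and raises an error on a non-zero exit code. `run_and_stream` is a hypothetical helper, and the demo uses the Python interpreter as a stand-in command so that the sketch runs without proseg installed.

```python
import subprocess
import sys


def run_and_stream(cmd):
    """Run cmd, print its combined stdout/stderr line by line, and return the lines."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error messages are not lost
        text=True,
        errors="replace",
    ) as process:
        for line in process.stdout:
            line = line.rstrip()
            print(line)
            lines.append(line)
    # Leaving the "with" block waits for the process, so returncode is set here.
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, cmd)
    return lines


# Stand-in command; in the notebook this would be the proseg argument list.
demo_lines = run_and_stream([sys.executable, "-c", "print('hello'); print('world')"])
```

Passing the arguments as a list, as the notebook already does, avoids shell quoting issues because no shell is involved in launching the process.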


## Alternative approach: running proseg in the terminal

If the previous cell did not execute successfully (e.g., due to spaces in your file path), you can run proseg directly from the terminal.

Before proceeding, look up the values of `transcripts_out_path` and `output_path`, then replace the placeholders in the command below with them:

```Bash
proseg --xenium /path/to/transcripts_out_path --output-path /path/to/output_path
```
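If the paths contain spaces, they need to be quoted in the terminal. One way to get a copy-pasteable command is to let Python do the quoting, as in the sketch below. It uses `shlex.join` (Python 3.8+), which applies POSIX shell quoting; on Windows `cmd.exe`, double quotes around each path would be needed instead. `build_proseg_command` is a hypothetical helper, and the example paths are placeholders.

```python
import shlex


def build_proseg_command(transcripts_path, output_dir):
    """Return a POSIX-shell-quoted proseg command line as a single string."""
    args = [
        "proseg",
        "--xenium", str(transcripts_path),
        "--output-path", str(output_dir),
    ]
    return shlex.join(args)


# Placeholder paths containing a space
cmd = build_proseg_command(
    "/home/user/my data/transcripts_for_proseg.csv",
    "/home/user/my data/proseg_results",
)
print(cmd)
# prints: proseg --xenium '/home/user/my data/transcripts_for_proseg.csv' --output-path '/home/user/my data/proseg_results'
```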

Once the command has finished successfully, continue with the rest of this tutorial.
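Whichever way proseg was run, it can help to confirm that the output directory exists and contains files before adding the results. The sketch below lists the files in a directory together with their sizes; `list_outputs` is a hypothetical helper, and the demo uses a temporary directory with a made-up file instead of real proseg output.

```python
import tempfile
from pathlib import Path


def list_outputs(directory):
    """Return a sorted list of (file name, size in bytes) for the files in directory."""
    directory = Path(directory)
    if not directory.is_dir():
        raise FileNotFoundError(f"Output directory not found: {directory}")
    return sorted(
        (p.name, p.stat().st_size) for p in directory.iterdir() if p.is_file()
    )


# Stand-in directory with one made-up file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_output.csv").write_text("a,b\n1,2\n")
    outputs = list_outputs(tmp)

print(outputs)
```

In the notebook, `list_outputs(output_path)` returning an empty list would suggest that proseg did not finish or wrote its results elsewhere.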

## Add proseg results to `InSituData`

In [8]:
xdcrop.cells.add_proseg(path=output_path)
xdcrop.cells.add_proseg(path=output_path, key="test") # add the data a second time with another key

In [9]:
cropped_out = CACHE / "out/cropped"
xdcrop.saveas(cropped_out)

Saving data to C:\Users\ge37voy\.cache\InSituPy\out\cropped
Saved.


## Reload and visualize data

In [10]:
xdr = InSituData.read(cropped_out)
xdr.load_all()

In [None]:
# visualize data
xdr.show()

Layer 'proseg-boundaries-cells' already in layer list.
