# A Step-by-Step Tutorial for Using ISMGCC


This tutorial shows how to use the Python implementation of the ISMGCC method to find molecular gas structures.
ISMGCC takes as input a table whose rows are Gaussian components (GCs). Such a table can be created from an observational PPV data cube with a Gaussian decomposition method, e.g., [GaussPy+](https://github.com/mriener/gausspyplus).
This tutorial concentrates on ISMGCC itself, so a ready-made [table](https://github.com/Haoran-Feng/ismgcc/blob/main/example/demo-data/GL14_fit_fin_sf-p2_finalize.csv) is provided.

## Table of contents

1. Prepare the environment
2. <del>Decompose the input data cube with gausspy+</del> (we might provide a simple example in the future)
3. Use ISMGCC to find structures
4. Run a parameter grid of ISMGCC

## Prepare the environment 

We recommend running ISMGCC in an isolated conda environment, which can be created with the following commands:

```bash
conda create -n ismgcc-demo python=3.9
conda activate ismgcc-demo
python -m pip install ismgcc
python -m pip install ipykernel  # if you're running the code in Jupyter 
```
where `ismgcc-demo` is the name of your new environment, which can be anything you like. 
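
To confirm that the environment is ready, a quick check along the following lines may help. It is only a sketch built on the standard library; the helper name `is_installed` is made up for illustration and is not part of ISMGCC.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located in the current environment."""
    return find_spec(module_name) is not None


# Top-level modules imported later in this tutorial
for name in ("ismgcc", "pandas", "spectral_cube", "astropy"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If any module is reported as missing, make sure the `ismgcc-demo` environment is activated and install the module with `python -m pip install` before continuing.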


## Use ISMGCC to find structures


```python
# import everything we need
from ismgcc import DecomposedPPVStructureFinder, PostProcess
import pandas as pd
from spectral_cube import SpectralCube
from astropy import units as u
```

```python
import os

# set the parameters of ISMGCC
bwc = 0.5
r = 3
snrth0 = 0
snrth1 = 5
db = 0.5
resolution = 0.01
n_process = 4

param_dict = {
    "bandwidth_coef": bwc,
    "r": r,
    "snr_th0": snrth0,
    "snr_th1": snrth1,
    "decision_boundary": db,
    "community_resolution": resolution,
    "spatial_distance_threshold": 3.0,
}

# load the input table
input_file = "./demo-data/GL14_fit_fin_sf-p2_finalize.csv"
df = pd.read_csv(input_file)

# initialize the finder and run
finder = DecomposedPPVStructureFinder(df, params=param_dict, n_jobs=n_process)
output_suffix = finder.get_output_suffix()
outdf = finder.find_structures()

# make sure the output directory exists, then save the output as a csv file
os.makedirs("./demo-results", exist_ok=True)
output = f"./demo-results/GL14-{output_suffix}.csv"
outdf.to_csv(output, index=False)
```

```text
INFO: Start _prepare_kdtree_and_adj_table at Tue Sep  3 14:30:11 2024 [ismgcc.finder]
INFO: No lock file: .cache/ad73f_spatial_distance_threshold3.00_kdtree_and_adj_table.pickle.lock [ismgcc.finder]
INFO: Read cached kdtree_and_adj_table file: .cache/ad73f_spatial_distance_threshold3.00_kdtree_and_adj_table.pickle [ismgcc.finder]
INFO: Done at Tue Sep  3 14:30:12 2024, elapse time: 0.95 seconds [ismgcc.finder]
INFO: Start _prepare_v_cluster_table at Tue Sep  3 14:30:12 2024 [ismgcc.finder]
INFO: No lock file: .cache/ad73f_bandwidth_coef0.50_r3.00_vcluster_table.arrow.lock [ismgcc.finder]
INFO: Read cached vcluster_table file: .cache/ad73f_bandwidth_coef0.50_r3.00_vcluster_table.arrow [ismgcc.finder]
INFO: Done at Tue Sep  3 14:30:12 2024, elapse time: 0.10 seconds [ismgcc.finder]
INFO: Start _prepare_v_graph at Tue Sep  3 14:30:12 2024 [ismgcc.finder]
INFO: No lock file: .cache/ad73f_bandwidth_coef0.50_r3.00_v_graph.gpickle.lock [ismgcc.finder]
INFO: Read cached v_graph file: .cache/ad

Finding communities...: 100%|██████████| 18/18 [03:45<00:00, 12.53s/it]

INFO: The 18 complex connected components are splitted into 162 communities. [ismgcc.finder]
INFO: Total number of structures: 4566, Number of structures with at least 16 pixels: 300. [ismgcc.finder]
```


```python
# The input table looks like this: each row is a Gaussian component with its amp, VLSR, vel_disp, int_tot, ...
df
```

|  | x_pos | y_pos | GLON | GLAT | amp | e_amp | VLSR | e_VLSR | vel_disp | e_vel_disp | ... | pvalue | aicc | rchi2 | ncomps | ncomp_wmedian | ncomp_jumps | flag_blended | flag_neg_res_peak | flag_broad | flag_centroid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 14.5000 | -0.5 | 2.5701 | 0.0652 | 20.6539 | 0.0922 | 3.1447 | 0.0923 | ... | 0.1656 | -353.8994 | 1.3034 | 3 | 5 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 14.5000 | -0.5 | 1.4462 | 0.0720 | 36.0404 | 0.1483 | 2.5791 | 0.1485 | ... | 0.1656 | -353.8994 | 1.3034 | 3 | 5 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 14.5000 | -0.5 | 0.8765 | 0.1576 | 58.9923 | 0.1118 | 0.5377 | 0.1118 | ... | 0.1656 | -353.8994 | 1.3034 | 3 | 5 | 1 | 0 | 0 | 0 | 0 |
| 3 | 1 | 0 | 14.4917 | -0.5 | 1.3144 | 0.1294 | 35.0283 | 0.0922 | 0.8585 | 0.1062 | ... | 0.0795 | -334.2233 | 1.4726 | 4 | 5 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 14.4917 | -0.5 | 0.6031 | 0.0637 | 38.1806 | 0.5983 | 4.8465 | 0.4989 | ... | 0.0795 | -334.2233 | 1.4726 | 4 | 5 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65495 | 116 | 120 | 13.5333 | 0.5 | 1.0009 | 0.1027 | 26.3488 | 0.0977 | 0.8241 | 0.0977 | ... | 0.0016 | -312.3063 | 1.1757 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 65496 | 117 | 120 | 13.5250 | 0.5 | 1.6836 | 0.1107 | 26.2623 | 0.0512 | 0.6738 | 0.0512 | ... | 0.0000 | -378.8249 | 1.0100 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 65497 | 118 | 120 | 13.5167 | 0.5 | 1.0048 | 0.1042 | 25.9628 | 0.0926 | 0.7731 | 0.0926 | ... | 0.0143 | -298.7433 | 1.4393 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 65498 | 119 | 120 | 13.5083 | 0.5 | 0.6756 | 0.0760 | 25.4692 | 0.1966 | 1.5118 | 0.1966 | ... | 0.0569 | -362.5776 | 0.8458 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
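
When preparing your own input table, it may be worth checking that its header carries the columns seen in the demo table above. The sketch below uses only the standard library. The set `EXPECTED_COLUMNS` and the helper `missing_columns` are illustrative assumptions; which columns ISMGCC strictly requires is not stated here.

```python
import csv
import io

# Columns visible in the demo table; whether ISMGCC needs all of them is an assumption.
EXPECTED_COLUMNS = {"x_pos", "y_pos", "GLON", "GLAT", "amp", "VLSR", "vel_disp"}


def missing_columns(csv_text: str) -> set:
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return EXPECTED_COLUMNS - set(header)


# Two rows copied from the demo table, restricted to a few columns
sample = (
    "x_pos,y_pos,GLON,GLAT,amp,VLSR,vel_disp\n"
    "0,0,14.5000,-0.5,2.5701,20.6539,3.1447\n"
    "0,0,14.5000,-0.5,1.4462,36.0404,2.5791\n"
)

print(missing_columns(sample))           # nothing is missing for the sample header
print(missing_columns("x_pos,y_pos\n"))  # the other expected columns are reported
```

For a file on disk, the same check can be run on `open(path).read()`, or on `df.columns` after `pd.read_csv`.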


```python
# The output table is
outdf
```

|  | GLON | GLAT | VLSR | index | serial_id | shuffle_id | x_pos | y_pos | amp | e_amp | ... | flag_broad | flag_centroid | cluster_id0 | multi_vclusters_of_cluster_id0 | cluster_id1 | uid | multi_vclusters_of_serial_id | vc_count | pix_count | has16pix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.5000 | -0.5 | 20.6539 | 0 | 16 | 37 | 0 | 0 | 2.5701 | 0.0652 | ... | 0 | 0 | 1 | True | 8 | 53.0 | False | 875 | 750 | True |
| 1 | 14.5000 | -0.5 | 36.0404 | 1 | 74 | 2967 | 0 | 0 | 1.4462 | 0.0720 | ... | 0 | 0 | 1 | True | 37 | 778.0 | False | 132 | 117 | True |
| 2 | 14.5000 | -0.5 | 58.9923 | 2 | 1 | 2341 | 0 | 0 | 0.8765 | 0.1576 | ... | 0 | 0 | 2 | False | 0 | 3.0 | False | 5 | 5 | False |
| 3 | 14.4917 | -0.5 | 35.0283 | 3 | 2 | 190 | 1 | 0 | 1.3144 | 0.1294 | ... | 0 | 0 | 1 | True | 1 | 4.0 | False | 14493 | 7925 | True |
| 4 | 14.4917 | -0.5 | 38.1806 | 4 | 74 | 2967 | 1 | 0 | 0.6031 | 0.0637 | ... | 0 | 0 | 1 | True | 37 | 778.0 | False | 132 | 117 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65495 | 13.5333 | 0.5 | 26.3488 | 65495 | 3319 | 2312 | 116 | 120 | 1.0009 | 0.1027 | ... | 0 | 0 | 3183 | False | 0 | 5067336.0 | False | 293 | 290 | True |
| 65496 | 13.5250 | 0.5 | 26.2623 | 65496 | 3319 | 2312 | 117 | 120 | 1.6836 | 0.1107 | ... | 0 | 0 | 3183 | False | 0 | 5067336.0 | False | 293 | 290 | True |
| 65497 | 13.5167 | 0.5 | 25.9628 | 65497 | 3319 | 2312 | 118 | 120 | 1.0048 | 0.1042 | ... | 0 | 0 | 3183 | False | 0 | 5067336.0 | False | 293 | 290 | True |
| 65498 | 13.5083 | 0.5 | 25.4692 | 65498 | 3319 | 2312 | 119 | 120 | 0.6756 | 0.0760 | ... | 0 | 0 | 3183 | False | 0 | 5067336.0 | False | 293 | 290 | True |


The output table has the same number of rows as the input table, with extra columns added.
The extra columns include:
- `serial_id`: ID of the structure that the Gaussian component belongs to
- `shuffle_id`
- `index`
- `cluster_id0`
- `cluster_id1`
- `uid`
- `multi_vclusters_of_serial_id`
- `multi_vclusters_of_cluster_id0`
- `vc_count`
- `pix_count`
- `has16pix`
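
As a rough illustration of how these columns can be used, the sketch below counts structures in a toy stand-in for `outdf`, built from the `serial_id` and `pix_count` values of the first five rows shown above. It assumes that `serial_id` identifies a structure and that `pix_count` is that structure's pixel count (so it is constant within a `serial_id`); the 16-pixel threshold mirrors the log message and the `has16pix` column, but the exact definitions are assumptions here.

```python
import pandas as pd

# Toy stand-in for `outdf`, using values from the first five rows of the output table
toy = pd.DataFrame({
    "serial_id": [16, 74, 1, 2, 74],
    "pix_count": [750, 117, 5, 7925, 117],
})

# One entry per structure; assumes pix_count is constant within a serial_id
per_structure = toy.groupby("serial_id")["pix_count"].first()

n_total = len(per_structure)                 # number of distinct structures
n_large = int((per_structure >= 16).sum())   # structures with at least 16 pixels
print(n_total, n_large)
```

Applied to the full `outdf`, the same two lines would give counts that can be compared with the totals reported in the log.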

In [9]:
# do the post process