# SimiC Preprocessing Pipeline Tutorial

>*Author: Irene Marín-Goñi, PhD student - ML4BM group (CIMA University of Navarra)*

This notebook demonstrates how to preprocess single-cell RNA-seq data for SimiC analysis.

## Overview
This preprocessing tutorial covers:
1. Package installation and setup
2. MAGIC imputation pipeline
3. Gene selection and experiment setup
4. Preparing input files for SimiC

To run the SimiC analysis itself, see `Tutorial_SimiCPipeline_simple.ipynb`.

## Introduction
Before running SimiC, you need to:
- Impute your scRNA-seq data. We recommend using [MAGIC](https://pypi.org/project/magic-impute/3.0.0/) and provide a wrapper class, `MagicPipeline`, to ease the process.
- Select the top variable genes based on Median Absolute Deviation (MAD), or supply the genes of interest from which you want to infer the gene regulatory network.
- Prepare input files in the correct format for SimiCPipeline.

This tutorial shows you how to do all of this using the SimiC preprocessing modules.

## Setup

The easiest way to configure your environment is to follow the `README` instructions using `poetry` (or `Docker`).


Required packages for this tutorial:
- simicpipeline
- anndata
- pandas
- numpy
- os (Python standard library)
- pickle (Python standard library)

Internally simicpipeline also uses:
- scipy
- scprep
- magic-impute



## Import Modules
First, import the necessary preprocessing modules.

In [1]:
import os
print(os.getcwd())

/home/workdir


In [2]:
import simicpipeline 
simicpipeline.__version__

'0.1.0'

<a id='part1'></a>
# Part 1: MAGIC Imputation Pipeline

MAGIC (Markov Affinity-based Graph Imputation of Cells) is used to denoise and impute scRNA-seq data. `MagicPipeline` wraps the steps described in the [MAGIC tutorial](https://magic.readthedocs.io/en/stable/tutorial.html).


## Step 1.1: Load Your Data

Load your raw expression data. Your input should be in AnnData format.
The examples below show how to build an AnnData object from a pandas DataFrame, 10x output, Matrix Market files, or an h5ad file.

<div class="alert alert-block alert-info">
<em><b>Note:</b> If you have already processed your data, make sure the AnnData object has the raw counts in the `adata.raw.X` slot.</em>
</div>


In [None]:

print("Load your AnnData object here")
# Example: Load from CSV files (assumes rows = cells, columns = genes)
# import pandas as pd
# import anndata as ad
# expression_data = pd.read_csv('path/to/expression_data.csv', index_col=0)
# metadata = pd.read_csv('path/to/metadata.csv', index_col=0)
# adata = ad.AnnData(X=expression_data.values, obs=metadata, var=pd.DataFrame(index=expression_data.columns))

# Example: Load from 10x format (read_10x_mtx is provided by scanpy, not anndata)
# import scanpy as sc
# adata = sc.read_10x_mtx('path/to/10x/directory')

# Example: Load from Matrix Market format
# import pandas as pd
# import anndata as ad
# from pathlib import Path
# df = simicpipeline.load_from_matrix_market(
#     matrix_path=Path("./data/simic_matrix.txt"),
#     genes_path=Path("./data/simic_genes.txt"),
#     cells_path=Path("./data/simic_cells.txt"),
#     transpose=True,
#     cells_index_name="Cell",
# )
# adata = ad.AnnData(X=df.values, obs=pd.DataFrame(index=df.index), var=pd.DataFrame(index=df.columns))
# If your data contains raw counts, store them in the raw slot with
# adata.raw = adata.copy()

# Example: Load from h5ad file using simicpipeline function
adata = simicpipeline.load_from_anndata('path/to/your/data.h5ad')
print(type(adata.X))
print(type(adata))
adata

Load your AnnData object here
<class 'scipy.sparse._csr.csr_matrix'>
<class 'anndata._core.anndata.AnnData'>


AnnData object with n_obs × n_vars = 72650 × 36774
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'sample', 'species', 'gene_count', 'tscp_count', 'mread_count', 'bc1_wind', 'bc2_wind', 'bc3_wind', 'bc1_well', 'bc2_well', 'bc3_well', 'treatment', 'cellLine', 'percent.mt', 'percent.rb', 'percent.hb', 'integrated_snn_res.0.095', 'seurat_clusters', 'sctype_custom', 'sctype_functional', 'sctype_SCsubtype', 'final_annotation', 'final_annotation_functional'
    var: 'vf_vst_counts.1_mean', 'vf_vst_counts.1_variance', 'vf_vst_counts.1_variance.expected', 'vf_vst_counts.1_variance.standardized', 'vf_vst_counts.1_variable', 'vf_vst_counts.1_rank', 'vf_vst_counts.2_mean', 'vf_vst_counts.2_variance', 'vf_vst_counts.2_variance.expected', 'vf_vst_counts.2_variance.standardized', 'vf_vst_counts.2_variable', 'vf_vst_counts.2_rank', 'vf_vst_counts.3_mean', 'vf_vst_counts.3_variance', 'vf_vst_counts.3_variance.expected', 'vf_vst_counts.3_variance.standardized', 'vf_vst_counts.3_variable', 'vf_vst

## Step 1.2: Initialize MAGIC Pipeline

Create a MAGIC pipeline instance:
- `input_data`: Your AnnData object. If you run the full pipeline starting from raw counts, they should be in `adata.raw.X`
- `project_dir`: Project directory where the `magic_output` directory will be created and files will be saved
- `magic_output_file`: Filename for the imputed data (default: 'magic_data_allcells_sqrt.pickle')
- `filtered`: Set to True if data is already filtered (low quality cells and genes) (default: False)

In [6]:
# This command will initialize the MAGIC pipeline and generate the output directory if it does not exist
from simicpipeline import MagicPipeline
magic_pipeline = MagicPipeline(
    input_data= adata,
    project_dir='./SimiCExampleRun',
    magic_output_file='magic_imputed.pickle',
    filtered=False
)

print(magic_pipeline)

MagicPipeline(
  data = AnnData object with (n_obs × n_vars) = 72650 × 36774,
  filtered = False,
  imputed = False,
  magic_data = None,
  project_dir = 'SimiCExampleRun'
)


## Step 1.3: Filter Cells and Genes

Remove low-quality cells and lowly-expressed genes:
- `min_cells_per_gene`: Minimum number of cells expressing a gene (default: 10)
- `min_umis_per_cell`: Minimum total UMI counts per cell (default: 500)
<div class="alert alert-block alert-info">
<em><b>Note:</b> If your data was already filtered, you can skip this step and set the `filtered` argument to `True` in the previous step.</em>
</div>
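
To make the two thresholds concrete, here is a minimal NumPy sketch that applies them to a toy count matrix. It is not the `MagicPipeline` implementation: `filter_counts` is a hypothetical helper, and computing both masks on the unfiltered matrix is an assumption made for illustration.

```python
import numpy as np

def filter_counts(counts, min_cells_per_gene=10, min_umis_per_cell=500):
    """Toy filter on a cells x genes count matrix (hypothetical helper)."""
    counts = np.asarray(counts)
    # A gene is kept if it is detected (count > 0) in enough cells.
    keep_genes = (counts > 0).sum(axis=0) >= min_cells_per_gene
    # A cell is kept if its total UMI count reaches the threshold.
    keep_cells = counts.sum(axis=1) >= min_umis_per_cell
    return counts[np.ix_(keep_cells, keep_genes)], keep_cells, keep_genes

# 4 cells x 3 genes, with small thresholds so the effect is visible
toy = np.array([
    [5, 0, 1],
    [3, 0, 0],
    [0, 0, 0],
    [4, 1, 2],
])
filtered, keep_cells, keep_genes = filter_counts(
    toy, min_cells_per_gene=2, min_umis_per_cell=3
)
print(filtered.shape)  # (3, 2): one empty cell and one rarely detected gene removed
```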

In [7]:
magic_pipeline.filter_cells_and_genes(min_cells_per_gene = 10, min_umis_per_cell = 500)


Filtering cells and genes...
Before filtering: 72650 cells x 36774 genes
Keeping 27837/36774 genes (75.70%)
Keeping 72650/72650 cells (100.00%)
All cells pass the filter!
After filtering: 72650 cells x 27837 genes


MagicPipeline(
  data = AnnData object with (n_obs × n_vars) = 72650 × 27837,
  filtered = True,
  imputed = False,
  magic_data = None,
  project_dir = 'SimiCExampleRun'
)

## Step 1.4: Normalize Data

Perform library size normalization with `scprep` followed by square root transformation.
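
As a rough illustration of what this step computes, the sketch below rescales every cell to the same total count and then takes the square root. It uses plain NumPy rather than `scprep`; `library_size_sqrt` is a hypothetical helper and the `rescale` target is an assumption, so the values produced by `normalize_data()` may differ.

```python
import numpy as np

def library_size_sqrt(counts, rescale=10_000):
    """Library-size normalize a cells x genes matrix, then sqrt-transform it."""
    counts = np.asarray(counts, dtype=float)
    # Total counts per cell (the "library size").
    lib_size = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for empty cells.
    lib_size[lib_size == 0] = 1.0
    normalized = counts / lib_size * rescale
    return np.sqrt(normalized)

# Two cells with the same composition but a 10-fold difference in depth
toy = np.array([[1, 3], [10, 30]])
out = library_size_sqrt(toy, rescale=100)
print(out)  # both rows are identical after normalization
```

After normalization, cells that differ only in sequencing depth become identical, which is the point of the library-size step.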

In [8]:
magic_pipeline.normalize_data()
# Note: this will override adata.X with the normalized data and remove the adata.raw slot


Normalizing data...
After normalization: 72650 cells x 27837 genes


MagicPipeline(
  data = AnnData object with (n_obs × n_vars) = 72650 × 27837,
  filtered = True,
  imputed = False,
  magic_data = None,
  project_dir = 'SimiCExampleRun'
)

## Step 1.5: Run MAGIC Imputation

Run MAGIC imputation with custom parameters:
- `t`: Number of diffusion steps (default: 'auto')
- `knn`: Number of nearest neighbors (default: 5)
- `decay`: Decay rate for kernel (default: 1)
- `n_jobs`: Number of parallel jobs (default: -2)
- `genes`: Genes to be returned. If None or "all genes", it returns the entire matrix.
- `save_data`: Whether to automatically save the imputed data (default: True). If the `magic_output_file` extension is `.pickle`, the data is saved as a pickle; if it is `.h5ad`, it is saved as an AnnData file.

See [MAGIC documentation](https://magic.readthedocs.io/) for more parameter options.
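
To give an intuition for what `knn` and `t` control, here is a heavily simplified NumPy sketch of diffusion-based smoothing. It is not MAGIC itself: the real method works on PCA-reduced data and uses an adaptive kernel governed by `decay`, whereas this toy version (`toy_diffusion_impute`, a hypothetical helper) uses a binary nearest-neighbor graph on the raw values.

```python
import numpy as np

def toy_diffusion_impute(X, knn=2, t=3):
    """Smooth a cells x genes matrix by diffusing values over a kNN graph."""
    X = np.asarray(X, dtype=float)
    n_cells = X.shape[0]
    # Pairwise Euclidean distances between cells.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Connect each cell to its knn + 1 closest cells (normally itself included).
    nearest = np.argsort(dist, axis=1)[:, : knn + 1]
    affinity = np.zeros((n_cells, n_cells))
    affinity[np.repeat(np.arange(n_cells), knn + 1), nearest.ravel()] = 1.0
    np.fill_diagonal(affinity, 1.0)
    # Symmetrize so that neighborhood relations are mutual.
    affinity = np.maximum(affinity, affinity.T)
    # Row-normalize into a Markov transition matrix.
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    # Diffuse for t steps and apply the operator to the data.
    return np.linalg.matrix_power(markov, t) @ X

# Two groups of cells; the zeros mimic dropout events
X = np.array([
    [1.0, 0.0], [1.0, 1.0], [1.1, 0.9],
    [5.0, 5.0], [5.0, 0.0], [5.2, 4.8],
])
imputed = toy_diffusion_impute(X, knn=2, t=3)
print(imputed.round(2))
```

Larger `t` or `knn` values average over more cells, which removes more noise but can also blur real differences between cell states.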

In [7]:
magic_pipeline.run_magic(
    random_state=123,
    n_jobs=-2,  # Use all but one CPU core
    save_data=True
)


Running MAGIC imputation...
Calculating MAGIC...
  Running MAGIC on 72650 cells and 27837 genes.
  Calculating graph and diffusion operator...
    Calculating PCA...
    Calculated PCA in 193.59 seconds.
    Calculating KNN search...
    Calculated KNN search in 2.77 seconds.
    Calculating affinities...
    Calculated affinities in 6.15 seconds.
  Calculated graph and diffusion operator in 202.65 seconds.
  Running MAGIC with `solver='exact'` on 27837-dimensional data may take a long time. Consider denoising specific genes with `genes=<list-like>` or using `solver='approximate'`.
  Calculating imputation...
  Calculated imputation in 155.87 seconds.
Calculated MAGIC in 371.92 seconds.
MAGIC imputation complete:  72650 cells x 27837 genes

Saving MAGIC-imputed data to SimiCExampleRun/magic_output/magic_imputed.pickle
Saved successfully to SimiCExampleRun/magic_output/magic_imputed.pickle


MagicPipeline(
  data = AnnData object with (n_obs × n_vars) = 72650 × 27837,
  filtered = True,
  imputed = True,
  magic_data = AnnData object with n_obs × n_vars = 72650 × 27837,
  project_dir = 'SimiCExampleRun'
)

## Step 1.6: Check Pipeline Status

In [8]:
experiment.print_project_info(max_depth=2)

├── KPB25L/
│   └── Tumor/
├── inputFiles/
│   ├── TF_list.pickle
│   ├── expression_matrix.pickle
│   └── treatment_annotation.txt
├── magic_output/
│   ├── magic_imputed.h5ad
│   ├── magic_imputed.pickle
│   └── magic_imputed_old.pickle
├── outputSimic/
│   ├── exports/
│   ├── figures/
│   └── matrices/
└── TF_list.pickle


<div class="alert alert-block alert-success">
<b>Success!</b> MAGIC imputation is complete. The imputed data is saved in the magic_output directory.
</div>


# Part 2: Experiment Setup and Gene Selection

Now we will select top variable genes and prepare input files for SimiCPipeline. 
<div class="alert alert-block alert-info">

<em><b>Note:</b> If your data has already been filtered, normalized, and imputed with other methods, you can start from here.</em>
</div>


## Step 2.1: Load Imputed Data

In this example we will start from the imputed AnnData object from the MAGIC pipeline. If you saved and stopped your work, you can re-load the object with the following code:

In [10]:
# If you saved it in h5ad format, you can load it back using:
# import simicpipeline
# imputed_data = simicpipeline.load_from_anndata('path/to/magic_imputed.h5ad')
# If you saved it in pickle format, you can load it back using:
# import pickle
# with open('path/to/magic_imputed.pickle', 'rb') as f:
#     imputed_data = pickle.load(f)
# If you continue from the previous section, you can access the MAGIC-imputed AnnData object directly:
# imputed_data = magic_pipeline.magic_adata

# If you have data imputed with other methods, you can use it here directly. Make sure it is an AnnData object or a pandas DataFrame (cells × genes)

In [4]:
# Load imputed data from pickle file
import pandas as pd
import anndata as ad
import pickle
with open('./SimiCExampleRun/magic_output/magic_imputed.pickle', 'rb') as f:
    imputed_data = pickle.load(f)
print(f"Imputed data shape: {imputed_data.shape}")
print(imputed_data.obs.head())

Imputed data shape: (72650, 27837)
             orig.ident  nCount_RNA  nFeature_RNA                 sample  \
43_01_73__s1  snParseBS     10080.0          3737  KPB25L-UV_Combination   
43_01_92__s1  snParseBS      2913.0          1743  KPB25L-UV_Combination   
43_01_94__s1  snParseBS     15986.0          5283  KPB25L-UV_Combination   
43_02_41__s1  snParseBS      5901.0          2216  KPB25L-UV_Combination   
43_02_56__s1  snParseBS     20456.0          5811  KPB25L-UV_Combination   

             species  gene_count  tscp_count  mread_count  bc1_wind  bc2_wind  \
43_01_73__s1  GRCm39        3737       10080        15158        43         1   
43_01_92__s1  GRCm39        1743        2913         4327        43         1   
43_01_94__s1  GRCm39        5283       15986        23555        43         1   
43_02_41__s1  GRCm39        2216        5901         8759        43         2   
43_02_56__s1  GRCm39        5811       20456        30394        43         2   

              ...  pe

## Step 2.2: Initialize Experiment Setup

Create an experiment setup instance and directories:
- `input_data`: Your imputed AnnData object or pandas DataFrame (cells × genes)
- `tf_path`: Path to transcription factor list file (.csv or .txt)
- `project_dir`: Directory where experiment files will be saved
<div class="alert alert-block alert-info">

<em><b>Note:</b> If you do not have a TF list, we provide a mouse TF list in the package data folder that you can save to your own data directory.
The mouse TF list was downloaded in December 2024 from "https://guolab.wchscu.cn/AnimalTFDB4_static/download/TF_list_final/Mus_musculus_TF"
</em>
</div>
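
Only TFs that are actually present in your expression matrix can be used later on. The sketch below shows one way to read a headerless TF file and keep the symbols found among the gene names. `read_tf_list` and `tfs_present` are hypothetical helpers written for illustration; whether `ExperimentSetup` performs the same check internally is an assumption.

```python
import os
import tempfile

def read_tf_list(path):
    """Read a headerless file with one TF symbol per line."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]

def tfs_present(tf_names, gene_names):
    """Keep TFs found among the gene names, preserving order and dropping repeats."""
    genes = set(gene_names)
    return list(dict.fromkeys(tf for tf in tf_names if tf in genes))

# Write a small example file, then read it back
with tempfile.TemporaryDirectory() as tmp:
    tf_file = os.path.join(tmp, "TF_list.csv")
    with open(tf_file, "w") as fh:
        fh.write("Zeb1\nHmga2\nNotInData\nZeb1\n")
    tf_names = read_tf_list(tf_file)

kept = tfs_present(tf_names, ["Xist", "Hmga2", "Zeb1"])
print(kept)  # ['Zeb1', 'Hmga2']
```

The comparison is case-sensitive, so make sure the TF symbols follow the same naming convention as the genes in your matrix (for example, mouse symbols such as `Zeb1` rather than `ZEB1`).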

In [10]:
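import os
import pandas as pd
from importlib.resources import files
# Locate the mouse TF table bundled with simicpipeline
p2tf = files("simicpipeline.data").joinpath("Mus_musculus_TF.txt")
mouse_TF_df = pd.read_csv(p2tf, sep='\t')
mouse_TF = mouse_TF_df['Symbol']
# Write the gene symbols one per line, without header or index
os.makedirs('./data', exist_ok=True)
mouse_TF.to_csv('./data/TF_list.csv', index=False, header=False)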

In [5]:
# Initialize ExperimentSetup
from simicpipeline import ExperimentSetup
experiment = ExperimentSetup(
    input_data = imputed_data, 
    tf_path = "./data/TF_list.csv", # Should have no header
    project_dir='./SimiCExampleRun'
)

print(f"Matrix shape: {experiment.matrix.shape}")
print(f"Number of cells: {len(experiment.cell_names)}")
print(f"Number of genes: {len(experiment.gene_names)}")
print(f"Number of TFs: {len(experiment.tf_list)}")
print(f"... Example TF names: {experiment.tf_list[0:5]}")

Matrix shape: (72650, 27837)
Number of cells: 72650
Number of genes: 27837
Number of TFs: 1611
... Example TF names: ['Lin28b', 'Tbx2', 'Dmtf1', 'Irx4', 'Irf3']


In [3]:
experiment.print_project_info(max_depth=2)

├── KPB25L/
│   └── Tumor/
├── inputFiles/
│   ├── TF_list.pickle
│   ├── expression_matrix.pickle
│   └── treatment_annotation.txt
├── magic_output/
│   ├── magic_imputed.h5ad
│   ├── magic_imputed.pickle
│   └── magic_imputed_old.pickle
├── outputSimic/
│   ├── exports/
│   ├── figures/
│   └── matrices/
└── TF_list.pickle


The previous code automatically creates the SimiC directory structure:
```
project_dir/
├── inputFiles/       # Input files for SimiC
└── outputSimic/
    ├── figures/      # For future visualizations
    └── matrices/     # For future results
```

## Step 2.3: Calculate MAD and Select Genes

Select top variable genes based on Median Absolute Deviation (MAD):
- `n_tfs`: Number of top TF genes to select (default: 100)
- `n_targets`: Number of top target genes to select (default: 1000)

Returns a tuple `(TF_list, TARGET_list)`.
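
For reference, the MAD of a gene is the median of the absolute deviations of its values from their median. The NumPy sketch below computes it per gene and ranks genes by it. `mad_per_gene` and `top_genes_by_mad` are hypothetical helpers, not the `calculate_mad_genes` implementation, which may differ in details such as how TFs and targets are ranked.

```python
import numpy as np

def mad_per_gene(matrix):
    """Median absolute deviation of each column of a cells x genes matrix."""
    matrix = np.asarray(matrix, dtype=float)
    med = np.median(matrix, axis=0)
    return np.median(np.abs(matrix - med), axis=0)

def top_genes_by_mad(matrix, gene_names, n_top):
    """Return up to n_top gene names ordered by decreasing MAD, skipping MAD = 0."""
    mad = mad_per_gene(matrix)
    order = np.argsort(-mad, kind="stable")
    return [gene_names[i] for i in order[:n_top] if mad[i] > 0]

# 5 cells x 3 genes: A is constant, B varies a little, C varies a lot
toy = np.array([
    [1, 1, 0],
    [1, 2, 0],
    [1, 3, 10],
    [1, 4, 20],
    [1, 5, 30],
])
mad = mad_per_gene(toy)
top = top_genes_by_mad(toy, ["A", "B", "C"], n_top=3)
print(mad, top)  # [ 0.  1. 10.] ['C', 'B']
```

Because it relies on medians, MAD is less sensitive to a few extreme cells than the variance.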

In [4]:
tf_list, target_list = experiment.calculate_mad_genes(
    n_tfs=100,
    n_targets=1000
)

print(f"Selected {len(tf_list)} TFs")
print(f"Selected {len(target_list)} targets")
print(f"\nTop 10 TFs: {tf_list[:10]}")
print(f"\nTop 10 targets: {target_list[:10]}")

Removing 0 targets with MAD = 0
Selecting top 1000 targets based on MAD.
Selected 100 TFs
Selected 1000 targets

Top 10 TFs: ['Hmga2', 'Bnc2', 'Zfpm2', 'Satb2', 'Mecom', 'Zeb1', 'Glis3', 'Zeb2', 'Ebf1', 'Grhl2']

Top 10 targets: ['Xist', 'Brinp3', 'Tenm4', '4930467D21Rik', 'Igf2bp2', 'Rad51b', 'Nop58', 'Gm49890', 'Pip5k1b', 'Ccbe1']


## Step 2.4: Subset Data to Selected Genes

Create a subset of your data containing only the selected TFs and targets.

In [5]:
# Combine TF and target lists
selected_genes = tf_list + target_list

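# Subset the data (genes are columns in both supported formats)
if isinstance(imputed_data, ad.AnnData):
    subset_data = imputed_data[:, selected_genes].copy()
elif isinstance(imputed_data, pd.DataFrame):
    subset_data = imputed_data[selected_genes].copy()
else:
    raise TypeError("imputed_data must be an AnnData object or a pandas DataFrame")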

print(f"Subset data shape: {subset_data.shape}")

Subset data shape: (72650, 1100)


## Step 2.5: Save Experiment Files

Save the expression matrix and TF names in `.pickle` format and, optionally, the annotation file as `.txt`:
- `run_data`: `ad.AnnData` or `pd.DataFrame` with the data to run in SimiC (imputed and sliced according to the experiment run)
- `matrix_filename`: Filename to save `run_data` (saved with row/column headers)
- `tf_filename`: Filename for the list of TF names for the experiment run
- `annotation`: `str` (optional). If `run_data` is `ad.AnnData` and `annotation` is in `run_data.obs.columns`, a `.txt` file with the phenotype annotations needed for SimiC is created, written with `index = False`, `header = False`. If the column is not found, the available columns are listed and you are asked to provide the annotation file manually.

All files are saved in the `inputFiles/` directory.

In [6]:
experiment.save_experiment_files(
    run_data = subset_data,
    matrix_filename = 'expression_matrix.pickle',
    tf_filename = 'TF_list.pickle',
    annotation = 'groups'
)


Saved expression matrix to SimiCExampleRun/inputFiles/expression_matrix.pickle

Saved 100 TFs to SimiCExampleRun/inputFiles/TF_list.pickle


Available columns:
 ['orig.ident', 'nCount_RNA', 'nFeature_RNA', 'sample', 'species', 'gene_count', 'tscp_count', 'mread_count', 'bc1_wind', 'bc2_wind', 'bc3_wind', 'bc1_well', 'bc2_well', 'bc3_well', 'treatment', 'cellLine', 'percent.mt', 'percent.rb', 'percent.hb', 'integrated_snn_res.0.095', 'seurat_clusters', 'sctype_custom', 'sctype_functional', 'sctype_SCsubtype', 'final_annotation', 'final_annotation_functional']
Please manually provide an appropriate annotation file to SimiCPipeline in SimiCExampleRun/inputFiles

-------

Experiment files saved successfully.

-------



In [7]:
experiment.save_experiment_files(
    run_data = subset_data,
    matrix_filename = 'expression_matrix.pickle',
    tf_filename = 'TF_list.pickle',
    annotation = 'treatment'
)


Saved expression matrix to SimiCExampleRun/inputFiles/expression_matrix.pickle

Saved 100 TFs to SimiCExampleRun/inputFiles/TF_list.pickle

-------

Annotation 'treatment' found in obs columns!

Annotation distribution: {0: 17259, 1: 20491, 2: 15426, 3: 19474}

Saved annotation to SimiCExampleRun/inputFiles/treatment_annotation.txt

-------

Experiment files saved successfully.

-------



<div class="alert alert-block alert-success">
<b>Success!</b> All preprocessing steps completed. Your files are ready for SimiC analysis.
</div>


## Step 2.6: Verify Saved Files

Check that all files were created correctly.

In [12]:
experiment.print_project_info(max_depth=2)

├── KPB25L/
│   └── Tumor/
├── inputFiles/
│   ├── TF_list.pickle
│   ├── expression_matrix.pickle
│   └── treatment_annotation.txt
├── magic_output/
│   ├── magic_imputed.h5ad
│   ├── magic_imputed.pickle
│   └── magic_imputed_old.pickle
├── outputSimic/
│   ├── exports/
│   ├── figures/
│   └── matrices/
└── TF_list.pickle


# Summary

This tutorial covered:

✓ Loading and filtering scRNA-seq data

✓ Running MAGIC imputation

✓ Selecting top variable genes using MAD

✓ Preparing input files for SimiC analysis with proper directory structure

### Output Directory Structure

Your output directory now contains:
```
SimiCExampleRun/
├── magic_output/
│   └── magic_imputed.pickle
├── inputFiles/
│   ├── expression_matrix.pickle
│   ├── TF_list.pickle
│   └── treatment_annotation.txt
└── outputSimic/
    ├── figures/
    └── matrices/
```


# Alternative Approach

In this tutorial we used the whole MAGIC-imputed matrix (obtained in [Part 1](#part1)) and selected the top MAD genes, but you may want to run SimiC on a subset of cells from your data. Make sure you slice the data before you initialize the `ExperimentSetup` class so that MAD genes are calculated over your cells of interest.

All the steps above are optional but recommended. Before running SimiCPipeline, just:
1. Make sure to **prepare cell assignments** if not done before: create a file with cell phenotype labels in the same order as the expression matrix that you will use in SimiC.
2. Prepare **expression matrix** pickle file
3. Prepare **TF_list** pickle with TF genes in your expression matrix.
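
If you need to create the cell assignment file by hand, the sketch below writes one label per line, without header or index, in the same order as the cells. `write_annotation` is a hypothetical helper, and encoding the labels as integers is an assumption based on the annotation distribution printed in Step 2.5; check the SimiCPipeline tutorial for the exact format it expects.

```python
import os
import tempfile

def write_annotation(labels, path):
    """Write one integer phenotype code per line, following the given cell order."""
    # Codes are assigned in sorted label order so the mapping is reproducible.
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    with open(path, "w") as fh:
        for label in labels:
            fh.write(f"{codes[label]}\n")
    return codes

# Labels must be listed in the same order as the rows of the expression matrix
labels = ["control", "treated", "control", "treated", "treated"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "treatment_annotation.txt")
    codes = write_annotation(labels, path)
    with open(path) as fh:
        written = fh.read().split()

print(codes)    # {'control': 0, 'treated': 1}
print(written)  # ['0', '1', '0', '1', '1']
```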


In [15]:
imputed_data.obs

Unnamed: 0,orig.ident,nCount_RNA,nFeature_RNA,sample,species,gene_count,tscp_count,mread_count,bc1_wind,bc2_wind,...,percent.mt,percent.rb,percent.hb,integrated_snn_res.0.095,seurat_clusters,sctype_custom,sctype_functional,sctype_SCsubtype,final_annotation,final_annotation_functional
43_01_73__s1,snParseBS,10080.0,3737,KPB25L-UV_Combination,GRCm39,3737,10080,15158,43,1,...,0.079365,0.327381,0.009921,0,0,Unknown,Basal-like,Basal-like,Cancer cells,Basal-like
43_01_92__s1,snParseBS,2913.0,1743,KPB25L-UV_Combination,GRCm39,1743,2913,4327,43,1,...,0.034329,0.137315,0.000000,1,0,Cancer cells,Proliferating cells,Proliferating cells,Cancer cells,Proliferating cells
43_01_94__s1,snParseBS,15986.0,5283,KPB25L-UV_Combination,GRCm39,5283,15986,23555,43,1,...,0.150131,0.337796,0.006255,1,0,Cancer cells,Proliferating cells,Proliferating cells,Cancer cells,Proliferating cells
43_02_41__s1,snParseBS,5901.0,2216,KPB25L-UV_Combination,GRCm39,2216,5901,8759,43,2,...,0.016946,0.169463,0.067785,4,3,Endothelial cells,Endothelial cells,Endothelial cells,Endothelial cells,Endothelial cells
43_02_56__s1,snParseBS,20456.0,5811,KPB25L-UV_Combination,GRCm39,5811,20456,30394,43,2,...,0.048885,0.386195,0.019554,0,0,Unknown,Basal-like,Basal-like,Cancer cells,Basal-like
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
06_92_89__s8,snParseBS,6205.0,2592,KPB25L_control,GRCm39,2592,6205,9053,6,92,...,0.000000,0.112812,0.032232,0,0,Unknown,Basal-like,Basal-like,Cancer cells,Basal-like
06_94_61__s8,snParseBS,2085.0,1383,KPB25L_control,GRCm39,1383,2085,3132,6,94,...,0.000000,0.143885,0.047962,2,1,Macrophages,Macrophages,Macrophages,Macrophages,Macrophages
06_96_48__s8,snParseBS,6017.0,2632,KPB25L_control,GRCm39,2632,6017,8845,6,96,...,0.066478,0.149576,0.000000,6,0,Unknown,Unknown,LumA_SC,Cancer cells,Unknown
06_96_58__s8,snParseBS,4153.0,2102,KPB25L_control,GRCm39,2102,4153,6128,6,96,...,0.240790,0.216711,0.072237,2,1,Macrophages,Macrophages,Macrophages,Macrophages,Macrophages


In [None]:
from simicpipeline import ExperimentSetup

# Keep only proliferating or basal-like cells from the KPB25L cell line
cell_mask = (
    imputed_data.obs['cellLine'].isin(["KPB25L"])
    & imputed_data.obs['final_annotation_functional'].isin(['Proliferating cells', 'Basal-like'])
)
print(f"Cells selected: {cell_mask.sum()}")
subset_imputed_data = imputed_data[cell_mask, :].copy()
print(f"Subset shape: {subset_imputed_data.shape}")

experiment = ExperimentSetup(
    input_data = subset_imputed_data, 
    tf_path = "./data/TF_list.csv", # Should have no header
    project_dir='./SimiCExampleRun/KPB25L/Tumor'
)
# Then follow the same steps as above to select genes and save experiment files

The output directory will then look like this:
```
SimiCExampleRun/
├── magic_output/
│   └── magic_imputed.pickle
├── inputFiles/
│   ├── expression_matrix.pickle
│   ├── TF_list.pickle
│   └── treatment_annotation.txt
├── outputSimic/
│   ├── figures/
│   └── matrices/
└── KPB25L/
    └── Tumor/
        ├── inputFiles/
        │   ├── expression_matrix.pickle
        │   ├── TF_list.pickle
        │   └── treatment_annotation.txt
        └── outputSimic/
            ├── figures/
            └── matrices/
```


# Next steps:
1. **Run SimiC**: Use `SimiCPipeline` class to run SimiC.
2. **Explore results**: Use the `SimicVisualization` class to analyze GRNs and TF activities.

Check `Tutorial_SimiCPipeline_simple.ipynb` or `Tutorial_SimiCPipeline_full` for more info.


### Final Notes

<div class="alert alert-block alert-info">
<b>Data Format:</b> All matrices are stored as cells × genes (rows = cells, columns = genes)
</div>

<div class="alert alert-block alert-warning">
<b>Memory Usage:</b> MAGIC imputation can be memory-intensive for large datasets. Consider using a machine with sufficient RAM and adjusting the MAGIC parameters (n_jobs, knn, t).
</div>
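
For a quick back-of-the-envelope estimate, the sketch below computes the size of a dense matrix from its dimensions. It assumes the imputed matrix is stored densely with 8 bytes per value (float64); `dense_matrix_gib` is a hypothetical helper, and the actual peak memory used during imputation will be higher because of intermediate objects.

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value=8):
    """Approximate size in GiB of a dense cells x genes matrix."""
    return n_cells * n_genes * bytes_per_value / 1024**3

# Dimensions of the filtered dataset used in this tutorial
full = dense_matrix_gib(72650, 27837)
# Same cells, restricted to the 1100 selected genes
subset = dense_matrix_gib(72650, 1100)
print(f"All genes: {full:.1f} GiB, selected genes: {subset:.1f} GiB")
# All genes: 15.1 GiB, selected genes: 0.6 GiB
```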

<div class="alert alert-block alert-info">
<b>Please note:</b> Although you can pass custom file and directory paths, we highly recommend following the directory structure described above and completing this tutorial before running SimiC to avoid errors.
</div>