# **Package Usage Guide**

## **1. Introduction**
Welcome to the **SINanalyzer** usage guide! 

This package constructs personalized (sample-specific) biological networks, modeling gene-gene interactions for individual samples.

You can restrict construction to a selected set of genes to build targeted, individualized networks, and choose among several construction methods (for example, **SiNE** and **SSN**) to suit your research needs.

This notebook walks through installation, setup, and worked examples to get you started.


## **2. Installation** 
1. Clone this repository and move into its directory:

```bash
git clone https://github.com/TashaTu/SINanalyzer.git
cd SINanalyzer
```

2. Create and activate a virtual environment (optional but recommended):

```bash
python3 -m venv env          # on Windows, use `python` instead of `python3` if needed
source env/bin/activate      # macOS/Linux
# OR
env\Scripts\activate         # Windows (CMD)
# OR
.\env\Scripts\activate       # Windows (PowerShell)
```

3. Install the package from the repository root:

```bash
pip install .
```
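You can confirm that the package is importable from the active environment with a quick check like the one below. This is a minimal sketch: `is_installed` is a hypothetical helper, and the check assumes the import name `SINanalyzer` used throughout this guide.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found by the import system."""
    # find_spec returns None (rather than raising) for a missing top-level module
    return importlib.util.find_spec(package_name) is not None


print("SINanalyzer installed:", is_installed("SINanalyzer"))
```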

## **3. Quick Start**
Let's walk through a simple example to explore the package's basic functionality.

First, prepare a **gene expression matrix (GEM)**. The matrix should have genes as rows and samples as columns, with each cell holding the expression level of a given gene in a given sample.

### **Example Gene Expression Matrix (GEM)**
To illustrate, the GEM might look something like this:

| Gene/Sample | Sample1 | Sample2 | Sample3 | ... |
|-------------|---------|---------|---------|-----|
| Gene1       | 5.3     | 6.7     | 4.2     | ... |
| Gene2       | 8.1     | 7.4     | 6.0     | ... |
| Gene3       | 3.4     | 5.1     | 4.8     | ... |
| ...         | ...     | ...     | ...     | ... |
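The same toy matrix can be built directly as a `pandas` DataFrame, which makes the expected orientation explicit. This is a sketch: the gene names, sample names and values are the illustrative ones from the table. Naming the index `gene` simply mirrors the header of the example file used in this guide.

```python
import pandas as pd

# Toy GEM mirroring the table above: genes as rows (index), samples as columns
gem_example = pd.DataFrame(
    {
        "Sample1": [5.3, 8.1, 3.4],
        "Sample2": [6.7, 7.4, 5.1],
        "Sample3": [4.2, 6.0, 4.8],
    },
    index=pd.Index(["Gene1", "Gene2", "Gene3"], name="gene"),
)

print(gem_example)
print(gem_example.loc["Gene2", "Sample1"])  # expression of Gene2 in Sample1
```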

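**Note**: The GEM is the basis for every personalized network, so check its layout before continuing. Gene identifiers should be in the first column (used as the row index), sample IDs in the header row, and numeric expression values in every cell.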
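A few formatting problems are easy to catch programmatically before building networks. The sketch below uses a hypothetical `check_gem` helper, which is not part of SINanalyzer. It flags:

- non-numeric sample columns,
- duplicate gene IDs,
- missing values,
- genes whose expression is constant across samples.

Constant genes matter for correlation-based analyses because their Pearson correlation is undefined. NumPy's `corrcoef`, for example, emits a `RuntimeWarning` for them.

```python
import numpy as np
import pandas as pd


def check_gem(gem: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in a genes x samples GEM (empty list if none)."""
    problems = []

    # Every sample column should hold numeric expression values
    non_numeric = [c for c in gem.columns if not pd.api.types.is_numeric_dtype(gem[c])]
    if non_numeric:
        problems.append(f"non-numeric sample columns: {non_numeric}")

    # Gene identifiers (row index) should be unique
    if gem.index.duplicated().any():
        problems.append("duplicate gene identifiers in the index")

    if gem.isna().any().any():
        problems.append("missing expression values")

    # Genes with a single distinct value across samples have zero variance
    numeric = gem.select_dtypes(include=np.number)
    if numeric.shape[1] > 1:
        constant = numeric.index[numeric.nunique(axis=1) <= 1].tolist()
        if constant:
            problems.append(f"constant (zero-variance) genes: {constant}")

    return problems
```

For example, `check_gem(gem)` on a loaded matrix returns an empty list when none of these issues are present.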

### **Loading the GEM into the Package**
To start, load your GEM into a `pandas` DataFrame. If the file is tab-delimited (TXT or TSV), `pandas.read_csv` with `sep='\t'` handles it:


```python
import pandas as pd

# Load the GEM from a tab-separated file: first column = gene IDs, first row = sample IDs
gem = pd.read_csv('tests/KICH_all.txt', sep='\t', header=0, index_col=0)
print(gem.head())  # Check the first few rows to confirm the data structure
```

```text
           TCGA-KN-8435-01A-11R-2315-07  TCGA-KL-8327-01A-11R-2315-07  \
gene                                                                    
100130426                      0.440846                      0.958583   
100133144                      4.972647                      1.937420   
100134869                      4.817618                      0.000000   
10357                          6.026074                      5.584963   
10431                          9.369173                     10.098405   

           TCGA-KN-8434-01A-11R-2315-07  TCGA-KL-8337-01A-11R-2315-07  \
gene                                                                    
100130426                      0.510050                      0.836247   
100133144                      2.840161                      1.534609   
100134869                      3.311140                      2.473267   
10357                          5.461057                      5.816039   
10431                          9.393508           
```
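To see what `header=0` and `index_col=0` do, here is the same call on a tiny in-memory file. This is a sketch, and the gene and sample names are made up. The first row becomes the column labels (samples), and the first column becomes the row index (genes):

```python
import io

import pandas as pd

# A tiny tab-separated GEM: first column = gene IDs, header row = sample IDs
tsv_text = "gene\tS1\tS2\nG1\t5.3\t6.7\nG2\t8.1\t7.4\n"

gem_small = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=0, index_col=0)
print(gem_small)
print(gem_small.shape)  # (number of genes, number of samples)
```

If your file stores samples as rows instead, transposing with `gem.T` after loading gives the genes-by-samples orientation described above.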

### **Running SiNE to Construct a Personalized Network**
Now that the gene expression matrix (GEM) is loaded, we can construct personalized networks with the **SiNE** method.
The following example shows how to configure and run SiNE with the appropriate parameters.

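**Note**: If `output_format` is not specified, networks are saved in `npz` format by default. A `Gene_index_mapping_table.txt` file is also written to the output directory. It serves as a reference for mapping gene indices in the network files to gene identifiers, and it is passed as `gene_label` when visualizing a network.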

```python
from SINanalyzer import integrated_network_construction
import pandas as pd

# Load the gene expression matrix (update this path to the location of your GEM file)
gem_path = 'tests/KICH_all.txt'
gem = pd.read_csv(gem_path, sep='\t', header=0, index_col=0)

# Optional: restrict the network to a subset of genes (or set gene_set=None to include all genes)
gene_df = pd.read_csv('tests/gene_set_100.txt', sep='\t', header=0, index_col=0)
gene_set = list(gene_df.index)

# Samples to construct personalized networks for
sample_df = pd.read_csv('tests/sample_list_10.txt', sep='\t', header=0, index_col=0)
sample_list = list(sample_df.index)

outdir = 'tests/SiNE_SIN'  # Output directory for results

# Run SiNE with the specified parameters
integrated_network_construction.network_construction(
    case_data=gem,
    method="SiNE",
    gene_set=gene_set,
    sample_list=sample_list,
    outdir=outdir,
    output_format="npz",
)
```

```text
Step 1 : weight calculation
-> Sample weight calculation ... Done
Step 2 : mean calculation
-> Data preprocessing ... Done
-> Aggregate network construction ... Done
-> Mean of edge scores calculation ... Done
-> Mean = 0.03033256360451698
Step 3 : standard calculation
-> Data preprocessing ... Done
-> Aggregate network construction ... Done
-> Standard of edge scores calculation ... Done
-> Standard = 0.26139335457714413
Step 4 : network construction
-> Data preprocessing ... Done
-> Gene index mapping table ... Done
-> Aggregate network construction ... Done
-> Single-sample network construction ... Done
```
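This guide doesn't spell out the internal layout of the `.npz` network files. Before loading one for downstream analysis, you can list the arrays it contains. The helper below is a sketch that uses only NumPy, and `describe_npz` is a hypothetical name. If the keys turn out to be `data`, `indices`, `indptr`, `format` and `shape`, the file was most likely written by `scipy.sparse.save_npz` and can be read back with `scipy.sparse.load_npz`.

```python
import numpy as np


def describe_npz(path: str) -> dict:
    """Map each array name stored in a .npz archive to that array's shape."""
    # allow_pickle=False avoids executing pickled objects from untrusted files
    with np.load(path, allow_pickle=False) as archive:
        return {name: archive[name].shape for name in archive.files}


# Example (path taken from the SiNE output directory used in this guide):
# print(describe_npz("tests/SiNE_SIN/TCGA-KN-8436-01A-11R-2315-07.npz"))
```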


### **Running SSN to Construct a Personalized Network**
The **SSN** method needs two gene expression matrices: one for the **case** group and one for the **control** group. In this example, the KICH tumor samples serve as the case data and the normal samples as the control data.

This time, we write the results in `edge_list_zscore` format.
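Because SSN draws on both matrices, the selected genes should be present in each of them. Here is a minimal sanity check to run before the method. `missing_ids` is a hypothetical helper, not part of SINanalyzer.

```python
import pandas as pd


def missing_ids(gene_set, case: pd.DataFrame, control: pd.DataFrame) -> dict:
    """Report requested genes that are absent from the case or the control GEM."""
    requested = set(gene_set)
    return {
        "missing_in_case": sorted(requested - set(case.index)),
        "missing_in_control": sorted(requested - set(control.index)),
    }


# Example: missing_ids(gene_set, gem_case, gem_control) should return two empty lists
```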

```python
from SINanalyzer import integrated_network_construction
import pandas as pd

# Load the case and control GEMs (update these paths to the locations of your files)
gem_case_path = 'tests/KICH_tumor.txt'
gem_case = pd.read_csv(gem_case_path, sep='\t', header=0, index_col=0)
gem_control_path = 'tests/KICH_normal.txt'
gem_control = pd.read_csv(gem_control_path, sep='\t', header=0, index_col=0)

# Optional: restrict the network to a subset of genes (or set gene_set=None to include all genes)
gene_df = pd.read_csv('tests/gene_set_100.txt', sep='\t', header=0, index_col=0)
gene_set = list(gene_df.index)

# Samples to construct personalized networks for
sample_df = pd.read_csv('tests/sample_list_10.txt', sep='\t', header=0, index_col=0)
sample_list = list(sample_df.index)

outdir = 'tests/SSN_SIN'  # Output directory for results

# Run SSN with the specified parameters
integrated_network_construction.network_construction(
    case_data=gem_case,
    control_data=gem_control,
    method="SSN",
    gene_set=gene_set,
    sample_list=sample_list,
    outdir=outdir,
    output_format="edge_list_zscore",
)
```

```text
-> Data preprocessing ... Done
-> Aggregate network construction ... Done
-> Single-sample network construction ... Done
```
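For intuition about what an SSN z-score represents, here is a NumPy sketch of the sample-specific network formulation published by Liu et al. (2016, *Nucleic Acids Research*). The steps are:

1. Build a reference network from the Pearson correlations (PCC) over the n reference samples.
2. Add the single case sample and recompute the correlations.
3. Scale each edge's change ΔPCC to a z-score, Z = ΔPCC / ((1 - PCC_n²) / (n - 1)).

This follows the published formulation and is not necessarily SINanalyzer's exact implementation. `ssn_zscores` is a hypothetical name, and the data below is random toy data.

```python
import numpy as np


def ssn_zscores(reference: np.ndarray, sample: np.ndarray) -> np.ndarray:
    """Sketch of SSN edge z-scores for one sample.

    reference: genes x n matrix of reference (e.g. control) expression values
    sample:    length-genes vector with one case sample's expression values
    Returns a genes x genes matrix of z-scores for the change in Pearson correlation.
    """
    n = reference.shape[1]
    # Reference network: gene-gene Pearson correlation over the n reference samples
    pcc_ref = np.corrcoef(reference)
    # Perturbed network: the same correlation after appending the single sample (n + 1 samples)
    pcc_pert = np.corrcoef(np.column_stack([reference, sample]))
    delta_pcc = pcc_pert - pcc_ref
    # Scale used in the SSN formulation: (1 - PCC_n^2) / (n - 1)
    sigma = (1.0 - pcc_ref ** 2) / (n - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = delta_pcc / sigma
    np.fill_diagonal(z, 0.0)  # self-edges are not meaningful (0 / 0 on the diagonal)
    return z


rng = np.random.default_rng(0)
control = rng.normal(size=(5, 20))  # 5 genes x 20 reference samples (toy data)
case_sample = rng.normal(size=5)    # one case sample
z = ssn_zscores(control, case_sample)
print(z.round(2))
```

A gene that is constant across the reference samples produces NaN entries here, and NumPy's `corrcoef` emits a `RuntimeWarning` in that case.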


### **Visualizing a Personalized Network**
With the personalized networks built, we can now visualize one of them. This example renders a SiNE network from the earlier step as an interactive HTML graph using **Pyvis**, a Python library for interactive network visualization.


```python
from SINanalyzer import network_visualization

# Paths to one sample's SiNE network, the gene index mapping table, and the HTML output
file_path = 'tests/SiNE_SIN/TCGA-KN-8436-01A-11R-2315-07.npz'
gene_label = 'tests/SiNE_SIN/Gene_index_mapping_table.txt'
save_file_path = 'tests/SiNE_SIN/TCGA-KN-8436-01A-11R-2315-07.html'

# Render the network and save it as an interactive HTML page
network_visualization.graph_plot(file_path, gene_label=gene_label, save_file_path=save_file_path, input_format="npz")
```

```text
The network graph has been saved to tests/SiNE_SIN/TCGA-KN-8436-01A-11R-2315-07.html
```
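Dense networks can look cluttered when rendered. If you have a network as a dense, symmetric NumPy adjacency matrix plus a matching list of gene labels, you can keep only the strongest edges for inspection or plotting. This is a sketch: `top_edges` is a hypothetical helper, not part of SINanalyzer. It assumes row i of the matrix corresponds to `labels[i]`; check the `.npz` layout and the mapping table before relying on that.

```python
import numpy as np
import pandas as pd


def top_edges(adj: np.ndarray, labels, k: int = 10) -> pd.DataFrame:
    """Return the k strongest edges (by absolute weight) of a symmetric adjacency matrix."""
    rows, cols = np.triu_indices_from(adj, k=1)  # upper triangle only, no self-loops
    weights = adj[rows, cols]
    order = np.argsort(-np.abs(weights))[:k]  # indices of the largest |weight| first
    return pd.DataFrame(
        {
            "source": [labels[i] for i in rows[order]],
            "target": [labels[j] for j in cols[order]],
            "weight": weights[order],
        }
    )
```

For example, `top_edges(adj, gene_names, k=20)` returns a three-column table (source, target, weight) that you could inspect directly or pass to a plotting tool.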
