# Step-by-step tutorial for using ISN-tractor

## Installation

You can install the ISN-tractor library using either of the following methods:

__PyPI Package Installation:__

In [1]:
pip install isn-tractor



__GitHub Repository Installation:__

pip install git+https://github.com/GiadaLalli/ISN-tractor

## Import libraries

First, import the ISN-tractor module to run the tool:

In [2]:
import isn_tractor.ibisn as it

Additionally, import standard libraries to handle data:

In [3]:
import pandas as pd
import numpy as np

Now that you have installed the library and imported the necessary modules, you can proceed with using ISN-tractor in your Python environment.

## Data Formats: generating mock data

This section is dedicated to _understanding and generating mock data_. For each data type, there will be a brief explanation followed by practical examples demonstrating how to create and manipulate mock data in Python. 

### Introduction to the ```mock_mapped``` Function
The ```mock_mapped``` function generates synthetic gene expression-like data, that is, continuous values of the kind typically used in genomic studies. It is useful for creating mock datasets for testing and experimentation. The term "mapped" in the function name indicates that each column is already labeled with a specific feature (gene), as in real-world gene expression data.

#### Why "Mapped" Data?
In genomics, different types of data are used to study genetic variations and gene expression. Two common types are:

1. __Gene Expression Data__: Continuous data representing the expression levels of genes. This data is often already mapped to specific genes or genomic features.
2. __Genotype Array Data__ (SNPs): Discrete data representing single nucleotide polymorphisms (SNPs). This data often needs to be mapped to their respective genes or genomic regions to be meaningful for downstream analysis.

The ```mock_mapped``` function generates data that is labeled and structured like gene expression data, so it can be used in genomic analyses without an additional mapping step.

#### Function Definition
Here is the definition of the ```mock_mapped``` function:

In [4]:
def mock_mapped(size: tuple[int, int]) -> pd.DataFrame:

    """
    Generates a DataFrame with continuous gene expression-like data.

    Parameters
    ----------
    size : tuple[int, int]
        A tuple specifying the number of rows (samples) and columns (features).

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the synthetic gene expression-like data.
    """
    
    rows, cols = size
    data = np.random.uniform(0, 100, size=size)
    col_names = [f"mapped_feature_{i}" for i in range(cols)]
    index_names = [f"sample_{i}" for i in range(rows)]
    return pd.DataFrame(data, index=index_names, columns=col_names)

#### How It Works
1. Parameters:

- ```size (tuple[int, int])```: This parameter specifies the dimensions of the DataFrame. It takes a tuple where the first element is the number of rows (samples) and the second element is the number of columns (features).

2. Data Generation:

- The function uses ```np.random.uniform(0, 100, size=size)``` to generate random continuous values uniformly distributed between 0 and 100. These values simulate gene expression levels.

3. Column and Row Names:

- ```col_names```: The column names are generated in the format "mapped_feature_i", where i is the column index. This mimics the naming convention for features in gene expression data.
- ```index_names```: The row names are generated in the format "sample_i", where i is the row index. This represents different samples in the dataset.

4. Return Value:

- The function returns a ```pd.DataFrame``` containing the generated data, with the specified row and column names.
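
Because ```np.random.uniform``` draws from NumPy's global random state, two calls to ```mock_mapped``` return different values. As a minimal sketch, assuming reproducible output is wanted, the variant below takes a seed. The name ```mock_mapped_seeded``` and its ```seed``` parameter are hypothetical, introduced here for illustration only; they are not part of the tutorial or of ISN-tractor.

```python
import numpy as np
import pandas as pd


def mock_mapped_seeded(size: tuple[int, int], seed: int = 0) -> pd.DataFrame:
    """Hypothetical seeded variant of mock_mapped (illustration only)."""
    rows, cols = size
    rng = np.random.default_rng(seed)  # local generator, global state untouched
    data = rng.uniform(0, 100, size=size)  # continuous values in [0, 100)
    col_names = [f"mapped_feature_{i}" for i in range(cols)]
    index_names = [f"sample_{i}" for i in range(rows)]
    return pd.DataFrame(data, index=index_names, columns=col_names)


first = mock_mapped_seeded((4, 3), seed=42)
second = mock_mapped_seeded((4, 3), seed=42)
print(first.equals(second))  # same seed, same values
```

Using a local generator rather than ```np.random.seed``` keeps the seeding from affecting other cells in the notebook.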

#### Example Usage
Here is an example of how to use the ```mock_mapped``` function to generate, save, and display a DataFrame of gene expression-like data:

In [5]:
# Example usage:
if __name__ == "__main__":
    # Define the size of the DataFrame (rows, columns)
    size = (100, 50)  # 100 samples and 50 features

    # Generate the DataFrame
    df_mapped = mock_mapped(size)

    # Save the DataFrame to a CSV file
    df_mapped.to_csv('1_df_mapped.csv', index=True)

In [6]:
# Display the first 5 rows of the DataFrame
df_mapped.head()

Unnamed: 0,mapped_feature_0,mapped_feature_1,mapped_feature_2,mapped_feature_3,mapped_feature_4,mapped_feature_5,mapped_feature_6,mapped_feature_7,mapped_feature_8,mapped_feature_9,...,mapped_feature_40,mapped_feature_41,mapped_feature_42,mapped_feature_43,mapped_feature_44,mapped_feature_45,mapped_feature_46,mapped_feature_47,mapped_feature_48,mapped_feature_49
sample_0,34.928172,68.168139,58.683361,94.128436,29.690661,26.677572,75.0349,54.57313,89.55741,40.737855,...,98.695799,84.871647,73.280647,26.337824,78.053343,0.481553,8.403765,29.222197,33.574207,24.504671
sample_1,78.016747,99.49205,70.145861,17.151101,99.030816,90.708562,21.253042,12.867142,89.866548,99.754787,...,65.728041,42.782461,86.708234,56.280929,72.621607,63.896755,2.506398,9.899629,28.125014,31.566921
sample_2,81.544309,60.898832,57.578161,16.95151,10.553842,38.332432,27.706533,25.893565,40.347078,48.134245,...,89.223164,13.09081,93.875936,10.820379,7.544881,93.904309,11.400488,26.753149,49.610781,27.189153
sample_3,87.299392,90.036457,57.616788,51.66303,63.643471,80.017579,85.543056,15.645901,79.044992,12.080211,...,56.115858,49.187837,94.61927,7.408395,10.563895,9.484922,60.102911,19.981704,2.932652,65.964166
sample_4,46.586535,92.593747,97.98156,59.742234,45.28717,39.107802,31.22696,41.657816,75.979516,99.350307,...,71.812797,20.400118,97.29435,81.651273,40.115705,52.751265,65.468523,83.560803,39.732533,4.241352


This example generates a DataFrame with 100 samples and 50 features of synthetic mapped, continuous data and saves it to a CSV file named ```1_df_mapped.csv```. This file can be found in the _Tutorial_ folder of the GitHub repository.
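
When a CSV written with ```index=True``` is read back without declaring the index column, pandas names the first column ```Unnamed: 0``` and treats the sample names as data. The sketch below illustrates the difference using an in-memory buffer in place of the saved file; the tiny DataFrame is made up for the example.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"mapped_feature_0": [1.5, 2.5], "mapped_feature_1": [3.5, 4.5]},
    index=["sample_0", "sample_1"],
)

buffer = io.StringIO()
df.to_csv(buffer, index=True)  # same call as in the tutorial, written to memory

buffer.seek(0)
naive = pd.read_csv(buffer)  # sample names end up in a data column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # sample names become the index

print(list(naive.columns))
print(list(restored.columns))
```

Passing ```index_col=0``` matters for the metadata functions later in this tutorial, because they create one row per column of the input DataFrame.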

### Introduction to the ```mock_unmapped``` Function
The ```mock_unmapped``` function generates synthetic data resembling unmapped, discrete data types, such as Single Nucleotide Polymorphism (SNP) data. This function is useful for creating mock datasets resembling real-world genotype data.

#### Why "Unmapped" Data?
In genomics, data types like SNP data are crucial for studying genetic variations. Unlike mapped data, which is already associated with specific genomic features, unmapped data must first be linked to such features. The ```mock_unmapped``` function simulates this data type, making it suitable for exploring algorithms and methods that handle discrete genomic data.

#### Function Definition
Here is the definition of the ```mock_unmapped``` function:

In [7]:
def mock_unmapped(size: tuple[int, int]) -> pd.DataFrame:
    
    """
    Generates a DataFrame with synthetic unmapped, discrete data (e.g., SNP data).

    Parameters
    ----------
    size : tuple[int, int]
        A tuple specifying the number of rows (samples) and columns (features).

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the synthetic unmapped, discrete data.
    """
    rows, cols = size
    data = np.random.randint(0, 3, size=size)
    col_names = [f"unmapped_feature_{i}" for i in range(cols)]
    index_names = [f"sample_{i}" for i in range(rows)]

    return pd.DataFrame(data, index=index_names, columns=col_names)

#### How It Works
1. Parameters:

- ```size (tuple[int, int])```: Specifies the dimensions of the DataFrame to be generated. The first element indicates the number of rows (samples), and the second element indicates the number of columns (features).

2. Data Generation:

- Generates random integer data (```data```) using NumPy's ```randint``` function, with values ranging from 0 to 2. This simulates discrete data typical of SNP datasets.

3. Column and Row Names:

- ```col_names```: Generates column names in the format ```unmapped_feature_i```, where i is the column index. This convention mirrors how features in SNP data are labeled.
- ```index_names```: Creates row (index) names in the format ```sample_i```, where i is the row index. Each name represents a different sample in the generated dataset.

4. Return Value:

- Returns a Pandas DataFrame containing the generated data (```data```), with specified row and column names (```index_names``` and ```col_names```, respectively).
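
To inspect what such a matrix contains, the sketch below counts the genotype codes per feature and derives an allele frequency. It assumes that the values 0, 1 and 2 stand for the number of copies of one allele carried by a sample (additive coding), which the tutorial does not state explicitly; the seed and the small size are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
genotypes = pd.DataFrame(
    rng.integers(0, 3, size=(10, 4)),  # upper bound 3 is exclusive: codes 0, 1, 2
    index=[f"sample_{i}" for i in range(10)],
    columns=[f"unmapped_feature_{i}" for i in range(4)],
)

# Number of samples carrying each code, one column per feature.
counts = (
    genotypes.apply(lambda col: col.value_counts())
    .reindex([0, 1, 2])
    .fillna(0)
    .astype(int)
)

# Assumption: additive coding, so each sample carries 2 alleles and the
# frequency of the counted allele is sum(codes) / (2 * number of samples).
allele_freq = genotypes.sum(axis=0) / (2 * len(genotypes))

print(counts)
print(allele_freq)
```

Because the three codes are drawn with equal probability, the frequency is about 0.5 on average, which differs from real SNP data where the minor allele is usually much rarer.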

#### Example Usage
Here is an example of how to use the ```mock_unmapped``` function to generate and save a DataFrame of synthetic unmapped, discrete data:

In [8]:
if __name__ == "__main__":
    # Define the size of the DataFrame (rows, columns)
    size = (100, 50)  # 100 samples and 50 features

    # Generate the DataFrame
    df_unmapped = mock_unmapped(size)

    # Save the DataFrame to a CSV file
    df_unmapped.to_csv('2_df_unmapped.csv', index=True)

In [9]:
# Display the first 5 rows of the DataFrame
df_unmapped.head()

Unnamed: 0,unmapped_feature_0,unmapped_feature_1,unmapped_feature_2,unmapped_feature_3,unmapped_feature_4,unmapped_feature_5,unmapped_feature_6,unmapped_feature_7,unmapped_feature_8,unmapped_feature_9,...,unmapped_feature_40,unmapped_feature_41,unmapped_feature_42,unmapped_feature_43,unmapped_feature_44,unmapped_feature_45,unmapped_feature_46,unmapped_feature_47,unmapped_feature_48,unmapped_feature_49
sample_0,1,0,1,0,2,0,1,0,0,2,...,1,1,2,1,2,2,1,2,1,0
sample_1,0,2,0,0,1,0,2,0,1,0,...,1,1,1,0,2,1,0,0,1,2
sample_2,2,2,2,1,2,1,1,0,2,1,...,0,0,2,1,1,0,0,2,0,0
sample_3,0,0,2,1,1,1,1,1,2,0,...,1,1,1,2,1,0,1,1,1,2
sample_4,1,2,0,2,2,1,1,2,2,2,...,1,0,0,1,1,2,2,0,0,0


This example generates a DataFrame with 100 samples and 50 features of synthetic unmapped, discrete data and saves it to a CSV file named ```2_df_unmapped.csv```. This file can be found in the _Tutorial_ folder of the GitHub repository.

### Introduction to the ```interactions``` Function

#### What is the ```interactions``` Function?
The ```interactions``` function generates a DataFrame representing interactions between features, akin to pairwise interactions in genomic studies or other analytical contexts. This function is useful for simulating relationships between features for testing algorithms or exploring data patterns.

#### Function Definition
Here is the definition of the ```interactions``` function:

In [10]:
def interactions(n_rows):
    """
    Generates a DataFrame representing interactions between features.

    Parameters
    ----------
    n_rows : int
        Number of features (rows) to create interactions for.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing feature interactions.
    """
    features = [f"mapped_feature_{i}" for i in range(n_rows)]
    interact = []
    for i in range(len(features)):
        other_features = features[:i] + features[i + 1 :]
        n_interact = np.random.randint(1, n_rows)
        interact_features = np.random.choice(
            other_features, size=n_interact, replace=False
        )
        for j in range(n_interact):
            interact.append((features[i], interact_features[j]))
    interact_df = pd.DataFrame(interact, columns=["feature_1", "feature_2"])

    # Remove 30% of random rows
    interact_df = interact_df.sample(frac=0.7, random_state=42)

    # Sort by index
    interact_df = interact_df.sort_index()

    return interact_df

#### How It Works
1. Parameters:

- ```n_rows``` (int): Specifies the number of features (rows) for which interactions are to be simulated.

2. Feature Interaction Generation:
- __features__: Creates a list of feature names in the format ```mapped_feature_i```, where i is the feature index.
- __interact__: Initializes an empty list to store tuples representing interactions between features.
- __for__ loop: Iterates through each feature (```features[i]```) and selects a random subset of other features (```other_features```) to interact with.
- __np.random.choice__: Randomly selects a subset of features (```interact_features```) from ```other_features``` without replacement, simulating interactions.
- __append__: Adds each interaction tuple (```features[i]```, ```interact_features[j]```) to the ```interact``` list.

3. Subsampling and Ordering:

- ```sample(frac=0.7, random_state=42)```: Keeps a random 70% of the generated pairs, that is, drops 30% of the rows. The fixed ```random_state``` makes this selection reproducible for a given input, although the pairs themselves still depend on NumPy's global random state.
- ```sort_index()```: Restores the original row order, which is why the index of the result has gaps.
- The resulting DataFrame has two columns, ```feature_1``` and ```feature_2```, with one row per interaction.

4. Return Value:

- Returns a Pandas DataFrame (```interact_df```) containing simulated feature interactions between the generated features.
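
Because every feature draws its partners independently, the result can contain a pair in both directions, for example (A, B) and (B, A). If undirected edges are needed, one possible clean-up is sketched below on a hand-made table. The table contents and the ```undirected``` variable are illustrative only, and this step is not part of the tutorial's ```interactions``` function.

```python
import numpy as np
import pandas as pd

pairs = pd.DataFrame(
    {
        "feature_1": ["mapped_feature_0", "mapped_feature_1", "mapped_feature_0", "mapped_feature_2"],
        "feature_2": ["mapped_feature_1", "mapped_feature_0", "mapped_feature_2", "mapped_feature_2"],
    }
)

# Drop self-interactions (not produced by interactions(), kept here as a safeguard).
no_self = pairs[pairs["feature_1"] != pairs["feature_2"]]

# Sort the two names within each row so (A, B) and (B, A) look identical.
sorted_names = pd.DataFrame(
    np.sort(no_self[["feature_1", "feature_2"]].to_numpy(), axis=1),
    index=no_self.index,
)

# Keep the first occurrence of each unordered pair.
undirected = no_self.loc[~sorted_names.duplicated()].reset_index(drop=True)
print(undirected)
```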

#### Example Usage
Here is an example of how to use the ```interactions``` function to generate and display a DataFrame of feature interactions:

In [11]:
if __name__ == "__main__":
    # Define the number of features (rows)
    n_features = 20

    # Generate the DataFrame with feature interactions
    interactions_df = interactions(n_features)

    # Save the DataFrame to a CSV file
    interactions_df.to_csv('3_interactions_df.csv', index=True)

In [12]:
# Display the first 5 rows of the DataFrame
interactions_df.head()

Unnamed: 0,feature_1,feature_2
0,mapped_feature_0,mapped_feature_4
2,mapped_feature_0,mapped_feature_10
3,mapped_feature_0,mapped_feature_18
4,mapped_feature_0,mapped_feature_11
5,mapped_feature_0,mapped_feature_19


This example generates a DataFrame (```interactions_df```) with simulated interactions between 20 features and saves it to a CSV file named ```3_interactions_df.csv```. This file can be found in the _Tutorial_ folder of the GitHub repository.

### Introduction to the ```mapped_info``` Function
#### What is the ```mapped_info``` Function?
The ```mapped_info``` function generates a DataFrame containing mapped genomic information based on the columns of a given DataFrame (```df```). It assigns synthetic values representing chromosome numbers, start positions, and stop positions, mimicking structured genomic data often used in genomic studies.

#### Function Definition
Here is the definition of the ```mapped_info``` function:

In [13]:
import pandas as pd
import numpy as np

def mapped_info(df):
    """
    Generates a DataFrame with mapped genomic information based on input DataFrame columns.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame where columns represent genomic features.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing synthetic genomic mapping information (chromosome, start, stop).
    """
    # Define column names
    column_names = ["chr", "start", "stop"]

    # Define number of repetitions for each chromosome
    chr_repetitions = [7, 9, 14, 5, 16, 4, 10, 12, 8, 6, 3, 21, 1, 3, 11, 8, 7, 9, 11, 10, 10]  
    chrs = []
    for i, reps in enumerate(chr_repetitions, start=1):
        chrs.extend([i] * reps)
    chrs = np.array(chrs)

    # Ensure that we have enough repetitions to cover the number of features (columns) in df
    n_rows = len(df.columns)
    if len(chrs) < n_rows:
        raise ValueError("The defined chromosome repetitions do not cover the number of columns in the input DataFrame.")

    chrs = chrs[:n_rows]  # Trim to the number of features

    # Generate incremental start and stop positions for each chromosome group
    df_rows = []
    for chr_num in np.unique(chrs):  # sorted order keeps rows aligned with df.columns
        chr_indices = np.where(chrs == chr_num)[0]
        starts = np.sort(np.random.randint(1, 99, size=len(chr_indices)))
        lengths = np.random.randint(1, 10, size=len(chr_indices))  # Ensure variability
        stops = starts + lengths
        stops[stops > 99] = 99  # Ensure stop positions do not exceed 99

        for i, idx in enumerate(chr_indices):
            df_rows.append([chr_num, starts[i], stops[i]])

    # Create dataframe
    data_frame = pd.DataFrame(df_rows, columns=column_names, index=df.columns[:len(df_rows)])

    return data_frame

#### How It Works
1. Parameters:

- __df__ (```pd.DataFrame```): Input DataFrame where each column represents a genomic feature for which mapped information will be generated.

2. Data Generation:
- __column_names__: Defines the column names ```chr```, ```start```, and ```stop``` for the output DataFrame, representing chromosome number, start position, and stop position, respectively.
- __chr_repetitions__: Lists how many consecutive features are assigned to each chromosome. The list has 21 entries that add up to 185, so the function handles at most 185 features and raises a ```ValueError``` otherwise.
- __n_rows__: Computes the number of columns (genomic features) in the input DataFrame ```df```.

3. Generation of Mapped Values:

- __chrs__: Expands ```chr_repetitions``` into one chromosome number per feature (7 times 1, 9 times 2, and so on) and trims the result to ```n_rows``` entries.
- __starts__: For each chromosome, draws random start positions (```start```) between 1 and 98 and sorts them in increasing order.
- __stops__: Adds a random length between 1 and 9 to each start position and caps the resulting stop position (```stop```) at 99.

4. Data Assignment:

- __df_rows__: Iterates through the features of each chromosome, collecting one ```[chr, start, stop]``` row per feature.

5. Return Value:
   
- Returns a Pandas DataFrame (```data_frame```) containing synthetic genomic mapping information (```chr```, ```start```, ```stop```) indexed by the column names of the input ```df```.
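
The chromosome labels are built by expanding a list of per-chromosome counts. The sketch below reproduces that expansion on a short, made-up list and shows that ```np.repeat``` gives the same result as the loop; ```reps``` is an illustrative stand-in for ```chr_repetitions```.

```python
import numpy as np

reps = [2, 3, 1]  # made-up counts: 2 features on chr 1, 3 on chr 2, 1 on chr 3

# Loop-based expansion, as in the function above.
chrs_loop = []
for chr_num, count in enumerate(reps, start=1):
    chrs_loop.extend([chr_num] * count)

# Equivalent vectorised expansion.
chrs_vec = np.repeat(np.arange(1, len(reps) + 1), reps)

print(chrs_loop)
print(chrs_vec.tolist())
print(sum(reps))  # maximum number of features these counts can label
```

For the list used in ```mapped_info``` the counts add up to 185, so an input with more than 185 columns triggers the ```ValueError```.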

#### Example Usage
Here is an example of how to use the ```mapped_info``` function with a sample DataFrame:

In [14]:
# Example usage
if __name__ == "__main__":
    # Example DataFrame
    example_df = pd.DataFrame({
        "gene1": [0, 0, 1, 1, 0],
        "gene2": [1, 1, 0, 0, 1],
        "gene3": [0, 1, 1, 0, 0],
        "gene4": [1, 0, 0, 1, 1],
        "gene5": [0, 1, 0, 1, 1],
        "gene6": [1, 0, 1, 0, 0],
        "gene7": [0, 0, 1, 1, 0],
        "gene8": [1, 1, 0, 0, 1],
        "gene9": [0, 1, 1, 0, 0],
        "gene10": [1, 0, 0, 1, 1],
        "gene11": [0, 1, 0, 1, 1],
        "gene12": [1, 0, 1, 0, 0],
        "gene13": [0, 0, 1, 1, 0],
        "gene14": [1, 1, 0, 0, 1],
        "gene15": [0, 1, 1, 0, 0],
        "gene16": [1, 0, 0, 1, 1],
        "gene17": [0, 1, 0, 1, 1],
        "gene18": [1, 0, 1, 0, 0],
        "gene19": [0, 0, 1, 1, 0],
        "gene20": [1, 1, 0, 0, 1],
        "gene21": [0, 1, 1, 0, 0],
        "gene22": [1, 0, 0, 1, 1],
        "gene23": [0, 1, 0, 1, 1],
        "gene24": [1, 0, 1, 0, 0],
        "gene25": [0, 0, 1, 1, 0],
        "gene26": [1, 1, 0, 0, 1],
        "gene27": [0, 1, 1, 0, 0],
        "gene28": [1, 0, 0, 1, 1],
        "gene29": [0, 1, 0, 1, 1],
        "gene30": [1, 0, 1, 0, 0]
    })

    # Generate mapped genomic information
    mapped_data = mapped_info(example_df)

    # Save the DataFrame to a CSV file
    mapped_data.to_csv('4_mapped_data.csv', index=True)

# Second Example of Usage:
# mapped_data = mapped_info(df_mapped)
# This example generates mapped metadata directly from the mock data produced by the mock_mapped function.

# Third Example of Usage:
# df_mapped = pd.read_csv('1_df_mapped.csv', index_col=0)
# mapped_data = mapped_info(df_mapped)
# This example demonstrates loading data from a CSV file and then applying the mapped_info function to generate the metadata.

In [15]:
# Display the generated DataFrame
mapped_data.head()

Unnamed: 0,chr,start,stop
gene1,1,19,27
gene2,1,29,38
gene3,1,44,51
gene4,1,57,64
gene5,1,60,67


### Introduction to the ```unmapped_info``` Function
#### What is the ```unmapped_info``` Function?
The ```unmapped_info``` function is designed to generate synthetic metadata for a DataFrame containing unmapped, discrete data (e.g., SNP data). This metadata includes chromosome numbers and genomic locations for each feature (column) in the DataFrame. The function is useful for creating mock datasets that resemble real genomic data with associated metadata.

#### Why Metadata for Unmapped Data?
In genomic studies, metadata such as chromosome numbers and genomic locations are critical for understanding the context of genetic variations. This function provides a way to simulate such metadata for synthetic datasets, enabling researchers to test and develop methods that require this additional layer of information.

#### Function Definition
Here is the definition of the ```unmapped_info``` function:

In [16]:
import pandas as pd
import numpy as np

def unmapped_info(df):
    """
    Generates synthetic metadata for a DataFrame of unmapped, discrete data.

    Parameters
    ----------
    df : pd.DataFrame
        A DataFrame containing the unmapped data, where columns represent features.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing synthetic metadata with chromosome numbers and locations.
    """
    rows = df.shape[1]
    
    # Generate random incremental values for location
    location = []
    chromosome = []
    current_position = 1
    chr_num = 1

    for _ in range(rows):
        increment = np.random.randint(1, 10)
        next_position = current_position + increment
        
        if next_position > 99:
            chr_num += 1
            current_position = 1
            next_position = current_position + increment
        
        location.append(next_position)
        chromosome.append(chr_num)
        current_position = next_position

    return pd.DataFrame({"chr": chromosome, "location": location}, index=df.columns)

#### How It Works
1. Parameters:
- __df__ (```pd.DataFrame```): A Pandas DataFrame containing the unmapped data, with each column representing a feature.
2. Metadata Generation:
- __Location__: Starts at position 1 and moves forward by a random increment between 1 and 9 for each feature, so locations increase within a chromosome.
- __Chromosome__: Starts at chromosome 1. When the next location would exceed 99, the chromosome number is increased by one and the position restarts from 1.
3. Return Value:
- Returns a Pandas DataFrame containing the synthetic metadata with columns ```chr``` for chromosome numbers and ```location``` for genomic locations. The index of this DataFrame matches the column names of the input DataFrame ```df```.
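
The position logic of ```unmapped_info``` can be traced with fixed increments instead of random ones. In this sketch ```assign_positions```, ```increments``` and ```max_position``` are hypothetical names chosen for the illustration; the rule itself (advance by the increment, and start a new chromosome once the position would exceed the limit) follows the function defined above.

```python
def assign_positions(increments, max_position=99):
    """Return (chromosome, location) pairs for a fixed list of increments."""
    result = []
    current_position = 1
    chr_num = 1
    for increment in increments:
        next_position = current_position + increment
        if next_position > max_position:
            # Position would overflow: move to the next chromosome and restart.
            chr_num += 1
            current_position = 1
            next_position = current_position + increment
        result.append((chr_num, next_position))
        current_position = next_position
    return result


# With a small limit of 10 the rollover is easy to see.
print(assign_positions([4, 4, 4, 4], max_position=10))
# prints [(1, 5), (1, 9), (2, 5), (2, 9)]
```
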
#### Example Usage
Here is an example of how to use the ```unmapped_info``` function to generate and display synthetic metadata for a DataFrame of unmapped, discrete data:

In [17]:
# Example usage
if __name__ == "__main__":
    # Example DataFrame
    example_df = pd.DataFrame({
        "unmapped_feature_0": [0, 1, 0, 1, 0],
        "unmapped_feature_1": [1, 0, 1, 0, 1],
        "unmapped_feature_2": [0, 1, 0, 1, 0],
        "unmapped_feature_3": [1, 0, 1, 0, 1],
        "unmapped_feature_4": [0, 1, 0, 1, 0],
        "unmapped_feature_5": [1, 0, 1, 0, 1],
        "unmapped_feature_6": [0, 1, 0, 1, 0],
        "unmapped_feature_7": [1, 0, 1, 0, 1],
        "unmapped_feature_8": [0, 1, 0, 1, 0],
        "unmapped_feature_9": [1, 0, 1, 0, 1],
        "unmapped_feature_10": [0, 1, 0, 1, 0],
        "unmapped_feature_11": [1, 0, 1, 0, 1],
        "unmapped_feature_12": [0, 1, 0, 1, 0],
        "unmapped_feature_13": [1, 0, 1, 0, 1],
        "unmapped_feature_14": [0, 1, 0, 1, 0],
        "unmapped_feature_15": [1, 0, 1, 0, 1],
        "unmapped_feature_16": [0, 1, 0, 1, 0],
        "unmapped_feature_17": [1, 0, 1, 0, 1],
        "unmapped_feature_18": [0, 1, 0, 1, 0],
        "unmapped_feature_19": [1, 0, 1, 0, 1],
        "unmapped_feature_20": [0, 1, 0, 1, 0],
        "unmapped_feature_21": [1, 0, 1, 0, 1],
        "unmapped_feature_22": [0, 1, 0, 1, 0],
        "unmapped_feature_23": [1, 0, 1, 0, 1],
        "unmapped_feature_24": [0, 1, 0, 1, 0],
        "unmapped_feature_25": [1, 0, 1, 0, 1],
        "unmapped_feature_26": [0, 1, 0, 1, 0],
        "unmapped_feature_27": [1, 0, 1, 0, 1],
        "unmapped_feature_28": [0, 1, 0, 1, 0],
        "unmapped_feature_29": [1, 0, 1, 0, 1]
    })

    # Generate synthetic metadata
    unmapped_data = unmapped_info(example_df)

    # This example generates metadata for the features (columns) of the example DataFrame above.

    # Save the DataFrame to a CSV file
    unmapped_data.to_csv('5_unmapped_data.csv', index=True)

# Second Example of Usage:
# df_unmapped = pd.read_csv('2_df_unmapped.csv', index_col=0)
# unmapped_metadata = unmapped_info(df_unmapped)
# This example demonstrates loading data from a CSV file and then applying the unmapped_info function to generate the metadata.

In [18]:
# Display the generated DataFrame
unmapped_data.head()

Unnamed: 0,chr,location
unmapped_feature_0,1,5
unmapped_feature_1,1,11
unmapped_feature_2,1,19
unmapped_feature_3,1,27
unmapped_feature_4,1,31


### Generating and Saving Synthetic SNP Data and Interactome

The following script generates synthetic SNP data and interactome information for testing and experimentation purposes. The data is then saved for downstream analysis.

In [19]:
import numpy as np
import pickle

def generate(Ps, Cs):
    np.random.seed(1)
    p1, p2 = Ps
    c1, c2 = Cs

    clust1 = np.sort(np.random.choice(c1, size=p1, replace=True))
    clust2 = np.sort(np.random.choice(c2, size=p2, replace=True))

    R = np.zeros((p1, p2))
    MAF = np.zeros((c1, c2)) + 0.05
    maf1 = np.linspace(0.0, 0.25, c1)
    maf2 = np.linspace(0.0, 0.2, c2)
    np.random.shuffle(maf1)
    np.random.shuffle(maf2)
    MAF = MAF + np.tile(maf1, (c2, 1)).T + np.tile(maf2, (c1, 1))
    MAF = np.random.uniform(0.01, 0.5, size=(c1, c2))

    for i in range(c1):
        idx_i = np.where(clust1 == i)[0]
        for j in range(c2):
            idx_j = np.where(clust2 == j)[0]
            maf = MAF[i, j]
            block = np.random.choice(
                [0, 1, 2],
                size=[len(idx_i), len(idx_j)],
                p=[(1 - maf) * (1 - maf), 2 * maf * (1 - maf), maf * maf],
            )
            R[idx_i[0] : (idx_i[-1] + 1), idx_j[0] : (idx_j[-1] + 1)] = block

    return (R.astype(int), clust1.astype(int), clust2.astype(int))

#### Explanation
__Function Definition__: ```generate```
- Purpose: Generates synthetic SNP data and cluster information.
- Parameters:
1. ```Ps```: A pair containing the number of rows (```p1```) and columns (```p2```) of the data matrix. In the example below, rows are samples and columns are SNPs.
2. ```Cs```: A pair containing the number of clusters for the rows (```c1```) and for the columns (```c2```). In the example below, row clusters serve as phenotype labels and column clusters as genes.
- Returns: a tuple containing:
1. ```R```: The generated SNP data matrix of shape ```(p1, p2)```.
2. ```clust1```: Cluster assignments for the rows.
3. ```clust2```: Cluster assignments for the columns.

__How It Works__
1. __Seed Initialization__: The random seed is set to 1 for reproducibility.
2. __Cluster Generation__: Clusters for SNPs and phenotypes are generated and sorted.
3. __Data Initialization__: The SNP data matrix ```R``` and Minor Allele Frequencies (MAF) matrix are initialized.
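4. __MAF Calculation__: Per-cluster MAF offsets are shuffled and combined into a MAF matrix. As written, that matrix is then overwritten by values drawn uniformly between 0.01 and 0.5, and these uniform values are the ones used to generate the data.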
5. __Data Generation__: For each combination of clusters, a block of SNP data is generated based on the MAF values and assigned to the appropriate indices in ```R```.
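
The genotype probabilities passed to ```np.random.choice``` in ```generate``` are the Hardy-Weinberg proportions for a given minor allele frequency. The sketch below computes them for one value and draws a small block; ```genotype_probs``` is a hypothetical helper name and the seed is arbitrary.

```python
import numpy as np


def genotype_probs(maf: float) -> list[float]:
    """Probabilities of carrying 0, 1 or 2 copies of the minor allele."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf**2]


probs = genotype_probs(0.2)
print(probs)  # three probabilities that sum to 1

rng = np.random.default_rng(1)
block = rng.choice([0, 1, 2], size=(3, 5), p=probs)  # one block of genotypes
print(block)
```

With a small MAF most entries are 0, which is why blocks generated with different MAF values look visibly different.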

__Saving the Data__
1. __SNP Data__:
- __File__: ```6_toydata_SNP.csv```
- __Content__: The generated SNP data matrix ```R``` with an additional column for phenotype clusters.
- __Format__: CSV with a header row naming the SNP columns (```SNP1```, ```SNP2```, ...) and the final ```Pheno``` column.
2. __Interactome__:
- __File__: ```7_example_interact_genes.csv```
- __Content__: All 400 ordered pairs of the 20 genes ```gene1``` to ```gene20```.
- __Format__: CSV without a header, one pair of interacting genes per line.
3. __Mapping__:
- __File__: ```8_example_interact_snps.pkl```
- __Content__: List of SNP mappings for each pair of interacting genes.
- __Format__: Pickle file.

#### Example Usage:

In [20]:
if __name__ == "__main__":
    R, clust1, clust2 = generate([200, 1000], [5, 20])

    # save the SNP data
    header = ["SNP" + str(i + 1) for i in range(R.shape[1])] + ["Pheno"]
    np.savetxt(
        "6_toydata_SNP.csv",
        np.hstack((R, clust1[:, None])),
        delimiter=",",
        header=",".join(header),
        comments="",
        fmt="%i",
    )

    # save the interactome
    genes = ["gene" + str(i + 1) for i in range(20)]
    interact = np.array(np.meshgrid(genes, genes)).T.reshape(-1, 2)
    np.savetxt("7_example_interact_genes.csv", interact, delimiter=",", fmt="%s")

    # save the mapping
    mapping = []
    for pair in interact:
        gene1 = int(pair[0][4:]) - 1
        gene2 = int(pair[1][4:]) - 1
        snps1 = np.where(clust2 == gene1)[0]
        snps2 = np.where(clust2 == gene2)[0]
        mapping.append(([f"SNP{i+1}" for i in snps1], [f"SNP{i+1}" for i in snps2]))

    with open("8_example_interact_snps.pkl", "wb") as mappingfile:
        pickle.dump(mapping, mappingfile)

This example demonstrates how to use the provided code to generate synthetic datasets and save them for subsequent analysis:

- __Generating the Data__: The ```generate``` function is called with specified parameters to produce synthetic SNP data and cluster information.
- __Saving SNP Data__: The generated SNP data is saved as a CSV file.
- __Saving Interactome__: Interacting genes are generated and saved as a CSV file.
- __Saving Mapping__: SNP mappings for interacting genes are saved as a pickle file.
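
For downstream steps the pickle file has to be read back. The sketch below shows the round trip on a tiny, made-up mapping with the same structure as in the script (a list of pairs of SNP-name lists, one pair per gene interaction). It writes to a temporary directory so that the tutorial files are not touched.

```python
import pickle
import tempfile
from pathlib import Path

# Made-up mapping with the same shape as the one saved above.
mapping = [
    (["SNP1", "SNP2"], ["SNP3"]),
    (["SNP3"], ["SNP1", "SNP2"]),
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "example_interact_snps.pkl"
    with open(path, "wb") as mappingfile:
        pickle.dump(mapping, mappingfile)
    with open(path, "rb") as mappingfile:
        loaded = pickle.load(mappingfile)

# Each entry holds the SNPs of the first and of the second gene of a pair.
snps_gene_a, snps_gene_b = loaded[0]
print(snps_gene_a, snps_gene_b)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.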

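#### Mock data functions summary:

The cell below collects the mock data functions in one place. Note that its ```mapped_info``` and ```unmapped_info``` are simpler, deterministic variants of the functions defined above: chromosome numbers run from 1 to 23 in sorted blocks, start positions increase by 10 with ```stop = start + 9```, and locations increase by 2.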

In [21]:
import pandas as pd
import numpy as np

import isn_tractor.ibisn as it


def mock_mapped(size: tuple[int, int]) -> pd.DataFrame:
    "These data are generally gene expression-like data which are continuous."
    rows, cols = size
    data = np.random.uniform(0, 100, size=size)
    col_names = [f"mapped_feature_{i}" for i in range(cols)]
    index_names = [f"sample_{i}" for i in range(rows)]
    return pd.DataFrame(data, index=index_names, columns=col_names)


def mock_unmapped(size: tuple[int, int]) -> pd.DataFrame:
    "Unmapped data is always discrete (e.g. SNP data)."
    rows, cols = size
    data = np.random.randint(0, 3, size=size)
    col_names = [f"unmapped_feature_{i}" for i in range(cols)]
    index_names = [f"sample_{i}" for i in range(rows)]
    return pd.DataFrame(data, index=index_names, columns=col_names)


def interactions(n_rows):
    features = [f"mapped_feature_{i}" for i in range(n_rows)]
    interact = []
    for i in range(len(features)):
        other_features = features[:i] + features[i + 1 :]
        n_interact = np.random.randint(1, n_rows)
        interact_features = np.random.choice(
            other_features, size=n_interact, replace=False
        )
        for j in range(n_interact):
            interact.append((features[i], interact_features[j]))
    interact_df = pd.DataFrame(interact, columns=["feature_1", "feature_2"])

    # Remove 30% of random rows
    interact_df = interact_df.sample(frac=0.7, random_state=42)

    # Sort by index
    interact_df = interact_df.sort_index()

    return interact_df


def mapped_info(df):
    # Define column names
    column_names = ["chr", "start", "stop"]

    # Define number of chromosomes
    n_chromosomes = 23

    # Compute number of rows
    n_rows = len(df.columns)

    # Generate random values for each column
    chrs = np.repeat(np.arange(1, n_chromosomes + 1), n_rows // n_chromosomes + 1)[
        :n_rows
    ]
    starts = np.arange(1, n_rows * 10 + 1, 10)
    stops = starts + 9

    # Assign values to rows based on input df
    df_rows = []
    for i, row_name in enumerate(df.columns):
        df_rows.append([chrs[i], starts[i], stops[i]])

    # Create dataframe
    data_frame = pd.DataFrame(df_rows, columns=column_names, index=df.columns)

    return data_frame


def unmapped_info(df):
    rows = df.shape[1]
    location = [2 * i for i in range(rows)]
    chromosome = sorted([(i % 23) + 1 for i in range(rows)])
    return pd.DataFrame(
        {"chr": chromosome[:rows], "location": location}, index=df.columns
    )


if __name__ == "__main__":
    u_df = mock_unmapped((200, 1_000_000))
    m_df = mock_mapped((200, 100))
    interact = interactions(100)
    # Use separate names for the results so the functions are not overwritten
    m_info = mapped_info(m_df)
    u_info = unmapped_info(u_df)

    # interaction mapping
    interact_unmapped, interact_mapped = it.map_interaction(
        interact, mapped_info=m_info, unmapped_info=u_info, neighborhood=20
    )

In [22]:
interact_unmapped

[(array(['unmapped_feature_0', 'unmapped_feature_1', 'unmapped_feature_2',
         'unmapped_feature_3', 'unmapped_feature_4', 'unmapped_feature_5',
         'unmapped_feature_6', 'unmapped_feature_7', 'unmapped_feature_8',
         'unmapped_feature_9', 'unmapped_feature_10', 'unmapped_feature_11',
         'unmapped_feature_12', 'unmapped_feature_13',
         'unmapped_feature_14', 'unmapped_feature_15',
         'unmapped_feature_16', 'unmapped_feature_17',
         'unmapped_feature_18', 'unmapped_feature_19',
         'unmapped_feature_20'], dtype=object),
  array(['unmapped_feature_11', 'unmapped_feature_12',
         'unmapped_feature_13', 'unmapped_feature_14',
         'unmapped_feature_15', 'unmapped_feature_16',
         'unmapped_feature_17', 'unmapped_feature_18',
         'unmapped_feature_19', 'unmapped_feature_20',
         'unmapped_feature_21', 'unmapped_feature_22',
         'unmapped_feature_23', 'unmapped_feature_24',
         'unmapped_feature_25', 'unmapped_fea

In [23]:
interact_mapped

Unnamed: 0,gene_id_1,gene_id_2
0,mapped_feature_1,mapped_feature_4
1,mapped_feature_2,mapped_feature_1
2,mapped_feature_2,mapped_feature_4
3,mapped_feature_3,mapped_feature_4
4,mapped_feature_4,mapped_feature_2
5,mapped_feature_4,mapped_feature_0
6,mapped_feature_4,mapped_feature_1


#### ISNs computation using the mock data just generated:

- dense ISNs with continuous values:

(Note: ```m_df``` and ```df_mapped``` are DataFrames containing mock data of the same type and structure. The ISNs are computed on the smaller one, ```df_mapped```, so that converting the results into a ```pd.DataFrame``` is faster.)

In [24]:
#d_isn = it.dense_isn(m_df)
d_isn = pd.DataFrame(isn.numpy() for isn in it.dense_isn(df_mapped))

- sparse ISNs with continuous values:

In [25]:
sparse_cont_isn = pd.DataFrame(isn.numpy() for isn in it.sparse_isn(
        u_df, interact_unmapped, interact_mapped, metric="pearson", pool="avg"))

- sparse ISNs with discrete values:

In [26]:
sparse_disc_isn = pd.DataFrame(isn.numpy() for isn in it.sparse_isn(
    m_df, interact_unmapped=None, interact_mapped=interact_mapped, metric="pearson"))
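
The three cells above share one pattern: the ISN function is consumed as a generator, and every yielded item becomes one row of the resulting DataFrame. The sketch below reproduces that pattern without the library. ```fake_isn_generator``` is a hypothetical stand-in that yields NumPy arrays instead of PyTorch tensors, and the sample and edge labels are assumptions made only for illustration. It also assumes that each yielded item is a one-dimensional vector of edge values for one sample.

```python
import numpy as np
import pandas as pd


def fake_isn_generator(n_samples, n_edges, seed=0):
    """Hypothetical stand-in for an ISN generator: one 1-D vector per sample."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        yield rng.normal(size=n_edges)


# Assumed labels, chosen only for this illustration
edges = ["g0_g1", "g0_g2", "g1_g2"]
samples = [f"sample_{i}" for i in range(4)]

# Each yielded vector becomes one row, as in the cells above
isn_df = pd.DataFrame(fake_isn_generator(len(samples), len(edges)))
isn_df.index = samples
isn_df.columns = edges

print(isn_df.shape)
```

Labelling rows and columns right after collection makes it easier to trace a value back to a sample and an edge in later analyses.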

## ISN-tractor Main Functions & Example of Usage 

### Introduction to the ```preprocess_gtf``` Function
#### What is the ```preprocess_gtf``` Function?
The ```preprocess_gtf``` function is designed to clean and preprocess a DataFrame derived from a GTF (Gene Transfer Format) file. This function filters out unnecessary information and retains only essential columns: gene name, chromosome, start, and stop positions for each gene feature. This preprocessed DataFrame is crucial for genomic analyses, as it simplifies the data to focus on relevant gene annotations.

#### Function Definition
Here is the definition of the ```preprocess_gtf``` function:

In [27]:
def preprocess_gtf(gtf: pd.DataFrame) -> pd.DataFrame:
    """
    Ingest standardized human genome and remove unnecessary information
    such that the pre-processed DataFrame only contains gene name,
    chromosome, start, and stop for each feature.

    Parameters
    ----------
    gtf : pd.DataFrame
        A DataFrame derived from a GTF file, containing gene annotations.

    Returns
    -------
    pd.DataFrame
        A preprocessed DataFrame with columns: gene name, chromosome, start, and stop.
    """
    gtf[["ENSEMBL_ID", "B"]] = gtf[8].str.split(";", n=1, expand=True)
    gtf[["VERSION", "D"]] = gtf["B"].str.split(";", n=1, expand=True)
    gtf[["NAME", "E"]] = gtf["D"].str.split(";", n=1, expand=True)

    cols = [8, 10, 12, 14]
    gtf.drop(gtf.columns[cols], axis=1, inplace=True)
    gene_info = gtf[gtf[2].str.contains("gene")]

    gene_info[["gene", "ENSEMBL_ID"]] = gtf["ENSEMBL_ID"].str.split(" ", n=1, expand=True)
    gene_info[["ENSEMBL_ID", "X"]] = gtf["ENSEMBL_ID"].str.split('"', n=1, expand=True)
    gene_info[["ENSEMBL_ID", "X"]] = gene_info["X"].str.split('"', n=1, expand=True)

    cols = [1, 2, 5, 6, 7, 9, 10, 11, 12]
    gene_info.drop(gene_info.columns[cols], axis=1, inplace=True)
    gene_info = gene_info.rename(columns={0: "chr", 3: "start", 4: "stop"})
    gene_info = gene_info.set_index("ENSEMBL_ID")

    gene_info = gene_info[gene_info.chr != "MT"]
    gene_info = gene_info[gene_info.chr != "X"]
    gene_info = gene_info[gene_info.chr != "Y"]
    gene_info["chr"] = gene_info["chr"].astype(int)

    return gene_info

#### How It Works
1. Parameters:
- __gtf__ (```pd.DataFrame```): A DataFrame derived from a GTF file, containing various columns of gene annotations.

2. Data Preprocessing Steps:

2.1 Column Splitting:

- The function splits the annotation information in column 8 to extract ```ENSEMBL_ID```, ```VERSION```, and ```NAME```.
  
2.2 Column Dropping:

- Unnecessary columns are removed to simplify the DataFrame.
  
2.3 Gene Filtering:

- The function keeps only the rows whose feature type (column 2) is a gene annotation and discards the rest (transcripts, exons, and so on).
  
2.4 Gene ID Extraction:

- The ```ENSEMBL_ID``` is further split to clean up the data.
  
2.5 Final Column Dropping and Renaming:

- Additional unnecessary columns are dropped, and remaining columns are renamed for clarity (```chr```, ```start```, ```stop```).
  
2.6 Chromosome Filtering:

- Rows corresponding to mitochondrial (MT), X, and Y chromosomes are removed, and the chromosome column is converted to integers.
  
3. Return Value:
- The function returns a preprocessed DataFrame indexed by ```ENSEMBL_ID```, with the columns ```chr```, ```start```, and ```stop```.

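
The attribute parsing and filtering steps above can also be expressed with a regular expression instead of repeated ```str.split``` calls. The following is a minimal sketch on a three-row toy table, not the library's implementation. ```toy_gtf``` and ```extract_gene_table``` are hypothetical names, and the attribute strings only reproduce the ```gene_id``` and ```gene_version``` fields visible in the GTF preview of this tutorial.

```python
import pandas as pd

# Toy table with the nine positional GTF columns (0..8)
toy_gtf = pd.DataFrame(
    [
        ["1", "ensembl_havana", "gene", 1211340, 1214153, ".", "-", ".",
         'gene_id "ENSG00000186827"; gene_version "11";'],
        ["1", "ensembl_havana", "transcript", 1211340, 1214153, ".", "-", ".",
         'gene_id "ENSG00000186827"; gene_version "11";'],
        ["MT", "insdc", "gene", 15956, 16023, ".", "-", ".",
         'gene_id "ENSG00000210196"; gene_version "2";'],
    ]
)


def extract_gene_table(gtf):
    """Keep autosomal gene rows and index them by Ensembl gene ID."""
    is_gene = gtf[2] == "gene"
    is_autosome = ~gtf[0].astype(str).isin(["MT", "X", "Y"])
    genes = gtf.loc[is_gene & is_autosome, [0, 3, 4, 8]].copy()
    # Pull the quoted value that follows gene_id in the attribute column
    genes["ENSEMBL_ID"] = genes[8].str.extract(r'gene_id "([^"]+)"', expand=False)
    genes = genes.rename(columns={0: "chr", 3: "start", 4: "stop"})
    genes["chr"] = genes["chr"].astype(int)
    return genes.set_index("ENSEMBL_ID")[["chr", "start", "stop"]]


gene_table = extract_gene_table(toy_gtf)
print(gene_table)
```

Note that this sketch compares the feature type with ```== "gene"```, which is stricter than the substring match used in the function definition above.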
#### Example Usage
Below is an example of how to use the ```preprocess_gtf``` function to preprocess a GTF DataFrame:

In [28]:
# importing libraries to use public data
from pathlib import Path
from random import choices
import urllib.request as req

# data upload

gtf_path = Path("Homo_sapiens.GRCh38.105.chr.gtf.gz")
if not gtf_path.exists():
    with req.urlopen(
        "https://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/Homo_sapiens.GRCh38.105.chr.gtf.gz"
    ) as response:
        gtf_path.write_bytes(response.read())

gtf = pd.read_csv(
    "Homo_sapiens.GRCh38.105.chr.gtf.gz",
    delimiter="\t",
    engine="python",
    header=None,
    compression="gzip",
    skiprows=5,
)

In [29]:
# displaying original data
gtf

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,1,ensembl_havana,gene,1211340,1214153,.,-,.,"gene_id ""ENSG00000186827""; gene_version ""11""; ..."
1,1,ensembl_havana,transcript,1211340,1214153,.,-,.,"gene_id ""ENSG00000186827""; gene_version ""11""; ..."
2,1,ensembl_havana,exon,1213983,1214153,.,-,.,"gene_id ""ENSG00000186827""; gene_version ""11""; ..."
3,1,ensembl_havana,CDS,1213983,1214127,.,-,0,"gene_id ""ENSG00000186827""; gene_version ""11""; ..."
4,1,ensembl_havana,start_codon,1214125,1214127,.,-,0,"gene_id ""ENSG00000186827""; gene_version ""11""; ..."
...,...,...,...,...,...,...,...,...,...
3235990,MT,insdc,transcript,15888,15953,.,+,.,"gene_id ""ENSG00000210195""; gene_version ""2""; t..."
3235991,MT,insdc,exon,15888,15953,.,+,.,"gene_id ""ENSG00000210195""; gene_version ""2""; t..."
3235992,MT,insdc,gene,15956,16023,.,-,.,"gene_id ""ENSG00000210196""; gene_version ""2""; g..."
3235993,MT,insdc,transcript,15956,16023,.,-,.,"gene_id ""ENSG00000210196""; gene_version ""2""; t..."


In [None]:
#calling the function
gtf_preprocessed = preprocess_gtf(gtf)

In [None]:
# displaying data processed by the 'preprocess_gtf' function
gtf_preprocessed

In [None]:
# Save the DataFrame to a CSV file
gtf_preprocessed.to_csv('gtf_preprocessed.csv', index=True)

### Introduction to the ```map_interaction``` Function
#### What is the ```map_interaction``` Function?
The ```map_interaction``` function is designed to map gene interactions to SNPs based on their positional information. It uses a specified neighborhood range to determine which SNPs are near each gene and outputs the mapped interactions.

#### Function Definition
Here is the definition of the ```map_interaction``` function:

In [None]:
import pandas as pd
from typing import List, Tuple

def map_interaction(
    interact: pd.DataFrame,
    mapped_info: pd.DataFrame,
    unmapped_info: pd.DataFrame,
    neighborhood: int = 2000,
) -> Tuple[List[Tuple[str, str]], pd.DataFrame]:
    """
    Map gene interactions to SNP interactions based on positional information.

    Parameters
    ----------
    interact : pd.DataFrame
        A DataFrame of size [n_interactions, 2] where the 2 columns are the 2 gene IDs of the interaction.
    mapped_info : pd.DataFrame
        A DataFrame of size [n_genes, 3] where the 3 columns are the chromosome, start location, and end location of every gene.
    unmapped_info : pd.DataFrame
        A DataFrame of size [n_snps, 2] where the 2 columns are the chromosome and position of every SNP.
    neighborhood : int, optional
        The neighborhood range to consider around each gene for SNP mapping, by default 2000.

    Returns
    -------
    Tuple[List[Tuple[str, str]], pd.DataFrame]
        A tuple containing:
        - A list of mapped SNP interactions.
        - A DataFrame of the original gene interactions that were successfully mapped.
    """

    mapping = positional_mapping(unmapped_info, mapped_info, neighborhood)

    interact_mapped = []
    interact_unmapped = []
    for feature_1, feature_2 in interact.to_records(index=False):
        if feature_1 in mapping and feature_2 in mapping:
            interact_mapped.append((feature_1, feature_2))
            interact_unmapped.append((mapping[feature_1], mapping[feature_2]))

    return (
        interact_unmapped,
        pd.DataFrame(interact_mapped, columns=["gene_id_1", "gene_id_2"]),
    )

#### How It Works
- Parameters:
  1. ```interact```: A DataFrame containing pairs of gene IDs that interact.
  2. ```mapped_info```: A DataFrame containing chromosome, start, and end locations for each gene.
  3. ```unmapped_info```: A DataFrame containing chromosome and position information for each SNP.
  4. ```neighborhood```: An optional integer specifying the range around each gene within which SNPs are considered neighbors (default is 2000).

- Returns a tuple containing:
  - A list of tuples, one per mapped interaction, each holding the SNP IDs mapped to the two interacting genes.
  - A DataFrame of the original gene interactions that were successfully mapped.

__Steps__
1. __Positional Mapping__: Uses the ```positional_mapping``` function to map SNPs to genes based on their positions and the specified neighborhood range.
2. __Iterating Through Interactions__: For each pair of interacting genes, checks if both genes are in the mapping.
3. __Recording Mappings__: If both genes are mapped, records the interaction in both mapped and unmapped formats.
4. __Returning Results__: Returns the mapped SNP interactions and the original gene interactions that were successfully mapped.
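
Step 1 relies on ```positional_mapping```, whose body is not shown in this tutorial. The sketch below illustrates the idea under stated assumptions: a SNP is assigned to a gene when it lies on the same chromosome and within ```neighborhood``` base pairs of the gene's ```start```/```stop``` interval, and the SNP table uses the ```chr``` and ```location``` columns produced by the mock ```unmapped_info``` helper earlier in this tutorial. ```toy_positional_mapping``` is a hypothetical name, and the library's actual rule may differ.

```python
import pandas as pd


def toy_positional_mapping(unmapped_info, mapped_info, neighborhood):
    """Map each gene ID to the array of SNP IDs found near it (assumed rule)."""
    mapping = {}
    for gene, row in mapped_info.iterrows():
        nearby = (
            (unmapped_info["chr"] == row["chr"])
            & (unmapped_info["location"] >= row["start"] - neighborhood)
            & (unmapped_info["location"] <= row["stop"] + neighborhood)
        )
        snp_ids = unmapped_info[nearby].index.to_numpy()
        if len(snp_ids) > 0:
            # Assumption: genes without any nearby SNP are left out
            mapping[gene] = snp_ids
    return mapping


genes = pd.DataFrame(
    {"chr": [1, 1, 2], "start": [100, 500, 100], "stop": [150, 550, 150]},
    index=["geneA", "geneB", "geneC"],
)
snps = pd.DataFrame(
    {"chr": [1, 1, 2], "location": [90, 160, 5000]},
    index=["snp1", "snp2", "snp3"],
)

mapping = toy_positional_mapping(snps, genes, neighborhood=20)
print(mapping)
```

With this toy input only ```geneA``` has SNPs in range, so an interaction involving ```geneB``` or ```geneC``` would fail the "both genes are in the mapping" check of step 2 and be dropped.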
#### Example Usage
Below is an example of how to use the ```map_interaction``` function to map gene interactions to SNP interactions:

In [None]:
if __name__ == "__main__":
    # Example input DataFrames (replace with actual data)
    interact = pd.DataFrame(
        [["gene1", "gene2"], ["gene3", "gene4"]],
        columns=["gene_id_1", "gene_id_2"],
    )
    mapped_info = pd.DataFrame(
        {"chr": [1, 1, 2, 2], "start": [100, 200, 300, 400], "stop": [150, 250, 350, 450]},
        index=["gene1", "gene2", "gene3", "gene4"],
    )
    # SNP position column named "location", as in the mock unmapped_info() helper above
    unmapped_info = pd.DataFrame(
        {"chr": [1, 1, 2, 2], "location": [120, 220, 320, 420]},
        index=["snp1", "snp2", "snp3", "snp4"],
    )

    interact_unmapped, interact_mapped = it.map_interaction(interact, mapped_info, unmapped_info)

    # Output the results
    print("Mapped SNP Interactions:", interact_unmapped)
    print("Mapped Gene Interactions:")
    print(interact_mapped)

This example demonstrates how to map gene interactions to SNP interactions based on positional information and a specified ```neighborhood``` range:

- __Generating Data__: The script generates example data for gene interactions, gene positional information, and SNP positional information.
- __Mapping Interactions__: The ```map_interaction``` function is called to map the gene interactions to SNP interactions.
- __Output__: The results are printed, showing both the mapped SNP interactions and the original gene interactions that were successfully mapped.

### Introduction to the ```dense_isn``` Function
The ```dense_isn``` function computes a dense Individual-Specific Network (ISN) based on the _Lioness_ algorithm. This method is used to infer sample-specific networks from a given dataset, which is particularly useful in genomics for understanding individual-specific interactions between genes or proteins.

The ```dense_isn``` function takes a DataFrame of gene expression data and computes a dense ISN for each sample. It leverages _PyTorch_ for efficient computation, making use of matrix operations and _JIT_ (Just-In-Time) compilation for optimization.

#### Function Definition
Here is the definition of the ```dense_isn``` function:

https://github.com/GiadaLalli/ISN-tractor/blob/1d6a19f1ecf2123fac4d912226d9d0fcbf438300/isn_tractor/ibisn.py#L512

#### How it works
1. __Parameters__:

- ```data``` (pd.DataFrame): The input DataFrame containing gene expression data. Each row represents a sample, and each column represents a gene.
- ```device``` (Optional[t.device]): The device on which to perform computations (e.g., CPU or GPU).

2. __Initial Computations__:

- ```num_samples```: A tensor representing the number of samples.
- ```orig```: Converts the input DataFrame to a PyTorch tensor.
- ```orig_transpose```: Transposes the tensor to facilitate matrix operations.
- ```dot_prod```: Computes the dot product of the transposed tensor with itself.
- ```mean_vect```: Computes the sum of the original tensor along the sample dimension.
- ```std_vect```: Computes the sum of the squares of the original tensor along the sample dimension.
- ```glob_net```: Computes the global network using the Pearson correlation coefficient.

3. __Edge Function__:

- Defined using ```t.jit.script``` for optimization.
- Computes the edge weights for the ISN using the Lioness algorithm.

4. __Yielding ISNs__:

- Iterates over each sample and yields the computed ISN.
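
The quantities listed above (dot products, column sums, and sums of squares) are sufficient to compute a Pearson correlation, so the network without a given sample can be obtained by subtracting that sample's contribution instead of recomputing everything. The NumPy sketch below illustrates this idea together with the standard LIONESS combination, N * net_all - (N - 1) * net_without_q. It is an illustration under these assumptions, not the library's PyTorch implementation, and ```pearson_from_sums``` and ```toy_dense_isn``` are hypothetical names.

```python
import numpy as np


def pearson_from_sums(n, dot, col_sum, col_sq_sum):
    """Pearson correlation matrix from n, X^T X, column sums and sums of squares."""
    cov = n * dot - np.outer(col_sum, col_sum)
    var = n * col_sq_sum - col_sum**2
    return cov / np.sqrt(np.outer(var, var))


def toy_dense_isn(x):
    """Yield one LIONESS-style network per sample (rows of x are samples)."""
    n = x.shape[0]
    dot = x.T @ x
    col_sum = x.sum(axis=0)
    col_sq_sum = (x**2).sum(axis=0)
    glob_net = pearson_from_sums(n, dot, col_sum, col_sq_sum)
    for row in x:
        # Remove this sample's contribution from every running sum
        net_without = pearson_from_sums(
            n - 1, dot - np.outer(row, row), col_sum - row, col_sq_sum - row**2
        )
        yield n * glob_net - (n - 1) * net_without


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))  # 6 samples, 4 genes
isns = list(toy_dense_isn(x))
print(len(isns), isns[0].shape)
```

Each yielded matrix is symmetric with ones on the diagonal, because both the global and the leave-one-out correlation matrices have those properties.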
#### Example Usage
Here is an example of how to use the ```dense_isn``` function to compute dense ISNs for a given dataset:

In [None]:
# Example usage:
import torch as t

if __name__ == "__main__":
    # Create a sample DataFrame
    data = pd.DataFrame({
        'gene1': [1.2, 3.4, 5.6, 7.8],
        'gene2': [2.1, 4.3, 6.5, 8.7],
        'gene3': [3.1, 2.3, 4.5, 6.7],
        'gene4': [7.2, 2.4, 9.6, 1.8],
        'gene5': [9.1, 3.3, 4.5, 1.7],
        'gene6': [8.1, 1.3, 5.5, 8.7]
    })

    # Define the device (CPU or GPU)
    device = t.device('cuda' if t.cuda.is_available() else 'cpu')

    # Compute dense ISNs (dense_isn is accessed through the imported module)
    isn_generator = it.dense_isn(data, device)

    # Print the ISNs
    for isn in isn_generator:
        print(isn)

### Introduction to the ```sparse_isn``` Function
The ```sparse_isn``` function computes a sparse Individual-Specific Network (ISN) guided by weighted edges that indicate interaction relevance. This method is particularly useful for inferring sample-specific networks from genomic data, where only a subset of interactions (based on relevance) is considered. The function uses various correlation metrics and pooling methods to compute these networks.

The ```sparse_isn``` function takes a DataFrame of gene expression data along with interaction mappings and computes a sparse ISN for each sample. The function supports different metrics for correlation and various pooling methods, and it leverages PyTorch for efficient computation.

#### Function Definition
Here is the definition of the ```sparse_isn``` function:

https://github.com/GiadaLalli/ISN-tractor/blob/1d6a19f1ecf2123fac4d912226d9d0fcbf438300/isn_tractor/ibisn.py#L430

#### How it works
1. __Parameters__:

- ```data``` (pd.DataFrame): The input DataFrame containing gene expression data. Each row represents a sample, and each column represents a gene.
- ```interact_unmapped``` (Optional[np.ndarray]): An optional array of interactions that are not yet mapped.
- ```interact_mapped``` (pd.DataFrame): A DataFrame of mapped interactions.
- ```metric``` (Metric): The correlation metric to be used (e.g., "pearson", "spearman", "dot") or a custom metric function.
- ```pool``` (Optional[Pooling]): The pooling method to be used (e.g., "max", "avg", "average") or a custom pooling function.
- ```device``` (Optional[t.device]): The device on which to perform computations (e.g., CPU or GPU).
2. __Interaction Selection__:

- If ```interact_unmapped``` is provided, it is used as the interaction set. Otherwise, ```interact_mapped``` is used.
- Ensures that all specified interactions are present in the data columns.

3. __Metric and Pooling Functions__:

- If a string is provided for the metric, it selects the corresponding function  (```__pearson_metric```, ```__spearman_metric```, or ```__dot_metric```).
- If a string is provided for pooling, it selects the corresponding function (```t.max```, ```t.mean```).
- Custom functions can also be provided for both metric and pooling.

4. __Edge Function__:
- Uses ```__make_edge_fn``` to create a function for computing edges based on the specified metric and pooling method.

5. __Compute ISNs__:
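- Computes the edge value for each interaction in the selected interaction set.
- Yields the results as tensors. The examples in this tutorial iterate over the returned generator and collect the tensors into a ```pd.DataFrame```.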
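
How the metric and the pooling step are combined inside the library is not shown in this tutorial. The sketch below is one plausible reading, stated as an assumption: every feature mapped to the first gene is correlated with every feature mapped to the second gene, the resulting values are pooled into a single number, and a LIONESS-style leave-one-out combination turns that number into a sample-specific edge. ```POOLING```, ```pooled_pearson```, and ```toy_sparse_edge``` are hypothetical names, and NumPy replaces PyTorch.

```python
import numpy as np

# String names accepted for pooling, mirroring the options listed above
POOLING = {"max": np.max, "avg": np.mean, "average": np.mean}


def pooled_pearson(block_a, block_b, pool="avg"):
    """Correlate every column of block_a with every column of block_b, then pool."""
    n_a = block_a.shape[1]
    # Cross block: rows index block_a columns, columns index block_b columns
    cross = np.corrcoef(block_a, block_b, rowvar=False)[:n_a, n_a:]
    return float(POOLING[pool](cross))


def toy_sparse_edge(block_a, block_b, q, pool="avg"):
    """LIONESS-style edge value for sample q between two groups of features."""
    n = block_a.shape[0]
    keep = np.arange(n) != q
    full = pooled_pearson(block_a, block_b, pool)
    without_q = pooled_pearson(block_a[keep], block_b[keep], pool)
    return n * full - (n - 1) * without_q


rng = np.random.default_rng(1)
block_a = rng.normal(size=(8, 2))  # e.g. two SNPs mapped to the first gene
block_b = rng.normal(size=(8, 3))  # e.g. three SNPs mapped to the second gene

edges = [toy_sparse_edge(block_a, block_b, q, pool="max") for q in range(8)]
print(len(edges))
```

Because only the listed interactions are evaluated, the cost grows with the number of interactions rather than with the square of the number of features, which is the motivation for the sparse variant.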

#### Example Usage
Here is an example of how to use the ```sparse_isn``` function to compute sparse ISNs for a given dataset:

In [None]:
import torch as t
# Example usage
if __name__ == "__main__":
    # Create a sample DataFrame
    data = pd.DataFrame({
        'gene1': [1.2, 3.4, 5.6, 7.8],
        'gene2': [2.1, 4.3, 6.5, 8.7],
        'gene3': [3.1, 2.3, 4.5, 6.7]
    })

    # Define interaction mappings
    interact_mapped = pd.DataFrame({
        'geneA': ['gene1', 'gene1', 'gene2'],
        'geneB': ['gene2', 'gene3', 'gene3']
    })

    # Define the device (CPU or GPU)
    device = t.device('cuda' if t.cuda.is_available() else 'cpu')

    # Compute sparse ISNs
    isn_generator = it.sparse_isn(data, None, interact_mapped, "pearson", "average", device)
    
    # Print the ISNs
    for isn in isn_generator:
        print(isn)