# Configure Cluster Module Params

Use this notebook to check that the cluster parameters are correct, and to write them to the config file, before running cluster processing.
Cells marked with `SET PARAMETERS` contain variables that must be set for your specific experimental setup and data organization.
Review and modify these variables as needed before proceeding with the analysis.

## SET PARAMETERS

### Fixed parameters for cluster processing

- `CONFIG_FILE_PATH`: Path to a Brieflow config file used during processing. Absolute or relative to where workflows are run from.
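A relative `CONFIG_FILE_PATH` is interpreted relative to the directory the workflows are run from, so it can help to fail early if the file is missing. The sketch below assumes the notebook is run from that same directory (or that you pass it as `workflow_dir`); `resolve_config_path` is a hypothetical helper, not part of Brieflow:

```python
from pathlib import Path


def resolve_config_path(config_file_path, workflow_dir="."):
    """Hypothetical helper: resolve a config path and check that it exists.

    Relative paths are joined onto `workflow_dir`, which is assumed to be
    the directory the workflows are run from.
    """
    path = Path(config_file_path)
    if not path.is_absolute():
        path = Path(workflow_dir) / path
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    # Return an absolute, normalized path
    return path.resolve()
```

For example, `resolve_config_path("config/config.yml")` returns the absolute path when the file exists and raises `FileNotFoundError` otherwise.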

In [1]:
CONFIG_FILE_PATH = "config/config.yml"

In [2]:
from pathlib import Path

import yaml
import pandas as pd

from lib.shared.configuration_utils import CONFIG_FILE_HEADER

In [3]:
# load config file and determine root path
with open(CONFIG_FILE_PATH, "r") as config_file:
    config = yaml.safe_load(config_file)
    ROOT_FP = Path(config["all"]["root_fp"])

In [4]:
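# Minimum cell cutoff applied to each cell class
MIN_CELL_CUTOFFS = {"mitotic": 0, "interphase": 3, "all": 3}

# Analysis parameters
CORRELATION_THRESHOLD = 0.99  # correlation threshold for filtering redundant features
VARIANCE_THRESHOLD = 0.001  # variance threshold for filtering low-variance features
MIN_UNIQUE_VALUES = 5  # minimum number of unique values a feature must have
LEIDEN_RESOLUTION = 5.0  # Leiden clustering resolution (higher values give more, smaller clusters)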
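Before these values are written into the config, a quick range check can catch typos. This is a minimal sketch; the accepted ranges are assumptions about sensible values (for example, that the correlation threshold is a fraction in (0, 1]) rather than limits enforced by Brieflow, and `check_cluster_params` is a hypothetical helper:

```python
def check_cluster_params(min_cell_cutoffs, correlation_threshold,
                         variance_threshold, min_unique_values,
                         leiden_resolution):
    """Hypothetical sanity check; returns a list of problems (empty if none)."""
    problems = []
    # Assumed: each cutoff is a non-negative integer cell count
    for cell_class, cutoff in min_cell_cutoffs.items():
        if not isinstance(cutoff, int) or cutoff < 0:
            problems.append(f"min_cell_cutoffs[{cell_class!r}] must be a non-negative int")
    # Assumed: the correlation threshold is a fraction in (0, 1]
    if not 0 < correlation_threshold <= 1:
        problems.append("correlation_threshold should be in (0, 1]")
    # A variance can never be negative
    if variance_threshold < 0:
        problems.append("variance_threshold should be >= 0")
    if not isinstance(min_unique_values, int) or min_unique_values < 1:
        problems.append("min_unique_values should be a positive int")
    # Leiden resolution is assumed to be strictly positive
    if leiden_resolution <= 0:
        problems.append("leiden_resolution should be > 0")
    return problems
```

In this notebook it could be called as `check_cluster_params(MIN_CELL_CUTOFFS, CORRELATION_THRESHOLD, VARIANCE_THRESHOLD, MIN_UNIQUE_VALUES, LEIDEN_RESOLUTION)`, stopping if the returned list is non-empty.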

In [6]:
UNIPROT_DATA_FP = "config/uniprot_data.tsv"

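# TODO: create uniprot data with API
# For now, copy a precomputed UniProt annotation table into the config directory as TSV.
# Note: this source path is specific to the original lab filesystem; point it at your own copy.
uniprot_data = pd.read_csv(
    "/lab/barcheese01/screens/denali-etna-fuji/cluster_5/databases/uniprot_complete_data.csv"
)
uniprot_data.to_csv(UNIPROT_DATA_FP, sep="\t", index=False)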

uniprot_data

| Unnamed: 0 | Gene Names | Function [CC] | KEGG | ComplexPortal | STRING |
|---|---|---|---|---|---|
| 0 | MT-RNR1 | FUNCTION: Regulates insulin sensitivity and me... | | | |
| 1 | CIROP LMLN2 | FUNCTION: Putative metalloproteinase that play... | | | |
| 2 | BLTP3B KIAA0701 SHIP164 UHRF1BP1L | FUNCTION: Tube-forming lipid transport protein... | hsa:23074; | | 9606.ENSP00000279907; |
| 3 | POTEB3 | | hsa:102724631; | | 9606.ENSP00000483103; |
| 4 | CLRN2 | FUNCTION: Plays a key role to hearing function... | hsa:645104; | | 9606.ENSP00000424711; |
| ... | ... | ... | ... | ... | ... |
| 20456 | LINC00597 C15orf5 | | | | |
| 20457 | PRO3102 | | | | |
| 20458 | PRO2829 | | | | |
| 20459 | | | | | |
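The `TODO` above suggests building this table from the UniProt API instead of copying a precomputed file. A minimal sketch, assuming the UniProt REST `stream` endpoint, the return-field names `gene_names`, `cc_function`, `xref_kegg`, `xref_complexportal` and `xref_string` (which appear to correspond to the column labels shown above), and a query for reviewed human entries (`organism_id:9606 AND reviewed:true`). Verify these against the UniProt documentation before relying on them; both function names are hypothetical:

```python
import io
import urllib.parse
import urllib.request

import pandas as pd

# Assumed endpoint and field names; check against the UniProt REST docs
UNIPROT_STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"
UNIPROT_FIELDS = ["gene_names", "cc_function", "xref_kegg",
                  "xref_complexportal", "xref_string"]


def build_uniprot_query_url(query="organism_id:9606 AND reviewed:true",
                            fields=UNIPROT_FIELDS):
    """Build a UniProt REST URL that returns the requested fields as TSV."""
    params = {"query": query, "fields": ",".join(fields), "format": "tsv"}
    # urlencode percent-encodes ':' and ',' and turns spaces into '+'
    return f"{UNIPROT_STREAM_URL}?{urllib.parse.urlencode(params)}"


def fetch_uniprot_table(url, timeout=300):
    """Download the TSV (requires network access) and parse it with pandas."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return pd.read_csv(io.StringIO(text), sep="\t")
```

`fetch_uniprot_table(build_uniprot_query_url()).to_csv(UNIPROT_DATA_FP, sep="\t", index=False)` would then replace the hardcoded copy. A table built this way has no `Unnamed: 0` column, so check whether downstream steps rely on it.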


## Add cluster parameters to config file

In [8]:
# Add cluster_process section
config["cluster_process"] = {
    "min_cell_cutoffs": MIN_CELL_CUTOFFS,
    "correlation_threshold": CORRELATION_THRESHOLD,
    "variance_threshold": VARIANCE_THRESHOLD,
    "min_unique_values": MIN_UNIQUE_VALUES,
    "leiden_resolution": LEIDEN_RESOLUTION,
    "uniprot_data_fp": UNIPROT_DATA_FP
}

# Write the updated configuration
with open(CONFIG_FILE_PATH, "w") as config_file:
    # Write the introductory comments
    config_file.write(CONFIG_FILE_HEADER)

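    # Dump the updated YAML structure. yaml.dump does not preserve comments from the
    # original file (only CONFIG_FILE_HEADER, written above, survives) and sorts keys
    # alphabetically by default.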
    yaml.dump(config, config_file, default_flow_style=False)
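Because `yaml.dump` rewrites the whole file, it can be worth re-reading it to confirm the new section landed intact. A minimal sketch; `verify_cluster_section` is a hypothetical helper that reloads the file with `yaml.safe_load` (which skips the `#` comment lines of a header) and compares the `cluster_process` mapping with the values that were written:

```python
import yaml


def verify_cluster_section(config_file_path, expected):
    """Reload the YAML config and check that cluster_process matches `expected`."""
    with open(config_file_path, "r") as config_file:
        reloaded = yaml.safe_load(config_file)
    section = reloaded.get("cluster_process")
    if section != expected:
        raise ValueError(f"cluster_process mismatch: {section!r} != {expected!r}")
    return section
```

In this notebook it could be called as `verify_cluster_section(CONFIG_FILE_PATH, config["cluster_process"])` right after writing.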