# Instructions: how to use scDiffusion

## Data

- Download the muris dataset (`muris.zip`) from https://figshare.com/s/49b29cb24b27ec8b6d72
- Download the SCimilarity weights (`annotation_model_v1.tar.gz`) from https://zenodo.org/records/8286452
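
Both downloads are archives that need to be unpacked before training. The following is a minimal sketch of one way to do that with the Python standard library. The `downloads`, `data/tabula_muris`, and `weights` directories are hypothetical locations chosen for illustration; adjust them to your own layout.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> Path:
    """Unpack a .zip or .tar(.gz) archive into dest and return dest."""
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            tf.extractall(dest)
    else:
        raise ValueError(f"Unsupported archive format: {archive}")
    return dest


if __name__ == "__main__":
    # Hypothetical locations (assumption): change to where you saved the files.
    downloads = Path("downloads")
    targets = [
        ("muris.zip", Path("data/tabula_muris")),
        ("annotation_model_v1.tar.gz", Path("weights")),
    ]
    for name, dest in targets:
        archive = downloads / name
        if archive.exists():
            print(f"extracted {archive} to {extract_archive(archive, dest)}")
        else:
            print(f"{archive} not found; download it first")
```

Only extract archives you obtained from the links above, since unpacking untrusted archives can overwrite files outside the target directory.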

## Training the models: train the VAE, the diffusion model and the classifier

Given an AnnData file named `anndata.h5ad` that you wish to analyze, run the following steps from the command line to train the three models:

1. **Train VAE**:
   - Prior to training, create the folder `path/to/saved_VAE_model`.
   - In the terminal, navigate to the `VAE` directory:
     ```bash
     cd VAE
     ```
   - Run the following command to train the Autoencoder:
     ```bash
     echo "Training Autoencoder, this might take a long time" 
     CUDA_VISIBLE_DEVICES=0 python VAE_train.py --data_dir 'path/to/anndata.h5ad' --num_genes 18996 --save_dir 'path/to/saved_VAE_model' --max_steps 200000
     echo "Training Autoencoder done"
     ```
   - **Example**: this run additionally passes the SCimilarity weights via `--state_dict` and sets `--max_minutes 15`:
     ```bash
     cd VAE
     echo "Training Autoencoder, this might take a long time"
     CUDA_VISIBLE_DEVICES=0 python VAE_train.py --data_dir '/workspace/projects/001_scDiffusion/data/data_in/tabula_muris/all.h5ad' --num_genes 18996 --state_dict "/workspace/projects/001_scDiffusion/scripts/scDiffusion/annotation_model_v1" --save_dir '../checkpoint/AE/my_VAE' --max_steps 200000 --max_minutes 15
     echo "Training Autoencoder done"
     ```
     Here the muris data was unzipped at:
     ```text
     /workspace/projects/001_scDiffusion/data/data_in/tabula_muris/all.h5ad
     ```
     and the SCimilarity weights were downloaded and unzipped at:
     ```text
     /workspace/projects/001_scDiffusion/scripts/scDiffusion/annotation_model_v1
     ```

2. **Train the Diffusion Model**:
   - Prior to training, create the folder `path/to/saved_diffusion_model`.
   - In the terminal, navigate back to the root directory:
     ```bash
     cd ..
     ```
   - Run the following command to train the diffusion backbone:
     ```bash
     echo "Training diffusion backbone"
     CUDA_VISIBLE_DEVICES=0 python cell_train.py --data_dir 'path/to/anndata.h5ad'  --vae_path 'path/to/saved_VAE_model/VAE_checkpoint.pt' \
         --save_dir 'path/to/saved_diffusion_model' --model_name 'my_diffusion' --save_interval 20000
     echo "Training diffusion backbone done"
     ```

3. **Train the Classifier**:
   - Prior to training, create the folder `path/to/saved_classifier_model`.
   - Run the following command to train the classifier:
     ```bash
     echo "Training classifier"
     CUDA_VISIBLE_DEVICES=0 python classifier_train.py --data_dir 'path/to/anndata.h5ad' --model_path "path/to/saved_classifier_model" \
         --iterations 40000 --vae_path 'path/to/saved_VAE_model/VAE_checkpoint.pt'
     echo "Training classifier done"
     ```
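
Each training step above asks you to create its output folder beforehand. As a convenience, the folders can be created in one go. This is a minimal sketch using the placeholder paths from the commands above; the helper name `ensure_dirs` is hypothetical and not part of scDiffusion.

```python
from pathlib import Path

# Placeholder paths taken from the training commands; replace with your own.
OUTPUT_DIRS = [
    "path/to/saved_VAE_model",
    "path/to/saved_diffusion_model",
    "path/to/saved_classifier_model",
]


def ensure_dirs(dirs, root="."):
    """Create each directory under root (including parents) and return the paths."""
    created = []
    for d in dirs:
        p = Path(root) / d
        # exist_ok=True makes the call safe to repeat on later runs.
        p.mkdir(parents=True, exist_ok=True)
        created.append(p)
    return created


if __name__ == "__main__":
    for p in ensure_dirs(OUTPUT_DIRS):
        print(f"ready: {p}")
```

This is equivalent to running `mkdir -p` on each path from the shell.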

## Generate new samples 

Once the models are trained, generate the `.npz` files that serve as input for the following steps:

1. **Unconditional Sampling (Generate Data from Diffusion Model)**:
   - Prior to sampling, create the folder `path/to/saved_unconditional_sampling`.
   - Run the following command to perform unconditional sampling:
     ```bash
     # Unconditional sampling
     python cell_sample.py --model_path "path/to/saved_diffusion_model/checkpoint.pt" --sample_dir "path/to/saved_unconditional_sampling"
     ```

2. **Conditional Sampling (Generate Data from Classifier Model)**:
   - Prior to sampling, create the folder `path/to/saved_conditional_sampling`.
   - Before sampling, edit the `if __name__ == "__main__":` block of `classifier_sample.py` so that `main()` is called with the condition you want to sample. Step 3 below describes how.
   - Run the following command to perform conditional sampling:
     ```bash
     # Conditional sampling 
     python classifier_sample.py --model_path "path/to/saved_diffusion_model/checkpoint.pt" --classifier_path "path/to/saved_classifier_model/checkpoint.pt" --sample_dir "path/to/saved_conditional_sampling"
     ```

Replace the placeholder paths (`'path/to/anndata.h5ad'`, `'path/to/saved_VAE_model'`, `'path/to/saved_diffusion_model'`, `'path/to/saved_classifier_model'`, and the sampling folders) with the actual paths on your system, and adjust any other parameters to your setup.
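
After sampling, it can help to check what the generated `.npz` files contain before using them further. The sketch below lists every array in a file together with its shape and dtype. It uses only `numpy.load` and makes no assumption about the array names written by the sampling scripts; the function name `summarize_npz` and the path in the usage comment are hypothetical.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in a .npz file."""
    with np.load(path) as data:
        return {
            name: (data[name].shape, str(data[name].dtype))
            for name in data.files
        }


# Usage (hypothetical path, replace with a file produced by the sampling step):
# for name, (shape, dtype) in summarize_npz("path/to/samples.npz").items():
#     print(name, shape, dtype)
```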

3. **Modify the `__main__` block of `classifier_sample.py`**:
   - Edit the `if __name__ == "__main__":` block so that `main()` generates the samples you want. For example, to generate AnnData objects that interpolate between two cell states:
     ```python
     # in classifier_sample.py
     if __name__ == "__main__":
         # Gradient interpolation between two cell types.
         # i runs from 0 to 10, so main() is called once for each of 11 weight settings.
         for i in range(0, 11):
             # Interpolate between cell type 6 and cell type 7 of the provided AnnData;
             # the weight goes from [10, 0] to [0, 10].
             main(cell_type=[6, 7], inter=True, weight=[10 - i, i])
     ```