<img src="https://dinlerantunes.com/assets/images/lab-logo.png" height="200">


# Welcome to CrossDome on Jupyter!

---

The complete CrossDome source and documentation are available at https://github.com/AntunesLab/crossdome

---

***If you use crossdome in published research, please cite:***
Fonseca AF, Antunes DA. CrossDome: an interactive R package to predict cross-reactivity risk using immunopeptidomics databases. Front Immunol. 2023;14:1142573. Published 2023 Jun 12. doi:10.3389/fimmu.2023.1142573

---

**Acknowledgments**
The Jupyter Notebook was created by the Antunes lab members: Martiela Freitas and Pamella Borges.

## Instructions

This notebook makes CrossDome accessible to users with little or no knowledge of the R language. The steps are described below:

1. **Downloading the necessary libraries**: This Jupyter notebook runs in R because CrossDome is an R package. In this cell, you will download and install all the necessary libraries.

2. If you don't have R installed on your computer, please follow the instructions to install it: https://www.r-project.org/.

3. After you have installed R, please open R in your terminal and install the IRkernel:

<pre> install.packages('IRkernel')
    IRkernel::installspec(user = FALSE) </pre>

4. Verify that your Jupyter notebook is running with the R kernel: in the upper right corner, confirm that the selected kernel is R.


5. **Loading Data**: Now it's time to enter your data. Select your option and run the cell. Below the cell, a prompt will appear with a query area to be filled in.

- *Single entry*: choose this if you have a single peptide and HLA target.
```
Please complete the allele with the format: HLA-A*02:01
```

   Enter your allele and press Return/Enter.

   Next, CrossDome will request your peptide target:
```
Please complete the peptide in the query area:
```
Enter your peptide and press Return/Enter.


- *Multiple entry*: choose this if you have multiple peptides and/or HLA targets.
```
Please provide the name to a CSV file formatted as indicated:
```
The table should have the following format:

| Allele   | Peptide |
| -------- | ------- |
| HLA-B*07:01 | RPILTISTL |
| HLA-B*07:02 | RPILTISTL |

You can include as many peptide and HLA combinations as you want.
Note that CrossDome is currently based on a 9-mer peptide dataset only.
To run CrossDome on your targets, the complete file must be provided.
Save this table on your drive or upload it to the content area.

If you have uploaded table.csv to your Google Drive, your path should look like: ```/content/drive/MyDrive/path/to/table.csv```

If you have uploaded table.csv to the Colab content area, the path should look like: ```/content/table.csv```

6. **Run CrossDome**: Run this cell to generate your results. The outputs will be:

- Allele_Peptide.csv with the peptide rank list.
- Allele_Peptide_expression_heatmap.pdf with the gene donor expression profile.
- Allele_Peptide_tissue_specificity.pdf with the bar plot summarizing the tissue-specificity groups.
- Allele_Peptide_cross_substitution.pdf with the heatmap combined with seqlogo displaying amino acid substitutions.
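
The `Allele_Peptide` prefix in these file names is built from your inputs. The sketch below shows how the prefix could be derived, using the same `gsub()` pattern as the "Run CrossDome" cell. The helper name `make_file_identifier` is hypothetical and is not part of CrossDome.

```r
# Build the output file prefix from an allele and a peptide,
# mirroring the gsub() call used in the "Run CrossDome" cell.
# make_file_identifier is a hypothetical helper, not a CrossDome function.
make_file_identifier <- function(allele, peptide) {
  identifier <- paste(allele, peptide, sep = "_")
  gsub("[:*]", "", identifier)  # drop ':' and '*', which may be invalid in file names
}

prefix <- make_file_identifier("HLA-A*02:01", "ALNNMFCQL")

# The four output files expected for this allele/peptide pair.
output_files <- c(
  paste0(prefix, ".csv"),
  paste0(prefix, "_expression_heatmap.pdf"),
  paste0(prefix, "_tissue_specificity.pdf"),
  paste0(prefix, "_cross_substitution.pdf")
)
print(output_files)
```

For `HLA-A*02:01` and `ALNNMFCQL`, the prefix is `HLA-A0201_ALNNMFCQL`.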

## 1. **Downloading the necessary libraries**

To use CrossDome in a Jupyter Notebook, you need to install the package from GitHub.  
The package is available at: https://github.com/AntunesLab/crossdome  
Run the R code cell below to install and load CrossDome.

In [28]:
# Install devtools if it is not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install CrossDome from GitHub if it is not already installed
if (!requireNamespace("crossdome", quietly = TRUE)) {
  devtools::install_github("antuneslab/crossdome", build_vignettes = TRUE)
}

library(crossdome)

Note that CrossDome only works with 9-mer peptides.

CrossDome on Google Colab can work with single or multiple entries.

If you choose to run CrossDome using a table as input (multiple entries option), please use the following format:

| Allele   |    Peptide   |
| -------- | ------- |
| HLA-B*07:01  | RPILTISTL   |
| HLA-B*07:02  | RPILTISTL    |
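
If you prefer to build this table in R rather than in a spreadsheet, the following minimal sketch writes it with base R and reads it back the same way the notebook does. The file location (`tempdir()`) and the variable names are assumptions chosen for illustration.

```r
# Example allele/peptide pairs taken from the table above.
targets <- data.frame(
  Allele  = c("HLA-B*07:01", "HLA-B*07:02"),
  Peptide = c("RPILTISTL", "RPILTISTL"),
  stringsAsFactors = FALSE
)

# write.csv() ends every line, including the last one, with a newline,
# which avoids the "incomplete final line" warning from read.csv().
csv_path <- file.path(tempdir(), "table.csv")  # hypothetical location
write.csv(targets, file = csv_path, row.names = FALSE)

# Read the file back and inspect it.
df <- read.csv(csv_path)
print(df)
```

The notebook reads the allele from the first column and the peptide from the second, so the column order matters more than the column names.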

## 5. **Loading Data**
Please select the type of entry you would like to work with:

For a single entry, enter the allele in the format HLA-A*02:01 and then the peptide in the query area.

For multiple entries, please provide the path to a **CSV table** formatted as indicated above.

In [76]:
option <- readline(prompt = "Choose an option ('Single' or 'Multiple') entry: ")

if (option == "Single") {
  # Prompt user for allele and peptide input
  allele <- readline(prompt = "Please complete the allele with the format: HLA-A*02:01\n")
  query <- readline(prompt = "Please complete the peptide in the query area:\n")
  table <- ""
} else if (option == "Multiple"){
  # Prompt user for CSV file path
  table <- readline(prompt = "Please provide the name to a CSV file formatted as indicated:\n")
  allele <- ""
  query <- ""
}


Choose an option ('Single' or 'Multiple') entry:  Multiple
Please provide the name to a CSV file formatted as indicated:
 Test.csv
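
Before running CrossDome, it can help to check that the inputs follow the expected format. The sketch below is one possible check, not part of CrossDome: the helper names are hypothetical, and the allele pattern assumes the two-field `HLA-A*02:01` style for the A, B, and C loci only.

```r
# Hypothetical helpers; the patterns are assumptions about valid input.

# Accepts alleles written like HLA-A*02:01 (loci A, B, or C assumed).
is_valid_allele <- function(allele) {
  grepl("^HLA-[ABC]\\*[0-9]{2}:[0-9]{2}$", allele)
}

# Accepts exactly nine standard amino acid letters (upper case).
is_valid_peptide <- function(peptide) {
  grepl("^[ACDEFGHIKLMNPQRSTVWY]{9}$", peptide)
}

# Stray spaces and lower-case letters are easy to introduce when typing,
# so the input is cleaned before it is checked.
allele <- trimws(" HLA-A*02:01 ")
query  <- toupper(trimws("alnnmfcql"))

print(is_valid_allele(allele))
print(is_valid_peptide(query))
```

Both functions are vectorized, so they can also be applied to the two columns of a table loaded with `read.csv()`.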


## 6. **Run CrossDome**
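
In the multiple-entry mode, an error in one allele/peptide pair stops the loop and the remaining pairs are not processed. One way to guard against this, sketched here under assumptions, is to wrap the per-row work in `tryCatch()`. The names `run_safely` and `toy_step` are hypothetical; `toy_step` only stands in for the CrossDome calls so that the example runs without the package.

```r
# Run `fun` on one allele/peptide pair; report and skip the pair on error.
# run_safely is a hypothetical wrapper, not a CrossDome function.
run_safely <- function(fun, allele, peptide) {
  tryCatch(
    fun(allele, peptide),
    error = function(e) {
      message("Skipping ", allele, " / ", peptide, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Toy stand-in for the per-row CrossDome steps:
# it fails for peptides that are not 9 residues long.
toy_step <- function(allele, peptide) {
  if (nchar(peptide) != 9) stop("peptide must be a 9-mer")
  paste(allele, peptide, sep = "_")
}

target_pairs <- data.frame(
  Allele  = c("HLA-B*07:02", "HLA-A*02:01"),
  Peptide = c("RPILTISTL", "ALNNMFCQLX"),  # second peptide is deliberately too long
  stringsAsFactors = FALSE
)

# Process every row; failed rows yield NULL instead of stopping the loop.
results <- lapply(seq_len(nrow(target_pairs)), function(i) {
  run_safely(toy_step, target_pairs$Allele[i], target_pairs$Peptide[i])
})
```

Rows that failed can then be identified with `sapply(results, is.null)`.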

In [78]:
if (option == "Single") {
    # Create a unique identifier for files
    file_identifier <- paste(allele, query, sep="_")
    file_identifier <- gsub("[:*]", "", file_identifier) # Remove special characters that may not be valid in file names

    # Set up the database with specific allele, withOUT off_target
    database <- cross_background(allele = allele)
    database

    # Compose result using the query value
    result <- cross_compose(query = query, background = database)

    # View result - use print if running in a non-interactive environment
    #print(result@result)

    # Save each result to a separate CSV file
    result_file_name <- paste0(file_identifier, ".csv")
    write.csv(result@result, file = result_file_name, row.names = FALSE)
    print(paste0("Your results were saved as: ", file_identifier, ".csv"))

    # Optionally, collect all results in a list for further processing
    #results_list[[file_identifier]] <- result@result

    # Extracting the gene donor mRNA expression based on CR candidates
    expression_matrix <- cross_expression_matrix(result, pvalue_threshold = 0.005)
    #expression_matrix

    # The heatmap presenting the gene donor expression profile
    pdf(paste0(file_identifier, "_expression_heatmap.pdf"), width = 20, height = 12)
    par(mar = c(3, 3, 2, 2), oma = c(0, 0, 0, 0), omi = c(0, 0, 0, 0))
    print(cross_expression_plot(object = expression_matrix))
    dev.off()
    print(paste0("Your heatmap presenting the gene donor expression profile was saved as: ",
                file_identifier, "_expression_heatmap.pdf"))
    
    # Summarizing tissue specificity across candidates
    
    # Bar plot summarizing the tissue-specificity groups
    pdf(paste0(file_identifier, "_tissue_specificity.pdf"), width = 20, height = 12)
    print(cross_tissues_plot(object = expression_matrix))
    dev.off()
    print(paste0("Your Bar plot summarizing the tissue-specificity groups was saved as: ",
                file_identifier, "_tissue_specificity.pdf"))

    # Displaying peptides composition across best-score candidates
    
    # Calculate position-specific substitution across cross-reactive candidates
    cross_result <- cross_substitution_matrix(result)
    #cross_result

    # Heatmap combined with seqlogo displaying amino acid substitutions
    pdf(paste0(file_identifier, "_cross_substitution.pdf"), width = 20, height = 12)
    print(cross_substitution_plot(object = cross_result))
    dev.off()
    print(paste0("Your Heatmap combined with seqlogo displaying amino acid substitutions was saved as: ",
                file_identifier, "_cross_substitution.pdf"))

} else if (option == "Multiple") {

    # Load the data from CSV file
    df <- read.csv(table)
    #print(colnames(df))

    # Using data from CSV file to set variables
    for (i in seq_len(nrow(df))) {
      allele_value <- df[i,1]
      query_value <- df[i, 2]

      # Create a unique identifier for files
      file_identifier <- paste(allele_value, query_value, sep="_")
      file_identifier <- gsub("[:*]", "", file_identifier) # Remove special characters that may not be valid in file names

      # Set up the database with specific allele, with off_target
      # database <- cross_background(off_targets = off_target_value, allele = allele_value)

      # Set up the database with specific allele, withOUT off_target
      database <- cross_background(allele = allele_value)
      database

      # Compose result using the query value
      result <- cross_compose(query = query_value, background = database)
      result

      # View result - use print if running in a non-interactive environment
      #print(result@result)

      # Save each result to a separate CSV file
      result_file_name <- paste0(file_identifier, ".csv")
      write.csv(result@result, file = result_file_name, row.names = FALSE)
      print(paste0("Your results were saved as: ", file_identifier, ".csv"))

      # Optionally, collect all results in a list for further processing
      #results_list[[file_identifier]] <- result@result

      # Extracting the gene donor mRNA expression based on CR candidates
      expression_matrix <- cross_expression_matrix(result, pvalue_threshold = 0.005)
      expression_matrix

      # The heatmap presenting the gene donor expression profile
      pdf(paste0(file_identifier, "_expression_heatmap.pdf"))
      print(cross_expression_plot(object = expression_matrix))
      dev.off()
      print(paste0("Your heatmap presenting the gene donor expression profile was saved as: ",
                file_identifier, "_expression_heatmap.pdf"))

      ## Summarizing tissue specificity across candidates

      # Bar plot summarizing the tissue-specificity groups
      pdf(paste0(file_identifier, "_tissue_specificity.pdf"))
      print(cross_tissues_plot(object = expression_matrix))
      dev.off()
      print(paste0("Your Bar plot summarizing the tissue-specificity groups was saved as: ",
                file_identifier, "_tissue_specificity.pdf"))

      ## Displaying peptides composition across best-score candidates

      # Calculate position-specific substitution across cross-reactive candidates
      cross_result <- cross_substitution_matrix(result)
      cross_result

      # Heatmap combined with seqlogo displaying amino acid substitutions
      pdf(paste0(file_identifier, "_cross_substitution.pdf"))
      print(cross_substitution_plot(object = cross_result))
      dev.off()
      print(paste0("Your Heatmap combined with seqlogo displaying amino acid substitutions was saved as: ",
                file_identifier, "_cross_substitution.pdf"))
    }
} else {
    print("Invalid option selected. Please choose 'Single' or 'Multiple'.")
}

“incomplete final line found by readTableHeader on 'Test.csv'”


##------ Mon Jun 30 10:47:33 2025 ------##
[1] "Your results were saved as: HLA-A2301_EYLDDRNIF.csv"


“Found 21 peptides out of 25. Unmapped peptides: QYLDAYNMM,AYIDNYNKF,WFITQRNFF,AYFDNYNKF”


[1] "Your heatmap presenting the gene donor expression profile was saved as: HLA-A2301_EYLDDRNIF_expression_heatmap.pdf"
[1] "Your Bar plot summarizing the tissue-specificity groups was saved as: HLA-A2301_EYLDDRNIF_tissue_specificity.pdf"


“No shared levels found between `names(values)` of the manual scale and the
data's colour values.”


[1] "Your Heatmap combined with seqlogo displaying amino acid substitutions was saved as: HLA-A2301_EYLDDRNIFcross_substitution.pdf"
##------ Mon Jun 30 10:47:36 2025 ------##
[1] "Your results were saved as: HLA-B0702_RPILTISTL.csv"


“Found 191 peptides out of 213. Unmapped peptides: RPSAFASTL,RPAISFTTV,RPIVSTQLL,RPEGSVSTL,RPASGYSTL,RPVYVALTL,RPKGSFSTL,RPLLIEGTA,LPSLQYSTL,KPLFTLQSL,RPNITSTAL,HPVLVTATL,RPNLHSASL,RPFNNILNL,RPVMFVSRV,RPTITNNLF,KPKLKVATL,MPNASFSTL,RPADSITYL,SPTLNVSAL,RPMLARLTV,RPLLLLRLL”


[1] "Your heatmap presenting the gene donor expression profile was saved as: HLA-B0702_RPILTISTL_expression_heatmap.pdf"
[1] "Your Bar plot summarizing the tissue-specificity groups was saved as: HLA-B0702_RPILTISTL_tissue_specificity.pdf"


“No shared levels found between `names(values)` of the manual scale and the
data's colour values.”


[1] "Your Heatmap combined with seqlogo displaying amino acid substitutions was saved as: HLA-B0702_RPILTISTLcross_substitution.pdf"
##------ Mon Jun 30 10:47:40 2025 ------##
[1] "Your results were saved as: HLA-A0201_ALNNMFCQL.csv"


“Found 955 peptides out of 1071. Unmapped peptides: ALNTLVKQL,ALNHLVLSL,ALQNVMISI,ALHQCFTEL,SLFNMVATL,ALYDVVSKL,QLNKDIIQL,SLNSMYTRL,ALKNSQAEL,ALSGVFCGV,ALSALLTKL,ALNDALWAV,SLFNTVCTL,KLNKMTVEL,FLNQANCKI,ILNAMIAKI,KLSSFFQSV,ALNQFTKVL,SLNNTVATL,LLSNTLAEL,AIIDILQQL,ALQAIELQL,ALNIALVAV,VLNSLASLL,QLANAIFKL,DLNQAVNNL,VLNDQYAKV,VLNSVASLL,LLAKMLFYL,ALNIALIAV,ALYNTVATL,TTNNLLEQL,ALDAQAVEL,ALYDVVSTL,KINEMVDEL,AASPMLYQL,ALSLIIVSV,ILNSLFERL,SLNQTVHSL,AIIRMLQQL,SLHNAVAVL,SLYNAVVTL,ATAQMALQL,ILQKEISQL,LLNNYDVLV,ALFNTVATL,NLNQVIQSV,LLNNSLGSV,LLLQQQQQL,ALYNTAAAL,QLAFTYCQV,ALDHYDCLI,LVLNMVYSI,ALLFFIVAL,ALNDSVKTV,SLYNVVATL,ALTNAQFFV,SLAALFYSL,SLFNAVVTL,VLLTYFCFV,SLFNTFATL,IILNKIVQL,VISNSVAQA,ALDAYHASL,SLYNAIATL,DLNKVIQFL,LIDFYLCFL,SLSNTVATL,SLDSLVHLL,VLANFCSAL,ILSELLSNL,LMNNAFEWI,DLENNLVKL,SLFNLVATL,IVNSVLLFL,AIHNVVHAI,SLYNAVATL,KMNNFIEKV,SLYNTISVL,ILSNYVKTL,ALWNLHGQA,VLVVMACLV,LAKSVFNSL,SLCSCICTV,LLQEMVEYV,VLNIVLFIL,LMSSIVHQV,TINNLKMML,SLFNVVATL,AMNNGMEDL,ALFHEVAKL,ALIRILQQL,ALIKQVAYL,ALSSSLGNV,KLYEELCD

[1] "Your heatmap presenting the gene donor expression profile was saved as: HLA-A0201_ALNNMFCQL_expression_heatmap.pdf"
[1] "Your Bar plot summarizing the tissue-specificity groups was saved as: HLA-A0201_ALNNMFCQL_tissue_specificity.pdf"


“No shared levels found between `names(values)` of the manual scale and the
data's colour values.”


[1] "Your Heatmap combined with seqlogo displaying amino acid substitutions was saved as: HLA-A0201_ALNNMFCQLcross_substitution.pdf"
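
The "Found N peptides out of M" warnings above report how many peptides were mapped and list the ones that were not. As a small worked example, the sketch below parses one of these messages with base R to compute the mapped fraction and recover the unmapped peptides. The variable names are illustrative only.

```r
# Warning text copied from the first run in the output above.
msg <- "Found 21 peptides out of 25. Unmapped peptides: QYLDAYNMM,AYIDNYNKF,WFITQRNFF,AYFDNYNKF"

# Extract the two counts (mapped, total) from the message.
counts <- as.integer(regmatches(msg, gregexpr("[0-9]+", msg))[[1]][1:2])
mapped_fraction <- counts[1] / counts[2]

# Split the comma-separated list of unmapped peptides.
unmapped <- strsplit(sub("^.*Unmapped peptides: ", "", msg), ",")[[1]]

print(mapped_fraction)
print(unmapped)
```

Here 21 of 25 peptides are mapped, a fraction of 0.84, and the four unmapped peptides account for the difference.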
