Availability of reference Seurat objects #1

Open
SirKuikka opened this issue May 31, 2023 · 34 comments
Comments

@SirKuikka

Hi,

Is the reference Seurat object available somewhere? I would like to run TCellMap.R.

refSeuratObj <- readRDS(opt$referenceData)

@dustincys

Hi @SirKuikka

I apologize for any inconvenience this may cause you.

I am writing to inform you of our current efforts to upload the processed data object to the Gene Expression Omnibus (GEO) database. Our objective is to provide easy access to the data and enable wider dissemination of our findings.

Further to this, we are developing a new R package "MultipleMap" which offers the ability to map to a reference while accounting for batch effects. This package will be made available to the scientific community and will be an important resource for researchers working with gene expression data.

Thank you for your understanding and support.

@Merguerrero

Glad to hear (read) that!

Thank you for your efforts. Looking forward to testing the R package and using your reference.

@ccruizm

ccruizm commented Jun 11, 2023

Hello @dustincys

Do you have an estimated date when the Seurat object (or count matrices+cell annotation) will be available in GEO? Also, patiently waiting to use your nice dataset for reference mapping

Thanks in advance!

@nuneznicolas

Hi @dustincys,

First, I would like to express my sincere congratulations on your outstanding paper. Bravo!!

I am particularly interested in the count matrix data mentioned in your paper, as they hold great potential for further analysis and exploration. I wanted to kindly inquire about the availability of the data, specifically the .RDS files (CD8_, CD4_, Innate_, etc., and the Seurat object) associated with your research. I understand that you are currently in the process of uploading the data to the Gene Expression Omnibus (GEO) database. I was wondering if you have an estimated timeline for when the data will be accessible to the scientific community.

Thank you for your understanding and for your efforts in sharing your research.

Warm regards,

Nicolás

@dustincys

Hi @nuneznicolas @ccruizm ,

I hope this email finds you well. I am writing to express my
appreciation for your support and to provide an update on my current
progress.

I am currently attempting to upload the processed data to the GEO
repository, specifically GSE222859. However, this task has proved
challenging due to the inclusion of samples from other publicly
available GEO repos. I am doing my best to resolve this issue, but if I
cannot successfully upload it to the public data repository, I will
explore alternative options such as uploading to GitHub or another
suitable platform.

Best regards,

@ccruizm

ccruizm commented Jun 14, 2023

Thanks @dustincys for the update! Some good alternatives are Zenodo or CellXGene (the latter requires more formatting, so I would suggest Zenodo). 😉

@nuneznicolas

nuneznicolas commented Jun 14, 2023

Thanks @dustincys! I agree with @ccruizm.

Best and thanks a lot!

Nicolás

@ccruizm

ccruizm commented Jun 22, 2023

Hey @dustincys,

I hope you're doing well. I am following up on the data availability. I understand that you're working hard to make it accessible to the scientific community, and I appreciate your efforts in that regard.

I was wondering if you have any updates on when the data will be available for us to access.

Thank you once again. I'm eagerly awaiting the availability of the data.

@dustincys

Hello All,

I wanted to provide an update regarding the data sharing process. After careful consideration, we have decided to provide a download link through our website. This decision was made to ensure that we comply with the MD Anderson Cancer Center's data security policy. We encountered difficulties uploading to GEO, specifically in collecting all sample details, because the data also includes samples from other GEO repositories.

At this time, the data is still undergoing a data security check. As soon as this process is complete, we will provide the download link through our website. Please be aware that due to regulations set out in the MDACC's data security code, I am unable to send the data personally in private.

Thank you for your understanding and patience as we work to ensure the protection of the data.

Best,

@dustincys

Hello everyone,

Currently, the data is undergoing a thorough security check to ensure its safety. Once the data has been fully vetted and deemed secure, the download link will be made available at the bottom of the overview page (as shown in the image). We understand that this delay may be frustrating, but please know that we are taking every precaution necessary. Thank you for your understanding and patience. If you have any further questions or concerns, please do not hesitate to reach out.

Best regards,

image

@nuneznicolas

Thanks a lot @dustincys !!

Best

Nicolás

@ccruizm

ccruizm commented Jul 12, 2023

Hello @dustincys,

Do you have any update on when the dataset will be released? I check daily and still do not see it on the website you mentioned. 😅

Thanks in advance!

@nuneznicolas

Dear @dustincys,
Do you have any news about the data?
Thanks in advance

Nicolás

@dustincys

Hello @ccruizm @nuneznicolas

The data will be online very soon.

Over the past few weeks, we have held a number of meetings to address any concerns and ensure that the data we are working with does not contain any patient information. I am pleased to inform you that these meetings have been successful in clarifying this aspect. Currently, we are in the final stages of the data security check, and it is nearing completion. Once all the necessary measures are in place, we will be ready to make the data available online.

Kind regards,

@ZhihaoAlex

Dear @dustincys

Do you have any news about the data?
We can't wait to experience your newly developed tools.

Thanks in advance

@dustincys

dustincys commented Aug 7, 2023

@ZhihaoAlex

Hi Alex,

I apologize for any confusion, but I have received an update from Rsch Info Sys of MDACC regarding the availability of the data. It seems that there is a freeze on any changes for the rest of the month due to end of year IS schedules. While the change request will be reviewed in August, the formal implementation of the requested changes will not be possible until 9/7/23 at the earliest.

Sincerely,

image

@dustincys

Hi All,

I wanted to inform you that the SCRP update has been approved after we submitted an emergency ticket. I am pleased to let you know that the data is now available online.

Best

@Chris-Cherry

Hi @dustincys,

Thank you so much for uploading the processed CD4 and CD8 data. Is there any chance it's possible to also get the GD data? It would be wonderful!

Cheers and thanks again,

Chris

@dustincys

Hi Chris,

Thank you for reaching out to us. We appreciate your interest in our
study and are happy to provide the data you have requested.
However, I wanted to inform you that releasing certain data from our
study is subject to certain restrictions.

Specifically, we have several unpublished datasets from our
collaborators, and we have been asked to release this information
(including barcode and patient ID) only after their manuscripts have
been accepted. This is to ensure proper attribution and adherence to
academic norms.

Regarding the public data set of GD cells, we are happy to share the expression matrix.
In the meantime, I have to inform you that I am unable
to send the data to you privately. Our institution,
MD Anderson Cancer Center, strictly regulates data security and privacy.
Any violation of these regulations could result in serious consequences,
including job suspension for both myself and my supervisor.

In addition, please note that the process of conducting a data security
check with regard to the patient ID or barcode may take considerable
time. For more detailed information on this, you can visit the following
link: #1 (comment)

Thank you for your understanding and patience.

Best

@Chris-Cherry

Hey @dustincys - thank you so much for the response and my apologies for the delayed response. We totally understand the requirements and would appreciate any GD data that could be shared through the appropriate channels. If there's anything I can do to assist or expedite, please let me know!

@LQLe2

LQLe2 commented Jan 14, 2024

Hi,
How can I get the file snn-single-markers.tsv? Is it possible to provide the file, or can I replace it with the DEGtop50 from the article?
Thanks!

@SirKuikka
Author

hi, How can I get the file snn-single-markers.tsv, is it possible to provide the file or can I replace it with the DEGtop50 from the article? thanks!

This is actually something I was wondering as well.

@dustincys

hi, How can I get the file snn-single-markers.tsv, is it possible to provide the file or can I replace it with the DEGtop50 from the article? thanks!

Yes

@SirKuikka
Author

SirKuikka commented Feb 19, 2024

Hi @dustincys

And what about these two scripts?

TCellMap.R
TCellMap2.R

Which one should we use?

In TCellMap.R there are these two files that I can't find:

cellCycleGeneT1 <- read_tsv("/rsrch3/scratch/genomic_med/ychu2/projects/p1review/R3Q7/knowledge/public/database/general/cell-cycle-gene-list.txt")
cellCycleGeneT2 <- read_tsv("/rsrch3/scratch/genomic_med/ychu2/projects/p1review/R3Q7/knowledge/public/database/general/regev_lab_cell_cycle_genes.txt")

@dustincys

Hi SirKuikka,

I would like to recommend considering the use of the MultiMap package, which can
be found at https://github.com/WangLab-ComputationalBiology/MultiMap. In
particular, you may find this package useful for addressing some of the
challenges you are facing.

For a practical example of how to use MultiMap, you can refer to the
following link:
https://github.com/WangLab-ComputationalBiology/MultiMap/blob/master/testR/test.R.

Regarding your question about cell cycle genes, you can access a list of these
genes at https://satijalab.org/seurat/reference/cc.genes.
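
For anyone replacing the two read_tsv() calls in TCellMap.R that point at paths outside the repository, one possible substitute is the cc.genes list bundled with Seurat, as the link above suggests. This is only a sketch: it assumes Seurat is installed, the helper name load_cell_cycle_genes is hypothetical, and whether these lists match the contents of the two original files has not been confirmed in this thread.

```r
# Hypothetical helper: returns the S and G2/M phase gene lists shipped with Seurat.
# Assumption: these can stand in for the cell cycle files read in TCellMap.R.
load_cell_cycle_genes <- function() {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package is required for its built-in cc.genes list.")
  }
  cc <- Seurat::cc.genes
  list(s = cc$s.genes, g2m = cc$g2m.genes)
}

# Usage (requires Seurat):
# cc <- load_cell_cycle_genes()
# head(cc$s)
```

The script may expect a particular column layout from the original TSV files, so the returned character vectors might still need reshaping before use.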

In comparing TCellMap2.R and TCellMap.R, they are quite similar.
TCellMap2.R, however, utilizes each batch to map the query, but some of the
mapping results may not be optimal. In such cases, the MultiMap package could
potentially provide better batch mapping results.

Best regards,

@SirKuikka
Author

Yes, sorry. I don't know why I forgot MultiMap. Thanks!

@SirKuikka
Author

SirKuikka commented Feb 19, 2024

Does it matter how the query data are normalized? Is LogNormalize ok?
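
For context, here is a minimal base-R sketch of what Seurat's LogNormalize method computes, as I understand it from the Seurat documentation: each cell's counts are divided by that cell's total, multiplied by a scale factor (10000 by default), and passed through log1p. The toy matrix and the function name log_normalize are made up for illustration; this says nothing about which normalization MultiMap expects.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical values).
counts <- matrix(
  c(1, 3, 0, 6,
    9, 7, 10, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:4))
)

# Illustrative re-implementation, not Seurat's own code.
log_normalize <- function(counts, scale_factor = 10000) {
  # Divide each column (cell) by its total count, scale, then apply log(1 + x).
  # Cells with a total count of zero would produce NaN here.
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

norm <- log_normalize(counts)
```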

In my case MultiMap predicted almost all of the query CD4+ T cells as "CD4_c5_Tctl" cells. I don't think this worked.

@dustincys

Hi Siuiri,

I hope this message finds you well. As you may know, Seurat suggests utilizing SCTransform
because it helps address the characteristics of the data, particularly in cases
where 10x counts are more in line with a zero-inflated negative binomial
distribution.

almost all of the query CD4+ T cells as "CD4_c5_Tctl" cells.

I have to admit that MultiMap is not perfect.
I also suggest using the DEGs to double-check the mapping results.
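
One way such a DEG check could look, sketched in base R under stated assumptions: the expression matrix, the predicted labels, and the marker lists below are toy placeholders, and marker_score_by_label is a hypothetical helper, not part of MultiMap or Seurat. The idea is to average each cluster's marker genes per cell and then compare those averages across the predicted labels.

```r
# Toy expression matrix: rows are genes, columns are cells (hypothetical values).
expr <- matrix(
  c(5, 4, 0, 0,
    0, 1, 6, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:4))
)
# Placeholder predicted labels (one per cell) and marker genes (one set per cluster).
predicted <- c("clusterA", "clusterA", "clusterB", "clusterB")
markers <- list(clusterA = "GeneA", clusterB = "GeneB")

# Rows of the result: predicted labels. Columns: marker sets.
marker_score_by_label <- function(expr, predicted, markers) {
  sapply(names(markers), function(cluster) {
    genes <- intersect(markers[[cluster]], rownames(expr))
    per_cell <- colMeans(expr[genes, , drop = FALSE])  # mean marker expression per cell
    tapply(per_cell, predicted, mean)                  # averaged within each predicted label
  })
}

scores <- marker_score_by_label(expr, predicted, markers)
```

If the mapping is plausible, each predicted label should score highest on its own marker set, so the diagonal of scores should dominate. A label that scores low on its own markers would be worth a closer look.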

Best regards,

@Conghui2023

Hi @dustincys,

Is the reference Seurat object available now? Where can I find it? And how can I install the MultipleMap package?

All the best

@dustincys

Hi @Conghui2023 ,

At the bottom of this page https://singlecell.mdanderson.org/TCM/
you can find the Seurat object along with its MD5 checksum.
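
A downloaded file can be compared against the published checksum before loading it. The sketch below uses tools::md5sum from base R; the helper name verify_md5 is hypothetical, and the demonstration runs on a temporary file rather than the real download.

```r
# Hypothetical helper: TRUE when the file's MD5 matches the expected string.
verify_md5 <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))  # NA if the file does not exist
  identical(tolower(actual), tolower(expected_md5))
}

# Demonstration on a temporary file, so the sketch runs without the real data.
tmp <- tempfile(fileext = ".txt")
writeLines("example", tmp)
expected <- unname(tools::md5sum(tmp))
ok <- verify_md5(tmp, expected)

# With the real download, the expected value is the checksum listed on the page:
# if (verify_md5("cd8.rds", "<md5 from the download page>")) {
#   refSeuratObj <- readRDS("cd8.rds")
# }
```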

MultiMap is an R package; you can install it like this:

library(devtools)
install_github("WangLab-ComputationalBiology/MultiMap")

Best

@Conghui2023

Conghui2023 commented Jul 18, 2024 via email

Dear Dustin,

Sorry to bother you again. When I run 'TCellMap.R' using the 'cd8.rds' file that you provided, I encounter many bugs. Although I have fixed some of them, it still doesn't work. Could you please update the script?

All the best,
Conghui

@dustincys

Hi Conghui2023,

You can access the code template at the following link: https://github.com/WangLab-ComputationalBiology/MultiMap/blob/master/testR/test.R

Best regards,
