Seurat and genefu #22

ksaunders73 · 2021-12-03T20:04:21Z

Hello!

Thank you for the excellent package! I would like to use genefu's molecular.subtyping() function (with the pam50.robust model) on my Seurat object, and was wondering whether the Seurat object should be:

- only normalized beforehand with NormalizeData(), or
- additionally scaled after normalization using ScaleData().

Thank you for reading!

ChristopherEeles · 2021-12-03T23:28:31Z

Hi @ksaunders73,

This is not a straightforward question to answer.

All of the cluster centroids in the genefu package were derived from the RNA microarray data in their respective publications. Because the units of a microarray (fluorescence intensity or intensity ratio) differ from those of RNA sequencing (counts, FPKM, or TPM), it is not clear how your counts/FPKM/TPM values should be processed to be comparable with the array-based cluster centroids.

I recommend reading the PAM50 subtype paper, specifically the Methods section:

van ’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., & Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871), 530–536. https://doi.org/10.1038/415530a

My understanding is that they used log2-transformed expression ratios for their clustering analysis, so the centroids of their clusters will also be expressed in these units. According to the supplementary Methods section of that paper, the expression ratios were calculated as:

the logarithmic transcriptional expression level measured relative to a baseline condition

I was unable to find the definition of the baseline condition in the paper; maybe you can find it? Without knowing what the baseline for the expression ratios was, it is hard to say how to construct an analogous metric from counts/TPM.

My instinct would be to divide the TPM by the average or median for each gene across your patient cohort and take the log of that ratio, but whether this is scientifically valid is a call you will need to make. It is also possible they used a normal sample as their baseline.
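The ratio-to-cohort-median idea can be sketched in a few lines. This is only an illustration in plain Python, not genefu or Seurat code: the function name `log2_ratio_to_median`, the pseudocount of 1 (added to avoid taking the log of zero), and the TPM values are all assumptions made for the example.

```python
import math
from statistics import median


def log2_ratio_to_median(expr, pseudocount=1.0):
    """Return log2 ratios of each value to its gene's cohort median.

    expr maps a gene name to a list of TPM values, one per sample.
    The pseudocount is an assumption of this sketch, not something
    prescribed by genefu or by the papers discussed in this thread.
    """
    ratios = {}
    for gene, values in expr.items():
        # Baseline for this gene: the median across all samples.
        baseline = median(values) + pseudocount
        ratios[gene] = [math.log2((v + pseudocount) / baseline) for v in values]
    return ratios


# Hypothetical TPM values for two genes across three samples.
tpm = {
    "ESR1": [1.0, 3.0, 7.0],
    "ERBB2": [3.0, 15.0, 63.0],
}
ratios = log2_ratio_to_median(tpm)
print(ratios)
```

A sample whose expression equals the cohort median gets a log ratio of 0, values above the median are positive, and values below are negative. Whether the cohort median is an acceptable stand-in for the original baseline condition remains the open question raised above.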

Once you decide how to get a log expression ratio from your Seurat data, you should apply the genefu::rescale function to the expression matrix, since this is what was done for the pam50.robust cluster centroids. It is also worth noting that the molecular.subtyping function always uses the robust variant of the cluster centroid data.
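To give a feel for what a quantile-based rescaling step does, here is a minimal Python sketch. It is not the implementation of genefu::rescale. It assumes a linear mapping x' = (x - q_lo) / (q_hi - q_lo), where q_lo and q_hi are the q/2 and 1 - q/2 quantiles, and the helper names `quantile` and `quantile_rescale` are hypothetical. Check ?rescale in R for the actual behaviour and defaults.

```python
import math


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_rescale(values, q=0.05):
    """Map the q/2 quantile to 0 and the 1 - q/2 quantile to 1.

    Values outside those quantiles fall below 0 or above 1.
    The formula is an assumption of this sketch, not genefu's code.
    """
    s = sorted(values)
    low = quantile(s, q / 2)
    high = quantile(s, 1 - q / 2)
    if high == low:
        raise ValueError("cannot rescale constant values")
    return [(v - low) / (high - low) for v in values]


# Hypothetical log expression ratios for one gene across five samples.
log_ratios = [-2.0, -1.0, 0.0, 1.0, 2.0]
full_range = quantile_rescale(log_ratios, q=0.0)  # plain min-max scaling
trimmed = quantile_rescale(log_ratios, q=0.5)     # anchored on the quartiles
print(full_range, trimmed)
```

With q = 0 this reduces to min-max scaling. A larger q anchors the scale on inner quantiles, so a few extreme samples have less influence on where the bulk of the data lands.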

Information about different centroids can be found in the genefu package help, e.g. using ?pam50. This will include a reference to the publication from which the centroid data was retrieved.

Given that this package was designed for classifying data from Affymetrix microarrays, I am not sure it is optimal to adapt it for RNA sequencing data. You may want to consider an RNA-seq-based clustering algorithm because of the technical considerations above.

Hopefully that helps.

Best,
Christopher Eeles
Software Developer
BHK Lab | PM-Research | UHN

ksaunders73 · 2021-12-08T12:41:50Z

Thank you very much @ChristopherEeles!

ChristopherEeles · 2021-12-14T17:35:57Z

Hi @ksaunders73,

I am going to close this issue. If you have further questions feel free to re-open this thread or file a new issue.

Best,
Christopher Eeles
Software Developer
BHK Lab | PM-Research | UHN

zhangjl-work · 2022-05-26T03:22:29Z

Excuse me, how can I use single-cell data for PAM50 analysis? What should the input expression matrix look like, and which normalization method should be used?

zhangjl-work · 2022-05-26T06:12:45Z

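Hello!

Thank you for the excellent package! I would like to use genefu's molecular.subtyping() function (with the pam50.robust model) on my Seurat object, and was wondering whether the Seurat object should be:

- only normalized beforehand with NormalizeData(), or
- additionally scaled after normalization using ScaleData().

Thank you for reading!

Excuse me, how can I use single-cell data for PAM50 analysis? What should the input expression matrix look like, and which normalization method should be used?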

ChristopherEeles · 2022-06-23T05:07:23Z

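It has come to my attention that the paper I cited above is not the original PAM50 publication. However, the discussion still applies. The original PAM50 publication is:

Parker, J. S., Mullins, M., Cheang, M. C. U., et al. (2009). Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology, 27(8), 1160–1167. https://doi.org/10.1200/JCO.2008.18.1370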

ChristopherEeles mentioned this issue Dec 3, 2021: ggi() and gene70() commands input files #18 (Closed)

ChristopherEeles closed this as completed Dec 14, 2021

ChristopherEeles self-assigned this Dec 14, 2021

ChristopherEeles added the question label Dec 14, 2021

ChristopherEeles mentioned this issue Jan 26, 2022: How to use an in-house-dataset to evaluate centroids for PAM50 method #24 (Closed)

ChristopherEeles mentioned this issue Apr 6, 2022: Can this package be applied to RNA-seq data? #26 (Closed)

ChristopherEeles mentioned this issue May 30, 2022: Centroids of PAM50 and Different result for TCGA-BRCA RNA-seq data #27 (Closed)

ChristopherEeles pinned this issue Jun 23, 2022
