problem changing the ploidy of female individuals on the sex chromosome #2122

Open
TanguyMuller opened this issue Mar 8, 2024 · 3 comments


TanguyMuller commented Mar 8, 2024

Hello everyone,

I am conducting research on the Pine Processionary Moth and I want to call variants on the sex chromosome.
In my data I have males, which are homogametic (ZZ), and females, which are heterogametic (ZW).
So I would like to call variants on chrZ, specifying that male individuals are diploid and female individuals are haploid. To do this I run the following command:

sbatch script/variant_call.sh list/all.list chrZ

where variant_call.sh is:

BAMLIST=$1
REGION=$2
PATH_TO_ASSEMBLY=/PATH/TO/ASSEMBLY/reference.fa

bcftools mpileup -d 500 -C 50 -Oz -f $PATH_TO_ASSEMBLY -b $BAMLIST -r $REGION -q 20 -Q 20 -a DP,AD | \
   bcftools call -mv -S sample.txt --ploidy-file ploidy.txt -Oz > all.list.${REGION}.bcftools.vcf.gz

echo "done"

and this is the error I get:

[mpileup] 67 samples in 67 input files
[mpileup] maximum number of reads per input file set to -d 500
Note: could not parse as PED: sample.txt

I am having difficulty understanding this, because I believe my file is in the expected format.

My sample.txt file:

01_SP_10    M
PP_Portugal_SP_CKER00371_seq2023_pooled    M
PP_Portugal_SP_CKER00380_seq2022    M
PP_Portugal_SP_CKER00399_seq2022    M
PP_Portugal_SP_CKER00463_seq2022    M
PP_Portugal_SP_CKER_F3_SP_seq2015    F
PP_Portugal_SP_CKER_F4_SP_seq2015    F
PP_Portugal_SP_CKER_M1_SP_seq2015    M
PP_Portugal_SP_CKER_M2_SP_seq2015    M
01_SP_9    M
PP_Portugal_SP_pool_seq2015    M
PP_Portugal_WP_CKER00482_seq2023_pooled    M
PP_Portugal_WP_CKER00486_seq2022    M
PP_Portugal_WP_CKER00491_seq2022    M
PP_Portugal_WP_CKER00503_seq2022    M
PP_Portugal_WP_CKER00505_seq2022    M
PP_Portugal_WP_CKER00508_seq2022    M
PP_Portugal_WP_CKER00538_seq2022    M
PP_Portugal_WP_CKER00540_seq2022    M
PP_Portugal_WP_pool_seq2015    M
03_FU_10    M
PP_Portugal_Fundao_CKER01105_seq2022    M
PP_Portugal_Fundao_CKER01111_seq2023    M
PP_Portugal_Fundao_CKER01113_seq2022    M
PP_Portugal_Fundao_CKER01174_seq2023    M
03_FU_5    M
03_FU_6    M
03_FU_7    M
03_FU_8    M
03_FU_9    M
PP_Portugal_Fundao_pool_seq2022    M
PP_Portugal_Viseu_1230_viseu_seq2022    F
PP_Portugal_Viseu_1235_viseu_seq2022    M
PP_Portugal_Viseu_CKERnana1_seq2023    F
PP_Portugal_Viseu_CKERnana2_seq2023    F
PP_Portugal_Viseu_pool_seq2022    M
PP_Portugal_Varges_CKER01301_seq2023    F
PP_Portugal_Varges_CKER01302_seq2023    F
PP_Portugal_Varges_CKER01308_seq2022    F
PP_Portugal_Varges_CKER01309_seq2022    F
PP_Portugal_Varges_pool_seq2022    M
PP_Portugal_Caparica_CKER01261_seq2022    M
PP_Portugal_Caparica_CKER01262_seq2023    M
PP_Portugal_Caparica_CKER01263_seq2023    M
PP_Portugal_Caparica_CKER01269_seq2022    M
PP_Portugal_Caparica_pool_seq2022    M
PP_Portugal_Tavira_CKER01120_seq2022    M
PP_Portugal_Tavira_CKER01199_seq2023    M
PP_Portugal_Tavira_CKER01200_seq2022    M
PP_Portugal_Tavira_CKER01201_seq2023    M
PP_Portugal_Tavira_pool_seq2022    M
PP_Portugal_Grandola_CKER01231_seq2022    M
PP_Portugal_Grandola_CKER01232_seq2022    M
PP_Portugal_Grandola_CKER01233_seq2023    M
PP_Portugal_Grandola_CKER01235_seq2023    M
PP_Portugal_Grandola_pool_seq2022    M
09_ES_10    M
PP_Espagne-bas_Cortijuela-1418m_CKER00939_seq2022    M
PP_Espagne-bas_Cortijuela-1418m_CKER00940_seq2022    M
PP_Espagne-bas_Cortijuela-1418m_CKER00952_seq2022    M
PP_Espagne-bas_Cortijuela-1418m_CKER00958_seq2023    M
PP_Espagne-bas_Cortijuela-1418m_CKER00959_seq2023    M
PP_Espagne-bas_Cortijuela-1418m_CKER00960_seq2023    M
PP_Espagne-bas_Cortijuela-1418m_CKER00961_seq2023    M
PP_Espagne-bas_Cortijuela-1418m_CKER00962_seq2022    M
09_ES_9    M
PP_Espagne-bas_Cortijuela-1418m_pool_seq2015    M

and my ploidy.txt file is:

chrZ	1	28970955	F	1
*	*	*	M	2
*	*	*	F	2 

Could someone please help me?

@pd3
Member

pd3 commented Mar 14, 2024

Note: could not parse as PED: sample.txt

This is not an error, only a warning (admittedly a rather useless and confusing one). Check whether the GT fields have the correct ploidy for the right samples and chromosomes; if they do, it worked.
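
One way to run this check is sketched below, assuming the output VCF can be streamed as text with `bcftools view`. The helper `gt_ploidy` is hypothetical (it is not part of bcftools); it tallies, for each sample, how many genotypes carry one allele and how many carry two. It relies on GT being the first FORMAT field, which the VCF specification requires whenever GT is present.

```shell
# gt_ploidy (hypothetical helper): read an uncompressed VCF on stdin and print
# "sample <TAB> ploidy <TAB> number of records", one line per combination.
gt_ploidy() {
  awk -F'\t' '
    # Remember the sample names from the column header line.
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    # Skip all other header lines.
    /^#/      { next }
    # Only look at records whose FORMAT column starts with GT.
    $9 == "GT" || $9 ~ /^GT:/ {
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")
        # Ploidy = number of alleles separated by "/" or "|".
        n = split(field[1], allele, "[/|]")
        count[name[i] "\t" n]++
      }
    }
    END { for (key in count) print key "\t" count[key] }
  ' | sort
}

# Example (file name taken from the command in the report):
#   bcftools view all.list.chrZ.bcftools.vcf.gz | gt_ploidy
```

With the files shown in the report, samples marked F in sample.txt would be expected to appear with ploidy 1 on chrZ, and samples marked M with ploidy 2.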

@TanguyMuller
Author

Thank you for the response.
But precisely because bcftools does not read the sample file, the GT fields are diploid for the female individuals, so the ploidy is not correct. I don't really understand why bcftools can't read this file.

@pd3
Member

pd3 commented Mar 26, 2024

The file sample.txt you showed above is not a PED file. For the format definition see, for example, https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format

Regardless, if the ploidy is not determined and output correctly based on your current sample.txt, that would indicate a bug. Any chance you could provide a small test case to reproduce the problem? A few lines from the mpileup command, including the full headers, would suffice.
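
A reduced test case of this kind could be produced along the following lines. This is only a sketch: `vcf_head` is a hypothetical helper, not a bcftools command, and the region `chrZ:1-100000`, the record count, and the output file name in the usage comment are arbitrary placeholders.

```shell
# vcf_head (hypothetical helper): copy every header line of a VCF read from
# stdin, then only the first N records (N is the first argument, default 20).
vcf_head() {
  awk -v n="${1:-20}" '
    /^#/ { print; next }              # keep the full header
    { if (++seen > n) exit; print }   # stop once n records were printed
  '
}

# Example, reusing the variables and options of variant_call.sh but writing
# uncompressed VCF (-Ov) for a small, arbitrary region:
#   bcftools mpileup -d 500 -C 50 -Ov -f "$PATH_TO_ASSEMBLY" -b "$BAMLIST" \
#     -r chrZ:1-100000 -q 20 -Q 20 -a DP,AD | vcf_head 20 > mpileup_test.vcf
```

The resulting file can then be passed to the same `bcftools call` command to see whether the ploidy problem reproduces on the small input.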
