Reharmonising a file with error #1226

Closed
ljwh2 opened this issue Jan 10, 2024 · 20 comments

@ljwh2
Contributor

ljwh2 commented Jan 10, 2024

GCST002047 was not harmonised successfully because our harmonisation pipeline does not recognise the column “Effect_Allele”. The pipeline reads the “effect_allele” column in the input file to harmonise each variant, but every value in this column is NA, which is why all variants give hm 14. If we change the header of this file, it should harmonise correctly. The same applies to other_allele.

Please fix the file and re-queue it for harmonisation.
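
A minimal sketch of the header fix described above, assuming a tab-delimited file. The `fix_header` helper and the `HEADER_FIXES` mapping are hypothetical names, and the `Other_Allele` spelling is an assumption, since only `Effect_Allele` is quoted in this issue.

```python
# Hypothetical mapping from the headers in the submitted file to the
# lower-case names the pipeline is described as reading.
# "Other_Allele" is an assumed spelling; only "Effect_Allele" is quoted above.
HEADER_FIXES = {
    "Effect_Allele": "effect_allele",
    "Other_Allele": "other_allele",
}


def fix_header(text: str, delimiter: str = "\t") -> str:
    """Rename known columns in the first line and leave the data rows untouched."""
    header, newline, body = text.partition("\n")
    columns = [HEADER_FIXES.get(name, name) for name in header.split(delimiter)]
    return delimiter.join(columns) + newline + body
```

Only the first line is rewritten, so the variant rows pass through byte for byte.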

@karatugo
Member

Unzipping the archive produced two files. I fixed the header of both, but the metadata YAML files are missing. Because the study is old, the data is not available from the ingest API.

@karatugo
Member

karatugo commented Mar 19, 2024

  • Create metadata yaml files that contain GCST ID and genome assembly for the harmonisation.
  • Submit the harmonisation
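
As a rough illustration of the first step, a minimal metadata file could be generated like this. The `minimal_metadata_yaml` helper is hypothetical, and `gwas_id` is an assumed key name for the GCST ID; `genome_assembly` is the key name that appears elsewhere in this issue. The values are written as plain `key: value` lines so no YAML library is needed.

```python
def minimal_metadata_yaml(gcst_id: str, genome_assembly: str) -> str:
    """Build a two-field metadata document as plain 'key: value' lines."""
    # "gwas_id" is an assumed key name, not confirmed by this issue.
    fields = {"gwas_id": gcst_id, "genome_assembly": genome_assembly}
    return "".join(f"{key}: {value}\n" for key, value in fields.items())
```

The returned string can be written next to the summary statistics file before the harmonisation job is submitted.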

@karatugo
Member

Submitted to codon with the submission script /hps/software/users/parkinso/spot/gwas/prod/scripts/cron/start_harmonisation_pre_standard_goci1226.sh

Job <92779129> is submitted to default queue <standard>.

@karatugo
Member

karatugo commented Mar 20, 2024

  • Compare two studies in FTP ("EduYears" and "College")
  • If they are the same, replace the zipped file with the correct one - Moved SSGAC_College_Rietveld2013_publicrelease.txt to GCST002001-GCST003000/GCST002047 and removed the zipped file
  • Harmonise with pre_gwas_ssf
  • Also, upload only one harmonised file with the correct title

@karatugo
Member

karatugo commented Mar 20, 2024

  • Use variant_id in the header (as per pre_gwas_ssf standard)
  • Rename each file to its GCST_ID.txt for harmonisation

@karatugo
Member

Using the script at /hps/software/users/parkinso/spot/gwas/prod/scripts/cron/start_harmonisation_pre_standard_goci1226.sh

Job <93028342> is submitted to default queue <standard>.

@karatugo
Member

Added chromosome and base_pair_location columns filled with NA and submitted again.

Job <93126943> is submitted to default queue <standard>.
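
For reference, padding a file with the missing columns can be sketched as below. This assumes a tab-delimited file and the column spelling `base_pair_location`; the `add_na_columns` helper is a hypothetical name, not part of the harmonisation tooling.

```python
import csv
import io


def add_na_columns(text, new_columns=("chromosome", "base_pair_location"), delimiter="\t"):
    """Append any missing columns to the header and fill them with NA in every row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")

    header = next(reader)
    # Only add columns that are not already present.
    missing = [name for name in new_columns if name not in header]
    writer.writerow(header + missing)

    for row in reader:
        if not row:  # skip blank lines
            continue
        writer.writerow(row + ["NA"] * len(missing))
    return out.getvalue()
```

Columns that already exist are left alone, so running the helper twice does not duplicate them.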

@karatugo
Member

karatugo commented Mar 27, 2024

@sprintell
Member

This is confirmed done; @earlEBI will double-check.

@earlEBI

earlEBI commented Apr 4, 2024

Reopening as the yaml files do not look quite right. (is_harmonised = false).
Also, the .tbi files should be renamed .tbi.gz.

@earlEBI earlEBI reopened this Apr 4, 2024
@karatugo
Member

Fixed the following fields:

genome_assembly: GRCh38
is_harmonised: true
is_sorted: true

@earlEBI Could you check again please? Thanks.
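
A quick way to re-check those three fields could look like the sketch below. It assumes the metadata file is a flat list of `key: value` lines, and `metadata_problems` is a hypothetical helper; the expected values are the ones listed in this comment.

```python
# Expected values taken from the three fields listed in this comment.
EXPECTED = {
    "genome_assembly": "GRCh38",
    "is_harmonised": "true",
    "is_sorted": "true",
}


def metadata_problems(yaml_text: str):
    """Return (key, expected, found) for each field that is missing or differs."""
    found = {}
    for line in yaml_text.splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            found[key.strip()] = value.strip()
    return [
        (key, expected, found.get(key))
        for key, expected in EXPECTED.items()
        if found.get(key) != expected
    ]
```

An empty list means all three fields match; a missing field is reported with `None` as the found value.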

@ljwh2
Contributor Author

ljwh2 commented May 22, 2024

@earlEBI please confirm

@earlEBI

earlEBI commented May 22, 2024

The yamls are only five lines long. Should they not contain more detail?


@karatugo
Member

I thought that's because it's a very old submission. And also they are not available in the ingest api. @sajo-ebi

https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST008396
https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST002047

@sprintell
Member

Old studies are meant to be retrieved from the public REST API: https://www.ebi.ac.uk/gwas/rest/api/studies/GCST008396

@karatugo
Member

TODO: Update the sumstats tools so that they fall back to the REST API if the Ingest API does not return any data.
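
One possible shape for that fallback, as a sketch only: the two URL templates come from the links in this thread, while `http_get_json` and `fetch_study` are hypothetical helpers and not the actual sumstats tools code. The response formats of the two APIs are not assumed here, so the caller still has to map each payload to the YAML fields.

```python
import json
import urllib.request

INGEST_URL = "https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/{gcst_id}"
REST_URL = "https://www.ebi.ac.uk/gwas/rest/api/studies/{gcst_id}"


def http_get_json(url, timeout=30):
    """Return the decoded JSON body, or None on a network/HTTP error or bad body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
        return json.loads(body) if body else None
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return None


def fetch_study(gcst_id, get_json=http_get_json):
    """Try the Ingest API first, then fall back to the public REST API."""
    for source, template in (("ingest", INGEST_URL), ("rest", REST_URL)):
        data = get_json(template.format(gcst_id=gcst_id))
        if data:
            return source, data
    return None, None
```

Returning the source name alongside the payload lets the caller pick the right field mapping, and passing `get_json` as a parameter keeps the fallback logic testable without network access.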

@sprintell
Member

Harmonization done, but yaml file has some missing data.

@karatugo
Member

karatugo commented Jun 3, 2024

Regenerated YAML files for GCST002047 and GCST008396. Expect them in the public ftp in 2 days.

@karatugo
Member

karatugo commented Jun 10, 2024

The YAML files are in the staging FTP but not in the public FTP. They did not sync because our ftp-sync code only selects files whose names match 'GCST*'. See https://github.com/EBISPOT/gwas-utils/blob/6fbf2c7a6d6fdfc79e0b8c2d1e74539bb1073303/ftpSummaryStatsScript/ftp_sync.py#L186-L188

Will rename the files; expect them in the public FTP in 2 days.
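
To illustrate why a file can be skipped, here is a small sketch of a 'GCST*' name filter. It is not the code in `ftp_sync.py`; `files_to_sync` is a hypothetical helper and the file names in the usage note are made up for the example.

```python
from fnmatch import fnmatchcase


def files_to_sync(filenames, pattern="GCST*"):
    """Keep only the names that match the glob pattern (case-sensitive)."""
    return [name for name in filenames if fnmatchcase(name, pattern)]
```

With names such as `GCST002047-meta.yaml` and `College-meta.yaml`, only the first would be kept, which is why renaming a file so that it starts with its GCST ID makes it eligible for the sync.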

@ljwh2
Contributor Author

ljwh2 commented Jun 12, 2024

Agreed to keep original files as per old guidelines

@ljwh2 ljwh2 closed this as completed Jun 12, 2024