baigal628/Tutorial_UploadDataToEGA

How to upload data to EGA?

What is EGA?

The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems (from the EGA website).

Prepare documents before getting a submission account

Data encryption

Before uploading data to EGA, you should encrypt your data locally.

Download data encryptor from EGA:

$ wget https://ega-archive.org/files/EgaCryptor.zip

$ unzip EgaCryptor.zip

Encrypt data:

# Here I am using an scRNAseq count matrix as an example.

$ tar -czvf ./data/Pool01_1_outs.tar.gz ./data/EGA/GEXdata/Pool01_1/outs/

$ java -jar EGA-Cryptor-2.0.0/ega-cryptor-2.0.0.jar -t {threads} \
    -i ./data/Pool01_1_outs.tar.gz \
    -o ./data/EGA/encryped/

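# After encryption, you will see three outputs: the encrypted file (.gpg),
# the MD5 checksum of the encrypted file (.gpg.md5), and the MD5 checksum of the original file (.md5).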

$ ls ./data/EGA/encryped/

Pool01_1_outs.tar.gz.gpg
Pool01_1_outs.tar.gz.gpg.md5
Pool01_1_outs.tar.gz.md5
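
Before uploading, it can be worth confirming that the two checksum files really match the files on disk. The following is a minimal sketch, not part of EgaCryptor: `check_md5` and `verify_ega_checksums` are hypothetical helper names, and the code assumes GNU `md5sum` is available and that each `.md5` file begins with the hex digest.

```shell
# Compare the digest stored in an .md5 file with the digest of the actual file.
# Assumption: the first whitespace-delimited token of the .md5 file is the hex digest.
check_md5() {
    local file="$1" md5_file="$2"
    local expected actual
    expected="$(awk 'NR == 1 { print tolower($1) }' "$md5_file")"
    actual="$(md5sum "$file" | awk '{ print tolower($1) }')"
    if [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Check both the original file and its encrypted .gpg counterpart.
# $1: path to the original (unencrypted) file
# $2: directory holding the .gpg, .gpg.md5 and .md5 outputs
verify_ega_checksums() {
    local plain="$1" enc_dir="$2"
    local base
    base="$(basename "$plain")"
    check_md5 "$plain" "$enc_dir/$base.md5" &&
        check_md5 "$enc_dir/$base.gpg" "$enc_dir/$base.gpg.md5"
}

# Example call with the paths used in this tutorial:
# verify_ega_checksums ./data/Pool01_1_outs.tar.gz ./data/EGA/encryped
```

The function returns a non-zero status on the first mismatch, so it can gate the upload step in a script.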

Upload data to EGA

I use the Aspera ascp command-line program to upload data. You can also upload files over FTP. More details are at: https://ega-archive.org/submission/tools/ftp-aspera

Download Aspera CLI

$ wget https://d3gcli72yxqn2z.cloudfront.net/downloads/connect/latest/bin/ibm-aspera-connect_4.2.6.393_linux_x86_64.tar.gz

$ tar xvzf ibm-aspera-connect_4.2.6.393_linux_x86_64.tar.gz

# On my computer, ascp is installed under the home directory:
~/.aspera/connect/bin/ascp

Upload encrypted data to EGA

# For a single file:
$ ASPERA_SCP_PASS={password} ~/.aspera/connect/bin/ascp  -P33001  -O33001 -QT -m3000M -L- ./data/EGA/encryped/Pool01_1_outs.tar.gz.gpg ega-box-{id}@fasp.ega.ebi.ac.uk:/.

# For all files in a directory:
$ ASPERA_SCP_PASS={password} ~/.aspera/connect/bin/ascp  -P33001  -O33001 -QT -m3000M -L- ./data/EGA/encryped/* ega-box-{id}@fasp.ega.ebi.ac.uk:/.

Register sample metadata

You can either enter the sample information one sample at a time in the submission portal or, for large data sets, upload a CSV file to the portal.

Prepare sample csv file

Here is an example of how this CSV file should look:

$ head sample_meta.csv

title,alias,description,subjectId,bioSampleId,caseOrControl,gender,organismPart,cellLine,region,phenotype
scRNAseq,Pool01_1,tumor,001,,case,female,brain,,cortex,relapsed
scRNAseq,Pool01_2,normal,002,,control,male,brain,,cortex,wt
.....
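
A malformed row (a missing or extra comma) is easy to introduce when the sample table is assembled by hand. As a rough sanity check before uploading, the sketch below compares the field count of every row with the header. `check_csv_columns` is a hypothetical helper name, and the check assumes no field contains a quoted comma; it does not validate the values themselves or anything the portal may require.

```shell
# Report rows whose number of comma-separated fields differs from the header.
# Assumption: fields never contain quoted commas. Blank lines are skipped.
check_csv_columns() {
    local csv="$1"
    awk -F',' '
        NR == 1      { n = NF; next }       # remember the header width
        /^[ \t]*$/   { next }               # ignore blank lines
        NF != n      { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
        END          { exit bad }           # non-zero exit if any row was flagged
    ' "$csv"
}

# Example call:
# check_csv_columns sample_meta.csv
```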

Link data with registered sample metadata

After sample registration, you need to link the data uploaded to EGA with the registered samples. You can do this manually by clicking on each sample and data item in the EGA submission portal, but that quickly becomes tedious for large data sets. Instead, you can prepare a CSV file that maps each .bam file to its sample. Uploading this file to the submission portal automatically links the bam files and their md5 checksums to the registered samples.

Here is an example of how this CSV file should look:

$ head sample_bam.csv

Sample alias,BAM File,Checksum,Unencrypted checksum
Pool01_1,Pool01_1.bam,,
Pool01_2,Pool01_2.bam,,
......
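
For many files, the linking table can be generated rather than typed. The sketch below is one way to do it under a strong assumption taken from the example above: each sample alias is the bam file name without its `.bam` extension. `make_link_csv` is a hypothetical helper name, and the two checksum columns are left empty, as in the example.

```shell
# Print a sample-to-bam linking table for every .bam file in a directory.
# Assumption: the sample alias equals the bam file name without ".bam".
make_link_csv() {
    local bam_dir="$1"
    local bam alias
    echo "Sample alias,BAM File,Checksum,Unencrypted checksum"
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue            # no bam files: the glob stays unexpanded
        alias="$(basename "$bam" .bam)"
        echo "$alias,$(basename "$bam"),,"
    done
}

# Example call (./data/EGA/bam is a hypothetical directory):
# make_link_csv ./data/EGA/bam > sample_bam.csv
```

If your aliases do not follow the file names, a lookup table would be needed instead of the `basename` shortcut.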

Final notes:

  • So far, I have not found a way to link analysis objects with sample metadata by uploading a CSV file. The only way seems to be clicking through the portal manually. Because of that, I recommend putting all the analysis data into one folder, compressing it into a tar archive, and uploading that archive.

About

This is a step-by-step illustration of how I upload data to EGA.
