Cristina Yenyxe Gonzalez Garcia edited this page Jan 10, 2017 · 2 revisions

European Variation Archive

What is the European Variation Archive (EVA)?

The EVA is an open-access database of all types of genetic variation data from all species. It provides access to highly detailed, granular, raw variant data from human, animal and plant species. All users can download data from any study or submit their own data to the archive. They can also query all variants in the EVA by study, gene, chromosomal location or reference SNP ID number (rs ID) using its Variant Browser.

You can visit the service at www.ebi.ac.uk/eva.

What is the EVA pipeline?

It is a bioinformatics pipeline that processes Variant Call Format (VCF) files: it normalizes the variants listed in them, calculates statistics and annotates the variants using the Variant Effect Predictor (VEP) developed by Ensembl. All this information is stored in a MongoDB database and can be queried by genomic region, gene name or rs ID, among other criteria.
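To give an idea of what normalizing a variant can involve, here is a minimal sketch of one common approach: trimming the bases shared by the reference and alternate alleles. The function name `normalize` is hypothetical and this is not taken from the EVA code, whose normalization rules may differ. It also skips left-alignment of indels, which would require the reference sequence.

```python
def normalize(pos, ref, alt):
    """Trim bases shared by REF and ALT and return (pos, ref, alt).

    Illustrative only: a hypothetical helper, not the EVA implementation.
    """
    # Trim the shared suffix first, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, moving the start position forward.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, `normalize(100, "GCA", "GTA")` reduces the record to a single-base substitution at position 101, so the same variant reported with different amounts of context ends up with the same representation.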

What problems does the EVA pipeline solve?

The EVA infrastructure is portable, so anyone can set up their own server to store and query variation data. You only need to have Java installed, a relational database to track job progress, a MongoDB database to store the variant information, and you are ready to go!

The pipeline is designed to reduce wasted computation time. It tracks the status of each job and, when a job fails, resumes execution from the point of failure. There is no need to reprocess millions of variants that were already stored successfully!
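The idea of resuming a failed job can be sketched as follows. This is only an illustration under several assumptions: the `step_status` table, the `run_job` function and the step-level granularity are hypothetical, and SQLite stands in for whatever relational database tracks job progress in a real deployment.

```python
import sqlite3


def run_job(conn, job_id, steps):
    """Run (name, action) steps in order, skipping steps already completed.

    Returns the names of the steps executed during this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_status ("
        "job_id TEXT, step TEXT, PRIMARY KEY (job_id, step))"
    )
    executed = []
    for name, action in steps:
        done = conn.execute(
            "SELECT 1 FROM step_status WHERE job_id = ? AND step = ?",
            (job_id, name),
        ).fetchone()
        if done:
            continue  # finished in a previous run, so it is not repeated
        action()  # if this raises, the step is never recorded as completed
        conn.execute("INSERT INTO step_status VALUES (?, ?)", (job_id, name))
        conn.commit()
        executed.append(name)
    return executed
```

If the second of two steps raises an exception, calling `run_job` again with the same `job_id` skips the first step and retries only the second one, because completion is recorded in the database after each step succeeds.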

What other components make up the EVA ecosystem?

In order to make querying friendlier, we have implemented a web services API that supports the queries listed in the previous section. Please check the API wiki for more information.

The web services API returns information in a format convenient for websites to consume. For analysis, however, the preferred format is the Variant Call Format (VCF), so the EVA also provides a tool that generates VCF files from the information in the database. This makes it possible to combine variants and samples from multiple studies that otherwise could not be analyzed together. Please check the tools repository to get access to the code.
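As a rough illustration of what combining studies into a single VCF means, the sketch below merges variant records from several studies and renders them as sites-only VCF text. It is not the EVA tool: the function `to_vcf` and the record keys (`chrom`, `pos`, `ref`, `alt`, `id`) are assumptions made for this example, and sample genotype columns are left out for brevity.

```python
def to_vcf(*studies):
    """Merge variant records from several studies into sites-only VCF text."""
    unique = {}
    for records in studies:
        for r in records:
            key = (r["chrom"], r["pos"], r["ref"], r["alt"])
            unique.setdefault(key, r)  # keep the first record seen per site
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Chromosome names are sorted as strings, so "10" sorts before "2".
    for key in sorted(unique):
        chrom, pos, ref, alt = key
        variant_id = unique[key].get("id", ".")  # "." means missing in VCF
        lines.append(
            "\t".join([chrom, str(pos), variant_id, ref, alt, ".", ".", "."])
        )
    return "\n".join(lines) + "\n"
```

Calling `to_vcf(study_a, study_b)` with two lists of records yields one header and one line per distinct variant, so a site reported by both studies appears only once.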