# Data practical using the Sequence Read Archive

The goal of this practical is to download the data associated with this paper:


Greiff V, Menzel U, Haessler U, Cook SC, Friedensohn S, Khan TA, Pogson M, Hellmann I, Reddy ST. (2014) Quantitative assessment of the robustness of next-generation sequencing of antibody variable gene repertoires from immunized mice. *BMC Immunol.* **15**:40.

You will do this in two ways:
  - Using a web browser
  - Using the EDirect utilities

Use this notebook to conduct your searches using EDirect, and feel free to add any other notes that help you.

## Finding accessions using a web browser

- Go to [PubMed](https://www.ncbi.nlm.nih.gov/pubmed)
- Search for the publication
- In the 'Related Information' field on the right-hand side, click SRA.

## Finding accessions using EDirect

You will need to build a pipeline of commands joined by pipes (`|`) to obtain a list of accessions that can be passed to `fastq-dump`.

First, search PubMed for the paper using `esearch`, with the database set to `pubmed` and an appropriate query. To see the usage information for `esearch`, run the cell below.

In [None]:
!esearch --help

In [None]:
%%bash

Now pipe the output of the command above into `elink -target sra` to get the linked SRA records.

In [None]:
!elink --help

In [None]:
%%bash

Now fetch the run info using `efetch -format runinfo`.

In [None]:
!efetch --help

In [None]:
%%bash

Now save this as a CSV file called `greiff_runinfo.csv` by redirecting the output: append `>` followed by the file name to the end of the command.
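
If you get stuck, the steps above can be assembled as in the sketch below. It only builds and prints the pipeline as a string; it does not run anything. The helper name `build_runinfo_pipeline` and the example query are assumptions made for illustration, not part of EDirect, and the query may need refining so that it matches exactly one PubMed record.

```python
import shlex


def build_runinfo_pipeline(query, out_csv="greiff_runinfo.csv"):
    """Return a shell pipeline string: PubMed search -> SRA links -> run info CSV."""
    stages = [
        ["esearch", "-db", "pubmed", "-query", query],  # find the PubMed record
        ["elink", "-target", "sra"],                     # follow its links into SRA
        ["efetch", "-format", "runinfo"],                # fetch run info as CSV
    ]
    # shlex.join quotes each argument so the query survives the shell intact.
    pipeline = " | ".join(shlex.join(stage) for stage in stages)
    return f"{pipeline} > {shlex.quote(out_csv)}"


# Assumed query built from standard PubMed field tags; check what it returns
# before relying on it.
example_query = "Greiff V[Author] AND BMC Immunol[Journal] AND 2014[pdat]"
command = build_runinfo_pipeline(example_query)
print(command)
```

The printed command can be pasted into a `%%bash` cell. It can also be run from Python with `subprocess.run(command, shell=True, check=True)`, provided EDirect is installed and the machine has network access.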

In [None]:
%%bash

The following Python code loads the file you just saved and displays it as a table.

In [None]:
import pandas as pd
runinfo = pd.read_csv("greiff_runinfo.csv")
runinfo

The next cell saves the run accessions (the `Run` column) to a text file, one accession per line.

In [None]:
# Write only the Run column, without a header row or index column
runinfo["Run"].to_csv("greiff_accessions.txt", header=False, index=False)

In [None]:
%%bash
cat greiff_accessions.txt

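You can now pass these accessions to `fastq-dump`. It takes accessions as command-line arguments, so to feed it the output of `cat`, pipe through `xargs` (for example, `cat greiff_accessions.txt | xargs fastq-dump`).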
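
As a rough sketch of this last step, the Python below reads the accession file and builds one `fastq-dump` command per run, so you can inspect the commands before running them. The helper names `read_accessions` and `fastq_dump_commands` are hypothetical. The `--gzip` option, which compresses the output, is an optional choice rather than something the practical requires.

```python
import shlex
from pathlib import Path


def read_accessions(path):
    """Read one run accession per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def fastq_dump_commands(accessions, extra_args=()):
    """Build one fastq-dump command line per accession."""
    return [shlex.join(["fastq-dump", *extra_args, acc]) for acc in accessions]


# Only print commands if the accession file from the previous step exists.
accession_file = Path("greiff_accessions.txt")
if accession_file.exists():
    for cmd in fastq_dump_commands(read_accessions(accession_file), ["--gzip"]):
        print(cmd)
```

Building one command per accession makes it easy to download a single run first as a test, since the full set of runs may be large.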