cameron-jack edited this page Jul 10, 2017 · 1 revision

This is a basic tutorial following the standard workflow for viewing functional relationships between counts-based data and expression data. If you find the command line difficult to use, you can launch the GUI version of ChipPy by running the ChipPy.py Python script in the main folder. The GUI requires PyQt4 to be installed.

Download the chippy_db appropriate for the species you are working on. If your species does not appear here, please contact the maintainer (Cameron Jack).

Human (Ensembl release 73)
Human (Ensembl release 76)
Mouse (Ensembl release 73)
Mouse (Ensembl release 76)
Mouse (Ensembl release 67)
Zebrafish (Ensembl release 73)
Zebrafish (Ensembl release 76)
D. melanogaster (Ensembl release 73)
D. melanogaster (Ensembl release 76)
S. cerevisiae (Ensembl release 73)
S. cerevisiae (Ensembl release 76)

Note: All release 76 DBs contain "dummy" expression sets so you can examine your counts data without expression information present. Please make sure the downloaded files have the extension '.db', as some download environments drop the file extension. The first part of the file name ('empty') is optional; please change it to the name of your project for easy identification. The rest of the name encodes useful information, i.e. species and Ensembl release number, so please don't change it.
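If you want to check the file name in a script, something like the following sketch may help. It restores a dropped '.db' extension and swaps a leading 'empty' for a project name. The helper name project_db_name and the example file name 'empty_chippy_76_mouse' are assumptions made for illustration; they are not part of ChipPy.

```python
import os


def project_db_name(filename, project=None):
    """Return a name ending in '.db', optionally replacing a leading 'empty'.

    Hypothetical helper, not part of ChipPy.
    """
    base = os.path.basename(filename)
    # Restore the extension if the download environment dropped it.
    if not base.endswith(".db"):
        base += ".db"
    # Only the first part is replaced; species and release stay untouched.
    if project and base.startswith("empty"):
        base = project + base[len("empty"):]
    return base


# The input file name here is an assumed example.
print(project_db_name("empty_chippy_76_mouse", project="H2A"))
```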

ChipPy uploads expression data from a very simple tab-delimited format with the header and type: "genes probesets exp". If your expression data is in GFF3 format, the expression_from_gff3.py script will convert it for you:

python expression_from_gff3.py <data.gff> [data_out.exp]
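If your expression data is in neither format, you could write the tab-delimited file yourself. The following is a minimal sketch that assumes the three columns hold a gene identifier, a probeset identifier and an expression value, as the header names suggest. The identifiers and values are made up, and write_expression_table is a hypothetical helper rather than a ChipPy function.

```python
import csv
import io


def write_expression_table(rows, handle):
    """Write (gene, probeset, expression) rows as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row as described in this tutorial.
    writer.writerow(["genes", "probesets", "exp"])
    for gene, probeset, value in rows:
        writer.writerow([gene, probeset, value])


# Placeholder data for illustration only.
rows = [("gene_A", "probe_1", 12.5), ("gene_B", "probe_2", 3.0)]
buffer = io.StringIO()
write_expression_table(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, pass an open file handle (for example one opened on 'Sample1_H2A.exp') in place of the StringIO object.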

We'll assume for this example that you're working with the mouse DB. Now you need to upload this expression data to the chippy_db. For this, use add_expression_db.py, for example:

python add_expression_db.py chippy_73_mouse.db -e expression/H2A/Sample1_H2A.exp --sample_type 'exp_absolute' --name 'Sample1' --description 'H2A'

From now on you'll refer to this sample as 'Sample1' wherever you would put the sample name into ChipPy. You can see that the ChipPy mouse DB is listed directly after add_expression_db.py. It is the only positional argument used for any of the ChipPy scripts.
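If you have several samples to load, you may prefer to build the same command in a loop. The sketch below only assembles the argument list from the options used in this tutorial and leaves the actual call to a separate function. The second sample and its path are hypothetical, and both helper names are illustrative.

```python
import subprocess


def add_expression_command(db_path, exp_path, name, description,
                           sample_type="exp_absolute"):
    """Build the add_expression_db.py call; the DB is the only positional argument."""
    return ["python", "add_expression_db.py", db_path,
            "-e", exp_path,
            "--sample_type", sample_type,
            "--name", name,
            "--description", description]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)


# 'Sample2' and its path are made-up examples.
samples = [("Sample1", "expression/H2A/Sample1_H2A.exp"),
           ("Sample2", "expression/H2A/Sample2_H2A.exp")]
commands = [add_expression_command("chippy_73_mouse.db", path, name, "H2A")
            for name, path in samples]
for cmd in commands:
    print(" ".join(cmd))
# run_all(commands)  # uncomment when running from the ChipPy scripts folder
```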

Now it's time to export the counts data from your ChIP-Seq experiment for the genes you've just loaded into the DB.

python export_counts.py chippy_73_mouse.db --collection 'sample1_bam.data' --sample 'Sample1' --BAMorBED chipseq/sample1_sorted_indexed.bam --expression_area 'TSS' --chr_prefix 'chr_'

If you used BWA to align the reads then the chromosomes will have 'chr_' as the prefix. If you used Bowtie (or Bowtie2) then it will be 'chr'. We are also selecting the Transcription Start Site (TSS) as the region to view. This will take about 20 minutes for a large ChIP-Seq data set in mouse. The '-c' (--collection) flag sets where the data collection created by export_counts.py is written.
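If you are not sure which prefix your alignments use, you can look at the chromosome names directly. The sketch below does this for BED-style lines using only the standard library, on the assumption that the chromosome name is the first whitespace-separated field. Both function names are hypothetical. For a BAM file you would need to read the header with another tool first.

```python
import re


def bed_chromosomes(lines):
    """Collect the first field of each data line, skipping comments and track lines."""
    names = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.append(line.split()[0])
    return names


def guess_chr_prefix(chrom_names):
    """Return 'chr_', 'chr' or '' if every name agrees; raise otherwise."""
    prefixes = set()
    for name in chrom_names:
        # Match a leading 'chr' with an optional underscore, e.g. 'chr_12' or 'chrX'.
        match = re.match(r"chr_?", name)
        prefixes.add(match.group(0) if match else "")
    if len(prefixes) != 1:
        raise ValueError("no single chromosome prefix found: %r" % sorted(prefixes))
    return prefixes.pop()


# Made-up BED lines for illustration.
example = ["track name=example\n", "chr_1\t100\t200\n", "chr_X\t5\t50\n"]
print(guess_chr_prefix(bed_chromosomes(example)))
```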

Now you can plot the results:

python plot_counts.py chippy_73_mouse.db --collections 'sample1_bam.data' -g 100 --counts_metric 'mean' --normalise_by_RPM --plot_filename 'sample1_H2A'

Please note that this option is --collections, which is distinct from --collection. --collections may take any number of data files and handles '*' glob operators.
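To preview which files a '*' pattern would pick up, you can try the pattern in Python first. This sketch uses the standard fnmatch module on a made-up list of file names. It illustrates glob matching in general and may differ in detail from how ChipPy expands --collections. The function name expand_collections is hypothetical.

```python
import fnmatch


def expand_collections(patterns, available):
    """Return the available names matching any pattern, without duplicates."""
    matched = []
    for pattern in patterns:
        for name in fnmatch.filter(available, pattern):
            if name not in matched:
                matched.append(name)
    return matched


# Made-up directory listing for illustration.
files = ["sample1_bam.data", "sample2_bam.data", "notes.txt"]
print(expand_collections(["sample*_bam.data"], files))
```

For files on disk, glob.glob('sample*_bam.data') performs the same kind of matching against the current directory.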

The command above produces a plot of RPM-normalised tag counts around the transcription start site of all genes, grouped by 100 genes per line. Without the --plot_filename option the plot is output to screen only. To save the plot to file, pass a file name with an extension, for example: --plot_filename 'my_name.pdf'. The possible extensions and formats are 'pdf', 'png' and 'jpg'.
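To make the plotted quantities concrete, here is a small sketch of the two ideas involved: averaging counts position by position within fixed-size groups of genes, and scaling to reads per million (RPM). It illustrates the concepts only and is not ChipPy's implementation. It assumes the gene profiles are already ordered (for example by expression rank), and the numbers and function names are made up.

```python
def rpm_normalise(counts, total_mapped_reads):
    """Scale raw counts to reads per million mapped reads."""
    scale = 1e6 / total_mapped_reads
    return [c * scale for c in counts]


def group_means(gene_profiles, group_size):
    """Average per-position counts within consecutive groups of genes.

    Each returned list corresponds to one plotted line.
    """
    lines = []
    for start in range(0, len(gene_profiles), group_size):
        group = gene_profiles[start:start + group_size]
        # zip(*group) walks the positions; average across genes at each one.
        lines.append([sum(column) / float(len(group)) for column in zip(*group)])
    return lines


# Four genes with two positions each, grouped two genes per line
# (the tutorial command uses 100 genes per line via -g 100).
profiles = [[2, 4], [4, 8], [6, 6], [0, 2]]
grouped = group_means(profiles, group_size=2)
normalised = [rpm_normalise(line, total_mapped_reads=2000000) for line in grouped]
print(grouped)
print(normalised)
```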

From the command line all scripts will divulge their option details with:

python script_name -h
