The MaizeGDB Python 3 version of James Schnable's RNA-seq processing pipeline and modular web interface.
For examples of MaizeGDB's qTeller web interfaces, visit https://qteller.maizegdb.org/
| qTeller data types | Description |
|---|---|
| Genes in an Interval | Select a chromosome coordinate interval for a given genome to retrieve RNA/protein abundances. |
| Genes by Name | Paste a list of gene models of interest to retrieve their RNA/protein abundances. |
| Visualize Expression | Visualize RNA/protein abundances for a single gene, or compare abundances for two genes. |
Data types are optimized for single-genome RNA-seq data, single-genome RNA-seq and protein abundance data, or multi-genome RNA-seq data.
| Directory Name | Description |
|---|---|
| build_db | Scripts for constructing the SQLite DB. |
| web_interface | Public-facing files served by the Apache server. |
| qteller_python2.7 | MaizeGDB Python 2.7 instance. |
- No additional libraries required.
- See python3_requirements.txt and python modules for a list of dependencies.
- No unusual customization is needed.
See an example of installing Apache on CentOS 8.
CentOS 8 comes with Python 3, which includes pip.
To install additional libraries:
$ pip install -r python3_requirements.txt
Upon successful installation of Python, PHP, and Apache, you can git clone this project into your Apache directory. The public-facing directories are located in the web_interface directory. Assuming a default Apache installation, the DocumentRoot in the httpd.conf would look like this:
DocumentRoot "/var/www/html/qTeller/web_interface"
See Adding new data for the final steps in generating the DB.
(1) The chromosome drop-down menus in index_singlegenome.php, index_multigenome.php, and Protein_index.php must be edited manually to reflect the chromosome IDs of the target genome(s). For instance, maize has ten chromosomes named chr1, chr2, etc.; sorghum also has ten chromosomes, but they are named Chr01, Chr02, etc.
(2) For index_multigenome.php only, the Genome Version drop-down menu must also be edited to reflect the genomes in the multi-genome bed file. Note that the value attribute of each <option> in the Genome Version drop-down menu of the PHP file must match a genome ID listed in column 5 of the bed file. To see more in-depth examples of file formatting, click here.
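The two menu edits above can be drafted with a short script instead of typing every entry by hand. The following is a minimal sketch, not part of qTeller itself: the function names `genome_ids_from_bed` and `option_tags` are hypothetical, the bed file is assumed to be tab-delimited, and the option label is assumed to be the same as its value (the actual PHP files may use different labels). Only the genome ID in column 5 comes from the description above; the other sample columns are made up for illustration.

```python
import csv
import html
import io


def genome_ids_from_bed(bed_lines, column=5):
    """Return the distinct values of the given 1-based column, in first-seen order."""
    seen = []
    # Assumption: the bed file is tab-delimited.
    for row in csv.reader(bed_lines, delimiter="\t"):
        # Skip blank rows, rows that are too short, and comment lines.
        if len(row) < column or row[0].startswith("#"):
            continue
        value = row[column - 1].strip()
        if value and value not in seen:
            seen.append(value)
    return seen


def option_tags(values):
    """Render one <option> element per value, escaping HTML special characters."""
    # Assumption: the visible label is the same as the value attribute.
    return [
        '<option value="{0}">{0}</option>'.format(html.escape(v, quote=True))
        for v in values
    ]


if __name__ == "__main__":
    # Hypothetical bed rows; only column 5 (genome ID) matters here.
    sample = io.StringIO(
        "chr1\t100\t900\tgeneA\tB73\n"
        "chr1\t150\t950\tgeneB\tMo17\n"
        "chr2\t200\t800\tgeneC\tB73\n"
    )
    for tag in option_tags(genome_ids_from_bed(sample)):
        print(tag)
```

The same `option_tags` helper can draft the chromosome menu, for example `option_tags(["chr%d" % i for i in range(1, 11)])` for maize-style IDs or `option_tags(["Chr%02d" % i for i in range(1, 11)])` for sorghum-style IDs. The printed lines still have to be pasted into the PHP files by hand.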
The qTeller database generation script requires the following 3 files:
- RNA-seq and/or protein abundance files
- If it doesn't exist already, create the build_db/abundance directory (the abundance directory can be whatever name you want):
$ mkdir build_db/abundance
- Drop your abundance files in the build_db/abundance directory. Cufflinks output files must end with the .fpkm_tracking extension. Files that contain only the gene model ID (column 1) and the RNA-seq or protein abundance (column 2) must end with the .txt extension.
- GFF or bed file
- CSV file
- Create a metadata file in CSV format so the script knows how to interpret the abundance files. Here is an example.
- NOTE: The File_handle column specifies the name of the abundance file to load (minus the .fpkm_tracking or .txt file extension)
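Because each File_handle value must match an abundance file name without its extension, it can help to check the metadata against the abundance directory before building the DB. The following is a minimal sketch under stated assumptions: the helper name `missing_abundance_files` is hypothetical, and the metadata CSV is assumed to have a header row that contains a `File_handle` column. It is not part of the qTeller scripts.

```python
import csv
import os

# Extensions accepted for abundance files, as described above.
ABUNDANCE_EXTENSIONS = (".fpkm_tracking", ".txt")


def missing_abundance_files(metadata_csv, abundance_dir):
    """Return the File_handle values that have no matching abundance file."""
    missing = []
    with open(metadata_csv, newline="") as handle:
        # Assumption: the first row of the CSV is a header with a File_handle column.
        for row in csv.DictReader(handle):
            name = (row.get("File_handle") or "").strip()
            if not name:
                continue
            candidates = [
                os.path.join(abundance_dir, name + ext)
                for ext in ABUNDANCE_EXTENSIONS
            ]
            if not any(os.path.isfile(path) for path in candidates):
                missing.append(name)
    return missing
```

For example, calling `missing_abundance_files("test_singlegenome_metadata.csv", "test_singlegenome_fpkm")` from the build_db directory would return an empty list if every listed file is present.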
Assuming you have the required files, you can create the SQLite DB for RNA and/or protein abundance data using the following command:
$ cd build_db
$ python multigenome_build_qt_db.py <METADATA.CSV> --bed_file <BED.bed> --info_dir ./<ABUNDANCE> --dbname userdb # creates userdb
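where <METADATA.CSV> is the CSV file (3), <BED.bed> is the bed file (2), and <ABUNDANCE> is the directory where the abundance files are kept (1), as described above. This will create a userdb SQLite file. For single-genome data, the GFF file is passed with --gff_file instead, as in the examples below.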
To build the SQLite DB for single-genome data (with no protein abundances) from the included test data, download and uncompress this gff3 file, move to build_db, then run this command:
$ cd build_db
$ python build_qt_db_gene_protein.py test_singlegenome_metadata.csv --gff_file Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 --info_dir ./test_singlegenome_fpkm --dbname singledb # creates singledb
To build the SQLite DB for single-genome data with both RNA and protein abundances from the included test data, download and uncompress this gff3 file, move to build_db, then run this command:
$ cd build_db
$ python build_qt_db_gene_protein.py test_singlegenome_metadata.csv --gff_file Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 --info_dir ./test_protein_abundance --dbname proteindb # creates proteindb
To create the SQLite DB for multi-genome data from the included test data, run this command:
$ cd build_db
$ python multigenome_build_qt_db.py test_multigenome_metadata.csv --bed_file test_multigenome_NAM_merged_IDs.bed --info_dir ./test_multigenome_fpkm --dbname multidb # creates multidb
To see more in-depth examples of file formatting, click here.
- Fun fact: you can use the SQLite Viewer to easily look inside the DB and experiment with queries.
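The generated DB can also be inspected from Python with the standard-library `sqlite3` module. The following is a minimal sketch: the helper names `list_tables` and `row_count` are hypothetical, and the code makes no assumption about the qTeller schema, since it only reads table names from SQLite's built-in `sqlite_master` catalog.

```python
import sqlite3


def list_tables(db_path):
    """Return the table names in a SQLite file, sorted alphabetically."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def row_count(db_path, table):
    """Return the number of rows in one table."""
    # Table names cannot be bound as parameters, so quote the identifier instead.
    quoted = '"{}"'.format(table.replace('"', '""'))
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()
    finally:
        conn.close()
    return count
```

For example, `list_tables("singledb")` run from the build_db directory would show which tables the build script created, and `row_count` gives a quick sanity check that they are not empty.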
The last step is to move the generated singledb, proteindb, or multidb file into the web_interface directory, for example:
$ mv singledb ../web_interface/
You should now be able to access qTeller through your browser.
