Quick start

HotMAPS

HotMAPS detects spatially clustered missense mutations in protein structures. HotMAPS is often applied to the entire Protein Data Bank for an exome-scale analysis of mutational clustering. This quick start instead demonstrates HotMAPS on a single protein structure, so that you can get HotMAPS running without first preparing the bulkier requirements of an exome-scale analysis. We will examine a hotspot in RAC1 found in Head and Neck Squamous Cell Carcinoma.

Download PDB

The first step is to download the PDB files for structure 1e96 from RCSB. On the RCSB page for the structure, click the "download files" button to show a drop-down menu, then download both the "PDB Format (gz)" and the "Biological Assembly" files. The first file contains the asymmetric unit and the second file is a biological assembly. Place both files into the same directory.

Tell HotMAPS where to find the PDB

The next step is to edit the configuration file so that it points to the directory where you downloaded the PDB files. The configuration file is named "config.txt" and is located in the top-level directory of the HotMAPS source code. The original configuration file should look like the following:

[PDB]
modbase_dir=/path/to/ModBase
pdb_dir=/path/to/my_pdb_data_dir
refseq_homology: %(modbase_dir)s/ModBase_H_sapiens_2013_refseq/models/model/
ensembl_homology: %(modbase_dir)s/H_sapiens_2013/models/
biological_assembly: %(pdb_dir)s/biounit/coordinates/all/
non_biological_assembly: %(pdb_dir)s/structures/all/pdb

You will need to edit the pdb_dir, biological_assembly, and non_biological_assembly fields. The pdb_dir field is the path to the base directory where you store your PDB files. If both downloaded files are in the same directory, set both the biological_assembly and non_biological_assembly fields to %(pdb_dir)s/. For example, if you downloaded your PDB files into the directory /home/your_user/pdb_downloads, then your configuration file should look like the following (replace "/home/your_user/pdb_downloads" with the actual path you downloaded the files to):

[PDB]
modbase_dir=/path/to/ModBase
pdb_dir=/home/your_user/pdb_downloads
refseq_homology: %(modbase_dir)s/ModBase_H_sapiens_2013_refseq/models/model/
ensembl_homology: %(modbase_dir)s/H_sapiens_2013/models/
#biological_assembly: %(pdb_dir)s/biounit/coordinates/all/
#non_biological_assembly: %(pdb_dir)s/structures/all/pdb
biological_assembly: %(pdb_dir)s/
non_biological_assembly: %(pdb_dir)s/

For the purpose of this quick start, you do not need to know about the other fields: modbase_dir, refseq_homology, and ensembl_homology.
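
The %(pdb_dir)s placeholders in the configuration above follow the interpolation syntax of Python's standard configparser module. The sketch below assumes HotMAPS reads config.txt in a compatible way, which this quick start does not confirm. It only illustrates how the edited fields would resolve to concrete paths, and the helper name resolve_pdb_paths is hypothetical.

```python
import configparser

# Same contents as the edited config.txt shown above.
CONFIG_TEXT = """\
[PDB]
modbase_dir=/path/to/ModBase
pdb_dir=/home/your_user/pdb_downloads
refseq_homology: %(modbase_dir)s/ModBase_H_sapiens_2013_refseq/models/model/
ensembl_homology: %(modbase_dir)s/H_sapiens_2013/models/
#biological_assembly: %(pdb_dir)s/biounit/coordinates/all/
#non_biological_assembly: %(pdb_dir)s/structures/all/pdb
biological_assembly: %(pdb_dir)s/
non_biological_assembly: %(pdb_dir)s/
"""


def resolve_pdb_paths(config_text):
    """Return every [PDB] field with its %(name)s placeholders expanded.

    Hypothetical helper for illustration; not part of HotMAPS.
    """
    parser = configparser.ConfigParser()  # basic interpolation is the default
    parser.read_string(config_text)
    return {key: parser.get("PDB", key) for key in parser.options("PDB")}


paths = resolve_pdb_paths(CONFIG_TEXT)
for key in ("biological_assembly", "non_biological_assembly"):
    print(key, "->", paths[key])
```

Both assembly fields should resolve to the download directory with a trailing slash. Lines starting with # are treated as comments, so the original default paths are ignored.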

Fetching and preparing the quick start data

The next step is to fetch the data for the quick start example. The mutations for this example have already been mapped from the genome to the protein structure 1e96 for you. Run the following commands from a command prompt in the top-level HotMAPS directory.

$ wget http://karchinlab.org/data/HotMAPS/1e96_example.tar.gz
$ tar xvzf 1e96_example.tar.gz
$ mkdir -p data  # create the data directory if it does not already exist
$ cd data
$ wget http://karchinlab.org/data/HotMAPS/pdb_info.txt.gz 
$ gunzip pdb_info.txt.gz 
$ cp ../1e96_example/mutations_1e96.txt mutations.txt  # by default mapped mutations are found in the file data/mutations.txt
$ cd ..
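
After the commands above, the top-level HotMAPS directory should contain config.txt, data/mutations.txt, and data/pdb_info.txt. As an optional sanity check, the following minimal sketch lists any of these files that are missing. The helper name missing_quick_start_files is hypothetical and not part of HotMAPS, and the file list is taken only from the paths mentioned in this quick start.

```python
from pathlib import Path

# Files this quick start expects, relative to the top-level HotMAPS directory.
EXPECTED_FILES = ("config.txt", "data/mutations.txt", "data/pdb_info.txt")


def missing_quick_start_files(hotmaps_dir="."):
    """Return the expected files that do not exist under hotmaps_dir."""
    root = Path(hotmaps_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_quick_start_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All quick start files are in place.")
```

Run it from the top-level HotMAPS directory, or pass that directory as the argument.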

By default, HotMAPS assumes that the mutations already mapped to protein structures are located in the file data/mutations.txt. In a later tutorial, you will go through the process of mapping the mutations to protein structures yourself. The next step is to annotate your PDB structure. You should be in the top-level directory of HotMAPS.

$ make annotateStructures

HotMAPS will report that many PDB structures are missing. You can ignore this; it is expected because only a single structure has been provided. The data should now be ready for the HotMAPS statistical algorithm, which finds hotspot regions in protein structures.

HotMAPS hotspots

For this example, you can use the following commands to run the steps of HotMAPS.

$ make runNormalHotspot OUTPUT_DIR=output/1e96_output
$ make multipleTestCorrect OUTPUT_DIR=output/1e96_output MUPIT_ANNOTATION_DIR=1e96_example/mupit_annotations
$ make findHotregionStruct OUTPUT_DIR=output/1e96_output MUPIT_ANNOTATION_DIR=1e96_example/mupit_annotations

HotMAPS results are saved to the directory specified by the OUTPUT_DIR parameter. The MUPIT_ANNOTATION_DIR contains information about how the mutations were mapped to the protein structure; for the purpose of this quick start, you do not need to concern yourself with it. You should see the significant hotspots (at a q-value threshold of 0.01 by default) in the output/1e96_output/hotspot_regions_structure_.01.txt file. It should contain one hotspot:

1e96   	HNSC   	0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15

The first column of the output is the protein structure (1e96 in this case), the second column is the particular category/cancer type (HNSC for Head and Neck Squamous Cell Carcinoma), and columns 3 and beyond are significant hotspot regions. In this case there is only a single hotspot region, so only column 3 has a value. If the protein structure had multiple spatially separated hotspot regions, then columns 4 and beyond might also contain values. Within each residue entry, the letter indicates the protein chain within the PDB file and the number after it indicates the amino acid residue position. A single hotspot region can contain many amino acid residues, separated by semicolons.
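
The column layout described above can be read programmatically. The following is a minimal sketch that parses one line of the hotspot file, assuming tab-separated columns and colon-separated residue entries as in the example output. The function name parse_hotspot_line is hypothetical. The leading field of each entry (0 in the example) is not explained in this quick start, so the sketch keeps it as an opaque prefix. It also assumes residue positions are plain integers, as they are in the example.

```python
def parse_hotspot_line(line):
    """Split one hotspot output line into structure, tumor type, and regions.

    Each region is a list of (prefix, chain, position) tuples. The prefix is
    the unexplained leading field of each entry and is kept as a string.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    structure, tumor_type = fields[0], fields[1]
    regions = []
    for column in fields[2:]:
        if not column:
            continue  # skip empty trailing columns
        residues = []
        for entry in column.split(";"):
            prefix, chain, position = entry.split(":")
            residues.append((prefix, chain, int(position)))
        regions.append(residues)
    return structure, tumor_type, regions


example = "1e96\tHNSC\t0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15"
structure, tumor_type, regions = parse_hotspot_line(example)
print(structure, tumor_type, "regions:", len(regions))
for prefix, chain, position in sorted(regions[0], key=lambda r: (r[1], r[2])):
    print(f"chain {chain} residue {position}")
```

For the example line, this yields a single region with six residues, five on chain A and one on chain B.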

Below you can see the hotspot region (highlighted in red) for RAC1 (teal). Mutated residues are shown as purple spheres. The hotspot region occurs at the GTP binding residues (GTP is shown as sticks and magnesium as a blue sphere). This visualization was generated with MuPIT.

RAC1 hotspot
