Cancer Mutation Finder CLI

This Python 3 CLI app retrieves data from the cancer bio portal and does a rudimentary analysis of mutation frequency of the supplied gene symbols. Gene symbols are limited to the 1000 common genes listed in app/data/gene_id_symbol_map.csv. Currently the study ID is firm-coded to "gbm_tcga_gistic".

Getting Started

Dependencies:

Internet access
Python 3
pip

Installation

cd DFCI
virtualenv venv  # Recommended
source venv/bin/activate # Recommended
pip install -r requirements.txt

Note the shebang line in app/main.py may not work for all systems. In that case run main.py as a script. I.E. $ python3.6 main.py TP53.

Using Command Line Tool

Each parameter is a gene name i.e. TP53 or BRCA1.

$ ./main.py TP53
TP53 is mutated in 22.34% of all cases.
TP53 is copy number altered in 1.83% of all cases.

The user can also be able to execute the program with as many genes as desired genes. For example:

$ ./main.py TP53 MDM2 MDM4 MXA XPA
TP53 is mutated in 22.34% of all cases.
TP53 is copy number altered in 1.83% of all cases.
MDM2 is mutated in 23.81% of all cases.
MDM2 is copy number altered in 9.16% of all cases.
MDM4 is mutated in 30.40% of all cases.
MDM4 is copy number altered in 9.52% of all cases.
XPA is mutated in 26.74% of all cases.
XPA is copy number altered in 0.37% of all cases.
The gene set is altered in 61.54% of all cases.

And finally, because life is short, the --toJSON flag results in JSON being sent to standard out. Note I used jq to format the output for legibility in this example.

$ ./main.py TP53 MDM2 MDM4 XPA --toJSON | jq
{
  "geneset": {
    "mutated": 168,
    "totalCount": 273
  },
  "TP53": {
    "NA": 0,
    "singleCopy": 56,
    "noCopy": 5,
    "multiCopy": 0,
    "noChange": 212
  },
  "MDM2": {
    "NA": 0,
    "singleCopy": 40,
    "noCopy": 0,
    "multiCopy": 25,
    "noChange": 208
  },
  "MDM4": {
    "NA": 0,
    "singleCopy": 57,
    "noCopy": 0,
    "multiCopy": 26,
    "noChange": 190
  },
  "XPA": {
    "NA": 0,
    "singleCopy": 72,
    "noCopy": 0,
    "multiCopy": 1,
    "noChange": 200
  }
}

Known Issues

There are (almost) no test. Unit test and at least one integration test would be awesome and wouldn't take very long.
Executing the script from outside of /app resulting in incorrect path to /data. This could be fixed pretty easily with:
```
import os
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'relative/path/to/file/you/want')
```
Single genes should also return the total % of mutations. I.E. "Total % of cases where TP53 is altered by either mutation or copy number alteration: 30% of all cases." for ./main.py TP53
The error messages are uninformative and should tell a user why the app is failing. Usually this is because the gene symbol isn't recognized.
None of the functions have doc string. Might not be that big of a deal since they're not designed for export but it would help with maintenance.

Future Features

Specifying a study ID could add a lot of value and is trivial.
Dockerizing the whole app while definitely overkill would solve shebang line issues and negate the need to install requirements.txt.
If a gene symbol isn't found in app/data/gene_id_symbol_map.csv the app should look for it online.
Gene results should be cached. (I.E. app/data/<study_id>/<gene_id>)

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
app		app
docs		docs
.gitignore		.gitignore
ReadMe.md		ReadMe.md
requirments.txt		requirments.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cancer Mutation Finder CLI

Getting Started

Installation

Using Command Line Tool

Known Issues

Future Features

About

Releases

Packages

Languages

HarryMWinters/cancer-mutation-CLI

Folders and files

Latest commit

History

Repository files navigation

Cancer Mutation Finder CLI

Getting Started

Installation

Using Command Line Tool

Known Issues

Future Features

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages