Merge pull request #156 from AlexandrovLab/i131-p6
I131 p6
mdbarnesUCSD authored Sep 21, 2023
2 parents 11de286 + af3a303 commit 5dc31d7
Showing 16 changed files with 527 additions and 264 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -17,7 +17,7 @@ cache:
- $TRAVIS_BUILD_DIR/src/

before_script:
- python3 install_genome.py $TRAVIS_BUILD_DIR/src/ GRCh37
- python install_genome.py -l $TRAVIS_BUILD_DIR/src/ GRCh37

script:
# run unit tests
64 changes: 41 additions & 23 deletions README.md
@@ -1,7 +1,8 @@
[![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://osf.io/s93d5/wiki/home/) [![License](https://img.shields.io/badge/License-BSD\%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause) [![Build Status](https://travis-ci.com/AlexandrovLab/SigProfilerMatrixGenerator.svg?branch=master)](https://app.travis-ci.com/AlexandrovLab/SigProfilerMatrixGenerator)
[![Uptime Robot status](https://img.shields.io/uptimerobot/status/m795312784-02766a79f207f67626cef289)](https://stats.uptimerobot.com/jjqW4Ulymx)

# SigProfilerMatrixGenerator
SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows downsizing the generated mutations only to parts for the genome (e.g., exome or a custom BED file). The tool seamlessly integrates with other SigProfiler tools.
SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows downsizing the generated mutations only to parts of the genome (e.g., exome or a custom BED file). The tool seamlessly integrates with other SigProfiler tools.

**INTRODUCTION**

@@ -23,36 +24,48 @@ assemblies (GRCh37, GRCH38, mm10, mm9, rn6). As a result, ~3 Gb of storage must

**QUICK START GUIDE**

#### Using Python Interface:
This section will guide you through the minimum steps required to create mutational matrices:
1. Install the python package using pip:
```bash
pip install SigProfilerMatrixGenerator
```
2.
a. Install your desired reference genome from the command line/terminal as follows (a complete list of supported genomes can be found below):
```bash
pip install SigProfilerMatrixGenerator
```
2. Install your desired reference genome from the command line/terminal as follows (a complete list of supported genomes can be found below):
```python
$ python
>> from SigProfilerMatrixGenerator import install as genInstall
>> genInstall.install('GRCh37', rsync=False, bash=True)
```
This will install the human 37 assembly as a reference genome. You may install as many genomes as you wish. If you have a firewall on your server, you may need to install rsync and use the rsync=True parameter. Similarly, if you do not have bash,
use bash=False.
b. To install a reference genome that has been downloaded locally from the [Alexandrov Lab's ftp server](ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/), you can do the following:
```python
$ python
>> from SigProfilerMatrixGenerator import install as genInstall
>> genInstall.install('GRCh37', offline_files_path='path/to/directory/containing/GRCh37.tar.gz')
```
3. Place your vcf files in your desired output folder. It is recommended that you name this folder based on your project's name
4. From within a python session, you can now generate the matrices as follows:
```python
$ python3
>>from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
>>matrices = matGen.SigProfilerMatrixGeneratorFunc("test", "GRCh37", "/Users/ebergstr/Desktop/test",plot=True, exome=False, bed_file=None, chrom_based=False, tsb_stat=False, seqInfo=False, cushion=100)
```
```python
$ python3
>>from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
>>matrices = matGen.SigProfilerMatrixGeneratorFunc("test", "GRCh37", "/Users/ebergstr/Desktop/test",plot=True, exome=False, bed_file=None, chrom_based=False, tsb_stat=False, seqInfo=False, cushion=100)
```
The required parameters are as follows:
```python
SigProfilerMatrixGeneratorFunc(project, reference_genome, path_to_input_files)
```
#### Using Command Line Interface (CLI):
1. After installing the package, you can directly call the tool using:
```bash
SigProfilerMatrixGenerator <function-name> <arguments>
```
Currently supported functions are 'install' and 'matrix_generator'.
SigProfilerMatrixGeneratorFunc(project, reference_genome, path_to_input_files)
2. For example, to install a reference genome:
```bash
SigProfilerMatrixGenerator install GRCh37
```
3. To generate SBS, DBS, or INDEL matrices:
```bash
SigProfilerMatrixGenerator matrix_generator <project> <reference_genome> <path_to_input_files>
```
View the table below for the full list of parameters.
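
The `matrix_generator` call above can also be assembled from Python, for example when driving the CLI from a pipeline script. The following is a minimal sketch, not part of the package: `build_matrix_generator_argv` is a hypothetical helper, and the flags it emits (`--exome`, `--bed_file`, `--chrom_based`, `--cushion`, `--volume`) are taken from the argument parser added in this commit. The boolean-valued options (`--plot`, `--tsb_stat`, `--seqInfo`) are left out of the sketch.

```python
import shlex
from typing import List, Optional


def build_matrix_generator_argv(
    project: str,
    reference_genome: str,
    path_to_input_files: str,
    exome: bool = False,
    bed_file: Optional[str] = None,
    chrom_based: bool = False,
    cushion: int = 100,
    volume: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble the CLI call as an argument list."""
    argv = [
        "SigProfilerMatrixGenerator",
        "matrix_generator",
        project,
        reference_genome,
        path_to_input_files,
    ]
    # store_true flags are passed only when enabled.
    if exome:
        argv.append("--exome")
    if bed_file is not None:
        argv += ["--bed_file", bed_file]
    if chrom_based:
        argv.append("--chrom_based")
    # 100 is the documented default, so it is omitted unless changed.
    if cushion != 100:
        argv += ["--cushion", str(cushion)]
    if volume is not None:
        argv += ["--volume", volume]
    return argv


argv = build_matrix_generator_argv(
    "test", "GRCh37", "/path/to/test", exome=True, cushion=50
)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` once the package and the reference genome are installed. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.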
@@ -71,6 +84,7 @@ View the table below for the full list of parameters.
| | tsb_stat | Boolean | Outputs the results of a transcriptional strand bias test for the respective matrices. Default value False. |
| | seqInfo | Boolean | Outputs original mutations into a text file that contains the SigProfilerMatrixGenerator classification for each mutation. Default value True. |
| | cushion | Integer | Adds an Xbp cushion to the exome/bed_file ranges for downsampling the mutations. Default value 100. |
| | volume | String | Path to SigProfilerMatrixGenerator's volume where reference genomes will be saved and loaded from. Useful for docker installations. Default is None. |


**INPUT FILE FORMAT**
@@ -92,7 +106,6 @@ a DBS, SBS, ID, and TSB folder (there will also be a plots folder if this parame

### Example with SV class present (tsv or csv file):


| chrom1 | start1 | end1 | chrom2 | start2 | end2 | svclass |
| :-----: | :-: | :-: | :-: | :-: | :-: | :-: |
| 19 | 21268384 | 21268385 | 19 | 21327858 | 21327859 | deletion
@@ -123,7 +136,6 @@ python3 ./SigProfilerMatrixGenerator/scripts/SVMatrixGenerator.py ./SigProfilerM
2. Aggregate SV plot - a summary plot showing the average number of events in each channel for the whole cohort of samples
3. SV Matrix - a 32 X n matrix (where n is the number of samples) that can be used to perform signature decomposition, clustering, etc.


## COPY NUMBER MATRIX GENERATION

In order to generate a copy number matrix, provide an absolute path to a multi-sample segmentation file obtained from one of the following copy number calling tools (if you have individual sample files, please combine them into one file with the first column corresponding to the sample name):
@@ -156,8 +168,8 @@ $ python3

```
python ./SigProfilerMatrixGenerator/scripts/CNVMatrixGenerator.py BATTENBERG ./SigProfilerMatrixGenerator/references/CNV/example_input/Battenberg_test.tsv BATTENBERG-TEST ./SigProfilerMatrixGenerator/references/CNV/example_output/

```

**SUPPORTED GENOMES**

This tool currently supports the following genomes:
@@ -203,22 +215,28 @@ pip install .[tests]
pytest tests
```

**END-TO-END tests**
**END-TO-END TESTS**

An integration test can be run with the following commands:

```bash
wget ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/GRCh37.tar.gz -P ./src/
pip install .
python3 install_genome.py src/ GRCh37
SigProfilerMatrixGenerator install GRCh37
python3 test.py -t GRCh37
```

**CITATION**

For SBSs, DBSs, and INDELs, please cite the following paper:

Bergstrom EN, Huang MN, Mahto U, Barnes M, Stratton MR, Rozen SG, and Alexandrov LB (2019) SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. **BMC Genomics** 20, Article number: 685.
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6041-2

For SVs and CNVs, please cite the following paper:

Khandekar A, Vangara R, Barnes M, Díaz-Gay M, Abbasi A, Bergstrom EN, Steele CD, Pillay N, and Alexandrov LB (2023) Visualizing and exploring patterns of large mutational events with SigProfilerMatrixGenerator. **BMC Genomics** 24, Article number: 469.
https://doi.org/10.1186/s12864-023-09584-y

**COPYRIGHT**

125 changes: 119 additions & 6 deletions SigProfilerMatrixGenerator/controllers/cli_controller.py
@@ -2,11 +2,12 @@
from typing import List

from SigProfilerMatrixGenerator import install, test_helpers
from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as mg


def parse_arguments_test(args: List[str]) -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Run tests for SigProfilerMatrixGenerator"
description="Run tests for SigProfilerMatrixGenerator."
)
parser.add_argument(
"-t",
@@ -22,24 +23,136 @@ def parse_arguments_test(args: List[str]) -> argparse.Namespace:
nargs="+",
default=None,
)
parser.add_argument(
"-v",
"--volume",
help="Specify a destination for the downloaded genomes (default: None, used for Docker)",
default=None,
)

result = parser.parse_args(args)
return result


def parse_arguments_install(args: List[str]) -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Install reference files")
parser.add_argument("source_dir", help="Directory with local files")
parser.add_argument("genome", help="Genome to install")
parser = argparse.ArgumentParser(description="Install reference genome files.")
parser.add_argument(
"genome",
help="The reference genome to install. Supported genomes include {c_elegans, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, yeast}.",
)
parser.add_argument(
"-l",
"--local_install_genome",
help="""
Install an offline reference genome downloaded from the Alexandrov Lab's FTP server.
Provide the absolute path to the locally-stored genome file.
For downloads, visit AlexandrovLab's server:
ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/
""",
default=None,
)
parser.add_argument(
"-v",
"--volume",
help="Specify a destination for the downloaded genomes (default: None, used for Docker)",
default=None,
)
result = parser.parse_args(args)
return result


def parse_arguments_matrix_generator(args: List[str]) -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Create mutational matrices for all types of somatic mutations"
)

# Mandatory arguments
parser.add_argument("project", help="The name of the project.")
parser.add_argument(
"reference_genome",
help="The name of the reference genome. Supported values {c_elegans, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, yeast}.",
)
parser.add_argument("path_to_input_files", help="The path to the input files.")

# Optional arguments
parser.add_argument(
"--exome",
action="store_true",
help="Downsamples mutational matrices to the exome regions of the genome. Default is False.",
)
parser.add_argument(
"--bed_file",
default=None,
help="Downsamples mutational matrices to custom regions of the genome. Provide full path to BED file. Note: BED file header is required. Default is None.",
)
parser.add_argument(
"--chrom_based",
action="store_true",
help="Outputs chromosome-based matrices. Default is False.",
)
parser.add_argument(
"--plot",
type=bool,
default=False,
help="Integrates with SigProfilerPlotting to output visualizations for each matrix. Default is False.",
)
parser.add_argument(
"--tsb_stat",
type=bool,
default=False,
help="Outputs the results of a transcriptional strand bias test for the respective matrices. Default is False.",
)
parser.add_argument(
"--seqInfo",
type=bool,
default=True,
help="Outputs original mutations into a text file with the SigProfilerMatrixGenerator classification for each mutation. Default is True.",
)
parser.add_argument(
"--cushion",
type=int,
default=100,
help="Adds an Xbp cushion to the exome/bed_file ranges for downsampling mutations. Default is 100.",
)

parser.add_argument(
"-v",
"--volume",
help="Specify a destination for the downloaded genomes (default: None, used for Docker)",
default=None,
)

result = parser.parse_args(args)
return result


class CliController:
def dispatch_install(self, user_args: List[str]) -> None:
parsed_args = parse_arguments_install(user_args)
install.install(parsed_args.genome, offline_files_path=parsed_args.source_dir)
install.install(
parsed_args.genome,
offline_files_path=parsed_args.local_install_genome,
volume=parsed_args.volume,
)

def dispatch_test(self, user_args: List[str]) -> None:
parsed_args = parse_arguments_test(user_args)
test_helpers.install_genomes(parsed_args.download_genomes)
test_helpers.test_genomes(parsed_args.test_genome)
test_helpers.test_genomes(parsed_args.test_genome, parsed_args.volume)

def dispatch_matrix_generator(self, user_args: List[str]) -> None:
parsed_args = parse_arguments_matrix_generator(user_args)

mg.SigProfilerMatrixGeneratorFunc(
project=parsed_args.project,
reference_genome=parsed_args.reference_genome,
path_to_input_files=parsed_args.path_to_input_files,
exome=parsed_args.exome,
bed_file=parsed_args.bed_file,
chrom_based=parsed_args.chrom_based,
plot=parsed_args.plot,
tsb_stat=parsed_args.tsb_stat,
seqInfo=parsed_args.seqInfo,
cushion=parsed_args.cushion,
volume=parsed_args.volume,
)
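
One detail in `parse_arguments_matrix_generator` is worth checking before relying on it: `--plot`, `--tsb_stat`, and `--seqInfo` are declared with `type=bool`. argparse applies `bool()` to the raw string, so any non-empty value, including `"False"`, parses as `True`. The sketch below reproduces this with a parser that mirrors only the `--plot` option and contrasts it with an explicit converter. `str2bool` is a hypothetical helper introduced for illustration and is not part of this commit.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical converter: map common spellings to a real boolean."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Parser mirroring the --plot option from the diff (type=bool).
naive = argparse.ArgumentParser()
naive.add_argument("--plot", type=bool, default=False)

# Alternative sketch using the explicit converter.
strict = argparse.ArgumentParser()
strict.add_argument("--plot", type=str2bool, default=False)

naive_plot = naive.parse_args(["--plot", "False"]).plot
strict_plot = strict.parse_args(["--plot", "False"]).plot

# bool("False") is True because the string is non-empty.
print(naive_plot, strict_plot)  # prints: True False
```

With the parser as committed, `--seqInfo False` would therefore still enable `seqInfo`. Another possible approach, assuming Python 3.9 or later, is `action=argparse.BooleanOptionalAction`, which generates paired `--plot` / `--no-plot` flags instead of taking a value.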