# Alignment CLI Usage

## Activate virtual environment

In [None]:
# Using virtualenvwrapper here; a Conda environment works as well
workon pyBioTools

## Reads_index

### Get help

In [3]:
pyBioTools Alignment Reads_index -h

usage: pyBioTools Alignment Reads_index [-h] -i INPUT_FN [-u] [-s] [-p]
                                        [-v | -q | --progress]

Index reads found in a coordinated sorted bam file by read_id. The created
index file can be used to randon access the alignment file per read_id

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to the bam file to index (required) [str]
  -u, --skip_unmapped   Filter out unmapped reads (default: False) [None]
  -s, --skip_secondary  Filter out secondary alignment (default: False) [None]
  -p, --skip_supplementary
                        Filter out supplementary alignment (default: False)
                        [None]
  -v, --verbose         Increase verbosity (default: False)
  -q, --quiet           Reduce verbosity (default: False)
  --progress            Display a progress bar

### Basic usage

In [4]:
pyBioTools Alignment Reads_index -i ./data/sample_1.bam

Checking Bam file
Parsing reads

### Excluding reads from index

In [5]:
pyBioTools Alignment Reads_index -i ./data/sample_1.bam --verbose --skip_secondary --skip_unmapped

Checking Bam file
Parsing reads

Read counts summary
	Reads retained
		total: 10,772
		primary: 10,584
		supplementary: 188
	Reads discarded
		total: 2,912
		secondary: 1,496
		unmapped: 1,416
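
In the SAM specification, the three categories that can be skipped correspond to FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). The sketch below classifies a FLAG value into the categories listed in the summary. `classify_flag` is a hypothetical helper, and the order in which the bits are tested is an assumption, not necessarily what pyBioTools does internally.

```shell
# Classify a SAM FLAG value (hypothetical helper, not part of pyBioTools).
# Bit values come from the SAM specification; the test order is an assumption.
classify_flag() {
    flag=$1
    if [ $(( $flag & 0x4 )) -ne 0 ]; then
        echo unmapped
    elif [ $(( $flag & 0x100 )) -ne 0 ]; then
        echo secondary
    elif [ $(( $flag & 0x800 )) -ne 0 ]; then
        echo supplementary
    else
        echo primary
    fi
}

# A few example FLAG values (16 = reverse strand, 272 = 256 + 16)
for f in 0 16 4 256 272 2048; do
    printf '%s\t%s\n' "$f" "$(classify_flag "$f")"
done
```

With `--skip_secondary --skip_unmapped`, only the `primary` and `supplementary` categories remain, which matches the "Reads retained" block above.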


## Reads_sample

### Get help

In [6]:
pyBioTools Alignment Reads_sample -h

usage: pyBioTools Alignment Reads_sample [-h] -i INPUT_FN [-o OUTPUT_FOLDER]
                                         [-p OUTPUT_PREFIX] [-r N_READS]
                                         [-s N_SAMPLES]
                                         [--rand_seed RAND_SEED]
                                         [-v | -q | --progress]

Randomly sample `n_reads` reads from a bam file and write downsampled files in
`n_samples` bam files. If the input bam file is not indexed by read_id
`index_reads` is automatically called.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to the indexed bam file (required) [str]
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        Path to a folder where to write sample files (default:
                        ./) [str]
  -p OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        Path to a folder where to write sample files (default:
 


### Basic usage

In [4]:
pyBioTools Alignment Reads_sample -i ./data/sample_1.bam -o ./output/sample_reads -p 1K -r 1000 -s 3 --progress --verbose

Checking Bam and index file
Load index
	Index: 10772it [00:00, 397461.54it/s]
Write sample reads
	Sample 1: 100%|██████████████████████| 1000/1000 [00:00<00:00, 1232.54 Reads/s]
	Indexing output bam file
	Sample 2: 100%|██████████████████████| 1000/1000 [00:00<00:00, 1211.51 Reads/s]
	Indexing output bam file
	Sample 3: 100%|██████████████████████| 1000/1000 [00:00<00:00, 1164.56 Reads/s]
	Indexing output bam file
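
To illustrate what seeded random sampling means in general, here is a minimal sketch that draws a fixed number of read ids from a list with a reproducible seed, using reservoir sampling in `awk`. This is not the pyBioTools implementation: `sample_ids` is a hypothetical helper and the read ids are made up.

```shell
# Pick N lines at random from stdin with a fixed seed (reservoir sampling).
# sample_ids is a hypothetical helper, not part of pyBioTools.
sample_ids() {
    n=$1
    seed=$2
    awk -v n="$n" -v seed="$seed" '
        BEGIN { srand(seed) }
        NR <= n { keep[NR] = $0; next }      # fill the reservoir first
        {
            j = int(rand() * NR) + 1         # uniform integer in 1..NR
            if (j <= n) keep[j] = $0         # replace with probability n/NR
        }
        END {
            m = (NR < n) ? NR : n
            for (i = 1; i <= m; i++) print keep[i]
        }
    '
}

# Toy input: ten made-up read ids, one per line
ids=$(printf 'read_%s\n' 1 2 3 4 5 6 7 8 9 10)
printf '%s\n' "$ids" | sample_ids 3 42
```

With the same seed and the same `awk` implementation, the function returns the same ids on every run. This is presumably the role `--rand_seed` plays for `Reads_sample`.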

## Filter

### Get help

In [9]:
pyBioTools Alignment Filter -h

usage: pyBioTools Alignment Filter [-h] -i INPUT_FN -o OUTPUT_FN [-u] [-s]
                                   [-p] [-t ORIENTATION] [-r MIN_READ_LEN]
                                   [-a MIN_ALIGN_LEN] [-m MIN_MAPQ]
                                   [-f MIN_FREQ_IDENTITY]
                                   [--select_ref [SELECT_REF [SELECT_REF ...]]]
                                   [--exclude_ref [EXCLUDE_REF [EXCLUDE_REF ...]]]
                                   [-v | -q | --progress]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to the bam file to filter (required) [str]
  -o OUTPUT_FN, --output_fn OUTPUT_FN
                        Path to the write filtered bam file (required) [str]
  -u, --skip_unmapped   Filter out unmapped reads (default: False) [None]
  -s, --skip_secondary  Filter out secondary alignment (default: False) [None]
  -p, --skip_supplementary
                        


### Basic usage

In [3]:
pyBioTools Alignment Filter \
    -i "./data/sample_1.bam" \
    -o "./output/sample_1_filter.bam" \
    --skip_unmapped \
    --skip_supplementary \
    --skip_secondary \
    --min_read_len 300 \
    --min_align_len 300 \
    --orientation "+" \
    --min_mapq 10 \
    --min_freq_identity 0.8 \
    --verbose

Checking input bam file
Parsing reads
Indexing output bam file

Read counts summary
	Reads discarded
		total: 9,262
		wrong_orientation: 5,291
		secondary: 1,496
		unmapped: 1,416
		low_identity: 510
		low_mapping_quality: 283
		supplementary: 188
		short_alignment: 67
		short_read: 11
	Reads retained
		primary: 4,422
		total: 4,422
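
As a quick sanity check, the counts in the summary can be added up with shell arithmetic. The numbers below are copied from the `Filter` summary above and from the `Reads_index` summary earlier in this document; nothing else is assumed.

```shell
# Sum of the per-category discard counts reported by Filter
discarded=$(( 5291 + 1496 + 1416 + 510 + 283 + 188 + 67 + 11 ))
retained=4422

echo "discarded: $discarded"                                     # 9262
echo "total seen by Filter: $(( discarded + retained ))"         # 13684
echo "total seen by Reads_index: $(( 10772 + 2912 ))"            # 13684
```

The per-category counts add up exactly to the reported total, which suggests that each discarded read is counted under a single reason. Both commands also account for the same 13,684 records in `sample_1.bam`.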


## To_fastq

### Get help

In [2]:
pyBioTools Alignment To_fastq -h

usage: pyBioTools Alignment To_fastq [-h] -i [INPUT_FN [INPUT_FN ...]] -1
                                     OUTPUT_R1_FN [-2 OUTPUT_R2_FN] [-s] [-v]
                                     [-q] [--progress]

Dump reads from an alignment file or set of alignment file(s) to a fastq or
pair of fastq file(s). Only the primary alignment are kept and paired_end
reads are assumed to be interleaved. Compatible with unmapped or unaligned
alignment files as well as files without header.

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FN [INPUT_FN ...]], --input_fn [INPUT_FN [INPUT_FN ...]]
                        Path (or list of paths) to input BAM/CRAM/SAM file(s)
                        (required) [str]
  -1 OUTPUT_R1_FN, --output_r1_fn OUTPUT_R1_FN
                        Path to an output fastq file (for Read1 in paired_end
                        mode of output_r2_fn is provided). Automatically
                        gzipped if the .gz extension is


### Single-end reads usage from BAM files

In [4]:
pyBioTools Alignment To_fastq \
    -i ./data/sample_1.bam ./data/sample_2.bam \
    -1 ./output/sample_1-2_SE_from_bam.fastq.gz \
    --verbose \
    --progress

Opening file ./output/sample_1-2_SE_from_bam.fastq.gz in writing mode
Parsing reads
Reading input file ./data/sample_1.bam
	Reading: 12000 Reads [00:16, 720.68 Reads/s] 
	Reached end of input file ./data/sample_1.bam
Reading input file ./data/sample_2.bam
	Reading: 12000 Reads [00:18, 644.27 Reads/s] 
	Reached end of input file ./data/sample_2.bam
Closing file:./output/sample_1-2_SE_from_bam.fastq.gz
	Sequences writen: 24000
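
The reported number of sequences can be cross-checked by counting records in the output file. The sketch below assumes the common four-lines-per-record FASTQ layout (no wrapped sequences); `count_fastq` is a hypothetical helper, not part of pyBioTools, and it runs here on a two-record toy input so it works without the example data.

```shell
# Count FASTQ records read from stdin, assuming 4 lines per record.
# count_fastq is a hypothetical helper, not part of pyBioTools.
count_fastq() {
    lines=$(wc -l)
    echo $(( $lines / 4 ))
}

# Toy input with two made-up records
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | count_fastq
```

For the gzipped file written above, the equivalent call would be `gzip -cd ./output/sample_1-2_SE_from_bam.fastq.gz | count_fastq`, which should agree with the count in the log if the four-line assumption holds.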

### Paired-end reads usage from unaligned CRAM files

In [6]:
pyBioTools Alignment To_fastq \
    -i ./data/sample_1_20k.cram ./data/sample_2_20k.cram \
    -1 ./output/sample_1-2_PE_from_CRAM_1.fastq.gz \
    -2 ./output/sample_1-2_PE_from_CRAM_2.fastq.gz \
    --verbose \
    --progress

Opening file ./output/sample_1-2_PE_from_CRAM_1.fastq.gz in writing mode
Opening file ./output/sample_1-2_PE_from_CRAM_2.fastq.gz in writing mode
Parsing reads
Reading input file ./data/sample_1_20k.cram
	Reading: 12000 Reads [00:03, 3450.71 Reads/s]
	Reached end of input file ./data/sample_1_20k.cram
Reading input file ./data/sample_2_20k.cram
	Reading: 12000 Reads [00:03, 3449.30 Reads/s]
	Reached end of input file ./data/sample_2_20k.cram
Closing file:./output/sample_1-2_PE_from_CRAM_1.fastq.gz
	Sequences writen: 24000
Closing file:./output/sample_1-2_PE_from_CRAM_2.fastq.gz
	Sequences writen: 24000