#### preprocess_features.py
Merge a set of feature BED files for training into a single BED and activity table.

| Arguments | Type | Description |
|---|---|---|
| target_beds_file | table listing labels and BED | One line per sample: label, then BED path |

| Options | Variable | Description |
|---|---|---|
| -a | db_act_file | Existing database activity table |
| -b | db_bed | Existing database BED |
| -c | chrom_lengths_file | Table of chromosome lengths |
| -m | merge_overlap | Overlap length (after extension to feature_size) above which to merge features [Default: 200] |
| -n | no_db_activity | Do not pass along the activities of the database sequences [Default: False] |
| -o | out_prefix | Output file prefix [Default: features] |
| -s | feature_size | Extend features to this size [Default: 600] |
| -y | ignore_y | Ignore Y chromosome features [Default: False] |
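
The interaction between `feature_size` and `merge_overlap` can be illustrated with a minimal sketch. It is not the script's actual implementation: the helper names `extend_feature` and `merge_features`, the greedy left-to-right strategy, and the re-centering of merged features on the midpoint of their union are all assumptions made for illustration. It handles a single chromosome and ignores strand.

```python
def extend_feature(start, end, feature_size=600):
    """Center a feature on its midpoint and extend it to feature_size bp."""
    mid = (start + end) // 2
    new_start = max(0, mid - feature_size // 2)  # clip at the chromosome start
    return new_start, new_start + feature_size


def merge_features(features, feature_size=600, merge_overlap=200):
    """Merge features whose extended intervals overlap by more than merge_overlap.

    features: list of (start, end, label) tuples on one chromosome.
    Returns a list of (start, end, labels), where labels is a set.
    """
    # Extend every feature first, then process them in coordinate order.
    extended = sorted(
        extend_feature(start, end, feature_size) + (label,)
        for start, end, label in features
    )

    merged = []
    for start, end, label in extended:
        if merged:
            m_start, m_end, m_labels = merged[-1]
            overlap = min(end, m_end) - max(start, m_start)
            if overlap > merge_overlap:
                # Assumption: re-center the merged feature on the union midpoint.
                union = (min(start, m_start), max(end, m_end))
                new_start, new_end = extend_feature(*union, feature_size)
                merged[-1] = (new_start, new_end, m_labels | {label})
                continue
        merged.append((start, end, {label}))
    return merged
```

In this sketch, the label set attached to each merged feature plays the role of one row of the activity table: a sample is active for a feature if its label is in the set.
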
#### seq_hdf5.py
Construct an HDF5 file, dividing the data into training, validation, and test subsets.

| Arguments | Type | Description |
|---|---|---|
| fasta_file | FASTA | FASTA file of sequences. |
| targets_file | Table | Targets activity table. |
| out_file | HDF5 | Output HDF5 file. |

| Options | Variable | Description |
|---|---|---|
| -b | batch_size | Align the subset sizes with the batch size |
| -c | counts | Test and validation values (-t, -v) are given as raw counts rather than percentages [Default: False] |
| -r | permute | Permute sequences [Default: False] |
| -s | random_seed | numpy.random seed [Default: 1] |
| -t | test_pct | Test % [Default: 0] |
| -v | valid_pct | Validation % [Default: 0] |
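
How these options might combine can be sketched as follows. This is an illustrative approximation, not the script's code: `split_counts` and `split_indices` are hypothetical helpers, the test and validation values are treated as fractions between 0 and 1 (an assumption about how the script interprets them), batch alignment is assumed to round each subset down to a multiple of `batch_size`, and the standard library `random` module stands in for `numpy.random` to keep the example dependency-free.

```python
import random


def split_counts(num_seqs, test_pct=0.0, valid_pct=0.0, counts=False, batch_size=None):
    """Return (train, valid, test) subset sizes for num_seqs sequences."""
    if counts:
        # -c: the values are raw sequence counts.
        test_count, valid_count = int(test_pct), int(valid_pct)
    else:
        # Assumption: the values are fractions of the dataset, rounded to nearest.
        test_count = int(0.5 + test_pct * num_seqs)
        valid_count = int(0.5 + valid_pct * num_seqs)
    train_count = num_seqs - test_count - valid_count

    if batch_size:
        # -b: assumed to round each subset down to a multiple of the batch size.
        train_count -= train_count % batch_size
        valid_count -= valid_count % batch_size
        test_count -= test_count % batch_size
    return train_count, valid_count, test_count


def split_indices(num_seqs, permute=False, random_seed=1, **kwargs):
    """Return (train, valid, test) lists of sequence indices."""
    order = list(range(num_seqs))
    if permute:
        # -r and -s: shuffle reproducibly before splitting.
        random.Random(random_seed).shuffle(order)

    train_n, valid_n, test_n = split_counts(num_seqs, **kwargs)
    train = order[:train_n]
    valid = order[train_n:train_n + valid_n]
    test = order[train_n + valid_n:train_n + valid_n + test_n]
    return train, valid, test
```

Under these assumptions, sequences left over after batch alignment are simply dropped, so the three subsets may not cover the whole input.
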
#### basset_sample.py
Sample sequences from an existing database.

| Arguments | Type | Description |
|---|---|---|
| db_bed | BED | Existing database BED. |
| db_act_file | Table | Existing database activity table. |
| sample_seqs | int | Number of sequences to sample. |
| output_prefix | str | Filename prefix for output BED and activity table files. |

| Options | Variable | Description |
|---|---|---|
| -s | seed | Random number generator seed [Default: 1] |
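
The key requirement when sampling is that the BED and the activity table stay in step. A minimal sketch of that idea is shown below; `sample_rows` is a hypothetical helper, and it assumes the activity rows (with any header line already removed) are in the same order as the BED rows, which may not match how the script pairs them.

```python
import random


def sample_rows(bed_rows, act_rows, sample_seqs, seed=1):
    """Sample the same sample_seqs rows from a BED and its activity table.

    bed_rows and act_rows are parallel lists (assumption: same order, no header).
    """
    if len(bed_rows) != len(act_rows):
        raise ValueError("BED and activity table must have the same number of rows")
    if sample_seqs > len(bed_rows):
        raise ValueError("cannot sample more sequences than the database holds")

    rng = random.Random(seed)  # seeded for reproducible samples
    # Sample without replacement, then sort to preserve the database order.
    keep = sorted(rng.sample(range(len(bed_rows)), sample_seqs))
    return [bed_rows[i] for i in keep], [act_rows[i] for i in keep]
```

Running it twice with the same seed returns the same rows, which mirrors the purpose of the `-s` option.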