Add notebook on predicting statistics
cangermueller committed Apr 11, 2017
1 parent 80e4a7e commit 255ec5b
Showing 5 changed files with 572 additions and 11 deletions.
2 changes: 1 addition & 1 deletion deepcpg/models/utils.py
@@ -573,7 +573,7 @@ def __call__(self, data_files, class_weights=None, *args, **kwargs):
outputs[name] = data_raw['outputs/%s' % name]
cweights = class_weights[name] if class_weights else None
weights[name] = get_sample_weights(outputs[name], cweights)
- if name == 'stats/cat_var':
+ if name.endswith('cat_var'):
output = outputs[name]
outputs[name] = to_categorical(output, 3)
outputs[name][output == dat.CPG_NAN] = 0
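The effect of this hunk can be illustrated with a minimal, pure-Python sketch. It is not the DeepCpG implementation: `CPG_NAN = -1` is an assumed stand-in for `dat.CPG_NAN`, `one_hot_masked` is a hypothetical helper that mimics `to_categorical(output, 3)` followed by zeroing the unobserved rows, and the output name `win_stats/cat_var` is made up to show why a suffix match is broader than an exact match.

```python
# Assumed sentinel for "not observed"; stands in for dat.CPG_NAN.
CPG_NAN = -1


def is_cat_var(name):
    """Suffix match, as in the new condition `name.endswith('cat_var')`."""
    return name.endswith('cat_var')


def one_hot_masked(labels, num_classes=3, nan_value=CPG_NAN):
    """One-hot encode integer labels; unobserved labels become all-zero rows."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        if label != nan_value:
            row[label] = 1
        rows.append(row)
    return rows


# The old exact comparison only matched 'stats/cat_var'; the suffix match
# also covers other (here hypothetical) output names ending in 'cat_var'.
for name in ['stats/cat_var', 'win_stats/cat_var', 'stats/var']:
    print(name, is_cat_var(name))

print(one_hot_masked([0, 2, CPG_NAN, 1]))
```

Rows for unobserved sites are all zeros, so they carry no class label.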
4 changes: 2 additions & 2 deletions docs/source/data.rst
@@ -70,7 +70,7 @@ For debugging, testing, or reducing compute costs, ``--chromos`` can be used the
Predicting statistics
---------------------

- For predicting statistics across methylation profiles, ``--stats`` and ``--win_stats`` can be used. These arguments specify a list of statistics that are computed across profiles for either a single CpG site or in a window of size ``--win_stats_wlen`` that is centered on a target CpG site. Following statistics are supported:
+ For predicting statistics across methylation profiles, ``--cpg_stats`` and ``--win_stats`` can be used. These arguments specify a list of statistics that are computed across profiles for either a single CpG site or in windows of length ``--win_stats_wlen`` that are centered on a CpG site. Following statistics are supported:

* ``mean``: the mean methylation rate.
* ``mode``: the mode of methylation rates.
@@ -81,7 +81,7 @@ For predicting statistics across methylation profiles, ``--stats`` and ``--win_s
* ``diff``: if a CpG site is differentially methylated, i.e. methylated in one profile but zero in others.
* ``cov``: the CpG coverage, i.e. the number of profiles for which the methylation state of the target CpG site is observed.

- Statistics are only computed or CpG sites that are covered by at least ``--stats_cov`` (default 1) cells. Increasing ``--stats_cov`` will lead to more robust estimates.
+ Per-CpG statistics specified by ``--cpg_stats`` are computed only for CpG sites that are covered by at least ``--cpg_stats_cov`` (default 3) cells. Increasing ``--cpg_stats_cov`` will lead to more robust estimates.
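As a rough illustration of the coverage rule in the updated paragraph, the sketch below computes a few per-CpG statistics across cells and skips sites with too few observations. It is an assumption-laden example, not DeepCpG code: `cpg_stats` is a hypothetical helper, `CPG_NAN = -1` is an assumed marker for unobserved states, and the population variance is used only for illustration.

```python
# Assumed marker for "methylation state not observed in this cell".
CPG_NAN = -1


def cpg_stats(states, min_cov=3):
    """Compute statistics for one CpG site across cells.

    `states` holds one binary methylation state per cell, or CPG_NAN.
    Returns None if fewer than `min_cov` cells cover the site.
    """
    observed = [s for s in states if s != CPG_NAN]
    cov = len(observed)
    if cov < min_cov:
        return None  # too few cells for a robust estimate
    mean = sum(observed) / cov
    var = sum((s - mean) ** 2 for s in observed) / cov
    return {'mean': mean, 'var': var, 'cov': cov}


print(cpg_stats([1, 0, CPG_NAN, 1, 1]))      # covered by 4 cells
print(cpg_stats([1, CPG_NAN, CPG_NAN, 0]))   # covered by 2 cells: skipped
```

Raising `min_cov` discards more sites but bases each remaining estimate on more cells, which is the trade-off the paragraph describes for ``--cpg_stats_cov``.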


Common issues
1 change: 1 addition & 0 deletions examples/README.md
@@ -15,6 +15,7 @@ bash setup.sh
* [Fine-tuning](./notebooks/fine_tune/index.ipynb): Fine-tuning a pre-trained model to speed-up training.
* [Motif analysis](./notebooks/motifs/index.ipynb): Visualizing and analyzing learned motifs.
* [Mutations effects](./notebooks/snp/index.ipynb): Computing and visualizing mutations effects.
+ * [Predicting statistics](./notebooks/stats/index.ipynb): Predicting statistics such as cell-to-cell variance.

## Shell scripts
`./scripts` contains shell scrips with recommended default parameters. They may help you to easily build a DeepCpG pipeline for creating data, training models, and evaluating models. Set `test_mode` variable in scripts to `1` for testing, and `0` otherwise.