Commit

update version
ploy-np committed Jul 15, 2021
1 parent a89e2bd commit 3e41ef0
Showing 7 changed files with 55 additions and 49 deletions.
46 changes: 25 additions & 21 deletions docs/source/scripts.rst → docs/source/cmd.rst
@@ -1,7 +1,7 @@
.. _scripts:
.. _cmd:

Scripts
==========
Command line arguments
=======================

We provide 2 main scripts to run the analysis of differential RNA modifications, as follows.

@@ -10,27 +10,20 @@ We provide 2 main scripts to run the analysis of differential RNA modifications

* Input

Output files from `nanopolish eventalign`. Please refer to :ref:`Data preparation <preparation>` for the full Nanopolish command.

* Usage example
Output files from ``nanopolish eventalign``. Please refer to :ref:`Data preparation <preparation>` for the full Nanopolish command.

================================= ========== =================== ============================================================================================================
Argument name(s) Required Default value Description
Argument name Required Default value Description
================================= ========== =================== ============================================================================================================
--eventalign=FILE Yes NA Eventalign filepath, the output from nanopolish.
--summary=FILE Yes NA Eventalign summary filepath, the output from nanopolish.
--out_dir=DIR Yes NA Output directory.
--ensembl=NUM No 91 Ensembl version for gene-transcript mapping.
--species=STR No homo_sapiens Species for ensembl gene-transcript mapping.
--customised_genome No False If customised genome provided.
--reference_name No NA fasta reference name.
--annotation_name No NA gtf annotation name.
--gtf_path_or_url No NA gtf file path or url.
--transcript_fasta_paths_or_urls No NA Transcript fasta paths or urls.
--gtf_path_or_url No NA GTF file path or url used for mapping transcriptomic to genomic coordinates.
--transcript_fasta_paths_or_urls No NA Transcript FASTA paths or urls used for mapping transcriptomic to genomic coordinates.
--skip_eventalign_indexing No False To skip indexing the eventalign nanopolish output.
--genome No False To run on Genomic coordinates. Without this argument, the program will run on transcriptomic coordinates.
--n_processes=NUM No 1 Number of processes to run.
--readcount_max=NUM No 1000 Maximum read counts per gene.
--readcount_min=NUM No 1 Minimum read counts per gene.
--resume No False With this argument, the program will resume from the previous run.
================================= ========== =================== ============================================================================================================
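To illustrate how the documented options and defaults fit together, here is a hypothetical sketch using Python's standard ``argparse``. It mirrors a subset of the table above and is not xPore's actual parser; the helper names ``build_dataprep_parser`` and ``parse_dataprep_args`` are illustrative, and the rule that ``--genome`` needs ``--gtf_path_or_url`` and ``--transcript_fasta_paths_or_urls`` follows the note in the quickstart.

```python
import argparse


def build_dataprep_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the documented dataprep table."""
    parser = argparse.ArgumentParser(prog="xpore dataprep")
    # Required inputs and output location, as listed in the table.
    parser.add_argument("--eventalign", required=True, help="Eventalign filepath from nanopolish.")
    parser.add_argument("--summary", required=True, help="Eventalign summary filepath.")
    parser.add_argument("--out_dir", required=True, help="Output directory.")
    # Mapping inputs; only needed when running on genomic coordinates.
    parser.add_argument("--gtf_path_or_url", default=None)
    parser.add_argument("--transcript_fasta_paths_or_urls", default=None)
    # Boolean switches default to False, as in the table.
    parser.add_argument("--skip_eventalign_indexing", action="store_true")
    parser.add_argument("--genome", action="store_true")
    parser.add_argument("--resume", action="store_true")
    # Numeric options with the documented defaults.
    parser.add_argument("--n_processes", type=int, default=1)
    parser.add_argument("--readcount_max", type=int, default=1000)
    parser.add_argument("--readcount_min", type=int, default=1)
    return parser


def parse_dataprep_args(argv):
    """Parse argv and enforce the --genome dependency described in the docs."""
    parser = build_dataprep_parser()
    args = parser.parse_args(argv)
    if args.genome and not (args.gtf_path_or_url and args.transcript_fasta_paths_or_urls):
        # parser.error prints a usage message and raises SystemExit(2).
        parser.error("--genome requires --gtf_path_or_url and --transcript_fasta_paths_or_urls")
    return args
```

Declaring the switches with ``action="store_true"`` reproduces the "No / False" rows of the table: the flag takes no value and is off unless given.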

@@ -51,14 +44,12 @@ data.readcount csv Summary of readcounts per gene.

* Input

Output files from `xpore-dataprep`.

* Usage example
Output files from ``xpore-dataprep``.

=================== ========== =============== ==============================================================================
Argument name(s) Required Default value Description
Argument name Required Default value Description
=================== ========== =============== ==============================================================================
--config=FILE Yes NA Yaml configuration filepath.
--config=FILE Yes NA YAML configuration filepath.
--n_processes=NUM No 1 Number of processes to run.
--save_models No False With this argument, the program will save the model parameters for each id.
--resume No False With this argument, the program will resume from the previous run.
@@ -73,4 +64,17 @@ File name File type Description
diffmod.table csv Output table information of differential modification rates. Please refer to :ref:`Output table description <outputtable>` for the full description.
diffmod.log txt Gene/Transcript ids being processed.
====================== =============== =================================================================================================================================================


``xpore-postprocessing``
**************************

* Input

The ``diffmod.table`` file from ``xpore-diffmod``.

====================== =============== =======================================================================
Argument name Required Description
====================== =============== =======================================================================
--diffmod_dir Yes Path of the directory containing ``diffmod.table``.
====================== =============== =======================================================================

4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -26,9 +26,9 @@
author = 'Ploy N. Pratanwanich'

# The short X.Y version
version = '1.1'
version = '2.0'
# The full version, including alpha/beta/rc tags
release = '1.1'
release = '2.0'


# -- General configuration ---------------------------------------------------
11 changes: 7 additions & 4 deletions docs/source/index.rst
@@ -5,15 +5,18 @@
Welcome to xPore's documentation!
=================================
xPore is a Python package for detection of differential RNA modifications from Nanopore sequencing data.
xPore is a Python package for identification of differential RNA modifications from Nanopore sequencing data.

To install the latest release, run::

pip install xpore
pyensembl install --release 91 --species homo_sapiens # Please specify the Ensembl release compatible with your data.

See our :ref:`Installation page <installation>` for details.

To check the version of xPore, run::

xpore -v

To detect differential modifications, you can follow the instructions in our :ref:`Quickstart page <quickstart>`.

Contents
@@ -27,11 +30,11 @@ Contents
configuration
preparation
data
scripts
cmd

Contacts
--------
xPore is maintained by `Ploy N. Pratanwanich <https://github.com/ploy-rukawa>`_ and `Jonathan Goeke <https://github.com/jonathangoeke>`_ from the Genome Institute of Singapore, A*STAR.
xPore is maintained by `Ploy N. Pratanwanich <https://github.com/ploy-np>`_ and `Jonathan Goeke <https://github.com/jonathangoeke>`_ from the Genome Institute of Singapore, A*STAR.

If you want to contribute, please open an issue in `our repo <https://github.com/GoekeLab/xpore>`_.

3 changes: 1 addition & 2 deletions docs/source/installation.rst
@@ -10,7 +10,6 @@ PyPI installation (recommended)
::

pip install xpore
pyensembl install --release 91 --species homo_sapiens # please specify the Ensembl release compatible with your data.

Installation from our GitHub repository
---------------------------------------
@@ -19,4 +18,4 @@ Installation from our GitHub repository
git clone https://github.com/GoekeLab/xpore.git
cd xpore
python setup.py install
pyensembl install --release 91 --species homo_sapiens # please specify the Ensembl release compatible with your data.

36 changes: 18 additions & 18 deletions docs/source/quickstart.rst
@@ -3,7 +3,7 @@
Quickstart - Detection of differential RNA modifications
=========================================================

Download and extract the demo dataset from our `S3 bucket <http://s3.ap-southeast-1.amazonaws.com/all-public-data.store.genome.sg/xpore/demo.tar.gz>`_::
Download and extract the demo dataset from our `Zenodo record <https://zenodo.org/record/5103099/files/demo.tar.gz>`_::

wget http://s3.ap-southeast-1.amazonaws.com/all-public-data.store.genome.sg/xpore/demo.tar.gz
tar -xvf demo.tar.gz
@@ -15,17 +15,19 @@ After extraction, you will find::
|-- data
|-- HEK293T-METTL3-KO-rep1 # dataset dir
|-- HEK293T-WT-rep1 # dataset dir
|-- demo.gtf
|-- demo.fa
|-- demo.gtf # gene transfer format (GTF) file for gene-transcript mapping
|-- demo.fa # transcriptome reference file for gene-transcript mapping

Each dataset under the ``data`` directory contains the following directories:

* ``fast5`` : Raw signal FAST5 files
* ``fastq`` : Basecalled reads
* ``bamtx`` : Transcriptome-aligned sequence
* ``fast5`` : Raw signal, FAST5 files
* ``fastq`` : Basecalled reads, FASTQ file
* ``bamtx`` : Transcriptome-aligned sequence, BAM file
* ``nanopolish``: Eventalign files obtained from `nanopolish eventalign <https://nanopolish.readthedocs.io/en/latest/quickstart_eventalign.html>`_

1. Preprocess the data for each data set using ``xpore-dataprep``. (This step will take approximately 5h for 1 million reads)::
Note that the FAST5, FASTQ and BAM files are only needed to obtain the eventalign file with Nanopolish; xPore itself requires only the eventalign file. See our :ref:`Data preparation page <preparation>` for details on obtaining the eventalign file from raw reads.

1. Preprocess the data for each dataset using ``xpore dataprep``. When the ``--genome`` option is chosen, the ``--gtf_path_or_url`` and ``--transcript_fasta_paths_or_urls`` arguments are required to map transcriptomic to genomic coordinates, so that xPore can run on genomic coordinates. (This step will take approximately 5h for 1 million reads)::

# Within each dataset directory i.e. demo/data/HEK293T-METTL3-KO-rep1 and demo/data/HEK293T-WT-rep1, run
xpore-dataprep \
@@ -43,7 +45,7 @@ The output files are stored under ``dataprep`` in each dataset directory:
* ``data.readcount`` : Summary of readcounts per gene
* ``data.log`` : Log file

Run ``xpore-dataprep -h`` to explore the full usage.
Run ``xpore dataprep -h`` or visit our :ref:`Command line arguments <cmd>` to explore the full usage description.
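When several datasets need preprocessing, the per-directory commands can be generated in a loop. The sketch below only builds the ``xpore dataprep`` argument lists for the two demo datasets and prints them with ``shlex.join``; it does not run anything. It passes ``--summary`` because the argument table lists it as required, and the file names ``nanopolish/eventalign.txt`` and ``nanopolish/summary.txt`` are hypothetical placeholders, so substitute your actual Nanopolish outputs.

```python
import shlex
from pathlib import Path

# Demo dataset directories named in the quickstart.
DATASETS = ["HEK293T-METTL3-KO-rep1", "HEK293T-WT-rep1"]


def dataprep_command(dataset_dir: Path, n_processes: int = 1) -> list:
    """Build (but do not run) an ``xpore dataprep`` command for one dataset.

    The nanopolish file names below are hypothetical placeholders.
    """
    return [
        "xpore", "dataprep",
        "--eventalign", str(dataset_dir / "nanopolish" / "eventalign.txt"),
        "--summary", str(dataset_dir / "nanopolish" / "summary.txt"),
        # Outputs go to a ``dataprep`` directory inside each dataset directory.
        "--out_dir", str(dataset_dir / "dataprep"),
        "--n_processes", str(n_processes),
    ]


if __name__ == "__main__":
    for name in DATASETS:
        cmd = dataprep_command(Path("demo/data") / name, n_processes=4)
        # Shell-quoted preview; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems when paths contain spaces.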

2. Prepare a ``.yml`` configuration file. With this YAML file, you can specify your experimental design, the data directories, the output directory, and the method options.
In the demo directory, there is an example configuration file ``Hek293T_config.yaml`` available that you can use as a starting template.
@@ -60,21 +62,21 @@ Below is how it looks like::
out: ./out # output dir


See the :ref:`Configuration file page <configuration>` for details.
See the :ref:`Configuration file page <configuration>` for more details.
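A configuration file for the two demo datasets could also be generated from a script. The sketch below renders YAML text with plain string formatting, so no YAML library is needed. The nesting ``data`` → condition → replicate → ``dataprep`` directory is an assumption, as are the condition labels ``HEK293T-KO`` and ``HEK293T-WT`` (chosen to match the ``pval_HEK293T-KO_vs_HEK293T-WT`` column name); only the ``out: ./out`` line appears in the excerpt above, so check the layout against the Configuration file page and the bundled template.

```python
def build_config(conditions: dict, out_dir: str = "./out") -> str:
    """Render a minimal xPore-style YAML config as text.

    `conditions` maps condition name -> {replicate name: dataprep directory}.
    The nesting used here is an assumption; see the Configuration file page.
    """
    lines = ["data:"]
    for condition, replicates in conditions.items():
        lines.append(f"    {condition}:")
        for rep, path in replicates.items():
            lines.append(f"        {rep}: {path}")
    lines.append(f"out: {out_dir} # output dir")
    return "\n".join(lines) + "\n"


config_text = build_config({
    "HEK293T-KO": {"rep1": "./data/HEK293T-METTL3-KO-rep1/dataprep"},
    "HEK293T-WT": {"rep1": "./data/HEK293T-WT-rep1/dataprep"},
})
print(config_text)
```

To use it, write the text next to your data, for example with ``pathlib.Path("my_config.yml").write_text(config_text)``, and pass that file to ``--config``.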

3. Now that the data and the configuration file are ready, we can model differential modifications using ``xpore diffmod``.

::

# In the demo directory, where the configuration file is located, run:
xpore-diffmod --config Hek293T_config.yml
xpore diffmod --config Hek293T_config.yml

The output files are generated within the ``out`` directory:

* ``diffmod.table`` : Result table of differential RNA modification across all tested positions
* ``diffmod.log`` : Log file

Run ``xpore-diffmod -h`` to explore the full usage.
Run ``xpore diffmod -h`` or visit our :ref:`Command line arguments <cmd>` to explore the full usage description.

We can rank the significantly differentially modified sites based on ``pval_HEK293T-KO_vs_HEK293T-WT``. The results are shown below::

@@ -87,15 +89,13 @@ We can rank the significantly differentially modified sites based on ``pval_HEK2

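The ranking step can also be done in a few lines of Python. This minimal sketch uses the standard ``csv`` module and assumes ``diffmod.table`` is comma-separated (it is listed as a csv file in the Command line arguments output table) with a ``pval_HEK293T-KO_vs_HEK293T-WT`` column. The ``id``, ``position`` and ``kmer`` columns and all values in ``EXAMPLE_TABLE`` are invented placeholders, not real results.

```python
import csv
import io

PVAL_COL = "pval_HEK293T-KO_vs_HEK293T-WT"

# Placeholder rows standing in for out/diffmod.table; columns other than
# the p-value column and all values are assumptions for illustration.
EXAMPLE_TABLE = """id,position,kmer,pval_HEK293T-KO_vs_HEK293T-WT
tx1,100,GGACT,0.30
tx1,250,GGACA,0.001
tx2,42,AGACT,0.02
"""


def rank_by_pval(handle, pval_col=PVAL_COL, alpha=None):
    """Return rows sorted by ascending p-value, optionally keeping pval < alpha."""
    rows = list(csv.DictReader(handle))
    rows.sort(key=lambda row: float(row[pval_col]))
    if alpha is not None:
        rows = [row for row in rows if float(row[pval_col]) < alpha]
    return rows


ranked = rank_by_pval(io.StringIO(EXAMPLE_TABLE))
for row in ranked:
    print(row["id"], row["position"], row["kmer"], row[PVAL_COL])
```

On real output, open the file with ``open("out/diffmod.table", newline="")`` and pass the handle to ``rank_by_pval``, optionally with a threshold such as ``alpha=0.05``.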
4. (Optional) We can consider only one modification type per k-mer by finding the majority ``mod_assignment`` of each k-mer.
For example, for the majority of ``GGACT`` positions, the modification mean (``mu_mod``) is lower than its non-modification counterpart (``mu_unmod``).
This can be achieved by simply running ``groupby`` on the ``kmer`` and ``mod_assignment`` columns in Python.
We can then remove those positions whose ``mod_assignment`` is not in line with the majority in order to restrict ourselves to one modification type per kmer in the analysis.
You can find more details in our paper.
This can be done by running ``xpore-postprocessing``.
We can filter out those positions whose ``mod_assignment`` values are not in line with the majority in order to restrict ourselves to one modification type per k-mer in the analysis.
This can be done by running ``xpore postprocessing``.

::

xpore-postprocessing --diffmod_dir out

With this command, we will get the final file in which only the positions whose ``mod_assignment`` differs from the majority assignment of the corresponding k-mer are removed. The output file ``majority_direction_kmer_diffmod.table`` is generated in the ``out`` directory.
xpore postprocessing --diffmod_dir out

With this command, we will get the final file in which only the positions whose ``mod_assignment`` differs from the majority assignment of the corresponding k-mer are removed. The output file ``majority_direction_kmer_diffmod.table`` is generated in the ``out`` directory. You can find more details in our paper.

Run ``xpore postprocessing -h`` or visit our :ref:`Command line arguments <cmd>` to explore the full usage description.
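The majority-direction filter described in step 4 can be sketched in plain Python to show the idea. This is an illustration, not the ``xpore postprocessing`` implementation: it counts ``mod_assignment`` values per ``kmer`` with ``collections.Counter`` and keeps only rows whose assignment matches their k-mer's majority. The ``lower``/``higher`` labels and the example rows are assumptions, and ties are not treated specially (the label counted first wins).

```python
from collections import Counter, defaultdict


def majority_direction_filter(rows):
    """Keep rows whose mod_assignment matches the majority for their k-mer.

    `rows` is an iterable of dicts with at least 'kmer' and 'mod_assignment'.
    """
    rows = list(rows)
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["kmer"]][row["mod_assignment"]] += 1
    # Majority label per k-mer; on a tie, the label counted first wins.
    majority = {kmer: c.most_common(1)[0][0] for kmer, c in counts.items()}
    return [row for row in rows if row["mod_assignment"] == majority[row["kmer"]]]


# Invented example rows; real input would be read from out/diffmod.table.
rows = [
    {"kmer": "GGACT", "position": "100", "mod_assignment": "lower"},
    {"kmer": "GGACT", "position": "250", "mod_assignment": "lower"},
    {"kmer": "GGACT", "position": "300", "mod_assignment": "higher"},
    {"kmer": "AGACA", "position": "42", "mod_assignment": "higher"},
]
kept = majority_direction_filter(rows)
print([r["position"] for r in kept])  # ['100', '250', '42']
```

Here position 300 is dropped because most ``GGACT`` rows are assigned ``lower``, which matches the behaviour described for ``majority_direction_kmer_diffmod.table``.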
2 changes: 1 addition & 1 deletion setup.py
@@ -14,7 +14,7 @@
name=__pkg_name__,
license="MIT",
description='xpore is a python package for Nanopore data analysis of differential RNA modifications.',
version='v1.1',
version='v2.0',
long_description=README,
long_description_content_type='text/markdown',
url='https://github.com/GoekeLab/xpore',
2 changes: 1 addition & 1 deletion xpore/__init__.py
@@ -1 +1 @@
__version__ = "1.1"
__version__ = "2.0"
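This commit bumps the version string in three places, and ``setup.py`` uses ``'v2.0'`` while ``conf.py`` and ``xpore/__init__.py`` use ``'2.0'``. A small consistency check could catch a missed bump. The sketch below is hypothetical (it is not part of the xPore repository): it extracts ``version``, ``release`` and ``__version__`` assignments with a regular expression, strips a leading ``v``, and compares them. The inline file contents are abbreviated stand-ins for the real files.

```python
import re

# Matches lines such as: version = '2.0', version='v2.0', __version__ = "2.0"
VERSION_RE = re.compile(
    r"""^\s*(?:__version__|version|release)\s*=\s*['"]v?([^'"]+)['"]""",
    re.MULTILINE,
)


def declared_versions(text):
    """Return every version string assigned in `text`, without a leading 'v'."""
    return VERSION_RE.findall(text)


def versions_consistent(sources):
    """True if all version assignments across the given file contents agree."""
    found = set()
    for text in sources:
        found.update(declared_versions(text))
    return len(found) == 1


# Abbreviated stand-ins for docs/source/conf.py, setup.py and xpore/__init__.py.
conf_py = "version = '2.0'\nrelease = '2.0'\n"
setup_py = "setup(\n    name=__pkg_name__,\n    version='v2.0',\n)\n"
init_py = '__version__ = "2.0"\n'
print(versions_consistent([conf_py, setup_py, init_py]))
```

In practice the three strings would be read from disk with ``pathlib.Path(...).read_text()``; the regex approach avoids importing the package or executing ``setup.py``.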
