update version

GoekeLab · Jul 15, 2021 · 3e41ef0
1 parent a89e2bd

Showing 7 changed files with 55 additions and 49 deletions.
diff --git a/docs/source/scripts.rst b/docs/source/cmd.rst
rename from docs/source/scripts.rst
rename to docs/source/cmd.rst
@@ -1,7 +1,7 @@
-.. _scripts:
+.. _cmd:
 
-Scripts
-==========
+Command line arguments
+=======================
 
 We provide the following scripts to run the analysis of differential RNA modifications.
 
@@ -10,27 +10,20 @@ We provide 2 main scripts to run the analysis of differential RNA modifications
 
 * Input
 
-Output files from `nanopolish eventalgin`. Please refer to :ref:`Data preparation <preparation>` for the full Nanopolish command.
-
-* Usage example
+Output files from ``nanopolish eventalign``. Please refer to :ref:`Data preparation <preparation>` for the full Nanopolish command.
 
 =================================   ==========  ===================  ============================================================================================================
-Argument name(s)                    Required    Default value         Description
+Argument name                       Required    Default value         Description
 =================================   ==========  ===================  ============================================================================================================
 --eventalign=FILE                   Yes         NA                    Eventalign filepath, the output from nanopolish.         
---summary=FILE                      Yes         NA                    Eventalign summary filepath, the output from nanopolish.
 --out_dir=DIR                       Yes         NA                    Output directory.
---ensembl=NUM                       No          91                    Ensembl version for gene-transcript mapping.
---species=STR                       No          homo_sapiens          Species for ensembl gene-transcript mapping.
---customised_genome                 No          False                 If customised genome provided.
---reference_name                    No          NA                    fasta reference name.
---annotation_name                   No          NA                    gtf annotation name.
---gtf_path_or_url                   No          NA                    gtf file path or url.
---transcript_fasta_paths_or_urls    No          NA                    Transcript fasta paths or urls.
+--gtf_path_or_url                   No          NA                    GTF file path or url used for mapping transcriptomic to genomic coordinates.
+--transcript_fasta_paths_or_urls    No          NA                    Transcript FASTA paths or urls used for mapping transcriptomic to genomic coordinates.
 --skip_eventalign_indexing          No          False                 To skip indexing the eventalign nanopolish output.
 --genome                            No          False                 To run on Genomic coordinates. Without this argument, the program will run on transcriptomic coordinates.
 --n_processes=NUM                   No          1                     Number of processes to run.
 --readcount_max=NUM                 No          1000                  Maximum read counts per gene.
+--readcount_min=NUM                 No          1                     Minimum read counts per gene.
 --resume                            No          False                 With this argument, the program will resume from the previous run.
 =================================   ==========  ===================  ============================================================================================================
 
@@ -51,14 +44,12 @@ data.readcount          csv             Summary of readcounts per gene.
 
 * Input
 
-Output files from `xpore-dataprep`.
-
-* Usage example
+Output files from ``xpore-dataprep``.
 
 ===================  ==========  ===============      ==============================================================================
-Argument name(s)      Required    Default value       Description
+Argument name         Required    Default value       Description
 ===================  ==========  ===============      ==============================================================================
---config=FILE           Yes         NA                Yaml configuraion filepath.
+--config=FILE           Yes         NA                YAML configuration filepath.
 --n_processes=NUM       No          1                 Number of processes to run.
 --save_models           No          False             With this argument, the program will save the model parameters for each id.
 --resume                No          False             With this argument, the program will resume from the previous run.
@@ -73,4 +64,17 @@ File name                File type           Description
 diffmod.table            csv                 Output table information of differential modification rates. Please refer to :ref:`Output table description <outputtable>` for the full description.   
 diffmod.log              txt                 Gene/Transcript ids being processed.
 ======================  ===============     =================================================================================================================================================
-
+
+``xpore-postprocessing``
+**************************
+
+* Input
+
+The ``diffmod.table`` file from ``xpore-diffmod``.
+
+======================  ===============     =======================================================================
+Argument name            Required           Description
+======================  ===============     =======================================================================
+--diffmod_dir            Yes                Path of the directory containing ``diffmod.table``.
+======================  ===============     =======================================================================
+
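The argument tables above map onto a subcommand-style command line (``xpore dataprep``, ``xpore diffmod``, ``xpore postprocessing``, as used in the quickstart). The following is a hypothetical ``argparse`` sketch for illustration only. It is not xPore's actual parser. The flag names and defaults come from the tables, while the ``build_parser`` name, the ``type=int`` conversions, the metavars, and treating ``--transcript_fasta_paths_or_urls`` as a single string are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments (not xPore's own code)."""
    parser = argparse.ArgumentParser(prog="xpore")
    # Mirrors ``xpore -v``; "2.0" is the version set in this commit.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 2.0")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # xpore dataprep: required inputs first, then optional switches with table defaults.
    dataprep = subparsers.add_parser("dataprep")
    dataprep.add_argument("--eventalign", required=True, metavar="FILE")
    dataprep.add_argument("--out_dir", required=True, metavar="DIR")
    dataprep.add_argument("--gtf_path_or_url")  # assumed: a single path or URL string
    dataprep.add_argument("--transcript_fasta_paths_or_urls")  # assumed: a single string
    dataprep.add_argument("--skip_eventalign_indexing", action="store_true")
    dataprep.add_argument("--genome", action="store_true")
    dataprep.add_argument("--n_processes", type=int, default=1)
    dataprep.add_argument("--readcount_max", type=int, default=1000)
    dataprep.add_argument("--readcount_min", type=int, default=1)
    dataprep.add_argument("--resume", action="store_true")

    # xpore diffmod: driven by the YAML configuration file.
    diffmod = subparsers.add_parser("diffmod")
    diffmod.add_argument("--config", required=True, metavar="FILE")
    diffmod.add_argument("--n_processes", type=int, default=1)
    diffmod.add_argument("--save_models", action="store_true")
    diffmod.add_argument("--resume", action="store_true")

    # xpore postprocessing: only needs the directory holding diffmod.table.
    post = subparsers.add_parser("postprocessing")
    post.add_argument("--diffmod_dir", required=True)
    return parser
```

For example, ``build_parser().parse_args(["postprocessing", "--diffmod_dir", "out"])`` returns a namespace with ``diffmod_dir == "out"``. Boolean switches such as ``--genome`` and ``--resume`` use ``store_true``, which matches the ``False`` defaults listed in the tables.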
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -26,9 +26,9 @@
 author = 'Ploy N. Pratanwanich'
 
 # The short X.Y version
-version = '1.1'
+version = '2.0'
 # The full version, including alpha/beta/rc tags
-release = '1.1'
+release = '2.0'
 
 
 # -- General configuration ---------------------------------------------------

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -5,15 +5,18 @@
 
 Welcome to xPore's documentation!
 =================================
-xPore is a Python package for detection of differentail RNA modifications from Nanopore sequencing data.
+xPore is a Python package for the identification of differential RNA modifications from Nanopore sequencing data.
 
 To install the latest release, run::
 
     pip install xpore
-    pyensembl install --release 91 --species homo_sapiens  # Please specify the compatible Ensembl release with your data when you install it.
 
 See our :ref:`Installation page <installation>` for details.
 
+To check the version of xPore, run::
+
+    xpore -v
+
 To detect differential modifications, you can follow the instructions in our :ref:`Quickstart page <quickstart>`. 
 
 Contents
@@ -27,11 +30,11 @@ Contents
    configuration
    preparation
    data
-   scripts
+   cmd
 
 Contacts
 --------
-xPore is maintained by `Ploy N. Pratanwanich <https://github.com/ploy-rukawa>`_ and `Jonathan Goeke <https://github.com/jonathangoeke>`_ from the Genome Institute of Singapore, A*STAR. 
+xPore is maintained by `Ploy N. Pratanwanich <https://github.com/ploy-np>`_ and `Jonathan Goeke <https://github.com/jonathangoeke>`_ from the Genome Institute of Singapore, A*STAR. 
 
 If you want to contribute, please open an issue in `our repo <https://github.com/GoekeLab/xpore>`_.
 

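The new ``xpore -v`` command reports the version from the command line. From Python, you can read the version recorded in the installed package metadata with the standard library's ``importlib.metadata`` (Python 3.8+). Below is a minimal sketch. The helper name ``installed_version`` is hypothetical. Also note that the metadata version comes from ``setup()``, so it is not guaranteed to be the same string as ``xpore.__version__``.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "xpore") -> Optional[str]:
    """Return the installed distribution version of ``package``, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```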
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -10,7 +10,6 @@ PyPI installation (recommended)
 ::
 
     pip install xpore
-    pyensembl install --release 91 --species homo_sapiens  # please specify the compatible Ensembl release with your data when you install it.
 
 Installation from our GitHub repository
 ---------------------------------------
@@ -19,4 +18,4 @@ Installation from our GitHub repository
     git clone https://github.com/GoekeLab/xpore.git
     cd xpore
     python setup.py install
-    pyensembl install --release 91 --species homo_sapiens  # please specify the compatible Ensembl release with your data when you install it.
+
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -3,7 +3,7 @@
 Quickstart - Detection of differential RNA modifications
 =========================================================
 
-Download and extract the demo dataset from our `S3 bucket <http://s3.ap-southeast-1.amazonaws.com/all-public-data.store.genome.sg/xpore/demo.tar.gz>`_::
+Download and extract the demo dataset from our `Zenodo <https://zenodo.org/record/5103099/files/demo.tar.gz>`_::
 
     wget https://zenodo.org/record/5103099/files/demo.tar.gz
     tar -xvf demo.tar.gz
@@ -15,17 +15,19 @@ After extraction, you will find::
     |-- data
         |-- HEK293T-METTL3-KO-rep1  # dataset dir
         |-- HEK293T-WT-rep1 # dataset dir
-    |-- demo.gtf
-    |-- demo.fa
+    |-- demo.gtf # gene transfer format (GTF) file for gene-transcript mapping
+    |-- demo.fa # transcriptome reference file for gene-transcript mapping
 
 Each dataset under the ``data`` directory contains the following directories:
 
-* ``fast5`` : Raw signal FAST5 files
-* ``fastq`` : Basecalled reads
-* ``bamtx`` : Transcriptome-aligned sequence
+* ``fast5`` : Raw signal, FAST5 files
+* ``fastq`` : Basecalled reads, FASTQ file
+* ``bamtx`` : Transcriptome-aligned sequence, BAM file
 * ``nanopolish``: Eventalign files obtained from `nanopolish eventalign <https://nanopolish.readthedocs.io/en/latest/quickstart_eventalign.html>`_
 
-1. Preprocess the data for each data set using ``xpore-dataprep``. (This step will take approximately 5h for 1 million reads)::
+Note that the FAST5, FASTQ and BAM files are only needed to generate the eventalign file with Nanopolish; xPore itself requires only the eventalign file. See our :ref:`Data preparation page <preparation>` for details on obtaining the eventalign file from raw reads.
+
+1. Preprocess the data for each data set using ``xpore dataprep``. Note that when the ``--genome`` option is chosen, the ``--gtf_path_or_url`` and ``--transcript_fasta_paths_or_urls`` arguments are required to map transcriptomic coordinates to genomic coordinates. (This step takes approximately 5h for 1 million reads)::
 
     # Within each dataset directory i.e. demo/data/HEK293T-METTL3-KO-rep1 and demo/data/HEK293T-WT-rep1, run
     xpore dataprep \
@@ -43,7 +45,7 @@ The output files are stored under ``dataprep`` in each  dataset directory:
 * ``data.readcount`` : Summary of readcounts per gene
 * ``data.log`` : Log file
 
-Run ``xpore-dataprep -h`` to explore the full usage.
+Run ``xpore dataprep -h`` or see the :ref:`Command line arguments page <cmd>` for the full usage description.
 
 2. Prepare a ``.yml`` configuration file. With this YAML file, you can specify your experimental design, the data directories, the output directory, and the method options.
 In the demo directory, there is an example configuration file ``Hek293T_config.yaml`` available that you can use as a starting template.
@@ -60,21 +62,21 @@ Below is how it looks like::
     out: ./out # output dir
 
 
-See the :ref:`Configuration file page <configuration>` for details.
+See the :ref:`Configuration file page <configuration>` for more details.
 
 3. Now that the data and the configuration file are ready, we can model differential modifications using ``xpore diffmod``.
 
 ::
 
     # At the demo directory where the configuration file is, run.
-    xpore-diffmod --config Hek293T_config.yml
+    xpore diffmod --config Hek293T_config.yml
 
 The output files are generated within the ``out`` directory:
 
 * ``diffmod.table`` : Result table of differential RNA modification across all tested positions
 * ``diffmod.log`` : Log file
 
-Run ``xpore-diffmod -h`` to explore the full usage.
+Run ``xpore diffmod -h`` or see the :ref:`Command line arguments page <cmd>` for the full usage description.
 
 We can rank the significantly differentially modified sites based on ``pval_HEK293T-KO_vs_HEK293T-WT``. The results are shown below::
 
@@ -87,15 +89,13 @@ We can rank the significantly differentially modified sites based on ``pval_HEK2
 
 4. (Optional) We can consider only one modification type per k-mer by finding the majority ``mod_assignment`` of each k-mer. 
 For example, the majority of the modification means of ``GGACT`` (``mu_mod``) is lower than the non-modification counterpart (``mu_unmod``). 
-This can be achieved by simply running ``groupby`` on the ``kmer`` and ``mod_assignment`` columns in Python.
-We can then remove those positions with the ``mod_assigment`` not in line with the majority in order to restrict ourselves with one modification type per kmer in the analysis.
-You can find more details in our paper.
-This can be done by running ``xpore-postprocessing``.
+We can filter out positions whose ``mod_assignment`` differs from the majority assignment of their k-mer, restricting the analysis to one modification type per k-mer.
+This can be done by running ``xpore postprocessing``.
 
 ::
 
-    xpore-postprocessing --diffmod_dir out
-
-With this command, we will get the final file in which only kmers with their ``mod_assignment`` different from the majority assigment of the corresponding kmer are removed. The output file ``majority_direction_kmer_diffmod.table`` is generated in the ``out`` directtory. 
+    xpore postprocessing --diffmod_dir out
 
+This command produces the final table, from which positions whose ``mod_assignment`` differs from the majority assignment of the corresponding k-mer are removed. The output file ``majority_direction_kmer_diffmod.table`` is generated in the ``out`` directory. You can find more details in our paper.
 
+Run ``xpore postprocessing -h`` or see the :ref:`Command line arguments page <cmd>` for the full usage description.
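The text removed from step 4 described the majority-direction filter as a ``groupby`` on the ``kmer`` and ``mod_assignment`` columns. Below is a minimal standard-library sketch of that idea. It also ranks rows by a p-value column, as in step 3. The sketch assumes ``diffmod.table`` is a CSV file with ``kmer``, ``mod_assignment`` and p-value columns, as described above. The function names are hypothetical, and this is not the implementation behind ``xpore postprocessing``, whose output may differ (for example in how ties are handled).

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, str]


def read_diffmod_table(path: str) -> List[Row]:
    """Read a comma-separated diffmod.table into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def rank_by_pvalue(rows: Iterable[Row], pval_column: str) -> List[Row]:
    """Sort rows by ascending p-value, most significant first."""
    return sorted(rows, key=lambda row: float(row[pval_column]))


def filter_majority_direction(rows: List[Row]) -> List[Row]:
    """Keep only rows whose mod_assignment equals the majority assignment of their k-mer.

    Ties are broken in favour of the assignment seen first for that k-mer
    (Counter.most_common keeps first-encountered order); this is an assumption.
    """
    counts: Dict[str, Counter] = {}
    for row in rows:
        # Count how often each mod_assignment value occurs per k-mer.
        counts.setdefault(row["kmer"], Counter())[row["mod_assignment"]] += 1
    majority = {kmer: counter.most_common(1)[0][0] for kmer, counter in counts.items()}
    return [row for row in rows if row["mod_assignment"] == majority[row["kmer"]]]
```

A typical use would be ``rank_by_pvalue(filter_majority_direction(read_diffmod_table("out/diffmod.table")), "pval_HEK293T-KO_vs_HEK293T-WT")``. With pandas installed, the same filter can be written with ``DataFrame.groupby``.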
diff --git a/setup.py b/setup.py
@@ -14,7 +14,7 @@
     name=__pkg_name__,
     license="MIT",
     description='xpore is a python package for Nanopore data analysis of differential RNA modifications.',
-    version='v1.1',
+    version='v2.0',
     long_description=README,
     long_description_content_type='text/markdown',
     url='https://github.com/GoekeLab/xpore',

diff --git a/xpore/__init__.py b/xpore/__init__.py
@@ -1 +1 @@
-__version__ = "1.1"
+__version__ = "2.0"