Made new ETL and added logic for incorporating more caps in training
adnaniazi committed Aug 9, 2024
1 parent 2f9616f commit fa44e3f
Showing 23 changed files with 2,378 additions and 220 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/cookiecutter.yml
@@ -12,7 +12,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.9
python-version: 3.12

- name: Install dependencies
run: python -m pip install cruft poetry jello tabulate
19 changes: 18 additions & 1 deletion .vscode/settings.json
@@ -15,5 +15,22 @@
"workbench.tree.indent": 15,
"workbench.tree.renderIndentGuides": "always",
"workbench.colorCustomizations": {
"tree.indentGuidesStroke": "#05ef3c"}
"tree.indentGuidesStroke": "#05ef3c"
},
"[markdown]": {
"editor.tabSize": 4,
"editor.insertSpaces": true,
"editor.detectIndentation": false,
"editor.rulers": [
120
],
"editor.wordWrap": "wordWrapColumn",
"editor.wordWrapColumn": 120
},
"files.associations": {
"*.md": "markdown"
},
"editor.formatOnSave": true,
"editor.formatOnPaste": true,
"editor.formatOnType": true
}
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -4,6 +4,9 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
- Added ability to add more cap types to training
- Added a new train ETL pipeline that can handle larger than memory datasets
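
The larger-than-memory entry above rests on a general idea: stream the data in fixed-size batches so that only one batch is resident at a time. The Python sketch below illustrates that idea on a hypothetical CSV of `read_id,cap` rows. It is not Capfinder's ETL code; the function names and the file layout are assumptions made for illustration.

```python
import csv
import io
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TextIO, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `batch_size` items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_caps_streaming(handle: TextIO, batch_size: int = 10_000) -> dict[str, int]:
    """Count rows per cap label in a hypothetical `read_id,cap` CSV.

    csv.reader pulls rows from the handle on demand, so at most one batch of rows
    is held in memory at a time, regardless of the file's total size.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    counts: dict[str, int] = {}
    for batch in iter_batches(reader, batch_size):
        for _read_id, cap in batch:
            counts[cap] = counts.get(cap, 0) + 1
    return counts


if __name__ == "__main__":
    # In-memory stand-in for a large file opened with open(path, newline="").
    demo = io.StringIO("read_id,cap\nr1,cap0\nr2,cap1\nr3,cap0\n")
    print(count_caps_streaming(demo, batch_size=2))  # {'cap0': 2, 'cap1': 1}
```

A real training pipeline would transform or write each batch instead of only counting, but the memory bound comes from the same lazy, batch-at-a-time iteration.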

## [0.3.3] - 2024-08-08
### Fixed
@@ -139,4 +142,3 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
[0.1.2]: https://github.com/adnaniazi/capfinder/compare/0.1.1...0.1.2
[0.1.1]: https://pypi.org/manage/project/capfinder/release/0.1.1/
[0.1.0]: https://pypi.org/manage/project/capfinder/release/0.1.0/

134 changes: 28 additions & 106 deletions README.md
@@ -1,4 +1,4 @@
# capfinder
# Capfinder - A Tool for mRNA Cap type Prediction

[![PyPI](https://img.shields.io/pypi/v/capfinder?style=flat-square)](https://pypi.python.org/pypi/capfinder/)
[![PyPi Downloads](https://img.shields.io/pypi/dm/capfinder)](https://pypistats.org/packages/capfinder)
@@ -17,130 +17,52 @@

---

A package for decoding RNA cap types
Capfinder is a tool for predicting RNA cap types in mRNAs sequenced using Oxford Nanopore Technologies (ONT) RNA004 chemistry. It analyzes native RNA sequencing data to determine the cap structure of individual transcripts.

# Installing Capfinder
### Supported Cap Types
Currently, Capfinder can predict the following cap types:

## 1. Installing and activating a new Python Environment
Please create a fresh conda/micromamba environment with a supported Python version:
```sh
micromamba create -n capfinder_env python=3.12
```
Next, activate the newly created conda env:
```sh
micromamba activate capfinder_env
```

## 2. Installing Capfinder package
- Cap0
- Cap1
- Cap2
- Cap2,-1

### CPU installation
### Requirements for mRNA data
For Capfinder to work correctly, the m7G moiety must first be removed from the 5' end of each mRNA (decapping).
The following 52-nucleotide oligonucleotide extension (OTE) must be ligated to the 5' end of each mRNA molecule:
```sh
pip install capfinder[cpu]
```

### GPU installation (CUDA 12)
```sh
pip install capfinder[gpu] "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
Capfinder uses JAX internally for GPU support, and JAX requires CUDA, so Capfinder's CUDA requirements are the same as JAX's.
See the [JAX installation guide](https://jax.readthedocs.io/en/latest/installation.html) for the required CUDA version.

### TPU installation
```sh
pip install capfinder[tpu] "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
```

# 3. Updating Capfinder

If you are using an older version of Capfinder and would like to upgrade to the latest version, run the following in your active Python environment:
### Updating Capfinder on CPU-based system
```sh
pip install capfinder[cpu]
```

### Updating Capfinder on GPU-based system
```sh
pip install capfinder[gpu]
```

### Updating Capfinder on TPU-based system
```sh
pip install capfinder[tpu]
```


## Development

* Clone this repository
* Requirements:
* [Poetry](https://python-poetry.org/)
* Python 3.7+
* Create a virtual environment and install the dependencies

### CPU installation
```sh
poetry install --extras cpu
```

### GPU installation (CUDA 12)
```sh
poetry install --extras gpu
poetry run pip install "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

### TPU installation
```sh
poetry install --extras tpu
poetry run pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
5'-GCTTTCGTTCGTCTCCGGACTTATCGCACCACCTATCCATCATCAGTACTGT-3'
```
---
# Installing Capfinder

* Activate the virtual environment
### 1. Make and activate a new Python Environment
- [Creating new environment](docs/environment.md)

```sh
poetry shell
```

### Testing

```sh
pytest
```
### 2. Install Capfinder package

### Documentation
- [Installation](docs/installation.md)

The documentation is automatically generated from the content of the [docs directory](./docs) and from the docstrings
of the public signatures of the source code. The documentation is updated and published as a [Github project page
](https://pages.github.com/) automatically as part of each release.
---
# Usage

### Releasing
### 1. Preprocessing: Basecalling and alignment

Trigger the [Draft release workflow](https://github.com/adnaniazi/capfinder/actions/workflows/draft_release.yml)
(press _Run workflow_). This will update the changelog & version and create a GitHub release which is in _Draft_ state.
- [Data Preprocessing](docs/preprocessing.md)

Find the draft release from the
[GitHub releases](https://github.com/adnaniazi/capfinder/releases) and publish it. When
a release is published, it'll trigger [release](https://github.com/adnaniazi/capfinder/blob/master/.github/workflows/release.yml) workflow which creates PyPI
release and deploys updated documentation.

### Pre-commit
### 2. Predicting Cap Types with Capfinder

Pre-commit hooks run all the auto-formatters (e.g. `black`, `isort`), linters (e.g. `mypy`, `flake8`), and other quality
checks to make sure the changeset is in good shape before a commit/push happens.
- [Usage Guide](docs/prediction.md)

You can install the hooks with (runs for each commit):

```sh
pre-commit install
```
# Updating Capfinder

Or if you want them to run only for each push:
- [Updating Capfinder](docs/updating.md)

```sh
pre-commit install -t pre-push
```

Or, to run all checks manually on all files:
## Development

```sh
pre-commit run --all-files
```
- [Development](docs/development.md)
78 changes: 78 additions & 0 deletions docs/development.md
@@ -0,0 +1,78 @@
* Clone this repository
* Requirements:
    * [Poetry](https://python-poetry.org/)
    * Python 3.10 - 3.12
* Create a virtual environment and install the dependencies

### Creating a dev environment
```sh
micromamba create -n capfinder_env python=3.12
micromamba activate capfinder_env
```

### Installation

=== "CPU"

```sh
poetry install --extras cpu
```

=== "GPU (CUDA 12)"

```sh
poetry install --extras gpu
poetry run pip install "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

=== "TPU"

```sh
poetry install --extras tpu
poetry run pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
```
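
After installing one of the variants above, it can help to confirm which backend JAX actually picked up, since a GPU or TPU install that cannot find a compatible runtime may fall back to the CPU. The Python sketch below is a hypothetical helper, not part of Capfinder; it relies only on JAX's public `jax.default_backend()` and `jax.devices()` functions and still runs when JAX is absent.

```python
import importlib.util
from typing import Optional


def detect_jax_backend() -> Optional[str]:
    """Return JAX's default backend name, or None when JAX is not installed."""
    if importlib.util.find_spec("jax") is None:
        return None
    import jax  # imported lazily so the check also runs in environments without JAX

    # Typically "cpu", "gpu" or "tpu", depending on which jaxlib/plugins are installed.
    return jax.default_backend()


if __name__ == "__main__":
    backend = detect_jax_backend()
    if backend is None:
        print("JAX is not installed in this environment.")
    else:
        import jax

        print(f"JAX default backend: {backend}")
        print(f"Visible devices: {jax.devices()}")
```

Saved as, say, `check_jax.py` (a hypothetical file name), it can be run with `poetry run python check_jax.py`. On a correctly configured CUDA machine the backend should typically be reported as `gpu`; seeing `cpu` there suggests JAX could not use the GPU.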

### Testing

```sh
pytest
```

### Documentation

The documentation is automatically generated from the content of the [docs directory](./docs) and from the docstrings
of the public signatures of the source code. It is updated and published as a
[GitHub project page](https://pages.github.com/) automatically as part of each release.

### Releasing

Trigger the [Draft release workflow](https://github.com/adnaniazi/capfinder/actions/workflows/draft_release.yml)
(press _Run workflow_). This updates the changelog and version and creates a GitHub release in the _Draft_ state.

Find the draft release on the
[GitHub releases](https://github.com/adnaniazi/capfinder/releases) page and publish it. Publishing a release triggers the
[release](https://github.com/adnaniazi/capfinder/blob/master/.github/workflows/release.yml) workflow, which creates a PyPI
release and deploys the updated documentation.

### Pre-commit

Pre-commit hooks run all the auto-formatters (e.g. `black`, `isort`), linters (e.g. `mypy`, `flake8`), and other quality
checks to make sure the changeset is in good shape before a commit/push happens.

Install the hooks so that they run on every commit:

```sh
pre-commit install
```

Or, to run them only on each push:

```sh
pre-commit install -t pre-push
```

Or, to run all checks manually on all files:

```sh
pre-commit run --all-files
```
31 changes: 30 additions & 1 deletion docs/index.md
@@ -1 +1,30 @@
--8<-- "README.md"
# Capfinder - A tool for mRNA Cap Type Prediction

[![PyPI](https://img.shields.io/pypi/v/capfinder?style=flat-square)](https://pypi.python.org/pypi/capfinder/)
[![PyPi Downloads](https://img.shields.io/pypi/dm/capfinder)](https://pypistats.org/packages/capfinder)
[![CI/CD](https://github.com/adnaniazi/capfinder/actions/workflows/release.yml/badge.svg)](https://github.com/adnaniazi/capfinder/actions/workflows/release.yml)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/capfinder?style=flat-square)](https://pypi.python.org/pypi/capfinder/)
[![PyPI - License](https://img.shields.io/pypi/l/capfinder?style=flat-square)](https://pypi.python.org/pypi/capfinder/)

---

Capfinder is a specialized tool designed for predicting RNA cap types in mRNAs sequenced using Oxford Nanopore Technologies (ONT) SQK-RNA004 chemistry. By analyzing native RNA sequencing data, Capfinder can determine the cap structure of individual transcripts with high accuracy.

### Supported Cap Types

Capfinder currently supports the prediction of the following cap types:

1. Cap0
2. Cap1
3. Cap2
4. Cap2,-1

### mRNA Sample Preparation Requirements
To ensure optimal performance of Capfinder, mRNA samples must be prepared according to the following specifications:

- **Decapping:** The m7G moiety at the 5' end of the mRNA must be removed (decapping process).
- **Oligonucleotide Extension (OTE):** A specific 52-nucleotide sequence must be ligated to the 5' end of each mRNA molecule. The OTE sequence is as follows:
```text
5'-GCTTTCGTTCGTCTCCGGACTTATCGCACCACCTATCCATCATCAGTACTGT-3'
```
- **Sequencing:** Samples should be sequenced using Oxford Nanopore Technologies (ONT) SQK-RNA004 chemistry.
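
When the OTE is copied into an oligo order or a reference file, a transcription error is easy to miss. The Python sketch below is a hypothetical helper, not part of Capfinder's API: it checks that a pasted sequence uses only the A/C/G/T alphabet shown above and has the documented length of 52 nucleotides.

```python
OTE = "GCTTTCGTTCGTCTCCGGACTTATCGCACCACCTATCCATCATCAGTACTGT"  # 5'->3', as documented above


def validate_ote(seq: str, expected_length: int = 52) -> None:
    """Raise ValueError if `seq` contains non-ACGT characters or has the wrong length."""
    invalid = set(seq.upper()) - set("ACGT")
    if invalid:
        raise ValueError(f"non-ACGT characters found: {sorted(invalid)}")
    if len(seq) != expected_length:
        raise ValueError(f"expected {expected_length} nt, got {len(seq)} nt")


if __name__ == "__main__":
    validate_ote(OTE)  # passes silently for the documented sequence
    print(len(OTE))  # 52
```

This only guards against copy errors; it says nothing about whether the ligation itself succeeded in a given sample.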