<img src="https://raw.githubusercontent.com/MasaakiU/MultiplexNanopore/master/resources/logo/SAVEMONEY_logo_with_letter_batch.png" width="600">

*Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You!*

# Overview of SAVEMONEY

SAVEMONEY guides researchers in mixing multiple plasmids for submission as a single sample to a commercial long-read sequencing service (e.g., Oxford Nanopore Technologies sequencing), reducing overall sequencing costs while maintaining the fidelity of sequencing results. The procedure is outlined below:

- <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step1">**Step 1. Pre-survey**</a> takes plasmid maps as inputs and provides users with optimal groupings of plasmids.
- <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step2">**Step 2. Submit samples**</a> according to the output of the pre-survey.
- <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step3">**Step 3. Post-analysis**</a> executes a computational deconvolution of the obtained results, and generates a consensus sequence for each plasmid.
- <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step4">**Step 4. Visualization of results (optional)**</a> provides a platform for the detailed examination of the alignments and consensus sequences generated in the post-analysis.

For more information, please see [SAVEMONEY main notebook](https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb) and [SAVEMONEY GitHub](https://github.com/MasaakiU/MultiplexNanopore).

# What is SAVEMONEY BATCH?

SAVEMONEY BATCH is designed to execute <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step3">Step 3</a> for multiple samples at once. You need to provide:

- the `recommended_grouping.txt` file from <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step1">Step 1</a>
- the plasmid maps (`*.fasta`, `*.fa`, or `*.dna` files) used in <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step1">Step 1</a>
- the raw sequencing result files (one `*.fastq` file per sample) obtained after <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step2">Step 2</a>

SAVEMONEY BATCH will classify the plasmid maps and raw sequencing results according to the `recommended_grouping.txt` file, then execute <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step3">Step 3</a> for each sample (i.e., each mixture of plasmids) in turn.
**It requires that samples be submitted based on the results of <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step1">Step 1</a>.**
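
As a rough illustration of that classification step, the sketch below reads the grouping layout that the notebook's parser expects (a `# recommended_grouping_txt(str)` marker line, `=== Group N ===` headers, and tab-separated `abbreviation<TAB>plasmid file` lines) and pairs each group with one `*.fastq` file. It is a minimal sketch, not SAVEMONEY's own code: `parse_recommended_grouping` and `pair_with_fastq` are hypothetical helpers, and the pairing assumes the same default the notebook's dropdowns start from (groups in ascending order matched to FASTQ files sorted by name), which you can change in the notebook.

```python
import re

# Header lines look like "=== Group 1 ===".
GROUP_HEADER = re.compile(r"^=== Group (\d+) ===$")
MARKER = "# recommended_grouping_txt(str)"


def parse_recommended_grouping(text):
    """Return {group_number: {abbreviation: plasmid_file_name}} (hypothetical helper).

    Lines before the marker are ignored, blank lines are skipped, and the
    next line starting with "# " after the marker ends the grouping section.
    """
    groups = {}
    current = None
    recording = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == MARKER:
            recording = True
            continue
        if not recording or not line:
            continue
        if line.startswith("# "):
            break  # another section of the file begins here
        header = GROUP_HEADER.match(line)
        if header:
            current = groups.setdefault(int(header.group(1)), {})
        elif current is not None:
            abbreviation, plasmid_file = line.split("\t", 1)
            current[abbreviation] = plasmid_file
    return groups


def pair_with_fastq(groups, fastq_names):
    """Match groups (ascending) to FASTQ names (sorted), one file per group."""
    if len(groups) != len(fastq_names):
        raise ValueError("exactly one *.fastq file per group is required")
    return dict(zip(sorted(groups), sorted(fastq_names)))


if __name__ == "__main__":
    example = (
        "# recommended_grouping_txt(str)\n"
        "=== Group 1 ===\n"
        "abb1\tplasmid_name_1.dna\n"
        "abb2\tplasmid_name_2.dna\n"
        "\n"
        "=== Group 2 ===\n"
        "abb3\tplasmid_name_3.dna\n"
    )
    groups = parse_recommended_grouping(example)
    print(groups)
    print(pair_with_fastq(groups, ["run_B.fastq", "run_A.fastq"]))
```

Running it prints `{1: {'abb1': 'plasmid_name_1.dna', 'abb2': 'plasmid_name_2.dna'}, 2: {'abb3': 'plasmid_name_3.dna'}}` followed by `{1: 'run_A.fastq', 2: 'run_B.fastq'}`. The count check mirrors the notebook's requirement that the number of `*.fastq` files equal the number of groups.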

---

We would appreciate it if you could cite our manuscript in any publications that use SAVEMONEY.<br>
[Uematsu, M. and Baskin, J.M. Barcode-free multiplex plasmid sequencing using Bayesian analysis and nanopore sequencing. *eLife*. **2023**; 12: RP88794](https://doi.org/10.7554/eLife.88794.1)

<a name="Execute"></a>
# Execute SAVEMONEY BATCH

Follow the steps below to execute the post-analysis. After all processes have completed, a `fastq_file_name.zip` file will appear in the `sample_data` directory for each `*.fastq` file. Right-click a zip file to download it to your local storage. Each zip file is also automatically uploaded to your Google Drive unless you uncheck the `save_to_google_drive` option below.
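
Outside Colab, the same packaging step can be reproduced with the standard library. The sketch below assumes, as the notebook does, that each result folder is named after its FASTQ file and that the archive should sit next to the input files; `zip_result_folder` is a hypothetical helper. `shutil.make_archive` returns the path of the archive it writes, so the helper does not need to rebuild the file name itself.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_result_folder(save_dir, out_dir):
    """Write the contents of save_dir to out_dir/<save_dir name>.zip (hypothetical helper)."""
    save_dir, out_dir = Path(save_dir), Path(out_dir)
    # root_dir=save_dir stores paths relative to the folder, so the archive
    # unpacks to the result files themselves rather than to a nested folder.
    archive = shutil.make_archive(str(out_dir / save_dir.name), "zip", root_dir=save_dir)
    return Path(archive)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in result folder; the file name and content are placeholders.
        result_dir = Path(tmp) / "run_A"
        result_dir.mkdir()
        (result_dir / "summary.txt").write_text("placeholder\n")
        zip_path = zip_result_folder(result_dir, tmp)
        with zipfile.ZipFile(zip_path) as zf:
            print(zip_path.name, zf.namelist())
```

Using the folder name as-is (rather than stripping a suffix) keeps archive names distinct even when FASTQ names contain extra dots.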

In [None]:
#@markdown <a name="Step3no1"></a>
#@markdown ## 1. Upload files
#@markdown The following three types of files must be uploaded to the `sample_data` directory, which you can open by clicking the folder icon on the left:
#@markdown - `recommended_grouping.txt` file from <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step1">Step 1</a>
#@markdown - the plasmid maps (`*.fasta`, `*.fa`, or `*.dna`) used in <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step1">Step 1</a>
#@markdown - the raw nanopore sequencing results (`*.fastq`) obtained after <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step2">Step 2</a>

#@markdown Only one `*.fastq` file per sample is accepted for SAVEMONEY BATCH.

#@markdown ## 2. Set combinations
#@markdown Click this cell and hit `Runtime` -> `Run the focused cell`.
#@markdown Dropdown menus will appear; for each group, select the `*.fastq` file of the corresponding sample.

import re
from IPython.display import display
from ipywidgets import Dropdown, VBox, HBox, Layout, interactive_output, Label, HTML
from pathlib import Path
pwd = Path('/content/sample_data/')
uploaded_recommended_grouping_path = pwd / "recommended_grouping.txt"
uploaded_fastq_files = sorted([path for path in pwd.glob("*.fastq")])
uploaded_refseq_files = sorted([path for path in pwd.glob("*.*") if path.suffix in (".dna", ".fa", ".fasta")])

if not uploaded_recommended_grouping_path.exists():
    raise Exception("Please upload `recommended_grouping.txt` file under the 'sample_data' directory!")
if not len(uploaded_fastq_files) > 0:
    raise Exception("Please upload `*.fastq` files under the 'sample_data' directory!")
if not len(uploaded_refseq_files) > 0:
    raise Exception("Please upload plasmid map files under the 'sample_data' directory!")


""" The script detects text with the following style in the `recommended_grouping.txt` file
# recommended_grouping_txt(str)
=== Group 1 ===
abb1	plasmid_name_1.dna
abb2	plasmid_name_2.dna
abb3	plasmid_name_3.dna

=== Group 2 ===
abb4	plasmid_name_4.dna
abb5	plasmid_name_5.dna
abb6	plasmid_name_6.dna
"""

# read recommended_grouping_txt
class RecommendedGrouping():
    def __init__(self):
        self.group_dict = {}
        keep_record = False
        new_group = ""
        group_number = 0
        with open(uploaded_recommended_grouping_path, "r") as f:
            lines = f.readlines()
            for l in lines + ["=== Group 0 ==="]:
                if l.strip() == "# recommended_grouping_txt(str)":
                    keep_record = True
                elif keep_record:
                    if l.startswith("="):
                        self.group_dict[group_number] = new_group
                        group_number = int(re.match(r"=== Group ([0-9]+) ===", l.strip()).group(1))
                        new_group = Group(group_number)
                    elif l.strip() == "":
                        pass
                    elif l.startswith("# "):
                        self.group_dict[group_number] = new_group
                        keep_record = False
                    else:
                        new_group.add(*l.strip().split("\t"))
    def assert_names_abbreviations(self):
        names = []
        abbreviations = []
        for group in self.group_dict.values():
            if group == "":
                continue
            names.extend(list(group.group_contents.values()))
            abbreviations.extend(list(group.group_contents.keys()))
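        assert len(names) == len(set(names)), "Duplicate plasmid names found in recommended_grouping.txt"
        assert len(abbreviations) == len(set(abbreviations)), "Duplicate abbreviations found in recommended_grouping.txt"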
    def __str__(self):
        return "\n".join([str(group) for group in self.group_dict.values()])
    def __len__(self):
        return len(self.group_dict) - 1 # the first entry (group 0) is a pseudo group

class Group():
    def __init__(self, group_number):
        self.group_number = group_number
        self.group_contents = {}
    def add(self, abbreviation, plasmid_name):
        self.group_contents[abbreviation] = plasmid_name
    def __str__(self):
        return "\n".join(f"{k}\t{v}" for k, v in self.group_contents.items())

rg = RecommendedGrouping()
rg.assert_names_abbreviations()

if len(uploaded_fastq_files) != len(rg):
    raise Exception("The number of `*.fastq` files does not match the number of groups in the `recommended_grouping.txt` file!")

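child_widget_list = []
class DDDict(dict):
    """Maps each group number to its Dropdown and keeps the FASTQ selections unique:
    choosing a file already selected for another group swaps the two selections."""
    def __init__(self):
        super().__init__()
        self.ignore_event = False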
    def value_changed(self, change):
        if self.ignore_event:
            return
        else:
            old = change["old"]
            new = change["new"]
            owner  = change["owner"]
            detected_dd_list = self.search_dd_by_value(new)
            assert len(detected_dd_list) == 2
            if detected_dd_list[0] == owner:    another_dd = detected_dd_list[1]
            else:                               another_dd = detected_dd_list[0]

            # give the other dropdown the previous value, i.e., swap the two selections
            self.ignore_event = True
            another_dd.value = old
            self.ignore_event = False
    def search_dd_by_value(self, value):
        return [dd for group_number, dd in self.items() if dd.value == value]

dd_dict = DDDict()
nl = "\n"
for group_number, group in rg.group_dict.items():
    if group_number == 0:   # group 0 is a pseudo group
        continue
    # build the label, dropdown, and plasmid list for this group
    dd = Dropdown(options=[path.name for path in uploaded_fastq_files], value=uploaded_fastq_files[group_number - 1].name)
    hbox = HBox([Label(f"=== Group {group_number} ===", layout=Layout(width="100px")), dd])
    vbox = VBox([hbox, HTML(f'<div style="line-height: 1; padding-left: 10px; margin-bottom: 10px;">{str(group).replace(nl, "<br>")}</div>')])
    child_widget_list.append(vbox)
    dd_dict[group_number] = dd
    dd.observe(dd_dict.value_changed, names=["value"])

# display
display(VBox(children=child_widget_list))

In [None]:
#@markdown ## 3. Execute
#@markdown Click this cell and hit `Runtime` -> `Run after`. Set parameters below if necessary.
save_to_google_drive = True #@param {type:"boolean"}
if save_to_google_drive:
    from pydrive.drive import GoogleDrive
    from pydrive.auth import GoogleAuth
    from google.colab import auth
    from oauth2client.client import GoogleCredentials
    auth.authenticate_user()
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    drive = GoogleDrive(gauth)
    print("You are logged into Google Drive and are good to go!")

# params
gap_open_penalty = 3   #param {type:"integer"}
gap_extend_penalty = 1 #param {type:"integer"}
match_score = 1        #param {type:"integer"}
mismatch_score = -2    #param {type:"integer"}
param_dict = {i:globals()[i] for i in (
    'gap_open_penalty',
    'gap_extend_penalty',
    'match_score',
    'mismatch_score',
)}
score_threshold = 0.5  #@param {type:"number"}
param_dict["score_threshold"] = score_threshold

#@markdown <ul>
#@markdown <details><summary>Description</summary>
#@markdown The <code>score_threshold</code> (previously known as <code>threshold_post</code>) is a user-defined cutoff for short reads (recommended range: 0.2–0.8).
#@markdown The default value of 0.5 balances retaining a larger number of reads against requiring reads of sufficient length and quality.
#@markdown For example, if the total number of reads is small but the plasmids in the mixture are quite different from one another, the threshold can be lowered below 0.5 to increase the number of reads passed on to the next step of the analysis.
#@markdown Conversely, if the total number of reads is large and the plasmids in the mixture are highly similar to one another, the quality of the subsequent analysis can be improved by raising the threshold above 0.5.
#@markdown See also <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#Step3no5">Step 3-5. Set threshold for assignment</a>, and the "Summary distribution" and "Summary scatter" sections of <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#InterpretationOfResults">Interpretation of results (post-analysis)</a>.
#@markdown
#@markdown </details>
#@markdown </ul>
error_rate = 0.00001     #@param {type:"number"}
del_mut_rate = error_rate / 4     # e.g. "A -> T, C, G, del"
ins_rate   = 0.00001     #@param {type:"number"}
window   = 160          #@param {type:"number"}
param_dict["error_rate"] = error_rate
param_dict["del_mut_rate"] = del_mut_rate
param_dict["ins_rate"] = ins_rate
param_dict["window"] = window

#@markdown <ul>
#@markdown <details><summary>Description (<code>error_rate</code> and <code>ins_rate</code>)</summary>
#@markdown These variables represent the prior probability of errors that occur during the plasmid construction, such as PCR, ligation, and assembly.
#@markdown <ul><li>The <code>error_rate</code> represents the prior probability that a base is replaced by another base or deleted.
#@markdown For example, "A" may be mutated to "T", "C", or "G", or deleted ("–"). These four conversions are treated equally, i.e., the prior probability of conversion from "A" to "T" is a quarter of <code>error_rate</code>.
#@markdown </li>
#@markdown <li>The <code>ins_rate</code> represents the prior probability of insertion.
#@markdown For example, the conversion of the sequence "AT" to "ANT", where "N" represents one of "A", "T", "C", and "G".
#@markdown </li>
#@markdown </ul>
#@markdown Regardless of these prior probability values, results that do not consider them are always generated as well; this is equivalent to setting both <code>error_rate</code> and <code>ins_rate</code> to 0.8.
#@markdown See also "Consensus FASTQ files" section in <a href="https://colab.research.google.com/github/MasaakiU/MultiplexNanopore/blob/master/colab/MultiplexNanopore.ipynb#InterpretationOfResultsPost">Interpretation of results (post-analysis)</a>.
#@markdown </details>
#@markdown <details><summary>Description (<code>window</code>)</summary>
#@markdown The value represents the maximum detectable length of repetitive sequences when incorrect plasmid maps are provided. For example, if you suspect that an 80-nt region might be repeated twice in tandem, set the value to 160.
#@markdown Larger values make the process take longer to execute.
#@markdown </details>
#@markdown </ul>

#@markdown <font color='red'>**All processes are executed automatically.**</font>

# collect the file combination (FASTQ file + plasmid maps) for each group
grouping_list = [
    [dd.value] + list(rg.group_dict[group_number].group_contents.values())
    for group_number, dd in dd_dict.items()
]

# check again that no file appears twice within a combination
for grouping in grouping_list:
    assert len(grouping) == len(set(grouping))


print("installing dependencies...")
# import sys, os
# save_stdout =  sys.stdout
# sys.stdout = open(os.devnull, 'w')

!pip install savemoney==0.2.16
!wget -nc https://raw.githubusercontent.com/MasaakiU/MultiplexNanopore/master/resources/font/Arial.ttf
!wget -nc https://raw.githubusercontent.com/MasaakiU/MultiplexNanopore/master/resources/font/Arial\ Bold.ttf

# sys.stdout.close()
# sys.stdout = save_stdout
print("installation: DONE")

import shutil
import savemoney
import matplotlib.font_manager as fm
font_path = '/content/Arial.ttf'
font_path_bold = '/content/Arial Bold.ttf'
fm.fontManager.addfont(font_path)
fm.fontManager.addfont(font_path_bold)

# main loop: run the post-analysis for each group (sample) in turn
tmp_folder_path = pwd / "tmp_folder"
for grouping in grouping_list:
    # delete everything left in the working folder by the previous run
    if tmp_folder_path.exists():
        shutil.rmtree(tmp_folder_path)
    # create the working folder and copy this group's files into it
    tmp_folder_path.mkdir(exist_ok=True)
    for file_name in grouping:
        shutil.copy(pwd / file_name, tmp_folder_path / file_name)
    # copy intermediate results as well, if they exist
    ir_file_path = (pwd / grouping[0]).with_suffix(".intermediate_results.ir")
    if ir_file_path.exists():
        shutil.copy(ir_file_path, tmp_folder_path / ir_file_path.name)
    # run the post-analysis with the parameters collected in param_dict above
    save_dir = Path(savemoney.post_analysis(tmp_folder_path, pwd, **param_dict))
    # compress the results as sample_data/<result folder name>.zip
    zip_path = Path(shutil.make_archive(str(pwd / save_dir.name), 'zip', root_dir=save_dir))

    if save_to_google_drive:
        uploaded = drive.CreateFile({'title': zip_path.name})
        uploaded.SetContentFile(str(zip_path))
        uploaded.Upload()
        print(f"Uploaded {zip_path} to Google Drive with ID {uploaded.get('id')}")


# clean up the working folder
if tmp_folder_path.exists():
    shutil.rmtree(tmp_folder_path)




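The prior probabilities described in step 3 (`error_rate`, `del_mut_rate`, and `ins_rate`) can be made concrete with a few lines of arithmetic. This is a minimal sketch of one reading of that description, not SAVEMONEY's internal model: `substitution_priors` splits `error_rate` evenly over the four alternatives to a reference base (three other bases plus deletion, each receiving `error_rate / 4`, the quantity the notebook stores as `del_mut_rate`), and `insertion_priors` assumes, by analogy, that `ins_rate` is split evenly over the four bases that could be inserted. Under that assumption, setting both rates to 0.8 makes every outcome equally likely (probability 1/5), which is one way to read the statement that 0.8 corresponds to ignoring the priors.

```python
from fractions import Fraction

BASES = ("A", "C", "G", "T")


def substitution_priors(ref_base, error_rate):
    """Prior over what ref_base becomes: itself, another base, or a deletion ("-")."""
    priors = {ref_base: 1 - error_rate}
    alternatives = [base for base in BASES if base != ref_base] + ["-"]
    for outcome in alternatives:
        priors[outcome] = error_rate / 4  # the notebook's del_mut_rate
    return priors


def insertion_priors(ins_rate):
    """ASSUMPTION: no insertion ("") versus one inserted base, split evenly over A/C/G/T."""
    priors = {"": 1 - ins_rate}
    for base in BASES:
        priors[base] = ins_rate / 4
    return priors


if __name__ == "__main__":
    # Default prior from the notebook: each specific change of "A" gets 0.00001 / 4.
    print(substitution_priors("A", 0.00001))
    # Fractions keep the arithmetic exact: at 0.8, all five outcomes get 1/5.
    print(substitution_priors("A", Fraction(4, 5)))
    print(insertion_priors(Fraction(4, 5)))
```

With the default `error_rate` of 0.00001, the unchanged base keeps almost all of the prior mass, so the consensus only departs from the plasmid map when the reads provide strong evidence.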