# What I worked on this time

- To speed up computation, make `batch` mode run with multiple processes

## The usual setup

In [1]:
# Boilerplate to add the project root directory to the path
import sys, os
from pathlib import Path
if Path(os.getcwd()).stem != "DAJIN2":
    parent_path = str(Path(os.path.dirname(os.path.abspath("__file__"))).parent.parent)
    sys.path.append(parent_path)
    os.chdir(parent_path)

print(os.getcwd())
sys.path.append(os.getcwd() + "/" + "src")

/mnt/d/Research/DAJIN2


In [2]:
%%bash
pip uninstall -qy DAJIN2
# Update pip
# pip install -q -U pip
# pip install -q -U -r requirements.txt



# Experiment

In [28]:
%%bash
rm -rf DAJINResults/batch_tyr_50_10_01
rm -rf DAJINResults/.tempdir/batch_tyr_50_10_01

pip install -qe .
time DAJIN2 batch -f misc/data/design_batch_tyr_50_10_01.csv -t 3

misc/data/tyr_control.fq.gz is now processing...
misc/data/tyr_control.fq.gz is finished...
misc/data/tyr_albino_50%.fq.gz is now processing...
misc/data/tyr_albino_10%.fq.gz is now processing...
misc/data/tyr_albino_01%.fq.gz is now processing...
2023-04-28 09:00:35: Preprocess tyr_albino_01%...
2023-04-28 09:00:35: mapping tyr_albino_01%...
2023-04-28 09:00:35: Preprocess tyr_albino_10%...
2023-04-28 09:00:35: mapping tyr_albino_10%...
2023-04-28 09:00:36: Preprocess tyr_albino_50%...
2023-04-28 09:00:36: mapping tyr_albino_50%...
2023-04-28 09:00:45: midsv tyr_albino_50%...
2023-04-28 09:00:51: replace_NtoD tyr_albino_50%...
2023-04-28 09:00:54: midsv tyr_albino_01%...
2023-04-28 09:00:54: midsv tyr_albino_10%...
2023-04-28 09:00:55: extract_mutation_loci tyr_albino_50%...
2023-04-28 09:01:06: replace_NtoD tyr_albino_01%...
2023-04-28 09:01:06: replace_NtoD tyr_albino_10%...
2023-04-28 09:01:14: extract_mutation_loci tyr_albino_01%...
2023-04-28 09:01:14: extract_mutation_loci tyr_a

Finished! Open DAJINResults/batch_tyr_50_10_01 to see the report.

real	4m6.365s
user	6m38.423s
sys	0m25.141s


In [3]:
! pip install -qe .


- Running `midsv.read_jsonl` under multiprocessing raises an error
    - `Unterminated string starting at: line 1 column 82 (char 81)`
- The error does not occur every time
- Check whether it can be reproduced
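For reference, this is the message `json` produces when a line ends in the middle of a string value, which is what a reader would see if it opened a JSONL file that was only partially written. The sketch below only demonstrates the error class; the record is made up, and whether a partial write is the actual cause here is an assumption.

```python
import json

# A made-up JSONL record (not taken from the real data)
line = '{"QNAME": "read-1", "CSSPLIT": "=A,=C,=G"}'

# Cut the line inside the CSSPLIT string, as a partially written file would
truncated = line[:36]

try:
    json.loads(truncated)
    message = ""
except json.JSONDecodeError as e:
    message = str(e)

print(message)
```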

In [13]:
import multiprocessing
from itertools import islice

def _batched(iterable, chunk_size):
    iterator = iter(iterable)
    chunk = tuple(islice(iterator, chunk_size))
    while chunk:
        yield chunk
        chunk = tuple(islice(iterator, chunk_size))


def _run_multiprocess(function, arguments: list, num_workers: int = 1):
    arguments_batched = _batched(arguments, num_workers)
    for args in arguments_batched:
        jobs = []
        for arg in args:
            p = multiprocessing.Process(target=function, args=(arg,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()



In [15]:
num_workers = 3
targets = [1, 2, 3]
_run_multiprocess(print, targets, num_workers)

1
2
3


In [30]:
from pathlib import Path
import midsv
num_workers = 30
# targets = list(Path("DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/").glob("*.jsonl"))
targets = ["DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl"] * 30
def test_reads(target):
    print(f"{target} : {len(midsv.read_jsonl(target))}")
# _run_multiprocess(midsv.read_jsonl, targets, num_workers)
_run_multiprocess(test_reads, targets, num_workers)

DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000

DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000
DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl : 10000

DAJINResults/.tempdir/batch_tyr_50_10_01/midsv/tyr_albino_01%_control.jsonl 

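- `midsv.read_jsonl` itself does not seem to be the problem
- Still, the JSONL I/O is almost certainly involved, so I will minimize I/O and handle the data in Python memory as much as possible
    - Keep calls to `midsv.read_jsonl` and `midsv.write_jsonl` to the bare minimum
    - Return generators instead of lists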
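A minimal sketch of the "generator instead of list" idea, assuming one JSON object per line. `iter_jsonl` is a hypothetical helper written for illustration; it is not the `midsv` API.

```python
import json
from typing import Iterator


def iter_jsonl(path) -> Iterator[dict]:
    """Yield one record at a time instead of loading the whole file into a list."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Only one record is held in memory at a time, but the result can be iterated only once; callers that need several passes have to call `iter_jsonl` again or materialize the records with `list(...)`.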

In [32]:
%%bash
find src/DAJIN2 -type f | grep -v "past" | xargs grep "midsv.read_jsonl"

src/DAJIN2/core/preprocess/correct_knockin.py:        midsv_sample = midsv.read_jsonl(Path(TEMPDIR, "midsv", f"{SAMPLE_NAME}_{allele}.jsonl"))
src/DAJIN2/core/preprocess/correct_knockin.py:        midsv_control = midsv.read_jsonl(Path(TEMPDIR, "midsv", f"{CONTROL_NAME}_{allele}.jsonl"))
src/DAJIN2/core/preprocess/correct_revititive_deletions.py:        midsv_control = midsv.read_jsonl((Path(TEMPDIR, "midsv", f"{CONTROL_NAME}_{allele}.jsonl")))
src/DAJIN2/core/preprocess/correct_revititive_deletions.py:        midsv_sample = midsv.read_jsonl((Path(TEMPDIR, "midsv", f"{SAMPLE_NAME}_{allele}.jsonl")))
src/DAJIN2/core/preprocess/replace_NtoD.py:        midsv_sample = midsv.read_jsonl(Path(TEMPDIR, "midsv", f"{SAMPLE_NAME}_{allele}.jsonl"))
src/DAJIN2/core/preprocess/correct_sequence_error.py:        midsv_sample = midsv.read_jsonl((Path(TEMPDIR, "midsv", f"{SAMPLE_NAME}_{allele}.jsonl")))
src/DAJIN2/core/preprocess/correct_sequence_error.py:        midsv_control = midsv.read_jsonl((Path(TE

In [33]:
from __future__ import annotations

import sys, os
from pathlib import Path

import hashlib
from collections import defaultdict
from pathlib import Path
from importlib import reload

from src.DAJIN2.core import preprocess, classification, clustering, consensus, report

reload(preprocess)
reload(classification)
reload(clustering)
reload(consensus)
reload(report)

#### # * Subset of Point mutation
#### # 50 or 10 or 01%
percent = "50"
SAMPLE, CONTROL, ALLELE, NAME, GENOME, DEBUG, THREADS = (
    f"misc/data/tyr_albino_{percent}%.fq.gz",
    "misc/data/tyr_control.fq.gz",
    "misc/data/tyr_control.fasta",
    "batch_tyr_50_10_01",
    "mm10",
    True,
    30,
)


######################################################################
# Preprocessing
######################################################################

print(f"processing {NAME}...")

SAMPLE = preprocess.format_inputs.convert_to_posix_path(SAMPLE)
CONTROL = preprocess.format_inputs.convert_to_posix_path(CONTROL)
ALLELE = preprocess.format_inputs.convert_to_posix_path(ALLELE)

# ====================================================================
# Validate inputs
# ====================================================================

preprocess.validate_inputs.check_files(SAMPLE, CONTROL, ALLELE)
TEMPDIR = Path("DAJINResults", ".tempdir", NAME)
IS_CACHE_CONTROL = preprocess.validate_inputs.exists_cached_control(CONTROL, TEMPDIR)
IS_CACHE_GENOME = preprocess.validate_inputs.exists_cached_genome(GENOME, TEMPDIR, IS_CACHE_CONTROL)
UCSC_URL, GOLDENPATH_URL = None, None
if GENOME and not IS_CACHE_GENOME:
    UCSC_URL, GOLDENPATH_URL = preprocess.validate_inputs.check_and_fetch_genome(GENOME)

# ====================================================================
# Format inputs
# ====================================================================
SAMPLE_NAME = preprocess.format_inputs.extract_basename(SAMPLE)
CONTROL_NAME = preprocess.format_inputs.extract_basename(CONTROL)
FASTA_ALLELES = preprocess.format_inputs.dictionize_allele(ALLELE)
THREADS = preprocess.format_inputs.update_threads(THREADS)

preprocess.format_inputs.make_directories(TEMPDIR, SAMPLE_NAME, CONTROL_NAME)

if GENOME:
    GENOME_COODINATES = preprocess.format_inputs.fetch_coodinate(GENOME, UCSC_URL, FASTA_ALLELES["control"])
    CHROME_SIZE = preprocess.format_inputs.fetch_chrom_size(GENOME_COODINATES["chr"], GENOME, GOLDENPATH_URL)
    preprocess.format_inputs.cache_coodinates_and_chromsize(TEMPDIR, GENOME, GENOME_COODINATES, CHROME_SIZE)


processing batch_tyr_50_10_01...


In [34]:
from __future__ import annotations

import midsv
import re
from itertools import groupby

def _split_cigar(CIGAR:str) -> list[str]:
    cigar = re.split(r"([MIDNSH=X])", CIGAR)
    n = len(cigar)
    cigar_split = []
    for i, j in zip(range(0, n, 2), range(1, n, 2)):
        cigar_split.append(cigar[i] + cigar[j])
    return cigar_split



def _call_alignment_length(CIGAR: str) -> int:
    cigar_split = _split_cigar(CIGAR)
    alignment_length = 0
    for c in cigar_split:
        if re.search(r"[MDN=X]", c[-1]):
            alignment_length += int(c[:-1])
    return alignment_length



def _has_inversion_in_splice(CIGAR: str) -> bool:
    is_splice = False
    is_insertion = False
    for cigar in _split_cigar(CIGAR):
        if cigar.endswith("I"):
            is_insertion = True
            continue
        if is_insertion and cigar.endswith("N"):
            is_splice = True
            break
        else:
            is_insertion = False
    return is_splice


def _extract_qname_of_map_ont(sam_ont: list[list[str]], sam_splice: list[list[str]]) -> set[str]:
    """Extract qname of reads from `map-ont` when:
        - no inversion signal in `splice` alignment (insertion + deletion)
        - single read
        - long alignment length
    """
    alignments_ont = [s for s in sam_ont if not s[0].startswith("@")]
    alignments_ont.sort(key=lambda x: x[0])
    dict_alignments_splice = {s[0]: s for s in sam_splice if not s[0].startswith("@")}
    qname_of_map_ont = set()
    for qname_ont, group in groupby(alignments_ont, key=lambda x: x[0]):
        alignment_ont = list(group)
        if qname_ont not in dict_alignments_splice:
            qname_of_map_ont.add(qname_ont)
            continue
        alignment_splice = dict_alignments_splice[qname_ont]
        if _has_inversion_in_splice(alignment_splice[5]):
            qname_of_map_ont.add(qname_ont)
            continue
        if len(alignment_ont) != 1:
            continue
        alignment_ont = alignment_ont[0]
        alignment_length_ont = _call_alignment_length(alignment_ont[5])
        alignment_length_splice = _call_alignment_length(alignment_splice[5])
        if alignment_length_ont >= alignment_length_splice:
            qname_of_map_ont.add(qname_ont)
    return qname_of_map_ont


def _extract_sam(sam: list[list[str]], qname_of_map_ont: set, preset:str="map-ont") -> list[list[str]]:
    sam_extracted = []
    for alignment in sam:
        if alignment[0].startswith("@"):
            # Keep each header line once, then skip the qname checks below
            sam_extracted.append(alignment)
            continue
        if preset == "map-ont":
            if alignment[0] in qname_of_map_ont:
                sam_extracted.append(alignment)
        else:
            if alignment[0] not in qname_of_map_ont:
                sam_extracted.append(alignment)
    return sam_extracted


def _midsv_transform(sam: list[list[str]]) -> list[list[str]]:
    num_header = 0
    for s in sam:
        if s[0].startswith("@"):
            num_header += 1
        else:
            break
    if len(sam) == num_header:
        return []
    return midsv.transform(sam, midsv=False, cssplit=True, qscore=False)


def call_midsv(TEMPDIR, FASTA_ALLELES, SAMPLE_NAME) -> dict[str, list]:
    midsv_sample = dict()
    for allele in FASTA_ALLELES:
        sam_ont = midsv.read_sam(f"{TEMPDIR}/sam/{SAMPLE_NAME}_map-ont_{allele}.sam")
        sam_splice = midsv.read_sam(f"{TEMPDIR}/sam/{SAMPLE_NAME}_splice_{allele}.sam")
        qname_of_map_ont = _extract_qname_of_map_ont(sam_ont, sam_splice)
        sam_of_map_ont = _extract_sam(sam_ont, qname_of_map_ont, preset="map-ont")
        sam_of_splice = _extract_sam(sam_splice, qname_of_map_ont, preset="splice")
        midsv_of_single_read = _midsv_transform(sam_of_map_ont)
        midsv_of_multiple_reads = _midsv_transform(sam_of_splice)
        midsv_sample[allele] = midsv_of_single_read + midsv_of_multiple_reads
    # Hand the result back in memory instead of writing it to JSONL
    return midsv_sample


In [35]:
midsv_sample = call_midsv(TEMPDIR, FASTA_ALLELES, SAMPLE_NAME)

In [38]:
print(midsv_sample["control"][0:3])

[{'QNAME': '00077750-d7ab-4c73-ac65-8707d39936c2', 'RNAME': 'control', 'CSSPLIT': '=T,=G,=C,=A,=T,=T,=G,=A,=A,=G,=C,=A,=G,=T,=T,=C,=A,=C,=C,+G|=A,=A,=A,=A,=T,=A,=A,=C,=A,=A,=A,=G,=T,=A,=A,=C,+A|=A,=A,=A,=G,=T,=A,=A,=G,=A,=T,=A,=T,=C,=T,=T,=T,=G,=G,=A,=A,=T,=A,=A,=T,=C,=A,=A,=T,=T,=C,=A,=A,=G,=A,=T,=A,=A,=T,=C,=A,=A,=G,=G,=A,=A,=A,=A,=A,=T,=G,=A,=G,=A,-G,=G,=C,=A,=A,=C,=T,=A,=T,=T,=T,=T,=A,=G,=A,=C,=T,=G,=A,=T,=T,=A,=C,=T,=T,=T,=T,=A,=T,=A,=A,=A,=A,=T,=A,=A,=A,=T,=A,=A,=G,=C,=T,=C,=A,=G,=C,=T,=T,=A,=G,=C,=C,*AG,*GA,=A,=T,=A,=T,=A,=A,=G,=C,=A,=A,=T,=A,=T,=T,=C,=T,=G,=A,=G,=T,=T,=C,=T,=G,=A,=A,=G,=A,=A,=A,=A,=A,=T,=T,=T,=T,=T,=G,=A,=C,=A,=A,=A,=A,=T,=G,=A,=G,=T,=T,=C,=T,=A,=T,=A,=A,=A,=T,=G,=T,=T,=A,=T,=T,=G,=T,=C,=T,=A,=C,=T,=T,=A,=T,=G,=A,=T,=C,=T,=C,=T,=A,=A,=A,=T,=A,=C,=A,=A,=C,=A,=G,=G,=C,=T,=T,=G,=T,=A,=T,=T,=C,=A,=G,=A,=A,=T,=C,=T,=A,=G,=A,=T,=G,=T,=T,=T,=C,=A,=T,=G,=A,=C,=C,=T,=T,=T,=A,=T,=T,=C,=A,=T,=A,=A,=G,=A,=G,=A,=T,=G,+A|+A|=A,=T,=G,=T,=A,=T,=T,=C,=T,=T,=G,=A,=T,=A,=C,=T,=A,

- Do the I/O in pickle format

In [39]:
import pickle
with open('tmp_midsv.plk', 'wb') as p:
    pickle.dump(midsv_sample, p)

In [40]:
! ls -lh tmp_midsv.plk

-rwxrwxrwx 1 kuno kuno 42M Apr 28 09:54 tmp_midsv.plk


In [41]:
import pickle
with open('tmp_midsv.plk', 'rb') as p:
    l = pickle.load(p)

print(l == midsv_sample)

True


In [45]:
hoge = defaultdict(dict)
hoge["foga"]["foo"] = {"test": [1,2,3]}
print(hoge)
print(hoge["foga"])

defaultdict(<class 'dict'>, {'foga': {'foo': {'test': [1, 2, 3]}}})
{'foo': {'test': [1, 2, 3]}}


- I have implemented this as far as `classif_sample` for now. Verifying it next

In [55]:
%%bash
rm -rf DAJINResults/single-tyr50
rm -rf DAJINResults/.tempdir/single-tyr50

time DAJIN2 \
    --name single-tyr50 \
    --sample "misc/data/tyr_albino_50%.fq.gz" \
    --control "misc/data/tyr_control.fq.gz" \
    --allele "misc/data/tyr_control.fasta" \
    --genome mm10 \
    --threads 10

misc/data/tyr_control.fq.gz is now processing...
misc/data/tyr_control.fq.gz is finished...
misc/data/tyr_albino_50%.fq.gz is now processing...
2023-04-28 10:35:22: Preprocess tyr_albino_50%...
2023-04-28 10:35:22: mapping tyr_albino_50%...
2023-04-28 10:35:31: midsv tyr_albino_50%...
2023-04-28 10:35:34: replace_NtoD tyr_albino_50%...
2023-04-28 10:35:36: extract_mutation_loci tyr_albino_50%...
2023-04-28 10:35:51: correct_sequence_error tyr_albino_50%...
2023-04-28 10:36:13: Classify tyr_albino_50%...
2023-04-28 10:36:15: Clustering tyr_albino_50%...
2023-04-28 10:36:34: Consensus calling tyr_albino_50%......
misc/data/tyr_albino_50%.fq.gz is finished...


Finished! Open DAJINResults/single-tyr50 to see the report.

real	2m4.164s
user	1m46.749s
sys	0m4.672s


- Looks good!
- Next, verify in batch mode

In [56]:
%%bash
rm -rf DAJINResults/batch_tyr_50_10_01
rm -rf DAJINResults/.tempdir/batch_tyr_50_10_01

pip install -qe .
time DAJIN2 batch -f misc/data/design_batch_tyr_50_10_01.csv -t 3

misc/data/tyr_control.fq.gz is now processing...
misc/data/tyr_control.fq.gz is finished...
misc/data/tyr_albino_50%.fq.gz is now processing...
misc/data/tyr_albino_10%.fq.gz is now processing...
misc/data/tyr_albino_01%.fq.gz is now processing...
2023-04-28 10:38:28: Preprocess tyr_albino_50%...
2023-04-28 10:38:28: mapping tyr_albino_50%...
2023-04-28 10:38:28: Preprocess tyr_albino_01%...
2023-04-28 10:38:28: mapping tyr_albino_01%...
2023-04-28 10:38:28: Preprocess tyr_albino_10%...
2023-04-28 10:38:28: mapping tyr_albino_10%...
2023-04-28 10:38:37: midsv tyr_albino_50%...
2023-04-28 10:38:40: replace_NtoD tyr_albino_50%...
2023-04-28 10:38:42: extract_mutation_loci tyr_albino_50%...
2023-04-28 10:38:46: midsv tyr_albino_01%...
2023-04-28 10:38:46: midsv tyr_albino_10%...
2023-04-28 10:38:53: replace_NtoD tyr_albino_01%...
2023-04-28 10:38:53: replace_NtoD tyr_albino_10%...
2023-04-28 10:38:56: extract_mutation_loci tyr_albino_01%...
2023-04-28 10:38:57: extract_mutation_loci tyr_a

Finished! Open DAJINResults/batch_tyr_50_10_01 to see the report.

real	3m31.183s
user	6m34.475s
sys	0m16.518s


- No errors in batch mode either, at least for now
- Next, test with real samples

In [73]:
! pip install -qe .

In [61]:
print(Path(TEMPDIR, "report", "BAM", CONTROL_NAME, f"{CONTROL_NAME}.bam").exists())

True


In [63]:
print(True and True)

True


In [64]:
import resource

print(resource.getrlimit(resource.RLIMIT_DATA))

(-1, -1)


In [66]:
import os
import resource

mem_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
mem_usable = int(mem_bytes * 9/10)

resource.setrlimit(resource.RLIMIT_DATA, (mem_usable, -1))

In [67]:
print(resource.getrlimit(resource.RLIMIT_DATA))

(60648822374, -1)


In [None]:
%%bash
find src/DAJIN2 -type f | grep -v "past" | xargs grep "midsv.read_"

src/DAJIN2/core/report/report_bam.py:    sam = midsv.read_sam(intput_path_sam)
src/DAJIN2/core/report/report_bam.py:    sam = midsv.read_sam(input_path_sam)
src/DAJIN2/core/preprocess/correct_knockin.py:        midsv_sample = midsv.read_jsonl(Path(TEMPDIR, "midsv", f"{SAMPLE_NAME}_{allele}.jsonl"))
src/DAJIN2/core/preprocess/correct_knockin.py:        midsv_control = midsv.read_jsonl(Path(TEMPDIR, "midsv", f"{CONTROL_NAME}_{allele}.jsonl"))
src/DAJIN2/core/preprocess/correct_revititive_deletions.py:        midsv_control = midsv.read_jsonl((Path(TEMPDIR, "midsv", f"{CONTROL_NAME}_{allele}.jsonl")))
src/DAJIN2/core/preprocess/correct_revititive_deletions.py:        midsv_sample = midsv.read_jsonl((Path(TEMPDIR, "midsv", f"{SAMPLE_NAME}_{allele}.jsonl")))
src/DAJIN2/core/preprocess/call_midsv.py:        sam_ont = midsv.read_sam(f"{TEMPDIR}/sam/{SAMPLE_NAME}_map-ont_{allele}.sam")
src/DAJIN2/core/preprocess/call_midsv.py:        sam_splice = midsv.read_sam(f"{TEMPDIR}/sam/{SAMPLE_NAME}_spl

- It looks like the processes were being killed because of memory errors
- Countermeasures:
    - `midsv.read_*` now returns generators (midsv version 0.10.0)
    - Using the `resource` module, memory usage is capped at 9/10 of the available memory

In [74]:
%%bash
rm -rf DAJINResults/single-tyr50
rm -rf DAJINResults/.tempdir/single-tyr50

time DAJIN2 \
    --name single-tyr50 \
    --sample "misc/data/tyr_albino_50%.fq.gz" \
    --control "misc/data/tyr_control.fq.gz" \
    --allele "misc/data/tyr_control.fasta" \
    --genome mm10 \
    --threads 10

(60648822374, -1)
misc/data/tyr_control.fq.gz is now processing...
misc/data/tyr_control.fq.gz is finished...
misc/data/tyr_albino_50%.fq.gz is now processing...
2023-04-28 12:24:10: Preprocess tyr_albino_50%...
2023-04-28 12:24:10: mapping tyr_albino_50%...
2023-04-28 12:24:19: midsv tyr_albino_50%...
2023-04-28 12:24:20: replace_NtoD tyr_albino_50%...
2023-04-28 12:24:20: extract_mutation_loci tyr_albino_50%...
2023-04-28 12:24:20: correct_sequence_error tyr_albino_50%...
2023-04-28 12:24:20: Classify tyr_albino_50%...
2023-04-28 12:24:20: Clustering tyr_albino_50%...


Traceback (most recent call last):
  File "/home/kuno/miniconda/bin/DAJIN2", line 33, in <module>
    sys.exit(load_entry_point('DAJIN2', 'console_scripts', 'DAJIN2')())
  File "/mnt/d/Research/DAJIN2/src/DAJIN2/DAJIN2.py", line 280, in main
    _execute_single_mode(arguments)
  File "/mnt/d/Research/DAJIN2/src/DAJIN2/DAJIN2.py", line 45, in _execute_single_mode
    core_execute.execute_sample(arguments)
  File "/mnt/d/Research/DAJIN2/src/DAJIN2/core/core_execute.py", line 182, in execute_sample
    clust_sample = clustering.update_labels(clust_sample)
  File "/mnt/d/Research/DAJIN2/src/DAJIN2/core/clustering/clustering.py", line 150, in update_labels
    prev_label = clust_result[0]["LABEL"]
IndexError: list index out of range

real	0m42.311s
user	0m32.931s
sys	0m0.803s


CalledProcessError: Command 'b'rm -rf DAJINResults/single-tyr50\nrm -rf DAJINResults/.tempdir/single-tyr50\n\ntime DAJIN2 \\\n    --name single-tyr50 \\\n    --sample "misc/data/tyr_albino_50%.fq.gz" \\\n    --control "misc/data/tyr_control.fq.gz" \\\n    --allele "misc/data/tyr_control.fasta" \\\n    --genome mm10 \\\n    --threads 10\n'' returned non-zero exit status 1.

- For some reason `update_labels` now raises an error, so I will investigate the cause
- `core_execute` has changed a lot, so I will continue in a separate notebook
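One thing that might be worth ruling out (an assumption, not a confirmed diagnosis): a generator that has already been consumed yields nothing on a second pass, and indexing the resulting empty list raises the same `IndexError: list index out of range`. A minimal sketch, with a hypothetical `read_records` standing in for a generator-returning reader:

```python
def read_records():
    """Hypothetical stand-in for a reader that returns a generator."""
    for i in range(3):
        yield {"LABEL": i}


records = read_records()
first_pass = list(records)   # consumes the generator
second_pass = list(records)  # the generator is already exhausted

try:
    second_pass[0]["LABEL"]
    error = None
except IndexError as e:
    error = e

print(len(first_pass), len(second_pass), repr(error))
```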

# 👉👉👉 You are here 👈👈👈

# 👌👌👌 Summary 👌👌👌


- Created a `multiprocessing` branch
- The JSONL I/O looked problematic, so functions now return objects directly
- Found that memory errors occur under `multiprocessing`
    - `midsv.read_*` now returns generators (midsv version 0.10.0)
    - Using the `resource` module, memory usage is capped at 9/10 of the available memory

# Next steps

- Investigate why `update_labels` now raises an error
- Return generators wherever possible


### Lists

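+ GUI appearance
+ Launching igv.js
+ Annotating long insertion/deletion information via VCF
+ Creating figures
+ ⬜ Devise a method to identify mutations inside insertions
+ ⬜ Work out why right_loxp in Ayabe-task1 gives poor results
+ ✅ How reads at cut ends should be handled
+ ✅ Move the `SV` determination to after the consensus call
+ ✅ Verify that Tyr works
+ ✅ Detect left/right-loxp in ayabe-task1
+ ✅ Switch mutation_loci to the ones used in preprocess
> + ⬜ The code of `preprocess.correct_sequence_error.replace_atmark` is hard to follow
    + Prepare tests and refactor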