Optimize reference genome loading memory usage by using string instea… by cascadingstyletrees · Pull Request #5 · cascadingstyletrees/Yleaf

cascadingstyletrees · 2026-01-28T15:44:30Z

💡 What: Changed CACHED_REFERENCE_FILE in yleaf/Yleaf.py from a List[str] (where each element was a character) to a single str. Updated the load_reference_file function to construct this string efficiently using join.

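🎯 Why: Storing a large genome sequence as a list of individual characters incurs significant memory overhead in Python. On a 64-bit CPython build, every list slot is an 8-byte pointer, whereas an ASCII-only str stores one byte per character. A single string is therefore far more compact for the same sequence.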
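
The change in shape can be illustrated with a minimal sketch. The actual body of load_reference_file is not shown in this PR description, so the two function names below are hypothetical, and skipping header lines that start with ">" is an assumption about the FASTA handling rather than the project's confirmed behavior.

```python
import io
from typing import Iterable, List


def load_as_char_list(lines: Iterable[str]) -> List[str]:
    """Old shape (hypothetical name): one list element per base."""
    bases: List[str] = []
    for line in lines:
        if line.startswith(">"):  # assumed: skip FASTA header lines
            continue
        bases.extend(line.strip())  # extend() splits the line into characters
    return bases


def load_as_string(lines: Iterable[str]) -> str:
    """New shape (hypothetical name): strip each sequence line, join once."""
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    )


# Tiny in-memory FASTA standing in for the reference file.
fasta = io.StringIO(">chrY\nACGT\nNNAC\n")
as_list = load_as_char_list(fasta)
fasta.seek(0)
as_str = load_as_string(fasta)

# Positional indexing gives the same base in both representations.
assert "".join(as_list) == as_str
assert as_str[4] == as_list[4]
```

Because str supports the same integer indexing and slicing as a list of characters, read-only callers should not need changes; code that mutates individual positions would, since str is immutable.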

📊 Measured Improvement:
Benchmark with a 10MB generated FASTA file:

Memory Usage: Reduced from ~89.45 MB to ~26.21 MB (~70% reduction).
Execution Time: Reduced from ~1.03s to ~0.67s (~35% improvement).
Correctness: Verified that the result type is str and indexing/content matches expectations.
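
The benchmark script itself is not part of this description. A comparison along these lines could be sketched with the standard library's tracemalloc and sys.getsizeof, under the assumption of a synthetic sequence of about 1 MB held in memory rather than the 10MB generated file. The helper names are hypothetical, and the absolute numbers will differ from those reported above.

```python
import random
import sys
import tracemalloc


def peak_bytes(build):
    """Run build() and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    result = build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


random.seed(0)
# Synthetic sequence lines of 60 bases each (about 1 MB in total).
lines = ["".join(random.choices("ACGT", k=60)) + "\n" for _ in range(16_667)]


def build_list():
    # List-of-characters representation.
    out = []
    for line in lines:
        out.extend(line.strip())
    return out


def build_str():
    # Single-string representation built with one join.
    return "".join(line.strip() for line in lines)


as_list, list_peak = peak_bytes(build_list)
as_str, str_peak = peak_bytes(build_str)

print(f"list: peak {list_peak} B, container {sys.getsizeof(as_list)} B")
print(f"str:  peak {str_peak} B, container {sys.getsizeof(as_str)} B")
```

Note that sys.getsizeof reports only the container itself, while the tracemalloc peak also includes temporary objects created during construction, so the two figures answer slightly different questions.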

Optimize reference genome loading memory usage by using string instea…

8643a53

…d of list of characters.

cascadingstyletrees merged commit dfb6f5d into master Jan 28, 2026

cascadingstyletrees deleted the perf-optimize-reference-load-17042674900636396903 branch January 28, 2026 15:45
