
Optimize reference genome loading memory usage by using string instea… #5

Merged
cascadingstyletrees merged 1 commit into master from
perf-optimize-reference-load-17042674900636396903
Jan 28, 2026


@cascadingstyletrees
Owner

💡 What: Changed CACHED_REFERENCE_FILE in yleaf/Yleaf.py from a List[str] (one single-character element per base) to a single str. Updated the load_reference_file function to build this string efficiently with str.join.
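A minimal sketch of a join-based loader, assuming a plain-text, single-record FASTA file (for example one chromosome). The helper name sequence_from_fasta_lines and the load_reference_file(path) signature are illustrative assumptions, not the code merged in this PR.

```python
from typing import Iterable, List


def sequence_from_fasta_lines(lines: Iterable[str]) -> str:
    """Concatenate the sequence lines of a single-record FASTA into one str.

    Assumption: only one record is present; with several records their
    sequences would simply be concatenated.
    """
    chunks: List[str] = []
    for line in lines:
        if line.startswith(">"):
            continue  # skip header lines such as ">chrY"
        chunks.append(line.strip())  # drop the trailing newline
    # A single join at the end: CPython sizes the result once and copies each
    # chunk into it, instead of keeping one list slot per base.
    return "".join(chunks)


def load_reference_file(path: str) -> str:
    # Hypothetical signature: read the file and return the whole sequence as str.
    with open(path) as handle:
        return sequence_from_fasta_lines(handle)
```

Because the result is a str, callers can keep indexing and slicing it exactly as they did with the list, e.g. reference[pos] or reference[start:end].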

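🎯 Why: Storing a large genome sequence as a list of single-character strings costs roughly one 8-byte pointer per base on 64-bit CPython, plus list over-allocation (the one-character string objects themselves are cached and shared). A compact ASCII str stores one byte per base, so a single string is several times more memory-efficient.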
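To see where the overhead comes from, one can compare sys.getsizeof for the two representations. This is a minimal sketch on synthetic data (the 1,000,000-base sequence is an assumption for illustration). sys.getsizeof reports only the container itself, which is the relevant cost here because CPython reuses cached one-character string objects for the list elements.

```python
import sys

sequence = "ACGT" * 250_000  # 1,000,000 bases of hypothetical test data
as_list = list(sequence)  # the old representation: one element per base

# Compact ASCII str: about 1 byte per base plus a small fixed header.
str_bytes = sys.getsizeof(sequence)
# List: one pointer per element (8 bytes on 64-bit CPython) plus any
# over-allocated slots; the referenced one-character strings are shared.
list_bytes = sys.getsizeof(as_list)

print(f"str:   {str_bytes:,} bytes")
print(f"list:  {list_bytes:,} bytes")
print(f"ratio: {list_bytes / str_bytes:.1f}x")
```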

📊 Measured Improvement:
Benchmark with a 10MB generated FASTA file:

  • Memory Usage: Reduced from ~89.45 MB to ~26.21 MB (~70% reduction).
  • Execution Time: Reduced from ~1.03 s to ~0.67 s (~35% faster).
  • Correctness: Verified that the result type is str and that indexing and content match expectations.
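The benchmark figures above can be approximated with a small harness like the one below. This is a hypothetical sketch, not the script used for this PR: it builds about 1 MB of synthetic 60-base lines in memory (rather than a 10 MB FASTA file on disk), times each strategy with time.perf_counter, and measures peak allocation with tracemalloc in a separate run because tracing slows execution. Absolute numbers will differ from the ones reported.

```python
import random
import time
import tracemalloc
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def measure(fn: Callable[[], T]) -> Tuple[T, int, float]:
    """Return fn's result, its peak traced allocation (bytes) and wall time (s)."""
    start = time.perf_counter()
    fn()  # timing run, without tracemalloc overhead
    elapsed = time.perf_counter() - start

    tracemalloc.start()  # separate run for memory
    result = fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak, elapsed


# Hypothetical synthetic input: 17,000 FASTA-style lines of 60 bases each.
random.seed(0)
lines: List[str] = ["".join(random.choices("ACGT", k=60)) for _ in range(17_000)]


def as_char_list() -> List[str]:
    out: List[str] = []
    for line in lines:
        out.extend(line)  # old approach: one list slot per base
    return out


def as_string() -> str:
    return "".join(lines)  # new approach: one compact str


old, old_peak, old_t = measure(as_char_list)
new, new_peak, new_t = measure(as_string)

# Correctness checks in the spirit of the PR: type, length, indexing, content.
assert isinstance(new, str)
assert len(new) == len(old)
assert all(new[i] == old[i] for i in (0, len(new) // 2, len(new) - 1))
assert new == "".join(old)

print(f"list: peak {old_peak / 1e6:.2f} MB, {old_t:.3f} s")
print(f"str:  peak {new_peak / 1e6:.2f} MB, {new_t:.3f} s")
```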

@cascadingstyletrees cascadingstyletrees merged commit dfb6f5d into master Jan 28, 2026
@cascadingstyletrees cascadingstyletrees deleted the perf-optimize-reference-load-17042674900636396903 branch January 28, 2026 15:45