Biowasm minimap2/samtools integration #349

Closed
Koeng101 opened this issue Sep 16, 2023 · 3 comments
Labels
enhancement New feature or request hard A major or complex undertaking help wanted Extra attention is needed medium priority The default priority for a new issue. stale
Comments

@Koeng101
Contributor

Minimap2 is probably the best alignment algorithm when aligning nanopore sequencing data, and samtools lets you work with those alignments to produce useful output information. In fact, 3 of the parsers I built with poly are just to work with the input and output of these pieces of software.

(nanopore sequencer) -> slow5 -> (basecaller) -> fastq -> (minimap2) -> sam -> (samtools) -> pileup

To review:

  • slow5 contains raw data from a nanopore sequencer (electrical current squiggles)
  • fastq contains raw basecalled sequencing data (base pairs)
  • sam contains alignment data against a target
  • pileup contains per-base pair alignment data, useful for validating a target sequence

As such, minimap2 and samtools are essential to my sequence analysis pipeline. They're both C projects, so we could use CGo, but CGo is also kinda the worst. As an alternative, we could use wasm-compiled builds of samtools and minimap2. This has already worked for other projects integrating C code, using wazero, a zero-dependency WebAssembly runtime written in pure Go.

A wonderful project, biowasm, by @robertaboukhalil has already compiled and tested both minimap2 and samtools in WebAssembly. We would need some software similar to biowasm's aioli, and then we could integrate these two pieces of software with Poly for a Go-native experience.

@Koeng101 Koeng101 added help wanted Extra attention is needed medium priority The default priority for a new issue. hard A major or complex undertaking labels Sep 16, 2023
@carreter carreter added the enhancement New feature or request label Sep 16, 2023
@Koeng101
Contributor Author

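package main

import (
    "context"
    _ "embed" // enables the go:embed directive below
    "fmt"
    "log"

    "github.com/tetratelabs/wazero"
    "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)

// addWasm holds the compiled WebAssembly module, embedded at build time.
//
//go:embed add.wasm
var addWasm []byte

func main() {
    ctx := context.Background()

    // Create the runtime and release its resources when main returns.
    r := wazero.NewRuntime(ctx)
    defer r.Close(ctx)

    // Provide the WASI host functions that the module may import.
    wasi_snapshot_preview1.MustInstantiate(ctx, r)

    // Instantiate the embedded module under the name "addWasm".
    mod, err := r.InstantiateWithConfig(ctx, addWasm, wazero.NewModuleConfig().WithName("addWasm"))
    if err != nil {
        log.Fatalf("%s", err)
    }

    // Call the exported "add" function; results come back as a []uint64.
    res, err := mod.ExportedFunction("add").Call(ctx, 1, 2)
    if err != nil {
        log.Fatalf("%s", err)
    }
    fmt.Println(res)
}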

So you can do something like this with wazero. The module config returned by NewModuleConfig is where we would attach an io.Reader for stdin and io.Writers for stdout and stderr (using WithStdin, WithStdout, and WithStderr).

Ideally, for minimap2, you would instantiate the module with a fake fs.FS (WithFS) containing the reference fasta, with space for the index (reference.fasta -> reference.fai). Then you would use the fastq parser's io.WriterTo implementation plus an io.Pipe to stream reads into the module's stdin (an io.Reader). From there, you would io.Pipe the module's stdout (an io.Writer) into the io.Reader that a sam parser consumes. All in all, you would end up with a function that takes a Parser[fastq] and a fasta.Record and returns a Parser[sam].
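
The io.Pipe plumbing described above can be sketched with the standard library alone. This is only a rough illustration under stated assumptions, not a working integration. The `tool` type, `pipeThrough`, `fakeAligner`, and `run` are hypothetical names. `fakeAligner` is not minimap2: it stands in for a WASI module whose stdin and stdout would be attached through the module config, and it just emits one unmapped SAM line per FASTQ read. The reference fasta, the fs.FS, and the wazero calls are left out entirely.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// tool is a hypothetical stand-in for one run of a WASI module:
// it consumes stdin and produces stdout until it finishes.
type tool func(stdin io.Reader, stdout io.Writer) error

// fakeAligner is NOT minimap2. It reads four-line FASTQ records and
// writes one unmapped SAM line (flag 4) per read, so the plumbing has
// something to carry. bufio.Scanner's default line limit would be too
// small for long nanopore reads; that is acceptable for a sketch.
func fakeAligner(stdin io.Reader, stdout io.Writer) error {
	sc := bufio.NewScanner(stdin)
	var rec [4]string
	n := 0
	for sc.Scan() {
		rec[n%4] = sc.Text()
		n++
		if n%4 != 0 {
			continue
		}
		name := strings.TrimPrefix(rec[0], "@")
		_, err := fmt.Fprintf(stdout, "%s\t4\t*\t0\t0\t*\t*\t0\t0\t%s\t%s\n", name, rec[1], rec[3])
		if err != nil {
			return err
		}
	}
	return sc.Err()
}

// pipeThrough streams src into t's stdin and returns a reader over
// t's stdout, using two io.Pipe pairs and two goroutines.
func pipeThrough(src io.WriterTo, t tool) io.Reader {
	inR, inW := io.Pipe()
	outR, outW := io.Pipe()

	// Producer: write the input, then signal EOF (or the error).
	go func() {
		_, err := src.WriteTo(inW)
		inW.CloseWithError(err)
	}()

	// Tool: run to completion, then close both ends it touches so
	// neither the producer nor the consumer can block forever.
	go func() {
		err := t(inR, outW)
		inR.CloseWithError(err)
		outW.CloseWithError(err)
	}()

	return outR
}

// run pushes a FASTQ string through the stand-in aligner and collects
// the SAM text; a real consumer would hand the reader to a parser.
func run(fastq string) (string, error) {
	out, err := io.ReadAll(pipeThrough(strings.NewReader(fastq), fakeAligner))
	return string(out), err
}

func main() {
	sam, err := run("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nJJJJ\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(sam)
}
```

In a real integration, the body of the tool would presumably instantiate the wasm module with its stdin and stdout set to the two pipe ends, and the returned reader would go to the sam parser rather than io.ReadAll.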

@carreter carreter added this to the v1.0 milestone Sep 23, 2023

This issue has had no activity in the past 2 months. Marking as stale.

@github-actions github-actions bot added the stale label Nov 22, 2023
@TimothyStiles
Collaborator

Closing as stale. Feel welcome to reopen but this may be better as an external project to start.
