Implement Chop #188

susan-garry · 2024-07-08T20:54:14Z

Chop works! After cargo build --release, try something like fgfa -I ../tests/k.gfa chop -c 3 -l. Here, -c 3 specifies that nodes are to be chopped into segments no longer than 3, and -l specifies that the output should include newly computed links. (At this time, it's still not clear to me what need we have for links, if any, but it would be easy to make computing links the default behavior, or to always compute them.) Side note: slow_odgi does not compute links. Do we care to change this?

The basic algorithm for chop is as follows:

seg_map;     // map from old segments to their new, chopped counterparts
for each segment:
    chop into segments of size c or smaller
    if args.l:
        link the new segments together, from head to tail (i.e., in the forward orientation)
    update seg_map

for each path:
    new_path;
    for each step in path:
        for new_seg in seg_map(step.seg):
            append new_seg to our new_path
    add new_path to new_fgfa

if args.l:
    for link (A -> B) in old_fgfa:
        add a new link from
            (A.forward ? (A.end, forward) : (A.begin, backwards))
                -> (B.forward ? (B.begin, forward) : (B.end, backwards))
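
For readers who want to trace the pseudocode on a concrete input, here is a minimal, self-contained Rust sketch on a toy graph. It is not the code in this PR: `Graph`, `Handle`, and this `chop` signature are hypothetical stand-ins for the real FlatGFA types, segments are assumed to be non-empty, and the handling of backward path steps (visiting the chopped pieces in reverse order, each still backward) is an assumption that the pseudocode does not spell out.

```rust
/// A step on a path or an endpoint of a link: a segment index plus an orientation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    seg: usize,
    forward: bool,
}

/// A toy graph (hypothetical; not the FlatGFA representation).
#[derive(Debug, Default)]
struct Graph {
    segs: Vec<String>,
    paths: Vec<Vec<Handle>>,
    links: Vec<(Handle, Handle)>,
}

/// Chop every segment into pieces of at most `c` bases.
fn chop(old: &Graph, c: usize, compute_links: bool) -> Graph {
    assert!(c > 0, "chop size must be positive");
    let mut new = Graph::default();
    // seg_map[i] is the half-open range of new segment indices replacing old segment i.
    let mut seg_map: Vec<(usize, usize)> = Vec::with_capacity(old.segs.len());

    for seq in &old.segs {
        assert!(!seq.is_empty(), "this sketch assumes non-empty segments");
        let start = new.segs.len();
        for piece in seq.as_bytes().chunks(c) {
            new.segs.push(String::from_utf8_lossy(piece).into_owned());
        }
        let end = new.segs.len();
        if compute_links {
            // Link consecutive pieces head to tail, in the forward orientation.
            for i in start + 1..end {
                new.links.push((
                    Handle { seg: i - 1, forward: true },
                    Handle { seg: i, forward: true },
                ));
            }
        }
        seg_map.push((start, end));
    }

    for path in &old.paths {
        let mut new_path = Vec::new();
        for step in path {
            let (lo, hi) = seg_map[step.seg];
            if step.forward {
                new_path.extend((lo..hi).map(|seg| Handle { seg, forward: true }));
            } else {
                // Assumption: a backward step walks the pieces in reverse order.
                new_path.extend((lo..hi).rev().map(|seg| Handle { seg, forward: false }));
            }
        }
        new.paths.push(new_path);
    }

    if compute_links {
        for (from, to) in &old.links {
            let (f_lo, f_hi) = seg_map[from.seg];
            let (t_lo, t_hi) = seg_map[to.seg];
            // A forward source leaves from its last piece; a backward one from its first.
            let new_from = Handle {
                seg: if from.forward { f_hi - 1 } else { f_lo },
                forward: from.forward,
            };
            // A forward target is entered at its first piece; a backward one at its last.
            let new_to = Handle {
                seg: if to.forward { t_lo } else { t_hi - 1 },
                forward: to.forward,
            };
            new.links.push((new_from, new_to));
        }
    }
    new
}
```

With segments ACGTACG and TT, a chop size of 3 yields ACG, TAC, G, and TT; a link from the first segment (forward) to the second (backward) becomes a link from G (forward) to TT (backward).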

One weird note here: the implementation of chop is split between cmds.rs and main.rs. The bulk of the work is done in cmds.rs, but the logic for which aspects of our original graph to preserve is in main.rs. It's unclear whether a nice fix exists: because our new graph borrows elements from a GFAStore created by chop in cmds.rs, ownership of the GFAStore must be passed to the main function in order for our new FlatGFA to be valid. The best fix may be to compute the FlatGFA in chop and return both the FlatGFA and the GFAStore, but right now we do not.
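
The ownership constraint can be illustrated with a small sketch. The `Store` and `View` types and the `chop_store` function below are hypothetical stand-ins, not the real GFAStore and FlatGFA; the sketch only assumes that the view holds ordinary borrowed slices into the store. Under that assumption, the function that builds the storage has to hand the owned value to its caller, and the caller creates the view afterwards, since a view borrowing from a local store cannot outlive the function that created it.

```rust
/// Hypothetical owned storage, standing in for the store that chop fills.
struct Store {
    seq_data: Vec<u8>,
    seg_lens: Vec<usize>,
}

/// Hypothetical borrowed view, standing in for a flat graph that refers into a store.
struct View<'a> {
    seq_data: &'a [u8],
    seg_lens: &'a [usize],
}

impl Store {
    /// Borrow the store as a view; the view cannot outlive `self`.
    fn view(&self) -> View<'_> {
        View {
            seq_data: &self.seq_data,
            seg_lens: &self.seg_lens,
        }
    }
}

/// Build new storage and hand ownership to the caller.
/// `c` must be nonzero, because `chunks` panics on a chunk size of 0.
fn chop_store(seq: &[u8], c: usize) -> Store {
    let seg_lens = seq.chunks(c).map(|piece| piece.len()).collect();
    Store {
        seq_data: seq.to_vec(),
        seg_lens,
    }
}
```

A caller would write `let store = chop_store(b"ACGTACG", 3);` followed by `let view = store.view();`, keeping `store` alive for as long as `view` is used, which mirrors the role main.rs plays in the current split.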

sampsyo

AWESOME!! This looks really great. I have just a few low-level suggestions, if you're interested!

sampsyo · 2024-07-09T20:11:54Z

.github/workflows/build.yml

+          docker pull quay.io/biocontainers/odgi:0.8.3--py310h6cc9453_0
+          docker tag quay.io/biocontainers/odgi:0.8.3--py310h6cc9453_0 odgi


Maybe we should make these odgi versions match for the two jobs?

sampsyo · 2024-07-09T20:12:03Z

.github/workflows/build.yml

+          docker pull quay.io/biocontainers/odgi:0.8.6--py310hdf79db3_1
+          docker tag quay.io/biocontainers/odgi:0.8.6--py310hdf79db3_1 odgi


(See next comment.)

sampsyo · 2024-07-09T20:12:56Z

Makefile

 .PHONY: test-slow-odgi
 test-slow-odgi: fetch
 	make -C slow_odgi test

 .PHONY: test-flatgfa
-test-flatgfa:
+test-flatgfa: fetch-og


Might I suggest that we don't actually need to do this .og conversion? Seems like odgi works just fine reading directly from a GFA file, and eliminating this step would reduce the surface area of things that can break by one.

sampsyo · 2024-07-09T20:13:35Z

bench/bench.py

@@ -72,7 +72,7 @@ def hyperfine(cmds):
            "hyperfine",
            "--export-json",
            tmp.name,
-            "--shell=none",
+            # "--shell=none",


Fine to just delete this line.

sampsyo · 2024-07-09T20:15:28Z

bench/config.toml

+[modes.chop]
+cmd.flatgfa = '{fgfa} -i {files[flatgfa]} chop -c 3'
+cmd.odgi = '{odgi} chop -i {files[og]} -c 3 -o - | {odgi} view -g -i - | {slow_odgi} norm --nl'


For benchmarking purposes, I think we should probably omit the view and norm steps. This makes for a fairer comparison, where we just measure the time taken to do the data structure transformation and not the time to print out the results. (This of course means that we can't simultaneously check correctness, but I suppose that is probably best done separately.)

sampsyo · 2024-07-09T20:30:06Z

flatgfa/src/cmds.rs

+    fn empty_span<T>() -> Span<T> {
+        Span::new(Id::new(0), Id::new(0))
+    }


I think you have a utility method for this now (Span::new_empty())?

sampsyo · 2024-07-09T20:32:24Z

flatgfa/src/cmds.rs

+    let mut seg_map: Vec<(Id<Segment>, Id<Segment>)> = Vec::new();
+    let mut max_node_id = 1;


These could perhaps use comments to describe what they do and what invariants they maintain?

sampsyo · 2024-07-09T20:44:05Z

flatgfa/src/cmds.rs

+                    path_end = flat.add_steps(
+                        (start_idx..end_idx).map(|idx| {
+                            Handle::new(
+                                Id::new(idx),
+                                Orientation::Forward
+                            )
+                        })
+                    ).end;


DEFINITELY not for this PR, but: I wonder if there's some kind of utility method that we can invent to help with this stuff. Impressionistically speaking, what we want here is...

    let segs = seg_map[step.segment()];
    flat.add_steps(segs.map(|s| Handle::new(s, Orientation::Forward)));

Like, we kind of want a way to do a map directly on a chunk of segments, without having to fiddle with the index math here. Maybe we can think of a clever way to make that look nice!
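
One possible shape for such a helper, as a sketch only: `SegId`, `SegSpan`, `ids`, and `chopped_steps` are hypothetical names, not existing FlatGFA APIs, and the real Id and Span types are generic rather than segment-specific. The idea is that a span exposes an iterator over its IDs, so callers can map over it without doing index arithmetic. Reversing the pieces for a backward step is an assumption here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SegId(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Orientation {
    Forward,
    Backward,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    seg: SegId,
    orient: Orientation,
}

/// Half-open range of segment IDs: [start, end).
#[derive(Debug, Clone, Copy)]
struct SegSpan {
    start: SegId,
    end: SegId,
}

impl SegSpan {
    /// Iterate over the IDs in the span, hiding the index math from callers.
    fn ids(self) -> impl DoubleEndedIterator<Item = SegId> {
        (self.start.0..self.end.0).map(SegId)
    }
}

/// Steps that replace one old step, given the span of its chopped pieces.
fn chopped_steps(span: SegSpan, orient: Orientation) -> Vec<Handle> {
    match orient {
        Orientation::Forward => span.ids().map(|seg| Handle { seg, orient }).collect(),
        // Assumption: a backward step visits the pieces in reverse order.
        Orientation::Backward => span.ids().rev().map(|seg| Handle { seg, orient }).collect(),
    }
}
```

Returning an iterator from `ids` rather than a Vec keeps the helper allocation-free, so a caller like add_steps could consume it directly.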

sampsyo · 2024-07-09T20:45:42Z

flatgfa/src/cmds.rs

+                match old_from.orient() {
+                    Orientation::Forward => {
+                        Handle::new(
+                            chopped_segs.1 - 1,
+                            Orientation::Forward
+                        )
+                    },
+                    Orientation::Backward => {
+                        Handle::new(
+                            chopped_segs.0,
+                            Orientation::Backward
+                        )
+                    }
+                }


Could be a little shorter/clearer as:

    let seg_id = match old_from.orient() {
        Orientation::Forward => chopped_segs.1 - 1,
        Orientation::Backward => chopped_segs.0,
    };
    Handle::new(seg_id, old_from.orient())
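
To check that the shorter form picks the same segments as the original match, here is a standalone sketch of it. The `Handle` and `Orientation` definitions and the `new_link_source` name are simplified, hypothetical stand-ins (plain `u32` IDs instead of the crate's Id type), and `chopped_segs` is assumed to be a half-open range (first new segment, one past the last).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Orientation {
    Forward,
    Backward,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    seg: u32,
    orient: Orientation,
}

impl Handle {
    fn new(seg: u32, orient: Orientation) -> Self {
        Handle { seg, orient }
    }

    fn orient(&self) -> Orientation {
        self.orient
    }
}

/// Map the source end of an old link onto the chopped segments.
/// A forward source leaves from the last new piece; a backward one from the first.
fn new_link_source(old_from: Handle, chopped_segs: (u32, u32)) -> Handle {
    let seg_id = match old_from.orient() {
        Orientation::Forward => chopped_segs.1 - 1,
        Orientation::Backward => chopped_segs.0,
    };
    // The orientation is unchanged, so only the segment ID needs the match.
    Handle::new(seg_id, old_from.orient())
}
```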

sampsyo · 2024-07-09T20:45:57Z

flatgfa/src/cmds.rs

+                match old_to.orient() {
+                    Orientation::Forward => {
+                        Handle::new(
+                            chopped_segs.0,
+                            Orientation::Forward
+                        )
+                    },
+                    Orientation::Backward => {
+                        Handle::new(
+                            chopped_segs.1 - 1,
+                            Orientation::Backward
+                        )
+                    }
+                }


A similar simplification is probably possible here.
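
A sketch of what that could look like for the target side, mirroring the diff above with the two arms swapped. As before, the types and the `new_link_target` name are hypothetical simplifications, and `chopped_segs` is assumed to be a half-open range.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Orientation {
    Forward,
    Backward,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    seg: u32,
    orient: Orientation,
}

impl Handle {
    fn new(seg: u32, orient: Orientation) -> Self {
        Handle { seg, orient }
    }

    fn orient(&self) -> Orientation {
        self.orient
    }
}

/// Map the target end of an old link onto the chopped segments.
/// A forward target is entered at the first new piece; a backward one at the last.
fn new_link_target(old_to: Handle, chopped_segs: (u32, u32)) -> Handle {
    let seg_id = match old_to.orient() {
        Orientation::Forward => chopped_segs.0,
        Orientation::Backward => chopped_segs.1 - 1,
    };
    Handle::new(seg_id, old_to.orient())
}
```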

susan-garry added 17 commits June 10, 2024 16:54

typo

7881627

initial chop implementation

74726d1

test chop, fix off-by-one error

0a34f05

test fgfa depth

633ab10

typo

2407f55

add benchmarking for chop, allow shell commands in config.toml (so we can pipeline)

2cfa2fc

chop now computes new links, but is buggy (as is odgi), testing framework

1cff638

re-implement chop treating links as bidirectional, tests pass

9ff0d6f

clippy

0609993

turnt error messages are verbose

551d286

flatgfa now requires odgi and slow_odgi

f58b6bd

make fetch-og generates odgi files, which test-flatgfa depends on

0f31491

Get changes to workflow file

7c9cd1b

flatgfa tests display turnt diffs

ec66a32

tests that rely on odgi use odgi files, avoids unnecessary conversions

62de9fc

turnt prints stuff

3f50bcb

use latest version of odgi

7d2cefc

sampsyo mentioned this pull request Jul 9, 2024

Try making Dockerized odgi read from stdin #189

Merged

actually run tests, don't just print turnt commands

a9b0032

sampsyo reviewed Jul 9, 2024
