Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Chop #188

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Implement Chop #188

wants to merge 18 commits into from

Conversation

susan-garry
Copy link
Contributor

@susan-garry susan-garry commented Jul 8, 2024

Chop works! After cargo build --release, try something like fgfa -I ../tests/k.gfa chop -c 3 -l. -c 3 specifies that nodes are to be chopping into segments no longer than 3, and -l specifies that the output file should compute new links (at this time, it's still not clear to me what need we have for links, if any, but it would be easy to make computing links the default behavior or to always compute links). (Side note, slow_odgi does not compute links - do we care to change this?)

The basic algorithm for chop is as follows:

seg_map;     // map from old segments to their new, chopped counterparts
for each segment:
    chop into segments of size c or smaller
    if args.l:
         link the new segments together, from head to tail (i.e., in the forward orientation)
    update seg_map

for each path:
    new_path;
    for each step in path:
        for new_seg in seg_map(step.seg):
              append new_seg to our new_path
    add new_path to new_fgfa

if args.l:
    for link (A -> B) in old_fgfa:
        add a new link from
             (A.forward ? (A.end, forward) : (A.begin, backwards))
                 -> (B.forward ? (B.begin, forward) : (B.end ? backwards))

One weird note here: the implementation of chop is split between cmd.rs and main.rs. The brunt of the work is done in cmd.rs, but the logic for which aspects of our original graph to preserve is in main.rs. It's unclear that a nice fix exists; because our new graph is borrowing elements from a GFAStore created by chop in cmd.rs, ownership of the GFAStore must be passed to the main function in order for our new FlatGFA to be valid. The best fix may be to compute the FlatGFA in chop and return both the FlatGFA and GFAStore, but right now we do not.

Copy link
Collaborator

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWESOME!! This looks really great. I have just a few low-level suggestions, if you're interested!

Comment on lines +97 to +98
docker pull quay.io/biocontainers/odgi:0.8.3--py310h6cc9453_0
docker tag quay.io/biocontainers/odgi:0.8.3--py310h6cc9453_0 odgi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make these odgi versions match for the two jobs?

Comment on lines +46 to +47
docker pull quay.io/biocontainers/odgi:0.8.6--py310hdf79db3_1
docker tag quay.io/biocontainers/odgi:0.8.6--py310hdf79db3_1 odgi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See next comment.)

.PHONY: test-slow-odgi
test-slow-odgi: fetch
make -C slow_odgi test

.PHONY: test-flatgfa
test-flatgfa:
test-flatgfa: fetch-og
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might I suggest that we don't actually need to do this .og conversion? Seems like odgi works just fine reading directly from a GFA file, and eliminating this step would reduce the surface area of things that can break by one.

@@ -72,7 +72,7 @@ def hyperfine(cmds):
"hyperfine",
"--export-json",
tmp.name,
"--shell=none",
# "--shell=none",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine to just delete this line.

[modes.chop]
cmd.flatgfa = '{fgfa} -i {files[flatgfa]} chop -c 3'
cmd.odgi = '{odgi} chop -i {files[og]} -c 3 -o - | {odgi} view -g -i - | {slow_odgi} norm --nl'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For benchmarking purposes, I think we should probably omit the view and norm steps. This makes for a fairer comparison, where we just measure the time taken to do the data structure transformation and not the time to print out the results. (This of course means that we can't simultaneously check correctness, but I suppose that is probably best done separately.)

Comment on lines +353 to +355
fn empty_span<T>() -> Span<T> {
Span::new(Id::new(0), Id::new(0))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have a utility method for this now (Span::new_empty())?

Comment on lines +350 to +351
let mut seg_map: Vec<(Id<Segment>, Id<Segment>)> = Vec::new();
let mut max_node_id = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could perhaps use comments to describe what they do and what invariants they maintain?

Comment on lines +429 to +436
path_end = flat.add_steps(
(start_idx..end_idx).map(|idx| {
Handle::new(
Id::new(idx),
Orientation::Forward
)
})
).end;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFINITELY not for this PR, but: I wonder if there's some kind of utility method that we can invent to help with this stuff. Impressionistically speaking, what we want here is...

let segs = seg_map[step.segment()];
flat.add_steps(segs.map(|s| Handle::new(s, Orientation::Forward)));

Like, we kind of want a way to do a map directly on a chunk of segments, without having to fiddle with the index math here. Maybe we can think of a clever way to make that look nice!

Comment on lines +470 to +483
match old_from.orient() {
Orientation::Forward => {
Handle::new(
chopped_segs.1 - 1,
Orientation::Forward
)
},
Orientation::Backward => {
Handle::new(
chopped_segs.0,
Orientation::Backward
)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a little shorter/clearer as:

let seg_id = match old_from.orient() {
  Orientation::Forward => chopped_segs.1 - 1,
  Orientation::Backward => chopped_segs.0,
};
Handle::new(seg_id, old_from.orient())

Comment on lines +488 to +501
match old_to.orient() {
Orientation::Forward => {
Handle::new(
chopped_segs.0,
Orientation::Forward
)
},
Orientation::Backward => {
Handle::new(
chopped_segs.1 - 1,
Orientation::Backward
)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar simplification is probably possible here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants