Implement Chop #188
base: main
AWESOME!! This looks really great. I have just a few low-level suggestions, if you're interested!
docker pull quay.io/biocontainers/odgi:0.8.3--py310h6cc9453_0
docker tag quay.io/biocontainers/odgi:0.8.3--py310h6cc9453_0 odgi
Maybe we should make these odgi versions match for the two jobs?
docker pull quay.io/biocontainers/odgi:0.8.6--py310hdf79db3_1
docker tag quay.io/biocontainers/odgi:0.8.6--py310hdf79db3_1 odgi
(See next comment.)
 .PHONY: test-slow-odgi
 test-slow-odgi: fetch
 	make -C slow_odgi test

 .PHONY: test-flatgfa
-test-flatgfa:
+test-flatgfa: fetch-og
Might I suggest that we don't actually need to do this `.og` conversion? It seems like odgi works just fine reading directly from a GFA file, and eliminating this step would leave one less thing that can break.
@@ -72,7 +72,7 @@ def hyperfine(cmds):
 "hyperfine",
 "--export-json",
 tmp.name,
-"--shell=none",
+# "--shell=none",
Fine to just delete this line.
[modes.chop]
cmd.flatgfa = '{fgfa} -i {files[flatgfa]} chop -c 3'
cmd.odgi = '{odgi} chop -i {files[og]} -c 3 -o - | {odgi} view -g -i - | {slow_odgi} norm --nl'
For benchmarking purposes, I think we should probably omit the `view` and `norm` steps. This makes for a fairer comparison, where we just measure the time taken to do the data structure transformation and not the time to print out the results. (This of course means that we can't simultaneously check correctness, but I suppose that is probably best done separately.)
fn empty_span<T>() -> Span<T> {
    Span::new(Id::new(0), Id::new(0))
}
I think you have a utility method for this now (`Span::new_empty()`)?
let mut seg_map: Vec<(Id<Segment>, Id<Segment>)> = Vec::new();
let mut max_node_id = 1;
These could perhaps use comments to describe what they do and what invariants they maintain?
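As a starting point for such comments, here is a minimal sketch of one plausible reading of `seg_map`, inferred only from how `chopped_segs.0` and `chopped_segs.1 - 1` are used in the link-rewriting hunks of this diff: each entry is a half-open range of new segment indices that replace one old segment. The function name `chop_segments`, the `ChoppedRange` alias, and the use of plain `usize` indices instead of `Id<Segment>` are assumptions for illustration, not the PR's code.

```rust
/// Half-open range [start, end) of new segment indices replacing one old segment.
/// (Hypothetical alias; the PR uses `(Id<Segment>, Id<Segment>)`.)
type ChoppedRange = (usize, usize);

/// Chop every sequence into pieces of at most `max_len` bases.
/// Returns the new sequences plus, for each old segment index,
/// the range of new indices it was split into.
fn chop_segments(seqs: &[&str], max_len: usize) -> (Vec<String>, Vec<ChoppedRange>) {
    assert!(max_len > 0, "max_len must be positive");
    let mut new_seqs: Vec<String> = Vec::new();
    // Invariant: seg_map[i] covers exactly the pieces of old segment i, in order,
    // and ranges are contiguous: seg_map[i].1 == seg_map[i + 1].0.
    let mut seg_map: Vec<ChoppedRange> = Vec::with_capacity(seqs.len());
    for seq in seqs {
        let start = new_seqs.len();
        for chunk in seq.as_bytes().chunks(max_len) {
            new_seqs.push(String::from_utf8_lossy(chunk).into_owned());
        }
        seg_map.push((start, new_seqs.len()));
    }
    (new_seqs, seg_map)
}
```

With this reading, the first piece of old segment `i` is `seg_map[i].0` and the last piece is `seg_map[i].1 - 1`, which matches the index arithmetic used when rewriting links.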
path_end = flat.add_steps(
    (start_idx..end_idx).map(|idx| {
        Handle::new(
            Id::new(idx),
            Orientation::Forward
        )
    })
).end;
DEFINITELY not for this PR, but: I wonder if there's some kind of utility method that we can invent to help with this stuff. Impressionistically speaking, what we want here is...
let segs = seg_map[step.segment()];
flat.add_steps(segs.map(|s| Handle::new(s, Orientation::Forward)));
Like, we kind of want a way to do a `map` directly on a chunk of segments, without having to fiddle with the index math here. Maybe we can think of a clever way to make that look nice!
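To make the idea a bit more concrete, here is a rough sketch of what such a helper might look like. Everything in it is hypothetical: `SegSpan`, its `handles` method, and the simplified `Handle` and `Orientation` types are stand-ins invented for this example, not the types in this repository. The backward case, which walks the pieces in reverse, is also an assumption about how a backward step over a chopped segment would be traversed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Orientation {
    Forward,
    Backward,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    seg: u32,
    orient: Orientation,
}

/// Hypothetical half-open span [start, end) of segment indices.
#[derive(Debug, Clone, Copy)]
struct SegSpan {
    start: u32,
    end: u32,
}

impl SegSpan {
    /// Produce one handle per segment in the span, hiding the index math.
    /// Forward walks the pieces first to last; Backward walks them last to first.
    fn handles(self, orient: Orientation) -> Vec<Handle> {
        let ids = self.start..self.end;
        match orient {
            Orientation::Forward => ids.map(|seg| Handle { seg, orient }).collect(),
            Orientation::Backward => ids.rev().map(|seg| Handle { seg, orient }).collect(),
        }
    }
}
```

A call site could then read roughly like `flat.add_steps(span.handles(Orientation::Forward))`, assuming `add_steps` accepts any iterator of handles.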
match old_from.orient() {
    Orientation::Forward => {
        Handle::new(
            chopped_segs.1 - 1,
            Orientation::Forward
        )
    },
    Orientation::Backward => {
        Handle::new(
            chopped_segs.0,
            Orientation::Backward
        )
    }
}
Could be a little shorter/clearer as:
let seg_id = match old_from.orient() {
    Orientation::Forward => chopped_segs.1 - 1,
    Orientation::Backward => chopped_segs.0,
};
Handle::new(seg_id, old_from.orient())
match old_to.orient() {
    Orientation::Forward => {
        Handle::new(
            chopped_segs.0,
            Orientation::Forward
        )
    },
    Orientation::Backward => {
        Handle::new(
            chopped_segs.1 - 1,
            Orientation::Backward
        )
    }
}
A similar simplification is probably possible here.
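For illustration, here is a self-contained sketch of both endpoint choices side by side. It uses plain `u32` indices and a local `Orientation` enum rather than this repository's `Id<Segment>` and `Handle` types, and the function names `link_from_seg` and `link_to_seg` are made up for the example. The `chopped` pair is assumed to be a half-open range `[first, end)` of new segment indices, as the `- 1` in the diff suggests.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Orientation {
    Forward,
    Backward,
}

/// New segment for the `from` end of a link: a link leaves a forward
/// segment at its last piece, and a backward segment at its first piece.
fn link_from_seg(chopped: (u32, u32), orient: Orientation) -> u32 {
    match orient {
        Orientation::Forward => chopped.1 - 1,
        Orientation::Backward => chopped.0,
    }
}

/// New segment for the `to` end of a link: the mirror image of the above.
fn link_to_seg(chopped: (u32, u32), orient: Orientation) -> u32 {
    match orient {
        Orientation::Forward => chopped.0,
        Orientation::Backward => chopped.1 - 1,
    }
}
```

In both cases the orientation of the new handle is unchanged, so only the segment index needs the `match`.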
Chop works! After `cargo build --release`, try something like `fgfa -I ../tests/k.gfa chop -c 3 -l`. `-c 3` specifies that nodes are to be chopped into segments no longer than 3, and `-l` specifies that the output file should compute new links (at this time, it's still not clear to me what need we have for links, if any, but it would be easy to make computing links the default behavior or to always compute links). (Side note: `slow_odgi` does not compute links. Do we care to change this?)

The basic algorithm for `chop` is as follows:

One weird note here: the implementation of `chop` is split between `cmd.rs` and `main.rs`. The brunt of the work is done in `cmd.rs`, but the logic for which aspects of our original graph to preserve is in `main.rs`. It's unclear that a nice fix exists; because our new graph borrows elements from a `GFAStore` created by `chop` in `cmd.rs`, ownership of the `GFAStore` must be passed to the `main` function in order for our new `FlatGFA` to be valid. The best fix may be to compute the `FlatGFA` in `chop` and return both the `FlatGFA` and `GFAStore`, but right now we do not.
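The ownership constraint described above can be shown with a small stand-alone sketch. The `Store` and `View` types here are hypothetical stand-ins for `GFAStore` and `FlatGFA`, reduced to a single field, and `chop_store` is an invented name; none of this is the PR's actual code. The sketch returns only the owned store and lets the caller take the view, because a value that borrows from the store cannot simply be returned alongside it in safe Rust without extra machinery.

```rust
/// Hypothetical owning store (stand-in for `GFAStore`).
struct Store {
    seqs: Vec<String>,
}

/// Hypothetical borrowed view (stand-in for `FlatGFA`).
struct View<'a> {
    seqs: &'a [String],
}

impl Store {
    /// The view is only valid while `self` is alive.
    fn view(&self) -> View<'_> {
        View { seqs: &self.seqs }
    }
}

/// Build a new owned store from an existing view, chopping each
/// sequence into pieces of at most `max_len` bases.
fn chop_store(input: &View<'_>, max_len: usize) -> Store {
    assert!(max_len > 0, "max_len must be positive");
    let mut seqs = Vec::new();
    for s in input.seqs {
        for chunk in s.as_bytes().chunks(max_len) {
            seqs.push(String::from_utf8_lossy(chunk).into_owned());
        }
    }
    Store { seqs }
}
```

The caller then writes something like `let store = chop_store(&old.view(), 3); let flat = store.view();`, keeping `store` alive for as long as `flat` is used.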