Generate Calyx-friendly JSON #25
…g, except it saves cells when there are fewer than MAX_NODES nodes.
…ll discuss with Susan.
Neato!! Very cool that this works.
We briefly discussed Turnt's multiple environments as a way to make differential testing a little easier.
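Independent of how Turnt's environments are configured, the core of differential testing fits in a few lines of plain Python. You run a reference command and the command under test on the same input and compare their output byte for byte. This is a minimal sketch, not Turnt's API, and the helper name `outputs_agree` is hypothetical:

```python
import subprocess


def outputs_agree(reference_cmd, candidate_cmd, input_path):
    """Run two commands on the same input file; return True if stdout matches.

    Hypothetical helper: each *_cmd is an argument list, e.g. a reference tool
    and the new pure-Python translator reading a GFA file on stdin.
    """
    outputs = []
    for cmd in (reference_cmd, candidate_cmd):
        with open(input_path, "rb") as stdin:
            # check=True raises CalledProcessError if either command fails.
            result = subprocess.run(cmd, stdin=stdin, capture_output=True, check=True)
        outputs.append(result.stdout)
    return outputs[0] == outputs[1]
```

Turnt's own multi-environment support covers the same idea with less plumbing. The sketch only shows what "the two outputs must agree" means.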
Also, it would be great to avoid committing most *.gfa files to the repo, since they can get so big. Currently, the only such files we have in the repo are the tiny ones in test/basic/:
https://github.com/cucapra/pollen/tree/main/test/basic
The remaining ones are downloaded from the web:
Line 37 in 855be19
curl -Lo ./$@ $(GFA_URL)/$*.gfa
Instead of making a new test/depth_json directory for testing this functionality, it may make more sense to just put your new stuff in test/turnt.toml so you can use the files right there.
slow_odgi/to_json.py
print(json.dumps(graph.headers, indent=4))
print(json.dumps(graph.segments, indent=4, cls=SegmentEncoder))
print(json.dumps(graph.links, indent=4, cls=LinkEncoder))
print(json.dumps(graph.paths, indent=4, cls=PathEncoder))
You might have a better time with these JSONEncoder subclasses if they were type-sensitive, so you would only need to use a single encoder class. Something like this:
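from json import JSONEncoder
import mygfa

class AlignmentEncoder(JSONEncoder):
    def default(self, o):
        # Pick the JSON shape based on the type of the object being encoded.
        if isinstance(o, mygfa.Path):
            items = str(o).split("\t")
            return { ... }
        elif isinstance(o, mygfa.Link):
            ...
        # Fall back to the base class, which raises TypeError for unknown types.
        return super().default(o)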
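Here is a self-contained, runnable take on the type-dispatching encoder idea above. Because `mygfa` is not importable outside this repo, the sketch uses two hypothetical stand-in dataclasses, `Segment` and `Link`, with far fewer fields than the real classes. It shows one encoder class handling several types in a single `json.dumps` call:

```python
import json
from dataclasses import dataclass


# Hypothetical stand-ins for mygfa's classes; the real ones carry more fields.
@dataclass
class Segment:
    name: str
    seq: str


@dataclass
class Link:
    from_: str
    to: str


class GraphEncoder(json.JSONEncoder):
    """One encoder that dispatches on the type of each object it meets."""

    def default(self, o):
        if isinstance(o, Segment):
            return {"name": o.name, "seq": o.seq}
        if isinstance(o, Link):
            return {"from": o.from_, "to": o.to}
        # Anything else: defer to the base class, which raises TypeError.
        return super().default(o)


graph = {"segments": [Segment("1", "ACGT")], "links": [Link("1", "2")]}
print(json.dumps(graph, indent=4, cls=GraphEncoder))
```

A single `cls=GraphEncoder` replaces the per-type encoders, so the whole graph can be dumped as one JSON document instead of four separate prints.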
I see this is now a draft. Care to comment on whether this is ready for a re-review or whether there are other outstanding tasks?
Sorry about the silence! This fell off my radar what with all the other changes elsewhere. In the commits since your last review, I have
I can now pass the command-line flags n, e, and p, which set the maximum number of nodes, the maximum number of steps per node, and the maximum number of paths, respectively. There is still work to be done, though:
If interested in testing the "simple" JSON dump,
Done with the parameter-adjusting stuff! We now mimic … For example, here's how you can go hard on the number of paths for no reason.
Anyway, outlandish examples aside, our output now diffs cleanly against all the fetched GFA files!
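The n, e, and p limits described in this thread map naturally onto argparse. This is a minimal sketch, assuming single-dash flags that take integer values. The function name `parse_limits`, the `dest` names, and the `None` defaults are hypothetical, not the PR's actual interface:

```python
import argparse


def parse_limits(argv=None):
    """Parse the node, step, and path limits (hypothetical interface)."""
    parser = argparse.ArgumentParser(description="Emit Calyx-friendly JSON from a GFA file.")
    # None is assumed to mean "no explicit limit; derive it from the graph".
    parser.add_argument("-n", dest="max_nodes", type=int, default=None,
                        help="maximum number of nodes")
    parser.add_argument("-e", dest="max_steps", type=int, default=None,
                        help="maximum number of steps per node")
    parser.add_argument("-p", dest="max_paths", type=int, default=None,
                        help="maximum number of paths")
    return parser.parse_args(argv)


args = parse_limits(["-n", "16", "-e", "15", "-p", "3"])
print(args.max_nodes, args.max_steps, args.max_paths)
```

Passing `argv` explicitly keeps the parser testable. In a script you would call `parse_limits()` with no arguments so it reads `sys.argv`.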
Nice! This is REALLY great—it seems really really valuable that we can start testing proper Pollen hardware without using odgi at all. 🎉 🎉 🎉 I think this sets us up nicely for a future phase of work where we work more on the node-depth Calyx implementation and even generalize it to work on other stuff.
I agree with your notion (recorded in #69) about splitting the data translator into a separate thing from slow_odgi. I know we already agree on this, but just to put it into words, the point is that slow_odgi will be useful for anyone wanting our "PanoBench" code release, whereas the JSON translator is really only relevant to our Calyx stuff. So it's nice to keep them as separate things, both depending on mygfa.
These JSON files are generated from GFA files rather than odgi graphs, purely in Python, without using odgi commands or odgi's Python bindings.
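To illustrate what "purely in Python" can look like, here is a minimal sketch. It reads a GFA 1.0 file's S and P lines directly, counts how many path steps visit each segment (one common notion of node depth), and packs the counts into a fixed-size, zero-padded array. It does not use mygfa. The JSON layout (the memory name `depth` and the `format` block) is an assumption loosely modeled on Calyx data files, not this PR's actual output:

```python
import json
from collections import Counter


def node_depths(gfa_text):
    """Count the path steps that visit each segment of a GFA 1.0 graph.

    Only S (segment) and P (path) lines are read; all other lines are skipped.
    """
    segments = []
    visits = Counter()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments.append(fields[1])
        elif fields[0] == "P":
            # fields[2] looks like "1+,2-,3+"; drop the trailing orientation.
            for step in fields[2].split(","):
                visits[step[:-1]] += 1
    return {name: visits[name] for name in segments}


def to_calyx_json(depths, max_nodes, width=32):
    """Pack depths into one zero-padded memory of length max_nodes.

    Assumed layout: the memory name and "format" keys are illustrative.
    """
    if len(depths) > max_nodes:
        raise ValueError(f"{len(depths)} nodes exceed max_nodes={max_nodes}")
    data = list(depths.values()) + [0] * (max_nodes - len(depths))
    memory = {
        "data": data,
        "format": {"numeric_type": "bitnum", "is_signed": False, "width": width},
    }
    return json.dumps({"depth": memory}, indent=2)


gfa = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tTT\nS\t3\tG\nP\tx\t1+,2+,3+\t*\nP\ty\t1+,3-\t*\n"
print(to_calyx_json(node_depths(gfa), max_nodes=4))
```

Padding to a fixed `max_nodes` reflects the idea that hardware memories have a size fixed at compile time. Nodes beyond the limit are rejected rather than silently dropped.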
Presently I match the .data files that Susan creates for odgi depth, but I will soon extend this to expose an interface that lets us generate more or less of the GFA. Before going too far, I'm wondering about minimizing these JSON files a little; see #24. Some next steps:
… (exine depth -d filename.og -o $fn.out) than the command we are testing (python3 to_json.py < filename.gfa). I'd like to clean this up after learning more about turnt.
… (turnt ../path_to_gfas/*.gfa did not seem to work), and I'd like to tidy this up.
… crush or flip to operate.
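On minimizing the JSON files (#24): one cheap, lossless step, independent of any schema change, is to drop the indentation and separator whitespace that `indent=4` adds. This sketch uses only the standard library's `separators` argument to `json.dumps`. The helper name `dump_compact` is hypothetical:

```python
import json


def dump_compact(obj):
    """Serialize with no indentation and no spaces after separators."""
    return json.dumps(obj, separators=(",", ":"))


example = {"depth": {"data": [2, 1, 2, 0]}}
pretty = json.dumps(example, indent=4)
compact = dump_compact(example)
print(len(pretty), len(compact))
```

Both strings decode to the same object. The savings grow with nesting depth and array length, since `indent=4` puts every array element on its own line.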