Generate Calyx-friendly JSON #25

anshumanmohan · 2023-03-02T00:25:43Z

These JSON files are generated from GFA files rather than odgi graphs, and the generation is done purely in Python, without the use of odgi commands or odgi's Python bindings.

Presently I match the .data files that Susan creates for odgi depth, but I will soon extend this to expose an interface that lets us generate more or less of the gfa. Before going too far, I'm wondering about minimizing these JSON files a little; see #24.

Some next steps:

After talking about the issue linked above, extend the testing suite to include all the GFAs we have.
I presently create the .out files using a shell script, because the oracle needs to run a different command (exine depth -d filename.og -o $fn.out) than the command we are testing (python3 to_json.py < filename.gfa). I'd like to clean this up after learning more about turnt.
I also had to make local copies of the gfa files (turnt ../path_to_gfas/*.gfa did not seem to work) and I'd like to tidy this up.
Extend the JSON generator to emit richer versions of the graph, e.g. enough data for crush or flip to operate.

sampsyo

Neato!! Very cool that this works.

We briefly discussed Turnt's multiple environments to make differential testing a little easier.

Also, it would be great to avoid committing most *.gfa files to the repo, since they can get so big. Currently, the only such files we have in the repo are the tiny ones in test/basic/:
https://github.com/cucapra/pollen/tree/main/test/basic

The remaining ones are downloaded from the web:

pollen/Makefile

Line 37 in 855be19

curl -Lo ./$@ $(GFA_URL)/$*.gfa

Instead of making a new test/depth_json directory for testing this functionality, it may make more sense to just put your new stuff in test/turnt.toml so you can use the files right there.

slow_odgi/to_json.py

sampsyo · 2023-03-02T23:21:04Z

+    print(json.dumps(graph.headers, indent=4))
+    print(json.dumps(graph.segments, indent=4, cls=SegmentEncoder))
+    print(json.dumps(graph.links, indent=4, cls=LinkEncoder))
+    print(json.dumps(graph.paths, indent=4, cls=PathEncoder))


You might have a better time with these JSONEncoder subclasses if they were type-sensitive, so you would only need to use a single encoder class. Something like this:

class AlignmentEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, mygfa.Path):
            items = str(o).split("\t")
            return { ... }
        elif isinstance(o, mygfa.Link):
            ...
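To make the suggestion concrete, here is a minimal runnable sketch of a single type-sensitive encoder. The `Path` and `Link` dataclasses are hypothetical stand-ins for `mygfa.Path` and `mygfa.Link` (the real classes may carry different fields), and `GraphEncoder` and `dump` are names made up for this illustration.

```python
import json
from dataclasses import dataclass
from json import JSONEncoder
from typing import List


# Hypothetical stand-ins for mygfa.Path and mygfa.Link; the real classes
# in mygfa may have different fields.
@dataclass
class Path:
    name: str
    segments: List[str]


@dataclass
class Link:
    from_seg: str
    to_seg: str


class GraphEncoder(JSONEncoder):
    """A single encoder that dispatches on the type of the object."""

    def default(self, o):
        if isinstance(o, Path):
            return {"name": o.name, "segments": o.segments}
        elif isinstance(o, Link):
            return {"from": o.from_seg, "to": o.to_seg}
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def dump(obj) -> str:
    """Serialize any mix of built-in values, Paths, and Links."""
    return json.dumps(obj, indent=4, sort_keys=True, cls=GraphEncoder)
```

With one encoder class, every `json.dumps` call can pass the same `cls=` argument regardless of which part of the graph is being printed, and an unsupported type still fails loudly with `TypeError`.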

sampsyo · 2023-04-05T17:40:45Z

I see this is now a draft—care to comment about whether this is ready for a re-review or if there are other outstanding tasks?

anshumanmohan · 2023-04-26T21:26:03Z

Sorry about the silence! This fell off my radar what with all the other changes elsewhere. In the commits since your last review, I have

Incorporated your comments.
Brought the code up to speed with changes in mygfa, the package-import style, etc.
Dovetailed testing into slow-odgi's testing and the interface into slow-odgi's CLI. These are temporary, and perhaps a mistake. This should stand alone and just borrow mygfa from slow-odgi. Better yet, mygfa should be lifted above these two separate packages. Issue forthcoming.

anshumanmohan · 2023-04-26T21:33:27Z

make test-mkjson will now run the depth-specific json-generator, with exine depth as its oracle.

I now have the ability to pass in the command-line flags n, e, and p that are used to determine the max nodes, max steps per node, and max paths respectively. There is work to be done, though:

The exine json-generator adjusts the widths of fields as needed; I just stick to 4. Must fix.
Fixing this will allow me to pass in larger parameters for these three flags and therefore compute JSONs for the four larger graphs. At present we don't do anything reasonable with them: exine, our oracle, complains that the parameters need to be bigger, and we ignore this, so the expect files are empty.
Opening up testing to all the graphs may well reveal other issues that have not yet come up with the smaller four graphs.
The final step will be to infer these parameters automatically and tightly, so I won't have to run the smaller graphs with parameters that the bigger graphs need.
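As a rough illustration of what adjustable widths could look like, the sketch below picks the smallest unsigned bit width that holds every value in a memory and records it in the `format` block of a Calyx-style data entry. This is one plausible rule, not necessarily the one exine uses; `min_width` and `calyx_memory` are hypothetical names.

```python
from typing import Dict, List


def min_width(values: List[int]) -> int:
    """Smallest unsigned bit width that can hold every value (at least 1)."""
    return max(1, max(values, default=0).bit_length())


def calyx_memory(values: List[int]) -> Dict:
    """Wrap values in a data/format entry, using the inferred width.

    Assumption: all values are non-negative integers.
    """
    return {
        "data": values,
        "format": {
            "numeric_type": "bitnum",
            "is_signed": False,
            "width": min_width(values),
        },
    }
```

Under this rule, a memory whose largest value is 15 gets width 4, while one containing 16 gets width 5.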

anshumanmohan · 2023-04-26T21:46:50Z

If interested in testing the "simple" JSON dump,

Toggle the commented lines 158/159 of __main__ so that simple_json becomes the target function
Test with slow_odgi mkjson test/k.gfa and the like, not with turnt. There isn't a reasonable oracle for the simple dump.

anshumanmohan · 2023-04-26T23:09:02Z

Done with the parameter-adjusting stuff! We now mimic exine exactly: if the user provides the -a flag, as I currently do in turnt, all the parameters are inferred automatically and tightly. However, the user is free to also supply some other value(s), and any user-supplied values always take precedence.

For example, here's how you can go hard on the number of paths for no reason.

[envs.mkjson_oracle]
binary = true
command = "exine depth -d {filename} -a {filename} -p 500"
output.json = "-"

[envs.mkjson_test]
binary = true
command = "slow_odgi mkjson {filename} -p 500"
output.json = "-"

Anyway, outlandish examples aside, we now diff out correctly against all the fetch-ed GFA files!
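The precedence rule described above can be sketched roughly as follows. This is not the code in the PR: `infer_params`, `resolve_params`, and the `DEFAULTS` table are hypothetical, and the sketch assumes that "tight" means the node count, the largest number of path steps crossing any one node, and the path count.

```python
import argparse
from typing import Dict, List

# Hypothetical fallbacks used when -a is absent and the user gives no value.
DEFAULTS = {"n": 16, "e": 15, "p": 15}


def infer_params(nodes: List[str], paths: Dict[str, List[str]]) -> Dict[str, int]:
    """Tightest parameters for a graph: nodes, plus path name -> nodes visited."""
    crossings = {node: 0 for node in nodes}
    for steps in paths.values():
        for node in steps:
            crossings[node] = crossings.get(node, 0) + 1
    return {
        "n": len(crossings),                       # max nodes
        "e": max(crossings.values(), default=0),   # max steps per node
        "p": len(paths),                           # max paths
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", dest="auto", action="store_true")
    for flag in ("n", "e", "p"):
        parser.add_argument(f"-{flag}", type=int, default=None)
    return parser


def resolve_params(args, nodes, paths) -> Dict[str, int]:
    """User-supplied values win; otherwise use inferred values, then defaults."""
    inferred = infer_params(nodes, paths) if args.auto else {}
    resolved = {}
    for key in ("n", "e", "p"):
        user = getattr(args, key)
        if user is not None:
            resolved[key] = user
        else:
            resolved[key] = inferred.get(key, DEFAULTS[key])
    return resolved
```

For instance, parsing `["-a", "-p", "500"]` infers `n` and `e` from the graph but keeps the user's 500 for `p`, mirroring the outlandish example above.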

sampsyo

Nice! This is REALLY great—it seems really really valuable that we can start testing proper Pollen hardware without using odgi at all. 🎉 🎉 🎉 I think this sets us up nicely for a future phase of work where we work more on the node-depth Calyx implementation and even generalize it to work on other stuff.

I agree with your notion (recorded in #69) about splitting the data translator into a separate thing from slow_odgi. I know we already agree on this, but just to put it into words, the point is that slow_odgi will be useful for anyone wanting our "PanoBench" code release, whereas the JSON translator is really only relevant to our Calyx stuff. So it's nice to keep them as separate things, both depending on mygfa.

anshumanmohan added 10 commits February 23, 2023 10:14

Small steps towards normalizing emit: keep headers verbatim, swap l…

9fca2e5

…inks and paths

Merge commit '855be197d608f52279b0ee88bf84be9c34b9554e' into to-json

bc79a8a

Silly JSON dump. Not yet Calyx-friendly

079b89c

Dumping path-intersections from a node PoV

569428c

Nest json to match current policy. Add format field

5ff832d

add answer slots

3b81b11

Add paths_to_consider, uniq answer fields. Sort keys. Matches existin…

3f0cd4e

…g, except it saves cells when there are fewer than MAX_NODES nodes.

Minimize the number of paths_to_consider bitvectors generated too. Wi…

c0ca202

…ll discuss with Susan.

Turnt support for JSON-generation

632ffb4

Shell script to generate .out files

c73503d

sampsyo reviewed Mar 2, 2023

sampsyo mentioned this pull request Mar 2, 2023

Normalize GFAs? #26

Merged

Merge branch 'main' into to-json

df282f5

anshumanmohan marked this pull request as draft March 21, 2023 23:57

Merge branch 'main' into to-json

0e01309

anshumanmohan changed the base branch from main to cli April 26, 2023 17:13

anshumanmohan added 10 commits April 26, 2023 13:13

Merge branch 'cli' of github.com:cucapra/pollen into to-json

0cbb102

Review testing, bring up to speed with package imports

d91ba48

nit

15a11ae

make json-generation part of the slow-odgi CLI

1ade032

Add CLI support for max args

f714f6f

Add ability to take -e flag

6eb1ebd

Add ability to parse and use max-paths

7719b1f

Move testing into makefile and slow-odgi's turnt

a8e38c1

A little progress in simple-dump

082dfa8

Generic type-sensitive encoder

a73b969

anshumanmohan mentioned this pull request Apr 26, 2023

Reorganize json-gen and slow-odgi into two packages #69

Closed

4 tasks

anshumanmohan mentioned this pull request Apr 26, 2023

json-gen: Adjustable widths #70

Closed

3 tasks

Toggle to testing-mode

ab65c00

anshumanmohan added 2 commits April 26, 2023 18:39

Adjustable widths

fa8bd4a

Infer parameters automatically

88b8224

anshumanmohan marked this pull request as ready for review April 26, 2023 23:09

anshumanmohan requested a review from sampsyo April 26, 2023 23:09

anshumanmohan added 2 commits April 27, 2023 14:08

Kill off silly null

fc83848

Accidentally committed outlandish example. Reverting.

d89f90e

sampsyo approved these changes Apr 28, 2023

Base automatically changed from cli to main May 2, 2023 16:24

Merge branch 'main' into to-json

23f8257

anshumanmohan merged commit 9eeb67d into main May 2, 2023

anshumanmohan deleted the to-json branch May 2, 2023 16:30
