Thoughts on refactoring CWL #111
Hi @slnovak, you have a lot of different points here, so let me address what I can.
The link with RDF is the use of JSON-LD as an annotation, which enables converting the rigid YAML document to RDF; however, this is an entirely optional feature for an implementation. You can think of the conversion as
The benefit of considering RDF in the design is that it provides a conceptual framework for describing how documents, objects, metadata, etc. relate to one another (which does not exist in Avro, so a plain Avro schema would still require additional semantics to define these features). It also provides a means of integrating metadata from ontologies, such as EDAM for describing file types (#7).
We did have an earlier version of the CWL schema written in straight Avro schema, and it was unmaintainable. Since plain Avro doesn't provide any "don't repeat yourself" abstractions, refactoring the schema to prototype new features (which happens a lot when you have a standard that's still under development) is painful and error-prone, and things get out of sync very easily.
I am working on spinning off the schema language into its own project. It's not quite ready yet, but I will go ahead and push what I have to get the conversation started. The idea is that you'll be able to use the schema tools to generate a plain Avro schema, then use the Avro tools in your favorite language for validation/code generation. The reference code (cwltool) is already able to export the straight Avro schema using --print-avro; however, there are reports that the output isn't actually compatible with the Java Avro implementation (#69), but that is just a bug that needs to be fixed.
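To make the "don't repeat yourself" point concrete, here is a minimal Python sketch of the kind of expansion a schema tool could perform: shared fields are written once and copied into each plain Avro record. The `extends` key, the `flatten` helper, and the `Process`/`Workflow` definitions are hypothetical names chosen for illustration; they are not the actual CWL schema language or the cwltool implementation.

```python
import json


def flatten(name, defs):
    """Return a plain Avro record schema for `name`, with inherited fields copied in.

    `defs` maps a record name to a dict with a "fields" list and an optional
    "extends" list of parent record names (a hypothetical convention).
    """
    definition = defs[name]
    fields = []
    # Parent fields come first, so every derived record repeats them verbatim.
    for parent in definition.get("extends", []):
        fields.extend(flatten(parent, defs)["fields"])
    fields.extend(definition["fields"])
    return {"type": "record", "name": name, "fields": fields}


# Illustrative definitions only; field names are assumptions, not the CWL schema.
DEFS = {
    "Process": {
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "description", "type": ["null", "string"]},
        ]
    },
    "Workflow": {
        "extends": ["Process"],
        "fields": [
            {"name": "steps", "type": {"type": "array", "items": "string"}},
        ],
    },
}

if __name__ == "__main__":
    # The generated JSON is ordinary Avro schema that standard Avro tooling can read.
    print(json.dumps(flatten("Workflow", DEFS), indent=2))
```

In plain Avro, the `id` and `description` fields would have to be repeated by hand in every record that shares them, which is where the drift described above tends to creep in.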
Which is not to say the existing tests couldn't use more work: feature coverage is probably incomplete, and negative tests (checking that an implementation fails when it is required to fail) don't currently exist.
(You should attend the next CWL Google hangout call. You should have gotten an invite if you are on the Google Groups mailing list; if not, send me an email and I will send you the information.)
One last comment: you can always skip formal validation, write your execution code assuming the input document is valid, and raise an exception if something unexpected is found. This isn't best practice, but the inability to validate directly from the official schema shouldn't be a blocker for producing a conforming CWL implementation.
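As a rough illustration of that approach, the Python sketch below checks fields only at the point where they are used and raises an exception on anything unexpected. The `InvalidDocument` exception and the `build_command` helper are hypothetical names, and the document shape (a mapping with `class` and `baseCommand` keys) is an assumption made for the example rather than a statement of what the spec requires.

```python
class InvalidDocument(Exception):
    """Raised when the document does not have the shape the executor expects."""


def build_command(doc):
    """Build an argv list from a tool description, checking fields as they are used."""
    if not isinstance(doc, dict):
        raise InvalidDocument("expected a mapping at the top level")
    if doc.get("class") != "CommandLineTool":
        raise InvalidDocument("unsupported class: %r" % (doc.get("class"),))

    base = doc.get("baseCommand")
    if isinstance(base, str):
        # Accept a single string as shorthand for a one-element list.
        base = [base]
    if not isinstance(base, list) or not all(isinstance(arg, str) for arg in base):
        raise InvalidDocument("baseCommand must be a string or a list of strings")
    return list(base)
```

For example, `build_command({"class": "CommandLineTool", "baseCommand": "echo"})` yields a one-element argv list, while a document with any other `class` raises `InvalidDocument` instead of failing later in an obscure way.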
Thank you both for bringing more attention to this issue. I’m glad to see there is movement in the area. A couple of opinions:
I’m hoping we can find a way to move forward. Perhaps @tetron could continue his Avro-LD as a separate project, so that it has the chance to be documented, tested and made fully compliant with Avro, and do this work separately so that it doesn’t block the development of CWL.
Now that draft-2 is stable, perhaps we can commit e.g. avdl files to the repository separately (to avoid potential bugs with --print-avro, since it's only tested with Avro's Python implementation)?
I think @ntijanic and @kellrott have it right. Since @slnovak has already done the legwork to write the avdl file by hand, there's no reason that can't be merged as an alternate schema for the stable draft-2. The schema language as a separate project (see my note on #69) will be ready for draft-3, and will support conversion to avsc (and could maybe even export avdl) for use with Avro tooling. That removes the need to convert the schema by hand, so this would be a one-off.
@tetron -- thanks for your feedback. Do you think that there is room within the standard to support the notion of "schema implementations" along with tool implementations? That is, the community is able to develop their own schema implementation as long as it conforms to the standard. Do you know of any standards or data working groups that have done anything along these lines?
@slnovak just to clarify, by "schema implementations" I assume you mean an encoding of the CWL data model (such as using Avro IDL), rather than an implementation of the schema language? I think as long as we are able to keep them in sync and avoid multiple sources of truth (different schemas that are in conflict), then that is perfectly reasonable. This is in line with the broader idea that CWL is as much about defining the right abstract data and computational models as it is about a specific document encoding.
@slnovak Thank you for your detailed and thoughtful comment. Now that we are nearly in November, what do you think about the state of CWL?
Related: #173. Also, I think it's totally doable to define additional encodings that are isomorphic with the YAML version of CWL and the RDF version of CWL. Any RDF encoding should automatically qualify, though other encodings don't need to depend on RDF at all. Even one-way mappings should be fine (DSL?).
@slnovak, @tetron is this ticket outdated?
Hi all,
First off, thanks for all the hard work to get CWL up to where it is today. I'm pretty excited to see where it goes.
I'm currently working on a project at OHSU where we want to use CWL for a workflow engine. I've been doing a bit of a deep dive into understanding the current implementation and figuring out how we can use it for our application. However, it looks like we're going to be targeting the JVM, so I started to look into piecing together an alternative implementation in Ruby (language of choice) so that I can get an idea of what the best approach would be.
Talking the talk
In doing so, I have some first-hand experience of working with the spec and wanted to give some feedback.
I'd love to see CWL grow to adopt best practices that enable developers to build more applications and platforms that take advantage of CWL.
Walking the walk
I wanted to take some time to see what it would look like if we refactored the CWL schema into Avro. Here's a rundown of what I've been able to do.
I created a bunch of Avro IDL records. Take workflow.avdl for example. IDL allows you to embed both comments and code documentation. The Avro IDL spec says:
So this should satisfy the requirement of being able to support multi-line documentation (one of the motivations for using YAML).
It's definitely nice being able to break the spec down into separate protocols. There's a top-level protocol that's used to compile the schemata.
There are tests! RSpec (a Ruby testing framework) provides a lot of functionality for writing extensive tests.
My proposal
Moving forward, I'd like to continue down this path to see if I can come up with an Avro-sans-RDF schema implementation that is compliant with the CWL standard document. If this work can be reconciled with the CWL roadmap, I'd love to commit to helping with the development of this project.
Please let me know if you have any thoughts or feedback!
Thanks!