What is 'print'ing? #2896

Closed
balacij opened this issue Nov 16, 2021 · 11 comments

@balacij
Collaborator

balacij commented Nov 16, 2021

Re: printing and classes like CanGenMathExprs - maybe. I've got some emails about that from multiple years ago. It's actually kind of a tricky design point where Haskell classes don't always quite fit. So it needs a proper design, which means that it first needs a proper analysis. Certainly it's not worth creating classes (in general) if there is only a single instance of it. I'll also email you some design discussion on that topic from a while back.

Originally posted by @JacquesCarette in #2883 (comment)

I will follow up here once I've had more time to think about this. This is also relevant to 'composed' printers.

@balacij balacij self-assigned this Nov 16, 2021
@balacij
Collaborator Author

balacij commented Nov 29, 2021

Posting a discussion that I originally wrote in an email to Dr. Carette about extensible languages:

Relating to #2883, I was trying to understand and formalize "printing" in the context of how encodings and printers work in Drasil. "Printing" occurs at all levels of the "data encodings" (e.g., TheoryModels, QDefinitions, Exprs, CodeExprs, pretty docs, etc.), whereby "higher level" encodings can be printed into "lower level" encodings. Alternatively, at times it seems more appropriate to say that "lower level" encodings can be pushed out from "higher level" encodings.

From our discussion on that ticket, we had discussed creating a typeclass for generating each lower-level encoding. We were considering the following design to unify the various "printing" functions (this would also reverse our dependencies because the higher-level will "depend" on the lower-level to show how the lower-level encodings can be "pushed out"/printed from the higher-level encoding). We didn't quite settle on a design yet because we first needed to analyze what "printing" fully is.

One common variant of printing in the code is a function with a type signature of the form :: X -> Y, for different purposes/contexts (where X is a "higher-level" encoding, and Y is a "lower-level" encoding [e.g., X = Expr, Y = Printing.Expr]). Sometimes we use configurations for them too (:: X -> Cfg -> Y -- e.g., X = Printing.Expr, and Y = Doc). One interesting example is how we use "Stage" to denote how a Symbol should be converted into a Printing.Expr.

Ultimately, it appears that our function names matter only for organizing functions that all follow the same "encoding printing/transformation" style. In other words, it appears that we're a few type abstractions away from being able to have a single "print" function name that we can use to represent a "print"/"render".

With these above ideas in mind, we should be able to abstract over all of these possible typeclasses we could build for each of the above to increase our code re-usability, adding extra 'configurations' for each possible variant of printing.

As a type, it appears a printer is more or less a specific kind of function, consisting of:

  1. An input domain
  2. An output range
  3. A mapping function
  4. A configuration for the mapping function (e.g., constants, font sizes, section-specific configurations, simplifications, etc)
  5. A context under which the mapping occurs (areas where a configuration's options would be relevant to one context may be irrelevant in another context [e.g., there will likely be different configurations when printing into different programming languages that belong to completely different paradigms], or when printing to the same output language but with completely different intentions [e.g., printing to HTML with creating teaching material as a goal vs printing to HTML with creating testing material as a goal]). This final context heavily relates to abstracting over the family of named interfaces we could create if we chose to use the :: X -> Cfg -> Y type signature for our printing functions.

The distinction between context and configuration might appear blurry, but I think it becomes a bit clearer when we think about contexts where we print various things, each with its own set of applicable configurations. Contexts might still turn out to be unneeded, but I lean towards having them rather than not.

Approximately, I believe the typeclass we can create has a fairly direct translation:

{-# LANGUAGE MultiParamTypeClasses #-}
-- i: input, o: output, cfg: configuration, ctx: context
class CanGen i o cfg ctx where
    gen :: i -> cfg -> ctx -> o
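
To make the proposed class a bit more concrete, here is a minimal sketch of a single instance. Only the CanGen class comes from the proposal above; the toy Expr type, the TextCfg configuration, and the Audience context are hypothetical names invented for illustration and are not Drasil's types.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- The class as proposed above: input, output, configuration, context.
class CanGen i o cfg ctx where
  gen :: i -> cfg -> ctx -> o

-- Hypothetical toy input language (not Drasil's Expr).
data Expr = Lit Int | Var String | Add Expr Expr

-- Hypothetical configuration: an option that the mapping exposes.
newtype TextCfg = TextCfg { spaced :: Bool }

-- Hypothetical context: the purpose the output is produced for.
data Audience = Teaching | Testing

-- One instance: toy expressions rendered as plain strings.
instance CanGen Expr String TextCfg Audience where
  gen (Lit n) _ _ = show n
  gen (Var v) _ Teaching = v
  gen (Var v) _ Testing = "<" ++ v ++ ">"
  gen (Add a b) cfg ctx = gen a cfg ctx ++ plus ++ gen b cfg ctx
    where plus = if spaced cfg then " + " else "+"
```

With these definitions, gen (Add (Var "x") (Lit 1)) (TextCfg True) Teaching evaluates to "x + 1" once the result type is fixed to String. The configuration only changes spacing, while the context changes how variables are shown, which is one way the two might be kept apart.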

@JacquesCarette
Owner

I think you might be taking 'printing' too literally, and so not seeing the larger setting in which this sits. Printing, as we do it, is a version of translation from one representation to another. It is often a lossy translation, but that's ok.

So what I would describe is still close to what you have above, but let me rephrase what a translation process involves:

  1. An input representation language
  2. An output representation language
  3. A method that maps these representations, that may have options
  4. A 'configuration' that specifies the options that the method exposes

And indeed, this does all happen in a context. So translating to "visual layout languages" is a different global context than "programming languages", but both are translation, and both do share commonalities. As you also mention, there is also local context to take care of. So there are multiple layers of options/contexts.
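
One possible way to read this four-part description in Haskell is as a plain record rather than a class. The following is only a sketch under that assumption: Translation, runDefault, andThen, and the two example translations are hypothetical names, not Drasil code.

```haskell
-- 'i' and 'o' are the input and output representation languages,
-- 'cfg' is the configuration that specifies the method's options,
-- and 'run' is the method that maps between the representations.
data Translation cfg i o = Translation
  { defaults :: cfg
  , run      :: cfg -> i -> o
  }

-- Run a translation with its default options.
runDefault :: Translation cfg i o -> i -> o
runDefault t = run t (defaults t)

-- Chain two translations, keeping both configurations side by side.
andThen :: Translation c1 a b -> Translation c2 b c -> Translation (c1, c2) a c
andThen f g = Translation
  { defaults = (defaults f, defaults g)
  , run = \(c1, c2) -> run g c2 . run f c1
  }

-- Hypothetical example: the option is the minimum width for zero padding.
showPadded :: Translation Int Int String
showPadded = Translation 3 (\w n -> let s = show n in replicate (w - length s) '0' ++ s)

-- Hypothetical example: the option controls a trailing exclamation mark.
shout :: Translation Bool String String
shout = Translation True (\bang s -> if bang then s ++ "!" else s)
```

Here runDefault (showPadded `andThen` shout) 7 gives "007!". Functions such as runDefault and andThen are written once and work for every translation, which is the kind of re-use that would justify a shared interface.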

But the threshold for creating a class is not just commonality, but actual re-use. If we can't write some generic functions that will be used by all instances in the same way, what's the point of creating that interface? It needs to be a generalization with a purpose.

In other words, the main entry point to the various translators isn't the most fruitful place to generalize. The most fruitful place is the duplicate functionality, i.e. the functions that are 'essentially' the same, given the contextual specificities.

Having said that, printers definitely have that! I do have some printers (in Ocaml) that were abstracted out. To a certain extent, the ones in drasil-gool have inherited a lot of ideas from that. So there is definitely room for generalization.

@peter-michalski
Collaborator

whereby "higher level" encodings can be printed into "lower level" encodings

@balacij could you please further comment on this? I am trying to better understand "encodings" in Drasil. What exactly are "higher-level" and "lower-level" encodings in Drasil? Can you provide an example?

@JacquesCarette
Owner

The notion of higher/lower level can be quite subjective at times. It is often a proxy for high = more abstract, low = closer to the hardware. It can also be used to mean high = more abstract, low = closer to the target.

@balacij
Collaborator Author

balacij commented Jul 12, 2022

I definitely was using the wrong name (printing). Translation is much more accurate! Thank you!

In light of me reading the Czarnecki "Overview of Generative Software Development" paper again, I realize this, implicitly, was intended to capture generative domain models (a triple: input domain, mapping, and output domain). Looking at this as well, I do prefer your quadruple definition over the triple (and my quintuple definition). The quintuple definition was overthinking things, and the triple is problematic because it merges the "configuration data" with the input domain space data (which I imagine is bad in a typed world, but might be fine in a set world).

Regarding the "higher level" and "lower level" language, I think I should try to stay away from it a bit, it can be confusing and is highly subjective without very precise descriptions of "high" and "low" abstraction. "Higher" and "lower" should really only be discussed with respect to a defined line of abstraction or path on a network of domains (a graph of connected generative domain models). For example, it does not really make sense to compare Financial Annuities vs Sun Glasses as "higher" or "lower" because there is no sensible way to map them into each other at all. However, it does make sense to discuss a theoretical model (such as conservation of energy) as "high level" when compared to a specific instance of said theoretical model because they relate to each other in some way (one is a refinement of the other). Another example is that it does make sense to talk about "select scientific knowledge from the SRS documents" as "high level" compared to Java code because we can convert them into Java code (i.e., a generative domain model exists). One large caveat, however, is that "higher vs lower" makes no sense when a bijective generative domain model exists (e.g., it makes no sense to talk about "higher vs lower" when discussing things that can be converted into each other without loss of information). The "network of domains" idea, I believe, really needs to be well-defined for "higher vs lower" discussions to be unambiguous, but even then, some may argue about the way a network of domains was designed, so it will still be debatable/subjective. Since we stick to well-understood domains (i.e., scientific knowledge), I think it's a bit less subjective if you look at the design from a specific point of view/construction method, but I can be wrong.

Regarding the actual typeclass proposed, one of the 'nice' things about it is that it would allow us to gather all of our currently captured generative domain models under one nice umbrella, which we can easily grab information from and generate a network of domains graph.

I'm interested in the "duplicate functionality" you mentioned, where functions are "essentially the same." Do those translations just accidentally encode a nameless generative domain model's mapping function in them as well? In other words, are those functions places where both translations could share another, smaller generative domain model? Is this a design issue of the concepts/chunks, a functional issue with typeclasses/Haskell, or something else?

One notable problem with having one name (i.e., gen) for all possible mapping functions is that our mapping functions become type-directed. So, if we have a string of gens composed together, GHC might get confused, and we would need to place explicit type signatures for each gen so that it can resolve a proper typeclass-instance path to use.
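
The ambiguity can be reproduced with a small sketch. The Src and Mid types and both instances are hypothetical, and the unit type stands in for the configuration and the context.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

class CanGen i o cfg ctx where
  gen :: i -> cfg -> ctx -> o

-- Hypothetical two-step pipeline: Src -> Mid -> String.
newtype Src = Src Int
newtype Mid = Mid Int

instance CanGen Src Mid () () where
  gen (Src n) _ _ = Mid (n + 1)

instance CanGen Mid String () () where
  gen (Mid n) _ _ = "mid:" ++ show n

-- The (:: Mid) annotation picks the intermediate type. Without it the
-- output of the inner gen is ambiguous and GHC rejects the definition.
pipeline :: Src -> String
pipeline s = gen (gen s () () :: Mid) () ()
```

Every additional gen in a chain needs a similar annotation (or a type application) for its intermediate type, so long compositions can become noisy.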

@peter-michalski
Collaborator

peter-michalski commented Jul 12, 2022

Another example is that it does make sense to talk about "select scientific knowledge from the SRS documents" as "high level" compared to Java code because we can convert them into Java code

Drasil context aside, and considering just an SRS document and source code, could it not also sometimes be the other way around, depending on the intended target and depending on the levels of abstraction of the "input" and "output"? For example, what if the intended target, when considering the context of transformations, is the SRS document or something like it?

While that may be a dubious example, I'm making a point that I agree with

The "network of domains" idea, I believe, really needs to be well-defined for "higher vs lower" discussions to be unambiguous

Things that may seem to be unambiguous can still be ambiguous. There can often be multiple interpretations if context is not specified - I think even if it is specified, the specification sometimes just decreases the range of interpretations.

@smiths
Collaborator

smiths commented Jul 12, 2022

@peter-michalski for your example of code being higher level than the SRS, you are correct if the code generates the SRS. However, for this example, you should probably also think of the SRS as "code." That is what Drasil does. Drasil is a higher-level language that generates lower-level languages. Whether or not something is "higher" level is context-specific. We might say C code is low level, but machine code is lower level than C. The example of the SRS being higher level than Java code is about the representation of the knowledge in the SRS being more abstract than the representation in Java. If we are generating the SRS "code" then we have a representation of the knowledge that is even more abstract (general).

@balacij
Collaborator Author

balacij commented Jul 12, 2022

It's exactly as @smiths discussed, thank you!

Regarding reversing this direction, it's really complicated because we are taking a general-purpose language and trying to mechanically make it domain-specific (oddly enough, this is partially why ModelKinds was needed). However, if you can create some relation that does it, then there's nothing to really argue (it would obviously be possible, because you have a proof).

If we had a list of all of our generative domain models and graphed them into a network connecting all of them, I think we'd be able to see which things we can call "high level" and "low level" relative to other things by looking at the many paths through that network.
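
As a rough sketch of that idea, the network can be modelled as a list of directed edges, one per translation, with "higher level" read as one-way reachability. The domain names and edges below are hypothetical placeholders, not an inventory of what Drasil has.

```haskell
-- Hypothetical domain names, for illustration only.
data Domain = Theory | InstanceModel | JavaCode | HtmlDoc
  deriving (Eq, Show)

-- Each edge records that a translation from the first domain to the second exists.
type Network = [(Domain, Domain)]

example :: Network
example =
  [ (Theory, InstanceModel)
  , (InstanceModel, JavaCode)
  , (InstanceModel, HtmlDoc)
  ]

-- All domains reachable from a start domain by following one or more edges.
-- The 'seen' list guards against cycles, so this terminates on any finite network.
reachable :: Network -> Domain -> [Domain]
reachable net start = go [] [d | (s, d) <- net, s == start]
  where
    go seen [] = reverse seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise = go (x : seen) (xs ++ [d | (s, d) <- net, s == x])

-- a is "higher level" than b only if b is reachable from a and not the reverse.
higherThan :: Network -> Domain -> Domain -> Bool
higherThan net a b = b `elem` reachable net a && a `notElem` reachable net b
```

Under this reading, higherThan example Theory JavaCode is True, while JavaCode and HtmlDoc are incomparable because no path connects them. Two domains that translate into each other would also be incomparable, matching the caveat about bijective models.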

Things that may seem to be unambiguous can still be ambiguous. There can often be multiple interpretations if context is not specified - I think even if it is specified, the specification sometimes just decreases the range of interpretations.

I was assuming that the various input & output types would be "captured" and, as such, no longer up for different interpretation because we would only be looking at them through a specific lens (the mapping that defines how they are connected). For us, it will, of course, still be up for different interpretation, but I think Drasil has reproducibility down (partially thanks to Haskell's purity).

@balacij
Collaborator Author

balacij commented Jul 12, 2022

That brings up some interesting questions: what makes a generative domain model 'valid'? What properties must the input, output, mapping, and configuration knowledge obey, if any? What happens when they do/don't obey the properties?

@JacquesCarette
Owner

If we had a list of all of our generative domain models

I'd prefer to remove the word 'generative' from that. Generation is a key tool we're using, but looking at all our domain models, regardless of technology, would still be a very valuable exercise.

I'm going to provide a (non-exhaustive!!) list of 'domains' that we have models for already:

  • mathematical expressions
  • pieces of English
  • units of measure
  • theories
  • kinds of theories
  • relations that associate a "definition" to a "concept"
  • a network of relations between other domains
  • layout on a page
  • code
  • programming languages

What you'll note is that the means by which these are mapped into our implementation vary greatly. And my breakdown is probably a bit buggy (i.e. not everything might be a 'domain').

In any case, all those 'domains' represent knowledge. Our encoding of them in Drasil/Haskell corresponds to our internalization of that knowledge into a usable form, i.e. one on which we can apply transformational methods [i.e. back to the starting topic of this thread.] Now some of the above domains are 'second class' in that they cannot be subjected to transformations (ex: GOOL's encoding of PLs via finally tagless does not let one do transformations on PL definitions without resorting to Template Haskell, which is fine, we don't currently need that). Similarly, we can't add new kinds of theories or new kinds of layout without changing code. At this point, that's ok too.

Having an exhaustive list would be very useful indeed.

what makes a generative domain model 'valid'?

We take a very pragmatic approach to that: primarily, if it lets us do our job (generating softifacts) easily enough. We do know what the generated softifacts should be like. We get validity indirectly through that.

But there's indeed a second measure: a domain model should be easily explainable and rarely surprising. After the easy explanation, a listener should react with an "oh, that makes sense." Domains whose explanation doesn't pass that test are ripe for refactoring.

@balacij
Collaborator Author

balacij commented Apr 27, 2023

I rambled a bunch, but this is a lot clearer for me. I think it would be nice to capture the 'transformers' under one hat, but that's a separate issue (which I'm sure will re-occur with #3003) 😄

@balacij balacij closed this as completed Apr 27, 2023