
Dogen v1.0.19, "Impala Cine"

@mcraveiro mcraveiro released this 21 Nov 21:02
v1.0.19
aed9a60

Impala Cine

The open air cinema Impala Cine, in the city of Moçâmedes, Namibe, Angola. (C) 2019 Jornal O Record

Introduction

Whilst a long time in coming due to our return to gainful employment, Sprint 19 still managed to pack a punch, both in terms of commitment and in exciting new features. To be fair, we didn't really plan to add any of these features beforehand - instead, we found ourselves having to do so in order to progress the real work we should have been focusing on. Alas, nothing ever changes in the life and times of a software developer.

But let's not dilly-dally! Without further ado, here's the review of yet another roller-coaster of a Dogen sprint.

User visible changes

This section covers stories that affect end users, with the video providing a quick demonstration of the new features, and the sections below describing them in more detail. Note that breaking changes are annotated with ⚠️.

Sprint 1.0.19 Demo
Video 1: Click on the thumbnail to view the sprint's demo

Add support for variability overrides in Dogen

The sprint's key feature is variability overrides. It was specifically designed to allow for the overriding of model profiles. In order to understand how the feature came about, we need to revisit a fair bit of Dogen history. As you may recall, since early on, Dogen has enabled users to supply meta-data to determine what source code gets generated for each modeling element. By toggling different meta-data switches, two otherwise identical model elements can be expressed quite differently: say, one can generate hashing support whereas the other can generate serialisation.

Observing its usage, we soon realised that the toggle switches added more value when organised into "configuration sets" that modeling elements could bind against, and this idea eventually morphed into the present concept of profiles. Profiles are named configurations which provide a defaulting mechanism for individual configurations, so that these can be reused across modeling elements and, eventually, across models. That is to say, profiles stem from the very simple observation that the meta-data used for configuration is, in many cases, common to several models and therefore should be shared. In the MDE and SPLE domains, these ideas have been generalised into the field of Variability Modeling because, taken as a whole, they give you a dimension in which you can "vary" how any given modeling element is expressed; hence they are also known in Dogen as "variability modeling", as we intend to be as close as possible to domain terminology.

Dogen's profile model
Figure 1: Snippet from dogen.profiles.dia model.

Of course, like all variability information carried in Dogen models, profiles are themselves associated with models via nothing but plain old meta-data - that is, it's just configuration too. A typical Dogen model contains an entry like so:

#DOGEN masd.variability.profile=dogen.profiles.base.default_profile

The masd.variability.profile key tells Dogen to reuse the configuration defined by the profile called default_profile - an entity in the referenced model dogen.profiles (cf. Figure 1).

This approach has served us well thus far, but it carried an implicit assumption: that models are associated with only one profile. As always, reality turned out to be far messier than our simplistic views. After some thinking, we realised that we have not one but two distinct and conflicting requirements for the generation of Dogen's own models:

  • parsimony: from a production perspective, we want to generate the smallest amount of code required so that we avoid bloating our binaries with unnecessary cruft. Thus we want our profiles to be lean and mean and our builds to be fast.
  • coverage: from a development and QA perspective, we want to test all possible facets with realistic use cases so that we can validate empirically the quality of the generated code. Dogen's own models are a great sample point for this validation, and should therefore make use of as many facets as possible. In this scenario, we don't mind slow builds and big binaries if it means a higher probability of detecting incorrect code.

This dilemma was not entirely obvious at the start because we could afford to generate all facets for all models and just ignore the bloat. However, as the number of facets increased and as the number of elements in each model grew, we eventually started to run out of build time to compile all of the generated code. If, at this juncture, you are getting a strange sense of déjà vu, you are not alone. Indeed, we had experienced this very issue in the past, leading us to separate the reference models for C# and C++ from the core Dogen product in Sprint 8. But this time round the trouble is with Dogen itself, and there is nothing left to offload because there are no other obvious product boundaries like before. Interestingly, I do not blame the "short" build times offered by the free CI systems; instead, I see it as a feature, not a bug, because the limited build time has forced us to consider very carefully the impact of growth in our code base.

At any rate, as in the past with the reference models, we limped along yet again for a number of sprints, and resorted to "clever" hacks to allow these two conflicting requirements to coexist for as long as possible, such as enabling only a few facets in certain models. However, we kept increasing the amount of generated code, first with the addition of generated tests (Sprint 13) and this sprint with the relational model. The CI just took too many hits and there were no quick hacks that could fix it. As a result, the CI became less and less useful because we started to increasingly ignore build statuses. Not being able to trust your CI is a showstopper, of course, so this sprint we finally sat down to solve this problem in a somewhat general manner. We decided to have two separate builds, one for each use case: nightlies for coverage, since they run overnight and no one is waiting for them, and CI for the regular production case. And as you probably guessed by now, we needed a way to have a comprehensive profile for nightlies that generates everything but the kitchen sink, whereas for regular CI we wanted to create the aforementioned lean and mean profiles. Variability overrides was the chosen solution. From a technical standpoint, we found this approach very satisfying because it makes variability itself variable - something any geek would appreciate.

The implementation is as follows. A new command line option was added to the Processing section, named --variability-override:

Processing:
<snip>
  --variability-override arg     CSV string with a variability override. Must
                                 have the form of
                                 [MODEL_NAME,][ELEMENT_NAME,][ATTRIBUTE_NAME,]KEY,VALUE

The first three optional elements are used to bind the override to its target (i.e., [MODEL_NAME,][ELEMENT_NAME,][ATTRIBUTE_NAME,]). The binding logic is somewhat contrived:

  1. if no model is supplied, the override applies to any model, else it applies to the requested model;
  2. if no element is supplied, the override is applicable only to the model itself;
  3. if an element is supplied, the binding applies to that specific element;
  4. an attribute can only be supplied if an element is supplied. The binding will only activate if it finds a matching element and a matching attribute.

To be honest, given our use case, we only really needed the first type of binding; but since we didn't want to hard-code the functionality, we came up with the simplest possible generalisation we could think of and implemented it. There are no use cases for overrides outside of profiles, so this implementation is as good as any; as soon as we have use cases, the rules can be refined.
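To make the rules concrete, here is a minimal sketch of how such an override string could be parsed and bound. This is purely illustrative - Dogen itself is implemented in C++, and the disambiguation by token count as well as all the names below are our assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Override:
    """One --variability-override, in its parsed form."""
    key: str
    value: str
    model: Optional[str] = None
    element: Optional[str] = None
    attribute: Optional[str] = None

def parse_override(csv):
    """Parse [MODEL,][ELEMENT,][ATTRIBUTE,]KEY,VALUE by token count."""
    tokens = csv.split(",")
    if not 2 <= len(tokens) <= 5:
        raise ValueError("expected 2 to 5 fields, got %d" % len(tokens))
    *scope, key, value = tokens
    model, element, attribute = (scope + [None, None, None])[:3]
    return Override(key, value, model, element, attribute)

def binds(o, model, element=None, attribute=None):
    """Apply the four binding rules described above."""
    if o.model is not None and o.model != model:
        return False  # rule 1: a named model must match
    if o.element is None:
        return element is None  # rule 2: bind to the model itself
    if o.element != element:
        return False  # rule 3: a named element must match
    # rule 4: an attribute requires both a matching element and attribute.
    return o.attribute == attribute if o.attribute else attribute is None
```

Under this sketch, an override such as masd.variability.profile,dogen.profiles.base.test_all_facets carries no scope tokens and therefore binds at model level to every model, which is exactly the behaviour the profiles use case needs.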

Dogen uses this new command line option like so:

    if (WITH_FULL_GENERATION)
        set(profile "dogen.profiles.base.test_all_facets")
        set(DOGEN_PROCESSING_OPTIONS ${DOGEN_PROCESSING_OPTIONS}
            --variability-override masd.variability.profile,${profile})
    endif()

By supplying WITH_FULL_GENERATION to the nightlies' CMake, we then generate all facets and tests for all facets. We then build and run all of the generated code, including generated tests. Surprisingly, we did not have many issues with most generated code - with a few exceptions, which we had to ignore for now. There are also two failures which require investigation and shall be looked into next sprint. Once the change went in, the CI build times decreased dramatically and are now consistently below the time out threshold.

CDash
Figure 2: Continuous and nightly builds in CDash after the change.

One last mention goes to code coverage. We hummed and hawed a lot about the right approach for code coverage. On one side, generated tests gave us a lot of code coverage, which was very satisfying - we went from 30-40% to 80%! On the other hand, these "tests" were just validating basic functionality for Dogen types, not actual domain functionality. So, in some ways, it is misleading to use generated tests to determine overall product coverage, because they cover a different "kind" of aspect of the code. At the same time, it is very important to know the generated tests coverage because it is indicative of missing sanity checks in Dogen. We finally settled on having two different coverage reports, fed by the two different builds. This vision has not yet been fully materialised as the nightlies are not updating codecov for some reason, but it will hopefully happen in the near future.

Tracing of model dependencies

The second feature implemented this sprint is the addition of model references tracing. This work was done in the same vein as the transforms tracing (See Sprint 12 for details) and reused much of the same infrastructure; you'll get the new tracing reports for free when you enable tracing via the existing flags. As an example, Dogen uses the following configuration when we require tracing:

--tracing-enabled --tracing-level detail --tracing-format org-mode --tracing-guids-enabled

Like with transforms, we can generate three different types of tracing reports depending on the choice of --tracing-format: plain, org-mode and graphviz. plain is just a text mode representation of the references graph:

References graph in plain format
Figure 3: References graph in plain format.

The org-mode version offers the usual interactivity available to org-mode documents in Emacs such as folding, unfolding, querying and so on:

References graph in org-mode format
Figure 4: References graph in org-mode format.

Finally, as before, the graphviz output requires further processing with the dot tool before it can be visualised:

dot -Tpdf references_graph.dot -O

The resulting PDF can be opened with any PDF viewer. We find it very useful because it gives a clear indication of the "complexity" of a given model. Of course, at some point in the future, we will want to convert these visual "complexity" indicators into metrics that can be used to determine the "health" of a model, but, as always, there are just not enough hours in the day to implement all these cool features.
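To give a feel for the intermediate format, here is a small sketch that renders a references graph in Graphviz's dot notation, ready for processing with the dot tool as above. This is an illustration only - the model names and the mapping structure are made up, and Dogen's actual report generator is part of its C++ code base:

```python
def references_to_dot(references):
    """Render a model references graph as a Graphviz digraph.

    `references` maps a model name to the list of models it references.
    """
    lines = ["digraph references_graph {", "    rankdir=LR;"]
    for model, deps in sorted(references.items()):
        for dep in deps:
            lines.append('    "%s" -> "%s";' % (model, dep))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical references between models, for illustration only.
dot = references_to_dot({
    "dogen.logical": ["dogen.variability", "dogen.tracing"],
    "dogen.variability": ["dogen.tracing"],
})
print(dot)
```

Saving the output to references_graph.dot and running the dot command shown above would then produce the PDF rendering.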

References graph in graphviz format
Figure 5: References graph in graphviz format, after processing with dot tool.

Split generated tests from manual tests

As we've already mentioned, generated tests were added to Dogen in Sprint 13 as a way to sanity check all generated code. Though we did test generated code prior to this, we did so manually - read: haphazardly, as we kept forgetting to add manual tests to new types. When we originally implemented them, we thought it would be a good idea to mix-and-match generated tests with manual tests, as we do with all other facets. However, given the requirements discussed above in the variability overrides story, it was rather inconvenient to have this mixture because it meant we could not rely on the presence of the required build files.

This sprint we took the decision to split generated tests from manual tests, and it must be said, it has improved the project design a fair bit. After all, the purpose of generated tests is just to make sure Dogen generated code is working as expected, and that is largely an internal concern of Dogen developers. More work is required in this area to polish up the support for manual tests though.

Small bug fixes

Several small but important bug fixes went in with this release:

  • Meta-data keys are processed in the inverse order: A very old but rather annoying bug we had in Dogen is that meta-data keys were being processed in reverse order of entry. For example, if a model A referenced models B and C, for some unfathomable reason, Dogen would process them as C and B. This resulted in a great deal of confusion when troubleshooting because we assumed all references in log files etc. would first start with B, not C. This release fixes the bug but, as a result, a lot of the generated code will move around. It should be semantically equivalent, just in a different order. ⚠️
  • Tracer numbering of dumped models is incorrect: for some reason our trace files were skipping numbers (e.g. 000 then 002, and so forth). This was very distracting when trying to analyse a problem. In addition, the previous logic of numbering the traces after a transform was abandoned; instead of having 000 for both the input and output of a transform, we now have 000 and 001. It was a nice thought but required too much complexity to implement.
  • Creating reference cycles produces strange errors: In the past, adding a reference cycle in a model resulted in very puzzling errors, entirely unconnected to the problem at hand. With this release we now correctly detect cycles and refuse to generate code. We do not yet have use cases for models with cycles, so for now we just took the brute force approach. Note that we also check for references to the model itself - a typo that in the past resulted in long investigations. It is now correctly detected and reported to the user.
  • Error on duplicate references: Similarly to cycles, adding the same reference more than once is now considered a bug and it is detected and reported to the user. In the past, we used to silently ignore these. The main reason is that duplicates normally happen as a result of copy and paste bugs, so it's best to inform users immediately. ⚠️
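To make the reference checks concrete, here is a minimal sketch of the kind of validation involved - cycle and self-reference detection via depth-first search, plus duplicate detection. This is an illustration only, not Dogen's actual C++ implementation, and all the names are made up:

```python
def find_cycle(references):
    """Return a list of models forming a reference cycle, or None.

    `references` maps a model to the models it references. A model
    referencing itself is the shortest possible cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, in progress, done
    colour = {}
    stack = []

    def visit(model):
        colour[model] = GREY
        stack.append(model)
        for dep in references.get(model, []):
            if colour.get(dep, WHITE) == GREY:
                # Back edge: everything from dep onwards is a cycle.
                return stack[stack.index(dep):] + [dep]
            if colour.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[model] = BLACK
        return None

    for model in list(references):
        if colour.get(model, WHITE) == WHITE:
            cycle = visit(model)
            if cycle:
                return cycle
    return None

def duplicate_references(refs):
    """Return references that appear more than once, in order found."""
    seen, dupes = set(), []
    for r in refs:
        if r in seen:
            dupes.append(r)
        seen.add(r)
    return dupes
```

With such checks in place, a model referencing itself or participating in a longer cycle can be reported with the offending path, and duplicated references can be flagged rather than silently deduplicated.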

Deprecations

"Master headers" were a feature of Dogen which we haven't really used all that much. It enabled you to have a single include file for all files in a facet - e.g. a serialisation include, or a hashing include. These were used in the past when we had manual tests for the generated code, just to save us the effort of manually including a whole load of files. With the arrival of generated tests in Sprint 13, the feature was no longer used within Dogen. In addition, these days most C++ developers consider these "master includes" as anti-patterns, and a violation of "pay for what you use" because you invariably end up including more files than you need. Due to all of this we removed the feature from Dogen. ⚠️

Development Matters

In this section we cover topics that are mainly of interest if you follow Dogen development, such as details on internal stories that consumed significant resources, important events, etc. As usual, for all the gory details of the work carried out this sprint, see the sprint log.

Milestones

This is the 100th release of Dogen made from GitHub. Overall, it's the 120th release, but we had a private repo for the first 20 releases and the tags were lost in translation somewhere.

100th release
Figure 6: 100th release of Dogen from GitHub.

Significant Internal Stories

Given that most stories had a user-facing impact, this sprint is short on internal stories. There are a couple that are worth a mention though.

Updating Boost Version

We've started yet another of those mammoth efforts of trying to update all of our dependencies to use the latest version of Boost. It would be fairer to call this story "updating of toolchains across the estate" since it more or less involves that kind of effort. Now that we are on vcpkg, this should be a straightforward task, but in practice it never is. The main problems are OSX and Windows, two operating systems that somehow seem to always cause weird and wonderful problems. Predictably, we completed the work for Linux, did some of it for Windows and pretty much none of it for OSX. At present, our local setup on OSX is well and truly borked and we just do not have enough cycles to work on fixing it, so the story will remain parked for the foreseeable future.

Implementing the relational model

We had great ambitions this sprint of implementing a relational model for tracing that would enable us to write complex queries to diagnose problems across the Dogen pipeline. We did do quite a lot of work on this, but it was entirely overshadowed by the other problems we had to solve. We won't spend too much time on this feature in these notes, waiting instead for its completion.

Recording of coding sessions

Since we started Dogen all those years ago, we've been searching for "motivational tools" that enable us to continue working on such a long term endeavour without losing the initial hunger. A few successful tools have been incorporated in this way:

  • blog posts narrating particular aspects of Dogen development - e.g. Nerd Food: The Refactoring Quagmire.
  • agile management of sprints using org-mode, giving us a fine grained view of the activity on a sprint - e.g. sprint backlog and a highly curated product backlog. For the importance of curation, see Nerd Food: On Product Backlog.
  • creation of release notes at the end of every sprint as a way to reflect on what was achieved - the document you are reading.
  • creation of demos to visualise the features implemented.

Such tools are rare to come by and, of those we find, most get discarded because they do not fit our way of working. This sprint we found yet another "motivational tool": the recording of coding sessions as YouTube videos. This idea was completely inspired by Andreas Kling's channel, which we highly recommend to anyone who likes programming and C++ in particular. For our "channel", we decided to create a playlist narrating much of the coding that happened this sprint: MASD - Dogen Coding: Relational Model for Tracing. To be sure, with 13 episodes and over 10 hours of video, this playlist is for the true die-hard fan of Dogen. But the most important aspect from our perspective was that the recording of videos had a positive impact:

  • it forces you to think about what you're doing, just as when you are pair programming;
  • it impelled us to work on days when perhaps we wouldn't have. This may be the novelty factor of seeing oneself on YouTube, of course, but it certainly worked for this sprint. We even managed to get one subscriber and one comment, which was rather surprising.

The one downside is that it is very difficult to focus on complex tasks whilst talking and recording. It is thus no silver bullet, but certainly a useful weapon in the arsenal. We shall continue recording videos next sprint. You can watch the first video of the playlist here, and it is mercifully only 10 minutes long:

MASD - Dogen Coding: Relational Model for Tracing - Part 1
Video 2: Click on the thumbnail to view the first part of the coding playlist

Resourcing

This sprint was marked by the return to "part-time" development on Dogen. After a cadence of eight successful 2-week sprints, it was rather difficult to adjust back to the long, drawn-out process of cobbling together a release from whatever spare time one can find. As you may recall, the target for a "part-time" sprint is to clock around 80 hours worth of work over a rather unpredictable period of time. To be fair, most of Dogen has been developed in this fashion, but it is just not ideal for programming. This is because part-time sprints naturally lend themselves to more fragmented work, given both the typically short-duration time slots available, and the fact that most of these are of rather dubious quality. The 22:00 slot comes particularly to mind - also fondly known as the graveyard shift. Whilst there are advantages to some resource starvation - described at length in Nerd Food: Dogen: Lessons in Incremental Coding - it is also undoubtedly true that it is much harder to focus on complex tasks that require loading a lot of state into the brain. Nonetheless, "you go to war with the army you have, not the army you might want or wish to have at a later time", and excuses do not write code, so one must make the most of the prevailing conditions.

To be fair, not all was gloom and doom with Sprint 19, and much was achieved. Let's review how the resourcing (~87 hours) was distributed across stories. At 11.5% of the ask, upgrading to Boost 1.70 was the biggest story this sprint, closely followed by the work on the relational model (11%). Several stories hovered around the 6-7% mark, in particular the splitting of generated tests from manual tests (6.7%), the far-out thought experiments on org-mode as a carrier format for modeling (6.5% - we clearly got carried away here), and the improvements around checking for reference cycles (6.4%). Very much hidden in the list of stories is what we'd consider the "target" story - moving registrar into assets (6.3%) - but it was blocked because we are having some hard-to-debug issues with it, and require the support of the relational model to proceed. At 6% we have the meta-data overrides support, followed by a long tail of smaller stories - all the way from 5.7% for creating the modeling reports in tracing to a minuscule 0.1% for upgrading to Clang 9 and GCC 9. The sprint clearly demonstrates the impact of moving to part-time work, as expected. Finally, an important mention goes to the almost 16% spent in process related activities (backlog grooming, release notes, video editing for demo and coding sessions), down from 19% in the previous sprint. This is rather unexpected given that we've spent a lot of time recording the coding sessions this sprint, and implies they are very low overhead.

Story Pie Chart

Roadmap

We've renamed the "Planning" section to "Roadmap" because it more adequately reflects its role: we are not actually forecasting, merely keeping track of outstanding activities and making some very weak correlations between them and a potential end date. The roadmap was clearly affected by the move to part-time, and looks more or less as it did last sprint - just projected forwards in time. We also haven't quite figured out how to take "part-time" into account in TaskJuggler, so the "estimates" are extremely optimistic. This is something to fix next sprint, hopefully.

Project Plan

Resource Allocation Graph

Next Sprint

The main focus next sprint is going to be to wrap things up with the relational model and to use it to diagnose problems when moving elements from generation to assets.

Binaries

You can download binaries from Bintray for OSX, Linux and Windows (all 64-bit):

Note: The OSX and Linux binaries are not stripped at present and so are larger than they should be. We have an outstanding story to address this issue, but sadly CMake does not make this trivial.

For all other architectures and/or operating systems, you will need to build Dogen from source. Source downloads are available below.

Happy Modeling!