
Dogen v1.0.20, "Oasis do Arco"

mcraveiro released this 23 Jan 00:43 · v1.0.20 · 439aa7b

Oasis do Arco
Arco Oasis, Namibe, Moçamedes, Angola. (C) 2011 Paulo César Santos

Introduction

New year, new Dogen sprint! At around two months of elapsed time for 83 hours' worth of commitment, this was yet another long, drawn-out affair, and the festive period most certainly did not help matters. Having said that, the sprint was reasonably focused on the mission at hand: making the relational model just about usable. In doing so, it provided its fair share of highs and lows, and taught us a great many lessons - more than we ever wished for. Ah, the joys, the joys. But onwards we march!

User visible changes

This section covers stories that affect end users, with the video providing a quick demonstration of the new features, and the sections below describing them in more detail. There were only a few small features this sprint, and there are no breaking changes.

Sprint 1.0.20 Demo
Video 1: Sprint 20 Demo.

Add ODB type overrides to primitives

ORM type overrides had not been used in anger until the relational model was introduced (see below) and, as a result, we had not noticed any problems with their implementation. Because the relational model makes heavy use of JSONB, we quickly spotted an issue when declaring type overrides inline with the column (i.e., at the attribute level):

#DOGEN masd.orm.type_override=postgresql,JSONB

According to the ODB manual, this incantation is not sufficient to cope with conversion functions and other more complex uses. And so, with this sprint, type mapping was updated to take advantage of ODB's flexibility. You can now define type mappings at the element level:

#DOGEN masd.orm.type_override=postgresql,JSONB
#DOGEN masd.orm.type_mapping=postgresql,JSONB,TEXT,to_jsonb((?)::jsonb),from_jsonb((?))
#DOGEN masd.orm.type_mapping=sqlite,JSON_TEXT,TEXT,json((?))
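
The fields mirror ODB's custom type mapping machinery: the target database, the database type (JSONB), the interface type used to access it (TEXT) and, optionally, the to and from conversion expressions, with (?) standing for the value being converted. As a rough sketch of what the postgresql expressions mean, one can substitute a literal for the placeholder directly in psql; note that from_jsonb is not a Postgres built-in, so it is presumably supplied alongside the generated schema:

=> select to_jsonb('{"answer": 42}'::text::jsonb);
    to_jsonb
-----------------
 {"answer": 42}
(1 row)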

You can then make use of it at the attribute level, as before. An even better approach is to define a masd::primitive for the type, which takes care of this for you and generates code like so:

#pragma db member(json::value_) column("") pgsql:type("JSONB")

For example uses of JSONB, please see the discussion on the relational model in the Significant Internal Stories section below.

Allow outputting the model's SHA1 hash in decoration

The decoration marker has been expanded to allow recording the SHA1 hash of the target model. This is intended as a simple way to keep track of which model was used to generate the source code. In order to switch it on, simply add add_origin_sha1_hash to the generation marker:

Decoration marker
Figure 1: Sample decoration marker, obtained from the C++ Reference Model.

The generated code will then contain the SHA1 hash:

/* -*- mode: c++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-
 *
 * This is a code-generated file.
 *
 * Model SHA1 hash: be42bdb7f246ad4040f17dbcc953222492e1a3bf
 * WARNING: do not edit this file manually.
 * Generated by MASD Dogen v1.0.20

Sadly, the SHA1 hash does not match the git hash; however, one can easily use sha1sum to compute the hash manually:

$ sha1sum cpp_ref_impl.lam_model.dia
be42bdb7f246ad4040f17dbcc953222492e1a3bf  cpp_ref_impl.lam_model.dia

Before we move on, there are a couple of points worthy of note with regard to this feature. First and foremost, please heed the following warning:

⚠️ Important: Remember that SHA1 hashes in Dogen are NOT a security measure; they exist only for informational purposes.

Secondly, as we mentioned in the past, features such as these (e.g. date/time, Dogen version, SHA1 hash, etc.) should be used with caution since they may cause unnecessary changes to generated code and thus trigger expensive rebuilds. As such, we recommend giving the matter careful consideration before enabling them.

Improvements in generation timestamps

For the longest time, Dogen has allowed users to stamp each file it generates with a generation timestamp. This is enabled via the parameter add_date_time, which is part of the generation marker meta-element; for an example of this meta-element see the screenshot above, where it is disabled.

When enabled, a typical output looks like so:

/* -*- mode: c++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-
 *
 * This is a code-generated file.
 *
 * Generation timestamp: 2020-01-22T08:29:41
 * WARNING: do not edit this file manually.
 * Generated by MASD Dogen v1.0.21
 *

In this sprint we made some minor improvements around the sourcing of this timestamp. Previously, we obtained it individually for each and every generated file, resulting in a (possibly) moving timestamp across a model generation. With this release, the timestamp for a given activity - e.g. conversion, generation, etc. - is obtained once upfront and reused by all who require it. Not only is this approach more performant, it also yields a better outcome: users are not particularly interested in the precise second at which any given file was generated; they care more about knowing when the model as a whole was generated.

In addition, we decided to allow users to control this timestamp externally. The main rationale for this was unit testing, where having a moving timestamp with each test run was just asking for trouble. While we were at it, we also deemed it sensible to allow users to override this timestamp if, for whatever reason, they need to. Now, lest you start to think we are enabling "tampering", we repeat the previous warning:

⚠️ Important: Remember that generation timestamps in Dogen are NOT a security measure; they exist only for informational purposes.

With that disclaimer firmly in hand, let's see how one can override the generation timestamp. A new command line argument was introduced:

Processing:
<SNIP>
  --activity-timestamp arg       Override the NOW value used for the activity
                                 timestamp. Format: %Y-%m-%dT%H:%M:%S

For instance, to change the generation timestamp of the example above, one could pass --activity-timestamp 2020-02-01T01:01:01, obtaining the following output:

/* -*- mode: c++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-
 *
 * This is a code-generated file.
 *
 * Generation timestamp: 2020-02-01T01:01:01
 * WARNING: do not edit this file manually.
 * Generated by MASD Dogen v1.0.21

Clearly, this is more of a troubleshooting feature than anything else, but it may prove to be useful.

Development Matters

In this section we cover topics that are mainly of interest if you follow Dogen development, such as details on internal stories that consumed significant resources, important events, etc. As usual, for all the gory details of the work carried out this sprint, see the sprint log.

Milestones

The 9999th commit was made to Dogen this sprint.

9999th commit
Figure 2: GitHub repo at the 9999th commit.

Significant Internal Stories

The sprint was mostly dominated by one internal story, which this section describes in detail.

Add relational tracing support

This sprint brought to a close work on the relational model. It was the culmination of a multi-sprint effort that required some significant changes to the core of Dogen - particularly to the tracing subsystem, as well as to ORM. The hard-core Dogen fan may be interested in a series of videos which captured the design and development of this feature:

MASD - Dogen Coding: Relational Model for Tracing - Part 1
Video 2: Playlist "MASD - Dogen Coding: Relational Model for Tracing".

The (rather long) series of videos reached its "climax" in sprint 21, but (spoiler alert) the "TL;DR" is that it is now possible to dump all information produced by a Dogen run into a relational database. This includes both tracing data as well as all logging, at the user-chosen log level. It is important to note that a full run in this manner is slow: dumping all of Dogen's models (18, at the present count) can take the best part of an hour. Interestingly, the majority of the cost comes from dumping the log at debug level; a dump with just tracing information takes less than 10 minutes, making it reasonably useful. Regardless of the wait, once the data is in the database, the full power of SQL and Postgres can be harnessed.

Implementation-wise, we decided to take the path of least resistance and create a small number of tables, code-generated by Dogen and ODB:

musseque=> \dt
            List of relations
 Schema |      Name       | Type  | Owner 
--------+-----------------+-------+-------
 DOGEN  | LOG_EVENT       | table | build
 DOGEN  | RUN_EVENT       | table | build
 DOGEN  | TRANSFORM_EVENT | table | build
(3 rows)

Models and other complex data types are stored in JSONB fields, e.g.:

musseque=> \dS "RUN_EVENT"
                            Table "DOGEN.RUN_EVENT"
     Column     |            Type             | Collation | Nullable | Default 
----------------+-----------------------------+-----------+----------+---------
 TIMESTAMP      | timestamp without time zone |           |          | 
 RUN_ID         | text                        |           | not null | 
 EVENT_TYPE     | integer                     |           | not null | 
 VERSION        | text                        |           | not null | 
 PAYLOAD        | jsonb                       |           | not null | 
 ACTIVITY       | text                        |           | not null | 
 LOGGING_IMPACT | text                        |           | not null | 
 TRACING_IMPACT | text                        |           | not null | 
Indexes:
    "RUN_EVENT_pkey" PRIMARY KEY, btree ("RUN_ID", "EVENT_TYPE")

Though by no means trivial, this approach required fewer changes to Dogen itself, pushing the complexity instead to the queries over the generated dataset. This seemed like a worthwhile trade-off at the time, because normalising a Dogen model in code was a non-trivial exercise. Nonetheless, as we soon found out, writing queries over complex JSON documents spread across multiple rows is not an entirely trivial exercise either. As an example, the following query returns the objects in a Dia diagram:

create or replace function classes_in_diagram(in p_transform_instance_id text)
    returns table("ID" text, "NAME" text)
as $$
    select "ID", substring(attrs."ATTRIBUTES"->'values'->0->'data'->>'value', 2,
            length(attrs."ATTRIBUTES"->'values'->0->'data'->>'value') - 2
        ) "NAME"
    from (
        select
            objects."OBJECT"->>'id' "ID",
            objects."OBJECT"->>'type' "TYPE",
            jsonb_array_elements(objects."OBJECT"->'attributes') "ATTRIBUTES"
            from (
                select * from dia_objects_in_diagram(p_transform_instance_id)
            ) as objects
     ) as attrs
     where
         attrs."ATTRIBUTES"->>'name' like 'name' and "TYPE" like 'UML - Class';
$$ language 'sql';
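
Calling it is then a one-liner; the argument identifies the transform instance whose trace we wish to inspect (the UUID below is the same one used in the next example):

=> select * from classes_in_diagram('8ce7069e-6261-4f9f-b701-814bed17cafb');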

A similar function from the same collection, dia_objects_names_and_stereotypes, combines the object names with their stereotypes:

=> select * from dia_objects_names_and_stereotypes('8ce7069e-6261-4f9f-b701-814bed17cafb');
 ID  |    NAME     |        STEREOTYPES
-----+-------------+----------------------------
 O1  | cpp         | masd::decoration::modeline
 O2  | cs          | masd::decoration::modeline
 O3  | cmake       | masd::decoration::modeline
 O4  | odb         | masd::decoration::modeline
 O5  | xml         | masd::decoration::modeline
 O7  | xml         | masd::decoration::modeline
 O8  | odb         | masd::decoration::modeline
 O9  | cmake       | masd::decoration::modeline
 O10 | cs          | masd::decoration::modeline
 O11 | cpp         | masd::decoration::modeline
 O13 | apache_v2_0 | masd::decoration::licence
 O14 | bsl_v1_0    | masd::decoration::licence
 O15 | gpl_v2      | masd::decoration::licence
 O16 | gpl_v3      | masd::decoration::licence
 O17 | proprietary | masd::decoration::licence
 O18 | sln         | masd::decoration::modeline
 O19 | sln         | masd::decoration::modeline

A library of assorted functions was assembled this way (see functions.sql), and proved useful enough to track down the problem at hand, which was to figure out why the new meta-element registrar was not being generated. In addition, the expectation is that, over time, more and more powerful queries will be written, allowing us to better exploit the available information. However, it must be said that the complexity of writing JSONB queries is much higher than anticipated, and as such, the feature is not quite as useful as we had envisioned. With a bit of luck, next sprint we shall produce a blog post narrating the saga and its somewhat surprising conclusions in more detail.

Resourcing

Now that we have moved to part-time sprints, looking only at the overall commitment makes less sense; after all, by definition, one is guaranteed to have around 80 hours of work on a sprint. Whilst pondering on this matter, another interesting measure popped up on our radar: the utilisation rate - though, perhaps, that is not yet its final name. The utilisation rate is computed as the number of days in a full-time sprint (e.g., 14) divided by the total number of days elapsed since the previous sprint; it measures how "expensive" a day of work is in terms of elapsed days. A high utilisation rate is good, and a low one is bad; on a good sprint we are aiming for close to 50%. In this particular sprint our utilisation rate was around 23% - that is, 14 days of work spread over roughly two months of elapsed time. Since the previous sprint involved a long stretch where we were not doing any work at all, we do not have any comparative figures, but we'll keep tracking this number from now on and hopefully it will become a useful indicator.

In terms of our more traditional measurements, the sprint was rather well behaved, as the chart demonstrates:

Story Pie Chart
Figure 3: Cost of stories for sprint 20.

Some 45% of the total committed time was taken by the relational model and related activities; even diversions such as the SHA1 hashes (6.8%) and the improvements to generation timestamps (2.3%) were byproducts of this work. In terms of process, this was an expensive sprint: whilst the demo was cheap (3%), the release notes were very expensive (13.7%) and so was backlog grooming (5.7%), resulting in an overall figure of 22.4% for process - one of the most costly sprints in this department. Part of this is related to the amount of "uncoordinated" work that had been carried out previously, which was difficult to describe in a manner suitable for the release notes (remember that the demo and release notes describe the work of the previous sprint, i.e. sprint 19 in this case). All in all, for a part-time sprint, it was a rather successful one, though we are clearly aiming for a higher utilisation rate on the next.

Roadmap

We still haven't quite managed to get the roadmap to work for us, but it seems to provide some kind of visual indication of just how long the road ahead is, so we're keeping it for now. However, for it to become truly useful in our current process, it requires some more tuning. Perhaps some time spent learning TaskJuggler is in order...

Project Plan

Resource Allocation Graph

Next Sprint

Now that the relational model is out of the way, we resume our focus on meta-model entities and the fabric clean-up. We are hoping to get one or two of these entities out of the way by sprint's end.

Binaries

You can download binaries from Bintray for OSX, Linux and Windows (all 64-bit).

Note: The OSX and Linux binaries are not stripped at present and so are larger than they should be. We have an outstanding story to address this issue, but sadly CMake does not make this trivial.

For all other architectures and/or operating systems, you will need to build Dogen from source. Source downloads are available below.

Happy Modeling!