[Question] Domain Design of Collaboration System in Epilogue #305

gregbrowndev opened this issue May 5, 2020 · 7 comments

gregbrowndev commented May 5, 2020

Hi there,

First, I want to thank you for your book. It has been a tremendous help in my journey so far in learning DDD. I am beginning to apply what I've learned to a Django project that has caused me much pain over the last 2 years. Many of the larger concepts have started to click into place, which feels amazing. However, I am struggling with the basic domain modelling of my problem.

The only reason I ask for such specific help is that my problem is similar to the document collaboration system that you talk about in the epilogue.

The app allows transport authorities to publish different types of datasets via basic file uploads. Each dataset type may have different rules and processes, e.g. validation. However, the part that I'm having the most difficulty with is the generic content-management functionality that you would find in any off-the-shelf CMS.

In brief, the publisher creates a 'draft' version of their dataset which must be submitted, validated and then reviewed before it can be published. And, to edit a published dataset, a new draft must be submitted, reviewed, etc. There can only be a single draft for review at a time, and the user(s) can edit the draft any number of times (though note: real-time collaboration isn't the purpose of the app). Once a version has been published, it cannot be modified.

The challenge I'm having is in defining what the aggregate should look like, where the responsibility should lie for how the client creates and edits the revisions, and how to make the root responsible for these invariants.

The code snippet below defines a 'Publication' aggregate. Part of the reason I'm not 100% confident is that, as it states in the Blue Book (in the chapter about factories):

If the client is expected to assemble the domain objects it needs, it must know something about the internal structure of the object.

but it also says:

A Factory Method on the aggregate root encapsulates expansion of the aggregate.

so I can see that the start_revision method is like a factory method to expand the aggregate, but it still feels quite awkward that the client has to pass the revision objects back and forth and thus be aware of the Publication's internal structure.

Do you have any tips on how I can improve my approach to this design problem? Or how you dealt with document version tracking in the project that you worked on? Thank you for any help you may offer.

import datetime
from typing import Generic, List, NewType, Optional, TypeVar

from pydantic import BaseModel
from pydantic.generics import GenericModel

T = TypeVar("T", bound="Revision")
PublicationId = NewType("PublicationId", int)
OrganisationId = NewType("OrganisationId", int)  # defined elsewhere in the real code
UserId = NewType("UserId", int)  # defined elsewhere in the real code

class Publication(GenericModel, Generic[T]):
    id: PublicationId
    organisation: OrganisationId

    live: Optional[T]
    draft: Optional[T]
    revisions: List[T]

    # start_revision() -> T       -- return (partial) copy of `live` or empty T object
    # save_revision(revision: T)  -- assigns revision to `draft`
    # publish_revision()          -- assuming valid, assign `draft` to `live` and append to `revisions`


class Revision(BaseModel):  # can be extended for different types of datasets
    publication_id: PublicationId
    created_at: datetime.datetime

    name: str
    description: str
    comment: str

    published_at: Optional[datetime.datetime] = None
    published_by: Optional[UserId] = None

    def can_edit(self):
        return self.published_at is None
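
To make the intent concrete, here is roughly how I picture those three methods enforcing the rules above (only one draft in flight at a time, published revisions immutable). This continues the snippet above; the exception classes are placeholders I've made up for the sketch, not names from the real code:

class DraftAlreadyExistsError(Exception):
    pass

class RevisionAlreadyPublishedError(Exception):
    pass

class NoDraftError(Exception):
    pass


class Publication(GenericModel, Generic[T]):
    id: PublicationId
    organisation: OrganisationId

    live: Optional[T]
    draft: Optional[T]
    revisions: List[T]

    def start_revision(self) -> T:
        # Invariant: only one draft may be in flight at a time
        if self.draft is not None:
            raise DraftAlreadyExistsError(self.id)
        if self.live is None:
            # Building an "empty" first revision depends on the concrete
            # Revision type, so it is elided in this sketch
            raise NotImplementedError
        # Start from a copy of the live revision, cleared of publication metadata
        self.draft = self.live.copy(
            deep=True,
            update={
                "created_at": datetime.datetime.utcnow(),
                "published_at": None,
                "published_by": None,
            },
        )
        return self.draft

    def save_revision(self, revision: T) -> None:
        # Invariant: published revisions are immutable
        if not revision.can_edit():
            raise RevisionAlreadyPublishedError(revision.publication_id)
        self.draft = revision

    def publish_revision(self, published_by: UserId) -> None:
        if self.draft is None:
            raise NoDraftError(self.id)
        self.draft.published_at = datetime.datetime.utcnow()
        self.draft.published_by = published_by
        self.live = self.draft
        self.revisions.append(self.draft)
        self.draft = None
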
bobthemighty (Contributor) commented

Hi @gregbrowndev. It seems to me that the Publication here is acting more like a repository for Revisions. What's the data required for creating a new Revision? If the data are simple enough then this might be appropriate, but usually I think you'd want to limit the arguments on Aggregate Root methods to primitives and value objects.

An alternative is for the Publication to expose methods for mutating state.

class Publication:

    def submit_comment(...):
        """
        Append a new comment to the current draft
        """

    def submit_draft(self, name: str, description: str):
        """
        Create a new draft of the dataset
        """

    def publish_current_draft(...):
        """
        Publish the current draft
        """


Lastly, it's worth noting that not everything has to be written this way. If the draft approval and revision process is truly generic, then you might as well just use Django CRUD and keep it simple. Save the fancy modelling for the parts of your system that make your domain special and complex.

gregbrowndev (Author) commented

Hi @bobthemighty,

I greatly appreciate your reply.

It seems to me that the Publication here is acting more like a repository for Revisions.

Publication does act as a root for the set of Revisions in that it has a global identifier that other, non-publisher users use to find the live (most recent) revision (they are not aware of the draft but can see the revision history). This ID doesn't change when new revisions are published. Do you think it is still better suited as a repository?

What's the data required for creating a new Revision? If the data are simple enough then this might be appropriate

The data that the publishers submit is fairly simple; it is just a handful of fields/metadata describing their upload, e.g.:

class TimetableDataset(Revision):
    url: str
    filename: str
    admin_areas_names: List[str]
    publisher_creation_datetime: Optional[datetime.datetime]
    first_expiring_service: Optional[datetime.datetime]

class GPSFeedDataset(Revision):
    url: str
    username: str
    password: str
    requestor_ref: str
    is_capability_request: Optional[bool]

class FaresDataset(Revision):
    ...

class DisruptionDataset(Revision):
    ...

The main problem that we're trying to get away from is that this class hierarchy is currently all stored in the same Django model via single table inheritance. Because it has grown organically this way alongside the revision functionality, it's now virtually impossible to change, as all the views and test code are highly coupled to this one model. And, as it isn't even the main queryable root model, this precludes the possibility of using something like Django's proxy models to easily separate out data and behaviours.

We basically have the same problem that you describe in the epilogue, where our domain model is inexpressive as it is forced to mirror the data model. In Django dot syntax it looks like:

organisation.dataset.versions.is_published()[0].published_by.account.settings[0]  # or
organisation.dataset.is_timetable().versions.is_draft().timetable_report.warnings  # etc.

usually I think you'd want to limit the arguments on Aggregate Root methods to primitives and value objects.

If I were to treat the different Revision subclasses as Value Objects, or perhaps have the Revision compose the Dataset (since there are fields on the Revision the users shouldn't change, e.g. published_by) and treat the Dataset as a Value Object, then would it be OK to pass those in/out of the aggregate?
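
To make that concrete, this is roughly the composition I have in mind, simplified to a single dataset type and reusing the type aliases from my first snippet (the save_draft call at the bottom is a hypothetical method name):

class TimetableDataset(BaseModel):
    # Value Object: only the fields publishers are allowed to supply/change
    url: str
    filename: str
    admin_areas_names: List[str]

    class Config:
        allow_mutation = False  # value objects are immutable


class Revision(BaseModel):
    # Entity controlled by the aggregate; publishers never set these fields directly
    publication_id: PublicationId
    created_at: datetime.datetime
    dataset: TimetableDataset
    published_at: Optional[datetime.datetime] = None
    published_by: Optional[UserId] = None


# The client would then pass the Value Object in, rather than the Revision, e.g.
# publication.save_draft(TimetableDataset(url="...", filename="...", admin_areas_names=[]))
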

Again, I am grateful for any help.

simkimsia commented

Hey @gregbrowndev 👋 how are you?

I'm interested in this topic as I have something similar. But first a clarifying question.

You wrote

this class hierarchy is currently all stored in the same Django model via single table inheritance.

Do you mean:

  • the concrete table inheritance pattern, where there's a table for revision and each revision can be a GPSFeedDataset, TimetableDataset, etc.; the common fields are in the revision table, and the fields unique to each individual model are in separate child tables, or

  • a giant table that contains all the columns for all the various types of revision, so there will be null data for certain fields depending on the individual row?

A rough sketch of the two layouts follows.
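
In rough, entirely made-up Django terms, the two layouts would be something like this (model and field names are illustrative, not taken from your project):

from django.db import models

# Layout 1: common fields on a parent table, type-specific fields on child
# tables (Django's multi-table inheritance creates the child tables and links)
class Revision(models.Model):
    name = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)

class TimetableDataset(Revision):   # extra table joined one-to-one to revision
    url = models.URLField()

class GPSFeedDataset(Revision):     # another extra table joined to revision
    requestor_ref = models.CharField(max_length=255)


# Layout 2: one wide table holding every column for every dataset type,
# so each row carries NULLs for the fields that don't apply to its type
class DatasetRevision(models.Model):
    kind = models.CharField(max_length=32)   # 'timetable', 'gpsfeed', ...
    name = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)
    url = models.URLField(null=True, blank=True)                              # timetable-only
    requestor_ref = models.CharField(max_length=255, null=True, blank=True)   # gpsfeed-only
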

gregbrowndev (Author) commented

Hey @simkimsia,

I'm well thanks. Hope you are too.

(And to note, I don't mean to follow you around 🙈)

To answer your question, yes it is the second one! The two tables in particular in the DB are Dataset and DatasetRevision. The DatasetRevision table contains all of the columns for all types of datasets (we only started with one type).

Ignoring the fact that the table is sparse and has poor data integrity, it wouldn't be that much of an issue if we could get it back into a nice structure. Part of the reason we ended up with this was to avoid adding even more levels of traversal to get to the important data (i.e. dataset.revisions[0].timetable..., dataset.revisions[0].gps..., etc.), not to mention the fact that the type of dataset shouldn't change between versions...

SQLAlchemy has support for Single Table Inheritance that I'm considering for my mapper layer. (It is possibly a bit bonkers using Django ORM and SQLAlchemy as a mapper side-by-side, but anyhow...)
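
For reference, the kind of mapping I was looking at is roughly the following (declarative style for brevity, with made-up table and column names):

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class RevisionRow(Base):
    __tablename__ = "dataset_revision"

    id = Column(Integer, primary_key=True)
    dataset_type = Column(String(50))  # discriminator column
    name = Column(String(255))
    created_at = Column(DateTime)

    __mapper_args__ = {
        "polymorphic_on": dataset_type,
        "polymorphic_identity": "revision",
    }


class TimetableRevisionRow(RevisionRow):
    # No __tablename__ here, so these columns land in the same dataset_revision table
    url = Column(String(2000), nullable=True)
    filename = Column(String(255), nullable=True)

    __mapper_args__ = {"polymorphic_identity": "timetable"}


class GPSFeedRevisionRow(RevisionRow):
    requestor_ref = Column(String(255), nullable=True)

    __mapper_args__ = {"polymorphic_identity": "gpsfeed"}
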

gregbrowndev (Author) commented

BTW @simkimsia, I tried out the classical data mapper approach with SQLAlchemy alongside my Django app. It definitely would work quite elegantly, but it seemed to have compatibility issues with mapping onto the Pydantic models. It probably wouldn't be too much of a loss to remove Pydantic; as Appendix E says, the validation can be done at the edges. However, SQLAlchemy manages its own DB session, of course, and completely bypasses Django's ORM, so I don't know how people would feel about that at work. So I've decided to put it aside and implement the mappers imperatively with the Django ORM.
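
The shape I have in mind for the imperative mappers is roughly the sketch below; the Django model and field names (Dataset, revisions, published_by_id, etc.) are placeholders rather than the real ones:

from myapp import models as orm  # hypothetical Django app; model names are placeholders


def revision_to_domain(row) -> Revision:
    # Translate one Django row into a domain Revision
    return Revision(
        publication_id=PublicationId(row.dataset_id),
        created_at=row.created_at,
        name=row.name,
        description=row.description,
        comment=row.comment,
        published_at=row.published_at,
        published_by=UserId(row.published_by_id) if row.published_by_id else None,
    )


class DjangoPublicationRepository:
    def get(self, publication_id: PublicationId) -> "Publication[Revision]":
        row = orm.Dataset.objects.get(pk=publication_id)
        revisions = [revision_to_domain(r) for r in row.revisions.all()]
        published = [r for r in revisions if r.published_at is not None]
        drafts = [r for r in revisions if r.published_at is None]
        return Publication[Revision](
            id=PublicationId(row.id),
            organisation=OrganisationId(row.organisation_id),
            live=max(published, key=lambda r: r.published_at) if published else None,
            draft=drafts[0] if drafts else None,
            revisions=published,
        )

    # add()/update() would walk the aggregate the other way, mapping the domain
    # objects back onto the Django models and saving them
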

simkimsia commented May 7, 2020

(And to note, I don't mean to follow you around 🙈)

hahaha but I did recommend cosmicpython to you. Hope you find it as useful as I do! 🙌

Anyway, I have some other clarifying questions I'd like to ask.

Question 1

probably wouldn't be too much of a loss to remove Pydantic, as it says in Appendix E, the validation can be done at the edges.

So are you saying Pydantic was overkill, or that it totally didn't work with Appendix E, or something else? In a weird coincidence, I was doing my own research into chapters 8, 9, 10 and 11 of cosmicpython and ended up asking the pydantic project about Appendix E as well, comparing pydantic against schema, which the authors recommended in Appendix E as a way to do validation. The question is posted at pydantic/pydantic#1486.
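
For context, by "validation at the edges" I understand something like the sketch below: a made-up example echoing the book's allocate command, with pydantic in place of schema:

from pydantic import BaseModel, ValidationError


class AllocateCommand(BaseModel):
    # made-up command, echoing the book's allocate example
    orderid: str
    sku: str
    qty: int


def allocate_endpoint(request_json: dict):
    # Validate at the edge, before anything reaches the domain model
    try:
        cmd = AllocateCommand(**request_json)
    except ValidationError as e:
        return {"errors": e.errors()}, 400
    # ...hand the validated command to the service layer / message bus
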

Question 2

Have you tried django-typed-models? https://github.com/craigds/django-typed-models It allows single table inheritance in Django, though it was last updated in May last year.

Question 3

implement the mappers imperatively with the Django ORM.

When you manage to get it working eventually, regardless of whether you use the Django ORM imperatively with the mappers or something else, could you share your eventual architectural decision and some snippets of the implementation with me? I am curious myself. Thank you 🙏

gregbrowndev (Author) commented

I did recommend cosmicpython to you. Hope you find it as useful as I do!

@simkimsia yes, it has been super helpful! 🙏

So are you saying pydantic was overkill or totally didn't work in Appendix E or something else?

To be honest, I haven't fully digested all the material in Appendix E yet or thought too much about validation. I am still using Pydantic in all my entities, as I like that it works nicely with DryPython's use case contracts. I didn't want to lose that functionality in favour of SQLAlchemy (since that brings its own substantial complications).

I ended up asking pydantic abt appendix E as well. Comparing pydantic against schema which was recommended in Appendix E by the authors as a way for validation

That discussion is actually really helpful. Up to now, I based all my entities around Pydantic just because of the story contract functionality but didn't know if doing so was a best practice or not. So I will be going forward with Pydantic but I'll let you know if I hit any blockers.

Have you tried django typed models?

Cheers for the suggestion! I haven't used it, but I have looked into it. That will be something I add in time, once things are decoupled from the model and there's less risk of breaking things.

can share with me your eventual architectural decision and snippets of the implementation?

Yeah of course! I can send you some snippets over email if you drop me a line. All the code I am working on will eventually be open-sourced on GitHub too but our client hasn't organised that yet.
