Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spotlight data modeling POC: Metrics #257

Closed
wants to merge 5 commits into from

Conversation

macfarlandian
Copy link
Collaborator

@macfarlandian macfarlandian commented Nov 18, 2020

Description of the change

This is a data modeling proof-of-concept and is presented here mainly for discussion purposes and as an illustration of the higher-level design discussed in this companion document.

I do not expect to ever merge it as such, so I wouldn't bother reading it for "mergeability" ... my goal here is mainly to try and avoid painting myself into a corner with an insufficiently flexible design as I implement #225 and subsequent tickets concerning data flow from the backend and through the application. To that end, I highlight the following design patterns for your consideration:

You should probably start by reading the tests

This code is not "runnable" in any meaningful sense, so I used the tests included here to exercise the functional requirements described in the design document (but not to provide comprehensive coverage so don't get too hung up on that). The sample data is copied straight out of the fixture files in /spotlight-api but for the sake of this POC I did not bother wiring then up to full fixtures.

Content is required to instantiate a model

In other words, a Metric cannot fetch its own content (name, description, etc); the content must be supplied to its constructor. Whether this should be done inside or outside the factory function is debatable, but I think that's easily changed. This same pattern would apply to all other data models (Tenants, Collections, etc).

Speaking of factory functions,

Models are instantiated with factory functions

To strongly type these Metric instances we have to instantiate them with both the proper constructor arguments and the proper type definition, which are interdependent to some degree. This is going to be pretty tedious (e.g. a 100-line switch statement) and therefore it is encapsulated in a factory function that only requires the metric identifier to be passed. The examples here are 100% realistic, albeit on the easier end of the spectrum ... so whatever you may find questionable about this example will only get worse!

There is only one Metric class

This is the bit I agonized most over and considered a few alternatives (e.g., subclasses, composing plain objects piece by piece). Ultimately I settled on this because it maps well to a coherent type definition, it has nice affordances such as private members, and it is very well memory-optimized out of the box thanks to its reliance on a prototype. It does mean, as you can see in this example, that various members are going to be optional depending on the type; the alternatives mentioned above would give us more options for rigidifying those boundaries but ultimately I wasn't convinced it was really necessary. As this extends I would expect to see, for example, some metadata properties that are only present on certain types, as well as the filters you see here. Because the filters are directly related to the data format, I have harmonized them at the type level. I don't think there's going to be a ton of benefit in doing the same kind of coupling with metadata that is more presentational.

Future considerations

Keep in mind that there will be Tenants and Collections that will also extend the NamedEntity type seen here, and that Metrics and Collections will be nested under Tenants and that certain Collection types will have M2M relationships with certain Metric types. None of that is represented here but if any of it looks to you like we can't get there from here, that would be cause for concern!

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Configuration change (adjusts configuration to achieve some end related to functionality, development, performance, or security)

Related issues

Related to #225

Checklists

Development

These boxes should be checked by the submitter prior to merging:

  • Manual testing against realistic data has been performed locally

Code review

These boxes should be checked by reviewers prior to merging:

  • This pull request has a descriptive title and information useful to a reviewer
  • This pull request has been moved out of a Draft state, has no "Work In Progress" label, and has assigned reviewers
  • Potential security implications or infrastructural changes have been considered, if relevant

@coveralls
Copy link

coveralls commented Nov 18, 2020

Pull Request Test Coverage Report for Build 374879688

  • 46 of 51 (90.2%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.02%) to 55.867%

Changes Missing Coverage Covered Lines Changed/Added Lines %
spotlight-client/src/datastores/Metric.ts 42 43 97.67%
spotlight-client/src/datastores/contentSources/us_nd.ts 0 1 0.0%
spotlight-client/src/metricsApi/metricsClient.ts 4 5 80.0%
spotlight-client/src/datastores/retrieveContent.ts 0 2 0.0%
Totals Coverage Status
Change from base Build 369017266: 1.02%
Covered Lines: 902
Relevant Lines: 1585

💛 - Coveralls

Copy link
Member

@jessex jessex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the approach here, @macfarlandian! I read through the companion design doc and spent most of my review time here looking into the type definitions and content of Metric.ts, and I think it's all quite sane and should be significantly easier to evolve over time than our previous approaches to problems like this.

I do have some minor questions and suggestions but nothing burning.

race: "raceOrEthnicity",
};

// TODO: should these be enums?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the argument against them being enums? It seems well within the spirit of your approach to harden these possible values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I guess this is kind of a vague comment ... these "string union" types do harden the possible values in mostly the same ways that an enum would. This is more specifically about the fact that the Typescript enum type seems to be increasingly Considered Harmful in the developer community (e.g. this stackoverflow post, section 1. of this article) and I need to assess whether they would actually provide some tangible benefit here.

The main thing I have found annoying about string unions in the past is that you can't iterate over their members the way you can with an emum, but apparently there is a way to do that, which I just learned from that stackoverflow post! The More You Know™️

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing those articles, really interesting reading. I'm not experienced at all with Typescript, but from that reading I am inclined to prefer not making them enums. It looks like they can potentially cause some issues, when nearly the same effect can be created with these string unions without the risk. For example, the fact that TS enums need their own babel-plugin makes me nervous about the number of hoops needed to be jumped through (and potential for bugs) for not much additional functionality.

spotlight-client/src/datastores/Metric.ts Outdated Show resolved Hide resolved
// NOTE: these fields can only be populated if the RecordFormat contains the requisite fields;
// however, they are not REQUIRED to be populated in those cases (e.g., population snapshots
// contain demographic fields but do not need to support demographic filters).
// I think this means it's possible to inadvertently break a filter by setting it to `undefined`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and it feels like the kind of thing that could be done pretty easily and produce some difficult-to-detect behavior in production.

It seems to me from reading this PR, so far, like the concept of a filter is something that might also benefit from a bit more typing across the app. Trafficking in "filters" as just arbitrary string-string key-value pairs in, say, Pulse Dashboard, has been part of what makes it hard to reason about the complex filter logic in that app. Maybe adding typing around that concept here would help avoid this scenario?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for sure. I think this limited implementation is a gesture in that direction — for example defaultDemographicView is restricted to the DemographicView strings ("total", "race", "gender", "age").

One option I thought of to address the broken filter issue (I pushed an example of it) is to always require that this value be set when the data format matches its conditions, and then support an explicit "nofilter" value for when we want to show everything in the UI simultaneously rather than one dimension at a time.


isLoading: boolean;

private allRecords?: RecordFormat[];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this one optional? I would guess that collections would always be either empty or non-empty, but never undefined or null.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's optional because it doesn't exist until the data is fetched. Presumably you could also "successfully" fetch nothing and wind up with an empty array, and you'd be able to tell those two cases apart. I don't really know how useful that will be, as evidenced by the fact that I have already glossed over it in get records() and substituted an empty array!

I could imagine doing something like throwing an error instead of blithely creating an empty array, when you are trying to access data that doesn't yet exist ... it's debatable whether that would actually be a good idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, okay. I could see why this would be useful in a more dynamic runtime environment like this.

this.name = contentSource.name;

// initialize data fetching
this.isLoading = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting, because this seems to mean that as soon as a Metric object is constructed, other parts of the app will assume it is actively being initialized, although instantiation and data fetching appear to be decoupled. They don't have to happen in a chained succession, do they? If they do, feel free to ignore this comment. But if they don't, then I wonder if it's going to lead to any confusion or even odd bug behavior when someone somewhere passes a Metric object through an unexpected logical flow where there's some stretch in which we're not actually ready to start using it / fetching data, but something elsewhere in the app believes the Metric is actively doing so and then responds accordingly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah I was also thinking about this. I think it is probably going to be possible to have a meaningful interaction with a Metric without ever actually needing to see its data (e.g., seeing its name and description in a list and deciding not to click on it), and in that circumstance maybe we never even fetch the data. That is one reason for keeping them decoupled, though not necessarily the only one.

What I was not so sure about was whether the distinction between "data has not been requested" and "data has been requested but not yet received" is going to be a meaningful one in practice. But I guess it probably will? For the same reasons you mention here.

Copy link
Member

@jessex jessex Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was not so sure about was whether the distinction between "data has not been requested" and "data has been requested but not yet received" is going to be a meaningful one in practice. But I guess it probably will? For the same reasons you mention here.

I agree, it probably will, but we can also take a wait-and-see approach there when we try to build atop this model.

metricType: MetricTypes.SentenceTypesCurrent;
initOptions: MetricFactoryOptionsBase;
}): SentenceTypesCurrent;
// TODO: like 20 more overloads for all the different metric types -_-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love it either, but I don't think it's outrageous, and I think there's something to be said for this level of discreteness when you're dealing with high dimensional complexity. I also don't have any immediate advice on avoiding it! Maybe you could shift some of the more static parts of this out to a config file and then do a lookup of from that config by key, but that's mostly just shifting the declarative stuff out of code and into config, in which case you might argue that extra layer of indirection is a disservice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the further I get into implementing this feature the less I am bothered by this, honestly. It seems to be a pretty common pattern in Typescript for "narrowing" broad types to more specific instances. It does actually seem like the most straightforward way of splitting the difference between high-dimensional types at the core and specific types at the edges. At the very least it is painfully obvious what this code is doing, so while it may not be super fun to write the first time it may not be all that bad in terms of ongoing maintenance

type TotalIdentifier = "ALL";
const TOTAL_KEY: TotalIdentifier = "ALL";

type DemographicView = "total" | "race" | "gender" | "age" | "nofilter";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between the "total" and "nofilter" view?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"total" will only let records pass through that match the condition:

{
  raceOrEthnicity: "ALL",
  gender: "ALL", 
  ageBucket: "ALL,
}

whereas "nofilter" would just return the entire Metric.allRecords array. (I realize now that I did not actually implement the behavior, only the interface definition ... sorry for the confusion!)

Comment on lines +174 to +190
get records(): RecordFormat[] {
let recordsToReturn = this.allRecords || [];

if (this.localityId) {
recordsToReturn = recordsToReturn.filter(
(record) => record.locality === this.localityId
);
}

if (this.demographicView) {
recordsToReturn = recordsToReturn.filter(
recordIsTotalByDimension(this.demographicView)
);
}

return recordsToReturn;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm drooling. Very nice.

record: ValuesType<DataFile>
): DemographicFields {
return {
// we are trusting the API to return valid strings here, not validating them

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been discussing with the Lantern team about how we can strengthen this trust a bit more. We have been discussing adding tests and validations on the dashboard views to ensure that the values do not change under our feet, and that there is a process in place to change values without breaking the FE. We have an epic open for that, hopefully that eases your mind here a bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that would be great! Much better to do that once per pipeline run than have every client do it for every fetch

@jovergaag
Copy link

I think this is awesome. I definitely learned more reading this than I can contribute in feedback.

Overall I think the really strong typing, while verbose, is going to significantly reduce bugs and complexity down the line. Even just looking at the filters, wow, it is so clean.

Re: future considerations, I think I'm understanding what the function of the Tenants will be, tenant specific copy/value translations, etc. What are Collections?

@macfarlandian
Copy link
Collaborator Author

@jovergaag collections are thematically linked groupings of metrics that we will want to present together in some way. Sort of analogous to the pages for Sentencing, Prison, etc., on the original site but not necessarily limited to only groupings that we want to make big special pages for

@macfarlandian
Copy link
Collaborator Author

The real implementation has finally caught up with this proof-of-concept, so I think it's time to thank it for its service and bid it adieu

@macfarlandian macfarlandian deleted the ian/225-content-types branch January 15, 2021 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants