-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spotlight data modeling POC: Metrics #257
Conversation
Pull Request Test Coverage Report for Build 374879688
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the approach here, @macfarlandian! I read through the companion design doc and spent most of my review time here looking into the type definitions and content of Metric.ts
, and I think it's all quite sane and should be significantly easier to evolve over time than our previous approaches to problems like this.
I do have some minor questions and suggestions but nothing burning.
race: "raceOrEthnicity", | ||
}; | ||
|
||
// TODO: should these be enums? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the argument against them being enums? It seems well within the spirit of your approach to harden these possible values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I guess this is kind of a vague comment ... these "string union" types do harden the possible values in mostly the same ways that an enum would. This is more specifically about the fact that the Typescript enum
type seems to be increasingly Considered Harmful in the developer community (e.g. this stackoverflow post, section 1. of this article) and I need to assess whether they would actually provide some tangible benefit here.
The main thing I have found annoying about string unions in the past is that you can't iterate over their members the way you can with an emum, but apparently there is a way to do that, which I just learned from that stackoverflow post! The More You Know™️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sharing those articles, really interesting reading. I'm not experienced at all with Typescript, but from that reading I am inclined to prefer not making them enums. It looks like they can potentially cause some issues, when nearly the same effect can be created with these string unions without the risk. For example, the fact that TS enums need their own babel-plugin makes me nervous about the number of hoops needed to be jumped through (and potential for bugs) for not much additional functionality.
// NOTE: these fields can only be populated if the RecordFormat contains the requisite fields; | ||
// however, they are not REQUIRED to be populated in those cases (e.g., population snapshots | ||
// contain demographic fields but do not need to support demographic filters). | ||
// I think this means it's possible to inadvertently break a filter by setting it to `undefined` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, and it feels like the kind of thing that could be done pretty easily and produce some difficult-to-detect behavior in production.
It seems to me from reading this PR, so far, like the concept of a filter
is something that might also benefit from a bit more typing across the app. Trafficking in "filters" as just arbitrary string-string key-value pairs in, say, Pulse Dashboard, has been part of what makes it hard to reason about the complex filter logic in that app. Maybe adding typing around that concept here would help avoid this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah for sure. I think this limited implementation is a gesture in that direction — for example defaultDemographicView
is restricted to the DemographicView
strings ("total", "race", "gender", "age").
One option I thought of to address the broken filter issue (I pushed an example of it) is to always require that this value be set when the data format matches its conditions, and then support an explicit "nofilter" value for when we want to show everything in the UI simultaneously rather than one dimension at a time.
|
||
isLoading: boolean; | ||
|
||
private allRecords?: RecordFormat[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this one optional? I would guess that collections would always be either empty or non-empty, but never undefined or null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's optional because it doesn't exist until the data is fetched. Presumably you could also "successfully" fetch nothing and wind up with an empty array, and you'd be able to tell those two cases apart. I don't really know how useful that will be, as evidenced by the fact that I have already glossed over it in get records()
and substituted an empty array!
I could imagine doing something like throwing an error instead of blithely creating an empty array, when you are trying to access data that doesn't yet exist ... it's debatable whether that would actually be a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, okay. I could see why this would be useful in a more dynamic runtime environment like this.
this.name = contentSource.name; | ||
|
||
// initialize data fetching | ||
this.isLoading = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting, because this seems to mean that as soon as a Metric object is constructed, other parts of the app will assume it is actively being initialized, although instantiation and data fetching appear to be decoupled. They don't have to happen in a chained succession, do they? If they do, feel free to ignore this comment. But if they don't, then I wonder if it's going to lead to any confusion or even odd bug behavior when someone somewhere passes a Metric object through an unexpected logical flow where there's some stretch in which we're not actually ready to start using it / fetching data, but something elsewhere in the app believes the Metric is actively doing so and then responds accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah I was also thinking about this. I think it is probably going to be possible to have a meaningful interaction with a Metric without ever actually needing to see its data (e.g., seeing its name and description in a list and deciding not to click on it), and in that circumstance maybe we never even fetch the data. That is one reason for keeping them decoupled, though not necessarily the only one.
What I was not so sure about was whether the distinction between "data has not been requested" and "data has been requested but not yet received" is going to be a meaningful one in practice. But I guess it probably will? For the same reasons you mention here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was not so sure about was whether the distinction between "data has not been requested" and "data has been requested but not yet received" is going to be a meaningful one in practice. But I guess it probably will? For the same reasons you mention here.
I agree, it probably will, but we can also take a wait-and-see approach there when we try to build atop this model.
metricType: MetricTypes.SentenceTypesCurrent; | ||
initOptions: MetricFactoryOptionsBase; | ||
}): SentenceTypesCurrent; | ||
// TODO: like 20 more overloads for all the different metric types -_- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love it either, but I don't think it's outrageous, and I think there's something to be said for this level of discreteness when you're dealing with high dimensional complexity. I also don't have any immediate advice on avoiding it! Maybe you could shift some of the more static parts of this out to a config file and then do a lookup of from that config by key, but that's mostly just shifting the declarative stuff out of code and into config, in which case you might argue that extra layer of indirection is a disservice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the further I get into implementing this feature the less I am bothered by this, honestly. It seems to be a pretty common pattern in Typescript for "narrowing" broad types to more specific instances. It does actually seem like the most straightforward way of splitting the difference between high-dimensional types at the core and specific types at the edges. At the very least it is painfully obvious what this code is doing, so while it may not be super fun to write the first time it may not be all that bad in terms of ongoing maintenance
type TotalIdentifier = "ALL"; | ||
const TOTAL_KEY: TotalIdentifier = "ALL"; | ||
|
||
type DemographicView = "total" | "race" | "gender" | "age" | "nofilter"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between the "total" and "nofilter" view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"total" will only let records pass through that match the condition:
{
raceOrEthnicity: "ALL",
gender: "ALL",
ageBucket: "ALL,
}
whereas "nofilter" would just return the entire Metric.allRecords
array. (I realize now that I did not actually implement the behavior, only the interface definition ... sorry for the confusion!)
get records(): RecordFormat[] { | ||
let recordsToReturn = this.allRecords || []; | ||
|
||
if (this.localityId) { | ||
recordsToReturn = recordsToReturn.filter( | ||
(record) => record.locality === this.localityId | ||
); | ||
} | ||
|
||
if (this.demographicView) { | ||
recordsToReturn = recordsToReturn.filter( | ||
recordIsTotalByDimension(this.demographicView) | ||
); | ||
} | ||
|
||
return recordsToReturn; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm drooling. Very nice.
record: ValuesType<DataFile> | ||
): DemographicFields { | ||
return { | ||
// we are trusting the API to return valid strings here, not validating them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been discussing with the Lantern team about how we can strengthen this trust a bit more. We have been discussing adding tests and validations on the dashboard views to ensure that the values do not change under our feet, and that there is a process in place to change values without breaking the FE. We have an epic open for that, hopefully that eases your mind here a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that would be great! Much better to do that once per pipeline run than have every client do it for every fetch
I think this is awesome. I definitely learned more reading this than I can contribute in feedback. Overall I think the really strong typing, while verbose, is going to significantly reduce bugs and complexity down the line. Even just looking at the filters, wow, it is so clean. Re: future considerations, I think I'm understanding what the function of the |
@jovergaag collections are thematically linked groupings of metrics that we will want to present together in some way. Sort of analogous to the pages for Sentencing, Prison, etc., on the original site but not necessarily limited to only groupings that we want to make big special pages for |
The real implementation has finally caught up with this proof-of-concept, so I think it's time to thank it for its service and bid it adieu |
Description of the change
This is a data modeling proof-of-concept and is presented here mainly for discussion purposes and as an illustration of the higher-level design discussed in this companion document.
I do not expect to ever merge it as such, so I wouldn't bother reading it for "mergeability" ... my goal here is mainly to try and avoid painting myself into a corner with an insufficiently flexible design as I implement #225 and subsequent tickets concerning data flow from the backend and through the application. To that end, I highlight the following design patterns for your consideration:
You should probably start by reading the tests
This code is not "runnable" in any meaningful sense, so I used the tests included here to exercise the functional requirements described in the design document (but not to provide comprehensive coverage so don't get too hung up on that). The sample data is copied straight out of the fixture files in
/spotlight-api
but for the sake of this POC I did not bother wiring then up to full fixtures.Content is required to instantiate a model
In other words, a Metric cannot fetch its own content (name, description, etc); the content must be supplied to its constructor. Whether this should be done inside or outside the factory function is debatable, but I think that's easily changed. This same pattern would apply to all other data models (Tenants, Collections, etc).
Speaking of factory functions,
Models are instantiated with factory functions
To strongly type these Metric instances we have to instantiate them with both the proper constructor arguments and the proper type definition, which are interdependent to some degree. This is going to be pretty tedious (e.g. a 100-line
switch
statement) and therefore it is encapsulated in a factory function that only requires the metric identifier to be passed. The examples here are 100% realistic, albeit on the easier end of the spectrum ... so whatever you may find questionable about this example will only get worse!There is only one Metric class
This is the bit I agonized most over and considered a few alternatives (e.g., subclasses, composing plain objects piece by piece). Ultimately I settled on this because it maps well to a coherent type definition, it has nice affordances such as private members, and it is very well memory-optimized out of the box thanks to its reliance on a
prototype
. It does mean, as you can see in this example, that various members are going to be optional depending on the type; the alternatives mentioned above would give us more options for rigidifying those boundaries but ultimately I wasn't convinced it was really necessary. As this extends I would expect to see, for example, some metadata properties that are only present on certain types, as well as the filters you see here. Because the filters are directly related to the data format, I have harmonized them at the type level. I don't think there's going to be a ton of benefit in doing the same kind of coupling with metadata that is more presentational.Future considerations
Keep in mind that there will be
Tenants
andCollections
that will also extend theNamedEntity
type seen here, and that Metrics and Collections will be nested under Tenants and that certain Collection types will have M2M relationships with certain Metric types. None of that is represented here but if any of it looks to you like we can't get there from here, that would be cause for concern!Type of change
Related issues
Checklists
Development
These boxes should be checked by the submitter prior to merging:
Code review
These boxes should be checked by reviewers prior to merging: