-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ara first review #163
Ara first review #163
Conversation
This is mostly a pattern match of the Table resource
assert ara_definition.version is None | ||
|
||
|
||
def test_dump_example(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kroenlein this is the test to run if you want to check the serialization.
columns=[RealMeanColumn(data_source=density.short_name)], | ||
variables=[density] | ||
) | ||
print(json.dumps(ara_definition.dump(), indent=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll remove this before it gets merged into develop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In looking at the dump, I see null
in id
, version
, and target_units
.
- Assuming
id
is the UUID, which will be assigned by the backend, that seems fine. - Assuming
version
will be added as we get closer to roll-out, I don't see a clear reason to not put a0.1
there. - Was opt-out of defining units intentional? I.e., does
null
mean use what's in the template? This could cause table migration if I regenerate a table made 1 year ago whose associated Attribute Template was edited 6 months ago, which feels surprising. I'd prefer explicitly populating this field from the template (data_source.template.units
) though I see that this is put together in a way that that template is not available at definition time.
src/citrine/ara/variables.py
Outdated
short_name = properties.String('short_name') | ||
output_name = properties.List(properties.String, 'output_name') | ||
field = properties.String('field') | ||
type = properties.String('type', default="root_info", deserializable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other classes, we've been using typ
to avoid shadowing the python built-in type
.
src/citrine/ara/variables.py
Outdated
|
||
def __init__(self, | ||
short_name: str, | ||
output_name: List[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to use the singular output_name
if it will generally be a list of length > 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to headers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the user use an AraDefinition object? Is there a separate route they call to create/get the table? Do they simply include the uid of the AraDefinition object when defining part of the workflow?
density = AttributeByTemplate( | ||
short_name="density", | ||
output_name=["Slice", "Density"], | ||
template=LinkByUID(scope="templates", id="density") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the difference between short_name and output_name (and field
in the case of RootInfo) confusing. They all seem to be referring to sort-of-but-not-quite-the-same-thing, which is how the variable is referred to. What's the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to name
(the name of the variable to use when referencing it) and headers
(the things that show up in the header of the table).
src/citrine/ara/columns.py
Outdated
Parameters | ||
---------- | ||
data_source: str | ||
short_name of the variable to use when populating the column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all columns have a data_source
field and do all variables have a short_name
field? This seems to be implying that there is a uniqueness requirement (all variables in a given Ara definition must have unique short names). Is that true, and what happens if that requirement is not met?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do. I made a comment about their interfaces in response to @kroenlein and opened #166. I've added constructor-time validations in AraDefinition that short_name
is unique across the variables and each data_source
is present as the short_name
of one of the variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally makes sense. Specifics follow around some nulls of questions about naming of fields.
data_source: str | ||
short_name of the variable to use when populating the column | ||
target_units: optional[str] | ||
units to convert the real variable into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we are deferring the question of custom display column labels
columns=[RealMeanColumn(data_source=density.short_name)], | ||
variables=[density] | ||
) | ||
print(json.dumps(ara_definition.dump(), indent=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In looking at the dump, I see null
in id
, version
, and target_units
.
- Assuming
id
is the UUID, which will be assigned by the backend, that seems fine. - Assuming
version
will be added as we get closer to roll-out, I don't see a clear reason to not put a0.1
there. - Was opt-out of defining units intentional? I.e., does
null
mean use what's in the template? This could cause table migration if I regenerate a table made 1 year ago whose associated Attribute Template was edited 6 months ago, which feels surprising. I'd prefer explicitly populating this field from the template (data_source.template.units
) though I see that this is put together in a way that that template is not available at definition time.
data_source: str | ||
short_name of the variable to use when populating the column | ||
target_units: optional[str] | ||
units to convert the real variable into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are target_units
optional? What is the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are optional with the default None
, in which case they are returned with original units. You'd only expect to use that if you then had the original units as a separate display column. I can pull that into this minimal example for clarity (note that its already in the backend prototype).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think a default of a non-conversion would be a footgun. Can we have None
be an option, which behaves as you say, but default be empty string (or functional equivalent), in which case the behavior is grab the output units from the template? That would make much more sense as a default to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have the "by default grab it from the template" behavior be part of the SDK behavior and not part of the API/python binding. But I do like that being default behavior in the SDK...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for handling behavior in the SDK, but would feel better if we swapped it to a required argument if the nominally default behavior would be problematic if implemented as such.
src/citrine/ara/columns.py
Outdated
Parameters | ||
---------- | ||
data_source: str | ||
short_name of the variable to use when populating the column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the Slack conversation, alias
might be more intuitive. nickname
operates similarly, but feels less formal. Would there be value to vetting that there are not collisions in short_name
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the argument name in the column definition? We had been discussing changing short_name
in the variable definition. Do you think data_source
is confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data_source
is a good concept and good term.
|
||
|
||
class Variable(PolymorphicSerializable['Variable']): | ||
"""A variable that can be assigned values present in material histories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the output_name
be defined for all Variable
classes and thus have an abstract method at the parent class level? Also, this feels more like a taxonomy or hierarchy of labeling, so output_name
seems to miss on the plurality of terms and their nested nature. Not that I have a better word at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot easily move them into the parent class (see #166 ). Agreed that its the ideal way to do it, but we don't have members like that defined in any of our parent classes.
@maxhutch Thanks for your time walking through PR with me this morning. I'm following comments and learning. |
They create and register Ara definitions. Then, they can execute it, which will cause the data platform to read data and produce a Table (which can then be read with the tables route). The same definition can be re-run at a later time to re-build the Table with new data. |
headers = [x.headers for x in variables] | ||
dup_headers = self._get_dups(headers) | ||
if len(dup_headers) > 0: | ||
raise ValueError("Multiple variables defined these output_names," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError("Multiple variables defined these output_names," | |
raise ValueError("Multiple variables defined these headers," |
|
||
|
||
def test_dup_names(): | ||
"""Make sure that variable short_name and output_name are unique across an ara definition""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Make sure that variable short_name and output_name are unique across an ara definition""" | |
"""Make sure that variable name and headers are unique across an ara definition""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There are two spots where I think the "short_name -> name" and "output_name -> headers" change hasn't been carried through, but that's a small thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few omitted short_names
Co-Authored-By: Ken Kroenlein <51962276+kroenlein@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Great! I'm going to protect this branch and close the PR. |
This review is scoped to a minimal subset of the Ara Definition prototype. This should not be merged. Rather, when the review is approved, we'll close this PR and make all subsequent contributions to
feature/ara
through separate PRs into that branch. Then, we'll re-open this PR when Ara is ready for release.That much being said: this code should be evaluated for quality; this isn't just a design review. This may be the only detailed review of the code present in this PR.