Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ara first review #163

Closed
wants to merge 18 commits into from
Closed

Ara first review #163

wants to merge 18 commits into from

Conversation

maxhutch
Copy link
Contributor

@maxhutch maxhutch commented Jan 7, 2020

This review is scoped to a minimal subset of the Ara Definition prototype. This should not be merged. Rather, when the review is approved, we'll close this PR and make all subsequent contributions to feature/ara through separate PRs into that branch. Then, we'll re-open this PR when Ara is ready for release.

That much being said: this code should be evaluated for quality; this isn't just a design review. This may be the only detailed review of the code present in this PR.

assert ara_definition.version is None


def test_dump_example():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kroenlein this is the test to run if you want to check the serialization.

columns=[RealMeanColumn(data_source=density.short_name)],
variables=[density]
)
print(json.dumps(ara_definition.dump(), indent=2))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll remove this before it gets merged into develop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In looking at the dump, I see null in id, version, and target_units.

  • Assuming id is the UUID, which will be assigned by the backend, that seems fine.
  • Assuming version will be added as we get closer to roll-out, I don't see a clear reason to not put a 0.1 there.
  • Was opt-out of defining units intentional? I.e., does null mean use what's in the template? This could cause table migration if I regenerate a table made 1 year ago whose associated Attribute Template was edited 6 months ago, which feels surprising. I'd prefer explicitly populating this field from the template (data_source.template.units) though I see that this is put together in a way that that template is not available at definition time.

@maxhutch maxhutch requested review from bfolie and jfriend-citrine and removed request for mcgalcode January 8, 2020 16:28
src/citrine/ara/variables.py Outdated Show resolved Hide resolved
short_name = properties.String('short_name')
output_name = properties.List(properties.String, 'output_name')
field = properties.String('field')
type = properties.String('type', default="root_info", deserializable=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other classes, we've been using typ to avoid shadowing the python built-in type.


def __init__(self,
short_name: str,
output_name: List[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to use the singular output_name if it will generally be a list of length > 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to headers

src/citrine/ara/columns.py Outdated Show resolved Hide resolved
Copy link
Contributor

@bfolie bfolie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the user use an AraDefinition object? Is there a separate route they call to create/get the table? Do they simply include the uid of the AraDefinition object when defining part of the workflow?

Comment on lines 62 to 66
density = AttributeByTemplate(
short_name="density",
output_name=["Slice", "Density"],
template=LinkByUID(scope="templates", id="density")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the difference between short_name and output_name (and field in the case of RootInfo) confusing. They all seem to be referring to sort-of-but-not-quite-the-same-thing, which is how the variable is referred to. What's the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to name (the name of the variable to use when referencing it) and headers (the things that show up in the header of the table).

Parameters
----------
data_source: str
short_name of the variable to use when populating the column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do all columns have a data_source field and do all variables have a short_name field? This seems to be implying that there is a uniqueness requirement (all variables in a given Ara definition must have unique short names). Is that true, and what happens if that requirement is not met?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do. I made a comment about their interfaces in response to @kroenlein and opened #166. I've added constructor-time validations in AraDefinition that short_name is unique across the variables and each data_source is present as the short_name of one of the variables.

Copy link
Contributor

@kroenlein kroenlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally makes sense. Specifics follow around some nulls of questions about naming of fields.

data_source: str
short_name of the variable to use when populating the column
target_units: optional[str]
units to convert the real variable into
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we are deferring the question of custom display column labels

columns=[RealMeanColumn(data_source=density.short_name)],
variables=[density]
)
print(json.dumps(ara_definition.dump(), indent=2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In looking at the dump, I see null in id, version, and target_units.

  • Assuming id is the UUID, which will be assigned by the backend, that seems fine.
  • Assuming version will be added as we get closer to roll-out, I don't see a clear reason to not put a 0.1 there.
  • Was opt-out of defining units intentional? I.e., does null mean use what's in the template? This could cause table migration if I regenerate a table made 1 year ago whose associated Attribute Template was edited 6 months ago, which feels surprising. I'd prefer explicitly populating this field from the template (data_source.template.units) though I see that this is put together in a way that that template is not available at definition time.

data_source: str
short_name of the variable to use when populating the column
target_units: optional[str]
units to convert the real variable into
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are target_units optional? What is the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are optional with the default None, in which case they are returned with original units. You'd only expect to use that if you then had the original units as a separate display column. I can pull that into this minimal example for clarity (note that its already in the backend prototype).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think a default of a non-conversion would be a footgun. Can we have None be an option, which behaves as you say, but default be empty string (or functional equivalent), in which case the behavior is grab the output units from the template? That would make much more sense as a default to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have the "by default grab it from the template" behavior be part of the SDK behavior and not part of the API/python binding. But I do like that being default behavior in the SDK...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for handling behavior in the SDK, but would feel better if we swapped it to a required argument if the nominally default behavior would be problematic if implemented as such.

Parameters
----------
data_source: str
short_name of the variable to use when populating the column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the Slack conversation, alias might be more intuitive. nickname operates similarly, but feels less formal. Would there be value to vetting that there are not collisions in short_names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the argument name in the column definition? We had been discussing changing short_name in the variable definition. Do you think data_source is confusing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_source is a good concept and good term.



class Variable(PolymorphicSerializable['Variable']):
"""A variable that can be assigned values present in material histories.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the output_name be defined for all Variable classes and thus have an abstract method at the parent class level? Also, this feels more like a taxonomy or hierarchy of labeling, so output_name seems to miss on the plurality of terms and their nested nature. Not that I have a better word at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot easily move them into the parent class (see #166 ). Agreed that its the ideal way to do it, but we don't have members like that defined in any of our parent classes.

@jfriend-citrine
Copy link
Contributor

@maxhutch Thanks for your time walking through PR with me this morning. I'm following comments and learning.

@maxhutch
Copy link
Contributor Author

maxhutch commented Jan 8, 2020

How does the user use an AraDefinition object? Is there a separate route they call to create/get the table? Do they simply include the uid of the AraDefinition object when defining part of the workflow?

They create and register Ara definitions. Then, they can execute it, which will cause the data platform to read data and produce a Table (which can then be read with the tables route). The same definition can be re-run at a later time to re-build the Table with new data.

headers = [x.headers for x in variables]
dup_headers = self._get_dups(headers)
if len(dup_headers) > 0:
raise ValueError("Multiple variables defined these output_names,"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Multiple variables defined these output_names,"
raise ValueError("Multiple variables defined these headers,"



def test_dup_names():
"""Make sure that variable short_name and output_name are unique across an ara definition"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Make sure that variable short_name and output_name are unique across an ara definition"""
"""Make sure that variable name and headers are unique across an ara definition"""

bfolie
bfolie previously approved these changes Jan 9, 2020
Copy link
Contributor

@bfolie bfolie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. There are two spots where I think the "short_name -> name" and "output_name -> headers" change hasn't been carried through, but that's a small thing.

Copy link
Contributor

@kroenlein kroenlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few omitted short_names

src/citrine/ara/columns.py Outdated Show resolved Hide resolved
src/citrine/ara/columns.py Outdated Show resolved Hide resolved
Co-Authored-By: Ken Kroenlein <51962276+kroenlein@users.noreply.github.com>
Copy link
Contributor

@kroenlein kroenlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@maxhutch
Copy link
Contributor Author

maxhutch commented Jan 9, 2020

Great! I'm going to protect this branch and close the PR.

@maxhutch maxhutch closed this Jan 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants