-
Notifications
You must be signed in to change notification settings - Fork 1
Initial working draft of aggregating serializer and viewset, see #HEA… #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8a210f8 to
10d0f1d
Compare
10d0f1d to
20f3d12
Compare
20f3d12 to
7734573
Compare
rhunwicks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the individual comments, i think it would benefit from some refactoring, but the overall approach looks good.
Mostly, I think it needs unit tests.
| quantity_other_uses = models.PositiveIntegerField(blank=True, null=True, verbose_name=_("Quantity Other Uses")) | ||
| # Can normally be calculated / validated as `quantity_produced + quantity_purchased - quantity_sold - quantity_other_uses` # NOQA: E501 | ||
| # but there are exceptions, such as MilkProduction, where there is also an amount used for ButterProduction | ||
| # but there are exceptions, such as MilkProduction, where there is also an amount used for ButterProduction, is this captured quantity_other_uses? # NOQA: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No there is a MilkProduction.quantity_butter_production that captures that separately. Please update this comment accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do thanks
| &min_kcals_consumed_percent=52 | ||
| &max_kcals_consumed_percent=99 | ||
| The strategy type codes are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strategy type codes are defined in metadata.models.LivelihoodStrategyType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They probably should be referenced here rather than coded as a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this very verbose as the docstring is rendered in the API HTML GUI, so this provides end user guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we validate the value of the parameter anywhere? Is that using the metadata.models.LivelihoodStrategyType? At a miminum I think we should put a comment in the code (i.e. outside the docstring) that explains the the allowable values for this parameter must be one of choices from LivelihoodStrategyType.
| serializer_class = LivelihoodZoneBaselineReportSerializer | ||
| filterset_class = LivelihoodZoneBaselineFilterSet | ||
|
|
||
| def get_queryset(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason to put this code in get_queryset instead of filter_queryset? You are calling filter_queryset here, does it get called again later by whatever mechanism DRF normally uses to call it?
For example, the default list method has:
def list(self, request, *args, **kwargs):
queryset = self.filter_queryset(self.get_queryset())I think that perhaps it is applying the global filters to the already filtered queryset. That probably doesn't make any difference, but it is hard to be sure without any unit tests.
It also seems likely that this ViewSet will never work as a retrieve/post/push endpoint - logically it can only do list. It isn't clear whether pagination makes any sense either. Therefore, a simpler and more standard definition would probably be:
class LivelihoodZoneBaselineReportViewSet(GenericViewSet):
def list(self, request, *args, **kwargs):
queryset = self.filter_queryset(self.get_queryset())
# All the other code from here.
# <snip>
serializer = self.get_serializer(queryset, many=True)
return Response(serializer.data)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting comment thanks.
We could just remove the filter_queryset call from this method and leave it to list.
I think putting it in get_queryset is more flexible, and a nice division of responsibilities with a clear interface, nicely encapsulated in the single-purpose method. Doing this in list would also work. I think of list more as a surface-level rendering function for functionality specific to the HTTP list method, than something for data and cell calculation, but DRF's not strict. I agree it won't ever PUT/PATCH/DELETE, I think 'get the top 10' will be common, and accidentally very large resultsets will be possible (and not easily cached), and flexibility is helpful while the requirements are being hammered out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove filter_queryset from here, then you will be doing the aggregation on the unfiltered queryset. I know that you are applying filter_queryset to the subqueries, but it still changing way that the function works and without a unit test it is hard to see whether it will affect the result.
I think that putting everything in list is the basic DRF approach for the most simple API endpoint. All the stuff that comes on top with ModelViewSet is supposed to make it easier to produce CRUD endpoints for models, which isn't what we want here.
I think that if you want to keep it in get_queryset then you still need LivelihoodZoneBaselineReportViewSet(mixins.ListModelMixin, GenericViewSet) rather than ModelViewSet.
I think putting it all in list is the "normal" DRF way - I think get_queryset should really only be altering the base queryset, e.g. adding select_related, etc. and the docs says that filter_queryset shouldn't need to be altered by the user. Here we are changing the number of rows returned by the queryset - i.e. we are changing the structure of the data to create a response that meets our needs. I think that belongs in list.
Another way to think about it is that if this viewset had another action, for example if it did also allow a POST, then would this code be going in get_queryset or in list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will be doing the aggregation on the unfiltered queryset
The aggregation is not performed until the queryset is evaluated. It is lazy until after the filter call in list.
putting it all in list is the "normal" DRF way
I see the DRF architecture as a number of horizontal slices, with data extraction done by the bottom slices, data shaping in the middle, and rendering at the top. My understanding is that the queryset is constructed by get_queryset so that the data is defined only once for the whole viewset. HTTP methods are mid-way between extracting the data 'below', and rendering it 'above'. The RETRIEVE URL currently works. (By mistake, I will remove it!) POST doesn't use get_queryset, as it just runs an INSERT, PATCH does as it gets the object before updating.
| return str(obj.wealth_group) | ||
|
|
||
|
|
||
| class DictQuerySetField(rest_framework_fields.SerializerMethodField): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a class docstring explaining why it is required / what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do thanks
| } | ||
|
|
||
| # Add the ordering field if specified | ||
| ordering = self.context["request"].query_params.get("ordering") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are hard-coding the ordering parameter rather than using https://www.django-rest-framework.org/api-guide/settings/#ordering_param we need to document it. It would be better to use settings.ORDERING_PARAM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do thanks
| The API currently only supports a single slice at a time. For combining please run multiple searches, and add | ||
| the desired results to the Compare tab. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful if the docstring included an example response, e.g. a JSON one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
| # Filters are automatically created, eg, min_kcals_consumed_percent and max_kcals_consumed_percent. | ||
| # If no ordering is specified by the FilterSet, the results are ordered by percent descending in the order here. | ||
| aggregates = { | ||
| "kcals_consumed": Sum, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is preventing us from adding quantity_produced, quantity_consumed, income, expenditure, etc. - i.e. any field from LivelihoodActivity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing at all, wanted to get the overall design verified before building out loads of functionality.
I do think we want to be selective though, as the bare figures are unlikely to be useful and likely to be misleading, because, eg, kcals_consumed will not work for those LS types such as income that don't store that number - we need to convert cash to a kcals equivalent to incorporate those and produce a useful statistic. I suspect the most useful aggregates will be more use-case driven, for example composite and normalized:
"kcal_income": Sum(
(
F("livelihood_strategies__livelihoodactivity__quantity_purchased")
+ F("livelihood_strategies__livelihoodactivity__quantity_produced")
)
* F("livelihood_strategies__product__kcals_per_unit"),
),This will need a small enhancement to work, but again I wanted to get the fundamentals reviewed and prototyped in the UI before getting clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that based on all the charts, we will need kcals_consumed, income and expenditure for most purposes - and a way to do the "complete" graph that converts income and expenditure to kcals.
|
Based on discussions in Chat I am merging this now to enable testing, and because it won't have any knock-on impacts, but we will continue to evolve this MR and expect a second MR off the same branch in due course. |
…-547