Skip to content

Conversation

@chrispreee
Copy link
Contributor

…-547

@chrispreee chrispreee force-pushed the HEA-547/filter-by-contrib-to-income branch 3 times, most recently from 8a210f8 to 10d0f1d Compare November 4, 2024 15:58
@chrispreee chrispreee force-pushed the HEA-547/filter-by-contrib-to-income branch from 10d0f1d to 20f3d12 Compare November 13, 2024 11:40
@chrispreee chrispreee force-pushed the HEA-547/filter-by-contrib-to-income branch from 20f3d12 to 7734573 Compare November 13, 2024 11:50
Copy link
Contributor

@rhunwicks rhunwicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the individual comments, i think it would benefit from some refactoring, but the overall approach looks good.

Mostly, I think it needs unit tests.

quantity_other_uses = models.PositiveIntegerField(blank=True, null=True, verbose_name=_("Quantity Other Uses"))
# Can normally be calculated / validated as `quantity_produced + quantity_purchased - quantity_sold - quantity_other_uses` # NOQA: E501
# but there are exceptions, such as MilkProduction, where there is also an amount used for ButterProduction
# but there are exceptions, such as MilkProduction, where there is also an amount used for ButterProduction, is this captured quantity_other_uses? # NOQA: E501
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is a MilkProduction.quantity_butter_production that captures that separately. Please update this comment accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do thanks

&min_kcals_consumed_percent=52
&max_kcals_consumed_percent=99
The strategy type codes are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strategy type codes are defined in metadata.models.LivelihoodStrategyType.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They probably should be referenced here rather than coded as a list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this very verbose as the docstring is rendered in the API HTML GUI, so this provides end user guidance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we validate the value of the parameter anywhere? Is that using the metadata.models.LivelihoodStrategyType? At a miminum I think we should put a comment in the code (i.e. outside the docstring) that explains the the allowable values for this parameter must be one of choices from LivelihoodStrategyType.

serializer_class = LivelihoodZoneBaselineReportSerializer
filterset_class = LivelihoodZoneBaselineFilterSet

def get_queryset(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to put this code in get_queryset instead of filter_queryset? You are calling filter_queryset here, does it get called again later by whatever mechanism DRF normally uses to call it?

For example, the default list method has:

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())

I think that perhaps it is applying the global filters to the already filtered queryset. That probably doesn't make any difference, but it is hard to be sure without any unit tests.

It also seems likely that this ViewSet will never work as a retrieve/post/push endpoint - logically it can only do list. It isn't clear whether pagination makes any sense either. Therefore, a simpler and more standard definition would probably be:

class LivelihoodZoneBaselineReportViewSet(GenericViewSet):

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())

        # All the other code from here.
        # <snip>

        serializer = self.get_serializer(queryset, many=True)
        return Response(serializer.data)

Copy link
Contributor Author

@chrispreee chrispreee Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting comment thanks.

We could just remove the filter_queryset call from this method and leave it to list.

I think putting it in get_queryset is more flexible, and a nice division of responsibilities with a clear interface, nicely encapsulated in the single-purpose method. Doing this in list would also work. I think of list more as a surface-level rendering function for functionality specific to the HTTP list method, than something for data and cell calculation, but DRF's not strict. I agree it won't ever PUT/PATCH/DELETE, I think 'get the top 10' will be common, and accidentally very large resultsets will be possible (and not easily cached), and flexibility is helpful while the requirements are being hammered out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove filter_queryset from here, then you will be doing the aggregation on the unfiltered queryset. I know that you are applying filter_queryset to the subqueries, but it still changing way that the function works and without a unit test it is hard to see whether it will affect the result.

I think that putting everything in list is the basic DRF approach for the most simple API endpoint. All the stuff that comes on top with ModelViewSet is supposed to make it easier to produce CRUD endpoints for models, which isn't what we want here.

I think that if you want to keep it in get_queryset then you still need LivelihoodZoneBaselineReportViewSet(mixins.ListModelMixin, GenericViewSet) rather than ModelViewSet.

I think putting it all in list is the "normal" DRF way - I think get_queryset should really only be altering the base queryset, e.g. adding select_related, etc. and the docs says that filter_queryset shouldn't need to be altered by the user. Here we are changing the number of rows returned by the queryset - i.e. we are changing the structure of the data to create a response that meets our needs. I think that belongs in list.

Another way to think about it is that if this viewset had another action, for example if it did also allow a POST, then would this code be going in get_queryset or in list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works, will do.

Copy link
Contributor Author

@chrispreee chrispreee Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will be doing the aggregation on the unfiltered queryset

The aggregation is not performed until the queryset is evaluated. It is lazy until after the filter call in list.

putting it all in list is the "normal" DRF way

I see the DRF architecture as a number of horizontal slices, with data extraction done by the bottom slices, data shaping in the middle, and rendering at the top. My understanding is that the queryset is constructed by get_queryset so that the data is defined only once for the whole viewset. HTTP methods are mid-way between extracting the data 'below', and rendering it 'above'. The RETRIEVE URL currently works. (By mistake, I will remove it!) POST doesn't use get_queryset, as it just runs an INSERT, PATCH does as it gets the object before updating.

return str(obj.wealth_group)


class DictQuerySetField(rest_framework_fields.SerializerMethodField):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a class docstring explaining why it is required / what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do thanks

}

# Add the ordering field if specified
ordering = self.context["request"].query_params.get("ordering")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are hard-coding the ordering parameter rather than using https://www.django-rest-framework.org/api-guide/settings/#ordering_param we need to document it. It would be better to use settings.ORDERING_PARAM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do thanks

The API currently only supports a single slice at a time. For combining please run multiple searches, and add
the desired results to the Compare tab.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful if the docstring included an example response, e.g. a JSON one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

# Filters are automatically created, eg, min_kcals_consumed_percent and max_kcals_consumed_percent.
# If no ordering is specified by the FilterSet, the results are ordered by percent descending in the order here.
aggregates = {
"kcals_consumed": Sum,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is preventing us from adding quantity_produced, quantity_consumed, income, expenditure, etc. - i.e. any field from LivelihoodActivity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing at all, wanted to get the overall design verified before building out loads of functionality.

I do think we want to be selective though, as the bare figures are unlikely to be useful and likely to be misleading, because, eg, kcals_consumed will not work for those LS types such as income that don't store that number - we need to convert cash to a kcals equivalent to incorporate those and produce a useful statistic. I suspect the most useful aggregates will be more use-case driven, for example composite and normalized:

        "kcal_income": Sum(
            (
                F("livelihood_strategies__livelihoodactivity__quantity_purchased")
                + F("livelihood_strategies__livelihoodactivity__quantity_produced")
            )
            * F("livelihood_strategies__product__kcals_per_unit"),
        ),

This will need a small enhancement to work, but again I wanted to get the fundamentals reviewed and prototyped in the UI before getting clever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that based on all the charts, we will need kcals_consumed, income and expenditure for most purposes - and a way to do the "complete" graph that converts income and expenditure to kcals.

@rhunwicks
Copy link
Contributor

Based on discussions in Chat I am merging this now to enable testing, and because it won't have any knock-on impacts, but we will continue to evolve this MR and expect a second MR off the same branch in due course.

@rhunwicks rhunwicks merged commit 81ade5c into main Jan 9, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants