Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove Nested(many=True) to keep only the List(Nested()) form #779

Open
lafrech opened this issue Apr 19, 2018 · 40 comments
Open

RFC: Remove Nested(many=True) to keep only the List(Nested()) form #779

lafrech opened this issue Apr 19, 2018 · 40 comments

Comments

@lafrech
Copy link
Member

lafrech commented Apr 19, 2018

See discussion in #493 (comment).

We have two ways to do the same thing. They sometimes behave differently, which generally means one of them is buggy.

IMHO, and from what I have seen from users around me, it is more natural for a new user to write List(Nested) than Nested(many). Besides, it is consistent with Dict(values=Nested).

I realize this is quite a breaking change, both from user perspective, because Nested(many=True) is probably the most commonly used form, and from developer perspective, because it involves a lot of code changes (but also a lot of code removal, which is nice).

There could be things that can be done with Nested(many=True) but not with List(Nested), because in the latter case, the nested Schema has access to the whole data, while in the former, it is called on each element. Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?

Anyway, since the question was raised in a comment in another issue, I figured I'd give it more visibility.

@seansfkelley
Copy link

Is there any documentation outlining the differences between these two? I find the distinction confusing as well.

@deckar01
Copy link
Member

I don't think there is supposed to be a difference.

@sloria sloria added this to the 3.0 milestone May 29, 2018
@sloria
Copy link
Member

sloria commented May 30, 2018

I think deprecating Nested(many=True) is a good idea. Would someone be up for doing an analysis of the current differences between the two APIs?

@sloria
Copy link
Member

sloria commented May 30, 2018

@lafrech I was thinking of deprecating in 3. As you pointed out, Nested(many=True) is the probably the more common usage, and removing it would break many users' code.

@sloria
Copy link
Member

sloria commented Jun 11, 2018

How would a self-referencing schema work with this? In other words, how would we express

friends = Nested('self', many=True, exclude=('friends',))

@deckar01
Copy link
Member

I would expect friends = List(Nested('self', exclude=('friends',))) to work the same way, but it currently errors. Using a reference to the schema with python 3 works, so I suspect this is just an edge case that hasn't been tested.

@ex5
Copy link

ex5 commented Aug 13, 2018

Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?

I've got production code that relies on the fact that a nested field with many=True and @pre_dump(pass_many=True) on its schema will execute this pre_dump only once.

The benefit is making sure expensive code that is needed to be executed before nested fields can be dumped is not repeated for each element in the fields.List.

I don't really see this as an edge case though, pretty sure doing something to the whole set of related objects before dumping/loading them is a common pattern.

@deckar01
Copy link
Member

Currently a list of nested fields does not propagate the schema context correctly, but nested many does. See #424 and #922.

@lafrech
Copy link
Member Author

lafrech commented Aug 22, 2018

Yep, I just updated the list above.

@deckar01
Copy link
Member

I saw the ref on the issue, but I couldn't figure out where it came from. Thanks.

@rooterkyberian
Copy link
Contributor

@anka-sirota yes it seems like a common use case, but it very well could be done on parent schema or by subclassing List and doing custom pre_dump there. All that is possible without handling a separate many case, which I would I agree with @lafrech, just source of inconsistent in the library.

Looking at bigger picture, that this issue already have been here for couple months now, and there still few bugs before many=True can be complete eliminated from the library, I would suggest just calling it deprecated in marshmallow 3 release.
Such move will:

  • prevent further delay of marshmallow 3
  • make it so that there is more interest in List(Nested) and that it works consistently with Nested(many=True)

@rooterkyberian
Copy link
Contributor

@lafrech another difference to be added to the list #946

@lafrech
Copy link
Member Author

lafrech commented Sep 11, 2018

@rooterkyberian, I was just searching for this thread to reference your issue.

@lafrech
Copy link
Member Author

lafrech commented Dec 4, 2018

All differences listed above are either fixed or on their way to be fixed. Once this is done, we should be able to deprecate Nested(many=True).

@bartelemi
Copy link

bartelemi commented Dec 29, 2018

There is also a difference between Nested(Schema, many=True) and List(Nested(Schema)) when loading partial data. Below tests are showing the difference in behaviour when using both methods:

from marshmallow import Schema
from marshmallow.fields import List, Nested, Str


class Employee(Schema):
    name = Str(required=True)
    title = Str(required=True)


def test_nested_partial_load_with_list():
    class Company(Schema):
        employees = List(Nested(Employee))

    payload = {
        'employees': [{
            'name': 'Arthur',
        }],
    }
    # Partial loading with list type shouldn't generate any errors.
    result = Company().load(payload, partial=True)
    assert result['employees'][0]['name'] == 'Arthur'


def test_nested_partial_load_with_nested_many():
    class Company(Schema):
        employees = Nested(Employee, many=True)

    payload = {
        'employees': [{
            'name': 'Arthur',
        }],
    }
    # Partial loading with Nested(many=True) type shouldn't generate any errors.
    result = Company().load(payload, partial=True)
    assert result['employees'][0]['name'] == 'Arthur'

Partial loading in nested schemas was introduced in version 3.0.0rc1 in this PR: #849. Problem is that when a field is of type List, it is not getting the same kwargs as Nested so it can't ignore missing data. I think it was even mentioned in referenced discussion.

@lafrech
Copy link
Member Author

lafrech commented Dec 29, 2018

Thans @0xpsilocybe, I think this is the object of @sloria's comment in #1066. I shall update the PR to propagate all needed kwargs to Nested.

@timc13
Copy link

timc13 commented Mar 25, 2019

What about the Pluck(many=True) use case ?

@deckar01
Copy link
Member

deckar01 commented Mar 27, 2019

Removing many would allow a few lines of explicit many handling to be removed from Pluck. The rest of the changes would automatically be inherited from Nested.

@sloria
Copy link
Member

sloria commented Jun 15, 2019

@lafrech Is there anything else to be addressed before we can deprecate Nested(many=True)? I think we'll want to investigate #779 (comment) .

In any case, I don't think we should block on this for releasing 3.0. We can always deprecate within the 3.x line.

@fuhrysteve
Copy link
Contributor

@sloria 👍 on depreciating stuff rather than stacking the deck with more BC breaks. I'd much rather have 8 releases with 8 breaking changes than 8 breaking changes in one release...

I don't know about anyone else, but we've got hundreds - maybe even a thousand schemas to migrate. I'd rather take that in strides!

@lafrech
Copy link
Member Author

lafrech commented Jun 17, 2019

@fuhrysteve, about this, please see the conversation in #928.

@deckar01
Copy link
Member

deckar01 commented Jul 17, 2019

#1304 makes dump() ~18% faster for both paths. #1306 makes dump() ~21% faster for List(Nested), which makes the two paths run in effectively the same amount of time (< 1%).

@lafrech
Copy link
Member Author

lafrech commented Jul 17, 2019

Great.

Is there the same kind of gap between Nested(many) and List(Nested) when loading?

@deckar01
Copy link
Member

deckar01 commented Jul 17, 2019

Ya, load()has the same ~35% slowdown when comparing List(Nested) to Nested(many). The same strategy should work there.

Edit: I added the fix for load() in #1306 and confirmed that it makes them run in the same time without slowing down Nested(many).

@jtrakk
Copy link

jtrakk commented Aug 17, 2019

Does it make sense to remove Integer(many=True) and the many argument on other fields too?

@sloria
Copy link
Member

sloria commented Aug 18, 2019

@jtrakk many isn't an argument to other fields.

@jtrakk
Copy link

jtrakk commented Aug 18, 2019

I see. FWIW, I've seen people (including myself) pass Integer(many=True) and been confused when it doesn't work. I know attr.ib() uses the metadata= namespace instead of taking **metadata, to avoid that problem, at the cost of a little extra verbosity. I'm not sure how I feel about the tradeoff, though I lean toward the explicit namespace.

@sloria
Copy link
Member

sloria commented Aug 18, 2019

Yeah, it's a known issue. The reasons we use **kwargs instead of e.g. metadata= are mostly historical. The original decision was that storing kwargs 1) was more concise and 2) saved us from having to come up with an appropriate name... "metadata" didn't seem right because there are use cases where the things your storing aren't really metadata. At this point, it's not worth breaking the API.

Not the best reasons, but I think it's not terrible. We've discussed adding a whitelist of metadata keys in the past, but we decided it wasn't worth the added API surface.

@jtrakk
Copy link

jtrakk commented Aug 18, 2019

At this point, it's not worth breaking the API.

I suppose so.

That being said, 3.0.0 would be a good time to do it, and it wouldn't be hard to write an automatic converter using libCST or similar that'd move all the current **metadata calls into metadata=dict(...).

@sloria
Copy link
Member

sloria commented Aug 18, 2019

@jtrakk It's too late for 3.0 (planning to release the final today 😬 ), but I've opened #1350 for further discussion.

@sloria
Copy link
Member

sloria commented Sep 8, 2019

With this change and the lambda change proposed here, we can also probably deprecate the only, exclude, and unknown parameters of Nested.

# before
albums = fields.Nested(AlbumSchema, many=True, only=('id', 'title', ))
# after (possible now)
albums = field.List(fields.Nested(AlbumSchema(only=('id', 'title'))))

# before
artist = fields.Nested('ArtistSchema', only=('id', ))
# after (proposed)
artist = fields.Nested(lambda: ArtistSchema(only=('id',)))

@deckar01
Copy link
Member

deckar01 commented Feb 3, 2020

While working on #1498, @Meallia uncovered that Nested(many) does not support enabling allow_none for list items. Schema.load() has no concept of allow_none, so it will always implicitly be true.

@ThiefMaster
Copy link

ThiefMaster commented Apr 30, 2021

#1801 may be related to this topic.

But after looking at the code of List it seems like it probably wouldn't be particularly hard to make it use many on the underlying schema instead of calling it in a loop over the items...

@mwgamble
Copy link

Maybe the best solution is to document the differences and otherwise leave it be. It's easy to imagine there are many codebases around the world that depend on one of the two behaviours, given they aren't identical.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests