Feature: access already generated fields in custom generation #41

blagasz · 2022-05-05T14:43:03Z

It would be a nice feature to have access to the already generated field values in a customised generator scenario, e.g.

from typing import Any
from pydantic_factories import ModelFactory

class CustomFactory(ModelFactory[Any]):
    """Tweak the ModelFactory to add our custom mocks."""

    @classmethod
    def get_mock_value(cls, field_type: Any, previous_fields: dict[str, Any]) -> Any:
        """Add our custom mock value."""
        if str(field_type) == "my_dependant_field" and previous_fields["my_relying_on"] == 'special_value':
            return cls._get_faker().date_time_between()

        return super().get_mock_value(field_type)

I could even imagine some decorator or annotation based solution to the same problem, e.g.

class MyFactory(ModelFactory):
  __model__ = MyModel

  @depends('field_relying_on')
  def dependant_field_name(self, current_value: Any, other_value: Any):
    return 'special_value_generated_based_on_context'

Goldziher · 2022-05-05T19:39:34Z

Hi. Well, the issue here is really the ordering of generated fields - if you have a field A that is dependent on field B's value, this will not work out of the box currently. The library iterates the __fields__ property of the pydantic model, which has an order that derives from the order of declaration of the fields. Thus, we cannot know for certain that field B will be processed before field A, and so to implement this will require some thinking on how to do this in a none-hacky way.

So, before this enhancement can proceed we need to have a clear viable way of implementing this in a way that can be developed further. I invite you to add your ideas.

blagasz · 2022-05-05T20:12:22Z

Hi. I totally agree, I was also worried about the order. I think it is enough for all the realistic use cases I can think of that every field could only depend on a preceding one (it is also a good way to avoid circularities as an unwanted side effect). So I think it is completely satisfactory to somehow allow access for already generated fields and only those.

I am still unsure about the right syntax and need to familiarise myself more with the internal workings of the generation process, but I will try to make a v1 PR and we could iterate on that if you agree.

Goldziher · 2022-05-05T20:16:33Z

sure go ahead.

Also, would be happy for some further input @lindycoder

lindycoder · 2022-05-06T01:19:16Z

I do not have a strong opinion on this. I usually try to make data models that don't have interdependent fields because it's always a pain to generate data from.

If one field is generated from the others i would vote for a @property. If it's correlated with others and you're using pydantic i would make validators or root validators enforcing it.... but that doesn't solve the problem of the data generation.

In the case you're describing i would probably hide the factory.build() in a function that will control the input to the .build().

Regarding the feature/implementation:
Pydantic field validators are processed in order and the values you have access while validating are the ones that have been described before. As @Goldziher pointed out, generation is done iterating over __fields__, probably in the same fashion pydantic does its validator. If we could accumulate a dict[str, Any] of generated data and pass this as a parameter to all generative methods that would be work like pydantic, which might make sense to pydantic users.

I think it would be cool if handle_factory_field passed that to Use() and callable field definition, but that would be a backward incompatible change. Unless we check if the function takes args before sending it... kinda like pytest checks for fixtures by arg name. For example, if the callable has a "values" (name of param in validators) param we could supply it.

blagasz · 2022-05-06T07:09:14Z

I completely agree that this use case is best to be avoided. On the other hand it could add a nice extra to the package. Adding this option to Use though seems straightforward I would not go with the arg inspection option to avoid backward incompatible change. I suggest something like:

class MyFactory(ModelFactory):
  __model__ = MyModel

  @generate('special_field')
  def type_specific_generator(self, v, values: dict[str, Any])
     # do the magic (??? provide faker instance to help generation)
    return generated value

It is a second way of doing the same thing which is not ideal, but this way backward incompatibility could be avoided. It should throw and error if the same field has a Use specification as well. Wdyt?

Goldziher · 2022-05-06T07:15:04Z

well, so far we didnt use decorators in this package. its not a strict no, but I would prefer keeping the API logic the same.

How about, instead of modifying Use, we add a new class like Use, Ignore, etc, e.g. something like After or Depends, and then pass the data to it? What we could do is basically run the generation for all fields, but after the values have been generated, pass the result to any field that has After and then take the resulting value as the final value.

blagasz · 2022-05-06T14:48:04Z

I like this, just to be sure, I see something like this in my head:

class MyFactory(ModelFactory):
  __model__ = MyModel

  dependant_field = Depends('field_name or list of names', callable_receiving_dependant_values, args, kwargs)

And the the value returned by the callable would be the final value for dependant_field.

lindycoder · 2022-05-06T14:59:55Z

Just throwing that here

class MyFactory(ModelFactory):
  __model__ = MyModel

  dependant_field = PostGenerated(callable_receiving_dependant_values, *args, **kwargs)

And all post generated fields would be triggered at the end of everything else, in no particular order.

If you introduce the direct dependency, we need to change the iteration of the __fields__ and switch to a dependency tree because then you would need to support depends on depends... seems like complexity i'm not sure we want to support

blagasz · 2022-05-06T15:50:05Z

Fair point, it could even become circular I guess

blagasz · 2022-05-17T15:49:40Z

@Goldziher, @lindycoder, I was away trying to accomplish what triggered this idea in the first place and I realised that it is almost inevitable to do custom post processing over the model factories if I want to consistently generate a complex database with meaningful cross connections in the data. It could be easily accomplished without the suggested feature by iterating through the generated models and directly modifying attributes before persisting them. So I suppose to drop this feature as unnecessary.

lindycoder · 2022-05-17T15:56:56Z

Note that this feature could still be relevant for people that likes immutability and makes their classes frozen=True as you can't change them after build(). Not sure how much of the audience that applies to.

Goldziher · 2022-05-17T16:29:08Z

@Goldziher, @lindycoder, I was away trying to accomplish what triggered this idea in the first place and I realised that it is almost inevitable to do custom post processing over the model factories if I want to consistently generate a complex database with meaningful cross connections in the data. It could be easily accomplished without the suggested feature by iterating through the generated models and directly modifying attributes before persisting them. So I suppose to drop this feature as unnecessary.

Ok, for me simple is better and this feature is borderline too complex, although I'm not opposed to it. Feel free to close this issue unless you intend to contribute it.

blagasz · 2022-05-18T06:10:48Z

I contributed a first version, any feedback is welcome!

Goldziher · 2022-05-19T19:42:56Z

I contributed a first version, any feedback is welcome!

very nice stuff. I reviewed

DiegoPiloni · 2022-05-26T17:57:30Z

Hi, thanks for this library! In my team we are currently using factory-boy and considering to start using this library.

Would this be a good case for SQL relationships ? For example in SQLModel:

class Hero(SQLModel, table=True):
    ...
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")
    team: Optional[Team] = Relationship(back_populates="heroes")

How would we make sure that team_id and team.id are both the same?

In factory-boy this could be accomplished with SubFactory and SelfAttribute, something like this:

class HeroFactory(SQLAlchemyModelFactory):
   ...
   team_id = FuzzyInteger()
   team = SubFactory(TeamFactory, team_id=SelfAttribute("..team_id")

Does this issue intends to support the same use cases? Is there a feature like SubFactory in pydantic_factories, where we can override certain fields of nested models ?

Goldziher · 2022-05-27T14:57:06Z

Hi, thanks for this library! In my team we are currently using factory-boy and considering to start using this library.

Would this be a good case for SQL relationships ? For example in SQLModel:
class Hero(SQLModel, table=True):
    ...
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")
    team: Optional[Team] = Relationship(back_populates="heroes")
How would we make sure that team_id and team.id are both the same?

In factory-boy this could be accomplished with SubFactory and SelfAttribute, something like this:
class HeroFactory(SQLAlchemyModelFactory):
   ...
   team_id = FuzzyInteger()
   team = SubFactory(TeamFactory, team_id=SelfAttribute("..team_id")
Does this issue intends to support the same use cases? Is there a feature like SubFactory in pydantic_factories, where we can override certain fields of nested models ?

Hi there,

You could certainly use it like so I think. Please open a separate issue after this feature is released if it does not satisfy your requirements.

Goldziher · 2022-05-27T16:13:01Z

This feature has been released in v1.3.0 @blagasz. Thanks for your contribution!

Goldziher added the enhancement New feature or request label May 5, 2022

blagasz mentioned this issue May 18, 2022

Feature/post generate #42

Merged

Goldziher closed this as completed May 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: access already generated fields in custom generation #41

Feature: access already generated fields in custom generation #41

blagasz commented May 5, 2022

Goldziher commented May 5, 2022

blagasz commented May 5, 2022

Goldziher commented May 5, 2022

lindycoder commented May 6, 2022 •

edited

blagasz commented May 6, 2022

Goldziher commented May 6, 2022

blagasz commented May 6, 2022

lindycoder commented May 6, 2022

blagasz commented May 6, 2022

blagasz commented May 17, 2022

lindycoder commented May 17, 2022 •

edited

Goldziher commented May 17, 2022

blagasz commented May 18, 2022

Goldziher commented May 19, 2022

DiegoPiloni commented May 26, 2022

Goldziher commented May 27, 2022

Goldziher commented May 27, 2022

Feature: access already generated fields in custom generation #41

Feature: access already generated fields in custom generation #41

Comments

blagasz commented May 5, 2022

Goldziher commented May 5, 2022

blagasz commented May 5, 2022

Goldziher commented May 5, 2022

lindycoder commented May 6, 2022 • edited

blagasz commented May 6, 2022

Goldziher commented May 6, 2022

blagasz commented May 6, 2022

lindycoder commented May 6, 2022

blagasz commented May 6, 2022

blagasz commented May 17, 2022

lindycoder commented May 17, 2022 • edited

Goldziher commented May 17, 2022

blagasz commented May 18, 2022

Goldziher commented May 19, 2022

DiegoPiloni commented May 26, 2022

Goldziher commented May 27, 2022

Goldziher commented May 27, 2022

lindycoder commented May 6, 2022 •

edited

lindycoder commented May 17, 2022 •

edited