Design a mechanism to parse Personal Data Fields (PDF) from API responses for Cardholders using pydantic #1

Closed
devraj opened this issue Feb 25, 2023 · 5 comments
Assignees
Labels
documentation Improvements or additions to documentation pending-release Pending release for the next version testing Requires completing test cases

Comments

@devraj
Member

devraj commented Feb 25, 2023

Gallagher's REST APIs return configurable fields (per instance of the server) called Personal Data Fields (PDFs). These are additional fields that are relevant to the organisation and are not common to every installation. The keys of these fields are always prefixed with an @ symbol, e.g. @Student ID in this example from their docs:

{
  "href": "https://localhost:8904/api/cardholders/325",
  "id": "325",
  "firstName": "Algernon",
  "lastName": "Boothroyd",
  "shortName": "Q",
  "description": "Quartermaster",
  "authorised": true,
  "lastSuccessfulAccessTime": "2004-11-18T19:21:52Z",
  "lastSuccessfulAccessZone": {
    "href": "https://localhost:8904/api/access_zones/333",
    "name": "Twilight zone"
  },
  "serverDisplayName": "ruatoria.satellite.int",
  "division": {
    "href": "https://localhost:8904/api/divisions/2"
  },
  "@Student ID": "8904640"
}

We are using pydantic schemas to parse the API response. The aim would be to use event-based methods to dynamically parse these fields and make them available as a dictionary:

cardholder.pdfs["StudentId"]

for the API user to access these fields in a Pythonic manner.

Note that the response has a field called personalDataDefinitions which contains references to each one of the PDF definitions.
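
A minimal sketch of the splitting step, independent of pydantic: it separates the `@`-prefixed keys from the fixed fields of a cardholder payload. The function name `split_pdfs` is hypothetical and not part of the library, and stripping the leading `@` from the key is an assumption about how the dictionary might be keyed.

```python
from typing import Any


def split_pdfs(payload: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate fixed fields from Personal Data Fields (keys starting with '@')."""
    fixed: dict[str, Any] = {}
    pdfs: dict[str, Any] = {}
    for key, value in payload.items():
        if key.startswith("@"):
            # Drop only the leading '@'; the rest of the name is kept verbatim
            pdfs[key[1:]] = value
        else:
            fixed[key] = value
    return fixed, pdfs


# Trimmed version of the sample response above
payload = {
    "id": "325",
    "firstName": "Algernon",
    "@Student ID": "8904640",
}
fixed, pdfs = split_pdfs(payload)
print(pdfs["Student ID"])
```

Keeping the original name (spaces included) avoids guessing a naming convention at this stage; any mapping to Python-friendly names can be layered on top.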

@devraj devraj self-assigned this Feb 25, 2023
@devraj
Member Author

devraj commented Jan 21, 2024

I would really like a very pydantic solution to this, and hence have started a discussion on the pydantic GitHub project.

Once resolved, this should be merged into our documentation.

@devraj
Member Author

devraj commented Jan 21, 2024

For the purposes of the Gallagher API, it should be noted that

https://commandcentre-api-au.security.gallagher.cloud/api/personal_data_fields/

returns a valid set of Personal Data Fields for the command centre we are communicating with. I propose that we call this endpoint, memoize the response (#14) and use it as part of the validation process to ensure that unexpected keys don't pass through validation.
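
A rough sketch of the memoize-then-validate idea, using only the standard library. The class `PdfNameCache`, the injected `fetch` callable and the assumption that the endpoint can be reduced to a collection of `@`-prefixed names are all hypothetical; the real response shape is not shown here.

```python
from typing import Callable, Iterable, Optional


class PdfNameCache:
    """Fetch the PDF names once and reuse them for later validation."""

    def __init__(self, fetch: Callable[[], Iterable[str]]) -> None:
        self._fetch = fetch
        self._names: Optional[frozenset[str]] = None

    def names(self) -> frozenset[str]:
        # The fetch callable runs on first use only; later calls hit the cache
        if self._names is None:
            self._names = frozenset(self._fetch())
        return self._names

    def unexpected_keys(self, payload: dict) -> set[str]:
        """Return '@' keys in the payload that the server did not advertise."""
        return {k for k in payload if k.startswith("@") and k not in self.names()}


calls = 0


def fake_fetch() -> list[str]:
    # Stand-in for the HTTP call to the personal_data_fields endpoint
    global calls
    calls += 1
    return ["@Student ID", "@Email Address"]


cache = PdfNameCache(fake_fetch)
cache.names()
unexpected = cache.unexpected_keys({"id": "325", "@Student ID": "1", "@Unknown": "x"})
print(unexpected, calls)
```

Injecting the fetch function keeps the cache testable without a live command centre.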

devraj added a commit that referenced this issue Jan 21, 2024
in line with completing the parsing portion of the library we need to
address the pdf data which is dynamically discovered from the command
centre, REFS #1

we have started a discussion around this on the pydantic repo
pydantic/pydantic#8596

and must implement the most pydantic version of the solution
devraj added a commit that referenced this issue Jan 22, 2024
doing some ground work towards #1 where we need pdf fields to be
auto discovered for the dynamically parseable feature to work properly

this commit parses the basic list call for PdfDefinitions, detail and
other preloading functions to be added post this commit
@devraj devraj added this to the alpha-3 milestone Apr 14, 2024
@devraj
Member Author

devraj commented May 1, 2024

Doing a bit of research I found the Extras option in pydantic, which allows fields other than the ones defined in the model to be populated. You can allow extras by modifying the model configuration and validate them by providing a custom validator:

from pydantic import BaseModel, Extra, Field, ValidationError, root_validator

class ExtendedModel(BaseModel):
    name: str = Field(..., description="The name of the entity")
    value: int = Field(..., description="The value associated with the entity")

    @root_validator(pre=True)
    def validate_extras(cls, values):
        known_fields = {"name", "value"}

        # Validate all extra fields, or apply specific rules
        for key in values:
            if key not in known_fields:
                # Example: ensuring extra fields start with "extra_"
                if not key.startswith("extra_"):
                    # Raise ValueError; pydantic wraps it in a ValidationError
                    raise ValueError(f"Extra field '{key}' is not allowed.")

        return values

    class Config:
        # Allow extra fields to be passed
        extra = Extra.allow

example usage:

# Example usage
try:
    model = ExtendedModel(name="example", value=42, extra_field="some extra data")
    print(model.dict())
except ValidationError as e:
    print(e)

Another approach would be to store a dictionary of items and then provide a getter function to fetch values using a key:

from pydantic import BaseModel, Field, PrivateAttr

class DynamicModel(BaseModel):
    name: str = Field(..., description="The name of the entity")
    value: int = Field(..., description="The value associated with the entity")

    # Store additional fields in a private dictionary
    _extra: dict = PrivateAttr(default_factory=dict)

    def __init__(self, **kwargs):
        # Separate known fields from extra fields
        known_fields = {field: kwargs.pop(field) for field in self.__fields__ if field in kwargs}
        extra_fields = kwargs

        # Set known fields using the base constructor
        super().__init__(**known_fields)

        # Store extra fields
        self._extra = extra_fields

    def get_extra(self, key):
        return self._extra.get(key)

# Example usage
data = {
    "name": "example",
    "value": 42,
    "additional_field": "some extra data"
}

model = DynamicModel(**data)
print(model.name)  # Outputs: example
print(model.value)  # Outputs: 42
print(model.get_extra("additional_field"))  # Outputs: some extra data

This could be more appropriate for our use case as the keys would be dynamically discovered from the API and are configurable by the administrator of the system.

Note: we should post our approach on the pydantic discussion for others to benefit from.

devraj added a commit that referenced this issue May 8, 2024
hrefs were strings, pydantic offers HttpUrl which is way more useful
to validate that the urls are returned as fully formed urls

note that the optional href is also configured in the same way.

this commit also enables extra fields for pydantic ahead of implementing
the PDF ticket #1
@devraj devraj added the testing Requires completing test cases label May 11, 2024
@devraj
Member Author

devraj commented Jun 10, 2024

Having implemented a few underlying features ahead of finishing this off, I think we should tackle this in the following order:

  • Parse and test the personalDataDefinitions property that defines the details of the PDFs
  • Outline a formal syntax for accessing PDFs (possibly object.pdf.attribute_name)
  • Implement a custom getter that is the least error-prone
  • Document the design and how a user accesses these variables.
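
One way the custom getter could look, sketched with plain Python. `PdfAccessor` is a hypothetical name, and it assumes the PDF keys have already been converted to valid attribute names; the point is only that a mistyped name fails loudly and lists what is available.

```python
class PdfAccessor:
    """Read-only attribute access over a dict of PDF values."""

    def __init__(self, values: dict[str, object]) -> None:
        self._values = dict(values)

    def __getattr__(self, name: str) -> object:
        # Only invoked when normal lookup fails; read via __dict__ so a
        # missing '_values' cannot trigger infinite recursion
        values = self.__dict__.get("_values", {})
        if name in values:
            return values[name]
        available = ", ".join(sorted(values)) or "none"
        raise AttributeError(f"No PDF named {name!r}; available: {available}")

    def __dir__(self) -> list[str]:
        # Expose PDF names to dir() and tab completion
        return sorted(set(super().__dir__()) | set(self._values))


pdf = PdfAccessor({"student_id": "8904640"})
print(pdf.student_id)

try:
    pdf.studnet_id  # deliberate typo
except AttributeError as exc:
    message = str(exc)
    print(message)
```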

@devraj devraj added the documentation Improvements or additions to documentation label Jun 10, 2024
devraj added a commit that referenced this issue Jun 14, 2024
personal data fields are dynamically generated based on the definitions
found on the command centre instance, this is a special case in terms of
parsing responses from the server as the keys of these responses are not
dynamically populated, this commit moves to using an enumeration to
ensure that the value for the `type` is one of what the server should
send

refs #1
devraj added a commit that referenced this issue Jun 14, 2024
brings the cardholder detail model closer to completion with access
groups being parsed as per the definition

refs #1
devraj added a commit that referenced this issue Jun 15, 2024
just as we do with the discovery message, we cache the pdf fields
available on the server onto the capabilities constant, this can be
used to parse models that use the personal data fields

refs #1
devraj added a commit that referenced this issue Jun 15, 2024
…rsers

pdfs are dynamically populated from the data and schema that is sent
for the customer detail object, this is a first attempt to see if
we can parse the pdf items using model_validate.

WARNING: nothing here may work and we may end up changing the entire
direction of this implementation

refs #1
devraj added a commit that referenced this issue Jun 15, 2024
i've just realised that the customer detail response sends the relevant
pdf fields as part of the personalDataDefinitions field, if we can
parse this and make sense of it then the keys to the dynamic fields will
exist in the response and we should not read this from the cache.

moreover it would make sense that the personalDataDefinitions is the set
of keys that are part of the customer response i.e not every field is
available or is in use all the time.

changing tactic here, and going to try and parse the
personalDataDefinitions first and then populate the keys from there

refs #1
devraj added a commit that referenced this issue Jun 15, 2024
personalDataDefinition field parsing working given refactoring of the
dictionary of sorts and allowing pydantic to make sense of what the
server sends back to the api client

the parser fails if the value key is missing from a payload which seems
to be the case in some items in the sample data for the command centre

refs #1
@devraj
Member Author

devraj commented Jun 16, 2024

First a note that the examples I posted last month are specific to the way Pydantic v1 works.
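
For comparison, a rough Pydantic v2 counterpart of the extras example might look like the sketch below. It assumes Pydantic 2.x is installed; the `extra_` prefix rule is carried over from the v1 example purely for illustration and is not how PDFs are named.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, ValidationError, model_validator


class ExtendedModel(BaseModel):
    # v2 replaces the inner `Config` class with `model_config`
    model_config = ConfigDict(extra="allow")

    name: str
    value: int

    # v2 replaces `root_validator(pre=True)` with a "before" model validator
    @model_validator(mode="before")
    @classmethod
    def validate_extras(cls, data: Any) -> Any:
        if isinstance(data, dict):
            for key in data:
                if key not in cls.model_fields and not key.startswith("extra_"):
                    # pydantic converts this into a ValidationError
                    raise ValueError(f"Extra field '{key}' is not allowed.")
        return data


model = ExtendedModel(name="example", value=42, extra_field="some extra data")
print(model.model_extra)  # extras are collected separately from declared fields

try:
    ExtendedModel(name="example", value=42, other="rejected")
    rejected = False
except ValidationError:
    rejected = True
```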

Second, I realised that the personalDataFields send back a value field which contains the same value as appears in the top-level dictionary, i.e.:

personalDataDefinitions.@Key.value maps to object.@Key

Note the peculiar structure: the personalDataDefinitions list holds objects, each with a dynamic key, and the value that key leads to is an object with the metadata and the value of the field for that cardholder.

Initially I was going down a path of implementing a cache of the personalDataFields which is unnecessary as the cardholder detail response will always return what we need. Hence it makes sense to parse the personalDataDefinitions (see commit) and then make the top level keys accessible as aliases.
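
The structure described above can be illustrated with a small standard-library sketch. The payload is hypothetical: the `definition` and `name` metadata keys are placeholders, and only the dynamic `@Key` wrapper and the optional `value` key come from the description here. `flatten_definitions` is an illustrative helper, not a library function.

```python
from typing import Any, Optional


def flatten_definitions(
    definitions: list[dict[str, dict[str, Any]]],
) -> dict[str, Optional[Any]]:
    """Map each '@Key' to its value, or None when the server omits 'value'."""
    flat: dict[str, Optional[Any]] = {}
    for entry in definitions:
        # Each list item is an object keyed by the dynamic '@Key'
        for key, detail in entry.items():
            flat[key] = detail.get("value")
    return flat


# Hypothetical payload: the 'definition' metadata keys are illustrative only
cardholder = {
    "firstName": "Algernon",
    "@Student ID": "8904640",
    "personalDataDefinitions": [
        {"@Student ID": {"definition": {"name": "Student ID"}, "value": "8904640"}},
        {"@Photo": {"definition": {"name": "Photo"}}},  # no 'value' key
    ],
}

flat = flatten_definitions(cardholder["personalDataDefinitions"])
# Keys whose nested value disagrees with the top-level '@Key' entry
mismatched = [k for k, v in flat.items() if v is not None and cardholder.get(k) != v]
print(flat, mismatched)
```

Using `.get("value")` tolerates definitions that arrive without a value, which some sample data appears to do.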

devraj added a commit that referenced this issue Jun 21, 2024
personal_data_definitions has the validated fields accessible as pydantic objects
this makes the same values accessible as pythonic fields so the developer's code
can look a little more idiomatic

i.e @Email Address can now be accessed as .pdf.email_address

refs #1
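
The exact naming rule is not spelled out in the commit message. A plausible sketch of turning a PDF key into an attribute name follows; `to_attribute_name` and the `pdf_` fallback prefix are assumptions for illustration, not the library's implementation.

```python
import keyword
import re


def to_attribute_name(key: str) -> str:
    """Convert a PDF key such as '@Email Address' into a snake_case name."""
    name = key.lstrip("@").strip().lower()
    # Collapse every run of characters outside a-z and 0-9 into one
    # underscore (lossy for non-ASCII names)
    name = re.sub(r"[^0-9a-z]+", "_", name).strip("_")
    # Prefix names that Python would reject as attributes
    if not name.isidentifier() or keyword.iskeyword(name):
        name = f"pdf_{name}"
    return name


for key in ("@Email Address", "@Student ID", "@2nd Phone", "@Class"):
    print(key, "->", to_attribute_name(key))
```

Two different keys can normalise to the same name (for example `@Student ID` and `@Student-ID`), so a real implementation would need a collision policy.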
@devraj devraj added the pending-release Pending release for the next version label Jun 26, 2024
@devraj devraj closed this as completed Jun 29, 2024