-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve logging when skipping values or entities in mappings #1425
Conversation
This is a small change that allows callers of the `unsafe_add` method to determine if a value has been removed as part of value cleaning. This is helpful to be able to log/warn when values are removed.
Right now, mappings silently discard unclean values. This makes it difficult to debug mapping issues during development as well as to monitor mappings running in production.
prop.map(proxy, record, entities) | ||
discarded_values = prop.map(proxy, record, entities) | ||
for value in discarded_values: | ||
log.warn(f"[{self.name}] Discarded unclean value \"{value}\" for property \"{prop.prop.qname}\".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pudo You suggested also logging the proxy ID here. I can see how this can be useful, especially when the mapping uses id_column
. I just don’t know how to implement this in a smart way as the proxy ID hasn’t been computed yet at this point (and it cannot be computed earlier as that requires knowing the property values).
Only option I see is not to log the warning message directly, but cache them until the ID has been computed, something like this:
discarded_values: dict[Property, set[value]] = defaultdict(set)
for prop in self.properties:
if prop.prop.type == registry:country:
# Save discarded values
discarded_values = prop.map(proxy, record, entities)
discarded_values[prop].update(discarded_values)
for prop in self.properties:
if prop.prop.type == registry:country:
# Save discarded values
discarded_values = prop.map(proxy, record, entities)
discarded_values[prop].update(discarded_values)
proxy.id = self.compute_key(record)
if proxy.id is None:
if self.id_column:
log.warn(...)
if self.keys:
log.warn(...)
return None
if discarded_values:
# Finally about discarded values here
for prop, values in discarded_values.items():
log.warn(...)
But that’s a lot of code and indirection for a simple log message… Any good ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's too much faff in the core loop of the mapper. Let's leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be OK to merge this and make a minor version release?
prop.map(proxy, record, entities) | ||
discarded_values = prop.map(proxy, record, entities) | ||
for value in discarded_values: | ||
log.warn(f"[{self.name}] Discarded unclean value \"{value}\" for property \"{prop.prop.qname}\".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's too much faff in the core loop of the mapper. Let's leave it.
@pudo Thanks for taking another look, just merged the PR! :) |
This is a follow-up to #1404. This PR only includes the changes that improve logging as I agree that completely disabling value cleaning can be dangerous.