-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create ArchiveField
#5
Conversation
Porting codecov/codecov-api-archive#1609 to the new repo.
616f5cb
to
b54945b
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #5 +/- ##
==========================================
- Coverage 95.20% 95.16% -0.05%
==========================================
Files 551 552 +1
Lines 13895 13958 +63
==========================================
+ Hits 13229 13283 +54
- Misses 666 675 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Codecov Report
@@ Coverage Diff @@
## main #5 +/- ##
=======================================
- Coverage 95.21 95.16 -0.05
=======================================
Files 552 552
Lines 13916 13958 +42
=======================================
+ Hits 13250 13283 +33
- Misses 666 675 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very slick, great work! requesting change for a couple things that should be small to fix
self.report.commit.repository.repoid in report_builder_repo_ids | ||
) | ||
return master_write_switch and ( | ||
is_codecov_repo or is_in_allowed_repos or not only_codecov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so while only_codecov
is true, and repo of ours, or a repo in a configured list, will use the new behavior. once only_codecov
is false, those factors don't matter at all - everybody should get the new behavior. that's what you intend, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If it's not only for Codecov then it's for everyone.
The is_in_allowed_repos
is an extra factor that allow us to shift the behavior only for selected customers.
utils/model_utils.py
Outdated
def __set__(self, obj, value): | ||
self._get_value_from_archive.cache_clear() | ||
# Set the new value | ||
if self.should_write_to_storage_fn(obj): | ||
repository = obj.get_repository() | ||
archive_service = ArchiveService(repository=repository) | ||
path = archive_service.write_json_data_to_storage( | ||
commit_id=obj.get_commitid(), | ||
table=self.table_name, | ||
field=self.public_name, | ||
external_id=obj.external_id, | ||
data=value, | ||
) | ||
setattr(obj, self.archive_field_name, path) | ||
setattr(obj, self.db_field_name, None) | ||
else: | ||
setattr(obj, self.db_field_name, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a potential race condition here. we clear the cache, and then while we're doing should_write_to_storage()
or instantiating the archive service or whatever, a call to __get__()
starts and finishes, setting the cache to the old value before we can update it. also we clear the cache even if we shouldn't be saving to storage, which is unnecessary
i think you can just move the self._get_value_from_archive.cache_clear()
call to right after you update the URLField
to fix
separately: do you know if we can/should delete whatever was stored in the storage path after we upload a new thing? we don't appear to save the path anywhere, so those uploads are orphaned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can delete, probably... yeah they are orphaned, good point.
files_array = ArchiveField( | ||
should_write_to_storage_fn=should_write_to_storage, | ||
default_value=[], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to get rid of the _files_array
and _files_array_storage_path
members on ReportDetails
and let the ArchiveField
own them? it doesn't look like they're referenced anywhere apart from the getattr
/setattr
calls in ArchiveField
, so we may as well make ArchiveField
encapsulate them. and if we do, then we can replace the getattr(obj, field_name)
/setattr(obj, field_name, val)
with straightforward self.db_field
and self.archive_field
so like
files_array = ArchiveField(
should_write_to_storage_fn=should_write_to_storage,
db_field=ArrayField(models.JSONField(), db_column="files_array", null=True),
archive_field=URLField(db_column="files_array_storage_path", null=True),
default_value=[],
)
i think in ArchiveField
the type-hint for db_field
would be the base class Field
, but archive_field
is always going to be a URLField
i think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concerns about that are the following:
- I think preserving the migrations working as expected and automagically as it's been up until now is more beneficial than having the extra fields encapsulated. Even if we rarely change these things, for Django to understand the model as it really is seems to be a good idea, given that we rely on Django quite a lot.
- It forces you to always go through the
ArchiveField
. I think in the future, once everything is being saved in GCS, having theArchiveField
is just extra stuff not needed, and we can drop it eventually. Or we can start to use the storage field directly and make optimizations around that (e.g. not save to storage immediately every time, but make it only in thewrap_db_session
).
I think I prefer to see ArchiveField
as an util, an extra. What you are proposing is to make it an integral part of the model.
if you really feel strongly about making this change I will make it happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suspect removing those fields from the model will mess with the migrations. Maybe there's some metaprogramming faniciness you could do there to define the fields on the model class from within ArchiveField
but I'm not sure off-hand. If that's straightforward enough maybe worth a shot but otherwise I think they'll have to remain on the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we can eventually do (but not right now, please) is create a custom field for the storage portion that handles the storing (and under the hood is just a URL field).
Then we drop the other field
(but not right now, please. A ton of work. Let's leave it for a next iteration)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not messing with the magic migrations is a great point, thanks for explaining!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oop i didn't hit the request button last time
Moving the cache cleaning to after we update the field to prevent race conditions in which the cache is left in the old state. Also adding deletion of orphaned data that was saved in GCS, in the event that the storage path changes. Also moving the table name calculation to when we save the data in GCS, which is the only place we actually need it.
These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5
These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5
Porting https://github.com/codecov/codecov-api-archive/pull/1609 to the new repo.
Purpose/Motivation
What is the feature? Why is this being done?
Links to relevant tickets
What does this PR do?
Include a brief description of the changes in this PR. Bullet points are your friend.
Notes to Reviewer
Anything to note to the team? Any tips on how to review, or where to start?
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.