Create `ArchiveField` #5

giovanni-guidini · 2023-07-17T20:51:21Z

Porting https://github.com/codecov/codecov-api-archive/pull/1609 to the new repo.

Purpose/Motivation

What is the feature? Why is this being done?

Links to relevant tickets

What does this PR do?

Include a brief description of the changes in this PR. Bullet points are your friend.

Notes to Reviewer

Anything to note to the team? Any tips on how to review, or where to start?

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

codecov-staging · 2023-07-17T21:13:11Z

Codecov Report

Patch coverage: 87.34% and project coverage change: -0.05 ⚠️

Comparison is base (c7287ff) 95.20% compared to head (4a8fef4) 95.16%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
- Coverage   95.20%   95.16%   -0.05%     
==========================================
  Files         551      552       +1     
  Lines       13895    13958      +63     
==========================================
+ Hits        13229    13283      +54     
- Misses        666      675       +9

Flag	Coverage Δ
unit	`95.16% <87.34%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
reports/models.py	`92.66% <50.00%> (-7.34%)`	⬇️
utils/model_utils.py	`96.36% <96.36%> (ø)`
services/archive.py	`75.82% <100.00%> (+3.53%)`	⬆️

codecov · 2023-07-17T21:13:29Z

Codecov Report

Merging #5 (e2aa8a1) into main (9cffefd) will decrease coverage by 0.05%.
The diff coverage is 87.34%.

❗ Current head e2aa8a1 differs from pull request most recent head d35709a. Consider uploading reports for the commit d35709a to get more accurate results

@@           Coverage Diff           @@
##            main      #5     +/-   ##
=======================================
- Coverage   95.21   95.16   -0.05     
=======================================
  Files        552     552             
  Lines      13916   13958     +42     
=======================================
+ Hits       13250   13283     +33     
- Misses       666     675      +9

Flag	Coverage Δ
unit	`95.16% <87.34%> (-0.06%)`	⬇️
unit-latest-uploader	`95.16% <87.34%> (-0.06%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
reports/models.py	`92.66% <50.00%> (-7.34%)`	⬇️
utils/model_utils.py	`96.36% <96.36%> (ø)`
services/archive.py	`75.82% <100.00%> (+3.53%)`	⬆️

... and 3 files with indirect coverage changes

matt-codecov

very slick, great work! requesting change for a couple things that should be small to fix

matt-codecov · 2023-07-17T22:35:55Z

reports/models.py

+            self.report.commit.repository.repoid in report_builder_repo_ids
+        )
+        return master_write_switch and (
+            is_codecov_repo or is_in_allowed_repos or not only_codecov


so while only_codecov is true, any repo of ours, or a repo in a configured list, will use the new behavior. once only_codecov is false, those factors don't matter at all - everybody should get the new behavior. that's what you intend, right?

Yes. If it's not only for Codecov then it's for everyone.
The is_in_allowed_repos check is an extra factor that allows us to shift the behavior only for selected customers.
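The gating rule agreed on above can be sketched as a pure function. This is a minimal illustration, not the code from the PR: in the diff the flags are computed inside a method on the model, whereas here they are passed in as parameters (an assumption made so the example is self-contained), and the function name is hypothetical.

```python
def should_write_to_storage_sketch(
    master_write_switch: bool,
    only_codecov: bool,
    is_codecov_repo: bool,
    is_in_allowed_repos: bool,
) -> bool:
    """Hypothetical stand-alone version of the condition shown in the diff."""
    # The master switch gates everything. Past that, a repo qualifies if it is
    # a Codecov repo, is in the configured allow-list, or the rollout is no
    # longer restricted to Codecov.
    return master_write_switch and (
        is_codecov_repo or is_in_allowed_repos or not only_codecov
    )
```

With `only_codecov` set to `False`, the two repo checks stop mattering and every repo gets the new behavior, provided the master switch is on.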

matt-codecov · 2023-07-17T23:25:52Z

utils/model_utils.py

+    def __set__(self, obj, value):
+        self._get_value_from_archive.cache_clear()
+        # Set the new value
+        if self.should_write_to_storage_fn(obj):
+            repository = obj.get_repository()
+            archive_service = ArchiveService(repository=repository)
+            path = archive_service.write_json_data_to_storage(
+                commit_id=obj.get_commitid(),
+                table=self.table_name,
+                field=self.public_name,
+                external_id=obj.external_id,
+                data=value,
+            )
+            setattr(obj, self.archive_field_name, path)
+            setattr(obj, self.db_field_name, None)
+        else:
+            setattr(obj, self.db_field_name, value)


there's a potential race condition here. we clear the cache, and then while we're doing should_write_to_storage() or instantiating the archive service or whatever, a call to __get__() starts and finishes, setting the cache to the old value before we can update it. also we clear the cache even if we shouldn't be saving to storage, which is unnecessary

i think you can just move the self._get_value_from_archive.cache_clear() call to right after you update the URLField to fix
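A rough sketch of the suggested ordering, assuming a plain Python descriptor with an `lru_cache`-wrapped reader. `InMemoryArchive`, `ArchiveFieldSketch`, and `ReportDetailsSketch` are hypothetical stand-ins for `ArchiveService`, `ArchiveField`, and the model; none of this is the PR's actual code, and it does not claim to be fully thread-safe.

```python
from functools import lru_cache


class InMemoryArchive:
    """Hypothetical stand-in for ArchiveService; keeps blobs in a dict."""

    def __init__(self):
        self.blobs = {}

    def write(self, data):
        path = f"mem://{len(self.blobs)}"
        self.blobs[path] = data
        return path

    def read(self, path):
        return self.blobs[path]


class ArchiveFieldSketch:
    def __init__(self, archive, should_write_to_storage_fn, default_value=None):
        self.archive = archive
        self.should_write_to_storage_fn = should_write_to_storage_fn
        self.default_value = default_value
        # One cache per descriptor, keyed on the owning object.
        self._get_value_from_archive = lru_cache(maxsize=1)(self._read_from_archive)

    def __set_name__(self, owner, name):
        self.db_field_name = f"_{name}"
        self.archive_field_name = f"_{name}_storage_path"

    def _read_from_archive(self, obj):
        path = getattr(obj, self.archive_field_name, None)
        return self.default_value if path is None else self.archive.read(path)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        db_value = getattr(obj, self.db_field_name, None)
        if db_value is not None:
            return db_value
        return self._get_value_from_archive(obj)

    def __set__(self, obj, value):
        if self.should_write_to_storage_fn(obj):
            path = self.archive.write(value)
            setattr(obj, self.archive_field_name, path)
            setattr(obj, self.db_field_name, None)
            # Clear the cache only after the storage path points at the new
            # data, so a concurrent read cannot re-cache the old value here.
            self._get_value_from_archive.cache_clear()
        else:
            # Nothing was written to storage, so the cache is left alone.
            setattr(obj, self.db_field_name, value)


class ReportDetailsSketch:
    files_array = ArchiveFieldSketch(
        InMemoryArchive(), lambda obj: True, default_value=[]
    )
```

Reading `files_array` after two consecutive assignments returns the second value, because the cache is dropped once the path has been updated.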

separately: do you know if we can/should delete whatever was stored in the storage path after we upload a new thing? we don't appear to save the path anywhere, so those uploads are orphaned

I think we can delete, probably... yeah they are orphaned, good point.

matt-codecov · 2023-07-17T23:40:16Z

reports/models.py

+    files_array = ArchiveField(
+        should_write_to_storage_fn=should_write_to_storage,
+        default_value=[],
+    )


is it possible to get rid of the _files_array and _files_array_storage_path members on ReportDetails and let the ArchiveField own them? it doesn't look like they're referenced anywhere apart from the getattr/setattr calls in ArchiveField, so we may as well make ArchiveField encapsulate them. and if we do, then we can replace the getattr(obj, field_name)/setattr(obj, field_name, val) with straightforward self.db_field and self.archive_field

so like

files_array = ArchiveField(
    should_write_to_storage_fn=should_write_to_storage,
    db_field=ArrayField(models.JSONField(), db_column="files_array", null=True),
    archive_field=URLField(db_column="files_array_storage_path", null=True),
    default_value=[],
)

i think in ArchiveField the type-hint for db_field would be the base class Field, but archive_field is always going to be a URLField i think?

My concerns about that are the following:

I think preserving the migrations working as expected and automagically as it's been up until now is more beneficial than having the extra fields encapsulated. Even if we rarely change these things, for Django to understand the model as it really is seems to be a good idea, given that we rely on Django quite a lot.

It forces you to always go through the ArchiveField. I think in the future, once everything is being saved in GCS, having the ArchiveField is just extra stuff not needed, and we can drop it eventually. Or we can start to use the storage field directly and make optimizations around that (e.g. not save to storage immediately every time, but make it only in the wrap_db_session).

I think I prefer to see ArchiveField as a util, an extra. What you are proposing is to make it an integral part of the model.

if you really feel strongly about making this change I will make it happen.

Yeah, I suspect removing those fields from the model will mess with the migrations. Maybe there's some metaprogramming fanciness you could do there to define the fields on the model class from within ArchiveField, but I'm not sure off-hand. If that's straightforward enough, it may be worth a shot, but otherwise I think they'll have to remain on the model.

One thing we can eventually do (but not right now, please) is create a custom field for the storage portion that handles the storing (and under the hood is just a URL field).

Then we drop the other field

(but not right now, please. A ton of work. Let's leave it for a next iteration)

not messing with the magic migrations is a great point, thanks for explaining!

matt-codecov

oop i didn't hit the request button last time

Moving the cache cleaning to after we update the field to prevent race conditions in which the cache is left in the old state. Also adding deletion of orphaned data that was saved in GCS, in the event that the storage path changes. Also moving the table name calculation to when we save the data in GCS, which is the only place we actually need it.
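The orphan cleanup described above might look roughly like the following. This is a sketch under stated assumptions: storage is modeled as a plain dict, and `write_and_cleanup` is a hypothetical helper, not a function from the worker or api codebase.

```python
from typing import Any, Dict, Optional


def write_and_cleanup(
    storage: Dict[str, Any],
    old_path: Optional[str],
    new_path: str,
    data: Any,
) -> str:
    """Hypothetical helper: store data, then drop the blob it replaced."""
    storage[new_path] = data
    # Delete the previous blob only when the path actually changed; otherwise
    # the write above has already overwritten it in place.
    if old_path is not None and old_path != new_path:
        storage.pop(old_path, None)
    return new_path
```

Deleting after the new write succeeds means a failed upload leaves the old data in place.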

These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5

giovanni-guidini requested a review from scott-codecov July 17, 2023 20:51

Create ArchiveField

b54945b

Porting codecov/codecov-api-archive#1609 to the new repo.

giovanni-guidini force-pushed the gio/archive-field branch from 616f5cb to b54945b Compare July 17, 2023 21:02

matt-codecov reviewed Jul 17, 2023

matt-codecov requested changes Jul 17, 2023

giovanni-guidini requested a review from matt-codecov July 18, 2023 13:32

scott-codecov approved these changes Jul 18, 2023

Merge branch 'main' into gio/archive-field

e2aa8a1

matt-codecov approved these changes Jul 18, 2023

Merge branch 'main' into gio/archive-field

d35709a

giovanni-guidini added a commit to codecov/worker that referenced this pull request Jul 18, 2023

Update ArchiveField

e87ac47

These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5

giovanni-guidini mentioned this pull request Jul 18, 2023

Update ArchiveField codecov/worker#18

Merged

giovanni-guidini merged commit 61ef4c9 into main Jul 18, 2023
6 checks passed

giovanni-guidini added a commit to codecov/worker that referenced this pull request Jul 20, 2023

Update ArchiveField (#18)

79a0f77

These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5

trent-codecov deleted the gio/archive-field branch November 29, 2023 16:24
