Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create ArchiveField #5

Merged
merged 4 commits into from
Jul 18, 2023
Merged

Create ArchiveField #5

merged 4 commits into from
Jul 18, 2023

Conversation

giovanni-guidini
Copy link
Contributor

Porting https://github.com/codecov/codecov-api-archive/pull/1609 to the new repo.

Purpose/Motivation

What is the feature? Why is this being done?

Links to relevant tickets

What does this PR do?

Include a brief description of the changes in this PR. Bullet points are your friend.

Notes to Reviewer

Anything to note to the team? Any tips on how to review, or where to start?

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@codecov-staging
Copy link

codecov-staging bot commented Jul 17, 2023

Codecov Report

Patch coverage: 87.34% and project coverage change: -0.05 ⚠️

Comparison is base (c7287ff) 95.20% compared to head (4a8fef4) 95.16%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
- Coverage   95.20%   95.16%   -0.05%     
==========================================
  Files         551      552       +1     
  Lines       13895    13958      +63     
==========================================
+ Hits        13229    13283      +54     
- Misses        666      675       +9     
Flag Coverage Δ
unit 95.16% <87.34%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reports/models.py 92.66% <50.00%> (-7.34%) ⬇️
utils/model_utils.py 96.36% <96.36%> (ø)
services/archive.py 75.82% <100.00%> (+3.53%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@codecov
Copy link

codecov bot commented Jul 17, 2023

Codecov Report

Merging #5 (e2aa8a1) into main (9cffefd) will decrease coverage by 0.05%.
The diff coverage is 87.34%.

❗ Current head e2aa8a1 differs from pull request most recent head d35709a. Consider uploading reports for the commit d35709a to get more accurate results

@@           Coverage Diff           @@
##            main      #5     +/-   ##
=======================================
- Coverage   95.21   95.16   -0.05     
=======================================
  Files        552     552             
  Lines      13916   13958     +42     
=======================================
+ Hits       13250   13283     +33     
- Misses       666     675      +9     
Flag Coverage Δ
unit 95.16% <87.34%> (-0.06%) ⬇️
unit-latest-uploader 95.16% <87.34%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reports/models.py 92.66% <50.00%> (-7.34%) ⬇️
utils/model_utils.py 96.36% <96.36%> (ø)
services/archive.py 75.82% <100.00%> (+3.53%) ⬆️

... and 3 files with indirect coverage changes

Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very slick, great work! requesting change for a couple things that should be small to fix

self.report.commit.repository.repoid in report_builder_repo_ids
)
return master_write_switch and (
is_codecov_repo or is_in_allowed_repos or not only_codecov
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so while only_codecov is true, and repo of ours, or a repo in a configured list, will use the new behavior. once only_codecov is false, those factors don't matter at all - everybody should get the new behavior. that's what you intend, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If it's not only for Codecov then it's for everyone.
The is_in_allowed_repos is an extra factor that allow us to shift the behavior only for selected customers.

Comment on lines 104 to 120
def __set__(self, obj, value):
self._get_value_from_archive.cache_clear()
# Set the new value
if self.should_write_to_storage_fn(obj):
repository = obj.get_repository()
archive_service = ArchiveService(repository=repository)
path = archive_service.write_json_data_to_storage(
commit_id=obj.get_commitid(),
table=self.table_name,
field=self.public_name,
external_id=obj.external_id,
data=value,
)
setattr(obj, self.archive_field_name, path)
setattr(obj, self.db_field_name, None)
else:
setattr(obj, self.db_field_name, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a potential race condition here. we clear the cache, and then while we're doing should_write_to_storage() or instantiating the archive service or whatever, a call to __get__() starts and finishes, setting the cache to the old value before we can update it. also we clear the cache even if we shouldn't be saving to storage, which is unnecessary

i think you can just move the self._get_value_from_archive.cache_clear() call to right after you update the URLField to fix

separately: do you know if we can/should delete whatever was stored in the storage path after we upload a new thing? we don't appear to save the path anywhere, so those uploads are orphaned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can delete, probably... yeah they are orphaned, good point.

Comment on lines +94 to +97
files_array = ArchiveField(
should_write_to_storage_fn=should_write_to_storage,
default_value=[],
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to get rid of the _files_array and _files_array_storage_path members on ReportDetails and let the ArchiveField own them? it doesn't look like they're referenced anywhere apart from the getattr/setattr calls in ArchiveField, so we may as well make ArchiveField encapsulate them. and if we do, then we can replace the getattr(obj, field_name)/setattr(obj, field_name, val) with straightforward self.db_field and self.archive_field

so like

files_array = ArchiveField(
    should_write_to_storage_fn=should_write_to_storage,
    db_field=ArrayField(models.JSONField(), db_column="files_array", null=True),
    archive_field=URLField(db_column="files_array_storage_path", null=True),
    default_value=[],
)

i think in ArchiveField the type-hint for db_field would be the base class Field, but archive_field is always going to be a URLField i think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns about that are the following:

  1. I think preserving the migrations working as expected and automagically as it's been up until now is more beneficial than having the extra fields encapsulated. Even if we rarely change these things, for Django to understand the model as it really is seems to be a good idea, given that we rely on Django quite a lot.
  2. It forces you to always go through the ArchiveField. I think in the future, once everything is being saved in GCS, having the ArchiveField is just extra stuff not needed, and we can drop it eventually. Or we can start to use the storage field directly and make optimizations around that (e.g. not save to storage immediately every time, but make it only in the wrap_db_session).

I think I prefer to see ArchiveField as an util, an extra. What you are proposing is to make it an integral part of the model.

if you really feel strongly about making this change I will make it happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suspect removing those fields from the model will mess with the migrations. Maybe there's some metaprogramming faniciness you could do there to define the fields on the model class from within ArchiveField but I'm not sure off-hand. If that's straightforward enough maybe worth a shot but otherwise I think they'll have to remain on the model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we can eventually do (but not right now, please) is create a custom field for the storage portion that handles the storing (and under the hood is just a URL field).

Then we drop the other field

(but not right now, please. A ton of work. Let's leave it for a next iteration)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not messing with the magic migrations is a great point, thanks for explaining!

Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oop i didn't hit the request button last time

Moving the cache cleaning to after we update the field to prevent race conditions in which the cache is left in the old state.

Also adding deletion of orphaned data that was saved in GCS, in the event that the storage path changes.

Also moving the table name calculation to when we save the data in GCS, which is the only place we actually need it.
giovanni-guidini added a commit to codecov/worker that referenced this pull request Jul 18, 2023
These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5
@giovanni-guidini giovanni-guidini merged commit 61ef4c9 into main Jul 18, 2023
6 checks passed
giovanni-guidini added a commit to codecov/worker that referenced this pull request Jul 20, 2023
These changes bring the worker model up to par with the api model defined in codecov/codecov-api#5
@trent-codecov trent-codecov deleted the gio/archive-field branch November 29, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants