Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: actually drop delta tables #2415

Merged
merged 9 commits into from
Jan 23, 2024

Conversation

universalmind303
Copy link
Contributor

@universalmind303 universalmind303 commented Jan 11, 2024

closes #1870

I wanted to open this up to make sure that others are aligned on the direction I took with this before getting too far. This works as is, but could use a bit of cleanup in areas.

It uses our scheduler to spawn a system_exec to remove the delta tables. Currently it just logs the output, but we could put them in a table or something else.. Logging seemed to be fine for now.. I didn't want to over-engineer and throw away a bunch of work if this isn't the direction we want to take.

@universalmind303 universalmind303 changed the title wip WIP: actually drop delta tables Jan 11, 2024
@universalmind303
Copy link
Contributor Author

universalmind303 commented Jan 12, 2024

so this is technically working fine, but it isn't working inside of a test, likely because of the known issues with the scheduler. The task gets scheduled, but never picked up. Even when putting a long sleep in the test, the task never gets picked up. So since we can't write an integration test for this, I'll keep this in a WIP until we can get our scheduler working properly

@tychoish
Copy link
Collaborator

so this is technically working fine, but it isn't working inside of a test, likely because of the known issues with the scheduler. The task gets scheduled, but never picked up. Even when putting a long sleep in the test, the task never gets picked up. So since we can't write an integration test for this, I'll keep this in a WIP until we can get our scheduler working properly

When you say technically working fine, you mean that you can observe it working correctly in manual testing but that because the distributed execution scheduler is not working in the testing environment that your tests don't pass?

I think keeping it draft until distexec gets more ironed out seems fine. It also doesn't seem necessary to use the distexec scheduler for this because:

  • why not just have the remove operation run in the calling thread? having the calling operation return before the delete operation returns seems like it could lead to some confusions (and failed deletes)
  • why not spawn a task for doing this directly? The scheduler mostly exists to provide fairness and limitations when we hav a lot of very small tasks (as we would in a dist-exec situation,) but that doesn't seem like a huge problem here.
  • using the distexec scheduler for the "non-dist-exec" code paths, seems potentially confusing, and might make it harder operationally to turn things off, or understand things.

Having said that, all of the potential bugs or issues from using the distexec scheduler before everything becomes distexec is just that we'd regress on this issue, which is already happening today, so I wouldn't worry about it too hard, though obviously this is all moot until the scheduler works better.

@universalmind303
Copy link
Contributor Author

When you say technically working fine, you mean that you can observe it working correctly in manual testing but that because the distributed execution scheduler is not working in the testing environment that your tests don't pass?

yeah, it functionally works, but not in the testing suite.

@universalmind303 universalmind303 marked this pull request as ready for review January 16, 2024 19:52
@universalmind303 universalmind303 changed the title WIP: actually drop delta tables fix: actually drop delta tables Jan 16, 2024
Comment on lines +255 to +256
#[prost(message, repeated, tag = "4")]
pub tbl_entries: Vec<TableEntry>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have table entries, do we need table references? Can we just use the info provided in entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally wanted to go that route, but we don't have access to an entry resolver inside the execution context, only during planning.

@universalmind303 universalmind303 changed the title fix: actually drop delta tables WIP: actually drop delta tables Jan 16, 2024
@universalmind303 universalmind303 force-pushed the universalmind303/native-table-drop branch from dd482ed to 75ac923 Compare January 19, 2024 17:15
@universalmind303 universalmind303 changed the title WIP: actually drop delta tables fix: actually drop delta tables Jan 19, 2024
Copy link
Collaborator

@tychoish tychoish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable, one question.

@universalmind303 universalmind303 merged commit efb8cbd into main Jan 23, 2024
20 checks passed
@universalmind303 universalmind303 deleted the universalmind303/native-table-drop branch January 23, 2024 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

drop tables doesn't delete native table files.
4 participants