Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Implement Ident as vec #2305

Closed
wants to merge 15 commits into from
Closed

Conversation

max-sixty
Copy link
Member

Part of #1535, this changes our Idents in PL from $namespace.$name to an arbitrary hierachy.

It's full of .clone() & .into() -- I've been doing this by replacing one definition and then adding lots of .into() untils it passes (not the most conceptual work!). Doing it incrementally at least means I can't end up in a quagmire of not knowing why the new version doesn't work; it's easy to revert to the last known good state. We can do another pass to improve the rust / reduce allocations.

The plan here would be to:

  • replace any other usages in the compilation prior to the ident being resolved. Once it's resolved, it doesn't necessarily need a full hierarchy. Possibly we can remove the old Ident / rename the new one to Ident.
  • adjust how the resolution works so we can have arbitrary hierarchies of schema (a.b.c). Still some work to think about how this should work (some initial comments in Resolve opaqueness of idents #1535)
  • make backticks fully opaque, so Resolve opaqueness of idents #1535 works -- then using parquet files will be easy, we won't need to quote schemas, the semantics will be simple & consistent

Part of PRQL#1535, this changes our Idents in PL from `$namespace.$name` to an arbitrary hierachy.

It's full of `.clone()` & `.into()` -- I've been doing this by replacing one definition and then adding lots of `.into()` untils it passes (not the most conceptual work!). Doing it incrementally at least means I can't end up in a quagmire of not knowing why the new version doesn't work; it's easy to revert to the last known good state. We can do another pass to improve the rust / reduce allocations.

The plan here would be to:
- replace any other usages in the compilation prior to the ident being resolved. Once it's resolved, it doesn't necessarily need a full hierarchy. Possibly we can remove the old `Ident` / rename the new one to `Ident`.
- adjust how the resolution works so we can have arbitrary hierarchies of schema (`a.b.c`). Still some work to think about how this should work (some initial comments in PRQL#1535)
- make backticks fully opaque, so PRQL#1535 works -- then using parquet files will be easy, we won't need to quote schemas, the semantics will be simple & consistent
@max-sixty max-sixty changed the title refactor: Implement Ident as arbitrary list refactor: Implement Ident as vec Mar 24, 2023
@max-sixty
Copy link
Member Author

This is now ready for review.

The final PR ends up being very small. I feel like it was a lot of work! — because each change undid some of the previous change as I moved through the pipeline swapping out Ident for IdentParts. But I think was necessary to avoid having no idea where the cause of a problem was — and it caught #2303, which I think would have been difficult to find for me if I'd been looking all over the repo.

If others agree the path above is good, we can merge this, and then we can work on the name resolution, as per bullet (2) above

Copy link
Member

@aljazerzen aljazerzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you convince me that this is an improvement?

I see that:

  • Ident was capable of representing arbitrary deeply nested names before this PR,
  • Ident can now also represent names with zero path chunks, which is not desired. The methods guard against that, but one could still just construct the struct directly from an empty vec.
  • it does more cloning.

I may just not understand your plan entirely. We already have nested modules (with no syntax to express that) and lookups arbitrary deep into them. And the linked issue with table names lies mostly in how table names are translated in the SQL backend.

Comment on lines +57 to +58
// Q: @aljazerzen do we need these now? I couldn't immediately see what they
// do (but can look more if needed).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't. This makes it so the Ident was serialized into an array, even though it was a struct. But now this should have no effect, because it's gets serialized into array anyway.

@max-sixty
Copy link
Member Author

Ok, reflecting more, I think I might have made a mistake here. That's annoying — I should have raised the suggestion earlier. Thanks a lot for pushing back.

Do you remember where in the library currently stops us from having from x.y.z | select z? It can't find the item but I'm not sure why it's not placed.

Probably we should start there and then work out what's required, rather than my previous approach.

@aljazerzen
Copy link
Member

Ah, it happens :)

It's a long explanation of how table inference is working, so I'll just point you toward Module::wildcard and Module::default

@aljazerzen
Copy link
Member

@max-sixty max-sixty closed this Mar 27, 2023
@max-sixty max-sixty deleted the identparts branch March 27, 2023 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants