This repository has been archived by the owner on May 12, 2024. It is now read-only.

wip: data structures #8

Merged
merged 18 commits into from
Jun 21, 2022

Conversation

Collaborator

@Samyak2 Samyak2 commented Jun 14, 2022

  • feat: setup Rust project and directory structure
  • fix: forgot visibility specifier
  • wip: data structures

PR Info

  • Dependents:

Adds

Fixes

Breaking Changes

Changes

@Samyak2 Samyak2 added the wip DO NOT MERGE label Jun 14, 2022
src/database.rs Outdated (resolved)
src/schema.rs Outdated (resolved)
src/table.rs Outdated (resolved)
This was referenced Jun 19, 2022
src/column.rs Outdated (resolved)
src/schema.rs Outdated
Comment on lines 5 to 8
pub struct Schema {
    pub name: String,
    pub tables: Vec<Table>,
}
Member

I don't think we should make the fields pub.
At most they should be pub(crate), with getter methods provided as needed.

Collaborator Author

Is this the right approach for this - e8415da?

I have used a trailing underscore on the field that is meant to be private and provided an immutable getter for it under the actual name. I did this because I did not like the idea of having get_xxx() methods; plain xxx() seems more Rust-y.

Member

I believe the suffix _ is not as necessary in Rust as in other languages because Rust already requires you to use self.xxx explicitly when using fields (but in, say, C++, you can simply type xxx to use a data member so it may be confused with a local variable). Again, I really appreciate this sense of clarity provided by Rust :).

Regarding the getters, I like the use of xxx() here instead of get_xxx(), because xxx() gives the sense that this method is simply a minimalistic getter without any noticeable work needed to be done. Usually, I prefer using get_xxx() only when xxx is a derived state that requires a significant amount of work to get.
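For reference, a minimal sketch of the pattern discussed in this thread (not the code from e8415da): the field stays private and a method with the same name serves as the cheap getter. The `Table` stub and the `new` constructor are assumptions made only so the example stands alone.

```rust
/// Placeholder for the real `Table` type (assumed for this sketch).
#[derive(Debug, Default)]
pub struct Table;

pub struct Schema {
    // Private fields: code outside this module cannot read or mutate them directly.
    name: String,
    tables: Vec<Table>,
}

impl Schema {
    /// Hypothetical constructor, not taken from the PR.
    pub fn new(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            tables: Vec::new(),
        }
    }

    /// Cheap getter that shares the field's name. Fields and methods live in
    /// separate namespaces, so `self.name` and `self.name()` do not clash.
    pub fn name(&self) -> &str {
        &self.name
    }

    /// Immutable view of the tables.
    pub fn tables(&self) -> &[Table] {
        &self.tables
    }
}
```

Because the getters return borrowed views, callers can read `schema.name()` without being able to replace the underlying `String`.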

Collaborator Author

@shpun817 I assumed it would be a problem to have a field and a function with the same name, but it looks like Rust handles that. Pretty cool.

> Regarding the getters, I like the use of xxx() here instead of get_xxx(), because xxx() gives the sense that this method is simply a minimalistic getter without any noticeable work needed to be done. Usually, I prefer using get_xxx() only when xxx is a derived state that requires a significant amount of work to get.

I agree with this, makes a lot of sense.

Collaborator Author

Fixed the visibility for other structs too.

src/schema.rs Outdated (resolved)
src/table.rs Outdated (resolved)
src/vm.rs Outdated
}

pub enum Register {
    Table(Rc<Table>),
Member

We might not wander into a multi-threaded context soon, but perhaps Arc is better (or perhaps something like sea-query's SeaRc).

Collaborator Author

Did you mean this - https://github.com/SeaQL/sea-query/blob/34532bc8d2a0511c8c617bebf59abd8e48d6d96d/src/types.rs#L6-L9?

That's an interesting way to go about this. Thanks.

Collaborator Author

I implemented this in 6882d4f. What do you think of the name Mrc?
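One way such an alias can be written is sketched below; it is an illustration, not the contents of 6882d4f. The Cargo feature name `thread-safe`, the `Table` stub, and the `share` helper are all assumptions. With the feature off, `Mrc<T>` is `Rc<T>`; with it on, the same code compiles against `Arc<T>`.

```rust
// Single-threaded builds: cheap, non-atomic reference counting.
#[cfg(not(feature = "thread-safe"))]
pub type Mrc<T> = std::rc::Rc<T>;

// Builds with the (hypothetical) `thread-safe` feature: atomic reference counting.
#[cfg(feature = "thread-safe")]
pub type Mrc<T> = std::sync::Arc<T>;

/// Stub standing in for the real `Table` (assumed for this sketch).
pub struct Table {
    pub name: String,
}

pub enum Register {
    Table(Mrc<Table>),
}

/// Hypothetical helper: cloning an `Mrc` only bumps the reference count,
/// so the register and the caller share the same `Table`.
pub fn share(table: &Mrc<Table>) -> Register {
    Register::Table(Mrc::clone(table))
}
```

Code that only uses the operations common to `Rc` and `Arc` (`new`, `clone`, `strong_count`, deref) does not need to change when the feature is toggled.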

src/vm.rs Outdated (resolved)
src/vm.rs Outdated
use crate::{table::Table, ic::IntermediateCode};

pub struct VirtualMachine {
    pub registers: HashMap<usize, Register>,
Member

I think we would like the hash key to be a new type instead of usize, maybe something like pub struct RegisterIndex(usize).

Collaborator Author

I don't get it. Wouldn't this mean we would have to derive or implement Hash, Eq, Clone again for this new type? I meant for this to be used like a sparse array/vec. I don't see how a new type adds value.

Member

Actually, I second Chris on this.

Using a new type provides better clarity in the value when it's used (imagine creating a variable to hold the index and seeing its type is usize vs RegisterIndex).

It can also achieve encapsulation. If you design a nice API for it, the user does not need to know about what RegisterIndex is made up of. That way you can also better control, say, how a new index is generated and so on.

> Wouldn't this mean we would have to derive or implement Hash, Eq, Clone again for this new type

That's true, but I would also argue that, even though deriving or implementing those traits has a cost, it makes explicit the set of properties you need from this index type. usize is a built-in type with many traits already implemented, but surely not all of them are required for it to serve as your index type. If you create a new type and derive or implement only the necessary traits, you and anyone who takes just a glance at your code will know exactly what functionality is needed from this type.

I believe this kind of stricter requirement in the type system is one of the best things Rust has to offer, since it ensures clarity and correctness in the long run.

Of course, those are general comments about abstracting types when built-in types "suffice". They may not always be or seem that relevant especially before things get too complicated in your systems, but I'd say investing that little extra work eventually pays off more often than not.

Collaborator Author

That makes a lot of sense, thank you for explaining it so well @shpun817! I have implemented this in 9d91ece.

It also allowed me to provide a custom display for it.
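A rough sketch of what such a newtype might look like, assuming nothing about the actual code in 9d91ece: only the traits a `HashMap` key needs are derived, and `Display` is implemented by hand. The `r{}` output format, the `new` constructor, and the `lookup` helper (which uses a string slice as a stand-in for `Register`) are hypothetical.

```rust
use std::collections::HashMap;
use std::fmt;

/// Newtype over `usize`. The derive list documents exactly what the index
/// must support: copying, equality, and hashing (plus `Debug` for diagnostics).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RegisterIndex(usize);

impl RegisterIndex {
    /// Hypothetical constructor; the inner `usize` stays private.
    pub fn new(index: usize) -> Self {
        Self(index)
    }
}

impl fmt::Display for RegisterIndex {
    /// Custom display; the `r<N>` format is an assumption for illustration.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "r{}", self.0)
    }
}

/// Hypothetical lookup showing the newtype used as a map key.
pub fn lookup<'a>(
    registers: &HashMap<RegisterIndex, &'a str>,
    index: RegisterIndex,
) -> Option<&'a str> {
    registers.get(&index).copied()
}
```

Since arithmetic traits are not derived, an expression such as `index + 1` does not compile, which keeps index manipulation behind whatever API the type chooses to expose.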

Collaborator Author

@Samyak2 Samyak2 Jun 20, 2022

Actually, it did not make sense to have an auto-incrementing register index in the VM. The intermediate code will be generated before execution, so it will have concrete indexes anyway. Fixed this in b46d3e8, though this makes the insert and get methods simple wrappers around HashMap's methods.
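To illustrate what "simple wrappers" could mean here, a sketch under assumptions: the method names `insert_register` and `get_register`, the public tuple field on `RegisterIndex`, and the `Placeholder` variant are invented for this example and are not taken from b46d3e8.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RegisterIndex(pub usize);

#[derive(Debug, PartialEq)]
pub enum Register {
    /// Stand-in payload; the PR's `Register` holds a shared `Table`.
    Placeholder(u8),
}

#[derive(Default)]
pub struct VirtualMachine {
    // Private map used like a sparse array of registers.
    registers: HashMap<RegisterIndex, Register>,
}

impl VirtualMachine {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores `value` at the caller-supplied `index` and returns the
    /// previous occupant, if any. No index is generated by the VM.
    pub fn insert_register(&mut self, index: RegisterIndex, value: Register) -> Option<Register> {
        self.registers.insert(index, value)
    }

    /// Returns the register at `index`, or `None` if it was never set.
    pub fn get_register(&self, index: RegisterIndex) -> Option<&Register> {
        self.registers.get(&index)
    }
}
```

Even as thin wrappers, these methods keep the `HashMap` private, so the storage could later be swapped without touching callers.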

Member

@tyt2y3 tyt2y3 Jun 21, 2022

Thank you Sanford for the great points!
Actually, the reason I want to avoid usize is to prevent accidental casting from other integer types, particularly isize.
Also, a function that accepts two usizes, say foo(idx: usize, len: usize), is error-prone.
I have seen nasty bugs caused this way multiple times.
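The argument-swapping hazard can be shown with a small sketch. The `Index` and `Length` newtypes and the `window` function are hypothetical names standing in for `foo(idx: usize, len: usize)`.

```rust
/// Position into a buffer (hypothetical newtype).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Index(pub usize);

/// Number of elements (hypothetical newtype).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Length(pub usize);

/// Returns `len` bytes starting at `idx`, or `None` if the range is out of bounds.
///
/// With the signature `window(data, idx: usize, len: usize)`, a call that passes
/// the two numbers in the wrong order still compiles. With distinct types,
/// `window(data, Length(3), Index(1))` is rejected by the compiler.
pub fn window(data: &[u8], idx: Index, len: Length) -> Option<&[u8]> {
    // checked_add avoids overflow when idx + len exceeds usize::MAX.
    let end = idx.0.checked_add(len.0)?;
    data.get(idx.0..end)
}
```

A value of another integer type also has to be converted explicitly before it can be wrapped, which makes each conversion visible at the call site.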

Comment on lines +12 to +13
UInt8Null(Option<u8>),
UInt8(u8),
Collaborator Author

I have included a nullable variant of every type. This encodes the nullable property of a column in the type system itself, although I'm not sure whether this adds value, or whether I should generate the nullable variants using a declarative macro.

@tyt2y3 @shpun817 thoughts?

Member

@tyt2y3 tyt2y3 Jun 21, 2022

This looks clumsy. I would prefer a Nullable<T>, maybe. I haven't got a clear idea of its use yet.

Collaborator Author

> Nullable<T>

Do you mean an Option<T>?

Member

@tyt2y3 tyt2y3 Jun 23, 2022

No. I mean you remove all the XXNull variants and create a Nullable to wrap the value.
It's kind of like Option, but with our own definition.
The trick is that we only impl Nullable<Value>, so it is effectively a sealed type.
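One possible reading of this suggestion, sketched under assumptions: the variant names `Null` and `NotNull`, the `Text` variant of `Value`, and the methods `is_null` and `as_value` are invented for illustration and do not come from the PR.

```rust
/// Values without any `XXNull` variants. `UInt8` comes from the diff under
/// review; `Text` is a hypothetical second variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    UInt8(u8),
    Text(String),
}

/// Option-like wrapper with the project's own definition.
#[derive(Debug, Clone, PartialEq)]
pub enum Nullable<T> {
    Null,
    NotNull(T),
}

// Methods are implemented only for `Nullable<Value>`. Other instantiations,
// such as `Nullable<u32>`, can be named but expose no API.
impl Nullable<Value> {
    pub fn is_null(&self) -> bool {
        matches!(self, Nullable::Null)
    }

    /// Borrows the wrapped value, if there is one.
    pub fn as_value(&self) -> Option<&Value> {
        match self {
            Nullable::Null => None,
            Nullable::NotNull(value) => Some(value),
        }
    }
}
```

Compared with a pair of variants per type, nullability is expressed once, at the wrapper level, and `Value` itself stays flat.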

Collaborator Author

Ah I see. That seems like a much better approach. Alternatively, there could be a Null variant of Value.

src/vm.rs Outdated (resolved)
@tyt2y3 tyt2y3 merged commit dfc6800 into main Jun 21, 2022
Member

tyt2y3 commented Jun 21, 2022

Would merge this now. Nice work so far!
Please rebase your other branches and put new work in a new PR.

Collaborator Author

Samyak2 commented Jun 21, 2022

Actually, the commit messages in this branch weren't good. I wanted to squash them before merging. I'll do that on main instead.

Samyak2 added a commit that referenced this pull request Jun 23, 2022
@Samyak2 Samyak2 deleted the data-structures branch July 10, 2022 06:49
@Samyak2 Samyak2 restored the data-structures branch July 10, 2022 06:49
@Samyak2 Samyak2 deleted the data-structures branch July 10, 2022 06:49
Labels
wip DO NOT MERGE
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setup Rust project
3 participants