Design the instruction set #9

Samyak2 · 2022-06-19T14:04:22Z

#8 added an empty Instruction enum. This needs to be populated with the actual instruction set. A basic instruction set is needed at this stage to go ahead with the implementation. Initial ideas for the instructions can be found in this discussion: SeaQL/summer-of-code#11 (comment)

The text was updated successfully, but these errors were encountered:

Samyak2 · 2022-06-21T19:18:21Z

Here's a design I thought of:

Notations

%N -> Nth register
'value' -> A string
<Value> -> A crate::column::Value
[Type] -> A type

General instructions

View %N 'name'
- A view into a existing table with the name name.
- Has well defined columns and some data.
- The data can be mutated too.
- I have used View instead of the original Table we decided in the discussion because:
  - It represents a (mutable) view into the table, not the table itself
  - It can be used to represent temporary tables such as those created in CTEs
Filter %N 'col name' OPERATOR <Value>
- Filter into the view at %N
- It is not applied at this stage, only stored
Project %N 'col name'
- Adds col name to the column selection. Not having any means *.
- It is not applied at this stage, only stored
ProjectAggregate %N AGGR 'col name'
- Adds col name with given aggregation to the column selection.
- This column should not appear in the group by.
- It is not applied at this stage, only stored
GroupBy %N 'col name'
- Adds col name to the grouped columns for the view
- Must be added before any projections so as to catch errors in column selections earlier
- It is not applied at this stage, only stored
Having %N AGGR 'col name' OPERATOR <Value>
- Post group-by + aggregation filter.
- Can filter on both an aggregated column or a non-aggregated one. The aggregation is optional.
- It is not applied at this stage, only stored
Order %N 'col name' true|false
- true -> ascending
- false -> descending
Limit %N num
Return %N
- Return from a register - can be a view, filter, etc.
- This must be the last instruction in the code as it exits the execution.

Creation instructions

NewDatabase 'name' true|false
- true means IF NOT EXISTS
NewSchema 'name' true|false
- true means IF NOT EXISTS

Table creation

TableDef %T 'name'
- Starts creating a new table with name name and stores the temporary metadata in %T
ColumnDef %C 'col name' [Type]
- Starts defining a new column with given name and type
ColumnOption %C OPTION
- Adds an option (primary key, foreign key, etc.) or a constraint (NOT NULL, etc.) to the ColumnDef at %C
AddColumn %T %C
- Adds column to the table metadata
- This works for both a TableDef and a View
  - Adding columns to existing tables can be supported
  - A new entry is added to every row with either NULL or the default value (error if it's not nullable and a default is given).
NewTable %T
- Creates table
RemoveColumn %T 'col name'
- Removes column from a view/filter
- All the rows are modified to remove this column
RenameColumn %T 'old name' 'new name'
Question: this starts to mirror the sql-parser AST. I don't know if it makes sense to break down the AST into these instructions which itself construct a data structure again instruction by instruction. Could be a better idea to just say NewTable %T (sqlparser CreateTable AST).

Insert

InsertDef %V %I
- Starts a new insert statement on view/filter at %V
Column %I 'col name'
- Adds a column to insert
- Since we have the table metadata, we can validate the name
RowDef %I %R
- Starts defining a new row data
AddValue %R <Value>
- Adds a single value to the row
- Since we have the table and column metadata, we can validate the <Value> at this step
Insert %I
- Performs the insertion
Question: same problem as table creation above
Question: I have tried to avoid using variable number of args for any instruction. Some things could be simplified if it's allowed. Thoughts?

Update

Update %N 'col name' <Value>
- %N can have a view or a filter over a view.
- Updates all rows remaining after the filter

Joins

Scope:
- Trying to support almost all types of joins supported by MySQL: https://dev.mysql.com/doc/refman/8.0/en/join.html. These are:
  - Inner join
  - Cross join
  - Straight join
  - Left outer join
  - Right outer join
  - Left join
  - Right join
  - Natural join
  - Natural inner join
  - Natural left outer join
- Straight join does not make sense here and is not supported.
Union %I1 %I2 %O
- A union of all rows in the view/filters of %I1 and %I2
- The output is stored as a View in %O
- The columns in both must match
CrossJoin %I1 %I2 %O
- Does a cross/cartesian join
- Use Filter, RemoveColumn, RenameColumn to get it to the correct shape
- Question: if there are common columns between the two tables, how are they renamed?
- This combination will support inner join, cross join, left outer join, right outer join, left join, right join
NaturalJoin %I1 %I2 %O
- Does a natural join
- This is both a left and right join i.e., there will be NULLs
- Use Filter to remove them if needed
- This combination will support natural join, natural inner join, natural left outer join, natural right outer join

tyt2y3 · 2022-06-23T16:53:09Z

Makes a lot of sense! Please go ahead.
@shpun817 do you have some thoughts?

shpun817 · 2022-06-24T04:00:52Z

Looks good! Using fixed numbers of arguments definitely helps simplify things at operation, so I think it's a good idea.

For the rest of the questions, I'm afraid I can't give concrete answers yet, maybe take note of them first and proceed whereas possible?

Nice work overall!

Samyak2 · 2022-06-24T18:23:53Z

Thank you, I'm working on the implementation starting with the data structures for the register.

Also, I realized now that I missed considering GROUP BY. I will think about how to implement that and update the instruction set if needed.

tyt2y3 · 2022-06-26T06:42:30Z

Actually GROUP BY and other aggregates is yet another stage in the execution pipeline that should be reasonably easy to add later on.
Good if you can think ahead though.

Samyak2 · 2022-06-30T18:55:30Z

I have added a few instructions (ProjectAggregate, GroupBy and Having) for GROUP BY and aggregations in the "General instructions" section above.

Reference: https://www.postgresql.org/docs/current/tutorial-agg.html

Samyak2 · 2022-07-01T17:43:31Z

Found something that cannot be implemented using the current instruction set: https://www.postgresql.org/docs/current/queries-union.html. I'll add Intersect and Difference to support this.

Samyak2 mentioned this issue Jun 19, 2022

Week 2 #10

Closed

3 tasks

tyt2y3 mentioned this issue Jun 19, 2022

Plan #5

Open

4 tasks

Samyak2 mentioned this issue Jun 25, 2022

instruction set #12

Merged

tyt2y3 closed this as completed in #12 Jul 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design the instruction set #9

Design the instruction set #9

Samyak2 commented Jun 19, 2022

Samyak2 commented Jun 21, 2022 •

edited

Loading

tyt2y3 commented Jun 23, 2022

shpun817 commented Jun 24, 2022

Samyak2 commented Jun 24, 2022

tyt2y3 commented Jun 26, 2022 •

edited

Loading

Samyak2 commented Jun 30, 2022 •

edited

Loading

Samyak2 commented Jul 1, 2022

Design the instruction set #9

Design the instruction set #9

Comments

Samyak2 commented Jun 19, 2022

Samyak2 commented Jun 21, 2022 • edited Loading

Notations

General instructions

Creation instructions

Table creation

Insert

Update

Joins

tyt2y3 commented Jun 23, 2022

shpun817 commented Jun 24, 2022

Samyak2 commented Jun 24, 2022

tyt2y3 commented Jun 26, 2022 • edited Loading

Samyak2 commented Jun 30, 2022 • edited Loading

Samyak2 commented Jul 1, 2022

Samyak2 commented Jun 21, 2022 •

edited

Loading

tyt2y3 commented Jun 26, 2022 •

edited

Loading

Samyak2 commented Jun 30, 2022 •

edited

Loading