Skip to content
This repository has been archived by the owner on May 12, 2024. It is now read-only.

Design the instruction set #9

Closed
Tracked by #10
Samyak2 opened this issue Jun 19, 2022 · 7 comments · Fixed by #12
Closed
Tracked by #10

Design the instruction set #9

Samyak2 opened this issue Jun 19, 2022 · 7 comments · Fixed by #12

Comments

@Samyak2
Copy link
Collaborator

Samyak2 commented Jun 19, 2022

#8 added an empty Instruction enum. This needs to be populated with the actual instruction set. A basic instruction set is needed at this stage to go ahead with the implementation. Initial ideas for the instructions can be found in this discussion: SeaQL/summer-of-code#11 (comment)

@Samyak2 Samyak2 mentioned this issue Jun 19, 2022
3 tasks
@tyt2y3 tyt2y3 mentioned this issue Jun 19, 2022
4 tasks
@Samyak2
Copy link
Collaborator Author

Samyak2 commented Jun 21, 2022

Here's a design I thought of:

Notations

  • %N -> Nth register
  • 'value' -> A string
  • <Value> -> A crate::column::Value
  • [Type] -> A type

General instructions

  • View %N 'name'
    • A view into a existing table with the name name.
    • Has well defined columns and some data.
    • The data can be mutated too.
    • I have used View instead of the original Table we decided in the discussion because:
      • It represents a (mutable) view into the table, not the table itself
      • It can be used to represent temporary tables such as those created in CTEs
  • Filter %N 'col name' OPERATOR <Value>
    • Filter into the view at %N
    • It is not applied at this stage, only stored
  • Project %N 'col name'
    • Adds col name to the column selection. Not having any means *.
    • It is not applied at this stage, only stored
  • ProjectAggregate %N AGGR 'col name'
    • Adds col name with given aggregation to the column selection.
    • This column should not appear in the group by.
    • It is not applied at this stage, only stored
  • GroupBy %N 'col name'
    • Adds col name to the grouped columns for the view
    • Must be added before any projections so as to catch errors in column selections earlier
    • It is not applied at this stage, only stored
  • Having %N AGGR 'col name' OPERATOR <Value>
    • Post group-by + aggregation filter.
    • Can filter on both an aggregated column or a non-aggregated one. The aggregation is optional.
    • It is not applied at this stage, only stored
  • Order %N 'col name' true|false
    • true -> ascending
    • false -> descending
  • Limit %N num
  • Return %N
    • Return from a register - can be a view, filter, etc.
    • This must be the last instruction in the code as it exits the execution.

Creation instructions

  • NewDatabase 'name' true|false
    • true means IF NOT EXISTS
  • NewSchema 'name' true|false
    • true means IF NOT EXISTS

Table creation

  • TableDef %T 'name'
    • Starts creating a new table with name name and stores the temporary metadata in %T
  • ColumnDef %C 'col name' [Type]
    • Starts defining a new column with given name and type
  • ColumnOption %C OPTION
    • Adds an option (primary key, foreign key, etc.) or a constraint (NOT NULL, etc.) to the ColumnDef at %C
  • AddColumn %T %C
    • Adds column to the table metadata
    • This works for both a TableDef and a View
      • Adding columns to existing tables can be supported
      • A new entry is added to every row with either NULL or the default value (error if it's not nullable and a default is given).
  • NewTable %T
    • Creates table
  • RemoveColumn %T 'col name'
    • Removes column from a view/filter
    • All the rows are modified to remove this column
  • RenameColumn %T 'old name' 'new name'
  • Question: this starts to mirror the sql-parser AST. I don't know if it makes sense to break down the AST into these instructions which itself construct a data structure again instruction by instruction. Could be a better idea to just say NewTable %T (sqlparser CreateTable AST).

Insert

  • InsertDef %V %I
    • Starts a new insert statement on view/filter at %V
  • Column %I 'col name'
    • Adds a column to insert
    • Since we have the table metadata, we can validate the name
  • RowDef %I %R
    • Starts defining a new row data
  • AddValue %R <Value>
    • Adds a single value to the row
    • Since we have the table and column metadata, we can validate the <Value> at this step
  • Insert %I
    • Performs the insertion
  • Question: same problem as table creation above
  • Question: I have tried to avoid using variable number of args for any instruction. Some things could be simplified if it's allowed. Thoughts?

Update

  • Update %N 'col name' <Value>
    • %N can have a view or a filter over a view.
    • Updates all rows remaining after the filter

Joins

  • Scope:
    • Trying to support almost all types of joins supported by MySQL: https://dev.mysql.com/doc/refman/8.0/en/join.html. These are:
      • Inner join
      • Cross join
      • Straight join
      • Left outer join
      • Right outer join
      • Left join
      • Right join
      • Natural join
      • Natural inner join
      • Natural left outer join
    • Straight join does not make sense here and is not supported.
  • Union %I1 %I2 %O
    • A union of all rows in the view/filters of %I1 and %I2
    • The output is stored as a View in %O
    • The columns in both must match
  • CrossJoin %I1 %I2 %O
    • Does a cross/cartesian join
    • Use Filter, RemoveColumn, RenameColumn to get it to the correct shape
    • Question: if there are common columns between the two tables, how are they renamed?
    • This combination will support inner join, cross join, left outer join, right outer join, left join, right join
  • NaturalJoin %I1 %I2 %O
    • Does a natural join
    • This is both a left and right join i.e., there will be NULLs
    • Use Filter to remove them if needed
    • This combination will support natural join, natural inner join, natural left outer join, natural right outer join

@tyt2y3
Copy link
Member

tyt2y3 commented Jun 23, 2022

Makes a lot of sense! Please go ahead.
@shpun817 do you have some thoughts?

@shpun817
Copy link
Member

Looks good! Using fixed numbers of arguments definitely helps simplify things at operation, so I think it's a good idea.

For the rest of the questions, I'm afraid I can't give concrete answers yet, maybe take note of them first and proceed whereas possible?

Nice work overall!

@Samyak2
Copy link
Collaborator Author

Samyak2 commented Jun 24, 2022

Thank you, I'm working on the implementation starting with the data structures for the register.

Also, I realized now that I missed considering GROUP BY. I will think about how to implement that and update the instruction set if needed.

@Samyak2 Samyak2 mentioned this issue Jun 25, 2022
@tyt2y3
Copy link
Member

tyt2y3 commented Jun 26, 2022

Actually GROUP BY and other aggregates is yet another stage in the execution pipeline that should be reasonably easy to add later on.
Good if you can think ahead though.

@Samyak2
Copy link
Collaborator Author

Samyak2 commented Jun 30, 2022

I have added a few instructions (ProjectAggregate, GroupBy and Having) for GROUP BY and aggregations in the "General instructions" section above.

Reference: https://www.postgresql.org/docs/current/tutorial-agg.html

@Samyak2
Copy link
Collaborator Author

Samyak2 commented Jul 1, 2022

Found something that cannot be implemented using the current instruction set: https://www.postgresql.org/docs/current/queries-union.html. I'll add Intersect and Difference to support this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants