docs: Document SQL generation #178
Merged
2 commits
File renamed without changes.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,4 +4,5 @@ | |
| column-metadata | ||
| primary-keys | ||
| serialization | ||
| sql-generation | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| # Generating SQL schema definitions | ||
|
|
||
| It is often useful to store data in a SQL-based database server. `dataframely` aims to make this easy by | ||
| providing a simple mechanism for translating your `dataframely` schemas to SQL table definitions. | ||
|
|
||
| There are many different flavors of SQL syntax. To avoid reinventing the wheel, we use | ||
| [`sqlalchemy`](https://www.sqlalchemy.org/) as an abstraction layer between Python and SQL. | ||
|
|
||
| ## Individual tables | ||
|
|
||
| The core feature `dataframely` offers is converting your `dy.Schema` into a list of `sqlalchemy.Column` objects: | ||
|
|
||
| ```python | ||
| import dataframely as dy | ||
| import sqlalchemy as sa | ||
|
|
||
|
|
||
| class MySchema(dy.Schema): | ||
|     x = dy.Int64(primary_key=True) | ||
|     y = dy.String(nullable=False) | ||
|
|
||
|
|
||
| engine = sa.create_engine(...) | ||
| columns: list[sa.Column] = MySchema.sql_schema(engine.dialect) | ||
| ``` | ||
|
|
||
| You can use these columns however you like. Most commonly, you will create a table from them: | ||
|
|
||
| ```python | ||
| my_table = sa.Table("myTable", sa.MetaData(), *columns) | ||
| my_table.create(engine) | ||
| ``` | ||
|
|
||
| You can also inspect the SQL code that `sqlalchemy` would execute: | ||
|
|
||
| ```python | ||
| from sqlalchemy.schema import CreateTable | ||
|
|
||
| print(CreateTable(my_table).compile()) | ||
| ``` | ||
|
|
||
| For the example schema, this renders to: | ||
|
|
||
| ```SQL | ||
| CREATE TABLE "myTable" | ||
| ( | ||
|     x BIGINT NOT NULL, | ||
|     y VARCHAR NOT NULL, | ||
|     PRIMARY KEY (x) | ||
| ) | ||
| ``` | ||
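To see what these constraints mean in practice, the rendered statement can be run against a throwaway database. The following is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. SQLite and the helper `is_rejected` are assumptions made purely for illustration; other database systems may report violations differently.

```python
import sqlite3

# DDL as rendered above for the example schema.
DDL = """
CREATE TABLE "myTable"
(
    x BIGINT NOT NULL,
    y VARCHAR NOT NULL,
    PRIMARY KEY (x)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute('INSERT INTO "myTable" (x, y) VALUES (?, ?)', (1, "a"))


def is_rejected(statement: str, params: tuple) -> bool:
    """Return True if the database refuses the statement (hypothetical helper)."""
    try:
        conn.execute(statement, params)
    except sqlite3.IntegrityError:
        return True
    return False


insert = 'INSERT INTO "myTable" (x, y) VALUES (?, ?)'
# A second row with the same key violates PRIMARY KEY (x).
duplicate_key_rejected = is_rejected(insert, (1, "b"))
# A missing value for y violates NOT NULL.
null_rejected = is_rejected(insert, (2, None))
# Only the first, valid row remains in the table.
row_count = conn.execute('SELECT COUNT(*) FROM "myTable"').fetchone()[0]
```

Both invalid inserts should be refused by the database itself, independently of any validation done in `dataframely`.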
|
|
||
| Uploading data can then be handled by {meth}`polars.DataFrame.write_database`: | ||
|
|
||
| ```python | ||
| df: dy.DataFrame[MySchema] | ||
|
|
||
| df.write_database( | ||
|     connection=engine, | ||
|     table_name=my_table.name, | ||
|     if_table_exists="append", | ||
| ) | ||
| ``` | ||
|
|
||
| ```{note} | ||
| **Why do you need to pass in the SQL dialect?** Even though `sqlalchemy` handles most dialect differences, we sometimes still need to intervene. For example, when using Microsoft SQL Server, `sqlalchemy` renders the `sqlalchemy.Date` type as a raw SQL `DATETIME`, whereas we think that `DATE` is more appropriate. | ||
| ``` | ||
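How much the rendered SQL depends on the dialect can be checked with plain `sqlalchemy`, without a database connection. The sketch below is an illustration only: the columns are written by hand as stand-ins for what `MySchema.sql_schema(...)` might return, and `autoincrement=False` is an assumption added so that the integer primary key is not rendered as an auto-incrementing column.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import mssql, postgresql
from sqlalchemy.schema import CreateTable

# Hand-written stand-ins for the columns of the example schema.
table = sa.Table(
    "myTable",
    sa.MetaData(),
    sa.Column("x", sa.BigInteger, primary_key=True, autoincrement=False),
    sa.Column("y", sa.String, nullable=False),
)

# Compiling against a dialect object does not need a live connection.
postgres_sql = str(CreateTable(table).compile(dialect=postgresql.dialect()))
mssql_sql = str(CreateTable(table).compile(dialect=mssql.dialect()))

print(postgres_sql)
print(mssql_sql)
```

The unbounded string column is the visible difference here: the SQL Server dialect spells it `VARCHAR(max)`, while the PostgreSQL dialect emits a plain `VARCHAR`.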
|
|
||
| ```{note} | ||
| **Implementation:** The choice of `sqlalchemy` type is implemented in {meth}`~dataframely.Column.sqlalchemy_dtype`, which is overridden by each of the subtypes of {class}`~dataframely.Column`. For example, the implementation for {class}`~dataframely.Date` is {meth}`~dataframely.Date.sqlalchemy_dtype`. | ||
| ``` | ||
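The override pattern described in this note can be sketched roughly as follows. `ToyColumn` and `ToyDate` are hypothetical classes invented for this illustration; they are not `dataframely`'s actual implementation, and the only detail taken from the notes above is that `DATE` is preferred for Microsoft SQL Server.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import mssql, postgresql
from sqlalchemy.engine import Dialect


class ToyColumn:
    """Hypothetical base class: each subtype picks its own SQL type."""

    def sqlalchemy_dtype(self, dialect: Dialect) -> sa.types.TypeEngine:
        raise NotImplementedError


class ToyDate(ToyColumn):
    """Hypothetical date column with a dialect-specific override."""

    def sqlalchemy_dtype(self, dialect: Dialect) -> sa.types.TypeEngine:
        if dialect.name == "mssql":
            # Uppercase sqlalchemy types render verbatim, so this is always DATE.
            return sa.DATE()
        # Otherwise, let sqlalchemy choose the rendering for a generic date.
        return sa.Date()


rendered = {}
for dialect in (mssql.dialect(), postgresql.dialect()):
    dtype = ToyDate().sqlalchemy_dtype(dialect)
    rendered[dialect.name] = str(dtype.compile(dialect=dialect))

print(rendered)
```

Passing the dialect into the method is what lets a single column type make this kind of per-database decision.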
|
|
||
| ```{note} | ||
| **Constraints:** The nullability and primary key constraints you define in `dataframely` are translated to SQL. Custom filters and rules are not. | ||
|
||
| ``` | ||
|
|
||
| ```{note} | ||
| **Length of string columns:** For string columns, `dataframely` attempts to pass the maximum length on to the SQL definition. This is trivial if `max_length` is set. Otherwise, if a `regex` is provided, | ||
| the maximum length of the string is inferred from the regular expression where possible. Note that an inferable | ||
| maximum length can be particularly important for primary key columns: some database systems, such as Microsoft SQL Server, do not allow `VARCHAR(max)` columns (unbounded strings) to be used as primary keys. | ||
| ``` | ||
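To give an idea of what inferring a maximum length from a regular expression can look like, here is a rough sketch. It is not `dataframely`'s implementation: it uses the pattern parser from Python's own `re` module (a private CPython module, so it may change between Python versions) as a stand-in, and it assumes the pattern has to match the entire string. The function name `max_length_from_regex` is hypothetical.

```python
from typing import Optional

try:  # Python 3.11 and later
    from re import _parser as sre_parser
except ImportError:  # older Python versions
    import sre_parse as sre_parser


def max_length_from_regex(pattern: str) -> Optional[int]:
    """Upper bound on the length of a match, or None if it is unbounded."""
    _, upper = sre_parser.parse(pattern).getwidth()
    # Unbounded repetition (`*`, `+`, `{n,}`) shows up as a width of at
    # least MAXREPEAT.
    return None if upper >= sre_parser.MAXREPEAT else upper


bounded = max_length_from_regex(r"[A-Z]{3}-[0-9]{1,4}")
unbounded = max_length_from_regex(r"[a-z]+")
print(bounded, unbounded)
```

A pattern such as `[A-Z]{3}-[0-9]{1,4}` has a finite bound and could therefore map to a bounded `VARCHAR`, whereas `[a-z]+` has no bound and would leave the column unbounded.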
|
|
||
| ## Collections of multiple tables | ||
|
|
||
| If you have an entire `dy.Collection`, it is just as easy to generate one table for each member of the collection. | ||
| `sqlalchemy.MetaData` is a commonly used container in such scenarios: | ||
|
|
||
| ```python | ||
| MyCollection: type[dy.Collection] | ||
| meta = sa.MetaData() | ||
| for name, dy_schema in MyCollection.member_schemas().items(): | ||
|     sa.Table( | ||
|         name, | ||
|         meta, | ||
|         *dy_schema.sql_schema(dialect=engine.dialect), | ||
|     ) | ||
| # An unbound MetaData needs to be told where to create the tables. | ||
| meta.create_all(engine) | ||
| ``` | ||
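The same loop can be tried end to end without `dataframely` installed. In the sketch below, the `members` dictionary is a hypothetical stand-in for `MyCollection.member_schemas()` with hand-written columns, and the in-memory SQLite engine is an assumption chosen so that the example needs no database server.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///:memory:")

# Hypothetical stand-in for the member schemas: one column list per table name.
members = {
    "first": [
        sa.Column("x", sa.BigInteger, primary_key=True, autoincrement=False),
    ],
    "second": [
        sa.Column("x", sa.BigInteger, primary_key=True, autoincrement=False),
        sa.Column("y", sa.String, nullable=False),
    ],
}

meta = sa.MetaData()
for name, columns in members.items():
    # Registering the table on `meta` is enough; no need to keep a reference.
    sa.Table(name, meta, *columns)

# Issue CREATE TABLE for every table registered on the metadata object.
meta.create_all(engine)

created = sorted(sa.inspect(engine).get_table_names())
print(created)
```

Collecting all tables in one `MetaData` object means a single `create_all` call creates every member table.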