Research persisting to DB with Pandas/Dask/Prefect #1399

zaneselvans · 2022-01-13T15:15:37Z

Persisting dfs / Prefect Results to DB:

For writing to SQL we are already explicitly specifying the column types for the database, using dtypes from the SQLAlchemy metadata object that’s generated by Package.to_sql():

# Load any tables that exist in our dictionary of dataframes into the
# corresponding table in the newly created database:
for table in md.sorted_tables:
    dfs[table.name].to_sql(
        table.name,
        engine,
        if_exists="append",
        index=False,
        dtype={c.name: c.type for c in table.columns},
    )

Pandas added a dtype argument to read_sql_query() in v1.3.0; the read_sql() wrapper and the read_sql_table() function do not have it. So if we are using an SQL query rather than trying to read the whole table, we can specify what data types we get in the resulting dataframe using the same metadata structures we’ve already defined, provided the SQLAlchemy column types are translated into pandas dtypes.
This would mean defining metadata for the intermediary tables and columns as well (anything that’s going to get persisted and read back out). But if the intent is for these tables to stick around for reference and re-use, then that’s something we’d already be doing.
For many of these additional tables we would not need to have descriptions, foreign key relationships, primary keys etc. – those would only make sense in the context of the “real” normalized database tables.
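As a rough sketch of that round trip (not how PUDL actually does it): the table name `plants_tmp`, its columns, the `SA_TO_PANDAS` mapping and the `pandas_dtypes()` helper below are all hypothetical, and an in-memory SQLite database stands in for the real one. The intermediate table is defined with column types only, no keys or descriptions, and the same metadata drives both the write and the read.

```python
import pandas as pd
import sqlalchemy as sa

# Hypothetical mapping from SQLAlchemy column types to pandas dtypes.
# read_sql_query(dtype=...) expects pandas/numpy dtypes, not SQLAlchemy types.
SA_TO_PANDAS = {
    sa.Integer: "Int64",
    sa.Float: "float64",
    sa.Boolean: "boolean",
    sa.String: "string",
}


def pandas_dtypes(table):
    """Translate a table's SQLAlchemy column types into pandas dtype names."""
    dtypes = {}
    for col in table.columns:
        for sa_type, pd_dtype in SA_TO_PANDAS.items():
            if isinstance(col.type, sa_type):
                dtypes[col.name] = pd_dtype
                break
    return dtypes


# Minimal metadata for an intermediate table: column names and types only.
md = sa.MetaData()
plants = sa.Table(
    "plants_tmp",
    md,
    sa.Column("plant_id", sa.Integer),
    sa.Column("plant_name", sa.String),
    sa.Column("capacity_mw", sa.Float),
)

engine = sa.create_engine("sqlite://")  # in-memory database for illustration
md.create_all(engine)

df = pd.DataFrame(
    {
        "plant_id": [1, 2],
        "plant_name": ["A", None],
        "capacity_mw": [10.5, None],
    }
)

# Write using the SQLAlchemy types, as in the loop above.
df.to_sql(
    plants.name,
    engine,
    if_exists="append",
    index=False,
    dtype={c.name: c.type for c in plants.columns},
)

# Read back with a query, using dtypes derived from the same metadata.
out = pd.read_sql_query(
    "SELECT * FROM plants_tmp WHERE plant_id > 0",
    engine,
    dtype=pandas_dtypes(plants),
)
```

Columns whose type is missing from the mapping are simply left to pandas' default inference, so the mapping would need extending for dates, enums and so on.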
Question: is it just me or does it seem like well structured database tables with clear primary keys, constraints, foreign key relations, good normalization etc. are getting kinda kicked to the curb in the "Modern Data Stack" universe?

Compare with Prefect+dbt+SQL architecture

zaneselvans added the prefect label Jan 13, 2022

zaneselvans self-assigned this Jan 13, 2022

bendnorman self-assigned this Jan 16, 2022

zaneselvans closed this as completed Jan 27, 2022
