The class should be useable by SQLAlchemy applications in two ways: as a statement generator or as a statement consumer. Python applications should be able to provide statements into the query generator as ‘tables’ or should be able to get the generated statement (also influence statement generation) and use the generated star/snowflake for whatever they want. For example for materialization, use in other advanced, more complex customized queries, schema auditing or validation, value auditing, searching, indexing... In other words, output from the query generator should be useable denormalized star/snowflake with well defined interface.
Proposal:
class StarSchema(object):
Represents a star/snowflake table schema. Attributes:
name – user specific name of the star schema, used for the schemaidentification, debug purposes and logging. Has no effect on the execution or statement composition.
metadata is a SQLAlchemy metadata object where the snowflake tables are described.
attributes is a dictionary of snowflake attributes. The keys are attribute names, the values can be strings, dictionaries or objects with specific attributes (see below)
joins is a list of join specification (see below)
tables are SQL Alchemy selectables (tables or statements) that are referenced in the attributes. This dictionary is looked-up first before the actual metadata. Only table name has to be specified and database schema should not be used in this case.
schema – default database schema containing tables
The columns can be specified as:
- a string with format:
column, table.column or schema.table.column. When no table is specified, then the fact table is considered.
- as a list of arguments
[[schema,] table,] column
StarColumn or any object with attributes schema, table, column, extract, unary can be used.
- a dictionary with keys same as the attributes of
StarColumn object
Non-object arguments will be stored as a StarColumn objects internally.
The joins can be specified as a list of:
- tuples of column specification in form of (
master, detail)
- a dictionary with keys or object with attributes:
master, detail,
alias and method.
master is a specification of a column in the master table (fact) and detail is a specification of a column in the detail table (usually a dimension). alias is an alternative name for the detail table to be
joined.
The method can be: match – LEFT INNER JOIN, master – LEFT OUTER JOIN or detail – RIGHT OUTER JOIN.
Note: The class should be cubes-independent, despite residing in the cubes repository. It should not use Cubes model objects. Reason is ability to separate it in the future when complexity of the SQL backend grows into extend where it would not be maintainable within the Cubes library.
The class should be useable by SQLAlchemy applications in two ways: as a statement generator or as a statement consumer. Python applications should be able to provide statements into the query generator as ‘tables’ or should be able to get the generated statement (also influence statement generation) and use the generated star/snowflake for whatever they want. For example for materialization, use in other advanced, more complex customized queries, schema auditing or validation, value auditing, searching, indexing... In other words, output from the query generator should be useable denormalized star/snowflake with well defined interface.
Proposal:
Represents a star/snowflake table schema. Attributes:
name– user specific name of the star schema, used for the schemaidentification, debug purposes and logging. Has no effect on the execution or statement composition.metadatais a SQLAlchemy metadata object where the snowflake tables are described.attributesis a dictionary of snowflake attributes. The keys are attribute names, the values can be strings, dictionaries or objects with specific attributes (see below)joinsis a list of join specification (see below)tablesare SQL Alchemy selectables (tables or statements) that are referenced in the attributes. This dictionary is looked-up first before the actual metadata. Only table name has to be specified and database schema should not be used in this case.schema– default database schema containing tablesThe columns can be specified as:
column,table.columnorschema.table.column. When no table is specified, then the fact table is considered.[[schema,] table,] columnStarColumnor any object with attributesschema,table,column,extract,unarycan be used.StarColumnobjectNon-object arguments will be stored as a
StarColumnobjects internally.The joins can be specified as a list of:
master,detail)master,detail,aliasandmethod.masteris a specification of a column in the master table (fact) anddetailis a specification of a column in the detail table (usually a dimension).aliasis an alternative name for thedetailtable to bejoined.
The
methodcan be:match–LEFT INNER JOIN,master–LEFT OUTER JOINordetail–RIGHT OUTER JOIN.Note: The class should be cubes-independent, despite residing in the cubes repository. It should not use Cubes model objects. Reason is ability to separate it in the future when complexity of the SQL backend grows into extend where it would not be maintainable within the Cubes library.