# OLAP Schemas

1. Analytics databases

A. Optimize for queries, not transactions
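* Tables with more columns are fine, because wider tables minimize joins and make queries faster.
* We care less about the cost of updating the tables, because users do not write to the analytics database directly.

B. The stakeholders using the analytics database may be non-technical
* A simpler schema with fewer tables is easier for them to understand and query.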


The OLTP schema we start from:

> <img src="./pagila_schema.png" width="90%">

### Problems with OLTP

In an analytical context, the priorities of an OLTP schema no longer fit:

* We don't have to worry about a large number of inserts or updates to the database, as users will not interact with it directly.

* We optimize for the *queries* needed to explore and answer questions about our data.
    * Joins are time intensive, and a normalized schema requires many of them.

* The users of an analytics database may also be less technical.

So in an analytical context, we no longer need to optimize for inserts and updates, where smaller tables with few columns perform well.  Instead, our use case is *querying* our data, where we prefer larger tables that avoid costly joins.  In addition, a simpler table structure (with fewer tables) makes for a more understandable schema, which is good for business users performing queries on the database.
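As a rough illustration of this tradeoff, the sketch below answers the same question (how many stores are in each city?) against a normalized layout and against a single wider table. It uses Python's built-in `sqlite3` module, and all table names, column names, and rows are made up for the example; they are not taken from the schema pictured above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (OLTP-style): each value stored once, spread over small tables.
CREATE TABLE city    (city_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE address (address_id INTEGER PRIMARY KEY, zip TEXT,
                      city_id INTEGER REFERENCES city(city_id));
CREATE TABLE store   (store_id INTEGER PRIMARY KEY,
                      address_id INTEGER REFERENCES address(address_id));

-- Denormalized (OLAP-style): one wider table, city names may repeat.
CREATE TABLE store_wide (store_id INTEGER PRIMARY KEY, zip TEXT, city TEXT);

-- Hypothetical sample rows.
INSERT INTO city VALUES (1, 'Brooklyn'), (2, 'Austin');
INSERT INTO address VALUES (10, '11201', 1), (11, '11215', 1), (12, '78701', 2);
INSERT INTO store VALUES (100, 10), (101, 11), (102, 12);
INSERT INTO store_wide VALUES (100, '11201', 'Brooklyn'),
                              (101, '11215', 'Brooklyn'),
                              (102, '78701', 'Austin');
""")

# Normalized layout: two joins to get from a store to its city name.
normalized = conn.execute("""
    SELECT city.city, COUNT(*)
    FROM store
    JOIN address ON store.address_id = address.address_id
    JOIN city ON address.city_id = city.city_id
    GROUP BY city.city
    ORDER BY city.city
""").fetchall()

# Wide layout: the same answer with no joins at all.
wide = conn.execute("""
    SELECT city, COUNT(*) FROM store_wide GROUP BY city ORDER BY city
""").fetchall()

print(normalized)
print(wide)
```

Both queries return the same rows. The wide version is shorter to write, at the cost of storing `'Brooklyn'` twice.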

### The Star Schema - What is it?

The star schema is a common layout for mart tables (a data mart):

> <img src="./star_schemad_movies.png" width="60%">

The fact table records the event that is occurring.

Here we place sales (previously rentals) directly in the center of the schema.  The surrounding tables answer questions of who, what, where, and when.  We can think of the star schema as consisting of two kinds of tables: a fact table at the center, and dimension tables branching out of the fact table.

* The fact table

Generally, the fact table records the **event** that we are trying to measure, and often optimize.  Here that event is a sale.
* `price` provides the measurement.

The remaining columns point to the context: who, what, where, and when the sale occurred.

* The dimension tables

These hold more descriptive attributes, often of type text.
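To make the two kinds of tables concrete, here is a minimal sketch of a star schema built in an in-memory SQLite database. Only `sales`, `price`, and `dates` are named in this lesson; the other table and column names are assumptions chosen for illustration and may not match the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, mostly text (names are illustrative).
CREATE TABLE dates     (date_id INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE films     (film_id INTEGER PRIMARY KEY, title TEXT, rating TEXT);
CREATE TABLE stores    (store_id INTEGER PRIMARY KEY, city TEXT, manager TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- Fact table: one row per sale, holding a numeric measure (price)
-- plus one foreign key per dimension (when, what, where, who).
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    price       REAL,
    date_id     INTEGER REFERENCES dates(date_id),
    film_id     INTEGER REFERENCES films(film_id),
    store_id    INTEGER REFERENCES stores(store_id),
    customer_id INTEGER REFERENCES customers(customer_id)
);
""")

# List the tables the fact table points to; column 2 of each
# PRAGMA foreign_key_list row is the referenced table name.
referenced = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(sales)"))
print(referenced)
```

Every dimension is referenced directly from `sales`, which is what gives the schema its star shape.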

### The Star Schema - But Why?

With the star schema, any information we want is just one table away from the fact table.  This makes our schema more understandable and our queries less costly to perform.
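A small sketch of what "one table away" looks like in practice: with a fact table in the center, summing `price` by any dimension attribute takes a single join. The schema, the sample rows, and the `revenue_by` helper below are all hypothetical, written only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE films  (film_id INTEGER PRIMARY KEY, title TEXT, rating TEXT);
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT, manager TEXT);
CREATE TABLE sales  (sale_id INTEGER PRIMARY KEY, price REAL,
                     film_id INTEGER REFERENCES films(film_id),
                     store_id INTEGER REFERENCES stores(store_id));

-- Hypothetical sample rows.
INSERT INTO films  VALUES (1, 'Film A', 'PG'), (2, 'Film B', 'R');
INSERT INTO stores VALUES (1, 'Brooklyn', 'Manager One'), (2, 'Austin', 'Manager Two');
INSERT INTO sales  VALUES (1, 3.0, 1, 1), (2, 5.0, 2, 1), (3, 3.0, 1, 2);
""")

def revenue_by(dimension, column, key):
    """Sum sales.price grouped by one dimension attribute, using a single join.

    The identifiers are interpolated into the SQL, so they must come from
    trusted code like the calls below, never from user input.
    """
    sql = f"""
        SELECT {dimension}.{column}, SUM(sales.price)
        FROM sales
        JOIN {dimension} ON sales.{key} = {dimension}.{key}
        GROUP BY {dimension}.{column}
        ORDER BY {dimension}.{column}
    """
    return conn.execute(sql).fetchall()

by_rating = revenue_by("films", "rating", "film_id")
by_city = revenue_by("stores", "city", "store_id")
print(by_rating)
print(by_city)
```

The query has the same shape whichever dimension we choose, which is part of what makes the schema approachable for less technical users.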

### Comparing the OLTP and OLAP schemas

Let's take another look at the star schema, considering some of the rules we needed to relax to get to this structure.

> <img src="./star-schema-pagila-updated.png" width="60%">

So the structure above is obviously simpler -- but with the star schema, we now have a structure that will repeat information.  

* The entirety of the `dates` table, to begin with, is not normalized.  Every column is derived from the date itself, which violates *single source of truth*.
* We collapsed the film-store relationship so that we no longer track inventory.  This is because *sales* is the center of our queries.  We care about the movies involved in a sale, not about inventory.
* We moved the manager information about a store to the same table as the store itself.
* And we no longer give city, state, and zip their own table.  This will condense the number of tables, with a tradeoff of duplicating values across our tables.

So in summary, we will not have a *single source of truth* with our OLAP schema.  But by relaxing this restriction, we have fewer tables, and each table is just one step removed from the fact table of sales.  This should make our queries both simpler to write, and less expensive.
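To illustrate the point about the `dates` table, the sketch below builds rows for a date dimension in plain Python. Every column is computed from the date itself, which is exactly the redundancy a normalized schema would avoid. The column names and the `build_date_rows` helper are assumptions for this example, not the columns shown in the diagram.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_date_rows(start, days):
    """Return one dict per day, with every column derived from the date."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_id": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20240105
            "date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day": d.day,
            "day_of_week": DAY_NAMES[d.weekday()],  # weekday(): Monday is 0
            "is_weekend": d.weekday() >= 5,
        })
    return rows

rows = build_date_rows(date(2024, 1, 5), 3)
for row in rows:
    print(row)
```

Storing these derived values means a query can filter on something like `is_weekend` directly, without any date arithmetic.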

### Summary

In this lesson, we saw the differences we get when moving from an OLTP to an OLAP structure.  With the OLAP structure, we use the star schema.  Here we have a fact table at the center, usually describing an event.  Every column of the fact table is either a numeric value or a foreign key to a dimension table.  Our dimension tables provide context to the event, describing information about who, what, where, and when the event occurred.  
We can see that the star schema makes our information simpler and easier to query.  This is preferable when we are no longer making many updates to our data, and so do not need to optimize for the speed of transactions or for maintaining a single source of truth.  Instead, we prefer to make our queries easier and faster to perform.  Because the star schema has fewer tables to join together, it generally achieves this.

### Resources

* [Snowflake Schema](https://en.wikipedia.org/wiki/Snowflake_schema)
* [Star Schema](https://en.wikipedia.org/wiki/Star_schema)