# Basics 3: Add a new dimension table to the database and create a data pipeline for it

In this lesson we add a new dimension table to our data model. The new **dim_stores** dimension describes the store where the sale was made. It also contains location information: postal code, city, and country. We could model location as a separate dimension, which would make it more reusable, but this time we include the location attributes directly in the **dim_stores** dimension.

## Step 1: Add a new database migration

1. Execute `taito db add dim_stores`.
2. Add the following content to the newly created files (**database/deploy/dim_stores.sql**, **database/revert/dim_stores.sql**, and **database/verify/dim_stores.sql**).

```sql
-- Deploy dim_stores to pg

BEGIN;

CREATE TABLE dim_stores (
  key text PRIMARY KEY,
  name text NOT NULL,
  postal_code text NOT NULL,
  city text NOT NULL,
  country text NOT NULL
);

CREATE VIEW load_stores AS SELECT * FROM dim_stores;

CREATE OR REPLACE FUNCTION load_stores() RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO dim_stores VALUES (NEW.*)
  ON CONFLICT (key) DO
    UPDATE SET
      name = EXCLUDED.name,
      postal_code = EXCLUDED.postal_code,
      city = EXCLUDED.city,
      country = EXCLUDED.country;
  RETURN new;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER load_stores
INSTEAD OF INSERT ON load_stores
FOR EACH ROW EXECUTE PROCEDURE load_stores();

COMMIT;
```

```sql
-- Revert dim_stores from pg

BEGIN;

DROP TRIGGER load_stores ON load_stores;
DROP FUNCTION load_stores;
DROP VIEW load_stores;
DROP TABLE dim_stores;

COMMIT;
```

```sql
-- Verify dim_stores on pg

BEGIN;

SELECT key FROM load_stores LIMIT 1;
SELECT key FROM dim_stores LIMIT 1;

ROLLBACK;
```

3. Deploy the new database migration to the local database with `taito db deploy`.
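
The deploy script routes every insert on the **load_stores** view through a trigger that inserts a new row or updates the existing row with the same key. The following minimal sketch illustrates that behaviour. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, so the trigger syntax differs from the plpgsql version, and the trigger name `load_stores_upsert` is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database of the lesson.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_stores (
  key text PRIMARY KEY,
  name text NOT NULL,
  postal_code text NOT NULL,
  city text NOT NULL,
  country text NOT NULL
);

CREATE VIEW load_stores AS SELECT * FROM dim_stores;

-- SQLite equivalent of the INSTEAD OF INSERT trigger in the deploy script
CREATE TRIGGER load_stores_upsert
INSTEAD OF INSERT ON load_stores
BEGIN
  INSERT INTO dim_stores (key, name, postal_code, city, country)
  VALUES (NEW.key, NEW.name, NEW.postal_code, NEW.city, NEW.country)
  ON CONFLICT (key) DO UPDATE SET
    name = excluded.name,
    postal_code = excluded.postal_code,
    city = excluded.city,
    country = excluded.country;
END;
""")

insert = (
    "INSERT INTO load_stores (key, name, postal_code, city, country) "
    "VALUES (?, ?, ?, ?, ?)"
)

# First load creates the row.
conn.execute(insert, ("Finland - Super Shop", "Super Shop", "00100", "Helsinki", "Finland"))
# Second load with the same key updates the row instead of failing.
conn.execute(insert, ("Finland - Super Shop", "Super Shop", "00120", "Helsinki", "Finland"))

rows = conn.execute("SELECT key, postal_code FROM dim_stores").fetchall()
print(rows)
```

Because the view accepts repeated inserts for the same key, a pipeline can reload the full source file on every run without first deleting existing rows.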

## Step 2: Create CSV files for districts and stores, and upload them to the bucket

1. Create a districts.csv file with the following content:

```excel
Postal Code,City,Country
Unknown,Unknown,Unknown
00100,Helsinki,Finland
11122,Stockholm,Sweden
```

2. Create a stores.csv file with the following content:

```excel
Name,Postal Code
Unknown,Unknown
Super Shop,00100
Super Shop,11122
```

3. Upload both files to the root folder of the storage bucket.

## Step 3: Load the CSV files to the database

Execute the following code:

```python
# Imports
import os
import pandas as pd

# Load generic helper functions
%run ../../common/jupyter.ipynb
import src_common_database as db
import src_common_storage as st
import src_common_util as util

# Read CSV files from the storage bucket
bucket = st.create_storage_bucket_client(os.environ['STORAGE_BUCKET'])
districts_csv = bucket.get_object_contents("/districts.csv")
stores_csv = bucket.get_object_contents("/stores.csv")

# Read CSV data into Pandas dataframes
districts_df = pd.read_csv(districts_csv)
stores_df = pd.read_csv(stores_csv)

# Merge stores with districts on the shared postal code column
df = pd.merge(stores_df, districts_df, on='Postal Code')

# Change dataframe schema to match the database table
db_df = df.rename(
    columns = {
        'Name': 'name',
        'Postal Code': 'postal_code',
        'City': 'city',
        'Country': 'country',
    },
    inplace = False
)

# Generate a unique key by concatenating country and name
db_df["key"] = db_df["country"] + " - " + db_df["name"]

# Write the data to the "load_stores" view
database = db.create_engine()
db_df.to_sql('load_stores', con=database, if_exists='append', index=False)

# DEBUG: Show the data stored in dim_stores. Your manual data changes should have been overwritten.
pd.read_sql('select * from dim_stores', con=database).style
```

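
To see what the merge and key generation produce without a storage bucket or database, the transformation can be sketched on the sample data from Step 2. This is an illustrative sketch only: it reads the CSV content from in-memory strings rather than the bucket, and passing `dtype=str` is an added assumption that keeps postal codes such as `00100` as text.

```python
import io

import pandas as pd

# Sample data from Step 2, held in memory instead of the storage bucket
districts_csv = io.StringIO(
    "Postal Code,City,Country\n"
    "Unknown,Unknown,Unknown\n"
    "00100,Helsinki,Finland\n"
    "11122,Stockholm,Sweden\n"
)
stores_csv = io.StringIO(
    "Name,Postal Code\n"
    "Unknown,Unknown\n"
    "Super Shop,00100\n"
    "Super Shop,11122\n"
)

# Assumption: read every column as text so leading zeros are preserved
districts_df = pd.read_csv(districts_csv, dtype=str)
stores_df = pd.read_csv(stores_csv, dtype=str)

# Inner join: each store gets the city and country of its postal code
df = pd.merge(stores_df, districts_df, on="Postal Code")

db_df = df.rename(
    columns={
        "Name": "name",
        "Postal Code": "postal_code",
        "City": "city",
        "Country": "country",
    }
)

# Two stores share the name "Super Shop", so the country makes the key unique
db_df["key"] = db_df["country"] + " - " + db_df["name"]

print(db_df[["key", "postal_code", "city", "country"]])
```

Note that the key is only unique as long as a store name appears at most once per country; two stores with the same name in the same country would need an additional attribute in the key.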

## Step 4: Add a dim_stores reference to the fact_sales table

This time you cannot just add the new column to the existing fact_sales migration files, because the fact_sales migration was created before the dim_stores migration, so the referenced table would not yet exist when fact_sales is deployed. However, if you want to avoid creating a new migration just for one new column, you can do the following:

1. Move the dim_stores migration one step up in **database/sqitch.plan** so that it will be executed before fact_sales.

2. Add the new store_key column to the **database/deploy/fact_sales.sql** file:

```sql
CREATE TABLE fact_sales (
  ...
  store_key text NOT NULL REFERENCES dim_stores (key),
  ...
);
```

3. Add at least one example store to the **database/data/dev.sql** file. Add the stores before the fact_sales rows, because each sale must reference an existing store.

4. Add a store_key value for each example sale defined in **database/data/dev.sql**.

5. Redeploy all database migrations and example data with `taito init --clean`. Redeploy is required because you altered the sqitch.plan order instead of creating a new ALTER TABLE database migration.
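
The ordering requirements in the steps above follow from the foreign key: a sale can only be inserted once the store it references exists. The sketch below illustrates this with an in-memory SQLite database as a stand-in for PostgreSQL. The two-column `fact_sales` table, the sale key `sale-1`, and the `insert_sale` helper are hypothetical simplifications, since the real table has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_stores (
  key text PRIMARY KEY,
  name text NOT NULL
);

-- Hypothetical, reduced fact_sales table with only the new reference
CREATE TABLE fact_sales (
  key text PRIMARY KEY,
  store_key text NOT NULL REFERENCES dim_stores (key)
);
""")


def insert_sale(sale_key, store_key):
    """Try to insert a sale; return False if the foreign key check rejects it."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (sale_key, store_key))
        return True
    except sqlite3.IntegrityError:
        return False


# The store does not exist yet, so the sale is rejected
sale_before_store = insert_sale("sale-1", "Finland - Super Shop")

# After the store is added, the same sale is accepted
conn.execute("INSERT INTO dim_stores VALUES (?, ?)", ("Finland - Super Shop", "Super Shop"))
sale_after_store = insert_sale("sale-1", "Finland - Super Shop")

print(sale_before_store, sale_after_store)
```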

## Step 5 (optional): Generate database documentation

1. Generate database documentation with `taito db generate`.

2. Open the `docs/database/index.html` file with your web browser. Note that your code editor may not display these files, as they are listed in .gitignore.

3. Browse to **Relationships**.

As you can see, our database model is based on [star schema](https://en.wikipedia.org/wiki/Star_schema).


## Next lesson: [Basics 4 - Create a dataset view on top of star schema](04.ipynb)