# Basics 8: Create a new Power BI push dataset

Unfortunately, Power BI Service cannot use a PostgreSQL database directly as an online data source. Connecting to PostgreSQL requires a separate Power BI gateway (the on-premises data gateway) installed on a Windows server. However, we can work around this limitation by creating a Power BI push dataset and pushing data from our data pipeline directly to Power BI Service.
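
Pushing data comes down to an authenticated HTTP POST against the Power BI REST API "add rows" endpoint (`.../groups/{groupId}/datasets/{datasetId}/tables/{tableName}/rows`, with a `{"rows": [...]}` body). The sketch below only builds such a request with the Python standard library and does not send it. The function name `build_add_rows_request`, the placeholder ids and token, and the example row are hypothetical; in this notebook the real work is done by the project's `bi` helpers.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_add_rows_request(token, group_id, dataset_id, table_name, rows):
    """Build (but do not send) a POST request that appends rows to a push dataset table."""
    # Table names may contain spaces (e.g. "public fact_sales"), so URL-encode them
    table = urllib.parse.quote(table_name)
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/tables/{table}/rows"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical placeholder values, for illustration only
request = build_add_rows_request(
    "my-token", "my-group-id", "my-dataset-id", "public dim_stores",
    [{"key": 1, "name": "Helsinki"}],
)
print(request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```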

## Step 1: Create the Power BI push dataset

Execute the following code:

In [None]:
import json
import os
import pandas as pd
import requests

# Load generic helper functions
%run ../../common/jupyter.ipynb
import src_common_database as db
import src_common_powerbi as bi

# Define database connection
database = db.create_engine()

# Define database tables that we need to replicate in Power BI
tables = ['dim_dates', 'dim_products', 'dim_stores', 'fact_sales']

# Read Power BI group id (workspace id) from an environment variable
group_id = os.environ['POWERBI_GROUP_ID']

# Create Power BI dataset schema based on our existing database tables
dataset_schema = {
    "name": "Sales",
    "defaultMode": "Push",
    "tables": [bi.as_powerbi_table_schema(table, database) for table in tables],
    # Foreign key references of the fact_sales table
    "relationships": [
        {
            "name": "Sale date",
            "fromTable": "public fact_sales",
            "fromColumn": "date_key",
            "toTable": "public dim_dates",
            "toColumn": "key"
        },
        {
            "name": "Sale store",
            "fromTable": "public fact_sales",
            "fromColumn": "store_key",
            "toTable": "public dim_stores",
            "toColumn": "key"
        },
        {
            "name": "Sale product",
            "fromTable": "public fact_sales",
            "fromColumn": "product_key",
            "toTable": "public dim_products",
            "toColumn": "key"
        }
    ]
}

# OPTIONAL: Add additional definitions like dataCategory and summarizeBy for columns, etc. See:
# https://docs.microsoft.com/en-us/power-bi/developer/automation/api-dataset-properties

# DEBUG: Print the schema
print(json.dumps(dataset_schema, indent=2))
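
`bi.as_powerbi_table_schema` comes from the project's shared helpers, and its implementation is not shown here. Conceptually, it has to turn each database column into a push dataset column with a Power BI data type (`Int64`, `Double`, `Boolean`, `DateTime` or `String`). The sketch below illustrates that idea. It assumes the column names and PostgreSQL type names have already been read from the database (for example from `information_schema.columns`); the function `as_table_schema` and the type mapping are hypothetical. The `"public "` prefix matches the table names used in the relationships above.

```python
# Hypothetical mapping from PostgreSQL column types to push dataset data types
PG_TO_POWERBI = {
    "smallint": "Int64",
    "integer": "Int64",
    "bigint": "Int64",
    "numeric": "Double",
    "real": "Double",
    "double precision": "Double",
    "boolean": "Boolean",
    "date": "DateTime",
    "timestamp without time zone": "DateTime",
    "timestamp with time zone": "DateTime",
}


def as_table_schema(table_name, columns, prefix="public "):
    """Build a push dataset table definition from (column_name, postgres_type) pairs."""
    return {
        "name": prefix + table_name,
        "columns": [
            # Fall back to String for types not listed in the mapping
            {"name": name, "dataType": PG_TO_POWERBI.get(pg_type, "String")}
            for name, pg_type in columns
        ],
    }


schema = as_table_schema("dim_stores", [("key", "integer"), ("name", "text")])
print(schema)
```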

In [None]:
# Get access token for accessing Power BI API
app = bi.get_app()
api_headers = bi.get_api_headers(app)

# Create a new Power BI Push dataset based on the schema
dataset_id = bi.create_dataset(api_headers, group_id, dataset_schema)

# Print dataset id so that we can use it later
print("dataset_id: " + dataset_id)
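
The internals of `bi.create_dataset` are not shown here either. A dataset like this one can be created with the REST API "Post Dataset In Group" operation (`POST .../groups/{groupId}/datasets`), whose JSON response describes the new dataset, including its `id`. The following is a minimal standard-library sketch of that call. It assumes `api_headers` is a plain dict holding an `Authorization` header; the function names are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_create_dataset_request(api_headers, group_id, dataset_schema):
    """Build a 'Post Dataset In Group' request for a push dataset schema."""
    return urllib.request.Request(
        f"{API_ROOT}/groups/{group_id}/datasets",
        data=json.dumps(dataset_schema).encode("utf-8"),
        method="POST",
        headers={**api_headers, "Content-Type": "application/json"},
    )


def create_dataset(api_headers, group_id, dataset_schema):
    """Send the request and return the id of the newly created dataset."""
    request = build_create_dataset_request(api_headers, group_id, dataset_schema)
    with urllib.request.urlopen(request) as response:
        # The response body describes the created dataset, including its "id"
        return json.load(response)["id"]


# Build the request without sending it (hypothetical placeholder values)
request = build_create_dataset_request(
    {"Authorization": "Bearer my-token"}, "my-group-id",
    {"name": "Sales", "defaultMode": "Push", "tables": []},
)
print(request.full_url)  # https://api.powerbi.com/v1.0/myorg/groups/my-group-id/datasets
```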

In [None]:
# Copy all data from our database to the Power BI dataset
dataset = bi.PowerBIDataset(api_headers, group_id, dataset_id, table_name_prefix="public ")
for table in tables:
    dataset.copy_table_data(table_name=table, order_by='key', database=database)

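> WARNING: The Power BI REST API accepts at most 1,000,000 new rows per hour per push dataset and at most 10,000 rows per single POST request. If your tables contain more than 1 million rows in total, you cannot push all of them within one hour.
> See [Power BI REST API limitations](https://docs.microsoft.com/en-us/power-bi/developer/automation/api-rest-api-limitations).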
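
`dataset.copy_table_data` belongs to the project's helpers and may already batch its requests. If you push data yourself, you need to split rows into request-sized batches, and it helps to know up front how long a large initial load will take at minimum. The sketch below does both. It assumes the two limits from the warning above (10,000 rows per POST request, 1,000,000 new rows per hour per dataset); the constant and function names are hypothetical.

```python
import math

MAX_ROWS_PER_REQUEST = 10_000     # per-request limit for POST rows
MAX_ROWS_PER_HOUR = 1_000_000     # hourly limit of new rows per dataset


def chunks(rows, size=MAX_ROWS_PER_REQUEST):
    """Yield consecutive slices of at most `size` rows, one slice per POST request."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def min_hours_to_push(row_count):
    """Minimum number of one-hour windows needed to push `row_count` rows."""
    return math.ceil(row_count / MAX_ROWS_PER_HOUR)


rows = [{"key": i} for i in range(25_000)]
print([len(batch) for batch in chunks(rows)])  # [10000, 10000, 5000]
print(min_hours_to_push(2_500_000))  # 3
```

The hourly limit is a lower bound on elapsed time only. A real loader would also need to pace its requests and retry on throttling responses.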

## Step 2: Configure dataset id as an environment variable

We configure the dataset id as an environment variable so that our data pipeline can push data to the dataset:

1. Configure **dataset id** in **docker-compose.yaml** for local development:

```yaml
  data-pipeline-template-worker:
    environment:
      POWERBI_DATASET_ID: 2a240645-4c88-454d-a54c-3f1c91f4a25f
  data-pipeline-template-lab:
    environment:
      POWERBI_DATASET_ID: 2a240645-4c88-454d-a54c-3f1c91f4a25f
```

2. Optional: Configure **dataset id** in `scripts/helm.yaml` for Kubernetes:

```yaml
    worker:
      env:
        POWERBI_DATASET_ID: 2a240645-4c88-454d-a54c-3f1c91f4a25f
```

3. Stop the containers with **Ctrl+C** and start them again with `taito start`. The configuration changes take effect only after the restart.

TIP: You can find more instructions on environment variables in [Taito CLI documentation](https://taitounited.github.io/taito-cli/tutorial/06-env-variables-and-secrets/).
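
Inside the worker, the pipeline code can then read the dataset id from the environment, the same way this notebook reads `POWERBI_GROUP_ID`. The sketch below shows this; `get_powerbi_dataset_id` is a hypothetical helper name.

```python
import os


def get_powerbi_dataset_id():
    """Return the push dataset id configured for this container."""
    dataset_id = os.environ.get("POWERBI_DATASET_ID")
    if not dataset_id:
        # Fail early with a clear message instead of pushing to an unknown dataset
        raise RuntimeError(
            "POWERBI_DATASET_ID is not set; configure it and restart the containers"
        )
    return dataset_id


# For demonstration only: normally docker-compose or Helm sets this variable
os.environ["POWERBI_DATASET_ID"] = "2a240645-4c88-454d-a54c-3f1c91f4a25f"
print(get_powerbi_dataset_id())
```

Raising an error when the variable is missing makes a forgotten restart or a typo in the configuration visible immediately, instead of failing later inside an API call.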

## Step 3: Republish the Power BI report with the new Power BI dataset

1. Open your report in **Power BI Desktop**.
2. Save the report with a new name by selecting **File -> Save As**.
3. Select **Transform data -> Transform data**.
4. Select all queries in the left pane (hold down the **Ctrl** key and click each of them).
5. Right-click the selected queries and select **Delete**.
6. Close the view by selecting **Close & Apply**.
7. Select **Power BI datasets**.
8. Select your Power BI push dataset from the list of datasets.
9. Save your report with **File -> Save**.
10. Publish your report by selecting **File -> Publish**.

## Next lesson: [Basics 9: Keep Power BI dataset up-to-date](09.ipynb)