
# Your Lakehouse is the best Warehouse

Traditional Data Warehouses can’t keep up with the variety of data and use cases. Business agility requires reliable, real-time data, with insight from ML models.

Working with the Lakehouse unlocks traditional BI analysis as well as real-time applications with a direct connection to all of your data, while remaining fully secured.
<img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/dbsql.png" width="700px" style="float: left" />
<div style="float: left; margin-top: 240px; font-size: 23px">
  Instant, elastic compute<br>
  Lower TCO with Serverless<br>
  Zero management<br>
  Governance layer - row level<br>
  Your data. Your schema (star, data vault…)
</div>

# BI & Data Warehousing with Databricks SQL

<img style="float: right; margin-top: 10px" width="500px" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/retail/lakehouse-churn/lakehouse-retail-c360-churn-3.png" />

Our datasets are now properly ingested, secured, of high quality, and easily discoverable within our organization.

Let's explore how Databricks SQL supports your Data Analyst team with interactive BI, and start analyzing our customer churn.

To start with Databricks SQL, open the SQL view in the top-left menu.

You'll be able to:

- Create a SQL Warehouse to run your queries
- Use DBSQL to build your own dashboards
- Plug in any BI tool (Tableau, Power BI, ...) to run your analysis


## Databricks SQL Warehouses: best-in-class BI engine

<img style="float: right; margin-left: 10px" width="600px" src="https://www.databricks.com/wp-content/uploads/2022/06/how-does-it-work-image-5.svg" />

Databricks SQL is a warehouse engine packed with thousands of optimizations to provide you with the best performance for all your tools, query types and real-world applications. <a href='https://www.databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html'>It holds the Data Warehousing Performance Record.</a>

This includes the next-generation vectorized query engine Photon, which, together with SQL warehouses, provides up to 12x better price/performance than other cloud data warehouses.

**Serverless warehouses** provide instant, elastic SQL compute, decoupled from storage, and scale automatically to provide unlimited concurrency without disruption for high-concurrency use cases.

Make no compromise. Your best Data Warehouse is a Lakehouse.

### Creating a SQL Warehouse

SQL Warehouses are managed by Databricks. [Creating a warehouse](/sql/warehouses) is a one-click step.


## Creating your first Query

<img style="float: right; margin-left: 10px" width="600px" src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-dbsql-query.png" />

Our users can now start running SQL queries using the SQL editor and add new visualizations.

By leveraging auto-completion and the schema browser, we can start running ad hoc queries on top of our data.

While this is ideal for Data Analysts to start analyzing our customer churn, other personas can also use DBSQL to track our data ingestion pipeline, data quality, model behavior, and so on.

Open the [Queries menu](/sql/queries) to start writing your first analysis.

In [0]:
%run ./includes/SetupLab

In [0]:
print("For the following exercise use the following catalog.schema : \n" + labContext.catalogAndSchema() )

For the following exercise use the following catalog.schema : 
cloud_lakehouse_labs.odl_user_1237583_databrickslabs_com_retail


### Lab exercise

Create the following queries and visualisations using the catalog and schema printed above.

**1. Total MRR**
```
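-- MRR (in thousands) for the latest month found in the data.
-- Note: only the month number is compared, so this assumes the data does not span several years.
SELECT
  sum(amount)/1000 AS MRR
FROM churn_orders
WHERE
  month(to_timestamp(transaction_date, 'MM-dd-yyyy HH:mm:ss')) =
  (
    SELECT max(month(to_timestamp(transaction_date, 'MM-dd-yyyy HH:mm:ss')))
    FROM churn_orders
  );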
```
Create a *counter* visualisation

**2. MRR at Risk**
```
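-- Same as Total MRR, restricted to customers predicted to churn.
-- Note: only the month number is compared, so this assumes the data does not span several years.
SELECT
  sum(amount)/1000 AS MRR_at_risk
FROM churn_orders
WHERE
  month(to_timestamp(transaction_date, 'MM-dd-yyyy HH:mm:ss')) =
  (
    SELECT max(month(to_timestamp(transaction_date, 'MM-dd-yyyy HH:mm:ss')))
    FROM churn_orders
  )
  AND user_id IN
  (
    SELECT user_id FROM churn_prediction WHERE churn_prediction = 1
  );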
```
Create a *counter* visualisation

**3. Customers at risk**
```
SELECT count(*) as Customers, cast(churn_prediction as boolean) as `At Risk`
FROM churn_prediction GROUP BY churn_prediction;
```

### For queries 4 and 5, switch to the schema in the Hive metastore where the DLT tables were created

**4. Customer Tenure - Historical**
```
SELECT cast(days_since_creation/30 as int) as days_since_creation, churn, count(*) as customers
FROM churn_features
GROUP BY days_since_creation, churn having days_since_creation < 1000
```
Create a *bar* visualisation

**5. App events by platform and churn - Historical**
```
select platform, churn, count(*) as event_count
from churn_app_events
inner join churn_users using (user_id)
where platform is not null
group by platform, churn
```
Create a *horizontal bar* visualisation

### For the remaining queries, switch back to the original catalog and schema (if applicable)

**6. Predicted to churn by channel**
```
SELECT channel, count(*) as users
FROM churn_prediction
WHERE churn_prediction=1 and channel is not null
GROUP BY channel
```
Create a *pie chart* visualisation

**7. Predicted to churn by country**
```
SELECT country, churn_prediction, count(*) as customers
FROM churn_prediction
GROUP BY country, churn_prediction
```
Create a *bar* visualisation


## Creating our Churn Dashboard

<img style="float: right; margin-left: 10px" width="600px" src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-churn-dbsql-dashboard.png" />

The next step is to assemble our queries and their visualizations into a comprehensive SQL dashboard that our business will be able to track.

### Lab exercise
Assemble the visualisations defined with the above queries into a dashboard.


## Using Third-party BI tools

<iframe style="float: right" width="560" height="315" src="https://www.youtube.com/embed/EcKqQV0rCnQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

SQL warehouses can also be used with external BI tools such as Tableau or Power BI.

This allows you to run direct queries on top of your tables, with a unified security model through Unity Catalog (e.g. via SSO). Analysts can then use their favorite tools to discover new business insights on the most complete and freshest data.

To start using your warehouse with a third-party BI tool, click on "Partner Connect" on the bottom left and choose your provider.

## Going further with DBSQL & Databricks Warehouse

Databricks SQL offers much more and provides full warehouse capabilities.

<img style="float: right" width="400px" src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-dbsql-pk-fk.png" />

### Data modeling

Comprehensive data modeling. Store your data using the model that fits your requirements: Data Vault, star schema, Inmon...

Databricks lets you define primary and foreign keys (PK/FK) as well as identity (auto-increment) columns.
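As a minimal sketch of what PK/FK and identity columns can look like, the statements below define two tables. The table and column names (`dim_customer`, `fact_order`, `email`, and so on) are hypothetical and not part of this lab, and the example assumes the tables live in Unity Catalog. Note that on Databricks these key constraints are informational: they document the model and help tools understand it, but they are not enforced on write.

```sql
-- Hypothetical dimension table with an auto-incremented surrogate key.
CREATE TABLE IF NOT EXISTS dim_customer (
  customer_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  email       STRING,
  CONSTRAINT dim_customer_pk PRIMARY KEY (customer_id)
);

-- Hypothetical fact table referencing the dimension through a foreign key.
CREATE TABLE IF NOT EXISTS fact_order (
  order_id    BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  customer_id BIGINT NOT NULL,
  amount      DOUBLE,
  CONSTRAINT fact_order_pk PRIMARY KEY (order_id),
  CONSTRAINT fact_order_customer_fk FOREIGN KEY (customer_id)
    REFERENCES dim_customer (customer_id)
);
```

Because the identity values are generated by the engine, inserts into these tables would simply omit `customer_id` and `order_id`.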

### Data ingestion made easy with DBSQL & dbt

Turnkey capabilities allow analysts and analytics engineers to easily ingest data from anywhere, from cloud storage to enterprise applications such as Salesforce, Google Analytics, or Marketo, using Fivetran. It's just one click away.

Then, simply manage dependencies and transform data in place with built-in ETL capabilities on the Lakehouse (Delta Live Tables), or use your favorite tools like dbt on Databricks SQL for best-in-class performance.

### Query federation

Need to access cross-system data? Databricks SQL query federation lets you define data sources outside of Databricks (e.g. PostgreSQL).
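One way to set this up is sketched below, assuming a Unity Catalog workspace where federation is configured through a connection and a foreign catalog. Every name here is a placeholder introduced for illustration: the host `pg.example.com`, the secret scope `demo_scope` and its keys, the database `crm`, and the table `public.customers`.

```sql
-- Register the external PostgreSQL server (credentials read from a secret scope).
CREATE CONNECTION IF NOT EXISTS postgres_demo_conn TYPE postgresql
OPTIONS (
  host 'pg.example.com',
  port '5432',
  user secret('demo_scope', 'pg_user'),
  password secret('demo_scope', 'pg_password')
);

-- Expose one PostgreSQL database as a catalog that can be queried from DBSQL.
CREATE FOREIGN CATALOG IF NOT EXISTS postgres_demo
USING CONNECTION postgres_demo_conn
OPTIONS (database 'crm');

-- Query the remote table like any other table.
SELECT * FROM postgres_demo.public.customers LIMIT 10;
```

Once the foreign catalog exists, remote tables can be joined with Lakehouse tables in the same query and governed with the same permissions model.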

### Materialized views

Avoid expensive queries and materialize your tables. The engine will recompute only what's required when your data gets updated.
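As an illustration, the sketch below materializes a per-customer aggregate over the `churn_orders` table used in the lab queries. The view name `churn_orders_by_user_mv` is hypothetical, and whether materialized views are available depends on your workspace setup (they typically require Unity Catalog and a suitable SQL warehouse).

```sql
-- Precompute order count and total spend per customer.
CREATE MATERIALIZED VIEW churn_orders_by_user_mv AS
SELECT
  user_id,
  count(*)    AS order_count,
  sum(amount) AS total_amount
FROM churn_orders
GROUP BY user_id;

-- Bring the view up to date after new orders have landed.
REFRESH MATERIALIZED VIEW churn_orders_by_user_mv;
```

Dashboards can then read from the view instead of re-aggregating `churn_orders` on every refresh.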

### Next up
[Orchestrating and automating with Workflows]($./04 - Orchestrating with Workflows)