
# Your Lakehouse is the best Warehouse

Traditional Data Warehouses can’t keep up with the variety of data and use cases. Business agility requires reliable, real-time data, with insight from ML models.

Working with the Lakehouse unlocks not only traditional BI analysis but also real-time applications with a direct connection to all your data, while remaining fully secured.

<br>

<img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/dbsql.png" width="700px" style="float: left" />

<div style="float: left; margin-top: 240px; font-size: 23px">
  Instant, elastic compute<br>
  Lower TCO with Serverless<br>
  Zero management<br><br>

  Governance layer - row level<br><br>

  Your data. Your schema (star, data vault…)
</div>


# BI & Datawarehousing with Databricks SQL

<img style="float: right; margin-top: 10px" width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/main/images/fsi/fraud-detection/lakehouse-fsi-fraud-overview-3.png" />

Our datasets are now properly ingested, secured, of high quality and easily discoverable within our organization.

Let's explore how Databricks SQL supports your Data Analyst team with interactive BI, and start analyzing our transactions and fraud.

To start with Databricks SQL, open the SQL view in the top-left menu.

You'll be able to:

- Create a SQL Warehouse to run your queries
- Use DBSQL to build your own dashboards
- Plug in any BI tool (Tableau, PowerBI, ...) to run your analysis

## Databricks SQL Warehouses: best-in-class BI engine

<img style="float: right; margin-left: 10px" width="600px" src="https://www.databricks.com/wp-content/uploads/2022/06/how-does-it-work-image-5.svg" />

Databricks SQL is a warehouse engine packed with thousands of optimizations to provide you with the best performance for all your tools, query types and real-world applications. <a href='https://www.databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html'>It set the official data warehousing performance record.</a>

This includes Photon, the next-generation vectorized query engine, which, together with SQL warehouses, provides up to 12x better price/performance than other cloud data warehouses.

**Serverless warehouses** provide instant, elastic SQL compute, decoupled from storage, and automatically scale to provide unlimited concurrency without disruption for high-concurrency use cases.

Make no compromise. Your best Data Warehouse is a Lakehouse.

### Creating a SQL Warehouse

SQL Warehouses are managed by Databricks. [Creating a warehouse](/sql/warehouses) is a one-click step.
For the purpose of this hackathon, a shared warehouse has already been created, so there is no need to create an additional one.


## Creating your first Query





<img style="float: right; margin-left: 10px" width="600px" src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-dbsql-query.png" />

Our users can now start running SQL queries using the SQL editor and add new visualizations.

By leveraging auto-completion and the schema browser, we can start running ad hoc queries on top of our data.

While this is ideal for Data Analysts to start analyzing our fraud dataset, other personas can also leverage DBSQL to track our data ingestion pipeline, data quality, model behavior, etc.

Open the [Queries menu](/sql/queries) to start writing your first analysis. Use the SQL code below for a quick start!

# **_Task 6 Start_**

6.1 Open the **SQL Editor**, create a new SQL statement and rename the new query tab to YourName_TestQuery (e.g. "MaxMustermann_TestQuery").
Write a simple SQL query that retrieves all data from YourCatalog.YourSchema.gold_transactions with a LIMIT 1000.
Save the statement as a query (save option in the top right).
Find the saved query in the "Queries" menu section, open it, and look into the sharing and scheduling options.
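
Databricks addresses tables by a three-level name, `catalog.schema.table`. The following is a minimal sketch, not a Databricks API; the helper names `quote_identifier` and `preview_query` are hypothetical. It builds the Task 6.1 query string and wraps each name part in backticks, so that names containing special characters still parse. In the SQL Editor you would simply type the resulting SQL.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def preview_query(catalog: str, schema: str, table: str, limit: int = 1000) -> str:
    """Build a SELECT * ... LIMIT n query for a catalog.schema.table name."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    full_name = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    return f"SELECT * FROM {full_name} LIMIT {limit}"


print(preview_query("YourCatalog", "YourSchema", "gold_transactions"))
# SELECT * FROM `YourCatalog`.`YourSchema`.`gold_transactions` LIMIT 1000
```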


6.2 Write another SQL statement that retrieves the _id_, _country_ and _isfraud_ columns from YourCatalog.YourSchema.gold_transactions.
Execute the query and, in your SQL Editor view, add a chart with x column = country and y = sum of isFraud (small plus sign above the query output).
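
The Task 6.2 chart plots the sum of the `isfraud` flag for each country. If the flag holds 0/1 values (an assumption here), that sum is the number of fraudulent transactions per country. The local illustration below runs the equivalent `GROUP BY` query with Python's built-in `sqlite3` on a few hypothetical rows. SQLite is only a stand-in. In Databricks you would run the same SQL against YourCatalog.YourSchema.gold_transactions, and the `ORDER BY` is there only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample rows standing in for gold_transactions: (id, country, isfraud).
SAMPLE_ROWS = [
    (1, "DE", 1),
    (2, "DE", 0),
    (3, "FR", 1),
    (4, "FR", 1),
    (5, "IT", 0),
]

# The aggregation behind the chart: x = country, y = SUM(isfraud).
FRAUD_BY_COUNTRY_SQL = """
    SELECT country, SUM(isfraud) AS fraud_count
    FROM gold_transactions
    GROUP BY country
    ORDER BY country
"""


def fraud_by_country(rows):
    """Load rows into an in-memory SQLite table and return {country: fraud_count}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE gold_transactions (id INTEGER, country TEXT, isfraud INTEGER)"
        )
        conn.executemany("INSERT INTO gold_transactions VALUES (?, ?, ?)", rows)
        return dict(conn.execute(FRAUD_BY_COUNTRY_SQL).fetchall())
    finally:
        conn.close()


print(fraud_by_country(SAMPLE_ROWS))  # {'DE': 1, 'FR': 2, 'IT': 0}
```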

6.3 Open the small drop-down menu on the visualization and create a new dashboard based on this chart. Name it FirstNameLastName_AllianzHackathonDashboard.

6.4 Before moving on with the dashboard, open your "03-BI-Datawarehousing-fraud" notebook again.
Execute the SQL code SELECT * FROM YourCatalog.YourSchema.gold_transactions **LIMIT 100**;
Notice that you can do the same things in a notebook. Why do you think SQL queries and dedicated SQL compute (SQL warehouses) exist?

6.5 In this notebook, after executing SELECT * FROM YourCatalog.YourSchema.gold_transactions **LIMIT 100**, you can also create a visualization via the small plus sign. Instead of a visualization, choose the "Data Profile" option.


In [0]:
-- Replace the catalog and schema below with your own (YourCatalog.YourSchema) before running
SELECT * FROM psprenger_staging_catalog_internal.allianz_lakehouse_fsi_fraud.gold_transactions LIMIT 100;


# **_Task 6 End_**

# **_Task 7 Start_**

7.1 In Task 6 you created a dashboard. If it is still open, navigate to its tab.
Otherwise, you can find it under Dashboards; remember that you named it FirstNameLastName_AllianzHackathonDashboard.

7.2 In the dashboard, switch from the Canvas view to the Data tab. There you can see the SQL statement you added before as a data source. Give it a meaningful name.
You can also add data sources via the catalog explorer view: choose "Add data source", then select your catalog, your schema and the banking_customers table. Run the statement and switch back to the Canvas view.

7.3 In the Canvas view, create a new visualization via the blue navigation menu. Use the AI assistant feature and ask for "Fraud Transactions by Country".
See how, in a few seconds, the assistant builds the same kind of visualization you previously spent minutes on, and sorts it in descending order.

7.4 In the Canvas view, publish the dashboard. Choose whether or not to embed credentials based on who you think should be able to access such a dashboard. Switch to the published view (top middle, above your dashboard) to see the published version.


# **_Task 7 End_**


## Creating our Fraud Analysis Dashboard

<img style="float: right; margin-left: 10px" width="600px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/main/images/fsi/fraud-detection/lakehouse-fsi-fraud-dashboard.png" />

The next step is to assemble our queries and their visualizations into a comprehensive SQL dashboard that our business will be able to track.

The dashboard has been loaded for you. Open the <a dbdemos-dashboard-id="fraud-detection" href='/sql/dashboardsv3/01f041519cec119bbf35ae5be0f58c60' target="_blank">DBSQL FSI Fraud Dashboard</a> to start reviewing our existing fraud patterns.


## Using Third party BI tools

<iframe style="float: right; margin-left: 10px" width="560" height="315" src="https://www.youtube.com/embed/EcKqQV0rCnQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

SQL warehouses can also be used with external BI tools such as Tableau or PowerBI.

This lets you run direct queries on top of your tables, with a unified security model and Unity Catalog (e.g. through SSO). Analysts can now use their favorite tools to discover new business insights on the most complete and freshest data.

To start using your warehouse with a third-party BI tool, click "Partner Connect" in the bottom-left menu and choose your provider.
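
Partner Connect takes care of the setup for BI tools, but the same SQL warehouse can also be queried programmatically. The sketch below assumes the open-source `databricks-sql-connector` package (`pip install databricks-sql-connector`) and a personal access token. The environment variable names and helper functions are hypothetical conventions. The hostname and HTTP path would come from your warehouse's connection details.

```python
import os
from collections.abc import Mapping

# Hypothetical environment variable names, mapped to databricks.sql.connect() arguments.
REQUIRED_SETTINGS = {
    "server_hostname": "DATABRICKS_SERVER_HOSTNAME",
    "http_path": "DATABRICKS_HTTP_PATH",
    "access_token": "DATABRICKS_TOKEN",
}


def connection_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the connect() keyword arguments, failing fast if any value is missing."""
    missing = [var for var in REQUIRED_SETTINGS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in REQUIRED_SETTINGS.items()}


def run_query(query: str, max_rows: int = 10) -> list:
    """Run a query on the SQL warehouse and return at most max_rows rows."""
    # Imported lazily so the helper above works without the connector installed.
    from databricks import sql

    with sql.connect(**connection_settings()) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchmany(max_rows)
```

For example, `run_query("SELECT country, SUM(isfraud) AS fraud_count FROM YourCatalog.YourSchema.gold_transactions GROUP BY country")` would fetch the per-country fraud counts. Queries go through the same warehouse and Unity Catalog permissions as the SQL Editor.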

## Going further with DBSQL & Databricks Warehouse

Databricks SQL offers much more and provides full data warehouse capabilities.

<img style="float: right" width="400px" src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-dbsql-pk-fk.png" />

### Data modeling

Comprehensive data modeling. Save your data based on your requirements: Data vault, Star schema, Inmon...

Databricks lets you create primary and foreign keys (PK/FK) and identity columns (auto-increment). To install the dedicated demo, run `dbdemos.install('identity-pk-fk')`.
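
Here is a sketch of such a model: two hypothetical tables (`dim_customer`, `fact_transaction`) with identity columns plus primary and foreign keys. In Databricks, PK/FK constraints require Unity Catalog and are informational, meaning they are not enforced. The wrapper assumes a Databricks notebook where a `spark` session is available; table and constraint names are illustrative only.

```python
# Hypothetical dimension/fact tables; use your own catalog, schema and table names.
# GENERATED ALWAYS AS IDENTITY gives an auto-increment surrogate key.
DDL_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS dim_customer (
      customer_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
      name STRING,
      CONSTRAINT dim_customer_pk PRIMARY KEY (customer_id)
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS fact_transaction (
      transaction_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
      customer_id BIGINT,
      amount DOUBLE,
      CONSTRAINT fact_transaction_pk PRIMARY KEY (transaction_id),
      CONSTRAINT fact_transaction_customer_fk
        FOREIGN KEY (customer_id) REFERENCES dim_customer (customer_id)
    )
    """,
]


def create_tables(spark) -> None:
    """Run each DDL statement through an existing SparkSession (e.g. `spark` in a notebook)."""
    # The parent table is created first so the foreign key can reference its primary key.
    for statement in DDL_STATEMENTS:
        spark.sql(statement)
```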

### Data ingestion made easy with DBSQL & DBT

Turnkey capabilities allow analysts and analytics engineers to easily ingest data from sources ranging from cloud storage to enterprise applications such as Salesforce, Google Analytics or Marketo using Fivetran. It's just one click away.

Then, simply manage dependencies and transform data in place with built-in ETL capabilities on the Lakehouse (Delta Live Tables), or use your favorite tools like dbt on Databricks SQL for best-in-class performance.

### Query federation

Need to access cross-system data? Databricks SQL query federation lets you define connections to data sources outside of Databricks (e.g. PostgreSQL) and query them in place, without first copying the data into the Lakehouse.
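
As a hedged sketch, exposing a PostgreSQL database usually takes two SQL statements: a connection that holds the credentials, and a foreign catalog that mirrors the remote database. The helper `federation_ddl` below is hypothetical and only builds that SQL. All names, the host and the secret scope are placeholders. Running the statements requires Unity Catalog and the matching privileges, and the password is read from a Databricks secret rather than written in plain text.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _literal(value: str) -> str:
    """Single-quote a value; quotes and backslashes are rejected to keep the sketch simple."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in {value!r}")
    return f"'{value}'"


def federation_ddl(connection, catalog, host, port, user, secret_scope, secret_key, database):
    """Return SQL that registers a PostgreSQL database as a foreign catalog."""
    for name in (connection, catalog):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    create_connection = (
        f"CREATE CONNECTION {connection} TYPE postgresql OPTIONS ("
        f"host {_literal(host)}, port {_literal(str(port))}, user {_literal(user)}, "
        f"password secret({_literal(secret_scope)}, {_literal(secret_key)}))"
    )
    create_catalog = (
        f"CREATE FOREIGN CATALOG {catalog} USING CONNECTION {connection} "
        f"OPTIONS (database {_literal(database)})"
    )
    return [create_connection, create_catalog]
```

In a notebook you would pass each statement to `spark.sql(...)`. After that, remote tables are addressed like any other three-level name, e.g. `pg_catalog.public.some_table`.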

### Materialized view

Avoid repeatedly running expensive queries by materializing their results. When your data is updated, the engine recomputes only what's required.
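
The sketch below illustrates the idea with incremental maintenance of a per-country fraud count. When new rows are appended, only the affected groups are updated instead of rescanning the whole table. This is a conceptual illustration of the general technique, not how the Databricks engine refreshes materialized views. It also handles appends only, not updates or deletes. In Databricks SQL you would declare the view with `CREATE MATERIALIZED VIEW ... AS SELECT ...` and let the engine manage refreshes.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple[str, int]  # (country, isfraud), assuming isfraud is 0 or 1


def full_recompute(rows: Iterable[Row]) -> Counter:
    """Baseline: scan every row to build the per-country fraud count."""
    counts = Counter()
    for country, isfraud in rows:
        counts[country] += isfraud
    return counts


def incremental_refresh(view: Counter, new_rows: Iterable[Row]) -> set:
    """Fold only newly appended rows into the view and return the groups that changed."""
    touched = set()
    for country, isfraud in new_rows:
        view[country] += isfraud
        touched.add(country)
    return touched


history = [("DE", 1), ("FR", 0)]
view = full_recompute(history)
changed = incremental_refresh(view, [("FR", 1), ("IT", 0)])
```

After the refresh, `view` matches a full recompute over all four rows, but only the `FR` and `IT` groups were touched.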


# Taking our analysis one step further: Detecting Fraud

Being able to run analysis on our past data already gives us a lot of insight. We can better understand our fraud patterns and quantify their impact.

However, knowing that we had past fraud isn't enough. We now need to take it to the next level and build a predictive model to flag financial transactions as fraud in real time, improving our revenue and reducing risk.

Let's see how this can be done with [Databricks Machine Learning notebook]($../04-Data-Science-ML/04.1-AutoML-FSI-fraud) |  [Go back to the introduction]($../00-FSI-fraud-detection-introduction-lakehouse)