# IoT Platform with the Databricks Data Intelligence Platform - Ingesting Real-Time Industrial Sensor Data for Prescriptive Maintenance

<img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform_0.png" style="float: left; margin-right: 30px" width="600px" />

<br/>

## What is the Databricks Data Intelligence Platform for IoT & Manufacturing?
The Databricks Data Intelligence Platform for Manufacturing unlocks the full value of manufacturing data, enabling intelligent networks, enhanced customer experiences, smarter products, and sustainable businesses. It empowers data teams with unmatched scalability, real-time insights, and innovative capabilities across all data types and sources. Manufacturers benefit from reduced costs, increased productivity, improved customer responsiveness, and accelerated innovation. The platform integrates diverse data sources with top-tier AI processing and offers manufacturing-specific Solution Accelerators and partners for powerful real-time decision-making.
<img src="https://github.com/Datastohne/demo/blob/main/Intelligence%20Engine.png?raw=true " style="float: left; margin-right: 30px" width="200px" />

**Intelligent**
Databricks combines generative AI with the unification benefits of a lakehouse to power a Data Intelligence Engine that understands the unique semantics of your data. This allows the Databricks Platform to automatically optimize performance and manage infrastructure in ways unique to your business. 

<img src="https://github.com/Datastohne/demo/blob/main/24840.png?raw=true " style="float: right; margin-left: 30px" width="200px" />

**Simple** Natural language substantially simplifies the user experience on Databricks. The Data Intelligence Engine understands your organization’s language, so searching for and discovering new data is as easy as asking a coworker a question. Additionally, developing new data and applications is accelerated through natural language assistance to write code, remediate errors and find answers.

<img src="https://github.com/Datastohne/demo/blob/main/24841.png?raw=true " style="float: left; margin-right: 30px" width="200px" />

**Private** Data and AI applications require strong governance and security, especially with the advent of generative AI. Databricks provides an end-to-end MLOps and AI development solution that’s built upon our unified approach to governance and security. You’re able to pursue all your AI initiatives — from using APIs like OpenAI to custom-built models — without compromising data privacy and IP control.
 

## Wind Turbine Prescriptive Maintenance with the Databricks Data Intelligence Platform: Bringing Generative AI to Predictive Maintenance

Being able to collect and centralize industrial equipment information in real time is critical in the energy space. When a wind turbine is down, it is not generating power, which leads to poor customer service and lost revenue. Data is the key to unlocking critical capabilities such as energy optimization, anomaly detection, and predictive maintenance. The rapid rise of Generative AI provides the opportunity to revolutionize maintenance by not only predicting when equipment is likely to fail, but also generating prescriptive maintenance actions to prevent failures before they arise and to optimize equipment performance. This enables a shift from predictive to prescriptive maintenance. <br/> 

<img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/prescriptive_maintenance.png" width="700px" style="float:right; margin-left: 20px"/>

Prescriptive maintenance examples include:

- Analyze equipment IoT sensor data in real time
- Predict mechanical failure in an energy pipeline
- Diagnose root causes for predicted failures and generate prescriptive actions
- Detect abnormal behavior in a production line
- Optimize the supply chain of parts and staging for scheduled maintenance and repairs

### What we'll build

In this demo, we'll build an end-to-end IoT platform to collect real-time data from multiple sources.

We'll create a predictive model to forecast wind turbine failures and use it to generate maintenance work orders, reducing downtime and increasing Overall Equipment Effectiveness (OEE).
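As a rough illustration of the OEE metric mentioned above, the standard definition multiplies availability, performance and quality. The sketch below applies it to a single turbine; the function name `oee` and the input figures are assumptions made for illustration and are not part of the demo code.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness: the product of three ratios, each in [0, 1]."""
    factors = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical figures: the turbine ran 90% of the planned time, at 95% of its
# rated output, and all of the energy produced was usable (quality = 1.0).
score = oee(availability=0.90, performance=0.95, quality=1.0)
print(f"OEE: {score:.1%}")
```

Because the three factors multiply, reducing unplanned downtime (availability) raises OEE directly, which is why failure prediction is a natural lever here.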

Additionally, we'll develop a dashboard for the Turbine Maintenance team to monitor turbines, identify those at risk, and review maintenance work orders, ensuring we meet our productivity goals.

At a very high level, this is the flow we will implement:

<div style="text-align: center;">
    <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/team_flow_overview.png" width="1000px">
</div>

1. Ingest data and create our IoT database and tables, which are easily queryable via SQL.
2. Secure data and grant read access to the Data Analyst and Data Science teams.
3. Run BI queries to analyze existing failures.
4. Build an ML model to monitor our wind turbine farm & trigger predictive maintenance operations.
5. Generate maintenance work orders for field service engineers utilizing Generative AI.

Being able to predict which wind turbine will potentially fail is only the first step toward increasing our wind turbine farm's efficiency. Once we're able to build a model predicting potential maintenance, we can dynamically adapt our spare part stock, generate work orders for field service engineers and even automatically dispatch a maintenance team with the proper equipment.

### Our dataset

To simplify this demo, we'll consider that an external system is periodically sending data into our blob storage (S3/ADLS/GCS):

- Turbine data *(location, model, identifier, etc.)*
- Wind turbine sensor readings, every second *(energy produced, vibration, typically in streaming)*
- Turbine status over time, labelled by our analyst team, and historical maintenance reports *(historical data to train our model on and to index into a vector database)*

*Note that at a technical level, our data could come from any source. Databricks can ingest data from any system (Salesforce, Fivetran, message queues like Kafka, blob storage, SQL & NoSQL databases...).*
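To make the three datasets more concrete, here is a minimal sketch of what individual records might look like when serialized as JSON lines. All class and field names (`turbine_id`, `energy`, `vibration`, `abnormal_sensor`, ...) are assumptions for illustration; the actual schema is defined by the demo's data generator and may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Turbine:            # reference data, changes rarely
    turbine_id: str
    model: str
    lat: float
    long: float


@dataclass
class SensorReading:      # high-frequency stream, roughly one record per second
    turbine_id: str
    timestamp: int        # epoch seconds
    energy: float
    vibration: float


@dataclass
class TurbineStatus:      # label provided by the analyst team
    turbine_id: str
    start_time: int
    end_time: int
    abnormal_sensor: str  # "ok" when the turbine is healthy (assumed convention)


def to_json_line(record) -> str:
    """Serialize one dataclass record as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


reading = SensorReading(turbine_id="t-001", timestamp=1700000000, energy=0.42, vibration=1.3)
line = to_json_line(reading)
print(line)
```

Keeping the turbine identifier on every record is what later allows the sensor stream, the reference data and the labels to be joined.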

Let's see how this data can be used within the Data Intelligence Platform to analyze sensor data, trigger predictive maintenance and generate work orders.

&nbsp;
&nbsp;
![](_resources/images/e2eai-0.jpg)


## 1/ Ingesting and Preparing the Data (Jobs & Pipelines)

<!img style="float: left; margin-right: 20px" width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform_1.png" />


<br/>
<!div style="padding-left: 420px">
Our first step is to ingest and clean the raw data we received so that our Data Analyst team can start running analysis on top of it.


<img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-logo.png" style="float: right; margin-top: 20px" width="200px">

### Delta Lake

All the tables we'll create in the Lakehouse will be stored as Delta Lake tables. [Delta Lake](https://delta.io) is an open storage framework for reliability and performance. <br/>
It provides many features such as *(ACID transactions, DELETE/UPDATE/MERGE, zero-copy clone, Change Data Capture...)* <br />
For more details on Delta Lake, run `dbdemos.install('delta-lake')`

### Simplify ingestion with Spark Declarative Pipelines (SDP)

Databricks simplifies data ingestion and transformation with Spark Declarative Pipelines by allowing SQL users to create advanced pipelines via batch or streaming. Databricks also simplifies pipeline deployment, testing, and data quality tracking, which reduces operational complexity so that you can focus on the needs of the business.<br/>

Open the Wind Turbine 
  [SQL notebook]($./01-Data-ingestion/01.1-DLT-Wind-Turbine-SQL) <br>
  For more details on Lakeflow Declarative Pipelines (formerly known as DLT): `dbdemos.install('dlt-load')` or `dbdemos.install('dlt-cdc')`
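To give an intuition for what tracking data quality means in a pipeline, the plain-Python sketch below applies named rules to incoming records, keeps the rows that pass and counts violations per rule. This is not the pipeline API and not the code in the SQL notebook; the rule names and column names are assumptions chosen for illustration.

```python
from typing import Callable, Iterable

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical quality rules; the column names are assumptions.
RULES: dict[str, Rule] = {
    "turbine_id_present": lambda r: bool(r.get("turbine_id")),
    "energy_non_negative": lambda r: isinstance(r.get("energy"), (int, float)) and r["energy"] >= 0,
}


def apply_rules(records: Iterable[Record], rules: dict[str, Rule]):
    """Return the records that pass every rule, plus a violation count per rule."""
    kept = []
    violations = {name: 0 for name in rules}
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            violations[name] += 1
        if not failed:
            kept.append(record)
    return kept, violations


raw = [
    {"turbine_id": "t-001", "energy": 0.42},
    {"turbine_id": None, "energy": 0.10},     # fails turbine_id_present
    {"turbine_id": "t-002", "energy": -5.0},  # fails energy_non_negative
]
clean, stats = apply_rules(raw, RULES)
print(clean)
print(stats)
```

In a declarative pipeline the same idea is expressed as constraints attached to a table definition, and the violation counts are collected by the platform rather than by hand.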

## 2/ Securing Data & Governance (Unity Catalog)

<!img width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform_2.png"  style="float: left; margin-right: 10px"/>

<br/>
<!div style="padding-left: 420px">
  Now that our first tables have been created, we need to grant our Data Analyst team READ access to be able to start analyzing our turbine failure information.
  
  Let's see how Unity Catalog provides security & governance across our data assets and includes data lineage and audit logs.
  
  Note that Unity Catalog integrates Delta Sharing, an open protocol to share your data with any external organization, regardless of their software or data stack. For more details:  `dbdemos.install('delta-sharing-airlines')`
 </div>

   Open [Unity Catalog notebook]($./02-Data-governance/02-UC-data-governance-security-iot-turbine) to see how to set up ACLs and explore lineage with the Data Explorer.
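As a sketch of what granting read access involves, the helper below builds the SQL statements a group would typically need: usage on the catalog and schema, then `SELECT` on each table. The function name, the group name `analysts` and the table names are hypothetical; the catalog and schema match the values shown in the setup output of this notebook.

```python
def grant_read_statements(catalog: str, schema: str, tables: list[str], group: str) -> list[str]:
    """Build the GRANT statements that let `group` read the given tables."""
    statements = [
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{group}`",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{group}`",
    ]
    statements += [
        f"GRANT SELECT ON TABLE `{catalog}`.`{schema}`.`{table}` TO `{group}`"
        for table in tables
    ]
    return statements


# Hypothetical table and group names, for illustration only.
statements = grant_read_statements(
    catalog="main",
    schema="e2eai_iot_turbine",
    tables=["turbine", "sensor_bronze"],
    group="analysts",
)
for statement in statements:
    print(statement)
```

In a notebook, each generated statement could then be executed with `spark.sql(statement)`, assuming the current user is allowed to manage grants on those objects.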
  

## 3/ Analyzing Failures (AI/BI Dashboards / SQL Editor)

<img width="300px" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/manufacturing/lakehouse-iot-turbine/lakehouse-manuf-iot-dashboard-1.png"  style="float: right; margin: 100px 0px 10px;"/>

<!img width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform_2.png"  style="float: left; margin-right: 10px"/>
 
<br>
Our datasets are now properly ingested, secured, of high quality and easily discoverable within our organization.

Data Analysts are now ready to run interactive BI queries with low latency and high throughput. They can choose to either create a new compute cluster, use a shared cluster, or, for even faster response times, use Databricks Serverless SQL warehouses, which provide instant stop & start.

Let's see how Data Warehousing is done using Databricks! We will look at our built-in dashboards, as Databricks provides a complete data platform from ingest to analysis, but it also provides integrations with many popular BI tools such as Power BI, Tableau and others!

Open the [Datawarehousing notebook]($./03-BI-data-warehousing/03-BI-Datawarehousing-iot-turbine) to start running your BI queries.
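To illustrate the kind of BI query an analyst might run on failure data, the sketch below counts faulty turbines per model. It uses Python's built-in `sqlite3` purely as a stand-in so that the example runs anywhere; the table name, column names and rows are assumptions, and on Databricks a similar `SELECT` would run against the Delta tables from a SQL warehouse.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE turbine_status (turbine_id TEXT, model TEXT, abnormal_sensor TEXT)"
)
conn.executemany(
    "INSERT INTO turbine_status VALUES (?, ?, ?)",
    [
        ("t-001", "model-a", "ok"),
        ("t-002", "model-a", "sensor_B"),
        ("t-003", "model-b", "sensor_D"),
        ("t-004", "model-b", "sensor_D"),
    ],
)

# Count turbines and faulty turbines per model.
query = """
SELECT model,
       COUNT(*) AS turbines,
       SUM(CASE WHEN abnormal_sensor <> 'ok' THEN 1 ELSE 0 END) AS faulty
FROM turbine_status
GROUP BY model
ORDER BY model
"""
rows = conn.execute(query).fetchall()
for model, turbines, faulty in rows:
    print(model, turbines, faulty)
conn.close()
```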

## 4/ Predict Failure with Data Science & ML (AI / ML)

<!img width="500px" style="float: left; margin-right: 10px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform3.png" />

<br>
Being able to run analysis on our historical data provided the team with a lot of insights to drive our business. We can now better understand the impact of downtime and see which turbines are currently down in our near real-time dashboard.

However, knowing which turbines have failed isn't enough. We now need to take it to the next level and build a predictive model to detect potential failures before they happen, increasing uptime and minimizing costs.

This is where the Lakehouse value comes in. Within the same platform, anyone can start building an ML model to predict the failures using traditional ML development.

Start understanding our data and building ML models in the [04.1-EDA]($./04-Data-Science-ML/04.1-EDA) notebook. Run all notebooks in this section in order to deploy and serve your best ML model.
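Before training a model, it often helps to establish a simple baseline. The sketch below aggregates raw vibration readings per turbine and flags turbines whose vibration varies more than a threshold. It is a rule-based baseline, not the model built in the demo notebooks; the field names, the sample values and the threshold are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_features(readings):
    """Group readings by turbine and compute the mean and standard deviation of vibration."""
    by_turbine = defaultdict(list)
    for reading in readings:
        by_turbine[reading["turbine_id"]].append(reading["vibration"])
    return {
        turbine_id: {"vibration_mean": mean(values), "vibration_std": pstdev(values)}
        for turbine_id, values in by_turbine.items()
    }


def flag_at_risk(features, std_threshold):
    """Return the sorted ids of turbines whose vibration variability exceeds the threshold."""
    return sorted(
        turbine_id
        for turbine_id, stats in features.items()
        if stats["vibration_std"] > std_threshold
    )


# Hypothetical readings: t-001 is stable, t-002 fluctuates strongly.
readings = [
    {"turbine_id": "t-001", "vibration": 1.0},
    {"turbine_id": "t-001", "vibration": 1.1},
    {"turbine_id": "t-001", "vibration": 0.9},
    {"turbine_id": "t-002", "vibration": 1.0},
    {"turbine_id": "t-002", "vibration": 3.0},
    {"turbine_id": "t-002", "vibration": 0.2},
]
features = aggregate_features(readings)
at_risk = flag_at_risk(features, std_threshold=0.5)
print(at_risk)
```

A trained classifier would replace the fixed threshold with a decision learned from the labelled status history, but the same aggregated features are a reasonable starting point for it.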

## 5/ Generate Maintenance Work Orders with Generative AI (AI/ML and Compute Apps)

<!img width="500px" style="float: left; margin-right: 10px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform_4.png" />

<br>

The rise of Generative AI enables a shift from Predictive to Prescriptive Maintenance ML Models. By going from ML models to agent systems, we can now leverage the predictive model as one of the many components of the AI system. This opens up a whole lot of new opportunities for automation and efficiency gains, which will further increase uptime and minimize costs.

Databricks offers a set of tools to help developers build, deploy and evaluate production-quality AI agents like Retrieval Augmented Generation (RAG) applications, including a vector database, model serving endpoints, governance, monitoring and evaluation capabilities.

The app in this demo is a simple tool using agent chat, but apps can be customized to do just about anything. Integrations with other systems such as ticketing or work orders would allow custom applications to automatically create work orders based on predicted maintenance.

Let's create our first agent system with the [05.1-ai-tools]($./05-Generative-AI/05.1-ai-tools) notebook.
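To show how a predictive model can become one component of a larger AI system, the sketch below assembles a prompt from a model prediction and a few retrieved maintenance reports. It only builds the text and does not call any model; the function name, the wording of the prompt and the sample inputs are assumptions, not the demo's implementation.

```python
def build_work_order_prompt(turbine_id: str, predicted_fault: str, similar_reports: list[str]) -> str:
    """Combine a prediction and retrieved context into a single prompt string."""
    context = "\n".join(f"- {report}" for report in similar_reports)
    if not context:
        context = "- (no similar reports found)"
    return (
        "You are assisting a wind turbine maintenance team.\n"
        f"Turbine: {turbine_id}\n"
        f"Predicted fault: {predicted_fault}\n"
        "Relevant past maintenance reports:\n"
        f"{context}\n"
        "Write a concise work order listing the recommended actions and required parts."
    )


# Hypothetical inputs: the fault would come from the predictive model and the
# reports from a similarity search over historical maintenance reports.
prompt = build_work_order_prompt(
    turbine_id="t-002",
    predicted_fault="sensor_B anomaly",
    similar_reports=["Replaced worn bearing after vibration spike."],
)
print(prompt)
```

Keeping the prompt assembly in a small, pure function makes it easy to test independently of whichever model endpoint eventually receives it.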

## 6/ Deploying and Orchestrating the Full Workflow (Jobs & Pipelines)

<!img style="float: left; margin-right: 10px" width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/di_platform_5.png" />

<br>
While our data pipeline is almost completed, we're missing one last step: orchestrating the full workflow in production.

With Databricks Lakehouse, there is no need to utilize an external orchestrator to run your job. Databricks Lakeflow Jobs simplify all your jobs, with advanced alerting, monitoring, branching options, etc.

Open the [workflow and orchestration notebook]($./06-Workflow-orchestration/06-Workflow-orchestration-iot-turbine) to schedule our pipeline (data ingestion, model retraining, etc.).
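Conceptually, orchestrating the workflow means running tasks in an order that respects their dependencies. The sketch below models the steps of this demo as a small dependency graph and derives a valid execution order with the standard library. The task names and dependencies are assumptions for illustration; in the demo, the job and its tasks are defined in Databricks rather than in code like this.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names).
dependencies = {
    "ingest_data": set(),
    "apply_governance": {"ingest_data"},
    "refresh_dashboard": {"apply_governance"},
    "retrain_model": {"apply_governance"},
    "generate_work_orders": {"retrain_model"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for position, task in enumerate(order, start=1):
    print(position, task)
```

Tasks with no dependency between them, such as `refresh_dashboard` and `retrain_model` here, could run in parallel, which is the kind of branching a job scheduler handles for you.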


## Conclusion

We demonstrated how to implement an end-to-end pipeline with the Lakehouse, using a single, unified and secured platform. We saw:

- Data Ingestion
- Data Analysis / DW / BI 
- Data Science / ML
- Generative AI
- Workflow & Orchestration

And as a result, our business analysis team was able to build a system to not only understand failures better but also forecast future failures and let the maintenance team take action accordingly.

*This was only an introduction to the Databricks Platform. For more details, contact your account team and explore more demos with `dbdemos.list()`!*

In [0]:
%python
# Run the cell below.  It can take 10+ minutes to run.
# This code was last tested on Serverless Environment 3.
# To check which Serverless Environment you are using,
# click on your connected compute.  To change, go to the 
# Configuration option.

In [0]:
%run ./_resources/01-load-data

Collecting git+https://github.com/QuentinAmbard/mandrova
  Cloning https://github.com/QuentinAmbard/mandrova to /tmp/pip-req-build-q52d13ey
  Running command git clone --filter=blob:none --quiet https://github.com/QuentinAmbard/mandrova /tmp/pip-req-build-q52d13ey
  Resolved https://github.com/QuentinAmbard/mandrova to commit 553986e2ab1e5e349e095b83f7c1a1e1226f99e0
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting faker
  Downloading faker-37.11.0-py3-none-any.whl.metadata (15 kB)
Collecting databricks-sdk==0.40.0
  Downloading databricks_sdk-0.40.0-py3-none-any.whl.metadata (38 kB)
Collecting mlflow==2.22.0
  Downloading mlflow-2.22.0-py3-none-any.whl.metadata (30 kB)
Collecting pandas==2.2.3
  Downloading pandas-2.2.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.metadata (89 kB)
Collecting mlflow-skinny==2.22.0 (from mlflow==2.22.0)
  Downloading mlflow_skinny-2.22.0-py3-none-any.whl.metadata (31 kB)
Coll

## Configuration file

Please change your catalog and schema here to run the demo on a different catalog.
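As a minimal sketch of what such a configuration might contain, the snippet below defines the catalog, schema and volume, then derives the landing path from them. The values match the setup output shown in this notebook, but the variable names are assumptions and may differ from those used in the actual config file.

```python
# Variable names are assumptions; values are taken from the setup output.
catalog = "main"
schema = "e2eai_iot_turbine"
volume_name = "e2eai_turbine_raw_landing"

# Unity Catalog volumes are exposed under /Volumes/<catalog>/<schema>/<volume>.
volume_path = f"/Volumes/{catalog}/{schema}/{volume_name}"
print(volume_path)
```

Changing `catalog` or `schema` in one place then propagates to every path and table name derived from them.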

 


# Technical Setup notebook. Hide this cell's results
Initialize the dataset for the current user and clean up data when reset_all_data is set to true

Do not edit

USE CATALOG `main`
using catalog.database `main`.`e2eai_iot_turbine`
folder doesn't exists, generating the data...


saving /Volumes/main/e2eai_iot_turbine/e2eai_turbine_raw_landing/historical_turbine_status/part-00000-tid-6324144606306677564-046e9e75-8be6-4cb7-98af-4968e0d745cd-446-1-c000.json
saving /Volumes/main/e2eai_iot_turbine/e2eai_turbine_raw_landing/parts/part-00001-tid-7121886240773590243-caac67ab-2de6-4dde-8d08-e416821625ec-669-1-c000.json
saving /Volumes/main/e2eai_iot_turbine/e2eai_turbine_raw_landing/parts/part-00002-tid-7121886240773590243-caac67ab-2de6-4dde-8d08-e416821625ec-670-1-c000.json
saving /Volumes/main/e2eai_iot_turbine/e2eai_turbine_raw_landing/parts/part-00000-tid-7121886240773590243-caac67ab-2de6-4dde-8d08-e416821625ec-668-1-c000.json
saving /Volumes/main/e2eai_iot_turbine/e2eai_turbine_raw_landing/turbine/part-00000-tid-3148847613653410419-fcc4166d-87a2-4660-bfb1-a1ea83a470f4-393-1-c000.json
saving /Volumes/main/e2eai_iot_turbine/e2eai_turbine_raw_landing/turbine/part-00000-tid-9025614560650335470-30d18e88-1904-4393-adec-f936a1012078-340-1-c000.json
saving /Volumes/main/e

<img src="https://upload.wikimedia.org/wikipedia/commons/5/52/EERE_illust_large_turbine.gif">