## Table of Contents
- [B3 Model-Based Enterprise-Engineering (MBE)](#B3-Model-Based-Enterprise-Engineering-%28MBE%29)
  - [3.1 Advanced MBE Concepts](#3.1-Advanced-MBE-Concepts)
  - [3.2 Interoperability & Standards in MBE](#3.2-Interoperability-%26-Standards-in-MBE)
  - [3.3 Automation & AI in MBE](#3.3-Automation-%26-AI-in-MBE)
  - [3.4 Example: Hands-on Exercise for Predictive Maintenance](#3.4-Example%3A-Hands-on-Excercise-for-Predictive-Maintenance%3C/font%3E)
  - [3.5 Example 2: Conceptual Concept for Predictive Maintenance of Aircraft Gas Turbine Engines](#3.5-Example-2%3A-Conceptual-Concept-for-Predictive-Maintenance-of-Aircraft-Gas-Turbine-Engines)
  - [3.6 Cybersecurity & Data Integrity in MBE](#3.6-Cybersecurity-%26-Data-Integrity-in-MBE)
  - [3.7 Large Scale Real-World Applications](#3.7-Large-Scale-Real-World-Applications)
  - [3.8 Automotive & Manufacturing Applications](#3.8-Automotive-%26-Manufacturing-Applications)
  - [3.9 **Conclusion & Next Steps**](#3.9-%2A%2AConclusion-%26-Next-Steps%2A%2A)
- [🏠 Home](../../welcomePage.ipynb)

# B3 Model-Based Enterprise-Engineering (MBE)
Historically, the industry relied on drawings to communicate manufacturing components and systems requirements. The decentralization of manufacturing systems has amplified the challenge of collecting and communicating the product and process specifications needed to make decisions about design, production, and supply chain tasks while delivering products to market. Moving away from a reliance on drawings, MBE leverages computer-based technology to design, price, and manufacture items in a digital-centric environment. MBE is a fully integrated and collaborative environment built on detailed 3D product definitions shared across the enterprise to enable rapid, seamless, and affordable deployment of products from concept to disposal. The foundational element of MBE is a single digital master data set that contains the 3D model and all needed product data, held in a managed, secure, and controlled environment that supports maximum data reuse across acquisition, maintenance, and operations.

At the Black Belt level, the focus shifts from understanding and adopting MBE principles to **mastering advanced integration, optimization, and automation techniques**. This module explores the **full-scale deployment of MBE**, emphasizing **interoperability, automation, AI-driven analytics, and real-world implementation challenges**.

## 3.1 Advanced MBE Concepts

### <font color = '#646464'>3.1.1 Enterprise-Wide Digital Thread Implementation</font>
**Goal:** Establish a comprehensive, end-to-end connected ecosystem that spans design, manufacturing, and sustainment processes, ensuring seamless information flow throughout the product lifecycle.

- **End-to-End Connected Ecosystem:** The digital thread concept connects disparate systems and teams by enabling the flow of data across the entire product lifecycle—from initial design to final sustainment. By ensuring this connectivity, organizations can better understand product performance, identify inefficiencies, and enhance decision-making.

- **Using PLM (Product Lifecycle Management) and ERP (Enterprise Resource Planning) Systems for Seamless Data Flow:** Product Lifecycle Management (PLM) and Enterprise Resource Planning (ERP) systems play a crucial role in supporting the digital thread by providing centralized repositories for data and ensuring that product information is available across different stages of the lifecycle. PLM systems manage the product's design, manufacturing, and maintenance data, while ERP systems provide support for planning, procurement, inventory management, and production processes. The integration of both systems ensures a smooth and consistent data flow between different teams and functions.

- **Case Study: Boeing’s Implementation of Digital Thread:** Boeing is an example of a company that has successfully implemented a digital thread across its operations. The company uses digital twins and simulation models to track and optimize the performance of its aircraft throughout the entire lifecycle. This digital thread connects design, production, and maintenance processes, allowing for real-time performance monitoring, faster issue identification, and better resource management. Boeing’s use of the digital thread not only ensures that engineering and manufacturing teams are working with the most up-to-date information but also improves the sustainment phase by providing predictive maintenance insights based on real-world data.

<center><img src="https://www.boeingsuppliers.com/content/dam/microsites/static/supplier/suppliers/doingbiz_redesign/images/section/MBE_DigitalThread.png" alt="Alt text" /></center>

---

### <font color = '#646464'>3.1.2 Model-Based Systems Engineering (MBSE) at Scale</font>
**Goal:** Expand Model-Based Systems Engineering (MBSE) practices beyond just product design, applying them to the full lifecycle management of complex systems, and ensuring cross-disciplinary integration.

- **Expanding MBSE Beyond Product Design to Full Lifecycle Management:** MBSE traditionally focuses on the design phase of the product lifecycle, but its applications can be extended to include the entire lifecycle—from concept development and design to manufacturing, testing, and sustainment. The goal is to use MBSE to not only improve the design phase but also optimize each subsequent phase by integrating real-time data and feedback into system models.

- **Integrating Multi-Disciplinary Models (Mechanical, Electrical, Software, etc.):** A key feature of MBSE is its ability to integrate multi-disciplinary models. In the context of complex systems, models from various domains (e.g., mechanical, electrical, software, etc.) need to be linked together for better analysis and optimization. For instance, the mechanical design of a system may be linked to its electrical and software models, allowing engineers to analyze the interactions and interdependencies between these disciplines. This integration leads to a more holistic understanding of the system and ensures that all aspects are considered during the development process.

- **Tools:** Several tools can support MBSE at scale, including:
  - **SysML (Systems Modeling Language):** A standard modeling language used to represent complex system structures and behaviors. SysML is widely used for creating diagrams and models that help describe system functions, architecture, and requirements.
  - **Cameo Systems Modeler:** A modeling tool that supports SysML and is used to build, analyze, and manage system models. It allows users to create rich models that can integrate with other tools in the systems engineering workflow.

**SysML example**: The diagram below shows both the SysML code and visualization for the structure of a vehicle system that consists of Engine, Transmission, Wheel, Steering, and Chassis. Each of those subsystems can also have its own components. This view is only for the system design; there are other views for requirements, system analysis, implementation, and architecture. 


<center><img src="https://media.licdn.com/dms/image/v2/C4D12AQGooY4AojdsNQ/article-inline_image-shrink_1500_2232/article-inline_image-shrink_1500_2232/0/1603862142761?e=1746057600&v=beta&t=IQ5ZGOYMc1IC5JURVnQjOS64_bV7aYMOSmorPAe7cv4" alt="Alt text" /></center>
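
The composition hierarchy described above can also be sketched in plain Python. This is only an illustration of the "system is composed of subsystems" idea, not SysML itself and not the output of any modeling tool. The `Block` class and the `Cylinder` sub-component are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A system element, loosely analogous to a SysML block (hypothetical class)."""
    name: str
    parts: List["Block"] = field(default_factory=list)

    def add(self, part: "Block") -> "Block":
        """Attach a sub-block (composition) and return it."""
        self.parts.append(part)
        return part

    def tree(self, depth: int = 0) -> List[str]:
        """Return the hierarchy as indented lines, one line per block."""
        lines = ["  " * depth + self.name]
        for part in self.parts:
            lines.extend(part.tree(depth + 1))
        return lines


# Top-level structure taken from the example: a vehicle and its five subsystems
vehicle = Block("Vehicle")
for subsystem in ["Engine", "Transmission", "Wheel", "Steering", "Chassis"]:
    vehicle.add(Block(subsystem))

# Hypothetical sub-component, added only to show that subsystems can nest
vehicle.parts[0].add(Block("Cylinder"))

print("\n".join(vehicle.tree()))
```

A real SysML model would also carry requirements, behavior, and parametric views; this sketch covers only the structural decomposition.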

---

## 3.2 Interoperability & Standards in MBE

### <font color = '#646464'>3.2.1 Ensuring Interoperability Across Systems</font>
**Goal:** To ensure seamless communication and data exchange between different systems used in MBE, including CAD, PLM, and MES, while adhering to common standards that enable interoperability.

- **Standards:**
    - **ISO 10303 (STEP):** The ISO 10303 standard, commonly known as STEP (Standard for the Exchange of Product model data), is a comprehensive international standard that facilitates the exchange of product data between different systems. STEP provides a method for representing product data (such as 3D models, materials, and assembly information) in a neutral format that can be understood across various platforms, including CAD, PLM, and ERP systems. It is widely used to ensure that data can be shared across disciplines without loss of information.
      
    - **ANSI QIF (Quality Information Framework):** The ANSI QIF is a standard for representing and exchanging quality-related information in the manufacturing process. It defines a common framework for data related to inspection, verification, and measurement, ensuring that quality data generated during manufacturing can be easily integrated into the broader product lifecycle. ANSI QIF plays a key role in ensuring that the information needed for quality assurance is accurately captured and shared across systems.
      
    - **ASME Y14.41:** ASME Y14.41 defines the standards for digital product definition data, focusing on the use of 3D models and their integration into product definition documents. This standard provides guidelines on how to represent product design data, including 3D CAD models, and ensures that this information can be seamlessly communicated to downstream manufacturing and quality systems.
      
- **Bridging Different CAD, PLM, and MES Systems:** One of the key challenges in MBE is ensuring that data can flow smoothly between different software systems that play different roles in the product lifecycle. 
    - **CAD (Computer-Aided Design):**  
      CAD systems are used to create 3D product designs, and often have proprietary file formats that are not easily compatible with other systems.
      
    - **PLM (Product Lifecycle Management):** PLM systems manage the data and processes associated with the entire product lifecycle, from initial design to end-of-life. PLM systems often store vast amounts of data, including design specifications, bills of materials (BOM), and configuration data, which need to be easily accessible by other systems.
      
    - **MES (Manufacturing Execution Systems):**  MES systems manage the production processes, such as scheduling, tracking, and optimizing manufacturing activities. Ensuring that the right data from CAD and PLM systems reaches MES is crucial for accurate manufacturing execution.

    The key to bridging these systems lies in creating data exchange frameworks based on standard file formats like STEP and QIF, as well as using middleware and integration tools that facilitate communication between these diverse systems. This interoperability ensures that no information is lost, that the systems are synchronized, and that engineers and manufacturers are working with up-to-date data.



<center><img src="https://media.springernature.com/lw685/springer-static/image/chp%3A10.1007%2F978-3-030-62807-9_7/MediaObjects/506696_1_En_7_Fig3_HTML.png" alt="Alt text" /></center>
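
To make the idea of a neutral exchange format more concrete, the sketch below reads a small hand-written fragment in the ISO 10303-21 text layout (the clear-text encoding commonly used for STEP files) and counts the entity types it contains. The fragment, the file name `bracket.stp`, and the helper functions are illustrative assumptions. A production integration would normally rely on a dedicated STEP toolkit rather than regular expressions.

```python
import re
from collections import Counter

# Hand-written illustrative fragment; values are not from a real CAD export.
STEP_TEXT = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION(('example'),'2;1');
FILE_NAME('bracket.stp','2024-01-01T00:00:00',(''),(''),'','','');
FILE_SCHEMA(('AUTOMOTIVE_DESIGN'));
ENDSEC;
DATA;
#1=CARTESIAN_POINT('origin',(0.0,0.0,0.0));
#2=CARTESIAN_POINT('p1',(10.0,0.0,0.0));
#3=DIRECTION('z',(0.0,0.0,1.0));
#4=AXIS2_PLACEMENT_3D('',#1,#3,$);
ENDSEC;
END-ISO-10303-21;
"""


def file_schema(text):
    """Return the first schema name declared in the HEADER section, or None."""
    found = re.search(r"FILE_SCHEMA\s*\(\s*\(\s*'([^']+)'", text)
    return found.group(1) if found else None


def entity_counts(text):
    """Count entity instances by type within the DATA section."""
    data_section = text.split("DATA;", 1)[1].split("ENDSEC;", 1)[0]
    return Counter(re.findall(r"#\d+\s*=\s*([A-Z0-9_]+)\s*\(", data_section))


print(file_schema(STEP_TEXT))
print(entity_counts(STEP_TEXT))
```

Because the header names the schema explicitly, a receiving system can check which application protocol a file claims to follow before trying to interpret its entities.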

---

### <font color = '#646464'>3.2.2 Ontology and Semantic Data Models</font>
**Goal:** To enhance data integration, retrieval, and analysis in MBE by utilizing semantic data models and graph-based databases that enable AI-driven automation and advanced analytics.

- **Using Graph-Based Databases (RDF, OWL, SPARQL) for MBE Data Integration:** Graph-based databases allow for more flexible and dynamic data representation compared to traditional relational databases. These databases are designed to handle complex relationships between data entities, which is particularly useful in MBE where systems and components are highly interconnected.
    
    - **RDF (Resource Description Framework):**  RDF is a framework for representing data as triples (subject-predicate-object) in a way that can describe relationships between resources. RDF is highly useful for integrating different kinds of data from various MBE tools, such as CAD, PLM, and MES, because it allows different data points to be connected, enabling richer information retrieval and better integration across the lifecycle.
      
    - **OWL (Web Ontology Language):**  OWL is used to define the types, properties, and relationships of data in a way that makes it machine-readable. In MBE, OWL can be used to build ontologies that define the key entities (e.g., parts, assemblies, materials, manufacturing processes) and their relationships, facilitating better data sharing and understanding across the entire lifecycle.
      
    - **SPARQL (SPARQL Protocol and RDF Query Language):**  SPARQL is a query language for querying RDF data stores. It allows users to retrieve and manipulate data stored in graph-based formats. In MBE, SPARQL enables complex queries that can combine information from different systems and domains, such as design, manufacturing, and maintenance, into a single query result.

- **Enabling AI-Driven Automation and Analytics:** The use of semantic data models and graph-based databases in MBE can provide a rich foundation for AI-driven automation and advanced analytics. 
    - **Automation:** AI can leverage the interconnected data in a graph database to automate tasks such as design optimization, failure prediction, and real-time manufacturing scheduling. By analyzing relationships and patterns across multiple systems, AI can suggest improvements, automate repetitive tasks, and trigger actions based on real-time data.
      
    - **Analytics:** The ability to query complex data relationships with SPARQL and model systems using OWL enables more sophisticated data analytics. Engineers can use advanced analytics to uncover hidden insights about product performance, quality, and potential risks. For example, AI could analyze product lifecycle data to predict failure modes or identify inefficiencies in the design or manufacturing process.

    - **Example:**  In a manufacturing setting, a graph-based database can integrate design, manufacturing, and operational data, allowing AI to track a product's lifecycle from design to final use. The AI system could analyze this data to predict future failures, optimize resource allocation, and identify potential issues in the manufacturing process before they occur.
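
The subject-predicate-object idea behind RDF can be illustrated without a graph database. The sketch below is a plain-Python stand-in: it stores triples in a list and answers pattern queries with wildcards, which is roughly what a basic SPARQL graph pattern does. It is not RDF, OWL, or SPARQL, and all identifiers (`part:Bracket-01`, `machine:M7`, the predicate names, the date) are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts that might originate in CAD, PLM, and MES systems
TRIPLES: List[Triple] = [
    ("part:Bracket-01", "definedBy", "cad:bracket.stp"),
    ("part:Bracket-01", "usedIn", "assembly:Frame-A"),
    ("part:Bracket-01", "producedOn", "machine:M7"),
    ("part:Hinge-02", "usedIn", "assembly:Frame-A"),
    ("machine:M7", "lastMaintained", "2015-03-01"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def maintenance_for_assembly(triples: List[Triple], assembly: str):
    """Follow part -> machine -> maintenance date links for one assembly."""
    results = []
    for part, _, _ in match(triples, p="usedIn", o=assembly):
        for _, _, machine in match(triples, s=part, p="producedOn"):
            for _, _, date in match(triples, s=machine, p="lastMaintained"):
                results.append((part, machine, date))
    return results


parts_in_frame = [s for s, _, _ in match(TRIPLES, p="usedIn", o="assembly:Frame-A")]
print(parts_in_frame)
print(maintenance_for_assembly(TRIPLES, "assembly:Frame-A"))
```

The second query crosses design, production, and maintenance facts in one traversal. That kind of cross-system question is what a graph model is meant to make easy.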

<center><img src="https://www.applyscience.it/wp-content/uploads/2020/09/predictive-maintenance.png" alt="Alt text" /></center>

---

## 3.3 Automation & AI in MBE

### <font color = '#646464'>3.3.1 Digital Twin & Simulation-Driven Decision Making</font>
**Goal:** To leverage digital twin technology and simulation-driven decision-making to optimize product performance, enhance predictive maintenance, and facilitate real-time monitoring throughout the lifecycle.

- **Deploying Digital Twins for Predictive Maintenance & Real-Time Monitoring:** A **Digital Twin** is a digital replica of a physical object or system that is continuously updated with real-time data from sensors embedded in the physical counterpart. Digital twins can be deployed in manufacturing and product operations to predict and prevent failures before they occur.
  
    - **Predictive Maintenance:** By continuously monitoring the physical system’s data (such as temperature, vibration, pressure, etc.) and comparing it with the digital model, the digital twin can predict when a part will likely fail or require maintenance. For example, in industrial machinery, a digital twin can predict when a component is nearing the end of its useful life, allowing maintenance teams to schedule repairs proactively and avoid unplanned downtime.
    
    - **Real-Time Monitoring:** A digital twin can be used to monitor real-time performance data from systems such as machines, vehicles, or even entire production lines. This allows operators to track system health and performance in real-time, identifying deviations from expected behavior and adjusting parameters or processes before an issue escalates.

    - **Optimized Asset Management:** Digital twins help organizations optimize asset management by providing insights into the performance and condition of physical assets. By analyzing historical data, manufacturers can optimize asset utilization, minimize downtime, and extend the lifespan of expensive equipment.

- **Simulation-Based What-If Analysis for Design and Manufacturing Optimization:** Simulation tools allow engineers to run “what-if” scenarios to test different design or manufacturing decisions before committing to them. This enables designers to optimize products and manufacturing processes early in the design phase, reducing risks and costs.

    - **What-If Analysis for Design Optimization:** Simulation-based what-if analysis allows designers to explore different variations in design, materials, or manufacturing processes. For example, engineers can simulate the impact of different materials or manufacturing methods on a product’s performance or cost. By testing these variations in a virtual environment, designers can identify the best options without having to build physical prototypes.

    - **What-If Analysis for Manufacturing Optimization:** In manufacturing, what-if analysis can simulate the impact of changing production parameters, such as machine speeds or assembly line configurations, on overall efficiency, product quality, and cost. By leveraging simulation tools, manufacturers can optimize their processes, reducing waste, improving throughput, and ensuring higher quality outcomes.

    - **Risk Mitigation:** What-if simulations help identify potential failure points in designs or manufacturing processes, allowing companies to take corrective actions before real-world consequences occur. This mitigates risks associated with new product introductions or process changes.

- **Case Study: GE’s Digital Twin for Aircraft Engines:** **General Electric (GE)** has been a leader in implementing digital twin technology, particularly for aircraft engines. GE's digital twin for aircraft engines continuously collects real-time data from sensors embedded in the engines, which is then compared against the digital model of the engine. This allows GE to monitor the engine’s health throughout its operational lifecycle.
  
    - **Predictive Maintenance in Action:** By using a digital twin of the aircraft engine, GE can predict when maintenance will be needed, even before the engine shows signs of malfunction. This predictive capability allows airlines to schedule maintenance during downtime, reducing the risk of unexpected engine failures and costly flight delays.
    
    - **Optimization of Engine Performance:** The digital twin also allows GE to optimize the performance of the engines by analyzing real-time data against the digital model. This continuous feedback loop helps improve engine efficiency and fuel economy, and provides insights into how the engine performs under various operational conditions.

    - **Improved Lifecycle Management:** The use of digital twins helps GE manage the entire lifecycle of its aircraft engines, from design and manufacturing to operations and maintenance. By utilizing simulation, real-time data, and predictive maintenance capabilities, GE is able to reduce operational costs and improve the reliability of its engines.
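
The core comparison in the predictive-maintenance bullets above, measured sensor data versus what the digital model expects, can be sketched as a residual check. This is a toy illustration under strong assumptions: the "twin" is a made-up linear temperature model, and the class name, parameters, readings, and tolerance are all hypothetical. It does not represent GE's or any vendor's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TwinModel:
    """Toy stand-in for a digital twin: expected temperature as a linear function of load."""
    base_temp: float       # expected temperature at zero load
    temp_per_load: float   # expected temperature rise per unit of load

    def expected_temp(self, load: float) -> float:
        return self.base_temp + self.temp_per_load * load


def flag_deviations(model: TwinModel,
                    readings: List[Tuple[float, float]],
                    tolerance: float) -> List[Tuple[int, float]]:
    """Return (index, residual) for readings where |measured - expected| exceeds tolerance."""
    flagged = []
    for i, (load, measured) in enumerate(readings):
        residual = measured - model.expected_temp(load)
        if abs(residual) > tolerance:
            flagged.append((i, round(residual, 2)))
    return flagged


twin = TwinModel(base_temp=40.0, temp_per_load=0.5)

# (load, measured temperature) pairs; the last two run hotter than the model expects
readings = [(20.0, 50.3), (40.0, 59.6), (60.0, 70.4), (60.0, 76.0), (80.0, 88.5)]

alerts = flag_deviations(twin, readings, tolerance=3.0)
print(alerts)
```

In practice the expected values would come from a physics-based or data-driven model, and a sustained run of growing residuals, rather than a single exceedance, would usually drive a maintenance decision.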

## 3.4 Example: Hands-on Excercise for Predictive Maintenance</font>

#### Modelling Guide for Predictive Maintenance

This example outlines the process of implementing a predictive maintenance model using a sample scenario where machine failures occur due to specific component malfunctions. The objective is to predict these failures. The example datasets illustrate key steps in predictive maintenance, including feature engineering, labeling, training, and evaluation.


#### Example
The example in this notebook considers a system with multiple components and sensors.

<center><img src="Module 3 Content/img/machine.png" alt="Alt text" /></center>

#### Outline

- [Problem Description](#Problem-Description)
- [Data Sources](#Data-Sources)
   - [Telemetry](#Telemetry)
   - [Errors](#Errors)
   - [Maintenance](#Maintenance)
   - [Machines](#Machines)
   - [Failures](#Failures)
- [Feature Engineering](#Feature-Engineering)
  - [Lag Features from Telemetry](#Lag-Features-from-Telemetry)
  - [Lag Features from Errors](#Lag-Features-from-Errors)
  - [Days Since Last Replacement from Maintenance](#Days-Since-Last-Replacement-from-Maintenance)
  - [Machine Features](#Machine-Features)
- [Label Construction](#Label-Construction)
- [Modelling](#Modelling)
  - [Training, Validation and Testing](#Training,-Validation-and-Testing)
  - [Evaluation](#Evaluation)
- [Summary](#Summary)


### <font color = '#646464'>3.4.1 Problem Description</font>
Businesses in asset-heavy industries, like manufacturing, face substantial costs due to production delays caused by mechanical problems. To mitigate the costly impact of downtime, these businesses seek to predict such issues in advance, allowing them to take proactive measures.

In this example, the business problem involves predicting failures due to component malfunctions, answering the question, "What is the probability that a machine will fail in the near future due to a specific component failure?" This is framed as a multi-class classification problem, where a machine learning algorithm is employed to create a predictive model based on historical machine data.

The following sections will guide you through the implementation steps of such a model, including feature engineering, label construction, training, and evaluation. In the next section, we begin by explaining the data sources.

### <font color = '#646464'>3.4.2 Data Sources</font>

Common data sources for predictive maintenance problems are:

- **Failure history**: The failure history of a machine or component within the machine.
- **Maintenance history**: The repair history of a machine, e.g. error codes, previous maintenance activities or component replacements.
- **Machine conditions and usage**: The operating conditions of a machine e.g. data collected from sensors.
- **Machine features**: The features of a machine, e.g. engine size, make and model, location.
- **Operator features**: The features of the operator, e.g. gender, past experience.

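The data for this example comes from four different sources: real-time telemetry data collected from machines, error messages, historical maintenance records that include failures, and machine information such as type and age. These are loaded from five CSV files, because the failure records are stored separately from the other maintenance records.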



#### Press ▶ to load the data

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

import pandas as pd

import warnings
warnings.filterwarnings("ignore")

telemetry = pd.read_csv('Module 3 Content/data/PdM_telemetry.csv')
errors = pd.read_csv('Module 3 Content/data/PdM_errors.csv')
maint = pd.read_csv('Module 3 Content/data/PdM_maint.csv')
failures = pd.read_csv('Module 3 Content/data/PdM_failures.csv')
machines = pd.read_csv('Module 3 Content/data/PdM_machines.csv')

waiting_widget.value = "<span style='color: green;'>✅ Code Successful</span>"
clear_output(wait=True)

#### Telemetry

The first data source is the telemetry time-series data, which consists of voltage, rotation, pressure, and vibration measurements collected in real time from 100 machines and averaged over each hour of the year 2015. Below, we display the first 5 records in the dataset. A summary of the whole dataset is also provided.

#### Press ▶ to display the telemetry data and its description.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

# format datetime field which comes in as string
telemetry['datetime'] = pd.to_datetime(telemetry['datetime'], format="%Y-%m-%d %H:%M:%S")

waiting_widget.value = "<span style='color: green;'>✅ Code Successful</span>"
clear_output(wait=True)

print("\033[1m Total number of telemetry records: %d \033[0m" % len(telemetry.index))
display(telemetry.head())
telemetry.describe()

As an example, below is a plot of voltage values for machine ID 1 for January 2015.

#### Press ▶ to display the voltage values.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

plot_df = telemetry.loc[(telemetry['machineID'] == 1) &
                        (telemetry['datetime'] > pd.to_datetime('2015-01-01')) &
                        (telemetry['datetime'] < pd.to_datetime('2015-02-01')), ['datetime', 'volt']]

sns.set_style("darkgrid")
plt.figure(figsize=(12, 6))
plt.plot(plot_df['datetime'], plot_df['volt']);
plt.ylabel('voltage')

# make x-axis ticks legible: show month-day labels regardless of the default formatter
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
plt.xlabel('Date')

clear_output(wait=True)

plt.show()

#### Errors

The second major data source is the error logs. These are non-breaking errors thrown while the machine is still operational, so they do not constitute failures. The error dates and times are rounded to the closest hour because the telemetry data is collected at an hourly rate.

#### Press ▶ to show the error data and its count records per error ID.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

# format datetime field which comes in as string
errors['datetime'] = pd.to_datetime(errors['datetime'], format="%Y-%m-%d %H:%M:%S")
errors['errorID'] = errors['errorID'].astype('category')

sns.set_style("darkgrid")
plt.figure(figsize=(8, 4))
errors['errorID'].value_counts().plot(kind='bar')
plt.ylabel('Count');

clear_output(wait=True)

print("\033[1m Total number of error records: %d \033[0m" % len(errors.index))
errors.head()

#### Maintenance

These are the scheduled and unscheduled maintenance records, which correspond to both regular inspection of components and failures. A record is generated if a component is replaced during a scheduled inspection or replaced due to a breakdown. The records created due to breakdowns are called failures, which are explained in later sections. Maintenance data has both 2014 and 2015 records.

#### Press ▶ to display the maintenance data and its count records per component.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

# format datetime field which comes in as string
maint['datetime'] = pd.to_datetime(maint['datetime'], format="%Y-%m-%d %H:%M:%S")
maint['comp'] = maint['comp'].astype('category')

sns.set_style("darkgrid")
plt.figure(figsize=(8, 4))
maint['comp'].value_counts().plot(kind='bar')
plt.ylabel('Count');

clear_output(wait=True)

print("\033[1m Total number of maintenance records: %d \033[0m" % len(maint.index))
maint.head()

#### Machines

This data set includes some information about the machines: model type and age (years in service).

#### Press ▶ to display a sample of the machine data and the age distribution by model.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

machines['model'] = machines['model'].astype('category')

sns.set_style("darkgrid")
plt.figure(figsize=(8, 6))
_, bins, _ = plt.hist([machines.loc[machines['model'] == 'model1', 'age'],
                       machines.loc[machines['model'] == 'model2', 'age'],
                       machines.loc[machines['model'] == 'model3', 'age'],
                       machines.loc[machines['model'] == 'model4', 'age']],
                       20, stacked=True, label=['model1', 'model2', 'model3', 'model4'])
plt.xlabel('Age (yrs)')
plt.ylabel('Count')
plt.legend();

clear_output(wait=True)

print("\033[1m Total number of machines: %d \033[0m" % len(machines.index))
machines.head()

#### Failures

These are the records of component replacements due to failures. Each record has a date and time, machine ID, and failed component type.

Below is a bar chart of the number of failures due to each component. We see that component 2 causes the most failures.

#### Press ▶ to display a sample of the failure data and its count per component.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

# format datetime field which comes in as string
failures['datetime'] = pd.to_datetime(failures['datetime'], format="%Y-%m-%d %H:%M:%S")
failures['failure'] = failures['failure'].astype('category')

sns.set_style("darkgrid")
plt.figure(figsize=(8, 4))
failures['failure'].value_counts().plot(kind='bar')
plt.ylabel('Count');

clear_output(wait=True)

print("\033[1m Total number of failures: %d \033[0m" % len(failures.index))
failures.head()

### <font color = '#646464'>3.4.3 Feature Engineering</font>

The first step in predictive maintenance applications is feature engineering, which requires bringing the different data sources together to create features that best describe a machine's health condition at a given point in time. In the next sections, several feature engineering methods are used to create features based on the properties of each data source.

#### Lag Features from Telemetry

Telemetry data almost always comes with timestamps, which makes it suitable for calculating lag features. A common method is to pick a window size for the lag features and compute rolling aggregate measures such as mean, standard deviation, minimum, and maximum to represent the short-term history of the telemetry over the lag window. In the following, the mean and standard deviation of the telemetry data over the last 3-hour lag window are calculated every 3 hours.
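
To make the windowing explicit, here is a minimal pure-Python sketch of the same idea on a handful of made-up hourly voltage readings. It is an illustration only and uses none of the notebook's data or pandas calls: the function name `window_features` and the sample values are hypothetical. Each window covers three consecutive hours and is labeled with the hour that follows its last reading, and the standard deviation is the sample standard deviation. As a simplification, a trailing window with fewer than three readings is dropped.

```python
from statistics import mean, stdev
from typing import List, Tuple


def window_features(hourly: List[Tuple[int, float]],
                    window: int = 3) -> List[Tuple[int, float, float]]:
    """Aggregate hourly readings into non-overlapping windows.

    `hourly` holds (hour_index, value) pairs in time order. Each result is
    (label_hour, mean, sample_std), where label_hour is the hour right after
    the window's last reading.
    """
    features = []
    for start in range(0, len(hourly) - window + 1, window):
        chunk = [value for _, value in hourly[start:start + window]]
        label = hourly[start + window - 1][0] + 1
        features.append((label, mean(chunk), stdev(chunk)))
    return features


# Made-up hourly voltage readings for hours 0..5
volt = [(0, 170.0), (1, 173.0), (2, 176.0), (3, 160.0), (4, 160.0), (5, 160.0)]

feats = window_features(volt)
for label, avg, sd in feats:
    print(f"window ending at hour {label}: mean={avg:.1f}, sd={sd:.1f}")
```

The first window shows some spread around its mean while the second is flat. That difference is the kind of short-term behavior the lag features are meant to expose to the model.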

#### Press ▶ to show the 3 hour lag features.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Loading Data, Please Wait ...</span>")
display(waiting_widget)

# Calculate mean values for telemetry features
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).resample('3H', closed='left', label='right').mean().unstack())
telemetry_mean_3h = pd.concat(temp, axis=1)
telemetry_mean_3h.columns = [i + 'mean_3h' for i in fields]
telemetry_mean_3h.reset_index(inplace=True)

# repeat for standard deviation
temp = []
for col in fields:
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).resample('3H', closed='left', label='right').std().unstack())
telemetry_sd_3h = pd.concat(temp, axis=1)
telemetry_sd_3h.columns = [i + 'sd_3h' for i in fields]
telemetry_sd_3h.reset_index(inplace=True)

clear_output(wait=True)

telemetry_mean_3h.head()

To capture longer-term effects, 24-hour lag features are also calculated below.

#### Press ▶ to show the 24 hour lag features.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Loading Data, Please Wait ...</span>")
display(waiting_widget)

temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
    rolling_mean = pd.pivot_table(telemetry,
                                  index='datetime',
                                  columns='machineID',
                                  values=col).rolling(window=24).mean()    
    resampled = rolling_mean.resample('3H', closed='left', label='right').first().unstack()
    temp.append(resampled)
    
telemetry_mean_24h = pd.concat(temp, axis=1)
telemetry_mean_24h.columns = [i + 'mean_24h' for i in fields]
telemetry_mean_24h.reset_index(inplace=True)
telemetry_mean_24h = telemetry_mean_24h.loc[~telemetry_mean_24h['voltmean_24h'].isnull()]

# repeat for standard deviation
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
    rolling_std = pd.pivot_table(telemetry,
                                  index='datetime',
                                  columns='machineID',
                                  values=col).rolling(window=24).std()
    resampled = rolling_std.resample('3H', closed='left', label='right').first().unstack()
    temp.append(resampled)
    
telemetry_sd_24h = pd.concat(temp, axis=1)
telemetry_sd_24h.columns = [i + 'sd_24h' for i in fields]
telemetry_sd_24h.reset_index(inplace=True)
telemetry_sd_24h = telemetry_sd_24h.loc[~telemetry_sd_24h['voltsd_24h'].isnull()]

clear_output(wait=True)

# Notice that a 24h rolling average is not available at the earliest timepoints
telemetry_mean_24h.head(10)
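
The reason the earliest timepoints have no 24-hour feature can be checked on a toy series. This is a minimal sketch under stated assumptions: the signal `toy` is hypothetical (simply 0, 1, ..., 29, one value per hour), and `dropna()` stands in for the null filtering done in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly signal 0, 1, ..., 29 (not the notebook's telemetry)
idx = pd.date_range("2015-01-01 00:00", periods=30, freq="h")
toy = pd.Series(np.arange(30, dtype=float), index=idx)

# 24-point rolling mean: undefined until 24 readings are available
rolling_24h = toy.rolling(window=24).mean()
n_missing = int(rolling_24h.isna().sum())

# Keep one value per 3-hour bin, labelled by the bin's right edge, and drop empty bins
toy_mean_24h = rolling_24h.resample("3h", closed="left", label="right").first().dropna()

print(n_missing)
print(toy_mean_24h)
```

The first 23 hourly positions have fewer than 24 readings behind them, so the rolling mean is missing there and the corresponding 3-hour rows are dropped.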

Next, the columns of the feature datasets created earlier are merged to create the final feature set from telemetry.

#### Press ▶ to display the final feature set from telemetry.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: green;'>✅ Code Running Please Wait ...</span>")
display(waiting_widget)

# merge columns of feature sets created earlier
telemetry_feat = pd.concat([telemetry_mean_3h,
                            telemetry_sd_3h.iloc[:, 2:6],
                            telemetry_mean_24h.iloc[:, 2:6],
                            telemetry_sd_24h.iloc[:, 2:6]], axis=1).dropna()

clear_output(wait=True)

print('\033[1mThis is a sample of the data.\033[0m\n')
display(telemetry_feat.head())
print('\n\033[1mThis is the summary of the entire data\033[0m\n')
telemetry_feat.describe()

#### Lag Features from Errors

Like telemetry data, errors come with timestamps. An important difference is that the error IDs are categorical values and should not be averaged over time intervals like the telemetry measurements. Instead, we count the number of errors of each type in a lagging window. We begin by reformatting the error data to have one entry per machine per time at which at least one error occurred:

#### Press ▶ to display the updated error data.

In [None]:
# create a column for each error type
error_count = pd.get_dummies(errors.set_index('datetime')).reset_index()
error_count.columns = ['datetime', 'machineID', 'error1', 'error2', 'error3', 'error4', 'error5']

# combine errors for a given machine in a given hour
error_count = error_count.groupby(['machineID', 'datetime']).sum().reset_index()
error_count.head(13)

Now we add blank entries for all other hourly timepoints (since no errors occurred at those times):

#### Press ▶ to display the summary of the updated error data.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: green;'>✅ Code Running Please Wait ...</span>")
display(waiting_widget)

error_count = telemetry[['datetime', 'machineID']].merge(error_count, on=['machineID', 'datetime'], how='left').fillna(0.0)
clear_output(wait=True)
error_count.describe()
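
The counting idea described at the start of this subsection (count errors of each type in a lagging window instead of averaging them) can be sketched on a toy error log. Everything here is an assumption made for illustration: the `toy_errors` frame, the single machine, and the 3-hour window, which is kept short so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical error log for one machine (not the notebook's data)
toy_errors = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-01 01:00", "2015-01-01 02:00",
                                "2015-01-01 02:00", "2015-01-01 05:00"]),
    "machineID": [1, 1, 1, 1],
    "errorID": ["error1", "error1", "error2", "error1"],
})

# One indicator column per error type, then one row per machine per hour
toy_counts = pd.get_dummies(toy_errors, columns=["errorID"], dtype=int)
toy_counts = toy_counts.groupby(["machineID", "datetime"]).sum().reset_index()

# Add the error-free hours as zeros so the window covers every hour
hours = pd.date_range("2015-01-01 00:00", periods=6, freq="h")
hourly = (toy_counts.set_index("datetime")[["errorID_error1", "errorID_error2"]]
          .reindex(hours, fill_value=0))

# Number of errors of each type over the trailing 3 hours (toy window size)
lagged = hourly.rolling(window=3, min_periods=1).sum()
print(lagged)
```

Because error IDs are categorical, the rolling aggregate is a sum of indicator columns; a rolling mean of the IDs themselves would have no meaning.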

#### Days Since Last Replacement from Maintenance

A crucial data set in this example is the maintenance records, which contain the component replacement history. Possible features from this data set include, for example, the number of replacements of each component in the last 3 months, which captures the frequency of replacements. However, a more relevant feature is how long it has been since a component was last replaced: this is expected to correlate better with component failures, since the longer a component is used, the more degradation should be expected.

As a side note, creating lagging features from maintenance data is not as straightforward as for telemetry and errors, so the features from this data are generated in a more custom way. This type of ad-hoc feature engineering is very common in predictive maintenance since domain knowledge plays a big role in understanding the predictors of a problem. In the following, the days since last component replacement are calculated for each component type as features from the maintenance data. 

#### Press ▶ to show a sample of the maintenance data and its summary.

In [None]:
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Loading Data, Please Wait ...</span>")
display(waiting_widget)


# create a column for each component type
comp_rep = pd.get_dummies(maint.set_index('datetime')).reset_index()
comp_rep.columns = ['datetime', 'machineID', 'comp1', 'comp2', 'comp3', 'comp4']

# combine repairs for a given machine in a given hour
comp_rep = comp_rep.groupby(['machineID', 'datetime']).sum().reset_index()

# add timepoints where no components were replaced
comp_rep = telemetry[['datetime', 'machineID']].merge(comp_rep,
                                                      on=['datetime', 'machineID'],
                                                      how='outer').fillna(0).sort_values(by=['machineID', 'datetime'])

components = ['comp1', 'comp2', 'comp3', 'comp4']
for comp in components:
    # convert indicator to most recent date of component change
    comp_rep.loc[comp_rep[comp] < 1, comp] = None
    comp_rep.loc[~comp_rep[comp].isnull(), comp] = comp_rep.loc[~comp_rep[comp].isnull(), 'datetime']
    
    # forward-fill the most-recent date of component change
    comp_rep[comp] = comp_rep[comp].ffill()

# remove dates in 2014 (may have NaN or future component change dates)    
comp_rep = comp_rep.loc[comp_rep['datetime'] > pd.to_datetime('2015-01-01')]

# replace dates of most recent component change with days since most recent component change
for comp in components:
    comp_rep[comp] = (comp_rep['datetime'] - comp_rep[comp]) / np.timedelta64(1, 'D')
    
clear_output(wait=True)

print('\033[1mThis is a sample of the data.\033[0m\n')
display(comp_rep.head())

print('\n\033[1mThis is the summary of the data.\033[0m\n')
comp_rep.describe()
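
The forward-fill step used above can be illustrated on a single toy component. This is a minimal sketch, assuming a hypothetical daily frame `toy` in which `comp1 == 1` marks a replacement; it uses `Series.where` rather than the in-place assignments of the cell above, so it is an alternative formulation, not the notebook's code.

```python
import pandas as pd

# Hypothetical daily records for one machine; 1 means comp1 was replaced that day
times = pd.date_range("2015-01-01", periods=5, freq="D")
toy = pd.DataFrame({"datetime": times, "comp1": [1, 0, 0, 1, 0]})

# Keep the timestamp only where a replacement happened, then carry it forward
last_replaced = toy["datetime"].where(toy["comp1"] == 1).ffill()

# Elapsed time since the most recent replacement, expressed in days
toy["comp1_days_since"] = (toy["datetime"] - last_replaced) / pd.Timedelta(days=1)
print(toy)
```

The counter resets to zero on each replacement day and grows by one per day afterwards, which is the degradation proxy this feature is meant to provide.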

#### Machine Features

The machine features can be used without further modification. These include descriptive information about the type of each machine and its age (number of years in service). If the age information had been recorded as a "first use date" for each machine, a transformation would have been necessary to turn those dates into numeric values indicating the years in service.
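
For the hypothetical case mentioned above, where only a first use date is recorded, the transformation could look like the following sketch. The column name `first_use_date`, the reference date, and the use of 365-day years are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical machine table with a first-use date instead of an age column
toy_machines = pd.DataFrame({
    "machineID": [1, 2],
    "first_use_date": pd.to_datetime(["2010-01-01", "2013-01-01"]),
})

# Assumed date at which the age is evaluated
reference_date = pd.Timestamp("2015-01-01")

# Whole years in service, approximating a year as 365 days
toy_machines["age"] = (reference_date - toy_machines["first_use_date"]).dt.days // 365
print(toy_machines)
```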

Lastly, we merge all the feature data sets we created earlier to get the final feature matrix.

#### Press ▶ to show a sample of the final feature data and its summary.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: green;'>✅ Code Running Please Wait ...</span>")
display(waiting_widget)

final_feat = telemetry_feat.merge(error_count, on=['datetime', 'machineID'], how='left')
final_feat = final_feat.merge(comp_rep, on=['datetime', 'machineID'], how='left')
final_feat = final_feat.merge(machines, on=['machineID'], how='left')

clear_output(wait=True)

with pd.option_context('display.max_rows', 5, 'display.max_columns', None): 
    display(final_feat)
#display(final_feat.head())
final_feat.describe()

### <font color = '#646464'>3.4.4 Label Construction</font>

When using multi-class classification for predicting failure due to a problem, labelling is done by taking a time window prior to the failure of an asset and labelling the feature records that fall into that window as "about to fail due to a problem" while labelling all other records as "normal." This time window should be picked according to the business case: in some situations it may be enough to predict failures hours in advance, while in others days or weeks may be needed to allow e.g. for arrival of replacement parts.

The prediction problem for this example scenario is to estimate the probability that a machine will fail in the near future due to a failure of a certain component. More specifically, the goal is to compute the probability that a machine will fail in the next 24 hours due to a certain component failure (component 1, 2, 3, or 4). Below, a categorical `failure` feature is created to serve as the label. All records within a 24 hour window before a failure of component 1 have `failure=comp1`, and so on for components 2, 3, and 4; all records not within 24 hours of a component failure have `failure=none`.

#### Press ▶ to show the feature data and its "failure" label.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: green;'>✅ Code Running Please Wait ...</span>")
display(waiting_widget)

labeled_features = final_feat.merge(failures, on=['datetime', 'machineID'], how='left')
# back-fill 7 earlier 3-hourly records (21h); with the failure record itself this labels 8 records, i.e. 24h
labeled_features = labeled_features.bfill(limit=7)

for col in labeled_features.select_dtypes(include='category').columns:
    labeled_features[col] = labeled_features[col].astype(str)

labeled_features = labeled_features.replace('nan', 'none')

clear_output(wait=True)

labeled_features.head()

Below is an example of records that are labeled as `failure=comp4` in the failure column. Notice that the first 8 records all occur in the 24-hour window before the first recorded failure of component 4. The next 8 records are within the 24 hour window before another failure of component 4.

#### Press ▶ to show an example of "comp4" failure.

In [None]:
labeled_features.loc[labeled_features['failure'] == 'comp4'][:16]
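
The back-fill labelling can be checked on a toy frame with one recorded failure. This is a minimal sketch: the frame `toy`, the single machine, and the failure position are hypothetical.

```python
import pandas as pd

# Hypothetical 3-hourly records for one machine; a comp2 failure is recorded at row 10
times = pd.date_range("2015-01-01 00:00", periods=12, freq="3h")
raw_labels = [None] * 10 + ["comp2", None]
toy = pd.DataFrame({"datetime": times, "failure": raw_labels})

# Back-fill at most 7 earlier records: 7 filled + the failure record = 8 records = 24 hours
toy["failure"] = toy["failure"].bfill(limit=7).fillna("none")
print(toy)
```

Rows 3 through 10 end up labelled `comp2` and all others `none`. With several machines in one frame, grouping by `machineID` before back-filling would keep labels from leaking across machines; that variant is not shown here.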

### <font color = '#646464'>3.4.5 Modelling</font>

#### Training, Validation and Testing

When working with time-stamped data as in this example, record partitioning into training, validation, and test sets should be performed carefully to prevent overestimating the performance of the models. In predictive maintenance, the features are usually generated using lagging aggregates: records in the same time window will likely have identical labels and similar feature values. These correlations can give a model an "unfair advantage" when predicting on a test set record that shares its time window with a training set record. We therefore partition records into training, validation, and test sets in large chunks, to minimize the number of time intervals shared between them.

Predictive models have no advance knowledge of future chronological trends: in practice, such trends are likely to exist and to adversely impact the model's performance. To obtain an accurate assessment of a predictive model's performance, we recommend training on older records and validating/testing using newer records.

For both of these reasons, a time-dependent record splitting strategy is an excellent choice for predictive maintenance models. The split is effected by choosing a point in time based on the desired size of the training and test sets: all records before the timepoint are used for training the model, and all remaining records are used for testing. (If desired, the timeline could be further divided to create validation sets for parameter selection.) To prevent any records in the training set from sharing time windows with the records in the test set, we remove any records at the boundary -- in this case, by ignoring 24 hours' worth of data prior to the timepoint.
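
A time-dependent split with a boundary gap can be sketched as a small helper. The function name `time_split`, the toy frame, and the chosen dates are assumptions for illustration; the strict `<` and `>` comparisons mirror the convention described above.

```python
import pandas as pd

def time_split(df, last_train_time, first_test_time, time_col="datetime"):
    """Train on records strictly before last_train_time,
    test on records strictly after first_test_time."""
    train = df.loc[df[time_col] < last_train_time]
    test = df.loc[df[time_col] > first_test_time]
    return train, test

# Hypothetical 3-hourly records covering ten days
toy = pd.DataFrame({"datetime": pd.date_range("2015-01-01", periods=80, freq="3h")})
toy["value"] = range(len(toy))

train, test = time_split(toy,
                         last_train_time=pd.Timestamp("2015-01-06 00:00"),
                         first_test_time=pd.Timestamp("2015-01-07 00:00"))

# Time between the last training record and the first test record
gap = test["datetime"].min() - train["datetime"].max()
print(len(train), len(test), gap)
```

The records that fall between the two timepoints are used by neither set, so no 24-hour lag window is shared across the boundary.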

#### Press ▶ to train the model and predict the test results. 

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

# Only required on scikit-learn < 1.0, where this estimator was still experimental
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingClassifier

# make test and training splits
threshold_dates = [[pd.to_datetime('2015-07-31 01:00:00'), pd.to_datetime('2015-08-01 01:00:00')],
                   [pd.to_datetime('2015-08-31 01:00:00'), pd.to_datetime('2015-09-01 01:00:00')],
                   [pd.to_datetime('2015-09-30 01:00:00'), pd.to_datetime('2015-10-01 01:00:00')]]

test_results = []
models = []
for last_train_date, first_test_date in threshold_dates:
    # split out training and test data
    train_y = labeled_features.loc[labeled_features['datetime'] < last_train_date, 'failure']
    train_X = pd.get_dummies(labeled_features.loc[labeled_features['datetime'] < last_train_date].drop(columns=['datetime',
                                                                                                        'machineID',
                                                                                                        'failure']))
    test_X = pd.get_dummies(labeled_features.loc[labeled_features['datetime'] > first_test_date].drop(columns=['datetime',
                                                                                                       'machineID',
                                                                                                       'failure']))
    # train and predict using the model, storing results for later
    my_model = HistGradientBoostingClassifier(max_depth = 9, verbose=2, random_state=42)
    my_model.fit(train_X, train_y)
    test_result = pd.DataFrame(labeled_features.loc[labeled_features['datetime'] > first_test_date])
    test_result['predicted_failure'] = my_model.predict(test_X)
    test_results.append(test_result)
    models.append(my_model)
    
waiting_widget.value = "<span style='color: green;'>✅ Code Successful</span>"

### <font color = '#646464'>3.4.6 Evaluation</font>

In predictive maintenance, machine failures are usually rare occurrences in the lifetime of the assets compared to normal operation. This causes an imbalance in the label distribution, which usually leads to poor performance: algorithms tend to classify majority-class examples better at the expense of minority-class examples, because the total misclassification error improves most when the majority class is labeled correctly. The result is low recall even though accuracy can be high, and this becomes a larger problem when the cost of missed failures to the business is very high. To help with this problem, sampling techniques such as oversampling of the minority examples are commonly used, along with more sophisticated techniques that are not covered in this notebook.
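
The gap between accuracy and recall under imbalance can be seen with a classifier that always predicts the majority class. This is a minimal sketch on made-up labels (98 `none` records and 2 `comp1` records), not on the notebook's data.

```python
import numpy as np

# Hypothetical imbalanced labels: 98 normal records, 2 failures of comp1
actual = np.array(["none"] * 98 + ["comp1"] * 2)

# A trivial model that always predicts the majority class
predicted = np.array(["none"] * 100)

accuracy = float(np.mean(actual == predicted))
true_positives = np.sum((actual == "comp1") & (predicted == "comp1"))
recall_comp1 = float(true_positives / np.sum(actual == "comp1"))

print(accuracy, recall_comp1)
```

Accuracy looks excellent while every failure is missed, which is why the evaluation later in this section reports recall per class.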

#### Data Imbalance

Visualize the distribution of the failure categories. The "none" class is clearly dominant, so the data are imbalanced.

#### Press ▶ to display the distribution of the target classes.

In [None]:
sns.set_style("darkgrid")
plt.figure(figsize=(8, 4))
labeled_features['failure'].value_counts().plot(kind='bar')
plt.xlabel('Component failing')
plt.ylabel('Count');

#### Baseline Classification Metrics

Also, due to the class imbalance problem, it is important to look at evaluation metrics other than accuracy alone and compare those metrics to the baseline metrics, which are computed when random chance is used to make predictions rather than a machine learning model.  The comparison will better highlight the value and benefits of using a machine learning model.
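
As a small worked example of comparing a model against chance-level baselines, the sketch below computes accuracy, the accuracy expected by chance, Cohen's kappa, and the majority-class accuracy from a confusion matrix. The 2-class matrix `cm` is hypothetical; rows are taken as true classes and columns as predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm = np.array([[90, 5],
               [2, 3]])

total = cm.sum()
accuracy = float(np.trace(cm) / total)

# Class frequencies among the true labels and among the predictions
row_dist = cm.sum(axis=1) / total
col_dist = cm.sum(axis=0) / total

# Agreement expected if predictions were independent of the true labels
expected_accuracy = float(np.sum(row_dist * col_dist))
kappa = (accuracy - expected_accuracy) / (1 - expected_accuracy)

# Accuracy of always predicting the most frequent true class
majority_accuracy = float(row_dist.max())

print(accuracy, expected_accuracy, kappa, majority_accuracy)
```

In this toy case the model's accuracy (0.93) is below the majority-class baseline (0.95), which accuracy alone would not reveal; kappa makes the comparison with chance explicit.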

In the following, we use an evaluation function that computes many important evaluation metrics and baseline metrics for classification problems.

#### Press ▶ to display the confusion matrices for the three different splits and the evaluation results.

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
waiting_widget = widgets.HTML(value="<span style='color: orange;'>🟧 Code Running Please Wait ...</span>")
display(waiting_widget)

from sklearn.metrics import confusion_matrix, recall_score, accuracy_score, precision_score
import itertools

def Evaluate(predicted, actual, labels):
    output_labels = []
    output = []
    
    # Calculate and display confusion matrix
    cm = confusion_matrix(actual, predicted, labels=labels)

    # Plotting the confusion matrix
    plt.figure(figsize=(5, 4))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.colorbar()
    tick_marks = np.arange(len(labels))
    plt.xticks(tick_marks, labels, rotation=45)
    plt.yticks(tick_marks, labels)
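    # confusion_matrix(actual, predicted) puts true labels on rows (y-axis)
    # and predicted labels on columns (x-axis)
    plt.xlabel('Predicted Labels')
    plt.ylabel('True Labels')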

    plt.grid(False)

    # Annotating the confusion matrix
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.show()
    
    #print('Confusion matrix\n- rows (y-axis) are true labels (none, comp1, etc.)\n- columns (x-axis) are predicted labels')
    #print(cm)
    
    # Calculate precision, recall, and F1 score
    accuracy = np.array([float(np.trace(cm)) / np.sum(cm)] * len(labels))
    precision = precision_score(actual, predicted, average=None, labels=labels)
    recall = recall_score(actual, predicted, average=None, labels=labels)
    f1 = 2 * precision * recall / (precision + recall)
    output.extend([accuracy.tolist(), precision.tolist(), recall.tolist(), f1.tolist()])
    output_labels.extend(['accuracy', 'precision', 'recall', 'F1'])
    
    # Calculate the macro versions of these metrics
    output.extend([[np.mean(precision)] * len(labels),
                   [np.mean(recall)] * len(labels),
                   [np.mean(f1)] * len(labels)])
    output_labels.extend(['macro precision', 'macro recall', 'macro F1'])
    
    # Find the one-vs.-all confusion matrix
    cm_row_sums = cm.sum(axis = 1)
    cm_col_sums = cm.sum(axis = 0)
    s = np.zeros((2, 2))
    for i in range(len(labels)):
        v = np.array([[cm[i, i],
                       cm_row_sums[i] - cm[i, i]],
                      [cm_col_sums[i] - cm[i, i],
                       np.sum(cm) + cm[i, i] - (cm_row_sums[i] + cm_col_sums[i])]])
        s += v
    s_row_sums = s.sum(axis = 1)
    
    # Add average accuracy and micro-averaged  precision/recall/F1
    avg_accuracy = [np.trace(s) / np.sum(s)] * len(labels)
    micro_prf = [float(s[0,0]) / s_row_sums[0]] * len(labels)
    output.extend([avg_accuracy, micro_prf])
    output_labels.extend(['average accuracy',
                          'micro-averaged precision/recall/F1'])
    
    # Compute metrics for the majority classifier
    mc_index = np.where(cm_row_sums == np.max(cm_row_sums))[0][0]
    cm_row_dist = cm_row_sums / float(np.sum(cm))
    mc_accuracy = 0 * cm_row_dist; mc_accuracy[mc_index] = cm_row_dist[mc_index]
    mc_recall = 0 * cm_row_dist; mc_recall[mc_index] = 1
    mc_precision = 0 * cm_row_dist
    mc_precision[mc_index] = cm_row_dist[mc_index]
    mc_F1 = 0 * cm_row_dist;
    mc_F1[mc_index] = 2 * mc_precision[mc_index] / (mc_precision[mc_index] + 1)
    output.extend([mc_accuracy.tolist(), mc_recall.tolist(),
                   mc_precision.tolist(), mc_F1.tolist()])
    output_labels.extend(['majority class accuracy', 'majority class recall',
                          'majority class precision', 'majority class F1'])
        
    # Random accuracy and kappa
    cm_col_dist = cm_col_sums / float(np.sum(cm))
    exp_accuracy = np.array([np.sum(cm_row_dist * cm_col_dist)] * len(labels))
    kappa = (accuracy - exp_accuracy) / (1 - exp_accuracy)
    output.extend([exp_accuracy.tolist(), kappa.tolist()])
    output_labels.extend(['expected accuracy', 'kappa'])
    

    # Random guess
    rg_accuracy = np.ones(len(labels)) / float(len(labels))
    rg_precision = cm_row_dist
    rg_recall = np.ones(len(labels)) / float(len(labels))
    rg_F1 = 2 * cm_row_dist / (len(labels) * cm_row_dist + 1)
    output.extend([rg_accuracy.tolist(), rg_precision.tolist(),
                   rg_recall.tolist(), rg_F1.tolist()])
    output_labels.extend(['random guess accuracy', 'random guess precision',
                          'random guess recall', 'random guess F1'])
    
    # Random weighted guess
    rwg_accuracy = np.ones(len(labels)) * sum(cm_row_dist**2)
    rwg_precision = cm_row_dist
    rwg_recall = cm_row_dist
    rwg_F1 = cm_row_dist
    output.extend([rwg_accuracy.tolist(), rwg_precision.tolist(),
                   rwg_recall.tolist(), rwg_F1.tolist()])
    output_labels.extend(['random weighted guess accuracy',
                          'random weighted guess precision',
                          'random weighted guess recall',
                          'random weighted guess F1'])

    output_df = pd.DataFrame(output, columns=labels)
    output_df.index = output_labels
                  
    return output_df

evaluation_results = []
for i, test_result in enumerate(test_results):
    print('\n\033[1mSplit %d:\033[0m' % (i+1))
    evaluation_result = Evaluate(actual = test_result['failure'],
                                 predicted = test_result['predicted_failure'],
                                 labels = ['none', 'comp1', 'comp2', 'comp3', 'comp4'])
    evaluation_results.append(evaluation_result)
display(evaluation_results[0])  # show full results for first split only (display() is needed because this is not the last line)
waiting_widget.value = "<span style='color: green;'>✅ Code Successful</span>"

#### Evaluation Highlights
In predictive maintenance, we are often most concerned with how many of the actual failures were predicted by the model, i.e. the model's recall. (Recall becomes more important as the consequences of *false negatives* -- true failures that the model did not predict -- exceed the consequences of *false positives*, viz. false predictions of impending failure.) Below, we compare the recall rates for each failure type across the three models. The recall rates for all components, as well as for the no-failure class, are above 90%, meaning the model captured more than 90% of the failures of each type.

#### Press ▶ to show the recall values per split per class.

In [None]:
recall_df = pd.DataFrame([evaluation_results[0].loc['recall'].values,
                          evaluation_results[1].loc['recall'].values,
                          evaluation_results[2].loc['recall'].values],
                         columns = ['none', 'comp1', 'comp2', 'comp3', 'comp4'],
                         index = ['recall for first split',
                                  'recall for second split',
                                  'recall for third split'])
recall_df

## 3.5 Example 2: Conceptual Concept for Predictive Maintenance of Aircraft Gas Turbine Engines

### <font color = '#646464'>3.5.1 Problem Formulation</font>
In modern aviation, ensuring the reliability and safety of aircraft gas turbine engines is paramount. Unplanned engine failures can lead to severe operational disruptions, increased maintenance costs, and, most critically, safety risks. Traditional maintenance strategies, such as reactive (run-to-failure) or scheduled preventative maintenance, often result in either unnecessary early component replacements or failures occurring between scheduled inspections. To mitigate these issues, a data-driven predictive maintenance framework is essential — leveraging real-time sensory data to anticipate failures and optimize maintenance schedules.

### <font color = '#646464'>3.5.2 Goal</font>
The goal of this predictive maintenance approach is to shift from time-based or usage-based servicing to a more intelligent, condition-based maintenance (CBM) paradigm. This involves continuously monitoring engine performance through a network of onboard sensors that capture critical parameters such as temperature, pressure, vibration, fuel flow, and rotational speeds. By analyzing historical trends and current sensor readings, degradation patterns can be identified, allowing for early fault detection, Remaining Useful Life (RUL) estimation, and dynamic maintenance planning.

#### Inputs
Multivariate time-series sensor data from the aircraft engine (e.g., temperature, pressure, vibration), as shown in the Table below.

#### Outputs
Predictive insights, including component health status, estimated time to failure, and recommended maintenance action times.

#### Approach
A statistical framework that processes multi-sensory data collected up to the current time and predicts the system's degradation profile and remaining useful life.

<center><img src="Module 3 Content/img/Table1.JPG" alt="Alt text" /></center>

**Goal:** Construct a degradation score and identify a threshold at which condition-based maintenance would be recommended. 

**Approach:** 

1. Construct a fused health index based on the sensory data that can separate the health states of an engine. The figure below shows an ideal health index that clearly separates the failure state (0), the state just before failure (1), and so on. The figure shows the health index for multiple engines, and the variation between engines at a given state is negligible compared to the gap between two states. In other words, the 95% CI of the distribution (in black) at each state is much narrower than the gap between the solid lines of neighboring states.

<center><img src="Module 3 Content/img/Health_state.JPG" alt="Alt text" /></center>

The optimization function used to find the best possible health index from historical engines that ran until failure in a controlled environment is shown below:

<center><img src="Module 3 Content/img/Optimization_function.JPG" alt="Alt text" /></center>

Here, deep learning calculates the health index as a fused signal of the multiple sensors.

<center><img src="Module 3 Content/img/model.JPG" alt="Alt text" /></center>

The optimization function aims to separate any two consecutive states from one side, thus resulting in a monotonic health index where the statistical distance between consecutive states is maximized. The optimization objective function is inspired by the series of hypothesis tests shown below.

<center><img src="Module 3 Content/img/hypothesis.JPG" alt="Alt text" /></center>

The figure below shows an example outcome from a practical scenario in real time. The fused signal is less noisy and easier to forecast.

<center><img src="Module 3 Content/img/Fused_signal.JPG" alt="Alt text" /></center>


2. Find the statistical distribution of the health index values at all health states, including State 0 (failure), State 1, ..., and normal operating state. The figure below shows the distributions to be estimated in green boxes. Each state will have its own distribution.

<center><img src="Module 3 Content/img/Distributions.JPG" alt="Alt text" /></center>

3. In real time, the health state can be estimated by finding the maximum likelihood state that fits the observed multiple sensor data. The governing equations to find the maximum-likelihood estimate (MLE) for the state are:

<center><img src="Module 3 Content/img/MLE_1.JPG" alt="Alt text" /></center>

Using Bayesian statistics, we know that the likelihood that a calculated health-index measurement belongs to the distribution of health-index values from state p is proportional to the probability that the health state is "p" given the calculated health index. Here, h_p represents the distribution for state p, and h_i,t represents the real-time calculated health-index value for engine i at time t.

Then, the MLE health-state is the answer for the following optimization problem:

<center><img src="Module 3 Content/img/MLE2.JPG" alt="Alt text" /></center>

Without diving into the detailed derivations, this simplifies to

<center><img src="Module 3 Content/img/MLE3.JPG" alt="Alt text" /></center>

Here, we assume that the calculated health index is not a single static number but a distribution that is more robust to noisy sensory data. This is why the health index is smoother than the individual sensor data, as was shown in one of the previous figures. Specifically, the distribution for the calculated health-index follows:

<center><img src="Module 3 Content/img/h_i_distr.JPG" alt="Alt text" /></center>
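
The maximum-likelihood state selection of step 3 can be sketched in a simplified form. This is an illustration under strong assumptions, not the equations in the figures above: each state's health-index distribution is taken to be Gaussian with hypothetical parameters in `state_params`, and the calculated health index is treated as a single number rather than a distribution.

```python
import numpy as np

# Hypothetical (mean, std) of the health index for state 0 (failure), state 1, and state 2
state_params = {0: (0.1, 0.05), 1: (0.4, 0.05), 2: (0.8, 0.05)}

def gaussian_log_likelihood(h, mean, std):
    """Log-density of a normal distribution evaluated at h."""
    return -0.5 * np.log(2 * np.pi * std ** 2) - (h - mean) ** 2 / (2 * std ** 2)

def mle_state(h):
    """Return the state whose assumed distribution makes the observed index most likely."""
    return max(state_params, key=lambda p: gaussian_log_likelihood(h, *state_params[p]))

for h in (0.12, 0.45, 0.70):
    print(h, mle_state(h))
```

With equal standard deviations this reduces to picking the state whose mean is closest to the observed index; unequal spreads would shift the decision boundaries.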

4. Construct a normalized degradation score (DS) between 0 and 1 that will later be used to identify a trigger for recommending predictive maintenance. One potential estimate of the DS is:

<center><img src="Module 3 Content/img/DS_eq.JPG" alt="Alt text" /></center>

Here, N is the number of health states.

The figure shows an example engine around 100 cycles before it fails one requirement for its intended function. To produce this figure, the health index is forecasted using a machine learning algorithm; with every forecast, we find the MLE state, and we continue until the MLE state is the failure state.

5. Forecast the health index and the degradation score. Specifically, the health index is forecasted using a machine learning algorithm; with every forecast, we find the MLE state and then compute the DS. Once the MLE state is the failure state, the engine is assumed to have reached failure at the forecasted time point.

The figure below shows an example of forecasting the DS and finding the failure cycle. The predicted failure cycle is 268, whereas the true failure cycle was 251. In other words, the predicted DS at the true failure cycle is 0.88, while the predicted DS at the predicted failure is always 1 by definition.

<center><img src="Module 3 Content/img/Result.JPG" alt="Alt text" /></center>

6. Find a threshold such that condition-based predictive maintenance is requested when the DS surpasses it. This threshold is estimated from historical engines and the predictive performance of the DS. The figure below shows the predicted DS at true failure for all engines; a threshold of 0.7 is sufficient to request maintenance, with only one anomalous point below 0.7.

<center><img src="Module 3 Content/img/Predictive_maintenance.JPG" alt="Alt text" /></center>
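
The threshold rule of step 6 can be sketched as a lookup of the first forecast cycle at which the DS exceeds the threshold. The function name `first_alarm_cycle` and the forecast values in `ds_forecast` are hypothetical; only the 0.7 threshold comes from the text above.

```python
import numpy as np

def first_alarm_cycle(ds, threshold=0.7):
    """Return the index of the first cycle whose degradation score exceeds the threshold,
    or None if the threshold is never exceeded."""
    above = np.flatnonzero(np.asarray(ds) > threshold)
    return int(above[0]) if above.size else None

# Hypothetical forecasted degradation scores, one value per cycle
ds_forecast = [0.10, 0.30, 0.50, 0.72, 0.90, 1.00]

print(first_alarm_cycle(ds_forecast))
print(first_alarm_cycle([0.1, 0.2, 0.3]))
```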

**Conclusion**: This concludes the predictive maintenance example. A digital thread that orchestrates access to the sensor data for the different engine components enables effective predictive maintenance strategies and helps avoid unexpected failures.

---

## 3.6 Cybersecurity & Data Integrity in MBE

### <font color = '#646464'>3.6.1 Securing the MBE Ecosystem</font>
**Goal:** To ensure that the Model-Based Enterprise (MBE) ecosystem is secure from external threats, unauthorized access, and tampering, while maintaining the integrity of the design, manufacturing, and sustainment processes.

- **Implementing Zero Trust Architecture (ZTA) in Digital Engineering Workflows:** **Zero Trust Architecture (ZTA)** is an advanced security framework that assumes no entity, inside or outside the organization, can be trusted by default. ZTA requires verification and authorization for every user, device, and application attempting to access resources, ensuring that security is enforced at every level.
  
    - **Access Control Based on Identity and Context:** In the context of digital engineering workflows, Zero Trust can be implemented by enforcing strict access controls for users, devices, and systems at all stages of the product lifecycle. This includes designing, manufacturing, and sustaining digital assets such as CAD models, simulation data, and digital twins.
    
    - **Continuous Monitoring and Authentication:** ZTA continuously monitors all activities across the MBE ecosystem so that unusual or suspicious behavior is detected and addressed in real time. For each request, this can include checking that the user has valid credentials, that the device is properly authenticated, and that the request complies with the established security policies.
    
    - **Granular Access to Data & Systems:** In a Zero Trust environment, each step of the workflow, from design to manufacturing to maintenance, would have strict, role-based controls. These controls could restrict access to sensitive data such as intellectual property (IP), proprietary algorithms, and sensitive customer information, ensuring that only authorized personnel can interact with specific parts of the system.

- **Blockchain for Tamper-Proof Design & Manufacturing Records:** Blockchain technology can provide a secure, transparent, and immutable record of all activities within the MBE ecosystem. Each transaction (or change) in the product lifecycle is recorded in a decentralized ledger, making it impossible to alter or delete past records without detection.
  
    - **Tamper-Proof Records:** Blockchain provides tamper-evident records by creating a chain of blocks that securely store design, manufacturing, and testing data. As each step is completed in the MBE process (e.g., a new design iteration, a change in manufacturing specifications, or a product test), a cryptographic hash is generated and recorded in the blockchain. Because the hash of each block depends on the block before it, any retroactive alteration invalidates every later hash and is therefore detectable, which protects the integrity of the data.
    
    - **Traceability of Design Changes:** Blockchain’s transparency makes it easy to track and verify design changes throughout the product lifecycle. In industries like aerospace, automotive, or defense, where safety and reliability are paramount, this feature ensures that every change is traceable, providing an audit trail that can be used for verification and compliance purposes.
    
    - **Smart Contracts for Manufacturing and Supply Chain:** **Smart contracts** (self-executing contracts with the terms of the agreement directly written into code) can be used to automate and enforce agreements between parties in the manufacturing and supply chain processes. These contracts ensure that only authorized changes or updates can occur, further enhancing the security and efficiency of the ecosystem.
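
The hash-chaining idea behind tamper-proof records can be illustrated with a minimal sketch. It is a single-process hash chain, not a decentralized blockchain: there is no network, consensus, or smart-contract layer. The function names and the sample records (`bracket-A` and its events) are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the preceding block."""
    payload = json.dumps({"record": record, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> None:
    """Append a record, linking it to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": record_hash(record, prev_hash),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != record_hash(block["record"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical lifecycle events for one part.
ledger = []
append_record(ledger, {"event": "design_iteration", "part": "bracket-A", "rev": 1})
append_record(ledger, {"event": "manufacturing_spec_change", "part": "bracket-A", "rev": 2})
append_record(ledger, {"event": "product_test", "part": "bracket-A", "result": "pass"})

print(verify_chain(ledger))        # True
ledger[1]["record"]["rev"] = 99    # simulate tampering with a past record
print(verify_chain(ledger))        # False
```

The sketch only detects tampering. Making alteration impractical additionally requires the ledger to be replicated across independent parties, which is what a distributed blockchain adds.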

---

### <font color = '#646464'>3.6.2 Digital Rights Management (DRM) & Access Control</font>
**Goal:** To safeguard intellectual property (IP) and sensitive data in a collaborative MBE environment, ensuring that only authorized individuals can access or modify designs and engineering information.

- **Ensuring Intellectual Property Protection in Collaborative Environments:** The collaborative nature of MBE, where multiple stakeholders across different organizations or departments interact with digital models and data, makes intellectual property protection a critical issue. MBE involves sharing sensitive designs and data between design teams, manufacturers, suppliers, and maintenance providers, all of whom need access to specific parts of the product lifecycle. 
   
    - **Watermarking and Encryption:** **Digital watermarking** can be applied to CAD models, simulation data, and other digital assets to uniquely identify their origin and prevent unauthorized copying or distribution. Additionally, **encryption** methods can be used to secure sensitive design data, making it unreadable to unauthorized parties.
    
    - **Control over Distribution and Use of IP:** To prevent unauthorized sharing or use of sensitive designs, DRM tools can enforce restrictions on the copying, printing, or distributing of intellectual property. For instance, encrypted CAD files might require special software or keys to open and work with, ensuring that only authorized personnel can interact with the design.
    
    - **Tracking and Auditing IP Usage:** DRM tools also allow for the monitoring and tracking of how intellectual property is accessed and used. This means that every access event can be logged and reviewed, providing a clear audit trail of who viewed, modified, or shared the data and under what circumstances.

- **Role-Based Access Control for PLM & MBE Tools:** **Role-Based Access Control (RBAC)** is a security paradigm used to restrict access to resources based on the roles of individual users within an organization. By using RBAC, MBE tools and PLM systems can be tailored to ensure that users only have access to the data and functionality necessary for their roles, improving security and protecting sensitive information.
  
    - **Granular Role Assignments:** In MBE, different teams (e.g., designers, engineers, manufacturers, suppliers) have varying levels of access to data and tools. RBAC ensures that each user has access to the appropriate resources based on their role. For instance, a manufacturing engineer may have access to manufacturing specifications but not to sensitive design data, while a design engineer would have access to the CAD model but not to production scheduling.
    
    - **Separation of Duties:** RBAC also helps enforce the principle of separation of duties, ensuring that no individual has control over the entire product lifecycle or the ability to make changes to critical data without oversight. This can prevent fraud or errors and ensure that critical tasks (e.g., approval processes) are carried out by the appropriate personnel.
    
    - **Access Auditing and Monitoring:** For added security, RBAC systems allow for logging and auditing of all access events. This enables administrators to monitor who is accessing sensitive data or systems and when, and to ensure that users are adhering to established policies. If an anomaly is detected, immediate actions can be taken, such as revoking access or conducting an investigation.
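
The three RBAC ideas above (role-scoped permissions, separation of duties, and access auditing) can be sketched together. This is an illustrative, in-memory example under stated assumptions: the role names, permission strings, and user names are hypothetical and do not correspond to any particular PLM product's API.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "design_engineer": {"cad_model:read", "cad_model:write"},
    "manufacturing_engineer": {"manufacturing_spec:read", "manufacturing_spec:write"},
    "approver": {"cad_model:read", "change:approve"},
}

audit_log = []  # every access decision is recorded here


def is_allowed(user: str, role: str, permission: str) -> bool:
    """Check a permission against the user's role and log the decision."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "permission": permission,
        "allowed": allowed,
    })
    return allowed


def can_approve(approver: str, approver_role: str, change_author: str) -> bool:
    """Separation of duties: the author of a change may not approve it."""
    return (is_allowed(approver, approver_role, "change:approve")
            and approver != change_author)


print(is_allowed("alice", "design_engineer", "cad_model:write"))      # True
print(is_allowed("bob", "manufacturing_engineer", "cad_model:read"))  # False
print(can_approve("carol", "approver", change_author="alice"))        # True
print(can_approve("carol", "approver", change_author="carol"))        # False
```

Denied requests are logged as well as granted ones, so the audit trail can reveal repeated attempts to reach data outside a role's scope.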

---

## 3.7 Large Scale Real-World Applications

### <font color = '#646464'>3.7.1 MBE Implementation in Aerospace & Defense</font>
**Goal:** To explore how MBE and its associated methodologies are applied in highly complex industries like aerospace and defense, where precision, collaboration, and innovation are critical.

- **Lockheed Martin’s MBE Approach for F-35 Production:** Lockheed Martin has revolutionized the way military aircraft like the F-35 Lightning II are designed, built, and sustained by implementing Model-Based Engineering (MBE). In this case, MBE plays a critical role throughout the entire lifecycle of the aircraft, from design to manufacturing and sustainment.
  
    - **End-to-End Integration:** Lockheed Martin integrated MBE across all stages of the aircraft's lifecycle. From initial conceptual design, detailed modeling, and simulation, to production and sustainment, digital models are used as the **single source of truth**. These models ensure that all stakeholders, including designers, engineers, suppliers, and maintenance teams, have access to accurate and up-to-date information.
    
    - **Digital Thread:** The Digital Thread concept ties together all aspects of the aircraft’s lifecycle, enabling seamless communication and information exchange across different systems. This ensures that changes made during any phase of the lifecycle are reflected across the entire ecosystem, leading to better collaboration, fewer errors, and increased operational efficiency.
    
    - **Reduced Development Time and Costs:** By utilizing MBE, Lockheed Martin has been able to reduce development time and costs for the F-35 program. Digital models have been used to simulate manufacturing processes, perform virtual testing, and predict potential issues, which has led to a more streamlined and efficient production process.

<center><img src="https://defense.info/wp-content/uploads/2018/11/Screen-Shot-2018-11-07-at-4.54.18-AM-1024x569.png" alt="Alt text" /></center>

---

## 3.8 Automotive & Manufacturing Applications
**Goal:** To showcase how MBE is being adopted in automotive and manufacturing industries to optimize design, improve production efficiency, and enable real-time decision-making.

- **MBE-Driven Smart Manufacturing at BMW & Tesla:** Both BMW and Tesla are leaders in the adoption of Smart Manufacturing driven by MBE principles, enabling faster innovation, more efficient production, and enhanced product quality. In these companies, MBE technologies like Digital Twins, predictive analytics, and real-time data monitoring are transforming the way vehicles are designed, produced, and maintained.

    - **BMW’s Use of Digital Twin and Smart Factory:** BMW has integrated Digital Twin technology into its smart factories, where virtual representations of physical production lines and vehicles are created. These digital replicas are continuously updated with real-time data from the physical world, allowing BMW to monitor and optimize the production process. Any deviation from the expected conditions (such as equipment malfunctions or production delays) can be detected and addressed proactively.
    
    - **Tesla’s Agile Manufacturing with MBE:** Tesla’s production lines are renowned for their agility, and MBE plays a crucial role in making this possible. Tesla uses virtual prototypes and simulation models to rapidly iterate on vehicle designs and manufacturing processes. Additionally, Tesla’s real-time feedback loops ensure that production data is continuously fed back into the system to improve future iterations of the design and manufacturing process.
    
    - **Benefits:**
      - Enhanced flexibility and faster time-to-market.
      - Greater accuracy and precision in production.
      - Reduced downtime due to predictive maintenance and real-time monitoring.

<div style="text-align: center;">
    <a href="https://www.youtube.com/watch?v=g78YHYXXils" target="_blank">
        <img src="https://img.youtube.com/vi/g78YHYXXils/0.jpg" alt="Video Thumbnail" width="560" height="315" />
    </a>
</div>


- **AI-Powered Predictive Quality in Production Lines:** In both the automotive and broader manufacturing sectors, AI is playing a major role in ensuring high quality while minimizing waste. By integrating AI algorithms into production lines, manufacturers can predict potential quality issues before they occur, ensuring products meet stringent standards and reducing the need for rework.

    - **Predictive Analytics for Quality Control:** AI-powered systems are being used to monitor key parameters like temperature, pressure, and material quality in real time during production. For example, BMW uses AI to analyze sensor data from its production lines and predict potential defects before they affect the product. This helps reduce waste, improve product consistency, and increase throughput.
    
    - **Real-Time Adjustments in Production:** AI also allows manufacturers to make real-time adjustments to the production process. For example, if the system detects that a production line is deviating from optimal conditions, it can automatically adjust machine settings, material inputs, or other parameters to ensure the final product remains within quality standards.
    
    - **Automotive Industry Applications:** For companies like Tesla, AI-driven quality control is integrated directly into the manufacturing process, ensuring that vehicles are free of defects before they are shipped to customers. AI systems are used to inspect the paint quality, alignment, and performance of individual vehicle components during the production process, ensuring a higher level of precision and minimizing human error.
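
The monitoring loop described above can be approximated with a much simpler stand-in. The sketch below is not an AI model: it uses classical statistical control limits (mean ± 3 standard deviations of an in-control baseline) to flag readings that deviate from expected conditions. The temperature values and function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (low, high) limits as mean -/+ k sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    low, high = limits
    return [i for i, x in enumerate(readings) if x < low or x > high]


# Hypothetical process temperatures (deg C) recorded while the line ran normally.
baseline_temp_c = [200.1, 199.8, 200.3, 200.0, 199.9, 200.2, 199.7, 200.0]
limits = control_limits(baseline_temp_c)

# New readings from the line; the fourth one drifts well outside the baseline.
new_readings = [200.1, 199.9, 200.4, 203.5, 200.0]
print(out_of_control(new_readings, limits))  # prints [3]
```

A learned model would replace the fixed limits with predictions that account for several parameters at once, but the surrounding logic of comparing live data against expected behavior and acting on deviations stays the same.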

---


## 3.9 **Conclusion & Next Steps**

The journey toward mastering **Model-Based Enterprise (MBE)** is both exciting and challenging, as it requires the integration of cutting-edge technologies, frameworks, and methodologies. As industries continue to evolve, embracing MBE not only optimizes processes but also positions organizations for **sustainable innovation** and **competitive advantage**.

### <font color = '#646464'>3.9.1 Mastering MBE Requires Continuous Learning and Adaptation to New Technologies</font>

- **Adapting to Technological Advancements:** As digital transformation accelerates, MBE methodologies must evolve to leverage the latest tools and innovations. Mastering MBE is not a one-time achievement; it requires a mindset of continuous improvement and an openness to adopting emerging technologies such as AI, Machine Learning, Blockchain, and Digital Twins. Organizations must invest in ongoing training and knowledge sharing to keep pace with rapid technological advancements and ensure their teams are well-equipped to implement MBE effectively.

- **Integration with Emerging Technologies:** The ability to integrate MBE with technologies such as Industry 4.0, IoT, and AI is crucial for staying competitive. For example, Digital Twins and AI-driven automation are enhancing the capabilities of MBE, improving everything from real-time product monitoring to predictive maintenance. As manufacturing and design environments become increasingly interconnected, staying current with these technologies will be key to reaping the full benefits of MBE.

- **Continuous Professional Development:** The complexity of MBE means that professionals must stay up-to-date with new software tools, standards, and techniques. Many professionals will seek advanced certifications and training in MBSE, PLM, IoT, and other related fields to remain competitive in the industry.

---

### <font color = '#646464'>3.9.2 Industry 4.0 and AI Will Further Enhance MBE Capabilities in the Coming Years</font>

- **The Role of Industry 4.0:** The future of MBE is closely tied to the evolution of Industry 4.0, which introduces smart factories, automation, and connected systems. This transformation allows for real-time data exchange, increased collaboration across teams, and more efficient decision-making. Industry 4.0 promises to enhance MBE by introducing innovations like AI-powered manufacturing, real-time analytics, and the Internet of Things (IoT), which will continuously improve design, manufacturing, and operational processes.

- **AI & Machine Learning’s Impact:** The application of AI and Machine Learning (ML) to MBE will continue to reshape how companies develop products. AI can automate many of the repetitive tasks traditionally handled manually, such as validating design rules, performing simulations, and optimizing production schedules. Additionally, predictive analytics driven by AI will provide real-time insights into the performance and health of products, enabling faster decision-making and improving lifecycle management. The ability of AI to handle large datasets and perform complex analyses will significantly enhance the effectiveness of MBE.

- **Autonomous Systems and AI Integration:** In the coming years, AI will likely integrate even more deeply into MBE processes, allowing for fully autonomous design optimization, manufacturing, and maintenance schedules. As AI systems learn from past data, they will be able to identify hidden patterns and optimize processes that would otherwise take human teams much longer to detect. This will lead to increased efficiency, lower costs, and improved product quality across various sectors.


### <center>[◀︎ Module 2](Module2.ipynb)     [🏠 Home](../../welcomePage.ipynb)     [Module 4 ▶︎](Module4.ipynb)</center>