# **AI in Pipeline Engineering**

# Summary

This notebook centers on predicting the maximum depth of anomalies in In-Line Inspection (ILI) data. Using machine learning techniques, it aims to fill in missing values and forecast the future growth of anomalies. Accurately estimating anomaly depth is critical for assessing pipeline strength and ensuring safety. The workflow covers data exploration, cleaning, feature engineering, anomaly mapping, and modeling. These steps support pipeline integrity management by enabling proactive maintenance and risk mitigation.

The ILI data for this study is publicly available from the [Mendeley Data repository](https://data.mendeley.com/datasets/c2h2jf5c54/1). The dataset, titled "Dataset for: Cross-country Pipeline Inspection Data Analysis and Testing of Probabilistic Degradation Models", was published on October 4, 2021, by Rioshar Yarveisy, Faisal Khan, and Rouzbeh Abbassi from Memorial University of Newfoundland and Macquarie University. The dataset includes four consecutive ILI data sets, which lack certain details such as coordinates, likely due to anonymization efforts.

# 1. Introduction

Pipeline integrity management is crucial in ensuring the safety and reliability of gas and oil transportation. In-line inspection (ILI) tools are extensively used to detect and measure anomalies in pipelines. Accurately predicting the maximum depth of these anomalies is essential for proactive maintenance and risk mitigation. This notebook demonstrates a comprehensive workflow, from data loading and cleaning to advanced machine learning modeling, aimed at predicting anomaly depths effectively. Key steps in the process include:

**Data Exploration and Cleaning**: This involves exploratory data analysis (EDA) to understand the data distribution and identify patterns, handling duplicate records, and managing missing values.

**Feature Engineering**: We compute new features such as aspect ratio and area of anomalies, estimate the maximum depth using domain-specific calculations, and create cyclic features from angular measurements.

**Anomaly Mapping**: We match anomalies across different inspection years to track their growth and changes over time. Corresponding anomalies are identified by comparing their relative distances and orientations against fixed thresholds.

**Modeling**: We employ machine learning models, particularly scikit-learn's `HistGradientBoostingRegressor`, to predict the maximum depth of anomalies. This includes data preparation, model training, hyperparameter tuning, and evaluation.

**Prediction and Validation**: The predicted values are validated against actual measurements to ensure accuracy. We also compare the machine learning predictions with domain-specific estimates to highlight the added value of advanced modeling techniques.


# 2. Setup

**Import Dependencies**

In [1]:
import warnings
warnings.filterwarnings('ignore')
import os
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import importlib
from src import tools
importlib.reload(tools)

<module 'src.tools' from 'c:\\Users\\Farhad.Davaripour\\Repositories\\AI_Applications_in_Pipeline_Engineering\\src\\tools.py'>

In [2]:
from openai import OpenAI
from dotenv import load_dotenv
_ = load_dotenv(override=True)
client = OpenAI()

In [None]:
def ask_me(user_query):
    """Send a single user query to the chat model and return the text reply."""
    messages = [
        {"role": "system", "content": "You are an AI assistant that answers the user's query."},
        {"role": "user", "content": user_query},
    ]
    completion = client.chat.completions.create(
        model='gpt-4o-mini',
        temperature=0,
        messages=messages,
    )
    return completion.choices[0].message.content

user_query = 'Provide a concise description of the International Pipeline Conference (IPC) in Calgary. Keep it under 20 words.'
print(ask_me(user_query))


**Loading the ILI Data**

In [5]:
# Load the data locally - Set the directory to save and read the data
# save_path = "Dataset/processed_data/"
# Anomalies_df  = pd.read_parquet(os.path.join(save_path, f'Anomoly_processed.parquet'))

In [75]:
# URL of the raw Parquet file from GitHub
url = 'https://github.com/Farhad-Davaripour/AI_Applications_in_Pipeline_Engineering/raw/main/Dataset/processed_data/Anomoly_processed.parquet'

# Use pandas with fsspec to read the Parquet file directly
Anomalies_df = pd.read_parquet(url, engine='pyarrow', storage_options={"anon": True})

**Rename Columns**

In [6]:
from src.tools import rename_anomaly_columns

# Rename the columns to make them more readable
Anomalies_df = rename_anomaly_columns(Anomalies_df)

In [None]:
for i, col in enumerate(Anomalies_df.columns, start=1):
    print(f"column #{i}: {col}")

**Fix Data Types**

In [None]:
Anomalies_df.dtypes

In [9]:
Anomalies_df['InspectionYear'] = Anomalies_df['InspectionYear'].astype(int)
Anomalies_df['GirthWeldNumber'] = Anomalies_df['GirthWeldNumber'].astype(int)
Anomalies_df['WallThickness_mm'] = Anomalies_df['WallThickness_mm'].astype(float)

In [None]:
Anomalies_df.dtypes

# 3. EDA

In [11]:
from src.tools import EDA

**Max Depth (mm)**

In [12]:
# Create the EDA object
eda = EDA(Anomalies_df)

In [None]:
# plot the histogram of the max depth
eda.plot_histogram_max_depth('MaxDepth_mm')

In [None]:
# Summary statistics
Anomalies_df['MaxDepth_mm'].describe()

In [None]:
# Calculate the percentiles and IQR
eda.calculate_percentiles_and_iqr('MaxDepth_mm')
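The internals of `eda.calculate_percentiles_and_iqr` are not shown in this notebook. As a minimal sketch, assuming it reports the quartiles and the usual 1.5 x IQR fences, the same quantities can be computed with pandas alone. The helper name `percentiles_and_iqr` and the sample depths are hypothetical.

```python
import pandas as pd


def percentiles_and_iqr(series: pd.Series) -> dict:
    """Return quartiles, IQR, and 1.5 x IQR fences for a numeric series."""
    q1 = series.quantile(0.25)
    median = series.quantile(0.50)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": iqr,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Hypothetical depths in mm, for illustration only
depths = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
stats = percentiles_and_iqr(depths)
print(stats)
```

Values outside the two fences are the points a boxplot draws as individual outliers.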

In [None]:
# Plot the boxplot
eda.plot_boxplot_max_depth('MaxDepth_mm')

**Linear Correlation**

In [None]:
eda.plot_correlation_matrix()

# 4. Data Preprocessing

## 4.1 Duplicate Values

In [None]:
# Check for duplicate rows
duplicates = Anomalies_df.duplicated(keep=False)

# Display the duplicate rows
duplicate_rows = Anomalies_df[duplicates]

# Print the duplicate rows
print("Duplicate rows in the dataframe:")
print(duplicate_rows)

## 4.2 Missing Values

In [19]:
from src.tools import MissingValuesAnalyzer

# Create the MissingValuesAnalyzer object
MissingValuesAnalyzer = MissingValuesAnalyzer(Anomalies_df)

**Identify Features w/ Missing Values**

In [None]:
# Find columns with missing values
MissingValuesAnalyzer.find_missing_values()

### 4.2.1 End Point Distance                 

In [21]:
# Apply the calculation only if the 'EndPointDistance_m' column has NaN values
Anomalies_df['EndPointDistance_m'] = np.where(
    Anomalies_df['EndPointDistance_m'].isna(),
    Anomalies_df['StartPointDistance_m'] + Anomalies_df['FeatureLength_mm'] / 1000,
    Anomalies_df['EndPointDistance_m']
)

In [None]:
# Find the remaining columns with missing values
MissingValuesAnalyzer.find_missing_values()

### 4.2.2 Seam Orientation             

#### 4.2.2.1 Handle Joints with Inconsistent Seam Orientation

In [None]:
# Handle joints with inconsistent seam orientation and print only the last joint
MissingValuesAnalyzer.check_inconsistent_seam_orientation()

In [24]:
# Handle joints with inconsistent seam orientation
Anomalies_df = MissingValuesAnalyzer.handle_inconsistent_seam_orientation()

In [None]:
# Find and report the inconsistent joints
MissingValuesAnalyzer.find_and_report_inconsistent_joints()

#### 4.2.2.2 Handle Joints with Missing Values

**Imputation using Mean**

In [None]:
# Fill the missing seam orientation values with the per-joint average. Since each joint has a single seam orientation, the average equals the original value.
Anomalies_df, filled = MissingValuesAnalyzer.fill_missing_seam_orientation_w_average()
print(f"number of filled values: {filled}")
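The method `fill_missing_seam_orientation_w_average` is defined in `src/tools.py` and is not shown here. A minimal sketch of the idea, assuming joints are identified by `GirthWeldNumber` and using made-up records, is a per-joint mean followed by `fillna`:

```python
import numpy as np
import pandas as pd

# Hypothetical records: joint 1 is missing one seam orientation value,
# joint 2 has no seam orientation in any inspection year.
df = pd.DataFrame({
    "GirthWeldNumber": [1, 1, 1, 2, 2],
    "InspectionYear": [0, 4, 7, 0, 4],
    "SeamOrientation_deg": [90.0, np.nan, 90.0, np.nan, np.nan],
})

# Mean seam orientation per joint (NaN values are skipped by the mean)
joint_mean = df.groupby("GirthWeldNumber")["SeamOrientation_deg"].transform("mean")

missing_before = df["SeamOrientation_deg"].isna().sum()
df["SeamOrientation_deg"] = df["SeamOrientation_deg"].fillna(joint_mean)
filled = missing_before - df["SeamOrientation_deg"].isna().sum()

print(f"number of filled values: {filled}")
print(df)
```

Joints with no recorded seam orientation at all keep their NaN values, because the mean of an all-NaN group is itself NaN.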

In [None]:
# Find columns with missing values. The remaining missing values occur in joints with no seam orientation in any inspection year. For those, we can forward-fill from the previous joint.
MissingValuesAnalyzer.find_missing_values()

**Imputation using Forward Fill**

In [28]:
# Fill the missing seam orientation values with the previous value
AnomaliesProc =  MissingValuesAnalyzer.fill_missing_seam_orientation_w_ffill()

In [None]:
# Find columns with missing values
MissingValuesAnalyzer.find_missing_values()
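For reference, forward filling can be sketched with pandas as below. This assumes rows are ordered by `GirthWeldNumber` so that "previous" means the preceding joint; the actual ordering used inside `fill_missing_seam_orientation_w_ffill` is not shown in this notebook, and the records are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical records: joint 20 has no seam orientation at all
df = pd.DataFrame({
    "GirthWeldNumber": [30, 10, 20, 30],
    "SeamOrientation_deg": [270.0, 45.0, np.nan, 270.0],
})

# Order by joint so each gap is filled from the preceding joint
df = df.sort_values("GirthWeldNumber", kind="stable").reset_index(drop=True)
df["SeamOrientation_deg"] = df["SeamOrientation_deg"].ffill()
print(df)
```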

## 4.3 Outlier Removal

In [30]:
from src.tools import HandlingOutlier

# Columns to check for outliers
handling_outlier_columns = ['MaxDepth_mm', 'FeatureWidth_mm', 'FeatureLength_mm', 'InspectionYear']

# Create an instance of the HandlingOutlier class
outlier_handler = HandlingOutlier(AnomaliesProc)

# Remove outliers using Z-score method
Anomalies_OutliersAdjusted_df = outlier_handler.remove_outliers_zscore(handling_outlier_columns)

# Remove outliers using Isolation Forest method
Anomalies_OutliersAdjusted_df = outlier_handler.remove_outliers_isolation_forest(handling_outlier_columns)
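`HandlingOutlier` is defined in `src/tools.py`, so its thresholds and whether the two calls above chain internally are not visible here. As a rough sketch of the Z-score step only, assuming the common rule of dropping rows with |z| > 3 in any listed column, the filter could look like this. The function name `drop_zscore_outliers` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_zscore_outliers(df: pd.DataFrame, columns: list, threshold: float = 3.0) -> pd.DataFrame:
    """Drop rows whose absolute z-score exceeds `threshold` in any listed column."""
    values = df[columns]
    z = (values - values.mean()) / values.std(ddof=0)
    keep = (z.abs() <= threshold).all(axis=1)
    return df[keep]


# Synthetic depths in mm with one injected outlier, for illustration only
rng = np.random.default_rng(0)
sample = pd.DataFrame({"MaxDepth_mm": rng.normal(2.0, 0.3, size=200)})
sample.loc[0, "MaxDepth_mm"] = 25.0

cleaned = drop_zscore_outliers(sample, ["MaxDepth_mm"])
print(len(sample), "rows before,", len(cleaned), "rows after")
```

Isolation Forest, the second method used in the cell, is available in scikit-learn as `sklearn.ensemble.IsolationForest`.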

In [None]:
# Create the EDA object
eda = EDA(Anomalies_OutliersAdjusted_df)

# Plot the boxplot
eda.plot_boxplot_max_depth('MaxDepth_mm')

# 5. Feature Engineering

**Predicting Anomaly Depth: A Machine Learning Approach**

This exercise aims to predict the maximum depth of anomalies for educational purposes. The applications of this prediction include filling in missing data and forecasting the future growth of anomalies, particularly the maximum depth.

## 5.1 Setup

In [32]:
from sklearn.model_selection import train_test_split
# HistGradientBoostingRegressor has been stable since scikit-learn 1.0, so the
# sklearn.experimental.enable_hist_gradient_boosting import is no longer needed.
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler
from scipy.stats import zscore
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from skopt.callbacks import DeltaYStopper

In [33]:
# Make a copy of the DataFrame
Anomalies_EngineeringFeatures_df = Anomalies_OutliersAdjusted_df.copy()

## 5.2 Anomaly Mapping

**Setup**

In [34]:
from src.tools import Anomaly_mapping, plot_anomalies_by_year

# Define the parameters
increment_size = 1000
relative_distance_threshold = 0.1  # meters
orientation_threshold = 10  # degrees
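The `Anomaly_mapping` class itself lives in `src/tools.py` and is not reproduced in this notebook. The following is only a sketch of how the two thresholds might be applied, assuming a pair of records on the same joint is matched when both the relative distance and the circular orientation difference fall within the thresholds. The function names and sample records are hypothetical.

```python
def angular_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def is_same_anomaly(
    prev: dict,
    curr: dict,
    relative_distance_threshold: float = 0.1,  # meters
    orientation_threshold: float = 10.0,  # degrees
) -> bool:
    """Return True if two records on the same joint plausibly describe one anomaly."""
    if prev["GirthWeldNumber"] != curr["GirthWeldNumber"]:
        return False
    distance_ok = (
        abs(prev["RelativeDistance_m"] - curr["RelativeDistance_m"])
        <= relative_distance_threshold
    )
    orientation_ok = (
        angular_difference_deg(
            prev["SignificantPointOrientation_deg"],
            curr["SignificantPointOrientation_deg"],
        )
        <= orientation_threshold
    )
    return distance_ok and orientation_ok


# Made-up records from two inspections of the same joint
earlier = {"GirthWeldNumber": 14, "RelativeDistance_m": 3.20, "SignificantPointOrientation_deg": 356.0}
later = {"GirthWeldNumber": 14, "RelativeDistance_m": 3.25, "SignificantPointOrientation_deg": 2.0}
far = {"GirthWeldNumber": 14, "RelativeDistance_m": 5.00, "SignificantPointOrientation_deg": 2.0}

print(is_same_anomaly(earlier, later))
print(is_same_anomaly(earlier, far))
```

The orientation difference is taken on the circle, so 356 degrees and 2 degrees are treated as 6 degrees apart rather than 354.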

**Running the Mapping Pipeline**

In [35]:
# Create an instance of Anomaly_mapping using Anomalies_EngineeringFeatures_df
# anomaly_mapper = Anomaly_mapping(Anomalies_EngineeringFeatures_df, relative_distance_threshold, orientation_threshold)

# # Process the data in increments of increment_size; the mapped increments are concatenated from save_path in the next cell
# anomaly_mapper.process_in_increments(save_path, increment_size)

In [36]:
# Anomalies_EngineeringFeatures_Mapped_df = anomaly_mapper.concat_mapped_dfs(save_path)

In [37]:
# Save the columns needed for plotting to a Parquet file
# Plot_Anomaly_mapped_df_file_path = (os.path.join(save_path, f'Plot_Mapped_Anomalies.csv'))

# Anomalies_EngineeringFeatures_Mapped_df[[ # type: ignore
#     'GirthWeldNumber',
#     'InspectionYear',
#     'RelativeDistance_m',
#     'Tag',
#     'SignificantPointOrientation_deg'
# ]].to_parquet(Plot_Anomaly_mapped_df_file_path, index=False)

**Plot the anomalies by year**

In [77]:
# Load the data locally
# Anomalies_EngineeringFeatures_Mapped_df = pd.read_parquet(os.path.join(save_path, 'Anomalies_Mapped_First_1000_GirthWelds.parquet'))

# Load the data from github url
url = 'https://github.com/Farhad-Davaripour/AI_Applications_in_Pipeline_Engineering/raw/main/Dataset/processed_data/Anomalies_Mapped_First_1000_GirthWelds.parquet'

# Use pandas with fsspec to read the Parquet file directly
Anomalies_EngineeringFeatures_Mapped_df = pd.read_parquet(url, engine='pyarrow', storage_options={"anon": True})

In [None]:
# Plot the anomalies by year
plot_anomalies_by_year(Anomalies_EngineeringFeatures_Mapped_df, 14, figsize=(8, 3)) # type: ignore

## 5.4 Aspect Ratio and Area

In [40]:
from src.tools import FeatureEngineering

# Create an instance of the class with your dataframe
feature_engineering = FeatureEngineering(Anomalies_EngineeringFeatures_Mapped_df)

In [41]:
# Compute the aspect ratio
Anomalies_EngineeringFeatures_Geometric_df = feature_engineering.compute_aspect_ratio()

# Calculate the feature area
Anomalies_EngineeringFeatures_Geometric_df = feature_engineering.calculate_feature_area()

In [None]:
# Selecting columns
selected_columns = ['FeatureLength_mm', 'FeatureWidth_mm', 'AspectRatio', 'FeatureArea_mm2',]

# Creating a new DataFrame with only the selected columns
Anomalies_EngineeringFeatures_Geometric_df[selected_columns].head()
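The exact formulas inside `compute_aspect_ratio` and `calculate_feature_area` are not shown. A plausible sketch, assuming the aspect ratio is length divided by width and the area is that of the bounding rectangle (length times width), is given below with made-up dimensions.

```python
import numpy as np
import pandas as pd

# Made-up anomaly dimensions in mm; the last row has zero width
df = pd.DataFrame({
    "FeatureLength_mm": [20.0, 15.0, 30.0],
    "FeatureWidth_mm": [10.0, 15.0, 0.0],
})

# Guard against division by zero: a zero width yields NaN instead of inf
width = df["FeatureWidth_mm"].replace(0.0, np.nan)
df["AspectRatio"] = df["FeatureLength_mm"] / width

# Bounding-rectangle area (an assumption; the real shape may differ)
df["FeatureArea_mm2"] = df["FeatureLength_mm"] * df["FeatureWidth_mm"]
print(df)
```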

## 5.5 Radial to Cyclic Features

In [43]:
# Add the angular features
angle_columns = ['SignificantPointOrientation_deg']
Anomalies_EngineeringFeatures_Radial2Cyclic_df = feature_engineering.add_angular_features(angle_columns)

In [None]:
# Selecting the orientation column and its derived cyclic features (columns starting with 'SignificantPointOrientation')
rad_columns = [col for col in Anomalies_EngineeringFeatures_Radial2Cyclic_df.columns if col.startswith('SignificantPointOrientation')]
# Creating a new DataFrame with only the selected columns
Anomalies_EngineeringFeatures_Radial2Cyclic_df[rad_columns].head()
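Encoding an angle as a sine and cosine pair removes the artificial jump between 359 and 0 degrees. The sketch below assumes `add_angular_features` produces the `_sin` and `_cos` columns used later in the feature list; the sample angles are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SignificantPointOrientation_deg": [0.0, 90.0, 359.0]})

# Convert degrees to radians, then project onto the unit circle
radians = np.deg2rad(df["SignificantPointOrientation_deg"])
df["SignificantPointOrientation_deg_sin"] = np.sin(radians)
df["SignificantPointOrientation_deg_cos"] = np.cos(radians)

# Distance between 0 and 359 degrees in (sin, cos) space
first, last = df.iloc[0], df.iloc[2]
gap = np.hypot(
    first["SignificantPointOrientation_deg_sin"] - last["SignificantPointOrientation_deg_sin"],
    first["SignificantPointOrientation_deg_cos"] - last["SignificantPointOrientation_deg_cos"],
)
print(df)
print(f"gap between 0 and 359 degrees: {gap:.4f}")
```

In raw degrees the two angles differ by 359, while in the encoded space they are nearly identical, which is what a model needs to treat them as neighbors.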

## 5.6 Tag Erroneous Records

In [45]:
from src.tools import ErroneousAnomalyProcessor

# Detect errors in mapped anomalies
anomaly_processor = ErroneousAnomalyProcessor(Anomalies_EngineeringFeatures_Radial2Cyclic_df)

# Apply the detect_errors method
Anomalies_EngineeringFeatures_Tagging_df = Anomalies_EngineeringFeatures_Radial2Cyclic_df.copy()
Anomalies_EngineeringFeatures_Tagging_df['ErrorClassification'] = Anomalies_EngineeringFeatures_Tagging_df.apply(
    anomaly_processor.detect_errors, axis=1
)

In [None]:
anomaly_processor = ErroneousAnomalyProcessor(Anomalies_EngineeringFeatures_Tagging_df)

# Print anomaly statistics
anomaly_processor.print_error_statistics()

## 5.7 Include Second Prior Inspection Data

In [None]:
from src.tools import add_dprev_features

# Add the second previous inspection year features to the DataFrame
Old_Anomalies_EngineeringFeatures_Dprev_df = add_dprev_features(Anomalies_EngineeringFeatures_Tagging_df)

In [78]:
# Load the data locally
# Old_Anomalies_EngineeringFeatures_Dprev_df = pd.read_parquet(os.path.join(save_path, 'Old_Anomalies_EngineeringFeatures_Dprev.parquet'))

# Load from github url
url = 'https://github.com/Farhad-Davaripour/AI_Applications_in_Pipeline_Engineering/raw/main/Dataset/processed_data/Old_Anomalies_EngineeringFeatures_Dprev.parquet'

# Use pandas with fsspec to read the Parquet file directly
Old_Anomalies_EngineeringFeatures_Dprev_df = pd.read_parquet(url, engine='pyarrow', storage_options={"anon": True})

## 5.8 Filter Anomalies
Ideally, this filtering would sit in the Data Preprocessing step rather than in Feature Engineering. However, including data from the second prior inspection is computationally intensive, and the curation performed here reduces the data population and therefore the computational cost.

In [49]:
# Filter the DataFrame to include only the 'Okay' records
Anomalies_EngineeringFeatures_Filtered_df = Old_Anomalies_EngineeringFeatures_Dprev_df[Old_Anomalies_EngineeringFeatures_Dprev_df.ErrorClassification == 'Okay']

# Split the filtered records into 'old' and 'new' anomalies based on the Tag column
Old_Anomalies_EngineeringFeatures_Filtered_df = Anomalies_EngineeringFeatures_Filtered_df[Anomalies_EngineeringFeatures_Filtered_df.Tag == 'old']
New_Anomalies_EngineeringFeatures_Filtered_df = Anomalies_EngineeringFeatures_Filtered_df[Anomalies_EngineeringFeatures_Filtered_df.Tag == 'new']

## 5.9 Estimated Anomaly Geometry

In [50]:
Old_Anomalies_EngineeringFeatures_EstGeometry_df = Old_Anomalies_EngineeringFeatures_Filtered_df.copy()

Old_Anomalies_EngineeringFeatures_EstGeometry_df['Estimated_FeatureLength_mm'] = (
    2 * Old_Anomalies_EngineeringFeatures_EstGeometry_df['Prev_FeatureLength_mm'] -
    Old_Anomalies_EngineeringFeatures_EstGeometry_df['DPrev_FeatureLength_mm']
)

Old_Anomalies_EngineeringFeatures_EstGeometry_df['Estimated_FeatureWidth_mm'] = (
    2 * Old_Anomalies_EngineeringFeatures_EstGeometry_df['Prev_FeatureWidth_mm'] -
    Old_Anomalies_EngineeringFeatures_EstGeometry_df['DPrev_FeatureWidth_mm']
)

Old_Anomalies_EngineeringFeatures_EstGeometry_df['Powered_Prev_MaxDepth_mm'] = (
    Old_Anomalies_EngineeringFeatures_EstGeometry_df['Prev_MaxDepth_mm'] ** 2
)
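The two `Estimated_*` columns above are a linear extrapolation: `2 * Prev - DPrev` equals `Prev + (Prev - DPrev)`, that is, the last measurement plus the growth seen between the two prior inspections. The sketch below works this through on made-up numbers. The `interval_ratio` argument is a hypothetical extension, not part of the notebook, for the case where the inspection intervals are unequal.

```python
import pandas as pd


def extrapolate_linear(prev: pd.Series, dprev: pd.Series, interval_ratio: float = 1.0) -> pd.Series:
    """Extend the growth observed between the two prior inspections.

    interval_ratio = (next interval) / (previous interval); the default of 1.0
    reproduces the 2 * prev - dprev formula.
    """
    return prev + interval_ratio * (prev - dprev)


# Made-up lengths in mm, for illustration only
df = pd.DataFrame({
    "DPrev_FeatureLength_mm": [10.0, 12.0],
    "Prev_FeatureLength_mm": [13.0, 12.0],
})
df["Estimated_FeatureLength_mm"] = extrapolate_linear(
    df["Prev_FeatureLength_mm"], df["DPrev_FeatureLength_mm"]
)
print(df)
```

An anomaly that grew from 10 mm to 13 mm is projected to 16 mm, while one that did not change stays at 12 mm.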

## 5.10 Encoding Anomaly Cluster

In [51]:
from src.tools import AnomalyClusterer

In [52]:
# List of features to be used for clustering
clustering_features = [
    'RelativeDistance_m',
    'FeatureLength_mm',
    'FeatureWidth_mm',
    'MaxDepth_mm',
    'SignificantPointOrientation_deg',
    'Prev_RelativeDistance_m',
    'Prev_FeatureLength_mm',
    'Prev_FeatureWidth_mm',
    'Prev_MaxDepth_mm',
    'Prev_SignificantPointOrientation_deg',
    'JointLength_m',
    'SeamOrientation_deg',
    'StartPointDistance_m',
    'StartPointOrientation_deg',
    'EndPointDistance_m',
    'EndPointOrientation_deg',
    'SignificantPointRelDistance_m',
    'WallThickness_mm',
    'AspectRatio',
    'FeatureArea_mm2']

In [53]:
# Create an instance of the AnomalyClusterer class
clusterer = AnomalyClusterer(Old_Anomalies_EngineeringFeatures_EstGeometry_df, clustering_features, 3)

In [54]:
# Perform clustering
Old_Anomalies_EngineeringFeatures_Clustered_df = clusterer.perform_clustering()

In [None]:
Old_Anomalies_EngineeringFeatures_Clustered_df[['GirthWeldNumber',
                                                'InspectionYear',
                                                'RelativeDistance_m',
                                                'SignificantPointOrientation_deg',
                                                'anomaly_type']
                                                ].head()

In [None]:
# plot the explained variance ratio for each principal component
clusterer.plot_pca_explained_variance()

In [None]:
# Visualize the clusters in a two-dimensional space using first two principal components
clusterer.visualize_clusters()
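`AnomalyClusterer` is not shown in this notebook. Judging only from the methods called above, one plausible reading is that the features are standardized, grouped into three clusters, and projected with PCA for plotting. The sketch below follows that assumption using K-Means on synthetic data; the choice of K-Means and all parameter values are illustrative, not the notebook's confirmed method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated groups of 50 points in 3 dimensions
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 0.0], [0.0, 10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

# Standardize so that no single feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Assign each row to one of three clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Project to two principal components for a 2-D scatter plot
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("cluster sizes:", np.bincount(labels))
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The resulting labels could then be stored in a column such as `anomaly_type` and used as a categorical feature.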

## 5.11 Feature Importance

Lasso regularization is used to identify feature importance.

**Processing Features**

In [58]:
# Define features and target
features = Old_Anomalies_EngineeringFeatures_Clustered_df.drop(
    columns=[
        'MaxDepth_mm',
        'Tag',
        'ErrorClassification',
        'DepthChange'
    ]
)

target = Old_Anomalies_EngineeringFeatures_Clustered_df['MaxDepth_mm']

In [None]:
from src.tools import FeatureImportance

feature_importance = FeatureImportance(features, target)

# Perform the steps
feature_importance.standardize_features()
feature_importance.split_data()
feature_importance.perform_grid_search()
feature_importance.fit_best_lasso()
feature_importance.calculate_coefficients()

# Plot the coefficients
feature_importance.plot_coefficients()

In [None]:
# Plot non-zero coefficients
feature_importance.plot_non_zero_coefficients()

In [None]:
# Get the important features
importance_df = feature_importance.importance_df
important_features = importance_df[importance_df['Coefficient'].abs() > 0.01].Feature.tolist()
important_features
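The `FeatureImportance` class is defined in `src/tools.py`. A rough sketch of the same sequence of steps (standardize, split, grid search over the Lasso penalty, read the coefficients) is shown below on synthetic data. The alpha grid, the feature names, and the choice to fit the scaler on the training split only are assumptions of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: only f0 and f1 drive the target; f2 and f3 are noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=["f0", "f1", "f2", "f3"])
y = 3.0 * X["f0"] - 2.0 * X["f1"] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only to avoid leaking test statistics
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Grid search over the L1 penalty strength
search = GridSearchCV(
    Lasso(max_iter=10_000),
    {"alpha": [0.05, 0.1, 0.5, 1.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train_scaled, y_train)

# Coefficients shrunk to (near) zero mark features the model can do without
importance_df = pd.DataFrame(
    {"Feature": X.columns, "Coefficient": search.best_estimator_.coef_}
)
important_features = importance_df[importance_df["Coefficient"].abs() > 0.01].Feature.tolist()
print(search.best_params_)
print(important_features)
```

Because the L1 penalty drives weak coefficients to exactly zero, the surviving features form a compact input set for the downstream model.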

# 6. Training

## 6.1 Predicting Missing Values

### 6.1.1 Defining Features

In [62]:
Old_Anomalies_Training_df = Old_Anomalies_EngineeringFeatures_Clustered_df.copy()

In [63]:
# Define target variable
target = Old_Anomalies_Training_df['MaxDepth_mm']

# Define wall thickness variable
wt_mm = Old_Anomalies_EngineeringFeatures_Clustered_df.WallThickness_mm

# Keep only the features selected in the Lasso step (no rows are dropped)
features = Old_Anomalies_Training_df[important_features]

### 6.1.2 Training Pipeline and Evaluation

In [None]:
from src.tools import TrainingPipeline

# Create an instance of the TrainingPipeline class
ML_pipeline = TrainingPipeline(features, target)
print("Pipeline instance created")

# Scale the features
ML_pipeline.scale_features()
print("Features scaled")

# Split the data
ML_pipeline.split_data(handle_imbalance=True)
print("Data split")

# Perform hyperparameter tuning and return the best parameters
best_params = ML_pipeline.hyperparameter_tuning()
print("Best parameters found:")
print(best_params)

# Fit the model using the best hyper parameters
best_model = ML_pipeline.fit_model()
print("Model fitted")

In [None]:
# Evaluate the model and print the metrics across minority class
ML_pipeline.evaluate_model()
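The metric definitions used by `evaluate_model` are not shown. The sketch below gives the standard formulas for RMSE, MAE, R2, and MAPE, and assumes that ME stands for the signed mean error (predicted minus actual); that reading and the sign convention are assumptions. The helper name and sample values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute common regression metrics from actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = y_pred - y_true
    ss_res = np.sum(error ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "MAE": float(np.mean(np.abs(error))),
        "R2": float(1.0 - ss_res / ss_tot),
        # Percent; undefined when y_true contains zeros
        "MAPE": float(np.mean(np.abs(error / y_true)) * 100.0),
        # Signed mean error (bias), assumed to be predicted minus actual
        "ME": float(np.mean(error)),
    }


# Made-up depths in mm, for illustration only
metrics = regression_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.5])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A mean error near zero alongside a nonzero MAE indicates that over- and under-predictions roughly cancel out.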

```python
# Previous performance metrics
{'RMSE': 0.2721, 'MAE': 0.224, 'R2': 0.9164, 'MAPE': 27.4117, 'ME': 0.0209}
```


### 6.1.3 Imbalance Distribution Analysis

In [None]:
# class_counts = DPrev_Old_Filtered_Anomaly_mapped_df['MaxDepth_mm'].value_counts().sort_index()
sns.histplot(Old_Anomalies_Training_df['MaxDepth_mm'], bins=100)
plt.title('Hist Plot of MaxDepth_mm')
plt.xlabel('MaxDepth_mm')
plt.ylabel('Count')
plt.show()

### 6.1.4 Model Performance Visualization

In [None]:
# Plotting results and evaluating prediction accuracy
results = ML_pipeline.plot_prediction_accuracy()

In [None]:
ML_pipeline.plot_scatter(results)

## 6.2 Anomaly Growth

### 6.2.1 Training and Evaluation Pipeline

In [69]:
# Define the target variable (no outlier mask is applied here)
target = Old_Anomalies_Training_df['MaxDepth_mm']

# Define the list of features
feature_columns = [
        'GirthWeldNumber',
        'InspectionYear',
        'RelativeDistance_m',
        'Estimated_FeatureLength_mm',
        'Estimated_FeatureWidth_mm',
        'SignificantPointOrientation_deg',
        'Prev_InspectionYear',
        'Prev_RelativeDistance_m',
        'Prev_FeatureLength_mm',
        'Prev_FeatureWidth_mm',
        'Prev_MaxDepth_mm',
        'Powered_Prev_MaxDepth_mm',
        'Prev_SignificantPointOrientation_deg',
        'DPrev_RelativeDistance_m',
        'DPrev_FeatureLength_mm',
        'DPrev_FeatureWidth_mm',
        'DPrev_MaxDepth_mm',
        'DPrev_SignificantPointOrientation_deg',
        'SignificantPointOrientation_deg_sin',
        'SignificantPointOrientation_deg_cos'
    ]

# Select the feature columns (no rows are dropped)
features = Old_Anomalies_Training_df[feature_columns]

# Ensure features and target have the same index
features = features.loc[target.index]

In [None]:
from src.tools import TrainingPipeline

# Create an instance of the TrainingPipeline class
ML_pipeline = TrainingPipeline(features, target)
print("Pipeline instance created")

# Scale the features
ML_pipeline.scale_features()
print("Features scaled")

# Split the data
ML_pipeline.split_data(handle_imbalance=True)
print("Data split")

# Perform hyperparameter tuning and return the best parameters
best_params = ML_pipeline.hyperparameter_tuning()
print("Best parameters found:")
print(best_params)

# Fit the model using the best hyper parameters
best_model = ML_pipeline.fit_model()
print("Model fitted")

# Evaluate the model and print the metrics
evaluation_metrics = ML_pipeline.evaluate_model()
for metric, value in evaluation_metrics.items():
    print(f"{metric}: {value}")

### 6.2.2 Visualizing Model Performance

In [None]:
results = ML_pipeline.plot_prediction_accuracy()

In [None]:
ML_pipeline.plot_scatter(results)

### 6.2.3 Predicting Future Max Depth

In [None]:
from src.tools import AnomalyPredictionPipeline

Old_Anomalies_Prediction_df = Old_Anomalies_Training_df.copy()
Old_Anomalies_Prediction_df['WallThickness_mm'] = wt_mm

prev_inspection_year = 7
next_inspection_year = 9

# Create an instance of the pipeline
pipeline = AnomalyPredictionPipeline(
    model=best_model,
    df=Old_Anomalies_Prediction_df,
    prev_inspection_year=prev_inspection_year,
    next_inspection_year=next_inspection_year,
)

# Prepare data
Old_Anomalies_Prediction_df = pipeline.prepare_data(wt_mm, feature_columns, target.name)

# Make predictions
Old_Anomalies_Prediction_df = pipeline.make_predictions(feature_columns)

# Perform analytics
pipeline.perform_analytics((6, 3))