## Table of Contents

- [B1 Data](#B1-Data)
  - [1.1 Handling Unstructured Data in Digital Engineering](#1.1-Handling-Unstructured-Data-in-Digital-Engineering)
  - [1.2 Implementation](#1.2-Implementation)
  - [1.3 Extracting Numeric Features](#1.3-Extracting-Numeric-Features)
- [🏠 Home](../../welcomePage.ipynb)

# B1 Data
## 1.1 Handling Unstructured Data in Digital Engineering

In this module, we focus on handling unstructured data, using height map scans in digital engineering as a practical example. Unstructured data comes in many forms, such as images, text, sensor outputs, or complex measurement grids, which makes it challenging to work with directly. To address this, we present a focused example that demonstrates our approach to processing such data and extracting meaningful insights from it.

### <font color = '#646464'>1.1.1 What is a 3D Scan?</font>
A 3D scan digitally captures an object's shape, size, and sometimes texture. It generates a **point cloud**, a set of 3D data points used to create models.  

<center>
    <img src="Module 1 Content/img/01.jpg" alt="3D Scan Example" width="400"/>
</center>
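
As a small illustration of the relationship between a height map and a point cloud, the sketch below turns a 2D grid of heights into a list of (x, y, z) points. The function name `height_map_to_point_cloud` and the tiny `demo` array are hypothetical; the 0.02 mm grid spacing is the value used later in this notebook.

```python
import numpy as np

def height_map_to_point_cloud(height_map, spacing_mm=0.02):
    """Convert a 2D height map into an (N, 3) array of x, y, z points.

    Cells without a measurement (NaN) are dropped.
    """
    rows, cols = height_map.shape
    x = np.arange(cols) * spacing_mm  # physical X position of each column
    y = np.arange(rows) * spacing_mm  # physical Y position of each row
    X, Y = np.meshgrid(x, y)
    points = np.column_stack([X.ravel(), Y.ravel(), height_map.ravel()])
    return points[~np.isnan(points[:, 2])]

# Hypothetical 2 x 3 height map with one missing measurement
demo = np.array([[0.0, 0.1, np.nan],
                 [0.2, 0.3, 0.4]])
cloud = height_map_to_point_cloud(demo)
print(cloud.shape)
```

Each row of `cloud` is one measured point, which is the form most 3D viewers and point cloud libraries expect.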

### <font color = '#646464'>1.1.2 Objective</font> 
Techniques in this module apply to various unstructured data types. Learn to process raw data, extract key features, and analyze it effectively.  

### <font color = '#646464'>1.1.3 Key Topics</font>

#### - Cleaning Noise from Data  
Identify and remove extraneous data to isolate useful information, similar to noise reduction in a 3D scan.  

#### - Data Calibration  
Use mathematical methods like regression to correct raw data.  

#### - Feature Extraction and Analysis  
Extract key features from data including geometric features.  

## 1.2 Implementation

### <font color = '#646464'>1.2.1 Load Data</font>

In [None]:
import pandas as pd
import numpy as np
import os
from scipy.stats import linregress
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import Dropdown, interact
from IPython.display import clear_output
import sys
sys.path.append('Module 1 Content')  # Adjust the path as necessary

from functions import *



# Function to get a list of CSV files in the 'data' folder
def get_csv_files():
    data_dir = './Module 1 Content/data'  # Directory where CSV files are located
    return [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.csv')]

# Function to load the selected CSV file into a global variable
def load_selected_csv(file):
    global data
    clear_output(wait=True)
    print(f"Loading {file}...")
    data = load_data(file)  # Assuming load_data is a custom function to load CSV into data
    plot(data)  # Assuming plot is a custom function to plot data

# Dropdown widget to select a CSV file
csv_files = get_csv_files()
file_selector = Dropdown(options=csv_files, description="Select file:")

# Interactive widget to update the global variable based on selected file
interact(load_selected_csv, file=file_selector);

In [None]:
import numpy as np
import plotly.graph_objects as go
# Generate full X and Y positions based on 0.02 mm spacing
x = np.arange(0, data.shape[1]) * 0.02  # X axis (1500,)
y = np.arange(0, data.shape[0]) * 0.02  # Y axis (2250,)

# Downsample: pick every 5th point in both directions
data_downsampled = data[::5, ::5]
x_down = x[::5]
y_down = y[::5]

# Create meshgrid for downsampled data
X_down, Y_down = np.meshgrid(x_down, y_down)

# Flatten everything for Scatter3D
x_flat = X_down.flatten()
y_flat = Y_down.flatten()
z_flat = data_downsampled.flatten()

# Create scatter plot
fig = go.Figure(data=go.Scatter3d(
    x=x_flat,
    y=y_flat,
    z=z_flat,
    mode='markers',
    marker=dict(
        size=2,
        color=z_flat,
        colorscale='Viridis',
        colorbar=dict(title='Z Height'),
        opacity=0.8
    )
))

# Layout with equal aspect ratio
fig.update_layout(
    scene=dict(
        xaxis=dict(title='X (mm)', range=[x_down.min(), x_down.max()]),
        yaxis=dict(title='Y (mm)', range=[y_down.min(), y_down.max()]),
        zaxis=dict(title='Z Height'),
        aspectmode='data'
    ),
    title='3D Scatter Plot of Height Data (Downsampled)',
)

fig.show()

### <font color = '#646464'>1.2.2 Plot Calibration Instructions</font>

Plot calibration ensures data accuracy and alignment, particularly when working with physical objects or scans. This process corrects distortions using a **flat reference region ("bed")** in the scan.  

1. **Select the Bed Region**  
   Identify a stable, flat area in the scan to serve as the reference. Precise selection using sliders ensures a reliable base for calibration.  

2. **Fit a Linear Regression Line**  
   Fit a linear regression to the height values in the bed region. A nonzero slope indicates that the scan is tilted, and its magnitude quantifies the tilt.  

3. **Apply Slope Calibration**  
   Adjust the entire dataset using the computed slope, correcting inclinations and distortions to align data accurately. This eliminates biases from uneven surfaces, ensuring reliable measurements and analysis.
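
The three steps above can be sketched as follows. The notebook's own `correct_tilt` lives in the local `functions` module, whose source is not shown here, so this is only an assumption about what such a routine might do. The name `correct_tilt_1d` and the synthetic scan are hypothetical, the tilt is assumed to run along X only, and `np.polyfit` with degree 1 stands in for the linear regression.

```python
import numpy as np

def correct_tilt_1d(height_map, lower, upper):
    """Hypothetical stand-in for correct_tilt: remove a linear trend along X.

    The trend is estimated only from bed pixels, i.e. heights in [lower, upper].
    """
    _, cols = np.indices(height_map.shape)
    bed = (height_map >= lower) & (height_map <= upper)  # NaN compares False
    # Degree-1 polynomial fit = simple linear regression of height on column index
    slope, intercept = np.polyfit(cols[bed], height_map[bed], deg=1)
    # Subtract the fitted line from every pixel, not just the bed
    return height_map - (slope * cols + intercept)

# Synthetic scan: a bed tilted by 0.01 per pixel along X, with a raised block
x = np.arange(50)
scan = np.tile(0.01 * x, (40, 1))
scan[10:20, 10:20] += 5.0

leveled = correct_tilt_1d(scan, lower=-1.0, upper=1.0)
print(round(float(np.abs(leveled[0]).max()), 6))
```

After the correction the bed sits at zero height and the block keeps its height relative to the bed. A tilt along both axes needs a plane rather than a line.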


If interested in more advanced techniques, read more about:
- **Plane Fitting** – Fits a plane to the bed region using least squares to correct tilt.  
- **RANSAC Plane Estimation** – Identifies the dominant flat surface while ignoring outliers.  
- **Gradient Descent Leveling** – Iteratively adjusts data points to minimize slope deviations.  
- **Affine Transformation** – Applies linear transformations to correct tilt and misalignment.  
- **Z-Axis Offset Correction** – Adjusts height values to ensure a uniform reference plane.  
- **Barycentric Coordinates Adjustment** – Uses weighted averages to realign points to a common flat plane.  
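
As an illustration of the first item in this list, the sketch below fits the plane z = a·x + b·y + c to the bed pixels by least squares and subtracts it from the whole scan. It is not part of the notebook's `functions` module; `level_with_plane`, the boolean `bed_mask`, and the synthetic data are hypothetical names chosen for this example.

```python
import numpy as np

def level_with_plane(height_map, bed_mask):
    """Fit z = a*col + b*row + c to the bed pixels and subtract the plane."""
    rows, cols = np.indices(height_map.shape)
    # Design matrix: one row per bed pixel, columns are [col, row, 1]
    A = np.column_stack([cols[bed_mask], rows[bed_mask],
                         np.ones(int(bed_mask.sum()))])
    coeffs, *_ = np.linalg.lstsq(A, height_map[bed_mask], rcond=None)
    a, b, c = coeffs
    return height_map - (a * cols + b * rows + c), coeffs

# Synthetic scan: a bed tilted along both axes, plus a raised block
rows, cols = np.indices((30, 40))
scan = 0.02 * cols - 0.01 * rows + 1.5
scan[5:10, 5:10] += 3.0

bed_mask = np.ones(scan.shape, dtype=bool)
bed_mask[5:10, 5:10] = False  # exclude the block from the fit

leveled, coeffs = level_with_plane(scan, bed_mask)
print(np.round(coeffs, 4))
```

Unlike a single regression line, the plane removes tilt in X and Y at the same time.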

<center>
    <img src="Module 1 Content/img/02.jpg" alt="Alt Text" width="800"/>
</center>

### <font color = '#646464'>1.2.3 Bed Scan</font>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import FloatSlider, interactive
from IPython.display import display

# Assuming 'data' is your original 2D numpy array
data_temp = np.copy(data)  # Use the original data directly with NaNs preserved

# Flatten the data to prepare for the scatter plot
points = data_temp.flatten()

# Initialize global variables
global upper_bound_plate, lower_bound_plate
upper_bound_plate = np.nanmax(points)
lower_bound_plate = np.nanmin(points)

def update_plots(y1, y2):
    global upper_bound_plate, lower_bound_plate
    # Update global variables with the slider values
    upper_bound_plate = y2
    lower_bound_plate = y1
    
    # Create a mask for values outside the slider range, respecting NaNs
    mask = (data_temp < y1) | (data_temp > y2)
    filtered_data = np.copy(data_temp)
    filtered_data[mask] = np.nan  # Set values outside the range to NaN
    
    # Plotting using subplots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
    
    # Scatter plot on the first subplot
    valid_points = ~np.isnan(points)  # Mask to remove NaNs for the scatter plot
    ax1.scatter(np.arange(len(points))[valid_points], points[valid_points], marker='o', linestyle='-', s=0.01)
    ax1.set_title('Flattened Array Values')
    ax1.set_xlabel('Index')
    ax1.set_ylabel('Height Value')
    ax1.grid(True)
    ax1.axhline(y=lower_bound_plate, color='r', linestyle='--')
    ax1.axhline(y=upper_bound_plate, color='g', linestyle='--')
    
    # Heatmap on the second subplot
    sns.heatmap(filtered_data, cmap='viridis', cbar=False, ax=ax2)
    ax2.set_aspect(aspect= 'equal')
    ax2.set_title('Filtered Data Heatmap')
    ax2.set_xticks([])
    ax2.set_yticks([])
    
    plt.tight_layout()
    plt.show()

# Set up sliders for the interactive lines
y1_slider = FloatSlider(min=np.nanmin(points), max=np.nanmax(points), step=0.01, value=np.nanmin(points), description='Minimum')
y2_slider = FloatSlider(min=np.nanmin(points), max=np.nanmax(points), step=0.01, value=np.nanmax(points), description='Maximum')

interactive_plot = interactive(update_plots, y1=y1_slider, y2=y2_slider)
output = interactive_plot.children[-1]
display(interactive_plot)


### <font color = '#646464'>1.2.4 Bed Calibration</font>

In [None]:
import ipywidgets as widgets
from IPython.display import display

button = widgets.Button(description="Calibrate Scan")
output = widgets.Output()

def on_button_clicked(b):
    global data  # Make sure to modify the global 'data' variable
    with output:
        data = correct_tilt(data, lower_bound_plate, upper_bound_plate)
        print("Data has been updated.")

button.on_click(on_button_clicked)
display(button, output)

### <font color = '#646464'>1.2.5 Feature Extraction Instructions</font>

Feature extraction focuses the analysis on the relevant part of the data and discards everything else. The first step is to **define the area of interest**.

Adjust the sliders until the selected height range contains only the feature or object you are studying. The **bed**, which was calibrated in the previous step, is not part of the area of interest and must be excluded from the selection.

Filtering out irrelevant data in this way ensures that later steps, such as fitting models or extracting more detailed features, use only the data that matters for the task at hand.


### <font color = '#646464'>1.2.6 Region Selection</font>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import FloatSlider, interactive
from IPython.display import display

# Assuming 'data' is your original 2D numpy array
data_temp = np.copy(data)  # Use the original data directly with NaNs preserved

# Flatten the data to prepare for the scatter plot
points = data_temp.flatten()

# Initialize global variables
global upper_bound, lower_bound
upper_bound = np.nanmax(points)
lower_bound = np.nanmin(points)

def update_plots(y1, y2):
    global upper_bound, lower_bound
    # Update global variables with the slider values
    upper_bound = y2
    lower_bound = y1
    
    # Create a mask for values outside the slider range, respecting NaNs
    mask = (data_temp < y1) | (data_temp > y2)
    filtered_data = np.copy(data_temp)
    filtered_data[mask] = np.nan  # Set values outside the range to NaN
    
    # Plotting using subplots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
    
    # Scatter plot on the first subplot
    valid_points = ~np.isnan(points)  # Mask to remove NaNs for the scatter plot
    ax1.scatter(np.arange(len(points))[valid_points], points[valid_points], marker='o', linestyle='-', s=0.01)
    ax1.set_title('Flattened Array Values')
    ax1.set_xlabel('Index')
    ax1.set_ylabel('Height Value')
    ax1.grid(True)
    ax1.axhline(y=lower_bound, color='r', linestyle='--')
    ax1.axhline(y=upper_bound, color='g', linestyle='--')
    
    # Heatmap on the second subplot
    sns.heatmap(filtered_data, cmap='viridis', cbar=False, ax=ax2)
    ax2.set_aspect(aspect= 'equal')
    ax2.set_title('Filtered Data Heatmap')
    ax2.set_xticks([])
    ax2.set_yticks([])
    
    plt.tight_layout()
    plt.show()

# Set up sliders for the interactive lines
y1_slider = FloatSlider(min=np.nanmin(points), max=np.nanmax(points), step=0.01, value=np.nanmin(points), description='Minimum')
y2_slider = FloatSlider(min=np.nanmin(points), max=np.nanmax(points), step=0.01, value=np.nanmax(points), description='Maximum')

interactive_plot = interactive(update_plots, y1=y1_slider, y2=y2_slider)
output = interactive_plot.children[-1]
display(interactive_plot)


## 1.3 Extracting Numeric Features

The decision to convert the scan into an image format is a practical step toward simplifying the analysis of complex 3D data. When the features of interest are primarily 2D in nature, projecting the data into a 2D representation allows us to leverage a wide range of well-established image processing techniques tailored for visual data.

This approach represents one of the simpler and more efficient ways to handle 3D data—especially when the dimensionality of the features aligns with the 2D plane. However, it’s worth noting that more advanced, domain-specific algorithms can be developed to work directly with full 3D data when needed.

In this module, our goal is to streamline the process by applying existing and open-source libraries and tools.
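
To make the idea of treating a scan as an image concrete, the sketch below rescales a height map to an 8-bit grayscale array, the format most image processing libraries expect. The helper `height_map_to_uint8` and the sample values are hypothetical. Missing measurements (NaN) are mapped to 0, which is an assumption of this sketch and makes them indistinguishable from the lowest measured height.

```python
import numpy as np

def height_map_to_uint8(height_map):
    """Rescale finite heights to the range 0-255; NaN cells become 0."""
    image = np.zeros(height_map.shape, dtype=np.uint8)
    finite = np.isfinite(height_map)
    if not finite.any():
        return image  # nothing measured, return an all-black image
    z_min = height_map[finite].min()
    span = height_map[finite].max() - z_min
    if span > 0:
        scaled = (height_map[finite] - z_min) / span * 255
        image[finite] = np.round(scaled).astype(np.uint8)
    return image

# Hypothetical 2 x 2 height map with one missing measurement
demo = np.array([[0.0, 0.2],
                 [1.0, np.nan]])
image = height_map_to_uint8(demo)
print(image)
```

The feature extraction cell in this notebook skips the grayscale step and thresholds the heights directly into a binary image, which is a special case of the same projection.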

### <font color = '#646464'>1.3.1 Explanation of the Code and its Functionality</font>

This code is designed to extract and fit geometric shapes to a binary image, which represents data that has been processed for feature extraction. The following steps outline the core functionality:

1. **Binary Data Preparation**: The data is first thresholded into a binary format, where pixels falling within a certain intensity range are marked as `1` (white), while all other pixels are marked as `0` (black). This step essentially isolates the relevant features from the background.

2. **Contour Detection**: Once the binary data is prepared, the contours in the image are identified using the `cv2.findContours()` function. Contours represent the boundaries of connected regions in the binary image. The largest contour is selected for further analysis because it is assumed to correspond to the object of interest. It is then simplified with `cv2.approxPolyDP()`, using a tolerance equal to the epsilon factor times the contour perimeter, before any shape is fitted.

3. **Shape Fitting**: Based on the selected shape (Rectangle, Triangle, Square, Circle, or Ellipse), the code fits a specific geometric model to the detected contour.

   - **Rectangle**: The code uses `cv2.minAreaRect()` to fit a rectangle to the contour, and then the rectangle’s corners are extracted and scaled to physical units (in millimeters). The width and height of the rectangle are printed as part of the output.
   
   - **Triangle**: The function `cv2.minEnclosingTriangle()` fits the smallest enclosing triangle around the contour. The vertices of the triangle are extracted and scaled, and the coordinates of these vertices are printed.
   
   - **Square**: Similar to the rectangle fitting, a square is fitted by taking the longer side of the minimum-area rectangle and using that length for all four sides. The square is axis-aligned and centered on the center of that rectangle, and its side length is printed.
   
   - **Circle**: The function `cv2.minEnclosingCircle()` is used to fit the smallest enclosing circle to the contour. The center and radius of the circle are calculated, scaled, and displayed in the output.

   - **Ellipse**: If the contour has at least 5 points, the code uses `cv2.fitEllipse()` to fit an ellipse to the contour. The center, axes (major and minor), and the orientation angle of the ellipse are calculated, scaled, and displayed.
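
The first step and the pixel-to-millimeter scaling can be illustrated without OpenCV. The sketch below thresholds a synthetic height map into a binary mask and measures its axis-aligned bounding box in millimeters. It is a simplified stand-in, not a replacement for `cv2.minAreaRect()`: it ignores rotation and assumes a single object. `bounding_box_mm` and the synthetic scan are hypothetical; the 0.02 mm per pixel scale is the value used in this notebook.

```python
import numpy as np

SCALE_MM_PER_PIXEL = 0.02  # scan resolution used in this notebook

def bounding_box_mm(height_map, lower, upper, scale=SCALE_MM_PER_PIXEL):
    """Threshold heights to a binary mask and return (width, height) in mm."""
    # 1 where the height lies inside [lower, upper], 0 elsewhere (NaN gives 0)
    binary = ((height_map >= lower) & (height_map <= upper)).astype(np.uint8)
    rows, cols = np.nonzero(binary)
    if rows.size == 0:
        return None  # nothing inside the selected height range
    width_px = cols.max() - cols.min() + 1
    height_px = rows.max() - rows.min() + 1
    return width_px * scale, height_px * scale

# Synthetic scan: flat bed at height 0 with a 100 x 50 pixel block at height 2
scan = np.zeros((100, 150))
scan[20:70, 30:130] = 2.0

width_mm, height_mm = bounding_box_mm(scan, lower=1.5, upper=2.5)
print(width_mm, height_mm)
```

The same multiplication by the scale factor is what converts the OpenCV results in the next cell from pixels to millimeters.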

### <font color = '#646464'>1.3.2 Extract Features</font>

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon, Ellipse
import ipywidgets as widgets
from IPython.display import display

# Define the scale factor (0.02 mm per pixel)
scale_factor = 0.02

# Define the function to execute upon shape and epsilon selection
def fit_and_plot(shape, epsilon_factor):
    binary_data = np.where((data >= lower_bound) & (data <= upper_bound), 1, 0)
    contours, _ = cv2.findContours(binary_data.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        largest_contour = max(contours, key=cv2.contourArea)
        epsilon = epsilon_factor * cv2.arcLength(largest_contour, True)
        approx = cv2.approxPolyDP(largest_contour, epsilon, True)

        plt.figure(figsize=(6, 6))
        plt.imshow(binary_data, cmap='gray', origin='lower', extent=[0, binary_data.shape[1] * scale_factor, 0, binary_data.shape[0] * scale_factor])
        
        if shape == 'Rectangle':
            rect = cv2.minAreaRect(approx)
            box = cv2.boxPoints(rect) * scale_factor
            plt.gca().add_patch(Polygon(box, closed=True, color='red', fill=False, linewidth=2))
            print(f"Rectangle width: {rect[1][0] * scale_factor} mm, height: {rect[1][1] * scale_factor} mm")
        
        elif shape == 'Triangle':
            # minEnclosingTriangle returns (area, vertices); vertices have shape (3, 1, 2)
            triangle = cv2.minEnclosingTriangle(approx)[1].reshape(-1, 2) * scale_factor
            plt.gca().add_patch(Polygon(triangle, closed=True, color='blue', fill=False, linewidth=2))
            print("Triangle vertices (mm):", triangle)
        
        elif shape == 'Square':
            rect = cv2.minAreaRect(approx)
            side = max(rect[1]) * scale_factor
            center = np.array(rect[0]) * scale_factor
            box = np.array([
                [center[0] - side / 2, center[1] - side / 2],
                [center[0] + side / 2, center[1] - side / 2],
                [center[0] + side / 2, center[1] + side / 2],
                [center[0] - side / 2, center[1] + side / 2],
            ])
            plt.gca().add_patch(Polygon(box, closed=True, color='green', fill=False, linewidth=2))
            print(f"Square side length: {side} mm")
        
        elif shape == 'Circle':
            center, radius = cv2.minEnclosingCircle(approx)
            center_scaled = np.array(center) * scale_factor
            radius_scaled = radius * scale_factor
            circle_patch = plt.Circle(center_scaled, radius_scaled, color='orange', fill=False, linewidth=2)
            plt.gca().add_patch(circle_patch)
            print(f"Circle radius: {radius_scaled} mm")
        
        elif shape == 'Ellipse':
            if len(approx) >= 5:  # Ensure there are at least 5 points to fit an ellipse
                ellipse = cv2.fitEllipse(approx)
                ellipse_center = np.array(ellipse[0]) * scale_factor
                axes = np.array(ellipse[1]) * scale_factor
                angle = ellipse[2]
                plt.gca().add_patch(Ellipse(xy=ellipse_center, width=axes[0], height=axes[1], angle=angle, edgecolor='purple', fill=False, linewidth=2))
                print(f"Ellipse center: {ellipse_center} mm, axes: {axes} mm, angle: {angle}")
            else:
                print("Not enough points to fit an ellipse.")


        plt.xlabel('X (mm)')
        plt.ylabel('Y (mm)')
        plt.show()
    else:
        print("No contours found in the binary image.")

# Widgets for shape and epsilon factor selection
shape_selector = widgets.Dropdown(options=['Rectangle', 'Triangle', 'Square', 'Ellipse', 'Circle'], description='Shape:')
epsilon_slider = widgets.FloatSlider(value=0.01, min=0.0, max=0.1, step=0.005, description='Epsilon Factor:', readout_format='.3f')

# Link widgets to the function
interactive_plot = widgets.interactive_output(fit_and_plot, {'shape': shape_selector, 'epsilon_factor': epsilon_slider})

# Display the widgets and the plot
display(widgets.VBox([shape_selector, epsilon_slider]), interactive_plot)


The 3D scans performed by technicians or engineers initially captured valuable data, but in their raw, unstructured form, they held little meaning. The data was essentially a collection of points and measurements without context, and without further processing, it wasn't immediately useful for decision-making or analysis. This is common in engineering where raw scans or measurements often lack direct applicability without additional interpretation.

To learn how to handle unstructured video data, see the example in [Green Belt Module 4](../../Modules/2.%20Green%20Belt%20Level/Module4.ipynb).

### <center>[🏠 Home](../../welcomePage.ipynb)     [Module 2 ▶︎](Module2.ipynb)</center>