#### DATA INTERPOLATION AND EXTRAPOLATION

- Our original dataset contains 60 data points.
- Although this is a reasonable amount of data, increasing the number of data points through interpolation and extrapolation may improve the model's performance.
- Here, we interpolate to generate additional data points between existing ones and extrapolate to predict values outside the original data range. The code below uses cubic spline interpolation; the linear case is described first because it is the easiest to follow.
- This gives us a denser dataset for training our model.
- Linear interpolation works as follows: given two points (x0, y0) and (x1, y1), we can estimate the value of y at any point x between x0 and x1 using the formula:

  y = y0 + (y1 - y0) * ((x - x0) / (x1 - x0))

- Extrapolation uses the same idea, but it estimates values outside the range of the known data points.
- Extrapolation should be used with caution, because it can lead to significant errors if the underlying trend does not continue beyond the known data range. For this reason, we do not extrapolate more than 15% beyond our original data range.
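
The formula and the 15% limit above can be sketched in a few lines. This is a minimal illustration, not part of the original workflow: the helper names `linear_interpolate` and `extrapolation_limits` are hypothetical, and the sketch assumes that "15%" means 15% of the span between the smallest and largest known value. The two sample points are the stress values at strain 0.1 and 0.5 for T_inv = 0.00089 and ln strain rate = 0 from the dataset shown further down.

```python
import numpy as np


def linear_interpolate(x, x0, y0, x1, y1):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * ((x - x0) / (x1 - x0))


def extrapolation_limits(x_known, fraction=0.15):
    """Return (low, high) bounds after widening the known range by `fraction` of its span.

    Assumption: the 15% rule is applied to the span max(x) - min(x).
    """
    x_min, x_max = float(np.min(x_known)), float(np.max(x_known))
    margin = fraction * (x_max - x_min)
    return x_min - margin, x_max + margin


# Known points: (strain, normalized stress) at strain 0.1 and 0.5.
stress_mid = linear_interpolate(0.3, 0.1, 0.690677, 0.5, 0.839206)
print(f"Estimated stress at strain 0.3: {stress_mid:.6f}")

low, high = extrapolation_limits([0.1, 0.5])
print(f"Strain may be extrapolated within [{low:.2f}, {high:.2f}]")
```

Because strain 0.3 lies halfway between the two known strains, the linear estimate is simply the average of the two stress values.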

In [26]:
# STEP 1: IMPORT NECESSARY LIBRARIES
import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline, interp1d

warnings.filterwarnings('ignore')  # Suppress library warnings to keep the output readable



# There are many ways to interpolate the data: linear, polynomial, spline interpolation, etc.
# In this case we use cubic spline interpolation, as it provides a smooth curve that passes through all the data points.
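
To make the difference between the two methods concrete, the sketch below compares linear interpolation with a cubic spline on a small made-up sample. The sample values are hypothetical (points taken from the smooth curve exp(-x), not from the dataset), chosen only so that the true values between the known points are available for comparison.

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Hypothetical sample: six known points on the smooth curve y = exp(-x).
x_known = np.linspace(0.0, 1.0, 6)
y_known = np.exp(-x_known)

linear = interp1d(x_known, y_known, kind="linear")
spline = CubicSpline(x_known, y_known)

# Evaluate both interpolants halfway between each pair of known points.
x_new = (x_known[:-1] + x_known[1:]) / 2
y_true = np.exp(-x_new)

linear_error = np.abs(linear(x_new) - y_true).max()
spline_error = np.abs(spline(x_new) - y_true).max()

print(f"max linear error: {linear_error:.6f}")
print(f"max spline error: {spline_error:.6f}")
```

Both interpolants pass through the known points. On a smooth curve like this one the spline follows the curvature between the points, whereas the linear version cuts straight across it.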

In [27]:
# STEP 2: LOAD INPUT AND OUTPUT FILES SEPARATELY
BASE_DIR = os.path.dirname(os.path.dirname(os.getcwd()))  # Project root: two levels above the current working directory
data_dir = os.path.join(BASE_DIR, 'data', 'original_data')

# Define file names
input_file_name = 'Independent_variables.xlsx'
output_file_name = 'Response data.xlsx'

# Create SEPARATE file paths for each file
input_file_path = os.path.join(data_dir, input_file_name)
output_file_path = os.path.join(data_dir, output_file_name)

print("Using data directory:", data_dir)
print("Input file path:", input_file_path)
print("Output file path:", output_file_path)

# LOAD BOTH FILES
print("\n" + "=" * 80)
print("LOADING DATA FILES")
print("=" * 80)

# Load input file (4 columns: T⁻¹, ln_SR, Strain, ε')
inputs_df = pd.read_excel(input_file_path)
print(f"\n✓ Input file loaded: {inputs_df.shape[0]} rows × {inputs_df.shape[1]} columns")
print(f"  Columns: {inputs_df.columns.tolist()}")

# Load output file (1 column: σ/σmax)
outputs_df = pd.read_excel(output_file_path)
print(f"\n✓ Output file loaded: {outputs_df.shape[0]} rows × {outputs_df.shape[1]} columns")
print(f"  Columns: {outputs_df.columns.tolist()}")

Using data directory: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data
Input file path: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data/Independent_variables.xlsx
Output file path: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data/Response data.xlsx

LOADING DATA FILES

✓ Input file loaded: 60 rows × 4 columns
  Columns: ['T⁻¹ (K⁻¹)', 'ln strain rate', 'Strain', 'ε͘΄']

✓ Output file loaded: 60 rows × 1 columns
  Columns: ['Output (σ/σmax)']


In [28]:
# STEP 3: VERIFY DATA INTEGRITY
print("\n" + "=" * 80)
print("DATA INTEGRITY CHECK")
print("=" * 80)

if len(inputs_df) != len(outputs_df):
    print("   WARNING: Row count mismatch!")
    print(f"   Input file: {len(inputs_df)} rows")
    print(f"   Output file: {len(outputs_df)} rows")
    raise ValueError("Input and output files must have the same number of rows!")
else:
    print(f"✓ Row count matches: {len(inputs_df)} rows in both files")


DATA INTEGRITY CHECK
✓ Row count matches: 60 rows in both files


In [29]:
# STEP 4: MERGING DATA
print("\n" + "=" * 80)
print("MERGING DATA")
print("=" * 80)

# Rename input columns for clarity
inputs_df.columns = ['T_inv', 'ln_Strain_Rate', 'Strain', 'Strain_Rate']
print("\n Renamed Columns:")
print("  1. T_inv (Inverse Temperature)")
print("  2. ln_Strain_Rate (Log Strain Rate)")
print("  3. Strain (True Strain)")
print("  4. Strain_Rate (Strain Rate)")

# Rename output column
outputs_df.columns = ['Stress_Normalized']
print("\n Renamed Output Column:")
print("  1. Stress_Normalized (σ/σmax)")
# Merge the experimental input and output data column-wise
data = pd.concat([inputs_df, outputs_df], axis=1)

# First 10 rows of the merged data
df = data
df.head(10)



MERGING DATA

 Renamed Columns:
  1. T_inv (Inverse Temperature)
  2. ln_Strain_Rate (Log Strain Rate)
  3. Strain (True Strain)
  4. Strain_Rate (Strain Rate)

 Renamed Output Column:
  1. Stress_Normalized (σ/σmax)


Unnamed: 0,T_inv,ln_Strain_Rate,Strain,Strain_Rate,Stress_Normalized
0,0.00089,-2.302585,0.1,0.1,0.565722
1,0.000853,-2.302585,0.1,0.1,0.482011
2,0.000818,-2.302585,0.1,0.1,0.411714
3,0.000786,-2.302585,0.1,0.1,0.352587
4,0.00089,0.0,0.1,0.5,0.690677
5,0.000853,0.0,0.1,0.5,0.594937
6,0.000818,0.0,0.1,0.5,0.513077
7,0.000786,0.0,0.1,0.5,0.443056
8,0.00089,2.302585,0.1,0.9,0.825966
9,0.000853,2.302585,0.1,0.9,0.719375


In [30]:
# GET THE LAST 10 ROWS
print("\nLast 10 rows of the merged data:")
df.tail(10)


Last 10 rows of the merged data:


Unnamed: 0,T_inv,ln_Strain_Rate,Strain,Strain_Rate,Stress_Normalized
50,0.000818,-2.302585,0.5,0.1,0.524995
51,0.000786,-2.302585,0.5,0.1,0.455814
52,0.00089,0.0,0.5,0.5,0.839206
53,0.000853,0.0,0.5,0.5,0.73054
54,0.000818,0.0,0.5,0.5,0.637045
55,0.000786,0.0,0.5,0.5,0.556418
56,0.00089,2.302585,0.5,0.9,0.987976
57,0.000853,2.302585,0.5,0.9,0.867193
58,0.000818,2.302585,0.5,0.9,0.761909
59,0.000786,2.302585,0.5,0.9,0.669971


In [None]:
# STEP 5: INTERPOLATE DATA
print("\n" + "=" * 80)
print("DATA INTERPOLATION")
print("=" * 80)

# Interpolate the data for each unique condition
def interpolate_data(df, points_per_condition=25):
    """Interpolate data for each unique condition in the dataframe.

    Parameters:
        df: DataFrame containing the columns ['T_inv', 'ln_Strain_Rate',
            'Strain', 'Strain_Rate', 'Stress_Normalized'].
        points_per_condition: Number of interpolated points to generate per
            unique condition.

    Returns:
        DataFrame with the interpolated data.
    """
    interpolated_data = []
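
The cell above ends before the loop that fills `interpolated_data`. The following is a minimal sketch of one way a per-condition interpolation could work, not the original implementation. It assumes that a "condition" is one combination of `T_inv`, `ln_Strain_Rate` and `Strain_Rate`, and that stress is interpolated against `Strain`. The function name `interpolate_by_condition_sketch`, the `extrapolate_fraction` parameter and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

CONDITION_COLS = ["T_inv", "ln_Strain_Rate", "Strain_Rate"]  # assumed grouping columns
ALL_COLS = ["T_inv", "ln_Strain_Rate", "Strain", "Strain_Rate", "Stress_Normalized"]


def interpolate_by_condition_sketch(df, points_per_condition=25, extrapolate_fraction=0.0):
    """Fit a cubic spline of Stress_Normalized against Strain for each condition.

    Hypothetical sketch: the grouping columns, the choice of Strain as the
    interpolation axis, and `extrapolate_fraction` are assumptions.
    """
    pieces = []
    for condition, group in df.groupby(CONDITION_COLS):
        group = group.sort_values("Strain")  # CubicSpline needs increasing x
        strain = group["Strain"].to_numpy()
        stress = group["Stress_Normalized"].to_numpy()
        spline = CubicSpline(strain, stress)  # passes through every known point

        # Optionally widen the strain range by a fraction of its span.
        margin = extrapolate_fraction * (strain[-1] - strain[0])
        new_strain = np.linspace(strain[0] - margin, strain[-1] + margin,
                                 points_per_condition)

        piece = pd.DataFrame({"Strain": new_strain,
                              "Stress_Normalized": spline(new_strain)})
        for col, value in zip(CONDITION_COLS, condition):
            piece[col] = value  # repeat the condition values on every new row
        pieces.append(piece)

    return pd.concat(pieces, ignore_index=True)[ALL_COLS]


# Hypothetical demo data: two conditions with five strain levels each.
demo = pd.DataFrame({
    "T_inv": [0.00089] * 5 + [0.000853] * 5,
    "ln_Strain_Rate": [0.0] * 10,
    "Strain": [0.1, 0.2, 0.3, 0.4, 0.5] * 2,
    "Strain_Rate": [0.5] * 10,
    "Stress_Normalized": [0.69, 0.74, 0.78, 0.81, 0.84,
                          0.59, 0.64, 0.68, 0.71, 0.73],
})
dense = interpolate_by_condition_sketch(demo, points_per_condition=25)
print(dense.shape)  # (50, 5): 2 conditions x 25 points, 5 columns
```

Setting `extrapolate_fraction=0.15` would extend each strain range by 15% of its span on both sides, which is one possible reading of the limit stated at the top of this section.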



In [93]:
# STEP 6: SAVE THE NEW DATAFRAME TO AN EXCEL FILE
save_dir = os.path.abspath('../../data/interpolation')
os.makedirs(save_dir, exist_ok=True)  # Create the target directory if it does not exist yet
output_path = os.path.join(save_dir, "interpolated_extrapolated_data.xlsx")
print(f"Interpolated and extrapolated data will be saved to {output_path}")
new_data.to_excel(output_path, index=False)
print(f"Interpolated and extrapolated data saved to {output_path}")

Interpolated and extrapolated data will be saved to /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/interpolation/interpolated_extrapolated_data.xlsx
Interpolated and extrapolated data saved to /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/interpolation/interpolated_extrapolated_data.xlsx
