#### DATA INTERPOLATION AND EXTRAPOLATION

- In our original dataset, we have 60 data points.
- Although 60 points is a workable dataset, increasing the number of training points through interpolation and extrapolation may help the model learn a smoother mapping.
- Here, we use linear interpolation to generate additional points between existing ones, and linear extrapolation to estimate values slightly outside the original data range.
- The aim is a denser, more robust training set. Note that the new points are estimates derived from the existing measurements, not new experimental data.
- Linear interpolation works as follows: given two known points (x0, y0) and (x1, y1), the value of y at any x between x0 and x1 is estimated as:

  y = y0 + (y1 - y0) * ((x - x0) / (x1 - x0))

- Extrapolation applies the same straight-line formula to x values outside [x0, x1], i.e. outside the range of the known data points.
- Extrapolation must be used with caution: if the underlying trend does not continue beyond the known range, the estimates can be badly wrong. We therefore limit extrapolation to at most 15% of the original data range on either side.
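A minimal sketch of the formula above, checked against NumPy's `np.interp`. The helper names `linear_interp` and `within_extrapolation_limit` are hypothetical, the two points are taken from the first rows of the merged data further down, and the 15% limit is applied on each side of the range as described above. `np.interp` clamps to the end values outside the data range instead of extrapolating, so the comparison is only made at an interior point.

```python
import numpy as np

def linear_interp(x, x0, y0, x1, y1):
    """Straight-line estimate y = y0 + (y1 - y0) * (x - x0) / (x1 - x0).

    For x inside [x0, x1] this interpolates; for x outside it the same
    line is extended, i.e. it extrapolates.
    """
    return y0 + (y1 - y0) * ((x - x0) / (x1 - x0))

def within_extrapolation_limit(x, x_min, x_max, fraction=0.15):
    """Hypothetical guard: allow x at most `fraction` of the range beyond either end."""
    buffer = fraction * (x_max - x_min)
    return (x_min - buffer) <= x <= (x_max + buffer)

# Two known points: temperature (K) and normalized stress
x0, y0 = 1123.0, 0.565722
x1, y1 = 1173.0, 0.482011

# Interpolation at an interior point agrees with NumPy
x_mid = 1150.0
y_mid = linear_interp(x_mid, x0, y0, x1, y1)
print(y_mid, np.interp(x_mid, [x0, x1], [y0, y1]))

# Extrapolation: 5 K below x0 is inside the limit (15% of 50 K = 7.5 K)
x_out = 1118.0
print(within_extrapolation_limit(x_out, x0, x1))  # True
y_out = linear_interp(x_out, x0, y0, x1, y1)
print(y_out)
```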

In [83]:
# IMPORT NECESSARY LIBRARIES
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import os
import warnings
warnings.filterwarnings('ignore')

In [84]:
# STEP 1: LOAD INPUT AND OUTPUT FILES SEPARATELY
BASE_DIR = os.path.dirname(os.path.dirname(os.getcwd()))  # Project root: two levels above the current working directory
data_dir = os.path.join(BASE_DIR, 'data', 'original_data')

# Define file names
input_file_name = 'Independent_variables.xlsx'
output_file_name = 'Response data.xlsx'

# Create SEPARATE file paths for each file
input_file_path = os.path.join(data_dir, input_file_name)
output_file_path = os.path.join(data_dir, output_file_name)

print("Using data directory:", data_dir)
print("Input file path:", input_file_path)
print("Output file path:", output_file_path)

# LOAD BOTH FILES
print("\n" + "=" * 80)
print("LOADING DATA FILES")
print("=" * 80)

# Load input file (4 columns: T⁻¹, ln_SR, Strain, ε')
inputs_df = pd.read_excel(input_file_path)
print(f"\n✓ Input file loaded: {inputs_df.shape[0]} rows × {inputs_df.shape[1]} columns")
print(f"  Columns: {inputs_df.columns.tolist()}")

# Load output file (1 column: σ/σmax)
outputs_df = pd.read_excel(output_file_path)
print(f"\n✓ Output file loaded: {outputs_df.shape[0]} rows × {outputs_df.shape[1]} columns")
print(f"  Columns: {outputs_df.columns.tolist()}")

Using data directory: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data
Input file path: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data/Independent_variables.xlsx
Output file path: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data/Response data.xlsx

LOADING DATA FILES

✓ Input file loaded: 60 rows × 4 columns
  Columns: ['T⁻¹ (K⁻¹)', 'ln strain rate', 'Strain', 'ε͘΄']

✓ Output file loaded: 60 rows × 1 columns
  Columns: ['Output (σ/σmax)']


In [85]:
# STEP 2: VERIFY DATA INTEGRITY
print("\n" + "=" * 80)
print("DATA INTEGRITY CHECK")
print("=" * 80)

if len(inputs_df) != len(outputs_df):
    print(f"   WARNING: Row count mismatch!")
    print(f"   Input file: {len(inputs_df)} rows")
    print(f"   Output file: {len(outputs_df)} rows")
    raise ValueError("Input and output files must have the same number of rows!")
else:
    print(f"✓ Row count matches: {len(inputs_df)} rows in both files")


DATA INTEGRITY CHECK
✓ Row count matches: 60 rows in both files
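The row-count check above catches mismatched files, but not empty or non-numeric cells. A slightly stricter check could look like the sketch below; `check_integrity` is a hypothetical helper, and the small DataFrames are stand-ins for `inputs_df` and `outputs_df`.

```python
import numpy as np
import pandas as pd

def check_integrity(inputs: pd.DataFrame, outputs: pd.DataFrame) -> None:
    """Raise ValueError if the two files cannot be paired row by row."""
    if len(inputs) != len(outputs):
        raise ValueError(
            f"Row count mismatch: {len(inputs)} input rows vs {len(outputs)} output rows"
        )
    for name, frame in (("Input", inputs), ("Output", outputs)):
        n_missing = int(frame.isna().sum().sum())
        if n_missing:
            raise ValueError(f"{name} file has {n_missing} missing value(s)")
        non_numeric = [c for c in frame.columns
                       if not pd.api.types.is_numeric_dtype(frame[c])]
        if non_numeric:
            raise ValueError(f"{name} file has non-numeric columns: {non_numeric}")

# Stand-ins for inputs_df / outputs_df
good_in = pd.DataFrame({"T_inv": [0.00089, 0.000853], "Strain": [0.1, 0.1]})
good_out = pd.DataFrame({"Stress": [0.565722, 0.482011]})
check_integrity(good_in, good_out)  # passes silently

bad_out = pd.DataFrame({"Stress": [0.565722, np.nan]})
try:
    check_integrity(good_in, bad_out)
    message = "no error"
except ValueError as err:
    message = str(err)
print(message)  # Output file has 1 missing value(s)
```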


In [86]:
# STEP 3: SELECT RELEVANT COLUMNS (Drop ε' - column 4)
print("\n" + "=" * 80)
print("SELECTING FEATURES")
print("=" * 80)

# Rename input columns for clarity
inputs_df.columns = ['T_inv', 'ln_Strain_Rate', 'Strain', 'epsilon_prime']

# Select only the 3 features we need (drop epsilon_prime)
inputs_selected = inputs_df[['T_inv', 'ln_Strain_Rate', 'Strain']].copy()

print("\n Selected input features:")
print("  1. T_inv (Inverse Temperature)")
print("  2. ln_Strain_Rate (Log Strain Rate)")
print("  3. Strain (True Strain)")
print(f"\n Dropped: epsilon_prime (normalized strain rate - not needed)")

# Rename output column
outputs_df.columns = ['Stress_Normalized']


SELECTING FEATURES

 Selected input features:
  1. T_inv (Inverse Temperature)
  2. ln_Strain_Rate (Log Strain Rate)
  3. Strain (True Strain)

 Dropped: epsilon_prime (normalized strain rate - not needed)


N/B: Why we dropped epsilon_prime:
- The normalized strain rate (ε') is not required for the ANN model under the project scope.
- All ANN inputs are scaled to [0, 1] with MinMaxScaler anyway, so the data already passes through three normalizations:
  1. the original normalization of the response (σ/σmax),
  2. MinMaxScaler on the ANN inputs,
  3. MinMaxScaler on the ANN output.
- Since ε' is itself a normalized strain rate, passing it through MinMaxScaler would normalize the strain rate twice.
- Instead, we use ln(ε̇) directly (ln_Strain_Rate = np.log(Strain_Rate)) and let MinMaxScaler normalize it.
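A small sketch of why the extra normalization adds nothing: min-max scaling is unchanged by any positive rescaling of a feature, so scaling ε' gives the same numbers as scaling ε̇ itself. This assumes ε' = ε̇ / ε̇_max, which the notebook does not state; `min_max` is a hypothetical stand-in for the mapping MinMaxScaler applies to each feature.

```python
import numpy as np

def min_max(x):
    """Scale values to [0, 1], the per-feature mapping MinMaxScaler applies."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

strain_rate = np.array([0.1, 1.0, 10.0])      # s^-1, the test conditions
eps_prime = strain_rate / strain_rate.max()   # ASSUMED definition of ε'

# Scaling the already-normalized ε' gives the same result as scaling ε̇ directly
print(min_max(strain_rate))
print(min_max(eps_prime))

# The log transform spaces the three decades evenly before scaling
print(min_max(np.log(strain_rate)))
```

On the raw strain rate, 1 s⁻¹ lands near 0.09, while on ln(ε̇) it lands in the middle of [0, 1], which is one reason to feed the log instead.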

In [87]:
# STEP 4: MERGE INPUT AND OUTPUT
print("\n" + "="*80)
print("MERGING DATA")
print("="*80)

# Merge side by side (axis=1 means column-wise)
df = pd.concat([inputs_selected, outputs_df], axis=1)

print(f"\n✓ Data merged successfully!")
print(f"  Final shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"  Columns: {df.columns.tolist()}")

# Display the first 10 rows to verify
df.head(10)


MERGING DATA

✓ Data merged successfully!
  Final shape: 60 rows × 4 columns
  Columns: ['T_inv', 'ln_Strain_Rate', 'Strain', 'Stress_Normalized']


|   | T_inv | ln_Strain_Rate | Strain | Stress_Normalized |
|---|---|---|---|---|
| 0 | 0.00089 | -2.302585 | 0.1 | 0.565722 |
| 1 | 0.000853 | -2.302585 | 0.1 | 0.482011 |
| 2 | 0.000818 | -2.302585 | 0.1 | 0.411714 |
| 3 | 0.000786 | -2.302585 | 0.1 | 0.352587 |
| 4 | 0.00089 | 0.0 | 0.1 | 0.690677 |
| 5 | 0.000853 | 0.0 | 0.1 | 0.594937 |
| 6 | 0.000818 | 0.0 | 0.1 | 0.513077 |
| 7 | 0.000786 | 0.0 | 0.1 | 0.443056 |
| 8 | 0.00089 | 2.302585 | 0.1 | 0.825966 |
| 9 | 0.000853 | 2.302585 | 0.1 | 0.719375 |
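`pd.concat(..., axis=1)` pairs rows by index label, not by position. Both frames here come straight from `read_excel` with a default 0 to 59 index, so the pairing is correct. The toy example below (illustrative data) shows what happens when one frame's index has been shifted, for instance after filtering, and how `reset_index(drop=True)` restores positional pairing.

```python
import pandas as pd

inputs = pd.DataFrame({"T_inv": [0.00089, 0.000853, 0.000818]})
outputs = pd.DataFrame({"Stress_Normalized": [0.565722, 0.482011, 0.411714]},
                       index=[1, 2, 3])  # index shifted, e.g. after filtering

# Label-based alignment: union of indexes 0..3, NaN where a label is missing
misaligned = pd.concat([inputs, outputs], axis=1)
print(misaligned)

# Positional pairing: drop both indexes first
aligned = pd.concat([inputs.reset_index(drop=True),
                     outputs.reset_index(drop=True)], axis=1)
print(aligned)
```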


In [88]:
# STEP 5: REVERSE ENGINEER FOR UNDERSTANDING AND GETTING TEST CONDITIONS
# Calculate Temperature in Kelvin and Celsius, and Strain Rate
df['Temperature_K'] = 1 / df['T_inv']
df['Temperature_C'] = df['Temperature_K'] - 273.15
df['Strain_Rate'] = np.exp(df['ln_Strain_Rate'])

print(f"\nTest Conditions:")
print(f"Temperatures: {sorted(df['Temperature_C'].unique())} °C")
print(f"Strain Rates: {sorted(df['Strain_Rate'].unique())} s⁻¹")
print(f"Strains: {sorted(df['Strain'].unique())}")
df.head(10)


Test Conditions:
Temperatures: [np.float64(849.8499999999998), np.float64(849.85), np.float64(899.85), np.float64(949.85), np.float64(999.85)] °C
Strain Rates: [np.float64(0.09999999999999958), np.float64(0.10000000000000002), np.float64(1.0), np.float64(10.000000000000002)] s⁻¹
Strains: [np.float64(0.1), np.float64(0.2), np.float64(0.3), np.float64(0.4), np.float64(0.5)]


|   | T_inv | ln_Strain_Rate | Strain | Stress_Normalized | Temperature_K | Temperature_C | Strain_Rate |
|---|---|---|---|---|---|---|---|
| 0 | 0.00089 | -2.302585 | 0.1 | 0.565722 | 1123.0 | 849.85 | 0.1 |
| 1 | 0.000853 | -2.302585 | 0.1 | 0.482011 | 1173.0 | 899.85 | 0.1 |
| 2 | 0.000818 | -2.302585 | 0.1 | 0.411714 | 1223.0 | 949.85 | 0.1 |
| 3 | 0.000786 | -2.302585 | 0.1 | 0.352587 | 1273.0 | 999.85 | 0.1 |
| 4 | 0.00089 | 0.0 | 0.1 | 0.690677 | 1123.0 | 849.85 | 1.0 |
| 5 | 0.000853 | 0.0 | 0.1 | 0.594937 | 1173.0 | 899.85 | 1.0 |
| 6 | 0.000818 | 0.0 | 0.1 | 0.513077 | 1223.0 | 949.85 | 1.0 |
| 7 | 0.000786 | 0.0 | 0.1 | 0.443056 | 1273.0 | 999.85 | 1.0 |
| 8 | 0.00089 | 2.302585 | 0.1 | 0.825966 | 1123.0 | 849.85 | 10.0 |
| 9 | 0.000853 | 2.302585 | 0.1 | 0.719375 | 1173.0 | 899.85 | 10.0 |
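The temperature list printed above contains both `849.8499999999998` and `849.85`: converting `T_inv` back with `1 / T_inv` leaves floating-point noise, so one test temperature can show up as two distinct values in `unique()` or `groupby()`. A minimal sketch of the effect and one common remedy, rounding first; the Kelvin values are copied from the output above, and 3 decimals is an assumed tolerance.

```python
import pandas as pd

# Temperatures as they come out of 1 / T_inv (values taken from the output above)
temps_k = pd.Series([1122.9999999999998, 1123.0, 1173.0, 1223.0, 1273.0])

print(temps_k.nunique())           # 5: the two 1123 K entries differ in the last bit
print(temps_k.round(3).nunique())  # 4
print(sorted((temps_k.round(3) - 273.15).round(2).unique()))
```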


In [89]:
# STEP 6: DEFINE THE NEW TEMPERATURE RANGE (K) FOR INTERPOLATION AND EXTRAPOLATION
min_original = df['Temperature_K'].min()
max_original = df['Temperature_K'].max()
print(f"Original Temperature Range: {min_original} to {max_original}")

# Extrapolation buffer on each side: 10% of the original range (within the 15% limit)
buffer = 0.1 * (max_original - min_original)

new_min_temp = min_original - buffer
new_max_temp = max_original + buffer
print(f"New Temperature Range for Interpolation and Extrapolation: {new_min_temp} to {new_max_temp}")

# Create 300 evenly spaced temperatures (K) over the extended range
new_temps = np.linspace(new_min_temp, new_max_temp, 300)


Original Temperature Range: 1122.9999999999998 to 1273.0
New Temperature Range for Interpolation and Extrapolation: 1107.9999999999998 to 1288.0
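The range arithmetic above, worked through with the clean test temperatures (the printed 1107.9999999999998 comes from the floating-point noise in the minimum). The second half counts how many of the 300 generated temperatures fall outside the measured range, i.e. how many of the new rows are extrapolated rather than interpolated.

```python
import numpy as np

min_k, max_k = 1123.0, 1273.0       # original range (K)
fraction = 0.10                     # buffer used in the cell above
buffer = fraction * (max_k - min_k)  # 0.10 * 150 K = 15 K
new_min, new_max = min_k - buffer, max_k + buffer
print(buffer, new_min, new_max)     # 15.0 1108.0 1288.0

# How many of the 300 generated temperatures lie outside the measured range
new_temps = np.linspace(new_min, new_max, 300)
outside = (new_temps < min_k) | (new_temps > max_k)
print(outside.sum())                # 50
```

So roughly one in six of the new rows (25 at each end) is an extrapolated estimate.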


In [90]:
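# STEP 7: INTERPOLATE EACH VARIABLE AGAINST TEMPERATURE
# new_temps is in Kelvin, so the interpolation x-axis must also be in Kelvin
# (interpolating on Temperature_C and querying with Kelvin values would push
# every new point far outside the data range).
# Round before grouping: floating-point noise from 1 / T_inv (e.g. 1122.9999999999998
# vs 1123.0) would otherwise split one test temperature into two groups.
# Rows sharing a test temperature are then averaged.
data = (
    df.assign(Temperature_K=df['Temperature_K'].round(3))
      .groupby('Temperature_K', as_index=False)
      .mean(numeric_only=True)
)

# STRAIN
strain_interp_func = interp1d(
    data['Temperature_K'],
    data['Strain'],
    kind='linear',
    fill_value="extrapolate"
)
new_strain = strain_interp_func(new_temps)

# STRAIN RATE
strain_rate_interp_func = interp1d(
    data['Temperature_K'],
    data['Strain_Rate'],
    kind='linear',
    fill_value="extrapolate"
)

# STRESS NORMALIZED
stress_normalized_interp_func = interp1d(
    data['Temperature_K'],
    data['Stress_Normalized'],
    kind='linear',
    fill_value="extrapolate"
)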
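A toy run of `interp1d` with `fill_value="extrapolate"`, using the normalized stress at strain 0.1 and strain rate 0.1 s⁻¹ from the first rows of the merged data. The query points are in Kelvin, the same unit as the x-axis; points outside 1123 K to 1273 K are extended along the nearest end segment.

```python
import numpy as np
from scipy.interpolate import interp1d

# Stress_Normalized at strain 0.1 and strain rate 0.1 s^-1
temps_k = np.array([1123.0, 1173.0, 1223.0, 1273.0])
stress = np.array([0.565722, 0.482011, 0.411714, 0.352587])

f = interp1d(temps_k, stress, kind="linear", fill_value="extrapolate")

# Query points must be in the same unit (K) as temps_k:
# 1108 K and 1288 K are extrapolated, 1150 K is interpolated
query_k = np.array([1108.0, 1150.0, 1288.0])
estimates = f(query_k)
print(estimates)
```

Without `fill_value="extrapolate"`, `interp1d` raises a `ValueError` for the two out-of-range queries by default.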

In [91]:
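# STEP 8: CALCULATE DERIVED VARIABLES
new_temp_C = new_temps - 273.15  # new_temps is in Kelvin
new_T_inv = 1 / new_temps
# Take the log of the interpolated STRAIN RATE (not of the strain)
new_ln_strain_rate = np.log(strain_rate_interp_func(new_temps))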
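A quick sanity check of the three derived-variable conversions against values that appear in the merged table (T_inv, Temperature_C and ln_Strain_Rate). It is a standalone sketch on the test conditions, not on the interpolated arrays.

```python
import numpy as np

temps_k = np.array([1123.0, 1173.0, 1223.0, 1273.0])
strain_rate = np.array([0.1, 1.0, 10.0])

t_inv = 1.0 / temps_k          # K^-1, e.g. 1 / 1123 ≈ 0.00089
temps_c = temps_k - 273.15     # °C
ln_rate = np.log(strain_rate)  # natural log of the strain rate, not of the strain

print(np.round(t_inv, 6))
print(np.round(temps_c, 2))
print(np.round(ln_rate, 6))
```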

In [92]:
# STEP 9: COMBINE INTO A NEW DATAFRAME
new_data = pd.DataFrame({
    'Temperature_C': new_temp_C,
    'Temperature_K': new_temps,
    '1/Temperature_K': new_T_inv,
    'Strain': new_strain,
    'Strain_Rate': strain_rate_interp_func(new_temps),
    'Stress_Normalized': stress_normalized_interp_func(new_temps),
    'ln(Strain_Rate)': new_ln_strain_rate
})
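Because the new rows mix interpolated and extrapolated estimates, it can help to record which is which so the extrapolated rows can be inspected or excluded later. A minimal sketch on a stand-in frame; the column name `is_extrapolated` is hypothetical, and the bounds are the original 1123 K to 1273 K range.

```python
import numpy as np
import pandas as pd

min_k, max_k = 1123.0, 1273.0
frame = pd.DataFrame({"Temperature_K": np.linspace(1108.0, 1288.0, 300)})

# True where the temperature lies outside the measured range (bounds inclusive)
frame["is_extrapolated"] = ~frame["Temperature_K"].between(min_k, max_k)
print(frame["is_extrapolated"].value_counts())
```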

In [93]:
# STEP 10: SAVE THE NEW DATAFRAME TO AN EXCEL FILE
save_dir = os.path.join(BASE_DIR, 'data', 'interpolation')
os.makedirs(save_dir, exist_ok=True)  # to_excel does not create missing folders
output_path = os.path.join(save_dir, "interpolated_extrapolated_data.xlsx")
print(f"Interpolated and extrapolated data will be saved to {output_path}")
new_data.to_excel(output_path, index=False)
print(f"Interpolated and extrapolated data saved to {output_path}")

Interpolated and extrapolated data will be saved to /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/interpolation/interpolated_extrapolated_data.xlsx
Interpolated and extrapolated data saved to /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/interpolation/interpolated_extrapolated_data.xlsx
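A minimal sketch of saving into a folder that may not exist yet and reading the file back to confirm the write. It uses a temporary directory and CSV (swapped in for Excel only so the example runs without the `openpyxl` engine); the folder layout mirrors `data/interpolation`, and the values are illustrative.

```python
import os
import tempfile
import pandas as pd

frame = pd.DataFrame({"Temperature_K": [1108.0, 1198.0, 1288.0],
                      "Stress_Normalized": [0.59, 0.45, 0.33]})  # illustrative values

with tempfile.TemporaryDirectory() as base_dir:
    save_dir = os.path.join(base_dir, "data", "interpolation")
    os.makedirs(save_dir, exist_ok=True)  # creates parents; no error if it already exists
    out_path = os.path.join(save_dir, "interpolated_extrapolated_data.csv")
    frame.to_csv(out_path, index=False)

    # Read back before the temporary directory is removed
    reloaded = pd.read_csv(out_path)
    print(reloaded.shape)  # (3, 2)
```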
