#### DATA INTERPOLATION AND EXTRAPOLATION

- The original dataset contains 60 data points.
- This looks fairly sufficient, but a denser dataset may improve the model's performance, so we add data points through interpolation and, where justified, extrapolation.
- Interpolation generates additional points between existing ones; extrapolation predicts values outside the original data range. The formula below is the linear case, which is the simplest; the code in this notebook fits a cubic spline instead and only generates points within the measured strain range.
- The result is a denser dataset for training our model.
- Linear interpolation: given two points (x0, y0) and (x1, y1), we can estimate the value of y at any point x between x0 and x1 using the formula:

  y = y0 + (y1 - y0) * ((x - x0) / (x1 - x0))

- Extrapolation is similar, but it is used to estimate values outside the range of known data points.
- However, extrapolation can lead to significant errors if the underlying trend does not continue beyond the known data range, so we should not extrapolate more than 15% beyond the original data range.
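
The linear formula and the 15% guideline above can be sketched in a few lines. The helper names `lerp` and `extrapolation_bounds` are illustrative and not part of this notebook, and reading "15%" as 15% of the span between the smallest and largest known x is an assumption. The two stress values are the measurements at strain 0.1 and 0.5 for T_inv = 0.000786 and Strain_Rate = 0.1 from the dataset.

```python
import numpy as np

def lerp(x, x0, y0, x1, y1):
    """Linear interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * ((x - x0) / (x1 - x0))

def extrapolation_bounds(x_min, x_max, margin=0.15):
    """Widest x-range allowed when extrapolating by `margin` of the data span (assumed reading of the 15% rule)."""
    span = x_max - x_min
    return x_min - margin * span, x_max + margin * span

# Estimate the stress halfway between strain 0.1 and strain 0.5
y_mid = lerp(0.3, 0.1, 0.352587, 0.5, 0.455814)

# np.interp applies the same linear formula
assert np.isclose(y_mid, np.interp(0.3, [0.1, 0.5], [0.352587, 0.455814]))

# Allowed strain range if we extrapolate at most 15% of the span 0.1..0.5
low, high = extrapolation_bounds(0.1, 0.5)
print(y_mid, low, high)
```

A straight line between the two end points ignores the three measurements in between, which is one reason to prefer a spline through all five points.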

In [6]:
# STEP 1: IMPORT NECESSARY LIBRARIES
import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline, interp1d

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')



# There are many ways to interpolate the data: linear, polynomial, spline interpolation, etc.
# Here we use cubic spline interpolation because it gives a smooth curve that passes through all the data points.
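
To make the choice between linear and cubic spline interpolation concrete, here is a minimal sketch on synthetic values (the numbers are illustrative, not the experimental data). Both methods pass through the known points; they differ only in between.

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Synthetic, illustrative values (not the experimental data)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.35, 0.41, 0.44, 0.45, 0.455])

linear = interp1d(x, y, kind="linear")  # straight segments between points
cubic = CubicSpline(x, y)               # smooth piecewise cubic through all points

# Both interpolants reproduce the known points
assert np.allclose(linear(x), y)
assert np.allclose(cubic(x), y)

# Between known points the two estimates generally differ
x_mid = 0.25
print(float(linear(x_mid)), float(cubic(x_mid)))
```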

In [8]:
# STEP 2: LOAD INPUT AND OUTPUT FILES SEPARATELY
BASE_DIR = os.path.dirname(os.path.dirname(os.getcwd()))  # Project root: two levels above the current working directory
data_dir = os.path.join(BASE_DIR, 'data', 'original_data')

# Define file names
input_file_name = 'Independent_variables.xlsx'
output_file_name = 'Response data.xlsx'

# Create SEPARATE file paths for each file
input_file_path = os.path.join(data_dir, input_file_name)
output_file_path = os.path.join(data_dir, output_file_name)

print("Using data directory:", data_dir)
print("Input file path:", input_file_path)
print("Output file path:", output_file_path)

# LOAD BOTH FILES
print("\n" + "=" * 80)
print("LOADING DATA FILES")
print("=" * 80)

# Load input file (4 columns: T⁻¹, ln_SR, Strain, ε')
inputs_df = pd.read_excel(input_file_path)
print(f"\n✓ Input file loaded: {inputs_df.shape[0]} rows × {inputs_df.shape[1]} columns")
print(f"  Columns: {inputs_df.columns.tolist()}")

# Load output file (1 column: σ/σmax)
outputs_df = pd.read_excel(output_file_path)
print(f"\n✓ Output file loaded: {outputs_df.shape[0]} rows × {outputs_df.shape[1]} columns")
print(f"  Columns: {outputs_df.columns.tolist()}")

Using data directory: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data
Input file path: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data/Independent_variables.xlsx
Output file path: /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/original_data/Response data.xlsx

LOADING DATA FILES

✓ Input file loaded: 60 rows × 4 columns
  Columns: ['T⁻¹ (K⁻¹)', 'ln strain rate', 'Strain', 'ε͘΄']

✓ Output file loaded: 60 rows × 1 columns
  Columns: ['Output (σ/σmax)']


In [9]:
# STEP 3: VERIFY DATA INTEGRITY
print("\n" + "=" * 80)
print("DATA INTEGRITY CHECK")
print("=" * 80)

if len(inputs_df) != len(outputs_df):
    print("   WARNING: Row count mismatch!")
    print(f"   Input file: {len(inputs_df)} rows")
    print(f"   Output file: {len(outputs_df)} rows")
    raise ValueError("Input and output files must have the same number of rows!")
else:
    print(f"✓ Row count matches: {len(inputs_df)} rows in both files")


DATA INTEGRITY CHECK
✓ Row count matches: 60 rows in both files
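
A row-count match is a necessary check, but it does not catch missing values or misaligned indices, both of which would silently produce NaN after a column-wise merge. A slightly broader check could look like the sketch below; `check_alignment` is a hypothetical helper, not part of this notebook, and the frames are toy data.

```python
import pandas as pd

def check_alignment(inputs, outputs):
    """Return a list of problems; an empty list means the frames can be joined column-wise."""
    problems = []
    if len(inputs) != len(outputs):
        problems.append("row count mismatch")
    elif not inputs.index.equals(outputs.index):
        problems.append("index mismatch")
    if inputs.isna().any().any() or outputs.isna().any().any():
        problems.append("missing values")
    return problems

# Toy frames for illustration only
a = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
b = pd.DataFrame({"y": [0.1, None, 0.3]})

print(check_alignment(a, b))           # same length, but b contains a missing value
print(check_alignment(a, b.dropna()))  # dropping the row changes the row count
```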


In [10]:
# STEP 4: MERGING DATA
print("\n" + "=" * 80)
print("MERGING DATA")
print("=" * 80)

# Rename input columns for clarity
inputs_df.columns = ['T_inv', 'ln_Strain_Rate', 'Strain', 'Strain_Rate']
print("\n Renamed Columns:")
print("  1. T_inv (Inverse Temperature)")
print("  2. ln_Strain_Rate (Log Strain Rate)")
print("  3. Strain (True Strain)")
print("  4. Strain_Rate (Strain Rate)")

# Rename output column
outputs_df.columns = ['Stress_Normalized']
print("\n Renamed Output Column:")
print("  1. Stress_Normalized (σ/σmax)")
# Merge the experimental inputs and outputs column-wise
data = pd.concat([inputs_df, outputs_df], axis=1)

# First 10 rows of the merged data
df = data
df.head(10)



MERGING DATA

 Renamed Columns:
  1. T_inv (Inverse Temperature)
  2. ln_Strain_Rate (Log Strain Rate)
  3. Strain (True Strain)
  4. Strain_Rate (Strain Rate)

 Renamed Output Column:
  1. Stress_Normalized (σ/σmax)


      T_inv  ln_Strain_Rate  Strain  Strain_Rate  Stress_Normalized
0   0.00089       -2.302585     0.1          0.1           0.565722
1  0.000853       -2.302585     0.1          0.1           0.482011
2  0.000818       -2.302585     0.1          0.1           0.411714
3  0.000786       -2.302585     0.1          0.1           0.352587
4   0.00089             0.0     0.1          0.5           0.690677
5  0.000853             0.0     0.1          0.5           0.594937
6  0.000818             0.0     0.1          0.5           0.513077
7  0.000786             0.0     0.1          0.5           0.443056
8   0.00089        2.302585     0.1          0.9           0.825966
9  0.000853        2.302585     0.1          0.9           0.719375


In [11]:
# GET THE LAST 10 ROWS
print("\nLast 10 rows of the merged data:")
df.tail(10)


Last 10 rows of the merged data:


       T_inv  ln_Strain_Rate  Strain  Strain_Rate  Stress_Normalized
50  0.000818       -2.302585     0.5          0.1           0.524995
51  0.000786       -2.302585     0.5          0.1           0.455814
52   0.00089             0.0     0.5          0.5           0.839206
53  0.000853             0.0     0.5          0.5            0.73054
54  0.000818             0.0     0.5          0.5           0.637045
55  0.000786             0.0     0.5          0.5           0.556418
56   0.00089        2.302585     0.5          0.9           0.987976
57  0.000853        2.302585     0.5          0.9           0.867193
58  0.000818        2.302585     0.5          0.9           0.761909
59  0.000786        2.302585     0.5          0.9           0.669971


In [15]:
# Count the data points in each pile (one pile per T_inv / ln_Strain_Rate combination)
pile_counts = df.groupby(['T_inv', 'ln_Strain_Rate']).size()
print(pile_counts)

# Flag piles with fewer than 2 points, which is too few to interpolate
bad_piles = pile_counts[pile_counts < 2]
print("\n🚨 Problem piles (less than 2 points):")
print(bad_piles)

T_inv     ln_Strain_Rate
0.000786  -2.302585         5
           0.000000         5
           2.302585         5
0.000818  -2.302585         5
           0.000000         5
           2.302585         5
0.000853  -2.302585         5
           0.000000         5
           2.302585         5
0.000890  -2.302585         4
           0.000000         5
           2.302585         5
          -2.302585         1
dtype: int64

🚨 Problem piles (less than 2 points):
T_inv    ln_Strain_Rate
0.00089  -2.302585         1
dtype: int64
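
The output above shows 13 groups instead of the expected 12: one condition is split into a group of 4 and a group of 1, although both keys print identically. One plausible cause, which this output alone cannot confirm, is floating-point noise in one of the key columns. The sketch below reproduces the effect on toy data and shows how rounding the keys before grouping merges the two groups again. The choice of 6 decimals is an assumption.

```python
import numpy as np
import pandas as pd

# Toy data: the last key differs from ln(0.1) only in the trailing digits
toy = pd.DataFrame({
    "T_inv": [0.00089] * 5,
    "ln_Strain_Rate": [np.log(0.1)] * 4 + [-2.302585],
})

# Grouping on the raw floats splits the condition in two
raw_counts = toy.groupby(["T_inv", "ln_Strain_Rate"]).size()
print(len(raw_counts))

# Rounding the keys to a meaningful precision merges them again
rounded = toy.round({"T_inv": 6, "ln_Strain_Rate": 6})
clean_counts = rounded.groupby(["T_inv", "ln_Strain_Rate"]).size()
print(len(clean_counts))
```

If the real data behaves the same way, rounding every grouping column (not only `T_inv`) would keep the isolated point in its group instead of dropping it.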


In [18]:
# STEP 5: INTERPOLATE DATA
def interpolate_data(df, points_per_condition=25):
    """
    Fast cubic spline interpolation for ANN training data

    Parameters:
    -----------
    df : DataFrame with columns ['T_inv', 'ln_Strain_Rate', 'Strain', 'Strain_Rate', 'Stress_Normalized']
    points_per_condition : int, number of interpolated points per temperature-strain_rate combo

    Returns:
    --------
    df_interpolated : Expanded DataFrame ready for ANN
    """

    # Group by temperature and strain rate (each unique condition)
    # Use ONLY original experimental values, not calculated columns
    df_copy = df.copy()
    # Round T_inv so that floating-point noise does not split one temperature into several groups
    df_copy['T_inv'] = df_copy['T_inv'].round(6)

    grouped = df_copy.groupby(['T_inv', 'Strain_Rate'])

    interpolated_data = []

    for (t_inv, sr), group in grouped:
        # Sort by strain
        group = group.sort_values('Strain')

        # Original data
        strain_orig = group['Strain'].values
        stress_norm_orig = group['Stress_Normalized'].values

        # Get ln_strain_rate from the group (should be same for all rows)
        ln_sr = group['ln_Strain_Rate'].iloc[0]

        # Check if enough points for interpolation
        if len(strain_orig) < 2:
            print(f"Skipping T_inv={t_inv}, Strain_Rate={sr}: only {len(strain_orig)} point(s)")
            continue

        # Create cubic spline
        cs = CubicSpline(strain_orig, stress_norm_orig)

        # New strain points (more dense)
        strain_new = np.linspace(strain_orig.min(), strain_orig.max(), points_per_condition)

        # Interpolate stress
        stress_norm_new = cs(strain_new)

        # Create new rows
        for strain, stress_norm in zip(strain_new, stress_norm_new):
            interpolated_data.append({
                'T_inv': t_inv,
                'ln_Strain_Rate': ln_sr,
                'Strain_Rate': sr,
                'Strain': strain,
                'Stress_Normalized': stress_norm
            })

    # Create new DataFrame
    df_interpolated = pd.DataFrame(interpolated_data)

    print(f"Original data points: {len(df)}")
    print(f"Interpolated data points: {len(df_interpolated)}")
    print(f"Expansion factor: {len(df_interpolated)/len(df):.1f}x")

    return df_interpolated


# USAGE - RUN THIS:
df_expanded = interpolate_data(df, points_per_condition=25)

⚠️ Skipping T_inv=0.00089, Strain_Rate=0.1: only 1 point(s)
Original data points: 60
Interpolated data points: 300
Expansion factor: 5.0x
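
Note that `interpolate_data` only samples between the smallest and largest measured strain, so it does not extrapolate. If extrapolation within the 15% guideline is wanted, one possible sketch is shown below. It uses synthetic values rather than the experimental data, and it assumes "15%" means 15% of the measured strain span. `CubicSpline` extrapolates by default using the first and last polynomial pieces, so the values outside the measured range should be treated with caution.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic, illustrative values (not the experimental data)
strain = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
stress = np.array([0.35, 0.41, 0.44, 0.45, 0.455])

cs = CubicSpline(strain, stress)  # extrapolation is enabled by default

# Extend the sampling range by 15% of the measured span on each side
margin = 0.15 * (strain.max() - strain.min())
strain_ext = np.linspace(strain.min() - margin, strain.max() + margin, 33)
stress_ext = cs(strain_ext)

# Flag extrapolated points so they can be inspected or excluded later
is_extrapolated = (strain_ext < strain.min()) | (strain_ext > strain.max())
print(strain_ext[0], strain_ext[-1], int(is_extrapolated.sum()))
```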


In [19]:
# STEP 6: PREPARE THE NEW DATAFRAME
new_data = df_expanded[['T_inv', 'ln_Strain_Rate', 'Strain', 'Strain_Rate', 'Stress_Normalized']]
new_data.head(10)

      T_inv  ln_Strain_Rate    Strain  Strain_Rate  Stress_Normalized
0  0.000786       -2.302585       0.1          0.1           0.352587
1  0.000786       -2.302585  0.116667          0.1           0.357643
2  0.000786       -2.302585  0.133333          0.1           0.364981
3  0.000786       -2.302585      0.15          0.1           0.374146
4  0.000786       -2.302585  0.166667          0.1           0.384679
5  0.000786       -2.302585  0.183333          0.1           0.396123
6  0.000786       -2.302585       0.2          0.1           0.408021
7  0.000786       -2.302585  0.216667          0.1           0.419916
8  0.000786       -2.302585  0.233333          0.1            0.43135
9  0.000786       -2.302585      0.25          0.1           0.441866


In [21]:
new_data.tail(10)

        T_inv  ln_Strain_Rate    Strain  Strain_Rate  Stress_Normalized
290   0.00089        2.302585      0.35          0.9           0.999509
291   0.00089        2.302585  0.366667          0.9           0.998231
292   0.00089        2.302585  0.383333          0.9           0.996628
293   0.00089        2.302585       0.4          0.9           0.994839
294   0.00089        2.302585  0.416667          0.9           0.993002
295   0.00089        2.302585  0.433333          0.9           0.991257
296   0.00089        2.302585      0.45          0.9           0.989743
297   0.00089        2.302585  0.466667          0.9           0.988599
298   0.00089        2.302585  0.483333          0.9           0.987964
299   0.00089        2.302585       0.5          0.9           0.987976


In [24]:
# STEP 7: SAVE THE NEW DATAFRAME TO AN EXCEL FILE
save_dir = os.path.abspath('../../data/interpolation')
os.makedirs(save_dir, exist_ok=True)  # create the folder if it does not exist yet
output_path = os.path.join(save_dir, "interpolated_data.xlsx")
print(f"Interpolated data will be saved to {output_path}")
new_data.to_excel(output_path, index=False)
print("Data saved successfully.")


Interpolated data will be saved to /home/darlenewendie/PycharmProjects/ANN-Hot-workability-behaviour-of-AISI304-stainless-steel/data/interpolation/interpolated_data.xlsx
Data saved successfully.
