## 📊 Exploratory Data Analysis (EDA) on Fuel Blend Properties Dataset

**Goal**: Develop accurate predictive models that can estimate 10 key properties of blended fuels, enabling faster development and optimization of sustainable fuel formulations.



#### Step 1: Load & Explore the dataset

In [1]:
import pandas as pd
from pathlib import Path

# Directory of the datasets
data_path = Path('../data')

# Load the raw dataset
raw_train_data = pd.read_csv(data_path / 'train.csv')
raw_test_data = pd.read_csv(data_path / 'test.csv')

# Shape of the datasets
print("Shape of train data: ", raw_train_data.shape)
print("Shape of test data: ", raw_test_data.shape)

Shape of train data:  (2000, 65)
Shape of test data:  (500, 56)
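
The two files have different column counts (65 vs 56), so it helps to see which columns are exclusive to each file before combining them. The following is a minimal sketch, not part of the original notebook: `column_diff` is a hypothetical helper, and the toy frames and their column names are made up for illustration.

```python
import pandas as pd

def column_diff(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Return the columns exclusive to each frame, keeping their original order."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    return {
        "only_left": [c for c in left.columns if c not in right_cols],
        "only_right": [c for c in right.columns if c not in left_cols],
    }

# Toy frames standing in for the real train/test data (column names are invented)
toy_train = pd.DataFrame({"feature_a": [1.0], "feature_b": [2.0], "target": [0.5]})
toy_test = pd.DataFrame({"ID": [1], "feature_a": [1.0], "feature_b": [2.0]})

diff = column_diff(toy_train, toy_test)
print(diff)
```

On the real data, the same call would be `column_diff(raw_train_data, raw_test_data)`.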


In [None]:
# Create a new column to differentiate between the train and test datasets
raw_train_data['dataset'] = 'train'
raw_test_data['dataset'] = 'test'

# Concatenate both datasets row-wise so we can perform EDA on both at once.
# ignore_index=True gives the combined frame a unique 0..n-1 index.
df = pd.concat([raw_train_data, raw_test_data], axis=0, ignore_index=True)
df.head(3)
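
Row-wise concatenation takes the union of columns, so any column present in only one file is filled with `NaN` for the rows of the other. The sketch below counts missing values per split so that these structural gaps are not mistaken for genuinely missing data. It uses toy frames with invented column names, not the actual dataset.

```python
import pandas as pd

# Toy frames standing in for the real train/test data (column names are invented)
toy_train = pd.DataFrame({"feature_a": [1.0, 2.0], "target": [0.5, 0.7]})
toy_test = pd.DataFrame({"ID": [1], "feature_a": [3.0]})
toy_train["dataset"] = "train"
toy_test["dataset"] = "test"

combined = pd.concat([toy_train, toy_test], axis=0, ignore_index=True)

# Count NaN values per column within each split
missing_by_split = (
    combined.drop(columns="dataset")
    .isna()
    .groupby(combined["dataset"])
    .sum()
)
print(missing_by_split)
```

A column whose missing count equals the number of rows in a split is absent from that file altogether.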

In [None]:
# Information about the dataset
df.info()

In [None]:
# Statistical info
df.describe()

In [None]:
# Separate the data
train_data = raw_train_data[raw_train_data['dataset'] == "train"]
test_data = raw_test_data[raw_test_data['dataset'] == "test"]

# Remove the helper 'dataset' column.
# Reassigning instead of using inplace=True avoids a SettingWithCopyWarning on the filtered frames.
train_data = train_data.drop(columns=['dataset'])
test_data = test_data.drop(columns=['dataset'])

# Shape of the datasets
print("Shape of train data: ", train_data.shape)
print("Shape of test data: ", test_data.shape)