## CRUDE OIL CHARACTERIZATION & PRODUCT PROPERTY ESTIMATION

## I. Introduction

##### 1) What is Crude Oil? 
- Crude oil is a highly complex hydrocarbon mixture containing over 10,000 distinct components. Its physicochemical properties (such as viscosity and chemical composition) vary significantly between different oil fields and even between batches from the same field. 

##### 2) Crude Oil Characterization for the CDU
- In a petroleum refinery, the Crude Distillation Unit (CDU) separates this raw feedstock into valuable products like naphtha, kerosene, and diesel. To optimize the temperature and pressure profiles of the CDU, refinery operators require precise, real-time knowledge of the incoming crude oil’s properties.

##### 3) Limitations of Traditional Analysis
- Laboratory Assays (Time Lag): Standardized tests (ASTM methods) for properties such as kinematic viscosity or PNA (Paraffin, Naphthene, Aromatic) composition are time-consuming, often taking 4 to 8 hours. This delay creates significant dead time between sampling and decision-making: by the time results arrive, the crude has already been processed, which can lead to off-specification production and wasted energy. These tests also require large sample volumes and hazardous chemical reagents.

- Mathematical Correlations (Inaccuracy): To save time, refineries often estimate crude properties with mathematical correlations, but these frequently fail to capture the highly non-linear relationships between crude oil components.
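
As an illustration of the kind of correlation meant here (the source does not name the ones it has in mind), the Watson characterization factor combines a mean boiling point and specific gravity into a rough paraffinic-versus-aromatic indicator. The sketch below is only an example under stated assumptions: the 400 °C boiling point is hypothetical, specific gravity is approximated as density divided by 999.0 kg/m³, and the function names are illustrative.

```python
def specific_gravity(density_kg_m3, water_density_kg_m3=999.0):
    """Approximate specific gravity (assumes water at about 999.0 kg/m3)."""
    return density_kg_m3 / water_density_kg_m3


def api_gravity(sg):
    """API gravity from specific gravity."""
    return 141.5 / sg - 131.5


def watson_k(mean_boiling_point_c, sg):
    """Watson factor: cube root of the boiling point in degrees Rankine over SG."""
    t_rankine = (mean_boiling_point_c + 273.15) * 1.8
    return t_rankine ** (1.0 / 3.0) / sg


sg = specific_gravity(905.234491)  # density of the Basrah Heavy row in the dataset
api = api_gravity(sg)
k = watson_k(400.0, sg)            # 400 C is a hypothetical mean boiling point
print(f"SG={sg:.4f}  API={api:.2f}  Watson K={k:.2f}")
```

Values of K around 12.5 or higher are usually read as paraffinic and values near 10 as aromatic. Compressing a mixture of thousands of components into one number like this is part of why such correlations lose accuracy.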

##### 4) Problem Statement
- This project investigates how critical crude oil qualities (Chemical Composition and Viscosity) are determined by readily available physical attributes such as Density, Sulfur content, and Distillation Profile.

## II. Dataset

The dataset consists of five crude oil blends: Basrah Heavy, Basrah Medium, Gippsland, Grogon, and Kutubu. Each crude assay provides a detailed characterization of the crude’s physical, chemical, and distillation-based properties.

#### 1) Independent Variables
- Bulk Physical & Chemical Properties: StdLiquidDensity (kg/m³), SulfurByWt (%), ConradsonCarbonByWt (%), NitrogenByWt (%)
- Distillation Curve: The TBP (True Boiling Point) curve gives the temperature (°C) at which a given cumulative mass percentage of the crude has boiled off. It is sampled at 1%, 5%, 10%, 30%, 50%, 70%, 90%, 95%, and 99%.
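
To make the shape of the distillation input concrete, the sketch below stores a TBP curve as two arrays and interpolates between the nine sampled points. The temperatures are hypothetical placeholders, not values from the assays, and the helper names are illustrative.

```python
import numpy as np

# Cumulative mass percentages at which the TBP curve is sampled
cut_pct = np.array([1, 5, 10, 30, 50, 70, 90, 95, 99], dtype=float)

# Hypothetical temperatures in C (must increase monotonically for np.interp)
tbp_c = np.array([20, 70, 110, 230, 340, 450, 600, 660, 720], dtype=float)


def temperature_at(pct):
    """Temperature (C) at which `pct` mass percent has distilled, by linear interpolation."""
    return float(np.interp(pct, cut_pct, tbp_c))


def percent_distilled_below(temp_c):
    """Mass percent distilled below `temp_c`, by linear interpolation."""
    return float(np.interp(temp_c, tbp_c, cut_pct))


print(temperature_at(20.0))            # halfway between the 10% and 30% points
print(percent_distilled_below(340.0))  # the 50% point of this hypothetical curve
```

Linear interpolation is the simplest choice; it is adequate for a sketch but flattens any curvature between the sampled points.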

#### 2) Target Variables
2.1 Set 1: Hydrocarbon Composition (by Weight)
- AromByWt (%)
- NaphthenesByWt (%)
- ParaffinsByWt (%)

2.2 Set 2: Kinematic Viscosity (Primary Targets)
- KinematicViscosity (cSt) @ 37.78°C (100°F)
- KinematicViscosity (cSt) @ 98.89°C (210°F)

2.3 Set 3: Additional Viscosity Targets (If Available)
- KinematicViscosity (cSt) @ 20°C
- KinematicViscosity (cSt) @ 40°C
- KinematicViscosity (cSt) @ 50°C
- KinematicViscosity (cSt) @ 100°C
- KinematicViscosity (cSt) @ 150°C
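
The viscosity targets in Sets 2 and 3 describe the same property at different temperatures, so they are strongly related. One commonly used way to connect them (not something the source states it uses) is the Walther-type relation behind ASTM D341, $\log_{10}\log_{10}(\nu + 0.7) = A - B\,\log_{10} T$, with $T$ in kelvin and $\nu$ in cSt, applicable for viscosities above roughly 2 cSt. The sketch below fits $A$ and $B$ from two points and estimates a third; the viscosity values are hypothetical and the function names are illustrative.

```python
import math


def _walther(nu_cst):
    """Double-log transform of kinematic viscosity (cSt)."""
    return math.log10(math.log10(nu_cst + 0.7))


def fit_walther(t1_c, nu1_cst, t2_c, nu2_cst):
    """Solve z = A - B * log10(T[K]) for A and B from two measurements."""
    x1 = math.log10(t1_c + 273.15)
    x2 = math.log10(t2_c + 273.15)
    z1, z2 = _walther(nu1_cst), _walther(nu2_cst)
    b = (z1 - z2) / (x2 - x1)
    a = z1 + b * x1
    return a, b


def predict_viscosity(a, b, t_c):
    """Invert the double-log transform to get viscosity (cSt) at t_c."""
    z = a - b * math.log10(t_c + 273.15)
    return 10 ** (10 ** z) - 0.7


# Hypothetical viscosities at the two primary target temperatures
a, b = fit_walther(37.78, 20.0, 98.89, 4.0)
nu_50 = predict_viscosity(a, b, 50.0)
print(f"A={a:.3f}  B={b:.3f}  estimated viscosity at 50 C: {nu_50:.2f} cSt")
```

If the Set 3 values are missing for a crude, an estimate of this kind could serve as a sanity check, though it is no substitute for measured data.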

## III. Data Extraction
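
The extraction code in this section reads the `Details` sheet of each workbook, which it expects to contain a `Property` column and a `Bulk Value` column, and looks up each wanted field by its property label. The core lookup can be sketched on an in-memory frame as follows. The rows and values here are hypothetical stand-ins for a real sheet, and `reindex` is shown as one possible alternative to a per-field `try`/`except KeyError`, not as the script's own method.

```python
import pandas as pd

# Hypothetical stand-in for one workbook's 'Details' sheet
details = pd.DataFrame({
    "Property": ["StdLiquidDensity (kg/m3)", " SulfurByWt (%) ", "NitrogenByWt (%)"],
    "Bulk Value": [900.0, 2.5, 0.1],
})

wanted = [
    "StdLiquidDensity (kg/m3)",
    "SulfurByWt (%)",
    "ConradsonCarbonByWt (%)",  # deliberately absent from the sheet
]

# Strip stray whitespace from the labels, then index the values by label
bulk = (
    details.assign(Property=details["Property"].str.strip())
    .set_index("Property")["Bulk Value"]
)

# reindex returns NaN for labels that are missing instead of raising KeyError
row = bulk.reindex(wanted)
print(row)
```

Note that `reindex` requires unique property labels; a sheet with duplicated labels would need to be de-duplicated first.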

import pandas as pd
import os

def extract_from_file_list():
    #Excel Files Paths
    file_paths = [
        r"Basrah-Heavy-2021.xlsx",
        r"Basrah-Medium-2021.xlsx",
        r"Gippsland-2021.xlsx",
        r"Grogon-2021.xlsx",
        r"Kutubu-2021.xlsx"
    ]

    # All the fields to extract
    fields_to_extract = [
        #Input Variables
        "StdLiquidDensity (kg/m3)",
        "SulfurByWt (%)",
        "ConradsonCarbonByWt (%)",
        "NitrogenByWt (%)",
        "Distillation Mass @ X Pct (C)@ 1 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 5 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 10 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 30 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 50 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 70 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 90 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 95 (%) - TBP",
        "Distillation Mass @ X Pct (C)@ 99 (%) - TBP",

        #Output Set1
        "AromByWt (%)",
        "NaphthenesByWt (%)",
        "ParaffinsByWt (%)",

        #Output Set2
        "KinematicViscosity (cSt)@ 37.78 (C)",
        "KinematicViscosity (cSt)@ 98.89 (C)",

        #Output Set3
        "KinematicViscosity (cSt)@ 20 (C)",
        "KinematicViscosity (cSt)@ 40 (C)",
        "KinematicViscosity (cSt)@ 50 (C)",
        "KinematicViscosity (cSt)@ 100 (C)",
        "KinematicViscosity (cSt)@ 150 (C)"
    ]

    data_list = []
    print(f"Processing {len(file_paths)} specific files...")

    #Iterate through each file
    for filepath in file_paths:
        try:
            #Check if file exists
            if not os.path.exists(filepath):
                print(f"Error: File not found: {filepath}")
                continue

            #Details sheet
            try:
                df = pd.read_excel(filepath, sheet_name='Details', engine='openpyxl')
            except ValueError:
                print(f"Skipping {filepath}: Sheet 'Details' not found.")
                continue
            
            #Clean Column Names
            df.columns = [str(c).strip() for c in df.columns]

            #Check if required columns exist
            if 'Property' not in df.columns or 'Bulk Value' not in df.columns:
                print(f"Skipping {filepath}: Missing 'Property' or 'Bulk Value' columns.")
                continue

            #Identify crude
            crude_name = os.path.splitext(os.path.basename(filepath))[0]
            row_data = {"Crude Name": crude_name}

            #Index by Property column
            df_indexed = df.set_index('Property')

            #Extract fields
            for field in fields_to_extract:
                try:
                    row_data[field] = df_indexed.loc[field, 'Bulk Value']
                except KeyError:
                    row_data[field] = None

            data_list.append(row_data)

        except Exception as e:
            print(f"Error reading {filepath}: {e}")

    #Save extracted data to a single CSV file
    if data_list:
        final_df = pd.DataFrame(data_list)
        final_cols = ['Crude Name'] + fields_to_extract
        final_df = final_df[final_cols]
        
        final_df.to_csv("Extracted_CrudeData.csv", index=False)
        print("-" * 30)
        print("Done. Saved to 'Extracted_CrudeData.csv'")
        print(final_df.head())
    else:
        print("No valid data extracted.")

if __name__ == "__main__":
    extract_from_file_list()

Processing 5 specific files...
------------------------------
Done. Saved to 'Extracted_CrudeData.csv'
       Crude Name      StdLiquidDensity (kg/m3)  SulfurByWt (%)  ConradsonCarbonByWt (%)  NitrogenByWt (%)  Distillation Mass @ X Pct (C)@ 1 (%) - TBP  Distillation Mass @ X Pct (C)@ 5 (%) - TBP  Distillation Mass @ X Pct (C)@ 10 (%) - TBP  Distillation Mass @ X Pct (C)@ 30 (%) - TBP  Distillation Mass @ X Pct (C)@ 50 (%) - TBP  Distillation Mass @ X Pct (C)@ 70 (%) - TBP  Distillation Mass @ X Pct (C)@ 90 (%) - TBP  Distillation Mass @ X Pct (C)@ 95 (%) - TBP  Distillation Mass @ X Pct (C)@ 99 (%) - TBP  AromByWt (%)  NaphthenesByWt (%)  ParaffinsByWt (%)  KinematicViscosity (cSt)@ 37.78 (C)  KinematicViscosity (cSt)@ 98.89 (C)  KinematicViscosity (cSt)@ 20 (C)  KinematicViscosity (cSt)@ 40 (C)  KinematicViscosity (cSt)@ 50 (C)  KinematicViscosity (cSt)@ 100 (C)  KinematicViscosity (cSt)@ 150 (C)
0   Basrah-Heavy-2021         905.234491            3.832519            10.015924       