# Hi, I am Ali - Your Guide to Oil Exploration Probability Using Machine Learning


![](https://i.pinimg.com/736x/4a/8b/ba/4a8bba1b12c11bec354cfaefbac3581f.jpg)

# Introduction:
"Hello! I’m Ali. Today, I’ll walk you through how we can use machine learning to predict the probability of oil exploration success based on seismic data. This model helps ADNOC make informed decisions about which exploration sites are more likely to contain hydrocarbons, saving time and resources."

![](https://i.pinimg.com/736x/18/58/f7/1858f76702337b0ce9f465ccf4986ae3.jpg)

# 1. Loading the Seismic Data:
"Let’s begin by loading the seismic data, which contains various features collected from different locations. Each feature represents a unique geological property, which plays a key role in determining the likelihood of hydrocarbon presence."

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd

# Load the seismic dataset
df = pd.read_csv('seismic_with_location_and_features.csv')

# Display the first few rows to understand the data
df.head()
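
Before going further, it can help to check the loaded data for gaps. The sketch below is one possible way to do that, not part of the original notebook: `summarize_quality` is a hypothetical helper, and the `demo` frame is a made-up stand-in for the real CSV so the example runs on its own.

```python
import numpy as np
import pandas as pd

def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype, missing-value count, and infinite-value count per column."""
    numeric = frame.select_dtypes(include="number")
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        # Non-numeric columns cannot hold infinities, so they are filled with 0
        "n_infinite": np.isinf(numeric).sum().reindex(frame.columns, fill_value=0),
    })

# Hypothetical stand-in for the seismic CSV (values are invented for illustration)
demo = pd.DataFrame({
    "Amplitude": [0.8, np.nan, 1.2],
    "Wave_Velocity": [2500.0, 3100.0, np.inf],
    "Fault_Presence": [1, 0, 1],
})

report = summarize_quality(demo)
print(report)
```

On the real data, the same call would be `summarize_quality(df)`. Any column with missing or infinite values would need cleaning before model training.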



# 2. Understanding the Original Features:
"The dataset includes the following original features, each of which contributes to predicting the likelihood of finding hydrocarbons:

*   Amplitude: This represents the strength of the seismic waves. Higher amplitudes often indicate denser or more reflective geological layers, which can suggest the presence of hydrocarbons.
*   Wave_Velocity: The speed at which seismic waves travel through different geological formations. Slower wave velocities can indicate softer, hydrocarbon-bearing formations.
*   Travel_Time: The time taken for seismic waves to travel from their source to a receiver. Longer travel times could suggest deeper formations.
*   Reflection_Strength: The strength of the reflection of seismic waves off various geological boundaries. Stronger reflections might indicate significant changes in the subsurface, possibly caused by hydrocarbon reservoirs.
*   Fault_Presence: Whether or not there is a fault line. Faults can trap hydrocarbons, making them important indicators.
*   Layer_Thickness: The thickness of the geological layers. Thicker layers can trap more hydrocarbons.
*   Seismic_Impedance: The product of rock density and seismic wave velocity, which governs how strongly a layer boundary reflects seismic energy. Lower impedance often indicates porous, hydrocarbon-bearing formations."
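
For readers who want the physics behind the impedance and reflection features, the standard textbook relations are sketched below. Here $\rho$ is rock density, $v$ is wave velocity, and $Z_1$, $Z_2$ are the impedances of the layers above and below a boundary. These symbols are not columns in the dataset, and whether the `Seismic_Impedance` and `Reflection_Strength` columns were computed this way is an assumption.

$$
Z = \rho \, v, \qquad R = \frac{Z_2 - Z_1}{Z_2 + Z_1}
$$

The reflection coefficient $R$ (at normal incidence) grows in magnitude as the impedance contrast between two layers grows, which is why strong reflections point to significant changes in the subsurface.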

In [2]:
# Display the column names to get a sense of the original features
df.columns

Index(['Amplitude', 'Wave_Velocity', 'Travel_Time', 'Reflection_Strength',
       'Fault_Presence', 'Layer_Thickness', 'Seismic_Impedance',
       'Amplitude_to_Reflection_Ratio', 'Normalized_Travel_Time',
       'Fault_Influence', 'Velocity_to_Layer_Ratio', 'Latitude', 'Longitude',
       'Hydrocarbons_Present'],
      dtype='object')

![](https://i.pinimg.com/736x/e3/7e/1e/e37e1ece032291fe48f792eca287d10e.jpg)

# 3. Engineering New Features:
"Now, let’s engineer additional features based on the original ones to improve the model’s predictive power. Feature engineering helps us capture more meaningful patterns from the data."



Here are the engineered features we'll create and why they matter. The loaded CSV already contains columns with these names (see the `df.columns` output above); the next cell recomputes them from the original features.

*   Amplitude_to_Reflection_Ratio: Amplitude divided by reflection strength. A higher ratio might indicate areas with more significant subsurface changes.
*   Normalized_Travel_Time: Travel time standardized to zero mean and unit standard deviation (a z-score), which puts values from different locations on a common scale and makes them easier to compare.
*   Fault_Influence: Fault presence multiplied by seismic impedance, so the score is zero wherever Fault_Presence is 0 and scales with impedance otherwise. This captures how faults affect the likelihood of finding hydrocarbons.
*   Velocity_to_Layer_Ratio: Wave velocity divided by layer thickness. A high value means waves are moving quickly through thin layers, which might suggest hydrocarbons.
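
Written as formulas, the four engineered features look like this. The symbols are shorthand introduced here for illustration: $A$ is Amplitude, $S$ is Reflection_Strength, $T$ is Travel_Time with dataset mean $\bar{T}$ and sample standard deviation $s_T$, $F$ is Fault_Presence, $Z$ is Seismic_Impedance, $v$ is Wave_Velocity, and $h$ is Layer_Thickness.

$$
\begin{aligned}
r &= \frac{A}{S + 0.001} && \text{(amplitude-to-reflection ratio)} \\
z &= \frac{T - \bar{T}}{s_T} && \text{(normalized travel time)} \\
f &= F \cdot Z && \text{(fault influence)} \\
q &= \frac{v}{h} && \text{(velocity-to-layer ratio)}
\end{aligned}
$$

The constant $0.001$ in the first formula only guards against division by zero. The last formula has no such guard, so it is undefined wherever $h = 0$.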


In [36]:
# Creating df2 with the engineered features
df2 = pd.DataFrame({
    # The small constant in the denominator avoids division by zero
    'Amplitude_to_Reflection_Ratio': df['Amplitude'] / (df['Reflection_Strength'] + 0.001),
    # Z-score using the dataset-wide mean and standard deviation
    'Normalized_Travel_Time': (df['Travel_Time'] - df['Travel_Time'].mean()) / df['Travel_Time'].std(),
    'Fault_Influence': df['Fault_Presence'] * df['Seismic_Impedance'],
    'Velocity_to_Layer_Ratio': df['Wave_Velocity'] / df['Layer_Thickness'],
    'Hydrocarbons_Present': df['Hydrocarbons_Present'],  # Target variable
    'Longitude': df['Longitude'],
    'Latitude': df['Latitude'],
})

# Displaying the first few rows of df2
df2.head()


|   | Amplitude_to_Reflection_Ratio | Normalized_Travel_Time | Fault_Influence | Velocity_to_Layer_Ratio | Hydrocarbons_Present | Longitude | Latitude |
|---|---|---|---|---|---|---|---|
| 0 | 1.433857 | -0.828075 | 5.428105 | 0.188402 | 80.671477 | 53.925065 | 23.074508 |
| 1 | 2.980777 | -0.87874 | 0.0 | 0.455754 | 1.653945 | 55.65349 | 23.052695 |
| 2 | 7.237034 | 1.389352 | 2.712156 | 1.161942 | 18.68079 | 55.028701 | 24.478249 |
| 3 | 2.418037 | -0.869907 | 0.0 | 9.784403 | 65.49053 | 54.022749 | 23.636019 |
| 4 | 0.752534 | -0.792833 | 0.0 | 0.653056 | 37.575935 | 54.630603 | 24.5926 |
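
One thing to watch: the velocity-to-layer ratio divides by Layer_Thickness with no guard, so a zero thickness would produce an infinite value. Whether the real dataset contains zeros is unknown. The sketch below shows one possible guard, using a hypothetical helper `safe_ratio` and invented demo values.

```python
import numpy as np
import pandas as pd

def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two Series, returning NaN wherever the denominator is zero."""
    return numerator / denominator.replace(0, np.nan)

# Hypothetical demo values; the second row has zero thickness
demo = pd.DataFrame({
    "Wave_Velocity": [2500.0, 3100.0, 2800.0],
    "Layer_Thickness": [50.0, 0.0, 140.0],
})

ratio = safe_ratio(demo["Wave_Velocity"], demo["Layer_Thickness"])
print(ratio)
```

Rows that come back as NaN would still need to be dropped or imputed before training, since scikit-learn's `LinearRegression` does not accept missing values.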



![](https://i.pinimg.com/736x/3d/96/3b/3d963bc36fee5de95e817d14c28289c6.jpg)

# 4. Splitting Data and Training the Model:
"Now that we have the engineered features, let’s split the data into training and test sets. We’ll train the model using a Linear Regression algorithm and generate predictions on the test set, which can then be compared against the true values."

In [37]:
# Splitting the dataset into features (X) and target (y)
X = df2.drop(columns=['Hydrocarbons_Present'])  # Features (engineered features plus Longitude and Latitude)
y = df2['Hydrocarbons_Present']  # Target (hydrocarbon probability)

# Train-test split (80% training, 20% testing)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Linear Regression model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)
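
The cell above stops at prediction. One way to measure how well the model did is sketched below with mean absolute error and R². This is a minimal example on synthetic data, not the notebook's actual results: `X_demo`, `y_demo`, and `demo_model` are invented stand-ins. Because linear regression is unbounded, the sketch also clips predictions to the 0 to 100 range, assuming the target is a percentage.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a linear signal plus noise, roughly on a 0-100 scale
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = 50 + 10 * X_demo[:, 0] - 5 * X_demo[:, 1] + rng.normal(scale=2.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
demo_model = LinearRegression().fit(X_tr, y_tr)

# Linear regression can predict outside 0-100, so clip to a valid percentage
pred = np.clip(demo_model.predict(X_te), 0, 100)

mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)
print(f"MAE: {mae:.2f}  R^2: {r2:.3f}")
```

With the notebook's own variables, the same two metric calls would take `y_test` and `y_pred`.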


![](https://i.pinimg.com/736x/45/4c/71/454c7127ee38bb1b79f0f60636cc6d7b.jpg)

# 5. Visualizing the Results:
"Let’s list the locations with a hydrocarbon probability greater than 65% and label each one with a probability category. The table is built from the Hydrocarbons_Present column in df2 and gives us a clear picture of the most promising exploration sites."

In [43]:
# Filter locations with probability greater than 65%
# (pandas is already imported as pd in the first cell)
high_prob_locs = df2[df2['Hydrocarbons_Present'] > 65].copy()  # Use .copy() to avoid SettingWithCopyWarning

# Create a function to categorize the probability levels
def categorize_probability(prob):
    if prob > 90:
        return 'High Probability'
    elif 80 <= prob <= 90:
        return 'Medium Probability'
    elif 70 <= prob < 80:
        return 'Low Probability'
    elif 60 <= prob < 70:
        return 'Very Low Probability'
    else:
        return 'No Hope'


# Apply the function to the dataset
high_prob_locs['Probability_Category'] = high_prob_locs['Hydrocarbons_Present'].apply(categorize_probability)

# Rename locations as Loc1, Loc2, etc.
high_prob_locs['Location'] = [f'Loc{i + 1}' for i in range(len(high_prob_locs))]

# Keep only the first 50 locations
high_prob_locs = high_prob_locs.head(50)

# Create a table with each location, its coordinates, and its probability category
table = high_prob_locs[['Location', 'Longitude', 'Latitude', 'Probability_Category']]

# Display the table
print("Table : All Locations with Probabilities > 65%")
print(table)




Table : All Locations with Probabilities > 65%
    Location  Longitude   Latitude  Probability_Category
0       Loc1  53.925065  23.074508    Medium Probability
3       Loc2  54.022749  23.636019  Very Low Probability
5       Loc3  55.521608  23.863466      High Probability
6       Loc4  55.650105  23.905996  Very Low Probability
7       Loc5  53.481182  24.025765      High Probability
9       Loc6  54.005670  23.547006    Medium Probability
11      Loc7  54.794713  24.112115      High Probability
14      Loc8  55.220108  24.293893       Low Probability
17      Loc9  53.423179  24.088400      High Probability
21     Loc10  53.053635  24.357806    Medium Probability
31     Loc11  53.251043  24.517588  Very Low Probability
36     Loc12  55.718763  23.661266  Very Low Probability
37     Loc13  54.519115  23.180637       Low Probability
42     Loc14  55.430208  24.475906      High Probability
47     Loc15  55.835152  24.730859  Very Low Probability
49     Loc16  55.859533  23.318566    Med
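
To turn a table like this into a heatmap, one option is to count how many high-probability locations fall into each cell of a longitude/latitude grid. The sketch below does this with NumPy's `histogram2d`. The coordinates in `demo_locs`, the grid size, and the bounding box are all invented for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the filtered locations (coordinates are made up)
rng = np.random.default_rng(0)
demo_locs = pd.DataFrame({
    "Longitude": rng.uniform(53.0, 56.0, size=100),
    "Latitude": rng.uniform(23.0, 25.0, size=100),
})

# Count locations per grid cell: 6 longitude bins by 4 latitude bins
counts, lon_edges, lat_edges = np.histogram2d(
    demo_locs["Longitude"],
    demo_locs["Latitude"],
    bins=(6, 4),
    range=[[53.0, 56.0], [23.0, 25.0]],
)

print(counts.astype(int))
```

The resulting `counts` grid could then be passed to a plotting function such as matplotlib's `imshow` or `pcolormesh` to draw the heatmap itself.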

# Conclusion:

"By engineering new features and training a machine learning model, ADNOC can make more accurate predictions about the success of oil exploration. This approach saves time and resources by focusing on the most promising areas, ensuring data-driven decisions lead to more efficient exploration processes."