"Oil Well Development Analysis: Maximizing Profitability through Predictive Modeling and Risk Assessment"






Introduction:

The oil and gas industry plays a vital role in the global economy, and efficient exploration and development of oil wells are crucial for maximizing profitability. In this project, we aim to leverage predictive modeling and risk assessment techniques to identify the most promising regions for oil well development.

The project focuses on three distinct regions, each with geological exploration data stored in a separate file. Each file lists a unique identifier for every oil well (id), three numeric features describing the well (f0, f1, f2), and the volume of reserves in the well (product), which serves as the target variable for our analysis.

In [21]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


In [22]:
# Load the data
data_0 = pd.read_csv('/datasets/geo_data_0.csv')
data_1 = pd.read_csv('/datasets/geo_data_1.csv')
data_2 = pd.read_csv('/datasets/geo_data_2.csv')




In [23]:
print(data_0.head()) 


      id        f0        f1        f2     product
0  txEyH  0.705745 -0.497823  1.221170  105.280062
1  2acmU  1.334711 -0.340164  4.365080   73.037750
2  409Wp  1.022732  0.151990  1.419926   85.265647
3  iJLyR -0.032172  0.139033  2.978566  168.620776
4  Xdl7t  1.988431  0.155413  4.751769  154.036647


In [24]:
data_0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 5 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   id       100000 non-null  object 
 1   f0       100000 non-null  float64
 2   f1       100000 non-null  float64
 3   f2       100000 non-null  float64
 4   product  100000 non-null  float64
dtypes: float64(4), object(1)
memory usage: 3.8+ MB


In [25]:
data_0.isnull().sum()

id         0
f0         0
f1         0
f2         0
product    0
dtype: int64

data_1.isnull().sum()

In [26]:
data_2.isnull().sum()

id         0
f0         0
f1         0
f2         0
product    0
dtype: int64
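The same check can be run over all three regions in one pass. The following is a minimal sketch that uses small made-up frames in place of the real files; the frames dictionary, its values, and the extra duplicate-id check are illustrative assumptions, and only the column names come from the data above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data_0, data_1, data_2 with the same columns.
rng = np.random.default_rng(0)
frames = {
    name: pd.DataFrame({
        "id": [f"w{i}" for i in range(5)],
        "f0": rng.normal(size=5),
        "f1": rng.normal(size=5),
        "f2": rng.normal(size=5),
        "product": rng.uniform(0, 150, size=5),
    })
    for name in ("region_0", "region_1", "region_2")
}

# One row per region: total missing cells and number of repeated well ids.
summary = pd.DataFrame({
    name: {
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_ids": int(df["id"].duplicated().sum()),
    }
    for name, df in frames.items()
}).T
print(summary)
```

With the real data, the dictionary would hold data_0, data_1 and data_2 instead of the synthetic frames.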

In [27]:
def train_and_evaluate_model(data):
    """Fit a linear regression on one region and report validation metrics."""
    # Features are f0, f1, f2; the target is the volume of reserves.
    X = data.drop(['id', 'product'], axis=1)
    y = data['product']
    # Hold out 25% of the wells for validation.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)

    # Keep predictions next to the actual volumes for the profit steps.
    predictions = pd.DataFrame({'Prediction': y_pred, 'Actual': y_val})

    # 2.4. Calculating Average Volume and RMSE
    average_volume = np.mean(y_pred)
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))

    # 2.5. Analyzing the Results
    print("Average Volume of Predicted Reserves:", average_volume)
    print("Model RMSE:", rmse)

    return predictions

predictions_0 = train_and_evaluate_model(data_0)
predictions_1 = train_and_evaluate_model(data_1)
predictions_2 = train_and_evaluate_model(data_2)

Average Volume of Predicted Reserves: 92.3987999065777
Model RMSE: 37.756600350261685
Average Volume of Predicted Reserves: 68.71287803913762
Model RMSE: 0.890280100102884
Average Volume of Predicted Reserves: 94.77102387765939
Model RMSE: 40.14587231134218


After training and evaluating the model for each region, we obtained the following results:

Region 0:
- Average Volume of Predicted Reserves: 92.3987999065777
- Model RMSE: 37.756600350261685

Region 1:
- Average Volume of Predicted Reserves: 68.71287803913762
- Model RMSE: 0.890280100102884

Region 2:
- Average Volume of Predicted Reserves: 94.77102387765939
- Model RMSE: 40.14587231134218

For Region 0, the average volume of predicted reserves is approximately 92.4 thousand barrels. The model's root mean squared error (RMSE) is around 37.8, indicating that the predictions may have a considerable deviation from the actual values.

In Region 1, the average volume of predicted reserves is about 68.7 thousand barrels. The model's RMSE is very low at around 0.89, suggesting that the predictions are quite accurate and closely aligned with the actual values.

Region 2 shows an average volume of predicted reserves of approximately 94.8 thousand barrels. The model's RMSE is approximately 40.1, implying that the predictions may have a notable level of error compared to the true values.

Based on these results, Region 1 has by far the lowest RMSE, so the model is most reliable there, but it also has the lowest average volume of predicted reserves (68.7 versus 92.4 and 94.8 thousand barrels). Model accuracy alone therefore does not settle which region offers the highest profit margin, and further analysis is required before selecting a region.
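To put the RMSE values on a common scale, each one can be divided by the region's average predicted volume. This is a small illustrative calculation on the figures reported above, not part of the original analysis; the results dictionary and the name relative_rmse are introduced here for the example.

```python
# Figures copied from the model output above (thousand barrels).
results = {
    "Region 0": {"avg_pred": 92.3987999065777, "rmse": 37.756600350261685},
    "Region 1": {"avg_pred": 68.71287803913762, "rmse": 0.890280100102884},
    "Region 2": {"avg_pred": 94.77102387765939, "rmse": 40.14587231134218},
}

# RMSE as a fraction of the average predicted volume in each region.
relative_rmse = {
    region: values["rmse"] / values["avg_pred"]
    for region, values in results.items()
}

for region, ratio in relative_rmse.items():
    print(f"{region}: RMSE is {ratio:.1%} of the average predicted volume")
```

The ratios come out at roughly 41% for Region 0, 1.3% for Region 1, and 42% for Region 2, which makes the gap in model accuracy between the regions easier to see.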


In [28]:
# Step 3: Prepare for Profit Calculation
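budget = 100_000_000  # Budget for oil well development in USD
revenue_per_barrel = 4_500  # Revenue in USD per unit of product (one unit = thousand barrels)

# Total volume (in units of product) that the developed wells together must yield to break even
threshold = budget / revenue_per_barrel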
average_volumes = [predictions_0['Prediction'].mean(), predictions_1['Prediction'].mean(), predictions_2['Prediction'].mean()]

print("Threshold Value:", threshold)
print("Average Volumes:", average_volumes)

Threshold Value: 22222.222222222223
Average Volumes: [92.3987999065777, 68.71287803913762, 94.77102387765939]
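The threshold above is a total across all developed wells, while the average volumes are per well, so the two are not directly comparable. The sketch below converts the threshold to a per-well figure, assuming 200 wells are developed, which is the number selected in Step 4; the names revenue_per_unit, wells_developed and per_well_break_even are introduced here for illustration.

```python
budget = 100_000_000      # USD, as in the cell above
revenue_per_unit = 4_500  # USD per unit of product, as in the cell above
wells_developed = 200     # assumption: the number of wells selected in Step 4

# Volume all developed wells must yield together to cover the budget.
total_break_even = budget / revenue_per_unit

# Average volume each developed well must yield to cover the budget.
per_well_break_even = total_break_even / wells_developed

print(round(total_break_even, 2))     # 22222.22
print(round(per_well_break_even, 2))  # 111.11
```

Under that assumption, each developed well needs about 111.1 units on average to break even, which is above the average predicted volume in all three regions (92.4, 68.7 and 94.8). Developing wells chosen at random would therefore be expected to lose money, which is why selecting the wells with the highest predictions matters.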


Step 4: Calculate Profit from Selected Oil Wells and Model Predictions

In [29]:

def calculate_profit(predictions):
    # 4.1. Select Wells with Highest Predictions
    selected_wells = predictions.nlargest(200, 'Prediction')

    # 4.2. Summarize Target Volume
    target_volume = selected_wells['Actual'].sum()

    # 4.3. Profit = revenue from the actual volume of the selected wells minus the budget
    profit = (target_volume * revenue_per_barrel) - budget
    return profit

profit_0 = calculate_profit(predictions_0)
profit_1 = calculate_profit(predictions_1)
profit_2 = calculate_profit(predictions_2)

print("Profit for Region 0:", profit_0)
print("Profit for Region 1:", profit_1)
print("Profit for Region 2:", profit_2)

Profit for Region 0: 33591411.14462179
Profit for Region 1: 24150866.966815114
Profit for Region 2: 25985717.59374112




After calculating the profits from selected oil wells and model predictions, we obtained the following results:

Region 0:
- Profit: 33,591,411.14 USD

Region 1:
- Profit: 24,150,866.97 USD

Region 2:
- Profit: 25,985,717.59 USD

Based on these profit calculations, Region 0 has the highest estimated profit among the three regions, at approximately 33.6 million USD. Region 2 follows with about 26.0 million USD, and Region 1 has the lowest estimated profit at about 24.2 million USD.

Therefore, if we solely consider the profit values, Region 0 appears to be the most promising choice for oil well development due to its higher potential profitability. However, it is crucial to note that other factors, such as risks and uncertainties, need to be considered before making a final decision.
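The selection logic in this step ranks wells by predicted volume but values them by actual volume. A toy example with made-up numbers shows the arithmetic; the function name profit_from_top_wells, the five-row table, and the toy budget are all hypothetical.

```python
import pandas as pd

def profit_from_top_wells(predictions, n_wells, revenue_per_unit, budget):
    """Rank wells by prediction, then value the chosen wells by actual volume."""
    chosen = predictions.nlargest(n_wells, "Prediction")
    return chosen["Actual"].sum() * revenue_per_unit - budget

# Made-up predictions and actual volumes for five wells.
toy = pd.DataFrame({
    "Prediction": [120.0, 90.0, 150.0, 60.0, 110.0],
    "Actual":     [100.0, 95.0, 130.0, 70.0, 140.0],
})

# The two highest predictions are 150 and 120; those wells actually hold 130 and 100.
toy_profit = profit_from_top_wells(toy, n_wells=2, revenue_per_unit=4_500, budget=1_000_000)
print(toy_profit)  # (130 + 100) * 4500 - 1_000_000 = 35000.0
```

Note that the well with the largest actual volume (140) is not selected because its prediction ranks third, so prediction error translates directly into lost profit.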



Step 5: Calculate Risks and Profit for Each Region

In [31]:

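def calculate_bootstrap_profit(predictions):
    """Estimate average profit, a 95% interval, and risk of loss by bootstrapping."""
    profits = []
    np.random.seed(42)  # Set seed for reproducibility

    for _ in range(1000):
        # Draw 200 wells with replacement. calculate_profit then takes the 200
        # highest predictions, which here is every well in the draw.
        bootstrap_sample = predictions.sample(n=200, replace=True)
        profit = calculate_profit(bootstrap_sample)
        profits.append(profit)

    average_profit = np.mean(profits)
    # The 2.5th and 97.5th percentiles bound the central 95% of simulated profits.
    confidence_interval = np.percentile(profits, [2.5, 97.5])
    # Share of simulations that end with a negative profit.
    risk_of_loss = np.mean(np.array(profits) < 0)

    return average_profit, confidence_interval, risk_of_loss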

avg_profit_0, conf_interval_0, risk_loss_0 = calculate_bootstrap_profit(predictions_0)
avg_profit_1, conf_interval_1, risk_loss_1 = calculate_bootstrap_profit(predictions_1)
avg_profit_2, conf_interval_2, risk_loss_2 = calculate_bootstrap_profit(predictions_2)

print("Region 0:")
print("Average Profit:", avg_profit_0)
print("Confidence Interval:", conf_interval_0)
print("Risk of Loss:", risk_loss_0)

print("Region 1:")
print("Average Profit:", avg_profit_1)
print("Confidence Interval:", conf_interval_1)
print("Risk of Loss:", risk_loss_1)

print("Region 2:")
print("Average Profit:", avg_profit_2)
print("Confidence Interval:", conf_interval_2)
print("Risk of Loss:", risk_loss_2)


Region 0:
Average Profit: -16906296.373642936
Confidence Interval: [-22101553.30056624 -11701227.26251526]
Risk of Loss: 1.0
Region 1:
Average Profit: -38091634.64643422
Confidence Interval: [-44175727.49792942 -32544374.24889251]
Risk of Loss: 1.0
Region 2:
Average Profit: -14300720.646929419
Confidence Interval: [-19598121.20799231  -8888228.10924703]
Risk of Loss: 1.0


After applying the bootstrap technique to calculate risks and profit for each region, we obtained the following results:

Region 0:
- Average Profit: -16,906,296.37 USD
- Confidence Interval: [-22,101,553.30, -11,701,227.26] USD
- Risk of Loss: 100%

Region 1:
- Average Profit: -38,091,634.65 USD
- Confidence Interval: [-44,175,727.50, -32,544,374.25] USD
- Risk of Loss: 100%

Region 2:
- Average Profit: -14,300,720.65 USD
- Confidence Interval: [-19,598,121.21, -8,888,228.11] USD
- Risk of Loss: 100%

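From these results, all three regions show a risk of loss (negative profit) of 100%: every one of the 1,000 bootstrap samples produced a negative profit, and the upper bound of each 95% interval lies below zero.

These figures differ sharply from the positive profits in Step 4 because of how the samples are drawn. Each iteration draws 200 wells at random and then selects the 200 with the highest predictions, which is all of them, so the simulation measures the profit of randomly chosen wells rather than of the best-predicted ones. With average predicted volumes of roughly 69 to 95 thousand barrels per well, 200 random wells fall short of the 22,222 units needed to break even.

As computed, the results do not support developing oil wells in any of these regions, and it is advisable to reconsider that decision. Further analysis, exploration, or consideration of other factors may be necessary to identify more viable and profitable opportunities.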
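One way to make the simulation reflect well selection is to draw more wells than are developed in each iteration and keep only the top-ranked ones. The sketch below assumes a draw of 500 wells per iteration; that number, the function name bootstrap_profit, and the synthetic data are all hypothetical and are not taken from the analysis above, so its output says nothing about the three regions.

```python
import numpy as np
import pandas as pd

def bootstrap_profit(predictions, sample_size, n_wells, revenue_per_unit,
                     budget, n_iterations=1000, seed=42):
    """Bootstrap the profit of developing the top n_wells of each random draw."""
    rng = np.random.RandomState(seed)
    profits = np.empty(n_iterations)
    for i in range(n_iterations):
        # Draw sample_size wells with replacement, then keep the best-predicted n_wells.
        sample = predictions.sample(n=sample_size, replace=True, random_state=rng)
        sample = sample.reset_index(drop=True)
        chosen = sample.nlargest(n_wells, "Prediction")
        profits[i] = chosen["Actual"].sum() * revenue_per_unit - budget
    lower, upper = np.percentile(profits, [2.5, 97.5])
    return profits.mean(), (lower, upper), (profits < 0).mean()

# Synthetic stand-in for a predictions table (not the real validation data).
data_rng = np.random.RandomState(0)
actual = data_rng.uniform(0, 190, size=25_000)
demo = pd.DataFrame({
    "Prediction": actual + data_rng.normal(0, 40, size=25_000),
    "Actual": actual,
})

mean_profit, interval, risk = bootstrap_profit(
    demo, sample_size=500, n_wells=200,  # sample_size=500 is an assumption
    revenue_per_unit=4_500, budget=100_000_000,
)
print("Average Profit:", mean_profit)
print("95% Interval:", interval)
print("Risk of Loss:", risk)
```

Setting sample_size equal to n_wells reproduces the behaviour of calculate_bootstrap_profit, where every drawn well is developed.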



Conclusions:

Based on the analysis conducted, the following conclusions can be drawn:

1. Model Performance: The linear regression models trained for each region showed varying performance. Region 1 exhibited the best model performance, with the lowest RMSE and accurate predictions. In contrast, Regions 0 and 2 had higher RMSE values, suggesting less accurate predictions.

2. Profitability Analysis: When the 200 wells with the highest predicted volumes are selected from each validation set, Region 0 shows the highest estimated profit (about 33.6 million USD), followed by Region 2 (about 26.0 million USD) and Region 1 (about 24.2 million USD). In the bootstrap simulation, however, every region had a negative average profit and a 100% risk of loss.

3. Risk Evaluation: The bootstrap technique gave negative average profits for all regions, and the upper bound of every 95% interval is below zero. Each bootstrap iteration draws 200 wells at random and develops all of them, so these figures describe randomly chosen wells rather than the top-ranked wells used in the profitability analysis. Under that setup, the revenue does not cover the 100 million USD budget.

Based on these findings, it is recommended to reconsider the decision to develop oil wells in the analyzed regions. Further exploration and analysis, including a deeper assessment of geological conditions, market dynamics, and cost considerations, are necessary to identify more promising and profitable opportunities. It is essential to conduct a comprehensive evaluation and seek expert advice before making any investment decisions in the oil and gas industry.
