# Crude Oil Production Analysis

## Introduction

Crude oil production is a critical component of the global energy market and has significant implications for economies and industries worldwide. This project aims to analyze crude oil production data to uncover trends, patterns, and insights that can inform decision-making in the energy sector.

### Objectives
- **Data Collection**: Gather historical crude oil production data from reliable sources.
- **Data Cleaning**: Process the data to handle missing values, outliers, and inconsistencies.
- **Exploratory Data Analysis (EDA)**: Use statistical methods and visualizations to explore the data.
- **Trend Analysis**: Identify and analyze long-term trends in crude oil production.
- **Predictive Modeling**: Build models to forecast future production levels.

### Dataset
The dataset used in this project includes:
- Historical daily production data from the Volve field.

### Tools and Technologies
- **Python**: Programming language used for data analysis and modeling.
- **Pandas**: Library for data manipulation and analysis.
- **Plotly Express**: Library for data visualization.
- **Scikit-learn**: Machine learning library for predictive modeling.

### Structure of the Notebook
1. **Data Collection and Cleaning**: Steps to gather and preprocess the data.
2. **Exploratory Data Analysis**: Visualizations and statistical analysis of the data.
3. **Trend Analysis**: Examination of production trends over time.
4. **Predictive Modeling**: Development and evaluation of predictive models.
5. **Conclusions and Insights**: Key findings and their implications for the industry.

By the end of this project, we aim to provide a comprehensive analysis of crude oil production trends and deliver actionable insights that help stakeholders make informed decisions.


## Importing libraries and getting started

In [6]:
import pandas as pd
import numpy as np

import plotly.express as px
from sklearn.preprocessing import StandardScaler

seed = 0

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

In [35]:
data = pd.read_excel('data/volve-field-daily-data.xlsx')

In [36]:
data.head()

Unnamed: 0,DATEPRD,WELL_BORE_CODE,NPD_WELL_BORE_CODE,NPD_WELL_BORE_NAME,NPD_FIELD_CODE,NPD_FIELD_NAME,NPD_FACILITY_CODE,NPD_FACILITY_NAME,ON_STREAM_HRS,AVG_DOWNHOLE_PRESSURE,AVG_DOWNHOLE_TEMPERATURE,AVG_DP_TUBING,AVG_ANNULUS_PRESS,AVG_CHOKE_SIZE_P,AVG_CHOKE_UOM,AVG_WHP_P,AVG_WHT_P,DP_CHOKE_SIZE,BORE_OIL_VOL,BORE_GAS_VOL,BORE_WAT_VOL,BORE_WI_VOL,FLOW_KIND,WELL_TYPE
0,2014-04-07,NO 15/9-F-1 C,7405,15/9-F-1 C,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,0.0,0.0,0.0,0.0,0.0,%,0.0,0.0,0.0,0.0,0.0,0.0,,production,WI
1,2014-04-08,NO 15/9-F-1 C,7405,15/9-F-1 C,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,,,,0.0,1.003059,%,0.0,0.0,0.0,0.0,0.0,0.0,,production,OP
2,2014-04-09,NO 15/9-F-1 C,7405,15/9-F-1 C,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,,,,0.0,0.979008,%,0.0,0.0,0.0,0.0,0.0,0.0,,production,OP
3,2014-04-10,NO 15/9-F-1 C,7405,15/9-F-1 C,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,,,,0.0,0.545759,%,0.0,0.0,0.0,0.0,0.0,0.0,,production,OP
4,2014-04-11,NO 15/9-F-1 C,7405,15/9-F-1 C,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,310.37614,96.87589,277.27826,0.0,1.215987,%,33.09788,10.47992,33.07195,0.0,0.0,0.0,,production,OP


In [37]:
data.nunique()

DATEPRD                     3327
WELL_BORE_CODE                 7
NPD_WELL_BORE_CODE             7
NPD_WELL_BORE_NAME             7
NPD_FIELD_CODE                 1
NPD_FIELD_NAME                 1
NPD_FACILITY_CODE              1
NPD_FACILITY_NAME              1
ON_STREAM_HRS                925
AVG_DOWNHOLE_PRESSURE       6567
AVG_DOWNHOLE_TEMPERATURE    6461
AVG_DP_TUBING               8684
AVG_ANNULUS_PRESS           6644
AVG_CHOKE_SIZE_P            6419
AVG_CHOKE_UOM                  1
AVG_WHP_P                   8829
AVG_WHT_P                   8793
DP_CHOKE_SIZE               9057
BORE_OIL_VOL                7818
BORE_GAS_VOL                8005
BORE_WAT_VOL                7361
BORE_WI_VOL                 5258
FLOW_KIND                      2
WELL_TYPE                      2
dtype: int64
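
Several columns above have a single unique value (for example `NPD_FIELD_NAME` and `AVG_CHOKE_UOM`), so they cannot help distinguish one row from another. The sketch below shows one way such columns could be found and dropped. It runs on a small made-up frame (`toy`, with invented values) rather than the real data, and dropping these columns is an assumption about a possible cleaning step, not something done in this notebook so far.

```python
import pandas as pd

# Toy frame standing in for the Volve data; all values are made up for illustration.
toy = pd.DataFrame({
    "NPD_FIELD_NAME": ["VOLVE", "VOLVE", "VOLVE"],
    "AVG_CHOKE_UOM": ["%", None, "%"],
    "NPD_WELL_BORE_CODE": [7405, 5351, 5351],
    "BORE_OIL_VOL": [0.0, 12.5, 30.1],
})

# nunique() ignores NaN by default, so a column holding one value plus gaps still counts as 1.
n_unique = toy.nunique()
constant_cols = n_unique[n_unique == 1].index.tolist()

# Drop the constant columns and keep the rest.
reduced = toy.drop(columns=constant_cols)

print(constant_cols)
print(reduced.columns.tolist())
```

Note that a constant column can still contain missing values, as `AVG_CHOKE_UOM` does in the real data, so it is worth checking the NaN counts before discarding it.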

In [38]:
data.shape

(15634, 24)

In [39]:
data.isna().sum()

DATEPRD                        0
WELL_BORE_CODE                 0
NPD_WELL_BORE_CODE             0
NPD_WELL_BORE_NAME             0
NPD_FIELD_CODE                 0
NPD_FIELD_NAME                 0
NPD_FACILITY_CODE              0
NPD_FACILITY_NAME              0
ON_STREAM_HRS                285
AVG_DOWNHOLE_PRESSURE       6654
AVG_DOWNHOLE_TEMPERATURE    6654
AVG_DP_TUBING               6654
AVG_ANNULUS_PRESS           7744
AVG_CHOKE_SIZE_P            6715
AVG_CHOKE_UOM               6473
AVG_WHP_P                   6479
AVG_WHT_P                   6488
DP_CHOKE_SIZE                294
BORE_OIL_VOL                6473
BORE_GAS_VOL                6473
BORE_WAT_VOL                6473
BORE_WI_VOL                 9928
FLOW_KIND                      0
WELL_TYPE                      0
dtype: int64

In [55]:
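# For each column, print its NaN count as a percentage of ALL cells in the dataset
# (rows * columns), not as a percentage of the rows in that column.
for col in data.columns:
    print(col, round(data[col].isna().sum() / (data.shape[0]*data.shape[1]) * 100, 2))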

DATEPRD 0.0
WELL_BORE_CODE 0.0
NPD_WELL_BORE_CODE 0.0
NPD_WELL_BORE_NAME 0.0
NPD_FIELD_CODE 0.0
NPD_FIELD_NAME 0.0
NPD_FACILITY_CODE 0.0
NPD_FACILITY_NAME 0.0
ON_STREAM_HRS 0.08
AVG_DOWNHOLE_PRESSURE 1.77
AVG_DOWNHOLE_TEMPERATURE 1.77
AVG_DP_TUBING 1.77
AVG_ANNULUS_PRESS 2.06
AVG_CHOKE_SIZE_P 1.79
AVG_CHOKE_UOM 1.73
AVG_WHP_P 1.73
AVG_WHT_P 1.73
DP_CHOKE_SIZE 0.08
BORE_OIL_VOL 1.73
BORE_GAS_VOL 1.73
BORE_WAT_VOL 1.73
BORE_WI_VOL 2.65
FLOW_KIND 0.0
WELL_TYPE 0.0


In [53]:
data.isna().sum().sum() / (data.shape[0]*data.shape[1]) * 100

22.33033772547013

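22.33% of all cells in the dataset are NaN. <br />
The table below gives each column's NaN count as a percentage of all cells in the dataset (rows × columns), so the values add up to roughly the overall 22.33%. They are not the percentage of missing rows within each column: for example, AVG_DOWNHOLE_PRESSURE has 6654 NaN values out of 15634 rows, which is about 42.6% of that column.

| Column Name                | NaN share of all cells (%) |
|----------------------------|----------------------------|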
| DATEPRD                    | 0.0            |
| WELL_BORE_CODE             | 0.0            |
| NPD_WELL_BORE_CODE         | 0.0            |
| NPD_WELL_BORE_NAME         | 0.0            |
| NPD_FIELD_CODE             | 0.0            |
| NPD_FIELD_NAME             | 0.0            |
| NPD_FACILITY_CODE          | 0.0            |
| NPD_FACILITY_NAME          | 0.0            |
| ON_STREAM_HRS              | 0.08           |
| AVG_DOWNHOLE_PRESSURE      | 1.77           |
| AVG_DOWNHOLE_TEMPERATURE   | 1.77           |
| AVG_DP_TUBING              | 1.77           |
| AVG_ANNULUS_PRESS          | 2.06           |
| AVG_CHOKE_SIZE_P           | 1.79           |
| AVG_CHOKE_UOM              | 1.73           |
| AVG_WHP_P                  | 1.73           |
| AVG_WHT_P                  | 1.73           |
| DP_CHOKE_SIZE              | 0.08           |
| BORE_OIL_VOL               | 1.73           |
| BORE_GAS_VOL               | 1.73           |
| BORE_WAT_VOL               | 1.73           |
| BORE_WI_VOL                | 2.65           |
| FLOW_KIND                  | 0.0            |
| WELL_TYPE                  | 0.0            |
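
The two ways of expressing missingness can be compared side by side. The sketch below uses a small made-up frame (`toy`, with invented values) to contrast the percentage of missing rows within each column with each column's share of all cells in the frame. The variable names are ours and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names mirror the Volve data.
toy = pd.DataFrame({
    "DATEPRD": pd.date_range("2014-04-07", periods=4, freq="D"),
    "AVG_DOWNHOLE_PRESSURE": [0.0, np.nan, np.nan, 310.4],
    "BORE_WI_VOL": [np.nan, np.nan, np.nan, 5.0],
})

# Percentage of missing rows within each column (denominator: number of rows).
pct_of_column = (toy.isna().mean() * 100).round(2)

# Each column's NaN count as a share of all cells (denominator: rows * columns).
pct_of_all_cells = (toy.isna().sum() / toy.size * 100).round(2)

# Overall percentage of NaN cells; the per-column shares above add up to this.
overall = toy.isna().sum().sum() / toy.size * 100

print(pct_of_column)
print(pct_of_all_cells)
print(round(overall, 2))
```

The within-column percentage is usually the more useful figure when deciding whether a column is worth keeping or imputing.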


In [80]:
np.unique(data['NPD_WELL_BORE_CODE'])

array([5351, 5599, 5693, 5769, 7078, 7289, 7405], dtype=int64)

In [88]:
grouped_data = data.groupby('NPD_WELL_BORE_CODE')

In [89]:
dataframes = {well_bore_code : group for well_bore_code, group in grouped_data}

In [111]:
data_well1, data_well2, data_well3, data_well4, data_well5, data_well6, data_well7 = dataframes.values()

In [116]:
print(data_well1.shape, data_well2.shape, data_well3.shape, data_well4.shape, data_well5.shape, data_well6.shape, data_well7.shape)

(3056, 24) (3056, 24) (3327, 24) (3306, 24) (1165, 24) (978, 24) (746, 24)
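
Unpacking `dataframes.values()` into seven variables relies on the group order, which `groupby` sorts by key by default. As a hedged alternative, the per-well frames can be looked up by their `NPD_WELL_BORE_CODE`, which keeps the mapping between code and data explicit. The sketch below uses a made-up frame (`toy`) with three of the codes listed above; the production values are invented.

```python
import pandas as pd

# Toy frame with made-up production values; the well codes appear in the real data.
toy = pd.DataFrame({
    "NPD_WELL_BORE_CODE": [7405, 5351, 5351, 5599, 7405],
    "BORE_OIL_VOL": [0.0, 10.0, 12.0, 7.0, 1.0],
})

# groupby sorts group keys by default, so the dict is ordered by ascending code.
# .copy() gives each well an independent frame that can be modified safely.
frames = {code: group.copy() for code, group in toy.groupby("NPD_WELL_BORE_CODE")}

# Look a well up by its code instead of relying on positional unpacking.
well_5351 = frames[5351]

# Shape of every per-well frame, keyed by well code.
shapes = {code: frame.shape for code, frame in frames.items()}

print(list(frames))
print(shapes)
```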


## Analysis of well NO 15/9-F-14 H (NPD_WELL_BORE_CODE 5351)

In [122]:
data_well1.head()

Unnamed: 0,DATEPRD,WELL_BORE_CODE,NPD_WELL_BORE_CODE,NPD_WELL_BORE_NAME,NPD_FIELD_CODE,NPD_FIELD_NAME,NPD_FACILITY_CODE,NPD_FACILITY_NAME,ON_STREAM_HRS,AVG_DOWNHOLE_PRESSURE,AVG_DOWNHOLE_TEMPERATURE,AVG_DP_TUBING,AVG_ANNULUS_PRESS,AVG_CHOKE_SIZE_P,AVG_CHOKE_UOM,AVG_WHP_P,AVG_WHT_P,DP_CHOKE_SIZE,BORE_OIL_VOL,BORE_GAS_VOL,BORE_WAT_VOL,BORE_WI_VOL,FLOW_KIND,WELL_TYPE
4967,2008-02-12,NO 15/9-F-14 H,5351,15/9-F-14,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,0.0,0.0,0.0,0.0,,%,0.0,0.0,0.05885,0.0,0.0,0.0,,production,OP
4968,2008-02-13,NO 15/9-F-14 H,5351,15/9-F-14,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,0.0,0.0,0.0,0.0,,%,0.0,0.0,0.06768,0.0,0.0,0.0,,production,OP
4969,2008-02-14,NO 15/9-F-14 H,5351,15/9-F-14,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,0.0,0.0,0.0,0.0,,%,0.0,0.0,0.0495,0.0,0.0,0.0,,production,OP
4970,2008-02-15,NO 15/9-F-14 H,5351,15/9-F-14,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,0.0,0.0,0.0,6e-05,,%,0.0,0.0,0.0664,0.0,0.0,0.0,,production,OP
4971,2008-02-16,NO 15/9-F-14 H,5351,15/9-F-14,3420717,VOLVE,369304,MÆRSK INSPIRER,0.0,0.0,0.0,0.0,6e-05,,%,0.0,0.0,0.10479,0.0,0.0,0.0,,production,OP


In [125]:
data_well1.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3056 entries, 4967 to 8022
Data columns (total 24 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   DATEPRD                   3056 non-null   datetime64[ns]
 1   WELL_BORE_CODE            3056 non-null   object        
 2   NPD_WELL_BORE_CODE        3056 non-null   int64         
 3   NPD_WELL_BORE_NAME        3056 non-null   object        
 4   NPD_FIELD_CODE            3056 non-null   int64         
 5   NPD_FIELD_NAME            3056 non-null   object        
 6   NPD_FACILITY_CODE         3056 non-null   int64         
 7   NPD_FACILITY_NAME         3056 non-null   object        
 8   ON_STREAM_HRS             3056 non-null   float64       
 9   AVG_DOWNHOLE_PRESSURE     3050 non-null   float64       
 10  AVG_DOWNHOLE_TEMPERATURE  3050 non-null   float64       
 11  AVG_DP_TUBING             3050 non-null   float64       
 12  AVG_ANNULUS_PRESS     

In [127]:
data_well1.nunique()

DATEPRD                     3056
WELL_BORE_CODE                 1
NPD_WELL_BORE_CODE             1
NPD_WELL_BORE_NAME             1
NPD_FIELD_CODE                 1
NPD_FIELD_NAME                 1
NPD_FACILITY_CODE              1
NPD_FACILITY_NAME              1
ON_STREAM_HRS                280
AVG_DOWNHOLE_PRESSURE       2837
AVG_DOWNHOLE_TEMPERATURE    2733
AVG_DP_TUBING               2900
AVG_ANNULUS_PRESS           1482
AVG_CHOKE_SIZE_P            1672
AVG_CHOKE_UOM                  1
AVG_WHP_P                   2894
AVG_WHT_P                   2862
DP_CHOKE_SIZE               3041
BORE_OIL_VOL                2705
BORE_GAS_VOL                2723
BORE_WAT_VOL                2676
BORE_WI_VOL                    0
FLOW_KIND                      1
WELL_TYPE                      1
dtype: int64
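
`BORE_WI_VOL` has zero unique values for this well, which means every entry is NaN. The sketch below shows one way to detect and drop columns that are entirely empty within a single well's frame. It runs on a made-up frame (`toy_well`, with invented values), and dropping the column is an assumed cleaning step rather than something this notebook has done.

```python
import numpy as np
import pandas as pd

# Toy per-well frame with made-up values; BORE_WI_VOL is entirely missing.
toy_well = pd.DataFrame({
    "BORE_OIL_VOL": [0.0, 10.0, 12.0],
    "BORE_WI_VOL": [np.nan, np.nan, np.nan],
    "WELL_TYPE": ["OP", "OP", "OP"],
})

# Columns in which every value is NaN.
empty_cols = toy_well.columns[toy_well.isna().all()].tolist()

# Drop columns that contain no data at all; partially filled columns are kept.
trimmed = toy_well.dropna(axis=1, how="all")

print(empty_cols)
print(trimmed.columns.tolist())
```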