# Generating Synthetic Oil Well Data
This notebook generates synthetic oil well data and saves it to a CSV file. It creates 1,000 randomly generated wells with 20 line items each. Each line item records a production date, daily production volume, daily operating cost, field location, API gravity (and a crude weight class derived from it), well depth, well type, and well pressure.
<div style="position:relative;">
  <img src="image.jpg" style="width:400px; opacity:0.8;">
</div>




## Import necessary libraries
We begin by importing the required Python libraries.

```python
import random
from datetime import datetime

import pandas as pd
```

## Set the seed for reproducibility
We seed Python's built-in `random` module so that every run produces the same data. This module is the only source of randomness used below.

```python
random.seed(42)
```
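The following sketch checks that seeding actually makes the run repeatable. It reseeds the global generator twice and compares the draws. It also shows a dedicated `random.Random` instance, which this notebook does not use. That alternative keeps the sequence isolated from any other code that calls `random.*`. The helper `draws_after_seed` is hypothetical.

```python
import random

def draws_after_seed(seed, n=5):
    """Reseed the global generator and return n draws in the notebook's volume range."""
    random.seed(seed)
    return [random.uniform(50, 1000) for _ in range(n)]

# Reseeding with the same value replays the same sequence.
run_a = draws_after_seed(42)
run_b = draws_after_seed(42)
print(run_a == run_b)  # True

# Alternative: a private generator that other code calling random.* cannot disturb.
rng = random.Random(42)
private = [rng.uniform(50, 1000) for _ in range(5)]
print(private == run_a)  # True: same seed, same algorithm as the global generator
```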

## Define the number of oil wells and line items per well
Here, we specify the number of oil wells to generate and the number of line items (data points) per well.

```python
num_wells = 1000
num_line_items_per_well = 20
```

## Define the date range
We define a start date and an end date. Each line item's production date is drawn at random from this range.

```python
start_date = datetime(2015, 1, 1)
end_date = datetime(2022, 12, 31)
```
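Choosing from a full `pd.date_range` works, but you can also draw a random date as a random day offset from `start_date`. The following sketch is an alternative, not what the notebook does. `random_date` is a hypothetical helper, and it returns plain `datetime` objects rather than pandas `Timestamp`s.

```python
import random
from datetime import datetime, timedelta

start_date = datetime(2015, 1, 1)
end_date = datetime(2022, 12, 31)

def random_date(rng, start, end):
    """Return a random midnight datetime in [start, end], both ends included."""
    span_days = (end - start).days  # 2921 for the range above
    return start + timedelta(days=rng.randint(0, span_days))

rng = random.Random(42)
sample = [random_date(rng, start_date, end_date) for _ in range(3)]
print(sample)
```

Each line item draws its date independently, so a single well can receive the same date more than once. If you need distinct dates per well, `rng.sample(range(span_days + 1), k)` returns `k` unique offsets.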

## Create lists to store data
We create lists to store the data we will generate for each well.

```python
well_ids = []
dates = []
production_volumes = []
operating_costs = []
well_locations = []
api_gravity = []
well_depths = []
well_types = []
well_pressures = []
```

## Generate random data for each well
Next, we generate random data for each oil well based on the defined parameters.

```python
# Build the candidate dates once instead of on every iteration
all_dates = pd.date_range(start_date, end_date)

for well_id in range(1, num_wells + 1):
    for _ in range(num_line_items_per_well):
        well_ids.append(f'Well_{well_id}')
        dates.append(random.choice(all_dates))

        # Random production volume between 50 and 1000 BBL per day
        production_volumes.append(random.uniform(50, 1000))

        # Random well type and pressure, then an operating cost derived from them.
        # For simplicity, onshore wells cost $10 per BBL and offshore wells $15 per BBL,
        # adjusted by a small pressure term (pressure / 1000, i.e. 2 to 5 USD/day)
        # that is subtracted for onshore wells and added for offshore wells
        well_type = random.choice(['Onshore', 'Offshore'])
        well_pressure = random.uniform(2000, 5000)  # PSI
        if well_type == 'Onshore':
            operating_costs.append(production_volumes[-1] * 10 - (well_pressure / 1000))
        else:
            operating_costs.append(production_volumes[-1] * 15 + (well_pressure / 1000))

        # Assign well locations deterministically from well_id ranges
        if well_id <= 150:
            well_locations.append('Abqaiq')
        elif well_id <= 470:
            well_locations.append('Ghawar')
        elif well_id <= 560:
            well_locations.append('Haradh')
        elif well_id <= 680:
            well_locations.append('Khurais')
        elif well_id <= 790:
            well_locations.append('Khursaniyah')
        elif well_id <= 870:
            well_locations.append('Manifa')
        elif well_id <= 930:
            well_locations.append('Nuayyim')
        else:
            well_locations.append('Qatif')

        # Pick one of three API gravity bands (light, medium, heavy) with equal probability;
        # all three candidates are drawn each time, and sulfur content is not modeled
        api_gravity.append(random.choice([random.uniform(36, 45), random.uniform(23, 30), random.uniform(10, 18)]))

        # Random well depth between 500 and 5000 feet
        well_depths.append(random.uniform(500, 5000))

        well_types.append(well_type)
        well_pressures.append(well_pressure)
```
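The per-row rules in the loop can also be written as small pure functions, which makes them easy to test in isolation. The following sketch is hypothetical: `operating_cost` and `assign_location` are not part of the original notebook. It reproduces the loop's cost formula as written, including the pressure term that is subtracted for onshore wells. It also replaces the `if`/`elif` chain with a `bisect` lookup over the same upper bounds.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper well_id bound for each field, mirroring the if/elif chain.
LOCATION_BOUNDS = [150, 470, 560, 680, 790, 870, 930]
LOCATION_NAMES = ['Abqaiq', 'Ghawar', 'Haradh', 'Khurais',
                  'Khursaniyah', 'Manifa', 'Nuayyim', 'Qatif']

def assign_location(well_id):
    """Map a 1-based well_id to its field; ids above 930 fall through to 'Qatif'."""
    return LOCATION_NAMES[bisect_left(LOCATION_BOUNDS, well_id)]

def operating_cost(volume, well_type, pressure):
    """Daily operating cost in USD, using the same formula as the generation loop."""
    if well_type == 'Onshore':
        return volume * 10 - pressure / 1000
    return volume * 15 + pressure / 1000

counts = Counter(assign_location(w) for w in range(1, 1001))
print(counts['Ghawar'])                       # 320
print(operating_cost(100, 'Offshore', 3000))  # 1503.0
```

With 20 line items per well, the 320 Ghawar wells account for 6,400 of the 20,000 rows.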

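## Classify crude oil weight by API gravity
Based on its API gravity, each sample is labeled Light (above 31° API), Medium (above 22° up to 31° API), or Heavy (22° API or below). These cut-offs are rounded versions of the commonly cited 31.1° and 22.3° API boundaries. The generated gravities fall in the bands 36 to 45, 23 to 30, and 10 to 18, so the rounding does not change any label.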

```python
weight_of_crude_oil = []

for gravity in api_gravity:
    if gravity > 31:
        weight_of_crude_oil.append('Light')
    elif gravity > 22:  # 22 < gravity <= 31
        weight_of_crude_oil.append('Medium')
    else:
        weight_of_crude_oil.append('Heavy')
```
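The same binning can be expressed with `pandas.cut`, which returns a categorical result. The following sketch is an alternative to the loop, not the notebook's code, and `classify_weight` is a hypothetical helper. It uses `right=True`, so each bin is closed on the right, matching the `>` comparisons in the loop.

```python
import numpy as np
import pandas as pd

def classify_weight(gravity):
    """Bin API gravity into Heavy (<= 22), Medium (22, 31], Light (> 31)."""
    # right=True makes each interval (low, high], matching the loop's boundaries
    return pd.cut(pd.Series(gravity, dtype=float),
                  bins=[-np.inf, 22, 31, np.inf],
                  labels=['Heavy', 'Medium', 'Light'],
                  right=True)

labels = classify_weight([15.0, 22.0, 22.5, 31.0, 40.2])
print(labels.tolist())  # ['Heavy', 'Heavy', 'Medium', 'Medium', 'Light']
```

Once the DataFrame exists, the column could be filled in one step from `df['API Gravity']` instead of from a separate list.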

## Create the DataFrame
We create a DataFrame to organize all the generated data.

```python
data = {
    'Well ID': well_ids,
    'Production Date': dates,
    'Production Volume (BBL/day)': production_volumes,
    'Operating Costs (USD/day)': operating_costs,
    'Well Location': well_locations,
    'Weight of Crude Oil': weight_of_crude_oil,
    'API Gravity': api_gravity,
    'Well Depth (Feet)': well_depths,
    'Well Type': well_types,
    'Well Pressure (PSI)': well_pressures
}
df = pd.DataFrame(data)
```
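`pd.DataFrame` already raises a `ValueError` if the lists have different lengths. It does not check that every well received the expected number of line items. The following sketch is one possible sanity check, under the assumption that each well should have exactly `num_line_items_per_well` rows. `check_panel` is a hypothetical helper, and the toy frame is illustrative input, not generated data.

```python
import pandas as pd

def check_panel(df, num_wells, items_per_well):
    """Return a list of problems found in a generated well panel (empty if none)."""
    problems = []
    expected_rows = num_wells * items_per_well
    if len(df) != expected_rows:
        problems.append(f'expected {expected_rows} rows, found {len(df)}')
    per_well = df.groupby('Well ID').size()
    if len(per_well) != num_wells or (per_well != items_per_well).any():
        problems.append('wells do not all have the same number of line items')
    if df.isna().any().any():
        problems.append('missing values present')
    return problems

toy = pd.DataFrame({
    'Well ID': ['Well_1', 'Well_1', 'Well_2', 'Well_2'],
    'API Gravity': [40.1, 25.3, 12.0, 38.7],
})
print(check_panel(toy, num_wells=2, items_per_well=2))          # []
print(check_panel(toy.iloc[:3], num_wells=2, items_per_well=2))
```

For the notebook's frame, the call would be `check_panel(df, num_wells, num_line_items_per_well)`.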

## Display the DataFrame

```python
print(df)
```

```
         Well ID Production Date  Production Volume (BBL/day)  \
0         Well_1      2022-03-04                   155.764515
1         Well_1      2019-09-25                    80.193546
2         Well_1      2019-09-15                   259.418591
3         Well_1      2017-05-31                   959.352419
4         Well_1      2020-02-25                   559.416687
...          ...             ...                          ...
19995  Well_1000      2015-07-05                   690.178355
19996  Well_1000      2021-10-07                   810.107971
19997  Well_1000      2020-03-30                   199.334314
19998  Well_1000      2017-02-18                   947.876963
19999  Well_1000      2017-08-26                   836.579350

       Operating Costs (USD/day) Well Location Weight of Crude Oil  \
0                    2339.202397        Abqaiq               Heavy
1                     799.279541        Abqaiq               Heavy
```

## Save the DataFrame to a CSV file
Finally, we save the DataFrame to a CSV file for further analysis.

```python
df.to_csv('oil_well_data.csv', index=False)
print("Data saved to 'oil_well_data.csv'")
```

```
Data saved to 'oil_well_data.csv'
```
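CSV does not store column types, so reading the file back without `parse_dates` returns `Production Date` as plain strings. The following sketch shows a round trip through an in-memory buffer with a small stand-in frame. The column names match the notebook's, but the values are illustrative only. In practice you would pass `'oil_well_data.csv'` to `pd.read_csv` instead of the buffer.

```python
import io

import pandas as pd

# Small stand-in frame; values are illustrative, not taken from the generated data.
original = pd.DataFrame({
    'Well ID': ['Well_1', 'Well_2'],
    'Production Date': pd.to_datetime(['2022-03-04', '2019-09-25']),
    'Production Volume (BBL/day)': [155.76, 80.19],
})

buffer = io.StringIO()
original.to_csv(buffer, index=False)  # same call as the notebook, written to memory
buffer.seek(0)

# parse_dates restores a datetime column instead of object-dtype strings
restored = pd.read_csv(buffer, parse_dates=['Production Date'])
print(pd.api.types.is_datetime64_any_dtype(restored['Production Date']))  # True
print(restored['Production Date'].tolist() == original['Production Date'].tolist())  # True
```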
