# Experiment 6: Time Series Forecasting for Oil & Gas Production

**Course:** Introduction to Deep Learning | **Module:** Sequential Data Analysis

---

## Objective

Design and implement deep learning models for forecasting oil & gas production metrics using LSTM, GRU, and Transformer architectures to predict future production trends and optimize operations.

## Learning Outcomes

By the end of this experiment, you will:

1. Understand time series analysis and forecasting fundamentals
2. Implement LSTM and GRU networks for sequential prediction
3. Apply multi-step forecasting techniques for production planning
4. Handle seasonality, trends, and anomalies in time series data
5. Evaluate forecasting models using industry-standard metrics

## Background & Theory

**Time Series Forecasting** involves predicting future values based on historical sequential data. In oil & gas operations, accurate forecasting is crucial for production optimization, maintenance planning, and resource allocation.

**Key Components:**

- **Trend:** Long-term increase or decrease in data values
- **Seasonality:** Regular patterns that repeat over fixed periods
- **Cyclical Patterns:** Irregular fluctuations over longer periods
- **Noise:** Random variations in the data
- **Anomalies:** Unusual events that deviate from normal patterns

**Mathematical Foundation:**

- LSTM cell: f*t = σ(W_f[h*{t-1}, x*t] + b_f), i_t = σ(W_i[h*{t-1}, x_t] + b_i)
- GRU cell: r*t = σ(W_r[h*{t-1}, x*t]), z_t = σ(W_z[h*{t-1}, x_t])
- Multi-step prediction: ŷ*{t+1:t+h} = f*θ(x\_{t-w+1:t})
- Loss function: L = (1/n)Σ||y_i - ŷ_i||² (MSE) or (1/n)Σ|y_i - ŷ_i| (MAE)

**Applications in Oil & Gas:**

- Production forecasting for reservoir management
- Equipment failure prediction and maintenance scheduling
- Market demand forecasting and supply chain optimization
- Environmental monitoring and compliance prediction
- Economic modeling and investment planning


## Setup & Dependencies

**What to Expect:** This section establishes the Python environment for deep learning-based time series forecasting. We'll install PyTorch for neural networks, specialized time series libraries, and visualization tools for analyzing temporal patterns.

**Process Overview:**

1. **Package Installation:** Install PyTorch, pandas, matplotlib, seaborn, and time series analysis libraries
2. **Environment Configuration:** Set up device detection (CPU/GPU) and random seeds for reproducible experiments
3. **Time Series Tools:** Configure specialized libraries for temporal data analysis and forecasting
4. **Visualization Setup:** Apply ArivuAI styling for professional time series plots and analysis charts
5. **Data Directory Setup:** Establish paths for time series datasets and model outputs

**Expected Outcome:** A fully configured environment ready for time series forecasting with LSTM/GRU networks, including all temporal analysis tools and deep learning frameworks.


In [1]:
# Install required packages
import subprocess, sys
packages = ['torch', 'numpy', 'matplotlib', 'pandas', 'scikit-learn', 'seaborn']
for pkg in packages:
    try: __import__(pkg.replace('-', '_').lower())
    except ImportError: subprocess.check_call([sys.executable, '-m', 'pip', 'install', pkg])

import torch, torch.nn as nn, torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, TensorDataset
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import json, random, time
from pathlib import Path
from datetime import datetime, timedelta

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Data directory setup
DATA_DIR = Path('data')
if not DATA_DIR.exists():
    DATA_DIR = Path('Expirements/data')
if not DATA_DIR.exists():
    DATA_DIR = Path('.')
    print('Warning: Using current directory for data')

# ArivuAI styling
plt.style.use('default')
colors = {'primary': '#004E89', 'secondary': '#3DA5D9', 'accent': '#F1A208', 'dark': '#4F4F4F'}
sns.set_palette([colors['primary'], colors['secondary'], colors['accent'], colors['dark']])

print(f'✓ PyTorch version: {torch.__version__}')
print(f'✓ Device: {device}')
print(f'✓ Data directory: {DATA_DIR.absolute()}')
print('✓ All packages installed and configured')
print('✓ Random seeds set for reproducible results')
print('✓ ArivuAI styling applied')

✓ PyTorch version: 2.8.0+cpu
✓ Device: cpu
✓ Data directory: d:\Suni Files\AI Code Base\Oil and Gas\Oil and Gas Pruthvi College Course Material\Updated\Expirements\Experiment_6_Time_Series_Forecasting\data
✓ All packages installed and configured
✓ Random seeds set for reproducible results
✓ ArivuAI styling applied


## Time Series Data Generation

Create realistic oil & gas production time series with trends, seasonality, and anomalies.


In [2]:
class ProductionTimeSeriesGenerator:
    def __init__(self, config_path):
        """Initialize time series generator with configuration"""
        try:
            with open(config_path, 'r') as f:
                self.config = json.load(f)
            print('✓ Configuration loaded from JSON')
        except FileNotFoundError:
            print('Creating default configuration...')
            self.config = self._create_default_config()
        
        self.time_series_types = self.config['time_series_types']
        self.total_days = self.config['data_generation']['total_days']
    
    def _create_default_config(self):
        """Create default configuration if JSON file not found"""
        return {
            'time_series_types': {
                'oil_production': {'baseline': 2500, 'seasonal_amplitude': 200, 'trend_slope': -0.5, 'noise_level': 50},
                'gas_production': {'baseline': 15000, 'seasonal_amplitude': 1500, 'trend_slope': 0.8, 'noise_level': 300}
            },
            'data_generation': {'total_days': 1095, 'start_date': '2021-01-01'}
        }
    
    def generate_time_series(self):
        """Generate synthetic time series data for all production metrics"""
        # Create date range
        start_date = datetime.strptime(self.config['data_generation']['start_date'], '%Y-%m-%d')
        dates = [start_date + timedelta(days=i) for i in range(self.total_days)]
        
        # Initialize dataframe
        df = pd.DataFrame({'date': dates})
        df['day_of_year'] = df['date'].dt.dayofyear
        df['days_since_start'] = range(self.total_days)
        
        # Generate each time series
        for series_name, config in self.time_series_types.items():
            df[series_name] = self._generate_single_series(df, config)
        
        # Add external factors
        df = self._add_external_factors(df)
        
        # Inject anomalies
        df = self._inject_anomalies(df)
        
        return df
    
    def _generate_single_series(self, df, config):
        """Generate a single time series with trend, seasonality, and noise"""
        baseline = config['baseline']
        seasonal_amp = config['seasonal_amplitude']
        trend_slope = config['trend_slope']
        noise_level = config['noise_level']
        seasonal_period = config.get('seasonal_period', 365)
        
        # Trend component
        trend = trend_slope * df['days_since_start']
        
        # Seasonal component
        seasonal = seasonal_amp * np.sin(2 * np.pi * df['day_of_year'] / seasonal_period)
        
        # Noise component
        noise = np.random.normal(0, noise_level, len(df))
        
        # Combine components
        series = baseline + trend + seasonal + noise
        
        # Ensure positive values for production metrics
        if 'production' in config or 'pressure' in config:
            series = np.maximum(series, baseline * 0.1)
        
        return series
    
    def _add_external_factors(self, df):
        """Add external factors that influence production"""
        # Weather temperature (seasonal)
        df['weather_temperature'] = 25 + 20 * np.sin(2 * np.pi * df['day_of_year'] / 365) + np.random.normal(0, 3, len(df))
        
        # Market price (random walk with volatility)
        price_changes = np.random.normal(0, 2, len(df))
        df['market_price'] = 70 + np.cumsum(price_changes * 0.1)
        
        return df
    
    def _inject_anomalies(self, df):
        """Inject realistic anomalies into the time series"""
        anomaly_config = self.config.get('anomaly_patterns', {})
        
        for anomaly_type, config in anomaly_config.items():
            probability = config['probability']
            duration_range = config['duration_days']
            impact_range = config['production_impact']
            
            # Randomly select days for anomalies
            anomaly_days = np.random.choice(
                len(df), 
                size=int(len(df) * probability), 
                replace=False
            )
            
            for start_day in anomaly_days:
                duration = np.random.randint(duration_range[0], duration_range[1] + 1)
                impact = np.random.uniform(impact_range[0], impact_range[1])
                
                end_day = min(start_day + duration, len(df))
                
                # Apply impact to production metrics
                for col in ['oil_production', 'gas_production']:
                    if col in df.columns:
                        df.loc[start_day:end_day, col] *= (1 + impact)
        
        return df

# Initialize generator and create time series
generator = ProductionTimeSeriesGenerator(DATA_DIR / 'production_timeseries.json')
ts_data = generator.generate_time_series()

print(f'✓ Time series data generated:')
print(f'• Total days: {len(ts_data):,}')
print(f'• Features: {len(ts_data.columns)} ({list(ts_data.columns)})')
print(f'• Date range: {ts_data["date"].min()} to {ts_data["date"].max()}')
print(f'• Data shape: {ts_data.shape}')

# Display basic statistics
print('\nProduction metrics summary:')
production_cols = ['oil_production', 'gas_production', 'water_cut', 'wellhead_pressure']
for col in production_cols:
    if col in ts_data.columns:
        print(f'• {col}: {ts_data[col].mean():.1f} ± {ts_data[col].std():.1f}')

✓ Configuration loaded from JSON
✓ Time series data generated:
• Total days: 1,095
• Features: 10 (['date', 'day_of_year', 'days_since_start', 'oil_production', 'gas_production', 'water_cut', 'wellhead_pressure', 'reservoir_temperature', 'weather_temperature', 'market_price'])
• Date range: 2021-01-01 00:00:00 to 2023-12-31 00:00:00
• Data shape: (1095, 10)

Production metrics summary:
• oil_production: 2017.4 ± 584.2
• gas_production: 14011.3 ± 3912.3
• water_cut: 45.9 ± 6.7
• wellhead_pressure: 1034.9 ± 135.7


## Summary & Validation

This is a simplified version of Experiment 6 for testing. The complete implementation would include LSTM/GRU models, multi-step forecasting, and comprehensive evaluation.

**Key Components Demonstrated:**

- Time series forecasting theory and mathematical foundations
- Realistic oil & gas production data with multiple metrics
- Trend, seasonality, and anomaly injection
- External factors and market influences

**Next Steps:**

- Implement LSTM and GRU architectures for forecasting
- Add sequence preparation and windowing functions
- Create multi-step prediction capabilities
- Include comprehensive evaluation metrics (MAE, RMSE, MAPE)
- Add visualization and forecasting analysis tools
