# GIS Wind-Methane Dispersion Analysis - Exploratory Data Analysis

This notebook provides an initial exploration of the methane sensor data and wind data from an industrial site in the Permian Basin, Texas. We'll perform the following tasks:

1. Load and inspect the data
2. Clean and preprocess the datasets
3. Perform exploratory visualizations
4. Analyze the relationship between wind patterns and methane dispersion

## 1. Setup and Data Loading

First, let's import the necessary libraries and load our datasets.

In [None]:
# Import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from shapely.geometry import Point
import folium
import branca.colormap as cm
from datetime import datetime, timedelta

# Set up plotting parameters
# The 'seaborn-whitegrid' style name was deprecated in Matplotlib 3.6; sns.set_style below applies the same look
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12
sns.set_style("whitegrid")

In [None]:
# Define file paths
# Adjust these paths as needed
project_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
methane_path = os.path.join(project_dir, 'data', 'methane_sensors.csv')
wind_path = os.path.join(project_dir, 'data', 'wind_data.csv')

# Fall back to machine-specific paths if the project data folder is missing (adjust as needed)
if not os.path.exists(methane_path):
    methane_path = r"C:\Users\pradeep dubey\Downloads\methane_sensors.csv"
if not os.path.exists(wind_path):
    wind_path = r"C:\Users\pradeep dubey\Downloads\wind_data.csv"

# Load the data
methane_df = pd.read_csv(methane_path)
wind_df = pd.read_csv(wind_path)
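
The later cells depend on specific column names, so it may help to check for them right after loading. The following is a minimal sketch: `missing_columns` and the two `REQUIRED_*` lists are hypothetical helpers, built only from column names that appear in this notebook's own code, and a small demo frame stands in for the real CSV.

```python
import pandas as pd

# Column names the later cells rely on (taken from this notebook's own code)
REQUIRED_METHANE = ['Timestamp', 'Sensor_ID', 'Latitude', 'Longitude',
                    'Methane_Concentration (ppm)']
REQUIRED_WIND = ['Timestamp', 'Wind_Speed (m/s)', 'Wind_Direction (°)']


def missing_columns(df, required):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Demo frame standing in for wind_df; it deliberately lacks the direction column
demo_wind = pd.DataFrame({'Timestamp': ['2025-02-10 12:00:00'],
                          'Wind_Speed (m/s)': [3.2]})
demo_missing = missing_columns(demo_wind, REQUIRED_WIND)
print(demo_missing)
```

In the notebook itself, the same helper could be called as `missing_columns(methane_df, REQUIRED_METHANE)` and `missing_columns(wind_df, REQUIRED_WIND)`; an empty list means every expected column is present.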

## 2. Initial Data Inspection

Let's examine the structure and content of our datasets.

In [None]:
# Display first few rows of methane data
print("Methane Sensor Data:")
display(methane_df.head())

# Display basic information about the methane data
print("\nMethane Data Info:")
methane_df.info()

# Display summary statistics for methane data
print("\nMethane Data Summary Statistics:")
display(methane_df.describe())

In [None]:
# Display first few rows of wind data
print("Wind Data:")
display(wind_df.head())

# Display basic information about the wind data
print("\nWind Data Info:")
wind_df.info()

# Display summary statistics for wind data
print("\nWind Data Summary Statistics:")
display(wind_df.describe())

## 3. Data Preprocessing

Now let's preprocess our data by:
1. Converting timestamps to datetime objects
2. Converting methane data to a GeoDataFrame with geometry
3. Adding wind vector components for visualization

In [None]:
# Convert timestamps to datetime
methane_df['Timestamp'] = pd.to_datetime(methane_df['Timestamp'])
wind_df['Timestamp'] = pd.to_datetime(wind_df['Timestamp'])

# Check for missing values
print("Missing values in methane data:")
print(methane_df.isnull().sum())

print("\nMissing values in wind data:")
print(wind_df.isnull().sum())

In [None]:
# Convert methane data to GeoDataFrame
geometry = [Point(xy) for xy in zip(methane_df['Longitude'], methane_df['Latitude'])]
methane_gdf = gpd.GeoDataFrame(methane_df, geometry=geometry, crs="EPSG:4326")

print(f"Created GeoDataFrame with {len(methane_gdf)} records and CRS: {methane_gdf.crs}")
display(methane_gdf.head())

In [None]:
# Calculate wind vector components
# Convert wind direction from degrees to radians for vector calculations
# Note: Wind direction in meteorology is where the wind is coming FROM
# For vector calculations, we need where the wind is going TO (opposite direction)
wind_df['Wind_Direction_Rad'] = np.radians((wind_df['Wind_Direction (°)'] + 180) % 360)

# Calculate wind vector components (U: West-East, V: South-North)
# Wind_Direction_Rad already points where the wind is going TO, so no extra sign flip is needed
wind_df['U'] = wind_df['Wind_Speed (m/s)'] * np.sin(wind_df['Wind_Direction_Rad'])
wind_df['V'] = wind_df['Wind_Speed (m/s)'] * np.cos(wind_df['Wind_Direction_Rad'])

display(wind_df.head())
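
Sign errors are easy to make when converting a meteorological direction (where the wind comes FROM) into velocity components. The sketch below is one way to sanity-check the convention on two cases with known answers; `wind_components` is a hypothetical standalone helper, not part of the notebook's pipeline.

```python
import numpy as np


def wind_components(speed, direction_from_deg):
    """Return (U, V): eastward and northward components of the wind velocity.

    direction_from_deg is the meteorological direction (where the wind
    comes FROM), measured clockwise from north.
    """
    # Flip by 180 degrees to get the direction the air is moving TO
    direction_to_rad = np.radians((np.asarray(direction_from_deg) + 180) % 360)
    u = speed * np.sin(direction_to_rad)
    v = speed * np.cos(direction_to_rad)
    return u, v


# A wind FROM the north should blow southward (U near 0, V negative)
u_north, v_north = wind_components(5.0, 0)
# A wind FROM the west should blow eastward (U positive, V near 0)
u_west, v_west = wind_components(5.0, 270)
print(np.round([u_north, v_north], 6), np.round([u_west, v_west], 6))
```

Comparing these two cases with the `U` and `V` columns of `wind_df` for rows with similar directions is a quick way to confirm the signs.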

## 4. Data Visualization

Let's create some exploratory visualizations to understand our data better.

In [None]:
# Visualize methane concentration over time for all sensors
plt.figure(figsize=(14, 8))

# Different color for each sensor
for sensor in methane_gdf['Sensor_ID'].unique():
    sensor_data = methane_gdf[methane_gdf['Sensor_ID'] == sensor]
    plt.plot(sensor_data['Timestamp'], sensor_data['Methane_Concentration (ppm)'], 
             marker='.', linestyle='-', label=sensor)

plt.title('Methane Concentration Over Time for All Sensors', fontsize=16)
plt.xlabel('Time', fontsize=14)
plt.ylabel('Methane Concentration (ppm)', fontsize=14)
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.legend(title='Sensor ID')
plt.tight_layout()
plt.show()

In [None]:
# Visualize wind direction and speed over time
fig, ax1 = plt.subplots(figsize=(14, 8))

# Plot wind speed
ax1.plot(wind_df['Timestamp'], wind_df['Wind_Speed (m/s)'], 'b-', marker='o', label='Wind Speed')
ax1.set_xlabel('Time', fontsize=14)
ax1.set_ylabel('Wind Speed (m/s)', color='b', fontsize=14)
ax1.tick_params(axis='y', labelcolor='b')
ax1.grid(True, alpha=0.3)

# Create second y-axis for wind direction
ax2 = ax1.twinx()
ax2.plot(wind_df['Timestamp'], wind_df['Wind_Direction (°)'], 'r-', marker='x', label='Wind Direction')
ax2.set_ylabel('Wind Direction (degrees)', color='r', fontsize=14)
ax2.tick_params(axis='y', labelcolor='r')
ax2.set_ylim([0, 360])

# Add horizontal lines at cardinal directions
ax2.axhline(y=0, color='gray', linestyle='--', alpha=0.5)   # North
ax2.axhline(y=90, color='gray', linestyle='--', alpha=0.5)  # East
ax2.axhline(y=180, color='gray', linestyle='--', alpha=0.5) # South
ax2.axhline(y=270, color='gray', linestyle='--', alpha=0.5) # West

# Add text labels for cardinal directions
ax2.text(wind_df['Timestamp'].iloc[0], 0, 'N', color='gray')
ax2.text(wind_df['Timestamp'].iloc[0], 90, 'E', color='gray')
ax2.text(wind_df['Timestamp'].iloc[0], 180, 'S', color='gray')
ax2.text(wind_df['Timestamp'].iloc[0], 270, 'W', color='gray')

# Add title and legend
plt.title('Wind Speed and Direction Over Time', fontsize=16)
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Create a simple visualization of sensor locations
fig, ax = plt.subplots(figsize=(12, 10))

# Get a single timestamp to visualize
# Index 23 is assumed to be the noon reading; this depends on the sampling interval, so verify it against the data
timestamp = methane_gdf['Timestamp'].unique()[23]
timestamped_data = methane_gdf[methane_gdf['Timestamp'] == timestamp]

# Create scatter plot with sensor locations
scatter = ax.scatter(
    timestamped_data.geometry.x,
    timestamped_data.geometry.y,
    c=timestamped_data['Methane_Concentration (ppm)'],
    cmap='YlOrRd',
    s=100,
    edgecolor='k'
)

# Add sensor labels
for idx, row in timestamped_data.iterrows():
    ax.annotate(
        row['Sensor_ID'], 
        (row.geometry.x, row.geometry.y),
        xytext=(5, 5),
        textcoords="offset points",
        fontsize=12,
        fontweight='bold'
    )

# Add colorbar
cbar = plt.colorbar(scatter)
cbar.set_label('Methane Concentration (ppm)', fontsize=12)

# Add title and axis labels
plt.title(f'Methane Sensor Locations and Concentrations\n{timestamp}', fontsize=16)
plt.xlabel('Longitude', fontsize=14)
plt.ylabel('Latitude', fontsize=14)
plt.grid(True, alpha=0.3)
plt.axis('equal')
plt.tight_layout()
plt.show()

## 5. Merge Data and Explore Relationships

Now let's merge the methane and wind datasets to explore relationships between wind patterns and methane concentrations.

In [None]:
# Merge datasets based on timestamp
merged_gdf = methane_gdf.merge(wind_df, on='Timestamp')

print(f"Merged dataset shape: {merged_gdf.shape}")
display(merged_gdf.head())
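
An inner merge on `Timestamp` silently drops methane readings that have no wind record at exactly the same time. A minimal sketch of how this could be diagnosed with the `how`, `indicator`, and `validate` parameters of `DataFrame.merge`; the two small frames are toy stand-ins with illustrative values, not the real data.

```python
import pandas as pd

# Toy stand-ins for methane_df and wind_df (values are illustrative only)
demo_methane = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2025-02-10 11:00', '2025-02-10 11:00',
                                 '2025-02-10 12:00', '2025-02-10 13:00']),
    'Sensor_ID': ['S1', 'S2', 'S1', 'S1'],
    'Methane_Concentration (ppm)': [2.1, 2.4, 2.9, 2.6],
})
demo_wind = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2025-02-10 11:00', '2025-02-10 12:00']),
    'Wind_Speed (m/s)': [3.0, 4.5],
})

# how='left' keeps every methane row, indicator=True adds a '_merge' column,
# and validate='many_to_one' raises if a wind timestamp appears more than once
checked = demo_methane.merge(demo_wind, on='Timestamp', how='left',
                             indicator=True, validate='many_to_one')
unmatched = int((checked['_merge'] == 'left_only').sum())
print(f"{unmatched} methane row(s) have no wind record")
```

If the real datasets show unmatched rows, the timestamps may need rounding to a common interval before merging.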

In [None]:
# Add hour of day column for analysis
merged_gdf['Hour'] = merged_gdf['Timestamp'].dt.hour

# Visualize methane concentrations by hour of day
plt.figure(figsize=(12, 6))
sns.boxplot(x='Hour', y='Methane_Concentration (ppm)', data=merged_gdf)
plt.title('Methane Concentration by Hour of Day', fontsize=16)
plt.xlabel('Hour of Day', fontsize=14)
plt.ylabel('Methane Concentration (ppm)', fontsize=14)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Analyze relationship between wind direction and methane concentration
# Bin to the nearest 10°; the modulo folds 360° back into the 0° (north) bin
merged_gdf['Wind_Direction_Bin'] = (np.round(merged_gdf['Wind_Direction (°)'] / 10) * 10) % 360

plt.figure(figsize=(14, 6))
sns.boxplot(x='Wind_Direction_Bin', y='Methane_Concentration (ppm)', data=merged_gdf)
plt.title('Methane Concentration by Wind Direction', fontsize=16)
plt.xlabel('Wind Direction (binned to nearest 10°)', fontsize=14)
plt.ylabel('Methane Concentration (ppm)', fontsize=14)
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
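
Wind direction is a circular quantity, so ordinary averages can mislead: the arithmetic mean of 350° and 10° is 180°, although both winds are nearly northerly. If a summary direction per group is ever needed, a vector-based mean is one option. The sketch below assumes unweighted directions; `circular_mean_deg` is a hypothetical helper.

```python
import numpy as np


def circular_mean_deg(directions_deg):
    """Mean of compass directions in degrees, returned in the range [0, 360)."""
    rad = np.radians(np.asarray(directions_deg, dtype=float))
    # Average the unit vectors, then convert the resulting vector back to an angle
    mean_rad = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return float(np.degrees(mean_rad) % 360)


demo_dirs = [350, 10]
arithmetic = float(np.mean(demo_dirs))    # 180.0, which points south
circular = circular_mean_deg(demo_dirs)   # lands at north, up to floating-point error
print(arithmetic, circular)
```

A speed-weighted variant would multiply each unit vector by its wind speed before averaging.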

In [None]:
# Analyze relationship between wind speed and methane concentration
plt.figure(figsize=(10, 6))
plt.scatter(
    merged_gdf['Wind_Speed (m/s)'], 
    merged_gdf['Methane_Concentration (ppm)'],
    alpha=0.6
)

# Add trend line
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(
    merged_gdf['Wind_Speed (m/s)'], 
    merged_gdf['Methane_Concentration (ppm)']
)
x = np.array([merged_gdf['Wind_Speed (m/s)'].min(), merged_gdf['Wind_Speed (m/s)'].max()])
# The '+' format flag prints the intercept's sign, avoiding "+-" for negative intercepts
plt.plot(x, intercept + slope*x, 'r--', label=f'Trend: y={slope:.3f}x{intercept:+.3f}, R²={r_value**2:.3f}')

plt.title('Wind Speed vs Methane Concentration', fontsize=16)
plt.xlabel('Wind Speed (m/s)', fontsize=14)
plt.ylabel('Methane Concentration (ppm)', fontsize=14)
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
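
The regression above pools all sensors, which can mask opposite trends at individual sensors (for example, one sensor upwind and one downwind of a source). A minimal sketch of a per-sensor comparison follows; the demo frame uses made-up values chosen so that the pooled correlation vanishes while each sensor shows a perfect trend.

```python
import pandas as pd

SPEED = 'Wind_Speed (m/s)'
PPM = 'Methane_Concentration (ppm)'

# Made-up values: S1 falls with wind speed, S2 rises with it
demo = pd.DataFrame({
    'Sensor_ID': ['S1'] * 3 + ['S2'] * 3,
    SPEED: [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    PPM: [3.0, 2.0, 1.0, 5.0, 6.0, 7.0],
})

# Pearson correlation over all rows, then separately for each sensor
pooled_r = demo[SPEED].corr(demo[PPM])
per_sensor_r = {sensor: grp[SPEED].corr(grp[PPM])
                for sensor, grp in demo.groupby('Sensor_ID')}
print(pooled_r, per_sensor_r)
```

Applied to `merged_gdf`, the same grouping would show whether the pooled R² hides sensor-specific behavior.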

In [None]:
# Create a heatmap showing methane concentration by time and sensor
pivot_df = merged_gdf.pivot_table(
    index='Hour', 
    columns='Sensor_ID', 
    values='Methane_Concentration (ppm)', 
    aggfunc='mean'
)

plt.figure(figsize=(12, 8))
sns.heatmap(pivot_df, annot=True, cmap='YlOrRd', fmt='.2f', linewidths=0.5)
plt.title('Average Methane Concentration by Hour and Sensor', fontsize=16)
plt.xlabel('Sensor ID', fontsize=14)
plt.ylabel('Hour of Day', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# Create an interactive folium map for a specific timestamp
# Choose noon as it typically has the highest methane concentrations
noon_timestamp = pd.Timestamp('2025-02-10 12:00:00')
noon_data = methane_gdf[methane_gdf['Timestamp'] == noon_timestamp]
noon_wind = wind_df[wind_df['Timestamp'] == noon_timestamp].iloc[0]

# Create map centered on the mean location of sensors
m = folium.Map(
    location=[noon_data.geometry.y.mean(), noon_data.geometry.x.mean()],
    zoom_start=15,
    tiles='OpenStreetMap'
)

# Create a color map for methane concentration
vmin = methane_gdf['Methane_Concentration (ppm)'].min()
vmax = methane_gdf['Methane_Concentration (ppm)'].max()
colormap = cm.LinearColormap(
    colors=['green', 'yellow', 'orange', 'red'],
    vmin=vmin,
    vmax=vmax,
    caption='Methane Concentration (ppm)'
)

# Add colormap to map
m.add_child(colormap)

# Add markers for each sensor
for idx, row in noon_data.iterrows():
    color = colormap(row['Methane_Concentration (ppm)'])
    
    popup_html = f"""
    <div style="width: 200px">
        <h4>Sensor: {row['Sensor_ID']}</h4>
        <b>Methane:</b> {row['Methane_Concentration (ppm)']:.2f} ppm<br>
        <b>Time:</b> {row['Timestamp'].strftime('%Y-%m-%d %H:%M')}<br>
        <b>Location:</b> ({row.geometry.y:.5f}, {row.geometry.x:.5f})
    </div>
    """
    
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=10,
        color='black',
        weight=1,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=folium.Popup(popup_html, max_width=300),
        tooltip=f"Sensor {row['Sensor_ID']}: {row['Methane_Concentration (ppm)']:.2f} ppm"
    ).add_to(m)

# Add wind vector information
wind_speed = noon_wind['Wind_Speed (m/s)']
wind_direction = noon_wind['Wind_Direction (°)']
u = noon_wind['U']  # East-West component
v = noon_wind['V']  # North-South component

# Add wind vector legend
legend_lat = noon_data.geometry.y.min() + (noon_data.geometry.y.max() - noon_data.geometry.y.min()) * 0.05
legend_lon = noon_data.geometry.x.min() + (noon_data.geometry.x.max() - noon_data.geometry.x.min()) * 0.05

folium.Marker(
    location=[legend_lat, legend_lon],
    icon=folium.DivIcon(
        icon_size=(200, 36),
        icon_anchor=(0, 0),
        html=f'<div style="font-size: 12pt;">Wind: {wind_speed:.1f} m/s, {wind_direction:.0f}°</div>'
    )
).add_to(m)

# Display the map
m
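
The map cell reads `u` and `v` but only displays the wind as text. One way to show the direction graphically is to draw a short segment from a start point to the position an air parcel would reach after a fixed travel time. The sketch below computes that endpoint with a local flat-earth approximation; `arrow_endpoint`, the 60-second travel time, and the coordinates are illustrative assumptions.

```python
import math


def arrow_endpoint(lat, lon, u, v, seconds=60.0):
    """Return (lat, lon) reached after `seconds` of travel at velocity (u, v) in m/s.

    Uses a local flat-earth approximation, adequate over a few hundred metres.
    """
    metres_per_deg_lat = 111_320.0  # approximate length of one degree of latitude
    metres_per_deg_lon = 111_320.0 * math.cos(math.radians(lat))
    end_lat = lat + (v * seconds) / metres_per_deg_lat
    end_lon = lon + (u * seconds) / metres_per_deg_lon
    return end_lat, end_lon


# Illustrative values only: a 5 m/s wind blowing due east from a made-up point
start = (31.9, -102.1)
end = arrow_endpoint(*start, u=5.0, v=0.0)
print(start, end)
```

In the map cell, a call such as `folium.PolyLine(locations=[start, end], color='blue').add_to(m)` could then draw the segment, with `start` set to the legend position and `u`, `v` taken from `noon_wind`.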

## 6. Conclusions from Exploratory Analysis

Based on our initial exploration, we can draw the following conclusions:

1. **Temporal Patterns**: 
   - Methane concentrations show clear temporal patterns, with higher values generally observed around midday.
   - Different sensors show varying patterns, suggesting spatial variability in methane emission sources.

2. **Wind Influence**:
   - Wind direction appears to have a relationship with methane concentrations, with certain wind directions associated with higher readings.
   - Wind speed shows only a modest linear relationship with methane concentrations, as measured by the R² of the trend line.

3. **Spatial Distribution**:
   - Methane concentrations vary considerably across sensor locations, suggesting localized emission sources.
   - The spatial pattern combined with wind data could help identify potential emission sources.

## 7. Next Steps

For further analysis, we should:

1. Use spatial interpolation methods to estimate methane concentrations between sensor locations.
2. Apply clustering algorithms to identify high-risk zones.
3. Develop predictive models that incorporate wind data to forecast methane dispersion.
4. Create a dashboard for interactive exploration of the data.
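
As a starting point for the first item, inverse distance weighting (IDW) is one simple interpolation method; kriging and others are alternatives. The sketch below is a minimal NumPy version under stated assumptions: coordinates are already in a projected CRS (metres rather than degrees), and `idw_interpolate`, its `power` parameter, and the demo values are hypothetical.

```python
import numpy as np


def idw_interpolate(xy_known, values, xy_query, power=2.0):
    """Inverse-distance-weighted estimate at each query point.

    Coordinates are assumed to be in a projected CRS (metres), not raw degrees.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances with shape (n_query, n_known)
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at sensor locations
    weights = 1.0 / dist ** power
    return (weights * values).sum(axis=1) / weights.sum(axis=1)


# Two made-up sensors 100 m apart; query the midpoint and the first sensor itself
sensors_xy = [(0.0, 0.0), (100.0, 0.0)]
ppm = [2.0, 4.0]
estimates = idw_interpolate(sensors_xy, ppm, [(50.0, 0.0), (0.0, 0.0)])
print(estimates)
```

The midpoint estimate is the plain average of the two readings, and a query at a sensor location returns that sensor's own value, which are the two properties to expect from IDW.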