# Data Exploration for Biochar Application in Brazil

In this notebook, we explore and visualize the raw soil and biomass data for Brazil. The goal is to understand the structure of the data and identify the features that may influence how suitable a location is for biochar application.

In [1]:
# Import necessary libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')  # set_theme is the current name; sns.set is kept as an alias

In [2]:
# Load soil data
soil_data = pd.read_csv('../data/raw/soil_data.csv')
print(soil_data.head())

In [3]:
# Load biomass data
biomass_data = pd.read_csv('../data/raw/biomass_data.csv')
print(biomass_data.head())
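
Before plotting, it can help to check the structure that `head()` does not show: column types and missing values. The following is a minimal sketch under stated assumptions. `sample_soil`, its column names, and the helper `summarize_structure` are hypothetical stand-ins, since the real columns of `soil_data.csv` are not listed in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for soil_data; the real column names may differ.
sample_soil = pd.DataFrame({
    'ph': [5.2, 6.1, None, 4.8],
    'organic_carbon': [1.4, None, 2.0, 1.1],
    'texture': ['clay', 'sand', 'loam', None],
})

def summarize_structure(df):
    """Return one row per column with its dtype, missing count, and missing share."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'n_missing': df.isna().sum(),
        'pct_missing': (df.isna().mean() * 100).round(1),
    })

structure = summarize_structure(sample_soil)
print(structure)
```

The same helper could be called on `soil_data` and `biomass_data` once they are loaded.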

In [4]:
# Visualize soil properties distribution
plt.figure(figsize=(12, 6))
sns.histplot(soil_data['ph'], bins=30, kde=True)
plt.title('Distribution of Soil pH')
plt.xlabel('pH')
plt.ylabel('Frequency')
plt.show()

In [5]:
# Correlation heatmap of soil properties
plt.figure(figsize=(10, 8))
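# numeric_only=True skips text columns, which would otherwise raise an error in pandas >= 2.0
correlation_matrix = soil_data.corr(numeric_only=True)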
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap of Soil Properties')
plt.show()
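
A heatmap gets hard to read as the number of soil properties grows, so a ranked list of the strongest pairs can complement it. This is a rough sketch, not part of the original analysis. `sample_soil`, its columns, and the helper `strongest_pairs` are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical soil table; column names and values are illustrative only.
sample_soil = pd.DataFrame({
    'ph': [4.5, 5.0, 5.5, 6.0, 6.5],
    'cec': [3.0, 4.1, 4.9, 6.2, 7.0],
    'sand_pct': [70, 62, 55, 41, 35],
    'texture': ['sand', 'sand', 'loam', 'clay', 'clay'],  # non-numeric, excluded
})

def strongest_pairs(df, top_n=3):
    """Rank unique column pairs by absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Sort by magnitude so strong negative correlations rank alongside positive ones.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)

top_pairs = strongest_pairs(sample_soil)
print(top_pairs)
```

Sorting on the absolute value is a deliberate choice here: a strongly negative relationship is as informative for feature selection as a strongly positive one.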

In [6]:
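# Map visualization of biomass data
# Coordinates are assumed to be WGS84 longitude/latitude (EPSG:4326)
biomass_gdf = gpd.GeoDataFrame(
    biomass_data,
    geometry=gpd.points_from_xy(biomass_data.longitude, biomass_data.latitude),
    crs='EPSG:4326',
)
# Note: gpd.datasets was removed in GeoPandas 1.0; on newer versions, read a
# locally downloaded Natural Earth file instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Draw both layers on the same axes so the figure size applies to the map
fig, ax = plt.subplots(figsize=(15, 10))
world.boundary.plot(ax=ax)
biomass_gdf.plot(marker='o', color='red', markersize=5, ax=ax)
ax.set_title('Biomass Locations in Brazil')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
plt.show()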

## Spatial Encoding with H3

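To support uniform spatial analysis across Brazil, we use **Uber's H3 hexagonal spatial indexing system**. H3 partitions the Earth's surface into a hierarchical grid of hexagonal cells (plus 12 pentagons at each resolution) whose areas are approximately, though not exactly, equal at a given resolution, which simplifies spatial aggregation and visualization.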

Below, we visualize the biomass data encoded into H3 cells to preview the spatial resolution of our grid system.
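
To get a feel for what a given resolution means on the ground, the sketch below estimates the number of cells and the mean cell area per resolution. It relies on the H3 cell-count formula `2 + 120 * 7**r` and assumes a spherical Earth with an authalic radius of 6371.0072 km. The function names are illustrative, and individual cells deviate from this mean.

```python
import math

# Authalic (equal-area) Earth radius in km; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0072

def h3_cell_count(resolution):
    """Total H3 cells at a resolution: 2 + 120 * 7**r (12 of them are pentagons)."""
    return 2 + 120 * 7 ** resolution

def mean_cell_area_km2(resolution):
    """Earth's surface area divided evenly over all cells at this resolution."""
    earth_area = 4 * math.pi * EARTH_RADIUS_KM ** 2
    return earth_area / h3_cell_count(resolution)

for res in (5, 6, 7):
    print(f"res {res}: {h3_cell_count(res):>12,} cells, "
          f"~{mean_cell_area_km2(res):8.2f} km² each")
```

Each step up in resolution shrinks the mean cell area by roughly a factor of seven, so the choice between resolutions 5 and 7 trades spatial detail against the number of cells to process.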

In [7]:
# Spatial Encoding with H3 (Preview)
import h3
from shapely.geometry import Polygon

# Choose H3 resolution for preview (5–7 common for national scale)
H3_RESOLUTION = 6

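# Encode biomass points into H3 cells
# Note: geo_to_h3 and h3_to_geo_boundary are h3-py v3 names; v4 renamed them to
# latlng_to_cell and cell_to_boundary
biomass_data['h3_index'] = biomass_data.apply(
    lambda row: h3.geo_to_h3(row['latitude'], row['longitude'], H3_RESOLUTION),
    axis=1,
)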

# Create H3 polygons for visualization
h3_polygons = []
for h in biomass_data['h3_index'].unique():
    boundary = h3.h3_to_geo_boundary(h, geo_json=True)
    poly = Polygon(boundary)
    h3_polygons.append({'h3_index': h, 'geometry': poly})

h3_gdf = gpd.GeoDataFrame(h3_polygons, geometry='geometry', crs='EPSG:4326')

# Plot H3 grid overlay with biomass points
fig, ax = plt.subplots(figsize=(10, 10))
h3_gdf.boundary.plot(ax=ax, linewidth=0.5, color='gray')
biomass_gdf.plot(ax=ax, color='red', markersize=3)
plt.title(f'H3 Grid (Resolution {H3_RESOLUTION}) over Biomass Points')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
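
Once every point carries an `h3_index`, aggregating per cell is a plain group-by. The sketch below shows one possible shape for that step. The `h3_index` values are placeholder labels rather than real H3 cells, and `biomass_tonnes` is an assumed column name that may not exist in `biomass_data.csv`.

```python
import pandas as pd

# Hypothetical rows: 'h3_index' values are placeholder labels, not real H3 cells,
# and 'biomass_tonnes' is an assumed column name.
sample_biomass = pd.DataFrame({
    'h3_index': ['cell_a', 'cell_a', 'cell_b', 'cell_c', 'cell_c', 'cell_c'],
    'biomass_tonnes': [10.0, 14.0, 7.5, 3.0, 5.0, 4.0],
})

# One row per cell: number of points plus total and mean of the assumed column.
cell_summary = (
    sample_biomass
    .groupby('h3_index')
    .agg(
        n_points=('biomass_tonnes', 'size'),
        total_biomass=('biomass_tonnes', 'sum'),
        mean_biomass=('biomass_tonnes', 'mean'),
    )
    .reset_index()
)
print(cell_summary)
```

A table like `cell_summary` could then be merged onto `h3_gdf` by `h3_index` to color each hexagon by its aggregated value.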

## Conclusion

In this notebook, we explored the raw soil and biomass data. We visualized the distribution of soil pH, examined correlations between soil properties, mapped biomass locations across Brazil, and overlaid an H3 hexagonal grid to preview the spatial structure for subsequent suitability analysis.