# Data Exploration for Biochar Application in Brazil

In this notebook, we will explore and visualize the raw data related to soil and biomass in Brazil. The goal is to understand the data structure and identify key features that may influence biochar application suitability.

In [1]:
```python
# Import necessary libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name for sns.set)
sns.set_theme(style='whitegrid')
```

In [2]:
```python
# Load soil data
soil_data = pd.read_csv('../data/raw/soil_data.csv')
print(soil_data.head())
```
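`head()` shows only the first five rows. Before plotting, it can also help to check each column's type and how many values are missing. The helper below is a minimal sketch. `profile_columns` is a hypothetical name. The toy DataFrame stands in for `soil_data`, and its columns other than `ph` are invented for illustration.

```python
import numpy as np
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing-value count and missing fraction per column (hypothetical helper)."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'n_missing': df.isna().sum(),
        'frac_missing': df.isna().mean().round(3),
    })


# Toy stand-in for soil_data; column names other than 'ph' are hypothetical
toy = pd.DataFrame({
    'ph': [5.1, np.nan, 6.3, 4.8],
    'organic_carbon': [1.2, 0.8, np.nan, np.nan],
    'texture_class': ['clay', 'sand', 'loam', None],
})
print(profile_columns(toy))
```

On the real data, the call would be `profile_columns(soil_data)`. Columns with a high `frac_missing` may need imputation or exclusion before the correlation step.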

In [3]:
```python
# Load biomass data
biomass_data = pd.read_csv('../data/raw/biomass_data.csv')
print(biomass_data.head())
```
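The map below uses the `longitude` and `latitude` columns as point coordinates. Rows with missing or out-of-range values cannot be placed correctly, so it can be worth flagging them first. This sketch assumes the coordinates are decimal degrees (WGS84). `valid_coordinate_mask` is a hypothetical helper, and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def valid_coordinate_mask(df: pd.DataFrame,
                          lon_col: str = 'longitude',
                          lat_col: str = 'latitude') -> pd.Series:
    """True where both coordinates are present and within WGS84 degree ranges."""
    # Coerce non-numeric entries (e.g. stray text) to NaN; NaN fails .between()
    lon = pd.to_numeric(df[lon_col], errors='coerce')
    lat = pd.to_numeric(df[lat_col], errors='coerce')
    return lon.between(-180, 180) & lat.between(-90, 90)


# Hypothetical rows: valid, valid, bad longitude, missing longitude, bad latitude
toy = pd.DataFrame({
    'longitude': [-47.9, -60.0, 200.0, np.nan, -43.2],
    'latitude': [-15.8, -3.1, -10.0, -20.0, -95.0],
})
mask = valid_coordinate_mask(toy)
print(mask.tolist())  # [True, True, False, False, False]
```

A possible follow-up is `biomass_data = biomass_data[valid_coordinate_mask(biomass_data)]`. This check does not catch swapped latitude/longitude columns, which the map itself tends to reveal.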

In [4]:
```python
# Visualize soil properties distribution
plt.figure(figsize=(12, 6))
sns.histplot(soil_data['ph'], bins=30, kde=True)
plt.title('Distribution of Soil pH')
plt.xlabel('pH')
plt.ylabel('Frequency')
plt.show()
```
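The histogram shows the overall shape of the pH distribution. A table of sample counts per pH class can complement it. The breakpoints below (5.5 and 7.0) are illustrative assumptions, not values from this project, and should be replaced with whatever agronomic thresholds the analysis adopts. `ph_class_counts`, `PH_BINS` and `PH_LABELS` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Illustrative breakpoints only (assumption): [0, 5.5), [5.5, 7.0), [7.0, inf)
PH_BINS = [0.0, 5.5, 7.0, np.inf]
PH_LABELS = ['acidic (<5.5)', 'slightly acidic (5.5-7.0)', 'neutral/alkaline (>=7.0)']


def ph_class_counts(ph: pd.Series) -> pd.Series:
    """Count samples per pH class; right=False makes each bin include its lower edge."""
    classes = pd.cut(ph, bins=PH_BINS, labels=PH_LABELS, right=False)
    # value_counts drops NaN (missing pH) by default
    return classes.value_counts(sort=False)


toy_ph = pd.Series([4.8, 5.5, 6.9, 7.0, 8.2, np.nan])
print(ph_class_counts(toy_ph))
```

On the real data, the call would be `ph_class_counts(soil_data['ph'])`.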

In [5]:
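```python
# Correlation heatmap of soil properties
# numeric_only=True keeps only numeric columns; since pandas 2.0, corr() raises
# an error when the DataFrame also contains text columns
plt.figure(figsize=(10, 8))
correlation_matrix = soil_data.corr(numeric_only=True)
# Fix the color scale to [-1, 1] so the neutral color corresponds to r = 0
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap of Soil Properties')
plt.show()
```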
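With many soil properties, the annotated heatmap can become hard to read. The sketch below lists only the strongly correlated pairs, which are candidates for redundancy in later modeling. `top_correlated_pairs` is a hypothetical helper. Its default threshold of 0.7 is an arbitrary assumption, and the toy data is invented.

```python
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """List each pair of numeric columns with |Pearson r| >= threshold, strongest first."""
    corr = df.corr(numeric_only=True)
    cols = list(corr.columns)
    rows = []
    # Walk the upper triangle only (j > i) so each pair is reported once and the diagonal is skipped
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:  # NaN (e.g. from a constant column) fails this test
                rows.append((cols[i], cols[j], r))
    result = pd.DataFrame(rows, columns=['var_1', 'var_2', 'r'])
    return result.sort_values('r', key=lambda s: s.abs(), ascending=False, ignore_index=True)


# Toy data: b is exactly 2*a, c is negatively related to both; 'site' is a text column
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 3, 4, 1, 2],
    'site': ['s1', 's2', 's3', 's4', 's5'],
})
print(top_correlated_pairs(toy, threshold=0.9))
```

Here only the pair (`a`, `b`) passes the 0.9 threshold. The text column `site` is ignored because of `numeric_only=True`.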

In [6]:
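```python
# Map visualization of biomass data
# Build point geometries from the longitude/latitude columns (assumed WGS84 degrees)
biomass_gdf = gpd.GeoDataFrame(
    biomass_data,
    geometry=gpd.points_from_xy(biomass_data.longitude, biomass_data.latitude),
    crs='EPSG:4326',
)

# gpd.datasets was removed in GeoPandas 1.0; read Natural Earth 1:110m country borders directly
world = gpd.read_file('https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip')
brazil = world[world['ADMIN'] == 'Brazil']

# Draw both layers on one Axes so the requested figure size applies to the map
fig, ax = plt.subplots(figsize=(15, 10))
brazil.boundary.plot(ax=ax, color='black', linewidth=0.8)
biomass_gdf.plot(ax=ax, marker='o', color='red', markersize=5)
ax.set_title('Biomass Locations in Brazil')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
plt.show()
```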
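When many sampling points overlap, small markers can hide how densely an area is sampled. One rough check is to count points per grid cell. This is a minimal sketch assuming a 1-degree cell size. `grid_counts` and its `cell_deg` parameter are hypothetical, and the toy coordinates are made up. Degree cells are not equal-area, so the counts are only an approximate density indicator.

```python
import numpy as np
import pandas as pd


def grid_counts(df: pd.DataFrame, cell_deg: float = 1.0,
                lon_col: str = 'longitude', lat_col: str = 'latitude') -> pd.DataFrame:
    """Count points per cell_deg x cell_deg cell, keyed by the cell's south-west corner."""
    cells = pd.DataFrame({
        'cell_lon': np.floor(df[lon_col] / cell_deg) * cell_deg,
        'cell_lat': np.floor(df[lat_col] / cell_deg) * cell_deg,
    })
    # groupby drops rows with missing coordinates by default
    return (cells.groupby(['cell_lon', 'cell_lat'])
                 .size()
                 .reset_index(name='n_points')
                 .sort_values('n_points', ascending=False, ignore_index=True))


# Hypothetical points: three fall in one cell, two in another
toy = pd.DataFrame({
    'longitude': [-47.9, -47.2, -47.5, -60.1, -60.9],
    'latitude': [-15.8, -15.1, -15.5, -3.2, -3.9],
})
print(grid_counts(toy))
```

On the real data, `grid_counts(biomass_data)` lists the most heavily sampled cells first.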

## Conclusion

In this notebook, we explored the raw soil and biomass data. We visualized the distribution of soil pH, examined correlations between soil properties, and mapped biomass locations across Brazil. These insights will guide further analysis and modeling for biochar application suitability.