# Basic Temperature Data Extraction with GEEMAP

This tutorial shows how to extract ERA5-Land temperature data with Google Earth Engine and GEEMAP for use in health research.

## Learning Objectives
- Initialize Google Earth Engine
- Define study locations and time periods
- Extract temperature data from ERA5-Land
- Export data for analysis

## Prerequisites
- Google Earth Engine account
- Basic Python knowledge

## Step 1: Import Libraries and Initialize

In [None]:
# Import required libraries
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

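# Initialize Google Earth Engine
# (recent API versions may also need a Cloud project, e.g.
#  ee.Initialize(project='your-project-id'), where the ID is a placeholder)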
try:
    ee.Initialize()
    print("✅ Google Earth Engine initialized successfully!")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Please run 'earthengine authenticate' in your terminal first")

## Step 2: Define Your Study Area

Choose coordinates for your location of interest. You can use:
- Google Maps to find coordinates
- GPS coordinates from field work
- Administrative boundaries

In [None]:
# Define your study location (example: Cape Town, South Africa)
study_lat = -33.9249
study_lon = 18.4241
location_name = "Cape Town"

# Create a point geometry
study_point = ee.Geometry.Point([study_lon, study_lat])

# Create a buffer area around the point (10km radius)
buffer_km = 10
study_area = study_point.buffer(buffer_km * 1000)  # Convert km to meters

print(f"📍 Study Location: {location_name}")
print(f"📐 Coordinates: {study_lat}, {study_lon}")
print(f"🔄 Buffer radius: {buffer_km} km")
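
Earth Engine point geometries take coordinates in longitude, latitude order, which is the reverse of how they are usually copied from Google Maps. The sketch below uses a hypothetical helper, `to_lon_lat` (not part of `ee` or `geemap`), to reject out-of-range values before a geometry is built. It will not catch every swapped pair, only those where the swap pushes a value out of range.

```python
def to_lon_lat(lat, lon):
    """Validate a latitude/longitude pair and return it as [lon, lat].

    Hypothetical helper for illustration; it is not part of ee or geemap.
    """
    if not -90 <= lat <= 90:
        raise ValueError(f"Latitude {lat} is outside [-90, 90]")
    if not -180 <= lon <= 180:
        raise ValueError(f"Longitude {lon} is outside [-180, 180]")
    # Earth Engine point geometries take x (longitude) first, then y (latitude)
    return [lon, lat]


coords = to_lon_lat(-33.9249, 18.4241)
print(coords)  # [18.4241, -33.9249]
```

The returned list can be passed directly to `ee.Geometry.Point`, as in the cell above.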

## Step 3: Visualize Your Study Area

In [None]:
# Create an interactive map
Map = geemap.Map(center=[study_lat, study_lon], zoom=10)

# Add the study area to the map
Map.addLayer(study_area, {'color': 'red'}, 'Study Area')
Map.addLayer(study_point, {'color': 'blue'}, 'Center Point')

# Display the map
Map

## Step 4: Define Time Period

Set the start and end dates for your climate data extraction.

In [None]:
# Define time period for analysis
start_date = '2020-01-01'
end_date = '2022-12-31'

# Calculate the number of days
start_dt = datetime.strptime(start_date, '%Y-%m-%d')
end_dt = datetime.strptime(end_date, '%Y-%m-%d')
num_days = (end_dt - start_dt).days + 1

print(f"📅 Study Period: {start_date} to {end_date}")
print(f"⏱️  Duration: {num_days} days (~{num_days/365.25:.1f} years)")
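
Note that the day count above treats the period as inclusive of both dates, whereas Earth Engine's `filterDate` treats its end date as exclusive. A minimal sketch of deriving an exclusive end date from an inclusive one, using only the standard library (`exclusive_end` is a hypothetical helper name):

```python
from datetime import datetime, timedelta


def exclusive_end(end_date):
    """Return the day after end_date (YYYY-MM-DD) as a YYYY-MM-DD string."""
    end_dt = datetime.strptime(end_date, '%Y-%m-%d')
    return (end_dt + timedelta(days=1)).strftime('%Y-%m-%d')


filter_end = exclusive_end('2022-12-31')
print(filter_end)  # 2023-01-01
```

Passing a value like `filter_end` as the second argument of `filterDate` keeps the last day of the study period in the collection.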

## Step 5: Load Climate Data

We'll use the ERA5-Land daily aggregated dataset (`ECMWF/ERA5_LAND/DAILY_AGGR`), a reanalysis product on a 0.1° grid (roughly 11 km) that reports temperatures in Kelvin.

In [None]:
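# Load ERA5-Land daily temperature data
# filterDate treats the end date as exclusive, so advance it by one day
# to include end_date itself
era5_collection = ee.ImageCollection('ECMWF/ERA5_LAND/DAILY_AGGR') \
    .filterDate(start_date, ee.Date(end_date).advance(1, 'day')) \
    .filterBounds(study_area)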

print(f"📊 Found {era5_collection.size().getInfo()} daily temperature records")

# Function to convert temperature from Kelvin to Celsius
def convert_temperature(image):
    # Select the daily maximum and daily mean bands (the mean band is named
    # 'temperature_2m' in this dataset), rename them, and convert K to °C
    temp_c = image.select(
        ['temperature_2m_max', 'temperature_2m'],
        ['tmax_celsius', 'tmean_celsius']
    ).subtract(273.15)

    # copyProperties returns a generic ee.Element, so cast back to ee.Image
    return ee.Image(temp_c.copyProperties(image, ['system:time_start']))

# Apply temperature conversion
temperature_data = era5_collection.map(convert_temperature)

print("✅ Temperature data loaded and converted to Celsius")
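
The conversion applied to each image is a fixed offset, Celsius = Kelvin − 273.15. The plain-Python sketch below mirrors that arithmetic on a few made-up values and adds a rough check for series that were never converted. The 150 threshold and both helper names are assumptions for illustration.

```python
KELVIN_OFFSET = 273.15


def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


def looks_like_kelvin(values):
    """Heuristic (assumed threshold): near-surface air temperatures above 150
    are almost certainly still in Kelvin rather than Celsius."""
    return all(v > 150 for v in values)


sample_kelvin = [288.15, 300.65]  # hypothetical values, not real ERA5-Land output
sample_celsius = [round(kelvin_to_celsius(k), 2) for k in sample_kelvin]

print(sample_celsius)                     # [15.0, 27.5]
print(looks_like_kelvin(sample_kelvin))   # True
print(looks_like_kelvin(sample_celsius))  # False
```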

## Step 6: Extract Temperature Time Series

The function below extracts daily temperature values at the study point with `getRegion`, then converts the result to a pandas DataFrame.

In [None]:
def extract_daily_temperature(collection, point, start_date, end_date):
    """
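    Extract daily temperature time series from ERA5-Land data
    
    Parameters:
    - collection: ERA5-Land image collection, already filtered by date
    - point: Geographic point (ee.Geometry.Point)
    - start_date: Start date string (YYYY-MM-DD); kept for reference, not used for filtering here
    - end_date: End date string (YYYY-MM-DD); kept for reference, not used for filtering here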
    
    Returns:
    - pandas DataFrame with daily temperature data
    """
    
    print("🔄 Extracting temperature time series...")
    
    # Get the time series data
    time_series = collection.getRegion(point, 1000).getInfo()
    
    # Convert to DataFrame
    header = time_series[0]
    data_rows = time_series[1:]
    
    # Create DataFrame
    df = pd.DataFrame(data_rows, columns=header)
    
    # Convert time to datetime
    df['datetime'] = pd.to_datetime(df['time'], unit='ms')
    df['date'] = df['datetime'].dt.date
    
    # Clean and organize data
    temp_df = df[['date', 'tmax_celsius', 'tmean_celsius']].copy()
    temp_df = temp_df.dropna()  # Remove any missing values
    temp_df = temp_df.sort_values('date').reset_index(drop=True)
    
    print(f"✅ Extracted {len(temp_df)} daily temperature records")
    
    return temp_df

# Extract the temperature data (this may take a minute or two)
daily_temps = extract_daily_temperature(temperature_data, study_point, start_date, end_date)

# Display first few records
print("\n📋 First 10 records:")
print(daily_temps.head(10))
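
To make the shape of the `getRegion` result concrete: `getInfo()` returns a list of lists, with a header row (`id`, `longitude`, `latitude`, `time`, then one entry per band) followed by one row per image. The sketch below parses a small mock table with the standard library only. The values are invented for illustration and `region_to_records` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Hypothetical values shaped like the list returned by getRegion(...).getInfo():
# a header row followed by one row per image.
mock_region = [
    ['id', 'longitude', 'latitude', 'time', 'tmax_celsius', 'tmean_celsius'],
    ['20200101', 18.42, -33.92, 1577836800000, 27.4, 21.3],
    ['20200102', 18.42, -33.92, 1577923200000, None, None],
    ['20200103', 18.42, -33.92, 1578009600000, 25.1, 20.2],
]


def region_to_records(table):
    """Turn a getRegion-style table into a list of dicts with a 'date' key."""
    header, rows = table[0], table[1:]
    records = []
    for row in rows:
        record = dict(zip(header, row))
        # Skip rows with missing band values (returned as None)
        if record['tmax_celsius'] is None or record['tmean_celsius'] is None:
            continue
        # 'time' is milliseconds since the Unix epoch, in UTC
        record['date'] = datetime.fromtimestamp(
            record['time'] / 1000, tz=timezone.utc
        ).date()
        records.append(record)
    return records


records = region_to_records(mock_region)
for r in records:
    print(r['date'], r['tmax_celsius'], r['tmean_celsius'])
```

This is the same header/rows split that the function above performs before handing the data to pandas.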

## Step 7: Basic Data Analysis and Visualization

In [None]:
# Convert date column to datetime for plotting
daily_temps['date'] = pd.to_datetime(daily_temps['date'])

# Calculate basic statistics
print("📊 TEMPERATURE STATISTICS")
print("=" * 40)
print(f"Maximum Temperature:")
print(f"  Mean: {daily_temps['tmax_celsius'].mean():.1f}°C")
print(f"  Min:  {daily_temps['tmax_celsius'].min():.1f}°C")
print(f"  Max:  {daily_temps['tmax_celsius'].max():.1f}°C")
print(f"\nMean Temperature:")
print(f"  Mean: {daily_temps['tmean_celsius'].mean():.1f}°C")
print(f"  Min:  {daily_temps['tmean_celsius'].min():.1f}°C")
print(f"  Max:  {daily_temps['tmean_celsius'].max():.1f}°C")

# Create visualization
fig, axes = plt.subplots(2, 1, figsize=(12, 10))
fig.suptitle(f'Temperature Analysis for {location_name}\n{start_date} to {end_date}', 
             fontsize=14, fontweight='bold')

# Time series plot
axes[0].plot(daily_temps['date'], daily_temps['tmax_celsius'], 
            'r-', alpha=0.7, linewidth=0.8, label='Daily Maximum')
axes[0].plot(daily_temps['date'], daily_temps['tmean_celsius'], 
            'b-', alpha=0.7, linewidth=0.8, label='Daily Mean')
axes[0].set_title('Daily Temperature Time Series')
axes[0].set_ylabel('Temperature (°C)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Temperature distribution
axes[1].hist(daily_temps['tmax_celsius'], bins=30, alpha=0.7, 
            color='red', label='Maximum Temperature', density=True)
axes[1].hist(daily_temps['tmean_celsius'], bins=30, alpha=0.7, 
            color='blue', label='Mean Temperature', density=True)
axes[1].set_title('Temperature Distribution')
axes[1].set_xlabel('Temperature (°C)')
axes[1].set_ylabel('Density')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
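
Summary statistics can hide gaps in a series, because the `dropna()` call in Step 6 silently removes days with missing values. A minimal sketch of a completeness check, assuming a hypothetical helper `find_missing_days` and a few made-up dates:

```python
from datetime import date, timedelta


def find_missing_days(observed_dates, start, end):
    """Return the dates in [start, end] (inclusive) absent from observed_dates."""
    observed = set(observed_dates)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in observed]


# Hypothetical observed dates with two gaps
observed = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 4)]
missing = find_missing_days(observed, date(2020, 1, 1), date(2020, 1, 5))
print([d.isoformat() for d in missing])  # ['2020-01-03', '2020-01-05']
```

With the tutorial's data, the observed dates could come from `daily_temps['date'].dt.date`, compared against the study period defined in Step 4.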

## Step 8: Calculate Monthly Averages

Monthly averages are often more useful for health analysis as they smooth out daily variations.

In [None]:
# Add year and month columns
daily_temps['year'] = daily_temps['date'].dt.year
daily_temps['month'] = daily_temps['date'].dt.month
daily_temps['year_month'] = daily_temps['date'].dt.to_period('M')

# Calculate monthly averages
monthly_averages = daily_temps.groupby('year_month').agg({
    'tmax_celsius': ['mean', 'std', 'min', 'max'],
    'tmean_celsius': ['mean', 'std', 'min', 'max']
}).round(2)

# Flatten column names
monthly_averages.columns = ['_'.join(col).strip() for col in monthly_averages.columns]
monthly_averages = monthly_averages.reset_index()

# Add a proper date column
monthly_averages['date'] = monthly_averages['year_month'].dt.to_timestamp()

print("📅 Monthly Temperature Averages")
print("=" * 50)
print(monthly_averages[['year_month', 'tmax_celsius_mean', 'tmean_celsius_mean']].head(10))

# Plot monthly averages
plt.figure(figsize=(12, 6))
plt.plot(monthly_averages['date'], monthly_averages['tmax_celsius_mean'], 
         'ro-', label='Monthly Avg Maximum Temp', markersize=4)
plt.plot(monthly_averages['date'], monthly_averages['tmean_celsius_mean'], 
         'bo-', label='Monthly Avg Mean Temp', markersize=4)
plt.title(f'Monthly Average Temperatures - {location_name}')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
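
To make the grouping step explicit, the sketch below computes monthly means with only the standard library. It mirrors the mean computed by the `groupby('year_month')` call on a few invented daily values. `monthly_means` is a hypothetical helper, not a pandas function.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (date, mean temperature in °C) pairs
daily = [
    (date(2020, 1, 1), 20.0),
    (date(2020, 1, 2), 22.0),
    (date(2020, 2, 1), 24.0),
    (date(2020, 2, 2), 25.0),
]


def monthly_means(rows):
    """Group values by (year, month) and return the rounded mean of each group."""
    groups = defaultdict(list)
    for day, value in rows:
        groups[(day.year, day.month)].append(value)
    return {key: round(mean(values), 2) for key, values in sorted(groups.items())}


print(monthly_means(daily))  # {(2020, 1): 21.0, (2020, 2): 24.5}
```

Keying on both year and month, as `to_period('M')` does, keeps January 2020 separate from January 2021.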

## Step 9: Export Data for Further Analysis

In [None]:
# Create export directory if it doesn't exist
import os
os.makedirs('../data', exist_ok=True)

# Prepare data for export
daily_export = daily_temps[['date', 'tmax_celsius', 'tmean_celsius']].copy()
monthly_export = monthly_averages[['year_month', 'tmax_celsius_mean', 'tmean_celsius_mean']].copy()
monthly_export.columns = ['month', 'avg_tmax_celsius', 'avg_tmean_celsius']

# Export to CSV
daily_filename = f"../data/{location_name.lower().replace(' ', '_')}_daily_temp_{start_date[:4]}_{end_date[:4]}.csv"
monthly_filename = f"../data/{location_name.lower().replace(' ', '_')}_monthly_temp_{start_date[:4]}_{end_date[:4]}.csv"

daily_export.to_csv(daily_filename, index=False)
monthly_export.to_csv(monthly_filename, index=False)

# Export to Excel (pandas needs an Excel engine such as openpyxl installed)
excel_filename = f"../data/{location_name.lower().replace(' ', '_')}_temperature_data_{start_date[:4]}_{end_date[:4]}.xlsx"
with pd.ExcelWriter(excel_filename) as writer:
    daily_export.to_excel(writer, sheet_name='Daily_Data', index=False)
    monthly_export.to_excel(writer, sheet_name='Monthly_Averages', index=False)

print("💾 Data exported successfully!")
print(f"📁 Files created:")
print(f"   • {daily_filename}")
print(f"   • {monthly_filename}")
print(f"   • {excel_filename}")

print(f"\n📊 Summary:")
print(f"   • Daily records: {len(daily_export)}")
print(f"   • Monthly records: {len(monthly_export)}")
print(f"   • Temperature range: {daily_temps['tmean_celsius'].min():.1f}°C to {daily_temps['tmean_celsius'].max():.1f}°C")
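
The three file names above repeat the same pattern: a lower-cased location slug, a label, and the start and end years. One way to factor that out is sketched below. `build_filename` and its parameters are hypothetical and simply reproduce the naming used in this step.

```python
def build_filename(location_name, kind, start_date, end_date,
                   ext='csv', folder='../data'):
    """Build an export path like ../data/cape_town_daily_temp_2020_2022.csv."""
    slug = location_name.lower().replace(' ', '_')
    # The first four characters of a YYYY-MM-DD string are the year
    return f"{folder}/{slug}_{kind}_{start_date[:4]}_{end_date[:4]}.{ext}"


print(build_filename('Cape Town', 'daily_temp', '2020-01-01', '2022-12-31'))
# ../data/cape_town_daily_temp_2020_2022.csv
```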

## Next Steps

Now that you have extracted temperature data, you can:

1. **Integrate with Health Data**: Merge your temperature data with health outcome data
2. **Statistical Analysis**: Perform correlation analysis, regression, or time series analysis
3. **Advanced Visualization**: Create publication-ready plots and maps
4. **Multiple Locations**: Repeat this process for different study areas
5. **Different Variables**: Extract precipitation, humidity, or other climate variables
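
As a rough illustration of the first item, integrating with health data usually means joining the two series on date. The sketch below does an inner join with plain dictionaries. The admission counts are invented and `join_by_date` is a hypothetical helper. With DataFrames, `pd.merge(daily_temps, health_df, on='date')` would play the same role, assuming a `health_df` with a matching `date` column.

```python
from datetime import date

# Hypothetical daily mean temperatures (°C) and daily admission counts
temps = {
    date(2020, 1, 1): 21.3,
    date(2020, 1, 2): 22.8,
    date(2020, 1, 3): 20.2,
}
admissions = {
    date(2020, 1, 1): 4,
    date(2020, 1, 3): 7,
}


def join_by_date(temperature_by_day, outcome_by_day):
    """Inner join: keep only the days present in both series, in date order."""
    return [
        (day, temperature_by_day[day], outcome_by_day[day])
        for day in sorted(temperature_by_day)
        if day in outcome_by_day
    ]


joined = join_by_date(temps, admissions)
for day, temp, count in joined:
    print(day.isoformat(), temp, count)
```

Days without a health record are dropped here. Whether to drop them or keep them with a missing value depends on the analysis.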

## Key Learning Points

✅ **Google Earth Engine Setup**: Initialize and authenticate with GEE  
✅ **Spatial Definition**: Define study areas using coordinates and buffers  
✅ **Temporal Filtering**: Select specific time periods for analysis  
✅ **Data Processing**: Convert units and clean data  
✅ **Time Series Extraction**: Get daily values for statistical analysis  
✅ **Data Export**: Save results in multiple formats (CSV, Excel)  

## Resources

- [Google Earth Engine Documentation](https://developers.google.com/earth-engine)
- [GEEMAP Tutorials](https://geemap.org)
- [ERA5-Land Dataset Information](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land)
- [Climate Data for Health Research Guide](https://github.com/your-repo)