# Week 1: Introduction to Seoul Heatwave Analysis
## Environment Setup and Course Overview

**Instructor**: Sohn Chul

---

## 🎯 Learning Objectives

By the end of this session, you will be able to:
1. Understand the importance of heatwave analysis in urban environments
2. Set up a Python environment for climate data analysis
3. Navigate the S-DoT sensor network data structure
4. Execute basic data loading and exploration tasks
5. Use Git/GitHub for version control

## 1. Project Background

### 1.1 Why Study Urban Heatwaves?

Urban heatwaves pose significant challenges:
- **Public Health**: Increased mortality and morbidity
- **Energy Demand**: Peak cooling loads strain infrastructure
- **Economic Impact**: Reduced productivity and increased healthcare costs
- **Environmental Justice**: Disproportionate impacts on vulnerable populations

### 1.2 Seoul's Climate Context

Seoul experiences:
- Hot, humid summers (June-August)
- Urban Heat Island (UHI) effects
- Rapid urbanization increasing heat vulnerability
- Need for data-driven mitigation strategies

## 2. S-DoT Sensor Network

### 2.1 What is S-DoT?

**S-DoT (Smart Seoul Data of Things)** is Seoul's IoT sensor network:
- Over 1,100 sensors across the city
- Measures environmental parameters every 10 minutes
- Real-time data for urban management
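
To get a feel for the data volume these figures imply, the arithmetic can be sketched as below. It treats 1,100 as the sensor count (the text gives this only as a lower bound) and assumes every sensor reports on schedule with no gaps, which real data is unlikely to satisfy.

```python
# Rough data-volume estimate.
# Assumptions: exactly 1,100 sensors and no missing readings.
SENSOR_COUNT = 1_100       # lower bound quoted above
INTERVAL_MINUTES = 10      # reporting interval quoted above

readings_per_sensor_per_day = 24 * 60 // INTERVAL_MINUTES
readings_per_day = SENSOR_COUNT * readings_per_sensor_per_day

print(f"Readings per sensor per day: {readings_per_sensor_per_day}")
print(f"Readings across the network per day: {readings_per_day:,}")
```

Under these assumptions each sensor produces 144 readings per day, so the network as a whole produces at least 158,400 rows per day.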

### 2.2 Data Variables

| Variable | Unit | Description |
|----------|------|-------------|
| Temperature | °C | Air temperature |
| Humidity | % | Relative humidity |
| PM2.5 | μg/m³ | Fine particulate matter (diameter ≤ 2.5 μm) |
| PM10 | μg/m³ | Inhalable particulate matter (diameter ≤ 10 μm) |
| Noise | dB | Sound level |
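
A quick plausibility check against the variables in this table might look like the sketch below. The bounds and the lowercase column names (`temperature`, `humidity`, `pm25`, `pm10`, `noise`) are illustrative assumptions, not values taken from the S-DoT specification, so adjust them to the actual data.

```python
# Hypothetical plausibility bounds for each variable; not official S-DoT limits.
PLAUSIBLE_RANGES = {
    "temperature": (-40.0, 60.0),   # °C
    "humidity": (0.0, 100.0),       # %
    "pm25": (0.0, 1000.0),          # μg/m³
    "pm10": (0.0, 2000.0),          # μg/m³
    "noise": (0.0, 140.0),          # dB
}

def out_of_range(reading):
    """Return the names of variables whose values fall outside the bounds."""
    flagged = []
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = reading.get(name)
        # Missing values are skipped here; they are a separate data-quality issue.
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged

# One made-up reading with an impossible relative humidity
reading = {"temperature": 31.4, "humidity": 120.0, "pm25": 18.0, "pm10": 35.0, "noise": 58.0}
print(out_of_range(reading))
```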

## 3. Environment Setup

### 3.1 Check Python Version

In [None]:
import sys
print(f"Python version: {sys.version}")
print(f"Python path: {sys.executable}")

# Should be Python 3.8 or higher
assert sys.version_info >= (3, 8), "Python 3.8 or higher required"

### 3.2 Install Required Packages

In [None]:
# 🌍 Environment-specific Package Installation
import os
import sys

def install_required_packages():
    """
    Install required packages based on the environment
    """
    IN_COLAB = 'google.colab' in sys.modules
    IN_KAGGLE = 'kaggle_secrets' in sys.modules or 'KAGGLE_KERNEL_RUN_TYPE' in os.environ
    
    if IN_COLAB:
        print("📱 Installing packages for Google Colab...")
        
        # Colab-specific installations
        !pip install -q geopandas folium plotly
        !pip install -q scikit-learn statsmodels prophet xgboost
        !pip install -q tqdm python-dotenv openpyxl xlrd
        
        print("✅ Colab packages installed successfully!")
        
    elif IN_KAGGLE:
        print("🏆 Kaggle environment detected - most packages pre-installed")
        
        # Kaggle-specific installations (if needed)
        try:
            import folium
        except ImportError:
            !pip install -q folium
            
    else:
        print("💻 Local environment detected")
        print("📝 Ensure you have installed packages from requirements.txt:")
        print("   pip install -r requirements.txt")
        print("\nCore packages needed:")
        print("   • pandas, numpy, matplotlib, seaborn")
        print("   • geopandas, folium, plotly")
        print("   • scikit-learn, statsmodels")
        print("   • jupyter, ipywidgets, tqdm")

# Install packages automatically (comment out the following line to skip this step)
install_required_packages()
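
On a local machine it can help to confirm which of the core packages are actually importable before moving on. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `find_missing` and the `CORE_PACKAGES` list are illustrative and simply mirror the packages printed above.

```python
import importlib.util

# Import names for the core packages listed above.
# Note that scikit-learn is imported as "sklearn".
CORE_PACKAGES = [
    "pandas", "numpy", "matplotlib", "seaborn", "geopandas",
    "folium", "plotly", "sklearn", "statsmodels", "tqdm",
]

def find_missing(packages):
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(CORE_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All core packages are available.")
```

Because `find_spec` only locates a package without importing it, this check is fast, but it does not verify that the installed version is compatible.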

### 3.3 Import Essential Libraries

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Date handling
from datetime import datetime, timedelta

# File handling
import os
import glob

# Warnings
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")

### 3.4 Configure Visualization Settings

In [None]:
# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Configure matplotlib for inline display
%matplotlib inline

# Set default figure size
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12

# 🌍 Environment-specific Korean font configuration
def configure_korean_fonts():
    """
    Configure Korean fonts based on the environment
    """
    import sys
    import platform
    import matplotlib.pyplot as plt
    import matplotlib.font_manager as fm
    
    IN_COLAB = 'google.colab' in sys.modules
    
    if IN_COLAB:
        print("  📱 Configuring fonts for Google Colab")
        # Install Korean fonts in Colab
        try:
            import subprocess
            import os
            
            # Download and install Korean font
            subprocess.run(['apt-get', 'install', '-y', 'fonts-nanum'], check=True, capture_output=True)
            
            # Set font
            plt.rcParams['font.family'] = 'NanumGothic'
            print("  ✅ NanumGothic font configured")
            
        except Exception as e:
            print("  ⚠️ Could not install Korean font, using default")
            plt.rcParams['font.family'] = 'DejaVu Sans'
            
    else:
        # Local environment
        system = platform.system()
        print(f"  💻 Configuring fonts for {system}")
        
        # Candidate fonts per platform, in order of preference.
        # Assigning a missing font to rcParams does not raise an error, so the
        # installed fonts are checked explicitly instead of using try/except.
        candidates = {
            'Darwin': ['AppleGothic', 'Arial Unicode MS'],    # macOS
            'Windows': ['Malgun Gothic', 'Microsoft YaHei'],
        }.get(system, ['NanumGothic'])                        # Linux and others

        available = {font.name for font in fm.fontManager.ttflist}
        chosen = next((name for name in candidates if name in available), None)

        if chosen:
            plt.rcParams['font.family'] = chosen
            print(f"  ✅ {chosen} font configured ({system})")
        else:
            plt.rcParams['font.family'] = 'DejaVu Sans'
            print(f"  ⚠️ Using default font ({system})")
    
    # Prevent minus sign issues
    plt.rcParams['axes.unicode_minus'] = False

# Configure fonts
configure_korean_fonts()

print("✅ Visualization settings configured!")

## 4. Data Exploration

### 4.1 Check Data Directory Structure

In [None]:
# 🌍 Environment Detection and Path Configuration
def setup_environment_paths():
    """
    Automatically detect environment (Local, Colab, Kaggle) and set appropriate paths
    """
    import os
    import sys
    
    # Detect environment
    IN_COLAB = 'google.colab' in sys.modules
    IN_KAGGLE = 'kaggle_secrets' in sys.modules or 'KAGGLE_KERNEL_RUN_TYPE' in os.environ
    
    print("🔍 Environment Detection:")
    
    if IN_COLAB:
        print("  📱 Running in Google Colab")
        
        # Mount Google Drive (optional)
        try:
            from google.colab import drive
            drive.mount('/content/drive')
            
            # Check if data exists in Drive
            drive_data_path = '/content/drive/MyDrive/seoul_heatwave_course/data'
            if os.path.exists(drive_data_path):
                COURSE_DATA_PATH = drive_data_path
                print("  ✅ Using Google Drive data path")
            else:
                # Clone from GitHub or use default
                COURSE_DATA_PATH = '/content/seoul_heatwave_course/data'
                print("  📁 Using Colab workspace path")
                
                # GitHub clone instructions
                print("  💡 To get data, run: !git clone https://github.com/KimJiHan/seoul_climate_analysis.git")
                
        except Exception as e:
            COURSE_DATA_PATH = '/content/seoul_heatwave_course/data'
            print("  📁 Using default Colab path")
            
    elif IN_KAGGLE:
        print("  🏆 Running in Kaggle")
        COURSE_DATA_PATH = '/kaggle/input/seoul-heatwave-data'
        
    else:
        print("  💻 Running in Local Environment")
        # Local environment - use relative path
        COURSE_DATA_PATH = '../data'
    
    # Create absolute paths
    SDOT_PATH = os.path.join(COURSE_DATA_PATH, 'raw', 's-dot')
    EXTERNAL_DATA_PATH = os.path.join(COURSE_DATA_PATH, 'external')
    PROCESSED_DATA_PATH = os.path.join(COURSE_DATA_PATH, 'processed')
    
    # SGIS data paths
    SGIS_BOUNDARIES_PATH = os.path.join(EXTERNAL_DATA_PATH, 'sgis_boundaries')
    SGIS_STATISTICS_PATH = os.path.join(EXTERNAL_DATA_PATH, 'sgis_statistics')
    
    return {
        'course_data': COURSE_DATA_PATH,
        'sdot': SDOT_PATH,
        'external': EXTERNAL_DATA_PATH,
        'processed': PROCESSED_DATA_PATH,
        'sgis_boundaries': SGIS_BOUNDARIES_PATH,
        'sgis_statistics': SGIS_STATISTICS_PATH,
        'environment': 'colab' if IN_COLAB else 'kaggle' if IN_KAGGLE else 'local'
    }

# Setup paths
paths = setup_environment_paths()
COURSE_DATA_PATH = paths['course_data']
SDOT_PATH = paths['sdot']
EXTERNAL_DATA_PATH = paths['external']
PROCESSED_DATA_PATH = paths['processed']
SGIS_BOUNDARIES_PATH = paths['sgis_boundaries']
SGIS_STATISTICS_PATH = paths['sgis_statistics']

print(f"\n📁 Course data path: {os.path.abspath(COURSE_DATA_PATH)}")
print(f"📊 S-DoT data path: {os.path.abspath(SDOT_PATH)}")
print(f"📋 External data path: {os.path.abspath(EXTERNAL_DATA_PATH)}")

# Check if S-DoT data directory exists
if os.path.exists(SDOT_PATH):
    print(f"\n✅ S-DoT data directory found: {SDOT_PATH}")
    
    # List CSV files
    csv_files = glob.glob(os.path.join(SDOT_PATH, '*.csv'))
    total_size = sum(os.path.getsize(f) for f in csv_files) / (1024 * 1024 * 1024)  # GB
    
    print(f"📊 Found {len(csv_files)} CSV files (Total: {total_size:.2f} GB)")
    print("📝 Sample files:")
    
    for file in sorted(csv_files)[:5]:  # Show first 5 files
        file_size = os.path.getsize(file) / (1024 * 1024)  # Convert to MB
        print(f"  • {os.path.basename(file)} ({file_size:.1f} MB)")
    
    if len(csv_files) > 5:
        print(f"  ... and {len(csv_files) - 5} more files")
        
    # Check external data
    if os.path.exists(EXTERNAL_DATA_PATH):
        print(f"\n📋 External data files found:")
        
        # Check Excel files
        excel_files = glob.glob(os.path.join(EXTERNAL_DATA_PATH, '*.xlsx'))
        for file in excel_files:
            file_size = os.path.getsize(file) / 1024  # KB
            print(f"  • {os.path.basename(file)} ({file_size:.0f} KB)")
        
        # Check SGIS boundaries
        if os.path.exists(SGIS_BOUNDARIES_PATH):
            print(f"  • SGIS Administrative Boundaries:")
            boundary_dirs = [d for d in os.listdir(SGIS_BOUNDARIES_PATH) 
                           if os.path.isdir(os.path.join(SGIS_BOUNDARIES_PATH, d)) and not d.startswith('.')]
            for boundary_dir in boundary_dirs:
                print(f"    - {boundary_dir}")
        
        # Check SGIS statistics
        if os.path.exists(SGIS_STATISTICS_PATH):
            print(f"  • SGIS Statistics Data: ✅")
    
else:
    print(f"❌ S-DoT data directory not found at {SDOT_PATH}")
    if paths['environment'] == 'colab':
        print("💡 For Google Colab:")
        print("   1. Clone repository: !git clone https://github.com/KimJiHan/seoul_climate_analysis.git")
        print("   2. Or upload data to Google Drive and mount it")
    elif paths['environment'] == 'kaggle':
        print("💡 For Kaggle: Upload data as a Kaggle Dataset")
    else:
        print("💡 For Local: Please check the data setup or run the data preparation script")
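
The file listing above can also be written more compactly with `pathlib`. This is a minimal sketch rather than a replacement for the cell: the helper name `summarize_csv_files` is hypothetical, and the example path is the local-environment default used in this notebook.

```python
from pathlib import Path

def summarize_csv_files(directory):
    """Return (file_count, total_bytes) for CSV files directly inside a directory."""
    folder = Path(directory)
    if not folder.is_dir():
        # Treat a missing directory as "no files" instead of raising an error
        return 0, 0
    sizes = [path.stat().st_size for path in folder.glob("*.csv") if path.is_file()]
    return len(sizes), sum(sizes)

# Local-environment default path from this notebook
count, total_bytes = summarize_csv_files("../data/raw/s-dot")
print(f"{count} CSV files, {total_bytes / 1024**2:.1f} MB in total")
```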

### 4.2 Load Sample Data

In [None]:
# Load a sample CSV file to understand the structure
try:
    if 'csv_files' in locals() and csv_files:
        sample_file = csv_files[0]
        print(f"🔍 Loading sample file: {os.path.basename(sample_file)}")
        
        # Try different encodings for Korean text
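        # Try strict UTF-8 variants first: 'utf-8-sig' also strips a leading BOM,
        # and cp949 is a superset of euc-kr, so it is tried before euc-kr.
        encodings_to_try = ['utf-8-sig', 'utf-8', 'cp949', 'euc-kr']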
        df_sample = None
        
        for encoding in encodings_to_try:
            try:
                print(f"  Trying encoding: {encoding}")
                df_sample = pd.read_csv(sample_file, encoding=encoding, nrows=1000)
                print(f"  ✅ Successfully loaded with encoding: {encoding}")
                break
            except UnicodeDecodeError as e:
                print(f"  ❌ Failed with {encoding}: {str(e)[:50]}...")
                continue
            except Exception as e:
                print(f"  ❌ Error with {encoding}: {str(e)[:50]}...")
                continue
        
        if df_sample is not None:
            print(f"\n📊 Sample Data Shape: {df_sample.shape}")
            print(f"📋 Column count: {len(df_sample.columns)}")
            print(f"📝 First few columns: {df_sample.columns.tolist()[:8]}")
            
            # Show basic info about the data
            print(f"📅 First timestamp (3rd column): {df_sample.iloc[0, 2] if len(df_sample.columns) > 2 else 'N/A'}")
            print(f"📍 Sensor count in sample: {df_sample.iloc[:, 1].nunique() if len(df_sample.columns) > 1 else 'N/A'}")
            
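        else:
            raise ValueError("Could not decode file with any encoding")
    else:
        # No CSV files were found earlier, so fall back to the simulated data below
        raise FileNotFoundError("No S-DoT CSV files available")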
        
except Exception as e:
    print(f"❌ Error loading S-DoT data: {e}")
    print("📝 Creating simulated data for demonstration purposes...")
    
    # Create simulated data for demonstration
    np.random.seed(42)
    dates = pd.date_range('2025-04-28', periods=1000, freq='10min')
    
    # Create realistic Seoul weather data
    df_sample = pd.DataFrame({
        'sensor_id': np.random.choice(['SDOT001', 'SDOT002', 'SDOT003'], 1000),
        'serial_number': np.random.choice(['OC3CL200011', 'OC3CL200027', 'OC3CL200019'], 1000),
        'measurement_time': dates,
        'location_type': np.random.choice(['parks', 'main_street', 'residential'], 1000),
        'district': np.random.choice(['Gangnam-gu', 'Gangdong-gu', 'Yongsan-gu'], 1000),
        'temperature': np.random.normal(22, 5, 1000).round(1),
        'humidity': np.random.uniform(40, 85, 1000).round(0),
        'pm25': np.random.exponential(25, 1000).round(1),
        'pm10': np.random.exponential(40, 1000).round(1),
        'noise': np.random.normal(55, 10, 1000).round(0)
    })
    
    print("✅ Simulated dataset created successfully")
    print(f"📊 Shape: {df_sample.shape}")
    print(f"📋 Columns: {df_sample.columns.tolist()}")
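
The encoding loop above can be factored into a small standard-library helper that inspects only the start of a file. This is a sketch under assumptions: the function name `detect_encoding`, the candidate list, and the sample size are illustrative. A successful decode does not prove the encoding is correct, so the decoded column names should still be checked by eye.

```python
import codecs
import os
import tempfile

def detect_encoding(path, candidates=("utf-8-sig", "cp949", "euc-kr"), sample_bytes=100_000):
    """Return the first candidate encoding that decodes the start of the file, or None."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    for encoding in candidates:
        # An incremental decoder with final=False tolerates a multi-byte
        # character that was cut off at the end of the sample.
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    return None

# Demonstration with a small temporary file written in cp949
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", encoding="cp949") as handle:
        handle.write("가나다,온도\n1,25.1\n")
    detected = detect_encoding(demo_path)
    print(f"Detected encoding: {detected}")
```

The detected name can then be passed to `pd.read_csv(..., encoding=detected)`.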

### 4.3 Basic Data Exploration

In [None]:
# Display first few rows
print("📋 First 5 rows of the data:")
df_sample.head()

In [None]:
# Basic statistics
print("📊 Statistical Summary:")
df_sample.describe()

In [None]:
# Data types and missing values
print("🔍 Data Information:")
df_sample.info()
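
`info()` reports non-null counts, but a per-column table of missing values is often easier to read. A minimal sketch, assuming pandas is installed; the helper name `missing_summary` and the small `demo` frame are made up for illustration.

```python
import pandas as pd

def missing_summary(df):
    """Return per-column missing-value counts and percentages, largest first."""
    summary = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Tiny made-up example; replace `demo` with df_sample in the notebook
demo = pd.DataFrame({
    "temperature": [21.5, None, 23.0, None],
    "humidity": [55.0, 60.0, None, 58.0],
})
print(missing_summary(demo))
```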

## 5. Simple Visualization

In [None]:
# Create a simple visualization
if 'temperature' in df_sample.columns or 'Temperature' in df_sample.columns:
    temp_col = 'temperature' if 'temperature' in df_sample.columns else 'Temperature'
    
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Histogram
    axes[0].hist(df_sample[temp_col].dropna(), bins=30, edgecolor='black', alpha=0.7)
    axes[0].set_xlabel('Temperature (°C)')
    axes[0].set_ylabel('Frequency')
    axes[0].set_title('Temperature Distribution')
    axes[0].grid(True, alpha=0.3)
    
    # Time series (if a timestamp column exists; the simulated data uses 'measurement_time')
    time_col = next((c for c in ('datetime', 'measurement_time') if c in df_sample.columns), None)
    if time_col is not None:
        df_sample[time_col] = pd.to_datetime(df_sample[time_col])
        axes[1].plot(df_sample[time_col][:100], df_sample[temp_col][:100], alpha=0.7)
        axes[1].set_xlabel('Time')
        axes[1].set_ylabel('Temperature (°C)')
        axes[1].set_title('Temperature Time Series (First 100 observations)')
        axes[1].grid(True, alpha=0.3)
        plt.setp(axes[1].xaxis.get_majorticklabels(), rotation=45)
    else:
        axes[1].set_visible(False)  # Hide the empty panel when no timestamp column is found
    
    plt.tight_layout()
    plt.show()
else:
    print("Temperature column not found in the data")
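
The time-series panel above plots the first 100 rows in file order, which can interleave readings from several sensors. One possible refinement is to average across sensors at each timestamp before plotting. The sketch below assumes pandas and uses the simulated column names (`measurement_time`, `sensor_id`, `temperature`); real S-DoT files may name these columns differently.

```python
import pandas as pd

# Made-up readings from two sensors at two timestamps
readings = pd.DataFrame({
    "measurement_time": pd.to_datetime([
        "2025-04-28 00:00", "2025-04-28 00:00",
        "2025-04-28 00:10", "2025-04-28 00:10",
    ]),
    "sensor_id": ["SDOT001", "SDOT002", "SDOT001", "SDOT002"],
    "temperature": [20.0, 22.0, 21.0, 23.0],
})

# Average all sensors at each timestamp to get one network-wide series
network_mean = readings.groupby("measurement_time")["temperature"].mean()
print(network_mean)
```

The resulting series is indexed by timestamp, so it can be passed directly to `axes[1].plot(network_mean.index, network_mean.values)`.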

## 6. Git/GitHub Setup

### 6.1 Environment-specific Git Commands

**For Local Environment:**
Run these commands in your terminal (not in Jupyter):

```bash
# Configure Git (first time only)
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Initialize repository
git init

# Add files
git add .

# Commit changes
git commit -m "Initial commit: Week 1 notebook"

# Connect to GitHub repository
git remote add origin https://github.com/KimJiHan/seoul_climate_analysis.git

# Rename the current branch to main (git init may create "master"), then push to GitHub
git branch -M main
git push -u origin main
```

**For Google Colab:**
```python
# Clone the repository
!git clone https://github.com/KimJiHan/seoul_climate_analysis.git
%cd seoul_climate_analysis/seoul_heatwave_course

# Configure Git in Colab
!git config --global user.name "Your Name"
!git config --global user.email "your.email@example.com"

# Create and switch to your branch
!git checkout -b week01-yourname

# After completing work
!git add .
!git commit -m "Complete Week 1 assignment"
!git push origin week01-yourname
```

**For Kaggle:**
Kaggle notebooks can be forked and shared directly through the Kaggle platform. You can also download the notebook and commit to GitHub separately.

## 7. Assignment

### Week 1 Tasks:

1. **Environment Setup** (20 points)
   - Install all required packages
   - Verify Python version
   - Configure Jupyter environment

2. **Data Exploration** (30 points)
   - Load at least 3 S-DoT CSV files
   - Identify all column names and data types
   - Calculate basic statistics for temperature and humidity

3. **Visualization** (30 points)
   - Create a histogram for each numerical variable
   - Plot a time series for one day of temperature data
   - Create a correlation matrix heatmap

4. **GitHub Setup** (20 points)
   - Fork the course repository
   - Create a branch with your name
   - Submit your completed notebook via Pull Request

### Submission Instructions:
1. Complete this notebook with your code
2. Save as `Week01_YourName.ipynb`
3. Push to your GitHub fork
4. Create a Pull Request with title: `Week 1 Assignment - Your Name`

## 8. Summary

This week, we covered:
- ✅ Project background and importance
- ✅ S-DoT sensor network overview
- ✅ Python environment setup
- ✅ Basic data loading and exploration
- ✅ Simple visualizations
- ✅ Git/GitHub workflow

### Next Week Preview:
**Week 2: Data Collection & Preprocessing**
- Loading multiple S-DoT files
- Data cleaning techniques
- Handling missing values
- Data integration strategies

### Resources:
- [Pandas Documentation](https://pandas.pydata.org/docs/)
- [Matplotlib Tutorial](https://matplotlib.org/stable/tutorials/index.html)
- [Git Handbook](https://guides.github.com/introduction/git-handbook/)

---
**End of Week 1**

*Instructor: Sohn Chul*