# Regional Stress Response Analysis - Türkiye

This notebook analyzes how different regions of Türkiye react to nationwide stressful events such as economic shocks and monetary policy decisions.

## Project Overview

The central hypothesis is that Türkiye's seven geographical regions respond to national stressors at different intensities and time lags.

By analyzing Google Trends data for stress-related search terms across provinces and comparing them with macroeconomic indicators (e.g., currency volatility, inflation announcements, and central bank interest-rate decisions), we seek to determine whether some regions get "stressed out" earlier or more strongly than others when faced with the same national event.


## 1. Setup and Imports

This section imports all necessary libraries for data processing, statistical analysis, and visualization. The key libraries include:
- **pandas & numpy**: Data manipulation and numerical operations
- **matplotlib & seaborn**: Data visualization
- **statsmodels**: Statistical modeling (ANOVA)
- **scipy.stats**: Statistical functions (z-score calculation)
- **pytrends**: Google Trends API wrapper for data collection


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy.stats import zscore
from pytrends.request import TrendReq
import time


## 2. Data Collection: Google Trends by Province

This section collects Google Trends data for stress-related keywords across all 81 Turkish provinces.

### 2.1 Province and Region Definitions

The following cell defines the mapping of all 81 Turkish provinces to their ISO codes and 7 geographical regions. This mapping is essential for:
- Organizing provinces into regional groups for analysis
- Querying Google Trends API with correct geographic codes
- Aggregating province-level data to regional level
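
Since every later step depends on this mapping, a quick sanity check can catch duplicated or missing codes before any query is sent. The sketch below is illustrative only: `check_province_table` is a hypothetical helper that is not part of this notebook, and the three-row sample stands in for the full `PROVINCES` list.

```python
import pandas as pd

def check_province_table(provinces, expected_count):
    """Hypothetical helper: report duplicated codes and the unique-code count."""
    df = pd.DataFrame(provinces)
    duplicated = sorted(df.loc[df["code"].duplicated(), "code"].unique())
    unique_count = df["code"].nunique()
    return {
        "duplicated_codes": duplicated,
        "unique_count": unique_count,
        "complete": unique_count == expected_count,
    }

# Tiny stand-in for the full list (assumption: same keys as PROVINCES).
sample = [
    {"code": "TR-01", "province": "Adana", "region7": "Akdeniz"},
    {"code": "TR-10", "province": "Balıkesir", "region7": "Marmara"},
    {"code": "TR-10", "province": "Balıkesir", "region7": "Marmara"},
]
report = check_province_table(sample, expected_count=2)
print(report)
```

For the real list, the same call with `expected_count=81` would flag a repeated code or a missing province.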

**Keywords tracked**: "anksiyete" (anxiety), "uykusuzluk" (insomnia), "stres" (stress), "panik atak" (panic attack), "mide yanması" (heartburn/indigestion)

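**Time period**: 2018-01-01 to 2025-11-30 (weekly data intended; Google Trends typically returns monthly points for ranges longer than about five years, so check the granularity of the returned series)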


In [None]:
# 81 provinces + ISO code + 7 regions (Marmara, Ege, Akdeniz,
# İç Anadolu, Karadeniz, Doğu Anadolu, Güneydoğu Anadolu)
PROVINCES = [
    # Akdeniz
    {"code": "TR-01", "province": "Adana",        "region7": "Akdeniz"},
    {"code": "TR-07", "province": "Antalya",      "region7": "Akdeniz"},
    {"code": "TR-15", "province": "Burdur",       "region7": "Akdeniz"},
    {"code": "TR-31", "province": "Hatay",        "region7": "Akdeniz"},
    {"code": "TR-32", "province": "Isparta",      "region7": "Akdeniz"},
    {"code": "TR-33", "province": "Mersin",       "region7": "Akdeniz"},
    {"code": "TR-46", "province": "Kahramanmaraş","region7": "Akdeniz"},
    {"code": "TR-80", "province": "Osmaniye",     "region7": "Akdeniz"},

    # Ege
    {"code": "TR-03", "province": "Afyonkarahisar","region7": "Ege"},
    {"code": "TR-09", "province": "Aydın",         "region7": "Ege"},
    {"code": "TR-20", "province": "Denizli",       "region7": "Ege"},
    {"code": "TR-35", "province": "İzmir",         "region7": "Ege"},
    {"code": "TR-43", "province": "Kütahya",       "region7": "Ege"},
    {"code": "TR-45", "province": "Manisa",        "region7": "Ege"},
    {"code": "TR-48", "province": "Muğla",         "region7": "Ege"},
    {"code": "TR-64", "province": "Uşak",          "region7": "Ege"},

    # Marmara
    {"code": "TR-10", "province": "Balıkesir",     "region7": "Marmara"},
    {"code": "TR-11", "province": "Bilecik",       "region7": "Marmara"},
    {"code": "TR-16", "province": "Bursa",         "region7": "Marmara"},
    {"code": "TR-17", "province": "Çanakkale",     "region7": "Marmara"},
    {"code": "TR-22", "province": "Edirne",        "region7": "Marmara"},
    {"code": "TR-34", "province": "İstanbul",      "region7": "Marmara"},
    {"code": "TR-39", "province": "Kırklareli",    "region7": "Marmara"},
    {"code": "TR-41", "province": "Kocaeli",       "region7": "Marmara"},
    {"code": "TR-54", "province": "Sakarya",       "region7": "Marmara"},
    {"code": "TR-59", "province": "Tekirdağ",      "region7": "Marmara"},
    {"code": "TR-77", "province": "Yalova",        "region7": "Marmara"},

    # İç Anadolu
    {"code": "TR-06", "province": "Ankara",        "region7": "İç Anadolu"},
    {"code": "TR-18", "province": "Çankırı",       "region7": "İç Anadolu"},
    {"code": "TR-26", "province": "Eskişehir",     "region7": "İç Anadolu"},
    {"code": "TR-38", "province": "Kayseri",       "region7": "İç Anadolu"},
    {"code": "TR-40", "province": "Kırşehir",      "region7": "İç Anadolu"},
    {"code": "TR-42", "province": "Konya",         "region7": "İç Anadolu"},
    {"code": "TR-50", "province": "Nevşehir",      "region7": "İç Anadolu"},
    {"code": "TR-51", "province": "Niğde",         "region7": "İç Anadolu"},
    {"code": "TR-58", "province": "Sivas",         "region7": "İç Anadolu"},
    {"code": "TR-66", "province": "Yozgat",        "region7": "İç Anadolu"},
    {"code": "TR-68", "province": "Aksaray",       "region7": "İç Anadolu"},
    {"code": "TR-70", "province": "Karaman",       "region7": "İç Anadolu"},
    {"code": "TR-71", "province": "Kırıkkale",     "region7": "İç Anadolu"},

    # Karadeniz
    {"code": "TR-05", "province": "Amasya",        "region7": "Karadeniz"},
    {"code": "TR-14", "province": "Bolu",          "region7": "Karadeniz"},
    {"code": "TR-19", "province": "Çorum",         "region7": "Karadeniz"},
    {"code": "TR-28", "province": "Giresun",       "region7": "Karadeniz"},
    {"code": "TR-29", "province": "Gümüşhane",     "region7": "Karadeniz"},
    {"code": "TR-37", "province": "Kastamonu",     "region7": "Karadeniz"},
    {"code": "TR-52", "province": "Ordu",          "region7": "Karadeniz"},
    {"code": "TR-53", "province": "Rize",          "region7": "Karadeniz"},
    {"code": "TR-55", "province": "Samsun",        "region7": "Karadeniz"},
    {"code": "TR-57", "province": "Sinop",         "region7": "Karadeniz"},
    {"code": "TR-60", "province": "Tokat",         "region7": "Karadeniz"},
    {"code": "TR-61", "province": "Trabzon",       "region7": "Karadeniz"},
    {"code": "TR-67", "province": "Zonguldak",     "region7": "Karadeniz"},
    {"code": "TR-74", "province": "Bartın",        "region7": "Karadeniz"},
    {"code": "TR-78", "province": "Karabük",       "region7": "Karadeniz"},
    {"code": "TR-81", "province": "Düzce",         "region7": "Karadeniz"},
    {"code": "TR-08", "province": "Artvin",        "region7": "Karadeniz"},
    {"code": "TR-69", "province": "Bayburt",       "region7": "Karadeniz"},

    # Doğu Anadolu
    {"code": "TR-04", "province": "Ağrı",          "region7": "Doğu Anadolu"},
    {"code": "TR-12", "province": "Bingöl",        "region7": "Doğu Anadolu"},
    {"code": "TR-13", "province": "Bitlis",        "region7": "Doğu Anadolu"},
    {"code": "TR-23", "province": "Elazığ",        "region7": "Doğu Anadolu"},
    {"code": "TR-24", "province": "Erzincan",      "region7": "Doğu Anadolu"},
    {"code": "TR-25", "province": "Erzurum",       "region7": "Doğu Anadolu"},
    {"code": "TR-30", "province": "Hakkâri",       "region7": "Doğu Anadolu"},
    {"code": "TR-36", "province": "Kars",          "region7": "Doğu Anadolu"},
    {"code": "TR-44", "province": "Malatya",       "region7": "Doğu Anadolu"},
    {"code": "TR-49", "province": "Muş",           "region7": "Doğu Anadolu"},
    {"code": "TR-62", "province": "Tunceli",       "region7": "Doğu Anadolu"},
    {"code": "TR-65", "province": "Van",           "region7": "Doğu Anadolu"},
    {"code": "TR-75", "province": "Ardahan",       "region7": "Doğu Anadolu"},
    {"code": "TR-76", "province": "Iğdır",         "region7": "Doğu Anadolu"},
    {"code": "TR-79", "province": "Kilis",         "region7": "Güneydoğu Anadolu"},  # borderline, but counted as Güneydoğu Anadolu

    # Güneydoğu Anadolu
    {"code": "TR-02", "province": "Adıyaman",      "region7": "Güneydoğu Anadolu"},
    {"code": "TR-21", "province": "Diyarbakır",    "region7": "Güneydoğu Anadolu"},
    {"code": "TR-27", "province": "Gaziantep",     "region7": "Güneydoğu Anadolu"},
    {"code": "TR-47", "province": "Mardin",        "region7": "Güneydoğu Anadolu"},
    {"code": "TR-56", "province": "Siirt",         "region7": "Güneydoğu Anadolu"},
    {"code": "TR-63", "province": "Şanlıurfa",     "region7": "Güneydoğu Anadolu"},
    {"code": "TR-72", "province": "Batman",        "region7": "Güneydoğu Anadolu"},
    {"code": "TR-73", "province": "Şırnak",        "region7": "Güneydoğu Anadolu"},
]

# Any province that appears more than once in the list is deduplicated by code before querying.

KW_LIST = ["anksiyete", "uykusuzluk", "stres", "panik atak", "mide yanması"]
TIMEFRAME = "2018-01-01 2025-11-30"


### 2.2 Google Trends Data Collection Function

This function queries Google Trends API for each province to collect weekly search interest data for stress-related keywords. 

**How it works:**
1. Initializes a Google Trends session with Turkish locale
2. Iterates through all 81 provinces
3. For each province, queries Google Trends for the 5 stress-related keywords
4. Combines all province data into a single DataFrame
5. Saves the result to `google_trends_province_timeseries.csv`

**Important notes:**
- Includes a 1-second delay between requests to avoid rate limiting
- Handles errors gracefully (continues if a province fails)
- Removes duplicate provinces automatically
- The process takes approximately 2-3 minutes for all 81 provinces
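
The collection function skips a province when a request fails; it does not retry. If rate limiting turns out to be a problem, one option is to wrap each request in a retry loop with growing delays. The following is a minimal sketch under that assumption: `call_with_retry` and `flaky_fetch` are hypothetical names, and the demo uses a fake fetch function so no network access is involved.

```python
import time

def call_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: call fetch(), waiting longer after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            wait = base_delay * 2 ** attempt
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {wait:.0f}s")
            sleep(wait)

# Demo with a fake fetch that fails twice, then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

waits = []  # record the requested delays instead of actually sleeping
result = call_with_retry(flaky_fetch, sleep=waits.append)
print(result, waits)
```

In the notebook, `fetch` could be a small function that builds the payload and returns `interest_over_time()` for one province.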


In [None]:
def collect_trends_provinces(out_file="google_trends_province_timeseries.csv"):
    """
    Fetch the Google Trends time series for each province (ISO code = TR-xx)
    and collect them into a single CSV.
    """
    print("[INFO] Opening Google Trends session...")
    pytrends = TrendReq(hl="tr-TR", tz=180)

    all_dfs = []

    # Convert the province list to a DataFrame and drop duplicate codes
    provinces_df = pd.DataFrame(PROVINCES).drop_duplicates(subset=["code"])

    for _, row in provinces_df.iterrows():
        code = row["code"]
        province = row["province"]
        region7 = row["region7"]

        print(f"[INFO] Fetching: {province} ({code}) - Region: {region7}")

        try:
            pytrends.build_payload(
                kw_list=KW_LIST,
                geo=code,          # key point: TR-xx format
                timeframe=TIMEFRAME
            )

            iot = pytrends.interest_over_time().reset_index()

            if iot.empty:
                print(f"  [WARN] Empty result for {province}, skipping.")
                continue

            if "isPartial" in iot.columns:
                iot = iot.drop(columns=["isPartial"])

            iot["province_code"] = code
            iot["province"] = province
            iot["region7"] = region7

            all_dfs.append(iot)

            # Short pause to avoid overloading Google
            time.sleep(1)

        except Exception as e:
            print(f"  [ERROR] Failed for {province} ({code}): {e}")

    if not all_dfs:
        print("[ERROR] No data could be collected.")
        return

    result = pd.concat(all_dfs, ignore_index=True)

    # Save the output
    result.to_csv(out_file, index=False)
    print(f"[OK] Province-level Google Trends time series saved to '{out_file}'.")
    print(result.head())


**Note:** To collect the data, uncomment and run the `collect_trends_provinces()` call in the next code cell. This may take a while, since it queries Google Trends once for each of the 81 provinces.


### 3.1 Regional Stress Index Function

This function processes the raw Google Trends data to create a standardized regional stress index. 

**Key operations:**
- Reads the province-level Google Trends time series
- Melts the data from wide to long format (one row per province-keyword-date)
- Calculates z-scores for each province-keyword combination to normalize baseline differences
- Aggregates to regional level by averaging z-scores across all provinces and keywords within each region
- Outputs a weekly time series with one stress index value per region per week

**Output**: `region_stress_index_weekly.csv` with columns: `date`, `region7`, `stress_index`


In [None]:
# Uncomment to run data collection
# collect_trends_provinces()


## 3. Build Regional Stress Index

This section processes the Google Trends data to create a regional stress index. The stress index is a standardized measure that allows comparison across regions and time periods.

**Processing steps:**
1. **Convert to long format**: Transform wide format (one column per keyword) to long format (one row per province-keyword-date combination)
2. **Calculate z-scores**: Standardize search scores within each province-keyword combination to account for baseline differences
3. **Aggregate to regions**: Average z-scores across all provinces and keywords within each region to create a weekly regional stress index

**Why z-scores?** Google Trends scores are relative (0-100), so a score of 50 in one province may not mean the same as 50 in another. Z-scores normalize these differences, making regions comparable.
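
As a small illustration of this normalization, the sketch below standardizes scores within each province and then averages them by region and date. The data and the helper name `zscore_within` are invented for the example; the notebook itself uses `scipy.stats.zscore`, whose default population standard deviation (`ddof=0`) is mirrored here.

```python
import pandas as pd

# Toy data (invented): two provinces with very different baselines.
toy = pd.DataFrame({
    "province": ["A"] * 4 + ["B"] * 4,
    "region7": ["R1"] * 8,
    "date": list(pd.date_range("2024-01-07", periods=4, freq="W")) * 2,
    "score": [10, 10, 10, 30, 60, 60, 60, 80],
})

def zscore_within(s):
    # Population standard deviation (ddof=0), as in scipy.stats.zscore.
    std = s.std(ddof=0)
    return (s - s.mean()) / std if std > 0 else s * 0.0

# Standardize within each province, then average per region and date.
toy["score_z"] = toy.groupby("province")["score"].transform(zscore_within)
index = toy.groupby(["date", "region7"], as_index=False)["score_z"].mean()
print(index)
```

Both provinces show the same relative jump in the last week, so their z-scores coincide even though the raw scores differ by 50 points.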


### 3.2 Execute Regional Stress Index Calculation

Run this cell to process the Google Trends data and create the regional stress index. The function will:
- Load `google_trends_province_timeseries.csv`
- Calculate z-scores and aggregate to regional level
- Save results to `region_stress_index_weekly.csv`
- Display a preview of the results


In [None]:
def build_region_stress_index(
    in_file="google_trends_province_timeseries.csv",
    out_file="region_stress_index_weekly.csv"
):
    # Read the data
    print(f"[INFO] Reading province-level Trends data: {in_file}")
    df = pd.read_csv(in_file)

    # Parse dates
    df["date"] = pd.to_datetime(df["date"])

    # Stress-related keyword columns
    # These names must match KW_LIST used in collect_trends_provinces() exactly
    value_cols = ["anksiyete", "uykusuzluk", "stres", "panik atak", "mide yanması"]

    # Convert to long format (each row: date, province, region7, keyword, score)
    long_df = df.melt(
        id_vars=["date", "province_code", "province", "region7"],
        value_vars=value_cols,
        var_name="keyword",
        value_name="score"
    )

    # Drop missing scores
    long_df = long_df.dropna(subset=["score"])

    print("[INFO] Computing z-scores (per province + keyword)...")

    # z-score (standardization) for each province + keyword group.
    # A constant or single-observation group has zero variance, so zscore
    # yields NaN for it; those NaNs are set to 0 below.
    def zscore_safe(x):
        return zscore(x, nan_policy="omit")

    long_df["score_z"] = long_df.groupby(
        ["province", "keyword"]
    )["score"].transform(zscore_safe)

    # Set any remaining NaNs to 0
    long_df["score_z"] = long_df["score_z"].fillna(0)

    print("[INFO] Computing the weekly stress index per region...")

    # Mean z-score at the region + date level = regional stress index
    region_weekly = (
        long_df
        .groupby(["date", "region7"], as_index=False)["score_z"]
        .mean()
        .rename(columns={"score_z": "stress_index"})
        .sort_values(["region7", "date"])
    )

    # Save
    region_weekly.to_csv(out_file, index=False)

    print(f"[OK] Regional stress index saved: {out_file}")
    print(region_weekly.head())
    
    return region_weekly


### 4.1 Event Detection Thresholds

Define the criteria for identifying stressful events. These thresholds can be adjusted based on research needs:

- **MAG_MIN (4.5)**: Minimum earthquake magnitude to be considered significant
- **DEPTH_MIN (0.0 km) & DEPTH_MAX (40.0 km)**: Depth range for shallow earthquakes (more likely to cause widespread concern)
- **FX_RET_THRESHOLD (0.02)**: Minimum absolute log return (≈2%) for USD/TRY to be considered a currency shock

**Note**: These values can be modified to test sensitivity of results to different event definitions.


In [None]:
region_stress = build_region_stress_index()


### 4.2 Event Date Extraction Function

This function processes earthquake and FX data to identify event dates:

**For Earthquakes:**
1. Loads earthquake data from `earthquake.csv`
2. Filters for earthquakes meeting magnitude and depth criteria
3. Extracts unique event dates (removes duplicates if multiple earthquakes occur on same day)
4. Labels events as "earthquake"

**For FX Shocks:**
1. Loads USD/TRY historical data
2. Calculates daily log returns (log(price_today / price_yesterday))
3. Identifies days with absolute returns > threshold (2%)
4. Extracts unique dates and labels as "fx_shock"

**Output**: `event_dates.csv` with columns: `event_date`, `event_type`
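
The FX shock rule can be tried on a handful of prices. The sketch below uses invented prices (not real USD/TRY quotes) and applies the same log-return and threshold logic described above.

```python
import numpy as np
import pandas as pd

# Invented prices for illustration only.
fx = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "price": [30.0, 30.3, 31.2, 31.1],
})
threshold = 0.02  # same value as FX_RET_THRESHOLD

# Log return between consecutive observations: log(p_t) - log(p_{t-1})
fx["ret"] = np.log(fx["price"]).diff()

# A day counts as a shock when the absolute log return exceeds the threshold
shocks = fx.loc[fx["ret"].abs() > threshold, "date"].dt.date.tolist()
print(fx)
print(shocks)
```

Only the move from 30.3 to 31.2 (a log return of roughly 0.029) crosses the 2% threshold; the first row has no return because there is no previous price.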


## 4. Build Event Dates

This section identifies stressful national events that may trigger regional stress responses. We identify two types of events:

1. **Earthquakes**: Major seismic events that affect the entire country
   - Criteria: Magnitude ≥ 4.5 and depth 0-40 km (shallow earthquakes that are more likely to be felt)
   - Source: AFAD earthquake catalog

2. **FX Shocks**: Sudden currency volatility events
   - Criteria: Daily USD/TRY log returns with absolute value > 2%
   - These represent significant currency movements that may cause economic stress
   - Source: Historical USD/TRY exchange rate data

**Purpose**: These event dates will be used to analyze how regional stress levels change before and after national stressful events.
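
The earthquake criteria can be sketched the same way. The rows below are invented, and the column names `Date`, `Magnitude`, and `Depth` are assumed to match the catalog file as it is used in this notebook.

```python
import pandas as pd

# Invented catalog rows; dates are day-first (dd/mm/yyyy hh:mm:ss).
eq = pd.DataFrame({
    "Date": [
        "15/03/2024 04:17:00",
        "15/03/2024 13:24:00",
        "20/04/2024 09:00:00",
        "22/05/2024 10:00:00",
    ],
    "Magnitude": [5.2, 4.8, 3.9, 5.1],
    "Depth": [8.6, 7.0, 10.0, 65.0],
})
eq["date"] = pd.to_datetime(eq["Date"], dayfirst=True, errors="coerce")

# Keep magnitude >= 4.5 and depth within 0-40 km (both bounds inclusive)
keep = (eq["Magnitude"] >= 4.5) & eq["Depth"].between(0.0, 40.0)

# Several qualifying quakes on one day collapse into a single event date
event_days = sorted(eq.loc[keep, "date"].dt.date.unique())
print(event_days)
```

The third row fails the magnitude filter and the fourth fails the depth filter, so the two quakes on 15 March produce one event date.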


### 4.3 Execute Event Date Extraction

Run this cell to identify and extract event dates from earthquake and FX data. The function will:
- Process earthquake data and identify significant earthquakes
- Process FX data and identify currency shock days
- Combine both event types into a single dataset
- Save results to `event_dates.csv`
- Display summary statistics (number of events by type)


In [None]:
# Thresholds can be changed here if needed
MAG_MIN = 4.5        # Minimum earthquake magnitude
DEPTH_MIN = 0.0      # Lower bound of earthquake depth (km)
DEPTH_MAX = 40.0     # Upper bound of earthquake depth (km)

FX_RET_THRESHOLD = 0.02  # Log-return threshold for an FX shock (~2%)


### 5.1 Data Loading Function

Simple utility function to load the regional stress index and event dates dataframes. Ensures proper date parsing for time series analysis.


### 5.2 Exploratory Data Analysis (EDA) Function

This function provides an initial overview of the regional stress data:

**Outputs:**
1. **Data preview**: First few rows of the dataset
2. **Summary statistics**: Mean, std, min, max for stress index by region
3. **Time series plot**: Line chart showing stress index trends over time for all 7 regions

**Purpose**: Helps identify patterns, outliers, and overall trends before formal statistical analysis.


### 5.3 Event-Region Panel Construction

This function creates a panel dataset where each row represents one event-region combination.

**For each event and each region, it calculates:**
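- **pre_mean**: Average stress index over the 30 days before the event (from `event_date - pre_days` up to, but not including, the event date)
- **post_mean**: Average stress index from the event date through 30 days after it (event date included)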
- **delta_stress**: Difference (post_mean - pre_mean)

**Parameters:**
- `pre_days=30`: Number of days before event to include in pre-period
- `post_days=30`: Number of days after event to include in post-period

**Output**: Panel dataset saved to `event_region_stress_panel.csv` with one row per event-region combination. This panel is used for statistical testing of regional differences.
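
To make the windowing concrete, here is a minimal sketch for a single region and a single event. The weekly series is invented; the masks follow the same date logic as the panel function, so a 30-day window covers roughly four or five weekly observations.

```python
import pandas as pd

# Invented weekly index for one region: a step from 0 to 1 at the event.
weekly = pd.DataFrame({
    "date": pd.date_range("2024-01-07", periods=10, freq="W"),
    "stress_index": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})
event_date = pd.Timestamp("2024-02-11")
pre_days = post_days = 30

# Pre window: [event_date - pre_days, event_date)
pre = weekly[
    (weekly["date"] >= event_date - pd.Timedelta(days=pre_days))
    & (weekly["date"] < event_date)
]
# Post window: [event_date, event_date + post_days]
post = weekly[
    (weekly["date"] >= event_date)
    & (weekly["date"] <= event_date + pd.Timedelta(days=post_days))
]
delta_stress = post["stress_index"].mean() - pre["stress_index"].mean()
print(len(pre), len(post), delta_stress)
```

If a window contains no observations, its mean is NaN and so is `delta_stress`; such rows are dropped before the ANOVA.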


### 5.4 ANOVA Statistical Test

This function performs Analysis of Variance (ANOVA) to test whether regional stress responses differ significantly.

**Hypothesis:**
- **H0 (Null)**: All regions have the same average delta_stress (no regional differences)
- **H1 (Alternative)**: At least one region has a different average delta_stress

**Method**: One-way ANOVA with region as the factor variable and delta_stress as the dependent variable.

**Interpretation:**
- If p-value < 0.05: Reject H0 → Regions respond differently to events
- If p-value ≥ 0.05: Fail to reject H0 → No significant regional differences detected

**Note**: ANOVA tests for any differences but doesn't tell us which specific regions differ. Post-hoc tests would be needed for pairwise comparisons.
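
One common post-hoc choice is Tukey's HSD, which is available in statsmodels as `pairwise_tukeyhsd`. The sketch below is not part of the notebook's pipeline and uses synthetic data in which group "C" is deliberately shifted upward, so it shows only the mechanics, not real results.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)

# Synthetic delta_stress values (not real results): "C" has a higher mean.
toy = pd.DataFrame({
    "region7": np.repeat(["A", "B", "C"], 40),
    "delta_stress": np.concatenate([
        rng.normal(0.0, 0.1, 40),
        rng.normal(0.0, 0.1, 40),
        rng.normal(1.0, 0.1, 40),
    ]),
})

# Pairwise comparisons with family-wise error rate controlled at alpha
tukey = pairwise_tukeyhsd(
    endog=toy["delta_stress"], groups=toy["region7"], alpha=0.05
)
print(tukey.summary())
```

On the real panel, the same call with `panel.dropna(subset=["delta_stress"])` would list which region pairs differ.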


### 5.5 Execute Complete Analysis Pipeline

Run this cell to execute the full analysis workflow:
1. Load regional stress and event data
2. Perform exploratory data analysis (displays summary stats and time series plot)
3. Build event-region panel (calculates pre/post stress for each event-region)
4. Run ANOVA test (tests for regional differences in stress response)

**Expected outputs:**
- Console output with data summaries
- Time series plot showing regional stress trends
- ANOVA table with statistical test results
- Interpretation of whether regional differences are statistically significant


In [None]:
def build_event_dates(
    eq_file="earthquake.csv",
    fx_file="USD_TRY Historical Data.csv",
    out_file="event_dates.csv"
):
    # --------------------------------------------------
    # 1) Earthquake event dates
    # --------------------------------------------------
    print(f"[INFO] Reading earthquake data: {eq_file}")
    eq = pd.read_csv(eq_file)

    # Date column in the AFAD CSV looks like '31/10/2025 07:18:50' (day/month/year)
    eq["date"] = pd.to_datetime(eq["Date"], dayfirst=True, errors="coerce")
    eq = eq.dropna(subset=["date"])

    # Filter: magnitude >= MAG_MIN and depth between DEPTH_MIN and DEPTH_MAX
    mask_mag = eq["Magnitude"] >= MAG_MIN
    mask_depth = (eq["Depth"] >= DEPTH_MIN) & (eq["Depth"] <= DEPTH_MAX)
    eq_big = eq[mask_mag & mask_depth].copy()

    # Keep the date only (ignore the time of day)
    eq_big["event_date"] = eq_big["date"].dt.date

    # Keep unique days
    eq_dates = (
        eq_big[["event_date"]]
        .drop_duplicates()
        .sort_values("event_date")
        .reset_index(drop=True)
    )
    eq_dates["event_type"] = "earthquake"

    print(f"[INFO] Number of earthquake event days: {len(eq_dates)}")

    # --------------------------------------------------
    # 2) FX shock event dates
    # --------------------------------------------------
    print(f"[INFO] Reading FX data: {fx_file}")
    fx = pd.read_csv(fx_file)

    # Date column in the USD_TRY file looks like '11/23/2025' (month/day/year)
    fx["date"] = pd.to_datetime(fx["Date"], dayfirst=False, errors="coerce")
    fx = fx.dropna(subset=["date"])

    # Sort by date (most financial data comes in reverse order)
    fx = fx.sort_values("date")

    # Make the price numeric
    fx["price"] = pd.to_numeric(fx["Price"], errors="coerce")
    fx = fx.dropna(subset=["price"])

    # Log return between consecutive observations (daily for daily data)
    fx["ret"] = np.log(fx["price"]).diff()

    # Shock definition: |ret| > FX_RET_THRESHOLD
    fx_shock = fx[fx["ret"].abs() > FX_RET_THRESHOLD].copy()

    # FX shock event dates
    fx_shock["event_date"] = fx_shock["date"].dt.date

    fx_dates = (
        fx_shock[["event_date"]]
        .drop_duplicates()
        .sort_values("event_date")
        .reset_index(drop=True)
    )
    fx_dates["event_type"] = "fx_shock"

    print(f"[INFO] Number of FX shock event days: {len(fx_dates)}")

    # --------------------------------------------------
    # 3) Combine and save
    # --------------------------------------------------
    events = pd.concat([eq_dates, fx_dates], ignore_index=True)
    events = events.sort_values("event_date").reset_index(drop=True)

    events.to_csv(out_file, index=False)

    print(f"[OK] A total of {len(events)} event days saved to '{out_file}'.")
    print(events.head())
    
    return events


### 6.1 Panel Data Loading Function

Utility function to load the event-region panel dataset for visualization. Ensures proper date parsing.


### 6.2 Box Plot: Delta Stress by Region

Creates a box plot showing the distribution of delta stress (post-event minus pre-event stress) for each region.

**What to look for:**
- **Box position**: Regions with boxes above zero show increased stress after events
- **Box height**: Larger boxes indicate more variability in stress responses
- **Outliers**: Extreme stress responses for specific events
- **Regional differences**: Compare median (line in box) across regions to identify which regions respond most strongly


### 6.3 Bar Chart: Average Delta Stress by Region

Creates a bar chart showing the mean delta stress (averaged across all events) for each region.

**Interpretation:**
- **Positive bars**: Regions that, on average, show increased stress after events
- **Negative bars**: Regions that, on average, show decreased stress after events
- **Bar height**: Magnitude of average stress change
- **Comparison**: Easily compare which regions have the strongest average stress response

**Note**: This averages across all event types. The next plots will separate by event type.
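
A quick way to see the same averages split by event type is a pivot table. The rows below are invented; the column names match the panel file.

```python
import pandas as pd

# Invented panel rows for illustration.
panel = pd.DataFrame({
    "region7": ["Ege", "Ege", "Marmara", "Marmara"],
    "event_type": ["earthquake", "fx_shock", "earthquake", "fx_shock"],
    "delta_stress": [0.2, -0.1, 0.4, 0.3],
})

# Mean delta_stress per region (rows) and event type (columns)
by_type = panel.pivot_table(
    index="region7", columns="event_type",
    values="delta_stress", aggfunc="mean",
)
print(by_type)
```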


### 6.4 Event Timeline Plot

Creates a scatter plot showing delta stress values over time, with event dates marked.

**Features:**
- **X-axis**: Event dates
- **Y-axis**: Delta stress (change in stress index)
- **Points**: Each point represents one event-region combination
- **Labels**: Event type (earthquake or fx_shock) labeled on each point
- **Zero line**: Horizontal line at y=0 for reference

**Purpose**: Identify temporal patterns and see if stress responses vary by event type or time period.


### 6.5 Comparison: Earthquakes vs FX Shocks

Creates a box plot comparing stress responses between two event types: earthquakes and FX shocks.

**What to look for:**
- **Box position**: Do earthquakes or FX shocks cause larger stress increases?
- **Box spread**: Which event type shows more variability in stress responses?
- **Median comparison**: Compare the median delta stress between event types

**Research insight**: This helps determine whether physical disasters (earthquakes) or economic shocks (FX) trigger different stress response patterns.


### 6.6 Generate All Visualizations

Run this cell to generate all visualization plots. This will:
1. Load the event-region panel data
2. Create box plot of delta stress by region
3. Create bar chart of average delta stress by region
4. Create timeline plot showing stress changes over time
5. Create comparison plot of earthquakes vs FX shocks

All plots will be displayed sequentially. Use these visualizations to explore patterns and support the statistical findings from the ANOVA test.


In [None]:
event_dates = build_event_dates()


## 5. Analyze Regional Stress Response

This section performs the core analysis to test whether regions respond differently to national stressful events.

**Analysis workflow:**
1. **Load data**: Regional stress index and event dates
2. **Exploratory Data Analysis (EDA)**: Visualize stress trends over time by region
3. **Build event panel**: For each event-region combination, calculate:
   - Pre-event mean stress (30 days before event)
   - Post-event mean stress (30 days after event)
   - Delta stress (post - pre) = change in stress level
4. **Statistical testing**: ANOVA to test if delta stress differs significantly across regions

**Research question**: Do some regions show larger stress increases (or decreases) after national events compared to others?


In [None]:
def load_data(
    region_file="region_stress_index_weekly.csv",
    events_file="event_dates.csv"
):
    print(f"[INFO] Reading regional stress data: {region_file}")
    region = pd.read_csv(region_file)
    region["date"] = pd.to_datetime(region["date"])

    print(f"[INFO] Reading event dates: {events_file}")
    events = pd.read_csv(events_file)
    events["event_date"] = pd.to_datetime(events["event_date"])

    return region, events


In [None]:
def run_eda(region):
    print("\n--- region_stress_index_weekly.head() ---")
    print(region.head())

    # Summary statistics of the stress index, broken down by region
    print("\n--- stress_index summary by region ---")
    print(region.groupby("region7")["stress_index"].describe())

    print("\n[INFO] Plotting the stress index for the 7 regions...")
    plt.figure(figsize=(12, 6))
    for reg in sorted(region["region7"].unique()):
        sub = region[region["region7"] == reg]
        plt.plot(sub["date"], sub["stress_index"], label=reg, alpha=0.8)
    plt.title("Weekly Stress Index for the 7 Regions (Google Trends)")
    plt.xlabel("Date")
    plt.ylabel("Stress Index (mean z-score)")
    plt.legend()
    plt.tight_layout()
    plt.show()


In [None]:
def build_event_panel(region, events,
                      pre_days=30, post_days=30,
                      out_file="event_region_stress_panel.csv"):
    """
    For each event + region:
    - pre_mean: event_date - pre_days .. event_date - 1
    - post_mean: event_date .. event_date + post_days
    - delta_stress: post_mean - pre_mean
    """
    print(f"\n[INFO] Building event panel (pre={pre_days}, post={post_days})...")

    rows = []
    regions = sorted(region["region7"].unique())

    for _, ev in events.iterrows():
        ev_date = ev["event_date"]
        ev_type = ev["event_type"]

        for reg in regions:
            sub = region[region["region7"] == reg].copy()

            pre_mask = (sub["date"] >= ev_date - pd.Timedelta(days=pre_days)) & \
                       (sub["date"] < ev_date)
            post_mask = (sub["date"] >= ev_date) & \
                        (sub["date"] <= ev_date + pd.Timedelta(days=post_days))

            pre_mean = sub.loc[pre_mask, "stress_index"].mean()
            post_mean = sub.loc[post_mask, "stress_index"].mean()
            delta = post_mean - pre_mean

            rows.append({
                "event_date": ev_date,
                "event_type": ev_type,
                "region7": reg,
                "pre_mean": pre_mean,
                "post_mean": post_mean,
                "delta_stress": delta
            })

    panel = pd.DataFrame(rows)
    panel.to_csv(out_file, index=False)

    print(f"[OK] event_region_stress_panel saved: {out_file}")
    print(panel.head())

    return panel


In [None]:
def run_anova(panel):
    """
    H0: The mean delta_stress is the same in all regions.
    H1: At least one region has a different mean delta_stress.
    """
    df = panel.dropna(subset=["delta_stress"]).copy()

    print("\n--- Number of observations for ANOVA ---")
    print(len(df))

    model = ols("delta_stress ~ C(region7)", data=df).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)

    print("\n--- ANOVA Results (delta_stress ~ C(region7)) ---")
    print(anova_table)

    # The first row is the C(region7) term; access it by position explicitly
    p_val = anova_table["PR(>F)"].iloc[0]
    alpha = 0.05

    if p_val < alpha:
        print(
            f"\nInterpretation: p={p_val:.4f} < {alpha} → reject H0."
            "\n       At least one region's post-event stress change differs from the others."
        )
    else:
        print(
            f"\nInterpretation: p={p_val:.4f} >= {alpha} → fail to reject H0."
            "\n       No significant difference in post-event stress change across regions."
        )


In [None]:
region, events = load_data()
run_eda(region)
panel = build_event_panel(region, events)
run_anova(panel)
print("\n[DONE] Regional stress analysis complete.")


## 6. Visualization

This section creates comprehensive visualizations to explore regional stress responses. The plots help identify:
- Which regions show the largest stress changes after events
- Whether earthquakes or FX shocks trigger different stress responses
- Temporal patterns in stress responses across events

**Visualization types:**
1. Box plots: Distribution of delta stress by region
2. Bar charts: Average delta stress by region
3. Timeline plots: Stress changes over time with event markers
4. Event type comparison: Earthquakes vs FX shocks


In [None]:
def load_panel(panel_file="event_region_stress_panel.csv"):
    df = pd.read_csv(panel_file)
    df["event_date"] = pd.to_datetime(df["event_date"])
    return df


In [None]:
def plot_delta_stress_by_region(df):
    plt.figure(figsize=(12, 6))
    sns.boxplot(data=df, x="region7", y="delta_stress")
    plt.axhline(0, color="black", linestyle="--", linewidth=1)
    plt.title("Delta Stress (Post – Pre) by Region")
    plt.ylabel("Δ Stress Index")
    plt.xlabel("Region")
    plt.tight_layout()
    plt.show()


In [None]:
def plot_delta_stress_bar(df):
    region_avg = df.groupby("region7")["delta_stress"].mean().reset_index()

    plt.figure(figsize=(10, 5))
    sns.barplot(data=region_avg, x="region7", y="delta_stress")
    plt.axhline(0, color="black", linestyle="--")
    plt.title("Average Δ Stress Response by Region")
    plt.ylabel("Mean Δ Stress")
    plt.xlabel("Region")
    plt.tight_layout()
    plt.show()


In [None]:
def plot_event_timeline(df):
    plt.figure(figsize=(14, 4))
    plt.scatter(df["event_date"], df["delta_stress"], c="red", alpha=0.7)
    plt.axhline(0, color="black", linestyle="--")

    for _, row in df.iterrows():
        plt.text(row["event_date"], row["delta_stress"],
                 row["event_type"], fontsize=8, alpha=0.6)

    plt.title("Event Timeline with Δ Stress")
    plt.ylabel("Δ Stress")
    plt.xlabel("Event Date")
    plt.tight_layout()
    plt.show()


In [None]:
def plot_fx_vs_quake(df):
    plt.figure(figsize=(8, 5))
    sns.boxplot(data=df, x="event_type", y="delta_stress")
    plt.title("Δ Stress: Earthquakes vs FX Shocks")
    plt.axhline(0, color="black", linestyle="--")
    plt.tight_layout()
    plt.show()


In [None]:
df_panel = load_panel()

plot_delta_stress_by_region(df_panel)
plot_delta_stress_bar(df_panel)
plot_event_timeline(df_panel)
plot_fx_vs_quake(df_panel)
