# **08 - GEM Taxonomy Assignment for Morocco HDX Census Data**

**IRDR0012 MSc Independent Research Project**

*   Candidate number: NWHL6
*   Institution: UCL IRDR
*   Supervisor: Dr. Roberto Gentile
*   Date: 01/09/2025
*   Version: v1.0

**Description:**

This notebook synthetically assigns GEM Building Taxonomy parameters to the HDX
census building dataset using regional probability distributions from the GEM
Global Exposure Model. The process creates a complete exposure dataset suitable
for seismic risk assessment and damage matrix analysis.


**Overview:**

- **Input**: 16,593 census buildings with location and administrative region
- **Process**: Statistical sampling based on regional GEM taxonomy distributions  
- **Output**: Complete exposure dataset with GEM taxonomy parameters
- **Focus**: Residential (RES) and Commercial (COM) buildings only


**INPUT FILES:**

*   NWHL6-SH-P01_OCHA HDX census.csv
*   NWHL6-SH-P01_GEM Global exposure data_occupancy.csv
*   NWHL6-SH-P01_GEM Global exposure data_taxo_draatafilalet.csv
*   NWHL6-SH-P01_GEM Global exposure data_taxo_marrakechsafi.csv
*   NWHL6-SH-P01_GEM Global exposure data_taxo_soussmassa.csv

**OUTPUT FILES:**

*   HDX_Census_Enhanced_GEM_Taxonomy.csv
*   HDX_Census_Summary_Statistics.csv

## 1. SETUP AND IMPORTS

Setting up the required libraries and connecting to Google Drive for data access.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive
import warnings
warnings.filterwarnings('ignore')

# Mount Google Drive
print("🔗 Mounting Google Drive...")
drive.mount('/content/drive')
print("✅ Google Drive mounted successfully!")

# Set random seed for reproducible results
np.random.seed(42)
print("🎲 Random seed set to 42 for reproducible sampling")

🔗 Mounting Google Drive...
Mounted at /content/drive
✅ Google Drive mounted successfully!
🎲 Random seed set to 42 for reproducible sampling
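
The global `np.random.seed(42)` call makes the run reproducible only as long as the cells execute in the same order. One possible alternative is a local `numpy.random.Generator` that is passed to each sampling step. The sketch below is illustrative and not part of this notebook's pipeline; the labels and probabilities are made-up values, not GEM figures.

```python
import numpy as np

# Two generators created with the same seed produce identical draws,
# independent of any other code that touches NumPy's global random state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

labels = np.array(["RES", "COM"])   # illustrative categories
probs = [0.98, 0.02]                # illustrative probabilities, not GEM values

draws_a = rng_a.choice(labels, size=10, p=probs)
draws_b = rng_b.choice(labels, size=10, p=probs)

same_draws = bool((draws_a == draws_b).all())
print(same_draws)
```

Note that `default_rng` uses a different bit generator than the legacy `np.random.seed` interface, so switching would change the sampled values reported in this notebook.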


## 2. DATA LOADING

Loading all required datasets from Google Drive including census data,
occupancy distributions, and regional taxonomy distributions.

In [None]:
# Define file paths (adjust these paths to match your Google Drive structure)
base_path = "/content/drive/MyDrive/IRDR0012_Research Project/00 INPUT/"  # Adjust this path as needed

file_paths = {
    'census': f'{base_path}NWHL6-SH-P01_OCHA HDX census.csv',
    'occupancy': f'{base_path}NWHL6-SH-P01_GEM Global exposure data_occupancy.csv',
    'taxo_draa': f'{base_path}NWHL6-SH-P01_GEM Global exposure data_taxo_draatafilalet.csv',
    'taxo_marrakech': f'{base_path}NWHL6-SH-P01_GEM Global exposure data_taxo_marrakechsafi.csv',
    'taxo_souss': f'{base_path}NWHL6-SH-P01_GEM Global exposure data_taxo_soussmassa.csv'
}

print("📁 Loading datasets...")

# Load census data
census_df = pd.read_csv(file_paths['census'])
print(f"✅ Census data loaded: {len(census_df):,} buildings")

# Load occupancy distribution
occupancy_df = pd.read_csv(file_paths['occupancy'])
print(f"✅ Occupancy data loaded: {len(occupancy_df)} categories")

# Load taxonomy distributions for each region
taxo_draa = pd.read_csv(file_paths['taxo_draa'])
taxo_marrakech = pd.read_csv(file_paths['taxo_marrakech'])
taxo_souss = pd.read_csv(file_paths['taxo_souss'])

print(f"✅ Drâa-Tafilalet taxonomy: {len(taxo_draa)} building types")
print(f"✅ Marrakech-Safi taxonomy: {len(taxo_marrakech)} building types")
print(f"✅ Souss-Massa taxonomy: {len(taxo_souss)} building types")

📁 Loading datasets...
✅ Census data loaded: 16,593 buildings
✅ Occupancy data loaded: 9 categories
✅ Drâa-Tafilalet taxonomy: 92 building types
✅ Marrakech-Safi taxonomy: 92 building types
✅ Souss-Massa taxonomy: 92 building types
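
The later cells rely on the columns `Region`, `Occupancy` and `Relative` in the occupancy table and on `Taxonomy` and `Relative` in each taxonomy table. A quick check right after loading could catch a renamed or missing column early. The helper `missing_columns` below is a hypothetical name introduced for illustration, and the small tables stand in for the real CSV contents.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]

# Small stand-in tables; the real ones come from pd.read_csv in the cell above.
occupancy_demo = pd.DataFrame(
    {"Region": ["Souss-Massa"], "Occupancy": ["RES"], "Relative": [0.97]}
)
taxonomy_demo = pd.DataFrame({"Taxonomy": ["MUR+CL/LWAL+CDN/H:1/RES"]})

occ_missing = missing_columns(occupancy_demo, ["Region", "Occupancy", "Relative"])
taxo_missing = missing_columns(taxonomy_demo, ["Taxonomy", "Relative"])
print(occ_missing)
print(taxo_missing)
```

Here the occupancy stand-in passes, while the taxonomy stand-in deliberately lacks `Relative` to show what a failed check looks like.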


## 3. DATA EXPLORATION AND VALIDATION

Understanding the structure and distributions of our input data before
proceeding with synthetic assignment.

In [None]:
print("\n" + "="*60)
print("📊 DATA EXPLORATION")
print("="*60)

# Census data overview
print("\n🏢 Census Data Overview:")
print(f"Total buildings: {len(census_df):,}")
print(f"Regions: {census_df['Region'].unique()}")
print(f"Locations: {census_df['Location'].nunique()} unique locations")

# Buildings per region
print("\n🗺️ Buildings per Administrative Region:")
region_counts = census_df['Region'].value_counts()
for region, count in region_counts.items():
    percentage = (count / len(census_df)) * 100
    print(f"  {region}: {count:,} buildings ({percentage:.1f}%)")

# Occupancy distribution overview
print("\n🏠 Available Occupancy Categories:")
occupancy_summary = occupancy_df.groupby('Occupancy')['Relative'].mean()
for occ, rel in occupancy_summary.items():
    print(f"  {occ}: {rel:.1%} average across regions")

# Filter for RES and COM only
print("\n🎯 Filtering for Residential (RES) and Commercial (COM) only...")
res_com_occupancy = occupancy_df[occupancy_df['Occupancy'].isin(['RES', 'COM'])]
print(f"Selected occupancy types: {res_com_occupancy['Occupancy'].unique()}")


📊 DATA EXPLORATION

🏢 Census Data Overview:
Total buildings: 16,593
Regions: ['Marrakech-Safi' 'Drâa-Tafilalet' 'Souss-Massa']
Locations: 13 unique locations

🗺️ Buildings per Administrative Region:
  Marrakech-Safi: 13,354 buildings (80.5%)
  Souss-Massa: 2,292 buildings (13.8%)
  Drâa-Tafilalet: 947 buildings (5.7%)

🏠 Available Occupancy Categories:
  COM: 2.1% average across regions
  IND: 1.2% average across regions
  RES: 96.7% average across regions

🎯 Filtering for Residential (RES) and Commercial (COM) only...
Selected occupancy types: ['COM' 'RES']
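
Once IND is dropped, the remaining RES and COM shares of a region no longer sum to 1, so they have to be rescaled before they can be used as sampling probabilities. The sketch below shows one way to do this per region with `groupby(...).transform`; the region names and shares are invented for illustration and are not the GEM values.

```python
import pandas as pd

# Illustrative shares only; the real values come from the GEM occupancy file.
demo = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B", "B"],
    "Occupancy": ["RES", "COM", "IND", "RES", "COM", "IND"],
    "Relative": [0.96, 0.03, 0.01, 0.90, 0.06, 0.04],
})

# Keep RES and COM, then rescale so each region's shares sum to 1.
kept = demo[demo["Occupancy"].isin(["RES", "COM"])].copy()
region_sum = kept.groupby("Region")["Relative"].transform("sum")
kept["normalized"] = kept["Relative"] / region_sum

region_totals = kept.groupby("Region")["normalized"].sum()
print(kept)
print(region_totals)
```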


## 4. OCCUPANCY ASSIGNMENT

Assigning occupancy types (RES/COM) to each building based on regional
probability distributions from the GEM exposure model.

In [None]:
print("\n" + "="*60)
print("🏠 OCCUPANCY ASSIGNMENT")
print("="*60)

def assign_occupancy(region, n_buildings, occupancy_dist):
    """
    Assign occupancy types based on regional probability distributions.

    Args:
        region (str): Administrative region name
        n_buildings (int): Number of buildings to assign
        occupancy_dist (DataFrame): Occupancy probability distributions

    Returns:
        np.ndarray: Assigned occupancy type for each building
    """
    # Get region-specific distribution (RES and COM only)
    region_occ = occupancy_dist[
        (occupancy_dist['Region'] == region) &
        (occupancy_dist['Occupancy'].isin(['RES', 'COM']))
    ].copy()

    if len(region_occ) == 0:
        print(f"⚠️ No occupancy data for {region}, using default distribution")
        # Default fallback (typical rural Morocco): 92% RES, 8% COM.
        # Sampling directly from the probabilities also works for very small
        # n_buildings, where building a pool with int() rounding could leave it empty.
        return np.random.choice(['RES', 'COM'], size=n_buildings, p=[0.92, 0.08])

    # Normalize probabilities to sum to 1 (RES + COM only)
    region_occ['normalized'] = region_occ['Relative'] / region_occ['Relative'].sum()

    # Sample based on probabilities
    occupancies = np.random.choice(
        region_occ['Occupancy'].values,
        size=n_buildings,
        p=region_occ['normalized'].values
    )

    return occupancies

# Assign occupancy for each region
census_df['Occupancy'] = ''

for region in census_df['Region'].unique():
    print(f"\n🎯 Processing {region}...")

    # Get buildings for this region
    region_mask = census_df['Region'] == region
    n_buildings = region_mask.sum()

    # Assign occupancies
    occupancies = assign_occupancy(region, n_buildings, res_com_occupancy)
    census_df.loc[region_mask, 'Occupancy'] = occupancies

    # Show results
    occ_counts = pd.Series(occupancies).value_counts()
    print(f"  Assigned {n_buildings:,} buildings:")
    for occ, count in occ_counts.items():
        percentage = (count / n_buildings) * 100
        print(f"    {occ}: {count:,} buildings ({percentage:.1f}%)")

print(f"\n✅ Occupancy assignment completed for all {len(census_df):,} buildings")



🏠 OCCUPANCY ASSIGNMENT

🎯 Processing Marrakech-Safi...
  Assigned 13,354 buildings:
    RES: 13,070 buildings (97.9%)
    COM: 284 buildings (2.1%)

🎯 Processing Drâa-Tafilalet...
  Assigned 947 buildings:
    RES: 922 buildings (97.4%)
    COM: 25 buildings (2.6%)

🎯 Processing Souss-Massa...
  Assigned 2,292 buildings:
    RES: 2,248 buildings (98.1%)
    COM: 44 buildings (1.9%)

✅ Occupancy assignment completed for all 16,593 buildings
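
Because the assignment is a random draw, the counts per region will not match the target shares exactly. For a region with n buildings and a COM probability p, the number of COM buildings follows a binomial distribution with mean n·p and standard deviation sqrt(n·p·(1−p)). The sketch below works this through for the Marrakech-Safi building count; the value of `p_com` is an assumption for illustration, since the region-specific share is not printed in this notebook.

```python
import math

n_buildings = 13_354   # Marrakech-Safi count from the census output
p_com = 0.021          # assumed COM probability, for illustration only

expected_com = n_buildings * p_com
std_com = math.sqrt(n_buildings * p_com * (1 - p_com))

print(f"expected COM buildings: {expected_com:.1f}")
print(f"standard deviation:     {std_com:.1f}")
```

Under that assumed probability, the 284 COM buildings reported above would sit well within one standard deviation of the expected count.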


## 5. TAXONOMY ASSIGNMENT

Assigning detailed GEM Building Taxonomy parameters to each building based on
regional distributions. The same taxonomy distributions used for residential
buildings are applied to commercial buildings as well.

In [None]:
print("\n" + "="*60)
print("🏗️ GEM TAXONOMY ASSIGNMENT")
print("="*60)

def assign_taxonomy(region, n_buildings, taxonomy_dist):
    """
    Assign GEM taxonomy strings based on regional probability distributions.

    Args:
        region (str): Administrative region name
        n_buildings (int): Number of buildings to assign
        taxonomy_dist (DataFrame): Taxonomy probability distributions

    Returns:
        np.ndarray: Assigned GEM taxonomy string for each building
    """
    # Fall back to a default taxonomy string if the regional table is empty
    if len(taxonomy_dist) == 0:
        print(f"⚠️ No taxonomy data available for {region}, using default")
        return np.array(['MUR+ADO/LWAL/H:1/RES/IRR'] * n_buildings)

    # Normalize probabilities
    taxonomy_dist = taxonomy_dist.copy()
    taxonomy_dist['normalized'] = taxonomy_dist['Relative'] / taxonomy_dist['Relative'].sum()

    # Sample based on probabilities
    taxonomies = np.random.choice(
        taxonomy_dist['Taxonomy'].values,
        size=n_buildings,
        p=taxonomy_dist['normalized'].values
    )

    return taxonomies

# Create taxonomy mapping for regions
taxonomy_data = {
    'Drâa-Tafilalet': taxo_draa,
    'Marrakech-Safi': taxo_marrakech,
    'Souss-Massa': taxo_souss
}

# Assign taxonomy for each region
census_df['GEM_Taxonomy'] = ''

for region in census_df['Region'].unique():
    print(f"\n🏗️ Processing taxonomy for {region}...")

    # Get buildings for this region
    region_mask = census_df['Region'] == region
    n_buildings = region_mask.sum()

    # Get taxonomy distribution for this region
    if region in taxonomy_data:
        taxo_dist = taxonomy_data[region]
        print(f"  Using {len(taxo_dist)} taxonomy classes")
    else:
        print(f"⚠️ No specific taxonomy data for {region}, using Marrakech-Safi as default")
        taxo_dist = taxonomy_data['Marrakech-Safi']

    # Assign taxonomies
    taxonomies = assign_taxonomy(region, n_buildings, taxo_dist)
    census_df.loc[region_mask, 'GEM_Taxonomy'] = taxonomies

    # Show sample results
    taxo_sample = pd.Series(taxonomies).value_counts().head(5)
    print(f"  Top 5 assigned taxonomy classes:")
    for taxo, count in taxo_sample.items():
        percentage = (count / n_buildings) * 100
        print(f"    {taxo}: {count:,} ({percentage:.1f}%)")

print(f"\n✅ Taxonomy assignment completed for all {len(census_df):,} buildings")


🏗️ GEM TAXONOMY ASSIGNMENT

🏗️ Processing taxonomy for Marrakech-Safi...
  Using 92 taxonomy classes
  Top 5 assigned taxonomy classes:
    MUR+CL/LWAL+CDN/H:1/RES: 1,262 (9.5%)
    MUR+CB/LWAL+CDN/H:1/RES: 1,244 (9.3%)
    MUR+CL/LWAL+CDN/H:2/RES: 1,229 (9.2%)
    MUR+CB/LWAL+CDN/H:2/RES: 1,195 (8.9%)
    E+ETO/LWAL+CDN/H:1/RES: 1,029 (7.7%)

🏗️ Processing taxonomy for Drâa-Tafilalet...
  Using 92 taxonomy classes
  Top 5 assigned taxonomy classes:
    E+ETO/LWAL+CDN/H:1/RES: 171 (18.1%)
    MUR+CL/LWAL+CDN/H:1/RES: 101 (10.7%)
    MUR+CL/LWAL+CDN/H:2/RES: 85 (9.0%)
    MUR+CB/LWAL+CDN/H:1/RES: 84 (8.9%)
    MUR+CB/LWAL+CDN/H:2/RES: 84 (8.9%)

🏗️ Processing taxonomy for Souss-Massa...
  Using 92 taxonomy classes
  Top 5 assigned taxonomy classes:
    MUR+CL/LWAL+CDN/H:1/RES: 293 (12.8%)
    MUR+CB/LWAL+CDN/H:2/RES: 277 (12.1%)
    MUR+CL/LWAL+CDN/H:2/RES: 270 (11.8%)
    MUR+CB/LWAL+CDN/H:1/RES: 262 (11.4%)
    E+ETO/LWAL+CDN/H:1/RES: 132 (5.8%)

✅ Taxonomy assignment completed for all 16,593 buildings
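
Since classes are drawn at random, a class with a very small relative share may never be assigned in a small region, which is one possible reason why the final summary reports fewer unique taxonomies (64) than the 92 classes in each regional table. The probability that a class with share p is never drawn in n independent draws is (1−p)^n. The sketch below evaluates this for the Drâa-Tafilalet building count; the shares are hypothetical values chosen only to show the effect.

```python
n_draws = 947  # Drâa-Tafilalet building count from the census output

# Hypothetical class shares, chosen only to illustrate the effect.
shares = (0.01, 0.001, 0.0001)

# Probability that a class with the given share is never drawn in n_draws draws.
p_never = {share: (1 - share) ** n_draws for share in shares}

for share, prob in p_never.items():
    print(f"share {share:.4f}: P(never drawn) = {prob:.3f}")
```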

## 6. TAXONOMY PARSING AND ATTRIBUTE EXTRACTION

Extracting individual building attributes from the GEM taxonomy strings
for easier analysis and compatibility with vulnerability functions.

In [None]:
print("\n" + "="*60)
print("🔍 TAXONOMY PARSING AND ATTRIBUTE EXTRACTION")
print("="*60)

def parse_gem_taxonomy(taxonomy_string):
    """
    Parse GEM taxonomy string into individual attributes.

    Example: 'MUR+ADO/LWAL/H:2/RES/IRR' ->
    {
        'Material_LLRS': 'MUR+ADO',
        'LLRS': 'LWAL',
        'Height': 'H:2',
        'Occupancy_Taxo': 'RES',
        'Structural_Irregularity': 'IRR'
    }

    Parts that are missing or empty keep the default values set below.
    """
    # Split by '/' separator
    parts = str(taxonomy_string).split('/')

    # Initialize with defaults
    attributes = {
        'Material_LLRS': 'MUR+ADO',  # Default: Adobe masonry
        'LLRS': 'LWAL',              # Default: Load-bearing wall
        'Height': 'H:1',             # Default: 1 story
        'Occupancy_Taxo': 'RES',     # Default: Residential
        'Structural_Irregularity': 'IRR'  # Default irregularity code
    }

    # Parse available parts
    if len(parts) >= 1 and parts[0]:
        attributes['Material_LLRS'] = parts[0]
    if len(parts) >= 2 and parts[1]:
        attributes['LLRS'] = parts[1]
    if len(parts) >= 3 and parts[2]:
        attributes['Height'] = parts[2]
    if len(parts) >= 4 and parts[3]:
        attributes['Occupancy_Taxo'] = parts[3]
    if len(parts) >= 5 and parts[4]:
        attributes['Structural_Irregularity'] = parts[4]

    return attributes

print("🔍 Parsing GEM taxonomy strings...")

# Parse all taxonomy strings
parsed_attributes = []
for idx, taxonomy in enumerate(census_df['GEM_Taxonomy']):
    if idx % 5000 == 0:
        print(f"  Processed {idx:,} / {len(census_df):,} buildings...")

    attrs = parse_gem_taxonomy(taxonomy)
    parsed_attributes.append(attrs)

# Convert to DataFrame and join with census data
attributes_df = pd.DataFrame(parsed_attributes)
census_enhanced = pd.concat([census_df, attributes_df], axis=1)

print(f"✅ Parsed {len(census_enhanced):,} taxonomy strings")

# Show attribute distributions
print("\n📊 Attribute Distributions:")
for attr in ['Material_LLRS', 'LLRS', 'Height']:
    print(f"\n{attr}:")
    dist = census_enhanced[attr].value_counts().head(5)
    for val, count in dist.items():
        percentage = (count / len(census_enhanced)) * 100
        print(f"  {val}: {count:,} ({percentage:.1f}%)")


🔍 TAXONOMY PARSING AND ATTRIBUTE EXTRACTION
🔍 Parsing GEM taxonomy strings...
  Processed 0 / 16,593 buildings...
  Processed 5,000 / 16,593 buildings...
  Processed 10,000 / 16,593 buildings...
  Processed 15,000 / 16,593 buildings...
✅ Parsed 16,593 taxonomy strings

📊 Attribute Distributions:

Material_LLRS:
  MUR+CL: 3,268 (19.7%)
  MUR+CB: 3,173 (19.1%)
  CR: 2,295 (13.8%)
  MCF: 1,876 (11.3%)
  MUR+STRUB+MOM: 1,744 (10.5%)

LLRS:
  LWAL+CDN: 12,351 (74.4%)
  LWAL+CDL: 1,505 (9.1%)
  LFINF+CDL: 1,158 (7.0%)
  LDUAL+CDL: 597 (3.6%)
  LWAL+CDM: 432 (2.6%)

Height:
  H:1: 8,227 (49.6%)
  H:2: 7,339 (44.2%)
  H:3: 875 (5.3%)
  HBET:3-6: 108 (0.7%)
  HBET:4-7: 38 (0.2%)


## 7. FINAL DATASET PREPARATION

Creating the final enhanced dataset with all required attributes for
seismic risk assessment and damage matrix analysis.

In [None]:
print("\n" + "="*60)
print("📋 FINAL DATASET PREPARATION")
print("="*60)

# Create final column order
final_columns = [
    'ID', 'Location', 'Region', 'latitude', 'longitude', 'DG',
    'Occupancy', 'GEM_Taxonomy', 'Material_LLRS', 'LLRS', 'Height',
    'Occupancy_Taxo', 'Structural_Irregularity'
]

# Ensure all columns exist
for col in final_columns:
    if col not in census_enhanced.columns:
        print(f"⚠️ Missing column {col}, adding default values")
        census_enhanced[col] = 'Unknown'

# Select and reorder columns
final_dataset = census_enhanced[final_columns].copy()

# Add construction date (default for Morocco pre-seismic code)
final_dataset['Date_Construction'] = 'YPRE:2002'  # Pre-2002 seismic code

# Add roof type (typical for Morocco)
final_dataset['Roof'] = 'RF'  # Flat roof (common in Morocco)

print(f"✅ Final dataset prepared with {len(final_dataset):,} buildings")
print(f"📊 Dataset dimensions: {final_dataset.shape}")



📋 FINAL DATASET PREPARATION
✅ Final dataset prepared with 16,593 buildings
📊 Dataset dimensions: (16593, 15)


## 8. QUALITY CONTROL AND VALIDATION

Performing final validation checks to ensure data quality and consistency
before export.

In [None]:
print("\n" + "="*60)
print("✅ QUALITY CONTROL AND VALIDATION")
print("="*60)

# Check for missing values
print("🔍 Missing Value Check:")
missing_check = final_dataset.isnull().sum()
for col, missing in missing_check.items():
    if missing > 0:
        percentage = (missing / len(final_dataset)) * 100
        print(f"  {col}: {missing:,} missing ({percentage:.1f}%)")

if missing_check.sum() == 0:
    print("  ✅ No missing values found!")

# Verify damage grade assignment
print(f"\n🏚️ Damage Grade Distribution:")
dg_dist = final_dataset['DG'].value_counts().sort_index()
for dg, count in dg_dist.items():
    percentage = (count / len(final_dataset)) * 100
    print(f"  DG {dg}: {count:,} buildings ({percentage:.1f}%)")

# Regional summary
print(f"\n🗺️ Final Regional Distribution:")
regional_summary = final_dataset.groupby(['Region', 'Occupancy']).size().unstack(fill_value=0)
print(regional_summary)

# Occupancy validation
print(f"\n🏠 Final Occupancy Distribution:")
occ_final = final_dataset['Occupancy'].value_counts()
for occ, count in occ_final.items():
    percentage = (count / len(final_dataset)) * 100
    print(f"  {occ}: {count:,} buildings ({percentage:.1f}%)")


✅ QUALITY CONTROL AND VALIDATION
🔍 Missing Value Check:
  ✅ No missing values found!

🏚️ Damage Grade Distribution:
  DG 0: 16,593 buildings (100.0%)

🗺️ Final Regional Distribution:
Occupancy       COM    RES
Region                    
Drâa-Tafilalet   25    922
Marrakech-Safi  284  13070
Souss-Massa      44   2248

🏠 Final Occupancy Distribution:
  RES: 16,240 buildings (97.9%)
  COM: 353 buildings (2.1%)
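
Because the residential taxonomy distributions are also applied to commercial buildings, the sampled `Occupancy` column and the occupancy token parsed into `Occupancy_Taxo` can disagree. A further check could count such rows before export. The sketch below uses the final dataset's column names on a small stand-in frame with illustrative values.

```python
import pandas as pd

# Stand-in rows using the final dataset's column names; values are illustrative.
demo = pd.DataFrame({
    "Occupancy": ["RES", "COM", "RES"],
    "Occupancy_Taxo": ["RES", "RES", "RES"],
})

# Values outside the two occupancy types this notebook assigns.
unexpected = ~demo["Occupancy"].isin(["RES", "COM"])

# Rows where the sampled occupancy differs from the taxonomy's occupancy token.
mismatch = demo["Occupancy"] != demo["Occupancy_Taxo"]

n_unexpected = int(unexpected.sum())
n_mismatch = int(mismatch.sum())
print(n_unexpected, n_mismatch)
```

How to treat mismatching rows (for example, keeping `Occupancy` as the authoritative field) is a modelling decision that this sketch does not make.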


## 9. DATA EXPORT

Exporting the final enhanced dataset to Google Drive for integration with
the EEFIT exposure data and subsequent analysis.

In [None]:
print("\n" + "="*60)
print("💾 DATA EXPORT")
print("="*60)

# Define export path
export_path = "/content/drive/MyDrive/IRDR0012_Research Project/01 OUTPUT/HDX_Census_Enhanced_GEM_Taxonomy.csv"

print(f"💾 Exporting enhanced dataset...")
print(f"📁 Export path: {export_path}")

# Export to CSV
final_dataset.to_csv(export_path, index=False)

print(f"✅ Export completed successfully!")
print(f"📊 Exported {len(final_dataset):,} buildings with full GEM taxonomy")

# Create summary statistics file
summary_stats = {
    'Total_Buildings': len(final_dataset),
    'Regions': final_dataset['Region'].nunique(),
    'Locations': final_dataset['Location'].nunique(),
    'RES_Buildings': len(final_dataset[final_dataset['Occupancy'] == 'RES']),
    'COM_Buildings': len(final_dataset[final_dataset['Occupancy'] == 'COM']),
    'Unique_Taxonomies': final_dataset['GEM_Taxonomy'].nunique()
}

summary_df = pd.DataFrame([summary_stats])
summary_path = "/content/drive/MyDrive/IRDR0012_Research Project/01 OUTPUT/HDX_Census_Summary_Statistics.csv"
summary_df.to_csv(summary_path, index=False)

print(f"📈 Summary statistics exported to: {summary_path}")



💾 DATA EXPORT
💾 Exporting enhanced dataset...
📁 Export path: /content/drive/MyDrive/IRDR0012_Research Project/01 OUTPUT/HDX_Census_Enhanced_GEM_Taxonomy.csv
✅ Export completed successfully!
📊 Exported 16,593 buildings with full GEM taxonomy
📈 Summary statistics exported to: /content/drive/MyDrive/IRDR0012_Research Project/01 OUTPUT/HDX_Census_Summary_Statistics.csv
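
`to_csv` fails if the target folder does not exist, so it can help to create the folder first and to read the file back as a sanity check. The sketch below demonstrates the pattern in a temporary directory; the folder layout and the file name `demo_export.csv` are placeholders, not the notebook's Google Drive paths.

```python
import tempfile
from pathlib import Path

import pandas as pd

demo = pd.DataFrame({"ID": [1, 2], "GEM_Taxonomy": ["MUR+CL/LWAL+CDN/H:1/RES"] * 2})

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder output folder; the notebook writes to Google Drive instead.
    out_dir = Path(tmp) / "01 OUTPUT"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

    out_file = out_dir / "demo_export.csv"
    demo.to_csv(out_file, index=False)

    # Read the file back and confirm nothing changed on the way to disk.
    round_trip = pd.read_csv(out_file)

round_trip_ok = demo.to_dict("list") == round_trip.to_dict("list")
print(round_trip_ok)
```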


## 10. FINAL SUMMARY AND NEXT STEPS

Summary of the synthetic taxonomy assignment process and recommendations
for next steps in the seismic risk assessment workflow.

In [None]:
print("\n" + "="*60)
print("🎉 PROCESS COMPLETED SUCCESSFULLY!")
print("="*60)

print(f"""
📊 FINAL DATASET SUMMARY:
├── Total Buildings: {len(final_dataset):,}
├── Administrative Regions: {final_dataset['Region'].nunique()}
├── Unique Locations: {final_dataset['Location'].nunique()}
├── Residential Buildings: {len(final_dataset[final_dataset['Occupancy'] == 'RES']):,}
├── Commercial Buildings: {len(final_dataset[final_dataset['Occupancy'] == 'COM']):,}
├── Unique GEM Taxonomies: {final_dataset['GEM_Taxonomy'].nunique()}
└── All buildings assigned DG = 0 (undamaged baseline)

🎯 NEXT STEPS:
1. Merge this dataset with your EEFIT damage assessment data (383 buildings)
2. Create combined exposure model with DG 0-4 classification
3. Perform logistic regression analysis using building attributes
4. Develop fragility curves for Morocco building typologies
5. Conduct seismic risk assessment for the study region

📁 OUTPUT FILES:
├── HDX_Census_Enhanced_GEM_Taxonomy.csv (Main dataset)
└── HDX_Census_Summary_Statistics.csv (Summary statistics)

✅ The synthetic GEM taxonomy assignment is now complete!
   Your baseline building inventory (DG0) is ready for integration
   with the damage assessment data for comprehensive risk analysis.
""")

# Display first few rows of final dataset
print("\n📋 Sample of Final Dataset:")
print(final_dataset.head(10).to_string(index=False))

print("\n🎯 Ready for integration with EEFIT exposure data!")


🎉 PROCESS COMPLETED SUCCESSFULLY!

📊 FINAL DATASET SUMMARY:
├── Total Buildings: 16,593
├── Administrative Regions: 3
├── Unique Locations: 13
├── Residential Buildings: 16,240
├── Commercial Buildings: 353
├── Unique GEM Taxonomies: 64
└── All buildings assigned DG = 0 (undamaged baseline)

🎯 NEXT STEPS:
1. Merge this dataset with your EEFIT damage assessment data (383 buildings)
2. Create combined exposure model with DG 0-4 classification
3. Perform logistic regression analysis using building attributes
4. Develop fragility curves for Morocco building typologies
5. Conduct seismic risk assessment for the study region

📁 OUTPUT FILES:
├── HDX_Census_Enhanced_GEM_Taxonomy.csv (Main dataset)
└── HDX_Census_Summary_Statistics.csv (Summary statistics)

✅ The synthetic GEM taxonomy assignment is now complete!
   Your baseline building inventory (DG0) is ready for integration
   with the damage assessment data for comprehensive risk analysis.


📋 Sample of Final Dataset:
  ID Location     