Block 1: Preparation
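Here's the preparation section. A first cell installs the dependencies listed in requirements.txt; the preparation cell then handles imports, seeds, folder creation, logger setup, log-file bootstrapping, and removal of "/content/sample_data" if present; a last cell downloads the dataset ZIP from Kaggle.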

In [1]:
# ============================================================
# ⚙️ Install the project dependencies
# This cell makes sure all required libraries are installed.
# ============================================================

import os
import subprocess
import sys

def install_requirements(file_path="requirements.txt"):
    """Install the packages listed in requirements.txt with the kernel's own pip."""
    if not os.path.exists(file_path):
        print(f"\n❌ ERROR: {file_path} not found in {os.getcwd()}.")
        return
    print(f"Installing/updating dependencies from {file_path}...")
    try:
        # Run pip through the current interpreter so packages land in the active kernel environment
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", file_path])
        print("\n✅ All dependencies were installed or updated successfully.")
        print("Please RESTART the notebook kernel if this is the first run.")
    except subprocess.CalledProcessError as e:
        print(f"\n❌ ERROR while installing dependencies: {e}")

# Run the installation
install_requirements()


Installing/updating dependencies from requirements.txt...

✅ All dependencies were installed or updated successfully.
Please RESTART the notebook kernel if this is the first run.


In [2]:
# Block 1 - Preparation
# - Library imports
# - Seeds for reproducibility
# - Folder creation: data/, results/, logs/
# - Logger setup (file + console)
# - Log file bootstrapping: logs/logs.csv and logs/summary.md
# - Removal of /content/sample_data if present (Colab-like environments)
# - Confirmation messages printed to the output

import os
import sys
import logging
import random
import shutil
import time
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: figures are meant to be saved to files
import matplotlib.pyplot as plt

# 1) Reproducibility
random.seed(42)
np.random.seed(42)

# 2) Folder creation (idempotent)
BASE_DIRS = ['data', 'results', 'logs']
for d in BASE_DIRS:
    os.makedirs(d, exist_ok=True)

# 3) Logger setup
# Standard format: timestamp | level | message
log_formatter = logging.Formatter('%(asctime)s | %(levelname)s | %(message)s')

logger = logging.getLogger('T_log_V0_1')
logger.setLevel(logging.INFO)
# Close and detach existing handlers so re-running the cell neither duplicates lines nor leaks file handles
for handler in list(logger.handlers):
    logger.removeHandler(handler)
    handler.close()
logger.propagate = False  # Avoid duplicate lines if the root logger also has handlers

# File handler (logs/logs.txt, for quick human reading)
file_handler = logging.FileHandler('logs/logs.txt', mode='a', encoding='utf-8')
file_handler.setFormatter(log_formatter)
logger.addHandler(file_handler)

# Console handler
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(log_formatter)
logger.addHandler(console_handler)

# 4) Bootstrap the structured log files
# logs/logs.csv: columns = timestamp, level, message
logs_csv_path = 'logs/logs.csv'
if not os.path.exists(logs_csv_path):
    df_init = pd.DataFrame(columns=['timestamp', 'level', 'message'])
    df_init.to_csv(logs_csv_path, index=False)

# logs/summary.md: header + context
summary_md_path = 'logs/summary.md'
if not os.path.exists(summary_md_path):
    with open(summary_md_path, 'w', encoding='utf-8') as f:
        f.write('# Test log - T_log model V0.1\n\n')
        f.write(f'- Created on: {datetime.now().isoformat()}\n')
        f.write('- Context: test environment preparation (imports, logger, folders)\n\n')
        f.write('## Key events\n')

# 5) Helper that appends one row to logs.csv
def log_to_csv(level: str, message: str):
    """Append a (timestamp, level, message) row to logs/logs.csv."""
    ts = datetime.now().isoformat()
    row = pd.DataFrame([[ts, level, message]], columns=['timestamp', 'level', 'message'])
    try:
        row.to_csv(logs_csv_path, mode='a', header=False, index=False)
    except Exception as e:
        logger.error(f'Error while writing to logs.csv: {e}')

# 6) Remove /content/sample_data if present (Colab-like environments)
sample_data_path = '/content/sample_data'
try:
    if os.path.exists(sample_data_path):
        # No ignore_errors: a failed removal must reach the except branch instead of being reported as a success
        shutil.rmtree(sample_data_path)
        logger.info('Directory /content/sample_data found and removed.')
        log_to_csv('INFO', 'Directory /content/sample_data removed.')
    else:
        logger.info('No /content/sample_data directory to remove.')
        log_to_csv('INFO', 'No /content/sample_data found.')
except Exception as e:
    logger.error(f'Error while removing /content/sample_data: {e}')
    log_to_csv('ERROR', f'Removal of /content/sample_data failed: {e}')

# 7) Confirmation messages
logger.info('Preparation complete: libraries imported, seeds set, folders created, logger operational.')
log_to_csv('INFO', 'Preparation complete: environment ready.')

print('Folders:', {d: os.path.abspath(d) for d in BASE_DIRS})
print('Logger ready. Log files:')
print('-', os.path.abspath('logs/logs.txt'))
print('-', os.path.abspath('logs/logs.csv'))
print('-', os.path.abspath('logs/summary.md'))


2025-11-11 03:18:37,522 | INFO | No /content/sample_data directory to remove.
2025-11-11 03:18:37,526 | INFO | Preparation complete: libraries imported, seeds set, folders created, logger operational.
Folders: {'data': 'c:\\Users\\zackd\\OneDrive\\Desktop\\T_log_Tsunami_V_0_1En\\data', 'results': 'c:\\Users\\zackd\\OneDrive\\Desktop\\T_log_Tsunami_V_0_1En\\results', 'logs': 'c:\\Users\\zackd\\OneDrive\\Desktop\\T_log_Tsunami_V_0_1En\\logs'}
Logger ready. Log files:
- c:\Users\zackd\OneDrive\Desktop\T_log_Tsunami_V_0_1En\logs\logs.txt
- c:\Users\zackd\OneDrive\Desktop\T_log_Tsunami_V_0_1En\logs\logs.csv
- c:\Users\zackd\OneDrive\Desktop\T_log_Tsunami_V_0_1En\logs\summary.md
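
The cell above records most events twice, once through `logger` and once through `log_to_csv`. One possible alternative, sketched here assuming the same three-column layout of logs/logs.csv is kept, is a small custom `logging.Handler` so that a single `logger.info(...)` call reaches the console, logs.txt and logs.csv. The class name `CsvLogHandler` is hypothetical and not part of the original notebook:

```python
import csv
import logging
import os
from datetime import datetime


class CsvLogHandler(logging.Handler):
    """Hypothetical handler: appends (timestamp, level, message) rows to a CSV file."""

    def __init__(self, path: str):
        super().__init__()
        self.path = path
        # Write the header only when the file does not exist yet
        if not os.path.exists(path):
            with open(path, "w", newline="", encoding="utf-8") as f:
                csv.writer(f).writerow(["timestamp", "level", "message"])

    def emit(self, record: logging.LogRecord) -> None:
        try:
            # Use the record's own creation time so the CSV matches logs.txt
            ts = datetime.fromtimestamp(record.created).isoformat()
            with open(self.path, "a", newline="", encoding="utf-8") as f:
                csv.writer(f).writerow([ts, record.levelname, record.getMessage()])
        except Exception:
            # Report through logging's standard error path instead of raising
            self.handleError(record)
```

Attaching it would be one line, `logger.addHandler(CsvLogHandler('logs/logs.csv'))`; the explicit `log_to_csv(...)` calls would then need to be dropped to avoid duplicate rows. The `csv` module also avoids building a one-row DataFrame for every message.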


In [3]:
import os
import json

# --- 0. INSTALL KAGGLE ---
# Make sure the Kaggle library is installed (%pip targets the running kernel's environment)
%pip install kaggle --quiet

# NOTE: `import kaggle` runs its own authentication at import time and raises if no
# credentials are found, so the credentials are located first and the client is
# imported only inside find_and_auth_kaggle().

# --- 1. CONFIGURATION ---

# Kaggle dataset identifier
KAGGLE_DATASET_ID = "ahmeduzaki/global-earthquake-tsunami-risk-assessment-dataset"
# Absolute path: on Windows it resolves to the root of the current drive (e.g. C:\content\data)
DOWNLOAD_DIR = '/content/data'
TARGET_ZIP_NAME = 'Global Earthquake-Tsunami Risk Assessment Dataset.zip'

# Create the destination folder
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

def find_and_auth_kaggle():
    """Locate the API credentials, then return an authenticated Kaggle client (or None)."""
    print("Attempting Kaggle authentication...")

    # 1. Check the environment variables (Colab/notebook method)
    if os.getenv('KAGGLE_USERNAME') and os.getenv('KAGGLE_KEY'):
        print('INFO: Authenticating via environment variables (KAGGLE_USERNAME/KAGGLE_KEY).')

    # 2. Otherwise look for a kaggle.json file
    else:
        locations = [
            os.path.join(os.path.expanduser('~'), '.kaggle', 'kaggle.json'),  # Standard location
            os.path.join(os.getcwd(), 'kaggle.json'),                        # Current directory
        ]

        found = False
        for loc in locations:
            if not os.path.exists(loc):
                continue
            try:
                with open(loc, 'r', encoding='utf-8') as f:
                    config = json.load(f)
            except (OSError, json.JSONDecodeError):
                continue  # Unreadable or malformed file: try the next location
            username = config.get('username')
            key = config.get('key')
            if username and key:
                os.environ['KAGGLE_USERNAME'] = username
                os.environ['KAGGLE_KEY'] = key
                print(f'INFO: Keys read and set from {loc}.')
                found = True
                break

        if not found:
            print("ERROR: kaggle.json not found. Place it in ~/.kaggle/ or in the current directory.")
            return None

    # 3. Import the client (its import-time check now finds the credentials) and authenticate
    try:
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()
        api.authenticate()
    except ImportError:
        print("ERROR: Could not import 'kaggle'. Check your installation.")
        return None
    except Exception as e:
        print(f"ERROR: API authentication failed: {e}")
        return None
    print('SUCCESS: Kaggle authentication succeeded.')
    return api


# --- 2. DOWNLOAD THE ZIP FILE ---
try:
    kaggle_api = find_and_auth_kaggle()
    if kaggle_api is None:
        raise RuntimeError("Process aborted: Kaggle configuration failed.")

    print(f"\nStarting ZIP download for: {KAGGLE_DATASET_ID}")

    # Download the dataset WITHOUT extracting it (unzip=False)
    kaggle_api.dataset_download_files(
        KAGGLE_DATASET_ID,
        path=DOWNLOAD_DIR,
        unzip=False,  # <-- keeps the file in ZIP format
        quiet=True,
    )

    target_path = os.path.join(DOWNLOAD_DIR, TARGET_ZIP_NAME)
    # Find the freshly downloaded ZIP, ignoring an already-renamed copy from a previous run
    zip_files = [f for f in os.listdir(DOWNLOAD_DIR)
                 if f.endswith('.zip') and f != TARGET_ZIP_NAME]

    if zip_files:
        original_path = os.path.join(DOWNLOAD_DIR, zip_files[0])
        # Rename to the expected name; os.replace overwrites an existing target, os.rename fails on Windows
        os.replace(original_path, target_path)

        print("\n" + "="*50)
        print("ZIP DOWNLOAD SUCCEEDED 🎉")
        print(f"Dataset: {KAGGLE_DATASET_ID}")
        print(f"ZIP file saved here: {target_path}")
        print("="*50)
    else:
        raise FileNotFoundError(f"The download succeeded but no .zip file was found in {DOWNLOAD_DIR}.")

except Exception as e:
    print("\n" + "#"*50)
    print("CRITICAL DOWNLOAD FAILURE.")
    print(f"Error: {e}")
    print("Check that your Kaggle API key is configured correctly.")
    print("#"*50)
    # Do not re-raise, so a Kaggle problem does not break the rest of the notebook


Attempting Kaggle authentication...
INFO: Keys read and set from C:\Users\zackd\.kaggle\kaggle.json.
SUCCESS: Kaggle authentication succeeded.

Starting ZIP download for: ahmeduzaki/global-earthquake-tsunami-risk-assessment-dataset
Dataset URL: https://www.kaggle.com/datasets/ahmeduzaki/global-earthquake-tsunami-risk-assessment-dataset

ZIP DOWNLOAD SUCCEEDED 🎉
Dataset: ahmeduzaki/global-earthquake-tsunami-risk-assessment-dataset
ZIP file saved here: /content/data\Global Earthquake-Tsunami Risk Assessment Dataset.zip
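
Before unzipping, it can help to confirm the downloaded archive is intact. A minimal sketch using the standard library's `zipfile.ZipFile.testzip()`, which re-reads every member and checks its CRC; the helper name `check_zip` is hypothetical, not part of the original notebook:

```python
import zipfile


def check_zip(path: str) -> list:
    """Return the member names of a ZIP archive after verifying every member's CRC.

    Raises zipfile.BadZipFile if the file is not a ZIP at all, and ValueError
    if a member fails the CRC check (for example after a truncated download).
    """
    with zipfile.ZipFile(path) as zf:
        bad_member = zf.testzip()  # None when every member's CRC matches
        if bad_member is not None:
            raise ValueError(f"Corrupted member in {path}: {bad_member}")
        return zf.namelist()
```

Calling `check_zip(target_path)` right after the download would be expected to return the member list that Block 2 reports, `['earthquake_data_tsunami.csv']`.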


Block 2: Data Acquisition (Unzip + Initial Inspection)
Here is the Python cell that will:

- Unzip the Global Earthquake-Tsunami Risk Assessment Dataset.zip file located in /content/data/.
- List the extracted files.
- Load only the CSV files found.
- Check, for each CSV, the number of rows/columns, the completely empty columns, and the number of NaN values, and print a three-row preview.
- Save an overall summary to results/data_summary.csv.
- Log the events in logs/.
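
The per-file check described above boils down to three pandas calls. A minimal sketch that factors it into a reusable function and runs it on a tiny made-up CSV; the name `inspect_csv` and the toy data are hypothetical, not part of the notebook or the Kaggle dataset:

```python
import io

import pandas as pd


def inspect_csv(df: pd.DataFrame, name: str) -> dict:
    """Summarise one loaded CSV: shape, fully empty columns, total NaN count."""
    empty_cols = [col for col in df.columns if df[col].isna().all()]
    return {
        "file": name,
        "rows": df.shape[0],
        "cols": df.shape[1],
        "empty_cols": len(empty_cols),
        # Cast from numpy.int64 to a plain int so the value serialises cleanly
        "total_NaN": int(df.isna().sum().sum()),
    }


# Toy CSV with made-up values: column "c" is entirely empty, "b" has one gap
toy = pd.read_csv(io.StringIO("a,b,c\n1,,\n2,3,\n"))
print(inspect_csv(toy, "toy.csv"))
# → {'file': 'toy.csv', 'rows': 2, 'cols': 3, 'empty_cols': 1, 'total_NaN': 3}
```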

In [4]:
# Block 2 - Data acquisition
# Unzip the archive and inspect the CSVs for empty columns or NaN values
# (logger and log_to_csv are defined in the Block 1 preparation cell)

import os
import zipfile

import pandas as pd

zip_path = '/content/data/Global Earthquake-Tsunami Risk Assessment Dataset.zip'
extract_dir = 'data/extracted'

# 1) Extraction
os.makedirs(extract_dir, exist_ok=True)
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)
    extracted_files = zip_ref.namelist()

logger.info(f"Extracted files: {extracted_files}")
log_to_csv('INFO', f"Extracted files: {extracted_files}")

# 2) Keep only the CSV files
csv_files = [f for f in extracted_files if f.lower().endswith('.csv')]
print("CSV files found:", csv_files)

# 3) Inspect each CSV
summary_rows = []
for csv_file in csv_files:
    file_path = os.path.join(extract_dir, csv_file)
    try:
        df = pd.read_csv(file_path)
        shape = df.shape
        empty_cols = [col for col in df.columns if df[col].isna().all()]  # columns with no value at all
        nan_counts = int(df.isna().sum().sum())  # total NaN cells in the file

        print(f"\n--- {csv_file} ---")
        print("Shape:", shape)
        print("Empty columns:", empty_cols)
        print("Total NaN count:", nan_counts)
        print(df.head(3))  # quick preview

        summary_rows.append({
            'file': csv_file,
            'rows': shape[0],
            'cols': shape[1],
            'empty_cols': len(empty_cols),
            'total_NaN': nan_counts
        })

        log_to_csv('INFO', f"Inspection {csv_file}: {shape}, NaN={nan_counts}, empty_cols={len(empty_cols)}")
    except Exception as e:
        logger.error(f"Error reading {csv_file}: {e}")
        log_to_csv('ERROR', f"Error reading {csv_file}: {e}")

# 4) Save the global summary
summary_df = pd.DataFrame(summary_rows)
summary_path = 'results/data_summary.csv'
summary_df.to_csv(summary_path, index=False)

print("\nGlobal summary saved to:", summary_path)
print(summary_df)


2025-11-11 03:18:46,576 | INFO | Extracted files: ['earthquake_data_tsunami.csv']
CSV files found: ['earthquake_data_tsunami.csv']

--- earthquake_data_tsunami.csv ---
Shape: (782, 13)
Empty columns: []
Total NaN count: 0
   magnitude  cdi  mmi  sig  nst   dmin   gap  depth  latitude  longitude  \
0        7.0    8    7  768  117  0.509  17.0   14.0   -9.7963    159.596   
1        6.9    4    4  735   99  2.229  34.0   25.0   -4.9559    100.738   
2        7.0    3    3  755  147  3.125  18.0  579.0  -20.0508   -178.346   

   Year  Month  tsunami  
0  2022     11        1  
1  2022     11        0  
2  2022     11        1  

Global summary saved to: results/data_summary.csv
                          file  rows  cols  empty_cols  total_NaN
0  earthquake_data_tsunami.csv   782    13           0          0


Perfect üëç ‚Äî your dataset is clean: 782 rows, 13 columns, no empty columns, no NaNs.
We can now move on to the next step of the protocol.

---

### Block 3: Calculating \(T_{\log}\) (preparation)

To apply your model \(T_{\log}(n,d) = (d-4)\cdot \ln(n)\), we need to define:

- **\(n\)**: the size of the system. Here, we can take \(n = 782\) (total number of seismic events in the dataset).
- **\(d\)**: the effective dimension. Since this dataset is not a graph with a Laplacian spectrum, we must choose an approximation. Two possible options:
    1. **Physical dimension**: take \(d=3\) (3D geographic space: latitude, longitude, depth).
    2. **Enriched dimension**: include time as an additional axis → \(d=4\).
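
As a quick sanity check, plugging \(n = 782\) into the formula for the two options gives the values below (rounded to three decimals). Note that \(d = 4\) is exactly where the prefactor \((d-4)\) vanishes, so any \(d < 4\) yields a negative value:

\[
T_{\log}(782, 3) = (3-4)\cdot\ln(782) = -\ln(782) \approx -6.662,
\qquad
T_{\log}(782, 4) = (4-4)\cdot\ln(782) = 0.
\]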

👉 To stay true to V0.1 (without the PDE extension), I suggest starting with **\(d=3\)** (spatial dimension). We can then test the sensitivity by sweeping \(d\) between 3 and 4.
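
The sensitivity sweep can be sketched in a few lines. This uses only the formula \(T_{\log}(n,d) = (d-4)\cdot\ln(n)\) as stated above, with no additional offset; the function name `t_log` and the grid of \(d\) values are illustrative assumptions, not the notebook's actual cell:

```python
import math


def t_log(n: int, d: float) -> float:
    """T_log(n, d) = (d - 4) * ln(n), exactly as written above (no offset term)."""
    if n <= 0:
        raise ValueError("n must be a positive number of events")
    return (d - 4) * math.log(n)


N_EVENTS = 782  # rows in earthquake_data_tsunami.csv

# Sweep d between the two candidate dimensions (3 = space, 4 = space + time)
for d in (3.0, 3.25, 3.5, 3.75, 4.0):
    print(f"d = {d:.2f} -> T_log = {t_log(N_EVENTS, d):+.4f}")
```

Because \(\ln(782)\) is a constant here, \(T_{\log}\) is linear in \(d\): each step of 0.25 in \(d\) shifts the result by \(0.25\cdot\ln(782)\).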

### Block 3: Calculating \(T_{\log}\) with \(d = 3\)

Here is cell 3. It calculates \(T_{\log}\) for your dataset (782 events) with \(d=3\) and bias=0, displays the numerical result and the corresponding regime, then logs the event.