# GreenExplorer MCDA Pipeline Demo
A minimal, end‑to‑end notebook that shows how to:
1. Load the tourism CSVs already in your repo.
2. (Optionally) call the ChatGPT API to enrich each POI with the six sustainability indicators *z₁–z₆*.
3. Build an ELECTRE‑III decision matrix with **pyDecision** and get a ranking.
4. Refine the non‑dominated kernel with a soft‑AND Logic‑Scoring‑of‑Preference (LSP) utility.
5. Display the top‑k recommendations.

> ✨ **Tip for class projects** – treat this notebook as a scaffold: run the cells, read the comments, then plug in your real indicator functions and UI.

## 📦 Install & import the specialised libraries
Uncomment the `pip install` line in the first cell the **first time** you run the notebook or whenever you rebuild the Docker image.

In [1]:
# !pip install pyDecision pymcdm openai python-dotenv tqdm --quiet
import os, json, pandas as pd, numpy as np
from dataloader import readTourismData

# MCDA libraries
from pyDecision.algorithm import electre_iii
from pymcdm.helpers import rankdata


### 1️⃣ Load the demo POI corpus

In [2]:
df = readTourismData('../data')   # adjust path if you moved the notebook
print(f'Total POIs loaded: {len(df):,}')
df.head()

Total POIs loaded: 300


| Unnamed: 0 | name | category | lat | lon | sustainability | popularity | municipality |
|---|---|---|---|---|---|---|---|
| 0 | La Sagrada Família | Monument | 41.4036 | 2.1744 | 0.35 | 0.98 | barcelona |
| 1 | Park Güell | Park | 41.4145 | 2.1527 | 0.8 | 0.95 | barcelona |
| 2 | Casa Batlló | Monument | 41.3916 | 2.1649 | 0.4 | 0.92 | barcelona |
| 3 | La Pedrera (Casa Milà) | Monument | 41.3954 | 2.1619 | 0.45 | 0.88 | barcelona |
| 4 | Barcelona Cathedral | Historic | 41.3839 | 2.1763 | 0.5 | 0.85 | barcelona |


### 2️⃣ Generate/attach the six sustainability indicators *(z₁–z₆)*
Below is a **placeholder** that shows how you *could* call the ChatGPT API in batch‑mode.
Replace the prompt with your own indicator derivation logic (API, heuristics, sensors, etc.).
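If you go the heuristic route, here is a minimal sketch of one indicator computed without the API. It reads *z₃* ("entropy-based seasonality balance" in the prompt) as the normalised Shannon entropy of monthly visit shares, which is one common interpretation and not necessarily the one the research report uses. The function name and the `monthly_visits` input are hypothetical; the demo CSVs do not contain monthly counts.

```python
import numpy as np

def seasonality_balance(monthly_visits):
    """Normalised Shannon entropy of visit shares: 1.0 = perfectly even, 0.0 = one peak month.

    `monthly_visits` is a hypothetical input (e.g. 12 monthly visitor counts).
    """
    counts = np.asarray(monthly_visits, dtype=float)
    total = counts.sum()
    if counts.size < 2 or total <= 0:
        return 0.0                              # no usable signal
    shares = counts / total
    nonzero = shares[shares > 0]                # 0 * log(0) is treated as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(entropy / np.log(counts.size)) # divide by the maximum possible entropy

# A beach town with a strong summer peak scores lower than a steady city museum
print(seasonality_balance([20, 20, 30, 60, 120, 300, 500, 520, 250, 80, 30, 20]))
print(seasonality_balance([90, 95, 100, 105, 110, 100, 95, 100, 105, 100, 95, 90]))
```

The result already lies in [0, 1] with "higher = steadier flow", so it can be written straight into the `z3` column.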

In [14]:
from tqdm import tqdm
import pandas as pd, json, os, time, random
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI()

PROMPT_TEMPLATE = """
You are a sustainability analyst.  For the Point-of-Interest below,
return ONLY valid JSON with keys z1–z6 (floats 0–1).

Definitions (copy exactly):
• z1 = estimated CO2-kg per individual visit (lower = greener).
• z2 = current_visitors / carrying_capacity (lower = less crowded).
• z3 = entropy-based seasonality balance (higher = steadier flow).
• z4 = proportion of revenue retained locally (higher = better).
• z5 = crowd-adjusted heritage fragility (lower = safer for culture).
• z6 = overall physical & sensory accessibility (higher = inclusive).

POI:
Name: {name}
Category: {category}
Lat,Lon: {lat}, {lon}
Known sustainability proxy (0-1): {sustainability}
Popularity score (0-1): {popularity}
"""

SYSTEM_MSG = {
    "role": "system",
    "content": "You are a strict JSON generator. Output only valid JSON."
}

def gpt_enrich_row(row):
    prompt = PROMPT_TEMPLATE.format(**row)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[SYSTEM_MSG, {"role": "user", "content": prompt}],
        temperature=0.1,
        response_format={"type": "json_object"}
    )
    return json.loads(resp.choices[0].message.content)

def enrich_dataframe(df):
    records = []
    for _, r in tqdm(df.iterrows(), total=len(df)):
        try:
            records.append(gpt_enrich_row(r))
        except Exception as e:
            print(f"⚠️  Skipped {r['name']}: {e}")
            records.append({f"z{i}": None for i in range(1, 7)})
    return pd.concat([df.reset_index(drop=True),
                      pd.DataFrame(records)], axis=1)

# usage
df = readTourismData("../data")
df = enrich_dataframe(df)


100%|██████████| 300/300 [08:40<00:00,  1.74s/it]
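The prompt asks for keys z1–z6 as floats in 0–1, but nothing in the cell above checks that the model complied. A small validator along these lines could be applied to the result of `gpt_enrich_row` before it is appended. This is a sketch: `validate_indicators` is a hypothetical helper, and clamping out-of-range values (rather than rejecting them) is an assumption you may want to change.

```python
import math

INDICATOR_KEYS = [f"z{i}" for i in range(1, 7)]

def validate_indicators(payload):
    """Return a dict with exactly z1..z6; each value is a float in [0, 1] or None."""
    if not isinstance(payload, dict):
        payload = {}                               # e.g. the model returned a JSON list
    clean = {}
    for key in INDICATOR_KEYS:
        try:
            value = float(payload.get(key))
        except (TypeError, ValueError):
            clean[key] = None                      # missing or non-numeric
            continue
        if math.isnan(value):
            clean[key] = None
        else:
            clean[key] = min(1.0, max(0.0, value)) # clamp into [0, 1]
    return clean

print(validate_indicators({"z1": 0.2, "z2": 1.7, "z3": "0.5", "z4": None}))
```

Extra keys are dropped and missing ones become `None`, so every record keeps the same six columns when the list is turned into a DataFrame.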


In [15]:
# ── CELL ▒▒ Save enriched data ▒▒─────────────────────────────────────────────
# 1) master file with every POI + z-indicators
master_path = "../data/poi_all_enriched.csv"
df.to_csv(master_path, index=False)
print(f"✅  Master CSV written → {master_path}  ({len(df):,} rows)")

# 2) one CSV per municipality (keeps the original naming convention)
for muni, sub in df.groupby("municipality"):
    fname = f"../data/poi_{muni}_30_enriched.csv"
    sub.to_csv(fname, index=False)
    print(" • Saved", fname)


✅  Master CSV written → ../data/poi_all_enriched.csv  (300 rows)
 • Saved ../data/poi_barcelona_30_enriched.csv
 • Saved ../data/poi_besalu_30_enriched.csv
 • Saved ../data/poi_cadaques_30_enriched.csv
 • Saved ../data/poi_figueres_30_enriched.csv
 • Saved ../data/poi_girona_30_enriched.csv
 • Saved ../data/poi_lleida_30_enriched.csv
 • Saved ../data/poi_montblanc_30_enriched.csv
 • Saved ../data/poi_sitges_30_enriched.csv
 • Saved ../data/poi_tarragona_30_enriched.csv
 • Saved ../data/poi_vic_30_enriched.csv
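Since the enrichment run took several minutes, it may be worth reloading the saved master CSV on later runs instead of calling the API again. A minimal sketch, assuming the file written above; `load_enriched` is a hypothetical helper, not something that exists in the repo.

```python
from pathlib import Path
import pandas as pd

REQUIRED = tuple(f"z{i}" for i in range(1, 7))

def load_enriched(path, required=REQUIRED):
    """Return the cached enriched DataFrame, or None if it is missing or incomplete."""
    path = Path(path)
    if not path.exists():
        return None
    cached = pd.read_csv(path)
    if any(col not in cached.columns for col in required):
        return None                      # written by an older run without all indicators
    return cached

# Typical use in the notebook (commented out because it needs the repo's data and API key):
# df = load_enriched("../data/poi_all_enriched.csv")
# if df is None:
#     df = enrich_dataframe(readTourismData("../data"))
```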


### 3️⃣ Build the ELECTRE‑III decision matrix
We use **pyDecision** because it already implements the full ELECTRE family (I–TRI).
Weights, thresholds (q/p/v) and the λ‑cut come straight out of the research report.
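To get a feel for what the q/p/v thresholds do, here is a small sketch of the textbook ELECTRE‑III partial concordance and discordance indices for a single benefit criterion. It illustrates the standard formulas only and is not pyDecision's internal code; the function names are made up for this example.

```python
import numpy as np

def partial_concordance(g_a, g_b, q, p):
    """How strongly one criterion supports 'a is at least as good as b' (1 = fully, 0 = not at all)."""
    d = g_b - g_a                                   # how far b is ahead of a
    return float(np.clip((p - d) / (p - q), 0.0, 1.0))

def partial_discordance(g_a, g_b, p, v):
    """How strongly one criterion vetoes 'a is at least as good as b' (1 = full veto)."""
    d = g_b - g_a
    return float(np.clip((d - p) / (v - p), 0.0, 1.0))

q, p, v = 0.05, 0.20, 0.50                          # same thresholds as the cell below
for g_a, g_b in [(0.50, 0.53), (0.50, 0.60), (0.20, 0.90)]:
    c = partial_concordance(g_a, g_b, q, p)
    d = partial_discordance(g_a, g_b, p, v)
    print(f"a={g_a:.2f} b={g_b:.2f}  concordance={c:.2f}  discordance={d:.2f}")
```

A gap below q is ignored, a gap between q and p reduces support linearly, and a gap beyond v lets that single criterion block the outranking regardless of the weights.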

In [17]:
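import re

# 1. Decision matrix: rows = POIs, columns = criteria
criteria = [f'z{i}' for i in range(1,8)]  # z7 will be preference‑fit; stub as popularity for now
df['z7'] = df['popularity']               # placeholder until you plug real user/group utility

# POIs skipped during enrichment carry empty indicators and cannot be ranked
df = df.dropna(subset=criteria).reset_index(drop=True)

weights = np.array([0.08, 0.12, 0.05, 0.10, 0.10, 0.05, 0.50])
types   = np.array([ -1,  -1,   1,   1,  -1,   1,   1])      # ‑1 = cost, +1 = benefit

# electre_iii() is called without criterion types, so it is assumed here to read every
# column as "higher is better"; cost criteria are flipped to 1 - z (indicators lie in [0, 1])
matrix = df[criteria].to_numpy(dtype=float)
matrix = np.where(types == 1, matrix, 1.0 - matrix)

P = np.full(7, 0.20)          # preference
Q = np.full(7, 0.05)          # indifference
V = np.full(7, 0.50)          # veto
W = weights                   # same w as before

# 2. Run ELECTRE-III
g_conc, credibility, rank_D, rank_A, rank_N, rank_P = electre_iii(
        matrix, P=P, Q=Q, V=V, W=W, graph=False)

# 3. Attach descending-rank (rank_D) to the DataFrame.
# rank_D has one entry per rank class, not per POI: the first run produced 144 classes for
# 300 POIs, so assigning it directly raised
# "ValueError: Length of values (144) does not match length of index (300)".
# Tied POIs share an entry and are assumed to be labelled 'a1', 'a2', ... (1-based row order).
electre_rank = np.full(len(df), np.nan)
for position, group in enumerate(rank_D, start=1):
    for label in re.findall(r'a(\d+)', str(group)):
        electre_rank[int(label) - 1] = position
df["electre_rank"] = electre_rank
df.sort_values("electre_rank").head(10)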

### 4️⃣ Soft‑AND LSP utility inside the non‑dominated kernel

In [None]:
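rho = 0.5   # rho < 1 makes the power mean conjunctive (soft‑AND)

def lsp(row, rho=rho):
    # Orient every criterion as "higher is better": cost criteria (types == -1) become 1 - z
    z = row[criteria].to_numpy(dtype=float)
    z = np.clip(np.where(types == 1, z, 1.0 - z), 0.0, 1.0)
    # soft‑AND weighted power mean (weights sum to 1)
    return np.sum(weights * z ** rho) ** (1 / rho)

kernel = df[df['electre_rank'] <= 25].copy()
kernel['U_LSP'] = kernel.apply(lsp, axis=1)
kernel.sort_values('U_LSP', ascending=False).head(10)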
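To see why a power mean with ρ < 1 behaves like a soft AND, compare a few exponents on a POI that is excellent on one criterion and poor on another. This is a standalone sketch; `weighted_power_mean` is an illustrative name and the two scores are made up.

```python
import numpy as np

def weighted_power_mean(values, weights, rho):
    """Weighted power mean; assumes strictly positive values when rho <= 0."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()               # normalise so the weights sum to 1
    if rho == 0:
        return float(np.exp(np.sum(weights * np.log(values))))   # geometric-mean limit
    return float(np.sum(weights * values ** rho) ** (1.0 / rho))

scores = [0.9, 0.1]          # great on one criterion, poor on the other
equal = [0.5, 0.5]
for rho in (1.0, 0.5, 0.0, -1.0):
    print(f"rho={rho:+.1f}  U={weighted_power_mean(scores, equal, rho):.2f}")
```

With ρ = 1 the utility is the plain average (0.50); at ρ = 0.5 it drops to about 0.40 and at ρ = −1 to about 0.18, so lowering ρ penalises a POI more heavily for any single weak criterion.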

### 5️⃣ Visualise the top‑k POIs on a map
Streamlit already does this in your current `main.py`, but here’s a short **folium** snippet if you want to stay in‑notebook.

In [None]:
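# !pip install folium --quiet
import folium, webbrowser, tempfile, uuid
from pathlib import Path   # needed for temp_path below

m = folium.Map(location=[41.3851, 2.1734], zoom_start=6)
# kernel is still in its original order (sort_values returns a copy), so sort before slicing
top = kernel.sort_values('U_LSP', ascending=False).head(15)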
for _, r in top.iterrows():
    folium.CircleMarker(
        [r['lat'], r['lon']],
        radius=6,
        popup=f"{r['name']} — U={r['U_LSP']:.2f}",
        fill=True
    ).add_to(m)

temp_path = Path(tempfile.gettempdir()) / f"greenexplorer_{uuid.uuid4().hex}.html"
m.save(str(temp_path))   # str() keeps this working on folium versions that only accept string paths
webbrowser.open(temp_path.as_uri())
print(f"Map saved to {temp_path}")

---
**You’re all set!**

*Next steps*
1. Wire this notebook into your Streamlit UI (call the functions in `src/`).
2. Replace the random indicator generator with real data pipelines or the ChatGPT enrichment.
3. Plug the user/group preference vector into `z7`.
4. Tune the ELECTRE thresholds & weights, then run the user study. :rocket:
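
For step 3, one very simple way to turn a group's tastes into a `z7` column is to average each member's per-category score and map the result onto the POI categories. Everything in this sketch is an assumption made for illustration: the `preference_fit` helper, the shape of the preference dicts, and the neutral default of 0.5 for categories nobody rated.

```python
import pandas as pd

def preference_fit(categories, member_prefs, default=0.5):
    """Map each POI category to the group's mean preference score in [0, 1].

    categories   : Series of category labels (e.g. df['category'])
    member_prefs : list of {category: score} dicts, one per group member (hypothetical format)
    """
    group_mean = pd.DataFrame(member_prefs).mean(axis=0)   # members who skipped a category are ignored
    return categories.map(group_mean).fillna(default).clip(0.0, 1.0)

demo = pd.Series(["Park", "Monument", "Historic"])
members = [{"Park": 1.0, "Monument": 0.2}, {"Park": 0.6}]
print(preference_fit(demo, members))

# In the notebook this would replace the popularity stub:
# df['z7'] = preference_fit(df['category'], members)
```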