# Real-Time ESG Scoring Using Satellite Imagery & Alternative Data

End-to-end prototype of a real-time ESG scoring pipeline using:
- Synthetic satellite imagery + alternative data
- Vision Transformer (ViT) feature extractor
- Graph Neural Network (GraphSAGE / GAT) over a company graph
- Streaming simulation with stateful ESG score updates


## 1. Environment & Imports
Requires Python 3.10+ and packages from `requirements.txt`.

In [9]:
import sys
print(sys.version)

import numpy as np
import pandas as pd
import torch

from esg_pipeline.data_ingestion import build_merged_entity_frame
from esg_pipeline.vit_module import ViTFeatureExtractor, encode_company_images, get_dummy_attention_map
from esg_pipeline.gnn_module import build_company_graph, GraphSAGEESG, GATESG
from esg_pipeline.streaming import InMemoryTopic, ESGConsumer, start_background_producer
from esg_pipeline.scoring import compute_esg_subscores, update_scores_with_stream_event
from esg_pipeline.visualization import plot_satellite_images, plot_attention_heatmap, plot_company_graph, plot_esg_dashboard

import matplotlib.pyplot as plt
%matplotlib inline

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device


3.13.3 (main, Apr  8 2025, 13:54:08) [Clang 16.0.0 (clang-1600.0.26.6)]


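ModuleNotFoundError: No module named 'torch'

The import cell fails with this error when the dependencies are not installed. Install them, then re-run the cell:

pip install -r requirements.txt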
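A missing dependency only surfaces at the first failing import, so several may be absent at once. As a minimal sketch, a preflight check along the following lines reports every missing package in one pass. The helper name `missing_packages` and the package list are illustrative and not part of `esg_pipeline`; the authoritative list remains `requirements.txt`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Illustrative list, taken from the third-party imports in the cell above.
required = ["numpy", "pandas", "torch", "matplotlib"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

`find_spec` only locates a module without importing it, so the check is cheap and has no side effects.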

## 2. Synthetic Data Ingestion
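Create synthetic companies and join their alternative-data features with satellite image tiles. `build_merged_entity_frame` returns the company table (`companies`) and a mapping from company ID to image (`image_map`).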

In [None]:
n_companies = 6
companies, image_map = build_merged_entity_frame(n_companies=n_companies)
companies.head()
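The internals of `build_merged_entity_frame` are not shown in this notebook. As a rough sketch of what such a generator might do, the following builds one record per company and a matching random image tile, keyed by the same ID. The function name, column names, sector labels, and value ranges are all assumptions made for illustration.

```python
import numpy as np


def make_synthetic_entities(n_companies=6, tile_size=224, seed=0):
    """Build hypothetical company records plus one random RGB tile per company."""
    rng = np.random.default_rng(seed)
    sectors = ["energy", "mining", "retail"]  # hypothetical sector labels
    records, image_map = [], {}
    for i in range(n_companies):
        cid = f"C{i:03d}"
        records.append({
            "company_id": cid,
            "sector": sectors[i % len(sectors)],
            # Hypothetical alternative-data signals
            "emissions_index": float(rng.uniform(0.0, 1.0)),
            "news_sentiment": float(rng.uniform(-1.0, 1.0)),
        })
        # Random noise stands in for a satellite tile (height, width, RGB)
        image_map[cid] = rng.integers(
            0, 256, size=(tile_size, tile_size, 3), dtype=np.uint8
        )
    return records, image_map


records, image_map = make_synthetic_entities(n_companies=6)
print(records[0]["company_id"], image_map["C000"].shape)
```

The list of records can be turned into a table with `pd.DataFrame(records)`; keying both structures by `company_id` is what makes the join between tabular features and imagery straightforward.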


In [None]:
plot_satellite_images(image_map, max_images=4)


## 3. ViT Satellite Feature Extraction
Encode each company's image into an embedding vector using a pre-trained ViT (`google/vit-base-patch16-224`).

In [None]:
vit_extractor = ViTFeatureExtractor(model_name='google/vit-base-patch16-224', device=device)
company_vit_embs = encode_company_images(image_map, vit_extractor)
len(company_vit_embs), next(iter(company_vit_embs.values())).shape
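For intuition about what the ViT sees: a `patch16-224` model splits a 224 x 224 image into a 14 x 14 grid of 16 x 16 patches, giving 196 patch tokens. The sketch below reproduces only that splitting step in NumPy. It is not the code inside `ViTFeatureExtractor`, and it omits the learned projection, class token, position embeddings, and transformer layers; `patchify` is a name chosen here for illustration.

```python
import numpy as np


def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C), with patches
    ordered row by row across the image.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C)
    blocks = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(gh * gw, patch * patch * c)


image = np.arange(224 * 224 * 3).reshape(224, 224, 3)
patches = patchify(image)
print(patches.shape)
```

Each row of `patches` is one flattened patch; in a real ViT, a learned linear layer then maps every row to the model's hidden dimension.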


In [None]:
first_cid = next(iter(image_map))
attn_map = get_dummy_attention_map(image_map[first_cid])
plot_attention_heatmap(image_map[first_cid], attn_map)
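An attention heatmap is usually computed at patch resolution and then enlarged to image resolution before it is overlaid. As a sketch under that assumption (the behaviour of `get_dummy_attention_map` itself is not shown here), the following normalizes a 14 x 14 grid of scores to [0, 1] and repeats each score over its 16 x 16 patch. The function name and the random scores are illustrative.

```python
import numpy as np


def upsample_patch_scores(patch_scores, patch=16):
    """Min-max normalize a patch-level score grid and expand it to pixel size."""
    grid = np.asarray(patch_scores, dtype=float)
    lo, hi = grid.min(), grid.max()
    # Guard against a constant grid, where min-max scaling is undefined
    norm = (grid - lo) / (hi - lo) if hi > lo else np.zeros_like(grid)
    # Kronecker product with a block of ones repeats each score patch x patch times
    return np.kron(norm, np.ones((patch, patch)))


rng = np.random.default_rng(0)
heat = upsample_patch_scores(rng.random((14, 14)))
print(heat.shape)
```

The result can be drawn over the source image with Matplotlib, for example `plt.imshow(image)` followed by `plt.imshow(heat, alpha=0.5, cmap="jet")`.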


## 4. Graph Construction & GNN Embeddings
Build a company graph from the company table and the ViT embeddings, then run a GNN (GraphSAGE or GAT) over it to obtain one embedding per node.

In [None]:
graph_data, id_to_idx = build_company_graph(companies, company_vit_embs)
graph_data
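How `build_company_graph` decides which companies are connected is not shown in this notebook. One common choice, assumed here purely for illustration, is to link each company to its `k` most similar peers by cosine similarity of their feature vectors. The sketch returns edges in the 2 x E layout that PyTorch Geometric uses for `edge_index`; `knn_edge_index` is a hypothetical helper.

```python
import numpy as np


def knn_edge_index(features, k=2):
    """Connect every node to its k nearest neighbours by cosine similarity.

    Returns an integer array of shape (2, n * k): row 0 holds source nodes,
    row 1 holds the neighbours they point to. Requires k < n.
    """
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True).clip(min=1e-12)
    x = x / norms
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the ranking
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    src = np.repeat(np.arange(x.shape[0]), k)
    dst = neighbours.reshape(-1)
    return np.stack([src, dst])


rng = np.random.default_rng(0)
features = rng.normal(size=(6, 8))  # 6 companies, 8 hypothetical features
edge_index = knn_edge_index(features, k=2)
print(edge_index.shape)
```

Other edge definitions, such as shared sector or supply-chain links, fit the same `edge_index` layout and only change how `src` and `dst` are produced.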


In [None]:
in_dim = graph_data.x.shape[1]
hidden_dim = 128
out_dim = 64
use_gat = False  # set to True to use the GAT model instead of GraphSAGE

if use_gat:
    gnn_model = GATESG(in_dim, hidden_dim, out_dim).to(device)
else:
    gnn_model = GraphSAGEESG(in_dim, hidden_dim, out_dim).to(device)

gnn_model.eval()
with torch.inference_mode():
    graph_data = graph_data.to(device)
    node_embs = gnn_model(graph_data).cpu()
node_embs.shape
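To make the GNN step less opaque, here is a NumPy sketch of a single GraphSAGE layer with mean aggregation: each node combines its own features with the average of its incoming neighbours' features, then applies a ReLU. This shows the general technique only; the actual layers inside `GraphSAGEESG` are not shown in this notebook, and the weights below are fixed by hand rather than learned.

```python
import numpy as np


def sage_mean_layer(x, edge_index, w_self, w_neigh):
    """One GraphSAGE-style layer: ReLU(x @ w_self + mean_neighbours(x) @ w_neigh).

    edge_index has shape (2, E); an edge (s, t) sends node s's features to node t.
    """
    n = x.shape[0]
    src, dst = edge_index
    agg = np.zeros_like(x, dtype=float)
    np.add.at(agg, dst, x[src])  # sum incoming neighbour features per target
    deg = np.bincount(dst, minlength=n).reshape(-1, 1)
    agg = agg / np.maximum(deg, 1)  # mean; nodes without neighbours keep zeros
    return np.maximum(x @ w_self + agg @ w_neigh, 0.0)


x = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
edge_index = np.array([[0, 1, 2],   # sources
                       [2, 2, 0]])  # targets
identity = np.eye(2)
out = sage_mean_layer(x, edge_index, identity, identity)
print(out)
```

A GAT layer follows the same pattern but replaces the plain mean with an attention-weighted sum over neighbours.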


In [None]:
idx_to_id = {idx: cid for cid, idx in id_to_idx.items()}
gnn_embeddings = {idx_to_id[i]: node_embs[i] for i in range(node_embs.shape[0])}
list(gnn_embeddings.items())[:2]


In [None]:
plot_company_graph(companies, id_to_idx)


## 5. ESG Scoring (Static)
Combine the GNN embeddings with company features into E, S, and G subscores and an overall ESG score.

In [None]:
scores = compute_esg_subscores(companies, gnn_embeddings)
scores
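The formula inside `compute_esg_subscores` is not shown in this notebook. A common pattern, sketched here as an assumption, is to rescale raw signals onto a 0 to 100 range and take a weighted average of the three pillars. The function names and the 0.4 / 0.3 / 0.3 weights are illustrative, not the package's values.

```python
def min_max_scale(values, lo=0.0, hi=100.0):
    """Linearly rescale a list of numbers onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values equal: place them at the midpoint of the range
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def overall_esg(e, s, g, weights=(0.4, 0.3, 0.3)):
    """Weighted average of E, S, and G subscores (hypothetical weights)."""
    w_e, w_s, w_g = weights
    return (w_e * e + w_s * s + w_g * g) / (w_e + w_s + w_g)


raw_environment = [1.0, 2.0, 3.0]  # hypothetical raw signal for three companies
print(min_max_scale(raw_environment))
print(overall_esg(80.0, 60.0, 70.0))
```

Dividing by the sum of the weights keeps the overall score on the same scale as the subscores even when the weights do not add up to one.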


In [None]:
plot_esg_dashboard(scores)


## 6. Streaming Simulation & Real-Time Updates
Simulate ESG events and update the score table incrementally.

In [None]:
topic = InMemoryTopic()
consumer = ESGConsumer(topic)
company_ids = companies['company_id'].tolist()

# Emit events at event_rate_hz for stop_after (assumed to be seconds), then wait for the producer to finish
producer_thread = start_background_producer(topic, company_ids, event_rate_hz=3.0, stop_after=5.0)
producer_thread.join()

# Drain the queued events and apply each one to a copy of the static scores
updated_scores = scores.copy()
event_log = []
for ev in consumer.iterate(max_events=50, timeout=0.5):
    event = {'company_id': ev.company_id, 'event_type': ev.event_type, 'payload': ev.payload}
    event_log.append(event)
    updated_scores = update_scores_with_stream_event(updated_scores, event)
len(event_log), updated_scores.head()
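The producer/consumer pattern used above can be approximated with the standard library alone. The sketch below is a stand-in built on `queue.Queue` and `threading`, not the implementation of `InMemoryTopic` or `ESGConsumer`; the class and function names, the event type, and the payload are all hypothetical. Unlike the cell above, it consumes while the producer is still running.

```python
import queue
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    company_id: str
    event_type: str
    payload: dict = field(default_factory=dict)


class SimpleTopic:
    """Thread-safe in-memory topic backed by a FIFO queue."""

    def __init__(self):
        self._q = queue.Queue()

    def publish(self, event):
        self._q.put(event)

    def poll(self, timeout):
        """Return the next event, or None if nothing arrives within timeout seconds."""
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            return None


def start_producer(topic, company_ids, n_events, interval_s=0.01):
    """Publish n_events from a background thread, cycling through company_ids."""
    def run():
        for i in range(n_events):
            cid = company_ids[i % len(company_ids)]
            topic.publish(Event(cid, "emissions_alert", {"delta": -1.0}))
            time.sleep(interval_s)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def consume(topic, max_events, timeout):
    """Yield events until max_events is reached or the topic stays quiet."""
    for _ in range(max_events):
        event = topic.poll(timeout)
        if event is None:
            return
        yield event


topic = SimpleTopic()
producer = start_producer(topic, ["C000", "C001"], n_events=5)
events = list(consume(topic, max_events=50, timeout=0.5))
producer.join()
print(len(events))
```

The timeout is what ends the loop: once the producer stops, `poll` returns `None` after 0.5 s and the generator exits, mirroring the `max_events` and `timeout` arguments passed to `consumer.iterate`.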


In [None]:
comparison = scores[['company_id', 'ESG_score']].merge(
    updated_scores[['company_id', 'ESG_score']], on='company_id', suffixes=('_orig', '_updated')
)
comparison
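The size of each change in this comparison depends on the rule that `update_scores_with_stream_event` applies, which is not shown in this notebook. One plausible rule, assumed here for illustration, adds an event-supplied delta to the affected company's score and clips the result to the 0 to 100 range. The `delta` payload key, the event types, and the helper name are hypothetical.

```python
def apply_event(scores, event, min_score=0.0, max_score=100.0):
    """Return a new score mapping with one event applied; the input is not mutated."""
    updated = dict(scores)
    cid = event["company_id"]
    delta = float(event["payload"].get("delta", 0.0))  # hypothetical payload key
    updated[cid] = min(max_score, max(min_score, updated[cid] + delta))
    return updated


original = {"C000": 72.0, "C001": 98.5}
events = [
    {"company_id": "C000", "event_type": "emissions_alert", "payload": {"delta": -4.0}},
    {"company_id": "C001", "event_type": "positive_audit", "payload": {"delta": 3.0}},
]

current = original
for ev in events:
    current = apply_event(current, ev)

# Per-company change between the static and the streamed scores
changes = {cid: current[cid] - original[cid] for cid in original}
print(current)
print(changes)
```

Returning a fresh mapping on every update keeps the original scores intact, which is what makes a before-and-after comparison possible.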


In [None]:
plot_esg_dashboard(updated_scores)


## 7. Final ESG Scores
ESG scores for the sample companies after the streaming updates, sorted from highest to lowest.

In [None]:
final_scores = updated_scores.sort_values('ESG_score', ascending=False).reset_index(drop=True)
final_scores
