# Task 6: FinTech Vendor Scorecard for Micro-Lending

In this task, we will analyze vendor activity and engagement on Telegram e-commerce channels using the entities extracted by our NER model and available metadata (e.g., views, timestamps). We will compute key business metrics for each vendor and design a simple "Lending Score" to help EthioMart identify promising vendors for micro-lending.

## Data Loading

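We will load the processed message data for every channel. Each JSON file holds one message with its text and metadata (such as `views` and `timestamp`), and we tag each message with its channel name so statistics can be aggregated per vendor. NER predictions are added in the next step.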
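
To make the assumed `data/processed/text/<channel>/*.json` layout concrete, here is a self-contained sketch that writes two hypothetical channels to a temporary directory (one JSON message per file, as the loading code assumes) and reads them back with a single `*/*.json` glob pattern. For ordinary, non-hidden channel directories this matches the same files as the `os.listdir` loop. The channel names, message fields, and the `load_messages` helper are illustrative only.

```python
import json
import os
import tempfile
from glob import glob

import pandas as pd


def load_messages(data_dir):
    """Load one-message-per-file JSON from data_dir/<channel>/*.json (hypothetical helper)."""
    messages = []
    for file_path in sorted(glob(os.path.join(data_dir, "*", "*.json"))):
        with open(file_path, encoding="utf-8") as f:
            msg = json.load(f)
        # The parent directory name serves as the channel (vendor) identifier
        msg["channel"] = os.path.basename(os.path.dirname(file_path))
        messages.append(msg)
    return pd.DataFrame(messages)


# Hypothetical sample data: two channels, three messages
sample = {
    "shop_a": [
        {"text": "Shoes 1500 ETB", "views": 120, "timestamp": "2024-01-01T10:00:00"},
        {"text": "Bag 800 ETB", "views": 300, "timestamp": "2024-01-08T10:00:00"},
    ],
    "shop_b": [
        {"text": "Phone case 250 ETB", "views": 90, "timestamp": "2024-01-03T09:00:00"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    # Write each message to its own file under a per-channel directory
    for channel, msgs in sample.items():
        os.makedirs(os.path.join(tmp, channel))
        for i, m in enumerate(msgs):
            with open(os.path.join(tmp, channel, f"{i}.json"), "w", encoding="utf-8") as f:
                json.dump(m, f, ensure_ascii=False)
    demo_df = load_messages(tmp)

print(demo_df[["channel", "views"]])
```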

In [None]:
import os
import json
import pandas as pd
from glob import glob

# Load all messages from all channels
data_dir = "data/processed/text"
all_files = []
for channel in os.listdir(data_dir):
    channel_dir = os.path.join(data_dir, channel)
    if os.path.isdir(channel_dir):
        all_files.extend(glob(os.path.join(channel_dir, "*.json")))

messages = []
for file_path in all_files:
    with open(file_path, encoding="utf-8") as f:
        msg = json.load(f)
        msg['channel'] = os.path.basename(os.path.dirname(file_path))
        messages.append(msg)

df = pd.DataFrame(messages)
df.head()

## Run NER Model on Messages

We will use our fine-tuned NER model to extract Product, Price, and Location entities from each message. The extracted prices feed the average-price metric, and the products and prices are reported for each vendor's top-performing post.
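
With `aggregation_strategy="simple"`, the pipeline merges consecutive token predictions of the same entity type into one span and reports that type, without its `B-`/`I-` prefix, as `entity_group`. This is why the code filters on `'Product'`, `'PRICE'`, and `'LOC'`. The sketch below is a simplified, word-level illustration of that grouping idea, not the library's actual implementation (which also handles sub-word tokens and scores); the tokens, tags, and the `group_bio` helper are hypothetical.

```python
def group_bio(tokens, tags):
    """Merge word-level BIO tags into {'entity_group', 'word'} spans; 'O' tokens are dropped."""
    groups = []
    current_type, current_words = None, []

    def flush():
        # Emit the currently open entity, if any
        if current_type is not None:
            groups.append({"entity_group": current_type, "word": " ".join(current_words)})

    for word, tag in zip(tokens, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == current_type:
            current_words.append(word)  # continue the open entity
            continue
        flush()  # any other tag closes the open entity
        if prefix in ("B", "I"):
            current_type, current_words = ent_type, [word]  # start a new entity
        else:
            current_type, current_words = None, []  # 'O' token: nothing open
    flush()
    return groups


# Hypothetical tagged message
tokens = ["Nike", "shoes", "price", "1500", "birr", "Bole", "area"]
tags = ["B-Product", "I-Product", "O", "B-PRICE", "I-PRICE", "B-LOC", "O"]
ents = group_bio(tokens, tags)

# Same filtering style as extract_entities in the notebook
products = [e["word"] for e in ents if e["entity_group"] == "Product"]
print(ents)
print(products)
```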

In [None]:
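from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

model_path = "./finetuned-ner-model"  # Update if needed
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)
# aggregation_strategy="simple" merges B-/I- token predictions into whole entities (entity_group)
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

def extract_entities(text):
    """Return (products, prices, locations) extracted from one message."""
    # Media-only posts may have missing or empty text
    if not isinstance(text, str) or not text.strip():
        return [], [], []
    try:
        ents = ner_pipeline(text)
    except Exception:
        # Inputs the model cannot process are treated as having no entities
        return [], [], []
    # Label names must match the model's labels without their B-/I- prefixes
    products = [ent['word'] for ent in ents if ent['entity_group'] == 'Product']
    prices = [ent['word'] for ent in ents if ent['entity_group'] == 'PRICE']
    locations = [ent['word'] for ent in ents if ent['entity_group'] == 'LOC']
    return products, prices, locations

df['products'], df['prices'], df['locations'] = zip(*df['text'].map(extract_entities))
df.head()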

## Calculate Vendor Metrics

We will calculate the following metrics for each vendor/channel:
- Posting Frequency (posts per week)
- Average Views per Post
- Top Performing Post (highest views, product, price)
- Average Price Point
- Lending Score (weighted combination of average views and posting frequency)
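
The posting-frequency formula in the metrics cell divides the number of posts by the channel's active span in weeks plus one week, so a channel whose posts all fall on a single day is measured over a one-week window instead of dividing by zero. Because it uses `.days`, partial days are truncated. A small worked example on hypothetical timestamps, with `posts_per_week` as an illustrative standalone helper:

```python
import pandas as pd


def posts_per_week(timestamps):
    """Posts divided by (active span in weeks + 1), the same formula as the metrics cell."""
    ts = pd.to_datetime(pd.Series(timestamps))
    span_weeks = (ts.max() - ts.min()).days / 7  # whole days only
    return ts.count() / (span_weeks + 1)


# Hypothetical channel: 4 posts spanning exactly 14 days -> 4 / (2 + 1)
print(posts_per_week(["2024-01-01", "2024-01-05", "2024-01-10", "2024-01-15"]))

# Hypothetical single-day channel: 3 posts -> 3 / (0 + 1)
print(posts_per_week(["2024-02-01 09:00", "2024-02-01 12:00", "2024-02-01 18:00"]))
```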

In [None]:
import re
import numpy as np

# Convert timestamp to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

def parse_price(raw):
    """Extract the first number from a price string such as '1,500 ብር'; NaN if none."""
    match = re.search(r'\d+(?:\.\d+)?', str(raw).replace(',', ''))
    return float(match.group()) if match else np.nan

def mean_price(price_lists):
    """Average of all parseable prices across a channel's messages (NaN if none)."""
    values = [parse_price(p) for prices in price_lists for p in prices]
    values = [v for v in values if not np.isnan(v)]
    return np.mean(values) if values else np.nan

vendor_stats = df.groupby('channel').agg(
    # Posts / (active span in weeks + 1): single-day channels avoid division by zero
    posts_per_week=('timestamp', lambda x: x.count() / ((x.max() - x.min()).days / 7 + 1)),
    avg_views=('views', 'mean'),
    avg_price=('prices', mean_price),
).reset_index()

# Top performing post: the message with the most views in each channel
top_posts = df.loc[df.groupby('channel')['views'].idxmax(), ['channel', 'text', 'products', 'prices', 'views']].rename(
    columns={'text': 'top_post_text', 'products': 'top_post_products', 'prices': 'top_post_prices', 'views': 'top_post_views'}
)
vendor_stats = vendor_stats.merge(top_posts, on='channel', how='left')

# Lending Score (example: 0.5*avg_views + 0.5*posts_per_week)
vendor_stats['lending_score'] = 0.5 * vendor_stats['avg_views'] + 0.5 * vendor_stats['posts_per_week']

vendor_stats = vendor_stats.sort_values('lending_score', ascending=False)
vendor_stats.head()
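
The example weighting adds raw average views to raw posts per week. Average views are typically much larger than weekly post counts, so views effectively decide the ranking. One common alternative, shown here as an assumption rather than this notebook's method, is to min-max scale each metric to [0, 1] before weighting. The helper names, equal weights, and sample numbers below are hypothetical.

```python
import pandas as pd


def min_max(s):
    """Scale a Series to [0, 1]; a constant Series maps to 0."""
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng else s * 0.0


def normalized_lending_score(stats, w_views=0.5, w_freq=0.5):
    """Weighted sum of min-max scaled avg_views and posts_per_week (hypothetical variant)."""
    return w_views * min_max(stats["avg_views"]) + w_freq * min_max(stats["posts_per_week"])


# Hypothetical vendors: A has high views but posts rarely; B posts often with moderate views
stats = pd.DataFrame({
    "channel": ["A", "B", "C"],
    "avg_views": [1000.0, 600.0, 200.0],
    "posts_per_week": [1.0, 10.0, 2.0],
})
stats["raw_score"] = 0.5 * stats["avg_views"] + 0.5 * stats["posts_per_week"]
stats["normalized_score"] = normalized_lending_score(stats)
print(stats.sort_values("normalized_score", ascending=False))
```

With the raw formula vendor A ranks first purely on views; after scaling, vendor B's much higher posting frequency moves it to the top. Whether that is the desired behavior depends on how EthioMart wants to weigh activity against reach.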

## Vendor Scorecard

Below is the summary table comparing vendors on key metrics and the final Lending Score.

In [None]:
display_cols = [
    'channel', 'avg_views', 'posts_per_week', 'avg_price', 'lending_score',
    'top_post_text', 'top_post_products', 'top_post_prices', 'top_post_views'
]
vendor_stats[display_cols]
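
For a more readable scorecard, one option is to add an explicit rank and round the numeric columns. A minimal sketch on a hypothetical two-vendor table; the `format_scorecard` helper, the rounded column subset, and the precision are illustrative choices, not requirements.

```python
import pandas as pd


def format_scorecard(stats, decimals=2):
    """Return a copy sorted by lending_score with a 1-based rank and rounded numeric columns."""
    out = stats.sort_values("lending_score", ascending=False).reset_index(drop=True)
    out.insert(0, "rank", list(range(1, len(out) + 1)))
    num_cols = ["avg_views", "posts_per_week", "avg_price", "lending_score"]
    out[num_cols] = out[num_cols].round(decimals)
    return out


# Hypothetical metrics for two vendors
stats = pd.DataFrame({
    "channel": ["shop_b", "shop_a"],
    "avg_views": [150.456, 410.0],
    "posts_per_week": [2.3333, 1.0],
    "avg_price": [250.0, 1150.5],
    "lending_score": [76.39, 205.5],
})
scorecard = format_scorecard(stats)
print(scorecard)
```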

## Discussion

The Vendor Scorecard ranks channels by Lending Score, which combines engagement (average views per post) and activity (posts per week), giving EthioMart a first shortlist of active, widely viewed vendors to consider for micro-lending. The average price point and top-performing post add context on what each vendor sells and at what price level.