# ESG Sentiment Analysis & Greenwashing Detection

This notebook demonstrates how to use simple rule-based Natural Language Processing (NLP), namely keyword lexicons and a specificity score, to analyze mock earnings call transcripts for ESG content and potential greenwashing.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import re

# Mock Data: Simulated Earnings Call Segments
transcripts = [
    {
        "company": "OilCo",
        "text": "We are committed to Net Zero by 2050. However, current market dynamics require us to expand production in the Permian basin to ensure energy security. We are exploring carbon capture technologies, but costs remain high. Our safety record is strong."
    },
    {
        "company": "TechGreen",
        "text": "We have achieved 100% renewable energy for our data centers. Our diversity initiatives have increased female leadership by 15% YoY. We are auditing our supply chain for human rights compliance and have severed ties with non-compliant vendors."
    },
    {
        "company": "FastFashionInc",
        "text": "We love the planet. Sustainability is in our DNA. We are launching a green collection. We care about people. Our board is thinking about climate change constantly. It is great."
    }
]

df = pd.DataFrame(transcripts)

## Define Lexicons

In [None]:
esg_keywords = {
    'E': ['carbon', 'net zero', 'renewable', 'energy', 'climate', 'planet', 'green'],
    'S': ['diversity', 'safety', 'human rights', 'labor', 'community', 'people'],
    'G': ['board', 'audit', 'compliance', 'shareholder', 'governance']
}

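def count_keywords(text, category):
    """Count whole-word, case-insensitive matches of a category's keywords."""
    lowered = text.lower()  # lowercase once instead of once per keyword
    count = 0
    for word in esg_keywords[category]:
        # re.escape keeps keywords literal; \b means 'audit' does not match 'auditing'
        count += len(re.findall(r'\b' + re.escape(word) + r'\b', lowered))
    return count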

# Apply counting
for cat in ['E', 'S', 'G']:
    df[f'{cat}_Score'] = df['text'].apply(lambda x: count_keywords(x, cat))

print(df[['company', 'E_Score', 'S_Score', 'G_Score']])
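
Because the counter above matches whole words only, inflected forms are missed: "auditing" is not counted under the keyword `audit`. The sketch below shows one possible alternative, a prefix match that also accepts trailing word characters. The helper name `count_keywords_prefix` and the shortened sample sentence are assumptions for illustration, not part of the original notebook. Prefix matching can over-count (for example, `green` would also match "greenwashing"), so treat it as a trade-off rather than a fix.

```python
import re

# Hypothetical helper (not in the original notebook): matches each keyword
# plus any trailing word characters, so "audit" also counts "auditing".
def count_keywords_prefix(text, keywords):
    lowered = text.lower()
    total = 0
    for word in keywords:
        pattern = r'\b' + re.escape(word) + r'\w*'
        total += len(re.findall(pattern, lowered))
    return total

# Governance lexicon copied from the notebook; sample is an excerpt of the TechGreen text.
g_keywords = ['board', 'audit', 'compliance', 'shareholder', 'governance']
sample = ("We are auditing our supply chain for human rights compliance "
          "and have severed ties with non-compliant vendors.")

# Whole-word count, as in the notebook's approach, for comparison.
exact = sum(
    len(re.findall(r'\b' + re.escape(w) + r'\b', sample.lower()))
    for w in g_keywords
)
prefix = count_keywords_prefix(sample, g_keywords)

print(f"whole-word: {exact}, prefix: {prefix}")
```

On this excerpt the whole-word count finds only "compliance", while the prefix variant also picks up "auditing".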

## Basic Sentiment Analysis (Rule-Based)
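We check for 'specific' vs 'vague' language as a rough proxy for commitment. Each term counts once if it appears anywhere in the text, and the score is the number of specific terms present minus the number of vague terms present.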

In [None]:
specific_terms = ['achieved', 'increased', 'auditing', 'severed', 'YoY', '%']
vague_terms = ['committed to', 'exploring', 'thinking about', 'DNA', 'love', 'care']

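def specificity_score(text):
    """Return (# specific terms present) - (# vague terms present)."""
    # Case-sensitive substring test: each term counts at most once per text,
    # and a short term like 'care' would also match inside 'healthcare'.
    specific = sum(1 for word in specific_terms if word in text)
    vague = sum(1 for word in vague_terms if word in text)
    return specific - vague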

df['Specificity'] = df['text'].apply(specificity_score)

# Visualize
plt.figure(figsize=(10, 6))
plt.bar(df['company'], df['Specificity'], color=['red', 'green', 'orange'])
plt.title('Greenwashing Detection: Specificity Score')
plt.ylabel('Score (Positive = Specific, Negative = Vague)')
plt.axhline(0, color='black', linewidth=0.8)
plt.show()

## Interpretation
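*   **TechGreen** scores highest (+6): all six specific terms appear ("achieved", "increased", "auditing", "severed", "YoY", "%") and no vague terms.
*   **OilCo** is negative (-2): it uses "committed to" and "exploring", which are future promises rather than current actions, and no specific terms.
*   **FastFashionInc** scores lowest (-4): it uses the vague terms "love", "DNA", "care", and "thinking about" without any data. Filler such as "great" is not in the lexicon and does not affect the score.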
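
To check an interpretation like the one above, it helps to see which terms actually matched for each company. The following is a minimal sketch that reuses the notebook's term lists and mock transcripts. The helper `explain_specificity` and the `samples` and `scores` dictionaries are hypothetical names introduced here for illustration.

```python
# Term lists copied from the notebook.
specific_terms = ['achieved', 'increased', 'auditing', 'severed', 'YoY', '%']
vague_terms = ['committed to', 'exploring', 'thinking about', 'DNA', 'love', 'care']

# Hypothetical helper: returns the matched terms alongside the score,
# using the same case-sensitive substring test as specificity_score.
def explain_specificity(text):
    specific_hits = [t for t in specific_terms if t in text]
    vague_hits = [t for t in vague_terms if t in text]
    return specific_hits, vague_hits, len(specific_hits) - len(vague_hits)

# Mock transcripts copied from the notebook.
samples = {
    "OilCo": "We are committed to Net Zero by 2050. However, current market dynamics require us to expand production in the Permian basin to ensure energy security. We are exploring carbon capture technologies, but costs remain high. Our safety record is strong.",
    "TechGreen": "We have achieved 100% renewable energy for our data centers. Our diversity initiatives have increased female leadership by 15% YoY. We are auditing our supply chain for human rights compliance and have severed ties with non-compliant vendors.",
    "FastFashionInc": "We love the planet. Sustainability is in our DNA. We are launching a green collection. We care about people. Our board is thinking about climate change constantly. It is great.",
}

scores = {}
for company, text in samples.items():
    specific_hits, vague_hits, score = explain_specificity(text)
    scores[company] = score
    print(f"{company}: score={score}, specific={specific_hits}, vague={vague_hits}")
```

Listing the hits makes it easy to spot when a bullet cites a word, such as "great", that the lexicon never matches.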