
# Smart Fan Share of Voice (SoV) Analyzer

This notebook builds an AI agent that:
- Searches for **“smart fan”** on Google and YouTube
- Extracts the top-N search results using an entropy-based cutoff
- Computes Share of Voice (SoV) for Atomberg vs competitors
- Uses four metrics: Mentions, Engagement, Sentiment, and a novel Semantic Dominance score
- Extends analysis to multiple related keywords


In [23]:
# Required Libraries
!pip install vaderSentiment
import requests
import math
from collections import Counter
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from googleapiclient.discovery import build



## Step 1: Search Google & YouTube for "Smart Fan"
I use SerpAPI for Google results and the YouTube Data API v3 for video titles and descriptions.


In [24]:
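import os

def google_search(query, num_results=30, api_key=None):
    # Read the key from an environment variable (SERPAPI_KEY is a name chosen
    # here) instead of hard-coding it in the notebook.
    api_key = api_key or os.environ["SERPAPI_KEY"]
    params = {
        'q': query,
        'num': num_results,
        'api_key': api_key,
        'engine': 'google',
    }
    response = requests.get('https://serpapi.com/search', params=params, timeout=30)
    response.raise_for_status()  # fail loudly on quota or auth errors instead of returning []
    data = response.json()
    # Each organic result becomes "title snippet"; snippets are sometimes missing.
    return [r.get('title', '') + " " + r.get('snippet', '') for r in data.get('organic_results', [])]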

In [25]:
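import os

def youtube_search(query, max_results=30, api_key=None):
    # Read the key from an environment variable (YOUTUBE_API_KEY is a name chosen
    # here) instead of hard-coding it in the notebook.
    api_key = api_key or os.environ["YOUTUBE_API_KEY"]
    youtube = build('youtube', 'v3', developerKey=api_key)
    # search().list accepts maxResults up to 50.
    request = youtube.search().list(
        q=query, part='snippet', type='video', maxResults=max_results
    )
    response = request.execute()
    # Each result becomes "title description".
    return [item['snippet']['title'] + " " + item['snippet']['description'] for item in response['items']]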
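`youtube_search` returns only text, so the video IDs needed by `get_engagement` are lost. A minimal sketch, assuming the standard `search().list` response shape (each item carries `id.videoId` plus `snippet.title` and `snippet.description`), of keeping each ID next to its text. `parse_search_items` is a hypothetical helper, and the sample response is hand-written, not real API output.

```python
from typing import Dict, List, Tuple

def parse_search_items(response: Dict) -> List[Tuple[str, str]]:
    """Return (video_id, "title description") pairs from a search().list response dict."""
    pairs = []
    for item in response.get('items', []):
        video_id = item.get('id', {}).get('videoId')
        if not video_id:  # channels and playlists have no videoId
            continue
        snippet = item.get('snippet', {})
        pairs.append((video_id, snippet.get('title', '') + " " + snippet.get('description', '')))
    return pairs

# Hand-written sample in the shape of a search().list response (not real data).
sample = {
    'items': [
        {'id': {'kind': 'youtube#video', 'videoId': 'abc123'},
         'snippet': {'title': 'Atomberg smart fan review', 'description': 'BLDC ceiling fan'}},
        {'id': {'kind': 'youtube#channel', 'channelId': 'UCxyz'},
         'snippet': {'title': 'Fan channel', 'description': ''}},
    ]
}
pairs = parse_search_items(sample)
video_ids = [vid for vid, _ in pairs]  # these IDs could be passed to get_engagement
print(pairs)
```

The text half of each pair can still feed `count_mentions` and the other metrics unchanged.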

## Step 2: Choose N Based on Brand Entropy

I select how many results to consider based on the diversity of brand mentions in the top 30.


In [26]:
def calculate_entropy(brands):
    # Shannon entropy (in bits) of the brand-mention distribution.
    counts = Counter(brands)
    total = sum(counts.values())
    probs = [count / total for count in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy

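def determine_n(brands):
    # Map normalized entropy (0 = one brand dominates, 1 = mentions spread evenly)
    # linearly onto a result count between min_n and max_n.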
    entropy = calculate_entropy(brands)
    max_n = 30
    min_n = 10
    scale = (entropy / math.log2(len(set(brands)))) if len(set(brands)) > 1 else 0
    return int(min_n + scale * (max_n - min_n))
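Written out, `determine_n` computes the following, where $p_i$ is the share of mentions held by the $i$-th of $k$ distinct brands, $N_{\min} = 10$, $N_{\max} = 30$, and $N = N_{\min}$ when $k \le 1$:

$$
H = -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad
N = \left\lfloor N_{\min} + \frac{H}{\log_2 k}\,\bigl(N_{\max} - N_{\min}\bigr) \right\rfloor
$$

As an illustrative example (not search data), a list with 6 Atomberg, 2 Orient, and 2 Havells mentions gives $p = (0.6, 0.2, 0.2)$, $H \approx 0.442 + 0.464 + 0.464 \approx 1.371$ bits, $H / \log_2 3 \approx 1.371 / 1.585 \approx 0.865$, and $N = \lfloor 10 + 0.865 \times 20 \rfloor = \lfloor 27.3 \rfloor = 27$.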

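## Step 3: Compute SoV Metrics
I compute four metrics per brand:
- **Mentions** (number of results that name the brand)
- **Engagement** (YouTube like counts)
- **Sentiment** (average VADER compound score)
- **Semantic Dominance** (a novel metric: leadership cue words such as "best" or "top" in results that mention the brand)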


In [27]:
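def count_mentions(texts, brands):
    # Number of texts that mention each brand (case-insensitive substring match,
    # counted at most once per text; note that 'orient' also matches 'oriented').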
    mention_counts = {brand: 0 for brand in brands}
    for text in texts:
        for brand in brands:
            if brand.lower() in text.lower():
                mention_counts[brand] += 1
    return mention_counts
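Because `count_mentions` uses plain substring matching, "orient" is also counted inside words such as "oriented" or "orientation", which can inflate Orient's mentions. A minimal sketch of a whole-word alternative using regex word boundaries; `count_mentions_word_boundary` is a hypothetical name and the sample texts are made up.

```python
import re
from typing import Dict, List

def count_mentions_word_boundary(texts: List[str], brands: List[str]) -> Dict[str, int]:
    # One case-insensitive pattern per brand; \b on both sides stops "orient"
    # from matching inside "oriented" or "orientation".
    patterns = {b: re.compile(r"\b" + re.escape(b) + r"\b", re.IGNORECASE) for b in brands}
    counts = {b: 0 for b in brands}
    for text in texts:
        for brand, pattern in patterns.items():
            if pattern.search(text):
                counts[brand] += 1  # at most one count per text, like count_mentions
    return counts

texts = [
    "Orient Electric launches a smart fan",
    "A user-oriented BLDC design",
    "Atomberg vs Orient: which is best?",
]
result = count_mentions_word_boundary(texts, ["atomberg", "orient"])
print(result)  # {'atomberg': 1, 'orient': 2}
```

The substring version would report 3 Orient mentions for the same texts.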

In [28]:
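def get_engagement(video_ids, api_key):
    # Return {video_id: like_count} for the given YouTube video IDs.
    youtube = build('youtube', 'v3', developerKey=api_key)
    stats = {}
    # videos().list accepts at most 50 IDs per request, so query in batches.
    for i in range(0, len(video_ids), 50):
        batch = video_ids[i:i+50]
        request = youtube.videos().list(part='statistics', id=','.join(batch))
        response = request.execute()
        for item in response['items']:
            # likeCount can be absent (e.g. hidden likes); treat that as 0.
            stats[item['id']] = int(item['statistics'].get('likeCount', 0))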
    return stats
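`get_engagement` returns likes per video, while `compute_sov` expects one engagement value per brand. A minimal sketch, assuming each video is credited to every brand its title or description mentions (same substring rule as `count_mentions`); `engagement_by_brand` is a hypothetical helper, and the IDs and like counts are made up.

```python
from typing import Dict, List

def engagement_by_brand(video_texts: Dict[str, str], likes: Dict[str, int],
                        brands: List[str]) -> Dict[str, int]:
    """Sum like counts of the videos whose text mentions each brand."""
    totals = {b: 0 for b in brands}
    for video_id, text in video_texts.items():
        lowered = text.lower()
        for brand in brands:
            if brand.lower() in lowered:  # same substring rule as count_mentions
                totals[brand] += likes.get(video_id, 0)
    return totals

# Made-up IDs and like counts standing in for get_engagement output.
video_texts = {'v1': 'Atomberg smart fan review',
               'v2': 'Havells vs Atomberg BLDC',
               'v3': 'Ceiling fan install guide'}
likes = {'v1': 1200, 'v2': 300, 'v3': 50}
engagement = engagement_by_brand(video_texts, likes, ['atomberg', 'havells', 'orient'])
print(engagement)  # {'atomberg': 1500, 'havells': 300, 'orient': 0}
```

A comparison video that names two brands credits its likes to both; splitting them instead would be an equally valid design choice.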

In [29]:
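def sentiment_scores(texts, brands):
    # Average VADER compound score (-1 to 1) over the texts that mention each brand;
    # brands with no mentions get 0.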
    analyzer = SentimentIntensityAnalyzer()
    brand_sentiment = {brand: [] for brand in brands}
    for text in texts:
        for brand in brands:
            if brand.lower() in text.lower():
                score = analyzer.polarity_scores(text)['compound']
                brand_sentiment[brand].append(score)
    return {brand: (sum(scores)/len(scores)) if scores else 0 for brand, scores in brand_sentiment.items()}

In [30]:
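# Cue words that signal a leadership claim. semantic_dominance adds one point per
# distinct cue word found in a text that mentions the brand (plain substring match,
# so 'top' also matches 'laptop').
dominant_words = ['best', 'top', 'leading', '#1', 'most efficient', 'flagship']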

def semantic_dominance(texts, brands):
    dominance_score = {brand: 0 for brand in brands}
    for text in texts:
        for brand in brands:
            if brand.lower() in text.lower():
                dominance_score[brand] += sum(1 for word in dominant_words if word in text.lower())
    return dominance_score
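With substring matching, "top" also fires inside "stop", "laptop", or "desktop". A minimal sketch of whole-word matching for both brands and cue words; it uses lookarounds rather than `\b` because `\b` needs a word character beside the term and so would never match the `#` in "#1" after a space. The function names are hypothetical and the texts are made up.

```python
import re
from typing import Dict, List

DOMINANT_WORDS = ['best', 'top', 'leading', '#1', 'most efficient', 'flagship']

def whole_word(term: str):
    # Match term only when not preceded or followed by a word character.
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)

CUE_PATTERNS = [whole_word(w) for w in DOMINANT_WORDS]

def dominance_cues(text: str) -> int:
    # Number of distinct cue words present, as semantic_dominance counts per text.
    return sum(1 for pattern in CUE_PATTERNS if pattern.search(text))

def semantic_dominance_whole_words(texts: List[str], brands: List[str]) -> Dict[str, int]:
    brand_patterns = {b: whole_word(b) for b in brands}
    scores = {b: 0 for b in brands}
    for text in texts:
        cues = dominance_cues(text)
        for brand, pattern in brand_patterns.items():
            if pattern.search(text):
                scores[brand] += cues
    return scores

no_cues = dominance_cues("Don't stop at a laptop fan")  # substring matching would count 'top'
scores = semantic_dominance_whole_words(["Top pick: Atomberg, the #1 BLDC fan"], ["atomberg", "orient"])
print(no_cues, scores)  # 0 {'atomberg': 2, 'orient': 0}
```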

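## Step 4: Final SoV Calculation

I normalize each metric into per-brand shares of its total and combine them with fixed weights: 30% mentions, 25% engagement, 25% sentiment, and 20% semantic dominance.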
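Written out for brand $b$, with $M$, $E$, $S$, $D$ the mention, engagement, sentiment, and dominance values and $j$ ranging over all brands (a term is 0 when its denominator is not positive, as `normalize` does):

$$
\mathrm{SoV}_b = 0.3\,\frac{M_b}{\sum_j M_j} + 0.25\,\frac{E_b}{\sum_j E_j} + 0.25\,\frac{S_b}{\sum_j S_j} + 0.2\,\frac{D_b}{\sum_j D_j}
$$

Because the weights sum to 1, the brand scores add up to 100% whenever every denominator is positive. With the Step 5 placeholder $E_b = 1$ for all five brands, the engagement term alone contributes $0.25 \times 1/5 = 5\%$ to every brand.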


In [31]:
def compute_sov(mentions, engagements, sentiments, dominance):
    brands = mentions.keys()

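    def normalize(metric):
        # Each brand's share of the metric total. If the total is not positive
        # (e.g. all zeros, or net-negative sentiment), every brand gets 0 and the
        # metric's weight goes unassigned.
        total = sum(metric.values())
        return {brand: metric[brand]/total if total > 0 else 0 for brand in brands}

    m_norm = normalize(mentions)
    e_norm = normalize(engagements)
    s_norm = normalize(sentiments)
    d_norm = normalize(dominance)

    # Weights sum to 1, so scores sum to 100% when every metric total is positive.
    sov = {}
    for brand in brands:
        sov[brand] = 0.3 * m_norm[brand] + 0.25 * e_norm[brand] + 0.25 * s_norm[brand] + 0.2 * d_norm[brand]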
    return sov
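VADER compound scores range from -1 to 1, so dividing by their plain sum can give a brand a sentiment share above 100% or below zero when another brand is discussed negatively. A minimal sketch of one possible fix, rescaling mentioned brands to [0, 1] before normalizing while keeping unmentioned brands at 0 so they earn no sentiment credit. `rescale_sentiment` and `share_of_total` are hypothetical helpers, and the scores are illustrative, not from the searches.

```python
from typing import Dict

def share_of_total(metric: Dict[str, float]) -> Dict[str, float]:
    # Same rule as normalize() inside compute_sov.
    total = sum(metric.values())
    return {b: v / total if total > 0 else 0 for b, v in metric.items()}

def rescale_sentiment(sentiments: Dict[str, float], mentions: Dict[str, int]) -> Dict[str, float]:
    # Map compound scores from [-1, 1] to [0, 1]; brands with no mentions stay at 0.
    return {b: (s + 1) / 2 if mentions.get(b, 0) > 0 else 0.0 for b, s in sentiments.items()}

sentiments = {'atomberg': 0.5, 'orient': -0.4, 'usha': 0.0}  # illustrative values
mentions = {'atomberg': 4, 'orient': 2, 'usha': 0}

raw_shares = share_of_total(sentiments)  # atomberg about 5.0, orient about -4.0
fixed_shares = share_of_total(rescale_sentiment(sentiments, mentions))
print(raw_shares)
print(fixed_shares)
```

Passing `rescale_sentiment(sentiment, mentions)` instead of `sentiment` into `compute_sov` would keep every sentiment share between 0 and 1.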

## Step 5: Keyword Variants

I repeat the analysis for “smart fan” and four related keywords:
- “BLDC fan”
- “remote ceiling fan”
- “voice controlled fan”
- “best energy efficient fan”


In [32]:
related_keywords = ["smart fan", "BLDC fan", "remote ceiling fan", "voice controlled fan", "best energy efficient fan"]
brands = ["atomberg", "orient", "havells", "crompton", "usha"]

results = {}

for keyword in related_keywords:
    # Fetch up to 30 results per source. Note: determine_n (Step 2) is not
    # applied here, so every fetched result is analyzed.
    google_texts = google_search(keyword, num_results=30)
    yt_texts = youtube_search(keyword, max_results=30)

    all_texts = google_texts + yt_texts

    mentions = count_mentions(all_texts, brands)
    sentiment = sentiment_scores(all_texts, brands)
    dominance = semantic_dominance(all_texts, brands)
    # Placeholder: youtube_search returns text only (no video IDs), so get_engagement
    # is not called. Equal engagement gives every brand 0.25 * 1/5 = 5% of its SoV.
    engagement = {brand: 1 for brand in brands}

    sov = compute_sov(mentions, engagement, sentiment, dominance)
    results[keyword] = sov
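The loop analyzes every fetched result, so the entropy cutoff from Step 2 is never applied. A minimal sketch of how it could be, assuming each result is labelled with every brand it mentions (an assumption: the notebook does not say how the list passed to `determine_n` is built). `top_n_by_entropy` is a hypothetical helper that inlines the same formula as `determine_n`; the input texts are made up.

```python
import math
from collections import Counter
from typing import List

def top_n_by_entropy(texts: List[str], brands: List[str], min_n: int = 10, max_n: int = 30) -> List[str]:
    """Keep the first N texts, with N set by brand diversity in the first max_n texts."""
    window = texts[:max_n]
    # Assumption: one label per (text, brand) mention, using the same
    # case-insensitive substring rule as count_mentions.
    labels = [b for t in window for b in brands if b.lower() in t.lower()]
    k = len(set(labels))
    if k > 1:
        counts = Counter(labels)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scale = entropy / math.log2(k)  # normalized entropy in [0, 1]
    else:
        scale = 0.0  # zero or one brand: keep the minimum
    n = int(min_n + scale * (max_n - min_n))
    return texts[:n]

# Illustrative input: 20 Atomberg results followed by 20 Orient results.
texts = ["atomberg smart fan"] * 20 + ["orient smart fan"] * 20
kept = top_n_by_entropy(texts, ["atomberg", "orient"])
print(len(kept))  # 28
```

Inside the loop, this would be applied per source before combining, e.g. `google_texts = top_n_by_entropy(google_texts, brands)`.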

## Final SoV Table and Observations

Display and interpret Share of Voice scores across keyword variants.


In [33]:
import pandas as pd

df = pd.DataFrame(results).T
df.style.background_gradient(cmap='YlGnBu').format("{:.2%}")


| Keyword | atomberg | orient | havells | crompton | usha |
|---|---|---|---|---|---|
| smart fan | 60.78% | 15.50% | 5.00% | 13.72% | 5.00% |
| BLDC fan | 40.93% | 20.36% | 18.03% | 15.68% | 5.00% |
| remote ceiling fan | 37.17% | 20.19% | 12.65% | 5.00% | 5.00% |
| voice controlled fan | 47.22% | 15.95% | 26.83% | 5.00% | 5.00% |
| best energy efficient fan | 30.24% | 23.86% | 19.06% | 21.83% | 5.00% |
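Since the weights sum to 1, a row that falls short of 100% means at least one metric totalled zero for that keyword. A minimal sketch of that sanity check on the values copied from the table (as fractions); `unassigned_share` is a hypothetical helper. With the notebook's `df`, the same quantity is `1 - df.sum(axis=1)`.

```python
from typing import Dict

# SoV values (as fractions) copied from the table above.
table = {
    'smart fan': {'atomberg': 0.6078, 'orient': 0.1550, 'havells': 0.0500, 'crompton': 0.1372, 'usha': 0.0500},
    'BLDC fan': {'atomberg': 0.4093, 'orient': 0.2036, 'havells': 0.1803, 'crompton': 0.1568, 'usha': 0.0500},
    'remote ceiling fan': {'atomberg': 0.3717, 'orient': 0.2019, 'havells': 0.1265, 'crompton': 0.0500, 'usha': 0.0500},
    'voice controlled fan': {'atomberg': 0.4722, 'orient': 0.1595, 'havells': 0.2683, 'crompton': 0.0500, 'usha': 0.0500},
    'best energy efficient fan': {'atomberg': 0.3024, 'orient': 0.2386, 'havells': 0.1906, 'crompton': 0.2183, 'usha': 0.0500},
}

def unassigned_share(results: Dict[str, Dict[str, float]], tol: float = 0.001) -> Dict[str, float]:
    """Keywords whose brand scores fall short of 100%, with the missing share."""
    gaps = {}
    for keyword, scores in results.items():
        missing = 1 - sum(scores.values())
        if missing > tol:  # ignore rounding error in the displayed percentages
            gaps[keyword] = round(missing, 4)
    return gaps

gaps = unassigned_share(table)
print(gaps)  # {'remote ceiling fan': 0.1999}
```

A missing share of almost exactly 0.2 matches the Semantic Dominance weight, which points to no brand earning any dominance credit for that keyword.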


# Insights from Share of Voice Analysis

1. Atomberg leads the "Smart Fan" conversation

- **SoV:** 60.78%
- **Insight:** Atomberg dominates the “smart fan” results, indicating strong brand visibility in search for the core category term.
- **Comparison:** Orient is a distant second at 15.50%, more than 45 points behind, consistent with Atomberg’s first-mover advantage.

2. "BLDC Fan" sees competitive overlap

- **SoV:** Atomberg (40.93%), Orient (20.36%), Havells (18.03%)
- **Insight:** Atomberg, Orient, and Havells each hold more than 18%, so competition around energy-efficient BLDC fans is tighter than for “smart fan.”
- **Action:** Double down on educational content that positions Atomberg’s BLDC technology as a differentiator.

3. "Remote Ceiling Fan" reveals content gaps

- **SoV:** Atomberg (37.17%), Orient (20.19%), Havells (12.65%)
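- **Insight:** The five brand scores sum to only about 80%: no brand earned any Semantic Dominance credit for this keyword, so that metric’s 20% weight went unclaimed. No brand’s content claims leadership here.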
- **Action:** Create and own the **"remote-first" smart usage** narrative.

4. "Voice Controlled Fan" is where Havells challenges hardest

- **SoV:** Atomberg (47.22%), Havells (26.83%), Orient (15.95%)
- **Insight:** Atomberg still leads, but Havells posts its highest SoV of any keyword here, making it the strongest challenger in the smart-assistant space.
- **Action:** Highlight Alexa/Google Assistant integrations more prominently in content.


5. "Best Energy Efficient Fan" is still an open battlefield

- **SoV:** Atomberg (30.24%), Orient (23.86%), Crompton (21.83%)
- **Insight:** Atomberg’s lead is narrowest here (6.38 points over Orient), and the query reflects intent-driven consumer comparison.
- **Risk:** Losing SoV here could directly impact conversions.
- **Action:** Launch:
  - Direct comparison landing pages  
  - UGC campaigns around **“energy savings” testimonials**
