## **Insight Synthesizer - AI/ML Assessment**

## **Project Overview**


This project is a Python-based Insight Synthesizer designed for the Outlaw AI/ML assessment. It analyzes raw user survey feedback and extracts structured insights using pretrained Hugging Face models for zero-shot topic classification and sentiment analysis. The goal is to automatically group responses into meaningful themes with supporting quotes and sentiment labels.

## **Objective**

The objective of this project is to build an AI-powered module that transforms raw survey responses into structured, human-readable insights. Specifically, the system should:


1. Analyze short-form user feedback.
2. Identify and group common themes (e.g., privacy, usability, integrations).
3. Select 1–12 representative quotes for each theme.
4. Assign an overall sentiment label (positive, neutral, or negative) to each theme group.
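The four requirements above imply one record per theme with a theme name, a list of quotes, and a single sentiment label. A minimal sketch of that record as a `TypedDict`, plus a hypothetical `validate_insight` helper that checks it, assuming the field names used in the notebook's printed output and the 1–12 quote bound stated in the objective:

```python
from typing import List, Literal, TypedDict


# Field names mirror the notebook's JSON output; the type itself is illustrative.
class Insight(TypedDict):
    theme: str
    quotes: List[str]
    sentiment: Literal["positive", "neutral", "negative"]


ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def validate_insight(item: dict, min_quotes: int = 1, max_quotes: int = 12) -> List[str]:
    """Return the problems found in one insight record (empty list if valid).

    Hypothetical helper: the quote bounds come from the objective list.
    """
    problems = []
    if not isinstance(item.get("theme"), str) or not item.get("theme"):
        problems.append("theme must be a non-empty string")
    quotes = item.get("quotes")
    if not isinstance(quotes, list) or not all(isinstance(q, str) for q in quotes):
        problems.append("quotes must be a list of strings")
    elif not (min_quotes <= len(quotes) <= max_quotes):
        problems.append(f"expected {min_quotes}-{max_quotes} quotes, got {len(quotes)}")
    if item.get("sentiment") not in ALLOWED_SENTIMENTS:
        problems.append("sentiment must be positive, neutral, or negative")
    return problems


example: Insight = {
    "theme": "Privacy Concerns",
    "quotes": ["Privacy is my biggest concern..."],
    "sentiment": "neutral",
}
print(validate_insight(example))  # []
```

Returning a list of problems rather than raising makes it easy to report every issue in a batch of generated insights at once.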

In [1]:
!pip install transformers



## **Importing Libraries**

In [2]:
from transformers import pipeline
import json
from typing import List, Dict
from collections import defaultdict, Counter


## **Load Hugging Face models**

Hugging Face is a company and open-source platform specializing in natural language processing (NLP) and machine learning tools.

It provides easy-to-use libraries and models for tasks like text generation, translation, and sentiment analysis.

In [3]:
topic_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# Name the sentiment model explicitly; otherwise the pipeline falls back to this
# default checkpoint and warns that doing so is not recommended in production.
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

Device set to use cpu
Device set to use cpu
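Both pipelines are plain callables. For a single input string, the zero-shot pipeline returns a dict with `sequence`, `labels`, and `scores` (labels sorted by descending score), and the sentiment pipeline returns a list containing one dict with `label` and `score`; the grouping code relies on exactly those keys. The sketch below defines hypothetical stand-ins `StubZeroShot` and `StubSentiment` with the same output shapes so the logic can be exercised offline. The keyword matching and fixed scores are invented for illustration and are not how the real models behave.

```python
from typing import Dict, List


class StubZeroShot:
    """Hypothetical stand-in mimicking the zero-shot pipeline's output shape.

    The first candidate label whose keyword occurs in the text gets most of the
    probability mass; scores always sum to 1, as in single-label mode.
    """

    def __init__(self, keywords: Dict[str, str]):
        self.keywords = keywords  # label -> lowercase keyword (illustrative)

    def __call__(self, text: str, candidate_labels: List[str]) -> dict:
        lowered = text.lower()
        hit = next((lab for lab in candidate_labels
                    if lab in self.keywords and self.keywords[lab] in lowered), None)
        n = len(candidate_labels)
        if hit is None:
            scores = {lab: 1.0 / n for lab in candidate_labels}
        else:
            top = 0.9 if n > 1 else 1.0
            rest = (1.0 - top) / (n - 1) if n > 1 else 0.0
            scores = {lab: (top if lab == hit else rest) for lab in candidate_labels}
        # Like the real pipeline, return labels sorted by descending score.
        labels = sorted(candidate_labels, key=lambda lab: scores[lab], reverse=True)
        return {"sequence": text, "labels": labels, "scores": [scores[lab] for lab in labels]}


class StubSentiment:
    """Hypothetical stand-in for the sentiment pipeline: NEGATIVE if any listed word appears."""

    def __init__(self, negative_words):
        self.negative_words = negative_words

    def __call__(self, text: str) -> List[dict]:
        negative = any(word in text.lower() for word in self.negative_words)
        return [{"label": "NEGATIVE" if negative else "POSITIVE", "score": 0.99}]


topic_stub = StubZeroShot({"Privacy Concerns": "privacy", "Integration Requests": "gmail"})
sentiment_stub = StubSentiment({"concern", "robotic"})

result = topic_stub("Privacy is my biggest concern...",
                    ["Privacy Concerns", "Integration Requests", "User Experience"])
print(result["labels"][0], round(result["scores"][0], 2))  # Privacy Concerns 0.9
print(sentiment_stub("Make sure it doesn't sound robotic.")[0]["label"])  # NEGATIVE
```

Because `generate_insights` looks up `topic_classifier` and `sentiment_analyzer` as module-level names, rebinding them to these stubs (for example `topic_classifier = topic_stub`) would let the grouping and aggregation logic run without downloading the models.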


## **Predefined Themes**

In [4]:
predefined_themes = [
    "Privacy Concerns",
    "Product Expectations",
    "User Experience",
    "Integration Requests",
    "Security Concerns"
]
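The zero-shot pipeline treats each feedback item as an NLI premise and turns every candidate theme into a hypothesis by filling a template; its default `hypothesis_template` is `"This example is {}."`, and the model scores how strongly the premise entails each hypothesis. A pure-Python sketch of just that pairing step, with a hypothetical `build_nli_pairs` helper (no model involved):

```python
from typing import List, Tuple

themes = ["Privacy Concerns", "Product Expectations"]  # subset of predefined_themes


def build_nli_pairs(text: str, labels: List[str],
                    template: str = "This example is {}.") -> List[Tuple[str, str]]:
    """Pair the feedback (premise) with one hypothesis per candidate label."""
    if template.format("X") == template:
        raise ValueError("template must contain a '{}' placeholder")
    return [(text, template.format(label)) for label in labels]


for premise, hypothesis in build_nli_pairs("Privacy is my biggest concern...", themes):
    print(hypothesis)
# This example is Privacy Concerns.
# This example is Product Expectations.
```

Since the theme names are inserted verbatim into the hypothesis, their wording matters. Passing a more specific template, e.g. `topic_classifier(text, candidate_labels=predefined_themes, hypothesis_template="This feedback is about {}.")`, is a cheap experiment; whether it improves results for these themes has not been measured here.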

In [5]:
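def aggregate_sentiment(sentiments):
    # Ties (including an empty list) -> "neutral"; negative majority -> "negative";
    # positive majority -> "positive" only if 2+ items are POSITIVE, else "neutral".
    counter = Counter(sentiments)
    if counter['NEGATIVE'] >= counter['POSITIVE']:
        return "neutral" if counter['NEGATIVE'] == counter['POSITIVE'] else "negative"
    return "positive" if counter['POSITIVE'] > 1 else "neutral"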
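The aggregation rule is asymmetric: a single NEGATIVE label makes a theme negative, while a positive verdict needs a positive majority with at least two POSITIVE labels. In particular, a theme backed by only one quote can come out "neutral" or "negative" but never "positive". Running the same rule on a few hand-made label lists makes the cases explicit (the function is repeated so the block runs on its own):

```python
from collections import Counter


def aggregate_sentiment(sentiments):
    # Same rule as the notebook cell, repeated so this block is self-contained.
    counter = Counter(sentiments)
    if counter['NEGATIVE'] >= counter['POSITIVE']:
        return "neutral" if counter['NEGATIVE'] == counter['POSITIVE'] else "negative"
    return "positive" if counter['POSITIVE'] > 1 else "neutral"


# (per-feedback labels, theme-level result)
cases = [
    ([], "neutral"),                                     # no labels: 0-0 tie
    (["POSITIVE"], "neutral"),                           # one positive is not enough
    (["NEGATIVE"], "negative"),                          # one negative is enough
    (["POSITIVE", "POSITIVE"], "positive"),
    (["POSITIVE", "NEGATIVE"], "neutral"),               # tie
    (["POSITIVE", "POSITIVE", "NEGATIVE"], "positive"),
]
for labels, expected in cases:
    result = aggregate_sentiment(labels)
    print(labels, "->", result)
    assert result == expected
```

If symmetric behavior is preferred, the `> 1` condition is the line to revisit; it is kept here because the notebook's output was produced with it.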


In [6]:
def generate_insights(feedbacks: List[str], threshold: float = 0.7) -> List[Dict]:
    """Group feedback into predefined themes and attach an aggregated sentiment."""
    theme_data = defaultdict(list)        # theme -> feedback strings above threshold
    sentiment_scores = defaultdict(list)  # theme -> per-feedback sentiment labels

    for feedback in feedbacks:
        theme_result = topic_classifier(feedback, candidate_labels=predefined_themes)
        # Run sentiment once per feedback and reuse it for every matched theme.
        sentiment = sentiment_analyzer(feedback)[0]['label']
        # Feedback with no theme at or above the threshold is dropped.
        for label, score in zip(theme_result['labels'], theme_result['scores']):
            if score >= threshold:
                theme_data[label].append(feedback)
                sentiment_scores[label].append(sentiment)

    insights = []
    for theme, quotes in theme_data.items():
        insights.append({
            "theme": theme,
            "quotes": quotes,
            # aggregate_sentiment already returns a lowercase label.
            "sentiment": aggregate_sentiment(sentiment_scores[theme])
        })

    return insights
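The objective asks for 1–12 representative quotes per theme, while `generate_insights` keeps every feedback item that clears the threshold. One way to cap and rank them is sketched below with a hypothetical `select_quotes` helper; it assumes each theme would collect `(score, feedback)` pairs instead of bare strings, which is a change to the function above, and the example scores are made up.

```python
from typing import List, Tuple


def select_quotes(scored: List[Tuple[float, str]], max_quotes: int = 12) -> List[str]:
    """Keep up to max_quotes distinct quotes, highest theme score first.

    Hypothetical helper: duplicates are detected case-insensitively after
    stripping whitespace; ties keep their original order (stable sort).
    """
    if max_quotes <= 0:
        return []
    seen = set()
    chosen = []
    for score, text in sorted(scored, key=lambda pair: pair[0], reverse=True):
        key = text.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        chosen.append(text)
        if len(chosen) == max_quotes:
            break
    return chosen


# Illustrative (score, quote) pairs for one theme; scores are invented.
scored = [
    (0.72, "Works fine with Slack."),
    (0.91, "Needs a Gmail plugin."),
    (0.91, "needs a gmail plugin."),
]
print(select_quotes(scored, max_quotes=2))  # ['Needs a Gmail plugin.', 'Works fine with Slack.']
```

Ranking by the theme score favors quotes the classifier is most confident about; a diversity-aware choice (for example, skipping near-duplicate wording) would need an extra similarity measure.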


## **Sample Feedback**

In [7]:
sample_feedback = [
    "I love the idea of an AI that writes emails...",
    "Privacy is my biggest concern...",
    "If this works with Gmail and Slack...",
    "Make sure it doesn't sound robotic.",
    "Would love integrations, but only if they're secure."
]

## **Output**

In [8]:
insights = generate_insights(sample_feedback)
print("Generated Insights:\n", json.dumps(insights, indent=2))

Generated Insights:
 [
  {
    "theme": "Privacy Concerns",
    "quotes": [
      "Privacy is my biggest concern..."
    ],
    "sentiment": "neutral"
  }
]
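Only one of the five feedback items produced a theme, which is tied to how the zero-shot pipeline normalizes scores. By default (`multi_label=False`) it applies a softmax to the entailment logits across all candidate labels, so the scores for one feedback sum to 1 and at most one label can reach 0.7; items whose probability is spread across several themes clear no threshold at all. With `multi_label=True`, each label is scored independently from a softmax over its own contradiction and entailment logits. A pure-Python sketch of both normalizations on made-up logits (not actual outputs of `facebook/bart-large-mnli`):

```python
import math
from typing import Dict, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entail: Dict[str, float]) -> Dict[str, float]:
    # Default mode: one softmax over the entailment logits of all labels.
    labels = list(entail)
    return dict(zip(labels, softmax([entail[lab] for lab in labels])))


def multi_label_scores(pairs: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    # multi_label=True: per label, softmax over (contradiction, entailment).
    return {lab: softmax([c, e])[1] for lab, (c, e) in pairs.items()}


# Made-up (contradiction, entailment) logits for a feedback that mentions
# both integrations and security.
logits = {
    "Integration Requests": (-2.0, 3.0),
    "Security Concerns": (-1.5, 2.5),
    "Privacy Concerns": (1.0, -1.0),
}
single = single_label_scores({lab: e for lab, (_, e) in logits.items()})
multi = multi_label_scores(logits)

threshold = 0.7
print(sorted(lab for lab, s in single.items() if s >= threshold))  # []
print(sorted(lab for lab, s in multi.items() if s >= threshold))   # ['Integration Requests', 'Security Concerns']
```

In the notebook, switching would mean calling `topic_classifier(feedback, candidate_labels=predefined_themes, multi_label=True)`; the threshold then acts as a per-theme cutoff, so a feedback item can land in several themes and the printed insights would likely change.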


## **Final Recommendation**

Focus on enhancing privacy safeguards and secure integrations with platforms like Gmail and Slack.

Also, prioritize making the AI’s tone more natural to avoid robotic-sounding messages.