# Project Title : Socioguard
* **Project Description : GenAI for Social Media: Real-Time Post Moderation**
* **Developer : Shankar**
  
**GenAI Capabilities covered in this project :**
1. Structured output/JSON mode/controlled generation
2. Few-shot prompting
3. Function Calling
4. Agents & LangGraph
5. Embeddings
6. Retrieval augmented generation (RAG)
7. Vector search/vector store/vector database

# Problem & Use Case
Every second, social media platforms are flooded with new posts. While most are harmless, some can be toxic, harmful, sensitive, or even life-threatening. Moderating such content manually is slow and expensive, and traditional keyword-based filters often fail to understand the nuance behind a post.
As a developer passionate about creating safer digital environments, I built a real-time, GenAI-powered text moderation system that classifies harmful and emergency posts and takes the appropriate action on them, autonomously and within seconds.


Despite platform efforts, harmful content still slips through, from bullying and harassment to urgent cries for help. Here's why:
1. **Manual moderation** can't keep up with the volume of posts being published every second.
2. **Rule-based systems** are too rigid, missing posts hidden behind slang, emojis, or indirect language.
3. **Delays in action** can lead to real-world consequences - from mental health impacts to brand damage.

Social platforms, in particular, require immediate support to detect and automatically respond to:
* Life-threatening or harmful messages
* Toxic language and hate speech
* Sensitive or adult-oriented content
* Spam, scams, and promotional abuse

The stakes are high - user safety, trust, and platform reputation all hang in the balance.


# Solution 
**How GenAI Solves This in Real Time :**
To solve this, I built a fully automated GenAI-based moderation pipeline that understands context, tone, risk, and organizational guidelines before taking action.

**Step-by-step breakdown:**
* A user submits a post.
* The system retrieves the relevant organization guidelines from ChromaDB using RAG.
* The LLM (Gemini Flash) classifies the post into one of ten categories: LIFE_EMERGENCY, THREAT, HARASSMENT, OFFENSIVE_LANGUAGE, ADULT, SENSITIVE, PRIORITY_SUPPORT, PROMOTIONAL_CONTENT, SPAM, or NORMAL.
* The model also determines the post's tone, sentiment, applicable rule ID and its description, hashtags, and a risk score (0.0–1.0).
* Based on the risk level and category, a moderation agent decides on the appropriate action:

1. Sanitize: mask harmful, sensitive, or offensive statements.
2. Normalize: rephrase hateful or threatening language.

* Once a post has been analyzed and categorized, the Action Applier immediately applies the action that matches the severity of the risk, with no human delay. Using the tools available to it, it acts as follows:

  🔴 Critical: Block the user, remove the post, and send the post for urgent review.

  🟠 High: Block the post, temporarily block the user (≤10 mins), and queue the post for review.

  🟡 Medium: Temporarily block the user (≤10 mins) and show a cleaned version of the post (sanitized or normalized), with or without review.

  ⚪ Others: Escalate to the emergency, review, or security team with the appropriate priority, without any block.

* The system then takes the output from the moderation model and generates a user-friendly explanation, detailing what went wrong and why specific actions were taken.

* The app respects user status (active, temporarily blocked, or fully blocked) and is designed to operate in real time; for example, a blocked user cannot post anything other than emergency or priority-support posts. All post data is stored in SQLite tables, and the app updates the feed based on specific flags. Each time a user submits a post, the feed refreshes automatically to display all available posts.
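The tiering and posting rules above can be sketched as two plain functions. This is a minimal, hypothetical sketch: the thresholds (0.9 and above is critical, 0.8 is high, 0.7 is medium) and the category exemptions are taken from the `ACTIONS_AVILABLE_ZERO_PROMPT` text later in this notebook, while the names `select_action_tier`, `can_post`, `Tier`, and the user-status strings are illustrative assumptions. In the notebook itself, the decision is made by the LLM agent, not by hard-coded rules.

```python
from enum import Enum

# Categories for which the prompts forbid blocking the user or the post.
NEVER_BLOCK = {"LIFE_EMERGENCY", "PRIORITY_SUPPORT"}
# Categories a blocked user may still post (per the app description above).
ALLOWED_WHILE_BLOCKED = {"LIFE_EMERGENCY", "PRIORITY_SUPPORT"}


class Tier(Enum):
    CRITICAL = "critical"  # block user, remove post, urgent review
    HIGH = "high"          # block post, temp-block user (<= 10 min), queue review
    MEDIUM = "medium"      # temp-block user (<= 10 min), show cleaned post
    OTHERS = "others"      # escalate only, no block


def select_action_tier(category: str, risk: float) -> Tier:
    """Map a category and a risk score (0.0-1.0) to an action tier (hypothetical rules)."""
    if category in NEVER_BLOCK:
        return Tier.OTHERS
    if risk >= 0.9:
        return Tier.CRITICAL
    if risk >= 0.8:
        return Tier.HIGH
    # The 0.7 rule additionally exempts promotional content from user blocks.
    if risk >= 0.7 and category != "PROMOTIONAL_CONTENT":
        return Tier.MEDIUM
    return Tier.OTHERS


def can_post(user_status: str, category: str) -> bool:
    """Active users may post anything; blocked users only emergency or priority-support posts."""
    if user_status == "active":
        return True
    return category in ALLOWED_WHILE_BLOCKED


print(select_action_tier("THREAT", 0.9))          # Tier.CRITICAL
print(select_action_tier("LIFE_EMERGENCY", 1.0))  # Tier.OTHERS
print(can_post("blocked", "SPAM"))                # False
```

Checking exemptions before thresholds mirrors the prompt's rule that emergency and priority-support posts are never blocked, regardless of their risk score.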

# 📦 Installing required packages
These are the core libraries powering our GenAI-based moderation system.
- google-genai: For Gemini model access and tool-calling
- chromadb: To store and retrieve moderation guidelines using vector search
- langgraph: For building multi-step agent workflows
- tabulate: For neat table outputs

In [1]:
# Installing required packages
!pip install -U -q "google-genai==1.7.0"
!pip install -U -q "chromadb"
!pip install -U -q "langgraph"
!pip install -U -q "tabulate"

# 📥 Importing libraries
These modules are essential for building and running our moderation system:
- Google GenAI: For accessing Gemini models and tool-calling
- ChromaDB: To handle vector embeddings and store moderation guidelines
- LangGraph: To create structured agent workflows
- SQLite3 & datetime: For managing post data and timestamps
- Tabulate & IPython Display: For pretty outputs and inline visuals
- Typing & Enum: For data structuring and control flow

In [2]:
#  Importing libraries (duplicates removed, grouped stdlib first)
import enum
import json
import sqlite3
from datetime import datetime  # note: binds the datetime class, not the module
from html import unescape
from typing import Optional, TypedDict

import chromadb
import typing_extensions as typing  # cell 4 uses typing.TypedDict from typing_extensions
from chromadb import Documents, EmbeddingFunction, Embeddings
from google import genai
from google.api_core import retry
from google.genai import types
from IPython.display import HTML, Markdown, display
from langgraph.graph import END, StateGraph
from tabulate import tabulate

# 🔐 Setting up API Key & Retrying Logic
 - Defines a retry mechanism to handle temporary API errors (e.g., rate limits, service unavailability)
 - Retrieves the Google API key securely using Kaggle secrets
 - Initializes the GenAI client with the authenticated key


In [3]:
#Setting up API Key & Retrying Logic
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})
genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)

from kaggle_secrets import UserSecretsClient
GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
# from google.colab import userdata
# GOOGLE_API_KEY = userdata.get('GEMINI_KEY')


client = genai.Client(api_key=GOOGLE_API_KEY)
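`retry.Retry(predicate=is_retriable)` from `google.api_core` re-invokes the wrapped `generate_content` call with exponential backoff whenever the predicate returns True for the raised exception. To make that behavior concrete without calling the API, here is a simplified, stdlib-only sketch of the same idea. `FakeAPIError`, `with_retry`, `flaky_call`, and the delay values are hypothetical; the real `Retry` object also caps each delay and enforces an overall timeout.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for genai.errors.APIError, carrying an HTTP status code."""

    def __init__(self, code: int):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(e: Exception) -> bool:
    # Retry only on rate limiting (429) and temporary unavailability (503).
    return isinstance(e, FakeAPIError) and e.code in {429, 503}


def with_retry(fn, predicate, attempts=5, initial=0.01, multiplier=2.0):
    """Call fn(); on a retriable error, sleep with exponential backoff and try again."""
    delay = initial
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if not predicate(e) or attempt == attempts - 1:
                raise  # non-retriable error, or out of attempts
            time.sleep(delay)
            delay *= multiplier


# Simulated flaky endpoint: fails twice with 429, then succeeds.
calls = {"n": 0}


def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)
    return "ok"


print(with_retry(flaky_call, is_retriable))  # prints "ok" after two retries
```

Errors such as 400 (bad request) are re-raised immediately, because retrying a malformed request would only waste quota.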

# 🧱 Defining Core Types and Enums
* These TypedDicts and Enums define the structure and allowed values for:
   
1. Moderation flow state (`ModerationState`)
2. Post sentiment (`PostAvailableSentiment`)
3. Post tone (`PostAvailableTone`)
4. Category classification (`PostAvilableCategory`)
5. Risk scoring (`RiskScaleEnum`)
6. Final moderation decisions and user feedback formats

* These are essential for building a structured and consistent moderation pipeline.
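To show how `ModerationState` flows through the pipeline, here is a dependency-free sketch in which each stage is a function that reads the state and returns a partial update. This is the same contract LangGraph nodes follow. The stage bodies (`validate`, `categorize`, `apply_actions`, `summarize`) and `run_pipeline` are hypothetical placeholders, not the notebook's LLM-backed nodes. With LangGraph, the same functions could be registered via `StateGraph.add_node` and chained with `add_edge`.

```python
from typing import Callable, Optional, TypedDict


class ModerationState(TypedDict):
    user_post: str
    validated_post: Optional[str]
    categorized_post: Optional[str]
    actioned_post: Optional[str]
    final_output: Optional[str]


# Hypothetical stage implementations; the real notebook calls Gemini at these steps.
def validate(state: ModerationState) -> dict:
    return {"validated_post": state["user_post"].strip()}


def categorize(state: ModerationState) -> dict:
    text = state["validated_post"] or ""
    label = "SPAM" if "buy now" in text.lower() else "NORMAL"  # toy rule, not the LLM
    return {"categorized_post": label}


def apply_actions(state: ModerationState) -> dict:
    action = "queued for review" if state["categorized_post"] != "NORMAL" else "published"
    return {"actioned_post": action}


def summarize(state: ModerationState) -> dict:
    return {"final_output": f"{state['categorized_post']}: {state['actioned_post']}"}


def run_pipeline(post: str, stages: list[Callable[[ModerationState], dict]]) -> ModerationState:
    """Run stages in order, merging each partial update into the shared state."""
    state: ModerationState = {
        "user_post": post, "validated_post": None, "categorized_post": None,
        "actioned_post": None, "final_output": None,
    }
    for stage in stages:
        state.update(stage(state))
    return state


STAGES = [validate, categorize, apply_actions, summarize]
result = run_pipeline("  BUY NOW, limited offer!  ", STAGES)
print(result["final_output"])  # SPAM: queued for review
```

Because every stage only returns the keys it changes, stages stay independent and can be reordered or swapped (for example, inserting a sanitize step) without touching the others.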

In [4]:
#Defining Core Types and Enums
class ModerationState(TypedDict):
    user_post: str
    validated_post: Optional[str]
    categorized_post: Optional[str]
    actioned_post: Optional[str]
    final_output: Optional[str]

class PostAvailableSentiment(enum.Enum):
    POSITIVE = "POSITIVE"
    NEUTRAL = "NEUTRAL"
    NEGATIVE = "NEGATIVE"
    MIXED = "MIXED"

class PostAvailableTone(enum.Enum):
    EXCITED = "EXCITED"
    HAPPY = "HAPPY"
    CALM = "CALM"
    ANGRY = "ANGRY"
    SAD = "SAD"
    SARCASTIC = "SARCASTIC"
    FEARFUL = "FEARFUL"
    HOPEFUL = "HOPEFUL"
    INFORMATIVE = "INFORMATIVE"
    NEUTRAL = "NEUTRAL"
    ANXIOUS = "ANXIOUS"
    CONFUSED = "CONFUSED"
    FRUSTRATED = "FRUSTRATED"
    AGGRESSIVE = "AGGRESSIVE"
    SUPPORTIVE = "SUPPORTIVE"
    GRATEFUL = "GRATEFUL"
    HUMOROUS = "HUMOROUS"

class PostAvilableCategory(enum.Enum):
  PRIORITY_SUPPORT = 'PRIORITY_SUPPORT'
  LIFE_EMERGENCY = 'LIFE_EMERGENCY'
  SPAM = 'SPAM'
  HARASSMENT = 'HARASSMENT'
  ADULT = 'ADULT'
  SENSITIVE = 'SENSITIVE'
  THREAT = 'THREAT'
  NORMAL = 'NORMAL'
  OFFENSIVE_LANGUAGE = 'OFFENSIVE_LANGUAGE'
  PROMOTIONAL_CONTENT = 'PROMOTIONAL_CONTENT'

class RiskScaleEnum(enum.Enum):
  VERY_LOW = "0.0"
  LOW = "0.1"
  MEDIUM_LOW = "0.2"
  MEDIUM = "0.3"
  MEDIUM_HIGH = "0.4"
  HIGH = "0.5"
  VERY_HIGH = "0.6"
  EXTREMELY_HIGH = "0.7"
  NEAR_PERFECT = "0.8"
  PERFECT = "0.9"
  MAX = "1.0"

class PostCategoryResponse(typing.TypedDict):
  category: PostAvilableCategory
  risk_scale: RiskScaleEnum
  organization_standards_applied: str
  organization_standards_applied_desc: str
  post_tone: PostAvailableTone
  post_sentiment: PostAvailableSentiment
  auto_hash_tag: str

class NormalizedTextResponse(typing.TypedDict):
  normalized_post: str

class SanitizedTextResponse(typing.TypedDict):
  sanitized_post: str

class ReviewActionsAppliedAndNotifyUserResponse(typing.TypedDict):
  final_response_to_user_on_post_submission: str
  any_error_during_actions: bool

class DataBaseName:
  DATABASE_NAME = "post_management.db"
  @staticmethod
  def get_database_name():
    return DataBaseName.DATABASE_NAME
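With `response_mime_type="application/json"` and `response_schema=PostCategoryResponse`, Gemini returns JSON text shaped like `PostCategoryResponse`. The following is a minimal sketch of turning that text back into typed values. The sample payload is invented for illustration, the enums are trimmed copies of the ones above, and `parse_category_response` is a hypothetical helper.

```python
import enum
import json


class PostAvilableCategory(enum.Enum):  # trimmed copy of the notebook's enum
    THREAT = "THREAT"
    SPAM = "SPAM"
    NORMAL = "NORMAL"


class RiskScaleEnum(enum.Enum):  # trimmed copy of the notebook's enum
    LOW = "0.1"
    HIGH = "0.5"
    PERFECT = "0.9"


def parse_category_response(raw: str) -> dict:
    """Decode the model's JSON and convert enum fields; raises ValueError on unknown values."""
    data = json.loads(raw)
    return {
        "category": PostAvilableCategory(data["category"]),
        "risk": float(RiskScaleEnum(data["risk_scale"]).value),
        "rule": data.get("organization_standards_applied", "Others"),
        # auto_hash_tag is a comma-separated string; split it into a clean list.
        "hashtags": [t.strip() for t in data.get("auto_hash_tag", "").split(",") if t.strip()],
    }


sample = ('{"category": "THREAT", "risk_scale": "0.9", '
          '"organization_standards_applied": "Threatening Words", '
          '"auto_hash_tag": "threat, safety, moderation"}')
parsed = parse_category_response(sample)
print(parsed["category"], parsed["risk"], parsed["hashtags"])
# PostAvilableCategory.THREAT 0.9 ['threat', 'safety', 'moderation']
```

Converting through the enums means an out-of-schema value fails loudly at parse time, instead of silently flowing into the action stage.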


# ⚙️ Model & Configuration Setup

* Define the Gemini model version used (`GOOGLE_FLASH_V2`)
* Provide reusable configuration presets for different moderation tasks:

1. Categorization, normalization, sanitization
2. Action triggering with tool calling
3. RAG-based decision making with guideline retrieval
4. User feedback after moderation

* Helps maintain consistency and modularity across all model calls.

In [5]:
#Model & Configuration Setup
class Models:
  GOOGLE_FLASH_V2 = "gemini-2.0-flash"
  @staticmethod
  def get_model():
    return Models.GOOGLE_FLASH_V2


class ModelConfig:
  @staticmethod
  def get_category_model_config():
    return types.GenerateContentConfig(max_output_tokens=150,
                                           temperature=0.1,
                                           top_p=0.95,
                                           response_mime_type="application/json",
                                           response_schema=PostCategoryResponse)
  @staticmethod
  def get_normalize_model_config():
    return types.GenerateContentConfig(temperature=0.1,
                                           top_p=0.95,
                                           response_mime_type="application/json",
                                           response_schema=NormalizedTextResponse)
  @staticmethod
  def get_sanitize_model_config():
    return types.GenerateContentConfig(temperature=0.1,
                                           top_p=0.95,
                                           response_mime_type="application/json",
                                           response_schema=SanitizedTextResponse)
  @staticmethod
  def get_action_model_config(add_review_entry,block_user,insert_post_visibility_status):
    return types.GenerateContentConfig(temperature=0.1,
                                      top_p=0.95,
                                      tools=[add_review_entry, block_user,insert_post_visibility_status])
  @staticmethod
  def get_rag_based_action_model_config(add_review_entry,block_user,insert_post_visibility_status,retrive_guidelines_for_actions):
    return types.GenerateContentConfig(temperature=0.1,
                                      top_p=0.95,
                                      tools=[add_review_entry, block_user,insert_post_visibility_status, retrive_guidelines_for_actions])
  @staticmethod
  def get_review_action_applied_and_notify_user_model_config():
    return types.GenerateContentConfig(temperature=0.1,
                                           top_p=0.95,
                                           response_mime_type="application/json",
                                           response_schema=ReviewActionsAppliedAndNotifyUserResponse)
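`get_action_model_config` passes plain Python functions (`add_review_entry`, `block_user`, `insert_post_visibility_status`) as `tools`. The google-genai SDK can then build the function declarations the model calls from each function's name, type hints, and docstring. Those functions are defined elsewhere in the notebook, so the sketch below is a hypothetical, simplified version of two of them. They are backed by an in-memory SQLite database, and the signatures, table names, and column names are all invented. The sketch shows the shape such tools typically take: typed parameters, a descriptive docstring, and a JSON-serializable return value that reports success or failure back to the model.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema; the notebook's real tables live in post_management.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_blocks (user_id TEXT, blocked_until TEXT, reason TEXT)")
conn.execute("CREATE TABLE review_queue (post_id TEXT, team TEXT, priority TEXT)")


def block_user(user_id: str, duration_minutes: int, reason: str) -> dict:
    """Temporarily block a user. Durations above 10 minutes are capped at 10.

    Args:
        user_id: ID of the user to block.
        duration_minutes: Requested block length in minutes.
        reason: Short explanation shown to reviewers.
    """
    try:
        minutes = min(max(duration_minutes, 0), 10)  # enforce the <= 10 min policy
        until = datetime.now() + timedelta(minutes=minutes)
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO user_blocks VALUES (?, ?, ?)",
                         (user_id, until.isoformat(), reason))
        return {"status": "success", "blocked_minutes": minutes}
    except sqlite3.Error as e:
        return {"status": "error", "message": str(e)}  # lets the model report failures


def add_review_entry(post_id: str, team: str, priority: str) -> dict:
    """Queue a post for a human team (REVIEW, EMERGENCY or SECURITY) with HIGH/MEDIUM/LOW priority."""
    if priority not in {"HIGH", "MEDIUM", "LOW"}:
        return {"status": "error", "message": f"invalid priority {priority!r}"}
    with conn:
        conn.execute("INSERT INTO review_queue VALUES (?, ?, ?)", (post_id, team, priority))
    return {"status": "success"}


print(block_user("u42", 30, "harassment"))      # capped to 10 minutes
print(add_review_entry("p7", "REVIEW", "HIGH"))
```

Returning an error dictionary instead of raising lets the model see the failure, which the review prompt can then surface through `any_error_during_actions`.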



# 📄 Prompt Templates for All Moderation Tasks
 
 This class contains all the prompt templates used for different stages of moderation:
 
 1. 🚨 `CATEGORY_ZERO_SHOT_PROMPT`: Classifies user input into predefined moderation categories and generates structured metadata (tone, sentiment, risk, etc.)
 2. 🧹 `NORMALIZE_FEW_SHOT_PROMPT`: Rewrites harsh/offensive user input into a softer, respectful tone using few-shot examples.
 3. 🔒 `SANITIZE_FEW_SHOT_PROMPT`: Masks sensitive or harmful content (personal details, medical info, profanity, etc.) using few-shot examples.
 4. 🛠 `APPLY_ACTIONS_NON_RAG_ZERO_SHOT_PROMPT`: Directly triggers moderation actions based on structured input and rules.
 5. 📚 `ACTIONS_AVILABLE_ZERO_PROMPT`: Defines the complete set of moderation tools available and the conditions under which each tool should be used.
 6. 🧠 `APPLY_ACTIONS_RAG_ZERO_SHOT_PROMPT`: Uses RAG to retrieve action guidelines and applies actions strictly as per organizational policies.
 7. ✅ `REVIEW_ACTIONS_APPLIED_ZERO_SHOT_PROMPT`: Evaluates moderation actions taken and generates a human-like response back to the user including status, reasoning, and escalation (if any).

 ✅ All prompts follow strict output formatting and domain-specific constraints to maintain consistency and reliability during moderation.


In [6]:
# Prompt Templates for All Moderation Tasks
class Prompts:
  CATEGORY_ZERO_SHOT_PROMPT = """
  You are an AI moderation assistant designed to classify user-generated text into one of the following predefined categories:
  1. LIFE_EMERGENCY
  2. THREAT
  3. HARASSMENT
  4. OFFENSIVE_LANGUAGE
  5. PRIORITY_SUPPORT
  6. SENSITIVE
  7. ADULT
  8. PROMOTIONAL_CONTENT
  9. SPAM
  10. NORMAL

  Your task is to analyze the user input and return a structured response based on both the input and the provided Organizational Standards (retrieved via RAG).

  ### Your Output Must Include:
    1. **Category** : Choose the most appropriate category from the list above, based on the following priority order: LIFE_EMERGENCY > THREAT > HARASSMENT > OFFENSIVE_LANGUAGE > PRIORITY_SUPPORT > SENSITIVE > ADULT > PROMOTIONAL_CONTENT > SPAM > NORMAL
    2. **Risk Scale** : A numeric score between 0.0 (low risk) and 1.0 (high risk), representing the urgency or danger level of the post.
    3. **Organization Standards Applied for Category** : Provide the name of the rule or standard used to classify the category. If no matching rule is found, return 'Others'.
    4. **Organization Standards Applied for Category Description** : A short explanation (max 20 words) describing why this category was selected based on the standard.
    5. **Post Tone** : Describe the emotional tone of the post (e.g., angry, calm, frustrated, urgent, sarcastic).
    6. **Post Sentiment** : Determine the sentiment expressed (e.g., positive, negative, neutral, mixed).
    7. **Auto Hash Tag** : Generate up to 3 relevant, professionally worded hashtags for the post (comma-separated, no # symbol, lowercase).

  ### Instructions and Guidelines:
  - Always classify the input by following this strict priority order, as users may try to mask high-priority issues within low-priority content. Use the following descending order of importance when assigning categories: LIFE_EMERGENCY > THREAT > HARASSMENT > OFFENSIVE_LANGUAGE > PRIORITY_SUPPORT > SENSITIVE > ADULT > PROMOTIONAL_CONTENT > SPAM > NORMAL
  - Prioritize using the Organizational Standards. If unclear, rely on logical reasoning.
  - Make sure to classify the text accurately without raising false alerts or false positives. If the message is neutral, it should be categorized as "NORMAL."
  - Do not include any special characters in the output except single quotes (').
  - The Category must always match the Organizational Standard applied.
  - User inputs may include emojis, grammar mistakes, or informal/slang language. Always judge based on intent and context.

  ### Organizational Standards:
    {}

  ### User Input:
    {}
  """
  NORMALIZE_FEW_SHOT_PROMPT = """
  You are a respectful communication assistant. Rephrase the given message to remove any harassing, aggressive, or offensive language. Use a softer tone while preserving the original meaning.

  Example 1:
  Original: "You're such a loser, nobody wants you here."
  Rephrased: "I’m having a hard time connecting with you on this topic."

  Example 2:
  Original: "Why don't you just shut up already?"
  Rephrased: "I’d prefer if we paused this conversation for now."

  Example 3:
  Original: "People like you are the worst!"
  Rephrased: "I strongly disagree with your opinion."

  Now rephrase the following message:
  "{}"
  """
  # Alias keeps the original (misspelled) attribute name working for existing callers.
  NORMALIZE_FEW_SHOT_PROMOT = NORMALIZE_FEW_SHOT_PROMPT
  SANITIZE_FEW_SHOT_PROMPT = """
  You are a content-masking assistant for sensitive and offensive language. Your task is to identify and mask any sensitive or offensive information, such as personal details, medical advice, medications, profanity, or anything that should not be blindly followed. Replace only the sensitive parts with [MASKED] and keep the rest of the sentence unchanged.

  Examples:

  Example 1:
  Original: "My phone number is 9876543210"
  Masked: "My phone number is [MASKED]"

  Example 2:
  Original: "I take Xanax 3 times a day"
  Masked: "I take [MASKED] 3 times a day"

  Example 3:
  Original: "Send your documents to john.doe@example.com"
  Masked: "Send your documents to [MASKED]"

  Example 4:
  Original: "Just drink apple cider vinegar every morning to cure your diabetes"
  Masked: "Just drink [MASKED] every morning to cure your diabetes"

  Example 5:
  Original: "I live at 44-A, Galaxy Apartments, NY"
  Masked: "I live at [MASKED]"

  Example 6:
  Original: "oh fuck, i am screwd"
  Masked: "oh [MASKED], i am screwd"

  Now, mask the sensitive information in this sentence:
  "{}"
  """
  APPLY_ACTIONS_NON_RAG_ZERO_SHOT_PROMPT = """
  You are a social media moderator responsible for analyzing user posts and taking appropriate moderation actions.
  ***Note -
    1. If you experience any issues or failures while using the tools, please report them in your response.
    2. DO NOT use any escape sequences or special characters in your response text except single quotes (')
  input:
  {}

  apply actions:
  {}
  """

  ACTIONS_AVILABLE_ZERO_PROMPT = """
  You have access to perform only the following actions:
    1.ACTION NAME: Critical / Block for Review
    DESC : Block the user until the review team clears them. Set is_blocked_for_review = true and is_post_blocked = true, and escalate to the review team with High priority to prevent the user from posting further until clearance is granted. This action is intended for high-confidence cases where the post clearly violates community guidelines but requires review before further decisions.
    Conditions: Apply this action only if the risk_scale is 0.9 or higher, except for posts categorized as LIFE_EMERGENCY. ***DO NOT BLOCK USER/POST FOR LIFE_EMERGENCY or PRIORITY_SUPPORT category.

     is_post_waiting_for_clearance |   is_post_visible

    2.ACTION NAME: Temporary Block with Review
    DESC : Temporarily block the user and involve the review team. The temporary block must not exceed 10 minutes. Set is_blocked = true, is_post_waiting_for_clearance = true, is_post_blocked = false, is_post_visible = false and is_blocked_for_review = false, with the appropriate duration and timestamp. Escalate the issue to the review team with Medium priority for further analysis.
    Conditions: Use this action when risk_scale is 0.8 and the case requires additional review. ***DO NOT BLOCK USER/POST FOR LIFE_EMERGENCY or PRIORITY_SUPPORT category

    3.ACTION NAME: Temporary Block without Review
    DESC : Temporarily block the user for up to 10 minutes without involving the review team. Set is_blocked = true, is_post_visible = true, is_post_blocked = false and is_post_waiting_for_clearance = false, with the appropriate duration and timestamp.
    Conditions: Use this action when risk_scale is 0.7. This block is considered lower risk but might still require escalation later. ***DO NOT BLOCK USER FOR LIFE_EMERGENCY or PRIORITY_SUPPORT or PROMOTIONAL_CONTENT categories

    4.ACTION NAME: Non-Blocking Escalation Only
    DESC :
       - When user blocking is not necessary, escalate the issue directly to the review, emergency, or security team based on context. Set the correct priority flag (HIGH, MEDIUM, or LOW) depending on the situation. Use the category, risk_scale, post_tone, and post_sentiment values from the input to determine the priority.
       - This action should be used when the issue needs attention but blocking is not appropriate.
      Example: In emergency cases where help is already on the way or someone is actively handling the situation, reduce the priority to LOW with is_post_visible=true.
    Conditions: Apply this action only when escalation is sufficient, and make sure to select the most relevant internal team with the right priority based on an understanding of the situation.
      """

  APPLY_ACTIONS_RAG_ZERO_SHOT_PROMPT ="""
  Social Media Moderation: Guidelines, Tools, and Decision-Making Process:
  1.As a social media moderator, your responsibility is to review user posts and decide on the appropriate moderation actions.
  2.You have access to multiple tools, and you must always use the retrieval tool to access the organization's action guidelines through RAG. Ensure your search string includes risk_scale, category, and any other relevant details that help justify your decision.
  3.You must not take any decision that is not supported by the ORGANISATION GUIDELINES.

  Input:
  {}

  Output : the actions taken and any failures while applying them
  """
  REVIEW_ACTIONS_APPLIED_ZERO_SHOT_PROMPT = """
  You are an interactive social media moderator. Your task is to review the output of the system that applied actions to a user's post and provide feedback on those actions. Your response should include details on what went well and what actions were taken.
  Note :
    1.Except for life-threatening emergencies or priority-support cases, feel free to add a touch of professional humor. You may gently and professionally tease the user about their use of harassing or offensive language.
    2.***DO NOT use any special character except single quotes (')
    3.Sometimes the input contains conversational statements or questions. Do not be confused by them: your input comes from another model, so parse it carefully.
  Your output should include:
  1. final_response_to_user_on_post_submission: str ->
      Using the input, generate a precise, concise message that includes only:
        - refer "actions_taken" to find the type of block (if any) and whether it is an existing or a new one
        - refer "actions_taken" to find the duration of the block
        - refer "actions_taken" to find the post status: visible, blocked, or waiting for clearance
        - refer "organization_standards_applied" or "organization_standards_applied_desc" to find the reason for the actions.
        - refer "actions_taken" to find escalations, if any, and to which team
  2. any_error_during_actions: bool -> Set this to true if the input text suggests that the system attempted an action but encountered a problem. Otherwise, set it to false.

  Input:
  {}
  """

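The templates above use positional `{}` placeholders, so they would be filled with `str.format`, for example `Prompts.CATEGORY_ZERO_SHOT_PROMPT.format(guidelines, user_post)`. The following sketch shows a hypothetical helper, `build_few_shot_prompt`, that assembles a few-shot prompt like the normalization template from a list of example pairs instead of hard-coding them. The helper name and the example selection are illustrative assumptions.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]],
                          query: str, out_label: str = "Rephrased") -> str:
    """Assemble instruction + numbered examples + the new input into one prompt string."""
    parts = [instruction.strip(), ""]
    for i, (original, rewritten) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f'Original: "{original}"', f'{out_label}: "{rewritten}"', ""]
    parts += ["Now rephrase the following message:", f'"{query}"']
    return "\n".join(parts)


examples = [
    ("Why don't you just shut up already?", "I'd prefer if we paused this conversation for now."),
    ("People like you are the worst!", "I strongly disagree with your opinion."),
]
prompt = build_few_shot_prompt(
    "You are a respectful communication assistant. Rephrase the given message "
    "to remove any harassing, aggressive, or offensive language.",
    examples,
    "This {idea} is garbage",  # braces in user input are safe: the result never goes through .format()
)
print(prompt)
```

Keeping examples as data makes it easy to add or rotate them. A practical caveat also applies to the `.format`-based templates: any literal braces written into a template itself (for example, a JSON sample) must be doubled as `{{` and `}}`. User input passed as a `.format` argument is substituted as-is and needs no escaping.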

# 📄 Organisation Guidelines for Risk Scoring

To ensure that our content moderation system makes responsible and accurate decisions, we define a structured set of organizational rules. Each rule includes:

- A **Title** describing the context of the guideline.
- Clear **examples** of text patterns or phrases that match the rule.
- A breakdown of **Risk Boost levels** (Very Low to Very High) based on severity, urgency, or potential harm.

These rules act as domain knowledge input to our AI system, helping us compute a **risk score** for each post. This score is later used to determine moderation actions like escalation, blocking, or further review.

Below is the `OrganisationGuidelinesDocument` class that holds all these rules as raw text for downstream use in the risk computation engine. These can be queried or embedded for use in LLM prompting or similarity search.
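In the notebook, these rules are embedded and stored in ChromaDB, so the most relevant ones can be retrieved for each post. The core of that vector search is nearest-neighbour ranking by cosine similarity. The sketch below illustrates the ranking with a toy bag-of-words "embedding" so that it runs without an API key. This is a deliberate stand-in: a real setup would use Gemini embedding vectors (wrapped in a ChromaDB `EmbeddingFunction`) rather than word counts. The guideline snippets are shortened keyword paraphrases of the rules below, and `embed`, `cosine`, and `top_k` are hypothetical helpers.

```python
import math
import re
from collections import Counter

GUIDELINES = {
    "Urgent Life-Threatening Requests": "emergency help me immediate danger accident injured critical",
    "Promotional Content": "limited time offer free buy now save big discount click here",
    "Threatening Words": "you will regret this you will pay for this threat",
    "Aggressive Tone": "shut up leave me alone shut your mouth i cannot stand you",
}


def embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k guideline titles most similar to the query, best first."""
    q = embed(query)
    scored = [(title, cosine(q, embed(body))) for title, body in GUIDELINES.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


print(top_k("Help me, there was an accident and my friend is injured!"))
```

Only the top-ranked rules are passed into the `{}` slot for Organizational Standards in the categorization prompt. This keeps the prompt short while grounding the risk score in the most relevant guideline.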


In [7]:
#Organisation Guidelines for Risk Scoring
class OrganisationGuidelinesDocument:
  ORG_GUIDELINES_FOR_RISK =[
  """
  Rule Title : Urgent Requests or Situations
  Example: Text that includes urgent phrases like “emergency,” “urgent,” “help me,” “I need assistance immediately,” “something happened,” “accident,” “crisis,” “call 911,” etc.
  Risk Boost:
    1.High Risk: If critical emergency terms are found (e.g., “emergency,” “urgent,” “accident”), and the person is unable to receive assistance or is in immediate danger, set the Risk between 0.8 and 1.0.
    2.Moderate Risk: If the person is in an urgent situation but help is available or manageable (e.g., “call 911” or “help is on the way”), set the Risk between 0.5 and 0.7, adjusting based on the level of urgency and available support.
    3.Low Risk: If the situation is urgent but the individual is either able to manage it or if the urgency is somewhat less critical (e.g., general urgent inquiries), adjust the Risk score between 0.1 and 0.4.
  """,
  """
  Rule Title : Urgent Life-Threatening Requests
  Example: Text that includes terms like “emergency,” “help me,” “immediate danger,” “accident,” “critical,” “injured,” etc.
  Risk Boost:
    1.Very High Risk (0.9 - 1.0): The text indicates imminent danger or severe distress, requiring urgent intervention (e.g., “My friend is injured, I need help right now, they’re in critical condition!”).
    2.High Risk (0.7 - 0.8): The situation is urgent, and immediate action is needed, but some support is on the way or can be arranged (e.g., “My friend’s in an accident, I’ve called for help, but they need medical attention urgently!”).
    3.Moderate Risk (0.5 - 0.6): The urgency is clear, but the situation can likely be managed by the individual or by receiving help soon (e.g., “My friend’s in danger, we need to act fast to help them!”).
    4.Low Risk (0.3 - 0.4): The situation is time-sensitive, but not life-threatening or critical (e.g., “My friend needs help immediately, but the situation isn’t too dangerous, just urgent”).
    5.Very Low Risk (0.1 - 0.2): The urgency is mild, and while help is requested, the situation is not critical or dangerous (e.g., “Hey, my friend’s in a tough spot, can someone help ASAP?”).
  """,
  """
  Rule Title : Non-Life-threatening Requests
  Example: Text that includes terms like “urgent,” “blocked,” “immediate help,” “account,” “access issue,” “hacked,” etc., where the urgency indicates a non-critical situation and no offensive or inappropriate language is used.
  Risk Boost:
    1.Very High Risk (0.9 - 1.0): Text indicates a high level of urgency and distress over a non-life-threatening issue (e.g., “My friend’s account has been blocked, and they need it unlocked immediately because it’s preventing them from working!”).
    2.High Risk (0.7 - 0.8): The request expresses urgency about a non-life-threatening issue, but the situation can be resolved soon with help (e.g., “My friend’s account is blocked, can someone help fix this ASAP?”).
    3.Moderate Risk (0.5 - 0.6): The urgency is implied but not critical, and the person can likely resolve it soon (e.g., “My friend’s account has been blocked, but I think it’s not a big deal, just need some help”).
    4.Low Risk (0.3 - 0.4): The urgency is mild or non-urgent (e.g., “My friend’s account is blocked, but I’m sure it will be sorted soon”).
    5.Very Low Risk (0.1 - 0.2): The request is an inquiry or casual mention with no significant urgency (e.g., “Hey, my friend’s account was blocked. Any idea how to fix it?”).
  """,
  """
  Rule Title : Promotional Content
  Example: Text containing terms like “limited time offer,” “free,” “buy now,” “save big,” “discount,” “click here”.
  Risk Boost:
    1.High Risk: If the promotional content appears potentially misleading, manipulative, or suspicious (e.g., aggressive sales tactics or offers that promise unrealistic results), set the Risk between 0.7 and 1.0.
    2.Moderate Risk: If the promotional content seems suspicious, such as offers that feel a bit too good to be true but are not overtly misleading (e.g., some ambiguous or vague claims like “limited time offer” with no clear details), set the Risk between 0.4 and 0.6.
    3.Low Risk: If the promotional content appears legitimate, such as clear and honest offers (e.g., genuine discounts or sales), set the Risk between 0.1 and 0.3.
  """,
  """
  Rule Title : Repetitive Phrases
  Example: If the text contains repetitive phrases or call-to-action commands such as “hurry up,” “don’t miss out,” “act now”, “order today”.
  Risk Boost:
    1.High Risk: If the repetitive phrases are identified as manipulative, spammy, or overly aggressive (e.g., excessive use of phrases like “hurry up” or “act now” with the intent to pressure or manipulate), set the Risk between 0.7 and 1.0.
    2.Moderate Risk: If the repetitive phrases seem mildly persuasive or slightly manipulative but not overly aggressive (e.g., moderate use of phrases like “don’t miss out” or “order today” in promotional content), set the Risk between 0.4 and 0.6.
    3.Low Risk: If the repetitive phrases are used in a non-intrusive or legitimate way (e.g., occasional reminders or general calls to action), set the Risk between 0.1 and 0.3.
  """,
  """
  Rule Title : Links and Advertising
  Example: Text with links, advertisements, or unsolicited promotional content.
  Risk Boost:
    1.High Risk: If the text contains unsolicited, aggressive, or suspicious promotional content or external links (e.g., spam, excessive advertising, or links leading to questionable sites), set the Risk between 0.7 and 1.0.
    2.Moderate Risk: If the links or ads are somewhat promotional or unsolicited but not overly aggressive (e.g., links to promotional content from a trusted company), set the Risk between 0.4 and 0.6.
    3.Low Risk: If the text contains legitimate, non-intrusive, and relevant links or advertisements (e.g., trusted sources, non-spammy offers), set the Risk between 0.1 and 0.3.
  """,
  """
  Rule Title: Harassment Through Insults
  Example: Text that contains repeated or targeted insults, such as “stupid,” “idiot,” “useless,” “pathetic,” or “awful person,” especially when aimed at demeaning or belittling an individual over time.
  Risk Boost:
    1.High Risk: If the text contains persistent or aggressive insults, or a combination of insults that seem aimed at harassing or degrading the individual, set the Risk between 0.7 and 1.0.
    2.Moderate Risk: If the text contains moderate insults that could be perceived as part of a pattern of harassment but are not excessively harsh, set the Risk between 0.4 and 0.6.
    3.Low Risk: If the text contains isolated or subtle insults that do not seem to indicate a sustained attack or repeated behavior, set the Risk between 0.1 and 0.3.
  """,
   """
    Rule Title : Insults Directed at an Individual
    Example: Text that contains repeated or targeted insults, such as “stupid,” “idiot,” “useless,” “pathetic,” or “awful person,” especially when aimed at demeaning or belittling an individual over time.
    Risk Boost:
      1. Very High Risk (0.9 - 1.0): Continuous and aggressive insults or bullying directed at an individual, creating a harmful or toxic environment. Eg: "You should just leave. No one wants you here, ever."
      2. High Risk (0.7 - 0.8): Frequent or harsh insults aimed at an individual, though not overtly malicious or bullying in nature. Eg: "You’re such a loser, no one cares about your opinion."
      3. Moderate Risk (0.5 - 0.6): Insults are present but not severe; they may suggest tension or discomfort but are not constant or overly harsh. Eg: "I can’t believe how dumb you are. You’ll never get it."
      4. Low Risk (0.3 - 0.4): Minor insults, typically isolated or not directed at an individual in an aggressive way. Eg: "You should really start listening to people for once."
      5. Very Low Risk (0.1 - 0.2): Insults or negative comments used casually, without targeting or harming anyone intentionally. Eg: "You're always so annoying, but I’ll still help you out."
    """,
  """
  Rule Title : Aggressive Tone
  Example: If the text includes aggressive phrases like “shut up,” “leave me alone,” “shut your mouth,” “I can’t stand you”.
  Risk Boost:
    1.High Risk: If the text contains highly aggressive or confrontational language (e.g., harsh or multiple aggressive phrases), set the Risk between 0.8 and 1.0.
    2.Moderate Risk: If the text has moderate aggression (e.g., multiple phrases or slightly stronger language), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the text includes subtle or mild aggressive phrases (e.g., a single phrase with less intensity), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : Threatening Words
  Example: Text that includes threats in a more subtle form, like “you’ll regret this,” “you’ll pay for this,” or “you’ll wish you hadn’t done that.”
  Risk Boost:
    1.High Risk: If the text strongly implies or suggests harassment or a serious threat (but still not explicit), set the Risk to 0.9.
    2.Moderate Risk: If the threats are more direct or suggest mild harassment without being overtly harmful, set the Risk between 0.5 and 0.7.
    3.Low Risk: If the threatening words are mild or imply harm in a non-confrontational way (e.g., subtle hints with no direct threat), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : Explicit Sexual Content
  Example: Text that includes sexually explicit content or adult material, such as terms like “sex,” “sexual act,” “porn,” “adult film,” “adult content,” or “erotic.”
  Risk Boost:
    1.High Risk: If the content contains clear, explicit sexual terms or graphic adult material, set the Risk to 1.0.
    2.Moderate Risk: If the content includes somewhat explicit references or alludes to adult material in a non-graphic manner (e.g., vague or indirect references), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the content contains mild references or euphemisms related to sexual themes without explicit details (e.g., casual mentions), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : Sexual Implications or Innuendo
  Example: Text that includes sexual innuendo or implied sexual references like “hookup,” “make love,” “get intimate,” “one-night stand,” or “flirt,” as well as indirect references to sexual organs, is considered Adult category.
  Risk Boost:
    1.High Risk: If the content explicitly refers to sexual activity or intimate situations, or clearly represents a sexual act, the Risk should be set between 0.8 and 1.0.
    2.Moderate Risk: If the content includes implied or indirect sexual activity or intimacy, or indirectly suggests sexual acts or encounters, set the Risk between 0.5 and 0.7.
    3.Low Risk: If the content contains casual or light sexual implications, flirtatious comments, or subtle references that could be interpreted as suggestive but are not directly explicit (such as mild flirting or indirect allusions to sexual organs), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : Adult Conversations or Topics
  Example: Text discussing adult relationships, marital issues, or extramarital affairs in a sexual context, such as “affair,” “cheating,” “unfaithful,” or “love affair.”
  Risk Boost:
    1.High Risk: If the content involves explicit discussions of infidelity, sexual relationships, or sexual undertones (e.g., explicit mention of extramarital affairs with sexual connotations), set the Risk to 0.9.
    2.Moderate Risk: If the text discusses adult relationships with mild or implied sexual references (e.g., general discussion of marital issues or infidelity without overt sexual content), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the text discusses adult topics or relationships in a neutral, non-sexual context (e.g., casual mention of relationships or marriage without explicit or sexual undertones), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : PII (Personally Identifiable Information)
  Example: Text containing keywords like “social security number,” “bank account,” “address,” “phone number,” “email address,” or “driver's license.”
  Risk Boost:
    1.High Risk: If explicit, clear references to sensitive PII (e.g., full social security numbers, bank account details, unmasked addresses, etc.) are found, set the Risk to 1.0.
    2.Moderate Risk: If the text includes partial or obscured references to PII, such as a phone number with dashes or a partially redacted address (e.g., “1234**** street”), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the text contains any indirect references to PII, such as vague or non-specific terms related to personal information (e.g., “contact info,” “personal details”), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : Financial or Legal Information
  Example: Text that includes “account number,” “balance,” “transaction,” “payment,” “legal case,” or anything involving money or legal proceedings.
  Risk Boost:
    1.High Risk: If the text includes explicit financial or legal information that could be harmful, such as full account numbers, legal proceedings details, or confidential financial data (e.g., “my account number is 123-456-789”), set the Risk to 0.9 or higher.
    2.Moderate Risk: If the text includes references to financial or legal topics that appear more specific or potentially sensitive but not clearly malicious (e.g., “my bank account is being reviewed” or “a legal case is pending”), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the text mentions financial or legal terms in a casual or non-sensitive context, such as a discussion about a transaction or payment in a general context (e.g., “I made a payment today”), set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title : Medical Information
  Example: Text containing terms like “diagnosis,” “medication,” “treatment,” or “surgery” that could be considered sensitive, especially when providing advice or instructions that should not be blindly followed.
  Risk Boost:
    1.High Risk: If the text involves sensitive or explicit medical information, such as diagnoses, instructions, or advice that could be harmful if blindly followed (e.g., “Take this medication without consulting a doctor” or “Here’s a treatment plan for your condition”), set the Risk to 0.9 or higher. Ensure that such content is flagged for caution.
    2.Moderate Risk: If the text contains more specific details related to medical conditions or treatments, but not with explicit instructions or harmful advice (e.g., “I’m on medication for my condition”), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the medical information is shared in a general or non-specific context, with no explicit guidance or advice (e.g., “I went to the doctor” or “I’m considering a treatment”), set the Risk score between 0.1 and 0.4.
  """,
  """
  Rule Title : Explicit Threats
  Example: Text containing explicit threats such as “I will kill you,” “I will harm you,” “I’m going to hurt you,” “I have planted a bomb in the stadium,” or “Bomb will blast in 2 min,” as well as text about conducting or supporting terror attacks.
  Risk Boost:
    1.High Risk: If the text contains clear, direct, and explicit threats of harm, such as “I will kill you” or “I’m going to hurt you,” set the Risk to 1.0. This should be flagged as a severe threat that requires immediate attention.
    2.Moderate Risk: If the threat is less direct but still implies harm (e.g., “You’re going to regret this” or “I’ll make you pay for this”), set the Risk between 0.7 and 0.9 based on the context and level of implied danger.
    3.Low Risk: If the text contains ambiguous or less clear threats that don’t convey immediate danger (e.g., “I’ll deal with you later” or “You’ll regret this”), set the Risk between 0.3 and 0.6 depending on the language and context.
  """,
  """
  Rule Title : Implicit Threats
  Example: Text that subtly threatens harm, like “You’ll regret this,” “You’ll pay for this,” or “You don’t know who you’re dealing with.”
  Risk Boost:
    1.High Risk: If the text subtly implies harm with strong language or context (e.g., “You’ll pay for this” or “You don’t know who you’re dealing with”), set the Risk to 0.7 to 0.8. This should be flagged as a serious potential threat.
    2.Moderate Risk: If the text contains less direct but still concerning phrases (e.g., “You’ll regret this”), set the Risk to 0.4 to 0.6 based on the severity of the language used and the potential for harm.
    3.Low Risk: If the text contains mild or unclear hints of a threat (e.g., “You’ll see what happens” or “This won’t be forgotten”), set the Risk to 0.1 to 0.3. These should be considered for review but may not indicate a high level of threat.
  """,
  """
  Rule Title : Neutral and Non-sensitive Content
  Example: Text without any emotionally charged language or sensitive information. This includes casual conversations, neutral statements, and general inquiries (e.g., “How are you?”, “What time is the meeting?”, “Let’s have lunch tomorrow”).
  Risk Boost:
    1.High Risk: For entirely neutral, non-sensitive content with no emotionally charged or harmful language, set the Risk to 1.0. This content can be classified as completely safe and non-urgent.
    2.Moderate Risk: In rare cases where the tone or language could be perceived as slightly emotional but still neutral (e.g., light humor or excitement), set the Risk to 0.7 to 0.9. This is still considered low-risk, but it may require closer attention.
    3.Low Risk: If the content is borderline neutral and contains minor emotional cues that could be misinterpreted, set the Risk to 0.4 to 0.6 for further review, but it should still be flagged as safe unless stronger indicators arise.
  """,
  """
  Rule Title: Offensive Language
  Example: Text containing profanity or bad words in a casual or non-threatening context, such as “What the hell,” “damn it,” “screw this,” “fuck it, what can i do now,” or “oh shit, how can i do this.”
  Risk Boost:
    1.High Risk: If bad words are used in a highly inappropriate or aggressive way that still doesn’t escalate to harassment or abuse, set the Risk between 0.8 and 1.0; a statement built around bad words should be rated at least 0.8.
    2.Moderate Risk: If bad words are used in a frustrated but non-aggressive context (e.g., slight irritation), set the Risk between 0.5 and 0.7.
    3.Low Risk: If the language is used mildly and does not escalate to any form of insult or aggression, set the Risk between 0.1 and 0.4.
  """,
  """
  Rule Title: Others
  Examples & Actions:
    i. If Category is in [EMERGENCY, THREAT] → Use your best judgment to make decisions freely based on your understanding.
    ii. If Category is in [SPAM, HARASSMENT, ADULTERY, SENSITIVE, NORMAL] → Be cautious; avoid false alerts and unnecessary flags.
  Risk Boost: Adjust the Risk score dynamically between 0.1 to 1.0 based on your level of certainty and contextual understanding.
  """
  ]
  ORG_GUIDELINES_FOR_ACTIONS =[
      """
      ACTION NAME : Critical / Block for Review
      - Block the user pending review: Block the user until the review team clears them. Set `is_blocked_for_review = true` and escalate to the review team to prevent the user from posting further until clearance is given. This action is for high-confidence cases, typically when the post violates community guidelines but requires review before deciding on further action.
        - ***Conditions:*** Block for review only if the Risk_scale is **0.9 or higher**, except in the case of the **LIFE_THREATENING_EMERGENCY** category.
      """,
      """
      ACTION NAME : Temporary Block with Review
      - Temporary block with review team involvement: Block the user temporarily while still involving the review team; the temporary block must not exceed 10 minutes. Set `is_blocked = true` and `is_blocked_for_review = false` with the duration and timestamp, and escalate to the review team for further review and actions.
        - ***Conditions:*** Used when the Risk_scale = 0.8, and the situation requires additional review.
      """,
      """
      ACTION NAME : Temporary Block without Review
      - Temporary block without review: Block the user temporarily for a set duration without involving the review team; the temporary block must not exceed 10 minutes. Set `is_blocked = true` with the appropriate duration and timestamp.
        - ***Conditions:*** If the Risk_scale = 0.7 and the post is not in the **LIFE_THREATENING_EMERGENCY** or **PROMOTIONAL_CONTENT** categories, apply this action. This is a lower-risk block, but it may still need to be escalated later.
      """,
      """
      ACTION NAME : Non-Blocking only Escalation
      - Escalate without blocking: If blocking isn’t necessary, escalate the issue directly to the review team, emergency team, or security team without applying any block. This action is for situations where blocking the user is not required but the issue needs to be addressed by another team; select the appropriate team (review, emergency, or security) based on the context.
        - ***Conditions:*** Use this action when escalation alone is sufficient, selecting the most suitable team for the situation.
      """
  ]
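The action guidelines above boil down to a few risk thresholds. The sketch below encodes them as a deterministic helper that could serve as a sanity check on the model's chosen actions. It is hypothetical and not part of the notebook: `plan_action` and `ActionPlan` are illustrative names, each "Risk_scale = X" condition is read as the band from X up to the next threshold, the 10-minute cap is used as the temporary block length, and life-threatening emergencies are assumed to be escalated rather than blocked.

```python
from dataclasses import dataclass

# Categories the guidelines above name as exceptions to temporary blocking.
NO_TEMP_BLOCK = {"LIFE_THREATENING_EMERGENCY", "PROMOTIONAL_CONTENT"}
MAX_TEMP_BLOCK_MINUTES = 10  # "temp block should not exceed 10 min"


@dataclass
class ActionPlan:
    action: str
    is_blocked: bool
    is_blocked_for_review: bool
    block_minutes: int  # 0 means no timed block


def plan_action(risk_scale: float, category: str) -> ActionPlan:
    """Hypothetical deterministic reading of ORG_GUIDELINES_FOR_ACTIONS."""
    if category == "LIFE_THREATENING_EMERGENCY":
        # Assumption: emergencies are escalated, never blocked.
        return ActionPlan("Non-Blocking only Escalation", False, False, 0)
    if risk_scale >= 0.9:
        # Block until the review team clears the user.
        return ActionPlan("Critical / Block for Review", True, True, 0)
    if risk_scale >= 0.8:
        return ActionPlan("Temporary Block with Review", True, False, MAX_TEMP_BLOCK_MINUTES)
    if risk_scale >= 0.7 and category not in NO_TEMP_BLOCK:
        return ActionPlan("Temporary Block without Review", True, False, MAX_TEMP_BLOCK_MINUTES)
    # The guidelines list no lower action; whether to escalate at all is left to the model.
    return ActionPlan("Non-Blocking only Escalation", False, False, 0)


print(plan_action(0.95, "THREAT").action)               # Critical / Block for Review
print(plan_action(0.7, "PROMOTIONAL_CONTENT").action)   # Non-Blocking only Escalation
```

Keeping such a rule table next to the RAG-driven decision makes it easy to spot when the model's tool calls drift from the written thresholds.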

# 🔧 Gemini Embedding Function for Vectorization
* This class defines a custom embedding function using Google Gemini's text-embedding-004 model, which transforms input text into high-dimensional vectors. These embeddings are essential for performing semantic search and similarity comparisons within our moderation system.
* Supports both document and query embedding modes, selected by the document_mode flag, using the retrieval_document and retrieval_query task types.
* Integrates with ChromaDB for efficient storage and retrieval of vectorized content.
* Includes retry logic to handle temporary API failures gracefully.
* This component powers the text understanding layer of our system, enabling downstream modules like RAG and content classification to work with vectorized representations of social media posts.
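The retry behaviour in the cell below comes from `google.api_core.retry.Retry` with an `is_retriable` predicate (not shown in this section). To illustrate the idea (retry only the errors the predicate accepts, waiting longer between attempts), here is a plain-Python sketch of retry with exponential backoff and jitter. `retry_on`, `TransientError`, and `flaky_embed` are hypothetical names, and this is not the library's implementation.

```python
import random
import time
from functools import wraps


def retry_on(predicate, attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry a call while predicate(exc) is True, backing off exponentially with jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Give up on non-retriable errors or after the last attempt.
                    if not predicate(exc) or attempt == attempts - 1:
                        raise
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
        return wrapper
    return decorator


# Usage: a hypothetical embedding call that fails twice with a transient error.
class TransientError(Exception):
    pass


attempts_made = []


@retry_on(lambda exc: isinstance(exc, TransientError), sleep=lambda seconds: None)
def flaky_embed():
    attempts_made.append(1)
    if len(attempts_made) < 3:
        raise TransientError("503 Service Unavailable")
    return [0.1, 0.2]


print(flaky_embed(), len(attempts_made))
# prints: [0.1, 0.2] 3
```

Passing `sleep` as a parameter keeps the example fast; in real use the default `time.sleep` applies the backoff delays.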

In [8]:
#Gemini Embedding Function for Vectorization
class GeminiEmbeddingFunction(EmbeddingFunction):
    document_mode = True
    def __init__(self):
      print("Initializing Gemini embedding function")

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]
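Once the guidelines are stored as document embeddings, retrieving the most relevant ones for a post is a nearest-neighbour search over vectors. The sketch below shows that ranking step with cosine similarity (one common measure) in plain Python. The 3-dimensional vectors and guideline titles are invented for illustration; real text-embedding-004 vectors are much larger and are ranked inside ChromaDB using the collection's configured distance. In the notebook, the query side presumably sets `document_mode = False` before calling the collection's `query` method, so the post is embedded with the retrieval_query task type.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: List[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k stored vectors most similar to the query."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine_similarity(query, docs[i]),
                    reverse=True)
    return ranked[:k]


# Made-up vectors standing in for embedded guideline documents.
titles = ["Explicit Threats", "Medical Information", "Implicit Threats"]
doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vector = [0.9, 0.1, 0.0]  # stands in for an embedded user post

print([titles[i] for i in top_k(query_vector, doc_vectors, k=2)])
# prints: ['Explicit Threats', 'Implicit Threats']
```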

# 🗂️ Local Database Service for Post Management and Moderation Logs
This cell defines a SQLite-based storage layer that acts as the persistent memory of our AI moderation system. It records user posts, moderation decisions, and visibility status so they can be audited and reviewed later. Note that create_all_required_tables drops and recreates every table, so each new DatabaseService instance starts from an empty database.

✅ Key Highlights:

* DatabaseService class handles:
  * Post ingestion with metadata like category, sentiment, tone, risk score, etc.
  * Moderation actions: blocking users, escalating posts, assigning visibility.
  * Structured tables: user_posts, blocked_user, review_posts, post_visibility_status.
  * Hashtag standardization and sanitization tracking.
* PrintDataFromDataBase class:
  * Provides a formatted feed view with emojis, labels, and escalation highlights.
  * Hides blocked posts, flags posts awaiting review, and shows whether each post is original, sanitized, or normalized, along with any escalation priority.

This module ensures the system retains a traceable history of every user interaction and moderation action, supporting later analysis, transparency, and compliance.
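DatabaseService.auto_hashtag (in the cell below) prefixes each comma-separated tag with `#` and replaces spaces with underscores. The sketch below is a hypothetical variant, `normalize_hashtags`, that applies the same rules but also skips empty entries and duplicates; for input such as "a,,b" the notebook version would emit a bare "#" for the empty entry.

```python
from typing import List


def normalize_hashtags(raw: str) -> str:
    """Hypothetical variant of DatabaseService.auto_hashtag that skips empty and duplicate tags."""
    if not raw:
        return raw
    tags: List[str] = []
    for word in raw.split(","):
        tag = word.strip()
        if not tag:
            continue  # auto_hashtag would emit a bare "#" here
        if not tag.startswith("#"):
            tag = f"#{tag}"
        tag = tag.replace(" ", "_")  # multi-word tags: spaces become underscores
        if tag not in tags:
            tags.append(tag)  # keep the first occurrence, preserve order
    return ",".join(tags)


print(normalize_hashtags("tech news, #AI, machine learning, #AI"))
# prints: #tech_news,#AI,#machine_learning
```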

In [9]:
#Local Database Service for Post Management and Moderation Logs
class DatabaseService:
  def __init__(self):
    print("Initializing the database service")
    self.db_file = DataBaseName.get_database_name()
    self.create_all_required_tables()

  def create_all_required_tables(self):
    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()
    cursor.execute("DROP TABLE IF EXISTS user_posts")
    cursor.execute("DROP TABLE IF EXISTS blocked_user")
    cursor.execute("DROP TABLE IF EXISTS review_posts")
    cursor.execute("DROP TABLE IF EXISTS blocked_post")
    cursor.execute("DROP TABLE IF EXISTS post_visibility_status")
    cursor.execute("""
    CREATE TABLE user_posts (
        post_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_name VARCHAR(100),
        post_txt TEXT,
        category VARCHAR(50),
        risk_scale DECIMAL(3,2),
        organization_standards_applied VARCHAR(255),
        organization_standards_applied_desc TEXT,
        post_tone VARCHAR(50),
        post_sentiment VARCHAR(50),
        auto_hash_tag VARCHAR(255),
        sanitized_post TEXT,
        normalized_post TEXT,
        actions_taken TEXT,
        final_response_on_post_submission TEXT,
        any_errors_while_executing_actions BOOLEAN
    );
    """)
    cursor.execute("""
    CREATE TABLE blocked_user (
      user_name VARCHAR(100) PRIMARY KEY,
      is_blocked_for_review BOOLEAN DEFAULT 0,
      is_blocked BOOLEAN DEFAULT 0,
      block_duration INTEGER,
      block_timestamp DATETIME
    ); """)
    cursor.execute("""
    CREATE TABLE review_posts (
      post_id INTEGER PRIMARY KEY,
      escalate_to_security_team BOOLEAN,
      escalate_to_emergency_services BOOLEAN,
      escalate_to_review_team BOOLEAN,
      priority VARCHAR(50)
    );""")
    cursor.execute("""
    CREATE TABLE post_visibility_status (
      post_id INTEGER PRIMARY KEY,
      is_post_blocked BOOLEAN,
      is_post_waiting_for_clearance BOOLEAN,
      is_post_visible BOOLEAN
    );""")
    print("created required tables - done")
    conn.commit()
    conn.close()

  def insert_user_post_and_get_post_id(self,user_post_data)->int:
    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()
    post_data=json.loads(user_post_data)
    insert_query = """
    INSERT INTO user_posts (
           user_name, post_txt, category, risk_scale,
          organization_standards_applied, organization_standards_applied_desc,
          post_tone, post_sentiment, auto_hash_tag,
          sanitized_post, normalized_post
      ) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """

    cursor.execute(insert_query, (
            post_data["user_name"],post_data["post_txt"],
            post_data["category"],post_data["risk_scale"],post_data["organization_standards_applied"],
            post_data["organization_standards_applied_desc"],post_data["post_tone"],post_data["post_sentiment"],
            self.auto_hashtag(post_data["auto_hash_tag"]),post_data["sanitized_post"],post_data["normalized_post"]
        ))

    post_id = cursor.lastrowid
    conn.commit()
    conn.close()
    return post_id

  def block_user(
      self,
      user_name: str,
      block_duration: int,
      is_blocked_for_review:bool):
    """
    Block a user, inserting or updating their row in blocked_user.
    Args:
        user_name:string -> user name of the user
        block_duration:integer -> Number of minutes to block the user
        is_blocked_for_review:boolean -> True if escalated to the review team; the user stays blocked until the review team clears them.

    Returns:
        None
    """

    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()

    cursor.execute("""
    INSERT INTO blocked_user (user_name, is_blocked_for_review, is_blocked, block_duration, block_timestamp)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT(user_name) DO UPDATE SET
        is_blocked_for_review = excluded.is_blocked_for_review,
        is_blocked = excluded.is_blocked,
        block_duration = excluded.block_duration,
        block_timestamp = excluded.block_timestamp
    """, (
        user_name,
        is_blocked_for_review,
        True,
        block_duration,
        datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    ))
    conn.commit()
    conn.close()

  def add_review_entry(
      self,
      post_id:int,
      escalate_to_security_team:bool,
      escalate_to_emergency_services:bool,
      escalate_to_review_team:bool,
      priority:str):
    """
    Escalate a post to one or more teams.
    Args:
        post_id:integer -> post id of the user post
        escalate_to_security_team:boolean -> True to escalate to the security team for app access issues and other security concerns in the application, else False
        escalate_to_emergency_services:boolean -> True to escalate to emergency services for quick help in emergency situations, else False
        escalate_to_review_team:boolean -> True to escalate to the review team for human review of the user post and other user concerns about the application, else False
        priority:string -> priority of the review, one of [LOW, MEDIUM, HIGH]

    Returns:
        None
    """
    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()

    cursor.execute("""
    INSERT INTO review_posts (
        post_id,
        escalate_to_security_team,
        escalate_to_emergency_services,
        escalate_to_review_team,
        priority
    ) VALUES (?, ?, ?, ?, ?)
    """, (post_id, escalate_to_security_team, escalate_to_emergency_services, escalate_to_review_team,priority))

    conn.commit()
    conn.close()

  def insert_post_visibility_status(self,
                                    post_id:int,
                                    is_post_blocked:bool,
                                    is_post_waiting_for_clearance:bool,
                                    is_post_visible:bool):
    """
    Record the post's visibility status according to the actions applied.
    Args:
        post_id:integer -> post id of the user post
        is_post_blocked:boolean -> True if the post has to be blocked, else False
        is_post_waiting_for_clearance:boolean -> True if the post is not blocked but is waiting for clearance, else False
        is_post_visible:boolean -> True if the post is allowed to be visible, else False

    Returns:
        None
    """
    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()

    cursor.execute("""
            INSERT INTO post_visibility_status (
                post_id,
                is_post_blocked,
                is_post_waiting_for_clearance,
                is_post_visible
            )
            VALUES (?, ?, ?, ?)
        """, (post_id, is_post_blocked, is_post_waiting_for_clearance, is_post_visible))

    conn.commit()
    conn.close()


  def is_user_allowed_to_post(self,user_name):
      conn = sqlite3.connect(self.db_file)
      cursor = conn.cursor()

      cursor.execute("""
          SELECT is_blocked_for_review, block_duration, block_timestamp
          FROM blocked_user
          WHERE user_name = ?
      """, (user_name,))

      row = cursor.fetchone()

      if not row:
          # No block entry exists, allow post
          conn.close()
          return True

      is_blocked_for_review, block_duration, block_timestamp = row

      if not is_blocked_for_review:
          # Check if block duration has expired
          if block_timestamp is None or block_duration is None:
              conn.close()
              return False  # Can't validate expiration
          block_time = datetime.fromisoformat(block_timestamp)
          now = datetime.now()
          if (now - block_time).total_seconds() > block_duration * 60:
              # Unblock the user by removing the record
              cursor.execute("DELETE FROM blocked_user WHERE user_name = ?", (user_name,))
              conn.commit()
              conn.close()
              return True

      conn.close()
      return False
  def update_actions_taken(self,post_id, actions):
    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()
    cursor.execute("""
            UPDATE user_posts
            SET actions_taken = ?
            WHERE post_id = ?
        """, (actions, post_id))
    conn.commit()
    conn.close()


  def update_final_response_to_user_on_post_submission(self,post_id, user_response, any_errors):
    conn = sqlite3.connect(self.db_file)
    cursor = conn.cursor()
    cursor.execute("""
            UPDATE user_posts
            SET final_response_on_post_submission = ?, any_errors_while_executing_actions=?
            WHERE post_id = ?
        """, (user_response,any_errors, post_id))
    conn.commit()
    conn.close()
  def auto_hashtag(self,s):
    if not s:
        return s
    return ','.join(
        (word.strip() if word.strip().startswith('#') else f"#{word.strip()}")
        .replace(' ', '_')
        for word in s.split(',')
    )

class PrintDataFromDataBase:
  db_file = DataBaseName.get_database_name()
  @staticmethod
  def print_post_data():
    conn = sqlite3.connect(PrintDataFromDataBase.db_file)
    cursor = conn.cursor()
    query = """
    with posts as (
    SELECT
        p.post_id as post_id,p.user_name as user_name,
        ('\U0001F4CC CATEGORY:' || p.category || ' & RISK:' || p.risk_scale) as designer_title,
        ('#'|| p.post_id ||' \U0001F60E' || '[' || user_name || ']') as designed_header,
        (CASE
          WHEN pv.is_post_waiting_for_clearance  = 1 THEN
          '\U0001F6AB Post has been waiting for review due to violation of organization policy: ' ||CHAR(10)||CHAR(9) ||
           '\U0001F4E3 ' || p.organization_standards_applied || ' - ' || p.organization_standards_applied_desc
          ELSE COALESCE(NULLIF(sanitized_post, ''),NULLIF(normalized_post, ''),p.post_txt)
        END) AS designed_post_txt,
        (CASE 
        WHEN sanitized_post IS NOT NULL AND sanitized_post != '' THEN '\U0001F6E1\ufe0f Post has been sanitized'
        WHEN normalized_post IS NOT NULL AND normalized_post != '' THEN '\U0001F9EF Post has been normalized'
        ELSE '\U0001F4DC Original post (no sanitization or normalization)'
    END) AS post_status,
        ('\u0023\uFE0F\u20E3' || p.auto_hash_tag) as designed_auto_hash_tag,
        (CASE
          WHEN category = 'LIFE_EMERGENCY' then
             '\u2757 Life Emergency content: Not sanitized to preserve original message for urgent review. Will be taken down if it violates any guidelines.'
          ELSE ''
        END) as non_filter_warnings,
        p.category as category,p.risk_scale as risk_scale,
        pv.is_post_blocked as is_post_blocked,pv.is_post_waiting_for_clearance as is_post_waiting_for_clearance,
        pv.is_post_visible as is_post_visible
    FROM
        user_posts p
    LEFT JOIN
        post_visibility_status pv ON p.post_id = pv.post_id
        where (pv.is_post_blocked is null or pv.is_post_blocked = 0)
    ),
    review_details as(
      SELECT
      post_id,
      '\U0001F6A8' || 'Post escalated to ' ||
      TRIM(
        CASE WHEN escalate_to_review_team = 1 THEN 'Review Team, ' ELSE '' END ||
        CASE WHEN escalate_to_emergency_services = 1 THEN 'Emergency Services, ' ELSE '' END ||
        CASE WHEN escalate_to_security_team = 1 THEN 'Security Team, ' ELSE '' END,
        ', '
      ) ||
      ' with ' || priority || ' priority.' AS escalation_message
    FROM review_posts
    ),
    post_display_data as (
    select
    posts.post_id,designer_title,designed_header,designed_post_txt,post_status,designed_auto_hash_tag,escalation_message,non_filter_warnings
    from posts left join review_details on posts.post_id=review_details.post_id
    )
    select
    (designed_header || CHAR(10) || designer_title || ' ' || post_status || CHAR(10) || CHAR(10) ||
    '\U0001F4A1 Thought Shared:' ||CHAR(10) ||
    CHAR(9) || designed_post_txt || CHAR(10) ||CHAR(10) ||
    designed_auto_hash_tag || CHAR(10) ||
    (CASE
      WHEN COALESCE(escalation_message, '') = '' THEN ''
      ELSE escalation_message || CHAR(10)
    END) ||
    (CASE
      WHEN COALESCE(non_filter_warnings, '') = '' THEN ''
      ELSE non_filter_warnings || CHAR(10)
    END) ||
    '------------------------------------------------------------------------------------------------------------------------------------------------------') as `FEED POSTS`
    from post_display_data
    order by post_id desc
    """
    cursor.execute(query)
    rows = cursor.fetchall()
    column_names = [description[0] for description in cursor.description]
    print(tabulate(rows, headers=column_names))
    conn.close()
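The blocking rules in DatabaseService split into two cases: a review block that only the review team can lift, and a timed block that expires after block_duration minutes. The self-contained sketch below reproduces that upsert-and-expiry logic against an in-memory SQLite database so it can be exercised without the notebook. The standalone function names and the explicit `now` parameter (which replaces `datetime.now()` to make timing testable) are assumptions. The upsert uses SQLite's `ON CONFLICT ... DO UPDATE` clause, which requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """
CREATE TABLE blocked_user (
  user_name VARCHAR(100) PRIMARY KEY,
  is_blocked_for_review BOOLEAN DEFAULT 0,
  is_blocked BOOLEAN DEFAULT 0,
  block_duration INTEGER,
  block_timestamp DATETIME
)"""


def block_user(conn, user_name, minutes, for_review, now):
    """Insert or overwrite the user's block row (same upsert shape as the notebook)."""
    conn.execute("""
        INSERT INTO blocked_user VALUES (?, ?, 1, ?, ?)
        ON CONFLICT(user_name) DO UPDATE SET
          is_blocked_for_review = excluded.is_blocked_for_review,
          is_blocked = excluded.is_blocked,
          block_duration = excluded.block_duration,
          block_timestamp = excluded.block_timestamp
    """, (user_name, for_review, minutes, now.strftime("%Y-%m-%d %H:%M:%S")))


def is_user_allowed_to_post(conn, user_name, now):
    row = conn.execute(
        "SELECT is_blocked_for_review, block_duration, block_timestamp "
        "FROM blocked_user WHERE user_name = ?", (user_name,)).fetchone()
    if row is None:
        return True  # never blocked
    for_review, minutes, stamp = row
    if for_review:
        return False  # only the review team can lift this block
    if minutes is None or stamp is None:
        return False  # cannot check expiry; stay blocked
    if (now - datetime.fromisoformat(stamp)).total_seconds() > minutes * 60:
        conn.execute("DELETE FROM blocked_user WHERE user_name = ?", (user_name,))
        return True  # timed block has expired
    return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
t0 = datetime(2024, 1, 1, 12, 0, 0)
block_user(conn, "ALICE", 10, False, t0)
print(is_user_allowed_to_post(conn, "ALICE", t0 + timedelta(minutes=5)))   # False
print(is_user_allowed_to_post(conn, "ALICE", t0 + timedelta(minutes=11)))  # True
```

Deleting the row on expiry, as the notebook does, means a later check for the same user falls through to the "never blocked" branch.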




# 🚦 MonitorService: Core Moderation Logic and Workflow Engine
The MonitorService class forms the heart of the moderation pipeline in our "Social Media Posts Monitor and Controller" system. It coordinates all core actions including:

* 🔍 Validation – Checks for required fields and formats user posts.
* 🧠 Category Classification – Uses a Gemini-powered model to classify posts by category, tone, sentiment, and risk.
* 🧽 Sanitization and Normalization – Depending on post category, applies content correction for safety and clarity.
* 📊 RAG-based Risk & Action Logic – Integrates organisational guidelines from ChromaDB for context-aware action decisions.
* 🛑 Moderation Action Execution – Calls predefined tools (via function calling) to take structured actions like blocking or escalating posts.
* 📣 Final Review & Feedback – Notifies users about moderation outcomes in a clear and structured response.

The system follows a modular flow driven by LangGraph agents and stateful processing. This class leverages ChromaDB, Gemini’s function calling, and dynamic prompt templates to ensure high-quality, context-sensitive moderation actions.
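The description above mentions LangGraph nodes and stateful processing. The plain-Python sketch below mimics that flow without depending on LangGraph: each node takes and returns a state dict, and `route_after_category` plays the role of a conditional edge (in LangGraph the same function would typically be registered with `StateGraph.add_conditional_edges`). The node names, category sets, and routing policy are hypothetical; the notebook's actual graph wiring is not shown in this section.

```python
from typing import Callable, Dict, List, TypedDict


class PostState(TypedDict, total=False):
    category: str
    risk_scale: float
    trace: List[str]


# Hypothetical routing policy, assumed for illustration only.
SKIP_REWRITE = {"LIFE_THREATENING_EMERGENCY", "NORMAL"}
SANITIZE = {"HARASSMENT", "ADULTERY", "THREAT"}


def route_after_category(state: PostState) -> str:
    """Pick the next node after categorization (acts as a conditional edge)."""
    if state["category"] in SKIP_REWRITE:
        return "take_actions"
    if state["category"] in SANITIZE:
        return "sanitize"
    return "normalize"


def make_node(name: str) -> Callable[[PostState], PostState]:
    """Stand-in node that only records that it ran."""
    def node(state: PostState) -> PostState:
        state.setdefault("trace", []).append(name)
        return state
    return node


NODES: Dict[str, Callable[[PostState], PostState]] = {
    name: make_node(name)
    for name in ("validate", "categorize", "normalize", "sanitize", "take_actions")
}


def run(state: PostState) -> PostState:
    """Walk validate -> categorize -> optional rewrite -> take_actions."""
    for name in ("validate", "categorize"):
        state = NODES[name](state)
    branch = route_after_category(state)
    state = NODES[branch](state)
    if branch != "take_actions":
        state = NODES["take_actions"](state)
    return state


print(run({"category": "HARASSMENT", "risk_scale": 0.8})["trace"])
# prints: ['validate', 'categorize', 'sanitize', 'take_actions']
```

Keeping the routing decision in one pure function makes it easy to unit-test separately from the model calls inside each node.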

In [10]:
#MonitorService: Core Moderation Logic and Workflow Engine
class MonitorService:

  def __init__(self):
    print("Initializing the application")
    self.chroma_client = chromadb.Client()
    self.embed_fn = GeminiEmbeddingFunction()
    self.databaseService = DatabaseService()
    self.vector_db_guidelines_for_risk = self.chroma_client.get_or_create_collection(name="organisation_guidelines_for_risk", embedding_function=self.embed_fn)
    self.vector_db_guidelines_for_actions = self.chroma_client.get_or_create_collection(name="organisation_guidelines_for_actions", embedding_function=self.embed_fn)
    self.refresh_chroma_db_for_rag(OrganisationGuidelinesDocument.ORG_GUIDELINES_FOR_RISK,OrganisationGuidelinesDocument.ORG_GUIDELINES_FOR_ACTIONS)

  def validate_user_post(self,state: ModerationState) -> ModerationState:
    # print("validate_user_post")
    user_post = self.fix_unicode(state["user_post"])
    try:
      user_post_dict = json.loads(user_post)
    except json.JSONDecodeError as exc:
      raise ValueError("Invalid JSON format") from exc
    if not isinstance(user_post_dict, dict):
      raise ValueError("User post must be a JSON object")
    required_keys = ["user_name", "post_txt"]
    for key in required_keys:
      if key not in user_post_dict:
        raise ValueError(f"Missing key: {key}")
      value = user_post_dict[key]
      # Reject None, non-string values, and whitespace-only strings
      if not isinstance(value, str) or value.strip() == "":
        raise ValueError(f"Value for {key} must be a non-empty string")
    user_post_dict["user_name"] = self.fix_unicode(user_post_dict["user_name"].upper().strip())
    user_post_dict["post_txt"] = self.fix_unicode(user_post_dict["post_txt"].strip())
    state["validated_post"] = json.dumps(user_post_dict, indent=4, ensure_ascii=False).strip()
    return state

  def get_post_category(self,state: ModerationState) -> ModerationState:
    # print("get_post_category")
    user_post = self.fix_unicode(state["validated_post"])
    user_post_dict = json.loads(user_post)
    post_input = json.dumps({"post_txt": user_post_dict.get("post_txt")}, indent=4, ensure_ascii=False)
    response = client.models.generate_content(
        model=Models.get_model(),
        config=ModelConfig.get_category_model_config(),
        contents=[Prompts.CATEGORY_ZERO_SHOT_PROMPT.format(self.retrive_guidelines_for_risk(user_post_dict.get("post_txt")),post_input)]
    )
    # Parse the structured (JSON-mode) response once and copy the expected fields
    result = json.loads(self.fix_unicode(response.text))
    for field in ("category", "risk_scale", "organization_standards_applied",
                  "organization_standards_applied_desc", "post_tone",
                  "post_sentiment", "auto_hash_tag"):
      user_post_dict[field] = result[field]
    user_post_dict["normalized_post"] = ""
    user_post_dict["sanitized_post"] = ""

    result_json_str = json.dumps(user_post_dict, indent=4, ensure_ascii=False)
    state["categorized_post"] = result_json_str
    return state

  def normalize_user_post(self,state: ModerationState) -> ModerationState:
    # print("normalize_user_post")
    user_post = self.fix_unicode(state["categorized_post"])
    user_post_dict = json.loads(user_post)
    response = client.models.generate_content(
        model=Models.get_model(),
        config=ModelConfig.get_normalize_model_config(),
        contents=[Prompts.NORMALIZE_FEW_SHOT_PROMOT.format(user_post_dict.get("post_txt"))]
    )
    response_txt = self.fix_unicode(response.text)
    user_post_dict["normalized_post"] = json.loads(response_txt)["normalized_post"]
    result_json_str = json.dumps(user_post_dict, indent=4, ensure_ascii=False)
    state["categorized_post"]=result_json_str
    return state

  def sanitize_user_post(self,state: ModerationState) -> ModerationState:
    # print("sanitize_user_post")
    user_post = self.fix_unicode(state["categorized_post"])
    user_post_dict = json.loads(user_post)
    response = client.models.generate_content(
        model=Models.get_model(),
        config=ModelConfig.get_sanitize_model_config(),
        contents=[Prompts.SANITIZE_FEW_SHOT_PROMPT.format(user_post_dict.get("post_txt"))]
    )
    response_txt = self.fix_unicode(response.text)
    user_post_dict["sanitized_post"] = json.loads(response_txt)["sanitized_post"]
    result_json_str = json.dumps(user_post_dict, indent=4, ensure_ascii=False)
    state["categorized_post"] = result_json_str
    return state

  def take_actions_based_on_analysis(self,state: ModerationState)-> ModerationState:
    # print("take_actions_based_on_analysis")
    user_post = self.fix_unicode(state["categorized_post"])
    user_post_dict = json.loads(user_post)
    post_status=True

    if (self.databaseService.is_user_allowed_to_post(user_post_dict.get("user_name"))) or (user_post_dict.get("category") in(PostAvilableCategory.LIFE_EMERGENCY.value,PostAvilableCategory.PRIORITY_SUPPORT.value)):
      post_id = self.databaseService.insert_user_post_and_get_post_id(user_post)
      user_post_dict["post_id"] = post_id
      if (user_post_dict.get("category") != PostAvilableCategory.NORMAL.value) and (float(user_post_dict.get("risk_scale"))>=0.5):
        action_model_input = json.dumps(user_post_dict, indent=4, ensure_ascii=False)

        #Non Rag based action model
        response = client.models.generate_content(
          model=Models.get_model(),
          config=ModelConfig.get_action_model_config(self.databaseService.add_review_entry, self.databaseService.block_user, self.databaseService.insert_post_visibility_status),
          contents=Prompts.APPLY_ACTIONS_NON_RAG_ZERO_SHOT_PROMPT.format(action_model_input,Prompts.ACTIONS_AVILABLE_ZERO_PROMPT)
        )

        #RAG based action Model
        # response = client.models.generate_content(
        #     model=Models.get_model(),
        #     config=ModelConfig.get_rag_based_action_model_config(self.databaseService.add_review_entry, self.databaseService.insert_post_visibility_status, self.databaseService.block_user,self.retrive_guidelines_for_actions),
        #     contents = Prompts.APPLY_ACTIONS_RAG_ZERO_SHOT_PROMPT.format(action_model_input)
        # )
        response_txt = self.fix_unicode(response.text)
        user_post_dict["actions_taken"] = response_txt
      else:
        user_post_dict["actions_taken"] = "NO ACTIONS APPLIED, IT'S POSTED"
      self.databaseService.update_actions_taken(user_post_dict["post_id"],user_post_dict["actions_taken"])
    else:
      user_post_dict["actions_taken"] = "User is already blocked, either pending review or under a temporary cool-off. You are not allowed to post right now; please try again later."


    result_json_str = json.dumps(user_post_dict, indent=4, ensure_ascii=False)
    state["actioned_post"] = result_json_str
    return state

  def review_action_applied_and_notify_user(self,state: ModerationState)->ModerationState:
    user_post = self.fix_unicode(state["actioned_post"])
    user_post_dict = json.loads(user_post)
    review_model_input = json.dumps(
        {
            "user_name": user_post_dict.get("user_name"),
            "organization_standards_applied": user_post_dict.get("organization_standards_applied"),
            "organization_standards_applied_desc": user_post_dict.get("organization_standards_applied_desc"),
            "actions_taken": user_post_dict.get("actions_taken")
         }, indent=4, ensure_ascii=False)
    response = client.models.generate_content(
        model=Models.get_model(),
        config=ModelConfig.get_review_action_applied_and_notify_user_model_config(),
        contents=[Prompts.REVIEW_ACTIONS_APPLIED_ZERO_SHOT_PROMPT.format(review_model_input)]
    )
    response_txt = self.fix_unicode(response.text)
    user_post_dict["final_response_to_user_on_post_submission"] = json.loads(response_txt)["final_response_to_user_on_post_submission"]
    user_post_dict["any_error_during_actions"] = json.loads(response_txt)["any_error_during_actions"]
    if user_post_dict.get("post_id"):
      self.databaseService.update_final_response_to_user_on_post_submission(user_post_dict["post_id"],user_post_dict["final_response_to_user_on_post_submission"],user_post_dict["any_error_during_actions"])
    result_json_str = json.dumps(user_post_dict, indent=4, ensure_ascii=False)
    state["final_output"] = result_json_str
    return state


  def retrive_guidelines_for_risk(self,topic:str)->str:
    query_text="""Get me the rules related to text : {}""".format(topic)
    result = self.vector_db_guidelines_for_risk.query(query_texts=query_text, n_results=3)
    return "\n".join(result["documents"][0])

  def retrive_guidelines_for_actions(self,topic:str)->str:
    """
    Retrieve relevant organisation guidelines from the RAG vector store.
    Args:
        topic: search string for the vector DB; always include the category and risk_scale, plus any other terms you decide are relevant.
    Returns:
        The relevant guidelines as a single string.
    """
    result = self.vector_db_guidelines_for_actions.query(query_texts=topic, n_results=2)
    return "\n".join(result["documents"][0])

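  def refresh_chroma_db_for_rag(self,org_guidelines_for_risk,org_guidelines_for_actions):
    # Embed the guidelines as documents (not queries) while loading them
    self.embed_fn.document_mode = True
    try:
      # upsert (instead of add) so that re-running a refresh overwrites the existing ids "0", "1", ...
      self.vector_db_guidelines_for_risk.upsert(documents=org_guidelines_for_risk, ids=[str(i) for i in range(len(org_guidelines_for_risk))])
      self.vector_db_guidelines_for_actions.upsert(documents=org_guidelines_for_actions, ids=[str(i) for i in range(len(org_guidelines_for_actions))])
      print("Updated chroma db with latest organisation policy")
    finally:
      # Always switch back to query mode for retrieval, even if loading fails
      self.embed_fn.document_mode = False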

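  def fix_unicode(self,data):
    # Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates);
    # encoding with the default errors="strict" would raise instead of cleaning them.
    return data.encode('utf-8', errors='ignore').decode('utf-8', errors='ignore')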

  def route_after_categorization(self,state: ModerationState) -> str:
    post = json.loads(state["categorized_post"])
    if (post["category"] in (
        PostAvilableCategory.SENSITIVE.value,
        PostAvilableCategory.OFFENSIVE_LANGUAGE.value,
        PostAvilableCategory.PRIORITY_SUPPORT.value,
        PostAvilableCategory.SPAM.value,
        PostAvilableCategory.ADULT.value)) :
        return "Sanitize"
    elif post["category"] in (
        PostAvilableCategory.HARASSMENT.value,
        PostAvilableCategory.THREAT.value
        ) :
        return "Normalize"
    else:
        return "JusticeNode"
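
Both retrive_guidelines_for_risk and retrive_guidelines_for_actions rely on ChromaDB's query(), which embeds the query text and returns the n_results nearest stored guidelines. The sketch below illustrates that top-k retrieval idea in plain Python. The three guidelines and their vectors are made up for illustration (the notebook gets real vectors from its Gemini embedding function), and cosine similarity is an assumption here: the metric ChromaDB actually uses depends on how the collection is configured.

```python
import math

# Hypothetical guidelines with hand-made "embeddings"; real vectors would come
# from an embedding model and live in a vector store such as ChromaDB.
GUIDELINES = {
    "Threats of violence must be escalated for review.": [0.9, 0.1, 0.0],
    "Spam and promotional links should be hidden.": [0.1, 0.9, 0.1],
    "Adult content must be sanitized before posting.": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_guidelines(query_vec, n_results=2):
    """Return the n_results most similar guideline texts, joined by newlines."""
    ranked = sorted(GUIDELINES, key=lambda doc: cosine(query_vec, GUIDELINES[doc]), reverse=True)
    return "\n".join(ranked[:n_results])

print(retrieve_guidelines([0.8, 0.3, 0.0]))
```

As in the notebook's helpers, the matches are joined into one newline-separated string so they can be dropped straight into a prompt.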

# 🛡️ PostModerationEngine: Full Pipeline for Social Media Post Moderation
The PostModerationEngine class is the central engine that drives the moderation workflow in the Social Media Posts Monitor and Controller system. It wires the MonitorService methods into a LangGraph workflow that takes every post from validation through to user notification. Here’s a breakdown of its components and flow:

Service Initialization:
The MonitorService is instantiated to handle the core moderation tasks: validating and categorizing posts, normalizing and sanitizing their content, applying moderation actions, and notifying the user.

Graph-Oriented Workflow:
The moderation flow is structured as a State Graph, where each node represents a critical step in the moderation pipeline:

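* GateKeeper: Parses the post JSON and checks that user_name and post_txt are present and non-empty.
* InsightEngine: Sends the post text, together with RAG-retrieved risk guidelines, to the model, which classifies its category, risk scale, tone, and sentiment and suggests hashtags.
* Sanitize: Uses a few-shot prompt to produce a version of the post text sanitized for sensitive content.
* Normalize: Uses a few-shot prompt to produce a normalized version of the post text, to keep it clear and prevent misunderstanding.
* JusticeNode: Stores the post and, for non-NORMAL posts with a risk scale of 0.5 or higher, lets the model call action tools such as adding a review entry, blocking the user, or setting the post's visibility. Posts from blocked users are rejected unless they are categorized as a life emergency or priority support.
* VoiceBox: Reviews the actions applied and generates the feedback message returned to the user.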
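
The JusticeNode decision in take_actions_based_on_analysis reduces to a small rule. The sketch below isolates it as a pure function so it can be checked without a database or a model call. The category strings are hypothetical stand-ins for the notebook's PostAvilableCategory values, and the returned labels are illustrative names, not values the notebook produces.

```python
from enum import Enum

class Category(str, Enum):
    # Hypothetical values standing in for the notebook's PostAvilableCategory enum
    NORMAL = "NORMAL"
    THREAT = "THREAT"
    LIFE_EMERGENCY = "LIFE_EMERGENCY"
    PRIORITY_SUPPORT = "PRIORITY_SUPPORT"

RISK_THRESHOLD = 0.5  # the cut-off used in take_actions_based_on_analysis

def justice_decision(category: str, risk_scale: float, user_allowed: bool) -> str:
    """Return the branch JusticeNode would take (labels are illustrative)."""
    # Emergency and priority-support posts bypass a user's block
    bypass_block = category in (Category.LIFE_EMERGENCY.value, Category.PRIORITY_SUPPORT.value)
    if not (user_allowed or bypass_block):
        return "REJECTED_USER_BLOCKED"
    # The post is stored; only risky, non-NORMAL posts reach the action model
    if category != Category.NORMAL.value and float(risk_scale) >= RISK_THRESHOLD:
        return "STORE_AND_CALL_ACTION_MODEL"
    return "STORE_AND_PUBLISH"

print(justice_decision("NORMAL", 0.1, True))
```

Keeping the threshold and the bypass list in one place like this makes the policy easy to unit-test separately from the LLM-driven actions.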

Conditional Routing:
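The flow uses conditional edges to pick the next step from the post’s category: SENSITIVE, OFFENSIVE_LANGUAGE, PRIORITY_SUPPORT, SPAM, and ADULT posts go to Sanitize; HARASSMENT and THREAT posts go to Normalize; every other category (for example NORMAL) goes straight to JusticeNode.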
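
The routing above can be sketched without LangGraph as a plain function plus a walk over the fixed edges. This is a minimal illustration, assuming the category strings match the PostAvilableCategory member names (the enum's actual values are not shown in this section); route() mirrors route_after_categorization, and path_for() is a hypothetical helper that lists the nodes a post would visit.

```python
# Hypothetical category strings; the notebook reads them from PostAvilableCategory.
SANITIZE_CATEGORIES = {"SENSITIVE", "OFFENSIVE_LANGUAGE", "PRIORITY_SUPPORT", "SPAM", "ADULT"}
NORMALIZE_CATEGORIES = {"HARASSMENT", "THREAT"}

def route(category: str) -> str:
    """Name of the node that follows InsightEngine for a given category."""
    if category in SANITIZE_CATEGORIES:
        return "Sanitize"
    if category in NORMALIZE_CATEGORIES:
        return "Normalize"
    return "JusticeNode"  # NORMAL, LIFE_EMERGENCY and any other category skip rewriting

# Fixed (unconditional) edges of the compiled graph
EDGES = {"GateKeeper": "InsightEngine", "Sanitize": "JusticeNode",
         "Normalize": "JusticeNode", "JusticeNode": "VoiceBox", "VoiceBox": "END"}

def path_for(category: str) -> list:
    """Follow the edges from the entry point until END is reached."""
    path = ["GateKeeper"]
    while path[-1] != "END":
        node = path[-1]
        path.append(route(category) if node == "InsightEngine" else EDGES[node])
    return path

print(path_for("SPAM"))
```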

Final Output:
The static main() method is the entry point. It takes a post_data JSON string, runs it through the whole pipeline, prints the final feedback message for the user, and then prints the stored feed posts.
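
The contract main() relies on is small: invoke() receives a state dict with a "user_post" key and returns a state whose "final_output" is a JSON string containing "final_response_to_user_on_post_submission". The sketch below exercises that contract against a hypothetical StubModerationApp (not part of the notebook) so the entry-point logic can be tried without a model or database. Because json.JSONDecodeError is a subclass of ValueError, a single except clause covers both malformed JSON and validation errors.

```python
import json

class StubModerationApp:
    """Hypothetical stand-in for the compiled LangGraph app (social_media_monitor_app)."""
    def invoke(self, state: dict) -> dict:
        post = json.loads(state["user_post"])  # raises JSONDecodeError on bad input
        name = post["user_name"].strip().upper()
        final = {"user_name": name,
                 "final_response_to_user_on_post_submission": f"Hey {name}, your post is visible."}
        return {**state, "final_output": json.dumps(final)}

def run_moderation(app, post_data: str) -> str:
    """Invoke the pipeline with the caller's post and return the user-facing message."""
    try:
        result = app.invoke({"user_post": post_data})
        response = json.loads(result["final_output"])
        return response.get("final_response_to_user_on_post_submission", "")
    except ValueError as e:
        return str(e)

post = json.dumps({"user_name": "King", "post_txt": "Hello everyone"})
print(run_moderation(StubModerationApp(), post))
```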

Graph Visualization:
The get_graph() method generates a visual representation of the moderation workflow using Mermaid diagrams, providing an easy-to-understand map of the entire process.
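
draw_mermaid_png() renders an image from Mermaid source text, and LangGraph can also return that text directly via get_graph().draw_mermaid(). The sketch below shows roughly what such a flowchart definition looks like for this pipeline by building it from a hand-written edge list; the START/END node names and the dotted style for conditional edges are illustrative choices, not LangGraph's exact output.

```python
def to_mermaid(edges, conditional):
    """Build Mermaid flowchart text; conditional edges are drawn as dotted arrows."""
    lines = ["flowchart TD"]
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    for src, dst in conditional:
        lines.append(f"    {src} -.-> {dst}")
    return "\n".join(lines)

EDGES = [("START", "GateKeeper"), ("GateKeeper", "InsightEngine"),
         ("Sanitize", "JusticeNode"), ("Normalize", "JusticeNode"),
         ("JusticeNode", "VoiceBox"), ("VoiceBox", "END")]
# The three possible targets of route_after_categorization
CONDITIONAL = [("InsightEngine", "Sanitize"), ("InsightEngine", "Normalize"),
               ("InsightEngine", "JusticeNode")]

print(to_mermaid(EDGES, CONDITIONAL))
```

The printed text can be pasted into any Mermaid renderer, which is handy when an image cannot be generated inside the notebook.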

Keeping each step in its own node makes the pipeline modular: every post follows only the branch its category requires, and new steps can be added as nodes and edges without rewriting the others.

#PostModerationEngine: Full Pipeline for Social Media Post Moderation
class PostModerationEngine:
    service = MonitorService()
    graph = StateGraph(ModerationState)
    
    graph.add_node("GateKeeper", service.validate_user_post)
    graph.add_node("InsightEngine", service.get_post_category)
    graph.add_node("Sanitize", service.sanitize_user_post)
    graph.add_node("Normalize", service.normalize_user_post)
    graph.add_node("JusticeNode", service.take_actions_based_on_analysis)
    graph.add_node("VoiceBox", service.review_action_applied_and_notify_user)
    
    graph.set_entry_point("GateKeeper")
    graph.add_edge("GateKeeper", "InsightEngine")
    graph.add_conditional_edges("InsightEngine", service.route_after_categorization)
    graph.add_edge("Sanitize", "JusticeNode")
    graph.add_edge("Normalize", "JusticeNode")
    graph.add_edge("JusticeNode", "VoiceBox")
    graph.add_edge("VoiceBox", END)
    social_media_monitor_app = graph.compile()

    @staticmethod
    def main(post_data:str):
        try:
          # Run the caller's post through the compiled graph (previously read the global user_post)
          result = PostModerationEngine.social_media_monitor_app.invoke({"user_post": post_data})
          response = json.loads(result["final_output"])
          print("Post Response :\n"+response.get("final_response_to_user_on_post_submission")+"\n\n")
          PrintDataFromDataBase.print_post_data()
        except ValueError as e:
          print(e)
    @staticmethod
    def get_graph():
        from IPython.display import Image, display
        graph = PostModerationEngine.social_media_monitor_app.get_graph()
        display(Image(graph.draw_mermaid_png()))

Initializing the application
Initializing Gemini emdeding function
Initializing the database service
created required tables - done
Updated chroma db with latest organisation policy


# 📝 Testing the PostModerationEngine with a Sample Post
Let's run the PostModerationEngine on a sample post to see how it handles validation, categorization, and taking appropriate actions.

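Sample input:

> {
> "user_name": "King",
> "post_txt": "Hello everyone, I am in india, what a hevenly land"
> }

Sample output:

* Post response: the feedback message generated for the user
* All available feed posts: every post stored in the feed, with its category, risk, and hashtags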
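
Before anything reaches the model, the sample input has to pass GateKeeper's checks. The sketch below is a standalone version of those checks, assuming the same required keys (user_name, post_txt) and error messages as validate_user_post; validate_post is a hypothetical name, and it rejects non-object JSON and non-string values with a ValueError.

```python
import json

REQUIRED_KEYS = ("user_name", "post_txt")

def validate_post(raw: str) -> dict:
    """Parse a post and enforce the GateKeeper rules; return the cleaned dict."""
    try:
        post = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("Invalid JSON format") from None
    if not isinstance(post, dict):
        raise ValueError("Post must be a JSON object")
    for key in REQUIRED_KEYS:
        if key not in post:
            raise ValueError(f"Missing key: {key}")
        value = post[key]
        # Null, non-string and whitespace-only values are all rejected
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"Value for {key} cannot be null or empty")
    post["user_name"] = post["user_name"].strip().upper()  # names are stored upper-case
    post["post_txt"] = post["post_txt"].strip()
    return post

sample = '{"user_name": "King", "post_txt": "Hello everyone, I am in india, what a hevenly land"}'
print(validate_post(sample))
```

This upper-casing is why the sample response and the feed show the user as KING.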

#Testing the PostModerationEngine with a Sample Post
user_post = """
{
  "user_name": "King",
  "post_txt": "Hello everyone, I am in india, what a hevenly land"
}
"""
PostModerationEngine.main(user_post)
# PostModerationEngine.get_graph()

Post Response :
Hey KING, your post is visible. Just a friendly reminder to keep content genuine and avoid anything that might trigger false flags. Let's keep the community vibes positive!


FEED POSTS
------------------------------------------------------------------------------------------------------------------------------------------------------
#1 😎[KING]
📌 CATEGORY:NORMAL & RISK:0.1 📜 Original post (no sanitization or normalization)

💡 Thought Shared:
	Hello everyone, I am in india, what a hevenly land

#️⃣#travel_india,#explore_india,#visit_india
------------------------------------------------------------------------------------------------------------------------------------------------------
