### Linkedin Content Generator


Title: LinkedIn Content Co-Pilot — research-grounded, on-brand post generator with human-in-the-loop approvals, analytics feedback, and safe auto-scheduling.

One-liner: Give it a topic → it researches, drafts multiple post variants (hook/body/CTA/hashtags), cites sources, enforces LinkedIn limits, checks tone & safety, and schedules via Buffer (or exports for manual post). Learns from your past high-performing posts to mimic your voice.

#### High-level architecture (local → cloud)

##### Local dev (fast to demo):

1. Streamlit UI for topic input + preview/approve + export.

2. LangGraph (orchestrates multi-step agent flow).

3. LLMs (Gemini / OpenAI) for writing & editing.

4. Retrieval tools:

   - Web search + page scraping (for facts) → embedded + stored to a small local vector DB (Chroma).

   - Personal “voice profile” built from your past top posts (CSV/Markdown drop-in).

5. Guardrails: moderation (LLM + rules), claim/URL check, plagiarism/dup-content check (embedding similarity to sources and past posts).

6. Outputs: Markdown/JSON/CVS export, optional Buffer API schedule, or Google Sheet handoff.

##### Production (optional):

1. AWS: Step Functions (orchestrate), Lambda (workers), S3 (artifacts), DynamoDB (posts + metrics), Secrets Manager (keys), EventBridge (cron), CloudWatch (logs).

2. Alternative: run daily on GitHub Actions + Render/Fly.io for the Streamlit UI.

#### The LangGraph workflow (agents & tools)

##### 1. Brief Planner

- Input: topic + desired audience + length + tone.

- Output: content brief (angle, 3 key points, working title, target outcomes).

##### 2. Researcher

- Tools: web search + HTML fetcher → chunk → embeddings → top-k evidence.

- Output: structured evidence with URLs + key facts + quotes.

##### 3. Factuality Checker

- Ensures each claim in the brief maps to evidence (or is clearly labeled as opinion).

- Flags weak/unsupported claims to revise.

##### 4. Writer

Drafts 2–3 variants:

  - Variant A: educational mini-essay (≤ 1,300 chars).

  - Variant B: hook-heavy listicle.

  - Variant C: story/anecdote format (optional).

All include: hook, body, CTA, 5–8 hashtags, 2–3 source links.

##### 5. Voice Styler

- Uses your “voice profile” (extracted from past posts: cadence, formality, emoji usage, average sentence length, buzzword tolerance).

- Rewrites to match.

##### 6. Compliance & Safety Gate

- Checks: length limits, profanity/toxicity, sensitive claims without sources, company NDAs, hallucination risk (no source → softened phrasing).

- Suggests fixes automatically.

##### 7. Editor

- Tightens verbs, removes hedging, ensures scannability (short paragraphs, bullets), adds 1 CTA, no more than 1 question, removes filler.

##### 8. Scorer

- Heuristics + small LLM rubric to score: hook strength, specificity, actionability, credibility signals, and skim-ability.

##### 9. A/B Selector + Scheduler

- Picks top 2 variants, staggers publish times (e.g., Tue/Thu 9:15 AM IST) via Buffer API (or exports to CSV/Markdown).

- UTM params for link CTAs.

##### 10. Analytics Feedback (post-hoc loop)

- Ingest likes/comments/impressions (manual CSV upload or Buffer/Shield export).

- Learns what features correlate with higher engagement → updates prompt knobs (e.g., “hooks with numbers outperform”).



#### Data & storage

/data/voice/: your previous posts (CSV/MD).

/data/cache/: scraped pages + chunked embeddings (Chroma).

/artifacts/: generated posts + evidence packs (JSON + MD).

#### Model strategy (practical + impressive)

1. Generation: Gemini 1.5 Flash (fast) or OpenAI GPT-4o-mini; configurable.

2. Editing/Refinement: same model with strict, short system prompts.

3. Retrieval: embeddings (bge/multilingual or OpenAI text-embedding-3-small) in Chroma.

4. Evaluation: small LLM judging + regex/rules for length & links.

5. Prompt Engineering:

   - ReAct style for research → claim-evidence table.

   - “Voice tokens” distilled from your history (e.g., avg sentence length, emoji rate, preferred sign-offs).

   - Style constraints for LinkedIn (line breaks, no hashtags mid-body, CTA at end).

#### Guardrails & compliance

1. LinkedIn API: Prefer official/compliant posting via Buffer/Hootsuite APIs. Avoid browser automation that violates ToS.

2. Attribution: Include 2–3 credible sources with explicit URLs.

3. No unverified stats: If model can’t find a source, soften claim (“In my experience…”).

4. Moderation: run a toxicity/offensive check + a “claims without sources” check.

5. Plagiarism/dup: cosine similarity against sources and your own corpus; flag if too high.

#### Analytics & A/B testing

1. Generate 2 variants per topic; schedule on different days/times.

2. Track metrics (ingest CSV exports).

3. Simple uplift analysis: per-feature regression or SHAP on handcrafted features (hook length, #numbers, #bullets, questions yes/no).

4. Close the loop: persist the best feature ranges into a small config.yaml used by prompts.

#### Deliverables

1. Streamlit app (demo video + live): topic → research preview → choose variants → schedule/export → see past performance.

2. CLI (copilot generate --topic "X" --schedule tomorrow 9:15).

3. LangGraph diagram auto-rendered as PNG.

4. MLflow runs for prompt variants (store prompts, scores, token cost).

5. Dockerfile + docker compose up.

6. Optional AWS deploy: Terraform/IaC snippets for core resources.

7. Unit tests for tools, and prompt regression tests with golden outputs.

8. README with security/keys, ToS notes, and a “what I’d improve next” section.

In [1]:
# graph_rag.py
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document
from pypdf import PdfReader
import re

from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from langchain_core.messages import BaseMessage, HumanMessage
from dotenv import load_dotenv
#import google.generativeai as genai
from langchain_core.tools import tool
import os

#### Config Loader
app/config.py

In [2]:
# Load environment variables from the .env file
load_dotenv()

True

#### Tools Layer
Search Tool

app/tools/search_tool.py

In [58]:
from langchain_community.tools import DuckDuckGoSearchRun

def search_web_ddg(query: str) -> list[dict]:
    """Search the web using DuckDuckGo and return a list of results."""
    search_tool = DuckDuckGoSearchRun()
    results = search_tool.invoke(query)
    return results

In [59]:
search_web_ddg("AI engineer vs data scientist")

  with DDGS() as ddgs:


'No good DuckDuckGo Search Result was found'

#### Scraping Tool
app/tools/scrape_tool.py

In [32]:
import trafilatura

def scrape_url(url):
    """Download and extract main text from a web page."""
    downloaded = trafilatura.fetch_url(url)
    if downloaded:
        return trafilatura.extract(downloaded)
    return ""


In [33]:
scrape_url("https://towardsdatascience.com/langgraph-101-lets-build-a-deep-research-agent/")

'You need to consider how to orchestrate the multi-step workflow, keep track of the agents’ states, implement necessary guardrails, and monitor decision processes as they happen.\nFortunately, LangGraph addresses exactly those pain points for you.\nRecently, Google just demonstrated this perfectly by open-sourcing a full-stack implementation of a Deep Research Agent built with LangGraph and Gemini (with Apache-2.0 license).\nThis isn’t a toy implementation: the agent can not only search, but also dynamically evaluate the results to decide if more information is needed by doing further searches. This iterative workflow is exactly the kind of thing where LangGraph really shines.\nSo, if you want to learn how LangGraph works in practice, what better place to start than a real, working agent like this?\nHere’s our game plan for this tutorial post: We’ll adopt a “problem-driven” learning approach. Instead of starting with lengthy, abstract concepts, we’ll jump right into the code and examin

#### Vector Store
app/tools/vectorstore.py

In [5]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import chromadb
from project_files.configs import GEMINI_API_KEY, VECTOR_DB_DIR, MODEL_NAME

def get_vectorstore():
    embeddings = GoogleGenerativeAIEmbeddings(model=MODEL_NAME, google_api_key=GEMINI_API_KEY)
    client = chromadb.PersistentClient(path=VECTOR_DB_DIR)
    return client, embeddings


#### Moderation Tool
app/tools/moderation.py

In [34]:
def check_toxicity(text):
    """Basic keyword filter (extend with LLM for production)."""
    bad_words = ["hate", "violence", "racist"]
    return any(bad in text.lower() for bad in bad_words)


In [35]:
check_toxicity("I hate this product, it's terrible!")

True

#### Voice Profile Loader
app/tools/voice_profile.py

In [7]:
import pandas as pd
from project_files.configs import VOICE_PROFILE_PATH

def load_voice_profile():
    df = pd.read_csv(VOICE_PROFILE_PATH)
    all_text = " ".join(df["post_text"].dropna().tolist())
    return all_text[:3000]  # Keep a manageable token count


In [36]:
load_voice_profile()

"Just published my latest article on AI in healthcare! Excited to hear your thoughts. #AI #Healthcare Data storytelling is key. Always visualize your insights clearly. #DataScience #Visualization Had a fantastic discussion on MLOps best practices today. Key takeaway: automate testing! #MLOps #MachineLearning Exploring RAG (Retrieval-Augmented Generation) techniques for enterprise knowledge management. #RAG #AI Remember: metrics are only valuable if they guide action. Focus on impact, not vanity. #Analytics #Business Thrilled to announce our team's latest project in NLP! More updates soon. #NLP #Innovation Networking is not about collecting contacts, it's about building relationships. #CareerGrowth #LinkedInTips Python tips: list comprehensions can simplify your code and improve readability. #Python #Tips Deep learning models are powerful, but understanding your data is still crucial. #MachineLearning #Data Agile methodology works best when the whole team is aligned on goals and priorit

#### Agents Layer
Planner

app/agents/planner.py

In [8]:
from langchain_google_genai import ChatGoogleGenerativeAI
from project_files.configs import GEMINI_API_KEY, MODEL_NAME

def create_brief(topic, audience="Data professionals"):
    llm = ChatGoogleGenerativeAI(model=MODEL_NAME, google_api_key=GEMINI_API_KEY)
    prompt = f"""
    Create a LinkedIn content brief for the topic: {topic}.
    Audience: {audience}
    Include: angle, 3 key points, suggested title, desired outcome.
    """
    return llm.invoke(prompt).content


In [37]:
create_brief("AI engineer vs data scientist")

'Here\'s a LinkedIn content brief for the topic "AI Engineer vs. Data Scientist," tailored for data professionals:\n\n---\n\n## LinkedIn Content Brief: AI Engineer vs. Data Scientist\n\n**Audience:** Data professionals (Data Scientists, ML Engineers, Data Analysts, Data Engineers, Technical Leads, Hiring Managers in data-driven organizations)\n\n---\n\n**Angle:**\n**"Beyond the Buzzwords: Demystifying the distinct yet highly complementary roles of AI Engineers and Data Scientists to help data professionals navigate career paths, optimize team structures, and understand the modern AI/ML landscape."**\n\nThis angle aims to clarify confusion, highlight the value of both roles, and provide actionable insights for career development and team building.\n\n---\n\n**3 Key Points:**\n\n1.  **Core Focus & Deliverables:**\n    *   **Data Scientist:** Primarily focused on *discovery, insights, and model prototyping*. They identify business problems, explore data, develop hypotheses, build experime

#### Researcher
app/agents/researcher.py

In [60]:
#from app.tools.search_tool import search_web
#from app.tools.scrape_tool import scrape_url

def gather_evidence(topic):
    results = search_web_ddg(topic)
    # evidence = []
    # for r in results:
    #     text = scrape_url(r["url"])
    #     if text:
    #         evidence.append({"url": r["url"], "snippet": text[:500]})
    return results


In [61]:
gather_evidence("AI engineer vs data scientist")

  with DDGS() as ddgs:


"Aug 6, 2025 · Data Scientist vs. AI Engineer This article aims to delineate the distinctions and overlaps between Data Scientists … Jun 17, 2025 · Confused between becoming an AI Engineer or a Data Scientist? Discover career growth, required skills, salary … Mar 4, 2025 · AI engineer vs. data scientist: What's the difference? AI engineers and data scientists both shape AI projects, … May 20, 2025 · Explore the distinct roles of AI engineers and data scientists, their key differences, and the unique career paths … Mar 12, 2025 · AI engineers focus on building AI-powered applications, while data scientists focus on data analysis and …"

#### Writer
app/agents/writer.py

In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI
from project_files.configs import GEMINI_API_KEY, MODEL_NAME

def draft_posts(topic, evidence, voice_profile):
    llm = ChatGoogleGenerativeAI(model=MODEL_NAME, google_api_key=GEMINI_API_KEY)
    prompt = f"""
    Write 2 LinkedIn post drafts on "{topic}" for tech audience.
    Use evidence below and match voice style:
    Voice profile: {voice_profile}
    Evidence: {evidence}

    Output format:
    - Variant A: Hook, Body, CTA, Hashtags
    - Variant B: Hook, Body, CTA, Hashtags
    """
    return llm.invoke(prompt).content


In [43]:
output = draft_posts("AI engineer vs data scientist",
              gather_evidence("AI engineer vs data scientist"), 
                load_voice_profile())

  with DDGS() as ddgs:


In [45]:
print(output)

Here are two LinkedIn post drafts comparing "AI engineer vs data scientist":

---

**- Variant A:**

*   **Hook:** Confused about the difference between an #AIEngineer and a #DataScientist? You're not alone! While both roles are crucial in shaping AI projects, their core focus areas are quite distinct.
*   **Body:** From my discussions and recent reads, it's clear: AI Engineers primarily focus on *building and deploying* AI-powered applications, often integrating closely with software development. Data Scientists, on the other hand, dive deep into *data analysis and predictive modeling*, extracting insights and informing decisions. Think of it as the builder versus the analyst – both essential for a successful AI ecosystem. Understanding your data is crucial, but so is bringing those models to life!
*   **CTA:** Which path resonates more with your skills and career aspirations? Share your thoughts or experiences below!
*   **Hashtags:** #AI #DataScience #MachineLearning #CareerGrowth #

#### Compliance
app/agents/compliance.py

In [11]:
#from project_files.tools.moderation import check_toxicity

def compliance_check(post_text):
    issues = []
    if len(post_text) > 1300:
        issues.append("Exceeds LinkedIn char limit")
    if check_toxicity(post_text):
        issues.append("Potentially toxic language detected")
    return issues


In [46]:
output.split("Variant")

['Here are two LinkedIn post drafts comparing "AI engineer vs data scientist":\n\n---\n\n**- ',
 " A:**\n\n*   **Hook:** Confused about the difference between an #AIEngineer and a #DataScientist? You're not alone! While both roles are crucial in shaping AI projects, their core focus areas are quite distinct.\n*   **Body:** From my discussions and recent reads, it's clear: AI Engineers primarily focus on *building and deploying* AI-powered applications, often integrating closely with software development. Data Scientists, on the other hand, dive deep into *data analysis and predictive modeling*, extracting insights and informing decisions. Think of it as the builder versus the analyst – both essential for a successful AI ecosystem. Understanding your data is crucial, but so is bringing those models to life!\n*   **CTA:** Which path resonates more with your skills and career aspirations? Share your thoughts or experiences below!\n*   **Hashtags:** #AI #DataScience #MachineLearning #Caree

In [48]:
output.split("Variant")[1].strip()

"A:**\n\n*   **Hook:** Confused about the difference between an #AIEngineer and a #DataScientist? You're not alone! While both roles are crucial in shaping AI projects, their core focus areas are quite distinct.\n*   **Body:** From my discussions and recent reads, it's clear: AI Engineers primarily focus on *building and deploying* AI-powered applications, often integrating closely with software development. Data Scientists, on the other hand, dive deep into *data analysis and predictive modeling*, extracting insights and informing decisions. Think of it as the builder versus the analyst – both essential for a successful AI ecosystem. Understanding your data is crucial, but so is bringing those models to life!\n*   **CTA:** Which path resonates more with your skills and career aspirations? Share your thoughts or experiences below!\n*   **Hashtags:** #AI #DataScience #MachineLearning #CareerGrowth #TechJobs #AIvsDS\n\n---\n\n**-"

In [49]:
compliance_check(output.split("Variant")[1].strip())

[]

#### Editor
app/agents/editor.py

In [50]:
from langchain_google_genai import ChatGoogleGenerativeAI
from project_files.configs import GEMINI_API_KEY, MODEL_NAME

def refine_post(post_text):
    llm = ChatGoogleGenerativeAI(model=MODEL_NAME, google_api_key=GEMINI_API_KEY)
    prompt = f"Edit the following LinkedIn post for clarity, conciseness, and impact:\n{post_text}"
    return llm.invoke(prompt).content


#### LangGraph Orchestration
app/graph.py

In [62]:
# from project_files.agents.planner import create_brief
# from project_files.agents.researcher import gather_evidence
# from project_files.agents.writer import draft_posts
# from project_files.agents.editor import refine_post
# from project_files.agents.compliance import compliance_check
# from project_files.tools.voice_profile import load_voice_profile

def generate_linkedin_posts(topic):
    voice_profile = load_voice_profile()
    brief = create_brief(topic)
    evidence = gather_evidence(topic)
    drafts = draft_posts(topic, evidence, voice_profile)
    
    final_variants = []
    for variant in drafts.split("Variant"):
        if not variant.strip():
            continue
        issues = compliance_check(variant)
        if issues:
            variant += f"\n\n⚠ Issues: {issues}"
        refined = refine_post(variant)
        final_variants.append(refined)
    
    return {
        "brief": brief,
        "evidence": evidence,
        "variants": final_variants
    }


#### Model - Testing

In [63]:
result = generate_linkedin_posts("AI engineer vs data scientist")

  with DDGS() as ddgs:


In [64]:
print("Content Brief:")
print(result["brief"])

Content Brief:
Here's a LinkedIn content brief for the topic "AI Engineer vs. Data Scientist":

---

## LinkedIn Content Brief: AI Engineer vs. Data Scientist

**Topic:** AI Engineer vs. Data Scientist: Understanding the Nuances for Career & Team Strategy

**Audience:** Data professionals (Data Scientists, ML Engineers, Analysts, Data Leaders, Recruiters in the data space). They are familiar with the concepts but seek clarity on specialization, career progression, and team structure.

---

**Angle:**
It's not about which role is "better," but about understanding their distinct focuses, skillsets, and where they fit within the AI/ML lifecycle. This post will demystify the common confusion, helping individuals plot their career paths and organizations build more effective data teams. We'll highlight how these roles are complementary, not competitive, and essential for taking AI from concept to production.

---

**3 Key Points:**

1.  **Beyond the Hype: Core Responsibilities & Impact:**
 

In [65]:
print("\nEvidence:")


Evidence:


In [66]:
print("\nDraft Variants:")
for i, variant in enumerate(result["variants"], 1):
    print(f"\nVariant {i}:\n{variant}\n")


Draft Variants:

Variant 1:
Okay, since the original drafts were not provided, I will create two *new* drafts based on the common distinctions between AI Engineer and Data Scientist roles, aiming for clarity, conciseness, and impact for a tech audience.

Here are two options, choose the one that best fits your immediate goal or combine elements:

---

### **Option 1: Focus on "Which Path is Right For You?"**

Navigating the tech landscape, it's easy to confuse the **AI Engineer** and **Data Scientist** roles. While both work with data and AI, their primary focuses are distinct yet complementary. Let's break it down:

**📊 Data Scientist:**
The master of insights. Data Scientists focus on **exploratory data analysis, statistical modeling, and predictive analytics** to uncover patterns and answer complex business questions. They're about the "why" and "what if," using data to drive strategic decisions.
*   **Key skills:** Statistics, machine learning algorithms, SQL, data visualization, 