🚀 Fake News Detection Prompt Optimization

The goal of Fake News Detection Prompt Optimization is to improve the effectiveness of the LLM’s response by: \
✅ Providing structured context from RAG \
✅ Clearly defining the linguistic analysis metrics used \
✅ Guiding the LLM to make well-reasoned credibility assessments

📌 Script: Formatting a RAG-to-LLM Prompt for Fake News Detection

This script structures the prompt sent to the LLM, ensuring it understands: \
🔹 The original news article \
🔹 The linguistic analysis results (TF-IDF outliers, grammar errors, sentence count, sentiment) \
🔹 Fact-checking guidance (what the LLM should evaluate)

In [1]:
import json

def format_fake_news_prompt(article):
    """
    Formats a structured prompt for LLM fake news detection.
    The prompt includes RAG-retrieved context, linguistic analysis, and evaluation instructions.
    """

    # ✅ Extract Key Article Information
    title = article["title"]
    url = article["url"]
    published_date = article.get("published_date", "Unknown date")
    source_name = article.get("source_name", "Unknown source")
    author = article.get("author", "Unknown author")
    category = article.get("category", "Unknown category")
    content = article["content"]

    # ✅ Extract Linguistic Analysis
    linguistic_analysis = article["linguistic_analysis"]
    summary = linguistic_analysis.get("summary", "No summary available.")
    tfidf_outliers = ", ".join(linguistic_analysis["tfidf_outliers"])
    grammar_errors = linguistic_analysis["grammar_errors"]
    sentence_count = linguistic_analysis["sentence_count"]
    sentiment_polarity = linguistic_analysis["sentiment_polarity"]
    sentiment_subjectivity = linguistic_analysis["sentiment_subjectivity"]

    # ✅ Construct Prompt
    prompt = f"""
    You are a fact-checking AI analyzing the credibility of a news article. Below is the structured information:
    
    📰 **Article Information:**
    - **Title:** {title}
    - **URL:** {url}
    - **Published Date:** {published_date}
    - **Source:** {source_name}
    - **Author:** {author}
    - **Category:** {category}

    📄 **Article Content:**
    {content}

    🔹 **Article Summary (Generated via BERT):**
    "{summary}"

    📊 **Linguistic Analysis:**
    - **TF-IDF Outlier Keywords (Unique/Unusual Words):** {tfidf_outliers}
    - **Grammar Issues:** {grammar_errors} errors
    - **Sentence Count:** {sentence_count}
    - **Sentiment Analysis:** 
        - **Polarity (Scale -1 to 1):** {sentiment_polarity}
        - **Subjectivity (Scale 0 to 1, higher = opinionated):** {sentiment_subjectivity}

    🎯 **Your Task:**
    1️⃣ **Assess the credibility of this article** based on the provided content and linguistic analysis.  
    2️⃣ **Use the TF-IDF outlier words** to determine if the article contains **unusual phrasing or misleading language.**  
    3️⃣ **Analyze sentiment:** Does the emotional tone suggest bias, fear-mongering, or objectivity?  
    4️⃣ **Evaluate readability & grammar:** Is the article professionally written, or does it contain errors typical of misinformation?  
    5️⃣ **Compare against reliable sources** if possible, to determine factual accuracy.  

    🏆 **Final Response Format:**
    - **Credibility Score:** (Scale 0-100, where 100 = totally credible, 0 = completely false)
    - **Verdict:** (Choose one: "True", "False", or "Misleading")
    - **Explanation:** (2-3 sentences summarizing why you assigned this rating)
    """

    return prompt
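
The function above reads several fields with direct indexing (`article["title"]`, `linguistic_analysis["grammar_errors"]`, and so on), so a missing field raises a `KeyError` partway through formatting. A minimal sketch of a pre-check follows. The helper name `find_missing_fields` and the sample values are assumptions for illustration and are not part of the original script.

```python
# Hypothetical helper: lists the fields that format_fake_news_prompt
# reads with direct indexing but that are absent from the input.
REQUIRED_ARTICLE_KEYS = ("title", "url", "content", "linguistic_analysis")
REQUIRED_ANALYSIS_KEYS = (
    "tfidf_outliers",
    "grammar_errors",
    "sentence_count",
    "sentiment_polarity",
    "sentiment_subjectivity",
)


def find_missing_fields(article):
    """Return the names of required fields that are absent from the article."""
    missing = [key for key in REQUIRED_ARTICLE_KEYS if key not in article]
    analysis = article.get("linguistic_analysis")
    if isinstance(analysis, dict):
        missing += [
            f"linguistic_analysis.{key}"
            for key in REQUIRED_ANALYSIS_KEYS
            if key not in analysis
        ]
    return missing


# Illustrative input only; the values are made up for this example.
sample_article = {
    "title": "NASA to Launch Mission to Mars in 2025",
    "url": "https://example.com/nasa-mars-mission",
    "content": "Full article text goes here.",
    "linguistic_analysis": {
        "tfidf_outliers": ["Mars2025", "deep-space"],
        "grammar_errors": 3,
        "sentence_count": 24,
        "sentiment_polarity": 0.2,
        # "sentiment_subjectivity" is left out on purpose
    },
}

print(find_missing_fields(sample_article))
# → ['linguistic_analysis.sentiment_subjectivity']
```

Running this check before building the prompt lets you skip or repair incomplete records instead of failing in the middle of a batch.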


📝 How This Improves Fake News Detection Prompts

✅ Structured Information → The LLM gets article metadata + analysis results clearly. \
✅ Contextual Guidance → The LLM knows what to check (keywords, sentiment, grammar). \
✅ Standardized Response Format → Ensures consistent evaluation across articles. \
✅ Fact-Checking Instructions → The LLM is instructed to verify against known sources.

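📌 Example: Formatted Prompt for a Sample Article

An abridged example of the prompt for a sample article:

📰 **Article Information:**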
- **Title:** "NASA to Launch Mission to Mars in 2025"
- **URL:** https://example.com/nasa-mars-mission
- **Published Date:** February 12, 2024
- **Source:** SpaceNews
- **Author:** John Doe
- **Category:** Science

🔹 **Article Summary (Generated via BERT):**
"NASA has announced a mission to Mars in 2025, with plans to deploy a new rover for data collection."

📊 **Linguistic Analysis:**
- **TF-IDF Outlier Keywords:** "Mars2025, deep-space, propulsion, roverX"
- **Grammar Issues:** 3 errors
- **Sentence Count:** 24
- **Sentiment Analysis:**
    - **Polarity:** 0.2 (Neutral)
    - **Subjectivity:** 0.4 (Moderately Objective)

🎯 **Your Task:**
1️⃣ **Assess credibility** based on linguistic and factual analysis.  
2️⃣ **Check TF-IDF words** for unusual terminology.  
3️⃣ **Analyze sentiment & grammar.**  
4️⃣ **Compare against known sources.**  

🏆 **Final Response Format:**
- **Credibility Score:** (Scale 0-100)
- **Verdict:** "True" / "False" / "Misleading"
- **Explanation:** 2-3 sentences
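
Because the prompt asks for a fixed response format, the reply can be parsed into fields for later comparison. The sketch below assumes the LLM follows the requested labels (`Credibility Score`, `Verdict`, `Explanation`) fairly closely. The function name `parse_credibility_response` and the sample reply are hypothetical, and real replies may need looser matching.

```python
import re

VALID_VERDICTS = {"True", "False", "Misleading"}


def parse_credibility_response(text):
    """Extract score, verdict, and explanation from an LLM reply.

    Any field that is missing or out of range is returned as None.
    """
    # Optional "**" after the colon covers markdown-bold labels.
    score_match = re.search(r"Credibility Score:\**\s*(\d{1,3})", text)
    verdict_match = re.search(r"Verdict:\**\s*\"?(\w+)\"?", text)
    explanation_match = re.search(r"Explanation:\**\s*(.+)", text, re.DOTALL)

    score = int(score_match.group(1)) if score_match else None
    if score is not None and not 0 <= score <= 100:
        score = None

    verdict = verdict_match.group(1).capitalize() if verdict_match else None
    if verdict not in VALID_VERDICTS:
        verdict = None

    explanation = explanation_match.group(1).strip() if explanation_match else None
    return {"score": score, "verdict": verdict, "explanation": explanation}


# Made-up reply, shaped like the format the prompt requests.
reply = """- **Credibility Score:** 85
- **Verdict:** "True"
- **Explanation:** The claims match the summary and the tone is neutral."""

parsed = parse_credibility_response(reply)
print(parsed["score"], parsed["verdict"])
# → 85 True
```

Returning `None` for unparseable fields, rather than raising, makes it easy to count how often a prompt variant produces malformed replies.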



## Next Steps for Refinement:

| Task | Improvement |
| --- | --- |
| ✅ Test Prompt Variations | Try different instruction formats to see which generates the best LLM responses. |
| ✅ Expand Fact-Checking Instructions | Include reliable sources for LLM verification (e.g., Snopes, FactCheck.org). |
| ✅ Track LLM Response Quality | Store LLM-generated credibility scores in ChromaDB for further analysis. |
| ✅ Incorporate External Knowledge | Augment with retrieved facts from Google Fact Check API before LLM evaluation. |
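
For the prompt-variation step, one possible way to compare variants is to score the same article several times under each variant and look at the spread of the credibility scores. This is only a sketch: the variant names and scores below are made up, `summarize_scores` is a hypothetical helper, and a low spread indicates consistency, not accuracy.

```python
from statistics import mean, pstdev


def summarize_scores(scores_by_variant):
    """Map each prompt variant name to the count, mean, and spread of its scores."""
    summary = {}
    for variant, scores in scores_by_variant.items():
        summary[variant] = {
            "n": len(scores),
            "mean": mean(scores),
            "stdev": pstdev(scores),  # population standard deviation
        }
    return summary


# Made-up scores: the same article scored five times under two
# hypothetical prompt variants.
scores_by_variant = {
    "numbered_tasks": [80, 82, 78, 81, 79],
    "free_form": [60, 90, 75, 40, 85],
}

summary = summarize_scores(scores_by_variant)
for variant, stats in summary.items():
    print(f"{variant}: mean={stats['mean']}, stdev={stats['stdev']:.2f}")
```

Comparing against articles with known labels would still be needed to judge which variant is actually more accurate.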


🚀 Final Thoughts

This optimized prompt gives the LLM structured, consistent input, which should make credibility assessments more accurate and easier to compare across articles. Combining linguistic analysis with RAG context supports factual verification and keeps the fact-checking pipeline structured and scalable.

🔥 Now ready to test and iterate! 🚀🔥