### FACT-CHECKING PROMPT (**PROMPT ENGINEERING**)
#### 01. PROMPT_01

import json

def construct_fact_checking_prompt(enriched_entry):
    """
    Constructs a well-formatted prompt for fact-checking using DeepSeek LLM.
    
    Args:
        enriched_entry (dict): A dictionary containing article metadata, enriched content, and linguistic analysis.
    
    Returns:
        str: A formatted string prompt for the LLM.
    """
    
    # Extract relevant fields from enriched_entry
    title = enriched_entry.get("title", "Unknown Title")
    url = enriched_entry.get("url", "Unknown URL")
    published_date = enriched_entry.get("published_date", "Unknown Date")
    source_name = enriched_entry.get("source", "Unknown Source")
    author = enriched_entry.get("author", "Unknown Author")
    category = enriched_entry.get("category", "Unknown Category")
    summary = enriched_entry.get("enriched_content", "No summary available")[:500]  # Truncate if too long
    
    tfidf_outliers = json.loads(enriched_entry.get("TF-IDF Outliers", "[]"))  # Convert back to list
    tfidf_outliers_str = ", ".join(tfidf_outliers) if tfidf_outliers else "None"
    
    grammar_errors = enriched_entry.get("Grammar Errors", 0)
    sentence_count = enriched_entry.get("Sentence Count", 0)
    sentiment_polarity = enriched_entry.get("Sentiment Polarity", "Unknown")
    sentiment_subjectivity = enriched_entry.get("Sentiment Subjectivity", "Unknown")
    # Note: extracted but not currently included in the prompt text below
    fact_checking_summary = enriched_entry.get("fact_checking_summary", "No fact-checking data available.")


    # Construct the prompt_01
    prompt = f"""
    You are a fact-checking AI analyzing the credibility of a news article. Below is the structured information:
    
    📰 **Article Information:**
    - **Title:** {title}
    - **URL:** {url}
    - **Published Date:** {published_date}
    - **Source:** {source_name}
    - **Author:** {author}
    - **Category:** {category}

    🔹 **Article Summary (Extracted via AI):**
    "{summary}"
    
    📊 **Linguistic Analysis:**
    - **TF-IDF Outlier Keywords (Unique/Unusual Words):** {tfidf_outliers_str}
    - **Grammar Issues:** {grammar_errors} errors
    - **Sentence Count:** {sentence_count}
    - **Sentiment Analysis:** 
        - **Polarity (Scale -1 to 1):** {sentiment_polarity}
        - **Subjectivity (Scale 0 to 1, higher = opinionated):** {sentiment_subjectivity}

    🎯 **Your Task:**
    1️⃣ **Assess the credibility of this article** based on the provided content and linguistic analysis.  
    2️⃣ **Use the TF-IDF outlier words** to determine if the article contains **unusual phrasing or misleading language.**  
    3️⃣ **Analyze sentiment:** Does the emotional tone suggest bias, fear-mongering, or objectivity?  
    4️⃣ **Evaluate readability & grammar:** Is the article professionally written, or does it contain errors typical of misinformation?  
    5️⃣ **Compare against reliable sources** if possible, to determine factual accuracy.  
    
    🏆 **Final Response Format:**
    - **Credibility Score:** (Scale 0-100, where 100 = totally credible, 0 = completely false)
    - **Verdict:** (Choose one: "True", "False", or "Misleading")
    - **Explanation:** (2-3 sentences summarizing why you assigned this rating)
    """
    
    return prompt


# Example Usage
example_entry = {
    "title": "Breaking News: AI Solves World Hunger",
    "url": "https://news.example.com/ai-hunger",
    "published_date": "2025-02-18",
    "source": "Example News",
    "author": "John Doe",
    "category": "Technology",
    "enriched_content": "AI has made significant advancements... (summary content here)...\n\nFact-Checking Data:\n- Verified by multiple sources",
    "TF-IDF Outliers": json.dumps(["AI", "breakthrough", "hunger crisis"]),
    "Grammar Errors": 2,
    "Sentence Count": 25,
    "Sentiment Polarity": 0.7,
    "Sentiment Subjectivity": 0.4,
    "fact_checking_summary": "Verified by multiple sources."
}

# Generate the prompt
prompt_text = construct_fact_checking_prompt(example_entry)
print(prompt_text)
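The prompt asks the model to answer with a **Credibility Score**, a **Verdict** ("True", "False" or "Misleading") and an **Explanation**. A minimal sketch of pulling those fields back out of the raw reply, assuming the model actually uses those labels (LLMs often drift from a requested format, so any missing field comes back as `None`). `parse_fact_check_response` is a hypothetical helper, not part of Ollama or any other library:

```python
import re
from typing import Optional

# Hypothetical patterns for the "Final Response Format" labels.
# Bold markers (**) and parenthetical hints such as "(0-100)" are optional.
_SCORE_RE = re.compile(r"Credibility Score[^:]*:\**\s*(\d{1,3})", re.IGNORECASE)
_VERDICT_RE = re.compile(r'Verdict[^:]*:\**\s*"?(True|False|Misleading)"?', re.IGNORECASE)
_EXPLANATION_RE = re.compile(r"Explanation[^:]*:\**\s*(.+)", re.IGNORECASE | re.DOTALL)


def parse_fact_check_response(text: str) -> dict:
    """Return the score (0-100 or None), verdict (or None) and explanation (or None)."""
    score: Optional[int] = None
    if (m := _SCORE_RE.search(text)):
        value = int(m.group(1))
        score = value if 0 <= value <= 100 else None  # reject out-of-range scores

    verdict: Optional[str] = None
    if (m := _VERDICT_RE.search(text)):
        verdict = m.group(1).capitalize()  # normalize e.g. "FALSE" -> "False"

    explanation: Optional[str] = None
    if (m := _EXPLANATION_RE.search(text)):
        explanation = m.group(1).strip()  # everything after the label

    return {"score": score, "verdict": verdict, "explanation": explanation}
```

A verdict outside the three allowed values (for example "Insufficient information") is returned as `None`, which makes non-conforming replies easy to filter out.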


#### 02. PROMPT_02

import json

def construct_fact_checking_prompt(enriched_entry):
    """
    Constructs a well-formatted prompt for fact-checking using DeepSeek LLM.
    
    Args:
        enriched_entry (dict): A dictionary containing article metadata, enriched content, and linguistic analysis.
    
    Returns:
        str: A formatted string prompt for the LLM.
    """
    
    # Extract relevant fields from enriched_entry
    title = enriched_entry.get("title", "Unknown Title")
    url = enriched_entry.get("url", "Unknown URL")
    published_date = enriched_entry.get("published_date", "Unknown Date")
    source_name = enriched_entry.get("source", "Unknown Source")
    author = enriched_entry.get("author", "Unknown Author")
    category = enriched_entry.get("category", "Unknown Category")
    summary = enriched_entry.get("enriched_content", "No summary available")[:500]  # Truncate if too long
    
    tfidf_outliers = json.loads(enriched_entry.get("TF-IDF Outliers", "[]"))  # Convert back to list
    tfidf_outliers_str = ", ".join(tfidf_outliers) if tfidf_outliers else "None"
    
    grammar_errors = enriched_entry.get("Grammar Errors", 0)
    sentence_count = enriched_entry.get("Sentence Count", 0)
    sentiment_polarity = enriched_entry.get("Sentiment Polarity", "Unknown")
    sentiment_subjectivity = enriched_entry.get("Sentiment Subjectivity", "Unknown")
    # Note: extracted but not currently included in the prompt text below
    fact_checking_summary = enriched_entry.get("fact_checking_summary", "No fact-checking data available.")


    
    # Construct the prompt_02
    prompt = f"""
    You are a fact-checking AI analyzing the credibility of a news article. Below is the structured information:
    
    📰 **Article Information:**
    - **Title:** {title}
    - **URL:** {url}
    - **Published Date:** {published_date}
    - **Source:** {source_name}
    - **Author:** {author}
    - **Category:** {category}

    🔹 **Article Summary (Extracted via AI):**
    "{summary}"
    
    📊 **Linguistic Analysis:**
    - **TF-IDF Outlier Keywords (Unique/Unusual Words):** {tfidf_outliers_str}
    - **Grammar Issues:** {grammar_errors} errors
    - **Sentence Count:** {sentence_count}
    - **Sentiment Analysis:** 
        - **Polarity (Scale -1 to 1):** {sentiment_polarity}
        - **Subjectivity (Scale 0 to 1, higher = opinionated):** {sentiment_subjectivity}

 ## 🎯 Task: Evaluate Credibility and Truthfulness  
    Based on the provided information, conduct a critical analysis following these points:  

    1️⃣ **Credibility Assessment:**  
       - Is the source reliable?  
       - Does the author have legitimate or recognized credentials in the field?  
       - Does the article follow a logical and professional structure, or does it appear poorly written?  

    2️⃣ **Detection of Misleading or Sensationalist Language:**  
       - Analyze the unusual words detected by TF-IDF. Are these terms uncommon in serious journalism?  
       - Does the article use exaggerated language to manipulate the reader’s emotions?  

    3️⃣ **Bias and Subjectivity:**  
       - Does the content appear neutral, or does it attempt to influence the reader’s opinion?  
       - Are there phrases that exaggerate, alarm, or contain subjective judgments?  

    4️⃣ **Verification with Other Sources:**  
       - If a key fact is mentioned, is there verifiable evidence from reliable sources?  
       - Are there missing expert citations or solid references?  

    5️⃣ **Linguistic Quality:**  
       - Does the text contain unusual grammatical errors for legitimate news articles?  
       - Does it appear to be an automatically generated or poorly translated text?  

    ---  
    
    ## 📌 **Expected Response Format**  
    - **Credibility Score (0-100):** (100 = Fully credible, 0 = Completely false)  
    - **Final Verdict:** ("True", "False", or "Misleading")  
    - **Detailed Explanation (3-5 sentences):** Justify the evaluation based on findings.  

    ⚠️ **If the information is insufficient, indicate that more context or additional sources are needed.**  
    """
    
    return prompt

# Example Usage
example_entry = {
    "title": "Breaking News: AI Solves World Hunger",
    "url": "https://news.example.com/ai-hunger",
    "published_date": "2025-02-18",
    "source": "Example News",
    "author": "John Doe",
    "category": "Technology",
    "enriched_content": "AI has made significant advancements... (summary content here)...\n\nFact-Checking Data:\n- Verified by multiple sources",
    "TF-IDF Outliers": json.dumps(["AI", "breakthrough", "hunger crisis"]),
    "Grammar Errors": 2,
    "Sentence Count": 25,
    "Sentiment Polarity": 0.7,
    "Sentiment Subjectivity": 0.4,
    "fact_checking_summary": "Verified by multiple sources."
}

# Generate the prompt
prompt_text = construct_fact_checking_prompt(example_entry)
print(prompt_text)
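Each prompt builder cuts the summary with `[:500]`, which can stop in the middle of a word. A minimal sketch of a word-boundary truncation that could replace the slice, assuming the 500-character budget is still wanted; `truncate_at_word` is a hypothetical helper:

```python
def truncate_at_word(text: str, limit: int = 500, suffix: str = "...") -> str:
    """Truncate text to at most `limit` characters without splitting a word.

    The suffix is added only when something was cut, and it counts toward
    the limit. Only spaces are treated as word boundaries.
    """
    if len(text) <= limit:
        return text
    budget = max(limit - len(suffix), 0)
    cut = text[:budget]
    # If the cut lands mid-word, back off to the last space.
    if not text[budget].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    return cut.rstrip() + suffix
```

In the builders this would read `summary = truncate_at_word(enriched_entry.get("enriched_content", "No summary available"))`.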


#### 03. PROMPT_03

import json

def construct_fact_checking_prompt(enriched_entry):
    """
    Constructs a well-formatted prompt for fact-checking using DeepSeek LLM.
    
    Args:
        enriched_entry (dict): A dictionary containing article metadata, enriched content, and linguistic analysis.
    
    Returns:
        str: A formatted string prompt for the LLM.
    """
    
    # Extract relevant fields from enriched_entry
    title = enriched_entry.get("title", "Unknown Title")
    url = enriched_entry.get("url", "Unknown URL")
    published_date = enriched_entry.get("published_date", "Unknown Date")
    source_name = enriched_entry.get("source", "Unknown Source")
    author = enriched_entry.get("author", "Unknown Author")
    category = enriched_entry.get("category", "Unknown Category")
    summary = enriched_entry.get("enriched_content", "No summary available")[:500]  # Truncate if too long
    
    tfidf_outliers = json.loads(enriched_entry.get("TF-IDF Outliers", "[]"))  # Convert back to list
    tfidf_outliers_str = ", ".join(tfidf_outliers) if tfidf_outliers else "None"
    
    grammar_errors = enriched_entry.get("Grammar Errors", 0)
    sentence_count = enriched_entry.get("Sentence Count", 0)
    sentiment_polarity = enriched_entry.get("Sentiment Polarity", "Unknown")
    sentiment_subjectivity = enriched_entry.get("Sentiment Subjectivity", "Unknown")
    # Note: extracted but not currently included in the prompt text below
    fact_checking_summary = enriched_entry.get("fact_checking_summary", "No fact-checking data available.")


    
    # Construct the prompt_03
    prompt = f"""
    You are a fact-checking AI analyzing the credibility of a news article. Below is the structured information:
    
    📰 **Article Information:**
    - **Title:** {title}
    - **URL:** {url}
    - **Published Date:** {published_date}
    - **Source:** {source_name}
    - **Author:** {author}
    - **Category:** {category}

    🔹 **Article Summary (Extracted via AI):**
    "{summary}"
    
    📊 **Linguistic Analysis:**
    - **TF-IDF Outlier Keywords (Unique/Unusual Words):** {tfidf_outliers_str}
    - **Grammar Issues:** {grammar_errors} errors
    - **Sentence Count:** {sentence_count}
    - **Sentiment Analysis:** 
        - **Polarity (Scale -1 to 1):** {sentiment_polarity}
        - **Subjectivity (Scale 0 to 1, higher = opinionated):** {sentiment_subjectivity}

 ## 🎯  Task: Evaluate Credibility and Truthfulness
    Based on the provided information, conduct a critical analysis following these detailed tasks:

    1. Source Credibility Analysis:
       - Assess the reliability of the source and its reputation in journalism.
       - Identify any potential conflicts of interest or biases within the source.
       - Determine if the author has known expertise or credentials in the subject matter.

    2. Content Verification:
       - Cross-check key claims with verifiable and authoritative sources.
       - Identify any exaggerations, misleading statements, or unverified claims.
       - Evaluate if the article presents evidence and factual backing for its assertions.

    3. Detection of Misleading or Sensationalist Language:
       - Analyze the tone and wording of the article to identify emotional manipulation.
       - Assess the presence of exaggerated or alarmist phrases that may indicate bias.
       - Determine whether the article includes balanced viewpoints or only presents one-sided perspectives.

    4. Bias and Subjectivity Evaluation:
       - Determine whether the article contains subjective language or ideological framing.
       - Identify any patterns of bias based on the article's structure, language, and omitted information.
       - Consider whether the article serves an agenda beyond objective reporting.

    5. External Source Comparison:
       - Check if other reputable news sources report on the same topic and whether their coverage aligns.
       - Identify inconsistencies in reporting across different sources.
       - Evaluate whether primary sources (official statements, research papers, etc.) support the claims made.

    6. Quality of Writing and Presentation:
       - Analyze the grammatical accuracy and coherence of the text.
       - Identify any unusual phrasing that may indicate automatic content generation or poor translation.
       - Determine if the article follows standard journalistic practices in terms of citations and formatting.

    ---
    
    Expected Response Format:
    - Credibility Score (0-100): (100 = Fully credible, 0 = Completely false)
    - Final Verdict: ("True", "False", or "Misleading")
    - Detailed Explanation (4-6 sentences): Justify the evaluation based on findings, referencing key factors from the analysis.

    If the information is insufficient to determine credibility, specify what additional context or sources would be necessary to reach a conclusive assessment.
    """
    
    return prompt

# Example Usage
example_entry = {
    "title": "Breaking News: AI Solves World Hunger",
    "url": "https://news.example.com/ai-hunger",
    "published_date": "2025-02-18",
    "source": "Example News",
    "author": "John Doe",
    "category": "Technology",
    "enriched_content": "AI has made significant advancements... (summary content here)...\n\nFact-Checking Data:\n- Verified by multiple sources",
    "TF-IDF Outliers": json.dumps(["AI", "breakthrough", "hunger crisis"]),
    "Grammar Errors": 2,
    "Sentence Count": 25,
    "Sentiment Polarity": 0.7,
    "Sentiment Subjectivity": 0.4,
    "fact_checking_summary": "Verified by multiple sources."
}

# Generate the prompt
prompt_text = construct_fact_checking_prompt(example_entry)
print(prompt_text)


    You are a fact-checking AI analyzing the credibility of a news article. Below is the structured information:
    
    📰 **Article Information:**
    - **Title:** Breaking News: AI Solves World Hunger
    - **URL:** https://news.example.com/ai-hunger
    - **Published Date:** 2025-02-18
    - **Source:** Example News
    - **Author:** John Doe
    - **Category:** Technology

    🔹 **Article Summary (Extracted via AI):**
    "AI has made significant advancements... (summary content here)...

Fact-Checking Data:
- Verified by multiple sources"
    
    📊 **Linguistic Analysis:**
    - **TF-IDF Outlier Keywords (Unique/Unusual Words):** AI, breakthrough, hunger crisis
    - **Grammar Issues:** 2 errors
    - **Sentence Count:** 25
    - **Sentiment Analysis:** 
        - **Polarity (Scale -1 to 1):** 0.7
        - **Subjectivity (Scale 0 to 1, higher = opinionated):** 0.4

 ## 🎯  Task: Evaluate Credibility and Truthfulness
    Based on the provided information, conduct a critical anal
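The output above shows that every prompt line keeps the four spaces of indentation from the triple-quoted f-string, while the interpolated summary lines (`Fact-Checking Data:` and below) start at column 0. If unindented prompt text is preferred, one option is `textwrap.dedent`, applied to the template *before* interpolation: once multi-line values are substituted in, their unindented lines leave no common prefix for `dedent` to strip. A minimal sketch with a shortened, hypothetical template (it swaps the f-string for `str.format`):

```python
import textwrap

# Hypothetical, shortened template. dedent() runs once on the raw template,
# BEFORE .format(): interpolated values such as a multi-line summary may
# contain unindented lines, which would leave dedent() nothing to strip.
PROMPT_TEMPLATE = textwrap.dedent("""\
    You are a fact-checking AI analyzing the credibility of a news article.

    - **Title:** {title}
    - **Summary:** "{summary}"
    """)


def build_prompt(title: str, summary: str) -> str:
    """Fill the dedented template; braces inside the values are not re-parsed."""
    return PROMPT_TEMPLATE.format(title=title, summary=summary)


prompt = build_prompt("Example", "Line one\n\nFact-Checking Data:\n- Verified")
print(prompt)
```

With `str.format`, any literal braces in the template itself would need to be doubled (`{{` and `}}`).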

#### SEND PROMPT TO THE MODEL

import ollama

# Generate the fact-checking prompt
prompt_text = construct_fact_checking_prompt(example_entry)

# Send the prompt to DeepSeek LLM using Ollama
response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": prompt_text}]
)

# Print the response
print(response['message']['content'])


<think>
Okay, so I need to evaluate the credibility and truthfulness of this news article. Let me go through each task step by step based on the provided data.

First, Source Credibility Analysis: The source is "Example News" from 2025-02-18. It's an example source, which probably means it's not a real news outlet but used for testing. Since I don't know its reputation or if it has any bias, this could be a red flag. The author is John Doe, who isn't identified as an expert in AI or technology, so that might mean he doesn't have the credentials to speak on such a complex topic.

Next, Content Verification: The article claims AI has solved world hunger, which seems too dramatic and likely exaggerated. AI making such a significant breakthrough is probably overhyped unless it's from a peer-reviewed source. I can't verify this because there are no external sources provided or citations given in the summary.

Detection of Misleading or Sensationalist Language: The title uses "Breaking News"
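As the output above shows, `deepseek-r1` writes its reasoning inside `<think>...</think>` before the actual answer, and the automated pipeline below stores that reasoning in `fact_check_result` as well. A minimal sketch of separating the two, assuming the tags appear literally as in the output above; `split_reasoning` is a hypothetical helper:

```python
import re
from typing import Tuple

# One <think>...</think> block; DOTALL lets "." span the newlines inside it.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a deepseek-r1 style reply into (reasoning, answer).

    If the reply opens with <think> but has no closing tag (for example,
    it was cut off), everything after the tag is treated as reasoning.
    """
    match = _THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = (text[:match.start()] + text[match.end():]).strip()
        return reasoning, answer
    stripped = text.lstrip()
    if stripped.startswith("<think>"):
        return stripped[len("<think>"):].strip(), ""
    return "", text.strip()
```

In `analyze_multiple_articles`, `reasoning, answer = split_reasoning(response['message']['content'])` would allow the saved JSON to keep the final answer in its own field.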

#### AUTOMATE PROCESS AND SAVE RESULTS

import json
import ollama
import chromadb  # Ensure chromadb is installed and properly set up
import os
from datetime import datetime

def fetch_entries_from_chromadb(collection_name="news_articles", max_entries=5):
    """
    Fetches up to `max_entries` enriched entries from ChromaDB (not necessarily the most recent ones).

    Args:
        collection_name (str): The ChromaDB collection to query.
        max_entries (int): The maximum number of entries to fetch.

    Returns:
        list: A list of retrieved entries.
    """
    client = chromadb.PersistentClient(path="./chroma_db")  # Update path as needed
    collection = client.get_collection(name=collection_name)
    
    # Retrieve up to max_entries entries (get() applies no recency ordering)
    results = collection.get(include=["metadatas", "documents"], limit=max_entries)

    # Process entries into the format expected by the prompt builders
    entries = []
    for document, metadata in zip(results["documents"], results["metadatas"]):
        metadata = metadata or {}  # entries stored without metadata may come back as None
        entry = {
            "title": metadata.get("title", "Unknown Title"),
            "url": metadata.get("url", "Unknown URL"),
            "published_date": metadata.get("published_date", "Unknown Date"),
            "source": metadata.get("source", "Unknown Source"),
            "author": metadata.get("author", "Unknown Author"),
            "category": metadata.get("category", "Unknown Category"),
            "enriched_content": document,
            "TF-IDF Outliers": json.dumps(metadata.get("tfidf_outliers", [])),
            "Grammar Errors": metadata.get("grammar_errors", 0),
            "Sentence Count": metadata.get("sentence_count", 0),
            "Sentiment Polarity": metadata.get("sentiment_polarity", "Unknown"),
            "Sentiment Subjectivity": metadata.get("sentiment_subjectivity", "Unknown"),
            "fact_checking_summary": metadata.get("fact_checking_summary", "No fact-checking data available."),
        }
        entries.append(entry)
    
    return entries

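def construct_fact_checking_prompt(entry):
    """
    Constructs a simplified fact-checking prompt for the LLM.

    Note: this redefinition replaces the detailed construct_fact_checking_prompt
    from the earlier cells, so the batch run below does not send the linguistic
    analysis (TF-IDF outliers, grammar, sentiment) to the model.

    Args:
        entry (dict): A dictionary containing article details.

    Returns:
        str: The formatted prompt.
    """
    return (
        "Analyze the following news article:\n\n"
        f"Title: {entry['title']}\n"
        f"Source: {entry['source']}\n"
        f"Published: {entry['published_date']}\n"
        f"Content: {entry['enriched_content']}\n\n"
        "Is this news article reliable? Provide a fact-checking analysis."
    )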

def get_unique_filename(base_path="PROMT_RESULTS"):
    """
    Generates a unique filename with the format: Prompt_Return_YYYY-MM-DD_N.json.

    Args:
        base_path (str): The directory where files will be stored.

    Returns:
        str: The full path of the unique filename.
    """
    # Ensure the directory exists
    os.makedirs(base_path, exist_ok=True)

    # Get current date
    date_str = datetime.now().strftime("%Y-%m-%d")
    file_number = 1

    while True:
        filename = f"Prompt_Return_{date_str}_{file_number}.json"
        full_path = os.path.join(base_path, filename)
        if not os.path.exists(full_path):
            return full_path
        file_number += 1

def analyze_multiple_articles(chroma_entries, model="deepseek-r1"):
    """
    Sends multiple articles from ChromaDB to the DeepSeek LLM for analysis and saves the results.

    Args:
        chroma_entries (list): List of enriched article entries from ChromaDB.
        model (str): The LLM model to use.

    Returns:
        list: A list of responses from the LLM.
    """
    responses = []
    
    for entry in chroma_entries:
        prompt_text = construct_fact_checking_prompt(entry)
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": prompt_text}]
        )
        response_content = response['message']['content']
        responses.append({
            "title": entry.get("title", "Unknown Title"),
            "url": entry.get("url", "Unknown URL"),
            "published_date": entry.get("published_date", "Unknown Date"),
            "source": entry.get("source", "Unknown Source"),
            "author": entry.get("author", "Unknown Author"),
            "fact_check_result": response_content
        })
    
    # Generate unique filename
    output_file = get_unique_filename()

    # Save responses to a JSON file
    with open(output_file, "w", encoding="utf-8") as file:
        json.dump(responses, file, indent=4, ensure_ascii=False)

    return responses

# ** Fetch limited entries from ChromaDB and send them for analysis **
chroma_entries = fetch_entries_from_chromadb(max_entries=5)
results = analyze_multiple_articles(chroma_entries)

# Output results
print(json.dumps(results, indent=4, ensure_ascii=False))




[
    {
        "title": "Presidents Are Often Judged By History Through The Lens Of Morality",
        "url": "https://newsone.com/5939034/presidents-are-judged-by-history-through-the-lens-of-morality/",
        "published_date": "2025-02-17T14:33:46+00:00",
        "source": "Unknown source",
        "author": "George R. Goethals, University of Richmond",
        "fact_check_result": "<think>\nOkay, so I need to figure out if this news article is reliable based on some basic checks. Let me start by reading through the content carefully and then see what potential issues or inconsistencies I can spot.\n\nThe article talks about how presidents are judged over time through historical surveys, focusing on their moral authority and other aspects. It mentions a few key points: historical rankings have changed as values shift, examples of specific presidents like Lincoln and Grant moving up and down the rankings, and the role of moral authority in these evaluations. \n\nFirst off, I notice 

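`fetch_entries_from_chromadb` passes `tfidf_outliers` through `json.dumps`, and the detailed prompt builders decode it again with `json.loads`. In many ChromaDB versions metadata values must be scalars (strings, numbers, booleans), so the outliers may well have been stored as a JSON string. In that case `json.dumps` encodes them a second time, `json.loads` returns a plain string, and `", ".join(...)` would split that string into single characters if these entries were fed to the detailed builders. A hedged sketch of a normalizer that accepts a list, a JSON-encoded list (even double-encoded), or a comma-separated string; `normalize_outliers` is a hypothetical helper and the storage format is an assumption:

```python
import json
from typing import Any, List


def normalize_outliers(value: Any) -> List[str]:
    """Coerce stored TF-IDF outliers into a list of strings.

    Accepts a list, a JSON-encoded list (possibly encoded twice), or a
    plain comma-separated string. Returns [] for None or empty input.
    """
    # Unwrap up to two layers of JSON encoding, e.g. json.dumps(json.dumps([...])).
    for _ in range(2):
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # not JSON: handled as a comma-separated string below
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]
```

In the prompt builders, `tfidf_outliers = normalize_outliers(enriched_entry.get("TF-IDF Outliers"))` could then replace the `json.loads(...)` call.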
# Fundamental Differences Between the Three Prompts  

## Prompt 5  
- Focuses on **historical context** and **content analysis** but lacks a deep evaluation of **source credibility**.  
- Provides **basic fact-checking** without structured methodologies for bias detection.  
- Does not fully consider **technical issues** like **404 errors, paywalls, or misinformation tactics**.  

## Prompt 6  
- Introduces **source credibility analysis**, evaluating **bias and methodological clarity**.  
- Improves **transparency evaluation** and **fact-checking reliability**.  
- Begins considering **digital accessibility issues**, such as **content removal or URL changes**.  

## Prompt 7  
- Expands **source credibility verification**, including **peer-reviewed sources and institutional transparency**.  
- Evaluates **misleading language**, **tone manipulation**, and **sensationalism**.  
- Includes **fact-checking cross-references** with **official statements and authoritative sources**.  
- Considers **technical barriers** like **paywalls, 404 errors, and online content manipulation**.  
- Analyzes **media influence and marketing tactics** in news presentation.  

---

# Comparative Summary of Results  

## Presidential Rankings Article  
- **Prompt 5**: Evaluates **historical context** and **survey methodology**.  
- **Prompt 6**: Examines **bias in presidential rankings** and **missing survey details**.  
- **Prompt 7**: Enhances scrutiny of **source transparency** and **lack of peer-reviewed studies**.  
- **Best Performance**: **Prompt 7**, due to stronger verification of **sources and methodology**.  

## Prince and Princess of Wales' Mustique Villa  
- **Prompt 5**: Identifies **lack of reputable sources**.  
- **Prompt 6**: Adds **404 error analysis** and considers **content removal**.  
- **Prompt 7**: Investigates **potential online content manipulation** and **technical barriers**.  
- **Best Performance**: **Prompt 7**, for deeper **digital forensic evaluation**.  

## Scottish Museum Government Funding  
- **Prompt 5**: Discusses **general credibility of government funding news**.  
- **Prompt 6**: Notes that **government sources are typically reliable** but lacks detailed verification.  
- **Prompt 7**: Evaluates **paywall restrictions and source accessibility issues**.  
- **Best Performance**: **Prompt 7**, due to analysis of **information accessibility**.  

## The Lathums Album Release  
- **Prompt 5**: Checks **content consistency** but **does not verify sources**.  
- **Prompt 6**: Fact-checks **band announcements** but lacks external verification.  
- **Prompt 7**: Analyzes **fan engagement tactics, media influence, and marketing strategies**.  
- **Best Performance**: **Prompt 7**, due to **evaluation of media influence on reporting**.  

## Trump and FAA Air Traffic Control Firings  
- **Prompt 5**: Identifies **timeline inconsistencies**.  
- **Prompt 6**: Improves **historical fact-checking** but lacks **government source verification**.  
- **Prompt 7**: Expands on **government accountability, misinformation sources, and aviation industry fact-checking**.  
- **Best Performance**: **Prompt 7**, due to **comprehensive verification and cross-referencing**.  

---

# Final Verdict: Best Prompt Performance  
The **updated prompt used in File 7** is the most effective because it:  
- **Enhances source credibility analysis** with **peer-reviewed references and institutional transparency**.  
- **Evaluates bias, misleading language, and tone manipulation**.  
- **Incorporates fact-checking methodologies with cross-references to authoritative sources**.  
- **Considers digital accessibility issues** such as **paywalls, content removals, and misinformation tactics**.  
- **Analyzes the influence of media narratives and marketing strategies** on reporting.  
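The comparison above evaluates several prompt versions on the same set of articles. A minimal sketch of a harness for that kind of side-by-side run, assuming each prompt version is a function `entry -> str` and that `chat_fn` wraps the model call, for example `lambda p: ollama.chat(model="deepseek-r1", messages=[{"role": "user", "content": p}])["message"]["content"]`. `compare_prompts` and `chat_fn` are hypothetical names, and the stub below stands in for a real model:

```python
from typing import Callable, Dict, List

PromptBuilder = Callable[[dict], str]


def compare_prompts(
    entries: List[dict],
    builders: Dict[str, PromptBuilder],
    chat_fn: Callable[[str], str],
) -> List[dict]:
    """Run every prompt version on every entry and collect the replies.

    Returns one row per article with one reply per prompt version, so the
    outputs can be read side by side.
    """
    rows = []
    for entry in entries:
        row = {"title": entry.get("title", "Unknown Title")}
        for name, build in builders.items():
            row[name] = chat_fn(build(entry))
        rows.append(row)
    return rows


if __name__ == "__main__":
    def fake_chat(prompt: str) -> str:
        # Stub model for a dry run: reports the prompt length instead of calling an LLM.
        return f"{len(prompt)} chars"

    demo = compare_prompts(
        [{"title": "Example"}],
        {"short": lambda e: e["title"], "long": lambda e: e["title"] * 3},
        fake_chat,
    )
    print(demo)
```

Injecting `chat_fn` keeps the harness independent of Ollama, so it can be dry-run without a model and pointed at `deepseek-r1` only for the real comparison.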




