# Why Checking DOIs via API is the “Silver Bullet” for AI Detection
Checking references, specifically through their Digital Object Identifiers (DOIs), is arguably the most definitive method to catch AI hallucinations. Large Language Models (LLMs) like ChatGPT often generate plausible-sounding citations that do not actually exist.

Here is why the Python + doi.org Content Negotiation method is superior:

- Deterministic Accuracy (Binary Result)
Unlike analyzing writing style or “perplexity” scores, which are probabilistic and prone to false positives, a DOI check is binary. A DOI either exists in the global registry, or it doesn’t.

Result: 404 Not Found means the DOI is not registered, so the reference is either fabricated or mistyped.

- Detecting “Stolen” DOIs
AI sometimes hallucinates by taking a real DOI from an unrelated paper and attaching it to a fake citation.

The Fix: By retrieving the metadata (JSON) directly from the source, you can compare the actual title in the database against the title listed in the suspicious paper. If the paper claims to be about “Economics” but the DOI resolves to “Marine Biology,” it is strong evidence that the citation was fabricated.

- Global Coverage (Not Just One Publisher)
By querying the central doi.org resolver rather than specific publisher APIs (like Elsevier or Wiley), this method works for any DOI whose registration agency supports content negotiation, regardless of publisher.

Efficiency: It handles redirects automatically, finding the metadata whether the DOI is registered with Crossref, DataCite, or mEDRA.

- Scalability and Automation
Manually clicking 50 links is tedious. This Python script allows for batch processing. You can feed it a list of 100 references and receive a full audit report, typically within a minute or two, making it well suited to editors, professors, or automated quality control systems.
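
The “stolen DOI” check described above comes down to comparing two titles. A minimal sketch of that comparison, using only the standard library, might look like the following. The helper names `normalize_title` and `titles_match` and the 0.8 threshold are assumptions made for illustration, and the second pair of titles is invented; none of this is part of the original script.

```python
from difflib import SequenceMatcher

def normalize_title(title):
    """Lowercase and collapse whitespace so formatting differences do not count."""
    return " ".join(title.lower().split())

def titles_match(cited_title, registry_title, threshold=0.8):
    """Return (is_match, score). The threshold is an arbitrary starting point."""
    score = SequenceMatcher(
        None, normalize_title(cited_title), normalize_title(registry_title)
    ).ratio()
    return score >= threshold, round(score, 2)

# Same title, different casing and spacing
same, score_same = titles_match("Attention Is All  You Need", "attention is all you need")

# Hypothetical mismatch: the citation claims economics, the registry says marine biology
different, score_diff = titles_match(
    "Monetary Policy in Emerging Economies",
    "Coral Reef Fish Population Dynamics",
)

print(same, score_same)
print(different, score_diff)
```

A character-level ratio is crude: subtitles, translated titles, or truncated titles can lower the score for a perfectly genuine reference, so a low score is better treated as a flag for manual review than as a verdict.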

The code below demonstrates that this is an efficient way to check whether a cited paper actually exists.

In [None]:
import re
import pandas as pd
import time

In [None]:
import requests

def verify_doi_validity(doi_input):
    """
    Checks if a DOI exists by querying the doi.org resolver directly.
    Returns detailed metadata if valid, or an error status if invalid.
    """
    # Clean the input to ensure we only have the DOI string
    clean_doi = doi_input.replace("https://doi.org/", "").replace("http://doi.org/", "")
    
    url = f"https://doi.org/{clean_doi}"
    
    headers = {
        "Accept": "application/vnd.citationstyles.csl+json"
    }

    try:
        response = requests.get(url, headers=headers, allow_redirects=True, timeout=10)
        
        if response.status_code == 200:
            try:
                data = response.json()
            except ValueError:
                return {"status": "Error", "details": "Response was not valid JSON."}
            
            # 1. Extracting Title
            title = data.get('title', 'N/A')
            if isinstance(title, list) and len(title) > 0:
                title = title[0]
            
            # 2. Extracting Journal Name (Container Title)
            journal = data.get('container-title', 'N/A')
            if isinstance(journal, list) and len(journal) > 0:
                journal = journal[0]

            # 3. Extracting First Author's Last Name
            author_lastname = "N/A"
            if 'author' in data and len(data['author']) > 0:
                # We take the first author in the list
                author_lastname = data['author'][0].get('family', 'N/A')

            return {
                "status": "Valid",
                "real_title": title,
                "journal": journal,
                "first_author": author_lastname
            }
            
        elif response.status_code == 404:
            return {"status": "Invalid", "details": "DOI not found"}
        else:
            return {"status": "Error", "details": f"HTTP Code: {response.status_code}"}

    except Exception as e:
        return {"status": "Connection Error", "details": str(e)}

# --- Usage Example ---

doi_list_to_check = [
    "10.1038/nature123",            # Fake
    "10.1007/s10701-005-9016-x",    # Valid (Physics paper)
    "10.1016/j.jbi.2008.04.002",    # Valid (Bioinformatics paper)
    "10.1126/science.fake.999"      # Fake
]

# Header format for the table
print(f"{'DOI':<27} | {'Status':<8} | {'Author':<15} | {'Journal':<20} | {'Real Title'}")
print("-" * 110)

for doi in doi_list_to_check:
    result = verify_doi_validity(doi)
    
    if result['status'] == "Valid":
        # Clean and shorten strings for table display
        author = str(result['first_author'])[:15]
        journal = str(result['journal'])[:20]
        title = str(result['real_title'])[:35] + "..."
        
        print(f"{doi:<27} | {result['status']:<8} | {author:<15} | {journal:<20} | {title}")
    else:
        # For errors, we just print the details in the last column
        print(f"{doi:<27} | {result['status']:<8} | {'-':<15} | {'-':<20} | {result.get('details', '-')}")
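
The function above only strips the `https://doi.org/` and `http://doi.org/` prefixes. References found in real bibliographies also use forms such as `doi:` or the older `dx.doi.org` host, so a slightly broader cleaning step may be worth considering. The sketch below is one possible approach; `normalize_doi` is a hypothetical helper, not part of the original code, and it lowercases the result on the assumption that DOIs are compared case-insensitively.

```python
import re

# Prefixes commonly seen in front of a DOI (assumed list, extend as needed)
_DOI_PREFIX = re.compile(r'^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)', re.IGNORECASE)

def normalize_doi(raw):
    """Strip surrounding whitespace and a leading URL or 'doi:' prefix, then lowercase."""
    cleaned = _DOI_PREFIX.sub('', raw.strip())
    return cleaned.lower()

print(normalize_doi(" https://doi.org/10.1007/s10701-005-9016-x "))
print(normalize_doi("DOI: 10.1016/J.JBI.2008.04.002"))
print(normalize_doi("http://dx.doi.org/10.1038/nature123"))
```

The normalized string can then be passed to `verify_doi_validity` unchanged.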


# For CSV Files

In [None]:
#  Define the extraction function
def extract_dois_from_text(text):
    """
    Scans a text string for DOIs using regex.
    Returns a list of unique DOIs found, or an empty list.
    """
    # The standard DOI regex
    doi_pattern = r'\b(10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+)\b'
    # The \d{4,9} range could be widened to capture more DOIs
    
    if not isinstance(text, str):
        return []
        
    matches = re.findall(doi_pattern, text)
    
    # Clean up trailing punctuation (like a period at the end of a sentence)
    unique_dois = set()
    for doi in matches:
        clean = doi.rstrip(".,)")
        unique_dois.add(clean)
        
    return list(unique_dois)

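#  Apply it to the dataframe
# Assumes `df` was loaded beforehand (e.g. with pd.read_csv) and has
# 'id', 'year', 'title' and 'paper_text' columns
print("Extracting DOIs from 'paper_text' column... this might take a moment.")
df['extracted_dois'] = df['paper_text'].apply(extract_dois_from_text)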

#  Create a count column just to see how many we found per paper
df['doi_count'] = df['extracted_dois'].apply(len)

#  Filter to show only papers where we actually found DOIs
papers_with_dois = df[df['doi_count'] > 0].copy()

print(f"\nProcessing Complete.")
print(f"Total Papers Scanned: {len(df)}")
print(f"Papers containing DOIs: {len(papers_with_dois)}")

# Show a preview of the results
if len(papers_with_dois) > 0:
    print("\n--- Preview of Papers with Extracted DOIs ---")
    # We select just the ID, Year, Title, and the list of DOIs found
    display_cols = ['id', 'year', 'title', 'extracted_dois']
    try:
        display(papers_with_dois[display_cols].head())
    except NameError:
        print(papers_with_dois[display_cols].head())
else:
    print("No DOIs found. Note: Older papers (1987-1990s) often didn't print DOIs in their bibliographies.")
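
Before running the extraction on a whole dataframe, it can help to see what the regex does on a single string. The sketch below reuses the same pattern on a made-up sentence (the sample text is an assumption for illustration; the two DOIs are the ones from the usage example above). It shows that a DOI is found behind a `doi:` prefix or inside a URL, that a sentence-ending period is not captured, and that duplicates collapse to one entry.

```python
import re

# Same pattern as in extract_dois_from_text
DOI_PATTERN = r'\b(10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+)\b'

# Invented sample text containing one DOI twice and another inside a URL
sample = (
    "See Smith (2008), doi:10.1016/j.jbi.2008.04.002. "
    "Also cited as https://doi.org/10.1007/s10701-005-9016-x, "
    "and again as 10.1016/j.jbi.2008.04.002."
)

# Deduplicate with a set, then sort so the output order is stable
found = sorted({m.rstrip(".,)") for m in re.findall(DOI_PATTERN, sample)})
print(found)
# ['10.1007/s10701-005-9016-x', '10.1016/j.jbi.2008.04.002']
```

Sorting is only there to make the result reproducible; `extract_dois_from_text` returns the set as an unordered list.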

In [None]:
# Verification function (same logic as above, but returning flat keys for the results table)
def verify_doi_validity(doi_input):
    clean_doi = doi_input.replace("https://doi.org/", "").replace("http://doi.org/", "")
    url = f"https://doi.org/{clean_doi}"
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}

    try:
        response = requests.get(url, headers=headers, allow_redirects=True, timeout=10)
        
        if response.status_code == 200:
            try:
                data = response.json()
            except ValueError:
                return {"status": "Error", "details": "Response was not valid JSON."}
            
            title = data.get('title', 'N/A')
            if isinstance(title, list) and len(title) > 0: title = title[0]
            
            journal = data.get('container-title', 'N/A')
            if isinstance(journal, list) and len(journal) > 0: journal = journal[0]

            author_lastname = "N/A"
            if 'author' in data and len(data['author']) > 0:
                author_lastname = data['author'][0].get('family', 'N/A')

            return {
                "validity": "Valid",
                "meta_title": title,
                "meta_journal": journal,
                "meta_author": author_lastname,
                "details": "OK"
            }
        elif response.status_code == 404:
            return {"validity": "Invalid", "meta_title": "-", "meta_journal": "-", "meta_author": "-", "details": "DOI Not Found"}
        else:
            return {"validity": "Error", "meta_title": "-", "meta_journal": "-", "meta_author": "-", "details": f"HTTP {response.status_code}"}

    except Exception as e:
        return {"validity": "Conn Error", "meta_title": "-", "meta_journal": "-", "meta_author": "-", "details": str(e)}


# Iterate through the papers and check their DOIs

results_list = []

# LIMITER: We only check the first 5 papers for this demo to save time.
# Remove .head(5) to run on all papers.
papers_to_check = papers_with_dois.head(5)

print(f"Starting verification on {len(papers_to_check)} papers...")

for index, row in papers_to_check.iterrows():
    paper_id = row['id']
    paper_year = row['year']
    extracted_dois = row['extracted_dois']
    
    print(f"Processing Paper ID {paper_id} ({len(extracted_dois)} DOIs found)...")
    
    for doi in extracted_dois:
        # Run the verification API
        res = verify_doi_validity(doi)
        
        # Save the result in a structured way
        results_list.append({
            "Paper_ID": paper_id,
            "Paper_Year": paper_year,
            "Checked_DOI": doi,
            "Status": res['validity'],
            "Real_Author": res['meta_author'],
            "Real_Journal": res['meta_journal'],
            "Real_Title": res['meta_title'],
            "Notes": res['details']
        })
        
        # Be polite to the API server, sleep a tiny bit
        time.sleep(0.2)

# Convert results to a DataFrame for nice display
verification_df = pd.DataFrame(results_list)

print("\n--- Verification Complete ---")

# Display valid vs invalid counts
print(verification_df['Status'].value_counts())

print("\n--- Detailed Results Table ---")
# Displaying in a nice clean format
display_cols = ['Paper_ID', 'Checked_DOI', 'Status', 'Real_Author', 'Real_Journal']
try:
    display(verification_df[display_cols])
except NameError:
    print(verification_df[display_cols])
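
When the loop above runs over many papers, the same DOI is likely to appear in several bibliographies and would be requested again each time. One way to avoid the repeated requests is to wrap the lookup in a small cache. The sketch below is an assumption about how that could be done: `make_cached_verifier` is a hypothetical helper, and `fake_verify` is a stand-in for `verify_doi_validity` so the example runs without network access.

```python
def make_cached_verifier(verify_fn):
    """Wrap a DOI lookup function so each distinct DOI is fetched only once."""
    cache = {}

    def cached(doi):
        key = doi.strip().lower()  # treat DOIs as case-insensitive
        if key not in cache:
            cache[key] = verify_fn(key)
        return cache[key]

    return cached

# Stand-in for verify_doi_validity; records every call it receives
calls = []

def fake_verify(doi):
    calls.append(doi)
    return {"validity": "Valid" if doi.startswith("10.1007/") else "Invalid"}

check = make_cached_verifier(fake_verify)
results = [check(d) for d in ["10.1007/abc", "10.1007/ABC", "10.9999/xyz", "10.1007/abc"]]

print(len(results), "lookups requested,", len(calls), "actually performed")
```

In practice one would probably skip caching results whose status is a connection error, so that a temporary network failure is retried rather than remembered.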

# Gradio Mini App

In [1]:
import gradio as gr
import re
import requests
import pandas as pd

# --- API Helper Function ---
def get_api_data(doi):
    """
    Fetches official metadata (Title, Year) from Crossref API.
    """
    if not doi:
        return "-", "-"
        
    try:
        url = f"https://api.crossref.org/works/{doi}"
        # Polite User-Agent to avoid being blocked
        headers = {"User-Agent": "ResearchParser/2.0 (mailto:test@example.com)"}
        
        response = requests.get(url, headers=headers, timeout=5)
        
        if response.status_code == 200:
            data = response.json()['message']
            
            # Extract Official Title (a list that may be missing or empty)
            api_title = (data.get('title') or ['-'])[0]
            
            # Extract Official Year
            date_parts = data.get('issued', {}).get('date-parts', [[None]])
            api_year = str(date_parts[0][0]) if date_parts[0][0] else "-"
            
            return api_title, api_year
        else:
            return "Error Fetching", "Error"
    except Exception:
        return "Connection Failed", "Error"

# --- Local Text Extraction Function ---
def extract_local_info(text):
    """
    Attempts to parse Title and Year directly from the raw input text strings.
    """
    # 1. Find DOI
    doi_match = re.search(r'\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)', text, re.IGNORECASE)
    doi = doi_match.group(1).rstrip('.') if doi_match else None
    
    # 2. Find Title (Heuristic: Look for text inside quotes)
    # Matches both standard quotes "" and smart quotes “”
    title_match = re.search(r'[“"](.*?)[”"]', text)
    local_title = title_match.group(1) if title_match else "Not found in quotes"
    
    # 3. Find Year (Heuristic: Last 4-digit number starting with 19 or 20)
    # Remove the DOI first so digits inside it (e.g. the 2008 in 10.1016/j.jbi.2008.04.002)
    # are not mistaken for the publication year
    text_without_doi = text.replace(doi_match.group(1), ' ') if doi_match else text
    # Non-capturing group, so findall returns the whole year rather than just "19" or "20"
    years = re.findall(r'\b(?:19|20)\d{2}\b', text_without_doi)
    local_year = years[-1] if years else "-"

    return doi, local_title, local_year

# --- Main Processing Logic ---
def process_references(text):
    if not text:
        return pd.DataFrame()

    # Split text based on reference numbers like [1], [2]
    raw_refs = re.split(r'(\[\d+\])', text)
    
    parsed_data = []
    current_id = ""
    
    for chunk in raw_refs:
        chunk = chunk.strip()
        if not chunk: continue
        
        # Identify Reference ID (e.g., [1])
        if re.match(r'\[\d+\]', chunk):
            current_id = chunk
        else:
            # It is the reference text content
            full_text = chunk.replace('\n', ' ')
            
            # 1. Extract from Text (Local)
            doi, local_title, local_year = extract_local_info(full_text)
            
            # 2. Extract from API (Web)
            if doi:
                api_title, api_year = get_api_data(doi)
            else:
                api_title, api_year = "-", "-"

            # 3. Append to list for the DataFrame
            parsed_data.append([
                current_id, 
                doi if doi else "-", 
                local_title,  # Extracted from text
                api_title,    # Extracted from API
                local_year,   # Extracted from text
                api_year      # Extracted from API
            ])

    # Create DataFrame
    df = pd.DataFrame(parsed_data, columns=[
        "Ref ID", 
        "DOI", 
        "Title (From Text)", 
        "Title (From API)", 
        "Year (Text)", 
        "Year (API)"
    ])
    return df

# --- Gradio UI ---

with gr.Blocks(title="Smart Citation Comparator") as demo:
    gr.Markdown("#  Smart Reference Parser & Comparator")
    gr.Markdown("Paste your bibliography below. The system will extract data from your **text** (Local) and compare it with the **Crossref Database** (API).")
    
    with gr.Row():
        # Left: Input
        with gr.Column(scale=1):
            input_text = gr.Textbox(
                lines=10, 
                label="Input Bibliography", 
                placeholder="[1] Author Name, “Paper Title Here”, Journal, 2023, doi:10.1000/xyz..."
            )
            btn = gr.Button("Analyze & Compare", variant="primary")
        
        # Right: Output
        with gr.Column(scale=3):
            output_table = gr.Dataframe(
                label="Comparison Table",
                headers=["Ref ID", "DOI", "Title (From Text)", "Title (From API)", "Year (Text)", "Year (API)"],
                interactive=False,
                wrap=True
            )

    btn.click(fn=process_references, inputs=input_text, outputs=output_table)

if __name__ == "__main__":
    demo.launch()




# Second App

In [None]:
import gradio as gr
import re
import requests
import pandas as pd

# --- CONFIGURATION: Suspicious Patterns ---
# Known "Tortured Phrases" often used by AI paraphrasing tools
TORTURED_PHRASES = [
    "counterfeit consciousness", # instead of Artificial Intelligence
    "colossal information",      # instead of Big Data
    "many-sided",                # instead of multifaceted
    "random access memory",      # usually fine, but sometimes used weirdly in context
    "sedentary phone",           # instead of mobile phone
    "creepie-crawlie",           # instead of bug/insect in bio papers
    "solar module",              # sometimes used for 'sun' generally
    "flagitious",                # archaic word often used by spinners
]

# List of highly trusted publishers (simplistic whitelist)
TRUSTED_PUBLISHERS = [
    "IEEE", "Elsevier", "Springer", "Wiley", "ACM", "Nature", "Science", 
    "Taylor & Francis", "Sage", "Oxford University Press", "Cambridge University Press"
]

# --- API Helper Function ---
def get_api_data(doi):
    if not doi:
        return "-", "-", "-"
        
    try:
        url = f"https://api.crossref.org/works/{doi}"
        headers = {"User-Agent": "ResearchParser/3.0 (mailto:test@example.com)"}
        
        response = requests.get(url, headers=headers, timeout=5)
        
        if response.status_code == 200:
            data = response.json()['message']
            
            # Title is a list that may be missing or empty
            api_title = (data.get('title') or ['-'])[0]
            publisher = data.get('publisher', '-')
            
            date_parts = data.get('issued', {}).get('date-parts', [[None]])
            api_year = str(date_parts[0][0]) if date_parts[0][0] else "-"
            
            return api_title, api_year, publisher
        else:
            return "Error Fetching", "Error", "Unknown"
    except Exception:
        return "Connection Failed", "Error", "Unknown"

# --- Risk Analysis Function ---
def analyze_risk(title, publisher):
    flags = []
    score = 0
    
    # Check 1: Tortured Phrases in Title
    if title and title != "-":
        lower_title = title.lower()
        for phrase in TORTURED_PHRASES:
            if phrase in lower_title:
                flags.append(f"Suspicious Phrase: '{phrase}'")
                score += 3
                # The weight of 3 is arbitrary

    # Check 2: Title Length (Too short titles are suspicious)
    if title and len(title) < 15 and title != "-":
         flags.append("Title too short")
         score += 1

    # Check 3: Publisher Analysis
    if publisher == "-" or publisher == "Unknown":
        flags.append("Unknown Publisher")
        score += 1
    else:
        # Check if publisher contains trusted keywords
        is_trusted = any(tp.lower() in publisher.lower() for tp in TRUSTED_PUBLISHERS)
        if not is_trusted:
            flags.append("Niche/Unknown Publisher") # Not necessarily bad, but worth noting
            score += 0.5

    # Determine Level
    if score >= 3:
        return "🔴 HIGH RISK", ", ".join(flags)
    elif score >= 1:
        return "🟡 MODERATE", ", ".join(flags)
    else:
        return "🟢 LOW", "Looks Standard"

# --- Local Extraction ---
def extract_local_info(text):
    doi_match = re.search(r'\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)', text, re.IGNORECASE)
    doi = doi_match.group(1).rstrip('.') if doi_match else None
    
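    title_match = re.search(r'[“"](.*?)[”"]', text)
    local_title = title_match.group(1) if title_match else "Not found in quotes"
    
    # Remove the DOI first so digits inside it are not mistaken for the year
    text_without_doi = text.replace(doi_match.group(1), ' ') if doi_match else text
    # Non-capturing group, so findall returns the whole year rather than just "19" or "20"
    years = re.findall(r'\b(?:19|20)\d{2}\b', text_without_doi)
    local_year = years[-1] if years else "-"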

    return doi, local_title, local_year

# --- Main Logic ---
def process_references(text):
    if not text:
        return pd.DataFrame()

    raw_refs = re.split(r'(\[\d+\])', text)
    parsed_data = []
    current_id = ""
    
    for chunk in raw_refs:
        chunk = chunk.strip()
        if not chunk: continue
        
        if re.match(r'\[\d+\]', chunk):
            current_id = chunk
        else:
            full_text = chunk.replace('\n', ' ')
            
            # 1. Local Extract
            doi, local_title, local_year = extract_local_info(full_text)
            
            # 2. API Extract
            if doi:
                api_title, api_year, publisher = get_api_data(doi)
            else:
                api_title, api_year, publisher = "-", "-", "-"

            # 3. Analyze Risk (Using API Title preferably, otherwise Local)
            target_title = api_title if api_title != "-" else local_title
            risk_level, risk_details = analyze_risk(target_title, publisher)

            parsed_data.append([
                current_id, 
                doi if doi else "-", 
                api_title, 
                publisher,
                risk_level,
                risk_details
            ])

    df = pd.DataFrame(parsed_data, columns=[
        "ID", "DOI", "Official Title", "Publisher", "Risk Level", "Risk Details"
    ])
    return df

# --- Gradio UI ---
with gr.Blocks(title="Fake Paper Detector") as demo:
    gr.Markdown("# Advanced Citation & Risk Analyzer")
    gr.Markdown("Checks for: **Tortured Phrases** (AI generation indicators) and **Publisher Credibility**.")
    
    with gr.Row():
        with gr.Column(scale=1):
            input_text = gr.Textbox(lines=8, label="Input References", placeholder="Paste references here...")
            btn = gr.Button("Analyze Risks", variant="primary")
        
        with gr.Column(scale=3):
            output_table = gr.Dataframe(
                label="Risk Analysis Report",
                headers=["ID", "DOI", "Official Title", "Publisher", "Risk Level", "Risk Details"],
                interactive=False,
                wrap=True
            )

    btn.click(fn=process_references, inputs=input_text, outputs=output_table)

if __name__ == "__main__":
    demo.launch()


# Fake Reference Detection Using Author Names

In [3]:
import gradio as gr
import re
import requests
import pandas as pd
from difflib import SequenceMatcher

# --- Helper: Calculate Text Similarity ---
def similarity_score(a, b):
    if not a or not b: return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# --- API Helper Function ---
def search_crossref(doi=None, title_query=None):
    """
    Returns: (Title, List of Family Names, Source Note)
    """
    try:
        headers = {"User-Agent": "AuthCheck/5.0 (mailto:test@example.com)"}
        data = None
        note = ""

        if doi:
            url = f"https://api.crossref.org/works/{doi}"
            response = requests.get(url, headers=headers, timeout=5)
            if response.status_code == 200:
                data = response.json()['message']
                note = "Found via DOI"
        
        elif title_query:
            url = "https://api.crossref.org/works"
            # Request top 1 result
            params = {'query.bibliographic': title_query, 'rows': 1}
            response = requests.get(url, headers=headers, params=params, timeout=5)
            if response.status_code == 200:
                items = response.json()['message']['items']
                if items:
                    data = items[0]
                    note = f"Found via Title Search (DOI: {data.get('DOI', '-')})"

        if data:
            # Title is a list that may be missing or empty
            api_title = (data.get('title') or [''])[0]
            
            # Extract Authors (List of family names)
            authors_raw = data.get('author', [])
            # We specifically get 'family' name because 'given' names vary (J. vs John)
            api_authors = [a.get('family', '') for a in authors_raw if 'family' in a]
            
            return api_title, api_authors, note
            
        return None, [], "Not Found"

    except Exception as e:
        return None, [], f"Error: {str(e)}"

# --- Local Extraction ---
def extract_local_info(text):
    # 1. Find DOI
    doi_match = re.search(r'\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)', text, re.IGNORECASE)
    doi = doi_match.group(1).rstrip('.') if doi_match else None
    
    # 2. Find Title (Text inside quotes)
    # Regex Explanation: Capture everything inside the first pair of quotes
    title_match = re.search(r'[“"](.*?)[”"]', text)
    local_title = title_match.group(1) if title_match else None
    
    # 3. Find Author (Heuristic: Everything BEFORE the opening quote)
    # We assume format is: Author Names "Title" ...
    local_author_str = "Unknown"
    if title_match:
        # Get the start index of the quote
        quote_start = title_match.start()
        # Slice text from 0 to quote_start
        pre_title_text = text[:quote_start].strip()
        # Remove trailing commas or periods
        local_author_str = pre_title_text.rstrip('.,: ')
    
    return doi, local_title, local_author_str

# --- Validation Logic ---
def check_author_match(local_str, api_authors_list):
    """
    Checks if any of the official API authors appear in the local string.
    """
    if not api_authors_list:
        return "❓ No Data", False
        
    # Check if ANY official family name is present in the user's text
    # Note: this is a plain substring test, so very short names like "Li" or "Ng" can match by accident
    found_authors = []
    for auth in api_authors_list:
        if auth.lower() in local_str.lower():
            found_authors.append(auth)
            
    if found_authors:
        return f"✅ Match ({', '.join(found_authors)})", True
    else:
        # Create a string of expected authors for the error message
        expected = ", ".join(api_authors_list[:3]) # Show max 3
        return f"❌ Mismatch (Expected: {expected}...)", False

# --- Main Processor ---
def verify_full_citation(text):
    if not text:
        return pd.DataFrame()

    raw_refs = re.split(r'(\[\d+\])', text)
    results = []
    current_id = "?"
    
    for chunk in raw_refs:
        chunk = chunk.strip()
        if not chunk: continue
        
        if re.match(r'\[\d+\]', chunk):
            current_id = chunk
        else:
            full_text = chunk.replace('\n', ' ')
            
            # 1. Extract Local Data
            doi, local_title, local_author_str = extract_local_info(full_text)
            
            api_title = "-"
            api_authors = []
            status_title = "Unknown"
            status_author = "Unknown"
            
            # 2. Fetch Data (Strategy: DOI first, then Title)
            if doi:
                found_title, found_authors, note = search_crossref(doi=doi)
            elif local_title:
                found_title, found_authors, note = search_crossref(title_query=local_title)
            else:
                found_title, found_authors, note = None, [], "Format Error"

            # 3. Verify Title
            if found_title:
                api_title = found_title
                api_authors = found_authors
                
                # Calculate Similarity
                if local_title:
                    score = similarity_score(local_title, api_title) * 100
                    if score > 80:
                        status_title = "✅ Title Verified"
                    elif score > 50:
                        status_title = f"⚠️ Low Similarity ({int(score)}%)"
                    else:
                        status_title = "⛔ Title Mismatch"
                else:
                    status_title = "⚠️ No Local Title"
                    
                # 4. Verify Author (Only if title was found)
                status_author, is_author_ok = check_author_match(local_author_str, api_authors)
                
            else:
                status_title = "❌ Paper Not Found"
                status_author = "-"

            results.append([
                current_id,
                local_title[:30] + "..." if local_title else "-", # Shorten for display
                local_author_str[:20] + "..." if local_author_str else "-",
                status_title,
                status_author,
                note
            ])

    df = pd.DataFrame(results, columns=[
        "ID", "Local Title", "Local Author", "Title Check", "Author Check", "Source"
    ])
    return df

# --- Gradio UI ---
with gr.Blocks(title="Full Citation Auditor") as demo:
    gr.Markdown("# 👮‍♂️ Full Citation Auditor")
    gr.Markdown("Checks both **Title** AND **Author** validity.")
    gr.Markdown("1. Extracts Author & Title from your text.\n2. Finds the official paper.\n3. Compares if the Author matches the real paper.")
    
    with gr.Row():
        with gr.Column(scale=1):
            input_text = gr.Textbox(
                lines=10, 
                label="Paste References", 
                placeholder="[1] Vaswani, A. \"Attention Is All You Need\". NIPS, 2017.\n[2] Einstein, A. \"Deep Learning for Cats\". Fake Journal, 2024."
            )
            btn = gr.Button("Audit Citations", variant="primary")
        
        with gr.Column(scale=3):
            output_table = gr.Dataframe(
                label="Audit Report",
                headers=["ID", "Local Title", "Local Author", "Title Check", "Author Check", "Source"],
                interactive=False,
                wrap=True
            )

    btn.click(fn=verify_full_citation, inputs=input_text, outputs=output_table)

if __name__ == "__main__":
    demo.launch()
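
The comment in `check_author_match` mentions that short family names such as "Li" or "Ng" are easy to match by accident, because a substring test will find "li" inside "Oliver". One possible refinement is to require the name to appear as a whole word. The sketch below shows the idea; `family_name_in_text` is a hypothetical helper and not part of the app above.

```python
import re

def family_name_in_text(family_name, text):
    """True if family_name appears in text as a whole word, case-insensitively."""
    # Lookarounds reject matches that are glued to other word characters
    pattern = r'(?<!\w)' + re.escape(family_name) + r'(?!\w)'
    return re.search(pattern, text, re.IGNORECASE) is not None

print(family_name_in_text("Li", "Oliver, J. and Smith, K."))  # False: only inside "Oliver"
print(family_name_in_text("Li", "Li, X. and Oliver, J."))     # True: stands alone
print(family_name_in_text("Vaswani", "VASWANI, A."))          # True: case is ignored
```

Swapping this test in for `auth.lower() in local_str.lower()` would keep the rest of the function unchanged. Names that are transliterated or spelled differently in the registry would still be missed.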


