<a href="https://colab.research.google.com/github/boscokante/test/blob/master/credit_report_analyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [44]:
# Cell 1: Install and Upgrade Libraries

!pip install --upgrade openai pdfplumber




In [45]:
# Cell 2: Imports and API Key

import openai
import pdfplumber

# Replace with your actual OpenAI key
openai.api_key = "OPEN AI KEY"
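
Pasting the key into the cell works, but the key is then saved with the notebook and shared along with it. The following is a minimal sketch of one alternative, assuming the key is available in an environment variable named `OPENAI_API_KEY`. The helper name `resolve_api_key` and its parameters are hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass


def resolve_api_key(env_var: str = "OPENAI_API_KEY", prompt_if_missing: bool = False) -> str:
    """Return the key from the environment, optionally prompting when it is absent.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key and prompt_if_missing:
        # getpass hides the typed value so it is not echoed into the cell output.
        key = getpass(f"Enter {env_var}: ").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


# Possible usage in this notebook (commented out so the sketch runs without a key):
# openai.api_key = resolve_api_key(prompt_if_missing=True)
```

With `prompt_if_missing=True` the cell asks for the key interactively, so nothing sensitive is written into the notebook source.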


In [46]:
# Cell 3: Function to Extract Text from the PDF

def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract text from each page of a text-based PDF and return a single string."""
    text_content = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                text_content.append(page_text)
    return "\n".join(text_content)
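
The extracted text is sent to the model in full, alongside instructions and a reserved reply of 3000 tokens, so a long report may not fit in the model's context window. The sketch below gives a rough size check that could run before any API call. It assumes about 4 characters per token (only a rule of thumb for English text) and an 8,192-token window for `gpt-4`; both are assumptions to verify against the model's documentation. The helper names are hypothetical, and an exact count would need a real tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is only a rule of thumb."""
    return int(len(text) / chars_per_token) + 1


def fits_context(text: str, context_window: int = 8192,
                 reply_tokens: int = 3000, prompt_overhead: int = 500) -> bool:
    """True if the text likely fits next to the instructions and the reserved reply.

    context_window and prompt_overhead are assumed values, not measured ones.
    """
    budget = context_window - reply_tokens - prompt_overhead
    return estimate_tokens(text) <= budget


# Small illustrative input; a real report would come from extract_text_from_pdf.
sample = "Account status: CHARGE_OFF. Balance: $949. " * 10
print(estimate_tokens(sample), fits_context(sample))
```

If the check fails, options include sending only the pages that contain the tradeline or choosing a model with a larger window.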



In [47]:
# Cell 4: Step A – Identify Detailed Errors

def step_a_identify_detailed_errors(pdf_text: str) -> str:
    """
    Step A: Force GPT to identify an exhaustive list of errors, without summarizing or writing a final dispute letter.
    """
    system_prompt = "You are a thorough credit report analyst."

    user_prompt = f"""
    I have the following PDF text from a credit report with a single negative tradeline.
    I want you to identify every possible error, omission, or inconsistency in thorough detail.
    DO NOT summarize and DO NOT write a dispute letter yet.
    Instead, list each category of errors carefully, referencing specific data in the PDF.

    Use the following checklist, and add more categories if you see relevant issues:
    1. Charge-Off Status vs. Available Credit
    2. Balance Consistency Over Time
    3. Scheduled/Actual Payment Data
    4. Amount Past Due Over Time
    5. Payment History Inconsistencies
    6. Credit Limit vs. Negative Available Credit
    7. Charge-Off Amount vs. Balance
    8. Dates of First Delinquency and Closure
    9. Missing Account Details (e.g., Date of Last Activity, High Credit, Term Duration)
    10. Months Reviewed
    11. Creditor Classification
    12. Additional Comments/Discrepancies

    FORMAT:
    - Numbered or bullet-pointed list
    - Each point should have “Error:” or “Discrepancy:” lines
      referencing exact or approximate text from the PDF.
    - Provide as many details as possible.
    - This is not a dispute letter, just the error analysis.

    PDF TEXT:
    {pdf_text}
    """

    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.0,    # Minimal "creativity" -> more literal
        max_tokens=3000     # Large enough to hold long output
    )
    return response.choices[0].message.content
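
Because the prompt is a triple-quoted f-string written inside the function body, every prompt line reaches the model with four leading spaces, and the checklist is fixed inside the text. One possible alternative, sketched below, keeps the checklist as data and assembles the prompt by joining lines. `CHECKLIST` and `build_step_a_prompt` are hypothetical names, and the checklist is shortened to its first three items for illustration.

```python
from typing import Sequence

# Shortened for illustration; the notebook's full checklist has 12 items.
CHECKLIST = (
    "Charge-Off Status vs. Available Credit",
    "Balance Consistency Over Time",
    "Scheduled/Actual Payment Data",
)


def build_step_a_prompt(pdf_text: str, checklist: Sequence[str] = CHECKLIST) -> str:
    """Assemble a Step A style user prompt without function-body indentation."""
    numbered = [f"{i}. {item}" for i, item in enumerate(checklist, start=1)]
    parts = [
        "Identify every possible error, omission, or inconsistency in the credit report text below.",
        "DO NOT summarize and DO NOT write a dispute letter yet.",
        "",
        "Use the following checklist, and add more categories if you see relevant issues:",
        *numbered,
        "",
        "PDF TEXT:",
        pdf_text,  # inserted last and unmodified, so the report keeps its own layout
    ]
    return "\n".join(parts)


print(build_step_a_prompt("Status: CHARGE_OFF\nAvailable credit: -$449"))
```

Keeping the checklist in a sequence also makes it easy to add or remove categories per report without editing the prompt text.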





In [48]:
# Cell 5: Step B – Write Final Dispute Letter in Your Desired Style



def step_b_format_final(detailed_errors: str) -> str:
    """
    Step B: Takes the thorough error list from step A and transforms it into
    the final format with:
      1. main headings
         a.  Error:
         i.  question...
    etc.
    """

    system_prompt = "You are a thorough credit report analyst."

    # Paste your final format style below as the 'desired_output_example'.
    # We'll show a simplified snippet, but you'd paste your entire final structure.
    desired_output_example = r"""
Below is the style we want:

1. Charge-Off Status and Negative Available Credit
    a.  Error: The account is marked as "Charge-Off," but the available credit is listed as -$449.
    i.  How does a charge-off account have an available credit and what does negative available credit mean?
    ii. If I pay $949 will there be $500 in available credit?

2. Balance Consistency
    a.  Error: The reported balance remains static at $949 from 2023 through 2024.
    i.  Were there any interest charges, fees, or adjustments?
    ii. How can the balance remain constant?

(And so on for each item.)
Requested Action:
Please correct all errors and resolve all discrepancies or remove this account from my report.

Below Are Excerpts from Credit Report (Pages 4-6):
...
"""

    user_prompt = f"""
    We have a detailed error list from a credit report. Please convert it into
    the exact final format shown in the 'desired_output_example' below.

    DETAILED ERRORS:
    {detailed_errors}

    DESIRED OUTPUT EXAMPLE:
    {desired_output_example}

    Instructions:
    1. Use the numbering style shown in the example:
       1., 2., 3. ...
         a. Error: ...
         i. question...
         ii. question...
    2. Indent exactly as in the example (with spaces).
    3. Do NOT use "Questions" headings—just use "i., ii." lines after "Error:".
    4. End with a "Requested Action" section.
    5. Add "Below Are Excerpts from Credit Report..." at the end if appropriate.
    6. Keep the content of 'detailed_errors', but rephrase as needed
       to match the final style. You can rearrange or condense if duplicates exist,
       but do not omit any major issues.
    """

    # Uses the chat completions interface from openai>=1.0.0:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.0,
        max_tokens=3000
    )

    # Return the generated text
    return response.choices[0].message.content
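
The model is not guaranteed to follow the requested layout, even at temperature 0, so a light structural check on the Step B output can flag a reply that drifted from it. The sketch below uses regular expressions derived from the example layout (numbered headings, indented `a.  Error:` lines, indented `i.`/`ii.` question lines, and a "Requested Action" section). `check_final_format` is a hypothetical helper, and the patterns are assumptions about what counts as well formed.

```python
import re


def check_final_format(text: str) -> list:
    """Return a list of problems found; an empty list means the layout looks right."""
    problems = []
    # Top-level headings such as "1. Balance Consistency" at the start of a line.
    if not re.search(r"^\d+\.[ \t]+\S", text, flags=re.MULTILINE):
        problems.append("no numbered headings")
    # Indented "a.  Error:" lines under a heading.
    if not re.search(r"^[ \t]+a\.[ \t]+Error:", text, flags=re.MULTILINE):
        problems.append('no "a. Error:" lines')
    # Indented question lines labelled "i.", "ii.", "iii.".
    if not re.search(r"^[ \t]+i+\.[ \t]+\S", text, flags=re.MULTILINE):
        problems.append("no question lines")
    if "Requested Action" not in text:
        problems.append('missing "Requested Action" section')
    return problems


sample = """1. Balance Consistency
    a.  Error: The reported balance remains static at $949.
    i.  Were there any interest charges, fees, or adjustments?

Requested Action:
Please correct all errors."""
print(check_final_format(sample))  # prints []
```

A non-empty result could be used to retry the call or to review the reply by hand before sending anything to a credit bureau.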


In [49]:
# Cell 6: Putting It All Together

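# 1. Extract PDF text
pdf_text = extract_text_from_pdf("/content/2025.01.18 Barclays Equifax.pdf")
if not pdf_text.strip():
    # extract_text_from_pdf only handles text-based PDFs; a scanned report yields nothing.
    raise ValueError("No text was extracted from the PDF; it may be a scanned image.")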

# 2. Step A - Identify Detailed Errors
step_a_output = step_a_identify_detailed_errors(pdf_text)
print("====== DETAILED ERRORS (STEP A) ======")
print(step_a_output)

# 3. Step B - Format into final style
final_output = step_b_format_final(step_a_output)
print("\n\n====== FINAL FORMAT ======")
print(final_output)


1. Error: Charge-Off Status vs. Available Credit: The account status is listed as "CHARGE_OFF" but the available credit is listed as "-$449". This is inconsistent as a charged-off account should not have any available credit.

2. Discrepancy: Balance Consistency Over Time: The balance is consistently listed as $949 for every month of 2023 and 2024. This is unusual as the balance on a revolving account typically fluctuates with usage and payments.

3. Error: Scheduled/Actual Payment Data: There is no data provided for scheduled or actual payments for the years 2023, 2024, and 2025. This is an omission as payment history is a crucial part of a credit report.

4. Discrepancy: Amount Past Due Over Time: The amount past due is consistently listed as $949 for every month of 2023 and 2024. This is inconsistent with the account status of "CHARGE_OFF" as the amount past due should be zero after an account is charged off.

5. Error: Payment History Inconsistencies: The payment history shows the 