# AI-Powered Investor Guideline Parser

Extract pricing and eligibility rules from investor guideline PDFs using NLP.

In [8]:
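!pip install pdfplumber spacy
# Download the small English model loaded in Step 2 (skipped quickly if already present)
!python -m spacy download en_core_web_sm
import pdfplumber
import spacy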



## Step 1: Load and Extract Text from PDF

In [9]:
# Upload the PDF from your local machine (requires Google Colab)
from google.colab import files
uploaded = files.upload()  # Opens a file-picker dialog

# Use the name of the first uploaded file
pdf_filename = list(uploaded.keys())[0]

# Extract text page by page with pdfplumber
with pdfplumber.open(pdf_filename) as pdf:
    text = ''
    for page in pdf.pages:
        # Guard against pages with no extractable text (e.g. scanned images),
        # for which extract_text() may return None
        text += (page.extract_text() or '') + '\n'

# Preview the first 1000 characters of extracted text
print(text[:1000])




Saving sample_investor_guideline.pdf to sample_investor_guideline (1).pdf
Investor Guideline Summary
Product: 30-Year Fixed Rate
Minimum Credit Score: 620
Maximum Loan-to-Value (LTV): 80%
Minimum Loan Amount: $50,000
Maximum Loan Amount: $647,200
Eligible Property Types:
- Single Family Residence
- Condo
- Townhome
Ineligible Property Types:
- Mobile Homes
- Agricultural Properties
Debt-to-Income (DTI) Ratio:
- Max DTI: 43%
Documentation Type:
- Full Documentation Required
Interest Rate Adjustments:
- +0.25% if credit score < 660
- +0.125% if LTV > 75%
Borrower Eligibility:
- US Citizens and Permanent Residents only
- No bankruptcies or foreclosures within past 4 years




## Step 2: Apply Named Entity Recognition (NER)

In [10]:

# Load a pre-trained spaCy model
nlp = spacy.load("en_core_web_sm")

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)


Guideline Summary PERSON
30-Year DATE
620 CARDINAL
LTV ORG
80% PERCENT
50,000 MONEY
647,200 MONEY
DTI ORG
43% PERCENT
660 CARDINAL
LTV ORG
75% PERCENT
past 4 years DATE
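
The general-purpose model finds the numbers but not what they mean: `LTV` and `DTI` come back as `ORG`, and `Guideline Summary` as `PERSON`. Because the sample guideline is laid out as `Label: value` lines, one possible complement to NER is a plain regular-expression pass that keeps each value attached to its label. The sketch below assumes that layout holds; `SAMPLE_TEXT`, `to_number`, and `extract_labeled_values` are hypothetical names introduced for illustration, not part of pdfplumber or spaCy.

```python
import re

# Excerpt of the text extracted in Step 1, inlined so the sketch runs on its own.
SAMPLE_TEXT = """Product: 30-Year Fixed Rate
Minimum Credit Score: 620
Maximum Loan-to-Value (LTV): 80%
Minimum Loan Amount: $50,000
Maximum Loan Amount: $647,200
Debt-to-Income (DTI) Ratio:
- Max DTI: 43%
"""

# "Label: value" on a single line, optionally prefixed by a list dash.
# Header lines with nothing after the colon are deliberately not matched.
LINE_RE = re.compile(
    r"^[ \t]*-?[ \t]*(?P<label>[^:\n]+):[ \t]*(?P<value>\S.*)$",
    re.MULTILINE,
)


def to_number(raw):
    """Convert '620', '80%', or '$647,200' to a number; return other text unchanged."""
    cleaned = raw.strip().replace("$", "").replace(",", "").rstrip("%")
    try:
        return float(cleaned) if "." in cleaned else int(cleaned)
    except ValueError:
        return raw.strip()


def extract_labeled_values(text):
    """Hypothetical helper: map each 'Label: value' line to {label: parsed value}."""
    return {m["label"].strip(): to_number(m["value"]) for m in LINE_RE.finditer(text)}


labeled = extract_labeled_values(SAMPLE_TEXT)
for label, value in labeled.items():
    print(f"{label!r}: {value!r}")
```

In the notebook, `extract_labeled_values(text)` could be called on the full extracted text instead of the inlined excerpt. The approach is brittle by design: it only works while the guideline keeps one rule per line.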


## Step 3: Structure Rules

In [11]:

import json

# Example: rules are transcribed by hand from the extracted text for now.
# max_ltv is a percentage (80 means 80%).
structured_rules = {
    "min_credit_score": 620,
    "max_ltv": 80,
    "loan_term": "30_year_fixed"
}

print(json.dumps(structured_rules, indent=2))


{
  "min_credit_score": 620,
  "max_ltv": 80,
  "loan_term": "30_year_fixed"
}
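
The hand-written dictionary covers three fields. As a rough sketch of how the remaining sections of the sample guideline might be captured, the code below groups bulleted items under their section headers and parses the interest rate adjustment lines (for example `+0.25% if credit score < 660`) into condition records. It assumes the exact layout shown in the Step 1 preview; `parse_sections`, `parse_adjustment`, and the output key names are hypothetical choices for illustration.

```python
import json
import re

# Excerpt of the Step 1 text, inlined so the sketch is self-contained.
SAMPLE_TEXT = """Eligible Property Types:
- Single Family Residence
- Condo
- Townhome
Ineligible Property Types:
- Mobile Homes
- Agricultural Properties
Interest Rate Adjustments:
- +0.25% if credit score < 660
- +0.125% if LTV > 75%
"""

# Matches items such as "+0.25% if credit score < 660" or "+0.125% if LTV > 75%".
ADJUSTMENT_RE = re.compile(
    r"(?P<delta>[+-]\d+(?:\.\d+)?)%\s+if\s+(?P<field>.+?)\s*"
    r"(?P<op><=|>=|<|>)\s*(?P<threshold>\d+(?:\.\d+)?)%?$"
)


def parse_sections(text):
    """Group '- item' bullets under the most recent 'Header:' line."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(":") and not line.startswith("-"):
            current = line[:-1]
            sections[current] = []
        elif line.startswith("-") and current is not None:
            sections[current].append(line[1:].strip())
    return sections


def parse_adjustment(item):
    """Turn one adjustment bullet into a dict, or return None if it does not match."""
    m = ADJUSTMENT_RE.match(item)
    if m is None:
        return None
    return {
        "rate_delta_pct": float(m["delta"]),
        "field": m["field"].lower().replace(" ", "_"),
        "operator": m["op"],
        "threshold": float(m["threshold"]),
    }


sections = parse_sections(SAMPLE_TEXT)
extended_rules = {
    "eligible_property_types": sections.get("Eligible Property Types", []),
    "ineligible_property_types": sections.get("Ineligible Property Types", []),
    "rate_adjustments": [
        adj
        for adj in map(parse_adjustment, sections.get("Interest Rate Adjustments", []))
        if adj is not None
    ],
}
print(json.dumps(extended_rules, indent=2))
```

Storing each adjustment as a field, operator, and threshold keeps the pricing rule machine-readable, so a later step could evaluate it against a loan scenario rather than re-parsing the sentence.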
