# AI Property Due Diligence Assistant - Google Colab Version
### OIDD 2550 - Lab 5: LLM Pitch Project

**This notebook runs in Google Colab using cloud-based LLMs instead of local Ollama**

---

## Setup Instructions

1. **Upload to Google Colab**: File → Upload notebook
2. **Set Runtime**: Runtime → Change runtime type → T4 GPU (free tier)
3. **Run all cells**: Runtime → Run all
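
Before running everything, it can help to confirm that a GPU runtime is actually attached. The following is a minimal sketch using only the standard library; the helper name `gpu_runtime_attached` is hypothetical, and it assumes that a working `nvidia-smi` binary is a reasonable signal that a GPU is present.

```python
import shutil
import subprocess


def gpu_runtime_attached() -> bool:
    """Heuristic: True if `nvidia-smi -L` runs successfully on this machine."""
    if shutil.which("nvidia-smi") is None:
        return False  # binary not found, most likely a CPU-only runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the visible GPUs
            capture_output=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if gpu_runtime_attached():
    print("GPU runtime detected")
else:
    print("No GPU detected: check Runtime > Change runtime type")
```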

---

## Option 1: Use HuggingFace Models (Free, No API Key)

This option runs a small open-weight model (Phi-2 below) directly on Colab's free GPU, so no API key is needed.

In [None]:
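# Install dependencies (bitsandbytes is needed for 8-bit loading; langchain-community provides the Chroma wrapper)
!pip install -q transformers accelerate bitsandbytes langchain langchain-community langchain-huggingface chromadb \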
              pandas numpy matplotlib seaborn plotly folium PyPDF2 python-docx \
              sentence-transformers scikit-learn \
              opentelemetry-api==1.37.0 \
              opentelemetry-sdk==1.37.0 \
              opentelemetry-exporter-otlp-proto-common==1.37.0 \
              opentelemetry-exporter-otlp-proto-grpc==1.37.0 \
              opentelemetry-proto==1.37.0 \
              opentelemetry-semantic-conventions==0.58b0

print("✅ Dependencies installed!")

In [None]:
# Import libraries
import os
import warnings
warnings.filterwarnings('ignore')

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.schema import Document

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from datetime import datetime

print("✅ Libraries imported!")

### Load Small LLM Model (Phi-2 or TinyLlama)

We'll use Microsoft's Phi-2 (2.7B parameters) - it's small enough for Colab free tier but still capable.

In [None]:
# Load Phi-2 model (2.7B parameters - fits in Colab free tier)
from transformers import BitsAndBytesConfig

print("🔄 Loading Phi-2 model (this takes 2-3 minutes)...")

model_name = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",  # Automatically uses GPU if available
    # Quantize to 8-bit to save memory (requires the bitsandbytes package)
    quantization_config=BitsAndBytesConfig(load_in_8bit=True)
)

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,      # Without sampling, temperature and top_p are ignored
    temperature=0.3,
    top_p=0.95,
    repetition_penalty=1.15
)

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

print("✅ Model loaded and ready!")

## Option 2: Use OpenAI API (Requires API Key)

If you have an OpenAI API key, this gives better results than Phi-2.

In [None]:
# OPTION 2: Use OpenAI (uncomment if you have API key)

# !pip install -q openai langchain-openai

# from langchain_openai import ChatOpenAI
# from google.colab import userdata

# # Store your API key in Colab Secrets (left sidebar → key icon)
# # Name it: OPENAI_API_KEY
# api_key = userdata.get('OPENAI_API_KEY')

# llm = ChatOpenAI(
#     model="gpt-3.5-turbo",
#     temperature=0.3,
#     api_key=api_key
# )

# print("✅ OpenAI LLM ready!")

## Option 3: Use Groq API (Free, Fast)

Groq offers a free API tier with hosted Llama 3.1 models (this notebook uses the 8B variant) - faster than running on the Colab GPU!

In [None]:
# OPTION 3: Use Groq (free, very fast)

# !pip install -q groq langchain-groq

# from langchain_groq import ChatGroq
# from google.colab import userdata

# # Get free API key at: https://console.groq.com
# # Store in Colab Secrets as: GROQ_API_KEY
# api_key = userdata.get('GROQ_API_KEY')

# llm = ChatGroq(
#     model="llama-3.1-8b-instant",
#     temperature=0.3,
#     api_key=api_key
# )

# print("✅ Groq LLM ready (Llama 3.1 8B)!")

---

## Rest of the Notebook (Same as Local Version)

Everything below this point works identically to the local version.

In [None]:
# Configuration
CONFIG = {
    'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2',
    'chunk_size': 1000,
    'chunk_overlap': 200,
    'vector_db_path': './chroma_db',
    'top_k': 5
}

RISK_WEIGHTS = {
    'Structural': 0.30,
    'Financial': 0.30,
    'Legal': 0.20,
    'Operational': 0.10,
    'Market': 0.10
}

print("✅ Configuration set")
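
The weighted overall score computed later in the notebook only stays on a 0-100 scale if the category weights sum to 1. A small sanity check along these lines can catch a typo in the weights early; `validate_weights` is a hypothetical helper, not part of the original notebook.

```python
import math

# Weights copied from the configuration cell above.
RISK_WEIGHTS = {
    'Structural': 0.30,
    'Financial': 0.30,
    'Legal': 0.20,
    'Operational': 0.10,
    'Market': 0.10,
}


def validate_weights(weights, tol=1e-9):
    """Return the total weight; raise ValueError if weights are invalid."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"weights sum to {total:.3f}, expected 1.0")
    return total


validate_weights(RISK_WEIGHTS)
print("Risk weights sum to 1.0")
```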

In [None]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(
    model_name=CONFIG['embedding_model']
)

# Text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=CONFIG['chunk_size'],
    chunk_overlap=CONFIG['chunk_overlap']
)

print("✅ Embeddings ready")
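
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sketch of overlapping chunking. It is an assumption-laden approximation: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph breaks, whereas this hypothetical `sliding_window_chunks` helper just slides a fixed-width window over the characters.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo_text = "".join(chr(97 + i % 26) for i in range(2500))  # 2,500 dummy characters
demo_chunks = sliding_window_chunks(demo_text)
print([len(c) for c in demo_chunks])  # [1000, 1000, 900]
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.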

In [None]:
# Sample data (same as local version)
SAMPLE_PROPERTY = {
    'address': '1234 Oak Street, Austin, TX 78701',
    'property_type': 'Multifamily - 4 units',
    'asking_price': 425000,
    'year_built': 1985,
    'sqft': 3200,
}

SAMPLE_INSPECTION = """
PROPERTY INSPECTION REPORT

CRITICAL ISSUES:
1. HVAC System - Unit 2: 18 years old, refrigerant leak. Replacement cost: $5,000-$7,000.
2. Roof: 19 years old, multiple damaged shingles, water staining. Remaining life: 2-3 years. Replacement: $12,000-$15,000.

MODERATE ISSUES:
3. Electrical Panel: Outdated 60 amp service. Upgrade recommended: $3,000-$4,000.
4. Water Heaters: Both over 10 years old. Replacement within 2-3 years: $1,200 each.
5. Foundation: Minor hairline cracks in southeast corner. Monitor for changes.

TOTAL ESTIMATED DEFERRED MAINTENANCE: $25,000-$35,000
"""

DOMAIN_KNOWLEDGE = """
REAL ESTATE VALUATION PRINCIPLES:

Cap Rate Analysis:
- Cap Rate = NOI / Property Value
- Austin multifamily: typically 4.5% - 6.5%

Red Flags:
1. Structural: Foundation cracks > 1/4 inch, mold, roof leaks, HVAC > 15 years
2. Financial: Vacancy > 15%, delinquencies > 30 days, expense ratio > 50%
3. Legal: Unpermitted work, zoning violations
4. Operational: High turnover, month-to-month leases

Renovation Costs (Austin, 2024):
- HVAC: $5,000-$8,000 per unit
- Roof: $8,000-$15,000
- Foundation repair: $2,000-$10,000
"""

print("✅ Sample data created")
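
The cap rate formula in the domain knowledge above can be made concrete with a short worked example. The sample data does not include net operating income, so the NOI figure below is a made-up placeholder used purely for illustration; the asking price and the 4.5%-6.5% range come from the sample data.

```python
def cap_rate(noi, property_value):
    """Cap Rate = NOI / Property Value."""
    return noi / property_value


def implied_value(noi, rate):
    """Rearranged formula: Property Value = NOI / Cap Rate."""
    return noi / rate


ASKING_PRICE = 425_000     # from SAMPLE_PROPERTY
HYPOTHETICAL_NOI = 24_000  # ASSUMPTION: placeholder, not from the sample data

rate_at_ask = cap_rate(HYPOTHETICAL_NOI, ASKING_PRICE)
value_low = implied_value(HYPOTHETICAL_NOI, 0.065)   # top of the Austin range
value_high = implied_value(HYPOTHETICAL_NOI, 0.045)  # bottom of the Austin range

print(f"Cap rate at asking price: {rate_at_ask:.2%}")
print(f"Implied value range: ${value_low:,.0f} - ${value_high:,.0f}")
```

A higher cap rate implies a lower value for the same income, which is why the 6.5% end of the range gives the low estimate.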

In [None]:
# Create documents
documents = [
    Document(
        page_content=SAMPLE_INSPECTION,
        metadata={'source': 'inspection', 'type': 'inspection'}
    ),
    Document(
        page_content=DOMAIN_KNOWLEDGE,
        metadata={'source': 'knowledge', 'type': 'domain'}
    )
]

# Split and create vector store
split_docs = text_splitter.split_documents(documents)
vectorstore = Chroma.from_documents(
    documents=split_docs,
    embedding=embeddings,
    persist_directory=CONFIG['vector_db_path']
)

print(f"✅ Vector database created with {len(split_docs)} chunks")
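
Conceptually, the vector store answers a query by embedding it and returning the `top_k` most similar chunks. The sketch below illustrates that idea with toy 3-dimensional vectors and cosine similarity. Both are assumptions for illustration: real embeddings from `all-MiniLM-L6-v2` have far more dimensions, and the distance metric Chroma actually uses depends on the collection settings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return the names of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy vectors (hypothetical values, not real embeddings).
toy_chunks = {
    "inspection": [0.9, 0.1, 0.0],
    "domain": [0.2, 0.8, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

print(top_k(toy_query, toy_chunks, k=5))  # ['inspection', 'domain']
```

When `k` exceeds the number of stored chunks, as in this toy case, every chunk is returned in ranked order.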

In [None]:
# Create RAG chain
PROMPT_TEMPLATE = """
You are a real estate analyst. Use the context below to answer the question.
Be specific and cite numbers from the documents.

Context:
{context}

Question: {question}

Answer:
"""

prompt = PromptTemplate(
    template=PROMPT_TEMPLATE,
    input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": CONFIG['top_k']}),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True
)

print("✅ RAG chain ready!")
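
With `chain_type="stuff"`, the retrieved chunks are concatenated into a single context string and substituted into the prompt template. A rough sketch of that step, using plain string formatting rather than LangChain itself (`build_stuff_prompt` is a hypothetical helper, and the blank-line separator is an assumption):

```python
# Same template text as in the RAG chain cell above.
PROMPT_TEMPLATE = """
You are a real estate analyst. Use the context below to answer the question.
Be specific and cite numbers from the documents.

Context:
{context}

Question: {question}

Answer:
"""


def build_stuff_prompt(chunks, question, template=PROMPT_TEMPLATE):
    """Join retrieved chunks and fill the template's two placeholders."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(context=context, question=question)


example_prompt = build_stuff_prompt(
    ["Roof: 19 years old. Replacement: $12,000-$15,000.",
     "Renovation Costs (Austin, 2024): Roof: $8,000-$15,000"],
    "How much will the roof cost to replace?",
)
print(example_prompt)
```

Because everything is stuffed into one prompt, the total length of the retrieved chunks has to fit within the model's context window.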

## Test the System

In [None]:
# Test query
test_query = "What are the critical issues found in the inspection and their costs?"

print(f"🔍 Query: {test_query}\n")
print("🤖 Generating response...\n")

# .invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": test_query})

print("="*80)
print("ANSWER:")
print("="*80)
print(result['result'])
print("\n" + "="*80)

## Risk Assessment

In [None]:
# Simplified risk scoring for Colab
def calculate_risk_scores():
    """Return hard-coded illustrative scores and their weighted average.

    Scores run from 0 to 100, where a higher score means lower risk.
    """

    scores = {
        'Structural': 35,  # Major issues (roof, HVAC)
        'Financial': 55,   # High vacancy
        'Legal': 90,       # Clean
        'Operational': 75, # Manageable
        'Market': 95       # Austin is hot
    }

    # Weighted overall score: sum of score * weight per category
    overall = sum(scores[cat] * RISK_WEIGHTS[cat] for cat in scores)

    return scores, round(overall, 1)

scores, overall = calculate_risk_scores()

print("📊 RISK ASSESSMENT REPORT")
print("="*80)
for category, score in scores.items():
    weight = RISK_WEIGHTS[category]
    print(f"{category:15} {score:3}/100  (weight: {weight:.0%})")
print("="*80)
print(f"OVERALL SCORE: {overall}/100")
print("="*80)

if overall >= 75:
    decision = "✅ STRONG GO"
elif overall >= 60:
    decision = "⚠️  PROCEED WITH CAUTION"
else:
    decision = "🔴 NO GO"

print(f"\nDECISION: {decision}")
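
To see how the overall score and decision come about, the arithmetic can be broken down per category. The sketch below repeats the scores, weights, and thresholds from the cells above; `weighted_score` and `decide` are hypothetical helper names.

```python
RISK_WEIGHTS = {'Structural': 0.30, 'Financial': 0.30, 'Legal': 0.20,
                'Operational': 0.10, 'Market': 0.10}
scores = {'Structural': 35, 'Financial': 55, 'Legal': 90,
          'Operational': 75, 'Market': 95}


def weighted_score(scores, weights):
    """Sum of score * weight over all categories, rounded to one decimal."""
    return round(sum(scores[cat] * weights[cat] for cat in scores), 1)


def decide(overall):
    """Map an overall score to a decision using the 75 / 60 thresholds."""
    if overall >= 75:
        return "STRONG GO"
    if overall >= 60:
        return "PROCEED WITH CAUTION"
    return "NO GO"


for cat, score in scores.items():
    # e.g. Structural: 35 x 0.30 = 10.5
    print(f"{cat:12} {score:3} x {RISK_WEIGHTS[cat]:.2f} = {score * RISK_WEIGHTS[cat]:.1f}")

overall = weighted_score(scores, RISK_WEIGHTS)
print(f"Overall: {overall} -> {decide(overall)}")  # Overall: 62.0 -> PROCEED WITH CAUTION
```

The contributions are 10.5 + 16.5 + 18.0 + 7.5 + 9.5 = 62.0, which lands just above the 60-point threshold.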

## Visualization

In [None]:
# Risk score visualization
fig, ax = plt.subplots(figsize=(12, 6))

categories = list(scores.keys())
values = list(scores.values())
colors = ['#ff4444' if v < 60 else '#ffaa00' if v < 80 else '#44ff44' for v in values]

ax.barh(categories, values, color=colors, alpha=0.7, edgecolor='black')
ax.set_xlabel('Risk Score (0-100)', fontsize=12, fontweight='bold')
ax.set_title('Risk Assessment by Category', fontsize=14, fontweight='bold')
ax.set_xlim(0, 100)
ax.axvline(x=60, color='orange', linestyle='--', alpha=0.5)
ax.axvline(x=80, color='green', linestyle='--', alpha=0.5)

for i, (cat, score) in enumerate(zip(categories, values)):
    ax.text(score + 2, i, f'{score:.0f}', va='center', fontweight='bold')

plt.tight_layout()
plt.show()

print("✅ Visualization complete!")

## Valuation & Final Recommendation

In [None]:
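# Valuation calculation (illustrative, hard-coded inputs)
asking_price = SAMPLE_PROPERTY['asking_price']  # 425000
deferred_maintenance = 30000  # Midpoint of the $25,000-$35,000 inspection estimate
fair_value = 395000           # Assumed fair market value
# Low end of the offer range: discount fair value by 30% of deferred maintenance
recommended_offer = fair_value - deferred_maintenance * 0.3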

print("💰 VALUATION ANALYSIS")
print("="*80)
print(f"Asking Price:           ${asking_price:,}")
print(f"Fair Market Value:      ${fair_value:,}")
print(f"Deferred Maintenance:   ${deferred_maintenance:,}")
print(f"Recommended Offer:      ${recommended_offer:,.0f} - ${fair_value:,}")
print(f"Discount from Ask:      ${asking_price - recommended_offer:,.0f} ({(asking_price - recommended_offer)/asking_price:.1%})")
print("="*80)

print("\n📋 NEGOTIATION STRATEGY:")
print("1. Open offer at $385,000")
print("2. Request $10,000 seller credit for HVAC replacement")
print("3. Budget $25k-35k for deferred maintenance over next 2 years")
print("4. Walk-away price: $400,000")
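
The offer calculation above can be wrapped in a small function so that different assumptions are easy to try. This is only a sketch: `recommended_offer_floor` and its `maintenance_share` parameter are hypothetical names, and the 30% share mirrors the constant used in the valuation cell.

```python
def recommended_offer_floor(fair_value, deferred_maintenance, maintenance_share=0.3):
    """Low end of the offer range: fair value minus a share of deferred maintenance."""
    return fair_value - deferred_maintenance * maintenance_share


ASKING_PRICE = 425_000
FAIR_VALUE = 395_000
DEFERRED_MAINTENANCE = 30_000

offer = recommended_offer_floor(FAIR_VALUE, DEFERRED_MAINTENANCE)
discount = ASKING_PRICE - offer

# 395,000 - 0.3 * 30,000 = 386,000; discount = 39,000 (9.2% of asking)
print(f"Offer floor: ${offer:,.0f}")
print(f"Discount from ask: ${discount:,.0f} ({discount / ASKING_PRICE:.1%})")

# Passing the full maintenance cost through to the seller lowers the floor further.
print(f"Full pass-through: ${recommended_offer_floor(FAIR_VALUE, DEFERRED_MAINTENANCE, 1.0):,.0f}")
```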

## Summary

This Colab version demonstrates the same AI Property Due Diligence system, but using:
- **Cloud GPU** instead of local compute
- **HuggingFace models** (Phi-2) instead of Ollama
- **Same RAG architecture** (ChromaDB + LangChain)

### Advantages of Colab:
- ✅ No local setup required
- ✅ Free GPU access
- ✅ Easy to share with classmates
- ✅ Can use APIs (OpenAI, Groq) for better models

### Disadvantages:
- ❌ Smaller models (Phi-2 2.7B vs Llama 3.1 8B)
- ❌ Session timeout after inactivity
- ❌ Requires API keys for best performance

---

**For your presentation, I recommend using the local version (Ollama + Llama 3.1 8B) for best quality outputs. Use this Colab version as a backup or for sharing with your team!**