# Korean Real Estate AI Analysis - LOCAL LLM

Analyze Korean apartment transaction data using **100% LOCAL LLMs** (Ollama).

**Benefits:**
- ✅ No API costs
- ✅ Complete privacy
- ✅ Real market insights
- ✅ Works offline

**Data Source:** 국토교통부 실거래가 공개 시스템 (Real Transaction Price Disclosure System of Korea's Ministry of Land, Infrastructure and Transport)

## 1. Setup & Load Data

In [None]:
import pandas as pd
from langchain_ollama import ChatOllama
from dotenv import load_dotenv
import matplotlib.pyplot as plt
import os

load_dotenv()

# Initialize LOCAL LLM (Ollama)
llm = ChatOllama(
    model=os.getenv("LLM_MODEL", "qwen2.5:7b"),
    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
)

print("✓ Local LLM initialized")
print(f"  Model: {llm.model}")
print(f"  Endpoint: {llm.base_url}")
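
Before running the cells below, it can help to confirm that the Ollama server is reachable and that the configured model has been pulled. The sketch below is not part of the original notebook. It assumes Ollama's `/api/tags` endpoint, which lists locally installed models, and the helper names `installed_models` and `has_model` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def installed_models(base_url: str, timeout: float = 3.0) -> list:
    """Return names of models installed on the Ollama server, or [] if unreachable."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, wrong URL, or a non-JSON reply
        return []
    return [m.get("name", "") for m in payload.get("models", [])]


def has_model(wanted: str, available: list) -> bool:
    """True if `wanted` is installed; a name without a tag is treated as ':latest'."""
    if ":" not in wanted:
        wanted = wanted + ":latest"
    return wanted in available


# Usage in the notebook (requires a running Ollama server):
# names = installed_models(llm.base_url)
# if not has_model(llm.model, names):
#     print(f"Model {llm.model} not found; try: ollama pull {llm.model}")
```

An empty list means either that the server is unreachable or that no models are installed, so check the server first if nothing comes back.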

In [None]:
# Load the transaction data.
# With a real export, read the CSV instead of building the sample below:
# df = pd.read_csv("../data/processed_transactions.csv")

# Three sample transactions for demonstration
sample_data = {
    'aptNm': ['새뜸마을2단지', '가재마을11단지', '수루배마을6단지'],
    'umdNm': ['새롬동', '종촌동', '반곡동'],
    'dealAmount': [47000, 37500, 38100],  # in 만원 (10,000 KRW)
    'dealYear': [2024, 2024, 2024],
    'dealMonth': [1, 1, 1],
    'excluUseAr': [59.96, 59.93, 59.9],  # m²
    'floor': [6, 13, 6],
    'buildYear': [2017, 2013, 2019],
    'dealingGbn': ['중개거래', '중개거래', '중개거래'],
    'buyerGbn': ['개인', '개인', '개인']
}

df = pd.DataFrame(sample_data)
print(f"\n✓ Loaded {len(df)} transactions")
df.head()
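
If you load a raw export instead of the sample, price fields may arrive as text rather than integers. The sketch below assumes `dealAmount` is stored as a comma-formatted string such as `"47,000"`, which is an assumption about your file and not something this notebook guarantees. `parse_deal_amount` and `manwon_to_krw` are hypothetical helper names.

```python
def parse_deal_amount(raw) -> int:
    """Convert a price in 만원 to int, accepting numbers or strings like ' 47,000'."""
    if isinstance(raw, (int, float)):
        return int(raw)
    cleaned = str(raw).replace(",", "").strip()
    return int(cleaned)


def manwon_to_krw(amount_manwon: int) -> int:
    """Convert 만원 to KRW (1 만원 = 10,000 KRW)."""
    return amount_manwon * 10_000


print(parse_deal_amount(" 47,000"))
print(manwon_to_krw(parse_deal_amount("47,000")))

# With pandas, the same helper can be mapped over the column:
# df["dealAmount"] = df["dealAmount"].map(parse_deal_amount)
```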

## 2. Market Trend Analysis with LLM

In [None]:
# Prepare summary statistics
stats = df.groupby('umdNm')['dealAmount'].agg(['mean', 'count']).round(2)
print("Location Statistics:")
print(stats)

# Ask LLM to analyze
prompt = f"""Analyze this Korean real estate market data:

{stats.to_string()}

Prices are in 만원 (10,000 KRW). Provide:
1. Top 3 areas with highest average prices
2. Market trend interpretation
3. Investment recommendations

Answer in clear, structured format."""

print("\n=== LLM Analysis ===")
response = llm.invoke(prompt)
print(response.content)
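
An LLM can misread a table, so it is worth computing the ranking directly and comparing it with the model's answer. This is a minimal sketch in plain Python using the three sample transactions. `rank_areas` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict
from statistics import mean


def rank_areas(rows, top_n=3):
    """rows: iterable of (area, price_in_manwon).
    Returns [(area, mean_price)] sorted from highest to lowest mean."""
    by_area = defaultdict(list)
    for area, price in rows:
        by_area[area].append(price)
    averages = [(area, mean(prices)) for area, prices in by_area.items()]
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:top_n]


sample_rows = [("새롬동", 47000), ("종촌동", 37500), ("반곡동", 38100)]
for area, avg in rank_areas(sample_rows):
    print(f"{area}: {avg:,.0f}만원")
```

With the DataFrame already loaded, `df.groupby('umdNm')['dealAmount'].mean().nlargest(3)` gives the same ranking.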

## 3. Price Prediction Insights

In [None]:
# Summary price statistics (the sample covers a single month, so there is no time trend yet)
avg_price = df['dealAmount'].mean()
min_price = df['dealAmount'].min()
max_price = df['dealAmount'].max()

prompt = f"""Given these Korean apartment price statistics (in 만원):
- Average: {avg_price:,.0f}
- Minimum: {min_price:,.0f}
- Maximum: {max_price:,.0f}
- Sample size: {len(df)} transactions

Predict:
1. Next month's likely price direction (up/down/stable)
2. Confidence level and reasoning
3. Key factors to watch

Use data patterns and market reasoning."""

print("=== Price Prediction ===")
prediction = llm.invoke(prompt)
print(prediction.content)
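
Mean, minimum, and maximum carry no time information, so a direction forecast based on them alone is close to a guess. One option is to give the model a month-over-month series. The sketch below uses hypothetical monthly figures (not real transactions) and a hypothetical helper `monthly_change`.

```python
from collections import defaultdict
from statistics import mean


def monthly_change(rows):
    """rows: iterable of (year, month, price_in_manwon).
    Returns [((year, month), mean_price, pct_change)] in date order.
    pct_change compares with the previous month that has data; None for the first."""
    buckets = defaultdict(list)
    for year, month, price in rows:
        buckets[(year, month)].append(price)
    result = []
    prev = None
    for key in sorted(buckets):
        avg = mean(buckets[key])
        pct = None if prev is None else (avg - prev) / prev * 100
        result.append((key, avg, pct))
        prev = avg
    return result


# Hypothetical figures for illustration only
rows = [(2024, 1, 40000), (2024, 1, 42000), (2024, 2, 43050), (2024, 3, 41000)]
for (year, month), avg, pct in monthly_change(rows):
    change = "n/a" if pct is None else f"{pct:+.1f}%"
    print(f"{year}-{month:02d}: {avg:,.0f}만원 ({change})")
```

The printed lines can be pasted into the prompt in place of the three summary numbers.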

## 4. Location Recommendation

In [None]:
# Compare neighborhoods
location_stats = df.groupby('umdNm').agg({
    'dealAmount': ['mean', 'min', 'max'],
    'excluUseAr': 'mean'
}).round(2)

print("Neighborhood Comparison:")
print(location_stats)

prompt = f"""As a real estate advisor, analyze these Korean neighborhoods:

{location_stats.to_string()}

For a buyer with 40,000만원 (400 million KRW) budget:
1. Recommend best neighborhoods
2. Explain value proposition for each
3. Mention any risks or considerations

Be specific and practical."""

print("\n=== Location Recommendation ===")
recommendation = llm.invoke(prompt)
print(recommendation.content)
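
The budget constraint in the prompt can also be checked in code, so the model's recommendation can be compared with the neighborhoods that are actually within reach. This sketch uses the sample means. `within_budget` is a hypothetical helper.

```python
def within_budget(area_means, budget_manwon):
    """Split {area: mean_price} into (affordable, over_budget), each sorted by price."""
    ordered = sorted(area_means.items(), key=lambda item: item[1])
    affordable = [item for item in ordered if item[1] <= budget_manwon]
    over = [item for item in ordered if item[1] > budget_manwon]
    return affordable, over


budget = 40000  # 만원, the figure used in the prompt above
area_means = {"새롬동": 47000, "종촌동": 37500, "반곡동": 38100}
affordable, over = within_budget(area_means, budget)
print("Within budget:", affordable)
print("Over budget by (만원):", [(name, price - budget) for name, price in over])
```

In the notebook, `df.groupby('umdNm')['dealAmount'].mean().to_dict()` produces a dictionary of the same shape.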

## 5. Deal Quality Assessment

In [None]:
# Analyze specific transaction
sample_deal = df.iloc[0]

prompt = f"""Evaluate this Korean apartment deal:

- Apartment: {sample_deal['aptNm']}
- Location: {sample_deal['umdNm']}
- Price: {sample_deal['dealAmount']:,}만원
- Size: {sample_deal['excluUseAr']} m²
- Floor: {sample_deal['floor']}
- Building Year: {sample_deal['buildYear']}
- Transaction Type: {sample_deal['dealingGbn']}

Assessment:
1. Is this price fair? (compare to similar properties)
2. Pros and cons of this deal
3. Negotiation potential
4. Overall rating (1-10) with reasoning"""

print("=== Deal Assessment ===")
assessment = llm.invoke(prompt)
print(assessment.content)
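
The prompt asks the model to compare against similar properties, but only one deal is passed in. A simple way to ground that comparison is price per square meter relative to the other transactions. This sketch uses the three sample rows, and the helper names are hypothetical.

```python
from statistics import median


def price_per_m2(price_manwon, area_m2):
    """Unit price in 만원 per m²."""
    return price_manwon / area_m2


def deviation_from_peers(target, peers):
    """Percent difference between target and the median of peers (all in 만원/m²)."""
    baseline = median(peers)
    return (target - baseline) / baseline * 100


# (dealAmount, excluUseAr) for the three sample transactions
deals = [(47000, 59.96), (37500, 59.93), (38100, 59.9)]
unit_prices = [price_per_m2(price, area) for price, area in deals]
target, peers = unit_prices[0], unit_prices[1:]
print(f"Target: {target:,.1f}만원/m²")
print(f"Vs. peer median: {deviation_from_peers(target, peers):+.1f}%")
```

Adding these two lines to the prompt gives the model a concrete baseline for the fairness question. With only two peers the median is a weak reference, so treat the percentage as indicative.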

## 6. Visualization + LLM Interpretation

In [None]:
# Create price distribution chart
plt.figure(figsize=(10, 6))
df['dealAmount'].hist(bins=20, edgecolor='black')
plt.title("Apartment Price Distribution", fontsize=14)
plt.xlabel("Price (만원)")
plt.ylabel("Frequency")
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Statistical summary
stats_summary = {
    'mean': df['dealAmount'].mean(),
    'median': df['dealAmount'].median(),
    'std': df['dealAmount'].std(),
    'min': df['dealAmount'].min(),
    'max': df['dealAmount'].max()
}

print("\nPrice Statistics:")
for key, value in stats_summary.items():
    print(f"  {key}: {value:,.2f}만원")

In [None]:
# Ask LLM to interpret the distribution
prompt = f"""Interpret this Korean apartment price distribution:

Statistics (in 만원):
- Mean: {stats_summary['mean']:,.0f}
- Median: {stats_summary['median']:,.0f}
- Std Dev: {stats_summary['std']:,.0f}
- Range: {stats_summary['min']:,.0f} to {stats_summary['max']:,.0f}

What does this tell us about:
1. Market segmentation
2. Affordability levels
3. Typical buyer profiles
4. Overall market health"""

print("=== Distribution Interpretation ===")
interpretation = llm.invoke(prompt)
print(interpretation.content)

## 7. Buyer Type Analysis

In [None]:
# Compare buyer types
buyer_analysis = df.groupby('buyerGbn')['dealAmount'].agg(['mean', 'count']).round(2)
print("Buyer Type Analysis:")
print(buyer_analysis)

prompt = f"""Analyze buyer types in Korean real estate:

{buyer_analysis.to_string()}

In this dataset, '개인' means an individual buyer. Analyze:
1. What the buyer composition reveals about market dynamics
2. Differences in purchasing power or behavior
3. Strategic implications for sellers
4. Market sentiment indicators"""

print("\n=== Buyer Analysis ===")
comparison = llm.invoke(prompt)
print(comparison.content)
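
With the sample data every buyer is '개인', so the grouped table has a single row and the model has nothing to compare. Computing shares first makes that visible. This is a sketch: `buyer_shares` is hypothetical, and '법인' (corporate) in the second call is an illustrative value that does not appear in the sample.

```python
from collections import Counter


def buyer_shares(buyer_types):
    """Return {buyer_type: percent_of_transactions}."""
    counts = Counter(buyer_types)
    total = sum(counts.values())
    return {kind: count / total * 100 for kind, count in counts.items()}


# Sample data: a single category
print(buyer_shares(["개인", "개인", "개인"]))
# Illustrative mix, not from the sample
print(buyer_shares(["개인", "개인", "법인", "개인"]))
```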

## 8. Investment Strategy Generation

In [None]:
# Full dataset summary for strategy
summary = {
    'total_transactions': len(df),
    'avg_price': df['dealAmount'].mean(),
    'avg_size': df['excluUseAr'].mean(),
    'top_locations': df.groupby('umdNm')['dealAmount'].mean().sort_values(ascending=False).head(3).to_dict()
}

print("Market Summary:")
print(f"  Total Transactions: {summary['total_transactions']}")
print(f"  Average Price: {summary['avg_price']:,.0f}만원")
print(f"  Average Size: {summary['avg_size']:.2f} m²")
print(f"  Top Locations: {list(summary['top_locations'].keys())}")

prompt = f"""Given this Korean real estate market overview:

- Total transactions: {summary['total_transactions']}
- Average price: {summary['avg_price']:,.0f}만원
- Average size: {summary['avg_size']:.1f} m²
- Top locations: {', '.join(summary['top_locations'].keys())}

Create a 6-month investment strategy for Korean real estate:

1. Buy/Hold/Sell recommendation
2. Specific neighborhoods to target
3. Budget allocation (if 100,000만원 available)
4. Risk management approach
5. Expected ROI timeline

Be practical and data-driven. Consider Korean market specifics."""

print("\n=== Investment Strategy ===")
strategy = llm.invoke(prompt)
print(strategy.content)
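
Free-text strategies are hard to compare across runs or models. One option is to ask the model to reply in JSON and parse the reply defensively. The sketch below is an assumption about how you might structure this, not part of the notebook. `extract_json` is a hypothetical helper and the reply string is made up for illustration.

```python
import json


def extract_json(text):
    """Return the first JSON object found in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None


# Made-up reply standing in for strategy.content
fake_reply = 'Here is my answer: {"recommendation": "hold", "horizon_months": 6} Hope this helps.'
parsed = extract_json(fake_reply)
print(parsed["recommendation"], parsed["horizon_months"])
```

In the notebook this would be `extract_json(strategy.content)`, after adding an instruction such as "Reply with a single JSON object" to the prompt. Check for `None` before using the result, since local models do not always follow format instructions.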

## 9. Custom Query (Interactive)

In [None]:
# Ask your own question about the data
your_question = "What are the key factors affecting apartment prices in this dataset?"

# Provide data context
data_context = f"""
Dataset overview:
- Columns: {', '.join(df.columns)}
- Sample data:
{df.head(3).to_string()}
"""

prompt = f"""{data_context}

Question: {your_question}

Provide a detailed, data-driven answer."""

print("=== Custom Analysis ===")
custom_response = llm.invoke(prompt)
print(custom_response.content)
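
To ask several questions without copying the cell, the prompt assembly can be wrapped in a function. In this sketch `build_prompt` and `ask` are hypothetical names, and the `invoke` parameter stands in for `llm.invoke` so the function can be exercised without a running model.

```python
def build_prompt(columns, sample_text, question):
    """Assemble the same context-plus-question prompt used in the cell above."""
    return (
        "Dataset overview:\n"
        f"- Columns: {', '.join(columns)}\n"
        "- Sample data:\n"
        f"{sample_text}\n\n"
        f"Question: {question}\n\n"
        "Provide a detailed, data-driven answer."
    )


def ask(question, columns, sample_text, invoke):
    """Send the assembled prompt through `invoke` and return the reply text."""
    reply = invoke(build_prompt(columns, sample_text, question))
    # Chat model replies expose .content; plain strings pass through unchanged
    return getattr(reply, "content", reply)


def echo(prompt):
    """Stand-in for llm.invoke so the example runs without Ollama."""
    return f"(received {len(prompt)} characters)"


print(ask("Which column matters most?", ["aptNm", "dealAmount"], "새뜸마을2단지 47000", echo))
```

In the notebook the call would be `ask(your_question, df.columns, df.head(3).to_string(), llm.invoke)`.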

## Summary

This notebook demonstrates **LOCAL LLM** analysis of Korean real estate data:

- ✅ **Zero API costs** - All inference runs locally
- ✅ **Complete privacy** - Data never leaves your machine
- ✅ **Real insights** - Practical analysis for market decisions
- ✅ **Flexible** - Easy to customize for your needs

### Next Steps:
1. Load your actual transaction data CSV
2. Customize prompts for specific analysis needs
3. Integrate with other notebooks for end-to-end pipeline
4. Try different local models (llama3.1, mistral, etc.)

**Models to try:**
- `qwen2.5:7b` - Balanced, good with Korean context
- `llama3.1:8b` - Strong reasoning
- `gemma2:9b` - Fast and capable
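
To compare the models listed above on the same prompt, a small loop is enough. In this sketch `compare_models` is hypothetical and takes a `make_llm` factory so it does not depend on any particular client. In the notebook the factory could be `lambda name: ChatOllama(model=name)`, assuming each model has already been pulled with Ollama.

```python
import time


def compare_models(model_names, make_llm, prompt):
    """Run the same prompt through each model; return {name: (reply_text, seconds)}."""
    results = {}
    for name in model_names:
        llm = make_llm(name)
        started = time.perf_counter()
        reply = llm.invoke(prompt)
        elapsed = time.perf_counter() - started
        results[name] = (getattr(reply, "content", reply), elapsed)
    return results


class FakeLLM:
    """Stand-in used only so this example runs without Ollama."""

    def __init__(self, name):
        self.name = name

    def invoke(self, prompt):
        return f"{self.name} saw {len(prompt)} characters"


outcome = compare_models(["qwen2.5:7b", "llama3.1:8b"], FakeLLM, "Summarize the data.")
for name, (text, seconds) in outcome.items():
    print(f"{name}: {text} ({seconds:.3f}s)")
```

Timing a single call is only a rough signal, since the first request to a model usually includes the time needed to load it into memory.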
