# First GPT Model

This prompt-engineered GPT model takes County Assessor data and an exterior photo of a home, and outputs a `geojson` Feature together with a short home inspection note.

In [1]:
!pip install openai python-dotenv



In [7]:
from openai import OpenAI
import base64
import json
from dotenv import load_dotenv
from os import path

In [8]:
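# Initialize OpenAI

load_dotenv()      # load variables such as OPENAI_API_KEY from a local .env file
client = OpenAI()  # the client reads OPENAI_API_KEY from the environment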

In [9]:
def encode_image(filepath):
    """Read a file in binary mode and return its contents as a base64 string."""
    with open(filepath, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')
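The request further down hard-codes `image/jpeg` in the data URL. If the dataset ever includes PNG or other image formats, the MIME type could be derived from the file extension instead. The following is a minimal sketch: `image_to_data_url` is a hypothetical helper that is not part of this notebook, and the JPEG fallback is an assumption.

```python
import base64
import mimetypes


def image_to_data_url(filepath):
    """Return a base64 data URL for the image at `filepath` (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to JPEG (an assumption).
    mime, _ = mimetypes.guess_type(filepath)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"
    with open(filepath, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string could be passed directly as the `url` value of the `image_url` content part.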

In [10]:
def generate_geojson_and_note(house_data, image_path, model="gpt-4o"):
    """Return a dict with a GeoJSON Feature and an inspection note for one home."""
    image_base64 = encode_image(image_path)
    
    # ----- Prompt Setup -----
    prompt = f"""
You are a certified **home energy inspection expert** and data specialist working on a project to generate synthetic inspection reports for single-family homes. You are helping build a training set for a home efficiency AI model.

You are given:
- Structured residential property data in JSON format
- A photo of the exterior of the home

Use these to generate two outputs:
1. A **GeoJSON Feature** with a fictional but plausible (longitude, latitude) location in Pennsylvania. Populate `"properties"` using the provided JSON fields:
   - "Year Built"
   - "Total Square Feet Living Area"
   - "Building Style"
   - "Exterior Wall Material"
   - "Heating Fuel Type"
   - "Heating System Type"
   - "Heat/Air Cond"
   - "Bedrooms"
   - "Full Baths"
   - "Half Baths"
   - "Basement"
   - "Number of Stories"
   - "Grade"

2. A short **inspection note** written as if you had just walked around the home. Focus on energy-related characteristics: insulation, HVAC age/type, visible window quality, age, materials, and any notable upgrades or issues you can infer from the attributes or image.

Here is the structured property data:

{json.dumps(house_data)}

Return a single raw JSON object, like this:

{{
  "geojson": {{ ... }},
  "inspection_note": "..."
}}

Output **only valid JSON**, no backticks or explanation.
"""
    # ----- API Call -----

    response = client.chat.completions.create(
        model=model,  # use the caller-supplied model instead of a hard-coded name
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}}
                ]
            }
        ],
        temperature=0.7
    )

    # ----- Parse -----
    
    reply = response.choices[0].message.content
    return json.loads(reply)
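`json.loads(reply)` raises an exception if the model wraps its answer in a Markdown code fence despite the prompt's instruction. A more tolerant parser is one possible safeguard. The sketch below is an assumption about how that case might be handled; `parse_model_json` is a hypothetical helper, not part of this notebook.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example contains no literal fence
FENCED_REPLY = re.compile(rf"^{FENCE}(?:json)?\s*\n(.*?)\n?{FENCE}$", re.DOTALL)


def parse_model_json(reply):
    """Parse a model reply as JSON, tolerating an optional code fence (hypothetical helper)."""
    text = reply.strip()
    # If the whole reply is wrapped in a code fence, keep only the inner text.
    fenced = FENCED_REPLY.match(text)
    if fenced:
        text = fenced.group(1)
    result = json.loads(text)
    if not isinstance(result, dict):
        raise ValueError("Reply is not a JSON object")
    # Check for the two top-level keys the prompt asks for.
    missing = {"geojson", "inspection_note"} - set(result)
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return result
```

With such a helper, the last line of the function could become `return parse_model_json(reply)`.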

In [11]:
# Test with one home

home = 'dataset/RAMBEAU_RD_15'

with open(path.join(home, 'data.json'), 'r') as f:
    house_data = json.load(f)

result = generate_geojson_and_note(house_data, path.join(home, 'photo_1.jpg'))

In [12]:
result

{'geojson': {'type': 'Feature',
  'geometry': {'type': 'Point', 'coordinates': [-75.1652, 40.0052]},
  'properties': {'Year Built': '1989',
   'Total Square Feet Living Area': '2,576',
   'Building Style': 'COLONIAL',
   'Exterior Wall Material': 'ALUMINUM/VINYL SIDING',
   'Heating Fuel Type': 'ELECTRIC',
   'Heating System Type': 'WARM AIR',
   'Heat/Air Cond': 'AIR COND',
   'Bedrooms': '4',
   'Full Baths': '2',
   'Half Baths': '1',
   'Basement': 'FULL',
   'Number of Stories': '2',
   'Grade': 'B - GOOD'}},
 'inspection_note': "The home appears to be in good condition with aluminum/vinyl siding that is well-maintained. The windows look relatively modern, suggesting they may have been updated to improve energy efficiency. The heating system is electric warm air with air conditioning, likely providing efficient climate control. The full basement could be a source of heat loss if not properly insulated. Overall, the home's exterior and materials suggest it is well-suited for energy
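The prompt asks for a GeoJSON Feature with a plausible Pennsylvania location, so a quick structural check on `result['geojson']` can catch malformed replies before they are saved. A minimal sketch follows. `check_feature` is a hypothetical helper, and the bounding box is an approximate, assumed envelope for Pennsylvania rather than a value taken from this notebook.

```python
# Approximate bounding box for Pennsylvania (assumed values, in degrees).
PA_LON = (-80.6, -74.6)
PA_LAT = (39.7, 42.6)


def check_feature(feature):
    """Return a list of problems found in a GeoJSON Point Feature (hypothetical helper)."""
    problems = []
    if feature.get("type") != "Feature":
        problems.append("type is not 'Feature'")
    geometry = feature.get("geometry") or {}
    coords = geometry.get("coordinates")
    if geometry.get("type") != "Point" or not isinstance(coords, list) or len(coords) < 2:
        problems.append("geometry is not a Point with [longitude, latitude]")
    else:
        lon, lat = coords[0], coords[1]  # GeoJSON order is longitude, then latitude
        if not (PA_LON[0] <= lon <= PA_LON[1] and PA_LAT[0] <= lat <= PA_LAT[1]):
            problems.append("coordinates fall outside the assumed Pennsylvania box")
    if not isinstance(feature.get("properties"), dict):
        problems.append("properties is missing or not an object")
    return problems


# Example input using the coordinates from the output shown above.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-75.1652, 40.0052]},
    "properties": {"Year Built": "1989"},
}
print(check_feature(feature))  # prints []
```

An empty list means none of these checks failed; it says nothing about whether the inspection note itself is accurate.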

In [13]:
with open(path.join(home, 'preprocessed.json'), 'w') as f:
    json.dump(result, f)

In [None]:
# TODO: Loop Through Dataset
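
One way to approach this TODO, assuming every subdirectory of `dataset/` follows the same layout as `dataset/RAMBEAU_RD_15` (a `data.json` and a `photo_1.jpg`). This is a sketch under those assumptions, not a finished implementation. `process_dataset` and its `generate` parameter are hypothetical; the generator function is passed in so the loop can be tried with a stub before spending API calls.

```python
import json
from os import listdir, path


def process_dataset(root, generate, overwrite=False):
    """Run `generate(house_data, image_path)` for each home folder under `root`.

    Hypothetical helper: assumes each folder holds data.json and photo_1.jpg,
    and writes the result to preprocessed.json, as in the single-home test.
    Returns a dict mapping folder name to "ok", "skipped", or an error message.
    """
    status = {}
    for name in sorted(listdir(root)):
        home = path.join(root, name)
        data_file = path.join(home, "data.json")
        photo_file = path.join(home, "photo_1.jpg")
        out_file = path.join(home, "preprocessed.json")
        if not (path.isfile(data_file) and path.isfile(photo_file)):
            continue  # not a home folder with the expected files
        if path.isfile(out_file) and not overwrite:
            status[name] = "skipped"  # already processed in an earlier run
            continue
        try:
            with open(data_file, "r") as f:
                house_data = json.load(f)
            result = generate(house_data, photo_file)
            with open(out_file, "w") as f:
                json.dump(result, f)
            status[name] = "ok"
        except Exception as exc:  # keep going if one home fails (e.g. an invalid JSON reply)
            status[name] = f"error: {exc}"
    return status
```

In this notebook the call might look like `process_dataset('dataset', generate_geojson_and_note)`. Skipping folders that already contain `preprocessed.json` lets an interrupted run resume without repeating requests.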