---
# LLM-based Insight & Personalised Content Generator

This notebook describes the architecture for a model that generates property content tailored to different target audiences (e.g. families, expats, students, investors) for use by product and marketing teams.

### Quick navigation

- [Proposed System Architecture](#proposed-system-architecture)
- [Data Sources](#data-sources)
- [Hallucination Approach](#hallucination-approach)
- [Evaluation](#evaluation)
- [Example Outputs](#example-outputs)
- [Appendix](#appendix)

---
# Proposed System Architecture

Because the information relevant to sales differs strongly from that relevant to rentals, I would use one central platform with two separate pipelines, one for sales and one for rentals:
* shared RAG + orchestration
* separate prompt templates, metrics, and guardrails for rentals vs sales
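
The split between a shared platform and per-listing-type settings could be expressed as a small config registry. The sketch below is illustrative only: `PipelineConfig`, `get_pipeline`, the template file names, and the metric names are hypothetical and not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Per-listing-type settings layered on top of the shared platform."""
    listing_type: str
    prompt_template: str              # hypothetical template file name
    banned_phrases: tuple[str, ...]   # guardrail phrases for this pipeline
    metrics: tuple[str, ...]          # metrics tracked for this pipeline


# Hypothetical values, chosen only to illustrate the rental/sale split.
PIPELINES = {
    "rental": PipelineConfig(
        listing_type="rental",
        prompt_template="writer_rental.txt",
        banned_phrases=("safest area",),
        metrics=("rent_per_m2", "vacancy_rate"),
    ),
    "sale": PipelineConfig(
        listing_type="sale",
        prompt_template="writer_sale.txt",
        banned_phrases=("safest area", "guaranteed ROI"),
        metrics=("price_per_m2", "mortgage_rate"),
    ),
}


def get_pipeline(listing_type: str) -> PipelineConfig:
    """Look up the config for a listing type; unknown types fail loudly."""
    try:
        return PIPELINES[listing_type]
    except KeyError:
        raise ValueError(f"unknown listing type: {listing_type!r}") from None
```

Retrieval and orchestration code would stay shared and only read from the selected config.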

<img src="images/LLM_high_level_architecture.png" width="1000"><br>

<img src="images/LLM_shared_platform.png" width="1000">


## High level flow

1. Input
    * property metadata (structured) + target audience + language/tone
2. Retrieval (RAG)
    * Fetch neighborhood facts (schools, safety, public transport, amenities)
    * Market context (rent index, vacancy, mortgage rates, price trends)
    * POI distances (e.g. supermarket) / commute times
    * Internal policies (allowed claims, disclaimers, phrasing preferences, evidence requirements)
3. Content planning (LLM or small rule based guide)
    * Extract what matters for each audience (e.g., schools for families, yield for investors)
    * Define structured content plan
4. Content generation (LLM)
    * Generate description strictly from a provided “grounding bundle”
    * Output includes fact references (citations to retrieved snippets/fields)
5. Validation & guardrails (Rule based, schema validator, LLM)
    * Claim checker: ensure every numeric/location/superlative claim is supported
    * Schema validator: ensure required sections and constraints
    * Policy filter: block disallowed claims (“safest area”, “guaranteed ROI”) unless supported + allowed
6. Revision (LLM)
    * If validation fails, return to LLM Writer
7. Output
    * Multiple descriptions
    * Structured “fact table” used
    * Confidence flags
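
Steps 4 to 6 form a loop: write, validate, and revise until the draft passes or a retry budget runs out. A minimal sketch of that control flow follows. The function names and the `max_rounds` limit are assumptions, and the toy writer and validator only stand in for the LLM and the guardrail checks.

```python
from typing import Callable


def generate_with_revision(
    write: Callable[[str], str],
    validate: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Run write -> validate, feeding failures back to the writer.

    `write` takes feedback text (empty on the first round) and returns a draft.
    `validate` returns a list of problems; an empty list means the draft passes.
    Returns the last draft and any problems that remain unresolved.
    """
    feedback = ""
    draft, problems = "", []
    for _ in range(max_rounds):
        draft = write(feedback)
        problems = validate(draft)
        if not problems:
            break
        feedback = "Fix these issues: " + "; ".join(problems)
    return draft, problems


# Toy stand-ins for the LLM writer and the validator (hypothetical).
def toy_writer(feedback: str) -> str:
    return "Quiet flat in Wiedikon." if feedback else "Safest area in Zurich!"


def toy_validator(draft: str) -> list[str]:
    return ["banned phrase: safest area"] if "safest area" in draft.lower() else []


draft, problems = generate_with_revision(toy_writer, toy_validator)
```

Drafts that still carry problems after the last round are natural candidates for a low-confidence flag.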

## Prompting Strategy

Pipeline:
1. Assemble facts block (property metadata + retrieved snippets + derived metrics)
2. Planner prompt consumes that facts block + target audience + listing type
3. Planner returns a JSON plan:
    * chosen facts (with IDs)
    * outline/sections
    * key messages per audience
    * banned claims + required disclaimers
    * style constraints
4. Writer prompt receives:
    * the same facts block
    * the planner JSON
    * formatting rules + JSON output schema
5. Writer generates final description + claim/source mapping
6. Validator checks claims against “facts_used” and retrieved snippet IDs.
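
Step 1 could be sketched as follows: every fact gets a stable ID so that the planner, writer, and validator can all refer to the same source. The `meta.` and `snip.` ID prefixes and the input shapes are assumptions made for illustration.

```python
def build_facts_block(metadata: dict, snippets: list[dict]) -> tuple[str, set[str]]:
    """Flatten metadata and retrieved snippets into ID-tagged lines.

    Returns the text block for the prompt and the set of valid source IDs,
    which a validator can later use to check citations.
    """
    lines, ids = [], set()
    for key, value in metadata.items():
        source_id = f"meta.{key}"          # hypothetical ID scheme
        lines.append(f"[{source_id}] {key}: {value}")
        ids.add(source_id)
    for snippet in snippets:
        source_id = f"snip.{snippet['id']}"
        lines.append(f"[{source_id}] {snippet['text']}")
        ids.add(source_id)
    return "\n".join(lines), ids


block, valid_ids = build_facts_block(
    {"size_m2": 82, "rent_chf": 3150},
    [{"id": "vacancy", "text": "District vacancy: ~0.7%"}],
)
print(block)
```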

---
# Data Sources

## Internal listing data

Source: internal databases

Example fields:
* location: city, neighborhood, coordinates
* property type, year built, condition/renovations
* size (m²), rooms, floor, lift, balcony
* rent/price, oncosts, parking
* energy rating / heating type (if available)
* pet policy
* furnished/unfurnished
* availability date
* photos

## Neighbourhood & Points of interest facts

Potential sources:
* public transport: https://data.opentransportdata.swiss/en/dataset/traffic-points-full
* public transport: https://registry.opendata.aws/schweizer-haltestellen-oev/
* POIs: OpenStreetMap export: https://www.openstreetmap.org/export

Example fields
* nearby schools/daycare count + distances
* grocery, parks, gyms, medical, nightlife
* public transport stops + frequency proxy
* safety proxy (e.g., “crime incidents per 1,000 residents” as a synthetic stat)
* “quietness index” / “green space index” (synthetic)
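
Counts and distances for nearby POIs could be derived from coordinates. The sketch below uses the haversine formula, which gives straight-line distance; walking distances or commute times would need a routing service instead. The function names and the POI record shape (`name`, `lat`, `lon`) are assumptions.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pois_within(origin: tuple[float, float], pois: list[dict], radius_m: float):
    """Return (name, distance_m) for POIs inside the radius, nearest first."""
    hits = []
    for poi in pois:
        d = haversine_m(origin[0], origin[1], poi["lat"], poi["lon"])
        if d <= radius_m:
            hits.append((poi["name"], round(d)))
    return sorted(hits, key=lambda hit: hit[1])
```

`len(pois_within(origin, schools, 1200))` would then give a "schools within 1.2 km" count for the facts block.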

## Market data

Potential sources:
* Federal Statistical Office
* Internal databases

Example reports
* rental price index YoY and QoQ
* vacancy rate
* mortgage rate
* average time-on-market
* comparable listing ranges (by m² / n rooms)

## Macroeconomic indicators

Potential sources:
* Federal Statistical Office

Example indicators
* inflation rate
* unemployment

---
# Hallucination Approach 

## Grounding rules

The LLM is given only:
* property metadata
* retrieved neighbourhood/market snippets
* allowed/disallowed claims policy (e.g. disallow "guaranteed returns")

Citations are required for
* numbers (rent index %, distances, commute times)
* comparative phrases ("above average", "high demand")

## Post generation validation

Automated checks will be implemented
* Claim-to-source linking
    * each claim has source_id referencing a field/snippet
* Numeric consistency checks
    * rent = metadata rent
    * computed CHF/m² = rent / size in m² (within tolerance)
* Entity checks
    * neighbourhood name matches metadata
    * POIs referenced exist in retrieved POI list
* Style constraints
    * avoid forbidden phrases
    * length bounds per audience (e.g., students shorter)
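
The checks above could be combined into one rule-based validator that returns a list of problems. This is a sketch under stated assumptions: the `rent_per_m2` and `pois_mentioned` output fields, the metadata key names, and the 1% tolerance are hypothetical, and length bounds are left out for brevity.

```python
FORBIDDEN_PHRASES = ("guaranteed returns", "safest area")  # policy examples


def validate_output(output: dict, metadata: dict, valid_ids: set,
                    poi_names: set, tolerance: float = 0.01) -> list[str]:
    """Return a list of validation problems; empty means the output passes."""
    problems = []
    text = output["description"].lower()

    # Claim-to-source linking: every claim must cite a known field/snippet.
    for claim in output["claims"]:
        if claim["source_id"] not in valid_ids:
            problems.append(f"unsourced claim: {claim['claim_text']}")

    # Numeric consistency: stated CHF/m2 must match rent / size.
    expected = metadata["rent_chf"] / metadata["size_m2"]
    stated = output.get("rent_per_m2")
    if stated is not None and abs(stated - expected) > tolerance * expected:
        problems.append(f"CHF/m2 mismatch: stated {stated}, expected {expected:.1f}")

    # Entity checks: neighbourhood name and referenced POIs.
    if metadata["neighbourhood"].lower() not in text:
        problems.append("neighbourhood name missing or different from metadata")
    for poi in output.get("pois_mentioned", []):
        if poi not in poi_names:
            problems.append(f"unknown POI: {poi}")

    # Style constraints: forbidden phrases.
    for phrase in FORBIDDEN_PHRASES:
        if phrase in text:
            problems.append(f"forbidden phrase: {phrase}")
    return problems
```

A non-empty result would trigger a revision round or a low-confidence flag.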

## Human-in-the-loop

Anything flagged "low confidence" goes to review

Review UI shows
* generated text
* extracted claims
* their sources
* quick approve/edit

---
# Evaluation

## Automated evaluation

Factuality / grounding
* % of claims with valid source links
* contradiction rate vs metadata
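
The first metric could be computed over a batch of outputs as sketched below. The input shape, a list of `(claims, valid_ids)` pairs, is an assumption. Contradiction rate would need a separate comparison against metadata and is not covered here.

```python
from typing import Optional


def grounding_rate(outputs: list) -> Optional[float]:
    """Share of claims whose source_id points at a known field or snippet.

    `outputs` is a list of (claims, valid_ids) pairs, one per generated text.
    Returns None when the batch contains no claims at all.
    """
    total = linked = 0
    for claims, valid_ids in outputs:
        for claim in claims:
            total += 1
            if claim["source_id"] in valid_ids:
                linked += 1
    return linked / total if total else None
```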

Audience fit
* A lightweight classifier or LLM judge reads the text and predicts which audience it fits best, e.g. by looking for wording such as:
    * families: schools/parks/safety/space
    * expats: commute/international vibe/furnished/services
    * students: transit to campus/budget/roommates
    * investors: yield/demand/liquidity risks
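
As a crude baseline for the lightweight classifier, keyword counting over the wording listed above might look like this. The keyword lists are taken from those examples, while the scoring rule is an assumption. Substring matching is deliberately naive: "furnished" also matches "unfurnished", so a real classifier or LLM judge would need to handle negation.

```python
AUDIENCE_KEYWORDS = {
    "families": ("school", "park", "safety", "space"),
    "expats": ("commute", "international", "furnished", "services"),
    "students": ("transit", "campus", "budget", "roommate"),
    "investors": ("yield", "demand", "liquidity", "risk"),
}


def predict_audience(text: str) -> tuple[str, dict]:
    """Return the best-scoring audience and the raw keyword counts."""
    lowered = text.lower()
    scores = {
        audience: sum(lowered.count(word) for word in words)
        for audience, words in AUDIENCE_KEYWORDS.items()
    }
    # On a tie, max() keeps the first audience in dictionary order.
    return max(scores, key=scores.get), scores
```

Audience fit could then be reported as the share of outputs where the predicted audience equals the intended one.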

Readability
* length
* sentence complexity
* banned terms

Diversity of descriptions (if a concern)
* n-gram overlap across the different audience outputs
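
One way to measure n-gram overlap is the Jaccard similarity between the n-gram sets of two outputs. The choice of word trigrams and of Jaccard as the overlap measure is an assumption; other measures would work as well.

```python
import re


def ngrams(text: str, n: int = 3) -> set:
    """Set of lowercase word n-grams in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets (0 = disjoint, 1 = identical)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

High overlap between, say, the families and investors versions of one listing would suggest the tailoring is too shallow.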

## Human evaluation

The marketing/product team would review the outputs before final acceptance. A weekly sample could also be drawn for ongoing review.

Rate 1–5
* audience alignment
* clarity
* compliance risk
* would you publish this?

## Test set
* 30–50 synthetic properties across neighborhoods and price tiers
* Include edge cases / conflicting data:
    * missing data
    * studio near nightlife
    * family home but no nearby schools
    * investor property with high rent but high vacancy area
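
A seeded generator is one way to produce such a test set reproducibly. Everything in this sketch is hypothetical: the field names, value ranges, neighbourhood list, and the rate at which optional fields are blanked out to simulate missing data. Conflicting-data cases would still need to be written by hand.

```python
import random

NEIGHBOURHOODS = ("Wiedikon", "Oerlikon", "Seefeld")        # illustrative only
OPTIONAL_FIELDS = ("energy_rating", "pet_policy", "balcony_m2")


def make_test_set(n: int, seed: int = 0, missing_rate: float = 0.2) -> list[dict]:
    """Generate n synthetic properties; optional fields are randomly missing."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    properties = []
    for i in range(n):
        prop = {
            "id": f"syn-{i:03d}",
            "neighbourhood": rng.choice(NEIGHBOURHOODS),
            "size_m2": rng.randint(20, 160),
            "rooms": rng.choice((1.5, 2.5, 3.5, 4.5, 5.5)),
            "energy_rating": rng.choice("ABCDE"),
            "pet_policy": rng.choice(("allowed", "not allowed")),
            "balcony_m2": rng.randint(0, 15),
        }
        for field_name in OPTIONAL_FIELDS:
            if rng.random() < missing_rate:
                prop[field_name] = None  # simulate a missing value
        properties.append(prop)
    return properties
```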

---
# Example Outputs


## Rental property

Assumed Rental Metadata
* Location: Zurich – Wiedikon
* Property type: 3.5-room apartment
* Size: 82 m²
* Balcony: 7 m²
* Year built: 2016
* Floor: 2nd floor
* Building features: Elevator, underfloor heating
* Condition: Modern, well-maintained
* Furnishing: Unfurnished
* Pets: Cats allowed; dogs case-by-case
* Monthly rent: CHF 3,150
* Utilities (Nebenkosten): CHF 250
* Total monthly cost: CHF 3,400
* Optional parking: CHF 180/month
* Availability: 01.04.2026

4 audiences from the same facts

**A) Families (space + schools + parks + practicalities)**  

Title: Family-friendly 3.5-room with balcony near parks & schools (Wiedikon)

Enjoy modern family living in this bright 3.5-room, 82 m² apartment with a 7 m² balcony in Zurich-Wiedikon. The building (2016) offers an elevator and comfortable underfloor heating, making day-to-day routines easy and cozy.

For family life, the area is well covered: there are multiple primary schools within ~1.2 km and nearby childcare options, plus several parks within a 10-minute walk for after-school playtime. With a nearby S-Bahn connection, getting around Zurich is straightforward—even on busy mornings.

Key facts: CHF 3,150/month plus CHF 250 utilities (CHF 3,400 total), optional parking CHF 180, available 2026-04-01, pets (cats; dogs case-by-case).
Note: distances and neighborhood indicators are based on aggregated local stats and routing estimates.


**B) Expats (connectivity + comfort + “easy landing”)**

Title: Modern Wiedikon base with fast transit access and a sunny balcony

Settle into Zurich smoothly with this modern 82 m² apartment in Wiedikon, a well-connected residential area with a strong day-to-day infrastructure. The 2016 building includes an elevator, and the apartment features underfloor heating, a contemporary kitchen, and a 7 m² balcony—ideal for a quiet coffee or a work-from-home break.

Commuting is convenient: a nearby S-Bahn station is within walking distance and typically runs frequently during peak times. For everyday needs, the neighborhood offers parks and services close by, making it easy to build a routine quickly.

Rent is CHF 3,150/month plus CHF 250 utilities (CHF 3,400 total), optional parking CHF 180, available 2026-04-01. Unfurnished; pets possible (cats; dogs case-by-case).


**C) Students (budget framing + transit + lifestyle, without overpromising)**

Title: Spacious 3.5-room option in Wiedikon with great transit access

If you want more space than a typical studio, this 3.5-room, 82 m² apartment in Wiedikon offers flexibility for a study setup or shared living. You get an elevator, underfloor heating, and a 7 m² balcony for fresh air between sessions.

Transit is a strong point: the S-Bahn is within walking distance with frequent peak service, helping you move around the city efficiently. The area also has parks and everyday amenities nearby.

Monthly rent is CHF 3,150 plus CHF 250 utilities (CHF 3,400 total), optional parking CHF 180, available 2026-04-01. Unfurnished; pets possible depending on arrangement.


**D) Investors (demand signals + rent/m² + vacancy + risk wording)**

Title: Wiedikon rental demand profile with competitive rent-per-m²

This modern 82 m² apartment in Zurich-Wiedikon (2016 building, elevator, balcony) sits in a market context characterized by low district-level vacancy (~0.7%) and a +2.1% YoY rental index trend (Zurich, latest monthly report). At CHF 3,150/month excl. utilities, the implied advertised rent is ~CHF 38.4/m², broadly in line with a local median around CHF 37/m² (Wiedikon advertised listings benchmark).

Investment notes (illustrative):

Demand indicator: low vacancy suggests continued tenant competition.

Positioning: modern stock + balcony + elevator generally supports liquidity among renters.

Risk/disclosure: benchmarks are aggregated; actual achievable rent depends on micro-location, condition, and tenant mix.

Optional parking add-on CHF 180 can improve attractiveness for certain tenant segments. Available 2026-04-01.


## Sales property

Assumed Sales Metadata
* Location: Zurich – Wiedikon
* 3.5 rooms, 82 m²
* 7 m² balcony
* Built 2016
* Elevator, underfloor heating
* Asking price: CHF 1,500,000
* District vacancy: ~0.7%
* Rental index YoY: +2.1%
* Local rental median: ~CHF 37/m²

3 audiences from the same facts

**A) Families (Ownership Stability + Long-Term Fit)**

Title: Modern family-ready 3.5-room home with balcony in Wiedikon

This bright 3.5-room apartment (82 m²) in Zurich-Wiedikon offers modern comfort and long-term stability for families looking to put down roots. Built in 2016, the building features an elevator and underfloor heating, while the 7 m² balcony extends the living space outdoors.

The neighborhood supports everyday family life, with primary schools within approximately 1.2 km and several parks reachable within a 10-minute walk. Public transport connections are close by, helping simplify commutes and school routines.

Asking price: CHF 1,500,000.

This property combines modern construction standards with a well-established residential area—an attractive option for families seeking both livability and long-term ownership in Zurich.

Note: Neighborhood distances and indicators are based on aggregated local data.

**B) Expats (Relocation + Quality + Liquidity)**

Title: Move-in ready 82 m² apartment in well-connected Wiedikon

This modern 3.5-room apartment in Zurich-Wiedikon offers 82 m² of well-designed living space in a 2016 building with elevator access and underfloor heating. A 7 m² balcony provides outdoor space in a quiet residential setting.

Wiedikon is known for its strong public transport connections and balanced mix of residential comfort and urban infrastructure. Schools, parks, and daily amenities are within walking distance, making it practical for professionals relocating to Zurich.

With an asking price of CHF 1,500,000, the property offers exposure to one of Zurich’s established residential districts. The relatively low district-level vacancy (~0.7%) and stable rental index trends (+2.1% YoY) provide additional context for long-term market positioning, though future performance depends on broader market conditions.

Suitable for owner-occupiers seeking quality housing in a central yet residential location.

**C) Investors (Acquisition Framing + Yield Sensitivity + Risk)**

Title: Wiedikon residential asset with rental yield potential

This 82 m² apartment in Zurich-Wiedikon (built 2016, elevator, 7 m² balcony) is positioned in a district characterized by low vacancy (~0.7%) and a +2.1% year-over-year rental index trend.

Based on local advertised rental medians around CHF 37–38/m², the property’s market-aligned rent potential would depend on tenant positioning and unit condition. Gross yield is sensitive to acquisition price, operating costs, financing structure, and holding period.

Investment considerations:

Modern building stock (2016 construction)

Competitive unit size for urban demand

Low district-level vacancy environment

Established residential micro-location

Asking price: CHF 1,500,000.

All rental and market figures reflect aggregated district-level data and are illustrative. Actual returns are not guaranteed and depend on market conditions and execution.


---
# Appendix

### Example prompt - planner

```text
SYSTEM:
You are a content planner for real-estate listings. Your job is NOT to write marketing copy.
You must produce a structured content plan that selects which facts may be used and how they should be presented.
You must follow the rules:
- Use ONLY information explicitly present in the FACTS block.
- If a desired fact is missing, do not invent it; instead add it to "missing_info".
- Avoid prohibited claim types (see POLICY).
- Output MUST be valid JSON matching the PLANNER_SCHEMA. No extra keys.

USER:
TASK
Create a content plan for a real-estate listing description tailored to:
- Audience: {AUDIENCE}
- Listing type: {LISTING_TYPE}  (one of: rental, sale)
- Language: English
- Desired tone: {TONE}  (e.g., warm, professional, minimalist)

GOALS
1) Select the most relevant facts for this audience and listing type.
2) Produce an outline (sections) with intent for each section.
3) Specify hard constraints for the writer: what must be cited, what must be avoided, and required disclaimers.
4) Set length and formatting targets.

POLICY (Prohibited / High-risk claims)
- No guarantees: “guaranteed ROI”, “will increase in value”, “risk-free”
- No safety superlatives: “safest”, “no crime”, “perfectly safe”
- No school quality claims unless explicitly supported by a retrieved snippet (e.g., official rating)
- No invented POIs or commute times not present in FACTS
- Avoid subjective absolutes: “best”, “most desirable” unless supported by retrieved ranking data

FACTS BLOCK
{FACTS_BLOCK}

PLANNER_SCHEMA (must match exactly)
{PLANNER_SCHEMA_JSON}

Return only JSON.
```
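
Filling the `{PLACEHOLDER}` slots of a template like the one above could be done with a small helper. This is a sketch, not the notebook's actual tooling: `render_prompt` is a hypothetical name. A regex is used instead of `str.format` so that literal braces in inserted JSON are left alone and a missing value fails loudly.

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")  # matches {AUDIENCE}, {FACTS_BLOCK}, ...


def render_prompt(template: str, values: dict) -> str:
    """Replace every {UPPER_CASE} placeholder with its value.

    Raises KeyError if the template uses a placeholder that has no value.
    """
    missing = sorted({name for name in PLACEHOLDER.findall(template)
                      if name not in values})
    if missing:
        raise KeyError(f"missing prompt values: {missing}")
    # A function replacement avoids backslash-escape handling in the values.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = render_prompt(
    "Audience: {AUDIENCE}\nListing type: {LISTING_TYPE}",
    {"AUDIENCE": "families", "LISTING_TYPE": "rental"},
)
```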

### Example prompt - writer

```text
SYSTEM:
You are a real-estate copywriter. You must write marketing copy that is strictly grounded in provided facts.
Rules:
- Use ONLY facts listed in plan.facts_to_use (and their values/snippets).
- Do NOT introduce any new numbers, distances, times, rankings, or POIs.
- Every claim containing a number, distance, time, %, or money MUST have a source_id pointing to either a meta field or retrieved snippet id.
- Output MUST be valid JSON matching the WRITER_SCHEMA. No extra keys.

USER:
TASK
Write a tailored listing description for:
- Audience: {AUDIENCE}
- Listing type: {LISTING_TYPE}
- Language: English

INPUTS
1) FACTS BLOCK (for reference, but only allowed facts may be used):
{FACTS_BLOCK}

2) PLAN (the allowed facts + outline + constraints):
{PLAN_JSON}

WRITER_SCHEMA (must match exactly)
{WRITER_SCHEMA_JSON}

STYLE REQUIREMENTS
- Tone: follow plan.style.tone
- Length: keep description within plan.style.length_words range
- Structure: follow plan.outline section order
- Formatting: short paragraphs + highlights bullet list

Return only JSON.
```

### Example JSON schema - planner

```json
{
  "type": "object",
  "required": [
    "audience",
    "listing_type",
    "positioning",
    "outline",
    "facts_to_use",
    "must_cite",
    "must_avoid",
    "required_disclaimers",
    "style",
    "missing_info"
  ],
  "properties": {
    "audience": { "type": "string", "enum": ["families", "expats", "students", "investors"] },
    "listing_type": { "type": "string", "enum": ["rental", "sale"] },

    "positioning": {
      "type": "object",
      "required": ["primary_angle", "secondary_angle"],
      "properties": {
        "primary_angle": { "type": "string" },
        "secondary_angle": { "type": "string" }
      }
    },

    "outline": {
      "type": "array",
      "minItems": 4,
      "items": {
        "type": "object",
        "required": ["section", "goal"],
        "properties": {
          "section": { "type": "string" },
          "goal": { "type": "string" }
        }
      }
    },

    "facts_to_use": {
      "type": "array",
      "minItems": 6,
      "items": {
        "type": "object",
        "required": ["source_id", "label", "value", "use_for"],
        "properties": {
          "source_id": { "type": "string" },
          "label": { "type": "string" },
          "value": {},
          "use_for": { "type": "array", "items": { "type": "string" } }
        }
      }
    },

    "must_cite": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["claim_type", "rule"],
        "properties": {
          "claim_type": { "type": "string" },
          "rule": { "type": "string" }
        }
      }
    },

    "must_avoid": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["type", "rule"],
        "properties": {
          "type": { "type": "string" },
          "rule": { "type": "string" }
        }
      }
    },

    "required_disclaimers": {
      "type": "array",
      "items": { "type": "string" }
    },

    "style": {
      "type": "object",
      "required": ["tone", "length_words", "format"],
      "properties": {
        "tone": { "type": "string" },
        "length_words": {
          "type": "array",
          "minItems": 2,
          "maxItems": 2,
          "items": { "type": "integer" }
        },
        "format": { "type": "string" }
      }
    },

    "missing_info": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["field", "why_needed"],
        "properties": {
          "field": { "type": "string" },
          "why_needed": { "type": "string" }
        }
      }
    }
  }
}
```

### Example JSON schema - writer

```json
{
  "type": "object",
  "required": [
    "audience",
    "listing_type",
    "title",
    "description",
    "highlights",
    "claims",
    "facts_used",
    "warnings"
  ],
  "properties": {
    "audience": { "type": "string", "enum": ["families", "expats", "students", "investors"] },
    "listing_type": { "type": "string", "enum": ["rental", "sale"] },

    "title": { "type": "string" },
    "description": { "type": "string" },

    "highlights": {
      "type": "array",
      "minItems": 3,
      "maxItems": 6,
      "items": { "type": "string" }
    },

    "claims": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["claim_text", "source_id", "span"],
        "properties": {
          "claim_text": { "type": "string" },
          "source_id": { "type": "string" },
          "span": {
            "type": "object",
            "required": ["start", "end"],
            "properties": {
              "start": { "type": "integer" },
              "end": { "type": "integer" }
            }
          }
        }
      }
    },

    "facts_used": {
      "type": "array",
      "items": { "type": "string" }
    },

    "warnings": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
```
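
Beyond schema validation, the writer output lends itself to a few semantic checks. The sketch below uses only the standard library and rests on two assumptions the schema does not state: that `span` holds character offsets into `description` with an exclusive `end`, and that `facts_used` lists source IDs. The function name is hypothetical.

```python
WRITER_REQUIRED = (
    "audience", "listing_type", "title", "description",
    "highlights", "claims", "facts_used", "warnings",
)


def check_writer_output(out: dict) -> list[str]:
    """Return problems found in a parsed writer response; empty means OK."""
    problems = [f"missing key: {key}" for key in WRITER_REQUIRED if key not in out]
    if problems:
        return problems  # later checks need all keys to be present

    if not 3 <= len(out["highlights"]) <= 6:
        problems.append("highlights must contain 3 to 6 items")

    text = out["description"]
    for claim in out["claims"]:
        start, end = claim["span"]["start"], claim["span"]["end"]
        # Assumed convention: span is [start, end) in characters.
        if text[start:end] != claim["claim_text"]:
            problems.append(f"span does not match claim: {claim['claim_text']}")
        if claim["source_id"] not in out["facts_used"]:
            problems.append(f"source not in facts_used: {claim['source_id']}")
    return problems
```

A full validator would run a JSON Schema check first and apply these checks only to responses that pass it.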