feat(domain-skills): add 贝壳找房 (ke.com) scraping skill #392

Open: sontianye wants to merge 2 commits into browser-use:main from sontianye:feat/ke-com-domain-skill
sontianye commented May 27, 2026

Summary

Adds agent-workspace/domain-skills/ke-com/scraping.md — a field-tested guide for extracting housing listings from 贝壳找房 (ke.com), China's largest property platform, using http_get (no browser, no authentication required for listing search).

All approaches verified live across Beijing, Shanghai, and Guangzhou on 2026-04-28.

  • Approach 1, 二手房 (resale): http_get on {city}.ke.com/ershoufang/ returns 30 listings with title, URL, total price (万, units of 10,000 yuan), unit price (元/平, yuan per square meter), floor/year/layout/area/orientation, community name, tags (满五年 "owned for five years or more", 地铁 "near subway", etc.), and follower count
  • Approach 2, 租房 (rental): http_get on {city}.ke.com/zufang/ returns 30 listings with price (元/月, yuan per month), district, neighborhood, community name, and full description
  • Approach 3, property detail: browser required (goto + js); detail pages return a CAPTCHA when fetched via http_get

Access boundaries (all verified empirically)

| URL pattern | http_get | Browser |
|---|---|---|
| /{city}.ke.com/ershoufang/ (city-wide p1) | ✅ Full HTML | |
| /{city}.ke.com/zufang/ (city-wide p1) | ✅ Full HTML | |
| /ershoufang/{district}/ (district filter) | ❌ Redirects to login | |
| /ershoufang/pg{n}/ (page 2+) | ❌ CAPTCHA | |
| /ershoufang/{id}.html (detail) | ❌ CAPTCHA | |
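
The access boundaries above could be encoded as a small routing helper, so that a caller knows up front whether a plain http_get is likely to work. This is only a sketch: `classify_url` is a hypothetical name, the regular expressions are inferred from the URL patterns in the table, and anything the table does not cover is conservatively routed to the browser.

```python
import re
from urllib.parse import urlparse


def classify_url(url: str) -> tuple[str, str]:
    """Return (method, reason) for a ke.com URL, following the table above.

    Hypothetical helper; the patterns are inferred from the PR's table,
    not taken from scraping.md itself.
    """
    path = urlparse(url).path

    # City-wide first page of resale or rental listings: full HTML via http_get.
    if path in ("/ershoufang/", "/zufang/"):
        return ("http_get", "full HTML")

    # Pagination: page 2 and later returned a CAPTCHA.
    page = re.fullmatch(r"/ershoufang/pg(\d+)/", path)
    if page:
        if int(page.group(1)) >= 2:
            return ("browser", "CAPTCHA on page 2+")
        return ("browser", "not covered by the table")

    # Detail pages end in {id}.html and returned a CAPTCHA.
    if re.fullmatch(r"/ershoufang/[^/]+\.html", path):
        return ("browser", "CAPTCHA on detail page")

    # Any other single path segment is treated as a district filter.
    if re.fullmatch(r"/ershoufang/[^/]+/", path):
        return ("browser", "district filter redirects to login")

    # Assumption: unknown patterns default to the browser.
    return ("browser", "not covered by the table")
```

For example, `classify_url("https://bj.ke.com/ershoufang/")` selects http_get, while a `pg2/` or `{id}.html` URL under the same host is routed to the browser.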

Confirmed city subdomains

bj Beijing (北京) / sh Shanghai (上海) / gz Guangzhou (广州) / sz Shenzhen (深圳) / cd Chengdu (成都) / wh Wuhan (武汉) / hz Hangzhou (杭州) / nj Nanjing (南京)
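
As an illustration of how the confirmed subdomains combine with the two listing paths, the sketch below builds first-page URLs. The names `CITY_SUBDOMAINS` and `listing_url` are hypothetical, and the `https://` scheme is an assumption, since the PR only gives host and path.

```python
# Subdomain -> city name, limited to the eight subdomains confirmed in the PR.
CITY_SUBDOMAINS = {
    "bj": "Beijing",
    "sh": "Shanghai",
    "gz": "Guangzhou",
    "sz": "Shenzhen",
    "cd": "Chengdu",
    "wh": "Wuhan",
    "hz": "Hangzhou",
    "nj": "Nanjing",
}

# Listing sections whose city-wide first page was reachable via http_get.
LISTING_KINDS = ("ershoufang", "zufang")


def listing_url(city: str, kind: str = "ershoufang") -> str:
    """Build the city-wide first-page URL for a listing section.

    Hypothetical helper; the https scheme is assumed, not stated in the PR.
    """
    if city not in CITY_SUBDOMAINS:
        raise ValueError(f"unconfirmed city subdomain: {city!r}")
    if kind not in LISTING_KINDS:
        raise ValueError(f"unknown listing kind: {kind!r}")
    return f"https://{city}.ke.com/{kind}/"
```

Rejecting unconfirmed subdomains keeps the helper within what the PR actually verified; other cities may well exist but were not tested here.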


Summary by cubic

Adds a field-tested domain skill for scraping 贝壳找房 (ke.com). Supports first-page listing extraction via http_get, with browser fallback for pages behind CAPTCHA. Also fixes the rental parser to keep prices and descriptions aligned per listing.

  • New Features

    • Added agent-workspace/domain-skills/ke-com/scraping.md with code snippets and selectors.
    • Resale /ershoufang/ and rental /zufang/: scrape first 30 listings per city (title, URL, prices, layout/area, community, tags).
    • District filters, pagination (pg2+), and detail pages require browser (goto + js) due to redirects/CAPTCHA.
    • Includes confirmed city subdomains: bj, sh, gz, sz, cd, wh, hz, nj.
  • Bug Fixes

    • Rewrote /zufang/ parser to extract fields per data-house_code chunk with a targeted price selector, preventing price/desc misalignment across listings.

Written for commit 8db19bc. Summary will update on new commits.

Adds agent-workspace/domain-skills/ke-com/scraping.md with three
field-tested approaches for extracting housing data from ke.com.
All approaches verified live across Beijing, Shanghai, and Guangzhou
on 2026-04-28.

Approach 1 — 二手房 listing search (/ershoufang/): scrapes 30 resale
listings per city via http_get. Extracts title, URL, total price (万),
unit price (元/平), floor/year/layout/area/orientation, community name,
tags (满五年/地铁 etc.), and follower count.

Approach 2 — 租房 listing search (/zufang/): scrapes 30 rental listings
per city via http_get. Extracts title, URL, price (元/月), district,
neighborhood, community name, and full description string.

Approach 3 — property detail page: requires browser (goto + js). Returns
full property details via CSS selectors: area, layout, floor, orientation,
decoration, community, district, description, and tags.

Covers 8 confirmed city subdomains (bj/sh/gz/sz/cd/wh/hz/nj), URL
patterns for all page types, and all known gotchas: district filters
redirect to login page, page 2+ returns CAPTCHA, detail pages always
require browser, rental pages follow a 302 before serving HTML.

cubic-dev-ai (bot) left a comment

1 issue found across 1 file

Comment thread on agent-workspace/domain-skills/ke-com/scraping.md (outdated)
Independent global regex scans for prices and descriptions returned
different counts than listing blocks (e.g. Shanghai: 20 blocks vs
30 prices), causing price/desc to silently belong to the wrong listing.

Rewrite ke_search_zufang to split on data-house_code boundaries and
extract all fields per-chunk, with a targeted price selector that
matches only the listing price element.
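
The difference between the two parsing strategies can be shown on a toy page. The sketch below is not the PR's `ke_search_zufang`: the function name `parse_rentals`, the sample markup, and the class names `title` and `item-price` are all hypothetical stand-ins, since the real ke.com markup is not shown in this conversation. Only the `data-house_code` attribute comes from the commit message.

```python
import re

# Synthetic HTML: two listings plus an unrelated block that also contains <em>.
SAMPLE = (
    '<div class="item" data-house_code="A1">'
    '<p class="title">Flat one</p>'
    '<span class="item-price"><em>5200</em> 元/月</span></div>\n'
    '<div class="ad"><em>9999</em></div>\n'
    '<div class="item" data-house_code="B2">'
    '<p class="title">Flat two</p>'
    '<span class="item-price"><em>6800</em> 元/月</span></div>\n'
)


def parse_rentals(html: str) -> list[dict]:
    """Extract one record per data-house_code chunk (hypothetical sketch)."""
    # With a capturing group, re.split returns
    # [preamble, code1, chunk1, code2, chunk2, ...].
    parts = re.split(r'data-house_code="([^"]+)"', html)
    listings = []
    for code, chunk in zip(parts[1::2], parts[2::2]):
        # Both searches are confined to this listing's chunk.
        title = re.search(r'class="title">([^<]+)<', chunk)
        # Targeted selector: only the <em> inside the price element counts.
        price = re.search(r'class="item-price"><em>(\d+)</em>', chunk)
        listings.append({
            "house_code": code,
            "title": title.group(1) if title else None,
            "price": int(price.group(1)) if price else None,
        })
    return listings


# A global scan counts three prices for two listings, so pairing the
# two lists by index would attach a price to the wrong listing.
global_prices = re.findall(r"<em>(\d+)</em>", SAMPLE)
rows = parse_rentals(SAMPLE)
```

Here `global_prices` has three entries while `rows` has two, which mirrors the count mismatch the commit message describes; the per-chunk version leaves a field as `None` rather than borrowing a value from a neighboring listing.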