feat(domain-skills): add 贝壳找房 (ke.com) scraping skill #392
Open
sontianye wants to merge 2 commits into
Adds `agent-workspace/domain-skills/ke-com/scraping.md` with three field-tested approaches for extracting housing data from ke.com. All approaches verified live across Beijing, Shanghai, and Guangzhou on 2026-04-28.

- Approach 1, 二手房 (resale) listing search (`/ershoufang/`): scrapes 30 resale listings per city via `http_get`. Extracts title, URL, total price (万, units of 10,000 CNY), unit price (元/平, CNY per square meter), floor/year/layout/area/orientation, community name, tags (满五年 "owned five years or more", 地铁 "near subway", etc.), and follower count.
- Approach 2, 租房 (rental) listing search (`/zufang/`): scrapes 30 rental listings per city via `http_get`. Extracts title, URL, price (元/月, CNY per month), district, neighborhood, community name, and full description string.
- Approach 3, property detail page: requires browser (`goto` + `js`). Returns full property details via CSS selectors: area, layout, floor, orientation, decoration, community, district, description, and tags.

Covers 8 confirmed city subdomains (`bj`/`sh`/`gz`/`sz`/`cd`/`wh`/`hz`/`nj`), URL patterns for all page types, and all known gotchas:

- district filters redirect to the login page
- page 2+ returns a CAPTCHA
- detail pages always require a browser
- rental pages follow a 302 before serving HTML
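The URL patterns and gotchas above can be sketched as a small routing helper. This is only an illustration under stated assumptions: `listing_url`, `classify`, and `http_get_ok` are hypothetical names, not functions from the skill file, and applying the same boundaries to `/zufang/` sub-paths is an assumption, since the description reports them for `/ershoufang/`.

```python
import re
from urllib.parse import urlparse

# City subdomains this PR lists as confirmed.
CONFIRMED_CITIES = {"bj", "sh", "gz", "sz", "cd", "wh", "hz", "nj"}
LISTING_KINDS = ("ershoufang", "zufang")


def listing_url(city: str, kind: str = "ershoufang") -> str:
    """Build the city-wide, first-page listing URL (hypothetical helper)."""
    if city not in CONFIRMED_CITIES:
        raise ValueError(f"unconfirmed city subdomain: {city!r}")
    if kind not in LISTING_KINDS:
        raise ValueError(f"unknown listing kind: {kind!r}")
    return f"https://{city}.ke.com/{kind}/"


def classify(url: str) -> str:
    """Label a ke.com URL by page type, following the patterns in this PR."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts or parts[0] not in LISTING_KINDS:
        return "other"
    if len(parts) == 1:
        return "city_page1"
    tail = parts[-1]
    if tail.endswith(".html"):
        return "detail"
    # Any pg{n} segment is treated as pagination, including pg1 (assumption).
    if re.fullmatch(r"pg\d+", tail):
        return "pagination"
    return "district_filter"


def http_get_ok(url: str) -> bool:
    """True only for the page type the PR reports as working over http_get."""
    return classify(url) == "city_page1"
```

With this sketch, `http_get_ok(listing_url("sh", "zufang"))` is true, while a `pg2` URL, a district filter, or a detail page would be routed away from plain `http_get`.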
Contributor
1 issue found across 1 file
Independent global regex scans for prices and descriptions returned different counts than listing blocks (e.g. Shanghai: 20 blocks vs 30 prices), causing price/desc to silently belong to the wrong listing. Rewrite ke_search_zufang to split on data-house_code boundaries and extract all fields per-chunk, with a targeted price selector that matches only the listing price element.
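A minimal sketch of the per-chunk approach described in this commit message, for illustration only. The sample markup, class names, and the function name `parse_rentals_per_chunk` are hypothetical; the real ke.com HTML and the PR's `ke_search_zufang` implementation may differ. Only the `data-house_code` boundary comes from the commit message.

```python
import re

# Hypothetical markup, not real ke.com HTML. The middle block is a promoted
# card with a price but no data-house_code, which breaks global scans.
SAMPLE_HTML = """
<div class="item" data-house_code="A1"><p class="title">Flat A</p>
<span class="price"><em>5200</em> 元/月</span><p class="desc">2室1厅</p></div>
<div class="ad"><span class="price"><em>9999</em> 元/月</span></div>
<div class="item" data-house_code="B2"><p class="title">Flat B</p>
<span class="price"><em>7800</em> 元/月</span></div>
"""

CODE_RE = re.compile(r'data-house_code="([^"]+)"')
PRICE_RE = re.compile(r'<span class="price"><em>(\d+)</em>')
DESC_RE = re.compile(r'<p class="desc">([^<]*)</p>')


def parse_rentals_per_chunk(html: str) -> list[dict]:
    """Split on data-house_code boundaries and extract each field per chunk."""
    starts = [m.start() for m in CODE_RE.finditer(html)]
    bounds = starts + [len(html)]
    rows = []
    for begin, end in zip(bounds, bounds[1:]):
        chunk = html[begin:end]
        # Take the first match inside the chunk, so a field can never be
        # borrowed from an earlier listing; a missing field becomes None.
        price = PRICE_RE.search(chunk)
        desc = DESC_RE.search(chunk)
        rows.append({
            "house_code": CODE_RE.match(chunk).group(1),
            "price": int(price.group(1)) if price else None,
            "desc": desc.group(1) if desc else None,
        })
    return rows
```

On this sample, a global `PRICE_RE.findall` sees three prices for two listings, so zipping the lists would pair the second listing with the promoted card's price. The per-chunk version keeps each price with its own `house_code` and leaves a missing description as `None` instead of shifting later values.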
Summary
Adds `agent-workspace/domain-skills/ke-com/scraping.md`, a field-tested guide for extracting housing listings from 贝壳找房 (ke.com), China's largest property platform, using `http_get` (no browser, no authentication required for listing search).

All approaches verified live across Beijing, Shanghai, and Guangzhou on 2026-04-28.

- `http_get` on `{city}.ke.com/ershoufang/` returns 30 listings with title, URL, total price (万), unit price (元/平), floor/year/layout/area/orientation, community name, tags (满五年/地铁 etc.), and follower count
- `http_get` on `{city}.ke.com/zufang/` returns 30 listings with price (元/月), district, neighborhood, community name, and full description
- Property detail pages require a browser (`goto` + `js`); detail pages return CAPTCHA via `http_get`

Access boundaries (all verified empirically)

| URL pattern | `http_get` |
| --- | --- |
| `{city}.ke.com/ershoufang/` (city-wide p1) | works, 30 listings |
| `{city}.ke.com/zufang/` (city-wide p1) | works, 30 listings (after a 302) |
| `/ershoufang/{district}/` (district filter) | redirects to login page |
| `/ershoufang/pg{n}/` (page 2+) | returns CAPTCHA |
| `/ershoufang/{id}.html` (detail) | returns CAPTCHA |

Confirmed city subdomains

`bj` 北京 (Beijing) / `sh` 上海 (Shanghai) / `gz` 广州 (Guangzhou) / `sz` 深圳 (Shenzhen) / `cd` 成都 (Chengdu) / `wh` 武汉 (Wuhan) / `hz` 杭州 (Hangzhou) / `nj` 南京 (Nanjing)

Summary by cubic
Adds a field-tested domain skill for scraping 贝壳找房 (ke.com). Supports first-page listing extraction via `http_get`, with browser fallback for pages behind CAPTCHA. Also fixes the rental parser to keep prices and descriptions aligned per listing.

New Features

- `agent-workspace/domain-skills/ke-com/scraping.md` with code snippets and selectors.
- Resale `/ershoufang/` and rental `/zufang/`: scrape first 30 listings per city (title, URL, prices, layout/area, community, tags).
- Browser (`goto` + `js`) for pages that `http_get` cannot fetch due to redirects/CAPTCHA.
- City subdomains: `bj`, `sh`, `gz`, `sz`, `cd`, `wh`, `hz`, `nj`.

Bug Fixes

- Rewrote the `/zufang/` parser to extract fields per `data-house_code` chunk with a targeted price selector, preventing price/desc misalignment across listings.

Written for commit 8db19bc. Summary will update on new commits.