Skip to content

MDS Excavator(微数据平台数据采集工具)

Notifications You must be signed in to change notification settings

MDataSet/excavator

Repository files navigation

MDS Excavator(微数据平台数据采集工具)

excavator

特点

  1. 使用简单:非常的简介的编程接口

  2. 类库化:方便集成到第三方系统

  3. 功能全面:支持主流的HTTP抓取操作

  4. 反机器识别:支持自定义User-Agent、动态代理、速度控制等

使用

基础:如果使用4个函数玩转汽车之家数据抓取

Excavator.start()
      .contentType("text/html; charset=gb2312").charset(Charset.GB2312)
      .go("root", ('A' to 'Z').map("http://www.autohome.com.cn/grade/carhtml/" + _ + ".html"),
        _.go("brand", "dl",
          _.text("brandName", "dt")
            .go("company", "dd",
              _.text("companyName", ".h3-tit")
                .go("series", "li:has(h4)",
                  _.text("seriesName", "h4")
                    .text("seriesId", _.attr("id").replace("s", ""))
                    .go("model", "http://car.autohome.com.cn/config/series/{seriesId}.html",
                      _.json("config", "config", _.get("result").get("paramtypeitems"))
                        .json("option", "option", _.get("result").get("configtypeitems"))
                        .process(println)
                    )
                )
            )
        )
      )

License

Under version 2.0 of the Apache License.

About

MDS Excavator(微数据平台数据采集工具)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages