Question for the maintainer about regular expression matching, please help #399
Comments
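Which regex do you mean? There is too much code. If this is a regex problem, please provide:
pve_1092_1/\\?
@code4craft OK, I'll try it now and report back shortly.
@code4craft Thank you very much, the regular expression works now. But how should I look this up on my own? Without your help I could not find anything about this regex detail in the documentation. This is general regex behavior, right?
Regular expressions are a general-purpose technique, so the documentation does not cover them. Please search for it. I will not reply further on this issue.
@code4craft Found it, thanks.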
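The suggested fix works because an unescaped `?` in a regular expression is a quantifier (it makes the preceding `/` optional), while `\\?` in a Java string literal matches a literal question mark. The following is a minimal sketch of that difference using plain `java.util.regex` rather than WebMagic's selector API. The class name `QuestionMarkEscapeDemo` and the simplified `[\\w-]+` token pattern are assumptions made for illustration; only the seed URL comes from the question.

```java
import java.util.regex.Pattern;

class QuestionMarkEscapeDemo {
    // Seed URL taken from the question's main() method.
    static final String SEED_URL =
        "http://cs.58.com/zhaozu/pn1/pve_1092_1/?PGTID=0d30000d-0019-e4b5-3eff-f37b576ebd78&ClickID=1";

    // Unescaped '?': the preceding '/' becomes optional and 'P' must follow immediately,
    // so the literal '?' in the URL is never consumed.
    static final Pattern UNESCAPED = Pattern.compile(
        "http://cs\\.58\\.com/zhaozu/pn\\d+/pve_1092_1/?PGTID=[\\w-]+&ClickID=1");

    // Escaped '\\?': matches the literal question mark that starts the query string.
    static final Pattern ESCAPED = Pattern.compile(
        "http://cs\\.58\\.com/zhaozu/pn\\d+/pve_1092_1/\\?PGTID=[\\w-]+&ClickID=1");

    // Returns true if the pattern occurs anywhere in the URL.
    static boolean found(Pattern pattern, String url) {
        return pattern.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println("unescaped: " + found(UNESCAPED, SEED_URL)); // prints "unescaped: false"
        System.out.println("escaped:   " + found(ESCAPED, SEED_URL));   // prints "escaped:   true"
    }
}
```

Checking a pattern against one known URL in isolation like this is usually quicker than running the whole spider to find out why a list page is not being matched.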
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.pipeline.JsonFilePipeline;
import us.codecraft.webmagic.processor.PageProcessor;

/**
 * @author code4crafter@gmail.com
 */
public class Client58Processor implements PageProcessor {

    public static final String URL_LIST = "http://cs\\.58\\.com/zhaozu/pn\\d+\\/pve_1092_1/?PGTID=0d30000d-0019-\\w+\\-\\w+\\-\\w+\\&ClickID=1";
    public static final String URL_POST = "http://cs\\.58\\.com/zhaozu/\\w+\\.shtml?psid=124606730193983807135678330&entinfo=\\w+\\&iuType=p_0&PGTID=0d30000d-0019-e991-490c-51f69d09bf8f&ClickID=\\w+\\";

    private Site site = Site
            .me()
            .setDomain("cs.58.com")
            .setSleepTime(6000)
            .setUserAgent(
                    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31");

    @Override
    public void process(Page page) {
        // List page
        if (page.getUrl().regex(URL_LIST).match()) {
            page.addTargetRequests(page.getHtml().xpath("//div[@class='tbimg']").links().regex(URL_POST).all());
            page.addTargetRequests(page.getHtml().links().regex(URL_LIST).all());
        // Article page
        } else {
            /*page.putField("title", page.getHtml().xpath("//div[@class='headline']"));
            page.putField("content", page.getHtml().xpath("//div[@class='infocs']"));*/
            page.putField("title", page.getHtml().xpath("//div[@class='filterbar']"));
            page.putField("content", page.getHtml().xpath("//div[@class='filterbar']"));
            /*page.putField("date",
                    page.getHtml().xpath("//div[@id='articlebody']//span[@class='time SG_txtc']").regex("\\((.*)\\)"));*/
        }
    }

    @Override
    public Site getSite() {
        return site;
    }

    public static void main(String[] args) {
        Spider.create(new Client58Processor())
                .addUrl("http://cs.58.com/zhaozu/pn1/pve_1092_1/?PGTID=0d30000d-0019-e4b5-3eff-f37b576ebd78&ClickID=1")
                .addPipeline(new JsonFilePipeline("D:\\webmagic58\\"))
                .run();
    }
}
With the code above, how do I match the list-page data? The regular expression does not match.