[20210401] Jsoup (Java crawling) #84

Open
JuHyun419 opened this issue Apr 1, 2021 · 0 comments

Jsoup (Java crawling)

  • Absolute URL path (abs)
Elements linkElements = document.select("a.course_card_front");

for (Element e : linkElements) {
    // The "abs:" prefix resolves the href against the document's base URI.
    String url = e.attr("abs:href"); // absolute URL
}
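
The `abs:` prefix only works when the document has a base URI. A minimal sketch of that, assuming in-memory HTML instead of a live page; the class name `AbsUrlExample`, the helper `firstAbsoluteLink`, and the `/course/java` path are hypothetical and used only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AbsUrlExample {
    // Parses html against baseUri and returns the resolved href of the first
    // matching link, or an empty string when there is no such link.
    static String firstAbsoluteLink(final String html, final String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        Element link = document.selectFirst("a.course_card_front");
        return link == null ? "" : link.attr("abs:href");
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real page structure may differ.
        String html = "<a class=\"course_card_front\" href=\"/course/java\">Java</a>";

        // With a base URI, the relative href is resolved to an absolute URL.
        System.out.println(firstAbsoluteLink(html, "https://www.inflearn.com/courses"));
        // prints "https://www.inflearn.com/course/java"

        // Without a base URI, a relative href cannot be resolved and jsoup returns "".
        System.out.println(firstAbsoluteLink(html, "").isEmpty());
        // prints "true"
    }
}
```

Documents fetched with `Jsoup.connect(url).get()` get their base URI from the request URL, so the empty result mainly shows up when parsing raw strings or files without passing a base URI.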

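  • Strip HTML tags from crawled data
private static String stripHtml(final String html) {
    // On jsoup 1.14.2 and later, Safelist.none() replaces the deprecated Whitelist.none().
    return Jsoup.clean(html, Whitelist.none());
}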
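
`Jsoup.clean` returns an HTML fragment, so entities such as `&amp;` may stay escaped in its output. When plain text is the goal, a different technique is to parse the fragment and read its text. The sketch below swaps in `Jsoup.parse(html).text()` for that purpose; the class name `StripHtmlExample` and the helper `toPlainText` are hypothetical:

```java
import org.jsoup.Jsoup;

class StripHtmlExample {
    // Parses the fragment and returns only its text content.
    // Tags are dropped, entities are decoded, and whitespace is normalized.
    static String toPlainText(final String html) {
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<p>Hello <b>world</b></p>")); // prints "Hello world"
        System.out.println(toPlainText("<div>a &amp; b</div>"));      // prints "a & b"
    }
}
```

This is not a sanitizer: it is suited to extracting display text, not to producing HTML that is safe to embed in a page.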

  • Remove the opening parenthesis at the very start and the closing parenthesis at the very end of a string
private static String removeBracket(final String str) {
    return str.replaceAll("^[(]|[)]$", "");
}
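
A few sample inputs make the regex behavior concrete: `^[(]` matches one `(` at the start, `[)]$` matches one `)` at the end, and everything in between is left alone. The wrapper class `RemoveBracketExample` and the sample strings are assumptions added for illustration:

```java
class RemoveBracketExample {
    // Same method as above: strips one leading "(" and one trailing ")".
    static String removeBracket(final String str) {
        return str.replaceAll("^[(]|[)]$", "");
    }

    public static void main(String[] args) {
        System.out.println(removeBracket("(Java)"));   // prints "Java"
        System.out.println(removeBracket("(a(b)c)"));  // prints "a(b)c", inner parentheses stay
        System.out.println(removeBracket("((x))"));    // prints "(x)", only one pair is removed
        System.out.println(removeBracket("Java)"));    // prints "Java", either side matches on its own
    }
}
```

Because the two alternatives match independently, an unbalanced input such as `"Java)"` still loses its trailing parenthesis.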

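  • Crawl a site's pages in a loop
// Fetches pages 1 through 9; Jsoup.connect(...).get() throws IOException on failure.
for (int i = 1; i < 10; i++) {
    String url = "https://www.inflearn.com/courses/it-programming?order=seq&page=" + i;
    Document doc = Jsoup.connect(url).get();
}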
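
The loop above stops at the first page that fails to load. A hedged sketch of one way to keep going: catch the `IOException` per page and pause between requests. The class `PagedCrawl`, the helper names, the user-agent string, the 10-second timeout, and the 1-second delay are all assumptions, not part of the original note:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class PagedCrawl {
    private static final String BASE =
            "https://www.inflearn.com/courses/it-programming?order=seq&page=";

    // Builds the listing URL for one page number.
    static String pageUrl(final int page) {
        return BASE + page;
    }

    // Fetches pages firstPage..lastPage (inclusive) and returns those that loaded.
    static List<Document> crawl(final int firstPage, final int lastPage)
            throws InterruptedException {
        List<Document> pages = new ArrayList<>();
        for (int page = firstPage; page <= lastPage; page++) {
            try {
                Document doc = Jsoup.connect(pageUrl(page))
                        .userAgent("Mozilla/5.0") // assumption: some sites reject the default agent
                        .timeout(10_000)          // assumption: 10 s timeout, in milliseconds
                        .get();
                pages.add(doc);
            } catch (IOException e) {
                // A failed page is logged and skipped instead of aborting the whole crawl.
                System.err.println("Skipping page " + page + ": " + e.getMessage());
            }
            Thread.sleep(1_000); // assumption: 1 s pause to avoid hammering the server
        }
        return pages;
    }
}
```

Calling `PagedCrawl.crawl(1, 9)` covers the same pages as the original loop.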

JuHyun419 added the Java label Apr 1, 2021