Feat/156/book#157
Conversation
📝 WalkthroughWalkthroughThis PR introduces automated book description enrichment by adding jsoup dependency for HTML parsing, creating BookDescriptionEnhancer and BookDataMigrationService for batch description fixes, exposing admin endpoints via BookAdminController, integrating enhancement into book import/query workflows, and adding Swagger documentation across multiple controllers. Changes
Sequence Diagram(s)sequenceDiagram
actor Admin
participant BookAdminController
participant BookDataMigrationService
participant BooksRepository
participant BookDescriptionEnhancer
participant Kakao
Admin->>BookAdminController: POST /admin/books/fix-descriptions
BookAdminController->>BookDataMigrationService: fixAllIncompleteDescriptions()
loop For each page of books
BookDataMigrationService->>BooksRepository: findAll(paginated)
loop For each book
BookDataMigrationService->>BookDescriptionEnhancer: needsEnhancement(desc)
alt Enhancement needed
BookDescriptionEnhancer->>Kakao: fetch book detail page
Kakao-->>BookDescriptionEnhancer: HTML content
BookDescriptionEnhancer->>BookDescriptionEnhancer: extract & complete description
BookDescriptionEnhancer-->>BookDataMigrationService: enhanced description
BookDataMigrationService->>BooksRepository: save book with updated description
else No enhancement needed
BookDataMigrationService-->>BookDataMigrationService: skip book
end
end
end
BookDataMigrationService-->>BookAdminController: total fixed count
BookAdminController-->>Admin: {success, fixedCount, durationMs, message}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In `@booklog/build.gradle`:
- Around line 57-58: Replace the outdated jsoup dependency version in
build.gradle by updating the implementation declaration that currently reads
'org.jsoup:jsoup:1.17.2' to use 'org.jsoup:jsoup:1.22.1'; after changing the
version, refresh/resolve Gradle dependencies (e.g., Gradle sync or ./gradlew
build) to ensure the new jsoup 1.22.1 is picked up and tests/compile succeed.
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/controller/BookAdminController.java`:
- Around line 72-91: The current fixAllDescriptions controller method blocks
until migrationService.fixAllIncompleteDescriptions completes, risking
HTTP/proxy timeouts; change it to enqueue the work and return immediately with
202 Accepted and a job id: modify fixAllDescriptions to call an asynchronous
runner (e.g., an `@Async` method or submit to a task queue/executor) that invokes
migrationService.fixAllIncompleteDescriptions in the background, generate and
return a unique jobId in the ResponseEntity, and implement a new status endpoint
(e.g., GET /admin/books/fix-descriptions/{jobId}) that reads job progress/state
from a concurrent map or persisted JobStatus store to report progress and final
result.
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java`:
- Around line 128-150: The method fixDescriptionsByKeyword currently uses a
hardcoded PageRequest.of(0, 1000) which silently drops results beyond the first
page; update fixDescriptionsByKeyword (and its use of
booksRepository.searchByTitle) to iterate pages (use Pageable/Page loop) similar
to fixAllIncompleteDescriptions so all matching Books are processed, or at
minimum detect when booksPage.hasNext() and emit a warning log; ensure you
reference and update the PageRequest creation, the for loop over
booksPage.getContent(), and preserve existing exception handling in
fixBookDescription.
- Around line 30-68: Remove the class-level or method-level `@Transactional` on
fixAllIncompleteDescriptions so the long-running batch and HTTP crawl
(enhanceFromKakaoPage) do not hold a single transaction; instead make each book
update run in its own transaction by moving `@Transactional` (or
`@Transactional`(propagation = REQUIRES_NEW)) to fixBookDescription or to a new
helper service method (e.g.,
BookDataMigrationHelper.updateDescriptionTransactional) and ensure the call goes
through the Spring proxy (inject the helper or self via the bean proxy or use
TransactionTemplate) so each book/page is committed independently.
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`:
- Around line 102-108: BookDescriptionEnhancer currently performs synchronous
blocking HTTP crawls via Jsoup.connect(...).get() during user-facing flows
(e.g., BookQueryServiceImpl.getBookDetail and
BookImportService.searchAndUpsert), which can add large latency; change this by
moving the crawl/description enhancement off the request thread: make
BookDescriptionEnhancer expose an asynchronous enrichment API (use Spring `@Async`
or dispatch a background job/message to a worker queue) and have getBookDetail
and searchAndUpsert trigger the async enrichment (return cached/partial data
immediately), while keeping synchronous crawling only in admin batch endpoints
such as fixAllIncompleteDescriptions; ensure methods like
BookDescriptionEnhancer.enhanceDescription(...) and caller sites handle async
completion/errors and do not block on Jsoup.connect(...).get().
- Around line 191-208: In findLastKoreanSentenceEnding the code mixes start
index (pos) and end index (maxPosition) causing wrong comparisons; change the
loop to compute an end index for each match (e.g., endPos = pos +
ending.length() - 1) and compare endPos against a consistently-used maxEnd
(replace/maxPosition) and update maxEnd = endPos when larger so you always
compare end positions consistently within findLastKoreanSentenceEnding.
- Around line 96-125: The code in enhanceFromKakaoPage uses
currentDescription.length() which can NPE when currentDescription is null;
change enhanceFromKakaoPage to null-safe handling: compute int currentLen =
(currentDescription == null ? 0 : currentDescription.length()) and use
currentLen in the length comparison and logs, and ensure calls to
completeIncompleteSentence(currentDescription) either accept null or pass an
empty string (e.g., completeIncompleteSentence(currentDescription == null ? "" :
currentDescription)); reference methods: enhanceFromKakaoPage,
extractDescriptionFromKakaoPage, completeIncompleteSentence (and consider
needsEnhancement callers) so the function never dereferences currentDescription.
- Around line 105-108: The code calls Jsoup.connect(kakaoDetailUrl) in
BookDescriptionEnhancer using an external kakaoDetailUrl; validate and sanitize
this URL before connecting to mitigate SSRF by parsing the URL (e.g., new
URL/URI) and enforcing a strict allowlist of permitted hosts and schemes (only
http/https), rejecting or skipping any URL whose host is not in the allowlist
(e.g., Kakao and known book vendor domains) or that contains suspicious
characters/redirects; perform this check inside the method that builds/uses
kakaoDetailUrl (referencing kakaoDetailUrl and the Jsoup.connect call) and
throw/log a safe error or skip processing when validation fails.
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookImportService.java`:
- Around line 66-70: The synchronous HTTP crawl in
descriptionEnhancer.enhanceFromKakaoPage(doc.getUrl(), description) is being
executed while the REQUIRES_NEW transaction in BookImportService is open,
holding DB connections; move the enhancement outside the transactional boundary
by checking descriptionEnhancer.needsEnhancement(...) and performing
enhanceFromKakaoPage(...) after the transactional method completes (or make the
enhancer asynchronous/queued so the HTTP call does not run inside the
REQUIRES_NEW transaction), updating callers in BookImportService to call the
enhancer post-transaction (or submit an async task) to avoid blocking DB
connections.
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java`:
- Around line 47-72: The current user-facing path in BookQueryServiceImpl
synchronously calls descriptionEnhancer.enhanceFromKakaoPage when
descriptionEnhancer.needsEnhancement(...) is true, causing a blocking crawl;
change this to avoid the blocking HTTP call: in the getBookDetail/query flow, if
needsEnhancement(...) is true, do not call enhanceFromKakaoPage; instead apply
the local fast fallback (e.g., completeIncompleteSentence or trim/append "...")
to return instantly, and enqueue an asynchronous enrichment job (or flag the
book for background processing) that will later invoke
descriptionEnhancer.enhanceFromKakaoPage and persist the updated book via
booksRepository.save(book); keep synchronous crawling only in admin/batch
endpoints or the background worker.
In
`@booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java`:
- Around line 68-78: The catch block in AuthorSearchService currently logs only
e.getMessage(), losing the stack trace; update the catch to pass the exception
object to the logger so the full stack trace is recorded (keep the existing
message and parameters for query and then supply the Exception e as the final
argument to the log.warn call). This change should be made where
bookImportService.searchAndUpsert(...) is invoked and
authorsRepository.searchByName(...) is retried so the log contains both the
context ("작가 '{}' 임포트 실패 - error: {}" or similar) and the full Exception e as
the final parameter for proper stacktrace logging.
🧹 Nitpick comments (2)
booklog/src/main/java/com/example/booklog/domain/search/controller/SearchController.java (1)
63-72: Consider adding Swagger annotations tosearchBooksfor consistency.The
searchAuthorsendpoint received detailed@Operation/@ApiResponses/@Parameterannotations, butsearchBooks(and the other endpoints in this controller) remain undocumented. Since the class-level@Tagis now applied, the inconsistency will be visible in the Swagger UI. This can be addressed in a follow-up.booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java (1)
57-68: DuplicatedupdateBasicInfoboilerplate across three call sites.The pattern of calling
book.updateBasicInfo(book.getTitle(), enhanced, book.getThumbnailUrl(), ...)with only the description changed is repeated here, inBookDataMigrationService.fixBookDescription, and inBookImportService. Consider adding a focused method likebook.updateDescription(String description)on theBooksentity to eliminate this duplication and reduce the risk of future drift.
| // HTML 파싱 및 크롤링 (description 보완용) | ||
| implementation 'org.jsoup:jsoup:1.17.2' |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
jsoup latest version 2026
💡 Result:
The latest jsoup release (as of February 10, 2026) is jsoup 1.22.1, released January 1, 2026. [1], [2]
Maven:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.22.1</version>
</dependency>
``` [1]
Sources: [1] [2]
---
</details>
**Update jsoup to the latest version 1.22.1.**
jsoup 1.17.2 is outdated; version 1.22.1 was released January 1, 2026, and includes updates relevant to HTML parsing security. Since this library parses untrusted external HTML from Kakao pages, upgrading is important for security.
<details>
<summary>🤖 Prompt for AI Agents</summary>
In @booklog/build.gradle around lines 57 - 58, Replace the outdated jsoup
dependency version in build.gradle by updating the implementation declaration
that currently reads 'org.jsoup:jsoup:1.17.2' to use 'org.jsoup:jsoup:1.22.1';
after changing the version, refresh/resolve Gradle dependencies (e.g., Gradle
sync or ./gradlew build) to ensure the new jsoup 1.22.1 is picked up and
tests/compile succeed.
</details>
<!-- fingerprinting:phantom:medusa:phoenix -->
<!-- This is an auto-generated comment by CodeRabbit -->
| @PostMapping("/fix-descriptions") | ||
| public ResponseEntity<Map<String, Object>> fixAllDescriptions() { | ||
| log.info("POST /admin/books/fix-descriptions - 전체 description 보완 요청"); | ||
|
|
||
| long startTime = System.currentTimeMillis(); | ||
| int fixedCount = migrationService.fixAllIncompleteDescriptions(); | ||
| long duration = System.currentTimeMillis() - startTime; | ||
|
|
||
| Map<String, Object> response = Map.of( | ||
| "success", true, | ||
| "fixedCount", fixedCount, | ||
| "durationMs", duration, | ||
| "message", String.format("%d개 책의 description을 보완했습니다. (소요시간: %d초)", | ||
| fixedCount, duration / 1000) | ||
| ); | ||
|
|
||
| log.info("전체 description 보완 완료 - {}개 처리, {}초 소요", fixedCount, duration / 1000); | ||
|
|
||
| return ResponseEntity.ok(response); | ||
| } |
There was a problem hiding this comment.
Long-running synchronous endpoint risks client/proxy timeouts.
fixAllDescriptions blocks until every book is crawled and updated. For a large dataset this could take hours, easily exceeding typical HTTP/proxy timeouts (e.g., 60 s for Nginx, 30 s for ALB). The client will receive a timeout error while the transaction continues running server-side.
Consider returning 202 Accepted immediately with a job ID, and processing the migration asynchronously (e.g., via @Async, CompletableFuture, or a task queue). Provide a status-check endpoint for monitoring progress.
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/controller/BookAdminController.java`
around lines 72 - 91, The current fixAllDescriptions controller method blocks
until migrationService.fixAllIncompleteDescriptions completes, risking
HTTP/proxy timeouts; change it to enqueue the work and return immediately with
202 Accepted and a job id: modify fixAllDescriptions to call an asynchronous
runner (e.g., an `@Async` method or submit to a task queue/executor) that invokes
migrationService.fixAllIncompleteDescriptions in the background, generate and
return a unique jobId in the ResponseEntity, and implement a new status endpoint
(e.g., GET /admin/books/fix-descriptions/{jobId}) that reads job progress/state
from a concurrent map or persisted JobStatus store to report progress and final
result.
| @Transactional | ||
| public int fixAllIncompleteDescriptions() { | ||
| log.info("전체 책 description 일괄 보완 시작"); | ||
|
|
||
| int pageSize = 100; | ||
| int totalFixed = 0; | ||
| int pageNumber = 0; | ||
|
|
||
| while (true) { | ||
| Pageable pageable = PageRequest.of(pageNumber, pageSize); | ||
| Page<Books> booksPage = booksRepository.findAll(pageable); | ||
|
|
||
| if (booksPage.isEmpty()) { | ||
| break; | ||
| } | ||
|
|
||
| for (Books book : booksPage.getContent()) { | ||
| try { | ||
| if (fixBookDescription(book)) { | ||
| totalFixed++; | ||
| } | ||
| } catch (Exception e) { | ||
| log.warn("책 description 보완 실패 - bookId: {}, title: {}, error: {}", | ||
| book.getId(), book.getTitle(), e.getMessage()); | ||
| } | ||
| } | ||
|
|
||
| log.info("페이지 {} 처리 완료 - 현재까지 {}개 보완", pageNumber, totalFixed); | ||
|
|
||
| if (booksPage.isLast()) { | ||
| break; | ||
| } | ||
|
|
||
| pageNumber++; | ||
| } | ||
|
|
||
| log.info("전체 책 description 일괄 보완 완료 - 총 {}개 보완", totalFixed); | ||
| return totalFixed; | ||
| } |
There was a problem hiding this comment.
Long-running transaction wraps the entire batch including synchronous HTTP crawls.
@Transactional on fixAllIncompleteDescriptions opens a single transaction for the full table scan. Each book may trigger a 5-second HTTP crawl via enhanceFromKakaoPage. For thousands of books, this transaction could run for hours, leading to:
- DB connection held for the entire duration (connection pool starvation)
- Massive undo/redo log growth
- Complete rollback on any late failure
Consider processing in smaller transactional chunks (e.g., per-page) by moving the @Transactional to fixBookDescription or to a helper annotated with @Transactional(propagation = REQUIRES_NEW).
♻️ Sketch: per-book transaction with a self-invoked proxy
Remove @Transactional from fixAllIncompleteDescriptions and apply it per book via a separate bean or TransactionTemplate:
- `@Transactional`
public int fixAllIncompleteDescriptions() {
// ... same paging loop ...
for (Books book : booksPage.getContent()) {
try {
- if (fixBookDescription(book)) {
+ if (fixBookDescriptionInNewTx(book)) {
totalFixed++;
}
} catch (Exception e) { ... }
}
}
+
+ `@Transactional`(propagation = Propagation.REQUIRES_NEW)
+ public boolean fixBookDescriptionInNewTx(Books book) {
+ return fixBookDescription(book);
+ }Note: self-invocation won't trigger the proxy — either inject self, use TransactionTemplate, or extract to a separate service.
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java`
around lines 30 - 68, Remove the class-level or method-level `@Transactional` on
fixAllIncompleteDescriptions so the long-running batch and HTTP crawl
(enhanceFromKakaoPage) do not hold a single transaction; instead make each book
update run in its own transaction by moving `@Transactional` (or
`@Transactional`(propagation = REQUIRES_NEW)) to fixBookDescription or to a new
helper service method (e.g.,
BookDataMigrationHelper.updateDescriptionTransactional) and ensure the call goes
through the Spring proxy (inject the helper or self via the bean proxy or use
TransactionTemplate) so each book/page is committed independently.
| @Transactional | ||
| public int fixDescriptionsByKeyword(String keyword) { | ||
| log.info("키워드 '{}' 검색 결과 description 보완 시작", keyword); | ||
|
|
||
| Pageable pageable = PageRequest.of(0, 1000); | ||
| Page<Books> booksPage = booksRepository.searchByTitle(keyword, pageable); | ||
|
|
||
| int totalFixed = 0; | ||
|
|
||
| for (Books book : booksPage.getContent()) { | ||
| try { | ||
| if (fixBookDescription(book)) { | ||
| totalFixed++; | ||
| } | ||
| } catch (Exception e) { | ||
| log.warn("책 description 보완 실패 - bookId: {}, error: {}", | ||
| book.getId(), e.getMessage()); | ||
| } | ||
| } | ||
|
|
||
| log.info("키워드 '{}' description 보완 완료 - {}개 보완", keyword, totalFixed); | ||
| return totalFixed; | ||
| } |
There was a problem hiding this comment.
Hardcoded limit of 1000 silently drops results beyond the first page.
PageRequest.of(0, 1000) only fetches the first 1000 matching books. If there are more matches, they'll be silently skipped with no warning to the caller. Consider either paginating (like fixAllIncompleteDescriptions) or at least logging a warning when the result is not the last page.
🐛 Proposed addition
Page<Books> booksPage = booksRepository.searchByTitle(keyword, pageable);
+ if (!booksPage.isLast()) {
+ log.warn("키워드 '{}' 검색 결과가 1000개를 초과합니다. 일부만 처리됩니다. (전체: {})",
+ keyword, booksPage.getTotalElements());
+ }🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java`
around lines 128 - 150, The method fixDescriptionsByKeyword currently uses a
hardcoded PageRequest.of(0, 1000) which silently drops results beyond the first
page; update fixDescriptionsByKeyword (and its use of
booksRepository.searchByTitle) to iterate pages (use Pageable/Page loop) similar
to fixAllIncompleteDescriptions so all matching Books are processed, or at
minimum detect when booksPage.hasNext() and emit a warning log; ensure you
reference and update the PageRequest creation, the for loop over
booksPage.getContent(), and preserve existing exception handling in
fixBookDescription.
| public String enhanceFromKakaoPage(String kakaoDetailUrl, String currentDescription) { | ||
| if (kakaoDetailUrl == null || kakaoDetailUrl.isBlank()) { | ||
| log.warn("카카오 URL이 없어서 description 보완 불가 - 문장 마무리 시도"); | ||
| return completeIncompleteSentence(currentDescription); | ||
| } | ||
|
|
||
| try { | ||
| log.info("카카오 페이지 크롤링 시작: {}", kakaoDetailUrl); | ||
|
|
||
| Document doc = Jsoup.connect(kakaoDetailUrl) | ||
| .timeout(CRAWL_TIMEOUT_MS) | ||
| .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36") | ||
| .get(); | ||
|
|
||
| // 카카오 책 상세 페이지의 책 소개 영역 추출 | ||
| String enhanced = extractDescriptionFromKakaoPage(doc); | ||
|
|
||
| if (enhanced != null && !enhanced.isBlank() && enhanced.length() > currentDescription.length()) { | ||
| log.info("description 보완 성공 (크롤링): {} -> {} 자", currentDescription.length(), enhanced.length()); | ||
| return enhanced; | ||
| } else { | ||
| log.warn("크롤링 결과가 기존 description보다 길지 않음 - 문장 마무리 시도"); | ||
| return completeIncompleteSentence(currentDescription); | ||
| } | ||
|
|
||
| } catch (Exception e) { | ||
| log.warn("카카오 페이지 크롤링 실패: {}, 문장 마무리 시도", e.getMessage()); | ||
| return completeIncompleteSentence(currentDescription); | ||
| } | ||
| } |
There was a problem hiding this comment.
Potential NullPointerException when currentDescription is null.
At line 113, currentDescription.length() will throw NPE when currentDescription is null. This path is reachable because needsEnhancement(null) returns true, and callers (e.g., BookImportService line 68) pass doc.getContents() directly, which could be null.
🐛 Proposed fix
- if (enhanced != null && !enhanced.isBlank() && enhanced.length() > currentDescription.length()) {
+ if (enhanced != null && !enhanced.isBlank()
+ && (currentDescription == null || enhanced.length() > currentDescription.length())) {🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`
around lines 96 - 125, The code in enhanceFromKakaoPage uses
currentDescription.length() which can NPE when currentDescription is null;
change enhanceFromKakaoPage to null-safe handling: compute int currentLen =
(currentDescription == null ? 0 : currentDescription.length()) and use
currentLen in the length comparison and logs, and ensure calls to
completeIncompleteSentence(currentDescription) either accept null or pass an
empty string (e.g., completeIncompleteSentence(currentDescription == null ? "" :
currentDescription)); reference methods: enhanceFromKakaoPage,
extractDescriptionFromKakaoPage, completeIncompleteSentence (and consider
needsEnhancement callers) so the function never dereferences currentDescription.
| Document doc = Jsoup.connect(kakaoDetailUrl) | ||
| .timeout(CRAWL_TIMEOUT_MS) | ||
| .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36") | ||
| .get(); |
There was a problem hiding this comment.
Validate the URL before connecting to mitigate SSRF risk.
kakaoDetailUrl originates from external data (Kakao API response stored in DB). If a malicious or internal URL is injected, Jsoup.connect() could be used to probe internal services. Consider validating the URL against an allowlist of expected domains (e.g., product.kyobobook.co.kr or Kakao domains) before making the request.
🛡️ Proposed validation sketch
+ private static final Set<String> ALLOWED_HOSTS = Set.of(
+ "product.kyobobook.co.kr",
+ "search.daum.net",
+ "book.daum.net"
+ );
+
+ private boolean isAllowedUrl(String url) {
+ try {
+ java.net.URL parsed = new java.net.URL(url);
+ return ALLOWED_HOSTS.contains(parsed.getHost());
+ } catch (Exception e) {
+ return false;
+ }
+ }
+
public String enhanceFromKakaoPage(String kakaoDetailUrl, String currentDescription) {
if (kakaoDetailUrl == null || kakaoDetailUrl.isBlank()) {
log.warn("카카오 URL이 없어서 description 보완 불가 - 문장 마무리 시도");
return completeIncompleteSentence(currentDescription);
}
+ if (!isAllowedUrl(kakaoDetailUrl)) {
+ log.warn("허용되지 않은 URL: {}", kakaoDetailUrl);
+ return completeIncompleteSentence(currentDescription);
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Document doc = Jsoup.connect(kakaoDetailUrl) | |
| .timeout(CRAWL_TIMEOUT_MS) | |
| .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36") | |
| .get(); | |
| private static final Set<String> ALLOWED_HOSTS = Set.of( | |
| "product.kyobobook.co.kr", | |
| "search.daum.net", | |
| "book.daum.net" | |
| ); | |
| private boolean isAllowedUrl(String url) { | |
| try { | |
| java.net.URL parsed = new java.net.URL(url); | |
| return ALLOWED_HOSTS.contains(parsed.getHost()); | |
| } catch (Exception e) { | |
| return false; | |
| } | |
| } | |
| public String enhanceFromKakaoPage(String kakaoDetailUrl, String currentDescription) { | |
| if (kakaoDetailUrl == null || kakaoDetailUrl.isBlank()) { | |
| log.warn("카카오 URL이 없어서 description 보완 불가 - 문장 마무리 시도"); | |
| return completeIncompleteSentence(currentDescription); | |
| } | |
| if (!isAllowedUrl(kakaoDetailUrl)) { | |
| log.warn("허용되지 않은 URL: {}", kakaoDetailUrl); | |
| return completeIncompleteSentence(currentDescription); | |
| } | |
| Document doc = Jsoup.connect(kakaoDetailUrl) | |
| .timeout(CRAWL_TIMEOUT_MS) | |
| .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36") | |
| .get(); |
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`
around lines 105 - 108, The code calls Jsoup.connect(kakaoDetailUrl) in
BookDescriptionEnhancer using an external kakaoDetailUrl; validate and sanitize
this URL before connecting to mitigate SSRF by parsing the URL (e.g., new
URL/URI) and enforcing a strict allowlist of permitted hosts and schemes (only
http/https), rejecting or skipping any URL whose host is not in the allowlist
(e.g., Kakao and known book vendor domains) or that contains suspicious
characters/redirects; perform this check inside the method that builds/uses
kakaoDetailUrl (referencing kakaoDetailUrl and the Jsoup.connect call) and
throw/log a safe error or skip processing when validation fails.
| private int findLastKoreanSentenceEnding(String text) { | ||
| String[] endings = { | ||
| "습니다", "입니다", "였습니다", "있습니다", "했습니다", | ||
| "한다", "된다", "있다", "였다", "없다", | ||
| "요", "네요", "어요", "죠" | ||
| }; | ||
|
|
||
| int maxPosition = -1; | ||
|
|
||
| for (String ending : endings) { | ||
| int pos = text.lastIndexOf(ending); | ||
| if (pos > maxPosition) { | ||
| maxPosition = pos + ending.length() - 1; | ||
| } | ||
| } | ||
|
|
||
| return maxPosition; | ||
| } |
There was a problem hiding this comment.
Bug in findLastKoreanSentenceEnding: comparison mixes start and end positions.
maxPosition stores the end index of a previously found ending (pos + ending.length() - 1), but the next iteration compares the start index (pos) of a new match against this end index. This causes incorrect results when a longer ending is a substring of another (e.g., "입니다" overlaps with "습니다" in "있습니다"), potentially skipping endings that actually appear later in the text.
🐛 Proposed fix: consistently compare end positions
private int findLastKoreanSentenceEnding(String text) {
String[] endings = {
"습니다", "입니다", "였습니다", "있습니다", "했습니다",
"한다", "된다", "있다", "였다", "없다",
"요", "네요", "어요", "죠"
};
int maxPosition = -1;
for (String ending : endings) {
int pos = text.lastIndexOf(ending);
- if (pos > maxPosition) {
- maxPosition = pos + ending.length() - 1;
+ if (pos != -1) {
+ int endPos = pos + ending.length() - 1;
+ if (endPos > maxPosition) {
+ maxPosition = endPos;
+ }
}
}
return maxPosition;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private int findLastKoreanSentenceEnding(String text) { | |
| String[] endings = { | |
| "습니다", "입니다", "였습니다", "있습니다", "했습니다", | |
| "한다", "된다", "있다", "였다", "없다", | |
| "요", "네요", "어요", "죠" | |
| }; | |
| int maxPosition = -1; | |
| for (String ending : endings) { | |
| int pos = text.lastIndexOf(ending); | |
| if (pos > maxPosition) { | |
| maxPosition = pos + ending.length() - 1; | |
| } | |
| } | |
| return maxPosition; | |
| } | |
| private int findLastKoreanSentenceEnding(String text) { | |
| String[] endings = { | |
| "습니다", "입니다", "였습니다", "있습니다", "했습니다", | |
| "한다", "된다", "있다", "였다", "없다", | |
| "요", "네요", "어요", "죠" | |
| }; | |
| int maxPosition = -1; | |
| for (String ending : endings) { | |
| int pos = text.lastIndexOf(ending); | |
| if (pos != -1) { | |
| int endPos = pos + ending.length() - 1; | |
| if (endPos > maxPosition) { | |
| maxPosition = endPos; | |
| } | |
| } | |
| } | |
| return maxPosition; | |
| } |
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`
around lines 191 - 208, In findLastKoreanSentenceEnding the code mixes start
index (pos) and end index (maxPosition) causing wrong comparisons; change the
loop to compute an end index for each match (e.g., endPos = pos +
ending.length() - 1) and compare endPos against a consistently-used maxEnd
(replace/maxPosition) and update maxEnd = endPos when larger so you always
compare end positions consistently within findLastKoreanSentenceEnding.
| // 2) description 보완 (카카오 API contents가 짤려있으면 크롤링) | ||
| String description = doc.getContents(); | ||
| if (descriptionEnhancer.needsEnhancement(description)) { | ||
| description = descriptionEnhancer.enhanceFromKakaoPage(doc.getUrl(), description); | ||
| } |
There was a problem hiding this comment.
Blocking HTTP crawl inside REQUIRES_NEW transaction.
The enhanceFromKakaoPage call performs a synchronous 5-second-timeout HTTP request while the REQUIRES_NEW transaction (line 38) is open. This holds a DB connection from the pool during the crawl for every book in the search results. Under load, this can exhaust the connection pool and stall unrelated queries.
If the performance concern flagged in BookDescriptionEnhancer is addressed (e.g., async enrichment), this goes away. Otherwise, at minimum consider moving description enhancement outside the transactional boundary.
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookImportService.java`
around lines 66 - 70, The synchronous HTTP crawl in
descriptionEnhancer.enhanceFromKakaoPage(doc.getUrl(), description) is being
executed while the REQUIRES_NEW transaction in BookImportService is open,
holding DB connections; move the enhancement outside the transactional boundary
by checking descriptionEnhancer.needsEnhancement(...) and performing
enhanceFromKakaoPage(...) after the transactional method completes (or make the
enhancer asynchronous/queued so the HTTP call does not run inside the
REQUIRES_NEW transaction), updating callers in BookImportService to call the
enhancer post-transaction (or submit an async task) to avoid blocking DB
connections.
| // ✅ 1. description이 불완전하면 보완 (기존 책 대응) | ||
| if (descriptionEnhancer.needsEnhancement(book.getDescription())) { | ||
| log.info("기존 책의 description 보완 시작 - bookId: {}, title: {}", bookId, book.getTitle()); | ||
| String enhanced = descriptionEnhancer.enhanceFromKakaoPage( | ||
| book.getDetailUrl(), | ||
| book.getDescription() | ||
| ); | ||
|
|
||
| // description 업데이트 | ||
| if (enhanced != null && !enhanced.equals(book.getDescription())) { | ||
| book.updateBasicInfo( | ||
| book.getTitle(), | ||
| enhanced, // 보완된 description | ||
| book.getThumbnailUrl(), | ||
| book.getDetailUrl(), | ||
| book.getPublisherName(), | ||
| book.getPublishedDate(), | ||
| book.getIsbn(), | ||
| book.getIsbn10(), | ||
| book.getIsbn13(), | ||
| book.getRawData() | ||
| ); | ||
| booksRepository.save(book); | ||
| log.info("기존 책의 description 보완 완료 - bookId: {}", bookId); | ||
| } | ||
| } |
There was a problem hiding this comment.
Synchronous crawl on user-facing read path will spike p99 latency.
getBookDetail is a user-facing endpoint. Inserting a blocking 5-second HTTP crawl here means any book with a short description (< 200 chars) or one ending mid-sentence will see a ~5 s delay on every detail request until the description is patched. This is a significant UX regression.
As noted in the BookDescriptionEnhancer review, consider async enrichment or limiting crawl-based enrichment to the admin batch endpoint. In the query path, you could skip the crawl and only apply completeIncompleteSentence (the local trim/append-"..." fallback) for instant response.
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java`
around lines 47 - 72, The current user-facing path in BookQueryServiceImpl
synchronously calls descriptionEnhancer.enhanceFromKakaoPage when
descriptionEnhancer.needsEnhancement(...) is true, causing a blocking crawl;
change this to avoid the blocking HTTP call: in the getBookDetail/query flow, if
needsEnhancement(...) is true, do not call enhanceFromKakaoPage; instead apply
the local fast fallback (e.g., completeIncompleteSentence or trim/append "...")
to return instantly, and enqueue an asynchronous enrichment job (or flag the
book for background processing) that will later invoke
descriptionEnhancer.enhanceFromKakaoPage and persist the updated book via
booksRepository.save(book); keep synchronous crawling only in admin/batch
endpoints or the background worker.
| try { | ||
| log.info("작가 '{}' 도서 데이터 임포트 시작", query); | ||
| bookImportService.searchAndUpsert(query, 1, 10); | ||
|
|
||
| // 임포트 후 재조회 | ||
| authorsPage = authorsRepository.searchByName(query, pageable); | ||
| } catch (Exception e) { | ||
| log.warn("작가 '{}' 임포트 실패 - error: {}", query, e.getMessage()); | ||
| // 임포트 실패 시 빈 결과 반환 (500 에러 방지) | ||
| return AuthorSearchResponse.of(List.of(), page, size, 0L); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find BookImportService class
fd -t f "BookImportService.java" --exec cat -n {} \;Repository: Project-BookLog/BookLog-BackEnd
Length of output: 9738
🏁 Script executed:
# Search for searchAndUpsert method definition
rg -A 20 "public.*searchAndUpsert" --type javaRepository: Project-BookLog/BookLog-BackEnd
Length of output: 2840
🏁 Script executed:
# Look for custom exception classes related to book import or search
fd -t f ".*Exception.java" | head -20Repository: Project-BookLog/BookLog-BackEnd
Length of output: 463
🏁 Script executed:
# Check the context around the try-catch in AuthorSearchService
cat -n booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java | sed -n '60,90p'Repository: Project-BookLog/BookLog-BackEnd
Length of output: 1567
Include full exception in logging instead of only message.
Logging only e.getMessage() on line 75 loses the stack trace, making it harder to diagnose the root cause. Include the exception object as the last logger argument:
- log.warn("작가 '{}' 임포트 실패 - error: {}", query, e.getMessage());
+ log.warn("작가 '{}' 임포트 실패", query, e);The broad Exception catch is reasonable here since searchAndUpsert can throw various unrelated exceptions (Kakao API errors, database failures, JSON processing errors, etc.) and the endpoint intentionally provides graceful degradation by returning empty results on import failure rather than propagating a 500 error.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try { | |
| log.info("작가 '{}' 도서 데이터 임포트 시작", query); | |
| bookImportService.searchAndUpsert(query, 1, 10); | |
| // 임포트 후 재조회 | |
| authorsPage = authorsRepository.searchByName(query, pageable); | |
| } catch (Exception e) { | |
| log.warn("작가 '{}' 임포트 실패 - error: {}", query, e.getMessage()); | |
| // 임포트 실패 시 빈 결과 반환 (500 에러 방지) | |
| return AuthorSearchResponse.of(List.of(), page, size, 0L); | |
| } | |
| try { | |
| log.info("작가 '{}' 도서 데이터 임포트 시작", query); | |
| bookImportService.searchAndUpsert(query, 1, 10); | |
| // 임포트 후 재조회 | |
| authorsPage = authorsRepository.searchByName(query, pageable); | |
| } catch (Exception e) { | |
| log.warn("작가 '{}' 임포트 실패", query, e); | |
| // 임포트 실패 시 빈 결과 반환 (500 에러 방지) | |
| return AuthorSearchResponse.of(List.of(), page, size, 0L); | |
| } |
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java`
around lines 68 - 78, The catch block in AuthorSearchService currently logs only
e.getMessage(), losing the stack trace; update the catch to pass the exception
object to the logger so the full stack trace is recorded (keep the existing
message and parameters for query and then supply the Exception e as the final
argument to the log.warn call). This change should be made where
bookImportService.searchAndUpsert(...) is invoked and
authorsRepository.searchByName(...) is retried so the log contains both the
context ("작가 '{}' 임포트 실패 - error: {}" or similar) and the full Exception e as
the final parameter for proper stacktrace logging.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation