Skip to content

Feat/156/book#157

Merged
millkk04 merged 2 commits into
devfrom
feat/156/book
Feb 10, 2026
Merged

Feat/156/book#157
millkk04 merged 2 commits into
devfrom
feat/156/book

Conversation

@millkk04
Copy link
Copy Markdown
Collaborator

@millkk04 millkk04 commented Feb 10, 2026

  1. 책 description 끊기는 현상 해결
  2. 작가 검색 500 에러 처리 보완
  3. 온보딩 스웨거 오류 해결

Summary by CodeRabbit

Release Notes

  • New Features

    • Added admin controls to enhance and repair book descriptions in bulk or by keyword search.
    • Implemented automatic book description enrichment to display more complete content.
  • Bug Fixes

    • Improved error handling in author search to prevent failures when data is unavailable.
  • Documentation

    • Enhanced API documentation for endpoints with improved parameter descriptions and response examples.

@millkk04 millkk04 self-assigned this Feb 10, 2026
@millkk04 millkk04 added the enhancement New feature or request label Feb 10, 2026
@millkk04 millkk04 linked an issue Feb 10, 2026 that may be closed by this pull request
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

This PR introduces automated book description enrichment by adding jsoup dependency for HTML parsing, creating BookDescriptionEnhancer and BookDataMigrationService for batch description fixes, exposing admin endpoints via BookAdminController, integrating enhancement into book import/query workflows, and adding Swagger documentation across multiple controllers.

Changes

Cohort / File(s) Summary
Dependency Management
booklog/build.gradle
Added jsoup 1.17.2 for HTML parsing and description enrichment.
Book Description Enhancement
booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java, booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java
New services: BookDescriptionEnhancer crawls Kakao book pages to enrich truncated descriptions with sentence completion logic; BookDataMigrationService batch-processes all or filtered books with paginated enhancement and transactional safety.
Admin Endpoints
booklog/src/main/java/com/example/booklog/domain/library/books/controller/BookAdminController.java
New REST controller at /admin/books with POST endpoints to trigger bulk or keyword-targeted description fixes, returning fix counts and durations.
Service Integration
booklog/src/main/java/com/example/booklog/domain/library/books/service/BookImportService.java, booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java
BookImportService now enhances descriptions during import; BookQueryServiceImpl adds pre-processing enhancement step before tag mapping, with improved error handling that mentions enhancement operations.
Error Handling & Migration
booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java
Wrapped author import in try-catch to gracefully return empty results on failure instead of propagating exceptions.
API Documentation
booklog/src/main/java/com/example/booklog/domain/home/controller/HomeController.java, booklog/src/main/java/com/example/booklog/domain/search/controller/SearchController.java, booklog/src/main/java/com/example/booklog/domain/onboarding/controller/OnboardingController.java, booklog/src/main/java/com/example/booklog/domain/library/books/dto/BookDetailResponse.java
Added @Tag and @Operation Swagger annotations to HomeController and SearchController; updated OnboardingController's @SecurityRequirement value; updated BookDetailResponse description schema to clarify full content with no length limits.

Sequence Diagram(s)

sequenceDiagram
    actor Admin
    participant BookAdminController
    participant BookDataMigrationService
    participant BooksRepository
    participant BookDescriptionEnhancer
    participant Kakao

    Admin->>BookAdminController: POST /admin/books/fix-descriptions
    BookAdminController->>BookDataMigrationService: fixAllIncompleteDescriptions()
    loop For each page of books
        BookDataMigrationService->>BooksRepository: findAll(paginated)
        loop For each book
            BookDataMigrationService->>BookDescriptionEnhancer: needsEnhancement(desc)
            alt Enhancement needed
                BookDescriptionEnhancer->>Kakao: fetch book detail page
                Kakao-->>BookDescriptionEnhancer: HTML content
                BookDescriptionEnhancer->>BookDescriptionEnhancer: extract & complete description
                BookDescriptionEnhancer-->>BookDataMigrationService: enhanced description
                BookDataMigrationService->>BooksRepository: save book with updated description
            else No enhancement needed
                BookDataMigrationService-->>BookDataMigrationService: skip book
            end
        end
    end
    BookDataMigrationService-->>BookAdminController: total fixed count
    BookAdminController-->>Admin: {success, fixedCount, durationMs, message}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • FEAT: 책 정보 조회 API 구현 (목차 제외) #141: Modifies BookDetailResponse schema documentation and BookQueryServiceImpl service logic (same classes updated in this PR for description enhancement integration).
  • Fix/94/search #95: Modifies HomeController class and build.gradle dependencies (overlapping files with Swagger annotation additions and jsoup dependency).

Suggested reviewers

  • jaehyeon4406
  • icarus0616

Poem

🐰 Hop through Kakao's pages, descriptions grow wide,
HTML strings untangle with jsoup's guide,
Admin commands batch-fix with grace,
Books bloom fuller in their destined place! 📚✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Feat/156/book' is vague and does not clearly convey the main changes; it lacks specific information about the feature being implemented. Use a more descriptive title that summarizes the main change, such as 'Fix truncated book descriptions and improve error handling' or reference the actual feature from the PR description.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/156/book

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🤖 Fix all issues with AI agents
In `@booklog/build.gradle`:
- Around line 57-58: Replace the outdated jsoup dependency version in
build.gradle by updating the implementation declaration that currently reads
'org.jsoup:jsoup:1.17.2' to use 'org.jsoup:jsoup:1.22.1'; after changing the
version, refresh/resolve Gradle dependencies (e.g., Gradle sync or ./gradlew
build) to ensure the new jsoup 1.22.1 is picked up and tests/compile succeed.

In
`@booklog/src/main/java/com/example/booklog/domain/library/books/controller/BookAdminController.java`:
- Around line 72-91: The current fixAllDescriptions controller method blocks
until migrationService.fixAllIncompleteDescriptions completes, risking
HTTP/proxy timeouts; change it to enqueue the work and return immediately with
202 Accepted and a job id: modify fixAllDescriptions to call an asynchronous
runner (e.g., an `@Async` method or submit to a task queue/executor) that invokes
migrationService.fixAllIncompleteDescriptions in the background, generate and
return a unique jobId in the ResponseEntity, and implement a new status endpoint
(e.g., GET /admin/books/fix-descriptions/{jobId}) that reads job progress/state
from a concurrent map or persisted JobStatus store to report progress and final
result.

In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java`:
- Around line 128-150: The method fixDescriptionsByKeyword currently uses a
hardcoded PageRequest.of(0, 1000) which silently drops results beyond the first
page; update fixDescriptionsByKeyword (and its use of
booksRepository.searchByTitle) to iterate pages (use Pageable/Page loop) similar
to fixAllIncompleteDescriptions so all matching Books are processed, or at
minimum detect when booksPage.hasNext() and emit a warning log; ensure you
reference and update the PageRequest creation, the for loop over
booksPage.getContent(), and preserve existing exception handling in
fixBookDescription.
- Around line 30-68: Remove the class-level or method-level `@Transactional` on
fixAllIncompleteDescriptions so the long-running batch and HTTP crawl
(enhanceFromKakaoPage) do not hold a single transaction; instead make each book
update run in its own transaction by moving `@Transactional` (or
`@Transactional`(propagation = REQUIRES_NEW)) to fixBookDescription or to a new
helper service method (e.g.,
BookDataMigrationHelper.updateDescriptionTransactional) and ensure the call goes
through the Spring proxy (inject the helper or self via the bean proxy or use
TransactionTemplate) so each book/page is committed independently.

In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`:
- Around line 102-108: BookDescriptionEnhancer currently performs synchronous
blocking HTTP crawls via Jsoup.connect(...).get() during user-facing flows
(e.g., BookQueryServiceImpl.getBookDetail and
BookImportService.searchAndUpsert), which can add large latency; change this by
moving the crawl/description enhancement off the request thread: make
BookDescriptionEnhancer expose an asynchronous enrichment API (use Spring `@Async`
or dispatch a background job/message to a worker queue) and have getBookDetail
and searchAndUpsert trigger the async enrichment (return cached/partial data
immediately), while keeping synchronous crawling only in admin batch endpoints
such as fixAllIncompleteDescriptions; ensure methods like
BookDescriptionEnhancer.enhanceDescription(...) and caller sites handle async
completion/errors and do not block on Jsoup.connect(...).get().
- Around line 191-208: In findLastKoreanSentenceEnding the code mixes start
index (pos) and end index (maxPosition) causing wrong comparisons; change the
loop to compute an end index for each match (e.g., endPos = pos +
ending.length() - 1) and compare endPos against a consistently-used maxEnd
(replace/maxPosition) and update maxEnd = endPos when larger so you always
compare end positions consistently within findLastKoreanSentenceEnding.
- Around line 96-125: The code in enhanceFromKakaoPage uses
currentDescription.length() which can NPE when currentDescription is null;
change enhanceFromKakaoPage to null-safe handling: compute int currentLen =
(currentDescription == null ? 0 : currentDescription.length()) and use
currentLen in the length comparison and logs, and ensure calls to
completeIncompleteSentence(currentDescription) either accept null or pass an
empty string (e.g., completeIncompleteSentence(currentDescription == null ? "" :
currentDescription)); reference methods: enhanceFromKakaoPage,
extractDescriptionFromKakaoPage, completeIncompleteSentence (and consider
needsEnhancement callers) so the function never dereferences currentDescription.
- Around line 105-108: The code calls Jsoup.connect(kakaoDetailUrl) in
BookDescriptionEnhancer using an external kakaoDetailUrl; validate and sanitize
this URL before connecting to mitigate SSRF by parsing the URL (e.g., new
URL/URI) and enforcing a strict allowlist of permitted hosts and schemes (only
http/https), rejecting or skipping any URL whose host is not in the allowlist
(e.g., Kakao and known book vendor domains) or that contains suspicious
characters/redirects; perform this check inside the method that builds/uses
kakaoDetailUrl (referencing kakaoDetailUrl and the Jsoup.connect call) and
throw/log a safe error or skip processing when validation fails.

In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookImportService.java`:
- Around line 66-70: The synchronous HTTP crawl in
descriptionEnhancer.enhanceFromKakaoPage(doc.getUrl(), description) is being
executed while the REQUIRES_NEW transaction in BookImportService is open,
holding DB connections; move the enhancement outside the transactional boundary
by checking descriptionEnhancer.needsEnhancement(...) and performing
enhanceFromKakaoPage(...) after the transactional method completes (or make the
enhancer asynchronous/queued so the HTTP call does not run inside the
REQUIRES_NEW transaction), updating callers in BookImportService to call the
enhancer post-transaction (or submit an async task) to avoid blocking DB
connections.

In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java`:
- Around line 47-72: The current user-facing path in BookQueryServiceImpl
synchronously calls descriptionEnhancer.enhanceFromKakaoPage when
descriptionEnhancer.needsEnhancement(...) is true, causing a blocking crawl;
change this to avoid the blocking HTTP call: in the getBookDetail/query flow, if
needsEnhancement(...) is true, do not call enhanceFromKakaoPage; instead apply
the local fast fallback (e.g., completeIncompleteSentence or trim/append "...")
to return instantly, and enqueue an asynchronous enrichment job (or flag the
book for background processing) that will later invoke
descriptionEnhancer.enhanceFromKakaoPage and persist the updated book via
booksRepository.save(book); keep synchronous crawling only in admin/batch
endpoints or the background worker.

In
`@booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java`:
- Around line 68-78: The catch block in AuthorSearchService currently logs only
e.getMessage(), losing the stack trace; update the catch to pass the exception
object to the logger so the full stack trace is recorded (keep the existing
message and parameters for query and then supply the Exception e as the final
argument to the log.warn call). This change should be made where
bookImportService.searchAndUpsert(...) is invoked and
authorsRepository.searchByName(...) is retried so the log contains both the
context ("작가 '{}' 임포트 실패 - error: {}" or similar) and the full Exception e as
the final parameter for proper stacktrace logging.
🧹 Nitpick comments (2)
booklog/src/main/java/com/example/booklog/domain/search/controller/SearchController.java (1)

63-72: Consider adding Swagger annotations to searchBooks for consistency.

The searchAuthors endpoint received detailed @Operation/@ApiResponses/@Parameter annotations, but searchBooks (and the other endpoints in this controller) remain undocumented. Since the class-level @Tag is now applied, the inconsistency will be visible in the Swagger UI. This can be addressed in a follow-up.

booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java (1)

57-68: Duplicated updateBasicInfo boilerplate across three call sites.

The pattern of calling book.updateBasicInfo(book.getTitle(), enhanced, book.getThumbnailUrl(), ...) with only the description changed is repeated here, in BookDataMigrationService.fixBookDescription, and in BookImportService. Consider adding a focused method like book.updateDescription(String description) on the Books entity to eliminate this duplication and reduce the risk of future drift.

Comment thread booklog/build.gradle
Comment on lines +57 to +58
// HTML 파싱 및 크롤링 (description 보완용)
implementation 'org.jsoup:jsoup:1.17.2'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

jsoup latest version 2026

💡 Result:

The latest jsoup release (as of February 10, 2026) is jsoup 1.22.1, released January 1, 2026. [1], [2]

Maven:

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.22.1</version>
</dependency>
``` [1]

Sources: [1] [2]

---



</details>

**Update jsoup to the latest version 1.22.1.**

jsoup 1.17.2 is outdated; version 1.22.1 was released January 1, 2026, and includes updates relevant to HTML parsing security. Since this library parses untrusted external HTML from Kakao pages, upgrading is important for security.

<details>
<summary>🤖 Prompt for AI Agents</summary>

In @booklog/build.gradle around lines 57 - 58, Replace the outdated jsoup
dependency version in build.gradle by updating the implementation declaration
that currently reads 'org.jsoup:jsoup:1.17.2' to use 'org.jsoup:jsoup:1.22.1';
after changing the version, refresh/resolve Gradle dependencies (e.g., Gradle
sync or ./gradlew build) to ensure the new jsoup 1.22.1 is picked up and
tests/compile succeed.


</details>

<!-- fingerprinting:phantom:medusa:phoenix -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +72 to +91
@PostMapping("/fix-descriptions")
public ResponseEntity<Map<String, Object>> fixAllDescriptions() {
log.info("POST /admin/books/fix-descriptions - 전체 description 보완 요청");

long startTime = System.currentTimeMillis();
int fixedCount = migrationService.fixAllIncompleteDescriptions();
long duration = System.currentTimeMillis() - startTime;

Map<String, Object> response = Map.of(
"success", true,
"fixedCount", fixedCount,
"durationMs", duration,
"message", String.format("%d개 책의 description을 보완했습니다. (소요시간: %d초)",
fixedCount, duration / 1000)
);

log.info("전체 description 보완 완료 - {}개 처리, {}초 소요", fixedCount, duration / 1000);

return ResponseEntity.ok(response);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Long-running synchronous endpoint risks client/proxy timeouts.

fixAllDescriptions blocks until every book is crawled and updated. For a large dataset this could take hours, easily exceeding typical HTTP/proxy timeouts (e.g., 60 s for Nginx, 30 s for ALB). The client will receive a timeout error while the transaction continues running server-side.

Consider returning 202 Accepted immediately with a job ID, and processing the migration asynchronously (e.g., via @Async, CompletableFuture, or a task queue). Provide a status-check endpoint for monitoring progress.

🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/controller/BookAdminController.java`
around lines 72 - 91, The current fixAllDescriptions controller method blocks
until migrationService.fixAllIncompleteDescriptions completes, risking
HTTP/proxy timeouts; change it to enqueue the work and return immediately with
202 Accepted and a job id: modify fixAllDescriptions to call an asynchronous
runner (e.g., an `@Async` method or submit to a task queue/executor) that invokes
migrationService.fixAllIncompleteDescriptions in the background, generate and
return a unique jobId in the ResponseEntity, and implement a new status endpoint
(e.g., GET /admin/books/fix-descriptions/{jobId}) that reads job progress/state
from a concurrent map or persisted JobStatus store to report progress and final
result.

Comment on lines +30 to +68
@Transactional
public int fixAllIncompleteDescriptions() {
log.info("전체 책 description 일괄 보완 시작");

int pageSize = 100;
int totalFixed = 0;
int pageNumber = 0;

while (true) {
Pageable pageable = PageRequest.of(pageNumber, pageSize);
Page<Books> booksPage = booksRepository.findAll(pageable);

if (booksPage.isEmpty()) {
break;
}

for (Books book : booksPage.getContent()) {
try {
if (fixBookDescription(book)) {
totalFixed++;
}
} catch (Exception e) {
log.warn("책 description 보완 실패 - bookId: {}, title: {}, error: {}",
book.getId(), book.getTitle(), e.getMessage());
}
}

log.info("페이지 {} 처리 완료 - 현재까지 {}개 보완", pageNumber, totalFixed);

if (booksPage.isLast()) {
break;
}

pageNumber++;
}

log.info("전체 책 description 일괄 보완 완료 - 총 {}개 보완", totalFixed);
return totalFixed;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Long-running transaction wraps the entire batch including synchronous HTTP crawls.

@Transactional on fixAllIncompleteDescriptions opens a single transaction for the full table scan. Each book may trigger a 5-second HTTP crawl via enhanceFromKakaoPage. For thousands of books, this transaction could run for hours, leading to:

  • DB connection held for the entire duration (connection pool starvation)
  • Massive undo/redo log growth
  • Complete rollback on any late failure

Consider processing in smaller transactional chunks (e.g., per-page) by moving the @Transactional to fixBookDescription or to a helper annotated with @Transactional(propagation = REQUIRES_NEW).

♻️ Sketch: per-book transaction with a self-invoked proxy

Remove @Transactional from fixAllIncompleteDescriptions and apply it per book via a separate bean or TransactionTemplate:

-    `@Transactional`
     public int fixAllIncompleteDescriptions() {
         // ... same paging loop ...
         for (Books book : booksPage.getContent()) {
             try {
-                if (fixBookDescription(book)) {
+                if (fixBookDescriptionInNewTx(book)) {
                     totalFixed++;
                 }
             } catch (Exception e) { ... }
         }
     }
+
+    `@Transactional`(propagation = Propagation.REQUIRES_NEW)
+    public boolean fixBookDescriptionInNewTx(Books book) {
+        return fixBookDescription(book);
+    }

Note: self-invocation won't trigger the proxy — either inject self, use TransactionTemplate, or extract to a separate service.

🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java`
around lines 30 - 68, Remove the class-level or method-level `@Transactional` on
fixAllIncompleteDescriptions so the long-running batch and HTTP crawl
(enhanceFromKakaoPage) do not hold a single transaction; instead make each book
update run in its own transaction by moving `@Transactional` (or
`@Transactional`(propagation = REQUIRES_NEW)) to fixBookDescription or to a new
helper service method (e.g.,
BookDataMigrationHelper.updateDescriptionTransactional) and ensure the call goes
through the Spring proxy (inject the helper or self via the bean proxy or use
TransactionTemplate) so each book/page is committed independently.

Comment on lines +128 to +150
@Transactional
public int fixDescriptionsByKeyword(String keyword) {
log.info("키워드 '{}' 검색 결과 description 보완 시작", keyword);

Pageable pageable = PageRequest.of(0, 1000);
Page<Books> booksPage = booksRepository.searchByTitle(keyword, pageable);

int totalFixed = 0;

for (Books book : booksPage.getContent()) {
try {
if (fixBookDescription(book)) {
totalFixed++;
}
} catch (Exception e) {
log.warn("책 description 보완 실패 - bookId: {}, error: {}",
book.getId(), e.getMessage());
}
}

log.info("키워드 '{}' description 보완 완료 - {}개 보완", keyword, totalFixed);
return totalFixed;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Hardcoded limit of 1000 silently drops results beyond the first page.

PageRequest.of(0, 1000) only fetches the first 1000 matching books. If there are more matches, they'll be silently skipped with no warning to the caller. Consider either paginating (like fixAllIncompleteDescriptions) or at least logging a warning when the result is not the last page.

🐛 Proposed addition
         Page<Books> booksPage = booksRepository.searchByTitle(keyword, pageable);
+        if (!booksPage.isLast()) {
+            log.warn("키워드 '{}' 검색 결과가 1000개를 초과합니다. 일부만 처리됩니다. (전체: {})",
+                keyword, booksPage.getTotalElements());
+        }
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDataMigrationService.java`
around lines 128 - 150, The method fixDescriptionsByKeyword currently uses a
hardcoded PageRequest.of(0, 1000) which silently drops results beyond the first
page; update fixDescriptionsByKeyword (and its use of
booksRepository.searchByTitle) to iterate pages (use Pageable/Page loop) similar
to fixAllIncompleteDescriptions so all matching Books are processed, or at
minimum detect when booksPage.hasNext() and emit a warning log; ensure you
reference and update the PageRequest creation, the for loop over
booksPage.getContent(), and preserve existing exception handling in
fixBookDescription.

Comment on lines +96 to +125
public String enhanceFromKakaoPage(String kakaoDetailUrl, String currentDescription) {
if (kakaoDetailUrl == null || kakaoDetailUrl.isBlank()) {
log.warn("카카오 URL이 없어서 description 보완 불가 - 문장 마무리 시도");
return completeIncompleteSentence(currentDescription);
}

try {
log.info("카카오 페이지 크롤링 시작: {}", kakaoDetailUrl);

Document doc = Jsoup.connect(kakaoDetailUrl)
.timeout(CRAWL_TIMEOUT_MS)
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
.get();

// 카카오 책 상세 페이지의 책 소개 영역 추출
String enhanced = extractDescriptionFromKakaoPage(doc);

if (enhanced != null && !enhanced.isBlank() && enhanced.length() > currentDescription.length()) {
log.info("description 보완 성공 (크롤링): {} -> {} 자", currentDescription.length(), enhanced.length());
return enhanced;
} else {
log.warn("크롤링 결과가 기존 description보다 길지 않음 - 문장 마무리 시도");
return completeIncompleteSentence(currentDescription);
}

} catch (Exception e) {
log.warn("카카오 페이지 크롤링 실패: {}, 문장 마무리 시도", e.getMessage());
return completeIncompleteSentence(currentDescription);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Potential NullPointerException when currentDescription is null.

At line 113, currentDescription.length() will throw NPE when currentDescription is null. This path is reachable because needsEnhancement(null) returns true, and callers (e.g., BookImportService line 68) pass doc.getContents() directly, which could be null.

🐛 Proposed fix
-            if (enhanced != null && !enhanced.isBlank() && enhanced.length() > currentDescription.length()) {
+            if (enhanced != null && !enhanced.isBlank()
+                && (currentDescription == null || enhanced.length() > currentDescription.length())) {
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`
around lines 96 - 125, The code in enhanceFromKakaoPage uses
currentDescription.length() which can NPE when currentDescription is null;
change enhanceFromKakaoPage to null-safe handling: compute int currentLen =
(currentDescription == null ? 0 : currentDescription.length()) and use
currentLen in the length comparison and logs, and ensure calls to
completeIncompleteSentence(currentDescription) either accept null or pass an
empty string (e.g., completeIncompleteSentence(currentDescription == null ? "" :
currentDescription)); reference methods: enhanceFromKakaoPage,
extractDescriptionFromKakaoPage, completeIncompleteSentence (and consider
needsEnhancement callers) so the function never dereferences currentDescription.

Comment on lines +105 to +108
Document doc = Jsoup.connect(kakaoDetailUrl)
.timeout(CRAWL_TIMEOUT_MS)
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
.get();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate the URL before connecting to mitigate SSRF risk.

kakaoDetailUrl originates from external data (Kakao API response stored in DB). If a malicious or internal URL is injected, Jsoup.connect() could be used to probe internal services. Consider validating the URL against an allowlist of expected domains (e.g., product.kyobobook.co.kr or Kakao domains) before making the request.

🛡️ Proposed validation sketch
+    private static final Set<String> ALLOWED_HOSTS = Set.of(
+        "product.kyobobook.co.kr",
+        "search.daum.net",
+        "book.daum.net"
+    );
+
+    private boolean isAllowedUrl(String url) {
+        try {
+            java.net.URL parsed = new java.net.URL(url);
+            return ALLOWED_HOSTS.contains(parsed.getHost());
+        } catch (Exception e) {
+            return false;
+        }
+    }
+
     public String enhanceFromKakaoPage(String kakaoDetailUrl, String currentDescription) {
         if (kakaoDetailUrl == null || kakaoDetailUrl.isBlank()) {
             log.warn("카카오 URL이 없어서 description 보완 불가 - 문장 마무리 시도");
             return completeIncompleteSentence(currentDescription);
         }
+        if (!isAllowedUrl(kakaoDetailUrl)) {
+            log.warn("허용되지 않은 URL: {}", kakaoDetailUrl);
+            return completeIncompleteSentence(currentDescription);
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Document doc = Jsoup.connect(kakaoDetailUrl)
.timeout(CRAWL_TIMEOUT_MS)
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
.get();
private static final Set<String> ALLOWED_HOSTS = Set.of(
"product.kyobobook.co.kr",
"search.daum.net",
"book.daum.net"
);
private boolean isAllowedUrl(String url) {
try {
java.net.URL parsed = new java.net.URL(url);
return ALLOWED_HOSTS.contains(parsed.getHost());
} catch (Exception e) {
return false;
}
}
public String enhanceFromKakaoPage(String kakaoDetailUrl, String currentDescription) {
if (kakaoDetailUrl == null || kakaoDetailUrl.isBlank()) {
log.warn("카카오 URL이 없어서 description 보완 불가 - 문장 마무리 시도");
return completeIncompleteSentence(currentDescription);
}
if (!isAllowedUrl(kakaoDetailUrl)) {
log.warn("허용되지 않은 URL: {}", kakaoDetailUrl);
return completeIncompleteSentence(currentDescription);
}
Document doc = Jsoup.connect(kakaoDetailUrl)
.timeout(CRAWL_TIMEOUT_MS)
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
.get();
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`
around lines 105 - 108, The code calls Jsoup.connect(kakaoDetailUrl) in
BookDescriptionEnhancer using an external kakaoDetailUrl; validate and sanitize
this URL before connecting to mitigate SSRF by parsing the URL (e.g., new
URL/URI) and enforcing a strict allowlist of permitted hosts and schemes (only
http/https), rejecting or skipping any URL whose host is not in the allowlist
(e.g., Kakao and known book vendor domains) or that contains suspicious
characters/redirects; perform this check inside the method that builds/uses
kakaoDetailUrl (referencing kakaoDetailUrl and the Jsoup.connect call) and
throw/log a safe error or skip processing when validation fails.

Comment on lines +191 to +208
private int findLastKoreanSentenceEnding(String text) {
String[] endings = {
"습니다", "입니다", "였습니다", "있습니다", "했습니다",
"한다", "된다", "있다", "였다", "없다",
"요", "네요", "어요", "죠"
};

int maxPosition = -1;

for (String ending : endings) {
int pos = text.lastIndexOf(ending);
if (pos > maxPosition) {
maxPosition = pos + ending.length() - 1;
}
}

return maxPosition;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug in findLastKoreanSentenceEnding: comparison mixes start and end positions.

maxPosition stores the end index of a previously found ending (pos + ending.length() - 1), but the next iteration compares the start index (pos) of a new match against this end index. This causes incorrect results when a longer ending is a substring of another (e.g., "입니다" overlaps with "습니다" in "있습니다"), potentially skipping endings that actually appear later in the text.

🐛 Proposed fix: consistently compare end positions
     private int findLastKoreanSentenceEnding(String text) {
         String[] endings = {
             "습니다", "입니다", "였습니다", "있습니다", "했습니다",
             "한다", "된다", "있다", "였다", "없다",
             "요", "네요", "어요", "죠"
         };
 
         int maxPosition = -1;
 
         for (String ending : endings) {
             int pos = text.lastIndexOf(ending);
-            if (pos > maxPosition) {
-                maxPosition = pos + ending.length() - 1;
+            if (pos != -1) {
+                int endPos = pos + ending.length() - 1;
+                if (endPos > maxPosition) {
+                    maxPosition = endPos;
+                }
             }
         }
 
         return maxPosition;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private int findLastKoreanSentenceEnding(String text) {
String[] endings = {
"습니다", "입니다", "였습니다", "있습니다", "했습니다",
"한다", "된다", "있다", "였다", "없다",
"요", "네요", "어요", "죠"
};
int maxPosition = -1;
for (String ending : endings) {
int pos = text.lastIndexOf(ending);
if (pos > maxPosition) {
maxPosition = pos + ending.length() - 1;
}
}
return maxPosition;
}
private int findLastKoreanSentenceEnding(String text) {
String[] endings = {
"습니다", "입니다", "였습니다", "있습니다", "했습니다",
"한다", "된다", "있다", "였다", "없다",
"요", "네요", "어요", "죠"
};
int maxPosition = -1;
for (String ending : endings) {
int pos = text.lastIndexOf(ending);
if (pos != -1) {
int endPos = pos + ending.length() - 1;
if (endPos > maxPosition) {
maxPosition = endPos;
}
}
}
return maxPosition;
}
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookDescriptionEnhancer.java`
around lines 191 - 208, In findLastKoreanSentenceEnding the code mixes start
index (pos) and end index (maxPosition) causing wrong comparisons; change the
loop to compute an end index for each match (e.g., endPos = pos +
ending.length() - 1) and compare endPos against a consistently-used maxEnd
(replace/maxPosition) and update maxEnd = endPos when larger so you always
compare end positions consistently within findLastKoreanSentenceEnding.

Comment on lines +66 to +70
// 2) description 보완 (카카오 API contents가 짤려있으면 크롤링)
String description = doc.getContents();
if (descriptionEnhancer.needsEnhancement(description)) {
description = descriptionEnhancer.enhanceFromKakaoPage(doc.getUrl(), description);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Blocking HTTP crawl inside REQUIRES_NEW transaction.

The enhanceFromKakaoPage call performs a synchronous 5-second-timeout HTTP request while the REQUIRES_NEW transaction (line 38) is open. This holds a DB connection from the pool during the crawl for every book in the search results. Under load, this can exhaust the connection pool and stall unrelated queries.

If the performance concern flagged in BookDescriptionEnhancer is addressed (e.g., async enrichment), this goes away. Otherwise, at minimum consider moving description enhancement outside the transactional boundary.

🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookImportService.java`
around lines 66 - 70, The synchronous HTTP crawl in
descriptionEnhancer.enhanceFromKakaoPage(doc.getUrl(), description) is being
executed while the REQUIRES_NEW transaction in BookImportService is open,
holding DB connections; move the enhancement outside the transactional boundary
by checking descriptionEnhancer.needsEnhancement(...) and performing
enhanceFromKakaoPage(...) after the transactional method completes (or make the
enhancer asynchronous/queued so the HTTP call does not run inside the
REQUIRES_NEW transaction), updating callers in BookImportService to call the
enhancer post-transaction (or submit an async task) to avoid blocking DB
connections.

Comment on lines +47 to +72
// ✅ 1. description이 불완전하면 보완 (기존 책 대응)
if (descriptionEnhancer.needsEnhancement(book.getDescription())) {
log.info("기존 책의 description 보완 시작 - bookId: {}, title: {}", bookId, book.getTitle());
String enhanced = descriptionEnhancer.enhanceFromKakaoPage(
book.getDetailUrl(),
book.getDescription()
);

// description 업데이트
if (enhanced != null && !enhanced.equals(book.getDescription())) {
book.updateBasicInfo(
book.getTitle(),
enhanced, // 보완된 description
book.getThumbnailUrl(),
book.getDetailUrl(),
book.getPublisherName(),
book.getPublishedDate(),
book.getIsbn(),
book.getIsbn10(),
book.getIsbn13(),
book.getRawData()
);
booksRepository.save(book);
log.info("기존 책의 description 보완 완료 - bookId: {}", bookId);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Synchronous crawl on user-facing read path will spike p99 latency.

getBookDetail is a user-facing endpoint. Inserting a blocking 5-second HTTP crawl here means any book with a short description (< 200 chars) or one ending mid-sentence will see a ~5 s delay on every detail request until the description is patched. This is a significant UX regression.

As noted in the BookDescriptionEnhancer review, consider async enrichment or limiting crawl-based enrichment to the admin batch endpoint. In the query path, you could skip the crawl and only apply completeIncompleteSentence (the local trim/append-"..." fallback) for instant response.

🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/library/books/service/BookQueryServiceImpl.java`
around lines 47 - 72, The current user-facing path in BookQueryServiceImpl
synchronously calls descriptionEnhancer.enhanceFromKakaoPage when
descriptionEnhancer.needsEnhancement(...) is true, causing a blocking crawl;
change this to avoid the blocking HTTP call: in the getBookDetail/query flow, if
needsEnhancement(...) is true, do not call enhanceFromKakaoPage; instead apply
the local fast fallback (e.g., completeIncompleteSentence or trim/append "...")
to return instantly, and enqueue an asynchronous enrichment job (or flag the
book for background processing) that will later invoke
descriptionEnhancer.enhanceFromKakaoPage and persist the updated book via
booksRepository.save(book); keep synchronous crawling only in admin/batch
endpoints or the background worker.

Comment on lines +68 to +78
try {
log.info("작가 '{}' 도서 데이터 임포트 시작", query);
bookImportService.searchAndUpsert(query, 1, 10);

// 임포트 후 재조회
authorsPage = authorsRepository.searchByName(query, pageable);
} catch (Exception e) {
log.warn("작가 '{}' 임포트 실패 - error: {}", query, e.getMessage());
// 임포트 실패 시 빈 결과 반환 (500 에러 방지)
return AuthorSearchResponse.of(List.of(), page, size, 0L);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find BookImportService class
fd -t f "BookImportService.java" --exec cat -n {} \;

Repository: Project-BookLog/BookLog-BackEnd

Length of output: 9738


🏁 Script executed:

# Search for searchAndUpsert method definition
rg -A 20 "public.*searchAndUpsert" --type java

Repository: Project-BookLog/BookLog-BackEnd

Length of output: 2840


🏁 Script executed:

# Look for custom exception classes related to book import or search
fd -t f ".*Exception.java" | head -20

Repository: Project-BookLog/BookLog-BackEnd

Length of output: 463


🏁 Script executed:

# Check the context around the try-catch in AuthorSearchService
cat -n booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java | sed -n '60,90p'

Repository: Project-BookLog/BookLog-BackEnd

Length of output: 1567


Include full exception in logging instead of only message.

Logging only e.getMessage() on line 75 loses the stack trace, making it harder to diagnose the root cause. Include the exception object as the last logger argument:

-                log.warn("작가 '{}' 임포트 실패 - error: {}", query, e.getMessage());
+                log.warn("작가 '{}' 임포트 실패", query, e);

The broad Exception catch is reasonable here since searchAndUpsert can throw various unrelated exceptions (Kakao API errors, database failures, JSON processing errors, etc.) and the endpoint intentionally provides graceful degradation by returning empty results on import failure rather than propagating a 500 error.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try {
log.info("작가 '{}' 도서 데이터 임포트 시작", query);
bookImportService.searchAndUpsert(query, 1, 10);
// 임포트 후 재조회
authorsPage = authorsRepository.searchByName(query, pageable);
} catch (Exception e) {
log.warn("작가 '{}' 임포트 실패 - error: {}", query, e.getMessage());
// 임포트 실패 시 빈 결과 반환 (500 에러 방지)
return AuthorSearchResponse.of(List.of(), page, size, 0L);
}
try {
log.info("작가 '{}' 도서 데이터 임포트 시작", query);
bookImportService.searchAndUpsert(query, 1, 10);
// 임포트 후 재조회
authorsPage = authorsRepository.searchByName(query, pageable);
} catch (Exception e) {
log.warn("작가 '{}' 임포트 실패", query, e);
// 임포트 실패 시 빈 결과 반환 (500 에러 방지)
return AuthorSearchResponse.of(List.of(), page, size, 0L);
}
🤖 Prompt for AI Agents
In
`@booklog/src/main/java/com/example/booklog/domain/search/service/AuthorSearchService.java`
around lines 68 - 78, The catch block in AuthorSearchService currently logs only
e.getMessage(), losing the stack trace; update the catch to pass the exception
object to the logger so the full stack trace is recorded (keep the existing
message and parameters for query and then supply the Exception e as the final
argument to the log.warn call). This change should be made where
bookImportService.searchAndUpsert(...) is invoked and
authorsRepository.searchByName(...) is retried so the log contains both the
context ("작가 '{}' 임포트 실패 - error: {}" or similar) and the full Exception e as
the final parameter for proper stacktrace logging.

@millkk04 millkk04 merged commit 2158264 into dev Feb 10, 2026
1 check passed
This was referenced Feb 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEAT] 책 Description 자동 보완 기능 구현

1 participant