Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes the personalized "For You" posts query by introducing a dual-path strategy: a high-performance path for fresh users with minimal interaction history, and an enhanced normal path with improved query optimization through pre-filtering, reduced dataset sizes, and conditional logic based on user engagement patterns.
- Adds a fast-track query path for fresh users (< 5 follows, < 10 likes) that skips complex personalization
- Refactors the normal path with pre-calculation optimizations, earlier LIMIT clauses, and conditional query generation
- Reduces intermediate result sets through strategic filtering and hard limits on candidate posts
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| WHERE p."is_deleted" = false | ||
| AND p."type" IN ('POST', 'QUOTE') | ||
| AND p."created_at" > NOW() - INTERVAL '7 days' -- Shorter window for fresh users | ||
| AND p."interest_id" IN (${interestIdsString}) |
There was a problem hiding this comment.
SQL injection vulnerability: The interestIdsString variable is directly interpolated into the SQL query using string interpolation. This creates a security risk even if the values are expected to be integers, as malicious input could potentially manipulate the query structure.
| FROM "Like" l | ||
| JOIN "posts" p ON l."post_id" = p."id" | ||
| WHERE l."user_id" = ${userId} | ||
| AND l."created_at" > '${dateStr}' -- Optimization: Only recent likes matter for weighting |
There was a problem hiding this comment.
SQL injection vulnerability: The dateStr variable is directly interpolated into the SQL query. Although it's derived from a Date object, this approach bypasses Prisma's parameterization. Use proper parameterized queries to prevent potential SQL injection.
| ROW_NUMBER() OVER (PARTITION BY p."interest_id" ORDER BY p."created_at" DESC) as rn | ||
| FROM "posts" p | ||
| WHERE p."is_deleted" = false | ||
| WHERE p."created_at" > '${dateStr}' |
There was a problem hiding this comment.
SQL injection vulnerability: The dateStr variable is directly interpolated into the SQL query without parameterization. This creates a security risk as it bypasses Prisma's built-in SQL injection protection mechanisms.
| CASE WHEN ap."user_id" = ${userId} THEN 3 | ||
| WHEN uf.following_id IS NOT NULL THEN 2 | ||
| WHEN la.author_id IS NOT NULL THEN 1 | ||
| ELSE 0 | ||
| END as pre_score |
There was a problem hiding this comment.
The pre_score values (3, 2, 1, 0) are magic numbers without clear documentation. These scoring weights should be defined as named constants alongside the other personalization weights to maintain consistency and improve code maintainability.
| FROM "posts" p | ||
| WHERE p."is_deleted" = false | ||
| AND p."type" IN ('POST', 'QUOTE') | ||
| AND p."created_at" > NOW() - INTERVAL '7 days' -- Shorter window for fresh users |
There was a problem hiding this comment.
The time window filter uses 7 days for fresh users, but the comment at line 1671-1674 and the dateStr calculation sets up a 14-day window for normal users. However, the fresh user query comment on line 1607 says "7 days" but this inconsistency between fresh and normal user time windows should be intentional. The problem is that the fresh user path will miss posts from days 8-14 that the normal path would include, potentially creating an inconsistent experience. Consider documenting why this difference exists or using a consistent time window.
| AND p."created_at" > NOW() - INTERVAL '7 days' -- Shorter window for fresh users | |
| -- NOTE: We intentionally use a 7-day window for fresh users (vs. 14 days for normal users) | |
| -- to ensure new users see only the most recent and relevant content, and to avoid overwhelming | |
| -- them with too much information. This difference is by design. | |
| AND p."created_at" > NOW() - INTERVAL '7 days' |
| AND p."created_at" > NOW() - INTERVAL '7 days' -- Shorter window for fresh users | ||
| AND p."interest_id" IN (${interestIdsString}) | ||
| ORDER BY p."created_at" DESC | ||
| LIMIT ${limit} OFFSET ${offset} |
There was a problem hiding this comment.
SQL injection vulnerability: The limit and offset parameters are directly interpolated into the SQL query without validation. These should be properly validated as positive integers and parameterized to prevent SQL injection attacks.
| CASE WHEN ap."user_id" = ${userId} THEN ${personalizationWeights.ownPost} ELSE 0 END + | ||
| CASE WHEN uf.following_id IS NOT NULL THEN ${personalizationWeights.following} ELSE 0 END + | ||
| CASE WHEN la.author_id IS NOT NULL THEN ${personalizationWeights.directLike} ELSE 0 END + | ||
| tc.pre_score * 5.0 + -- Reuse pre-calculated score |
There was a problem hiding this comment.
The multiplication factor of 5.0 for the pre_score is a magic number without documentation. This should be defined as a named constant (e.g., PRE_SCORE_MULTIPLIER) alongside the other personalization weights to maintain consistency and make the scoring logic more transparent.
| // --------------------------------------------------------- | ||
| if (isFreshUser && hasInterests) { | ||
| // Just stringify IDs for the IN clause (safe for integers) | ||
| const interestIdsString = interestIds.join(','); |
There was a problem hiding this comment.
SQL injection vulnerability: The interestIdsString is constructed by directly joining user interest IDs without proper parameterization. While the IDs are expected to be integers, this approach is unsafe. Use Prisma's parameterized queries or properly validate and sanitize the input to prevent potential SQL injection attacks.
| raw_reposts AS ( | ||
| SELECT r."post_id", r."user_id", r."created_at" | ||
| FROM "Repost" r | ||
| WHERE r."created_at" > '${dateStr}' |
There was a problem hiding this comment.
SQL injection vulnerability: The dateStr variable is directly interpolated into the SQL query without parameterization. This approach is unsafe and should use Prisma's parameterized query functionality.
| ), | ||
| -- Get posts from user's interests with time window (no per-interest limit) | ||
| -- Optimization: Limit original posts EARLIER. | ||
| -- The partition by interest is heavy, but necessary for diversity. |
There was a problem hiding this comment.
The comment mentions "The partition by interest is heavy, but necessary for diversity" but doesn't explain what diversity means in this context or why partitioning by interest achieves it. Consider expanding this comment to explain that it ensures a balanced representation of posts across different interest categories rather than overwhelming the feed with posts from the most active interest.
| -- The partition by interest is heavy, but necessary for diversity. | |
| -- The partition by interest is computationally heavy, but necessary to ensure diversity in the feed. | |
| -- Specifically, partitioning by interest_id ensures a balanced representation of posts from different interest categories, | |
| -- preventing the feed from being dominated by posts from the most active interests and promoting a varied user experience. |
No description provided.