In [None]:
I came across this Stack Overflow post, titled:

Optimize GROUP BY query to retrieve latest row per user




In [None]:
Let us assume we have a user log table like the following:

CREATE TABLE log (
    log_date DATE,
    user_id  INTEGER,
    payload  INTEGER
);



In [None]:
Using GROUP BY to find the latest record per user on or before a given date is extremely slow:
    
SELECT user_id, max(log_date), max(payload) 
FROM log 
WHERE log_date <= :mydate 
GROUP BY user_id
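
Besides being slow, the query above has a subtle catch: max(payload) returns the largest payload per user, which is not necessarily the payload of the latest row. The sketch below illustrates this. It uses an in-memory SQLite database as a stand-in for PostgreSQL, the sample rows are made up, and an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Assumption: SQLite (in-memory) stands in for PostgreSQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (log_date DATE, user_id INTEGER, payload INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        ("2020-09-01", 1, 90),
        ("2020-09-15", 1, 10),  # latest row for user 1 on or before the cutoff
        ("2020-09-25", 1, 50),  # after the cutoff, filtered out by WHERE
        ("2020-09-10", 2, 7),
    ],
)

# Same query as above, with ORDER BY added for a stable result order.
group_by_rows = conn.execute(
    """
    SELECT user_id, max(log_date), max(payload)
    FROM log
    WHERE log_date <= :mydate
    GROUP BY user_id
    ORDER BY user_id
    """,
    {"mydate": "2020-09-20"},
).fetchall()

print(group_by_rows)
```

For user 1 the latest row on or before 2020-09-20 has payload 10, but the query reports 90, because each max() is computed independently over the group.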



In [None]:
Is there a way to speed up the row retrieval? That's where a LATERAL join comes in handy!




The PostgreSQL documentation describes LATERAL as follows:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items
that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot
cross-reference any other FROM item.)

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item 
providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the 
LATERAL item is evaluated using that row or row set’s values of the columns. The resulting row(s) are joined as 
usual with the rows they were computed from. This is repeated for each row or set of rows from the column source 
table(s).

This is a bit dense. Loosely, it means that a LATERAL join is like a SQL foreach loop, in which PostgreSQL will 
iterate over each row in a result set and evaluate a subquery using that row as a parameter.
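
The foreach analogy can be sketched in plain Python. This is only a mental model under stated assumptions, not a description of how PostgreSQL actually executes the join: the users and log lists are hypothetical in-memory stand-ins for tables, and latest_log_rows and cross_join_lateral are illustrative helper names.

```python
from datetime import date

# Hypothetical sample data standing in for a users table and the log table.
users = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
log = [
    {"log_date": date(2020, 9, 1), "user_id": 1, "payload": 90},
    {"log_date": date(2020, 9, 15), "user_id": 1, "payload": 10},
    {"log_date": date(2020, 9, 25), "user_id": 1, "payload": 50},
    {"log_date": date(2020, 9, 10), "user_id": 2, "payload": 7},
]


def latest_log_rows(user_id, mydate, limit=1):
    """Play the role of the sub-SELECT: filter by the outer row's user_id
    and the cutoff date, sort newest first, and keep at most `limit` rows."""
    matches = [
        r for r in log
        if r["user_id"] == user_id and r["log_date"] <= mydate
    ]
    matches.sort(key=lambda r: r["log_date"], reverse=True)
    return matches[:limit]


def cross_join_lateral(mydate):
    """Evaluate the subquery once per outer row and join the results."""
    result = []
    for u in users:  # iterate over each row of the outer FROM item
        # The subquery is evaluated with this row's values as parameters.
        for l in latest_log_rows(u["user_id"], mydate):
            result.append((u["user_id"], l["log_date"], l["payload"]))
    return result


rows = cross_join_lateral(date(2020, 9, 20))
print(rows)
```

User 3 has no log rows, so the inner loop yields nothing and that user is absent from the result, which matches the behavior of a CROSS JOIN LATERAL whose subquery returns no rows.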


If we assume that we have a users table with a unique user_id as its primary key, then we can LATERAL join it with the log table.

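I am still not sure exactly how the LATERAL join (by which we mean adding LATERAL to a JOIN; it does not have to be
a CROSS JOIN per se) speeds up the process, though it clearly runs efficiently. A likely reason is that, given an
index on log (user_id, log_date), each per-user subquery can stop after reading a single index entry instead of
aggregating every row of the table.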

In [None]:
We can set :mydate to a specific date, say, '2020-09-20':

SELECT u.user_id, l.log_date, l.payload
FROM   users u
CROSS JOIN LATERAL (
   SELECT l.log_date, l.payload
   FROM   log l
   WHERE  l.user_id = u.user_id
   AND    l.log_date <= :mydate
   ORDER  BY l.log_date DESC NULLS LAST
   LIMIT  1
   ) l;