#### ID 2080

```Count the number of unique users per day who logged in from both a mobile device and web. Output the date and the corresponding number of users.```

In [None]:
%%sql
SELECT ml.date, COUNT(DISTINCT ml.user_id) AS cnt
FROM mobile_logs AS ml
         JOIN web_logs AS wl ON ml.user_id = wl.user_id AND ml.date = wl.date
GROUP BY ml.date

In [None]:
df = pd.merge(mobile_logs, web_logs, how='inner', on=['user_id', 'date'])

df.groupby('date', as_index=False).agg(cnt=('user_id', 'nunique'))
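
The inner join above amounts to a set intersection on `(date, user_id)` pairs. A minimal plain-Python sketch of the same idea, using made-up log rows (the dates and user IDs are hypothetical, not taken from the dataset):

```python
from collections import Counter

# Hypothetical (date, user_id) log rows; duplicates are allowed.
mobile = [('2021-12-01', 1), ('2021-12-01', 2), ('2021-12-02', 1), ('2021-12-01', 1)]
web = [('2021-12-01', 1), ('2021-12-02', 2), ('2021-12-02', 1)]

# A user counts for a day only if the same (date, user_id) pair is in both logs.
both = set(mobile) & set(web)

# Number of such users per day.
users_per_day = Counter(day for day, _ in both)
print(dict(sorted(users_per_day.items())))
```

Converting to sets first removes repeated logins, which plays the role of `COUNT(DISTINCT ...)` and `nunique`.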

#### ID 2083

```Count how many claims submitted in December 2021 are still pending. A claim is pending when it has neither an acceptance nor rejection date.```

In [None]:
%%sql
SELECT COUNT(claim_id) AS n_claims
FROM cvs_claims
WHERE (EXTRACT(YEAR FROM date_submitted) = 2021 AND
       EXTRACT(MONTH FROM date_submitted) = 12)
  AND (date_accepted IS NULL AND date_rejected IS NULL)

In [None]:
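df = cvs_claims

# Build boolean masks instead of adding helper columns to the source table.
submitted_dec_2021 = (df['date_submitted'].dt.year == 2021) & (df['date_submitted'].dt.month == 12)
pending = df['date_accepted'].isnull() & df['date_rejected'].isnull()

df.loc[submitted_dec_2021 & pending, 'claim_id'].count()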

#### ID 2084

```You are given a table of users who have been blocked from Facebook, together with the date, duration, and reason for the blocking. The duration is expressed as the number of days after the blocking date; if this field is empty, the user is blocked permanently. For each blocking reason, count how many users were blocked in December 2021. Include both the users who were blocked in December 2021 and those who were blocked earlier but remained blocked for at least part of December 2021.```

In [None]:
%%sql
SELECT block_reason, COUNT(DISTINCT user_id) AS n_users
FROM fb_blocked_users
WHERE DATE_TRUNC('month', block_date) = '2021-12-01'
   OR (block_date < '2021-12-01' AND
       ((block_date + INTERVAL '1' DAY * block_duration) >= '2021-12-01' OR
        block_duration IS NULL))
GROUP BY block_reason;

In [None]:
df = fb_blocked_users

df['end_period'] = df['block_date'] + pd.to_timedelta(df['block_duration'], unit='d')

filtered_df = df[(df['block_date'].dt.to_period('M') == '2021-12') | (
            (df['block_date'] < '2021-12-01') & ((df['end_period'].isnull()) | (df['end_period'] >= '2021-12-01')))]

result = filtered_df.groupby('block_reason', as_index=False).agg(n_users=('user_id', 'nunique'))
result
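
The filter above is an interval-overlap test against December 2021. A minimal sketch in plain Python, assuming (as the queries above do) that the end date `block_date + block_duration` is inclusive; the function name and the sample rows are hypothetical:

```python
from datetime import date, timedelta

DEC_START = date(2021, 12, 1)
DEC_END = date(2021, 12, 31)

def blocked_in_december(block_date, block_duration):
    """Return True if the block covers at least one day of December 2021."""
    if block_date > DEC_END:
        return False          # block started after December
    if block_duration is None:
        return True           # permanent block that started in or before December
    return block_date + timedelta(days=block_duration) >= DEC_START

# Hypothetical rows: (user_id, block_date, block_duration)
rows = [
    (1, date(2021, 12, 10), 5),     # blocked during December
    (2, date(2021, 11, 20), 30),    # ends 2021-12-20, overlaps December
    (3, date(2021, 10, 1), 10),     # ends 2021-10-11, no overlap
    (4, date(2020, 1, 1), None),    # permanent block
    (5, date(2022, 1, 3), 7),       # starts after December
]
december_users = [user for user, start, days in rows if blocked_in_december(start, days)]
print(december_users)
```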

#### ID 2086

```Count the total number of distinct conversations on WhatsApp. Two users share a conversation if there is at least 1 message between them. Multiple messages between the same pair of users are considered a single conversation.```

In [None]:
%%sql
SELECT COUNT(*) AS n_conversations
FROM (SELECT message_sender_id, message_receiver_id
      FROM whatsapp_messages
      UNION
      SELECT message_receiver_id AS message_sender_id,
             message_sender_id   AS message_receiver_id
      FROM whatsapp_messages) t1
WHERE message_sender_id < message_receiver_id;

In [None]:
df1 = whatsapp_messages.rename(columns={'message_sender_id': 'user_1', 'message_receiver_id': 'user_2'})[
    ['user_1', 'user_2']]
df2 = whatsapp_messages.rename(columns={'message_sender_id': 'user_2', 'message_receiver_id': 'user_1'})[
    ['user_1', 'user_2']]

df = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)

# Keep one orientation of each pair (same condition as the SQL version).
len(df.query('user_1 < user_2'))
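
Another way to see why each pair is counted once: treat a conversation as an unordered pair of user IDs. A small plain-Python sketch with hypothetical message rows; self-messages are skipped here to match the strict inequality used above:

```python
# Hypothetical (sender_id, receiver_id) rows.
messages = [(1, 2), (2, 1), (1, 2), (3, 1), (2, 3), (3, 2), (4, 4)]

# frozenset({a, b}) == frozenset({b, a}), so direction and repeats collapse.
conversations = {frozenset((sender, receiver))
                 for sender, receiver in messages
                 if sender != receiver}

n_conversations = len(conversations)
print(n_conversations)
```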

#### ID 2091

```For each video game player, find the latest date when they logged in.```

In [None]:
%%sql
SELECT player_id, MAX(login_date) AS last_date
FROM players_logins
GROUP BY player_id

In [None]:
df = players_logins
df.groupby('player_id', as_index=False).agg(last_date=('login_date', 'max'))

#### ID 2092

```You have been asked to find the top 3 merchants for each day with the highest number of orders on that day. In the event of a tie, multiple merchants may share the same spot, but each day at least one merchant must be in first, second, and third place. Your output should include the date in the format YYYY-MM-DD, the name of the merchant, and their place in the daily ranking.```

In [None]:
%%sql
WITH cnt_orders_by_date_merchant AS (SELECT order_timestamp::DATE AS order_day,
                                            merchant_id,
                                            COUNT(id)             AS cnt_orders
                                     FROM order_details
                                     GROUP BY order_day, merchant_id),
     ranked_cnt_orders_by_date_merchant AS (SELECT order_day,
                                                   name,
                                                   DENSE_RANK()
                                                   OVER (PARTITION BY order_day ORDER BY cnt_orders DESC) AS ranking
                                            FROM cnt_orders_by_date_merchant
                                                     JOIN merchant_details AS md
                                                          ON cnt_orders_by_date_merchant.merchant_id = md.id)
SELECT order_day, name, ranking
FROM ranked_cnt_orders_by_date_merchant
WHERE ranking <= 3

In [None]:
df = order_details.copy()
df['date'] = df['order_timestamp'].dt.strftime('%Y-%m-%d')

grouped = df.groupby(['date', 'merchant_id'], as_index=False).agg(n_orders=('id', 'count'))

# Dense ranking keeps places 1, 2, and 3 filled even when merchants tie.
grouped['ranking'] = grouped.groupby('date')['n_orders'].rank(method='dense', ascending=False).astype(int)

grouped.query('ranking <= 3').merge(merchant_details, how='left', left_on='merchant_id', right_on='id')[
    ['date', 'name', 'ranking']]
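
The "at least one merchant in first, second, and third place" requirement is what dense ranking provides: ties share a place and the next place is not skipped. A minimal sketch for a single day, with a hypothetical helper `dense_rank_desc` and made-up order counts:

```python
def dense_rank_desc(counts):
    """Map each key to its dense rank; the highest count gets rank 1."""
    distinct_desc = sorted(set(counts.values()), reverse=True)
    rank_of = {value: place for place, value in enumerate(distinct_desc, start=1)}
    return {key: rank_of[value] for key, value in counts.items()}

# Hypothetical order counts for one day.
orders_per_merchant = {'A': 5, 'B': 5, 'C': 3, 'D': 2, 'E': 1}

ranks = dense_rank_desc(orders_per_merchant)
top3 = sorted(name for name, place in ranks.items() if place <= 3)
print(ranks, top3)
```

With an ordinary (non-dense) rank, A and B would share place 1 and C would get place 3, leaving place 2 empty.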

#### ID 2096

```Find the number of actions that ClassPass workers performed for tasks completed in January 2022. The completed tasks are those rows in the asana_actions table with 'action_name' equal to CompleteTask. Note that each row in the dataset indicates how many actions of a certain type one user performed in one day; the number of actions is stored in the 'num_actions' column. Output the ID of each user and the total number of actions they performed for tasks they completed. If a user from this company did not complete any tasks in the given period, you should still output their ID with 0 in the second column.```

In [None]:
%%sql
SELECT au.user_id,
       COALESCE(SUM(aa.num_actions), 0) AS n_completed_tasks
FROM asana_users au
         LEFT JOIN asana_actions aa
                   ON aa.user_id = au.user_id
                       AND aa.action_name = 'CompleteTask'
                       AND DATE_PART('year', aa.date) = 2022
                       AND DATE_PART('month', aa.date) = 1
WHERE au.company = 'ClassPass'
GROUP BY au.user_id

In [None]:
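# Keep only CompleteTask rows from January 2022.
completed = asana_actions[(asana_actions['action_name'] == 'CompleteTask')
                          & (asana_actions['date'].dt.year == 2022)
                          & (asana_actions['date'].dt.month == 1)]
totals = completed.groupby('user_id', as_index=False).agg(n_completed_tasks=('num_actions', 'sum'))

# Start from every ClassPass user so that users without completed tasks get 0.
classpass = asana_users.loc[asana_users['company'] == 'ClassPass', ['user_id']].drop_duplicates()

classpass.merge(totals, how='left', on='user_id').fillna({'n_completed_tasks': 0}).astype(
    {'n_completed_tasks': int})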
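
The requirement to report 0 for users with no completed tasks means the result has to start from the list of users, not from the action rows. A plain-Python sketch of that idea on hypothetical data (the user IDs and rows are made up, and the action rows are assumed to be already limited to January 2022):

```python
# Hypothetical ClassPass user IDs.
classpass_users = [101, 102, 103]

# Hypothetical (user_id, action_name, num_actions) rows for January 2022.
january_actions = [
    (101, 'CompleteTask', 4),
    (101, 'BeginTask', 7),
    (101, 'CompleteTask', 2),
    (102, 'BeginTask', 3),
]

# Every user starts at 0, so users without CompleteTask rows still appear.
totals = {user_id: 0 for user_id in classpass_users}
for user_id, action_name, num_actions in january_actions:
    if action_name == 'CompleteTask' and user_id in totals:
        totals[user_id] += num_actions

print(totals)
```

User 103 has no rows at all and user 102 has only other action types; both still end up in the output with 0.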

#### ID 2098

```A group of travelers embark on world tours starting with their home cities. Each traveler has an undecided itinerary that evolves over the course of the tour. Some travelers decide to abruptly end their journey mid-travel and live in their last destination. Given the dataset of dates on which they traveled between different pairs of cities, can you find out how many travelers ended back in their home city? For simplicity, you can assume that each traveler made at most one trip between two cities in a day.```

In [None]:
%%sql
WITH cte AS (SELECT traveler,
                    FIRST_VALUE(start_city)
                    OVER (PARTITION BY traveler ORDER BY date)      AS home_city_start,
                    FIRST_VALUE(end_city)
                    OVER (PARTITION BY traveler ORDER BY date DESC) AS home_city_end
             FROM travel_history)
SELECT COUNT(DISTINCT traveler) AS n_travelers_returned
FROM cte
WHERE home_city_start = home_city_end

In [None]:
df = travel_history

df['home_city_start'] = df.sort_values('date').groupby('traveler')['start_city'].transform('first')

df['home_city_end'] = df.sort_values('date', ascending=False).groupby('traveler')['end_city'].transform('first')

df.query('home_city_start == home_city_end')['traveler'].nunique()
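
Both versions above compare the first `start_city` with the last `end_city` per traveler. A minimal plain-Python sketch of the same comparison, using hypothetical trips and assuming no two trips of one traveler share a date:

```python
from datetime import date

# Hypothetical trips: (traveler, start_city, end_city, travel_date)
trips = [
    ('ann', 'Rome', 'Oslo', date(2022, 1, 5)),
    ('ann', 'Oslo', 'Rome', date(2022, 1, 1)),
    ('bob', 'Lima', 'Quito', date(2022, 1, 2)),
    ('bob', 'Quito', 'Bogota', date(2022, 1, 9)),
]

home, last_stop = {}, {}
for traveler, start_city, end_city, _ in sorted(trips, key=lambda trip: trip[3]):
    home.setdefault(traveler, start_city)   # first start city seen is the home city
    last_stop[traveler] = end_city          # overwritten until the final trip

returned = sorted(t for t in home if home[t] == last_stop[t])
print(returned, len(returned))
```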

#### ID 2099

```The election is conducted in a city and everyone can vote for one or more candidates, or choose not to vote at all. Each person has 1 vote so if they vote for multiple candidates, their vote gets equally split across these candidates. For example, if a person votes for 2 candidates, these candidates receive an equivalent of 0.5 vote each. Find out who got the most votes and won the election. Output the name of the candidate or multiple names in case of a tie. To avoid issues with a floating-point error you can round the number of votes received by a candidate to 3 decimal places.```

In [None]:
%%sql
WITH cnt_votes_by_voter AS (SELECT voter, COUNT(candidate) AS cnt_votes
                            FROM voting_results
                            GROUP BY voter),
     calc_weight_votes AS (SELECT cnt_votes_by_voter.voter,
                                  1.0 / NULLIF(cnt_votes, 0) AS vote_weight,
                                  candidate
                           FROM cnt_votes_by_voter
                                    JOIN voting_results AS vr
                                         ON cnt_votes_by_voter.voter = vr.voter
                           WHERE vr.candidate IS NOT NULL),
     total_votes_by_candidate AS (SELECT candidate,
                                         ROUND(SUM(vote_weight), 3) AS total_votes,
                                         DENSE_RANK() OVER (ORDER BY ROUND(SUM(vote_weight), 3) DESC) AS rnk
                                  FROM calc_weight_votes
                                  GROUP BY candidate)
SELECT candidate
FROM total_votes_by_candidate
WHERE rnk = 1

In [None]:
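df = voting_results

# Number of candidates each voter picked; empty (NaN) candidates are not counted.
cnt_votes_by_voter = df.groupby('voter', as_index=False).agg(cnt_votes=('candidate', 'count'))

cnt_votes_by_voter['vote_weight'] = 1 / cnt_votes_by_voter['cnt_votes']

all_data = pd.merge(cnt_votes_by_voter, voting_results, how='inner', on='voter')

# groupby drops rows with a missing candidate, so non-voters do not contribute.
totals = all_data.groupby('candidate', as_index=False).agg(total_votes=('vote_weight', 'sum'))
totals['total_votes'] = totals['total_votes'].round(3)

totals.nlargest(1, 'total_votes', keep='all')['candidate']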
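
The vote-splitting arithmetic can be checked by hand on a tiny example. The sketch below uses made-up ballots and, instead of rounding to 3 decimal places as the task suggests, swaps in exact fractions from the standard library so that ties compare exactly; this is an illustrative alternative, not the approach used above:

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical ballots: (voter, candidate); None means the person did not vote.
ballots = [
    ('v1', 'Ada'), ('v1', 'Bo'),
    ('v2', 'Ada'),
    ('v3', 'Bo'), ('v3', 'Cy'), ('v3', 'Ada'),
    ('v4', None),
]

# How many candidates each voter picked.
picks_per_voter = Counter(voter for voter, candidate in ballots if candidate is not None)

# Each pick is worth 1 / (number of picks by that voter).
totals = defaultdict(Fraction)
for voter, candidate in ballots:
    if candidate is not None:
        totals[candidate] += Fraction(1, picks_per_voter[voter])

best = max(totals.values())
winners = sorted(c for c, total in totals.items() if total == best)
print(dict(totals), winners)
```

Here Ada receives 1/2 + 1 + 1/3 = 11/6 votes, Bo 1/2 + 1/3 = 5/6, and Cy 1/3, so Ada wins.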