#### ID 2021

```Redfin helps clients to find agents. Each client will have a unique request_id and each request_id has several calls. For each request_id, the first call is an “initial call” and all the following calls are “update calls”.  What's the average call duration for all initial calls?```

In [None]:
%%sql
WITH cte AS (SELECT request_id,
                    call_duration,
                    RANK() OVER (PARTITION BY request_id ORDER BY created_on) AS rnk
             FROM redfin_call_tracking)
SELECT AVG(call_duration)
FROM cte
WHERE rnk = 1

In [None]:
df = redfin_call_tracking
df['rnk'] = df.groupby('request_id')['created_on'].rank(method='first', ascending=True)
df.query('rnk == 1')['call_duration'].mean()
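
As a quick sanity check, the ranking approach can be run on a few made-up rows. This is only a sketch: the sample values below are hypothetical (the real `redfin_call_tracking` data is not shown here), and the names `calls`, `initial_avg` and `update_avg` are introduced for illustration.

```python
import pandas as pd

# Hypothetical sample rows; invented for illustration only.
calls = pd.DataFrame({
    'request_id': [1, 1, 2, 2, 2],
    'created_on': pd.to_datetime([
        '2020-03-01 10:00', '2020-03-01 12:00',
        '2020-03-02 09:00', '2020-03-02 09:30', '2020-03-03 15:00',
    ]),
    'call_duration': [100, 50, 200, 30, 40],
})

# Rank calls within each request by timestamp; rank 1 is the initial call.
calls['rnk'] = calls.groupby('request_id')['created_on'].rank(method='first')

# Initial calls are rank 1; every later call is an update call.
initial_avg = calls.loc[calls['rnk'] == 1, 'call_duration'].mean()
update_avg = calls.loc[calls['rnk'] > 1, 'call_duration'].mean()
print(initial_avg, update_avg)
```

Note that `method='first'` breaks timestamp ties by row order, whereas SQL `RANK()` would give tied rows the same rank, so the two can differ if a request has two calls with an identical `created_on`.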

#### ID 2022

```Redfin helps clients to find agents. Each client will have a unique request_id and each request_id has several calls. For each request_id, the first call is an “initial call” and all the following calls are “update calls”.  What's the average call duration for all update calls?```

In [None]:
%%sql
SELECT AVG(call_duration)
FROM (SELECT call_duration,
             DENSE_RANK() OVER (PARTITION BY request_id ORDER BY created_on) AS rnk
      FROM redfin_call_tracking) t1
WHERE rnk > 1

In [None]:
df = redfin_call_tracking
df['rnk'] = df.groupby('request_id')['created_on'].rank(method='dense')
df.query('rnk > 1')['call_duration'].mean()

#### ID 2023

```Redfin helps clients to find agents. Each client will have a unique request_id and each request_id has several calls. For each request_id, the first call is an “initial call” and all the following calls are “update calls”.  How many customers have called 3 or more times between 3 PM and 6 PM (initial and update calls combined)?```

In [None]:
%%sql
WITH total_calls AS (SELECT request_id, COUNT(call_duration) AS cnt
             FROM redfin_call_tracking
             WHERE EXTRACT(HOUR FROM created_on) BETWEEN 15 AND 17
             GROUP BY request_id
             HAVING COUNT(call_duration) >= 3)
SELECT COUNT(request_id)
FROM total_calls

In [None]:
df = redfin_call_tracking
# 3 PM to 6 PM covers hours 15, 16 and 17; including hour 18 would admit calls up to 6:59 PM
(df[df['created_on'].dt.hour.between(15, 17)]
 .groupby('request_id', as_index=False)
 .agg(total_cnt=('call_duration', 'count'))
 .query('total_cnt >= 3')['request_id']
 .count())
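
The answer is sensitive to the upper bound of the hour filter, because filtering on the hour component with an inclusive bound of 18 also admits calls made between 6:00 PM and 6:59 PM. The sketch below illustrates this on made-up rows; the data and the helper `customers_with_min_calls` are hypothetical, and it assumes "between 3 PM and 6 PM" means 15:00 up to but not including 18:00.

```python
import pandas as pd

# Hypothetical sample rows; invented for illustration only.
calls = pd.DataFrame({
    'request_id': [1, 1, 1, 2, 2, 2],
    'created_on': pd.to_datetime([
        '2020-03-01 15:00', '2020-03-01 16:30', '2020-03-01 17:59',
        '2020-03-01 15:10', '2020-03-01 17:00', '2020-03-01 18:45',
    ]),
    'call_duration': [10, 20, 30, 40, 50, 60],
})

def customers_with_min_calls(df, first_hour, last_hour, min_calls=3):
    """Count request_ids with at least min_calls calls whose hour lies in [first_hour, last_hour]."""
    in_window = df[df['created_on'].dt.hour.between(first_hour, last_hour)]
    per_request = in_window.groupby('request_id').size()
    return int((per_request >= min_calls).sum())

strict = customers_with_min_calls(calls, 15, 17)  # window ends at 17:59
loose = customers_with_min_calls(calls, 15, 18)   # also admits 18:00 to 18:59
print(strict, loose)
```

Here request 2 only reaches three calls when the 18:45 call is counted, so the two bounds give different answers.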

#### ID 2025

```Write a query that returns the number of users who are exclusive to only one client. Output the client_id and number of exclusive users.```

In [None]:
%%sql
WITH distinct_users AS (SELECT user_id, COUNT(DISTINCT client_id)
                        FROM fact_events
                        GROUP BY user_id
                        HAVING COUNT(DISTINCT client_id) = 1)

SELECT client_id, COUNT(DISTINCT fe.user_id)
FROM fact_events fe
         JOIN distinct_users du ON fe.user_id = du.user_id
GROUP BY client_id

In [None]:
df = fact_events
grouped_users = df.groupby('user_id', as_index=False).agg(cnt=('client_id', 'nunique')).query('cnt == 1')['user_id'].to_list()
df.query('user_id.isin(@grouped_users)').groupby('client_id', as_index=False).agg(cnt_users=('user_id', 'nunique'))
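
An alternative that avoids building an intermediate list of user ids is to broadcast each user's distinct-client count back onto the rows with `transform`. This is a sketch on made-up data, not the notebook's own method; the `events` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical events; invented for illustration only.
events = pd.DataFrame({
    'user_id':   ['u1', 'u1', 'u2', 'u3', 'u3', 'u4'],
    'client_id': ['desktop', 'mobile', 'desktop', 'mobile', 'mobile', 'desktop'],
})

# Attach each user's distinct-client count to every one of that user's rows.
clients_per_user = events.groupby('user_id')['client_id'].transform('nunique')

# Keep rows of single-client users, then count distinct users per client.
exclusive = (
    events[clients_per_user == 1]
    .groupby('client_id')['user_id']
    .nunique()
    .to_dict()
)
print(exclusive)
```

User `u1` appears on both clients and is dropped; `u3` has two events on the same client and is still counted once.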

#### ID 2027

```Write a query that returns the company (customer id column) with the highest number of users that use desktop only.```

In [None]:
%%sql
SELECT customer_id
FROM (SELECT customer_id,
             RANK() OVER (
                 ORDER BY COUNT(DISTINCT user_id) DESC) AS rnk
      FROM fact_events
      WHERE user_id IN (SELECT user_id
                        FROM fact_events
                        GROUP BY user_id
                        HAVING COUNT(DISTINCT client_id) = 1)
        AND client_id = 'desktop'
      GROUP BY customer_id) t1
WHERE rnk = 1

In [None]:
df = fact_events
# Flag rows belonging to users who have only ever used a single client
single_client = df.groupby('user_id')['client_id'].transform('nunique') == 1
# Distinct desktop-only users per customer
counts = df[single_client & (df['client_id'] == 'desktop')].groupby('customer_id')['user_id'].nunique()
# Customers tied for the highest count
counts[counts == counts.max()].index.to_list()
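
The desktop-only condition has two parts that are easy to conflate: the user must have used exactly one client, and that client must be desktop. A minimal sketch on made-up rows, expressing both parts as "every event of the user is a desktop event"; the `events` data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical events; invented for illustration only.
events = pd.DataFrame({
    'customer_id': ['acme', 'acme', 'acme', 'acme', 'globex', 'globex'],
    'user_id':     ['u1', 'u2', 'u3', 'u3', 'u4', 'u5'],
    'client_id':   ['desktop', 'desktop', 'desktop', 'mobile', 'desktop', 'mobile'],
})

# A user is desktop-only when every one of their events came from the desktop client.
is_desktop = events['client_id'].eq('desktop')
desktop_only = events[is_desktop.groupby(events['user_id']).transform('all')]

# Count distinct desktop-only users per customer and keep every customer tied for the top.
users_per_customer = desktop_only.groupby('customer_id')['user_id'].nunique()
top_customers = users_per_customer[users_per_customer == users_per_customer.max()].index.to_list()
print(top_customers)
```

In this sample `u3` uses both clients and `u5` is mobile-only, so neither counts toward their customer's total.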

#### ID 2039

```Find the number of unique transactions and total sales for each of the product categories in 2017. Output the product categories, number of transactions, and total sales in descending order. The sales column represents the total cost the customer paid for the product so no additional calculations need to be done on the column. Only include product categories that have products sold.```

In [None]:
%%sql
SELECT product_category, COUNT(DISTINCT transaction_id), SUM(sales) AS sales
FROM wfm_transactions AS wt
         JOIN wfm_products AS wp ON wt.product_id = wp.product_id
WHERE EXTRACT(YEAR FROM transaction_date) = 2017
GROUP BY product_category
ORDER BY sales DESC

In [None]:
df = pd.merge(wfm_transactions, wfm_products, on='product_id', how='inner')
df['year'] = df['transaction_date'].dt.year
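# Sort by total sales in descending order, matching ORDER BY sales DESC in the SQL version
df.query('year == 2017').groupby('product_category', as_index=False).agg(cnt=('transaction_id', 'nunique'), sum=('sales', 'sum')).sort_values('sum', ascending=False)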
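
To see how the inner join, the year filter and the distinct transaction count interact, the same pipeline can be traced on a tiny example. This is a sketch under assumptions: both sample tables are made up, and the output column names `transactions` and `total_sales` are chosen here for readability.

```python
import pandas as pd

# Hypothetical tables; invented for illustration only.
transactions = pd.DataFrame({
    'transaction_id': [1, 1, 2, 3, 4],
    'product_id': [10, 11, 10, 12, 11],
    'sales': [5.0, 7.0, 5.0, 20.0, 7.0],
    'transaction_date': pd.to_datetime(
        ['2017-01-05', '2017-01-05', '2017-06-10', '2017-07-01', '2018-02-01']),
})
products = pd.DataFrame({
    'product_id': [10, 11, 12, 13],
    'product_category': ['dairy', 'dairy', 'meat', 'bakery'],
})

# The inner join drops categories with no matching transactions (bakery here).
merged = transactions.merge(products, on='product_id', how='inner')

summary = (
    merged[merged['transaction_date'].dt.year == 2017]
    .groupby('product_category', as_index=False)
    .agg(transactions=('transaction_id', 'nunique'), total_sales=('sales', 'sum'))
    .sort_values('total_sales', ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

Transaction 1 contains two dairy products, so it contributes two rows of sales but only one distinct transaction, which is why `nunique` is used rather than `count`.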