#### ID 2140

```American Express is reviewing their customers' transactions, and you have been tasked with locating the customer who has the third highest total transaction amount. The output should include the customer's id, as well as their first name and last name. For ranking the customers, use type of ranking with no gaps between subsequent ranks.```

In [None]:
%%sql
WITH cte AS (SELECT cust_id,
                    SUM(total_order_cost)                                   AS total_cost,
                    DENSE_RANK() OVER (ORDER BY SUM(total_order_cost) DESC) AS rnk
             FROM card_orders
             GROUP BY cust_id)
SELECT cust_id, first_name, last_name
FROM cte
         JOIN customers AS c ON cte.cust_id = c.id
WHERE rnk = 3

In [None]:
df = card_orders

grouped_df = df.groupby('cust_id', as_index=False).agg(total_cost=('total_order_cost', 'sum'))

# Rank from the highest total downwards; 'dense' leaves no gaps after ties
grouped_df['rnk'] = grouped_df['total_cost'].rank(method='dense', ascending=False)

grouped_df.query('rnk == 3').merge(customers, how='inner', left_on='cust_id', right_on='id')[
    ['id', 'first_name', 'last_name']]
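
To make the "no gaps between subsequent ranks" requirement concrete, here is a minimal sketch of dense ranking in plain Python. The helper name `dense_rank_desc` and the totals are made up for illustration; they are not part of the dataset or of pandas.

```python
def dense_rank_desc(values):
    """Return the dense rank (1 = largest) of each value, with no gaps after ties."""
    # Number the distinct values from largest to smallest, then look each value up.
    order = {v: i + 1 for i, v in enumerate(sorted(set(values), reverse=True))}
    return [order[v] for v in values]


totals = [500, 300, 500, 200, 300]  # hypothetical customer totals
ranks = dense_rank_desc(totals)
print(ranks)
```

Two customers tie for the top total here, yet the next distinct total still gets rank 2, which is why filtering on rank 3 returns the third distinct total.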

#### ID 2141

```Amazon's information technology department is looking for information on employees' most recent logins. The output should include all information related to each employee's most recent login.```

In [None]:
%%sql
WITH cte
         AS (SELECT DENSE_RANK()
                    OVER (PARTITION BY worker_id ORDER BY login_timestamp DESC),
                    id,
                    worker_id,
                    login_timestamp,
                    ip_address,
                    country,
                    region,
                    city,
                    device_type
             FROM worker_logins)
SELECT id,
       worker_id,
       login_timestamp,
       ip_address,
       country,
       region,
       city,
       device_type
FROM cte
WHERE dense_rank = 1
ORDER BY id

In [None]:
df = worker_logins

# 'dense' mirrors DENSE_RANK in the SQL version: tied latest logins are all kept
df['rnk'] = df.groupby('worker_id')['login_timestamp'].rank(method='dense', ascending=False)

df.query('rnk == 1').drop(columns=['rnk'])

#### ID 2142

```You've been asked by Amazon to find the shipment_id and weight of the third heaviest shipment. Output the shipment_id, and total_weight for that shipment_id. In the event of a tie, do not skip ranks.```

In [None]:
%%sql
WITH cte AS (SELECT shipment_id,
                    SUM(weight) AS total_weight,
                    DENSE_RANK() OVER (ORDER BY SUM(weight) DESC)
             FROM amazon_shipment
             GROUP BY shipment_id)
SELECT shipment_id, total_weight
FROM cte
WHERE dense_rank = 3

In [None]:
df = amazon_shipment

grouped_df = df.groupby('shipment_id', as_index=False).agg(total_weight=('weight', 'sum'))

grouped_df['rnk'] = grouped_df['total_weight'].rank(method='dense', ascending=False)

grouped_df.query('rnk == 3').drop(columns=['rnk'])

#### ID 2143

```Bank of Ireland has requested that you detect invalid transactions in December 2022. An invalid transaction is one that occurs outside of the bank's normal business hours. The following are the hours of operation for all branches: Monday - Friday 09:00 - 16:00 Saturday & Sunday Closed Irish Public Holidays 25th and 26th December Determine the transaction ids of all invalid transactions.```

In [None]:
%%sql
SELECT transaction_id
FROM boi_transactions
WHERE transaction_id IN (SELECT transaction_id
                             FROM boi_transactions
                             WHERE EXTRACT(MONTH FROM time_stamp) = 12
                               AND EXTRACT(DAY FROM time_stamp) IN (25, 26)
                             UNION
                             SELECT transaction_id
                             FROM boi_transactions
                             WHERE EXTRACT(DOW FROM time_stamp) IN (6, 0)
                             UNION
                             SELECT transaction_id
                             FROM boi_transactions
                             WHERE EXTRACT(DOW FROM time_stamp) BETWEEN 1 AND 5
                               AND EXTRACT(HOUR FROM time_stamp) NOT BETWEEN 9 AND 15)

In [None]:
df = boi_transactions

holidays = df.query('time_stamp.dt.month == 12 & time_stamp.dt.day in ([25, 26])')['transaction_id'].to_list()

weekends = df.query('time_stamp.dt.dayofweek in ([5, 6])')['transaction_id'].to_list()

non_work_hours = df.query('time_stamp.dt.dayofweek.between(0, 4) & ~time_stamp.dt.hour.between(9, 15)')[
    'transaction_id'].to_list()

combined_list = holidays + weekends + non_work_hours

df.query('transaction_id.isin(@combined_list)')['transaction_id']

#### ID 2144

```A major airline has enlisted Tata Consultancy's help to improve customer satisfaction on its flights. Their goal is to increase customer satisfaction among people between the ages of 30 and 40. You've been tasked with calculating the customer satisfaction average for this age group across all three flight classes for 2022. Return the class with the average of satisfaction rounded to the nearest whole number. Note: Only survey results from flights in 2022 are included in the dataset.```

In [None]:
%%sql
SELECT class, ROUND(AVG(satisfaction)) AS pc_score
FROM survey_results AS sr
         JOIN loyalty_customers AS lc ON sr.cust_id = lc.cust_id
WHERE age BETWEEN 30 AND 39
GROUP BY class

In [None]:
df = pd.merge(survey_results, loyalty_customers, how='inner', on='cust_id')

df.query('age.between(30, 39)').groupby('class', as_index=False).agg(avg_score=('satisfaction', 'mean')).round()

#### ID 2145

```Tiktok want to find out what were the top two most active user days during an advertising campaign they ran in the first week of August 2022 (between the 1st to the 7th). Identify the two days with the highest user activity during the advertising campaign. They've also specified that user activity must be measured in terms of unique users. Output the day, date, and number of users. Be careful that some function can add a padding (whitespaces) around the string, for a solution to be correct you should trim the extra padding.```

In [None]:
%%sql
WITH cte AS (SELECT TRIM(TO_CHAR(date_visited, 'Day'))                       AS day_of_week,
                    DATE_TRUNC('day', date_visited)::DATE                    AS date_visited,
                    COUNT(DISTINCT user_id)                                  AS n_users,
                    DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT user_id) DESC) AS rnk
             FROM user_streaks
             WHERE DATE_TRUNC('day', date_visited) BETWEEN '2022-08-01' AND '2022-08-07'
             GROUP BY 1, 2)
SELECT day_of_week, date_visited, n_users
FROM cte
WHERE rnk <= 2

In [None]:
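# Half-open upper bound keeps every timestamp on the 7th; copy() avoids SettingWithCopyWarning
df = user_streaks.query('date_visited >= "2022-08-01" and date_visited < "2022-08-08"').copy()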

df['date'] = df['date_visited'].dt.date
df['day_of_week'] = df['date_visited'].dt.strftime('%A')

df.groupby(['day_of_week', 'date'], as_index=False).agg(n_users=('user_id', 'nunique')).nlargest(2, 'n_users',
                                                                                                 keep='all')

#### ID 2148

```You have been asked to calculate the rolling average for book sales in 2022. A rolling average continuously updates a data set's average to include all data in the set up to that point. For example, the rolling average for February would be calculated by adding the book sales from January and February and dividing by two; the rolling average for March would be calculated by adding the book sales from January, February, and March and dividing by three; and so on. Output the month, the sales for that month, and an extra column containing the rolling average rounded to the nearest whole number.```

In [None]:
%%sql
SELECT EXTRACT(MONTH FROM order_date)                                                                         AS order_month,
       SUM(quantity * unit_price)                                                                             AS sales,
       ROUND(AVG(SUM(quantity * unit_price))
             OVER (ORDER BY EXTRACT(MONTH FROM order_date) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS rolling_average
FROM book_orders AS bo
         JOIN amazon_books AS ab ON bo.book_id = ab.book_id
WHERE DATE_TRUNC('year', order_date) = '2022-01-01'
GROUP BY order_month

In [None]:
df = pd.merge(book_orders, amazon_books, how='inner', on='book_id')

df['order_month'] = df['order_date'].dt.month

df['sales'] = df['quantity'] * df['unit_price']

result_df = df.query('order_date.dt.year == 2022').groupby('order_month', as_index=False).agg(
    monthly_sales=('sales', 'sum'))

result_df['rolling_average'] = result_df['monthly_sales'].expanding().mean().round()

result_df
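
The rolling average described in the problem is a cumulative (expanding) mean. As a rough sketch of what `expanding().mean()` computes, the same numbers can be produced in plain Python. The helper `expanding_mean` and the sales figures are hypothetical.

```python
from itertools import accumulate


def expanding_mean(values):
    """Average of all values up to and including each position."""
    running_totals = accumulate(values)  # running sum after each month
    return [total / n for n, total in enumerate(running_totals, start=1)]


monthly_sales = [100, 200, 600]  # made-up sales for January to March
averages = expanding_mean(monthly_sales)
print(averages)
```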

#### ID 2149

```Following a recent advertising campaign, you have been asked to compare the sales of consumable products across all brands. Compare the brands by finding the percentage of unique customers (among all customers in the dataset) who purchased consumable products from each brand. Your output should contain the brand_name and percentage_of_customers rounded to the nearest whole number and ordered in descending order.```

In [None]:
%%sql
SELECT brand_name,
       ROUND(COUNT(DISTINCT customer_id) FILTER (WHERE product_family = 'CONSUMABLE') *
             100.0 / (SELECT COUNT(DISTINCT customer_id) FROM online_orders)) AS pc_cust
FROM online_orders oo
         JOIN online_products op USING (product_id)
GROUP BY brand_name
HAVING COUNT(DISTINCT customer_id) FILTER (WHERE product_family = 'CONSUMABLE') > 0
ORDER BY pc_cust DESC

In [None]:
n_customers = online_orders['customer_id'].nunique()

df = pd.merge(online_orders, online_products, how='inner', on='product_id')

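result_df = df.query('product_family == "CONSUMABLE"').groupby('brand_name', as_index=False).agg(
    pc_cust=('customer_id', lambda x: x.nunique() * 100 / n_customers))

# Round to a whole percentage and order from highest to lowest, as the task requires
result_df['pc_cust'] = result_df['pc_cust'].round()

result_df.sort_values('pc_cust', ascending=False)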

#### ID 2151

```You have been asked to find the number of employees hired between the months of January and July in the year 2022 inclusive. Your output should contain the number of employees hired in this given time frame.```

In [None]:
%%sql
SELECT COUNT(id) AS hired_emp
FROM employees
WHERE DATE_TRUNC('month', joining_date) BETWEEN '2022-01-01' AND '2022-07-01'

In [None]:
df = employees

df[df['joining_date'].dt.to_period('M').between('2022-01-01', '2022-07-01')]['id'].count()

#### ID 2152

```It's time to find out who is the top employee. You've been tasked with finding the employee (or employees, in the case of a tie) who have received the most votes. A vote is recorded when a customer leaves their 10-digit phone number in the free text customer_response column of their sign up response (occurrence of any number sequence with exactly 10 digits is considered as a phone number). Output the top employee and the number of customer responses that left a number.```

In [None]:
%%sql
WITH cte AS (SELECT employee_id,
                    COUNT(employee_id)                                   AS cust_numbers,
                    DENSE_RANK() OVER (ORDER BY COUNT(employee_id) DESC) AS rnk
             FROM customer_responses
             WHERE customer_response ~ '(^|\D)\d{10}(\D|$)'
             GROUP BY employee_id)
SELECT employee_id, cust_numbers
FROM cte
WHERE rnk = 1

In [None]:
df = customer_responses

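# Lookarounds reject runs longer than 10 digits; the raw string avoids invalid-escape warnings
has_number = df['customer_response'].str.contains(r'(?<!\d)\d{10}(?!\d)', regex=True, na=False)

df[has_number].groupby('employee_id', as_index=False).agg(
    cust_numbers=('employee_id', 'count')).nlargest(1, 'cust_numbers', keep='all')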
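
The problem counts only sequences of exactly 10 digits, while a bare `\d{10}` also matches inside longer runs of digits. A small sketch with Python's `re` module shows one way to exclude those; the sample responses and the helper `has_phone_number` are invented for illustration.

```python
import re

# Exactly 10 digits: not preceded and not followed by another digit.
TEN_DIGITS = re.compile(r"(?<!\d)\d{10}(?!\d)")


def has_phone_number(text):
    """True if the text contains a run of exactly 10 digits."""
    return TEN_DIGITS.search(text) is not None


samples = [
    "call me on 0871234567",      # 10 digits
    "order 12345678901 shipped",  # 11 digits, should not count
    "no number here",
]
flags = [has_phone_number(s) for s in samples]
print(flags)
```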

#### ID 2154

```The company you are working for wants to anticipate their staffing needs by identifying their top two busiest times of the week. To find this, each day should be segmented into different parts using the following criteria: Morning: Before 12 p.m. (not inclusive) Early afternoon: 12 -15 p.m. Late afternoon: after 15 p.m. (not inclusive) Your output should include the day and time of day combination for the two busiest times, i.e. the combinations with the most orders, along with the number of orders (e.g. top two results could be Friday Late afternoon with 12 orders and Sunday Morning with 10 orders). The company has also requested that the day be displayed in text format (i.e. Monday). Note: In the event of a tie in ranking, all results should be displayed.```

In [None]:
%%sql
WITH cte AS (SELECT TO_CHAR(timestamp, 'Day')                         AS day_of_week
                  , CASE
                        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 0 AND 11 THEN 'Morning'
                        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 12 AND 15
                            THEN 'Early afternoon'
                        ELSE 'Late afternoon' END                     AS time_of_day
                  , COUNT(order_id)                                   AS total_orders
                  , DENSE_RANK() OVER (ORDER BY COUNT(order_id) DESC) AS rnk
             FROM sales_log
             GROUP BY day_of_week, time_of_day)
SELECT day_of_week
     , time_of_day
     , total_orders
FROM cte
WHERE rnk <= 2

In [None]:
df = sales_log

# The column holds day names (e.g. Monday), so name it accordingly
df['day_of_week'] = df['timestamp'].dt.strftime('%A')

df['time_of_day'] = np.select([df['timestamp'].dt.hour.between(0, 11), df['timestamp'].dt.hour.between(12, 15)],
                              ['Morning', 'Early afternoon'], 'Late afternoon')

df.groupby(['day_of_week', 'time_of_day'], as_index=False).agg(total_orders=('order_id', 'count')).nlargest(
    2, 'total_orders', keep='all')

#### ID 2156

```You have been tasked with finding the worker IDs of individuals who logged in between the 13th to the 19th inclusive of December 2021. In your output, provide the unique worker IDs for the dates requested.```

In [None]:
%%sql
SELECT DISTINCT worker_id
FROM worker_logins
WHERE login_timestamp >= '2021-12-13'
  AND login_timestamp < '2021-12-20'

In [None]:
df = worker_logins

# Half-open range: includes all of the 19th, excludes midnight on the 20th
df.query('login_timestamp >= "2021-12-13" and login_timestamp < "2021-12-20"')['worker_id'].drop_duplicates()
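
When a column holds timestamps rather than plain dates, an inclusive range such as "13th to 19th" is usually safest written as a half-open interval that ends at midnight of the following day. The sketch below illustrates this with the standard `datetime` module; the function name `in_range` and the sample timestamps are assumptions made for this example.

```python
from datetime import datetime

start = datetime(2021, 12, 13)          # midnight at the start of the 13th
end_exclusive = datetime(2021, 12, 20)  # midnight at the start of the 20th


def in_range(ts):
    """True for any timestamp from the 13th through the end of the 19th."""
    return start <= ts < end_exclusive


late_on_19th = datetime(2021, 12, 19, 23, 59)
midnight_on_20th = datetime(2021, 12, 20)
print(in_range(late_on_19th), in_range(midnight_on_20th))
```

An upper bound of `"2021-12-19"` would drop every login after midnight on the 19th, while a closed upper bound of `'2021-12-20'` would let in a login at exactly midnight on the 20th.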

#### ID 2157

```You have been asked to compare sales of the current month, May, to those of the previous month, April. The company requested that you only display products whose sales (UNITS SOLD * PRICE) have increased by more than 10% from the previous month to the current month. Your output should include the product id and the percentage growth in sales.```

In [None]:
%%sql
WITH cte AS (SELECT product_id,
                    DATE_TRUNC('month', date)                                         AS period,
                    SUM(units_sold * cost_in_dollars)                                 AS sales,
                    LAG(SUM(units_sold * cost_in_dollars))
                    OVER (PARTITION BY product_id ORDER BY DATE_TRUNC('month', date)) AS prev_sales
             FROM online_orders
             WHERE DATE_TRUNC('month', date) IN ('2022-04-01', '2022-05-01')
             GROUP BY product_id, period)
SELECT product_id, (sales - prev_sales) * 100.0 / prev_sales AS pc_growth
FROM cte
WHERE prev_sales IS NOT NULL
  AND (sales - prev_sales) * 100.0 / prev_sales > 10

In [None]:
df = online_orders

df['month'] = df['date'].dt.to_period('M')

df['sales'] = df['units_sold'] * df['cost_in_dollars']

grouped_df = df[(df['date'].dt.to_period('M') == '2022-04') | (df['date'].dt.to_period('M') == '2022-05')].groupby(
    ['product_id', 'month'], as_index=False).agg(total_sales=('sales', 'sum'))

# Rows are sorted by product and month, so shift(1) gives April's sales on the May row
grouped_df['prev_sales'] = grouped_df.groupby('product_id')['total_sales'].shift(1)

grouped_df['pc_growth'] = (grouped_df['total_sales'] - grouped_df['prev_sales']) * 100 / grouped_df['prev_sales']

grouped_df[(~grouped_df['prev_sales'].isnull()) & (grouped_df['pc_growth'] > 10)][['product_id', 'pc_growth']]

#### ID 2159

```You have been asked to get a list of all the sign up IDs with transaction start dates in either April or May. Since a sign up ID can be used for multiple transactions only output the unique ID. Your output should contain a list of non duplicated sign-up IDs.```

In [None]:
%%sql
SELECT DISTINCT signup_id
FROM transactions
WHERE DATE_TRUNC('month', transaction_start_date)::DATE IN ('2020-04-01', '2020-05-01')

In [None]:
df = transactions

df.query('transaction_start_date >= "2020-04-01" & transaction_start_date < "2020-06-01"')['signup_id'].unique()

#### ID 2160

```The sales division is investigating their sales for the past month in Oregon. Calculate the total revenue generated from Oregon-based customers for April.```

In [None]:
%%sql
SELECT SUM(cost_in_dollars * units_sold) AS total_sales
FROM online_orders AS oo
         JOIN online_customers AS oc ON oo.customer_id = oc.id
WHERE state ILIKE 'oregon'
  AND DATE_TRUNC('month', date) = '2022-04-01'

In [None]:
df = pd.merge(online_orders, online_customers, how='inner', left_on='customer_id', right_on='id')

df['sales'] = df['cost_in_dollars'] * df['units_sold']

# Half-open range so that any timestamp on April 30th is still counted
df.query('state == "Oregon" & date >= "2022-04-01" & date < "2022-05-01"')['sales'].sum()