#### ID 2060

```Given a list of a company's employees, find the name of the manager from the largest department. A manager is any employee whose position contains the word "manager". Output their first and last name.```

In [None]:
%%sql
WITH department_size AS (SELECT first_name,
                                last_name,
                                position,
                                COUNT(id) OVER (PARTITION BY department_id) AS department_size
                         FROM az_employees),
     ranked AS (SELECT first_name,
                       last_name,
                       position,
                       RANK() OVER (ORDER BY department_size DESC) AS rnk
                FROM department_size)
SELECT first_name, last_name
FROM ranked
WHERE rnk = 1
  AND position ILIKE '%manager%'

In [None]:
df = az_employees
df['department_size'] = df.groupby('department_id')['id'].transform('count')
df['rnk'] = df['department_size'].rank(method='dense', ascending=False)

df.query('rnk == 1 & position.str.contains("manager", case=False)')[['first_name', 'last_name']]
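
To see how `transform('count')` spreads each department's head count back onto its rows, here is a small sketch on made-up data. The `emp` frame and its values are hypothetical and only stand in for `az_employees`.

```python
import pandas as pd

# Hypothetical stand-in for az_employees
emp = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'first_name': ['Ann', 'Bob', 'Cid', 'Dee', 'Eve'],
    'department_id': [10, 10, 10, 20, 20],
    'position': ['Senior Manager', 'Engineer', 'Analyst', 'manager', 'Engineer'],
})

# transform('count') returns one value per row: the size of that row's department
emp['department_size'] = emp.groupby('department_id')['id'].transform('count')

# Keep rows from the largest department(s), then match "manager" case-insensitively
largest = emp[emp['department_size'] == emp['department_size'].max()]
managers = largest[largest['position'].str.contains('manager', case=False)]

print(emp['department_size'].tolist())
print(managers['first_name'].tolist())
```

Dee is also a manager but belongs to the smaller department, so only Ann is returned.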

#### ID 2061

```Count the number of users who made more than 5 searches in August 2021.```

In [None]:
%%sql
SELECT COUNT(user_id)
FROM (SELECT user_id, COUNT(search_id) AS cnt
      FROM fb_searches
      WHERE EXTRACT(MONTH FROM date) = 8
        AND EXTRACT(YEAR FROM date) = 2021
      GROUP BY user_id
      HAVING COUNT(search_id) > 5) AS sq

In [None]:
df = fb_searches
df.query('date.dt.year == 2021 & date.dt.month == 8').groupby('user_id', as_index=False).agg(
    cnt=('search_id', 'count')).query('cnt > 5')['user_id'].count()

#### ID 2062

```How many searches were there in the second quarter of 2021?```

In [None]:
%%sql
SELECT COUNT(*) AS result
FROM fb_searches
WHERE EXTRACT(QUARTER FROM date) = 2
  AND EXTRACT(YEAR FROM date) = 2021

In [None]:
df = fb_searches
# Count rows, as COUNT(*) does in the SQL version (nunique would collapse repeated search_id values)
len(df.query('date.dt.year == 2021 & date.dt.quarter == 2'))

#### ID 2063

```You are given a list of exchange rates from various currencies to US Dollars (USD) in different months. Show how the exchange rate of all the currencies changed in the first half of 2020. Output the currency code and the difference between values of the exchange rate between July 1, 2020 and January 1, 2020.```

In [None]:
%%sql
SELECT source_currency,
       AVG(exchange_rate) FILTER ( WHERE date = '2020-07-01') -
       AVG(exchange_rate) FILTER ( WHERE date = '2020-01-01') AS difference
FROM sf_exchange_rate
GROUP BY source_currency

In [None]:
df = sf_exchange_rate
jan_rate_df = df.query('date == "2020-01-01"')
jul_rate_df = df.query('date == "2020-07-01"')

df = pd.merge(jul_rate_df, jan_rate_df, how='inner', on='source_currency', suffixes=['_jul', '_jan'])

df['difference'] = df['exchange_rate_jul'] - df['exchange_rate_jan']

df[['source_currency', 'difference']]
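
The self-merge can be checked on a tiny frame. This is a sketch with invented currencies and rates (`AAA`, `BBB` and all values are assumptions chosen so the arithmetic is easy to verify), not data from `sf_exchange_rate`.

```python
import pandas as pd

# Hypothetical rates, chosen only to make the arithmetic easy to check
rates = pd.DataFrame({
    'source_currency': ['AAA', 'AAA', 'BBB', 'BBB'],
    'date': pd.to_datetime(['2020-01-01', '2020-07-01', '2020-01-01', '2020-07-01']),
    'exchange_rate': [1.5, 1.25, 1.0, 1.5],
})

jan = rates[rates['date'] == pd.Timestamp('2020-01-01')]
jul = rates[rates['date'] == pd.Timestamp('2020-07-01')]

# Overlapping column names receive the suffixes, so both rates survive the merge
merged = pd.merge(jul, jan, how='inner', on='source_currency', suffixes=('_jul', '_jan'))
merged['difference'] = merged['exchange_rate_jul'] - merged['exchange_rate_jan']

print(merged[['source_currency', 'difference']])
```

Because the merge is an inner join, a currency missing on either date drops out of the result, whereas the SQL version would keep it with a NULL difference.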

#### ID 2064

```In a marathon, gun time is counted from the moment of the formal start of the race while net time is counted from the moment a runner crosses a starting line. Both variables are in seconds. You are asked to check if the interval between the two times is different for male and female runners. First, calculate the average absolute difference between the gun time and net time. Group the results by available genders (male and female). Output the absolute difference between those two values.```

In [None]:
%%sql
SELECT ABS((SELECT AVG(ABS(net_time - gun_time)) AS avg_abs_gun_and_net_times
            FROM marathon_male) -
           (SELECT AVG(ABS(net_time - gun_time)) AS avg_abs_gun_and_net_times
            FROM marathon_female)) AS difference

In [None]:
abs((marathon_male['net_time'] - marathon_male['gun_time']).abs().mean() - (
            marathon_female['net_time'] - marathon_female['gun_time']).abs().mean())

#### ID 2065

```In a marathon, gun time is counted from the moment of the formal start of the race while net time is counted from the moment a runner crosses a starting line. Both variables are in seconds. How much net time separates Chris Doe from the 10th best net time (in ascending order)? Avoid gaps in the ranking calculation. Output absolute net time difference.```

In [None]:
%%sql
WITH chris_doe_net_time AS (SELECT net_time
                            FROM marathon_male
                            WHERE person_name = 'Chris Doe'),
     top_10_net_times AS (SELECT net_time, DENSE_RANK() OVER (ORDER BY net_time) AS rnk
                          FROM marathon_male)
SELECT ABS((SELECT net_time FROM chris_doe_net_time) - AVG(net_time)) AS difference
FROM top_10_net_times
WHERE rnk = 10

In [None]:
df = marathon_male
df['rnk'] = df['net_time'].rank(method='dense', ascending=True)

# The task asks for the absolute difference, so take abs() before returning the values
(df.query('person_name == "Chris Doe"')['net_time'] - df.query('rnk == 10')['net_time'].mean()).abs().values
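
"Avoid gaps in the ranking" is what `method='dense'` provides. A minimal sketch with made-up net times (the values are hypothetical) contrasts it with `method='min'`, which behaves like SQL `RANK()`:

```python
import pandas as pd

# Hypothetical net times in seconds; two runners tie at 3000
times = pd.Series([2900, 3000, 3000, 3100], name='net_time')

dense = times.rank(method='dense', ascending=True)  # like DENSE_RANK(): no gap after a tie
gapped = times.rank(method='min', ascending=True)   # like RANK(): the rank after a tie is skipped

print(dense.tolist())
print(gapped.tolist())
```

With dense ranking the runner at 3100 is third; with gapped ranking that runner is fourth and rank 3 does not exist.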

#### ID 2066

```Find the hometowns with the top 3 average net times. Output the hometowns and their average net time. In case there are ties in net time, return all unique hometowns.```

In [None]:
%%sql
WITH ranked_avg_net_by_hometown AS (SELECT hometown,
                                           AVG(net_time)                                  AS avg_net_time,
                                           DENSE_RANK() OVER (ORDER BY AVG(net_time) ASC) AS rnk
                                    FROM marathon_male
                                    GROUP BY hometown)
SELECT hometown, avg_net_time
FROM ranked_avg_net_by_hometown
WHERE rnk <= 3

In [None]:
df = marathon_male

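avg_net_time_by_hometown = df.groupby('hometown', as_index=False).agg(avg_net_time=('net_time', 'mean'))
# Dense rank mirrors DENSE_RANK() in the SQL version, so a tie never pushes a hometown out of the top 3
avg_net_time_by_hometown['rnk'] = avg_net_time_by_hometown['avg_net_time'].rank(method='dense', ascending=True)

avg_net_time_by_hometown.query('rnk <= 3')[['hometown', 'avg_net_time']]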
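
How ties are handled decides which hometowns qualify. The sketch below uses invented averages (the hometown labels and values are assumptions) to compare a dense-rank filter, which is what `DENSE_RANK()` does in the SQL query, with `nsmallest(3, keep='all')`, which only keeps ties at the last position.

```python
import pandas as pd

# Hypothetical per-hometown averages; A and B tie for the best time
avg = pd.DataFrame({
    'hometown': ['A', 'B', 'C', 'D', 'E'],
    'avg_net_time': [3000.0, 3000.0, 3100.0, 3200.0, 3300.0],
})

# Dense ranks are 1, 1, 2, 3, 4, so four hometowns have rank <= 3
by_dense_rank = avg[avg['avg_net_time'].rank(method='dense') <= 3]

# nsmallest takes three rows and adds extras only if they tie with the third one
by_nsmallest = avg.nsmallest(3, 'avg_net_time', keep='all')

print(by_dense_rank['hometown'].tolist())
print(by_nsmallest['hometown'].tolist())
```

The two approaches agree when no tie occurs before the third position, and differ otherwise, as hometown D shows here.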

#### ID 2067

```What percentage of all products are both low fat and recyclable?```

In [None]:
%%sql
SELECT COUNT(product_id) FILTER (WHERE is_low_fat = 'Y' AND is_recyclable = 'Y') * 100.0 /
       COUNT(product_id) AS percentage
FROM facebook_products;

In [None]:
df = facebook_products
cnt_products = df['product_id'].count()

filtered_cnt_products = df.query('is_low_fat == "Y" & is_recyclable == "Y"')['product_id'].count()

result = filtered_cnt_products * 100 / cnt_products

#### ID 2068

```The sales department wants to find lower priced products that sell well. Find product IDs that were sold at least twice (in two different purchases at least)  and have an average sales price of at least $3. Your output should contain the product ID and its corresponding brand.```

In [None]:
%%sql
WITH filtered_product_by_avg_price AS (SELECT product_id
                                       FROM online_orders
                                       GROUP BY product_id
                                       HAVING AVG(cost_in_dollars) >= 3),
     filtered_product_by_count AS (SELECT product_id
                                   FROM online_orders
                                   GROUP BY product_id
                                   HAVING COUNT(product_id) >= 2)
SELECT DISTINCT op.product_id, brand_name
FROM online_orders AS oo
         JOIN online_products AS op ON oo.product_id = op.product_id
WHERE op.product_id IN (SELECT product_id FROM filtered_product_by_count)
  AND op.product_id IN (SELECT product_id FROM filtered_product_by_avg_price)

In [None]:
df = online_orders
lists_df = df.groupby('product_id', as_index=False).agg(avg_price=('cost_in_dollars', 'mean'),
                                                        cnt_product=('product_id', 'count'))
filtered_product_by_avg_price = lists_df.query('avg_price >= 3')['product_id'].to_list()
filtered_product_by_count = lists_df.query('cnt_product >= 2')['product_id'].to_list()

df.query('product_id.isin(@filtered_product_by_avg_price) & product_id.isin(@filtered_product_by_count)').merge(
    online_products, how='inner', on='product_id')[['product_id', 'brand_name']].drop_duplicates()

#### ID 2069

```The marketing manager wants you to evaluate how well the previously run advertising campaigns are working. In particular, they are interested in the promotion IDs from the online_promotions table. Find the percentage of orders with promotion IDs from the online_promotions table applied.```

In [None]:
%%sql
SELECT SUM(CASE
               WHEN promotion_id IN (SELECT promotion_id
                                     FROM online_promotions) THEN 1
               ELSE 0 END) * 100.0 / COUNT(*) AS percentage
FROM online_orders

In [None]:
promotion_list = online_promotions['promotion_id'].to_list()
df = online_orders
cnt_all_promotions = len(df)  # all orders, like COUNT(*); .count() would skip rows with a missing promotion_id
cnt_filtered_promotions = df.query('promotion_id.isin(@promotion_list)')['promotion_id'].count()

result = cnt_filtered_promotions * 100 / cnt_all_promotions

#### ID 2070

```The marketing department wants to launch a new promotion for the most successful product classes. Find the top 3 product classes according to their number of sales. In the event of a tie, output all results.```

In [None]:
%%sql
SELECT product_class
FROM (SELECT product_class, RANK() OVER (ORDER BY total_sales DESC) AS rnk
      FROM (SELECT product_class, COUNT(units_sold) AS total_sales
            FROM online_orders
                     LEFT JOIN online_products USING (product_id)
            GROUP BY product_class) t1) t2
WHERE rnk <= 3

In [None]:
df = pd.merge(online_orders, online_products, how='left', on='product_id')

df.groupby('product_class').agg(total_sales=('units_sold', 'count')).reset_index().nlargest(
    3, 'total_sales', keep='all')[['product_class']]

#### ID 2071

```The marketing department is aiming its next promotion at customers who have purchased products from two particular brands: Fort West and Golden. You have been asked to prepare a list of customers who purchased products from both brands.```

In [None]:
%%sql
WITH all_customers AS (SELECT customer_id, brand_name
                       FROM online_orders AS oo
                                JOIN online_products AS op ON oo.product_id = op.product_id)
SELECT customer_id
FROM all_customers
WHERE brand_name IN ('Fort West')
INTERSECT
SELECT customer_id
FROM all_customers
WHERE brand_name IN ('Golden')

In [None]:
df = pd.merge(online_orders, online_products, how='inner', on='product_id')[['brand_name', 'customer_id']]

fw_df = df.query('brand_name == "Fort West"')
gd_df = df.query('brand_name == "Golden"')

result = pd.merge(fw_df, gd_df, how='inner', on='customer_id')['customer_id'].unique()
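
The SQL `INTERSECT` maps naturally onto Python sets. As an alternative sketch on made-up rows (the `purchases` frame is hypothetical and stands for the joined orders and products), the same customers can be found without a second merge:

```python
import pandas as pd

# Hypothetical joined orders/products rows
purchases = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 3, 3],
    'brand_name': ['Fort West', 'Golden', 'Fort West', 'Golden', 'Fort West', 'Fort West'],
})

# One set of customers per brand
fort_west = set(purchases.loc[purchases['brand_name'] == 'Fort West', 'customer_id'])
golden = set(purchases.loc[purchases['brand_name'] == 'Golden', 'customer_id'])

# Set intersection keeps only customers present in both sets
both = sorted(int(c) for c in fort_west & golden)
print(both)
```

Customer 2 bought only Fort West and is excluded. Repeated purchases by customer 3 do not produce duplicates, because sets hold each ID once.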

#### ID 2072

```For each platform (e.g. Windows, iPhone, iPad etc.), calculate the number of users. Consider unique users and not individual sessions. Output the name of the platform with the corresponding number of users.```

In [None]:
%%sql
SELECT platform, COUNT(DISTINCT user_id) AS n_users
FROM user_sessions
GROUP BY platform

In [None]:
df = user_sessions
df.groupby('platform', as_index=False).agg(n_users=('user_id', 'nunique'))

#### ID 2074

```Calculate the churn rate of September 2021 in percentages. The churn rate is the difference between the number of customers on the first day of the month and on the last day of the month, divided by the number of customers on the first day of a month. Assume that if customer's contract_end is NULL, their contract is still active. Additionally, if a customer started or finished their contract on a certain day, they should still be counted as a customer on that day.```

In [None]:
%%sql
WITH start_users AS (SELECT COUNT(DISTINCT user_id) AS cnt_start_period
                     FROM natera_subscriptions
                     WHERE contract_start <= '2021-09-01'
                       AND (contract_end >= '2021-09-01' OR contract_end IS NULL)),
     end_users AS (SELECT COUNT(DISTINCT user_id) AS cnt_end_period
                   FROM natera_subscriptions
                   WHERE contract_start <= '2021-09-30'
                     AND (contract_end >= '2021-09-30' OR contract_end IS NULL))
SELECT (cnt_start_period - cnt_end_period) * 100.0 / cnt_start_period AS churn_rate
FROM start_users,
     end_users

In [None]:
df = natera_subscriptions

users_in_start_period = df.query(
    'contract_start <= "2021-09-01" & (contract_end >= "2021-09-01" | contract_end.isnull())')['user_id'].nunique()

users_in_end_period = df.query(
    'contract_start <= "2021-09-30" & (contract_end >= "2021-09-30" | contract_end.isnull())')['user_id'].nunique()

monthly_churn_rate = (users_in_start_period - users_in_end_period) * 100 / users_in_start_period
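
The parentheses around the `contract_end` test matter, because "and" binds more tightly than "or" in both SQL and pandas. The following sketch uses four invented subscriptions (all values are assumptions) to show the effect: without the grouping, a customer who only starts in October is counted as active on September 30.

```python
import pandas as pd

# Hypothetical subscriptions; None means the contract is still open
subs = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'contract_start': pd.to_datetime(['2021-08-01', '2021-08-15', '2021-08-20', '2021-10-05']),
    'contract_end': pd.to_datetime(['2021-09-20', None, '2021-09-25', None]),
})


def active_on(day):
    """Number of distinct users whose contract covers the given day."""
    started = subs['contract_start'] <= day
    not_ended = (subs['contract_end'] >= day) | subs['contract_end'].isna()
    return subs.loc[started & not_ended, 'user_id'].nunique()


first_day = pd.Timestamp('2021-09-01')
last_day = pd.Timestamp('2021-09-30')

at_start = active_on(first_day)
at_end = active_on(last_day)
churn_rate = (at_start - at_end) * 100 / at_start

# Same conditions without the grouping: evaluated as (A & B) | C
ungrouped = (subs['contract_start'] <= last_day) & (subs['contract_end'] >= last_day) | subs['contract_end'].isna()
at_end_ungrouped = subs.loc[ungrouped, 'user_id'].nunique()

print(at_start, at_end, at_end_ungrouped, round(churn_rate, 2))
```

Here three users are active on the first day and one on the last, giving a churn rate of about 66.67%. The ungrouped condition counts two users at the end, because user 4's open-ended contract passes the `isna()` test even though it has not started yet.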

#### ID 2075

```Given the homework results of a group of students, calculate the average grade and the completion rate of each student. A homework is considered not completed if no grade has been assigned. Output first name of a student, their average grade, and completion rate in percentages. Note that it's possible for several students to have the same first name but their results should still be shown separately.```

In [None]:
%%sql
WITH cte AS (SELECT student_id,
                    AVG(grade)                  AS avg_grade,
                    COUNT(DISTINCT homework_id) FILTER (WHERE grade IS NOT NULL) * 100.0 /
                    COUNT(DISTINCT homework_id) AS completion_rate
             FROM allstate_homework ah
             GROUP BY student_id)
SELECT student_firstname, avg_grade, completion_rate
FROM cte
         LEFT JOIN allstate_students ast ON cte.student_id = ast.student_id

In [None]:
df = allstate_homework


def custom_func(x):
    return 100.0 * x.count() / x.shape[0]


df.groupby('student_id', as_index=False).agg(avg_grade=('grade', 'mean'), completion_rate=('grade', custom_func)).merge(
    allstate_students, how='left', on='student_id')[['student_firstname', 'avg_grade', 'completion_rate']]
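
The completion rate relies on `count()` skipping missing values while `shape[0]` counts every row. A minimal sketch for a single hypothetical student (the grades are made up):

```python
import pandas as pd

# Hypothetical grades for one student; NaN marks a homework with no grade assigned
grades = pd.Series([90.0, float('nan'), 70.0, float('nan')])

# count() ignores NaN, shape[0] is the total number of homeworks
completion_rate = 100.0 * grades.count() / grades.shape[0]

# mean() also ignores NaN, so ungraded homework does not lower the average
avg_grade = grades.mean()

print(completion_rate, avg_grade)
```

Two of four homeworks are graded, so the completion rate is 50% and the average is taken over the two graded ones only.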