#### ID 2040

```Summarize the number of customers and transactions for each month in 2017, keeping only transactions that were greater than or equal to $5.```

In [None]:
%%sql
WITH cte AS (SELECT transaction_id, SUM(sales)
             FROM wfm_transactions
             WHERE EXTRACT(YEAR FROM transaction_date) = 2017
             GROUP BY transaction_id
             HAVING SUM(sales) >= 5)
SELECT EXTRACT(MONTH FROM transaction_date) AS month,
       COUNT(DISTINCT customer_id)          AS customers,
       COUNT(DISTINCT transaction_id)       AS transactions
FROM cte
         JOIN wfm_transactions USING (transaction_id)
GROUP BY month
ORDER BY month

In [None]:
df = wfm_transactions
df['year'] = df['transaction_date'].dt.year
df['month'] = df['transaction_date'].dt.month
transactions_list = df.query('year == 2017').groupby('transaction_id', as_index=False).agg(total_sales=('sales', 'sum')).query('total_sales >= 5')['transaction_id'].to_list()
df.query('transaction_id.isin(@transactions_list)').groupby('month', as_index=False).agg(customers=('customer_id', 'nunique'), transactions=('transaction_id', 'nunique'))

#### ID 2041

```You work for a multinational company that wants to calculate total sales across all the countries it does business in. You have 2 tables: one is a record of sales for all countries and currencies the company deals with, and the other holds currency exchange rate information. Calculate the total sales, per quarter, for the first 2 quarters of 2020, and report the sales in USD.```

In [None]:
%%sql
SELECT EXTRACT(QUARTER FROM date)        AS quarter,
       SUM(sales_amount * exchange_rate) AS total_sales
FROM sf_exchange_rate er
         JOIN sf_sales_amount sa
              ON er.date = sa.sales_date AND sa.currency = er.source_currency
WHERE EXTRACT(YEAR FROM date) = 2020
  AND EXTRACT(QUARTER FROM date) IN (1, 2)
GROUP BY quarter

In [None]:
df = pd.merge(sf_sales_amount, sf_exchange_rate, how='inner', right_on=['date', 'source_currency'], left_on=['sales_date', 'currency'])

df['amount_sales'] = df['sales_amount'] * df['exchange_rate']
df['year'] = df['sales_date'].dt.year
df['quarter'] = df['sales_date'].dt.quarter

df.query('year == 2020 & quarter.isin([1, 2])').groupby('quarter', as_index=False).agg(total_sales=('amount_sales', 'sum'))

#### ID 2042

```Find employees who have worked for Uber for more than 2 years (730 days) and check to see if they're still part of the company. Output 'Yes' if they are and 'No' if they are not. Use May 1, 2021 as your date of reference when calculating whether they have worked for more than 2 years since their hire date. Output the first name, last name, whether or not the employee is still working for Uber, and the number of years at the company.```

In [None]:
%%sql
SELECT first_name,
       last_name,
       CASE
           WHEN termination_date IS NOT NULL THEN
               DATE_PART('day', termination_date::TIMESTAMP - hire_date::TIMESTAMP) *
               1.0 / 365
           ELSE
               DATE_PART('day', '2021-05-01'::TIMESTAMP - hire_date::TIMESTAMP) * 1.0 /
               365
           END           AS years_spent,
       CASE
           WHEN termination_date IS NULL THEN 'Yes'
           ELSE 'No' END AS still_employed
FROM uber_employees
WHERE (COALESCE(termination_date, '2021-05-01') - hire_date) > 730

In [None]:
# TODO
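A possible pandas counterpart to the SQL above, as a minimal sketch. It assumes `hire_date` and `termination_date` are datetime columns with missing termination dates stored as `NaT`; the rows below are hypothetical stand-ins for the `uber_employees` table.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real uber_employees table.
uber_employees = pd.DataFrame({
    'first_name': ['Ann', 'Bob', 'Cid'],
    'last_name': ['Lee', 'Ray', 'Fox'],
    'hire_date': pd.to_datetime(['2018-01-01', '2020-06-01', '2017-03-15']),
    'termination_date': pd.to_datetime(['2020-12-31', None, None]),
})

reference_date = pd.Timestamp('2021-05-01')

df = uber_employees.copy()
# Employees still at the company are measured up to the reference date.
end_date = df['termination_date'].fillna(reference_date)
days_worked = (end_date - df['hire_date']).dt.days
df['years_spent'] = days_worked / 365
df['still_employed'] = df['termination_date'].isna().map({True: 'Yes', False: 'No'})

# Keep only employees with more than 730 days at the company.
result = df.loc[days_worked > 730,
                ['first_name', 'last_name', 'years_spent', 'still_employed']]
result
```
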

#### ID 2043

```Return all employees who have never had an annual review. Your output should include the employee's first name, last name, hiring date, and termination date. List the most recently hired employees first.```

In [None]:
%%sql
SELECT first_name,
       last_name,
       hire_date,
       termination_date
FROM uber_employees
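-- Exclude NULL emp_id values: NOT IN returns no rows if the subquery yields a NULL
WHERE id NOT IN (SELECT emp_id FROM uber_annual_review WHERE emp_id IS NOT NULL)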
ORDER BY hire_date DESC

In [None]:
employees_list = uber_annual_review['emp_id'].to_list()
df = uber_employees
df.query('~id.isin(@employees_list)')[['first_name', 'last_name', 'hire_date', 'termination_date']].sort_values('hire_date', ascending=False)

#### ID 2045

```Write a query to calculate the longest period (in days) that the company has gone without hiring anyone. Also, calculate the longest period without firing anyone. Limit yourself to dates inside the table (the last hiring/termination date should be the latest hiring/termination date from the table); don't go into the future.```

In [None]:
%%sql
WITH prev_hire_termination AS (SELECT hire_date,
                    LAG(hire_date, 1) OVER (ORDER BY hire_date)               AS prev_hire_date,
                    termination_date,
                    LAG(termination_date, 1)
                    OVER (ORDER BY termination_date)                          AS prev_termination_date
             FROM uber_employees)
SELECT MAX(hire_date - prev_hire_date)               AS max_hire,
       MAX(termination_date - prev_termination_date) AS max_fire
FROM prev_hire_termination;

In [None]:
df = uber_employees
df['prev_hire_date'] = df.sort_values('hire_date')['hire_date'].shift(1)
df['prev_termination_date'] = df.sort_values('termination_date')['termination_date'].shift(1)
# Subtracting datetime columns yields Timedeltas; .dt.days converts them to whole days
df['diff_hire_date'] = (df['hire_date'] - df['prev_hire_date']).dt.days
df['diff_termination_date'] = (df['termination_date'] - df['prev_termination_date']).dt.days
df[['diff_hire_date', 'diff_termination_date']].agg(['max'])
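The longest-gap logic can be sanity-checked on a small series. This is a minimal sketch: `longest_gap_days` is a hypothetical helper, not part of the notebook, and it assumes the input is a datetime Series that may contain `NaT`.

```python
import pandas as pd

def longest_gap_days(dates: pd.Series) -> int:
    """Return the largest gap, in days, between consecutive dates in the series."""
    # Drop missing dates, sort, then take differences between neighbours.
    gaps = dates.dropna().sort_values().diff().dt.days
    return int(gaps.max())

# Hypothetical sample dates, unsorted and with one missing value.
sample = pd.Series(pd.to_datetime(['2020-01-10', '2020-01-01', None, '2020-03-01']))
longest = longest_gap_days(sample)
```

Here the gaps are 9 days (Jan 1 to Jan 10) and 51 days (Jan 10 to Mar 1), so `longest` is 51.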

#### ID 2048

```For each service, calculate the percentage of incomplete orders along with the revenue loss percentage. Your output should include the name of the service, percentage of incomplete orders, and revenue loss from the incomplete orders.```

In [None]:
%%sql
SELECT service_name,
       SUM(number_of_orders) FILTER (WHERE status_of_order != 'Completed') * 1.0 /
       SUM(number_of_orders) * 100 AS orders_loss_percent,
       SUM(monetary_value) FILTER (WHERE status_of_order != 'Completed') * 1.0 /
       SUM(monetary_value) * 100   AS profit_loss_percent
FROM uber_orders
GROUP BY service_name

In [None]:
df = uber_orders
df_all = df.groupby('service_name', as_index=False).agg(sum_monetary_value_all = ('monetary_value', 'sum'), sum_number_of_orders_all = ('number_of_orders', 'sum'))
df_non_completed = df.query('status_of_order != "Completed"').groupby('service_name', as_index=False).agg(sum_monetary_value = ('monetary_value', 'sum'), sum_number_of_orders = ('number_of_orders', 'sum'))
# Left merge keeps services that have no incomplete orders, as the SQL version does
result = df_all.merge(df_non_completed, on='service_name', how='left')
result['orders_loss_percent'] = result['sum_number_of_orders'] / result['sum_number_of_orders_all'] * 100
result['profit_loss_percent'] = result['sum_monetary_value'] / result['sum_monetary_value_all'] * 100
result[['service_name', 'orders_loss_percent', 'profit_loss_percent']]

#### ID 2049

```Uber is interested in identifying gaps in their business. Calculate the count of orders for each status of each service. Your output should include the service name, status of the order, and the number of orders.```

In [None]:
%%sql
SELECT service_name, status_of_order, SUM(number_of_orders) AS orders_sum
FROM uber_orders
GROUP BY service_name, status_of_order

In [None]:
df = uber_orders
df.groupby(['service_name', 'status_of_order'], as_index=False).agg(orders_sum=('number_of_orders', 'sum'))

#### ID 2050

```Find the average daily active users for January 2021 for each account. Your output should have account_id and the average daily count for that account.```

In [None]:
%%sql
SELECT account_id, SUM(users_count) * 1.0 / 31 AS Av_DAU
FROM (SELECT account_id,
             date,
             COUNT(DISTINCT user_id) AS users_count
      FROM sf_events
      WHERE DATE_PART('year', date) = 2021
        AND DATE_PART('month', date) = 1
      GROUP BY account_id, date) t1
GROUP BY account_id

In [None]:
df = sf_events
df.query('date.dt.month == 1 & date.dt.year == 2021').groupby(['account_id', 'date'], as_index=False).agg(active_users=('user_id', 'nunique')).groupby('account_id')['active_users'].agg(AvDAU = lambda x: x.sum() / 31).reset_index()