#### ID 2000

```Write a query that returns a binary description of the rate type per loan_id. The results should have one row per loan_id and two columns: one for the fixed type and one for the variable type.```

In [None]:
%%sql
SELECT loan_id,
       CASE WHEN rate_type = 'fixed' THEN 1 ELSE 0 END    AS fixed,
       CASE WHEN rate_type = 'variable' THEN 1 ELSE 0 END AS variable
FROM submissions;

In [None]:
df = submissions
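# dtype=int yields 1/0 like the SQL CASE expressions (pandas 2.x defaults to booleans)
pd.get_dummies(df[['loan_id', 'rate_type']], columns=['rate_type'], prefix='', prefix_sep='', dtype=int)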

#### ID 2001

```Write a query that returns the rate_type, loan_id, loan balance, and a column that shows what percentage the loan's balance contributes to the total balance among loans of the same rate type.```

In [None]:
%%sql
SELECT rate_type,
       loan_id,
       SUM(balance) OVER (PARTITION BY loan_id)                     AS balance,
       balance * 100.0 / SUM(balance) OVER (PARTITION BY rate_type) AS balance_share
FROM submissions;

In [None]:
df = submissions
# Total balance per rate type and per loan
by_type = df.groupby('rate_type', as_index=False).agg(balance_by_type=('balance', 'sum'))
by_loan = df.groupby('loan_id', as_index=False).agg(balance=('balance', 'sum'))
df = df.drop(columns='balance').merge(by_type, on='rate_type').merge(by_loan, on='loan_id')
df['balance_share'] = df['balance'] * 100 / df['balance_by_type']
df[['loan_id', 'rate_type', 'balance', 'balance_share']]

#### ID 2002

```Write a query that returns the user ID of all users that have created at least one ‘Refinance’ submission and at least one ‘InSchool’ submission.```

In [None]:
%%sql
SELECT DISTINCT user_id
FROM loans
WHERE user_id IN (SELECT user_id FROM loans WHERE type IN ('Refinance'))
  AND user_id IN (SELECT user_id FROM loans WHERE type IN ('InSchool'))

In [None]:
df_ref = loans.query('type == "Refinance"')['user_id'].drop_duplicates()
df_sch = loans.query('type == "InSchool"')['user_id'].drop_duplicates()
df = pd.merge(df_ref, df_sch, on='user_id')

#### ID 2003

```Write a query that joins this submissions table to the loans table and returns the total loan balance on each user’s most recent ‘Refinance’ submission. Return all users and the balance for each of them.```

In [None]:
%%sql
WITH cte AS (SELECT user_id,
                    balance,
                    DENSE_RANK()
                    OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rnk
             FROM loans l
                      JOIN submissions s ON l.id = s.loan_id
             WHERE type = 'Refinance')
SELECT user_id, balance
FROM cte
WHERE rnk = 1

In [None]:
df = pd.merge(loans.query('type == "Refinance"'), submissions, how='inner', left_on='id', right_on='loan_id')
# method='dense' mirrors DENSE_RANK in the SQL cell, so ties on created_at are all kept
df['rnk'] = df.groupby('user_id')['created_at'].rank(method='dense', ascending=False)
df.query('rnk == 1')[['user_id', 'balance']]
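
SQL's `DENSE_RANK` and pandas `rank` only agree on ties when `method='dense'` is used; `method='first'` always keeps a single row per user. A minimal sketch of the difference, using a made-up frame (the column values below are assumptions for illustration, not data from the real tables):

```python
import pandas as pd

# Made-up data: user 1 has two submissions sharing the latest timestamp.
df = pd.DataFrame({
    'user_id':    [1, 1, 1, 2],
    'created_at': pd.to_datetime(['2020-01-05', '2020-01-05',
                                  '2020-01-01', '2020-01-03']),
    'balance':    [100, 200, 300, 400],
})

grouped = df.groupby('user_id')['created_at']
df['rnk_first'] = grouped.rank(method='first', ascending=False)
df['rnk_dense'] = grouped.rank(method='dense', ascending=False)

first_rows = df[df['rnk_first'] == 1]  # exactly one row per user
dense_rows = df[df['rnk_dense'] == 1]  # every row tied for the latest date
```

Which behaviour is wanted depends on how ties in `created_at` should be treated; the task statement does not say.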

#### ID 2004

```Return the total number of comments received by each user in the 30 or fewer days before 2020-02-10. Don't output users who haven't received any comments in the defined time period.```

In [None]:
%%sql
SELECT user_id,
       SUM(number_of_comments) AS number_of_comments
FROM fb_comments_count
WHERE ('2020-02-10' - created_at) BETWEEN 0 AND 30
GROUP BY user_id

In [None]:
df = fb_comments_count
df[(pd.to_datetime('2020-02-10') - df['created_at']).dt.days.between(0, 30)].groupby('user_id', as_index=False).agg(number_of_comments=('number_of_comments', 'sum'))
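
Both the SQL `BETWEEN 0 AND 30` and pandas `between(0, 30)` are inclusive on both ends, so rows dated 2020-02-10 itself and rows exactly 30 days earlier are counted. A small sketch on made-up rows (the sample values are assumptions) that exercises the boundaries:

```python
import pandas as pd

# Made-up rows placed on and just outside the window boundaries.
fb_comments_count = pd.DataFrame({
    'user_id':            [1, 1, 2, 3],
    'created_at':         pd.to_datetime(['2020-02-10', '2020-01-11',
                                          '2020-01-10', '2020-02-11']),
    'number_of_comments': [2, 3, 5, 7],
})

# Whole days between each comment date and the reference date.
days_before = (pd.Timestamp('2020-02-10') - fb_comments_count['created_at']).dt.days
in_window = days_before.between(0, 30)  # inclusive on both ends

result = (fb_comments_count[in_window]
          .groupby('user_id', as_index=False)
          .agg(number_of_comments=('number_of_comments', 'sum')))
```

Here user 2 (31 days before) and user 3 (one day after) fall outside the window, so only user 1 remains.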

#### ID 2006

```Return a distribution of user activity per day of the month. By distribution we mean the number of posts per day of the month.```

In [None]:
%%sql
SELECT EXTRACT(DAY FROM post_date) AS day_of_month, COUNT(post_text) AS user_activity
FROM facebook_posts
GROUP BY EXTRACT(DAY FROM post_date)

In [None]:
df = facebook_posts
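# Count posts per day of the month; select the column first so .count() returns a Series that .to_frame() accepts
df.groupby(df['post_date'].dt.day.rename('day_of_month'))['post_text'].count().to_frame('user_activity').reset_index()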

#### ID 2009

```Find users who are both a viewer and a streamer.```

In [None]:
%%sql
SELECT DISTINCT user_id
FROM twitch_sessions
WHERE user_id IN (SELECT user_id FROM twitch_sessions WHERE session_type = 'viewer')
  AND user_id IN (SELECT user_id FROM twitch_sessions WHERE session_type = 'streamer')

In [None]:
df = pd.merge(twitch_sessions, twitch_sessions, how='inner', on='user_id', suffixes=('_user1', '_user2')).query('session_type_user1 == "streamer" & session_type_user2 == "viewer"')['user_id'].drop_duplicates().sort_values().reset_index(drop=True)
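
The self-merge above pairs every session of a user with every other session of that user, which can grow quickly for active users. A lighter alternative, sketched here on a made-up two-column frame (the sample data is an assumption), intersects the sets of distinct user IDs per role:

```python
import pandas as pd

# Made-up sample; the real twitch_sessions table has more columns.
twitch_sessions = pd.DataFrame({
    'user_id':      [0, 0, 1, 2, 2, 2],
    'session_type': ['streamer', 'viewer', 'viewer',
                     'streamer', 'streamer', 'viewer'],
})

# Distinct users seen in each role.
is_streamer = twitch_sessions['session_type'] == 'streamer'
is_viewer = twitch_sessions['session_type'] == 'viewer'
streamers = set(twitch_sessions.loc[is_streamer, 'user_id'])
viewers = set(twitch_sessions.loc[is_viewer, 'user_id'])

# Users present in both sets, sorted for a stable output order.
both = pd.Series(sorted(streamers & viewers), name='user_id')
```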

#### ID 2010

```List the top 10 users who accumulated the most sessions where they had more streaming sessions than viewing. Return the user_id, number of streaming sessions, and number of viewing sessions.```

In [None]:
%%sql
WITH filtered AS (SELECT user_id,
                    COUNT(CASE WHEN session_type = 'streamer' THEN 1 ELSE NULL END) AS streaming_sessions,
                    COUNT(CASE WHEN session_type = 'viewer' THEN 1 ELSE NULL END)   AS viewing_sessions
             FROM twitch_sessions
             GROUP BY user_id
             HAVING COUNT(CASE WHEN session_type = 'streamer' THEN 1 ELSE NULL END) >
                    COUNT(CASE WHEN session_type = 'viewer' THEN 1 ELSE NULL END)),
     ranked AS (SELECT user_id,
                       streaming_sessions,
                       viewing_sessions,
                      DENSE_RANK()
                      OVER (ORDER BY (streaming_sessions + viewing_sessions) DESC) AS rnk
               FROM filtered)
SELECT user_id, streaming_sessions, viewing_sessions
FROM ranked
WHERE rnk <= 10

In [None]:
# TODO
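
The pandas cell for this task is still a TODO. One possible translation of the SQL above is sketched below on made-up sample data; the two-column `twitch_sessions` frame and the intermediate names (`counts`, `filtered`, `result`) are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Made-up sample; the real twitch_sessions table has more columns.
twitch_sessions = pd.DataFrame({
    'user_id':      [1, 1, 1, 2, 2, 3, 3, 3],
    'session_type': ['streamer', 'streamer', 'viewer',
                     'viewer', 'viewer',
                     'streamer', 'streamer', 'streamer'],
})

# Count sessions of each type per user (one column per session type).
counts = (twitch_sessions
          .groupby(['user_id', 'session_type'])
          .size()
          .unstack(fill_value=0)
          .reindex(columns=['streamer', 'viewer'], fill_value=0)
          .rename(columns={'streamer': 'streaming_sessions',
                           'viewer': 'viewing_sessions'})
          .reset_index())
counts.columns.name = None

# Keep users with more streaming than viewing sessions (the HAVING clause).
more_streaming = counts['streaming_sessions'] > counts['viewing_sessions']
filtered = counts[more_streaming].copy()

# Dense-rank by total sessions, descending, and keep ranks 1 through 10.
total = filtered['streaming_sessions'] + filtered['viewing_sessions']
filtered['rnk'] = total.rank(method='dense', ascending=False)
result = filtered.loc[filtered['rnk'] <= 10,
                      ['user_id', 'streaming_sessions', 'viewing_sessions']]
```

As in the SQL, dense ranking means more than 10 users can be returned when totals are tied.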

#### ID 2011

```Calculate the average session duration for each session type.```

In [None]:
%%sql
SELECT session_type, AVG(session_end - session_start) AS duration
FROM twitch_sessions
GROUP BY session_type

In [None]:
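# assign() returns a new frame, so the original twitch_sessions is not modified
df = twitch_sessions.assign(duration=twitch_sessions['session_end'] - twitch_sessions['session_start'])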
df.groupby('session_type', as_index=False).agg(duration=('duration', 'mean'))

#### ID 2016

```Which partners have ‘pizza’ in their name and are located in Boston? And what is the average order amount? Output the partner name and the average order amount.```

In [None]:
%%sql
SELECT pp.name, AVG(po.amount) AS avg
FROM postmates_orders po
         LEFT JOIN postmates_markets pm ON po.city_id = pm.id
         LEFT JOIN postmates_partners pp ON po.seller_id = pp.id
WHERE pm.name = 'Boston'
  AND pp.name ILIKE '%pizza%'
GROUP BY pp.name

In [None]:
df = pd.merge(pd.merge(postmates_orders, postmates_markets, how='left', left_on='city_id', right_on='id'), postmates_partners, how='left', left_on='seller_id', right_on='id')
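# name_x is the market name, name_y the partner name; engine='python' allows .str methods inside query()
df.query('name_x == "Boston" & name_y.str.contains("pizza", case=False)', engine='python').groupby('name_y', as_index=False).agg(avg=('amount', 'mean'))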