<a href="https://colab.research.google.com/github/cmarie-bel/CoRise_Projects/blob/main/Crystal_Belton_Week_1_Project_Intermediate_SQL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> 1. DUPLICATE THIS COLAB TO START WORKING ON IT: use File > Save a copy in Drive.
> 2. SHARE SETTINGS: In the new notebook, click "Share" in the top right corner and set the sharing settings to "Anyone with the link" ("Commenter" role).




Welcome to this first week's project for *Intermediate SQL*!

This week's lecture and material on CoRise showed you how to write complex SQL queries against a single table. This project deepens your understanding of these concepts by digging a bit further. Everything covered in this project has related examples in the course material.


##### Prerequisite configuration
Below we install the software required to run this project. Please make sure to **RUN IT** by clicking the play-button icon, and feel free to ignore the content of these two hidden cells.

**IMPORTANT:** You may have to rerun these cells whenever you have been away from the notebook for a long time, or when you open it in a different browser or on a different laptop. If you see an error that says
"NameError: name 'run' is not defined", you need to run these two hidden cells again.

In [None]:
%%capture
!pip install git+https://github.com/corise-edu/course-intermediate-sql.git

In [None]:
import pandas as pd
from IPython.display import display, HTML
from sql_course import run as sql_run
from sql_course import check

# Show all the rows (instead of only a few)
pd.set_option("display.max_rows", None)

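# Set precision to max 2 decimals, for plain DataFrame display and for Styler output
# (in pandas >= 1.3 the bare 'precision' key matches more than one option and raises an error)
pd.set_option("display.precision", 2)
pd.set_option("styler.format.precision", 2)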

# Set CSS Style for Table
# Make it work with night & light mode
# - Alternating rows
# - th elements
# - td elements
css_style = '''
<style>
  html {
    --td-font-color: black;
    --font-color: black;
    --background-color: #e0e0e0;
  }
  html[theme=dark] {
    --td-font-color: white;
    --font-color: black;
    --background-color: #6688ff;
  }
  th {
    background: #fbd44c;
    color: var(--font-color);
    font-size: 16px;
    text-align: center;
    font-weight: bold;
  }
  tr:nth-child(even) {
    background-color: var(--background-color);
    color: var(--font-color);
  }
  td {
    font-size: 14px;
    color: var(--td-font-color);
  }
</style>
'''


def run(sql_query):
  df = sql_run(sql_query)

  # Puts the scrollbar next to the DataFrame
  display(HTML(css_style +
               "<div style='max-height: 500px; overflow: auto; width: fit-content; border-style: solid;" +
               " border-width: 1px; border-color: #0139fe; font-family: GT Planar,Inter,Arial,sans-serif;'>" +
               df.style.to_html() +  # Styler.render() was removed in pandas 2.0
               "</div>"))


# **The Setting:**

Congratulations! You have landed a shiny new job as an analyst at CoRise.

Here is what you need to know about the CoRise business to complete this project. As you know, CoRise offers plenty of courses, such as *Intermediate SQL*, and typically runs each of them several times a year. CoRise wants to run these courses across several categories (such as Machine Learning and Data Engineering) and at different levels (Basic vs. Advanced). Each course typically runs for a few weeks, ranging from a 1- or 2-week crash course all the way to an extended 10-week course. Each course is facilitated by a small number of teaching assistants who help conduct coding parties and answer questions on the project, etc. Finally, and most importantly, each course has an nps (net promoter score) rating, based on the percentage of the class that thinks very highly of the course and would recommend it without reservations. Since the nps is a percentage, it ranges from 0 to 100.





# **Schema:**

The schema of the table you will be analyzing contains the following columns. It captures the business details we want to include based on the above description.

course_id (primary key)

course_name (name of the course, e.g. Intermediate SQL)

course_level (level of the course: 'B' for basic or 'A' for advanced)

course_category (e.g., Machine Learning)

course_desc (a paragraph about the course)

start_date (start date and time of this run of the course; if a course is run multiple times, there is one row in this table for each run)

num_weeks (number of weeks this run of the course lasts)

num_learners_registered (number of learners registered for this run of the course; may be NULL)

num_TAs (number of teaching assistants)

nps (net promoter score, from 0 to 100)





# **Create Table Statement:**

The table you will be working with was created with the following statement (you do not have to create it; it is already preloaded):



```
create table courses (
course_id integer,
course_name text,
course_desc text,
course_category text,
course_level text,
start_date date,
num_weeks integer,
num_learners_registered integer,
num_TAs integer,
nps integer
);
```



# **Disclaimer**

Any resemblance of the data in this table to anything real (other than the names and descriptions of courses) is entirely random. It could have come from my pseudo-random generator or from my own imagination, since it isn't easy to distinguish between the two :-). So do not stress yourself out trying to make sense of any categorization, such as why one run of a course might overlap with another run, or other such niceties.

# **Part 0 - Running and Testing Queries - An Example**

To access the data in the courses table, write your SQL inside triple quotation marks (a Python multi-line string), like:

```
query = """
SELECT * FROM courses
"""
run(query)
check(q_0_1 = query)
```

The query string is then passed to `run()`, which shows the output of the SQL query, and to `check()`, which tells you below the output whether you wrote the expected query. If you've done it correctly, you'll see **"Your SQL query is correct!"**.
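If you want to experiment outside Colab, the sketch below imitates what `run()` appears to do: it executes a query and collects the resulting columns and rows. It assumes the course database is SQLite, which the `strftime` calls used throughout this notebook suggest. `make_toy_db` and `run_local` are hypothetical helpers, and the two rows are made-up toy data, not the real `courses` table. The actual `sql_course` implementation is not shown in this notebook and may work differently.

```python
import sqlite3

# Hypothetical offline stand-in for the notebook's run() helper.
# Assumption: the course database is SQLite. The rows inserted below are
# made-up toy data, NOT the real courses table.

CREATE_COURSES = """
create table courses (
  course_id integer,
  course_name text,
  course_desc text,
  course_category text,
  course_level text,
  start_date date,
  num_weeks integer,
  num_learners_registered integer,
  num_TAs integer,
  nps integer
);
"""


def make_toy_db():
    """Create an in-memory SQLite database with a tiny toy courses table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_COURSES)
    conn.executemany(
        "insert into courses values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "Toy Course A", "A toy description.", "Machine Learning", "A",
             "2020-05-01 10:00:00", 4, 50, 2, 70),
            (2, "Toy Course B", "Another toy description.", "Python", "B",
             "2021-01-15 09:30:00", 2, None, 1, 80),
        ],
    )
    return conn


def run_local(conn, sql_query):
    """Execute a query and return (column_names, rows) instead of rendering HTML."""
    cur = conn.execute(sql_query)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


conn = make_toy_db()
columns, rows = run_local(conn, "select course_id, course_name from courses order by course_id")
print(columns)  # ['course_id', 'course_name']
print(rows)     # [(1, 'Toy Course A'), (2, 'Toy Course B')]
```

You can pass any query from this notebook to `run_local`. Its results on the toy rows will differ from those on the real data.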

In [None]:
### Question: q_0_1
### This is just a test query
### to demonstrate how you will be submitting your SQL queries.
### It also allows you to browse the data a bit.
query = """
         select *
         from courses
         limit 10
         """

run(query)
check(q_0_1 = query)

Unnamed: 0,course_id,course_name,course_desc,course_category,course_level,start_date,num_weeks,num_learners_registered,num_TAs,nps
0,1,Data Centric Deep Learning,A trope we often hear is that data is at the heart of deep learning. But how exactly do you manage;improve;and repair deep learning models with data? This course teaches students applied techniques to measure and improve data quality;deep learning infrastructure to do continuous testing and deployment;iterative annotation pipelines;and techniques to respond to distribution shift and adversarial examples. Projects will be grounded in practical challenges spanning computer vision and natural language processing.,Deep Learning,A,2016-11-21 07:26:00,5,72.0,1,83
1,2,People Analytics Jumpstart,Become a Qualified People Analytics Pro in 4 Weeks You will learn the techniques and tools you need to win roles in People Analytics;define the function for your organization;and stand out as a top analyst and HR specialist. This course focuses on the real-world day-to-day practice of analytics in modern;millennial-heavy workplaces.,Data Science,B,2021-12-04 05:32:00,2,161.0,2,60
2,3,MLOps: From Models to Production,Acquire the skills to build effective real-world ML systems (bootstrapping datasets;improving label quality;experimentation;model evaluation;deployment and observability) with hands-on projects. This course will help you bridge the gap between state-of-the-art ML modeling;and building real-world ML systems.,Machine Learning,B,2017-11-10 20:03:00,1,153.0,2,80
3,4,Building Computer Vision Applications,This course provides an introduction to machine learning for computer vision with a focus on practical applications relevant to industry teams. In this course;we will “reverse-engineer” a number of applications;such as traffic flow analysis;digital medicine;optical character recognition;and video analytics.,Machine Learning,B,2017-10-06 03:04:00,5,11.0,1,65
4,5,FREE Weekend Buildathon: NLP,DIY weekend project - FREE 48hr workshop to build a NLP project from scratch as the perfect way to learn a core NLP building block - word embeddings.,Language Processing,B,2018-03-22 22:34:00,4,96.0,1,62
5,6,Fundamentals of Data Modeling,This course provides an introduction to the fundamentals of data modeling for modern data warehouses. We use the Kimball dimensional model to guide us;but that does not mean this class is all theory! We have designed this course so you will learn the theory behind data modeling (normalization / denormalization;star schemas;fact and dimension tables) and you will build actual models using real world data. We will use plain SQL so you can do modeling whether you only have access to SQL or you are using a tool like dbt. By the end of this course;you will have a deep understanding of dimensional modeling;and you will have built one from scratch.,Data Engineering,B,2022-10-12 09:04:00,5,21.0,1,77
6,7,Data Science for Security and Risk,Head of Data at Mati will teach a course on using Data Science to improve Security and Risk,Data Science,B,2016-08-18 06:06:00,5,170.0,3,77
7,8,Web3 Applications & Filecoin/IPFS,This course provides an introduction to building simple DApps on a blockchain with storage on Filecoin/IPFS. You will finish the course with a mastery of the infrastructure tools used to build Web3 apps;knowledge of how to build a Web3 app that is accessible through the Web2 internet; and will understand how decentralized storage can be used to host files a Web3 app utilizes. We will start with the fundamentals of decentralized storage and then move on to programming for useful applications,Data Engineering,A,2017-07-15 20:49:00,2,,1,84
8,9,How to Implement ML Papers,Learn how to implement the proposed algorithms;models;and techniques from ML papers each week;while learning tips and tricks on debugging the implementation efforts and how to reproduce results in your own applications.,Machine Learning,A,2022-11-04 05:42:00,2,107.0,2,70
9,10,Advanced SQL,Over these four weeks;you will learn the advanced SQL kills necessary to dive into real datasets with confidence. We will clean up messy and nested data;learn about EAV schemas and pivots;and dive deep into advanced window functions. We will also cover performance;advanced joins;and other complex SQL patterns to optimize queries.,Data Engineering,A,2018-12-17 16:03:00,4,181.0,2,89


-------------------
Your SQL query is correct!


# **Part 1 - Basics of the Business**

Let's now fast-forward several years into your career at CoRise. It is the year 2024. The `courses` table contains all the courses that have run up until now.

Your database has been humming along nicely, and CoRise has just hired a new head of content creation. They are very curious about the historical data and want some basic information from the courses table to understand the basics of the CoRise business. Your task is to help them answer these questions:








In [None]:
### Question: q_1_1
### How many courses have we run so far?
### (This is simply the number of rows in the table:
### since it is now 2024, every course in the table has already run.)
### Output columns: cnt
query = """

SELECT COUNT(*) AS cnt
FROM courses

         """

run(query)
check(q_1_1 = query)

|   | cnt |
|---|---|
| 0 | 999 |


-------------------
Your SQL query is correct!


In [None]:
### Question: q_1_2
### Getting into some more details, they would like to know:
### What are the distinct categories of courses that we run?
### This is to understand the spectrum of courses we cover.
### Output columns: course_category
query = """

SELECT distinct course_category
FROM courses

         """

run(query)
check(q_1_2 = query)

|   | course_category |
|---|---|
| 0 | Deep Learning |
| 1 | Data Science |
| 2 | Machine Learning |
| 3 | Language Processing |
| 4 | Data Engineering |
| 5 | Search |
| 6 | Python |


-------------------
Your SQL query is correct!


# **Part 2 - Digging into Machine Learning courses:**

The head of content has heard from their boss that Machine Learning is a very important category for the CoRise business and that CoRise should keep expanding its courses in this broad area. But first, we need to understand the current reality. So, their logical next question is:




In [None]:
### Question: q_2_1
### For the course category 'Machine Learning' how many courses have we run so far?
### Output columns: cnt
query = """

SELECT
  COUNT(*) AS cnt
FROM
  courses
WHERE
  course_category = 'Machine Learning'

         """

run(query)
check(q_2_1 = query)



|   | cnt |
|---|---|
| 0 | 310 |


-------------------
Your SQL query is correct!


It's useful to know that we have run 310 Machine Learning courses so far. To dig deeper, you decide to count the courses run in each year to see how they are distributed.
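Per-year counts rely on `strftime('%Y', ...)`, which extracts the year from a date/time string. The minimal sketch below assumes SQLite and uses made-up timestamps in the same format as the `start_date` values shown in the q_0_1 output. It also shows a detail worth knowing: `strftime` returns **text**, so compare it with the string `'2020'`, not the number `2020`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up timestamps in the same "YYYY-MM-DD HH:MM:SS" format as start_date.
samples = ["2016-11-21 07:26:00", "2020-03-22 22:34:00", "2020-12-31 23:59:00"]

# strftime('%Y', ...) returns the year as a TEXT value.
years = [conn.execute("select strftime('%Y', ?)", (s,)).fetchone()[0] for s in samples]
print(years)  # ['2016', '2020', '2020']

# Comparing text with text matches; comparing text with an integer does not.
text_match = conn.execute("select strftime('%Y', '2020-03-22 22:34:00') = '2020'").fetchone()[0]
int_match = conn.execute("select strftime('%Y', '2020-03-22 22:34:00') = 2020").fetchone()[0]
print(text_match, int_match)  # 1 0

# Grouping by the extracted year gives one count per calendar year.
conn.execute("create table t (start_date text)")
conn.executemany("insert into t values (?)", [(s,) for s in samples])
per_year = conn.execute(
    "select strftime('%Y', start_date) as year, count(*) from t group by year order by year"
).fetchall()
print(per_year)  # [('2016', 1), ('2020', 2)]
```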

In [None]:
### Question: q_2_2
### For the course category 'Machine Learning,'
### how many courses have we run so far for each calendar year?
### Hint: "for each calendar year" suggests you need to use GROUP BY.
### Hint: See the link below for extracting the year from a date:
### https://corise.com/course/intermediate-sql/v2/module/datetime-ops-77pax
### Output columns: year, count_by_year

query = """

SELECT
  strftime('%Y', start_date) AS year,
  COUNT(*) AS count_by_year
FROM courses
WHERE
  course_category = 'Machine Learning'
GROUP BY
  strftime('%Y', start_date)

         """

run(query)
check(q_2_2 = query)

|   | year | count_by_year |
|---|---|---|
| 0 | 2016 | 35 |
| 1 | 2017 | 45 |
| 2 | 2018 | 37 |
| 3 | 2019 | 39 |
| 4 | 2020 | 51 |
| 5 | 2021 | 38 |
| 6 | 2022 | 33 |
| 7 | 2023 | 32 |


-------------------
Your SQL query is correct!


Hmm... Anecdotally, you recall that we run 60+ courses a year that could be termed 'Machine Learning.' The general talk around the water cooler is that we run about 5 ML courses a month, which also works out to 60 a year. Yet the table above shows only 32 to 51 per year, so we would need to be running quite a few more courses to reach 60. You notice that some courses may not be filed under the 'Machine Learning' category but could still be considered 'Machine Learning.' You decide to search the course names and descriptions for certain words to find other courses that could count as 'Machine Learning.'
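A keyword search like this is usually written with `LIKE`. The small sketch below assumes SQLite and uses made-up strings rather than course data. It illustrates two behaviors. First, SQLite's `LIKE` is already case-insensitive for ASCII letters by default, so wrapping the text in `lower(...)` mainly makes the intent explicit and keeps the query portable. Second, `'%ml%'` is a substring match rather than a whole-word match, so a word such as "HTML" also matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite's LIKE ignores ASCII case by default (PRAGMA case_sensitive_like is off).
like_ci = conn.execute("select 'MLOps' like '%ml%'").fetchone()[0]
# lower() makes the case-insensitive intent explicit.
lower_ci = conn.execute("select lower('MLOps') like '%ml%'").fetchone()[0]
# '%ml%' matches anywhere inside the text, so 'HTML' is also a hit.
html_hit = conn.execute("select lower('Intro to HTML') like '%ml%'").fetchone()[0]

print(like_ci, lower_ci, html_hit)  # 1 1 1
```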

In [None]:
### Question: q_2_3
### Retrieve the distinct combinations of name, category, and description of courses
### that contain one of the words 'ml', 'learning', or 'models'.
### We want the match to be case-insensitive.
### Note that in general this need not be the same set of courses
### as the ones that belong to the category 'Machine Learning.'
### Hint: The link below covers string manipulation functions you may need,
### such as LIKE, UPPER, and LOWER.
### https://corise.com/course/intermediate-sql/v2/module/string-ops-5hapd
### Output columns: course_name, course_category, course_desc

query = """

 SELECT
    distinct course_name, course_category, course_desc
 FROM courses
 WHERE
    (lower(course_name || course_category || course_desc) LIKE '%ml%')
    OR (lower(course_name || course_category || course_desc) LIKE '%learning%')
    OR (lower(course_name || course_category || course_desc) LIKE '%models%')


         """

run(query)
check(q_2_3 = query)

Unnamed: 0,course_name,course_category,course_desc
0,Data Centric Deep Learning,Deep Learning,A trope we often hear is that data is at the heart of deep learning. But how exactly do you manage;improve;and repair deep learning models with data? This course teaches students applied techniques to measure and improve data quality;deep learning infrastructure to do continuous testing and deployment;iterative annotation pipelines;and techniques to respond to distribution shift and adversarial examples. Projects will be grounded in practical challenges spanning computer vision and natural language processing.
1,MLOps: From Models to Production,Machine Learning,Acquire the skills to build effective real-world ML systems (bootstrapping datasets;improving label quality;experimentation;model evaluation;deployment and observability) with hands-on projects. This course will help you bridge the gap between state-of-the-art ML modeling;and building real-world ML systems.
2,Building Computer Vision Applications,Machine Learning,This course provides an introduction to machine learning for computer vision with a focus on practical applications relevant to industry teams. In this course;we will “reverse-engineer” a number of applications;such as traffic flow analysis;digital medicine;optical character recognition;and video analytics.
3,Fundamentals of Data Modeling,Data Engineering,This course provides an introduction to the fundamentals of data modeling for modern data warehouses. We use the Kimball dimensional model to guide us;but that does not mean this class is all theory! We have designed this course so you will learn the theory behind data modeling (normalization / denormalization;star schemas;fact and dimension tables) and you will build actual models using real world data. We will use plain SQL so you can do modeling whether you only have access to SQL or you are using a tool like dbt. By the end of this course;you will have a deep understanding of dimensional modeling;and you will have built one from scratch.
4,How to Implement ML Papers,Machine Learning,Learn how to implement the proposed algorithms;models;and techniques from ML papers each week;while learning tips and tricks on debugging the implementation efforts and how to reproduce results in your own applications.
5,Applied Machine Learning,Machine Learning,Design;build;and debug machine learning models for classification and regression tasks using a variety of datasets with Python (Numpy;Scikit;Pyplot). Learn best practices to plan and execute ML development projects whether large or small.
6,Deep Learning Essentials,Deep Learning,Learn the foundations of deep learning and practice with training and implementing neural networks in Pytorch while covering topics including convolutional neural networks (CNNs);transformers;and generative adversarial networks (GANs).
7,Spoken Language Processing,Language Processing,Learn to design and build voice assistant or voice cloning systems;using available APIs and adjustable pre-trained models;as well as creating them from scratch in PyTorch. You will learn solid audio processing fundamentals combined with the most important advances in automatic speech recognition and speech synthesis methods.
8,Analytics Engineering with dbt,Data Engineering,Learn the modern analytics stack and best practices as an analytics engineer by using dbt with e-commerce data. Build data models to address real-world strategic questions.
9,Social Media Mining,Language Processing,Collect;analyze;and present insights from social media data — including the Twitter API — while learning concepts of natural language processing and graph analytics.


-------------------
Your SQL query is correct!
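The query above searches the concatenation `course_name || course_category || course_desc`. That is compact, but it has two general pitfalls, shown below on two hypothetical rows that are not from the real table (the checker accepted the query as written). First, gluing the fields together without a separator can create a match across a field boundary: a name ending in "m" followed by a category starting with "L" yields "mL". Second, if any field is NULL, the whole concatenation becomes NULL, and the row is silently skipped. Adding a separator and wrapping each field in `coalesce` avoids both problems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table courses (course_name text, course_category text, course_desc text)")
conn.executemany(
    "insert into courses values (?, ?, ?)",
    [
        # Hypothetical row: no single field contains 'ml', but 'Platform' ends in 'm'
        # and 'Language' starts with 'L', so name || category contains 'mL'.
        ("Data Platform", "Language Processing", "Intro course."),
        # Hypothetical row: a NULL description turns the whole concatenation into NULL.
        ("Applied ML", "Machine Learning", None),
    ],
)


def matches(expr):
    """Return course names whose lower-cased expression contains 'ml'."""
    sql = f"select course_name from courses where lower({expr}) like '%ml%' order by course_name"
    return [r[0] for r in conn.execute(sql)]


no_sep = matches("course_name || course_category || course_desc")
with_sep = matches(
    "coalesce(course_name, '') || ' ' || coalesce(course_category, '') || ' ' || coalesce(course_desc, '')"
)
print(no_sep)    # ['Data Platform']  (false positive, and the NULL row is missed)
print(with_sep)  # ['Applied ML']
```

Another option is to test each column separately (`lower(course_name) like '%ml%' or lower(course_desc) like '%ml%' ...`). It is more verbose but has neither pitfall.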


You found a few more courses in the 'Deep Learning' and 'Language Processing' categories that could be considered 'Machine Learning.' However, you also found courses in the 'Data Engineering' category that merely refer to data models. You decide to check what happens if you also count the courses run per year in the 'Deep Learning' and 'Language Processing' categories.

In [None]:
### Question: q_2_4
### For the course categories 'Machine Learning,' 'Deep Learning,' and 'Language Processing,'
### how many courses have we run so far for each calendar year?
### Output Columns: year, count_by_year

query = """

SELECT
    COUNT(*) AS count_by_year,
    strftime('%Y', start_date) AS year
FROM courses
WHERE
    course_category = 'Machine Learning'
    OR course_category = 'Deep Learning'
    OR course_category = 'Language Processing'
GROUP BY strftime('%Y', start_date)

        """

run(query)
check(q_2_4 = query)

|   | count_by_year | year |
|---|---|---|
| 0 | 60 | 2016 |
| 1 | 73 | 2017 |
| 2 | 61 | 2018 |
| 3 | 76 | 2019 |
| 4 | 80 | 2020 |
| 5 | 65 | 2021 |
| 6 | 57 | 2022 |
| 7 | 63 | 2023 |


-------------------
Your SQL query is correct!


That seems to align better with the anecdotal water cooler conversations. The head of content is happy that you have painted a good picture of the current reality and is now off trying to figure out what new courses to add and which instructors to engage. But your life as an analyst continues. The CEO has put together a special ops commando team called the 'COVID' impact team. Of course, you are part of this team as the ace analyst at CoRise. Off we go!

# **Part 3 - COVID Impact**

As mentioned before, the CEO has put together a special ops commando team called the 'COVID' impact team, which includes you as the star analyst. No one is sure whether COVID affected the courses, since it has been a while (it is 2024, and the world has almost forgotten COVID; positive dreams are good for us 😀). You offer to mine the data and see whether there are any quantitative differences between 2020 and other years.

As you know by now, we are guided by nps quite a bit, as it is a great measure of how participants perceived a course. So the first step is to see whether 2020 differs substantially from other years in either the number of courses or their average nps.


In [None]:
### Question: q_3_1
### Print the number of courses run per year
### Output columns: year, count_by_year
query = """

SELECT
    COUNT(*) AS count_by_year,
    strftime('%Y', start_date) AS year
FROM courses
GROUP BY
    strftime('%Y', start_date)

        """

run(query)
check(q_3_1 = query)

|   | count_by_year | year |
|---|---|---|
| 0 | 129 | 2016 |
| 1 | 118 | 2017 |
| 2 | 128 | 2018 |
| 3 | 134 | 2019 |
| 4 | 140 | 2020 |
| 5 | 127 | 2021 |
| 6 | 108 | 2022 |
| 7 | 115 | 2023 |


-------------------
Your SQL query is correct!


That's interesting. The number of courses conducted doesn't seem to have been lower in 2020. If anything, it is on the higher side: 140, the most of any year. So COVID does not seem to have reduced the number of courses. Let's look at the average nps next and contrast 2020 with the other years.
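Comparing 2020 with the other years in a single row is a job for conditional aggregation. A `CASE` without an `ELSE` returns NULL for rows that don't match, and `AVG` ignores NULLs, so each `AVG` averages only the rows its `CASE` selects. The sketch below assumes SQLite and uses four made-up rows. It also shows why adding `ELSE 0` would be a mistake: the zeros would be counted and would drag the average down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table courses (start_date text, nps integer)")
conn.executemany(
    "insert into courses values (?, ?)",
    [  # made-up toy rows, not the real data
        ("2020-02-01 10:00:00", 50),
        ("2020-09-15 08:00:00", 60),
        ("2019-06-01 12:00:00", 70),
        ("2021-03-10 09:00:00", 90),
    ],
)

# Non-matching rows become NULL inside each CASE, and AVG skips them.
covid_nps, rest_nps = conn.execute("""
    select
      avg(case when strftime('%Y', start_date) = '2020' then nps end) as covid_nps,
      avg(case when strftime('%Y', start_date) != '2020' then nps end) as rest_nps
    from courses
""").fetchone()
print(covid_nps, rest_nps)  # 55.0 80.0

# Pitfall: ELSE 0 turns non-matching rows into zeros, which AVG does count.
wrong = conn.execute("""
    select avg(case when strftime('%Y', start_date) = '2020' then nps else 0 end)
    from courses
""").fetchone()[0]
print(wrong)  # 27.5, i.e. (50 + 60 + 0 + 0) / 4
```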

In [None]:
### Question: q_3_2
### Print a report with two columns titled covid_nps and rest_nps.
### covid_nps will have the average of the nps of courses that started in 2020 and
### rest_nps will have the average of the nps of courses that started in other years
### Use the CASE statement from this module
### https://corise.com/course/intermediate-sql/v2/module/week-1-case-statement-61ilk
### to identify rows that belong to 2020 and average the nps of those rows to get covid_nps,
### and similarly for rest_nps. Recall from the last subsection of the above module that
### you can wrap a CASE statement in aggregate functions such as SUM, COUNT, and AVG.
### Output columns: covid_nps, rest_nps

query = """

SELECT
  avg(case when strftime('%Y', start_date) = '2020' then nps end) as covid_nps,
  avg(case when strftime('%Y', start_date) != '2020' then nps end) as rest_nps
FROM courses

        """

run(query)
check(q_3_2 = query)

|   | covid_nps | rest_nps |
|---|---|---|
| 0 | 60.96 | 70.5 |


-------------------
Your SQL query is correct!


Wow! That's a clear drop in the average nps in 2020 compared to the other years. The statistician in you wonders whether other years also had low nps that was offset by years with high nps. In that case, 2020 might not be an exception. Let's quickly cross-validate that thought!

In [None]:
### Question: q_3_3
### For each year, print the year and the average nps of the courses that started that year.
### Output columns: year, avg(nps)

query = """

SELECT
    strftime('%Y', start_date) AS year,
    avg(nps)
FROM courses
GROUP BY strftime('%Y', start_date)


        """

run(query)
check(q_3_3 = query)

|   | year | avg(nps) |
|---|---|---|
| 0 | 2016 | 69.64 |
| 1 | 2017 | 71.8 |
| 2 | 2018 | 70.8 |
| 3 | 2019 | 70.83 |
| 4 | 2020 | 60.96 |
| 5 | 2021 | 69.38 |
| 6 | 2022 | 70.35 |
| 7 | 2023 | 70.8 |


-------------------
Your SQL query is correct!


OK, good! 2020 was an aberration: every other year averages between 69.38 and 71.8, while 2020 averages 60.96. The logical next question is why. Are there any explanatory variables? You talk to some folks who have been at the company for a long time. They recall that it was tough to get enough TAs for many courses in 2020. Additionally, courses ran longer to accommodate illness among both staff and learners. You set out to validate this next.
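Computing a learner:TA ratio means dividing one column by another. In SQLite, dividing an integer by an integer truncates the result, which is why a `CAST(... AS float)` is a useful safeguard. The sketch below assumes SQLite and uses made-up rows with integer learner counts. It also shows that division by zero returns NULL in SQLite instead of raising an error. Because `AVG` skips NULLs, a course with zero TAs would silently drop out of the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates in SQLite.
int_div = conn.execute("select 7 / 2").fetchone()[0]                   # 3
# CAST(... AS float) converts the value to REAL, so the division is floating-point.
float_div = conn.execute("select cast(7 as float) / 2").fetchone()[0]  # 3.5
# Division by zero yields NULL (None in Python) rather than an error.
div_zero = conn.execute("select 7 / 0").fetchone()[0]                  # None

# Made-up toy rows (not the real data) with integer learner counts.
conn.execute("create table courses (start_date text, num_learners_registered integer, num_TAs integer)")
conn.executemany(
    "insert into courses values (?, ?, ?)",
    [("2020-01-01", 100, 1), ("2020-06-01", 45, 2), ("2021-01-01", 50, 2)],
)


def ratio_by_year(ratio_expr):
    """Average the given per-course ratio expression for each start year."""
    sql = f"""
        select strftime('%Y', start_date) as year, avg({ratio_expr}) as learner_ta_ratio
        from courses group by year order by year
    """
    return conn.execute(sql).fetchall()


with_cast = ratio_by_year("cast(num_learners_registered as float) / num_TAs")
truncated = ratio_by_year("num_learners_registered / num_TAs")
print(with_cast)  # [('2020', 61.25), ('2021', 25.0)]
print(truncated)  # [('2020', 61.0), ('2021', 25.0)]; 45 / 2 was truncated to 22
```

Note that `AVG` of per-course ratios weights every course equally. Dividing `SUM(num_learners_registered)` by `SUM(num_TAs)` would give a pooled ratio instead, and the two numbers can differ.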

In [None]:
### Question: q_3_4
### For each year, print the year, the average learner:TA ratio
### and the average number of weeks for the courses that started that year.
### Hint: You can create derived columns by dividing one column with another
### and further aggregate derived columns
### Note that num_learners_registered has floating point values and so dividing
### it by integer will still give a float - but ideally it should have been an integer
### when data was generated. so please put it inside a cast operator to make
### it float.
### Output columns: year, learner_ta_ratio, avg_num_weeks


query = """

SELECT
  strftime('%Y', start_date) AS year,
  avg(CAST(num_learners_registered AS float)/num_tas) AS learner_ta_ratio,
  avg(num_weeks) as avg_num_weeks
FROM courses
GROUP BY strftime('%Y', start_date)


        """

run(query)
check(q_3_4 = query)

|   | year | learner_ta_ratio | avg_num_weeks |
|---|---|---|---|
| 0 | 2016 | 56.27 | 4.67 |
| 1 | 2017 | 54.4 | 4.69 |
| 2 | 2018 | 60.57 | 4.77 |
| 3 | 2019 | 62.66 | 4.29 |
| 4 | 2020 | 102.13 | 9.15 |
| 5 | 2021 | 50.72 | 4.57 |
| 6 | 2022 | 54.92 | 4.6 |
| 7 | 2023 | 56.23 | 4.33 |


-------------------
Your SQL query is correct!


Awesome! You have a reasonable explanation for the huge nps drop. In 2020 the average learner:TA ratio was 102.13, versus roughly 51 to 63 in other years, and courses averaged 9.15 weeks, versus roughly 4.3 to 4.8. Another catastrophic event like COVID is unlikely (and no one wants another one), but smaller challenges can come up more regularly. This reminds you to watch for dramatic changes over time in the number of weeks or the learner:TA ratio (proactively monitoring for problems!). However, that is not something you will do in this course.

# **Part 4 - Looking for pockets of underperformance:**

The head of content creation would like to understand the distribution of nps overall to get their bearings (recall that this is the most important metric for the business).





In [None]:
### Question: q_4_1
### Find the minimum, maximum, and average of nps across all courses
### Output columns: min_nps, max_nps, avg_nps
query = """

SELECT
    min(nps) as min_nps,
    max(nps) as max_nps,
    avg(nps) as avg_nps
FROM courses

        """

run(query)
check(q_4_1 = query)

|   | min_nps | max_nps | avg_nps |
|---|---|---|---|
| 0 | 40 | 99 | 69.16 |


-------------------
Your SQL query is correct!


At first blush, the news is good: the average seems reasonably high. There is always room to improve, of course, but it's a start. However, the minimum (40) is quite low. Perhaps there is a pocket of courses that could be improved, which would lift the average. You decide to dig deeper and look for pockets of underperformance. Instinctively, you decide to explore whether the size of a course (number of learners) correlates with nps.
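Bucketing a numeric column is done with a `CASE` expression that you then group by. The sketch below assumes SQLite and uses made-up rows. SQLite lets `GROUP BY` refer to a result-column alias such as `num_learners_bucket`, so the `CASE` need not be repeated. Some other databases do not allow this, which is one reason to repeat the expression, as the notebook's query does. The sketch also shows why the NULL filter matters when the `CASE` has an `ELSE` branch: `NULL < 30` is not true, so without the `WHERE`, a NULL learner count would silently land in 'high'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table courses (num_learners_registered real, nps integer)")
conn.executemany(
    "insert into courses values (?, ?)",
    # Made-up toy rows; the last one has a NULL learner count.
    [(12.0, 90), (25.0, 80), (30.0, 70), (99.0, 60), (100.0, 65), (180.0, 55), (None, 99)],
)

rows = conn.execute("""
    select
      case
        when num_learners_registered < 30 then 'low'
        when num_learners_registered < 100 then 'medium'
        else 'high'
      end as num_learners_bucket,
      min(nps) as min_nps,
      max(nps) as max_nps,
      avg(nps) as avg_nps
    from courses
    where num_learners_registered is not null   -- keeps the NULL row out of 'high'
    group by num_learners_bucket                -- SQLite accepts the alias here
    order by min_nps
""").fetchall()
print(rows)  # [('high', 55, 65, 60.0), ('medium', 60, 70, 65.0), ('low', 80, 90, 85.0)]
```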

In [None]:
### Question: q_4_2
### Let's group the number of learners into three buckets:
### low (fewer than 30 learners), medium (30 to 99), and high (100 or more learners).
### Filter out rows that have NULLs in the num_learners_registered column.
### Also recall that GROUP BY can take a CASE statement as the expression to group by.
### Let's get the minimum, maximum, and the average of the nps for each of these three buckets.
### Output columns: num_learners_bucket, min_nps, max_nps, avg_nps
query = """

SELECT
  case
    when num_learners_registered <30 then 'low'
    when num_learners_registered < 100 then 'medium'
    when num_learners_registered >= 100 then 'high'
  end as num_learners_bucket,
  min(nps) as min_nps,
  max(nps) as max_nps,
  avg(nps) as avg_nps
FROM
  courses
WHERE
  num_learners_registered is not NULL
GROUP BY
  case
    when num_learners_registered <30 then 'low'
    when num_learners_registered < 100 then 'medium'
    when num_learners_registered >= 100 then 'high'
  end


        """

run(query)
check(q_4_2 = query)

|   | num_learners_bucket | min_nps | max_nps | avg_nps |
|---|---|---|---|---|
| 0 | high | 41 | 89 | 68.44 |
| 1 | low | 56 | 99 | 78.24 |
| 2 | medium | 40 | 89 | 67.47 |


-------------------
Your SQL query is correct!


Aha! You notice something very interesting right away. Courses with fewer than 30 learners have a much higher average nps (78.24) than the medium (67.47) and high (68.44) buckets. Even their lowest nps (56) is well above the minimums of the other buckets (40 and 41). That's great, but the premise of CoRise is to scale a high-quality learning experience to as many learners as possible, so we would like to see the medium and high buckets improve too. The head of content decides to explore, offline, which aspects of the platform could be improved to keep quality high at large scale.

While the head of content explores offline how to improve courses with high registration, you have been joining the head of course content in feedback interviews with learners. The positive interviews always seem to have one of two flavors. Some learners preferred courses that ran longer and dug deeper into an area, while others preferred courses that were quick and exposed them to a new area. You decide to investigate whether nps correlates with the combination of course length and course level.
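Grouping by two expressions produces one row per combination of their values. Two durations times two levels gives the four rows the next question expects. The sketch below assumes SQLite and uses made-up rows. If you ever get a fifth row, check for NULL `num_weeks` or unexpected `course_level` values. With an explicit `when num_weeks >= 4` branch and no `ELSE`, a NULL week count produces a NULL `course_duration` group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table courses (num_weeks integer, course_level text, nps integer)")
conn.executemany(
    "insert into courses values (?, ?, ?)",
    [  # made-up toy rows, not the real data
        (2, "A", 60), (8, "A", 80), (10, "A", 70),
        (1, "B", 90), (3, "B", 70), (6, "B", 50),
    ],
)

# One output row per (course_duration, course_level) pair.
rows = conn.execute("""
    select
      case when num_weeks < 4 then 'short' else 'long' end as course_duration,
      course_level,
      avg(nps) as avg_nps
    from courses
    group by course_duration, course_level
    order by course_duration, course_level
""").fetchall()
print(rows)  # [('long', 'A', 75.0), ('long', 'B', 50.0), ('short', 'A', 60.0), ('short', 'B', 80.0)]
```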

In [None]:
### Question: q_4_3
### Let's designate courses shorter than 4 weeks as 'short'
### and all the others 'long' and call this course_duration.
### Let's get the average nps for each course_level and course_duration combination.
### Note there should be four rows in your output given that
### we have two values for course_duration and two values for course_level.
### Output columns: course_duration, course_level, avg_nps
query = """

SELECT
  case
    when num_weeks < 4 then 'short'
    when num_weeks >= 4 then 'long'
  end as course_duration,
  course_level,
  avg(nps) as avg_nps
FROM
  courses
GROUP BY
  case
    when num_weeks < 4 then 'short'
    when num_weeks >= 4 then 'long'
  end,
  course_level


        """

run(query)
check(q_4_3 = query)

|   | course_duration | course_level | avg_nps |
|---|---|---|---|
| 0 | long | A | 73.63 |
| 1 | long | B | 64.0 |
| 2 | short | A | 65.52 |
| 3 | short | B | 75.26 |


-------------------
Your SQL query is correct!


Aha! Indeed, advanced courses seem to work better when they run longer (average nps of 73.63 for long vs. 65.52 for short), perhaps because learners feel they can dig into the subject. Conversely, basic courses score higher when they are short (75.26 vs. 64.0). A quickstart or crash course introduces learners to a new area, and they can come back for advanced courses after quickly digesting the basics. A great analysis on your part, and a big step toward understanding the course data!

# **Conclusion:** That was some fantastic work! Your SQL skills are really getting polished, and you're now ready for the tests ahead of you involving multiple tables.