<h1> Data Science Bootcamp Module 3 - Structured Query Language </h1>
<hr>

<p>
    Throughout this module you will learn to work with the programming language SQL. <br>
    To start working with SQLite in VSCode, first install the SQLite extension using the extension tab on the left. <br>
    You will be working with a simple database consisting of three tables, which have the following columns:
    <ol>
        <li>
            <b><u>customers</u></b>
            <ul>
                <li> <b>id</b>: The unique ID reference of the customer. </li>
                <li> <b>firstName</b>: The first name of the customer. </li>
                <li> <b>lastName</b>: The last name of the customer. </li>
                <li> <b>address</b>: The street name and house number of the customer. </li>
            </ul>
        </li>
        <br>
        <li>
            <b><u>products</u></b>
            <ul>
                <li> <b>id</b>: The unique ID reference of the product. </li>
                <li> <b>name</b>: The name of the product, as displayed to the customer. </li>
                <li> <b>price</b>: The purchase price of the product. </li>
                <li> <b>stock</b>: The number of units of the product currently in stock. </li>
            </ul>
        </li>
        <br>
        <li>
            <b><u>orders</u></b>
            <ul>
                <li><b>id</b>: The unique ID reference of the order. </li>
                <li><b>customerId</b>: The ID reference of the customer that placed the order. </li>
                <li><b>productId</b>: The ID reference of the product the order is placed for. </li>
                <li><b>date</b>: The date the order is placed on. </li>
                <li><b>quantity</b>: The amount of the given product purchased. </li>
            </ul>
        </li>
    </ol>
</p>

<hr>
<h3>A. Import libraries and create database. </h3>
<p>
Before starting to write your own SQL queries, we first have to set up the environment. <br>
First, make sure that you selected the correct kernel on the top right of the screen. <br>
This will allow you to import the libraries to execute the code below. <br>
<br>
As we will work with some pre-written code, we will have to manage our references to other directories. <br>
When you run this file it will be run from within the current directory, namely: <i>datacademy_demo.Modules.M3_SQL.src</i>. <br>
From this directory we cannot reach the <i>datacademy_demo.Modules.M3_SQL.libs</i> directory, which we need for the pre-written code. <br>
To solve this we change our so-called <i>working directory</i>, using a command-line style command. <br>
The first line in the code cell below does this by running <code>%cd ../../..</code>. <br>
<code>%cd</code> is an IPython magic command that changes the working directory, and every <code>..</code> moves one directory up in the directory tree. <br>
One up and we are in <i>M3_SQL</i>, another one and we are in <i>Modules</i> and after the third we are in the main directory <i>datacademy_demo</i>. <br>
Going up only one directory would already let us use the <i>libs</i>, but it is good practice to work from within the main directory. <br>
<br>
Now that we have explained the directory referencing, we can start importing the database code from the <i>libs</i> directory. <br>
As we work from the main directory (<i>datacademy_demo</i>) we have to import this file using <code>Modules.M3_SQL.libs.database</code>. <br>
From this file we will import a class named <code>Database</code>; creating an instance of it constructs a new database and returns a database object. <br>
We call this object <code>db</code>, and you will use it to write and execute queries. <br>
To run a query you simply execute the code: <code>db.execute_query(<i>[YOUR QUERY HERE]</i>)</code>.
</p>
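<p>
The directory walk described above can be traced in plain Python. This is only a sketch: it assumes the notebook starts in <i>datacademy_demo/Modules/M3_SQL/src</i>, as stated in the text, and the helper <code>go_up</code> is a hypothetical name introduced for illustration. It does not change your actual working directory.
</p>

```python
from pathlib import PurePosixPath

# Starting directory of the notebook, as described in the text (assumption).
start = PurePosixPath("datacademy_demo/Modules/M3_SQL/src")


def go_up(path: PurePosixPath, levels: int) -> PurePosixPath:
    """Return the directory reached after applying '..' `levels` times."""
    for _ in range(levels):
        path = path.parent
    return path


# Print the directory reached after one, two and three steps up.
for levels in range(1, 4):
    print(levels, go_up(start, levels))
```

<p>
After three steps the path ends at the main directory, which is why the import in the next cell starts at <code>Modules</code>.
</p>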

In [None]:
%cd ../../..
from Modules.M3_SQL.libs.database import Database
db = Database()

<hr>
<h3> B. Basic operations </h3>
<p>
First we will explore the basic operations, namely SELECT, FROM, WHERE, LIMIT and ORDER BY. <br>
These operations form the base of most queries that you will write in the future, as they are most fundamental to the retrieval of data. <br>
The operations and their functionality will be listed below:
<ul>
    <li> <b>SELECT</b> - Defines the column values that you want to retrieve. </li>
    <li> <b>FROM</b> - Defines the table from which these values need to be retrieved. </li>
    <li> <b>WHERE</b> - Defines the condition(s) that will affect which rows are returned. </li>
    <li> <b>LIMIT</b> - Defines the number of rows that are returned. </li>
    <li> <b>ORDER BY</b> - Defines the way the records that are returned will be sorted. </li>
</ul>
In this section you will explore these operators using the database you just created. <br>
The tables and their columns are described at the beginning of this notebook; refer back to that description throughout the exercises below.
</p>
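<p>
To see how these operators combine, here is a minimal sketch that uses Python's built-in <code>sqlite3</code> module on a small in-memory table. The <code>books</code> table and its data are hypothetical and unrelated to the exercise database, and the sketch does not use the course's <code>Database</code> class. The second query also shows the <code>LIKE</code> pattern match, which is a common way to test whether a text value starts with a given letter.
</p>

```python
import sqlite3

# Hypothetical toy table, unrelated to the exercise database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER)")
conn.executemany(
    "INSERT INTO books (title, pages) VALUES (?, ?)",
    [("Atlas", 320), ("Birds", 150), ("Caves", 410), ("Dunes", 90)],
)

# SELECT picks the columns, FROM names the table, WHERE filters rows,
# ORDER BY sorts what is left and LIMIT caps the number of rows returned.
longest = conn.execute(
    """
    SELECT title, pages
    FROM books
    WHERE pages > 100
    ORDER BY pages DESC
    LIMIT 2
    """
).fetchall()
print(longest)  # [('Caves', 410), ('Atlas', 320)]

# LIKE with '%' matches any sequence of characters after the given prefix.
starts_with_b = conn.execute(
    "SELECT title FROM books WHERE title LIKE 'B%'"
).fetchall()
print(starts_with_b)  # [('Birds',)]
```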

<p>
The first queries you will write are simple retrieval queries. They all use the <b> SELECT </b> and <b> FROM </b> operators; some also need <b>WHERE</b>, <b>ORDER BY</b> or <b>LIMIT</b>. <br>
Write the following queries:
<ol>
    <li> Retrieve all available information regarding products contained in the products table. </li>
    <li> Retrieve all rows of the orders table that have a quantity of 5 or more products, showing the customerId, productId and quantity. </li>
    <li> Retrieve first and last name of all customers with a first name that starts with a "J". </li>
    <li> Retrieve all available information of the top 6 largest orders in terms of quantity. </li>
    <li> Retrieve only the name and price of the cheapest product in the products table. </li>
</ol>
</p>

In [None]:
Q_B1 = ""

db.execute_query(query=Q_B1, return_df=True, exercise="B1")

In [None]:
Q_B2 = ""

db.execute_query(query=Q_B2, return_df=True, exercise="B2")

In [None]:
Q_B3 = ""

db.execute_query(query=Q_B3, return_df=True, exercise="B3")

In [None]:
Q_B4 = ""

db.execute_query(query=Q_B4, return_df=True, exercise="B4")

In [None]:
Q_B5 = ""

db.execute_query(query=Q_B5, return_df=True, exercise="B5")

<hr>
<h3> C. Database operations </h3>
<p>
Besides basic read operations, many other database operations are available. <br>
These include create, update and delete operations. <br>
They form the foundation for every database developer and/or administrator. <br>
Once you understand them, you will be able to create, access and manipulate databases. <br>
The operations and their functionality are listed below:
<ul>
    <li> <b>CREATE TABLE</b> - Creates a table with the given table name. </li>
    <li> <b>INSERT INTO</b> - Allows you to add new records to the database table. </li>
    <li> <b>UPDATE</b> - Allows you to adjust data in existing records within the database table. </li>
    <li> <b>DELETE FROM</b> - Allows you to delete one or multiple data records from the database table. </li>
    <li> <b>DROP TABLE</b> - Allows you to drop an entire table from the database. </li>
</ul>
</p>
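<p>
The five operations can be tried out end to end with Python's built-in <code>sqlite3</code> module. The sketch below is an illustration under assumptions: the <code>authors</code> and <code>notes</code> tables are hypothetical, and the column syntax shown is SQLite's, which is the dialect this notebook appears to use.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# CREATE TABLE: an auto-incrementing primary key and a foreign key reference.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute(
    """
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        authorId INTEGER,
        body TEXT,
        FOREIGN KEY (authorId) REFERENCES authors (id)
    )
    """
)

# INSERT INTO: the id column is filled in automatically.
conn.execute("INSERT INTO authors (name) VALUES ('Ada')")
conn.execute("INSERT INTO notes (authorId, body) VALUES (1, 'draft')")
conn.execute("INSERT INTO notes (authorId, body) VALUES (1, 'todo')")

# UPDATE and DELETE FROM: the WHERE clause decides which rows are touched.
conn.execute("UPDATE notes SET body = 'final' WHERE id = 1")
conn.execute("DELETE FROM notes WHERE body = 'todo'")
remaining = conn.execute("SELECT id, authorId, body FROM notes").fetchall()
print(remaining)  # [(1, 1, 'final')]

# DROP TABLE: removes the table itself, not just its rows.
conn.execute("DROP TABLE notes")
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print("notes" in tables)  # False
```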

<h7> <b> -- C1. CREATE TABLE -- </b> </h7>
<p>
Write a query that creates a new table called 'campaigns', which contains all marketing campaigns of different products. <br>
The table should consist of the following columns and their corresponding data types:
<ol>
    <li><b>id</b> - Integer, Primary Key, Auto Increment</li>
    <li><b>productId</b> - Integer, Foreign Key (reference: products.id)</li>
    <li><b>campaignStart</b> - Timestamp </li>
    <li><b>campaignEnd</b> - Timestamp </li>
    <li><b>discount</b> - Float </li>
</ol>
</p>

In [None]:
Q_C1_1 = """ """

db.execute_query(query=Q_C1_1)

<h7> <b> -- C2. INSERT INTO -- </b> </h7>
<p>
Now it is time to populate your newly created table. <br>
Please insert the following data into the campaigns table: <br>
<code> { <br>
    &nbsp; productId: 4, <br>
    &nbsp; campaignStart: '2022-01-01', <br>
    &nbsp; campaignEnd: '2022-04-08', <br>
    &nbsp; discount: 0.20 <br>
} </code> <br>

<code> { <br>
    &nbsp; productId: 6, <br>
    &nbsp; campaignStart: '2022-02-02', <br>
    &nbsp; campaignEnd: '2022-06-23', <br>
    &nbsp; discount: 0.15 <br>
} </code> <br>

<code> { <br>
    &nbsp; productId: 4, <br>
    &nbsp; campaignStart: '2022-04-20', <br>
    &nbsp; campaignEnd: '2022-07-15', <br>
    &nbsp; discount: 0.30 <br>
} </code> <br>

<code> { <br>
    &nbsp; productId: 7, <br>
    &nbsp; campaignStart: '2022-10-20', <br>
    &nbsp; campaignEnd: '2022-12-31', <br>
    &nbsp; discount: 0.125 <br>
} </code> <br>

</p>

In [None]:
Q_C2_1 = ""

db.execute_query(query=Q_C2_1)

In [None]:
Q_C2_2 = ""

db.execute_query(Q_C2_2)

In [None]:
Q_C2_3 = ""

db.execute_query(Q_C2_3)

In [None]:
Q_C2_4 = ""

db.execute_query(Q_C2_4)

<h7> <b> -- C3. UPDATE -- </b> </h7>
<p>
We actually made some mistakes when inserting the campaigns into the database. <br>
Make the following adjustments to the database records: <br>
<ol>
    <li> For the campaign with id 1, change the start date from '2022-01-01' to '2022-01-25'. </li>
    <li> For the campaign(s) with end date '2022-06-23', change the start date to '2022-01-01' and the end date to '2022-12-31'. </li>
    <li> For the campaign(s) concerning productId 4, change the discount to 0.25. </li>
</ol>
</p>

In [None]:
Q_C3_1 = ""

db.execute_query(Q_C3_1)

In [None]:
Q_C3_2 = ""

db.execute_query(Q_C3_2)

In [None]:
Q_C3_3 = ""

db.execute_query(Q_C3_3)

<h7> <b> -- C4. DELETE FROM -- </b> </h7>
<p>
Records can be deleted in the same way as the previous database mutations, using the WHERE clause. <br>
Write the queries executing the following behavior:
<ol>
    <li> Remove all campaigns that concern productId 6. </li>
    <li> Remove all other campaigns, emptying the campaigns table. </li>
</ol>

</p>

In [None]:
Q_C4_1 = ""

db.execute_query(Q_C4_1)

In [None]:
Q_C4_2 = ""

db.execute_query(Q_C4_2)

<h3><b> CAUTION!</b></h3>
<p> 
The last query you wrote to remove all campaigns shows the danger of the <b> DELETE FROM </b> operator. <br>
When using this operator, make sure to <b>always include a WHERE condition</b>, as otherwise every row in the table is deleted. <br>
If such a query is executed without a condition on an actual database table, the data is removed and cannot be retrieved once the change has been committed.
</p>
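<p>
The effect of a missing condition can be observed safely on a throwaway table. This is a sketch using Python's built-in <code>sqlite3</code> module with a hypothetical <code>tasks</code> table; it relies on the module's default behavior of opening a transaction before a <code>DELETE</code>, so the change can be rolled back as long as it has not been committed. One cautious habit it illustrates is counting the rows a condition matches before deleting with that same condition.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, done INTEGER)")
conn.executemany("INSERT INTO tasks (done) VALUES (?)", [(1,), (0,), (1,), (0,)])
conn.commit()


def count_rows() -> int:
    """Return the current number of rows in the tasks table."""
    return conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]


# DELETE without WHERE matches every row in the table.
conn.execute("DELETE FROM tasks")
emptied = count_rows()   # 0

# Undo is only possible because the delete was not committed yet.
conn.rollback()
restored = count_rows()  # 4

# Safer habit: check how many rows the condition matches, then delete.
matching = conn.execute("SELECT COUNT(*) FROM tasks WHERE done = 1").fetchone()[0]
conn.execute("DELETE FROM tasks WHERE done = 1")
conn.commit()
left = count_rows()      # 2

print(emptied, restored, matching, left)
```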

<h7> <b> -- C5. DROP TABLE -- </b> </h7>
<p>
The campaigns table was only created to practice the database operators. <br>
For the following exercises we will clean up the database by dropping the campaigns table. <br>
Write a query that drops the table from the database, leaving only the customers, products and orders tables.
</p>

In [None]:
Q_C5_1 = ""

db.execute_query(Q_C5_1)

<hr>
<h3> D. Calculation operators </h3>
<p>
Next we will look into calculation operators, which extend the possibilities of what can be retrieved from the database. <br>
Instead of simply returning stored values with <b>SELECT</b>, the calculation operators perform calculations on the values that are retrieved. <br>
There are many calculation operators, but the most commonly used are MAX, MIN, SUM and COUNT. <br>
Together with the <b>GROUP BY</b> operator they let you write queries that deliver useful business insights. <br>
<b>WHERE</b> filters individual rows before they are grouped, whereas <b>HAVING</b> filters the groups afterwards and can therefore use the results of calculations. <br>
Write the following queries:
<ol>
    <li> Retrieve the total number of orders that are contained in the orders table. </li>
    <li> Return the product name and the total value stored in inventory (price * stock). </li>
    <li> Return the productId and the largest quantity ordered for each product (using <b>GROUP BY</b>). </li>
    <li> Return the productId and the number of different customers for each product (using <b>DISTINCT</b>), only returning products with two or more distinct customers. </li>
</ol>
</p>
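<p>
The difference between <b>WHERE</b> and <b>HAVING</b> is easiest to see in a small example. The sketch below uses Python's built-in <code>sqlite3</code> module and a hypothetical <code>visits</code> table that is not part of the exercise database.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("ann", "home", 30),
        ("bob", "home", 45),
        ("ann", "home", 10),
        ("ann", "docs", 120),
        ("cat", "blog", 60),
    ],
)

# COUNT(*) over the whole table returns a single number.
total = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(total)  # 5

per_page = conn.execute(
    """
    SELECT page, MAX(seconds), SUM(seconds), COUNT(DISTINCT visitor)
    FROM visits
    WHERE seconds >= 30        -- filters rows before grouping
    GROUP BY page
    HAVING SUM(seconds) > 70   -- filters groups after aggregation
    ORDER BY page
    """
).fetchall()
print(per_page)  # [('docs', 120, 120, 1), ('home', 45, 75, 2)]
```

<p>
The 10-second visit is dropped by <b>WHERE</b> before any sums are calculated, and the blog group is dropped by <b>HAVING</b> because its total of 60 seconds does not exceed 70.
</p>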


In [None]:
Q_D1 = ""

db.execute_query(query=Q_D1, return_df=True, exercise="D1")

In [None]:
Q_D2 = ""

db.execute_query(query=Q_D2, return_df=True, exercise="D2")

In [None]:
Q_D3 = ""

db.execute_query(query=Q_D3, return_df=True, exercise="D3")

In [None]:
Q_D4 = ""

db.execute_query(query=Q_D4, return_df=True, exercise="D4")

<hr>
<h3> E. Writing complex (multi-table) queries </h3>
<p>
To unlock the full extent of what SQL can do, we introduce the <b>JOIN</b> operator. <br>
JOIN allows you to query data from multiple tables, which enables you to write complex multi-table queries. <br>
In this section you will be asked to use both the <b>basic</b> and <b>calculation</b> operators combined with <b>JOIN</b>. <br>
<br>

Besides the <b>JOIN</b> operator, you can also bring in information from other tables using <b>sub-queries</b>. <br>
A <b>sub-query</b> is a query nested inside another query, whose result can be used in the condition of the outer query. <br>
Such queries can for example be used to retrieve all information of customers that placed at least one order with a quantity larger than 3. <br>
This information can be retrieved using the query: <br>
<code> SELECT * FROM customers WHERE customers.id IN (SELECT customerId FROM orders WHERE quantity > 3) </code>. <br>
<br> 

The queries you have to write will be formulated as requests from different departments. <br>
If you will be working with databases in the future, you will be faced with such query requests. <br>
Write the following queries:
<ol>
    <li> Sales wants to analyse which products were ordered 3 or more times. Retrieve only the product names. </li>
    <li> The marketing department asks for all first and last names of customers who ordered a "Desk" in the past. </li>
    <li> Upper management wants to gain insight in consumer behavior. Calculate the total spend per customer and display the first and last name together with the total spend in ascending order based on total spend. </li>
    <li> For our customer loyalty program we want to retrieve a list of first and last names of customers that made at least 2 orders in the past with an average order value above 250 euros. </li>
</ol>
</p>
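<p>
As an illustration of both techniques, the sketch below runs a <b>JOIN</b> with an aggregate and a sub-query against two small in-memory tables using Python's built-in <code>sqlite3</code> module. The <code>teams</code> and <code>players</code> tables are hypothetical and only mirror the parent/child relationship between the exercise tables.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE players (id INTEGER PRIMARY KEY, teamId INTEGER, name TEXT, goals INTEGER);
    INSERT INTO teams (id, name) VALUES (1, 'Reds'), (2, 'Blues'), (3, 'Greens');
    INSERT INTO players (teamId, name, goals) VALUES
        (1, 'Ali', 4), (1, 'Bea', 1), (2, 'Cas', 2), (2, 'Dee', 0);
    """
)

# JOIN: match every player to its team, then aggregate per team.
# Teams without players (Greens) do not appear in an inner join.
goals_per_team = conn.execute(
    """
    SELECT teams.name, SUM(players.goals) AS totalGoals
    FROM teams
    JOIN players ON players.teamId = teams.id
    GROUP BY teams.id, teams.name
    ORDER BY totalGoals ASC
    """
).fetchall()
print(goals_per_team)  # [('Blues', 2), ('Reds', 5)]

# Sub-query: teams with at least one player who scored more than 1 goal.
strong_teams = conn.execute(
    """
    SELECT name FROM teams
    WHERE id IN (SELECT teamId FROM players WHERE goals > 1)
    ORDER BY name
    """
).fetchall()
print(strong_teams)  # [('Blues',), ('Reds',)]
```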

In [None]:
Q_E1 = """ """

db.execute_query(query=Q_E1, return_df=True, exercise="E1")

In [None]:
Q_E2 = """ """

db.execute_query(query=Q_E2, return_df=True, exercise="E2")

In [None]:
Q_E3 = """ """

db.execute_query(query=Q_E3, return_df=True, exercise="E3")

In [None]:
Q_E4 = """ """

db.execute_query(query=Q_E4, return_df=True, exercise="E4")

<hr>
<h1> Congratulations!! </h1>
<p>
You completed all sections of this module and wrote all the requested queries! <br>
By running other queries you can validate whether the queries you have written are correct. <br>
However, as a final check, we have built in a mechanism that checks whether your queries show the requested behavior. <br>
To kick off these tests, you only have to push this directory to the main Git branch. <br>
This can be done using the following steps:
<ol>
    <li> Open the command prompt in VS Code by pressing <b>CTRL</b> + <b>`</b>. </li>
    <li> Write the command <code>git status</code> to check which files you have changed since your last commit. </li>
    <li> Write the command <code>git add .\Modules\M3_SQL\</code> to add your changes in the module directory, including the main notebook, to the staging area. </li>
    <li> Write the command <code>git commit -m "<i>[INSERT COMMIT MESSAGE]</i>"</code> to commit the staged changes and add a descriptive commit message. </li>
    <li> Finally, write the command <code>git push</code> to push the committed changes to the master branch. </li>
    <li> 
        Upon pushing the changes, the PyTest modules will be run to check your answers, for which an overview is generated on GitHub:
        <ol>
            <li> View the results by heading to GitHub.com, opening your forked repository and going to Actions. </li>
            <li> The first time you open the Actions page you may have to enable it; if so, press the "I understand my workflows, go ahead and enable them" button. </li>
            <li> Within the <b>"All workflows"</b> frame you will find a workflow run that has the same name as the commit message used to push your answers. </li>
            <li> 
                In front of the workflow run you will see one of three things:
                <ul>
                    <li> <b>A yellow circle</b> - Meaning that the tests are still running. </li>
                    <li> <b>A green check mark</b> - Meaning that all tests were successful and your code is written perfectly! </li>
                    <li> <b>A red cross</b> - Meaning that a mistake is found within your code. </li>
                </ul>
            </li>
            <li> You can open the workflow run to view the details by clicking on the name of the workflow run (which is the commit message you wrote). </li>
            <li> After opening the workflow run, you can go to the details by clicking the white box that contains the mark and the text <b>"run"</b>. </li>
            <li> If a red cross is shown, meaning there is a mistake, the mistake can be found by navigating to (and opening) the <b>"PyTest"</b> section. </li>
            <li> Below the <b>"____test_results____"</b> you can find the <b>"AssertionError:"</b> which shows you where in your code the (first) mistake can be found and what the mistake is. </li>
            <li> For example the code: <code>AssertionError: R_B3: ...</code> means that there is a mistake in Query B3. </li>
        </ol>
    </li>
    <li> If a mistake was found, you head back to your code, fix the mistake using the hint shown behind the <b>AssertionError</b> and push your answers again. This will kickstart another test round. </li>
</ol>
If you pass all tests, feel free to continue to our next module, <b>Module 4 : Machine Learning (ML)</b>.
</p>