# Assignment - Relational Databases and SQL Practice

This assignment is a part of the [Zero to Data Science Bootcamp by Jovian](https://zerotodatascience.com)

As you go through this notebook, you will find a **???** in certain places. Your job is to replace the **???** with appropriate code or values, to ensure that the notebook runs properly end-to-end and your SQL queries produce the expected results without errors.

**Guidelines**

1. Make sure to run all the code cells in order. Otherwise, you may get errors like `NameError` for undefined variables.
2. Do not change variable names, delete cells, or disturb other existing code. It may cause problems during evaluation.
3. In some cases, you may need to add some code cells or new statements before or after the line of code containing the **???**. 
4. Since you'll be using a temporary online service for code execution, save your work by running `jovian.commit` at regular intervals.
5. Review the "Evaluation Criteria" for the assignment carefully and make sure your submission meets all the criteria.
6. Questions marked **(Optional)** will not be considered for evaluation and can be skipped. They are for your learning.
7. It's okay to ask for help & discuss ideas on the [Slack Group](https://zerotodatascience.com), but please don't post full working code, to give everyone an opportunity to solve the assignment on their own.


**Important Links**:

- Make a submission here: TODO
- Review the following notebooks:
    - TODO
    - TODO




## How to Run the Code and Save Your Work


**Option 1: Running using free online resources (1-click, recommended):** The easiest way to start executing the code is to click the **Run** button at the top of this page and select **Run on Binder**. This will set up a cloud-based Jupyter notebook server and allow you to modify/execute the code.


**Option 2: Running on your computer locally:** To run the code on your computer locally, you'll need to set up [Python](https://www.python.org), download the notebook and install the required libraries. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.

**Saving your work**: You can save a snapshot of the assignment to your [Jovian](https://jovian.ai) profile, so that you can access it later and continue your work. Keep saving your work by running `jovian.commit` from time to time.

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
jovian.commit(project='sql-practice-assignment', privacy='secret')


[jovian] Updating notebook "aakashns/sql-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/aakashns/sql-practice-assignment


'https://jovian.ai/aakashns/sql-practice-assignment'

## SQLite and Initial Setup

Relational databases generally have two components:

1. **Database Server/Engine**: A software package that manages databases and runs in the background, listening for SQL queries from authorized users, e.g. MySQL Server, Microsoft SQL Server, Postgres etc.
2. **Database Client**: A command-line tool or graphical user interface (GUI) to connect to the database server and run SQL queries, e.g. MySQL Workbench, pgAdmin etc.

The server and client can be on the same computer e.g. both on your laptop, or on different computers e.g. the database server can be running on the cloud and you can connect to it using a client installed on your computer.

Most database servers/engines are designed to operate on databases containing large amounts of data (e.g. 100s of GBs) and to handle a very high volume of queries (e.g. thousands of queries per second). They typically require powerful hardware, such as multi-core CPUs and large amounts of RAM.

In this assignment, however, we'll use a lightweight database engine called [SQLite](https://www.sqlite.org/index.html), which is well-suited for tiny databases with small amounts of data. Despite being limited in its capabilities, it is the [most widely used database engine in the world](https://www.sqlite.org/mostdeployed.html) because it is used by smartphone apps, web browsers and desktop applications to store and manage data locally on the device. 

If you're running this assignment locally, you'll need to [download and install `sqlite3`](https://www.servermania.com/kb/articles/install-sqlite/) on your computer. `sqlite3` is already installed on Binder. You can verify that you have `sqlite3` installed by running:

In [4]:
!sqlite3 --version

3.36.0 2021-06-18 18:36:39 5c9a6c06871cb9fe42814af9c039eb6da5427a6ec28f187af7ebfb62eafa66e5
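The `sqlite3` command-line tool and Python's built-in `sqlite3` module are installed separately, so the SQLite versions they use can differ. As a small optional check, the Python module reports the version of the SQLite library it is linked against:

```python
import sqlite3

# Version string of the SQLite library used by Python's sqlite3 module
print(sqlite3.sqlite_version)

# The same version as a tuple of integers, convenient for feature checks
major, minor, patch = sqlite3.sqlite_version_info
print(major, minor, patch)
```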


Unlike other relational databases, SQLite doesn't have separate server and client packages. The `sqlite3` command-line tool is all you need to create and interact with SQLite databases. Each database is stored as a single file, conventionally with an extension such as `.sqlite` or `.db` (this assignment uses `.sqlite`). You can perform CRUD (create, read, update, delete) operations on the database simply by passing SQL queries to `sqlite3`.
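Because SQLite runs inside the calling program, Python's built-in `sqlite3` module can also create and query a database without any server. The sketch below performs each CRUD operation on a small, made-up `Note` table (a hypothetical example, not part of Chinook). It uses an in-memory database (`':memory:'`); passing a filename such as `'example.sqlite'` instead would store the data in a file.

```python
import sqlite3

# ':memory:' creates a temporary database that exists only inside this process
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Create: define a table and insert two rows (hypothetical example data)
cur.execute("CREATE TABLE Note (NoteId INTEGER PRIMARY KEY, Body TEXT NOT NULL)")
cur.executemany("INSERT INTO Note (Body) VALUES (?)", [("first",), ("second",)])

# Update one row and delete the other
cur.execute("UPDATE Note SET Body = ? WHERE NoteId = ?", ("first (edited)", 1))
cur.execute("DELETE FROM Note WHERE NoteId = ?", (2,))
conn.commit()

# Read: fetch whatever is left
rows = cur.execute("SELECT NoteId, Body FROM Note").fetchall()
print(rows)  # [(1, 'first (edited)')]
conn.close()
```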

Here's a visual representation of how SQLite differs from other relational database servers ([source](https://devopedia.org/sqlite)):

<img src="https://i.imgur.com/eC5Ieni.png" width="640">


Note that a `.sqlite` file is different from a `.sql` file, which is a plain-text file containing SQL commands for creating tables and inserting data. The `.sqlite` file is the actual database itself, where the data is stored in a binary format designed for efficient querying.
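To make the distinction concrete, here is a minimal sketch that treats a short SQL script as the contents of a `.sql` file and runs it with `executescript` to produce a binary `.sqlite` file. The script contents and the `genres.sqlite` filename are hypothetical; the real Chinook `.sql` script is much larger but works the same way.

```python
import os
import sqlite3
import tempfile

# Plain-text SQL, as it might appear in a .sql file (hypothetical contents)
sql_script = """
CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Genre (Name) VALUES ('Rock');
INSERT INTO Genre (Name) VALUES ('Jazz');
"""

# Running the script against a new file builds a binary .sqlite database
db_path = os.path.join(tempfile.mkdtemp(), 'genres.sqlite')
conn = sqlite3.connect(db_path)
conn.executescript(sql_script)
conn.commit()
conn.close()

# Reopen the .sqlite file: the data is already there, no script needed
conn = sqlite3.connect(db_path)
genres = conn.execute("SELECT Name FROM Genre ORDER BY GenreId").fetchall()
conn.close()
print(genres)  # [('Rock',), ('Jazz',)]
```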

In this assignment, we'll use the [Chinook open source database](https://github.com/lerocha/chinook-database). Let's begin by downloading the `.sqlite` file for the database containing all the required tables and the sample data. 

In [5]:
from urllib.request import urlretrieve

In [6]:
db_url = 'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite'

In [7]:
urlretrieve(db_url, 'chinook.sqlite')

('chinook.sqlite', <http.client.HTTPMessage at 0x7f83b6c26320>)
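As an optional sanity check (not part of the original assignment), you can confirm that a downloaded file really is an SQLite database: every SQLite 3 database file begins with the 16-byte header `SQLite format 3` followed by a null byte. If a URL pointed at a web page instead of the raw file, you would get HTML instead. The helper name `is_sqlite_file` is hypothetical, and the demo uses temporary files so it runs anywhere.

```python
import os
import sqlite3
import tempfile

SQLITE_HEADER = b'SQLite format 3\x00'  # first 16 bytes of every SQLite 3 database file

def is_sqlite_file(path):
    """Return True if the file at `path` starts with the SQLite 3 header."""
    with open(path, 'rb') as f:
        return f.read(len(SQLITE_HEADER)) == SQLITE_HEADER

# Demo: a real database file vs. an HTML page saved with a .sqlite name
tmp = tempfile.mkdtemp()
db_path = os.path.join(tmp, 'demo.sqlite')
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE t (x INTEGER)")  # forces SQLite to write the file header
conn.commit()
conn.close()

html_path = os.path.join(tmp, 'not_a_db.sqlite')
with open(html_path, 'w') as f:
    f.write('<html>Not Found</html>')

print(is_sqlite_file(db_path))    # True
print(is_sqlite_file(html_path))  # False
```

In the notebook, `is_sqlite_file('chinook.sqlite')` should return `True` after the download above.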

To access and interact with the database using SQL queries directly within Jupyter, we'll use the `ipython-sql` library that provides the `%%sql` magic commands.

In [8]:
!pip install sqlalchemy ipython-sql --quiet --upgrade

In [9]:
%load_ext sql

We can now connect to the database using a connection string.

In [18]:
%%sql 

sqlite:///chinook.sqlite

We are now connected to the database and we can start writing SQL queries.

## Database Structure and Tables

The Chinook database represents a digital media store, including tables for artists, albums, media tracks, invoices and customers. Here's the Entity Relationship Diagram (ERD) showing the structure of the Chinook database:

![](https://i.imgur.com/X1wM142.png)
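If you want to check which tables exist without the diagram, SQLite records every table in its built-in `sqlite_master` table. A `%%sql` cell containing `SELECT name FROM sqlite_master WHERE type = 'table'` does this in the notebook; the Python sketch below does the same through the standard `sqlite3` module. The helper `list_tables` is a hypothetical name, and the demo builds two Chinook-like tables in memory rather than opening the real file.

```python
import sqlite3

def list_tables(conn):
    """Return the names of user-defined tables in an SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip SQLite's internal tables
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Demo on a throwaway in-memory database with two Chinook-like tables
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, "
    "ArtistId INTEGER REFERENCES Artist (ArtistId))"
)
print(list_tables(conn))  # ['Album', 'Artist']
```

With `conn = sqlite3.connect('chinook.sqlite')`, the same function should list the tables shown in the ERD.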

Let's begin by looking at the data from some of the tables in the database. We can write SQL queries directly within Jupyter code cells by including the magic command `%%sql` as the first line of the cell, indicating that the contents of the cell represent a SQL query.
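Outside Jupyter, a rough equivalent of a `%%sql` cell is to execute the query with Python's `sqlite3` module and read the column names from `cursor.description`. The sketch below is an illustration under that assumption, not how `ipython-sql` is implemented; `run_query` is a hypothetical helper, and the sample rows are copied from the `Artist` output shown below.

```python
import sqlite3

def run_query(conn, sql, params=()):
    """Execute `sql` and return (column_names, rows), similar to what a %%sql cell displays."""
    cur = conn.execute(sql, params)
    columns = [desc[0] for desc in cur.description]  # first item of each entry is the column name
    return columns, cur.fetchall()

# Demo data: a few Artist rows in an in-memory database
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Artist (Name) VALUES (?)",
    [("AC/DC",), ("Accept",), ("Aerosmith",)],
)

columns, rows = run_query(conn, "SELECT * FROM Artist ORDER BY ArtistId LIMIT 2")
print(columns)  # ['ArtistId', 'Name']
print(rows)     # [(1, 'AC/DC'), (2, 'Accept')]
```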

### Artist

In [44]:
%%sql 

SELECT * FROM Artist LIMIT 5

   sqlite:///chinook.db
 * sqlite:///chinook.sqlite
Done.


ArtistId,Name
1,AC/DC
2,Accept
3,Aerosmith
4,Alanis Morissette
5,Alice In Chains


### Album

In [20]:
%%sql 

SELECT * FROM Album LIMIT 5

   sqlite:///chinook.db
 * sqlite:///chinook.sqlite
Done.


AlbumId,Title,ArtistId
1,For Those About To Rock We Salute You,1
2,Balls to the Wall,2
3,Restless and Wild,2
4,Let There Be Rock,1
5,Big Ones,3


### Track


> **QUESTION 1**: Write a SQL query to show the first 10 rows from the table `Track`, sorted alphabetically by track name. Replace the `???` in the cell below with your answer.

In [None]:
%%sql

???

In [63]:
# DON'T MODIFY OR DELETE THIS CELL!
ans1 = _

(OPTIONAL) Write a SQL query to show the _next 10_ rows from `Track`.

(OPTIONAL) Write SQL queries in the cells below to explore the first few rows of each table in the database.

Let's save our work before continuing.

In [None]:
jovian.commit()


> **QUESTION 3**: ???

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY THIS CELL! IT IS USED FOR EVALUATION.
ans3 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 4**: ???

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY THIS CELL! IT IS USED FOR EVALUATION.
ans4 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 5**: ???

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY THIS CELL! IT IS USED FOR EVALUATION.
ans5 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 6**: 

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY THIS CELL! IT IS USED FOR EVALUATION.
ans6 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 7**: Write a SQL query to list the largest single invoice generated for every customer in the year 2012, ordered by the invoice total (highest to lowest).

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY OR MOVE THIS CELL! IT IS USED FOR EVALUATION.
ans7 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 8**: Write a SQL query to show the total number of albums and average number of tracks per album for every artist. The result should include the artist ID, artist's name, total albums and average number of tracks per album.

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY OR MOVE THIS CELL! IT IS USED FOR EVALUATION.
ans8 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 9**: Write a SQL query

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY OR MOVE THIS CELL! IT IS USED FOR EVALUATION.
ans9 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

> **QUESTION 10**: Write a SQL query to display the top 10 highest selling tracks in 2012. The result should contain the track ID, track name and the number of sales of the track in 2012.

In [None]:
%%sql

???

In [None]:
# DON'T MODIFY OR MOVE THIS CELL! IT IS USED FOR EVALUATION.
ans10 = _

Let's save our work before continuing.

In [None]:
jovian.commit()

In [31]:
%sql SELECT * FROM ARTIST LIMIT 10

   sqlite:///chinook.db
 * sqlite:///chinook.sqlite
Done.


ArtistId,Name
1,AC/DC
2,Accept
3,Aerosmith
4,Alanis Morissette
5,Alice In Chains
6,Antônio Carlos Jobim
7,Apocalyptica
8,Audioslave
9,BackBeat
10,Billy Cobham


In [42]:
%sql SELECT COUNT(*) FROM Track

   sqlite:///chinook.db
 * sqlite:///chinook.sqlite
Done.


COUNT(*)
3503


In [28]:
%sql SELECT * FROM TRACK LIMIT 10

   sqlite:///chinook.db
 * sqlite:///chinook.sqlite
Done.


TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice
1,For Those About To Rock (We Salute You),1,1,1,"Angus Young, Malcolm Young, Brian Johnson",343719,11170334,0.99
2,Balls to the Wall,2,2,1,,342562,5510424,0.99
3,Fast As a Shark,3,2,1,"F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman",230619,3990994,0.99
4,Restless and Wild,3,2,1,"F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman",252051,4331779,0.99
5,Princess of the Dawn,3,2,1,Deaffy & R.A. Smith-Diesel,375418,6290521,0.99
6,Put The Finger On You,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",205662,6713451,0.99
7,Let's Get It Up,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",233926,7636561,0.99
8,Inject The Venom,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",210834,6852860,0.99
9,Snowballed,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",203102,6599424,0.99
10,Evil Walks,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",263497,8611245,0.99


> **QUESTION 11**: Write SQL queries to insert the following records into the database:
> 
> 1. A new artist called "Linkin Park"
> 2. Two new albums:
>     1. Hybrid Theory
>     2. Meteora
> 3. Six new tracks:
>     1. Papercut
>     2. In The End
>     3. Crawling
>     4. Somewhere I Belong
>     5. Numb
>     6. Breaking the Habit
>
> *Hint*: You need not provide a value for the ID (primary key) columns while inserting these rows. The ID columns are integer primary keys, and SQLite automatically assigns the next available integer to an `INTEGER PRIMARY KEY` column when no value is given, much like [AUTO INCREMENT](https://www.w3schools.com/sql/sql_autoincrement.asp) in other databases.
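To see the hint in action, the sketch below uses a hypothetical `Band` table (not part of Chinook) whose `BandId` column is declared `INTEGER PRIMARY KEY`. Omitting the ID in the `INSERT` lets SQLite choose the next integer, and `cursor.lastrowid` reports the value it picked, which can be handy when a later insert needs that ID as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# An INTEGER PRIMARY KEY column is an alias for SQLite's rowid,
# so it is filled in automatically when no value is supplied
conn.execute("CREATE TABLE Band (BandId INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

cur = conn.execute("INSERT INTO Band (Name) VALUES (?)", ("Example Band",))
first_id = cur.lastrowid   # ID assigned to the first row

cur = conn.execute("INSERT INTO Band (Name) VALUES (?)", ("Another Band",))
second_id = cur.lastrowid  # next available integer

print(first_id, second_id)  # 1 2
```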

Here's the query to insert a new artist:

In [None]:
%%sql

INSERT INTO Artist (Name) VALUES ('Linkin Park')

Write the query to insert the new albums below:

In [None]:
%%sql

???

Write the query to insert the new tracks below:

In [None]:
%%sql

???

Make sure to insert exactly one copy of each of the above records.  
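Unless a column has a `UNIQUE` constraint, re-running an `INSERT` cell silently adds another copy of the same rows with a new ID. One way to spot such duplicates is to group by the value and keep only groups with more than one row; in this notebook, a `%%sql` cell such as `SELECT Name, COUNT(*) FROM Artist GROUP BY Name HAVING COUNT(*) > 1` would do it. The Python sketch below shows the same idea on a hypothetical `Band` table; `find_duplicates` is an illustrative helper, not part of any library.

```python
import sqlite3

def find_duplicates(conn, table, column):
    """Return (value, count) pairs for values of `column` that occur more than once in `table`.

    `table` and `column` are interpolated directly into the SQL, so pass only trusted names.
    """
    sql = f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE Band (BandId INTEGER PRIMARY KEY, Name TEXT)")

# Simulate accidentally running the same INSERT cell twice for "Band A"
for name in ["Band A", "Band B", "Band A"]:
    conn.execute("INSERT INTO Band (Name) VALUES (?)", (name,))

print(find_duplicates(conn, "Band", "Name"))  # [('Band A', 2)]
```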

If the records were inserted properly, you should be able to retrieve them back using the following queries.

In [None]:
%%sql

SELECT * FROM Artist WHERE