# Introduction to SQL and Databases

## Useful Links
- [What is Database?](https://www.geeksforgeeks.org/what-is-database/)
- [Types of Databases: Relational, NoSQL, Cloud, Vector](https://www.datacamp.com/blog/types-of-databases-overview)
- [What’s the Difference Between an ACID and a BASE Database?](https://aws.amazon.com/compare/the-difference-between-acid-and-base-database/?nc1=h_ls)
- [SQLite](https://en.wikipedia.org/wiki/SQLite)
- [SQLite Tutorial](https://www.sqlitetutorial.net/)
- [Introduction to SQL JOINs](https://ugoproto.github.io/ugodoc/introduction_to_sql_joins/)
- [How to Work with SQLite in Python – A Handbook for Beginners](https://www.freecodecamp.org/news/work-with-sqlite-in-python-handbook)
- [Python SQLite](https://www.geeksforgeeks.org/python-sqlite/)
- [LeetCode](https://leetcode.com/problemset/all-code-essentials/)
- [All SQL Crash Course Lessons](https://sqlcrashcourse.com/lessons/)
- [Neo4j academy](https://graphacademy.neo4j.com/categories/beginners/)




## Introduction

### Understanding Databases
A <em>**database**</em> is an organized collection of data, typically stored electronically, that can include text, numbers, images, videos, and other types of files. Databases are managed using software called a <em>**Database Management System** (DBMS)</em>, which allows users to store, retrieve, and manipulate data. Databases are the backbone of modern applications, supporting businesses, organizations, and systems across industries. Most of the applications we use every day (such as WhatsApp, Gmail, and social media websites) rely on a database that stores user data, transactions, and all other required information.

### Key components of a Database
- **Data -** The information stored in a database, like numbers, text, images, videos, or documents, depending
  on the database’s purpose.
- **Schema -** The structure of the database. It defines how data is organized and includes details like tables, columns, data types, and relationships between entities. The schema ensures consistency and helps users understand how the database is designed.
- **DBMS** <em>(Database Management System)</em> **-** The software layer that enables interaction with the database. It manages the storage, retrieval, and manipulation of data while ensuring security and data integrity. The DBMS also handles tasks like backup, recovery, and query optimization to maintain the database’s performance. Examples of DBMS software include MySQL, PostgreSQL, Neo4j, and MongoDB.
- **Queries -** Commands used to interact with the database, allowing users to retrieve, manipulate, or update
  data. In relational databases, **SQL** <em>(Structured Query Language)</em> is commonly used, whereas Neo4j uses **Cypher** for querying graph data.
- **Users -** Individuals or applications that interact with the database. They can have different levels of access based on their roles, such as administrators, developers, or end-users. For example, a database administrator might have full control, including the ability to create or delete tables, while a regular user might only have permission to view specific data.

### Examples of types of Databases

Different types of databases exist to meet diverse use-case needs, driven by variations in data structure, access patterns, scalability, and consistency requirements. Relational databases are ideal for structured data with predefined schemas, common in traditional business applications. The rise of big data and real-time analytics exposed limitations of relational databases, leading to NoSQL databases, which offer flexibility and scalability for unstructured or semi-structured data. More recently, vector databases have emerged to support machine learning applications by efficiently handling high-dimensional vector data used in similarity searches.

- **Relational Databases -** Store data in **structured tables** with rows and columns and are managed by a relational database management system (RDBMS). Relational technology makes access to structured data flexible and efficient. Examples: MySQL, PostgreSQL, Oracle, Microsoft SQL Server.
- **NoSQL Databases** <em>(Not Only SQL)</em> **-** Designed to handle unstructured and **semi-structured data**. Examples: MongoDB, Cassandra, DynamoDB. 
    - **Graph Databases -** Focus on the relationships between data objects. They use **nodes** to represent data entities and **edges** to define relationships between them, enabling efficient traversal and retrieval of complex interconnected data. They excel at handling complex relationships, making them ideal for use cases such as social networks, recommendation systems, and network analysis. Examples: Neo4j, Amazon Neptune, Ultipa.
    - **Vector Databases -** Designed to store, index, and manage vector **embeddings**, which are high-dimensional data representations often used in machine learning models. These databases enable fast similarity search: they use indexing structures to efficiently find vectors that are "close" to a given query vector under distance metrics like cosine similarity or Euclidean distance. This makes them suitable for tasks such as image recognition, recommendation systems, and natural language processing. Examples: Faiss, Milvus, Qdrant, Chroma.

### Structured Query Language (SQL)
To interact with relational databases, we use <em>**Structured Query Language** (SQL)</em>. This powerful language enables us to query, insert, update, and delete data, as well as perform complex operations like joining data from multiple tables. The relational database systems that run SQL typically protect data integrity and consistency by executing transactions with the **ACID** properties:
- **Atomicity -** All operations within a transaction are treated as a single unit, ensuring that either all changes are committed or none are.
- **Consistency -** Data remains in a valid state throughout a transaction, adhering to predefined constraints and rules.
- **Isolation -** Transactions are executed independently as if they were the only operation happening on the database.
- **Durability -** Once a transaction is committed, its changes are permanent, even in the event of system failures.

#### What’s the difference between an ACID and a BASE database? 


**ACID** and **BASE** are database transaction models that determine how a database organizes and manipulates data. In the context of databases, a **transaction** is any operation that the database considers a single unit of work. A transaction must complete fully for the database to remain consistent. For example, when you transfer money from one bank account to another, the money must leave your account and must be added to the recipient's account. You cannot call the transaction complete without both steps occurring.
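
The bank transfer above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production pattern: the `accounts` table, its rows, and the `transfer` helper are invented for the example, and a `CHECK` constraint stands in for the rule that a balance may not go negative.

```python
import sqlite3

# In-memory database; the `accounts` table and its rows are invented for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, sender, recipient, amount):
    """Credit the recipient and debit the sender as one unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, recipient),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was saved.
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to `bob` has already been applied when the debit fails, yet the rollback undoes it, so the stored balances reflect only the first transfer.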

**ACID** databases prioritize consistency over availability: the whole transaction fails if an error occurs in any step within the transaction. In contrast, **BASE** databases prioritize availability over consistency. Instead of failing the transaction, users can access inconsistent data temporarily. Data consistency is achieved, but not immediately. Examples of **ACID** databases are MySQL, PostgreSQL, Neo4j, and MongoDB, whereas Elasticsearch is an example of a **BASE** database.


## Hands on SQLite

A SQL database, also known as a relational database, is a system that stores and organizes data into highly structured
tables of rows and columns. These databases offer Structured Query Language (SQL) to read and write the data, and are
categorized as relational database management systems (RDBMS). Examples of relational databases include MySQL and PostgreSQL.

### SQLite

**SQLite** is a free and open-source relational database engine written in the C programming language. It is not a standalone app; rather, it is a library that software developers embed in their apps. It is used by several of the top web browsers, operating systems, mobile phones, and other embedded systems.

Why use SQLite to start?
- **Open-source** software: no license is required to install or use it.
- **Serverless**: it doesn't need a separate server process or system to operate.
- **Cross-platform** DBMS that can run on all major platforms, including macOS, Windows, and Linux.
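
As a rough illustration of the serverless point, the sketch below uses Python's standard `sqlite3` module to create a database that is just a file on disk, with no server to start or connect to. The file name `school.db` and the `demo` table are placeholders chosen for this example.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name inside a throwaway temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "school.db")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist yet
conn.execute("CREATE TABLE demo (x INTEGER)")
conn.commit()
conn.close()

# The whole database lives in this single file.
file_exists = os.path.exists(db_path)
print(file_exists, sqlite3.sqlite_version)
```

Passing `":memory:"` instead of a path gives a temporary database that disappears when the connection closes, which is convenient for experiments.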

### Part 1: Getting Started with SQLite
Basic concepts, creating your first table, and understanding data types.

#### How to Create Database Tables
A table is where we’ll store our data, organized in rows (records) and columns (attributes). For this example, we’ll
create a table called `Students` to store information about students.

To create a table, we use SQL's `CREATE TABLE` statement. This command defines the table structure, including the column
names and the data types for each column.

SQL command to create a `Students` table with the following fields:
- `id`: A unique identifier for each student (an integer).
- `name`: The student's name (text).
- `age`: The student's age (an integer).
- `email`: The student's email address (text).

Column `id` is the `PRIMARY KEY` of the `Students` table. The `PRIMARY KEY` constraint uniquely identifies each record in a table. A table can have only one primary key, which can consist of a single column or of multiple columns (fields).

```sql
CREATE TABLE Students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  email TEXT
);
```
To prevent errors when a table already exists, `IF NOT EXISTS` ensures that it is only created if it hasn’t been defined
before.

~~~~sql
CREATE TABLE IF NOT EXISTS Students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  email TEXT
);
~~~~
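
One way to try this from Python, assuming the standard `sqlite3` module and an in-memory database, is sketched below. It runs the `IF NOT EXISTS` statement twice, shows that a plain `CREATE TABLE` fails once the table exists, and reads the column names back with `PRAGMA table_info`.

```python
import sqlite3

CREATE_STUDENTS = """
CREATE TABLE IF NOT EXISTS Students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  email TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_STUDENTS)
conn.execute(CREATE_STUDENTS)  # second run is a no-op thanks to IF NOT EXISTS

# Without IF NOT EXISTS the statement fails because the table already exists.
try:
    conn.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY)")
    plain_create_failed = False
except sqlite3.OperationalError:
    plain_create_failed = True

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
print(columns, plain_create_failed)
```
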
> 📝 **To do:** Create Students table

#### How to Insert Data into a Table
Now that we have our `Students` table created, it’s time to start inserting data into the database.

##### How to Insert a Single Record

SQL syntax for inserting a single record:

~~~sql
INSERT INTO Students (name, age, email)
VALUES ('John Doe', 20, 'johndoe@example.com');
~~~~

> 📝 **To do:** Insert the record representing yourself

##### How to Insert Multiple Records

SQL syntax for inserting multiple records:

~~~sql
INSERT INTO Students (name, age, email)
    VALUES
      ('Bahadurjit Sabharwal', 18, 'tristanupadhyay@example.net'),
      ('Zayyan Arya', 20, 'yashawinibhakta@example.org'),
      ('Hemani Shukla', 18, 'gaurikanarula@example.com');
~~~~
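
When the values come from a program rather than being typed by hand, they are usually passed separately from the SQL text. The sketch below assumes Python's `sqlite3` module and its `?` placeholders; it inserts the same sample rows as above into an in-memory copy of `Students`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL, age INTEGER, email TEXT)"
)

# Single record: values are passed separately from the SQL text.
cur = conn.execute(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    ("John Doe", 20, "johndoe@example.com"),
)
first_id = cur.lastrowid  # id generated for the new row

# Multiple records: one parameter tuple per row.
rows = [
    ("Bahadurjit Sabharwal", 18, "tristanupadhyay@example.net"),
    ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
    ("Hemani Shukla", 18, "gaurikanarula@example.com"),
]
conn.executemany("INSERT INTO Students (name, age, email) VALUES (?, ?, ?)", rows)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(first_id, total)
```

Placeholders let the driver handle quoting, which avoids broken statements (and SQL injection) when a value contains characters such as `'`.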

> 📝 **To do:** Insert records representing a couple of your colleagues

#### How to Select Data from a Table

We often use the `SELECT` statement to query data from one or more tables.

To select all the rows in a table:

~~~sql
SELECT *
FROM Students;
~~~~

To select a single column from the rows that satisfy a condition:
~~~sql
SELECT name
FROM Students
WHERE age > 18;
~~~~
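
The two queries above can be run from Python as in the following sketch, which assumes the `sqlite3` module and an in-memory `Students` table filled with the sample rows from the previous section. The `ORDER BY id` is added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL, age INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    [
        ("John Doe", 20, "johndoe@example.com"),
        ("Bahadurjit Sabharwal", 18, "tristanupadhyay@example.net"),
        ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
        ("Hemani Shukla", 18, "gaurikanarula@example.com"),
    ],
)

# SELECT * returns every column of every row as a tuple.
all_rows = conn.execute("SELECT * FROM Students").fetchall()

# One column, filtered by a condition; each result row is a 1-tuple.
adult_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM Students WHERE age > ? ORDER BY id", (18,)
    )
]
print(len(all_rows), adult_names)
```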

> 📝 **To do:** Select yourself and your desk mate

#### How to Update Data in a Table

To update existing data in a table, use the SQLite `UPDATE` statement.

```sql
UPDATE Students
SET email = 'johndoe@exampleupdate.com'
WHERE
    name LIKE 'John Doe';
```

In this syntax:
- First, specify the table you want to update after the `UPDATE` keyword.
- Second, set a new value for each column you want to change in the `SET` clause.
- Third, specify the rows to update using a condition in the `WHERE` clause. The `WHERE` clause is optional. If you skip it, the `UPDATE` statement will update data in all rows of the table.
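
The effect of the optional `WHERE` clause can be checked with a small sketch like the one below. It assumes Python's `sqlite3` module, whose cursors expose the number of modified rows as `rowcount`; the table contents are the sample students used earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL, age INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    [
        ("John Doe", 20, "johndoe@example.com"),
        ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
        ("Hemani Shukla", 18, "gaurikanarula@example.com"),
    ],
)

# With WHERE: only the matching row changes.
cur = conn.execute(
    "UPDATE Students SET email = ? WHERE name = ?",
    ("johndoe@exampleupdate.com", "John Doe"),
)
updated_with_where = cur.rowcount

# Without WHERE: every row in the table changes.
cur = conn.execute("UPDATE Students SET age = age + 1")
updated_without_where = cur.rowcount
conn.commit()

ages = [age for (age,) in conn.execute("SELECT age FROM Students ORDER BY id")]
print(updated_with_where, updated_without_where, ages)
```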

> 📝 **To do:** Update the records representing yourself

#### How to Delete Data from a Table

The SQLite `DELETE` statement allows you to delete one row, multiple rows, or all rows in a table.

~~~sql
DELETE FROM Students
WHERE name LIKE 'John Doe';
~~~

In this syntax:
- First, specify the name of the table from which you want to remove rows after the `DELETE FROM` keywords.
- Second, add a search condition in the `WHERE` clause to identify the rows to remove. The `WHERE` clause is an optional part of the `DELETE` statement. If you omit the `WHERE` clause, the `DELETE` statement will delete all rows in the table.
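
A short sketch of both cases, assuming Python's `sqlite3` module and an in-memory `Students` table with three sample rows: the first `DELETE` removes one matching row, the second has no `WHERE` clause and empties the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL, age INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    [
        ("John Doe", 20, "johndoe@example.com"),
        ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
        ("Hemani Shukla", 18, "gaurikanarula@example.com"),
    ],
)


def count_students(conn):
    """Return the current number of rows in Students."""
    return conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]


# With WHERE: only the matching row is removed.
cur = conn.execute("DELETE FROM Students WHERE name = ?", ("John Doe",))
deleted_with_where = cur.rowcount
remaining_after_where = count_students(conn)

# Without WHERE: the table is emptied (the table itself still exists).
conn.execute("DELETE FROM Students")
remaining_after_all = count_students(conn)
conn.commit()
print(deleted_with_where, remaining_after_where, remaining_after_all)
```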

> 📝 **To do:** Delete the records representing yourself

#### How to Drop Database Tables

To remove a table from a database, use the SQL `DROP TABLE` statement.

```sql
DROP TABLE Students;
```
If you try to drop a table that does not exist, SQLite issues an error. With the `IF EXISTS` option, SQLite removes the table only if it exists; otherwise it ignores the statement and does nothing.

~~~sql
DROP TABLE IF EXISTS Students;
~~~

When foreign key constraints are enabled, the `DROP TABLE` statement performs an implicit `DELETE` of all rows before dropping the table.
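
The difference between the two forms can be observed with the following sketch, which assumes Python's `sqlite3` module. A second plain `DROP TABLE` raises `sqlite3.OperationalError`, while `DROP TABLE IF EXISTS` is silently ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("DROP TABLE Students")

# Dropping the table again fails because it no longer exists.
message = ""
try:
    conn.execute("DROP TABLE Students")
    second_drop_failed = False
except sqlite3.OperationalError as exc:
    second_drop_failed = True
    message = str(exc)

# IF EXISTS turns the same situation into a no-op.
conn.execute("DROP TABLE IF EXISTS Students")

# sqlite_master lists the schema objects that remain in the database.
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(second_drop_failed, message, tables)
```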

> 📝 **To do:** Drop the `Students` table

### Part 2: Working with Multiple Tables – Students and Courses
Creating related tables, inserting data, and modeling real-world relationships.

> 📝 **To do:** Create `Courses` table with the following fields:
> - `id`: unique identifier
> - `name`: name of the course
> - `description`: description of the course 

> 📝 **To do:** Create `Enrollments` table with the following fields:
> - `student_id`: unique identifier for a student, referencing the Students table.
> - `course_id`: unique identifier for a course, referencing the Courses table.
> - `enrollment_date`: date when the student enrolled in the course.

The `Enrollments` table definition should include the following table-level constraints:

```sql
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Students(id),
FOREIGN KEY (course_id) REFERENCES Courses(id)
```

- `PRIMARY KEY`: A composite primary key made up of `student_id` and `course_id`, ensuring each student-course pair is unique.
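- `FOREIGN KEY` (student_id): Ensures that `student_id` exists in the Students table.
- `FOREIGN KEY` (course_id): Ensures that `course_id` exists in the Courses table.

SQLite only enforces foreign keys when enforcement is switched on for the connection with `PRAGMA foreign_keys = ON;`. Without it, the `FOREIGN KEY` clauses are parsed but not checked.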
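
The sketch below shows one possible way these constraints behave in practice, using Python's `sqlite3` module. The column types, the sample rows, the dates, and the `try_enroll` helper are assumptions made for the illustration; your own table definitions may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Courses (id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
CREATE TABLE Enrollments (
  student_id INTEGER NOT NULL,
  course_id INTEGER NOT NULL,
  enrollment_date TEXT,
  PRIMARY KEY (student_id, course_id),
  FOREIGN KEY (student_id) REFERENCES Students(id),
  FOREIGN KEY (course_id) REFERENCES Courses(id)
);
INSERT INTO Students (id, name) VALUES (1, 'John Doe');
INSERT INTO Courses (id, name) VALUES (10, 'Databases');
INSERT INTO Enrollments VALUES (1, 10, '2024-09-01');
""")


def try_enroll(student_id, course_id):
    """Insert an enrollment; return 'ok' or the constraint error message."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Enrollments VALUES (?, ?, ?)",
                (student_id, course_id, "2024-09-02"),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate = try_enroll(1, 10)         # same student-course pair again
unknown_student = try_enroll(99, 10)  # student 99 does not exist
print(duplicate)
print(unknown_student)
```

The first call is rejected by the composite primary key, the second by the foreign key on `student_id`.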

> 📝 **To do:** Insert some Data inside the tables
> - Insert data for courses you have taken
> - For courses you will take (i.e., not yet enrolled), don’t insert a row in the `Enrollments` table

### ER diagram notation

An <em>**Entity Relationship** (ER) **Diagram**</em> is a type of flowchart that illustrates how “entities” such as people, objects or concepts relate to each other within a system. ER Diagrams are used to design relational databases.

![students_courses_db_er](img/students_courses_db_er.png)

![er_relations.png](img/er_relations.png)

> 📝 **To do:** Query tables
> - Find all students who are enrolled in at least one course.
> - How many courses is each student enrolled in?
> - Show only students who are enrolled in more than one course.
> - Find all courses that have at least one student enrolled.
> - How many students are enrolled in each course?
> - Show each student, the course they're enrolled in, and when they enrolled.

### Part 3: Querying Data – Exploring with SELECT and JOIN
Learn how to retrieve meaningful information using SELECT, WHERE, and JOIN statements.


### SQL JOIN

![img.png](img/img.png)
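
To make the join types concrete, here is a small sketch contrasting `INNER JOIN` and `LEFT JOIN`. It assumes Python's `sqlite3` module and a reduced, in-memory version of the `Students`, `Courses`, and `Enrollments` tables; the rows are invented so that one student has no enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Courses (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Enrollments (
  student_id INTEGER REFERENCES Students(id),
  course_id INTEGER REFERENCES Courses(id),
  PRIMARY KEY (student_id, course_id)
);
INSERT INTO Students VALUES (1, 'John Doe'), (2, 'Zayyan Arya'), (3, 'Hemani Shukla');
INSERT INTO Courses VALUES (10, 'Databases'), (20, 'Statistics');
INSERT INTO Enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# INNER JOIN keeps only students that have a matching enrollment row.
inner = conn.execute("""
    SELECT s.name, c.name
    FROM Students AS s
    JOIN Enrollments AS e ON e.student_id = s.id
    JOIN Courses AS c ON c.id = e.course_id
    ORDER BY s.id, c.id
""").fetchall()

# LEFT JOIN keeps every student; COUNT ignores the NULLs of unmatched rows.
left = conn.execute("""
    SELECT s.name, COUNT(e.course_id)
    FROM Students AS s
    LEFT JOIN Enrollments AS e ON e.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()
print(inner)
print(left)
```

Hemani Shukla is missing from the inner join result but appears in the left join result with a count of 0.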

## Chinook database



### ER diagram notation

![Chinook_er.png](img/Chinook_er.png)


![er_relations.png](img/er_relations.png)

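The **Chinook** sample database contains 11 tables. The list below uses plural, lowercase table names (for example `invoice_items`); depending on the distribution you download, the same tables may appear under singular names such as `Customer`, `Invoice`, and `InvoiceLine`, which are the names used in the exercises that follow.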

- `employees` table stores employee data such as id, last name, first name, etc. It also has a field named `ReportsTo` to specify who reports to whom.
- `customers` table stores customer data.
- `invoices` & `invoice_items` tables: these two tables store invoice data. The `invoices` table stores invoice header data and the `invoice_items` table stores the invoice line items data.
- `artists` table stores artist data. It is a simple table that contains the id and name.
- `albums` table stores data about a list of tracks. Each album belongs to one artist, but an artist may have multiple albums.
- `media_types` table stores media types such as MPEG audio and AAC audio files.
- `genres` table stores music types such as rock, jazz, metal, etc.
- `tracks` table stores the data of songs. Each track belongs to one album.
- `playlists` & `playlist_track` tables: `playlists` table stores data about playlists. Each playlist contains a list of tracks. Each track may belong to multiple playlists. The relationship between the `playlists` and `tracks` tables is many-to-many. The `playlist_track` table is used to reflect this relationship.
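
Before writing queries, it can help to check which table names your copy of the database actually uses. The helper below is a sketch based on Python's `sqlite3` module and SQLite's `sqlite_master` catalog; `list_tables` is a name made up for this example, and the tiny in-memory schema (with assumed column names) only stands in for the real file, which you would open by passing its path to `sqlite3.connect`.

```python
import sqlite3


def list_tables(conn):
    """Return the user-defined table names of the connected database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master"
        " WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        " ORDER BY name"
    )
    return [name for (name,) in rows]


# Stand-in for the real file: a tiny in-memory schema with assumed columns,
# reusing three of the table names listed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playlists (PlaylistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE playlist_track (
  PlaylistId INTEGER REFERENCES playlists(PlaylistId),
  TrackId INTEGER REFERENCES tracks(TrackId),
  PRIMARY KEY (PlaylistId, TrackId)
);
""")
tables = list_tables(conn)
print(tables)
```

The stand-in schema also mirrors the many-to-many pattern described above: `playlist_track` holds one row per playlist-track pair.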

🟢 Beginner

List all customers from Canada.
- Tables: Customer
- Concepts: SELECT, WHERE
- **Returns:** First and last names of all Canadian customers.


Find all albums by the artist "AC/DC".
- Tables: Artist, Album
- Concepts: JOIN, WHERE
- **Returns:** Album titles associated with "AC/DC".

List the names of all tracks that are longer than 5 minutes.
- Tables: Track
- Concepts: WHERE, time filtering
- **Returns:** Track names where duration exceeds 5 minutes.

🟡 Intermediate

Which genres have the most tracks?
- Tables: Genre, Track
- Concepts: JOIN, GROUP BY, ORDER BY, COUNT
- **Returns:** A list of genres with the number of tracks in each, sorted by track count.

Who are the top 5 customers by total money spent?
- Tables: Customer, Invoice, InvoiceLine
- Concepts: JOIN, GROUP BY, SUM, LIMIT
- **Returns:** Customers and how much they’ve spent, top 5 only.

List the top 3 employees who support the most customers.
- Tables: Employee, Customer
- Concepts: JOIN, GROUP BY, COUNT
- **Returns:** Employee names and number of customers they support.

Which albums have tracks from more than one genre?
- Tables: Album, Track, Genre
- Concepts: GROUP BY, HAVING, COUNT(DISTINCT)
- **Returns:** Album titles that include tracks from multiple genres.

🔴 Advanced

Find the most popular playlist (with the most tracks).
- Tables: Playlist, PlaylistTrack
- Concepts: JOIN, GROUP BY, COUNT, ORDER BY, LIMIT
- **Returns:** The playlist name and the number of tracks it contains.

For each country, what is the average invoice total?
- Tables: Invoice, Customer
- Concepts: JOIN, GROUP BY, AVG
- **Returns:** Each country with its average invoice total.


Which artist has generated the most revenue?
- Tables: Artist, Album, Track, InvoiceLine
- Concepts: multi-table JOIN, SUM, GROUP BY
- **Returns:** The artist name with the highest total sales revenue.

Find customers who have purchased from every genre.
- Tables: Customer, Invoice, InvoiceLine, Track, Genre
- Concepts: HAVING, COUNT(DISTINCT), subqueries or NOT EXISTS
- **Returns:** Customers who have at least one purchase in *every* genre in the database.


**Note:** These questions were generated with the help of ChatGPT.

## 🔴 Pro-Level SQL Questions for Chinook Database

### 🔴 Question 1: Customers Who Purchased More Than the Average Invoice Amount

**🧠 The Question**
Find the customers who have spent more than the average invoice total. Show `FirstName`, `LastName`, and `TotalSpent` for each customer.

**📚 The Tables Involved**
- `Customer`
- `Invoice`
- `InvoiceLine`

**🔧 The SQL Concepts**
- **JOIN**  
- **GROUP BY**  
- **HAVING**  
- **SUBQUERY** or **AVG()**

**📥 What the Query Will Return**
- A list of customers who have spent more than the average total amount across all invoices.
- It will include `FirstName`, `LastName`, and the total amount they’ve spent (`TotalSpent`).

### 🔴 Question 2: Artists with Tracks Sold in Multiple Countries

**🧠 The Question**
List the artists who have sold tracks in more than one country. Show `ArtistName` and the number of distinct countries they’ve sold their tracks in.

**📚 The Tables Involved**
- `Artist`
- `Album`
- `Track`
- `InvoiceLine`
- `Invoice`
- `Customer`

**🔧 The SQL Concepts**
- **JOIN** (multi-table join)  
- **COUNT(DISTINCT)**  
- **GROUP BY**  
- **HAVING**

**📥 What the Query Will Return**
- A list of artists who have sold tracks in more than one country.
- It will display the `ArtistName` and the number of distinct countries where their tracks have been sold.


### 🔴 Question 3: The Most Purchased Tracks in a Given Time Period

**🧠 The Question**
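Find the top 10 most purchased tracks in the past month. Show `TrackName`, the number of times it was purchased (`PurchaseCount`), and its `UnitPrice`. Because Chinook is a static sample dataset, measure "the past month" back from the most recent `InvoiceDate` in the data rather than from today's date.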

**📚 The Tables Involved**
- `Track`
- `InvoiceLine`
- `Invoice`

**🔧 The SQL Concepts**
- **JOIN**  
- **GROUP BY**  
- **COUNT**  
- **DATE filtering (using `strftime()` or `WHERE`)**  
- **ORDER BY**

**📥 What the Query Will Return**
- A list of the top 10 tracks purchased in the past month, including:
  - The `TrackName`
  - The number of times it was purchased (`PurchaseCount`)
  - The `UnitPrice` of the track


### 🔴 Question 4: Calculate Revenue by Genre Over Time

**🧠 The Question**
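For each genre, calculate the total revenue generated each month in the last year. Show `GenreName`, `Month`, and `Revenue` (total revenue for that month). As Chinook is a static sample dataset, take "the last year" to mean the final 12 months covered by the invoice data.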

**📚 The Tables Involved**
- `Genre`
- `Track`
- `InvoiceLine`
- `Invoice`

**🔧 The SQL Concepts**
- **JOIN**  
- **GROUP BY**  
- **SUM()**  
- **DATE formatting and extraction (`strftime()` in SQLite; `MONTH()` exists in other systems such as MySQL, but not in SQLite)**  
- **HAVING**

**📥 What the Query Will Return**
- A list showing the total revenue (`Revenue`) for each genre, split by month for the last year.
- The results will include the `GenreName`, the `Month`, and the `Revenue` for each genre in that month.
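
As a hint for the date handling, the sketch below shows how `strftime()` can turn a timestamp into a year-month label that is then used for grouping. It assumes Python's `sqlite3` module and a deliberately unrelated toy table (`sales`, with invented rows), so it illustrates the technique without solving the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_at TEXT, amount REAL)")  # toy table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2024-01-05 10:00:00", 1.0),
        ("2024-01-20 09:30:00", 2.0),
        ("2024-02-03 14:15:00", 4.0),
    ],
)

# strftime('%Y-%m', ...) reduces each timestamp to its year and month.
monthly = conn.execute("""
    SELECT strftime('%Y-%m', sold_at) AS month, SUM(amount) AS revenue
    FROM sales
    GROUP BY month
    ORDER BY month
""").fetchall()
print(monthly)
```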



**Note:** These questions were generated with the help of ChatGPT.