# Introduction to SQL and Databases

## Useful Links
- https://www.geeksforgeeks.org/what-is-database/
- https://www.geeksforgeeks.org/python-sqlite/
- https://ugoproto.github.io/ugodoc/introduction_to_sql_joins/
- https://www.freecodecamp.org/news/work-with-sqlite-in-python-handbook
- SQLite Tutorial https://www.sqlitetutorial.net/

## Introduction

### Understanding Databases
A <em>**database**</em> is an electronically stored, systematic collection of data that can include text, numbers, images, videos, and other types of files. Databases are managed by software called a <em>**Database Management System** (DBMS)</em>, which allows users to store, retrieve, and manipulate data. Databases are the backbone of modern applications, supporting businesses, organizations, and systems across industries. Most of the applications we use every day (such as WhatsApp, Gmail, and social media websites) rely on a database that stores user data, transactions, and other required information.

### Key components of a Database
- **Data -** The information stored in a database, like numbers, text, images, videos, or documents, depending
  on the database’s purpose.
- **Schema -** The structure of the database. It defines how data is organized and includes details like tables, columns, data types, and relationships between entities. The schema ensures consistency and helps users understand how the database is designed.
- **DBMS** <em>(Database Management System)</em> **-** The software layer that enables interaction with the database. It manages the storage, retrieval, and manipulation of data while ensuring security and data integrity. The DBMS also handles tasks like backup, recovery, and query optimization to maintain the database’s performance. Examples of DBMS software include MySQL, PostgreSQL, Neo4j, and MongoDB.
- **Queries -** Commands used to interact with the database, allowing users to retrieve, manipulate, or update
  data. In relational databases, **SQL** <em>(Structured Query Language)</em> is commonly used, whereas Neo4j uses **Cypher** for querying graph data.
- **Users -** Individuals or applications that interact with the database. They can have different levels of access based on their roles, such as administrators, developers, or end-users. For example, a database administrator might have full control, including the ability to create or delete tables, while a regular user might only have permission to view specific data.
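
To make the idea of a schema more concrete, the following minimal sketch uses Python's built-in `sqlite3` module (SQLite is introduced later in this document) to create a table and then read its structure back. The `Books` table and its columns are hypothetical and chosen only for illustration; `PRAGMA table_info` is specific to SQLite.

```python
import sqlite3

# In-memory database: nothing is written to disk
connection = sqlite3.connect(":memory:")

# Hypothetical table, used only to illustrate what a schema contains
connection.execute(
    "CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = connection.execute("PRAGMA table_info(Books)").fetchall()
column_types = {name: col_type for _cid, name, col_type, *_rest in columns}

connection.close()
```

Here `column_types` maps each column name to its declared data type, which is part of the information a schema records.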

### Examples of types of Databases
[Reference](https://www.datacamp.com/blog/types-of-databases-overview)

Different types of databases exist to meet diverse use-case needs, driven by variations in data structure, access patterns, scalability, and consistency requirements. Relational databases are ideal for structured data with predefined schemas, common in traditional business applications. The rise of big data and real-time analytics exposed limitations of relational databases, leading to NoSQL databases, which offer flexibility and scalability for unstructured or semi-structured data. More recently, vector databases have emerged to support machine learning applications by efficiently handling high-dimensional vector data used in similarity searches.

- **Relational Databases -** Store data in **structured tables** with rows and columns and are managed by a relational database management system (RDBMS). This structure makes access to structured data flexible and efficient. Examples: MySQL, PostgreSQL, Oracle, Microsoft SQL Server.
- **NoSQL Databases** <em>(Not Only SQL)</em> **-** Designed to handle unstructured and **semi-structured data**. Examples: MongoDB, Cassandra, DynamoDB.
    - **Graph Databases -** Focus on the relationships between data objects. They use **nodes** to represent data entities and **edges** to define relationships between them, enabling efficient traversal and retrieval of complex interconnected data. They excel at handling complex relationships, making them ideal for use cases such as social networks, recommendation systems, and network analysis. Examples: Neo4j, Amazon Neptune, Ultipa.
    - **Vector Databases -** Designed to store, index, and manage vector **embeddings**, which are high-dimensional data representations often used in machine learning models. These databases enable fast similarity search, allowing efficient identification of vectors that are "close" to a given query vector using distance metrics like cosine similarity or Euclidean distance. This makes them suitable for tasks such as image recognition, recommendation systems, and natural language processing. They utilize indexing structures that optimize the retrieval of similar vectors based on distance metrics. Examples: Faiss, Milvus, Qdrant, Chroma.
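
The similarity search mentioned for vector databases can be sketched in plain Python. This is only an illustration of the idea: the three-dimensional "embeddings" below are made up, and the search compares the query against every stored vector, whereas a real vector database relies on index structures to avoid scanning everything.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real ones usually have hundreds of dimensions
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

query = [1.0, 0.0, 0.0]

# Brute-force search: score every stored vector and keep the most similar one
best_match = max(embeddings, key=lambda key: cosine_similarity(embeddings[key], query))
```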

### Structured query language (SQL)
To interact with relational databases, we use <em>**Structured Query Language** (SQL)</em>. This powerful language enables us to query, insert, update, and delete data, as well as perform complex operations like joining data from multiple tables. Relational databases group SQL statements into transactions and protect data integrity and consistency through the **ACID** properties:
- **Atomicity -** All operations within a transaction are treated as a single unit, ensuring that either all changes are committed or none are.
- **Consistency -** Data remains in a valid state throughout a transaction, adhering to predefined constraints and rules.
- **Isolation -** Transactions are executed independently as if they were the only operation happening on the database.
- **Durability -** Once a transaction is committed, its changes are permanent, even in the event of system failures.
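
As a small illustration of the consistency property, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `Accounts` table and its `CHECK` rule are hypothetical; the point is only that the database refuses a change that would violate a declared constraint.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Hypothetical rule: a balance may never be negative
connection.execute(
    "CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
connection.execute("INSERT INTO Accounts VALUES ('alice', 100)")
connection.commit()

try:
    # This update would break the CHECK constraint, so SQLite rejects it
    connection.execute("UPDATE Accounts SET balance = -50 WHERE name = 'alice'")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

# The stored value is unchanged
balance = connection.execute(
    "SELECT balance FROM Accounts WHERE name = 'alice'"
).fetchone()[0]
connection.close()
```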

#### What’s the difference between an ACID and a BASE database? 

[Reference](https://aws.amazon.com/compare/the-difference-between-acid-and-base-database/?nc1=h_ls)

**ACID** and **BASE** are database transaction models that determine how a database organizes and manipulates data. In the context of databases, a **transaction** is any operation that the database considers a single unit of work. A transaction must complete fully for the database to remain consistent. For example, when you transfer money from one bank account to another, the money must leave your account and must be added to the third-party account. You cannot call the transaction complete without both steps occurring. 
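
The bank transfer can be sketched as a single transaction with Python's built-in `sqlite3` module. The `Accounts` table, the account names, and the `transfer` helper are all hypothetical. The sketch relies on the fact that a `sqlite3` connection used in a `with` block commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
connection.executemany("INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
connection.commit()

def transfer(conn, sender, receiver, amount):
    """Move `amount` from sender to receiver as one unit of work."""
    with conn:  # commit on success, roll back on any exception
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE name = ?", (amount, sender)
        )
        cursor = conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE name = ?", (amount, receiver)
        )
        if cursor.rowcount == 0:
            # Second step failed: raising undoes the first step as well
            raise ValueError(f"unknown account: {receiver}")

def read_balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts").fetchall())

try:
    transfer(connection, "alice", "nobody", 30)
except ValueError:
    pass
balances_after_failure = read_balances(connection)

transfer(connection, "alice", "bob", 30)
balances_after_success = read_balances(connection)
connection.close()
```

After the failed transfer no money has left the first account; only the successful transfer changes both balances.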

**ACID** databases prioritize consistency over availability: the whole transaction fails if an error occurs in any step within it. In contrast, **BASE** databases prioritize availability over consistency. Instead of failing the transaction, they let users access inconsistent data temporarily; consistency is achieved eventually, but not immediately. Examples of **ACID** databases are MySQL, PostgreSQL, Neo4j, and MongoDB, whereas Elasticsearch is an example of a **BASE** database.


## Hands on SQLite

A SQL database, also known as a relational database, is a system that stores and organizes data into highly structured
tables of rows and columns. These databases offer Structured Query Language (SQL) to read and write the data, and are
categorized as relational database management systems (RDBMS). Examples of relational databases include MySQL and PostgreSQL.

### SQLite
[Reference](https://en.wikipedia.org/wiki/SQLite)

**SQLite** is a free and open-source relational database engine written in the C programming language. It is not a standalone app; rather, it is a library that software developers embed in their apps. It is used by several of the top web browsers, operating systems, mobile phones, and other embedded systems.

Why use SQLite to start?
- **Open-source** software in the public domain, so it does not require any license.
- **Serverless**, it runs inside the application and doesn't need a separate server process or system to operate.
- **Cross-platform** DBMS that can run on all platforms, including macOS, Windows, etc.
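
Because SQLite is an embedded library, Python can use it through the built-in `sqlite3` module without installing or starting a server. A minimal sketch, using an in-memory database so that no file is created:

```python
import sqlite3

# Version of the SQLite library bundled with this Python interpreter
library_version = sqlite3.sqlite_version

# ":memory:" creates a temporary database in RAM; passing a file path
# instead would create or open a single database file on disk
connection = sqlite3.connect(":memory:")
answer = connection.execute("SELECT 1 + 1").fetchone()[0]
connection.close()
```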

### How to Create Database Tables
A table is where we’ll store our data, organized in rows (records) and columns (attributes). For this example, we’ll
create a table called `Students` to store information about students.

To create a table, we use SQL's `CREATE TABLE` statement. This command defines the table structure, including the column
names and the data types for each column.

SQL command to create a `Students` table with the following fields:
- `id`: A unique identifier for each student (an integer).
- `name`: The student's name (text).
- `age`: The student's age (an integer).
- `email`: The student's email address (text).

Column `id` represents the `PRIMARY KEY` of the `Students` table. The `PRIMARY KEY` constraint uniquely identifies each record in a table. A table can have only one primary key, which can consist of a single column or multiple columns (fields).

```sql
CREATE TABLE Students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  email TEXT
);
```
To prevent errors when a table already exists, `IF NOT EXISTS` ensures that it is only created if it hasn’t been defined
before.

~~~~sql
CREATE TABLE IF NOT EXISTS Students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  email TEXT
);
~~~~
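
The behaviour of `IF NOT EXISTS` and of the primary key can be checked with a short sketch based on Python's built-in `sqlite3` module. It runs against an in-memory database, and the sample rows are made up for illustration.

```python
import sqlite3

create_sql = """
CREATE TABLE IF NOT EXISTS Students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  age INTEGER,
  email TEXT
);
"""

connection = sqlite3.connect(":memory:")
connection.execute(create_sql)
connection.execute(create_sql)  # second run is a no-op, not an error

# No id is given, so SQLite assigns the primary key values itself
connection.execute("INSERT INTO Students (name, age) VALUES ('Ada', 30)")
connection.execute("INSERT INTO Students (name, age) VALUES ('Grace', 35)")
ids = [row[0] for row in connection.execute("SELECT id FROM Students ORDER BY id")]

try:
    # Reusing an existing id violates the PRIMARY KEY constraint
    connection.execute("INSERT INTO Students (id, name) VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

connection.close()
```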


### How to Insert Data into a Table
Now that we have our `Students` table created, it’s time to start inserting data into the database.

#### How to Insert a Single Record

SQL syntax for inserting a single record:

~~~~sql
INSERT INTO Students (name, age, email)
VALUES ('John Doe', 20, 'johndoe@example.com');
~~~~

**To do:** Insert the record representing yourself

#### How to Insert Multiple Records

SQL syntax for inserting multiple records:

~~~~sql
INSERT INTO Students (name, age, email)
    VALUES
      ('Bahadurjit Sabharwal', 18, 'tristanupadhyay@example.net'),
      ('Zayyan Arya', 20, 'yashawinibhakta@example.org'),
      ('Hemani Shukla', 18, 'gaurikanarula@example.com');
~~~~

**To do:** Insert records representing a couple of your colleagues

### How to Update Data in a Table

To update existing data in a table, use the SQLite `UPDATE` statement.

```sql
UPDATE Students
SET email = 'johndoe@exampleupdate.com'
WHERE
    name LIKE 'John Doe';
```

In this syntax:
- First, specify the table to update after the `UPDATE` keyword.
- Second, set the new value for each column you want to change in the `SET` clause.
- Third, specify rows to update using a condition in the `WHERE` clause. The `WHERE` clause is optional. If you skip it, the `UPDATE` statement will update data in all rows of the table.
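
The effect of the `WHERE` clause can be observed from Python with the built-in `sqlite3` module, where `cursor.rowcount` reports how many rows a statement modified. This is a sketch against an in-memory copy of the `Students` table, filled with two of the sample records used in this document.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, age INTEGER, email TEXT)"
)
connection.executemany(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    [
        ("John Doe", 20, "johndoe@example.com"),
        ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
    ],
)

# With WHERE: only the matching row is updated
cursor = connection.execute(
    "UPDATE Students SET email = ? WHERE name = ?",
    ("johndoe@exampleupdate.com", "John Doe"),
)
rows_with_where = cursor.rowcount

# Without WHERE: every row in the table is updated
cursor = connection.execute("UPDATE Students SET age = age + 1")
rows_without_where = cursor.rowcount

connection.close()
```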

**To do:** Update the records representing yourself

### How to Delete Data from a Table

The SQLite `DELETE` statement allows you to delete one row, multiple rows, or all rows in a table.

```sql
DELETE FROM Students
WHERE name LIKE 'John Doe';
```

In this syntax:
- First, specify the name of the table from which you want to remove rows after the `DELETE FROM` keywords.
- Second, add a search condition in the `WHERE` clause to identify the rows to remove. The `WHERE` clause is an optional part of the `DELETE` statement. If you omit the `WHERE` clause, the `DELETE` statement will delete all rows in the table.
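
A short sketch with Python's built-in `sqlite3` module shows the difference between deleting with and without a `WHERE` clause. It uses an in-memory copy of the `Students` table with three of the sample records from this document.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, age INTEGER, email TEXT)"
)
connection.executemany(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    [
        ("John Doe", 20, "johndoe@example.com"),
        ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
        ("Hemani Shukla", 18, "gaurikanarula@example.com"),
    ],
)

def count_students(conn):
    return conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# With WHERE: only the matching row is removed
connection.execute("DELETE FROM Students WHERE name = ?", ("John Doe",))
remaining_after_filtered_delete = count_students(connection)

# Without WHERE: all rows are removed, but the (empty) table still exists
connection.execute("DELETE FROM Students")
remaining_after_full_delete = count_students(connection)

connection.close()
```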

**To do:** Delete the records representing yourself

### How to Drop Database Tables

To remove a table from a database, use the SQL `DROP TABLE` statement.

```sql
DROP TABLE Students;
```
If you try to drop a table that does not exist, SQLite issues an error. With the `IF EXISTS` option, SQLite removes the table only if it exists; otherwise it ignores the statement and does nothing.

~~~~sql
DROP TABLE IF EXISTS Students;
~~~~

The `DROP TABLE` statement performs an implicit `DELETE` statement before dropping the table.
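
The difference between `DROP TABLE` and `DROP TABLE IF EXISTS` can be reproduced with Python's built-in `sqlite3` module. The sketch below uses an in-memory database and a simplified `Students` table.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# First drop succeeds
connection.execute("DROP TABLE Students")

# Dropping the same table again fails because it no longer exists
try:
    connection.execute("DROP TABLE Students")
    second_drop_failed = False
    error_message = ""
except sqlite3.OperationalError as error:
    second_drop_failed = True
    error_message = str(error)

# With IF EXISTS the statement is silently ignored
connection.execute("DROP TABLE IF EXISTS Students")

connection.close()
```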

**To do:** Drop the `Students` table

## Hands on SQLite with Python

### How to Create Database Tables


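In a Jupyter notebook, SQL statements can be run directly in a cell with the `%%sql` magic. First load the extension, then connect to the SQLite database file `test.db`:

~~~~python
%load_ext sql
~~~~

~~~~python
%%sql
sqlite:///test.db
~~~~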

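Create a table named `sample`:

~~~~sql
%%sql
create table sample(column_1 int, column_2 varchar);
~~~~

If the cell is run again, the table already exists and SQLite reports an error (`create table if not exists` avoids it):

~~~~text
 * sqlite:///test.db
(sqlite3.OperationalError) table sample already exists
[SQL: create table sample(column_1 int, column_2 varchar);]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
~~~~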

Insert two rows:

~~~~sql
%%sql
insert into sample values (1,'abc'),(2,'abcd');
~~~~

~~~~text
 * sqlite:///test.db
2 rows affected.
~~~~

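The next two cells install `prettytable` and set the style used to display query results:

~~~~python
!pip install prettytable
~~~~

~~~~python
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
~~~~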

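Query the table:

~~~~sql
%%sql
select * from sample;
~~~~

~~~~text
 * sqlite:///test.db
Done.
~~~~

| column_1 | column_2 |
|----------|----------|
| 1        | abc      |
| 2        | abcd     |
| 1        | abc      |
| 2        | abcd     |
| 1        | abc      |
| 2        | abcd     |

Each row appears three times, presumably because the insert cell was run three times before this output was captured.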
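
Outside a notebook, the same kind of statement can be executed with Python's built-in `sqlite3` module. The sketch below creates the `Students` table used in the following sections. The `create_students_table` helper is a hypothetical name, and the sketch connects to an in-memory database to stay free of side effects; connecting to `'my_database.db'` instead would create the file used by the later examples.

```python
import sqlite3

def create_students_table(connection):
    """Create the Students table if it does not exist yet."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS Students (
          id INTEGER PRIMARY KEY AUTOINCREMENT,
          name TEXT NOT NULL,
          age INTEGER,
          email TEXT
        );
        """
    )
    connection.commit()

# Replace ':memory:' with 'my_database.db' to persist the table on disk
connection = sqlite3.connect(":memory:")
create_students_table(connection)

# sqlite_master lists the objects stored in the database
tables = [
    row[0]
    for row in connection.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
connection.close()
```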


### How to Insert Data into a Table
Now that we have our Students table created, it’s time to start inserting data into the database.


#### How to Insert a Single Record

SQL syntax for inserting a single record:

~~~~sql
INSERT INTO Students (name, age, email)
VALUES ('John Doe', 20, 'johndoe@example.com');
~~~~


Programmatically, with Python's built-in `sqlite3` module:

~~~~python
import sqlite3

connection = sqlite3.connect('my_database.db')

# Used as a context manager, the connection commits the transaction when the
# block succeeds and rolls it back if an exception is raised.
# It does NOT close the connection.
with connection:
    cursor = connection.cursor()

    # Insert a record into the Students table
    query = """
        INSERT INTO Students (name, age, email)
        VALUES (?, ?, ?);
    """

    value = ('Jane Doe', 23, 'jane@example.com')

    cursor.execute(query, value)

# Close the connection explicitly once it is no longer needed
connection.close()
~~~~
The `?` placeholders are filled, in order, with the values passed as the second argument to `execute()`.
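
Besides positional `?` placeholders, the `sqlite3` module also accepts named placeholders such as `:name`, filled from a dictionary. A minimal sketch against an in-memory `Students` table; `cursor.lastrowid` gives the `id` that SQLite assigned to the new row.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, age INTEGER, email TEXT)"
)

student = {"name": "Jane Doe", "age": 23, "email": "jane@example.com"}

with connection:  # commits on success, rolls back on error
    cursor = connection.execute(
        "INSERT INTO Students (name, age, email) VALUES (:name, :age, :email)",
        student,
    )

# Primary key value generated for the inserted row
new_id = cursor.lastrowid

stored = connection.execute(
    "SELECT name, age, email FROM Students WHERE id = ?", (new_id,)
).fetchone()
connection.close()
```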


#### How to Insert Multiple Records
~~~~sql
INSERT INTO Students (name, age, email)
    VALUES
      ('Bahadurjit Sabharwal', 18, 'tristanupadhyay@example.net'),
      ('Zayyan Arya', 20, 'yashawinibhakta@example.org'),
      ('Hemani Shukla', 18, 'gaurikanarula@example.com');
~~~~

Programmatically, `cursor.executemany()` runs the same statement once for each record in a sequence, so multiple
records can be inserted with a single call.

~~~~python
import sqlite3

connection = sqlite3.connect('my_database.db')

# The 'with' block commits on success and rolls back on error;
# it does NOT close the connection.
with connection:
    cursor = connection.cursor()

    # Insert multiple records into the Students table
    query = '''
    INSERT INTO Students (name, age, email)
    VALUES (?, ?, ?);
    '''
    values = [
        ('Bahadurjit Sabharwal', 18, 'tristanupadhyay@example.net'),
        ('Zayyan Arya', 20, 'yashawinibhakta@example.org'),
        ('Hemani Shukla', 18, 'gaurikanarula@example.com')
    ]
    # Execute the query once per record
    cursor.executemany(query, values)

# Close the connection explicitly once it is no longer needed
connection.close()
~~~~

### How to Handle Common Issues: SQL Injection

SQL Injection is a security vulnerability that occurs when attackers manipulate SQL queries by injecting malicious
input. This can lead to unauthorized access, data breaches, or even complete database deletion. For example, an attacker
might try to inject code like `DROP TABLE Students;` to delete the table.

By using parameterized queries, we avoid this issue. The `?` placeholders in parameterized queries
ensure that input values are treated as data, not as part of the SQL command, so malicious input is
never executed as SQL.
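
The difference between building SQL with string formatting and using a parameterized query can be seen in the following sketch. It uses an in-memory database and a simplified `Students` table. Because `execute()` in `sqlite3` runs only one statement per call, the malicious input here uses an always-true condition rather than the `DROP TABLE` example above.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
connection.executemany(
    "INSERT INTO Students (name) VALUES (?)", [("John Doe",), ("Jane Doe",)]
)

# Malicious input that tries to turn the filter into "always true"
user_input = "nobody' OR '1'='1"

# UNSAFE: the input is pasted into the SQL text and changes its meaning
unsafe_query = f"SELECT name FROM Students WHERE name = '{user_input}'"
unsafe_rows = connection.execute(unsafe_query).fetchall()

# SAFE: the input is bound to the placeholder and compared as plain text
safe_rows = connection.execute(
    "SELECT name FROM Students WHERE name = ?", (user_input,)
).fetchall()

connection.close()
```

The unsafe query returns every student, while the parameterized query returns nothing because no student has that literal name.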

### How to Query Data
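
As a starting point, rows can be read back with a `SELECT` statement and the cursor methods `fetchall()` and `fetchone()`. The sketch below is only an illustration; it uses an in-memory `Students` table filled with the sample records from the previous sections.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, age INTEGER, email TEXT)"
)
connection.executemany(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?)",
    [
        ("Bahadurjit Sabharwal", 18, "tristanupadhyay@example.net"),
        ("Zayyan Arya", 20, "yashawinibhakta@example.org"),
        ("Hemani Shukla", 18, "gaurikanarula@example.com"),
    ],
)

# fetchall() returns every matching row as a list of tuples
at_least_20 = connection.execute(
    "SELECT name, age FROM Students WHERE age >= ? ORDER BY name", (20,)
).fetchall()

# fetchone() returns the first row only (or None if there is no row)
youngest = connection.execute(
    "SELECT name FROM Students ORDER BY age, name LIMIT 1"
).fetchone()

# Aggregate functions such as COUNT(*) return a single row
total = connection.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

connection.close()
```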

### Data Types in SQLite and Their Mapping to Python

![data_type.png](img/data_type.png)
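
The default mapping can also be checked directly from Python. The sketch below does not reproduce the image; it relies on the default conversions of the built-in `sqlite3` module, sends one value of each basic Python type to SQLite, and asks SQLite which storage class it used via the `typeof()` function.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# One value per basic Python type supported by default
values = (None, 42, 3.14, "hello", b"\x00\x01")

# Storage class SQLite assigns to each bound value
sqlite_types = connection.execute(
    "SELECT typeof(?), typeof(?), typeof(?), typeof(?), typeof(?)", values
).fetchone()

# Values read back from SQLite are converted to Python types again
round_trip = connection.execute("SELECT ?, ?, ?, ?, ?", values).fetchone()
python_types = [type(value).__name__ for value in round_trip]

connection.close()
```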

### SQL JOIN

![img.png](img/img.png)
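
A join can be tried out with a small, hypothetical example. The `Enrollments` table and its contents are assumptions made for this sketch and are not part of the tables defined earlier. An `INNER JOIN` keeps only students with a matching enrollment, while a `LEFT JOIN` keeps every student and fills the missing side with `NULL` (`None` in Python).

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Hypothetical second table linked to Students through student_id
connection.execute(
    "CREATE TABLE Enrollments (student_id INTEGER REFERENCES Students(id), "
    "course TEXT NOT NULL)"
)
connection.executemany(
    "INSERT INTO Students (id, name) VALUES (?, ?)", [(1, "John Doe"), (2, "Jane Doe")]
)
connection.execute("INSERT INTO Enrollments VALUES (1, 'Databases')")

# INNER JOIN: only rows with a match in both tables
inner_rows = connection.execute(
    "SELECT s.name, e.course FROM Students AS s "
    "INNER JOIN Enrollments AS e ON e.student_id = s.id ORDER BY s.name"
).fetchall()

# LEFT JOIN: every student, with None where no enrollment exists
left_rows = connection.execute(
    "SELECT s.name, e.course FROM Students AS s "
    "LEFT JOIN Enrollments AS e ON e.student_id = s.id ORDER BY s.name"
).fetchall()

connection.close()
```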