# Hands-On SQL using DuckDB


**Refer to Lecture 15.1: Database Programming in Python**
- Uses the `sqlite3` module for SQLite database operations.
- SQLite is a relational database management system (RDBMS), like MySQL or PostgreSQL.
- SQL is the language used to interact with relational databases.
- `sqlite3` is a Python module for working with SQLite databases.
- Important database operations: Create, Read, Update, and Delete (CRUD).
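
As a quick refresher, the four CRUD operations can be sketched with the standard-library `sqlite3` module. This is only an illustration, assuming an in-memory database; the `students` table and its rows are hypothetical and do not come from the lecture.

```python
import sqlite3

# In-memory database: nothing is written to disk (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table used only for illustration.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Create: insert rows, passing values through ? placeholders.
cur.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 74)],
)

# Read: fetch every row back.
after_insert = cur.execute("SELECT id, name, marks FROM students ORDER BY id").fetchall()

# Update: change one student's marks.
cur.execute("UPDATE students SET marks = ? WHERE id = ?", (90, 2))

# Delete: remove one student.
cur.execute("DELETE FROM students WHERE id = ?", (1,))

conn.commit()
remaining = cur.execute("SELECT id, name, marks FROM students ORDER BY id").fetchall()
conn.close()

print(after_insert)
print(remaining)
```

Passing values through `?` placeholders rather than formatting them into the SQL string keeps the statement and the data separate.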

## Introduction to SQL

**SQL is Structured Query Language, which is a computer language for storing, manipulating and retrieving data stored in a relational database.**

SQL is the standard language for relational database systems. Relational database management systems (RDBMS) such as MySQL, Oracle, and PostgreSQL all use SQL as their standard database language.

### Why SQL?

SQL is widely popular because it offers the following advantages:
   * Allows users to access data in relational database management systems.
   * Allows users to describe the data.
   * Allows users to define the data in a database and manipulate that data.
   * Can be embedded within other languages using SQL modules, libraries, and pre-compilers.
   * Allows users to create and drop databases and tables.
   * Allows users to create views, stored procedures, and functions in a database.
   * Allows users to set permissions on tables, procedures, and views.



## SQL Commands

The standard SQL commands to interact with relational databases are **CREATE, SELECT, INSERT, UPDATE, DELETE and DROP**. These commands can be classified into the following groups based on their nature:

### DDL - Data Definition Language
   * CREATE: Creates a new table, a view of a table, or other object in the database.
   * ALTER: Modifies an existing database object, such as a table.
   * DROP: Deletes an entire table, a view of a table, or other objects in the database.


### DML - Data Manipulation Language
   * SELECT: Retrieves certain records from one or more tables.
   * INSERT: Creates a record.
   * UPDATE: Modifies records.
   * DELETE: Deletes records.
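
The split between DDL and DML can be seen in a short sketch. It uses the standard-library `sqlite3` module with an in-memory database; the `products` table and the `table_names` helper are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """Return the names of all tables currently defined in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# DDL: CREATE defines a new table (structure only, no rows yet).
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
tables_after_create = table_names(conn)

# DDL: ALTER changes the structure of an existing table.
conn.execute("ALTER TABLE products ADD COLUMN price INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(products)")]

# DML: INSERT, UPDATE, SELECT and DELETE work on the rows inside the table.
conn.execute("INSERT INTO products VALUES (1, 'Shoes', 1200)")
conn.execute("UPDATE products SET price = 1300 WHERE product_id = 1")
rows = conn.execute("SELECT * FROM products").fetchall()
conn.execute("DELETE FROM products WHERE product_id = 1")
count_after_delete = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# DDL: DROP removes the table itself, not just its rows.
conn.execute("DROP TABLE products")
tables_after_drop = table_names(conn)
conn.close()

print(tables_after_create, columns, rows, count_after_delete, tables_after_drop)
```

Note that `DELETE` leaves an empty table behind, while `DROP` removes the table definition as well.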


## What is DuckDB? Is it a Relational Database?

**Yes — DuckDB is a relational database.**

## Key Points

- **Type**: DuckDB is an **in-process SQL OLAP (analytical) database management system**.  
- **Relational**: Like PostgreSQL, MySQL, or SQLite, DuckDB organizes data in **tables with rows and columns** and supports **SQL queries** (SELECT, JOIN, GROUP BY, etc.).  
- **In-process**: Unlike client–server databases (e.g., PostgreSQL, MySQL), DuckDB runs **inside your application process** (like SQLite). No separate database server is required.  
- **Optimized for Analytics (OLAP)**:  
  - Designed for **analytical workloads** (large aggregations, joins, columnar scans)  
  - Uses a **columnar storage format** → very efficient for data science and analytics  
- **Use cases**: Often called **"the SQLite of analytics"**, it’s ideal for:  
  - Querying Parquet/CSV files directly  
  - Data science workflows in Python/R  
  - Embedding inside applications  
  - Fast analytical queries on medium to large datasets  

## Summary
**DuckDB = Relational database + SQL engine + Embedded + Optimized for analytics.**
- https://duckdb.org/why_duckdb.html

In [1]:
!pip install duckdb



In [26]:
# import duckdb
import duckdb

In [28]:
# Connect to a persistent database file (created if it does not already exist)
con = duckdb.connect('OnlineStore.duckdb')

In [29]:
# Q1. Creating a Customers Table
con.execute(
    """
    CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(50),
    city VARCHAR(50),
    age INT);
    """
)

<duckdb.duckdb.DuckDBPyConnection at 0x20bc61a1f30>

In [30]:
# Q2. Insert customers into the Customers table
con.execute(
    """
    INSERT INTO Customers (customer_id, customer_name, city, age) VALUES
    (1, 'Rahul', 'Delhi', 28),
    (2, 'Priya', 'Mumbai', 32),
    (3, 'Amit', 'Delhi', 22),
    (4, 'Sneha', 'Bangalore', 29),
    (5, 'Arjun', 'Mumbai', 40);
    """
)

<duckdb.duckdb.DuckDBPyConnection at 0x20bc61a1f30>

In [32]:
# Q3. Create Orders Table
con.execute(
    """
    CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product VARCHAR(50),
    price INT,
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id));
    """
)

<duckdb.duckdb.DuckDBPyConnection at 0x20bc61a1f30>

In [33]:
# Q4. Insert orders into the Orders table
con.execute(
    """
    INSERT INTO Orders (order_id, customer_id, product, price, order_date) VALUES
    (101, 1, 'Shoes', 1200, '2025-08-01'),
    (102, 1, 'Bag', 800, '2025-08-05'),
    (103, 2, 'Laptop', 60000, '2025-08-03'),
    (104, 3, 'Shoes', 1000, '2025-08-04'),
    (105, 4, 'Mobile', 20000, '2025-08-06'),
    (106, 5, 'Shoes', 1500, '2025-08-02'),
    (107, 2, 'Bag', 700, '2025-08-07'),
    (108, 3, 'Headphones', 2000, '2025-08-08'),
    (109, 4, 'Shoes', 1100, '2025-08-09'),
    (110, 5, 'Laptop', 58000, '2025-08-10');
    """
)

<duckdb.duckdb.DuckDBPyConnection at 0x20bc61a1f30>

In [41]:
# Q5: Shows the name and city of every customer.
con.execute("SELECT customer_name, city FROM Customers;").fetchall()

[('Rahul', 'Delhi'),
 ('Priya', 'Mumbai'),
 ('Amit', 'Delhi'),
 ('Sneha', 'Bangalore'),
 ('Arjun', 'Mumbai')]

In [42]:
# Q6: Show only unique products from the Orders table.
con.execute("SELECT DISTINCT product FROM Orders;").fetchall()

[('Shoes',), ('Mobile',), ('Headphones',), ('Laptop',), ('Bag',)]

In [43]:
# Q7: Filters rows in Orders where price is greater than 1000.
con.execute("SELECT * FROM Orders WHERE price > 1000;").fetchall()

[(101, 1, 'Shoes', 1200, datetime.date(2025, 8, 1)),
 (103, 2, 'Laptop', 60000, datetime.date(2025, 8, 3)),
 (105, 4, 'Mobile', 20000, datetime.date(2025, 8, 6)),
 (106, 5, 'Shoes', 1500, datetime.date(2025, 8, 2)),
 (108, 3, 'Headphones', 2000, datetime.date(2025, 8, 8)),
 (109, 4, 'Shoes', 1100, datetime.date(2025, 8, 9)),
 (110, 5, 'Laptop', 58000, datetime.date(2025, 8, 10))]

In [44]:
# Q8: Retrieves customers from Delhi who are older than 25.
con.execute("SELECT * FROM Customers WHERE city='Delhi' AND age > 25;").fetchall()

[(1, 'Rahul', 'Delhi', 28)]

In [45]:
# Q9: Fetches orders where the product is either 'Shoes' or 'Bag'.
con.execute("SELECT * FROM Orders WHERE product IN ('Shoes','Bag');").fetchall()

[(101, 1, 'Shoes', 1200, datetime.date(2025, 8, 1)),
 (102, 1, 'Bag', 800, datetime.date(2025, 8, 5)),
 (104, 3, 'Shoes', 1000, datetime.date(2025, 8, 4)),
 (106, 5, 'Shoes', 1500, datetime.date(2025, 8, 2)),
 (107, 2, 'Bag', 700, datetime.date(2025, 8, 7)),
 (109, 4, 'Shoes', 1100, datetime.date(2025, 8, 9))]

In [46]:
# Q10: Retrieves orders priced between 1000 and 2000 (both bounds included).
con.execute("SELECT * FROM Orders WHERE price BETWEEN 1000 AND 2000;").fetchall()

[(101, 1, 'Shoes', 1200, datetime.date(2025, 8, 1)),
 (104, 3, 'Shoes', 1000, datetime.date(2025, 8, 4)),
 (106, 5, 'Shoes', 1500, datetime.date(2025, 8, 2)),
 (108, 3, 'Headphones', 2000, datetime.date(2025, 8, 8)),
 (109, 4, 'Shoes', 1100, datetime.date(2025, 8, 9))]

In [47]:
# Q11: Sorts orders by price in descending order.
con.execute("SELECT * FROM Orders ORDER BY price DESC;").fetchall()

[(103, 2, 'Laptop', 60000, datetime.date(2025, 8, 3)),
 (110, 5, 'Laptop', 58000, datetime.date(2025, 8, 10)),
 (105, 4, 'Mobile', 20000, datetime.date(2025, 8, 6)),
 (108, 3, 'Headphones', 2000, datetime.date(2025, 8, 8)),
 (106, 5, 'Shoes', 1500, datetime.date(2025, 8, 2)),
 (101, 1, 'Shoes', 1200, datetime.date(2025, 8, 1)),
 (109, 4, 'Shoes', 1100, datetime.date(2025, 8, 9)),
 (104, 3, 'Shoes', 1000, datetime.date(2025, 8, 4)),
 (102, 1, 'Bag', 800, datetime.date(2025, 8, 5)),
 (107, 2, 'Bag', 700, datetime.date(2025, 8, 7))]

In [48]:
# Q12: Groups orders by customer and calculates total spending.
con.execute("SELECT customer_id, SUM(price) FROM Orders GROUP BY customer_id;").fetchall()

[(1, 2000), (2, 60700), (3, 3000), (4, 21100), (5, 59500)]

In [49]:
# Q13: Counts how many customers are from Mumbai.
con.execute("SELECT COUNT(*) FROM Customers WHERE city='Mumbai';").fetchall()

[(2,)]

In [50]:
# Q14: Filters grouped results to show only customers with spending above 5000.
con.execute("SELECT customer_id, SUM(price) FROM Orders GROUP BY customer_id HAVING SUM(price) > 5000;").fetchall()

[(2, 60700), (4, 21100), (5, 59500)]

In [55]:
# Q15: Inserts a new customer record into the Customers table, then reads it back.
con.execute("INSERT INTO Customers VALUES (6,'Karan','Chennai',27);")
con.execute("SELECT * FROM Customers WHERE customer_id=6;").fetchall()

[(6, 'Karan', 'Chennai', 27)]

In [57]:
# Q16: Updates the city of the customer with ID=1 to Pune, then reads the row back.
con.execute("UPDATE Customers SET city='Pune' WHERE customer_id=1;")
con.execute("SELECT * FROM Customers WHERE customer_id=1;").fetchall()

[(1, 'Rahul', 'Pune', 28)]

In [61]:
# Q17: Removes all orders placed by the customer with ID=3, then confirms none remain.
con.execute("DELETE FROM Orders WHERE customer_id=3;")
con.execute("SELECT * FROM Orders WHERE customer_id=3;").fetchall()

[]