# General Info

Dependencies:

- `conda install jupysql duckdb-engine`

Usage:

- use a __Python__ notebook but load the __SQL extension__
- run the __notebook prereqs__ below, then start every SQL cell with the `%%sql` cell magic

# Notebook Prereqs

In [1]:
%load_ext sql

# This cell needs to run to enable SQL in all the other cells.

In [2]:
%sql duckdb://
    
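# This cell needs to run to allow for loading files as SQL tables.
# As far as I can tell, subfolders do not work in the connection string
# above or as bare file names in queries. DuckDB does accept a
# single-quoted path in a query (e.g. a hypothetical
# FROM 'data/penguins.csv'), which may be a way around that.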

# Querying CSV File

In [12]:
%%sql

SELECT *
FROM penguins.csv
WHERE bill_length_mm > 40
LIMIT 10

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,40.3,18.0,195,3250,FEMALE
Adelie,Torgersen,42.0,20.2,190,4250,
Adelie,Torgersen,41.1,17.6,182,3200,FEMALE
Adelie,Torgersen,42.5,20.7,197,4500,MALE
Adelie,Torgersen,46.0,21.5,194,4200,MALE
Adelie,Biscoe,40.6,18.6,183,3550,MALE
Adelie,Biscoe,40.5,17.9,187,3200,FEMALE
Adelie,Biscoe,40.5,18.9,180,3950,MALE
Adelie,Dream,40.9,18.9,184,3900,MALE
Adelie,Dream,42.2,18.5,180,3550,FEMALE


# Creating In-Memory Table

NOTE: A single cell can hold multiple CREATE, INSERT, UPDATE, DELETE, etc. statements, but in my testing a query in that same cell does not see the changes. Run queries from the next cell on.

In [3]:
%%sql

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT,
    salary DECIMAL(10, 2)
);

Count


In [4]:
%%sql

INSERT INTO employees VALUES (1, 'John Doe', 30, 50000.00),
                             (2, 'Jane Smith', 25, 60000.00),
                             (3, 'Mike Johnson', 35, 70000.00);

Count


# Querying In-Memory Table

- note that a table created by a previous cell __still exists__
    - to remake that table, either __restart the kernel__ or use `CREATE OR REPLACE TABLE` (see "Create or Replace Table" below)
- in my testing, you have to query in a __separate cell__ from the one that changes the table, or the table looks empty
    - I don't think it's supposed to work that way, but it does

In [5]:
%%sql

SELECT * from employees;

id,name,age,salary
1,John Doe,30,50000.0
2,Jane Smith,25,60000.0
3,Mike Johnson,35,70000.0


# Comments

JupySQL does __not handle comments properly__. The only kind that doesn't break the whole cell is a `--` comment at the __end of a query line__.

In [6]:
%%sql

-- This should be a legit comment but it ruins everything

SELECT * FROM employees;

id,name,age,salary


In [7]:
%%sql

/* This should be legit too */

SELECT * FROM employees;

id,name,age,salary


In [8]:
%%sql

SELECT * FROM employees;  -- This one works

id,name,age,salary
1,John Doe,30,50000.0
2,Jane Smith,25,60000.0
3,Mike Johnson,35,70000.0


In [10]:
%%sql

SELECT * FROM employees; /* This one does not work. */
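
Given the behavior observed above, it can help to clean a query before pasting it into a `%%sql` cell. The following is a minimal sketch in plain Python, not part of JupySQL: `strip_unsafe_comments` is a hypothetical helper name, and it assumes comment markers never appear inside string literals.

```python
import re


def strip_unsafe_comments(sql: str) -> str:
    """Remove the comment styles that broke %%sql cells in the examples above.

    Removes /* ... */ block comments and lines that contain only a -- comment.
    Keeps trailing -- comments that follow SQL on the same line.
    Assumption: comment markers inside string literals are not detected.
    """
    # Drop block comments, including ones that span several lines.
    without_blocks = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)
    kept = []
    for line in without_blocks.splitlines():
        # Skip lines whose only content is a -- comment.
        if line.lstrip().startswith("--"):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept).strip()


cell = """-- full-line comment
/* block comment */
SELECT * FROM employees;  -- trailing comment
"""
cleaned = strip_unsafe_comments(cell)
print(cleaned)
```

Only the trailing `--` comment survives, which matches the one comment style that worked in the cells above.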

# Multiple Queries

While all queries in the cell get executed, you only see the results from the __last one__.

In [19]:
%%sql

SELECT name FROM employees; -- won't see

SELECT * FROM employees; -- will see

id,name,age,salary
1,John Doe,30,50000.0
2,Jane Smith,25,60000.0
3,Mike Johnson,35,70000.0
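
Since only the last result is shown, one workaround is to give each statement its own cell. A rough sketch of how a multi-statement string could be split in plain Python follows. `split_statements` is a hypothetical helper, not a JupySQL function; it assumes single-quoted string literals, drops `--` comments, and does not handle `/* */` comments.

```python
def split_statements(sql: str) -> list[str]:
    """Split SQL text on semicolons that sit outside single-quoted strings."""
    statements = []
    current = []
    in_string = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if not in_string and sql.startswith("--", i):
            # Skip a -- comment up to (not including) the end of its line.
            newline = sql.find("\n", i)
            i = len(sql) if newline == -1 else newline
            continue
        if ch == "'":
            # An escaped quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
        if ch == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement + ";")
            current = []
        else:
            current.append(ch)
        i += 1
    # Keep a final statement that has no terminating semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


cell = """SELECT name FROM employees; -- won't see

SELECT * FROM employees; -- will see
"""
parts = split_statements(cell)
for part in parts:
    print(part)
```

Each returned string can then go into its own `%%sql` cell so that every result is displayed.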


# Single-Line Magic

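It is possible to run Python and SQL in the same cell with the `%sql` line magic, or to have a cell that is a true one-liner. However, if anything is printed besides the result of a SELECT, you won't see the SELECT results. This is likely because Jupyter only auto-displays the value of a cell's last expression, so a `%sql` line followed by other code has its result discarded.

NOTE: the single-line magic won't work if a statement is split across multiple lines. Use the `%%sql` cell magic for multi-line statements.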

In [20]:
print('hi')
%sql SELECT * from employees
print('bye')

hi


bye


In [21]:
%sql SELECT * from employees

id,name,age,salary
1,John Doe,30,50000.0
2,Jane Smith,25,60000.0
3,Mike Johnson,35,70000.0


In [24]:
%%sql
SELECT * FROM
employees;

id,name,age,salary
1,John Doe,30,50000.0
2,Jane Smith,25,60000.0
3,Mike Johnson,35,70000.0
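
Because the line magic needs the whole statement on one line, a multi-line statement has to be collapsed first. A small sketch of that step in plain Python is below. `to_line_magic` is a hypothetical helper; it assumes the statement has no `--` comments (which would swallow the rest of the line once collapsed) and no string literals whose internal whitespace matters.

```python
def to_line_magic(sql: str) -> str:
    """Collapse a multi-line statement onto one line and prefix it with %sql."""
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so joining with single spaces yields one line.
    one_line = " ".join(sql.split())
    return f"%sql {one_line}"


line = to_line_magic("SELECT * FROM\nemployees;")
print(line)
```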


# Create or Replace Table

This is a way around having to restart the kernel when you change a cell that builds a table.

You can __rerun this cell__ as many times as you want.

In [26]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT,
    salary DECIMAL(10, 2)
);

Count
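
For notebooks with many table definitions, the rewrite can be scripted. This is a minimal sketch using Python's `re` module. `make_rerunnable` is a hypothetical helper name, and leaving `CREATE TABLE IF NOT EXISTS` untouched is an assumption on my part that the two clauses should not be combined.

```python
import re

# Match CREATE TABLE in any case, but not when IF NOT EXISTS follows (assumption).
_CREATE_TABLE = re.compile(
    r"\bCREATE\s+TABLE\b(?!\s+IF\s+NOT\s+EXISTS\b)", flags=re.IGNORECASE
)


def make_rerunnable(sql: str) -> str:
    """Rewrite plain CREATE TABLE statements as CREATE OR REPLACE TABLE."""
    return _CREATE_TABLE.sub("CREATE OR REPLACE TABLE", sql)


original = "CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100));"
rewritten = make_rerunnable(original)
print(rewritten)
```

Statements that already say `CREATE OR REPLACE TABLE` do not match the pattern, so running the helper twice changes nothing.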


# Summary of Steps Needed to Make an AI-Generated Notebook Work

1. Add the preamble in a code cell at the top:
```
%load_ext sql
%sql duckdb://
```

1. Remove any comments that aren't trailing `--` comments at the end of a query line.
1. Add `OR REPLACE` to `CREATE TABLE` statements so cells can be rerun.
1. Split cells (with __ctrl-shift-minus__) so that SELECT statements are isolated from each other and from everything else.
1. Add `%%sql` (for a whole cell) or `%sql` (for a single line) to turn code cells into SQL cells.
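
The first and last steps above can also be scripted when starting from a list of SQL statements. The sketch below builds a notebook as a plain dictionary with the standard `json` module. It assumes the nbformat 4.4 JSON layout; `code_cell` and `build_notebook` are hypothetical helper names, and the statements are just the examples from this document.

```python
import json

# Preamble from step 1: load the extension and open an in-memory DuckDB connection.
PREAMBLE = "%load_ext sql\n%sql duckdb://"


def code_cell(source: str) -> dict:
    """Return a minimal code cell in the assumed nbformat 4.4 layout."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(statements: list[str]) -> dict:
    """Put the preamble first, then one %%sql cell per statement."""
    cells = [code_cell(PREAMBLE)]
    for statement in statements:
        cells.append(code_cell("%%sql\n\n" + statement.strip()))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook([
    "CREATE OR REPLACE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100));",
    "SELECT * FROM employees;",
])
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json[:60])
```

Writing `notebook_json` to a file with an `.ipynb` extension should give a notebook that Jupyter can open, though I have only sketched the structure here.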

# ToDo (and consider adding to Outline Template)

1. Remove all extra "explanation:" and "snippet above" text from markdown cells.
1. Check GitHub view and improve formatting of queries as needed.