<a href="https://colab.research.google.com/github/JulTob/SQL/blob/master/SQL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SQL
SQL in Colab.






In [1]:
%pip install ipython-sql
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False



In [2]:
%pip install -U SQLAlchemy==1.4.49 ipython-sql==0.4.1




In [3]:
%load_ext sql
# sqlite:// with no path is an in-memory database; it gets wiped when the kernel resets
%sql sqlite://



The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [4]:
# To keep the data between sessions, connect to a file instead:
# %sql sqlite:///my_database.db

You can also use the `sqlite3` library to work with SQLite databases directly from Python in Colab. SQLite is a file-based database, so you don't need a separate database server.

In [5]:
import sqlite3
import pandas as pd

# What are Databases and SQL?

Imagine you have a massive collection of information, like all the books in a library, or all the customers of a store. How do you keep it organized so you can quickly find what you need, add new information, or update existing details? That's where **databases** come in!

A database is essentially an organized collection of data. Think of it like a digital filing system that stores information in a structured way, usually in tables with rows and columns (like spreadsheets). This structure makes it easy to manage and work with large amounts of data efficiently.

Now, how do you interact with this organized data? You use a language called **SQL** (Structured Query Language). SQL is the standard language for managing and manipulating relational databases. With SQL, you can:

*   **Create** new databases and tables.
*   **Insert** new data into tables.
*   **Query** (ask questions of) the data to retrieve specific information.
*   **Update** existing data.
*   **Delete** data.

In short, if a database is the organized filing system, SQL is the language you use to talk to the filing system and get things done!
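
To make the five operations above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `books` table and its rows are hypothetical, made up only for this illustration, and the database lives in memory so nothing is left behind.

```python
import sqlite3

# Throwaway in-memory database; the "books" table is a made-up example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

# Insert rows (the ? placeholders keep the values separate from the SQL text)
cur.executemany(
    "INSERT INTO books (title, year) VALUES (?, ?)",
    [("Dune", 1965), ("Neuromancer", 1984), ("Hyperion", 1998)],  # 1998 is a deliberate typo
)

# Query
after_insert = cur.execute("SELECT title FROM books ORDER BY year").fetchall()

# Update: correct the typo in the year
cur.execute("UPDATE books SET year = ? WHERE title = ?", (1989, "Hyperion"))

# Delete
cur.execute("DELETE FROM books WHERE title = ?", ("Dune",))

remaining = cur.execute("SELECT title, year FROM books ORDER BY year").fetchall()
print(after_insert)
print(remaining)

conn.close()
```

Each step maps onto one bullet in the list: `CREATE TABLE`, `INSERT`, `SELECT`, `UPDATE`, and `DELETE`.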

Before creating the `users` table, I like to guarantee a clean slate, so we don’t get the “table already exists” error when the cell is run more than once.

In [21]:
%%sql
-- Drop any previous copy so the CREATE below starts from scratch
DROP TABLE IF EXISTS users;

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    age INTEGER);

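In SQLite, an `INTEGER PRIMARY KEY` column gives each row a unique ID. If we leave `id` out of an `INSERT`, SQLite assigns the next integer for us, so we don’t have to set it manually!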
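
As a quick check of how the automatic IDs behave, here is a small sketch with `sqlite3`. It rebuilds a `users` table with the same columns in a separate in-memory database, so it does not touch the notebook's table; the variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# No id given: SQLite picks it, and cursor.lastrowid reports the value it chose.
cur.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Alice", 30))
first_id = cur.lastrowid
cur.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Bob", 25))
second_id = cur.lastrowid

# An explicit id is still allowed, as long as it is not already taken.
cur.execute("INSERT INTO users (id, name, age) VALUES (?, ?, ?)", (10, "Charlie", 17))

# The next automatic id continues from the largest id in the table.
cur.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Diana", 27))
after_gap_id = cur.lastrowid

# Reusing an existing id violates the primary key and is rejected.
try:
    cur.execute("INSERT INTO users (id, name, age) VALUES (?, ?, ?)", (1, "Eren", 29))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first_id, second_id, after_gap_id, duplicate_rejected)
conn.close()
```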

In [22]:
%%sql
INSERT INTO users (name, age)
    VALUES ('Alice', 30);
INSERT INTO users (name, age)
    VALUES ('Bob', 25);
INSERT INTO users (name, age)
    VALUES ('Charlie', 17);
INSERT INTO users (name, age) VALUES ('Diana', 27);
INSERT INTO users (name, age) VALUES ('Eren', 29);
INSERT INTO users (name, age) VALUES ('Frieren', 99);
INSERT INTO users (name, age) VALUES ('Gandalf', 2000);
INSERT INTO users (name, age) VALUES ('Harry', 17);
INSERT INTO users (name, age) VALUES ('Innigo', 17);
INSERT INTO users (name, age) VALUES ('Jotaro', 17);
INSERT INTO users (name, age) VALUES ('Katniss', 16);
INSERT INTO users (name, age) VALUES ('Logan', 17);
INSERT INTO users (name, age) VALUES ('Mario', NULL);
INSERT INTO users (name, age) VALUES ('Naruto', 16);
INSERT INTO users (name, age) VALUES ('Ororo', 23);
INSERT INTO users (name, age) VALUES ('Peter', 16);
INSERT INTO users (name, age) VALUES ('Roronoa', 19);
INSERT INTO users (name, age) VALUES ('Spike', 28);
INSERT INTO users (name, age) VALUES ('Tintin', 25);
INSERT INTO users (name, age) VALUES ('Usopp', 17);
INSERT INTO users (name, age) VALUES ('Velma', 23);
INSERT INTO users (name, age) VALUES ('Willow', 35);
INSERT INTO users (name, age) VALUES ('Xena', 31);
INSERT INTO users (name, age) VALUES ('Yoshi', 6);
INSERT INTO users (name, age) VALUES ('Zuko', 13);

In [23]:
%%sql
SELECT * FROM users LIMIT 30;


Unnamed: 0,id,name,age
0,1,Alice,30.0
1,2,Bob,25.0
2,3,Charlie,17.0
3,4,Diana,27.0
4,5,Eren,29.0
5,6,Frieren,99.0
6,7,Gandalf,2000.0
7,8,Harry,17.0
8,9,Innigo,17.0
9,10,Jotaro,17.0


Because we set `SqlMagic.autopandas = True` earlier, every `%sql` query already returns a pandas DataFrame, so we can assign the result to a variable and play with it in Python:

In [24]:
# With autopandas enabled, the magic returns a DataFrame directly
df = %sql SELECT * FROM users;
df.head()

Unnamed: 0,id,name,age
0,1,Alice,30.0
1,2,Bob,25.0
2,3,Charlie,17.0
3,4,Diana,27.0
4,5,Eren,29.0
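
Notice that `age` shows up as `30.0` rather than `30`. A likely explanation is the `NULL` age we inserted for Mario: pandas stores a missing value in a numeric column as `NaN`, which forces the column to floats. The sketch below reproduces this with `pd.read_sql_query` on a small stand-in table (three rows chosen for illustration, not the full table above).

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Mario", None), ("Zuko", 13)],  # None is stored as NULL
)

df = pd.read_sql_query("SELECT * FROM users", conn)
print(df.dtypes)  # age comes back as a float column because of the missing value

# Rows where the age is missing
missing = df[df["age"].isna()]

# Dropping the missing values lets us go back to integers
known_ages = df["age"].dropna().astype(int)
print(known_ages.tolist())

conn.close()
```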


# Asking slightly smarter questions

### Filter

Show me users older than 20, sorted from oldest to youngest:

In [25]:
%%sql
SELECT name, age
FROM users
WHERE age > 20
ORDER BY age DESC;


Unnamed: 0,name,age
0,Gandalf,2000
1,Frieren,99
2,Willow,35
3,Xena,31
4,Alice,30
5,Eren,29
6,Spike,28
7,Diana,27
8,Bob,25
9,Tintin,25
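
The filter above hard-codes the threshold `20`. When the value comes from Python, a common approach is to pass it through a `?` placeholder instead of pasting it into the SQL string. Here is a minimal sketch with `sqlite3`; the helper `users_older_than` is a hypothetical name, and the table is a small stand-in for the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Bob", 25), ("Charlie", 17), ("Gandalf", 2000)],
)

def users_older_than(connection, min_age):
    """Return (name, age) pairs with age above min_age, oldest first."""
    query = "SELECT name, age FROM users WHERE age > ? ORDER BY age DESC"
    # The value travels separately from the SQL text, so it is never parsed as SQL.
    return connection.execute(query, (min_age,)).fetchall()

adults = users_older_than(conn, 20)
print(adults)

conn.close()
```

Keeping the value out of the query string avoids quoting mistakes and SQL injection when the threshold comes from user input.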


Show me everyone, but sorted alphabetically in reverse!

In [27]:
%%sql
SELECT *
    FROM users
    ORDER BY name DESC;

Unnamed: 0,id,name,age
0,25,Zuko,13.0
1,24,Yoshi,6.0
2,23,Xena,31.0
3,22,Willow,35.0
4,21,Velma,23.0
5,20,Usopp,17.0
6,19,Tintin,25.0
7,18,Spike,28.0
8,17,Roronoa,19.0
9,16,Peter,16.0


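So far we’ve been using the `%sql` magic, which is convenient. If you want “pure Python” control, you can connect with `sqlite3` instead. Keep in mind that a separate connection cannot see the in-memory `sqlite://` database: it shares data with `%sql` only when both point at the same file, such as `my_database.db`. The file also keeps whatever earlier sessions wrote to it, so its rows can differ from the table built above.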


In [28]:
import sqlite3
import pandas as pd

connection = sqlite3.connect('my_database.db')
cursor     = connection.cursor()
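
To see why the choice between an in-memory and a file database matters when mixing connections, here is a small sketch. It uses a temporary file (`demo.db`, a made-up name) rather than `my_database.db`, so it does not touch the notebook's data.

```python
import os
import sqlite3
import tempfile

# Two separate in-memory connections: each one gets its own private database.
mem_a = sqlite3.connect(":memory:")
mem_b = sqlite3.connect(":memory:")
mem_a.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
tables_in_b = mem_b.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_in_b)  # mem_b knows nothing about the table created through mem_a

# Two connections to the same file: data written and committed by one
# is visible to the other.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
writer.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
writer.commit()

reader = sqlite3.connect(db_path)
names = reader.execute("SELECT name FROM users").fetchall()
print(names)

for c in (mem_a, mem_b, writer, reader):
    c.close()
```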


We can run queries:

In [29]:
cursor.execute("""
    SELECT name, age
    FROM users
    WHERE age IS NOT NULL
    ORDER BY age;
""")

rows = cursor.fetchall()
for row in rows:
    print(row)

('Yoshi', 6)
('Zuko', 13)
('Katniss', 16)
('Naruto', 16)
('Peter', 16)
('Charlie', 17)
('Harry', 17)
('Innigo', 17)
('Jotaro', 17)
('Logan', 17)
('Usopp', 17)
('Roronoa', 19)
('Ororo', 23)
('Velma', 23)
('Bob', 25)
('Tintin', 25)
('Diana', 27)
('Spike', 28)
('Eren', 29)
('Alice', 30)
('Xena', 31)
('Willow', 35)
('Charlie', 35)
('Charlie', 35)
('Charlie', 35)
('David', 40)
('David', 40)
('David', 40)
('Frieren', 99)
('Gandalf', 2000)
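
The query above uses `IS NOT NULL` to skip users without an age. The sketch below, on a three-row stand-in table, shows what happens otherwise: SQLite sorts `NULL` before every other value in ascending order, and a comparison such as `age = NULL` never matches, which is why `IS NULL` and `IS NOT NULL` exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Mario", None), ("Yoshi", 6)],
)

# Without the filter, the NULL row comes first when sorting ascending.
rows_all = conn.execute("SELECT name, age FROM users ORDER BY age").fetchall()

# Comparing with = NULL is never true, so this returns no rows.
equals_null = conn.execute("SELECT name FROM users WHERE age = NULL").fetchall()

# IS NULL is the correct way to find missing values.
is_null = conn.execute("SELECT name FROM users WHERE age IS NULL").fetchall()

print(rows_all)
print(equals_null)
print(is_null)

conn.close()
```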


And use pandas directly:

In [30]:
users_df = pd.read_sql_query("SELECT * FROM users;", connection)
users_df

Unnamed: 0,id,name,age
0,1,Alice,30.0
1,2,Bob,25.0
2,3,Charlie,17.0
3,4,Diana,27.0
4,5,Eren,29.0
5,6,Frieren,99.0
6,7,Gandalf,2000.0
7,8,Harry,17.0
8,9,Innigo,17.0
9,10,Jotaro,17.0


When we’re done:

In [31]:
connection.close()


Once the connection is closed, running the same query again raises an error:

In [32]:
users_df = pd.read_sql_query("SELECT * FROM users;", connection)
users_df

ProgrammingError: Cannot operate on a closed database.
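
One way to avoid forgetting `close()` is to wrap the connection in `contextlib.closing`, which closes it when the block ends. This is a sketch of that pattern on a throwaway in-memory database, followed by the same closed-connection error caught explicitly.

```python
import sqlite3
from contextlib import closing

# closing(...) calls conn.close() automatically at the end of the with block.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
    names = conn.execute("SELECT name FROM users").fetchall()

print(names)

# The connection is closed now, so any further use raises ProgrammingError.
try:
    conn.execute("SELECT name FROM users")
    error_message = None
except sqlite3.ProgrammingError as exc:
    error_message = str(exc)

print(error_message)
```

Note that `with sqlite3.connect(...) as conn:` on its own only manages the transaction (commit or rollback); it does not close the connection, which is why `closing` is used here.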