# Before we go...

If you have already run the code, you may need to delete the files "test.db" and "biggerDB.db" to avoid errors when running it again. The cell below checks whether these files exist and deletes them when they are found.


In [1]:
import os

if(os.path.isfile("test.db")):
    os.remove("test.db")
    print("Removed test.db!")
    
if(os.path.isfile("biggerDB.db")):
    os.remove("biggerDB.db")
    print("Removed biggerDB.db")
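The same cleanup can be written a little more compactly with `pathlib`. This is a minimal sketch, assuming Python 3.8 or later (needed for `missing_ok`); the helper name `remove_if_present` and the temporary directory are illustrative and not part of the notebook.

```python
import tempfile
from pathlib import Path


def remove_if_present(path):
    """Delete `path` if it exists; return True when a file was removed."""
    p = Path(path)
    existed = p.is_file()
    p.unlink(missing_ok=True)  # no error if the file is already gone
    return existed


# Demonstrate in a throwaway directory so no real database file is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "test.db"
    demo.touch()
    first = remove_if_present(demo)   # file existed and was removed
    second = remove_if_present(demo)  # nothing left to remove
    still_there = demo.exists()

print(first, second, still_there)  # True False False
```

In the notebook itself, the call would simply be `remove_if_present("test.db")` and `remove_if_present("biggerDB.db")`.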

# Example 1 - Creating a new Database

In this first notebook we are going to work with the SQLite module for Python 3. It is a very beginner-friendly tool for taking the first steps with SQL.

In [2]:
import sqlite3

conn = sqlite3.connect('test.db')
print ("Opened database successfully")

try:
    conn.execute('''
            CREATE TABLE COMPANY
             (ID         INT PRIMARY KEY   NOT NULL,
             NAME        TEXT              NOT NULL,
             AGE         INT               NOT NULL,
             ADDRESS     CHAR(50),
             SALARY      REAL);
        ''')
    print("Table created successfully!")
except sqlite3.OperationalError:
    print("The table could not be created. Please check if it already exists!")
conn.close()

Opened database successfully
Table created successfully!
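If the goal is only to avoid the "table already exists" error on a second run, SQLite's `IF NOT EXISTS` clause is an alternative to catching the exception. The sketch below is an illustration under that assumption; it uses an in-memory database instead of `test.db` so that it leaves no file behind.

```python
import sqlite3

# An in-memory database stands in for test.db in this sketch.
conn = sqlite3.connect(":memory:")

create_sql = """
    CREATE TABLE IF NOT EXISTS COMPANY (
        ID      INT PRIMARY KEY NOT NULL,
        NAME    TEXT            NOT NULL,
        AGE     INT             NOT NULL,
        ADDRESS CHAR(50),
        SALARY  REAL
    )
"""

# Running the statement twice is harmless: the second call does nothing.
conn.execute(create_sql)
conn.execute(create_sql)

# sqlite_master lists the objects stored in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['COMPANY']
conn.close()
```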


Now let's add single rows of data to our brand-new table. Each command is written as an SQL statement and run with the execute() function.

In [3]:
conn = sqlite3.connect('test.db')
print ("Opened database successfully")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (1, 'George', 30, 'New York', 32000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (2, 'Paul', 28, 'Delaware', 25000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (3, 'David', 19, 'Kansas', 18000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (4, 'Steven', 31, 'Florida', 50000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (5, 'John', 27, 'Colorado', 30000.00 )")

conn.commit()
print ("Records created successfully")
conn.close()

Opened database successfully
Records created successfully
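When the values come from Python variables rather than being typed into the statement, `sqlite3` accepts `?` placeholders and a tuple of parameters. The following is a small sketch on an in-memory database; the second row (with an apostrophe in the name) is made up for illustration to show that quoting is handled by the driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE COMPANY (
        ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, AGE INT NOT NULL,
        ADDRESS CHAR(50), SALARY REAL)
""")

insert_sql = ("INSERT INTO COMPANY (ID, NAME, AGE, ADDRESS, SALARY) "
              "VALUES (?, ?, ?, ?, ?)")

# Each ? is filled from the tuple, in order.
conn.execute(insert_sql, (1, "George", 30, "New York", 32000.00))

# Hypothetical row: the apostrophe would break a hand-quoted SQL string.
conn.execute(insert_sql, (2, "O'Brien", 41, "Boston", 27000.00))
conn.commit()

stored = conn.execute("SELECT NAME FROM COMPANY ORDER BY ID").fetchall()
print(stored)  # [('George',), ("O'Brien",)]
conn.close()
```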


The rows do not have to be added one execute() call at a time; a single, longer INSERT statement can list several rows.

In [4]:
conn = sqlite3.connect('test.db')
print ("Opened database successfully")

conn.execute('''
    INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY)
    VALUES (6, 'Mary', 28, 'Maryland', 35000.00 ),
           (7, 'Jefferson', 22, 'Minnesota', 31000.00 ),
           (8, 'John', 27, 'Colorado', 30000.00 ),
           (9, 'Kim', 29, 'Michigan', 43000.00 ),
           (10, 'Anne', 25, 'New York', 25000.00 )
''')

conn.commit()
print ("Records created successfully")
conn.close()

Opened database successfully
Records created successfully
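Another way to add several rows at once is `executemany()`, which runs one parameterized statement for every tuple in a list. This is a sketch on an in-memory database, reusing three of the rows above; it is an alternative to the multi-row `VALUES` list, not what the cell above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE COMPANY (
        ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, AGE INT NOT NULL,
        ADDRESS CHAR(50), SALARY REAL)
""")

rows = [
    (6, "Mary", 28, "Maryland", 35000.00),
    (7, "Jefferson", 22, "Minnesota", 31000.00),
    (9, "Kim", 29, "Michigan", 43000.00),
]

# One statement, executed once per tuple in `rows`.
conn.executemany(
    "INSERT INTO COMPANY (ID, NAME, AGE, ADDRESS, SALARY) VALUES (?, ?, ?, ?, ?)",
    rows,
)
conn.commit()

names = [row[0] for row in conn.execute("SELECT NAME FROM COMPANY ORDER BY ID")]
print(names)  # ['Mary', 'Jefferson', 'Kim']
conn.close()
```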


Let's see what our table looks like so far:

In [5]:
conn = sqlite3.connect('test.db')
print ("Opened database successfully")

cursor = conn.execute("SELECT * FROM COMPANY")
for row in cursor:
    print(row)

conn.close()

Opened database successfully
(1, 'George', 30, 'New York', 32000.0)
(2, 'Paul', 28, 'Delaware', 25000.0)
(3, 'David', 19, 'Kansas', 18000.0)
(4, 'Steven', 31, 'Florida', 50000.0)
(5, 'John', 27, 'Colorado', 30000.0)
(6, 'Mary', 28, 'Maryland', 35000.0)
(7, 'Jefferson', 22, 'Minnesota', 31000.0)
(8, 'John', 27, 'Colorado', 30000.0)
(9, 'Kim', 29, 'Michigan', 43000.0)
(10, 'Anne', 25, 'New York', 25000.0)


We can also use the UPDATE statement to change values in a table of a database.

In [6]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

conn.execute("UPDATE COMPANY set SALARY = 35000.00 where ID = 1")
conn.commit()
print ("Total number of rows updated :", conn.total_changes)

cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
for row in cursor:
    print (row)

conn.close()

Opened database successfully
Total number of rows updated : 1
(1, 'George', 'New York', 35000.0)
(2, 'Paul', 'Delaware', 25000.0)
(3, 'David', 'Kansas', 18000.0)
(4, 'Steven', 'Florida', 50000.0)
(5, 'John', 'Colorado', 30000.0)
(6, 'Mary', 'Maryland', 35000.0)
(7, 'Jefferson', 'Minnesota', 31000.0)
(8, 'John', 'Colorado', 30000.0)
(9, 'Kim', 'Michigan', 43000.0)
(10, 'Anne', 'New York', 25000.0)
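Note that `conn.total_changes` counts every row inserted, updated, or deleted since the connection was opened; it equals 1 above only because the connection was opened just before the UPDATE. To count the rows touched by one specific statement, the cursor's `rowcount` can be used instead. A small sketch on an in-memory database with two of the rows from this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE COMPANY (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, SALARY REAL)")
conn.execute("INSERT INTO COMPANY VALUES (1, 'George', 32000.00)")
conn.execute("INSERT INTO COMPANY VALUES (2, 'Paul', 25000.00)")

# conn.execute() returns a cursor; its rowcount refers to this statement only.
cur = conn.execute("UPDATE COMPANY SET SALARY = ? WHERE ID = ?", (35000.00, 1))
conn.commit()

updated = cur.rowcount      # rows changed by the UPDATE
total = conn.total_changes  # two INSERTs plus one UPDATE on this connection
print("Rows updated by this statement:", updated)     # 1
print("Changes since the connection opened:", total)  # 3
conn.close()
```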


Finally, the DELETE statement can be used to remove a row from the table.

In [7]:
conn = sqlite3.connect('test.db')
print ("Opened database successfully")

conn.execute("DELETE from COMPANY where ID = 2;")
conn.commit()
print ("Total number of rows deleted :", conn.total_changes)

cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
for row in cursor:
    print (row)

conn.close()

Opened database successfully
Total number of rows deleted : 1
(1, 'George', 'New York', 35000.0)
(3, 'David', 'Kansas', 18000.0)
(4, 'Steven', 'Florida', 50000.0)
(5, 'John', 'Colorado', 30000.0)
(6, 'Mary', 'Maryland', 35000.0)
(7, 'Jefferson', 'Minnesota', 31000.0)
(8, 'John', 'Colorado', 30000.0)
(9, 'Kim', 'Michigan', 43000.0)
(10, 'Anne', 'New York', 25000.0)
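Because a DELETE is hard to undo once committed, it can be useful to group related changes in a transaction. A `sqlite3` connection can be used as a context manager: the block is committed if it finishes normally and rolled back if an exception is raised. The sketch below assumes an in-memory database with a reduced two-column table; the failing insert is deliberately constructed to trigger the rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COMPANY (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL)")

# Committed automatically when the block ends without an error.
with conn:
    conn.executemany("INSERT INTO COMPANY VALUES (?, ?)",
                     [(1, "George"), (2, "Paul")])

try:
    with conn:
        conn.execute("DELETE FROM COMPANY WHERE ID = ?", (2,))
        # Deliberate error: ID 1 already exists, so the PRIMARY KEY is violated.
        conn.execute("INSERT INTO COMPANY VALUES (?, ?)", (1, "Duplicate"))
except sqlite3.IntegrityError:
    print("Transaction rolled back")

# The DELETE was undone together with the failed INSERT.
remaining = [row[0] for row in conn.execute("SELECT ID FROM COMPANY ORDER BY ID")]
print(remaining)  # [1, 2]
conn.close()
```

The context manager only manages the transaction; it does not close the connection, so `conn.close()` is still needed.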


Iterating over the SQL cursor yields each row as a Python tuple, so individual columns can be accessed by index:

In [8]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
for row in cursor:
    print ("ID = ", row[0])
    print ("NAME = ", row[1])
    print ("ADDRESS = ", row[2])
    print ("SALARY = ", row[3], "\n")

conn.close()

Opened database successfully
ID =  1
NAME =  George
ADDRESS =  New York
SALARY =  35000.0 

ID =  3
NAME =  David
ADDRESS =  Kansas
SALARY =  18000.0 

ID =  4
NAME =  Steven
ADDRESS =  Florida
SALARY =  50000.0 

ID =  5
NAME =  John
ADDRESS =  Colorado
SALARY =  30000.0 

ID =  6
NAME =  Mary
ADDRESS =  Maryland
SALARY =  35000.0 

ID =  7
NAME =  Jefferson
ADDRESS =  Minnesota
SALARY =  31000.0 

ID =  8
NAME =  John
ADDRESS =  Colorado
SALARY =  30000.0 

ID =  9
NAME =  Kim
ADDRESS =  Michigan
SALARY =  43000.0 

ID =  10
NAME =  Anne
ADDRESS =  New York
SALARY =  25000.0 
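
Numeric indexes such as `row[0]` depend on the column order in the SELECT. As an alternative, setting the connection's `row_factory` to `sqlite3.Row` allows access by column name as well as by index. A minimal sketch on an in-memory database with one row from this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name
conn.execute("""
    CREATE TABLE COMPANY (
        ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
        ADDRESS CHAR(50), SALARY REAL)
""")
conn.execute("INSERT INTO COMPANY VALUES (?, ?, ?, ?)",
             (1, "George", "New York", 35000.00))

row = conn.execute(
    "SELECT ID, NAME, ADDRESS, SALARY FROM COMPANY WHERE ID = ?", (1,)).fetchone()

by_name = row["NAME"]   # lookup by column name
by_index = row[1]       # positional access still works
as_dict = dict(zip(row.keys(), row))
print(by_name, by_index, as_dict["SALARY"])  # George George 35000.0
conn.close()
```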



# Example 2 - Creating and viewing a Database using Faker and Pandas

We will now create a bigger database using the Faker module and see how the pandas module can take a SQL cursor and build a DataFrame from it automatically.

Users may need to run the respective pip install commands:

pip install pandas

pip install Faker


## Generating Fake Data

We will use Faker to generate fake data to fill our table. This process may take some time if the number of rows to generate is set to a large value.

To keep the run time reasonable while still getting a sizeable volume of data, we will generate 50,000 rows for our database, split across five locales.

In [9]:
from faker import Faker
import random

dataParameters = {
    "it_IT": 10_000, # Italy
    "en_US":  8_000, # USA
    "pt_BR": 15_000, # Brazil
    "es_AR": 10_000, # Argentina
    "fr_FR":  7_000  # France
}
TOTAL_DATA_SIZE = sum(dataParameters.values())
MIN_AGE = 18
MAX_AGE = 70

data = []
id_base_val = 0

for key in dataParameters.keys():
    fake = Faker(key)
    data.extend(
        [[i, fake.name(), random.randint(MIN_AGE, MAX_AGE), fake.address().replace("\n", " - ")]
         for i in range(id_base_val, dataParameters[key]+id_base_val)]
    )
    id_base_val += dataParameters[key]

random.shuffle(data)

for i in range(TOTAL_DATA_SIZE):
    data[i][0] = i
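
The shuffle-and-renumber step at the end of the cell can be easier to follow on a small scale. The sketch below uses only the standard library: the strings such as `it_IT-person-0` are stand-ins for Faker output, and the tiny batch sizes and the fixed seed are assumptions made for illustration.

```python
import random

random.seed(0)  # assumption: fixed seed so repeated runs shuffle the same way

# Stand-in for dataParameters, with tiny batch sizes.
batches = {"it_IT": 3, "en_US": 2}

data = []
id_base_val = 0
for locale, count in batches.items():
    # IDs are assigned in blocks, one block per locale.
    data.extend(
        [[i, f"{locale}-person-{i}", random.randint(18, 70)]
         for i in range(id_base_val, id_base_val + count)]
    )
    id_base_val += count

names_before = sorted(row[1] for row in data)

# Mix the locales, then renumber so IDs again run 0..N-1 in row order.
random.shuffle(data)
for new_id, row in enumerate(data):
    row[0] = new_id

ids = [row[0] for row in data]
names_after = sorted(row[1] for row in data)
print(ids)  # [0, 1, 2, 3, 4]
```

After renumbering, an ID no longer says which locale a row came from, which is the point of shuffling before the rows are written to the table.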


## Viewing the data in a pandas DataFrame

Before adding it to the SQL database, let's see how our data looks in a DataFrame.

In [10]:
import pandas as pd

df = pd.DataFrame(data, columns=["ID", "Name", "Age", "Address"])
df.set_index("ID", inplace=True)

df.head(10)

ID,Name,Age,Address
0,Alain Pineau,47,"25, rue Moulin - 76556 Boutin"
1,Dott. Gian Canevascini,55,"Stretto Turci, 383 Appartamento 49 - 37123, Ve..."
2,Mateo Ezequiel Lautaro Rodriguez Ruiz,27,Diagonal 6 N° 719 - San Salvador de Jujuy 4600...
3,Maureen Price,61,"518 Vincent Ports Apt. 717 - Lewisfurt, NJ 61138"
4,Grégoire Joly-Rossi,32,"74, avenue de Didier - 51047 ParisVille"
5,Dino Cainero,47,"Stretto Mastandrea, 80 - 20034, San Giorgio Su..."
6,Sr(a). Micaela Torres,61,Calle Santiago del Estero N° 7442 Local 62 - L...
7,Giulia Anguillara,40,"Incrocio Ginese, 257 Appartamento 1 - 55039, G..."
8,Dott. Iolanda Ramazzotti,25,"Via Irma, 82 Appartamento 33 - 88042, Falerna ..."
9,Sig.ra Marta Donatoni,43,"Strada Vespa, 3 Appartamento 43 - 16146, Genov..."


## Creating the new database

We're now going to create a new database and table to hold our fake data.

In [11]:
import sqlite3

conn = sqlite3.connect('biggerDB.db')
try:
    conn.execute('''
            CREATE TABLE USERS
             (ID         INT PRIMARY KEY   NOT NULL,
             NAME        TEXT              NOT NULL,
             AGE         INT               NOT NULL,
             ADDRESS     CHAR(50));
        ''')
    print("Table created successfully!")
except sqlite3.OperationalError:
    print("Table not created!")

try:
    # Parameterized insert: sqlite3 quotes each value itself, so names or
    # addresses that contain quote characters cannot break the statement.
    conn.executemany(
        "INSERT INTO USERS (ID, NAME, AGE, ADDRESS) VALUES (?, ?, ?, ?)",
        data
    )
    conn.commit()
    print ("Records created successfully")
except sqlite3.Error:
    print("Data not included in the table!")

conn.close()

Table created successfully!
Records created successfully
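
After a bulk insert, a quick sanity check is to ask the database how many rows it holds and whether the values fall in the expected range. The sketch below runs against an in-memory table filled with 1,000 stand-in rows (not Faker data); the row contents are assumptions made for illustration, while the age bounds mirror `MIN_AGE` and `MAX_AGE` above.

```python
import random
import sqlite3

# Stand-in rows shaped like the notebook's data: (ID, NAME, AGE, ADDRESS).
rows = [(i, f"person-{i}", random.randint(18, 70), f"address-{i}")
        for i in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE USERS (
        ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
        AGE INT NOT NULL, ADDRESS CHAR(50))
""")
conn.executemany(
    "INSERT INTO USERS (ID, NAME, AGE, ADDRESS) VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Aggregate functions summarize the whole table in a single result row.
row_count, youngest, oldest = conn.execute(
    "SELECT COUNT(*), MIN(AGE), MAX(AGE) FROM USERS").fetchone()
print("Rows stored:", row_count)
print("Age range:", youngest, "to", oldest)
conn.close()
```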


A pandas DataFrame can take a SQL cursor as its data argument. Note that the result is exactly the same DataFrame we built from the original data in lists.

In [12]:
conn = sqlite3.connect('biggerDB.db')
cursor = conn.execute("SELECT * FROM USERS")

new_df = pd.DataFrame(cursor, columns=["ID", "Name", "Age", "Address"])
new_df.set_index("ID", inplace=True)
conn.close()

new_df.head(10)

ID,Name,Age,Address
0,Alain Pineau,47,"25, rue Moulin - 76556 Boutin"
1,Dott. Gian Canevascini,55,"Stretto Turci, 383 Appartamento 49 - 37123, Ve..."
2,Mateo Ezequiel Lautaro Rodriguez Ruiz,27,Diagonal 6 N° 719 - San Salvador de Jujuy 4600...
3,Maureen Price,61,"518 Vincent Ports Apt. 717 - Lewisfurt, NJ 61138"
4,Grégoire Joly-Rossi,32,"74, avenue de Didier - 51047 ParisVille"
5,Dino Cainero,47,"Stretto Mastandrea, 80 - 20034, San Giorgio Su..."
6,Sr(a). Micaela Torres,61,Calle Santiago del Estero N° 7442 Local 62 - L...
7,Giulia Anguillara,40,"Incrocio Ginese, 257 Appartamento 1 - 55039, G..."
8,Dott. Iolanda Ramazzotti,25,"Via Irma, 82 Appartamento 33 - 88042, Falerna ..."
9,Sig.ra Marta Donatoni,43,"Strada Vespa, 3 Appartamento 43 - 16146, Genov..."
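
The column names above were typed by hand. They can also be read from the cursor itself: after a SELECT, `cursor.description` holds one entry per result column, with the column name first. The sketch below uses only `sqlite3` and an in-memory table with two rows copied from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE USERS (
        ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
        AGE INT NOT NULL, ADDRESS CHAR(50))
""")
conn.executemany(
    "INSERT INTO USERS (ID, NAME, AGE, ADDRESS) VALUES (?, ?, ?, ?)",
    [(0, "Alain Pineau", 47, "25, rue Moulin - 76556 Boutin"),
     (3, "Maureen Price", 61, "518 Vincent Ports Apt. 717 - Lewisfurt, NJ 61138")],
)

cursor = conn.execute("SELECT * FROM USERS")

# Each entry of cursor.description is a 7-tuple; index 0 is the column name.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(column_names)  # ['ID', 'NAME', 'AGE', 'ADDRESS']
conn.close()
```

With pandas installed, `pd.DataFrame(rows, columns=column_names)` would then build the frame without a hand-written column list; the names come out as stored in the table (upper case here).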


In [13]:
df.head(10)

ID,Name,Age,Address
0,Alain Pineau,47,"25, rue Moulin - 76556 Boutin"
1,Dott. Gian Canevascini,55,"Stretto Turci, 383 Appartamento 49 - 37123, Ve..."
2,Mateo Ezequiel Lautaro Rodriguez Ruiz,27,Diagonal 6 N° 719 - San Salvador de Jujuy 4600...
3,Maureen Price,61,"518 Vincent Ports Apt. 717 - Lewisfurt, NJ 61138"
4,Grégoire Joly-Rossi,32,"74, avenue de Didier - 51047 ParisVille"
5,Dino Cainero,47,"Stretto Mastandrea, 80 - 20034, San Giorgio Su..."
6,Sr(a). Micaela Torres,61,Calle Santiago del Estero N° 7442 Local 62 - L...
7,Giulia Anguillara,40,"Incrocio Ginese, 257 Appartamento 1 - 55039, G..."
8,Dott. Iolanda Ramazzotti,25,"Via Irma, 82 Appartamento 33 - 88042, Falerna ..."
9,Sig.ra Marta Donatoni,43,"Strada Vespa, 3 Appartamento 43 - 16146, Genov..."
