# Example 1 - Creating a new Database

In this first notebook we are going to work with the sqlite3 module for Python 3. It is a very beginner-friendly tool for learning the first steps with SQL.

In [28]:
import sqlite3

conn = sqlite3.connect('test.db')
print("Opened database successfully")

try:
    conn.execute('''
            CREATE TABLE COMPANY
             (ID         INT PRIMARY KEY   NOT NULL,
             NAME        TEXT              NOT NULL,
             AGE         INT               NOT NULL,
             ADDRESS     CHAR(50),
             SALARY      REAL);
        ''')
    print("Table created successfully!")
except sqlite3.OperationalError:
    print("The table could not be created. Please check if it already exists!")
conn.close()

Opened database successfully
Table created successfully!
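
The try/except above reports a failure when the table is already there. As a minimal sketch of another option, SQL's `IF NOT EXISTS` clause lets the same statement run repeatedly without raising. The in-memory database (`":memory:"`) is an assumption made here so the sketch does not touch test.db.

```python
import sqlite3

# In-memory database so the sketch leaves test.db untouched
conn = sqlite3.connect(":memory:")

create_sql = """
    CREATE TABLE IF NOT EXISTS COMPANY
    (ID      INT PRIMARY KEY NOT NULL,
     NAME    TEXT            NOT NULL,
     AGE     INT             NOT NULL,
     ADDRESS CHAR(50),
     SALARY  REAL);
"""

# Running the statement twice is safe: the second call does nothing
conn.execute(create_sql)
conn.execute(create_sql)

# List the tables that now exist in the database
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
conn.close()
```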


Next we add single rows of data to our brand-new table. Every command is written as a SQL statement and run through the execute() function.

In [29]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (1, 'George', 30, 'New York', 32000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (2, 'Paul', 28, 'Delaware', 25000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (3, 'David', 19, 'Kansas', 18000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (4, 'Steven', 31, 'Florida', 50000.00 )")

conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
    VALUES (5, 'John', 27, 'Colorado', 30000.00 )")

conn.commit()
print("Records created successfully")
conn.close()

Opened database successfully
Records created successfully
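
The inserts above write the values directly into the SQL string. When the values come from Python variables, sqlite3 also accepts `?` placeholders and binds the values itself. The sketch below assumes an in-memory database and a made-up record (`new_row`) whose name contains an apostrophe, a case that is awkward to handle in a hand-built string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("""
    CREATE TABLE COMPANY
    (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, AGE INT NOT NULL,
     ADDRESS CHAR(50), SALARY REAL)
""")

# Hypothetical record; the apostrophe needs no manual escaping
new_row = (1, "Sean O'Neil", 30, "New York", 32000.00)

# Each ? is bound to the matching element of the tuple
conn.execute(
    "INSERT INTO COMPANY (ID, NAME, AGE, ADDRESS, SALARY) VALUES (?, ?, ?, ?, ?)",
    new_row,
)
conn.commit()

stored = conn.execute("SELECT NAME FROM COMPANY WHERE ID = ?", (1,)).fetchone()
print(stored)
conn.close()
```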


The new rows do not have to be added through separate execute() calls; a single, longer SQL statement can insert several rows at once.

In [30]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

conn.execute('''
    INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY)
    VALUES (6, 'Mary', 28, 'Maryland', 35000.00 ),
           (7, 'Jefferson', 22, 'Minnesota', 31000.00 ),
           (8, 'John', 27, 'Colorado', 30000.00 ),
           (9, 'Kim', 29, 'Michigan', 43000.00 ),
           (10, 'Anne', 25, 'New York', 25000.00 )
''')

conn.commit()
print("Records created successfully")
conn.close()

Opened database successfully
Records created successfully
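
When the rows already live in a Python list, another way to insert several at once is `executemany()`, which runs one parameterized statement per item. This is a sketch against an in-memory database, reusing a few of the rows from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("""
    CREATE TABLE COMPANY
    (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, AGE INT NOT NULL,
     ADDRESS CHAR(50), SALARY REAL)
""")

rows = [
    (6, "Mary", 28, "Maryland", 35000.00),
    (7, "Jefferson", 22, "Minnesota", 31000.00),
    (9, "Kim", 29, "Michigan", 43000.00),
]

# One INSERT is executed for every tuple in rows
conn.executemany("INSERT INTO COMPANY VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM COMPANY").fetchone()[0]
print(count)
conn.close()
```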


Let's see how our table looks so far:

In [31]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

cursor = conn.execute("SELECT * FROM COMPANY")
for row in cursor:
    print(row)

conn.close()

Opened database successfully
(1, 'George', 30, 'New York', 32000.0)
(2, 'Paul', 28, 'Delaware', 25000.0)
(3, 'David', 19, 'Kansas', 18000.0)
(4, 'Steven', 31, 'Florida', 50000.0)
(5, 'John', 27, 'Colorado', 30000.0)
(6, 'Mary', 28, 'Maryland', 35000.0)
(7, 'Jefferson', 22, 'Minnesota', 31000.0)
(8, 'John', 27, 'Colorado', 30000.0)
(9, 'Kim', 29, 'Michigan', 43000.0)
(10, 'Anne', 25, 'New York', 25000.0)


We can also use the UPDATE statement to change values that are already stored in a table.

In [32]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

conn.execute("UPDATE COMPANY set SALARY = 35000.00 where ID = 1")
conn.commit()
print ("Total number of rows updated :", conn.total_changes)

cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
for row in cursor:
    print (row)

conn.close()

Opened database successfully
Total number of rows updated : 1
(1, 'George', 'New York', 35000.0)
(2, 'Paul', 'Delaware', 25000.0)
(3, 'David', 'Kansas', 18000.0)
(4, 'Steven', 'Florida', 50000.0)
(5, 'John', 'Colorado', 30000.0)
(6, 'Mary', 'Maryland', 35000.0)
(7, 'Jefferson', 'Minnesota', 31000.0)
(8, 'John', 'Colorado', 30000.0)
(9, 'Kim', 'Michigan', 43000.0)
(10, 'Anne', 'New York', 25000.0)
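
Note that `conn.total_changes` counts every row inserted, updated or deleted since the connection was opened. It reports 1 above only because the connection was freshly opened. As a sketch, the cursor's `rowcount` attribute gives the count for a single statement; the in-memory database and the reduced table layout are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute(
    "CREATE TABLE COMPANY (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO COMPANY VALUES (?, ?, ?)",
    [(1, "George", 32000.0), (2, "Paul", 25000.0)],
)

# execute() returns a cursor; its rowcount covers this statement only
cur = conn.execute("UPDATE COMPANY SET SALARY = ? WHERE ID = ?", (35000.0, 1))
conn.commit()

updated = cur.rowcount          # 1: rows changed by the UPDATE
total = conn.total_changes      # 3: two inserts plus one update
print("Rows updated by this statement:", updated)
print("Changes since the connection was opened:", total)
conn.close()
```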


Finally, the DELETE statement can be used to remove a row from the table.

In [33]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

conn.execute("DELETE from COMPANY where ID = 2;")
conn.commit()
print ("Total number of rows deleted :", conn.total_changes)

cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
for row in cursor:
    print (row)

conn.close()

Opened database successfully
Total number of rows deleted : 1
(1, 'George', 'New York', 35000.0)
(3, 'David', 'Kansas', 18000.0)
(4, 'Steven', 'Florida', 50000.0)
(5, 'John', 'Colorado', 30000.0)
(6, 'Mary', 'Maryland', 35000.0)
(7, 'Jefferson', 'Minnesota', 31000.0)
(8, 'John', 'Colorado', 30000.0)
(9, 'Kim', 'Michigan', 43000.0)
(10, 'Anne', 'New York', 25000.0)
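
A committed DELETE cannot be undone, so it can help to group related changes in one transaction. The sketch below uses the connection as a context manager: `with conn:` commits if the block finishes and rolls back if an exception escapes it. The duplicate insert is a deliberate, made-up error that triggers the rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("CREATE TABLE COMPANY (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL)")
conn.executemany("INSERT INTO COMPANY VALUES (?, ?)", [(1, "George"), (2, "Paul")])
conn.commit()

try:
    with conn:  # commit on success, rollback if an exception escapes
        conn.execute("DELETE FROM COMPANY WHERE ID = ?", (2,))
        # Duplicate primary key: raises sqlite3.IntegrityError
        conn.execute("INSERT INTO COMPANY VALUES (?, ?)", (1, "Duplicate"))
except sqlite3.IntegrityError:
    print("Transaction rolled back")

# The DELETE was undone together with the failed INSERT
remaining = [row[0] for row in conn.execute("SELECT ID FROM COMPANY ORDER BY ID")]
print(remaining)
conn.close()
```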


Iterating over the SQL cursor gives us each row as a Python tuple, so the columns can be read by index:

In [34]:
conn = sqlite3.connect('test.db')
print("Opened database successfully")

cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
for row in cursor:
    print ("ID = ", row[0])
    print ("NAME = ", row[1])
    print ("ADDRESS = ", row[2])
    print ("SALARY = ", row[3], "\n")

conn.close()

Opened database successfully
ID =  1
NAME =  George
ADDRESS =  New York
SALARY =  35000.0 

ID =  3
NAME =  David
ADDRESS =  Kansas
SALARY =  18000.0 

ID =  4
NAME =  Steven
ADDRESS =  Florida
SALARY =  50000.0 

ID =  5
NAME =  John
ADDRESS =  Colorado
SALARY =  30000.0 

ID =  6
NAME =  Mary
ADDRESS =  Maryland
SALARY =  35000.0 

ID =  7
NAME =  Jefferson
ADDRESS =  Minnesota
SALARY =  31000.0 

ID =  8
NAME =  John
ADDRESS =  Colorado
SALARY =  30000.0 

ID =  9
NAME =  Kim
ADDRESS =  Michigan
SALARY =  43000.0 

ID =  10
NAME =  Anne
ADDRESS =  New York
SALARY =  25000.0 
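
Index access works, but the meaning of `row[2]` depends on the column order in the SELECT. As a sketch of an alternative, setting `row_factory` to `sqlite3.Row` makes each row readable by column name as well. The in-memory database and the single sample row are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.row_factory = sqlite3.Row      # rows now support access by column name
conn.execute("""
    CREATE TABLE COMPANY
    (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
     ADDRESS CHAR(50), SALARY REAL)
""")
conn.execute(
    "INSERT INTO COMPANY VALUES (?, ?, ?, ?)",
    (1, "George", "New York", 35000.0),
)

names = []
cursor = conn.execute("SELECT id, name, address, salary FROM COMPANY")
for row in cursor:
    # Name lookup on sqlite3.Row is case-insensitive
    print("ID =", row["id"])
    print("NAME =", row["name"])
    print("SALARY =", row["salary"])
    names.append(row["name"])

conn.close()
```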



# Example 2 - Creating and viewing a Database using Faker and Pandas

We will now create a bigger database using the Faker module and see how the pandas module can take a SQL cursor and build a DataFrame from it automatically.

Users may need to run the respective pip install commands:

pip install pandas

pip install Faker


## Generating Fake Data

We will use Faker to generate fake data to fill our table. This process may take some time if the number of rows to generate is set to a large value.

To keep generation reasonably fast while still producing a sizeable dataset, we will generate 50,000 rows for our database, split across five locales.

In [35]:
from faker import Faker
import random

QTY_OF_DATA_TO_GENERATE_IT = 10_000 # Italy
QTY_OF_DATA_TO_GENERATE_US =  8_000 # USA
QTY_OF_DATA_TO_GENERATE_BR = 15_000 # Brazil
QTY_OF_DATA_TO_GENERATE_AR = 10_000 # Argentina
QTY_OF_DATA_TO_GENERATE_FR =  7_000 # France
# Sum the five per-country quantities (50,000 rows in total)
TOTAL_DATA_SIZE = (
    QTY_OF_DATA_TO_GENERATE_IT+
    QTY_OF_DATA_TO_GENERATE_US+
    QTY_OF_DATA_TO_GENERATE_BR+
    QTY_OF_DATA_TO_GENERATE_AR+
    QTY_OF_DATA_TO_GENERATE_FR
)

id_base_val = 0

fake = Faker("it_IT")
data = [[i, fake.name(), random.randint(18, 70), fake.address().replace("\n", " - ")]
        for i in range(id_base_val, QTY_OF_DATA_TO_GENERATE_IT)
    ]
id_base_val += QTY_OF_DATA_TO_GENERATE_IT

fake = Faker("en_US")
data.extend(
    [[i, fake.name(), random.randint(18, 70), fake.address().replace("\n", " - ")]
     for i in range(id_base_val, QTY_OF_DATA_TO_GENERATE_US+id_base_val)]
)
id_base_val += QTY_OF_DATA_TO_GENERATE_US

fake = Faker("pt_BR")
data.extend(
    [[i, fake.name(), random.randint(18, 70), fake.address().replace("\n", " - ")]
     for i in range(id_base_val, QTY_OF_DATA_TO_GENERATE_BR+id_base_val)]
)
id_base_val += QTY_OF_DATA_TO_GENERATE_BR

fake = Faker("es_AR")
data.extend(
    [[i, fake.name(), random.randint(18, 70), fake.address().replace("\n", " - ")]
     for i in range(id_base_val, QTY_OF_DATA_TO_GENERATE_AR+id_base_val)]
)
id_base_val += QTY_OF_DATA_TO_GENERATE_AR

fake = Faker("fr_FR")
data.extend(
    [[i, fake.name(), random.randint(18, 70), fake.address().replace("\n", " - ")]
     for i in range(id_base_val, QTY_OF_DATA_TO_GENERATE_FR+id_base_val)]
)
id_base_val += QTY_OF_DATA_TO_GENERATE_FR

random.shuffle(data)

for i in range(TOTAL_DATA_SIZE):
    data[i][0] = i
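
The last two steps of the cell above mix the countries together and then renumber the rows, so that IDs follow the new order. A small sketch of that step in isolation, with made-up rows and a fixed seed (both assumptions, used only to keep the example short and repeatable):

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable

# Hypothetical stand-in rows: [id, name, age]
data = [[0, "Ana", 30], [1, "Bruno", 41], [2, "Chloe", 25], [3, "Dario", 52]]

random.shuffle(data)  # mix the rows in place

# Overwrite each old ID with the row's new position
for new_id, row in enumerate(data):
    row[0] = new_id

ids = [row[0] for row in data]
print(ids)
```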


## Viewing the data in a pandas DataFrame

Before adding it to the SQL database, let's see how our data looks in a DataFrame:

In [36]:
import pandas as pd

df = pd.DataFrame(data, columns=["ID", "Name", "Age", "Address"])
df.set_index("ID", inplace=True)

df.head(10)

ID,Name,Age,Address
0,Patricia Mcmillan,43,"05464 Julie Road - East Brandon, DE 90761"
1,Jeannine Devaux,64,boulevard de Hardy - 49315 Torresboeuf
2,Sr. Raul Nascimento,42,"Setor Elisa Lima, 83 - Solimoes - 77856356 Roc..."
3,Maria Eduarda da Mata,45,"Aeroporto Pinto, 535 - Prado - 03176671 Cavalc..."
4,Laís da Rocha,22,"Área Valentina Vieira, 4 - Vila Nova Paraíso -..."
5,Thomas Vieira,34,"Via da Conceição, 351 - Vila Oeste - 54237-999..."
6,Capucine Coulon,49,"11, rue de Rey - 93908 Sainte Paul-sur-Mer"
7,Inès Rolland du Valette,51,"1, boulevard Pons - 31349 Hebertnec"
8,Stephany Costa,41,"Núcleo Nascimento, 83 - Novo São Lucas - 38386..."
9,Martina Romero Ramirez,69,Calle 163 Bis N° 5638 Local 94 - Resistencia 3...


## Creating the new database

We're now going to create a new database and table to hold our fake data:

In [37]:
import sqlite3

conn = sqlite3.connect('biggerDB.db')
try:
    conn.execute('''
            CREATE TABLE USERS
             (ID         INT PRIMARY KEY   NOT NULL,
             NAME        TEXT              NOT NULL,
             AGE         INT               NOT NULL,
             ADDRESS     CHAR(50));
        ''')
    print("Table created successfully!")
except sqlite3.OperationalError:
    print("Table not created!")

try:
    # Parameterized bulk insert: sqlite3 binds each value, so quotes
    # inside names or addresses cannot break the SQL statement
    conn.executemany(
        "INSERT INTO USERS (ID, NAME, AGE, ADDRESS) VALUES (?, ?, ?, ?)",
        data
    )
    conn.commit()
    print("Records created successfully")
except sqlite3.Error:
    print("Data not included in the table!")

conn.close()

Table created successfully!
Records created successfully
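
A quick sanity check after a bulk insert is to compare the number of stored rows with the length of the source list. The sketch below assumes a tiny stand-in for `data` and an in-memory database instead of biggerDB.db.

```python
import sqlite3

# Hypothetical stand-in for the generated data list
data = [
    [0, "Ana Silva", 30, "Rua A, 1 - Centro"],
    [1, "Jean Dupont", 41, "2, rue B - 75001 Paris"],
    [2, "Mario Rossi", 55, "Via C, 3 - 00100 Roma"],
]

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("""
    CREATE TABLE USERS
    (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
     AGE INT NOT NULL, ADDRESS CHAR(50))
""")
conn.executemany(
    "INSERT INTO USERS (ID, NAME, AGE, ADDRESS) VALUES (?, ?, ?, ?)", data
)
conn.commit()

# COUNT(*) returns a single row holding a single value
stored = conn.execute("SELECT COUNT(*) FROM USERS").fetchone()[0]
print("Rows stored:", stored, "of", len(data))
conn.close()
```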


A pandas DataFrame can be built directly from a SQL cursor. Note that the result matches the DataFrame we built earlier from the original lists.

In [38]:
conn = sqlite3.connect('biggerDB.db')
cursor = conn.execute("SELECT * FROM USERS")

new_df = pd.DataFrame(cursor, columns=["ID", "Name", "Age", "Address"])
new_df.set_index("ID", inplace=True)
conn.close()

new_df.head(10)

ID,Name,Age,Address
0,Patricia Mcmillan,43,"05464 Julie Road - East Brandon, DE 90761"
1,Jeannine Devaux,64,boulevard de Hardy - 49315 Torresboeuf
2,Sr. Raul Nascimento,42,"Setor Elisa Lima, 83 - Solimoes - 77856356 Roc..."
3,Maria Eduarda da Mata,45,"Aeroporto Pinto, 535 - Prado - 03176671 Cavalc..."
4,Laís da Rocha,22,"Área Valentina Vieira, 4 - Vila Nova Paraíso -..."
5,Thomas Vieira,34,"Via da Conceição, 351 - Vila Oeste - 54237-999..."
6,Capucine Coulon,49,"11, rue de Rey - 93908 Sainte Paul-sur-Mer"
7,Inès Rolland du Valette,51,"1, boulevard Pons - 31349 Hebertnec"
8,Stephany Costa,41,"Núcleo Nascimento, 83 - Novo São Lucas - 38386..."
9,Martina Romero Ramirez,69,Calle 163 Bis N° 5638 Local 94 - Resistencia 3...


In [39]:
df.head(10)

ID,Name,Age,Address
0,Patricia Mcmillan,43,"05464 Julie Road - East Brandon, DE 90761"
1,Jeannine Devaux,64,boulevard de Hardy - 49315 Torresboeuf
2,Sr. Raul Nascimento,42,"Setor Elisa Lima, 83 - Solimoes - 77856356 Roc..."
3,Maria Eduarda da Mata,45,"Aeroporto Pinto, 535 - Prado - 03176671 Cavalc..."
4,Laís da Rocha,22,"Área Valentina Vieira, 4 - Vila Nova Paraíso -..."
5,Thomas Vieira,34,"Via da Conceição, 351 - Vila Oeste - 54237-999..."
6,Capucine Coulon,49,"11, rue de Rey - 93908 Sainte Paul-sur-Mer"
7,Inès Rolland du Valette,51,"1, boulevard Pons - 31349 Hebertnec"
8,Stephany Costa,41,"Núcleo Nascimento, 83 - Novo São Lucas - 38386..."
9,Martina Romero Ramirez,69,Calle 163 Bis N° 5638 Local 94 - Resistencia 3...
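
Passing a cursor to `pd.DataFrame` requires typing the column names by hand. As a sketch of an alternative, `pd.read_sql_query` runs the query itself and takes the column names from the result, so they appear as declared in the table (NAME, AGE, ADDRESS). The in-memory database and the two sample rows are assumptions of the example.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("""
    CREATE TABLE USERS
    (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL,
     AGE INT NOT NULL, ADDRESS CHAR(50))
""")
conn.executemany(
    "INSERT INTO USERS VALUES (?, ?, ?, ?)",
    [(0, "Ana Silva", 30, "Rua A, 1"), (1, "Jean Dupont", 41, "2, rue B")],
)
conn.commit()

# Column names come from the query result; ID becomes the index
sql_df = pd.read_sql_query("SELECT * FROM USERS", conn, index_col="ID")
conn.close()

print(sql_df.head())
```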
