# Second normal form (2NF)


#### A table is in First Normal Form (1NF) if it contains only atomic (indivisible) values and each column contains values of a single type.
#### A table is in Second Normal Form (2NF) if it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key, that is, no non-key attribute depends on only part of a composite key.
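To make "fully functionally dependent" concrete, here is a minimal sketch on a made-up order-items table whose key is the pair (ORDER_ID, PRODUCT_ID). The table, its values, and the helper `determines` are assumptions for illustration only; they are not part of the BookShop example. The helper only checks whether a dependency holds in the sample rows, which is weaker than proving it for the schema.

```python
def determines(rows, lhs, rhs):
    """Return True if each combination of `lhs` values maps to a single `rhs` value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first rhs value seen for this key and returns it
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical sample data with composite key (ORDER_ID, PRODUCT_ID)
order_items = [
    {"ORDER_ID": 1, "PRODUCT_ID": "P1", "PRODUCT_NAME": "Pen", "QUANTITY": 10},
    {"ORDER_ID": 1, "PRODUCT_ID": "P2", "PRODUCT_NAME": "Ink", "QUANTITY": 2},
    {"ORDER_ID": 2, "PRODUCT_ID": "P1", "PRODUCT_NAME": "Pen", "QUANTITY": 5},
    {"ORDER_ID": 2, "PRODUCT_ID": "P3", "PRODUCT_NAME": "Pad", "QUANTITY": 1},
]

# QUANTITY needs the whole key: it is fully dependent on (ORDER_ID, PRODUCT_ID)
full_qty = determines(order_items, ["ORDER_ID", "PRODUCT_ID"], "QUANTITY")
partial_qty = determines(order_items, ["PRODUCT_ID"], "QUANTITY")

# PRODUCT_NAME is determined by PRODUCT_ID alone: a partial dependency
partial_name = determines(order_items, ["PRODUCT_ID"], "PRODUCT_NAME")

print(full_qty, partial_qty, partial_name)  # True False True
```

In this sketch PRODUCT_NAME is the attribute that would have to move to its own table for the order-items table to satisfy 2NF.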


In [1]:
import sqlite3
import pandas as pd

# Step 1: Set up the table


#### Store our query as a string. 

##### The script creates the BookShop table, which stores book information and author information together in the same rows.


In [2]:
query = """-- Drop the table in case it exists

DROP TABLE IF EXISTS BookShop;

-- Create the table

CREATE TABLE BookShop (
	BOOK_ID VARCHAR(4) NOT NULL, 
	TITLE VARCHAR(100) NOT NULL, 
	AUTHOR_NAME VARCHAR(30) NOT NULL, 
	AUTHOR_BIO VARCHAR(250),
	AUTHOR_ID INTEGER NOT NULL, 
	PUBLICATION_DATE DATE NOT NULL, 
	PRICE_USD DECIMAL(6,2) CHECK(Price_USD>0) NOT NULL
	);

-- Insert sample data into the table

INSERT INTO BookShop VALUES
('B101', 'Introduction to Algorithms', 'Thomas H. Cormen', 'Thomas H. Cormen is the co-author of Introduction to Algorithms, along with Charles Leiserson, Ron Rivest, and Cliff Stein. He is a Full Professor of computer science at Dartmouth College and currently Chair of the Dartmouth College Writing Program.', 123 , '2001-09-01', 125),
('B201', 'Structure and Interpretation of Computer Programs', 'Harold Abelson', 'Harold Abelson, Ph.D., is Class of 1922 Professor of Computer Science and Engineering in the Department of Electrical Engineering and Computer Science at MIT and a fellow of the IEEE.', 456, '1996-07-25', 65.5),
('B301', 'Deep Learning', 'Ian Goodfellow', 'Ian J. Goodfellow is a researcher working in machine learning, currently employed at Apple Inc. as its director of machine learning in the Special Projects Group. He was previously employed as a research scientist at Google Brain.', 369, '2016-11-01', 82.7),
('B401', 'Algorithms Unlocked', 'Thomas H. Cormen', 'Thomas H. Cormen is the co-author of Introduction to Algorithms, along with Charles Leiserson, Ron Rivest, and Cliff Stein. He is a Full Professor of computer science at Dartmouth College and currently Chair of the Dartmouth College Writing Program.', 123, '2013-05-15', 36.5),
('B501', 'Machine Learning: A Probabilistic Perspective', 'Kevin P. Murphy', '', 157, '2012-08-24', 46);

-- Retrieve all records from the table

SELECT * FROM BookShop; """

#### Now let's open a connection to the SQLite database


In [3]:
# Establish a connection to the SQLite database file
sql_connection = sqlite3.connect('books.db')
cursor = sql_connection.cursor()
# Run the whole script; executescript accepts several SQL statements at once
cursor.executescript(query)
# Use pandas to read the BookShop table into a DataFrame
df = pd.read_sql("SELECT * FROM BookShop", sql_connection)

In [4]:
#Print the BookShop table
df

| | BOOK_ID | TITLE | AUTHOR_NAME | AUTHOR_BIO | AUTHOR_ID | PUBLICATION_DATE | PRICE_USD |
|---|---|---|---|---|---|---|---|
| 0 | B101 | Introduction to Algorithms | Thomas H. Cormen | Thomas H. Cormen is the co-author of Introduct... | 123 | 2001-09-01 | 125.0 |
| 1 | B201 | Structure and Interpretation of Computer Programs | Harold Abelson | Harold Abelson, Ph.D., is Class of 1922 Profes... | 456 | 1996-07-25 | 65.5 |
| 2 | B301 | Deep Learning | Ian Goodfellow | Ian J. Goodfellow is a researcher working in m... | 369 | 2016-11-01 | 82.7 |
| 3 | B401 | Algorithms Unlocked | Thomas H. Cormen | Thomas H. Cormen is the co-author of Introduct... | 123 | 2013-05-15 | 36.5 |
| 4 | B501 | Machine Learning: A Probabilistic Perspective | Kevin P. Murphy | | 157 | 2012-08-24 | 46.0 |




#### Problem: This table does not comply with 2NF. It contains redundant information: AUTHOR_NAME and AUTHOR_BIO describe the author (identified by AUTHOR_ID), not the individual book, and a single author can have many books. Each book row therefore repeats the same author details, as rows B101 and B401 show for Thomas H. Cormen.

#### If the author information ever changes, we must update it in every row for that author. Missing one row leaves the table with conflicting versions of the same fact.
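The update problem described above can be reproduced in a few lines. This is a minimal sketch on a throwaway in-memory database, assuming a cut-down BookShop table with placeholder bios ('old bio', 'new bio'); it does not touch books.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE BookShop (BOOK_ID TEXT, TITLE TEXT, AUTHOR_ID INTEGER, AUTHOR_BIO TEXT)"
)
conn.executemany(
    "INSERT INTO BookShop VALUES (?, ?, ?, ?)",
    [
        ("B101", "Introduction to Algorithms", 123, "old bio"),
        ("B401", "Algorithms Unlocked", 123, "old bio"),
    ],
)

# Update the bio through only one of the author's two rows
conn.execute("UPDATE BookShop SET AUTHOR_BIO = 'new bio' WHERE BOOK_ID = 'B101'")

# List authors whose rows now disagree about the bio
inconsistent = conn.execute(
    """
    SELECT AUTHOR_ID, COUNT(DISTINCT AUTHOR_BIO) AS bio_versions
    FROM BookShop
    GROUP BY AUTHOR_ID
    HAVING COUNT(DISTINCT AUTHOR_BIO) > 1
    """
).fetchall()
print(inconsistent)  # [(123, 2)]
conn.close()
```

Author 123 ends up with two different bios, which is exactly the inconsistency a separate author table prevents.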

#### Solution: Move the author information into its own table, keyed by AUTHOR_ID, and let the book table refer to it. If we ever need to update author information, we do it once, in a single location, and the design complies with 2NF.

#### Solution
##### Split the BookShop table into two tables.
- Table 1: book information (BookShop)
- Table 2: author details (BookShop_AuthorDetails)

In [5]:
# Create the new BookShop_AuthorDetails table
cursor.execute("DROP TABLE IF EXISTS BookShop_AuthorDetails;")

cursor.execute("""
CREATE TABLE IF NOT EXISTS BookShop_AuthorDetails (
    AUTHOR_ID INTEGER NOT NULL,
    AUTHOR_NAME VARCHAR(30) NOT NULL,
    AUTHOR_BIO VARCHAR(250),
    PRIMARY KEY (AUTHOR_ID)
);
""")


In [6]:
# Populate BookShop_AuthorDetails with one row per distinct author
cursor.execute("""
INSERT INTO BookShop_AuthorDetails (AUTHOR_ID, AUTHOR_NAME, AUTHOR_BIO)
SELECT DISTINCT AUTHOR_ID, AUTHOR_NAME, AUTHOR_BIO
FROM BookShop;
""")


In [7]:
# Rebuild the BookShop table without the author detail columns
cursor.execute("""
CREATE TEMPORARY TABLE BookShop_backup AS
SELECT BOOK_ID, TITLE, AUTHOR_ID, PUBLICATION_DATE, PRICE_USD
FROM BookShop;
""")

cursor.execute("DROP TABLE BookShop;")

cursor.execute("""
CREATE TABLE BookShop (
    BOOK_ID VARCHAR(4) NOT NULL,
    TITLE VARCHAR(100) NOT NULL,
    AUTHOR_ID INTEGER NOT NULL,
    PUBLICATION_DATE DATE NOT NULL,
    PRICE_USD DECIMAL(6,2) CHECK(PRICE_USD > 0) NOT NULL,
    FOREIGN KEY (AUTHOR_ID) REFERENCES BookShop_AuthorDetails(AUTHOR_ID)
);
""")

cursor.execute("""
INSERT INTO BookShop (BOOK_ID, TITLE, AUTHOR_ID, PUBLICATION_DATE, PRICE_USD)
SELECT BOOK_ID, TITLE, AUTHOR_ID, PUBLICATION_DATE, PRICE_USD
FROM BookShop_backup;
""")

cursor.execute("DROP TABLE BookShop_backup;")

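One caveat about the FOREIGN KEY clause: SQLite parses it, but in most builds it does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is issued on each connection. The sketch below shows the effect on a separate in-memory database with simplified columns; the row 'B999' / 'Orphan Book' and the `rejected` flag are hypothetical names used only for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is a per-connection setting and is usually off by default
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE BookShop_AuthorDetails ("
    "AUTHOR_ID INTEGER PRIMARY KEY, AUTHOR_NAME TEXT NOT NULL)"
)
conn.execute("""
CREATE TABLE BookShop (
    BOOK_ID TEXT PRIMARY KEY,
    TITLE TEXT NOT NULL,
    AUTHOR_ID INTEGER NOT NULL,
    FOREIGN KEY (AUTHOR_ID) REFERENCES BookShop_AuthorDetails(AUTHOR_ID)
)
""")

conn.execute("INSERT INTO BookShop_AuthorDetails VALUES (123, 'Thomas H. Cormen')")
# Author 123 exists, so this insert is accepted
conn.execute("INSERT INTO BookShop VALUES ('B101', 'Introduction to Algorithms', 123)")

try:
    # Author 999 does not exist, so the foreign key check should reject the row
    conn.execute("INSERT INTO BookShop VALUES ('B999', 'Orphan Book', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
conn.close()
```

Without the PRAGMA, the second insert would typically succeed and leave a book pointing at a non-existent author.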

In [8]:
# Commit the changes
sql_connection.commit()

# Fetch data from BookShop table
df_bookshop = pd.read_sql("SELECT * FROM BookShop", sql_connection)

# Fetch data from BookShop_AuthorDetails table
df_author_details = pd.read_sql("SELECT * FROM BookShop_AuthorDetails", sql_connection)

In [9]:
df_bookshop

| | BOOK_ID | TITLE | AUTHOR_ID | PUBLICATION_DATE | PRICE_USD |
|---|---|---|---|---|---|
| 0 | B101 | Introduction to Algorithms | 123 | 2001-09-01 | 125.0 |
| 1 | B201 | Structure and Interpretation of Computer Programs | 456 | 1996-07-25 | 65.5 |
| 2 | B301 | Deep Learning | 369 | 2016-11-01 | 82.7 |
| 3 | B401 | Algorithms Unlocked | 123 | 2013-05-15 | 36.5 |
| 4 | B501 | Machine Learning: A Probabilistic Perspective | 157 | 2012-08-24 | 46.0 |


In [10]:
df_author_details

| | AUTHOR_ID | AUTHOR_NAME | AUTHOR_BIO |
|---|---|---|---|
| 0 | 123 | Thomas H. Cormen | Thomas H. Cormen is the co-author of Introduct... |
| 1 | 157 | Kevin P. Murphy | |
| 2 | 369 | Ian Goodfellow | Ian J. Goodfellow is a researcher working in m... |
| 3 | 456 | Harold Abelson | Harold Abelson, Ph.D., is Class of 1922 Profes... |
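Splitting the table loses no information: joining the two tables on AUTHOR_ID brings each book back together with its author. The following is a minimal sketch on an in-memory database that uses a subset of the columns and rows above, so it can be run independently of books.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BookShop_AuthorDetails (
    AUTHOR_ID INTEGER PRIMARY KEY,
    AUTHOR_NAME TEXT NOT NULL
);
CREATE TABLE BookShop (
    BOOK_ID TEXT PRIMARY KEY,
    TITLE TEXT NOT NULL,
    AUTHOR_ID INTEGER NOT NULL
);
INSERT INTO BookShop_AuthorDetails VALUES
    (123, 'Thomas H. Cormen'),
    (369, 'Ian Goodfellow');
INSERT INTO BookShop VALUES
    ('B101', 'Introduction to Algorithms', 123),
    ('B301', 'Deep Learning', 369),
    ('B401', 'Algorithms Unlocked', 123);
""")

# Recombine book rows with the single stored copy of each author's details
rows = conn.execute("""
SELECT b.BOOK_ID, b.TITLE, a.AUTHOR_NAME
FROM BookShop AS b
JOIN BookShop_AuthorDetails AS a ON a.AUTHOR_ID = b.AUTHOR_ID
ORDER BY b.BOOK_ID
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Both Cormen titles pick up the same author name from one row in BookShop_AuthorDetails, so a change to that row is reflected in every joined result.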


In [11]:
# Close the connection (the changes were already committed above)
sql_connection.close()