# 6, SQL: .execute() vs .executemany() _ LIMIT num _ JOIN _ Call out columns from tables
----
## OUTLINE:
- Load the sql extension
- Create sql tables within the database; use DROP TABLE IF EXISTS so the notebook can be re-run without errors
- Load data from a csv file into an RDBMS using sqlite3: .execute() vs .executemany(string, list_of_rows)
- Save the tables
- Show a specific number of rows: LIMIT num
- Create a new table from two different tables: types of JOIN and calling out columns from a table

- In this notebook, we will work with the netflixmovies dataset from Kaggle.
- Click [here](https://www.kaggle.com/datasets/kefahaied/netflixmovies?select=df_avgRating_with_usersCount.csv) to download the dataset and read more about it.

#### 1. Load the sql extension:

In [1]:
%load_ext sql
#load the sql extension
%sql sqlite:///movies.db
#create and connect to a new database named movies.db

#### 2. Create sql tables within the database, using DROP TABLE IF EXISTS so the cells can be re-run without errors:

In [2]:
%sql DROP TABLE IF EXISTS movies

 * sqlite:///movies.db
Done.


[]

In [3]:
%%sql
CREATE TABLE IF NOT EXISTS movies (
    MOVIE_ID INT,
    YEAR FLOAT, 
    NAME VARCHAR)

 * sqlite:///movies.db
Done.


[]

In [4]:
%sql DROP TABLE IF EXISTS ratings

 * sqlite:///movies.db
Done.


[]

In [5]:
%%sql
CREATE TABLE IF NOT EXISTS ratings (
    MOVIE_ID INT,
    RATING FLOAT,
    USER_ID INT
);


 * sqlite:///movies.db
Done.


[]

#### 3. Load data from a csv file into a RDBMS using sqlite3:

In [6]:
import sqlite3

conn = sqlite3.connect('movies.db')
cursor = conn.cursor()

##### a. Create movies table:

In [7]:
#open the file and extract the data:
with open('movie_titles.csv', 'r') as file:
    tab = file.readlines()
print(type(tab))
print(tab[0:1000]) #tab is a list of strings

<class 'list'>
[',movie_id,year,name\n', '0,1,2003.0,Dinosaur Planet\n', '1,2,2004.0,Isle of Man TT 2004 Review\n', '2,3,1997.0,Character\n', "3,4,1994.0,Paula Abdul's Get Up & Dance\n", '4,5,2004.0,The Rise and Fall of ECW\n', '5,6,1997.0,Sick\n', '6,7,1992.0,8 Man\n', '7,8,2004.0,What the #$*! Do We Know!?\n', "8,9,1991.0,Class of Nuke 'Em High 2\n", '9,10,2001.0,Fighter\n', '10,11,1999.0,Full Frame: Documentary Shorts\n', '11,12,1947.0,My Favorite Brunette\n', '12,13,2003.0,Lord of the Rings: The Return of the King: Extended Edition: Bonus Material\n', '13,14,1982.0,Nature: Antarctica\n', '14,15,1988.0,Neil Diamond: Greatest Hits Live\n', '15,16,1996.0,Screamers\n', '16,17,2005.0,7 Seconds\n', '17,18,1994.0,Immortal Beloved\n', "18,19,2000.0,By Dawn's Early Light\n", '19,20,1972.0,Seeta Aur Geeta\n', '20,21,2002.0,Strange Relations\n', '21,22,2000.0,Chump Change\n', "22,23,2001.0,Clifford: Clifford Saves the Day! / Clifford's Fluffiest Friend Cleo\n", '23,24,1981.0,My Bloody Valenti

> The first field of every line is just the row index from the csv export, not part of the data => remove it

In [8]:
#create sql command:
titles = '' #create an empty string

for row in tab:
    if row == tab[0]: #the first string is the header
        header = row.strip('\n').split(',')[1:] #split the string into a list, drop the 1st value (the row index)
    else:
        arow = row.strip('\n').split(',')[1:]
        arow = tuple(arow) #replace [] with ()
        titles = titles + str(arow) + ',\n'
        
print(titles[0:50])    #Check the 1st few chars
print(titles[-20:]) #check the last few chars


('1', '2003.0', 'Dinosaur Planet'),
('2', '2004.0'
', 'Alien Hunter'),



In [9]:
#remove the trailing ',\n' and add ';'
titles = titles.rstrip(',\n') + ';'
print(titles[0:50])    #Check the 1st few chars
print(titles[-20:]) #check the last few chars

('1', '2003.0', 'Dinosaur Planet'),
('2', '2004.0'
0', 'Alien Hunter');


In [10]:
#finish the sql command:
start = f'''
INSERT INTO movies (MOVIE_ID, YEAR, NAME)
VALUES {titles}
''' 

print(titles[0:50])    #Check the 1st few chars
print(titles[-10:]) #check the last few chars

('1', '2003.0', 'Dinosaur Planet'),
('2', '2004.0'
 Hunter');


In [11]:
#execute the sql command:
cursor.execute(start)

OperationalError: near "s": syntax error

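> The `OperationalError` most likely comes from building the statement by string formatting: some titles contain quote characters (for example apostrophes), and once they are pasted into the SQL text they end a string literal early, so SQLite cannot parse the statement.
> Instead, add the rows with `.executemany()`, which passes the values separately from the SQL text.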
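
To make the failure mode concrete, here is a minimal sketch using a throwaway in-memory database. The title is a made-up example containing both kinds of quotes, not a row from the dataset, and the exact error text may differ from the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE movies (MOVIE_ID INT, YEAR FLOAT, NAME VARCHAR)")

# Hypothetical title containing both an apostrophe and double quotes
row = ("1", "2000.0", 'It\'s a "Test"')

# str(tuple) escapes the apostrophe with a backslash, which SQL does not
# understand, so the string literal ends early and parsing fails.
bad_sql = f"INSERT INTO movies (MOVIE_ID, YEAR, NAME) VALUES {row};"
failed = False
try:
    conn.execute(bad_sql)
except sqlite3.OperationalError as exc:
    failed = True
    print("string-built SQL failed:", exc)

# Passing the values separately lets sqlite3 handle the quoting.
conn.execute("INSERT INTO movies (MOVIE_ID, YEAR, NAME) VALUES (?, ?, ?)", row)
stored = conn.execute("SELECT NAME FROM movies").fetchone()[0]
print(stored)
```

With `?` placeholders the title never becomes part of the SQL text, so no escaping is needed.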

In [None]:
# ... (previous code)

# create sql command:
values = []

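for row in tab[1:]:  # Skip the header row
    # maxsplit=3 keeps commas inside a title from creating extra fields,
    # so every row has exactly 3 values for the 3 placeholders
    arow = row.strip('\n').split(',', 3)[1:]  # drop the leading index column
    values.append(arow)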

# finish the sql command:
start = '''
INSERT INTO movies (MOVIE_ID, YEAR, NAME)
VALUES (?, ?, ?);
'''

# execute the sql command:
cursor.executemany(start, values)
conn.commit()

# ... (remaining code)


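> When working with a cursor object, `.executemany(statement, values)` with `?` placeholders is a better choice than building one large string for `.execute()`: the values are passed separately, so quotes inside them cannot break the statement.
```python
# values: one inner list (or tuple) per row, one item per placeholder
values = [[], [], [], ...]
stmt = '''
INSERT INTO tab (col1, ...)
VALUES (?, ?, ...)
'''
cursor.executemany(stmt, values)
```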
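
As a possible alternative to splitting each line by hand, the standard `csv` module can do the parsing before the rows go to `.executemany()`. The sketch below uses an in-memory stand-in for `movie_titles.csv`; the second title is invented, and it assumes fields containing commas are quoted in the file, which has not been checked against the real dataset.

```python
import csv
import io
import sqlite3

# Stand-in for movie_titles.csv; the second title is a made-up example
# with a comma and an apostrophe.
sample = io.StringIO(
    ",movie_id,year,name\n"
    "0,1,2003.0,Dinosaur Planet\n"
    '1,2,2004.0,"Hypothetical Title, Director\'s Cut"\n'
)

reader = csv.reader(sample)
next(reader)                               # skip the header row
rows = [record[1:] for record in reader]   # drop the leading index column

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (MOVIE_ID INT, YEAR FLOAT, NAME VARCHAR)")
conn.executemany(
    "INSERT INTO movies (MOVIE_ID, YEAR, NAME) VALUES (?, ?, ?)", rows
)
conn.commit()

names = [r[0] for r in conn.execute("SELECT NAME FROM movies ORDER BY MOVIE_ID")]
print(names)
```

With the real file, `io.StringIO(...)` would be replaced by `open('movie_titles.csv', newline='')`.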

##### b. Create ratings table:

In [12]:
#open the file and extract the data:
with open('df_avgRating_with_usersCount.csv', 'r') as file:
    tab = file.readlines()
print(type(tab))
print(tab[0:1000]) #tab is a list of strings

<class 'list'>
['movie_id,rating,user_id\n', '1,3.749542961608775,547\n', '2,3.5586206896551724,145\n', '3,3.6411530815109345,2012\n', '4,2.73943661971831,142\n', '5,3.9192982456140353,1140\n', '6,3.084396467124632,1019\n', '7,2.129032258064516,93\n', '8,3.1898054996646548,14910\n', '9,2.6210526315789475,95\n', '10,3.180722891566265,249\n', '11,3.0303030303030303,198\n', '12,3.4175824175824174,546\n', '13,4.552,125\n', '14,3.0254237288135593,118\n', '15,3.286206896551724,290\n', '16,3.0985550203779177,2699\n', '17,2.90320765334834,7108\n', '18,3.7843685879500093,10722\n', '19,3.324675324675325,539\n', '20,3.146551724137931,116\n', '21,3.463302752293578,218\n', '22,2.2463054187192117,203\n', '23,3.55609756097561,615\n', '24,2.9939984996249063,1333\n', '25,3.9701739850869924,1207\n', '26,2.7937212079849854,5861\n', '27,3.5274725274725274,273\n', '28,3.823254175890521,39752\n', '29,3.598470363288719,523\n', '30,3.7618420274800908,118413\n', '31,3.0542986425339365,221\n', '32,4.07173678532

In [13]:
#create sql command:
values = [] #create an empty list

for row in tab[1:]: #the first string is the header, so skip it
    alist = row.strip('\n').split(',')
    values.append(alist)
        
print(values)

[['1', '3.749542961608775', '547'], ['2', '3.5586206896551724', '145'], ['3', '3.6411530815109345', '2012'], ['4', '2.73943661971831', '142'], ['5', '3.9192982456140353', '1140'], ['6', '3.084396467124632', '1019'], ['7', '2.129032258064516', '93'], ['8', '3.1898054996646548', '14910'], ['9', '2.6210526315789475', '95'], ['10', '3.180722891566265', '249'], ['11', '3.0303030303030303', '198'], ['12', '3.4175824175824174', '546'], ['13', '4.552', '125'], ['14', '3.0254237288135593', '118'], ['15', '3.286206896551724', '290'], ['16', '3.0985550203779177', '2699'], ['17', '2.90320765334834', '7108'], ['18', '3.7843685879500093', '10722'], ['19', '3.324675324675325', '539'], ['20', '3.146551724137931', '116'], ['21', '3.463302752293578', '218'], ['22', '2.2463054187192117', '203'], ['23', '3.55609756097561', '615'], ['24', '2.9939984996249063', '1333'], ['25', '3.9701739850869924', '1207'], ['26', '2.7937212079849854', '5861'], ['27', '3.5274725274725274', '273'], ['28', '3.823254175890521'

In [14]:
start = '''
INSERT INTO ratings (MOVIE_ID, RATING, USER_ID)
VALUES (?,?,?);
''' 
#execute:
cursor.executemany(start, values)


<sqlite3.Cursor at 0x1f2249bb540>

#### 5. Save the tables:

In [15]:
conn.commit()
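
What "saving" means here can be sketched with two connections to the same scratch database: rows inserted through one connection only become visible to the other after `commit()`. The file name and the rating values are placeholders, and the sketch assumes the default transaction handling of the `sqlite3` module.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")   # hypothetical scratch database

    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE ratings (MOVIE_ID INT, RATING FLOAT, USER_ID INT)")
    writer.commit()
    writer.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?)",
        [(1, 3.75, 547), (2, 3.56, 145)],   # made-up rows
    )

    # A second connection does not see the uncommitted inserts yet.
    reader = sqlite3.connect(path)
    before = reader.execute("SELECT COUNT(*) FROM ratings").fetchall()[0][0]

    writer.commit()   # write the pending inserts to the database file
    after = reader.execute("SELECT COUNT(*) FROM ratings").fetchall()[0][0]

    reader.close()
    writer.close()

print(before, after)
```

This is also why the `%%sql` cells, which use their own connection, only see the inserted rows once `conn.commit()` has run.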

#### 6. Show a specific number of rows:

In [16]:
%%sql 
SELECT *
FROM ratings
LIMIT 10

 * sqlite:///movies.db
Done.


MOVIE_ID,RATING,USER_ID
1,3.749542961608775,547
2,3.5586206896551724,145
3,3.6411530815109354,2012
4,2.73943661971831,142
5,3.9192982456140353,1140
6,3.084396467124632,1019
7,2.129032258064516,93
8,3.1898054996646548,14910
9,2.6210526315789475,95
10,3.180722891566265,249


In [17]:
%%sql 
SELECT *
FROM movies
LIMIT 10

 * sqlite:///movies.db
Done.


MOVIE_ID,YEAR,NAME


> The `LIMIT` clause caps the number of rows a query returns
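
The same clause can be used from Python. The sketch below runs against made-up rows in an in-memory table and also shows `OFFSET`, which skips rows before the limit applies. Adding `ORDER BY` makes the selection deterministic, since without it the database is free to return any rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (MOVIE_ID INT, RATING FLOAT, USER_ID INT)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(i, 3.0, 100 + i) for i in range(1, 51)],   # 50 made-up rows
)

# First three rows by MOVIE_ID
first_three = conn.execute(
    "SELECT MOVIE_ID FROM ratings ORDER BY MOVIE_ID LIMIT 3"
).fetchall()

# Next three rows: skip 3, then take 3 (both passed as parameters)
next_three = conn.execute(
    "SELECT MOVIE_ID FROM ratings ORDER BY MOVIE_ID LIMIT ? OFFSET ?", (3, 3)
).fetchall()

print(first_three, next_three)
```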

#### 7. Create new tables from two different tables:

##### a. JOIN:

- `JOIN` combines rows from different tables within a database; wrapping the query in `CREATE TABLE ... AS` stores the result as A NEW TABLE
- There are 4 types of JOIN in SQL:
    - `INNER JOIN`: returns only the rows that are MATCHED in BOTH tables
    - `FULL OUTER JOIN`: returns all rows from both tables, whether they are matched or not; columns with no match are filled with NULL values
    - `LEFT JOIN`: returns all rows from the left table. If there is no matching row in the right table, the right-hand columns are filled with NULL values
    - `RIGHT JOIN`: returns all rows from the right table. If there is no matching row in the left table, the left-hand columns are filled with NULL values
- SQLite supports `RIGHT JOIN` and `FULL OUTER JOIN` only from version 3.39.0 onward
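
The difference between `INNER JOIN` and `LEFT JOIN` can be sketched on two tiny in-memory tables. The rows are made up for illustration: movie 3 has no rating, and rating 99 has no movie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies  (MOVIE_ID INT, NAME VARCHAR);
CREATE TABLE ratings (MOVIE_ID INT, RATING FLOAT);
INSERT INTO movies  VALUES (1, 'Dinosaur Planet'), (2, 'Character'), (3, 'Sick');
INSERT INTO ratings VALUES (1, 3.7), (2, 3.6), (99, 1.0);
""")

# INNER JOIN keeps only movies that have a matching rating.
inner = conn.execute("""
    SELECT m.MOVIE_ID, r.RATING
    FROM movies AS m
    INNER JOIN ratings AS r ON r.MOVIE_ID = m.MOVIE_ID
    ORDER BY m.MOVIE_ID
""").fetchall()

# LEFT JOIN keeps every movie; a missing rating comes back as NULL (None).
left = conn.execute("""
    SELECT m.MOVIE_ID, r.RATING
    FROM movies AS m
    LEFT JOIN ratings AS r ON r.MOVIE_ID = m.MOVIE_ID
    ORDER BY m.MOVIE_ID
""").fetchall()

print(inner)
print(left)
```

The rating for movie 99 appears in neither result, because `ratings` is the right-hand table in both queries.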


##### b. Call out columns from table:
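- Using aliases in SQL:
    - An alias can be used in the `SELECT` list even though it is only defined later in the statement, in the `FROM` / `JOIN` clause (for example `movies AS m`).
- Calling out a column from a table:
    - `table_name.column_name` (or `alias.column_name`)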
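
A minimal sketch of both points, run from Python against an in-memory database with one made-up row per table. The helper `build_join_table` is a hypothetical name. It drops the target table first, following the `DROP TABLE IF EXISTS` pattern used earlier, so it can be called repeatedly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies  (MOVIE_ID INT, YEAR FLOAT, NAME VARCHAR);
CREATE TABLE ratings (MOVIE_ID INT, RATING FLOAT, USER_ID INT);
INSERT INTO movies  VALUES (1, 2003.0, 'Dinosaur Planet');
INSERT INTO ratings VALUES (1, 3.75, 547);
""")

def build_join_table(connection):
    """Rebuild join_table from scratch so the call can be repeated."""
    connection.execute("DROP TABLE IF EXISTS join_table")
    connection.execute("""
        CREATE TABLE join_table AS
        SELECT m.MOVIE_ID, m.NAME, m.YEAR, r.RATING   -- aliases m and r used here...
        FROM movies AS m                              -- ...m is defined here
        INNER JOIN ratings AS r                       -- ...r is defined here
        ON r.MOVIE_ID = m.MOVIE_ID
    """)

build_join_table(conn)
build_join_table(conn)   # a second call does not raise "table already exists"
rows = conn.execute("SELECT * FROM join_table").fetchall()
print(rows)
```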

In [18]:
%%sql
CREATE TABLE join_table AS
SELECT m.MOVIE_ID, m.NAME, m.YEAR, r.RATING
FROM movies as m
INNER JOIN ratings as r
ON r.MOVIE_ID = m.MOVIE_ID

 * sqlite:///movies.db
(sqlite3.OperationalError) table join_table already exists
[SQL: CREATE TABLE join_table AS
SELECT m.MOVIE_ID, m.NAME, m.YEAR, r.RATING
FROM movies as m
INNER JOIN ratings as r
ON r.MOVIE_ID = m.MOVIE_ID]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

> The error appears because `join_table` already exists in movies.db from an earlier run of this cell. Running `DROP TABLE IF EXISTS join_table` first, as was done for `movies` and `ratings`, lets the cell run again. Since the creation failed here, the next query reads the table left over from that earlier run; each movie appearing twice in it suggests that one of the source tables held duplicate rows at the time, for example from inserting the same data twice.


In [19]:
%%sql
SELECT *
FROM join_table

 * sqlite:///movies.db
Done.


MOVIE_ID,NAME,YEAR,RATING
1,Dinosaur Planet,2003.0,3.749542961608775
1,Dinosaur Planet,2003.0,3.749542961608775
2,Isle of Man TT 2004 Review,2004.0,3.5586206896551724
2,Isle of Man TT 2004 Review,2004.0,3.5586206896551724
3,Character,1997.0,3.6411530815109354
3,Character,1997.0,3.6411530815109354
4,Paula Abdul's Get Up & Dance,1994.0,2.73943661971831
4,Paula Abdul's Get Up & Dance,1994.0,2.73943661971831
5,The Rise and Fall of ECW,2004.0,3.9192982456140353
5,The Rise and Fall of ECW,2004.0,3.9192982456140353


In [20]:
conn.commit()
conn.close()