---
<center><h1>Lesson 6 - SQL with Python. Relational databases</h1></center>
---
<center><h1>Part 3. Work with SQL databases using Python libraries</h1></center>

---

# Table of Contents

- [Connection to SQLite database using `sqlite3` library](#Connection-to-SQLite-database-using-sqlite3-library)
- [Connection to MySQL database using `MySQLdb` library](#Connection-to-MySQL-database-using-MySQLdb-library)
- [Interaction of pandas and SQL](#Interaction-of-pandas-and-SQL)
     - [*Exercise 3.1*](#Exercise-3.1)
     - [*Exercise 3.2*](#Exercise-3.2)
     - [*Exercise 3.3*](#Exercise-3.3)
     - [*Exercise 3.4*](#Exercise-3.4)
     - [*Exercise 3.5*](#Exercise-3.5)
---

# Connection to SQLite database using `sqlite3` library

[[back to top]](#Table-of-Contents)

The Python Standard Library includes a module called `sqlite3` for working with SQLite databases.

We use the function `sqlite3.connect` to connect to a database. Pass the special name `":memory:"` to create a temporary database in RAM, or pass a file name to open that file (it is created if it does not exist).

> **NOTE:** When we are done working with the DB we need to close the connection.
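
A minimal sketch, not part of the original lesson, of an in-memory database that is closed automatically. A plain `with sqlite3.connect(...)` block only commits or rolls back the transaction and leaves the connection open, so `contextlib.closing` is used here to call `close()` when the block ends. The table `t` and the name `demo_con` are throwaway examples.

```python
import sqlite3
from contextlib import closing

# ":memory:" creates a temporary database that lives only in RAM;
# closing() guarantees demo_con.close() is called when the block ends.
with closing(sqlite3.connect(":memory:")) as demo_con:
    demo_con.execute("CREATE TABLE t (x INTEGER)")
    demo_con.execute("INSERT INTO t VALUES (1)")
    n_rows = demo_con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    print("rows in t: {}".format(n_rows))

# The connection is closed now, so any further query raises an error.
try:
    demo_con.execute("SELECT 1")
    is_closed = False
except sqlite3.ProgrammingError:
    is_closed = True
print("connection closed: {}".format(is_closed))
```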

In [1]:
import sqlite3

# Get connection to SQLite database
try:
    # This creates the database file if it doesn't exist, otherwise it opens the existing one
    con = sqlite3.connect('example.db')
    print "Database created and opened successfully!"
except sqlite3.Error as e:
    print "Something went wrong. \nError:", e

Database created and opened successfully!


In [2]:
# NOTE: If you get unexpected errors in this lesson, run this cell. It drops the `company` table
# (if it exists), so you can then run all the cells below again from scratch.
con.execute("DROP TABLE IF EXISTS company;")

<sqlite3.Cursor at 0x7f1c1426e490>

In [3]:
# Create a Cursor object for interaction with database
cursor = con.cursor()

# Create a new table
cursor.execute('''CREATE TABLE company
        (id INT PRIMARY KEY       NOT NULL,
         name           TEXT      NOT NULL,
         age            INT       NOT NULL,
         address        CHAR(50),
         salary         REAL);''')

print "Table created successfully!"

# Insert rows of data
cursor.execute("INSERT INTO company (id, name, age, address, salary) \
        VALUES (1, 'John', 28, 'California', 25000.00)")

cursor.execute("INSERT INTO company (id, name, age, address, salary) \
      VALUES (2, 'Allen', 25, 'California', 150000.00)")

# Save (commit) the changes
con.commit()
print "Records created successfully"

# Close the connection when we are done with it.
# Make sure all changes have been committed first, or they will be lost.
con.close()
print "Connection closed"

Table created successfully!
Records created successfully
Connection closed


Let's reopen the database we have just created and insert a few more rows.

In [4]:
con = sqlite3.connect('example.db')
print "Database opened successfully"

cursor = con.cursor()

cursor.execute("INSERT INTO company (id, name, age, address, salary) \
      VALUES (3, 'Richard', 33, 'Texas', 60000.00)")

cursor.execute("INSERT INTO company (id, name, age, address, salary) \
      VALUES (4, 'Mark', 35, 'New York', 75000.00)")

con.commit()
print "Records created successfully"
con.close()

Database opened successfully
Records created successfully
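
The inserts above paste the values directly into the SQL string, which breaks as soon as a value contains a quote (for example `O'Brien`) and opens the door to SQL injection. A minimal sketch of the DB-API alternative, run on a throwaway in-memory database rather than `example.db`: the values are passed separately and the driver quotes them, while `executemany` runs the statement once per tuple. `sqlite3` uses `?` placeholders; `MySQLdb` uses `%s` instead.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("""CREATE TABLE company
                    (id INT PRIMARY KEY NOT NULL, name TEXT NOT NULL,
                     age INT NOT NULL, address CHAR(50), salary REAL)""")

rows = [
    (3, 'Richard', 33, 'Texas', 60000.00),
    (4, 'Mark', 35, 'New York', 75000.00),
]
# One "?" per column; the driver substitutes and quotes each value itself.
demo_con.executemany(
    "INSERT INTO company (id, name, age, address, salary) VALUES (?, ?, ?, ?, ?)",
    rows)
demo_con.commit()

names = [r[0] for r in demo_con.execute("SELECT name FROM company ORDER BY id")]
print(names)
demo_con.close()
```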


Let's look at the database content.

In [5]:
con = sqlite3.connect('example.db')

# Select all columns of all rows
cursor = con.execute("SELECT * FROM company")
for row in cursor:
    print "ID = ", row[0]
    print "NAME = ", row[1]
    print "AGE = ", row[2]
    print "ADDRESS = ", row[3]
    print "SALARY = ", row[4], "\n"
    
con.close()

ID =  1
NAME =  John
AGE =  28
ADDRESS =  California
SALARY =  25000.0 

ID =  2
NAME =  Allen
AGE =  25
ADDRESS =  California
SALARY =  150000.0 

ID =  3
NAME =  Richard
AGE =  33
ADDRESS =  Texas
SALARY =  60000.0 

ID =  4
NAME =  Mark
AGE =  35
ADDRESS =  New York
SALARY =  75000.0 
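
Indexing rows by position (`row[0]`, `row[1]`, ...) breaks as soon as the column order changes. A small sketch on a throwaway in-memory table, not part of the original lesson: setting the connection's `row_factory` to `sqlite3.Row` lets you access columns by name as well as by position.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
# Rows returned by this connection can now be indexed by column name
demo_con.row_factory = sqlite3.Row
demo_con.execute("CREATE TABLE company (id INT, name TEXT, salary REAL)")
demo_con.execute("INSERT INTO company VALUES (1, 'John', 25000.0)")

for row in demo_con.execute("SELECT * FROM company"):
    # row["name"] and row[1] refer to the same value
    print("ID = {}, NAME = {}, SALARY = {}".format(row["id"], row["name"], row["salary"]))
    column_names = row.keys()
    first_name = row["name"]

demo_con.close()
```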



Let's update one row and delete another.

In [6]:
con = sqlite3.connect('example.db')

# Let's change John's salary 
con.execute("UPDATE company SET salary = 37000.00 WHERE id=1;")
con.commit()
print "Total number of rows updated :", con.total_changes

# Let's remove row for Allen 
con.execute("DELETE FROM company WHERE id=2;")
con.commit()

cursor = con.execute("SELECT * FROM company")
print
for row in cursor:
    print "ID = ", row[0]
    print "NAME = ", row[1]
    print "AGE = ", row[2]
    print "ADDRESS = ", row[3]
    print "SALARY = ", row[4], "\n"

con.close()

Total number of rows updated : 1

ID =  1
NAME =  John
AGE =  28
ADDRESS =  California
SALARY =  37000.0 

ID =  3
NAME =  Richard
AGE =  33
ADDRESS =  Texas
SALARY =  60000.0 

ID =  4
NAME =  Mark
AGE =  35
ADDRESS =  New York
SALARY =  75000.0 
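
`con.total_changes` above counts every row inserted, updated or deleted since the connection was opened, so it reports 1 only because the connection had just been opened. For the number of rows affected by a single statement, the DB-API offers `cursor.rowcount`. A small sketch with hypothetical rows in an in-memory database:

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE company (id INT, name TEXT, age INT, salary REAL)")
demo_con.executemany("INSERT INTO company VALUES (?, ?, ?, ?)",
                     [(1, 'John', 28, 25000.0), (2, 'Allen', 25, 150000.0),
                      (3, 'Richard', 33, 60000.0)])

# rowcount: rows changed by this one statement
demo_cur = demo_con.execute("UPDATE company SET salary = salary * 1.1 WHERE age < 30")
updated = demo_cur.rowcount
demo_cur = demo_con.execute("DELETE FROM company WHERE id = ?", (3,))
deleted = demo_cur.rowcount
demo_con.commit()

# total_changes: all rows inserted, updated or deleted since the connection opened
total = demo_con.total_changes
print(updated, deleted, total)
demo_con.close()
```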



# Connection to MySQL database using `MySQLdb` library

[[back to top]](#Table-of-Contents)

Let's create a new MySQL database `example_db` from the command line (terminal), as described in the [MySQL](#MySQL) and [`CREATE DATABASE` and `SHOW DATABASES`](#CREATE-DATABASE-and-SHOW-DATABASES) sections.

All other operations on the created database, including creating tables and filling them with data, we will perform with the Python library [`MySQLdb`](https://pypi.python.org/pypi/MySQL-python/1.2.5). It follows the Python Database API Specification v2.0 (PEP 249), so the code looks much like the `sqlite3` code above and is easier to port to other databases. This is a common way of working with MySQL from Python.

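The simplest way to install it is with pip:

    pip install MySQL-python

`MySQL-python` supports Python 2 only; on Python 3, its fork `mysqlclient` (`pip install mysqlclient`) provides the same `MySQLdb` module.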

In [7]:
import MySQLdb

# Get connection to MySQL server (host, user, password)
try:
    con = MySQLdb.connect("localhost", "root", "vagrant")
    print "Successfully connected to MySQL server!"
except MySQLdb.Error as e:
    print "Something went wrong. \nError:", e

Successfully connected to MySQL server!


In [8]:
# Create a Cursor object for interaction with database
cursor = con.cursor()

# Create a new database (does nothing if it was already created from the command line)
cursor.execute("CREATE DATABASE IF NOT EXISTS example_db;")
print "Database was created successfully"

# Select a database for future work
cursor.execute("USE example_db;")
print "Database was selected successfully"

# Create a new table
cursor.execute('''CREATE TABLE IF NOT EXISTS company
                  (name VARCHAR(70) NOT NULL,
                   age INT NOT NULL,
                   address VARCHAR(50),
                   salary REAL);''')
print "Table created successfully!"

# Prepare SQL query to INSERT a record into the database.
sql = "INSERT INTO company (name, age, address, salary) \
        VALUES ('John', 28, 'California', 25000.00);"

try:
    # Execute the SQL command
    cursor.execute(sql)
    # Commit your changes in the database
    con.commit()
    print "Records created successfully"
except MySQLdb.Error as e:
    # Rollback in case there is any error
    print "Rollback. Error:", e
    con.rollback()

# Disconnect from server
con.close()
print "Connection closed"

Database was created successfully
Database was selected successfully
Table created successfully!
Records created successfully
Connection closed




* `commit()` makes all changes of the current transaction permanent; after it, they can no longer be rolled back;

* `rollback()` discards all changes made since the last `commit()`, which is useful when one of several related statements fails;

* `close()` closes the database connection.
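
A minimal sketch of these methods, using an in-memory `sqlite3` database as a stand-in because it needs no server; `MySQLdb` connections provide the same DB-API `commit()` and `rollback()` methods. Keep in mind that in MySQL a rollback only works with transactional storage engines such as InnoDB, and statements like `CREATE TABLE` commit implicitly. The `with` form at the end is a `sqlite3` convenience: it commits if the block succeeds and rolls back if it raises, but it does not close the connection.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE company (name TEXT, salary REAL)")
demo_con.commit()

demo_con.execute("INSERT INTO company VALUES ('John', 25000.0)")
demo_con.commit()      # John is now permanent

demo_con.execute("INSERT INTO company VALUES ('Allen', 150000.0)")
demo_con.rollback()    # the uncommitted Allen row is discarded

# Commit on success, roll back on exception (the connection stays open)
try:
    with demo_con:
        demo_con.execute("INSERT INTO company VALUES ('Mark', 75000.0)")
        raise ValueError("simulated failure")
except ValueError:
    pass

names = [name for (name,) in demo_con.execute("SELECT name FROM company")]
print(names)
demo_con.close()
```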

Let's reopen the database we have just created and insert a few more rows.

In [9]:
# Now we can select the database right when connecting
con = MySQLdb.connect("localhost", "root", "vagrant", "example_db")
print "Successfully connected to MySQL server!"

cursor = con.cursor()

sql_1 = "INSERT INTO company (name, age, address, salary) \
         VALUES ('Allen', 25, 'California', 150000.00);"

sql_2 = "INSERT INTO company (name, age, address, salary) \
         VALUES ('Richard', 33, 'Texas', 60000.00);"

sql_3 = "INSERT INTO company (name, age, address, salary) \
         VALUES ('Mark', 35, 'New York', 75000.00);"

try:
    cursor.execute(sql_1)
    cursor.execute(sql_2)
    cursor.execute(sql_3)
    con.commit()
    print "Records created successfully"
except MySQLdb.Error as e:
    print "Rollback. Error:", e
    con.rollback()

Successfully connected to MySQL server!
Records created successfully


Let's look at the database content.

In [10]:
# Prepare SQL query to extract data from database for printing.
# Suppose we want to see all workers with a salary greater than 50 000
sql = "SELECT * FROM company WHERE salary > 50000"

try:
    cursor.execute(sql)
    results = cursor.fetchall()
    for name, age, address, salary in results:
        # Now print fetched result
        print "name: {}, age: {}, address: {}, salary: {}".format(name, age, address, salary)
except MySQLdb.Error as e:
    print "Error: unable to fetch data:", e

name: John, age: 28, address: California, salary: 128994.5088
name: Allen, age: 25, address: California, salary: 773967.0528
name: John, age: 28, address: California, salary: 107495.424
name: John, age: 28, address: California, salary: 107495.424
name: Allen, age: 25, address: California, salary: 644972.544
name: John, age: 28, address: California, salary: 89579.52
name: Allen, age: 25, address: California, salary: 537477.12
name: John, age: 28, address: California, salary: 74649.6
name: Allen, age: 25, address: California, salary: 447897.6
name: John, age: 28, address: California, salary: 62208.0
name: Allen, age: 25, address: California, salary: 373248.0
name: John, age: 28, address: California, salary: 51840.0
name: Allen, age: 25, address: California, salary: 311040.0
name: Allen, age: 25, address: California, salary: 259200.0
name: Allen, age: 25, address: California, salary: 216000.0
name: Allen, age: 25, address: California, salary: 180000.0
name: Allen, age: 25, address: Califo
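
The cell above reads the whole result with `fetchall()`. A sketch of the alternative `fetchmany(size)`, which returns at most `size` rows per call and an empty list once the result is exhausted; it also passes the salary threshold as a query parameter. It runs against a throwaway in-memory SQLite table holding the lesson's sample rows, since `sqlite3` ships with Python; both methods are part of the DB-API, so `MySQLdb` cursors provide them too (with `%s` instead of `?` as the placeholder). Without `ORDER BY` the row order of a `SELECT` is not guaranteed, so the query sorts explicitly.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()
demo_cur.execute("CREATE TABLE company (name TEXT, age INT, address TEXT, salary REAL)")
demo_cur.executemany("INSERT INTO company VALUES (?, ?, ?, ?)", [
    ('John', 28, 'California', 25000.0),
    ('Allen', 25, 'California', 150000.0),
    ('Richard', 33, 'Texas', 60000.0),
    ('Mark', 35, 'New York', 75000.0),
])

# The threshold is a query parameter; ORDER BY makes the row order well defined
demo_cur.execute("SELECT name, salary FROM company WHERE salary > ? ORDER BY salary DESC",
                 (50000,))
fetched = []
while True:
    batch = demo_cur.fetchmany(2)   # at most 2 rows per call
    if not batch:                   # an empty list means the result is exhausted
        break
    for name, salary in batch:
        print("name: {}, salary: {}".format(name, salary))
    fetched.extend(batch)

demo_con.close()
```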

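> **NOTE:** The `company` table has no primary key, so every rerun of the cells above inserts the same rows again, and every rerun of the `UPDATE` cell below raises the salaries of the workers younger than 30 by 20% once more. That is why the outputs in this section contain repeated rows with growing salaries.

Now let's update and delete the rows that match a condition.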

In [11]:
# Prepare SQL query to UPDATE rows:
# raise the salary of everyone younger than 30 by 20%
sql_update = "UPDATE company SET salary = salary * 1.2 WHERE age < 30;"

# Prepare SQL query to DELETE rows:
# remove workers who do not live in California
sql_delete = "DELETE FROM company WHERE address != 'California';"

# SQL query for printing
sql = "SELECT * FROM company;"
try:
    cursor.execute(sql_update)
    cursor.execute(sql_delete)
    con.commit()
    print "Records updated successfully"

    try:
        cursor.execute(sql)
        for row in cursor.fetchall():
            print "name: {}, age: {}, address: {}, salary: {}".format(*row)
    except MySQLdb.Error as e:
        print "Error: unable to fetch data:", e
except MySQLdb.Error as e:
    # Rollback in case there is any error
    print "Rollback. Error:", e
    con.rollback()

# Disconnect from server
con.close()

Records updated successfully
name: John, age: 28, address: California, salary: 154793.41056
name: Allen, age: 25, address: California, salary: 928760.46336
name: John, age: 28, address: California, salary: 128994.5088
name: John, age: 28, address: California, salary: 128994.5088
name: Allen, age: 25, address: California, salary: 773967.0528
name: John, age: 28, address: California, salary: 107495.424
name: Allen, age: 25, address: California, salary: 644972.544
name: John, age: 28, address: California, salary: 89579.52
name: Allen, age: 25, address: California, salary: 537477.12
name: John, age: 28, address: California, salary: 74649.6
name: Allen, age: 25, address: California, salary: 447897.6
name: John, age: 28, address: California, salary: 62208.0
name: Allen, age: 25, address: California, salary: 373248.0
name: John, age: 28, address: California, salary: 51840.0
name: Allen, age: 25, address: California, salary: 311040.0
name: John, age: 28, address: California, salary: 43200.0
na

# Interaction of pandas and SQL

[[back to top]](#Table-of-Contents)

Using pandas, we can import the results of an SQL query into a DataFrame.

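The DataFrame method `to_sql()` writes a DataFrame to a table in an existing SQL database. It is not limited to SQLite: the cells below pass a MySQLdb connection together with `flavor='mysql'`. That argument is deprecated, and current pandas versions no longer have it; apart from `sqlite3` connections, they expect an SQLAlchemy connectable for other databases.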

The pandas function `read_sql()` loads the result of an SQL query into a DataFrame.
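
A minimal round trip, assuming a hypothetical three-row DataFrame and an in-memory SQLite database (pandas accepts a plain `sqlite3` connection, so no server or SQLAlchemy is needed): `to_sql()` writes the table and `read_sql()` reads a query result back.

```python
import sqlite3
import pandas as pd

demo_df = pd.DataFrame({"name": ["John", "Allen", "Richard"],
                        "age": [28, 25, 33],
                        "salary": [25000.0, 150000.0, 60000.0]})

demo_con = sqlite3.connect(":memory:")
# Write the DataFrame to a new table; index=False skips the index column
demo_df.to_sql("company", demo_con, index=False)

# Read the result of an arbitrary query back into a DataFrame
young = pd.read_sql("SELECT name, salary FROM company WHERE age < 30 ORDER BY name", demo_con)
print(young)
demo_con.close()
```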

Let's read the three data files of the MovieLens dataset, which we have used in several previous lessons.

In [12]:
import pandas as pd

genres = [
    'Action', 'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
    'Film_Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci_Fi', 'Thriller', 'War', 'Western'
]

movies = pd.read_csv('data/u.data', sep='\t', engine='python', names=['user_id', 'movie_id', 'rating', 'timestamp'])
users = pd.read_csv('data/u.user', sep='|', engine='python', names=['user_id', 'age', 'gender', 'occupation', 'zip_code'])
items = pd.read_csv('data/u.item', sep='|', engine='python', 
                        names=['movie_id', 'movie_title', 'release_date', 'video_release_date', 'IMDb_URL', 'unknown'] + genres)

# Drop the empty `video_release_date` column and the `unknown` genre flag
items = items.drop(["video_release_date", "unknown"], axis=1)

# Remove the placeholder movie whose title is 'unknown' and whose release date is missing
items = items[(items['movie_title'] != 'unknown') | (items['release_date'].notnull())]

In [13]:
print movies.shape[0], "rows"
movies.head()

100000 rows


|   | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |


In [14]:
print users.shape[0], "rows"
users.head()

943 rows


|   | user_id | age | gender | occupation | zip_code |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |


In [15]:
print items.shape[0], "rows"
items.head()

1681 rows


|   | movie_id | movie_title | release_date | IMDb_URL | Action | Adventure | Animation | Childrens | Comedy | Crime | ... | Fantasy | Film_Noir | Horror | Musical | Mystery | Romance | Sci_Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 1 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


Now we will save the `movies` DataFrame as a MySQL table in the `example_db` database created in the previous steps.

In [16]:
con = MySQLdb.connect("localhost", "root", "vagrant", "example_db")

# Create a new table `movies`
movies.to_sql(con=con,               # Current connection
              name='movies',         # Table name
              if_exists='replace',   # What to do if the table already exists:
                                     # `fail`: raise a ValueError (the default)
                                     # `replace`: drop the table, recreate it, and insert data
                                     # `append`: insert data into the existing table
                                     # (in all cases the table is created if it does not exist)
              flavor='mysql',        # The flavor of SQL to use, default 'sqlite'
                                     # (deprecated; newer pandas needs an SQLAlchemy engine)
              index=False            # Don't write DataFrame index as a column to the SQL table
             )
# Read the first 5 records of the just created table
cursor = con.cursor()
sql = "SELECT * FROM movies LIMIT 5"

try:
    cursor.execute(sql)
    for row in cursor.fetchall():
        print row
except MySQLdb.Error as e:
    print "Error: unable to fetch data:", e
# Disconnect from server
con.close()



(196L, 242L, 3L, 881250949L)
(186L, 302L, 3L, 891717742L)
(22L, 377L, 1L, 878887116L)
(244L, 51L, 2L, 880606923L)
(166L, 346L, 1L, 886397596L)
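
To check what the `if_exists` options do, here is a small sketch with a hypothetical two-row DataFrame and an in-memory SQLite database (pandas accepts a plain `sqlite3` connection). Note that `fail`, the default, raises a `ValueError` when the table already exists rather than silently doing nothing.

```python
import sqlite3
import pandas as pd

demo_df = pd.DataFrame({"user_id": [196, 186], "rating": [3, 3]})
demo_con = sqlite3.connect(":memory:")

def count_rows():
    return pd.read_sql("SELECT COUNT(*) AS n FROM ratings", demo_con)["n"][0]

demo_df.to_sql("ratings", demo_con, index=False)                      # creates the table
demo_df.to_sql("ratings", demo_con, index=False, if_exists="append")  # adds 2 more rows
n_append = count_rows()

demo_df.to_sql("ratings", demo_con, index=False, if_exists="replace") # drop and recreate
n_replace = count_rows()

try:
    demo_df.to_sql("ratings", demo_con, index=False, if_exists="fail")
    failed = False
except ValueError:
    failed = True

print(n_append, n_replace, failed)
demo_con.close()
```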


> ### Exercise 3.1:

> Create two new tables `users` and `items` from the respective DataFrames in the way shown above. If you have problems writing the `IMDb_URL` column to the database, you may simply drop it.

In [17]:
# type your code here
con = MySQLdb.connect("localhost", "root", "vagrant", "example_db")
cursor = con.cursor()

# Create a new table `users` and print its first 5 records
users.to_sql(con=con, name='users', if_exists='replace', flavor='mysql', index=False)
cursor.execute("SELECT * FROM users LIMIT 5")
for row in cursor.fetchall():
    print row

# Drop the `IMDb_URL` column, as the exercise allows
items = items.drop('IMDb_URL', axis=1)
print(items.head(5))

# Create a new table `items` and print its first 5 records
items.to_sql(con=con, name='items', if_exists='replace', flavor='mysql', index=False)
cursor.execute("SELECT * FROM items LIMIT 5")
for row in cursor.fetchall():
    print row

# The connection stays open: the cells below still use `con` and `cursor`

(1L, 24L, 'M', 'technician', '85711')
(2L, 53L, 'F', 'other', '94043')
(3L, 23L, 'M', 'writer', '32067')
(4L, 24L, 'M', 'technician', '43537')
(5L, 33L, 'F', 'other', '15213')
   movie_id        movie_title release_date  Action  Adventure  Animation  \
0         1   Toy Story (1995)  01-Jan-1995       0          0          1   
1         2   GoldenEye (1995)  01-Jan-1995       1          1          0   
2         3  Four Rooms (1995)  01-Jan-1995       0          0          0   
3         4  Get Shorty (1995)  01-Jan-1995       1          0          0   
4         5     Copycat (1995)  01-Jan-1995       0          0          0   

   Childrens  Comedy  Crime  Documentary   ...     Fantasy  Film_Noir  Horror  \
0          1       1      0            0   ...           0          0       0   
1          0       0      0            0   ...           0          0       0   
2          0       0      0            0   ...           0          0       0   
3          0       1      0          



In [18]:
from test_helper import Test

result = cursor.execute('SELECT movie_id FROM items') + cursor.execute('SELECT user_id FROM users')
Test.assertEqualsHashed(result, 'ef899d5268da6f195ca97b123df5f0e66082be4d', 'Incorrect sql query', "Exercise 3.1 is successful")

1 test passed. Exercise 3.1 is successful


To load the result of an SQL query into a pandas DataFrame, use the `read_sql` function; its two required arguments are the SQL query and the connection.

In [19]:
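# NOTE: `SELECT *` combined with `GROUP BY user_id` returns the values of an arbitrary row for
# each user. MySQL only accepts it when the ONLY_FULL_GROUP_BY SQL mode is disabled
# (it is enabled by default since MySQL 5.7.5).
df = pd.read_sql("SELECT * FROM movies GROUP BY user_id ORDER BY user_id", con)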

print df.shape[0], "rows"
df.head()

943 rows


|   | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 61 | 4 | 878542420 |
| 1 | 2 | 292 | 4 | 888550774 |
| 2 | 3 | 335 | 1 | 889237269 |
| 3 | 4 | 264 | 3 | 892004275 |
| 4 | 5 | 2 | 3 | 875636053 |
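
A portable alternative to `SELECT *` with `GROUP BY` is to select only the grouped column plus aggregates, which every SQL database accepts. A sketch with hypothetical ratings in an in-memory SQLite database; the `index_col` argument of `read_sql` turns `user_id` into the DataFrame index.

```python
import sqlite3
import pandas as pd

demo_ratings = pd.DataFrame({"user_id":  [1, 1, 2, 2, 2],
                             "movie_id": [61, 242, 292, 302, 377],
                             "rating":   [4, 5, 4, 2, 3]})
demo_con = sqlite3.connect(":memory:")
demo_ratings.to_sql("movies", demo_con, index=False)

# Every selected column is either grouped or aggregated,
# so the result is well defined in any SQL database.
per_user = pd.read_sql(
    "SELECT user_id, COUNT(*) AS n_ratings, AVG(rating) AS avg_rating "
    "FROM movies GROUP BY user_id ORDER BY user_id",
    demo_con, index_col="user_id")
print(per_user)
demo_con.close()
```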


> ### Exercise 3.2:

> Using SQL `JOIN` clauses, join the tables `movies` and `users` on their common field `user_id`. Then join the result with the `items` table on the common field `movie_id`. The result (let's call it `full`) must be the same table we worked with in **Lesson 2 - Basic intro into pandas**, with the same number of records.

In [20]:
# type your code here
# SQL version of the same inner joins (the output below was produced by the pandas merges)
sql = """
SELECT *
FROM movies
JOIN users USING (user_id)
JOIN items USING (movie_id);
"""
df1 = pd.read_sql("SELECT * FROM movies", con)
df2 = pd.read_sql("SELECT * FROM users", con)
df3 = pd.read_sql("SELECT * FROM items", con)
df41 = pd.merge(df1, df2, on='user_id')
df4 = pd.merge(df41, df3, on='movie_id')
print(len(df1), len(df2), len(df3), len(df4))
print(df4.head(5))
full = df4
del df4
#full = pd.read_sql(sql, con)
full.head()

(100000, 943, 1681, 99991)
   user_id  movie_id  rating  timestamp  age gender  occupation zip_code  \
0      196       242       3  881250949   49      M      writer    55105   
1      305       242       5  886307828   23      M  programmer    94086   
2        6       242       4  883268170   42      M   executive    98101   
3      234       242       4  891033261   60      M     retired    94702   
4       63       242       3  875747190   31      M   marketing    75240   

    movie_title release_date   ...     Fantasy  Film_Noir  Horror  Musical  \
0  Kolya (1996)  24-Jan-1997   ...           0          0       0        0   
1  Kolya (1996)  24-Jan-1997   ...           0          0       0        0   
2  Kolya (1996)  24-Jan-1997   ...           0          0       0        0   
3  Kolya (1996)  24-Jan-1997   ...           0          0       0        0   
4  Kolya (1996)  24-Jan-1997   ...           0          0       0        0   

   Mystery  Romance  Sci_Fi  Thriller  War  Wes

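|   | user_id | movie_id | rating | timestamp | age | gender | occupation | zip_code | movie_title | release_date | ... | Fantasy | Film_Noir | Horror | Musical | Mystery | Romance | Sci_Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 | 49 | M | writer | 55105 | Kolya (1996) | 24-Jan-1997 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 305 | 242 | 5 | 886307828 | 23 | M | programmer | 94086 | Kolya (1996) | 24-Jan-1997 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 6 | 242 | 4 | 883268170 | 42 | M | executive | 98101 | Kolya (1996) | 24-Jan-1997 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 234 | 242 | 4 | 891033261 | 60 | M | retired | 94702 | Kolya (1996) | 24-Jan-1997 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 63 | 242 | 3 | 875747190 | 31 | M | marketing | 75240 | Kolya (1996) | 24-Jan-1997 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |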


In [21]:
Test.assertEqualsHashed(len(full), '1ec795c83bc6203b809313940936203cca766e10', 'Incorrect sql query', "Exercise 3.2 is successful")

1 test passed. Exercise 3.2 is successful
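
For comparison with Exercise 3.2, a small sketch on hypothetical toy tables (the `demo_` names are not part of the lesson) in an in-memory SQLite database: chained SQL inner `JOIN`s return the same rows as chained `pd.merge` calls, whose default `how='inner'` likewise drops rows without a match.

```python
import sqlite3
import pandas as pd

demo_ratings = pd.DataFrame({"user_id": [1, 2, 2], "movie_id": [10, 10, 20], "rating": [4, 3, 5]})
demo_users = pd.DataFrame({"user_id": [1, 2], "age": [24, 53]})
demo_items = pd.DataFrame({"movie_id": [10, 20], "movie_title": ["A", "B"]})

demo_con = sqlite3.connect(":memory:")
for table_name, frame in [("ratings", demo_ratings), ("users", demo_users), ("items", demo_items)]:
    frame.to_sql(table_name, demo_con, index=False)

# Inner joins in SQL; explicit aliases keep the column names predictable
sql_joined = pd.read_sql(
    "SELECT r.user_id AS user_id, r.movie_id AS movie_id, r.rating AS rating, "
    "u.age AS age, i.movie_title AS movie_title "
    "FROM ratings AS r "
    "JOIN users AS u ON u.user_id = r.user_id "
    "JOIN items AS i ON i.movie_id = r.movie_id "
    "ORDER BY r.user_id, r.movie_id", demo_con)
demo_con.close()

# The same inner joins done with pandas
pd_joined = (demo_ratings.merge(demo_users, on="user_id")
                         .merge(demo_items, on="movie_id")
                         .sort_values(["user_id", "movie_id"])
                         .reset_index(drop=True))
print(sql_joined.equals(pd_joined[sql_joined.columns]))
```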


> ### Exercise 3.3:

> Using SQL commands, group all movies by genre (take the genres from the `genres` Python list above) and display how many movies belong to each group. Is the total number of movies in all groups equal to the total number of movies in the `full` table? Your resulting table should have the following form:

>||Action_films|Adventure_films|...|Western_films|in_all|
>|----|----|----|----|----|----|
>|0|251|135|...|27|2891|

>i.e. each column corresponds to a genre name with the `"_films"` suffix, and the last column, called `"in_all"`, contains the total number of movies over all genres. All genre columns should be in alphabetical order.

> Write the result to the `result` variable.

In [22]:
# type your code here
# One column per genre (in alphabetical order): the SUM of its 0/1 genre flags.
# CAST(... AS SIGNED) turns MySQL's DECIMAL sums back into integers.
genre_columns = ', '.join('CAST(SUM({0}) AS SIGNED) AS {0}_films'.format(g) for g in sorted(genres))
# `in_all`: total over all genres. A movie with several genres is counted once per genre,
# so this total can be larger than the number of movies.
all_genres = ' + '.join(genres)
sql = "SELECT {}, CAST(SUM({}) AS SIGNED) AS in_all FROM items;".format(genre_columns, all_genres)
result = pd.read_sql(sql, con)
result

In [23]:
Test.assertEqualsHashed(result.values, 'af649b8df0964186b7f15cd370275c6c8aaf9267', 'Incorrect sql query', "Exercise 3.3 is successful")

1 test failed. Incorrect sql query


> ### Exercise 3.4:

> Calculate how many movies belong to more than 2 genres. Write the result to the `amount` variable.

In [24]:
# type your code here
# The number of genres of a movie is the sum of its 0/1 genre flags
sql = "SELECT COUNT(*) FROM items WHERE ({}) > 2;".format(' + '.join(genres))
cursor.execute(sql)
amount = int(cursor.fetchone()[0])
print amount

In [None]:
Test.assertEqualsHashed(amount, 'ba613d1fc0d9300175611e31cca7cf9f525056cb', 
                        'Incorrect value of "amount"', "Exercise 3.4 is successful")

> ### Exercise 3.5:

> Select all users who watched more than 50 movies released after 1980, and display each user's ID and how many films he rated with 1 or 5 (call this column `count`). Sort the results by user age in ascending order; users of the same age should be ordered by ID. Your resulting table should contain at least two columns, `user_id` and `count`. Write the result to the `result` variable.

In [51]:
# type your code here
# Work on a copy so that `full` itself is not modified
full0 = full.copy()
full0['release_date'] = pd.to_datetime(full0['release_date'])
# Keep the ratings 1 or 5 of movies released after 1980.
# The parentheses matter: `&` binds more tightly than `|`.
full0 = full0.loc[((full0['rating'] == 1) | (full0['rating'] == 5)) &
                  (full0['release_date'] > '1979-12-31')]
print full0.head(5)

# Count these ratings per user and keep the users with more than 50 of them
counts = full0.groupby('user_id').size().reset_index(name='count')
counts = counts[counts['count'] > 50]

# Attach each user's age, then sort by age and, for equal ages, by ID
ages = full0[['user_id', 'age']].drop_duplicates()
result = pd.merge(counts, ages, on='user_id', how='left')
result = result.sort_values(by=['age', 'user_id']).reset_index(drop=True)
print result.head(5)

    user_id  movie_id  rating  timestamp  age gender     occupation zip_code  \
1       305       242       5  886307828   23      M     programmer    94086   
5       181       242       1  878961814   26      M      executive    21218   
7       249       242       5  879571438   25      M        student    84103   
10      145       242       5  875269755   31      M  entertainment    V3N4P   
13       18       242       5  880129305   35      F          other    37212   

     movie_title release_date   ...     Fantasy  Film_Noir  Horror  Musical  \
1   Kolya (1996)   1997-01-24   ...           0          0       0        0   
5   Kolya (1996)   1997-01-24   ...           0          0       0        0   
7   Kolya (1996)   1997-01-24   ...           0          0       0        0   
10  Kolya (1996)   1997-01-24   ...           0          0       0        0   
13  Kolya (1996)   1997-01-24   ...           0          0       0        0   

    Mystery  Romance  Sci_Fi  Thriller  War 

In [52]:
Test.assertEqualsHashed(result['user_id'] + result['count'], 'c0b1c79e2d06e741b3fb7844fda11aee3c3f9250', 
                        'Incorrect sql query',  "Exercise 3.5 is successful")

1 test failed. Incorrect sql query


<center><h3>Presented by <a target="_blank" rel="noopener noreferrer nofollow" href="http://datascience-school.com">datascience-school.com</a></h3></center>