Introduction
Welcome to the Connecting Python to SQL lab!

In this lab, you will be working with the Sakila database on movie rentals. Specifically, you will be practicing how to do basic SQL queries using Python. By connecting Python to SQL, you can leverage the power of both languages to efficiently manipulate and analyze large datasets. Throughout this lab, you will practice how to use Python to retrieve and manipulate data stored in the Sakila database using SQL queries. Let's get started!

Challenge
In this lab, the objective is to identify the customers who were active in both May and June, and how did their activity differ between months. To achieve this, follow these steps:

In [44]:
!pip install pymysql
!pip install sqlalchemy

import pandas as pd
import numpy as np

import pymysql                        
from sqlalchemy import create_engine  

from getpass import getpass 



In [45]:
password = getpass()

 ········


In [46]:
# command to connect
connection_string = 'mysql+pymysql://root:'+password+'@localhost/sakila'
engine = create_engine(connection_string)

In [47]:
type(engine) #connection_string

sqlalchemy.engine.base.Engine

In [50]:
# to reach out SQL database e.g.,
df_actor = pd.read_sql_query('SELECT * FROM sakila.actor', engine)
df_actor.head() 

Unnamed: 0,actor_id,first_name,last_name,last_update
0,1,PENELOPE,GUINESS,2006-02-15 04:34:33
1,2,NICK,WAHLBERG,2006-02-15 04:34:33
2,3,ED,CHASE,2006-02-15 04:34:33
3,4,JENNIFER,DAVIS,2006-02-15 04:34:33
4,5,JOHNNY,LOLLOBRIGIDA,2006-02-15 04:34:33


In [51]:
df_rental = pd.read_sql_query('SELECT * FROM sakila.rental', engine)
df_rental.head() 

Unnamed: 0,rental_id,rental_date,inventory_id,customer_id,return_date,staff_id,last_update
0,1,2005-05-24 22:53:30,367,130,2005-05-26 22:04:30,1,2006-02-15 21:30:53
1,2,2005-05-24 22:54:33,1525,459,2005-05-28 19:40:33,1,2006-02-15 21:30:53
2,3,2005-05-24 23:03:39,1711,408,2005-06-01 22:12:39,1,2006-02-15 21:30:53
3,4,2005-05-24 23:04:41,2452,333,2005-06-03 01:43:41,2,2006-02-15 21:30:53
4,5,2005-05-24 23:05:21,2079,222,2005-06-02 04:33:21,1,2006-02-15 21:30:53


In [61]:

def rentals_month (engine, month, year):
    query = f"""SELECT rental_id, rental_date, return_date, inventory_id customer_id FROM rental WHERE MONTH(rental_date) = {month} AND YEAR(rental_date) = {year}"""
    df_local = pd.read_sql_query(query, engine)
    return df_local
df_rentals_month = rentals_month (engine, 5, 2005)
df_rentals_month.head()

Unnamed: 0,rental_id,rental_date,return_date,customer_id
0,1,2005-05-24 22:53:30,2005-05-26 22:04:30,367
1,2,2005-05-24 22:54:33,2005-05-28 19:40:33,1525
2,3,2005-05-24 23:03:39,2005-06-01 22:12:39,1711
3,4,2005-05-24 23:04:41,2005-06-03 01:43:41,2452
4,5,2005-05-24 23:05:21,2005-06-02 04:33:21,2079


In [66]:

def rental_count_month(rentals_month, month, year):
    # Convert 'rental_date' column to datetime format
    rentals_month['rental_date'] = pd.to_datetime(rentals_month['rental_date'])
    
    # Extract month and year from 'rental_date'
    rentals_month['rental_month'] = rentals_month['rental_date'].dt.month
    rentals_month['rental_year'] = rentals_month['rental_date'].dt.year
    
    # Filter rentals for the selected month and year
    rentals_filtered = rentals_month[(rentals_month['rental_month'] == month) & (rentals_month['rental_year'] == year)]
    
    # Group by customer_id and count rentals
    rental_counts = rentals_filtered.groupby('customer_id').size().reset_index(name='rentals_{:02d}_{:04d}'.format(month, year))
    
    return rental_counts

In [67]:
# Add a new cell where the user can input the month and year
month = int(input("Enter the month (as a number): "))
year = int(input("Enter the year (as a four-digit number): "))

# Call the rental_count_month function with the provided inputs
result = rental_count_month(df_rentals_month, month, year)
print(result)

Enter the month (as a number):  5
Enter the year (as a four-digit number):  2005


      customer_id  rentals_05_2005
0               2                1
1               6                1
2              14                1
3              17                1
4              20                1
...           ...              ...
1151         4568                1
1152         4573                1
1153         4577                1
1154         4579                1
1155         4581                1

[1156 rows x 2 columns]
