# Installing MySQL on Your Computer
### 19.2.2 Installing MySQL Workbench

* [Download the MySQL installer](https://dev.mysql.com/downloads/installer/)
* Click on the link highlighted below.

<img src="images/mysql1.jpg" width="1000"/>

* Once downloaded, install the application. Follow the instructions and continue with installation.

<img src="images/mysql2.jpg" width="1000"/>

* Select the standalone MySQL Server option.

<img src="images/mysql3.jpg" width="1000"/>

- Keep the default values for “Type and Networking” section

<img src="images/mysql4.jpg" width="1000"/>

<img src="images/mysql5.jpg" width="1000"/>

- Finish the installation by clicking Next.
- Once the installation is complete, open MySQL Workbench and click on the local instance connection. The window will look something like this:

<img src="images/mysql6.jpg" width="1000"/>

- Provide the password that you set up during installation:

<img src="images/mysql7.jpg" width="1000"/>

>**Note:** ***Remember the user name shown above, as we will need it to connect Python to MySQL.***

- Once the connection is established, you will see a window like this:

<img src="images/mysql8.jpg" width="1000"/>

# Case Study #1 - Danny's Diner

![Case_study_1.png](attachment:Case_study_1.png)

---

## Introduction 
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.

Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from their few months of operation but have no idea how to use their data to help them run the business.

---

## Problem Statement

Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.

He plans on using these insights to help him decide whether he should expand the existing customer loyalty program - additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.

Danny has provided you with a sample of his overall customer data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!

Danny has shared with you 3 key datasets for this case study:

- `sales`
- `menu`
- `members`

You can inspect the entity relationship diagram and example data below.

---

## Entity Relationship Diagram

![Case_study_2.png](attachment:Case_study_2.png)

---

## Example Datasets

All datasets exist within the `dannys_diner` database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.

### Table 1: sales

The `sales` table captures all `customer_id` level purchases with a corresponding `order_date` and `product_id` information for when and what menu items were ordered.

| customer_id | order_date | product_id |
|-------------|------------|------------|
| A           | 2021-01-01 | 1          |
| A           | 2021-01-01 | 2          |
| A           | 2021-01-07 | 2          |
| A           | 2021-01-10 | 3          |
| A           | 2021-01-11 | 3          |
| A           | 2021-01-11 | 3          |
| B           | 2021-01-01 | 2          |
| B           | 2021-01-02 | 2          |
| B           | 2021-01-04 | 1          |
| B           | 2021-01-11 | 1          |
| B           | 2021-01-16 | 3          |
| B           | 2021-02-01 | 3          |
| C           | 2021-01-01 | 3          |
| C           | 2021-01-01 | 3          |
| C           | 2021-01-07 | 3          |

### Table 2: menu

The `menu` table maps the `product_id` to the actual `product_name` and `price` of each menu item.

| product_id | product_name | price |
|------------|--------------|-------|
| 1          | sushi        | 10    |
| 2          | curry        | 15    |
| 3          | ramen        | 12    |

### Table 3: members

The final `members` table captures the `join_date` when a `customer_id` joined the beta version of the Danny’s Diner loyalty program.

| customer_id | join_date  |
|-------------|------------|
| A           | 2021-01-07 |
| B           | 2021-01-09 |

---

## Interactive SQL Session

The dataset for this case study is available on [DB Fiddle](https://www.db-fiddle.com/f/2rM8RAnq7h5LLDTzZiRWcd/138). I will be using MySQL to solve this case study. To work through it yourself, open the link above, choose the MySQL dialect (version 8 or later, if your local MySQL is version 8 or later), and copy and paste the database schema into MySQL.

Here is a snapshot of the schema.

![Case_study_3.png](attachment:Case_study_3.png)

## Creating the Database using MySQL Python Connector

In [1]:
DB_NAME = "Dannys_Diner_Langchain"

TABLES = {}
TABLES['sales'] = (
    " CREATE TABLE sales ("
    " `customer_id` VARCHAR(1),"
    " `order_date` DATE,"
    " `product_id` INTEGER"
    " ) ENGINE=InnoDB"
)

TABLES['members'] = (
    " CREATE TABLE members ("
    " `customer_id` VARCHAR(1),"
    " `join_date` DATE"
    " ) ENGINE=InnoDB"
)

TABLES['menu'] = (
    " CREATE TABLE menu ("
    " `product_id` INTEGER,"
    " `product_name` VARCHAR(5),"
    " `price` INTEGER"
    " ) ENGINE=InnoDB"
)

## Installing Necessary Libraries

In [2]:
# !pip3 install pymysql mysql-connector-python python-dotenv langchain==0.0.350 langchain-experimental==0.0.47 openai==0.28.1 tenacity -U --quiet

## Importing Libraries for creating the database

In [3]:
import os
import openai
from langchain import OpenAI
from langchain import SQLDatabase
import mysql.connector as connection
from mysql.connector import errorcode
from dotenv import load_dotenv, find_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.agents import create_sql_agent
from langchain.agents.agent_types import AgentType
from langchain.chains import create_sql_query_chain
from langchain_experimental.sql import SQLDatabaseChain
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain_experimental.sql import SQLDatabaseSequentialChain
from tenacity import retry, stop_after_attempt, wait_random_exponential

## Loading the Secrets from the .env file
print(load_dotenv())

# openai_api_key = os.getenv("OPENAI_API_KEY")
openai.api_key = os.getenv("OPENAI_API_KEY")

True


### Creating a connection to the MySQL server

In [4]:
mydb = connection.connect(
        host='localhost',
        user='root',
        passwd="Environment1234567890",
        use_pure=True,
    )
print(f"Connection Check is {mydb.is_connected()}")
### Initializing the cursor
cursor = mydb.cursor()

Connection Check is True
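
The connection cell above hard-codes the MySQL password. A minimal sketch of one alternative is to read the settings from environment variables, which could live in the same `.env` file as the OpenAI key. The variable names `MYSQL_HOST`, `MYSQL_USER` and `MYSQL_PASSWORD` and the helper `mysql_settings_from_env` are assumptions made for this sketch; they do not appear in the original notebook.

```python
import os


def mysql_settings_from_env(environ=os.environ):
    """Collect MySQL connection settings from environment variables.

    MYSQL_HOST, MYSQL_USER and MYSQL_PASSWORD are hypothetical names chosen
    for this sketch; the original notebook does not define them.
    """
    password = environ.get("MYSQL_PASSWORD")
    if password is None:
        raise RuntimeError("Set MYSQL_PASSWORD before connecting.")
    return {
        "host": environ.get("MYSQL_HOST", "localhost"),
        "user": environ.get("MYSQL_USER", "root"),
        "passwd": password,
        "use_pure": True,
    }


# An explicit mapping is passed here so the sketch runs without real secrets.
settings = mysql_settings_from_env({"MYSQL_PASSWORD": "example-password"})
print(settings["host"], settings["user"])
```

In the notebook, the resulting dictionary could be unpacked into the connector call, for example `connection.connect(**mysql_settings_from_env())`, after `load_dotenv()` has run.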


### Writing a function to create a Database

In [5]:
def create_database(cursor):
    try:
        query = F"CREATE DATABASE {DB_NAME}"
        cursor.execute(query)
    except connection.Error as err:
        print(f"Failed to create the Database {err}")
        exit(1)

### Creating the database if it does not exist

In [6]:
try:
    # Select the database; create it only if MySQL reports that it is missing.
    cursor.execute(f"USE {DB_NAME};")
except connection.Error as err:
    print(f"Database {DB_NAME} does not exist.")
    if err.errno == errorcode.ER_BAD_DB_ERROR:
        create_database(cursor)
        print(f"Database {DB_NAME} created successfully!!!")
        # Set the default database on the open connection object.
        mydb.database = DB_NAME
    else:
        print(err)
        exit(1)
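
Because the database name is interpolated into the SQL text, it can help to validate it first, and MySQL's `CREATE DATABASE IF NOT EXISTS` makes the statement safe to run repeatedly. The following is a minimal sketch under those assumptions; the helper name `create_database_statement` is hypothetical and not part of the original notebook.

```python
import re

DB_NAME = "Dannys_Diner_Langchain"


def create_database_statement(name):
    """Build an idempotent CREATE DATABASE statement for a validated name."""
    # Identifiers cannot be bound as query parameters, so restrict the name
    # to letters, digits and underscores before interpolating it.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"Unsafe database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"


statement = create_database_statement(DB_NAME)
print(statement)
```

With an open cursor, `cursor.execute(statement)` followed by a `USE` statement would then replace the try/except dance for the common case where the only expected failure is a database that already exists.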

### Creating tables by iterating over the items of the TABLES dictionary

In [7]:
for table_name in TABLES:
    table_description = TABLES[table_name]
    try:
        print(f"Creating table {table_name}")
        cursor.execute(table_description)
    except connection.Error as err:
        if err.errno == errorcode.ER_TABLE_EXISTS_ERROR:
            print("Already Exists!!!")
        else:
            print(err.msg)    
        
cursor.close()
mydb.close()

Creating table sales
Creating table members
Creating table menu


### Data that will be inserted

In [8]:
menu_records = (
    "INSERT INTO menu (`product_id`, `product_name`, `price`) VALUES ('1', 'sushi', '10');",
    "INSERT INTO menu (`product_id`, `product_name`, `price`) VALUES ('2', 'curry', '15');",
    "INSERT INTO menu (`product_id`, `product_name`, `price`) VALUES ('3', 'ramen', '12');",   
)

members_records = (
    "INSERT INTO members (`customer_id`, `join_date`) VALUES ('A', '2021-01-07');",
    "INSERT INTO members (`customer_id`, `join_date`) VALUES ('B', '2021-01-09');",       
)

sales_records = (
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('A', '2021-01-01', '1');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('A', '2021-01-01', '2');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('A', '2021-01-07', '2');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('A', '2021-01-10', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('A', '2021-01-11', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('A', '2021-01-11', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('B', '2021-01-01', '2');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('B', '2021-01-02', '2');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('B', '2021-01-04', '1');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('B', '2021-01-11', '1');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('B', '2021-01-16', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('B', '2021-02-01', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('C', '2021-01-01', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('C', '2021-01-01', '3');",
    "INSERT INTO sales (`customer_id`, `order_date`, `product_id`) VALUES ('C', '2021-01-07', '3');",
)

### Inserting the data into the appropriate tables

In [9]:
try:
    mydb = connection.connect(
        host="localhost",
        user="root",
        passwd="Environment1234567890",
        use_pure=True,
        database=DB_NAME
    )
    cursor = mydb.cursor()
    for query in (menu_records + sales_records + members_records):
        cursor.execute(query)
    print("Values are Inserted!!!!")
    mydb.commit()
    mydb.close()
except Exception as e:
    print(str(e))
    mydb.close()

Values are Inserted!!!!
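
The insert cell runs one literal `INSERT` string per row. A common alternative is a single parameterized statement passed to `cursor.executemany`, which both `mysql.connector` and `sqlite3` cursors provide. The sketch below is an illustration only: the helper `insert_rows` is hypothetical, and it is demonstrated against an in-memory SQLite database (placeholder `?`) so that it runs without a MySQL server; with `mysql.connector` the placeholder would be `%s`.

```python
import sqlite3

# Rows of the `menu` table as tuples instead of literal SQL strings.
MENU_ROWS = [(1, "sushi", 10), (2, "curry", 15), (3, "ramen", 12)]


def insert_rows(cursor, table, columns, rows, placeholder="%s"):
    """Insert many rows with one parameterized statement.

    `placeholder` is "%s" for mysql.connector and "?" for sqlite3.
    Table and column names must come from trusted code, not user input.
    """
    column_list = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    cursor.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({marks})", rows
    )


# Demonstration against in-memory SQLite (an assumption made for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE menu (product_id INTEGER, product_name TEXT, price INTEGER)"
)
insert_rows(
    cur, "menu", ["product_id", "product_name", "price"], MENU_ROWS, placeholder="?"
)
conn.commit()
menu_count = cur.execute("SELECT COUNT(*) FROM menu").fetchone()[0]
print(menu_count)
```

Passing values as parameters also lets the driver handle quoting, so the integers no longer need to be written as quoted strings.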


## Creating an environment variable file to load OpenAI's API Key

In [10]:
%%writefile .env
# Replace the placeholder below with your own OpenAI API key; never publish a real key.
OPENAI_API_KEY = "<your-openai-api-key>"

Overwriting .env


### Case 1: Text-to-SQL query

In [11]:
db_user = "root"
db_password = "Environment1234567890"
db_host = "localhost:3306"

db = SQLDatabase.from_uri(f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{DB_NAME}")
llm = OpenAI(temperature=0, verbose=True, max_tokens=4097)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

In [14]:
introduction = """
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens 
up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.

Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very 
basic data from their few months of operation but have no idea how to use their data to help them run the business.

"""

problem_statement = """
Danny wants to use the data to answer a few simple questions about his customers, especially about their 
visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this 
deeper connection with his customers will help him deliver a better and more personalised experience for his 
loyal customers.

He plans on using these insights to help him decide whether he should expand the existing customer 
loyalty program - additionally he needs help to generate some basic datasets so his team can easily inspect 
the data without needing to use SQL.

Danny has provided you with a sample of his overall customer data due to privacy issues - but he hopes that these 
examples are enough for you to write fully functioning SQL queries to help him answer his questions!

Danny has shared with you 3 key datasets for this case study:

- `sales`: The `sales` table captures all `customer_id` level purchases with a corresponding `order_date` 
    and `product_id` information for when and what menu items were ordered.
- `menu`: The `menu` table maps the `product_id` to the actual `product_name` and `price` of each menu item.
- `members`: The final `members` table captures the `join_date` when a `customer_id` joined the beta 
version of the Danny’s Diner loyalty program.

"""

In [15]:
db_schema = """

CREATE TABLE sales (
`customer_id` VARCHAR(1),
`order_date` DATE,
`product_id` INTEGER
);

CREATE TABLE members (
`customer_id` VARCHAR(1),
`join_date` DATE
);

CREATE TABLE menu (
`product_id` INTEGER,
`product_name` VARCHAR(5),
`price` INTEGER
);

The `customer_id` column of sales table is joined with `customer_id` column of members table and 
the `product_id` column in sales table is joined with `product_id` column in menu table.

"""

In [18]:
question_1 = "What is the total amount each customer spent at the restaurant?"
question_2 = "How many days has each customer visited the restaurant?"
question_3 = "What was the first item from the menu purchased by each customer?"
question_4 = "What is the most purchased item on the menu and how many times was it purchased by all customers?"
question_5 = "Which item was the most popular for each customer?"
question_6 = "Which item was purchased first by the customer after they became a member?"
question_7 = "Which item was purchased just before the customer became a member?"
question_8 = "What is the total items and amount spent for each member before they became a member?"
question_9 = """If each $1 spent equates to 10 points and sushi has a 2x points 
multiplier - how many points would each customer have?"""
question_10 = """In the first week after a customer joins the program (including their join date) they earn 2x 
points on all items, not just sushi - how many points do customer A and B have at the end of January?"""
bonus_question = """
Join All The Things

The following questions are related to creating basic data tables that Danny and his team can use to quickly derive 
insights without needing to join the underlying tables using SQL.

Recreate the following table output using the available data:

| customer_id | order_date | product_name | price | member |
|-------------|------------|--------------|-------|--------|
| A           | 2021-01-01 | curry        | 15    | N      |
| A           | 2021-01-01 | sushi        | 10    | N      |
| A           | 2021-01-07 | curry        | 15    | Y      |
| A           | 2021-01-10 | ramen        | 12    | Y      |
| A           | 2021-01-11 | ramen        | 12    | Y      |
| A           | 2021-01-11 | ramen        | 12    | Y      |
| B           | 2021-01-01 | curry        | 15    | N      |
| B           | 2021-01-02 | curry        | 15    | N      |
| B           | 2021-01-04 | sushi        | 10    | N      |
| B           | 2021-01-11 | sushi        | 10    | Y      |
| B           | 2021-01-16 | ramen        | 12    | Y      |
| B           | 2021-02-01 | ramen        | 12    | Y      |
| C           | 2021-01-01 | ramen        | 12    | N      |
| C           | 2021-01-01 | ramen        | 12    | N      |
| C           | 2021-01-07 | ramen        | 12    | N      |
"""

In [19]:
# question_5 = """Which item was the most popular for each customer?

# To solve this question, use DENSE_RANK, OVER CLAUSE, PARTITION BY CLAUSE
# along with the AGGREGATED FUNCTIONS like SUM, COUNT, AVG etc. and CTE as
# well.

# Use these functions only to solve this question.
# """

In [20]:
prompt = f"""
You are a Senior Data Scientist with more than 30 years of experience writing SQL 
queries in the MySQL dialect. You have entered several SQL query writing competitions 
with a mistake rate of only about 2% in those competitions, 
and you have won 20 of them.

Your boss has given you the task of answering the question below, using the introduction 
and the problem statement, by writing SQL queries in the MySQL dialect. 
He has told you to make no syntactical errors while writing MySQL queries.

Introduction: {introduction}

Problem Statement: {problem_statement}

Database Schema: {db_schema}

The question that needs to be answered is: {question_1}

MAKE HEAVY USE OF CTE, ANALYTICAL FUNCTIONS LIKE RANK, DENSE_RANK, ROW_NUMBER,
NTILE, FIRST_VALUE, LAST_VALUE, OVER CLAUSE, PARTITION BY CLAUSE,
along with AGGREGATE FUNCTIONS LIKE SUM, AVG, COUNT etc. to answer the above question.

Also, make extensive use of SUB-QUERIES to answer the above question.

DO NOT MAKE ANY MISTAKES WHILE GENERATING THE SQL QUERIES, ESPECIALLY WHEN SELECTING
THE COLUMNS IN THE SELECT CLAUSE AND USING GROUP BY CLAUSE.

Double check the query that you generate and make sure you make no syntactical errors
while writing MySQL dialect queries.
"""

print(prompt)


You are a Senior Data Scientist with more than 30 years of experience writing SQL 
queries in the MySQL dialect. You have entered several SQL query writing competitions 
with a mistake rate of only about 2% in those competitions, 
and you have won 20 of them.

Your boss has given you the task of answering the question below, using the introduction 
and the problem statement, by writing SQL queries in the MySQL dialect. 
He has told you to make no syntactical errors while writing MySQL queries.

Introduction: 
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens 
up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.

Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very 
basic data from their few months of operation but have no idea how to use their data to help them run th
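
The prompt above is tied to `question_1`. To reuse the same context for the other questions, the shared parts can be kept in a template and the question swapped in. This is a minimal sketch, not the original notebook's code: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, the template is abbreviated, and placeholder strings stand in for the notebook's `introduction`, `problem_statement` and `db_schema` variables.

```python
PROMPT_TEMPLATE = """Introduction: {introduction}

Problem Statement: {problem_statement}

Database Schema: {db_schema}

The question that needs to be answered is: {question}
"""


def build_prompt(question, introduction, problem_statement, db_schema):
    """Fill the shared context once and swap in one question at a time."""
    return PROMPT_TEMPLATE.format(
        introduction=introduction,
        problem_statement=problem_statement,
        db_schema=db_schema,
        question=question,
    )


# Two of the case study questions; the context strings are placeholders.
questions = [
    "What is the total amount each customer spent at the restaurant?",
    "How many days has each customer visited the restaurant?",
]
prompts = [
    build_prompt(q, "<introduction>", "<problem statement>", "<schema>")
    for q in questions
]
print(prompts[1])
```

Each entry in `prompts` could then be passed to the query chain in turn, which makes it easier to see which questions the model handles and which it does not.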

In [21]:
llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo')
chain = create_sql_query_chain(llm, db)
response = chain.invoke({"question": prompt})
print(response)

WITH customer_spending AS (
  SELECT s.customer_id, SUM(m.price) AS total_spent
  FROM sales s
  JOIN menu m ON s.product_id = m.product_id
  GROUP BY s.customer_id
)
SELECT customer_id, total_spent
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 5;


In [22]:
fixed_response_prompt = f"""{response}

Double check the MySQL query above for common mistakes, including:
 - Remember to add all the columns in the GROUP BY clause that are selected in the SELECT statement.
 - Handling case sensitivity.
 - Use CTE extensively to solve this question.
 - Make sure to use Sub-Queries if required.
 - Ensuring the join columns are correct.
 - Casting values to the appropriate type.
 - If a function name such as rank is used as a column name, wrap it in backticks (`)
 to avoid a column name error.
 
Rewrite the query here if there are any mistakes. If it looks good as it is, just reproduce the original query."""

print(fixed_response_prompt)

WITH customer_spending AS (
  SELECT s.customer_id, SUM(m.price) AS total_spent
  FROM sales s
  JOIN menu m ON s.product_id = m.product_id
  GROUP BY s.customer_id
)
SELECT customer_id, total_spent
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 5;

Double check the MySQL query above for common mistakes, including:
 - Remember to add all the columns in the GROUP BY clause that are selected in the SELECT statement.
 - Handling case sensitivity.
 - Use CTE extensively to solve this question.
 - Make sure to use Sub-Queries if required.
 - Ensuring the join columns are correct.
 - Casting values to the appropriate type.
 - If a function name such as rank is used as a column name, wrap it in backticks (`)
 to avoid a column name error.
 
Rewrite the query here if there are any mistakes. If it looks good as it is, just reproduce the original query.


In [23]:
response = chain.invoke({"question": fixed_response_prompt})
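
The two-step pattern used here (generate a query, then ask the model to double check it) can be wrapped in a small helper. The sketch below is an illustration under assumptions: `generate_and_review` and `EchoChain` are hypothetical names, the review text is abbreviated, and the stand-in chain only mimics the `invoke({"question": ...})` call used above so that the example runs offline.

```python
REVIEW_INSTRUCTIONS = (
    "Double check the MySQL query above for common mistakes. "
    "Rewrite the query if there are any mistakes. "
    "If it looks good as it is, just reproduce the original query."
)


def generate_and_review(chain, question):
    """Ask the chain for a query, then ask it to double check its own output."""
    draft = chain.invoke({"question": question})
    reviewed = chain.invoke({"question": f"{draft}\n\n{REVIEW_INSTRUCTIONS}"})
    return draft, reviewed


class EchoChain:
    """Hypothetical stand-in for the LangChain chain, used only to run offline."""

    def __init__(self):
        self.calls = []

    def invoke(self, inputs):
        # Record what was asked and return a fixed query.
        self.calls.append(inputs["question"])
        return "SELECT 1;"


fake_chain = EchoChain()
draft, reviewed = generate_and_review(fake_chain, "How many rows are in sales?")
print(len(fake_chain.calls))
```

In the notebook, passing the real `chain` object instead of `EchoChain` would perform both model calls and return the draft and the reviewed query for comparison.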

In [24]:
db.run(response)

"[('A', Decimal('76')), ('B', Decimal('74')), ('C', Decimal('36'))]"

In [None]:
# The chain could not answer questions 3, 8 and 10, or the bonus question.
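
For the bonus question, a hand-written query can serve as a reference to compare model output against. The sketch below is not from the original notebook: it loads the example rows into an in-memory SQLite database (an assumption made so the code runs without a MySQL server) and marks an order as a member order when it falls on or after the customer's `join_date`. The query uses only joins and `CASE`, so it is expected to carry over to MySQL, although that has not been verified here.

```python
import sqlite3

SALES = [
    ("A", "2021-01-01", 1), ("A", "2021-01-01", 2), ("A", "2021-01-07", 2),
    ("A", "2021-01-10", 3), ("A", "2021-01-11", 3), ("A", "2021-01-11", 3),
    ("B", "2021-01-01", 2), ("B", "2021-01-02", 2), ("B", "2021-01-04", 1),
    ("B", "2021-01-11", 1), ("B", "2021-01-16", 3), ("B", "2021-02-01", 3),
    ("C", "2021-01-01", 3), ("C", "2021-01-01", 3), ("C", "2021-01-07", 3),
]
MENU = [(1, "sushi", 10), (2, "curry", 15), (3, "ramen", 12)]
MEMBERS = [("A", "2021-01-07"), ("B", "2021-01-09")]

# LEFT JOIN keeps customer C, who never joined the loyalty program.
JOIN_ALL_THE_THINGS = """
SELECT s.customer_id, s.order_date, m.product_name, m.price,
       CASE WHEN mem.join_date IS NOT NULL AND s.order_date >= mem.join_date
            THEN 'Y' ELSE 'N' END AS member
FROM sales s
JOIN menu m ON s.product_id = m.product_id
LEFT JOIN members mem ON s.customer_id = mem.customer_id
ORDER BY s.customer_id, s.order_date, m.product_name
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id TEXT, order_date TEXT, product_id INTEGER)")
conn.execute("CREATE TABLE menu (product_id INTEGER, product_name TEXT, price INTEGER)")
conn.execute("CREATE TABLE members (customer_id TEXT, join_date TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
conn.executemany("INSERT INTO menu VALUES (?, ?, ?)", MENU)
conn.executemany("INSERT INTO members VALUES (?, ?)", MEMBERS)

joined_rows = conn.execute(JOIN_ALL_THE_THINGS).fetchall()
for row in joined_rows:
    print(row)
```

The rows come back in the same order as the target table in `bonus_question`, so the two can be compared line by line.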

### Helpful Resources
- [Replacing a SQL analyst with 26 recursive GPT prompts](https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/?ref=blog.langchain.dev)
- [LLMs and SQL](https://blog.langchain.dev/llms-and-sql/)
- [SQL with Langchain](https://python.langchain.com/docs/use_cases/qa_structured/sql)