# Text-to-SQL using Llama4 (DB Setup)
---

## Introduction

In this notebook we will set up ONE instance of the following database:
 - RDS for MySQL

Within this instance, we will create two databases with one table each. The tables are related: each patient record references an insurance provider through `Insurance_id`. We will load sample data into both tables.

## Contents

1. [Getting Started](#Getting-Started)
    + [Install Dependencies](#Step-0-Install-Dependencies)
    + [Setup Database](#Step-1-Set-up-database)
    + [Build Database](#Step-2-Build-Database)
    + [Cleanup Resources](#Step-3-Cleanup-Resources)

---

## Pre-requisites

1. Use one of the following kernels: `conda_python3`, `conda_pytorch_p310`, or `conda_tensorflow2_p310`.
2. Install the required packages.

## Getting Started

### Step 0 Install Dependencies

Here, we will install all the required dependencies to run this notebook.

In [1]:
!pip install boto3==1.35.32 -qU --force-reinstall --quiet --no-warn-conflicts
!pip install mysql-connector-python==8.4.0 -qU --force-reinstall --quiet --no-warn-conflicts
!pip install langchain==0.2.5 -qU --force-reinstall --quiet --no-warn-conflicts
!pip install chromadb==0.5.0 -qU --force-reinstall --quiet --no-warn-conflicts
!pip install numpy==1.26.4 -qU --force-reinstall --quiet --no-warn-conflicts
!pip install psycopg2==2.9.9 -qU --force-reinstall --quiet --no-warn-conflicts

**Note:** *When installing libraries with pip, you may see errors or warnings during installation. These are generally not critical and can be ignored. After installation, restart the kernel so that the newly installed libraries are loaded and available to your code.*

<div class='alert alert-block alert-info'><b>NOTE:</b> Restart the kernel so that the packages installed above are picked up.</div>

In [None]:
# Restart the kernel
import os
os._exit(0)

#### Import the required modules to run the notebook

In [7]:
!pip install mysql-connector-python

Collecting mysql-connector-python
  Downloading mysql_connector_python-9.2.0-py2.py3-none-any.whl.metadata (6.0 kB)
Downloading mysql_connector_python-9.2.0-py2.py3-none-any.whl (398 kB)
Installing collected packages: mysql-connector-python
Successfully installed mysql-connector-python-9.2.0


In [8]:
import boto3
import json
import mysql.connector as MySQLdb
from typing import Dict, List, Any
import yaml
import psycopg2 as PGdb
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


### Step 1 Set up database

Here, we look up the resources that the CloudFormation template has already deployed, so that we can use them to build the application. These include:

+ Secret ARN with RDS for MySQL
+ Database Endpoints

In [16]:
stackname = "text2sql"  # Change this if your stack name differs from "text2sql".

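**Note:** This is a known issue that still needs to be fixed in the CloudFormation template. If the SageMaker notebook role lacks the `cloudformation:DescribeStackResources` permission, the `describe_stack_resources` call fails with an error like the following: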

ClientError: An error occurred (AccessDenied) when calling the DescribeStackResources operation: User: arn:aws:sts::333633606362:assumed-role/l4-txt2sql-SageMakerNotebookInstanceRole-bZXzdDYYTLvj/SageMaker is not authorized to perform: cloudformation:DescribeStackResources on resource: arn:aws:cloudformation:us-east-1:333633606362:stack/l4-text2sql/* because no identity-based policy allows the cloudformation:DescribeStackResources action

In [17]:
cfn = boto3.client('cloudformation')

response = cfn.describe_stack_resources(
    StackName=stackname
)
cfn_outputs = cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']

# Get rds secret arn and database endpoint from cloudformation outputs
for output in cfn_outputs:
    if 'SecretArnMySQL' in output['OutputKey']:
        mySQL_secret_id = output['OutputValue']

    if 'DatabaseEndpointMySQL' in output['OutputKey']:
        mySQL_db_host = output['OutputValue']
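
If the stack does not expose one of the expected outputs, the loop above leaves the corresponding variable undefined and the failure only surfaces in a later cell. The following is a minimal sketch of a lookup that fails early instead. The helper name `find_output` and the example values are hypothetical; only the `OutputKey`/`OutputValue` shape and the two key fragments come from the cell above.

```python
from typing import Dict, List


def find_output(outputs: List[Dict[str, str]], key_fragment: str) -> str:
    """Return the OutputValue of the first output whose key contains key_fragment."""
    for output in outputs:
        if key_fragment in output["OutputKey"]:
            return output["OutputValue"]
    available = [output["OutputKey"] for output in outputs]
    raise KeyError(f"No stack output matching '{key_fragment}'; available keys: {available}")


# Hypothetical values, shaped like describe_stacks(...)['Stacks'][0]['Outputs']
example_outputs = [
    {"OutputKey": "SecretArnMySQL", "OutputValue": "example-secret-arn"},
    {"OutputKey": "DatabaseEndpointMySQL", "OutputValue": "example-host"},
]

example_secret_id = find_output(example_outputs, "SecretArnMySQL")
example_db_host = find_output(example_outputs, "DatabaseEndpointMySQL")
```

In the notebook, the same helper could be called with `cfn_outputs` in place of `example_outputs`.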


In [18]:
secrets_client = boto3.client('secretsmanager')

# Get MySQL credentials from Secrets Manager
credentials = json.loads(secrets_client.get_secret_value(SecretId=mySQL_secret_id)['SecretString'])

# Get password and username from secrets
mySQL_db_password = credentials['password']
mySQL_db_user = credentials['username']


##### Establish the database connection (MySQL DB)

In [19]:
mySQL_db_conn = MySQLdb.connect(
    host=mySQL_db_host,
    user=mySQL_db_user,
    password=mySQL_db_password
)


##### Check connection (MySQL DB)

In [20]:
mySQL_db_cursor = mySQL_db_conn.cursor()

mySQL_db_cursor.execute("SHOW DATABASES")

for tmp_db_name in mySQL_db_cursor:
    print(tmp_db_name)
    

('information_schema',)
('mysql',)
('performance_schema',)
('sys',)


### Step 2 Build Database
The notebook now drops the test tables and databases if they already exist, then recreates them.
It then inserts test data for use in our prompting examples.

#### Load table schema settings

In [33]:
def load_settings(file_path):
    """
    Reads a YAML file and returns its contents as a Python object.

    Args:
        file_path (str): The path to the YAML file.

    Returns:
        obj: The contents of the YAML file as a Python object.
    """
    try:
        with open(file_path, 'r') as file:
            data = yaml.safe_load(file)
        return data
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' does not exist.")
        return None
    except yaml.YAMLError as exc:
        print(f"Error: Failed to parse the YAML file '{file_path}': {exc}")
        return None

In [43]:
# MySQL Table Setup

# Load table settings - database: healthcare_db | table_name: patients
settings_patients = load_settings('./schemas/patients_ms.yml')
if settings_patients is not None:
    table_patients = settings_patients['table_name']
    table_schema_patients = settings_patients['table_schema']
    db_patients = settings_patients['database']
    print(f"Successfully loaded patients settings: {db_patients}.{table_patients}")
else:
    print("Failed to load patients settings")

# Load table settings - database: insurance_db | table_name: providers
settings_providers = load_settings('./schemas/providers_ms.yml')
if settings_providers is not None:
    table_providers = settings_providers['table_name']
    table_schema_providers = settings_providers['table_schema']
    db_providers = settings_providers['database']
    print(f"Successfully loaded providers settings: {db_providers}.{table_providers}")
else:
    print("Failed to load providers settings")

# Load table settings for combined schema
settings_patients_providers = load_settings('./schemas/patients-providers_ms.yml')
if settings_patients_providers is not None:
    print("Successfully loaded combined settings")
else:
    print("Failed to load combined settings")

Successfully loaded patients settings: healthcare_db.patients
Successfully loaded providers settings: insurance_db.providers
Successfully loaded combined settings
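
The loading cell reads three keys from each per-table settings file: `database`, `table_name`, and `table_schema`. A file that parses as YAML but lacks one of them would raise a `KeyError` partway through the cell. The sketch below shows one way to check for the keys up front. The helper `missing_keys` is hypothetical, and the `table_schema` value is a placeholder rather than the real schema.

```python
REQUIRED_KEYS = ("database", "table_name", "table_schema")


def missing_keys(settings):
    """Return the required keys that are absent from a loaded settings object."""
    if not isinstance(settings, dict):
        # load_settings returns None on failure, so every key counts as missing
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in settings]


# Placeholder settings; the real table_schema holds a CREATE TABLE statement
example_settings = {
    "database": "healthcare_db",
    "table_name": "patients",
    "table_schema": "<CREATE TABLE statement>",
}

example_missing = missing_keys(example_settings)
```

An empty list means the settings are safe to unpack; anything else names exactly what the YAML file still needs.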


#### Cleanup Database

In [63]:
# Drop the dependent relation table and database first

# Delete patients' table
mySQL_db_cursor.execute(f"DROP TABLE IF EXISTS {db_patients}.{table_patients}")
# Delete database
mySQL_db_cursor.execute(f"DROP DATABASE IF EXISTS {db_patients}")

# Drop the parent table and database

# Delete providers' table
mySQL_db_cursor.execute(f"DROP TABLE IF EXISTS {db_providers}.{table_providers}")
# Delete database
mySQL_db_cursor.execute(f"DROP DATABASE IF EXISTS {db_providers}")
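
Database and table names cannot be passed as query parameters, so the statements above interpolate them directly into the SQL text. Because the names come from YAML files, it may be worth validating them first. This is a minimal sketch under the assumption that names consist only of letters, digits, and underscores, which is stricter than what MySQL allows. The helper `quote_identifier` is hypothetical.

```python
import re

# Conservative rule (an assumption): letters, digits, underscores, not starting with a digit
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate a database or table name and wrap it in backticks."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsafe SQL identifier: {name!r}")
    return f"`{name}`"


example_drop_sql = (
    f"DROP TABLE IF EXISTS {quote_identifier('healthcare_db')}.{quote_identifier('patients')}"
)
```

A name such as `patients; DROP DATABASE mysql` would be rejected with a `ValueError` before any statement reaches the server.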

#### Create database and tables 

##### MySQL DB

In [64]:
# Create database `insurance_db` - MySQL DB
mySQL_db_cursor.execute(f"CREATE DATABASE {db_providers}")

# Create database `healthcare_db` - MySQL DB
mySQL_db_cursor.execute(f"CREATE DATABASE {db_patients}")

# Create the `providers` table, which holds insurance provider information
mySQL_db_cursor.execute(table_schema_providers)

# Create the `patients` table, which holds patient information
mySQL_db_cursor.execute(table_schema_patients)


#### Read sample data

In [65]:
# Read sample data for the providers' table
with open('sample_data/providers.json', 'r') as f:
    data_providers = json.load(f)

In [66]:
# Read sample data for the patients' table
with open('sample_data/patients.json', 'r') as f:
    data_patients = json.load(f)
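
The ingestion cells that follow index every record by column name, so each JSON file is expected to contain a list of objects with one key per column. The sketch below checks that shape before any SQL is run. The helper `records_missing_fields` is hypothetical, and the second example record is deliberately incomplete for illustration.

```python
PROVIDER_FIELDS = ("Insurance_id", "Provider_name", "Coverage_type", "Contact_email")


def records_missing_fields(records, required_fields):
    """Return (index, missing_fields) pairs for records that lack required fields."""
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in required_fields if field not in record]
        if missing:
            problems.append((index, missing))
    return problems


example_records = [
    {
        "Insurance_id": 101,
        "Provider_name": "Blue Shield Health Insurance",
        "Coverage_type": "Medical",
        "Contact_email": "support@blueshield.com",
    },
    # Deliberately incomplete record, for illustration only
    {"Insurance_id": 102, "Provider_name": "Guardian Dental Care"},
]

example_problems = records_missing_fields(example_records, PROVIDER_FIELDS)
```

In the notebook, the same check could be run as `records_missing_fields(data_providers, PROVIDER_FIELDS)`, with an analogous field tuple for the patients' data.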

#### Ingest sample data into database

##### MySQL DB

In [68]:
# Insert providers' data into MySQL database
for data in data_providers:
    sql = f"""
        INSERT INTO {db_providers}.{table_providers} 
        (Insurance_id, Provider_name, Coverage_type, Contact_email) 
        VALUES ({data['Insurance_id']}, '{data['Provider_name']}', '{data['Coverage_type']}', 
        '{data['Contact_email']}')"""
    mySQL_db_cursor.execute(sql)
mySQL_db_conn.commit()


In [69]:
# Insert patients' data into MySQL database
for data in data_patients:
    sql = f"""
        INSERT INTO {db_patients}.{table_patients} 
        (Patient_id, First_name, Last_name, Date_of_birth, Gender, 
        Contact_number, Insurance_id) 
        VALUES ({data['Patient_id']}, '{data['First_name']}', '{data['Last_name']}', 
        '{data['Date_of_birth']}', '{data['Gender']}',
        '{data['Contact_number']}', {data['Insurance_id']} )"""
    mySQL_db_cursor.execute(sql)
mySQL_db_conn.commit()
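
The insert cells above build each statement with an f-string, which breaks as soon as a value contains a single quote. A common alternative is to keep the values out of the SQL text and pass them as parameters. The sketch below builds a statement with `%s` placeholders, the style that MySQL Connector/Python accepts, together with one tuple of values per record. The helper `build_insert` and the example record are hypothetical.

```python
def build_insert(database, table, columns, records):
    """Build a parameterized INSERT statement and its list of value tuples."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {database}.{table} ({column_list}) VALUES ({placeholders})"
    params = [tuple(record[column] for column in columns) for record in records]
    return sql, params


# Hypothetical record whose name contains a quote character
example_rows = [
    {
        "Insurance_id": 999,
        "Provider_name": "O'Neil Health",
        "Coverage_type": "Medical",
        "Contact_email": "info@example.com",
    }
]

example_sql, example_params = build_insert(
    "insurance_db",
    "providers",
    ("Insurance_id", "Provider_name", "Coverage_type", "Contact_email"),
    example_rows,
)

# With a live connection, the statement would be executed as:
#   mySQL_db_cursor.executemany(example_sql, example_params)
#   mySQL_db_conn.commit()
```

The driver then handles quoting and escaping of each value, so the record above needs no special treatment.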


In [70]:
# Make the providers load repeatable: first check whether the table already holds rows
mySQL_db_cursor.execute(f"SELECT COUNT(*) FROM {db_providers}.{table_providers}")
count = mySQL_db_cursor.fetchone()[0]

# If data exists, delete it before inserting
if count > 0:
    # Temporarily disable foreign key checks to avoid constraint errors
    mySQL_db_cursor.execute("SET FOREIGN_KEY_CHECKS=0")
    
    # Delete existing data
    mySQL_db_cursor.execute(f"TRUNCATE TABLE {db_providers}.{table_providers}")
    
    # Re-enable foreign key checks
    mySQL_db_cursor.execute("SET FOREIGN_KEY_CHECKS=1")

# Now insert providers' data
for data in data_providers:
    sql = f"""
        INSERT INTO {db_providers}.{table_providers} 
        (Insurance_id, Provider_name, Coverage_type, Contact_email) 
        VALUES ({data['Insurance_id']}, '{data['Provider_name']}', '{data['Coverage_type']}', 
        '{data['Contact_email']}')"""
    mySQL_db_cursor.execute(sql)

mySQL_db_conn.commit()
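
In the cell above, an error raised by `TRUNCATE` would leave foreign key checks disabled for the rest of the session. One way to guard against that is a small context manager that always re-enables them. This is a sketch only: `foreign_key_checks_disabled` and the `RecordingCursor` stand-in are hypothetical names, and the stand-in records statements instead of sending them to MySQL.

```python
from contextlib import contextmanager


@contextmanager
def foreign_key_checks_disabled(cursor):
    """Disable foreign key checks for the duration of the block, then restore them."""
    cursor.execute("SET FOREIGN_KEY_CHECKS=0")
    try:
        yield cursor
    finally:
        # Runs even if the block raises, so checks are never left disabled
        cursor.execute("SET FOREIGN_KEY_CHECKS=1")


class RecordingCursor:
    """Stand-in cursor that records statements instead of executing them."""

    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)


demo_cursor = RecordingCursor()
with foreign_key_checks_disabled(demo_cursor):
    demo_cursor.execute("TRUNCATE TABLE insurance_db.providers")
```

With the real cursor, the block would read `with foreign_key_checks_disabled(mySQL_db_cursor): ...` around the `TRUNCATE` statement.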

#### Verify that we can retrieve records from both tables

##### MySQL DB

In [72]:
mySQL_db_cursor.execute(f"SELECT * FROM {db_providers}.{table_providers}")
sql_data = mySQL_db_cursor.fetchall()


for record in sql_data:
    print(record)
    

(101, 'Blue Shield Health Insurance', 'Medical', 'support@blueshield.com')
(102, 'Guardian Dental Care', 'Dental', 'info@guardiandentalcare.com')
(103, 'Vision Plus', 'Vision', 'contact@visionplus.org')
(104, 'United Health Group', 'Medical', 'service@unitedhealthgroup.com')
(105, 'Prestige Life Insurance', 'Life', 'claims@prestigelife.com')
(106, 'Medicare Health Solutions', 'Medical', 'medicare@healthsolutions.gov')
(107, 'Dental Health Alliance', 'Dental', 'support@dentalhealthalliance.com')
(108, 'Clear Vision Insurance', 'Vision', 'help@clearvision.com')
(109, 'Family Health Insurance Co.', 'Medical', 'care@familyhealth.com')
(110, 'Secure Life & Disability', 'Life', 'info@securelife.com')
(111, 'Premier Health Partners', 'Medical', 'contact@premierhealthpartners.org')
(112, 'SmileBright Dental Insurance', 'Dental', 'service@smilebright.com')
(113, 'OpticalCare Insurance', 'Vision', 'help@opticalcare.com')
(114, 'National Health Services', 'Medical', 'inquiries@nationalhealthservi

In [73]:
mySQL_db_cursor.execute(f"SELECT * FROM {db_patients}.{table_patients}")
sql_data = mySQL_db_cursor.fetchall()


for record in sql_data:
    print(record)


(1001, 'John', 'Smith', datetime.date(1982, 5, 15), 'Male', '555-123-4567', 101)
(1002, 'Emily', 'Johnson', datetime.date(1990, 8, 22), 'Female', '555-234-5678', 103)
(1003, 'Michael', 'Williams', datetime.date(1975, 11, 10), 'Male', '555-345-6789', 102)
(1004, 'Jessica', 'Brown', datetime.date(1988, 3, 30), 'Female', '555-456-7890', 104)
(1005, 'David', 'Davis', datetime.date(1965, 9, 18), 'Male', '555-567-8901', 101)
(1006, 'Sarah', 'Miller', datetime.date(1992, 12, 5), 'Female', '555-678-9012', 105)
(1007, 'James', 'Wilson', datetime.date(1970, 4, 25), 'Male', '555-789-0123', 102)
(1008, 'Jennifer', 'Moore', datetime.date(1986, 7, 14), 'Female', '555-890-1234', 103)
(1009, 'Robert', 'Taylor', datetime.date(1958, 2, 28), 'Male', '555-901-2345', 104)
(1010, 'Lisa', 'Anderson', datetime.date(1979, 10, 11), 'Female', '555-012-3456', 101)
(1011, 'Daniel', 'Thomas', datetime.date(1995, 6, 20), 'Male', '555-123-4567', 105)
(1012, 'Michelle', 'Jackson', datetime.date(1983, 1, 7), 'Female', 

### Step 3 Cleanup Resources

In [17]:
%%time
# Cleanup Cursor and connection objects.
mySQL_db_cursor.close()
mySQL_db_conn.close()


CPU times: user 1.77 ms, sys: 211 μs, total: 1.99 ms
Wall time: 2.53 ms
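
The cleanup cell only runs if every earlier cell succeeded. When the same steps are moved into a script, `contextlib.closing` can guarantee that the cursor and connection are closed even after an error. The sketch below demonstrates the pattern with a hypothetical `FakeResource` class in place of the real MySQL objects.

```python
from contextlib import closing


class FakeResource:
    """Stand-in for a connection or cursor; it only logs when it is closed."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def close(self):
        self.log.append(f"{self.name} closed")


close_log = []
try:
    with closing(FakeResource("connection", close_log)):
        with closing(FakeResource("cursor", close_log)):
            raise RuntimeError("simulated failure mid-script")
except RuntimeError:
    pass
```

The inner resource is closed first, so the cursor is released before the connection it belongs to.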


#### Thank you!
In this part, we set up the database. You can now continue the example in the `llama3-2-text2sql` notebook.