# SQL Solution
> Creating a series of SQL procedures to use for the data migration <br>
<br>
- I'm going to use a local MySQL instance as a DEV environment <br>
- Create a new Database for the purposes of this project <br>
- Load the initial datasets into MySQL with the 'temp_' prefix <br>
- Go through the modifications highlighted in the Requirements <br>
    - For each modification I will write a stored procedure that I use at the end <br>
    - For some modifications I will write a view to demonstrate that the logic is correct before proceeding <br>

In [1]:
#| default_exp solution_sql

In [2]:
#| export
import pandas as pd
import mysql.connector

from virtuous_interview.utils import contacts, contact_methods, gifts
from mysql.connector import Error
from dotenv import dotenv_values
from sqlalchemy import create_engine

# Configuring SQL
> Creating a new db instance and loading tables

## Creating DB
> exam_db

In [3]:
#| export
DB_NAME = 'exam_db'

Using the `mysql.connector` library to connect as the root user and create the new database

In [4]:
#| export solution_sql
connection = None
try:
    connection = mysql.connector.connect(user='root', host='localhost')
    cursor = connection.cursor()
    cursor.execute(f"CREATE DATABASE IF NOT EXISTS {DB_NAME};")
except Error as e:
    print(f"Error: {e}")
finally:
    # connection is still None if connect() itself failed
    if connection is not None and connection.is_connected():
        cursor.close()
        connection.close()


## insert_sql
> Function to execute SQL on the DEV db

Since that was a lot of code just to execute one line of SQL, I'm going to wrap it in a function to make it easier

In [5]:
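#| export
# Read the credentials from the local .env file once; the exported functions below rely on them
config = dotenv_values()
DB_USER = config['DB_USER']
DB_PASSWORD = config['DB_PASSWORD']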

In [6]:
#|export solution_sql
def insert_sql(sql):
    connection = None
    try:
        # Connect to the DEV database
        connection = mysql.connector.connect(user=DB_USER, password=DB_PASSWORD, host='localhost', database=DB_NAME)
        cursor = connection.cursor()

        # Execute the SQL statement that was passed in
        cursor.execute(sql)
        print("SQL executed successfully")

    except Error as e:
        print(f"Error: {e}")

    finally:
        # connection is still None if connect() itself failed
        if connection is not None and connection.is_connected():
            cursor.close()
            connection.close()
        print("MySQL connection closed")
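
The same open/execute/close pattern can also be written as a context manager. The sketch below is an illustration only: `run_sql` is a hypothetical name, and the standard-library `sqlite3` module stands in for `mysql.connector` so the snippet runs without a MySQL server.

```python
import sqlite3
from contextlib import closing


def run_sql(sql: str, database: str = ":memory:") -> list:
    """Execute one statement and always close the connection (hypothetical helper)."""
    # closing() guarantees connection.close() is called even if execute() raises
    with closing(sqlite3.connect(database)) as connection:
        cursor = connection.execute(sql)
        rows = cursor.fetchall()
        connection.commit()
    return rows


print(run_sql("SELECT 1 + 1"))
```

The trade-off is that an exception now propagates to the caller instead of being printed, which is usually easier to notice in a notebook.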

## insert_proc
> Function to insert procedure into the Database <br>
<br>
- Useful since I'll need to create many procedures

In [7]:
#|export solution_sql
def insert_proc(sql, proc_name, call=True):
    insert_sql(f'DROP PROCEDURE IF EXISTS {proc_name};')
    stmt = f"""         
        CREATE PROCEDURE {proc_name}()
        BEGIN
            {sql}
        END;
        COMMIT;
    """
    insert_sql(stmt)
    if call:
        insert_sql(f'CALL {proc_name}();')
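
To see what kind of statement `insert_proc` builds, the sketch below reconstructs the `CREATE PROCEDURE` part of the template as a plain string, without touching the database. `build_proc_sql` is a hypothetical helper introduced only for this illustration.

```python
import textwrap


def build_proc_sql(sql: str, proc_name: str) -> str:
    """Return a CREATE PROCEDURE statement that wraps `sql` (hypothetical helper)."""
    # Normalise the indentation of the body so the generated SQL is easy to read
    body = textwrap.indent(textwrap.dedent(sql).strip(), "    ")
    return f"CREATE PROCEDURE {proc_name}()\nBEGIN\n{body}\nEND;"


# Example: wrap a single UPDATE in a procedure called demo_proc
print(build_proc_sql("UPDATE temp_contacts SET Private = 0;", "demo_proc"))
```

Printing the statement before sending it is a cheap way to check that the body ends with a semicolon and that the procedure name is the intended one.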

## Loading Datasets Into Database
> Using prefix 'temp_'

Mapping DataFrames to table names

In [8]:
#| export solution_sql
tables = {'temp_contact_methods': contact_methods, 'temp_contacts': contacts, 'temp_gifts': gifts}

In [9]:
#| export
engine = create_engine(f'mysql+mysqlconnector://{DB_USER}:{DB_PASSWORD}@localhost/{DB_NAME}')

In [10]:
#| export solution_sql
for table_name, df in tables.items():
    try:
        # if_exists='fail' raises ValueError when the table has already been loaded
        df.to_sql(table_name, engine, index=False, if_exists='fail')
    except ValueError:
        print(f"Table {table_name} already exists. Skipping.")


Table temp_contact_methods already exists. Skipping.
Table temp_contacts already exists. Skipping.
Table temp_gifts already exists. Skipping.


Previewing the data

In [11]:
#| hide
pd.read_sql('select * from temp_gifts limit 5', engine)

Unnamed: 0,DonorNumber,GiftId,FirstName,LastName,AmountReceived,CreditCardType,PaymentMethod,LegacyPledgeID,Notes,Project1Code,Project2Code,GiftDate
0,848348568-0,95196378,Mannie,Turpin,4.15,,Other,0,,,,2019-03-04
1,729707142-0,95196889,Cymbre,Cross,2.3648,,Check,1,,ChildSponsorship,,2019-03-05
2,687119652-8,95197689,Ruggiero,Makepeace,1.31,,Cash,2,,,,2019-03-07
3,653377813-7,95198998,Karita,Lumbers,2.04,AMEX,Credit,3,In honor of Mannie Turpin,,,2019-03-10
4,390551098-7,95198999,Helga,Benech,5.8,,Cash,89752384,,,,2019-01-10


# Solution
> Creating procedures for each modification

## ContactType
> is required and can only be Household or Organization <br>
<br>
- Source Table: Contacts Table
- Solution:
    - Create procedure to add new column ContactType

In [12]:
#| hide
contacts_view = f"""
    CREATE or REPLACE VIEW contacts_view AS
    SELECT Number, CompanyName, CASE WHEN CompanyName = '' THEN 'Household' ELSE 'Organization' END AS ContactType
    FROM temp_contacts
"""
insert_sql(contacts_view)
pd.read_sql('select * from contacts_view limit 5', con=engine)


SQL executed successfully
MySQL connection closed


Unnamed: 0,Number,CompanyName,ContactType
0,653377813-7,,Household
1,390551098-7,,Household
2,093004505-X,,Household
3,729707142-0,A Company Co.,Organization
4,488464926-5,,Household


Now I'm going to write a procedure that performs this transformation on the temp table. <br>
<br>
Since I may need to re-run this procedure each time the data is updated, I'm going to write **2** procedures to solve this problem: <br>
<br>
1. Add column procedure
    - Add a column if it doesn't exist
2. Procedure to add ContactType
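
The "add a column only if it is missing" idea can be sketched outside MySQL as well. This is an illustration under assumptions, not the procedure used in this project: `add_column_if_missing` is a hypothetical name, and the standard-library `sqlite3` module with its `PRAGMA table_info` stands in for a lookup against MySQL's `information_schema.COLUMNS`.

```python
import sqlite3


def add_column_if_missing(connection, table_name, column_name, column_type):
    """Add `column_name` to `table_name` unless it already exists (hypothetical helper)."""
    # PRAGMA table_info returns one row per column; index 1 holds the column name
    existing = {row[1] for row in connection.execute(f"PRAGMA table_info({table_name})")}
    if column_name not in existing:
        connection.execute(f"ALTER TABLE {table_name} ADD COLUMN {column_name} {column_type}")


connection = sqlite3.connect(":memory:")  # stand-in for the DEV database
connection.execute("CREATE TABLE temp_contacts (Number TEXT, CompanyName TEXT)")

# Calling it twice is safe: the second call finds the column and does nothing
add_column_if_missing(connection, "temp_contacts", "ContactType", "VARCHAR(255)")
add_column_if_missing(connection, "temp_contacts", "ContactType", "VARCHAR(255)")

columns = [row[1] for row in connection.execute("PRAGMA table_info(temp_contacts)")]
print(columns)
```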

In [13]:
#| export solution_sql
add_column = """
    DROP PROCEDURE IF EXISTS add_column;
    
    
    CREATE PROCEDURE add_column(
        IN tableName VARCHAR(255),
        IN columnName VARCHAR(255),
        IN columnType VARCHAR(255)
    )
    BEGIN
        DECLARE columnExists BOOLEAN DEFAULT FALSE;
    
        SELECT COUNT(*)
        INTO columnExists
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = tableName AND COLUMN_NAME = columnName;
    
        IF columnExists = 0 THEN
            SET @sql = CONCAT('ALTER TABLE ', tableName, ' ADD COLUMN ', columnName, ' ', columnType);
            PREPARE stmt FROM @sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
        END IF;
        COMMIT;
    END;
"""
insert_sql(add_column)

SQL executed successfully
MySQL connection closed


In [14]:
#|export solution_sql
insert_contact_type = """
    CALL add_column('temp_contacts', 'ContactType', 'VARCHAR(255)');
    
    UPDATE temp_contacts
    SET ContactType = CASE WHEN CompanyName = '' THEN 'Household' ELSE 'Organization' END;
    COMMIT;
"""
insert_proc(insert_contact_type, 'insert_contact_type', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


## Private
> Whether the contact wants to remain private <br>
<br>
- Source Table: Contacts Table
- Solution:
    - Create procedure to add new column Private

In [15]:
insert_private = """
    CALL add_column('temp_contacts', 'Private', 'TINYINT');

    UPDATE temp_contacts
    SET Private = CASE WHEN Remarks = 'Is anonymous' THEN 1 ELSE 0 END;
    commit;
"""
insert_proc(insert_private, 'insert_private', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


## Postal Code
> If an address is present and is in the US, the postal code must be a valid ZIP code, either 12345 or 12345-1234 <br>
<br>
- Source Table: Contacts
- Solution:
    - Create procedure to remove any postal codes that don't match the approved format from the [USPS](https://pe.usps.com/archive/html/dmmarchive20030810/A010.htm)
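
The two accepted formats can be checked quickly in plain Python before writing any SQL. This is a minimal sketch: `clean_postal` is a hypothetical helper, the sample values are made up, and it ignores the "address is present and is US" condition.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits (ZIP+4)
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def clean_postal(postal: str) -> str:
    """Return `postal` when it is a valid ZIP code, otherwise an empty string."""
    # fullmatch() requires the whole string to match, so no ^ or $ anchors are needed
    return postal if ZIP_PATTERN.fullmatch(postal) else ""


for value in ["89130", "49560-1234", "8913", "89130-12", "ABCDE"]:
    print(repr(value), "->", repr(clean_postal(value)))
```

Only the first two sample values survive; the rest are replaced with an empty string.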

In [16]:
#| hide
','.join(contacts.columns.drop('Postal').tolist())

'Number,CompanyName,FirstName,LastName,Street,City,State,Phone,EMail,Remarks,Deceased,SecondaryFirstName,SecondaryLastName,LegacyIndividualId,SecondaryLegacyIndividualId,ContactName'

In [17]:
#| hide
contacts_view = f"""
    CREATE OR REPLACE VIEW contacts_view AS
    SELECT
        NUMBER,
        POSTAL AS OLD_POSTAL,
        CASE
        WHEN REGEXP_LIKE(Postal, '^[0-9]{{5}}$') OR REGEXP_LIKE(Postal, '^[0-9]{{5}}-[0-9]{{4}}$') THEN Postal
        ELSE ''
        END AS NEW_POSTAL
    FROM temp_contacts;
"""
insert_sql(contacts_view)
pd.read_sql("SELECT * FROM contacts_view where old_postal != '' limit 5", con=engine)

SQL executed successfully
MySQL connection closed


Unnamed: 0,NUMBER,OLD_POSTAL,NEW_POSTAL
0,390551098-7,89130,89130
1,488464926-5,49560,49560
2,029456846-8,30066,30066
3,687119652-8,68164,68164


Success!<br>
<br>
Creating Stored Procedure...

In [18]:
#|export solution_sql
update_zip = """
    UPDATE temp_contacts
    SET Postal = ''
    WHERE Postal NOT REGEXP '^[0-9]{5}$' AND Postal NOT REGEXP '^[0-9]{5}-[0-9]{4}$';
    COMMIT;
"""
insert_proc(update_zip, 'update_zip', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


## IsDeceased
> can only be TRUE or FALSE <br>
<br>
- Source Table: Contacts <br>
- Solution: <br>
    - Create procedure to update Deceased to TRUE/FALSE

In [19]:
#|export solution_sql
update_deceased = f"""
    UPDATE temp_contacts
    SET Deceased = CASE
        WHEN Deceased = 'Yes' THEN 1
        ELSE 0
    END;
    commit;
    ALTER TABLE temp_contacts MODIFY Deceased TINYINT;
    commit;
"""
insert_proc(update_deceased, 'update_deceased', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
Error: 1292 (22007): Truncated incorrect DOUBLE value: 'Yes'
MySQL connection closed
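
The 1292 error above is the kind MySQL raises when a numeric column is compared with a non-numeric string, which would happen if `Deceased` had already been converted to TINYINT by an earlier run of this procedure. One way to make the update safe to re-run is to cast the column to text before comparing. The sketch below shows the idea under assumptions: `sqlite3` stands in for MySQL so the snippet runs anywhere, the sample rows are made up, and the statement has not been verified against the DEV database here.

```python
import sqlite3

# Casting to text first means the comparison works whether the column
# currently holds 'Yes'/'No' strings or already-converted 1/0 values.
UPDATE_DECEASED = """
    UPDATE temp_contacts
    SET Deceased = CASE
        WHEN CAST(Deceased AS CHAR) IN ('Yes', '1') THEN 1
        ELSE 0
    END
"""

connection = sqlite3.connect(":memory:")  # stand-in for the MySQL DEV database
connection.execute("CREATE TABLE temp_contacts (Number TEXT, Deceased)")
connection.executemany(
    "INSERT INTO temp_contacts VALUES (?, ?)",
    [("a", "Yes"), ("b", "No"), ("c", "")],
)

# Running the statement twice leaves the flags unchanged after the first pass
for _ in range(2):
    connection.execute(UPDATE_DECEASED)

flags = connection.execute(
    "SELECT Number, Deceased FROM temp_contacts ORDER BY Number"
).fetchall()
print(flags)
connection.close()
```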


## GiftType
> Can only be Cash, Check, Credit, Other, or Reversing Transaction <br>
<br>
- Source Table: Gifts
- Solution:
    - Identify Incorrect Gift Types
    - Create procedure to replace invalid gift types

In [20]:
#| hide
pd.read_sql('Select Distinct PaymentMethod from temp_gifts', engine)

Unnamed: 0,PaymentMethod
0,Other
1,Check
2,Cash
3,Credit
4,Reversing Transaction


It looks like there are several payment methods that don't match the approved list. Additionally, the payment method 'credit card' will need to be mapped to 'Credit'.

In [21]:
#|export solution_sql
update_gift_type = f"""
  UPDATE temp_gifts
  SET PaymentMethod = CASE
    WHEN AmountReceived < 0 THEN 'Reversing Transaction'
    WHEN LOWER(TRIM(PaymentMethod)) = 'cash' THEN 'Cash'
    WHEN LOWER(TRIM(PaymentMethod)) = 'check' THEN 'Check'
    WHEN LOWER(TRIM(PaymentMethod)) LIKE 'credit%' THEN 'Credit'
    ELSE 'Other'
  END;
  commit;
"""
insert_proc(update_gift_type, 'update_gift_type', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
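
The mapping in the `CASE` expression can be mirrored in plain Python, which makes it easy to try out edge cases without touching the database. This is a sketch only: `normalize_gift_type` is a hypothetical helper and the sample values in the loop are made up.

```python
EXACT_MATCHES = {"cash": "Cash", "check": "Check"}


def normalize_gift_type(payment_method, amount_received):
    """Mirror of the SQL CASE expression: map raw values onto the approved list."""
    # Negative amounts take priority over whatever the payment method says
    if amount_received < 0:
        return "Reversing Transaction"
    # Strip spaces and lower-case, like LOWER(TRIM(...)) in the SQL
    cleaned = (payment_method or "").strip(" ").lower()
    if cleaned in EXACT_MATCHES:
        return EXACT_MATCHES[cleaned]
    if cleaned.startswith("credit"):  # same idea as LIKE 'credit%'
        return "Credit"
    return "Other"


for method, amount in [(" credit card ", 10), ("CHECK", 5), ("cash", -5), ("PayPal", 1)]:
    print(repr(method), amount, "->", normalize_gift_type(method, amount))
```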


## CreditCardType
> Can only be Visa, Mastercard, AMEX, Discover <br>
<br>
- Solution: <br>
    - Identify Incorrect Credit Types <br>
    - Create procedure to replace invalid credit types

In [22]:
#| hide
pd.read_sql('select distinct CreditCardType from temp_gifts', engine)

Unnamed: 0,CreditCardType
0,
1,AMEX
2,Visa
3,Master card
4,Mastercard
5,Discover


In [23]:
#|export solution_sql
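proc = """
UPDATE temp_gifts
    SET CreditCardType = CASE
    WHEN CreditCardType IN ('Master card', 'Master car') THEN 'Mastercard'
    WHEN CreditCardType = 'American Ex' THEN 'AMEX'
    ELSE CreditCardType
    END
WHERE CreditCardType IN ('American Ex', 'Master car', 'Master card');
commit;
"""
# Use a distinct name so this does not overwrite the update_gift_type procedure
insert_proc(proc, 'update_credit_card_type', call=True)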

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


# Execution
> Creating Final Tables <br>
<br>

[contact_methods, contacts, gifts]

## Contact Methods

In [24]:
#| hide
pd.read_sql('select * from temp_contacts limit 5', engine)

Unnamed: 0,Number,CompanyName,FirstName,LastName,Street,City,State,Postal,Phone,EMail,Remarks,Deceased,SecondaryFirstName,SecondaryLastName,LegacyIndividualId,SecondaryLegacyIndividualId,ContactName,ContactType,Private
0,653377813-7,,Karita,Lumbers,4 Bunting Parkway,Washington,DC,,,kklumbers@ yahoo.co,Is anonymous,0,Kelvin,Lumbers,0,1.0,Karita & Kelvin Lumbers,Household,1
1,390551098-7,,Helga,Benech,48684 Jenifer Way,Las Vegas,NV,89130.0,,ebenech1@goodreads.com,,0,,,2,,Helga Benech,Household,0
2,093004505-X,,Masha,Butt Gow,353 Schmedeman Park,Indianapolis,IN,,577-374-96523,,,0,,,3,,Masha Butt Gow,Household,0
3,729707142-0,A Company Co.,Cymbre,Cross,2055 Lakewood Parkway,Camden,NJ,,,,,0,,,4,,Cymbre Cross,Organization,0
4,488464926-5,,Hoyt,Castille,37 8th Trail,Grand Rapids,MI,49560.0,,fcastille4@timesonline.co.uk,,0,,,5,,Hoyt Castille,Household,0


In [25]:
#| hide
pd.read_sql('select * from temp_contact_methods limit 5', engine)

Unnamed: 0,DonorNumber,Phone,EMail,Fax
0,653377813-7,832-442-4988,,
1,390551098-7,,ebenech1@goodreads.com,
2,093004505-X,818-323-9865,,818-156-7985
3,729707142-0,,,
4,488464926-5,,fcastille4@timesonline.co.uk,


> Procedure to create contact_methods table <br>
<br>

- Creating the contact_methods table <br>
- Joining temp_contact_methods & temp_contacts into a temporary table <br>
- Inserting non-empty values into the contact_methods table <br>
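
The steps above amount to a wide-to-long reshape: one row per contact becomes one row per contact method. A minimal pure-Python sketch of that logic follows. `unpivot_contact_methods` is a hypothetical helper, the input rows are plain dictionaries rather than database tables, and the rows come out grouped by contact rather than by type.

```python
def unpivot_contact_methods(contacts, contact_methods):
    """Return (LegacyContactId, Type, Value) rows, preferring contact_methods values."""
    methods_by_donor = {row["DonorNumber"]: row for row in contact_methods}
    rows = []
    for contact in contacts:
        extra = methods_by_donor.get(contact["Number"], {})
        values = {
            # Fall back to the contacts table when contact_methods has no value
            "HomePhone": extra.get("Phone") or contact.get("Phone"),
            "HomeEmail": extra.get("EMail") or contact.get("EMail"),
            "Fax": extra.get("Fax"),
        }
        for method_type, value in values.items():
            if value:  # skip empty strings and missing values
                rows.append((contact["Number"], method_type, value))
    return rows


contacts_rows = [{"Number": "093004505-X", "Phone": "577-374-96523", "EMail": ""}]
methods_rows = [
    {"DonorNumber": "093004505-X", "Phone": "818-323-9865", "EMail": "", "Fax": "818-156-7985"}
]
result = unpivot_contact_methods(contacts_rows, methods_rows)
print(result)
```

For this contact the phone number from the contact methods data wins over the one in the contacts data, and the empty email produces no row.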

In [26]:
#|export solution_sql
proc = """
CREATE TABLE IF NOT EXISTS contact_methods (
    `LegacyContactId` VARCHAR(255),
    `Type` VARCHAR(255),
    `Value` VARCHAR(255)
);

CREATE TEMPORARY TABLE clean_data AS
SELECT 
    temp_contacts.`Number` AS LegacyContactId,
    CASE
        WHEN temp_contact_methods.`Phone` != '' THEN temp_contact_methods.`Phone`
        ELSE temp_contacts.`Phone`
    END AS phone,
    CASE
        WHEN temp_contact_methods.`EMail` != '' THEN temp_contact_methods.`EMail`
        ELSE temp_contacts.`EMail`
    END AS email,
    temp_contact_methods.Fax AS fax
FROM
    temp_contacts
LEFT JOIN
    temp_contact_methods ON temp_contact_methods.DonorNumber = temp_contacts.`Number`
WHERE
    (temp_contacts.Phone != '' OR temp_contacts.`EMail` != '') OR
    (temp_contact_methods.Phone != '' OR temp_contact_methods.EMail != '' OR temp_contact_methods.Fax != '');


INSERT INTO contact_methods (LegacyContactId, Type, Value)
SELECT DISTINCT LegacyContactId, 'HomePhone', phone
FROM clean_data
WHERE phone != '';

INSERT INTO contact_methods (LegacyContactId, Type, Value)
SELECT DISTINCT LegacyContactId, 'HomeEmail', email
FROM clean_data
WHERE email != '';

INSERT INTO contact_methods (LegacyContactId, Type, Value)
SELECT DISTINCT LegacyContactId, 'Fax', fax
FROM clean_data
WHERE fax != '';

commit;
"""
insert_proc(proc, 'transform_contact_methods', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


In [27]:
pd.read_sql('select * from contact_methods', con=engine)

Unnamed: 0,LegacyContactId,Type,Value
0,653377813-7,HomePhone,832-442-4988
1,093004505-X,HomePhone,818-323-9865
2,848348568-0,HomePhone,702-844-9524
3,653377813-7,HomeEmail,kklumbers@ yahoo.co
4,390551098-7,HomeEmail,ebenech1@goodreads.com
5,488464926-5,HomeEmail,fcastille4@timesonline.co.uk
6,315297729-8,HomeEmail,dmouncey9@cnn.com
7,029456846-8,HomeEmail,jdoley6@telegraph.co.uk
8,687119652-8,HomeEmail,cmakepeace7@1688.com
9,093004505-X,Fax,818-156-7985


> Creating the final contacts table <br>
<br>

- The cleaning is done <br>
- Creating contacts table and renaming various columns <br>

In [28]:
proc = """
CREATE TABLE IF NOT EXISTS contacts (
    `LegacyContactId` VARCHAR(255),
    `LegacyIndividualId` VARCHAR(255),
    `ContactName` VARCHAR(255),
    `FirstName` VARCHAR(255),
    `LastName` VARCHAR(255),
    `SecondaryLegacyIndividualId` VARCHAR(255),
    `SecondaryFirstName` VARCHAR(255),
    `SecondaryLastName` VARCHAR(255),
    `HomePhone` VARCHAR(255),
    `HomeEmail` VARCHAR(255),
    `Address1` VARCHAR(255),
    `City` VARCHAR(255),
    `State` VARCHAR(255),
    `PostalCode` VARCHAR(255),
    `IsPrivate` VARCHAR(255),
    `IsDeceased` VARCHAR(255),
    `ContactType` VARCHAR(255)
);

insert into contacts
select 
    temp_contacts.`Number` as LegacyContactId,
    temp_contacts.LegacyIndividualId as LegacyIndividualId,
    `ContactName`,
    `FirstName`,
    `LastName`,
    `SecondaryLegacyIndividualId`,
    `SecondaryFirstName`,
    `SecondaryLastName`,
    `Phone` as HomePhone,
    `EMail` as HomeEmail,
    `Street` as Address1,
    `City`,
    `State`,
    `Postal` as PostalCode,
    `Private` as IsPrivate,
    `Deceased` as IsDeceased,
    `ContactType`
from temp_contacts;

commit;
"""
insert_proc(proc, 'create_contacts', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


In [29]:
pd.read_sql('select * from contacts limit 5', engine)

Unnamed: 0,LegacyContactId,LegacyIndividualId,ContactName,FirstName,LastName,SecondaryLegacyIndividualId,SecondaryFirstName,SecondaryLastName,HomePhone,HomeEmail,Address1,City,State,PostalCode,IsPrivate,IsDeceased,ContactType
0,653377813-7,0,Karita & Kelvin Lumbers,Karita,Lumbers,1.0,Kelvin,Lumbers,,kklumbers@ yahoo.co,4 Bunting Parkway,Washington,DC,,1,0,Household
1,390551098-7,2,Helga Benech,Helga,Benech,,,,,ebenech1@goodreads.com,48684 Jenifer Way,Las Vegas,NV,89130.0,0,0,Household
2,093004505-X,3,Masha Butt Gow,Masha,Butt Gow,,,,577-374-96523,,353 Schmedeman Park,Indianapolis,IN,,0,0,Household
3,729707142-0,4,Cymbre Cross,Cymbre,Cross,,,,,,2055 Lakewood Parkway,Camden,NJ,,0,0,Organization
4,488464926-5,5,Hoyt Castille,Hoyt,Castille,,,,,fcastille4@timesonline.co.uk,37 8th Trail,Grand Rapids,MI,49560.0,0,0,Household


> Creating the final gifts table <br>
<br>

- The cleaning is done <br>
- Creating the gifts table and renaming some columns <br>

In [30]:
proc = """
CREATE TABLE IF NOT EXISTS gifts (
    LegacyContactId VARCHAR(255),
    LegacyGiftId INTEGER,
    GiftType TEXT,
    GiftDate TEXT,
    GiftAmount REAL,
    Notes TEXT,
    CreditCardType TEXT,
    Project1Code TEXT,
    Project2Code TEXT,
    LegacyPledgeID INTEGER
);

insert into gifts
select 
    DonorNumber as LegacyContactId,
    GiftId as LegacyGiftId,
    PaymentMethod as GiftType,
    GiftDate,
    AmountReceived as GiftAmount,
    Notes,
    CreditCardType,
    Project1Code,
    Project2Code,
    LegacyPledgeID
from temp_gifts;

commit;
"""
insert_proc(proc, 'create_gifts', call=True)

SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed
SQL executed successfully
MySQL connection closed


In [31]:
pd.read_sql('select * from gifts limit 5', engine)

Unnamed: 0,LegacyContactId,LegacyGiftId,GiftType,GiftDate,GiftAmount,Notes,CreditCardType,Project1Code,Project2Code,LegacyPledgeID
0,848348568-0,95196378,Other,2019-03-04 00:00:00,4.15,,,,,0
1,729707142-0,95196889,Check,2019-03-05 00:00:00,2.3648,,,ChildSponsorship,,1
2,687119652-8,95197689,Cash,2019-03-07 00:00:00,1.31,,,,,2
3,653377813-7,95198998,Credit,2019-03-10 00:00:00,2.04,In honor of Mannie Turpin,AMEX,,,3
4,390551098-7,95198999,Cash,2019-01-10 00:00:00,5.8,,,,,89752384


# Export

In [32]:
#| hide
import nbdev

In [33]:
#| hide
nbdev.nbdev_export('01_SQL_Solution.ipynb')