# Troubleshooting the ETL Load

## Scenario

Andy the Intern has been assigned the ETL load tasks. Lately, the database operations team has been receiving reports of several recurring errors. This workbook contains the queries the developers want to review so they can identify the errors and prioritize bug fixes.

**Please Note:**

<mark>The developers want to see the results. Make a copy of this file and save it in the client folder, along with the results, so they can determine whether the problem lies in this particular client's file or is a bug in the ETL.</mark>

## Bad Dates
Are there any history records dated before the system launch or after today?

The system launched on 2020-01-01, so every record should fall between that date and today.

**If any records are found:** re-examine the source file.

In [1]:
SELECT
	Good_Deed_History.Good_Deed_Timestamp,
	Person.First_Name,
	Person.Last_Name,
	Good_Deed_Type.Good_Deed_Type_Name,
	Good_Deed_History.Good_Deed_Description,
	Good_Deed_History.Good_Deed_History_ID
FROM Good_Deed_History
INNER JOIN dbo.Person
	ON Good_Deed_History.Good_Deed_Person_ID = Person.Person_ID
INNER JOIN Good_Deed_Type
	ON Good_Deed_History.Good_Deed_Type_ID = Good_Deed_Type.Good_Deed_Type_ID
WHERE Good_Deed_History.Good_Deed_Timestamp NOT BETWEEN '2020-01-01' AND GETDATE();

Good_Deed_Timestamp,First_Name,Last_Name,Good_Deed_Type_Name,Good_Deed_Description,Good_Deed_History_ID
2010-05-20 00:00:00.000,May,Parker,Help person,Consoled Peter over loss of Uncle Ben,3
1950-01-01 00:00:00.000,Peggy,Carter,Stop crime,Stopped money laundering by Hydra,14
1950-01-01 00:00:00.000,Peggy,Carter,Save city,Foiled Hydra,15
1950-01-01 00:00:00.000,Peggy,Carter,Defeat supervillan,Stopped band of Red Skull impersonators,16
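
When re-examining the source file, it may help to know whether the bad dates fall before the launch or in the future. The following is a minimal sketch that reuses the same filter as the query above; the `Date_Problem` and `Record_Count` aliases are introduced here for illustration and are not part of the schema.

```sql
-- Sketch: classify out-of-range records by the kind of date problem.
-- Date_Problem and Record_Count are illustrative aliases only.
SELECT
	CASE
		WHEN Good_Deed_Timestamp < '2020-01-01' THEN 'Before launch'
		ELSE 'After today'
	END AS Date_Problem,
	COUNT(*) AS Record_Count
FROM Good_Deed_History
WHERE Good_Deed_Timestamp NOT BETWEEN '2020-01-01' AND GETDATE()
GROUP BY
	CASE
		WHEN Good_Deed_Timestamp < '2020-01-01' THEN 'Before launch'
		ELSE 'After today'
	END;
```

Rows with a NULL timestamp are not returned by either query, because `NOT BETWEEN` evaluates to unknown for NULL values.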


Check the ETL error log for any load errors.

In [2]:
SELECT *
FROM dbo.ETL_Error_Log
ORDER BY ETL_Error_Log_ID;

ETL_Error_Log_ID,Error_Description,CSV_Data,Error_Timestamp
1,Good Deed not found,"Peggy Carter,Foiled Hydra,Save city,NULL",2020-05-19 00:00:00.000
2,Person not found,"Tony Stark,Defeated the Mandarin,Defeat supervillan,",2020-05-19 00:00:00.000
3,Person not found,"Melkin Deborah,Help person,Prevented intern from dropping Prod,NULL",2020-05-19 00:00:00.000
4,Translation error,"Zero Wing,All your base are belong to us,All your base are belong to us,NULL",2020-05-19 00:00:00.000
5,Translation error,"Zero Wing,Somebody set up us the bomb,Somebody set up us the bomb,NULL",2020-05-19 00:00:00.000
6,Mismatched number of Values,"Zero Wing,You have no chance to survive make your time,NULL",2020-05-19 00:00:00.000
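
Because the goal is to prioritize bug fixes, it can be useful to see how often each error type occurs. A minimal sketch, assuming only the `dbo.ETL_Error_Log` columns shown above; `Error_Count` is an illustrative alias.

```sql
-- Sketch: count errors per description, most frequent first.
SELECT
	Error_Description,
	COUNT(*) AS Error_Count
FROM dbo.ETL_Error_Log
GROUP BY Error_Description
ORDER BY Error_Count DESC, Error_Description;
```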


## Person Not Found

This is a common error.
* If the person exists in the database but the name is reversed in the file, the file needs to be fixed.
* If the person does not exist in the database, confirm whether they need to be added. If so, create a ticket to add them so the file can be re-run.

In [3]:
SELECT *
FROM dbo.ETL_Error_Log
WHERE Error_Description = 'Person not found';

ETL_Error_Log_ID,Error_Description,CSV_Data,Error_Timestamp
2,Person not found,"Tony Stark,Defeated the Mandarin,Defeat supervillan,",2020-05-19 00:00:00.000
3,Person not found,"Melkin Deborah,Help person,Prevented intern from dropping Prod,NULL",2020-05-19 00:00:00.000


In [4]:
SELECT 'Names are reversed:', *
FROM dbo.Person
INNER JOIN (
	SELECT *
	FROM dbo.ETL_Error_Log
		CROSS APPLY STRING_SPLIT(CSV_Data, ',') AS Split_CSV_Data
	WHERE Error_Description = 'Person not found'
) AS Split_CSV_Data
	ON (Person.Last_Name + ' ' + Person.First_Name) = Split_CSV_Data.value;


(No column name),Person_ID,First_Name,Last_Name,ETL_Error_Log_ID,Error_Description,CSV_Data,Error_Timestamp,value
Names are reversed:,10,Deborah,Melkin,3,Person not found,"Melkin Deborah,Help person,Prevented intern from dropping Prod,NULL",2020-05-19 00:00:00.000,Melkin Deborah
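
The join above compares the reversed name against every comma-separated field in `CSV_Data`, not only the name. If the person's name is always the first field (an assumption, although it holds for the rows shown in the error log), a sketch like the following isolates it without splitting the whole string. `Person_Name` is an illustrative alias.

```sql
-- Sketch: extract only the first CSV field, assumed to be the person's name.
-- Appending ',' guarantees CHARINDEX finds a delimiter even in single-field rows.
SELECT
	ETL_Error_Log_ID,
	LEFT(CSV_Data, CHARINDEX(',', CSV_Data + ',') - 1) AS Person_Name
FROM dbo.ETL_Error_Log
WHERE Error_Description = 'Person not found';
```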


In [5]:
SELECT 'Person needs to be added:', *
FROM dbo.ETL_Error_Log
WHERE Error_Description = 'Person not found'
	AND ETL_Error_Log_ID NOT IN ( 
		SELECT ETL_Error_Log_ID
		FROM dbo.Person
		INNER JOIN (
			SELECT *
			FROM dbo.ETL_Error_Log
				CROSS APPLY STRING_SPLIT(CSV_Data, ',') AS Split_CSV_Data
			WHERE Error_Description = 'Person not found'
		) AS Split_CSV_Data
			ON (Person.Last_Name + ' ' + Person.First_Name) = Split_CSV_Data.value
	);


(No column name),ETL_Error_Log_ID,Error_Description,CSV_Data,Error_Timestamp
Person needs to be added:,2,Person not found,"Tony Stark,Defeated the Mandarin,Defeat supervillan,",2020-05-19 00:00:00.000


One person is missing from the database and is not a simple name reversal, so confirm the list of people in the database.

In [6]:
SELECT *
FROM dbo.Person
ORDER BY Person_ID;

Person_ID,First_Name,Last_Name
2,Diana,Prince
3,Linda,Danvers
4,Selina,Kyle
5,May,Parker
6,Natasha,Romanoff
7,Ororo,Munroe
10,Deborah,Melkin
11,Peggy,Carter
12,Doctor,Who
13,xXx,aAa


One client has known bad data in its Person file that cannot be changed at the source. The bad records are identifiable by last name, and we already have permission to remove them.

Run both statements below: the SELECT first, to see which records currently exist (this file may have created new, unexpected records), and then the DELETE.

In [7]:
SELECT *
FROM dbo.Person
WHERE Last_Name IN ('aAa', 'bBb', 'cCc');

Person_ID,First_Name,Last_Name
13,xXx,aAa
14,yYy,bBb
15,zZz,cCc


In [8]:
DELETE FROM dbo.Person
WHERE Last_Name IN ('aAa', 'bBb', 'cCc');
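
Since a committed DELETE cannot be undone, one cautious option is to run the review and the delete inside a single transaction and check the row count before committing. This is a sketch rather than the team's documented procedure: it assumes no foreign key (for example, from `Good_Deed_History`) blocks the delete, and `Rows_Deleted` is an illustrative alias.

```sql
BEGIN TRANSACTION;

-- Review the rows that are about to be removed.
SELECT Person_ID, First_Name, Last_Name
FROM dbo.Person
WHERE Last_Name IN ('aAa', 'bBb', 'cCc');

DELETE FROM dbo.Person
WHERE Last_Name IN ('aAa', 'bBb', 'cCc');

-- @@ROWCOUNT reports the number of rows affected by the DELETE just above.
SELECT @@ROWCOUNT AS Rows_Deleted;

-- Rolls back by default. Replace with COMMIT TRANSACTION once the
-- row count matches the rows reviewed in the SELECT.
ROLLBACK TRANSACTION;
```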

## Good Deed Not Found
There is a set list of Good Deeds:
* Foiled Hydra
* Rescue animal
* Stop crime
* Help person
* Save city
* Defeat supervillan

First, confirm whether the good deed that was not found should exist.

In [9]:
WITH Error_Log_CTE AS (
	SELECT *, ROW_NUMBER() OVER(PARTITION BY ETL_Error_Log_ID ORDER BY (SELECT NULL)) AS CSV_ID
	FROM dbo.ETL_Error_Log
		CROSS APPLY STRING_SPLIT(CSV_Data, ',') AS Split_CSV_Data
	WHERE Error_Description = 'Good Deed not found'
)
SELECT value
FROM Error_Log_CTE
WHERE CSV_ID = 2;


value
Foiled Hydra
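
One caveat about the query above: `STRING_SPLIT` does not guarantee the order of the rows it returns, so numbering them with `ROW_NUMBER() OVER (... ORDER BY (SELECT NULL))` may not always match the field positions. On SQL Server 2022 or Azure SQL Database, the optional third argument `enable_ordinal` adds an `ordinal` column with each field's position. The sketch below assumes one of those platforms is in use; `Good_Deed_Value` is an illustrative alias.

```sql
-- Sketch: pick the second CSV field by its position.
-- Requires STRING_SPLIT's enable_ordinal argument (SQL Server 2022+ / Azure SQL).
SELECT Split_CSV_Data.value AS Good_Deed_Value
FROM dbo.ETL_Error_Log
	CROSS APPLY STRING_SPLIT(CSV_Data, ',', 1) AS Split_CSV_Data
WHERE Error_Description = 'Good Deed not found'
	AND Split_CSV_Data.ordinal = 2;
```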


Next, determine whether the Good_Deed_Type table is populated correctly.

In [10]:
SELECT Good_Deed_Type_Name
FROM dbo.Good_Deed_Type
ORDER BY Good_Deed_Type_Name;

Good_Deed_Type_Name
Defeat supervillan
Dominate world
Embarrass hero
HeLp PeRsOn
hElP pErSon
Help person
Rescue animal
Save city
"Steal from poor, give to self"
Stop crime
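
Comparing the table against the set list by eye is error-prone, especially with case variants such as `HeLp PeRsOn`. The following sketch checks in both directions, using the six names from the list at the top of this section. It forces a case-sensitive comparison with `COLLATE Latin1_General_CS_AS`, on the assumption that the database default collation may be case-insensitive; the `Expected` derived table is introduced here for illustration.

```sql
-- Sketch 1: rows in the table that are not an exact (case-sensitive)
-- match for any name on the set list.
SELECT Good_Deed_Type_ID, Good_Deed_Type_Name
FROM dbo.Good_Deed_Type
WHERE Good_Deed_Type_Name COLLATE Latin1_General_CS_AS NOT IN (
	'Foiled Hydra', 'Rescue animal', 'Stop crime',
	'Help person', 'Save city', 'Defeat supervillan'
)
ORDER BY Good_Deed_Type_Name;

-- Sketch 2: names on the set list that are missing from the table.
SELECT Expected.Good_Deed_Type_Name
FROM (VALUES
	('Foiled Hydra'), ('Rescue animal'), ('Stop crime'),
	('Help person'), ('Save city'), ('Defeat supervillan')
) AS Expected (Good_Deed_Type_Name)
WHERE NOT EXISTS (
	SELECT 1
	FROM dbo.Good_Deed_Type
	WHERE Good_Deed_Type.Good_Deed_Type_Name COLLATE Latin1_General_CS_AS
		= Expected.Good_Deed_Type_Name
);
```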
