### Outlier Detection with IQR

The interquartile range (IQR) is a measure of statistical dispersion that is commonly used to detect outliers in a dataset. It is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data: IQR = Q3 - Q1. Under the widely used 1.5 * IQR rule, any data point that falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is flagged as an outlier.
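To make the rule concrete, here is a minimal Python sketch that applies the same fences to a small in-memory list. The `iqr_outliers` helper and the sample prices are hypothetical and are not taken from the `orders` table. `statistics.quantiles` with `method="inclusive"` is used because it interpolates linearly between data points, which should correspond to what `PERCENTILE_CONT` computes.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule.

    Hypothetical helper for illustration only.
    """
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Keep only the values that fall outside the two fences.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Made-up prices: one value is far larger than the rest.
prices = [10, 12, 13, 15, 16, 18, 20, 22, 250]
lower, upper, outliers = iqr_outliers(prices)
print(lower, upper, outliers)
```

Here Q1 is 13 and Q3 is 20, so the IQR is 7 and the fences sit at 2.5 and 30.5; only the value 250 lies outside them.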

Below, we will use the IQR method to detect outliers in the `o_totalprice` column of the `orders` table.

# Define the SQL query to detect outliers using the IQR method
OUTLIER_IQR_QUERY = """
WITH Quantiles AS (
  -- CTE to calculate the 25th percentile (Q1) and 75th percentile (Q3) of the o_totalprice column
  SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY o_totalprice) AS Q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY o_totalprice) AS Q3
  FROM orders
),

IQR_Calculation AS (
  -- CTE to calculate the Interquartile Range (IQR) as the difference between Q3 and Q1
  SELECT
    Q1,
    Q3,
    Q3 - Q1 AS IQR
  FROM Quantiles
)

-- Select orders where the o_totalprice is an outlier
SELECT
  o.o_orderkey,
  o.o_totalprice
FROM
  orders o
WHERE
  -- An outlier is defined as a value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
  o.o_totalprice < (SELECT Q1 - 1.5 * IQR FROM IQR_Calculation)
  OR o.o_totalprice > (SELECT Q3 + 1.5 * IQR FROM IQR_Calculation)
"""

df = pd.read_sql(OUTLIER_IQR_QUERY, connector)
df

### Outlier Detection with Z-Score

The Z-score measures how many standard deviations a data point lies from the mean: z = (x - mean) / stddev. A Z-score of 0 means that the data point is exactly at the mean, while a Z-score of 1 means that it is one standard deviation above the mean. A common rule of thumb flags any data point with a Z-score greater than 3 or less than -3 as an outlier.
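The same rule can be sketched in plain Python. The `zscore_outliers` helper and the sample prices are hypothetical, and the sketch assumes the sample standard deviation (`statistics.stdev`); whether SQL's `STDDEV` is the sample or population variant depends on the database engine.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold.

    Hypothetical helper for illustration only.
    """
    mean = statistics.mean(values)
    stddev = statistics.stdev(values)  # sample standard deviation (assumption)
    if stddev == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mean) / stddev > threshold]


# Made-up prices: 24 ordinary values plus one extreme value.
# A larger sample is used because with only a handful of points
# no Z-score can exceed 3.
prices = [10, 12, 13, 15, 16, 18, 20, 22] * 3 + [250]
print(zscore_outliers(prices))
```

With these values the mean is about 25.1 and the standard deviation about 47, so 250 has a Z-score of roughly 4.8 and is the only value flagged.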

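# Define the SQL query to detect outliers using the Z-score method
OUTLIER_ZSCORE_QUERY = """
WITH Stats AS (
  -- CTE to calculate the mean and standard deviation of the o_totalprice column
  -- (whether STDDEV is the sample or population variant depends on the database engine)
  SELECT
    AVG(o_totalprice) AS mean,
    STDDEV(o_totalprice) AS stddev
  FROM orders
)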
SELECT
  o_orderkey,
  o_totalprice
FROM
  orders
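WHERE
  -- An outlier is more than 3 standard deviations from the mean, i.e. |Z-score| > 3.
  -- Comparing against 3 * stddev avoids a division by zero when stddev is 0.
  ABS(o_totalprice - (SELECT mean FROM Stats)) > 3 * (SELECT stddev FROM Stats)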
"""

df = pd.read_sql(OUTLIER_ZSCORE_QUERY, connector)
df