In [0]:
-- Set anzdb as the default database so unqualified table names below resolve, then list its tables
USE anzdb;
SHOW TABLES IN anzdb;

### 1. Load the Silver Table and Basic Checks

In [0]:
select * from anzdb.transactions_silver;

In [0]:
select count(*) from anzdb.transactions_silver;

In [0]:
DESCRIBE TABLE anzdb.transactions_silver;

### 2. Analyzing Individual Columns

In [0]:
SELECT
    MIN(amount) AS min_amount,
    MAX(amount) AS max_amount,
    AVG(amount) AS avg_amount,
    STDDEV(amount) AS stddev_amount,
    PERCENTILE_APPROX(amount, 0.25) AS percentile_25_amount,
    PERCENTILE_APPROX(amount, 0.50) AS median_amount,
    PERCENTILE_APPROX(amount, 0.75) AS percentile_75_amount
FROM anzdb.transactions_silver;


In [0]:
SELECT
    MIN(age) AS min_age,
    MAX(age) AS max_age,
    AVG(age) AS avg_age,
    PERCENTILE_APPROX(age, 0.50) AS median_age
FROM anzdb.transactions_silver;

In [0]:
SELECT
    txn_description,
    COUNT(*) AS frequency
FROM transactions_silver
GROUP BY txn_description
ORDER BY frequency DESC
LIMIT 20;

In [0]:
SELECT
    merchant_state,
    COUNT(*) AS frequency
FROM transactions_silver
GROUP BY merchant_state
ORDER BY frequency DESC;

In [0]:
SELECT
    gender,
    COUNT(*) AS frequency
FROM transactions_silver
GROUP BY gender
ORDER BY frequency DESC;

In [0]:
SELECT
    status,
    COUNT(*) AS frequency
FROM transactions_silver
GROUP BY status
ORDER BY frequency DESC;

In [0]:
SELECT
    MIN(transaction_date) AS min_transaction_date,
    MAX(transaction_date) AS max_transaction_date
FROM transactions_silver;

In [0]:
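SELECT
    YEAR(transaction_date) AS transaction_year,
    MONTH(transaction_date) AS month_num,
    DATE_FORMAT(transaction_date, 'MMMM') AS transaction_month, -- 'MMMM' for full month name
    COUNT(*) AS num_transactions
FROM transactions_silver
GROUP BY YEAR(transaction_date), MONTH(transaction_date), DATE_FORMAT(transaction_date, 'MMMM')
ORDER BY transaction_year, month_num; -- chronological order, not alphabetical by month name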

In [0]:
SELECT
    DAYOFWEEK(transaction_date) AS day_of_week_num, -- Sunday=1, Saturday=7 
    DATE_FORMAT(transaction_date, 'E') AS day_of_week_name, -- e.g., Mon, Tue
    COUNT(*) AS num_transactions
FROM transactions_silver
GROUP BY DAYOFWEEK(transaction_date), DATE_FORMAT(transaction_date, 'E')
ORDER BY day_of_week_num;

### Bivariate/Multivariate Analysis (Relationships between columns)

In [0]:
SELECT
    gender,
    AVG(amount) AS avg_spend,
    SUM(amount) AS total_spend,
    COUNT(*) AS transaction_count
FROM transactions_silver
GROUP BY gender;

In [0]:
SELECT
    merchant_state,
    AVG(amount) AS avg_spend,
    SUM(amount) AS total_spend,
    COUNT(*) AS transaction_count
FROM transactions_silver
GROUP BY merchant_state
ORDER BY total_spend DESC
LIMIT 10;

In [0]:
SELECT
    CASE
        WHEN age < 20 THEN 'Under 20'
        WHEN age BETWEEN 20 AND 29 THEN '20-29'
        WHEN age BETWEEN 30 AND 39 THEN '30-39'
        WHEN age BETWEEN 40 AND 49 THEN '40-49'
        WHEN age BETWEEN 50 AND 59 THEN '50-59'
        WHEN age >= 60 THEN '60+'
        ELSE 'Unknown'
    END AS age_group,
    MIN(age) AS min_age, -- used to sort groups by age instead of alphabetically
    AVG(amount) AS avg_spend,
    SUM(amount) AS total_spend,
    COUNT(*) AS transaction_count
FROM transactions_silver
GROUP BY age_group
ORDER BY min_age NULLS LAST;

### Customer-Level Aggregations

In [0]:
SELECT
    customer_id,
    SUM(amount) AS total_spend_per_customer,
    COUNT(*) AS transaction_count_per_customer,
    AVG(amount) AS avg_transaction_amount_per_customer,
    MIN(transaction_date) AS first_txn_date,
    MAX(transaction_date) AS last_txn_date,
    -- Days between this customer's last transaction and the latest date in the dataset
    DATEDIFF((SELECT MAX(transaction_date) FROM transactions_silver), MAX(transaction_date)) AS recency_days
FROM transactions_silver
GROUP BY customer_id
ORDER BY total_spend_per_customer DESC
LIMIT 10;


In [0]:
SELECT
    customer_id,
    SUM(amount) AS total_spend_per_customer
FROM transactions_silver
GROUP BY customer_id
ORDER BY total_spend_per_customer DESC;

### Exploratory Data Analysis on Customer Profiles for Segmentation Thresholds

In [0]:
select * from anzdb.customer_master_profile_gold;

In [0]:
select min(total_spend) as min_spend, max(total_spend) as max_spend, avg(total_spend) as avg_spend from anzdb.customer_master_profile_gold;

In [0]:
select min(avg_transaction_amount) as min_avg_spend, max(avg_transaction_amount) as max_avg_spend,avg(avg_transaction_amount) as avg_avg_spend from anzdb.customer_master_profile_gold;


In [0]:
select min(transaction_count) as min_txn_cnt, max(transaction_count) as max_txn_cnt, avg(transaction_count) as avg_txn_cnt from anzdb.customer_master_profile_gold;

In [0]:
select max(last_transaction_date) as max_last_txn_date from anzdb.customer_master_profile_gold;

In [0]:
select min(customer_tenure_days) as min_tenure_days, max(customer_tenure_days) as max_tenure_days,avg(customer_tenure_days) as avg_tenure_days from anzdb.customer_master_profile_gold;

In [0]:
select sum(count_pos) as pos_txns, sum(count_payment_transfer) as payment_transfer_txns, sum(count_inter_bank_transfer) as inter_bank_transfer_txns, sum(count_phone_banking) as phone_banking_txns from anzdb.customer_master_profile_gold;

### Customer Profile Metrics

- **total_spend**: Ranges from ~925 to ~12,865. A few customers are above 10,000 and a few are below 1,000; many fall in the 3,000 to 9,000 range.
- **avg_transaction_amount**: Varies widely, from ~21 to ~183.
- **transaction_count**: From 22 to 564.
- **recency_days**: Mostly 2400 to 2408. These values were computed against CURRENT_DATE(), which falls years after this fixed historical period, so they say little about how recently a customer was active. For campaign targeting on this sample, measure recency relative to the latest last_transaction_date in the data and treat that date as "recent".
- **last_transaction_date**: The maximum in the sample is 2018-10-31.
- **customer_tenure_days**: Mostly around 85 to 91 days. Since tenure cannot exceed the span of the data, the observation window is only about 3 months, and frequency thresholds should be read against that short window.
- **age**: Ranges from 18 to 78. Good spread.
- **spend_pos**: Substantial for many customers, indicating retail/card activity.
- **spend_payment_transfer** & **spend_inter_bank_transfer**: Some customers have high values here.
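
The ranges above were read off a sample by eye. A minimal sketch for grounding them in the full table instead, reusing the PERCENTILE_APPROX pattern from the transaction-level cells and assuming the gold table carries the total_spend, transaction_count and avg_transaction_amount columns queried above:

```sql
-- Quartiles of the main segmentation metrics, to ground the thresholds below
SELECT
    PERCENTILE_APPROX(total_spend, 0.25)            AS spend_p25,
    PERCENTILE_APPROX(total_spend, 0.50)            AS spend_median,
    PERCENTILE_APPROX(total_spend, 0.75)            AS spend_p75,
    PERCENTILE_APPROX(transaction_count, 0.25)      AS txn_cnt_p25,
    PERCENTILE_APPROX(transaction_count, 0.50)      AS txn_cnt_median,
    PERCENTILE_APPROX(transaction_count, 0.75)      AS txn_cnt_p75,
    PERCENTILE_APPROX(avg_transaction_amount, 0.25) AS avg_amt_p25,
    PERCENTILE_APPROX(avg_transaction_amount, 0.50) AS avg_amt_median,
    PERCENTILE_APPROX(avg_transaction_amount, 0.75) AS avg_amt_p75
FROM anzdb.customer_master_profile_gold;
```

Quartile cut points adapt automatically when the full dataset replaces the sample, which fixed numbers do not.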

### Example Thresholds (Based on This Sample; Adjust After EDA on the Full Dataset)

**1. total_spend Thresholds:**
- Observation: By eye, totals above 9,000 are at the high end of this sample, around 5,000 to 6,000 is a mid-to-high point, and totals below 3,000 are at the low end.
- High Spenders: total_spend >= 9000
- Medium Spenders: total_spend >= 5000 AND total_spend < 9000
- Lower Spenders: total_spend < 5000
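
A sketch of applying these spend tiers to the gold table to check how many customers land in each; the 'Unknown' bucket for NULL totals is an addition for safety, not part of the rules above:

```sql
-- Bucket customers into the proposed spend tiers and inspect tier sizes
SELECT
    CASE
        WHEN total_spend >= 9000 THEN 'High'
        WHEN total_spend >= 5000 THEN 'Medium'
        WHEN total_spend < 5000  THEN 'Lower'
        ELSE 'Unknown'
    END AS spend_tier,
    COUNT(*) AS num_customers,
    MIN(total_spend) AS min_spend,
    MAX(total_spend) AS max_spend
FROM anzdb.customer_master_profile_gold
GROUP BY spend_tier
ORDER BY min_spend DESC;
```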

**2. transaction_count Thresholds (given ~90 day tenure):**
- High: >= 150 transactions (roughly 1.7 or more per day)
- Medium: 75 to 149 transactions
- Low: < 75 transactions
- Active (Frequency): transaction_count >= 75 (about 6 transactions per week on average)
- Very Active (Frequency): transaction_count >= 150
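
The per-day and per-week figures assume a 90-day window; for reference, 75 transactions over 90 days is about 5.8 per week. A sketch that computes each tier's actual weekly rate, assuming customer_tenure_days counts the days between a customer's first and last transaction and using the >= 150 Very Active cut-off for the top tier:

```sql
-- Frequency tiers on raw counts, with each tier's average weekly rate
SELECT
    CASE
        WHEN transaction_count >= 150 THEN 'High'
        WHEN transaction_count >= 75  THEN 'Medium'
        WHEN transaction_count < 75   THEN 'Low'
        ELSE 'Unknown'
    END AS frequency_tier,
    COUNT(*) AS num_customers,
    -- Weekly rate per customer; NULLIF avoids dividing by a zero-day tenure
    ROUND(AVG(transaction_count / NULLIF(customer_tenure_days, 0) * 7), 2) AS avg_txn_per_week
FROM anzdb.customer_master_profile_gold
GROUP BY frequency_tier
ORDER BY num_customers DESC;
```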

**3. avg_transaction_amount Thresholds:**
- High: avg_transaction_amount >= 75
- Medium: avg_transaction_amount >= 45 AND avg_transaction_amount < 75
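
A sketch that buckets customers by average ticket size and shows how many High spenders (total_spend >= 9000) sit in each bucket. The 'Low' tier (< 45) is an assumed complement of the two tiers listed above:

```sql
-- Average-ticket tiers, cross-checked against the High spend tier
SELECT
    CASE
        WHEN avg_transaction_amount >= 75 THEN 'High'
        WHEN avg_transaction_amount >= 45 THEN 'Medium'
        WHEN avg_transaction_amount < 45  THEN 'Low'    -- assumed complement
        ELSE 'Unknown'
    END AS avg_amount_tier,
    COUNT(*) AS num_customers,
    SUM(CASE WHEN total_spend >= 9000 THEN 1 ELSE 0 END) AS high_spenders
FROM anzdb.customer_master_profile_gold
GROUP BY avg_amount_tier
ORDER BY num_customers DESC;
```

A High spender in the Low ticket tier reaches the total through many small purchases, which matters for the Premium rule below.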

**4. recency_days (Relative to Max Date in Sample: 2018-10-31):**
- For this historical dataset, if we consider "recent" as transactions within the last few days of this period:
  - last_transaction_date >= '2018-10-28' (i.e., within 3 days of max date).
- For a real campaign, this would be recency_days < 30 or < 90 using current date.
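
Rather than hardcoding '2018-10-28', recency can be measured against the latest date actually present in the table. A sketch assuming the gold table has a customer_id column; the 3-day window mirrors the rule above:

```sql
-- Recency relative to the latest date in the data, not CURRENT_DATE()
WITH bounds AS (
    SELECT MAX(last_transaction_date) AS max_date
    FROM anzdb.customer_master_profile_gold
)
SELECT
    p.customer_id,
    p.last_transaction_date,
    DATEDIFF(b.max_date, p.last_transaction_date) AS sample_recency_days,
    -- TRUE when the last transaction is within 3 days of the dataset's max date
    DATEDIFF(b.max_date, p.last_transaction_date) <= 3 AS is_sample_recent_customer
FROM anzdb.customer_master_profile_gold AS p
CROSS JOIN bounds AS b
ORDER BY sample_recency_days;
```

With a max date of 2018-10-31, `<= 3` selects the same customers as `last_transaction_date >= '2018-10-28'`, and it keeps working when the data is refreshed.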

**5. age Group Thresholds:**
- Young: age <= 28
- Mid-Age: age > 28 AND age <= 50
- Senior: age > 50
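
A sketch of these age segments on the gold table, with average total spend per segment. Ordered CASE branches make `age <= 50` after `age <= 28` equivalent to the Mid-Age range `age > 28 AND age <= 50`:

```sql
-- Age segments with size and average total spend
SELECT
    CASE
        WHEN age <= 28 THEN 'Young'
        WHEN age <= 50 THEN 'Mid-Age'   -- only reached when age > 28
        WHEN age > 50  THEN 'Senior'
        ELSE 'Unknown'
    END AS age_segment,
    MIN(age) AS min_age,                -- used only for ordering
    COUNT(*) AS num_customers,
    ROUND(AVG(total_spend), 2) AS avg_total_spend
FROM anzdb.customer_master_profile_gold
GROUP BY age_segment
ORDER BY min_age NULLS LAST;
```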

**6. Spend Category Dominance (Example for spend_pos):**
- POS Dominant: spend_pos / total_spend >= 0.6 (60% of spend is POS)
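
The POS share is a ratio, so customers with a zero total_spend need guarding. A sketch using NULLIF so those rows yield NULL and are flagged as not dominant; customer_id is assumed to exist in the gold table:

```sql
-- Share of spend at point of sale and the POS-dominance flag
SELECT
    customer_id,
    spend_pos,
    total_spend,
    ROUND(spend_pos / NULLIF(total_spend, 0), 3) AS pos_share,
    -- NULL ratios (zero or missing spend) are treated as not dominant
    COALESCE(spend_pos / NULLIF(total_spend, 0) >= 0.6, false) AS is_pos_dominant
FROM anzdb.customer_master_profile_gold
ORDER BY pos_share DESC NULLS LAST;
```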

### Rule-Based Segmentation for Marketing Campaigns (Using Derived Thresholds)

**is_sample_recent_customer**: Because the recency_days values are very high, this flag identifies customers active toward the end of the dataset's ~3-month period (for example, last_transaction_date >= '2018-10-28'). In a real DLT pipeline targeting current customers, use the recency_days column directly with thresholds such as < 30 or < 90.

**Premium CC Candidate**:
- Targets the top spenders (>= 9000 from sample).
- Looks for higher average transaction amounts.
- Requires high frequency and recent activity.
- Targets a typical prime age range.
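
A hypothetical translation of the Premium rule into a filter. The mappings are assumptions: "higher average" as the High ticket tier (>= 75), "high frequency" as Very Active (>= 150), "recent" as the sample-relative cut-off of 2018-10-28, and "prime age" as the Mid-Age group. Stacking five filters on a small sample may return few or no rows, so loosen one at a time if needed:

```sql
-- Hypothetical Premium CC candidates; every cut-off is an assumption
SELECT
    customer_id,
    total_spend,
    avg_transaction_amount,
    transaction_count,
    last_transaction_date,
    age
FROM anzdb.customer_master_profile_gold
WHERE total_spend >= 9000                           -- High spender
  AND avg_transaction_amount >= 75                  -- High average ticket (assumed)
  AND transaction_count >= 150                      -- Very active (assumed)
  AND last_transaction_date >= DATE '2018-10-28'    -- Sample-relative recency
  AND age > 28 AND age <= 50                        -- "Prime age" taken as Mid-Age (assumed)
ORDER BY total_spend DESC;
```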

**Standard CC Candidate**:
- Targets medium spenders who are not premium candidates.
- Requires good activity levels and a significant portion of their spending at Point of Sale.
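
A hypothetical Standard rule. Because Premium requires total_spend >= 9000, restricting to the Medium band (5,000 to 9,000) already excludes Premium candidates on spend. Reading "good activity" as the Active cut-off (>= 75 transactions) is an assumption, as is the customer_id column:

```sql
-- Hypothetical Standard CC candidates built from the thresholds above
SELECT
    customer_id,
    total_spend,
    transaction_count,
    ROUND(spend_pos / NULLIF(total_spend, 0), 3) AS pos_share
FROM anzdb.customer_master_profile_gold
WHERE total_spend >= 5000 AND total_spend < 9000    -- Medium spender, never Premium on spend
  AND transaction_count >= 75                       -- Active (assumed reading of "good activity")
  AND spend_pos / NULLIF(total_spend, 0) >= 0.6     -- POS dominant
ORDER BY total_spend DESC;
```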

**Personal Loan Candidate**:
- Looks for established customers (good tenure for this sample).
- Good overall financial activity (total_spend).
- Evidence of handling larger sums (high max_transaction_amount or significant spend_payment_transfer / spend_inter_bank_transfer). This is a proxy for needing or being able to manage loan repayments.
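
The rule gives no number for "larger sums", so this hypothetical sketch defines it relative to the data: the top quartile of max_transaction_amount or of combined transfer spend. The tenure cut-off (>= 85 days) and the Medium-spend floor are likewise assumptions:

```sql
-- Hypothetical Personal Loan candidates; "large" = top quartile of the sample (assumed)
WITH cutoffs AS (
    SELECT
        PERCENTILE_APPROX(max_transaction_amount, 0.75) AS max_txn_p75,
        PERCENTILE_APPROX(
            COALESCE(spend_payment_transfer, 0) + COALESCE(spend_inter_bank_transfer, 0), 0.75
        ) AS transfer_p75
    FROM anzdb.customer_master_profile_gold
)
SELECT
    p.customer_id,
    p.customer_tenure_days,
    p.total_spend,
    p.max_transaction_amount,
    COALESCE(p.spend_payment_transfer, 0) + COALESCE(p.spend_inter_bank_transfer, 0) AS transfer_spend
FROM anzdb.customer_master_profile_gold AS p
CROSS JOIN cutoffs AS c
WHERE p.customer_tenure_days >= 85    -- "Established" within a ~3-month window (assumed)
  AND p.total_spend >= 5000           -- At least a Medium spender (assumed)
  AND (
        p.max_transaction_amount >= c.max_txn_p75
        OR COALESCE(p.spend_payment_transfer, 0) + COALESCE(p.spend_inter_bank_transfer, 0) >= c.transfer_p75
      )
ORDER BY p.total_spend DESC;
```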

**Potential Investor**:
- Targets the very highest spenders.
- Looks for consistently higher value transactions (median_transaction_amount).
- Significant proportion of funds being moved via transfers, which might indicate investment activity (this is a weaker proxy and needs business validation).
- Targets a slightly older, more established demographic.
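
A hypothetical Investor filter, again using data-relative cut-offs where the rule gives none: top ~10% of total_spend and top quartile of median_transaction_amount. The 0.3 transfer-share threshold and reading "older" as the Senior group (> 50) are illustrative assumptions that need business validation, as the rule itself notes:

```sql
-- Hypothetical Potential Investor candidates; cut-offs are assumptions
WITH cutoffs AS (
    SELECT
        PERCENTILE_APPROX(total_spend, 0.90)               AS spend_p90,
        PERCENTILE_APPROX(median_transaction_amount, 0.75) AS median_amt_p75
    FROM anzdb.customer_master_profile_gold
)
SELECT
    p.customer_id,
    p.age,
    p.total_spend,
    p.median_transaction_amount,
    ROUND((COALESCE(p.spend_payment_transfer, 0) + COALESCE(p.spend_inter_bank_transfer, 0))
          / NULLIF(p.total_spend, 0), 3) AS transfer_share
FROM anzdb.customer_master_profile_gold AS p
CROSS JOIN cutoffs AS c
WHERE p.total_spend >= c.spend_p90                    -- Top ~10% of spenders (assumed)
  AND p.median_transaction_amount >= c.median_amt_p75 -- Consistently larger tickets (assumed)
  AND (COALESCE(p.spend_payment_transfer, 0) + COALESCE(p.spend_inter_bank_transfer, 0))
      / NULLIF(p.total_spend, 0) >= 0.3               -- Transfer share (hypothetical)
  AND p.age > 50                                      -- "Older" taken as Senior (assumed)
ORDER BY p.total_spend DESC;
```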
