### **Data Modeling**
#### **1. What is data modeling, and why is it important?**
**Sample Answer:**  
Data modeling is the process of defining and structuring data elements and their relationships to support business processes. It ensures data consistency, improves data integrity, and enhances query performance by creating optimized database schemas. Data models are essential for designing efficient databases that meet business needs.

#### **2. Can you explain the difference between conceptual, logical, and physical data models?**
**Sample Answer:**  
- **Conceptual Data Model:** High-level representation of data, focusing on business entities and relationships without technical details.  
- **Logical Data Model:** Defines attributes, primary and foreign keys, and relationships in more detail but remains independent of a specific database system.  
- **Physical Data Model:** Represents how the data will be stored in the database, including table structures, indexes, and constraints.

#### **3. What is normalization, and why is it important?**
**Sample Answer:**  
Normalization is the process of structuring a relational database to reduce data redundancy and improve data integrity. It involves dividing large tables into smaller ones and defining relationships between them. This avoids insertion, update, and deletion anomalies and keeps writes efficient, although heavily normalized schemas can require more joins at query time.

#### **4. What are the key differences between OLTP and OLAP databases?**
**Sample Answer:**  
- **OLTP (Online Transaction Processing):** Used for transactional systems, supports real-time operations, and ensures quick insert, update, and delete operations.  
- **OLAP (Online Analytical Processing):** Designed for analytical workloads, optimized for read-heavy queries, and supports complex aggregations for business intelligence and reporting.

---

### **ETL (Extract, Transform, Load)**
#### **5. What is ETL, and why is it necessary in data warehousing?**
**Sample Answer:**  
ETL stands for Extract, Transform, and Load. It is a process used to collect data from different sources (Extract), clean and structure it (Transform), and store it in a data warehouse (Load). ETL ensures data consistency, accuracy, and availability for reporting and analytics.

#### **6. Can you explain the difference between ETL and ELT?**
**Sample Answer:**  
- **ETL (Extract, Transform, Load):** Data is transformed before being loaded into the data warehouse. Suitable when data must be cleaned or restructured before it reaches the warehouse.  
- **ELT (Extract, Load, Transform):** Data is first loaded into the data warehouse and then transformed using SQL or other processing tools. Preferred for big data environments where the warehouse has the compute capacity to run transformations at scale.

#### **7. What are some common ETL challenges, and how do you overcome them?**
**Sample Answer:**  
- **Data Quality Issues:** Use data validation rules and logging mechanisms.  
- **Performance Bottlenecks:** Optimize queries, indexing, and partitioning.  
- **Handling Large Data Volumes:** Implement parallel processing and incremental data loading strategies.  
- **Error Handling & Monitoring:** Implement retry mechanisms and logging for failed jobs.
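
The retry-and-logging idea above can be sketched as follows. This is a minimal illustration, not a production scheduler: `run_with_retry`, `flaky_extract`, and the delay values are hypothetical names and numbers chosen for the example.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl")


def run_with_retry(task, max_attempts=3, base_delay=0.01):
    """Run task(); on failure, log the error and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # Give up and surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky step: fails twice, then succeeds
calls = {"count": 0}


def flaky_extract():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return ["row-1", "row-2"]


rows = run_with_retry(flaky_extract)
```

In an orchestrator such as Airflow, retries are usually configured on the task rather than hand-written, so a helper like this is mainly useful inside standalone scripts.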

#### **8. How would you design an ETL pipeline for processing customer transactions?**
**Sample Answer:**  
- **Extract:** Collect transaction data from multiple sources (databases, APIs, logs).  
- **Transform:** Clean the data, normalize formats, handle missing values, and perform calculations (e.g., total transaction amount).  
- **Load:** Store transformed data in a data warehouse for reporting and analytics.
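
The three stages above can be sketched in a few functions. This is an illustrative outline under stated assumptions: the sample records, the `fact_transactions` table name, and the policy of skipping rows with a missing price are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from databases, APIs, or logs
RAW_TRANSACTIONS = [
    {"customer_id": "c1", "quantity": "2", "unit_price": "19.99"},
    {"customer_id": "C2 ", "quantity": "1", "unit_price": None},  # missing price
    {"customer_id": "c1", "quantity": "3", "unit_price": "5.00"},
]


def extract():
    """Extract: return raw records (a real pipeline would query the sources)."""
    return list(RAW_TRANSACTIONS)


def transform(records):
    """Transform: normalize IDs, handle missing values, compute the total amount."""
    cleaned = []
    for rec in records:
        if rec["unit_price"] is None:
            continue  # One possible policy: skip rows that cannot be priced
        quantity = int(rec["quantity"])
        unit_price = float(rec["unit_price"])
        total = round(quantity * unit_price, 2)
        cleaned.append((rec["customer_id"].strip().lower(), quantity, unit_price, total))
    return cleaned


def load(rows, conn):
    """Load: write transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_transactions "
        "(customer_id TEXT, quantity INTEGER, unit_price REAL, total_amount REAL)"
    )
    conn.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # In-memory SQLite stands in for the warehouse
load(transform(extract()), conn)
loaded = conn.execute(
    "SELECT customer_id, total_amount FROM fact_transactions ORDER BY total_amount"
).fetchall()
```

Keeping each stage as a separate function makes it easier to test the transformation logic without touching the source systems or the warehouse.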

---

### **Python**
#### **9. How can Python be used in data processing and ETL workflows?**
**Sample Answer:**  
Python provides libraries like **Pandas, NumPy, PySpark, and SQLAlchemy** for data manipulation, transformation, and integration with databases. **Airflow and Luigi** are popular frameworks for managing ETL workflows.

#### **10. Write a Python script to read a CSV file and clean missing values.**
**Sample Answer:**
```python
import pandas as pd

# Read CSV file
df = pd.read_csv("data.csv")

# Fill missing values in numeric columns with each column's mean
df = df.fillna(df.mean(numeric_only=True))

# Save cleaned data
df.to_csv("cleaned_data.csv", index=False)

print("Data cleaned and saved successfully.")
```

#### **11. What is the difference between a list, tuple, and dictionary in Python?**
**Sample Answer:**  
- **List:** Mutable, ordered collection (e.g., `[1, 2, 3]`).  
- **Tuple:** Immutable, ordered collection (e.g., `(1, 2, 3)`).  
- **Dictionary:** Mutable collection of key-value pairs; preserves insertion order since Python 3.7 (e.g., `{"name": "Alice", "age": 30}`).
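
A short sketch of how the three types behave in practice; the variable names are illustrative only.

```python
# List: mutable, so items can be appended or reassigned in place
prices = [10, 20, 30]
prices.append(40)

# Tuple: immutable, so item assignment raises TypeError
point = (1, 2, 3)
try:
    point[0] = 99
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# Dictionary: maps keys to values; new keys can be added after creation
person = {"name": "Alice", "age": 30}
person["city"] = "Paris"
keys_in_order = list(person)
```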

#### **12. What are lambda functions in Python, and how are they useful?**
**Sample Answer:**  
Lambda functions are anonymous functions limited to a single expression, used for short, simple operations.  
Example:
```python
square = lambda x: x * x
print(square(5))  # Output: 25
```
They are commonly used in functions like `map()`, `filter()`, and `sorted()`.
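
For instance, a lambda can be passed inline to each of these built-ins. The sample data below is made up for illustration.

```python
numbers = [5, 2, 8, 3]

# map(): apply the lambda to every element
doubled = list(map(lambda x: x * 2, numbers))

# filter(): keep only the elements for which the lambda returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))

# sorted(): use the lambda as the sort key (here, sort (name, age) pairs by age)
people = [("Alice", 30), ("Bob", 25), ("Cara", 35)]
by_age = sorted(people, key=lambda person: person[1])
```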

---

### **SQL**
#### **13. What are the different types of joins in SQL?**
**Sample Answer:**  
- **INNER JOIN:** Returns matching records from both tables.  
- **LEFT JOIN:** Returns all records from the left table and matching records from the right.  
- **RIGHT JOIN:** Returns all records from the right table and matching records from the left.  
- **FULL JOIN:** Returns all records from both tables, with NULLs where there’s no match.
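
The difference between join types is easiest to see on a tiny dataset. The sketch below runs an INNER and a LEFT join through Python's built-in `sqlite3` module; the `customers` and `orders` tables are hypothetical, and RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (100, 1), (101, 3);
""")

# INNER JOIN: only Alice has a matching order
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.name
""").fetchall()

# LEFT JOIN: Bob is kept, with NULL (None in Python) for the missing order
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.name
""").fetchall()
```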

#### **14. Write an SQL query to find duplicate records in a table.**
**Sample Answer:**
```sql
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
```

#### **15. How do you optimize SQL queries for better performance?**
**Sample Answer:**  
- Use **indexes** on frequently queried columns.  
- Avoid **`SELECT *`**; specify only required columns.  
- Optimize **JOINs** by indexing keys properly.  
- Use **EXPLAIN** (or `EXPLAIN PLAN` in Oracle) to analyze query execution.  
- Implement **partitioning** for large tables.

#### **16. Write an SQL query to get the top 5 highest-paid employees from a table.**
**Sample Answer:**
```sql
SELECT employee_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
```

#### **17. What is a stored procedure, and when should you use it?**
**Sample Answer:**  
A stored procedure is a named block of SQL code stored in the database that can be executed repeatedly. It can improve performance by reducing network round trips, enhance security by restricting direct table access, and simplify database management by encapsulating logic. Stored procedures are used for repetitive tasks like data validation, batch processing, and ETL workflows.

---

### **Data Modeling**
#### **18. What is a star schema, and how does it compare to a snowflake schema?**
**Sample Answer:**  
A **star schema** is a database structure where a central fact table connects to multiple dimension tables, making queries simple and fast.  
A **snowflake schema** normalizes dimension tables by breaking them into sub-dimensions, reducing redundancy but increasing complexity.

**Example:**  
- **Star Schema:**  
  - `fact_sales (sales_id, product_id, customer_id, date, amount)`
  - `dim_product (product_id, name, category)`
  - `dim_customer (customer_id, name, location)`
  
- **Snowflake Schema:**  
  - `dim_product (product_id, name, category_id)`
  - `dim_category (category_id, category_name)`

Use **star schema** for better performance in OLAP and **snowflake schema** for normalized data storage.
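
As a rough illustration, the star schema above can be built and queried with Python's built-in `sqlite3` module. The sample rows are invented, and SQLite is only a stand-in for an analytical database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE fact_sales (
        sales_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        date TEXT,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_customer VALUES (1, 'Alice', 'Berlin');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, '2024-01-05', 1200.0),
        (2, 2, 1, '2024-01-06', 300.0),
        (3, 1, 1, '2024-02-01', 800.0);
""")

# A single join from the fact table to one dimension answers "revenue by category"
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

In the snowflake variant, the same question would need a second join from `dim_product` to `dim_category`, which is the extra complexity the comparison refers to.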

#### **19. How would you design a data model for tracking concert ticket sales?**
**Sample Answer:**  
**Entities:**
- **Customers** (customer_id, name, email, phone)
- **Concerts** (concert_id, artist, venue, date, time)
- **Tickets** (ticket_id, customer_id, concert_id, payment_id, price, seat_number)
- **Payments** (payment_id, customer_id, amount, date, method)

**Relationships:**  
- One **customer** can buy multiple **tickets**.  
- One **concert** can have multiple **tickets sold**.  
- One **ticket** is linked to a **payment transaction**.

Using a **star schema** would make querying total revenue, popular concerts, and sales trends easier.

#### **20. What is a surrogate key, and when should you use it?**
**Sample Answer:**  
A **surrogate key** is a system-generated unique identifier (e.g., an auto-incremented ID) used as the primary key in place of a natural key.  
Use it when:
- Natural keys are long or subject to change.
- Relationships involve complex composite keys.
- Joins benefit from compact, stable keys, as in data warehouse dimension tables.

---

### **ETL (Extract, Transform, Load)**
#### **21. How would you design an ETL pipeline for real-time data ingestion?**
**Sample Answer:**  
A **real-time ETL pipeline** consists of:
1. **Extract:** Use **Kafka, AWS Kinesis, or RabbitMQ** to stream data from sources.
2. **Transform:** Process data using **Apache Flink, Spark Streaming, or Python scripts**.
3. **Load:** Store transformed data in **a NoSQL DB (MongoDB) for quick access** and **a data warehouse (Snowflake, Redshift) for analytics**.

For example, in **a concert ticketing system**, real-time ETL supports live ticket availability, fraud detection, and dynamic pricing.

#### **22. What strategies would you use to handle slowly changing dimensions (SCDs)?**
**Sample Answer:**  
**SCDs** track changes in dimension tables over time. Common types:  
- **SCD Type 1:** Overwrites old data (e.g., updating customer email).  
- **SCD Type 2:** Creates a new record with a version or timestamp (e.g., tracking a customer’s address history).  
- **SCD Type 3:** Adds a new column for historical values (e.g., storing the previous and current region of a user).
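
SCD Type 2 is the one most often asked about in detail, so here is a minimal in-memory sketch of it. The column names `valid_from`, `valid_to`, and `is_current` are a common convention but are assumptions here, as is the `apply_scd2` helper itself.

```python
from datetime import date


def apply_scd2(history, customer_id, new_address, change_date):
    """Close the customer's current row (if the address changed) and append a new one."""
    for row in history:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["address"] == new_address:
                return history  # Nothing changed, so keep the current version
            row["valid_to"] = change_date
            row["is_current"] = False
    history.append({
        "customer_id": customer_id,
        "address": new_address,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return history


# Hypothetical dimension table with one current row for customer 1
dim_customer = [{
    "customer_id": 1, "address": "12 Oak St",
    "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True,
}]
apply_scd2(dim_customer, 1, "99 Pine Ave", date(2024, 6, 1))
```

After the call, the table holds both the old address (closed on the change date) and the new one (marked current), which is what lets reports reconstruct the customer's address at any point in time.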

#### **23. How would you optimize an ETL process that is running too slowly?**
**Sample Answer:**  
- **Optimize extraction:** Pull only the required columns and rows, and filter at the source.
- **Optimize transformation:** Apply partitioning and parallel processing.
- **Optimize loading:** Use bulk loading instead of row-by-row inserts, batch the writes, and consider rebuilding indexes after large loads.
- **Implement incremental loading:** Process only new/updated records instead of full loads.
- **Use caching mechanisms:** Store frequently accessed data in memory.
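
Incremental loading is commonly implemented with a "high-water mark": the pipeline remembers the latest change timestamp it has processed and only pulls rows newer than that. The sketch below assumes the source exposes an `updated_at` column; the data and function name are hypothetical.

```python
from datetime import datetime

# Hypothetical source rows; updated_at is the column used as the watermark
SOURCE_ROWS = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 9, 0)},
]


def incremental_extract(rows, last_watermark):
    """Return rows changed after last_watermark and the new watermark value."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


# First run picks up rows 2 and 3; an immediate second run finds nothing new
batch, watermark = incremental_extract(SOURCE_ROWS, datetime(2024, 1, 1, 12, 0))
second_batch, _ = incremental_extract(SOURCE_ROWS, watermark)
```

In a real pipeline the watermark would be persisted between runs (for example in a control table) so that a restart does not trigger a full reload.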

---

### **Python**
#### **24. How would you use Python to automate an ETL pipeline?**
**Sample Answer:**  
Use **Pandas for data transformation**, **SQLAlchemy for database connections**, and **Airflow for orchestration**.

Example Python script:
```python
import pandas as pd
from sqlalchemy import create_engine

# Extract
df = pd.read_csv("raw_data.csv")

# Transform
df["amount"] = df["amount"].fillna(0)  # Handle missing values

# Load
engine = create_engine("mysql://user:password@localhost/db")
df.to_sql("cleaned_data", con=engine, if_exists="replace", index=False)

print("ETL pipeline completed.")
```
---

#### **25. How would you handle large datasets efficiently in Python?**
**Sample Answer:**  
- Use **Dask** or **PySpark** instead of Pandas for distributed processing.
- Use **iterators** and **generators** instead of loading entire datasets into memory.
- Optimize SQL queries and avoid unnecessary computations.

Example using Dask:
```python
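import dask.dataframe as dd

# Read lazily in partitions; "amount" is an assumed numeric column
df = dd.read_csv("large_data.csv")
result = df.groupby("category")["amount"].mean().compute()  # Runs only on .compute()
print(result)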
```
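
The generator approach mentioned above needs no extra libraries. In this sketch, `read_in_chunks` is a hypothetical helper and a small in-memory CSV stands in for a large file; only one chunk of rows is held in memory at a time.

```python
import csv
import io


def read_in_chunks(file_obj, chunk_size):
    """Yield lists of up to chunk_size rows so only one chunk is in memory at a time."""
    reader = csv.DictReader(file_obj)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # Final partial chunk


# A small in-memory CSV stands in for a large file opened with open(...)
sample = io.StringIO("category,amount\na,10\nb,20\na,30\n")
totals = {}
for chunk in read_in_chunks(sample, chunk_size=2):
    for row in chunk:
        totals[row["category"]] = totals.get(row["category"], 0) + float(row["amount"])
```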
---

#### **26. How would you detect and handle duplicate records in Python?**
**Sample Answer:**  
Using Pandas:
```python
df = df.drop_duplicates()
```
If keeping the latest record:
```python
df = df.sort_values("date").drop_duplicates(subset=["customer_id"], keep="last")
```
---

### **SQL**
#### **27. Write an SQL query to calculate the total sales per concert.**
**Sample Answer:**
```sql
SELECT concert_id, SUM(price) AS total_sales
FROM tickets
GROUP BY concert_id;
```
---

#### **28. How do you retrieve the second highest salary from an employee table?**
**Sample Answer:**
```sql
SELECT DISTINCT salary 
FROM employees 
ORDER BY salary DESC 
LIMIT 1 OFFSET 1;
```
---

#### **29. How would you troubleshoot slow SQL queries?**
**Sample Answer:**  
- Use `EXPLAIN ANALYZE` to check query execution plans.
- Create indexes on columns used in WHERE, JOIN, and GROUP BY.
- Optimize joins by reducing the number of scanned rows.
- Use `LIMIT` to test queries on smaller datasets.
- Avoid `SELECT *` and retrieve only required columns.

---

#### **30. Write an SQL query to find customers who have purchased tickets for multiple concerts.**
**Sample Answer:**
```sql
SELECT customer_id
FROM tickets
GROUP BY customer_id
HAVING COUNT(DISTINCT concert_id) > 1;
```
---

#### **31. How would you join three tables in SQL?**
**Sample Answer:**  
Example of joining `customers`, `tickets`, and `concerts`:
```sql
SELECT c.name, co.artist, t.price
FROM customers c
JOIN tickets t ON c.customer_id = t.customer_id
JOIN concerts co ON t.concert_id = co.concert_id;
```
---

### **Scenario-Based Questions**
#### **32. A manager wants to track concert ticket sales trends. What KPIs and reports would you suggest?**
**Sample Answer:**  
KPIs:
- Total revenue per concert.
- Ticket sales trends over time.
- Most popular concerts based on demand.
- Customer demographics and purchase behavior.

Reports:
- Daily/weekly sales performance.
- Customer retention and repeat purchase trends.
- Heatmaps showing ticket purchases by location.

---

#### **33. How would you detect fraudulent transactions in a ticketing system?**
**Sample Answer:**  
- Identify **suspiciously high ticket purchases** from a single user.
- Track **multiple purchases from the same IP address**.
- Flag transactions with **mismatched location and payment details**.
- Use **machine learning models** to detect anomalies.

SQL query to flag customers with more than three payments over 1,000:
```sql
SELECT customer_id, COUNT(*)
FROM payments
WHERE amount > 1000
GROUP BY customer_id
HAVING COUNT(*) > 3;
```
---

#### **34. How would you design a database for loyalty rewards in the concert system?**
**Sample Answer:**  
**Tables:**
- `customers (customer_id, name, email, points)`
- `transactions (transaction_id, customer_id, amount, points_earned)`
- `rewards (reward_id, description, points_required)`

**Logic:**
- Earn points for each ticket purchase.
- Redeem points for discounts or VIP access.

---

### **Hadoop Basics**
#### **1. What is Hadoop, and why is it used?**  
**Sample Answer:**  
Hadoop is an **open-source framework** for processing and storing large datasets in a **distributed computing environment**. It enables **scalable, fault-tolerant, and parallel processing** using commodity hardware. Hadoop is mainly used for **big data analytics, batch processing, and unstructured data storage**.

---

#### **2. What are the core components of the Hadoop ecosystem?**  
**Sample Answer:**  
Hadoop's core modules are HDFS, MapReduce, and YARN; the wider ecosystem adds tools such as HBase, Hive, Pig, Sqoop, and Flume:  
- **HDFS (Hadoop Distributed File System):** A distributed file storage system.  
- **MapReduce:** A parallel processing framework for large-scale data processing.  
- **YARN (Yet Another Resource Negotiator):** Manages cluster resources and job scheduling.  
- **HBase:** A NoSQL database for real-time read/write operations.  
- **Hive:** A SQL-like querying tool for Hadoop.  
- **Pig:** A high-level scripting language for data processing.  
- **Sqoop:** A tool for transferring data between Hadoop and relational databases.  
- **Flume:** A tool for ingesting log data into Hadoop.

---

### **HDFS (Hadoop Distributed File System)**
#### **3. What are the key features of HDFS?**  
**Sample Answer:**  
- **Distributed Storage:** Data is split into blocks and stored across multiple nodes.  
- **Fault Tolerance:** Data is replicated across nodes to prevent data loss.  
- **Scalability:** Can handle petabytes of data across thousands of machines.  
- **Write-Once, Read-Many:** Data can be written once and read multiple times.  
- **Data Balancing:** Block placement is automatic, and the HDFS balancer tool can redistribute blocks when nodes become unevenly loaded.

---

#### **4. How does Hadoop handle node failures?**  
**Sample Answer:**  
Hadoop handles node failures using **data replication**. When a node fails:  
- **The NameNode detects failed DataNodes through missed heartbeats.**  
- **HDFS re-replicates the affected blocks from surviving copies on other nodes.**  
- **YARN reschedules the failed tasks on available nodes.**  
- **The replication factor ensures multiple copies of each block exist.**  

Example: If a block is stored with a **replication factor of 3**, it exists on three different nodes. If one node fails, the data is still available from the other two.

---

### **MapReduce**
#### **5. What is MapReduce, and how does it work?**  
**Sample Answer:**  
MapReduce is a **programming model** for processing large datasets in a **distributed and parallel** manner.  
It consists of two phases, with a shuffle-and-sort step between them that groups values by key:  
1. **Map Phase:** Processes input data and converts it into key-value pairs.  
2. **Reduce Phase:** Aggregates and processes key-value pairs to generate final results.  

Example: Word count using MapReduce  
- **Map:** Read text and output (word, 1) pairs.  
- **Reduce:** Sum up values for each word.  
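
The word-count flow can be imitated in plain Python to show what each phase contributes. This is a single-process sketch of the programming model, not Hadoop code; the function names and sample lines are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts for one word."""
    return word, sum(counts)


lines = ["the quick brown fox", "the lazy dog", "The fox"]
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
```

On a cluster, the map calls run in parallel on different input splits and the reduce calls run in parallel on different keys; the grouping step is handled by the framework.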

---

#### **6. What are the advantages and limitations of MapReduce?**  
**Sample Answer:**  
✅ **Advantages:**  
- Handles **large-scale data processing**.  
- Provides **automatic parallelization** and **fault tolerance**.  
- Works efficiently with **structured and unstructured data**.  

❌ **Limitations:**  
- **Batch-oriented:** Not suitable for real-time processing.  
- **I/O heavy:** Requires reading/writing from disk frequently.  
- **Complex coding:** Requires Java or Python knowledge for implementation.  

---

### **YARN (Yet Another Resource Negotiator)**
#### **7. What is YARN, and why is it important in Hadoop?**  
**Sample Answer:**  
YARN is the **resource management layer** in Hadoop. It allows multiple applications (MapReduce, Spark, etc.) to run on the same cluster by allocating CPU and memory to them in containers.  

**YARN components:**  
- **ResourceManager:** Allocates resources across applications.  
- **NodeManager:** Launches containers and monitors resource usage on each node.  
- **ApplicationMaster:** Negotiates resources for a single application and manages its execution.  

YARN improves **scalability, efficiency, and multi-tenancy** in Hadoop.

---

### **Hadoop vs Other Big Data Technologies**
#### **8. How does Hadoop compare to Spark?**  
**Sample Answer:**  
| Feature | Hadoop (MapReduce) | Apache Spark |
|---------|-------------------|-------------|
| **Processing** | Batch processing | Batch and near-real-time (micro-batch streaming) |
| **Speed** | Disk-based, slower | In-memory, often much faster |
| **Ease of Use** | More verbose (typically Java) | Easier (supports Python, Scala, Java, R) |
| **Fault Tolerance** | Uses data replication and task re-execution | Uses RDD lineage |
| **Best Use Cases** | Large-scale batch jobs | Low-latency analytics & ML |

**Example:** If you need **near-real-time fraud detection**, use **Spark**. If you need to **process petabytes of log data in batch**, **Hadoop MapReduce** is a workable choice.

---

### **Hadoop Data Ingestion**
#### **9. What is the difference between Sqoop and Flume?**  
**Sample Answer:**  
- **Sqoop:** Used for **batch data transfer** between Hadoop and relational databases (MySQL, PostgreSQL).  
- **Flume:** Used for **real-time streaming** of log data into Hadoop (e.g., web server logs).  

**Example Use Case:**  
- Use **Sqoop** to import historical sales data from MySQL to Hadoop.  
- Use **Flume** to stream real-time website click logs into HDFS.

---

#### **10. How would you ingest large datasets into Hadoop?**  
**Sample Answer:**  
1. **Batch Processing:** Use **Sqoop** for structured data imports from RDBMS.  
2. **Streaming Data:** Use **Flume or Kafka** for log and real-time event data.  
3. **File Transfer:** Use **HDFS commands** or **DistCp** for copying large files.  
4. **Custom ETL Pipelines:** Use **Python, Apache NiFi, or Airflow** for automating ingestion.

---

### **Hadoop Performance Optimization**
#### **11. How would you optimize Hadoop job performance?**  
**Sample Answer:**  
- **Increase parallelism:** Use more nodes and increase mappers/reducers.  
- **Use efficient file formats and compression:** Store files in columnar formats such as **Parquet** or **ORC**, which compress well and reduce I/O.  
- **Tune block size:** Increase HDFS block size for large files (default: 128MB).  
- **Use combiners:** Reduce data shuffling in MapReduce.  
- **Partition data:** Distribute data based on keys for better load balancing.  

Example: Lowering the replication factor from the default of **3 to 2** saves storage space and write traffic, at the cost of reduced fault tolerance.

---

#### **12. How does Hadoop handle small files efficiently?**  
**Sample Answer:**  
Hadoop is optimized for **large files**, but small files cause **high NameNode memory usage**. To handle small files:  
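- Use **HAR (Hadoop Archives)** to pack small files into larger archive files.  
- Use **SequenceFiles** to store multiple files as key-value pairs.  
- Use **HBase** for frequent small file updates.  
- **Merge small files** before or after ingestion (e.g., with a compaction job or `CombineFileInputFormat`), since a larger block size alone does not reduce the number of files the NameNode must track.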

---

### **Hadoop Security**
#### **13. How does Hadoop ensure security?**  
**Sample Answer:**  
Hadoop provides security through:  
- **Kerberos authentication:** Ensures secure user access.  
- **HDFS permissions & ACLs:** Restrict file access.  
- **Encryption:** Protects data at rest and in transit.  
- **Apache Ranger (and, in older deployments, Apache Sentry):** Provide fine-grained access control.  

Example: In **a financial application**, you can use **Kerberos authentication** to prevent unauthorized access to transaction data.

---

### **Hadoop Use Cases**
#### **14. What are some real-world use cases of Hadoop?**  
**Sample Answer:**  
- **Banking & Fraud Detection:** Analyzing transaction data for fraud patterns.  
- **E-commerce Recommendations:** Processing customer behavior for product recommendations.  
- **Healthcare Analytics:** Storing and analyzing large-scale patient records.  
- **Log Analysis:** Processing server logs for performance monitoring.  

Example: **Netflix** uses Hadoop to **analyze streaming patterns and optimize recommendations**.

---

### Banking

### 1. Can you explain the key performance indicators (KPIs) you would monitor for a credit card portfolio?

**Sample Answer:**

Monitoring the performance of a credit card portfolio involves tracking several KPIs, including:

- **Delinquency Rate:** Measures the percentage of accounts past due, indicating potential credit risk.

- **Net Charge-Off Rate:** Charged-off balances minus recoveries, expressed as a percentage of average outstanding balances; reflects the credit quality of the portfolio.

- **Activation Rate:** The proportion of issued cards that are actively used, assessing customer engagement.

- **Average Spend per Account:** Evaluates card usage and helps in understanding customer spending behaviors.

- **Customer Retention Rate:** Indicates the percentage of customers who continue to use the card over time, essential for long-term profitability.

Regularly analyzing these KPIs helps in making informed decisions to enhance portfolio performance.
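
Several of these KPIs are simple ratios, which can be sketched as below. All figures are invented for illustration, and the exact definitions (for example, which accounts count as past due) vary by institution.

```python
# Hypothetical portfolio snapshot; all figures are made up for illustration
portfolio = {
    "total_accounts": 10_000,
    "accounts_past_due": 350,
    "cards_issued": 12_000,
    "cards_activated": 9_000,
    "gross_charge_offs": 500_000.0,
    "recoveries": 50_000.0,
    "average_outstanding_balance": 15_000_000.0,
}


def pct(numerator, denominator):
    """Return numerator / denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


delinquency_rate = pct(portfolio["accounts_past_due"], portfolio["total_accounts"])
activation_rate = pct(portfolio["cards_activated"], portfolio["cards_issued"])
net_charge_off_rate = pct(
    portfolio["gross_charge_offs"] - portfolio["recoveries"],
    portfolio["average_outstanding_balance"],
)
```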

### 2. How would you approach analyzing the profitability of a new credit card product?

**Sample Answer:**

Analyzing the profitability of a new credit card product involves several steps:

1. **Market Research:** Assess the target market to estimate potential customer acquisition and usage rates.

2. **Revenue Streams:** Identify all possible income sources, such as interest income, annual fees, late fees, and interchange fees.

3. **Cost Analysis:** Calculate costs, including customer acquisition expenses, operational costs, rewards programs, and potential credit losses.

4. **Break-Even Analysis:** Determine the point at which revenues equal costs to understand the viability of the product.

5. **Sensitivity Analysis:** Evaluate how changes in key assumptions (e.g., default rates, spending behaviors) impact profitability.

This comprehensive approach ensures a thorough understanding of the financial prospects of the new product.
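
The break-even and sensitivity steps can be illustrated with a small per-account calculation. The dollar amounts and the `months_to_break_even` helper are hypothetical; a real model would also account for discounting, attrition, and seasonality.

```python
def months_to_break_even(acquisition_cost, monthly_revenue, monthly_cost):
    """Months until cumulative margin covers the upfront cost (None if never)."""
    monthly_margin = monthly_revenue - monthly_cost
    if monthly_margin <= 0:
        return None
    # Ceiling division: the account breaks even during this month
    return -(-acquisition_cost // monthly_margin)


# Hypothetical per-account figures in dollars
base_case = months_to_break_even(acquisition_cost=200, monthly_revenue=30, monthly_cost=10)

# Sensitivity: higher credit losses raise the monthly cost and delay break-even
stressed_case = months_to_break_even(acquisition_cost=200, monthly_revenue=30, monthly_cost=20)

# If costs match or exceed revenue, the account never breaks even
never = months_to_break_even(acquisition_cost=200, monthly_revenue=30, monthly_cost=30)
```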

### 3. Describe a time when you used data analysis to identify and solve a problem related to credit card usage.

**Sample Answer:**

In my previous role, I noticed a decline in the average transaction value among our credit card users. By analyzing transaction data, I identified that a significant number of customers were using their cards primarily for small, everyday purchases.

To address this, I collaborated with the marketing team to develop a campaign promoting the benefits of using the card for larger purchases, such as cashback offers and extended warranties. Post-campaign, we observed a 15% increase in the average transaction value over the next quarter.

### 4. How do you assess the creditworthiness of potential credit card applicants?

**Sample Answer:**

Assessing the creditworthiness of applicants involves evaluating several factors:

- **Credit Score:** Provides a snapshot of the applicant's credit history and reliability.

- **Income and Employment History:** Ensures the applicant has a stable income to meet repayment obligations.

- **Debt-to-Income Ratio:** Calculates existing debt relative to income to assess additional debt capacity.

- **Payment History:** Reviews past payment behaviors to predict future reliability.

- **Existing Credit Lines:** Examines current credit accounts to evaluate credit management skills.

By analyzing these factors, we can make informed decisions on extending credit while mitigating risk.

### 5. What strategies would you implement to reduce credit card fraud?

**Sample Answer:**

Reducing credit card fraud requires a multifaceted approach:

- **Advanced Fraud Detection Systems:** Implement machine learning algorithms to detect unusual spending patterns in real-time.

- **Two-Factor Authentication (2FA):** Require additional verification for online transactions to enhance security.

- **Regular Account Monitoring:** Proactively monitor accounts for suspicious activities and alert customers promptly.

- **Customer Education:** Inform customers about safe practices, such as recognizing phishing attempts and safeguarding personal information.

- **Collaboration with Law Enforcement:** Work closely with authorities to track and prevent fraudulent activities.

Implementing these strategies can significantly reduce the incidence of fraud and protect both the company and its customers.

### 6. How do you stay updated with regulatory changes affecting the credit card industry?

**Sample Answer:**

Staying informed about regulatory changes is crucial. I regularly:

- **Subscribe to Industry Newsletters:** Receive updates from reputable financial news sources and regulatory bodies.

- **Attend Workshops and Seminars:** Participate in events focused on financial regulations and compliance.

- **Engage in Professional Networks:** Connect with peers and industry experts to discuss and share insights on emerging regulations.

- **Continuous Education:** Enroll in courses and certifications related to financial regulations and compliance.

This proactive approach ensures I remain knowledgeable and can adapt to any regulatory changes effectively.

### 7. Can you discuss a successful strategy you implemented to increase credit card customer retention?

**Sample Answer:**

In a previous position, we faced challenges with customer attrition. To combat this, I conducted a thorough analysis to identify common reasons for account closures.

Based on the findings, we introduced a personalized rewards program that offered tailored benefits aligning with individual spending habits. Additionally, we enhanced our customer service by providing dedicated support channels for premium customers.

As a result, customer retention rates improved by 20% over the following year, and customer satisfaction scores saw a significant uptick.

These questions and answers are designed to help you prepare effectively for a Credit Card Business Analyst interview, showcasing your expertise and experience in the field.

## **Credit Card Business & Market Analysis**  

### **1. What factors influence credit card customer acquisition and retention?**  
**Sample Answer:**  
Several factors impact both acquiring and retaining credit card customers:  

**For Customer Acquisition:**  
- **Attractive Offers & Rewards:** Cashback, travel points, and introductory 0% APR promotions.  
- **Competitive Interest Rates & Fees:** Lower fees and competitive APRs attract more applicants.  
- **Marketing & Partnerships:** Collaborations with retailers, airlines, and financial institutions.  
- **Approval Criteria & Ease of Application:** A seamless and fast approval process.  

**For Customer Retention:**  
- **Personalized Rewards & Benefits:** Customizing rewards based on spending behavior.  
- **Customer Support & Engagement:** 24/7 customer service and quick issue resolution.  
- **Loyalty Programs & Incentives:** Retention bonuses for long-term cardholders.  
- **Fraud Protection & Security:** Customers feel safer using a secure and reliable card.  

By understanding these factors, we can refine acquisition strategies and boost retention.  

---

### **2. How would you segment credit card customers to improve marketing efforts?**  
**Sample Answer:**  
Customer segmentation is crucial for **targeted marketing**. I would segment customers based on:  

1. **Demographics:** Age, income level, location, and profession.  
2. **Spending Behavior:** High spenders vs. low spenders, frequent travelers, online shoppers.  
3. **Credit Profile:** Prime, near-prime, and subprime customers based on credit scores.  
4. **Product Preference:** Cashback, travel rewards, business, or student credit cards.  
5. **Engagement Level:** Active users vs. dormant accounts.  

Example: If data shows that young professionals in urban areas spend heavily on dining and entertainment, we can tailor promotions and cashback rewards for those categories.  

---

### **3. How would you assess the impact of a new credit card fee structure?**  
**Sample Answer:**  
To evaluate a new **fee structure's impact**, I would:  

1. **Conduct Historical Analysis:** Compare similar changes in the past to predict customer response.  
2. **Segment Customers:** Identify which groups are most sensitive to fees.  
3. **Perform A/B Testing:** Implement the new fee for a small sample group and measure responses.  
4. **Analyze Customer Complaints & Churn Rate:** Check for increased complaints or card cancellations.  
5. **Monitor Revenue Changes:** Track short-term and long-term financial impact.  

Example: If we increase the **annual fee from $95 to $120**, we might lose some price-sensitive customers but gain more revenue from premium users who value benefits.  
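
The trade-off in this example can be quantified with a quick calculation. The sketch below assumes a hypothetical base of 100,000 cardholders and looks only at annual fee revenue; it ignores lost interest and interchange income from customers who leave.

```python
def fee_revenue(customers, annual_fee, churn_rate=0.0):
    """Annual fee revenue after a fraction of customers (churn_rate) cancels."""
    return customers * (1 - churn_rate) * annual_fee


def break_even_churn(old_fee, new_fee):
    """Churn rate at which revenue under the new fee equals revenue under the old fee."""
    return 1 - old_fee / new_fee


current = fee_revenue(100_000, 95)                    # Revenue at the $95 fee
after_10pct_churn = fee_revenue(100_000, 120, 0.10)   # Revenue at $120 if 10% cancel
threshold = break_even_churn(95, 120)                 # Churn that wipes out the gain
```

Under these assumptions, the fee increase raises fee revenue as long as fewer than roughly 21% of cardholders cancel.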

---

## **Risk & Fraud Analysis**  

### **4. What methods would you use to detect fraudulent credit card transactions?**  
**Sample Answer:**  
To detect fraud, I would use a combination of **rule-based** and **machine learning** techniques, such as:  

- **Real-time anomaly detection:** Flagging transactions that deviate from normal patterns (e.g., large foreign transactions).  
- **Velocity checks:** Identifying multiple high-value transactions in a short time frame.  
- **Geolocation analysis:** Checking if the transaction location is inconsistent with previous activity.  
- **Behavioral analytics:** Using AI to predict fraud based on user behavior.  
- **Blacklisting & Whitelisting:** Blocking known fraudulent merchants or users while allowing trusted transactions.  

**Example:** A customer who typically spends $50-$100 per transaction suddenly makes a $5,000 purchase in another country. This would trigger an alert for manual review or automatic blocking.  
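
A rule-based version of this check can be sketched in a few lines. The z-score threshold, the `is_suspicious` helper, and the sample history are assumptions made for illustration; production systems combine many more signals.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, home_country, country, z_threshold=3.0):
    """Return the reasons a transaction looks unusual (empty list if none)."""
    avg, spread = mean(history), stdev(history)
    z_score = (amount - avg) / spread if spread else 0.0
    reasons = []
    if z_score > z_threshold:
        reasons.append("amount_outlier")      # Far above the customer's usual spend
    if country != home_country:
        reasons.append("foreign_location")    # Outside the customer's home country
    return reasons


# Hypothetical customer who usually spends $50-$100 per transaction
past_amounts = [55, 60, 75, 80, 95, 70, 65]
alert = is_suspicious(past_amounts, 5000, home_country="US", country="FR")
normal = is_suspicious(past_amounts, 85, home_country="US", country="US")
```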

---

### **5. How do you assess credit risk when approving credit card applications?**  
**Sample Answer:**  
Credit risk assessment includes:  

1. **Credit Score Review:** Checking FICO scores as a summary of past repayment behavior.  
2. **Debt-to-Income Ratio (DTI):** Ensuring the applicant’s existing debt is manageable.  
3. **Employment & Income Verification:** Evaluating financial stability.  
4. **Payment History:** Identifying past delinquencies or bankruptcies.  
5. **Behavioral Analytics:** Using predictive models to estimate future repayment likelihood.  

For example, an applicant with **a high income but a history of late payments** might still be a high-risk candidate despite financial stability.  
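
The DTI check and the late-payment example can be expressed as a small screening function. The thresholds (`max_dti=0.40`, `max_late_payments=1`) are placeholders rather than industry standards, and the function names are hypothetical.

```python
def debt_to_income(monthly_debt_payments, gross_monthly_income):
    """DTI = monthly debt payments divided by gross monthly income."""
    return monthly_debt_payments / gross_monthly_income


def risk_flags(dti, late_payments, max_dti=0.40, max_late_payments=1):
    """Return the reasons an application needs closer review (empty if none)."""
    flags = []
    if dti > max_dti:
        flags.append("high_dti")
    if late_payments > max_late_payments:
        flags.append("payment_history")
    return flags


# High income and low DTI, but a history of late payments
dti = debt_to_income(monthly_debt_payments=2_000, gross_monthly_income=10_000)
flags = risk_flags(dti, late_payments=4)
```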

---

### **6. How would you handle an increase in credit card charge-offs?**  
**Sample Answer:**  
If **charge-offs (bad debts)** are increasing, I would:  

- **Analyze Delinquency Trends:** Identify which customer segments are defaulting.  
- **Adjust Credit Scoring Models:** Tighten approval criteria for high-risk profiles.  
- **Increase Customer Communication:** Send early warnings, payment reminders, and personalized support for struggling customers.  
- **Enhance Collections Strategy:** Offer structured repayment plans before sending accounts to collections.  
- **Evaluate Economic Conditions:** Check for macroeconomic factors (e.g., recession) that might be affecting repayment.  

Example: If charge-offs are rising among young customers with low credit scores, we might **increase minimum approval criteria or adjust interest rates** to mitigate risk.  

---

## **SQL & Data Analytics**  

### **7. How would you use SQL to analyze credit card transaction patterns?**  
**Sample Answer:**  
I would write queries to:  

1. **Identify high-spending customers:**  
   ```sql
   SELECT customer_id, SUM(transaction_amount) AS total_spent
   FROM transactions
   WHERE transaction_date BETWEEN '2024-01-01' AND '2024-12-31'
   GROUP BY customer_id
   ORDER BY total_spent DESC
   LIMIT 10;
   ```  
   *Finds top 10 spenders in 2024.*  

2. **Detect potential fraud (multiple transactions in a short period):**  
   ```sql
   SELECT customer_id, COUNT(*) AS num_transactions
   FROM transactions
   WHERE transaction_date >= NOW() - INTERVAL '1 HOUR'
   GROUP BY customer_id
   HAVING COUNT(*) > 5;
   ```  
   *Flags customers with more than 5 transactions in the last hour for potential fraud review.*  

---

### **8. How do you measure customer lifetime value (CLV) for a credit card portfolio?**  
**Sample Answer:**  
To calculate **Customer Lifetime Value (CLV)**:  

1. **Average Monthly Spend:** Identify the average spend per customer.  
2. **Average Retention Period:** Determine how long customers keep the card.  
3. **Net Revenue Per Customer:** Calculate revenue from interest, fees, and transactions.  
4. **Apply the CLV Formula:**  
   \[
   \text{CLV} = (\text{Average Monthly Revenue} \times 12 \times \text{Retention Years}) - \text{Acquisition Cost}
   \]  
Example:  
- **Monthly Revenue per customer:** $50  
- **Retention Period:** 5 years  
- **Acquisition Cost:** $200  
\[
\text{CLV} = (50 \times 12 \times 5) - 200 = \$2{,}800
\]  
This helps **prioritize high-value customer segments** for marketing efforts.  
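
The formula translates directly into a small function. This is a sketch of the simplified formula above only; the function and parameter names are illustrative, and it ignores discounting and changes in revenue over time.

```python
def customer_lifetime_value(monthly_revenue, retention_years, acquisition_cost):
    """Simplified CLV: total revenue over the retention period minus acquisition cost."""
    return monthly_revenue * 12 * retention_years - acquisition_cost


# Figures from the worked example: $50/month, 5 years, $200 acquisition cost.
print(customer_lifetime_value(50, 5, 200))  # 2800
```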

---

## **Business Strategy & Product Innovation**  

### **9. What are some emerging trends in the credit card industry?**  
**Sample Answer:**  
- **Digital & Virtual Cards:** Mobile wallets like Apple Pay and Google Pay are growing.  
- **Personalized Rewards:** AI-driven recommendations for maximizing card benefits.  
- **Buy Now, Pay Later (BNPL):** Short-term installment plans for purchases.  
- **Sustainability:** Eco-friendly credit cards that reward green spending.  
- **AI-driven Fraud Prevention:** Advanced machine learning models for detecting fraud.  

Example: A **cashback credit card linked to sustainable purchases** could attract environmentally conscious consumers.  

---

### **10. How would you evaluate a new credit card product launch?**  
**Sample Answer:**  
I would evaluate success using:  

1. **Customer Adoption Rate:** Number of new accounts opened.  
2. **Activation Rate:** Percentage of customers actively using the card.  
3. **Spend & Transaction Volume:** Total purchases made using the card.  
4. **Retention & Churn Rate:** Percentage of users keeping the card over time.  
5. **Profitability Metrics:** Interest income, interchange fees, and rewards costs.  

**Example:** If a new **travel credit card** has **low activation rates**, I would recommend **offering a higher sign-up bonus or reducing annual fees**.  

---

## **Scenario-Based Questions**  

### **1. Scenario: Declining Credit Card Usage**  
**Question:** You notice that credit card spending among existing customers has declined by 10% over the past six months. How would you analyze this issue and propose solutions?  

**Sample Answer:**  
1. **Data Analysis:**  
   - Analyze **transaction history** to identify which customer segments are spending less.  
   - Check for **seasonal trends** or macroeconomic factors affecting spending.  
   - Review **customer feedback** to find potential pain points (e.g., high fees, reduced rewards).  

2. **Hypothesis Testing:**  
   - Are customers shifting spending to **competitor cards** with better rewards?  
   - Is an **economic downturn** reducing discretionary spending?  
   - Are there **technical issues** (e.g., card declines, fraud alerts) discouraging usage?  

3. **Solution Proposal:**  
   - **Rewards Optimization:** Introduce bonus cashback or category-based promotions.  
   - **Personalized Offers:** Target dormant users with special incentives.  
   - **Marketing Campaigns:** Educate customers on maximizing card benefits.  
   - **Customer Engagement:** Gather feedback via surveys and support channels.  

*Example:* If analysis shows that young professionals are switching to fintech credit cards, we might **introduce a millennial-friendly cashback structure (e.g., more rewards for streaming services and online shopping).**  

---

### **2. Scenario: Increase in Credit Card Defaults**  
**Question:** Your team has noticed a **15% increase in credit card default rates** over the past quarter. What steps would you take to investigate and mitigate the risk?  

**Sample Answer:**  
1. **Root Cause Analysis:**  
   - Segment customers based on **credit score, income level, and spending habits** to identify high-risk groups.  
   - Analyze if the increase is concentrated in a specific **industry** (e.g., job losses in tech or retail sectors).  
   - Check for **changes in lending policies** that may have led to riskier approvals.  

2. **Mitigation Strategies:**  
   - **Preemptive Risk Management:** Adjust credit limits for high-risk customers.  
   - **Enhanced Collections Strategy:** Offer flexible repayment plans before accounts go delinquent.  
   - **Tighter Underwriting:** Modify approval criteria based on new risk insights.  
   - **Early Warning System:** Implement predictive analytics to flag at-risk accounts before default.  

*Example:* If data shows that **defaults are rising among gig economy workers**, we might introduce a **"payment holiday" feature**, allowing them to skip payments during low-income months.  

---

### **3. Scenario: New Credit Card Product Evaluation**  
**Question:** Your company is launching a **new travel rewards credit card**. How would you measure its success in the first year?  

**Sample Answer:**  
1. **Key Performance Indicators (KPIs):**  
   - **Customer Acquisition:** Number of new accounts opened.  
   - **Activation Rate:** Percentage of customers making at least one transaction in the first 60 days.  
   - **Transaction Volume:** Total spend per customer, especially on travel-related purchases.  
   - **Retention Rate:** Percentage of customers who renew after the first year.  
   - **Interchange Fee Revenue:** Income from card transactions at merchants.  

2. **Analysis Approach:**  
   - Compare performance **against projected targets** and competitor benchmarks.  
   - Segment users based on **spending patterns (frequent travelers vs. casual users).**  
   - Monitor **customer feedback** to identify areas for product enhancement.  

3. **Action Plan:**  
   - If **activation rates are low**, launch a targeted **bonus miles** campaign.  
   - If **transaction volume is low**, increase **reward multipliers for key spending categories (e.g., hotels, airlines).**  
   - If **customer churn is high**, improve retention efforts with **fee waivers or anniversary rewards.**  

*Example:* If analysis shows that **most cardholders use the card only for flights but not daily purchases**, we might introduce **higher cashback on dining and local transportation.**  
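
The activation-rate KPI defined above (at least one transaction in the first 60 days) can be computed along these lines. This is a minimal sketch: the input shapes, the function name, and the sample accounts are assumptions for illustration.

```python
from datetime import date, timedelta


def activation_rate(open_dates, first_txn_dates, window_days=60):
    """Share of accounts whose first transaction falls within the window.

    open_dates maps account id -> opening date.
    first_txn_dates maps account id -> date of first transaction;
    accounts with no transactions are absent from it.
    """
    if not open_dates:
        return 0.0
    window = timedelta(days=window_days)
    activated = sum(
        1
        for account_id, opened in open_dates.items()
        if account_id in first_txn_dates
        and first_txn_dates[account_id] - opened <= window
    )
    return activated / len(open_dates)


# Hypothetical accounts: A activates after 9 days, B after 90 days, C never.
opened = {"A": date(2024, 1, 1), "B": date(2024, 1, 1), "C": date(2024, 2, 1)}
first_txn = {"A": date(2024, 1, 10), "B": date(2024, 3, 31)}
print(f"{activation_rate(opened, first_txn):.0%}")  # 33%
```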

---

## **Case Study Questions**  

### **4. Case Study: Optimizing a Credit Card Rewards Program**  
**Question:** Your company’s credit card rewards program is underperforming, with **low customer engagement**. How would you revamp the program?  

**Sample Answer:**  
1. **Diagnose the Problem:**  
   - Analyze **reward redemption rates**—are customers finding value in the rewards?  
   - Compare with **competitor programs**—are they offering better incentives?  
   - Survey customers to understand **frustrations (e.g., hard-to-redeem points, lack of relevant rewards).**  

2. **Redesign the Rewards Structure:**  
   - Introduce **dynamic rewards** that adjust based on spending categories.  
   - Offer **personalized bonuses** based on customer preferences.  
   - Reduce **redemption barriers** (e.g., lower minimum point requirements).  

3. **Measure Success:**  
   - Track changes in **card usage frequency and total spending volume.**  
   - Monitor **customer satisfaction scores and retention rates.**  
   - Compare against a **control group** to isolate the impact of the changes.  

*Example:* If **younger customers prefer cashback over airline miles**, we might shift to a **hybrid model** where they can choose between **cashback, travel points, or merchant discounts.**  
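
The control-group comparison in the measurement step can be sketched as a relative lift in average spend. The spend figures and the function name below are hypothetical; a real analysis would also test whether the difference is statistically significant before acting on it.

```python
from statistics import mean


def relative_lift(treatment, control):
    """Relative difference in mean spend between treatment and control groups."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return (mean(treatment) - control_mean) / control_mean


# Hypothetical monthly spend per customer after the program change.
treatment_spend = [520, 610, 480, 590]  # customers on the redesigned program
control_spend = [500, 450, 550, 500]    # customers on the old program
print(f"Lift: {relative_lift(treatment_spend, control_spend):.1%}")  # Lift: 10.0%
```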

---

### **5. Case Study: Fraud Detection & Prevention Strategy**  
**Question:** Your company has detected an increase in **credit card fraud cases**. How would you tackle this issue?  

**Sample Answer:**  
1. **Identify Fraud Patterns:**  
   - Use **SQL and analytics tools** to detect unusual transaction patterns.  
   - Implement **machine learning models** to classify fraudulent vs. legitimate transactions.  
   - Monitor common fraud types (e.g., **card-not-present fraud, account takeovers**).  

2. **Enhance Security Measures:**  
   - Strengthen **multi-factor authentication (MFA)** for online transactions.  
   - Implement **AI-driven fraud alerts** with real-time customer notifications.  
   - Introduce **spending limits on high-risk transactions** (e.g., international purchases).  

3. **Evaluate Impact & Adjust Strategy:**  
   - Track the **false positive rate** to ensure real customers aren’t being blocked unfairly.  
   - Analyze **customer complaints** related to fraud prevention efforts.  
   - Work with **law enforcement and cybersecurity teams** to prevent large-scale fraud.  

*Example:* If **card-not-present fraud (e.g., online shopping scams) is increasing**, we might **partner with e-commerce platforms to introduce biometric authentication.**  
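
The false positive rate mentioned above is the share of legitimate transactions that were flagged, FP / (FP + TN). A minimal sketch with made-up labels follows; the function name and data are illustrative assumptions.

```python
def false_positive_rate(is_fraud, was_flagged):
    """Return FP / (FP + TN) over paired ground-truth labels and flags."""
    pairs = list(zip(is_fraud, was_flagged))
    fp = sum(1 for fraud, flag in pairs if not fraud and flag)
    tn = sum(1 for fraud, flag in pairs if not fraud and not flag)
    if fp + tn == 0:
        return 0.0  # no legitimate transactions to evaluate
    return fp / (fp + tn)


# Hypothetical outcomes: four legitimate transactions, one of them flagged.
labels = [False, False, False, False, True, True]
flags = [False, True, False, False, True, False]
print(false_positive_rate(labels, flags))  # 0.25
```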

---

### **6. Case Study: Credit Card Pricing Strategy**  
**Question:** Your company wants to **increase credit card fee revenue** without losing customers. What approach would you take?  

**Sample Answer:**  
1. **Analyze Current Pricing Structure:**  
   - Compare **annual fees, late fees, and interest rates** with competitors.  
   - Identify which fee types have **high customer sensitivity** (e.g., late payment fees).  
   - Segment customers to understand **who is most affected by fee changes.**  

2. **Adjust Fee Structure Strategically:**  
   - Offer **tiered pricing** (e.g., basic, premium, and elite card options).  
   - Introduce **waivers for high-value customers** to increase retention.  
   - Increase fees **only on low-impact areas** (e.g., foreign transaction fees for domestic users).  

3. **Measure & Optimize:**  
   - Monitor **customer attrition rates** post-change.  
   - Track **profitability improvements** across customer segments.  
   - Use **customer feedback** to refine the pricing model further.  

*Example:* If we **increase the annual fee from $99 to $129**, we might **add new premium benefits (e.g., free airport lounge access) to justify the price hike.**  
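
A quick break-even check shows how much attrition a fee increase can absorb before annual-fee revenue falls. This sketch considers annual-fee revenue only and ignores interest income, interchange, and the cost of any added benefits; the function name is an assumption.

```python
def break_even_attrition(old_fee, new_fee):
    """Fraction of customers that can leave before fee revenue drops.

    Solves N * old_fee = N * (1 - a) * new_fee for a.
    """
    if new_fee <= old_fee:
        raise ValueError("new_fee must exceed old_fee")
    return 1 - old_fee / new_fee


# Raising the fee from $99 to $129, as in the example above.
print(f"{break_even_attrition(99, 129):.1%}")  # 23.3%
```

If observed attrition after the change stays well below this level, the increase is adding fee revenue; the other revenue streams would still need to be checked separately.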

---

## **SQL-Based Scenarios**  

### **1. Scenario: Identifying High-Spending Customers**  
**Question:** Write an SQL query to find the **top 10 customers** who spent the most in the last 3 months.  

**Sample SQL Query:**  
```sql
SELECT customer_id, SUM(transaction_amount) AS total_spent
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '3 months'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
```
**Explanation:**  
- Filters transactions from the **last 3 months**.  
- Groups by `customer_id` to calculate **total spend per customer**.  
- Orders in **descending order** to get top spenders.  

---

### **2. Scenario: Detecting Potential Fraud**  
**Question:** Write an SQL query to find customers who made more than **5 transactions within 1 hour**.  

**Sample SQL Query:**  
```sql
SELECT customer_id, COUNT(*) AS transaction_count
FROM transactions
WHERE transaction_time >= NOW() - INTERVAL '1 HOUR'
GROUP BY customer_id
HAVING COUNT(*) > 5;
```
**Explanation:**  
- Filters transactions made in the **last hour**.  
- Groups by `customer_id` to count their transactions.  
- Filters customers who made **more than 5** transactions.  
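
The query above inspects only the most recent hour. To check whether a customer exceeded the threshold in any rolling one-hour window of their history, a two-pointer scan over sorted timestamps is one option. This is a sketch with hypothetical data and function names, not a description of how a production fraud system works.

```python
from datetime import datetime, timedelta


def max_in_window(timestamps, window=timedelta(hours=1)):
    """Largest number of transactions falling inside any rolling window."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, current in enumerate(times):
        # Advance the left edge until the span is no longer than the window.
        while current - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical customer: six transactions in 50 minutes, one much later.
base = datetime(2024, 6, 1, 9, 0)
txns = [base + timedelta(minutes=m) for m in (0, 10, 20, 30, 40, 50, 300)]
print(max_in_window(txns))      # 6
print(max_in_window(txns) > 5)  # True
```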

---

### **3. Scenario: Analyzing Late Payments**  
**Question:** Write an SQL query to find customers who have **missed at least 2 payments in the last 6 months**.  

**Sample SQL Query:**  
```sql
SELECT customer_id, COUNT(*) AS missed_payments
FROM payments
WHERE status = 'missed'
AND payment_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY customer_id
HAVING COUNT(*) >= 2;
```
**Explanation:**  
- Filters payments with `status = 'missed'`.  
- Checks within the **last 6 months**.  
- Groups by `customer_id` and filters those with **2+ missed payments**.  

---

### **4. Scenario: Finding Customers with High Credit Utilization**  
**Question:** Identify customers whose **credit utilization** is above **80%** of their limit.  

**Sample SQL Query:**  
```sql
SELECT c.customer_id, c.credit_limit, SUM(t.transaction_amount) AS total_spent,
       (SUM(t.transaction_amount) / c.credit_limit) * 100 AS utilization_rate
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
WHERE transaction_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY c.customer_id, c.credit_limit
HAVING (SUM(t.transaction_amount) / c.credit_limit) > 0.8;
```
**Explanation:**  
- Joins **customers** with their **transactions**.  
- Calculates **utilization rate** (`total_spent / credit_limit * 100`).  
- Filters customers **exceeding 80% utilization**.  

---

## **Python-Based Scenarios**  

### **5. Scenario: Detecting Anomalous Spending Behavior**  
**Question:** Write a Python script to flag transactions that are **3 times higher than a customer’s average transaction amount**.  

**Sample Python Code:**  
```python
import pandas as pd

# Sample transaction data
data = {
    'customer_id': [101, 101, 102, 102, 103, 103, 103],
    'transaction_id': [1, 2, 3, 4, 5, 6, 7],
    'amount': [50, 60, 100, 500, 30, 120, 700]
}

df = pd.DataFrame(data)

# Average transaction amount per customer, aligned to each row
df['avg_amount'] = df.groupby('customer_id')['amount'].transform('mean')

# Flag transactions more than 3x the customer's own average
df['flagged'] = df['amount'] > 3 * df['avg_amount']

# Display flagged transactions. With this tiny sample nothing is flagged,
# because each large purchase also inflates that customer's average.
print(df[df['flagged']])
```
**Explanation:**  
- **Groups transactions** by customer ID to calculate their **average spend**.  
- Flags any transaction that is **more than 3x the average spend** for that customer.  
- This helps detect **unusual spikes** that could indicate fraud.  
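
One caveat of a mean-based threshold is that a large purchase raises the very average it is compared against. In the sample data above, customer 103's mean is about $283, so the $700 transaction does not exceed 3x the mean. A more robust baseline, offered here as an alternative not in the original script, is the per-customer median; the function name is an assumption.

```python
from collections import defaultdict
from statistics import median


def flag_spikes(transactions, multiplier=3):
    """Return (customer_id, amount) pairs above multiplier x the customer's median."""
    amounts = defaultdict(list)
    for customer_id, amount in transactions:
        amounts[customer_id].append(amount)
    baseline = {cid: median(vals) for cid, vals in amounts.items()}
    return [(cid, amt) for cid, amt in transactions if amt > multiplier * baseline[cid]]


# Same amounts as the sample data above.
sample = [(101, 50), (101, 60), (102, 100), (102, 500),
          (103, 30), (103, 120), (103, 700)]
print(flag_spikes(sample))  # [(103, 700)]
```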

---

### **6. Scenario: Predicting Credit Card Churn Using a Random Forest Classifier**  
**Question:** Write a Python script using `sklearn` to build a **credit card churn prediction model**.  

**Sample Python Code:**  
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Sample dataset
data = {
    'credit_score': [750, 620, 680, 700, 590, 780, 810],
    'total_spent': [5000, 2000, 3000, 4000, 1500, 6000, 7000],
    'late_payments': [0, 3, 1, 0, 4, 0, 0],
    'churn': [0, 1, 0, 0, 1, 0, 0]  # 1 = churned, 0 = active
}

df = pd.DataFrame(data)

# Features and target variable
X = df[['credit_score', 'total_spent', 'late_payments']]
y = df['churn']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```
**Explanation:**  
- Uses **credit score, spending, and late payments** to predict churn.  
- Splits data into **train and test sets**.  
- Uses **RandomForestClassifier** for training.  
- Evaluates performance using **accuracy score**.  

---

### **7. Scenario: SQL Query Execution in Python**  
**Question:** How would you fetch customer transaction data from a SQL database using Python?  

**Sample Python Code:**  
```python
import sqlite3
import pandas as pd

# Connect to the database
conn = sqlite3.connect('credit_card.db')

# Query to fetch transactions
query = """
SELECT customer_id, transaction_date, transaction_amount
FROM transactions
WHERE transaction_date >= DATE('now', '-6 months');
"""

# Execute query and load into a DataFrame
df = pd.read_sql_query(query, conn)

# Close connection
conn.close()

# Display results
print(df.head())
```
**Explanation:**  
- **Connects** to an SQL database.  
- Runs a query to fetch transactions **from the last 6 months**.  
- Stores the results in a **Pandas DataFrame** for further analysis.  

---

### **8. Scenario: Creating a Monthly Spending Summary Report**  
**Question:** Write a Python script to **group credit card transactions by month** and calculate total spending per customer.  

**Sample Python Code:**  
```python
import pandas as pd

# Sample data
data = {
    'customer_id': [101, 101, 102, 102, 103, 103, 103],
    'transaction_date': pd.to_datetime(['2024-01-15', '2024-02-10', '2024-01-20', 
                                         '2024-03-05', '2024-02-25', '2024-03-15', '2024-03-25']),
    'transaction_amount': [150, 200, 300, 500, 100, 400, 600]
}

df = pd.DataFrame(data)

# Extract month from date
df['month'] = df['transaction_date'].dt.strftime('%Y-%m')

# Group by customer and month
monthly_spending = df.groupby(['customer_id', 'month'])['transaction_amount'].sum().reset_index()

print(monthly_spending)
```
**Explanation:**  
- **Extracts month** from transaction dates.  
- Groups transactions by **customer ID and month**.  
- Calculates **total spending per month** for each customer.  

---

## **Advanced SQL Scenarios**  

### **1. Scenario: Identifying Customers with Unusual Spending Patterns**  
**Question:** Find customers whose **monthly spending increased by more than 50% compared to the previous month**.  

**Sample SQL Query:**  
```sql
WITH MonthlySpending AS (
    SELECT customer_id, 
           DATE_TRUNC('month', transaction_date) AS txn_month,
           SUM(transaction_amount) AS total_spent
    FROM transactions
    WHERE transaction_date >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY customer_id, txn_month
),
MonthlyChange AS (
    SELECT m1.customer_id, 
           m1.txn_month, 
           m1.total_spent AS current_month_spend, 
           m2.total_spent AS prev_month_spend,
           ((m1.total_spent - m2.total_spent) / NULLIF(m2.total_spent, 0)) * 100 AS spend_increase_pct
    FROM MonthlySpending m1
    LEFT JOIN MonthlySpending m2 
    ON m1.customer_id = m2.customer_id 
    AND m1.txn_month = m2.txn_month + INTERVAL '1 month'
)
SELECT customer_id, txn_month, current_month_spend, prev_month_spend, spend_increase_pct
FROM MonthlyChange
WHERE spend_increase_pct > 50;
```
**Explanation:**  
1. Aggregates **monthly spending** for each customer.  
2. Joins the data to calculate **month-over-month changes**.  
3. Filters customers with **spending increase > 50%**.  
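
The same month-over-month comparison can be sketched in pandas once monthly totals are available. The data below is hypothetical, and the sketch assumes every customer has a row for every month; otherwise `pct_change` compares against the previous available month rather than the previous calendar month, unlike the SQL join above.

```python
import pandas as pd

# Hypothetical monthly totals, already aggregated per customer and month.
monthly = pd.DataFrame({
    "customer_id": [101, 101, 101, 102, 102, 102],
    "month": ["2024-01", "2024-02", "2024-03", "2024-01", "2024-02", "2024-03"],
    "total_spent": [200.0, 210.0, 400.0, 300.0, 320.0, 330.0],
})

monthly = monthly.sort_values(["customer_id", "month"])

# Percentage change versus the previous row within each customer.
monthly["spend_increase_pct"] = (
    monthly.groupby("customer_id")["total_spent"].pct_change() * 100
)

# Keep only months where spending rose by more than 50%.
spikes = monthly[monthly["spend_increase_pct"] > 50]
print(spikes[["customer_id", "month", "spend_increase_pct"]])
```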

---

### **2. Scenario: Finding Customers with High Cash Advance Usage**  
**Question:** Identify customers who have taken **cash advances more than 3 times in the past month** and spent over **$5,000 on cash advances**.  

**Sample SQL Query:**  
```sql
SELECT customer_id, COUNT(*) AS cash_advance_count, SUM(transaction_amount) AS total_cash_advance
FROM transactions
WHERE transaction_type = 'cash_advance'
AND transaction_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY customer_id
HAVING COUNT(*) > 3 AND SUM(transaction_amount) > 5000;
```
**Explanation:**  
- Filters **cash advance transactions**.  
- Aggregates data to find **frequent and high-value users**.  

---

### **3. Scenario: Optimizing SQL Query Performance for Large Datasets**  
**Question:** Your transaction table has **millions of records**. How would you improve the performance of this query?  

```sql
SELECT customer_id, COUNT(*) AS num_transactions
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY customer_id;
```
**Optimized Approach:**  
1. **Use Indexing:**  
   ```sql
   CREATE INDEX idx_transaction_date ON transactions(transaction_date);
   ```  
   - This speeds up filtering by `transaction_date`.  

2. **Use Partitioning:**  
   ```sql
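   -- PostgreSQL declarative partitioning (syntax differs in other engines)
   CREATE TABLE transactions_partitioned (LIKE transactions INCLUDING DEFAULTS)
   PARTITION BY RANGE (transaction_date);
   CREATE TABLE transactions_p1 PARTITION OF transactions_partitioned
       FOR VALUES FROM (MINVALUE) TO ('2024-01-01');
   CREATE TABLE transactions_p2 PARTITION OF transactions_partitioned
       FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');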
   ```  
   - Partitions data by year for **faster query execution**.  

3. **Use Approximate Aggregations for Large Data:**  
   ```sql
   SELECT customer_id, APPROX_COUNT_DISTINCT(transaction_id) AS num_transactions
   FROM transactions
   WHERE transaction_date >= CURRENT_DATE - INTERVAL '1 year'
   GROUP BY customer_id;
   ```  
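   - `APPROX_COUNT_DISTINCT` (available in engines such as BigQuery, Snowflake, and SQL Server 2019+, but not in stock PostgreSQL) improves performance when **an exact count is not needed**.  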

---

### **4. Scenario: Credit Utilization Risk Analysis**  
**Question:** Identify customers whose **credit utilization increased by more than 30% compared to the previous month**.  

**Sample SQL Query:**  
```sql
WITH CreditUtilization AS (
    SELECT c.customer_id, 
           DATE_TRUNC('month', t.transaction_date) AS txn_month,
           SUM(t.transaction_amount) / c.credit_limit AS utilization_rate
    FROM customers c
    JOIN transactions t ON c.customer_id = t.customer_id
    WHERE transaction_date >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY c.customer_id, txn_month, c.credit_limit
),
UtilizationChange AS (
    SELECT u1.customer_id, u1.txn_month, 
           u1.utilization_rate AS current_utilization, 
           u2.utilization_rate AS prev_utilization,
           ((u1.utilization_rate - u2.utilization_rate) / NULLIF(u2.utilization_rate, 0)) * 100 AS utilization_change
    FROM CreditUtilization u1
    LEFT JOIN CreditUtilization u2 
    ON u1.customer_id = u2.customer_id 
    AND u1.txn_month = u2.txn_month + INTERVAL '1 month'
)
SELECT customer_id, txn_month, current_utilization, prev_utilization, utilization_change
FROM UtilizationChange
WHERE utilization_change > 30;
```
**Explanation:**  
- **Calculates credit utilization** for each customer.  
- **Finds monthly changes** and filters users exceeding a **30% increase**.  

---

## **Advanced Python Scenarios**  

### **5. Scenario: Predicting Customer Default Risk**  
**Question:** Build a **credit card default prediction model** using Python.  

**Sample Python Code:**  
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Sample dataset
data = {
    'credit_score': [750, 620, 680, 700, 590, 780, 810, 550, 520, 640],
    'total_spent': [5000, 2000, 3000, 4000, 1500, 6000, 7000, 1000, 800, 2500],
    'late_payments': [0, 3, 1, 0, 4, 0, 0, 5, 6, 2],
    'default': [0, 1, 0, 0, 1, 0, 0, 1, 1, 1]  # 1 = defaulted, 0 = paid on time
}

df = pd.DataFrame(data)

# Features and target variable
X = df[['credit_score', 'total_spent', 'late_payments']]
y = df['default']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```
**Explanation:**  
- Uses **credit score, spending, and late payments** to predict **default risk**.  
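- Trains a **RandomForestClassifier** and evaluates **accuracy**. With only two held-out rows in this sample the score is purely illustrative; on real, imbalanced default data, precision and recall are more informative than accuracy.  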

---

### **6. Scenario: Anomaly Detection in Credit Card Transactions**  
**Question:** Implement **anomaly detection** using Isolation Forest.  

**Sample Python Code:**  
```python
from sklearn.ensemble import IsolationForest
import numpy as np

# Sample transaction amounts
transactions = np.array([[50], [60], [100], [500], [30], [120], [7000]])  

# Train Isolation Forest model
model = IsolationForest(contamination=0.1, random_state=42)
model.fit(transactions)

# Predict anomalies (-1 = anomaly, 1 = normal)
predictions = model.predict(transactions)
print(predictions)
```
**Explanation:**  
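- Uses **Isolation Forest** to flag **outlying transaction amounts** as candidates for fraud review.  
- `contamination=0.1` sets the expected share of anomalies; points labeled `-1` are outliers in either direction, not only amounts above **usual spending**.  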

---

### **7. Scenario: Customer Segmentation Using K-Means Clustering**  
**Question:** Cluster customers based on **spending behavior**.  

**Sample Python Code:**  
```python
from sklearn.cluster import KMeans

# Sample spending data
data = [[5000], [2000], [3000], [4000], [1500], [6000], [7000]]

# Apply K-Means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(data)

print(clusters)
```
**Explanation:**  
- Uses **K-Means clustering** to segment customers based on **spending**.  

---