**Assignment Submission Guidelines**

**1. Submission Platform:**

- Submit your completed assignment through Google Classroom.

**2. Submission Format:**

- Submit the Google Colab Notebook (.ipynb file) provided as the assignment template.
- Do not create a new notebook. Fill in the provided template.

**3. Template Completion:**

The template notebook contains:
- The code to generate the banking transaction CSV datasets.
- Placeholders for your code and explanations for each question.

Follow the instructions within the template.
- Code Cells:
  - Place your code solutions directly in the designated code cells below each question.
- Markdown Cells:
  - Provide your explanations and justifications in the designated Markdown cells.
- Report section:
  - Complete the markdown section at the bottom of the notebook titled "Report".
  - In this section, compile the explanation of each of the questions.
  - Answer the following data analysis questions:
    1. What are the key characteristics of the customer and transaction data?
    2. What are the main trends in customer spending?
    3. How does customer age relate to transaction amounts?
    4. Identify and discuss any potential data quality issues.
    5. Provide 2-3 actionable business insights based on your analysis.

- Do not modify the structure of the template notebook.

**4. File Naming:**

Ensure the file name remains as provided in the template. Do not rename the file.

**5. Timely Submission:**

- Submit your completed template notebook by the deadline: **24th of March, 2025**.
- Late submissions will be penalized as follows:
  - Submissions made after the deadline but by **5:00pm, 26th of March, 2025** will receive a maximum of 5 marks for timely submission.
  - Submissions made after that time will receive 0 marks for timely submission.

**6. Report:**

- Complete the "Report" section at the end of your notebook.
- Ensure your report is:
  - Well-organized and easy to read.
  - Clear and concise.
  - Free of grammatical errors.

**7. Code Execution:**

Ensure your completed notebook runs without errors from top to bottom.
Before submitting, restart the kernel and run all cells to confirm reproducibility.



**8. Academic Integrity:**

All work must be your own.
Plagiarism will result in a failing grade.
Cite any external resources you use.



**Tips for Success:**

- Start the assignment early.
- Read the instructions within the template carefully.
- Plan your approach before coding.
- Test your code thoroughly.
- Document your work clearly.
- Review the rubrics to understand the grading criteria.


**Grading Rubrics:**

Total 50 Marks

- Timely Submission: 10 Marks
- Report: 10 Marks
- Level 1 (Basic Questions): 5 Marks (1 x 5 = 5)
- Level 2 (Intermediate Questions): 10 Marks (2 x 5 = 10)
- Level 3 (Advanced Questions): 15 Marks (3 x 5 = 15)

## **Assignment**

**Background**

You are a data analyst working for "FinTech Insights," a consultancy specializing in data-driven financial analysis. FinTech Insights partners with banks, credit unions, and financial technology companies to optimize their operations and enhance customer experiences through in-depth data analysis. Your team has been assigned the task of analyzing a comprehensive dataset of banking transactions and customer details. This dataset, compiled from raw sources, contains information on a diverse group of bank customers, including their demographics, transaction histories, and account balances. Your goal is to leverage this data to uncover key patterns in customer behavior and identify opportunities for improved financial services. By identifying these trends, you can provide actionable recommendations to financial institutions for better customer engagement, risk management, and service optimization.

In [None]:
import pandas as pd
import numpy as np
from faker import Faker
import random

# Initialize Faker
fake = Faker()

# Generate Customer Data
df_customers = pd.DataFrame({
    'customer_id': range(1, 101),
    'name': [fake.name() for _ in range(100)],
    'age': np.random.randint(18, 80, 100),
    'gender': np.random.choice(['Male', 'Female', 'Other'], 100),
    'email': [fake.email() for _ in range(100)],
    'city': [fake.city() for _ in range(100)]
})

df_customers.to_csv('customers_raw.csv', index=False)

# Generate Transaction Data
df_transactions = pd.DataFrame({
    'transaction_id': range(1, 501),
    'customer_id': np.random.choice(df_customers['customer_id'], 500),
    'transaction_date': [fake.date_this_decade() for _ in range(500)],
    'transaction_type': np.random.choice(['Deposit', 'Withdrawal', 'Payment', 'Transfer'], 500),
    'amount': np.round(np.random.uniform(100, 5000, 500), 2),
    'balance_after_transaction': np.round(np.random.uniform(1000, 20000, 500), 2)
})

df_transactions.to_csv('bank_transactions.csv', index=False)

print("Synthetic datasets generated: 'customers_raw.csv' and 'bank_transactions.csv'")


Synthetic datasets generated: 'customers_raw.csv' and 'bank_transactions.csv'


**The Data**

customers_raw.csv:
  - customer_id: Unique identifier for each customer (integer).
  - name: Full name of the customer (string).
  - age: Age of the customer (integer).
  - gender: Gender of the customer (string: Male, Female, Other).
  - email: Email address of the customer (string).
  - city: City where the customer resides (string).

bank_transactions.csv:
  - transaction_id: Unique identifier for each transaction (integer).
  - customer_id: Identifier linking transactions to customers (integer).
  - transaction_date: Date of the transaction (date/string).
  - transaction_type: Type of transaction (string: Deposit, Withdrawal, Payment, Transfer).
  - amount: Transaction amount (float).
  - balance_after_transaction: Account balance after the transaction (float).


## **Basic (RBT Levels: 2, 3):**

Total: 5 Marks

Each question carries 1 mark.

**Question 1: Data Loading and Initial Exploration:**

- Load customers_raw.csv and bank_transactions.csv into Pandas DataFrames.
- Display the first 5 rows and use .info() to display data types.

In [None]:
# Question 1: Data Loading and Initial Exploration
# Load customers_raw.csv and bank_transactions.csv into Pandas DataFrames.
# Display the first 5 rows and use .info() to display data types.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 2: Data Merging:**

- Merge the two datasets using an inner join on customer_id.
- Display the first 5 rows of the merged DataFrame.
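As a rough illustration of what an inner join on customer_id does, the sketch below uses two small made-up tables rather than the assignment files. The names `toy_customers`, `toy_transactions`, and `toy_merged`, and all values in them, are assumptions for this example only.

```python
import pandas as pd

# Toy stand-ins for the two assignment tables (all values are made up).
toy_customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Pune", "Delhi", "Kochi"],
})
toy_transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 4],
    "amount": [250.0, 1200.0, 90.5, 400.0],
})

# An inner join keeps only customer_id values present in BOTH tables:
# customer 3 has no transactions, and transaction 13 has no matching customer.
toy_merged = pd.merge(toy_transactions, toy_customers, on="customer_id", how="inner")
print(toy_merged.head())
```

Comparing the row count before and after a merge is a quick way to see whether any rows were dropped by the join.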

In [None]:
# Question 2: Data Merging
# Merge the two datasets using an inner join on customer_id.
# Display the first 5 rows of the merged DataFrame.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 3: Missing Value Identification:**

Identify columns with missing values and report the count of missing values in each.
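One possible way to count missing values per column is sketched below on a small made-up DataFrame. The frame `toy` and its values are assumptions for illustration, not the assignment data.

```python
import numpy as np
import pandas as pd

# Toy frame with a few deliberately missing entries (values are made up).
toy = pd.DataFrame({
    "age": [25, np.nan, 40, np.nan],
    "email": ["a@example.com", None, "c@example.org", "d@example.net"],
    "city": ["Pune", "Delhi", "Kochi", "Agra"],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_counts = toy.isna().sum()

# Keep only the columns that actually contain missing values.
columns_with_missing = missing_counts[missing_counts > 0]
print(columns_with_missing)
```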

In [None]:
# Question 3: Missing Value Identification
# Identify columns with missing values and report the count of missing values in each.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 4: Duplicate Row Removal:**

Check for and remove any duplicate rows in the bank_transactions.csv DataFrame.
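A minimal sketch of checking for and dropping fully duplicated rows, using a made-up frame (`toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Toy frame in which the row with transaction_id 2 appears twice (made-up values).
toy = pd.DataFrame({
    "transaction_id": [1, 2, 2, 3],
    "amount": [10.0, 20.0, 20.0, 30.0],
})

# duplicated() flags every repeat of an earlier, identical row.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(deduped))
```

By default both methods compare all columns; the `subset` parameter restricts the comparison to selected columns if that is what the analysis calls for.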


In [None]:
# Question 4: Duplicate Row Removal
# Check for and remove any duplicate rows in the bank_transactions.csv DataFrame.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 5: Column Renaming:**

Rename the amount column in bank_transactions.csv to transaction_amount.

In [None]:
# Question 5: Column Renaming
# Rename the amount column in bank_transactions.csv to transaction_amount.
# Your Code Here:

**Explanation**

[Your explanation here]

## **Intermediate (RBT Levels: 3, 4):**

Total: 10 Marks

Each question carries 2 marks.



**Question 6: Missing Value Imputation:**

Impute missing values in the age column with the median age.
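Median imputation can be sketched as follows on a toy column. The frame `toy` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy age column with two missing entries (made-up values).
toy = pd.DataFrame({"age": [22.0, np.nan, 35.0, 61.0, np.nan]})

# median() skips NaN by default, so it is computed from 22, 35 and 61.
median_age = toy["age"].median()

# Assign the filled column back instead of relying on inplace=True.
toy["age"] = toy["age"].fillna(median_age)
print(toy)
```

The median is often preferred over the mean for this purpose because it is less sensitive to extreme values.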



In [None]:
# Question 6: Missing Value Imputation
# Impute missing values in the age column with the median age.
# Your Code Here:

**Explanation**

[Your explanation here]

Impute missing values in the email column with a placeholder string "[email address removed]".


In [None]:
# Question 6 (continued): Missing Value Imputation
# Impute missing values in the email column with a placeholder string "[email address removed]".
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 7: Categorical Data Conversion:**

Convert the gender column to numerical values (e.g., Male=0, Female=1, Other=2).
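One common way to encode a small set of categories is an explicit dictionary passed to `Series.map`, sketched here on made-up data. The column name `gender_code` and the dictionary `gender_codes` are assumptions for this example.

```python
import pandas as pd

# Toy gender column (made-up values).
toy = pd.DataFrame({"gender": ["Male", "Female", "Other", "Female"]})

# Explicit mapping, matching the example codes given in the question.
gender_codes = {"Male": 0, "Female": 1, "Other": 2}

# map() looks each value up in the dictionary; values missing from the
# dictionary would become NaN, which makes unexpected categories visible.
toy["gender_code"] = toy["gender"].map(gender_codes)
print(toy)
```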

In [None]:
# Question 7: Categorical Data Conversion

# Convert the gender column to numerical values (e.g., Male=0, Female=1, Other=2).
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 8: String Manipulation:**

Extract the domain name from the email column and create a new column called email_domain.
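A minimal sketch of extracting the part after the "@" with pandas string methods, on made-up addresses (`toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Toy email column, including one missing address (made-up values).
toy = pd.DataFrame({"email": ["ana@example.com", "raj@mail.example.org", None]})

# Split each address on "@" and take the last piece; missing emails stay missing.
toy["email_domain"] = toy["email"].str.split("@").str[-1]
print(toy)
```

Note that a value without an "@" (such as a placeholder string) is returned unchanged by this approach, so it may be worth deciding how such rows should be treated.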

In [None]:
# Question 8: String Manipulation:
# Extract the domain name from the email column and create a new column called email_domain.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 9: Discretization and Binning:**

Create a new categorical column called amount_category by binning the transaction_amount into "Low", "Medium", and "High" categories.
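Binning with fixed edges can be sketched with `pd.cut`, as below. The cut points 1500 and 3500 are arbitrary assumptions chosen for this toy example, and quantile-based bins via `pd.qcut` are an equally valid alternative.

```python
import numpy as np
import pandas as pd

# Toy transaction amounts (made-up values).
toy = pd.DataFrame({"transaction_amount": [120.0, 1800.0, 2500.0, 4900.0, 3300.0]})

# Hypothetical bin edges: (0, 1500] -> Low, (1500, 3500] -> Medium, above 3500 -> High.
bin_edges = [0, 1500, 3500, np.inf]
bin_labels = ["Low", "Medium", "High"]

# pd.cut assigns each amount to the interval it falls in (right edge inclusive).
toy["amount_category"] = pd.cut(toy["transaction_amount"], bins=bin_edges, labels=bin_labels)
print(toy)
```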


In [None]:
# Question 9: Discretization and Binning
# Create a new categorical column called amount_category by binning the transaction_amount into "Low", "Medium", and "High" categories.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 10: Outlier Detection:**

Use the IQR method to identify outliers in the transaction_amount column.
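The IQR rule flags values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The sketch below applies it to a made-up series; the values and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy amounts with one obviously extreme value (made-up data).
amounts = pd.Series(
    [100.0, 120.0, 130.0, 140.0, 150.0, 160.0, 900.0],
    name="transaction_amount",
)

# Quartiles and interquartile range.
q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1

# Conventional 1.5 * IQR fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are treated as outliers.
outliers = amounts[(amounts < lower_fence) | (amounts > upper_fence)]
print(outliers)
```

The 1.5 multiplier is a convention rather than a fixed rule, and on some datasets the method may flag no values at all.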


In [None]:
# Question 10: Outlier Detection
# Use the IQR method to identify outliers in the transaction_amount column.
# Your Code Here:

**Explanation**

[Your explanation here]

## **Advanced (RBT Levels: 4, 5):**

Total: 15 Marks

Each question carries 3 marks.

**Question 11: Grouped Aggregation:**

Group transactions by customer_id and calculate the total spending for each customer.

In [None]:
# Question 11: Grouped Aggregation
# Group transactions by customer_id and calculate the total spending for each customer.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 12: Grouped Transformation:**

Normalize the transaction_amount within each transaction_type category using z-scores.
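A within-group z-score subtracts the group mean from each value and divides by the group standard deviation. The sketch below shows one way to do this with `groupby(...).transform` on made-up data; the column name `amount_zscore` is an assumption for this example.

```python
import pandas as pd

# Toy transactions in two categories (made-up values).
toy = pd.DataFrame({
    "transaction_type": ["Deposit", "Deposit", "Deposit", "Payment", "Payment", "Payment"],
    "transaction_amount": [100.0, 200.0, 300.0, 10.0, 20.0, 60.0],
})

grouped = toy.groupby("transaction_type")["transaction_amount"]

# transform() returns one value per original row, aligned with the input,
# so the group mean and std can be combined directly with the raw column.
# pandas uses the sample standard deviation (ddof=1) by default.
toy["amount_zscore"] = (
    toy["transaction_amount"] - grouped.transform("mean")
) / grouped.transform("std")
print(toy)
```

After this transformation each transaction_type group has mean 0 and standard deviation 1, which makes amounts comparable across types.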


In [None]:
# Question 12: Grouped Transformation
# Normalize the transaction_amount within each transaction_type category using z-scores.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 13: Time Series Analysis:**

- Convert transaction_date to datetime objects.
- Group transactions by month and calculate the average transaction_amount for each month.
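One way to sketch the two steps above is `pd.to_datetime` followed by grouping on a monthly period. The toy dates and amounts below are made up for illustration.

```python
import pandas as pd

# Toy transactions with dates stored as strings (made-up values).
toy = pd.DataFrame({
    "transaction_date": ["2024-01-05", "2024-01-20", "2024-02-11", "2024-02-28", "2024-03-02"],
    "transaction_amount": [100.0, 300.0, 50.0, 150.0, 500.0],
})

# Parse the strings into datetime64 values.
toy["transaction_date"] = pd.to_datetime(toy["transaction_date"])

# to_period("M") keeps both year and month, so January 2024 and
# January 2025 would land in different groups.
month = toy["transaction_date"].dt.to_period("M")
monthly_avg = toy.groupby(month)["transaction_amount"].mean()
print(monthly_avg)
```

Grouping on `.dt.month` instead would pool the same calendar month across different years, which answers a different question (seasonality rather than a timeline).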


In [None]:
# Question 13: Time Series Analysis
# Convert transaction_date to datetime objects.
# Group transactions by month and calculate the average transaction_amount for each month.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 14: Correlation Analysis:**

- Calculate the correlation between age and transaction_amount.
- Calculate the correlation between transaction_amount and the numerically encoded gender column.
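A correlation between two numeric columns can be computed with `Series.corr`, which uses the Pearson coefficient by default. The sketch below uses made-up values chosen so the result is easy to verify by hand.

```python
import pandas as pd

# Toy ages and amounts (made-up values).
toy = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "transaction_amount": [100.0, 300.0, 200.0, 400.0],
})

# Pearson correlation: covariance divided by the product of the standard deviations.
age_amount_corr = toy["age"].corr(toy["transaction_amount"])
print(round(age_amount_corr, 3))
```

When one of the columns is an arbitrary numeric code for categories, the coefficient depends on how the codes were assigned, so it should be interpreted with caution.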

In [None]:
# Question 14: Correlation Analysis
# Calculate the correlation between age and transaction_amount.
# Calculate the correlation between transaction_amount and the numerically encoded gender column.
# Your Code Here:

**Explanation**

[Your explanation here]

**Question 15: Conditional Logic and Feature Engineering:**

Create a new column called high_spending_customer that indicates whether a customer's total spending is above a certain threshold.
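A minimal sketch of a threshold-based flag follows. The threshold of 5000 and the helper column `total_spending` are assumptions for this toy example; a percentile of total spending would be another reasonable choice of threshold.

```python
import pandas as pd

# Toy transactions for three customers (made-up values).
toy = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "transaction_amount": [4000.0, 500.0, 2500.0, 900.0, 700.0],
})

# Hypothetical threshold, chosen only for this example.
SPENDING_THRESHOLD = 5000.0

# transform("sum") broadcasts each customer's total back onto every one of their rows.
toy["total_spending"] = toy.groupby("customer_id")["transaction_amount"].transform("sum")

# Boolean flag: True when the customer's total exceeds the threshold.
toy["high_spending_customer"] = toy["total_spending"] > SPENDING_THRESHOLD
print(toy)
```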

In [None]:
# Question 15: Conditional Logic and Feature Engineering
# Create a new column called high_spending_customer that indicates whether a customer's total spending is above a certain threshold.
# Your Code Here:

**Explanation**

[Your explanation here]

**Report**

**Part 1**

- In this section, compile the explanation of each of the questions.

**Part 2**

- Answer the following data analysis questions:
  1. What are the key characteristics of the customer and transaction data?
  2. What are the main trends in customer spending?
  3. How does customer age relate to transaction amounts?
  4. Identify and discuss any potential data quality issues.
  5. Provide 2-3 actionable business insights based on your analysis.