# 🧪 Assignment #1: Exploratory Data Analysis (EDA) on Fraud Transactions


## 📝 Objective:
Perform exploratory data analysis (EDA) using a real-world fraud dataset to uncover insights, spot anomalies, and build intuition about fraud patterns.

### 🧠 What You'll Do:
- Load the dataset
- Explore the data structure and summary statistics
- Visualize fraud-related patterns using:
   - Boxplot
   - Histogram
   - Heatmap

In [1]:
```python
# Import necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (update path if needed)
df = pd.read_csv('creditcard.csv')

# Preview the data
print(df.head())
```

   Time        V1        V2        V3        V4        V5        V6        V7  \
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   
3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609   
4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941   

         V8        V9  ...       V21       V22       V23       V24       V25  \
0  0.098698  0.363787  ... -0.018307  0.277838 -0.110474  0.066928  0.128539   
1  0.085102 -0.255425  ... -0.225775 -0.638672  0.101288 -0.339846  0.167170   
2  0.247676 -1.514654  ...  0.247998  0.771679  0.909412 -0.689281 -0.327642   
3  0.377436 -1.387024  ... -0.108300  0.005274 -0.190321 -1.175575  0.647376   
4 -0.270533  0.817739  ... -0.009431  0.798278 -0.137458  0.141267 -0.206010   

        V26       V27       V28 

# 🧪 Assignment #1: Exploratory Data Analysis on Credit Card Fraud Data

## 🎯 Objective
In this assignment, you’ll perform exploratory data analysis (EDA) on a **real-world credit card fraud dataset**. Your goal is to develop a clear understanding of the dataset’s structure, key variables, and any potential signs of fraud using summary statistics and data visualizations.

This is your first hands-on opportunity to apply Python and data science techniques to a real fraud detection problem.

---

## 📁 Dataset: [Credit Card Fraud Detection Dataset (Kaggle)](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)

- Transactions made by European cardholders in September 2013
- Features are PCA-transformed (`V1`–`V28`) plus `Time`, `Amount`, and `Class`
- `Class = 1` indicates a fraudulent transaction

You may already have this dataset preloaded for the assignment. If not, download it from the Kaggle link above.
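
Before exploring, it can help to confirm that the file you loaded matches the description above. The sketch below checks the expected column names, the class balance, and missing values. Because it needs to run without the Kaggle file, it builds a small synthetic stand-in: `make_demo_frame`, its row counts, and its random values are hypothetical and only borrow the real column names.

```python
import numpy as np
import pandas as pd


def make_demo_frame(n_rows: int = 1000, n_fraud: int = 20, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in that mimics the column layout of creditcard.csv."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(n_rows, dtype=int)
    labels[rng.choice(n_rows, size=n_fraud, replace=False)] = 1
    data = {"Time": np.sort(rng.uniform(0, 172_800, n_rows))}  # arbitrary demo range
    for i in range(1, 29):
        data[f"V{i}"] = rng.normal(size=n_rows)
    data["Amount"] = rng.exponential(scale=80.0, size=n_rows).round(2)
    data["Class"] = labels
    return pd.DataFrame(data)


# Swap this call for pd.read_csv("creditcard.csv") when the real file is available.
df = make_demo_frame()

# Columns named in the dataset description: Time, V1..V28, Amount, Class.
expected_columns = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]
missing = [col for col in expected_columns if col not in df.columns]
print("Missing columns:", missing)

# Class balance: row counts per class and the share of rows with Class == 1.
class_counts = df["Class"].value_counts().sort_index()
fraud_share = df["Class"].mean()
print(class_counts)
print(f"Fraud share: {fraud_share:.2%}")

# Missing values per column, worth checking before any plotting.
null_counts = df.isna().sum()
print("Columns with nulls:", int((null_counts > 0).sum()))
```

With the real data, only the line that creates `df` changes; the checks themselves stay the same.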

---

## 🧠 Your Task

1. **Import the Dataset**
   - Load the dataset into a Pandas DataFrame
   - Use `df.head()` to preview the first few rows

2. **Summary Statistics**
   - Use `df.describe()` to examine key metrics
   - Highlight and interpret any differences between fraudulent and non-fraudulent transactions

3. **Boxplot**
   - Plot a boxplot comparing the **Amount** variable by `Class` (fraud vs. non-fraud)
   - Identify any outliers or unusual distributions

4. **Histogram**
   - Create a histogram of transaction amounts
   - Use color or grouping to compare fraud vs. non-fraud

5. **Correlation Heatmap**
   - Generate a correlation matrix of all numeric features
   - Plot it using `seaborn.heatmap()` and interpret any strong relationships

6. **Observations**
   - Provide a short summary of what you've discovered
   - Use Markdown to document: suspicious trends, high-risk variables, or visual insights
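
For step 2, `df.describe()` on its own mixes both classes together, so differences between fraud and non-fraud stay hidden. One possible approach, sketched here, is to group by `Class` first. The data is a synthetic stand-in (the shift applied to `V1` is made up purely so the demo has something to show), so treat the numbers as placeholders rather than findings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace with pd.read_csv("creditcard.csv") in your notebook.
rng = np.random.default_rng(0)
n_rows, n_fraud = 1000, 20
labels = np.zeros(n_rows, dtype=int)
labels[rng.choice(n_rows, size=n_fraud, replace=False)] = 1
df = pd.DataFrame({
    "V1": rng.normal(size=n_rows) - 2.0 * labels,  # artificial shift for fraud rows
    "Amount": rng.exponential(scale=80.0, size=n_rows).round(2),
    "Class": labels,
})

# Whole-frame summary: one set of statistics for all transactions.
overall = df["Amount"].describe()
print(overall)

# Per-class summary: one row of statistics for Class 0 and one for Class 1.
by_class = df.groupby("Class")["Amount"].describe()
print(by_class)

# Mean of every feature per class, then the gap between fraud (1) and non-fraud (0).
class_means = df.groupby("Class").mean(numeric_only=True)
mean_gap = class_means.loc[1] - class_means.loc[0]
print(mean_gap)
```

Features with a large gap relative to their spread are natural candidates to discuss in your observations.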

---

## 💯 Grading Rubric (100 Points)

| Category                                      | Points | Details                                                                 |
|----------------------------------------------|--------|-------------------------------------------------------------------------|
| 📥 Dataset Loading & Head Output             | 10 pts | Dataset is correctly loaded and displayed                              |
| 📊 Summary Statistics (`describe`)           | 10 pts | Key metrics are shown and explained                                    |
| 📦 Boxplot of Amount by Class                | 20 pts | Boxplot created with meaningful labels and insights                    |
| 📉 Histogram with Class Overlay              | 20 pts | Histogram clearly compares fraud vs. non-fraud                         |
| 🔥 Correlation Heatmap                       | 20 pts | Accurate matrix and clear heatmap generated                            |
| 📝 Observations & Markdown Commentary        | 20 pts | Thoughtful insights, free of poor formatting and hard-to-read or broken code |

> 🧮 **Total: 100 Points**

---

## 🌟 Optional Extra Credit (Up to +50 Points)

Looking to challenge yourself or build your portfolio? Choose any of the following:

| Extra Credit Task                                       | Points |
|---------------------------------------------------------|--------|
| 📈 Plot fraud rate over time using the `Time` column     | +10 pts |
| 🧮 Use `groupby()` to explore fraud by amount bins       | +10 pts |
| 🔍 Create a pairplot with Seaborn for high-correlation variables | +10 pts |
| 🎻 Try violin or KDE plots to explore distributions      | +10 pts |
| 🧾 Use z-score or IQR for outlier detection              | +10 pts |

> You can earn **up to +50 extra points**, which will be added to your final score. These tasks are optional and great for resume-worthy exploration.
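
As a starting point for the first two extra-credit tasks, here is a minimal sketch on synthetic stand-in data. It assumes `Time` is measured in seconds since the first transaction (check this against the Kaggle description before relying on it), and the bin edges for `Amount` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace with pd.read_csv("creditcard.csv") in your notebook.
rng = np.random.default_rng(0)
n_rows, n_fraud = 1000, 20
labels = np.zeros(n_rows, dtype=int)
labels[rng.choice(n_rows, size=n_fraud, replace=False)] = 1
df = pd.DataFrame({
    "Time": np.sort(rng.uniform(0, 172_800, n_rows)),  # assumed seconds, arbitrary range
    "Amount": rng.exponential(scale=80.0, size=n_rows).round(2),
    "Class": labels,
})

# Fraud rate over time: bucket Time into hours, then average Class per bucket.
# Because Class is 0/1, its mean is the fraction of fraudulent transactions.
hour = (df["Time"] // 3600).astype(int).rename("hour")
fraud_rate_by_hour = df.groupby(hour)["Class"].mean()
print(fraud_rate_by_hour.head())

# Fraud by amount bins: cut Amount into ranges, then aggregate per range.
bin_edges = [0, 10, 50, 100, 500, np.inf]  # illustrative edges, adjust as needed
amount_bin = pd.cut(df["Amount"], bins=bin_edges, include_lowest=True)
by_bin = df.groupby(amount_bin, observed=False)["Class"].agg(
    transactions="size", frauds="sum", fraud_rate="mean"
)
print(by_bin)
```

Calling `fraud_rate_by_hour.plot()` turns the hourly series into a line chart, which covers the plotting part of the first task.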

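For the outlier-detection task, both rules fit in a few lines. This sketch uses a synthetic `Amount` column, the conventional 1.5 × IQR fences, and a z-score cutoff of 3. Those thresholds are common defaults rather than requirements, so feel free to justify different ones.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["Amount"]; the values are random, not real transactions.
rng = np.random.default_rng(0)
amount = pd.Series(rng.exponential(scale=80.0, size=1000).round(2), name="Amount")

# IQR rule: flag values beyond 1.5 * IQR outside the first and third quartiles.
q1, q3 = amount.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = amount[(amount < lower) | (amount > upper)]

# z-score rule: flag values more than 3 standard deviations from the mean.
z_scores = (amount - amount.mean()) / amount.std()
z_outliers = amount[z_scores.abs() > 3]

print(f"IQR fences: [{lower:.2f}, {upper:.2f}]")
print("IQR outliers:", len(iqr_outliers))
print("z-score outliers:", len(z_outliers))
```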
---

## 🧵 Submission Guidelines

- Submit your `.ipynb` Jupyter Notebook via the course platform
- Include inline comments and Markdown explanations
- Your notebook should be clean, organized, and runnable

---

## ✅ Example Code Snippet to Get You Started

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("creditcard.csv")
print(df.head())

# Summary statistics
print(df.describe())
```

In [2]:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("creditcard.csv")
print(df.head())

# Summary statistics
print(df.describe())
```

   Time        V1        V2        V3        V4        V5        V6        V7  \
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   
3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609   
4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941   

         V8        V9  ...       V21       V22       V23       V24       V25  \
0  0.098698  0.363787  ... -0.018307  0.277838 -0.110474  0.066928  0.128539   
1  0.085102 -0.255425  ... -0.225775 -0.638672  0.101288 -0.339846  0.167170   
2  0.247676 -1.514654  ...  0.247998  0.771679  0.909412 -0.689281 -0.327642   
3  0.377436 -1.387024  ... -0.108300  0.005274 -0.190321 -1.175575  0.647376   
4 -0.270533  0.817739  ... -0.009431  0.798278 -0.137458  0.141267 -0.206010   

        V26       V27       V28 