In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
sns.set(style='whitegrid')

This line is from the Seaborn library, which is used for data visualization in Python.

sns is the common alias for seaborn (usually written as import seaborn as sns).

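set() applies a global theme (style, context, color palette and font settings) to every plot drawn afterwards. In recent Seaborn versions it is an alias for sns.set_theme().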

style='whitegrid' means:
➤ Use a white background with light grey gridlines on it.

Why use it?

It makes your charts look cleaner and easier to read, especially for plots like bar charts or line graphs where the grid can help compare values.
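Here is a small sketch of what the style setting does under the hood, assuming seaborn 0.11 or later (where sns.set_theme is the preferred name and sns.set is its alias). The theme works by updating matplotlib's rcParams, so you can inspect those values directly. The Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a screen
import matplotlib.pyplot as plt
import seaborn as sns

# Apply the theme globally (sns.set(style='whitegrid') does the same thing)
sns.set_theme(style="whitegrid")

# The style is stored in matplotlib's rcParams
print(plt.rcParams["axes.facecolor"])  # white background
print(plt.rcParams["axes.grid"])       # gridlines switched on

# Inspect a style definition without applying it
print(sns.axes_style("whitegrid")["grid.color"])  # light grey grid color
```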

In [None]:
import warnings
warnings.filterwarnings('ignore')

import warnings:
This brings in Python’s warnings module, which is used to show warning messages (not errors, just alerts or suggestions).

warnings.filterwarnings('ignore'):
This tells Python to ignore all warning messages.

Why use it?

Sometimes, when running code (especially with external libraries), Python shows warnings like:

"This feature is deprecated"
or
"This may change in future versions"

These warnings don't stop the code from running, but they can clutter your output. So, this line is used to hide those warnings and keep the output clean.

Be careful, though: hiding warnings can make you miss important notices, such as deprecations that will break your code after a library upgrade.
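A safer alternative is to silence warnings only around the code that produces them, using the standard-library warnings.catch_warnings() context manager. In the sketch below, noisy() is a hypothetical stand-in for a library call that emits a deprecation warning.

```python
import warnings

def noisy():
    # Hypothetical function: warns, but still returns a normal result
    warnings.warn("This feature is deprecated", DeprecationWarning)
    return 42

# Warnings are ignored only inside this block; the filters are restored afterwards
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()

# record=True captures warnings in a list, showing noisy() still warns when not suppressed
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy()

print(result)
print(len(caught), caught[0].category.__name__)
```

This keeps the rest of the notebook's warnings visible, so a genuinely important notice elsewhere is not hidden.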

In [None]:
df = pd.read_csv("AIML Dataset.csv")

In [None]:
df.head()

In [None]:
df.info()       # general info: data types, non-null counts and memory usage; here we have 7 float, 1 integer and 3 object types

In [None]:
df.columns

In [None]:
df['isFraud'].value_counts()

In [None]:
df['isFlaggedFraud'].value_counts()

In [None]:
df.isnull().sum().sum()     # isnull() returns a cell-wise boolean mask; the first sum() counts per column, the second adds up those counts

df:
This is your DataFrame — basically a table of data (like an Excel sheet).

df.isnull():
This checks every cell in the DataFrame for missing values (NaN = Not a Number).
➤ It returns a new DataFrame of the same size, but with True where the value is missing, and False where it's not.

.sum() (first one):
This sums up the True values column by column. In Python, True counts as 1 and False as 0.
➤ So, now you get a Series that tells how many missing values are in each column.

.sum() (second one):
This adds up all the column totals, giving you the total number of missing values in the entire DataFrame.

Example:

Let’s say your DataFrame looks like this:

A      B
1      NaN
NaN    5
3      6

Then:

df.isnull()


returns:

A      B
False  True
True   False
False  False

Then:

df.isnull().sum()


returns:

A    1
B    1


Finally:

df.isnull().sum().sum()


returns:

2
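The toy example above can be reproduced as a runnable cell. The name toy is only for illustration; on the real data you would use df.

```python
import numpy as np
import pandas as pd

# Toy DataFrame from the example: one missing value in each column
toy = pd.DataFrame({"A": [1, np.nan, 3], "B": [np.nan, 5, 6]})

mask = toy.isnull()          # boolean DataFrame, True where a value is missing
per_column = mask.sum()      # Series: number of missing values per column
total = per_column.sum()     # scalar: missing values in the whole DataFrame

print(mask)
print(per_column)
print(total)
```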

Your output: np.int64(0)

This means:

There are no missing values (NaN) in your DataFrame.

The result is 0, and its type is np.int64, NumPy's 64-bit integer type (NumPy 2 includes the type name in the printed value).

In [None]:
df.shape  # (rows, columns): we have 7... rows and 11 columns

In [None]:
# percentage of fraudulent transactions in the whole dataset

(df['isFraud'].value_counts()[1]/df.shape[0]) * 100

df['isFraud'] ---> picks the column named isFraud from the DataFrame (likely 1 for fraud and 0 for not fraud; column names are case-sensitive)

value_counts() ---> counts how many 1s and 0s are in that column

[1] ---> selects the count of rows where isFraud is 1

df.shape[0] ---> gets the total number of rows in the DataFrame

Example:

If your data looks like this:

isFraud
0
1
0
1
0

value_counts() → {0: 3, 1: 2}

So value_counts()[1] → 2 (fraud cases)

df.shape[0] → 5 (total rows)

So (2 / 5) * 100 → 40.0, i.e. 40% of the rows are fraud cases.
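The same calculation on the five-row example can be run directly. For a 0/1 column there are two common shortcuts that give the same percentage: the column mean (the share of 1s) and value_counts(normalize=True) (proportions instead of counts). The toy frame is illustrative only.

```python
import pandas as pd

# Toy data from the example: 2 fraud rows out of 5
toy = pd.DataFrame({"isFraud": [0, 1, 0, 1, 0]})

counts = toy["isFraud"].value_counts()        # label 0 -> 3, label 1 -> 2
pct_manual = counts[1] / toy.shape[0] * 100   # same formula as the notebook cell

# Equivalent shortcuts for a 0/1 column
pct_mean = toy["isFraud"].mean() * 100                           # mean of 0/1 values = share of 1s
pct_norm = toy["isFraud"].value_counts(normalize=True)[1] * 100  # normalized counts

print(round(pct_manual, 2), round(pct_mean, 2), round(pct_norm, 2))
```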

In [None]:
round((df['isFraud'].value_counts()[1] / df.shape[0])*100,2)

In [None]:
# Now visualize key aspects of the data for fraud detection, starting with the distribution of transaction types.

df["type"].value_counts().plot(kind='bar', title='Transaction type', color='skyblue')
plt.xlabel('Transaction type')
plt.ylabel('Count')
plt.show()

In [None]:
# to find fraud rates by type

fraud_by_type = df.groupby('type')['isFraud'].mean().sort_values(ascending=False)
fraud_by_type.plot(kind='bar', title='Fraud rate by type', color='salmon')
plt.ylabel('Fraud rate')
plt.show()
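Because isFraud holds only 0s and 1s, the mean within each group is the share of fraud rows in that group, which is why groupby(...).mean() gives a fraud rate. Here is a sketch on hypothetical toy rows. The type names are assumed to follow the dataset's TRANSFER / CASH_OUT / PAYMENT spelling.

```python
import pandas as pd

# Hypothetical toy transactions
toy = pd.DataFrame({
    "type":    ["TRANSFER", "TRANSFER", "CASH_OUT", "CASH_OUT",
                "CASH_OUT", "CASH_OUT", "PAYMENT", "PAYMENT"],
    "isFraud": [1, 0, 1, 0, 0, 0, 0, 0],
})

# Mean of a 0/1 column per group = fraud rate per transaction type
fraud_rate = toy.groupby("type")["isFraud"].mean().sort_values(ascending=False)

# The same numbers spelled out as fraud count / total count
check = toy.groupby("type")["isFraud"].agg(["sum", "count"])
check["rate"] = check["sum"] / check["count"]

print(fraud_rate)
print(check)
```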

In [None]:
fraud_by_type

In [None]:
# let's see amount statistics
df['amount'].describe()  # large values are displayed in scientific notation

In [None]:
df['amount'].describe().astype(int)
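Note that astype(int) does more than change the display: it drops the decimals (truncating toward zero). The sketch below compares it with round() and with pandas' display.float_format option, which changes only how floats are printed. The amounts are hypothetical; the large value is there to trigger scientific notation.

```python
import pandas as pd

# Hypothetical amounts; a very large value makes pandas print in scientific notation
amounts = pd.Series([1.5, 2.5, 10_000_000.0], name="amount")
stats = amounts.describe()

as_int = stats.astype(int)   # truncates: 1.5 becomes 1
rounded = stats.round(2)     # keeps two decimals

# Change only the display format, leaving the stored values untouched
with pd.option_context("display.float_format", "{:,.2f}".format):
    print(stats)

print(as_int)
print(rounded)
```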

In [None]:
#now let's see the histogram for this

sns.histplot(np.log1p(df['amount']), bins=100, kde=True, color='green')
plt.title("transaction amount distribution (log scale)")
plt.xlabel("Log (amount+1)")
plt.show()
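The histogram uses np.log1p, which computes log(1 + x). It is defined at 0 (giving 0), whereas np.log(0) would give -inf, so zero-amount transactions stay on the plot. np.expm1 is its inverse and maps log-scale values back to amounts. The sample values are illustrative.

```python
import numpy as np

amounts = np.array([0.0, 9.0, 99.0, 999_999.0])

# log(1 + x): finite at x = 0, and close to log(x) for large amounts
logged = np.log1p(amounts)

# expm1(y) = exp(y) - 1 undoes log1p
restored = np.expm1(logged)

print(logged)
print(restored)
```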

In [None]:
sns.boxplot(data = df[df['amount'] < 50000], x='isFraud', y='amount')
plt.title('Amount vs isFraud (filtered under 50k)')
plt.show()

In [None]:
df.columns

In [None]:
# let's look at balance changes and anomalies
# create 2 columns holding the balance change of the origin and destination accounts, so we can spot negative differences on either side
df['balanceDiffOrigin'] = df['oldbalanceOrg'] - df['newbalanceOrig']
df['balanceDiffDest'] = df['newbalanceDest'] - df['oldbalanceDest']

df['balanceDiffOrigin'] = df['oldbalanceOrg'] - df['newbalanceOrig']

This calculates how much money was deducted from the origin account during a transaction.

So ideally, this should reflect the amount transferred (if no error or fraud).
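One way to test that expectation is to flag rows where the balance change does not match the amount column. The sketch below uses hypothetical toy rows with the notebook's column names. The check only makes sense for transaction types that debit the origin account; deposits such as CASH_IN move the balance the other way.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using the notebook's column names
toy = pd.DataFrame({
    "amount":         [600.0, 300.0, 250.0],
    "oldbalanceOrg":  [1000.0, 300.0, 0.0],
    "newbalanceOrig": [400.0, 0.0, 0.0],
})
toy["balanceDiffOrigin"] = toy["oldbalanceOrg"] - toy["newbalanceOrig"]

# True where the deducted amount does not match 'amount'
# (np.isclose tolerates tiny floating-point differences)
toy["mismatch"] = ~np.isclose(toy["balanceDiffOrigin"], toy["amount"])

print(toy)
```

The third row shows the "zero balances" pattern: an amount of 250 is recorded, but neither balance column moved.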

Then:

(df['balanceDiffOrigin']).sum()


This adds up all the balance changes from all rows.

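A negative total means:
  Summed over all transactions, origin accounts gained more money than they lost (which is unusual for typical transfers). It does not mean every account gained money: a few large increases can outweigh many deductions. For example, a transfer that lowers one balance by 600 and a deposit that raises another by 1,000 add up to -400.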

Possible Reasons for Negative Sum:

Data Errors or Corruption

Some rows might have values where the newbalanceOrig is greater than the oldbalanceOrg, meaning money was added — not deducted — which can be wrong depending on context.

Fraudulent or Invalid Transactions

In fraud detection datasets, some transactions may be crafted to simulate fraud (like incorrect balance updates).

Missing or Zero Values

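If oldbalanceOrg and newbalanceOrig are both 0, the difference is 0 even though the row records a non-zero amount, so the balance columns no longer agree with the amount column. A negative balance would be outright invalid.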

Sometimes, these columns might have zeros instead of NaN, which leads to misleading calculations.

Transaction Types

Did you filter by transaction type?
For example, a "CASH_IN" adds money to the origin account (giving a negative balanceDiffOrigin), while a "TRANSFER" deducts it. If you mix types, the deposits can outweigh the deductions and make the sum negative.

What You Can Do:
1. Check a few rows manually:
df[['oldbalanceOrg', 'newbalanceOrig', 'balanceDiffOrigin']].head(10)


Look for unexpected cases where:

newbalanceOrig > oldbalanceOrg

oldbalanceOrg == 0 but newbalanceOrig != 0, etc.

2. Group by transaction type:
df.groupby('type')['balanceDiffOrigin'].sum()


This will show you the sum of balance differences by transaction type. That can help reveal which types are causing the issue.

3. Check for invalid rows:
df[df['balanceDiffOrigin'] < 0].head()


This shows examples of rows where the origin balance increased, which might be unexpected.

4. Check for negative balances:
df[(df['oldbalanceOrg'] < 0) | (df['newbalanceOrig'] < 0)]


If you see any, that's likely incorrect.
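Checks 2 to 4 can be tried on a handful of hypothetical rows first, to see what each filter returns. The CASH_OUT row has a deliberately invalid negative new balance so that check 4 has something to find.

```python
import pandas as pd

# Hypothetical toy rows using the notebook's column names
toy = pd.DataFrame({
    "type":           ["TRANSFER", "CASH_IN", "PAYMENT", "CASH_OUT"],
    "oldbalanceOrg":  [1000.0, 200.0, 0.0, 500.0],
    "newbalanceOrig": [400.0, 1200.0, 0.0, -50.0],
})
toy["balanceDiffOrigin"] = toy["oldbalanceOrg"] - toy["newbalanceOrig"]

# 2. Net balance change per transaction type
by_type = toy.groupby("type")["balanceDiffOrigin"].sum()

# 3. Rows where the origin balance went up
increased = toy[toy["balanceDiffOrigin"] < 0]

# 4. Rows with a negative balance before or after the transaction
negative = toy[(toy["oldbalanceOrg"] < 0) | (toy["newbalanceOrig"] < 0)]

print(by_type)
print(increased)
print(negative)
```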

Summary

The negative sum means that, overall, money appears to be added to origin accounts across all transactions — which is unusual and worth investigating. Start by checking for:

Specific transaction types

Incorrect or missing data

Negative or zero balances

In [None]:
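df['balanceDiffOrigin'] < 0   # boolean Series: True where the origin balance increased; (df['balanceDiffOrigin'] < 0).sum() counts those rows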

In [None]:
(df['balanceDiffOrigin']).sum()

In [None]:
(df['balanceDiffDest']).sum()