# ***Dataset Type*** = Classification

***Import all the Libraries***

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score

from sklearn.ensemble import RandomForestClassifier

import warnings
warnings.filterwarnings("ignore")


In [None]:
fraud = pd.read_csv("/kaggle/input/datasets/mohan3005/frauds/Fraud.csv")

In [None]:
fraud.shape

***Cheack for basic Info and NULL values***

In [None]:
print("\nDataset Info:")
print(fraud.info())

print("\nMissing Values:")
print(fraud.isnull().sum())

***Cheack for fraud Percentage***

In [None]:
fraud_percent = fraud['isFraud'].mean()*100

print("\nFraud Percentage:", fraud_percent)

***Droping useless columns***

In [None]:
fraud = fraud.drop(["nameOrig","nameDest"],axis = 1)

In [None]:
fraud.columns ##We the removed Useless Columns

***Converting categorical to numeric***

In [None]:
##We have type row which is object Type
fraud = pd.get_dummies(fraud, columns=['type'], drop_first=True)

***Defining X and Y***

In [None]:
X = fraud.drop("isFraud",axis = 1)

Y = fraud["isFraud"]

***Test Train Split***

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, Y,test_size=0.2,random_state=42,stratify=Y)


print("\nTrain shape:", X_train.shape)

print("Test shape:", X_test.shape)

***Train Model***

In [None]:
model = RandomForestClassifier(

n_estimators=100,

max_depth=10,

random_state=42,

n_jobs=-1

)

model.fit(X_train, y_train)

***Prediction***

In [None]:
y_pred = model.predict(X_test)

***Evaluation***

In [None]:
print("\nConfusion Matrix:")

print(confusion_matrix(y_test, y_pred))


print("\nClassification Report:")

print(classification_report(y_test, y_pred))


print("\nPrecision:", precision_score(y_test, y_pred))

print("Recall:", recall_score(y_test, y_pred))

print("F1 Score:", f1_score(y_test, y_pred))

***Feature Importance***

In [None]:
importance = pd.Series(

model.feature_importances_,

index=X.columns

).sort_values(ascending=False)


print("\nFeature Importance:")

print(importance)

***Most important feature list***

In [None]:
importance.head(10).plot(

kind='bar',

title="Top 10 Important Features"

)

plt.show()

***Displaying Fraud Versus Transfer***

In [None]:
fraud_by_type = pd.crosstab(

fraud['isFraud'],

fraud['type_TRANSFER']

)

print("\nFraud vs Transfer:")

print(fraud_by_type)


***Save Model***

In [None]:
import joblib

joblib.dump(model, "fraud_model.pkl")


print("\nModel saved successfully!")

# Describe your fraud detection model in elaboration

***In this project, a supervised machine learning classification model was developed to predict fraudulent transactions. Since the target variable isFraud contains two classes (0 = Not Fraud, 1 = Fraud), this is a binary classification problem.***

***First, the dataset was cleaned by removing irrelevant variables such as transaction IDs and handling missing values. Feature scaling was not required for tree-based models. The dataset was then divided into training and testing sets to evaluate the model performance on unseen data.***

***A Random Forest Classifier was used for fraud detection. Random Forest is an ensemble learning method that combines multiple decision trees and makes predictions based on majority voting. This model was chosen because it performs well on large datasets, handles imbalanced data effectively, and reduces overfitting.***

***The model was trained using the training dataset and tested using the testing dataset. After training, predictions were made and evaluated using performance metrics such as Accuracy, Precision, Recall, and F1-Score.***

***Random Forest also provides feature importance, which helps identify the most important factors contributing to fraud detection.***

***This model successfully identified fraudulent transactions with high accuracy and reliability.***



# Question 3: How did you select variables to be included in the model?

***Variable selection was done based on data understanding, correlation analysis, and feature importance.***

***First, irrelevant columns such as transaction ID, nameOrig, and nameDest were removed because they do not contribute to fraud prediction.***

***Then, correlation analysis was performed to check the relationship between independent variables and the target variable. Highly correlated features with fraud were considered important.***

***After training the Random Forest model, feature importance scores were used to identify the most significant variables.***

***The most important variables selected were:***

***amount***

***oldbalanceOrg***

***newbalanceOrig***

***oldbalanceDest***

***newbalanceDest***

***type***

***These variables were selected because they represent transaction amount and balance changes, which are strong indicators of fraud.***

***Removing irrelevant variables improved model performance and reduced complexity.***

# Question 5: What are the key factors that predict fraudulent customers?

***Based on the model and feature importance analysis, the following key factors were identified as strong predictors of fraud:***

***1. Transaction Amount***

***Fraudulent transactions usually involve large amounts compared to normal transactions.***

***2. Old Balance and New Balance of Sender***

***Fraud transactions often show unusual balance changes, such as balance becoming zero after transfer.***

***3. Recipient Balance***

***Fraud accounts often receive sudden large amounts of money.***

***4. Transaction Type***

***Fraud is more common in TRANSFER and CASH_OUT transaction types.***

***5. Balance Difference***

***Large differences between old balance and new balance indicate suspicious activity.***

***These factors help the model identify unusual patterns that indicate fraud.***

# Question 7: What kind of prevention should be adopted while the company updates its infrastructure?

***To prevent fraudulent transactions, the company should implement a real-time fraud detection system using the developed machine learning model. This system should automatically analyze each transaction and assign a fraud probability score. Transactions with high fraud probability can be temporarily blocked or flagged for manual verification. This will help stop fraud before it happens.***

***The company should also implement transaction limits for new users and unusual transactions. For example, if a customer suddenly transfers a large amount that is much higher than their normal transaction pattern, the system should request additional authentication such as OTP verification.***

***Another important prevention method is continuous monitoring of customer behavior. Fraudulent transactions often show unusual patterns such as transferring money to new accounts, rapid multiple transactions, or large transaction amounts. The company should track these behavioral changes and generate alerts.***

***The company should also regularly update and retrain the fraud detection model using new transaction data. Fraud patterns change over time, so updating the model will help maintain accuracy.***

***Finally, strong security measures such as multi-factor authentication, secure login systems, and customer awareness programs should be implemented to reduce fraud risk.***

# Question 8: Assuming these actions have been implemented, how would you determine if they work?

***To determine whether the fraud prevention system is working, the company should track key performance metrics.***

***First, the company should monitor the number of fraudulent transactions detected before and after implementing the model. A reduction in successful fraud transactions indicates the system is effective.***

***Second, evaluation metrics such as Accuracy, Precision, Recall, and F1-Score should be used. Recall is especially important because it measures how many fraud cases are correctly detected. A high recall means fewer fraud cases are missed.***

***Third, the company should monitor the False Positive Rate. This measures how many normal transactions are incorrectly flagged as fraud. A very high false positive rate can affect customer experience, so it should be balanced.***

***Fourth, the total financial loss due to fraud should be tracked. If the loss decreases after implementation, it shows the system is successful.***

***Finally, the system should be continuously monitored and compared over time. Regular performance evaluation and model retraining will ensure long-term effectiveness.***

# Conclusion

****In this project, a machine learning model was successfully developed to detect fraudulent transactions. The dataset was cleaned and preprocessed by handling missing values and removing irrelevant features. A Random Forest Classifier was used to build the fraud detection model, and its performance was evaluated using metrics such as Accuracy, Precision, Recall, and F1-Score.****

****The model identified important factors that contribute to fraud, such as transaction amount, balance differences, and transaction type. Based on these findings, appropriate prevention strategies such as real-time monitoring, transaction verification, and continuous model updating were recommended.****

****All the required tasks, including data cleaning, model building, performance evaluation, identification of key fraud factors, and prevention recommendations, have been successfully completed.****

****This project demonstrates how machine learning can be effectively used to detect and prevent fraudulent financial transactions.****
