## Fraud Detection

In [78]:
# Load historical credit card transaction data
import warnings
import pandas as pd

warnings.filterwarnings('ignore')

# Source of the dataset; the CSV is expected to be available locally as creditcard.csv
data_source = 'https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/download?datasetVersionNumber=3'
df = pd.read_csv('creditcard.csv')

In [79]:
df.head()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
| 1 | 0.0 | 1.191857 | 0.266151 | 0.16648 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.16717 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
| 2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.37978 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
| 3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.1083 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.5 | 0 |
| 4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.20601 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |


In [80]:
df.shape

(284807, 31)

In [81]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     28

In [82]:
# Count missing values across all columns
df.isnull().sum().sum()

0

In [83]:
#Data Exploration and Preprocessing

In [84]:
# Perform data preprocessing and feature engineering
# Example: Normalize transaction amount
df['normalized_amount'] = (df['Amount'] - df['Amount'].mean()) / df['Amount'].std()

In [85]:
df['normalized_amount']

0         0.244964
1        -0.342474
2         1.160684
3         0.140534
4        -0.073403
            ...   
284802   -0.350150
284803   -0.254116
284804   -0.081839
284805   -0.313248
284806    0.514354
Name: normalized_amount, Length: 284807, dtype: float64
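
The z-score above is computed over the full dataset before any train/test split, so statistics from rows that later land in the test set influence the training features. A minimal sketch of one alternative, assuming a toy DataFrame in place of `creditcard.csv`: estimate the mean and standard deviation on the training rows only and reuse them for the held-out rows. The toy values and the names `train_mean` and `train_std` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Toy stand-in for the transaction table (values are illustrative only).
toy = pd.DataFrame({
    "Amount": [149.62, 2.69, 378.66, 123.50, 69.99, 3.67, 4.99, 40.80],
    "Class":  [0, 0, 0, 0, 0, 0, 1, 0],
})

# Hold out the last 25% of rows as a test split (no shuffling, for clarity).
split = int(len(toy) * 0.75)
train, test = toy.iloc[:split].copy(), toy.iloc[split:].copy()

# Estimate the statistics on the training rows only.
train_mean = train["Amount"].mean()
train_std = train["Amount"].std()  # pandas default is the sample std (ddof=1)

# Apply the same training statistics to both splits.
train["normalized_amount"] = (train["Amount"] - train_mean) / train_std
test["normalized_amount"] = (test["Amount"] - train_mean) / train_std
```

With this arrangement the training column has zero mean and unit standard deviation by construction, while the test column generally does not, which is expected.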

In [86]:
#SQL Database Setup

In [87]:
# Connect to SQL database and load preprocessed data
import sqlite3
conn = sqlite3.connect('fraud_detection.db')
df.to_sql('transactions', conn, if_exists='replace', index=False)
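
Once the table is written, it can be queried with ordinary SQL. A small sketch, assuming an in-memory database and a three-column toy `transactions` table rather than the real `fraud_detection.db`; it counts rows and averages the amount per class. The rows inserted here are made up for illustration.

```python
import sqlite3

# In-memory database as a stand-in for fraud_detection.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (Time REAL, Amount REAL, Class INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(0.0, 149.62, 0), (0.0, 2.69, 0), (1.0, 378.66, 0), (2.0, 1.00, 1)],
)
conn.commit()

# Row count and mean amount per class (0 = legitimate, 1 = fraud).
class_counts = conn.execute(
    "SELECT Class, COUNT(*), AVG(Amount) "
    "FROM transactions GROUP BY Class ORDER BY Class"
).fetchall()
conn.close()

for label, n_rows, mean_amount in class_counts:
    print(label, n_rows, round(mean_amount, 2))
```

In the notebook itself, the same query could be passed to `pd.read_sql_query(sql, conn)` to get the result back as a DataFrame.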

In [88]:
# Model Development
# Train a logistic regression model for fraud detection
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Split the data into training and testing sets
# (the DataFrame loaded above is named df, not data)
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('Class', axis=1), df['Class'], test_size=0.2, random_state=42
)
# Train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression()

In [89]:
#Model Evaluation
# Evaluate the model using test data
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Make predictions on test data
y_pred = model.predict(X_test)

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, F1 Score: {f1}")

Accuracy: 0.9988764439450862, Precision: 0.6770833333333334, Recall: 0.6632653061224489, F1 Score: 0.6701030927835052
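
Because fraud is rare, a plain split and an unweighted model can underweight the positive class. The sketch below shows two common adjustments, not the notebook's method: a stratified split (`stratify=y`) and class reweighting (`class_weight="balanced"`), evaluated with a confusion matrix and average precision. It runs on synthetic data from `make_classification`; the sample size, class ratio, and variable names are assumptions, and results on the real dataset would differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 2% positives) standing in for the transactions.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=42
)

# stratify=y keeps the fraud ratio the same in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, clf.predict(X_te))

# Average precision summarizes the precision-recall curve over all thresholds.
ap = average_precision_score(y_te, clf.predict_proba(X_te)[:, 1])

print(cm)
print(f"Average precision: {ap:.3f}")
```

Reweighting typically raises recall at the cost of precision, so whether it helps depends on the relative cost of missed fraud and false alarms.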


In [90]:
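# Model Deployment and Production
# Save the trained model for future use
import joblib
joblib.dump(model, 'fraud_detection_model.pkl')

# Load the deployed model
model = joblib.load('fraud_detection_model.pkl')

# Get new credit card transactions (the same file is reused here as a stand-in)
new_data = pd.read_csv('creditcard.csv')

# Preprocess the new data the same way as the training data: normalize Amount
# with the statistics used during training and drop the label column
new_data['normalized_amount'] = (new_data['Amount'] - df['Amount'].mean()) / df['Amount'].std()
new_features = new_data.drop('Class', axis=1)

# Make predictions using the deployed model
new_predictions = model.predict(new_features)

# Perform necessary actions based on the predictions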

In [91]:
new_credit_card_data = pd.read_csv('creditcard.csv')

## Insights

The model reaches an accuracy of approximately 0.9989, but this figure is dominated by the majority class: fraud accounts for only about 0.17% of the transactions in this dataset, so a model that never flags fraud would score almost as high. Precision and recall are the more informative metrics here.
Precision is the share of transactions flagged as fraudulent that are actually fraudulent, so it measures how well the model avoids false positives. A precision of around 0.6771 means that approximately 67.71% of the transactions predicted as fraudulent are actually fraudulent.
Recall is the share of actual fraudulent transactions that the model flags. A recall of approximately 0.6633 means the model captures around 66.33% of all fraudulent transactions and misses roughly a third of them.
The F1 score, the harmonic mean of precision and recall, is about 0.6701, which reflects the similar values of the two and leaves clear room for improvement on both.
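
All four metrics are functions of the confusion-matrix counts. The notebook does not print a confusion matrix, so the counts below are an assumption: they are one set of counts that reproduces the reported figures for a test split of 56,962 rows (20% of 284,807). The sketch also computes the accuracy of a trivial model that labels every transaction as legitimate, for comparison.

```python
# Assumed counts, inferred from the reported metrics rather than read from
# notebook output: true positives, false positives, false negatives, true negatives.
tp, fp, fn, tn = 65, 31, 33, 56833
total = tp + fp + fn + tn

precision = tp / (tp + fp)   # flagged transactions that are really fraud
recall = tp / (tp + fn)      # fraud cases that were flagged
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

# Baseline: predict "not fraud" for every row, so all legitimate rows are correct.
baseline_accuracy = (tn + fp) / total

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"accuracy={accuracy:.4f} baseline_accuracy={baseline_accuracy:.4f}")
```

Under these assumed counts, the gap between the model's accuracy and the always-legitimate baseline is small, which is why accuracy alone says little about fraud detection quality.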

## Business Impacts and Recommendations

The insights derived from the analysis of credit card transaction data can provide valuable information for various stakeholders within financial services.

Fraudulent Transaction Patterns: By analyzing historical data and applying machine learning techniques, the business can identify patterns and characteristics associated with fraudulent transactions. This can help in understanding the common tactics and behaviors of fraudsters, enabling the development of more effective fraud detection and prevention strategies.

Customer Segmentation: Analyzing customer data can help identify different segments based on spending habits, transaction frequency, or other relevant factors. This segmentation can provide insights into customer preferences, behavior, and profitability. It can assist in targeted marketing campaigns, personalized customer experiences, and identifying potential high-value customers.

Risk Assessment: Through data analysis, the bank can identify high-risk transactions or customers and prioritize its risk management efforts accordingly. By evaluating risk factors such as transaction size, location, and customer history, the business can create risk scores or models to support real-time decision-making and fraud prevention.
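
One way to turn a risk score into a real-time decision is to bucket the predicted fraud probability into actions. The following is a sketch under stated assumptions: the function name `triage`, the two thresholds, and the action labels are hypothetical, and in practice the probabilities could come from a fitted classifier via `model.predict_proba(X)[:, 1]`.

```python
def triage(fraud_probability: float,
           review_threshold: float = 0.3,
           block_threshold: float = 0.9) -> str:
    """Map a fraud probability to an action (thresholds are illustrative)."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= block_threshold:
        return "block"     # high risk: decline the transaction
    if fraud_probability >= review_threshold:
        return "review"    # medium risk: route to manual review
    return "approve"       # low risk: let the transaction through


# Hypothetical scores for three transactions.
scores = {"txn_a": 0.02, "txn_b": 0.45, "txn_c": 0.97}
actions = {txn: triage(p) for txn, p in scores.items()}
print(actions)
```

Lowering `review_threshold` catches more fraud at the cost of more manual reviews, so the thresholds would normally be tuned against the costs of each kind of error.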

Performance Evaluation: Analyzing the performance of various products, services, or campaigns can provide insights into their effectiveness and profitability. By tracking key performance metrics, such as conversion rates, customer acquisition costs, or revenue generated, the business can make data-driven decisions to optimize its strategies and improve overall performance.

Operational Efficiency: Analyzing transaction data can provide insights into operational inefficiencies, such as transaction errors, delays, or bottlenecks. By identifying these issues, the bank can take corrective actions to streamline processes, enhance customer experience, and reduce costs.

Customer Satisfaction and Loyalty: By analyzing customer feedback, complaints, or satisfaction surveys, the business can gain insights into customer sentiment and identify areas for improvement. Understanding customer needs and preferences can help enhance customer satisfaction, loyalty, and retention.