### Detect fraudulent credit card transactions

Build models to detect fraudulent credit card transactions. By the end of this project, customers will have created:
1. A reasonable credit card fraud detection model and training jobs in Datalab.
2. A reasonable credit card fraud detection model and training jobs in a GCP production environment.
3. A small UI on App Engine to show off the model.
4. Bonus: continuous training and prediction.

### Steps
1. Create a Google Cloud Storage bucket
2. Upload the sample data into the newly created Cloud Storage bucket
3. Use seaborn and matplotlib's pyplot to plot the data
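
Steps 1 and 2 are still marked as ToDo in the cells below. A minimal sketch of how they could be done is shown here, assuming the `google-cloud-storage` client library is installed and credentials are configured. The helper names `ensure_bucket` and `upload_file` and the bucket name in the usage comment are hypothetical and not part of this notebook.

```python
def ensure_bucket(client, bucket_name, location="US"):
    """Return the named bucket, creating it first if it does not exist.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    bucket = client.lookup_bucket(bucket_name)  # None when the bucket is missing
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)
    return bucket


def upload_file(bucket, local_path, blob_name):
    """Upload a local file to the bucket and return its gs:// URI."""
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path)
    return "gs://{}/{}".format(bucket.name, blob_name)


# Example usage (needs google-cloud-storage and GCP credentials):
# from google.cloud import storage
# client = storage.Client()
# bucket = ensure_bucket(client, "my-fraud-demo-bucket")  # hypothetical name
# upload_file(bucket, "./data/creditcard.csv", "data/creditcard.csv")
```

Passing the client in as an argument keeps the helpers free of global state, so they can be tried against a stub before touching a real project.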

In [None]:
# ToDo Add code to create bucket automatically

In [None]:
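# Download the sample feature set data into a local ./data directory
# (create the directory first, otherwise wget cannot write the output file)
!mkdir -p ./data
!wget https://storage.googleapis.com/advanced-solutions-lab/fraud/creditcard.csv -O ./data/creditcard.csv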

In [None]:
# ToDo Add code to upload downloaded csv into bucket

In [None]:
from __future__ import division

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv("./data/creditcard.csv", sep=",")

In [None]:
df.head()

In [None]:
df.describe()

#### Check for missing values

In [None]:
# Largest per-column null count; 0 means no column has missing values
df.isnull().sum().max()

#### List the columns

In [None]:
df.columns

In [None]:
# Percentage of non-fraud (Class 0) and fraud (Class 1) transactions
print('No Frauds', round(df['Class'].value_counts()[0]/len(df) * 100,2), '% of the dataset')
print('Frauds', round(df['Class'].value_counts()[1]/len(df) * 100,2), '% of the dataset')

In [None]:
colors = ["#0101DF", "#DF0101"]

# Pass the column as x=...; recent seaborn releases reject it as a positional argument
sns.countplot(x='Class', data=df, palette=colors)
plt.title('Class Distributions \n (0: No Fraud || 1: Fraud)', fontsize=14)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(18,4))

amount_val = df['Amount'].values
time_val = df['Time'].values

# Note: distplot is deprecated as of seaborn 0.11 (histplot/displot replace it)
sns.distplot(amount_val, ax=ax[0], color='r')
ax[0].set_title('Distribution of Transaction Amount', fontsize=14)
ax[0].set_xlim([amount_val.min(), amount_val.max()])

sns.distplot(time_val, ax=ax[1], color='b')
ax[1].set_title('Distribution of Transaction Time', fontsize=14)
ax[1].set_xlim([time_val.min(), time_val.max()])

plt.show()

In [None]:
df.iloc[[0, 153758, 153759, 153760]]

In [None]:
df.to_csv('./data/creditcard_schema.csv', sep=',', encoding='utf-8', index=False)

In [None]:
!head $PWD/data/creditcard_schema.csv

<pre>
# Copyright 2018 Atos. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
</pre>