![Flowcast](https://raw.githubusercontent.com/interviewquery/takehomes/flowcast_1/flowcast_1/logo.png)
Please follow the instructions below to complete the data science
challenge. We expect it to take no longer than 1-2 hours. However, for
your convenience, we allow 24 hours from the time of receipt to submit
your code.

# Data Science Interview Challenge

The goal of this challenge is to showcase your machine learning skills
as well as your ability to write clean, readable, and well-documented
code.

# Problem Statement

You will use credit card transaction data to detect fraud. Fraud can
take many forms, from someone stealing a single credit card, to large
batches of stolen credit card numbers being used on the web, to a mass
compromise of credit card numbers stolen from a merchant via tools such
as card-skimming devices. Note that this dataset loosely resembles real
transactional data, but the entities and relations within it are purely
fictional. Please complete the following tasks:

1.  Load the data in `transactions.json`.

2.  Briefly describe the data structure, provide summary statistics, and
    include at least one plot.

3.  Build a model to determine whether a given transaction is fraudulent.
    Each transaction in the dataset has a field called `isFraud`.

    1.  Hint: This task should involve some feature engineering, although
        it can be limited.

4.  Provide an evaluation of model performance.

5.  Document your solution. Hint: Documentation should include:

    1.  Assumptions about the data

    2.  An explanation of methods, i.e., the algorithm used and why, which
        features were important, and any alternative methods considered

    3.  Conclusions about your solution

    4.  Further steps you would take if given more time

# Important Guidelines

Although your submission will be a Python notebook, feel free to use
any tools you are comfortable with.

To provide a fair evaluation, we ask that you:

-   Refrain from receiving help from others on this challenge.

-   Stick to the time limit of 1-2 hours. We will evaluate your
    submission knowing that you had limited time and expect a simple yet
    complete and working solution.

In [1]:
!git clone --branch flowcast_1 https://github.com/interviewquery/takehomes.git
%cd takehomes/flowcast_1
!ls

Cloning into 'takehomes'...
remote: Enumerating objects: 1968, done.
remote: Counting objects: 100% (1968/1968), done.
remote: Compressing objects: 100% (1222/1222), done.
remote: Total 1968 (delta 755), reused 1933 (delta 729), pack-reused 0 (from 0)
Receiving objects: 100% (1968/1968), 299.41 MiB | 15.58 MiB/s, done.
Resolving deltas: 100% (755/755), done.
/content/takehomes/flowcast_1
logo.png  metadata.json  takehomefile.ipynb  transactions.json


In [5]:
# import the necessary modules
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import json

In [4]:
# load the data: transactions.json holds one JSON object per line (JSON Lines)
data = []
with open("transactions.json") as file:
    for line in file:
        data.append(json.loads(line))
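
The loop above parses `transactions.json` one JSON object per line. pandas can read the same JSON Lines layout in a single call with `pd.read_json(..., lines=True)`. The sketch below uses two invented in-memory records (the values and the leading-zero IDs are hypothetical). It passes `dtype=False` on the assumption that identifier columns such as `accountNumber` should stay strings, as the `object` dtype in the `df.info()` output suggests. With the default `dtype=True`, pandas may convert numeric-looking strings to numbers.

```python
import io

import pandas as pd

# Two hypothetical records in JSON Lines format (one JSON object per line)
sample = io.StringIO(
    '{"accountNumber": "0001", "transactionAmount": 98.55, "isFraud": false}\n'
    '{"accountNumber": "0002", "transactionAmount": 74.51, "isFraud": true}\n'
)

# lines=True parses one record per line; dtype=False stops pandas from
# inferring new dtypes, so ID strings such as "0001" stay strings
df_sample = pd.read_json(sample, lines=True, dtype=False)
print(df_sample.dtypes)
```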

In [8]:
# turn the list of records into a DataFrame
df = pd.DataFrame(data)
# DataFrame.info() prints its own summary and returns None, so don't wrap it in print()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 29 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   accountNumber             100000 non-null  object 
 1   customerId                100000 non-null  object 
 2   creditLimit               100000 non-null  float64
 3   availableMoney            100000 non-null  float64
 4   currentBalance            100000 non-null  float64
 5   transactionDateTime       100000 non-null  object 
 6   transactionAmount         100000 non-null  float64
 7   merchantName              100000 non-null  object 
 8   acqCountry                100000 non-null  object 
 9   merchantCountryCode       100000 non-null  object 
 10  posEntryMode              100000 non-null  object 
 11  posConditionCode          100000 non-null  object 
 12  merchantCategoryCode      100000 non-null  object 
 13  cardPresent               100000 non-null  bo