# Business Understanding
During this stage, take some time to understand the questions posed by the data from a business perspective. Many of these questions are very general, and for training purposes they may be less important or applicable than later stages of the project, but it is good practice to begin any project with a general understanding of the problems you are trying to solve.

Consider the following questions and answer to the best of your ability, based on the data and project documentation.

#### What are the specific business objectives of this project? Define as precisely as possible.

- Improve the efficacy of fraudulent credit card transaction alerts by creating a machine learning model that accurately predicts the probability that a given transaction is fraudulent, with performance measured using area under the ROC curve.

#### What data are available to pursue those objectives?

- Identity data
    - The identity tables contain the following features:
        - Network connection information (IP, ISP, Proxy)
        - Digital signature (UA, Browser, OS, Version)
        - DeviceType (categorical)
        - DeviceInfo (categorical)
        - id_12 - id_38 (categorical)
- Transaction data
    - The transaction tables contain the following features:
        - TransactionDT: timedelta from a given reference datetime (not an actual timestamp)
        - TransactionAmt: transaction payment amount in USD
        - ProductCD: product code, the product for each transaction (categorical)
        - card1 - card6: payment card information, such as card type, card category, issue bank, country, etc. (categorical)
        - addr: address (categorical)
        - dist: distance
        - P_emaildomain: purchaser email domain (categorical)
        - R_emaildomain: recipient email domain (categorical)
        - C1-C14: counting, such as how many addresses are found to be associated with the payment card, etc. (actual meaning is masked)
        - D1-D15: timedelta, such as days between previous transaction, etc.
        - M1-M9: match, such as names on card and address, etc. (categorical)
        - Vxxx: Vesta engineered rich features, including ranking, counting, and other entity relations
- The transaction and identity tables can be joined on TransactionID
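
As a minimal sketch of the join described above, the example below merges two toy tables on TransactionID with pandas. The rows and values are made up for illustration, and the choice of a left join is an assumption (it keeps transactions that have no matching identity row); the real tables would be loaded from the project's data files instead.

```python
import pandas as pd

# Toy stand-ins for the transaction and identity tables (values are made up).
transactions = pd.DataFrame({
    "TransactionID": [1001, 1002, 1003],
    "ProductCD": ["W", "C", "W"],
})
identity = pd.DataFrame({
    "TransactionID": [1002],
    "DeviceType": ["mobile"],
})

# Assumption: a left join, so every transaction is kept even when it has
# no identity row; the identity columns are then NaN for those rows.
merged = transactions.merge(identity, on="TransactionID", how="left")
print(merged)
```

With a left join, the merged table has one row per transaction, which keeps it aligned with the target column.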

#### What resources are available? (programs, libraries, etc.)

- VS Code
- GitLab
- Anaconda Navigator
- Jupyter Notebook
- Python 3
- pandas
- NumPy
- scikit-learn
- statsmodels
- Matplotlib
- seaborn
- Plotly
- Keras

#### What are the success criteria for each of the project's objectives? Define as precisely as possible.

- The success criterion for the project's objective is maximizing area under the ROC curve (AUC)
- AUC summarizes performance across all possible decision thresholds: the best model assigns predicted probabilities close to 1 to actual positives (fraudulent transactions) and close to 0 to actual negatives (non-fraudulent transactions), so that fraudulent transactions are consistently scored higher than non-fraudulent ones when compared against the observed target values
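
To make the metric concrete, here is a small sketch that computes area under the ROC curve directly from its pairwise interpretation: the fraction of (fraudulent, non-fraudulent) pairs in which the fraudulent transaction receives the higher score. The helper name `pairwise_auc` and the labels and scores are hypothetical; in practice scikit-learn's `roc_auc_score` computes the same quantity.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [0, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.20, 0.80]

auc = pairwise_auc(y_true, y_score)
# Halving every score keeps the ranking, so the AUC is unchanged.
halved = pairwise_auc(y_true, [s / 2 for s in y_score])

print(round(auc, 4))  # 0.8333 (5 of the 6 pairs are ranked correctly)
print(auc == halved)  # True
```

Because only the ordering of the scores matters, no single decision threshold has to be chosen in order to evaluate a model with this metric.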

#### Describe the data mining problem type (regression, classification, clustering, etc.)

- This project is a classification problem: the model should predict the probability that each transaction in the set belongs to the positive class, that is, whether a given transaction is or is not fraudulent (0 <= isFraud <= 1)

#### What are the specific technical goals for the project?

- The specific technical goals of this project are as follows:
    - Create a binary classification model to predict the probability that a given transaction is fraudulent
    - The submission file should contain two columns under the header TransactionID,isFraud: the transaction ID and the predicted probability (between 0 and 1) that the transaction is fraudulent
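
The submission format above could be produced along these lines. This is only a sketch: the transaction IDs and probabilities are hypothetical placeholders for real model output, and the CSV is rendered to a string here rather than written to a file.

```python
import pandas as pd

# Hypothetical IDs and predicted fraud probabilities standing in for model output.
transaction_ids = [3000001, 3000002, 3000003]
fraud_proba = [0.02, 0.91, 0.15]

submission = pd.DataFrame({
    "TransactionID": transaction_ids,
    "isFraud": fraud_proba,
})

# Sanity check: every prediction must be a probability between 0 and 1.
assert submission["isFraud"].between(0, 1).all()

# index=False keeps the row index out of the file, leaving exactly two columns.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument to `to_csv` writes the same content to disk.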
