# Fraud Prevention Case @ Mercado Livre

Welcome to FREEDOM, a leader in the online commerce and payment systems market in your region, with over 25 years of history and a successful growth track record.

At FREEDOM, Fraud Prevention is the reference team, both internally and externally, on all security topics for transactions within the company's ecosystem. It secures users' operations by applying tools, predictive-analytical models, and innovative technologies. It is one of the areas that enables the company to take risks and accelerate growth.
As a new member of the FREEDOM Fraud Prevention team, you have been assigned to your first data science project. Congratulations!
For this project, the Analytics team has already extracted a sample of transactions labeled as Fraud (`fraude` column equal to 1) or Non-Fraud (`fraude` column equal to 0). They also included the scores from the fraud prediction model currently in production (`score` column).

**Your main objectives in this project are:**
1. *Establish the baseline*: for this, it is necessary to evaluate the performance of the current machine learning model that is in production.
    - What is the predictive performance of the model? Which metrics are most appropriate for this analysis and why?
2. *Train new machine learning model(s) for fraud prediction*: these will be assessed as candidates to replace the current one.
    - You are free to generate new features if appropriate, and to include any technique or analysis you believe suits the case.
    - Specify the techniques and algorithms used both in the data preprocessing stage and to train the new model(s). Comment on your decisions.
    - Explicitly compare the new model(s) you trained with the current model in terms of predictive performance.
3. *Business metrics and decision making based on models*: define a cutoff point to reject transactions based on the output of the model.
    - Consider that FREEDOM's commission rate is 5% of the value of a correctly approved payment (`monto` column), and that for each approved fraudulent transaction we lose 100% of the payment value. The cutoff point should maximize FREEDOM's profit under this definition.
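
The profit rule in objective 3 can be made concrete with a small sketch. It assumes that a higher `score` means a riskier transaction and that a transaction is rejected when its score is at or above the cutoff; both are assumptions to check against the data. Rejected transactions are taken to contribute nothing to profit, and `profit_at_cutoff` and `best_cutoff` are hypothetical helper names, not part of the case material.

```python
import numpy as np

COMMISSION_RATE = 0.05  # 5% commission on correctly approved payments (from the case)


def profit_at_cutoff(amount, is_fraud, score, cutoff):
    """Profit when every transaction with score >= cutoff is rejected.

    Assumption: higher score means higher fraud risk.
    """
    amount = np.asarray(amount, dtype=float)
    is_fraud = np.asarray(is_fraud).astype(bool)
    approved = np.asarray(score, dtype=float) < cutoff

    # Approved legitimate payments earn the commission.
    gain = COMMISSION_RATE * amount[approved & ~is_fraud].sum()
    # Approved fraudulent payments lose 100% of their value.
    loss = amount[approved & is_fraud].sum()
    return gain - loss


def best_cutoff(amount, is_fraud, score, candidates):
    """Return (cutoff, profit) for the candidate cutoff with the highest profit."""
    profits = [profit_at_cutoff(amount, is_fraud, score, c) for c in candidates]
    best = int(np.argmax(profits))  # first maximum if there are ties
    return candidates[best], profits[best]


# Toy data for illustration only (not taken from the dataset).
amount = [100.0, 200.0, 50.0, 400.0]
is_fraud = [0, 0, 1, 0]
score = [10, 40, 80, 90]

cutoff, profit = best_cutoff(amount, is_fraud, score, candidates=list(range(0, 102)))
print(cutoff, profit)
```

On the case data, the same call would take `df["monto"]`, `df["fraude"]`, and `df["score"]` (or a new model's predicted probabilities with a matching candidate grid). Choosing the cutoff on one split and reporting profit on a held-out split avoids an optimistic estimate.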

**Deliverables**:
- A properly commented Jupyter notebook.

Requirements:
- The notebook must be self-contained, including the solutions, visualizations, and decisions made while solving the case.
- Generate appropriate figures and graphs, use markdown annotations, and explain the necessary inferences.
- Anyone should be able to read the notebook and understand each development stage and the reasoning behind it. You will also be evaluated on the clarity, effective visualizations, and storytelling of your analysis and modeling process.

**Files**:
- `meli_fraud_prevention_case.ipynb`: Python notebook containing the case description and information on how to install packages and read the dataset to be used.
- `fraud_dataset_v2.csv`: accessible at https://limewire.com/d/c4a2438f-b41d-40ee-955a-6d0b180b1a2d#mPBNISG8nbrYtdavEhExOXhEPrT0ls8QaOwfFkjyaG8
  
**Other instructions:**
- Feel free to create and use other files (folders, notebooks, scripts, etc.).
- When you are done, create a zip file with all your resources (all files required to run and understand your solution) and send it to us by replying to the email you received.

Last, but not least: have fun :) 

In [1]:
# If you'd like to install packages that aren't installed by default, list them here.
# This will ensure your notebook has all the dependencies and works everywhere

import sys
# The PyPI package is named 'scikit-learn'; the deprecated 'sklearn' alias fails to install.
!{sys.executable} -m pip install xgboost scikit-learn

In [12]:
# Import libraries

import pandas as pd
import matplotlib.pyplot as plt

In [130]:
# Attention! Please use this cell to load the dataset that will be needed in this case.

df = pd.read_csv('fraud_dataset_v2.csv')

In [131]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | a | b | c | d | e | f | g | h | i | ... | n | o | p | q | r | s | fecha | monto | score | fraude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 4 | 0.7685 | 94436.24 | 20.0 | 0.444828 | 1.0 | BR | 5 | Máquininha Corta Barba Cabelo Peito Perna Pelo... | ... | 1 |  | N | 0.4 | 94436 | 0 | 2020-03-27 11:51:16 | 5.64 | 66.0 | 0 |
| 1 | 1 | 4 | 0.755 | 9258.5 | 1.0 | 0.0 | 33.0 | BR | 0 | Avental Descartavel Manga Longa - 50 Un. Tnt ... | ... | 1 | Y | N | 0.02 | 9258 | 0 | 2020-04-15 19:58:08 | 124.71 | 72.0 | 0 |
| 2 | 2 | 4 | 0.7455 | 242549.09 | 3.0 | 0.0 | 19.0 | AR | 23 | Bicicleta Mountain Fire Bird Rodado 29 Alumini... | ... | 1 |  | N | 0.06 | 242549 | 0 | 2020-03-25 18:13:38 | 339.32 | 95.0 | 0 |
| 3 | 3 | 4 | 0.7631 | 18923.9 | 50.0 | 0.482385 | 18.0 | BR | 23 | Caneta Delineador Carimbo Olho Gatinho Longo 2... | ... | 1 |  | Y | 0.98 | 18923 | 100 | 2020-04-16 16:03:10 | 3.54 | 2.0 | 0 |
| 4 | 4 | 2 | 0.7315 | 5728.68 | 15.0 | 0.0 | 1.0 | BR | 2 | Resident Evil Operation Raccoon City Ps3 | ... | 1 |  | N | 0.28 | 5728 | 0 | 2020-04-02 10:24:45 | 3.53 | 76.0 | 1 |
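
With the `score` and `fraude` columns visible in the preview, the baseline evaluation from objective 1 could start from a sketch like the one below. It uses scikit-learn's `roc_auc_score` and `average_precision_score`, which are threshold-free and therefore suit a score-based model on data where fraud is presumably the minority class. The helper name `baseline_metrics` is hypothetical, and the code assumes that a higher score means a higher fraud risk and that no scores are missing.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def baseline_metrics(y_true, y_score):
    """Threshold-free ranking metrics for a fraud score.

    Assumption: higher y_score means higher fraud risk.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    return {
        # Probability that a random fraud outranks a random non-fraud.
        "roc_auc": roc_auc_score(y_true, y_score),
        # Average precision summarizes the precision-recall curve.
        "pr_auc": average_precision_score(y_true, y_score),
        # Share of fraud labels, useful context for reading pr_auc.
        "fraud_rate": float(y_true.mean()),
    }


# Toy labels and scores for illustration only (not taken from the dataset).
y_true = [0, 0, 1, 0, 1]
y_score = [10, 35, 80, 60, 90]

metrics = baseline_metrics(y_true, y_score)
print(metrics)
```

On the case data the call would be `baseline_metrics(df["fraude"], df["score"])`. A ROC AUC well below 0.5 would suggest the score runs in the opposite direction (low score means risky), in which case the score should be negated before any comparison with new models.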
