# **Credit fraud detection with machine learning** (Ricardo Bustos Carreón)
We start from a data set with 7300 entries and 31 variables, of which 29 are unknown and two are known, plus the transaction type. Of these entries, 7000 are normal transactions and 300 are fraudulent. Our goal is to learn to identify whether a transaction is legitimate or fraudulent, since fraudulent transactions are costly for both the customer and the company, so we begin with an analysis based on machine learning.

## **Exploratory data analysis**
Now that we know our variables, we proceed to the analysis. We cannot decide whether a transaction is fraudulent from the mean or the maximum, because the distribution of the monetary value of all transactions is heavily skewed. The vast majority of transactions are relatively small, and only a small fraction come close to the maximum.
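
To make the point about skew concrete, here is a minimal sketch that compares the mean and the median of the five `Amount` values visible in the sample rows printed in this notebook. Five rows are far too few to characterize the full data set, so this only illustrates how a single larger value pulls the mean away from the typical transaction; the helper name `summarize` is an assumption introduced for this example.

```python
from statistics import mean, median

# Amount values copied from the five sample rows printed in this notebook
amounts = [6.95, 25.00, 0.76, 8.97, 1.00]

def summarize(values):
    """Return the mean, median and maximum of a list of amounts."""
    return {"mean": mean(values), "median": median(values), "max": max(values)}

summary = summarize(amounts)
print(summary)
```

In this small sample the mean lies above the median because of the single 25.00 transaction, which is the kind of distortion a skewed distribution produces.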

Now, what about the class distributions? How many transactions are fraudulent and how many are not? As we might expect, most transactions are not fraudulent (95.890%), while only 4.110% are fraudulent.
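
The percentages above follow directly from the counts given in the introduction (7000 normal and 300 fraudulent transactions). The sketch below rebuilds the labels from those counts rather than from the real file, and assumes that class 0 means normal and class 1 means fraudulent; `class_shares` is a hypothetical helper name.

```python
from collections import Counter

# Labels rebuilt from the stated counts (assumption: 0 = normal, 1 = fraudulent)
labels = [0] * 7000 + [1] * 300

def class_shares(labels):
    """Return the percentage of entries that fall in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: round(100 * n / total, 3) for cls, n in counts.items()}

shares = class_shares(labels)
print(shares)  # {0: 95.89, 1: 4.11}
```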
Let's now visualize it with our algorithm:


In [1]:
def parenthesis(string):
    string = string.replace(' ','') # Remove any spaces the string may contain
    if string.count('(') != string.count(')'): # Unequal counts can never be paired up
        return False
    while '()' in string: # Keep going while there are adjacent pairs left to remove
        string = string.replace('()','') # Remove the pairs that are already matched
    return string == '' # The string is valid only if every parenthesis was paired and removed

### Tests
print (parenthesis('(())((()())())')) #true
print (parenthesis('()')) #true
print (parenthesis(')(()))')) #false
print (parenthesis(')(')) #false
print (parenthesis('(')) #false
print (parenthesis(')((')) #false
print (parenthesis(')))))))))')) #false


The data frame has 7300 rows and 31 columns.
         Var        V1        V2        V3  ...       V27       V28  Amount  Class
1652  105397  1.287631  0.410746  0.151841  ...  0.030363  0.020647    6.95      0
4902  129463  1.454601 -1.063245  0.440964  ...  0.021222  0.011972   25.00      0
2568   15877  1.219136  0.535374 -0.482669  ... -0.041098  0.032509    0.76      0
947   169560 -1.317357  1.609457 -1.498172  ...  0.273136 -0.035843    8.97      0
839   233197  2.113497  0.575076 -2.803749  ...  0.023942 -0.002685    1.00      0

[5 rows x 31 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7300 entries, 0 to 7299
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Var     7300 non-null   int64  
 1   V1      7300 non-null   float64
 2   V2      7300 non-null   float64
 3   V3      7300 non-null   float64
 4   V4      7300 non-null   float64
 5   V5      7300 non-null   float64
 6   V6      7300 non-null   f

## **Level 2 algorithm**
### **Description:**
Your task is to write a function which formats a duration, given as a number of seconds, in a human-friendly way.
The function must accept a non-negative integer. If it is zero, it just returns 'now'. Otherwise, the duration is expressed as a combination of years, days, hours, minutes and seconds.
For the purpose of this problem, a year is 365 days and a day is 24 hours.
Note that spaces are important.
### Detailed rules
The resulting expression is made of components like 4 seconds, 1 year, etc. In general, a positive integer and one of the valid units of time, separated by a space. The unit of time is used in plural if the integer is greater than 1.
The components are separated by a comma and a space (', '), except the last component, which is separated by ' and ', just like it would be written in English.
A more significant unit of time will occur before a less significant one. Therefore, 1 second and 1 year is not correct, but 1 year and 1 second is.
Different components have different units of time, so there are no repeated units as in 5 seconds and 1 second.
A component will not appear at all if its value happens to be zero. Hence, 1 minute and 0 seconds is not valid, but it should be just 1 minute.
A unit of time must be used 'as much as possible'. It means that the function should not return 61 seconds, but 1 minute and 1 second instead. Formally, the duration specified by a component must not be greater than any valid more significant unit of time.


In [2]:
### Level 2 algorithm
import math # Library used for math.trunc
def time(n): 
    if n == 0:
        return 'now' # If the time is zero we immediately return 'now'
    else:
        diva = math.trunc(n/31536000) # Whole years (365 days each)
        n = n-diva*31536000 # Update n with the seconds left over
        divd = math.trunc(n/86400) # Whole days
        n = n-divd*86400 # Update n
        divh = math.trunc(n/3600) # Whole hours
        n = n-divh*3600 # Update n
        divm = math.trunc(n/60) # Whole minutes
        divs = n-divm*60 # Remaining seconds
        divlists = [diva,divd,divh,divm,divs] # This list stores the value of each unit of time
        year = str(diva)+' years' if diva > 1 or diva < -1 else str(diva)+' year' # Convert to strings
        day = str(divd)+' days' if divd > 1 or divd < -1 else str(divd)+' day' # Convert to strings
        hour = str(divh)+' hours' if divh > 1 or divh < -1 else str(divh)+' hour' # Convert to strings
        minute = str(divm)+' minutes' if divm > 1 or divm < -1 else str(divm)+' minute' # Convert to strings
        second = str(divs)+' seconds' if divs > 1 or divs < -1 else str(divs)+' second' # Convert to strings
        tl = [year,day,hour,minute,second] # List of formatted units of time
        date = [tl[i] for i in range(len(divlists)) if divlists[i] != 0] # Keep only the units whose value is not zero
        if len(date) == 1:
            return date[0] # A single component needs no separator
        return ', '.join(date[:-1])+' and '+date[-1] # Commas between components, ' and ' before the last one

### Tests
print (time(31536000+7200+120+59))
print (time(62))
print (time(3662))
print (time(0))
print (time(-31536000-7200-120-59))
print (time(-62))
print (time(-3662))
print (time(-0))


1 year, 2 hours, 2 minutes and 59 seconds
1 minute and 2 seconds
1 hour, 1 minute and 2 seconds
now
-1 year, -2 hours, -2 minutes and -59 seconds
-1 minute and -2 seconds
-1 hour, -1 minute and -2 seconds
now
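
For comparison, the same decomposition can be sketched with the built-in `divmod`, which returns the quotient and the remainder in one step. This is an alternative illustration, not the notebook's implementation: the name `format_duration` is an assumption, and unlike the function above it rejects negative input, following the description's requirement of a non-negative integer.

```python
def format_duration(seconds):
    """Format a non-negative number of seconds as a human-readable string."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if seconds == 0:
        return "now"
    # Units from most to least significant, with their length in seconds
    units = [("year", 31536000), ("day", 86400), ("hour", 3600),
             ("minute", 60), ("second", 1)]
    parts = []
    for name, size in units:
        value, seconds = divmod(seconds, size)  # quotient and leftover seconds
        if value:  # skip units whose value is zero
            parts.append(f"{value} {name}" + ("s" if value > 1 else ""))
    if len(parts) == 1:
        return parts[0]  # a single component needs no separator
    return ", ".join(parts[:-1]) + " and " + parts[-1]

print(format_duration(3662))  # 1 hour, 1 minute and 2 seconds
```

Integer arithmetic with `divmod` avoids the floating-point division used with `math.trunc`, which matters only for very large inputs.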


## **Level 3 algorithm**
### **Description:**
My friend John likes to go to the cinema. He can choose between system A and system B.
System A : he buys a ticket (15 dollars) every time
System B : he buys a card (500 dollars) and a first ticket for 0.90 times the ticket price, then for each additional ticket he pays 0.90 times the price paid for the previous ticket.
John wants to know how many times he must go to the cinema so that the final result of System B, when rounded up to the next dollar, will be cheaper than System A.
The function movie has 3 parameters: card (price of the card), ticket (normal price of ticket), perc (fraction of what he paid for the previous ticket) and returns the first n such that


In [2]:
### Level 3 algorithm
import math # Library used for math.ceil

def movie(card, ticket, perc):
    SA = 0 # Running total for System A
    SB = card # Running total for System B, starting with the price of the card
    count = 0 # Number of visits so far
    while math.ceil(SB) >= SA: # Stop once System B, rounded up, is strictly cheaper than System A
        count+=1
        SA = ticket*count
        SB = SB+ticket*(perc**count)
    return 'You must go '+str(count)+' times to the cinema, with card the total price is '+str(math.ceil(SB))+', with tickets '+str(SA)

### Tests
print (movie(500,15,0.9))
print (movie(100,10,0.95))


You must go 43 times to the cinema, with card the total price is 634, with tickets 645
You must go 24 times to the cinema, with card the total price is 235, with tickets 240
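
The System B total is a geometric series, so it can also be written in closed form as card + ticket · perc · (1 − perc^n) / (1 − perc). The sketch below uses that formula to cross-check the two results above. It assumes 0 < perc < 1, and the names `system_b_total` and `first_cheaper_visit` are hypothetical helpers introduced for this example.

```python
import math

def system_b_total(card, ticket, perc, n):
    """Closed-form cost of System B after n visits (assumes 0 < perc < 1)."""
    return card + ticket * perc * (1 - perc**n) / (1 - perc)

def first_cheaper_visit(card, ticket, perc):
    """Smallest n for which System B, rounded up, is cheaper than System A."""
    n = 1
    while math.ceil(system_b_total(card, ticket, perc, n)) >= ticket * n:
        n += 1
    return n

print(first_cheaper_visit(500, 15, 0.9))   # 43
print(first_cheaper_visit(100, 10, 0.95))  # 24
```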
