# Business Intelligence Test

### Name: RAFAEL ALEJANDRO MARTINEZ VASQUEZ
#### Date: 18-06-2024

Instructions: This exam consists of problems to be solved by writing code in Python. Be sure to explain clearly wherever necessary and to comment your code.

**Problem 1**
Let’s take a square of side length 1 m that contains a circle of diameter 1 m, as shown in the figure below.

<!-- ![image](circle.jpg) -->
<p align="center">
    <img src="circle.jpg" alt="drawing" width="200"/>
</p>

a) Write a code to determine, through simulations, the probability that if we pick a random point inside the square, it lies inside the circle. Calculate this probability using at least three different numbers of iterations, with distinct orders of magnitude.

In [4]:
# Generate random points with a function:

import numpy as np

def generadorAleatorios(numIteraciones):
    conteoPuntosDentroCirculo = 0
    for _ in range(numIteraciones):
        # Random coordinates between 0 and 1
        x, y = np.random.uniform(0, 1), np.random.uniform(0, 1)
        # Recall that the equation of a circle is (x-h)^2 + (y-k)^2 = r^2
        if (x - 0.5)**2 + (y - 0.5)**2 <= 0.25:
            conteoPuntosDentroCirculo += 1
    return conteoPuntosDentroCirculo / numIteraciones

# Use a total of 10^7 iterations
numIteraciones = 10**7

# Compute the probability
probability = generadorAleatorios(numIteraciones)
print(f"Iterations: {numIteraciones}, Estimated probability: {probability}")


Iterations: 10000000, Estimated probability: 0.7854487
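
Part a) asks for at least three iteration counts with distinct orders of magnitude. The following is a minimal sketch of how that comparison could be run. It is not the code that produced the result above: the function name `estimate_inside_probability`, the seed, and the three counts are illustrative assumptions, and it uses the standard-library `random` module so that it has no dependencies.

```python
import math
import random

def estimate_inside_probability(num_points, seed=None):
    """Estimate P(point lies inside the circle) from uniform samples in the unit square."""
    rng = random.Random(seed)  # seeded generator so that runs are repeatable
    hits = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        # Circle of radius 0.5 centred at (0.5, 0.5)
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            hits += 1
    return hits / num_points

# Three iteration counts with distinct orders of magnitude (illustrative choices)
estimates = {n: estimate_inside_probability(n, seed=0) for n in (10**2, 10**4, 10**6)}
for n, p in estimates.items():
    print(f"Iterations: {n:>9,d}  estimate: {p:.5f}  abs. error: {abs(p - math.pi / 4):.5f}")
```

The absolute error is measured against the analytical value π/4 and is expected to shrink roughly as one over the square root of the number of iterations.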


b) Analytically, what should this probability be?

The probability is simply the ratio of the two areas:
    $$P = \frac{A_{\text{circle}}}{A_{\text{square}}}$$

1. Area of the circle: 
    $$A_{\text{circle}} = \pi r^2 = \pi \cdot 0.5^2 = \frac{\pi}{4}\ \text{m}^2$$
2. Area of the square: 
    $$A_{\text{square}} = 1\ \text{m} \cdot 1\ \text{m} = 1\ \text{m}^2$$

This gives:
    $$P = \frac{\pi/4}{1} = \frac{\pi}{4} \approx 0.7854$$


c) According to b), the code you wrote in a) can be used to approximate what famous mathematical constant? Modify the number of iterations in your code until you can approximate this number correctly to four decimal places.

In [9]:
import math
resultado = math.pi / 4
diferencia = resultado - probability
# Interpolate the values inside a single f-string; braces placed outside the string print Python sets
print(f"The estimated probability is {probability:.4f}, while the analytical value is {resultado}, a difference of {diferencia} between the two.")


The estimated probability is 0.7854, while the analytical value is 0.7853981633974483, a difference of -5.0536602551720655e-05 between the two.


How many iterations did you require?

10,000,000
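
One way to reason about the required number of iterations is through the standard error of the estimate. The sketch below assumes independent samples, so the number of hits is binomial with success probability π/4, and the estimate of π is four times the hit fraction. The function name `pi_standard_error` is an illustrative assumption.

```python
import math

p = math.pi / 4  # analytical probability from part b)

def pi_standard_error(num_points):
    """Standard error of 4 * (hits / num_points) when hits ~ Binomial(num_points, p)."""
    return 4 * math.sqrt(p * (1 - p) / num_points)

for n in (10**5, 10**7, 10**9):
    print(f"N = {n:.0e}: standard error of the pi estimate ~ {pi_standard_error(n):.1e}")
```

Under these assumptions the standard error at 10^7 iterations is about 5e-4, so whether a single run of that size matches π to four decimal places depends on the particular random draw.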

**Problem 2**
Suppose an IPv42 address is represented by 4 decimal numbers from 0 to 420, separated by periods “.”, e.g. 192.10.24.2. An IPv62 address is represented by 8 groups of 4 numbers in base 18, separated by colons “:”, e.g. 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is a valid address. The zeros to the left of the digits can be omitted and they are not case sensitive, so that 2001:db8:85a3::8A2E:370:7334 is also a valid address. For simplicity, for this problem we will not allow the case where we have double colons: “::”. Write a function that verifies if an address is IPv42, IPv62, or neither.

In [18]:
# IPv42:
#   4 decimal numbers
#   Separator: .
#   Range: 0-420

import re

# Validate an IPv42 address
def direccionValidaIp42(direccion):
    # Split on the separator
    parts = direccion.split('.')
    # Number of groups
    if len(parts) != 4:
        return False
    # Validate the range; isascii() rejects Unicode digits such as '²' that int() cannot parse
    for part in parts:
        if not (part.isascii() and part.isdigit()) or not (0 <= int(part) <= 420):
            return False
    return True

# IPv62:
#  8 groups
#  1-4 digits in base 18 (0-9, a-h)
#  Separator: :
#  Case-insensitive
#  Double colons (::) are not allowed

# Validate an IPv62 address
def direccionValidaIp62(direccion):
    # Double colons are not considered
    if '::' in direccion:
        return False
    # Split on the separator
    parts = direccion.split(':')
    # Number of groups
    if len(parts) != 8:
        return False
    for part in parts:
        # Number of digits
        if len(part) == 0 or len(part) > 4:
            return False
        # Digits must be base 18; fullmatch avoids '$' accepting a trailing newline
        if not re.fullmatch(r'[0-9a-hA-H]+', part):
            return False
    return True

# Combine both functions
def validar_direccionIP(address):
    if direccionValidaIp42(address):
        return "IPv42"
    elif direccionValidaIp62(address):
        return "IPv62"
    else:
        return "neither"
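
For comparison, the same rules can be condensed into a single function. This is a sketch under the rules stated in the problem, not a replacement for the functions above. The names `classify_address` and `IPV62_GROUP` are hypothetical.

```python
import re

# One group of an IPv62 address: 1 to 4 base-18 digits, case-insensitive
IPV62_GROUP = re.compile(r"[0-9a-hA-H]{1,4}")

def classify_address(address):
    """Return 'IPv42', 'IPv62' or 'neither' for the given string."""
    dotted = address.split(".")
    # IPv42: exactly 4 decimal numbers, each between 0 and 420
    if len(dotted) == 4 and all(
        part.isascii() and part.isdigit() and int(part) <= 420 for part in dotted
    ):
        return "IPv42"
    groups = address.split(":")
    # IPv62: exactly 8 non-empty groups; an empty group (from '::') fails fullmatch
    if len(groups) == 8 and all(IPV62_GROUP.fullmatch(group) for group in groups):
        return "IPv62"
    return "neither"

# The two example addresses from the problem statement
print(classify_address("192.10.24.2"))
print(classify_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
```

Because an empty group never matches the pattern, the double-colon case is rejected without a separate check.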


Finally, take the following list of possible IPs:

In [15]:
IP_list = [
'197.10.2462.2',
'2001:0dbi:85a3:0000:0000:8f2e:0370:7334',
'321.10.324.2',
'2001:0db8:85a3:0000:0000:8a2e:0370:7334:8412',
'192.10.24.2.5',
'3001:1db8:85b3:0020:4509:8a2f:0370',
'194.102.245.251:7334',
'3001:1db8:85b3:0020:45709:8a2f:0370',
'192.10.2.4.2.5'
]

And output a pandas DataFrame containing two columns: one named “String”, containing the strings in the above list, and another named “IP type”, indicating what type of IP it is (IPv42, IPv62, or neither).

In [19]:
import pandas as pd

data = {
    "String": IP_list,
    "IP type": [validar_direccionIP(ip) for ip in IP_list]
}

df = pd.DataFrame(data)
df

| | String | IP type |
|---|---|---|
| 0 | 197.10.2462.2 | neither |
| 1 | 2001:0dbi:85a3:0000:0000:8f2e:0370:7334 | neither |
| 2 | 321.10.324.2 | IPv42 |
| 3 | 2001:0db8:85a3:0000:0000:8a2e:0370:7334:8412 | neither |
| 4 | 192.10.24.2.5 | neither |
| 5 | 3001:1db8:85b3:0020:4509:8a2f:0370 | neither |
| 6 | 194.102.245.251:7334 | neither |
| 7 | 3001:1db8:85b3:0020:45709:8a2f:0370 | neither |
| 8 | 192.10.2.4.2.5 | neither |


**Problem 3**
You are given the task of creating a very simple scoring algorithm, that will be given a set of weights and a set of ‘subscores’, and should output a final score, which is just the dot product of the weights and subscores.
For example, let’s take the weights for a 'Demographic Score', as given by the following dictionary:

In [20]:
dem_weights = {"age": 0.3,
            "gender": 0.2,
            "location":{"security":0.5,
                        "schooling":0.3,
                        "bancarization":0.2
                        }
            }

The subscores for the Demographic Score will be given by a dictionary with the same structure, but different values, for example:

In [21]:
dem_subscores = {"age": 60,
            "gender": 45,
            "location":{"security":23,
                        "schooling":46,
                        "bancarization":39
                        }
            }

The Final Score for these particular weights and subscores should be, taking the product of the matching keys in both dictionaries, 60.1

Define a class, called 'Model', that takes as arguments both the weights and subscores of a given user, for any given model. Construct a method called get_finalscore() that returns the final score exactly as previously described: a dot product between the keys of both dictionaries.
Keep in mind that the structure of the dictionaries can vary between models.

In [23]:
class Model:
    # Initialize the class
    # using names similar to the dictionaries
    def __init__(self, weights, subscores):
        self.weights = weights
        self.subscores = subscores
    
    # Bring every element of the dictionary down to the same level
    def flatten_dict(self, dictionary, parent_key='', sep='_'):
        items = []
        # Iterate over the dictionary
        for k, v in dictionary.items():
            # Build the new keys
            new_key = f"{parent_key}{sep}{k}" if parent_key else k
            if isinstance(v, dict):
                # If a key holds another dictionary, call the function again
                items.extend(self.flatten_dict(v, new_key, sep=sep).items())
            else:
                # Append the new key with its value to the list
                items.append((new_key, v))
                
        # Return the list as a dictionary
        return dict(items)
    
    # Named get_finalscore as the problem statement requires
    def get_finalscore(self):
        # Flatten the dictionaries we are given
        flat_weights = self.flatten_dict(self.weights)
        flat_subscores = self.flatten_dict(self.subscores)
        # Dot product over the keys present in both dictionaries
        final_score = sum(flat_weights[key] * flat_subscores[key] for key in flat_weights if key in flat_subscores)
        return final_score

    # Alias so that calls using the original method name keep working
    scoreFinal = get_finalscore


model = Model(dem_weights, dem_subscores)
print(model.get_finalscore())  # Expected output: 60.1


60.099999999999994
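
The printed value differs from 60.1 only by floating-point rounding. As an alternative to flattening, the dot product can also be computed by walking both dictionaries in parallel. The sketch below is one possible approach, not the method used above. The function name `nested_dot` is hypothetical, and it assumes that keys missing from the subscores contribute nothing, matching the behaviour of the class.

```python
import math

def nested_dot(weights, subscores):
    """Sum weight * subscore over the matching leaves of two nested dictionaries."""
    total = 0.0
    for key, weight in weights.items():
        if key not in subscores:
            continue  # assumption: a key absent from subscores contributes nothing
        score = subscores[key]
        if isinstance(weight, dict) and isinstance(score, dict):
            total += nested_dot(weight, score)  # recurse into the sub-dictionary
        elif isinstance(weight, dict) or isinstance(score, dict):
            raise TypeError(f"Structure mismatch at key {key!r}")
        else:
            total += weight * score
    return total

# Example dictionaries from the problem statement
dem_weights = {"age": 0.3, "gender": 0.2,
               "location": {"security": 0.5, "schooling": 0.3, "bancarization": 0.2}}
dem_subscores = {"age": 60, "gender": 45,
                 "location": {"security": 23, "schooling": 46, "bancarization": 39}}

score = nested_dot(dem_weights, dem_subscores)
print(round(score, 2))  # 60.1
```

Walking the two dictionaries together avoids building joined keys, so a top-level key that already contains the separator cannot collide with a nested one.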


Obtain the final score for the following models by constructing objects of the Model class:
- Demographic Model (you can find an example of weights in the dem_weights json file)
- Credit Model (you can find an example of weights in the credit_weights json file)
- Transactions Model (you can find an example of weights in the txns_weights json file)

Your code should work for the examples given, and any other set of weights/scores, with different structures.

Note: Only the relative weights are provided, not a dict of subscores, so I used the same weights dict as the subscores.

In [27]:
import json

with open('dem_weights.json', 'r') as fileDemo:
    # Load the JSON content into a dictionary
    dataDemo = json.load(fileDemo)


demo_weights = dataDemo
demo_subscore = dataDemo

modelDemographic = Model(demo_weights, demo_subscore)
print(modelDemographic.scoreFinal()) 


0.51


In [29]:
with open('credit_weights.json', 'r') as fileCred:
    # Load the JSON content into a dictionary
    dataCred = json.load(fileCred)


cred_weights = dataCred
cred_subscore = dataCred

modelCredit = Model(cred_weights, cred_subscore)
print(modelCredit.scoreFinal()) 

3.27


In [30]:
with open('txns_weights.json', 'r') as fileTxns:
    # Paso 2: Cargar el contenido JSON en un diccionario
    dataTxns = json.load(fileTxns)


txns_weights = dataTxns
txns_subscore = dataTxns

modelTxns = Model(txns_weights, txns_subscore)
print(modelTxns.scoreFinal()) 

0.5700000000000002
