### Online Shoppers Intention

- **Dataset Description**: This dataset contains feature vectors for 12,330 sessions. It was constructed so that each session belongs to a different user within a 1-year period, which avoids bias towards a specific campaign, special day, user profile, or period.

Source: [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intention+dataset)

| Feature                     | Description                                                                                                          |
|-----------------------------|----------------------------------------------------------------------------------------------------------------------|
| **Administrative**              | The number of pages of this type (administrative) visited by the user in that session.                                |
| **Administrative_Duration**     | The total amount of time (in seconds) spent by the user on administrative pages during the session.                  |
| **Informational**               | The number of informational pages visited by the user in that session.                                                |
| **Informational_Duration**      | The total amount of time (in seconds) spent by the user on informational pages during the session.                    |
| **ProductRelated**              | The number of product-related pages visited by the user in that session.                                              |
| **ProductRelated_Duration**     | The total amount of time (in seconds) spent by the user on product-related pages during the session.                  |
| **BounceRates**                 | The average bounce rate of the pages visited by the user. The bounce rate of a page is the percentage of visitors who enter the site on that page and leave without viewing any other page. |
| **ExitRates**                   | The average exit rate of the pages visited by the user. The exit rate of a page is the percentage of all views of that page that were the last view in their session. |
| **PageValues**                  | The average value of the pages visited by the user. This metric is often used as an indicator of how valuable a page is in terms of generating revenue. |
| **SpecialDay**                  | Indicates how close the session date is to a special day (e.g., Mother's Day, Valentine's Day), when sessions are more likely to end in a transaction. |
| **Month**                       | The month of the year in which the session occurred.                                                                  |
| **OperatingSystems**            | The operating system used by the user, encoded as an integer.                                                        |
| **Browser**                     | The browser used by the user, encoded as an integer.                                                                 |
| **Region**                      | The region from which the user is accessing the website, encoded as an integer.                                      |
| **TrafficType**                 | The type of traffic (e.g., direct, paid search, organic search, referral), encoded as an integer.                    |
| **VisitorType**                 | A categorization of users (e.g., `Returning_Visitor`, `New_Visitor`).                                                |
| **Weekend**                     | A boolean indicating whether the session occurred on a weekend.                                                       |
| **Revenue**                     | A binary variable indicating whether the session ended in a transaction (purchase).                                   |

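The difference between the bounce rate and the exit rate in the table above is easy to blur, so here is a minimal sketch of the two page-level definitions. The page names and sessions are hypothetical, and the formulas follow the usual web-analytics convention; the dataset itself only stores per-session averages of these values, and how exactly they were computed is an assumption here.

```python
from collections import Counter

# Hypothetical sessions: each one is the ordered list of pages a visitor viewed.
sessions = [
    ["home"],
    ["home", "product", "cart"],
    ["product"],
    ["product", "home"],
    ["home", "product"],
]

entrances = Counter(s[0] for s in sessions)                  # sessions starting on a page
bounces = Counter(s[0] for s in sessions if len(s) == 1)     # single-page sessions
exits = Counter(s[-1] for s in sessions)                     # sessions ending on a page
pageviews = Counter(page for s in sessions for page in s)    # all views of a page


def bounce_rate(page):
    """Share of sessions entering on `page` that viewed nothing else."""
    return bounces[page] / entrances[page] if entrances[page] else 0.0


def exit_rate(page):
    """Share of all views of `page` that were the last view of a session."""
    return exits[page] / pageviews[page] if pageviews[page] else 0.0


for page in sorted(pageviews):
    print(f"{page}: bounce={bounce_rate(page):.2f} exit={exit_rate(page):.2f}")
```

A page that is never an entry point (such as `cart` here) has no meaningful bounce rate, which this sketch reports as 0.0, while it can still have a high exit rate.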

- Objectives:

0. Target column: **Revenue**
1. Data cleaning.
2. Exploratory data analysis.
3. Test classification models.
4. Show the **feature importance** of the columns.
5. Apply **_SMOTE_** for class balancing and rerun the models.
6. Define a neural network for classification (original dataset and balanced dataset).
7. For both neural network models, compute: _**confusion_matrix**_, _**roc_auc**_, _**f1-score**_, _**recall**_, _**precision**_, _**accuracy**_.
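The metrics named in the last item of the list above all derive from the four confusion-matrix counts. The following is a minimal pure-Python sketch of those formulas on made-up labels (the `y_true` and `y_pred` values and the helper names are hypothetical, not results from this dataset). ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with True as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif pred:          # predicted positive, actually negative
            fp += 1
        elif truth:         # predicted negative, actually positive
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up example: True means the session ended in a purchase (Revenue).
y_true = [True, False, False, True, False, False, True, False]
y_pred = [True, False, True, False, False, False, True, False]

print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```

In the notebook itself, the equivalent functions from `sklearn.metrics` (`confusion_matrix`, `roc_auc_score`, `f1_score`, `recall_score`, `precision_score`, `accuracy_score`) would normally be used instead. Because purchases are the minority class, precision, recall and F1 say more about a model here than accuracy alone.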

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv("Data/online_shoppers_intention.csv")

df

Unnamed: 0,Administrative,Administrative_Duration,Informational,Informational_Duration,ProductRelated,ProductRelated_Duration,BounceRates,ExitRates,PageValues,SpecialDay,Month,OperatingSystems,Browser,Region,TrafficType,VisitorType,Weekend,Revenue
0,0,0.0,0,0.0,1,0.000000,0.200000,0.200000,0.000000,0.0,Feb,1,1,1,1,Returning_Visitor,False,False
1,0,0.0,0,0.0,2,64.000000,0.000000,0.100000,0.000000,0.0,Feb,2,2,1,2,Returning_Visitor,False,False
2,0,0.0,0,0.0,1,0.000000,0.200000,0.200000,0.000000,0.0,Feb,4,1,9,3,Returning_Visitor,False,False
3,0,0.0,0,0.0,2,2.666667,0.050000,0.140000,0.000000,0.0,Feb,3,2,2,4,Returning_Visitor,False,False
4,0,0.0,0,0.0,10,627.500000,0.020000,0.050000,0.000000,0.0,Feb,3,3,1,4,Returning_Visitor,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12325,3,145.0,0,0.0,53,1783.791667,0.007143,0.029031,12.241717,0.0,Dec,4,6,1,1,Returning_Visitor,True,False
12326,0,0.0,0,0.0,5,465.750000,0.000000,0.021333,0.000000,0.0,Nov,3,2,1,8,Returning_Visitor,True,False
12327,0,0.0,0,0.0,6,184.250000,0.083333,0.086667,0.000000,0.0,Nov,3,2,1,13,Returning_Visitor,True,False
12328,4,75.0,0,0.0,15,346.000000,0.000000,0.021053,0.000000,0.0,Nov,2,2,3,11,Returning_Visitor,False,False
