## Media Campaign Cost Prediction

Predict the cost of running a media campaign for **FoodMarts** in the United States.

#### About the DataFrame

**Convenient Food Mart (CFM)** is a chain of convenience stores in the United States. The privately held company is headquartered in Mentor, Ohio, and currently has approximately 325 stores in the United States. Convenient Food Mart operates on a franchise system.

Convenient Food Mart was the third-largest convenience store chain in the country in 1988.

**The goal is to train a Machine Learning model that helps us predict the cost of media campaigns in food marts based on the features provided.**

Source: [Kaggle - Media Campaign Cost Prediction](https://www.kaggle.com/datasets/gauravduttakiit/media-campaign-cost-prediction)

- **Exploratory Data Analysis**:
    - Study the relationship between each column and the target column (**cost**) using visualizations.
    - Check for NaNs and duplicates.
    - Show the correlation matrix.
    - Show the number of unique values per column.
    - Which columns are categorical?
    - Is any column repeated?
    - Is this a regression or a classification problem?
    - Use RandomForest to see the **_feature_importance_** of the columns.
<br>
- **Machine Learning**:
    - Do a **_train_test_split_**. Use **_random_state = 42_**.
    - Try all the models and compute the metrics to find the one that best fits the data.
    - Run **_GridSearchCV_** to find the best parameters for that model.
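
The Machine Learning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual solution: the data is synthetic (generated with `make_regression` as a stand-in for the features and the **cost** target), only three regressors are compared, and the parameter grid is a small hypothetical one for a random forest. In practice the grid would target whichever model scored best.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the feature matrix (X) and the target column cost (y).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Hold out a test set; random_state=42 as the checklist asks.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A few candidate regressors (a subset chosen for illustration).
models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit each model and collect regression metrics on the test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, y_pred)),
        "R2": r2_score(y_test, y_pred),
    }

# The model with the lowest test RMSE.
best_name = min(results, key=lambda name: results[name]["RMSE"])

# Small hypothetical grid, shown for a random forest purely as an example.
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
best_params = search.best_params_
```

After running it, `results` holds one metrics dictionary per model, `best_name` is the name with the lowest RMSE, and `best_params` is the best grid combination found by cross-validation.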

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Normalization and standardization
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# GridSearchCV
from sklearn.model_selection import GridSearchCV

# Train, Test
from sklearn.model_selection import train_test_split

# Regression metrics
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# Classification metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import jaccard_score

# Models
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.svm import SVR, SVC
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.ensemble import AdaBoostRegressor, AdaBoostClassifier
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

# Cross-validation
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import KFold
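
Since **cost** is a continuous value, the regression metrics imported above are the relevant ones. As a small sketch of how they are computed, the arrays below are hypothetical values made up for illustration and are not taken from the dataset:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted values, invented for this example only.
y_true = np.array([100.0, 80.0, 120.0, 60.0])
y_pred = np.array([90.0, 85.0, 110.0, 70.0])

# Mean absolute error: average size of the errors, in the target's units.
mae = mean_absolute_error(y_true, y_pred)

# Root mean squared error: penalizes large errors more than MAE does.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R2: share of the target's variance explained by the predictions.
r2 = r2_score(y_true, y_pred)
```

MAE and RMSE are expressed in the same units as the target, which makes them easy to read as "dollars of error" for a cost prediction, while R2 is unitless.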

#### Column descriptions:

|Feature                        |Description                                                   |
|-------------------------------|--------------------------------------------------------------|
|**store_sales(in millions)**   |Store sales, in millions.                                     |
|**unit_sales(in millions)**    |Number of units sold, in millions.                            |
|**total_children**             |Total number of children in the home.                         |
|**num_children_at_home**       |Number of children at home, as reported by the customer.      |
|**avg_cars_at home(approx).1** |Average number of cars at home (approximate).                 |
|**Num_children_at_home**       |num_children_at_home, as reported by the customer.            |
|**gross_weight**               |Gross weight of an item.                                      |
|**recyclable_package**         |1 if the food item's package is recyclable, 0 otherwise.      |
|**low_fat**                    |1 if the item is low fat, 0 otherwise.                        |
|**units_per_case**             |Units per case available on each store's shelves.             |
|**store_sqft**                 |Store area, in square feet.                                   |
|**coffee_bar**                 |1 if the store has a coffee bar, 0 otherwise.                 |
|**video_store**                |1 if a video or gaming store is available, 0 otherwise.       |
|**salad_bar**                  |1 if a salad bar is available in the store, 0 otherwise.      |
|**prepared_food**              |1 if prepared food is available in the store, 0 otherwise.    |
|**florist**                    |1 if flower shelves are available in the store, 0 otherwise.  |
|**cost**                       |Cost of acquiring a customer, in dollars (**target**).        |
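
The RandomForest feature-importance item from the checklist could look roughly like the sketch below. The column names are borrowed from the table, but the values and the relationship between features and target are invented for illustration, so the resulting ranking says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-in: names come from the table, values are random.
X = pd.DataFrame({
    "store_sqft": rng.uniform(20000, 40000, n),
    "coffee_bar": rng.integers(0, 2, n),
    "gross_weight": rng.uniform(5, 25, n),
})

# Assumed relationship (illustration only): the target depends mostly on
# store_sqft, slightly on coffee_bar, and not at all on gross_weight.
y = 0.003 * X["store_sqft"] + 2.0 * X["coffee_bar"] + rng.normal(0, 1, n)

forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X, y)

# Impurity-based importances, one per column, sorted from most to least important.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

With the real DataFrame, `X` would be every column except **cost** and `y` would be the **cost** column. A bar plot of `importances` is a common way to present the result.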

In [2]:
df = pd.read_csv("media_campaign_dataset.csv")

df.head(3)

|Unnamed: 0|store_sales(in millions)|unit_sales(in millions)|total_children|num_children_at_home|avg_cars_at home(approx).1|gross_weight|recyclable_package|low_fat|units_per_case|store_sqft|coffee_bar|video_store|salad_bar|prepared_food|florist|cost|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|2.68|2.0|1.0|0.0|2.0|6.3|1.0|0.0|22.0|30584.0|1.0|1.0|1.0|1.0|1.0|79.59|
|1|5.73|3.0|5.0|5.0|3.0|18.7|1.0|0.0|30.0|20319.0|0.0|0.0|0.0|0.0|0.0|118.36|
|2|2.62|2.0|1.0|1.0|1.0|9.21|0.0|0.0|9.0|20319.0|0.0|0.0|0.0|0.0|0.0|67.2|
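
The basic checks from the Exploratory Data Analysis checklist (NaNs, duplicates, unique values, correlation with **cost**) can be sketched as below. To keep the example runnable without the CSV, it rebuilds a few columns of the three rows shown above by hand. The name `sample` is introduced here for illustration, and three rows are far too few for meaningful statistics. The same calls would apply to the full `df`.

```python
import pandas as pd

# A subset of the columns from the three rows displayed above.
sample = pd.DataFrame({
    "store_sales(in millions)": [2.68, 5.73, 2.62],
    "unit_sales(in millions)": [2.0, 3.0, 2.0],
    "total_children": [1.0, 5.0, 1.0],
    "store_sqft": [30584.0, 20319.0, 20319.0],
    "coffee_bar": [1.0, 0.0, 0.0],
    "cost": [79.59, 118.36, 67.20],
})

# Missing values per column.
nan_counts = sample.isna().sum()

# Number of fully duplicated rows.
n_duplicates = sample.duplicated().sum()

# Distinct values per column; a low count hints at a categorical or binary column.
unique_counts = sample.nunique()

# Pearson correlation of every feature with the target column.
corr_with_cost = sample.corr()["cost"].drop("cost").sort_values(ascending=False)
```

On the full data, `sns.heatmap(df.corr())` is one common way to display the whole correlation matrix rather than just the **cost** column.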
