<p align="center">
<img src="https://pypi.org/static/images/logo-small.8998e9d1.svg">
</p>


<p align="justify">
👀 The goal is to predict whether or not customers inquire about a given used device...
</p>



 # **<font color="DarkBlue">Used Notebook Inventory</font>**

In [None]:
import numpy as np
import pandas as pd

In [None]:
data = pd.read_csv("https://raw.githubusercontent.com/cristiandarioortegayubro/BDS/main/datasets/InventarioNotebooksUsadas.csv")

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 1000 non-null   int64  
 1   product_name       1000 non-null   object 
 2   price              1000 non-null   float64
 3   quantity_in_stock  1000 non-null   int64  
 4   technical          1000 non-null   object 
 5   category           1000 non-null   object 
 6   release_date       1000 non-null   object 
 7   weight             1000 non-null   float64
 8   ask?               1000 non-null   bool   
dtypes: bool(1), float64(2), int64(2), object(4)
memory usage: 63.6+ KB


In [None]:
data.head()

Unnamed: 0,id,product_name,price,quantity_in_stock,technical,category,release_date,weight,ask?
0,1,Notebook Dell,853.27,513,Skaboo,Notebooks,3/26/2024,1169.11,False
1,2,Notebook Lenovo,1281.22,359,Photobug,Notebooks,2/26/2024,763.87,False
2,3,Notebook Acer,996.25,250,Wordpedia,Notebooks,1/28/2024,1141.1,True
3,4,Notebook Acer,1082.01,138,Feedbug,Notebooks,1/4/2024,1088.38,False
4,5,Notebook Dell,1198.5,416,Lajo,Notebooks,3/23/2024,1037.95,True


In [None]:
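# Parse the month/day/year strings in release_date into datetime values
data['release_date'] = pd.to_datetime(data['release_date'])
# Drop id (a row identifier) and category (it reads "Notebooks" in every row shown above)
data.drop(columns=["id", "category"], inplace=True)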

In [None]:
data.head()

Unnamed: 0,product_name,price,quantity_in_stock,technical,release_date,weight,ask?
0,Notebook Dell,853.27,513,Skaboo,2024-03-26,1169.11,False
1,Notebook Lenovo,1281.22,359,Photobug,2024-02-26,763.87,False
2,Notebook Acer,996.25,250,Wordpedia,2024-01-28,1141.1,True
3,Notebook Acer,1082.01,138,Feedbug,2024-01-04,1088.38,False
4,Notebook Dell,1198.5,416,Lajo,2024-03-23,1037.95,True


In [None]:
# Separate the target column from the feature matrix
target_name = "ask?"
y = data[target_name]
X = data.drop(columns=[target_name])

In [None]:
X.head()

Unnamed: 0,product_name,price,quantity_in_stock,technical,release_date,weight
0,Notebook Dell,853.27,513,Skaboo,2024-03-26,1169.11
1,Notebook Lenovo,1281.22,359,Photobug,2024-02-26,763.87
2,Notebook Acer,996.25,250,Wordpedia,2024-01-28,1141.1
3,Notebook Acer,1082.01,138,Feedbug,2024-01-04,1088.38
4,Notebook Dell,1198.5,416,Lajo,2024-03-23,1037.95


In [None]:
y.head()

0    False
1    False
2     True
3    False
4     True
Name: ask?, dtype: bool

 # **<font color="DarkBlue">Selection based on data types</font>**

<p align="justify">
👀 We will separate the categorical and numerical variables by using their data types to identify them, since we saw earlier that the <code>object</code> dtype corresponds to the categorical columns (strings). We use <code>make_column_selector</code> to select the corresponding columns.
</p>


In [None]:
from sklearn.compose import make_column_selector as selector

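<p align="justify">
👀 In the selector for the numerical columns we exclude the <code>object</code> dtype instead of naming a single numeric dtype, because numerical columns may hold integers or decimals. This also selects the datetime column <code>release_date</code>, which is removed by hand in the next cell.
</p>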


In [None]:
numerical_columns_selector = selector(dtype_exclude=object)
categorical_columns_selector = selector(dtype_include=object)

In [None]:
numerical_columns = numerical_columns_selector(X)
# release_date is a datetime, not a numeric feature, so drop it from the list
numerical_columns.remove('release_date')
categorical_columns = categorical_columns_selector(X)

In [None]:
numerical_columns

['price', 'quantity_in_stock', 'weight']

In [None]:
categorical_columns

['product_name', 'technical']
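
<p align="justify">
👀 As a minimal sketch of how the selector reacts to each dtype, the toy frame below (hypothetical, with made-up rows that only mimic the columns of <code>X</code>) compares <code>dtype_exclude=object</code> with <code>dtype_include=np.number</code>. The second option is an alternative that is not used in this colab; it leaves the datetime column out without a manual <code>remove</code>.
</p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical toy frame with the same kinds of columns as X
toy = pd.DataFrame({
    "product_name": ["Notebook Dell", "Notebook Acer"],
    "price": [853.27, 996.25],
    "quantity_in_stock": [513, 250],
    "release_date": pd.to_datetime(["2024-03-26", "2024-01-28"]),
})

# Excluding object keeps every non-object column, the datetime one included
not_object = selector(dtype_exclude=object)(toy)

# Including only numeric dtypes leaves the datetime column out directly
numeric_only = selector(dtype_include=np.number)(toy)

print(not_object)
print(numeric_only)
```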

 # **<font color="DarkBlue">Dispatching columns to a specific processor</font>**

In [None]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler

In [None]:
categorical_preprocessor = OneHotEncoder(handle_unknown="ignore")
numerical_preprocessor = StandardScaler()

<p align="justify">
👀 Now we create the transformer and associate each of these preprocessors with its respective columns.
</p>

https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

In [None]:
from sklearn.compose import ColumnTransformer

In [None]:
preprocessor = ColumnTransformer([
    ('one-hot-encoder', categorical_preprocessor, categorical_columns),
    ('standard_scaler', numerical_preprocessor, numerical_columns)])

In [None]:
preprocessor
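
<p align="justify">
👀 To see what a transformer of this kind produces, here is a small sketch on a hypothetical two-column frame (the rows and the names <code>toy</code>, <code>toy_preprocessor</code> and <code>unseen</code> are illustrative assumptions, not part of the dataset). It shows one output column per category plus one scaled numeric column, and how <code>handle_unknown="ignore"</code> encodes a category that was not seen during fitting as all zeros.
</p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical and one numerical column
toy = pd.DataFrame({
    "product_name": ["Notebook Dell", "Notebook Acer", "Notebook Dell"],
    "price": [800.0, 1000.0, 1200.0],
})

toy_preprocessor = ColumnTransformer([
    ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), ["product_name"]),
    ("standard_scaler", StandardScaler(), ["price"]),
])


def to_dense(matrix):
    """Return a dense NumPy array whether the input is sparse or dense."""
    return matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)


# Two one-hot columns (Acer, Dell) followed by the standardized price
encoded = to_dense(toy_preprocessor.fit_transform(toy))

# A brand that was not present when fitting gets zeros in every one-hot column
unseen = pd.DataFrame({"product_name": ["MacBook Pro"], "price": [1000.0]})
unseen_encoded = to_dense(toy_preprocessor.transform(unseen))

print(encoded.shape)
print(unseen_encoded)
```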

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

In [None]:
model = make_pipeline(preprocessor, LogisticRegression())
model

In [None]:
model.named_steps

{'columntransformer': ColumnTransformer(transformers=[('one-hot-encoder',
                                  OneHotEncoder(handle_unknown='ignore'),
                                  ['product_name', 'technical']),
                                 ('standard_scaler', StandardScaler(),
                                  ['price', 'quantity_in_stock', 'weight'])]),
 'logisticregression': LogisticRegression()}
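
<p align="justify">
👀 The step names shown above come from <code>make_pipeline</code>, which names each step after its class in lowercase. A short sketch with a simpler, hypothetical pipeline (<code>demo_pipeline</code> is an illustrative name) shows how those names can be used to reach a step:
</p>

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative pipeline, simpler than the one built in this colab
demo_pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# named_steps behaves like a dictionary keyed by the generated step names
step_names = list(demo_pipeline.named_steps)
print(step_names)

# Any step can be retrieved by its name
scaler = demo_pipeline.named_steps["standardscaler"]
```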

 # **<font color="DarkBlue">Train-test split of the dataset</font>**

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [None]:
_ = model.fit(X_train, y_train)
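
<p align="justify">
👀 When <code>train_test_split</code> is called without <code>test_size</code>, it holds out 25% of the rows, so a dataset of 1000 rows gives 750 training rows and 250 test rows. The sketch below checks this on hypothetical arrays (<code>X_demo</code> and <code>y_demo</code> are made up). It also passes <code>stratify</code>, an option this colab does not use, to keep the class proportions equal in both parts.
</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data with the same number of rows as the dataset
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.array([True, False] * 500)

# Default test_size is 0.25; stratify keeps the True/False ratio in each part
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=42, stratify=y_demo
)

print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```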

 # **<font color="DarkBlue">Fitting and prediction</font>**

In [None]:
X_test.head()

Unnamed: 0,product_name,price,quantity_in_stock,technical,release_date,weight
521,Notebook Lenovo,1201.46,820,Photobug,2024-01-19,915.95
737,Notebook Lenovo,1290.87,607,Feedfish,2024-03-16,941.11
740,MacBook Pro,1203.8,915,Topicstorm,2024-02-18,1029.66
660,Notebook Lenovo,1405.31,572,Yamia,2024-03-23,841.3
411,Notebook HP,850.6,87,Plajo,2024-02-20,735.31


In [None]:
model.predict(X_test)[:10]

array([ True,  True,  True,  True, False,  True, False,  True, False,
       False])

In [None]:
y_test[:10]

521    False
737     True
740     True
660    False
411    False
678    False
626     True
513     True
859     True
136     True
Name: ask?, dtype: bool

In [None]:
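# Mean accuracy on the held-out test set; the built-in round() also works when score() returns a plain Python float
round(model.score(X_test, y_test), 4)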

0.5
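
<p align="justify">
👀 For a classifier, <code>score</code> reports the mean accuracy: the fraction of predictions that match the true labels. As a worked example, the sketch below recomputes it for only the first ten predictions and targets printed above (the variable names are illustrative).
</p>

```python
import numpy as np
from sklearn.metrics import accuracy_score

# First ten predictions and true labels, copied from the outputs above
first_predictions = np.array(
    [True, True, True, True, False, True, False, True, False, False]
)
first_targets = np.array(
    [False, True, True, False, False, False, True, True, True, True]
)

# Accuracy by hand: share of positions where prediction equals target
manual_accuracy = (first_predictions == first_targets).mean()

# The same value through scikit-learn
library_accuracy = accuracy_score(first_targets, first_predictions)

print(manual_accuracy, library_accuracy)
```

<p align="justify">
Four of these ten predictions are correct, which gives 0.4 for this small sample.
</p>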

 # **<font color="DarkBlue">Model evaluation with cross-validation</font>**

<p align="justify">
👀 A predictive model can be evaluated with cross-validation....
</p>


In [None]:
from sklearn.model_selection import cross_validate

In [None]:
cv_results = cross_validate(model, X, y, cv=5)
cv_results

{'fit_time': array([0.03356218, 0.02769971, 0.03832102, 0.02633619, 0.02521348]),
 'score_time': array([0.01617241, 0.00840592, 0.00809884, 0.00878143, 0.00813293]),
 'test_score': array([0.495, 0.42 , 0.53 , 0.51 , 0.49 ])}

In [None]:
scores = cv_results["test_score"]
print("")
print("The mean cross-validation accuracy is: "
      f"{scores.mean():.3f} ± {scores.std():.3f}")


The mean cross-validation accuracy is: 0.489 ± 0.037
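
<p align="justify">
👀 An accuracy value only means something next to a baseline. One possible check, sketched here on hypothetical data, is to cross-validate a <code>DummyClassifier</code> that always predicts the most frequent class. The class proportions in <code>y_demo</code> are made up for illustration; the real balance of the target could be inspected with <code>y.value_counts(normalize=True)</code>.
</p>

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate

# Hypothetical features and labels; the 52/48 split is an assumption
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([True] * 520 + [False] * 480)

# Baseline that ignores the features and predicts the majority class
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_validate(baseline, X_demo, y_demo, cv=5)["test_score"]

print("Baseline cross-validation accuracy: "
      f"{baseline_scores.mean():.3f} ± {baseline_scores.std():.3f}")
```

<p align="justify">
If a model's cross-validation accuracy is not clearly above such a baseline, the features are adding little or no predictive information.
</p>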


 # **<font color="DarkBlue">Conclusions?...</font>**

<p align="justify">
👀 In this colab we:<br>
<br>
✅ Loaded the data from a <code>CSV</code> file using <code>Pandas</code>.
<br>
✅ Framed the problem. Is it well posed?
<br>
✅ Used a <code>ColumnTransformer</code> to preprocess the categorical and numerical variables.
<br>
✅ Used a Pipeline to chain the <code>ColumnTransformer</code> preprocessing with the classifier.
<br>
✅ In your judgment, what is wrong with this model?
<br>
</p>




<br>
<br>
<p align="center"><b>
💗
<font color="DarkBlue">
We have reached the end of our colab, so keep coding...
</font>
</b></p>
