# Dataset: Yellow Submarine
----

Each row represents a mushroom; each column holds one of its attributes.

The *classes* column indicates whether the mushroom is poisonous or not.

mushrooms | attributes
:----:   | :----:
8124     | 23


# Variables

Name   |  Possible values
:------- |  :---
cap-shape   | (bell, conical, convex, flat, knobbed, sunken)
cap-surface | (fibrous, grooves, scaly, smooth)
cap-color | (brown, buff, cinnamon, gray, green, pink, purple, red, white, yellow)
bruises | (bruises, no)
odor | (almond, anise, creosote, fishy, foul, musty, none, pungent, spicy)
gill-attachment | (attached, descending, free, notched)
gill-spacing | (close, crowded, distant)
gill-size | (broad, narrow)
gill-color | (black, brown, buff, chocolate, gray, green, orange, pink, purple, red, white, yellow)
stalk-shape | (enlarging, tapering)
stalk-root | (bulbous, club, cup, equal, rhizomorphs, rooted, missing)
stalk-surface-above-ring | (fibrous, scaly, silky, smooth)
stalk-surface-below-ring | (fibrous, scaly, silky, smooth)
stalk-color-above-ring | (brown, buff, cinnamon, gray, orange, pink, red, white, yellow)
stalk-color-below-ring | (brown, buff, cinnamon, gray, orange, pink, red, white, yellow)
veil-type | (partial, universal)
veil-color | (brown, orange, white, yellow)
ring-number | (none, one, two)
ring-type | (cobwebby, evanescent, flaring, large, none, pendant, sheathing, zone)
spore-print-color | (black, brown, buff, chocolate, green, orange, purple, white, yellow)
population | (abundant, clustered, numerous, scattered, several, solitary)
habitat | (grasses, leaves, meadows, paths, urban, waste, woods)
classes   |  (edible, poisonous)
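
The documented value sets above can be compared against what the CSV actually contains. The following is a minimal sketch, assuming the data is available as a pandas DataFrame; the helper `unexpected_values` and the toy `allowed` and `sample` objects are hypothetical names used only for illustration.

```python
import pandas as pd


def unexpected_values(df: pd.DataFrame, allowed: dict) -> dict:
    """Return, per column, the observed values that are not in the documented set."""
    report = {}
    for col, valid in allowed.items():
        if col not in df.columns:
            continue  # skip documented columns that are absent from the frame
        extra = set(df[col].dropna().unique()) - set(valid)
        if extra:
            report[col] = sorted(extra)
    return report


# Toy data for illustration only; not taken from the real file
allowed = {
    "bruises": ["bruises", "no"],
    "gill-size": ["broad", "narrow"],
}
sample = pd.DataFrame({
    "bruises": ["bruises", "no", "yes"],
    "gill-size": ["broad", "narrow", "broad"],
})
print(unexpected_values(sample, allowed))  # {'bruises': ['yes']}
```

A check like this may be worth running here because the outputs in this notebook show the label `poisson`, whereas the table documents `poisonous`.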

In [4]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

# Load the data
data = pd.read_csv('Yellow_Submarine.csv')

In [5]:
# Check for missing values
missing_values = data.isnull().sum()
print("Missing values in each column:")
print(missing_values)

Missing values in each column:
class                       0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64
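
`isnull()` only detects real NaN entries. The variable table lists "missing" as a possible value of `stalk-root`, so absent data may be encoded as an ordinary category that this check does not count. Below is a minimal sketch for counting such a sentinel, assuming the label is literally the string "missing"; the helper name `count_sentinel` and the toy frame are hypothetical.

```python
import pandas as pd


def count_sentinel(df: pd.DataFrame, sentinel: str = "missing") -> pd.Series:
    """Count, per column, how many cells equal the sentinel string."""
    return (df == sentinel).sum()


# Toy data for illustration only; not taken from the real file
sample = pd.DataFrame({
    "stalk-root": ["bulbous", "missing", "club", "missing"],
    "stalk-shape": ["enlarging", "tapering", "tapering", "enlarging"],
})
print(count_sentinel(sample))
```

On the real data, the equivalent call would be `count_sentinel(data)`, which shows whether any column relies on this placeholder.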


In [7]:
# Check whether any rows have "ring-number" equal to "none"
none_ring_number = data[data['ring-number'] == 'none']

# Show the results
print(none_ring_number)

        class cap-shape cap-surface cap-color bruises   odor gill-attachment  \
6415  poisson    convex       scaly       red      no  musty            free   
6668  poisson   knobbed       scaly  cinnamon      no  musty        attached   
6855  poisson   knobbed       scaly     brown      no  musty            free   
6945  poisson      flat       scaly       red      no  musty        attached   
6991  poisson   knobbed       scaly       red      no  musty            free   
7034  poisson   knobbed       scaly  cinnamon      no  musty            free   
7065  poisson    convex       scaly     brown      no  musty        attached   
7091  poisson      flat       scaly     brown      no  musty            free   
7100  poisson    convex       scaly  cinnamon      no  musty        attached   
7111  poisson   knobbed       scaly     brown      no  musty        attached   
7146  poisson      flat       scaly  cinnamon      no  musty            free   
7166  poisson      flat       scaly     

Some rows have "ring-number" equal to "none". They appear to be consistent with the expectation that, in such cases, "ring-type" is also "none", because the mushroom has no ring.
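
The printed output is truncated, so this consistency can also be checked programmatically. The sketch below is one possible way to do it, assuming both columns use the literal string "none"; it flags rows where exactly one of the two columns is "none", which is slightly stricter than the one-directional statement above. The helper name `ring_inconsistencies` and the toy frame are hypothetical.

```python
import pandas as pd


def ring_inconsistencies(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where exactly one of ring-number / ring-type is 'none'."""
    no_ring_number = df["ring-number"] == "none"
    no_ring_type = df["ring-type"] == "none"
    # Comparing the two boolean masks with != acts as an exclusive OR
    return df[no_ring_number != no_ring_type]


# Toy data for illustration only; not taken from the real file
sample = pd.DataFrame({
    "ring-number": ["none", "one", "none", "two"],
    "ring-type": ["none", "pendant", "large", "none"],
})
print(ring_inconsistencies(sample))
```

An empty result from `ring_inconsistencies(data)` would support the observation for the whole dataset.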

In [6]:
# Check whether any rows have a "gill-attachment" other than "free"
non_free_gill_attachment = data[data['gill-attachment'] != 'free']

# Show the results
print(non_free_gill_attachment)

       class cap-shape cap-surface cap-color bruises  odor gill-attachment  \
6038  edible      bell      smooth     brown      no  none        attached   
6040  edible    convex      smooth     brown      no  none        attached   
6375  edible      bell      smooth     brown      no  none        attached   
6424  edible    convex      smooth     brown      no  none        attached   
6434  edible    convex      smooth     brown      no  none        attached   
...      ...       ...         ...       ...     ...   ...             ...   
8115  edible    convex      smooth     brown      no  none        attached   
8119  edible   knobbed      smooth     brown      no  none        attached   
8120  edible    convex      smooth     brown      no  none        attached   
8121  edible      flat      smooth     brown      no  none        attached   
8123  edible    convex      smooth     brown      no  none        attached   

     gill-spacing gill-size gill-color  ... stalk-surface-below

In [None]:
# Convert all categories to lowercase
for col in data.select_dtypes(include=['object']).columns:
    data[col] = data[col].str.lower()

In [2]:
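# Encode the categorical variables (one-hot encoding)
# Note: the import and data-loading cells must run before this one;
# otherwise `pd` and `data` are undefined and a NameError is raised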
data_encoded = pd.get_dummies(data, columns=data.select_dtypes(include=['object']).columns)

NameError: name 'pd' is not defined
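
To make the effect of the encoding concrete, here is a small sketch on a toy frame (illustrative values only, not rows from the real file). It shows how `pd.get_dummies` expands each categorical column into one indicator column per category, named `<column>_<value>`, which is where names such as `class_edible` and `class_poisson` come from.

```python
import pandas as pd

# Toy frame for illustration only; not taken from the real file
sample = pd.DataFrame({
    "class": ["edible", "poisson", "edible"],
    "bruises": ["no", "bruises", "no"],
})

# Each listed column is replaced by one indicator column per category
encoded = pd.get_dummies(sample, columns=["class", "bruises"])
print(list(encoded.columns))
# ['class_edible', 'class_poisson', 'bruises_bruises', 'bruises_no']
```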

In [None]:
# Separate the features from the label
X = data_encoded.drop(columns=['class_edible', 'class_poisson'])
y = data_encoded[['class_edible', 'class_poisson']]
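
If every row is labeled either edible or poisson, the two label columns are complementary, so the two-column `y` carries redundant information. Many classifiers expect a single target column, and one option is to keep just one indicator. This is a minimal sketch of that alternative, not the approach used in this notebook; `split_features_target` is a hypothetical helper and the toy frame is illustrative only.

```python
import pandas as pd


def split_features_target(encoded: pd.DataFrame):
    """Return (X, y), where y is 1 for poisonous rows and 0 for edible ones."""
    label_cols = ["class_edible", "class_poisson"]
    X = encoded.drop(columns=label_cols)
    y = encoded["class_poisson"].astype(int)
    return X, y


# Toy data for illustration only; not taken from the real file
sample = pd.DataFrame({
    "class_edible": [True, False],
    "class_poisson": [False, True],
    "odor_musty": [False, True],
})
X, y = split_features_target(sample)
print(list(X.columns), y.tolist())  # ['odor_musty'] [0, 1]
```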

In [None]:
# Show the structure of the preprocessed dataset
print("Preprocessed dataset:")
print(X.head())
print(y.head())

Preprocessed dataset:
   cap-shape_bell  cap-shape_conical  cap-shape_convex  cap-shape_flat  \
0           False              False              True           False   
1           False              False              True           False   
2            True              False             False           False   
3           False              False              True           False   
4           False              False              True           False   

   cap-shape_knobbed  cap-shape_sunken  cap-surface_fibrous  \
0              False             False                False   
1              False             False                False   
2              False             False                False   
3              False             False                False   
4              False             False                False   

   cap-surface_grooves  cap-surface_scaly  cap-surface_smooth  ...  \
0                False              False                True  ...   
1             