
### 🔹 Listing Toy Datasets in scikit-learn

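This code lists every name in `sklearn.datasets` that begins with `load_`. Most of these return built-in toy datasets such as iris, wine, and digits, which ship with scikit-learn and are useful for quick machine learning practice. A few (`load_files`, `load_svmlight_file`, `load_svmlight_files`, `load_sample_image`, `load_sample_images`) are helpers for reading files from disk or loading sample images rather than toy datasets.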


In [1]:
import sklearn.datasets as ds

# List all built-in loader functions
print([fn for fn in dir(ds) if fn.startswith("load_")])


['load_breast_cancer', 'load_diabetes', 'load_digits', 'load_files', 'load_iris', 'load_linnerud', 'load_sample_image', 'load_sample_images', 'load_svmlight_file', 'load_svmlight_files', 'load_wine']
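
Since the `load_` prefix also matches file-reading helpers, the listing can be narrowed to the toy dataset loaders. The following is a minimal sketch: the `toy_loaders` tuple is hand-picked from the output above for illustration (an assumption, not something scikit-learn exposes), and each name is checked against the installed version before it is called.

```python
import sklearn.datasets as ds

# Hand-picked for illustration (an assumption): the names from the listing
# above that return bundled toy datasets and take no required arguments.
toy_loaders = (
    "load_breast_cancer", "load_diabetes", "load_digits",
    "load_iris", "load_linnerud", "load_wine",
)

# Keep only the names that exist in the installed scikit-learn version
available = [name for name in toy_loaders if hasattr(ds, name)]

for name in available:
    bunch = getattr(ds, name)()  # call the loader with its defaults
    print(f"{name}: data shape {bunch.data.shape}")
```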


### 🔹 Loading the Iris Dataset

This code loads the Iris flower dataset and prints its feature names (input variables), target names (class labels), and the shapes of the data and target arrays. The last two `print` calls preview the first five samples and their corresponding class labels.



In [2]:
# Import the loader for the Iris toy dataset
from sklearn.datasets import load_iris

# Load the dataset; the result is a dict-like Bunch with attribute access
iris = load_iris()

# Inspect
print("Feature names:", iris.feature_names)
print("Target names:", iris.target_names)
print("Data shape:", iris.data.shape)
print("Target shape:", iris.target.shape)
print("First 5 samples:\n", iris.data[:5])
print("First 5 targets:", iris.target[:5])


Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']
Data shape: (150, 4)
Target shape: (150,)
First 5 samples:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
First 5 targets: [0 0 0 0 0]
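
The same loader can also return the data in other forms. The sketch below assumes pandas is installed and a scikit-learn version that supports the `as_frame` parameter; the `species` column is a name chosen here for illustration and is not part of the dataset.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of NumPy arrays (needs pandas)
iris = load_iris(as_frame=True)
df = iris.frame  # the four feature columns plus a "target" column

# Add a readable label column; "species" is an illustrative name
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
print(df.head())

# return_X_y=True skips the Bunch and returns only the two arrays
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)
```

The DataFrame form is convenient for exploration, while `return_X_y=True` is a short way to get arrays ready to pass to an estimator.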


### 🔹 General Structure to Load Any Dataset

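This code serves as a template for loading and inspecting any scikit-learn toy dataset. It loads six datasets into a dictionary and prints the same summary for each; to inspect another dataset, add its `load_<dataset>()` call to the dictionary, where `<dataset>` is a name like `iris`, `wine`, `digits`, `diabetes`, or `breast_cancer`. Note that the first line of `DESCR` is a reStructuredText label such as `.. _iris_dataset:`, so the "Short Description" line prints that label rather than a prose summary.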
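
For a single dataset, the same inspection can be wrapped in a small helper. This is a minimal sketch: `summarize` is a hypothetical helper name, and the title lookup assumes that `DESCR` is reStructuredText whose first line that is neither a `..` label nor a heading underline serves as a readable title.

```python
from sklearn.datasets import load_wine

def summarize(loader):
    """Return a small dict describing the dataset produced by a load_* function."""
    data = loader()
    # Skip blank lines, ".." labels, and heading underlines to find a title
    title = next(
        (
            line.strip()
            for line in data.DESCR.splitlines()
            if line.strip()
            and not line.strip().startswith("..")
            and not set(line.strip()) <= set("-=~")
        ),
        "No description",
    )
    return {
        "title": title,
        "n_samples": data.data.shape[0],
        "n_features": data.data.shape[1],
        "has_target_names": "target_names" in data,
    }

print(summarize(load_wine))
```

Passing the loader function itself, rather than an already loaded dataset, keeps the helper reusable: `summarize(load_iris)` or `summarize(load_digits)` work the same way.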


In [3]:
from sklearn.datasets import (
    load_iris, load_wine, load_breast_cancer,
    load_digits, load_diabetes, load_linnerud
)

datasets = {
    "Iris": load_iris(),
    "Wine": load_wine(),
    "Breast Cancer": load_breast_cancer(),
    "Digits": load_digits(),
    "Diabetes": load_diabetes(),
    "Linnerud": load_linnerud()
}

for name, data in datasets.items():
    print(f"\n=== {name} Dataset ===")
    print("Short Description:", data['DESCR'].split('\n')[0])
    print("Feature Names:", data.feature_names if 'feature_names' in data else 'Not available')
    print("Data Shape:", data.data.shape)
    print("Target Shape:", data.target.shape)
    print("Target Classes:", data.target_names if 'target_names' in data else 'Regression values')



=== Iris Dataset ===
Short Description: .. _iris_dataset:
Feature Names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Data Shape: (150, 4)
Target Shape: (150,)
Target Classes: ['setosa' 'versicolor' 'virginica']

=== Wine Dataset ===
Short Description: .. _wine_dataset:
Feature Names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Data Shape: (178, 13)
Target Shape: (178,)
Target Classes: ['class_0' 'class_1' 'class_2']

=== Breast Cancer Dataset ===
Short Description: .. _breast_cancer_dataset:
Feature Names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' '