# Introduction
- Scikit-learn provides three APIs for getting datasets:
    - Loaders (`load_*`) load small toy datasets that ship with scikit-learn
    - Fetchers (`fetch_*`) download larger real-world datasets from external sources
    - Generators (`make_*`) create synthetic datasets
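A minimal sketch of the loader and generator families side by side, assuming a standard scikit-learn install. `make_classification` is used only as an example generator. Fetchers such as `fetch_california_housing` are mentioned but not called, because they download data over the network the first time they run.

```python
from sklearn.datasets import load_iris, make_classification

# Loader: reads a small dataset bundled with scikit-learn (no download needed)
X_iris, y_iris = load_iris(return_X_y=True)

# Generator: builds a synthetic dataset on the fly; random_state makes it reproducible
X_syn, y_syn = make_classification(
    n_samples=200, n_features=4, n_informative=2, random_state=0
)

# Fetchers (e.g. fetch_california_housing) download and cache data on first call,
# so they are deliberately not called in this sketch.
print(X_iris.shape, X_syn.shape)  # (150, 4) (200, 4)
```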

### Loading Iris Dataset

In [5]:
from sklearn.datasets import load_iris
data = load_iris()

- `data`: the feature matrix
- `target`: the label vector
- `feature_names`: the names of the features
- `target_names`: the names of the target classes
- `DESCR`: a full description of the dataset
- `filename`: the location of the data file
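The object returned by `load_iris()` is a `Bunch`, a dictionary subclass, so the fields listed above can be read either as attributes or as dictionary keys. A short sketch:

```python
from sklearn.datasets import load_iris

data = load_iris()

# Attribute access and key access return the very same object
print(data.data is data["data"])           # True

# The available fields (the exact set varies slightly between versions)
print(sorted(data.keys()))

print(data.data.shape, data.target.shape)  # (150, 4) (150,)
```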

In [6]:
type(data)

sklearn.utils._bunch.Bunch

In [7]:
data.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [8]:
data.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [19]:
data.data[:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

In [20]:
data.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
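Because `target` stores integer class indices, indexing `target_names` with it recovers the species names, and `numpy.bincount` confirms the 50-samples-per-class balance visible in the array above. A small sketch:

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()

# Map integer labels 0/1/2 to species names via fancy indexing
species = data.target_names[data.target]
print(species[[0, 50, 100]])     # ['setosa' 'versicolor' 'virginica']

# Count how many samples belong to each class
print(np.bincount(data.target))  # [50 50 50]
```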

In [21]:
?load_iris

[0;31mSignature:[0m [0mload_iris[0m[0;34m([0m[0;34m*[0m[0;34m,[0m [0mreturn_X_y[0m[0;34m=[0m[0;32mFalse[0m[0;34m,[0m [0mas_frame[0m[0;34m=[0m[0;32mFalse[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Load and return the iris dataset (classification).

The iris dataset is a classic and very easy multi-class classification
dataset.

Classes                          3
Samples per class               50
Samples total                  150
Dimensionality                   4
Features            real, positive

Read more in the :ref:`User Guide <iris_dataset>`.

.. versionchanged:: 0.20
    Fixed two wrong data points according to Fisher's paper.
    The new version is the same as in R, but not as in the UCI
    Machine Learning Repository.

Parameters
----------
return_X_y : bool, default=False
    If True, returns ``(data, target)`` instead of a Bunch object. See
    below for more information about the `data` and `target` object.

    .. versionadded:
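The `return_X_y` option described in the docstring skips the `Bunch` and returns the feature matrix and label vector directly. A sketch of why that is handy, assuming the goal is to pass the arrays straight to `train_test_split` (the 20% split and `stratify` setting are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# (data, target) tuple instead of a Bunch
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples while keeping the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```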

### Loading Diabetes Dataset

In [38]:
from sklearn.datasets import load_diabetes
data = load_diabetes()

In [39]:
?load_diabetes

[0;31mSignature:[0m [0mload_diabetes[0m[0;34m([0m[0;34m*[0m[0;34m,[0m [0mreturn_X_y[0m[0;34m=[0m[0;32mFalse[0m[0;34m,[0m [0mas_frame[0m[0;34m=[0m[0;32mFalse[0m[0;34m,[0m [0mscaled[0m[0;34m=[0m[0;32mTrue[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Load and return the diabetes dataset (regression).

Samples total    442
Dimensionality   10
Features         real, -.2 < x < .2
Targets          integer 25 - 346

.. note::
   The meaning of each feature (i.e. `feature_names`) might be unclear
   (especially for `ltg`) as the documentation of the original dataset is
   not explicit. We provide information that seems correct in regard with
   the scientific literature in this field of research.

Read more in the :ref:`User Guide <diabetes_dataset>`.

Parameters
----------
return_X_y : bool, default=False
    If True, returns ``(data, target)`` instead of a Bunch object.
    See below for more information about the `data` and `target` object.



In [40]:
data.data[:5]

array([[ 0.03807591,  0.05068012,  0.06169621,  0.02187239, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, -0.02632753, -0.00844872,
        -0.01916334,  0.07441156, -0.03949338, -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, -0.00567042, -0.04559945,
        -0.03419447, -0.03235593, -0.00259226,  0.00286131, -0.02593034],
       [-0.08906294, -0.04464164, -0.01159501, -0.03665608,  0.01219057,
         0.02499059, -0.03603757,  0.03430886,  0.02268774, -0.00936191],
       [ 0.00538306, -0.04464164, -0.03638469,  0.02187239,  0.00393485,
         0.01559614,  0.00814208, -0.00259226, -0.03198764, -0.04664087]])

In [43]:
data.feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

## Generators

In [46]:
from sklearn.datasets import make_regression
?make_regression

[0;31mSignature:[0m
[0mmake_regression[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0mn_samples[0m[0;34m=[0m[0;36m100[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mn_features[0m[0;34m=[0m[0;36m100[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0;34m*[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mn_informative[0m[0;34m=[0m[0;36m10[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mn_targets[0m[0;34m=[0m[0;36m1[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mbias[0m[0;34m=[0m[0;36m0.0[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0meffective_rank[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtail_strength[0m[0;34m=[0m[0;36m0.5[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mnoise[0m[0;34m=[0m[0;36m0.0[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mshuffle[0m[0;34m=[0m[0;32mTrue[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mcoef[0m[0;34m=[0m[0;32mFalse[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mrandom_state[0m[0;34m=[0m[0;32mNone[0

### Let's generate 100 samples with 5 features for a single-target regression problem

In [47]:
X, y = make_regression(n_samples=100, n_features=5, n_targets=1, shuffle=True, random_state=42)
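To see how `n_targets` shapes the output, here is a sketch comparing the single-target call above with a two-target variant (the `n_targets=2` setting is only for illustration):

```python
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=5, n_targets=1, random_state=42)
print(X.shape, y.shape)    # (100, 5) (100,)

# With several targets, y becomes a 2-D array with one column per target
X2, Y2 = make_regression(n_samples=100, n_features=5, n_targets=2, random_state=42)
print(X2.shape, Y2.shape)  # (100, 5) (100, 2)
```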

In [49]:
print(X)

[[-0.93782504  0.51504769  0.51503527  3.85273149  0.51378595]
 [ 1.0889506  -0.71530371  0.06428002  0.67959775 -1.07774478]
 [-0.60170661 -1.05771093  1.85227818  0.82254491 -0.01349722]
 [ 0.8219025   0.09176078  0.08704707 -1.98756891 -0.29900735]
 [ 1.54993441  0.81351722 -0.78325329 -1.23086432 -0.32206152]
 [-0.00797264 -0.8612842   1.47994414  1.52312408  0.07736831]
 [-0.89841467  1.83145877  0.49191917  1.17944012 -1.32023321]
 [-0.71435142 -1.1913035   1.86577451  0.65655361  0.47383292]
 [-1.55066343  0.47359243  0.06856297 -0.91942423 -1.06230371]
 [ 0.78182287  0.52194157 -1.23695071  0.29698467 -1.32045661]
 [-1.42225371  1.68714164 -0.64657288  0.88163976 -1.081548  ]
 [ 0.31090757 -0.15993853  1.47535622 -0.01901621  0.85765962]
 [-0.40122047  0.0976761   0.22409248 -0.77300978  0.0125924 ]
 [ 1.36863156  1.05842449 -0.96492346 -1.75873949  0.68605146]
 [ 0.51934651  0.40171172  1.53273891  0.69014399 -0.10876015]
 [-0.44651495 -1.24573878  0.85639879  0.17318093  0.21