![``sklearn model select``](dataset/sklearn_map.png)

In [1]:
import sklearn as sk

In [2]:
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import tensorflow as tf

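# **`Basic Machine Learning Workflow`**

**`3 Parts`**
+ Data preparation and preprocessing
+ Model selection and training
+ Model validation and parameter tuning

--------------

+ **`Feature engineering`**: data cleaning, data standardization, feature selection, dimensionality reduction
+ **`Model selection`**: choosing hyperparameters
+ **`Model validation`**: evaluating model performance with a range of metrics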

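**`6 Steps`**
+ Load the dataset that will be used to train the model.
+ Split the dataset into a training set and a test set in a suitable ratio.
+ Choose, or build, a suitable model.
+ Feed the training data into the model to train it.
+ Use the training in the fourth step to settle on roughly reasonable parameters for the model.
+ Feed the test data into the model, compare its predictions with the true results, and adjust the parameters again.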
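
The six steps can be sketched roughly as follows. This is only an illustration: the iris dataset, the kNN model, the 70/30 split, and the candidate values of `n_neighbors` are all assumptions chosen for the example, not part of the original notes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: load a dataset (iris ships with scikit-learn).
X, y = load_iris(return_X_y=True)

# Step 2: hold out 30% of the samples as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 3: choose a model family (kNN here) and some candidate settings.
candidate_k = [1, 3, 5, 7, 9]

# Steps 4-5: train on the training set only and use cross-validation
# to settle on a reasonable value of n_neighbors.
cv_means = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5).mean()
    for k in candidate_k
]
best_k = candidate_k[int(np.argmax(cv_means))]
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)

# Step 6: compare predictions with the true labels of the test set.
test_accuracy = model.score(X_test, y_test)
print(f"best k: {best_k}, test accuracy: {test_accuracy:.3f}")
```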

---------------

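**`Scikit-learn levels`**
+ `Call`: know the basic idea of an algorithm and be able to test it with an existing library. Put simply, know what kNN does and how to call the kNN implementation in sklearn.
+ `Tune`: know the parameters that most affect an algorithm and be able to tune them.
+ `Master`: understand the implementation details of an algorithm and be able to implement it in code.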
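
To make the difference between the first and the last level concrete, here is a minimal sketch that calls sklearn's kNN and also implements kNN by hand. The helper `knn_predict` is hypothetical (written for this example), uses Euclidean distance with a simple majority vote, and is not how sklearn implements the algorithm internally.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query row by majority vote among its k nearest training rows."""
    predictions = []
    for x in X_query:
        # Euclidean distance from the query point to every training point.
        distances = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level "call": use the library implementation.
library_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

# Level "master": use the hand-written version and compare.
own_pred = knn_predict(X_train, y_train, X_test, k=5)
agreement = float(np.mean(own_pred == library_pred))
print(f"agreement with sklearn: {agreement:.2f}")
```

The two versions may disagree on a few samples because ties are not necessarily broken the same way.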

--------------------

# Sklearn six parts

**`Classification`**
+ Identifying which category an object belongs to
+ Applications: spam detection, image recognition
+ Algorithms: SVM, nearest neighbors, random forest
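
A minimal classification sketch, assuming the iris dataset and an SVM with default settings (both are choices made for this example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a support vector classifier and predict a class label per test sample.
clf = SVC().fit(X_train, y_train)
predicted = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(f"accuracy: {accuracy:.3f}")
```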

**`Regression`**
+ Predicting a continuous-valued attribute associated with an object
+ Applications: drug response, stock prices
+ Algorithms: SVR, ridge regression, Lasso
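
A small regression sketch on synthetic data, comparing ridge regression and Lasso. The data from `make_regression` and the `alpha=1.0` values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 of which actually influence y.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Lasso's L1 penalty can drive some coefficients exactly to zero.
n_zero_lasso = int((lasso.coef_ == 0).sum())
print("ridge R^2:", ridge.score(X, y))
print("lasso R^2:", lasso.score(X, y), "zero coefficients:", n_zero_lasso)
```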

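**`Clustering`**
+ Automatically grouping similar objects into sets
+ Applications: customer segmentation, grouping experiment outcomes
+ Algorithms: k-means, spectral clustering, mean-shift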
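
A minimal clustering sketch with k-means, assuming three well-separated synthetic blobs generated for the example:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled points drawn around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# n_init is set explicitly because its default differs between versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_
print(centers)
```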

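**`Dimensionality reduction`**
+ Reducing the number of random variables to consider
+ Applications: visualization, increased efficiency
+ Algorithms: PCA, feature selection, non-negative matrix factorization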
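
As a sketch of dimensionality reduction, PCA can project the four iris features onto two components, for instance before plotting. The dataset and `n_components=2` are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4-dimensional samples onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

explained = pca.explained_variance_ratio_.sum()
print(f"variance kept by 2 components: {explained:.3f}")
```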

**`Model selection`**
+ Comparing, validating, and choosing parameters and models
+ Goal: improved accuracy via parameter tuning
+ Modules: grid search, cross validation, metrics
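
A minimal grid search sketch: every combination in the grid is scored with 5-fold cross-validation and the best one is kept. The parameter grid below is an arbitrary example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters; 3 x 2 = 6 combinations are evaluated.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```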

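**`Preprocessing`**
+ Feature extraction and normalization
+ Application: transforming input data such as text into a form that machine learning algorithms can use
+ Modules: preprocessing, feature extraction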
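
A short sketch of both ideas: turning text into count features, and standardizing numeric features. The three toy documents and the small numeric matrix are made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Feature extraction: each document becomes a row of word counts.
docs = ["spam spam offer", "meeting at noon", "free offer now"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs).toarray()
spam_column = vectorizer.vocabulary_["spam"]
print(sorted(vectorizer.vocabulary_))

# Standardization: rescale each column to zero mean and unit variance.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)
```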

------------------

# **`API Reference`**

**`sklearn.calibration`**: Probability Calibration
+ Calibration of predicted probabilities.

**`sklearn.cluster`**: Clustering
+ The sklearn.cluster module gathers popular unsupervised clustering algorithms.

**`sklearn.datasets`**: Datasets
+ The sklearn.datasets module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.

**`sklearn.exceptions`**: Exceptions and warnings
+ The sklearn.exceptions module includes all custom warnings and error classes used across scikit-learn.

**`sklearn.feature_extraction`**: Feature Extraction
+ The sklearn.feature_extraction module deals with feature extraction from raw data. It currently includes methods to extract features from text and images.

**`sklearn.linear_model`**: Generalized Linear Models
+ The sklearn.linear_model module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms.

**`sklearn.metrics`**: Metrics
+ The sklearn.metrics module includes score functions, performance metrics and pairwise metrics and distance computations.
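
For illustration, a few of the functions in `sklearn.metrics` applied to tiny hand-written inputs (the labels and points are made up):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.metrics.pairwise import euclidean_distances

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Score function: fraction of correct predictions.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
confusion = confusion_matrix(y_true, y_pred)

# Pairwise metric: distance between every pair of rows.
distances = euclidean_distances(np.array([[0.0, 0.0], [3.0, 4.0]]))

print(accuracy)
print(confusion)
print(distances)
```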

**`sklearn.model_selection`**: Model Selection

**`sklearn.multiclass`**: Multiclass and multilabel classification
+ This module implements multiclass learning algorithms:
    + one-vs-the-rest / one-vs-all
    + one-vs-one
    + error correcting output codes
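
A sketch of the first two strategies on a synthetic 4-class problem. The base estimator (`LogisticRegression`) and the blob data are assumptions for the example. With 4 classes, one-vs-rest trains 4 binary classifiers and one-vs-one trains one per pair of classes, that is 6.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_blobs(n_samples=200, centers=4, random_state=0)

# One binary classifier per class (class vs. everything else).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One binary classifier per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))
```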

**`sklearn.naive_bayes`**: Naive Bayes
+ The sklearn.naive_bayes module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.

**`sklearn.neighbors`**: Nearest Neighbors
+ The sklearn.neighbors module implements the k-nearest neighbors algorithm.

**`sklearn.neural_network`**: Neural network models
+ The sklearn.neural_network module includes models based on neural networks.

**`sklearn.pipeline`**: Pipeline
+ The sklearn.pipeline module implements utilities to build a composite estimator, as a chain of transforms and estimators.
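
A minimal pipeline sketch that chains a scaler and a classifier into one estimator. The step names `"scale"` and `"clf"` and the choice of steps are arbitrary for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() fits the scaler, transforms the data, then fits the classifier;
# score() and predict() apply the same transform before the final step.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
pipeline_accuracy = pipe.score(X_test, y_test)
print(f"pipeline accuracy: {pipeline_accuracy:.3f}")
```

Because the scaler is fitted inside the pipeline, its statistics come from the training data only.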

**`sklearn.preprocessing`**: Preprocessing and Normalization
+ The sklearn.preprocessing module includes scaling, centering, normalization, binarization and imputation methods.

**`sklearn.semi_supervised`**: Semi-Supervised Learning
+ The sklearn.semi_supervised module implements semi-supervised learning algorithms. These algorithms use small amounts of labeled data and large amounts of unlabeled data for classification tasks. This module includes Label Propagation.

**`sklearn.svm`**: Support Vector Machines
+ The sklearn.svm module includes Support Vector Machine algorithms.

**`sklearn.tree`**: Decision Trees
+ The sklearn.tree module includes decision tree-based models for classification and regression.

---------

**`sklearn.cluster.bicluster`**: Biclustering
+ Spectral biclustering algorithms. In recent scikit-learn releases these classes are imported from `sklearn.cluster` instead.

**`sklearn.covariance`**: Covariance Estimators
+ The sklearn.covariance module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix defined as the inverse of the covariance is also estimated. Covariance estimation is closely related to the theory of Gaussian Graphical Models.

**`sklearn.cross_decomposition`**: Cross decomposition

**`sklearn.decomposition`**: Matrix Decomposition
+ The sklearn.decomposition module includes matrix decomposition algorithms, including among others PCA, NMF or ICA. Most of the algorithms of this module can be regarded as dimensionality reduction techniques.

**`sklearn.discriminant_analysis`**: Discriminant Analysis
+ Linear Discriminant Analysis and Quadratic Discriminant Analysis

**`sklearn.dummy`**: Dummy estimators

**`sklearn.ensemble`**: Ensemble Methods
+ The sklearn.ensemble module includes ensemble-based methods for classification, regression and anomaly detection.

**`sklearn.gaussian_process`**: Gaussian Processes
+ The sklearn.gaussian_process module implements Gaussian Process based regression and classification.

**`sklearn.isotonic`**: Isotonic regression

**`sklearn.kernel_approximation`**: Kernel Approximation
+ The sklearn.kernel_approximation module implements several approximate kernel feature maps based on Fourier transforms.

**`sklearn.kernel_ridge`**: Kernel Ridge Regression
+ Module sklearn.kernel_ridge implements kernel ridge regression.

**`sklearn.manifold`**: Manifold Learning
+ The sklearn.manifold module implements data embedding techniques.

**`sklearn.mixture`**: Gaussian Mixture Models
+ The sklearn.mixture module implements mixture modeling algorithms.

**`sklearn.multioutput`**: Multioutput regression and classification
+ This module implements multioutput regression and classification.

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. The meta-estimator extends single output estimators to multioutput estimators.
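
As a sketch of the meta-estimator idea, `MultiOutputRegressor` can wrap a single-output regressor and fit one copy of it per target column. The base estimator (`GradientBoostingRegressor` with 20 trees) and the synthetic data are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Two target columns, so y has shape (100, 2).
X, y = make_regression(n_samples=100, n_features=5, n_targets=2, random_state=0)

# The base estimator is passed to the constructor and cloned once per target.
base = GradientBoostingRegressor(n_estimators=20, random_state=0)
multi = MultiOutputRegressor(base).fit(X, y)

multi_predictions = multi.predict(X)
print(len(multi.estimators_), multi_predictions.shape)
```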

**`sklearn.utils`**: Utilities
+ The sklearn.utils module includes various utilities.


**[``sklearn.linear_model.LinearRegression``](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)**(fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)
+ Parameters :
    + `fit_intercept` : boolean, optional, default True
        + Whether to calculate the intercept for this model.
    + `normalize` : boolean, optional, default False
        + This parameter is ignored when fit_intercept is set to False.
        + If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm.
        + If you wish to standardize, please use `sklearn.preprocessing.StandardScaler` before calling fit on an estimator with normalize=False.
        + Note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, so on recent versions the data has to be scaled beforehand.
+ Attributes :
    + `coef_` : array, shape (n_features, ) or (n_targets, n_features)
        + Estimated coefficients for the linear regression problem.
    + `intercept_` : array
        + Independent term (the intercept) in the linear model.

**`Methods`**

+ **`fit`**(X, y, sample_weight=None) / Fit linear model.
    + Parameters :
        + X : (n_samples, n_features)
        + y : (n_samples, n_targets)
    + Returns :
        + self : returns an instance of self.

+ **`predict`**(X) / Predict using the linear model
    + Parameters :
        + X : (n_samples, n_features)
    + Returns :
        + C : array, shape (n_samples,)

+ **`score`**(X, y, sample_weight=None) / Returns the coefficient of determination R^2 of the prediction.
    + Parameters :
        + X : array-like, shape = (n_samples, n_features)
        + y : array-like, shape = (n_samples,) or (n_samples, n_outputs)
    + Returns :
        + score : float
            + R^2 of self.predict(X) with respect to y.

+ **`get_params`**(deep=True) / Get parameters for this estimator.
    + Parameters :
        + deep : boolean, optional
            + If True, also return the parameters of contained sub-objects that are estimators.
    + Returns :
        + params : mapping of string to any
            + Parameter names mapped to their values.
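
The methods above can be tried on a tiny noise-free example. The data below is made up so that y = 2*x0 + 3*x1 + 1 holds exactly, which means the fitted coefficients should come out close to 2 and 3, and the intercept close to 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Five samples, two features; y follows y = 2*x0 + 3*x1 + 1 with no noise.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

reg = LinearRegression(fit_intercept=True)
reg.fit(X, y)                      # returns the fitted estimator itself

print("coef_:", reg.coef_)
print("intercept_:", reg.intercept_)

new_prediction = reg.predict(np.array([[3.0, 2.0]]))   # shape (n_samples,)
r_squared = reg.score(X, y)                            # R^2 on the given data
params = reg.get_params()                              # dict of constructor parameters

print(new_prediction, r_squared, params)
```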

-> Polynomial regression: `sklearn.preprocessing.PolynomialFeatures`
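
A minimal sketch of polynomial regression: `PolynomialFeatures` expands x into [x, x^2], and `LinearRegression` is then fitted on the expanded features. The quadratic y = 0.5*x^2 - x + 2 is an invented example, and chaining the two steps with `make_pipeline` is one possible way to combine them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noise-free quadratic data: y = 0.5*x^2 - x + 2.
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + 2

# include_bias=False because LinearRegression already fits an intercept.
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(x, y)

# make_pipeline names each step after its lowercased class name.
linear_step = poly_model.named_steps["linearregression"]
print("coefficients for [x, x^2]:", linear_step.coef_)
print("intercept:", linear_step.intercept_)
```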