### Basics
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python.
- The library is focused on modeling data. It is not focused on loading, manipulating and summarizing data.

Some popular groups of models provided by scikit-learn include:

- Clustering: for grouping unlabeled data, for example with KMeans.
- Cross Validation: for estimating the performance of supervised models on unseen data.
- Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.
- Dimensionality Reduction: for reducing the number of attributes in data for summarization, visualization and feature selection, for example with principal component analysis (PCA).
- Ensemble methods: for combining the predictions of multiple supervised models.
- Feature extraction: for defining attributes in image and text data.
- Feature selection: for identifying meaningful attributes from which to create supervised models.
- Parameter Tuning: for getting the most out of supervised models.
- Manifold Learning: for summarizing and depicting complex multi-dimensional data.
- Supervised Models: a vast array including, but not limited to, generalized linear models, discriminant analysis, naive Bayes, lazy methods, neural networks, support vector machines and decision trees.
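
As a rough illustration of two of the groups listed above, the sketch below runs KMeans clustering and PCA on the iris dataset that ships with scikit-learn. The choice of 3 clusters, 2 components, and the `n_init` and `random_state` values are assumptions made for the example, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Clustering: group the unlabeled samples into 3 clusters (assumed value).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Dimensionality reduction: project the 4 features onto 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(cluster_labels.shape)  # (150,)
print(X_2d.shape)            # (150, 2)
```

Both estimators are created, fitted, and applied in the same way, even though they solve different problems.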

![Image of Yaktocat](http://1.bp.blogspot.com/-ME24ePzpzIM/UQLWTwurfXI/AAAAAAAAANw/W3EETIroA80/s1600/drop_shadows_background.png)


### Consistent APIs

Algorithms are implemented with the same core methods:

- fit = train an algorithm
- predict = predict the value for a given record
- predict_proba = predict the probability of each possible class for a given record (classification only)
- transform = alter your data using a fitted preprocessor, e.g. normalize or scale your data (preprocessing/unsupervised)
- fit_transform = train a preprocessor and then transform the data in a single step (preprocessing/unsupervised)
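
The methods listed above can be seen together in one short sketch. `StandardScaler` and `LogisticRegression` are picked here only as convenient examples of a preprocessor and a classifier, and `max_iter=1000` is an assumed setting to help the solver converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# fit_transform: learn per-feature mean and standard deviation, then scale X.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# transform: reuse the already-fitted scaler on other records.
first_rows_scaled = scaler.transform(X[:5])

# fit: train a classifier on the scaled features.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_scaled, y)

# predict: one class label per record.
labels = clf.predict(first_rows_scaled)

# predict_proba: one probability per class for each record; each row sums to 1.
probabilities = clf.predict_proba(first_rows_scaled)

print(labels.shape)         # (5,)
print(probabilities.shape)  # (5, 3)
```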



##### Outline of sklearn models:

The basic workflow for an sklearn model is outlined by the following pseudocode.

```
input = labeled data
X_train = input.features
Y_train = input.target
X_test = test data set features
Y_test = test data set target

algorithm = sklearn.ClassImplementingTheAlgorithm(parameters of the algorithm)
fitting = algorithm.fit(X_train, Y_train)
prediction = algorithm.predict(X_test)
```

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data    # input features
Y = iris.target  # input target

# Hold out 40% of the records as test data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4)

# KNN model from scikit-learn: basic implementation
algorithm = KNeighborsClassifier(n_neighbors=5)  # algorithm
fitting = algorithm.fit(X_train, Y_train)        # fit (returns the fitted estimator)
prediction = algorithm.predict(X_test)           # predict

# Linear regression fitted to the same split (class labels treated as numbers)
algorithm2 = LinearRegression(fit_intercept=True)
algorithm2.fit(X_train, Y_train)
prediction2 = algorithm2.predict(X_test)  # keep Y_test intact for evaluation
```
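
Once a model has produced predictions, they can be compared against the held-out targets. The following is a minimal sketch of that step for the KNN classifier, assuming `accuracy_score` as the metric, 5 folds for cross-validation, and a fixed `random_state` so the split is repeatable; none of these choices come from the example above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.4, random_state=0  # random_state is an assumed value
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, Y_train)

# Fraction of held-out records whose predicted class matches the true class.
test_accuracy = accuracy_score(Y_test, knn.predict(X_test))

# 5-fold cross-validation: five accuracy scores, one per held-out fold.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, Y, cv=5)

print(f"hold-out accuracy: {test_accuracy:.3f}")
print(f"cross-validated accuracy: {cv_scores.mean():.3f}")
```

A single hold-out score depends on which records land in the test set, so the cross-validated mean is usually the steadier estimate of performance on unseen data.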