<h1>Machine Learning with Python</h1>
<p>
    We will be using the <strong>scikit-learn</strong> package.
    <br>It's one of the most popular machine learning packages for Python and has a lot of algorithms built in!
    <br>You'll need to install it using:
    <br><code>conda install scikit-learn</code>
    <br>or
    <br><code>pip install scikit-learn</code>
</p>
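<p>
    A quick way to confirm the installation worked is to import the package and print its version. This is only a minimal sketch: note that the import name is <code>sklearn</code> rather than <code>scikit-learn</code>, and the version printed depends on your environment.
</p>

```python
import sklearn

# The package installed as "scikit-learn" is imported as "sklearn".
installed_version = sklearn.__version__
print(installed_version)
```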

<p>
    Let's talk about the basic structure of how to use scikit-learn!
    <br>First, a quick review of the machine learning process:
    <ol>
        <li>Data Acquisition</li>
        <li>Data Cleaning</li>
        <li>Model Training & Building | Test Data</li>
        <li>Model Testing (can return to Model Training & Building)</li>
        <li>Model Deployment</li>
    </ol>
    <br>Now let's go over an example of the process for using scikit-learn. Don't worry about memorizing any of this; we'll get plenty of practice and review when we actually start coding in subsequent lectures!
</p>

<p>
    Every algorithm in scikit-learn is exposed via an "Estimator". First you import the model; the general form is:
    <br><code>from sklearn.family import Model</code>
    <br>For example
    <br><code>from sklearn.linear_model import LinearRegression</code>
    <br><br><strong>Estimator parameters:</strong>
    <br>All the parameters of an estimator can be set when it is instantiated, and they have suitable default values.
    <br>You can use <code>Shift + Tab</code> in Jupyter to check the possible parameters.
    <br><strong>For example:</strong>
    <br><code>model = LinearRegression(fit_intercept=False)
        <br>print(model)</code>
    <br><br><code>LinearRegression(fit_intercept=False)</code>
</p>
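<p>
    Outside of Jupyter, the parameters of an estimator can also be inspected from code. The sketch below uses <code>get_params()</code> and <code>set_params()</code>, which scikit-learn estimators provide; the choice of <code>fit_intercept</code> is just for illustration.
</p>

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=False)

# get_params() returns a dict of every constructor parameter and its current value.
params = model.get_params()
print(params["fit_intercept"])

# set_params() updates parameters on an existing estimator and returns the estimator.
model.set_params(fit_intercept=True)
print(model.get_params()["fit_intercept"])
```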

<p>
    Once you have your model created with your parameters, it is time to fit your model on some data!
    <br>But remember, we should first split the data into a training set and a test set:
    <br><br>
    <code>
        >>> import numpy as np
        >>> from sklearn.model_selection import train_test_split<br>
        >>> X, y = np.arange(10).reshape((5, 2)), range(5)
        >>> X
        array([
            [0, 1],
            [2, 3],
            [4, 5],
            [6, 7],
            [8, 9]
        ])
        >>> list(y)
        [0, 1, 2, 3, 4]</code>
    <br><br>
    <code>
        >>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        >>> X_train
        array([
            [4, 5],
            [0, 1],
            [6, 7]
        ])
        >>> y_train
        [2, 0, 3]
        >>> X_test
        array([
            [2, 3],
            [8, 9]
        ])
        >>> y_test
        [1, 4]</code>
</p>
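<p>
    Because <code>train_test_split</code> shuffles the data, the exact rows in each set change from run to run. A minimal sketch of making the split reproducible, assuming you pass a fixed <code>random_state</code> (the value 0 here is arbitrary):
</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((5, 2))
y = np.arange(5)

# A fixed random_state makes the shuffle, and so the split, repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_train, X_train2))
```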

<p>
    Now that we have split the data, we can train (fit) our model on the training data.
    <br>This is done through the <code>model.fit()</code> method:
    <br><code>model.fit(X_train, y_train)</code>
    <br>Now the model has been fit and trained on the training data. The model is ready to predict labels or values on the test set!
</p>
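<p>
    To see what fitting actually produces, here is a small sketch with made-up training data that follows y = 2x + 1 exactly. After <code>fit()</code>, a <code>LinearRegression</code> model stores what it learned in the <code>coef_</code> and <code>intercept_</code> attributes.
</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data only: one feature, targets generated as y = 2x + 1.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 2.0 * X_train.ravel() + 1.0

model = LinearRegression()
model.fit(X_train, y_train)

# Learned slope(s) and intercept; these should be close to 2 and 1.
print(model.coef_, model.intercept_)
```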

<p>
    We get predicted values using the predict method:
    <br><code>predictions = model.predict(X_test)</code>
    <br>We can then evaluate our model by comparing the predictions to the correct values.
    <br>The evaluation method depends on the kind of machine learning task we are working on, e.g. regression, classification, or clustering.
    <br>
</p>
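<p>
    As a rough illustration of how the evaluation differs by task, the sketch below uses functions from <code>sklearn.metrics</code> on small hand-written arrays. The numbers are invented for the example and do not come from a real model.
</p>

```python
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

# Regression: compare continuous predictions with the true values.
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.5, 5.0, 7.5]
mse = mean_squared_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)

# Classification: compare predicted labels with the true labels.
y_true_clf = [0, 1, 1, 0]
y_pred_clf = [0, 1, 0, 0]
accuracy = accuracy_score(y_true_clf, y_pred_clf)

print(mse, r2, accuracy)
```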

<p>
    Let's get a quick recap!
    <br><br>Scikit-learn strives to have a uniform interface across all estimators.
    <br>Given a scikit-learn estimator object named <code>model</code>, the following methods are available:
    <br>Available in <strong>all Estimators</strong>
    <ul>
        <li><code>model.fit()</code>: fit training data
            <ul>
                <li>For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. <code>model.fit(X, y)</code>)</li>
                <li>For unsupervised learning applications, this accepts only a single argument, the data X (e.g. <code>model.fit(X)</code>)</li>
            </ul>
        </li>
    </ul>
    <br>
    <br>Available in <strong>supervised estimators</strong>
    <ul>
        <li><code>model.predict()</code>: given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. <code>model.predict(X_new)</code>), and returns the predicted label for each object in the array</li>
        <li><code>model.predict_proba()</code>: for classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by <code>model.predict()</code></li>
        <li><code>model.score()</code>: for classification or regression problems, most estimators implement a score method. By default, classifiers return accuracy (between 0 and 1) and regressors return R², which is at most 1 and can be negative; in both cases a larger score indicates a better fit</li>
    </ul>
    <br>
    <br>Available in <strong>unsupervised estimators</strong>
    <ul>
        <li><code>model.predict()</code>: predict labels in clustering algorithms</li>
        <li><code>model.transform()</code>: given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model</li>
        <li><code>model.fit_transform()</code>: some estimators implement this method, which more efficiently performs a fit and a transform on the same input data</li>
    </ul>
</p>
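<p>
    The methods in this recap can be sketched side by side on a toy dataset. The estimators chosen here (<code>LogisticRegression</code>, <code>KMeans</code>, and <code>PCA</code>) and the six data points are only illustrative assumptions; any estimator that offers the same methods is used in the same way.
</p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: fit(X, y), then predict, predict_proba and score.
clf = LogisticRegression().fit(X, y)
labels = clf.predict(X)
proba = clf.predict_proba(X)   # one row per sample, one column per class
accuracy = clf.score(X, y)     # mean accuracy for classifiers

# Unsupervised clustering: fit(X) only, then predict cluster labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.predict(X)

# Unsupervised transformation: fit and transform in a single call.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

print(labels, accuracy, clusters, X_reduced.shape)
```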