In [1]:
# INTRODUCTION TO SUPERVISED LEARNING
# In earlier lessons, we showed the importance of measuring risk when making financial decisions.
# We used risk metrics and analytics to make decisions about the market, investments, and even retirement portfolios.
# Financial risk is a serious concern in other areas of finance too, such as credit, lending, and insurance.
# Because many factors relate to risk, we need better tools to consider various components for making predictions about the future.
# In fact, FinTech companies have moved beyond traditional risk analytics and are starting to use machine learning to model and predict risk.
# In this lesson, you'll learn about the field of machine learning called SUPERVISED LEARNING.
# Supervised Learning trains an algorithm to learn based on a labeled dataset, where each item in the dataset is tagged with the answer.
# This provides an answer key that you can use to evaluate the accuracy of the model's predictions.
# Supervised learning can consider multiple factors and known past outcomes to make predictions about future outcomes, such as financial risk.
# By the end of this lesson, you'll be able to define supervised learning and explain how FinTech can use it.
# You'll also compare and contrast regression and classification.
# Finally, you'll describe the model-fit-predict pattern that is used to create, train, and use machine learning models.
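
# The "answer key" idea described above can be sketched in a few lines of plain Python.
# This is only an illustration: the labels, the predictions, and the accuracy() helper
# are hypothetical and are not part of the lesson's dataset or of any library.

```python
def accuracy(predictions, answers):
    """Return the fraction of predictions that match the known answers."""
    correct = sum(1 for p, a in zip(predictions, answers) if p == a)
    return correct / len(answers)


# Hypothetical labeled outcomes (the answer key) and a model's guesses for the same loans.
answer_key = ["low_risk", "high_risk", "low_risk", "low_risk"]
model_predictions = ["low_risk", "high_risk", "high_risk", "low_risk"]

score = accuracy(model_predictions, answer_key)
print(score)  # 3 of the 4 predictions match, so this prints 0.75
```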

In [2]:
# WHAT IS SUPERVISED LEARNING?
# We typically divide machine learning into three main categories:
    # 1. Supervised Learning
    # 2. Unsupervised Learning
    # 3. Reinforcement Learning
# As previously explained, we can use unsupervised learning for knowledge discovery and clustering.
# By contrast, supervised learning can learn from the data and the expected outcomes that you choose to feed into it.
# If you supply the data and the expected outcome together, the model can learn how to make predictions for new pieces of data that have similar features.
# You SUPERVISE the model's learning by feeding it carefully selected data with known outcomes that the model can use to make the most accurate predictions that are possible.
# For example, assume that you have a dataset of high-risk and low-risk loans, along with the factors that led to those results.
# You can use that information to improve the model's predictive capability.
# That is, models can often learn from their mistakes.
# If a model's prediction is slightly off, it can adjust itself to become even better the next time that it gets that data.
# And once trained, a supervised learning model can predict the outcome of a new piece of data, such as a new consumer loan.
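
# One way to picture "learning from mistakes" is a model with a single adjustable weight
# that is nudged whenever its prediction is off. The sketch below is a toy example, not the
# method used by any particular library: the data, the learning rate, and the update rule
# (a simple per-example gradient step) are all assumptions chosen for illustration.

```python
# Hypothetical training data: inputs paired with known outcomes (here, outcome = 2 * input).
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # the untrained model starts with a poor guess
learning_rate = 0.1   # assumed step size, small enough for this toy data

for epoch in range(20):
    for x, known_outcome in training_data:
        prediction = weight * x
        error = known_outcome - prediction     # how far off the prediction was
        weight += learning_rate * error * x    # adjust the weight to shrink that error

# Once trained, the model can predict the outcome for an input it has not seen.
new_input = 4.0
new_prediction = weight * new_input
print(round(weight, 4), round(new_prediction, 4))
```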

In [3]:
# REGRESSION AND CLASSIFICATION
# While we often broadly divide machine learning into supervised learning, unsupervised learning, and reinforcement learning, we can further divide supervised learning into two types of algorithms:
    # 1. Regression
    # 2. Classification

In [4]:
# REGRESSION
# We use REGRESSION algorithms to model and predict continuous variables.
# For example, say that we want to predict a person's weight.
# Weight is a continuous variable because it can take any value within a range, rather than one of a fixed set of values.
# We can use regression to predict a person's weight based on factors like height, age, and exercise duration.
# In finance, we can use regression to predict prices, dividends, rates, or any other continuous variables.
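
# As a rough sketch of regression, the code below fits a straight line to a handful of
# height and weight pairs using ordinary least squares, then predicts a weight for a new height.
# The data values and the fit_line() and predict() helpers are hypothetical and written in
# plain Python for illustration only.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Return the continuous value that the fitted line gives for x."""
    return slope * x + intercept


# Hypothetical training data: heights in centimeters and weights in kilograms.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 59, 68, 77]

slope, intercept = fit_line(heights_cm, weights_kg)
predicted_weight = predict(slope, intercept, 175)
print(round(predicted_weight, 1))
```

# The prediction is a number on a continuous scale, not a category, which is what makes this a regression problem.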

In [5]:
# CLASSIFICATION
# Conversely, we use CLASSIFICATION algorithms to predict discrete outcomes.
# For example, say that we want to use a person's traits, such as age, income, and geographic location, to predict how the person will vote on a particular issue.
# The outcome is finite, with two possibilities in this case - whether the person will vote Yes or No.
# The classification model will try to learn patterns from the data and, if successful, gain the ability to make accurate predictions for new voters.
# In finance, we can use classification to predict any discrete outcome, such as:
    # 1. Buy vs. sell
    # 2. High risk vs. low risk
    # 3. Fraud vs. not fraud
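
# To make the idea of a discrete outcome concrete, here is a minimal sketch of a classifier
# for the high risk vs. low risk case. It uses a nearest-centroid rule, which is just one simple
# technique chosen for illustration. The loan features (debt-to-income ratio, number of late
# payments), the data values, and the helper names are all hypothetical.

```python
def fit_centroids(features, labels):
    """Compute the average feature vector (centroid) for each label."""
    sums = {}
    counts = {}
    for row, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, row):
    """Return the label whose centroid is closest to the given row."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(center, row))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical loans: (debt-to-income ratio, number of late payments).
loans = [(0.10, 0), (0.20, 1), (0.15, 0), (0.50, 4), (0.60, 5), (0.55, 3)]
risk_labels = ["low_risk", "low_risk", "low_risk", "high_risk", "high_risk", "high_risk"]

centroids = fit_centroids(loans, risk_labels)
first_prediction = classify(centroids, (0.18, 1))
second_prediction = classify(centroids, (0.58, 4))
print(first_prediction, second_prediction)
```

# Unlike the regression case, the output here is always one label from a finite set.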

In [6]:
# REGRESSION VS CLASSIFICATION
# Let's visualize the distinction between regression and classification.
# Regression fits a general trend that describes the whole dataset.
# With classification, we want to classify the dataset into distinct groups.

In [7]:
# MODEL-FIT-PREDICT PIPELINE
# Regardless of whether we use a regression or classification model, we create most supervised learning models by following a basic pattern: Model-fit-predict.
# In this three-stage pattern, we present a machine learning algorithm with data (the model stage).
# The algorithm learns from this data (the fit stage).
# This forms a predictive model (the predict stage).
# A PREDICTIVE MODEL is simply the resulting model, where the algorithm has mathematically adjusted itself so that it can translate a new set of inputs to the correct output.

In [9]:
# MODEL
# A machine learning model mathematically represents something in the real world.
# A model starts as untrained - that is, we haven't yet adjusted it to make sense of the data.
# You can think of an untrained model as a mathematical ball of clay that's ready to be shaped to the data.

In [10]:
# FIT
# The fit stage (also known as the training stage) is when we fit the model to the data.
# In the mathematical ball-of-clay analogy, we adjust the model so that it matches patterns in the data.
# Recall that in our time series forecasting, the Prophet tool built a model that matched the time series components of the data.
# We could then use that model to forecast the values that future data might have.
# The fit stage of supervised learning works the same way.
# This is when the model starts to learn how to adjust (or train) itself to make predictions matching the data that we give it.

In [None]:
# PREDICT
# Once the model has been fit, or trained, to the data, we can use the trained model to predict new data.
# If we give the model new data that's similar enough to the data that it has seen before, it can predict the outcome for that data.
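
# The three stages described above can be sketched end to end with a tiny hand-written classifier.
# The NearestNeighborClassifier class, its fit() and predict() method names, and the transaction
# data are hypothetical and exist only for this illustration. The method names were chosen to
# mirror the pattern's stage names, not to reproduce any specific library's API.

```python
class NearestNeighborClassifier:
    """A toy model that labels a value like its closest training example."""

    def __init__(self):
        # Model stage: the model exists but is untrained (the "ball of clay").
        self.examples = None

    def fit(self, features, labels):
        # Fit stage: shape the model to the data by storing the labeled examples.
        self.examples = list(zip(features, labels))
        return self

    def predict(self, features):
        # Predict stage: label each new value using the nearest stored example.
        if self.examples is None:
            raise ValueError("Call fit before predict.")
        predictions = []
        for value in features:
            _, label = min(self.examples, key=lambda ex: abs(ex[0] - value))
            predictions.append(label)
        return predictions


# Hypothetical training data: transaction amounts in dollars with known outcomes.
amounts = [20, 35, 50, 900, 1200]
outcomes = ["not_fraud", "not_fraud", "not_fraud", "fraud", "fraud"]

model = NearestNeighborClassifier()    # model
model.fit(amounts, outcomes)           # fit
results = model.predict([40, 1000])    # predict
print(results)
```

# Calling predict() on an untrained instance raises an error, which reflects the point that a model must be fit before it can make predictions.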