**Predictive modeling**: the process of developing a mathematical tool or model that generates an accurate prediction.

Predictive models commonly fail for reasons such as:

- inadequate pre-processing of the data
- inadequate model validation
- unjustified extrapolation
- over-fitting the model to the existing data
- exploring relatively few models when searching for relationships
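Over-fitting and inadequate validation often go together. The small Python sketch below uses made-up data; the data, the deterministic +1/-1 noise pattern, and the helper names `fit_line`, `fit_memorizer`, and `mse` are all hypothetical. A 1-nearest-neighbour "memorizer" reproduces every training target exactly, so it looks perfect when judged on its training data, but it does worse than a plain least-squares line on held-out points.

```python
# Hypothetical toy data: y = 2x plus a deterministic +1/-1 "noise" pattern.
train_x = [float(x) for x in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Held-out points lie between training points; their targets follow the noise-free trend.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns a predict function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def fit_memorizer(xs, ys):
    """1-nearest-neighbour 'model' that simply recalls the closest training target."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model's predictions on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memo)]:
    print(f"{name:10s} train MSE={mse(model, train_x, train_y):.3f} "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```

With this toy data the memorizer's training MSE is exactly 0 while its test MSE is 1.25, well above the line's. Judging either model only on the data used to fit it would pick the wrong one.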

## 1.1 Prediction Versus Interpretation

The trade-off between prediction and interpretation depends on the primary goal of the task. Unfortunately, as we push toward higher accuracy, models tend to become more complex and harder to interpret.

## 1.2 Key Ingredients of Predictive Models

An effective predictive model is built on *intuition* and *deep knowledge of the problem context*, both of which are vital for guiding decisions about model development. The process begins with *relevant* data.

## 1.3 Terminology

- The *sample*, *data point*, *observation*, or *instance* refer to a single independent unit of data
- The *training* set consists of the data used to develop models, while the *test* or *validation* set is used solely to evaluate the performance of a final set of candidate models. **NOTE**: in practice, people usually use the term *validation* set for the data used to evaluate candidate models, and use cross-validation to split the *training* set into several sub-*training* and *test* sets for tuning model parameters.
- The *predictors*, *independent variables*, *attributes*, or *descriptors* are the data used as input for the prediction equation.
- The *outcome*, *dependent variable*, *target*, *class*, or *response* refer to the outcome event or quantity that is being predicted.
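The split described in the terminology above can be sketched with the Python standard library. This is a minimal illustration, assuming a simple random hold-out split and unstratified k-fold cross-validation; the function names `train_test_split` and `k_fold_splits` and the 20 sample IDs are hypothetical, not any particular library's API. The test set is set aside once, and cross-validation repeatedly splits only the training set into a fitting part and a held-out part.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle samples reproducibly and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)


def k_fold_splits(samples, k=5):
    """Yield (fit_set, holdout) pairs; every sample is held out exactly once."""
    samples = list(samples)
    folds = [samples[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        holdout = folds[i]
        fit_set = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield fit_set, holdout


samples = list(range(20))  # hypothetical sample IDs
training, test = train_test_split(samples, test_fraction=0.25)

# Tuning happens only inside the training set; the test set stays untouched.
for fold, (fit_set, holdout) in enumerate(k_fold_splits(training, k=5)):
    print(f"fold {fold}: fit on {len(fit_set)} samples, evaluate on {len(holdout)}")
print(f"final evaluation on {len(test)} untouched test samples")
```

Real projects would typically use a library's splitting utilities (often with stratification by the outcome), but the structure is the same: the test set is used once, for the final candidates only.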

## 1.4 Example Data Sets and Typical Data Scenarios

## 1.5 Overview

- **Part I General Strategies**
    - Ch.2 A short tour of the predictive modeling process
    - Ch.3 Data pre-processing
    - Ch.4 Over-fitting and model tuning
- **Part II Regression Models**
    - Ch.5 Measuring performance in regression models
    - Ch.6 Linear regression and its cousins
    - Ch.7 Nonlinear regression models
    - Ch.8 Regression trees and rule-based models
    - Ch.9 A summary of solubility models
    - Ch.10 Case study: compressive strength of concrete
- **Part III Classification Models**
    - Ch.11 Measuring performance in classification models
    - Ch.12 Discriminant analysis and other linear classification models
    - Ch.13 Nonlinear classification models
    - Ch.14 Classification trees and rule-based models
    - Ch.15 A summary of grant application models
    - Ch.16 Remedies for severe class imbalance
    - Ch.17 Case study: job scheduling
- **Part IV Other Considerations**
    - Ch.18 Measuring predictor importance
    - Ch.19 An introduction to feature selection
    - Ch.20 Factors that can affect model performance