# Review of Predictive Modeling

Our primary goal here is to describe the main capabilities and limitations of predictive modeling.

In business applications, data analytics methods are often categorized at a high level into three distinct types:
- descriptive,
- predictive, and
- prescriptive.


**Descriptive analytics** refers to methods for data summarization, data quality assessment, and finding correlations.

**Predictive analytics** focuses on estimating the likelihood of a potential outcome by using data that are observed or known prior to the outcome, e.g.:
- demand forecasting
- propensity scoring, which estimates the likelihood that a customer will respond to a promotion

**Prescriptive analytics** refers to modeling the dependency between decisions and future outcomes for optimal decision making, e.g.:
- price optimization, where the profit is modeled as a function of the price, so that one can estimate how many dollars of profit each dollar of price discount would generate and determine the profit-optimal discount value
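
As a minimal sketch of the price optimization example, the snippet below models profit as a function of the discount and picks the best value from a small grid. The linear demand curve, the function names, and all numeric parameters are illustrative assumptions, not something given in these notes.

```python
def demand(discount, base_demand=100.0, lift_per_dollar=20.0):
    """Hypothetical linear demand: each dollar of discount adds a fixed number of units."""
    return base_demand + lift_per_dollar * discount


def profit(discount, price=10.0, unit_cost=4.0):
    """Profit as a function of the discount under the assumed demand curve."""
    return (price - discount - unit_cost) * demand(discount)


def best_discount(candidates):
    """Return the candidate discount with the highest modeled profit."""
    return max(candidates, key=profit)


# Candidate discounts from $0.00 to $5.00 in 50-cent steps (an assumed grid).
candidates = [i * 0.5 for i in range(11)]
optimal = best_discount(candidates)
print(optimal, profit(optimal))
```

With these assumed numbers the profit is a downward-opening parabola in the discount, so the grid search finds a single interior optimum; a real demand model would have to be estimated from data.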


# Steps

The first step we need to take as we begin to discuss the algorithmic approach is to translate this business language into more formal models that describe the objective we are trying to achieve, the space of possible actions, and the constraints we should meet.

Business problems are naturally expressed as optimization problems that involve:

- a business metric to optimize, such as revenue
- possible actions, such as campaigns or assortment adjustments
- optimal actions that must be found among the various possible strategies


`Importantly`: there are several basic considerations that should be taken into account in any model design.

**First**, we need to define the business objective and express it as a numerical metric that can be optimized.

_e.g., the design of the objective can be especially challenging if the objective represents a trade-off between the enterprise’s profit and the usefulness to the consumer._



**Second**, we should account for the available data or address the data collection problem.

_e.g., there is a trade-off between the cost of data acquisition and the value delivered by the acquired data._


**Third**, the model can be created at different levels of granularity.

_e.g., this method assumes the availability of a powerful data processing infrastructure and high-resolution data, which enables more granular modeling._

**Finally**, the economic model that estimates the business outcomes from the distribution should be defined. This is an economic problem rather than a machine learning one.

**example:**

In 1998, 

`scenario 1`:
    Suppose a retailer sells a product with margin _m_, and $q_{u}$ is the monthly amount of this product purchased by customer _u_. The total monthly gain _G_ is then a simple sum over customers,

$$G = \sum_{u} q_{u} m$$
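
A minimal sketch of this baseline gain, assuming the purchase quantities are held in a plain dictionary keyed by customer; the customer ids and all numbers are made up for illustration.

```python
def total_gain(quantities, margin):
    """G = sum over customers u of q_u * m."""
    return sum(q * margin for q in quantities.values())


# Hypothetical monthly quantities q_u per customer and a hypothetical margin m.
monthly_quantities = {"u1": 3, "u2": 1, "u3": 5}
margin = 2.0
print(total_gain(monthly_quantities, margin))
```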



`scenario 2`:
    He wants to boost sales by a factor _k_ and is willing to accept the cost _c_ of each promotion, which naturally becomes an optimization problem over the strategy $s=(k,c)$, where _k_ and _c_ are hyperparameters

$$ \underset{s}{\operatorname{max}} \sum_{u} ( k q_{u} m - c ) $$
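
The sketch below evaluates this objective for a finite list of candidate strategies and keeps the best one. It assumes that the cost _c_ is paid once per targeted customer and that the candidate `(k, c)` pairs are known in advance; both are assumptions made for illustration, as are the function names and numbers.

```python
def promo_gain(quantities, margin, k, c):
    """Objective of scenario 2: sum over customers of (k * q_u * m - c)."""
    return sum(k * q * margin - c for q in quantities)


def best_strategy(quantities, margin, strategies):
    """Search a finite list of candidate strategies s = (k, c) for the best one."""
    return max(strategies, key=lambda s: promo_gain(quantities, margin, s[0], s[1]))


quantities = [3, 1, 5]   # hypothetical q_u values
margin = 2.0             # hypothetical margin m
strategies = [(1.0, 0.0), (1.5, 2.0), (2.0, 7.0)]  # hypothetical (k, c) pairs

s_best = best_strategy(quantities, margin, strategies)
print(s_best, promo_gain(quantities, margin, *s_best))
```

Note that `(1.0, 0.0)` is the "no promotion" baseline of scenario 1, so the search also tells us whether promoting is worthwhile at all.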


`scenario 3`:
    He segments the customers into different groups and applies a different strategy to each, $s_{i}=(k_{i},c_{i})$ and $s_{j}=(k_{j},c_{j})$, so now we have four hyperparameters and a non-linearity from choosing the better strategy for each customer

$$\underset{s_{i},s_{j}}{\operatorname{max}} \sum_{u} \max \left( k_{i} q_{u} m - c_{i} ,\; k_{j} q_{u} m - c_{j} \right) $$
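
One way to read the two-strategy objective is that each customer receives whichever of the two strategies yields the higher gain. The sketch below implements that reading; the per-customer maximum, the function names, and the numbers are assumptions for illustration rather than the definitive formulation.

```python
from itertools import combinations


def segmented_gain(quantities, margin, s_i, s_j):
    """Sum over customers of the better of the two per-customer gains."""
    total = 0.0
    for q in quantities:
        gain_i = s_i[0] * q * margin - s_i[1]
        gain_j = s_j[0] * q * margin - s_j[1]
        total += max(gain_i, gain_j)  # the source of the non-linearity
    return total


def best_pair(quantities, margin, strategies):
    """Search all pairs of candidate strategies for the best segmented gain."""
    return max(
        combinations(strategies, 2),
        key=lambda pair: segmented_gain(quantities, margin, pair[0], pair[1]),
    )


quantities = [3, 1, 5]   # hypothetical q_u values
margin = 2.0             # hypothetical margin m
strategies = [(1.0, 0.0), (1.5, 2.0), (2.0, 7.0)]  # hypothetical (k, c) pairs

pair = best_pair(quantities, margin, strategies)
print(pair, segmented_gain(quantities, margin, *pair))
```

Under these made-up numbers, small buyers are left on the cheaper strategy while heavier buyers get the stronger promotion, which is the intuition behind segmenting.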



 

Now we want to know how the controlled and uncontrolled factors relate to the metrics of interest, that is, $P(\text{Outcome} \mid \text{Invest})$.

So, in our simple model,
$G = \sum_{u} G( p(y \mid X(s)) )$

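In the usual ML lingo, the data form a design matrix $D = [X | y]$,

where $X$ is the feature matrix of size $n \times m$ and $y$ is the label vector of size $n \times 1$ (here $m$ counts the features, not the margin):

$$ D = [ X_{(n \times m)} | y_{(n \times 1)} ]$$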
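
A minimal sketch of assembling $D$ from a feature matrix and a label vector, using plain Python lists so that it has no dependencies; the helper name `design_matrix` and the toy data are assumptions for illustration.

```python
def design_matrix(X, y):
    """Append the label y_i as the last column of each feature row x_i."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    return [list(row) + [label] for row, label in zip(X, y)]


# Toy data: n = 3 samples, m = 2 features.
X = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]]
y = [0, 1, 1]

D = design_matrix(X, y)
for row in D:
    print(row)
```

Each row of `D` has $m + 1$ entries, with the features first and the label last, matching the block layout $[X | y]$.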


Fitting a model, for example by minimizing a divergence in classical ML or a contrastive loss in a DL model, gives the prediction $\hat{y} = y(x)$.




However, in many cases $x$ and $y$ are not explicitly present in the data, so we do a lot of feature engineering.

# Supervised Learning
- Regression
- Classification

Other considerations:
- Parametric
- Nonparametric


Techniques:
- Linear Regression (MLE, MAP)
- Logistic Regression (binary classification)
- KNN
- Naive Bayes Classifier

- Nonlinear Models
    1. Feature mapping and kernel methods
    2. Adaptive basis and decision trees

- Representation Learning
    1. PCA (Decorrelation/Dimensionality Reduction)
    2. Clustering

- **More Specialized Models**
    1. Consumer Choice Theory
        - Multinomial Logit Model
        - Estimation of Multinomial Logit Model
    2. Survival Analysis
        - Survival Function
        - Hazard Function
        - Survival Analysis Regression
    3. Auction Theory
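
As one illustration of the linear regression entry in the list above, the sketch below contrasts the maximum likelihood and maximum a posteriori estimates for the simplest possible model, $y = w x + \varepsilon$. It assumes a single feature, no intercept, Gaussian noise, and a zero-mean Gaussian prior on $w$; the function names and the regularization strength `lam` (noise variance divided by prior variance) are hypothetical choices for this example.

```python
def fit_mle(xs, ys):
    """Least-squares slope for y = w * x; this is the MLE under Gaussian noise."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_map(xs, ys, lam):
    """MAP slope with a zero-mean Gaussian prior on w.

    lam is the ratio of noise variance to prior variance; lam = 0 recovers the MLE.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by y = 2x, without noise

print(fit_mle(xs, ys))
print(fit_map(xs, ys, lam=14.0))
```

The prior pulls the MAP estimate toward zero, and the pull grows with `lam`; this is the same shrinkage effect that ridge regression produces.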

**Optimization**
- Minimization problem with an equality constraint $G(x,y)=C$

$$ \underset{x,y}{\operatorname{min}} \left( \sum_{i=0}^{n} F(x,y) + \lambda \, [ G(x,y)-C ] \right) $$
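
A small worked instance may help here. The sketch below assumes the hypothetical choices $F(x,y) = x^2 + y^2$ and $G(x,y) = x + y$, for which the stationarity conditions of the Lagrangian can be solved by hand; these functions are picked for illustration only and do not come from the notes.

```python
def F(x, y):
    """Assumed objective for this example."""
    return x * x + y * y


def solve_constrained_min(C):
    """Minimize F(x, y) = x^2 + y^2 subject to G(x, y) = x + y = C.

    Setting the gradient of L = F + lam * (G - C) to zero gives
    2x + lam = 0, 2y + lam = 0 and x + y = C, so x = y = C / 2 and lam = -C.
    """
    lam = -C
    x = -lam / 2
    y = -lam / 2
    return x, y, lam


x_opt, y_opt, lam = solve_constrained_min(4.0)

# Sanity check: other points on the constraint x + y = 4 are never better.
feasible = [(t, 4.0 - t) for t in (0.0, 1.0, 3.0, 5.0)]
assert all(F(x_opt, y_opt) <= F(x, y) for x, y in feasible)

print(x_opt, y_opt, lam)
```

For objectives without a closed-form solution, the same stationarity conditions would be solved numerically instead.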


#### Linear regression
#### Logistic regression
#### Naive Bayes Classifier

#### Nonlinear Models
