# Conjoint Analysis

Conjoint analysis is a common statistical method for pricing and product research. The method uncovers customers' choices through market surveys.

The primary goal of conjoint analysis is to predict what features customers want in a new product.

Examples:
* Freemium plan for an application
* Headphone jack on the iPhone

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

In [236]:
df = pd.read_csv("./data/sample_data.csv")
df

Unnamed: 0,Run,Sears,Goodyear,Goodrich,Price50,Price60,Price70,Miles30K,Miles40K,Miles50K,SideBlack,SideWhite,Utility
0,1,1,0,0,1,0,0,1,0,0,0,1,5.2
1,2,1,0,0,0,1,0,0,1,0,0,1,7.3
2,3,1,0,0,0,0,1,0,0,1,1,0,5.7
3,4,0,1,0,0,1,0,1,0,0,1,0,4.8
4,5,0,1,0,0,0,1,0,1,0,0,1,7.2
5,6,0,1,0,1,0,0,0,0,1,0,1,9.3
6,7,0,0,1,0,0,1,1,0,0,0,1,0.8
7,8,0,0,1,1,0,0,0,1,0,1,0,3.2
8,9,0,0,1,0,1,0,0,0,1,0,1,6.4
9,10,1,0,0,0,0,1,1,0,0,1,0,2.2


In [237]:
target = 'Utility'
unused = 'Run'
drop_cols = []
drop_cols.append(target)
drop_cols.append(unused)
drop_cols

['Utility', 'Run']

In [238]:
target = 'Utility'
unused = 'Run'
drop_cols = []
drop_cols.append(target)
drop_cols.append(unused)
X = df.drop(drop_cols, axis=1)
Y = df[target]
lr = sm.OLS(Y,X).fit()
lr.summary()


kurtosistest only valid for n>=20 ... continuing anyway, n=18



Dep. Variable:,Utility,R-squared:,0.971
Model:,OLS,Adj. R-squared:,0.95
Method:,Least Squares,F-statistic:,47.0
Date:,"Mon, 08 Jul 2024",Prob (F-statistic):,7.39e-07
Time:,03:49:15,Log-Likelihood:,-8.7377
No. Observations:,18,AIC:,33.48
Df Residuals:,10,BIC:,40.6
Df Model:,7,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Sears,1.6935,0.178,9.500,0.000,1.296,2.091
Goodyear,2.6102,0.178,14.643,0.000,2.213,3.007
Goodrich,-0.6731,0.178,-3.776,0.004,-1.070,-0.276
Price50,1.8602,0.178,10.435,0.000,1.463,2.257
Price60,1.6269,0.178,9.126,0.000,1.230,2.024
Price70,0.1435,0.178,0.805,0.439,-0.254,0.541
Miles30K,-0.8565,0.178,-4.805,0.001,-1.254,-0.459
Miles40K,1.8102,0.178,10.155,0.000,1.413,2.207
Miles50K,2.6769,0.178,15.017,0.000,2.280,3.074

Omnibus:,2.753,Durbin-Watson:,1.714
Prob(Omnibus):,0.253,Jarque-Bera (JB):,1.287
Skew:,-0.632,Prob(JB):,0.525
Kurtosis:,3.342,Cond. No.,4.82e+16


In [239]:
coef = lr.params.to_dict()

attributes = {
'brand' : ['Sears','Goodyear','Goodrich'],
'Price' : ['Price50','Price60','Price70'],
'Miles' : ['Miles30K','Miles40K','Miles50K'],
'Side' : ['SideBlack','SideWhite']
}

part_worth_utility = {}
for attribute in attributes.keys():
    print(f'Attribute: {attribute}')
    _ls = attributes[attribute]
    for feature in _ls:
        val = coef[feature]
        print(f'    {feature} : {coef[feature]:.3f}')
        part_worth_utility[feature] = val

Attribute: brand
    Sears : 1.694
    Goodyear : 2.610
    Goodrich : -0.673
Attribute: Price
    Price50 : 1.860
    Price60 : 1.627
    Price70 : 0.144
Attribute: Miles
    Miles30K : -0.856
    Miles40K : 1.810
    Miles50K : 2.677
Attribute: Side
    SideBlack : 1.203
    SideWhite : 2.428


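**Importance of an attribute $R_i$** :&nbsp;&nbsp; $R_i = \max_j(u_{ij}) - \min_k(u_{ik})$ for the $i$-th attribute, where $u_{ij}$ is the part-worth utility of level $j$ of attribute $i$ <br>

**Relative importance of an attribute $Rimp_i$**:&nbsp;&nbsp;$Rimp_i = \frac{R_i}{\sum_{k=1}^{m}{R_k}}$, where $m$ is the number of attributes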

In [240]:
part_worth_utility

{'Sears': 1.693518518518517,
 'Goodyear': 2.6101851851851894,
 'Goodrich': -0.6731481481481463,
 'Price50': 1.860185185185187,
 'Price60': 1.6268518518518535,
 'Price70': 0.14351851851851843,
 'Miles30K': -0.8564814814814812,
 'Miles40K': 1.8101851851851865,
 'Miles50K': 2.6768518518518527,
 'SideBlack': 1.202777777777779,
 'SideWhite': 2.4277777777777794}

In [241]:
vals ={}
for attribute in attributes.keys():
    print(attribute)
    _tmp = []
    for feature in attributes[attribute]:
        print(f"    {feature}: {part_worth_utility[feature]:.3f}")
        _tmp.append(part_worth_utility[feature])
    vals[attribute]=max(_tmp), min(_tmp)
# vals

brand
    Sears: 1.694
    Goodyear: 2.610
    Goodrich: -0.673
Price
    Price50: 1.860
    Price60: 1.627
    Price70: 0.144
Miles
    Miles30K: -0.856
    Miles40K: 1.810
    Miles50K: 2.677
Side
    SideBlack: 1.203
    SideWhite: 2.428


In [242]:
_df = pd.DataFrame(vals).T
_df.columns = ['max','min']
_df['importance'] = _df['max']  - (_df['min'])
_df['relative_importance'] = _df['importance']/sum(_df['importance'])*100
_df

Unnamed: 0,max,min,importance,relative_importance
brand,2.610185,-0.673148,3.283333,33.646456
Price,1.860185,0.143519,1.716667,17.591802
Miles,2.676852,-0.856481,3.533333,36.208369
Side,2.427778,1.202778,1.225,12.553373
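
The importance and relative-importance formulas can also be computed directly from a nested dictionary of part-worths. The following is a minimal sketch, not the notebook's own code: the `part_worths` structure is an assumption made for illustration, and its values are the regression coefficients above rounded to four decimals.

```python
# Part-worth utilities, rounded from the regression output in this notebook.
# The nested-dict layout is an illustrative assumption.
part_worths = {
    "brand": {"Sears": 1.6935, "Goodyear": 2.6102, "Goodrich": -0.6731},
    "Price": {"Price50": 1.8602, "Price60": 1.6269, "Price70": 0.1435},
    "Miles": {"Miles30K": -0.8565, "Miles40K": 1.8102, "Miles50K": 2.6769},
    "Side": {"SideBlack": 1.2028, "SideWhite": 2.4278},
}

# R_i: range (max minus min) of the part-worths within each attribute.
importance = {
    attr: max(levels.values()) - min(levels.values())
    for attr, levels in part_worths.items()
}

# Rimp_i: each range as a percentage of the sum of all ranges.
total_range = sum(importance.values())
relative_importance = {
    attr: 100 * r / total_range for attr, r in importance.items()
}

for attr in importance:
    print(f"{attr}: R = {importance[attr]:.3f}, "
          f"Rimp = {relative_importance[attr]:.1f}%")
```

Because the inputs are rounded, the results may differ from the table in the last decimal place.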


## Selecting optimal feature bundle and price-point

Based on the part-worth utility scores, the optimal features are those with the highest score in each non-price attribute:
1. Goodyear

2. Miles50K

3. SideWhite

Estimate the total utility of a bundle with those three features from the regression model.

In [243]:
opt_feature = [0,1,0,0,0,0,0,0,1,0,1]
total_utility = lr.predict(opt_feature)[0]
print(f'The total predicted utility : {total_utility:.3f}')

The total predicted utility : 7.715
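
Because the model was fitted with `sm.OLS` on the dummy columns without an added constant, the prediction for a bundle is just the sum of the coefficients of its active levels. A minimal sketch of that check, using the three part-worths rounded to four decimals (the `bundle_part_worths` name is illustrative):

```python
# Part-worths of the chosen levels, rounded from the regression output.
bundle_part_worths = {
    "Goodyear": 2.6102,
    "Miles50K": 2.6769,
    "SideWhite": 2.4278,
}

# With no intercept term, the predicted utility of the bundle is the
# sum of the part-worths of the levels that are switched on.
bundle_utility = sum(bundle_part_worths.values())
print(f"Sum of part-worths: {bundle_utility:.3f}")
```

The sum agrees with the model's prediction to within rounding.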


### The optimal price point

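Calculate the price point from the dollar cost of one unit of utility. <br>
The difference between the maximum and minimum price is \$20 (\$70 - \$50), and the importance of price is about 1.72, so one unit of utility is worth roughly \$20 / 1.72 ≈ \$11.65.
Multiplying this value by the total predicted utility of the optimal bundle gives the price point.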
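
As a worked version of this arithmetic, using the unrounded price importance from the importance table (1.7167) and the predicted bundle utility of 7.715:

$$\text{dollars per unit of utility} = \frac{70 - 50}{1.7167} \approx 11.65$$

$$\text{price point} \approx 7.715 \times 11.65 \approx 89.88$$

The result lies above the highest tested price of 70 dollars, so it is a linear extrapolation and should be read as a rough indication rather than a measured value.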

In [244]:
print(f'''The optimal price point: {total_utility*20/(_df.loc['Price','importance']):.3f}''')

The optimal price point: 89.881


## Market share simulation

Calculate the probability that a customer will choose a particular feature bundle.

To run the simulation, we have to specify a choice rule that transforms part-worth utilities into the product choice that customers are most likely to make. <br>
The three most common rules are:
1. Maximum utility

2. Share of utility

3. Logit

The maximum utility rule is simple, but it tends to predict more extreme market shares, i.e. shares closer to 0% or 100%, than the other rules. This rule is also less robust. The share of utility and logit rules are sensitive to the scale range on which utility is measured.

Reference: <br>

“[Conjoint Analysis: Marketing Engineering Technical Note](https://faculty.washington.edu/sundar/NPM/CONJOINT-ProductDesign/TN09%20-%20Conjoint%20Analysis%20Technical%20Note.pdf)” supplement to Chapter 6 of Principles of Marketing Engineering, by Gary L. Lilien, Arvind Rangaswamy, and Arnaud De Bruyn (2007).
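
The three rules can be compared on a small example. The sketch below is illustrative only: it applies each rule to a single vector of utilities (three of the predicted bundle utilities from this notebook), whereas in practice the maximum utility rule is usually applied per respondent and then averaged. The function names are assumptions made for this example.

```python
import numpy as np

# Illustrative utilities: three of the predicted bundle utilities in this notebook.
utilities = np.array([7.714815, 6.848148, 5.931481])


def max_utility_share(u):
    """Maximum utility rule: the highest-utility bundle takes the whole share."""
    share = np.zeros(len(u))
    share[np.argmax(u)] = 1.0
    return share


def share_of_utility(u):
    """Share of utility rule: share proportional to utility (assumes u > 0)."""
    return u / u.sum()


def logit_share(u):
    """Logit rule: share proportional to exp(utility)."""
    e = np.exp(u - u.max())  # subtracting the max avoids overflow; shares are unchanged
    return e / e.sum()


for name, rule in [("max utility", max_utility_share),
                   ("share of utility", share_of_utility),
                   ("logit", logit_share)]:
    print(f"{name:>16}: {np.round(rule(utilities), 3)}")

# Scale sensitivity: adding a constant changes share-of-utility results,
# and multiplying by a constant changes logit results.
print(np.round(share_of_utility(utilities + 10), 3))
print(np.round(logit_share(utilities * 0.5), 3))
```

The last two lines illustrate the scale sensitivity mentioned above: share of utility depends on where zero sits on the utility scale, while logit depends on the unit in which utility is measured.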

*Logit-choice rule*

Assumption: the utilities follow a random process

This rule states that the probability of a customer choosing a particular feature bundle is proportional to the exponential of the utility of that bundle.

$P_j = \frac{e^{U_j}}{\sum^n_{k=1}{e^{U_k}}}$

Where $U_j$ is the utility of bundle $j$, and the denominator is the sum of the exponentials of the utilities of all bundles being considered.

This approach accounts for the fact that customers are more likely to choose bundles with higher utility values, and it provides a way to estimate the market share for each possible bundle.
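
As a small worked example, assume the choice set contains only two bundles, with utilities $U_1 = 7.715$ and $U_2 = 6.848$ (two of the predicted utilities in this notebook; restricting the choice set to two bundles is an assumption made for illustration):

$$P_1 = \frac{e^{7.715}}{e^{7.715} + e^{6.848}} = \frac{1}{1 + e^{-(7.715 - 6.848)}} = \frac{1}{1 + e^{-0.867}} \approx 0.70$$

Only the difference between the utilities enters the result, so adding the same constant to every utility leaves the logit shares unchanged.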


In [259]:
df_logit = df.copy()

# Zero out the price columns (they stay in the frame so the model still
# receives all 11 inputs).
# The focus is on comparing the utility of different feature combinations
# without the influence of price. With the price dummies set to zero, the
# prediction reflects only the other attributes (brand, mileage, and side),
# so the simulated market share is based solely on these non-price attributes.
ignore_price_cols = ["Price50","Price60","Price70"]
for col in ignore_price_cols:
    df_logit[col] = 0

df_logit = df_logit.drop(["Utility"],axis=1)
X = df_logit[["Sears",
                     "Goodyear",
                     "Goodrich",
                     "Price50",
                     "Price60",
                     "Price70",
                     "Miles30K",
                     "Miles40K",
                     "Miles50K",
                     "SideBlack",
                     "SideWhite"]]
predictUtil = lr.predict(X)
df_logit["predicted_Utility"] = predictUtil
# Logit rule: share_j = exp(U_j) / sum_k exp(U_k), expressed in percent.
# The shares are computed over every row of df_logit, before the
# drop_duplicates call below.
exp_utility = np.exp(predictUtil)
market_shares = exp_utility / exp_utility.sum() * 100
df_logit["market_share"] = market_shares
_logit =   df_logit[["Sears",
                        "Goodyear",
                        "Goodrich",
                        "Miles30K",
                        "Miles40K",
                        "Miles50K",
                        "SideBlack",
                        "SideWhite",
                        "predicted_Utility",
                        "market_share"]].drop_duplicates()
_logit

Unnamed: 0,Sears,Goodyear,Goodrich,Miles30K,Miles40K,Miles50K,SideBlack,SideWhite,predicted_Utility,market_share
0,1,0,0,1,0,0,0,1,3.264815,0.331655
1,1,0,0,0,1,0,0,1,5.931481,4.773155
2,1,0,0,0,0,1,1,0,5.573148,3.335672
3,0,1,0,1,0,0,1,0,2.956481,0.243657
4,0,1,0,0,1,0,0,1,6.848148,11.937376
5,0,1,0,0,0,1,0,1,7.714815,28.398631
6,0,0,1,1,0,0,0,1,0.898148,0.031107
7,0,0,1,0,1,0,1,0,2.339815,0.131512
8,0,0,1,0,0,1,0,1,4.431481,1.065035
9,1,0,0,1,0,0,1,0,2.039815,0.097426


## Misconceptions about Conjoint Analysis
Three common mistakes:
1. Conjoint will estimate a market share <br>

 Conjoint analysis is used to assess preferences, and preferences are highly correlated with sales. But they are not identical to sales. <br>
 
    > In short, I always talk about "relative share of preference" unless I have strong evidence and specific intent to assess market share.

2. Conjoint gives simple pricing data <br>

 A common motivation for conjoint analysis is to gain insight into pricing. However, **conjoint does not directly answer, 'How much can we charge?'** or **'How much is this feature worth?'**
 To use conjoint for successful pricing research, you need repeated observations to understand how prices work for your category, product, market, and brand. That requires iterative data, attention to the results, and rational modeling of effects.

3. Highest Preference == Best Product <br>

 **Stakeholders believe that the best product decision is to offer the most-preferred feature(s)**. For instance, suppose black gets the highest score for the color of a car. Would producing a black car maximize profit? Not necessarily, since this analysis assumes that there is no competition. What if all available cars in the world are black? You won't make a good profit from this product.

Reference:
[Misconceptions about Conjoint Analysis](https://quantuxblog.com/misconceptions-about-conjoint-analysis)