# Choice Modeling: Conjoint Analysis

A marketing manager faces a common problem: how do customers evaluate the various tangible and intangible attributes of a particular product (for example, a TV)? A customer has to weigh combinations of attributes such as brand image, flat screen, screen size, sound quality, picture quality, and price across different models. Conjoint analysis addresses this question. Its main objective is to find out which product attributes a respondent prefers most.
<br>
<br>
Conjoint analysis determines the relative importance that consumers attach to different product attributes and the values (utilities) they attach to the different levels of those attributes.
<br> <br>
In practice, conjoint analysis asks participants to give an overall evaluation of products that vary systematically on a number of attributes.
<br> <br>
As a first step in formulating the problem, the researcher must identify the attributes and the levels of each attribute.


![](docs/cjoint.PNG)

The conjoint analysis model can be represented by the following formula:


![](docs/cjoinform.PNG)

where U(x) is the utility of an alternative, uij is the part-worth contribution (the utility of the jth level of the ith attribute), ki is the number of levels of attribute i, and m is the number of attributes. The indicator xij equals 1 if the jth level of the ith attribute is present and 0 otherwise.
<br> <br>
Importance of an attribute (Ri) = [maximum(uij) − minimum(uij)], where the maximum and minimum are taken over the levels j of attribute i.
<br> 
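
The formula above is only available as an image. Going by the symbol definitions in the text, it is presumably the standard additive part-worth model, which can be written out as follows. The normalised weight $W_i$ is not defined in the text; it is a common convention for expressing importance as a share and is included here as an assumption.

$$
U(x) = \sum_{i=1}^{m} \sum_{j=1}^{k_i} u_{ij}\, x_{ij},
\qquad
R_i = \max_{j} u_{ij} - \min_{j} u_{ij},
\qquad
W_i = \frac{R_i}{\sum_{l=1}^{m} R_l}
$$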

A variety of techniques are available to estimate the model. The most popular and widely applied is dummy variable regression. To analyse conjoint data, the dummy variables are treated as independent (explanatory) variables and the preference rating obtained from the respondent is treated as the dependent variable. In the code below, the binary `selected` column plays the role of the dependent variable.
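
To make the idea concrete before turning to the real data, here is a minimal sketch of dummy variable regression on a made-up design. The attributes (`brand`, `price`), their levels, and the ratings are all hypothetical and chosen so that the fit is exact; the sketch uses plain least squares from NumPy rather than the statsmodels call used later.

```python
import numpy as np

# Hypothetical full-factorial design: 2 attributes with 2 levels each.
profiles = [("A", "low"), ("A", "high"), ("B", "low"), ("B", "high")]
# Made-up preference ratings, one per profile.
ratings = np.array([9.0, 6.0, 5.0, 2.0])

# Dummy coding with one reference level per attribute
# (brand "B" and price "high" are the references).
X = np.array(
    [[1.0, float(brand == "A"), float(price == "low")] for brand, price in profiles]
)

# Ordinary least squares: columns are intercept, brand=A, price=low.
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
intercept, pw_brand_a, pw_price_low = coef

print("intercept:", intercept)
print("part-worth of brand A vs B:", pw_brand_a)
print("part-worth of low vs high price:", pw_price_low)
```

Each coefficient is the part-worth of a level relative to its reference level. With these made-up ratings, brand matters slightly more than price because its part-worth is larger.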

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('data/candidate.tab', delimiter='\t')

In [None]:
df

In [None]:
df.describe()

In [None]:
df.atmilitary.value_counts()

In [None]:
df.atreligion.value_counts()

In [None]:
df.ated.value_counts()

In [None]:
df.atprof.value_counts()

In [None]:
df.atinc.value_counts()

In [None]:
df.atrace.value_counts()

In [None]:
df.atage.value_counts()

In [None]:
df.atmale.value_counts()

In [None]:
df.shape

In [None]:
df.isnull().sum()

In [None]:
# drop rows with missing values
clean_df = df.dropna()

In [None]:
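# target: whether the profile was selected; features: every attribute column
y = clean_df['selected']
feature_cols = [col for col in clean_df.columns if col not in ('selected', 'resID', 'rating')]
x = clean_df[feature_cols]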

In [None]:
# one-hot encode every attribute level; dtype=float keeps the design matrix numeric
xdum = pd.get_dummies(x, columns=list(x.columns), dtype=float)
xdum.head()

In [None]:
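# OLS has no `family` argument (that belongs to sm.GLM), so it is dropped here;
# with a 0/1 target this is a linear probability model
res = sm.OLS(y, xdum).fit()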
res.summary()

In [None]:
df_res = pd.DataFrame({
    'param_name': res.params.keys()
    , 'param_coef': res.params.values
    , 'p_val': res.pvalues
}).reset_index(drop=True)
print(df_res.shape)
df_res.head()


In [None]:
df_res['abs_param_coef'] = np.abs(df_res['param_coef'])
# flag coefficients that are significant at the 5% level (95% confidence)
df_res['is_sig_95'] = (df_res['p_val'] < 0.05)
# constructing color naming for each param
df_res['c'] = ['green' if x else 'red' for x in df_res['is_sig_95']]
df_res.shape

In [None]:
df_res.head()

In [None]:
# sort by the absolute value of the coefficient
df_res = df_res.sort_values(by='abs_param_coef', ascending=True)
df_res

In [None]:
f, ax = plt.subplots(figsize=(14, 8))
plt.title('Part Worth')
pwu = df_res['param_coef']
xbar = np.arange(len(pwu))
plt.barh(xbar, pwu, color=df_res['c'])
plt.yticks(xbar, labels=df_res['param_name'])
plt.show()

In [None]:
# collect the coefficients of every level, grouped by attribute, in a dictionary
range_per_feature = dict()
for key, coeff in res.params.items():
    # dummy columns are named '<attribute>_<level>'; names without '_' are kept whole
    feature = key.split('_')[0]
    range_per_feature.setdefault(feature, []).append(coeff)
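
The grouping above relies on `key.split('_')` and therefore assumes that attribute names contain no underscore. A possible, more defensive variant splits on the last separator instead. The helper name `attribute_of` and the example coefficients are hypothetical; the approach still breaks if a level label itself contains the separator.

```python
def attribute_of(dummy_name, sep="_"):
    """Return the attribute part of a '<attribute><sep><level>' column name.

    Splits on the LAST separator, so 'screen_size_large' maps to 'screen_size'.
    Names without a separator (e.g. 'const') are returned unchanged.
    """
    attribute, found, _level = dummy_name.rpartition(sep)
    return attribute if found else dummy_name


# Made-up coefficients keyed by dummy column name.
coefs = {
    "screen_size_small": -0.2,
    "screen_size_large": 0.3,
    "price_low": 0.5,
    "price_high": -0.5,
    "const": 1.0,
}

grouped = {}
for name, coef in coefs.items():
    grouped.setdefault(attribute_of(name), []).append(coef)

print(grouped)
```

An alternative with the same goal is to pass a more distinctive `prefix_sep` to `pd.get_dummies` and split on that string.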

In [None]:
range_per_feature

In [None]:
# importance of an attribute = range of its level coefficients,
# i.e. max(coef) - min(coef)
importance_per_feature = {
    k: max(v) - min(v) for k, v in range_per_feature.items()
}

In [None]:
importance_per_feature
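
The importances above are raw ranges in the units of the regression coefficients. A common convention, assumed here rather than taken from the notebook, is to rescale them so that they sum to 100 and can be read as percentage shares. The function name `relative_importance` and the example part-worths are hypothetical.

```python
def relative_importance(part_worths):
    """Map each attribute to its share (in percent) of the total range.

    part_worths: dict of attribute name -> list of level coefficients.
    """
    ranges = {attr: max(vals) - min(vals) for attr, vals in part_worths.items()}
    total = sum(ranges.values())
    return {attr: 100.0 * r / total for attr, r in ranges.items()}


# Made-up part-worths for three attributes.
example = {
    "brand": [0.0, 4.0],
    "price": [0.0, 3.0],
    "size": [-1.0, 0.5, 2.0],
}
shares = relative_importance(example)
print(shares)
```

Applied to `range_per_feature`, the same function would give values suitable for an axis labelled in percent.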

In [None]:
alt_data = pd.DataFrame(
    list(importance_per_feature.items()), 
    columns=['attr', 'importance']
).sort_values(by='importance', ascending=False)


f, ax = plt.subplots(figsize=(12, 8))
xbar = np.arange(len(alt_data['attr']))
plt.title('Importance')
plt.barh(xbar, alt_data['importance'])
for i, v in enumerate(alt_data['importance']):
    ax.text(v , i + .25, '{:.2f}'.format(v))
plt.ylabel('attributes')
plt.xlabel('importance (range of part-worths)')
plt.yticks(xbar, alt_data['attr'])
plt.show()