# 03 - Conjoint analysis
The exercises in this notebook are inspired by [Traditional Conjoint Analysis with Excel](https://sawtoothsoftware.com/resources/technical-papers/analysis-of-traditional-conjoint-using-excel-an-introductory-example). We use the same data and run the same analysis, but in Python, which is easier to extend.

## Example Problem
Consider a product range where each product has three attributes $(\text{Brand}, \text{Colour}, \text{Price})$ which can each take on the following values:

| Brand    | Colour   | Price    |
| -------- | -------- | -------- |
| A        | Red      | £50      |
| B        | Blue     | £100     |
| C        |          | £150     |

For example, a particular product instance might have attribute-values $(\text{B}, \text{Red}, \text{£150})$. In total, there are $18$ possible products that can be created from these attribute values. 

$$3 \text{ brands} \times 2 \text{ colours} \times 3 \text{ prices} = 18 \text{ products}$$
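
As a quick check of this count, the sketch below enumerates every combination of attribute levels with `itertools.product`. The variable names (`brands`, `colours`, `prices`, `products`) are chosen for illustration only and are not used elsewhere in the notebook.

```python
from itertools import product

brands = ['A', 'B', 'C']
colours = ['Red', 'Blue']
prices = ['£50', '£100', '£150']

# Every combination of one brand, one colour and one price is a distinct product.
products = list(product(brands, colours, prices))

print(len(products))  # 18
print(products[:3])   # the first three combinations
```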

Assume that each of these products is tested in a trial where participants rate each product on a scale from $0$ to $10$, where $10$ represents the highest degree of preference.

## Example data
Let's assume we have data from one participant, who rated every product. Run the code below to load their data into a `DataFrame`.

In [24]:
import pandas as pd
columns = ['Product', 'Brand', 'Colour', 'Price', 'Preference']
data = [
    [ 1, 'A', 'Red',  '£50',  5],
    [ 2, 'A', 'Red',  '£100', 5],
    [ 3, 'A', 'Red',  '£150', 0],
    [ 4, 'A', 'Blue', '£50',  8],
    [ 5, 'A', 'Blue', '£100', 5],
    [ 6, 'A', 'Blue', '£150', 2],
    [ 7, 'B', 'Red',  '£50',  7],
    [ 8, 'B', 'Red',  '£100', 5],
    [ 9, 'B', 'Red',  '£150', 3],
    [10, 'B', 'Blue', '£50',  9],
    [11, 'B', 'Blue', '£100', 6],
    [12, 'B', 'Blue', '£150', 5],
    [13, 'C', 'Red',  '£50', 10],
    [14, 'C', 'Red',  '£100', 7],
    [15, 'C', 'Red',  '£150', 5],
    [16, 'C', 'Blue', '£50',  9],
    [17, 'C', 'Blue', '£100', 7],
    [18, 'C', 'Blue', '£150', 6]
]

df_responses = pd.DataFrame(data=data, columns=columns)
df_responses.head(20)


Unnamed: 0,Product,Brand,Colour,Price,Preference
0,1,A,Red,£50,5
1,2,A,Red,£100,5
2,3,A,Red,£150,0
3,4,A,Blue,£50,8
4,5,A,Blue,£100,5
5,6,A,Blue,£150,2
6,7,B,Red,£50,7
7,8,B,Red,£100,5
8,9,B,Red,£150,3
9,10,B,Blue,£50,9


## Coding attribute levels
To perform a conjoint analysis on the above data, we first need to code the attribute levels as numeric (dummy) variables. We can do this using the pandas `get_dummies` function.

In [25]:
attributes = ['Brand', 'Colour', 'Price']
df_dummies = pd.get_dummies(df_responses, columns=attributes)
df_dummies.head(20)

Unnamed: 0,Product,Preference,Brand_A,Brand_B,Brand_C,Colour_Blue,Colour_Red,Price_£100,Price_£150,Price_£50
0,1,5,1,0,0,0,1,0,0,1
1,2,5,1,0,0,0,1,1,0,0
2,3,0,1,0,0,0,1,0,1,0
3,4,8,1,0,0,1,0,0,0,1
4,5,5,1,0,0,1,0,1,0,0
5,6,2,1,0,0,1,0,0,1,0
6,7,7,0,1,0,0,1,0,0,1
7,8,5,0,1,0,0,1,1,0,0
8,9,3,0,1,0,0,1,0,1,0
9,10,9,0,1,0,1,0,0,0,1


## Resolving linear dependencies
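The problem with the above coding is the linear dependency between the input features: within each attribute, the dummy columns always sum to 1, so any one of them is fully determined by the others. To fix this, we can pass `drop_first=True`, which drops the first level of each attribute and leaves a linearly independent set of input features.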

In [26]:
df_coded = pd.get_dummies(df_responses, columns=attributes, drop_first=True)
df_coded.head(20)

Unnamed: 0,Product,Preference,Brand_B,Brand_C,Colour_Red,Price_£150,Price_£50
0,1,5,0,0,1,0,1
1,2,5,0,0,1,0,0
2,3,0,0,0,1,1,0
3,4,8,0,0,0,0,1
4,5,5,0,0,0,0,0
5,6,2,0,0,0,1,0
6,7,7,1,0,1,0,1
7,8,5,1,0,1,0,0
8,9,3,1,0,1,1,0
9,10,9,1,0,0,0,1
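
The linear dependency discussed in this section can be checked numerically. The sketch below rebuilds the 18 attribute combinations (the ratings are not needed), codes them with and without `drop_first=True`, and compares the number of columns with the matrix rank once an intercept column is added. The helper `rank_with_intercept` and the names `levels`, `full` and `reduced` are hypothetical and introduced only for this illustration.

```python
from itertools import product

import numpy as np
import pandas as pd

# Rebuild the 18 attribute combinations; ratings are not needed to inspect the coding.
levels = {
    'Brand': ['A', 'B', 'C'],
    'Colour': ['Red', 'Blue'],
    'Price': ['£50', '£100', '£150'],
}
df = pd.DataFrame(list(product(*levels.values())), columns=list(levels))

full = pd.get_dummies(df, columns=list(levels), dtype=int)
reduced = pd.get_dummies(df, columns=list(levels), drop_first=True, dtype=int)

# Within each attribute, the full dummy columns sum to 1 in every row.
for attribute in levels:
    cols = [c for c in full.columns if c.startswith(attribute + '_')]
    assert (full[cols].sum(axis=1) == 1).all()

def rank_with_intercept(dummies):
    """Return (number of columns, rank) of the design matrix with an intercept column."""
    X = np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])
    return X.shape[1], int(np.linalg.matrix_rank(X))

print(rank_with_intercept(full))     # more columns than rank: linearly dependent
print(rank_with_intercept(reduced))  # columns equal rank: linearly independent
```

When the rank is lower than the number of columns, the regression coefficients are not uniquely determined, which is why one level per attribute is dropped.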


## Multiple regression analysis
We can now use the `sklearn` package to regress `Preference` on the coded attribute levels.

In [27]:
from sklearn import linear_model
regr = linear_model.LinearRegression()

dependent_var = 'Preference'
independent_vars = ['Brand_B', 'Brand_C', 'Colour_Red', 'Price_£150', 'Price_£50']

y = df_coded[dependent_var]
X = df_coded[independent_vars]
regr.fit(X, y)

print(regr.coef_)

[ 1.66666667  3.16666667 -1.11111111 -2.33333333  2.16666667]


These regression coefficients show the effect of each attribute level relative to the level that was dropped. To see this, we can print the final results in a more human-readable format.

In [29]:
for attribute in attributes:
    attribute_levels = [level for level in df_dummies.columns if level.startswith(attribute)]
    print(attribute)
    for level in attribute_levels:
        value = regr.coef_[independent_vars.index(level)] if (level in independent_vars) else 0.00
        level = level.split('_')[-1]
        print(f'  {level}={value:.2f}')

    

Brand
  A=0.00
  B=1.67
  C=3.17
Colour
  Blue=0.00
  Red=-1.11
Price
  £100=0.00
  £150=-2.33
  £50=2.17
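
The "relative to the dropped level" reading can be cross-checked without a regression. Because every combination of levels appears exactly once (a balanced full factorial design), each coefficient should equal the difference between the mean rating for that level and the mean rating for the dropped level. The sketch below is a minimal check under that assumption; the names `baseline` and `effects` are illustrative only.

```python
from itertools import product

import pandas as pd

brands, colours, prices = ['A', 'B', 'C'], ['Red', 'Blue'], ['£50', '£100', '£150']
# Ratings in the same row order as the data table (products 1 to 18).
ratings = [5, 5, 0, 8, 5, 2, 7, 5, 3, 9, 6, 5, 10, 7, 5, 9, 7, 6]

df = pd.DataFrame(list(product(brands, colours, prices)),
                  columns=['Brand', 'Colour', 'Price'])
df['Preference'] = ratings

# The level of each attribute that drop_first=True removed.
baseline = {'Brand': 'A', 'Colour': 'Blue', 'Price': '£100'}

effects = {}
for attribute, base in baseline.items():
    # Mean rating per level, expressed relative to the dropped level.
    means = df.groupby(attribute)['Preference'].mean()
    effects[attribute] = (means - means[base]).round(2).to_dict()
    print(attribute, effects[attribute])
```

For unbalanced data, where some combinations are missing or repeated, these mean differences would generally no longer match the regression coefficients.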


Compare these results with those in the original [article](https://sawtoothsoftware.com/resources/technical-papers/analysis-of-traditional-conjoint-using-excel-an-introductory-example), which also covers the method in more detail. Why are they slightly different?