# Choice modelling


Today we will look at an example of a choice modelling task.

This practice is based on the following tutorials: https://www.linkedin.com/pulse/marketing-analytics-series-part-1-customer-choice-nguyen-u-s-cma/ and https://www.linkedin.com/pulse/conjoint-analysis-simple-python-implementation-prajwal-sreenivas/ 
(you need to use a VPN to access the LinkedIn site)


**The Customer Choice (Logit) model** is an individual-level model that helps to understand the factors behind each person's purchase decisions. Through quantitative analysis, companies and marketing executives can gain insight into how features such as price, design, and durability influence a customer's choice of brand. The Customer Choice model is therefore widely used in various marketing areas, including, but not limited to, customer segmentation, product design, pricing and customer churn management.

The inputs for Customer Choice Models are usually:

+ Customer's choice data for alternative offerings, each containing specific attributes.
+ Customer's ratings of different alternatives.

![](choice-ex.png)

**Choice modelling task in simple words:**
    
We observe that a customer makes a choice among several options (for example, a customer who comes into a store and buys a particular chocolate chooses that chocolate from many others). Each chocolate has its own characteristics, such as price, brand and flavor. The customer has some characteristics as well, and we may also have information about his/her other choices. 

So, we have the chocolate choices of a set of customers. Now we can use a model that helps us understand the relationship between the attributes of the products and the customers' choices among sets of products.

So, the target variable is the chosen product, and there is a finite number of products. If there are only two of them, we can treat the task as binary classification; if there are more than two, we can use multi-class classification algorithms. We also need an interpretable model, so that after training we can interpret the results and understand the relationships between the features and the target variable (choice).

So, a multinomial logit model seems to be a good choice.

## Data

Here is a list of choice datasets I found: https://github.com/alvarogutyerrez/TheDiscreteChoiceDataBank.
        
We will use data on **school transportation choice by students in Dresden**. 

**The link to the data**: https://github.com/svenne0815/DresdenModeChoiceData/blob/master/DDModeChoice.txt.


The article in which this data is used: https://www.sciencedirect.com/science/article/abs/pii/S0966692320309492?dgcid=coauthor.


In [None]:
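import numpy as np
import pandas as pd

# the data file from the link above, assumed to be saved locally under this name (tab-separated)
data = pd.read_csv('transport_choice_data.txt', sep='\t')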

In [None]:
data.head()

Here are the variable descriptions from the article:

![](variables.png)

In [None]:
# percentage of all observations falling into each (Season, Choice) cell
data.groupby(['Season', 'Choice'])['ID'].count() / len(data) * 100
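The expression above divides each (Season, Choice) count by the total number of rows, so its percentages add up to 100 across both seasons together. If we instead want the mode split within each season, `pd.crosstab` with `normalize='index'` normalises each row separately. A minimal sketch on a small made-up DataFrame (the values are hypothetical, not the Dresden data):

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Dresden file
toy = pd.DataFrame({
    'Season': [1, 1, 1, 0, 0, 0, 0, 0],
    'Choice': [1, 3, 3, 1, 2, 2, 4, 1],
})

# Share of each choice within each season (each row sums to 100)
within_season = pd.crosstab(toy['Season'], toy['Choice'], normalize='index') * 100
print(within_season.round(1))
```

On the real data, the same call would be `pd.crosstab(data['Season'], data['Choice'], normalize='index') * 100`.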

Let's name the categories:

+ 1 - Walk
+ 2 - Bike
+ 3 - Public transport
+ 4 - Car

(This is only my guess: as you can see, the proportions of each mode differ from those presented in the article, and the data itself contains no coding of the categories, so let's assume that my guess is right.)

In [None]:
# selecting columns we need
data = data[['Choice', 'Distance', 'Grade', 'Age', 'Gender', 'CarAvail', 'Season']]

Codings of the other categorical columns:
    
+ Season (1-Winter, 0-Summer)
+ CarAvail (1-available, 0-not available)


In [None]:
data['CarAvail'] = data['CarAvail'].apply(lambda x: 'available' if x == 1 else 'not available')
data['Season'] = data['Season'].apply(lambda x: 'winter' if x == 1 else 'summer')

## Modelling with Multinomial Logit


Practice on mnlogit from the previous year: https://drive.google.com/file/d/1NzGHvmWddAFOMMr3u-vrMLzH1qq050-9/view?usp=sharing

In [None]:
from statsmodels.formula.api import mnlogit

# Choice = 1 (Walk) is the lowest category, so statsmodels uses it as the reference category
multi_model = mnlogit('Choice ~ C(Gender) + Age + Grade + Distance + C(CarAvail) + C(Season)',
                      data=data).fit()
multi_model.summary()

## Interpreting the results

(reminder from the practice on mnlogit)

### Extra comments:

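The interpretation is the same as for binary logistic regression, with the only difference that we have
- one reference category of the dependent variable (in last year's example, immigration_policy=1; in our model, Choice=1, i.e. Walk)
- the other categories of the dependent variable (3 of them in our model), each compared with the reference category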

In binary logistic regression we have one reference category and one other category, and we model the log-odds:

$$\log\left(\frac{P(Y=1)}{P(Y=0)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$


Here we have several similar formulas, one per non-reference category $i$, differing only in the numerator (and, of course, in the coefficients):

$$\log\left(\frac{P(Y=i)}{P(Y=0)}\right) = \beta_{i0} + \beta_{i1} x_1 + \dots + \beta_{in} x_n$$

For example, in last year's immigration_policy example:

$$\log\left(\frac{P(\text{immigration\_policy}=2)}{P(\text{immigration\_policy}=1)}\right) = 1.2362 + 0.1623 \cdot Sex + 0.0010 \cdot Age$$

$$\log\left(\frac{P(\text{immigration\_policy}=3)}{P(\text{immigration\_policy}=1)}\right) = 2.0210 - 0.0902 \cdot Sex + 0.0070 \cdot Age$$
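Because the probabilities of all categories must sum to one, these log-odds equations can be turned back into probabilities. Writing $\eta_i$ for the right-hand side of the equation for category $i$ (the reference category has $\eta = 0$):

$$P(Y=i) = \frac{e^{\eta_i}}{1 + \sum_{j \neq \text{ref}} e^{\eta_j}}, \qquad P(Y=\text{ref}) = \frac{1}{1 + \sum_{j \neq \text{ref}} e^{\eta_j}}$$

A minimal NumPy sketch that plugs in the two equations above. The person's values (`sex = 1`, `age = 40`) are hypothetical, and we assume Sex was coded 0/1 in last year's data; the helper `category_probabilities` is an illustrative name, not a statsmodels function.

```python
import numpy as np

def category_probabilities(etas):
    """Turn baseline-category log-odds into probabilities.

    etas: linear predictors of the non-reference categories
          (the reference category implicitly has eta = 0).
    Returns probabilities with the reference category first.
    """
    scores = np.concatenate([[0.0], np.asarray(etas, dtype=float)])
    expo = np.exp(scores - scores.max())   # subtracting the max avoids overflow, ratios are unchanged
    return expo / expo.sum()

sex, age = 1, 40   # hypothetical person
eta_2 = 1.2362 + 0.1623 * sex + 0.0010 * age
eta_3 = 2.0210 - 0.0902 * sex + 0.0070 * age

p = category_probabilities([eta_2, eta_3])
print(dict(zip(['policy=1', 'policy=2', 'policy=3'], p.round(3))))
```

Taking `np.log(p[1] / p[0])` recovers `eta_2`, which is exactly the first log-odds equation.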


More reminders: 

*Interpreting coefficients*

+ The sign of β (on the log-odds scale) shows whether the log-odds of a category relative to the reference category increase or decrease as x increases, i.e. whether there is a positive or negative relationship between the feature and choosing that category.

+ The odds ratio A = exp(β) means that "when the feature increases by 1 unit, the odds of the category relative to the reference category are multiplied by A" (holding the other features constant).

+ Average marginal effects: for categorical variables with more than two possible values, the marginal effects show you the difference in the predicted probabilities for cases in one category relative to the reference category.

With binary independent variables, marginal effects measure discrete change, i.e. how the predicted probabilities change as the binary independent variable changes from 0 to 1.
Marginal effects for continuous variables are derivatives of the predicted probabilities, so they often provide a good approximation of the change in the predicted probability of each category produced by a 1-unit change in X.
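These reminders can be checked numerically. The sketch below uses a small, hypothetical multinomial logit with one continuous feature (the intercepts and slopes are made up, and `mnl_probs` is an illustrative helper). It confirms that a 1-unit increase in x multiplies the odds of each category versus the reference by exp(β), and that the analytic marginal effect of x on $P(Y=j)$, namely $p_j(\beta_j - \sum_k p_k \beta_k)$ with $\beta_{\text{ref}} = 0$, matches a finite-difference derivative. Averaging this derivative over all observations gives the average marginal effect.

```python
import numpy as np

# Hypothetical coefficients: reference category first, its coefficients fixed at 0
intercepts = np.array([0.0, -0.5, 0.4, -1.2])
slopes     = np.array([0.0,  0.3, -0.2, 0.5])

def mnl_probs(x):
    """MNL probabilities of the 4 categories for a single value of the feature x."""
    scores = intercepts + slopes * x
    expo = np.exp(scores - scores.max())
    return expo / expo.sum()

x0 = 2.0
p0, p1 = mnl_probs(x0), mnl_probs(x0 + 1.0)

# Odds of each category relative to the reference, after vs before a 1-unit increase
odds_ratio = (p1 / p1[0]) / (p0 / p0[0])
print(np.round(odds_ratio, 4), np.round(np.exp(slopes), 4))   # the two arrays agree

# Analytic marginal effect dP_j/dx versus a central finite difference
analytic = p0 * (slopes - np.sum(p0 * slopes))
h = 1e-6
numeric = (mnl_probs(x0 + h) - mnl_probs(x0 - h)) / (2 * h)
print(np.round(analytic, 6), np.round(numeric, 6))
```

The marginal effects across categories sum to zero, since the probabilities always sum to one.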

    

This is how we can calculate odds ratios from the log-odds coefficients:

In [None]:
odds = np.exp(multi_model.params)

In [None]:
odds.columns = ['Bike', 'Transport', 'Car']

In [None]:
odds

In [None]:
print(multi_model.get_margeff(at='overall').summary())

## Predicting

In [None]:
data['Distance'].quantile(0.75)

In [None]:
# Predictions for the minimum, quartiles and maximum of Distance
# (vary one variable, keep the others constant: mean for numeric, mode for categorical)
# our variables: C(Gender) + Age + Grade + Distance + C(CarAvail) + C(Season)

pd.options.display.float_format = '{:.4f}'.format

pred_table = pd.DataFrame(columns=["Gender", "Age", "Grade", "Distance", "CarAvail", "Season"])
pred_table['Distance'] = data['Distance'].min(), data['Distance'].quantile(0.25), data['Distance'].quantile(0.5), data['Distance'].quantile(0.75), data['Distance'].max()
pred_table['Gender'] = data['Gender'].mode()[0]
pred_table['Age'] = data['Age'].mean()
pred_table['Grade'] = data['Grade'].mean()
pred_table['CarAvail'] = data['CarAvail'].mode()[0]
pred_table['Season'] = data['Season'].mode()[0]

predictions = multi_model.predict(pred_table)
predictions.columns = ['P_Walk', 'P_Bike', 'P_Transport', 'P_Car']

pred_table = pd.concat([pred_table, predictions], axis=1)
pred_table

## Market share

In a customer choice analysis, we may want to analyze how the choice changes when some variable changes, such as a customer characteristic or a product attribute.


We can also look at the predicted market share of each transport choice.

NOTE: this is a somewhat artificial example, because we predict the choices on the same data that we used for training. We actually have the real labels here, so we already know the real share of each transport mode. In real research, we would calculate probabilities and shares for new data that was not used for training.

In [None]:
multi_model.predict(data)

In [None]:
# predicted class = position of the most probable mode
# (0 = Walk, 1 = Bike, 2 = Public transport, 3 = Car)
predicted_classes = multi_model.predict(data).apply(lambda x: np.argmax(x), axis=1)

In [None]:
predicted_classes

In [None]:
# number of people assigned to each mode; minlength=4 keeps a zero for any mode that is never predicted
shares = np.bincount(predicted_classes, minlength=4)

In [None]:
# market shares of each transport
shares / np.sum(shares)
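Counting arg-max predictions is one way to get shares, but it assigns every person entirely to one option, which can exaggerate the most likely modes and hide the others. A common alternative in choice modelling is the expected share, i.e. the average predicted probability of each option. The sketch below compares the two on a made-up probability matrix (rows = people, columns = Walk, Bike, Public transport, Car); with the fitted model, the matrix would be `multi_model.predict(data)`.

```python
import numpy as np

modes = ['Walk', 'Bike', 'Public transport', 'Car']

# Hypothetical predicted probabilities for 5 people (each row sums to 1)
probs = np.array([
    [0.50, 0.20, 0.20, 0.10],
    [0.40, 0.35, 0.15, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.30, 0.25, 0.25, 0.20],
    [0.05, 0.15, 0.70, 0.10],
])

# Hard shares: assign each person to the most likely mode, then count
hard = np.bincount(probs.argmax(axis=1), minlength=len(modes)) / len(probs)

# Expected shares: average the predicted probabilities over people
expected = probs.mean(axis=0)

for name, h, e in zip(modes, hard, expected):
    print(f'{name:17s} arg-max share: {h:.2f}   expected share: {e:.2f}')
```

Here Bike and Car get an arg-max share of 0 even though each person has a non-zero probability of choosing them; the expected shares keep that information.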

It may also be interesting to see how these market shares change when we change some characteristics of the people and/or the product.
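One way to do this is a simple scenario simulation: change a characteristic for everyone, recompute the predicted probabilities, and compare the expected shares before and after. With the fitted model, this would amount to predicting on a modified copy of `data` (for example with `Distance` increased). The sketch below shows the same idea with plain NumPy so that it runs on its own; the coefficients and distances are hypothetical, and `expected_shares` is an illustrative helper.

```python
import numpy as np

modes = ['Walk', 'Bike', 'Public transport', 'Car']

# Hypothetical MNL coefficients; Walk is the reference category (all zeros)
intercepts = np.array([0.0, -0.8, -1.5, -2.0])
distance_slopes = np.array([0.0, 0.4, 0.7, 0.6])

def expected_shares(distance_km):
    """Average predicted probability of each mode over all people."""
    scores = intercepts + np.outer(distance_km, distance_slopes)   # shape (n, 4)
    scores -= scores.max(axis=1, keepdims=True)                   # numerical stability
    expo = np.exp(scores)
    probs = expo / expo.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)

rng = np.random.default_rng(0)
distance = rng.uniform(0.5, 6.0, size=1000)   # hypothetical home-to-school distances

baseline = expected_shares(distance)
scenario = expected_shares(distance + 1.0)    # everyone lives 1 km further away

for name, b, s in zip(modes, baseline, scenario):
    print(f'{name:17s} {b:.3f} -> {s:.3f}')
```

Because all non-reference slopes are positive in this made-up example, the Walk share can only fall when distance increases.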

# Conjoint Analysis

**What is Conjoint Analysis?**

Conjoint Analysis (short for "consider jointly") is a market research technique that presents consumers with combinations, pairs or groups of products built from various features and asks them which they prefer. Each product is described by a number of attributes, and each attribute has several levels.

One of the greatest strengths of Conjoint Analysis is its ability to support market simulation models that predict how consumers respond to changes in the product. Conjoint Analysis can be applied to a variety of difficult aspects of market research, such as product development, competitive positioning, pricing, product line analysis, segmentation and resource allocation.

The difference from the choice model is that we have not only the customer's choice but also his/her rankings of the products. These come from an experimental design: the customer chooses from a set of options, and these options usually represent similar products with varying characteristics.
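A common way to analyse rating-based conjoint data is to regress the ratings on dummy-coded attribute levels: the coefficients are the part-worth utilities of the levels, and the range of part-worths within an attribute, divided by the sum of ranges over all attributes, is often reported as that attribute's importance. Below is a minimal sketch with NumPy least squares on a hypothetical chocolate design (the attributes, levels and ratings are all made up); it is not necessarily the exact procedure of the linked tutorial.

```python
import numpy as np

# Hypothetical full-factorial design: 2 price levels x 3 flavours = 6 profiles
profiles = [
    ('low', 'milk'), ('low', 'dark'), ('low', 'nuts'),
    ('high', 'milk'), ('high', 'dark'), ('high', 'nuts'),
]
ratings = np.array([7, 8, 6, 4, 6, 3], dtype=float)   # one respondent's made-up ratings

# Dummy coding with reference levels price='low' and flavour='milk'
X = np.array([
    [1.0, price == 'high', flavour == 'dark', flavour == 'nuts']
    for price, flavour in profiles
], dtype=float)

coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Part-worths per attribute (the reference level gets 0)
part_worths = {
    'price':   {'low': 0.0, 'high': coef[1]},
    'flavour': {'milk': 0.0, 'dark': coef[2], 'nuts': coef[3]},
}

# Attribute importance = range of its part-worths / sum of all ranges
ranges = {a: max(v.values()) - min(v.values()) for a, v in part_worths.items()}
importance = {a: r / sum(ranges.values()) for a, r in ranges.items()}
print(part_worths)
print(importance)
```

In a balanced full-factorial design like this one, each part-worth equals the difference between the mean rating at that level and at the reference level.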

The tutorial on conjoint analysis: https://ariepratama.github.io/How-to-do-conjoint-analysis-in-python/.

## Task for you


1. Show how the probabilities of choosing each school transportation option change when we vary other predictors. Choose 2 predictors, vary each of them while keeping the remaining variables constant, calculate predictions and plot the probabilities. Draw conclusions about the relationship between the predictors and the target variable.

NOTE: this is the same thing we did in the Predicting part of this practice. Choose one predictor and make a table in which you vary the values of this predictor (assigning constant values to the other variables). Then predict the probabilities for each row and plot how they change with the values of the chosen predictor. Then choose another predictor and do the same. In this way, you analyse the relationship between each chosen predictor and the target variable separately.

Here is an example of the plot we drew in the previous year: we vary the age variable and plot the probabilities of the 4 groups at each age.
![](plot1.png)