# McDonald's Market Segmentation

#### Import required libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA

#### Read the CSV into a DataFrame

In [2]:
df = pd.read_csv('../mcdonalds.csv')

#### Initial Look at the Data

In [3]:
df.head()

Unnamed: 0,yummy,convenient,spicy,fattening,greasy,fast,cheap,tasty,expensive,healthy,disgusting,Like,Age,VisitFrequency,Gender
0,No,Yes,No,Yes,No,Yes,Yes,No,Yes,No,No,-3,61,Every three months,Female
1,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,2,51,Every three months,Female
2,No,Yes,Yes,Yes,Yes,Yes,No,Yes,Yes,Yes,No,1,62,Every three months,Female
3,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,No,Yes,4,69,Once a week,Female
4,No,Yes,No,Yes,Yes,Yes,Yes,No,No,Yes,No,2,49,Once a month,Male


In [4]:
print(f'Columns of the dataset: {df.columns.tolist()}')

Columns of the dataset: ['yummy', 'convenient', 'spicy', 'fattening', 'greasy', 'fast', 'cheap', 'tasty', 'expensive', 'healthy', 'disgusting', 'Like', 'Age', 'VisitFrequency', 'Gender']


#### Initial Conclusions

From the first row of the dataset, we can see that the first respondent does not feel that McDonald's food is
- yummy
- spicy
- greasy
- tasty
- healthy
- disgusting

The same respondent gave McDonald's a rating of -3, indicating dislike.
The respondent is a 61-year-old female who visits McDonald's once every three months.

A quick look at the dataset shows that the perceptions are recorded as text rather than numbers.
Each attribute is coded using the words `Yes` and `No`.
This format is not suitable for segment extraction, which needs numeric input.

#### Further Checking

In [5]:
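# Recode the 11 perception columns: "Yes" -> 1, anything else -> 0
MD_x_binary = (df.iloc[:, 0:11] == "Yes").astype(int)
# Column means give the share of respondents who answered "Yes"
MD_x_binary.mean().round(2)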

yummy         0.55
convenient    0.91
spicy         0.09
fattening     0.87
greasy        0.53
fast          0.90
cheap         0.60
tasty         0.64
expensive     0.36
healthy       0.20
disgusting    0.24
dtype: float64

#### New Conclusions

The column means of the binary segmentation variables show that about half of the respondents (55%) perceive McDonald's as YUMMY.<br>
A strong 91% believe that eating at McDonald's is CONVENIENT, while a meagre 9% find the food SPICY.
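
Reading these averages as percentages rests on a simple fact: the mean of a 0/1 column equals the share of ones. A minimal sketch on a toy frame (the `toy` data and its values are invented for illustration and are not drawn from `mcdonalds.csv`):

```python
import pandas as pd

# Toy data, invented for illustration; not taken from mcdonalds.csv.
toy = pd.DataFrame({
    "yummy": ["Yes", "No", "Yes", "Yes"],
    "spicy": ["No", "No", "No", "Yes"],
})

# Comparing with "Yes" yields booleans; casting to int yields 1/0.
toy_binary = (toy == "Yes").astype(int)

# The mean of a 0/1 column is the fraction of rows equal to 1,
# i.e. the share of respondents who answered "Yes".
share_yes = toy_binary.mean()
print(share_yes)
```

Three of the four toy respondents answered "Yes" for `yummy`, so its mean is 0.75; the same reasoning gives 0.25 for `spicy`.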

#### Exploratory Perceptual Mapping via Principal Components Analysis

Another approach to initial data exploration is to perform a principal components analysis (PCA) and generate a perceptual map. The perceptual map provides initial insights into how respondents rate different attributes and highlights which attributes tend to be rated similarly.

> **Note:** In this instance, PCA is not used to reduce the number of variables. Reducing variables with PCA before clustering, an approach also known as factor-cluster analysis, is generally considered inferior to clustering the raw data (Dolnicar and Grün 2008). Here, we calculate the principal components only to serve as a basis for rotating and projecting the data onto the perceptual map.

Since our segmentation variables are binary, we work with unstandardised data for this analysis.
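
To make "unstandardised" and "projecting" concrete, here is a minimal sketch on synthetic binary data (the array `X` and the names `pca_demo`, `scores`, and `loadings` are assumptions for illustration, not part of the notebook). scikit-learn's `PCA` centres each column but does not rescale it, so fitting it to the raw 0/1 matrix corresponds to an unstandardised analysis:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 0/1 data standing in for the binary perception matrix (assumption).
rng = np.random.default_rng(0)
X = (rng.random((200, 5)) < 0.5).astype(int)

# Keep two components: the axes of a two-dimensional perceptual map.
pca_demo = PCA(n_components=2)
pca_demo.fit(X)
scores = pca_demo.transform(X)  # respondent coordinates, shape (200, 2)

# PCA centres the columns but does not divide by their standard deviation,
# so the projection equals the centred data times the component vectors.
manual = (X - X.mean(axis=0)) @ pca_demo.components_.T

# Attribute directions on the map: one row per attribute, one column per axis.
loadings = pca_demo.components_.T  # shape (5, 2)
print(np.allclose(scores, manual))
```

In a perceptual map, the rows of `scores` would typically be plotted as points and the rows of `loadings` as arrows labelled with the attribute names.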


In [6]:
pca = PCA()
pca.fit(MD_x_binary)

In [7]:
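# Component standard deviations are the square roots of the explained variances
std_devs = np.sqrt(pca.explained_variance_)
# Share of total variance per component, and its running total
prop_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(prop_variance)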

In [8]:
std_devs_rounded = np.round(std_devs, 2)
prop_variance_rounded = np.round(prop_variance, 2)
cumulative_variance_rounded = np.round(cumulative_variance, 2)

In [9]:
print("Standard Deviations:")
for i, sd in enumerate(std_devs_rounded, start=1):
    print(f"PC{i}: {sd}")

Standard Deviations:
PC1: 0.76
PC2: 0.61
PC3: 0.5
PC4: 0.4
PC5: 0.34
PC6: 0.31
PC7: 0.29
PC8: 0.28
PC9: 0.27
PC10: 0.25
PC11: 0.24


In [10]:
print("\nProportion of Variance:")
for i, pv in enumerate(prop_variance_rounded, start=1):
    print(f"PC{i}: {pv}")


Proportion of Variance:
PC1: 0.3
PC2: 0.19
PC3: 0.13
PC4: 0.08
PC5: 0.06
PC6: 0.05
PC7: 0.04
PC8: 0.04
PC9: 0.04
PC10: 0.03
PC11: 0.03


In [11]:
print("\nCumulative Proportion:")
for i, cp in enumerate(cumulative_variance_rounded, start=1):
    print(f"PC{i}: {cp}")


Cumulative Proportion:
PC1: 0.3
PC2: 0.49
PC3: 0.63
PC4: 0.71
PC5: 0.77
PC6: 0.82
PC7: 0.86
PC8: 0.9
PC9: 0.94
PC10: 0.97
PC11: 1.0
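
The three printouts above can also be gathered into one summary table. The sketch below is one possible way to do so, shown on synthetic binary data so that it runs on its own (`X`, `pca_demo`, and `summary` are illustrative names, not part of the notebook):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic 0/1 data with different "Yes" rates per column (assumption).
rng = np.random.default_rng(1)
yes_rates = np.array([0.9, 0.5, 0.2, 0.6])
X = (rng.random((300, 4)) < yes_rates).astype(int)

pca_demo = PCA().fit(X)

# One row per principal component, one column per summary statistic.
labels = [f"PC{i}" for i in range(1, pca_demo.n_components_ + 1)]
summary = pd.DataFrame(
    {
        "Standard deviation": np.sqrt(pca_demo.explained_variance_),
        "Proportion of variance": pca_demo.explained_variance_ratio_,
        "Cumulative proportion": np.cumsum(pca_demo.explained_variance_ratio_),
    },
    index=labels,
)
print(summary.round(2))
```

Each proportion is the squared standard deviation divided by the sum of all squared standard deviations, which is why the cumulative column ends at 1.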
