# Introduction: McDonald’s vs Burger King
**Kaggle Link -** [Mcdonald vs Burger king - Kaggle](https://www.kaggle.com/datasets/harishthakur995/mcdonald-vs-burger-king)

![Image](https://storage.googleapis.com/kaggle-datasets-images/7688924/12205840/d700ceda9b4053c473d75f12ec83ac32/dataset-cover.png?t=2025-06-18-11-23-44)

A Financial & Operational Showdown Between Burger King and McDonald’s (2021–2023)

## About Dataset
McDonald’s and Burger King are two of the biggest names in the global fast-food industry, each with its own loyal fan base. McDonald’s, known for its iconic Big Mac, fries, and efficient service, emphasizes consistency and speed. Burger King, on the other hand, prides itself on flame-grilled burgers like the Whopper and a “have it your way” customization approach. While McDonald’s leads in global reach and revenue, Burger King often focuses on bold marketing and innovation to stand out. Both offer value menus and breakfast options, but differ in taste, menu variety, pricing, and brand experience.

This project compares the financial performance and business strategies of Burger King and McDonald’s between 2021 and 2023.

# Import data

## Data: Burger King - Global Sales Performance (2021 - 2023)
**Description:** This CSV file contains aggregated annual performance metrics for Burger King from 2021 to 2023, segmented by metric type and region scope (Global/U.S.).

```python
df_bk = spark.read.format("csv").option("header", "true").load("/Volumes/workspace/my_data/performance_analysis/bk.csv")
display(df_bk)
```

## Data: McDonald’s - Global Sales Performance (2021 - 2023)
**Description:** This CSV file contains aggregated annual performance metrics for McDonald’s from 2021 to 2023, segmented by metric type and region scope (Global/U.S.).

```python
df_mcd = spark.read.format("csv").option("header", "true").load("/Volumes/workspace/my_data/performance_analysis/mcd.csv")
display(df_mcd)
```

# Explore data

## Burger King

Distinct `item` values:

```python
display(df_bk.select("item").distinct())
```

Distinct `global_us_usc` values (region scope):

```python
display(df_bk.select("global_us_usc").distinct())
```

## McDonald’s

Distinct `table_name` values:

```python
display(df_mcd.select("table_name").distinct())
```

Distinct `heading` values:

```python
display(df_mcd.select("heading").distinct())
```

Distinct `item` values:

```python
display(df_mcd.select("item").distinct())
```

# Prediction

## McDonald’s

Parse the date, cast `value` to a number, and encode the categorical `table_name` and `heading` columns as integers:

```python
from pyspark.sql.functions import col, to_date, udf
from pyspark.sql.types import IntegerType

# Parse the date string and cast the metric value to a number
df_mcd = (
    df_mcd
    .withColumn("Date", to_date(col("Date"), "MM/dd/yyyy"))
    .withColumn("value", col("value").cast("double"))
)

# 1. Collect distinct categories and number them; sorting keeps the codes stable across runs
def build_index_map(df, column):
    values = sorted(
        row[column] for row in df.select(column).distinct().collect()
        if row[column] is not None
    )
    return {value: idx for idx, value in enumerate(values)}

table_name_map = build_index_map(df_mcd, "table_name")
heading_map = build_index_map(df_mcd, "heading")

# 2. Define UDFs that look up each value's code (-1 for null or unmapped values)
@udf(IntegerType())
def encode_table_name(category):
    return table_name_map.get(category, -1)

@udf(IntegerType())
def encode_heading(category):
    return heading_map.get(category, -1)

# 3. Add the encoded columns
df_mcd = (
    df_mcd
    .withColumn("table_name_encoded", encode_table_name(col("table_name")))
    .withColumn("heading_encoded", encode_heading(col("heading")))
)

display(df_mcd)
```
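The UDF-based encoding is a dictionary lookup applied row by row. The plain-Python sketch below reproduces that logic outside Spark so the mapping can be checked on a few values by hand. The helper names `build_index_map` and `encode` and the sample labels are hypothetical, chosen only for illustration; they are not taken from the dataset.

```python
def build_index_map(values):
    """Assign each distinct non-null value an integer code, in sorted order."""
    distinct = sorted({v for v in values if v is not None})
    return {value: idx for idx, value in enumerate(distinct)}


def encode(value, mapping):
    """Look up a value's code; unseen values and None map to -1."""
    return mapping.get(value, -1)


# Hypothetical category labels, used only to exercise the functions
sample = ["Income statement", "Balance sheet", "Income statement", None]
mapping = build_index_map(sample)
codes = [encode(v, mapping) for v in sample]
print(mapping)  # {'Balance sheet': 0, 'Income statement': 1}
print(codes)    # [1, 0, 1, -1]
```

Sorting the distinct values before numbering them keeps the codes stable from run to run, whereas numbering in whatever order `collect()` happens to return can change the codes between executions.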

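Train a decision-tree regressor that predicts the numeric `value` from the encoded categories, then measure RMSE on a held-out 20% split:

```python
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.sql.functions import col

# A regressor needs a numeric label: predict `value` and drop rows where it is missing
model_df = df_mcd.filter(col("value").isNotNull())

# Index the string column `item` so it can serve as a feature
item_indexer = StringIndexer(inputCol="item", outputCol="item_index", handleInvalid="keep")

# Assemble feature columns (the label `value` is deliberately excluded)
assembler = VectorAssembler(
    inputCols=["table_name_encoded", "heading_encoded", "item_index"],
    outputCol="features"
)

# maxBins must be at least the number of categories in the indexed `item` feature
n_items = model_df.select("item").distinct().count()
dt = DecisionTreeRegressor(
    featuresCol="features", labelCol="value", maxDepth=5, maxBins=max(32, n_items + 1)
)

pipeline = Pipeline(stages=[item_indexer, assembler, dt])

# Split the data into training and test sets
train_data, test_data = model_df.randomSplit([0.8, 0.2], seed=42)

# Fit on the training set and predict on the test set
model = pipeline.fit(train_data)
predictions = model.transform(test_data)

# Evaluate the model with root-mean-squared error
evaluator = RegressionEvaluator(labelCol="value", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print(f"Test RMSE: {rmse:.4f}")
```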
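`RegressionEvaluator` with `metricName="rmse"` reports the root-mean-squared error: the square root of the mean squared difference between label and prediction, expressed in the label's own units. A minimal plain-Python sketch of that formula, using made-up numbers, can help sanity-check the scale of the result. The function name `rmse_py` is illustrative only.

```python
import math


def rmse_py(labels, predictions):
    """Root-mean-squared error: square root of the mean squared difference."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative numbers only: errors are 1, 0 and -2, so RMSE = sqrt(5 / 3)
print(rmse_py([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # ≈ 1.291
```

Because RMSE is in the label's units, it is worth comparing it with the spread of the label column (for example, the RMSE of always predicting the training mean) before reading much into a single number.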