# Introduction to Plotly
Before we start working with the dataset, I want to briefly introduce Plotly and why it’s useful for data visualization.

## What is Plotly?
Plotly is a data visualization library that allows users to create interactive charts in Python, R, and JavaScript. 

Unlike libraries that produce static images, Plotly graphs are interactive. Users can hover over data points, zoom into specific regions, filter categories, and interact with the visualization in real time. This makes Plotly particularly useful for exploring data and presenting insights clearly.

## Basic Plotly Charts and Interactions

Plotly supports many common chart types such as bar charts, line charts, scatter plots, histograms, and pie charts.

These interactions come built in with every chart: hovering shows exact values, dragging zooms into a region, the pan tool moves across the data, and clicking a legend entry toggles that category on or off. Together they make it easier to spot patterns and relationships within the data.

## Plotly Graph Objects vs Plotly Express

Plotly provides two main interfaces for creating visualizations.

**Plotly Graph Objects** is the lower-level interface that gives users full control over every element of a chart. It is useful when building highly customized or complex visualizations but usually requires more code.

**Plotly Express** is the high-level interface designed for speed and simplicity. It allows users to create charts quickly with minimal code while still producing interactive visuals.

For this demonstration, we will focus on Plotly Express because it makes exploring and visualizing data much faster and easier.

## Moving to the Dataset

Now that we understand the basics of Plotly, we will work with a daily food delivery orders dataset and demonstrate how the same variables can be used to create multiple visualizations using Plotly Express.

## Installation

Activate your conda environment in a terminal (`conda activate <env-name>`), then install Plotly with `conda install plotly` (or `pip install plotly`).

In [65]:
## Loading the libraries required for the project.

import plotly.express as px
import plotly as py
import json as js
import pandas as pd
import numpy as np

In [66]:
## Loading the data file and assigning it to a variable called dfdo.

dfdo = pd.read_csv("daily_food_delivery_orders.csv")
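As an alternative sketch, `pd.read_csv` can parse dates while loading through its `parse_dates` parameter, which would make the separate `pd.to_datetime` step optional. The two rows below are copied from the dataset preview and read from an in-memory string only so the example runs on its own. One caveat, per the pandas documentation: if a value cannot be parsed, `parse_dates` returns the column unaltered as `object` instead of inserting `NaT` the way `errors="coerce"` does, so checking the dtype afterwards is worthwhile.

```python
import io

import pandas as pd

# Two rows from the dataset preview; in the notebook you would pass the CSV path instead
csv_text = """order_id,order_date,customer_age,restaurant_type,order_value
1,2024-11-05,62,Indian,497.51
2,2024-08-20,35,Bakery,232.32
"""

# parse_dates converts order_date to datetime64 while the file is read
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])

# Verify the conversion actually happened (it silently falls back to object on bad values)
print(df.dtypes)
```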

In [67]:
## Checking the shape (rows, columns) of the dataframe tells us how much data we have to work with
## for our analysis and visualization: here, 2,600 orders across 10 columns.

dfdo.shape

(2600, 10)

In [68]:
dfdo.head()

|   | order_id | order_date | customer_age | restaurant_type | order_value | delivery_distance_km | delivery_time_minutes | payment_method | delivery_partner_rating | order_status |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2024-11-05 | 62 | Indian | 497.51 | 11.07 | 79 | UPI | 3.9 | Cancelled |
| 1 | 2 | 2024-08-20 | 35 | Bakery | 232.32 | 5.83 | 69 | Wallet | 2.7 | Cancelled |
| 2 | 3 | 2024-02-28 | 34 | Italian | 540.82 | 3.61 | 70 | Wallet | 3.4 | Cancelled |
| 3 | 4 | 2024-05-26 | 65 | Cafe | 1197.99 | 3.66 | 18 | Card | 4.6 | Cancelled |
| 4 | 5 | 2024-09-21 | 40 | Indian | 947.03 | 12.08 | 57 | UPI | 4.9 | Delayed |


In [69]:
## info() lists each column's data type, its non-null count, and the dataframe's memory usage.
## This is the starting point for cleaning, because it reveals missing values and columns stored with the wrong type.
## For example, order_date is stored as object (text), so it must be converted to datetime before we can group by month.
## Data types also guide chart choice, such as a bar chart for categorical columns or a scatter plot for numerical ones.

dfdo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2600 entries, 0 to 2599
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   order_id                 2600 non-null   int64  
 1   order_date               2600 non-null   object 
 2   customer_age             2600 non-null   int64  
 3   restaurant_type          2600 non-null   object 
 4   order_value              2600 non-null   float64
 5   delivery_distance_km     2600 non-null   float64
 6   delivery_time_minutes    2600 non-null   int64  
 7   payment_method           2600 non-null   object 
 8   delivery_partner_rating  2600 non-null   float64
 9   order_status             2600 non-null   object 
dtypes: float64(3), int64(3), object(4)
memory usage: 203.3+ KB
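Building on the point that data types guide chart choice, a small sketch can split columns into numeric and non-numeric groups with `DataFrame.select_dtypes`. The `sample` frame is a hypothetical three-row excerpt shaped like `dfdo`. Using `exclude="number"` for the categorical side avoids depending on whether text columns are stored as `object` or as pandas' newer string dtype.

```python
import pandas as pd

# Hypothetical excerpt shaped like dfdo (values from the dataset preview)
sample = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_age": [62, 35, 34],
    "restaurant_type": ["Indian", "Bakery", "Italian"],
    "order_value": [497.51, 232.32, 540.82],
    "payment_method": ["UPI", "Wallet", "Wallet"],
})

# Numeric columns are candidates for histograms and scatter plots
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical: candidates for bar charts, colors, and facets
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Note that `order_id` lands in the numeric group even though it is an identifier, so the split is a starting point rather than a final decision.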


In [70]:
## describe() summarizes each numeric column: count, mean, standard deviation, min, quartiles, and max.
## Comparing the mean with the median (50%), and the quartiles with min and max, helps flag outliers or skewness.
## A strongly skewed column could be log-transformed to make it easier to visualize.
## The count row also reveals missing values; here every numeric column has all 2,600 entries.


dfdo.describe()

|       | order_id | customer_age | order_value | delivery_distance_km | delivery_time_minutes | delivery_partner_rating |
|---|---|---|---|---|---|---|
| count | 2600.0 | 2600.0 | 2600.0 | 2600.0 | 2600.0 | 2600.0 |
| mean | 1300.5 | 41.492308 | 670.293873 | 7.8868 | 51.748462 | 3.749577 |
| std | 750.699674 | 13.977196 | 300.767326 | 4.211332 | 21.98754 | 0.721153 |
| min | 1.0 | 18.0 | 150.9 | 0.5 | 15.0 | 2.5 |
| 25% | 650.75 | 29.0 | 406.4025 | 4.2075 | 32.0 | 3.1 |
| 50% | 1300.5 | 41.0 | 667.58 | 7.965 | 51.0 | 3.8 |
| 75% | 1950.25 | 54.0 | 927.48 | 11.59 | 70.0 | 4.4 |
| max | 2600.0 | 65.0 | 1199.78 | 14.99 | 90.0 | 5.0 |
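In the summary above, the mean order value (670.29) sits close to its median (667.58), which suggests the column is roughly symmetric and probably does not need a log transform. The sketch below shows how that check and the transform could be done; it uses the five order values from the preview as a stand-in for `dfdo["order_value"]`, so its skewness number is illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for dfdo["order_value"]: the five values from the dataset preview
order_value = pd.Series([497.51, 232.32, 540.82, 1197.99, 947.03])

# Two quick symmetry checks: sample skewness and the gap between mean and median
skewness = order_value.skew()
mean_minus_median = order_value.mean() - order_value.median()
print(f"skewness={skewness:.3f}, mean - median={mean_minus_median:.2f}")

# For a strongly right-skewed column, log1p(x) = log(1 + x) compresses the long tail
# and stays defined when x is 0
log_order_value = np.log1p(order_value)
```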


In [71]:
## Checking for any missing values in the dataframe to identify any potential issues with the data that may need to be addressed before creating visualizations.

dfdo.isnull().sum()

order_id                   0
order_date                 0
customer_age               0
restaurant_type            0
order_value                0
delivery_distance_km       0
delivery_time_minutes      0
payment_method             0
delivery_partner_rating    0
order_status               0
dtype: int64

In [72]:
## 1) Ensure order_date is datetime type (coerce errors to NaT instead of raising an error)
dfdo["order_date"] = pd.to_datetime(dfdo["order_date"], errors="coerce")

## 2) Create month features (name + number) for easier grouping and sorting later on
dfdo["month_num"] = dfdo["order_date"].dt.month
dfdo["month"] = dfdo["order_date"].dt.month_name()

## 3) Quick check: orders by month (to confirm month feature looks correct and to get a sense of the data distribution across months)
orders_by_month = dfdo["month"].value_counts()

## 4) Revenue by month and restaurant type (sorted Jan → Dec) to see how revenue trends evolve over time and across different restaurant types.
rev_month = (
    dfdo.groupby(["month_num", "month", "restaurant_type"], as_index=False)["order_value"]
        .sum()
        .sort_values("month_num")
)
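Because `errors="coerce"` turns unparseable dates into `NaT` without any warning, it is worth counting them after conversion. A row with `NaT` gets a missing `month_num`, and `groupby` drops missing keys by default (`dropna=True`), so that order would silently disappear from the monthly totals. A minimal sketch on a hypothetical three-row sample with one malformed date:

```python
import pandas as pd

# Hypothetical sample: the third date is malformed on purpose
sample = pd.DataFrame({
    "order_date": ["2024-11-05", "2024-08-20", "not a date"],
    "order_value": [497.51, 232.32, 540.82],
})

sample["order_date"] = pd.to_datetime(sample["order_date"], errors="coerce")
sample["month_num"] = sample["order_date"].dt.month  # NaN where the date is NaT

# How many dates failed to parse
n_bad_dates = sample["order_date"].isna().sum()
print("unparseable dates:", n_bad_dates)

# groupby drops the NaN key, so the malformed row is excluded from the totals
monthly_rev = sample.groupby("month_num")["order_value"].sum()
print(monthly_rev)
```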

In [73]:
## Defining the calendar order of months; without it, Plotly Express orders categories by their first appearance in the data.

month_order = [
    "January","February","March","April","May","June",
    "July","August","September","October","November","December"
]
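As an alternative sketch, the same list can be generated from Python's standard `calendar` module, and an ordered pandas `Categorical` makes pandas itself sort month names by calendar position instead of alphabetically. `calendar.month_name` follows the current locale, which is English under Python's default C locale, matching the English names produced by `dt.month_name()`. Plotly still needs `category_orders` for its axes; this only affects pandas sorting.

```python
import calendar

import pandas as pd

# calendar.month_name[0] is an empty string, so skip it
month_order = list(calendar.month_name)[1:]

# An ordered categorical sorts by the category list, not alphabetically
months = pd.Series(["March", "January", "December", "January"])
months_cat = pd.Categorical(months, categories=month_order, ordered=True)
sorted_months = pd.Series(months_cat).sort_values().tolist()
print(sorted_months)
```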

In [75]:
## info() already shows these columns as int64/float64, but CSV exports sometimes store numbers as text
## (for example, because of thousands separators), so converting them explicitly is a cheap safeguard
## before any calculations or charts. pd.to_numeric raises an error if a value cannot be converted.

numeric_cols = ["order_value", "delivery_time_minutes",
                "delivery_distance_km", "delivery_partner_rating"]
dfdo[numeric_cols] = dfdo[numeric_cols].apply(pd.to_numeric)
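To illustrate the formatting issues mentioned above, here is a sketch with hypothetical raw strings. With the default `errors="raise"`, `pd.to_numeric` would fail on `"1,197.99"`; `errors="coerce"` turns it into `NaN` instead, which silently loses a valid value. Stripping the thousands separator first recovers it, while genuinely missing entries such as `"n/a"` still become `NaN`.

```python
import pandas as pd

# Hypothetical raw values as they might appear in a messy CSV export
raw = pd.Series(["497.51", "1,197.99", "n/a"])

# coerce: anything unparseable becomes NaN, including the comma-formatted number
coerced = pd.to_numeric(raw, errors="coerce")

# Remove thousands separators first, then coerce only the truly invalid entries
cleaned = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

print(coerced.tolist())
print(cleaned.tolist())
```

When the separator is known up front, `pd.read_csv` also accepts `thousands=","`, which handles this during loading.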

## Category Review

Before creating our visualizations, we reviewed the categorical variables in the dataset such as **restaurant type**, **payment method**, and **order status**.

In many real-world datasets, these variables contain a large number of unique values, which often requires grouping categories together to keep visualizations readable. However, in this dataset the number of categories is already manageable and clearly defined.

Because of this, grouping is not necessary. Keeping the original categories preserves more detail and allows the visualizations to represent the data more accurately.
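A review like this can be done with `value_counts` on each categorical column. The sketch below wraps that in a small helper, `summarize_categories`, a hypothetical name introduced here, and runs it on the five rows from the dataset preview; in the notebook you would pass `dfdo` instead.

```python
import pandas as pd


def summarize_categories(df, columns):
    """Return value counts per column; the length of each result is its number of categories."""
    return {col: df[col].value_counts() for col in columns}


# The five rows from the dataset preview
sample = pd.DataFrame({
    "restaurant_type": ["Indian", "Bakery", "Italian", "Cafe", "Indian"],
    "payment_method": ["UPI", "Wallet", "Wallet", "Card", "UPI"],
    "order_status": ["Cancelled", "Cancelled", "Cancelled", "Cancelled", "Delayed"],
})

summary = summarize_categories(sample, ["restaurant_type", "payment_method", "order_status"])
for col, counts in summary.items():
    print(f"{col}: {len(counts)} categories")
    print(counts.to_string(), end="\n\n")
```

If any column had shown dozens of rare values, that would be the signal to group them before plotting.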


Now that the dataset has been cleaned and preprocessed, we will define consistent color palettes for our variables. Using a unified color scheme helps keep our visualizations clear, professional, and easy to interpret across all charts.

In [76]:
# Consistent theme for all visualizations to make them look cohesive and professional.

PLOT_TEMPLATE = "plotly_white"

# Restaurant type colors
restaurant_colors = {
    "Fast Food": "#E63946",
    "Cafe": "#F4A261",
    "Bakery": "#E9C46A",
    "Chinese": "#2A9D8F",
    "Italian": "#457B9D",
    "Indian": "#8D5A97"
}

# Payment method colors
payment_colors = {
    "Cash": "#264653",
    "Card": "#2A9D8F",
    "UPI": "#E9C46A",
    "Wallet": "#F4A261"
}

# Order status colors
status_colors = {
    "Delivered": "#2A9D8F",
    "Delayed": "#E9C46A",
    "Cancelled": "#E63946"
}

In [77]:
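## Monthly order count ("size" counts rows) and total revenue, sorted January → December.
## The chart below enables scrollZoom so the mouse wheel zooms in and out.
monthly = (
    dfdo.groupby(["month_num", "month"], as_index=False)
        .agg(
            orders=("order_value", "size"),
            revenue=("order_value", "sum")
        )
        .sort_values("month_num")
)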

fig = px.line(
    monthly,
    x="month",
    y="revenue",
    markers=True,
    title="Monthly Revenue Trend",
    labels={"month": "Month", "revenue": "Revenue"},
    category_orders={"month": month_order}
)

fig.update_layout(template=PLOT_TEMPLATE)
fig.show(config={"scrollZoom": True})

In [78]:
## Revenue per restaurant type per month; each month becomes one animation frame.
rev_month = (
    dfdo.groupby(["month_num", "month", "restaurant_type"], as_index=False)["order_value"]
        .sum()
        .sort_values("month_num")
)

fig = px.bar(
    rev_month,
    x="restaurant_type",
    y="order_value",
    color="restaurant_type",
    color_discrete_map=restaurant_colors,
    animation_frame="month",
    title="Monthly Revenue by Restaurant Type",
    labels={"order_value": "Revenue", "restaurant_type": "Restaurant Type"},
    category_orders={"month": month_order}
)

fig.update_layout(template=PLOT_TEMPLATE)
fig.show(config={"scrollZoom": True})

In [79]:
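## pd.cut bins are right-closed by default, so (17, 25] holds ages 18-25 and (60, 100] holds 61+.
## Ages in this dataset run from 18 to 65 (see describe()), so no under-18 group is needed.
age_labels = ["18-25", "26-35", "36-45", "46-60", "61+"]
dfdo["age_group"] = pd.cut(dfdo["customer_age"], bins=[17, 25, 35, 45, 60, 100],
                           labels=age_labels)

fig = px.histogram(
    dfdo, x="order_value", color="order_status",
    color_discrete_map=status_colors,
    facet_col="age_group", facet_col_wrap=3,
    category_orders={"age_group": age_labels},  # keep facets in age order
    nbins=30,
    title="Order Value Distribution by Age Group and Order Status",
    labels={"order_value": "Order Value"}
)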
fig.update_layout(template=PLOT_TEMPLATE)
fig.show()
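Because `pd.cut` intervals include their upper edge by default, an age that sits exactly on a bin edge falls into the lower bin. A quick boundary check like the sketch below catches off-by-one labels before they reach a chart. The bins assume integer ages in the 18 to 65 range reported by `describe()`.

```python
import pandas as pd

# Integer ages, so (17, 25] covers 18-25, (25, 35] covers 26-35, and so on
bins = [17, 25, 35, 45, 60, 100]
labels = ["18-25", "26-35", "36-45", "46-60", "61+"]

# Ages chosen to sit exactly on or next to each bin edge
edge_ages = pd.Series([18, 25, 26, 35, 36, 45, 46, 60, 61, 65])
groups = pd.cut(edge_ages, bins=bins, labels=labels)  # right=True: each bin includes its upper edge

print(pd.DataFrame({"age": edge_ages, "group": groups}).to_string(index=False))
```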