# <span style="color:#3306B0">Electrification of the freight trucking industry</span>
[Based on the [2020 M3 Challenge problem](https://m3challenge.siam.org/resources/archives/2020-year-at-a-glance/2020-problem-keep-on-trucking-u-s-big-rigs-turnover-from-diesel-to-electric/) for high school students, and its [winning solutions](https://m3challenge.siam.org/resources/archives/2020-year-at-a-glance/2020-winning-solutions/)]

<P>&nbsp;</P>

<DIV ALIGN="CENTER">
    <IMG SRC="./electric-trucks.png" width="60%"></IMG><BR></BR>
Image courtesy of <A HREF="https://m3challenge.siam.org/resources/archives/2020-year-at-a-glance/2020-problem-keep-on-trucking-u-s-big-rigs-turnover-from-diesel-to-electric/">M3 Challenge Contest</A>
</DIV>

Containerized shipment of freight and cargo is central to 
the everyday functioning of life in many parts of the world.  In the 
United States, large freight trucks (also known as tractor trailers 
or semi-trucks) form the backbone of the cargo transportation 
network, and they carry nearly everything that we buy, build or 
consume.  To put things in context, here are some statistics about 
the U.S. trucking industry:

- There are an estimated 1.7 million semi-trucks in operation.
- Collectively, they travel about 150 billion miles annually, and they account for more than 12% of the fuel purchased.
- They are powered by diesel fuel, and their typical fuel efficiency is very poor -- the estimated average is about 6 to 7.3 miles per gallon.

There is growing interest in the U.S. trucking industry in exploring 
new alternatives such as electric trucks.  Since 2022, fully electric 
semi-trucks have gradually started entering the market.  Many 
large freight operators have committed to buying a few hundred 
such trucks.  A key challenge, beyond the large-scale manufacture 
of such vehicles, is developing the needed 
infrastructure, such as charging stations and other support or 
maintenance facilities.

In this lab, we consider the first part of the 
[2020 M3 Challenge problem](https://m3challenge.siam.org/resources/archives/2020-year-at-a-glance/2020-problem-keep-on-trucking-u-s-big-rigs-turnover-from-diesel-to-electric/), 
which is stated as follows: 

<div class="alert alert-block alert-info">
<span style="color:#033600">Assume that all necessary 
electric semi infrastructure is already in place so that 
companies could seamlessly transition to an all-electric fleet 
today. Create a mathematical model to predict what percentage of 
semis will be electric 5, 10, and 20 years from 2020. You may 
consider the current fleet of operational semi-trucks, annual 
new truck production rates and their estimated lifetime, the cost 
difference between diesel and electric semi-trucks (both 
purchase and operational costs), and/or any other factors you 
deem important.
</span>
<P></P>
<span style="color:#033600">
Various organizations and agencies collect data on a regular 
basis. A small amount of data has been compiled and provided. You 
are not required to use this data; that is, you may choose to 
use none, some, or all of this data and/or any additional data 
sources you may identify while working on this problem. Be sure 
to cite all resources used.
</span>   
</div>

- [Keep on Trucking Information Sheet](https://m3challenge.siam.org/wp-content/uploads/M3-Challenge-2020_PROBLEM-INFO-SHEET.pdf) – Contains terminology and definitions. It is highly recommended that teams review this document!
- [semi_production_and_use](https://m3challenge.siam.org/wp-content/uploads/semi_production_and_use-1.xlsx): This spreadsheet has two tabs. \
Production: this tab contains the number of trucks produced each year from 1999 to 2019. \
Usage_info: this tab contains information about how SH, RH, and LH trucks are used, on average.
- [corridor_data](https://m3challenge.siam.org/wp-content/uploads/corridor_data-1.xlsx): This spreadsheet has five tabs. \
Notes: this tab defines the data types you will see on each subsequent tab (e.g., AADTT). There are also notes about how the data was collected and how teams might consider using the data provided. \
Other tabs: these tabs provide information on traffic along each corridor. 
- [battery_data](https://m3challenge.siam.org/wp-content/uploads/battery_data-1.xlsx): This spreadsheet has three tabs.  \
Basic_charging_info: this tab contains some basic charging guidelines. \
Charging_scenarios: this tab contains information that has been provided by electric truck production companies about how their batteries might be charged. \
Charging_capability: this tab provides information about different types of electric vehicle chargers, including costs. 
- [Electric Trucks - Where They Make Sense](https://nacfe.org/future-technology/electric-trucks/) (North American Council for Freight Efficiency). The guidance report mentioned at this site is extensive; we do not expect all information in the guidance report to be useful or for teams to review the report in its entirety. Some salient points in the report have been included in the information sheet mentioned above.

## <span style="color:#336630">Model development</span> 
[Based on selected [winning papers](https://m3challenge.siam.org/resources/archives/2020-year-at-a-glance/2020-winning-solutions/)]

In the spirit of best practices, the M3 Challenge Contest recommends 
addressing each of the following components in the course of model 
development: clearly define the problem; state any assumptions 
made; identify the variables used;  construct the math model; 
analyze and assess the solutions.

### 1. Defining the problem

The goal of our problem may be stated as follows:

> We want to create a model to predict the percentage of semi-trucks that will be electric in the United States 5, 10, and 20 years from 
2020.

### 2. Strategy and assumptions

The winning teams approached the solution to this 
problem in completely different ways.  We will follow the 
strategy used by Team \#13343, which is based on the use of 
Markov chains and estimating the probability of replacement 
of a diesel truck with an electric truck in any given year.  Broadly, 
the strategy consists of the following steps:

1. Divide the population of semi-trucks into 3 groups: Short haul (SH), 
regional haul (RH), and long haul (LH), depending on a truck's 
typical range of operations.  This makes sense because truck models 
and their prices vary significantly based on the operating range 
for which they are designed.

2. For each group, the unknown quantity of interest will be 
taken to be the proportion of electric trucks in the fleet at any 
given time.  We want our model to estimate (or predict) this quantity 
for future years.  In Markov chain terminology, our unknown is 
represented by a *state vector* consisting of two 
components: the proportion of electric trucks, and its complement 
(i.e., the proportion of diesel trucks).

3. For each group of trucks, we will construct a $2\times 2$ transition 
matrix that represents the probability of conversion from 
one state (diesel) to the other (electric).

4. Future states are predicted by multiplying the transition 
matrix with the current value of the state vector.  For instance, 
if $T$ is the transition matrix and $s_k$  is the state vector for 
year $k$, then for year $k+1$ we compute: $s_{k+1} = T s_k$.


In Markov chain methods, the transition matrix plays a critical 
role in shaping the model's outcomes.  Constructing this 
matrix carefully and reliably is a central goal of the modeling 
effort.  In our problem, the 4 terms in this matrix represent the 
probability of the 4 possible conversions happening in the 
course of a given year: diesel to diesel ($p_{dd}$), 
diesel to electric ($p_{de}$), electric to diesel ($p_{ed}$), 
electric to electric ($p_{ee}$).  The 
sketch below illustrates the situation and will help in 
constructing the matrix:

<DIV ALIGN="CENTER">
<IMG SRC="./state_diagram.png"></IMG>
</DIV>

Notice that $p_{dd}+p_{de}$ must equal 1, and likewise for 
$p_{ee}+p_{ed}$.  Estimating the values of these 4 probabilities 
comprises the centerpiece of the modeling effort, and we will 
do it separately for each of the 3 truck groups.  The value of $p_{de}$ 
(the probability a diesel truck is replaced by electric) is 
particularly important, and Team 13343 hypothesized it is 
proportional to the **cost difference** between the two types of 
trucks.  In fact, here are the key assumptions they made, which are 
nicely summarized in their paper:

 1. The probability of a diesel truck being replaced by an electric 
 one is proportional to the total cost difference (buying + operating) 
 between a diesel and an electric truck.
 2. Both types of trucks have approximately the same life expectancy 
 (around 12 years), after which they become inoperable.
 3. Each semi-truck is replaced when it becomes inoperable.
 4. About 1% of electric trucks will be replaced by diesel trucks.
 5. The market demand for new semi-trucks requires manufacturing 
 at least 210,466 total trucks per year.
 6. There are/were no electric semi-trucks in operation in 2020.

While some of these assumptions may be hard to justify in a rigorous 
way (e.g., 2 and 4), they offer a reasonable starting point for 
a process of iterative model development.  Thus, we will use them 
to build a first model.
 
 
### 3. The model and its variables

Our model will consist of a $2\times 2$ transition matrix, 
together with an initial value of the state vector, for each of 
the 3 truck groups: SH, RH, and LH.  The form of the matrix is 
$$
  \mathbf{T} = \begin{bmatrix}
    p_{dd}  &  p_{ed} \\
    p_{de}  &  p_{ee}
  \end{bmatrix}
$$
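
It can help to see what this matrix does to a state vector. Writing the state in year $k$ as $s_k = [d_k, e_k]^T$ (the symbols $d_k$ and $e_k$, for the diesel and electric proportions, are introduced here only for illustration), the update $s_{k+1} = \mathbf{T} s_k$ reads, component by component:

$$
  \begin{aligned}
    d_{k+1} &= p_{dd}\, d_k + p_{ed}\, e_k \\
    e_{k+1} &= p_{de}\, d_k + p_{ee}\, e_k
  \end{aligned}
$$

Since $p_{dd}+p_{de}=1$ and $p_{ed}+p_{ee}=1$, adding the two lines gives $d_{k+1}+e_{k+1} = d_k + e_k$, so the two proportions continue to sum to 1 in every year.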

The entries in the matrix correspond to the probabilities shown 
in the state diagram earlier.  From the assumptions listed 
previously, it follows that 
$p_{ed} = 0.01$ and $p_{ee}=0.99$.  Thus, what remains to be found 
is $p_{de}$, the probability of a diesel truck being replaced by an 
electric truck.  Once we find this, we get $p_{dd}=1-p_{de}$.  A major 
part of the modeling 
effort for this problem is based on finding and synthesizing 
relevant data.  The key 
information we need includes:

- Typical cost of buying a new diesel semi (separate for short haul and long haul).
- Typical cost of buying a new electric semi.
- Lifetime operating cost for each type of truck, on each type of 
route.  
Typically: `(cost per mile)` $\times$ `(number of miles per year)` $\times$ `(years in lifetime)`
- Number of semis that operate on each of the 3 types of routes.

Much of the needed data is available in the form of average 
quantities from the references and spreadsheets provided by the 
contest (see the links given above).  For instance, the average 
number of miles driven per year given in the "Truck Usage Data" 
reference is: 42,640 (SH), 70,000 (RH), and 118,820 (LH).  Team 13343 
estimated the average cost per mile to be $\$1.51$ and $\$1.26$ 
for diesel and electric trucks, respectively.  They compiled 
the following table to summarize all the essential data:


<DIV ALIGN="CENTER">
<IMG SRC="./semitrucks_costs_table.png" width="80%"></IMG>
</DIV>

The difference in total cost between a diesel and electric truck 
on each type of route is then

>Short Haul:    $\$852,636.80 - \$794,716.80 = \$57,920$ \
Regional Haul:   $\$1,348,400 - \$1,208,400 = \$140,000$ \
Long Haul:    $\$2,278,018.40 - \$1,976,558.40 = \$301,460$
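
The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses the 12-year lifetime from the assumptions list together with the per-mile costs and annual mileages quoted above. The purchase prices are not stated in the text; the values used here are back-calculated from the quoted totals (total cost minus lifetime operating cost), so they should be read as an inference rather than as the team's published figures. All function and variable names are illustrative.

```python
# Lifetime cost = purchase price + (cost per mile) x (miles per year) x (years in lifetime)
LIFETIME_YEARS = 12                                   # assumed life expectancy
COST_PER_MILE = {"diesel": 1.51, "electric": 1.26}    # USD per mile (Team 13343 estimates)
MILES_PER_YEAR = {"SH": 42_640, "RH": 70_000, "LH": 118_820}

# ASSUMPTION: purchase prices (USD) back-calculated from the totals quoted in the text
PURCHASE_PRICE = {
    "diesel":   {"SH": 80_000,  "RH": 80_000,  "LH": 125_000},
    "electric": {"SH": 150_000, "RH": 150_000, "LH": 180_000},
}

def lifetime_cost(fuel, route):
    """Purchase price plus lifetime operating cost for one truck."""
    operating = COST_PER_MILE[fuel] * MILES_PER_YEAR[route] * LIFETIME_YEARS
    return PURCHASE_PRICE[fuel][route] + operating

# Positive values mean the diesel truck is the more expensive option overall
cost_difference = {
    route: lifetime_cost("diesel", route) - lifetime_cost("electric", route)
    for route in MILES_PER_YEAR
}

for route, diff in cost_difference.items():
    print(f"{route}: diesel costs ${diff:,.2f} more over {LIFETIME_YEARS} years")
```

Under these inputs the electric truck is more expensive to buy on every route type, but its lower per-mile cost more than makes up the difference over the assumed lifetime, and the gap widens with annual mileage.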

One of our key assumptions hypothesized that the probability of 
a diesel truck being replaced by an electric truck is proportional 
to this cost difference.  However, there is no simple way to 
translate cost differences into quantifiable probability 
values.  Team 13343 decided, somewhat arbitrarily, to assign 
$p_{de}$ the values 0.2, 0.4 and 0.6, corresponding to the respective 
cost differences for SH, RH, and LH vehicles.  This results in 
the following transition matrices for the 3 route types:

$$
  \mathbf{T}_{SH} = \begin{bmatrix}
    0.8  &  0.01 \\
    0.2  &  0.99
  \end{bmatrix}
$$

$$
  \mathbf{T}_{RH} = \begin{bmatrix}
    0.6  &  0.01 \\
    0.4  &  0.99
  \end{bmatrix}
$$

$$
  \mathbf{T}_{LH} = \begin{bmatrix}
    0.4  &  0.01 \\
    0.6  &  0.99
  \end{bmatrix}
$$
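
Because every matrix is fixed by just two numbers, $p_{de}$ and $p_{ed}$, the three matrices can be generated from a single helper rather than typed by hand. The following is a minimal sketch; `transition_matrix` is a hypothetical helper name, and the check on column sums reflects the requirement that the probabilities leaving each state add up to 1.

```python
import numpy as np

P_ED = 0.01  # probability an electric truck is replaced by a diesel one (from the assumptions)

def transition_matrix(p_de, p_ed=P_ED):
    """Column-stochastic 2x2 matrix acting on the state [diesel, electric]."""
    return np.array([
        [1.0 - p_de, p_ed],        # p_dd, p_ed
        [p_de,       1.0 - p_ed],  # p_de, p_ee
    ])

P_DE = {"SH": 0.2, "RH": 0.4, "LH": 0.6}   # values chosen by Team 13343
matrices = {route: transition_matrix(p) for route, p in P_DE.items()}

for route, T in matrices.items():
    # Each column holds the outcomes for one starting state, so it must sum to 1
    assert np.allclose(T.sum(axis=0), 1.0)
    print(route)
    print(T)
```

Building the matrices this way makes it easy to try other values of $p_{de}$ later without retyping any entries.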

To illustrate how the Markov chain prediction process works, 
let us predict the proportion of electric trucks on SH routes 5 years 
from the start year, 2020.  Since we have assumed there 
are no electric trucks in operation in 2020, the initial 
distribution of diesel to electric is 1:0, which corresponds to 
the state vector $\mathbf{v}_0 = [1, 0]^T$.  To predict 
the distribution in 2021, we compute 
$$
  \mathbf{v}_1 = 
  \mathbf{T}_{SH} \times \mathbf{v}_0 = 
    \begin{bmatrix}
    0.8  &  0.01 \\
    0.2  &  0.99
    \end{bmatrix} \times
    \begin{bmatrix}
      1  \\ 0
    \end{bmatrix}  =  
    \begin{bmatrix}
      0.8  \\ 0.2
    \end{bmatrix}
$$

The interpretation is that 80% of all 
**new** SH trucks introduced between 2020 and 2021 are diesel, and 
the rest are electric.  Continuing in this manner, we can get 
predictions for each successive year until 2025.  The resulting 
value of the state vector is 
$\mathbf{v}_5 \approx [0.34, 0.66]^T$.  Thus, in the 5th year, only 
34% of new SH trucks purchased are diesel, while 66% are 
electric.  However, let's return to the problem statement and check what we 
really want: 
really want: 
><span style="color:#630600">Create a model 
to predict the percentage of semi-trucks that will be electric in the 
United States 5, 10, and 20 years from 2020.</span>

Does the prediction from our model so far address the goal described 
here?  For example, to address the goal for the year 2025, we will need to 
determine how many of the total population of all trucks 
operating in 2025 are electric.  This requires more detailed 
data on the number of trucks that were in operation each 
year, and how many of them were replaced by new trucks.  Then 
the Markov model can be used to estimate how many new electric 
trucks are added to the population each year.  Though the 
calculations are straightforward, the process and data 
requirements are cumbersome.  We will, therefore, confine this 
lab to computing just the Markov distributions of diesel vs 
electric trucks for each group.


The Python code below implements the Markov chain separately for 
each truck group: SH, RH, LH.  Each simulation starts with the 
same initial value of the state vector, $[1, 0]^T$, and 
carries out 5 steps, producing results for the year 2025.  The 
code was largely 
produced by 
<A HREF="https://chat.openai.com" target="_blank">ChatGPT</A> 
based on the following prompt:

<PRE><span style="color:#630600">
Help me write a Python code to implement a simple Markov chain model for 3 different situations involving the following transition matrices

$$
  \mathbf{T}_{SH} = \begin{bmatrix}
    0.8  &  0.01 \\
    0.2  &  0.99
  \end{bmatrix}
$$

$$
  \mathbf{T}_{RH} = \begin{bmatrix}
    0.6  &  0.01 \\
    0.4  &  0.99
  \end{bmatrix}
$$

$$
  \mathbf{T}_{LH} = \begin{bmatrix}
    0.4  &  0.01 \\
    0.6  &  0.99
  \end{bmatrix}
$$

In each case, I want the same initial value of the state vector, $\mathbf{v}_0 = [1, 0]^T$, and I want to compute results after applying 5 steps of the Markov chain.
</span>
</PRE>

```python
# Load NumPy for array and matrix operations
import numpy as np

# Define the transition matrices (columns: from diesel, from electric)
T_SH = np.array([
    [0.8, 0.01],
    [0.2, 0.99]
])

T_RH = np.array([
    [0.6, 0.01],
    [0.4, 0.99]
])

T_LH = np.array([
    [0.4, 0.01],
    [0.6, 0.99]
])

# Define the initial state vector: [diesel, electric] = [1, 0]
v0 = np.array([1, 0])

# Define the number of steps (years)
num_steps = 5

# Function to apply the Markov chain
def apply_markov_chain(T, v0, num_steps):
    """Apply T to v0 num_steps times, printing the state after each step."""
    v = v0
    print("Initial state:", v)
    for step in range(1, num_steps + 1):
        v = np.dot(T, v)
        print(f"State after step {step}:", v)
    return v

# Apply the Markov chain for each transition matrix
print("Transition Matrix T_SH:")
final_state_SH = apply_markov_chain(T_SH, v0, num_steps)
print("\nTransition Matrix T_RH:")
final_state_RH = apply_markov_chain(T_RH, v0, num_steps)
print("\nTransition Matrix T_LH:")
final_state_LH = apply_markov_chain(T_LH, v0, num_steps)
```

```
Transition Matrix T_SH:
Initial state: [1 0]
State after step 1: [0.8 0.2]
State after step 2: [0.642 0.358]
State after step 3: [0.51718 0.48282]
State after step 4: [0.4185722 0.5814278]
State after step 5: [0.34067204 0.65932796]

Transition Matrix T_RH:
Initial state: [1 0]
State after step 1: [0.6 0.4]
State after step 2: [0.364 0.636]
State after step 3: [0.22476 0.77524]
State after step 4: [0.1426084 0.8573916]
State after step 5: [0.09413896 0.90586104]

Transition Matrix T_LH:
Initial state: [1 0]
State after step 1: [0.4 0.6]
State after step 2: [0.166 0.834]
State after step 3: [0.07474 0.92526]
State after step 4: [0.0391486 0.9608514]
State after step 5: [0.02526795 0.97473205]
```
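
As a cross-check on the step-by-step results, the same five-year state can be obtained in a single call, since $\mathbf{v}_5 = \mathbf{T}^5 \mathbf{v}_0$. The sketch below uses NumPy's `np.linalg.matrix_power` and repeats the SH matrix so that it runs on its own.

```python
import numpy as np

T_SH = np.array([[0.8, 0.01],
                 [0.2, 0.99]])
v0 = np.array([1.0, 0.0])   # all diesel in 2020

# v5 = T^5 v0: five yearly transitions collapsed into one matrix power
v5 = np.linalg.matrix_power(T_SH, 5) @ v0
print(v5)   # approximately [0.3407 0.6593]
```

The matrix-power form is convenient when only the final state is needed, for example for a 10-year or 20-year horizon.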


### <span style="color:#336630">4. Results and model assessment</span>

The results from our model predict the proportion of new 
diesel vs electric trucks produced/purchased each year for the 
3 truck groups.  (For our purpose produced = purchased, to keep 
things simple.) After 5 years, in year 2025, about 34% of new Short 
Haul trucks produced are predicted to be diesel, and 66% 
electric.  Data from 2019 show that about $10,944$ new Short Haul 
trucks were produced.  If the number in 2025 can be considered 
similar, it would imply that about $0.66\times 10,944 \approx 7,223$ of the new SH 
trucks produced will be electric.

For RH trucks, 2019 data show $98,498$ new trucks produced.  Taking 
the 2025 figure to be the same, our model predicts 
$0.094 \times 98,498 \approx 9,259$ diesel and $98,498-9,259=89,239$ 
electric trucks.  For LH trucks, again using the 2019 production 
figure of $101,024$ new trucks, the model predicts 
$0.0253\times 101,024 \approx 2,556$ diesel and $101,024-2,556=98,468$ electric trucks in 2025.


Putting together all these figures, for 2025 we get:
> Total number of new trucks produced = $10,944 + 98,498 + 101,024 = 210,466$ \
Number of these that are electric = $7,223 + 89,239 + 98,468 = 194,930$ \
Proportion of new trucks that are electric = $\frac{194,930}{210,466} \approx 0.926$

We can do similar calculations for the 10-year and 20-year time 
periods.
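
A minimal sketch of those calculations is given below. It reuses the three transition matrices and, as in the 2025 calculation, assumes that production in every future year equals the 2019 figures; that is a simplification made here for illustration, not a forecast. The helper name `electric_share_of_new_trucks` is hypothetical.

```python
import numpy as np

TRANSITION = {
    "SH": np.array([[0.8, 0.01], [0.2, 0.99]]),
    "RH": np.array([[0.6, 0.01], [0.4, 0.99]]),
    "LH": np.array([[0.4, 0.01], [0.6, 0.99]]),
}
# 2019 production figures; ASSUMED to hold unchanged in future years
PRODUCTION_2019 = {"SH": 10_944, "RH": 98_498, "LH": 101_024}
v0 = np.array([1.0, 0.0])   # [diesel, electric] in 2020

def electric_share_of_new_trucks(years):
    """Production-weighted electric proportion among new trucks after `years` steps."""
    electric = 0.0
    for route, T in TRANSITION.items():
        state = np.linalg.matrix_power(T, years) @ v0
        electric += state[1] * PRODUCTION_2019[route]
    return electric / sum(PRODUCTION_2019.values())

shares = {years: electric_share_of_new_trucks(years) for years in (5, 10, 20)}
for years, share in shares.items():
    print(f"{2020 + years}: {share:.1%} of new trucks electric")
```

For the 5-year horizon this reproduces the figure of about 0.926 obtained by hand; small differences in the last digit come from the rounding used in the manual calculation.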


However, this leads to the important question: Do we trust this model 
and its predictions?  Is there any evidence to suggest these results 
are reasonable?  In general, there exist certain standard procedures 
to assess the reliability of Markov chain models.  Many of these 
rely on back-testing and/or other strategies that compare the model's 
predictions with observed data.  However, the challenge 
with the present application is that we're trying to forecast the 
performance and characteristics of a brand new market, for which 
data are virtually non-existent.  Thus, our model is 
largely based on conjectures and assumptions that are not backed 
by quantifiable evidence.  Given these facts, it seems inadvisable 
to put too much faith in this model's predictions.  On the positive 
side, this model could serve as a good first-approximation that can 
be fine-tuned to produce a more reliable model later, when 
more data becomes available.
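
One inexpensive way to see how strongly the predictions depend on the assumed probabilities is to vary $p_{de}$ and watch the result. The sketch below does this using a closed-form expression that holds for a two-state chain started from an all-diesel state: the electric proportion after $k$ steps is $\frac{p_{de}}{p_{de}+p_{ed}}\,(1-\lambda^k)$, with $\lambda = 1 - p_{de} - p_{ed}$. The range of $p_{de}$ values tried is an arbitrary choice made here for illustration, and `electric_fraction` is a hypothetical helper name.

```python
P_ED = 0.01   # electric-to-diesel probability from the assumptions

def electric_fraction(p_de, years, p_ed=P_ED):
    """Electric proportion after `years` steps, starting from an all-diesel state."""
    lam = 1.0 - p_de - p_ed                 # second eigenvalue of the transition matrix
    steady_state = p_de / (p_de + p_ed)     # long-run electric proportion
    return steady_state * (1.0 - lam ** years)

for p_de in (0.05, 0.1, 0.2, 0.4, 0.6):    # illustrative range, not from the source
    row = ", ".join(f"{electric_fraction(p_de, y):.3f}" for y in (5, 10, 20))
    print(f"p_de = {p_de:<4}: electric after 5, 10, 20 years = {row}")
```

If the 5-year figures change substantially across plausible values of $p_{de}$, that is a sign the headline prediction is driven mainly by the assumed probabilities rather than by data.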
