Prices, pricing systems, and payment technologies vary among transit systems as much as stop spacing and the other design features we've looked at. On the New York City subway, the same low fare takes you from Coney Island to Harlem or just a few blocks. BART charges a distinct (and high) price for almost every origin-destination (OD) pair. The London Underground does something in between: it uses a series of zones, within each of which the fare is uniform.

Even though pricing varies so much, most people treat it as an afterthought. It's almost as though the system is designed with no cost in mind, and then, when the fun part is over, we turn to pricing as a way to pay for it. But pricing is an essential aspect of a transit system, and it merits consideration from the beginning. It affects how many people are willing to ride, and where and when they ride, as much as speed and headways do.
$$
\newcommand{\pder}[2]{\frac{\partial#1}{\partial#2}}
$$

# The problem: coming up with money to cover average costs

One thing transit opponents like to point out is that transit is subsidized. The idea is that if customers aren't willing to cover the cost of the service then it's not efficient for the service to be provided---at least not in the amount the government provides. But this isn't necessarily true. In reality, since transit has large economies of scale, its pricing presents two problems: solvency and efficiency.

## Solvency

It might be the case that there is no single price you can charge that will let a transit service break even. But this doesn't necessarily mean that the service shouldn't be provided. Consider two examples:

<img src="./img/prices-worthwhile.svg" width="40%"/>

Here, the demand curve $p(q)$ never intersects the average cost curve $ac(q)$. That means there is no price we can charge that will be as high as the average cost per rider at that price. In the figure, I've chosen the price that makes the most money possible, and it's still not high enough to cover costs. But it's still socially beneficial to operate the service with a small subsidy. To see why, note that at price $P$, the sum of consumer surplus and total revenue is given by the trapezoid $ABCO$. This area is bigger than total cost at price $P$, which is equal to the area of the rectangle $DECO$. Therefore, the service is socially efficient to provide (its benefits outweigh its costs for certain prices), but to provide it might require some kind of subsidy.

Just so you don't get the wrong idea, here is an example of a service whose demand is so low, relative to costs, that it is never worthwhile to provide. Even when price equals marginal cost, the consumer surplus the service generates is smaller than its fixed cost. An example would be a bad movie: once it's created, the cost of someone streaming a bad movie online is roughly zero (unless you count the psychic damage to the viewer). But that doesn't mean the government should subsidize bad movies and offer them free online.

<img src="./img/prices-not-worthwhile.svg" width="45%"/>
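To make the two figures concrete, here is a minimal sketch that classifies a service under a hypothetical linear demand curve $p(q) = a - bq$ and cost $TC(q) = F + cq$ (so $ac(q) = F/q + c$). The function name `classify_service` and all numbers are illustrative assumptions, not from the text. It checks whether the revenue-maximizing price can break even and, if not, whether the largest attainable net surplus (reached at $p = c$) is still positive.

```python
def classify_service(a: float, b: float, c: float, F: float) -> str:
    """Classify a service with hypothetical demand p(q) = a - b*q and cost F + c*q.

    Assumes a > c >= 0 and b > 0.
    """
    # Best possible profit: the monopoly quantity (a - c) / (2b) earns
    # (a - c)^2 / (4b) over operating cost, before paying the fixed cost F.
    max_profit = (a - c) ** 2 / (4 * b) - F
    # Best possible net social surplus (CS + TR - TC) comes from pricing at
    # marginal cost, where the surplus triangle has area (a - c)^2 / (2b).
    max_net_surplus = (a - c) ** 2 / (2 * b) - F
    if max_profit >= 0:
        return "self-financing"
    if max_net_surplus > 0:
        return "worthwhile but needs a subsidy"
    return "not worthwhile"

for F in (10, 20, 40):
    print(F, classify_service(a=10, b=1, c=2, F=F))
# 10 self-financing
# 20 worthwhile but needs a subsidy
# 40 not worthwhile
```

The middle case is the first figure (no price breaks even, yet the service is worth running); the last case is the bad movie.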


## Efficiency

The solvency argument is pretty cut-and-dried: you know the service should exist, but it's impossible for the service to pay for itself (at least without some fancy pricing techniques). Solvency is a "to be or not to be" type problem, a dichotomy. But, less obviously, even when a break-even price exists, charging it can sacrifice economic welfare and ridership unnecessarily. Whenever price is above marginal cost, some riders who value a trip at more than it costs to provide are priced out; these are forgone opportunities to serve riders.

<img src="./img/prices-efficiency.svg" width="50%"/>
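Under the same hypothetical linear setup ($p(q) = a - bq$, $TC(q) = F + cq$, with invented numbers), the sketch below finds the break-even price $p = ac(q)$ and measures the riders and the surplus lost relative to marginal-cost pricing. The function name `average_cost_pricing` is an assumption of this sketch.

```python
import math

def average_cost_pricing(a, b, c, F):
    """Break-even price under hypothetical demand p = a - b*q and cost F + c*q.

    Returns (break-even price, break-even ridership, riders lost, deadweight loss).
    """
    # Break-even requires a - b*q = F/q + c, i.e. b*q^2 - (a - c)*q + F = 0.
    disc = (a - c) ** 2 - 4 * b * F
    if disc < 0:
        raise ValueError("no price breaks even")
    q_ac = ((a - c) + math.sqrt(disc)) / (2 * b)  # larger root: more riders
    p_ac = a - b * q_ac
    q_mc = (a - c) / b                            # ridership at p = c
    lost_riders = q_mc - q_ac
    # Triangle between the demand curve and marginal cost over the lost riders.
    deadweight_loss = 0.5 * (p_ac - c) * lost_riders
    return p_ac, q_ac, lost_riders, deadweight_loss

print(average_cost_pricing(a=10, b=1, c=2, F=12))  # (4.0, 6.0, 2.0, 2.0)
```

Breaking even here requires a price of $4$, twice the marginal cost, which prices out $2$ of the $8$ riders who would travel at $p = c$.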

# Strategies to come up with extra money

One way to come up with extra money is taxes. We raise taxes on society at large and use them to invest in things with low marginal costs. Examples include uncongested roads and parks. Especially in a society like ours where a small fraction of people make enormous amounts of money or hold incredible sums of wealth, this isn't too bad of a way to go. And in the US, as it happens, transit is extremely heavily subsidized. But taxes have downsides: they encourage people to change their behavior to avoid taxes and add to costs in other industries. More practically, transportation engineers cannot magically raise taxes whenever they want. So, in the end, you have to work with a budget. What we are going to learn are different techniques to raise more revenues from your customers by pricing intelligently.

One way to come up with money in an efficient manner is to charge different prices to different people. This is called *price discrimination*. 

There are three types of price discrimination as applied to transit.

1. First-degree. Charging each rider a different price, ideally his or her full willingness to pay. Uber does this, but it would be hard for a transit agency to do it.
2. Second-degree. Bulk discounts and travel passes.
3. Third-degree. Charging different prices on different routes and at different times of day: dividing customers into groups, each with its own demand curve.


## Ramsey pricing

Ramsey pricing is a type of third-degree price discrimination in which we charge different prices to different customer groups. For example, we might charge passengers going in one direction more than passengers going in the other direction. In this section we'll look at a model of a transit provider that provides service on two routes: $1$ and $2$. The goal is to choose $p_1$ and $p_2$ so as to maximize total social surplus.

$$
\newcommand{\L}{\mathcal{L}}
$$

Our measure of total social surplus is

$$
TSS = CS + TR - TC.
$$

Now suppose that the agency has "net fixed costs" of $\phi$. This could be a negative number if we get a subsidy that exceeds our capital costs, in which case the operating part of our service can lose a little bit of money. Or it could be a positive number if the subsidy isn't enough to cover our capital costs, in which case the operating part of our service needs to make a profit so as to cover the fixed costs. In any case, the budget constraint can be written $TR - TC = \phi$. Combining the objective $TSS$ and the constraint $TR-TC=\phi$ gives the expression we want to optimize:

$$
\L = CS + TR - TC + \lambda \cdot (TR - TC-\phi).
$$

Now we differentiate with respect to $p_1$ to get
$$
\pder{\L}{p_1} = -q_1 + (1+\lambda)\left[ \left(q_1+ \pder{q_1}{p_1}p_1  \right) - mc_1\frac{\partial q_1}{\partial p_1} \right] = 0.
$$

Collecting the $q_1$ terms, since $-q_1 + (1+\lambda)q_1 = \lambda q_1$, gives

$$
\pder{\L}{p_1} = q_1\lambda + (1+\lambda)\cdot \frac{\partial q_1}{\partial p_1} \cdot \bigg( p_1   - mc_1 \bigg) = 0
$$

Rearranging yields...

$$
\pder{q_1}{p_1}\bigg(p_1-mc_1\bigg) = -\frac{\lambda}{1+\lambda} q_1.
$$

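This isn't too informative, though. So divide both sides by $q_1$, and multiply and divide the left-hand side by $p_1$:

$$
\left(\frac{p_1}{q_1}\pder{q_1}{p_1}\right)\frac{p_1-mc_1}{p_1} = -\frac{\lambda}{1+\lambda}.
$$

The term in parentheses is the negative of the own-price elasticity of demand, the percentage change in quantity demanded per percentage change in price, with the sign flipped so that it is positive:

$$
\varepsilon_i = -\pder{q_i}{p_i}\frac{p_i}{q_i}
$$

(don't forget the negative). Substituting it gives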

$$
\frac{p_1-mc_1}{p_1} = \frac{\lambda}{1+\lambda} \cdot \frac{1}{\varepsilon_1}.
$$

This is a little more informative. The left-hand side is an index of how much we're marking up price over marginal cost; it's called the [Lerner index](https://en.wikipedia.org/wiki/Lerner_index). As for the right-hand side, the term $\lambda/(1+\lambda)$ is the same constant for every route $i$. So if we compare the markups on two different routes $i$ and $j$, we have

$$
\frac{(p_i - mc_i)/p_i}{(p_j-mc_j)/p_j} = \frac{\varepsilon_j}{\varepsilon_i}.
$$

So the take-home lesson is: **The *less* elastic demand is, the higher the markup.** Or, conversely, the more elastic demand is, the lower the markup. The basic idea is that you want to gouge the riders who have the lowest propensity to stop riding.

There is another informative way to write the same equation. Let $\Delta q_1$ be the change in ridership relative to marginal-cost pricing (the reduction in quantity caused by having to meet the budget constraint, so $\Delta q_1 < 0$). If we assume the slope of the demand curve doesn't change much between $mc_1$ and $p_1$ (true for small markups), then $\Delta q_1 \approx \pder{q_1}{p_1}\left(p_1 - mc_1\right)$. Substituting this into the rearranged first-order condition, we can also write our rule as

$$
\Delta q_1 \approx -\frac{\lambda}{1+\lambda} q_1
$$

or, dividing by $q_1$,

$$
\frac{\Delta q_1}{q_1} \approx -\frac{\lambda}{1+\lambda}.
$$

What this says is: **ridership on every route is reduced by the same proportion** (relative to the level it would have with marginal cost pricing.)
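As a numerical check of both rules, here is a sketch assuming linear route demands $q_i = a_i - s_i p_i$ and constant $mc_i$; the `Route` class, the `ramsey_prices` function, and all parameter values are hypothetical. With linear demand, $\pder{q_i}{p_i} = -s_i$ is constant, so the condition $\pder{q_i}{p_i}(p_i - mc_i) = -\frac{\lambda}{1+\lambda} q_i$ gives $q_i = q_i(mc_i)/(1+k)$ exactly, with $k = \lambda/(1+\lambda)$. The code then searches by bisection for the $k$ that meets the budget constraint $TR - TC = \phi$.

```python
from dataclasses import dataclass

@dataclass
class Route:
    intercept: float  # a_i in the hypothetical demand q_i = a_i - s_i * p_i
    slope: float      # s_i > 0
    mc: float         # constant marginal cost per ride

def ramsey_prices(routes, phi, iters=100):
    """Ramsey prices for linear route demands and net fixed cost phi >= 0.

    Returns (k, prices, quantities), where k = lambda / (1 + lambda).
    """
    if phi < 0:
        raise ValueError("this sketch assumes phi >= 0")
    q_mc = [r.intercept - r.slope * r.mc for r in routes]  # ridership at p_i = mc_i

    def solve(k):
        # Ramsey rule with linear demand: q_i = q_i(mc_i) / (1 + k).
        qs = [Q / (1 + k) for Q in q_mc]
        ps = [(r.intercept - q) / r.slope for r, q in zip(routes, qs)]
        profit = sum((p - r.mc) * q for r, p, q in zip(routes, ps, qs))  # TR - TC
        return ps, qs, profit

    # Operating profit rises with k on [0, 1] and peaks at k = 1 (lambda -> infinity).
    if solve(1.0)[2] < phi:
        raise ValueError("phi exceeds the largest attainable operating profit")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if solve(mid)[2] < phi:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    ps, qs, _ = solve(k)
    return k, ps, qs

routes = [Route(intercept=100, slope=10, mc=2), Route(intercept=60, slope=20, mc=1)]
k, ps, qs = ramsey_prices(routes, phi=160)
for r, p, q in zip(routes, ps, qs):
    lerner = (p - r.mc) / p
    elasticity = r.slope * p / q                           # epsilon_i = -(dq/dp)(p/q)
    share_lost = (q - (r.intercept - r.slope * r.mc)) / q  # Delta q_i / q_i
    print(f"p={p:.3f} lerner={lerner:.3f} eps={elasticity:.3f} "
          f"lerner*eps={lerner * elasticity:.3f} dq/q={share_lost:.3f}")
print(f"k={k:.3f}")
```

With these numbers $k = 0.5$ (so $\lambda = 1$). Route 2 has the more elastic demand at the optimum ($1.25$ versus $0.875$), so it gets the smaller markup (Lerner index $0.4$ versus about $0.57$), and on both routes the product of the Lerner index and the elasticity equals $k$. Both routes also lose the same share of riders, $\Delta q_i/q_i = -k$. Because demand here is linear, the "$\approx$" in the proportional-reduction rule holds exactly.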

## Travel passes

Ramsey pricing is better than trying to charge everyone their average cost, but it turns out we can do even better through a technique used by transit operators every day: *discounts for frequent riders implemented via travel passes*. This practice counts as [second-degree price discrimination](https://en.wikipedia.org/wiki/Price_discrimination#Second_degree). One way to do so is to sell passes, like the unlimited-rides monthly pass from AC Transit. Another way is memberships or discount cards: with the "Railcard" in the UK, you pay $70$ pounds and then get a $1/3$ discount off individual rides. However it's implemented, a rider pays an upfront charge to get cheaper individual rides. The basic economic idea is that the upfront charge (the cost of the travel pass) pays for the fixed cost of service, while the discounted fare pays for the marginal cost.

<div style="display: flex; padding:10px;">
<div><img src="./img/monthpass.png"/></div>
<div><img src="./img/railcard.png"/></div>
</div>
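From the rider's side, a discount card is simple arithmetic: it pays for itself once the discount saved exceeds the upfront charge. This minimal sketch uses the text's Railcard figures ($70$ pounds for $1/3$ off); the function name and the $15$-pound average fare are hypothetical, and the sketch ignores any usage restrictions a real card may carry.

```python
def breakeven_spend(card_price: float, discount: float) -> float:
    """Full-fare spending at which a discount card pays for itself.

    Buying the card saves discount * spend, so it breaks even when
    discount * spend == card_price.
    """
    return card_price / discount

spend = breakeven_spend(70, 1 / 3)  # about 210 pounds of full-fare travel
rides = spend / 15                  # hypothetical average full fare of 15 pounds
print(round(spend, 2), round(rides, 1))  # 210.0 14.0
```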



### Model 1: Travel cards w/ one type of rider

You're running a transit service with fixed cost $f_0$ and marginal cost $c$. There are $N$ identical riders, each with his own demand curve $p_i(q_i)$. Suppose that the individual rider's consumer surplus, evaluated at price $p=c$, is $S_i$. Suppose furthermore that $S_i>f_0/N$. In this case, the value of the service to riders, when priced at marginal cost, outweighs its fixed cost. Therefore, if we sell a travel card for $f_0/N$ that lets riders buy rides at price $c$, riders will take us up on the deal. So we will be able to price at marginal cost and still cover all the fixed costs. The efficiency lost from average-cost pricing is eliminated.

<img width="40%" src="./img/two-part-universal.svg"/>
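Here is a minimal sketch of Model 1, assuming each rider has the hypothetical linear demand $p = a - bq$ (the text allows any demand curve) and invented parameter values. It prices the card at $f_0/N$, checks that $S_i > f_0/N$, and confirms that card fees plus fares of $c$ exactly cover fixed plus operating costs.

```python
def model1_travel_card(a, b, c, f0, N):
    """Model 1: N identical riders, each with hypothetical demand p = a - b*q.

    Sells a card for f0/N that entitles the holder to rides at price c.
    """
    q = (a - c) / b            # rides per rider at p = c
    S = 0.5 * (a - c) * q      # consumer surplus per rider at p = c
    fee = f0 / N               # card price: average fixed cost per rider
    if S <= fee:
        raise ValueError("riders will not buy the card: S_i <= f0/N")
    revenue = N * (fee + c * q)    # card sales plus fares
    cost = f0 + c * N * q          # fixed cost plus operating cost
    return {"S": S, "fee": fee, "profit": revenue - cost, "rider_gain": S - fee}

print(model1_travel_card(a=5, b=0.125, c=1, f0=50_000, N=1_000))
# {'S': 64.0, 'fee': 50.0, 'profit': 0.0, 'rider_gain': 14.0}
```

The agency exactly breaks even while every ride is priced at marginal cost, and each rider still keeps $S_i - f_0/N$ of surplus.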


### Model 2: Travel cards w/ two types of rider

In this model, there are two types of rider, a high-demand ($H$) type and a low-demand ($L$) type, with $p_H(q)>p_L(q)$. Their consumer surpluses, when price equals marginal cost, are $S_H$ and $S_L$. Their population sizes are $N_H$ and $N_L$. 

Now, if things were simple, we could sell a travel card for the average fixed cost per rider, $f_0/(N_H+N_L)$, as we did in Model 1. That would cover the fixed cost. But suppose that $S_L<f_0/(N_H+N_L)$. Then the low-demand riders won't buy the travel card at that price, because their total surplus from facing $p=c$ isn't high enough to justify it.

<img src="./img/two-part-dual.svg" width="40%"/>
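The failure of a single card can be checked in a few lines. This sketch takes the surpluses $S_H$ and $S_L$ as given (all numbers are hypothetical) and reports who buys a card priced at $f_0/(N_H+N_L)$, how much of $f_0$ goes uncovered, and whether the high types would still buy if the card price were raised to $f_0/N_H$ so that they alone covered the fixed cost.

```python
def single_card(S_H, S_L, N_H, N_L, f0):
    """Outcome of one travel card priced at average fixed cost per rider.

    A rider type buys the card if its surplus at p = c is at least the card price.
    Returns (card price, number of buyers, uncovered fixed cost,
             whether high types still buy at f0 / N_H).
    """
    fee = f0 / (N_H + N_L)
    buyers = (N_H if S_H >= fee else 0) + (N_L if S_L >= fee else 0)
    shortfall = f0 - buyers * fee
    # Fallback: charge enough that the high types alone cover f0.
    high_types_stay = S_H >= f0 / N_H
    return fee, buyers, shortfall, high_types_stay

print(single_card(S_H=100, S_L=20, N_H=300, N_L=700, f0=40_000))
# (40.0, 300, 28000.0, False)
```

With these numbers, no single card covers $f_0$: at the average price the low types drop out, and at the higher price the high types would too.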

In this situation, we can do better by offering *two* travel cards: 

1. Card 1: travel card cost $A_1$ with price $p_1$
2. Card 2: travel card cost $A_2$ with price $p_2$

where $A_2>A_1$ and $p_2<p_1$. One way to compare the cards is to plot total "expenditure", $T_i$, with each card as a function of the number of rides taken.

\begin{align}
T_1(q) = A_1 + p_1 q\\
T_2(q) = A_2+p_2 q
\end{align}

<img src="./img/expenditure.svg" width="35%"/>

If you're only doing a few rides, you're better off with Card 1 (lower $T$ for low $q$). But if you're riding often you'll want Card 2. Thus, if you choose your prices correctly the two groups will sort themselves out. The high-demand group will choose Card 2 and pay a price very close to marginal cost, while the low-demand group pays a higher price. Choosing the optimal prices is a mathematical problem we won't get into.
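The expenditure comparison can be sketched directly from $T_1$ and $T_2$. Setting $T_1(q) = T_2(q)$ gives the crossover ride count $q = (A_2 - A_1)/(p_1 - p_2)$. The card parameters and function names below are hypothetical, and, like the figure, the sketch holds the number of rides fixed, whereas a real rider's ride count also responds to the fare.

```python
def expenditure(A, p, q):
    """Total outlay T(q) = A + p*q for a card with upfront charge A and fare p."""
    return A + p * q

def crossover_rides(A1, p1, A2, p2):
    """Ride count at which T_1(q) = T_2(q); assumes A2 > A1 and p2 < p1."""
    return (A2 - A1) / (p1 - p2)

def cheaper_card(q, card1, card2):
    """Return 1 or 2: the card with the lower expenditure for q rides."""
    return 1 if expenditure(*card1, q) <= expenditure(*card2, q) else 2

card1 = (0.0, 2.5)   # hypothetical: no upfront charge, 2.50 per ride
card2 = (60.0, 1.0)  # hypothetical: 60 upfront, 1.00 per ride
print(crossover_rides(*card1, *card2))                              # 40.0
print(cheaper_card(10, card1, card2), cheaper_card(60, card1, card2))  # 1 2
```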

# Peak-load pricing

Another problem you'll face is that the same capacity is used at different times of day, when demands differ. How much capacity should you supply? And how should you price the peak and the off-peak?

Let $p_H(q)$ and $p_L(q)<p_H(q)$ be the peak and off-peak demand (high and low). Let $\beta$ be the cost per unit of capacity, and $c$ the marginal operating cost of supplying a ride. So $\beta$ might be the cost of the track, and $c$ the cost of running a train. Let $K$ be the capacity you've supplied.


Your job is to choose the optimal capacity to supply. Once the capacity is supplied you have to choose prices to ration demand to that capacity, and to pay for that capacity.

There are basically two cases. The first case is where the peak demand is much higher than the off-peak demand and capacity isn't too expensive. Then, even if you make peak users pay the entire cost of capacity, off-peak users won't use all the capacity when they face a price equal to marginal cost. This situation is shown below. If you charge $p_H = \beta + c$, then the optimal capacity is given by the point where $p_H(K^*)=\beta + c$. And even if you charge only $p_L = c$, off-peak users demand fewer than $K^*$ rides. This is called the "firm peak" case because the peak remains the peak both before and after pricing.

<img src="./img/firm-peak.svg" width="40%"/>
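A minimal firm-peak check, assuming hypothetical linear demands $p_H(q) = a_H - b_H q$ and $p_L(q) = a_L - b_L q$ and invented parameter values: solve $p_H(K^*) = \beta + c$, then confirm that off-peak demand at $p_L = c$ does not exceed $K^*$.

```python
def firm_peak(a_H, b_H, a_L, b_L, beta, c):
    """Firm-peak check for hypothetical linear demands p_H = a_H - b_H*q, p_L = a_L - b_L*q.

    Peak riders pay beta + c; off-peak riders pay c.
    Returns (K*, off-peak rides at p_L = c, whether the peak is firm).
    """
    K = (a_H - beta - c) / b_H       # solves p_H(K) = beta + c
    q_off = (a_L - c) / b_L          # off-peak rides when p_L = c
    return K, q_off, q_off <= K

print(firm_peak(a_H=20, b_H=1, a_L=8, b_L=1, beta=6, c=2))  # (12.0, 6.0, True)
```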

The second case is where the peak and off-peak demands are pretty similar. In this case, if you charge the off-peak users only for their marginal costs, then off-peak demand will exceed capacity. An example is shown below. The way to find the optimal capacity in this case is to form an aggregated demand curve $p_{L+H}(q)=p_L(q)+p_H(q)$ and find where it intersects the line $\beta + 2c$. Then you choose the price for each group that rations demand to $K^*$. For both groups, it will be greater than marginal cost, so both the peak and off-peak will contribute to capacity costs. We call this the shifting peak case because if you only charge the off-peak its marginal costs the off-peak will have more demand than the peak.

<img src="./img/shifting-peak.svg" width="40%"/>
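Putting both cases together, here is a sketch (same hypothetical linear demands; the function name and numbers are illustrative) that tries the firm-peak solution first and, if off-peak demand at marginal cost would overflow that capacity, switches to the shifting-peak solution $p_H(K^*) + p_L(K^*) = \beta + 2c$.

```python
def peak_load(a_H, b_H, a_L, b_L, beta, c):
    """Optimal capacity and prices for hypothetical linear demands.

    p_H(q) = a_H - b_H*q and p_L(q) = a_L - b_L*q, with p_L(q) < p_H(q).
    Returns (case, K*, peak price, off-peak price).
    """
    # Firm peak: the peak pays all capacity cost, the off-peak pays only c.
    K = (a_H - beta - c) / b_H
    if (a_L - c) / b_L <= K:
        return "firm", K, beta + c, c
    # Shifting peak: aggregate demand p_H(K) + p_L(K) meets beta + 2c,
    # and each group's price rations its demand to K*.
    K = (a_H + a_L - beta - 2 * c) / (b_H + b_L)
    return "shifting", K, a_H - b_H * K, a_L - b_L * K

print(peak_load(a_H=20, b_H=1, a_L=16, b_L=1, beta=6, c=2))  # ('shifting', 13.0, 7.0, 3.0)
```

In the shifting-peak example, the markups over $c$ ($5$ at the peak, $1$ off-peak) add up to $\beta = 6$, so the two periods share the capacity cost.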
