# Econ 141 - Spring 2020

## Bryan Graham
### GSI Kevin Dano

## Problem Set 1 - Due February 7th, 2020

Problem sets are due at 5PM in the GSI's mailbox. You may work in groups,
but each student should turn in their own write-up (including a narrated/commented
and executed Jupyter Notebook). Please use markdown boxes within your
Jupyter notebook for narrative answers to the questions below.

# 1  The distribution of total factor productivity

The file `semiconductor_firms.out` contains several thousand firm-by-year observations for a sample of publicly traded semiconductor firms (NAICS 4-digit code 3344) drawn from the S&P Capital IQ - Compustat database. The following firm attributes, measured from 1998 to 2014 inclusive, are included:

`gvkey` - Compustat firm identification code

`conm` - firm name

`year` - calendar year

`Y` - total real sales by the firm (in millions of 2009 US\$)

`K` - capital stock (in millions of 2009 US\$)

`L` - employees (in thousands)

`M` - materials expenditures (in millions of 2009 US\$)

`VA` - total real value added by the firm (in millions of 2009 US\$)

`w` - annual wage rate (in 2009 US\$)

`i` - real investment (in millions of 2009 US\$)

`naics_4digit` - NAICS four-digit sector code for the firm

In this problem set you will use this dataset to study the distribution of productivity across firms. A nice introduction to economic research in this area is provided by Syverson (2011).

## Preparing and exploring the dataset

1. How many firm-year observations are in the dataset?

In [1]:
import pandas as pd

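scf = pd.read_csv("https://raw.githubusercontent.com/bryangraham/Ec_141/master/Spring2020/Problem_Sets/semiconductor_firms.out", sep='\t', encoding='utf-8')
# Index by firm and year. set_index returns a new DataFrame, so assign the result;
# drop=False keeps gvkey and year available as regular columns as well.
scf = scf.set_index(['gvkey', 'year'], drop=False)
scf.head()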

| gvkey (index) | year (index) | gvkey | conm | year | Y | K | L | M | VA | w | i | naics_4digit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1056 | 1999 | 1056 | AEROFLEX INC | 1999 | 92.727756 | 51.519396 | 1.1 | 50.190881 | 42.536875 | 23840.047219 | 10.488479 | 3344 |
| 1056 | 2000 | 1056 | AEROFLEX INC | 2000 | 122.685671 | 106.644832 | 1.18 | 68.554502 | 54.131169 | 28345.375961 | 8.175873 | 3344 |
| 1056 | 2001 | 1056 | AEROFLEX INC | 2001 | 172.101069 | 111.645297 | 1.25 | 99.452777 | 72.648292 | 31764.418883 | 15.032798 | 3344 |
| 1056 | 2002 | 1056 | AEROFLEX INC | 2002 | 156.510389 | 128.098419 | 2.03 | 72.158871 | 84.351518 | 34059.627713 | 6.828937 | 3344 |
| 1056 | 2003 | 1056 | AEROFLEX INC | 2003 | 237.229156 | 146.030959 | 1.86 | 143.830432 | 93.398724 | 37331.598845 | 7.077533 | 3344 |


2. How many distinct firms are in the dataset?

3. In 2014, what were aggregate total sales across all semiconductor firms in the dataset?

4. How many employees did these firms employ in total?

5. For 2014, compute the average, standard deviation and 5th, 25th, 50th, 75th and 95th percentiles of total sales, capital stock, employees, materials, and investment across all firms in your dataset. Display this information in an easy-to-read table.

6. Write a few sentences summarizing your dataset.
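
Questions 1 through 5 mostly reduce to a handful of pandas calls. The sketch below shows one possible pattern on a small made-up frame that reuses the dataset's column names. The numbers in `toy` are invented, the names `toy`, `y2014` and `summary` are purely illustrative, and reading question 4 as referring to 2014 is an assumption.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the real file
toy = pd.DataFrame({
    "gvkey": [1, 1, 2, 2, 3],
    "year":  [2013, 2014, 2013, 2014, 2014],
    "Y":     [100.0, 120.0, 50.0, 60.0, 300.0],
    "K":     [40.0, 45.0, 20.0, 25.0, 150.0],
    "L":     [1.0, 1.1, 0.5, 0.6, 2.5],
    "M":     [55.0, 60.0, 30.0, 33.0, 160.0],
    "i":     [5.0, 6.0, 2.0, 3.0, 20.0],
})

n_obs = len(toy)                    # number of firm-year observations
n_firms = toy["gvkey"].nunique()    # number of distinct firms

# Restrict to 2014 for the cross-sectional questions
y2014 = toy[toy["year"] == 2014]
total_sales_2014 = y2014["Y"].sum()        # millions of 2009 US$
total_emp_2014 = y2014["L"].sum() * 1000   # L is in thousands (assumes Q4 means 2014)

# Mean, standard deviation and selected percentiles, one row per variable
summary = (
    y2014[["Y", "K", "L", "M", "i"]]
    .describe(percentiles=[0.05, 0.25, 0.50, 0.75, 0.95])
    .loc[["mean", "std", "5%", "25%", "50%", "75%", "95%"]]
    .T
)
print(summary.round(2))
```

Transposing the `describe` output puts one variable per row, which tends to read more easily when there are more statistics than variables.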

*Answer here*:

## Profit maximization

Assume that in each period $t$, output, $Y_t$, is produced using capital, $K_t$, labor, $L_t$, and materials, $M_t$, according to the production technology

$$ 
    Y_t = A_t K_t^{1-\alpha-\beta}L_{t}^{\alpha}M_t^{\beta} 
$$

where $A_t$ is a factor-neutral shifter or *total factor productivity*. Let $R_t$ and $W_t$ be the prevailing rental rate for capital and wage rate for labor. Assume that firms maximize profits – taking their output price, $P_t$, as fixed (i.e., perfect competition):

$$
    \max_{k_t,l_t,m_t} P_{t}A_{t}k_{t}^{1-\alpha-\beta}l_{t}^{\alpha}m_{t}^{\beta}-R_{t}k_{t}-W_{t}l_t-m_t.
$$

Let $K_t$, $L_t$ and $M_t$ denote the profit-maximizing input choices made by the firm. Show that the firm’s first order conditions for labor and materials imply that

$$ 
    \alpha = \frac{W_{t}L_t}{P_{t}Y_t}, \qquad \beta = \frac{M_t}{P_{t}Y_t}
$$
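
As a hint at how the argument can be organized (a sketch, not the only route), differentiate the objective with respect to $l_t$, evaluate at the optimum, and multiply through by $L_t$:

$$
\begin{aligned}
    \alpha P_{t}A_{t}K_{t}^{1-\alpha-\beta}L_{t}^{\alpha-1}M_{t}^{\beta} &= W_t \\
    \alpha P_{t}A_{t}K_{t}^{1-\alpha-\beta}L_{t}^{\alpha}M_{t}^{\beta} &= W_{t}L_t \\
    \alpha P_{t}Y_t &= W_{t}L_t
\end{aligned}
$$

so $\alpha$ equals the wage bill divided by revenue, $P_{t}Y_t$. When the output price is normalized to one, $P_{t}Y_t$ and $Y_t$ coincide. The materials condition follows the same steps with $m_t$ in place of $l_t$ and a unit price in place of $W_t$.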

*Answer here*:

## Measuring productivity

Note that `Y` in the dataset corresponds to $P_{t}Y_t$ in the theoretical model; our theory is about physical units of output, $Y_{t}$, but what we observe in the financial filings of firms is generally total sales, $P_{t}Y_{t}$. It is this latter quantity which is recorded as `Y` in the dataset. This discrepancy between data and theory has numerous implications which we will gloss over for the time being, but return to later in the course.

1. Construct a measure of the firm’s wage bill each period, $W_tL_t$, using the formula 

$$
    \text{wage bill} = \frac{(L \times 1000)\times w}{1{,}000{,}000}
$$ 

Explain the reasoning underlying this formula.
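
The formula is a chain of unit conversions, which the sketch below spells out step by step on made-up numbers. It assumes `w` is the annual wage per employee; the frame `toy` and the intermediate names are illustrative only.

```python
import pandas as pd

# Hypothetical values; L is in thousands of employees, w in 2009 US$ per year
toy = pd.DataFrame({
    "L": [1.1, 0.5],
    "w": [23840.0, 40000.0],
})

employees = toy["L"] * 1000                  # thousands -> head count
wage_bill_dollars = employees * toy["w"]     # head count x annual wage = payroll in US$
toy["wage_bill"] = wage_bill_dollars / 1_000_000   # US$ -> millions of US$, same units as Y

print(toy)
```

After the conversion the wage bill is in millions of 2009 US\$, so it can be divided by `Y` directly.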

*Answer here*:

2. Let $i$ index firms and $t$ years. Consider the following estimate of firm $i$’s elasticity of output with respect to labor:

$$ 
    \widehat{\alpha}_{i} = \frac{1}{T_i} \sum_t \frac{\text{wage bill}_{it}}{Y_{it}} 
$$

where the summation is over all years firm $i$ is in the dataset and $T_i$ denotes the total number of years firm $i$ is in the dataset. Similarly estimate firm $i$’s elasticity of output with respect to materials as:

$$
    \widehat{\beta}_{i} = \frac{1}{T_i} \sum_t \frac{M_{it}}{Y_{it}}
$$

Explain the reasoning underlying these elasticity measures.
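
Computationally, each estimate is a within-firm time average of a cost share. A minimal sketch on invented data follows; it assumes a `wage_bill` column has already been constructed, and the column names `labor_share`, `materials_share`, `alpha_hat` and `beta_hat` are illustrative choices.

```python
import pandas as pd

# Hypothetical firm-year panel; all monetary values in millions of 2009 US$
toy = pd.DataFrame({
    "gvkey":     [1, 1, 2, 2, 2],
    "year":      [2013, 2014, 2012, 2013, 2014],
    "Y":         [100.0, 120.0, 50.0, 60.0, 80.0],
    "M":         [50.0, 66.0, 20.0, 30.0, 32.0],
    "wage_bill": [20.0, 30.0, 15.0, 12.0, 20.0],
})

# Firm-year cost shares of sales
toy["labor_share"] = toy["wage_bill"] / toy["Y"]
toy["materials_share"] = toy["M"] / toy["Y"]

# Average each share over the years a firm appears (T_i can differ by firm)
firm_elasticities = (
    toy.groupby("gvkey")[["labor_share", "materials_share"]]
    .mean()
    .rename(columns={"labor_share": "alpha_hat", "materials_share": "beta_hat"})
)

# Medians across firms, as used later for the common elasticities
alpha_hat = firm_elasticities["alpha_hat"].median()
beta_hat = firm_elasticities["beta_hat"].median()
print(firm_elasticities)
print(alpha_hat, beta_hat)
```

Because `groupby(...).mean()` divides by each firm's own number of rows, an unbalanced panel is handled without extra bookkeeping.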


*Answer here*:

3. Compute the average, standard deviation and 5th, 25th, 50th, 75th and 95th percentiles of $\widehat{\alpha}_{i}$ and $\widehat{\beta}_{i}$ across all firms in your dataset. Display this information in a table.

*Answer here*:

4. Let $\widehat{\alpha}$ and $\widehat{\beta}$ be the median firm-specific elasticity estimates. Compute these.

*Answer here*:

5. Construct the following measure of productivity for each firm-year in your dataset:

$$
    TFPR_{it} = \frac{Y_{it}}{K_{it}^{1-\widehat{\alpha}-\widehat{\beta}}L_{it}^{\widehat{\alpha}}M_{it}^{\widehat{\beta}}}
$$

How does this measure relate to $A_{it}$ – total factor productivity – as defined in the
theoretical model?
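
For the computation itself, one option is to work in logs, which avoids raising large numbers to fractional powers one factor at a time. The sketch below uses invented inputs; the elasticity values `0.25` and `0.45` are placeholders, not estimates from the dataset.

```python
import numpy as np
import pandas as pd

# Placeholder elasticities (hypothetical, not estimated from the real data)
alpha_hat, beta_hat = 0.25, 0.45
kappa_hat = 1 - alpha_hat - beta_hat   # implied capital exponent

# Hypothetical firm-year observations
toy = pd.DataFrame({
    "Y": [100.0, 200.0],
    "K": [40.0, 90.0],
    "L": [1.0, 2.0],
    "M": [50.0, 110.0],
})

# log TFPR = log Y - (1 - alpha - beta) log K - alpha log L - beta log M
log_tfpr = (
    np.log(toy["Y"])
    - kappa_hat * np.log(toy["K"])
    - alpha_hat * np.log(toy["L"])
    - beta_hat * np.log(toy["M"])
)
toy["TFPR"] = np.exp(log_tfpr)
print(toy)
```

Rows with a zero or missing input would produce infinite or missing logs, so it is worth checking for them before applying this to the full dataset.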

*Answer here*:

6. For 2014, compute the average, standard deviation and 5th, 25th, 50th, 75th and 95th percentiles of $TFPR_{it}$ across all firms in your dataset. Display this information in a table. Are the productivity differences across firms larger or smaller than you expected?

*Answer here*:

## Productivity decomposition

1.  Let $S_t = Y_t / \mathbb{E} [Y_t]$ and show that

$$
\begin{aligned}
    \mathbb{E} [S_tA_t] &= \mathbb{E} [S_t]\mathbb{E} [A_t]+\mathbb{C}(S_t,A_t) \\
    &= \mathbb{E} [A_t] + \left[\frac{\mathbb{C}(S_t,A_t)}{\mathbb{V}(A_t)}\right]\mathbb{V}(A_t) \\
    &= \mathbb{E} [A_t] + \rho_t \mathbb{V}(A_t)
\end{aligned}
$$

Discuss how this expression might be used to understand industry-level change in productivity over time.

*Answer here*:

2. Equate $A_t$ with $TFPR_{it}$ and $S_t$ with $Y_{it} / \left( \frac{1}{n} \sum_{i=1}^{N} \sum_{t=1}^{T_i} Y_{it} \right)$; here $n$ equals the total number of firm-year observations in the dataset. Compute the sample analogs of $\mathbb{E} [A_t]$, $\rho_t$ and
$\mathbb{V}(A_t)$ for each year from 1998 to 2014.
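
A possible way to organize the year-by-year calculation is sketched below on invented data. It follows the normalization stated above (sales divided by the mean over all firm-years) and treats $\rho_t$ as the ratio of the sample covariance to the sample variance; the names `year_stats` and `decomp` are illustrative, and the population (divide-by-$n$) moments are an assumption.

```python
import pandas as pd

# Hypothetical firm-year data with a TFPR column already constructed
toy = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014, 2014],
    "Y":    [50.0, 100.0, 150.0, 60.0, 120.0, 180.0],
    "TFPR": [1.0, 2.0, 3.0, 1.0, 2.0, 4.0],
})

# Sales relative to the mean over all firm-year observations
toy["S"] = toy["Y"] / toy["Y"].mean()

def year_stats(g):
    """Sample analogs of E[A_t], rho_t and V(A_t) for one year's firms."""
    a_dev = g["TFPR"] - g["TFPR"].mean()
    s_dev = g["S"] - g["S"].mean()
    var_a = (a_dev ** 2).mean()      # divide-by-n variance
    cov_sa = (s_dev * a_dev).mean()  # divide-by-n covariance
    return pd.Series({"mean_A": g["TFPR"].mean(),
                      "rho": cov_sa / var_a,
                      "var_A": var_a})

# One row per year
decomp = pd.DataFrame({year: year_stats(g) for year, g in toy.groupby("year")}).T
decomp.index.name = "year"
print(decomp)
```

Using the same divisor for the covariance and the variance means the choice between $n$ and $n-1$ does not affect `rho`, only `var_A`.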

*Answer here*:

3. Use your calculations above to discuss the evolution of productivity in the semiconductor industry from 1998 to 2014. Support your answer with plots and/or tables.

*Answer here*: 