<table style="width: 100%;">
    <tr style="background-color: transparent;"><td>
        <img src="https://d8a-88.github.io/econ-fa19/assets/images/blue_text.png" width="250px" style="margin-left: 0;" />
    </td><td>
        <p style="text-align: right; font-size: 12pt;"><strong>Economic Models</strong>, Fall 2019<br>
            Dr. Eric Van Dusen</p></td></tr>
</table>

# Data 88 - Project 2: Mariel Boatlift

For this project, we will replicate an analysis by Prof. David Card of the effects of the Mariel Boatlift on the Miami labor market. Because immigrants choose their destinations (and logically choose places with strong labor markets), comparing unemployment rates or wages between places with more and fewer immigrants is not enough to determine the causal effect of adding immigrants to a labor market. However, Card identified circumstances in which immigrants arrive for reasons that have very little to do with the local labor market; the Mariel Boatlift is one such event. Here is Card's description:

> The experiences of the Miami labor market in the aftermath of the Mariel Boatlift form one such \["natural"\] experiment. From May to September 1980, some 125,000 Cuban immigrants arrived in Miami on a flotilla of privately chartered boats. Their arrival was the consequence of an unlikely sequence of events culminating in Castro's declaration on April 20, 1980, that Cubans wishing to emigrate to the United States were free to leave from the port of Mariel. Fifty percent of the Mariel immigrants settled permanently in Miami. The result was a 7% increase in the labor force of Miami and a 20% increase in the number of Cuban workers in Miami. (Card, 1990:245-6)

**Reading:** You should download a copy of the original paper and read at least pages 245-251 and pages 255-257. It is available [here](http://davidcard.berkeley.edu/papers/mariel-impact.pdf).

In [None]:
from datascience import *
import numpy as np
from utils import *
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use("seaborn-muted")
import otter
grader = otter.Notebook()

**A note on autograding:** This notebook uses an autograder to verify the correctness of your inputs. _However_, some of the tests are only "sanity checks" which ensure that your answer is in the correct format, rather than that you have the correct answer. The checks in Part 1 and Question 2.1 check that you have the _correct_ answer, and the other tests only check that your answer is _in the correct format._ Think deeply about your answers, and take care to note that passing autograder tests does not necessarily guarantee that you are correct.

## Part 1: CPS Data

In the cell below, we load the data from the Current Population Survey (CPS). The CPS "outgoing rotation groups" that we are using for the analysis are the largest sample available for this time period. Even so, once we limit ourselves to Miami and the comparison cities, the sample sizes are small.

In [None]:
mariel_raw = Table.read_table("mariel-boatlift.csv")
mariel_raw.show(5)
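
To get a rough feel for why small samples matter here, the sketch below computes the standard error of an estimated unemployment rate under simple random sampling. The 8% rate and the sample sizes are hypothetical, and the CPS uses a more complex sampling design, so treat this only as an order-of-magnitude illustration.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical numbers: an 8% unemployment rate measured on samples of different sizes.
rate = 0.08
for n in (100, 400, 10000):
    se = proportion_standard_error(rate, n)
    print(f"n = {n:>5}: standard error is about {100 * se:.1f} percentage points")
```

Under these assumptions, a sample of 100 respondents gives a standard error close to 3 percentage points, so year-to-year movements of a point or two could plausibly be sampling noise.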

The columns of our table have some odd names. We provide descriptions of the variables of interest below.

| Column Name | Description |
|-----|-----|
| `age` | Age of individual |
| `smsarank` | City |
| `esr` | Employment status |
| `ftpt79` | Full-time or part-time |
| `earnhre` | Nominal hourly pay in cents |
| `educ` | Level of education (BA, HS diploma, or < HS) |
| `ethrace` | Race & ethnicity |

Before moving on to the analysis of the data, there is a tiny bit of data cleaning that needs to be done. The data currently have a `.id` column, which encodes the year. However, these values range from 1 to 7, so to get the actual years, we need to add 1978 to each.
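
As a quick illustration of the recoding (on a made-up list of codes, not the actual table), shifting by 1978 maps the codes 1 through 7 onto the years 1979 through 1985:

```python
# Hypothetical stand-in for the values in the `.id` column.
id_codes = [1, 2, 3, 4, 5, 6, 7]

# Shift each code by 1978 to recover the calendar year.
years = [code + 1978 for code in id_codes]
print(years)  # [1979, 1980, 1981, 1982, 1983, 1984, 1985]
```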

**Question 1.1:** Add 1978 to each value in the `.id` column of `mariel_raw`. Create a new table `mariel` with the same columns as `mariel_raw` but with a `year` column and no `.id` column.

In [None]:
year = ...
mariel = mariel_raw.with_column(..., ...).drop(...)
mariel.show(5)

In [None]:
grader.check("q1_1")

Because we are focusing on certain racial groups in this lab, we want to remove the rows of `mariel` that are not in the set of ethnicities that we are concerned with.

**Question 1.2:** Remove the rows of `mariel` where the `ethrace` variable is `"others"`.

In [None]:
mariel = ...
mariel.show(5)

In [None]:
grader.check("q1_2")

**Question 1.3:** What categories are there in the `ethrace` variable (now)?

1. White and Black
2. White and Black and Cuban
3. White, Black, Cuban, and Hispanic
4. Cuban, Hispanic, non-Hispanic

Assign the number corresponding to your answer to `q1_3` below.

In [None]:
q1_3 = ...

In [None]:
grader.check("q1_3")

**Question 1.4:** What are the units of the `earnhre` variable?

1. 1980 dollars per hour
2. 1980 cents per hour
3. Nominal cents per hour (not adjusting for inflation)
4. Nominal dollars per hour (not adjusting for inflation)

Assign the number corresponding to your answer to `q1_4` below.

In [None]:
q1_4 = ...

In [None]:
grader.check("q1_4")

**Question 1.5:** What cities make up the comparison group?

1. All U.S. cities except Miami
2. All Florida cities except Miami
3. Cities around the country that Card thought would be subject to the same macro-economic influences as Miami but that didn't receive many Cuban immigrants.
4. Cities that also received a lot of Cuban immigrants.

Assign the number corresponding to your answer to `q1_5` below.

In [None]:
q1_5 = ...

In [None]:
grader.check("q1_5")

## Part 2: Attempt to replicate results for unemployment 

We're going to begin by trying to replicate Card's results for unemployment in his Table 4. (We'll turn to the wage results in his Table 3 later.)

In the cell below, we define a function `get_ue` that takes an array of values from the `esr` column and returns the proportion of people who are unemployed (the unemployment rate).

In [None]:
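def get_ue(esr):
    """Return the unemployment rate: unemployed and looking / labor force."""
    assert isinstance(esr, np.ndarray), "esr must be an array"
    # Numerator: people who are unemployed and looking for work
    unemployed_looking = np.sum(esr == "Unemployed-Looking")
    # Denominator: the labor force (unemployed and looking, plus employed)
    labor_force = np.sum(np.isin(esr, make_array("Unemployed-Looking", "Employed-At Work", "Employed-Absent")))
    return unemployed_looking / labor_force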
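
The denominator in `get_ue` counts only people in the labor force, so anyone with a different `esr` value is left out of the rate entirely. Here is a minimal sketch of the same arithmetic in plain Python on made-up labels. The first three labels are the ones `get_ue` checks for; `"Not in labor force"` is a hypothetical label standing in for everyone else.

```python
# Toy employment-status values; "Not in labor force" is a hypothetical label.
statuses = [
    "Employed-At Work", "Employed-At Work", "Employed-Absent",
    "Unemployed-Looking", "Not in labor force", "Not in labor force",
]

labor_force_labels = {"Unemployed-Looking", "Employed-At Work", "Employed-Absent"}

# Count the unemployed and the labor force separately.
unemployed = sum(s == "Unemployed-Looking" for s in statuses)
labor_force = sum(s in labor_force_labels for s in statuses)

rate = unemployed / labor_force
print(unemployed, labor_force, rate)  # 1 4 0.25
```

Dividing by all six people instead would give about 0.17, which is why the choice of denominator matters.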

When considering the effect on the unemployment rate of the boatlift, we want to first separate Miami from the comparison cities and select for the desired age group (16 to 61). After we have the desired rows, we want to create a table where the rows represent a year in the data, the columns represent the unique values of the `ethrace` variable, and the cells contain the unemployment rate.

This is accomplished by creating a pivot table (using `Table.pivot()`). The `.pivot()` method can also take an aggregator function as an argument, to which it will pass the array of values that correspond to each row-column pair. In this case, it will pass the array of `esr` values that correspond to each `year`-`ethrace` pair to the function we provide, which will be the `get_ue` function defined above.

The end result is that we will have a table where each column is an `ethrace` value, each row is a year, and the values are the unemployment rate for that `ethrace` value. All of this is done for you in the cell below, and the results are stored as `miami_ue`.

In [None]:
miami_ue = (mariel
            .where("smsarank", are.equal_to("Miami"))
            .where("age", are.between_or_equal_to(16, 61)))

miami_ue = miami_ue.pivot("ethrace", "year", "esr", get_ue)
miami_ue
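
To make the role of the aggregator function concrete, here is a rough sketch of the grouping logic in plain Python. It is not how `Table.pivot()` is implemented; the records, the group labels, and the helper `unemployment_rate` are all made up for illustration.

```python
from collections import defaultdict

# Made-up (year, ethrace, esr) records; the real table has many more rows and columns.
records = [
    (1979, "white", "Employed-At Work"),
    (1979, "white", "Unemployed-Looking"),
    (1979, "black", "Employed-At Work"),
    (1980, "white", "Employed-At Work"),
    (1980, "black", "Unemployed-Looking"),
    (1980, "black", "Employed-Absent"),
]

def unemployment_rate(statuses):
    """Share of the labor force that is unemployed and looking for work."""
    labor_force = {"Unemployed-Looking", "Employed-At Work", "Employed-Absent"}
    in_labor_force = [s for s in statuses if s in labor_force]
    return sum(s == "Unemployed-Looking" for s in in_labor_force) / len(in_labor_force)

# Step 1: collect the esr values for every (year, ethrace) pair.
groups = defaultdict(list)
for year, ethrace, esr in records:
    groups[(year, ethrace)].append(esr)

# Step 2: apply the aggregator to each group to fill one cell of the pivot table.
pivot = {key: unemployment_rate(values) for key, values in groups.items()}
for (year, ethrace), rate in sorted(pivot.items()):
    print(year, ethrace, rate)
```

Each key of `pivot` corresponds to one cell of the table: the year picks the row and the group label picks the column.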

**Question 2.1:** Create the same pivot table below (include the same age restriction), but for the comparison cities (that is, all cities _except for Miami_). **Because we are looking at the comparison cities, we must exclude Cubans using a filter on the `ethrace` column.** Store the pivot table as `not_miami_ue`.

In [None]:
not_miami_ue = ...

not_miami_ue = not_miami_ue....
not_miami_ue

In [None]:
grader.check("q2_1")

In the cell below, we plot the unemployment rates for Miami and the comparison cities for each `ethrace` value. The dashed vertical line at 1980 marks the occurrence of the Mariel boatlift. (The function `plot_ue_by_ethrace`, along with the other plotting functions in this notebook, is defined in the `utils.py` file if you want to look at it. They're hidden because the code is very verbose.)

In [None]:
plot_ue_by_ethrace(miami_ue, not_miami_ue)

**Question 2.2:** Why do the "Cubans" have no comparison group?

1. Because there's a mistake in the code
2. Because there are not many Cubans in the comparison cities 
3. Because there are many Cubans in the comparison cities and it would be confusing to include them.

Assign the number corresponding to your answer to `q2_2` below.

In [None]:
q2_2 = ...

In [None]:
grader.check("q2_2")

**Question 2.3:** Unemployment after the Mariel boatlift goes up for all groups. Why does Card argue that "There is no evidence that the Mariel influx adversely affected the unemployment rate of either whites or blacks" (p. 250)?

1. Because our replication gives different numbers than Card's original analysis
2. Because the increases in unemployment were also seen in cities that didn't have the sudden Cuban migration.
3. Because Cubans experienced the same effects in Miami as whites and blacks there

Assign the number corresponding to your answer to `q2_3` below.

In [None]:
q2_3 = ...

In [None]:
grader.check("q2_3")

**Question 2.4:** How much attention should we pay to the ups and downs in these graphs? Are these chance fluctuations from the sample survey ("noise"), or are they important information that we should pay attention to ("signal")?

1. They are signal
2. They are noise
3. We can’t tell just by looking, but one could in theory (and with the help of a statistics course) quantify the magnitude of fluctuations that we would expect from random sampling.

Assign the number corresponding to your answer to `q2_4` below.

In [None]:
q2_4 = ...

In [None]:
grader.check("q2_4")

## Part 3: Wages

Now we will try to replicate Card's findings that the Mariel boatlift also had little or no effect on wages of natives. For simplicity we will not deflate the wages but instead consider the nominal wages.

Because some of the values in the `earnhre` column are missing (`nan`), we remove the rows where this is the case in the cell below. **Throughout this part, make sure you use `mariel_ehre` instead of `mariel`, or else your calculations may error.**

In [None]:
mariel_ehre = mariel.where("earnhre", lambda x: not np.isnan(x))
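
The filter above relies on `np.isnan` rather than an equality test because `nan` is not equal to anything, including itself. A small sketch on made-up earnings values, using `math.isnan` from the standard library as a stand-in for `np.isnan`:

```python
import math

# Hypothetical hourly earnings in cents; float("nan") marks a missing value.
earnings = [550.0, float("nan"), 725.0, float("nan"), 410.0]

# NaN is not equal to anything, including itself, so an equality test cannot find it.
print(float("nan") == float("nan"))  # False

# Keep only the entries that are not NaN.
observed = [x for x in earnings if not math.isnan(x)]
print(observed)  # [550.0, 725.0, 410.0]
```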

In order to make the wages more linear and to put them on an easier-to-understand scale, we convert each value in the `earnhre` column from cents to dollars, take the natural log, and store the result as `log_w`.

In [None]:
log_w = np.log(mariel_ehre.column("earnhre")/100)
mariel_ehre = mariel_ehre.with_column("log_w", log_w)
mariel_ehre.show(5)
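
As a small illustration of what the log transformation does, the sketch below applies the same cents-to-dollars conversion and natural log to three made-up wages. The values are hypothetical and chosen so that each wage is double the previous one.

```python
import math

# Hypothetical hourly earnings in cents.
earnhre_cents = [400, 800, 1600]

# Convert cents to dollars, then take the natural log, mirroring np.log(earnhre / 100).
log_w = [math.log(cents / 100) for cents in earnhre_cents]
print([round(v, 3) for v in log_w])  # [1.386, 2.079, 2.773]

# Each doubling of the wage adds the same amount (log 2, about 0.693) on the log scale.
steps = [b - a for a, b in zip(log_w, log_w[1:])]
print([round(s, 3) for s in steps])  # [0.693, 0.693]
```

Equal proportional changes become equal steps on the log scale, which is one reason log wages are convenient to compare across groups and years.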

We want to create a pivot table similar to the one in Part 2, except that the values in this table should be the mean of the log of wages. We create this table for Miami below, making sure to also filter `mariel_ehre` for rows where the individual is employed full-time.

In [None]:
miami_wages = (mariel_ehre.where("age", are.between_or_equal_to(16, 61))
               .where("smsarank", are.equal_to("Miami"))
               .where("ftpt79", are.equal_to("Employed full-time")))

miami_wages = miami_wages.pivot("ethrace", "year", "log_w", np.mean)
miami_wages

**Question 3.1:** Create the same pivot table below, except for the comparison cities (that is, all cities _except for Miami_). Store the pivot table as `not_miami_wages`.

In [None]:
not_miami_wages = ...

not_miami_wages = not_miami_wages....
not_miami_wages

In [None]:
grader.check("q3_1")

In the cell below, we plot the wages for Miami and the comparison cities for each `ethrace` value.

In [None]:
plot_wages_by_ethrace(miami_wages, not_miami_wages)

Our numbers differ from Card's Table 3 because we are not accounting for inflation. To make it easier to draw inferences about the effect of the boatlift on wages, let's plot the differences between Miami and the comparison cities. **If there was an effect on wages in Miami, these plots would fall as wages in Miami go down relative to the comparison cities.**

In [None]:
plot_wage_diffs_by_ethrace(miami_wages, not_miami_wages)
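
A difference in log wages can be read as a ratio of wages. The sketch below uses one hypothetical wage per city for simplicity. The plotted values are differences of *mean* log wages, which correspond to ratios of geometric means, so the analogy is approximate.

```python
import math

# Hypothetical mean hourly wages in dollars for one group in one year.
miami_wage = 5.40
comparison_wage = 6.00

log_gap = math.log(miami_wage) - math.log(comparison_wage)

# The difference of logs equals the log of the ratio.
assert math.isclose(log_gap, math.log(miami_wage / comparison_wage))

print(round(log_gap, 3))            # -0.105
print(round(math.exp(log_gap), 3))  # 0.9, i.e. the Miami wage is 90% of the comparison wage
```

For small gaps, the log difference is close to the proportional difference: here about -0.105 for a wage that is 10% lower.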

**Question 3.2:** If wages were hurt by the influx of migrants, we would expect this graph to show

1. A decrease after 1980, as Miami wages went down relative to other cities
2. Values below 0 for all periods, because Miami would always have lower wages
3. An uptick after 1980 because we are working with logarithms.

Assign the number corresponding to your answer to `q3_2` below.

_Hint:_ $\log A  - \log B = \log \frac{A}{B}$

In [None]:
q3_2 = ...

In [None]:
grader.check("q3_2")

So it seems that indeed our analysis is consistent with Card's conclusion that "the Mariel immigration had virtually no effect on wages or unemployment outcomes of non-Cuban workers in the Miami labor market" (p. 255).

## Part 4: Education

We would expect any negative effect of the influx of immigrants to be strongest on the group that they most resemble. Because most of the Cuban immigrants in the boatlift were unskilled, we would expect the strongest effect on natives with the least education, with perhaps the clearest comparison group being Hispanics with the least education.

Card used a different approach, looking at the effects for low-skilled workers by predicting wages based on education and years of experience. Here we do something a bit simpler, using education only.

**Question 4.1:** If the boatlift had a negative effect on the employment of unskilled workers, what would we expect to see in the unemployment rates for each of the education categories in both Miami and the comparison cities?

_Note:_ The possible values of `educ` are `BA`, `HS`, or `lessHS`.

_Type your answer here, replacing this text._

We want to create a pivot table similar to the ones in Parts 2 and 3, except that the values in this table should be the unemployment rate partitioned by _education_, not `ethrace`. We create this table for Miami below.

In [None]:
miami_educ_ue = (mariel
                 .where("age", are.between_or_equal_to(16, 61))
                 .where("smsarank", are.equal_to("Miami")))

miami_educ_ue = miami_educ_ue.pivot("educ", "year", "esr", get_ue)
miami_educ_ue

**Question 4.2:** What happens to the unemployment rates of those with a college education (`BA`) between 1980 and 1982, when the effects of the Mariel boatlift should have been felt? What happens to those with the least education (`lessHS`)? Is this consistent with a large effect of immigration on the least educated?

_Type your answer here, replacing this text._

**Question 4.3:** Create the same pivot table below, except for the comparison cities (that is, all cities _except for Miami_). Store the pivot table as `not_miami_educ_ue`.

In [None]:
not_miami_educ_ue = ...

not_miami_educ_ue = not_miami_educ_ue....
not_miami_educ_ue

In [None]:
grader.check("q4_3")

In the cell below, we plot the unemployment rates for Miami and the comparison cities for each `educ` value.

In [None]:
plot_ue_by_educ(miami_educ_ue, not_miami_educ_ue)

**Question 4.4:** Like Card's study, many empirical works find very small or no impact of immigration on local workers' wages and employment. Several studies have even found a positive impact of skilled immigration on wages and employment. What are two possible reasons that immigration would benefit native-born workers?

_Type your answer here, replacing this text._

## Conclusions

In this lab, we saw that the Mariel boatlift had little effect on unemployment among ethnic groups in Miami, as the cities that were not receiving (many) immigrants experienced the same trends in unemployment. We also saw that the same is true of education-level groups. In fact, many studies find that influxes of immigrants benefit the native-born workers in a region, acting as complements to rather than substitutes for the native workforce.
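
The comparison logic summarized above is often described as a difference-in-differences comparison: the change in Miami is judged against the change in the comparison cities over the same period. A minimal sketch with hypothetical unemployment rates (these are not Card's estimates or the values from this notebook):

```python
# Hypothetical unemployment rates (in percent); these are NOT Card's estimates.
miami_before, miami_after = 5.0, 9.0
comparison_before, comparison_after = 5.5, 9.0

# Change within each set of cities.
miami_change = miami_after - miami_before                 # 4.0
comparison_change = comparison_after - comparison_before  # 3.5

# Difference-in-differences: the part of Miami's change not shared by the comparison cities.
did = miami_change - comparison_change
print(did)  # 0.5
```

In this made-up case, most of Miami's 4-point rise also appears in the comparison cities, leaving only half a point that could be attributed to something specific to Miami.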

---

### References

This notebook is based on another assignment by [Prof. Josh Goldstein](https://courses.demog.berkeley.edu/goldstein175).

## Submission

Congrats on finishing another notebook! To turn in this assignment, **save this file** by going to File > Download As and selecting **Notebook**; then, run the cell below to generate a PDF of this assignment and download it. Submit this assignment by uploading **BOTH the .ipynb and .pdf files** to Gradescope.

In [None]:
grader.export("proj02.ipynb")