# Assignment 2

## Exercise 1: What factors increase the demand for weather insurance?

This exercise is adapted from the following paper (you do not need to read the paper to complete the assignment):

Cai, Jing, Alain De Janvry, and Elisabeth Sadoulet. "Subsidy policies and insurance demand." American Economic Review 110.8 (2020): 2422-2453.

Households face different types of weather risks that can generate large fluctuations in income and consumption. To shield individuals from these risks, many governments have invested considerable effort in developing and marketing formal insurance products. However, in both developing and developed countries, individuals usually place a surprisingly low value on insurance, and initiatives to provide information, offer subsidies, or increase trust have had limited success (Cole et al. 2013, Banerjee et al. 2019). Many countries have given up on trying to sell insurance and have instead made it mandatory.

In this assignment we will study the demand for a weather insurance product for rice producers in China. Rice is the most important food crop in China, with nearly 50 percent of the country’s farmers engaged in its production. To maintain food security and shield farmers from negative weather shocks, in 2009 the Chinese government asked the People’s Insurance Company of China (PICC) to design and offer the first rice production insurance policy to rural households in 31 pilot counties. The program was expanded to 62 counties in 2010 and to 99 in 2011. The experiment we study in this assignment was conducted in 2010 and 2011 in randomly selected villages included in the 2010 expansion in Jiangxi province, one of China’s major rice-producing areas.

The product in our study is an area-yield index weather insurance that covers natural disasters, including heavy rains, floods, windstorms, extremely high or low temperatures, and droughts. If any of these disasters occurs and leads to a 30 percent or more average loss in yield in a given area, farmers in that area are eligible to receive payouts from the insurance company. These areas are typically defined as fields that include the plots of 5 to 10 farmers. 

## Data Description
The data for this exercise come from households in 134 villages in Jiangxi province, which are considered representative of rice producers in Jiangxi. Households were surveyed, and each observation in the dataset (`data_CSD.dta`) corresponds to one household in that sample.

The variables in `data_CSD.dta` that are required for this exercise are:

- `takeup2011`: dummy equal to 1 if the household decided to take up the insurance product in 2011
- `area`: area of rice production in mou (a Chinese unit of land area that varies with location but is commonly about 0.165 acre, or roughly 667 square metres)
- `age`: age of the household head
- `agpop`: household size (number of people living in the same household)
- `male`: gender of the household head (1 if male, 0 if female)
- `literacy`: dummy equal to 1 if the household head is literate, 0 otherwise

## Question 1: Descriptive Statistics

Load the dataset `data_CSD.dta`. Because this is a Stata `.dta` file, you will need the `haven` package (for example, its `read_dta()` function). Use the `head()` function to inspect the first few rows of the dataset.

### a) Descriptive statistics

#### i) Demographic characteristics

How many households are in your dataset? How many household heads are male, and how many are female? What is the mean age of the household heads in the sample? What is the mean household size in your sample?

Note that the household size variable has some missing values. What argument do you need to add to `mean()` to handle them?

Hint: see the documentation for `mean()` at https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/mean.
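The base R calls these questions involve can be illustrated on a small data frame. This is a minimal sketch on a hypothetical five-household data frame `toy`, made up for illustration and not taken from the assignment data. Its column names mirror the variable list above.

```r
# Hypothetical toy data (NOT the assignment dataset): five households,
# one of which has a missing household size.
toy <- data.frame(
  male  = c(1, 1, 0, 1, 0),
  age   = c(45, 52, 38, 61, 49),
  agpop = c(4, NA, 3, 5, 6)
)

n_households  <- nrow(toy)         # one row per household
gender_counts <- table(toy$male)   # counts of female (0) and male (1) heads
mean_age      <- mean(toy$age)

# With a missing value, mean() returns NA ...
print(mean(toy$agpop))
# ... unless NAs are dropped first via na.rm = TRUE
mean_hh_size <- mean(toy$agpop, na.rm = TRUE)

print(n_households)
print(gender_counts)
print(c(mean_age = mean_age, mean_hh_size = mean_hh_size))
```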

```r
# Type your code here
```

Type your reply here

#### ii) Measures of dispersion
Compute the standard deviation and the standard error of the mean for the age of the household head and for household size.

You can use canned functions for the standard deviations.
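The standard error of the mean is the standard deviation divided by the square root of the number of non-missing observations, $SE = s/\sqrt{n}$. The sketch below applies this formula to a hypothetical vector `x`. Note that `sd()` uses the $n-1$ denominator.

```r
# Hypothetical vector with one missing value (NOT the assignment data)
x <- c(2, 4, NA, 4, 5, 5, 7, 9)

sd_x <- sd(x, na.rm = TRUE)   # built-in ("canned") standard deviation, n - 1 denominator
n_x  <- sum(!is.na(x))        # count only the non-missing observations
se_x <- sd_x / sqrt(n_x)      # standard error of the mean

print(c(sd = sd_x, n = n_x, se = se_x))
```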

```r
# Type your code here
```

### b) Histogram of area of rice production in acres

Construct a variable `area_ac` equal to area of rice production per person in acres. 

Plot a histogram of this constructed variable with 100 bins (hint: use the `hist()` function).
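One subtlety is that `hist()` treats a single number passed to `breaks` only as a suggestion and rounds it to "pretty" break points. This means `breaks = 100` need not give exactly 100 bins. A minimal sketch on simulated areas (hypothetical, not the assignment data) supplies 101 explicit, equally spaced edges instead. It assumes the 0.165 acre per mou conversion from the Data Description. The per-person step (dividing by household size) is not shown.

```r
# Hypothetical simulated rice areas in mou (NOT the assignment data)
set.seed(42)
area_mou <- rexp(500, rate = 1 / 8)

# Convert mou to acres, assuming 1 mou = 0.165 acre
area_acres <- area_mou * 0.165

# 101 equally spaced edges spanning the data give exactly 100 bins
edges <- seq(min(area_acres), max(area_acres), length.out = 101)
h <- hist(area_acres, breaks = edges,
          main = "Area of rice production (simulated)",
          xlab = "Acres")

print(length(h$counts))   # number of bins
```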

```r
# Type your code here
```

### c) Comparison of area of rice production by gender of household head

#### i) Means

Calculate the mean area of rice production (in acres) separately for female- and male-headed households, together with the standard error of each mean. Compare the two means: do they seem substantially different?
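Group-wise means and standard errors can be computed in one pass with `tapply()`, which applies a function within each value of a grouping variable. The sketch uses a hypothetical six-household data frame. The helper `se_mean` is introduced here for illustration.

```r
# Hypothetical toy data (NOT the assignment data)
toy <- data.frame(
  male    = c(1, 1, 1, 0, 0, 0),
  area_ac = c(1, 2, 3, 2, 4, 6)
)

# Hypothetical helper: standard error of the mean, ignoring missing values
se_mean <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Apply mean() and se_mean() to area_ac separately for male == 0 and male == 1
group_means <- tapply(toy$area_ac, toy$male, mean, na.rm = TRUE)
group_ses   <- tapply(toy$area_ac, toy$male, se_mean)

print(group_means)   # "0" = female-headed, "1" = male-headed
print(group_ses)
```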

```r
# Type your code here
```

Type your reply here

#### ii) Test Statistics

Construct a test statistic for the difference between male- and female-headed households in the area of rice production (in acres). Use a two-tailed test. Is the difference statistically significant at the 5% level (i.e., at the 95% confidence level)?
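A common choice of test statistic, assuming unequal variances across the two groups, is the difference in means divided by $\sqrt{s_m^2/n_m + s_f^2/n_f}$. The sketch below applies it to hypothetical toy data and compares $|t|$ with the normal critical value 1.96. It then cross-checks the statistic against R's built-in Welch test. `t.test()` bases its p-value on a t distribution with Welch degrees of freedom, which is close to the normal approximation only in large samples.

```r
# Hypothetical toy data (NOT the assignment data)
toy <- data.frame(
  male    = c(1, 1, 1, 0, 0, 0),
  area_ac = c(1, 2, 3, 2, 4, 6)
)

m <- toy$area_ac[toy$male == 1]   # male-headed households
f <- toy$area_ac[toy$male == 0]   # female-headed households

# Difference in means over its standard error (unequal variances)
se_diff <- sqrt(var(m) / length(m) + var(f) / length(f))
t_stat  <- (mean(m) - mean(f)) / se_diff

# Two-tailed test at the 5% level: reject equal means if |t| > 1.96
crit   <- qnorm(0.975)
reject <- abs(t_stat) > crit

# Cross-check with the built-in Welch two-sample test (var.equal = FALSE by default)
tt <- t.test(m, f)
print(c(manual = t_stat, welch = unname(tt$statistic), critical = crit))
```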


```r
# Type your code here
```

Type your reply here

## Question 2: Effect of literacy on the demand for weather insurance products

In its first year, the experiment allocated different subsidies to households in 134 randomly selected villages of the province. A 70% subsidy was offered to all households in these villages. Then, 2 days after this initial sale, households in 62 randomly selected villages were offered the insurance product for free.

In this part of the exercise, we will focus on year 1 of the experiment and estimate the effect of literacy on the takeup of the insurance product. Consider the two following models (with area measured in acres):

Model (1): $\text{TakeUp} = \beta_0 + \beta_1\,\text{Literacy} + \beta_2\,\text{Area} + u$

Model (2): $\text{TakeUp} = \beta_0 + \beta_1\,\text{Literacy} + \beta_2\,\text{Area} + \beta_3\,\text{HHSize} + u$

### a) Estimation

Estimate equations (1) and (2) with `lm()`.
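Because take-up is a 0/1 variable, estimating these equations by OLS with `lm()` gives linear probability models. The sketch below uses simulated data (hypothetical, not the assignment dataset). The data frame `sim` and the take-up probabilities used to generate it are made up for illustration. By default `lm()` drops any row with a missing value in a model variable, so if `agpop` has missing values, model (2) may use fewer observations than model (1).

```r
# Hypothetical simulated data (NOT the assignment dataset); names mirror the assignment
set.seed(1)
n <- 400
sim <- data.frame(
  literacy = rbinom(n, 1, 0.7),
  area_ac  = runif(n, 0.5, 8),
  agpop    = sample(2:7, n, replace = TRUE)
)
# Simulated 0/1 take-up; the probabilities here are invented
sim$takeup <- rbinom(n, 1, 0.3 + 0.1 * sim$literacy + 0.05 * sim$area_ac)

# Linear probability models: OLS with a 0/1 dependent variable
model1 <- lm(takeup ~ literacy + area_ac, data = sim)
model2 <- lm(takeup ~ literacy + area_ac + agpop, data = sim)

print(coef(model1))
summary(model2)   # estimates, standard errors, t values and p-values
```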

```r
# Type your code here
```

### b) Interpretation

Interpret each estimated parameter of equation (2), remembering to comment on its statistical significance.

Type your reply here

### c) Omitted Variable Bias

How did your estimate of $\hat{\beta}_1$ change between equation (1) and equation (2)? Without performing any calculations, what information does this give you about the correlation between the literacy of household heads and household size? (Explain your reasoning in no more than 4 sentences.)
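The reasoning rests on an exact OLS identity. Write $\tilde{\beta}_1$ for the literacy coefficient in equation (1) and $\hat{\delta}_1$ for the literacy coefficient from an auxiliary regression of household size on literacy and area; both symbols are introduced here for illustration. Then $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_3 \hat{\delta}_1$. The sign of the change therefore depends on the signs of $\hat{\beta}_3$ and $\hat{\delta}_1$. The simulation below is hypothetical: it uses a continuous outcome for simplicity, although the algebra is identical for a 0/1 outcome. The identity holds exactly only when all regressions use the same observations, which matters here because `agpop` has missing values.

```r
# Hypothetical simulated data in which household size is correlated with literacy
set.seed(2)
n <- 500
literacy <- rbinom(n, 1, 0.6)
area_ac  <- runif(n, 0.5, 3)
agpop    <- 3 + 1.5 * literacy + rnorm(n)
takeup   <- 0.2 + 0.1 * literacy + 0.05 * area_ac + 0.04 * agpop + rnorm(n, sd = 0.3)

short <- lm(takeup ~ literacy + area_ac)          # omits household size
long  <- lm(takeup ~ literacy + area_ac + agpop)  # includes household size
aux   <- lm(agpop ~ literacy + area_ac)           # omitted variable on included ones

# Short coefficient = long coefficient + (agpop coefficient) * (aux literacy coefficient)
lhs <- coef(short)[["literacy"]]
rhs <- coef(long)[["literacy"]] + coef(long)[["agpop"]] * coef(aux)[["literacy"]]
print(c(short = lhs, long_plus_bias = rhs))
```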

Type your reply here

### d) Prediction

Using your estimates of equation (2), predict the probability that a household takes up the insurance product if its head is literate, it produces rice on 5 acres, and it has 3 members.
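A prediction from a fitted linear model can be obtained with `predict()` and a one-row `newdata` data frame whose column names match the regressors. It can also be computed by hand from the coefficients. The sketch below refits a model on hypothetical simulated data and checks that both routes agree. In a linear probability model, such predictions are not constrained to lie in $[0, 1]$.

```r
# Hypothetical simulated data (NOT the assignment dataset)
set.seed(1)
n <- 400
sim <- data.frame(
  literacy = rbinom(n, 1, 0.7),
  area_ac  = runif(n, 0.5, 8),
  agpop    = sample(2:7, n, replace = TRUE)
)
sim$takeup <- rbinom(n, 1, 0.3 + 0.1 * sim$literacy + 0.05 * sim$area_ac)
model2 <- lm(takeup ~ literacy + area_ac + agpop, data = sim)

# Household profile: literate head, 5 acres, 3 members
new_hh <- data.frame(literacy = 1, area_ac = 5, agpop = 3)
p_hat  <- predict(model2, newdata = new_hh)

# Same prediction by hand: b0 + b1 * 1 + b2 * 5 + b3 * 3
b <- coef(model2)
p_manual <- b[["(Intercept)"]] + b[["literacy"]] * 1 + b[["area_ac"]] * 5 + b[["agpop"]] * 3
print(c(predict = unname(p_hat), manual = p_manual))
```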

```r
# Type your code here
```

Type your reply here

## Question 3: Price and Payouts

We will now investigate how the price of the insurance product affects its take-up in the second year of the experiment. In that year, subsidies ranging from 40% to 90% of the market price of the insurance product were randomly assigned to households in the study villages.

The two new variables you will need for this question are:

- `price_final`: the price in RMB/mou offered to the household for the insurance product in year 2.
- `payout_2010`: a dummy variable equal to 1 if the household received an insurance payout in year 1.

### (a) Define the estimating equation

Write an equation you could estimate that accounts for price and payouts in addition to the variables whose effects we estimated in Question 2. Importantly, you want the price term to capture the impact of a 1% increase in price rather than a 1-yuan (RMB) increase.
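One common way to obtain a percentage interpretation is to let price enter the equation through its natural logarithm. Call its coefficient $\gamma$ (a label introduced here for illustration). Raising the price from $P$ to $P(1+x)$ then changes predicted take-up by:

$$
\Delta \widehat{\text{TakeUp}} = \gamma\left[\ln\big(P(1+x)\big) - \ln P\right] = \gamma \ln(1+x) \approx \gamma\, x \quad \text{for small } x
$$

A 1% increase ($x = 0.01$) therefore changes the predicted take-up probability by roughly $\gamma/100$. The approximation deteriorates for large changes: for $x = -0.5$, $\ln(0.5) \approx -0.693$ rather than $-0.5$.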

You want to test two hypotheses:
1) The price of the insurance product does not affect the demand for the insurance product.
2) A 50% reduction in the price of the product has the same effect on the likelihood of purchasing the product as receiving a payout the year before.


Type your reply here

### (b) Summary statistics

Look at the summary statistics of your price and payout variables (the `table()` and `summary()` functions could come in handy). What percentage of households received a payout?
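For a 0/1 dummy, the share of ones is simply its mean, so the percentage of households with a payout is `100 * mean()` (add `na.rm = TRUE` if there are missing values). The vectors below are hypothetical and are not the actual prices or payouts in the data.

```r
# Hypothetical payout dummy and prices for 8 households (NOT the assignment data)
payout <- c(1, 0, 0, 1, 0, 0, 0, 1)
price  <- c(3.6, 7.2, 5.4, 1.8, 3.6, 7.2, 5.4, 1.8)  # hypothetical RMB/mou

print(table(payout))                # counts of 0s and 1s
share_payout <- 100 * mean(payout)  # percentage of households with a payout
print(share_payout)
print(summary(price))               # min, quartiles, mean, max
```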


```r
# Type your code here
```

Type your reply here

### (c) Hypothesis 1

Estimate the equation in part (a). What can you conclude about your first hypothesis? 

Note that you may need to transform your price variable before estimating the model.
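A transformation can be written directly inside an `lm()` formula, for example `log(price_final)`. The standard `summary()` coefficient table then reports the t-test of the null hypothesis that the coefficient is zero. The sketch below is one possible specification, fitted to hypothetical simulated data; the price levels and take-up probabilities are invented. It also reports $\gamma/100$, the approximate effect of a 1% price increase.

```r
# Hypothetical simulated data (NOT the assignment dataset); names mirror the assignment
set.seed(3)
n <- 600
sim <- data.frame(
  literacy    = rbinom(n, 1, 0.7),
  area_ac     = runif(n, 0.5, 8),
  agpop       = sample(2:7, n, replace = TRUE),
  payout_2010 = rbinom(n, 1, 0.4),
  price_final = sample(c(1.8, 3.6, 5.4, 7.2), n, replace = TRUE)  # invented price levels
)
sim$takeup <- rbinom(n, 1, 0.9 - 0.2 * log(sim$price_final) + 0.1 * sim$payout_2010)

# Price enters in natural logs, so its coefficient has a percentage interpretation
model3 <- lm(takeup ~ log(price_final) + payout_2010 + literacy + area_ac + agpop,
             data = sim)

coefs <- summary(model3)$coefficients
print(coefs["log(price_final)", ])   # estimate, SE, t value, p-value for H0: gamma = 0

# gamma / 100 approximates the change in take-up probability from a 1% price rise
print(coefs["log(price_final)", "Estimate"] / 100)
```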

```r
# Type your code here
```

Type your reply here

### (d) Hypothesis 2


#### (i) 50% decrease in the price

What is the change in the likelihood of purchasing the product associated with a 50% decrease in its price?
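Assume price enters as $\gamma \ln(\text{Price})$. A 50% decrease multiplies the price by 0.5, so the implied change is $\gamma \ln(0.5)$. The linear rule $\gamma \times (-0.5)$ is only a first-order approximation. The sketch uses a hypothetical value of $\gamma$ and a helper function, `effect_of_price_change`, introduced here for illustration.

```r
# Hypothetical helper: change in predicted take-up when price is multiplied by (1 + x),
# assuming price enters the model as gamma * log(price)
effect_of_price_change <- function(gamma, x) gamma * log(1 + x)

gamma  <- -0.2                                  # hypothetical coefficient on log(price)
exact  <- effect_of_price_change(gamma, -0.5)   # 50% decrease: gamma * log(0.5)
approx <- gamma * (-0.5)                        # first-order approximation gamma * x
print(c(exact = exact, approx = approx))
```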

```r
# Type your code here
```

Type your reply here

#### (ii) Receiving a payout in year 1

What is the increase in the likelihood of purchasing the product associated with receiving a payout in year 1?

```r
# Type your code here
```

Type your reply here

#### (iii) Compare effects

Compare the effects found in parts (d.i) and (d.ii) using a statistical test.
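Let $\theta$ denote the payout coefficient and $\gamma$ the log-price coefficient; both are labels introduced for illustration. Hypothesis 2 is then the single linear restriction $\gamma \ln(0.5) - \theta = 0$. It can be tested with a Wald t-statistic built from `coef()` and `vcov()`. The sketch below does this on hypothetical simulated data. It cross-checks the result by imposing the restriction directly: the restricted model replaces the two regressors with one combined regressor, and the `anova()` F statistic equals the squared t-statistic. `vcov()` here is the classical OLS covariance. Because linear probability models are heteroskedastic, robust standard errors (for example from `sandwich::vcovHC()`) are often preferred in practice.

```r
# Hypothetical simulated data (NOT the assignment dataset)
set.seed(3)
n <- 600
sim <- data.frame(
  literacy    = rbinom(n, 1, 0.7),
  area_ac     = runif(n, 0.5, 8),
  agpop       = sample(2:7, n, replace = TRUE),
  payout_2010 = rbinom(n, 1, 0.4),
  price_final = sample(c(1.8, 3.6, 5.4, 7.2), n, replace = TRUE)
)
sim$takeup <- rbinom(n, 1, 0.9 - 0.2 * log(sim$price_final) + 0.1 * sim$payout_2010)

unrestricted <- lm(takeup ~ log(price_final) + payout_2010 + literacy + area_ac + agpop,
                   data = sim)

# H0: gamma * log(0.5) - theta = 0, written as R'b = 0
b <- coef(unrestricted)
R <- setNames(numeric(length(b)), names(b))
R["log(price_final)"] <- log(0.5)
R["payout_2010"]      <- -1

est    <- sum(R * b)                                    # estimated difference in effects
se_est <- sqrt(drop(t(R) %*% vcov(unrestricted) %*% R)) # its classical standard error
t_stat <- est / se_est
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(unrestricted))

# Cross-check: impose H0 by combining the two regressors into one
sim$w <- log(sim$price_final) + log(0.5) * sim$payout_2010
restricted <- lm(takeup ~ w + literacy + area_ac + agpop, data = sim)
f_test <- anova(restricted, unrestricted)   # F for one restriction equals t_stat^2

print(c(difference = est, t = t_stat, p = p_val, F = f_test$F[2]))
```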

```r
# Type your code here
```

Type your reply here