# **Vanguard's \$1,000 Question: Analyzing U.S. Households' Liquid Assets for Mutual Fund Investment**
## Dan Valenzuela

***

## **Overview** <a id="Overview"></a>

[**1. Problem**](#Problem)

[**2. Data Understanding**](#Data-Understanding)

[**3. Data Preparation**](#Data-Preparation)

[**4. Data Analysis**](#Data-Analysis)

[**5. Evaluation**](#Evaluation)

[**6. Conclusion & Next Steps**](#Conclusion)

[**7. Endnotes**](#Endnotes)

***

## **Problem** <a id="Problem"></a>
[*↑ Back to overview*](#Overview)

Four in five households that invest with Vanguard use mutual funds to meet their financial goals.<a id='fn-1-src'></a>[<sup>1</sup>](#fn-1) A mutual fund is a company like any other, except that its sole purpose is to pool money from investors and invest it in stocks, bonds, or other financial products. A person who invests in a Vanguard mutual fund is buying shares in a company that Vanguard has set up to invest money in a certain way. In return, that person gets a cut of the returns (or losses) on the investment.

Why do so many households use mutual funds? Investing through a fund outsources the effort of researching companies, executing trades, and paying trading fees. For that service, mutual funds charge a fee based on the amount a household invests.<a id='fn-2-src'></a>[<sup>2</sup>](#fn-2)

However, investing is not as easy as telling a mutual fund company like Vanguard to shut up and take your money. Vanguard requires investors to put in at least \$1,000 for some of its mutual funds, and the minimums for its other funds only go higher.<a id='fn-3-src'></a>[<sup>3</sup>](#fn-3)



About half of the households invested with Vanguard have at least one taxable account. 

[*↑ Back to overview*](#Overview)
***

## **Data Understanding** <a id="Data-Understanding"></a>
[*↑ Back to overview*](#Overview)

```python
# Reload imported modules automatically before each cell runs
%load_ext autoreload
%autoreload 2

import dataloading as dl

targetdir = 'data/extracted/'
```

### Datasets

The data come from the Survey of Consumer Finances (SCF), which the [Federal Reserve](https://www.federalreserve.gov/publications/files/scf20.pdf) itself summarizes as follows:

> The SCF is a triennial interview survey of U.S. families sponsored by the
Board of Governors of the Federal Reserve System with the cooperation of the U.S.
Department of the Treasury. Since 1992, data for the SCF have been collected by NORC, a
research organization at the University of Chicago. Although the majority of the data are
collected between May and December of each survey year, a small fraction of the data
collection occurs in the first four months of the next calendar year. In the 2019 SCF, this
portion of the data collection overlapped with early months of the COVID-19 pandemic,
with about 9 percent of interviews conducted between February and April 2020.

 

### Sampling
From [Federal Reserve Bulletin 2020 at 40](https://www.federalreserve.gov/publications/files/scf20.pdf)

> First, a standard multistage area-probability sample (a geographically based random sample) is selected
to provide good coverage of characteristics, such as homeownership, that are broadly
distributed in the population.
Second, a supplemental sample is selected to disproportionately include wealthy families,
which hold a relatively large share of such thinly held assets as noncorporate businesses
and tax-exempt bonds. Called the “list sample,” this group is drawn from a list of statistical
records derived from tax returns. These records are used under strict rules governing
confidentiality, the rights of potential respondents to refuse participation in the survey, and
the types of information that can be made available. Persons listed by Forbes as being
among the wealthiest 400 people in the United States are excluded from sampling.

>Of the 5,783 interviews completed for the 2019 SCF, 4,291 were from the area-probability
sample, and 1,492 were from the list sample; for 2016, 4,754 were from the area-probability
sample, and 1,500 were from the list sample.

### Weighting

From [Federal Reserve Bulletin 2020 at 42](https://www.federalreserve.gov/publications/files/scf20.pdf)

>To provide a measure of the frequency with which families similar to the sample families
could be expected to be found in the population of all families, an analysis weight is
computed for each case, accounting for both the systematic properties of the sample design
and differential patterns of nonresponse. The SCF response rates are low by the standards
of some other major government surveys, and analysis of the data confirms that the
tendency to refuse participation is highly correlated with net worth. However, unlike other
surveys, which almost certainly also have differential nonresponse by wealthy households,
the SCF has the means to adjust for such nonresponse.

From [Codebook 2019](https://www.federalreserve.gov/econres/files/codebk2019.txt)
> The issue of weighting in regressions has long been controversial.
Users of the SCF may find two references particularly useful:
(1) Analysis of Complex Surveys, C.J. Skinner, D. Holt, and
T.M.F. Smith (editors), John Wiley and Sons, 1989 (see particularly
pages 8-10, 154-157, and 286-287). (2) The Analysis of Household
Surveys: A Microeconometric Approach to Development Policy, Angus
Deaton, Johns Hopkins University Press, 1997 (see particularly pages
67-73).  
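
Because the list sample deliberately over-represents wealthy families, unweighted statistics would overstate how many households clear a threshold such as the \$1,000 minimum. The following is a minimal sketch of weighted descriptive statistics in plain Python, not the method used later in this notebook. The names `liquid` and `wgt` and all of the numbers are made up for illustration; they are not actual SCF variable names or values.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each observation counts `weights[i]` times."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_share(flags, weights):
    """Share of total weight carried by observations whose flag is True."""
    total_weight = sum(weights)
    return sum(w for flag, w in zip(flags, weights) if flag) / total_weight


# Hypothetical toy data: liquid assets in dollars and analysis weights.
# The third household stands in for an oversampled wealthy family,
# so it carries a small weight.
liquid = [200.0, 1500.0, 50000.0, 800.0]
wgt = [3000.0, 2500.0, 100.0, 4400.0]

meets_minimum = [amount >= 1000 for amount in liquid]

unweighted_share = sum(meets_minimum) / len(meets_minimum)
share = weighted_share(meets_minimum, wgt)
avg = weighted_mean(liquid, wgt)

print(f"unweighted share with at least $1,000: {unweighted_share:.2f}")  # 0.50
print(f"weighted share with at least $1,000:   {share:.2f}")             # 0.26
print(f"weighted mean liquid assets:           {avg:.2f}")               # 1287.00
```

In this toy example half of the sampled households meet the minimum, but they represent only about a quarter of the weighted population, which is the kind of gap the analysis weights are meant to correct.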

### Imputation

From [Codebook 2019](https://www.federalreserve.gov/econres/files/codebk2019.txt)

> The imputations are stored as five successive replicates
("implicates") of each data record.  Thus, the number of observations
in the full data set (28,915) is five times the actual number of
respondents (5,783)
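
One practical consequence is that row counts are not respondent counts. The sketch below, under the assumption that each record carries some case identifier shared by its five implicates, checks that structure and recovers the number of respondents. The identifiers used here are hypothetical, not actual SCF IDs.

```python
from collections import Counter


def count_respondents(case_ids, n_implicates=5):
    """Return the number of distinct cases, checking each has every implicate."""
    counts = Counter(case_ids)
    incomplete = {case: n for case, n in counts.items() if n != n_implicates}
    if incomplete:
        raise ValueError(f"cases without {n_implicates} implicates: {incomplete}")
    return len(counts)


# Hypothetical case identifiers: three respondents, five implicates each.
case_ids = [case for case in (101, 102, 103) for _ in range(5)]

n_respondents = count_respondents(case_ids)
print(len(case_ids), "records for", n_respondents, "respondents")

# The same arithmetic applied to the figures quoted from the codebook.
assert 5 * 5_783 == 28_915
```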

>Users who want to estimate more complex statistics, particularly
regressions, should be cautious in their treatment of the implicates.
Some regression packages will treat each of the five implicates as an
independent observation and correspondingly inflate the reported
statistical significance of results.  Users who want to calculate
regression estimates, but who have no immediate use for proper significance
tests, could either average the dependent and independent values
across the implicates or multiply their standard errors by the square
root of five.  For an easily understandable discussion of multiple
imputation in the SCF from a user's point of view, see Catherine
Montalto and Jaimie Sung, "Multiple Imputation in the 1992 Survey of
Consumer Finances," Financial Counseling and Planning, Volume 7, 1996,
pages 133-146 (http://afcpe.org/assets/pdf/vol7_133-146multipleimputation.pdf).
That article also contains a set of simple SAS macros to
use to compute correct standard errors from multiply imputed data.
Two alternatives for processing general model estimates are offered
here, one written in SAS (MACRO MISECOMP) and the other in a Stata
ado file (micombine).  (NOTE: both SAS and Stata now include regression
packages for the analysis of multiply imputed data.)  See the section
"ANALYSIS WEIGHTS" below for a brief discussion of the inclusion of
sample design effects in the estimation of complex statistics
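
The two shortcuts the codebook mentions for users who do not need proper significance tests can be sketched as follows. This is an illustration only: the record layout, the key names `case_id` and `liquid`, and the numbers are assumptions, and neither shortcut replaces a full multiple-imputation analysis.

```python
import math
from collections import defaultdict
from statistics import mean

N_IMPLICATES = 5


def average_over_implicates(records, id_key, value_key):
    """Collapse implicates to one value per case by averaging them."""
    values_by_case = defaultdict(list)
    for record in records:
        values_by_case[record[id_key]].append(record[value_key])
    return {case: mean(values) for case, values in values_by_case.items()}


def inflate_standard_error(naive_se, n_implicates=N_IMPLICATES):
    """Scale a standard error computed as if implicates were independent rows."""
    return naive_se * math.sqrt(n_implicates)


# Hypothetical records: case 1 has an imputed value that varies by implicate,
# case 2 has a reported value that is identical in all five implicates.
records = [
    {"case_id": 1, "liquid": value}
    for value in (900.0, 1000.0, 1100.0, 1000.0, 1000.0)
] + [{"case_id": 2, "liquid": 250.0} for _ in range(N_IMPLICATES)]

collapsed = average_over_implicates(records, "case_id", "liquid")
print(collapsed)  # {1: 1000.0, 2: 250.0}

# A standard error of 0.02 from a model fit on all stacked implicates
# would be reported as roughly 0.0447 after the square-root-of-five adjustment.
print(round(inflate_standard_error(0.02), 4))
```

Averaging shrinks the data back to one row per respondent, while the square-root-of-five adjustment keeps the stacked data and corrects the reported precision instead; which is more convenient depends on the model being fit.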

### Key Data for Merging Datasets and Analysis <a id="Key-Data" ></a>



[*↑ Back to overview*](#Overview)
***

## **Data Preparation** <a id="Data-Preparation"></a>
[*↑ Back to overview*](#Overview)

### Merging Data





### Cleaning Data <a id="Cleaning-Data"></a>



[*↑ Back to overview*](#Overview)
***

## **Data Analysis** <a id="Data-Analysis"></a>
[*↑ Back to overview*](#Overview)

[*↑ Back to overview*](#Overview)
***

## **Evaluation**<a id="Evaluation"></a>
[*↑ Back to overview*](#Overview)

[*↑ Back to overview*](#Overview)
***

## **Conclusion & Next Steps**<a id="Conclusion"></a>
[*↑ Back to overview*](#Overview)

[*↑ Back to overview*](#Overview)
***

## **Endnotes** <a id="Endnotes"></a>
[*↑ Back to overview*](#Overview)


<a id='fn-1'></a> [1.](#fn-1-src) *How America Invests 2020*. 2020. Vanguard. [https://personal.vanguard.com/pdf/how-america-invests-2020.pdf](https://personal.vanguard.com/pdf/how-america-invests-2020.pdf), 28.


<a id='fn-2'></a> [2.](#fn-2-src) U.S. Securities and Exchange Commission. "Mutual Funds". Investor.gov. Date accessed: Dec. 10, 2020. [https://www.investor.gov/introduction-investing/investing-basics/investment-products/mutual-funds-and-exchange-traded-1](https://www.investor.gov/introduction-investing/investing-basics/investment-products/mutual-funds-and-exchange-traded-1).

<a id='fn-3'></a> [3.](#fn-3-src) Vanguard. "Vanguard mutual fund fees & minimums". Mutual Funds. Date accessed: Dec. 10, 2020. [https://investor.vanguard.com/mutual-funds/fees](https://investor.vanguard.com/mutual-funds/fees).

[*↑ Back to overview*](#Overview)