# Making Sense of 4 National Rates: Inflation, Unemployment, Interest, and Mortgage

In [4]:
import pandas as pd
import plotly.offline

# load dataset
rates = pd.read_csv('rates.csv')
rates = rates.set_index('DATE')

# plot dataset: one scatter trace per rate column
plotly.offline.init_notebook_mode(connected=True)
dates = rates.index.tolist()
traces = [
    {'type': 'scatter', 'x': dates, 'y': rates[column].tolist(), 'name': column.lower()}
    for column in ['INFLATION', 'UNEMPLOYMENT', 'INTEREST', 'MORTGAGE']
]

plotly.offline.iplot(traces)

## PURPOSE

My goal for this bootcamp is to apply data science to a finance-related topic and to learn more about economics along the way. The challenge is that I do not have domain knowledge in these areas, but that is part of the journey! I looked at the 30-year US mortgage rate for an earlier summary statistics assignment and learned that it has historically been influenced by inflation, so I wanted to look at other economic indicators and explore the relationships among them.

## THE DATASET

The time series above comes from two datasets collected and analyzed for this capstone project. They have been cleaned and merged to focus on four US economic indicators: the inflation rate, unemployment rate, interest rate, and mortgage rate. The rates are observed monthly from 1971 to 2016, and the date range has been trimmed to remove null entries. The data is assumed to be free of bias given the objective nature of the subject. The first source (interest, unemployment, and inflation rates) is from Kaggle, [link here](https://www.kaggle.com/federalreserve/interest-rates/version/1). The second source (30-year mortgage rate) comes directly from FRED Economic Data, [link here](https://fred.stlouisfed.org/series/MORTGAGE30US).

### Interest Rate

This rate is labeled the Effective Federal Funds Rate in the Kaggle dataset. It differs from the other rates because the Federal Reserve steers it directly: the Fed adjusts its target in response to current economic conditions, with the aim of making those conditions more favorable.

The Federal Reserve sets interest rates to promote conditions that achieve the mandate set by Congress:

- high employment
- low and stable inflation
- sustainable economic growth
- moderate long-term interest rates

Interest rates set by the Fed directly influence the cost of borrowing money. Lower interest rates encourage more people to obtain a mortgage for a new home or to borrow money for an automobile or for home improvement. Lower rates encourage businesses to borrow funds to invest in expansion such as purchasing new equipment, updating plants, or hiring more workers. Higher interest rates restrain such borrowing by consumers and businesses.

The federal funds rate is the interest rate at which depository institutions trade federal funds (balances held at Federal Reserve Banks) with each other overnight. The rate that the borrowing institution pays to the lending institution is determined between the two banks; the weighted average rate for all of these types of negotiations is called the effective federal funds rate. 
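
As a small illustration of the weighted average described above, the sketch below uses made-up overnight trades. The rates and volumes are hypothetical and are not taken from the dataset, and weighting each negotiated rate by its trade volume is an assumption about how the average is formed.

```python
import numpy as np

# Hypothetical overnight trades between banks (illustrative values only).
trade_rates = np.array([5.20, 5.25, 5.30])        # negotiated rate per trade, in percent
trade_volumes = np.array([100.0, 300.0, 100.0])   # amount lent per trade, e.g. $ millions

# Volume-weighted average of the negotiated rates (assumed weighting scheme).
effective_rate = np.average(trade_rates, weights=trade_volumes)
print(round(effective_rate, 2))
```

Larger trades pull the average toward their rate, so a single small trade at an unusual rate barely moves the result.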

The interest rate data was published by the Federal Reserve Bank of St. Louis' economic data portal.
The effective federal funds rate is determined by the market but is influenced by the Federal Reserve through open market operations to reach the federal funds rate target. The Federal Open Market Committee (FOMC) meets eight times a year to determine the federal funds target rate; the target rate transitioned to a target range with an upper and lower limit in December 2008. 

The Kaggle source also includes gross domestic product data provided by the US Bureau of Economic Analysis, although it is not one of the four rates analyzed here.
Real gross domestic product is calculated as the seasonally adjusted quarterly rate of change in gross domestic product based on chained 2009 dollars.

### Unemployment Rate

The unemployment and consumer price index data was provided by the US Bureau of Labor Statistics.
The unemployment rate represents the number of unemployed as a seasonally adjusted percentage of the labor force.

### Inflation Rate

The inflation rate reflects the monthly change in the Consumer Price Index of products excluding food and energy.

### Mortgage Rate

Mortgage rates affect the real estate market, which in turn influences the US economy. Lower mortgage rates can make the net cost of homeownership affordable to more people. However, greater affordability also brings in more competing buyers, which can push sale prices up. Current homeowners benefit from lower mortgage rates through increased home equity and additional cash flow after refinancing their homes.

## METRICS

### Inflation Rate
{mean: 4.02, median: 3.0, min: 0.6, max: 13.6}
![inflation](Images/variable_analysis_INFLATION.png)

### Unemployment Rate
{mean: 6.38, median: 6.0, min: 3.8, max: 10.8}
![unemployment](Images/variable_analysis_UNEMPLOYMENT.png)

### Interest Rate
{mean: 5.35, median: 5.25, min: 0.07, max: 19.1}
![interest](Images/variable_analysis_INTEREST.png)

### 30yr Mortgage Rate
{mean: 8.25, median: 7.8, min: 3.35, max: 18.45}
![mortgage](Images/variable_analysis_MORTGAGE.png)
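
The summary figures above can be reproduced with a few pandas calls. The sketch below is one way to do it, not necessarily how the values were originally produced. To stay self-contained it uses only the first ten rows of the rates dataset shown in the appendix, so its output will not match the full-sample metrics. The `summarize` helper is a name introduced here for illustration.

```python
import pandas as pd

# First ten rows of the rates dataset (see the appendix); the full file covers 1971-2016.
sample = pd.DataFrame({
    'DATE': pd.date_range('1971-04-01', periods=10, freq='MS'),
    'INTEREST': [4.15, 4.63, 4.91, 5.31, 5.56, 5.55, 5.2, 4.91, 4.14, 3.5],
    'UNEMPLOYMENT': [5.9, 5.9, 5.9, 6.0, 6.1, 6.0, 5.8, 6.0, 6.0, 5.8],
    'INFLATION': [5.0, 5.2, 4.9, 4.9, 4.6, 4.4, 3.8, 3.3, 3.1, 3.1],
    'MORTGAGE': [7.31, 7.425, 7.53, 7.604, 7.6975, 7.6875, 7.628, 7.55, 7.48, 7.4375],
}).set_index('DATE')


def summarize(series):
    """Return the four metrics reported for each rate (hypothetical helper)."""
    return {
        'mean': round(series.mean(), 2),
        'median': round(series.median(), 2),
        'min': series.min(),
        'max': series.max(),
    }


metrics = {column: summarize(sample[column]) for column in sample.columns}
for name, values in metrics.items():
    print(name, values)
```

With the full dataset, replacing `sample` with the `rates` frame loaded at the top of the notebook should give the figures for all 1971-2016 observations.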

## TOOLS

![tools1](https://moriohcdn.b-cdn.net/3c9974b51b.png)

## DATA INSIGHTS
![heatmap](Images/heatmap.png)

**Question 1:** A plot of the 30-year US mortgage rate shows an all-time high above 18%, which is said to be a response to the high inflation rate at the time (1980s recession). Is there a correlation between the mortgage rate and the inflation rate after 1980?
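
One possible starting point for Question 1 is a Pearson correlation restricted to a date window. The sketch below is only an outline: `correlation_after` is a hypothetical helper, and because the snippet has to run on its own it is demonstrated on the first ten rows of the dataset (1971-1972) with a stand-in start date rather than on the post-1980 data.

```python
import pandas as pd


def correlation_after(df, start, x='MORTGAGE', y='INFLATION'):
    """Pearson correlation between two columns for rows dated on or after `start`."""
    window = df.loc[df.index >= pd.Timestamp(start)]
    return window[x].corr(window[y])


# With the full dataset (assuming rates.csv as loaded in this notebook):
# rates = pd.read_csv('rates.csv', parse_dates=['DATE']).set_index('DATE')
# print(correlation_after(rates, '1980-01-01'))

# Tiny stand-in so the sketch runs on its own: the first ten rows of the dataset.
sample = pd.DataFrame({
    'MORTGAGE': [7.31, 7.425, 7.53, 7.604, 7.6975, 7.6875, 7.628, 7.55, 7.48, 7.4375],
    'INFLATION': [5.0, 5.2, 4.9, 4.9, 4.6, 4.4, 3.8, 3.3, 3.1, 3.1],
}, index=pd.date_range('1971-04-01', periods=10, freq='MS'))

r = correlation_after(sample, '1971-07-01')
print(round(r, 3))
```

A correlation coefficient describes how closely the two rates move together over the window; it does not by itself show that one rate drives the other.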

**Question 2:** The federal funds rate (interest rate) is reactive: the Federal Reserve adjusts its target deliberately rather than leaving the rate entirely to the market. With economic health as a major concern, we can expect it to respond to the behavior of inflation. What behaviors are observed between the two rates in the data?
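
If the interest rate reacts to inflation, the reaction may show up with a delay. One way to probe this, offered as a sketch rather than a settled method, is to correlate the interest rate with inflation from earlier months. `lagged_correlation` is a hypothetical helper, and the ten-row sample is far too short to draw conclusions from; it only shows the mechanics.

```python
import pandas as pd


def lagged_correlation(df, lags, leader='INFLATION', follower='INTEREST'):
    """Correlate `follower` at month t with `leader` at month t - lag, for each lag."""
    return pd.Series(
        {lag: df[follower].corr(df[leader].shift(lag)) for lag in lags},
        name='correlation',
    )


# First ten rows of the dataset, used only to make the sketch runnable.
sample = pd.DataFrame({
    'INFLATION': [5.0, 5.2, 4.9, 4.9, 4.6, 4.4, 3.8, 3.3, 3.1, 3.1],
    'INTEREST': [4.15, 4.63, 4.91, 5.31, 5.56, 5.55, 5.2, 4.91, 4.14, 3.5],
}, index=pd.date_range('1971-04-01', periods=10, freq='MS'))

correlations = lagged_correlation(sample, lags=range(0, 4))
print(correlations)
```

On the full dataset, a peak at a positive lag would be consistent with the interest rate following inflation by roughly that many months, though correlation alone cannot establish the direction of influence.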

**Question 3:** What is the average duration between a cut in the interest rate (federal funds rate) and the next increase? A tolerance needs to be specified to define which changes count as significant. There is a belief that once the Fed lowers the interest rate, an increase will take a long time.
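
One way to make Question 3 concrete is sketched below. It assumes monthly observations with no gaps, treats a month-over-month change of at least `tolerance` percentage points as significant, and measures the months from the most recent significant cut to the first significant increase that follows it. The function name, the default tolerance of 0.25, and the example series are all hypothetical choices made for illustration.

```python
import pandas as pd


def months_from_cut_to_hike(interest, tolerance=0.25):
    """For each first hike after a cut, count the months since the most recent cut."""
    change = interest.diff()  # month-over-month change; the first value is NaN
    durations = []
    last_cut = None  # position of the most recent significant cut
    for position, delta in enumerate(change):
        if delta <= -tolerance:
            last_cut = position
        elif delta >= tolerance and last_cut is not None:
            durations.append(position - last_cut)
            last_cut = None  # wait for the next cut before measuring again
    return durations


# Hypothetical monthly interest rates (not taken from the dataset).
example = pd.Series([5.0, 4.5, 4.5, 4.0, 4.0, 4.0, 4.5, 5.0, 4.0, 4.5])

durations = months_from_cut_to_hike(example)
average = sum(durations) / len(durations) if durations else None
print(durations, average)
```

Other definitions are just as defensible, for example measuring from the first cut of an easing cycle instead of the last, and the answer to the question will depend on which one is chosen.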

## FUTURE MODELING

Using OLS and time-series, can we predict change in one variable based on the changes from another variable? Why is it important?
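
As a first sketch of what such a model could look like, the snippet below fits an ordinary least squares line to month-over-month changes using `numpy.polyfit`. Predicting the change in the interest rate from the change in inflation is an assumed choice of variables, and the ten-row sample from the appendix is used only so the code runs on its own; a fit on so few points says nothing about the real relationship.

```python
import numpy as np
import pandas as pd

# First ten rows of the dataset, used only to make the sketch runnable.
sample = pd.DataFrame({
    'INFLATION': [5.0, 5.2, 4.9, 4.9, 4.6, 4.4, 3.8, 3.3, 3.1, 3.1],
    'INTEREST': [4.15, 4.63, 4.91, 5.31, 5.56, 5.55, 5.2, 4.91, 4.14, 3.5],
}, index=pd.date_range('1971-04-01', periods=10, freq='MS'))

# Month-over-month changes; the first row has no predecessor and is dropped.
changes = sample.diff().dropna()

# OLS fit of: change in INTEREST = slope * change in INFLATION + intercept
slope, intercept = np.polyfit(changes['INFLATION'], changes['INTEREST'], deg=1)

# Goodness of fit: share of the variance in interest-rate changes explained by the line.
predicted = slope * changes['INFLATION'] + intercept
residuals = changes['INTEREST'] - predicted
total = ((changes['INTEREST'] - changes['INTEREST'].mean()) ** 2).sum()
r_squared = 1 - (residuals ** 2).sum() / total

print(f'slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}')
```

Working with changes rather than levels is a common way to reduce the effect of shared long-run trends, which can otherwise make two unrelated series look strongly related.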

## APPENDIX

### Notebooks
- [cleanup of mortgage data](cleaning_mortgage.ipynb)
- [cleanup of federal data](cleaning_federal.ipynb)
- [data visualization](data_visualization.ipynb)

### Rates Dataset

In [5]:
import pandas as pd

pd.set_option('display.max_rows', None)

rates = pd.read_csv('rates.csv')
rates

| Unnamed: 0 | DATE | INTEREST | UNEMPLOYMENT | INFLATION | MORTGAGE |
|---|---|---|---|---|---|
| 0 | 1971-04-01 | 4.15 | 5.9 | 5.0 | 7.31 |
| 1 | 1971-05-01 | 4.63 | 5.9 | 5.2 | 7.425 |
| 2 | 1971-06-01 | 4.91 | 5.9 | 4.9 | 7.53 |
| 3 | 1971-07-01 | 5.31 | 6.0 | 4.9 | 7.604 |
| 4 | 1971-08-01 | 5.56 | 6.1 | 4.6 | 7.6975 |
| 5 | 1971-09-01 | 5.55 | 6.0 | 4.4 | 7.6875 |
| 6 | 1971-10-01 | 5.2 | 5.8 | 3.8 | 7.628 |
| 7 | 1971-11-01 | 4.91 | 6.0 | 3.3 | 7.55 |
| 8 | 1971-12-01 | 4.14 | 6.0 | 3.1 | 7.48 |
| 9 | 1972-01-01 | 3.5 | 5.8 | 3.1 | 7.4375 |


### Functions

In [None]:
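import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def summary_stat_plot(plt, array, num_bins=20):
    
    # Plot a histogram with a kernel density estimate overlaid.
    # (histplot replaces the deprecated distplot and honors num_bins.)
    sns.histplot(array, bins=num_bins, kde=True)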
    
    # Add a vertical line at the mean.
    plt.axvline(array.mean(), 
                color='b', 
                linestyle='solid', 
                linewidth=2, 
                label=f'mean: {round(array.mean(),2)}')

    # Add a vertical line at one standard deviation above the mean.
    plt.axvline(array.mean() + array.std(), 
                color='b', 
                linestyle='dashed', 
                linewidth=2, 
                label=f'±std: {round(array.std(),2)}')

    # Add a vertical line at one standard deviation below the mean.
    plt.axvline(array.mean() - array.std(), color='b', linestyle='dashed', linewidth=2)
    
    # Add legend with mean and std
    plt.legend(shadow=True,loc=0)
    
    plt.title('US {} RATE DISTRIBUTION'.format(array.name))
    plt.xlabel('{} RATE'.format(array.name))
    plt.ylabel('FREQUENCY')
    
    return plt


def qq_plot(plt, series, distribution='normal'):
    
    # Draw a reference sample of the same size from each candidate distribution.
    rand1 = np.random.normal(0, 1, len(series))
    rand2 = np.random.gamma(5, 1, len(series))

    # Sort the observed values and the reference samples in ascending order,
    # so that the i-th points of each are matching empirical quantiles.
    unknown_dist = series.sort_values()
    rand1.sort()
    rand2.sort()
    
    # Plot the sorted observations against the sorted reference sample.
    if distribution == 'normal':
        plt.title('QQ Plot - Normal Distribution')
        plt.plot(unknown_dist, rand1, "o")
    elif distribution == 'gamma':
        plt.title('QQ Plot - Gamma Distribution')
        plt.plot(unknown_dist, rand2, "o")
    
    return plt


def combine_plots(series):

    plt.figure(figsize = [10,5])

    # histogram
    plt.subplot(1,2,1)
    fig = summary_stat_plot(plt, series)

    # qq plot
    plt.subplot(1,2,2)
    fig2 = qq_plot(fig, series)

    fig2.savefig('Images/variable_analysis_{}.png'.format(series.name));

### Additional Plots

![boxplot1](Images/boxplot_INFLATION.png)
![boxplot2](Images/boxplot_UNEMPLOYMENT.png)
![boxplot3](Images/boxplot_INTEREST.png)
![boxplot4](Images/boxplot_MORTGAGE.png)


![pairplot](Images/pairplot.png)