# US Financial Health

In this project, we will import several types of financial data to assess the financial health and volatility of the US between 1999 and 2019.

We will use the techniques we have learned for importing financial data to import stock and commodity pricing data from csv files and the FRED API. Then we will grab GDP and goods and services export data from the World Bank API.

Finally, we will find the log returns of the imported data, and use them to determine the volatility of the data over the 20-year period.

Let us get started!

## Importing Commodity Prices

1. In the workspace there are two csv files with historical commodity data for gold and crude oil.

    This is the commodity data we will be importing and operating on.

    In order to import csv files, we will need the pandas library imported into our program.

    Import pandas under the alias `pd`.

In [1]:
import pandas as pd

2. Now that pandas is imported, use its `read_csv` function to import data from the `gold_prices.csv` file into a variable called `gold_prices`.

    Then print the gold prices DataFrame and look it over.

In [2]:
gold_prices = pd.read_csv('gold_prices.csv')
gold_prices

Unnamed: 0,Date,Gold_Price
0,2019-08-30,1528.40
1,2019-08-29,1540.20
2,2019-08-28,1537.15
3,2019-08-27,1532.95
4,2019-08-26,1503.80
...,...,...
5386,1999-01-07,289.95
5387,1999-01-06,287.65
5388,1999-01-05,287.15
5389,1999-01-04,287.15
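The `Unnamed: 0` column in the output above is the row index that was saved into the file. As a minimal sketch, assuming the file has the layout shown (an unnamed index column, then `Date` and `Gold_Price`), `read_csv` can reuse that column as the index and parse the dates in one call. The inline sample below is a hypothetical stand-in for `gold_prices.csv` so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for gold_prices.csv, using rows from the output above.
sample_csv = io.StringIO(
    ",Date,Gold_Price\n"
    "0,2019-08-30,1528.40\n"
    "1,2019-08-29,1540.20\n"
    "2,2019-08-28,1537.15\n"
)

# index_col=0 reuses the saved row numbers as the index instead of
# creating an "Unnamed: 0" column; parse_dates converts Date to datetimes.
gold_sample = pd.read_csv(sample_csv, index_col=0, parse_dates=["Date"])

print(gold_sample.dtypes)
```

The rest of this project keeps the plain `pd.read_csv('gold_prices.csv')` call, so the outputs below still include the `Unnamed: 0` column.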


3. Now let us do the same for the crude oil data.

    Import the historical data in `crude_oil_prices.csv` into a variable called `crude_oil_prices`.

    Then print it out and look it over as well.

In [3]:
crude_oil_prices = pd.read_csv('crude_oil_prices.csv')
crude_oil_prices

Unnamed: 0,Date,Crude_Oil_Price
0,"Sep 11, 2018",69.25
1,"Sep 10, 2018",67.54
2,"Sep 07, 2018",67.75
3,"Sep 06, 2018",67.77
4,"Sep 05, 2018",68.72
...,...,...
4995,"Jan 08, 1999",13.07
4996,"Jan 07, 1999",13.09
4997,"Jan 06, 1999",12.80
4998,"Jan 05, 1999",11.99
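The crude oil dates are stored as text such as `"Sep 11, 2018"`, and the rows run from newest to oldest. The sketch below shows one way to convert those strings to datetimes and sort them chronologically. It assumes every date follows the format shown above, and it uses a hypothetical inline sample in place of `crude_oil_prices.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for crude_oil_prices.csv, using rows from the output above.
sample_csv = io.StringIO(
    ',Date,Crude_Oil_Price\n'
    '0,"Sep 11, 2018",69.25\n'
    '1,"Sep 10, 2018",67.54\n'
    '2,"Sep 07, 2018",67.75\n'
)
crude_sample = pd.read_csv(sample_csv, index_col=0)

# "%b %d, %Y" matches an abbreviated month, a zero-padded day, and a four-digit year.
crude_sample["Date"] = pd.to_datetime(crude_sample["Date"], format="%b %d, %Y")

# Sort oldest-first so that each row follows the previous trading day.
crude_sample = crude_sample.sort_values("Date").reset_index(drop=True)

print(crude_sample)
```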


## Importing Stock Prices

4. We have imported the commodity prices from their csv files. Now we will focus on historical stock prices.

    The pandas-datareader package can import stock pricing data from the FRED API through its `pandas_datareader.data` module.

    Import `pandas_datareader.data` as `web`.

In [4]:
import pandas_datareader.data as web

5. Since we only want data between 1999 and 2019, we will also want to create some start and end variables.

    Import the `datetime` module and create two datetimes, `start` and `end`, which represent January 1st of 1999 and 2019 respectively.

In [5]:
from datetime import datetime

start = datetime(1999, 1, 1)
end = datetime(2019, 1, 1)

6. We can use the `web.DataReader` function to get historical prices for the NASDAQ 100 from the FRED API.

    `web.DataReader` takes 4 arguments:

    * Data id code (`'NASDAQ100'`)
    * The name of the API we want to call (`'fred'`)
    * The start datetime (`start`)
    * The end datetime (`end`)

    Call `web.DataReader` with the appropriate arguments, and store the resulting DataFrame in a variable called `nasdaq_data`. Then print it out and look at the results.

In [6]:
nasdaq_data = web.DataReader('NASDAQ100', 'fred', start, end)
nasdaq_data

DATE,NASDAQ100
1999-01-01,
1999-01-04,1854.390
1999-01-05,1903.000
1999-01-06,1963.950
1999-01-07,1966.350
...,...
2018-12-26,6262.766
2018-12-27,6288.301
2018-12-28,6285.266
2018-12-31,6329.965


7. The FRED API also stores data from the S&P 500 Index. Let us import that as well.

    Call `web.DataReader` just like in the previous step, except change the data id code from `NASDAQ100` to `SP500`.

    Store the results in a variable called `sap_data` and print it out.

In [7]:
sap_data = web.DataReader('SP500', 'fred', start, end)
sap_data

DATE,SP500
2011-03-14,1296.39
2011-03-15,1281.87
2011-03-16,1256.88
2011-03-17,1273.72
2011-03-18,1279.20
...,...
2018-12-26,2467.70
2018-12-27,2488.83
2018-12-28,2485.74
2018-12-31,2506.85
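Notice that the S&P 500 output above starts in March 2011 even though we asked for data from 1999. A quick coverage check makes gaps like this visible before any analysis. The sketch below uses a hypothetical four-row stand-in for `sap_data`, built from the values shown above, so it runs without calling the API.

```python
import pandas as pd

# Hypothetical stand-in for sap_data, using the first rows of the output above.
sap_sample = pd.DataFrame(
    {"SP500": [1296.39, 1281.87, 1256.88, 1273.72]},
    index=pd.to_datetime(["2011-03-14", "2011-03-15", "2011-03-16", "2011-03-17"]),
)
sap_sample.index.name = "DATE"

requested_start = pd.Timestamp("1999-01-01")

# first_valid_index / last_valid_index skip rows where the price is missing.
first_valid = sap_sample["SP500"].first_valid_index()
last_valid = sap_sample["SP500"].last_valid_index()

print(f"first observation: {first_valid.date()}, last observation: {last_valid.date()}")
print(f"data starts after the requested start: {first_valid > requested_start}")
```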


## Importing World Bank Data

8. In addition to stock and commodity prices, we also want to import higher-level economic data, like GDP and the total value of goods and services exported in a given year.

    Luckily for us, the World Bank API tracks exactly these things.

    First things first, let us import the World Bank sub-module from pandas datareader.

    Import `pandas_datareader.wb` as `wb`.

In [8]:
import pandas_datareader.wb as wb

9. We can use the `wb.download` function to get GDP data from the World Bank API.

    `wb.download` takes 4 arguments:

    * A data indicator (`NY.GDP.MKTP.CD`)
    * A list of countries to get data for
    * The start datetime (`start`)
    * The end datetime (`end`)

    A call would look something like this:

    `wb.download(indicator='INDICATOR', country=['US'], start=start, end=end)`

    Call `wb.download` with the appropriate arguments, and store the resulting DataFrame in a variable called `gdp_data`.

In [9]:
gdp_data = wb.download(indicator='NY.GDP.MKTP.CD', country=['US'], start=start, end=end)
gdp_data

country,year,NY.GDP.MKTP.CD
United States,2019,21433226000000
United States,2018,20580159776000
United States,2017,19519353692000
United States,2016,18714960538000
United States,2015,18224704440000
United States,2014,17527163695000
United States,2013,16784849196000
United States,2012,16197007349000
United States,2011,15542581104000
United States,2010,14992052727000


10. The World Bank API also stores data about the value of goods and services exported in a given year. Let us import that as well.

    Call `wb.download` just like in the previous step, except change the indicator from `NY.GDP.MKTP.CD` to `NE.EXP.GNFS.CN`.

    Store the results in a variable called `export_data` and print it out.

In [10]:
export_data = wb.download(indicator='NE.EXP.GNFS.CN', country=['US'], start=start, end=end)
export_data

country,year,NE.EXP.GNFS.CN
United States,2019,2514751000000
United States,2018,2528704000000
United States,2017,2374560000000
United States,2016,2227174000000
United States,2015,2265862000000
United States,2014,2371704000000
United States,2013,2273428000000
United States,2012,2191280000000
United States,2011,2102995000000
United States,2010,1846280000000
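Both World Bank DataFrames are indexed by a `(country, year)` pair, with the newest year first. The sketch below shows one way to pull out a single country and put the years in chronological order. It uses a hypothetical three-row stand-in for `gdp_data`, and it assumes the year labels are strings, which may not hold in every version of the library.

```python
import pandas as pd

# Hypothetical stand-in for gdp_data: a (country, year) MultiIndex, newest year first.
# Assumption: the year labels are strings.
index = pd.MultiIndex.from_tuples(
    [("United States", "2019"), ("United States", "2018"), ("United States", "2017")],
    names=["country", "year"],
)
gdp_sample = pd.DataFrame(
    {"NY.GDP.MKTP.CD": [21433226000000, 20580159776000, 19519353692000]},
    index=index,
)

# xs selects one country and drops that index level, leaving rows indexed by year.
us_gdp = gdp_sample.xs("United States", level="country")["NY.GDP.MKTP.CD"].copy()

# Convert the year labels to integers and sort oldest-first.
us_gdp.index = us_gdp.index.map(int)
us_gdp = us_gdp.sort_index()

print(us_gdp)
```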


## Calculating Log Return

11. At this point, we have imported all the data we need. But it is all stored as either daily or yearly dollar amounts.

    Pricing data is useful, but since we want to compare the data sets with one another, it would be even better to work with the log returns of the daily/yearly prices rather than the prices themselves. Log returns measure relative change, so series with very different price levels can be compared directly.

    As a first step, let us define a function called `log_return`, which should accept one parameter, `prices`. You can leave the function body empty for now.

In [11]:
def log_return(prices):
    pass

12. The equation for calculating the log return between two prices is as follows:

    **natural_log(current price/previous price)**

    In our case, we want to apply this equation to each day/year of pricing data in a Series taken from our imported DataFrame (a Series is a single column of a DataFrame).

    The pandas `shift` method moves every value in a Series down by one row, so dividing a Series by its shifted copy divides each price by the price in the row before it. The first row has no earlier row, so its result is `NaN`.

    `prices / prices.shift(1)`

    And we can use numpy's natural log function to get the log return for each entry in the new Series.
    ```
    import numpy as np
    np.log(Series)
    ```
    Using this information, fill in the code in the `log_return` function (be sure to import numpy at the top of the file).

In [12]:
import numpy as np

def log_return(prices):
    return np.log(prices / prices.shift(1))
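To check the function by hand, the sketch below applies it to the first four NASDAQ 100 closes from the FRED output shown earlier. It also illustrates a useful property of log returns: they add up over time, so their sum equals the log of the last price divided by the first. The variable names `nasdaq_sample`, `sample_returns`, and `total` are only for illustration.

```python
import numpy as np
import pandas as pd

def log_return(prices):
    # Natural log of each price divided by the price one row earlier.
    return np.log(prices / prices.shift(1))

# First four NASDAQ 100 closes from the FRED output shown earlier, oldest first.
nasdaq_sample = pd.Series(
    [1854.390, 1903.000, 1963.950, 1966.350],
    index=pd.to_datetime(["1999-01-04", "1999-01-05", "1999-01-06", "1999-01-07"]),
)

sample_returns = log_return(nasdaq_sample)
print(sample_returns.round(6))

# Log returns are additive: their sum equals log(last price / first price).
total = np.log(nasdaq_sample.iloc[-1] / nasdaq_sample.iloc[0])
print(np.isclose(sample_returns.sum(), total))
```

The first entry is `NaN` because there is no earlier price to divide by.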

13. Let us use our new `log_return` function to calculate the log return of the `gold_prices` DataFrame we imported.

    Create a variable called `gold_returns`, which stores the result of calling `log_return` on the `Gold_Price` column of the `gold_prices` DataFrame.

In [13]:
gold_returns = log_return(gold_prices['Gold_Price'])
gold_returns

0            NaN
1       0.007691
2      -0.001982
3      -0.002736
4      -0.019199
          ...   
5386   -0.003271
5387   -0.007964
5388   -0.001740
5389    0.000000
5390    0.002261
Name: Gold_Price, Length: 5391, dtype: float64
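One thing to be aware of: the gold data is sorted newest-first, so `prices.shift(1)` lines each price up with the price of the *following* date, not the previous one. The result is the log return with its sign flipped. The sketch below, using the first rows of `gold_prices` shown earlier, compares the file order with chronological order. Because flipping the sign of every value leaves the variance unchanged, the volatility comparison later in this project is not affected.

```python
import numpy as np
import pandas as pd

def log_return(prices):
    return np.log(prices / prices.shift(1))

# First rows of gold_prices as shown earlier: newest date first.
gold_sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-30", "2019-08-29", "2019-08-28", "2019-08-27"]),
    "Gold_Price": [1528.40, 1540.20, 1537.15, 1532.95],
})

# File order: each price is divided by the row above it, which is the later date.
file_order_returns = log_return(gold_sample["Gold_Price"])

# Chronological order: each price is divided by the earlier date's price.
chronological = gold_sample.sort_values("Date")
chronological_returns = log_return(chronological["Gold_Price"])

print(file_order_returns.dropna().tolist())
print(chronological_returns.dropna().tolist())

# The two variances agree, because var(-x) == var(x).
print(np.isclose(file_order_returns.var(), chronological_returns.var()))
```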

14. Now create log return variables for each additional dataset (`crude_oil_returns`, `sap_returns`, etc.).

    Remember that you only need to pass in the column of the DataFrame that contains the pricing data. In the case of gold it was `gold_prices['Gold_Price']`, but the column name will vary for each dataset.

In [14]:
crude_oil_returns = log_return(crude_oil_prices['Crude_Oil_Price'])
crude_oil_returns

0            NaN
1      -0.025003
2       0.003104
3       0.000295
4       0.013921
          ...   
4995   -0.027916
4996    0.001529
4997   -0.022403
4998   -0.065372
4999    0.028773
Name: Crude_Oil_Price, Length: 5000, dtype: float64

In [15]:
nasdaq_returns = log_return(nasdaq_data['NASDAQ100'])
nasdaq_returns

DATE
1999-01-01         NaN
1999-01-04         NaN
1999-01-05    0.025876
1999-01-06    0.031526
1999-01-07    0.001221
                ...   
2018-12-26         NaN
2018-12-27    0.004069
2018-12-28   -0.000483
2018-12-31    0.007087
2019-01-01         NaN
Name: NASDAQ100, Length: 5218, dtype: float64
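The `NaN` entries in the output above come from dates with no price, such as market holidays. A single missing price produces two missing returns: one for the day itself and one for the following day. The sketch below uses made-up prices to show the effect, along with one possible alternative that drops missing prices before computing returns. Either way, the pandas `var()` method skips `NaN` values by default.

```python
import numpy as np
import pandas as pd

def log_return(prices):
    return np.log(prices / prices.shift(1))

# Hypothetical prices with one missing value in the middle (for example a market holiday).
prices = pd.Series(
    [100.0, 101.0, np.nan, 103.0, 104.0],
    index=pd.to_datetime(
        ["2018-12-21", "2018-12-24", "2018-12-25", "2018-12-26", "2018-12-27"]
    ),
)

# One missing price leads to two missing returns, plus the usual NaN in the first row.
raw_returns = log_return(prices)

# Dropping missing prices first treats the gap as a single multi-day return instead.
clean_returns = log_return(prices.dropna())

print(raw_returns.isna().sum())
print(clean_returns.isna().sum())
```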

In [16]:
sap_returns = log_return(sap_data['SP500'])
sap_returns

DATE
2011-03-14         NaN
2011-03-15   -0.011264
2011-03-16   -0.019687
2011-03-17    0.013309
2011-03-18    0.004293
                ...   
2018-12-26         NaN
2018-12-27    0.008526
2018-12-28   -0.001242
2018-12-31    0.008457
2019-01-01         NaN
Name: SP500, Length: 2037, dtype: float64

In [17]:
gdp_returns = log_return(gdp_data['NY.GDP.MKTP.CD'])
gdp_returns

country        year
United States  2019         NaN
               2018   -0.040615
               2017   -0.052921
               2016   -0.042083
               2015   -0.026545
               2014   -0.039026
               2013   -0.043275
               2012   -0.035650
               2011   -0.041243
               2010   -0.036063
               2009   -0.036900
               2008    0.018100
               2007   -0.017898
               2006   -0.045096
               2005   -0.057963
               2004   -0.065203
               2003   -0.063851
               2002   -0.046611
               2001   -0.032961
               2000   -0.031631
               1999   -0.062554
Name: NY.GDP.MKTP.CD, dtype: float64

In [18]:
export_returns = log_return(export_data['NE.EXP.GNFS.CN'])
export_returns

country        year
United States  2019         NaN
               2018    0.005533
               2017   -0.062895
               2016   -0.064079
               2015    0.017222
               2014    0.045653
               2013   -0.042320
               2012   -0.036803
               2011   -0.041123
               2010   -0.130190
               2009   -0.154485
               2008    0.149476
               2007   -0.100832
               2006   -0.120293
               2005   -0.120663
               2004   -0.102871
               2003   -0.127967
               2002   -0.036798
               2001    0.025597
               2000    0.067562
               1999   -0.099148
Name: NE.EXP.GNFS.CN, dtype: float64

## Comparing Return Volatility

15. We are now ready to compare the volatility of each type of data.

    Variance, in the context of financial data, tells us how volatile an investment is. Use the pandas `var()` method to calculate the variance of the commodity, stock, and World Bank returns, and print the results.

    The results can be interpreted in a number of ways, but generally, the higher the variance the more volatile the data.

    What conclusions can you draw from this data? Which dataset was the most volatile? Did any datasets have similar variances?

In [19]:
volatility_df = pd.DataFrame(data={
    'Gold': [gold_returns.var()],
    'Crude Oil': [crude_oil_returns.var()],
    'NASDAQ100': [nasdaq_returns.var()], 
    'S&P 500': [sap_returns.var()],
    'GDP': [gdp_returns.var()],
    'Export': [export_returns.var()]
    })

volatility_df

,Gold,Crude Oil,NASDAQ100,S&P 500,GDP,Export
0,0.000114,0.000556,0.000318,8.5e-05,0.000342,0.006202
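The table above mixes variances of daily returns (gold, crude oil, and the stock indexes) with variances of yearly returns (GDP and exports). One common rough adjustment is to multiply a daily variance by the number of trading days in a year. The sketch below does this under two stated assumptions: about 252 trading days per year, and daily returns that are uncorrelated from one day to the next. Treat the result as an approximation rather than a definitive ranking.

```python
import numpy as np
import pandas as pd

# Variances copied from the table above.
daily_variance = pd.Series({
    "Gold": 0.000114,
    "Crude Oil": 0.000556,
    "NASDAQ100": 0.000318,
    "S&P 500": 8.5e-05,
})
yearly_variance = pd.Series({"GDP": 0.000342, "Export": 0.006202})

# Assumption: about 252 trading days per year and uncorrelated daily returns,
# so yearly variance is roughly 252 times the daily variance.
TRADING_DAYS_PER_YEAR = 252
annualized = pd.concat([daily_variance * TRADING_DAYS_PER_YEAR, yearly_variance])

# Standard deviation is in the same units as the returns, which is easier to read.
summary = pd.DataFrame({
    "annual_variance": annualized,
    "annual_std": np.sqrt(annualized),
}).sort_values("annual_variance")

print(summary)
```

On this yearly scale the price series all come out more volatile than GDP and exports, which is worth keeping in mind when reading the raw variances.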


## Conclusions
```
gold: 0.00011375928671558508
oil: 0.0005563207795629881
nasdaq: 0.0003178379833057229
sap: 8.860342194008153e-05 which is equivalent to 0.00008860342194008153
gdp: 0.0003576341131987123
export: 0.006775724487950144
```
You should have gotten something similar to what we have above, and this is generally in line with what we would expect.

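The S&P 500, an index of 500 large companies listed on stock exchanges in the United States, has the smallest variance, and thus is the least volatile. Given that the S&P 500 index is a weighted measurement of many stocks across a variety of industries, it is seen as a safer, diversified investment. Note, however, that the S&P 500 data returned by FRED here only starts in March 2011, so its variance does not reflect the downturns of 2000-2002 and 2008 that the other series include.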

Gold, known for being a stable investment, has the second smallest variance.

Crude oil is the most volatile of the daily price series, which makes sense, as oil prices have often been unpredictable, especially over the last 20 years.

The stocks are interesting. The NASDAQ 100 is more volatile than the S&P 500, which makes sense when you think about it: the S&P 500 is more diversified and tracks more of the market.

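Finally, we have GDP and exports.

Exports have the largest variance of all the datasets, which could have to do with industries moving overseas in the last 20 years and global competition for the production of goods.

GDP has a variance fairly similar to that of the NASDAQ 100, which is an interesting coincidence. Keep in mind, though, that the GDP and export returns are yearly while the price returns are daily, so their variances are not measured on the same time scale.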