In [1]:
import numpy as np
import pandas as pd
import quandl

import os
from dotenv import load_dotenv
load_dotenv()

import urllib3  # suppress urllib3 warning messages (e.g. those raised when SSL verification is disabled)
urllib3.disable_warnings()

from functools import reduce  # used at the end to merge all the DataFrames

from function_file import clean_df

## Quandl 

Quandl is a premier publisher of alternative data for institutional investors. A dedicated team of data scientists, quants and engineers combine uncompromising curation, high-quality standards and experienced data science application to provide some of the most robust data available today. Quandl also publishes free data, scraped from the web and delivered via Nasdaq Data Link’s industry-leading data delivery platform [Source](https://data.nasdaq.com/publishers/QDL).

Quandl was acquired by Nasdaq in 2018, which explains why it is now integrated into the Nasdaq website (Nasdaq Data Link).

In [2]:
# Read the API key from the environment and hand it to quandl
QUANDL_PW = os.getenv("QUANDL")
quandl.ApiConfig.api_key = QUANDL_PW
# SSL verification is enabled by default; this line bypasses it (insecure, only use it if certificate checks fail)
quandl.ApiConfig.verify_ssl = False
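
If the `QUANDL` variable is missing from the `.env` file, `os.getenv` silently returns `None`. A minimal sketch of a stricter lookup follows; `read_api_key` is a hypothetical helper written for this notebook, not part of the `quandl` package.

```python
import os


def read_api_key(var_name: str = "QUANDL") -> str:
    """Return the API key stored in an environment variable, or raise if it is missing."""
    key = os.getenv(var_name)
    if not key:
        # Fail early instead of passing None on to the API client
        raise RuntimeError(f"Environment variable {var_name!r} is not set")
    return key


# Possible usage (assumes load_dotenv() has already been called):
# quandl.ApiConfig.api_key = read_api_key("QUANDL")
```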

### 1. M2 - Measure of Money Supply 1959-2022 - Monthly basis

M2 is a measure of the U.S. money stock that includes M1 (currency and coins held by the non-bank public, checkable deposits, and travelers' checks) plus savings deposits (including money market deposit accounts), small time deposits under $100,000, and shares in retail money market mutual funds [Source](https://www.stlouisfed.org/financial-crisis/data/m2-monetary-aggregate#:~:text=M2%20is%20a%20measure%20of,retail%20money%20market%20mutual%20funds.).

In [3]:
# https://data.nasdaq.com/data/FED/M2_N_M-m2-not-seasonally-adjusted-monthly

df_m2_us = quandl.get('FED/M2_N_M')
df_m2_us.rename(columns={'Value': 'M2'}, inplace=True, errors='raise')
clean_df(df_m2_us,'Date')

Unnamed: 0,Date,M2
0,1959-01-01,289.8
1,1959-02-01,287.7
2,1959-03-01,287.9
3,1959-04-01,290.2
4,1959-05-01,290.2
...,...,...
761,2022-06-01,21585.4
762,2022-07-01,21578.8
763,2022-08-01,21546.4
764,2022-09-01,21459.5


### 2. Average of GDP and GDI, Quarterly, Transactions, NSA 1946-2022 - Annual/Quarterly

GDI and GDP are two slightly different measures of a nation's economic activity. GDI counts what all participants in the economy make or "take in" (like wages, profits, and taxes). GDP counts the value of what the economy produces (like goods, services, and technology) [Source](https://www.investopedia.com/terms/g/gdi.asp#:~:text=GDI%20and%20GDP%20are%20two,%2C%20services%2C%20and%20technology).
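
To make the relationship concrete, here is a small sketch of how an average of the two measures, and the gap between them, could be computed. The function names and the figures are illustrative assumptions, not values from the dataset.

```python
def average_gdp_gdi(gdp: float, gdi: float) -> float:
    """Simple average of the production-side (GDP) and income-side (GDI) measures."""
    return (gdp + gdi) / 2.0


def statistical_discrepancy(gdp: float, gdi: float) -> float:
    """Gap between the two measures, taken here as GDP minus GDI."""
    return gdp - gdi


# Made-up figures, for illustration only
avg = average_gdp_gdi(gdp=1000.0, gdi=990.0)          # 995.0
gap = statistical_discrepancy(gdp=1000.0, gdi=990.0)  # 10.0
```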

In [6]:
# https://data.nasdaq.com/data/FED/FU086902203_Q-average-of-gdp-and-gdi-quarterly-transactions-nsa

df_gdp_us = quandl.get('FED/FU086902203_Q')
df_gdp_us.rename(columns={'Value': 'GDP'}, inplace=True, errors='raise')
clean_df(df_gdp_us,'Date')

Unnamed: 0,Date,GDP
0,1946-12-01,227.0
1,1947-12-01,249.0
2,1948-12-01,275.0
3,1949-12-01,272.0
4,1950-12-01,300.0
...,...,...
283,2021-06-01,5734201.0
284,2021-09-01,5879444.0
285,2021-12-01,6077834.0
286,2022-03-01,6201234.0


### 3. S&P 500 inflation adjusted by month 1871-2022 - Monthly

The S&P 500 is an equity index made up of 500 of the largest companies traded on either the NYSE, Nasdaq, or Cboe. The S&P 500 is calculated by adding each company's float-adjusted market capitalization [Source](https://www.investopedia.com/articles/investing/090414/sp-500-index-you-need-know.asp#:~:text=The%20S%26P%20500%20is%20an,company's%20float%2Dadjusted%20market%20capitalization.).
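
As a rough sketch of the float-adjusted calculation described above: each constituent contributes price × shares outstanding × float factor, and the total is scaled by an index divisor. The constituents, the float factors and the divisor below are made up for illustration; the real index uses a proprietary divisor that is not reproduced here.

```python
def float_adjusted_cap(price: float, shares_outstanding: float, float_factor: float) -> float:
    """Market capitalization counting only the publicly tradable share of the company."""
    return price * shares_outstanding * float_factor


def index_level(constituents, divisor: float) -> float:
    """Sum of float-adjusted market caps, scaled by a divisor (hypothetical value here)."""
    total = sum(float_adjusted_cap(*c) for c in constituents)
    return total / divisor


# (price, shares outstanding, float factor) for two invented companies
demo_constituents = [(10.0, 100, 1.0), (20.0, 50, 0.5)]
demo_level = index_level(demo_constituents, divisor=15.0)
```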

In [7]:
# https://data.nasdaq.com/data/MULTPL/SP500_INFLADJ_MONTH-sp-500-inflation-adjusted-by-month

df_sp500 = quandl.get('MULTPL/SP500_INFLADJ_MONTH')
df_sp500.rename(columns={'Value': 'S&P 500'}, inplace=True, errors='raise')
clean_df(df_sp500,'Date')
# Some months carry two observations (beginning and end of month) under the same date; keep only the first one
df_sp500 = df_sp500.sort_values('Date').drop_duplicates('Date', keep='first')
df_sp500

Unnamed: 0,Date,S&P 500
0,1871-01-01,105.76
1,1871-02-01,104.02
2,1871-03-01,105.01
3,1871-04-01,112.01
4,1871-05-01,117.56
...,...,...
1842,2022-07-01,3918.75
1843,2022-08-01,4167.51
1845,2022-09-01,3850.52
1847,2022-10-01,3726.05
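
The de-duplication step can be checked on a toy frame. This is a sketch under the assumption that the duplicates come from start-of-month and end-of-month observations; the values are invented, and the month key is derived explicitly so that "first" unambiguously means the earliest observation in each month.

```python
import pandas as pd

# Two observations per month, deliberately out of order (invented values)
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2022-09-30", "2022-09-01", "2022-10-31", "2022-10-01"]),
    "Value": [11.0, 10.0, 21.0, 20.0],
})

monthly = (
    raw.sort_values("Date")  # earliest observation of each month comes first
       .assign(Month=lambda d: d["Date"].dt.to_period("M").dt.to_timestamp())
       .drop_duplicates("Month", keep="first")  # keep one row per month
       .reset_index(drop=True)
)
```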


### 4. University of Michigan Consumer Survey, Index of Consumer Sentiment 1952-2022 - Quarterly/Annual


The US Index of Consumer Sentiment (ICS), as provided by University of Michigan, tracks consumer sentiment in the US, based on surveys on random samples of US households. The index aids in measuring consumer sentiments in personal finances, business conditions, among other topics. Historically, the index displays pessimism in consumers' confidence during recessionary periods, and increased consumer confidence in expansionary periods [Source](https://ycharts.com/indicators/us_consumer_sentiment_index#:~:text=US%20Index%20of%20Consumer%20Sentiment%20is%20at%20a%20current%20level,15.73%25%20from%20one%20year%20ago).

In [8]:
# https://data.nasdaq.com/data/UMICH/SOC1-university-of-michigan-consumer-surveyindex-of-consumer-sentiment

df_conssen = quandl.get('UMICH/SOC1')
df_conssen.rename(columns={'Index': 'Cons. Sent.'}, inplace=True, errors='raise')
clean_df(df_conssen,'Date')

Unnamed: 0,Date,Cons. Sent.
0,1952-11-01,86.2
1,1953-02-01,90.7
2,1953-08-01,80.8
3,1953-11-01,80.7
4,1954-02-01,82.0
...,...,...
625,2022-06-01,50.0
626,2022-07-01,51.5
627,2022-08-01,58.2
628,2022-09-01,58.6


### 5. Big Mac Index United States - 2000-2022 - Semi-Annually 

The Big Mac index was invented by The Economist in 1986 as a lighthearted guide to whether currencies are at their “correct” level. It is based on the theory of purchasing-power parity (PPP), the notion that in the long run exchange rates should move towards the rate that would equalise the prices of an identical basket of goods and services (in this case, a burger) in any two countries [Source](https://www.economist.com/big-mac-index).
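
The PPP reasoning can be sketched in a few lines: the implied exchange rate is the local burger price divided by the US price, and comparing it with the actual rate suggests whether the currency looks over- or undervalued. The prices and exchange rate below are invented, and the function names are illustrative.

```python
def implied_ppp_rate(local_price: float, us_price: float) -> float:
    """Exchange rate (local currency per USD) that would equalise the burger price."""
    return local_price / us_price


def valuation_vs_dollar(local_price: float, us_price: float, actual_rate: float) -> float:
    """Positive means the local currency looks overvalued against the dollar, negative undervalued."""
    return implied_ppp_rate(local_price, us_price) / actual_rate - 1.0


# Invented example: a burger costs 400 local units and 5.00 USD,
# while the market exchange rate is 100 local units per USD
demo_rate = implied_ppp_rate(400.0, 5.0)
demo_valuation = valuation_vs_dollar(400.0, 5.0, actual_rate=100.0)
```

In this made-up case the implied rate is 80 per dollar against a market rate of 100, so the local currency would look about 20% undervalued.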

In [9]:
# https://data.nasdaq.com/data/ECONOMIST/BIGMAC_USA-big-mac-index-united-states

df_bigmac_us = quandl.get('ECONOMIST/BIGMAC_USA')
df_bigmac_us.rename(columns={'dollar_price': 'US Big Mac'}, inplace=True, errors='raise')
# Keep only the dollar price; drop every other column of the dataset
df_bigmac_us.drop(columns=['local_price', 'dollar_ex', 'dollar_ppp', 'dollar_valuation', 'dollar_adj_valuation',
                           'euro_adj_valuation', 'sterling_adj_valuation', 'yen_adj_valuation', 'yuan_adj_valuation'],
                  inplace=True)
clean_df(df_bigmac_us,'Date')

Unnamed: 0,Date,US Big Mac
0,2000-04-01,2.51
1,2001-04-01,2.54
2,2002-04-01,2.49
3,2003-04-01,2.71
4,2004-05-01,2.9
5,2005-06-01,3.06
6,2006-01-01,3.15
7,2006-05-01,3.1
8,2007-01-01,3.22
9,2007-06-01,3.41


### 6. Consumer Price Index USA  1913-2022 - Monthly

The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. Indexes are available for the U.S. and various geographic areas. Average price data for select utility, automotive fuel, and food items are also available [Source](https://www.bls.gov/cpi/).
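
Since the CPI is an index level, the "average change over time" is usually read as a percentage change. One common way to get a year-over-year inflation rate from a monthly series is sketched below on synthetic data (a flat index followed by a 3% step); it is not applied to the downloaded series here.

```python
import pandas as pd

# Synthetic monthly index: flat at 100 for a year, then 103
cpi = pd.Series(
    [100.0] * 12 + [103.0],
    index=pd.date_range("2000-01-01", periods=13, freq="MS"),
    name="CPI",
)

# Percentage change versus the same month one year earlier
yoy_inflation = cpi.pct_change(periods=12) * 100
```

The first twelve values are `NaN` because no observation exists one year earlier.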

In [10]:
# https://data.nasdaq.com/data/RATEINF/CPI_USA-consumer-price-index-usa
df_cpi_us = quandl.get('RATEINF/CPI_USA')
df_cpi_us.rename(columns={'Value': 'CPI'}, inplace=True, errors='raise')
clean_df(df_cpi_us,'Date')

Unnamed: 0,Date,CPI
0,1913-01-01,9.800
1,1913-02-01,9.800
2,1913-03-01,9.800
3,1913-04-01,9.800
4,1913-05-01,9.700
...,...,...
1313,2022-06-01,296.311
1314,2022-07-01,296.276
1315,2022-08-01,296.171
1316,2022-09-01,296.808


### 7. Real estate loans; Residential real estate loans; Revolving home equity loans, all commercial banks, seasonally adjusted 1987-2022 - Monthly


In [11]:
# https://data.nasdaq.com/data/FED/B1027NCBAM-real-estate-loans-residential-real-estate-loans-revolving-home-equity-loans-all-commercial-banks-seasonally-adjusted-monthly

df_re_us = quandl.get('FED/B1027NCBAM')
df_re_us.rename(columns={'Value': 'RE Loan'}, inplace=True, errors='raise')
clean_df(df_re_us,'Date')

Unnamed: 0,Date,RE Loan
0,1987-07-01,25001.4
1,1987-08-01,26182.3
2,1987-09-01,27443.4
3,1987-10-01,28477.1
4,1987-11-01,29571.7
...,...,...
419,2022-06-01,248017.4
420,2022-07-01,247933.0
421,2022-08-01,247482.8
422,2022-09-01,248786.1


### 8. Initial Claims 1967-2022 - Weekly 

Jobless claims are a statistic reported weekly by the U.S. Department of Labor that counts people filing to receive unemployment insurance benefits. There are two categories of jobless claims—initial, which comprises people filing for the first time, and continuing, which consists of unemployed people who have already been receiving unemployment benefits. Jobless claims are an important leading indicator of the state of the employment situation and the health of the economy [Source](https://www.investopedia.com/terms/j/jobless-claims.asp).

In [12]:
# https://data.nasdaq.com/data/FRED/ICSA-initial-claims

df_incl_us = quandl.get('FRED/ICSA')
df_incl_us.rename(columns={'Value': 'Initial Claims'}, inplace=True, errors='raise')

# Convert the weekly series into a monthly average of initial claims
df_incl_us.reset_index(drop=False, inplace=True)
df_incl_us['Date'] = pd.to_datetime(df_incl_us['Date'])
df_incl_us = df_incl_us.resample('M', on='Date').mean()

clean_df(df_incl_us,'Date')

Unnamed: 0,Date,Initial Claims
0,1967-01-01,209000.0
1,1967-02-01,229000.0
2,1967-03-01,260750.0
3,1967-04-01,263000.0
4,1967-05-01,235750.0
...,...,...
657,2021-10-01,294000.0
658,2021-11-01,240000.0
659,2021-12-01,199750.0
660,2022-01-01,245600.0
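
The weekly-to-monthly averaging can be verified on a small synthetic series. This sketch uses the `'MS'` alias so that each month is labelled by its first day, matching the dates shown in the output above; the cell itself uses `'M'` (month-end labels), and the shift to the first of the month presumably happens inside `clean_df`, which is an assumption.

```python
import pandas as pd

# Eight invented weekly observations: five in January 2021, three in February 2021
weekly = pd.DataFrame({
    "Date": pd.date_range("2021-01-02", periods=8, freq="W-SAT"),
    "Initial Claims": [100.0, 200.0, 300.0, 400.0, 500.0, 100.0, 200.0, 300.0],
})

# Average all weeks that fall in the same calendar month
monthly_claims = weekly.resample("MS", on="Date").mean()
```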


### 9. Total Revolving Credit Owned and Securitized, Outstanding 1968-2022 - Monthly

Revolving credit is an agreement that permits an account holder to borrow money repeatedly up to a set dollar limit while repaying a portion of the current balance due in regular payments. Each payment, minus the interest and fees charged, replenishes the amount available to the account holder [Source](https://www.investopedia.com/terms/r/revolvingcredit.asp#:~:text=Revolving%20credit%20is%20an%20agreement,available%20to%20the%20account%20holder.).
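
The replenishment mechanism described above can be written out as a short calculation. The function name and the figures are illustrative assumptions.

```python
def available_after_payment(limit: float, balance: float,
                            payment: float, interest_and_fees: float) -> float:
    """Credit available once a payment has been applied to the account."""
    principal_repaid = payment - interest_and_fees  # only this part reduces the balance
    new_balance = balance - principal_repaid
    return limit - new_balance


# Invented example: 1000 limit, 600 owed, 100 paid of which 20 covers interest and fees
demo_available = available_after_payment(limit=1000.0, balance=600.0,
                                         payment=100.0, interest_and_fees=20.0)
```

Before the payment 400 was available; the 80 of principal repaid brings it to 480.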

In [13]:
# https://data.nasdaq.com/data/FRED/REVOLSL-total-revolving-credit-owned-and-securitized-outstanding
df_credit_us = quandl.get('FRED/REVOLSL')
df_credit_us.rename(columns={'Value': 'Revolving Credit'}, inplace=True, errors='raise')

clean_df(df_credit_us,'Date')

Unnamed: 0,Date,Revolving Credit
0,1968-01-01,1.31677
1,1968-02-01,1.35878
2,1968-03-01,1.41275
3,1968-04-01,1.47766
4,1968-05-01,1.53842
...,...,...
643,2021-08-01,1001.19737
644,2021-09-01,1011.00178
645,2021-10-01,1017.06997
646,2021-11-01,1036.35653


### 10. ISM Manufacturing: PMI Composite Index 1948-2022 - Monthly

The ISM manufacturing index, also known as the purchasing managers' index (PMI), is a monthly indicator of U.S. economic activity based on a survey of purchasing managers at more than 300 manufacturing firms. It is considered to be a key indicator of the state of the U.S. economy [Source](https://www.investopedia.com/terms/i/ism-mfg.asp).

In [14]:
# https://data.nasdaq.com/data/FRED/NAPM-ism-manufacturing-pmi-composite-index
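df_pmi_us = quandl.get('FRED/NAPM')
# Unlike the other series, this one is labelled 'DATE'/'VALUE', so the index is moved into a column before renaming
df_pmi_us.reset_index(inplace=True)
df_pmi_us.rename(columns={'VALUE': 'PMI', 'DATE' : 'Date'}, inplace=True, errors='raise')
clean_df(df_pmi_us,'Date')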

Unnamed: 0,index,Date,PMI
0,0,1948-01-01,51.7
1,1,1948-02-01,50.2
2,2,1948-03-01,43.3
3,3,1948-04-01,45.4
4,4,1948-05-01,49.5
...,...,...,...
816,816,2016-01-01,48.2
817,817,2016-02-01,49.5
818,818,2016-03-01,51.8
819,819,2016-04-01,50.8
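
The extra `index` column in the output above is consistent with the index being reset twice: once in the cell and, presumably, once more inside `clean_df` (its source is not shown, so this is an assumption). The sketch below reproduces the effect on a two-row frame and shows one possible way to avoid it by renaming the index instead of resetting it early.

```python
import pandas as pd

# Two rows shaped like the raw download: a 'DATE' index and a 'VALUE' column
raw_pmi = pd.DataFrame(
    {"VALUE": [51.7, 50.2]},
    index=pd.DatetimeIndex(["1948-01-01", "1948-02-01"], name="DATE"),
)

# Resetting twice turns the default integer index into a stray 'index' column
reset_twice = raw_pmi.reset_index().reset_index()

# Renaming the index first means a single reset yields only 'Date' and 'PMI'
reset_once = (
    raw_pmi.rename_axis("Date")
           .rename(columns={"VALUE": "PMI"})
           .reset_index()
)
```

If the stray column is unwanted in the merged file, dropping it with `df.drop(columns="index")` before merging would be another option.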


### Merging the different DataFrames into one big DataFrame

In [15]:
# Compile the lists of DataFrames to merge

# DataFrames with monthly values
data_frames = [df_m2_us, df_sp500, df_cpi_us, df_re_us, df_incl_us, df_credit_us, df_pmi_us]

# Remaining DataFrames (mixed frequencies), plus the S&P 500
data_frames_2 = [df_sp500, df_gdp_us, df_conssen, df_bigmac_us]


# Outer-merge each list on 'Date' so that no observation is lost, then sort chronologically and save
df_merged_quandl = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how='outer'), data_frames)
df_merged_quandl = df_merged_quandl.sort_values(by=['Date'])
df_merged_quandl.reset_index(drop=True, inplace=True)
df_merged_quandl.to_csv(os.path.join('cleaned_data', '02. Quandl', 'df_merged_quandl.csv'), index=False, header=True)

df_merged_2_quandl = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how='outer'), data_frames_2)
df_merged_2_quandl = df_merged_2_quandl.sort_values(by=['Date'])
df_merged_2_quandl.reset_index(drop=True, inplace=True)
df_merged_2_quandl.to_csv(os.path.join('cleaned_data', '02. Quandl', 'df_merged_2_quandl.csv'), index=False, header=True)
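
To see what the `reduce` plus outer-merge pattern does, here is a self-contained sketch on three tiny frames with invented values. Dates missing from a given frame show up as `NaN` in its column, which is why the merged tables are sparse in the early years.

```python
from functools import reduce

import pandas as pd

a = pd.DataFrame({"Date": pd.to_datetime(["2000-01-01", "2000-02-01"]), "A": [1.0, 2.0]})
b = pd.DataFrame({"Date": pd.to_datetime(["2000-02-01", "2000-03-01"]), "B": [20.0, 30.0]})
c = pd.DataFrame({"Date": pd.to_datetime(["2000-01-01"]), "C": [100.0]})

# Fold the list pairwise: ((a merge b) merge c), keeping every date seen in any frame
demo_merged = (
    reduce(lambda left, right: pd.merge(left, right, on="Date", how="outer"), [a, b, c])
    .sort_values("Date")
    .reset_index(drop=True)
)
```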


In [16]:
df_merged_quandl

Unnamed: 0,Date,M2,S&P 500,CPI,RE Loan,Initial Claims,Revolving Credit,index,PMI
0,1871-01-01,,105.76,,,,,,
1,1871-02-01,,104.02,,,,,,
2,1871-03-01,,105.01,,,,,,
3,1871-04-01,,112.01,,,,,,
4,1871-05-01,,117.56,,,,,,
...,...,...,...,...,...,...,...,...,...
1818,2022-07-01,21578.8,3918.75,296.276,247933.0,,,,
1819,2022-08-01,21546.4,4167.51,296.171,247482.8,,,,
1820,2022-09-01,21459.5,3850.52,296.808,248786.1,,,,
1821,2022-10-01,21362.5,3726.05,298.012,250692.6,,,,


In [17]:
df_merged_2_quandl

Unnamed: 0,Date,S&P 500,GDP,Cons. Sent.,US Big Mac
0,1871-01-01,105.76,,,
1,1871-02-01,104.02,,,
2,1871-03-01,105.01,,,
3,1871-04-01,112.01,,,
4,1871-05-01,117.56,,,
...,...,...,...,...,...
1818,2022-07-01,3918.75,,51.5,5.15
1819,2022-08-01,4167.51,,58.2,
1820,2022-09-01,3850.52,,58.6,
1821,2022-10-01,3726.05,,59.9,
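
Because the series start and end at different dates, it can help to summarise the coverage of each column before using the merged tables. A possible sketch, on an invented frame rather than the real output:

```python
import pandas as pd

demo = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"]),
    "X": [1.0, 2.0, 3.0, 4.0],
    "Y": [float("nan"), float("nan"), 5.0, float("nan")],
})

indexed = demo.set_index("Date")

# First and last date with data, and number of non-missing values, per column
coverage = pd.DataFrame({
    "first": indexed.apply(lambda s: s.first_valid_index()),
    "last": indexed.apply(lambda s: s.last_valid_index()),
    "non_null": indexed.count(),
})
```

The same three lines could be run on `df_merged_quandl` or `df_merged_2_quandl` after setting `Date` as the index.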
