# Introduction

Google Trends gives us an index of relative search popularity rather than raw search volume. Let's explore whether search popularity relates to other kinds of data. Perhaps search popularity moves together with the price of Bitcoin or a hot stock like Tesla. Perhaps the search popularity of the term "Unemployment Benefits" can tell us something about the actual unemployment rate?

Data Sources: <br>
<ul>
<li> <a href="https://fred.stlouisfed.org/series/UNRATE/">Unemployment Rate from FRED</a></li>
<li> <a href="https://trends.google.com/trends/explore">Google Trends</a> </li>  
<li> <a href="https://finance.yahoo.com/quote/TSLA/history?p=TSLA">Yahoo Finance for Tesla Stock Price</a> </li>    
<li> <a href="https://finance.yahoo.com/quote/BTC-USD/history?p=BTC-USD">Yahoo Finance for Bitcoin Price</a> </li>
</ul>

# Import Statements

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Read the Data

Download and add the .csv files to the same folder as your notebook.

In [None]:
df_tesla = pd.read_csv('TESLA Search Trend vs Price.csv')

df_btc_search = pd.read_csv('Bitcoin Search Trend.csv')
df_btc_price = pd.read_csv('Daily Bitcoin Price.csv')

df_unemployment = pd.read_csv('UE Benefits Search vs UE Rate 2004-19.csv')

# Data Exploration

### Tesla

**Challenge**: <br>
<ul>
<li>What are the shapes of the dataframes? </li>
<li>How many rows and columns? </li>
<li>What are the column names? </li>
<li>Complete the f-string to show the largest/smallest number in the search data column</li> 
<li>Try the <code>.describe()</code> function to see some useful descriptive statistics</li>
<li>What is the periodicity of the time series data (daily, weekly, monthly)? </li>
<li>What does a value of 100 in the Google Trend search popularity actually mean?</li>
</ul>

In [None]:
print(df_tesla.head())
print(df_btc_search.head())
print(df_btc_price.head())
print(df_unemployment.head())

In [None]:
print(df_tesla.shape)
print(df_btc_search.shape)
print(df_btc_price.shape)
print(df_unemployment.shape)

In [None]:
print(df_tesla.describe())
print(df_btc_search.describe())
print(df_btc_price.describe())
print(df_unemployment.describe())
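The f-string challenge in the list above asks for the largest and smallest values in a search column. A minimal sketch, using a small hypothetical DataFrame that borrows the notebook's column names (the values are made up, not read from the CSV):

```python
import pandas as pd

# Hypothetical sample data mimicking the layout of 'TESLA Search Trend vs Price.csv'
df_sample = pd.DataFrame({
    'MONTH': ['2010-06-01', '2010-07-01', '2010-08-01', '2010-09-01'],
    'TSLA_WEB_SEARCH': [5, 31, 12, 8],
    'TSLA_USD_CLOSE': [10.0, 11.5, 9.8, 12.1],
})

# Column-wise max/min, interpolated directly into the f-strings
largest = df_sample['TSLA_WEB_SEARCH'].max()
smallest = df_sample['TSLA_WEB_SEARCH'].min()

print(f'Largest value for Tesla in Web Search: {largest}')
print(f'Smallest value for Tesla in Web Search: {smallest}')
```

On the real data, `df_sample` becomes `df_tesla`; the same pattern works for `BTC_NEWS_SEARCH` and `UE_BENEFITS_WEB_SEARCH`.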

* Search values represent search interest relative to the highest point for the chosen region and time period.
* i.e. 100 = peak popularity, 50 = half as popular as the peak, etc.
* Actual search volumes are not published.
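Google describes the idea behind the index in its help pages: each data point is divided by the total searches for that region and time range, and the results are rescaled so the highest point becomes 100. The sketch below illustrates that scaling with hypothetical raw counts (Google never exposes these); it is an approximation for intuition, not Google's actual pipeline.

```python
import pandas as pd

# Hypothetical monthly counts; Google Trends never publishes these raw numbers
raw = pd.DataFrame(
    {
        'term_searches': [200, 400, 800, 300],
        'total_searches': [10_000, 10_000, 20_000, 15_000],
    },
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

# Step 1: the term's share of all searches in that region and period
share = raw['term_searches'] / raw['total_searches']

# Step 2: rescale so the peak share becomes 100
trend_index = (100 * share / share.max()).round().astype(int)
print(trend_index)
```

March has the most raw searches, yet it only ties February at 100 because total searches also doubled. A value of 100 marks peak *relative* popularity, not the largest absolute number of searches.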

# Data Cleaning

### Check for Missing Values

**Challenge**: Are there any missing values in any of the dataframes? If so, which row/rows have missing values? How many missing values are there?

In [None]:
df_tesla.isna().sum()

In [None]:
df_btc_search.isna().sum()

In [None]:
df_unemployment.isna().sum()

In [None]:
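print(f'Missing values for Tesla?: {df_tesla.isna().values.any()}')
print(f'Missing values for U/E?: {df_unemployment.isna().values.any()}')
print(f'Missing values for BTC Search?: {df_btc_search.isna().values.any()}')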

In [None]:
df_btc_price.isna().sum()

In [None]:
df_btc_price[df_btc_price.isna().any(axis=1)]

In [None]:
print(f'Missing values for BTC price?: {df_btc_price.isna().values.any()}')

In [None]:
print(f'Number of missing values: {df_btc_price.isna().values.sum()}')

**Challenge**: Remove any missing values that you found. 

In [None]:
df_btc_price.dropna(inplace=True)
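The whole missing-value workflow (detect, count, locate, drop) can be rehearsed on a tiny hypothetical price table that reuses the notebook's `DATE` and `CLOSE` column names; the numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with one gap
prices = pd.DataFrame({
    'DATE': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'],
    'CLOSE': [100.0, np.nan, 102.5, 101.0],
})

has_missing = prices.isna().values.any()            # one Boolean for the whole frame
n_missing = prices.isna().values.sum()              # total number of NaN cells
missing_rows = prices[prices.isna().any(axis=1)]    # rows with at least one NaN

print(f'Missing values?: {has_missing}')
print(f'Number of missing values: {n_missing}')
print(missing_rows)

# Drop incomplete rows (returns a new frame instead of using inplace=True)
prices_clean = prices.dropna()
print(prices_clean.shape)
```

`.values.any()` collapses the frame to a single Boolean, while `.any(axis=1)` keeps one Boolean per row, which is what a row filter needs.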

### Convert Strings to DateTime Objects

**Challenge**: Check the data type of the entries in the DataFrame MONTH or DATE columns. Convert any strings into datetime objects. Do this for all 4 DataFrames. Double check whether your type conversion was successful.

In [None]:
df_tesla['MONTH'] = pd.to_datetime(df_tesla['MONTH'])
df_btc_price['DATE'] = pd.to_datetime(df_btc_price['DATE'])
df_btc_search['MONTH'] = pd.to_datetime(df_btc_search['MONTH'])
df_unemployment['MONTH'] = pd.to_datetime(df_unemployment['MONTH'])
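The challenge also asks to double check the conversion. A minimal sketch on a hypothetical frame with month strings, showing two ways to confirm the column type changed:

```python
import pandas as pd

# Hypothetical month strings, as they arrive from read_csv
df = pd.DataFrame({'MONTH': ['2004-01-01', '2004-02-01', '2004-03-01']})
print(type(df['MONTH'][0]))   # a plain str before conversion

df['MONTH'] = pd.to_datetime(df['MONTH'])

# Check 1: individual entries are now pandas Timestamps
print(type(df['MONTH'][0]))
# Check 2: the column dtype is a datetime64 type
print(pd.api.types.is_datetime64_any_dtype(df['MONTH']))
```

Running the same dtype check on `df_tesla['MONTH']`, `df_btc_search['MONTH']`, `df_btc_price['DATE']` and `df_unemployment['MONTH']` confirms all four conversions.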


### Converting from Daily to Monthly Data

[Pandas .resample() documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) <br>


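#### Resampling Purpose:
- Bitcoin price is daily data
- Bitcoin search is monthly data
- The daily data must be converted to monthly data so the two series line up. Common rule aliases:
    - 'M' : Monthly (month end; pandas 2.2+ prefers 'ME')
    - 'Y' : Yearly (year end; pandas 2.2+ prefers 'YE')
    - 'T' : Minute (pandas 2.2+ prefers 'min')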
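Taking the last price of each month and taking the monthly average give different numbers. A tiny hypothetical daily series that straddles a month boundary makes the difference visible. `pd.offsets.MonthEnd()` is used as the rule here because it means "month end" in every pandas version, whereas the string alias moved from 'M' to 'ME' in pandas 2.2.

```python
import pandas as pd

# Hypothetical daily closes: Jan 30, Jan 31, Feb 1, Feb 2
daily = pd.DataFrame({
    'DATE': pd.date_range('2021-01-30', periods=4, freq='D'),
    'CLOSE': [100.0, 110.0, 120.0, 130.0],
})

monthly = daily.resample(pd.offsets.MonthEnd(), on='DATE')

# Last available price in each month (what the notebook uses)
month_end_last = monthly['CLOSE'].last()
# Average price over each month (the commented-out alternative)
month_mean = monthly['CLOSE'].mean()

print(month_end_last)
print(month_mean)
```

Both results are indexed by month-end dates. The last-price version suits "where did the month close?", while the mean smooths out intra-month swings.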

In [None]:
# Last available price (Take price at month end)
df_btc_price_monthly = df_btc_price.resample(rule='M', on='DATE').last()

# Alternative: average price over the entire month
# df_btc_price_monthly = df_btc_price.resample(rule='M', on='DATE').mean()

In [None]:
print(df_btc_price_monthly.shape)
print(df_btc_search.shape)

# Data Visualisation

### Notebook Formatting & Style Helpers

In [None]:
# Create locators for ticks on the time axis

In [None]:
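# Register pandas date converters with matplotlib to avoid warning messages
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()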

### Tesla Stock Price vs. Search Volume

**Challenge:** Plot the Tesla stock price against the Tesla search volume using a line chart and two different axes. Label one axis 'TSLA Stock Price' and the other 'Search Trend'. 

In [None]:
df_tesla.head()

In [None]:
years = mdates.YearLocator()
months = mdates.MonthLocator()
years_fmt = mdates.DateFormatter('%Y')

In [None]:
fig = plt.figure(figsize=(16, 8))
ax1 = plt.gca()
ax2 = ax1.twinx() # both share same x-axis
plt.title('Tesla Web Search vs Price', fontsize=18, color='white')


ax1.tick_params(axis='x', colors='white', labelsize=14, size=14, rotation=45)
ax1.tick_params(axis='y', colors='white', labelsize=14, size=14)
ax2.tick_params(axis='both', colors='white', labelsize=14, size=14)

ax1.plot(df_tesla['MONTH'], df_tesla['TSLA_USD_CLOSE'], color='red')
ax2.plot(df_tesla['MONTH'], df_tesla['TSLA_WEB_SEARCH'], color='blue')

ax1.set_xlabel('Year', color='white', fontsize=18)
ax1.set_ylabel('TSLA Stock Price', color='coral', fontsize=14)
ax2.set_ylabel('TSLA Search Trend', color='skyblue', fontsize=14)

ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_minor_locator(months)
ax1.tick_params(which='minor', color='white', size=8)

ax1.xaxis.set_major_formatter(years_fmt)
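The Tesla, Bitcoin and unemployment charts repeat the same twin-axis setup. One way to factor it out is a small helper. `plot_dual_axis` is a hypothetical name (not part of matplotlib), and the sketch drops the notebook's white text styling so the output stays readable on a light background.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd


def plot_dual_axis(x_left, y_left, x_right, y_right, title,
                   left_label, right_label, left_style='-', right_marker=None):
    """Hypothetical helper: two series with a shared x-axis and separate y-axes."""
    fig, ax1 = plt.subplots(figsize=(16, 8))
    ax2 = ax1.twinx()  # second y-axis that shares ax1's x-axis
    ax1.set_title(title, fontsize=18)

    ax1.plot(x_left, y_left, color='red', linestyle=left_style)
    ax2.plot(x_right, y_right, color='blue', marker=right_marker)

    ax1.set_xlabel('Year', fontsize=14)
    ax1.set_ylabel(left_label, color='red', fontsize=14)
    ax2.set_ylabel(right_label, color='blue', fontsize=14)

    # Labelled major ticks once a year, unlabelled minor ticks every month
    ax1.xaxis.set_major_locator(mdates.YearLocator())
    ax1.xaxis.set_minor_locator(mdates.MonthLocator())
    ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
    return fig, ax1, ax2


# Usage with made-up monthly data (month-start dates)
months = pd.date_range('2019-01-01', periods=24, freq='MS')
fig, ax1, ax2 = plot_dual_axis(
    months, [float(i) for i in range(24)],
    months, [i % 12 for i in range(24)],
    title='Example Search vs Price',
    left_label='Price', right_label='Search Trend',
    left_style='--', right_marker='o',
)
```

Separate x arguments for each side let the Bitcoin chart pass the resampled price index and the search `MONTH` column, e.g. `plot_dual_axis(df_btc_price_monthly.index, df_btc_price_monthly['CLOSE'], df_btc_search['MONTH'], df_btc_search['BTC_NEWS_SEARCH'], ...)`.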


### Bitcoin (BTC) Price vs. Search Volume

**Challenge**: Create the same chart for the Bitcoin Prices vs. Search volumes. <br>
1. Modify the chart title to read 'Bitcoin News Search vs Resampled Price' <br>
2. Change the y-axis label to 'BTC Price' <br>
3. Change the y- and x-axis limits to improve the appearance <br>
4. Investigate the [linestyles](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.plot.html ) to make the BTC price a dashed line <br>
5. Investigate the [marker types](https://matplotlib.org/3.2.1/api/markers_api.html) to make the search datapoints little circles <br>
6. Were big increases in searches for Bitcoin accompanied by big increases in the price?
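Step 3 of the challenge asks for tighter axis limits. A minimal sketch on made-up month-start prices: `set_xlim` accepts the datetime endpoints directly, and `set_ylim` takes keyword `bottom`/`top` values. The 10% headroom is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up month-start prices standing in for a resampled price series
dates = pd.date_range('2015-01-01', periods=6, freq='MS')
prices = pd.Series([300.0, 250.0, 270.0, 240.0, 230.0, 260.0], index=dates)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(prices.index, prices, color='red', linestyle='--')

# Trim the x-axis to the first and last data point so there is no empty margin
ax.set_xlim(prices.index.min(), prices.index.max())
# Start the price axis at zero and leave roughly 10% headroom above the peak
ax.set_ylim(bottom=0, top=prices.max() * 1.1)
```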

In [None]:
df_btc_price_monthly.head()

In [None]:
fig = plt.figure(figsize=(16, 8))
ax1 = plt.gca()
ax2 = ax1.twinx() # both share same x-axis
plt.title('Bitcoin News Search vs Resampled Price', fontsize=18, color='white')


ax1.tick_params(axis='x', colors='white', labelsize=14, size=14, rotation=45)
ax1.tick_params(axis='y', colors='white', labelsize=14, size=14)
ax2.tick_params(axis='both', colors='white', labelsize=14, size=14)

ax1.plot(df_btc_price_monthly.index, df_btc_price_monthly['CLOSE'], color='red', linestyle='--')
ax2.plot(df_btc_search['MONTH'], df_btc_search['BTC_NEWS_SEARCH'], color='blue', marker='o')

ax1.set_xlabel('Year', color='white', fontsize=18)
ax1.set_ylabel('BTC Monthly Close Price', color='coral', fontsize=14)
ax2.set_ylabel('BTC Monthly News Search', color='skyblue', fontsize=14)

ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_minor_locator(months)
ax1.tick_params(which='minor', color='white', size=8)

ax1.xaxis.set_major_formatter(years_fmt)

### Unemployment Benefits Search vs. Actual Unemployment in the U.S.

**Challenge**: Plot the search for "unemployment benefits" against the unemployment rate.
1. Change the title to: Monthly Search of "Unemployment Benefits" in the U.S. vs the U/E Rate <br>
2. Change the y-axis label to: FRED U/E Rate <br>
3. Change the axis limits <br>
4. Add a grey [grid](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.grid.html) to the chart to better see the years and the U/E rate values. Use dashes for the line style<br> 
5. Can you discern any seasonality in the searches? Is there a pattern? 
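Besides eyeballing the chart, one way to look for seasonality is to average the search score for each calendar month across all years. The series below is hypothetical, with a January bump built in purely to show the mechanics; any real pattern has to be read off `df_unemployment` itself.

```python
import pandas as pd

# Hypothetical two years of monthly search scores with a built-in January bump
df = pd.DataFrame({
    'MONTH': pd.date_range('2017-01-01', periods=24, freq='MS'),
    'UE_BENEFITS_WEB_SEARCH': [40, 30, 28, 27, 26, 25, 27, 26, 25, 26, 28, 32] * 2,
})

# Group by calendar month (1 = January ... 12 = December) and average across years
by_month = df.groupby(df['MONTH'].dt.month)['UE_BENEFITS_WEB_SEARCH'].mean()
print(by_month)
print(f'Month with the highest average search score: {by_month.idxmax()}')
```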

In [None]:
df_unemployment.head()

In [None]:
fig = plt.figure(figsize=(16, 8))
ax1 = plt.gca()
ax2 = ax1.twinx() # both share same x-axis
plt.title('Unemployment Benefits Search vs Unemployment Rate', fontsize=18, color='white')


ax1.tick_params(axis='x', colors='white', labelsize=14, size=14, rotation=45)
ax1.tick_params(axis='y', colors='white', labelsize=14, size=14)
ax2.tick_params(axis='both', colors='white', labelsize=14, size=14)

ax1.grid(color='grey', linestyle='--')
ax1.plot(df_unemployment['MONTH'], df_unemployment['UNRATE'], color='red', linestyle='--')
ax2.plot(df_unemployment['MONTH'], df_unemployment['UE_BENEFITS_WEB_SEARCH'], color='blue')

ax1.set_xlabel('Year', color='white', fontsize=18)
ax1.set_ylabel('Unemployment Rate', color='coral', fontsize=14)
ax2.set_ylabel('Unemployment Benefits Search', color='skyblue', fontsize=14)

ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_minor_locator(months)
ax1.tick_params(which='minor', color='white', size=8)

ax1.xaxis.set_major_formatter(years_fmt)

**Challenge**: Calculate the 3-month or 6-month rolling average for the web searches. Plot the 6-month rolling average search data against the actual unemployment. What do you see in the chart? Which line moves first?
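A rolling mean with `window=6` averages each month with the five months before it, so the first five results are NaN (pandas' default `min_periods` equals the window size). A small hypothetical series shows both window sizes from the challenge:

```python
import pandas as pd

# Hypothetical monthly search scores
search = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

rolling_3 = search.rolling(window=3).mean()  # first 2 values are NaN
rolling_6 = search.rolling(window=6).mean()  # first 5 values are NaN

print(pd.DataFrame({'search': search, 'rolling_3': rolling_3, 'rolling_6': rolling_6}))
```

A longer window gives a smoother line but reacts more slowly to turning points, which matters when asking which line moves first.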


In [None]:
fig = plt.figure(figsize=(16, 8))
ax1 = plt.gca()
ax2 = ax1.twinx() # both share same x-axis
plt.title('Rolling Average: Unemployment Benefits Search vs Unemployment Rate', fontsize=18, color='white')


ax1.tick_params(axis='x', colors='white', labelsize=14, size=14, rotation=45)
ax1.tick_params(axis='y', colors='white', labelsize=14, size=14)
ax2.tick_params(axis='both', colors='white', labelsize=14, size=14)

# 6-month rolling average, as the challenge asks (the first 5 rows are NaN)
rolling_df = df_unemployment[['UNRATE', 'UE_BENEFITS_WEB_SEARCH']].rolling(
    window=6
).mean()

ax1.grid(color='grey', linestyle='--')
ax1.plot(df_unemployment['MONTH'], rolling_df['UNRATE'], color='red', linestyle='--')
ax2.plot(df_unemployment['MONTH'], rolling_df['UE_BENEFITS_WEB_SEARCH'], color='blue')

ax1.set_xlabel('Year', color='white', fontsize=18)
ax1.set_ylabel('Unemployment Rate', color='coral', fontsize=14)
ax2.set_ylabel('Unemployment Benefits Search', color='skyblue', fontsize=14)

ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_minor_locator(months)
ax1.tick_params(which='minor', color='white', size=8)

ax1.xaxis.set_major_formatter(years_fmt)
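To answer "which line moves first?" more formally, one option is a lagged correlation: shift the search series forward by k months and see which k correlates best with the unemployment rate. This technique is swapped in here; the notebook itself only compares the lines visually. The data below is synthetic, built so that the "rate" copies the "search" series two months later.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: the rate repeats the search series two months later
search = pd.Series(np.sin(np.arange(48) / 3.0))
rate = search.shift(2)

# Correlate the rate with the search series shifted forward by 0..6 months;
# Series.corr drops the NaN rows created by shifting
lag_corr = pd.Series({lag: rate.corr(search.shift(lag)) for lag in range(7)})
best_lag = lag_corr.idxmax()

print(lag_corr.round(3))
print(f'Search leads the rate by about {best_lag} months')
```

On the real data the same loop would use `df_unemployment['UNRATE'].corr(df_unemployment['UE_BENEFITS_WEB_SEARCH'].shift(lag))`. A strong correlation at a positive lag suggests, but does not prove, that searches lead the rate.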

### Including 2020 in Unemployment Charts

**Challenge**: Read the data in the 'UE Benefits Search vs UE Rate 2004-20.csv' into a DataFrame. Convert the MONTH column to Pandas Datetime objects and then plot the chart. What do you see?
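Reading and converting the file can be wrapped in a small function so the same steps apply to both the 2004-19 and 2004-20 files. `load_ue_data` is a hypothetical helper, and the in-memory CSV below is a stand-in with made-up values that assumes the same `MONTH`, `UE_BENEFITS_WEB_SEARCH` and `UNRATE` columns as the earlier file.

```python
import io

import pandas as pd


def load_ue_data(source):
    """Hypothetical helper: read an unemployment CSV and convert MONTH to datetime."""
    df = pd.read_csv(source)
    df['MONTH'] = pd.to_datetime(df['MONTH'])
    return df


# Stand-in for 'UE Benefits Search vs UE Rate 2004-20.csv' (made-up values)
sample_csv = io.StringIO(
    'MONTH,UE_BENEFITS_WEB_SEARCH,UNRATE\n'
    '2020-01-01,10,1.0\n'
    '2020-02-01,20,2.0\n'
    '2020-03-01,30,3.0\n'
)
df_ue_2020 = load_ue_data(sample_csv)
print(df_ue_2020.dtypes)
```

With the real file, `load_ue_data('UE Benefits Search vs UE Rate 2004-20.csv')` returns a DataFrame that can go straight into the plotting code used for the 2004-19 data.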