# Create Data
In this notebook we'll prepare the data that Facebook's Prophet model needs to make predictions.

We'll download our stock data from *Yahoo*, a widely used source for this kind of data.

Let's import the necessary libraries:

In [11]:
# Imports
import os
os.chdir('../utils')
import pandas_datareader.data as reader
import datetime as dt
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from utils import condition_data

FileNotFoundError: [WinError 2] The system cannot find the file specified: '../utils'
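An error like the one above can appear, for example, when the cell is re-run after the working directory has already been changed. A minimal sketch of an alternative that is safe to re-run: it leaves the working directory alone and puts the helper folder on `sys.path` instead. The function name `add_to_path` is hypothetical, and the sketch assumes `condition_data` lives in `../utils/utils.py`.

```python
import sys
from pathlib import Path


def add_to_path(directory):
    """Put `directory` on sys.path once, without changing the working directory."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumption: the helper module is ../utils/utils.py relative to this notebook.
utils_dir = add_to_path('../utils')

# Uncomment in an environment where that file exists:
# from utils import condition_data
```

With this approach, relative paths such as `'../data/amazon.csv'` are resolved from the notebook's own folder rather than from `utils`.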

We'll set a start date and an end date for our data.

The start date is 20 years before today. If a ticker has less history than that, Yahoo returns data from its oldest available date instead. The end date is today.

In [3]:
# Establishing start-date and end-date
end = dt.datetime.now()
start = dt.datetime(end.year - 20, end.month, end.day)
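Subtracting years by rebuilding the date works for a 20-year window, but it can raise `ValueError` on February 29 if the window is changed to a length whose target year is not a leap year. A small sketch of a helper that falls back to February 28 in that case (the name `years_ago` is hypothetical, not part of the notebook's utilities):

```python
import datetime as dt


def years_ago(moment, years):
    """Return `moment` shifted back by `years` calendar years."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return moment.replace(year=moment.year - years, day=28)


end = dt.datetime.now()
start = years_ago(end, 20)
```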

In [4]:
# Requesting data
df_amazon = reader.get_data_yahoo('AMZN', start, end)
df_apple = reader.get_data_yahoo('AAPL', start, end)
df_bitcoin = reader.get_data_yahoo('BTC-USD', start, end)
df_ford = reader.get_data_yahoo('F', start, end)
df_microsoft = reader.get_data_yahoo('MSFT', start, end)
df_tesla = reader.get_data_yahoo('TSLA', start, end)
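The six requests above differ only in the ticker symbol, so they could also be driven from a single mapping. The sketch below is one possible way to do that. `TICKERS` and `fetch_all` are hypothetical names, and the download function is passed in as an argument so the helper itself does not depend on a network connection.

```python
# Dataset name -> Yahoo ticker symbol, as used in the cell above.
TICKERS = {
    'amazon': 'AMZN',
    'apple': 'AAPL',
    'bitcoin': 'BTC-USD',
    'ford': 'F',
    'microsoft': 'MSFT',
    'tesla': 'TSLA',
}


def fetch_all(tickers, fetch, start, end):
    """Call fetch(symbol, start, end) for every ticker and return {name: result}."""
    return {name: fetch(symbol, start, end) for name, symbol in tickers.items()}


# In this notebook it could be called as:
# frames = fetch_all(TICKERS, reader.get_data_yahoo, start, end)
# df_amazon = frames['amazon']
```

Keeping the frames in a dictionary also makes the later conditioning and saving steps easy to write as loops.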

Let's take a look at our data:

In [5]:
df_amazon

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2001-11-26 | 12.220000 | 9.800000 | 9.830000 | 12.210000 | 50689200 | 12.210000 |
| 2001-11-27 | 12.250000 | 11.220000 | 12.050000 | 11.480000 | 34308800 | 11.480000 |
| 2001-11-28 | 12.400000 | 11.180000 | 11.240000 | 11.590000 | 48516200 | 11.590000 |
| 2001-11-29 | 11.900000 | 10.790000 | 11.890000 | 11.150000 | 20274000 | 11.150000 |
| 2001-11-30 | 11.550000 | 10.800000 | 11.300000 | 11.320000 | 8888800 | 11.320000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2021-11-19 | 3762.149902 | 3675.719971 | 3712.689941 | 3676.570068 | 4936700 | 3676.570068 |
| 2021-11-22 | 3713.459961 | 3567.500000 | 3676.379883 | 3572.570068 | 4842200 | 3572.570068 |
| 2021-11-23 | 3621.050049 | 3527.709961 | 3585.040039 | 3580.040039 | 3690200 | 3580.040039 |
| 2021-11-24 | 3613.639893 | 3536.850098 | 3562.669922 | 3580.409912 | 2328000 | 3580.409912 |


Our DataFrames hold daily data for each stock.

To make predictions, we'll use only the `Close` price.

Facebook's Prophet model requires specific column names:
- `ds` for the date
- `y` for the target

We've built a function that does this automatically:

In [6]:
# Building the data
df_amazon = condition_data(df_amazon)
df_apple = condition_data(df_apple)
df_bitcoin = condition_data(df_bitcoin)
df_ford = condition_data(df_ford)
df_microsoft = condition_data(df_microsoft)
df_tesla = condition_data(df_tesla)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
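The message above is pandas' `SettingWithCopyWarning`, which is raised when a column is assigned on a DataFrame that pandas considers a slice of another one. The body of `condition_data` is not shown in this notebook, so the following is only a sketch of a helper that produces the same `ds`/`y` layout without assigning into a slice. The name `to_prophet_frame` and its `price_column` parameter are hypothetical.

```python
import pandas as pd


def to_prophet_frame(df, price_column='Close'):
    """Return a new DataFrame with the date index as `ds` and one price column as `y`."""
    # reset_index() returns a new frame whose first column is the former date index.
    out = df[[price_column]].reset_index()
    # rename() also returns a new frame, so nothing is assigned into a slice
    # and the input DataFrame is left untouched.
    return out.rename(columns={out.columns[0]: 'ds', price_column: 'y'})
```

Because the function builds and returns a new object, the raw download stays available under its original column names if it is needed later.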


Here is the data after conditioning:

In [7]:
df_amazon

| | ds | y |
|---|---|---|
| 0 | 2001-11-26 | 12.210000 |
| 1 | 2001-11-27 | 11.480000 |
| 2 | 2001-11-28 | 11.590000 |
| 3 | 2001-11-29 | 11.150000 |
| 4 | 2001-11-30 | 11.320000 |
| ... | ... | ... |
| 5032 | 2021-11-19 | 3676.570068 |
| 5033 | 2021-11-22 | 3572.570068 |
| 5034 | 2021-11-23 | 3580.040039 |
| 5035 | 2021-11-24 | 3580.409912 |


Finally, we save our datasets in CSV format:

In [8]:
# Saving our data
df_amazon.to_csv('../data/amazon.csv', index=False)
df_apple.to_csv('../data/apple.csv', index=False)
df_bitcoin.to_csv('../data/bitcoin.csv', index=False)
df_ford.to_csv('../data/ford.csv', index=False)
df_microsoft.to_csv('../data/microsoft.csv', index=False)
df_tesla.to_csv('../data/tesla.csv', index=False)
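CSV stores dates as plain text, so the `ds` column has to be parsed again when a file is loaded back. A short round-trip sketch on a two-row sample taken from the Amazon data shown earlier; the temporary file name `sample.csv` is illustrative only.

```python
import os
import tempfile

import pandas as pd

# Two rows in the ds/y layout used throughout this notebook.
sample = pd.DataFrame({
    'ds': pd.to_datetime(['2001-11-26', '2001-11-27']),
    'y': [12.21, 11.48],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    # index=False keeps the row numbers out of the file, as in the cell above.
    sample.to_csv(path, index=False)
    # parse_dates turns the text in `ds` back into datetime values.
    restored = pd.read_csv(path, parse_dates=['ds'])
```

Without `parse_dates`, `ds` would come back as strings, which is worth keeping in mind in whichever notebook loads these files.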