# Stock Market Time-Series Analysis and Forecasting in Python
------
### Introduction to the Stock Market
The New York Stock Exchange (NYSE), an American stock exchange, 
facilitates the buying and selling of shares in publicly 
listed companies. A public stock exchange allows businesses to
raise capital by selling shares of ownership. The value of a share to 
investors is reflected in its stock price. Many variables and uncertainties
can push a stock's price away from market equilibrium: overly optimistic or pessimistic 
conditions can drive a stock's value excessively high or low. This erratic 
behavior creates market risk. Investors look for stocks whose 
value is expected to rise and avoid stocks whose value is expected to fall.
Understanding stock price movement is essential for minimizing market risk.

### Objective
The goal of this project is to explore the knowledge discovery from data (KDD) process, 
applied to financial data, for several stocks in the technology sector (listed in the table below). 
Historical stock quotes are retrieved live from the Yahoo! Finance web service. 
The collected quotes are then formatted as a financial time series. 
This representation is well suited to computing statistical descriptions and producing visualizations of 
asset value over time. 

| Technology Stock | Ticker |
|---------------|--------------|
| Adv Micro Devices | (**AMD**) | 
| Cisco Systems Inc | (**CSCO**) | 
| Intel Corp | (**INTC**) | 
| Micron Technology | (**MU**) | 
| Nvidia Corp | (**NVDA**) |
| Oracle Corp | (**ORCL**) | 
| Qualcomm Inc | (**QCOM**) | 

The process of knowledge discovery from the financial data will be split into two parts:
   
**Part 1**: In the first part, interesting knowledge is discovered using statistical methods, which cover 
the collection, analysis, interpretation, and presentation of the data. A *statistical model* is a set of
mathematical functions describing the behavior of objects in terms of random variables and their associated
probability distributions. This project is based on the statistical model mentioned above, 
the time series. A central idea of the project is understanding a financial time series through 
*statistical descriptions*, which are used to identify the properties of the series and to find 
data values that are noise or outliers. This leads into the last concept of Part 1, *relevance analysis*, 
the first step in the data mining functionality Classification and Regression for Predictive Analysis. 
Relevance analysis attempts to identify the attributes that are most relevant to the predictive process. 

**Part 2**: The second part of the knowledge discovery from data (KDD) process is based on 
the *predictive analysis* concepts of the data mining functionality: Classification and Regression for 
Predictive Analysis. The predictive process is a type of *Supervised Learning* because the extracted 
dataset will serve as "supervision" for the learning process. Attributes from the preprocessed 
data in Part 1 will make up the training set for the Classification learning phase. A classifier will 
be constructed to predict a financial attribute.    


## (Part 1) Time Series Analysis
---------------
### Description
------
##### Definition:

A **time series** on a variable/attribute *a* is written *a<sub>t</sub>*, where the subscript *t* 
represents time. The first and last observations available on attribute *a* are at t = 1 and t = T.


The set of times t ∈ {1, 2, ..., T} is referred to as the *observation period*.
<pre>    
    Observations are typically measured at equally spaced intervals (the frequency), 
    e.g. every minute, hour, or day for financial data. 
</pre>

Essentially, a time series contains quantitative observations on one or more assessable characteristics of
an entity, taken at multiple points in time. 

For financial data, the mean level cannot be regarded as constant over time, so the series is said to be *nonstationary*.
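
As a rough illustration of a non-constant mean level, the sketch below builds a synthetic random walk with drift as a stand-in for a price series (the seed, drift, and starting value of 100 are arbitrary assumptions, not project data). It compares the mean of the first and second halves of the observation period, then does the same for the first differences a<sub>t</sub> - a<sub>t-1</sub>.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a price series: a random walk with upward drift.
# All parameters here are illustrative assumptions.
rng = np.random.default_rng(0)
steps = rng.normal(loc=0.5, scale=1.0, size=500)
price = pd.Series(100 + np.cumsum(steps))

# Mean level of the first and second halves of the observation period.
first_half_mean = price.iloc[:250].mean()
second_half_mean = price.iloc[250:].mean()

# First differences recover the individual steps, whose mean level
# stays roughly constant across the two halves.
diff = price.diff().dropna()
diff_first_mean = diff.iloc[:250].mean()
diff_second_mean = diff.iloc[250:].mean()

print("levels:     ", first_half_mean, second_half_mean)
print("differences:", diff_first_mean, diff_second_mean)
```

The two half-period means of the levels differ widely, while the means of the differenced series stay close to each other, which is why differencing is a common first step when working with nonstationary series.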
    
### Analysis:

Time series analysis applies statistical methods to explore and model the internal 
structures of time series data. 

Internal structures of interest include trend, seasonality, stationarity, and autocorrelation.

These structures require special formulations and techniques for their analysis.
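
Two of these structures, trend and autocorrelation, can be sketched with plain pandas. The example below uses a synthetic random walk (an assumption for illustration, not the project's data), a 20-observation rolling mean as a simple trend estimate, and `Series.autocorr` for the lag-1 autocorrelation.

```python
import numpy as np
import pandas as pd

# Synthetic random walk used only for illustration.
rng = np.random.default_rng(1)
series = pd.Series(50 + np.cumsum(rng.normal(size=300)))

# Trend: a rolling mean over 20 observations smooths short-term noise.
# The first 19 values are NaN because the window is not yet full.
trend = series.rolling(window=20).mean()

# Autocorrelation: correlation between a_t and a_(t-1).
lag1_levels = series.autocorr(lag=1)

# The first differences of a random walk are independent steps,
# so their lag-1 autocorrelation is close to zero.
lag1_diff = series.diff().dropna().autocorr(lag=1)

print(lag1_levels, lag1_diff)
```

The window length of 20 is an arbitrary choice; in practice it is picked to match the time scale of the trend being studied.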

##### Frequency

Financial data is a fixed-frequency time series, meaning the 
data points occur at regular intervals. This project focuses on financial time series with a daily 
frequency. Financial time series sampled at much higher frequencies are referred to as "high-frequency" or "tick-by-tick" data.
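
A minimal sketch of working with a fixed daily frequency in pandas follows. It assumes a business-day index (`freq='B'`, Monday to Friday, which does not exclude exchange holidays) and placeholder values rather than real quotes, then downsamples the daily series to a weekly one.

```python
import numpy as np
import pandas as pd

# Business-day index; 'B' skips weekends but NOT exchange holidays.
dates = pd.date_range(start='2019-04-01', end='2019-04-12', freq='B')

# Placeholder values 0.0, 1.0, ... standing in for daily closing prices.
close = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Downsample daily -> weekly, keeping the last observation of each week.
weekly_close = close.resample('W').last()

print(dates.freqstr)
print(weekly_close)
```

Keeping the last observation per bin mirrors how a weekly closing price is usually defined; other aggregations (`.mean()`, `.max()`) can be swapped in the same way.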


##### Time Series Package Imports


In [9]:
#  NumPy and Pandas imports
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

#  Reading time series
from pandas_datareader import data

#  Time stamps
import datetime

#  Visualization (sns is a visualization library based on matplotlib)
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline


### Getting Stock Price Quotes
-----
Each *historical stock quote* is a tuple (row) with six attributes.
A tuple can therefore be represented as a 6-dimensional attribute vector (High, Low, Open, Close, Volume, Adj Close).

Attribute Information:

*   High: The highest share price on Date
*   Low: The lowest share price on Date
*   Open: The opening share price on Date
*   Close: The closing share price on Date
*   Volume: The number of shares traded on Date
*   Adj Close: The closing price adjusted for stock splits and dividends

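Called with a single ticker, as in this project, `data.get_data_yahoo` returns a pandas *DataFrame*: 
a 2-dimensional table whose rows are indexed by date and whose columns are 
the six attributes Yahoo! Finance returns. When a list of tickers was requested in a single call, 
older versions of pandas-datareader returned a *Panel* object instead, a 3-dimensional structure with 
the ticker identifiers as the third dimension; Panel has since been removed from pandas. 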


In [27]:
#  Stock tickers to retrieve historical quotes for
ticker_index_data = ['AMD', 'CSCO', 'INTC', 'MU', 'NVDA', 'ORCL', 'QCOM']

#  Date range covering roughly 1.5 years of daily quotes
start, end = '2017-10-10', '2019-04-10'

#  Bind a DataFrame of historical Yahoo! Finance quotes to a global variable named after each ticker
for ticker in ticker_index_data:
    globals()[ticker] = data.get_data_yahoo(ticker, start, end)
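
Writing into `globals()` works in a notebook, but a dictionary keyed by ticker is a common alternative that keeps the quotes in one place. The sketch below shows that pattern only. `fetch_quotes` is a hypothetical stand-in for `data.get_data_yahoo` that returns placeholder zeros so the example runs offline; it is not part of pandas-datareader.

```python
import pandas as pd

COLUMNS = ['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close']

def fetch_quotes(ticker, start, end):
    """Hypothetical stand-in for data.get_data_yahoo.

    Returns a frame of placeholder zeros (not real quotes) with the
    same column layout and a business-day Date index.
    """
    index = pd.date_range(start, end, freq='B', name='Date')
    return pd.DataFrame(0.0, index=index, columns=COLUMNS)

tickers = ['AMD', 'CSCO', 'INTC', 'MU', 'NVDA', 'ORCL', 'QCOM']

# One DataFrame per ticker, stored under its ticker symbol.
quotes = {t: fetch_quotes(t, '2017-10-10', '2019-04-10') for t in tickers}

# Look up a single stock by key instead of by global variable name.
amd = quotes['AMD']

# pd.concat accepts the mapping directly and uses its keys
# as the outer level of the column index.
grouped = pd.concat(quotes, axis=1)
```

With this layout, adding or removing a ticker only changes the `tickers` list, and no variable names need to be kept in sync with it.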

The pandas DataFrame **.tail()** method shows the last five rows of stock 
quotes.

In [26]:
#  Show the five most recent results for AMD
AMD.tail()


| Date | High | Low | Open | Close | Volume | Adj Close |
|------------|-----------|-----------|-----------|-----------|----------|-----------|
| 2019-04-05 | 29.690001 | 28.799999 | 29.639999 | 28.98 | 65662700 | 28.98 |
| 2019-04-08 | 28.950001 | 28.18 | 28.690001 | 28.530001 | 58002500 | 28.530001 |
| 2019-04-09 | 28.379999 | 27.190001 | 28.24 | 27.24 | 75539800 | 27.24 |
| 2019-04-10 | 28.120001 | 27.32 | 27.459999 | 27.83 | 64368100 | 27.83 |
| 2019-04-11 | 28.049999 | 27.459999 | 27.809999 | 27.790001 | 44801200 | 27.790001 |


#### Group Technology Stock Price Quotes 
Use the pandas **concat()** function to concatenate pandas objects along an axis.


In [30]:
tech_stocks = pd.concat([AMD, CSCO, INTC, MU, NVDA, ORCL, QCOM], axis=1, keys=ticker_index_data)
#  The "axis" param of 1 represents concatenating along the column axis. 
#  The "keys" param is a hierarchical index for each technology stock ticker. 
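
To show how the hierarchical column index produced by `keys` can be queried, here is a small sketch. It rebuilds a two-stock, two-attribute version of the grouped frame from a few High and Close values quoted in this section; the variable names are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(['2017-10-10', '2017-10-11'])

# Two-row excerpts of the AMD and CSCO quotes (High and Close only).
amd = pd.DataFrame({'High': [13.79, 13.96], 'Close': [13.70, 13.88]}, index=dates)
csco = pd.DataFrame({'High': [33.91, 33.630001], 'Close': [33.549999, 33.59]}, index=dates)

# Same call shape as above: tickers become the outer column level.
grouped = pd.concat([amd, csco], axis=1, keys=['AMD', 'CSCO'])

# Outer level: all attributes of one stock, as an ordinary DataFrame.
amd_again = grouped['AMD']

# (ticker, attribute) tuple: a single column, as a Series.
amd_close = grouped[('AMD', 'Close')]

# Cross-section on the inner level: one attribute for every ticker.
all_close = grouped.xs('Close', axis=1, level=1)

print(all_close)
```

The cross-section form is convenient for comparing one attribute across all stocks, since the result has one column per ticker.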


Use the pandas DataFrame **.head()** method to display the first five rows of stock quotes for
the grouped DataFrame.

In [32]:
tech_stocks.head()


| Date | AMD High | AMD Low | AMD Open | AMD Close | AMD Volume | AMD Adj Close | CSCO High | CSCO Low | CSCO Open | CSCO Close | ... | ORCL Open | ORCL Close | ORCL Volume | ORCL Adj Close | QCOM High | QCOM Low | QCOM Open | QCOM Close | QCOM Volume | QCOM Adj Close |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-10-10 | 13.79 | 13.44 | 13.72 | 13.7 | 43304000 | 13.7 | 33.91 | 33.470001 | 33.880001 | 33.549999 | ... | 48.16 | 48.209999 | 15630500.0 | 47.049263 | 53.900002 | 52.900002 | 52.950001 | 53.869999 | 8761700.0 | 50.250084 |
| 2017-10-11 | 13.96 | 13.61 | 13.62 | 13.88 | 38746600 | 13.88 | 33.630001 | 33.25 | 33.380001 | 33.59 | ... | 48.16 | 48.279999 | 12588300.0 | 47.117577 | 54.380001 | 53.66 | 53.790001 | 54.119999 | 9427300.0 | 50.483288 |
| 2017-10-12 | 14.37 | 13.81 | 13.85 | 14.2 | 69874100 | 14.2 | 33.459999 | 33.169998 | 33.259998 | 33.259998 | ... | 48.27 | 48.23 | 11715600.0 | 47.068783 | 54.18 | 52.959999 | 53.880001 | 53.0 | 7062300.0 | 49.438553 |
| 2017-10-13 | 14.41 | 14.12 | 14.32 | 14.22 | 37515800 | 14.22 | 33.57 | 33.32 | 33.400002 | 33.470001 | ... | 48.369999 | 48.610001 | 10142300.0 | 47.439636 | 53.380001 | 52.740002 | 53.380001 | 52.82 | 7005600.0 | 49.270645 |
| 2017-10-16 | 14.35 | 14.12 | 14.25 | 14.26 | 34136800 | 14.26 | 33.639999 | 33.470001 | 33.599998 | 33.540001 | ... | 48.610001 | 48.860001 | 9378100.0 | 47.68362 | 53.0 | 52.310001 | 52.98 | 52.380001 | 5930900.0 | 48.86021 |


### Exploring the Data
-----

##### Data Structure
The pandas DataFrame **.shape** attribute returns the DataFrame's dimensions as a (rows, columns) tuple.

In [34]:
AMD.shape


(378, 6)

The pandas DataFrame **.dtypes** attribute returns a Series with the data type of
each column.


In [33]:
AMD.dtypes


High         float64
Low          float64
Open         float64
Close        float64
Volume         int64
Adj Close    float64
dtype: object

##### Descriptive Statistics
The pandas DataFrame **.describe()** method generates descriptive statistics summarizing the central tendency
(mean, and the median as the 50% percentile), dispersion (std, min, max, quartiles), and shape of each numeric column's distribution.
 
The output below shows the count, mean, std, min, max, and percentiles for each attribute in the AMD stock quotes. 

In [7]:
AMD.describe()


| Statistic | High | Low | Open | Close | Volume | Adj Close |
|-----------|-----------|-----------|----------|-----------|-------------|-----------|
| count | 378.0 | 378.0 | 378.0 | 378.0 | 378.0 | 378.0 |
| mean | 17.925212 | 17.053095 | 17.494392 | 17.497381 | 79278000.0 | 17.497381 |
| std | 6.445175 | 6.008648 | 6.23715 | 6.233647 | 44332420.0 | 6.233647 |
| min | 9.77 | 9.04 | 9.08 | 9.53 | 11035800.0 | 9.53 |
| 25% | 12.0425 | 11.5625 | 11.73 | 11.8125 | 46798450.0 | 11.8125 |
| 50% | 16.85 | 16.15 | 16.51 | 16.535 | 65789950.0 | 16.535 |
| 75% | 22.3875 | 21.122499 | 21.8075 | 22.0625 | 99657580.0 | 22.0625 |
| max | 34.139999 | 32.189999 | 33.18 | 32.720001 | 325058400.0 | 32.720001 |
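
To make the rows of the summary concrete, the sketch below recomputes a few of them by hand on a tiny series. The five values are the AMD closing prices from the `.tail()` output earlier in this section, so the numbers describe only those five days, not the full 378-row dataset.

```python
import pandas as pd

# AMD closing prices for the five most recent dates shown earlier.
close = pd.Series([28.98, 28.530001, 27.24, 27.83, 27.790001], name='Close')

summary = close.describe()

# The same statistics computed individually.
mean = close.mean()
std = close.std()              # sample standard deviation (ddof=1), as in describe()
median = close.quantile(0.50)  # the "50%" row is the median

print(summary)
print(mean, std, median)
```

Note that `std` uses the sample (n - 1) denominator by default, which matters for short series like this one.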


##### Visualising Historical Closing Quotes of Financial Data

Matplotlib displays a historical view of each share's closing price ('Adj Close' is preferred to 'Close' because it is adjusted for stock splits and dividends).

In [18]:
#  Plot every stock's adjusted closing price series on one set of axes.
#  xs() selects the 'Adj Close' column of each ticker from the grouped DataFrame.
#  The data-loading cells must run first: if a ticker name such as AMD is bound
#  to a list instead of a DataFrame, indexing it by column name raises
#  "TypeError: list indices must be integers or slices, not str".
tech_stocks.xs('Adj Close', axis=1, level=1).plot(
    title='Historical View of Closing Prices', figsize=(12, 8), legend=True)