# Understanding Stock Market Analysis
Stock market analysis can be divided into two parts: fundamental analysis and technical analysis.

## a. Fundamental Analysis
Fundamental analysis examines a company's current business environment and finances to predict its future profitability.

## b. Technical Analysis
This deals with charts and statistics to identify trends in the stock market.
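A simple moving average is one common example of a trend statistic used in technical analysis. It is offered here only as an illustration and is not part of the original tutorial. The sketch below uses made-up closing prices. It treats a short-window average sitting above a long-window average as a sign of an uptrend, which is a heuristic, not a reliable predictor:

```python
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([10, 11, 12, 11, 13, 14, 15, 14, 16, 17], dtype=float)

# Simple moving averages over a short (3-row) and a long (5-row) window;
# the first window-1 values are NaN because the window is not yet full
sma_short = close.rolling(window=3).mean()
sma_long = close.rolling(window=5).mean()

# Heuristic: short average above long average suggests an uptrend
# (comparisons involving NaN evaluate to False)
uptrend = sma_short > sma_long

print(pd.DataFrame({"close": close, "sma3": sma_short, "sma5": sma_long, "uptrend": uptrend}))
```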

## Predicting Stock Prices with Python
We will discuss two ways to predict stock prices with Python: Support Vector Regression (SVR) and Linear Regression.

### Support Vector Regression (SVR)
SVR is a supervised learning algorithm that applies the ideas of support vector machines (SVMs) to regression. Its cost function, the epsilon-insensitive loss, ignores training points whose prediction error is smaller than a margin epsilon. As a result, the fitted model depends only on a subset of the training data, the support vectors.

SVMs work well in high-dimensional spaces, when there is a clear margin of separation, and even when the number of dimensions exceeds the number of samples. However, they scale poorly to large datasets and perform less well on noisy data.
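SVR's cost ignores training points that lie inside a tube of width epsilon around the prediction. This cost can be sketched in a few lines. The minimal example below assumes made-up linear data and hand-picked `epsilon` and `C` values, and the helper `epsilon_insensitive_loss` is hypothetical. It fits scikit-learn's `SVR` and counts how many training points end up as support vectors:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical toy data: a straight line plus Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=100)

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Hypothetical helper: zero inside the epsilon tube, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

epsilon = 1.0
model = SVR(kernel="linear", C=1.0, epsilon=epsilon).fit(X, y)
loss = epsilon_insensitive_loss(y, model.predict(X), epsilon)

# Points strictly inside the tube cost nothing and are not support vectors
print(f"support vectors: {len(model.support_)} of {len(X)} points")
print(f"points with zero loss: {int(np.sum(loss == 0))}")
```

Most points fall inside the tube, so only a small fraction of the data shapes the fitted line.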

### Linear Regression
Linear regression models the relationship between a dependent variable and one or more independent variables as a linear function: a weighted sum of the independent variables plus an intercept.
 
It is simple to implement and is used to predict numeric values. However, it can overfit when there are many independent variables relative to the number of samples. A plain linear model also cannot capture a non-linear relationship between the dependent and independent variables.
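The sketch below illustrates what fitting a linear regression means, using made-up data. Ordinary least squares computed directly with NumPy gives the same line as scikit-learn's `LinearRegression`. The same kind of model then scores poorly on a curved (parabolic) relationship:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=50)

# Ordinary least squares by hand: a column of ones carries the intercept
X_design = np.hstack([np.ones((len(X), 1)), X])
(intercept, slope), *_ = np.linalg.lstsq(X_design, y, rcond=None)

# scikit-learn's LinearRegression fits the same line
lr = LinearRegression().fit(X, y)
print(f"by hand: y = {slope:.3f}x + {intercept:.3f}")
print(f"sklearn: y = {lr.coef_[0]:.3f}x + {lr.intercept_:.3f}")

# A straight line cannot follow a parabola centred in the data range
y_curved = (X.ravel() - 5.0) ** 2
curved_r2 = LinearRegression().fit(X, y_curved).score(X, y_curved)
print(f"R^2 on a parabola: {curved_r2:.3f}")
```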

# Code for Stock Price Prediction

1. We will use the quandl package to get stock data for Amazon. Quandl indexes millions of numerical datasets from around the world and returns the most recent version of each. It cleans the data and lets you retrieve it in the format you want, such as a pandas DataFrame or a NumPy array.

```python
# Import libraries
import quandl
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
```

2. Get the Amazon stock data from Quandl and print the first 5 rows. The WIKI dataset used here is no longer updated, so the data ends in March 2018.

```python
# Get Amazon stock data
amazon = quandl.get("WIKI/AMZN")
amazon.head()
```

| Date | Open | High | Low | Close | Volume | Ex-Dividend | Split Ratio | Adj. Open | Adj. High | Adj. Low | Adj. Close | Adj. Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997-05-16 | 22.38 | 23.75 | 20.5 | 20.75 | 1225000.0 | 0.0 | 1.0 | 1.865 | 1.979167 | 1.708333 | 1.729167 | 14700000.0 |
| 1997-05-19 | 20.5 | 21.25 | 19.5 | 20.5 | 508900.0 | 0.0 | 1.0 | 1.708333 | 1.770833 | 1.625 | 1.708333 | 6106800.0 |
| 1997-05-20 | 20.75 | 21.0 | 19.63 | 19.63 | 455600.0 | 0.0 | 1.0 | 1.729167 | 1.75 | 1.635833 | 1.635833 | 5467200.0 |
| 1997-05-21 | 19.25 | 19.75 | 16.5 | 17.13 | 1571100.0 | 0.0 | 1.0 | 1.604167 | 1.645833 | 1.375 | 1.4275 | 18853200.0 |
| 1997-05-22 | 17.25 | 17.38 | 15.75 | 16.75 | 981400.0 | 0.0 | 1.0 | 1.4375 | 1.448333 | 1.3125 | 1.395833 | 11776800.0 |


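3. Keep only the Adj. Close column and print its first 5 rows. Adj. Close is the closing price adjusted for later stock splits and dividends. This is why the 1997 values are far below the raw Close prices in the previous output.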

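```python
# Keep only the Adj. Close column; .copy() makes an independent DataFrame, so
# adding the Predicted column in the next step does not raise a SettingWithCopyWarning
amazon = amazon[['Adj. Close']].copy()
print(amazon.head())
```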

```
            Adj. Close
Date
1997-05-16    1.729167
1997-05-19    1.708333
1997-05-20    1.635833
1997-05-21    1.427500
1997-05-22    1.395833
```
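The double brackets in `amazon[['Adj. Close']]` matter. A list of column labels returns a DataFrame, while a single label returns a Series. A small sketch on a made-up frame (`df` and `prices` are hypothetical names) shows the difference, along with the effect of `.copy()`:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the quandl data
df = pd.DataFrame({"Open": [1.0, 2.0, 3.0], "Adj. Close": [1.5, 2.5, 3.5]})

as_frame = df[["Adj. Close"]]  # list of labels -> DataFrame with shape (3, 1)
as_series = df["Adj. Close"]   # single label -> Series with shape (3,)
print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)

# .copy() gives an independent DataFrame, so adding a column to it
# leaves df untouched and avoids a SettingWithCopyWarning
prices = df[["Adj. Close"]].copy()
prices["Predicted"] = prices["Adj. Close"].shift(-1)
print(list(prices.columns), list(df.columns))
```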


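4. Set the forecast length to 30 rows (30 trading days). Create a new column, Predicted, that holds the Adj. Close values shifted up by 30 rows, so each row's Predicted value is the adjusted close 30 trading days later. The last 30 rows have no future price, so their Predicted value is NaN. The output of tail() below shows the last 5 of them.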

```python
# Forecast 30 rows (trading days) ahead; Predicted holds Adj. Close shifted up by 30 rows
forecast_len = 30
amazon['Predicted'] = amazon['Adj. Close'].shift(-forecast_len)
print(amazon.tail())
```

```
            Adj. Close  Predicted
Date
2018-03-21     1581.86        NaN
2018-03-22     1544.10        NaN
2018-03-23     1495.56        NaN
2018-03-26     1555.86        NaN
2018-03-27     1497.05        NaN
```
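The sketch below shows what `shift(-forecast_len)` does by applying the same operation to a hypothetical six-row price column, with a shift of 2 instead of 30. Each Predicted value is the price two rows later. The last two rows are NaN because their future price is not in the data:

```python
import pandas as pd

# Hypothetical prices standing in for the Adj. Close column
prices = pd.DataFrame({"Adj. Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})
forecast_len = 2  # 2 instead of 30 to keep the example small

# Each row's Predicted value is the price forecast_len rows later;
# the last forecast_len rows have no later price, so they become NaN
prices["Predicted"] = prices["Adj. Close"].shift(-forecast_len)
print(prices)
```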


5. Drop the Predicted column and create a NumPy array from the remaining data; call it x. This is the independent dataset. Remove its last 30 rows, which have no known future price, and print x.

```python
# Drop the Predicted column and turn the remaining data into a NumPy array
x = np.array(amazon.drop(columns=['Predicted']))

# Remove the last 30 rows, whose Predicted values are NaN
x = x[:-forecast_len]
print(x)
```

```
[[   1.72916667]
 [   1.70833333]
 [   1.63583333]
 ...
 [1350.47      ]
 [1338.99      ]
 [1386.23      ]]
```


6. Create the dependent dataset y from the Predicted column, remove its last 30 rows (the NaN values), and print it.

```python
# Create dependent dataset for predicted values, remove the last 30 rows
y = np.array(amazon['Predicted'])
y = y[:-forecast_len]
print(y)
```

```
[1.54166667e+00 1.51583333e+00 1.58833333e+00 ... 1.49556000e+03
 1.55586000e+03 1.49705000e+03]
```
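Steps 5 and 6 pair each day's adjusted close in x with the adjusted close `forecast_len` rows later in y. The sketch below runs on made-up prices (1 to 10, with a shift of 3). It checks that pairing and the shapes scikit-learn expects: x is two-dimensional (samples by features) and y is one-dimensional:

```python
import numpy as np
import pandas as pd

# Hypothetical prices 1.0 to 10.0 and a 3-row horizon
prices = pd.DataFrame({"Adj. Close": np.arange(1.0, 11.0)})
forecast_len = 3
prices["Predicted"] = prices["Adj. Close"].shift(-forecast_len)

# Same construction as steps 5 and 6: drop the rows whose target is NaN
x = np.array(prices.drop(columns=["Predicted"]))[:-forecast_len]
y = np.array(prices["Predicted"])[:-forecast_len]

print(x.shape, y.shape)                 # 2-D features, 1-D target
print(np.column_stack([x.ravel(), y]))  # each input next to its target
```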


7. Split the datasets into training and testing sets. Keep 80% for training.

```python
# Split datasets into training and test sets (80% and 20%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```
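`train_test_split` shuffles rows by default. As a result, the test set above is scattered across the whole 1997 to 2018 history, and the scores change from run to run unless `random_state` is fixed. For time series, a common alternative is to hold out the most recent rows instead. The sketch below uses `shuffle=False` on made-up arrays. This is not what the original code does, and it would give different scores:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features and targets, already in chronological order
x = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10, dtype=float) + 3.0

# shuffle=False keeps time order: the first 80% of rows train, the last 20% test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)
print(x_train.ravel(), x_test.ravel())
```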

8. Now create an SVR model with an RBF kernel and train it.

```python
# Create SVR model and train it
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_rbf.fit(x_train, y_train)
```
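In `SVR(kernel='rbf', C=1e3, gamma=0.1)`, the RBF kernel measures the similarity of two samples as exp(-gamma * ||x - x'||²). `C` sets how heavily errors outside the epsilon tube are penalized. The sketch below computes the kernel by hand for two made-up prices and checks the result against scikit-learn's `rbf_kernel`. With `gamma=0.1` and unscaled prices, samples ten dollars apart are almost unrelated. This is one reason SVR inputs are often standardized:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.1  # same value as in the SVR above

# Two hypothetical one-feature samples (prices 2 apart)
a = np.array([[10.0]])
b = np.array([[12.0]])

# RBF kernel: exp(-gamma * squared Euclidean distance)
by_hand = np.exp(-gamma * np.sum((a - b) ** 2))
library = rbf_kernel(a, b, gamma=gamma)[0, 0]
print(by_hand, library)  # both about 0.67

# Prices 10 apart are almost unrelated under this gamma (exp(-10) is about 4.5e-5)
far = rbf_kernel(np.array([[100.0]]), np.array([[110.0]]), gamma=gamma)[0, 0]
print(far)
```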

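9. Score the model on the test set and print the result as a percentage. For regression models, scikit-learn's score() returns the coefficient of determination (R²). The "confidence" printed below is therefore an R² score, not a probability.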

```python
# Scoring the SVR model
svr_rbf_confidence = svr_rbf.score(x_test, y_test)
print(f"SVR Confidence: {round(svr_rbf_confidence * 100, 2)}%")
```

```
SVR Confidence: 94.96%
```
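For scikit-learn regressors, `score` computes the coefficient of determination, R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean. A model that always predicts the mean scores 0, and a worse model scores below 0. The sketch below verifies the formula on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: a linear trend plus noise
rng = np.random.default_rng(1)
x = np.arange(20, dtype=float).reshape(-1, 1)
y = 1.5 * x.ravel() + rng.normal(scale=2.0, size=20)

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)      # squared prediction errors
ss_tot = np.sum((y - y.mean()) ** 2)    # squared deviations from the mean
r2_by_hand = 1.0 - ss_res / ss_tot

print(r2_by_hand, model.score(x, y), r2_score(y, y_pred))

# Always predicting the mean gives R^2 = 0
print(r2_score(y, np.full_like(y, y.mean())))
```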


10. Now create a Linear Regression model and train it.

```python
# Create Linear Regression model and train it
lr = LinearRegression()
lr.fit(x_train, y_train)
```

11. Score this model on the same test set and print the result as a percentage.

```python
# Get score for Linear Regression
lr_confidence = lr.score(x_test, y_test)
print(f"Linear Regression Confidence: {round(lr_confidence * 100, 2)}%")
```

```
Linear Regression Confidence: 98.83%
```
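The steps above score the models but never forecast beyond the available data. One way to do that is to take the last `forecast_len` rows of features and pass them to the trained model. Those rows were dropped from x because their future price is unknown. The sketch below is not part of the original code. It runs on a made-up, steadily rising price series with a 5-row horizon, and `x_forecast` and `forecast` are hypothetical names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical prices rising steadily from 100 to 200 over 60 rows
prices = pd.DataFrame({"Adj. Close": np.linspace(100.0, 200.0, 60)})
forecast_len = 5  # 5 instead of 30 to keep the example small
prices["Predicted"] = prices["Adj. Close"].shift(-forecast_len)

features = np.array(prices.drop(columns=["Predicted"]))
x = features[:-forecast_len]                       # rows with a known future price
y = np.array(prices["Predicted"])[:-forecast_len]
x_forecast = features[-forecast_len:]              # latest rows, future price unknown

lr = LinearRegression().fit(x, y)
forecast = lr.predict(x_forecast)  # estimated prices forecast_len rows ahead
print(forecast)
```

On this perfectly linear toy series, the forecast simply continues the trend. Real prices carry no such guarantee.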
