## Stock Price Prediction
Predicting the stock market has been the bane and goal of investors since its inception. Every day billions of dollars are traded on the stock exchange, and behind every dollar is an investor hoping to make a profit in one way or another.

Entire companies rise and fall daily depending on market behaviour. An investor who can accurately predict market movements stands to gain considerable wealth and influence.

Today, many people make money trading stocks from home. If you have some experience with the stock market, it is an advantage to combine it with your machine learning skills for the task of stock price prediction.

Let’s see how to predict stock prices using machine learning and the Python programming language. I will start by importing the Python libraries that we need for this task:

In [1]:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

## Data Preparation
In the section above, I started the task of stock price prediction by importing the Python libraries. Now I will write a function that prepares the dataset so that it can be fitted easily with a linear regression model:

In [2]:
def prepare_data(df, forecast_col, forecast_out, test_size):
    # label: the value of forecast_col forecast_out rows ahead; the last forecast_out rows become NaN
    label = df[forecast_col].shift(-forecast_out)
    X = np.array(df[[forecast_col]])  # feature array (2-D, a single column)
    X = preprocessing.scale(X)  # standardize to zero mean and unit variance
    X_lately = X[-forecast_out:]  # most recent rows, kept aside for the final prediction
    X = X[:-forecast_out]  # rows used for training and testing
    label = label.dropna()  # drop the NaN labels created by the shift
    y = np.array(label)  # target array
    # random train/test split (a single hold-out split, not cross-validation)
    X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=test_size, random_state=0)

    return [X_train, X_test, Y_train, Y_test, X_lately]
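
To make the shapes concrete, the following sketch walks through the same steps on a small synthetic series. The 30-row frame, the `Close` column used as the forecast column, `forecast_out = 5` and `test_size = 0.2` are illustrative assumptions, not values taken from the dataset used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 30 steadily rising "closing prices" (not the real dataset).
toy = pd.DataFrame({"Close": np.arange(100.0, 130.0)})
forecast_out = 5

# The label for row i is the price forecast_out rows later,
# so the last forecast_out labels have no value (NaN).
label = toy["Close"].shift(-forecast_out)
n_missing = int(label.isna().sum())

X = preprocessing.scale(np.array(toy[["Close"]]))  # zero mean, unit variance
X_lately = X[-forecast_out:]  # rows whose future price is not known yet
X = X[:-forecast_out]         # rows that do have a label
y = np.array(label.dropna())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print("NaN labels:", n_missing)
print("X:", X.shape, "y:", y.shape, "X_lately:", X_lately.shape)
print("train:", X_train.shape, "test:", X_test.shape)
```

Note that `train_test_split` shuffles the rows by default, so the test set is drawn from across the whole period. For time-ordered data, a chronological split (for example by passing `shuffle=False`) is a common alternative worth considering.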

In [5]:
df = pd.read_csv("prices.csv")


In [8]:
#check the null values  
df.isnull().sum()

#removing the null values
df=df.dropna()

#confirm that no null values remain
df.isnull().sum()

Date                  0
Symbol                0
Series                0
Prev Close            0
Open                  0
High                  0
Low                   0
Last                  0
Close                 0
VWAP                  0
Volume                0
Turnover              0
Trades                0
Deliverable Volume    0
%Deliverble           0
dtype: int64
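
Because `dropna()` removes every row that contains at least one missing value, it is worth checking how many rows are lost. Here is a minimal sketch on a made-up frame. The values and the choice of two columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each of two different rows.
toy = pd.DataFrame({
    "Close": [946.8, np.nan, 934.6, 937.75],
    "Trades": [58630.0, 63061.0, np.nan, 43384.0],
})

missing_before = toy.isnull().sum()  # per-column count of NaN values
clean = toy.dropna()                 # drops every row containing any NaN
missing_after = clean.isnull().sum()

print(missing_before.to_dict())
print(len(toy), "rows before,", len(clean), "rows after")
```

Here a single missing value in each of two rows costs half of the data, which is why the row count before and after is printed alongside the null counts.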

In [10]:
#check the columns of the data
df.columns
    

Index(['Date', 'Symbol', 'Series', 'Prev Close', 'Open', 'High', 'Low', 'Last',
       'Close', 'VWAP', 'Volume', 'Turnover', 'Trades', 'Deliverable Volume',
       '%Deliverble'],
      dtype='object')

In [12]:
#check the head of the data
df.head()

Unnamed: 0,Date,Symbol,Series,Prev Close,Open,High,Low,Last,Close,VWAP,Volume,Turnover,Trades,Deliverable Volume,%Deliverble
3849,01-06-2011,RELIANCE,EQ,951.85,952.0,958.65,943.65,947.5,946.8,947.83,1838452,174000000000000.0,58630.0,901415.0,0.4903
3850,02-06-2011,RELIANCE,EQ,946.8,936.55,954.7,936.55,952.5,951.05,947.09,2152963,204000000000000.0,63061.0,1066759.0,0.4955
3851,03-06-2011,RELIANCE,EQ,951.05,960.5,967.0,931.5,936.0,934.6,951.69,4368279,416000000000000.0,128784.0,1035791.0,0.2371
3852,06-06-2011,RELIANCE,EQ,934.6,934.65,940.8,928.15,938.6,937.75,935.29,1405741,131000000000000.0,43384.0,476631.0,0.3391
3853,07-06-2011,RELIANCE,EQ,937.75,933.55,960.0,933.55,959.6,958.25,950.55,4025919,383000000000000.0,88703.0,2424958.0,0.6023


In [15]:
#drop the Symbol, Date and Series columns
df.drop(['Symbol','Date','Series'], axis=1, inplace=True)
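
As an alternative to dropping in place, `drop` also accepts a `columns=` keyword and returns a new frame, which leaves the original available for later inspection. The two-row frame below is a hypothetical stand-in that borrows a few of the column names listed above.

```python
import pandas as pd

# Hypothetical two-row frame using a few of the column names from the dataset.
toy = pd.DataFrame({
    "Date": ["01-06-2011", "02-06-2011"],
    "Symbol": ["RELIANCE", "RELIANCE"],
    "Series": ["EQ", "EQ"],
    "Close": [946.8, 951.05],
})

# `columns=` names the axis explicitly and returns a new frame;
# `toy` itself is left unchanged.
numeric = toy.drop(columns=["Symbol", "Date", "Series"])

print(list(numeric.columns))
print(list(toy.columns))
```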




In [16]:
#check the head of the dataset
df.head()

Unnamed: 0,Prev Close,Open,High,Low,Last,Close,VWAP,Volume,Turnover,Trades,Deliverable Volume,%Deliverble
3849,951.85,952.0,958.65,943.65,947.5,946.8,947.83,1838452,174000000000000.0,58630.0,901415.0,0.4903
3850,946.8,936.55,954.7,936.55,952.5,951.05,947.09,2152963,204000000000000.0,63061.0,1066759.0,0.4955
3851,951.05,960.5,967.0,931.5,936.0,934.6,951.69,4368279,416000000000000.0,128784.0,1035791.0,0.2371
3852,934.6,934.65,940.8,928.15,938.6,937.75,935.29,1405741,131000000000000.0,43384.0,476631.0,0.3391
3853,937.75,933.55,960.0,933.55,959.6,958.25,950.55,4025919,383000000000000.0,88703.0,2424958.0,0.6023


In [None]:
#check the info
df.info()