In [2]:
import sys
print(sys.executable)
print(sys.prefix)

G:\StockPricePrediction\venv\python.exe
G:\StockPricePrediction\venv


## Key Columns in Stock Market Data
1. Date: The trading date.

    - Interpretation: This is the timestamp for the data point. It helps to sequence the data chronologically.
2. Open: The price at which the stock started trading when the market opened.

    - Interpretation: Indicates the starting price for a given day. It can be compared with the closing price to see how the stock moved during the day.
3. High: The highest price at which the stock traded during the day.

    - Interpretation: Useful for understanding the peak price within a day. It can signal volatility or strong buying interest at higher prices.
4. Low: The lowest price at which the stock traded during the day.

    - Interpretation: Indicates the lowest point of the stock’s price within the day. It helps to understand the range of price movement.
5. Close: The price at which the stock closed at the end of the trading day.

    - Interpretation: This is the most commonly used price for prediction as it reflects the final price of the day after all trading has occurred.
6. Adj Close: The adjusted closing price, which accounts for any corporate actions like dividends, stock splits, etc.

    - Interpretation: This gives a more accurate reflection of the stock’s value over time, especially if there are corporate actions that affect the stock price.
7. Volume: The number of shares traded during the day.

    - Interpretation: High volume indicates strong interest and liquidity, whereas low volume might indicate less interest or potential price manipulation.
## How to Use These Columns for Prediction

- Open, High, Low, Close, and Adj Close: These can be used to understand the price movement and calculate various technical indicators.
1. Volume: Important for gauging market interest and the potential for significant price movements.

## Feature Engineering for Stock Price Prediction
1. Moving Averages: Averages of closing prices over specific periods (e.g., 10-day, 50-day).

    - Interpretation: Smooths out price data to identify trends. For example, a 10-day moving average can show short-term trends, while a 50-day moving average can show longer-term trends.
2. Daily Returns: The percentage change in closing price from one day to the next.

    - Formula: (Close[i] - Close[i-1]) / Close[i-1]
    - Interpretation: Measures daily price movement. Helps to understand volatility.
3. Exponential Moving Averages (EMA): Similar to moving averages but gives more weight to recent prices.

    - Interpretation: Useful for capturing more current trends.
4. Technical Indicators:

    - Relative Strength Index (RSI): Measures the speed and change of price movements.
    - Interpretation: Values above 70 indicate overbought conditions, and values below 30 indicate oversold conditions.
5. Moving Average Convergence Divergence (MACD): Shows the relationship between two EMAs.
    - Interpretation: Used to identify potential buy and sell signals

In [3]:
import numpy as np
import pandas as pd
import yfinance as yf

In [4]:
# Download stock data for Apple (AAPL)
data = yf.download('AAPL', start='2010-01-01', end='2023-01-01')

[*********************100%%**********************]  1 of 1 completed


In [5]:
#saving the dataset
data.to_csv('data/AAPL_stock_dataset.csv' , index=False)

In [7]:
#load dataset
df = pd.read_csv('data/AAPL_stock_dataset.csv' , parse_dates=True)
df

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume
0,7.622500,7.660714,7.585000,7.643214,6.461977,493729600
1,7.664286,7.699643,7.616071,7.656429,6.473148,601904800
2,7.656429,7.686786,7.526786,7.534643,6.370184,552160000
3,7.562500,7.571429,7.466071,7.520714,6.358407,477131200
4,7.510714,7.571429,7.466429,7.570714,6.400679,447610800
...,...,...,...,...,...,...
3267,130.919998,132.419998,129.639999,131.860001,130.782578,63814900
3268,131.380005,131.410004,128.720001,130.029999,128.967514,69007800
3269,129.669998,131.029999,125.870003,126.040001,125.010124,85438400
3270,127.989998,130.479996,127.730003,129.610001,128.550949,75703700


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3272 entries, 0 to 3271
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       3272 non-null   float64
 1   High       3272 non-null   float64
 2   Low        3272 non-null   float64
 3   Close      3272 non-null   float64
 4   Adj Close  3272 non-null   float64
 5   Volume     3272 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 153.5 KB
