<a href="https://colab.research.google.com/github/Dharmeshgadhiya161/Stock-price-prediction-using-machine-learning/blob/main/Stock_Market_ML_and_Prediction_June_2025.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Predict future stock prices (usually Close price) using historical data.

### Overview
This dataset contains daily stock market data for June 2025, with 1,762 rows covering tickers across multiple sectors. Its 14 columns span prices, trading metrics, and financial ratios, making it suitable for financial analysis, machine learning projects, and algorithmic trading research.

### Dataset Features
#### Core Price Data
* Date: Trading date in DD-MM-YYYY format (for example, 01-06-2025)
* Ticker: Stock symbol identifier
* Open Price: Opening price for the trading day
* Close Price: Closing price for the trading day
* High Price: Highest price reached during the day
* Low Price: Lowest price reached during the day

#### Trading Metrics
* Volume Traded: Number of shares traded
* Market Cap: Total market capitalization

#### Financial Ratios & Indicators
* PE Ratio: Price-to-Earnings ratio for valuation analysis
* Dividend Yield: Annual dividend as a percentage of the stock price
* EPS: Earnings Per Share
* 52 Week High: Highest price in the past 52 weeks
* 52 Week Low: Lowest price in the past 52 weeks

#### Classification
* Sector: Market sector of the company (for example, Technology or Energy)
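
The price columns listed above imply a simple consistency rule: each day's Low Price should not exceed the Open or Close Price, and the High Price should not fall below them. The sketch below checks that rule on three rows copied from the `df.head()` output in this notebook. `find_inconsistent_rows` is a hypothetical helper name used for illustration, not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the df.head() output shown in this notebook.
sample = pd.DataFrame({
    "Ticker": ["SLH", "WGB", "ZIN"],
    "Open Price": [34.92, 206.5, 125.1],
    "Close Price": [34.53, 208.45, 124.03],
    "High Price": [35.22, 210.51, 127.4],
    "Low Price": [34.38, 205.12, 121.77],
})


def find_inconsistent_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return rows where Low/High do not bracket the Open and Close prices."""
    open_close = frame[["Open Price", "Close Price"]]
    ok = (
        (frame["Low Price"] <= open_close.min(axis=1))
        & (frame["High Price"] >= open_close.max(axis=1))
    )
    return frame[~ok]


bad_rows = find_inconsistent_rows(sample)
print(f"Inconsistent rows: {len(bad_rows)}")
```

Once the full dataset is loaded, the same helper could be called as `find_inconsistent_rows(df)`; any rows it returns would be worth inspecting before modeling.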

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [2]:
# Mount Google Drive (Colab only) and load the dataset from it
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv("/content/drive/My Drive/Colab Notebooks/Module: Machine Learning/Machine Learning Project/stock_market_2025.csv")

Mounted at /content/drive


In [3]:
df.head()

Unnamed: 0,Date,Ticker,Open Price,Close Price,High Price,Low Price,Volume Traded,Market Cap,PE Ratio,Dividend Yield,EPS,52 Week High,52 Week Low,Sector
0,01-06-2025,SLH,34.92,34.53,35.22,34.38,2966611,57381360000.0,29.63,2.85,1.17,39.39,28.44,Industrials
1,01-06-2025,WGB,206.5,208.45,210.51,205.12,1658738,52747070000.0,13.03,2.73,16.0,227.38,136.79,Energy
2,01-06-2025,ZIN,125.1,124.03,127.4,121.77,10709898,55969490000.0,29.19,2.64,4.25,138.35,100.69,Healthcare
3,01-06-2025,YPY,260.55,265.28,269.99,256.64,14012358,79640890000.0,19.92,1.29,13.32,317.57,178.26,Industrials
4,01-06-2025,VKD,182.43,186.89,189.4,179.02,14758143,72714370000.0,40.18,1.17,4.65,243.54,165.53,Technology


In [12]:
df.tail()

Unnamed: 0,Date,Ticker,Open Price,Close Price,High Price,Low Price,Volume Traded,Market Cap,PE Ratio,Dividend Yield,EPS,52 Week High,52 Week Low,Sector
1757,21-06-2025,ZTL,196.09,199.18,200.08,195.42,2749236,99466620000.0,22.17,1.26,8.98,232.2,186.46,Technology
1758,21-06-2025,XOE,105.08,108.08,109.9,104.68,4582198,50174500000.0,35.74,1.79,3.02,129.28,75.71,Technology
1759,21-06-2025,EIE,18.88,18.46,18.95,18.23,6997077,8502518000.0,10.18,2.09,1.81,24.5,14.64,Energy
1760,21-06-2025,XYQ,154.55,158.57,160.85,152.37,3549117,96638990000.0,29.64,1.71,5.35,199.23,149.78,Communication Services
1761,21-06-2025,UAA,201.52,198.09,205.08,195.7,6484178,15636420000.0,27.75,1.18,7.14,258.92,184.92,Materials


In [4]:
df.shape

(1762, 14)

In [7]:
# Basic information of dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1762 entries, 0 to 1761
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            1762 non-null   object 
 1   Ticker          1762 non-null   object 
 2   Open Price      1762 non-null   float64
 3   Close Price     1762 non-null   float64
 4   High Price      1762 non-null   float64
 5   Low Price       1762 non-null   float64
 6   Volume Traded   1762 non-null   int64  
 7   Market Cap      1762 non-null   float64
 8   PE Ratio        1762 non-null   float64
 9   Dividend Yield  1762 non-null   float64
 10  EPS             1762 non-null   float64
 11  52 Week High    1762 non-null   float64
 12  52 Week Low     1762 non-null   float64
 13  Sector          1762 non-null   object 
dtypes: float64(10), int64(1), object(3)
memory usage: 192.8+ KB


Change Date to datetime: the column is currently stored as object (string).
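
A minimal sketch of that conversion, assuming the day-first DD-MM-YYYY layout seen in the `df.head()` and `df.tail()` outputs (01-06-2025 and 21-06-2025). Passing an explicit `format` avoids the ambiguity of a value such as 01-06-2025, which could otherwise be read month-first. The two sample strings are taken from those outputs.

```python
import pandas as pd

# Two date strings copied from the head/tail outputs of this notebook.
dates = pd.Series(["01-06-2025", "21-06-2025"], name="Date")

# Explicit day-first format, so 01-06-2025 is parsed as 1 June 2025.
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

print(parsed.dtype)
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

In the notebook itself this would likely become `df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")`.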

In [None]:
df.duplicated().sum()
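
`df.duplicated()` only flags rows that are identical in every column. For daily stock data it may also be worth checking whether a (Date, Ticker) pair appears more than once with different values. The sketch below illustrates the difference. The first two rows come from the `df.head()` output, while the third row is a made-up repeat added purely for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": ["01-06-2025", "01-06-2025", "01-06-2025"],
    "Ticker": ["SLH", "WGB", "SLH"],
    # The last value is hypothetical: a repeated SLH row with a different price.
    "Close Price": [34.53, 208.45, 34.60],
})

# Rows identical across all columns.
full_dupes = int(sample.duplicated().sum())

# Rows that repeat an earlier (Date, Ticker) pair, whatever the other values are.
key_dupes = int(sample.duplicated(subset=["Date", "Ticker"]).sum())

print(f"Full duplicates: {full_dupes}, key duplicates: {key_dupes}")
```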

In [9]:
df.describe()

Unnamed: 0,Open Price,Close Price,High Price,Low Price,Volume Traded,Market Cap,PE Ratio,Dividend Yield,EPS,52 Week High,52 Week Low
count,1762.0,1762.0,1762.0,1762.0,1762.0,1762.0,1762.0,1762.0,1762.0,1762.0,1762.0
mean,157.500443,157.567054,160.423258,154.703956,8075851.0,65209770000.0,23.143859,2.337327,7.663621,189.009381,125.964574
std,82.043046,82.227448,83.626559,80.647073,5104890.0,146716800000.0,7.498239,1.124037,5.202411,99.082291,67.767279
min,15.02,14.77,15.12,14.48,500727.0,1290761000.0,8.02,0.01,0.47,17.03,10.01
25%,86.5925,87.055,88.4875,85.38,4268123.0,27174060000.0,16.9975,1.45,3.745,104.6875,66.6625
50%,155.675,155.675,158.82,152.725,7911528.0,51353220000.0,22.62,2.33,6.79,185.425,123.04
75%,227.2075,227.9025,231.6275,223.1025,11600260.0,76230760000.0,28.9775,3.1275,10.35,271.92,179.825
max,434.4,447.43,453.86,431.5,65377740.0,3481112000000.0,44.89,5.0,54.1,571.57,345.53


In [None]:
df.isnull().sum()

In [None]:
df['Date'].value_counts()
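
`value_counts()` orders its result by frequency, so the dates come back out of calendar order. A small sketch of one way to get rows per trading day in chronological order, assuming the DD-MM-YYYY strings seen in the outputs above. The sample series is a toy built from the two dates visible in `df.head()` and `df.tail()`, and its counts are illustrative only.

```python
import pandas as pd

# Toy series: the dates are from the notebook, the row counts are illustrative.
dates = pd.Series(
    ["21-06-2025", "01-06-2025", "21-06-2025", "01-06-2025", "01-06-2025"],
    name="Date",
)

# Parse day-first, count rows per date, then sort by the date index.
rows_per_day = (
    pd.to_datetime(dates, format="%d-%m-%Y")
    .value_counts()
    .sort_index()
)

print(rows_per_day)
```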