# Stock Market Analysis 

## Data Cleaning

## Objective

Prepare clean and comparable historical price data for Reliance Industries and Nifty 50 by:

* Aligning trading dates
* Standardizing columns and formats
* Calculating daily returns
* Creating an analysis-ready dataset

This step ensures a fair comparison and reliable downstream analysis.


### Import Libraries

In [4]:
import pandas as pd
import numpy as np


### Load Datasets

In [5]:
reliance = pd.read_csv(r"C:\Users\sande\OneDrive\Documents\Data science AI&ML\Projects\Projects for Resume\Stock Market Analysis & Forecasting- Project\Data\raw\RELIANCE.csv")
nifty = pd.read_csv(r"C:\Users\sande\OneDrive\Documents\Data science AI&ML\Projects\Projects for Resume\Stock Market Analysis & Forecasting- Project\Data\raw\NIFTY50.csv")

In [6]:
reliance.sample(5),nifty.sample(5)

(            Date         Open         High          Low        Close    Volume
 1132  31-07-2025  1382.578056  1397.020374  1376.701502  1384.669678  17065827
 458   09-11-2022  1191.593235  1196.911413  1182.349092  1188.717285  11254205
 546   15-03-2023  1042.683971  1049.166148  1016.800540  1021.205750  21728555
 307   30-03-2022  1201.186345  1223.072403  1190.812051  1216.224487  15811550
 468   23-11-2022  1175.478796  1176.802665  1165.093474  1167.284668   6413408,
            Date         Open         High          Low        Close  Volume
 405  22-08-2022  17682.90039  17690.05078  17467.34961  17490.69922  287600
 554  27-03-2023  16984.30078  17091.00000  16918.55078  16985.69922  218400
 817  25-04-2024  22316.90039  22625.94922  22305.25000  22570.34961  475000
 930  09-10-2024  25065.80078  25234.05078  24947.69922  24981.94922  290600
 2    05-01-2021  14075.15039  14215.59961  14048.15039  14199.50000  492500)

In [7]:
reliance.shape,nifty.shape

((1241, 6), (1239, 6))

### Standardize Column Names

In [8]:
# df.columns returns the column labels as an Index object
# .str.lower() converts every label to lowercase
# .str.strip() removes leading and trailing whitespace
reliance.columns = reliance.columns.str.lower().str.strip()
nifty.columns = nifty.columns.str.lower().str.strip()

### Date Parsing & Sorting

In [9]:
reliance['date'] = pd.to_datetime(reliance['date'], format = '%d-%m-%Y')
nifty['date'] = pd.to_datetime(nifty['date'],format = '%d-%m-%Y')

reliance = reliance.sort_values('date')
nifty = nifty.sort_values('date')
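
The explicit `format` matters because the raw dates are day-first strings. The sketch below uses made-up dates (not the project data) to show that the parsed values sort chronologically, whereas the raw strings sort lexically:

```python
import pandas as pd

# Made-up day-first date strings in the same DD-MM-YYYY layout as the raw CSVs.
raw_dates = pd.Series(["05-01-2021", "04-01-2021", "01-02-2021"])

# An explicit format removes any day/month ambiguity.
parsed = pd.to_datetime(raw_dates, format="%d-%m-%Y")

# Sorting parsed datetimes gives chronological order.
chronological = parsed.sort_values().reset_index(drop=True)

# Sorting the raw strings compares characters, so 1 February lands first.
lexical = raw_dates.sort_values().reset_index(drop=True)

print(chronological.dt.strftime("%Y-%m-%d").tolist())
print(lexical.tolist())
```

This is why parsing comes before sorting in the cell above.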

### Rename Columns

In [10]:
reliance = reliance.rename(columns={
    'open': 'reliance_open',
    'high': 'reliance_high',
    'low': 'reliance_low',
    'close': 'reliance_close',
    'volume': 'reliance_volume',
})

nifty = nifty.rename(columns={
    'open': 'nifty_open',
    'high': 'nifty_high',
    'low': 'nifty_low',
    'close': 'nifty_close',
    'volume': 'nifty_volume',
})

### Merging Both Datasets on Common Dates

In [11]:
# The stock and the index may have different holiday calendars, so keep only
# the trading dates present in both datasets to avoid misleading comparisons.
market_df = pd.merge(
    reliance,
    nifty,
    on = 'date',
    how = 'inner'
)
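
An inner merge silently discards dates that appear in only one dataset (the shapes above show 1241 Reliance rows against 1239 Nifty rows). One possible way to see which dates are dropped is an outer merge with `indicator=True`. The sketch below uses made-up prices, and the names `stock`, `benchmark`, `audit`, and `unmatched` are illustrative, not part of the notebook:

```python
import pandas as pd

# Made-up stand-ins for the two price tables; the stock has one extra trading date.
stock = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
    "reliance_close": [100.0, 101.0, 102.0],
})
benchmark = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04", "2021-01-06"]),
    "nifty_close": [14000.0, 14100.0],
})

# An outer merge with indicator=True labels each row by where its date was found.
audit = pd.merge(stock, benchmark, on="date", how="outer", indicator=True)
unmatched = audit[audit["_merge"] != "both"]

# The inner merge keeps only the dates present in both tables.
aligned = pd.merge(stock, benchmark, on="date", how="inner")

print(unmatched[["date", "_merge"]])
print(len(aligned))
```

Inspecting `unmatched` before the inner merge makes it easy to confirm that the dropped rows are genuine calendar differences rather than parsing problems.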

### Check and Fix Data Types

In [12]:
market_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1239 entries, 0 to 1238
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   date             1239 non-null   datetime64[ns]
 1   reliance_open    1239 non-null   float64       
 2   reliance_high    1239 non-null   float64       
 3   reliance_low     1239 non-null   float64       
 4   reliance_close   1239 non-null   float64       
 5   reliance_volume  1239 non-null   int64         
 6   nifty_open       1239 non-null   float64       
 7   nifty_high       1239 non-null   float64       
 8   nifty_low        1239 non-null   float64       
 9   nifty_close      1239 non-null   float64       
 10  nifty_volume     1239 non-null   int64         
dtypes: datetime64[ns](1), float64(8), int64(2)
memory usage: 106.6 KB


In [13]:
# Coerce the price and volume columns to numeric; unparseable values become NaN
cols_convert = ['reliance_open','reliance_high','reliance_low','reliance_close','reliance_volume','nifty_open','nifty_high','nifty_low','nifty_close','nifty_volume']

for col in cols_convert:
    market_df[col] = pd.to_numeric(market_df[col], errors = 'coerce')
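
The `info()` output above shows these columns are already numeric, so the conversion acts as a safeguard here. A minimal sketch of what `errors='coerce'` does when a column does contain a stray string, using made-up values:

```python
import pandas as pd

# Made-up column mixing valid numbers with a placeholder string.
raw = pd.Series(["1021.5", "1188.7", "-", "1216.2"])

# errors="coerce" replaces unparseable values with NaN instead of raising.
converted = pd.to_numeric(raw, errors="coerce")

# Counting the NaN values shows how many entries failed to parse.
n_failed = int(converted.isna().sum())
print(converted.dtype, n_failed)
```

Any values coerced to NaN this way would be removed by a later `dropna()` call, so it can be worth checking the count first.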

### Calculate Daily Returns

In [14]:
market_df['reliance_return']  = market_df['reliance_close'].pct_change()
market_df['nifty_return'] = market_df['nifty_close'].pct_change()
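
For reference, `pct_change()` computes the simple return $(P_t - P_{t-1}) / P_{t-1}$ on consecutive rows. The sketch below uses made-up closing prices and compares it with the same formula written out via `shift()`:

```python
import pandas as pd

# Made-up closing prices for illustration.
close = pd.Series([100.0, 102.0, 99.96])

# pct_change computes (P_t - P_{t-1}) / P_{t-1}; the first value has no predecessor.
returns = close.pct_change()

# The same calculation written out with shift().
manual = (close - close.shift(1)) / close.shift(1)

print(returns.round(4).tolist())
print(manual.round(4).tolist())
```

The first element is NaN in both series because there is no previous close to compare against.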


### Check Missing Values

In [15]:
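# pct_change() leaves NaN in the first row (no previous close), so drop it
# along with any values that were coerced to NaN earlier.
market_df = market_df.dropna()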

In [16]:
market_df.isna().sum()

date               0
reliance_open      0
reliance_high      0
reliance_low       0
reliance_close     0
reliance_volume    0
nifty_open         0
nifty_high         0
nifty_low          0
nifty_close        0
nifty_volume       0
reliance_return    0
nifty_return       0
dtype: int64

In [17]:
market_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1238 entries, 1 to 1238
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   date             1238 non-null   datetime64[ns]
 1   reliance_open    1238 non-null   float64       
 2   reliance_high    1238 non-null   float64       
 3   reliance_low     1238 non-null   float64       
 4   reliance_close   1238 non-null   float64       
 5   reliance_volume  1238 non-null   int64         
 6   nifty_open       1238 non-null   float64       
 7   nifty_high       1238 non-null   float64       
 8   nifty_low        1238 non-null   float64       
 9   nifty_close      1238 non-null   float64       
 10  nifty_volume     1238 non-null   int64         
 11  reliance_return  1238 non-null   float64       
 12  nifty_return     1238 non-null   float64       
dtypes: datetime64[ns](1), float64(10), int64(2)
memory usage: 135.4 KB


In [18]:
market_df.shape

(1238, 13)

In [19]:
market_df.to_csv("../data/processed/market_data.csv", index=False)
