**1. Import Necessary Libraries**

In [8]:
import pandas as pd
import numpy as np

**2. Read the Data into Python**

In [None]:
df = pd.read_csv('preprocessed_engineered_data.csv')
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Shares Traded,Turnover (Cr Rs)
0,2024-08-12,24320.05,24472.8,24212.1,24347.0,279925100,30311.85
1,2024-08-13,24342.35,24359.95,24116.5,24139.0,239727640,25459.58
2,2024-08-14,24184.4,24196.5,24099.7,24143.75,303254705,27834.61
3,2024-08-16,24334.85,24563.9,24204.5,24541.15,271611087,28521.9
4,2024-08-19,24636.35,24638.8,24522.95,24572.65,243645503,22124.41


**3. Produce a Numerical Summary of the Variables in the Data Set**

In [10]:
df.describe()

Unnamed: 0,Open,High,Low,Close,Shares Traded,Turnover (Cr Rs)
count,249.0,249.0,249.0,249.0,249.0,249.0
mean,24265.656024,24378.899598,24138.593173,24257.550201,312547100.0,29410.09755
std,929.147347,916.159219,933.388444,922.539753,100456000.0,9363.829115
min,21758.4,22105.05,21743.65,22082.65,38811390.0,3348.45
25%,23543.8,23689.85,23433.5,23532.7,251849800.0,24202.41
50%,24419.5,24537.6,24295.55,24435.5,287535200.0,27573.99
75%,24999.4,25073.1,24825.9,24971.3,351082000.0,32606.34
max,26248.25,26277.35,26151.4,26216.05,853891000.0,89554.91


In [11]:
numerical_cols = df.select_dtypes(include=np.number).columns
print(numerical_cols)

Index(['Open', 'High', 'Low', 'Close', 'Shares Traded', 'Turnover (Cr Rs)'], dtype='object')


**4. Coefficient of Variation**

In [12]:
print("--- Coefficient of Variation (CV) ---")
for col in numerical_cols:
    mean = df[col].mean()
    std = df[col].std()
    if mean != 0:
        cv = (std / mean) * 100
        print(f"CV for {col}: {cv:.2f}%")
    else:
        print(f"CV for {col}: Mean is zero, cannot calculate CV.")

--- Coefficient of Variation (CV) ---
CV for Open: 3.83%
CV for High: 3.76%
CV for Low: 3.87%
CV for Close: 3.80%
CV for Shares Traded: 32.14%
CV for Turnover (Cr Rs): 31.84%
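
The same quantity, CV = (sample standard deviation / mean) × 100, can also be computed for all numeric columns in one vectorized expression. This is only a minimal sketch: the frame `demo` and its values are made up for illustration and stand in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative only.
demo = pd.DataFrame({
    "Close": [100.0, 102.0, 98.0, 101.0],
    "Shares Traded": [10.0, 30.0, 20.0, 60.0],
})

num = demo.select_dtypes(include="number")

# Sample standard deviation (ddof=1, the pandas default) over the mean, in percent.
cv = num.std() / num.mean() * 100
print(cv.round(2))
```

Unlike the loop, this version does not guard against a zero mean; such a column would show up as `inf` or `NaN` and would need separate handling.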


**5. Median and Mode**

In [13]:
print("--- Median and Mode ---")
for col in numerical_cols:
    median = df[col].median()
    mode = df[col].mode()
    mean = df[col].mean()
    print(f"{col}:")
    print(f"  Mean: {mean:.2f}")
    print(f"  Median: {median:.2f}")
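    if not mode.empty:
        # mode() returns every value tied for the highest frequency
        print(f"  Mode: {', '.join(mode.astype(str).tolist())}")
    else:
        # mode() is empty only when the column has no non-null values
        print("  Mode: not available (no non-null values)")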

--- Median and Mode ---
Open:
  Mean: 24265.66
  Median: 24419.50
  Mode: 21758.4, 21974.45, 22073.05, 22194.55, 22345.95, 22353.15, 22433.4, 22446.75, 22460.3, 22476.35, 22508.65, 22516.45, 22521.85, 22536.35, 22541.5, 22568.95, 22609.35, 22662.25, 22695.4, 22809.9, 22821.1, 22847.25, 22857.2, 22874.95, 22940.15, 22960.45, 22963.65, 23026.75, 23036.6, 23050.8, 23055.75, 23096.45, 23099.15, 23128.3, 23150.3, 23165.9, 23168.25, 23169.5, 23183.9, 23190.4, 23192.6, 23195.4, 23250.45, 23277.1, 23290.4, 23296.75, 23319.35, 23341.1, 23344.1, 23368.35, 23377.25, 23383.55, 23401.85, 23411.8, 23421.65, 23433.95, 23488.45, 23509.9, 23515.4, 23528.6, 23529.55, 23542.15, 23543.8, 23551.9, 23560.6, 23600.4, 23605.3, 23637.65, 23649.5, 23674.75, 23679.9, 23700.95, 23738.2, 23746.65, 23751.5, 23761.95, 23769.1, 23775.8, 23783.0, 23796.9, 23801.4, 23801.75, 23822.45, 23877.15, 23916.5, 23927.15, 23935.75, 23949.15, 23960.7, 24045.8, 24070.25, 24087.25, 24140.85, 24184.4, 24185.4, 24196.4, 24204.8, 242
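
`Series.mode()` returns every value tied for the highest frequency, so a column in which no value repeats yields all of its distinct values as "modes"; the long list above appears to be that case. One possible way to make the report more readable is to print a mode only when some value actually repeats. The helper `describe_mode` below is hypothetical and shown on toy data, not on the data set.

```python
import pandas as pd

def describe_mode(s: pd.Series) -> str:
    """Hypothetical helper: report the mode only when some value repeats."""
    counts = s.value_counts()  # sorted by frequency, descending; NaN dropped
    if counts.empty:
        return "no data"
    top = counts.iloc[0]
    if top == 1:
        return "no repeated values, mode is not informative"
    modes = counts[counts == top].index.tolist()
    return ", ".join(str(m) for m in sorted(modes))

print(describe_mode(pd.Series([1.5, 2.5, 3.5])))
print(describe_mode(pd.Series([1.5, 2.5, 2.5, 3.5])))
```

For continuous price and volume columns, where exact repeats are rare, the median is usually the more useful measure of center.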

**6. Skewness Coefficient**

In [14]:
print("--- Skewness Coefficient ---")
for col in numerical_cols:
    skewness = df[col].skew()
    print(f"Skewness for {col}: {skewness:.2f}")
    if skewness > 0.5:
        print(f"  - {col} is right-skewed.")
    elif skewness < -0.5:
        print(f"  - {col} is left-skewed.")
    elif -0.5 <= skewness <= 0.5:
        print(f"  - {col} is fairly symmetrical.")
    else:
        # Reached only when skewness is NaN (for example, fewer than 3 values)
        print(f"  - Skewness for {col} is undefined.")

--- Skewness Coefficient ---
Skewness for Open: -0.45
  - Open is fairly symmetrical.
Skewness for High: -0.45
  - High is fairly symmetrical.
Skewness for Low: -0.39
  - Low is fairly symmetrical.
Skewness for Close: -0.40
  - Close is fairly symmetrical.
Skewness for Shares Traded: 1.72
  - Shares Traded is right-skewed.
Skewness for Turnover (Cr Rs): 2.62
  - Turnover (Cr Rs) is right-skewed.
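
For reference, `Series.skew()` reports the bias-adjusted (Fisher-Pearson) sample skewness rather than the raw third standardized moment. The sketch below reproduces that formula by hand and compares it with pandas. The helper `adjusted_skew` is hypothetical and the sample is synthetic, generated only for this illustration.

```python
import numpy as np
import pandas as pd

def adjusted_skew(x: np.ndarray) -> float:
    """Adjusted Fisher-Pearson skewness G1 (hypothetical helper)."""
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    g1 = m3 / m2 ** 1.5    # biased sample skewness
    return float(np.sqrt(n * (n - 1)) / (n - 2) * g1)

rng = np.random.default_rng(0)
sample = rng.lognormal(size=249)  # synthetic right-skewed data, not the real column

manual = adjusted_skew(sample)
print(manual, pd.Series(sample).skew())
```

The two printed numbers should agree up to floating-point error. The ±0.5 cut-offs used in the cell above are a common rule of thumb, not a formal test.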


**Conclusion from Steps 4, 5, and 6: Presence of Potential Outliers**

Since:
* The Coefficient of Variation of `Shares Traded` (32.14%) and `Turnover (Cr Rs)` (31.84%) is far higher than that of the price columns (all below 4%),
* Comparison of Mean, Median and Mode:
    - For `Shares Traded`, Mean > Median
    - For `Turnover (Cr Rs)`, Mean > Median
* The Skewness Coefficient of both attributes (1.72 and 2.62) is well above 0.5,

we conclude that these two attributes are **right-skewed** and likely contain **potential outliers**.
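
A quick way to sanity-check this conclusion before the EDA is Tukey's 1.5 × IQR rule, applied to the quartiles already printed in Step 3. This is a rough sketch, not the notebook's own method: the numbers are copied from the `df.describe()` output as displayed (so they carry its rounding), and `tukey_fences` is a hypothetical helper.

```python
# Quartiles and extremes copied from the df.describe() output in Step 3.
summary = {
    "Shares Traded": {"min": 38811390.0, "q1": 251849800.0,
                      "q3": 351082000.0, "max": 853891000.0},
    "Turnover (Cr Rs)": {"min": 3348.45, "q1": 24202.41,
                         "q3": 32606.34, "max": 89554.91},
}

def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Hypothetical helper: lower and upper fences of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

flags = {}
for col, s in summary.items():
    lower, upper = tukey_fences(s["q1"], s["q3"])
    flags[col] = {"below": s["min"] < lower, "above": s["max"] > upper}
    print(f"{col}: fences=({lower:.2f}, {upper:.2f}), {flags[col]}")
```

With these displayed values, the maximum of each column lies above its upper fence and the minimum lies below its lower fence, which is consistent with the presence of potential outliers. Counting how many rows are affected requires the full columns.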

----
**FURTHER EXPLORATIONS WILL BE IN EXPLORATORY DATA ANALYSIS (EDA)**

-----