# EXPLORATORY DATA ANALYSIS PROJECT

# Introduction

In today’s fast-paced financial markets, data-driven decision-making has become crucial for investors. This project is an exploratory data analysis of Tesla (TSLA) stock data. By analyzing historical daily prices and trading volumes, we can better understand the stock's behavior, identify potential investment opportunities, and assess risk factors.

# Column Name Description

Date - The date on which trading took place.

High - The highest price of the day.

Low - The lowest price of the day.

Open - The opening price of the day.

Close - The closing price of the day.

Volume - The number of shares traded during the day.

Adj Close - The adjusted closing price of the day.
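The column definitions above imply a few relationships that can be checked directly: the low should not exceed the open or close, and the high should not fall below them. A minimal sketch, using the first three rows of the TSLA data and a hypothetical helper name `check_ohlc` (not part of the original notebook):

```python
import pandas as pd

def check_ohlc(frame: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True where a row is internally consistent."""
    # The low must not be above the smaller of open and close.
    low_ok = frame["Low"] <= frame[["Open", "Close"]].min(axis=1)
    # The high must not be below the larger of open and close.
    high_ok = frame["High"] >= frame[["Open", "Close"]].max(axis=1)
    # Traded volume cannot be negative.
    volume_ok = frame["Volume"] >= 0
    return low_ok & high_ok & volume_ok

sample = pd.DataFrame({
    "Date": ["2019-09-30", "2019-10-01", "2019-10-02"],
    "High": [48.796001, 49.189999, 48.930000],
    "Low": [47.222000, 47.826000, 47.886002],
    "Open": [48.599998, 48.299999, 48.658001],
    "Close": [48.174000, 48.938000, 48.625999],
    "Volume": [29399000.0, 30813000.0, 28157000.0],
})

consistent = check_ohlc(sample)
print(consistent.all())
```

Rows where the result is `False` would be worth inspecting before any further analysis.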

#  Questions for the analysis

1. How many times did the stock price stay the same at the open and at the close?

2. What are the highest open, close and adjusted closing prices?

3. What are the lowest open, close and adjusted closing prices?

4. How do the opening and closing prices differ each day?

5. On which days were the highest and lowest numbers of shares traded?

6. How does the adjusted closing price differ from the opening and closing prices?



#  Importing all the necessary libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


#  Data reading and understanding

In [3]:
df=pd.read_csv("D:\\data science\\dataset\\TSLA.csv")
print(df)

           Date         High          Low         Open        Close  \
0    2019-09-30    48.796001    47.222000    48.599998    48.174000   
1    2019-10-01    49.189999    47.826000    48.299999    48.938000   
2    2019-10-02    48.930000    47.886002    48.658001    48.625999   
3    2019-10-03    46.896000    44.855999    46.372002    46.605999   
4    2019-10-04    46.956001    45.613998    46.321999    46.285999   
..          ...          ...          ...          ...          ...   
634  2022-04-05  1152.869995  1087.300049  1136.300049  1091.260010   
635  2022-04-06  1079.000000  1027.699951  1073.469971  1045.760010   
636  2022-04-07  1076.589966  1021.539978  1052.390015  1057.260010   
637  2022-04-08  1048.439941  1022.440002  1043.209961  1025.489990   
638  2022-04-11  1008.469971   974.640015   980.400024   975.929993   

         Volume    Adj Close  
0    29399000.0    48.174000  
1    30813000.0    48.938000  
2    28157000.0    48.625999  
3    75422500.0    46.6
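Dates can also be parsed while the file is read, which avoids a separate conversion step later. A minimal sketch, assuming the same column layout as `TSLA.csv`; the inline CSV text stands in for the file so that the example runs on its own:

```python
import io
import pandas as pd

# Three rows copied from the dataset, used here in place of the real file.
csv_text = """Date,High,Low,Open,Close,Volume,Adj Close
2019-09-30,48.796001,47.222000,48.599998,48.174000,29399000.0,48.174000
2019-10-01,49.189999,47.826000,48.299999,48.938000,30813000.0,48.938000
2019-10-02,48.930000,47.886002,48.658001,48.625999,28157000.0,48.625999
"""

# parse_dates converts the named column to a datetime dtype on load.
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(prices.dtypes)
print(prices.shape)
```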

#  Taking a random sample of 600 rows

In [4]:
df=df.sample(600).reset_index()

In [5]:
df.head(10)

| | index | Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|---|
| 0 | 297 | 2020-12-02 | 571.539978 | 541.210022 | 556.440002 | 568.820007 | 47775700.0 | 568.820007 |
| 1 | 74 | 2020-01-15 | 107.568001 | 103.358002 | 105.952003 | 103.699997 | 86844000.0 | 103.699997 |
| 2 | 275 | 2020-10-30 | 407.589996 | 379.109985 | 406.899994 | 388.040009 | 42511300.0 | 388.040009 |
| 3 | 586 | 2022-01-26 | 987.690002 | 906.0 | 952.429993 | 937.409973 | 34955800.0 | 937.409973 |
| 4 | 622 | 2022-03-18 | 907.849976 | 867.390015 | 874.48999 | 905.390015 | 33471400.0 | 905.390015 |
| 5 | 48 | 2019-12-06 | 67.772003 | 66.954002 | 67.0 | 67.178001 | 38062000.0 | 67.178001 |
| 6 | 582 | 2022-01-20 | 1041.660034 | 994.0 | 1009.72998 | 996.27002 | 23496200.0 | 996.27002 |
| 7 | 124 | 2020-03-27 | 105.160004 | 98.806 | 101.0 | 102.872002 | 71887000.0 | 102.872002 |
| 8 | 481 | 2021-08-26 | 715.400024 | 697.619995 | 708.309998 | 701.159973 | 13214300.0 | 701.159973 |
| 9 | 159 | 2020-05-18 | 166.944 | 160.776001 | 165.556 | 162.725998 | 58329000.0 | 162.725998 |


In [6]:
df.tail(10)

| | index | Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|---|
| 590 | 422 | 2021-06-03 | 604.549988 | 571.219971 | 601.799988 | 572.840027 | 30111900.0 | 572.840027 |
| 591 | 177 | 2020-06-12 | 197.595993 | 182.520004 | 196.0 | 187.056 | 83817000.0 | 187.056 |
| 592 | 315 | 2020-12-29 | 669.900024 | 655.0 | 661.0 | 665.98999 | 22910800.0 | 665.98999 |
| 593 | 420 | 2021-06-01 | 633.799988 | 620.549988 | 627.799988 | 623.900024 | 18084900.0 | 623.900024 |
| 594 | 206 | 2020-07-24 | 293.0 | 273.308014 | 283.201996 | 283.399994 | 96983000.0 | 283.399994 |
| 595 | 323 | 2021-01-11 | 854.429993 | 803.619995 | 849.400024 | 811.190002 | 59301600.0 | 811.190002 |
| 596 | 506 | 2021-10-01 | 780.780029 | 763.590027 | 778.400024 | 775.219971 | 17031400.0 | 775.219971 |
| 597 | 123 | 2020-03-26 | 112.0 | 102.449997 | 109.477997 | 105.632004 | 86903500.0 | 105.632004 |
| 598 | 212 | 2020-08-03 | 301.962006 | 288.876007 | 289.839996 | 297.0 | 44046500.0 | 297.0 |
| 599 | 247 | 2020-09-22 | 437.76001 | 417.600006 | 429.600006 | 424.230011 | 79580800.0 | 424.230011 |
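`df.sample(600)` draws a different random subset on every run, so the rows shown by `head` and `tail` change each time the notebook is executed. A minimal sketch of a reproducible variant on a toy frame, assuming a fixed `random_state` (the seed 42 is an arbitrary illustrative choice) and `drop=True` when the original row labels are not needed as an extra `index` column:

```python
import pandas as pd

# Toy frame standing in for the TSLA data.
toy = pd.DataFrame({"Close": [float(i) for i in range(10)]})

# The same seed yields the same rows in the same order on every run.
first = toy.sample(n=6, random_state=42).reset_index(drop=True)
second = toy.sample(n=6, random_state=42).reset_index(drop=True)

print(first.equals(second))
print(list(first.columns))
```

With `drop=True` the sampled frame keeps only the original columns; without it, `reset_index()` adds the `index` column seen in the outputs above.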


#  Checking Datatypes

In [7]:
df.dtypes


index          int64
Date          object
High         float64
Low          float64
Open         float64
Close        float64
Volume       float64
Adj Close    float64
dtype: object

In [8]:
df['Date']=pd.to_datetime(df['Date'])


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   index      600 non-null    int64         
 1   Date       600 non-null    datetime64[ns]
 2   High       600 non-null    float64       
 3   Low        600 non-null    float64       
 4   Open       600 non-null    float64       
 5   Close      600 non-null    float64       
 6   Volume     600 non-null    float64       
 7   Adj Close  600 non-null    float64       
dtypes: datetime64[ns](1), float64(6), int64(1)
memory usage: 37.6 KB
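When the date format is known, passing it explicitly makes the conversion stricter, and `errors="coerce"` turns unparseable entries into `NaT` so they can be counted. A sketch on toy strings, assuming ISO `YYYY-MM-DD` dates as in this dataset:

```python
import pandas as pd

# Two valid dates and one deliberately invalid entry.
raw = pd.Series(["2019-09-30", "2019-10-01", "not a date"])

# Invalid entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed.isna().sum())
```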


In [10]:
for col in df:
    unique=np.unique(df[col])
    print(f'{col} unique value: ',unique, f',\n{col} unique count: ', len(unique))

index unique value:  [  0   1   2   3   4   5   6   7   8   9  10  11  12  14  15  16  17  18
  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  37  38  39  40  41  42  43  44  45  46  48  49  50  51  52  53  54  55
  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  74
  75  76  77  78  79  80  81  82  83  84  85  86  88  89  90  91  92  93
  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 112
 113 114 115 116 117 118 119 120 121 122 123 124 125 126 128 129 130 131
 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185
 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203
 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
 222 223 225 226 227 228 229 230 231 232 234 235 236 237 238 239 240 241
 242 243 244 245 246 247 248 2
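Printing every unique value produces very long output for numeric columns. If only the counts are needed, `DataFrame.nunique()` returns them per column in one call. A sketch on a toy frame, with the loop-based count kept alongside for comparison:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Open": [1.0, 2.0, 2.0], "Volume": [10.0, 10.0, 10.0]})

# One unique-value count per column.
counts = toy.nunique()
print(counts)

# Equivalent counts from the np.unique loop used in the notebook.
loop_counts = {col: len(np.unique(toy[col])) for col in toy}
```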

In [11]:
df.describe()

| | index | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| count | 600.0 | 600.0 | 600.0 | 600.0 | 600.0 | 600.0 | 600.0 |
| mean | 314.758333 | 533.830072 | 508.716364 | 521.605561 | 522.126872 | 48540600.0 | 522.126872 |
| std | 184.118184 | 339.241453 | 323.910646 | 331.854451 | 332.118308 | 34675130.0 | 332.118308 |
| min | 0.0 | 46.896 | 44.855999 | 45.959999 | 46.285999 | 9800600.0 | 46.285999 |
| 25% | 155.75 | 166.814999 | 160.6665 | 164.368504 | 163.015499 | 24336680.0 | 163.015499 |
| 50% | 312.5 | 612.174988 | 582.654999 | 601.044983 | 599.044983 | 34822500.0 | 599.044983 |
| 75% | 474.25 | 776.294998 | 749.324982 | 761.845016 | 762.75 | 65008380.0 | 762.75 |
| max | 638.0 | 1239.869995 | 1208.0 | 1228.0 | 1222.089966 | 242119000.0 | 1222.089966 |


#  Checking null values and duplicates

In [12]:
df.isnull().sum().sum()

0

In [13]:
df.duplicated().sum()

0
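`df.duplicated()` only flags rows that match in every column. For daily price data it may also be worth checking that no trading date appears twice. A sketch on a toy frame with one repeated date; the prices are illustrative values, not taken from the dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-03"]),
    "Close": [86.05, 88.60, 88.61],
})

# Rows identical in every column.
full_row_duplicates = toy.duplicated().sum()
# Rows whose date already appeared earlier, regardless of the other columns.
date_duplicates = toy.duplicated(subset="Date").sum()

print(full_row_duplicates, date_duplicates)
```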

#  Identifying how many times the opening and closing prices remained the same

In [None]:
# Compare each day's price with the previous trading day in the sample.
# df.sample() shuffled the rows, so sort a copy by date first;
# df itself is left unchanged for the later cells.
by_date = df.sort_values('Date')

same_opening_price = (by_date['Open'] == by_date['Open'].shift(1)).sum()
print("Number of times the opening price remained the same:", same_opening_price)

same_closing_price = (by_date['Close'] == by_date['Close'].shift(1)).sum()
print("Number of times the closing price remained the same:", same_closing_price)


# Identifying the highest open, close and adjusted closing price

In [15]:
Highest_open_price=df['Open'].max()
print("Highest Open Price=",Highest_open_price)

Highest_closing_price=df['Close'].max()
print("Highest Closing Price=",Highest_closing_price)

Highest_adjusted_price=df['Adj Close'].max()
print("Highest adjusted closing price=",Highest_adjusted_price)

Highest Open Price= 1228.0
Highest Closing Price= 1222.0899658203125
Highest adjusted closing price= 1222.0899658203125


# Identifying the lowest open, close and adjusted closing price

In [16]:
Lowest_Opening_price=df['Open'].min()
print("Lowest opening price=",Lowest_Opening_price)

Lowest_closing_price=df['Close'].min()
print("Lowest closing price=",Lowest_closing_price)

Lowest_adjusted_price=df['Adj Close'].min()
print("Lowest adjusted closing price=",Lowest_adjusted_price)

Lowest opening price= 45.959999084472656
Lowest closing price= 46.2859992980957
Lowest adjusted closing price= 46.2859992980957


# Analysing how the opening and closing prices differ each day

In [17]:
difference= df['Close'] - df['Open']

print(difference)

0      12.380005
1      -2.252007
2     -18.859985
3     -15.020020
4      30.900024
         ...    
595   -38.210022
596    -3.180054
597    -3.845993
598     7.160004
599    -5.369995
Length: 600, dtype: float64


# Identifying the days with the highest and lowest number of shares traded

In [18]:
highest_traded_day = df.loc[df['Volume'].idxmax()]
least_traded_day = df.loc[df['Volume'].idxmin()]
print(f"Day with highest stock traded: {highest_traded_day['Date']} with {highest_traded_day['Volume']} stocks.")
print(f"Day with least stock traded: {least_traded_day['Date']} with {least_traded_day['Volume']} stocks.")

Day with highest stock traded: 2020-02-05 00:00:00 with 242119000.0 stocks.
Day with least stock traded: 2021-08-11 00:00:00 with 9800600.0 stocks.


# Analysing how the adjusted closing price differs from the opening and closing prices

In [20]:
df['Diff_AdjClose_Open'] = df['Adj Close'] - df['Open']
df['Diff_AdjClose_Close'] = df['Adj Close'] - df['Close']


print(df[['Open', 'Close', 'Adj Close', 'Diff_AdjClose_Open', 'Diff_AdjClose_Close']].head())

         Open       Close   Adj Close  Diff_AdjClose_Open  Diff_AdjClose_Close
0  556.440002  568.820007  568.820007           12.380005                  0.0
1  105.952003  103.699997  103.699997           -2.252007                  0.0
2  406.899994  388.040009  388.040009          -18.859985                  0.0
3  952.429993  937.409973  937.409973          -15.020020                  0.0
4  874.489990  905.390015  905.390015           30.900024                  0.0


# Conclusion

1. Same opening and closing price analysis: Each day's opening and closing price is compared with the previous trading day in the sample, after sorting by date, to count how often each price repeated.

2. Identifying the highest opening, closing and adjusted closing price: The highest opening price is 1228.0, the highest closing price is 1222.09 and the highest adjusted closing price is 1222.09.

3. Identifying the lowest opening, closing and adjusted closing price: The lowest opening price is 45.96, the lowest closing price is 46.29 and the lowest adjusted closing price is 46.29.

4. Analysing the difference between the opening and closing price of each day: The difference changes from day to day and takes both positive and negative values, so the stock closed above its open on some days and below it on others.

5. Identifying the days with the highest and lowest number of shares traded: The highest volume was on 2020-02-05 with 242,119,000 shares, and the lowest was on 2021-08-11 with 9,800,600 shares.

6. Analysing the difference between the adjusted closing price and the opening and closing prices: In the rows shown by `df.head(10)` and `df.tail(10)`, the adjusted closing price equals the closing price, so its difference from the opening price is positive on some days and negative on others.

The data shows a dynamic stock characterized by periods of stability, sharp price movements, varying trading volumes, and daily price fluctuations. These patterns are consistent with a combination of market conditions, economic factors, and investor sentiment, and they underline the stock's volatility and responsiveness to external factors.
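The daily open-to-close differences discussed in point 4 can be condensed into a count of up, down and unchanged days. A minimal sketch using the first five rows of the `df.head(10)` output; the labels `up`, `down` and `flat` are illustrative names, not part of the original analysis:

```python
import numpy as np
import pandas as pd

# Open and Close for the first five sampled rows.
rows = pd.DataFrame({
    "Open": [556.440002, 105.952003, 406.899994, 952.429993, 874.489990],
    "Close": [568.820007, 103.699997, 388.040009, 937.409973, 905.390015],
})

# Positive values mean the stock closed above its open.
change = rows["Close"] - rows["Open"]

# np.sign gives 1.0, -1.0 or 0.0, which is then mapped to a readable label.
direction = pd.Series(np.sign(change)).map({1.0: "up", -1.0: "down", 0.0: "flat"})
print(direction.value_counts())
```

Applied to the full frame, the same three lines would show how the 600 sampled days split between gains and losses.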