<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Python Notebook for Stock Market Analysis

- [Reference: Energy](https://finance.yahoo.com/quote/%5EGSPE/history?p=%5EGSPE)
- [Reference: Communication Services](https://finance.yahoo.com/quote/%5ESP500-50/history?p=%5ESP500-50)
- [Reference: Consumer Staples](https://finance.yahoo.com/quote/%5ESP500-30/history?p=%5ESP500-30)
- [Reference: Industrials](https://finance.yahoo.com/quote/%5ESP500-20/history?p=%5ESP500-20)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

### Table of Contents <a class="anchor" id="PSMA_toc"></a>

* [Table of Contents](#PSMA_toc)
    * [1. Abstract](#PSMA_page_1)
    * [2. Import Libraries](#PSMA_page_2)
    * [3. Import Dataset](#PSMA_page_3)
    * [4. Looking at the Data](#PSMA_page_4)
    * [5. Looking at Data Types](#PSMA_page_5)
    * [6. Checking the Columns](#PSMA_page_6)
    * [7. Cleaning the Columns](#PSMA_page_7)
    * [8. Creating a New Clean Table](#PSMA_page_8)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 1 - Abstract <a class="anchor" id="PSMA_page_1"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

The S&P 500 is a widely followed stock market index that represents the performance of 500 large-cap U.S. companies. Within the S&P 500, there are various sectors, each comprising companies from specific industries. The sectors of energy, industrials, communication services, and consumer staples play significant roles in influencing the overall performance of the S&P 500.

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 2 - Import Libraries <a class="anchor" id="PSMA_page_2"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

In [4]:
import pandas as pd
import numpy as np
from scipy import stats 
import matplotlib.pyplot as plt
import seaborn as sns

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 3 - Import Dataset <a class="anchor" id="PSMA_page_3"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

In [87]:
ConStaples = pd.read_csv("C:/Users/Owner/Documents/GitHub/SmartInvest/Data/Consumer_Staples_JC.csv")

In [88]:
ConStaples.head()

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,0,"May 30, 2023",760.46,762.27,754.13,756.55,756.55,164164200
1,1,"May 26, 2023",761.29,765.94,758.95,764.84,764.84,147625000
2,2,"May 25, 2023",762.06,764.78,756.94,762.16,762.16,165712800
3,3,"May 24, 2023",773.43,773.68,767.33,768.05,768.05,128804700
4,4,"May 23, 2023",776.58,776.58,771.93,773.06,773.06,137650500


In [137]:
CommS = pd.read_csv("C:/Users/Owner/Documents/GitHub/SmartInvest/Data/Communication_Services_JC.csv")

In [138]:
CommS.head()

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,0,"May 30, 2023",210.92,212.64,209.39,210.77,210.77,290422500
1,1,"May 26, 2023",207.37,211.14,206.53,210.92,210.92,307898400
2,2,"May 25, 2023",206.48,209.82,206.39,207.37,207.37,399977700
3,3,"May 24, 2023",207.14,207.45,205.2,206.48,206.48,253331100
4,4,"May 23, 2023",210.85,210.9,207.73,207.73,207.73,272091700


In [91]:
Energy = pd.read_csv("C:/Users/Owner/Documents/GitHub/SmartInvest/Data/Energy_JC.csv")

In [92]:
Energy.head()

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,0,"May 30, 2023",602.51,602.51,590.07,596.86,596.86,153156800
1,1,"May 26, 2023",604.77,610.46,599.85,602.51,602.51,138167500
2,2,"May 25, 2023",616.43,616.43,599.83,604.77,604.77,162236500
3,3,"May 24, 2023",613.23,621.22,611.35,616.43,616.43,137879400
4,4,"May 23, 2023",606.94,620.61,606.94,613.23,613.23,139637800


In [139]:
Industrials = pd.read_csv("C:/Users/Owner/Documents/GitHub/SmartInvest/Data/Industrials_JC.csv")

In [140]:
Industrials.head()

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,0,"May 30, 2023",831.07,832.83,824.67,828.48,828.48,156913400
1,1,"May 26, 2023",826.17,833.27,825.8,830.39,830.39,153779000
2,2,"May 25, 2023",821.42,826.06,817.01,824.03,824.03,193548900
3,3,"May 24, 2023",829.92,829.93,820.46,821.53,821.53,174945800
4,4,"May 23, 2023",838.55,840.97,830.95,832.07,832.07,180217300


<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 4 - Looking at the Data <a class="anchor" id="PSMA_page_4"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

In [95]:
Industrials.head()
Industrials.tail()
Industrials.sample(5)

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
978,978,"Jul 11, 2019",644.62,649.0,643.81,648.92,648.92,164062000
170,170,"Sep 23, 2022",724.18,724.18,708.76,717.6,717.6,245348200
1247,1247,"Jun 14, 2018",636.74,639.66,632.08,633.36,633.36,-
1057,1057,"Mar 19, 2019",630.57,633.97,625.2,627.14,627.14,-
1238,1238,"Jun 27, 2018",606.53,613.76,599.94,599.95,599.95,-


In [96]:
Energy.head()
Energy.tail()
Energy.sample(5)

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
721,721,"Jul 17, 2020",282.57,286.35,277.27,278.27,278.27,185348400
951,951,"Aug 19, 2019",419.03,429.29,419.03,428.01,428.01,-
647,647,"Oct 30, 2020",216.41,217.17,211.94,216.82,216.82,298983700
1166,1166,"Oct 09, 2018",571.09,580.35,570.64,576.72,576.72,-
946,946,"Aug 26, 2019",410.68,416.16,410.68,412.53,412.53,-


In [97]:
CommS.head()
CommS.tail()
CommS.sample(5)

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
606,606,"Dec 30, 2020",221.42,222.22,219.68,219.79,219.79,130863900
451,451,"Aug 12, 2021",275.75,276.79,274.31,276.72,276.72,118879000
170,170,"Sep 23, 2022",170.09,170.09,165.05,167.08,167.08,314016900
350,350,"Jan 05, 2022",268.08,268.74,260.09,260.2,260.2,302828500
935,935,"Sep 11, 2019",170.15,171.8,170.06,171.76,171.76,196500200


In [98]:
ConStaples.head()
ConStaples.tail()
ConStaples.sample(5)

Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
1015,1015,"May 17, 2019",592.5,596.34,591.8,593.5,593.5,-
389,389,"Nov 09, 2021",752.18,755.84,750.3,755.07,755.07,86345900
411,411,"Oct 08, 2021",728.21,729.42,725.08,726.43,726.43,94664300
466,466,"Jul 22, 2021",729.07,730.29,723.7,728.17,728.17,91088600
1225,1225,"Jul 17, 2018",538.81,544.03,538.81,543.17,543.17,-


<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 5 - Looking at Data Types <a class="anchor" id="PSMA_page_5"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

In [99]:
Industrials.loc[3]

Unnamed: 0                3
Date           May 24, 2023
Open                 829.92
High                 829.93
Low                  820.46
Close*               821.53
Adj Close**          821.53
Volume          174,945,800
Name: 3, dtype: object

In [100]:
Energy.loc[3]

Unnamed: 0                3
Date           May 24, 2023
Open                 613.23
High                 621.22
Low                  611.35
Close*               616.43
Adj Close**          616.43
Volume          137,879,400
Name: 3, dtype: object

In [101]:
CommS.loc[3]

Unnamed: 0                3
Date           May 24, 2023
Open                 207.14
High                 207.45
Low                  205.20
Close*               206.48
Adj Close**          206.48
Volume          253,331,100
Name: 3, dtype: object

In [102]:
ConStaples.loc[3]

Unnamed: 0                3
Date           May 24, 2023
Open                 773.43
High                 773.68
Low                  767.33
Close*               768.05
Adj Close**          768.05
Volume          128,804,700
Name: 3, dtype: object

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 6 - Checking the Columns <a class="anchor" id="PSMA_page_6"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

In [103]:
CommS.Open.describe()

count       1258
unique      1186
top       166.12
freq           4
Name: Open, dtype: object

In [104]:
Energy.Open.describe()

count    1258.000000
mean      460.451121
std       126.962312
min       179.940000
25%       371.645000
50%       453.375000
75%       558.555000
max       720.160000
Name: Open, dtype: float64

In [105]:
ConStaples.Open.describe()

count     1258
unique    1234
top          -
freq         3
Name: Open, dtype: object

In [106]:
Industrials.Open.describe()

count       1258
unique      1224
top       658.41
freq           3
Name: Open, dtype: object

In [107]:
Energy['Close*']

0       596.86
1       602.51
2       604.77
3       616.43
4       613.23
         ...  
1253    557.43
1254    554.30
1255    556.21
1256    561.37
1257    558.35
Name: Close*, Length: 1258, dtype: float64

In [108]:
CommS['Close*']

0       210.77
1       210.92
2       207.37
3       206.48
4       207.73
         ...  
1253    148.01
1254    145.81
1255    145.08
1256    145.23
1257    144.87
Name: Close*, Length: 1258, dtype: object

In [109]:
ConStaples['Close*']

0       756.55
1       764.84
2       762.16
3       768.05
4       773.06
         ...  
1253    510.31
1254    509.59
1255    512.06
1256    507.88
1257    508.01
Name: Close*, Length: 1258, dtype: object

In [110]:
Industrials['Close*']

0       828.48
1       830.39
2       824.03
3       821.53
4       832.07
         ...  
1253    636.59
1254    630.93
1255    629.84
1256    630.67
1257    623.49
Name: Close*, Length: 1258, dtype: object

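After analyzing the columns I will be using, I noticed that the 'Open' and 'Close*' columns have dtype object in every dataset except 'Energy', where they are already floats. The `describe()` output for ConStaples shows why: the most frequent 'Open' value is the placeholder '-', so pandas reads the whole column as text.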
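
One way to confirm which entries are forcing a column to object dtype is to try parsing it and list whatever fails. The sketch below is only an illustration: the two small frames and the helper name `non_numeric_values` are hypothetical stand-ins, not part of the notebook, and the real tables would be passed in their place.

```python
import pandas as pd

# Hypothetical miniature frames standing in for the sector tables.
energy = pd.DataFrame({"Open": [602.51, 604.77], "Close*": [596.86, 602.51]})
comms = pd.DataFrame({"Open": ["210.92", "-"], "Close*": ["210.77", "-"]})


def non_numeric_values(series):
    """Return the distinct entries that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")
    # Keep entries that were present originally but became NaN after parsing.
    bad = series[parsed.isna() & series.notna()]
    return sorted(bad.astype(str).unique())


for name, frame in {"Energy": energy, "CommS": comms}.items():
    for col in ["Open", "Close*"]:
        print(name, col, frame[col].dtype, non_numeric_values(frame[col]))
```

If the only offending value reported is '-', coercing it to NaN during numeric conversion is probably a reasonable choice.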

In [111]:
Energy['Unnamed: 0']

0          0
1          1
2          2
3          3
4          4
        ... 
1253    1253
1254    1254
1255    1255
1256    1256
1257    1257
Name: Unnamed: 0, Length: 1258, dtype: int64

In [112]:
CommS['Unnamed: 0']

0          0
1          1
2          2
3          3
4          4
        ... 
1253    1253
1254    1254
1255    1255
1256    1256
1257    1257
Name: Unnamed: 0, Length: 1258, dtype: int64

In [113]:
ConStaples['Unnamed: 0']

0          0
1          1
2          2
3          3
4          4
        ... 
1253    1253
1254    1254
1255    1255
1256    1256
1257    1257
Name: Unnamed: 0, Length: 1258, dtype: int64

In [114]:
Industrials['Unnamed: 0']

0          0
1          1
2          2
3          3
4          4
        ... 
1253    1253
1254    1254
1255    1255
1256    1256
1257    1257
Name: Unnamed: 0, Length: 1258, dtype: int64

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 7 - Cleaning the Columns <a class="anchor" id="PSMA_page_7"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# I am converting each table's 'Date' column to datetime so the data can be graphed over time

In [141]:
Industrials['Date1'] = pd.to_datetime(Industrials.Date)

In [142]:
Industrials1=Industrials.drop(["Unnamed: 0", "Date"],axis=1)

In [143]:
Industrials1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1
0,831.07,832.83,824.67,828.48,828.48,156913400,2023-05-30
1,826.17,833.27,825.80,830.39,830.39,153779000,2023-05-26
2,821.42,826.06,817.01,824.03,824.03,193548900,2023-05-25
3,829.92,829.93,820.46,821.53,821.53,174945800,2023-05-24
4,838.55,840.97,830.95,832.07,832.07,180217300,2023-05-23
...,...,...,...,...,...,...,...
1253,631.43,636.59,630.07,636.59,636.59,-,2018-06-06
1254,630.22,631.77,627.92,630.93,630.93,-,2018-06-05
1255,631.86,635.66,629.01,629.84,629.84,-,2018-06-04
1256,626.20,631.84,626.20,630.67,630.67,-,2018-06-01


In [144]:
CommS['Date1'] = pd.to_datetime(CommS.Date)

In [145]:
CommS1=CommS.drop(["Unnamed: 0", "Date"],axis=1)

In [146]:
CommS1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1
0,210.92,212.64,209.39,210.77,210.77,290422500,2023-05-30
1,207.37,211.14,206.53,210.92,210.92,307898400,2023-05-26
2,206.48,209.82,206.39,207.37,207.37,399977700,2023-05-25
3,207.14,207.45,205.20,206.48,206.48,253331100,2023-05-24
4,210.85,210.90,207.73,207.73,207.73,272091700,2023-05-23
...,...,...,...,...,...,...,...
1253,145.81,148.31,145.40,148.01,148.01,-,2018-06-06
1254,145.08,145.99,144.81,145.81,145.81,-,2018-06-05
1255,145.23,146.14,144.93,145.08,145.08,-,2018-06-04
1256,144.87,145.73,144.71,145.23,145.23,-,2018-06-01


In [147]:
ConStaples['Date1'] = pd.to_datetime(ConStaples.Date)

In [148]:
ConStaples1=ConStaples.drop(["Unnamed: 0", "Date"],axis=1)

In [150]:
ConStaples1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1
0,760.46,762.27,754.13,756.55,756.55,164164200,2023-05-30
1,761.29,765.94,758.95,764.84,764.84,147625000,2023-05-26
2,762.06,764.78,756.94,762.16,762.16,165712800,2023-05-25
3,773.43,773.68,767.33,768.05,768.05,128804700,2023-05-24
4,776.58,776.58,771.93,773.06,773.06,137650500,2023-05-23
...,...,...,...,...,...,...,...
1253,510.07,510.33,506.94,510.31,510.31,-,2018-06-06
1254,512.34,512.57,508.12,509.59,509.59,-,2018-06-05
1255,509.50,512.63,509.16,512.06,512.06,-,2018-06-04
1256,508.82,510.14,506.57,507.88,507.88,-,2018-06-01


In [151]:
Energy['Date1'] = pd.to_datetime(Energy.Date)

In [152]:
Energy1=Energy.drop(["Unnamed: 0", "Date"],axis=1)

In [153]:
Energy1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1
0,602.51,602.51,590.07,596.86,596.86,153156800,2023-05-30
1,604.77,610.46,599.85,602.51,602.51,138167500,2023-05-26
2,616.43,616.43,599.83,604.77,604.77,162236500,2023-05-25
3,613.23,621.22,611.35,616.43,616.43,137879400,2023-05-24
4,606.94,620.61,606.94,613.23,613.23,139637800,2023-05-23
...,...,...,...,...,...,...,...
1253,554.30,558.77,552.54,557.43,557.43,-,2018-06-06
1254,556.21,558.96,552.34,554.30,554.30,-,2018-06-05
1255,561.37,567.25,554.92,556.21,556.21,-,2018-06-04
1256,558.35,564.92,558.26,561.37,561.37,-,2018-06-01
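
The same date conversion is repeated for all four tables, so it could be wrapped in a small helper that always reads the frame's own 'Date' column. This is a minimal sketch under assumptions: the function name `add_datetime` and the sample rows are hypothetical, and the explicit `format="%b %d, %Y"` assumes every date looks like "May 30, 2023", as in the output shown above.

```python
import pandas as pd

# Hypothetical sample rows in the same "Mon DD, YYYY" layout as the Date column.
sample = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1],
        "Date": ["May 30, 2023", "May 26, 2023"],
        "Open": [760.46, 761.29],
    }
)


def add_datetime(frame):
    """Return a copy with a parsed 'Date1' column and the helper columns dropped."""
    out = frame.copy()
    # Parse from the frame's own Date column, never from another table.
    out["Date1"] = pd.to_datetime(out["Date"], format="%b %d, %Y")
    return out.drop(columns=["Unnamed: 0", "Date"])


cleaned = add_datetime(sample)
print(cleaned)
```

Working on a copy leaves the original table untouched, so the cell can be re-run without first reloading the CSV.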


# Here I am converting the 'Open' and 'Close*' columns to floats so I can build a new table to graph. Energy is skipped because its columns are already floats.

In [154]:
Industrials1['Open'] = pd.to_numeric(Industrials1['Open'], errors='coerce')

In [155]:
Industrials1['Close*'] = pd.to_numeric(Industrials1['Close*'], errors='coerce')

In [156]:
CommS1['Open'] = pd.to_numeric(CommS1['Open'], errors='coerce')

In [157]:
CommS1['Close*'] = pd.to_numeric(CommS1['Close*'], errors='coerce')

In [158]:
ConStaples1['Open'] = pd.to_numeric(ConStaples1['Open'], errors='coerce')

In [159]:
ConStaples1['Close*'] = pd.to_numeric(ConStaples1['Close*'], errors='coerce')

In [160]:
Industrials1.Open.describe()

count    1255.000000
mean      728.240534
std       112.353174
min       426.470000
25%       636.120000
50%       722.000000
75%       841.875000
max       906.660000
Name: Open, dtype: float64

In [161]:
CommS1.Open.describe()

count    1255.000000
mean      195.051753
std        40.122986
min       130.860000
25%       163.865000
50%       185.160000
75%       221.485000
max       288.460000
Name: Open, dtype: float64

In [162]:
ConStaples1.Open.describe()

count    1255.000000
mean      673.865275
std        86.551305
min       502.120000
25%       600.345000
50%       672.790000
75%       753.835000
max       843.880000
Name: Open, dtype: float64

In [175]:
Energy1.Open.describe()

count    1258.000000
mean      460.451121
std       126.962312
min       179.940000
25%       371.645000
50%       453.375000
75%       558.555000
max       720.160000
Name: Open, dtype: float64
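
With `errors='coerce'`, any value that cannot be parsed becomes NaN. That is why the `describe()` count drops from 1258 to 1255 for the three converted tables. The sketch below shows one way to apply the conversion to several columns at once and count how many rows were coerced. The helper name `to_float_columns` and the sample frame are hypothetical.

```python
import pandas as pd

# Hypothetical sample containing the "-" placeholder seen in the raw data.
raw = pd.DataFrame(
    {
        "Open": ["760.46", "-", "762.06"],
        "Close*": ["756.55", "-", "762.16"],
    }
)


def to_float_columns(frame, columns):
    """Return a copy with the given columns parsed as floats (bad values -> NaN)."""
    out = frame.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


clean = to_float_columns(raw, ["Open", "Close*"])

# Count how many entries per column were turned into NaN.
missing = clean[["Open", "Close*"]].isna().sum()
print(missing)
```

Checking the NaN count after conversion is a quick way to see how many rows will be missing from any later graph.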

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# Page 8 - Creating a New Clean Table <a class="anchor" id="PSMA_page_8"></a>

[Back to Top](#PSMA_toc)

<hr style="height:5px;border-width:0;color:MediumAquamarine;background-color:MediumAquamarine">

# I am adding a 'Difference' column ('Close*' minus 'Open') to each table so I can graph the daily change

In [163]:
column1 = Energy1["Open"]
column2 = Energy1["Close*"]

In [164]:
Energy1['Difference'] = column2 - column1

In [165]:
Energy1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1,Difference
0,602.51,602.51,590.07,596.86,596.86,153156800,2023-05-30,-5.65
1,604.77,610.46,599.85,602.51,602.51,138167500,2023-05-26,-2.26
2,616.43,616.43,599.83,604.77,604.77,162236500,2023-05-25,-11.66
3,613.23,621.22,611.35,616.43,616.43,137879400,2023-05-24,3.20
4,606.94,620.61,606.94,613.23,613.23,139637800,2023-05-23,6.29
...,...,...,...,...,...,...,...,...
1253,554.30,558.77,552.54,557.43,557.43,-,2018-06-06,3.13
1254,556.21,558.96,552.34,554.30,554.30,-,2018-06-05,-1.91
1255,561.37,567.25,554.92,556.21,556.21,-,2018-06-04,-5.16
1256,558.35,564.92,558.26,561.37,561.37,-,2018-06-01,3.02


In [166]:
column1 = CommS1["Open"]
column2 = CommS1["Close*"]

In [167]:
CommS1['Difference'] = column2 - column1

In [168]:
CommS1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1,Difference
0,210.92,212.64,209.39,210.77,210.77,290422500,2023-05-30,-0.15
1,207.37,211.14,206.53,210.92,210.92,307898400,2023-05-26,3.55
2,206.48,209.82,206.39,207.37,207.37,399977700,2023-05-25,0.89
3,207.14,207.45,205.20,206.48,206.48,253331100,2023-05-24,-0.66
4,210.85,210.90,207.73,207.73,207.73,272091700,2023-05-23,-3.12
...,...,...,...,...,...,...,...,...
1253,145.81,148.31,145.40,148.01,148.01,-,2018-06-06,2.20
1254,145.08,145.99,144.81,145.81,145.81,-,2018-06-05,0.73
1255,145.23,146.14,144.93,145.08,145.08,-,2018-06-04,-0.15
1256,144.87,145.73,144.71,145.23,145.23,-,2018-06-01,0.36


In [169]:
column1 = Industrials1["Open"]
column2 = Industrials1["Close*"]

In [170]:
Industrials1['Difference'] = column2 - column1

In [171]:
Industrials1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1,Difference
0,831.07,832.83,824.67,828.48,828.48,156913400,2023-05-30,-2.59
1,826.17,833.27,825.80,830.39,830.39,153779000,2023-05-26,4.22
2,821.42,826.06,817.01,824.03,824.03,193548900,2023-05-25,2.61
3,829.92,829.93,820.46,821.53,821.53,174945800,2023-05-24,-8.39
4,838.55,840.97,830.95,832.07,832.07,180217300,2023-05-23,-6.48
...,...,...,...,...,...,...,...,...
1253,631.43,636.59,630.07,636.59,636.59,-,2018-06-06,5.16
1254,630.22,631.77,627.92,630.93,630.93,-,2018-06-05,0.71
1255,631.86,635.66,629.01,629.84,629.84,-,2018-06-04,-2.02
1256,626.20,631.84,626.20,630.67,630.67,-,2018-06-01,4.47


In [172]:
column1 = ConStaples1["Open"]
column2 = ConStaples1["Close*"]

In [173]:
ConStaples1['Difference'] = column2 - column1

In [174]:
ConStaples1

Unnamed: 0,Open,High,Low,Close*,Adj Close**,Volume,Date1,Difference
0,760.46,762.27,754.13,756.55,756.55,164164200,2023-05-30,-3.91
1,761.29,765.94,758.95,764.84,764.84,147625000,2023-05-26,3.55
2,762.06,764.78,756.94,762.16,762.16,165712800,2023-05-25,0.10
3,773.43,773.68,767.33,768.05,768.05,128804700,2023-05-24,-5.38
4,776.58,776.58,771.93,773.06,773.06,137650500,2023-05-23,-3.52
...,...,...,...,...,...,...,...,...
1253,510.07,510.33,506.94,510.31,510.31,-,2018-06-06,0.24
1254,512.34,512.57,508.12,509.59,509.59,-,2018-06-05,-2.75
1255,509.50,512.63,509.16,512.06,512.06,-,2018-06-04,2.56
1256,508.82,510.14,506.57,507.88,507.88,-,2018-06-01,-0.94
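
The 'Difference' step is identical for every table, so it could be written once as a function instead of reassigning `column1` and `column2` each time. This is a sketch only: `add_difference` is a hypothetical name, the sample rows are the first three Energy rows shown above, and the rounding to two decimals is an added assumption to keep floating-point noise out of the displayed values.

```python
import pandas as pd

# First three Energy rows from the table above, used as sample input.
prices = pd.DataFrame(
    {
        "Open": [602.51, 604.77, 616.43],
        "Close*": [596.86, 602.51, 604.77],
    }
)


def add_difference(frame):
    """Return a copy with 'Difference' = 'Close*' - 'Open', rounded to 2 decimals."""
    out = frame.copy()
    out["Difference"] = (out["Close*"] - out["Open"]).round(2)
    return out


result = add_difference(prices)
print(result)
```

Rows where either price was coerced to NaN will also have a NaN 'Difference', so they drop out of a plot rather than showing a misleading value.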
