# Stock Market Analysis

**Ibrahim Alsaeed**

**23 Feb 2026**

**Purpose**: Extract and quality-check yfinance data for 8 major stocks
(Jan 2023 to Feb 2026) to power the financial dashboard.

In [5]:
import yfinance as yf
import pandas as pd

## 1. Data Extraction
Fetching daily data from Yahoo Finance for 8 stocks across 5 sectors:
Technology (AAPL, MSFT, NVDA, GOOGL), Financial (JPM), Healthcare (JNJ), Energy (XOM), and Automotive (TSLA).


In [6]:
tickers = ['AAPL', 'MSFT', 'NVDA',
           'JPM',
           'JNJ',
           'XOM',
           'TSLA', 'GOOGL']

# `end` is exclusive, so the last trading day returned is 2026-02-19
raw = yf.download(tickers, start='2023-01-01', end='2026-02-20')

raw.to_csv('data/stock_data_raw.csv')
print(f'raw data shape: {raw.shape}')
print(f'data range: {raw.index.min().date()} -> {raw.index.max().date()}')


[*********************100%***********************]  8 of 8 completed


raw data shape: (785, 40)
data range: 2023-01-03 -> 2026-02-19
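
Because `raw` has two column levels (Price and Ticker), the CSV written above carries two header rows plus an index-name row. Below is a minimal sketch of reading such a file back into the same shape. It uses a small synthetic frame (`demo_raw`, an illustrative stand-in, not the downloaded data) and an in-memory buffer instead of `data/stock_data_raw.csv`.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for `raw`: two column levels (Price, Ticker) and a Date index.
dates = pd.date_range("2023-01-03", periods=3, freq="B", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAPL", "MSFT"]], names=["Price", "Ticker"]
)
demo_raw = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
demo_raw.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds both column levels; index_col=0 restores the Date index.
reloaded = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

print(reloaded.columns.nlevels)
print(reloaded.shape == demo_raw.shape)
```

Reading the same file without `header=[0, 1]` would treat the Ticker and Date rows as data, which is one way placeholder column names such as `Unnamed: 1_level_2` can appear.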


In [7]:
raw.head()

Price,Close,Close,Close,Close,Close,Close,Close,Close,High,High,...,Open,Open,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
Ticker,AAPL,GOOGL,JNJ,JPM,MSFT,NVDA,TSLA,XOM,AAPL,GOOGL,...,TSLA,XOM,AAPL,GOOGL,JNJ,JPM,MSFT,NVDA,TSLA,XOM
Date
2023-01-03,123.096024,88.451691,162.655899,124.928711,233.452805,14.300685,108.099998,95.434219,128.834003,90.367218,...,118.470001,98.364175,112117500,28131200,6344900,11054800,25740000,401277000,231402800,15146200
2023-01-04,124.365669,87.419487,164.426758,126.093681,223.240845,14.734249,113.639999,95.711975,126.629372,89.970214,...,109.110001,93.902034,89113600,34854800,9788800,11687600,50623400,431324000,180389000,18058400
2023-01-05,123.046799,85.553581,163.212692,126.065758,216.624481,14.250733,110.339996,97.853432,125.753395,86.91331,...,110.510002,95.281876,80962700,27194400,6255300,8381300,39585600,389168000,157986300,15946600
2023-01-06,127.574196,86.685028,164.536285,128.478043,219.17749,14.844141,113.059998,99.036156,128.233619,87.032409,...,103.0,98.632953,87754700,41381500,5706000,10029100,43613600,405044000,220911100,16348100
2023-01-09,128.095825,87.35994,160.273422,127.947151,221.311462,15.612373,119.769997,97.190399,131.304382,89.374723,...,118.959999,100.030755,70790800,29003900,7925300,8482300,27369800,504231000,190284000,17964600


In [8]:
raw.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 785 entries, 2023-01-03 to 2026-02-19
Data columns (total 40 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   (Close, AAPL)    785 non-null    float64
 1   (Close, GOOGL)   785 non-null    float64
 2   (Close, JNJ)     785 non-null    float64
 3   (Close, JPM)     785 non-null    float64
 4   (Close, MSFT)    785 non-null    float64
 5   (Close, NVDA)    785 non-null    float64
 6   (Close, TSLA)    785 non-null    float64
 7   (Close, XOM)     785 non-null    float64
 8   (High, AAPL)     785 non-null    float64
 9   (High, GOOGL)    785 non-null    float64
 10  (High, JNJ)      785 non-null    float64
 11  (High, JPM)      785 non-null    float64
 12  (High, MSFT)     785 non-null    float64
 13  (High, NVDA)     785 non-null    float64
 14  (High, TSLA)     785 non-null    float64
 15  (High, XOM)      785 non-null    float64
 16  (Low, AAPL)      785 non-null    float64
 1

## 2. Data Quality Assessment
Checking for missing values, suspicious price movements (>50% daily change), and duplicate dates.


In [9]:
raw.isnull().sum()

Price   Ticker
Close   AAPL      0
        GOOGL     0
        JNJ       0
        JPM       0
        MSFT      0
        NVDA      0
        TSLA      0
        XOM       0
High    AAPL      0
        GOOGL     0
        JNJ       0
        JPM       0
        MSFT      0
        NVDA      0
        TSLA      0
        XOM       0
Low     AAPL      0
        GOOGL     0
        JNJ       0
        JPM       0
        MSFT      0
        NVDA      0
        TSLA      0
        XOM       0
Open    AAPL      0
        GOOGL     0
        JNJ       0
        JPM       0
        MSFT      0
        NVDA      0
        TSLA      0
        XOM       0
Volume  AAPL      0
        GOOGL     0
        JNJ       0
        JPM       0
        MSFT      0
        NVDA      0
        TSLA      0
        XOM       0
dtype: int64

In [10]:
# There are 5 fields for each ticker:
# Close: price at market close
# High: highest price reached during the day
# Low: lowest price reached during the day
# Open: price at market open
# Volume: number of shares traded that day

# This analysis focuses on Close.
print(raw.columns.get_level_values(0).unique().tolist())

['Close', 'High', 'Low', 'Open', 'Volume']
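
The two-level column layout supports slicing either by field or by ticker. The sketch below shows both on a small synthetic frame (`demo_raw` is illustrative, with made-up values); `DataFrame.xs` is one common way to pull every field for a single ticker, though other approaches exist.

```python
import numpy as np
import pandas as pd

# Synthetic frame with the same column layout as `raw` (values are placeholders).
dates = pd.date_range("2023-01-03", periods=3, freq="B", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["AAPL", "MSFT"]],
    names=["Price", "Ticker"],
)
demo_raw = pd.DataFrame(np.ones((3, 10)), index=dates, columns=columns)

# One field, all tickers: the pattern used in this notebook.
close_all = demo_raw["Close"]

# One ticker, all fields: a cross-section on the Ticker level.
aapl_all = demo_raw.xs("AAPL", axis=1, level="Ticker")

print(list(close_all.columns))
print(list(aapl_all.columns))
```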


In [11]:
raw['Close'].describe()

Ticker,AAPL,GOOGL,JNJ,JPM,MSFT,NVDA,TSLA,XOM
count,785.0,785.0,785.0,785.0,785.0,785.0,785.0,785.0
mean,205.030065,170.513515,157.851161,208.14436,396.633466,103.054916,274.985388,106.1815
std,35.741096,58.487595,20.996025,64.117213,74.667864,55.510849,92.500173,9.586629
min,123.046799,85.553581,135.852158,116.341866,216.624481,14.250733,108.099998,89.695213
25%,176.895203,131.724731,146.173523,141.953857,334.103821,46.065285,199.729996,99.52446
50%,200.7491,161.959351,151.108673,201.005508,407.465912,111.409775,251.050003,105.559509
75%,229.592392,184.597046,159.551804,261.449066,443.779816,141.931381,338.73999,111.101234
max,285.922455,343.690002,246.910004,334.609985,539.825256,207.028473,489.880005,154.529999


In [12]:
# Sanity check: treat any one-day change above 50% as likely bad data
price = raw['Close']

daily_return = price.pct_change().dropna()
sus = (daily_return.abs() > 0.5)
print('Suspicious price movements:')
print(daily_return[sus].dropna(how='all'))

Suspicious price movements:
Empty DataFrame
Columns: [AAPL, GOOGL, JNJ, JPM, MSFT, NVDA, TSLA, XOM]
Index: []


In [13]:
# Check for duplicate dates in the index
raw.index.duplicated().sum()

np.int64(0)

## 3. Quality Summary

| Check | Result |
|-------|--------|
| Missing values | 0 found |
| Suspicious movements (>50%) | 0 found |
| Duplicate dates | 0 found |

All three checks pass, so the data is treated as clean.
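
The three checks can also be bundled into one reusable function. The sketch below is an assumption about how that might look, not part of the original notebook: `quality_report` and its `max_abs_return` parameter are hypothetical names, and the demo frame deliberately contains one problem of each kind so the counts are visible.

```python
import pandas as pd


def quality_report(prices: pd.DataFrame, max_abs_return: float = 0.5) -> dict:
    """Hypothetical helper: count missing values, large moves, and duplicate dates."""
    # fill_method=None keeps gaps as NaN instead of forward-filling them.
    returns = prices.pct_change(fill_method=None)
    return {
        "missing_values": int(prices.isnull().sum().sum()),
        "suspicious_moves": int((returns.abs() > max_abs_return).sum().sum()),
        "duplicate_dates": int(prices.index.duplicated().sum()),
    }


# Synthetic prices with one missing value, one >50% jump, and one repeated date.
dates = pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-04", "2023-01-05"])
demo = pd.DataFrame(
    {"AAA": [100.0, 101.0, 250.0, 251.0], "BBB": [50.0, None, 51.0, 52.0]},
    index=dates,
)

report = quality_report(demo)
print(report)
```

On the notebook's data this would be called as `quality_report(raw['Close'])`, and all three counts would be expected to be zero given the results above.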


In [17]:
# Daily returns
daily_returns = price.pct_change().dropna()  # day-over-day percentage change per ticker

# Cumulative returns
cumulative_returns = (1 + daily_returns).cumprod() - 1  # compounded change from the first close through each later trading day

total_returns = (cumulative_returns.iloc[-1] * 100).sort_values(ascending=False).round(2)
print("Total Return % per Stock (Jan 2023 – Feb 2026):")
print(total_returns)

Total Return % per Stock (Jan 2023 – Feb 2026):
Ticker
NVDA     1213.92
TSLA      280.86
GOOGL     242.39
JPM       146.58
AAPL      111.69
MSFT       70.68
XOM        58.19
JNJ        51.80
Name: 2026-02-19 00:00:00, dtype: float64
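
As a sanity check on the compounding step: multiplying the daily growth factors together should give the same total return as dividing the last close by the first. The sketch below verifies this on a short made-up price series (the numbers are illustrative, not taken from the dataset).

```python
import pandas as pd

# Made-up closing prices for a single hypothetical stock.
close = pd.Series([100.0, 110.0, 99.0, 120.0])

# Route 1: compound the daily returns, as in the cell above.
daily = close.pct_change().dropna()
compounded = (1 + daily).cumprod().iloc[-1] - 1

# Route 2: compare the last close directly with the first.
direct = close.iloc[-1] / close.iloc[0] - 1

print(round(compounded, 6), round(direct, 6))
```

Both routes agree up to floating-point error, so the figures in the table above can be read as the percentage change from the first close in the sample to the last.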


In [None]:
print(price.dtypes)
# Check the dtypes before exporting to CSV

Ticker
AAPL     float64
GOOGL    float64
JNJ      float64
JPM      float64
MSFT     float64
NVDA     float64
TSLA     float64
XOM      float64
dtype: object


## 4. Export
Saving the cleaned closing-price data for use in the dashboard.


In [16]:
price.to_csv('data/stock_data_clean.csv')
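
A consumer of the exported file will typically want the dates parsed back into a `DatetimeIndex`. The sketch below shows one way to do that, assuming the index column is named `Date` as in the output above. It round-trips a small synthetic frame (`demo_price`, with illustrative values) through an in-memory buffer instead of `data/stock_data_clean.csv`.

```python
import io

import pandas as pd

# Synthetic stand-in for `price`: one float column per ticker, indexed by Date.
dates = pd.date_range("2023-01-03", periods=3, freq="B", name="Date")
demo_price = pd.DataFrame(
    {"AAPL": [123.1, 124.4, 123.0], "MSFT": [233.5, 223.2, 216.6]},
    index=dates,
)

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
demo_price.to_csv(buffer)
buffer.seek(0)

# index_col restores the Date index; parse_dates converts it back to datetimes.
reloaded = pd.read_csv(buffer, index_col="Date", parse_dates=True)

print(reloaded.index.dtype)
print(reloaded.dtypes.tolist())
```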