<a href="https://colab.research.google.com/github/Ikrammsr/apple-stock-analysis-project/blob/main/apple_stock_analysis_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Introduction**

## About Apple Inc.

Apple Inc. is a multinational technology company headquartered in Cupertino, California. It was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne. Apple is best known for hardware such as the iPhone, iPad, Mac, and Apple Watch, and for services such as Apple Music and iCloud. It is a major player in smartphones, personal computers, and digital services, and as one of the world's largest companies by market capitalization, its stock has a significant influence on global markets.

## About the Dataset

This dataset records Apple’s historical stock performance (AAPL) at daily frequency, with one row per trading day. It covers the period from Apple’s IPO in December 1980 through October 2025, which makes it well suited to both financial analysis and machine learning projects.  

The dataset contains **8 columns**:

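- **Date:** The trading day, stored as a timestamp with a US Eastern UTC offset (e.g. `1980-12-12 00:00:00-05:00`)  
- **ticker:** The stock symbol on NASDAQ (AAPL)  
- **name:** A descriptive label that includes the company name (`Apple Inc. (AAPL) Historical Data`)  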
- **Open:** Stock price at the start of the day (USD)  
- **High:** Highest price during the day (USD)  
- **Low:** Lowest price during the day (USD)  
- **Close:** Stock price at market close (USD)  
- **Volume:** Number of shares traded during the day  

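The data was collected with the **yfinance Python library**, which retrieves data from Yahoo Finance. Prices appear to be adjusted for later stock splits: the 1980 values are around USD 0.10, far below Apple’s IPO price of USD 22 per share.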
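The notebook does not show the collection step itself, so the following is only a sketch of how a comparable table could be pulled with yfinance. The function name `download_aapl_history` is hypothetical, and the choice of `period="max"` and the way the `ticker` and `name` columns are filled are assumptions made to match the column layout described above. It needs network access and the third-party `yfinance` package, which is imported lazily so that defining the function has no side effects.

```python
import pandas as pd


def download_aapl_history(symbol: str = "AAPL") -> pd.DataFrame:
    """Hypothetical helper: fetch daily history and shape it like the dataset."""
    # Imported here so the module can be loaded without yfinance installed.
    import yfinance as yf

    # Ticker.history returns a DataFrame indexed by a timezone-aware date.
    history = yf.Ticker(symbol).history(period="max")

    # Keep the price/volume columns and turn the date index into a column.
    frame = history[["Open", "High", "Low", "Close", "Volume"]].reset_index()

    # Assumed: constant label columns, matching the values seen in the CSV.
    frame["ticker"] = symbol
    frame["name"] = "Apple Inc. (AAPL) Historical Data"
    return frame
```

Calling `download_aapl_history().to_csv("Apple_historical_data.csv")` would then write a file with an extra unnamed index column, similar to the one loaded later in this notebook.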

## Project Goal

The goal of this project is to explore Apple’s stock trends, understand patterns in daily returns, and build predictive models using Python and statistical methods. Specifically, this project will let us:

- Handle and clean data efficiently using Python  
- Explore trends with visualizations and descriptive statistics  
- Test assumptions like normality and run hypothesis tests  
- Apply regression models (linear & logistic) and a Random Forest classifier  
- Understand factors that influence high-return days  

By combining programming and statistical analysis, this project will give us a deeper understanding of Apple’s stock behavior and strengthen our skills in both coding and data analysis.

## Research Question

Based on this goal, the main research question is:  
**How has Apple’s stock performance changed over time, and which factors can help predict daily stock returns?**

To answer this, we will explore trends in stock prices, daily returns, and trading volumes, test statistical assumptions like normality, compare performance across different periods, and build predictive models using linear regression, logistic regression, and Random Forest classifiers.  
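Since daily returns are central to the research question, here is a minimal sketch of how they can be computed from closing prices, using the simple return r_t = (Close_t - Close_(t-1)) / Close_(t-1). The five closing prices are the first rows of the dataset. The 2% cutoff for a "high-return day" is an assumption chosen only for illustration; the project does not define the threshold at this point.

```python
import pandas as pd

# First five closing prices from the dataset (December 1980).
close = pd.Series(
    [0.098485, 0.093347, 0.086495, 0.088636, 0.091206],
    name="Close",
)

# Simple daily return: percentage change from the previous close.
# The first value is NaN because there is no previous day.
daily_return = close.pct_change()

# Assumed threshold: flag days whose return exceeds 2%.
HIGH_RETURN_THRESHOLD = 0.02
high_return = daily_return > HIGH_RETURN_THRESHOLD

print(pd.DataFrame({"Close": close, "Return": daily_return, "High": high_return}))
```

A binary flag like `high_return` is the kind of target a logistic regression or Random Forest classifier can be trained on, while `daily_return` itself suits linear regression.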


## Data Loading & Cleaning

In this section, we will load the Apple stock dataset, check for missing values or duplicates, and prepare the data for analysis. This step ensures that all calculations, visualizations, and statistical tests are accurate and reliable.


In [None]:
# Import necessary libraries
import pandas as pd

# Load the CSV file
df = pd.read_csv('Apple_historical_data.csv')
df



Unnamed: 0,Date,Open,High,Low,Close,Volume,ticker,name
0,1980-12-12 00:00:00-05:00,0.098485,0.098913,0.098485,0.098485,469033600,AAPL,Apple Inc. (AAPL) Historical Data
1,1980-12-15 00:00:00-05:00,0.093775,0.093775,0.093347,0.093347,175884800,AAPL,Apple Inc. (AAPL) Historical Data
2,1980-12-16 00:00:00-05:00,0.086924,0.086924,0.086495,0.086495,105728000,AAPL,Apple Inc. (AAPL) Historical Data
3,1980-12-17 00:00:00-05:00,0.088636,0.089064,0.088636,0.088636,86441600,AAPL,Apple Inc. (AAPL) Historical Data
4,1980-12-18 00:00:00-05:00,0.091206,0.091634,0.091206,0.091206,73449600,AAPL,Apple Inc. (AAPL) Historical Data
...,...,...,...,...,...,...,...,...
11302,2025-10-16 00:00:00-04:00,248.250000,249.039993,245.130005,247.449997,39777000,AAPL,Apple Inc. (AAPL) Historical Data
11303,2025-10-17 00:00:00-04:00,248.020004,253.380005,247.270004,252.289993,49147000,AAPL,Apple Inc. (AAPL) Historical Data
11304,2025-10-20 00:00:00-04:00,255.889999,264.380005,255.630005,262.239990,90483000,AAPL,Apple Inc. (AAPL) Historical Data
11305,2025-10-21 00:00:00-04:00,261.880005,265.290009,261.829987,262.769989,46695900,AAPL,Apple Inc. (AAPL) Historical Data


In [None]:
# Preview the first 5 rows
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,ticker,name
0,1980-12-12 00:00:00-05:00,0.098485,0.098913,0.098485,0.098485,469033600,AAPL,Apple Inc. (AAPL) Historical Data
1,1980-12-15 00:00:00-05:00,0.093775,0.093775,0.093347,0.093347,175884800,AAPL,Apple Inc. (AAPL) Historical Data
2,1980-12-16 00:00:00-05:00,0.086924,0.086924,0.086495,0.086495,105728000,AAPL,Apple Inc. (AAPL) Historical Data
3,1980-12-17 00:00:00-05:00,0.088636,0.089064,0.088636,0.088636,86441600,AAPL,Apple Inc. (AAPL) Historical Data
4,1980-12-18 00:00:00-05:00,0.091206,0.091634,0.091206,0.091206,73449600,AAPL,Apple Inc. (AAPL) Historical Data


In [None]:
# Check data types and missing values
df.info()




<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11307 entries, 0 to 11306
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    11307 non-null  object 
 1   Open    11307 non-null  float64
 2   High    11307 non-null  float64
 3   Low     11307 non-null  float64
 4   Close   11307 non-null  float64
 5   Volume  11307 non-null  int64  
 6   ticker  11307 non-null  object 
 7   name    11307 non-null  object 
dtypes: float64(4), int64(1), object(3)
memory usage: 706.8+ KB


In [None]:
# Check how many missing values are in each column
df.isnull().sum()

Unnamed: 0,0
Date,0
Open,0
High,0
Low,0
Close,0
Volume,0
ticker,0
name,0


In [None]:
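# Convert 'Date' to timezone-aware datetimes. The raw strings mix -05:00 (EST)
# and -04:00 (EDT) offsets, so utc=True places them all on one UTC timeline.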
df['Date'] = pd.to_datetime(df['Date'], utc=True)
df



Unnamed: 0,Date,Open,High,Low,Close,Volume,ticker,name
0,1980-12-12 05:00:00+00:00,0.098485,0.098913,0.098485,0.098485,469033600,AAPL,Apple Inc. (AAPL) Historical Data
1,1980-12-15 05:00:00+00:00,0.093775,0.093775,0.093347,0.093347,175884800,AAPL,Apple Inc. (AAPL) Historical Data
2,1980-12-16 05:00:00+00:00,0.086924,0.086924,0.086495,0.086495,105728000,AAPL,Apple Inc. (AAPL) Historical Data
3,1980-12-17 05:00:00+00:00,0.088636,0.089064,0.088636,0.088636,86441600,AAPL,Apple Inc. (AAPL) Historical Data
4,1980-12-18 05:00:00+00:00,0.091206,0.091634,0.091206,0.091206,73449600,AAPL,Apple Inc. (AAPL) Historical Data
...,...,...,...,...,...,...,...,...
11302,2025-10-16 04:00:00+00:00,248.250000,249.039993,245.130005,247.449997,39777000,AAPL,Apple Inc. (AAPL) Historical Data
11303,2025-10-17 04:00:00+00:00,248.020004,253.380005,247.270004,252.289993,49147000,AAPL,Apple Inc. (AAPL) Historical Data
11304,2025-10-20 04:00:00+00:00,255.889999,264.380005,255.630005,262.239990,90483000,AAPL,Apple Inc. (AAPL) Historical Data
11305,2025-10-21 04:00:00+00:00,261.880005,265.290009,261.829987,262.769989,46695900,AAPL,Apple Inc. (AAPL) Historical Data
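After the UTC conversion, each timestamp reads 05:00 or 04:00 rather than midnight, because midnight US Eastern time is expressed in UTC. For daily analysis it can be convenient to recover the plain calendar date. The sketch below shows one way to do that, assuming the original offsets correspond to the `America/New_York` time zone (the time zone name is an assumption; the file only shows the numeric offsets).

```python
import pandas as pd

# One winter (EST) and one summer-time (EDT) timestamp from the dataset.
raw = pd.Series(
    [
        "1980-12-12 00:00:00-05:00",
        "2025-10-16 00:00:00-04:00",
    ]
)

# Same conversion as in the notebook: everything on a single UTC timeline.
utc = pd.to_datetime(raw, utc=True)

# Convert back to exchange-local time, drop the time zone, keep the date part.
local = utc.dt.tz_convert("America/New_York")
trading_day = local.dt.tz_localize(None).dt.normalize()

print(trading_day)
```

The result is a timezone-naive `datetime64` column holding the trading day at midnight, which is usually easier to group, resample, and plot.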


In [None]:
# Sort chronologically and reset the index so row position follows time order
df = df.sort_values("Date").reset_index(drop=True)
df

Unnamed: 0,Date,Open,High,Low,Close,Volume,ticker,name
0,1980-12-12 05:00:00+00:00,0.098485,0.098913,0.098485,0.098485,469033600,AAPL,Apple Inc. (AAPL) Historical Data
1,1980-12-15 05:00:00+00:00,0.093775,0.093775,0.093347,0.093347,175884800,AAPL,Apple Inc. (AAPL) Historical Data
2,1980-12-16 05:00:00+00:00,0.086924,0.086924,0.086495,0.086495,105728000,AAPL,Apple Inc. (AAPL) Historical Data
3,1980-12-17 05:00:00+00:00,0.088636,0.089064,0.088636,0.088636,86441600,AAPL,Apple Inc. (AAPL) Historical Data
4,1980-12-18 05:00:00+00:00,0.091206,0.091634,0.091206,0.091206,73449600,AAPL,Apple Inc. (AAPL) Historical Data
...,...,...,...,...,...,...,...,...
11302,2025-10-16 04:00:00+00:00,248.250000,249.039993,245.130005,247.449997,39777000,AAPL,Apple Inc. (AAPL) Historical Data
11303,2025-10-17 04:00:00+00:00,248.020004,253.380005,247.270004,252.289993,49147000,AAPL,Apple Inc. (AAPL) Historical Data
11304,2025-10-20 04:00:00+00:00,255.889999,264.380005,255.630005,262.239990,90483000,AAPL,Apple Inc. (AAPL) Historical Data
11305,2025-10-21 04:00:00+00:00,261.880005,265.290009,261.829987,262.769989,46695900,AAPL,Apple Inc. (AAPL) Historical Data
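Rather than relying on visual inspection of the output, the ordering can be verified programmatically. The sketch below applies the same sort to three rows from the dataset that have been placed out of order on purpose, then confirms that the dates are increasing and unique.

```python
import pandas as pd

# Three real rows, deliberately shuffled to exercise the sort.
frame = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "1980-12-16 00:00:00-05:00",
                "1980-12-12 00:00:00-05:00",
                "1980-12-15 00:00:00-05:00",
            ],
            utc=True,
        ),
        "Close": [0.086495, 0.098485, 0.093347],
    }
)

# Same operation as in the notebook cell above.
frame = frame.sort_values("Date").reset_index(drop=True)

# Sanity checks: dates never go backwards and no date is repeated.
is_sorted = frame["Date"].is_monotonic_increasing
has_unique_dates = frame["Date"].is_unique

print(is_sorted, has_unique_dates)
```

This matters for any lag-based feature such as daily returns, which silently produce wrong values if rows are not in time order.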


# Programming Note
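We used pandas to load the CSV (`read_csv`), inspect it (`info`, `isnull`), parse the dates (`to_datetime`), and sort the rows chronologically (`sort_values`). Converting the Date column from text to datetime is what makes time-based analyses and visualizations possible.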

# Statistics Note
Checking for missing values ensures our statistical calculations (mean, standard deviation, normality tests, etc.) are not distorted by incomplete data. Here every column has zero missing values, so no imputation or row removal is needed.
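To illustrate why the check matters, the sketch below shows how pandas treats a missing value when computing a mean. The series uses two closing prices from the dataset with an artificial gap inserted between them; the gap is not present in the real data and is there only for demonstration.

```python
import pandas as pd

# Two real closing prices with an artificial missing value in between.
closes = pd.Series([0.098485, float("nan"), 0.086495])

# By default pandas skips NaN, so the mean uses only two observations.
mean_default = closes.mean()

# With skipna=False the missing value propagates and the result is NaN.
mean_strict = closes.mean(skipna=False)

# count() reports how many non-missing values were actually used.
n_used = closes.count()

print(mean_default, mean_strict, n_used)
```

Because the default silently drops missing values, a statistic can look valid while resting on fewer observations than expected, which is why the explicit `isnull().sum()` check comes first.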
