# Data Exploration for Stock Market Prediction

In this notebook, we will explore the Microsoft stock data obtained from Yahoo Finance. We will visualize trends, check for missing values, and understand the dataset's structure to prepare for model training.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')  # set_theme is the current name; sns.set is an alias for it

In [2]:
# Load the stock data
data_path = '../data/raw/microsoft_stock_data.csv'  # Path to the data file
# Parse 'Date' as datetime so time-series plots get a proper date axis
stock_data = pd.read_csv(data_path, parse_dates=['Date'])

# Display the first few rows of the dataset
stock_data.head()
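Beyond `head()`, a few structural facts are worth recording before modeling: row and column counts, column dtypes, the date range covered, and whether rows are in chronological order. The following is a minimal sketch on a made-up three-row frame; the helper name `summarize_structure`, the sample values, and the column names are assumptions for illustration, and the `Date` column is assumed to hold datetimes.

```python
import pandas as pd

# Made-up sample standing in for the real CSV (values are illustrative only).
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "Open": [100.0, 101.0, 102.0],
    "Close": [101.0, 102.0, 101.5],
    "Volume": [1000, 1200, 1100],
})

def summarize_structure(df: pd.DataFrame, date_col: str = "Date") -> dict:
    """Collect basic structural facts about a price DataFrame (hypothetical helper)."""
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "date_min": df[date_col].min(),
        "date_max": df[date_col].max(),
        # True when dates never decrease from one row to the next
        "is_sorted": bool(df[date_col].is_monotonic_increasing),
    }

summary = summarize_structure(sample)
print(summary)
```

On the real data, the same call would be `summarize_structure(stock_data)`, provided the `Date` column has been converted to datetimes first.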

In [3]:
# Check for missing values
missing_values = stock_data.isnull().sum()
missing_values[missing_values > 0]
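If the filter above returns an empty Series, no column has missing values. When gaps do exist, it can help to see their share per column and the affected rows. Here is a small sketch on made-up data; the frame and its values are assumptions, not taken from the Microsoft dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing closing price (illustrative only).
sample = pd.DataFrame({
    "Close": [100.0, np.nan, 101.0, 102.0],
    "Volume": [10, 12, 11, 13],
})

# Count and fraction of missing values per column
report = pd.DataFrame({
    "count": sample.isnull().sum(),
    "share": sample.isnull().mean(),
})

# Rows that contain at least one missing value
rows_with_gaps = sample[sample.isnull().any(axis=1)]

print(report)
print(rows_with_gaps)
```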

In [4]:
# Visualize the closing price over time
plt.figure(figsize=(14, 7))
plt.plot(stock_data['Date'], stock_data['Close'], label='Closing Price', color='blue')
plt.title('Microsoft Stock Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')
plt.xticks(rotation=45)
plt.legend()
plt.show()

In [5]:
# Visualize the distribution of daily returns
stock_data['Daily Return'] = stock_data['Close'].pct_change()

plt.figure(figsize=(14, 7))
sns.histplot(stock_data['Daily Return'].dropna(), bins=50, kde=True)
plt.title('Distribution of Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Frequency')
plt.show()
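For reference, `pct_change()` computes the simple return $r_t = P_t / P_{t-1} - 1$, so the first row is always `NaN` because it has no previous price. The sketch below checks this against a manual calculation on made-up prices; the values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up closing prices (illustrative only).
close = pd.Series([100.0, 102.0, 99.96], name="Close")

# Simple returns via pandas
pct = close.pct_change()

# The same quantity written out: P_t / P_{t-1} - 1
manual = close / close.shift(1) - 1

# Compare the two, ignoring the leading NaN
matches = np.allclose(pct.dropna(), manual.dropna())
print(pct)
print("pct_change matches manual formula:", matches)
```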

In [6]:
# Correlation heatmap of features
plt.figure(figsize=(10, 8))
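# Restrict to numeric columns; pandas >= 2.0 raises an error on non-numeric columns such as 'Date'
correlation_matrix = stock_data.corr(numeric_only=True)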
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap')
plt.show()

## Conclusion

In this notebook, we explored the Microsoft stock data: we plotted the closing price over time, checked for missing values, examined the distribution of daily returns, and reviewed the correlations between the numeric columns. These observations inform the data preprocessing and model training steps that follow.