# Exploratory Data Analysis

In this notebook, we perform exploratory data analysis (EDA) on a time series such as stock prices or sales figures. EDA exposes the structure of the data (trend, seasonality, and anomalies) and guides the modeling that follows.

## Steps to Follow:
1. Load the cleaned data.
2. Visualize the time series data.
3. Decompose the time series to analyze trend and seasonality.
4. Identify anomalies from the residuals.

## Import Libraries
Let's start by importing the necessary libraries.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose

# Set visualization style (set_theme is the current name; sns.set is an older alias)
sns.set_theme(style='whitegrid')

## Load the Data
Next, we will load the cleaned data from the processed directory.

In [None]:
data_path = '../data/processed/cleaned_data.csv'
data = pd.read_csv(data_path, parse_dates=['date'], index_col='date')
data.head()
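Before plotting, it can help to confirm that the index was parsed as dates, is sorted, and has no duplicates or gaps. The sketch below is one way to do that; `summarize_time_index` is a hypothetical helper, and the small `example` frame with its `value` column is synthetic stand-in data, not the contents of `cleaned_data.csv`.

```python
import numpy as np
import pandas as pd


def summarize_time_index(df: pd.DataFrame) -> dict:
    """Collect basic sanity checks for a DataFrame indexed by date."""
    index = df.index
    return {
        "is_datetime": isinstance(index, pd.DatetimeIndex),
        "is_sorted": bool(index.is_monotonic_increasing),
        "duplicate_dates": int(index.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
        "start": index.min(),
        "end": index.max(),
    }


# Synthetic stand-in for the cleaned data (column name 'value' is an assumption)
example = pd.DataFrame(
    {"value": [10.0, 11.0, np.nan, 12.5, 13.0]},
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03", "2024-01-05"]
    ),
)
example.index.name = "date"

summary = summarize_time_index(example)
print(summary)
```

In the notebook itself, the same helper could be called as `summarize_time_index(data)`.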

## Visualize the Time Series Data
Let's visualize the time series data to get an initial understanding of its behavior.

In [None]:
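# DataFrame.plot draws one line per column and labels each with its column name,
# so the legend stays correct even if the file holds more than one series.
ax = data.plot(figsize=(14, 7))
ax.set_title('Stock Price Over Time')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
plt.show()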
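A raw line plot can hide the overall direction behind short-term noise. Rolling statistics are a common companion to it; the following is a minimal sketch on synthetic data, where the series `prices` and the 7-observation `window` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the real data
dates = pd.date_range("2024-01-01", periods=60, freq="D")
prices = pd.Series(100 + np.arange(60, dtype=float), index=dates, name="value")

window = 7  # assumed window length; pick one that matches the data's cycle

# The first window - 1 entries are NaN because the window is not yet full
rolling_mean = prices.rolling(window=window).mean()
rolling_std = prices.rolling(window=window).std()

print(rolling_mean.tail(3))
print(rolling_std.tail(3))
```

Plotting `rolling_mean` on top of the original series shows the smoothed level, while `rolling_std` indicates whether variability changes over time.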

## Analyze Trends and Seasonality
We will analyze trend and seasonality with an additive decomposition, which splits the series into three components such that observed = trend + seasonal + residual.

In [None]:
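# seasonal_decompose needs the seasonal period. It is inferred from the index
# frequency when one is set, but an index parsed from CSV has none, so pass it.
# period=7 is an assumption (weekly cycle in daily data); change it to fit your data.
decomposition = seasonal_decompose(data, model='additive', period=7)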
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

plt.figure(figsize=(14, 10))
plt.subplot(411)
plt.plot(data, label='Original')
plt.legend(loc='upper left')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='upper left')
plt.subplot(413)
plt.plot(seasonal, label='Seasonal')
plt.legend(loc='upper left')
plt.subplot(414)
plt.plot(residual, label='Residual')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()
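To make the additive model concrete, here is a simplified hand-rolled version of a classical decomposition: a centered moving average for the trend, then per-position averages of the detrended values for the seasonal component. It is a sketch of the idea, not a reimplementation of `seasonal_decompose`; `additive_decompose` is a hypothetical helper that handles only odd periods, and the series `y` is synthetic.

```python
import numpy as np
import pandas as pd


def additive_decompose(series: pd.Series, period: int):
    """Split a series into trend, seasonal, and residual parts (odd period only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend: centered moving average over one full seasonal cycle
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position within the cycle
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()  # center on zero
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)
    # Residual: whatever trend and seasonality do not explain
    resid = series - trend - seasonal
    return trend, seasonal, resid


# Synthetic series: linear trend plus a zero-mean weekly pattern
dates = pd.date_range("2024-01-01", periods=70, freq="D")
pattern = np.array([3.0, 1.0, -2.0, -4.0, -1.0, 1.0, 2.0])
y = pd.Series(50 + 0.5 * np.arange(70) + np.tile(pattern, 10), index=dates)

trend, seasonal, resid = additive_decompose(y, period=7)
print(seasonal.head(7))
```

The trend and residual are NaN for the first and last three observations, because the centered window is incomplete there; the residuals from `seasonal_decompose` have the same kind of edge gaps.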

## Identify Anomalies
The residuals are what remains after trend and seasonality are removed, so points that lie far from zero are candidate anomalies.

In [None]:
plt.figure(figsize=(14, 7))
plt.plot(residual, label='Residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Residuals')
plt.xlabel('Date')
plt.ylabel('Residual')
plt.legend()
plt.show()
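Reading anomalies off a plot is subjective, so a simple numeric rule can complement it. The sketch below flags residuals whose z-score exceeds a threshold; `flag_anomalies` is a hypothetical helper, the threshold of 3 is a common convention rather than something this notebook prescribes, and the residual series is synthetic.

```python
import numpy as np
import pandas as pd


def flag_anomalies(residual: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return the residuals whose z-score magnitude exceeds `threshold`."""
    # mean() and std() skip NaN, so the edge gaps left by decomposition are ignored
    z_scores = (residual - residual.mean()) / residual.std()
    return residual[z_scores.abs() > threshold]


# Synthetic residuals: small alternating noise, one spike, NaN at both edges
dates = pd.date_range("2024-01-01", periods=50, freq="D")
values = np.where(np.arange(50) % 2 == 0, 1.0, -1.0)
values[25] = 20.0      # injected anomaly
values[:3] = np.nan    # mimic the missing edges of a decomposition residual
values[-3:] = np.nan
synthetic_residual = pd.Series(values, index=dates)

anomalies = flag_anomalies(synthetic_residual)
print(anomalies)
```

A large outlier inflates the standard deviation it is measured against, so a median-based rule may be more robust when the data contains several extreme points.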

## Conclusion
In this notebook, we performed exploratory data analysis on the time series. We visualized the data, decomposed it into trend, seasonal, and residual components, and inspected the residuals for anomalies. These findings will inform our feature engineering and modeling steps.