# What is Time Series Data?
Time series data is a collection of observations recorded at successive points in time. The observations are ordered chronologically, which makes it possible to analyze trends, seasonality, and other patterns that evolve over time. Examples include stock prices, weather data, and website traffic.

In [None]:
#import library
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
#load the data (the 'Date' column is converted and set as the index in the next step)
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/bitcoin_price.csv')
df.head()

# Time series index
To use pandas' time series features, we need to convert the Date column to a datetime format and then set it as the index of the DataFrame.
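As a small illustration of why the datetime index matters, the sketch below uses a made-up five-day price series (the values and the `Price` column name are assumptions for the example, not the Bitcoin data). Once the index is a `DatetimeIndex`, rows can be selected with date strings.

```python
import pandas as pd

# Hypothetical toy data standing in for a CSV with a 'Date' column
toy = pd.DataFrame({
    'Date': ['2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02', '2023-02-03'],
    'Price': [100.0, 102.0, 101.0, 105.0, 107.0],
})

# Parse the strings into datetimes and use them as the index
toy['Date'] = pd.to_datetime(toy['Date'], format='%Y-%m-%d')
toy = toy.set_index('Date')

# Partial string indexing: every row in February 2023
feb = toy.loc['2023-02']

# Slicing by date range (both endpoints are included)
window = toy.loc['2023-01-31':'2023-02-01']

print(feb)
print(window)
```

The same kind of selection is what makes expressions such as `.loc['2023']` work on the Bitcoin DataFrame later in the notebook.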

In [None]:
#convert 'date' to a datetime and set as index
df['Date'] = pd.to_datetime(df['Date'], format="%Y-%m-%d")
df.set_index('Date', inplace=True)
df.head()

In [None]:
#alternative: parse the dates and set the index while loading the data
df1 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/bitcoin_price.csv', index_col='Date', parse_dates=True)
df1.index

# Data Resampling
Resampling changes the frequency of a time series:

* Upsample: Increase the frequency (e.g., from daily to hourly). This often involves creating new data points through interpolation.

* Downsample: Decrease the frequency (e.g., from daily to monthly or yearly). This typically involves aggregating data over the new, lower-frequency intervals (e.g., calculating the mean, sum, or other statistics for each month).

Resampling is a crucial step in time series analysis because it lets you analyze data at different granularities, align time series with different frequencies, and prepare data for various modeling techniques. This notebook uses `df.resample('ME').mean()` to downsample the daily Bitcoin price data to a monthly frequency and calculate the mean for each month.
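To make the two directions concrete, here is a minimal sketch on a synthetic daily series (the values 1 to 60 and the variable names are assumptions chosen for illustration). It downsamples to weekly means and upsamples to 12-hourly values filled by linear interpolation.

```python
import pandas as pd

# Hypothetical daily series: 1, 2, ..., 60 starting on 2023-01-01 (a Sunday)
idx = pd.date_range('2023-01-01', periods=60, freq='D')
daily = pd.Series(range(1, 61), index=idx, dtype='float64')

# Downsample: daily -> weekly (weeks ending on Sunday), aggregated with the mean
weekly_mean = daily.resample('W').mean()

# Upsample: daily -> 12-hourly; the newly created timestamps start out as NaN
half_daily = daily.resample('12h').asfreq()

# Fill the gaps by linear interpolation between neighbouring observations
half_daily_interp = half_daily.interpolate(method='linear')

print(weekly_mean.head())
print(half_daily_interp.head())
```

Interpolation is only one way to fill upsampled gaps; forward filling with `.ffill()` is another common choice when the last known value should simply be carried forward.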

In [None]:
#resample to monthly frequency and calculate the mean of every column
df.resample('ME').mean()

# Exploring data

In [None]:
#7-day rolling average of the closing price, plotted for 2023
df['7_day_rolling'] = df['Close'].rolling(window=7).mean()
df[['Close', '7_day_rolling']].loc['2023'].plot()
plt.show()

In [None]:
#find the month with the highest average closing price
df['Close'].resample('ME').mean().idxmax()

In [None]:
#calculate the daily return as a percentage change in closing price
df['daily_returns'] = df['Close'].pct_change() * 100

In [None]:
#days where the closing price rose by more than 10%
df[df['daily_returns'] > 10].head(3)

# Data Visualization

In [None]:
#daily closing price plot
df['Close'].plot(title="Daily Closing Price")
plt.show()


In [None]:
#plot the total yearly volume
df['Volume'].resample('YE').sum().plot(title="Yearly Volume")
plt.show()

In [None]:
#plotting closing price (right axis) and 30-day rolling mean volume (left axis)
df['30_day_rolling_vol'] = df['Volume'].rolling(window=30).mean()
ax = df['30_day_rolling_vol'].plot(legend=True)
ax.set_ylabel('30-day rolling volume')
ax_right = df['Close'].plot(secondary_y=True, legend=True)
ax_right.set_ylabel('Closing price')
plt.show()

In [None]:
#correlation between closing price and 30-day rolling volume
df33 = df[['Close', '30_day_rolling_vol']].corr()

In [None]:
#display the correlation matrix
df33

# Data Manipulation

In [None]:
#missing values
df.isnull().sum()

In [None]:
#extract time variables
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['weekday'] = df.index.day_name()       # e.g. 'Monday'
df['weekday_numeric'] = df.index.weekday  # Monday=0, Sunday=6
df['is_weekend'] = df.index.weekday > 4   # Saturday=5, Sunday=6
df.head()

# Feature Engineering - Lagged Values
Feature engineering is the process of creating new features from existing data to improve the performance of machine learning models. For time series data, this often means extracting information from the time index, such as the year, month, day, or day of the week, as in the previous section. It can also mean creating lagged variables, rolling averages, or other aggregations that capture temporal dependencies and patterns.
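The sketch below shows these three kinds of features on a made-up six-day series (the prices, the `prev_3d_mean` column name, and the assumption that today's close is the value to be predicted are all illustrative, not taken from the Bitcoin data).

```python
import pandas as pd

# Hypothetical closing prices for six consecutive days (2024-01-01 is a Monday)
idx = pd.date_range('2024-01-01', periods=6, freq='D')
toy = pd.DataFrame({'Close': [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]}, index=idx)

# Lag feature: yesterday's close aligned with today's row
toy['Close_lag1'] = toy['Close'].shift(1)

# Rolling feature: mean of the three previous closes, excluding today
# (shifting first keeps today's value out of its own feature)
toy['prev_3d_mean'] = toy['Close'].shift(1).rolling(window=3).mean()

# Calendar feature taken from the index (Monday=0, Sunday=6)
toy['weekday'] = toy.index.weekday

# Rows without a full history contain NaN; drop them before modelling
features = toy.dropna()
print(features)
```

Shifting and rolling both leave `NaN` in the first rows, so the usable sample is slightly shorter than the raw series.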

In [None]:
#lagged closing prices: the close from 1 and 2 rows (days) earlier
df['Close_lag1'] = df['Close'].shift(1)
df['Close_lag2'] = df['Close'].shift(2)

# Seasonal Decomposition
Seasonal decomposition is a technique used in time series analysis to break down a time series into its underlying components: trend, seasonality, and residual (or remainder).

* ***Trend***: The long-term movement in the data.

* ***Seasonality***: The repeating patterns or cycles within a fixed period (e.g., daily, monthly, yearly).

* ***Residual***: The irregular or random fluctuations in the data that are not explained by the trend or seasonality.

By decomposing a time series, we can better understand the individual components that influence the data and use this information for forecasting or further analysis.

## Types of Seasonality
There are two main types of seasonal decomposition:

* ***Additive Decomposition:*** This model is used when the amplitude of the seasonal fluctuations is roughly constant over time. The formula is: `Y(t) = T(t) + S(t) + R(t)`

  Where:
  * `Y(t)` is the observed value at time t
  * `T(t)` is the trend component at time t
  * `S(t)` is the seasonal component at time t
  * `R(t)` is the residual component at time t

* ***Multiplicative Decomposition:*** This model is used when the amplitude of the seasonal fluctuations changes in proportion to the level of the series (e.g., the fluctuations get larger as the overall value of the time series increases). The formula is: `Y(t) = T(t) * S(t) * R(t)`
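The difference between the two models can be seen on synthetic data. The sketch below builds one additive and one multiplicative series from the same trend and a 12-month sine wave (all numbers are assumptions for illustration, and the residual is left out for clarity), then measures the size of the seasonal swing in each year.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly components over four years (48 points)
t = np.arange(48)
idx = pd.date_range('2020-01-01', periods=48, freq='MS')
trend = 100.0 + 2.0 * t              # T(t): steady upward trend
wave = np.sin(2 * np.pi * t / 12)    # pattern that repeats every 12 months

# Additive: Y(t) = T(t) + S(t), so the seasonal swing has a fixed size
additive = pd.Series(trend + 10.0 * wave, index=idx)

# Multiplicative: Y(t) = T(t) * S(t), so the swing scales with the level
multiplicative = pd.Series(trend * (1.0 + 0.1 * wave), index=idx)

def swing(s):
    """Range (max minus min) of a detrended series."""
    return s.max() - s.min()

# Seasonal swing per calendar year after removing the known trend
add_swing = (additive - trend).groupby(idx.year).agg(swing)
mul_swing = (multiplicative - trend).groupby(idx.year).agg(swing)
print(add_swing)   # roughly constant from year to year
print(mul_swing)   # grows as the trend rises

# Taking logs turns the multiplicative model into an additive one:
# log Y(t) = log T(t) + log S(t)
log_mult = np.log(multiplicative)
```

In practice the trend is not known in advance, so plotting the series and checking whether the swings widen as the level rises is a common first check before choosing `model='additive'` or `model='multiplicative'`.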

# Seasonality
Seasonality refers to the repeating patterns or cycles that occur within a fixed period. These patterns are predictable and tend to repeat themselves at regular intervals.

For example, in the Bitcoin price data, you might observe:

* Daily seasonality: Higher trading volume during certain hours of the day.
* Weekly seasonality: Different price movements on weekends compared to weekdays.
* Yearly seasonality: Price trends that tend to repeat around certain times of the year (e.g., holiday seasons).

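The `period` argument of `seasonal_decompose` is the number of observations in one seasonal cycle. Typical values:

* 12 for monthly data
* 24 for hourly data
* 7 or 365 for daily data, but 7 is preferred
* 52 for weekly data
* 4 for quarterly data
* 5 for weekday (business-day) data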
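To show what the period does, here is a hand-rolled sketch of a classical additive decomposition with plain pandas. It is not how `seasonal_decompose` is implemented; the synthetic series, the weekly pattern, and all variable names are assumptions for illustration. It sticks to an odd period (7) so that a plain centred rolling window is symmetric.

```python
import numpy as np
import pandas as pd

period = 7  # daily observations with a weekly cycle

# Hypothetical daily series: linear trend plus a fixed weekly pattern
n = 10 * period
idx = pd.date_range('2024-01-01', periods=n, freq='D')
pattern = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -3.0, 2.0])  # sums to zero
y = pd.Series(50.0 + 0.5 * np.arange(n) + np.tile(pattern, 10), index=idx)

# Trend: centred moving average over exactly one full cycle,
# which averages the seasonal pattern away
trend = y.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position within the cycle
detrended = y - trend
position = np.arange(n) % period
seasonal_by_position = detrended.groupby(position).mean()

print(seasonal_by_position.round(2))
```

If the window did not cover a whole number of cycles, part of the seasonal pattern would leak into the trend estimate, which is why the period should match the actual cycle length of the data.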

In [None]:
from statsmodels.graphics.tsaplots import month_plot, quarter_plot
from statsmodels.tsa.seasonal import seasonal_decompose

In [None]:
#plotting the monthly seasonality
month_plot(df['Close'].resample('ME').mean(), ylabel='Closing Price')
plt.show()

In [None]:
#quarter plot
quarter_plot(df['Close'].resample('QE').mean())
plt.show()

In [None]:
#load new data
df_choco = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/choco_monthly_revenue.csv')
df_choco.head()

In [None]:
df_choco['Month with Year'] = pd.to_datetime(df_choco['Month with Year'])
df_choco.set_index('Month with Year', inplace=True)
month_plot(df_choco['revenue'], ylabel='Revenue')
plt.show()

In [None]:
#seasonal decomposition plots for the chocolate revenue data (monthly, so period=12)
decomposition = seasonal_decompose(df_choco['revenue'], model='multiplicative', period=12)
fig = decomposition.plot()
fig.set_size_inches(11.5, 10.5)
plt.show()