<a href="https://colab.research.google.com/github/comparativechrono/Principles-of-Data-Science/blob/main/Week_6/Section_6_Python_Example__Time_Series_Analysis_in_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Section 6 - Python example: time series analysis in pandas

Time series analysis is crucial across domains such as finance, meteorology, and economics, where understanding trends, seasonality, and cycles can lead to better forecasts and strategic planning. Pandas provides indexing, resampling, and windowing tools built specifically for time series data, which makes it a good environment for this kind of analysis. This section works through a Python example of handling and analysing time series data with Pandas, covering time-based indexing, resampling, rolling windows, missing data, and decomposition.

1. Setting Up the Environment:

To perform time series analysis with Pandas, ensure your Python environment is set up with the necessary libraries. This example uses Pandas, NumPy, and Matplotlib, plus statsmodels for the decomposition step at the end. If any of them are not installed, they can be added using pip:

In [None]:
pip install pandas numpy matplotlib statsmodels

2. Importing Required Libraries:

Begin by importing Pandas, Matplotlib, and NumPy. Pandas and NumPy handle data creation and manipulation, and Matplotlib handles visualization:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

3. Loading and Preparing Time Series Data:

For this example, let's generate a synthetic dataset of daily temperatures for 2021, drawn from a normal distribution with mean 20 and standard deviation 5:

In [None]:
# Fix the random seed so the example produces the same numbers on every run
np.random.seed(42)
# Create a time series of daily temperatures indexed by date
dates = pd.date_range(start='2021-01-01', periods=365, freq='D')
data = pd.DataFrame({'Temperature': 20 + np.random.normal(0, 5, size=365)}, index=dates)
print(data.head())

4. Time-Based Indexing:

Pandas allows for easy slicing based on time-based indexing, which is extremely useful for time series analysis:

In [None]:
# Get data from January 2021
january_data = data.loc['2021-01']
print(january_data)
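
Partial-string indexing also works for ranges, and the index exposes calendar attributes that can be used as filters. The following is a minimal sketch on a rebuilt copy of the synthetic data; the variable names (`temps`, `first_half_march`, `q2`, `weekends`) are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Rebuild a seeded daily series so this block runs on its own
rng = np.random.default_rng(0)
dates = pd.date_range(start='2021-01-01', periods=365, freq='D')
temps = pd.DataFrame({'Temperature': 20 + rng.normal(0, 5, size=365)}, index=dates)

# Slice an explicit date range; with label-based slicing both endpoints are included
first_half_march = temps.loc['2021-03-01':'2021-03-15']

# Slice a whole quarter using partial date strings (April through the end of June)
q2 = temps.loc['2021-04':'2021-06']

# Filter on a calendar attribute of the index: Saturday is 5 and Sunday is 6
weekends = temps[temps.index.dayofweek >= 5]

print(len(first_half_march), len(q2), len(weekends))
```

Unlike positional slicing, label-based slices on a `DatetimeIndex` include the end label, which is why the March slice contains the 15th.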

5. Resampling and Frequency Conversion:

Resampling is used to change the frequency of your time series observations. Two types of resampling are:

Downsampling: Reducing the frequency of the data points (e.g., from days to months).

Upsampling: Increasing the frequency of the data points (e.g., from minutes to seconds).

In [None]:
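# Downsampling from daily to monthly means
# 'MS' labels each month by its first day; the month-end alias 'M' is deprecated
# in pandas 2.2+ in favour of 'ME', while 'MS' works across versions
monthly_mean = data.resample('MS').mean()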
print(monthly_mean)
# Plotting the data
monthly_mean.plot()
plt.title('Monthly Average Temperatures')
plt.xlabel('Month')
plt.ylabel('Temperature')
plt.show()
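
Upsampling goes the other way and creates timestamps for which no observation exists, so the new rows have to be filled somehow. As a small sketch, using a hypothetical two-day series rather than the temperature data:

```python
import pandas as pd

# Hypothetical two-point daily series, used only to illustrate upsampling
daily = pd.Series([10.0, 16.0], index=pd.date_range('2021-01-01', periods=2, freq='D'))

# Upsample to 12-hour steps; the new midday timestamp has no observation, so it is NaN
upsampled = daily.resample('12h').asfreq()

# Two common ways to fill the gaps
filled_forward = daily.resample('12h').ffill()       # repeat the last known value
interpolated = daily.resample('12h').interpolate()   # linear interpolation between known points

print(pd.DataFrame({'asfreq': upsampled, 'ffill': filled_forward, 'interpolate': interpolated}))
```

Which fill is appropriate depends on the data: forward filling suits values that hold until the next reading, while interpolation assumes a smooth change between readings.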

6. Rolling Windows:

Rolling window calculations provide another method to smooth out short-term fluctuations and highlight longer-term trends in data.

In [None]:
# Calculate the 7-day rolling mean (the first 6 values are NaN until the window fills)
data['7-day mean'] = data['Temperature'].rolling(window=7).mean()
# Plotting original and smoothed data
data[['Temperature', '7-day mean']].plot()
plt.title('Daily and 7-Day Mean Temperature')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.show()
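
The `min_periods` and `center` parameters of `rolling` control how the window behaves at the edges and where its result is placed. A short sketch on a toy five-value series (not the temperature data) shows the difference:

```python
import pandas as pd

# Toy series chosen so the window means are easy to check by hand
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('2021-01-01', periods=5, freq='D'))

# Default: trailing window, NaN until 3 observations are available
trailing = s.rolling(window=3).mean()

# min_periods=1: compute a mean from whatever observations exist so far
partial = s.rolling(window=3, min_periods=1).mean()

# center=True: the result is labelled at the middle of the window instead of its end
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({'trailing': trailing, 'partial': partial, 'centered': centered}))
```

A trailing window only uses past values, which matters when the smoothed series feeds a forecast; a centered window aligns better with the raw data in plots but uses future values.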

7. Dealing with Missing Data:

Handling missing values is a common task in time series analysis. Pandas provides several methods to deal with missing data, such as forward filling or backward filling:

In [None]:
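# Introduce missing values for the example (about 10% of rows)
data.loc[data.sample(frac=0.1).index, 'Temperature'] = np.nan
# Forward fill missing values; the back fill covers the case where the first value is missing.
# Assigning the result back avoids the deprecated fillna(method=..., inplace=True) pattern.
data['Temperature'] = data['Temperature'].ffill().bfill()
print(data['Temperature'].isna().sum())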
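
Forward filling is not the only option. The sketch below compares it with a limited fill and with time-based interpolation on a small hypothetical series; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a two-day gap
s = pd.Series([10.0, np.nan, np.nan, 16.0, 18.0],
              index=pd.date_range('2021-01-01', periods=5, freq='D'))

print('missing values:', int(s.isna().sum()))

# Forward fill: carry the last observation across the whole gap
forward = s.ffill()

# Limited forward fill: carry the last observation at most one step
limited = s.ffill(limit=1)

# Time-based interpolation: fill in proportion to the elapsed time between observations
interpolated = s.interpolate(method='time')

print(pd.DataFrame({'ffill': forward, 'ffill(limit=1)': limited, 'interpolate': interpolated}))
```

Setting a `limit` keeps long gaps visible as missing instead of silently filling them with a stale value.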

8. Time Series Decomposition:

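Decomposing time series data helps in understanding underlying patterns such as trend, seasonality, and noise. Because the example data is random noise around a constant mean, the decomposition here only demonstrates the mechanics; the seasonal component it reports does not reflect a real pattern.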

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
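# Decompose the series into trend, seasonal, and residual components.
# period=7 states the weekly cycle length explicitly rather than relying on inference from the index.
result = seasonal_decompose(data['Temperature'], model='additive', period=7)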
result.plot()
plt.show()
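
To see what an additive decomposition does, it can help to build one by hand on data where the answer is known. The sketch below is a simplified classical decomposition written with Pandas only; it is not the statsmodels implementation. The synthetic series, its weekly pattern, and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range('2021-01-01', periods=364, freq='D')  # exactly 52 weeks

# Synthetic series (assumed for illustration): linear trend + weekly pattern + noise
trend_true = np.linspace(15, 25, len(dates))
weekly_true = np.tile([2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 1.0], len(dates) // 7)
series = pd.Series(trend_true + weekly_true + rng.normal(0, 0.5, len(dates)), index=dates)

# 1. Trend: centred 7-day moving average, which averages out the weekly pattern
trend = series.rolling(window=7, center=True).mean()

# 2. Seasonal: mean detrended value per day of week, shifted so the effects sum to zero
detrended = series - trend
seasonal_by_day = detrended.groupby(detrended.index.dayofweek).mean()
seasonal_by_day = seasonal_by_day - seasonal_by_day.mean()
seasonal = pd.Series(seasonal_by_day.loc[series.index.dayofweek].to_numpy(), index=series.index)

# 3. Residual: what remains after removing the trend and seasonal components
residual = series - trend - seasonal

print(seasonal_by_day.round(2))
```

The series starts on a Friday (day of week 4), so the estimated Friday effect should land close to the 2.0 built into the synthetic pattern, and the residual should look like the injected noise.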

This example has illustrated how to perform time series analysis using Pandas: indexing by date, resampling to a different frequency, smoothing with rolling windows, filling missing values, and decomposing a series into its components. These steps are the usual groundwork for forecasting and decision-making. Whether you are dealing with financial markets, weather data, or any other temporal data, mastering Pandas’ time series tools can greatly enhance your analytical capabilities.