### Time series analysis
Time series analysis is a powerful tool in data science for examining data points ordered in time. In Python, it is typically done with `pandas` and `numpy` for data manipulation, `matplotlib` for visualization, and `statsmodels` for statistical modeling (with `scikit-learn` for error metrics). Here’s a basic outline of how to perform time series analysis in Python.

#### Step 1: Import Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
```

#### Step 2: Load and Inspect Data

Load your time series data, typically a file with a date/time column and a corresponding value column. The examples below assume these columns are named `Date` and `Value`.

```python
# Example: Loading data with a DateTime index
data = pd.read_csv('your_time_series_data.csv', parse_dates=['Date'], index_col='Date')
print(data.head())
```
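If you want to run the steps without your own file, the sketch below builds a hypothetical stand-in: ten years of synthetic monthly values in a `Value` column (the numbers are made up). It also shows one detail worth handling right after loading: `read_csv` does not record a sampling frequency on the index. `seasonal_decompose` raises an error when it has neither a frequency nor an explicit `period`, and statsmodels' ARIMA warns and tries to infer the frequency when it is missing. Calling `asfreq` declares it; `'MS'` (month start) is an assumption about your data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for 'your_time_series_data.csv': 10 years of monthly values
rng = np.random.default_rng(42)
dates = pd.date_range('2010-01-01', periods=120, freq='MS')
t = np.arange(120)
values = 50 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)
csv_text = pd.DataFrame({'Date': dates, 'Value': values}).to_csv(index=False)

# Load it the same way as above
data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
print(data.index.freq)  # None: read_csv does not infer a frequency

# Declare the monthly (month-start) frequency explicitly
data = data.asfreq('MS')
print(data.index.freq)
```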

#### Step 3: Visualize the Data

Plotting the time series can help you understand trends, seasonality, and irregularities.

```python
plt.figure(figsize=(12,6))
plt.plot(data, label="Time Series Data")
plt.title("Time Series Data")
plt.xlabel("Date")
plt.ylabel("Values")
plt.legend()
plt.show()
```

#### Step 4: Check for Stationarity

Stationarity is a key concept in time series analysis. You can use the Augmented Dickey-Fuller (ADF) test to check for stationarity.

```python
result = adfuller(data['Value'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')
```

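If the p-value is below 0.05, you can reject the ADF test's null hypothesis that the series has a unit root and treat the series as stationary. A larger p-value means non-stationarity cannot be ruled out.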

##### If Not Stationary, Differencing Can Help

```python
data_diff = data.diff().dropna()
plt.plot(data_diff, label="Differenced Time Series")
plt.legend()
plt.show()
```
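Why differencing helps: each application of `diff()` removes one degree of polynomial trend, and the number of differences needed is what the `d` in ARIMA counts. A worked example on a deterministic quadratic series, chosen only for illustration:

```python
import pandas as pd

# y_t = t^2 for t = 1..6: a quadratic trend, clearly non-stationary
y = pd.Series([1, 4, 9, 16, 25, 36])

first_diff = y.diff().dropna()          # y_t - y_{t-1} = 2t - 1: still trending
second_diff = y.diff().diff().dropna()  # constant: the trend is gone

print(first_diff.tolist())   # [3.0, 5.0, 7.0, 9.0, 11.0]
print(second_diff.tolist())  # [2.0, 2.0, 2.0, 2.0]
```

For a repeating seasonal pattern, a seasonal difference such as `y.diff(12)` for monthly data plays the same role for the seasonal order `D`.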

#### Step 5: Decompose the Time Series

Seasonal decomposition can separate the time series into trend, seasonal, and residual components.

```python
decomposition = sm.tsa.seasonal_decompose(data['Value'], model='additive', period=12)  # period = season length, e.g. 12 for monthly data
decomposition.plot()
plt.show()
```

#### Step 6: Build a Model (e.g., ARIMA)

An ARIMA model is often used for time series forecasting. Its `order=(p, d, q)` sets `p`, the number of autoregressive (lagged value) terms; `d`, the number of times the series is differenced; and `q`, the number of moving-average (lagged forecast error) terms.

```python
# Fit an ARIMA model
model = ARIMA(data['Value'], order=(1, 1, 1))
model_fit = model.fit()
print(model_fit.summary())
```

#### Step 7: Forecasting

Use the model to forecast future values.

```python
forecast = model_fit.forecast(steps=10)  # Forecast 10 steps ahead
plt.plot(data, label="Original Data")
plt.plot(forecast, label="Forecast", color='red')
plt.legend()
plt.show()
```

#### Step 8: Evaluate Model Performance

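Evaluate the model on data it has not seen: hold out the most recent observations, refit on the rest, and compute error metrics such as Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) between the forecasts and the held-out values.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hold out the last 10 points for testing and refit on the remainder
train, test = data['Value'].iloc[:-10], data['Value'].iloc[-10:]
eval_fit = ARIMA(train, order=(1, 1, 1)).fit()

# Forecast exactly the held-out period and compare
predictions = eval_fit.forecast(steps=len(test))
mae = mean_absolute_error(test, predictions)
rmse = np.sqrt(mean_squared_error(test, predictions))
print(f'MAE: {mae}, RMSE: {rmse}')
```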
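Both metrics are simple averages of the forecast errors, which makes them easy to check by hand. RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does. A worked example on three made-up points:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])
errors = y_true - y_pred  # [1, 0, -2]

mae = np.mean(np.abs(errors))       # (1 + 0 + 2) / 3 = 1.0
rmse = np.sqrt(np.mean(errors**2))  # sqrt((1 + 0 + 4) / 3), about 1.291
print(mae, rmse)
```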

#### Full Workflow Summary

1. Import libraries and load data.
2. Visualize data and check for stationarity.
3. Perform differencing if necessary.
4. Decompose the series to understand trend and seasonality.
5. Build an ARIMA (or SARIMA, if seasonal) model.
6. Forecast future values.
7. Evaluate the model with metrics.

This should give you a strong starting point for time series analysis in Python!

### Time series analysis with covariates

Time series analysis with covariates, often referred to as *time series regression* or *multivariate time series analysis*, is useful when you want to predict a target time series based on both its past values and additional features or covariates. Here’s a step-by-step guide on performing time series analysis with covariates in Python.

#### 1. **Setting Up the Environment**

Start by installing the necessary packages (the `!` prefix runs a shell command from a Jupyter notebook):

```python
!pip install pandas numpy statsmodels matplotlib scikit-learn
```

Import the libraries:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_squared_error
```

#### 2. **Loading and Understanding the Data**

Suppose you have a time series dataset containing a target variable, such as sales, and covariates, such as promotions and economic indicators. Here’s an example of creating a synthetic dataset:

```python
# Generate synthetic data
np.random.seed(42)
date_range = pd.date_range(start='2020-01-01', end='2022-01-01', freq='W')
n = len(date_range)

# Target variable (e.g., sales) with a trend and seasonality
sales = 100 + 0.5 * np.arange(n) + 10 * np.sin(2 * np.pi * np.arange(n) / 52) + np.random.normal(0, 5, n)

# Covariates
promotion = np.random.choice([0, 1], size=n)  # Binary variable indicating promotions
economy_index = np.random.normal(1.5, 0.1, n)  # Continuous economic index

# Create DataFrame
data = pd.DataFrame({
    'Date': date_range,
    'Sales': sales,
    'Promotion': promotion,
    'EconomyIndex': economy_index
})
data.set_index('Date', inplace=True)
```

#### 3. **Exploratory Data Analysis (EDA)**

Plot the target variable (`Sales`) and covariates (`Promotion` and `EconomyIndex`) over time.

```python
# Plotting Sales and covariates
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.plot(data['Sales'], label='Sales')
plt.title('Sales Over Time')
plt.legend()

plt.subplot(3, 1, 2)
plt.plot(data['Promotion'], label='Promotion', color='orange')
plt.title('Promotion Over Time')
plt.legend()

plt.subplot(3, 1, 3)
plt.plot(data['EconomyIndex'], label='Economy Index', color='green')
plt.title('Economy Index Over Time')
plt.legend()

plt.tight_layout()
plt.show()
```

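#### 4. **Stationarity Check and Differencing**

ARIMA-family models assume the series is stationary once it has been differenced `d` times (and seasonally differenced `D` times). Use the Augmented Dickey-Fuller test to check whether differencing is needed.

```python
result = adfuller(data['Sales'])
print(f"ADF Statistic: {result[0]}")
print(f"p-value: {result[1]}")

# If p-value > 0.05, difference the series and re-test
if result[1] > 0.05:
    sales_diff = data['Sales'].diff().dropna()
    print(f"p-value after differencing: {adfuller(sales_diff)[1]}")
```

If the p-value is above 0.05, the test cannot reject its null hypothesis of a unit root, so treat the series as non-stationary. You don't need to pass a differenced series to `SARIMAX`: setting `d=1` (and `D=1` for seasonal differencing), as in the next step, makes the model difference the data internally.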

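#### 5. **Model Selection: SARIMAX with Covariates**

The `SARIMAX` model in `statsmodels` can handle both seasonal ARIMA modeling and covariates (passed through the `exog` argument).

- `p`, `d`, `q` are the non-seasonal AR order, degree of differencing, and MA order.
- `P`, `D`, `Q`, `s` are the seasonal AR order, seasonal differencing, seasonal MA order, and season length (number of periods per season).

The data are weekly and the synthetic sales follow a yearly cycle, so the season length is `s = 52`. With only two years of history, a 52-week seasonal model has very few full seasons to learn from, so treat these settings as illustrative.

```python
# Set parameters for SARIMAX with yearly seasonality (52 weekly observations)
p, d, q = 1, 1, 1  # Adjust these based on model tuning
P, D, Q, s = 1, 1, 1, 52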

# Fit the SARIMAX model with covariates
model = SARIMAX(
    data['Sales'], 
    order=(p, d, q), 
    seasonal_order=(P, D, Q, s), 
    exog=data[['Promotion', 'EconomyIndex']]
)
sarimax_model = model.fit(disp=False)

# Model summary
print(sarimax_model.summary())
```

#### 6. **Model Diagnostics**

Plot diagnostics to check residuals and see if the model fits well.

```python
sarimax_model.plot_diagnostics(figsize=(12, 8))
plt.show()
```

#### 7. **Forecasting with Covariates**

Suppose you want to forecast the next 10 weeks, assuming you have predictions for the covariates (Promotion and EconomyIndex).

```python
# Generate sample covariate data for forecast period
future_dates = pd.date_range(start='2022-01-02', periods=10, freq='W')
future_promotion = np.random.choice([0, 1], size=10)
future_economy_index = np.random.normal(1.5, 0.1, 10)

# Create DataFrame for future covariates
future_data = pd.DataFrame({
    'Promotion': future_promotion,
    'EconomyIndex': future_economy_index
}, index=future_dates)

# Forecast
forecast = sarimax_model.get_forecast(steps=10, exog=future_data)
forecast_ci = forecast.conf_int()
forecast_values = forecast.predicted_mean

# Plot forecast
plt.figure(figsize=(10, 6))
plt.plot(data['Sales'], label='Observed')
plt.plot(forecast_values, label='Forecast', color='red')
plt.fill_between(forecast_ci.index, 
                 forecast_ci.iloc[:, 0], 
                 forecast_ci.iloc[:, 1], color='red', alpha=0.3)
plt.title('Sales Forecast with Covariates')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.legend()
plt.show()
```
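The forecast period above hard-codes `start='2022-01-02'`. A sketch that derives the future weekly dates from the last observed date instead, so the covariate index stays aligned if the data range changes: start the range at the last date and drop that first element.

```python
import pandas as pd

# Reproduce the weekly index used in this section
history_index = pd.date_range(start='2020-01-01', end='2022-01-01', freq='W')
last_date = history_index[-1]

# Start at the last observed date and drop it, so the 10 future dates follow on directly
future_dates = pd.date_range(start=last_date, periods=11, freq='W')[1:]
print(last_date.date(), future_dates[0].date())
```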

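#### 8. **Model Evaluation**

To evaluate the model, hold out the most recent observations, refit on the rest, and compare the forecasts with the held-out values using error metrics such as Mean Squared Error (MSE) or Mean Absolute Percentage Error (MAPE). Comparing a forecast of future weeks against the last 10 observed weeks is not meaningful, because the two cover different dates.

```python
# Hold out the last 10 weeks as a test set and refit on the rest
exog_cols = ['Promotion', 'EconomyIndex']
train, test = data.iloc[:-10], data.iloc[-10:]
eval_fit = SARIMAX(train['Sales'], exog=train[exog_cols], order=(p, d, q),
                   seasonal_order=(P, D, Q, s)).fit(disp=False)

# Forecast the held-out weeks using their known covariate values
test_pred = eval_fit.get_forecast(steps=len(test), exog=test[exog_cols]).predicted_mean
mse = mean_squared_error(test['Sales'], test_pred)
print(f"Mean Squared Error: {mse}")
```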
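MAPE expresses the average error as a percentage of the actual values, which makes it comparable across series of different scale; it is undefined when an actual value is zero. A minimal sketch (the helper name `mape` is our own). scikit-learn also offers `mean_absolute_percentage_error`, which returns a fraction rather than a percentage.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100


# Errors of 10%, 10%, and 0% average to about 6.67%
print(mape([100, 200, 50], [110, 180, 50]))
```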

#### Summary of Steps

1. **Load and prepare the data**.
2. **Plot and examine the time series and covariates**.
3. **Check for stationarity**.
4. **Build a SARIMAX model with covariates**.
5. **Interpret model diagnostics**.
6. **Make forecasts using the model**.
7. **Evaluate model performance**.

This is a basic introduction to using SARIMAX for time series forecasting with covariates in Python. For improved results, consider tuning parameters, testing alternative models, or using more advanced libraries like Facebook's `Prophet` if you have daily data with more complex seasonality.

### Time series with Prophet 

Prophet, developed by Facebook, is a popular open-source library for time series forecasting. It fits an additive model of trend, seasonality, and holiday effects, and is designed for series with strong seasonal patterns and multiple seasonal periods (for example, weekly and yearly seasonality in daily data). Here’s a tutorial on how to perform time series forecasting using Prophet in Python.

#### 1. **Installing Prophet**

First, install the `prophet` package. 

```python
!pip install prophet
```

#### 2. **Setting Up the Environment**

Import the required libraries:

```python
import numpy as np
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt
```

#### 3. **Loading and Preparing the Data**

Prophet expects a DataFrame with two columns:
- `ds`: Date column in a format Prophet can recognize.
- `y`: Target variable (e.g., sales, stock prices).

Let’s create a synthetic dataset to illustrate.

```python
# Create a date range and generate sample data
date_range = pd.date_range(start='2020-01-01', end='2022-01-01', freq='D')
n = len(date_range)

# Simulated sales data with a trend and seasonality
sales = 100 + 0.5 * np.arange(n) + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + np.random.normal(0, 5, n)

# Create DataFrame for Prophet
data = pd.DataFrame({'ds': date_range, 'y': sales})
```
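Data that is already indexed by date, like the DataFrame in the SARIMAX section, needs reshaping before Prophet can use it: move the index into a column and rename the columns to `ds` and `y`. A sketch with a few hypothetical rows; any remaining columns (here `Promotion`) can later be passed to Prophet as extra regressors.

```python
import pandas as pd

# Hypothetical rows shaped like the SARIMAX example: DatetimeIndex named 'Date', target 'Sales'
idx = pd.date_range('2020-01-05', periods=5, freq='W', name='Date')
sales_df = pd.DataFrame(
    {'Sales': [100.0, 102.5, 101.0, 104.2, 106.0], 'Promotion': [0, 1, 0, 1, 1]},
    index=idx,
)

# Move the index into a column, then rename to Prophet's required column names
prophet_df = sales_df.reset_index().rename(columns={'Date': 'ds', 'Sales': 'y'})
print(prophet_df.columns.tolist())
```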

#### 4. **Modeling with Prophet**

Initialize the Prophet model and fit it to the data:

```python
# Initialize the model with default parameters
model = Prophet()

# Fit the model
model.fit(data)
```

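#### 5. **Making a Forecast**

To forecast future values, create a DataFrame of dates with `make_future_dataframe`. By default it contains the historical dates followed by `periods` new dates, so `predict` returns fitted values for the history as well as the forecast.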

```python
# Define the forecast horizon (e.g., 90 days into the future)
future = model.make_future_dataframe(periods=90)

# Make the forecast
forecast = model.predict(future)
```

#### 6. **Visualizing the Forecast**

You can visualize the forecast using Prophet’s built-in plotting function.

```python
# Plot the forecast
fig = model.plot(forecast)
plt.title("Prophet Forecast")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.show()
```

#### 7. **Plotting Forecast Components**

Prophet provides a breakdown of the forecast into its components: trend, weekly seasonality, and yearly seasonality (if applicable). This helps you see how much each component contributes to the forecast.

```python
# Plot the forecast components
fig2 = model.plot_components(forecast)
plt.show()
```
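The components plot mirrors the structure of the model Prophet fits. As described in Prophet's documentation, the forecast is a sum of a trend $g(t)$, seasonal terms $s(t)$, holiday effects $h(t)$, and noise $\varepsilon_t$, with each seasonality represented by a Fourier series of period $P$ (e.g., $P = 7$ days for weekly and $P = 365.25$ days for yearly seasonality). This is a summary, not the full specification:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

$$
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
$$

Regressors added with `add_regressor` enter as additional linear terms in this sum under the default additive mode.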

#### 8. **Adding Covariates (External Regressors)**

Prophet allows you to add additional regressors, such as promotions or economic indicators, as covariates. Let’s add a `promotion` variable as an example.

```python
# Generate a synthetic promotion feature
data['promotion'] = np.random.choice([0, 1], size=n)

# Add the covariate to the model
model = Prophet()
model.add_regressor('promotion')

# Fit the model with the new covariate
model.fit(data)

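# Forecast with future covariate data: keep the observed promotions for the
# historical dates and use hypothetical random promotions for the 90 new days
future = model.make_future_dataframe(periods=90)
future['promotion'] = np.concatenate([data['promotion'].to_numpy(), np.random.choice([0, 1], size=90)])

forecast = model.predict(future)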
```

#### 9. **Evaluating the Model**

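To evaluate Prophet, hold out the most recent observations as a test set (here, the last 90 days), fit a model on the remaining data, and compare its predictions with the held-out values using metrics such as Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE).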

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Split data for train/test evaluation
train_data = data.iloc[:-90]
test_data = data.iloc[-90:]

# A Prophet object can only be fit once, so build a fresh model for the training set
eval_model = Prophet()
eval_model.add_regressor('promotion')
eval_model.fit(train_data)

# Predict for the held-out test period
test_forecast = eval_model.predict(test_data[['ds', 'promotion']])

# Calculate error metrics
mae = mean_absolute_error(test_data['y'], test_forecast['yhat'])
rmse = np.sqrt(mean_squared_error(test_data['y'], test_forecast['yhat']))

print(f"MAE: {mae}")
print(f"RMSE: {rmse}")
```

#### Summary

1. **Load and prepare data** in a format Prophet can use.
2. **Initialize and fit the Prophet model**.
3. **Forecast future values**.
4. **Visualize the forecast and components**.
5. **Add covariates** to improve the model (optional).
6. **Evaluate the model** with error metrics.

This tutorial provides a foundation for using Prophet in Python for time series forecasting, including using covariates to enhance predictions.
