## Section 1: Environment Setup and CJA Initialization

We begin by importing the libraries this notebook relies on for querying Customer Journey Analytics, handling dates, and plotting.

### Key Components of the Setup:
- **cjapy Library**: Facilitates interaction with Adobe's Customer Journey Analytics (CJA) API, enabling us to query and retrieve data seamlessly.
- **Plotly**: A versatile graphing library for creating interactive and aesthetically pleasing visualizations.
- **Datetime and JSON**: Standard-library modules for date arithmetic and JSON handling. `datetime` and `timedelta` are used later to convert day-of-year values into calendar dates.

After importing the libraries, we proceed to configure and initialize the CJA object. This step involves loading a configuration file (`python_config.json`), which contains necessary credentials and settings for accessing CJA. Finally, we specify a Data View ID, targeting the specific data set within CJA that we wish to analyze.
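
Before calling `cjapy.importConfigFile`, it can help to confirm that the configuration file exists and parses as JSON. The sketch below uses only the standard library. `load_config` is a hypothetical helper introduced here for illustration; it is not part of `cjapy`, and it makes no assumption about which keys the file must contain.

```python
import json
from pathlib import Path


def load_config(path):
    """Return the parsed JSON config, raising a clear error if it is unusable.

    Hypothetical helper for illustration; not part of cjapy.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    with config_path.open(encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError on malformed JSON
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("Expected the config file to contain a JSON object")
    return config
```

Calling `load_config("python_config.json")` first surfaces a missing or malformed file with a readable message before any API client is created.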

In [None]:
import cjapy
from datetime import datetime, timedelta
import plotly.graph_objs as go
import json

# Load the configuration and initialize the CJA object
cjapy.importConfigFile("python_config.json")
cja = cjapy.CJA()

# Specify the Data View ID for analysis
data_view = "dv_62ba17d5a5d7845496f5fb4d"

## Section 2: Retrieving and Processing CJA Data

In this section, we retrieve a report from Customer Journey Analytics (CJA) using the `cjapy` library. The report is broken down by 'day of the year', so we convert those values into calendar dates for more intuitive analysis and visualization. This involves defining a custom function for the date conversion, specifying the dimension and metric for the report, and setting a date range for data retrieval.

### Steps Involved:
1. **Date Conversion Function**: We'll implement a Python function that converts the 'day of the year' into a comprehensible date format (`YYYY-MM-DD`).
2. **Report Specification**: Selecting the relevant dimension and metric to tailor our report request to our analytical needs.
3. **Data Retrieval**: Utilizing `cjapy` to query CJA and pull the desired report within a specified date range.
4. **Dataframe Preparation**: Post-retrieval, we'll transform the 'day of the year' data into actual dates within our dataframe, sort the data by date, and ensure numeric data types are correctly applied for further analysis.

This approach allows us to work with date-specific data more effectively, laying the groundwork for insightful temporal analyses.
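
The conversion in step 1 depends on the year, because day 60 falls on a different date in a leap year. The sketch below shows one way to derive the year from the report's date range instead of hardcoding it. The names `year_from_date_range` and `day_of_year_to_iso` are hypothetical, and the range string is assumed to have the `start/end` ISO form used in this notebook.

```python
from datetime import datetime


def year_from_date_range(date_range):
    """Extract the start year from a 'start/end' ISO date range string."""
    start = date_range.split("/")[0]
    # Only the leading YYYY-MM-DD part is needed to read the year
    return datetime.strptime(start[:10], "%Y-%m-%d").year


def day_of_year_to_iso(year, day_of_year):
    """Convert a 1-based day of the year to a 'YYYY-MM-DD' string."""
    parsed = datetime.strptime(f"{year}-{int(day_of_year)}", "%Y-%j")
    return parsed.strftime("%Y-%m-%d")


year = year_from_date_range("2024-01-01T00:00:00.000/2024-02-01T00:00:00.000")
print(day_of_year_to_iso(year, 60))   # 2024-02-29 (2024 is a leap year)
print(day_of_year_to_iso(2023, 60))   # 2023-03-01
```

Note that a date range spanning two calendar years would need more care, since the same day-of-year value would then be ambiguous.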

In [None]:
# Function to convert day of year to date
def day_of_year_to_date(year, day_of_year):
    day_of_year = int(day_of_year)  # Convert to integer
    return (datetime(year, 1, 1) + timedelta(day_of_year - 1)).strftime('%Y-%m-%d')

# Pick dimension and metric
dimension = "variables/timepartdayofyear"
metric = "metrics/orders"
# The end bound is midnight, so use Feb 1 to include all of Jan 31
dateRange = "2024-01-01T00:00:00.000/2024-02-01T00:00:00.000"

# Define the report request
myRequest = cjapy.RequestCreator()
myRequest.setDataViewId(data_view)
myRequest.setDimension(dimension)
myRequest.addMetric(metric)
myRequest.addGlobalFilter(dateRange)

# Pull the report from CJA
myReport = cja.getReport(myRequest)

# Convert day of year to date (the year passed below must match dateRange) and sort the dataframe
sorted_df = myReport.dataframe.copy()
sorted_df[dimension] = sorted_df[dimension].apply(lambda x: day_of_year_to_date(2024, x))
sorted_df.sort_values(by=dimension, inplace=True)

# Convert "metrics/orders" column to whole numbers
sorted_df[metric] = sorted_df[metric].astype(int)

# Print the sorted dataframe with dimension and metric
print(sorted_df[[dimension, metric]])

## Section 3: Visualizing Trended Orders with a Line Plot

In this section, we turn our attention to visualizing the trend of orders over time using a line plot. This approach is particularly effective for observing changes and patterns in order volume across a specific timeframe.

### Creating a Line Plot with Plotly
We utilize Plotly to create an interactive line plot, enabling us to trace the fluctuation in order numbers by day. This visualization method allows for an intuitive understanding of trends, peaks, and troughs in the data.

- **X-axis**: Represents the dates, providing a chronological sequence of data points.
- **Y-axis**: Shows the number of orders, allowing for a quantitative assessment of order volume over time.
- **Line and Markers**: The plot combines lines and markers to highlight individual data points while illustrating the overall trend.

This line plot serves as a foundational tool for temporal analysis, offering insights into the dynamics of customer orders.
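
Daily counts are often noisy, so a rolling mean can make the underlying trend easier to read on a line plot. The following is a minimal sketch using pandas; the sample values are made up for illustration and the window of 3 days is an arbitrary choice.

```python
import pandas as pd

# Illustrative daily order counts (made-up values, not CJA data)
orders = pd.Series(
    [10, 14, 9, 20, 17],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Trailing 3-day mean; min_periods=1 keeps the first days instead of returning NaN
rolling_mean = orders.rolling(window=3, min_periods=1).mean()
print(rolling_mean)
```

The smoothed series could then be drawn as a second `go.Scatter` trace on the same figure, next to the raw daily values.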

In [None]:
# Define data for the plot
x_values = sorted_df[dimension]
y_values = sorted_df[metric]

# Create a line plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_values, y=y_values, mode='lines+markers', name=metric))

# Update layout
fig.update_layout(title='Orders Over Time',
                   xaxis_title='Date',
                   yaxis_title='Orders')

# Show plot
fig.show()

## Section 4: Visualizing Orders with a Bar Plot

Following our trend analysis, we plot the same data as a bar chart. Bar plots are particularly useful for comparing quantities across different categories or time periods.

### Crafting a Bar Plot Using Plotly
With Plotly, we construct a bar plot that categorizes the number of orders by day. This visualization method is adept at revealing variations in order volume, providing a clear comparative view across different dates.

- **X-axis**: Displays dates, offering a discrete comparison between different days.
- **Y-axis**: Indicates the number of orders, quantifying customer activity.
- **Bars**: Each bar represents the order volume for a given day, facilitating an easy comparison of data across the timeline.

The bar plot enriches our analytical toolkit, enabling straightforward comparisons and enhancing our understanding of order trends.
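
Bar charts are also a natural fit for grouped comparisons, such as total orders per weekday. The sketch below aggregates a made-up two-week sample with pandas; the column names `date` and `orders` are assumptions for this example and differ from the CJA column names used in the notebook.

```python
import pandas as pd

# Made-up sample: two weeks of daily orders starting Monday 2024-01-01
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14, freq="D"),
    "orders": [5, 7, 6, 8, 12, 20, 18, 4, 6, 7, 9, 11, 22, 19],
})

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Sum orders per weekday name, then restore calendar order
by_weekday = (
    daily.groupby(daily["date"].dt.day_name())["orders"]
    .sum()
    .reindex(weekday_order)
)
print(by_weekday)
```

Passing `by_weekday.index` and `by_weekday.values` to `go.Bar` would give one bar per weekday.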

In [None]:
# Define data for the plot
x_values = sorted_df[dimension]
y_values = sorted_df[metric]

# Create a bar chart
fig = go.Figure()
fig.add_trace(go.Bar(x=x_values, y=y_values, name=metric))

# Update layout
fig.update_layout(title='Orders Over Time',
                   xaxis_title='Date',
                   yaxis_title='Orders')

# Show plot
fig.show()


## Section 5: Visualizing Orders with a Scatter Plot

To complement our previous visualizations, this section introduces a scatter plot to depict the distribution of orders over time. Scatter plots excel at showcasing individual data points, making them ideal for identifying outliers and understanding the spread of data.

### Generating a Scatter Plot with Plotly
Employing Plotly, we create a scatter plot with one marker per day, positioned at that day's order total. This method emphasizes the discrete nature of data points without assuming continuity between them.

- **X-axis**: Represents the dates, serving as the basis for data distribution.
- **Y-axis**: Reflects the number of orders, highlighting the variance in daily order counts.
- **Markers**: Each marker denotes an individual data point, allowing for detailed observation of the data's characteristics.

This scatter plot offers a nuanced view of the data, providing a granular look at the daily order patterns and enhancing our analytical depth.
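
Outliers that stand out visually on a scatter plot can also be flagged numerically. One common rule of thumb, sketched here with the standard library, marks values lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The helper name `iqr_outliers` and the sample values are made up for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


daily_orders = [10, 12, 11, 13, 12, 11, 40]  # made-up sample
print(iqr_outliers(daily_orders))  # [40]
```

The threshold `k=1.5` is a convention rather than a rule, so it is worth adjusting to how spiky the order data normally is.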

In [None]:
# Define data for the plot
x_values = sorted_df[dimension]
y_values = sorted_df[metric]

# Create a scatter plot with markers only
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_values, y=y_values, mode='markers', name=metric))

# Update layout
fig.update_layout(title='Orders Over Time',
                   xaxis_title='Date',
                   yaxis_title='Orders')

# Show plot
fig.show()


## Section 6: Forecasting and Visualizing Future Orders

Predictive analytics lets us make informed estimates about future trends. In this section, we apply a SARIMAX model (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors), a statistical model that can capture both trend and seasonality in time series data. No exogenous regressors are used here, so the model reduces to a seasonal ARIMA. Our goal is to forecast the number of daily orders for the next 30 days and visualize these predictions along with their confidence intervals.

### Steps for Forecasting:
1. **Model Specification**: We define a SARIMAX model to fit our historical orders data. The order `(1, 1, 1)` and seasonal order `(1, 1, 1, 12)` are assumed for demonstration purposes and should be tuned for your specific dataset. A seasonal period of 12 is the usual choice for monthly data; for daily data with a weekly cycle, a period of 7 is more typical.
2. **Model Fitting**: We fit the model to our historical data, allowing us to capture the underlying patterns and relationships.
3. **Prediction and Confidence Intervals**: We forecast orders for the next 30 days and retrieve the confidence intervals (95% by default in `statsmodels`) to understand the potential range of variation in the predictions.
4. **Visualization**: Using Plotly, we plot the historical data, the forecasted orders, and the confidence intervals. The forecasted orders are distinguished by a dotted line, and the confidence intervals are shaded to indicate the level of certainty in our predictions.
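
For reference, the specification in step 1 can be written in the standard multiplicative seasonal ARIMA form. The notation is introduced here and does not come from the notebook: $L$ is the lag operator ($L y_t = y_{t-1}$), $y_t$ is the number of orders on day $t$, $\varepsilon_t$ is white noise, and $\phi_1, \Phi_1, \theta_1, \Theta_1$ are the coefficients estimated during fitting. With order $(1, 1, 1)$ and seasonal order $(1, 1, 1, 12)$:

$$
(1 - \phi_1 L)\,(1 - \Phi_1 L^{12})\,(1 - L)\,(1 - L^{12})\, y_t
= (1 + \theta_1 L)\,(1 + \Theta_1 L^{12})\, \varepsilon_t
$$

The factors $(1 - L)$ and $(1 - L^{12})$ are the regular and seasonal differences. Together they consume the first 13 observations, which matters when only about a month of daily data is available.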

By incorporating these forecasting techniques into our analysis, we not only gain insight into what the future might hold but also quantify our uncertainty, which is invaluable for risk assessment and strategic planning.
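
SARIMAX treats consecutive observations as equally spaced, so a day missing from the report would silently shift every later point. The sketch below regularizes a series onto a complete daily index with pandas. The sample data is made up, and filling gaps with 0 rests on the assumption that a missing day means no orders; check that against your own data before relying on it.

```python
import pandas as pd

# Made-up sample in which 2024-01-03 is missing
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-04"],
    "orders": [12, 15, 9],
})

# Index the orders by date, then reindex onto every day in the covered span
series = raw.set_index(pd.to_datetime(raw["date"]))["orders"]
full_index = pd.date_range(series.index.min(), series.index.max(), freq="D")
regular = series.reindex(full_index, fill_value=0)  # assumption: gap = zero orders
print(regular)
```

A series built this way carries a daily `DatetimeIndex`, so the gaps are explicit before the model is fitted.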

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX
import pandas as pd
import numpy as np
# plotly.graph_objs is already imported as `go` in the first cell

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

# 'x_values' are the dates, and 'y_values' are the corresponding orders
x_values = pd.to_datetime(sorted_df[dimension])
# Reset the index so the model sees a clean 0..n-1 index after sorting
y_values = sorted_df[metric].reset_index(drop=True)

# Define the SARIMAX model
model = SARIMAX(y_values, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))

# Fit the model
fitted_model = model.fit(disp=False)

# Forecast the next 30 days
forecast_length = 30
forecast_result = fitted_model.get_forecast(steps=forecast_length)

# Define forecast index as a date range starting the day after the last date in x_values
forecast_index = pd.date_range(start=x_values.iloc[-1] + pd.Timedelta(days=1), periods=forecast_length, freq='D')

# Get the forecast and the confidence intervals
forecast = forecast_result.predicted_mean
conf_int = forecast_result.conf_int()

# Create a new figure for plotting
fig = go.Figure()

# Plot historical data
fig.add_trace(go.Scatter(x=x_values, y=y_values, mode='lines', name='Historical Orders'))

# Plot forecasted data
fig.add_trace(go.Scatter(x=forecast_index, y=forecast, mode='lines+markers', name='Forecast', line=dict(dash='dot')))

# Plot confidence intervals
fig.add_trace(go.Scatter(x=np.concatenate([forecast_index, forecast_index[::-1]]),
                         y=np.concatenate([conf_int.iloc[:, 0], conf_int.iloc[:, 1][::-1]]),
                         fill='toself', fillcolor='rgba(0,100,80,0.2)',
                         line=dict(color='rgba(255,255,255,0)'), name='Confidence Band'))

# Update layout
fig.update_layout(title='Forecast of Orders Over Time with Confidence Bands',
                  xaxis_title='Date', yaxis_title='Orders', legend=dict(y=0.5, traceorder='reversed'))

# Show plot
fig.show()