# Chapter 71: Data Visualization

## Learning Objectives

By the end of this chapter, you will be able to:

- Understand the principles of effective data visualization for time‑series analysis
- Choose the right chart type for different analytical tasks (trends, distributions, comparisons)
- Create clear and informative time‑series plots using Matplotlib, Seaborn, and Plotly
- Visualize decompositions of time series into trend, seasonality, and residual components
- Build interactive visualizations with Plotly to enable exploration and drill‑down
- Use heatmaps and correlation matrices to understand relationships between multiple time series
- Create candlestick charts and other financial visualizations for NEPSE stock data
- Apply best practices for color, labeling, and accessibility in visualizations
- Integrate visualizations into reports and dashboards for effective communication

---

## Introduction

Data visualization is a cornerstone of exploratory data analysis and communication in any data science project. For time‑series prediction systems like the one we are building for NEPSE, visualizations serve multiple purposes:

- **Exploration**: Understanding patterns, trends, seasonality, and anomalies in the raw data.
- **Diagnostics**: Checking model residuals, feature distributions, and prediction errors.
- **Communication**: Presenting results to stakeholders (traders, management) in an intuitive way.

A well‑designed visualization can reveal insights that summary statistics miss. Conversely, a poorly designed chart can mislead or confuse. In this chapter, we will explore the principles of effective visualization and apply them to the NEPSE dataset using Python's rich ecosystem of visualization libraries: **Matplotlib**, **Seaborn**, and **Plotly**. We'll cover everything from basic line charts to advanced interactive financial plots.

---

## 71.1 Principles of Effective Visualization

Before diving into code, it's essential to understand what makes a visualization effective. Edward Tufte, a pioneer in data visualization, introduced several principles, including:

- **Show the data**: Avoid chartjunk (unnecessary decorations) that distracts from the data.
- **Maximize the data‑ink ratio**: Devote as large a share of the ink as possible to the data itself, and minimize ink spent on non‑data elements.
- **Avoid distorting the data**: Ensure axes are scaled appropriately and not truncated misleadingly.
- **Use clear labeling**: Titles, axis labels, and legends should be unambiguous.
- **Choose the right chart type**: Different data and tasks require different visual encodings.

For time‑series data specifically, we often want to:

- Show trends over time (line charts).
- Compare multiple series (multiple lines, small multiples).
- Show distributions (histograms, box plots).
- Highlight relationships (scatter plots, correlation heatmaps).
- Focus on seasonality and cycles (decomposition plots, seasonal subseries plots).

---

## 71.2 Setting Up the Visualization Environment

We'll use the following libraries:

- **Matplotlib**: The foundational plotting library; highly customizable.
- **Seaborn**: Built on Matplotlib; provides a high‑level interface for statistical graphics.
- **Plotly**: Interactive plots that can be embedded in web pages or Jupyter notebooks.

Install them if you haven't already:

```bash
pip install matplotlib seaborn plotly pandas numpy
```

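Plotly 4 and later render figures offline by default, so in most Jupyter environments `fig.show()` (including for `plotly.express` figures) works without any setup. In some older classic‑notebook setups you may need to initialize notebook mode explicitly:

```python
import plotly.offline as pyo
pyo.init_notebook_mode()
```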

---

## 71.3 Basic Time‑Series Plots with Matplotlib

Matplotlib is the workhorse of Python visualization. It gives you fine‑grained control over every element of a plot.

### 71.3.1 Loading NEPSE Data

Assume we have a CSV file with daily data for a single stock (e.g., NABIL). We'll load it and set the date as index.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Load data
df = pd.read_csv('nepse_nabil.csv', parse_dates=['Date'], index_col='Date')
df.sort_index(inplace=True)

# Display first few rows
print(df.head())
```
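If `nepse_nabil.csv` is not at hand, the examples in this chapter can be followed with synthetic data. The sketch below defines a hypothetical helper, `make_synthetic_ohlcv`, that builds a DataFrame with the same `Open`, `High`, `Low`, `Close`, and `Volume` columns the examples assume. The random‑walk prices are invented for illustration and say nothing about real NABIL behavior, and the Sunday‑to‑Thursday trading week is an assumption you can change via `weekmask`.

```python
import numpy as np
import pandas as pd

def make_synthetic_ohlcv(n_days=500, start_price=1000.0, seed=42):
    """Hypothetical helper: random-walk OHLCV data shaped like the NEPSE CSV."""
    rng = np.random.default_rng(seed)
    # Assumed Sunday-Thursday trading week; adjust weekmask if needed
    trading_day = pd.offsets.CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
    dates = pd.date_range('2020-01-05', periods=n_days, freq=trading_day, name='Date')

    # Close follows a geometric random walk with about 1.5% daily volatility
    log_returns = rng.normal(0.0, 0.015, n_days)
    close = start_price * np.exp(np.cumsum(log_returns))
    # Open is the previous close plus a small random gap
    open_ = np.concatenate([[start_price], close[:-1]]) * (1 + rng.normal(0, 0.003, n_days))
    # High/Low extend beyond the open-close range by a random amount
    spread = np.abs(rng.normal(0, 0.01, n_days))
    high = np.maximum(open_, close) * (1 + spread)
    low = np.minimum(open_, close) * (1 - spread)
    volume = rng.integers(1_000, 50_000, n_days)

    return pd.DataFrame({'Open': open_, 'High': high, 'Low': low,
                         'Close': close, 'Volume': volume}, index=dates)

df = make_synthetic_ohlcv()
print(df.head())
```

Because `df` has a `DatetimeIndex` named `Date`, it can stand in for the loaded CSV in the plotting code that follows.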

### 71.3.2 Simple Line Chart

A line chart of closing prices over time.

```python
plt.figure(figsize=(12,6))
plt.plot(df.index, df['Close'], color='blue', linewidth=1)
plt.title('NABIL Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Price (NPR)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

**Explanation:**  
`plt.plot` creates a line chart. We set the figure size, add a title, labels, and a grid. `tight_layout` adjusts spacing to prevent clipping of labels.

### 71.3.3 Formatting Date Axis

Matplotlib can format date axes nicely. Use `mdates` to set locators and formatters.

```python
fig, ax = plt.subplots(figsize=(12,6))
ax.plot(df.index, df['Close'], color='blue', linewidth=1)

# Set major ticks to years, minor ticks to months
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())

plt.title('NABIL Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Price (NPR)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

**Explanation:**  
`YearLocator` places ticks at the start of each year; `MonthLocator` places minor ticks at each month. The formatter displays the year.

### 71.3.4 Multiple Series on One Plot

Compare closing prices of two stocks.

```python
# Load data for another stock, e.g., NTC
df_ntc = pd.read_csv('nepse_ntc.csv', parse_dates=['Date'], index_col='Date')
df_ntc.sort_index(inplace=True)

plt.figure(figsize=(12,6))
plt.plot(df.index, df['Close'], label='NABIL', linewidth=1)
plt.plot(df_ntc.index, df_ntc['Close'], label='NTC', linewidth=1)
plt.title('NABIL vs NTC Closing Prices')
plt.xlabel('Date')
plt.ylabel('Price (NPR)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

**Explanation:**  
We use `label` in each plot and then call `plt.legend()` to display the legend. This is useful for comparing series.
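When two stocks trade at very different price levels, the cheaper one can look flat on a shared axis. A common remedy, sketched here with a hypothetical `rebase_to_100` helper and made‑up prices, is to align the series on common dates and scale each so it starts at 100; the lines can then be read as cumulative percentage performance.

```python
import pandas as pd

def rebase_to_100(*series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: align series on common dates and scale each to start at 100."""
    aligned = pd.concat(series, axis=1, join='inner').dropna()
    return aligned / aligned.iloc[0] * 100

# Small illustrative example with made-up prices
dates = pd.date_range('2024-01-01', periods=4, freq='D')
nabil = pd.Series([500.0, 510.0, 520.0, 505.0], index=dates, name='NABIL')
ntc = pd.Series([900.0, 891.0, 945.0, 900.0], index=dates, name='NTC')

rebased = rebase_to_100(nabil, ntc)
print(rebased)
```

With real data you would pass `df['Close'].rename('NABIL')` and `df_ntc['Close'].rename('NTC')`. A value of 104 means the stock is up 4% since the first common date.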

### 71.3.5 Subplots for Multiple Views

Sometimes you want to show different aspects in separate subplots.

```python
fig, axes = plt.subplots(3, 1, figsize=(12,10), sharex=True)

# Price
axes[0].plot(df.index, df['Close'], color='blue')
axes[0].set_ylabel('Price')
axes[0].set_title('NABIL Price and Volume')

# Volume
axes[1].bar(df.index, df['Volume'], color='green', width=1)  # width=1 for daily bars
axes[1].set_ylabel('Volume')

# Returns
returns = df['Close'].pct_change().dropna()
axes[2].plot(returns.index, returns, color='red')
axes[2].set_ylabel('Daily Return')
axes[2].set_xlabel('Date')

plt.tight_layout()
plt.show()
```

**Explanation:**  
`subplots` creates an array of axes. We plot different data on each, and `sharex=True` ensures they share the same x‑axis for easy comparison.

---

## 71.4 Statistical Visualizations with Seaborn

Seaborn simplifies the creation of statistical plots. It works well with pandas DataFrames.

### 71.4.1 Distribution of Returns

A histogram with kernel density estimate (KDE) shows the distribution of daily returns.

```python
import seaborn as sns

returns = df['Close'].pct_change().dropna().to_frame(name='return')
sns.histplot(data=returns, x='return', kde=True, bins=50)
plt.title('Distribution of Daily Returns')
plt.xlabel('Return')
plt.ylabel('Count')
plt.show()
```

**Explanation:**  
`sns.histplot` draws a histogram (bar heights are counts by default). Setting `kde=True` overlays a smoothed kernel density curve, which makes the shape of the distribution easier to see.
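The shape suggested by the histogram can be cross‑checked numerically. pandas provides `Series.skew()` and `Series.kurt()`; the latter reports excess kurtosis, so a normal distribution scores about 0. Daily stock returns often show fat tails (positive excess kurtosis). The sketch below uses synthetic Student‑t draws as a stand‑in for fat‑tailed returns; substitute `returns['return']` for real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-ins: normal vs fat-tailed (Student-t, 3 degrees of freedom) returns
normal_returns = pd.Series(rng.normal(0, 0.01, 5000))
fat_tailed_returns = pd.Series(rng.standard_t(3, 5000) * 0.01)

for name, r in [('normal', normal_returns), ('student-t', fat_tailed_returns)]:
    # skew(): asymmetry; kurt(): excess kurtosis (about 0 for a normal distribution)
    print(f"{name:>10}: skew={r.skew():.2f}, excess kurtosis={r.kurt():.2f}")
```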

### 71.4.2 Box Plot by Month

To see how returns vary by month, we can create a box plot.

```python
# Create a month column
returns['month'] = returns.index.month

plt.figure(figsize=(10,6))
sns.boxplot(data=returns, x='month', y='return')
plt.title('Monthly Distribution of Returns')
plt.xlabel('Month')
plt.ylabel('Return')
plt.show()
```

**Explanation:**  
Box plots show median, quartiles, and outliers. This can reveal seasonal patterns in volatility.
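The numbers behind each box can also be computed directly, which is handy for a report table. By default, Seaborn's `boxplot` extends whiskers to the furthest points within 1.5 times the interquartile range (IQR) of the box (`whis=1.5`) and draws points beyond that as outliers. A pandas sketch of the same rule, assuming a `returns` DataFrame with `return` and `month` columns as above (rebuilt here from made‑up values so the block runs on its own):

```python
import pandas as pd

# Made-up daily returns for two months, standing in for the real `returns` frame
returns = pd.DataFrame({
    'return': [0.01, -0.02, 0.00, 0.015, 0.20,     # January (0.20 is an outlier)
               -0.01, 0.005, 0.02, -0.005, 0.01],  # February
    'month':  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
})

def box_stats(x: pd.Series) -> pd.Series:
    """Median, quartiles, and 1.5*IQR fences, following the default box plot rule."""
    q1, median, q3 = x.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = int(((x < lower_fence) | (x > upper_fence)).sum())
    return pd.Series({'q1': q1, 'median': median, 'q3': q3,
                      'iqr': iqr, 'n_outliers': n_outliers})

summary = returns.groupby('month')['return'].apply(box_stats).unstack()
print(summary)
```

A wider IQR in one month than another is the numeric counterpart of a taller box, i.e. higher typical volatility in that month.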

### 71.4.3 Correlation Heatmap

When analyzing multiple stocks, a correlation matrix shows relationships.

```python
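# Load a third stock (SBI); the file name is assumed to follow the same pattern as above
df_sbi = pd.read_csv('nepse_sbi.csv', parse_dates=['Date'], index_col='Date').sort_index()

# Collect each stock's closing price in one DataFrame, keeping only common dates
prices = pd.DataFrame({
    'NABIL': df['Close'],
    'NTC': df_ntc['Close'],
    'SBI': df_sbi['Close']
}).dropna()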

# Compute correlation matrix
corr = prices.corr()

# Plot heatmap
plt.figure(figsize=(8,6))
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Correlation Matrix of Stock Prices')
plt.show()
```

**Explanation:**  
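`annot=True` displays the correlation value in each cell. The `coolwarm` colormap diverges from blue (negative) to red (positive), with a light neutral color at zero; `vmin=-1`, `vmax=1`, and `center=0` pin that scale so colors are comparable across charts. This is useful for identifying stocks that move together. Keep in mind that correlations between price *levels* can be inflated by shared trends; the correlation of daily returns, used in the pair plot below, is usually the more meaningful measure of co‑movement.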
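To see why price‑level correlations deserve caution, the sketch below builds two *independent* synthetic random walks (not NEPSE data). Their daily returns are unrelated by construction, yet the correlation of their price levels is often far from zero; rerunning with different seeds shows how much it swings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 2000
dates = pd.date_range('2015-01-01', periods=n, freq='D')

# Two independent geometric random walks: their daily moves are unrelated by construction
prices = pd.DataFrame({
    'A': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
    'B': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
}, index=dates)

price_corr = prices.corr().loc['A', 'B']
return_corr = prices.pct_change().dropna().corr().loc['A', 'B']

print(f"correlation of price levels:  {price_corr:+.2f}")
print(f"correlation of daily returns: {return_corr:+.2f}")
```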

### 71.4.4 Pair Plot for Multivariate Relationships

A pair plot shows scatter plots of all pairs of variables, plus distributions on the diagonal.

```python
# Use returns for better stationarity
returns = prices.pct_change().dropna()
sns.pairplot(returns)
plt.show()
```

**Explanation:**  
This is a great exploratory tool, but can be overwhelming with many variables. It shows both the relationship between each pair and the distribution of each variable.

---

## 71.5 Interactive Visualizations with Plotly

Interactive plots allow users to zoom, pan, hover for details, and toggle series. Plotly is excellent for this.

### 71.5.1 Basic Line Chart with Plotly Express

```python
import plotly.express as px

fig = px.line(df, x=df.index, y='Close', title='NABIL Closing Price')
fig.show()
```

**Explanation:**  
Plotly Express creates an interactive chart. Hover shows the exact value at each point. You can zoom and pan.

### 71.5.2 Adding Multiple Lines

```python
# Combine two dataframes
df_combined = pd.DataFrame({
    'NABIL': df['Close'],
    'NTC': df_ntc['Close']
}).dropna()

fig = px.line(df_combined, x=df_combined.index, y=df_combined.columns,
              title='NABIL vs NTC Closing Prices')
fig.show()
```

**Explanation:**  
Passing the entire DataFrame with multiple columns automatically creates a line for each column, with a legend.

### 71.5.3 Candlestick Charts

For financial data, candlestick charts show the open, high, low, and close for each period. Plotly provides a dedicated `Candlestick` trace in `plotly.graph_objects`.

```python
import plotly.graph_objects as go

fig = go.Figure(data=[go.Candlestick(
    x=df.index,
    open=df['Open'],
    high=df['High'],
    low=df['Low'],
    close=df['Close'],
    name='NABIL'
)])

fig.update_layout(
    title='NABIL Candlestick Chart',
    yaxis_title='Price (NPR)',
    xaxis_rangeslider_visible=False  # hides the range slider at the bottom
)
fig.show()
```

**Explanation:**  
Candlestick charts are a staple of trading analysis. Each candle represents one period (day, hour, etc.). The body spans the open‑to‑close range, and the wicks extend to the high and low. In Plotly's default colors, up periods (close > open) are green and down periods (close < open) are red; in the traditional monochrome convention, up candles are hollow and down candles are filled.
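The quantities a candle encodes are simple to compute, which helps when annotating charts or building features. A pandas sketch on made‑up OHLC values; the `direction`, `body`, `upper_wick`, and `lower_wick` column names are illustrative choices:

```python
import pandas as pd

candles = pd.DataFrame({
    'Open':  [100.0, 105.0, 102.0],
    'High':  [106.0, 107.0, 104.0],
    'Low':   [ 99.0, 101.0, 100.0],
    'Close': [105.0, 102.0, 102.0],
})

# Up when close > open, down when close < open, flat when they are equal
candles['direction'] = 'flat'
candles.loc[candles['Close'] > candles['Open'], 'direction'] = 'up'
candles.loc[candles['Close'] < candles['Open'], 'direction'] = 'down'

body_top = candles[['Open', 'Close']].max(axis=1)
body_bottom = candles[['Open', 'Close']].min(axis=1)
candles['body'] = body_top - body_bottom               # height of the open-close box
candles['upper_wick'] = candles['High'] - body_top     # line above the body
candles['lower_wick'] = body_bottom - candles['Low']   # line below the body
print(candles)
```

If the default green/red scheme is not suitable (for example, for color‑blind readers), `go.Candlestick` accepts `increasing_line_color` and `decreasing_line_color`.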

### 71.5.4 Interactive Subplots

Plotly can create subplots with shared interactivity.

```python
from plotly.subplots import make_subplots

fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    vertical_spacing=0.05,
    subplot_titles=('Price', 'Volume')
)

# Price trace
fig.add_trace(go.Scatter(x=df.index, y=df['Close'], name='Price'), row=1, col=1)

# Volume trace (bar chart)
fig.add_trace(go.Bar(x=df.index, y=df['Volume'], name='Volume'), row=2, col=1)

fig.update_layout(height=600, title='NABIL Price and Volume')
fig.update_xaxes(rangeslider_visible=False)
fig.show()
```

**Explanation:**  
`make_subplots` creates a grid of plots. We add traces to specific rows and columns. The resulting chart is fully interactive.

---

## 71.6 Time‑Series Decomposition Plots

Decomposing a time series into trend, seasonal, and residual components is a common task. We can visualize the results using Matplotlib.

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Assuming daily data with weekly seasonality (period=5 for trading days)
decomposition = seasonal_decompose(df['Close'], model='additive', period=5)

fig, axes = plt.subplots(4, 1, figsize=(12,10), sharex=True)

df['Close'].plot(ax=axes[0], title='Original')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonal')
decomposition.resid.plot(ax=axes[3], title='Residual')

plt.tight_layout()
plt.show()
```

**Explanation:**  
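`seasonal_decompose` returns an object whose `trend`, `seasonal`, and `resid` attributes hold the components; we plot each on its own subplot. The trend is a centered moving average, so it (and therefore the residual) is missing for the first and last few observations. Viewing the components together helps judge whether the decomposition is meaningful.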

For interactive decomposition, we could use Plotly with subplots.

---

## 71.7 Visualizing Model Performance

After training a model, we need to visualize its predictions against actuals, and analyze errors.

### 71.7.1 Actual vs. Predicted Scatter Plot

```python
# Assume we have test predictions
test_df = pd.DataFrame({
    'actual': y_test,
    'predicted': y_pred
}, index=X_test.index)

fig = px.scatter(test_df, x='actual', y='predicted', 
                 title='Actual vs Predicted Returns',
                 labels={'actual': 'Actual Return', 'predicted': 'Predicted Return'})
fig.add_shape(type='line', x0=test_df['actual'].min(), y0=test_df['actual'].min(),
              x1=test_df['actual'].max(), y1=test_df['actual'].max(),
              line=dict(color='red', dash='dash'))
fig.show()
```

**Explanation:**  
A scatter plot of actual vs. predicted, with a diagonal line indicating perfect predictions. Points above the line are over‑predictions, below are under‑predictions.
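Plots are easier to interpret alongside a few summary numbers. The sketch below computes mean absolute error (MAE), root mean squared error (RMSE), and directional accuracy (the share of days on which the predicted return has the same sign as the actual return) with NumPy. The small arrays are made up for illustration; `error_summary(test_df['actual'], test_df['predicted'])` would apply it to the test set above.

```python
import numpy as np
import pandas as pd

def error_summary(actual, predicted) -> dict:
    """MAE, RMSE, and directional accuracy for return forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    return {
        'MAE': np.mean(np.abs(errors)),
        'RMSE': np.sqrt(np.mean(errors ** 2)),
        # Same sign => correct direction (a zero return counts as its own sign here)
        'directional_accuracy': np.mean(np.sign(actual) == np.sign(predicted)),
    }

example = pd.DataFrame({'actual': [0.01, -0.02, 0.03, -0.01],
                        'predicted': [0.02, -0.01, -0.01, -0.01]})
result = error_summary(example['actual'], example['predicted'])
print(result)
```

RMSE penalizes large misses more heavily than MAE, so a large gap between the two hints that a few days dominate the error, which the scatter plot should make visible.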

### 71.7.2 Residual Plot

Residuals (errors) should be randomly scattered around zero.

```python
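# Residual = actual - predicted (positive means the model under-predicted)
test_df['residual'] = test_df['actual'] - test_df['predicted']

# Plot against predicted values: residuals tend to be correlated with the actual
# values themselves, so an actual-vs-residual plot can show a slope even for a sound model
fig = px.scatter(test_df, x='predicted', y='residual',
                 title='Residual Plot',
                 labels={'predicted': 'Predicted Return', 'residual': 'Residual'})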
fig.add_hline(y=0, line_dash='dash', line_color='red')
fig.show()
```

**Explanation:**  
If there's a pattern in the residuals (e.g., increasing variance), it suggests heteroscedasticity or model misspecification.
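For time series it is also worth checking whether residuals are correlated with their own past values: errors that carry over from one day to the next mean the model is leaving predictable structure behind. The sketch below uses pandas' `Series.autocorr(lag)` on made‑up residuals; values near zero at small lags are what you hope to see.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
white_noise = pd.Series(rng.normal(0, 1, 1000))  # well-behaved residuals

# Residuals with leftover structure: each error carries 80% of the previous one (AR(1))
ar1 = np.zeros(1000)
shocks = rng.normal(0, 1, 1000)
for t in range(1, 1000):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]
structured = pd.Series(ar1)

for name, resid in [('white noise', white_noise), ('AR(1)', structured)]:
    lags = {lag: round(resid.autocorr(lag), 2) for lag in (1, 2, 5)}
    print(f"{name:>12}: {lags}")
```

For a plot of many lags at once, `statsmodels.graphics.tsaplots.plot_acf(test_df['residual'])` draws the autocorrelation function with confidence bands.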

### 71.7.3 Time Series of Predictions

Plot predictions and actuals over time.

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=test_df.index, y=test_df['actual'], mode='lines', name='Actual'))
fig.add_trace(go.Scatter(x=test_df.index, y=test_df['predicted'], mode='lines', name='Predicted'))
fig.update_layout(title='Actual vs Predicted Over Time', xaxis_title='Date', yaxis_title='Return')
fig.show()
```

---

## 71.8 Advanced Financial Visualizations

### 71.8.1 Moving Averages Overlay

```python
# Compute moving averages
df['SMA_20'] = df['Close'].rolling(20).mean()
df['SMA_50'] = df['Close'].rolling(50).mean()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=df['Close'], mode='lines', name='Close'))
fig.add_trace(go.Scatter(x=df.index, y=df['SMA_20'], mode='lines', name='SMA 20'))
fig.add_trace(go.Scatter(x=df.index, y=df['SMA_50'], mode='lines', name='SMA 50'))
fig.update_layout(title='NABIL with Moving Averages')
fig.show()
```

### 71.8.2 Bollinger Bands

```python
window = 20
df['SMA'] = df['Close'].rolling(window).mean()
df['STD'] = df['Close'].rolling(window).std()
df['Upper'] = df['SMA'] + 2 * df['STD']
df['Lower'] = df['SMA'] - 2 * df['STD']

fig = go.Figure()
# Upper band first, then the lower band with fill='tonexty' so the channel between them is shaded
fig.add_trace(go.Scatter(x=df.index, y=df['Upper'], mode='lines', name='Upper Band',
                         line=dict(dash='dash', color='gray')))
fig.add_trace(go.Scatter(x=df.index, y=df['Lower'], mode='lines', name='Lower Band',
                         line=dict(dash='dash', color='gray'),
                         fill='tonexty', fillcolor='rgba(128,128,128,0.2)'))
fig.add_trace(go.Scatter(x=df.index, y=df['SMA'], mode='lines', name='SMA'))
fig.add_trace(go.Scatter(x=df.index, y=df['Close'], mode='lines', name='Close'))

fig.update_layout(title='Bollinger Bands')
fig.show()
```

**Explanation:**  
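Bollinger Bands sit two rolling standard deviations above and below the 20‑day SMA, so they widen when volatility rises and narrow when it falls. A close at or above the upper band is often read as a sign the stock may be overbought, and a close at or below the lower band as possibly oversold. The shaded fill between the bands makes the channel easier to see.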
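To see how often the price actually reaches the bands, the touches can be flagged in pandas. A sketch on a synthetic random walk, assuming the same 20‑day window and 2‑standard‑deviation width as above; the `signal` labels are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
close = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0, 0.015, 300))),
                  index=pd.date_range('2023-01-01', periods=300, freq='D'), name='Close')

window, width = 20, 2
sma = close.rolling(window).mean()
std = close.rolling(window).std()
bands = pd.DataFrame({'Close': close, 'Upper': sma + width * std, 'Lower': sma - width * std})

# Label each day by where the close sits relative to the bands
bands['signal'] = 'inside'
bands.loc[bands['Close'] >= bands['Upper'], 'signal'] = 'at/above upper'
bands.loc[bands['Close'] <= bands['Lower'], 'signal'] = 'at/below lower'
bands = bands.dropna()  # the first window-1 days have no bands yet

print(bands['signal'].value_counts())
```

The flagged dates can be added to the chart as markers (for example, an extra `go.Scatter` with `mode='markers'`) to highlight where the price left the channel.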

### 71.8.3 Relative Strength Index (RSI)

```python
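def compute_rsi(data, period=14):
    """RSI using simple rolling averages of gains and losses."""
    delta = data.diff()
    gain = delta.where(delta > 0, 0)   # positive changes, zero elsewhere
    loss = -delta.where(delta < 0, 0)  # magnitudes of negative changes
    avg_gain = gain.rolling(window=period).mean()
    avg_loss = loss.rolling(window=period).mean()
    rs = avg_gain / avg_loss           # relative strength
    rsi = 100 - (100 / (1 + rs))       # scale to the 0-100 range
    return rsi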

df['RSI'] = compute_rsi(df['Close'])

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.05)

fig.add_trace(go.Scatter(x=df.index, y=df['Close'], name='Price'), row=1, col=1)
fig.add_trace(go.Scatter(x=df.index, y=df['RSI'], name='RSI'), row=2, col=1)
fig.add_hline(y=70, line_dash='dash', line_color='red', row=2, col=1)
fig.add_hline(y=30, line_dash='dash', line_color='green', row=2, col=1)

fig.update_layout(title='Price and RSI')
fig.show()
```

**Explanation:**  
RSI is a momentum oscillator bounded between 0 and 100. By convention, readings above 70 are treated as overbought and readings below 30 as oversold; the dashed horizontal lines mark these zones.
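The `compute_rsi` function above averages gains and losses with a simple rolling mean. J. Welles Wilder's original RSI uses his own smoothing, an exponential average with factor 1/period, so charting platforms may show slightly different values. A sketch of that variant with pandas' `ewm`; note it starts the average from the first observation rather than from a seed simple average, so the earliest values can differ a little from other implementations.

```python
import pandas as pd

def compute_rsi_wilder(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI with Wilder-style smoothing (exponential average with alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)    # positive changes, zero elsewhere
    loss = -delta.clip(upper=0)   # magnitudes of negative changes
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Made-up closes: a steady rise followed by a pullback
close = pd.Series([100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
                   110, 111, 112, 113, 114, 113, 112, 111], dtype=float)
rsi = compute_rsi_wilder(close)
print(rsi.round(1).tolist())
```

During the uninterrupted rise there are no losses, so the RSI sits at 100; it starts falling as soon as the pullback begins.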

---

## 71.9 Visualization Best Practices

1. **Label axes clearly**: Include units (e.g., NPR, volume in millions).
2. **Use titles and legends**: Ensure the chart is self‑contained.
3. **Choose appropriate scales**: Log scales can be useful for wide‑ranging data.
4. **Avoid 3D charts**: They often distort perception.
5. **Use color purposefully**: For categorical data, use distinct colors; for sequential data, use gradients.
6. **Highlight important points**: Use annotations to draw attention.
7. **Keep it simple**: Remove unnecessary gridlines, borders, and decorations.
8. **Test on different devices**: Ensure readability on screens and in print.
9. **Provide context**: Include benchmarks or historical averages where relevant.
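Several of these practices take only a few lines of Matplotlib. The sketch below uses synthetic prices for two hypothetical tickers (`STOCK_A`, `STOCK_B`) and the non‑interactive `Agg` backend so it runs without a display. It applies a log price scale (equal vertical distances represent equal percentage moves), Seaborn's color‑blind‑friendly `colorblind` palette, axis labels with units, an annotation, and fewer borders.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(11)
dates = pd.date_range('2018-01-01', periods=1000, freq='D')
# Synthetic series with very different growth rates, where a log scale helps
prices = pd.DataFrame({
    'STOCK_A': 200 * np.exp(np.cumsum(rng.normal(0.002, 0.02, 1000))),
    'STOCK_B': 800 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 1000))),
}, index=dates)

# Palette designed to stay distinguishable for common color-vision deficiencies
colors = sns.color_palette('colorblind', n_colors=2)
fig, ax = plt.subplots(figsize=(12, 6))
for color, column in zip(colors, prices.columns):
    ax.plot(prices.index, prices[column], label=column, color=color, linewidth=1.2)

ax.set_yscale('log')  # equal vertical distance = equal percentage change
ax.set_title('Synthetic Prices on a Log Scale')
ax.set_xlabel('Date')
ax.set_ylabel('Price (NPR, log scale)')

# Annotate the peak of STOCK_A to draw attention to it
peak_date = prices['STOCK_A'].idxmax()
peak_value = prices['STOCK_A'].max()
ax.annotate('STOCK_A peak', xy=(mdates.date2num(peak_date), peak_value),
            xytext=(10, -30), textcoords='offset points',
            arrowprops=dict(arrowstyle='->'))

# Remove non-data ink: the top and right borders
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)
ax.legend(frameon=False)
fig.tight_layout()
```

Call `plt.show()` in an interactive session, or `fig.savefig('chart.png', dpi=150)` to export the figure for a report.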

---

## Chapter Summary

In this chapter, we explored data visualization techniques for time‑series analysis, using the NEPSE stock prediction system as a running example. We covered:

- Principles of effective visualization.
- Basic and advanced plots with Matplotlib, Seaborn, and Plotly.
- Time‑series specific plots: line charts, candlestick charts, moving averages, Bollinger Bands, RSI.
- Statistical visualizations: distributions, box plots, correlation heatmaps.
- Interactive visualizations with Plotly for exploration and communication.
- Visualizing model performance with actual vs. predicted and residual plots.
- Best practices to create clear, informative, and honest visualizations.

Visualizations are a powerful tool for understanding data, diagnosing models, and communicating results. By mastering these techniques, you can turn the NEPSE prediction system's outputs into actionable insights for traders and stakeholders.

In the next chapter, we will discuss **Interactive Exploration Tools**, building on these visualizations to create interactive applications for deeper analysis.

---

**End of Chapter 71**