# Are Your Trendlines Meaningful? A Guide for Data Visualization
## How to ensure that your trendlines are valid

You can draw a trendline on any scatter chart. But should you?

A trendline should show a relationship between the variables being plotted, but how do we know whether that relationship is meaningful?

Plotting a trendline over random data is clearly not sensible. Plotting a trendline over a set of points that form a straight line will simply overlay that line. Most plots fall somewhere in between, so the examples below fit a trendline to each kind of data and print the p-value of the slope and R² to judge it.



In [5]:
import plotly.express as px
import numpy as np
import pandas as pd

# set the default plotly express template to 'plotly_white'
px.defaults.template = 'plotly_white'
# set the default width and height for all subsequent plotly figures
px.defaults.width = 800
px.defaults.height = 600

In [11]:
# Generate random data
x = np.random.rand(100) * 100  # Random x values between 0 and 100
y = np.random.rand(100) * 50  # Random y values between 0 and 50

# Create a DataFrame
data = pd.DataFrame({'x': x, 'y': y})

# Plot scatter chart with trendline
fig = px.scatter(data, x='x', y='y', title='Scatter Chart with Trendline (Random data)',
                 labels={'x': 'X-axis', 'y': 'Y-axis'},
                 trendline='ols', trendline_color_override='red')  # Ordinary Least Squares regression for trendline

# Show the plot
fig.show()


# Extract trendline results
results = px.get_trendline_results(fig)
ols_model = results.iloc[0]["px_fit_results"]  # OLS regression model

# Print p-value and R²
print("Random Data:")
print("P-Value:", ols_model.pvalues[1])  # p-value for the slope
print("R²:", ols_model.rsquared)


Random Data:
P-Value: 0.4897676270987973
R²: 0.00488066061729886


In [10]:
import plotly.express as px
import numpy as np
import pandas as pd

# Generate linear data
x = np.linspace(0, 100, 10)  # 10 evenly spaced x values between 0 and 100
y = 5 * x + 10  # Linear function: y = 5x + 10

# Create a DataFrame
data = pd.DataFrame({'x': x, 'y': y})

# Plot scatter chart with trendline
fig = px.scatter(data, x='x', y='y', title='Scatter Chart with Trendline (Linear Data)',
                 labels={'x': 'X-axis', 'y': 'Y-axis'},
                 trendline='ols', trendline_color_override='red')  # Ordinary Least Squares regression for trendline

# Show the plot
fig.show()

# Extract trendline results
results = px.get_trendline_results(fig)
ols_model = results.iloc[0]["px_fit_results"]  # OLS regression model

# Print p-value and R²
print("Linear Data:")
print("P-Value:", ols_model.pvalues[1])  # p-value for the slope
print("R²:", ols_model.rsquared)

Linear Data:
P-Value: 1.4099324978279833e-124
R²: 1.0


In [4]:
# Generate noisy linear data
x = np.random.rand(100) * 100  # Random x values between 0 and 100
y = 3 * x + np.random.randn(100) * 30  # y = 3x with noise

# Create a DataFrame
data = pd.DataFrame({'x': x, 'y': y})

# Plot scatter chart with trendline
fig = px.scatter(data, x='x', y='y', title='Scatter Chart with Trendline',
                 labels={'x': 'X-axis', 'y': 'Y-axis'},
                 trendline='ols', trendline_color_override='red')  # Ordinary Least Squares regression for trendline

# Show the plot
fig.show()

# Extract trendline results
results = px.get_trendline_results(fig)
ols_model = results.iloc[0]["px_fit_results"]  # OLS regression model

# Print p-value and R²
print("Noisy Linear Data:")
print("P-Value:", ols_model.pvalues[1])  # p-value for the slope
print("R²:", ols_model.rsquared)


Noisy Linear Data:
P-Value: 1.5509131226498723e-51
R²: 0.9034004881292087


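## Output Explanation

P-value:

- For random data, the p-value of the slope varies from run to run but is usually large (0.49 in the run above), so there is no evidence of a linear relationship. A low p-value (< 0.05) indicates a statistically significant slope, although about 1 run in 20 on purely random data will fall below 0.05 by chance.
- For perfectly linear data, the p-value of the slope is extremely small (close to 0), because the fitted line leaves essentially no residual error.
- For linear data with noise, the p-value is still very small (about 1.6e-51 in the run above), so the slope is clearly significant.

R²:

- For random data, R² reflects how much of the variability in y is explained by x. It will usually be close to 0 (about 0.005 in the run above).
- For perfectly linear data, R² is exactly 1.0: all of the variability in y is explained by x.
- For linear data with noise, R² falls between the two (about 0.90 in the run above): x explains most, but not all, of the variability in y.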
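
To make the meaning of R² concrete, the sketch below recomputes it by hand as 1 − SS_res / SS_tot for a least-squares line. It is only an illustration: it uses `np.polyfit` rather than the statsmodels fit that Plotly Express performs, and the helper name `r_squared`, the fixed seed, and the sample size are assumptions that do not appear in the notebook cells above.

```python
import numpy as np

def r_squared(x, y):
    """Return R² of the least-squares straight line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)   # fit y ≈ slope * x + intercept
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)              # variation the line leaves unexplained
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total variation in y
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)                   # fixed seed (assumption) for repeatability
x = rng.random(100) * 100

y_random = rng.random(100) * 50                  # unrelated to x
y_linear = 5 * x + 10                            # exact straight line
y_noisy = 3 * x + rng.standard_normal(100) * 30  # straight line plus noise

r2_random = r_squared(x, y_random)
r2_linear = r_squared(x, y_linear)
r2_noisy = r_squared(x, y_noisy)
print("random:", r2_random, "linear:", r2_linear, "noisy:", r2_noisy)
```

The three values should follow the same pattern as the outputs above: near 0 for random data, 1.0 for the exact line, and somewhere in between for the noisy line.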
## How It Works

The `px.get_trendline_results()` function retrieves the regression results generated by Plotly Express. Its `px_fit_results` column contains the fitted OLS model, which is a results object from statsmodels. Use `.pvalues` and `.rsquared` to access the regression statistics; as in the code above, index 1 of `.pvalues` is the p-value for the slope.
Checking these two numbers before you publish a chart tells you whether the trendline is supported by the data or is just a line drawn through noise.
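
As a rough illustration of how the two statistics could be turned into a check, the sketch below wraps them in a small helper. The function name `describe_trendline`, the 0.05 significance level, and the 0.3 R² cut-off are all assumptions chosen for illustration; neither threshold comes from Plotly or statsmodels. The helper only needs an object exposing `.pvalues` and `.rsquared`, so a `SimpleNamespace` stands in here for the statsmodels results, filled with the numbers printed in the examples above.

```python
from types import SimpleNamespace

def describe_trendline(fit, alpha=0.05, min_r2=0.3):
    """Summarise whether a fitted OLS trendline looks meaningful.

    `fit` is any object with `.pvalues` (index 1 = slope) and `.rsquared`,
    for example results.iloc[0]["px_fit_results"].
    `alpha` and `min_r2` are illustrative thresholds, not standard values.
    """
    p_slope = fit.pvalues[1]
    r2 = fit.rsquared
    if p_slope >= alpha:
        return "no evidence of a linear relationship"
    if r2 < min_r2:
        return "significant slope, but x explains little of y"
    return "trendline supported by the data"

# Stand-ins built from the printed outputs; the intercept p-value was never
# printed, so it is left as None rather than invented.
random_fit = SimpleNamespace(pvalues=[None, 0.4897676270987973],
                             rsquared=0.00488066061729886)
noisy_fit = SimpleNamespace(pvalues=[None, 1.5509131226498723e-51],
                            rsquared=0.9034004881292087)

print(describe_trendline(random_fit))  # no evidence of a linear relationship
print(describe_trendline(noisy_fit))   # trendline supported by the data
```

In the notebook itself, the equivalent call would be `describe_trendline(results.iloc[0]["px_fit_results"])` after any of the plots above.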