A Guide To Modular Prompting GPT-4 For Interactive Python Dashboards

Creating interactive Python Plotly dashboards one chart at a time

As a computer science professor, I have spent the last 8 months tasking GPT-4 with generating Python Plotly dashboard code.

I have found recently that GPT-4 has improved immensely in its ability to create Python Plotly code for displaying interactive data visuals.

Your edge as a data scientist may be your ability to quickly whip up clean interactive charts that offer a set of views into a dataset over time.

Need proof? Yes!

So, let’s create a set of 4 diverse visualizations for our interactive dashboard: stacked area, line, pie, and box plot charts.

And I will leave you with 3 awesome rules to ensure you can re-create an interactive dashboard with any data set.

When? Right now!

The Data Set
Let’s start with an interesting data set: the population projections from the new UN population projections report (found HERE).

To start off with, we can upload the UN Population Projections dataset to GPT-4 and ask it to give us an overview of what it finds:

By doing this, we can check whether GPT-4’s interpretation of the data set matches our own.

An additional benefit to performing this step is that it gives GPT-4 an idea of what fields actually exist for analysis (so it doesn’t just “make some up”).

GPT-4’s response (only including relevant fields for this exercise):

Region, subregion, country, or area: The name of the country or region the data pertains to.
ISO3 Alpha-code: The 3-letter country code as per ISO standards.
Year: The year to which the data entry corresponds.
Age columns: The remaining columns from “0–4” to “80+” appear to be age-specific data, likely representing percentages of the population within that age group.

Now that we know GPT-4 has the correct information, we can move on to the charts.
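
To cross-check GPT-4’s overview against the file itself, a quick look with pandas can help. The sketch below is a minimal example that uses a tiny made-up table in place of the real file; the column names, the values, and the helper `summarize_columns` are illustrative assumptions, not the actual UN data or part of any library.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV (hypothetical values, for illustration only)
sample_csv = io.StringIO(
    "Country,Year,0-4,5-9,80+\n"
    "A,2020,10.0,9.5,1.2\n"
    "A,2021,9.8,9.4,1.3\n"
    "B,2020,12.1,11.0,0.8\n"
)
sample = pd.read_csv(sample_csv)


def summarize_columns(df):
    """Return one row per column: its dtype and its number of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })


overview = summarize_columns(sample)
print(overview)
```

If the names printed here differ from the fields GPT-4 describes, that is a sign to correct it before asking for any charts.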

NOTE: To ensure GPT-4 doesn’t lose track of your data set, remind it in each prompt to use the provided data set.

Chart 1. Stacked Area — Age Proportion over Time
Let’s start out with one of my favorite visualizations — a stacked area chart.

A stacked area chart shows how different parts add up to a whole over time, making it easy to see how each part changes in relation to others. It helps in understanding not just the overall trends but also the size of each category through time.
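
To make the "parts add up to a whole" idea concrete, here is a small sketch that turns raw counts into percentage shares so that every year sums to 100. The numbers and column names are made up for illustration and are not taken from the UN data.

```python
import pandas as pd

# Hypothetical counts per age group (made-up numbers, not UN data)
counts = pd.DataFrame({
    "Year": [2000, 2010, 2020],
    "0-14": [300, 280, 250],
    "15-64": [600, 640, 700],
    "65+": [100, 130, 200],
})

age_cols = ["0-14", "15-64", "65+"]

# Divide each row by its total so the groups sum to 100 for every year
shares = counts[age_cols].div(counts[age_cols].sum(axis=1), axis=0) * 100
shares.insert(0, "Year", counts["Year"])
print(shares.round(1))
```

A table shaped like `shares` is what a stacked area chart of proportions would be drawn from, with one band per age column.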

For our stacked area chart, let’s allow a user to select a country from the dataset, and then we can display the distribution of age categories over time.

To build this as a Python Plotly dashboard, we can simply prompt GPT-4 to write the Python code for us.

Prompt to GPT-4: Using the data set provided, generate a Python script using Plotly and Dash to create an interactive stacked area chart. The chart should allow users to select any country from a dropdown menu to visualize the proportions of various age groups over time based on the provided dataset. Ensure the dropdown lists the country names rather than their ISO codes.

Response from GPT-4:

In [17]:
import dash
from dash import html, dcc, Input, Output
import plotly.express as px
import pandas as pd

# Load your dataset
file_path = '/Users/ulrike_imac_air/projects/R_themes/dashboards/WPP2022.csv'
data = pd.read_csv(file_path)

# Ensure there are no leading or trailing spaces in column names
data.columns = data.columns.str.strip()

# Debug: Print column names to identify age group columns
print("Columns in the DataFrame:", data.columns)

# Identify age group columns (assuming similar names in your dataset)
# Adjust these column names based on the actual column names in your dataset
age_group_columns = ['Pop0_14', 'Pop15_64', 'Pop65Plus']  # Example placeholders, replace with actual names

# Filter the data to include only necessary columns for this visualization
columns_needed = ['Location', 'Time'] + age_group_columns
data_filtered = data[columns_needed].dropna()

# Initialize the Dash app
app = dash.Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    html.H1("Interactive Age Group Trends"),
    dcc.Dropdown(
        id='country-dropdown',
        options=[{'label': country, 'value': country} for country in data_filtered['Location'].unique()],
        value=data_filtered['Location'].unique()[0]  # Default to the first country
    ),
    dcc.Graph(id='age-trend-graph')
])

# Define callback to update graph based on dropdown selection
@app.callback(
    Output('age-trend-graph', 'figure'),
    Input('country-dropdown', 'value')
)
def update_figure(selected_country):
    # Filter the data based on the selected country
    filtered_data = data_filtered[data_filtered['Location'] == selected_country]
    # Create figure using Plotly Express
    fig = px.area(filtered_data, 
                  x='Time', 
                  y=age_group_columns,
                  labels={'value': 'Percentage', 'variable': 'Age Group'},
                  title=f"Age Group Proportions Over Time: {selected_country}")
    
    return fig

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True)


Columns in the DataFrame: Index(['SortOrder', 'LocID', 'Notes', 'ISO3_code', 'ISO2_code', 'SDMX_code',
       'LocTypeID', 'LocTypeName', 'ParentID', 'Location', 'VarID', 'Variant',
       'Time', 'TPopulation1Jan', 'TPopulation1July', 'TPopulationMale1July',
       'TPopulationFemale1July', 'PopDensity', 'PopSexRatio', 'MedianAgePop',
       'NatChange', 'NatChangeRT', 'PopChange', 'PopGrowthRate',
       'DoublingTime', 'Births', 'Births1519', 'CBR', 'TFR', 'NRR', 'MAC',
       'SRB', 'Deaths', 'DeathsMale', 'DeathsFemale', 'CDR', 'LEx', 'LExMale',
       'LExFemale', 'LE15', 'LE15Male', 'LE15Female', 'LE65', 'LE65Male',
       'LE65Female', 'LE80', 'LE80Male', 'LE80Female', 'InfantDeaths', 'IMR',
       'LBsurvivingAge1', 'Under5Deaths', 'Q5', 'Q0040', 'Q0040Male',
       'Q0040Female', 'Q0060', 'Q0060Male', 'Q0060Female', 'Q1550',
       'Q1550Male', 'Q1550Female', 'Q1560', 'Q1560Male', 'Q1560Female',
       'NetMigrations', 'CNMR'],
      dtype='object')



Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.



KeyError: "['Pop0_14', 'Pop15_64', 'Pop65Plus'] not in index"
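
The KeyError above appears because the placeholder names in `age_group_columns` are not among the columns printed for this file. One way to catch this before subsetting is a small guard that reports which requested names are missing. This is only a sketch: `split_columns` is a hypothetical helper, and the `available` list holds just a few of the names from the printed output.

```python
def split_columns(available, requested):
    """Separate requested column names into those present and those missing."""
    available = set(available)
    present = [c for c in requested if c in available]
    missing = [c for c in requested if c not in available]
    return present, missing


# A few of the column names printed above
available = ["Location", "Time", "MedianAgePop", "Q0040", "Q0060", "Q1560"]
requested = ["Location", "Time", "Pop0_14", "Pop15_64", "Pop65Plus"]

present, missing = split_columns(available, requested)
print("present:", present)
print("missing:", missing)
```

Pasting the list of missing names back into the next prompt gives GPT-4 something specific to correct.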

Chart 2. Line Chart for Median Age By Country
Let’s add a second chart that looks at the median age by country over time. We’ll need to make an assumption for this demonstration: let’s treat the share of the ‘65+’ age group as a rough indicator of median age, since it represents the older segment of the population.

Prompt to GPT-4: Using the same dataset as with the previous chart, let’s add a second chart. Please create a line chart that displays the percentage of the ‘65+’ age group over time for the selected country (using the same dropdown). Display this new chart on the same row as our initial chart. Do not revise the previous working code. Please include ALL the code in your response.

Response from GPT-4 (code snippet with new code):



In [24]:
import dash
from dash import html, dcc, Input, Output
import plotly.express as px
import pandas as pd

# Load your dataset
file_path = '/Users/ulrike_imac_air/projects/R_themes/dashboards/WPP2022.csv'
data = pd.read_csv(file_path)

# Ensure there are no leading or trailing spaces in column names
data.columns = data.columns.str.strip()

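# Identify age group columns
# For this example, we'll use Q0040, Q0060, and Q1560 as stand-ins for '0-14', '15-64', '65+' respectively
# NOTE: in the WPP2022 indicators file these Q* columns are mortality measures, not age-group shares,
# so the charts run but do not show real age proportions; swap in actual age-group columns if available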
age_group_columns = {
    '0-14': 'Q0040',
    '15-64': 'Q0060',
    '65+': 'Q1560'
}

# Filter the data to include only necessary columns for this visualization
columns_needed = ['Location', 'Time', 'MedianAgePop'] + list(age_group_columns.values())
data_filtered = data[columns_needed].dropna()

# Initialize the Dash app
app = dash.Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    html.H1("Interactive Age Group Trends"),
    dcc.Dropdown(
        id='country-dropdown',
        options=[{'label': country, 'value': country} for country in data_filtered['Location'].unique()],
        value=data_filtered['Location'].unique()[0]  # Default to the first country
    ),
    html.Div([
        dcc.Graph(id='age-trend-graph'),
        dcc.Graph(id='median-age-graph')
    ], style={'display': 'flex', 'flex-direction': 'row'})
])

# Define callback to update graphs based on dropdown selection
@app.callback(
    [Output('age-trend-graph', 'figure'),
     Output('median-age-graph', 'figure')],
    [Input('country-dropdown', 'value')]
)
def update_figures(selected_country):
    # Filter the data based on the selected country
    filtered_data = data_filtered[data_filtered['Location'] == selected_country]
    
    # Create the stacked area chart using Plotly Express
    age_trend_fig = px.area(filtered_data, 
                            x='Time', 
                            y=list(age_group_columns.values()),
                            labels={'value': 'Percentage', 'variable': 'Age Group'},
                            title=f"Age Group Proportions Over Time: {selected_country}")
    
    # Create the line chart for median age
    median_age_fig = px.line(filtered_data,
                             x='Time',
                             y='MedianAgePop',
                             labels={'MedianAgePop': 'Median Age', 'Time': 'Year'},
                             title=f"Median Age Over Time: {selected_country}")
    
    return age_trend_fig, median_age_fig

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True)



Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.
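
The mixed-types warning is harmless for these charts, but it can be silenced in the two ways the message itself suggests. The sketch below uses a tiny in-memory CSV rather than the real file. It assumes the warning refers to the `Notes` column, which sits at position 2 (counting from zero) in the column list printed earlier; the sample rows are made up.

```python
import io
import pandas as pd

# Made-up rows where the Notes column mixes empty, numeric and text values
csv_text = "Location,Notes,Time\nWorld,,2100\nAfrica,1,2100\nAsia,a,2100\n"

# Option 1: let pandas read the whole file before inferring column types
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the type of the ambiguous column explicitly
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"Notes": "string"})

print(df_typed.dtypes)
```

Option 2 is the more explicit of the two, since it also documents what the column is expected to contain.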



Chart 3. Pie Chart Age Distribution By Country, 2100
For our pie chart, which shows a single snapshot rather than a trend, we must pick a specific year for the age distribution.

As with the area chart, we can use the 3 main age ranges to help visualize the age distribution for the chosen year.

For my example, I chose the year 2100 to view the age projections.

Prompt to GPT-4: Using the same dataset AND using the code generated previously, please add a 3rd chart on the row below, a pie chart that shows the age distribution by country for the year 2100. Do not revise the previous working code. Please include ALL the code in your response.

Response from GPT-4 (code snippet with new code):

In [25]:
import dash
from dash import html, dcc, Input, Output
import plotly.express as px
import pandas as pd

# Load your dataset
file_path = '/Users/ulrike_imac_air/projects/R_themes/dashboards/WPP2022.csv'
data = pd.read_csv(file_path)

# Ensure there are no leading or trailing spaces in column names
data.columns = data.columns.str.strip()

# Identify age group columns
# For this example, we'll use Q0040, Q0060, and Q1560 as proxies for '0-14', '15-64', '65+' respectively
# Adjust these column names based on actual data insights
age_group_columns = {
    '0-14': 'Q0040',
    '15-64': 'Q0060',
    '65+': 'Q1560'
}

# Filter the data to include only necessary columns for this visualization
columns_needed = ['Location', 'Time', 'MedianAgePop'] + list(age_group_columns.values())
data_filtered = data[columns_needed].dropna()

# Initialize the Dash app
app = dash.Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    html.H1("Interactive Age Group Trends"),
    dcc.Dropdown(
        id='country-dropdown',
        options=[{'label': country, 'value': country} for country in data_filtered['Location'].unique()],
        value=data_filtered['Location'].unique()[0]  # Default to the first country
    ),
    html.Div([
        dcc.Graph(id='age-trend-graph'),
        dcc.Graph(id='median-age-graph')
    ], style={'display': 'flex', 'flex-direction': 'row'}),
    html.Div([
        dcc.Graph(id='age-distribution-pie')
    ])
])

# Define callback to update graphs based on dropdown selection
@app.callback(
    [Output('age-trend-graph', 'figure'),
     Output('median-age-graph', 'figure'),
     Output('age-distribution-pie', 'figure')],
    [Input('country-dropdown', 'value')]
)
def update_figures(selected_country):
    # Filter the data based on the selected country
    filtered_data = data_filtered[data_filtered['Location'] == selected_country]
    
    # Create the stacked area chart using Plotly Express
    age_trend_fig = px.area(filtered_data, 
                            x='Time', 
                            y=list(age_group_columns.values()),
                            labels={'value': 'Percentage', 'variable': 'Age Group'},
                            title=f"Age Group Proportions Over Time: {selected_country}")
    
    # Create the line chart for median age
    median_age_fig = px.line(filtered_data,
                             x='Time',
                             y='MedianAgePop',
                             labels={'MedianAgePop': 'Median Age', 'Time': 'Year'},
                             title=f"Median Age Over Time: {selected_country}")
    
    # Filter the data for the year 2100
    data_2100 = data_filtered[(data_filtered['Location'] == selected_country) & (data_filtered['Time'] == 2100)]
    
    # Create the pie chart for MedianAgePop distribution in 2100
    # Since a pie chart for a single value does not make sense, showing MedianAgePop distribution among age groups
    age_distribution_fig = px.pie(data_2100,
                                  values='MedianAgePop',
                                  names=list(age_group_columns.keys()),
                                  title=f"Median Age Distribution in 2100: {selected_country}")

    return age_trend_fig, median_age_fig, age_distribution_fig

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True)



Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 76, in update_figures(selected_country='World')
     72 data_2100 = data_filtered[(data_filtered['Location'] == selected_country) & (data_filtered['Time'] == 2100)]
     74 # Create the pie chart for MedianAgePop distribution in 2100
     75 # Since a pie chart for a single value does not make sense, showing MedianAgePop distribution among age groups
---> 76 age_distribution_fig = px.pie(data_2100,
        data_2100 =     Location  Time  MedianAgePop    Q0040    Q0060    Q1560
150    World  2100       42.3005  35.3659  89.7165  75.0603
        px = <module 'plotly.express' from '/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/plotly/express/__init__.py'>
        age_group_columns = {'0-14': 'Q0040', '15-64': 'Q0060', '65+': 'Q1560'}
        selected_country = 'World'
     77                               values=
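
The ValueError comes from handing `px.pie` a one-row table together with a list of three names, so the lengths do not match. A possible fix, sketched here under the assumption that each age-group column should become one slice, is to reshape that single row into long form first. The values are copied from the traceback; the column names `column`, `value`, and `Age Group` and the helper `make_pie` are hypothetical. Because the Q* columns are only the stand-ins the generated code chose, the slices will not sum to 100.

```python
import pandas as pd

# The single 2100 row shown in the traceback above
data_2100 = pd.DataFrame({
    "Location": ["World"], "Time": [2100], "MedianAgePop": [42.3005],
    "Q0040": [35.3659], "Q0060": [89.7165], "Q1560": [75.0603],
})
age_group_columns = {"0-14": "Q0040", "15-64": "Q0060", "65+": "Q1560"}

# Reshape to long form: one row per slice of the pie
pie_data = data_2100.melt(
    id_vars=["Location", "Time"],
    value_vars=list(age_group_columns.values()),
    var_name="column",
    value_name="value",
)

# Map the column names back to the readable age-group labels
label_for = {col: label for label, col in age_group_columns.items()}
pie_data["Age Group"] = pie_data["column"].map(label_for)


def make_pie(frame, country):
    """Build the pie figure; plotly is imported here so the reshaping runs without it."""
    import plotly.express as px
    return px.pie(frame, values="value", names="Age Group",
                  title=f"Age Distribution in 2100: {country}")
```

Inside the callback, `make_pie(pie_data, selected_country)` would then take the place of the failing `px.pie` call.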

Chart 4. Box Plot— Variability for Each Age Group
For our 4th chart, the box plot will show the distribution and variability of each age group’s percentage over time for the selected country.
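
To see what a box plot summarizes, the sketch below computes the minimum, quartiles, median, and maximum for each age group with pandas. The percentages are made up for one hypothetical country and are not UN data. Plotly may compute its quartiles with a slightly different method, so treat these numbers as an approximation of what the chart would draw.

```python
import pandas as pd

# Hypothetical yearly percentages for one country (made-up numbers)
shares = pd.DataFrame({
    "Year": [2000, 2005, 2010, 2015, 2020],
    "0-14": [30.0, 29.0, 28.0, 27.0, 26.0],
    "65+": [8.0, 9.0, 10.0, 12.0, 16.0],
})

# Long form: one row per (year, age group)
long_shares = shares.melt(
    id_vars="Year", var_name="Age Group", value_name="Percentage"
)

# Minimum, lower quartile, median, upper quartile, maximum per age group
summary = (
    long_shares.groupby("Age Group")["Percentage"]
    .quantile([0, 0.25, 0.5, 0.75, 1])
    .unstack()
)
print(summary)
```

A long-form table like `long_shares` can also be plotted directly as a box plot, with the age group on the x-axis and the percentage on the y-axis.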

Prompt to GPT-4: Using the same dataset AND using the code generated previously, please display a fourth chart (on the same row as the third chart) as a box plot chart that shows the distribution and variability of each age group’s percentage over time, by country. Do not revise the previous working code. Please include ALL the code in your response.

Response from GPT-4: