# Tutorial 9: Communicating Data Science Insights

## Objectives

After this tutorial you will be able to:

*   Build informative and interactive charts with Plotly.
*   Build dynamic interactive dashboards with Plotly and Dash.
*   Share insights effectively to enable making data-driven decisions.

<h2>Table of Contents</h2>

<ol>
    <li>
        <a href="#import">Import the dataset</a>
    </li>
    <br>
    <li>
        <a href="#plotly">Introduction to Plotly</a>
    </li>
    <br>
    <li>
        <a href="#html">HTML & CSS Primer</a>
    </li>
    <br>
    <li>
        <a href="#dash">Introduction to Dash</a>
    </li>
    <br>
    <li>
        <a href="#report">Project Report</a>
    </li>
    <br>
</ol>


<hr id="import">

<h2>1. Import the dataset</h2>

Import the required libraries: `pandas`, `NumPy`, and Plotly Express.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px

Read the data from the CSV file into a pandas `DataFrame`.

In [2]:
df = pd.read_csv('CO2_Emissions_Canada.csv')
df.head()

| | Make | Model | Vehicle Class | Engine Size [L] | Cylinders | Transmission | Fuel Type | Fuel Consumption City [L/100 km] | Fuel Consumption Hwy [L/100 km] | Fuel Consumption Comb [L/100 km] | Fuel Consumption Comb [mpg] | CO2 Emissions [g/km] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA | ILX | COMPACT | 2.0 | 4 | AS5 | Z | 9.9 | 6.7 | 8.5 | 33 | 196 |
| 1 | ACURA | ILX | COMPACT | 2.4 | 4 | M6 | Z | 11.2 | 7.7 | 9.6 | 29 | 221 |
| 2 | ACURA | ILX HYBRID | COMPACT | 1.5 | 4 | AV7 | Z | 6.0 | 5.8 | 5.9 | 48 | 136 |
| 3 | ACURA | MDX 4WD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 12.7 | 9.1 | 11.1 | 25 | 255 |
| 4 | ACURA | RDX AWD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 12.1 | 8.7 | 10.6 | 27 | 244 |
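
The `Fuel Consumption Comb [mpg]` column expresses the same quantity as `Fuel Consumption Comb [L/100 km]` in different units. As a quick sanity check, the sketch below converts the five rows shown above, assuming that mpg means miles per imperial gallon (an assumption that happens to reproduce these five rows, not something the dataset states here). The helper name `l_per_100km_to_mpg` is illustrative and not part of pandas or the dataset.

```python
# Conversion constants (assumption: mpg refers to miles per imperial gallon)
LITRES_PER_IMPERIAL_GALLON = 4.54609
KM_PER_MILE = 1.609344


def l_per_100km_to_mpg(l_per_100km):
    """Convert fuel consumption in L/100 km to miles per imperial gallon."""
    km_per_litre = 100 / l_per_100km
    return km_per_litre * LITRES_PER_IMPERIAL_GALLON / KM_PER_MILE


# (L/100 km, mpg) pairs copied from the first five rows of df.head()
rows = [(8.5, 33), (9.6, 29), (5.9, 48), (11.1, 25), (10.6, 27)]
for l_per_100km, mpg in rows:
    # print the stored mpg value next to the converted, rounded value
    print(l_per_100km, mpg, round(l_per_100km_to_mpg(l_per_100km)))
```

Because the two columns carry the same information, plotting both against CO2 emissions will show the same relationship mirrored (higher L/100 km means lower mpg).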


Get information about the columns of the `DataFrame`

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7385 entries, 0 to 7384
Data columns (total 12 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Make                              7385 non-null   object 
 1   Model                             7385 non-null   object 
 2   Vehicle Class                     7385 non-null   object 
 3   Engine Size [L]                   7385 non-null   float64
 4   Cylinders                         7385 non-null   int64  
 5   Transmission                      7385 non-null   object 
 6   Fuel Type                         7385 non-null   object 
 7   Fuel Consumption City [L/100 km]  7385 non-null   float64
 8   Fuel Consumption Hwy [L/100 km]   7385 non-null   float64
 9   Fuel Consumption Comb [L/100 km]  7385 non-null   float64
 10  Fuel Consumption Comb [mpg]       7385 non-null   int64  
 11  CO2 Emissions [g/km]              7385 non-null   int64  
dtypes: float64(4), int64(3), object(5)

In [4]:
df.describe()

| | Engine Size [L] | Cylinders | Fuel Consumption City [L/100 km] | Fuel Consumption Hwy [L/100 km] | Fuel Consumption Comb [L/100 km] | Fuel Consumption Comb [mpg] | CO2 Emissions [g/km] |
|---|---|---|---|---|---|---|---|
| count | 7385.0 | 7385.0 | 7385.0 | 7385.0 | 7385.0 | 7385.0 | 7385.0 |
| mean | 3.160068 | 5.61503 | 12.556534 | 9.041706 | 10.975071 | 27.481652 | 250.584699 |
| std | 1.35417 | 1.828307 | 3.500274 | 2.224456 | 2.892506 | 7.231879 | 58.512679 |
| min | 0.9 | 3.0 | 4.2 | 4.0 | 4.1 | 11.0 | 96.0 |
| 25% | 2.0 | 4.0 | 10.1 | 7.5 | 8.9 | 22.0 | 208.0 |
| 50% | 3.0 | 6.0 | 12.1 | 8.7 | 10.6 | 27.0 | 246.0 |
| 75% | 3.7 | 6.0 | 14.6 | 10.2 | 12.6 | 32.0 | 288.0 |
| max | 8.4 | 16.0 | 30.6 | 20.6 | 26.1 | 69.0 | 522.0 |


<hr id="plotly">

<h2>2. Introduction to Plotly</h2>

Plotly is a data visualization library for Python that bridges the gap between static charts and interactive dashboards.  
It boasts a user-friendly API for quick exploration and a rich set of interactive features like hover text, zooming, and selections.  
While it might not offer the same level of customization as Matplotlib, Plotly excels in web-based applications and interactive storytelling, making it ideal for those who want to share their data insights beyond static images.  

In [5]:
# Scatter plot
fig = px.scatter(df, x='Engine Size [L]', y='CO2 Emissions [g/km]', color='Fuel Type')
fig.show()


In [6]:
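# add an ordinary least squares (OLS) trendline
# (trendline='ols' requires the statsmodels package to be installed)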
fig = px.scatter(df, x='Engine Size [L]', y='CO2 Emissions [g/km]', trendline='ols', trendline_color_override='red')
fig.show()
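
To make clear what `trendline='ols'` draws, the sketch below fits an ordinary least squares line by hand to the five rows from `df.head()`. It is an illustration only: Plotly delegates the real fit to statsmodels, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Engine size [L] and CO2 emissions [g/km] from the first five rows
engine_size = [2.0, 2.4, 1.5, 3.5, 3.5]
co2 = [196, 221, 136, 255, 244]

slope, intercept = fit_line(engine_size, co2)
print(f'fitted line: CO2 = {slope:.1f} * engine_size + {intercept:.1f}')
```

With only five points the coefficients are not meaningful for the full dataset; the point is that the red line in the chart is the line minimizing the squared vertical distances to the markers.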

In [7]:
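# Bar plot of the mean CO2 emissions per fuel type
# (passing the raw rows would stack one bar segment per vehicle and show the sum)
df_mean = df.groupby('Fuel Type', as_index=False)['CO2 Emissions [g/km]'].mean()
fig = px.bar(df_mean, x='Fuel Type', y='CO2 Emissions [g/km]')
fig.show()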


In [8]:
# Box plot
fig = px.box(df, x='Fuel Type', y='CO2 Emissions [g/km]')
fig.show()


In [9]:
# Histogram
fig = px.histogram(df, x='CO2 Emissions [g/km]')
fig.show()

In [10]:
# Pie chart
fig = px.pie(df, names='Fuel Type')
fig.show()


In [11]:
# 3d scatter plot
fig = px.scatter_3d(df, x='Engine Size [L]', y='Fuel Consumption Comb [L/100 km]', z='CO2 Emissions [g/km]', color='Fuel Type')
fig.update_traces(marker=dict(size=3))
fig.show()

In [12]:
# add an animation slider that steps through the fuel types
fig = px.scatter_3d(df, x='Engine Size [L]', y='Fuel Consumption Comb [L/100 km]', z='CO2 Emissions [g/km]', animation_frame='Fuel Type')
fig.show()

<hr id="html">

<h2>3. HTML & CSS Primer</h2>

<h4>HTML</h4>

HyperText Markup Language (HTML) is the backbone of web pages. It uses tags to structure and define website content, like headings, paragraphs, images, and links. These tags tell browsers how to display the information, allowing you to build the visual layout and interactive elements we experience online.

Open a blank file in Notepad or your IDE of choice, enter the following code, save the file with an `.html` extension, and open it in your browser to view the web page.

```
<!DOCTYPE html>
<html>
    <head>
        <title>Page Title</title>
    </head>

    <body>
        <h1>This is a Heading</h1>
        <p>This is a paragraph.</p>
        <div>This is a division/container.</div>
    </body>
</html>
```

<i>You can find a great HTML tutorial here <a href="https://www.w3schools.com/html/default.asp">W3Schools: HTML</a></i>

<h4>CSS</h4>

Cascading Style Sheets (CSS) is a declarative style sheet language for applying visual presentation to HTML documents. Acting as the stylist for HTML's structural content, CSS defines the visual appearance of webpage elements, such as font styles, colors, background images, borders, and layout properties.

Adjust the above HTML code to add some styling.

```
<!DOCTYPE html>
<html>
    <head>
        <title>Page Title</title>
    </head>

    <body>
        <h1 style="text-align: center;">This is a Heading</h1>
        <p style="color: red;">This is a paragraph.</p>
        <div style="background-color: lightblue; height: 500px; padding: 10px;">This is a division/container.</div>
    </body>
</html>
```

<i>You can find a great CSS tutorial here <a href="https://www.w3schools.com/css/default.asp">W3Schools: CSS</a></i>
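
Dash, introduced in the next section, describes inline styles like the ones above as Python dictionaries. As a bridge, the sketch below rebuilds the styled page from Python using only the standard library. The helper names `to_inline_css` and `build_page` and the output file name `styled_page.html` are illustrative, and this is not how Dash renders components internally.

```python
from pathlib import Path


def to_inline_css(style):
    """Turn a dict such as {'color': 'red'} into the string 'color: red;'."""
    return ' '.join(f'{prop}: {value};' for prop, value in style.items())


def build_page(title, heading, paragraph, container):
    """Return the HTML document from this section with inline styles applied.

    The text is inserted as-is; escape untrusted input with html.escape first.
    """
    h1_style = to_inline_css({'text-align': 'center'})
    p_style = to_inline_css({'color': 'red'})
    div_style = to_inline_css({
        'background-color': 'lightblue',
        'height': '500px',
        'padding': '10px',
    })
    return f"""<!DOCTYPE html>
<html>
    <head>
        <title>{title}</title>
    </head>

    <body>
        <h1 style="{h1_style}">{heading}</h1>
        <p style="{p_style}">{paragraph}</p>
        <div style="{div_style}">{container}</div>
    </body>
</html>
"""


page = build_page('Page Title', 'This is a Heading', 'This is a paragraph.',
                  'This is a division/container.')
# Write the page to the current directory; open the file in a browser to view it
Path('styled_page.html').write_text(page, encoding='utf-8')
```

Keeping styles in dictionaries makes it easy to reuse one style for several elements, which is the pattern the Dash examples below rely on.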

<hr id="dash">

<h2>4. Introduction to Dash</h2>

Dash is a Python framework that empowers you to build interactive web dashboards using the powerful data visualization capabilities of Plotly.

Dash empowers you to:
-   **Transform static data into dynamic stories:** Engage your audience with interactive visualizations that react to their curiosity.
-   **Create self-service dashboards:** Let users explore your data on their own terms, asking questions and discovering hidden trends.
-   **Share insights beyond images:** Present your data as interactive web applications, accessible to anyone with a browser.
-   **Bridge the gap between data and action:** Inspire informed decisions and actions by making your data truly accessible and engaging.

To create a Dash app, create a Python file that follows these main steps:
1. import the required modules
2. create/initialize a dash app
3. load and process (if needed) data
4. create app layout (HTML & CSS go here)
5. add necessary callbacks for interactivity
6. run the dash app when the python file is executed
7. save the file and execute the file using "`python file_name.py`"

For more details, visit <a href="https://dash.plotly.com/minimal-app">Dash</a>

In [None]:
# STEP 1: import libraries
import pandas as pd
from plotly import express as px
from dash import Dash, dcc, html
from dash.dependencies import Input, Output


# STEP 2: create a Dash app
app = Dash(__name__)


# STEP 3: load & process data
df = pd.read_csv("CO2_Emissions_Canada.csv")

# create a list of numerical features
features_num = df.select_dtypes(include=['int64', 'float64']).columns


# STEP 4: create a Dash layout that contains a Dropdown component
# and a Plotly graph
app.layout = html.Div([
    html.H1('CO2 Emissions in Canada'),
    html.Div([
        dcc.Dropdown(
            id='feature-num',
            options=[{'label': feature, 'value': feature} for feature in features_num],
            value='Engine Size [L]'
        ),
        dcc.Graph(id='plot-num')
    ])
])


# STEP 5: add a callback to update the graph
@app.callback(
    Output(component_id='plot-num', component_property='figure'),
    [
        Input(component_id='feature-num', component_property='value')
    ]
)
def update_graph(selected_feature):
    fig = px.scatter(df, x=selected_feature, y='CO2 Emissions [g/km]')
    return fig


# STEP 6: run the Dash app
# (app.run replaces app.run_server, which is deprecated in recent Dash releases)
if __name__ == '__main__':
    app.run(debug=True)

Let's apply some styling and add a second graph for the categorical features.

In [None]:
# import libraries
import pandas as pd
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
from plotly import express as px

# create a Dash app
app = Dash(__name__)

# load data
data = pd.read_csv("CO2_Emissions_Canada.csv")

# create a list of numerical features
features_num = data.select_dtypes(include=['int64', 'float64']).columns

# create a list of categorical features
features_cat = data.select_dtypes(include=['object']).columns


# create the layout components: a title and one card per feature type
html_title = html.H1("CO2 Emissions in Canada", style={
    'text-align': 'center', 
    'margin': '10px',
    'padding': '20px', 
    'background-color': 'white',
    'box-shadow': '0px 0px 5px 5px lightgrey',
    'border-radius': '10px',
})

html_card_num = html.Div([
    html.H3("Numerical Features"),
    dcc.Dropdown(
        id='feature-num',
        options=[{'label': i, 'value': i} for i in features_num],
        value='Engine Size [L]'
    ),
    dcc.Graph(id='plot-num')
], style={
    'background-color': 'white', 
    'padding': '10px', 
    'box-shadow': '0px 0px 5px 5px lightgrey',
    'border-radius': '10px',
    'flex-basis': '50%',
})

html_card_cat = html.Div([
    html.H3("Categorical Features"),
    dcc.Dropdown(
        id='feature-cat',
        options=[{'label': i, 'value': i} for i in features_cat],
        value='Fuel Type'
    ),
    dcc.Graph(id='plot-cat')
], style={
    'background-color': 'white', 
    'padding': '10px', 
    'box-shadow': '0px 0px 5px 5px lightgrey',
    'border-radius': '10px',
    'flex-basis': '50%',
})

# add the components to the app layout
app.layout = html.Div([
    # title
    html_title,

    # graphs container
    html.Div([
        # numerical features CARD
        html_card_num,

        # categorical features CARD
        html_card_cat,
    ], style={
        'padding': '10px',
        'display': 'flex',
        'justify-content': 'space-between',
        'align-items': 'stretch',
        'gap': '20px',
    }),
])


# add a callback to update the NUM graph
@app.callback(
    Output(component_id='plot-num', component_property='figure'),
    [Input(component_id='feature-num', component_property='value')]
)
def update_graph_num(selected_feature):
    fig = px.scatter(data, x=selected_feature, y='CO2 Emissions [g/km]')
    return fig



# add a callback to update the CAT graph
@app.callback(
    Output(component_id='plot-cat', component_property='figure'),
    [Input(component_id='feature-cat', component_property='value')]
)
def update_graph_cat(selected_feature):
    fig = px.box(data, x=selected_feature, y='CO2 Emissions [g/km]')
    return fig


# run the Dash app
# (app.run replaces app.run_server, which is deprecated in recent Dash releases)
if __name__ == '__main__':
    app.run(debug=True)

We can also load the final model with `joblib` and use it for predictions in the dashboard.

In [None]:
# import libraries
import pandas as pd
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
from plotly import express as px
import joblib

# create a Dash app
app = Dash(__name__)

# load data
data = pd.read_csv("CO2_Emissions_Canada.csv")

# load trained model
model = joblib.load('tree.joblib')

# create a list of numerical features
features_num = data.select_dtypes(include=['int64', 'float64']).columns

# create a list of categorical features
features_cat = data.select_dtypes(include=['object']).columns

# create a list of fuel types
fuel_types = [
    {'label': 'Diesel', 'value': 'D'}, 
    {'label': 'Ethanol', 'value': 'E'}, 
    {'label': 'Gasoline', 'value': 'X'}, 
    {'label': 'Premium Gasoline', 'value': 'Z'}
]


# create Dash layout and components
html_title = html.H1("CO2 Emissions in Canada", style={
    'text-align': 'center',
    'margin': '10px',
    'padding': '20px',
    'background-color': 'white',
    'box-shadow': '0px 0px 5px 5px lightgrey',
    'border-radius': '10px',
})

html_card_num = html.Div([
    html.H3("Numerical Features"),
    dcc.Dropdown(
        id='feature-num',
        options=[{'label': i, 'value': i} for i in features_num],
        value='Engine Size [L]'
    ),
    dcc.Graph(id='plot-num')
], style={
    'background-color': 'white', 
    'padding': '10px', 
    'box-shadow': '0px 0px 5px 5px lightgrey',
    'border-radius': '10px',
    'flex-basis': '50%',
})

html_card_cat = html.Div([
    html.H3("Categorical Features"),
    dcc.Dropdown(
        id='feature-cat',
        options=[{'label': i, 'value': i} for i in features_cat],
        value='Fuel Type'
    ),
    dcc.Graph(id='plot-cat')
], style={
    'background-color': 'white', 
    'padding': '10px', 
    'box-shadow': '0px 0px 5px 5px lightgrey',
    'border-radius': '10px',
    'flex-basis': '50%',
})

html_input_form = html.Div([
    html.H3("Predict CO2 Emissions"),
    html.Div([
        html.H5("Fuel Type"),
        dcc.Dropdown(
            id='fuel-type',
            options=fuel_types,
            value='Z'
        )
    ]),
    html.Div([
        html.H5("Engine Size [L]"),
        dcc.Input(id='engine-size', value=2.0, type='number', style={
            'width': 'calc(100% - 20px)',
            'padding': '10px',
            'border': '1px solid #cccccc',
            'outline': 'none',
            'border-radius': '4px',
        })
    ]),
    html.Div([
        html.H5("Fuel Consumption Comb [L/100 km]"),
        dcc.Input(id='fuel-consumption-comb', value=10, type='number', style={
            'width': 'calc(100% - 20px)',
            'padding': '10px',
            'border': '1px solid #cccccc',
            'outline': 'none',
            'border-radius': '4px',
        })
    ]),
], style={
    'padding': '10px',
    'margin-top': '20px',
    'flex-basis': '50%',
})

html_prediction = html.Div([
    html.Div([
        html.P("Prediction", style={'font-size': '20px'}),
        html.P(id='prediction', style={
            'font-size': '50px',
            'font-weight': 'bold',
            'text-align': 'center',
            'width': '100%',
            'color': '#636efa',
        })
    ], style={
        'padding': '20px',
        'font-size': '30px',
        'background-color': '#e5ecf6',
        'border-radius': '5px',
    })
], style={
    'display': 'flex',
    'justify-content': 'center',
    'align-items': 'center',
    'padding': '10px',
    'margin-top': '20px',
    'flex-basis': '50%',
})


# add components to the app layout
app.layout = html.Div([
    # title
    html_title,

    # graphs container
    html.Div([
        # numerical features CARD
        html_card_num,

        # categorical features CARD
        html_card_cat,
    ], style={
        'padding': '10px',
        'display': 'flex',
        'justify-content': 'space-between',
        'align-items': 'stretch',
        'gap': '20px',
    }),

    # prediction container
    html.Div([
        # input form
        html_input_form,

        # prediction
        html_prediction,
    ], style={
        'margin': '10px',
        'padding': '10px',
        'border-radius': '10px',
        'display': 'flex',
        'justify-content': 'space-between',
        'align-items': 'stretch',
        'box-shadow': '0px 0px 5px 5px lightgrey',
        'gap': '20px',
    }),
])


# add a callback to update the NUM graph
@app.callback(
    Output(component_id='plot-num', component_property='figure'),
    [Input(component_id='feature-num', component_property='value')]
)
def update_graph_num(selected_feature):
    fig = px.scatter(data, x=selected_feature, y='CO2 Emissions [g/km]')
    return fig



# add a callback to update the CAT graph
@app.callback(
    Output(component_id='plot-cat', component_property='figure'),
    [Input(component_id='feature-cat', component_property='value')]
)
def update_graph_cat(selected_feature):
    fig = px.box(data, x=selected_feature, y='CO2 Emissions [g/km]')
    return fig


# add a callback to update the prediction
@app.callback(
    Output(component_id='prediction', component_property='children'),
    [Input(component_id='fuel-type', component_property='value'),
     Input(component_id='engine-size', component_property='value'),
     Input(component_id='fuel-consumption-comb', component_property='value')]
)
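def update_prediction(fuel_type, engine_size, fuel_consumption_comb):
    # an input is None while its field is empty; skip the prediction in that case
    if None in (fuel_type, engine_size, fuel_consumption_comb):
        return 'N/A'
    # build a one-row DataFrame with a one-hot encoded fuel type
    # (the columns are assumed to match the features the model was trained on)
    features = pd.DataFrame({
        'Engine Size [L]': [engine_size],
        'Fuel Consumption Comb [L/100 km]': [fuel_consumption_comb],
        'Fuel Type_D': [1 if fuel_type == 'D' else 0],
        'Fuel Type_E': [1 if fuel_type == 'E' else 0],
        'Fuel Type_X': [1 if fuel_type == 'X' else 0],
        'Fuel Type_Z': [1 if fuel_type == 'Z' else 0],
    })
    pred = model.predict(features)[0]
    return f'{pred:.2f} g/km'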


# run the Dash app
# (app.run replaces app.run_server, which is deprecated in recent Dash releases)
if __name__ == '__main__':
    app.run(debug=True)
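
The prediction callback above one-hot encodes the selected fuel type by hand. The sketch below isolates that step as a plain function so it can be checked without starting the app. `build_feature_row` and `FUEL_CODES` are illustrative names, and the column names and their order are copied from the callback on the assumption that they match what the model in `tree.joblib` was trained on.

```python
FUEL_CODES = ['D', 'E', 'X', 'Z']  # fuel type codes offered in the dropdown


def build_feature_row(fuel_type, engine_size, fuel_consumption_comb):
    """Return one model input row as a dict, or None if an input is missing."""
    if None in (fuel_type, engine_size, fuel_consumption_comb):
        return None
    if fuel_type not in FUEL_CODES:
        raise ValueError(f'unknown fuel type: {fuel_type!r}')
    row = {
        'Engine Size [L]': engine_size,
        'Fuel Consumption Comb [L/100 km]': fuel_consumption_comb,
    }
    # one-hot encoding: exactly one 'Fuel Type_<code>' column is set to 1
    for code in FUEL_CODES:
        row[f'Fuel Type_{code}'] = 1 if fuel_type == code else 0
    return row


# Default values of the input form: premium gasoline, 2.0 L, 10 L/100 km
print(build_feature_row('Z', 2.0, 10))
```

Inside the callback, `pd.DataFrame([row])` would turn the dict into the one-row `DataFrame` passed to `model.predict`. Keeping the encoding in its own function also gives one place to update if the model is retrained with different columns.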

<h3>Other Dashboarding and Business Intelligence Tools</h3>

<ul>
    <li><a href="https://powerbi.microsoft.com/en-us/desktop/">Microsoft Power BI</a></li>
    <br>
    <li><a href="https://lookerstudio.google.com/">Google Looker Studio</a></li>
    <br>
    <li><a href="https://www.tableau.com/">Tableau</a></li>
</ul>

<hr id="report">

<h2>5. Project Report</h2>

The findings of the data science project can be presented in a report format for written reporting or presentation purposes.  
  
The report should contain the following sections:
1. **Title Page**  
   *the title, author name, and date*  

2. **Outline (table of contents)**  
   *the different sections of the report (with page numbers for printed report)*  

3. **Executive Summary**  
   *a summary/overview of the problem, methodology, findings, and conclusions*  

4. **Introduction**  
   *problem statement and background*  

5. **Methodology**  
   *description of the different data science project steps (data collection, cleaning, exploration, different models tested, etc.)*  

6. **Results**  
   *the findings with visualization charts, etc.*  

7. **Discussion**  
   *analysis of the findings*  

8. **Conclusion**  
   *drawn conclusions based on the findings*  
   
9. **Appendix**  
   *any supporting data, charts, etc. that were not used in the report but could be useful to review (if any)*


<hr style="margin-top: 4rem;">
<h2>Author</h2>

<a href="https://github.com/SamerHany">Samer Hany</a>

<h2>References</h2>
<a href="https://dash.plotly.com/minimal-app">Dash Tutorial</a>
<br>
<a href="https://plotly.com/examples/">Dash Examples</a>
<br>
<a href="https://www.kaggle.com/datasets/mrmorj/car-fuel-emissions">CO2 emissions dataset (kaggle.com)</a>