<table>
  <tr>
    <td>
      <img src="./pics/data-viz.png" alt="data-viz.png" width="2000"/>
    </td>
    <td>
      <h1>Introduction to Data Visualization in Python</h1>
    </td>
  </tr>
</table>

## Why Visualize Data?

- Understand and communicate insights

- Identify patterns and trends

- Make data-driven decisions

In [1]:
%%capture
from IPython.display import IFrame, HTML

# Elements of a Good Graph

## Key Components

- **Title**: Clearly describes the graph's purpose

- **Axis Labels**: Indicate the data and units on each axis

- **Legend**: Identifies data series

- **Gridlines**: Optional, but can enhance readability
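As a minimal sketch, each of the four components above corresponds to a matplotlib call. The data, series names, and units below are hypothetical and only illustrate the elements.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
months = ["Jan", "Feb", "Mar", "Apr", "May"]
product_a = [12, 15, 14, 18, 21]
product_b = [9, 11, 13, 12, 16]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(months, product_a, marker="o", label="Product A")
ax.plot(months, product_b, marker="s", label="Product B")

# Title: states what the graph shows
ax.set_title("Monthly Units Sold (hypothetical data)")
# Axis labels: name the variable and its unit
ax.set_xlabel("Month")
ax.set_ylabel("Units sold (thousands)")
# Legend: identifies each data series
ax.legend(title="Series")
# Gridlines: optional, kept light so they do not compete with the data
ax.grid(True, color="lightgray", linewidth=0.8)

plt.show()
```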

## Design Principles

- Clarity: Avoid clutter and distractions

- Accuracy: Ensure scales and labels reflect data accurately

- Simplicity: Highlight the main story without unnecessary elements

# Examples of Bad Data Visualizations

<table>
  <tr>
    <td>
      <img src="./pics/bad-3d-bar.png" alt="3D bar chart with overlapping bars" width="2000"/>
    </td>
    <td>
      <h1>A 3D bar chart gone wrong</h1>
    </td>
  </tr>
</table>

### Critique of 3D Bar Chart

- **3D Effects**: 
  - Distorts data, making it hard to compare bar heights accurately.
  - **Solution**: Use 2D to avoid distortion and make comparisons easier.

- **Overlapping Bars**: 
  - Bars overlap, increasing cognitive load to separate them visually.
  - **Solution**: Use side-by-side bars in 2D.

- **Color Choices**: 
  - Colors lack clear meaning, potentially distracting viewers.
  - **Solution**: Use colors with purpose or a monochromatic scheme.

- **Lack of Clear Labels**: 
  - Missing clear axis titles and legends for color groups.
  - **Solution**: Add descriptive axis labels and a legend if needed.

- **Complex Gridlines**: 
  - Cluttered, making it hard to focus on data.
  - **Solution**: Reduce gridlines or make them subtler.

- **Tilted Labels**: 
  - X-axis labels are hard to read.
  - **Solution**: Use horizontal or abbreviated labels.

- **Unclear Grouping**: 
  - No clear explanation of “Dose Quartile” and “Hematocrit Group.”
  - **Solution**: Clarify grouping with annotations or labels.

### Recommendation
Switch to a 2D grouped bar chart with clear labels, simplified colors, and reduced clutter for better readability.

<center><img src="./pics/bad-3d-bar-corrected.png" alt="Corrected 2D grouped bar chart" width="1000"/></center>
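The recommended 2D grouped bar chart can be sketched in matplotlib by offsetting each group's bars within a shared x position so they sit side by side. The values and the three hematocrit group names below are placeholders, not the data behind the original chart.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values for illustration; the original chart's data is not reproduced here
quartiles = ["Q1", "Q2", "Q3", "Q4"]          # dose quartiles (x-axis groups)
groups = {                                      # hypothetical hematocrit groups
    "Low hematocrit": [4.0, 5.5, 6.1, 7.8],
    "Normal hematocrit": [3.2, 4.1, 5.0, 6.3],
    "High hematocrit": [2.5, 3.0, 3.9, 4.4],
}

x = np.arange(len(quartiles))       # one slot per quartile
width = 0.8 / len(groups)           # split 80% of each slot among the groups

fig, ax = plt.subplots(figsize=(9, 5))
for i, (name, values) in enumerate(groups.items()):
    # Shift each group so bars sit next to each other instead of overlapping
    offset = (i - (len(groups) - 1) / 2) * width
    ax.bar(x + offset, values, width, label=name)

ax.set_xticks(x)
ax.set_xticklabels(quartiles)       # short horizontal labels, no tilt needed
ax.set_xlabel("Dose quartile")
ax.set_ylabel("Mean value (placeholder units)")
ax.set_title("Mean Value by Dose Quartile and Hematocrit Group (placeholder data)")
ax.legend(title="Hematocrit group")
ax.grid(axis="y", color="lightgray", linewidth=0.8)
ax.set_axisbelow(True)              # draw gridlines behind the bars
plt.show()
```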

<table>
  <tr>
    <td>
      <img src="./pics/bad-pie-chart.png" alt="Donut chart of player scores" width="2000"/>
    </td>
    <td>
      <h1>A pie chart that should have been a bar chart</h1>
    </td>
  </tr>
</table>

### Critique of Donut Chart

- **Comparison Difficulty**: 
  - Circular format makes it hard to compare segment sizes accurately.
  - **Solution**: Use a bar chart for easier comparison.

- **Color Usage**:
  - Colors lack a clear scheme and some are too similar.
  - **Solution**: Use a consistent color scheme or colorblind-friendly palette.

- **Misleading Circular Format**:
  - Donut charts imply parts of a whole, which isn’t the case here.
  - **Solution**: Use a bar chart to display individual values.

- **Text Readability**:
  - Text size and contrast vary, making it hard to read.
  - **Solution**: Increase font size and improve contrast.

- **Lack of Order**:
  - No clear ordering of segments by value.
  - **Solution**: Order segments by descending value for clarity.

### Recommendation
Switch to a bar chart with consistent colors, clearer labels, and value ordering for better readability and comparison.

<table>
  <tr>
    <td width="60%">
      <img src="./pics/bad-pie-chart-corrected.png" alt="Corrected horizontal bar chart of player scores" width="100%">
    </td>
    <td width="40%">
      <h3>Critique & Improvements</h3>
      <ul style="font-size: 1em;">
        <li style="font-size: 1em;"><strong>Chart Type</strong>: Used a horizontal bar chart, which allows for easier comparison of player scores.</li>
        <li style="font-size: 1em;"><strong>Color Scheme</strong>: Selected a colorblind-friendly palette, ensuring accessibility.</li>
        <li style="font-size: 1em;"><strong>Ordering</strong>: Sorted bars in descending order, with the highest score at the top, making it easy to identify top performers.</li>
        <li style="font-size: 1em;"><strong>Readability</strong>: Displayed score labels at the end of each bar, enhancing readability without clutter.</li>
      </ul>
      <h3>Benefits</h3>
      <p style="font-size: 1em;">This design improves clarity, accessibility, and makes comparisons straightforward for viewers.</p>
    </td>
  </tr>
</table>
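A sketch of the corrected approach in matplotlib: sort the values, draw horizontal bars, and label each bar with its value. The player names and scores are hypothetical, and the single Okabe-Ito blue (`#0072B2`) is one possible colorblind-safe choice, not necessarily the palette used in the image above. `Axes.bar_label` and list indexing of `ax.spines` assume matplotlib 3.4 or later.

```python
import matplotlib.pyplot as plt

# Hypothetical player scores, for illustration only
scores = {"Player A": 42, "Player B": 67, "Player C": 55, "Player D": 31, "Player E": 73}

# Sort in descending order so the ranking is obvious
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ranked]
values = [value for _, value in ranked]

fig, ax = plt.subplots(figsize=(8, 4))
# One colorblind-safe hue: color carries no extra meaning here
bars = ax.barh(names, values, color="#0072B2")
ax.invert_yaxis()                       # first (highest) entry at the top
ax.bar_label(bars, padding=3)           # value at the end of each bar
ax.set_xlabel("Score")
ax.set_title("Player Scores (hypothetical data)")
ax.spines[["top", "right"]].set_visible(False)   # trim non-data ink
plt.show()
```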

<table>
  <tr>
    <td>
      <img src="./pics/bad-line-chart.png" alt="Continuous line chart of annual market share" width="2000"/>
    </td>
    <td>
      <h1>A continuous line chart used to show discrete data</h1>
    </td>
  </tr>
</table>

### Critique of Line Chart

- **Discrete Data Misrepresented as Continuous**:
  - Continuous lines imply trends between data points, which misrepresents annual, discrete data.
  - **Solution**: Use a step chart or discrete markers.

- **Line Style and Color**:
  - Dashed lines with low-contrast colors may confuse viewers, especially colorblind readers.
  - **Solution**: Use solid lines with a colorblind-friendly palette (e.g., blue and orange).

- **Lack of Data Labels**:
  - Difficult to determine exact values for each year.
  - **Solution**: Add data labels or grid lines for easier value estimation.

- **Biased Title**:
  - Title is styled as a slogan rather than a neutral description.
  - **Solution**: Use a factual title like “Market Share (%) of Britannia vs. Competitor Over Time.”

- **Excessive Background Styling**:
  - Decorative elements distract from the data.
  - **Solution**: Remove background elements for a cleaner look.

### Recommendation
Use a step chart or discrete markers, solid lines in high-contrast colors, a clear title, and minimalistic styling for clarity and accurate representation.

<table>
  <tr>
    <td width="50%">
      <img src="./pics/bad-line-chart-corrected.png" alt="Corrected market share chart" width="2000"/>
    </td>
    <td width="50%">
      <img src="./pics/bad-line-chart-better.png" alt="Improved market share chart" width="2000"/>
    </td>
  </tr>
</table>
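Both recommended alternatives can be drawn with matplotlib: `Axes.step` holds each annual value flat instead of interpolating between years, and a marker-only plot avoids connecting lines altogether. The market-share figures are hypothetical; the blue/orange pair is taken from the Okabe-Ito colorblind-safe palette.

```python
import matplotlib.pyplot as plt

# Hypothetical annual market-share values (%), for illustration only
years = [2016, 2017, 2018, 2019, 2020, 2021]
britannia = [32, 34, 33, 36, 38, 37]
competitor = [28, 27, 29, 28, 26, 27]

fig, (ax_step, ax_dots) = plt.subplots(1, 2, figsize=(12, 4.5), sharey=True)

# Option 1: step chart, each year's value is held flat across that year
ax_step.step(years, britannia, where="mid", color="#0072B2", linewidth=2, label="Britannia")
ax_step.step(years, competitor, where="mid", color="#E69F00", linewidth=2, label="Competitor")
ax_step.set_title("Step chart")

# Option 2: markers only, no line implying values between years
ax_dots.plot(years, britannia, "o", color="#0072B2", markersize=8, label="Britannia")
ax_dots.plot(years, competitor, "s", color="#E69F00", markersize=8, label="Competitor")
ax_dots.set_title("Discrete markers")

for ax in (ax_step, ax_dots):
    ax.set_xticks(years)                # one tick per observed year
    ax.set_xlabel("Year")
    ax.grid(axis="y", color="lightgray", linewidth=0.8)
ax_step.set_ylabel("Market share (%)")
ax_step.legend()
fig.suptitle("Market Share (%) of Britannia vs. Competitor Over Time (hypothetical data)")
plt.show()
```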

<table>
  <tr>
    <td>
      <img src="./pics/bad-geo-visual.png" alt="Map showing national population percentages" width="2000"/>
    </td>
    <td>
      <h1>A misleading geography visual</h1>
    </td>
  </tr>
</table>

### Critique of Map Visualization

- **Misleading Use of Map**:
  - Map implies geographic variation that isn’t present in national population percentages.
  - **Solution**: Use a stacked bar chart or line chart.

- **Ineffective Color Scheme**:
  - Bold colors may not be colorblind-friendly; "Asian" and "Other" are hard to distinguish.
  - **Solution**: Use a colorblind-friendly palette with clear contrasts.

- **Overwhelming Red Color**:
  - Strong red for "White" category draws excessive attention.
  - **Solution**: Choose a balanced color scheme to avoid emphasis on one group.

- **Cluttered Labels**:
  - Labels are hard to read, especially where they overlap.
  - **Solution**: Separate labels clearly or use a data table for percentages.

- **Lack of Context**:
  - Title is vague and doesn’t provide sufficient context for data.
  - **Solution**: Add a descriptive title and subtitle explaining data scope.

### Recommendation
Switch to a stacked bar chart or line chart with balanced colors, clear labels, and a descriptive title for a more accurate and readable presentation.

<center><img src="./pics/bad-geo-visual-corrected.png" alt="Corrected population percentage chart" width="80%"/></center>
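A 100% stacked bar chart can be sketched by drawing each group's bars on top of the running total through the `bottom` argument. The years, group shares, and palette below are assumptions for illustration, not the figures in the original map. Segments under 5% are left unlabeled to avoid the overlapping labels criticized above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical national population shares (%) by group; each year sums to 100
years = ["2000", "2010", "2020"]
shares = {
    "White": [70, 65, 60],
    "Hispanic": [12, 16, 19],
    "Black": [12, 12, 12],
    "Asian": [4, 5, 6],
    "Other": [2, 2, 3],
}
# Okabe-Ito colors: distinguishable under common color-vision deficiencies, no dominant red
colors = ["#0072B2", "#E69F00", "#009E73", "#CC79A7", "#999999"]

fig, ax = plt.subplots(figsize=(8, 5))
bottom = np.zeros(len(years))
for (group, values), color in zip(shares.items(), colors):
    bars = ax.bar(years, values, bottom=bottom, color=color, label=group)
    # Label each segment with its percentage, skipping very small segments
    ax.bar_label(bars, labels=[f"{v}%" if v >= 5 else "" for v in values],
                 label_type="center", fontsize=9)
    bottom = bottom + np.asarray(values)   # new running total for the next group

ax.set_ylim(0, 100)
ax.set_ylabel("Share of national population (%)")
ax.set_title("Population Share by Group (hypothetical data)")
ax.legend(title="Group", bbox_to_anchor=(1.02, 1), loc="upper left")
fig.tight_layout()
plt.show()
```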

<table>
  <tr>
    <td>
      <img src="./pics/bad-wtf.png" alt="MLS player salary visualization" width="2000"/>
    </td>
    <td>
      <h1>What is this?</h1>
    </td>
  </tr>
</table>

### Critique of MLS Salary Visualization

- **Overwhelming Colors**:
  - Too many colors make it visually overwhelming.
  - **Solution**: Simplify color scheme or group by role/salary range.

- **Inconsistent Layout**:
  - Bar lengths vary, making it hard to compare teams.
  - **Solution**: Use a consistent scale for bars.

- **Lack of Data Labels**:
  - Small blocks lack player details, making interpretation difficult.
  - **Solution**: Add labels or interactive tooltips for player name and salary.

- **Tiny Legend**:
  - Legend is small and hard to read.
  - **Solution**: Enlarge legend with clearer labels.

- **Excessive Text on Right**:
  - Title and text crowd the visualization area.
  - **Solution**: Move title and explanation above chart.

- **Hard to Compare Across Teams**:
  - Difficult to understand team salary distribution.
  - **Solution**: Group players by salary range or create a separate summary chart.

### Recommendation
Streamline the design with a simpler color scheme, clearer labels, consistent scales, and more space for data to enhance readability and comparison.

<table>
  <tr>
    <td width="80%">
      <img src="./pics/bad-wtf-corrected.png" alt="Corrected MLS player salary chart" width="100%">
    </td>
    <td width="20%">
      <h3>Critique & Improvements</h3>
      <ul style="font-size: 1em;">
        <li style="font-size: 1em;"><strong>Individual Player Representation</strong>: Each segment represents an individual player's salary, maintaining the original intent to display salaries within each team.</li>
        <li style="font-size: 1em;"><strong>Color Scheme</strong>: Used a limited, colorblind-friendly palette to distinguish players without overwhelming the viewer.</li>
        <li style="font-size: 1em;"><strong>Consistent Scale Across Teams</strong>: Each team’s bar has the same scale, making comparisons across teams easier.</li>
        <li style="font-size: 1em;"><strong>Simplified Legend</strong>: The legend is clear and concise, improving readability and focusing on essential information.</li>
      </ul>
      <h3>Benefits</h3>
      <p style="font-size: 1em;">This design enhances clarity, accessibility, and facilitates straightforward comparisons of player salaries across MLS teams.</p>
    </td>
  </tr>
</table>
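One way to sketch the corrected layout: one horizontal bar per team on a single shared x-axis, one segment per player, and alternating tones of one hue instead of many unrelated colors. The teams and salaries are hypothetical, not actual MLS figures; each team's total is written at the end of its bar.

```python
import matplotlib.pyplot as plt

# Hypothetical salaries (USD millions) per player, grouped by team; not real MLS data
team_salaries = {
    "Team A": [6.0, 2.5, 1.2, 0.9, 0.6],
    "Team B": [3.1, 2.8, 1.5, 1.0],
    "Team C": [1.8, 1.6, 1.1, 0.9, 0.8, 0.5],
}
# Two alternating tones of one hue separate adjacent players without adding meaning
tones = ["#0072B2", "#56B4E9"]
teams = list(team_salaries)

fig, ax = plt.subplots(figsize=(9, 3.5))
for row, team in enumerate(teams):
    left = 0.0
    for i, salary in enumerate(sorted(team_salaries[team], reverse=True)):
        # One segment per player, largest salary first
        ax.barh(row, salary, left=left, color=tones[i % 2], edgecolor="white")
        left += salary
    ax.text(left + 0.1, row, f"{left:.1f} total", va="center")

ax.set_yticks(range(len(teams)))
ax.set_yticklabels(teams)
ax.invert_yaxis()
ax.set_xlabel("Total salary (USD millions); each segment is one player")
ax.set_title("Player Salaries by Team (hypothetical data)")
# All teams share this one x-axis; extra room keeps the total labels inside
ax.set_xlim(0, max(sum(s) for s in team_salaries.values()) * 1.3)
plt.show()
```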

<table>
  <tr>
    <td>
      <img src="./pics/bad-bubble-graph.png" alt="Bubble chart" width="2000"/>
    </td>
    <td>
      <h1>A bubble chart gone wrong</h1>
    </td>
  </tr>
</table>

# More bad viz

In [2]:
url = 'https://viz.wtf/'
IFrame(url, width="100%", height="700")

# Bad viz or lying with graphs?

<h1>Truncated Y-Axis</h1>
<table>
  <tr>
    <td width="33%">
      <img src="./pics/blog-misleading-y-axis.avif" alt="Misleading y-axis chart - source: https://www.heap.io/blog/how-to-lie-with-data-visualization" width="100%">
    </td>
    <td width="33%">
      <img src="./pics/blog-misleading-tax-rate-graph.avif" alt="Misleading y-axis chart 2" width="100%">
    </td>
    <td width="33%">
      <img src="./pics/blog-misleading-baseball-graph.avif" alt="Misleading y-axis chart 3" width="100%">
    </td>
  </tr>
</table>
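The effect of a truncated axis is easy to reproduce. The sketch below plots the same two hypothetical values with the y-axis starting at 49 and at 0: with the truncated axis, the visible bar heights differ by a factor of 3.5 even though the values differ by only 5%.

```python
import matplotlib.pyplot as plt

# Hypothetical values that differ by only 5%
labels = ["Option A", "Option B"]
values = [50.0, 52.5]

fig, (ax_trunc, ax_full) = plt.subplots(1, 2, figsize=(10, 4))

# Truncated axis: starting at 49 exaggerates the gap between the bars
ax_trunc.bar(labels, values, color="#999999")
ax_trunc.set_ylim(49, 53)
ax_trunc.set_title("Truncated y-axis (misleading)")

# Zero baseline: bar length is proportional to the value, as bar charts require
ax_full.bar(labels, values, color="#0072B2")
ax_full.set_ylim(bottom=0)
ax_full.set_title("Zero baseline")

for ax in (ax_trunc, ax_full):
    ax.set_ylabel("Value (hypothetical units)")
plt.show()
```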

<h1>Cumulative Graphs</h1>
<table>
  <tr>
    <td width="50%">
      <img src="./pics/blog-cumulative-annual-revenue.avif" alt="Cumulative Chart" width="100%">
    </td>
    <td width="50%">
      <img src="./pics/blog-misleading-annual-revenue-graph.avif" alt="Changes Chart" width="100%">
    </td>
  </tr>
</table>
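A cumulative total hides changes in direction because it keeps rising as long as each period's value is positive. This sketch plots hypothetical quarterly revenue both ways, using `numpy.cumsum` for the running total.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical quarterly revenue (USD millions): growth stalls and then reverses
quarters = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
revenue = np.array([10, 12, 13, 11, 8, 6])

# Running total: keeps increasing while every quarter's revenue is positive
cumulative = np.cumsum(revenue)

fig, (ax_cum, ax_per) = plt.subplots(1, 2, figsize=(11, 4))
ax_cum.plot(quarters, cumulative, marker="o", color="#999999")
ax_cum.set_title("Cumulative revenue (hides the decline)")
ax_per.bar(quarters, revenue, color="#0072B2")
ax_per.set_title("Revenue per quarter (shows the decline)")
for ax in (ax_cum, ax_per):
    ax.set_ylabel("USD millions (hypothetical)")
    ax.set_ylim(bottom=0)
plt.show()
```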

<h1>Ignoring Conventions</h1>
<table>
  <tr>
    <td width="50%">
      <img src="./pics/blog-misleading-presidential-results.avif" alt="Conventions Chart" width="100%">
    </td>
    <td width="50%">
      <img src="./pics/blog-misleading-gun-deaths-graph.avif" alt="Conventions Chart" width="100%">
    </td>
  </tr>
</table>
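Inverting the y-axis is one way of ignoring a convention: readers expect larger values to plot higher. The sketch below draws the same hypothetical rising series with matplotlib's `Axes.invert_yaxis` and with the default orientation.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly counts that rise over time
years = [2004, 2005, 2006, 2007, 2008]
counts = [520, 530, 610, 740, 720]

fig, (ax_flip, ax_std) = plt.subplots(1, 2, figsize=(11, 4))

# Inverted axis: larger values plot lower, so an increase reads as a decrease
ax_flip.plot(years, counts, marker="o", color="#999999")
ax_flip.invert_yaxis()
ax_flip.set_title("Inverted y-axis (reads as a decline)")

# Conventional axis: larger values plot higher
ax_std.plot(years, counts, marker="o", color="#0072B2")
ax_std.set_title("Conventional y-axis (shows the increase)")

for ax in (ax_flip, ax_std):
    ax.set_xticks(years)
    ax.set_xlabel("Year")
    ax.set_ylabel("Count (hypothetical)")
plt.show()
```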

# Further Examples
- [How to Spot Visualization Lies](https://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/)
- [How charts lie](https://www.youtube.com/watch?v=oX74Nge8Wkw)
- [Lies, Distortions, and Misrepresentations in Visualization](https://www.youtube.com/watch?v=IFA-3uXEcb0)
- [How People Actually Lie With Charts](https://vdl.sci.utah.edu/blog/2023/04/17/misleading/)
- [VisLies](https://www.vislies.org/)

# Examples of Good Data Visualizations

In [3]:
IFrame("https://informationisbeautiful.net/", width="100%", height="700")

# NYTimes

In [4]:
IFrame("https://www.nytimes.com/newsgraphics/2014/01/05/poverty-map/index.html", width="100%", height="700")

# The Economist

In [5]:
from IPython.display import Image
Image(filename="./pics/economist.jpg")

# Tools and Advice for Data Visualization

- **[Data to Viz](https://www.data-to-viz.com/)**: A comprehensive guide that helps users choose the most effective chart type based on their dataset's structure. It provides examples, best practices, and explanations for a wide range of visualizations.

In [27]:
IFrame("https://www.data-to-viz.com/", width="100%", height="700")

- **[Data Viz Project](https://datavizproject.com/)**: A catalog of data visualization types, each categorized and explained in detail. It offers inspiration and guidance on choosing the right visualization for a given data type and intended message.

In [7]:
IFrame("https://datavizproject.com/", width="100%", height="700")

- **[Datawrapper](https://www.datawrapper.de/)**: A user-friendly online tool for creating professional charts, maps, and tables. Datawrapper is popular among data journalists and anyone who wants to create clean, interactive visualizations without coding.

In [8]:
IFrame("https://www.datawrapper.de/", width="100%", height="700")

- **[Python Graph Gallery](https://python-graph-gallery.com/)**: A resource for creating visualizations in Python, with code examples and tutorials for various chart types, including bar charts, line charts, and advanced data visualizations.

In [9]:
IFrame("https://python-graph-gallery.com/", width="100%", height="700")

# Tools for Visualization in Python

- **[matplotlib](https://matplotlib.org/)**: The foundational Python library for creating static, animated, and interactive visualizations.

In [10]:
# Display an iframe with the specified URL
IFrame("https://matplotlib.org/", width="100%", height="700")
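The cells in this notebook use matplotlib's `pyplot` interface. matplotlib also has an explicit, object-oriented interface in which you create `Figure` and `Axes` objects and call methods on them. A minimal sketch, reusing the sample data from the simple visualization section:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]

# Explicit interface: create the Figure and its Axes, then call methods on the Axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

ax_line.plot(x, y, marker="o")
ax_line.set(title="Line", xlabel="X Axis", ylabel="Y Axis")   # set several properties at once

ax_bar.bar(x, y)
ax_bar.set(title="Bar", xlabel="X Axis")

fig.suptitle("Same data, two chart types")
fig.tight_layout()
plt.show()
```

Because every call targets a named `Axes`, this style scales more predictably than `pyplot` to figures with several panels.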

- **[pandas plotting](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html)**: The pandas library includes simple plotting capabilities built on matplotlib, making it easy to create basic visualizations directly from DataFrames.

In [11]:
# Pandas Plotting
IFrame("https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#basic-plotting-plot", width="100%", height="700")
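As a minimal sketch of pandas plotting, `DataFrame.plot` draws directly from columns and returns a matplotlib `Axes` that can be customized further. The DataFrame and its column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small hypothetical DataFrame; column names are illustrative
df = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "sales": [120, 95, 140, 160],
    "costs": [80, 85, 90, 100],
})

# DataFrame.plot wraps matplotlib and returns a matplotlib Axes
ax = df.plot(x="year", y=["sales", "costs"], kind="line", marker="o",
             title="Sales and Costs (hypothetical data)")
ax.set_ylabel("USD thousands (hypothetical)")

# The same columns as grouped bars, with horizontal tick labels
ax_bar = df.plot.bar(x="year", y=["sales", "costs"], rot=0)
ax_bar.set_ylabel("USD thousands (hypothetical)")
plt.show()
```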

- **[GeoPandas](https://geopandas.org/)**: Extends pandas to work with geospatial data and provides easy-to-use tools for plotting maps and geospatial data.

In [12]:
# Geopandas
IFrame("https://geopandas.org/en/stable/getting_started/introduction.html#Making-maps", width="100%", height="700")

- **[seaborn](https://seaborn.pydata.org/)**: Built on top of matplotlib, seaborn provides a high-level interface for drawing attractive statistical graphics.

In [13]:
# Seaborn
IFrame("https://seaborn.pydata.org/", width="100%", height="700")

- **[geoplot](https://residentmario.github.io/geoplot/)**: A high-level Python geospatial plotting library, designed to work with geopandas and matplotlib. It simplifies the creation of thematic maps and allows for various types of geographic data visualization, such as choropleths, cartograms, and point maps.

In [14]:
# Geoplot
IFrame("https://residentmario.github.io/geoplot/", width="100%", height="700")

- **[Plotly](https://plotly.com/python/)**: A versatile library for creating interactive and web-based plots. Plotly Express, a part of Plotly, offers a simple syntax for creating complex visualizations quickly.

In [15]:
# Plotly
IFrame("https://plotly.com/python/", width="100%", height="700")

- **[lets-plot](https://lets-plot.org/)**: A plotting library developed by JetBrains that brings a ggplot2-like API to Python. It supports interactive and static visualizations and is particularly suited to data scientists familiar with the grammar of graphics approach.

In [16]:
# Lets-Plot
IFrame("https://lets-plot.org/", width="100%", height="700")

- **[plotnine](https://plotnine.readthedocs.io/)**: A Python library that replicates the ggplot2 syntax from R. Based on the grammar of graphics, plotnine makes it easy to create complex and layered visualizations, especially for users with an R or ggplot2 background.

In [17]:
# Plotnine
IFrame("https://plotnine.readthedocs.io/", width="100%", height="700")

- **[Bokeh](https://bokeh.org/)**: Designed for creating interactive visualizations that are ideal for web applications, Bokeh allows detailed control over both visual presentation and interactivity.

In [18]:
# Bokeh
IFrame("https://bokeh.org/", width="100%", height="700")

- **[Altair](https://altair-viz.github.io/)**: A declarative statistical visualization library based on the Vega and Vega-Lite visualization grammars, known for its simplicity and effectiveness for creating statistical plots.

In [19]:
# Altair
IFrame("https://altair-viz.github.io/", width="100%", height="700")

- **[HoloViews](https://holoviews.org/)**: A high-level library for interactive data exploration that renders through Bokeh, Matplotlib, or Plotly backends. It allows users to create complex visualizations with minimal code.

In [20]:
# Holoviews
IFrame("https://holoviews.org/", width="100%", height="700")

- **[Dash](https://dash.plotly.com/)**: A framework by Plotly for building interactive web applications with Python. Dash is especially useful for creating interactive, data-driven web apps with complex visualizations.

In [21]:
# Dash
IFrame("https://dash.plotly.com/", width="100%", height="700")

# Creating a Simple Visualization in Python

In [22]:
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]

# Plotting a basic line chart
plt.plot(x, y, marker='o')
plt.title("Sample Line Chart")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()

In [23]:
# Enhanced line chart with gray background and red line
import seaborn as sns
import matplotlib.pyplot as plt

# Apply the "darkgrid" style (gray background) and the "talk" context (larger text) in one call;
# each sns.set()/sns.set_theme() call resets style, context, and palette to defaults unless passed again
sns.set_theme(style="darkgrid", context="talk")

# Create the plot with a red line and circular markers (reuses x and y from the previous cell)
plt.figure(figsize=(10, 6))
sns.lineplot(x=x, y=y, marker='o', color='red', linewidth=2.5)

# Customizing title and axis labels
plt.title("Sample Line Chart", fontsize=20, fontweight='bold')
plt.xlabel("X Axis", fontsize=16, fontweight='bold')
plt.ylabel("Y Axis", fontsize=16, fontweight='bold')

# Make tick labels smaller than the axis labels
plt.tick_params(axis='both', labelsize=12)

# Thicker white grid lines stay visible on the gray background
plt.grid(True, linestyle='-', linewidth=1.2, color='white', alpha=0.6)

# Show the plot
plt.show()

In [24]:
import plotly.graph_objects as go

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]

# Create the line chart using Plotly
fig = go.Figure()

# Add trace for the line with markers
fig.add_trace(go.Scatter(
    x=x, y=y, mode='lines+markers', marker=dict(color='red', size=8), line=dict(color='red', width=2.5)
))

# Customize layout: white background, light gray grid lines, and a landscape size
# (the 1170 x 827 base keeps the 11.7 x 8.27 proportions used in the seaborn example)
fig.update_layout(
    title=dict(text="Sample Line Chart", font=dict(size=20, family='Arial', color='black')),
    # Axis titles use title=dict(text=..., font=...); the older "titlefont" key is deprecated
    xaxis=dict(title=dict(text="X Axis", font=dict(size=16, color='black')), tickfont=dict(size=12, color='black')),
    yaxis=dict(title=dict(text="Y Axis", font=dict(size=16, color='black')), tickfont=dict(size=12, color='black')),
    plot_bgcolor='white',
    paper_bgcolor='white',
    xaxis_showgrid=True,
    yaxis_showgrid=True,
    xaxis_gridcolor='lightgray',
    yaxis_gridcolor='lightgray',
    width=int(1170 * 0.7),   # about 819 px wide
    height=int(827 * 0.7)    # about 578 px tall (landscape, not square)
)

# Display the plot
fig.show()

In [25]:
from lets_plot import *
import pandas as pd

# Initialize Lets-Plot for display
LetsPlot.setup_html()

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]
data = pd.DataFrame({'x': x, 'y': y})

# Scale a 1170 x 827 px base size by 0.7, the same size used in the Plotly example
width = int(1170 * 0.7)
height = int(827 * 0.7)

# Create the plot with labeled axes
p = (ggplot(data, aes(x='x', y='y'))
     + geom_line(color='red', size=1.5)
     + geom_point(color='red', size=3)
     + ggtitle("Sample Line Chart")
     + xlab("X Axis")   # Label for X-axis
     + ylab("Y Axis")   # Label for Y-axis
     + theme(
         plot_background=element_rect(fill='white'),       # Set overall background to white
         panel_background=element_rect(fill='white'),      # Background behind the plot area
         panel_grid_major=element_line(color='lightgray', size=0.7),  # Light gray grid lines
         plot_title=element_text(size=20, face='bold', color='black'), # Title styling
         axis_title_x=element_text(size=16, face='bold', color='black'), # X-axis label styling
         axis_title_y=element_text(size=16, face='bold', color='black'), # Y-axis label styling
         axis_text=element_text(size=12, color='black')    # Axis tick labels
     )
     + ggsize(width, height)  # Adjust the size of the plot
)

# Display the plot
p.show()
