# Introduction

## Choose wisely

- **Identify the three KWYs**: Learn to pinpoint the key ideas you're trying to communicate with your data.
- **Chart Categories**: Understand the seven chart categories and when to use each.
- **Alternatives to Standard Charts**: Learn when to use alternatives to standard bar charts.
- **Advantages and Disadvantages of Donut Charts**: Analyze the pros and cons of using donut charts for data display.
- **Effective Charts for Correlation**: Identify the most effective charts to show the correlation between variables.
- **Cumulative Charts**: Learn when it is most effective to use cumulative charts.


## What you should know

- **No Specific Skills Required**: You don't need any specific technical skills or deep data analysis knowledge to take this course.
- **Focus on Communication**: The course is designed for people who need to communicate data effectively, not just for data scientists or statisticians.
- **Benefit for Data-Centric Roles**: Even if you are in a data-centric role, you'll learn to think like a communicator when choosing visualizations.


# 1. Getting to KWYRWTS

## What are KWYs?

- **KWYDIS**: Know what your data is saying. Understand the facts and the message your data conveys.
- **KWYANTH**: Know what your audience needs to hear. Tailor your communication to the audience's knowledge level and needs.
- **KWYRWTS**: Know what you really want to say. Focus on the most important point you want to communicate to avoid confusing or boring your audience.

## Getting to KWYs

**KWYs**: These are the key things you need to focus on when creating a data visualization. They include:

- **What your data is saying**: Understand the main message your data is trying to communicate.
- **What your audience needs to hear**: Think about what your audience needs to know and how they understand information.
- **What you really want to say**: Focus on the most important point you want to make.

**Prioritize**: You can't show everything in one chart. Focus on the most important information.

**Ask Questions**: Pretend you know nothing and ask basic questions like "Why are we showing this data?" and "Who is this for?" This helps you narrow down what's important.

**Refine Your Focus**: Keep asking questions until you get to the core message you need to communicate.

**Design Impact**: Your KWYs will guide not just your chart choice but also how you design and label it to make the message clear.



# 2. The Standards

## Chart categories

- Chart Categories: The video groups charts into seven primary types: Comparisons, Trends, Proportions, Relationships, Distribution, Deviation, and Geographic.

- Purpose of Each Category:
  - Comparisons: Used to compare full values or rankings.
  - Trends: Show changes over time.
  - Proportions: Display relative market share or part-to-whole values.
  - Relationships: Express correlations or connections between values.
  - Distribution: Show how often values occur and how they are spread across a dataset.
  - Deviation: Focus on variation from a starting point.
  - Geographic: Represent data with a spatial or geographic component.
- Translation: Chart picking often involves translating terms like "share of" or "segmentation" into appropriate chart categories to find the right visual representation.

## Comparisons: Bars and columns

- Popularity of Bar Charts: Bar charts are one of the oldest and most commonly used chart types, dating back over 230 years. They became especially popular with the rise of visual data in general publications in the 1980s.
- Effectiveness: Bar charts are effective because humans are good at comparing the lengths of rectangles that share a common baseline, which makes it easy to compare values visually.
- Comparison Types: Bar charts are ideal for comparing specific values across different categories, not just rankings. They help in understanding both the rank and the actual values of the data.


## Comparisons: Beyond bars

- **Lollipop Chart**: Similar to a bar chart but with a circle at the end of a line. It reduces visual clutter and highlights the actual values.
- **Dot Plot**: Consists of dots without lines, useful for showing clusters and trends when dots are aligned.
- **Bubble Chart**: Uses circles instead of bars. Visually appealing but can be challenging for accurate size comparison.
- **Isotypes**: Uses icons to represent data. Effective for whole numbers but should be used sparingly.
- **Grouped Bars**: Multiple bars grouped together to compare several variables. Effective but can become confusing with too many variables.
- **Small Multiples**: A series of small charts placed side by side, ideal for comparing complex data across categories or variables.
- **Bullet Chart**: Provides detailed information on a single variable, showing ranges like good, better, best, and comparisons to past performance.

## Trends: Line charts

- **Line Charts:** These are used to show how data changes over time. Imagine plotting points on a graph and connecting them with a line to see the trend.
- **Trends vs. Snapshots:** If you want to show how something changes over time (like sales going up or down), a line chart is great. If you want to show specific values at certain times (like sales on January 1st each year), a bar chart might be better.
- **Multiple Lines:** You can have more than one line on a chart to compare different trends. Even if there are many lines, you can still make sense of the overall trend.
- **Sparklines:** These are tiny line charts that show trends without detailed scales. They are useful for a quick view of trends.
- **Choosing the Right Chart:** Use a line chart to show overall changes over time. Use bar charts for specific values at different times.
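To make the sparkline idea concrete, here is a minimal Python sketch that renders a series as a text sparkline using eight Unicode block characters, scaling each value between the series minimum and maximum. The function `sparkline` and the sales figures are hypothetical illustrations, not something the course provides.

```python
# Eight Unicode block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Return one block character per value, scaled between min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no trend to show, so use a middle-height block.
        return BARS[len(BARS) // 2] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


monthly_sales = [12, 15, 14, 18, 22, 21, 25]  # hypothetical data
print(sparkline(monthly_sales))  # prints ▁▃▂▄▆▆█
```

Like a real sparkline, the output has no axis or scale: it only conveys the shape of the trend.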

## Trends: Beyond the line

- **Slopegraph:** This is a simple line chart with only two points per line, showing the start and end values. It's great for highlighting overall change over a period.
- **Bump Chart:** This shows changes in rank over time. It doesn't show the actual values, just the rank positions (e.g., moving from 1st to 3rd place).
- **Fan Chart:** Used for showing real data up to a point and then different future scenarios (best case, worst case, average).
- **Area Chart:** Similar to a line chart but fills the area below the line. It emphasizes the total shape and can show cumulative data.
- **Streamgraph:** This shows changes over time and can also indicate distribution or proportions. It can be harder to read but is effective with good labeling.
- **X and Y Axis:** By convention, time usually goes on the X-axis (horizontal). More generally, the cause (independent variable) goes on the X-axis, and the effect (dependent variable) goes on the Y-axis (vertical).
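A bump chart plots rank rather than value, so the raw data has to be turned into a rank per period first. The sketch below assumes the data is a dictionary of period to {item: value}; the function name `ranks_by_period` and the sales figures are hypothetical.

```python
def ranks_by_period(values_by_period):
    """For each period, map item -> rank, where rank 1 is the highest value.

    Ties keep the items' original order (sorted() is stable); a real chart
    may need an explicit tie-breaking rule.
    """
    ranks = {}
    for period, values in values_by_period.items():
        ordered = sorted(values, key=values.get, reverse=True)
        ranks[period] = {item: pos for pos, item in enumerate(ordered, start=1)}
    return ranks


sales = {  # hypothetical yearly sales for three products
    2021: {"A": 120, "B": 90, "C": 60},
    2022: {"A": 100, "B": 110, "C": 70},
    2023: {"A": 80, "B": 115, "C": 95},
}
for period, ranking in ranks_by_period(sales).items():
    print(period, ranking)
```

Each item's sequence of ranks (here, A goes 1, 2, 3) then becomes one line in the bump chart; the actual sales values are deliberately left out.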

## Proportions: Pie charts and more

- **Pie Charts:** They are commonly used to show proportions but have limitations. While they are visually appealing, humans struggle to judge the size of pie slices accurately, especially when comparing them.
- **Rectangles vs. Circles:** Rectangular shapes are easier for people to compare accurately than circular shapes. If precision is needed, consider using bar charts instead of pie charts.
- **Donut Charts:** These are similar to pie charts but come with additional challenges. The outer segments of donut charts can appear smaller due to optical illusions, making them less reliable for accurate comparisons.

## Proportions: Beyond the circle

- **Stacked Bars and Columns:** Imagine a bar divided into segments to show different parts of a whole. If you place several stacked bars side by side, you can compare both the total values and the parts within each bar.
- **100% Stacked Bars:** Instead of showing actual values, these bars show percentages. Each bar is the same length, representing 100%, and the segments within show the proportion of each part.
- **Stacked Area Charts:** Similar to stacked bars but used over time. They show how different parts of a whole change over time.
- **Treemaps:** Think of a treemap as a rectangular pie chart. It uses rectangles to show proportions, and we are good at comparing the areas of rectangles.
- **Marimekko Charts:** These show two dimensions of data at once. Each column is like a 100% stacked bar, and the width of each column shows its total value.
- **Isotypes or Pictograms:** These use images or icons to represent data. For example, if you're showing the proportion of dogs with fur vs. hair, you could use dog icons to make it visually engaging.
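The arithmetic behind a 100% stacked bar is a per-bar normalization: divide each segment by its bar's total. A Marimekko chart reuses those percentages for segment heights and sizes each column by its share of the grand total. A minimal sketch, with hypothetical names and data:

```python
def to_percent_shares(segments):
    """Scale one bar's segments so they sum to 100 (a 100% stacked bar)."""
    total = sum(segments.values())
    return {name: 100 * value / total for name, value in segments.items()}


regions = {  # hypothetical revenue by product line
    "North": {"Hardware": 30, "Software": 50, "Services": 20},
    "South": {"Hardware": 90, "Software": 60, "Services": 50},
}
for region, segments in regions.items():
    print(region, to_percent_shares(segments))

# Marimekko: each column's width is its share of the grand total.
grand_total = sum(sum(s.values()) for s in regions.values())
column_widths = {r: sum(s.values()) / grand_total for r, s in regions.items()}
print(column_widths)
```

Both regions become bars of equal length even though South's total is twice North's; that is the information a 100% stacked bar trades away and a Marimekko chart restores through column width.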


## Relationships: Correlation

- **Correlation:** This shows how two things are related. For example, as you climb higher on a mountain, the temperature drops. This is a negative correlation.
- **Line Chart:** If you have one set of data points (like temperature at different altitudes), you can connect them with a line to show the trend.
- **Scatter Plot:** When you have many data points (like temperatures from different mountains), you use dots to show each point. Adding a trend line helps show the overall pattern.
- **Heat Map:** This uses colors to show data. For example, a grid with colors can show temperature changes at different altitudes.
- **Bubble Chart:** If you have three variables (like altitude, temperature, and oxygen), you can use a scatter plot where the size of the dots represents the third variable.
- **Radar Chart:** This shows multiple variables in a circular format but can be hard to read with lots of data.
- **Parallel Coordinates:** This lays out multiple variables side by side, making it easier to see how they relate.
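A scatter plot shows correlation visually; the Pearson correlation coefficient is one common way to put a number on it, ranging from +1 (perfect positive) to -1 (perfect negative). This sketch computes it from scratch for hypothetical altitude and temperature readings; on Python 3.10+ the standard library's `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


altitude_m = [0, 500, 1000, 1500, 2000]  # hypothetical readings
temperature_c = [20.0, 16.5, 13.6, 10.2, 7.1]
print(round(pearson_r(altitude_m, temperature_c), 3))
```

A result close to -1 matches the negative correlation in the mountain example. Keep in mind that r only captures linear relationships, so it is still worth looking at the scatter plot itself.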

## Relationships: Hierarchical and network

Hierarchical Relationships
- **Org Chart (Tree Diagram):** Think of this like a family tree. It shows who reports to whom in a company. It's easy to understand because it's a classic way to show hierarchy.
- **Circle Packing:** Imagine a bunch of circles inside a big circle. The big circle is like a parent, and the smaller circles inside are the children. This method saves space but can be harder to read because there's less room for labels.
- **Sunburst:** Picture a pie chart, but with layers. Each layer represents a level in the hierarchy, and the size of each slice shows its importance. It looks nice and can show proportions well, but might take some time to understand.

Network Relationships
- **Node-Link Diagram (Hairball):** Visualize a bunch of dots (nodes) connected by lines (links). It's like a messy web showing how things are connected. It can be confusing, but sometimes it's the only way to show complex relationships.
- **Hive Plot:** An alternative to the hairball, it organizes the data in a cleaner way, making it easier to see patterns.

## Relationships: Flow

Visualizing Flow Relationships
1. **Flow Diagrams:**
   - **Definition:** These diagrams show the progression from one point to another, often indicating a time-based or cause-and-effect relationship.
   - **Typical Visuals:** Nodes (decision points or results) connected by lines showing the flow.
2. **Arc Diagrams:**
   - **Definition:** These diagrams depict connections between nodes over time, such as repeated phrases in a book.
   - **Typical Visuals:** Arcs connecting nodes on a timeline.
3. **Gantt Charts:**
   - **Definition:** Standard in project management to show tasks and timelines.
   - **Typical Visuals:** Horizontal bars along a shared time axis (X-axis), one row per task or category on the Y-axis; bars overlap where tasks run at the same time.
4. **Waterfall Charts:**
   - **Definition:** Show cumulative effects, such as income and expenses leading to profit.
   - **Typical Visuals:** Floating bars, each starting where the previous one ended, so that increases and decreases (including negative values) build up to a running total.
5. **Sankey Diagrams:**
   - **Definition:** Communicate flows and the proportional size of those flows from one point to another.
   - **Typical Visuals:** Nodes with links of varying thickness to show the volume of flow.
6. **Key Considerations:**
   - **Purpose:** Determine if you are emphasizing correlation, hierarchy, or flow.
   - **Proportionality:** Decide if proportionality needs to be included.
   - **Focus:** Identify whether the emphasis is on the nodes or the links between them.
   - **Volume vs. Direction:** Consider if the focus is on the volume of connections or the direction of those connections.
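To see how a waterfall chart's floating bars are positioned, here is a minimal sketch that turns a starting value and a list of signed changes into (label, bottom, top) tuples. The function name, the income-statement labels, and the amounts are hypothetical, and the sketch assumes the starting and final totals are positive.

```python
def waterfall_bars(start_label, start_value, changes, end_label="Total"):
    """Return (label, bottom, top) for each bar of a waterfall chart.

    The first and last bars rise from zero; every change floats between the
    running total before and after it, so a negative change steps down.
    """
    bars = [(start_label, 0, start_value)]
    running = start_value
    for label, delta in changes:
        after = running + delta
        bars.append((label, min(running, after), max(running, after)))
        running = after
    bars.append((end_label, 0, running))
    return bars


income_statement = waterfall_bars(
    "Revenue",
    500,
    [("Cost of goods", -200), ("Operating expenses", -150), ("Other income", 30)],
    end_label="Profit",
)
for label, bottom, top in income_statement:
    print(f"{label:<20} {bottom:>4} to {top:>4}")
```

A plotting library would then draw each tuple as a bar from `bottom` to `top`, typically colored by whether the change was positive or negative.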

**Practical Application**
In data science, visualizing flow can be crucial for understanding processes like data pipelines, machine learning workflows, or user journey maps. For example, a Sankey diagram could illustrate how data flows through the different stages of a machine learning model, showing the volume of data at each stage.


## Distribution: Histograms

### What is a Distribution Visualization?
- **Purpose:** To show how data is spread out or distributed across different values.
- **Example:** Think of a bell curve that shows how test scores are spread out among students.

### What is a Histogram?
- **Definition:** A chart that uses bars (called bins) to show the number of data points that fall within certain ranges.
- **Example:** For test scores from 0 to 100, a histogram might have bins for scores 0-10, 11-20, etc., showing how many students scored within each range.
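The binning step behind a histogram can be sketched in a few lines of Python. This version assumes equal-width, half-open bins ([0, 10), [10, 20), and so on) with a perfect score of 100 folded into the last bin; that is a common convention but slightly different from the 0-10, 11-20 labels above. The function name and the scores are hypothetical.

```python
def histogram_counts(scores, bin_width=10, max_score=100):
    """Count scores in half-open bins [0, 10), [10, 20), ..., [90, 100].

    The last bin also includes max_score so a perfect score is not dropped.
    """
    n_bins = max_score // bin_width
    counts = [0] * n_bins
    for score in scores:
        index = min(int(score // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


test_scores = [55, 62, 68, 71, 74, 75, 79, 83, 88, 95, 100]  # hypothetical
print(histogram_counts(test_scores))  # prints [0, 0, 0, 0, 0, 1, 2, 4, 2, 2]
```

Each count becomes the height of one bar, and the bars sit side by side because the bins cover the score range without gaps.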

### Key Points
#### Bell Curve vs. Histogram
- **Bell Curve:** A smooth, curved line that shows the overall distribution of data.
- **Histogram:** Uses bars to show specific counts of data points within each range.

#### Why Use a Histogram?
- To see how data is spread out.
- To identify patterns, like most students scoring in the middle range.

#### Difference from a Column Chart
- **Histogram:** Bars touch, with no gaps, because the bins cover a continuous range of values.
- **Column Chart:** Bars are separate to show distinct categories.

### Simplified Example
- **Bell Curve:** Imagine a hill where most people are in the middle, and fewer people are at either end (the lowest and highest scores).
- **Histogram:** Think of a bar graph where each bar shows how many people are in each part of the hill.


## Distribution: Beyond histograms

### Different Ways to Show Data Distribution

#### Violin Plot
- **What it is:** Imagine a bell curve that's mirrored and filled in, showing the distribution of data on both sides of a central line.
- **Why use it:** It shows detailed distribution within each category, like a histogram within a histogram.

#### Box Plot (Box and Whiskers)
- **What it is:** A box shows the middle 50% of data, with lines (whiskers) extending to show the range.
- **Why use it:** It highlights the spread and concentration of data, making it easy to see medians and outliers.
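The numbers a box plot draws can be computed with Python's standard library. This sketch assumes the widely used rule that whiskers extend to the most extreme points within 1.5 times the IQR (interquartile range) of the box, and uses `statistics.quantiles(..., method="inclusive")` for the quartiles; other tools may use a different quartile method, so their boxes can differ slightly. The function name and the data are hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


response_times = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical, in seconds
print(box_plot_summary(response_times))
```

Here the box spans the middle 50% of the data (4 to 8), the median line sits at 6, and the value 30 is drawn as a separate outlier point rather than stretching the whisker.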

#### Dot Plot
- **What it is:** Dots represent individual data points, often stacked to show frequency.
- **Why use it:** It shows every data point, making it clear how data is distributed.

#### Swarm Plot
- **What it is:** Similar to a dot plot, but dots are spread out to avoid overlap.
- **Why use it:** It makes each data point visible without crowding.


#### Stem and Leaf Plot
- **What it is:** Digits are used instead of dots or bars: each value is split into a stem (its leading digits) and a leaf (its last digit), so every individual value stays readable.
- **Why use it:** It combines the detail of a dot plot with the structure of a histogram.
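A stem-and-leaf plot is easy to build by hand or in code. This minimal sketch assumes non-negative integers, uses the tens digit as the stem and the ones digit as the leaf, and keeps empty stems so gaps stay visible, like empty histogram bins. The function name and the scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Format non-negative integers as 'stem | leaves' rows (tens | ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Walk every stem in range so empty stems show up as gaps.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)


exam_scores = [52, 38, 47, 55, 61, 43, 58, 39, 67, 50, 81]  # hypothetical
print(stem_and_leaf(exam_scores))
```

For these scores the output runs from `3 | 89` to `8 | 1`, with an empty `7 |` row marking the gap; turned on its side, the row lengths trace the same shape a histogram would.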

### Key Takeaway
Each of these charts offers a unique way to visualize data distribution, helping you choose the best one based on your data and the message you want to communicate.


## Deviation

### Understanding Deviation Charts

1. **Purpose:**
   - **Deviation Charts:** These charts show how much values change from a norm, average, or target.

2. **Types of Deviation Charts:**
   - **Diverging Bar Chart:** Focuses on the change between periods (e.g., quarterly sales). Positive and negative changes are highlighted.
   - **Slope Graph:** Shows changes over time with lines indicating increases or decreases.
   - **Indexed Bar Chart:** Compares performance against a target, making it easy to see how values deviate from the goal.

3. **Key Points:**
   - **Emphasis on Change:** Deviation charts highlight changes rather than absolute values.
   - **Labeling:** Clear labels are crucial to ensure the audience understands the story the chart is telling.
   - **Avoiding Misleading Charts:** Always start the y-axis at zero to avoid misleading the audience.

**Simplified Example:**
- **Diverging Bar Chart:** Imagine you have sales data for two years. Instead of showing total sales, you show how much sales increased or decreased each quarter.
- **Slope Graph:** Think of a line chart where the slope of the line shows whether sales are going up or down over time.
- **Indexed Bar Chart:** Picture a bar chart where the center line is the sales target, and bars show how much sales are above or below that target.
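Both deviation examples reduce to simple subtraction: a diverging bar chart plots the change from one period to the next, and an indexed bar chart plots each value minus the target. A minimal sketch with hypothetical function names, quarterly figures, and a hypothetical target of 100:

```python
def period_changes(values):
    """Change from each period to the next (diverging bar chart)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def deviation_from_target(actuals, target):
    """Each value minus the target (indexed bar chart); positive = above target."""
    return {label: value - target for label, value in actuals.items()}


quarterly_sales = {"Q1": 95, "Q2": 110, "Q3": 102, "Q4": 88}  # hypothetical
print(period_changes(list(quarterly_sales.values())))  # prints [15, -8, -14]
for quarter, diff in deviation_from_target(quarterly_sales, target=100).items():
    position = "above" if diff > 0 else "below" if diff < 0 else "on"
    print(f"{quarter}: {diff:+d} ({position} target)")
```

Bars are then drawn up or down from a zero line (no change, or exactly on target), which is why the sign of each result matters more than its absolute size.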

**Key Takeaway:**
- Deviation charts are great for highlighting changes and differences from a norm or target, making it easier to understand trends and performance.



## Geographic

- **Maps as Effective Tools:** Maps are powerful for visualizing geographic data due to their strong cultural connection and ease of interpretation.
- **Types of Maps:**
  - **Point Map:** Simple markers for locations.
  - **Proportional Symbol Map:** Uses symbols like bars or bubbles to show quantity at different locations.
  - **Choropleth Map:** Uses color to show data within geographic boundaries like states.
  - **Cartogram:** Resizes regions based on data, useful if the audience knows the shapes well.
  - **Flow Map:** Shows movement like imports/exports or migration.
  - **Isopleth Map:** Common in weather maps, shows data concentrations without strict geographic boundaries.
- **Choosing the Right Map:** Select a map type based on the specificity of your data, audience familiarity with the geography, and whether regional trends are important.
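For a choropleth map, each region's value has to be assigned to a color class. One simple scheme is equal-interval classification, which splits the range from minimum to maximum into equally wide classes; the sketch below assumes that scheme (quantile and natural-breaks classes are common alternatives). The function name, region names, values, and hex colors are hypothetical, chosen only for illustration.

```python
def equal_interval_classes(values, palette):
    """Assign each region a color by splitting min..max into equal-width classes."""
    n_classes = len(palette)
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes
    colors = {}
    for region, value in values.items():
        if width == 0:
            index = 0  # every region has the same value
        else:
            # The maximum lands exactly on the top edge; keep it in the last class.
            index = min(int((value - lo) / width), n_classes - 1)
        colors[region] = palette[index]
    return colors


BLUES = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]  # light to dark
rates = {"Region A": 2.0, "Region B": 4.5, "Region C": 7.0, "Region D": 9.5, "Region E": 12.0}
print(equal_interval_classes(rates, BLUES))
```

A sequential light-to-dark palette like this one lets readers rank regions by shade, which is the main thing a choropleth asks of them.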

# 3. Beyond the Standards

## To cumulative or not cumulative?

### Cumulative vs. Regular Charts
- **Regular Line Chart:**
  - **What it shows:** Data points over time, like sales each month.
  - **Example:** You see the ups and downs of sales every month.
- **Cumulative Line Chart:**
  - **What it shows:** Total accumulation over time, like total sales up to each month.
  - **Example:** You see a steady increase showing total sales growth, not the monthly ups and downs.
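The difference between the two lines is a running total. Python's `itertools.accumulate` computes it directly; the monthly figures below are hypothetical.

```python
from itertools import accumulate

monthly_sales = [120, 95, 140, 110, 130, 90]  # hypothetical, January to June

# Regular line chart: plot monthly_sales as-is (it rises and dips).
# Cumulative line chart: plot the running total instead.
running_total = list(accumulate(monthly_sales))
print(running_total)  # prints [120, 215, 355, 465, 595, 685]
```

With non-negative monthly values the running total can never go down, which is why the cumulative line hides the monthly dips (February, April, and June here).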

### When to Use Each:
- **Regular Line Chart:** Use when you want to see the changes or trends in individual periods (e.g., monthly sales variations).
- **Cumulative Line Chart:** Use when you want to emphasize overall growth or total accumulation (e.g., total sales over a year).

### Combining Charts:
- **Dual Axis Chart:** Combines both cumulative and regular charts but can be confusing.
- **Alternative:** Show two separate charts or create a ratio to simplify.

## Outside the box

### Going Outside the Box in Data Visualization
- **Thinking Differently:**
  - **Standard Charts:** Often, we stick to familiar charts like bar graphs or line charts.
  - **Outside the Box:** Sometimes, using unconventional charts can reveal new insights.
- **Example 1: Transportation Safety:**
  - **Standard Chart:** A basic chart showing vehicle miles driven and traffic fatalities.
  - **Innovative Chart:** The New York Times used a scatterplot with time represented by dot labels instead of on the x-axis. This unusual approach highlighted interesting patterns, like the impact of the 1970s oil crisis on driving and fatalities.
- **Benefits of Unconventional Charts:**
  - **Revealing Patterns:** Non-standard charts can uncover trends and correlations that standard charts might miss.
  - **Engaging the Audience:** These charts can make the audience think more deeply about the data, sparking curiosity and deeper understanding.
- **Example 2: Density Contour Plot:**
  - **Weather Radar Analogy:** Just like weather maps show rainfall density, a density contour plot shows data concentration.
  - **Simplified Visualization:** Instead of plotting millions of individual data points, this chart type shows where data points are densely packed, making it easier to see overall trends.
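Density contour plots are usually built on a smoothed density estimate (for example, kernel density estimation). As a simpler stand-in, this sketch counts points per square grid cell, the same "where are points packed?" idea in its coarsest form; contour lines would then be drawn around cells above chosen counts. The function name, cell size, and points are hypothetical.

```python
from collections import Counter


def grid_density(points, cell_size):
    """Count (x, y) points per square grid cell, keyed by (column, row)."""
    # Floor division rounds down, so negative coordinates land in the right cell.
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )


points = [(0.2, 0.3), (0.4, 0.1), (0.6, 0.8), (0.9, 0.9), (1.5, 1.2), (2.7, 2.1)]
density = grid_density(points, cell_size=1.0)
print(density.most_common(1))  # prints [((0, 0), 4)]
```

With millions of points, the grid stays the same size while the counts grow, which is what makes this kind of summary readable where a raw scatter plot would turn into a solid blob.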

### Key Takeaways:
- **Experimentation:** Don't be afraid to try different chart types. Sketch ideas on a whiteboard and see if they make sense.
- **Audience Understanding:** Test your charts on someone who isn't data-savvy. If they understand, you've likely succeeded.
- **Inspiration:** Look at how others visualize data to find creative solutions.

### Simplified Example:
- **Scatterplot with Time as Dot Labels:** Imagine each dot standing for one year and labeled with that year, instead of placing years along a timeline axis. This might feel strange but can show unique patterns.
- **Density Contour Plot:** Think of it like a heat map showing where most data points are concentrated, rather than plotting every single point.
