<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/DVRG-Book-Cover-Small.png"><br>

This notebook contains an excerpt from the **`Data Visualization Reference Guide - For Beginners`** book written by *Balasubramanian Chandran*; the content is available [on GitHub](https://github.com/BalaChandranGH/Books/Data-Visualization-Reference-Guide).

<br>
<!--NAVIGATION-->

<[ [Contents and Acronyms](00.00-dvrg-Contents-and-Acronyms.ipynb) | [Data Visualization with Matplotlib](02.00-dvrg-Data-Visualization-with-Matplotlib.ipynb) ]>

# 1. Introduction to Visualizations

## 1.1. What is data visualization?
* The process of conveying/presenting the data with the help of plots and graphics
* Plots and graphics take numerical data as input and display it as charts, figures, and tables that help us analyze the data, see it clearly, and make decisions
* Matplotlib is the foundational data visualization tool (plotting library) for the Python programming language
* Python, R, and other languages offer a range of data visualization tools, each suited to different purposes
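Here is a minimal sketch of the idea above: Matplotlib takes numeric input and renders it as a chart. The monthly sales figures are made-up values for illustration. The non-interactive `Agg` backend is assumed so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt

# Hypothetical numeric input: units sold per month
months = ["Jan", "Feb", "Mar", "Apr", "May"]
units_sold = [120, 135, 150, 145, 170]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, units_sold, marker="o")  # numbers in, line chart out
ax.set_title("Units sold per month")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.savefig("units_sold.png", dpi=150)  # write the chart to an image file
```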

## 1.2. Why do we build visuals?
* To summarize findings and display them in a form that makes them easy to interpret and helps identify trends and patterns
* To perform Exploratory Data Analysis (EDA)
* To support recommendations to different stakeholders

## 1.3. Goals of data visualizations
* To communicate data clearly and efficiently
* To present an unbiased view of the data
* To make complex data more accessible and easily understandable
* Must comply with the **“CEO rule”**
  - If the company's CEO attends your presentation, hook them within the first 5 minutes, so that they stay, or come back for the rest of the presentation if they step out for an urgent phone call
* Must pass the **“so what?”** test
![](figures/DVRG-SoWhatTest.png)
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; Image Credit [ (Source) ](https://cognitiveclass.ai/)

## 1.4. Best practice in data visualizations
_**`“Less is more Effective, more Attractive and more Impactive.”`**_

_**`“The results of our findings need to tell an appealing story and their representations should be easy to interpret and understand for non-technical audiences.”`**_

## 1.5. Qualitative vs Quantitative data
* One thing to always keep in mind is the type of data we are trying to create graphs for
* In general, we categorize data into two big groups:
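  - `Qualitative` data, also called `Categorical` data, describes groups or labels. It can be `Nominal` (no natural order, e.g., country) or `Ordinal` (ordered, e.g., low/medium/high)
  - `Quantitative` data, also called `Numerical` data, is counted or measured. It can be `Discrete` (counts) or `Continuous` (measurements)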
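This sketch shows how the distinction appears in practice. pandas stores qualitative data with the `category` dtype, which can be ordered for ordinal values. It stores quantitative data with numeric dtypes. The survey columns below are hypothetical.

```python
import pandas as pd

# Hypothetical survey data mixing both kinds of variables
df = pd.DataFrame({
    "country": ["India", "Canada", "India", "Brazil"],    # nominal (no order)
    "satisfaction": ["low", "high", "medium", "high"],    # ordinal (ordered)
    "visits": [3, 7, 2, 5],                                # discrete numeric
    "spend_usd": [12.5, 80.0, 7.25, 40.0],                 # continuous numeric
})

# Qualitative columns become categoricals; the ordinal one gets an explicit order
df["country"] = df["country"].astype("category")
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["low", "medium", "high"], ordered=True
)

qualitative = df.select_dtypes(include="category").columns.tolist()
quantitative = df.select_dtypes(include="number").columns.tolist()
print("Qualitative:", qualitative)    # candidates for bar or pie charts
print("Quantitative:", quantitative)  # candidates for histograms or box plots
```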

## 1.6. Rules for better visualization
Nicolas P. Rougier, Michael Droettboom, and Philip E. Bourne introduced **Ten Simple Rules for Better Figures**:
```
* Rule  1: Know your audience
* Rule  2: Identify your message
* Rule  3: Adapt the figure to support the medium
* Rule  4: Captions are not optional
* Rule  5: Do not trust the defaults
* Rule  6: Use colors effectively
* Rule  7: Do not mislead the reader
* Rule  8: Avoid “chart-junk”
* Rule  9: Message trumps beauty
* Rule 10: Get the right tool
```
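This sketch applies Rule 5 and Rule 7 to a bar chart. For Rule 5, the figure size is chosen explicitly rather than left at the default. For Rule 7, the honest panel starts the value axis at zero, because readers judge bars by their length and a truncated axis exaggerates small differences. The team scores are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical scores for three teams that differ only slightly
teams = ["A", "B", "C"]
scores = [98, 100, 102]

# Rule 5: choose the figure size deliberately instead of relying on the default
fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(8, 3))

# Misleading: a truncated y-axis makes a ~4% spread look like a fivefold difference
ax_bad.bar(teams, scores, color="grey")
ax_bad.set_ylim(97, 103)
ax_bad.set_title("Truncated axis (misleading)")

# Rule 7: bar lengths encode magnitude, so the axis starts at zero
ax_good.bar(teams, scores, color="steelblue")
ax_good.set_ylim(bottom=0)
ax_good.set_title("Zero baseline (honest)")

for ax in (ax_bad, ax_good):
    ax.set_ylabel("Score (points)")
fig.tight_layout()
```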

## 1.7. Qualities of a great visualization
* **Truthful**
  - Be aware of your actions when cleaning, summarizing, and manipulating data, and ensure you are not:
    - Misleading yourself (self-deception)
    - Misleading your audience
* **Functional**
* **Beautiful**
* **Insightful**
* **Enlightening**
  - A combination of the previous four, but with a social-ethical responsibility

## 1.8. Visualization wheel
Alberto Cairo created the visualization wheel shown below.
![](figures/DVRG-VizWheelDims.png)<br>

<img align="left" style="padding-right:10px;" src="figures/DVRG-VizWheel1.png"><br>
<br><br><br><br><br><br><br><br><br>
Image credit [ (Source) ](https://www.coursera.org/in)<br>

<img align="left" style="padding-right:10px;" src="figures/DVRG-VizWheel2.png"><br>
<br><br><br><br><br><br><br>
Image credit [ (Source) ](https://www.coursera.org/in)

## 1.9. Edward Tufte’s graphical heuristics
* Edward Tufte introduced two graphical heuristics for the visual display of information:
  - The data-ink-ratio
  - Chart-junk

* **The data-ink-ratio** (the proportion of a graphic's ink that is devoted to displaying data): _**`Remove to improve`**_
  - Remove backgrounds
  - Remove redundant labels
  - Remove borders
  - Reduce colors
  - Remove special effects
  - Lighten labels
  - Label data directly instead of using a legend

<img align="left" style="padding-right:10px;" src="figures/DVRG-DataInkRatio.png"><br>
<br><br><br>
Image credit [ (Source) ](https://www.darkhorseanalytics.com/)
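Here is a minimal Matplotlib sketch of several "remove to improve" steps, applied to a two-series line chart with made-up sales figures. It removes the top and right borders, clears the axes background, and lightens the tick labels. It also labels each line directly at its last point, so no legend is needed.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022]
series = {"Product A": [10, 14, 18, 21], "Product B": [8, 9, 13, 12]}  # hypothetical

fig, ax = plt.subplots(figsize=(6, 4))
for name, values in series.items():
    line, = ax.plot(years, values, linewidth=2)
    # Direct label: put the series name next to its last point instead of a legend
    ax.annotate(name, xy=(years[-1], values[-1]), xytext=(5, 0),
                textcoords="offset points", va="center", color=line.get_color())

# Remove borders that carry no data
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Remove the axes background fill and lighten the remaining tick labels
ax.set_facecolor("none")
ax.tick_params(colors="grey")
ax.set_xticks(years)
ax.set_title("Sales by product (units)", loc="left")
```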

* **The chart-junk:** _**`3 types`**_
  - Unintended optical art - excessive shading, patterning, coloring, etc.
  - The grid - most of the time it is unnecessary and competes with the data being shared
  - The duck - decoration that takes over the graphic, so that the design overwhelms the data it is meant to present
  
![](figures/DVRG-ChartJunk1.png) ![](figures/DVRG-ChartJunk2.png)<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; Images credit [ (Source) ](https://www.coursera.org/in)

* **`Overall, follow simplicity and minimalism in data visualization`**

## 1.10. Basic visualization tools
* `Bar charts`
  - Display the **frequencies of qualitative** variables
  - Representation of grouped data (e.g., counts by group, averages by group)
* `Histograms`
  - Display the **frequencies of quantitative** variables
  - A particular type of bar chart in which each bar counts the data points that fall into a numeric interval (bin)
  - Show the distribution of a quantitative variable
* `Pie charts`
  - A representation of counts of qualitative data
  - A circular graph that shows proportions of data to an overall total
* `Scatter plots`
  - A plot that shows the relationship between two variables (Predictor X & Target Y)
* `Bubble plots`
  - A variation of the Scatter plot that displays three dimensions of data (X, Y, and Z), where Z is encoded as the size of each data point
* `Line plots/charts/graphs`
  - A graph that uses lines to connect individual data points
  - Displays quantitative values over a specified time interval
* `Word clouds`
  - An image composed of words that occur in a particular text or subject
  - The size of a word indicates its frequency or importance in a body of text
* `Radar charts`
  - A way to display multivariate data within one plot
  - They are circular plots with spokes that represent an axis for each variable
* `Waffle charts`
  - A great way to visualize data in relation to a whole
  - Provide a fine-grained view of the proportions of different categories
  - Can be used as an addition to other visualization tools, such as a Pie chart
* `Box plots`
  - A plot that summarizes the distribution of numerical data using its median and quartiles
  - A convenient way to represent the degree of dispersion (spread) and skewness in the data, and show outliers without making any assumption of the underlying statistical distribution
* `Leaflet maps`
  - Displays spatial information
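As a rough guide, most of the basic tools above map onto a single Matplotlib `Axes` method. The sketch draws eight of them on one figure using seeded random (hypothetical) data. The radar chart follows a common recipe with polar axes rather than a dedicated function. Word clouds, waffle charts, and Leaflet maps need additional libraries, so they are not shown.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the example is reproducible
categories = ["A", "B", "C", "D"]
counts = [23, 17, 35, 25]              # hypothetical counts per category
values = rng.normal(50, 10, size=200)  # hypothetical numeric measurements
x = rng.uniform(0, 10, size=30)
y = 2 * x + rng.normal(0, 2, size=30)
z = rng.uniform(10, 300, size=30)      # third dimension, encoded as marker size

fig = plt.figure(figsize=(14, 6))

ax = fig.add_subplot(2, 4, 1)
ax.bar(categories, counts)             # bar chart: frequencies of a qualitative variable
ax.set_title("Bar chart")

ax = fig.add_subplot(2, 4, 2)
ax.hist(values, bins=15)               # histogram: frequencies of a quantitative variable
ax.set_title("Histogram")

ax = fig.add_subplot(2, 4, 3)
ax.pie(counts, labels=categories, autopct="%1.0f%%")  # pie: parts of a whole
ax.set_title("Pie chart")

ax = fig.add_subplot(2, 4, 4)
ax.scatter(x, y)                       # scatter: relationship between X and Y
ax.set_title("Scatter plot")

ax = fig.add_subplot(2, 4, 5)
ax.scatter(x, y, s=z, alpha=0.5)       # bubble: scatter plot with marker size = Z
ax.set_title("Bubble plot")

ax = fig.add_subplot(2, 4, 6)
ax.plot(range(1, 13), rng.integers(80, 120, size=12), marker="o")  # line: values over time
ax.set_title("Line chart")

ax = fig.add_subplot(2, 4, 7)
ax.boxplot(values)                     # box plot: median, quartiles, spread, outliers
ax.set_title("Box plot")

# Radar chart: one spoke per variable on polar axes, polygon closed back to the start
ax = fig.add_subplot(2, 4, 8, projection="polar")
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
ax.plot(angles + angles[:1], counts + counts[:1])
ax.fill(angles + angles[:1], counts + counts[:1], alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(categories)
ax.set_title("Radar chart")

fig.tight_layout()
```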

## 1.11. How to choose the right chart type for a visualization?
* Numbers don't lie, but a bad chart decision makes it extremely difficult to understand what those numbers mean
* Before putting together a presentation, make sure you pick the right type of chart to clearly communicate the information you want to share
* Working with and collecting huge amounts of data is rewarding, but that data is only as good as how well you can communicate what it means
* It's easy to throw your data up on a scatter or bar chart, slip it into a presentation, and convince yourself you'll do the explaining, but that's a terrible shortcut
* When the presentation is over and the only thing left behind is your slides, no one will have a clue what your chart was trying to communicate
* The problem is that there are so many chart types, styles, and methods of presenting data that it can be confusing and difficult to pick the right one
* Follow this three-step rule:
  1) Understand the message you are trying to present with the data
  2) Select the best arrangement
  3) Format the chart

**1) Understand the message you are trying to present with the data:**
* When you're putting together a chart, you're trying to show one of four things:
  - `Relationship` between data points: shows a connection or correlation between two or more variables, like the market cap of a given stock over time versus the overall market trend
  - `Comparison` of data points: sets one set of variables apart from another and shows how they interact, like the number of visitors to five competing websites in a single month
  - `Composition` of data: collects the different types of information that make up a whole and displays them together, like the search terms visitors used to land on your site, or how many of them came from links, search engines, or direct traffic
  - `Distribution` of data: lays out a collection of related or unrelated information simply to see whether and how it correlates, and whether there's any interaction between the variables, like the number of bugs reported during each month of a beta

**2) Select the best arrangement:**
* Once you understand what message you're trying to send with the data you have, it's time to select the best method for displaying that information
* Different chart types suit different messages. For example,
  - `Scatter plots` are best used to show distributions
  - `Line charts` (essentially, Scatter plots with a defined trend) are better suited for relationships
  - `Pie charts` do well when you're trying to communicate a composition, but make for poor comparisons or distributions
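The guidance in steps 1 and 2 can be captured in a small lookup table. `suggest_charts` and `is_discouraged` are hypothetical helpers, not part of any library. The mapping encodes only pairings mentioned in this chapter, so treat it as a starting point rather than a complete chart chooser.

```python
# Hypothetical lookup: message type -> chart types mentioned in this chapter
CHART_SUGGESTIONS = {
    "relationship": ["scatter plot", "line chart", "bubble plot"],
    "comparison": ["bar chart"],
    "composition": ["pie chart", "waffle chart"],
    "distribution": ["histogram", "box plot", "scatter plot"],
}

# Chart types the text advises against for a given message type
AVOID = {
    "comparison": ["pie chart"],
    "distribution": ["pie chart"],
}


def suggest_charts(message: str) -> list[str]:
    """Return suggested chart types for a message type (case-insensitive)."""
    key = message.strip().lower()
    if key not in CHART_SUGGESTIONS:
        raise ValueError(f"unknown message type {message!r}; "
                         f"expected one of {sorted(CHART_SUGGESTIONS)}")
    return CHART_SUGGESTIONS[key]


def is_discouraged(message: str, chart: str) -> bool:
    """True if the text advises against this chart for this message type."""
    return chart.strip().lower() in AVOID.get(message.strip().lower(), [])


print(suggest_charts("Composition"))
print(is_discouraged("comparison", "Pie chart"))
```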

**3) Format the chart:**
* Once you've selected the right type of chart for your data, make sure you don't do your data a disservice by forgetting some basic design tips
* Kill the grid lines unless they're absolutely necessary, or at least make them subtle so they don't distract from the information you're trying to present
* Make sure your chart is centered on the data you want to present, your axes are clearly labeled, and your axes have units on them where necessary, so no one has to guess or infer what you're trying to say
* Remember, your goal is that anyone can pick up your chart, whether you're there to talk about it or not, and understand what information the data is trying to communicate
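Here is a minimal sketch of these formatting tips in Matplotlib, using hypothetical hourly temperature readings. The grid is kept but made subtle (light, horizontal only, drawn behind the data). Both axes are labeled with units, and the title states the message.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

hours = list(range(0, 24, 3))
temperature_c = [14, 13, 15, 19, 23, 24, 20, 16]  # hypothetical readings

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(hours, temperature_c, marker="o", zorder=3)

# Subtle grid: thin, light, horizontal only, and drawn underneath the data
ax.set_axisbelow(True)
ax.grid(axis="y", color="0.9", linewidth=0.8)

# Clear axis labels with units so nobody has to guess
ax.set_xlabel("Hour of day (h)")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Temperature peaks mid-afternoon", loc="left")
ax.set_xticks(hours)
```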


<img align="left" style="padding-right:10px;" src="figures/DVRG-ChartSuggestions.png"><br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; Image credit [ (Source) ](https://www.coursera.org/in)

## 1.12. Similarities & differences between histograms and boxplots
**Similarities between Histograms and Boxplots:**
* Both are graphical representations of the distribution of numeric data values
* Both let you visually assess the central tendency, the amount of variation, and the shape of the distribution, as well as the presence of gaps, outliers, or unusual data points
* Both are used to explore and present the data in an easy and understandable manner
* Both are used to verify whether an improvement has been achieved by exploring the data before and after the improvement initiative
* Both are ideal for moderate to large amounts of data; they may not accurately display the shape of the distribution if the sample is too small. In practice, a sample size of at least 30 data values is usually considered sufficient for both tools

<img align="left" style="padding-right:10px;" src="figures/DVRG-HistBoxSimilarities.png"><br>
<br><br><br>
Image credit [ (Source) ](https://citoolkit.com/articles/histograms-and-boxplots/)
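The sketch below shows both tools for the same sample. It stacks a histogram above a horizontal box plot and shares the x-axis, so the quartile box lines up with the bulk of the histogram. The data are randomly generated with a fixed seed. The sample size of 200 is an arbitrary choice above the rule-of-thumb minimum of 30. The horizontal box uses `vert=False`; recent Matplotlib releases also accept `orientation="horizontal"` for the same effect.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=200)  # hypothetical measurements

# Two stacked panels sharing one x-axis: histogram on top, box plot below
fig, (ax_hist, ax_box) = plt.subplots(
    2, 1, sharex=True, figsize=(6, 4), gridspec_kw={"height_ratios": [3, 1]}
)
counts, bin_edges, _ = ax_hist.hist(data, bins=20, color="steelblue", edgecolor="white")
ax_hist.set_ylabel("Frequency")

ax_box.boxplot(data, vert=False, widths=0.6)  # horizontal box plot of the same data
ax_box.set_yticks([])
ax_box.set_xlabel("Measurement (units)")
```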

**Differences between Histograms and Boxplots:**
* While Boxplots and Histograms are visualizations used to show the distributions of the data, they communicate information differently

<img align="left" style="padding-right:10px;" src="figures/DVRG-HistBoxDifferences1.png"><br>
<br><br>
Image credit [ (Source) ](https://citoolkit.com/articles/histograms-and-boxplots/)

<img align="left" style="padding-right:10px;" src="figures/DVRG-HistBoxDifferences2.png"><br>
<br><br>
Image credit [ (Source) ](https://citoolkit.com/articles/histograms-and-boxplots/)

In a symmetric distribution, the mean and median are nearly the same, and the two whiskers have almost the same length.

![](figures/DVRG-HistBoxDifferences3.png)
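This property can be checked numerically. The sketch uses Matplotlib's `cbook.boxplot_stats`, the helper that `Axes.boxplot` relies on to compute quartiles and whisker ends. By default, the whiskers reach the most extreme data points within 1.5 × IQR of the box. The input is a perfectly symmetric sample, the integers 1 to 31, chosen purely for illustration.

```python
from matplotlib import cbook

data = list(range(1, 32))  # 1, 2, ..., 31: perfectly symmetric around 16

stats = cbook.boxplot_stats(data)[0]  # one dict of statistics per dataset
lower_whisker = stats["q1"] - stats["whislo"]  # length from box bottom to lower whisker end
upper_whisker = stats["whishi"] - stats["q3"]  # length from box top to upper whisker end

print(f"mean = {stats['mean']}, median = {stats['med']}")
print(f"lower whisker = {lower_whisker}, upper whisker = {upper_whisker}")
# → mean = 16.0, median = 16.0
# → lower whisker = 7.5, upper whisker = 7.5
```

For a skewed sample, the mean is pulled toward the longer tail and the two whisker lengths differ.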

## 1.13. Different types of charts/plots/graphs
![](figures/DVRG-DifferentTypesofCharts.png)
