## Histogram

[Reference](https://searchsoftwarequality.techtarget.com/definition/histogram)

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc.

A histogram is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.
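
The counting step behind this definition can be sketched in a few lines of standard-library Python. The function name `bin_counts`, its parameters, and the exam scores are assumptions made up for illustration; plotting libraries handle edge cases more carefully.

```python
def bin_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals on [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v < low or v > high:
            continue  # values outside the plotted range are ignored
        # The last bin is closed on the right so that v == high is counted.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical exam scores, made up for illustration.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 91, 100]
counts = bin_counts(scores, low=50, high=100, n_bins=5)
for i, c in enumerate(counts):
    print(f"{50 + 10 * i}-{60 + 10 * i}: {c}")
```

Each printed row corresponds to one rectangle: the interval is its position on the horizontal axis and the count is its height.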

In the most common form of histogram, the [independent variable](https://whatis.techtarget.com/definition/independent-variable) is plotted along the horizontal axis and the [dependent variable](https://whatis.techtarget.com/definition/dependent-variable) is plotted along the vertical axis. The data appears as colored or shaded rectangles of variable area.


The illustration, below, is a histogram showing the results of a final exam given to a hypothetical class of students.

If this histogram were compared with those of classes from other years that received the same test from the same professor, conclusions might be drawn about intelligence changes among students over the years. Conclusions might also be drawn concerning the improvement or decline of the professor's teaching ability with the passage of time.

If this histogram were compared with those of other classes in the same semester who had received the same final exam but who had taken the course from different professors, one might draw conclusions about the relative competence of the professors.

![image.png](attachment:image.png)

## What is a Histogram?

[Reference](https://corporatefinanceinstitute.com/resources/excel/study/histogram/)

A histogram is used to summarize discrete or continuous data. In other words, it provides a [visual interpretation](https://corporatefinanceinstitute.com/resources/knowledge/other/data-presentation-guide/) of numerical data by showing the number of data points that fall within a specified range of values (called “bins”). It is similar to a vertical bar graph. However, a histogram, unlike a vertical bar graph, shows no gaps between the bars.

### Parts of a Histogram

1. **The title**: The title describes the information included in the histogram.
2. **X-axis**: The X-axis shows the intervals (bins) into which the measured values fall.
3. **Y-axis**: The Y-axis shows the number of times that the values occurred within the intervals set by the X-axis.
4. **The bars**: The height of the bar shows the number of times that the values occurred within the interval while the width of the bar shows the interval that is covered. For a histogram with equal bins, the width should be the same across all bars.
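
As a rough illustration of how these four parts fit together, the sketch below draws a text histogram. It is rotated relative to the usual layout: the intervals (normally the X-axis) run down the page, and the bar length plays the role of the bar height. The function `render_histogram` and the frequencies are hypothetical.

```python
def render_histogram(title, edges, counts, symbol="#"):
    """Return a text histogram: one row per bin, bar length equal to the count."""
    lines = [title]
    for left, right, count in zip(edges, edges[1:], counts):
        label = f"{left:>3}-{right:<3}"  # the interval covered by the bar
        lines.append(f"{label} | {symbol * count} ({count})")
    return "\n".join(lines)

edges = [50, 60, 70, 80, 90, 100]  # equal-width bins of 10 points each
counts = [2, 3, 5, 3, 2]           # hypothetical frequencies
text = render_histogram("Final exam scores", edges, counts)
print(text)
```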

### Importance of a Histogram

Creating a histogram provides a visual representation of the data distribution. Histograms can display a large amount of data along with the frequency of the data values. The approximate median and the shape of the distribution can be read from a histogram, and it can also reveal outliers or gaps in the data.

### Distributions of a Histogram

**A normal distribution**: In a normal distribution, points on one side of the average are as likely to occur as on the other side of the average.

![image.png](attachment:image.png)

**A bimodal distribution**: A bimodal distribution has two peaks. This often indicates that two different groups were combined, in which case the data should be separated and analyzed as separate distributions, each of which may be approximately normal.

![image.png](attachment:image.png)
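
One simple way to count peaks from bin counts is sketched below. The function `find_peaks` and both count lists are assumptions for illustration; real data is noisy, so some smoothing or wider bins would usually be needed before a rule this strict is meaningful.

```python
def find_peaks(counts):
    """Return indices of bins that are strictly taller than both neighbours."""
    padded = [0] + list(counts) + [0]  # treat the outside of the range as empty
    return [
        i - 1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    ]

unimodal = [1, 3, 7, 3, 1]     # hypothetical bin counts with one peak
bimodal = [1, 6, 2, 1, 5, 2]   # hypothetical bin counts with two peaks
print(find_peaks(unimodal))
print(find_peaks(bimodal))
```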

**A right-skewed distribution**: A right-skewed distribution is also called a positively skewed distribution. In a right-skewed distribution, most of the data values occur on the left side, with fewer data values on the right side.

![image.png](attachment:image.png)

Now the picture is not symmetric around the mean anymore. For a right skewed distribution, the mean is typically greater than the median. Also notice that the tail of the distribution on the right hand (positive) side is longer than on the left hand side.

![image.png](attachment:image.png)

From the box and whisker diagram we can also see that the median is closer to the first quartile than the third quartile. The fact that the right hand side tail of the distribution is longer than the left can also be seen.
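
Both observations can be checked numerically with Python's `statistics` module. The sample below is hypothetical and chosen to be right-skewed; other samples may behave differently, since "mean greater than median" is a tendency rather than a rule.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]

mean = statistics.mean(data)
median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method

print(mean > median)               # the mean is pulled toward the long right tail
print(median - q1 < q3 - median)   # the median sits closer to Q1 than to Q3
```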

**A left-skewed distribution**: A left-skewed distribution is also called a negatively skewed distribution. In a left-skewed distribution, most of the data values occur on the right side, with fewer data values on the left side.

**A random distribution**: A random distribution lacks an apparent pattern and has several peaks. In a random-distribution histogram, it can be the case that different data properties were combined. Therefore, the data should be separated and analyzed separately.

![image.png](attachment:image.png)

## Histograms - Why Are They So Useful?

Why are histograms so useful? Well, first of all, charts are much more visual than tables; after looking at a chart for 10 seconds, you can tell much more about your data than after inspecting the corresponding table for 10 seconds. Generally, **charts convey information about our data faster than tables**, albeit less precisely.

On top of that, histograms also give us much **more complete information** about our data. Keep in mind that you can reasonably estimate a variable’s mean, [standard deviation](https://www.spss-tutorials.com/standard-deviation-what-is-it/), skewness and kurtosis from a histogram. However, you can't reconstruct a variable’s histogram from the aforementioned statistics. We'll illustrate this with an example.
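
A small sketch of this point, using two made-up data sets: they share the same mean and (population) standard deviation, and both are symmetric, yet their histograms look nothing alike. Only the mean and standard deviation are matched here; the kurtosis of the two sets differs. The helper `bin_counts` is hypothetical and assumes every value lies within the binned range.

```python
import statistics

def bin_counts(values, low, width, n_bins):
    """Count values per equal-width bin; assumes low <= v <= low + width * n_bins."""
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) // width), n_bins - 1)
        counts[index] += 1
    return counts

a = [1, 1, 1, 1, 5, 5, 5, 5]     # two separate clusters
b = [-1, 3, 3, 3, 3, 3, 3, 7]    # one central cluster plus two extremes

for name, data in (("a", a), ("b", b)):
    print(name, statistics.mean(data), statistics.pstdev(data),
          bin_counts(data, low=-1, width=2, n_bins=5))
```

The summary statistics printed for `a` and `b` agree, while the bin counts show one two-peaked and one single-peaked shape.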

## 3 Things a Histogram can Tell

[Reference](https://blog.minitab.com/blog/3-things-a-histogram-can-tell-you)

Histograms are one of the [most common graphs](https://blog.minitab.com/blog/real-world-quality-improvement/seven-basic-quality-tools-to-keep-in-your-back-pocket) used to display numeric data. Anyone who takes a statistics course is likely to learn about the histogram, and for good reason: histograms are easy to understand and can instantly tell you a lot about your data.

Here are three of the most important things you can learn by looking at a histogram. 

### Shape—Mirror, On the Wall ...

If the left side of a histogram resembles a mirror image of the right side, then the data are said to be symmetric. In this case, the mean (or average) is a good approximation for the center of the data. And we can therefore safely utilize [statistical tools](http://www.minitab.com/products/minitab/) that use the mean to analyze our data, such as t-tests.

If the data are not symmetric, then they are either left-skewed or right-skewed. If the data are skewed, then the [mean may not provide a good estimate](https://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk) for the center of the data or represent where most of the data fall. In this case, you should consider using the median to evaluate the center of the data, rather than the mean.

    Did you know...
    If the data are left-skewed, then the mean is typically LESS THAN the median.    
    If the data are right-skewed, then the mean is typically GREATER THAN the median.

![image.png](attachment:image.png)
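
The direction of skew can also be expressed as a number. The sketch below uses the moment-based sample skewness, g1 = m3 / m2^1.5, without bias correction: positive values suggest right skew, negative values left skew, and values near zero a roughly symmetric shape. The function name and the samples are assumptions for illustration.

```python
def skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]  # hypothetical sample
left_skewed = [-x for x in right_skewed]            # mirror image of the sample above
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)
print(skewness(left_skewed) < 0)
print(abs(skewness(symmetric)) < 1e-12)
```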

### Span—A Little or a Lot?

Suppose you have a data set that contains the salaries of people who work at your organization. It would be interesting to know where the minimum and maximum values fall, and where you are relative to those values. Because histograms use bins to display data—where a bin represents a given range of values—you can’t see exactly what the specific values are for the minimum and maximum, like you can on an [individual value plot](https://blog.minitab.com/blog/real-world-quality-improvement/three-ways-individual-value-plots-can-help-you-analyze-data). However, you can still observe an approximation for the range and see how spread out the data are. And you can answer questions such as "Is there a little bit of variability in my organization's salaries, or a lot?"

![image.png](attachment:image.png)
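
Because a histogram only records bins, the range it reveals is a pair of bounds rather than exact values. A minimal sketch, assuming hypothetical bin edges and counts for a salary histogram (in thousands):

```python
def approximate_range(edges, counts):
    """Return (lower, upper) bounds on the data range from the non-empty bins."""
    occupied = [i for i, c in enumerate(counts) if c > 0]
    if not occupied:
        raise ValueError("histogram has no data")
    lower = edges[occupied[0]]       # left edge of the first non-empty bin
    upper = edges[occupied[-1] + 1]  # right edge of the last non-empty bin
    return lower, upper

edges = [20, 40, 60, 80, 100, 120]  # hypothetical bin edges
counts = [0, 12, 30, 9, 0]          # hypothetical frequencies
print(approximate_range(edges, counts))
```

The true minimum lies somewhere in the first non-empty bin and the true maximum somewhere in the last one, so the returned bounds can overstate the actual span by up to one bin width on each side.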

### Outliers (and the ozone layer)

Outliers can be described as extremely low or high values that do not fall near any other data points. Sometimes outliers represent unusual cases. Other times they represent data entry errors, or perhaps data that does not belong with the other data of interest. Whatever the case may be, outliers can easily be identified using a histogram and should be investigated, as they can reveal interesting information about your data.

![image.png](attachment:image.png)
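
On a histogram, outliers typically show up as bars separated from the main body of the data by one or more empty bins. The sketch below flags such bars; `isolated_bins` and the counts are hypothetical, and the rule is deliberately crude: it would also flag the smaller mode of a bimodal distribution, so flagged bins are candidates to investigate rather than confirmed outliers.

```python
def isolated_bins(counts):
    """Return indices of non-empty bins that are cut off from the main cluster by empty bins."""
    clusters, current = [], []
    for i, c in enumerate(counts):
        if c > 0:
            current.append(i)
        elif current:
            clusters.append(current)  # an empty bin closes the current run
            current = []
    if current:
        clusters.append(current)
    if not clusters:
        return []
    # The run of adjacent bins holding the most observations is the main body.
    main = max(clusters, key=lambda idxs: sum(counts[i] for i in idxs))
    return [i for idxs in clusters if idxs is not main for i in idxs]

counts = [0, 4, 9, 15, 8, 3, 0, 0, 1]  # hypothetical frequencies
print(isolated_bins(counts))
```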

Rewind to the mid-1980s when scientists reported depleting ozone levels above Antarctica. The Goddard Space Center had studied atmospheric ozone levels, but surprisingly didn’t discover the issue. Why? The analysis they used automatically eliminated any Dobson readings below 180 units because ozone levels that low were thought to be impossible.
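
The effect of such an automatic cutoff is easy to reproduce. In the sketch below the readings are invented for illustration and are not real ozone measurements; only the 180-unit threshold comes from the text above.

```python
# Hypothetical readings in Dobson units; the values are invented for illustration.
readings = [310, 295, 302, 288, 175, 160, 299, 150]
THRESHOLD = 180  # cutoff mentioned in the text

kept = [r for r in readings if r >= THRESHOLD]
dropped = [r for r in readings if r < THRESHOLD]

print(min(readings))  # lowest value actually recorded
print(min(kept))      # lowest value that survives the filter
print(len(dropped))   # readings silently removed before any analysis
```

A histogram of `kept` would show no low values at all, whereas a histogram of `readings` would show a separate group of bars on the far left.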

## References

https://searchsoftwarequality.techtarget.com/definition/histogram

https://corporatefinanceinstitute.com/resources/excel/study/histogram/

https://www.spss-tutorials.com/histogram-what-is-it/

https://statistics.laerd.com/statistical-guides/understanding-histograms.php