# Comprehensive Guide to Data Visualization Plots

This notebook walks through 20 common data visualizations. Each section describes the plot, lists its use cases, and gives a Python example that generates it from dummy or sample data.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import squarify
from scipy.cluster.hierarchy import linkage, dendrogram

## 1. Line Plot

**Description:** A line plot displays information as a series of data points, called markers, connected by straight line segments.

**Use Cases:** It is used to show trends over a continuous range, such as time.

In [None]:
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(8, 4))
plt.plot(x, y)
plt.title('Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()

## 2. Bar Plot

**Description:** A bar plot, or bar chart, presents categorical data as rectangular bars whose heights or lengths are proportional to the values they represent.

**Use Cases:** Bar charts are used for comparing the values of different categories.

In [None]:
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]
plt.figure(figsize=(8, 4))
plt.bar(categories, values)
plt.title('Bar Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

## 3. Histogram

**Description:** A histogram approximates the distribution of numerical data by grouping the values into bins and drawing one bar per bin, with the bar height giving the number of values in that bin.

**Use Cases:** It is used to show the frequency distribution of a dataset.

In [None]:
data = np.random.randn(1000)
plt.figure(figsize=(8, 4))
plt.hist(data, bins=30)
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
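
The binning that `plt.hist` performs can also be inspected directly. The sketch below is an illustration rather than part of the original notebook: it uses `np.histogram` with a fixed seed (an assumption made here for reproducibility) to show that 30 bins are bounded by 31 edges and that every value lands in exactly one bin.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
data = rng.standard_normal(1000)

# np.histogram returns the count per bin and the bin edges
counts, edges = np.histogram(data, bins=30)

print(len(counts))   # number of bins
print(len(edges))    # one more edge than bins
print(counts.sum())  # total count equals the number of values
```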

## 4. Scatter Plot

**Description:** A scatter plot uses Cartesian coordinates to display the values of two variables, drawing one point per observation.

**Use Cases:** It is used to show the relationship between two numerical variables.

In [None]:
x = np.random.rand(50)
y = np.random.rand(50)
plt.figure(figsize=(8, 4))
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

## 5. Box Plot

**Description:** A box plot is a method for graphically depicting groups of numerical data through their quartiles.

**Use Cases:** It is used to show the distribution of numerical data and identify outliers.

In [None]:
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.figure(figsize=(8, 4))
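plt.boxplot(data, patch_artist=True)  # vertical boxes are the default
# Label the boxes through the tick labels, which works across Matplotlib versions
# (newer releases deprecate the labels= keyword of boxplot)
plt.xticks([1, 2, 3], ['A', 'B', 'C'])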
plt.title('Box Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
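
To make the link between quartiles and outliers concrete, here is a small sketch of the usual 1.5 × IQR rule, which matches Matplotlib's default whisker setting (`whis=1.5`). The data values are made up for illustration.

```python
import numpy as np

# Illustrative data with one extreme value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(outliers)
```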

## 6. Heatmap

**Description:** A heatmap is a data visualization technique that shows the magnitude of a value as color in a two-dimensional grid.

**Use Cases:** It is used to display matrix-shaped data. A common example is the correlation matrix of the variables in a dataset.

In [None]:
data = np.random.rand(10, 10)
plt.figure(figsize=(8, 6))
sns.heatmap(data, annot=True, cmap='viridis')
plt.title('Heatmap')
plt.show()
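
Since heatmaps are often drawn from correlation matrices, the sketch below builds one with `np.corrcoef`. The three variables are hypothetical: `b` is constructed to follow `a` closely, while `c` is independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
a = rng.standard_normal(200)
b = a + 0.3 * rng.standard_normal(200)  # strongly related to a
c = rng.standard_normal(200)            # unrelated to a and b

# Each row is treated as one variable; the result is a symmetric 3x3 matrix
corr = np.corrcoef([a, b, c])
print(corr.round(2))
```

A matrix like `corr` can be passed to `sns.heatmap(corr, annot=True, vmin=-1, vmax=1)` so that the color scale spans the full range of possible correlations.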

## 7. Pie Chart

**Description:** A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportion.

**Use Cases:** It is used to show the proportion of different categories in a dataset. It is best used for a small number of categories.

In [None]:
labels = 'A', 'B', 'C', 'D'
sizes = [15, 30, 45, 10]
plt.figure(figsize=(6, 6))
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Pie Chart')
plt.axis('equal')
plt.show()

## 8. Violin Plot

**Description:** A violin plot shows the distribution of numeric data. It is similar to a box plot, with the addition of a rotated kernel density plot on each side.

**Use Cases:** It is used to visualize the distribution of the data and its probability density.

In [None]:
df = sns.load_dataset('tips')
plt.figure(figsize=(8, 6))
sns.violinplot(x='day', y='total_bill', data=df)
plt.title('Violin Plot')
plt.show()

## 9. Area Plot

**Description:** An area plot, or area chart, displays quantitative data graphically. It is based on the line chart, with the region between the line and the axis filled in.

**Use Cases:** It is used to show the trend of quantitative data over time.

In [None]:
x = range(1, 6)
y = [1, 4, 6, 8, 4]
plt.figure(figsize=(8, 4))
plt.fill_between(x, y)
plt.title('Area Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

## 10. Pair Plot

**Description:** A pair plot is a matrix of scatterplots that allows you to understand the pairwise relationship between different variables in a dataset.

**Use Cases:** It is used to visualize the relationships between all pairs of numerical variables in a dataset.

In [None]:
df = sns.load_dataset('iris')
sns.pairplot(df, hue='species')
plt.suptitle('Pair Plot', y=1.02)
plt.show()

## 11. Bubble Plot

**Description:** A bubble plot is a scatter plot where a third dimension of the data is shown through the size of markers.

**Use Cases:** It is used to show the relationship between three numerical variables.

In [None]:
x = np.random.rand(20)
y = np.random.rand(20)
z = np.random.rand(20) * 1000
plt.figure(figsize=(8, 6))
plt.scatter(x, y, s=z, alpha=0.5)
plt.title('Bubble Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

## 12. Stacked Bar Plot

**Description:** A stacked bar plot is a variation of the bar plot, where each bar is divided into a number of sub-bars stacked one on top of the other.

**Use Cases:** It is used to compare the total values across categories and also show the composition of each category.

In [None]:
categories = ['A', 'B', 'C', 'D']
series1 = np.array([1, 2, 3, 4])
series2 = np.array([4, 3, 2, 1])
series3 = np.array([2, 2, 2, 2])
plt.figure(figsize=(8, 4))
plt.bar(categories, series1, label='Series 1')
plt.bar(categories, series2, bottom=series1, label='Series 2')
plt.bar(categories, series3, bottom=series1+series2, label='Series 3')
plt.title('Stacked Bar Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.legend()
plt.show()
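
With more than a few series, writing each `bottom=` sum by hand becomes error-prone. One possible approach, sketched here with the same numbers as above, is to derive every bottom offset from a cumulative sum.

```python
import numpy as np

# One row per series, one column per category (same values as above)
series = np.array([
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [2, 2, 2, 2],
])

tops = np.cumsum(series, axis=0)  # height reached after stacking each series
bottoms = tops - series           # where each series has to start

print(bottoms)
```

Row `i` can then be drawn with `plt.bar(categories, series[i], bottom=bottoms[i])`.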

## 13. Radar Chart

**Description:** A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point.

**Use Cases:** It is used to compare multiple quantitative variables.

In [None]:
labels = np.array(['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4', 'Feature 5'])
stats = np.array([10, 20, 15, 5, 25])
# One evenly spaced angle per feature
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
# Repeat the first point at the end so the polygon closes
stats = np.concatenate((stats, [stats[0]]))
angles = np.concatenate((angles, [angles[0]]))
fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(polar=True))
ax.plot(angles, stats, color='red', linewidth=2)
ax.fill(angles, stats, color='red', alpha=0.25)
ax.set_yticklabels([])
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
plt.title('Radar Chart')
plt.show()

## 14. Funnel Plot

**Description:** A funnel plot, also called a funnel chart, represents the successive stages of a process, such as a sales pipeline, and shows how much of the starting quantity remains at each stage.

**Use Cases:** It is used to visualize the flow of users through a process or a sales funnel.

In [None]:
data = dict(
    number=[39, 27.4, 20.6, 11, 2],
    stage=["Website visit", "Downloads", "Potential customers", "Requested price", "Invoice sent"])
fig = px.funnel(data, x='number', y='stage')
fig.update_layout(title_text='Funnel Plot')
fig.show()

## 15. Treemap

**Description:** A treemap is a method for displaying hierarchical data using nested figures, usually rectangles.

**Use Cases:** It is used to show hierarchical data in a compact way.

In [None]:
sizes = [50, 25, 12, 13]
labels = ['A', 'B', 'C', 'D']
plt.figure(figsize=(8, 6))
squarify.plot(sizes=sizes, label=labels, alpha=0.8)
plt.title('Treemap')
plt.axis('off')
plt.show()

## 16. Chord Diagram

**Description:** A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix.

**Use Cases:** It is used to visualize the flows or connections between several entities.

In [None]:
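# plotly.graph_objects has no Chord trace, so this cell draws a simplified chord
# diagram with matplotlib: nodes on a circle, joined by curves whose width scales with the value.
from matplotlib.path import Path
from matplotlib.patches import PathPatch
labels = ["A", "B", "C", "D", "E"]
matrix = np.array([
    [0, 5, 10, 15, 20],
    [5, 0, 25, 30, 35],
    [10, 25, 0, 40, 45],
    [15, 30, 40, 0, 50],
    [20, 35, 45, 50, 0]
])
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
points = np.column_stack([np.cos(angles), np.sin(angles)])
fig, ax = plt.subplots(figsize=(6, 6))
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):  # the matrix is symmetric, so draw each pair once
        # Quadratic Bezier curve from node i to node j, bent toward the center
        path = Path([points[i], (0, 0), points[j]],
                    [Path.MOVETO, Path.CURVE3, Path.CURVE3])
        ax.add_patch(PathPatch(path, facecolor='none', edgecolor=f'C{i}',
                               linewidth=matrix[i, j] / 10, alpha=0.6))
ax.scatter(points[:, 0], points[:, 1], color='black', zorder=3)
for (x, y), label in zip(points, labels):
    ax.text(1.12 * x, 1.12 * y, label, ha='center', va='center')
ax.set_xlim(-1.3, 1.3)
ax.set_ylim(-1.3, 1.3)
ax.set_aspect('equal')
ax.axis('off')
ax.set_title('Chord Diagram')
plt.show()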

## 17. Sunburst Chart

**Description:** A sunburst chart is a type of chart that is used to visualize hierarchical data. It consists of an inner circle surrounded by rings of deeper hierarchy levels.

**Use Cases:** It is used to show how a whole is divided into parts and sub-parts.

In [None]:
df = px.data.tips()
fig = px.sunburst(df, path=['day', 'time', 'sex'], values='total_bill')
fig.update_layout(title_text='Sunburst Chart')
fig.show()

## 18. Waterfall Plot

**Description:** A waterfall plot is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values.

**Use Cases:** It is used to show how an initial value is affected by a series of intermediate positive or negative values.

In [None]:
fig = go.Figure(go.Waterfall(
    name = "20", orientation = "v",
    measure = ["relative", "relative", "total", "relative", "relative", "total"],
    x = ["Sales", "Consulting", "Net revenue", "Purchases", "Other expenses", "Profit before tax"],
    textposition = "outside",
    text = ["+60", "+80", "", "-40", "-20", "Total"],
    y = [60, 80, 0, -40, -20, 0],
    connector = {"line":{"color":"rgb(63, 63, 63)"}},
))

fig.update_layout(
    title = "Waterfall Plot",
    showlegend = True
)

fig.show()
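
The bar levels in the chart above follow from simple running arithmetic. The sketch below reproduces them in plain Python, assuming the semantics described in Plotly's documentation: a `"relative"` entry adds its value to the running sum, and a `"total"` entry ignores its `y` value and shows the sum so far.

```python
measure = ["relative", "relative", "total", "relative", "relative", "total"]
y = [60, 80, 0, -40, -20, 0]

running = 0
levels = []
for kind, delta in zip(measure, y):
    if kind == "relative":
        running += delta  # a relative bar moves the running sum up or down
    # a "total" bar adds nothing; it only displays the running sum
    levels.append(running)

print(levels)  # [60, 140, 140, 100, 80, 80]
```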

## 19. Contingency Table

**Description:** A contingency table is a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables.

**Use Cases:** It is used to summarize the relationship between several categorical variables. It is often visualized as a heatmap.

In [None]:
df = sns.load_dataset('tips')
contingency_table = pd.crosstab(df['day'], df['time'])
plt.figure(figsize=(8, 6))
sns.heatmap(contingency_table, annot=True, cmap='viridis')
plt.title('Contingency Table Heatmap')
plt.show()
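
The table itself is easier to follow on a tiny dataset. The sketch below uses a made-up six-row frame (not the `tips` dataset) to show the raw counts and, through the `normalize="index"` option of `pd.crosstab`, the share of each column within a row.

```python
import pandas as pd

# Hypothetical observations: one row per visit
df_small = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun", "Thur"],
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Dinner", "Lunch"],
})

counts = pd.crosstab(df_small["day"], df_small["time"])
# normalize="index" divides each row by its total, giving per-day proportions
row_share = pd.crosstab(df_small["day"], df_small["time"], normalize="index")

print(counts)
print(row_share.round(2))
```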

## 20. Dendrogram

**Description:** A dendrogram is a tree diagram. In hierarchical clustering, it shows the order in which observations and clusters are merged, with the height of each join indicating the distance between the merged groups.

**Use Cases:** It is used to illustrate the arrangement of the clusters produced by hierarchical clustering.

In [None]:
X = np.random.rand(15, 3)
linked = linkage(X, 'ward')
plt.figure(figsize=(10, 7))
dendrogram(linked,
           orientation='top',
           distance_sort='descending',
           show_leaf_counts=True)
plt.title('Dendrogram')
plt.show()
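
A dendrogram is often read by cutting it at some height to obtain flat clusters. As a rough sketch of that step, SciPy's `fcluster` can cut the same kind of linkage matrix into at most a chosen number of clusters. The choice of three clusters and the fixed seed are assumptions made for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((15, 3))
linked = linkage(X, 'ward')

# criterion='maxclust' cuts the tree so that no more than t clusters remain
cluster_ids = fcluster(linked, t=3, criterion='maxclust')

print(cluster_ids)  # one cluster label per observation, numbered from 1
```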