<a href="https://colab.research.google.com/github/SSSpock/skillspire/blob/main/datavis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Introduction to Data Visualization

## Importance of Data Visualization
Data visualization is a key aspect of data science, as it allows us to better understand and communicate complex information. By representing data in a visual format, we can:

- Quickly grasp essential information and patterns in the data
- Identify outliers and anomalies
- Make informed decisions based on the data
- Communicate our findings to others in a clear and concise manner

## Types of Data Visualizations
There are many types of data visualizations, each suited to different types of data and purposes. Some common types include:

- Line plots: Display continuous data over time or another variable
- Bar plots: Compare categorical data or show the distribution of values across categories
- Scatter plots: Show the relationship between two continuous variables
- Histograms: Display the distribution of a continuous variable
- Pie charts: Show proportions of a whole
- Heatmaps: Represent data in a matrix format, using colors to indicate values
- Box plots: Visualize the distribution of data and identify outliers
- Area charts: Display continuous data over time or another variable, with emphasis on magnitude
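Two of these chart types, heatmaps and area charts, do not appear in the hands-on exercises later in this notebook. The sketch below uses Matplotlib, which is introduced in the next section, to draw one of each. It assumes `imshow()` for the heatmap and `fill_between()` for the area chart; these are common choices but not the only ones. The matrix and the time series are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up 3x4 matrix for the heatmap (illustrative values only)
matrix = np.arange(12).reshape(3, 4)

# Made-up time series for the area chart
t = np.arange(10)
values = np.array([1, 3, 2, 4, 6, 5, 7, 8, 6, 9])

fig, (ax_heat, ax_area) = plt.subplots(1, 2, figsize=(10, 4))

# Heatmap: imshow maps each matrix cell to a color; the colorbar shows the scale
im = ax_heat.imshow(matrix, cmap='viridis')
fig.colorbar(im, ax=ax_heat)
ax_heat.set_title('Heatmap')

# Area chart: fill the region between the curve and y = 0, then draw the curve on top
ax_area.fill_between(t, values, alpha=0.4)
ax_area.plot(t, values)
ax_area.set_title('Area Chart')

plt.show()
```

The filled region is what gives an area chart its emphasis on magnitude. The outline drawn with `plot()` is optional.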

## Applications in Data Science
Data visualization plays a crucial role in various stages of data science, including:

- Exploratory Data Analysis (EDA): Visualization helps identify patterns, trends, and relationships in the data, which can guide the development of hypotheses and inform feature selection.
- Model Evaluation: Visualizing model performance, such as learning curves or confusion matrices, can help us understand the strengths and weaknesses of our models and guide improvements.
- Reporting and Communication: Creating clear and informative visualizations can help communicate our findings to stakeholders and facilitate data-driven decision-making.

# Introduction to Matplotlib

## Overview of the Matplotlib Library

Matplotlib is a powerful and popular Python library for creating static, animated, and interactive visualizations. It provides a wide range of plotting functions, enabling users to create various types of visualizations with ease.

Some key features of Matplotlib include:

- High-quality output in multiple formats (PNG, PDF, SVG, etc.)
- Extensive customization options for plot appearance
- Integration with Jupyter Notebook and other Python libraries
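Multi-format output is handled by `savefig()`, which picks the output format from the file extension. The minimal sketch below writes one figure as PNG, PDF, and SVG. The file name `example` and the temporary directory are arbitrary choices made for this illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title('Saved Figure')

# Save the same figure in three formats; the extension selects the format.
out_dir = tempfile.mkdtemp()
paths = []
for ext in ('png', 'pdf', 'svg'):
    path = os.path.join(out_dir, f'example.{ext}')
    # dpi matters for raster output (PNG); bbox_inches='tight' trims extra whitespace
    fig.savefig(path, dpi=150, bbox_inches='tight')
    paths.append(path)

print(paths)
```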

## Basic Plotting Functions

Matplotlib provides several basic plotting functions for creating various types of plots. Here are a few common ones:

- plot(): Creates line plots
- scatter(): Creates scatter plots
- bar(): Creates bar plots
- hist(): Creates histograms
- boxplot(): Creates box plots
- pie(): Creates pie charts
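The hands-on exercise below covers plot(), scatter(), bar(), and hist(), but not boxplot() or pie(). Here is a short sketch of those two functions. It uses randomly generated samples with a fixed seed and made-up proportions, purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Three made-up samples with different centers and spreads
samples = [rng.normal(0, 1, 200), rng.normal(1, 2, 200), rng.normal(-1, 0.5, 200)]

# Made-up parts of a whole
shares = [45, 30, 25]
labels = ['A', 'B', 'C']

fig, (ax_box, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# boxplot() draws one box per sample; points beyond the whiskers are drawn as outliers
bp = ax_box.boxplot(samples)
ax_box.set_title('Box Plot')

# pie() turns the values into fractions of their sum; autopct prints each percentage
wedges, texts, autotexts = ax_pie.pie(shares, labels=labels, autopct='%1.0f%%')
ax_pie.set_title('Pie Chart')

plt.show()
```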

## Customizing Plots
With Matplotlib, you can customize various aspects of your plots to make them more informative and visually appealing. Some common customizations include:

- Adding titles, axis labels, and legends
- Changing line styles, colors, and markers
- Modifying axis limits and ticks
- Annotating plots with text, arrows, or other shapes
- Applying gridlines and background colors
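The later exercises demonstrate titles, labels, legends, and line styles. The sketch below covers the remaining items in this list: axis limits, tick positions, an arrow annotation, gridlines, and a background color. The data is a simple sine curve. The annotation text and positions are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, label='sin(x)')

# Fix the visible range of each axis and choose explicit x tick positions
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)
ax.set_xticks([0, 2, 4, 6, 8, 10])

# Point an arrow at the first peak (x = pi/2, y = 1) from a text label
ax.annotate('peak', xy=(np.pi / 2, 1), xytext=(3, 1.3),
            arrowprops=dict(arrowstyle='->'))

# Dotted gridlines and a light background color for the plotting area
ax.grid(True, linestyle=':')
ax.set_facecolor('whitesmoke')

ax.set_title('Customized Plot')
ax.legend()
plt.show()
```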

# Understanding Matplotlib Control Patterns

In this section, we will explain the high-level control patterns in Matplotlib, specifically focusing on the use of functions like plt.title() and how they can access and modify elements of your plots.

## Matplotlib's Object Hierarchy
At its core, Matplotlib is built on an object-oriented hierarchy. At the top of this hierarchy is the Figure object, which represents the entire figure (the canvas that everything is drawn on) and can contain one or more Axes objects. Each Axes object represents a single plot within the figure and contains elements such as lines, markers, labels, and more.

When you create a plot using pyplot functions like plt.plot(), you are implicitly working with Figure and Axes objects in the background. The pyplot interface (conventionally imported as plt) provides a simple way to interact with these objects without referencing them explicitly.
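To make the hierarchy visible, the sketch below plots in the implicit pyplot style. It then uses plt.gcf() ("get current figure") and plt.gca() ("get current Axes") to reach the objects that pyplot created, and walks from the Figure down to the line.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# Implicit style: pyplot creates the Figure and the Axes behind the scenes
plt.figure()
line, = plt.plot([1, 2, 3], [4, 5, 6])

fig = plt.gcf()  # the current Figure
ax = plt.gca()   # the current Axes inside that Figure

print(isinstance(fig, Figure), isinstance(ax, Axes))  # True True
print(len(fig.axes))                                  # 1 (Figure -> one Axes)
print(ax.lines[0] is line)                            # True (Axes -> Line2D)
print(line.axes is ax, ax.figure is fig)              # True True (links back up)
```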

## The Pyplot State Machine
Matplotlib's pyplot interface operates as a state machine. It keeps track of the current Figure and Axes objects, and most pyplot functions implicitly target the current objects. When you call functions like plt.title(), they access and modify elements of the current Axes object.

Here's a brief overview of how some common pyplot functions interact with the object hierarchy:

- plt.figure(): Creates a new Figure object and sets it as the current figure
- plt.plot(), plt.scatter(), etc.: Create the corresponding plot element within the current Axes object, first creating a Figure and an Axes if none exist yet
- plt.title(), plt.xlabel(), plt.ylabel(): Set the title, x-axis label, or y-axis label of the current Axes object
- plt.legend(): Adds a legend to the current Axes object
- plt.show(): Displays all open figures
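Because pyplot functions always act on whatever is "current", the order of calls matters. In the sketch below we create two figures. We then pass the first figure's number to plt.figure() to make it current again, so that a later plt.xlabel() call lands on the first figure rather than the most recent one.

```python
import matplotlib.pyplot as plt

fig1 = plt.figure()           # fig1 becomes the current figure
plt.plot([0, 1], [0, 1])
plt.title('First figure')     # applies to fig1's current Axes

fig2 = plt.figure()           # now fig2 is the current figure
plt.plot([0, 1], [1, 0])
plt.title('Second figure')    # applies to fig2's Axes

# Passing an existing figure number makes that figure current again
plt.figure(fig1.number)
plt.xlabel('x (added later)') # goes to fig1, not fig2

print(fig1.axes[0].get_title(), '|', fig1.axes[0].get_xlabel())
print(fig2.axes[0].get_title(), '|', fig2.axes[0].get_xlabel())
```

The second figure's x-axis label stays empty, even though that figure was created more recently.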

## Working with Multiple Figures and Axes
In some cases, you may want to create multiple figures, or multiple Axes within a single figure. You can achieve this by explicitly creating and referencing Figure and Axes objects. The plt.subplots() function creates a Figure together with a grid of Axes in a single call.

For example, to create a 2x2 grid of subplots, you can do the following:

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate data for line plot
x_line = np.linspace(0, 2 * np.pi, 100)
y_line = np.sin(x_line)

# Generate data for scatter plot
x_scatter = np.random.rand(50)
y_scatter = np.random.rand(50)

# Generate data for bar plot
x_bar = ['A', 'B', 'C', 'D', 'E']
y_bar = [3, 7, 2, 5, 8]

# Generate data for histogram
data_hist = np.random.randn(1000)

# Create a 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# Line plot
axes[0, 0].plot(x_line, y_line)
axes[0, 0].set_title('Line Plot')

# Scatter plot
axes[0, 1].scatter(x_scatter, y_scatter)
axes[0, 1].set_title('Scatter Plot')

# Bar plot
axes[1, 0].bar(x_bar, y_bar)
axes[1, 0].set_title('Bar Plot')

# Histogram
axes[1, 1].hist(data_hist, bins=30)
axes[1, 1].set_title('Histogram')

plt.show()
```

By explicitly referencing the Axes objects, you can apply customizations and other functions to each individual plot.

In summary, Matplotlib's pyplot interface provides a high-level way to interact with the underlying object hierarchy, making it easy to create and customize plots. Functions like plt.title() work with the current Figure and Axes objects to modify their elements, while more advanced control can be achieved by explicitly working with the objects themselves.
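To compare the two styles directly, the sketch below builds the same labeled sine plot twice: once through pyplot state and once through an explicit Axes. Most pyplot setters have an Axes method counterpart, usually with a set_ prefix: plt.title() corresponds to ax.set_title(), plt.xlabel() to ax.set_xlabel(), and plt.xlim() to ax.set_xlim().

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# pyplot style: each call targets the current Axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title('sin(x)')
plt.xlabel('x')
ax_implicit = plt.gca()

# Object-oriented style: call methods on an Axes we hold a reference to
fig, ax_explicit = plt.subplots()
ax_explicit.plot(x, np.sin(x))
ax_explicit.set_title('sin(x)')
ax_explicit.set_xlabel('x')

# Both Axes end up with the same title and x-axis label
print(ax_implicit.get_title() == ax_explicit.get_title())    # True
print(ax_implicit.get_xlabel() == ax_explicit.get_xlabel())  # True
plt.show()
```

The explicit style is usually easier to follow once a figure has more than one Axes, because every call names the Axes it modifies.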

## Hands-on Exercise: Creating and Customizing Basic Plots with Matplotlib

In this section, we will demonstrate how to create and customize various types of plots using Matplotlib's basic plotting functions. We will start with a simple line plot and then move on to other types of plots, such as scatter plots and bar plots. Finally, we will explore various customization techniques to enhance our visualizations.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a line plot
plt.plot(x, y)
plt.show()
```


## Creating Scatter and Bar Plots
Now, let's create a scatter plot and a bar plot using the scatter() and bar() functions, respectively.

```python
# Generate random data
x_scatter = np.random.rand(50)
y_scatter = np.random.rand(50)

# Create a scatter plot
plt.scatter(x_scatter, y_scatter)
plt.show()

# Generate data for a bar plot
x_bar = ['A', 'B', 'C', 'D', 'E']
y_bar = [3, 7, 2, 5, 8]

# Create a bar plot
plt.bar(x_bar, y_bar)
plt.show()
```


## Customizing Plots
Now, let's explore various customization techniques to enhance our plots. We will modify the appearance of our line plot and add titles, labels, and legends to our scatter and bar plots.

```python
# Create a customized line plot
plt.plot(x, y, linestyle='--', color='red', linewidth=2, marker='o', markersize=5)
plt.title('Sine Wave')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.grid(True)
plt.show()
```
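plot() also accepts a compact format string that combines color, line style, and marker. The sketch below draws the sine wave with keyword arguments on one Axes and with the shorthand 'r--o' on another. In that string, 'r' means red, '--' means a dashed line, and 'o' means circle markers. Width and size settings such as linewidth still need keyword arguments.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, (ax_kw, ax_fmt) = plt.subplots(1, 2, figsize=(10, 4))

# Keyword-argument form
line_kw, = ax_kw.plot(x, y, linestyle='--', color='red', marker='o')
ax_kw.set_title('Keyword arguments')

# Format-string shorthand: 'r' = red, '--' = dashed, 'o' = circle markers
line_fmt, = ax_fmt.plot(x, y, 'r--o')
ax_fmt.set_title("Format string 'r--o'")

plt.show()
```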


## Adding Titles, Labels, and Legends to Scatter and Bar Plots

```python
# Create a customized scatter plot
plt.scatter(x_scatter, y_scatter, color='blue', marker='s', s=50, alpha=0.5, label='Random Points')
plt.title('Scatter Plot')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.legend(loc='upper left')
plt.show()

# Create a customized bar plot
plt.bar(x_bar, y_bar, color='green', edgecolor='black', width=0.5, label='Categories')
plt.title('Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper right')
plt.show()
```
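A legend is most useful when several labeled series share one Axes. In the bar plot above, the single label 'Categories' produces only one legend entry for the whole set of bars. The sketch below instead draws two series side by side, so each one gets its own entry. The values and the names 'Series 1' and 'Series 2' are made up, and offsetting by half a bar width is one common way to group bars, not the only one.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D', 'E']
series_1 = [3, 7, 2, 5, 8]   # made-up values
series_2 = [4, 6, 3, 6, 7]   # made-up values

positions = np.arange(len(categories))
width = 0.4

fig, ax = plt.subplots()
# Shift each series by half a bar width so the bars sit next to each other
ax.bar(positions - width / 2, series_1, width, label='Series 1')
ax.bar(positions + width / 2, series_2, width, label='Series 2')

# Put one tick under each category and label it with the category name
ax.set_xticks(positions)
ax.set_xticklabels(categories)
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Grouped Bar Plot')

# The legend gets one entry per labeled series
legend = ax.legend(loc='upper left')
plt.show()
```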
