
# Section 2 - Python example - visual data analysis with Matplotlib
Visual data analysis is an integral part of exploratory data analysis (EDA): plots make trends, distributions, and anomalies visible that are hard to spot in raw data alone. Matplotlib, a versatile Python plotting library, is widely used to create static, animated, and interactive visualizations. This section walks through a Python example of visual data analysis with Matplotlib, covering common plot types and what each reveals about the data.

1. Setting Up the Environment:

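To follow along, make sure your Python environment includes Matplotlib, along with pandas and NumPy, which the examples below also use. If any of them is missing, install it with pip (in a notebook, the %pip magic installs into the environment of the running kernel):

In [None]:
%pip install matplotlib pandas numpy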

2. Importing Required Libraries:

Alongside Matplotlib, it is common to use pandas for handling data, since its data structures and manipulation methods integrate well with plotting functions. NumPy is imported here as well, because the histogram and scatter plot examples use it to generate random sample data.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

3. Preparing Data:

For this example, let’s use a simple dataset that represents sales data across different regions:

In [None]:
# Create a sample DataFrame: one row per month, with the sales figure and its region
data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [200, 220, 250, 210, 215, 235],
    'Region': ['East', 'West', 'East', 'West', 'East', 'West'],
})

4. Basic Line Plot:

A line plot is suitable for visualizing data changes over time, showing trends in a straightforward manner.

In [None]:
# Plotting monthly sales data
plt.figure(figsize=(10, 5))
plt.plot(data['Month'], data['Sales'], marker='o')
plt.title('Monthly Sales Data')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
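
Because the sample data also carries a Region column, the same trend can be split into one line per region. The following is a minimal sketch rather than part of the original example; it rebuilds the sample DataFrame so that it runs on its own, and the use of numeric x positions is a choice made here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as above, repeated so this cell is self-contained
data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [200, 220, 250, 210, 215, 235],
    'Region': ['East', 'West', 'East', 'West', 'East', 'West'],
})

fig, ax = plt.subplots(figsize=(10, 5))

# Draw one line per region; group.index holds each row's position (0-5),
# which keeps the months in calendar order on the shared x axis
for region, group in data.groupby('Region'):
    ax.plot(group.index, group['Sales'], marker='o', label=region)

# Replace the numeric positions with month names
ax.set_xticks(list(range(len(data))))
ax.set_xticklabels(data['Month'].tolist())

ax.set_title('Monthly Sales by Region')
ax.set_xlabel('Month')
ax.set_ylabel('Sales')
ax.grid(True)
ax.legend(title='Region')
plt.show()
```

Plotting against numeric positions is deliberate: when strings are passed as x values, Matplotlib places categories in order of first appearance, so the East series (Jan, Mar, May) would push Feb, Apr, and Jun to the end of the axis.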

5. Bar Plot for Categorical Comparison:

Bar plots are useful for comparing quantities corresponding to different groups or categories.

In [None]:
# Aggregate sales by region
region_sales = data.groupby('Region')['Sales'].sum()
# Creating a bar plot
plt.figure(figsize=(7, 4))
plt.bar(region_sales.index, region_sales.values, color='blue')
plt.title('Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.show()
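
When bars are close in height, printing the value on each bar makes the comparison easier to read. The sketch below is one possible variant of the bar plot above, assuming Matplotlib 3.4 or later (the version that introduced bar_label); it rebuilds the sample data so that it runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as above, repeated so this cell is self-contained
data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [200, 220, 250, 210, 215, 235],
    'Region': ['East', 'West', 'East', 'West', 'East', 'West'],
})

# Total sales per region
region_sales = data.groupby('Region')['Sales'].sum()

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(region_sales.index, region_sales.values, color='blue')

# Write each bar's value above it (requires Matplotlib 3.4+)
ax.bar_label(bars)

ax.set_title('Sales by Region')
ax.set_xlabel('Region')
ax.set_ylabel('Total Sales')
plt.show()
```

With this sample data both regions happen to total 665, so the two bars are the same height; the labels make that explicit instead of leaving the reader to compare bar tops by eye.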

6. Histogram for Distribution Analysis:

Histograms show how the values of a continuous variable are distributed by counting how many observations fall into each bin.

In [None]:
# Generate a larger dataset for histogram
np.random.seed(0)
sales_data = np.random.normal(loc=200, scale=30, size=300)
# Creating a histogram
plt.figure(figsize=(7, 4))
plt.hist(sales_data, bins=15, color='green', alpha=0.7)
plt.title('Sales Distribution')
plt.xlabel('Sales')
plt.ylabel('Frequency')
plt.show()
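
A histogram is often easier to interpret with the sample mean and spread marked on it. The following sketch regenerates the same synthetic data (same seed and parameters as above) and overlays the mean and a band of one standard deviation on each side; the choice of overlays is an illustration, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same synthetic data as above (seeded so the result is reproducible)
np.random.seed(0)
sales_data = np.random.normal(loc=200, scale=30, size=300)

# Summary statistics of the sample
mean_sales = sales_data.mean()
std_sales = sales_data.std(ddof=1)  # sample standard deviation

fig, ax = plt.subplots(figsize=(7, 4))

# hist returns the count in each bin and the bin edges
counts, bin_edges, _ = ax.hist(sales_data, bins=15, color='green', alpha=0.7)

# Mark the mean and shade one standard deviation on either side of it
ax.axvline(mean_sales, color='black', linestyle='--',
           label=f'Mean = {mean_sales:.1f}')
ax.axvspan(mean_sales - std_sales, mean_sales + std_sales,
           color='gray', alpha=0.2, label='Mean ± 1 std')

ax.set_title('Sales Distribution')
ax.set_xlabel('Sales')
ax.set_ylabel('Frequency')
ax.legend()
plt.show()
```

The returned counts and bin_edges arrays hold the numbers behind the bars, which is useful when the binned values are needed for a table or a further calculation.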

7. Scatter Plot for Relationship Observations:

Scatter plots show the relationship between two continuous variables by drawing one point per observation.

In [None]:
# Simulate a second variable that depends linearly on sales, plus random noise
data['Customer Traffic'] = data['Sales'] * 1.5 + np.random.normal(50, 10, size=len(data))
# Creating a scatter plot
plt.figure(figsize=(7, 4))
plt.scatter(data['Sales'], data['Customer Traffic'], color='red')
plt.title('Sales vs. Customer Traffic')
plt.xlabel('Sales')
plt.ylabel('Customer Traffic')
plt.grid(True)
plt.show()
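
The relationship seen in a scatter plot can also be quantified. The sketch below computes the Pearson correlation coefficient and fits a straight line by least squares, then draws that line over the points. It rebuilds the sample data with a fixed seed so that it runs on its own; the simulated values will therefore not necessarily match those in the cell above, and the fitted line is an addition for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the sample data; the seed is an assumption made here for reproducibility
np.random.seed(0)
data = pd.DataFrame({'Sales': [200, 220, 250, 210, 215, 235]})
data['Customer Traffic'] = (
    data['Sales'] * 1.5 + np.random.normal(50, 10, size=len(data))
)

# Pearson correlation between the two columns (between -1 and 1)
r = data['Sales'].corr(data['Customer Traffic'])

# Least-squares straight line: traffic is approximated by slope * sales + intercept
slope, intercept = np.polyfit(data['Sales'], data['Customer Traffic'], deg=1)

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(data['Sales'], data['Customer Traffic'], color='red')

# Draw the fitted line across the observed range of sales
x_line = np.linspace(data['Sales'].min(), data['Sales'].max(), 50)
ax.plot(x_line, slope * x_line + intercept, color='black',
        label=f'Linear fit (r = {r:.2f})')

ax.set_title('Sales vs. Customer Traffic')
ax.set_xlabel('Sales')
ax.set_ylabel('Customer Traffic')
ax.grid(True)
ax.legend()
plt.show()
```

With only six points, the coefficient and the fitted slope are rough estimates; they summarize the pattern in the plot rather than establish it.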

8. Box Plot for Statistical Summaries:

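Box plots summarize numerical data through its quartiles: the box spans the first to the third quartile, a line marks the median, and points beyond the whiskers (by default, more than 1.5 times the interquartile range from the box) are drawn individually as potential outliers.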

In [None]:
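# Box plot of both variables side by side
plt.figure(figsize=(7, 4))
plt.boxplot([data['Sales'], data['Customer Traffic']])
# Label the boxes via the x ticks (boxes sit at positions 1 and 2 by default);
# this avoids the labels= keyword, which Matplotlib 3.9 renamed to tick_labels
plt.xticks([1, 2], ['Sales', 'Customer Traffic'])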
plt.title('Sales and Customer Traffic Distribution')
plt.ylabel('Values')
plt.grid(True)
plt.show()
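
The numbers behind a box plot can be computed directly. The sketch below works out the quartiles, the interquartile range (IQR), and the outlier fences for the Sales column, assuming Matplotlib's default whisker setting of 1.5 times the IQR and NumPy's default linear interpolation for percentiles.

```python
import numpy as np

# Sales values from the sample DataFrame
sales = np.array([200, 220, 250, 210, 215, 235])

# Quartiles: 25th, 50th (median), and 75th percentiles
q1, median, q3 = np.percentile(sales, [25, 50, 75])

# Interquartile range and the fences beyond which points count as outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences would be drawn as individual points
outliers = sales[(sales < lower_fence) | (sales > upper_fence)]

print(f'Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}')
print(f'Fences: [{lower_fence}, {upper_fence}]')
print(f'Outliers: {outliers}')
```

For this sample the quartiles are 211.25, 217.5, and 231.25, giving fences at 181.25 and 261.25. All six sales values fall inside them, so the Sales box shows no outlier markers.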

9. Conclusion:

Visual data analysis using Matplotlib enables data scientists to quickly identify patterns, trends, and outliers in data. The ability to visualize complex information through plots and charts not only facilitates a deeper understanding of the data but also assists in communicating findings clearly and effectively to stakeholders. Mastery of these visualization techniques is a crucial skill for any data analyst or scientist, significantly enhancing the exploratory data analysis process.