<a href="https://colab.research.google.com/github/dnmuasya/Case_Study_1/blob/main/%5BWorkshop_Notebook%5D_AfterWork_Data_Visualization_Best_Practices_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font color="blue">To use this notebook on Colaboratory, you will need to make a copy of it. Go to File > Save a Copy in Drive. You can then use the new copy that will appear in the new tab.</font>

### Table of contents
This notebook covers the following data visualization best practices:

1. Clarity and comprehension in data visualization
2. Choosing the right visualization type
3. Format styles
> - Color
> - Labelling
> - Use of gridlines
4. Highlighting what's important



# AfterWork: Data Visualization Best Practices with Python Course

## Prerequisites

Import the libraries below before running the rest of the notebook.

In [None]:
# importing matplotlib for plotting
# ---
#
import matplotlib.pyplot as plt

# importing seaborn for plotting
# ---
#
import seaborn as sns

# importing numpy for scientific computations
# ---
#
import numpy as np

# importing pandas for data manipulation
# ---
#
import pandas as pd

We will use both `Matplotlib` and `Seaborn`: `Matplotlib` is a versatile, general-purpose plotting library, and `Seaborn` builds on it with additional functionality and more polished default aesthetics.

## 1. Basic Overview

Following best practices when we visualize data helps ensure that the decisions based on our charts are sound.

**Clarity and comprehension** is a key best practice: when a chart fails to communicate what the data represents, readers may misinterpret the data.

#### Example 1

In the example below, we will look at a dataset containing `Average Score` by `Years of Experience`, and show how clarity and comprehension can be used as a data visualization best practice through the use of:
> - Chart titles and descriptions


In [None]:
# Dataset URL (CSV) = https://bit.ly/avg-scores-csv

# Prepare dataset
scores_df = pd.read_csv('https://bit.ly/avg-scores-csv', sep=',')

# View dataframe
scores_df.head()

In [None]:
# Set plot size for better visibility
plt.figure(figsize=(8, 6))

# Plot line chart
plt.plot(scores_df["Years of Experience"], # Define the x-axis
         scores_df["Average Score"], # Define y-axis
         color="green", label="Average Score") # Style chart

# Add a title
plt.title('Average Score by Years of Experience')

# Add an x-axis label with clear description
plt.xlabel('Years of Experience')

# Add a y-axis label with clear description
plt.ylabel('Average Score')

# Add a legend for better understanding
plt.legend()

# Display the plot
plt.show()

In the example above, we used the following techniques to make the plot easy to understand:

* The code produces a clear visualization through descriptive labels and an appropriate plot size.
* The legend and axis labels provide context and aid comprehension of the plotted data.
* Together, these practices promote clarity and comprehension in data visualization.

#### <font color='green'> Challenge 1.1: Clarity and comprehension

Using the same `scores_df` dataset, create a bar chart that shows the distribution of average scores by years of experience.

Ensure the chart is well-labelled.

In [None]:
# Set the chart size
plt.figure(figsize=(10, 6))

# Plot the bar chart
plt.bar(scores_df["Years of Experience"], # Define the x-axis
        scores_df["Average Score"], # Define the y-axis
        color='royalblue') # Set the color of the chart

# YOUR CODE GOES HERE
# ---

# Set the chart title
plt.title("")

# Set the axes labels
plt.xlabel("Years of Experience")
plt.ylabel("Average Score")

# Display chart
plt.show()

## 2. Clarity and comprehension

Our charts should be easy to understand and should communicate what is in the data effectively.

We can achieve this by using the following techniques:
1. Chart titles and descriptions
2. Sorting data
3. Annotating the data

### Examples

#### 2.1 Chart titles and descriptions

There are two ways we could use chart titles:
> 1. Describing the query - Used when monitoring data without bias
> 2. Explaining insights - Used when telling a story with the data

##### Example 1: Describe the query

This approach is suitable when our goal is to present data without any interpretation or bias, often used in monitoring or reporting scenarios.

In [None]:
# Load dataset
# This dataset will be used for all examples in the notebook
# ---
# Dataset url (https://bit.ly/shop-data)

# Load data
customer_data = pd.read_csv("https://bit.ly/shop-data")

# View the dataset
customer_data.head()

In [None]:
# Question
# ---
# Plot a histogram to show the distribution of customer ages.
# The title should describe the query
# ---

# Set figure size
plt.figure(figsize=(10, 5))

# Plot chart
plt.hist(customer_data['Age'], bins=20, edgecolor='black', alpha=0.7)

# Add title describing the query
plt.title('Distribution of Customer Ages')

# Add axes label
plt.xlabel('Age')
plt.ylabel('Number of Customers')

# Display plot
plt.show()

- In this chart, we use a neutral title that simply states what the data represents.
- Such titles are factual and unbiased, which makes them well suited to a general audience or ongoing monitoring.

##### Example 2: Explaining the insights

We use this approach when the goal is to convey a specific insight or story from the data. It's more interpretive and often used in presentations or reports to highlight specific findings or trends.

In [None]:
# Question
# ---
# Plot a histogram to analyze the age distribution of customers, highlighting any significant trends or concentrations.
# The title should reflect a specific insight, such as concentration in a particular age group.
# ---

# Set figure size
plt.figure(figsize=(10, 5))

# Plot chart
plt.hist(customer_data['Age'], bins=20, edgecolor='black', alpha=0.7)

# Add title explaining the insights
plt.title('Customer Age Analysis: Concentration in Senior Age Group')

# Add axes label
plt.xlabel('Age')
plt.ylabel('Number of Customers')

# Display plot
plt.show()

- This chart's title highlights a specific aspect of the data by telling a story or pointing out a particular trend.
- We can use this approach when we need to engage the audience and drive specific points home.
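
An insight-driven title is easier to trust when it is derived from the data rather than typed by hand. The sketch below is one possible way to do that: it uses a small made-up `ages` array and hand-picked `age_bins` (both assumptions for illustration, not the workshop dataset), finds the most populated bin with `np.histogram`, and builds the title from the result.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ages and bin edges, used only for illustration
ages = np.array([22, 25, 31, 34, 35, 36, 38, 41, 47, 52, 58, 63])
age_bins = [20, 30, 40, 50, 60, 70]

# Count how many values fall into each bin
counts, bin_edges = np.histogram(ages, bins=age_bins)

# Locate the bin that holds the most customers
peak = int(np.argmax(counts))
low, high = int(bin_edges[peak]), int(bin_edges[peak + 1])

# Build the title from the computed range instead of hard-coding it
insight_title = f"Most Customers Are Aged {low} to {high}"

# Plot the histogram with the data-driven title
plt.figure(figsize=(10, 5))
plt.hist(ages, bins=age_bins, edgecolor='black', alpha=0.7)
plt.title(insight_title)
plt.xlabel('Age')
plt.ylabel('Number of Customers')
plt.show()
```

If the data changes, the title changes with it, so the chart cannot end up claiming a concentration that is no longer there.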

####  <font color='green'> Challenge 2.1: Chart titles


In [None]:
# Load challenge data
# ---
# Dataset URL (CSV) -- "https://bit.ly/profit-data"

# Load data using Pandas
df_profit = pd.read_csv("https://bit.ly/profit-data")

# View dataframe
df_profit.head()

**Describing the query**

In [None]:
# Question
# ---
# Plot a bar chart to show the trend of the monthly profits.
# The title should describe the query
# ---

# YOUR CODE GOES BELOW

# Set chart size
plt.figure(figsize=(10, 5))

# Plot bar chart
plt.barh(df_profit['Month'],  # Define the categories (y-axis of a horizontal bar chart)
        df_profit['Profit'],  # Define the bar lengths (x-axis)
        color='skyblue')  # Define color of the chart

# Set chart title here
plt.title('')

# Set axes label
plt.ylabel('Month')
plt.xlabel('Profit')

# Display plot
plt.show()

**Explaining insights**

Using the same data from the question above, can you use a chart title to explain insights from the data?

In [None]:
# Question
# ---
# Plot a bar chart to show the trend of the monthly profits.
# The title should explain insights from the data.
# ---

# YOUR CODE GOES BELOW

# Set chart size
plt.figure(figsize=(10, 5))

# Plot chart
plt.bar(df_profit['Month'], # Define x-axis
        df_profit['Profit'], # Define y-axis
        color='skyblue') # Set the color of the chart


# Set chart title here
plt.title('')

# Set axes label
plt.xlabel('Month')
plt.ylabel('Profit')

# Display plot
plt.show()

#### 2.2 Sorting data

Sorting the data in our visualizations lets the reader take in the insights in a meaningful order.

We can sort our data by:

- Alphabetical order
- Ascending order
- Descending order

This can be done using the `sort_values` method in Pandas.
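
As a quick illustration of the three orderings, here is a minimal sketch on a made-up three-row DataFrame (`toy_df` and its values are assumptions for illustration only). Note that `sort_values` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data, used only for illustration
toy_df = pd.DataFrame({
    'Customer_ID': ['C3', 'C1', 'C2'],
    'Total_Purchases': [250, 400, 100],
})

# Alphabetical order: sort on the text column
alphabetical = toy_df.sort_values(by='Customer_ID')

# Ascending order: lowest to highest (ascending=True is the default)
ascending = toy_df.sort_values(by='Total_Purchases')

# Descending order: highest to lowest
descending = toy_df.sort_values(by='Total_Purchases', ascending=False)

# Compare the resulting customer order for each sort
print(alphabetical['Customer_ID'].tolist())
print(ascending['Customer_ID'].tolist())
print(descending['Customer_ID'].tolist())
```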

##### Example 1: Alphabetical order

Sorting data by alphabetical order allows us to quickly identify categories in the data.

This can be useful when dealing with multiple categories, and we need to quickly identify a particular category.

In [None]:
# Question: Alphabetical order
# ---
# Sort the data in alphabetical order by Customer ID and display using a bar chart
# ---

# Sort data by Customer ID in alphabetical order
# Limit to 10
top_customers_data = customer_data.sort_values(by='Customer_ID', ascending=True).head(10)

# View output
top_customers_data

In [None]:
# Set figure size
plt.figure(figsize=(10, 6))

# Plot bar chart with sorted data
plt.bar(top_customers_data['Customer_ID'], # Define x-axis
        top_customers_data['Total_Purchases'], # Define y-axis
        color='purple') # Set the color of the chart

# Add title for top customers
plt.title('Bar Chart of Total Purchases (Alphabetically Sorted)')

# Add axes labels
plt.xlabel('Customer ID')
plt.ylabel('Total Purchases')

# Display plot
plt.show()

##### Example 2: Ascending order

In this example, the data is sorted from *lowest to highest*.

This is useful when we need to tell a story with our data and show incremental changes.

In [None]:
# Question : Ascending order
# ---
# Create a bar plot to show the total purchases of customers, arranged in ascending order.
# ---

# Sort data by total purchases in ascending order
top_10_ascending_data = customer_data.sort_values(by='Total_Purchases').head(10)

# View output
top_10_ascending_data

In [None]:
# Set figure size
plt.figure(figsize=(10, 6))

# Plot chart with sorted data
plt.bar(top_10_ascending_data['Customer_ID'], # Define x-axis
        top_10_ascending_data['Total_Purchases'], # Define y-axis
        color='green') # Set the color of the chart

# Add title for ascending order (10 lowest totals)
plt.title('10 Customers with the Lowest Total Purchases (Ascending Order)')

# Add axes labels
plt.xlabel('Customer ID')
plt.ylabel('Total Purchases')

# Display plot
plt.show()

##### Example 3: Descending order


Sorting data from highest to lowest is useful when we need to compare between categories.

In [None]:
# Question : Descending order
# ---
# Create a bar plot to show the total purchases of customers, arranged in descending order.
# ---

# Sort data by total purchases in descending order
top_10_descending_data = customer_data.sort_values(by='Total_Purchases', ascending=False).head(10)

# View output
top_10_descending_data

In [None]:
# Set figure size
plt.figure(figsize=(10, 6))

# Plot chart with sorted data
plt.bar(top_10_descending_data['Customer_ID'], # Define the x-axis
        top_10_descending_data['Total_Purchases'], # Define the y-axis
        color='royalblue') # Set the color of the chart

# Add title for descending order (top 10)
plt.title('Top 10 Customers by Total Purchases (Descending Order)')

# Add axes labels
plt.xlabel('Customer ID')
plt.ylabel('Total Purchases')

# Display plot
plt.show()

#### <font color='green'> Challenge 2.2: Sorting data

**Ascending order**

In [None]:
# Question
# ---
# Sort the data by 'Profit' in ascending order and visualize it using a bar chart
# ---
# YOUR CODE GOES BELOW
#

# Sort profit in ascending order



# Set chart size
plt.figure(figsize=(10, 5))

# Plot chart with sorted values


# Set chart title
plt.title('Profits Sorted in Ascending Order')

# Set axes label
plt.xlabel('Month')
plt.ylabel('Profit')

# Display plot
plt.show()

#### 2.3 Annotating your data


Annotating our data visualizations is key as it can add context and perspective to them.

We can achieve this by:

1. Annotations in charts
2. Comments for context

##### Example 1: Annotations in charts

We can use annotations directly in our charts to highlight specific points or trends.

In [None]:
# Question
# ---
# Create a histogram to display the distribution of customer ages.
# Annotate the age group with the least number of customers.
# ---

# Set figure size
plt.figure(figsize=(10, 6))

# Plot histogram
n, bins, patches = plt.hist(customer_data['Age'], edgecolor='royalblue', bins=10, alpha=0.5)

# Find the min height for annotations
min_height = min(n)

# Adding annotations for specific age groups
for i in range(len(patches)):  # Loop will iterate over each bar in the histogram

    # Check if the current histogram bar has the minimum height
    if n[i] == min_height:  # Highlight the least populated age group

        # Adding annotations to the plot using 'plt.text' function
        plt.text(patches[i].xy[0], n[i], f'Least\n{int(bins[i])}-{int(bins[i+1])} yrs',
                 fontsize=12, ha='left', va='bottom') # Styling the annotations

# Add title
plt.title('Distribution of Customer Ages with Annotations')

# Add axes labels
plt.xlabel('Age')
plt.ylabel('Number of Customers')


# Display plot
plt.show()
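
Besides `plt.text`, Matplotlib also offers `plt.annotate`, which can draw an arrow from the label to the point being highlighted. The sketch below assumes a small made-up series of scores (not the workshop data) and annotates its highest point; the label offset in `xytext` is an arbitrary choice that may need adjusting for other data.

```python
import matplotlib.pyplot as plt

# Hypothetical values, used only for illustration
years = [1, 2, 3, 4, 5]
scores = [62, 70, 85, 78, 80]

# Find the position of the highest score
peak_index = scores.index(max(scores))
peak_x, peak_y = years[peak_index], scores[peak_index]

# Plot the line chart
plt.figure(figsize=(8, 5))
plt.plot(years, scores, marker='o', color='royalblue')

# xy is the point being highlighted; xytext is where the label is placed
note = plt.annotate(f'Peak: {peak_y}',
                    xy=(peak_x, peak_y),
                    xytext=(peak_x + 0.5, peak_y - 10),
                    arrowprops=dict(arrowstyle='->', color='black'))

# Add title and axes labels
plt.title('Average Score by Years of Experience (Peak Annotated)')
plt.xlabel('Years of Experience')
plt.ylabel('Average Score')
plt.show()
```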

##### Example 2: Comments for context


We use comments within our code to explain what is happening or the rationale behind certain choices. Comments are **not seen** in the visual output, but they are important for understanding the code.

In [None]:
# Question
# ---
# Create a histogram to display the distribution of customer ages.
# Document each step of the code with clear comments.
# ---

# Set figure size
plt.figure(figsize=(10, 6))

# Plot histogram
plt.hist(customer_data['Age'], edgecolor='royalblue', bins=10, alpha=0.5)

# Add title
plt.title('Distribution of Customer Ages')

# Add axes labels
plt.xlabel('Age')
plt.ylabel('Number of Customers')

# Display plot
plt.show()

- In this second example, we use comments to explain each part of the code and the rationale behind our choices.

- This helps anyone reading the code understand the decisions made during the visualization process. It also supports the reproducibility of our code.

#### <font color='green'> Challenge 2.3 : Annotating data

In [None]:
# Question
# ---
# Plot a bar chart that shows the month with the highest profit
# ---
# YOUR CODE GOES BELOW

# Find the month with the highest profit
max_profit_month = df_profit.loc[df_profit['Profit'].idxmax(), 'Month']

# View output
print(f"The month with the highest profit is {max_profit_month}.")

In [None]:
# Set chart size
plt.figure(figsize=(10, 6))

# Plot chart
plt.bar(df_profit['Month'], df_profit['Profit'], color='skyblue', label='Monthly Profits')

# Annotate chart





# Style chart; add title and labels
plt.title('Monthly Profits')
plt.xlabel('Month')
plt.ylabel('Profit')

# Display chart
plt.legend()
plt.show()

## 3. Selecting the right visualization type

The right chart type depends on the data we are working with and on the question we want the chart to answer.

Here are some main categories of chart types.
>
 > 1. Trend analysis
 > 2. Correlation/Relationship exploration
 > 3. Composition/Part to whole analysis
 > 4. Distribution
 > 5. Comparisons of items

### Examples

#### 3.1 Trend analysis

These types of charts are most suitable when we are visualizing changes/trends over a period of time.

Under this category, we have the following types of charts:

> 1. Line charts
> 2. Area plots


##### **Line charts**

Line charts help us understand trends over time, particularly patterns that show acceleration, deceleration and volatility.

In [None]:
# Question: Line chart
# ---
# Create a line chart that shows the average purchase trend with age
# ---

# Grouping data by 'Age' and calculating the mean of 'Total_Purchases'
age_grouped = customer_data.groupby('Age')['Total_Purchases'].mean()

# View output
age_grouped.head()

In [None]:
# Set chart size
plt.figure(figsize=(10, 6))

# Plot chart
plt.plot(age_grouped, # Define data
         linestyle='--', # Set the line style
         color='seagreen') # Set color of the line chart

# Set chart title
plt.title('Mean Total Purchases Trend with Age')

# Set axes label
plt.xlabel('Age')
plt.ylabel('Mean Total Purchases')


# Display the plot
plt.show()

##### **Area chart**

This type of chart is used when we need to visualize cumulative trends of different components over time.

In [None]:
# Question
# ---
# Visualize an area chart to show the cumulative total purchases over age
# ---

# Set chart size
plt.figure(figsize=(10, 6))

# Group the data: total purchases per age, accumulated into a running total with cumsum()
cumulative_purchases = customer_data.groupby('Age')['Total_Purchases'].sum().cumsum()

# Use the fill_between function to fill the area under the curve
plt.fill_between(cumulative_purchases.index,
                 cumulative_purchases.values,
                 color='orange', alpha=0.5) # Style the chart

# Set chart title
plt.title('Cumulative Total Purchases over Age')

# Set axes label
plt.xlabel('Age')
plt.ylabel('Cumulative Total Purchases')

# Display plot
plt.show()

#### <font color='green'> Challenge 3.1: Trend analysis chart

In [None]:
# Question
# ---
# Conduct a trend analysis by visualizing the monthly profits over the year using a line chart.
# ---
# YOUR CODE GOES BELOW

# Set chart size
plt.figure(figsize=(10, 5))

# Plot line chart here


# Set chart title
plt.title('Trend Analysis: Monthly Profits Over the Year')

# Set label axes
plt.xlabel('Month')
plt.ylabel('Profit')

# Display the line chart
plt.show()

#### 3.2 Correlation/Relationship exploration

These types of charts are used when we explore the relationship between two or more variables in a dataset.

They include:
> 1. Correlation heat map
> 2. Scatter plots

##### **Correlation heat map**

Used to show the strength and direction of the correlation between each pair of variables, with color encoding the correlation value.

In [None]:
# View correlation table using corr() function
corr_matrix = customer_data.corr(numeric_only=True)

# View output
corr_matrix

In [None]:
# Question
# ---
# Create a heatmap using seaborn to visualize the correlation matrix
# ---

# Set the figure size for the plot
plt.figure(figsize=(8,6))

# Generate a heatmap using seaborn
sns.heatmap(corr_matrix,
            annot=True, # 'annot=True' displays the correlation values in each cell of the heatmap
            cmap='coolwarm') # 'cmap' sets the color theme of the heatmap

# Set the title of the heatmap
plt.title('Correlation Map: Age vs Total Purchases')

# Display the heatmap
plt.show()
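
To make the numbers in the heatmap less of a black box, the sketch below computes Pearson's correlation coefficient by hand and compares it with the value returned by `corr()`. The `toy_df` values are made up for illustration; Pearson's r is the default method used by `DataFrame.corr`.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
toy_df = pd.DataFrame({
    'Age': [25, 32, 41, 50, 63],
    'Total_Purchases': [200, 260, 310, 405, 480],
})

x = toy_df['Age']
y = toy_df['Total_Purchases']

# Pearson's r: sum of the products of deviations from the mean,
# divided by the square root of the product of the summed squared deviations
numerator = ((x - x.mean()) * (y - y.mean())).sum()
denominator = np.sqrt(((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum())
r_manual = numerator / denominator

# The same value taken from the pandas correlation matrix
r_pandas = toy_df.corr().loc['Age', 'Total_Purchases']

print(round(r_manual, 4), round(r_pandas, 4))
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or no linear relationship.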

##### **Scatter plots**

We use scatter plots to show how different variables correlate with each other.

In [None]:
# Question : Scatter plot
# ---
# Explore the relationship between customer age and total purchases using a scatter plot.
# ---

# Set the figure size for the plot
plt.figure(figsize=(10, 6))

# Plot the chart
plt.scatter(customer_data['Age'], # Define x-axis
            customer_data['Total_Purchases']) # Define y-axis

# Set the chart title
plt.title('Relationship Between Age and Total Purchases')

# Set the axes label
plt.xlabel('Age')
plt.ylabel('Total Purchases')

# Display the plot
plt.show()

#### <font color='green'> Challenge 3.2: Relationship exploration

In [None]:
# Question
# ---
# Create a scatter plot to show the relationship between month and profit
# ---
# YOUR CODE GOES BELOW
# ---

# Set the figure size for the plot
plt.figure(figsize=(10, 6))

# Plot the chart here



# Set the chart title
plt.title('Relationship Between Profit and Month')

# Set the axes label
plt.xlabel('Month')
plt.ylabel('Profit')

# Display the plot
plt.show()

#### 3.3 Composition analysis

These types of charts are used when we visualize part-to-whole relationships in our data. They include:
> 1. Pie charts
> 2. Stacked area charts

###### **Pie chart**

Used when we show totals divided into categories by percentage.

In [None]:
# Question
# ---
# Create a pie chart to visualize the composition of customers by gender.
# ---

# Calculating the number of customers by gender
gender_data = customer_data['Gender'].value_counts()

# View output
gender_data

In [None]:
# Set chart size
plt.figure(figsize=(8, 8))

# Plot pie chart
plt.pie(gender_data, # Define data to plot
        labels=gender_data.index, # Set labels to the chart using the index of gender_data
        autopct='%1.1f%%', # Display percentage values in the chart
        colors=['lightcoral', 'skyblue', 'seagreen']) # Define the colors to use in the chart

# Add title
plt.title('Composition of Customers by Gender')

# Add legend
plt.legend(gender_data.index, title='Gender')

# Display the pie chart
plt.show()
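
The percentages that `autopct` prints on a pie chart can also be computed directly, which is handy for checking the chart or quoting the figures in a report. A minimal sketch, assuming a made-up `genders` Series rather than the workshop dataset:

```python
import pandas as pd

# Hypothetical data, used only for illustration
genders = pd.Series(['Female', 'Male', 'Female', 'Other',
                     'Female', 'Male', 'Female', 'Male'])

# Raw counts per category
counts = genders.value_counts()

# Share of the whole per category, as a percentage rounded to 1 decimal place
shares = (genders.value_counts(normalize=True) * 100).round(1)

print(counts)
print(shares)
```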

###### **Stacked area chart**

Used when we show part to whole relationships over a period of time.

In [None]:
# Question : Stacked area chart
# ---
# Create a stacked area chart that shows the pattern of total purchase by age group and gender
# ---

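# Categorizing ages into groups
# Bins are right-inclusive (the pandas default), so age 20 falls in '0-20' and 21 in '21-30';
# the open-ended last bin (np.inf) matches the '71+' label
bins = [0, 20, 30, 40, 50, 60, 70, np.inf]
labels = ['0-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71+']
customer_data['Age_Group'] = pd.cut(customer_data['Age'], bins=bins, labels=labels)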

# Pivot table for the stacked area chart
pivot_data = customer_data.pivot_table(index='Age_Group', columns='Gender', values='Total_Purchases', aggfunc='sum')

# View output
pivot_data
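
The `right` argument of `pd.cut` decides which edge of each bin is included, and that matters when labels such as `'0-20'` and `'21-30'` are meant to be inclusive. The sketch below uses a few made-up boundary ages (an assumption for illustration) to show where each value lands under both settings.

```python
import pandas as pd

# Hypothetical boundary ages, used only for illustration
ages = pd.Series([20, 21, 30, 31])
edges = [0, 20, 30, 40]
labels = ['0-20', '21-30', '31-40']

# right=True (default): bins are (0, 20], (20, 30], (30, 40]
right_closed = pd.cut(ages, bins=edges, labels=labels)

# right=False: bins are [0, 20), [20, 30), [30, 40)
left_closed = pd.cut(ages, bins=edges, labels=labels, right=False)

# Compare the two results side by side
comparison = pd.DataFrame({'Age': ages,
                           'right=True': right_closed,
                           'right=False': left_closed})
print(comparison)
```

With `right=False`, an age of exactly 20 is assigned the label `'21-30'`, so the default `right=True` is the setting that agrees with these labels.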

In [None]:
# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.stackplot(pivot_data.index, # Define x-axis using the pivot_data index
              pivot_data['Male'], # Define y-axis with Male data
              pivot_data['Female'], # Define y-axis with Female data
              pivot_data['Other'], # Define y-axis with Other data
              labels=['Male', 'Female', 'Other'], # Labels to be used in the legend
              colors=['lightcoral', 'skyblue', 'seagreen'] # Set colors for the categories
              )

# Set chart title and label axes
plt.title('Total Purchases by Age Group and Gender')
plt.xlabel('Age Group')
plt.ylabel('Total Purchases')

# Display legend
plt.legend()

# Display the plot
plt.show()

#### 3.4 Distribution analysis


These types of charts are used when we need to understand the frequency of values in our dataset.
> 1. Histogram
> 2. Box and whisker plot

###### **Histogram**

Used when we need to show how frequently values occur in the data.


In [None]:
# Question
# ---
# Create a histogram to display the distribution of age data
# ---

# Set chart size
plt.figure(figsize=(10, 6))

# Plot the histogram
plt.hist(customer_data['Age'], # Define data
         bins=8, # Define the ranges
         # Style the chart
         color='lightcoral', # Color of bars
         edgecolor='black') # Color of the edge of the bar

# Set chart title
plt.title('Distribution of Customer Ages')

# Set axes label
plt.xlabel('Age')
plt.ylabel('Number of Customers')

# Display the chart
plt.show()

###### **Box and whisker plot**

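We use box and whisker plots to show the distribution in data sets by category.

Box: Spans the first to the third quartile (the interquartile range, IQR), with a line marking the median.

Whiskers: Extend to the lowest and highest data points that lie within 1.5 × IQR of the box (the default in `seaborn` and `matplotlib`). Points beyond the whiskers are drawn individually as outliers.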
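
The quantities a box plot draws can be computed by hand. The sketch below uses a made-up `ages` array (an assumption for illustration) and the 1.5 × IQR rule, which is the default whisker setting in `matplotlib` and `seaborn`, to find the quartiles, the whisker ends and the outliers.

```python
import numpy as np

# Hypothetical ages, used only for illustration
ages = np.array([18, 22, 25, 27, 30, 31, 33, 36, 40, 75])

# The box: first quartile, median and third quartile
q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the limits
whisker_low = ages[ages >= lower_limit].min()
whisker_high = ages[ages <= upper_limit].max()

# Points outside the limits are drawn individually
outliers = ages[(ages < lower_limit) | (ages > upper_limit)]

print(f"Box: {q1} to {q3}, median {median}")
print(f"Whiskers: {whisker_low} to {whisker_high}")
print(f"Outliers: {outliers.tolist()}")
```

Here the maximum value (75) is not the end of the upper whisker; it is plotted as a separate outlier point.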

In [None]:
# Question
# ---
# Create a Box and whisker plot to display the distribution of ages
# ---

# Set the figure size for the plot
plt.figure(figsize=(10, 6))

# Plot a box plot
sns.boxplot(x=customer_data['Age'], # Define x-axis
            color='red') # Set the color of the chart

# Set the title
plt.title('Box Plot: Distribution of Customer Ages')

# Label for the x-axis
plt.xlabel('Age')

# Display the box plot
plt.show()

#### <font color='green'> Challenge 3.4: Distribution analysis

In [None]:
# Question
# ---
# Analyze the distribution of monthly profits using a histogram.
# ---
# YOUR CODE GOES BELOW


# Set chart size
plt.figure(figsize=(10, 5))

# Plot chart


# Set chart title
plt.title('Distribution: Histogram of Monthly Profits')

# Label axes
plt.xlabel('Profit Range')
plt.ylabel('Frequency')

# Display the histogram
plt.show()

#### 3.5 Comparisons of items analysis

These types of charts are most suitable when we need to compare values across different categories. They include:
> 1. Bar charts
> 2. Clustered bar/column charts

###### **Bar chart**


We use bar charts to compare values across category groups.


In [None]:
# Question
# ---
# Create a bar chart to compare the total purchases across the customer age groups
# ---

# Grouping by Age_Group and summing up the Total_Purchases
age_group_data = customer_data.groupby('Age_Group')['Total_Purchases'].sum()

# View output
age_group_data

In [None]:
# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.bar(age_group_data.index, age_group_data.values, color='skyblue')

# Set chart title
plt.title('Comparison of Total Purchases Across Age Groups')

# Set axes label
plt.xlabel('Age Group')
plt.ylabel('Total Purchases')

# Display plot
plt.xticks(rotation=25)
plt.show()

##### **Clustered bar chart**

This is useful when we need to compare values across two categorical variables at once, for example each gender within each age group.

In [None]:
# Question
# ---
# Create a clustered bar chart to compare the total purchases across different age groups and genders.
# ---

# Group by Age_Group and Gender, and calculate the sum of Total_Purchases
grouped_data = customer_data.groupby(['Age_Group', 'Gender'])['Total_Purchases'].sum().unstack()

# View output
grouped_data
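
The `unstack()` step is what turns the grouped result into the shape a clustered bar chart needs: one row per cluster and one column per bar within the cluster. A minimal sketch on a made-up `toy_df` (an assumption for illustration):

```python
import pandas as pd

# Hypothetical data, used only for illustration
toy_df = pd.DataFrame({
    'Age_Group': ['21-30', '21-30', '31-40', '31-40', '31-40'],
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Male'],
    'Total_Purchases': [100, 150, 200, 50, 75],
})

# Long form: one value per (Age_Group, Gender) pair
long_form = toy_df.groupby(['Age_Group', 'Gender'])['Total_Purchases'].sum()

# Wide form: Age_Group as rows, Gender as columns
wide_form = long_form.unstack()

print(long_form)
print(wide_form)
```

Calling `wide_form.plot(kind='bar')` on a table like this draws one cluster per row and one bar per column.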

In [None]:
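# Plot clustered bars
# DataFrame.plot creates its own figure, so the size is set with
# `figsize` here rather than with a separate plt.figure() call
grouped_data.plot(kind='bar',
                  figsize=(12, 8), # Set figure size
                  width=.6, # Set the width of the bars
                  color=['lightpink', 'lightblue', 'lightgreen']) # Set the color of the chart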

# Set labels and title
plt.title('Total Purchases by Age Group and Gender')
plt.xlabel('Age Group')
plt.ylabel('Total Purchases')

# Style x-ticks
plt.xticks(rotation=35)

# Display legend
plt.legend(title='Gender',
           loc='upper right') # Style the legend

# Display plot
plt.show()

#### <font color='green'> Challenge 3.5: Comparisons of items analysis

In [None]:
# Question
# ---
# Compare the monthly profits across different months using a bar chart.
# ---
# YOUR CODE GOES BELOW

# Set chart size
plt.figure(figsize=(10, 5))

# Plot bar chart here


# Set chart title
plt.title('Comparison: Monthly Profits Across Different Months')

# Label axes
plt.xlabel('Month')
plt.ylabel('Profit')

# Display legend
plt.legend()

# Display the bar chart
plt.show()

## 4. Format style

Formatting the styles of our charts is a key data visualization best practice as it ensures that our charts are easily understood, as well as remaining aesthetically appealing.

This category includes the following techniques:
> 1. Color selection
> 2. Use of labels
> 3. Use of gridlines

### Examples

#### 4.1 Color selection

Color can be used to represent different categories in your data, or highlight a secondary metric.

In Python, using matplotlib or seaborn, we can select pre-defined colors/color palettes, as well as customize the colors that we use in the visualization by using the color hex codes.

##### **Using pre-defined colors in the `seaborn` and `matplotlib` libraries**

- [Here](https://matplotlib.org/stable/gallery/color/named_colors.html) is a link to the list of named colors available in the matplotlib library.

- [This](https://seaborn.pydata.org/tutorial/color_palettes.html) is the documentation for the color palettes available in `seaborn`.

In [None]:
# Question
# ---
# Create a bar chart to visualize the average total purchases by different age groups.
# Utilize a distinct named color from matplotlib for each age group
# ---

# Grouping by Age_Group and calculating the average Total_Purchases
age_group_avg_purchases = customer_data.groupby('Age_Group')['Total_Purchases'].mean()

# View output
age_group_avg_purchases

In [None]:
# Setting up the color palette using matplotlib's named color list
colors = ['royalblue', 'limegreen', 'tomato', 'gold', 'violet', 'skyblue', 'lightcoral']

# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.bar(age_group_avg_purchases.index, age_group_avg_purchases.values, color=colors)

# Set chart title and axes label
plt.title('Average Total Purchases by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Average Total Purchases')

# Style x-ticks
plt.xticks(rotation=30)

# Display chart
plt.show()

##### **Defining colors in the chart by using color hex codes**

Here is a helpful resource to select color hex codes --> [Color hex codes website](https://htmlcolorcodes.com/)
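
Named colors and hex codes are two spellings of the same thing, and `matplotlib.colors` can convert between them. The sketch below looks up the hex codes of a few named colors with `to_hex`; the particular color names chosen are only an illustration.

```python
from matplotlib import colors as mcolors

# A few of matplotlib's named colors
named = ['royalblue', 'limegreen', 'tomato']

# Convert each named color to its hex code
hex_codes = [mcolors.to_hex(name) for name in named]
print(hex_codes)

# Hex codes are case-insensitive: both spellings resolve to the same RGB values
same_color = mcolors.to_rgb('#4169E1') == mcolors.to_rgb('royalblue')
print(same_color)
```

This is a convenient way to start from a named color and then fine-tune it as a hex code.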

In [None]:
# Setting up the color palette using hex color codes
colors = ['#4169E1', '#32CD32', '#FF6347', '#FFD700', '#EE82EE', '#87CEEB', '#F08080']

# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.bar(age_group_avg_purchases.index, age_group_avg_purchases.values, color=colors)

# Set chart title and axes label
plt.title('Average Total Purchases by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Average Total Purchases')

# Style x-ticks
plt.xticks(rotation=30)

# Display chart
plt.show()

#### <font color='green'> Challenge 4.1: Color selection

In [None]:
# Question
# ---
# Create a line chart to visualize the changes in profits over the year.
# Utilize distinct colors from matplotlib
# ---

# Set chart size
plt.figure(figsize=(10, 5))

# Plot chart with defined color from matplotlib library


# Set chart title
plt.title('Color Selection: Monthly Profits Trend Over the Year')

# Label axes
plt.xlabel('Month')
plt.ylabel('Profit')

# Display legend
plt.legend()

# Show the plot
plt.show()

#### 4.2 Labelling

Data labels enhance our charts by showing the *actual* values from the data directly on the chart.

In [None]:
# Question
# ---
# Create a bar chart to visualize the average total purchases by different age groups.
# Include average total purchases as data labels on each bar.
# ---

# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
bars = plt.bar(age_group_avg_purchases.index, age_group_avg_purchases.values, color='skyblue')

# Adding data labels
for bar in bars:
    yval = bar.get_height()
    # Add labels using plt.text and style the labels
    plt.text(bar.get_x() + bar.get_width()/2, yval, round(yval, 2), va='bottom', ha='center', color='black')

# Set chart title and axes label
plt.title('Average Total Purchases by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Average Total Purchases')

# Style x-ticks
plt.xticks(rotation=45)

# Display plot
plt.show()
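
As an alternative to looping over the bars with `plt.text`, recent versions of Matplotlib (3.4 and later, which is an assumption about the installed version) provide `bar_label`, which labels every bar in one call. The values below are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical values, used only for illustration
groups = ['21-30', '31-40', '41-50']
averages = [152.456, 198.1, 175.75]

# Create the figure and plot the bars
fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(groups, averages, color='skyblue')

# Add one label per bar; fmt sets the number format, padding the gap in points
labels = ax.bar_label(bars, fmt='%.2f', padding=3)

# Set chart title and axes label
ax.set_title('Average Total Purchases by Age Group')
ax.set_xlabel('Age Group')
ax.set_ylabel('Average Total Purchases')
plt.show()
```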

#### <font color='green'> Challenge 4.2: Labelling

In [None]:
# Challenge 2: Use of Labels
# ---
# Create a line chart to visualize the changes in profits over the year.
# Ensure clear and concise labels
# ---
# YOUR CODE GOES BELOW

# Set chart size
plt.figure(figsize=(10, 5))

# Plot chart with labels



# Set chart title
plt.title('Monthly Profits Trend Over the Year (Labels)')

# Label axes
plt.xlabel('Month')
plt.ylabel('Profit')

# Display legend
plt.legend()

# Show the plot
plt.show()

#### 4.3 Use of gridlines

Gridlines can help us to enhance the clarity and readability of our charts. In addition, they can be helpful when we need to compare between different thresholds in our dataset.

##### **Adding gridlines to the visualization**

In [None]:
# Question
# ---
# Create a bar chart with gridlines that displays the average total purchases by age group
# ---

# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.bar(age_group_avg_purchases.index,
        age_group_avg_purchases.values,
        color='salmon') # Style chart

# Set chart title
plt.title('Average Total Purchases by Age Group')

# Set axes label
plt.xlabel('Age Group')
plt.ylabel('Average Total Purchases')
plt.xticks(rotation=45)

# Display grid
plt.grid(True)

# Display plot
plt.show()
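
Gridlines do not have to be all-or-nothing. For a vertical bar chart, horizontal gridlines alone are often enough to read off values. The sketch below, on made-up values, draws light dashed gridlines on the y-axis only and places them behind the bars; the styling choices are suggestions rather than fixed rules.

```python
import matplotlib.pyplot as plt

# Hypothetical values, used only for illustration
groups = ['21-30', '31-40', '41-50', '51-60']
values = [150, 200, 175, 120]

# Create the figure and plot the bars
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(groups, values, color='salmon')

# Show horizontal gridlines only, drawn as light dashed lines
ax.grid(axis='y', linestyle='--', alpha=0.5)

# Draw the gridlines behind the bars instead of on top of them
ax.set_axisbelow(True)

# Set chart title and axes label
ax.set_title('Average Total Purchases by Age Group')
ax.set_xlabel('Age Group')
ax.set_ylabel('Average Total Purchases')
plt.show()
```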

##### **Removing gridlines from the visualization**

There are certain situations where using a gridline would add clutter to the chart.

In [None]:
# Question
# ---
# Create a bar chart without gridlines that displays the average total purchases by age group
# ---

# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.bar(age_group_avg_purchases.index, age_group_avg_purchases.values, color='salmon')

# Set chart title
plt.title('Average Total Purchases by Age Group')

# Set axes label
plt.xlabel('Age Group')
plt.ylabel('Average Total Purchases')
plt.xticks(rotation=45)

# Remove grid
plt.grid(False)

# Display plot
plt.show()
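
When gridlines are removed, one option for keeping exact values readable is to label the bars directly. This is a minimal sketch with made-up values, assuming Matplotlib 3.4 or later (where `Axes.bar_label` is available). The names `age_groups` and `avg_purchases` are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
age_groups = ['18-25', '26-35', '36-45', '46-55']
avg_purchases = [120, 180, 150, 90]

# Create the figure and plot the bars
fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(age_groups, avg_purchases, color='salmon')

# Remove grid
ax.grid(False)

# Write each bar's value just above it (requires Matplotlib 3.4+)
value_labels = ax.bar_label(bars, fmt='%.0f', padding=3)

# Set chart title and axes labels
ax.set_title('Average Total Purchases by Age Group')
ax.set_xlabel('Age Group')
ax.set_ylabel('Average Total Purchases')

# Display plot
plt.show()
```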

#### <font color='green'> Challenge 4.3: Use of gridlines</font>

In [None]:
# Challenge 3: Use of Gridlines

# Question
# ---
# Create a line chart to visualize the changes in profits over the year.
# Incorporate grid lines in the chart.
# ---

# YOUR CODE GOES BELOW
# ---

# Set chart size
plt.figure(figsize=(10, 5))

# Plot chart
plt.plot(df_profit['Month'], # Define x-axis
         df_profit['Profit'], # Define y-axis
         marker='o', # Style of markers
         color='mediumorchid', # Set color
         linestyle='-', # Style of line
         linewidth=2, # Set line width
         label='Monthly Profits') # Set label

# Set chart title
plt.title('Monthly Profits Trend Over the Year (Grid)')

# Label axes
plt.xlabel('Month')
plt.ylabel('Profit')

# Display legend
plt.legend()

# Add grid layout here



# Show the plot
plt.show()

## 5. Highlighting what's important

- Techniques like conditional formatting and reference lines make significant data points stand out, aiding comprehension.

- They are useful in complex datasets or reports to quickly convey key insights and trends, guiding decision-makers.

### Examples

#### 5.1 Conditional formatting

Conditional formatting applies a different color to data points that meet a condition, such as exceeding a threshold. This makes it easier for us to identify and interpret important data points or trends.

In [None]:
# Question
# ---
# Plot a bar chart to show total purchases by age with a different color for high total purchases (> 13000).
# ---

# Total purchases by age
age_group_total_purchases = customer_data.groupby('Age')['Total_Purchases'].sum()

# View output
age_group_total_purchases.head()

In [None]:
# Set chart size
plt.figure(figsize=(12, 6))

# Define the colors based on the conditions in place
colors = ['crimson' if purchases > 13000
          else 'lightgrey' for purchases in age_group_total_purchases]

# Plot chart
plt.bar(age_group_total_purchases.index, age_group_total_purchases, color=colors)

# Set chart title
plt.title('Total Purchases by Age with Conditional Formatting')

# Set axes label
plt.xlabel('Age')
plt.ylabel('Total Purchases')

# Display plot
plt.show()
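
The color rule can also be written with `np.where`, and a legend can explain what each color means. The sketch below uses made-up values; `ages`, `total_purchases`, and `threshold` are hypothetical names, and the legend built from `Patch` handles is one possible approach rather than the workshop's own method.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical data, for illustration only
ages = np.array([25, 30, 35, 40, 45])
total_purchases = np.array([9000, 14500, 12000, 16000, 8000])
threshold = 13000

# Pick a color per bar: crimson above the threshold, light grey otherwise
colors = np.where(total_purchases > threshold, 'crimson', 'lightgrey').tolist()

# Plot chart
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(ages, total_purchases, color=colors, width=3)

# Build legend entries by hand so each color is explained
legend_handles = [
    Patch(color='crimson', label=f'Above {threshold:,}'),
    Patch(color='lightgrey', label=f'{threshold:,} or below'),
]
ax.legend(handles=legend_handles)

# Set chart title and axes labels
ax.set_title('Total Purchases by Age with Conditional Formatting')
ax.set_xlabel('Age')
ax.set_ylabel('Total Purchases')

# Display plot
plt.show()
```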

#### <font color='green'> Challenge 5.1: Conditional formatting</font>

In [None]:
# Question
# ---
# Create a bar chart using the DataFrame with monthly profit data.
# Apply conditional formatting to highlight months with profits exceeding $50,000 in a distinctive color.
# ---
# YOUR CODE GOES BELOW
# ---


# Set chart size
plt.figure(figsize=(10, 6))

# Apply conditional formatting to the chart



# Set axis labels and chart title
plt.title('Monthly Profit with Conditional Formatting')
plt.xlabel('Month')
plt.ylabel('Profit')

# Display plot
plt.show()

#### 5.2 Reference lines

Reference lines act as visual guides in a chart. They provide a benchmark, such as an average or a goal, against which we can compare the data.

In [None]:
# Question
# ---
# Plot a bar chart to show total purchases by age with a reference line for the average total purchases.
# ---

# Calculate the average
avg_purchases = age_group_total_purchases.mean().round()

# View output
avg_purchases

In [None]:
# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart
plt.bar(age_group_total_purchases.index, age_group_total_purchases)

# Define the reference line
plt.axhline(y=avg_purchases, color='red', linestyle='--', label=f'Average: {avg_purchases}')

# Set chart title
plt.title('Total Purchases by Age with Reference Line')

# Set axes label
plt.xlabel('Age')
plt.ylabel('Total Purchases')

# Display legend and chart
plt.legend()
plt.show()
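
A chart can carry more than one reference line, for example an average and a goal. The sketch below uses made-up store sales; `stores`, `sales`, and `target` are hypothetical names and values. It also formats the legend labels with thousands separators.

```python
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
stores = ['North', 'South', 'East', 'West']
sales = [42000, 55000, 48000, 61000]

# Benchmarks: the computed average and an assumed goal
avg_sales = sum(sales) / len(sales)
target = 50000

# Plot chart
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(stores, sales, color='lightgrey')

# Add one reference line per benchmark, each with its own style
ax.axhline(y=avg_sales, color='red', linestyle='--',
           label=f'Average: {avg_sales:,.0f}')
ax.axhline(y=target, color='green', linestyle=':',
           label=f'Target: {target:,}')

# Set chart title and axes labels
ax.set_title('Sales by Store with Reference Lines')
ax.set_xlabel('Store')
ax.set_ylabel('Sales')

# Display legend and chart
ax.legend()
plt.show()
```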

#### <font color='green'>Challenge 5.2: Reference lines</font>

In [None]:
# ---
# Question
# Create a bar chart using the DataFrame with monthly profit data.
# Add a horizontal reference line to indicate the average profit across the year.
# ---
# YOUR CODE GOES BELOW
# ---

# Create bar chart
plt.figure(figsize=(10, 6))
plt.bar(df_profit['Month'], df_profit['Profit'], color='skyblue')

# Add red dashed line at average profit



# Set chart title and label axes
plt.title('Monthly Profit with Average Line')
plt.xlabel('Month')
plt.ylabel('Profit')

# Display the plot
plt.show()

#### 5.3 Highlighting trends

Highlighting trends in our data visualizations involves emphasizing the overall patterns or directions in the data, making it simpler for our viewers to grasp the key insights and changes over time.

In [None]:
# Question
# ---
# Plot a line chart to highlight trends in total purchases by age.
# ---

# Set chart size
plt.figure(figsize=(12, 6))

# Plot chart with trend line
plt.plot(age_group_total_purchases.index,
         age_group_total_purchases,
         # Style chart
         color='purple',
         marker='s',
         linestyle='--',
         label='Trend Line')

# Set chart title
plt.title('Trend of Total Purchases by Age')

# Set axes label
plt.xlabel('Age')
plt.ylabel('Total Purchases')

# Display legend and plot
plt.legend()
plt.show()
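
When the raw values are noisy, one way to emphasize the overall direction is to overlay a smoothed line on top of them. This is a minimal sketch with made-up values, using a centered 3-point rolling mean from pandas; the window size and the name `totals` are assumptions chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only (index = age)
totals = pd.Series([10, 14, 9, 15, 13, 18, 16],
                   index=[20, 21, 22, 23, 24, 25, 26])

# Smooth the series with a centered 3-point rolling mean
# (the first and last points have no full window, so they are NaN)
trend = totals.rolling(window=3, center=True).mean()

# Plot the raw values in a muted color
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(totals.index, totals, color='lightgrey', marker='o', label='Actual')

# Plot the smoothed trend in a stronger color and thicker line
ax.plot(trend.index, trend, color='purple', linewidth=2.5,
        label='3-point rolling mean')

# Set chart title and axes labels
ax.set_title('Trend of Total Purchases by Age')
ax.set_xlabel('Age')
ax.set_ylabel('Total Purchases')

# Display legend and plot
ax.legend()
plt.show()
```

Muting the raw series and thickening the smoothed one draws the eye to the direction of the data rather than to individual points.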

#### <font color='green'> Challenge 5.3: Highlighting trends</font>

In [None]:
# ---
# Question
# Using a line chart, visualize the profit trend for the second quarter (April, May, June)
# while displaying the rest of the year's data.
# Use different colors to show the trend
# ---
# YOUR CODE GOES BELOW

# Plotting the data
plt.figure(figsize=(12, 6))

# Plot line for the entire year
plt.plot(df_profit['Month'], df_profit['Profit'], color='lightcoral', marker='o')

# Highlighting Q2 (April, May, June)
q2_months = ['Apr', 'May', 'Jun']
q2_data = df_profit[df_profit['Month'].isin(q2_months)]

# Plot Q2 data in a different color


# Adding title and labels
plt.title('Profit Trend for Q2 Highlighted in a Line Chart')
plt.xlabel('Month')
plt.ylabel('Profit')

# Display the plot
plt.show()

#### 5.4 Projecting trends

Projecting trends extends our visualizations beyond the existing data. It offers a glimpse into potential future developments based on historical patterns, which helps stakeholders anticipate and plan for the future.

In [None]:
# Question
# ---
# Create a scatter chart to visualize the actual data points and the projected trend of total purchases by age.
# ---

# Fit a linear trend line using numpy
# Fit against the actual ages (not row positions) so the line stays
# correct even if some ages are missing from the data
ages = age_group_total_purchases.index.to_numpy()
z = np.polyfit(ages, age_group_total_purchases.to_numpy(), 1)

# Extend the age range for projection (up to 9 years past the oldest age)
extended_ages = np.arange(ages.min(), ages.max() + 10)

# View output
extended_ages

In [None]:
# Set chart size
plt.figure(figsize=(12, 6))

# Plot projected line chart
plt.plot(extended_ages,
         np.poly1d(z)(extended_ages), # Evaluate the fitted line at each age
         linestyle='--',
         color='orange',
         label='Projected Trend')

# Plot scatter chart
plt.scatter(age_group_total_purchases.index, age_group_total_purchases, color='blue', label='Actual Data')

# Set chart title and axes labels
plt.title('Projected Trend of Total Purchases by Age')
plt.xlabel('Age')
plt.ylabel('Total Purchases')

# Display legend and plot
plt.legend()
plt.show()
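
A quick way to sanity-check a projection is to fit it on data where the answer is known. The sketch below uses made-up ages with gaps and totals that are exactly linear (`100 * age + 500`), so the fitted slope and intercept are easy to verify. All names and values here are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only: ages with gaps,
# and totals that follow 100 * age + 500 exactly
ages = np.array([20, 25, 30, 40, 50])
totals = 100 * ages + 500

# Fit a straight line against the actual age values
slope, intercept = np.polyfit(ages, totals, 1)
trend = np.poly1d([slope, intercept])

# Project 10 years beyond the oldest observed age
projection_ages = np.arange(ages.min(), ages.max() + 11)

# Plot the fitted and projected line
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(projection_ages, trend(projection_ages),
        linestyle='--', color='orange', label='Projected Trend')

# Plot the actual data points
ax.scatter(ages, totals, color='blue', label='Actual Data')

# Set chart title and axes labels
ax.set_title('Projected Trend of Total Purchases by Age')
ax.set_xlabel('Age')
ax.set_ylabel('Total Purchases')

# Display legend and plot
ax.legend()
plt.show()
```

A straight-line projection assumes the historical pattern continues unchanged, so it is best read as a rough guide rather than a forecast.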