# Introduction
Making plots and static or interactive visualizations is one of the most important tasks in data analysis. It may be part of the exploratory process, for example helping to identify outliers or needed data transformations, or generating ideas for models.

Matplotlib is one of the most widely used Python libraries for data visualization, thanks to its flexibility and the extensive functionality it provides.

# 1. Setting up
Importing matplotlib
Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use the standard shorthand plt when importing Matplotlib:

import matplotlib.pyplot as plt

This imports Matplotlib's pyplot interface under the shorthand plt, which we use throughout the notebook.

Matplotlib for Jupyter notebook
You can use Matplotlib directly in this notebook to render visualizations inside the notebook itself. To do that, run the following magic command (recent Jupyter versions already use the inline backend by default, so it is often optional):

%matplotlib inline

In [None]:
# importing required libraries
import numpy as np
import pandas as pd

# importing matplotlib
import matplotlib.pyplot as plt

# display plots in the notebook itself
%matplotlib inline

# 2. Matplotlib basics
Make a simple plot
Let's create a basic plot to start working with!

In [None]:
height = [150, 160, 165, 185]
weight = [70, 80, 90, 100]

# draw the plot
plt.plot(height, weight)

We pass two lists as arguments to plot(): the first supplies the values on the x-axis and the second the values on the y-axis.
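To see this mapping directly, the sketch below inspects what plot() hands back. It assumes only standard Matplotlib behavior: plt.plot() returns a list of Line2D objects, and each line's get_xdata()/get_ydata() report the data it was drawn from.

```python
import numpy as np
import matplotlib.pyplot as plt

height = [150, 160, 165, 185]
weight = [70, 80, 90, 100]

# plot() returns a list with one Line2D object per line drawn
lines = plt.plot(height, weight)
line = lines[0]

# the first list became the x data and the second the y data
x_data = np.asarray(line.get_xdata()).tolist()
y_data = np.asarray(line.get_ydata()).tolist()
print(x_data, y_data)
```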

Title, Labels, and Legends
Now that our first plot is ready, let us add a title and label the x-axis and y-axis using the functions title(), xlabel(), and ylabel(), respectively.

In [None]:
# draw the plot
plt.plot(height, weight)
# add title
plt.title("Relationship between height and weight")
# label x axis
plt.xlabel("Height")
# label y axis
plt.ylabel("Weight")
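The pyplot functions above act on the "current" Axes behind the scenes. Matplotlib also offers an object-oriented style in which you hold the Axes explicitly; the sketch below is meant to be equivalent to the previous cell, using ax.set_title(), ax.set_xlabel(), and ax.set_ylabel().

```python
import matplotlib.pyplot as plt

height = [150, 160, 165, 185]
weight = [70, 80, 90, 100]

# create a Figure and a single Axes explicitly
fig, ax = plt.subplots()

ax.plot(height, weight)
ax.set_title("Relationship between height and weight")  # like plt.title(...)
ax.set_xlabel("Height")  # like plt.xlabel(...)
ax.set_ylabel("Weight")  # like plt.ylabel(...)
```

This style tends to scale better once a figure holds several subplots, because each call names the Axes it modifies.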

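We can also show several series on a single plot. When plot() receives only one list, it places those values on the y-axis and uses their positions (0, 1, 2, ...) as the x values.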

In [None]:
calories_burnt = [65, 75, 95, 99]

# draw the plot for calories burnt
plt.plot(calories_burnt)
# draw the plot for weight
plt.plot(weight)
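A quick way to confirm the default x positions is to read them back from the returned line. This is a minimal sketch reusing the notebook's two lists; it also checks that successive plot() calls on the same Axes pick different colors from the default color cycle.

```python
import numpy as np
import matplotlib.pyplot as plt

calories_burnt = [65, 75, 95, 99]
weight = [70, 80, 90, 100]

plt.figure()
# with a single list, its values go on the y-axis and x defaults to 0, 1, 2, ...
cal_line, = plt.plot(calories_burnt)
weight_line, = plt.plot(weight)

x_positions = np.asarray(cal_line.get_xdata()).tolist()
print(x_positions)
```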

Adding a legend is also simple in Matplotlib: legend() accepts a labels parameter with the name of each line and a loc parameter with the position of the legend in the figure.
The default is loc='best', which places the legend wherever it overlaps the plotted data least; it does not refer to a fixed corner such as upper left.

The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes/figure.

In [None]:
# draw the plot for calories burnt
plt.plot(calories_burnt)
# draw the plot for weight
plt.plot(weight)

# add a legend; loc='best' lets Matplotlib choose the least-overlapping position
plt.legend(labels=['Calories Burnt', 'Weight'], loc='best')
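An alternative to passing a labels list is to attach a label to each line when it is drawn; legend() then collects the labels by itself, which avoids mismatches if the order of the plot() calls changes. A sketch, using a fixed corner this time:

```python
import matplotlib.pyplot as plt

calories_burnt = [65, 75, 95, 99]
weight = [70, 80, 90, 100]

fig, ax = plt.subplots()
# attach each label to its line when plotting
ax.plot(calories_burnt, label="Calories Burnt")
ax.plot(weight, label="Weight")

# legend() picks up the labels automatically; loc pins it to a corner
legend = ax.legend(loc="upper left")
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```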

Notice that in the previous plot, it is not clear that each value belongs to a different person.
Look at the x-axis: can we add labels showing that each position corresponds to a different person?
The marked positions along an axis are called ticks, and the text shown at each one is its tick label.
You can use xticks() to set both the location of each tick and its label. Let's see this in an example:

In [None]:
# draw the plot
plt.plot(calories_burnt)
plt.plot(weight)

# add legend in the lower right part of the figure
plt.legend(labels=['Calories Burnt', 'Weight'], loc='lower right')

# set labels for each of these persons
plt.xticks(ticks=[0,1,2,3], labels=['p1', 'p2', 'p3', 'p4']);
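To check what xticks() actually set, you can read the tick positions and label text back from the Axes. This sketch draws the canvas once before reading the labels so that their text is up to date; the person names are the notebook's own p1 to p4.

```python
import matplotlib.pyplot as plt

calories_burnt = [65, 75, 95, 99]
weight = [70, 80, 90, 100]

fig, ax = plt.subplots()
ax.plot(calories_burnt)
ax.plot(weight)

# ticks: positions on the x-axis; labels: the text shown at each position
plt.xticks(ticks=[0, 1, 2, 3], labels=["p1", "p2", "p3", "p4"])

fig.canvas.draw()  # render once so the tick label text is up to date
tick_positions = list(ax.get_xticks())
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_positions, tick_labels)
```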

Size, Colors, Markers and Line styles
You can also set the size of the figure with figure(), passing a (width, height) tuple to its figsize argument.
Both values are in inches. In the plot itself, the marker and color arguments of plot() set the point marker and the line color of each series.

In [None]:
# figure size in inches
plt.figure(figsize=(10,5))

# draw the plot
plt.plot(calories_burnt,marker='o',color='red')
plt.plot(weight,marker='*',color='black')

# add legend in the lower right part of the figure
plt.legend(labels=['Calories Burnt', 'Weight'], loc='lower right')

# set labels for each of these persons
plt.xticks(ticks=[0,1,2,3], labels=['p1', 'p2', 'p3', 'p4']);
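Because figsize is in inches, the size in pixels also depends on the resolution in dots per inch (dpi). The sketch below fixes dpi=100 as an assumption for the example, so a 10 x 5 inch figure corresponds to 1000 x 500 pixels; note that savefig() can use a different dpi when writing a file.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi fixed here only for the example
fig = plt.figure(figsize=(10, 5), dpi=100)
width_in, height_in = fig.get_size_inches()

# pixel size = size in inches * dots per inch
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi
print(width_px, height_px)
```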

In [None]:
# draw the plot
plt.plot(calories_burnt)
# 'go' is a format string: green ('g') circle markers ('o') with no connecting line
plt.plot(weight, 'go')

# add legend in the lower right part of the figure
plt.legend(labels=['Calories Burnt', 'Weight'], loc='lower right')

# set labels for each of these persons
plt.xticks(ticks=[0,1,2,3], labels=['p1', 'p2', 'p3', 'p4']);
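Format strings such as 'go' are a shorthand for the color, marker, and linestyle keyword arguments of plot(). The sketch below draws one series with the shorthand and one with explicit keywords (a dashed red line with square markers, chosen here only to illustrate line styles), then reads the resulting styles back.

```python
import matplotlib.pyplot as plt

calories_burnt = [65, 75, 95, 99]
weight = [70, 80, 90, 100]

fig, ax = plt.subplots()
# format string: 'g' = green, 'o' = circle marker; no line style given, so markers only
dots, = ax.plot(weight, "go")
# the same kind of styling spelled out with keyword arguments
dashed, = ax.plot(calories_burnt, color="red", linestyle="--", marker="s")

print(dots.get_marker(), dots.get_linestyle())
print(dashed.get_marker(), dashed.get_linestyle())
```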

In [None]:
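import pandas as pd
# read the dataset (replace this path with the location of your copy of the BigMart data)
data_BM = pd.read_csv(r'C:\Users\neels\Downloads\bigmart_data (6).csv')
# drop every row that contains at least one null value
data_BM = data_BM.dropna(how="any")
# view the first five rows
data_BM.head()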
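The effect of dropna(how="any") can be checked on a tiny frame without the CSV. The values below are made up for illustration; only the column names mirror the BigMart data. The sketch also contrasts it with how="all", which drops a row only when every value in it is missing.

```python
import numpy as np
import pandas as pd

# small hypothetical frame with a couple of missing entries
df = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5, 19.2],
    "Outlet_Size": ["Medium", "Small", None, "High"],
    "Item_MRP": [249.8, 48.3, 141.6, 182.1],
})

# how="any": drop a row if at least one of its values is missing
any_dropped = df.dropna(how="any")
# how="all": drop a row only if all of its values are missing
all_dropped = df.dropna(how="all")
print(any_dropped.index.tolist(), len(all_dropped))
```

The surviving rows keep their original index labels (0 and 3 here), which matters whenever rows are later selected by position rather than by label.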

# Line Chart
Line charts are useful for showing how one variable changes as a function of time or of another continuous variable.
In Matplotlib, plot() draws a line chart by default.
Below, we use a line chart to show the mean price (Item_MRP) for each of the first ten item types.

In [None]:
price_by_item = data_BM.groupby('Item_Type').Item_MRP.mean()[:10]
price_by_item

In [None]:
# mean price based on item type
price_by_item = data_BM.groupby('Item_Type').Item_MRP.mean()[:10]

x = price_by_item.index.tolist()
y = price_by_item.values.tolist()

# set figure size
plt.figure(figsize=(14, 8))

# set title
plt.title('Mean price for each item type')

# set axis labels
plt.xlabel('Item Type')
plt.ylabel('Mean Price')

# set xticks 
plt.xticks(labels=x, ticks=np.arange(len(x)))

plt.plot(x, y)
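Pandas can draw the same kind of chart through Matplotlib directly from the grouped Series, returning the Matplotlib Axes for further customization. The sketch uses a small hypothetical stand-in for data_BM with invented prices; note that groupby() sorts the group keys alphabetically, so [:10] in the cells above keeps the first ten item types in alphabetical order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical stand-in for data_BM with only the two columns this chart needs
df = pd.DataFrame({
    "Item_Type": ["Dairy", "Soft Drinks", "Dairy", "Meat", "Soft Drinks", "Meat"],
    "Item_MRP": [249.8, 48.3, 150.2, 182.1, 57.7, 120.0],
})

# mean price per item type; the index is sorted alphabetically
price_by_item = df.groupby("Item_Type").Item_MRP.mean()

# Series.plot() draws a line chart with Matplotlib and returns the Axes
ax = price_by_item.plot(figsize=(14, 8), title="Mean price for each item type")
ax.set_xlabel("Item Type")
ax.set_ylabel("Mean Price")
print(price_by_item)
```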

# Bar Chart
Suppose we want to see the mean sales for each outlet size.
A bar chart is another simple type of visualization, well suited to comparing a value across the categories of a categorical variable.
You can use plt.bar() instead of plt.plot() to create a bar chart.

In [None]:
data_BM['Outlet_Size'].unique()

In [None]:
# sales by outlet size
sales_by_outlet_size = data_BM.groupby('Outlet_Size').Item_Outlet_Sales.mean()

# sort by sales
sales_by_outlet_size.sort_values(inplace=True)

x = sales_by_outlet_size.index.tolist()
y = sales_by_outlet_size.values.tolist()

# set axis labels
plt.xlabel('Outlet Size')
plt.ylabel('Sales')

# set title
plt.title('Mean sales for each outlet size')

# set xticks 
plt.xticks(labels=x, ticks=np.arange(len(x)))

plt.bar(x, y, color=['red', 'orange', 'magenta'])
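When the x values are strings, Matplotlib treats them as categories and places the bars at 0, 1, 2, ... on its own, so the explicit xticks() call is optional. bar() also returns a container with one rectangle per bar, which ax.bar_label() (available since Matplotlib 3.4) can annotate with the bar values. The sales figures below are hypothetical, not taken from the dataset.

```python
import matplotlib.pyplot as plt

# hypothetical mean sales per outlet size (illustrative numbers only)
sizes = ["Small", "High", "Medium"]
mean_sales = [1900.0, 2300.0, 2700.0]

fig, ax = plt.subplots()
bars = ax.bar(sizes, mean_sales, color=["red", "orange", "magenta"])
ax.set_xlabel("Outlet Size")
ax.set_ylabel("Sales")

# write each bar's height on top of it
value_labels = ax.bar_label(bars, fmt="%.0f")

# the container holds one Rectangle per category
bar_heights = [rect.get_height() for rect in bars]
print(bar_heights)
```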

# Histogram
Distribution of Item price
Histograms are a very common type of plot for continuous data such as height and weight, stock prices, or customer waiting times.
A histogram divides the range of the data into intervals (bins) and plots how many observations fall into each bin, i.e. their frequency.
Histograms are widely used in probability and statistics to visualize the shape of a distribution, for example to compare data against distributions such as the normal distribution or the t-distribution.

In [None]:
# title
plt.title('Item MRP (price) distribution')

# xlabel
plt.xlabel('Item_MRP')

# ylabel
plt.ylabel('Frequency')

# plot histogram
plt.hist(data_BM['Item_MRP'], bins=20, color='lightblue');

In [None]:
# title
plt.title('Item MRP (price) distribution')

# xlabel
plt.xlabel('Item_MRP')

# ylabel
plt.ylabel('Frequency')

# plot histogram: alpha sets transparency, density=False plots raw counts, edgecolor outlines each bar
plt.hist(data_BM['Item_MRP'], bins=20, color='lightblue', alpha=0.7, density=False, histtype='bar', edgecolor='black');
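hist() does more than draw: it returns the count in each bin, the bin edges, and the drawn patches. This sketch uses synthetic prices (a uniform sample between 30 and 270, an assumption for illustration) and checks that the counts match NumPy's np.histogram, and that density=True rescales the bars so their total area is 1.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# synthetic prices (hypothetical), uniform between 30 and 270
prices = rng.uniform(30, 270, size=1000)

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges, and the bar patches
counts, edges, patches = ax.hist(prices, bins=20, color="lightblue", edgecolor="black")

# the same counts, computed without plotting
np_counts, np_edges = np.histogram(prices, bins=20)

# with density=True, each bar height is count / (total * bin width)
density, density_edges = np.histogram(prices, bins=20, density=True)
total_area = np.sum(density * np.diff(density_edges))
print(counts.sum(), total_area)
```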

# Box Plots
Distribution of sales
A box plot shows the three quartiles of a distribution (the edges of the box and the median line inside it) along with its extreme values.
The “whiskers” extend to the most extreme data points that lie within 1.5 IQRs (interquartile ranges) of the lower and upper quartiles; observations outside this range are drawn individually as outliers.
This means that the whisker ends and each outlier point correspond to actual observations in the data.
Let's visualize the distribution of Item_Outlet_Sales.

In [None]:
data = data_BM[['Item_Outlet_Sales']]

# create outlier point shape
red_diamond = dict(markerfacecolor='r', marker='D')

# set title
plt.title('Item Sales distribution')

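# make the boxplot (from Matplotlib 3.9 the labels parameter is named tick_labels; labels still works but is deprecated)
plt.boxplot(data.values, labels=['Item Sales'], flierprops=red_diamond);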
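The whisker rule described above can be worked through by hand and compared with matplotlib.cbook.boxplot_stats(), the helper that computes the statistics boxplot() draws. The sample is small and hypothetical, with one deliberate outlier (40).

```python
import numpy as np
from matplotlib import cbook

# small hypothetical sample with one obvious outlier
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# whiskers reach the most extreme data points still within 1.5 IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()
fliers = data[(data < lower_fence) | (data > upper_fence)]

# Matplotlib's own statistics for the same data, for comparison
stats = cbook.boxplot_stats(data, whis=1.5)[0]
print(q1, median, q3, whisker_low, whisker_high, fliers)
```

Here Q1 = 3.25 and Q3 = 7.75, so the upper fence is 14.5: the whisker stops at 9 and 40 is drawn as an outlier.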

In [None]:
data = data_BM[['Item_Outlet_Sales', 'Item_MRP']]

# create outlier point shape
red_square = dict(markerfacecolor='r', marker='s')

# create a figure with a single Axes
fig, ax = plt.subplots()

# make the boxplot; vert=False draws the boxes horizontally
plt.boxplot(data.values, labels=['Item Sales', 'Item MRP (price)'], vert=False, flierprops=red_square);

# Violin Plots
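Density distribution of Item weight and Item price
A violin plot combines a box-plot-style summary with a kernel density estimate, so it shows the full shape of each distribution. By default, Matplotlib's violinplot() also marks the minimum and maximum of each distribution.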

In [None]:
data = data_BM[['Item_Weight', 'Item_MRP']]

# create a figure with a single Axes
fig, ax = plt.subplots()

# label the two violins, which violinplot() places at x = 1 and x = 2
plt.xticks(ticks=[1,2], labels=['Item Weight', 'Item MRP'])

# make the violinplot
plt.violinplot(data.values);
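violinplot() also accepts flags such as showmedians and showmeans, and returns a dictionary of the drawn artists, including one "body" per distribution. In this sketch the two samples are synthetic normal draws (an assumption, not the BigMart columns) on similar scales so both violins stay readable.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# two hypothetical samples on similar scales
sample_a = rng.normal(12.8, 4.5, size=500)
sample_b = rng.normal(10.0, 2.0, size=500)

fig, ax = plt.subplots()
# showmedians adds a line at each median; positions default to 1, 2, ...
parts = ax.violinplot([sample_a, sample_b], showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["Sample A", "Sample B"])
print(sorted(parts.keys()))
```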

# Scatter Plots
Relative distribution of item weight and its visibility
A scatter plot depicts the joint distribution of two variables as a cloud of points, where each point represents one observation in the dataset.
This lets the eye pick up a substantial amount of information about whether there is any meaningful relationship between them.

In [None]:
# set label of axes 
plt.xlabel('Item_Weight')
plt.ylabel('Item_Visibility')

# scatter plot of the first 200 rows (.iloc selects rows by position)
plt.scatter(data_BM["Item_Weight"].iloc[:200], data_BM["Item_Visibility"].iloc[:200])
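With many points, overlapping markers can hide structure; the s (marker area) and alpha (transparency) arguments of scatter() help, and a correlation coefficient gives a numeric companion to the visual impression. The data below are synthetic, generated so that y loosely increases with x, purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# hypothetical data: y loosely increases with x
x = rng.uniform(5, 20, size=200)
y = 0.5 * x + rng.normal(0, 2, size=200)

fig, ax = plt.subplots()
# s sets marker area in points^2; alpha < 1 makes overlapping points visible
points = ax.scatter(x, y, s=15, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")

# Pearson correlation coefficient between the two variables
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))
```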