# Import Necessary Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 1.0 Line Chart

### Preparation - Load a DataFrame

In [None]:
df = pd.read_csv('Car Sales.csv')
df

## 1.1 Make a simple line chart from one array
We can use the matplotlib.pyplot.plot() function to make a line chart.<br>
Note: From here on, we will refer to "matplotlib.pyplot" as "plt".

We can pass a single array (e.g. a list, a numpy.array, a pandas.Series, etc.) to create one.<br>
The array is used for the y-axis, and the x-axis is numbered automatically from 0 onwards (for a pandas.Series, the Series index is used, which is 0, 1, 2, ... by default).

In [None]:
y = df['Sales_in_thousands']

plt.plot(y)
plt.show()
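
To see which x-values plt.plot() generates on its own, we can inspect the line object it returns. The sketch below uses a short made-up list (not values from 'Car Sales.csv'), so it can run without the data file.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only (not taken from 'Car Sales.csv')
y = [16.9, 39.4, 14.1, 8.6]

# plt.plot() returns a list of Line2D objects; here it contains one line
lines = plt.plot(y)
line = lines[0]

# With a plain list, the x-values default to 0, 1, 2, ...
x_used = list(line.get_xdata())
print(x_used)

plt.show()
```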

## 1.2 Make a simple line chart from 2 arrays
We can also pass 2 arrays to plt.plot().<br>
In this case, the 1st input is used for the x-axis and the 2nd input is used for the y-axis.

In [None]:
x = df['Engine_size']
y = df['Horsepower']

plt.plot(x, y)
plt.show()

We can observe a positive correlation between "Engine_size" and "Horsepower".<br>
But the chart is pretty messy, because plt.plot() connects the points in the order they appear in the table.

We should sort the table by "Engine_size" (the x-axis) before we plot the line chart so the line does not jump back and forth.

In [None]:
sub_df = df[['Engine_size', 'Horsepower']].sort_values(by='Engine_size')
x = sub_df['Engine_size']
y = sub_df['Horsepower']

plt.plot(x, y)
plt.show()

Now it looks much better.

We can pass '--' as the 3rd input to draw a dashed line.

In [None]:
plt.plot(x, y, '--')
plt.show()

We can set marker='x' to mark each data point with an "x".

In [None]:
plt.plot(x, y, marker='x')
plt.show()
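
The two options above can also be combined. As a small sketch with made-up numbers, the format string 'x--' asks for "x" markers and a dashed line in a single argument; it should be equivalent to passing linestyle='--' and marker='x' as keyword arguments.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only (already sorted by x)
x = [1.8, 2.5, 3.0, 3.5]
y = [140, 170, 200, 225]

# 'x--' = "x" markers joined by a dashed line
line, = plt.plot(x, y, 'x--')

# Check which style the line actually received
print(line.get_linestyle(), line.get_marker())

plt.show()
```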

# 2.0 Scatter Plot
We can use the plt.scatter() function to create a scatter plot.<br>
We can think of it as a line chart without the line.

In [None]:
# Since there is no line, we do not need to sort the data
x = df['Engine_size']
y = df['Horsepower']

plt.scatter(x, y)
plt.show()

We can use the "s" argument to control the size of the dots.

In [None]:
plt.scatter(x, y, s=5)
plt.show()

We can use the "color" argument to control the color of the dots.<br>
If we pass an array of colors, it should have the same length as the "x" array and the "y" array.

In [None]:
color = df['Vehicle_type'].map({'Passenger': 'blue', 'Car': 'red'})

plt.scatter(x, y, s=5, color=color)
plt.show()

In this case, a color is assigned to each data point individually.

There is an alternative way to create the same plot: split the data into 2 groups and assign a single color to each group.

In [None]:
sub_df1 = df[df['Vehicle_type'] == 'Passenger']
x1 = sub_df1['Engine_size']
y1 = sub_df1['Horsepower']

sub_df2 = df[df['Vehicle_type'] == 'Car']
x2 = sub_df2['Engine_size']
y2 = sub_df2['Horsepower']

plt.scatter(x1, y1, s=5, color='blue')
plt.scatter(x2, y2, s=5, color='red')
plt.show()
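
One practical reason to plot each group separately is that each plt.scatter() call can carry its own label, which plt.legend() then displays. The sketch below uses made-up numbers for the two groups; the "label" argument and plt.legend() are additions that the cells above do not use.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only (not taken from 'Car Sales.csv')
passenger_x = [1.8, 2.0, 2.5]
passenger_y = [140, 150, 170]
car_x = [3.0, 3.5, 4.0]
car_y = [200, 225, 250]

# One scatter call per group, each with its own color and label
plt.scatter(passenger_x, passenger_y, s=5, color='blue', label='Passenger')
plt.scatter(car_x, car_y, s=5, color='red', label='Car')

# The legend lists one entry per labelled group
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)

plt.show()
```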

# 3.0 Pie Chart

Before we make a pie chart, we can use the Series.value_counts() method to count how many times each unique value appears.

In [None]:
series = df['Manufacturer'].value_counts()
series

Let's create a pie chart to see the proportions of the top 5 manufacturers.

We can use the plt.pie() function to do that.<br>
Here it takes 2 inputs. The 1st one is the value of each slice, and the 2nd one (the "labels" argument) is the label of each slice.

In [None]:
values = series.iloc[:5]
labels = series.iloc[:5].index

plt.pie(values, labels=labels)
plt.show()
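
Since a pie chart is about proportions, it can help to print the percentage on each slice. A minimal sketch, assuming made-up counts and using the "autopct" argument of plt.pie() (which the cell above does not use):

```python
import matplotlib.pyplot as plt

# Made-up counts for illustration only
values = [50, 30, 20]
labels = ['A', 'B', 'C']

# autopct is a format string applied to each slice's percentage
wedges, texts, autotexts = plt.pie(values, labels=labels, autopct='%1.1f%%')

# The percentage text drawn on each slice
percent_texts = [text.get_text() for text in autotexts]
print(percent_texts)

plt.show()
```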

# 4.0 Bar Chart

We can use the plt.bar() function to create a bar chart.

It works similarly to a pie chart, but the 2 inputs are swapped.<br>
The 1st one is the label and the 2nd one is the value.

In [None]:
series = df['Manufacturer'].value_counts()
values = series.iloc[:5]
labels = series.iloc[:5].index

plt.bar(labels, values)
plt.show()

We can use the plt.barh() function to create a horizontal bar chart. It takes the same inputs as plt.bar().

In [None]:
plt.barh(labels, values)
plt.show()
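
To make the relationship between the two functions concrete, the sketch below draws the same made-up data both ways and reads the values back from the bars: in plt.bar() they end up as bar heights, in plt.barh() as bar widths. The plt.figure() calls are an addition here, used only to start each chart on a fresh figure.

```python
import matplotlib.pyplot as plt

# Made-up counts for illustration only
labels = ['A', 'B', 'C']
values = [11, 9, 6]

# Vertical bars: the values become the bar heights
plt.figure()
vertical_bars = plt.bar(labels, values)
heights = [bar.get_height() for bar in vertical_bars]
plt.show()

# Horizontal bars: the same values become the bar widths
plt.figure()
horizontal_bars = plt.barh(labels, values)
widths = [bar.get_width() for bar in horizontal_bars]
plt.show()

print(heights, widths)
```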

# 5.0 Histogram
We can use the plt.hist() function to create a histogram.<br>
It takes a numeric array as the input. We can also set the number of bins with the "bins" argument.

In [None]:
series = df['Sales_in_thousands']

plt.hist(series, bins=20)
plt.show()
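
plt.hist() also returns the numbers behind the chart: the count in each bin and the bin edges. The sketch below uses a small made-up array so the bins are easy to check by hand; with 4 bins there are 5 edges.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up values for illustration only
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# counts: number of values per bin; edges: the bin boundaries
counts, edges, patches = plt.hist(data, bins=4)

print(counts)  # one count per bin
print(edges)   # always one more edge than there are bins

plt.show()
```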

# 6.0 Box Plot
We can use the plt.boxplot() function to make a boxplot.<br>
It takes an array as the input.

In [None]:
series = df['Sales_in_thousands']

plt.boxplot(series)
plt.show()
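
To connect the drawing to the numbers, we can compare the boxplot with statistics computed by NumPy. This is a sketch on a made-up array: the line inside the box is the median, and points far outside the box (here, 30) are drawn separately as outliers. plt.boxplot() returns a dictionary of the drawn lines, which is what the sketch inspects.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up values for illustration only; 30 is an obvious outlier
data = np.array([2, 4, 4, 5, 7, 9, 30])

# The returned dictionary holds the lines that make up the boxplot
result = plt.boxplot(data)

# y-position of the median line inside the box
median_y = result['medians'][0].get_ydata()[0]

# y-positions of the points drawn as outliers
outliers = list(result['fliers'][0].get_ydata())

print(median_y, np.median(data), outliers)

plt.show()
```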

We can set vert=False to make a horizontal boxplot.

In [None]:
plt.boxplot(series, vert=False)
plt.show()

We can also pass a list of arrays.<br>
It will draw one boxplot per array, side by side.

In [None]:
labels = ['Toyota', 'Ford', 'Dodge']
data = [df['Sales_in_thousands'][df['Manufacturer'] == label] for label in labels]

plt.boxplot(data, labels=labels)
plt.show()

# 7.0 Subplots

We can use the plt.subplots() function to create a grid of plots.<br>
The grid can be 1D or 2D.

Let's try 1D first.

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(16, 4))

axs[0].scatter(df['Engine_size'], df['Horsepower'])
axs[1].hist(df['Sales_in_thousands'], bins=20)

labels = ['Toyota', 'Ford', 'Dodge']
data = [df['Sales_in_thousands'][df['Manufacturer'] == label] for label in labels]
axs[2].boxplot(data, labels=labels)

plt.show()

In the plt.subplots() function, the first integer input is the number of rows and the second integer input is the number of columns.<br>
You can think of "axs" as a "list" of plots (it is a NumPy array of Axes objects), so each plot can be accessed by its index.
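
Because the subplots can be indexed like a list, we can also loop over them. A minimal sketch with made-up lines; set_title() is an addition here, used only to tell the plots apart.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 3, figsize=(16, 4))

# With 1 row and 3 columns, axs is a 1D array of 3 plots
print(type(axs), axs.shape)

# Loop over the plots just like a list
for i, ax in enumerate(axs):
    ax.plot([0, 1, 2], [0, i, 2 * i])  # made-up line for illustration
    ax.set_title(f'Plot {i}')

plt.show()
```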

Now, let's stack the plots vertically by swapping the number of rows and columns.

In [None]:
fig, axs = plt.subplots(3, 1, figsize=(12, 9))

axs[0].scatter(df['Engine_size'], df['Horsepower'])
axs[1].hist(df['Sales_in_thousands'], bins=20)

labels = ['Toyota', 'Ford', 'Dodge']
data = [df['Sales_in_thousands'][df['Manufacturer'] == label] for label in labels]
axs[2].boxplot(data, labels=labels)

plt.show()

It works just the same, except the layout is transposed.

Now, let's try 2D.

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(12, 9))

axs[0][0].scatter(x=df['Engine_size'], y=df['Horsepower'])
axs[0][1].hist(df['Sales_in_thousands'], bins=20)

series = df['Manufacturer'].value_counts()
values = series.iloc[:5]
labels = series.iloc[:5].index
axs[1][0].bar(labels, values)

labels = ['Toyota', 'Ford', 'Dodge']
data = [df['Sales_in_thousands'][df['Manufacturer'] == label] for label in labels]
axs[1][1].boxplot(data, labels=labels)

plt.show()

When we have a 2D grid of plots, we need 2 indexes to select a plot.<br>
The first index is the row index and the second index is the column index.
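
As a small sketch of the indexing: in the 2D case "axs" is a 2D NumPy array, so axs[1][0] can also be written as axs[1, 0], and axs.flat visits every plot in turn (row by row). The lines drawn here are made up for illustration.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2, figsize=(12, 9))

# 2 rows and 2 columns give a 2D array of plots
print(axs.shape)

# Both indexing styles select the same plot (row 1, column 0)
same_plot = axs[1, 0] is axs[1][0]
print(same_plot)

# axs.flat lets us loop over all 4 plots without nested loops
for i, ax in enumerate(axs.flat):
    ax.plot([0, 1], [0, i])  # made-up line for illustration

plt.show()
```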