# Matplotlib
Matplotlib is a basic library for displaying data and creating charts. It is probably not the best one available, but it is very widely used.

There are other libraries that offer additional charting functionality. Later on, we'll try out:
- plotly
- seaborn

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimage
import pandas as pd
import numpy as np

Let's read and prepare the data for plotting. It is a simple table of years of work experience and salary.

In [None]:
data = pd.read_csv('../dataset/salary_dataset.csv')   # forward slashes also work on Windows

In [None]:
data.head()

We split the data into two groups of variables: input (explanatory) and target (explained).

In [None]:
inputs = data["YearsExperience"]   # input variable: years of experience
targets = data["Salary"]           # target variable: salary

In [None]:
inputs.head(3)

In [None]:
targets.head(3)

# Terms in the matplotlib library

* Figure
    * The whole "canvas" (window/image).
    * May contain one or more graphs (Axes).
    * Created e.g. by plt.figure() or plt.subplots().

* Axes (ax)
    * The chart itself (the coordinate system where we draw the data).
    * Each Axes has its own Axis objects (x, y, possibly z).
    * Created e.g. by fig.add_subplot() or plt.subplots().

* Axis (x/y axis)
    * Object that controls the rendering of the axis, its extent, labels, tick marks, etc.
    * Each Axes has two (2D plot) or three (3D plot) Axis objects.

* Legend 
    * Explains what each curve or marker in the chart means.
    * Added with ax.legend().
    * The label for the legend is specified in the draw method (label="...").    

* Tick (marks on axes)
    * Small marks on the x and y axes, together with their labels.
    * We distinguish major ticks (larger, usually with a number or label) and minor ticks (smaller, usually unlabeled).
    * Can be set manually (ax.set_xticks([...]), ax.set_yticks([...])) or placed automatically by matplotlib.

* Grid
    * Helps to read the values in the graph.
    * Enabled with ax.grid(True).
    * Can be enabled separately for major and minor ticks (ax.grid(True, which="minor")).

* Marker (marker)
    * Symbol to highlight points in the graph.
    * Set by the marker parameter in plot().
    * Examples of markers:
        * "o" - circle
        * "s" - square
        * "^" - up triangle
        * "x" - cross

![obrazek.png](attachment:36a07ef3-81ff-40a3-8e2e-ec311a1d84cf.png) 
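The terms above map directly onto objects in the object-oriented interface. The following is a minimal sketch, assuming made-up data (x squared and 10x), that creates a Figure with one Axes and touches markers, the legend, major and minor ticks, the grid and the Axis objects.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

fig, ax = plt.subplots(figsize=(6, 4))        # Figure (canvas) with one Axes (chart)
ax.plot(x, x ** 2, marker="o", label="x^2")   # circle markers + legend label
ax.plot(x, 10 * x, marker="s", label="10x")   # square markers + legend label
ax.legend()                                   # legend built from the label= values

ax.set_xticks([0, 3, 6, 9])                   # major ticks set manually
ax.minorticks_on()                            # minor ticks placed automatically
ax.grid(True, which="major")                  # grid for major ticks
ax.grid(True, which="minor", linestyle=":")   # dotted grid for minor ticks

# The x and y Axis objects belong to the Axes
print(type(fig).__name__, type(ax.xaxis).__name__, type(ax.yaxis).__name__)
```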

## Scatter plot

In [None]:
fig1 = plt.figure()                # create a new figure
plt.scatter(inputs, targets)       # scatter - dot (scatter) plot
plt.xlabel("years")                # x-axis label
plt.ylabel("salary")               # y-axis label
plt.show()                         # display the chart

Use the numpy library to generate 50 random points.

In [None]:
x=np.random.randint(low=0, high=50, size=50)   
y=np.random.randint(0, 50, 50)
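The points above change every time the cell runs. If the same "random" points are needed on each run (e.g. so a chart can be compared between runs), one option is NumPy's Generator interface with a fixed seed. This is a sketch; the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # seeded generator: same sequence on every run
x = rng.integers(low=0, high=50, size=50)   # 50 integers in [0, 50); high is exclusive
y = rng.integers(0, 50, 50)

rng_again = np.random.default_rng(seed=42)  # a second generator with the same seed
same_x = rng_again.integers(0, 50, 50)
print(np.array_equal(x, same_x))            # same seed, same numbers
```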

In Jupyter, the chart is displayed automatically at the end of the cell, so plt.figure() and plt.show() can be omitted when no decorations are needed.

In [None]:
plt.scatter(x, y)

The color and shape of the points can be set with a format string: "rx" means red ("r") crosses ("x").

In [None]:
plt.plot(x, y, "rx")
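The short format string is equivalent to spelling out the style with keyword arguments. A sketch comparing the two on made-up data; a marker without a line style in the format string means no connecting line is drawn.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import same_color

x = np.arange(5)
y = x ** 2

# Format string: "r" = red, "x" = cross marker, no line style given -> markers only
(short,) = plt.plot(x, y, "rx")

# The same style written out with keyword arguments
(explicit,) = plt.plot(x, y + 1, color="red", marker="x", linestyle="none")

print(short.get_marker(), short.get_linestyle(), same_color(short.get_color(), explicit.get_color()))
```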

## Line chart
A line chart of random values.

If only the y values are passed, the x coordinates are filled in automatically as 0, 1, 2, ..., len(y) - 1.

In [None]:
y=np.random.randint(0, 50, 50)
plt.plot(y)
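The automatic x coordinates can be checked on the returned line object. A small sketch with a hand-written y array:

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3, 1, 4, 1, 5])
fig, ax = plt.subplots()
(line,) = ax.plot(y)          # only y is given

print(line.get_xdata())       # x was filled in as 0, 1, ..., len(y) - 1
```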

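Passing both x and y connects the random points in the order they appear in the arrays. Since x is not sorted, the line jumps back and forth.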

In [None]:
plt.plot(x, y)
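If a line running from left to right is wanted instead, one way is to sort the points by x first, reordering y with the same indices so each (x, y) pair stays together. A sketch on seeded random data (the seed is an arbitrary choice):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 50, 50)
y = rng.integers(0, 50, 50)

order = np.argsort(x)                     # indices that sort x in ascending order
x_sorted, y_sorted = x[order], y[order]   # apply the same order to y to keep pairs together

fig, ax = plt.subplots()
ax.plot(x_sorted, y_sorted)               # the line now runs left to right
```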

## Combination of multiple data in a chart
There can be multiple different data series in a chart. 

The salary data points lie almost on a straight line. Let's draw a straight line through them; for this we need a linear function.

In [None]:
def linear(x, a, b):
    """Linear function: y = a*x + b."""
    return a*x + b

# setting the parameters of the linear function
a = 740    # slope
b = 1200   # intercept

The linspace function creates a sequence of evenly spaced numbers between the given start and end. We start at 0 and end at 11, both included. In total, we create 10 numbers at regular intervals.

In [None]:
x = np.linspace(start=0, stop=11, num=10)  # num evenly spaced values from start to stop (inclusive)
print(x)
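Because both ends are included, 10 numbers give 9 gaps, so the step is 11/9 rather than 1.1. A short sketch of the arithmetic; endpoint=False is an existing linspace option that excludes the stop value.

```python
import numpy as np

x = np.linspace(start=0, stop=11, num=10)            # 10 values, both ends included
step = x[1] - x[0]                                   # (11 - 0) / (10 - 1) = 11/9

x_open = np.linspace(0, 11, num=10, endpoint=False)  # stop excluded: step is 11/10
print(step, x_open[1] - x_open[0])
```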

Now we create a chart that contains the data points drawn with scatter.

It also contains a line drawn with plot: the x coordinates of its points come from linspace and the y coordinates are computed with the linear function.

In [None]:
fig2 = plt.figure()
plt.scatter(inputs, targets)               # scatter - plotting points
plt.plot(x, linear(x, a, b), color='red')  # plot - drawing of connected points
plt.ylim(-1, 10000)                        # y-axis limits
plt.xlim(-1, 11)                           # x-axis limits
plt.grid()                                 # grid display
plt.show()
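Above, a and b were chosen by hand. As an alternative (not what this notebook does), the line can be fitted by least squares with np.polyfit. The sketch below uses hypothetical data generated around y = 740x + 1200 with random noise, since the salary file is not reproduced here.

```python
import matplotlib.pyplot as plt
import numpy as np

def linear(x, a, b):
    """Linear function: y = a*x + b."""
    return a * x + b

# Hypothetical data: points near the line y = 740x + 1200 with added noise
rng = np.random.default_rng(1)
years = rng.uniform(1, 10, size=30)
salary = linear(years, 740, 1200) + rng.normal(0, 200, size=30)

# Least-squares fit of a degree-1 polynomial; coefficients come highest degree first
a_fit, b_fit = np.polyfit(years, salary, deg=1)

x = np.linspace(0, 11, 10)
fig, ax = plt.subplots()
ax.scatter(years, salary)
ax.plot(x, linear(x, a_fit, b_fit), color="red")
ax.grid(True)
print(f"a = {a_fit:.1f}, b = {b_fit:.1f}")
```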

## Bar chart
We can test to see if the pseudorandom number generator returns approximately evenly distributed numbers.

We use numpy to generate the random data, then count how many values fall into each of three ranges and store the counts in 3 variables.

In [None]:
x=np.random.randint(0, 100, 10000)
A = np.sum(x < 33)
B = np.sum((x >= 33) & (x < 66))
C = np.sum(x >= 66)
print (A, B, C)

In [None]:
plt.bar(["less than 33", "33 to 65", "66 or more"], [A, B, C])
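The same three counts can be obtained with np.histogram by passing explicit bin edges. A sketch on seeded random data; note that np.histogram treats bins as half-open [a, b) except the last one, which also includes its right edge.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 100, 10000)

# Manual counts, as in the cell above
A = np.sum(x < 33)
B = np.sum((x >= 33) & (x < 66))
C = np.sum(x >= 66)

# Bins [0, 33), [33, 66), [66, 100]
counts, edges = np.histogram(x, bins=[0, 33, 66, 100])
print(counts, (A, B, C))
```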

## Histogram
We can achieve a similar result with the hist function. It displays the frequencies in a given number of intervals (bins).

In [None]:
plt.hist(x, 10)
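Besides drawing the bars, hist returns the counts, the bin edges and the drawn patches. A sketch on seeded data: with bins=10, the edges span the data's minimum to maximum (0 to 99 here), so each bin is 9.9 wide; passing explicit edges such as np.arange(0, 101, 10) would give bins of width 10 instead.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 100, 10000)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(x, bins=10)   # frequencies, bin edges, drawn bars

print(len(edges))      # 10 bins need 11 edges
print(counts.sum())    # every value falls into some bin
```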

## Image
Sometimes it can be useful to display an image loaded from a file.

In [None]:
img = mpimage.imread("../dataset/signal.png")   # forward slashes avoid invalid backslash escapes like "\d"
plt.imshow(img)

An image does not have to be loaded from a file; it can also be an array (matrix) of pixel values.

The order of the array dimensions depends on how the image is used:
* height x width x channels - the layout that matplotlib's imshow expects
* channels x height x width - the layout common in neural network frameworks (e.g. PyTorch)

Pixel values are either integers 0 - 255 or floats 0 - 1.

In [None]:
img = np.random.rand(100, 100, 3)
plt.imshow(img)
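Converting between the two layouts is a transpose of the axes. A sketch, assuming a hypothetical random image stored as channels x height x width; it is moved to height x width x channels for imshow and also converted from floats 0 - 1 to integers 0 - 255.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical image in channels x height x width layout: 3 channels, height 60, width 100
chw = rng.random((3, 60, 100))                 # floats in [0, 1)

hwc = np.transpose(chw, (1, 2, 0))             # -> height x width x channels for imshow
img_uint8 = (hwc * 255).astype(np.uint8)       # the same image as integers 0 - 255

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(hwc)                                # floats are expected in [0, 1]
ax2.imshow(img_uint8)                          # uint8 is expected in [0, 255]
print(chw.shape, hwc.shape, img_uint8.dtype)
```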

## Matrix
Sometimes it is useful to display a matrix of numbers as a color scale, for example to make extreme values stand out.

For this you can use matshow, where the cmap parameter selects the color scheme (viridis, inferno, magma, ...).

With these color schemes, a lighter color indicates a higher value.

In [None]:
matice = np.random.rand(30, 30)
plt.matshow(matice)

In [None]:
plt.matshow(matice, cmap="inferno")
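To show which color corresponds to which value, a colorbar can be added next to the matrix. A sketch on a seeded random 30x30 matrix; the colorbar gets its own Axes in the figure.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.random((30, 30))

fig, ax = plt.subplots()
im = ax.matshow(matrix, cmap="inferno")   # image mapping values to colors
fig.colorbar(im, ax=ax)                   # scale showing which color means which value
print(im.get_cmap().name)
```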

## Pie
A favorite chart for managers and an unpopular one for statisticians is the pie chart.

In [None]:
plt.pie ([50, 40, 20], explode=[0, 0.1, 0], labels=["A", "B", "C"], autopct="%i%%", shadow=True)
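When autopct is set, pie also returns the percentage labels, so their text can be checked. The values are normalized to their sum (110 here), and "%i%%" truncates to whole percent; the sketch below uses "%.1f%%" instead to show one decimal place.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# With autopct set, pie returns wedges, label texts and percentage texts
wedges, texts, autotexts = ax.pie(
    [50, 40, 20],
    labels=["A", "B", "C"],
    autopct="%.1f%%",   # one decimal place instead of the truncating "%i%%"
)
print([t.get_text() for t in autotexts])
```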

## Customizing the chart

The size of the figure can be set with figsize=(width, height).

Note that the size is given in inches.

In [None]:
y = np.random.randint(0, 50, 50)
plt.figure(figsize=(10, 5))   # width 10 inches, height 5 inches
plt.plot(y)
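The size in pixels is the size in inches multiplied by the resolution in dots per inch (dpi). A sketch with dpi set explicitly to 100; the notebook's default dpi may differ.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 5), dpi=100)        # 10 x 5 inches at 100 dots per inch
width_in, height_in = fig.get_size_inches()
print(width_in * fig.dpi, height_in * fig.dpi)    # size in pixels (1000 x 500 here)
```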

We will prepare a function to calculate the moving average so that we can show different styles of line.

By uncommenting the lines, you can experiment.

In [None]:
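def moving_average(a, n=3):
    """Moving average over windows of n values; returns len(a) - n + 1 values."""
    ret = np.cumsum(a, dtype=float)   # running sums
    ret[n:] = ret[n:] - ret[:-n]      # sum of each window = difference of two running sums
    return ret[n - 1:] / n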
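The cumulative-sum trick can be checked against a convolution with a window of equal weights, which computes the same averages. The sketch also shows one detail that matters for the charts below: the result has len(y) - n + 1 values, so plot(AVG) starts at x = 0 and appears shifted n - 1 steps to the left; plotting it against x positions from n - 1 aligns each average with the end of its window.

```python
import numpy as np

def moving_average(a, n=3):
    """Moving average over windows of n values; returns len(a) - n + 1 values."""
    ret = np.cumsum(a, dtype=float)   # running sums
    ret[n:] = ret[n:] - ret[:-n]      # sum of each window = difference of two running sums
    return ret[n - 1:] / n

y = np.array([1, 2, 3, 4, 5, 6])
avg_cumsum = moving_average(y, 3)
avg_convolve = np.convolve(y, np.ones(3) / 3, mode="valid")   # same averages via convolution

x_avg = np.arange(3 - 1, len(y))   # x positions aligned with the end of each window
print(avg_cumsum, x_avg)
```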

In [None]:
AVG = moving_average(y, 5)
plt.figure (figsize=(10, 5))
plt.plot(y)
plt.plot(AVG, "r--")
#plt.plot(AVG, "r:")
#plt.plot(AVG, "g-.")
#plt.plot(AVG, "r-.+")
#plt.plot(AVG, "r-.*")
#plt.plot(AVG, "r-.o")

When preparing a chart for a report or paper, it is advisable to add:
* title - name of the chart
* label (parameter of plot) - description of a data series
* legend - displays the data series descriptions in a legend
* xlabel, ylabel - axis descriptions
* grid - grid display
* text - inserts your own text into the chart
* annotate - adds an explanation with an arrow to a significant point (e.g. the minimum)

There are many more possibilities to graphically modify the chart.

In [None]:
AVG = moving_average(y, 5)
plt.figure (figsize=(10, 5))

plt.title("Random numbers")                      # chart name

plt.plot(y, label="random")                      # label - description of the series
plt.plot(AVG, "r-.o", label="moving average")

plt.legend()

plt.xlabel("days")                               # axis description
plt.ylabel("weight")                             # axis description

plt.grid(True)                                   # Grid

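maxx = AVG.max()      # highest value of the moving average
miny = AVG.min()      # lowest value of the moving average
minx = AVG.argmin()   # index (x position) of the lowest value

plt.text(0, 0.9, f"max is {maxx:.1f}", transform=plt.gca().transAxes)   # text; (0, 0.9) in Axes coordinates
plt.annotate("min", xy=(minx, miny), xytext=(minx-2, miny-0.2), arrowprops={"color": "black"})   # arrow pointing at the minimum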

## Subplots
Sometimes you want to place several charts in one figure.

In that case, you create the charts as subplots.
* In the following example, we create a 10x10 inch figure.
* The figure is divided into 2 charts placed one below the other (2 rows, 1 column).
* The axes array contains the individual subplots. For easier access, we store references to them in the variables ax1 and ax2.
* ax1 is a simple scatter plot.
* ax2 contains the data series y and its moving average AVG.

In [None]:
fig = plt.figure(figsize=(10, 10))
axes = fig.subplots(2, 1)          # 2 rows, 1 column
ax1 = axes[0]
ax2 = axes[1]

ax1.scatter(inputs, targets)

ax2.plot(y)
ax2.plot(AVG, "r:")
ax2.legend(["random", "moving average"])
ax2.set_title("Random numbers")
ax2.set_xlabel("days")             # set_xlabel is a method: call it, don't assign to it
ax2.set_ylabel("weight")
ax2.grid(True)
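The figure and both Axes can also be created in one call with plt.subplots and unpacked directly. A self-contained sketch in which the salary data is replaced by a seeded random series, and the moving average is plotted at x positions aligned with the end of each 5-value window:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 50, 50)
avg = np.convolve(y, np.ones(5) / 5, mode="valid")   # 5-value moving average, 46 values

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))   # Figure and both Axes at once

ax1.scatter(np.arange(len(y)), y)
ax2.plot(y, label="random")
ax2.plot(np.arange(4, len(y)), avg, "r:", label="moving average")
ax2.legend()
ax2.set_title("Random numbers")
ax2.set_xlabel("days")
ax2.set_ylabel("weight")
ax2.grid(True)
```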

## 3D charts
If you have multidimensional data, it may be a good idea to display it in a 3D chart.

Matplotlib can do this as follows:

In [None]:
x1 = np.random.randint(0, 50, 50)
x2 = np.random.randint(0, 50, 50)
y = np.random.randint(0, 50, 50)

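fig = plt.figure()                      # plt.figure(), not the plt.Figure class
ax = fig.add_subplot(projection="3d")   # 3D Axes
ax.scatter(x1, x2, y)                   # on a 3D Axes the third argument is z
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_zlabel("y")
plt.show()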

Some libraries, like plotly, allow you to make a 3D chart interactive.

In [None]:
import plotly.graph_objects as go 
fig = go.Figure(layout=go.Layout(width=1000, height=1000))
fig.add_trace(go.Scatter3d(
    x=x1, y=x2, z=y,
    mode='markers',
    marker=dict(size=5, color=y, colorscale='Viridis', opacity=0.8),
    name="Data"
))

fig.show()

## Boxplot
A boxplot shows the distribution of data: the median, quartiles, range and outliers.

* line inside the box = median
* box edges = 1st and 3rd quartiles (Q1 and Q3); IQR = Q3 - Q1 is the interquartile range
* whiskers = extend to the most extreme data points within [Q1 - 1.5·IQR, Q3 + 1.5·IQR] (matplotlib's default)
* points outside the whiskers = outliers
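The statistics in the list above can be computed by hand with NumPy. A sketch on a small made-up sample with one obvious outlier; np.percentile uses linear interpolation by default.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

inside = data[(data >= low_fence) & (data <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()   # most extreme points inside the fences
outliers = data[(data < low_fence) | (data > high_fence)]

print(q1, median, q3, iqr)                  # 3.25 5.5 7.75 4.5
print(whisker_low, whisker_high, outliers)  # 1 9 [40]
```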

In [None]:
import sqlite3
con = sqlite3.connect("../dataset/database.db")   # connection object (not a cursor)
points = pd.read_sql_query("SELECT * FROM points", con)
con.close()
points
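The database file is not part of this text, so its schema is unknown. The sketch below uses a hypothetical stand-in: an in-memory SQLite database with a small "points" table whose columns x and y are invented for illustration. contextlib.closing makes sure the connection is closed after reading (the plain `with sqlite3.connect(...)` block only commits, it does not close).

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Hypothetical stand-in for database.db: in-memory database with a made-up "points" table
with closing(sqlite3.connect(":memory:")) as con:
    con.execute("CREATE TABLE points (x REAL, y REAL)")
    con.executemany("INSERT INTO points VALUES (?, ?)", [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5)])
    points = pd.read_sql_query("SELECT * FROM points", con)

print(points.describe())   # count, mean, quartiles, ... for each column
```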

In [None]:
points.plot(
    kind='box', 
    subplots=True, 
    sharey=False, 
    figsize=(15, 6)
)
plt.subplots_adjust(wspace=1) 
plt.show()