# Lecture 7 - plotting

Before we start plotting, let's cover some basics of importing functions and lambda functions.

## Importing functions

You can import functions from existing libraries with the `import` statement. A plain `import lib` imports the library's namespace, so that you can call its functions as `lib.function()`.

You can import specific functions from a library with `from lib import function`, after which `function()` is available directly in your environment.
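As a minimal sketch of the two import styles, the standard-library `math` module is used here purely as an illustration; it does not appear elsewhere in this lecture.

```python
# Style 1: import the module; functions are reached through its namespace.
import math

root_via_namespace = math.sqrt(16)

# Style 2: import one name directly into the current namespace.
from math import sqrt

root_direct = sqrt(16)

print(root_via_namespace, root_direct)  # both are 4.0
```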

Let's count words and characters using the `Counter` class from the `collections` library.

In [None]:
from collections import Counter

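# Read the whole novel and count every lowercased character;
# the with-block closes the file automatically
with open("data/jane-austen-emma.txt") as f:
    char_count = Counter(f.read().lower())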

print(char_count)

In [None]:
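# Split the lowercased text on whitespace and count each word;
# the with-block closes the file automatically
with open("data/jane-austen-emma.txt") as f:
    word_count = Counter(f.read().lower().split())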

list(word_count.items())[0:10]
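To see what `Counter` returns without opening the novel, here is a small sketch on a made-up sentence (the sentence and the `sample_` names are only illustrations). A `Counter` behaves like a dictionary that maps each item to its count.

```python
from collections import Counter

sentence = "the cat and the hat"  # made-up sample text

sample_chars = Counter(sentence.lower())
sample_words = Counter(sentence.lower().split())

print(sample_chars["t"])    # 4: occurrences of the letter t
print(sample_words["the"])  # 2: occurrences of the word "the"
print(sample_words["dog"])  # 0: missing keys count as zero instead of raising KeyError
```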

<div class="alert alert-block alert-warning">
    <i class="fas fa-fw fa-exclamation-circle mr-3 align-self-center"></i>
    <b>Note:</b> Libraries such as <em>random, sys and time</em> are included in the base Python installation. However, many other Python libraries are available for specific purposes, and they need to be installed before you can use them. The libraries used in this lecture are already installed in this Binder session. <br> Remember that in your local installation you need to install any library that you plan to use.<br>
</div>
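One possible way to check from within Python whether a library is available in the current environment is `importlib.util.find_spec`. The helper name `is_installed` below is made up for this sketch, and it is only meant for top-level module names.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None

for name in ["collections", "matplotlib", "numpy"]:
    status = "available" if is_installed(name) else "missing, install it first"
    print(f"{name}: {status}")
```

If a library is reported as missing in a local installation, it is typically installed with `pip install <name>` or with the package manager of your Python distribution.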

## Lambda function

Please refer to the `05-str-dict-count` lecture for details.


## `map`, `filter` and `reduce`

`lambda` is frequently used within `map`, `filter` and `reduce` operations. Please go over the overview at https://www.guru99.com/python-lambda-function.html

Please also refer to the `05-str-dict-count` lecture for details.
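As a quick reminder, here is a minimal sketch of the three operations with `lambda`. The list `numbers` is made-up sample data.

```python
from functools import reduce  # reduce lives in functools in Python 3

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every element
squares = list(map(lambda n: n ** 2, numbers))

# filter: keep only the elements for which the function returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce: combine all elements into one value, left to right
total = reduce(lambda acc, n: acc + n, numbers)

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
print(total)    # 15
```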

# Plotting with matplotlib

The documentation for matplotlib version 3.x is located [here](https://matplotlib.org/3.0.2/index.html).

Run the cell below to check the version of the matplotlib used in this notebook.

In [None]:
%load_ext watermark
%watermark  -p matplotlib

We also need to run the following cell once so that the output of the plotting functions is displayed within the notebook environment.

In [None]:
%matplotlib inline

Here's the anatomy of a figure (taken from the matplotlib documentation). A figure is composed of many components, so we need access to most of them to generate a correct, aesthetic and reproducible figure.

![anatomy](https://matplotlib.org/_images/anatomy.png)

Let's start drawing a plot. We need to import matplotlib's pyplot module. This only needs to be done once per session, although you will notice that it is repeated in several cells below. By convention the module is imported under the name `plt`; this is only a convention, and you can name it anything you like.

> `matplotlib` is the whole package; `matplotlib.pyplot` is a module within it.


In [None]:
import matplotlib.pyplot as plt
plt.plot([1.6, 2.7])

The `plot` function accepts a list (or array) and plots it. A single list is treated as the y values, and the x values are assigned automatically as 0, 1, and so on (here 0 and 1).
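A quick way to confirm the automatically assigned x values is to inspect the line object that `plot` returns. This is a small sketch with illustrative variable names; the `Agg` backend is selected only so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt

# plot returns a list of line objects; unpack the single line
(line,) = plt.plot([1.6, 2.7])

x_used = [float(v) for v in line.get_xdata()]
y_used = [float(v) for v in line.get_ydata()]
print(x_used)  # [0.0, 1.0]
print(y_used)  # [1.6, 2.7]
plt.close()
```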

We can also provide two lists as the `x` and `y` values.

In [None]:
import matplotlib.pyplot as plt
plt.plot([1.6, 2.7, 3.5],[10,5,15])

Notice that the notebook prints the object returned by `plot` and then shows the image. For cleaner output, let's use the `show()` function.

> In other environments, you **need** to use the `show` function to see the resulting plot.

In [None]:
plt.plot([1.6, 2.7],[10,5])
plt.show()

In [None]:
x = [1,2,3,4]
y = [10,20,25,27]
plt.plot(x,y)
plt.show()

As mentioned earlier, a figure has many components and we have access to them. Let's change the color and width of the line.

In [None]:
plt.plot(x, y, color='red', linewidth=5)
plt.show()

Now, let's overlay separate layers of drawings. As you can see, the default type of drawing is a line. A scatter plot takes (x, y) values and draws a point at each given coordinate.

In [None]:
plt.plot(x, y, color='red', linewidth=5)
plt.scatter([0.7,2.5,4], [5,25,30], c='green')
plt.show()

We can adjust color or type of the plot. Additionally, we can modify figure components such as `xlim`, `legend`, `title`, `xlabel` and `ylabel`.

In [None]:
plt.plot(x, y, color='red', linewidth=5, label="line")
plt.scatter([0.7,2.5,4], [5,25,30], c='green', label="dots")
plt.legend(loc="lower center", title="Legend", ncol=2)
plt.show()

In [None]:
plt.plot(x, y, color='red', linewidth=5, label="line")
plt.scatter([0.7,2.5,4], [5,25,30], c='green', label="dots")
plt.legend()
plt.xlim(0,6)
plt.show()

In [None]:
plt.plot(x, y, color='red', linewidth=5, label="line")
plt.scatter([0.7,2.5,4], [5,25,30], c='green', label="dots")
plt.legend()
plt.xlim(0,6)
plt.title("Sample Figure")
plt.xlabel("Measurement No")
plt.ylabel("Some units")
plt.show()

Many of these functions have numerous arguments to adjust. If you don't specify an argument, its default is used. For example, calling `legend()` without any arguments still worked. Let's see what we can change in the `legend()` function from the [manual](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html) and adjust some parameters.

In [None]:
plt.plot(x, y, color='red', linewidth=5, label="red line")
plt.scatter([0.7,2.5,4], [5,25,30], c='green', label="green dots")
plt.legend(loc="lower center", ncol=2, fontsize='small')
plt.xlim(0,6)
plt.title("Sample Figure")
plt.xlabel("Measurement No")
plt.ylabel("Some units")
plt.show()

In [None]:
# modified from https://matplotlib.org/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py

#x_array = np.linspace(0, 2, 100)
#plt.plot(x_array, x_array, label='linear')
#plt.plot(x_array, x_array**2, label='quadratic')
#plt.plot(x_array, x_array**3, label='cubic')

x2=[x * 0.01 for x in range(0, 200)]

plt.plot(x2,x2, label='linear')
plt.plot(x2,[x**2 for x in x2], label='quadratic', lw=3)
plt.plot(x2,[x**3 for x in x2], label='cubic')

plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
plt.show()
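The list comprehension in the cell above stands in for the commented-out `np.linspace` call, so no NumPy is needed. The sketch below checks what it produces: 200 evenly spaced values starting at 0 with a step of 0.01, ending just below 2. The name `x_grid` is used here only to avoid overwriting `x2`.

```python
# Same construction as in the cell above
x_grid = [i * 0.01 for i in range(0, 200)]

print(len(x_grid))           # 200 points
print(x_grid[0])             # 0.0
print(round(x_grid[-1], 2))  # 1.99, so the endpoint 2 is not included

# Derived curves are built element by element, as in the plot calls
quadratic = [v ** 2 for v in x_grid]
print(round(quadratic[100], 2))  # 1.0, because x_grid[100] is 1.0
```

By contrast, `np.linspace(0, 2, 100)` returns 100 points and includes both endpoints.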

## Subplots

In [None]:
plt.subplot(121)  # 1 row 2 columns and this is first plot
plt.plot(x, y, color='red', linewidth=5, label="line")
plt.title("Sample Figure1")
plt.subplot(122)  # following commands are applied to second plot
plt.scatter([0.7,2.5,4], [5,25,30], c='green', label="dots")
plt.legend()
plt.xlim(0,6)
plt.title("Sample Figure2")
plt.xlabel("Measurement No")
plt.ylabel("Some units")
plt.show()
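The three-digit code is shorthand: `plt.subplot(121)` means the same as `plt.subplot(1, 2, 1)` (rows, columns, index, with the index starting at 1). As an alternative sketch, `plt.subplots` creates the figure and all axes in one call. The data values repeat those used earlier, and the `Agg` backend line is only there so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt

# One row, two columns; axes is an array holding both Axes objects
fig, axes = plt.subplots(1, 2, figsize=(8, 4))

axes[0].plot([1, 2, 3, 4], [10, 20, 25, 27], color='red', linewidth=5)
axes[0].set_title("Sample Figure1")

axes[1].scatter([0.7, 2.5, 4], [5, 25, 30], c='green')
axes[1].set_xlim(0, 6)
axes[1].set_title("Sample Figure2")

n_panels = len(fig.axes)
print(n_panels)  # 2
plt.close(fig)
```

Working with the returned `Axes` objects makes it explicit which panel each command applies to, instead of relying on the most recent `subplot` call.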

In [None]:
import matplotlib.pyplot as plt

# Initialize the plot
fig = plt.figure(figsize=(8,4))
ax1 = fig.add_subplot(131)
ax2 = fig.add_subplot(132)
ax3 = fig.add_subplot(133)

# Plot the data
ax1.bar([1,2,3],[3,4,5])
ax2.barh([0.5,1,2.5],[0,1,2])
ax2.axhline(0.45)
ax1.axvline(0.65)
ax3.scatter(x,y)
ax3.set_xlim(0,4)
ax3.yaxis.tick_right()
ax3.set_title('$x^2$ and $y$')

fig.suptitle("Title for whole figure")
plt.show()  # or plt.savefig('images/foo.png')

## subplot2grid

`subplot2grid()` is a helper function that is similar to `subplot()`, but it uses 0-based indexing and lets a subplot **occupy multiple cells**.

In [None]:
plt.figure(figsize=(8,4))
ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=3)
ax2 = plt.subplot2grid((3, 3), (1, 0), colspan=2)
ax3 = plt.subplot2grid((3, 3), (1, 2), rowspan=2)
ax4 = plt.subplot2grid((3, 3), (2, 0))
ax5 = plt.subplot2grid((3, 3), (2, 1))
plt.show()

In [None]:
# modified from https://realpython.com/python-matplotlib-guide/
# from io import BytesIO
# import tarfile
# from urllib.request import urlopen
# url = 'http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz'
# b = BytesIO(urlopen(url).read())
# fpath = 'CaliforniaHousing/cal_housing.data'
# with tarfile.open(mode='r', fileobj=b) as archive:
#     housing = np.loadtxt(archive.extractfile(fpath), delimiter=',')
import numpy as np
housing = np.loadtxt("data/cal_housing.data", delimiter=',')
y = housing[:, -1]
pop, age = housing[:, [4, 7]].T

def add_titlebox(ax, text):
    ax.text(.55, .8, text,
        horizontalalignment='center',
        transform=ax.transAxes,
        bbox=dict(facecolor='white', alpha=0.6),
        fontsize=12.5)
    return ax

gridsize = (3, 2)
fig = plt.figure(figsize=(12, 8))
ax1 = plt.subplot2grid(gridsize, (0, 0), colspan=2, rowspan=2)
ax2 = plt.subplot2grid(gridsize, (2, 0))
ax3 = plt.subplot2grid(gridsize, (2, 1))

ax1.set_title('Home value as a function of home age & area population',
              fontsize=14)
sctr = ax1.scatter(x=age, y=pop, c=y, cmap='RdYlGn')
plt.colorbar(sctr, ax=ax1, format='$%d')
ax1.set_yscale('log')
ax2.hist(age, bins='auto')
ax3.hist(pop, bins='auto', log=True)

add_titlebox(ax2, 'Histogram: home age')
add_titlebox(ax3, 'Histogram: area population (log scl.)')
plt.show()

## Chaos game

Definition of the game: Assume we have 3 vertices A, B and C forming a triangular area. Pick a random point within the triangle and then repeat these steps:
* pick a random vertex
* move the point halfway between its current position and the selected vertex

If we do this many times, what type of plot do you expect? Let's find out with the help of the plot function.

In [None]:
import random
from IPython.display import clear_output
import matplotlib.pyplot as plt

def move_point(point,vertex):
    x=(point[0]+vertex[0])/2
    y=(point[1]+vertex[1])/2
    return (x,y)

# vertex={'A':(0.5,1), 'B':(0,0), 'C':(1,0)}
vertex=[(0.5,0.86),(0,0),(1,0)]
x=[]
y=[]

point=[0.5,0.5]
x.append(point[0])
y.append(point[1])

for i in range(10000):
    random_index = random.randrange(0, 3)
    point=move_point(point,vertex[random_index])
    x.append(point[0])
    y.append(point[1])
    
plt.figure(figsize=(12,12))
plt.scatter(x,y,s=1)
plt.scatter(*zip(*vertex), c='black')
plt.show()
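A single step of the game can be checked by hand. The sketch below re-defines `move_point` so that it is self-contained and moves the starting point toward vertex C twice. It also shows that the height 0.86 used above is close to $\sqrt{3}/2$, the height of an equilateral triangle with side length 1. The names `start`, `vertex_c`, `first` and `second` are illustrative.

```python
import math

def move_point(point, vertex):
    """Return the midpoint between the current point and the chosen vertex."""
    x = (point[0] + vertex[0]) / 2
    y = (point[1] + vertex[1]) / 2
    return (x, y)

start = (0.5, 0.5)
vertex_c = (1, 0)

first = move_point(start, vertex_c)
second = move_point(first, vertex_c)
print(first)   # (0.75, 0.25)
print(second)  # (0.875, 0.125)

# Height of an equilateral triangle with side length 1
print(round(math.sqrt(3) / 2, 2))  # 0.87
```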

## Plot word/char count


In [None]:
dict1 = {'a': 10, 'b': 2, 'c': 5, 'd': 4}
chars= list(dict1.keys())
counts = [dict1[x] for x in chars]

In [None]:
print(chars, counts)

In [None]:
import matplotlib.pyplot as plt
plt.plot(chars,counts)

In [None]:
plt.bar(chars,counts)

The expression below, `*zip(*dict1.items())`, will be explained at the end of the lecture.

In [None]:
plt.bar(*zip(*dict1.items()))

In [None]:
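from collections import Counter
# Count lowercased words; the with-block closes the file automatically
with open("data/jane-austen-emma.txt") as f:
    word_count = Counter(f.read().lower().split())
# Sort the (word, count) pairs by count, highest first
word_sorted = sorted(word_count.items(), key=lambda x: x[1], reverse=True)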

In [None]:
plt.barh(*(zip(*word_sorted[:10])))
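`Counter` also provides a `most_common` method, which returns the pairs already ranked by count. The sketch below, on a made-up sentence with `sample_` names chosen to avoid overwriting the variables above, suggests that it gives the same top entries as the `sorted` call with a `lambda` key.

```python
from collections import Counter

sample_words = "the cat saw the dog and the cat".split()  # made-up sample text
sample_count = Counter(sample_words)

# Manual sort by count (second element of each pair), highest first
sample_sorted = sorted(sample_count.items(), key=lambda x: x[1], reverse=True)

# Built-in shortcut for the same ranking, limited to the top two
top_two = sample_count.most_common(2)

print(sample_sorted[:2])  # [('the', 3), ('cat', 2)]
print(top_two)            # [('the', 3), ('cat', 2)]
```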

Let's go over the Matplotlib [gallery](https://matplotlib.org/stable/gallery/index.html) and try another type of plot.

A pie chart needs two separate lists: one containing the counts and another containing the labels.

In [None]:
counts=[i[1] for i in word_sorted][:10]
labels=[i[0] for i in word_sorted][:10]

In [None]:
plt.pie(counts, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
plt.show()

## Misc

There's a practical way to plot data if you have tuples of data points. The `*` operator is used for unpacking; please search online for an explanation and more examples. `zip` is used to combine two (or more) lists element-wise.


In [None]:
import matplotlib.pyplot as plt
data = [(1,510), (2, 100), (3, 25), (3.8, 2)]
plt.plot(*zip(*data))

Let's decipher the meaning of the expression `*zip(*data)`.

In [None]:
print(*data)

In [None]:
list(zip(*data))

In [None]:
print(*zip(*data))
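The same steps can be followed without any plotting. In this sketch the pairs are made-up sample values, and `describe` is a hypothetical helper used only to show how `*` spreads a sequence over a function's arguments.

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]  # made-up sample data

# zip(*pairs) is the same as zip((1, 'a'), (2, 'b'), (3, 'c')):
# it groups the first elements together and the second elements together
numbers, letters = zip(*pairs)
print(numbers)  # (1, 2, 3)
print(letters)  # ('a', 'b', 'c')

# zip in the other direction combines the two sequences element-wise again
print(list(zip(numbers, letters)) == pairs)  # True

def describe(first, second):
    """Hypothetical helper: takes exactly two positional arguments."""
    return f"{first} -> {second}"

# *pairs[0] unpacks the tuple (1, 'a') into two separate arguments
print(describe(*pairs[0]))  # 1 -> a
```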

After the `zip` step, you can assign the x and y values to any variable and then use them.

In [None]:
m, n = zip(*data)
plt.plot(m,n)

The same result without unpacking or `zip`:

In [None]:
x_values = [n[0] for n in data]
x_values

In [None]:
y_values = [ n[1] for n in data]
y_values

In [None]:
plt.plot(x_values,y_values)