# Interactive Python using Jupyter notebooks

- Notebooks are made of a sequence of cells
- Cells can contain different content, such as Python code or Markdown
- You can change the cell type in the toolbar
- To execute a cell, press "Shift+Return"
- Use the toolbar to add, delete, copy, or insert cells

(Note: to learn more about Markdown check [Daring Fireball's website](https://daringfireball.net/projects/markdown/syntax))

## Import the Python package for numerical arrays (numpy)

In [2]:
import numpy as np

## Define a function that creates some statistical data

In [3]:
def load_data():    
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    N = 100
    positions = []
    heights = []
    for i in range(0,N):
        positions.append(possible_positions[np.random.randint(len(possible_positions))])
        heights.append(np.random.normal(loc=180.0,scale=5.0))
    return positions, heights
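As a side note, the same kind of data can be drawn without an explicit loop. The sketch below is only an illustration: `load_data_vectorized` and its `n` and `seed` parameters are hypothetical names, not part of the original notebook, and unlike `load_data` it returns NumPy arrays rather than lists.

```python
import numpy as np

def load_data_vectorized(n=100, seed=None):
    """Draw n random positions and heights in one call each (no Python loop)."""
    rng = np.random.default_rng(seed)
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    # Pick n positions uniformly at random
    positions = rng.choice(possible_positions, size=n)
    # Draw n heights from a normal distribution (mean 180 cm, std 5 cm)
    heights = rng.normal(loc=180.0, scale=5.0, size=n)
    return positions, heights

positions_v, heights_v = load_data_vectorized(n=100, seed=0)
print(positions_v.shape, heights_v.shape)
```

Passing a fixed `seed` makes the random draw reproducible, which can be convenient when comparing results between runs.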

## Read the data

In [4]:
positions, heights = load_data()

The objects ```positions``` and ```heights``` are lists as we can check using the Python function ```type```:

In [5]:
print(type(positions))
print(type(heights))

<class 'list'>
<class 'list'>


Question: *How many items are inside the lists ```positions``` and ```heights```?*

Hint: Use the Python function ```len```. 

In [10]:
l_pos = len(positions)
l_hei = len(heights)
print(str(l_pos))
print(str(l_hei))

100
100


## Convert to numpy arrays

In [11]:
np_positions = np.array(positions)
np_heights = np.array(heights)

Question: *what is the data type of ```np_positions``` and ```np_heights```*?<br>
Question: *what is the shape of ```np_positions``` and ```np_heights```*?


Hint: NumPy arrays have attributes called ```dtype``` and ```shape```.

In [13]:
dt_pos = type(np_positions)
dt_hei = type(np_heights)
print(str(dt_pos) + str(dt_hei))
sh_pos = np.shape(np_positions)
sh_hei = np.shape(np_heights)
print(str(sh_pos) + str(sh_hei))


<class 'numpy.ndarray'><class 'numpy.ndarray'>
(100,)(100,)
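Note that ```type``` reports the class of the container, while the ```dtype``` attribute describes the elements stored inside the array. A minimal sketch on small made-up arrays (`example_positions` and `example_heights` are illustrative names, not the notebook's data):

```python
import numpy as np

example_positions = np.array(['GK', 'D', 'M', 'A'])
example_heights = np.array([183.5, 178.4, 181.0, 176.2])

# dtype describes the elements; kind 'U' means Unicode strings
print(example_positions.dtype.kind)
# Python floats are stored as 64-bit floating point numbers
print(example_heights.dtype)
# shape is a tuple with one entry per array dimension
print(example_heights.shape)
```

Both ```dtype``` and ```shape``` are accessed without parentheses because they are attributes, not methods.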


## Extract the heights of the goalkeepers

In [14]:
gk_heights = np_heights[np_positions == 'GK']
gk_heights

array([183.47263768, 178.42797781, 183.01629002, 173.1846458 ,
       173.77069295, 179.775412  , 174.54302504, 181.24127678,
       182.55319811, 180.50971065, 177.81303038, 191.73143574,
       184.66506188, 177.88308022, 182.25384715, 184.03774232,
       176.47579242, 181.21059349, 190.58037237, 175.08028786,
       177.36270196, 180.70340874, 178.70552911])
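The selection above relies on boolean masking: comparing an array with a value yields an array of `True`/`False` entries, and indexing with that mask keeps only the entries where it is `True`. A small sketch with made-up values (`toy_positions` and `toy_heights` are illustrative names):

```python
import numpy as np

toy_positions = np.array(['GK', 'D', 'GK', 'A'])
toy_heights = np.array([190.0, 182.0, 186.0, 175.0])

# Element-wise comparison produces a boolean array of the same shape
mask = toy_positions == 'GK'
print(mask)               # [ True False  True False]

# Indexing with the mask keeps only the heights of the goalkeepers
print(toy_heights[mask])  # [190. 186.]
```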

## Print the median of the goalkeepers' heights

In [15]:
print("Median height of goalkeepers: " + str(np.median(gk_heights)))

Median height of goalkeepers: 180.5097106520186


Question: *what is the median height of all the field players*?<br>
Question: *what is the median height of all the attackers*?<br>
Question: *what is the median height of goalkeepers and the attackers combined?*

In [36]:
gk_hei = np_heights[np_positions == 'GK']
d_hei = np_heights[np_positions == 'D']
m_hei = np_heights[np_positions == 'M']
a_hei = np_heights[np_positions == 'A']
all_hei = np.concatenate((gk_hei, d_hei, m_hei, a_hei))
print("Median height of all players: " + str(np.median(all_hei)))

Median height of all players: 180.60655969361702
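One possible way to approach the three questions is to build a mask for each group. The sketch below uses a tiny made-up data set (`toy_positions`, `toy_heights`) so the results are easy to check by hand; with the notebook's data, `np_positions` and `np_heights` would take their place.

```python
import numpy as np

toy_positions = np.array(['GK', 'D', 'M', 'A', 'A', 'D'])
toy_heights = np.array([190.0, 182.0, 178.0, 175.0, 181.0, 184.0])

# Field players: everyone who is not a goalkeeper
field_median = np.median(toy_heights[toy_positions != 'GK'])

# Attackers only
attacker_median = np.median(toy_heights[toy_positions == 'A'])

# Goalkeepers and attackers combined: np.isin tests membership in a set of labels
gk_and_a_median = np.median(toy_heights[np.isin(toy_positions, ['GK', 'A'])])

print(field_median, attacker_median, gk_and_a_median)
```

Using `np.isin` avoids concatenating the per-position arrays by hand when several positions are combined.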


## More statistical tests

Besides the ```median```, numpy also comes with the functions ```mean```, ```std```, ```min``` and ```max```, which are useful for investigating statistical data.

Question: *Who is the shortest player (which position)*?<br>
Question: *Who is the tallest player (which position)*?

In [43]:
min_gk_d_m_a = [min(gk_hei), min(d_hei), min(m_hei), min(a_hei)]
max_gk_d_m_a = [max(gk_hei), max(d_hei), max(m_hei), max(a_hei)]
print('the shortest goalkeeper, defender, midfielder and attacker are:' + str(min_gk_d_m_a) + ' cm, respectively')
print('the tallest goalkeeper, defender, midfielder and attacker are:' + str(max_gk_d_m_a) + ' cm, respectively')


the shortest goalkeeper, defender, midfielder and attacker are:[173.18464579632194, 172.5335623484006, 168.37224454795378, 169.07295430737992] cm, respectively
the tallest goalkeeper, defender, midfielder and attacker are:[191.73143574022149, 188.7194099194952, 188.0019262695711, 186.10213924356992] cm, respectively
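The per-position minima and maxima still have to be compared by eye to name the position. As an alternative sketch, `np.argmin` and `np.argmax` return the index of the extreme value, which can then be used to look up the matching position. The arrays below are made-up illustrative values:

```python
import numpy as np

toy_positions = np.array(['GK', 'D', 'M', 'A'])
toy_heights = np.array([190.0, 182.0, 171.5, 175.0])

# Index of the smallest and largest height
shortest_idx = np.argmin(toy_heights)
tallest_idx = np.argmax(toy_heights)

# Use the same index to look up the player's position
print("Shortest:", toy_positions[shortest_idx], toy_heights[shortest_idx])
print("Tallest:", toy_positions[tallest_idx], toy_heights[tallest_idx])
```

This works because both arrays share the same ordering, so index `i` refers to the same player in each.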


## Data plotting

For plotting, we need the package matplotlib.

In [44]:
import matplotlib.pyplot as plt

There are different display modes for matplotlib plots inside a Jupyter notebook.

In [45]:
# For inline plots use
%matplotlib inline

In [46]:
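# For inline plots with interactive capabilities use
# (classic Notebook interface; JupyterLab typically needs `%matplotlib widget` from the ipympl package)
%matplotlib notebook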

Let's visualize the height distribution of the defenders.

In [47]:
d_heights = np_heights[np_positions == 'D']

In [48]:
plt.figure()
plt.hist(d_heights)
plt.title('Defenders')
plt.xlabel('Heights')
plt.show()
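To see the numbers behind such a histogram, `np.histogram` computes bin counts and bin edges without drawing anything. By default both it and `plt.hist` use 10 equal-width bins. A minimal sketch on randomly generated sample data (`sample_heights` is an illustrative name, not the notebook's `d_heights`):

```python
import numpy as np

rng = np.random.default_rng(0)
sample_heights = rng.normal(loc=180.0, scale=5.0, size=25)

# counts[i] is the number of values between bin_edges[i] and bin_edges[i + 1]
counts, bin_edges = np.histogram(sample_heights)

print(counts)
print(bin_edges)
```

There is always one more bin edge than there are bins, and the counts add up to the number of samples.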


To figure out the tallest/shortest player, we can compute the max/min within each position. Here we compute the maximum:

In [49]:
p = ['GK', 'D', 'M', 'A']
p_max = [np_heights[np_positions == i].max() for i in p]

In [50]:
plt.figure()
plt.plot(range(len(p)), p_max)
plt.gca().xaxis.set_ticks(range(len(p)))
plt.gca().xaxis.set_ticklabels(p)
plt.ylabel('Heights')
plt.show()


For inspiration on data plotting and more examples, check out the matplotlib gallery: [https://matplotlib.org/gallery.html](https://matplotlib.org/gallery.html)