# Interactive Python using Jupyter notebooks

- Notebooks are made of a sequence of cells.
- Cells can contain different content, such as Python code or Markdown.
- You can change the cell type in the toolbar.
- To execute a cell, press "Shift+Return".
- Use the toolbar to add, delete, copy, or insert cells.

(Note: to learn more about Markdown check [Daring Fireball's website](https://daringfireball.net/projects/markdown/syntax))


## Import the Python packages for numerical arrays (numpy) and plotting (matplotlib)

In [1]:
import numpy as np                 # numerical arrays
import matplotlib.pyplot as plt    # plotting
# Render interactive figures inside the notebook
%matplotlib notebook

## Define a function that creates some statistical data

In [2]:
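def load_data():
    # Generate random positions and heights (in cm) for N players
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    N = 100
    positions = []
    heights = []
    for i in range(N):
        # Draw a position uniformly at random
        positions.append(possible_positions[np.random.randint(len(possible_positions))])
        # Draw a height from a normal distribution (mean 180, standard deviation 5)
        heights.append(np.random.normal(loc=180.0, scale=5.0))
    return positions, heights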
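
The loop above draws one value at a time. As a minimal sketch, the same kind of data can also be drawn in a single call per array. The function name `load_data_vectorized` and its `n` and `seed` parameters are hypothetical and not part of the original notebook. Note that this variant returns numpy arrays rather than lists.

```python
import numpy as np

def load_data_vectorized(n=100, seed=None):
    # Hypothetical helper: draws all positions and heights at once
    rng = np.random.default_rng(seed)   # seeded generator for reproducible draws
    possible_positions = ['GK', 'D', 'M', 'A']
    positions = rng.choice(possible_positions, size=n)    # uniform random positions
    heights = rng.normal(loc=180.0, scale=5.0, size=n)    # heights in cm
    return positions, heights

positions_v, heights_v = load_data_vectorized(n=100, seed=0)
print(positions_v.shape, heights_v.shape)
```

Passing a fixed `seed` makes the random data identical on every run, which can help when comparing results with someone else.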

## Read the data

In [3]:
positions, heights = load_data()

The objects ```positions``` and ```heights``` are lists, as we can check using the Python function ```type```:

In [4]:
print(type(positions))
print(type(heights))

<class 'list'>
<class 'list'>


Question: How many items are inside the lists ```positions``` and ```heights```?

Hint: Use the Python function ```len```.

In [5]:
print('Number of items in Positions:  ', len(positions))
print('Number of items in heights:  ', len(heights))

Number of items in Positions:   100
Number of items in heights:   100


## Convert to numpy arrays

In [6]:
np_positions = np.array(positions)
np_heights = np.array(heights)

Question: what is the data type of ``np_positions`` and ``np_heights``?

Question: what is the shape of ``np_positions`` and ``np_heights``?

Hint: Numpy arrays have attributes called ```dtype``` and ```shape```.

In [7]:
print('Type of Positions:  ', np_positions.dtype)
print('Shape of Positions:  ', np.shape(np_positions))
print('Type of Heights:  ', np_heights.dtype)
print('Shape of Heights:  ', np.shape(np_heights))

Type of Positions:   <U2
Shape of Positions:   (100,)
Type of Heights:   float64
Shape of Heights:   (100,)
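
The output `<U2` may look cryptic: it denotes a Unicode string type holding at most 2 characters per item (the `<` indicates byte order). The small sketch below uses made-up toy arrays (`labels` and `values` are not part of the notebook) to inspect these properties.

```python
import numpy as np

# Toy arrays for illustration only
labels = np.array(['GK', 'D', 'M', 'A'])
values = np.array([188.0, 181.5])

print(labels.dtype.kind)      # 'U' means Unicode string
print(labels.dtype.itemsize)  # 8 bytes: 2 characters x 4 bytes each
print(labels.shape)           # (4,): one dimension with 4 items
print(values.dtype)           # float64
print(values.shape)           # (2,)
```

The string length is taken from the longest item (here `'GK'`), so all items share the same fixed-width type.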


## Extract the heights of the goalkeepers

In [8]:
gk_heights = np_heights[np_positions == 'GK']
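
The line above works in two steps: the comparison produces an array of `True`/`False` values (a mask), and indexing with that mask keeps only the items where it is `True`. A small sketch with made-up toy data (`toy_positions` and `toy_heights` are illustrative names) shows both steps:

```python
import numpy as np

# Toy data for illustration only
toy_positions = np.array(['GK', 'D', 'GK', 'A'])
toy_heights = np.array([190.0, 182.0, 186.0, 178.0])

mask = toy_positions == 'GK'   # element-wise comparison
print(mask)                    # [ True False  True False]
print(toy_heights[mask])       # [190. 186.]
```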

## Print the median of the goalkeepers' heights

In [9]:
print("Median height of goalkeepers: " + str(np.median(gk_heights)))

Median height of goalkeepers: 179.88857719596228


Question: what is the median height of all the field players?

Question: what is the median height of all the attackers?

Question: what is the median height of goalkeepers and the attackers combined?

## More statistical tests

Besides the ```median```, numpy also comes with the functions ```mean```, ```std```, ```min``` and ```max```, which are useful for investigating statistical data.
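
As a quick illustration, here is how these functions behave on a small made-up array (`toy_heights` is not part of the notebook data). Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation instead.

```python
import numpy as np

# Toy data for illustration only
toy_heights = np.array([170.0, 180.0, 190.0])

print(np.mean(toy_heights))         # 180.0
print(np.std(toy_heights))          # about 8.165 (population, ddof=0)
print(np.std(toy_heights, ddof=1))  # 10.0 (sample)
print(np.min(toy_heights))          # 170.0
print(np.max(toy_heights))          # 190.0
```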

Question: Who is the shortest player (which position)?

Question: Who is the tallest player (which position)?

In [10]:
A_heights = np_heights[np_positions == 'A']
M_heights = np_heights[np_positions == 'M']
gkA_heights = np.concatenate((gk_heights, A_heights))


print("Median height of all the players: " + str(np.median(np_heights)))
print("Median height of attackers: " + str(np.median(A_heights)))
print("Median height of goalkeepers and attackers combined: " + str(np.median(gkA_heights)))


Median height of all the players: 179.98908338558766
Median height of attackers: 181.81243259051135
Median height of goalkeepers and attackers combined: 180.46173056443257
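
The cell above takes the median over all players. If "field players" is read as everyone except the goalkeepers (an assumption about what the question intends), a `!=` mask selects them. Combined groups can also be selected with a single mask via `np.isin` instead of concatenating two arrays. A sketch on made-up toy data:

```python
import numpy as np

# Toy data for illustration only
toy_positions = np.array(['GK', 'D', 'M', 'A', 'A'])
toy_heights = np.array([190.0, 182.0, 176.0, 178.0, 184.0])

field_mask = toy_positions != 'GK'                   # everyone but goalkeepers
gk_or_a_mask = np.isin(toy_positions, ['GK', 'A'])   # goalkeepers or attackers

print(np.median(toy_heights[field_mask]))    # 180.0
print(np.median(toy_heights[gk_or_a_mask]))  # 184.0
```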


In [11]:
print("The tallest player is a " + str(np_positions[np_heights == np.max(np_heights)]), 'with ', str(np.max(np_heights)), 'cm')
print("The shortest player is a " + str(np_positions[np_heights == np.min(np_heights)]), 'with ', str(np.min(np_heights)), 'cm')

The tallest player is a ['D'] with  195.75674494860687 cm
The shortest player is a ['A'] with  171.2751376853617 cm
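
An alternative to comparing every height against the maximum is to ask numpy for the index of the extreme value with `np.argmax` and `np.argmin`, then use that index to look up the position. This returns a single item (the first one, in case of ties) rather than an array. A sketch on made-up toy data:

```python
import numpy as np

# Toy data for illustration only
toy_positions = np.array(['GK', 'D', 'M', 'A'])
toy_heights = np.array([190.0, 195.5, 176.0, 171.0])

i_max = np.argmax(toy_heights)   # index of the tallest player
i_min = np.argmin(toy_heights)   # index of the shortest player

print(f"The tallest player is a {toy_positions[i_max]} with {toy_heights[i_max]:.1f} cm")
print(f"The shortest player is a {toy_positions[i_min]} with {toy_heights[i_min]:.1f} cm")
```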


In [12]:
np_positions[2]

'GK'

In [13]:
d_heights = np_heights[np_positions == 'D']

## Data plotting

For plotting, we use the package matplotlib, which we imported above as ```plt```.

In [14]:
plt.figure()
plt.hist(d_heights)
plt.title('Defenders')
plt.xlabel('Height (cm)')
plt.show()


In [15]:
p = ['GK', 'D', 'M', 'A']
p_max = [np_heights[np_positions == i].max() for i in p]

In [16]:
plt.figure()
plt.plot(range(len(p)), p_max)
plt.gca().xaxis.set_ticks(range(len(p)))
plt.gca().xaxis.set_ticklabels(p)
plt.ylabel('Maximum height (cm)')
plt.show()

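
Since the positions are categories without a natural order, a bar chart may be easier to read than a line plot. The sketch below uses made-up values (`toy_labels` and `toy_max` are illustrative, not results from the notebook) and matplotlib's `ax.bar`, which accepts string labels directly.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only
toy_labels = ['GK', 'D', 'M', 'A']
toy_max = [191.0, 195.0, 189.0, 192.0]

fig, ax = plt.subplots()
bars = ax.bar(toy_labels, toy_max)   # one bar per position
ax.set_ylabel('Maximum height (cm)')
ax.set_title('Tallest player per position')
```

In the notebook, `p` and `p_max` from the cells above could be passed in place of the toy lists, followed by `plt.show()`.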