# Interactive Python using Jupyter notebooks

- Notebooks are made of a sequence of cells
- Cells can contain different content such as Python code, or Markdown
- You can change the cell type in the toolbar
- To execute a cell, press "Shift+Return"
- Use the toolbar to add, delete, copy, or insert cells

(Note: to learn more about Markdown check [Daring Fireball's website](https://daringfireball.net/projects/markdown/syntax))

## Import the Python package for numerical arrays (numpy)

In [1]:
import numpy as np

## Define a function that creates some statistical data

In [2]:
def load_data():
    """Return randomly generated positions and heights for N players."""
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    N = 100
    positions = []
    heights = []
    for i in range(N):
        # Pick a random position and draw a height from a normal distribution
        positions.append(possible_positions[np.random.randint(len(possible_positions))])
        heights.append(np.random.normal(loc=180.0, scale=5.0))
    return positions, heights

## Read the data

In [3]:
positions, heights = load_data()

The objects ```positions``` and ```heights``` are lists as we can check using the Python function ```type```:

In [4]:
print(type(positions))
print(type(heights))

<class 'list'>
<class 'list'>


Question: *How many items are inside the lists ```positions``` and ```heights```*?

Hint: Use the Python function ```len```. 

In [6]:
print('Number of positions: ' + str(len(positions)))
print('Number of heights: ' + str(len(heights)))

Number of positions: 100
Number of heights: 100
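
Note that ```len``` counts every item, duplicates included. As a small sketch with made-up values (the list ```sample_heights``` is hypothetical and not part of the data generated above), the number of distinct values could be obtained by converting the list to a ```set``` first:

```python
# Hypothetical sample, not the notebook's random data
sample_heights = [180.0, 175.5, 180.0, 182.3]

print(len(sample_heights))       # counts all items
print(len(set(sample_heights)))  # counts distinct values only
```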


## Convert to numpy arrays

In [7]:
np_positions = np.array(positions)
np_heights = np.array(heights)

Question: *what is the data type of ```np_positions``` and ```np_heights```*?<br>
Question: *what is the shape of ```np_positions``` and ```np_heights```*?


Hint: NumPy arrays have attributes called ```dtype``` and ```shape```.

In [23]:
print('Data type of np_positions: ' + str(np_positions.dtype))
print('Data type of np_heights: ' + str(np_heights.dtype))
print('Shape of np_positions: ' + str(np_positions.shape))
print('Shape of np_heights: ' + str(np_heights.shape))


Data type of np_positions: <U2
Data type of np_heights: float64
Shape of np_positions: (100,)
Shape of np_heights: (100,)
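
The ```<U2``` reported above is NumPy's notation for a Unicode string of at most two characters, which is the length of the longest entry (```'GK'```). The following is a minimal sketch on a small hand-written array; the names ```demo``` and ```longer``` are illustrative only:

```python
import numpy as np

demo = np.array(['GK', 'D', 'M', 'A'])
print(demo.dtype.kind)   # 'U' means Unicode string
print(demo.dtype)        # the string width follows the longest element

# Assigning a longer string into a fixed-width array truncates it silently
demo[1] = 'DEF'
print(demo[1])

# Creating the array with longer strings widens the dtype instead
longer = np.array(['GK', 'DEF'])
print(longer.dtype)
```

The silent truncation is worth keeping in mind when modifying a string array in place.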


## Extract the heights of the goalkeepers

In [40]:
gk_heights = np_heights[np_positions == 'GK']
m_heights = np_heights[np_positions == 'M']
a_heights = np_heights[np_positions == 'A']
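gka_heights = np.concatenate((gk_heights,a_heights), axis=0)
# Print the combined goalkeeper and attacker heights in ascending order
print(np.sort(gka_heights))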

[168.57684    170.82873593 171.70903277 171.73809856 172.67210918
 172.87986518 173.02446675 173.84775658 174.87998153 175.19435485
 175.30066857 175.39151565 175.92759819 175.9566279  176.46392181
 177.45977427 177.82266951 178.045429   178.30919244 178.40712565
 178.42859423 178.86723729 179.16513303 179.17557061 179.46397014
 180.26668264 180.48942314 180.55513208 180.59566007 180.66196606
 181.37588472 181.61529382 181.82276974 182.04848875 182.16257211
 182.3592883  182.3682854  182.48841494 182.6959254  182.87429854
 183.06843437 183.28101831 183.6349515  183.74620971 183.75693601
 183.84419615 184.04206264 184.67361801 184.84048211 184.96936975
 185.01709294 185.21393726 187.05501398 187.69194447 188.16230046
 189.31803879]
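
The selection ```np_heights[np_positions == 'GK']``` relies on boolean indexing: the comparison produces an array of ```True```/```False``` values, and indexing with that array keeps only the entries where the mask is ```True```. A minimal sketch on a tiny hand-made example, where ```pos``` and ```h``` are hypothetical stand-ins for ```np_positions``` and ```np_heights```:

```python
import numpy as np

pos = np.array(['GK', 'D', 'A', 'GK', 'M'])
h = np.array([185.0, 178.0, 181.0, 190.0, 176.0])

mask = pos == 'GK'
print(mask)      # [ True False False  True False]
print(h[mask])   # [185. 190.]

# Masks can be combined with | (or) and & (and); the parentheses are required
gk_or_a = h[(pos == 'GK') | (pos == 'A')]
print(gk_or_a)   # [185. 181. 190.]
```

Combining masks keeps the original order of the players, whereas ```np.concatenate``` places all goalkeepers before all attackers. For order-independent statistics such as the median, both approaches give the same result.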


## Print the median of the goalkeepers' heights

In [41]:
print("Median height of goalkeepers: " + str(np.median(gk_heights)))
print("Median height of midfielders: " + str(np.median(m_heights)))
print("Median height of attackers: " + str(np.median(a_heights)))
print("Median height of goalkeepers and attackers: " + str(np.median(gka_heights)))

Median height of goalkeepers: 180.62881306532574
Median height of midfielders: 179.27029121569814
Median height of attackers: 179.9766966372354
Median height of goalkeepers and attackers: 180.57539607739204


Question: *what is the median height of all the field players*?<br>
Question: *what is the median height of all the attackers*?<br>
Question: *what is the median height of goalkeepers and the attackers combined?*
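
One possible way to approach the first question, assuming that "field players" means everyone except the goalkeepers, is to negate the comparison with ```!=```. The sketch below uses a small hand-made data set (```pos``` and ```h``` are hypothetical) rather than the random data generated above:

```python
import numpy as np

pos = np.array(['GK', 'D', 'A', 'GK', 'M', 'D'])
h = np.array([185.0, 178.0, 181.0, 190.0, 176.0, 183.0])

# Keep every player whose position is not goalkeeper
field_heights = h[pos != 'GK']
print(np.median(field_heights))   # 179.5
```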

## More statistical functions

Besides the ```median```, numpy also comes with the functions ```mean```, ```std```, ```min``` and ```max```, which are useful for investigating statistical data.

Question: *Who is the shortest player (which position)*?<br>
Question: *Who is the tallest player (which position)*?

In [46]:
shortInd = np.argmin(np_heights)
print('The shortest player is a: ' + str(np_positions[shortInd]))
tallInd = np.argmax(np_heights)
print('The tallest player is a: ' + str(np_positions[tallInd]))

The shortest player is a: D
The tallest player is a: M
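
The functions ```mean```, ```std```, ```min``` and ```max``` can be applied in the same way as ```median```. A brief sketch on hand-made values (the array ```h``` is hypothetical):

```python
import numpy as np

h = np.array([170.0, 180.0, 190.0])

print(np.mean(h))            # 180.0
print(np.min(h), np.max(h))  # 170.0 190.0
print(np.std(h))             # population standard deviation (ddof=0 is the default)
print(np.std(h, ddof=1))     # sample standard deviation
```

Whether ```ddof=0``` or ```ddof=1``` is appropriate depends on whether the data is treated as the whole population or as a sample of it.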


## Data plotting

For plotting, we need the package matplotlib.

In [47]:
import matplotlib.pyplot as plt

There are different display modes for matplotlib plots inside a Jupyter notebook.

In [48]:
# For inline plots use
%matplotlib inline

In [49]:
# For inline plots with interactive capabilities use
%matplotlib notebook

Let's visualize the height distribution of the defenders.

In [50]:
d_heights = np_heights[np_positions == 'D']

In [51]:
plt.figure()
plt.hist(d_heights)
plt.title('Defenders')
plt.xlabel('Heights')
plt.show()


To figure out the tallest/shortest player, we can compute the max/min within each position.

In [52]:
p = ['GK', 'D', 'M', 'A']
p_max = [np_heights[np_positions == i].max() for i in p]

In [53]:
plt.figure()
plt.plot(range(len(p)), p_max)
plt.gca().xaxis.set_ticks(range(len(p)))
plt.gca().xaxis.set_ticklabels(p)
plt.ylabel('Heights')
plt.show()
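
The cells above only compute the maximum per position. As a sketch of how the minimum could be obtained alongside it, the example below applies the same list comprehension with ```.min()```. The arrays ```pos``` and ```h``` are hypothetical stand-ins for ```np_positions``` and ```np_heights```, and ```p_min``` is a name introduced here for illustration:

```python
import numpy as np

# Hypothetical stand-ins for np_positions and np_heights
pos = np.array(['GK', 'D', 'M', 'A', 'D', 'GK', 'A', 'M'])
h = np.array([188.0, 176.0, 179.0, 182.0, 184.0, 181.0, 174.0, 186.0])

p = ['GK', 'D', 'M', 'A']
p_min = [h[pos == i].min() for i in p]
p_max = [h[pos == i].max() for i in p]

# Print the shortest and tallest height for each position
for name, shortest, tallest in zip(p, p_min, p_max):
    print(name, shortest, tallest)
```

Passing ```p_min``` to ```plt.plot``` in the same way as ```p_max``` would draw both curves in one figure.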


For inspiration on data plotting and more examples, check out the matplotlib gallery: [https://matplotlib.org/gallery.html](https://matplotlib.org/gallery.html)