# Numpy and Pandas

Numpy and Pandas are two incredibly powerful Python libraries. They are used for most tasks connected with manipulating data. In this notebook we will go into a lot of detail about what they can do. 

While you will need only a basic understanding of Numpy and Pandas for the rest of this course, you are very likely to encounter them a lot if you get deeper into analysing data with Python.

In this notebook:

- Recap of last week + some extras: ** explode operator, multiple assignments, optional arguments
- Numpy: multidimensional arrays
- How random numbers are generated
- the whole new way of referencing items in Numpy (not my_list[0][3] but my_array[0,3])
- generating and reshaping items into dimensions
- stacking and splitting

## Recap

### Recap 1: range and creating strings

In [1]:
list('ABCDE')

['A', 'B', 'C', 'D', 'E']

In [2]:
[number*2 
 for number in [0,1,2,3,4,5]]

[0, 2, 4, 6, 8, 10]

In [3]:
[number 
 for number in range(0, 50, 10)]

[0, 10, 20, 30, 40]

In [4]:
# Using f-string formatting: f"{variable}"
[f"{bottom} to {bottom+9}" 
 for bottom in range(0, 50, 10)]

['0 to 9', '10 to 19', '20 to 29', '30 to 39', '40 to 49']

In [6]:
# you can also use the " {} ".format(variable)
["{} to {}".format(bottom, bottom + 9)
 for bottom in range(0, 50, 10)]

['0 to 9', '10 to 19', '20 to 29', '30 to 39', '40 to 49']

In [7]:
# and it also supports variable names:
["{start} to {end}".format(start = bottom, end = bottom + 9)
 for bottom in range(0, 50, 10)]

['0 to 9', '10 to 19', '20 to 29', '30 to 39', '40 to 49']

In [8]:
# which is most useful when you want to repeat the same value or calculation
# note we'll pre-design our string in a variable first:
string_to_fill = "numbers from {start:>2} to {end:>2} could be: {start}-{end}, {start} to {end}"

[string_to_fill.format(start=bottom, end=bottom + 9)
 for bottom in range(0, 50, 10)]

# note :>2 means align to the right within a width of 2 characters. Try changing it to :<10

['numbers from  0 to  9 could be: 0-9, 0 to 9',
 'numbers from 10 to 19 could be: 10-19, 10 to 19',
 'numbers from 20 to 29 could be: 20-29, 20 to 29',
 'numbers from 30 to 39 could be: 30-39, 30 to 39',
 'numbers from 40 to 49 could be: 40-49, 40 to 49']

Formatting strings with .format() is especially useful when you split responsibilities among your team:

- one person writes the message (eg. copywriter),
- one person writes the logic (business analyst)
- yet another person will combine them together (programmer)

like in:

In [10]:
customer_message = "Dear {name}. Your gift card is expiring, you have £{money_left} and {days_left} days."

person = {'name':"Clara", 'money_left': 13.35, 'days_left': 3}

print(customer_message.format(name=person['name'], money_left=person['money_left'], days_left=person['days_left']))
# notice this does not feel very DRY. A solution to this problem is an ** explode operator explained below

Dear Clara. Your gift card is expiring, you have £13.35 and 3 days.


### New (but very rare): Explode operator **

The explode operator (officially called dictionary unpacking) is rare, but it is often a very useful way to 'explode' a dictionary into a number of named values.

You can imagine that when you explode a dictionary:
`
person = {'name':"Clara", 'money_left': 13.35, 'days_left': 3}
`

Like this:

`
**person
`

It is interpreted roughly as 3 named arguments:

`
name = "Clara"
money_left =  13.35
days_left = 3
`

So you can use the ** explode operator to feed dictionaries into .format() string formatting or into functions:

In [9]:
person = {'name':"Clara", 'money_left': 13.35, 'days_left': 3}

customer_message = "Dear {name}. Your gift card is expiring, you have £{money_left} and {days_left} days."
print(customer_message.format(**person)) # here split dict into many variables

Dear Clara. Your gift card is expiring, you have £13.35 and 3 days.


In [11]:
# it is quite common to explode a dict as we put it into a function
def customer_report(name, money_left, days_left):
    return f"{name} has {days_left} days left to spend £{money_left}"

person = {'name':"Clara", 'money_left': 13.35, 'days_left': 3}

print( customer_report(**person)  )

Clara has 3 days left to spend £13.35


This is most useful when you do not want to write everything by yourself, but also when the data is coming from outside (api, file) and you need flexibility.
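
When data comes from outside, some keys may be missing. One possible way to cope, sketched below, is to merge a dictionary of fallback values with the incoming dictionary by exploding both inside a new dictionary. The names `defaults` and `incoming` are made up for this illustration.

```python
def customer_report(name, money_left, days_left):
    return f"{name} has {days_left} days left to spend £{money_left}"

# hypothetical fallback values, used when a key is missing from the incoming data
defaults = {'name': "Customer", 'money_left': 0, 'days_left': 0}

# pretend this came from a file or an API, and 'days_left' is missing
incoming = {'name': "Clara", 'money_left': 13.35}

# the dictionary exploded last wins, so incoming values replace the defaults
person = {**defaults, **incoming}

report = customer_report(**person)
print(report)
```

Without the merge, `customer_report(**incoming)` would stop with a TypeError because `days_left` is missing.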

### Multiple assignments at once

This is an interesting feature of Python, with some resemblance to the explode operator. It is not common in other languages, and can create quite confusing code, so use it only if you know what you are doing:

In [12]:
# you can assign multiple variables at once
n1, n2, n3 = 4,5,6
print(n1, n2, n3)

4 5 6


In [13]:
n1, n2 = 3,7
added, subtracted, multiplied = n1+n2, n1-n2, n1*n2
print(added, subtracted, multiplied)

# this is a cool feature, but watch out when you use it. 
# Shorter code is not necessarily better! More readable code is better!

10 -4 21


In [23]:
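# unfortunately you cannot combine the explode operator with multiple assignment:
# name, money_left, days_left = **person   <- this is a SyntaxError
# but .values() works, because a dict keeps its items in the order they were added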
name, money_left, days_left = person.values()
print(name, money_left, days_left)

Clara 13.35 3


### Named and Optional arguments in functions

In [24]:
# optional arguments:
range(9)
range(1,9)
range(1,9,3)

# named arguments:
def divide(first, second):
    return first / second

print( divide(2, 8)  )
print( divide(first = 2, second = 8)  )
print( divide(second = 8, first = 2)  ) # order does not matter anymore

0.25
0.25
0.25


In [None]:
def as_percent_string(number, decimal_places = 0, symbol = "%"):
    number *= 100
    return "{num:.{dec}f}{sym}".format(num = number, sym = symbol, dec = decimal_places)

print( as_percent_string(2/3) )
print( as_percent_string(2/3, decimal_places = 2) )
print( as_percent_string(2/3, symbol = " percent") )
print( as_percent_string(2/3, decimal_places = 1, symbol = " percent") )

In [25]:
# print has an optional argument 'end' which decides what to put at the end.
# by default it is equal to '\n' (meaning new line), but you can change it to anything
print("2")
print("4")
print("7")
print() # this prints a new line!

print("2",end="")
print("4",end="")
print("7",end="")
print()

print("2",end="\t")
print("4",end="\t")
print("7",end="\t")
print()

separator = "****"
print("2",end=separator)
print("4",end=separator)
print("7",end=separator)

2
4
7

247
2	4	7	
2****4****7****





# Numpy and Pandas

### What are Numpy and Pandas

Numpy and Pandas are two very useful libraries for dealing with data in Python. They work so closely together that often you might not know where one starts and the other ends. One basic distinction is:

- **Numpy** is for basic numerical operations (mean, range, etc), especially on multidimensional arrays (sort of like Lists of Lists of Lists...)

- **Pandas** is for advanced data operations: arranging, cleaning up and analysing data. It can also be used for data input and a few other common tasks in data analysis.

Let's look at some features that become easier with Numpy and Pandas.

Before we use them, as with all other libraries, we will need to import them. Because we will use their names frequently, it is common to import them and give them both shorter names:

```
import pandas as pd
import numpy as np
```

so that you can use the short name like this:

```
np.zeros(10, dtype='int')
```

rather than having to type every time

```
numpy.zeros(10, dtype='int')
```

Note: it does not save you a lot of typing, but is done a lot. You can choose to not use 'as short_name' syntax, but be aware that other people might. 

## Arrays - new data type, like more powerful Lists

Numpy introduces a new data type called **Array**. It is basically like a List, but has a number of new very powerful methods and syntax that make data operations easier and faster.

Theoretically you could do everything that we do with Arrays by just using good old Lists, but it would take more time and be less compatible with other libraries.

To create an Array, you can cast a list with ```np.array( your_list )``` just like you could cast your list into a set with ```set(your_list)``` or a decimal number into a whole number with ```int(3.14)```

Notice the ```np.array``` at the beginning - it means that you are using the ```array``` function from the ```np``` library. ```np``` is a short name for ```numpy``` (which you yourself gave it in ```import numpy as np```)

In [30]:
import numpy as np
import pandas as pd

print(np.zeros(3))

my_list = [3, 7, 5, 5]
print(my_list)
print(type(my_list))

[0. 0. 0.]
[3, 7, 5, 5]
<class 'list'>


In [31]:
# you can create an array by feeding in a List, as a variable or directly:
my_array = np.array(my_list)
print(my_array)
print(type(my_array))

my_array2 = np.array([3, 0, 5, 0])
print(my_array2)
print(type(my_array2))
# notice it is printed a bit differently than a list!

[3 7 5 5]
<class 'numpy.ndarray'>
[3 0 5 0]
<class 'numpy.ndarray'>
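
To illustrate the claim that Arrays make data operations easier, here is a small sketch that doubles every number, first with a List and then with an Array. The variable names `doubled_list` and `doubled_array` are made up for this example.

```python
import numpy as np

my_list = [3, 7, 5, 5]
my_array = np.array(my_list)

# with a List you need a loop or a list comprehension
doubled_list = [number * 2 for number in my_list]

# with an Array the operation is applied to every element at once
doubled_array = my_array * 2

print(doubled_list)
print(doubled_array)

# Arrays also come with ready-made calculations, such as the mean
print(my_array.mean())
```

Note that `my_list * 2` would not double the numbers, it would repeat the whole List twice.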


Arrays are often used in situations where there are many dimensions. So a grid of 

`
1,  2,  3,  4
11, 12, 13, 14
21, 22, 23, 24
`

Can be represented as:

In [32]:
scores = np.array([ [1, 2, 3, 4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])
# do you see it? A list with 3 lists, each with 4 items inside!
print(scores)

# let's print what type of a thing it is:
print(type(scores))

[[ 1  2  3  4]
 [11 12 13 14]
 [21 22 23 24]]
<class 'numpy.ndarray'>


### Recap: Lists with many dimensions

Most of the time your data has many dimensions. 

- a variable has **zero dimensions** - you do not need any index/address to know what's in it
- a list/array has **one dimension** - an index is the way to address individual items in a list
- a list of lists has **two dimensions** - an index of the top list, and then an index of the inner list

Before, we sometimes used multi-dimensional lists:

In [34]:
# let's create a two-dimensional list:
list_of_lists = [[3,4,5,6,7], [30,40,50,60,70]] 

# print the big list
print(list_of_lists) 

# print first sub-list
print(list_of_lists[0]) 

 # print first number in first sub-list
print(list_of_lists[0][0])

[[3, 4, 5, 6, 7], [30, 40, 50, 60, 70]]
[3, 4, 5, 6, 7]
3


In [36]:
# remember indexes count from 0. So the first element has index 0, the second index 1, the third index 2, etc
# to go into the second list, then to its fourth element (which should have value 60), you would use
print(list_of_lists[1][3])

# negative numbers count from the end
# to get the last element of the first list (which should be 7), you would use
print(list_of_lists[0][-1])

60
7


### Numpy Arrays with many dimensions and new addressing style

In numpy and pandas we will most of the time deal with multidimensional data - with 2, 3 or even more dimensions. The dimensions will have a meaning, just like rows and columns have meanings in Excel. We will talk about it soon.

**NEW ADDRESSING STYLE - my_array[3,6]**

- In a **List** (pure Python way), you address a value with ```my_list[first_dimension][second_dimension]``` like ```stops[3][7]```
- In **Arrays** (Numpy way), we can also use ```my_array[first_dimension, second_dimension]``` like ```stops[3, 7]```

From now on when you are getting numbers from arrays, you can pass in the index of each next dimension separated by commas.

note: in Arrays you can still use the old format ```stops[3][7]```, but the new one ```stops[3, 7]``` is more common.

Let's look at some multi-dimensional arrays

In [37]:
scores = np.array([ [1, 2, 3, 4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])

# in the new indexing style, indexes for each next dimension are separated with a comma
# your_array[first_dimension, second_dimension, third_dimension, ....]

print(scores[0,0])
print(scores[0,1])
print(scores[1,2])
print(scores[1,-1])
print(scores[-1,-1])
# look at the printed output and see if you understand why these numbers are printed.

1
2
13
14
24
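
A quick way to convince yourself that the two addressing styles really differ between Lists and Arrays is the sketch below. The helper variable `list_accepts_comma` is made up for this example.

```python
import numpy as np

rows = [[1, 2, 3, 4],
        [11, 12, 13, 14],
        [21, 22, 23, 24]]
scores = np.array(rows)

# both styles reach the same item in an Array
print(scores[1][2], scores[1, 2])

# a plain List only understands the chained style my_list[1][2]
try:
    rows[1, 2]
    list_accepts_comma = True
except TypeError:
    list_accepts_comma = False

print(list_accepts_comma)
```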


### Minitask: Write your own first name with use of new indexing method:

In [39]:
# use the below keyboard variable to spell your name below
keyboard = np.array([ ['q','w','e','r','t','y','u','i','o','p'],
                      ['a','s','d','f','g','h','j','k','l',';'],
                      ['z','x','c','v','b','n','m','<','>','?']
                   ])
# For example to spell name 'Aoife' you would write:
print(keyboard[1,0], keyboard[0,8], keyboard[0,7], keyboard[1,3], keyboard[0,2])

a o i f e


In [40]:
# here write your first name, by referencing its letters from the keyboard variable above
print(keyboard[2, 4].capitalize() + keyboard[0, 2] + keyboard[2, 5])

Ben


Just like in Lists you can use the index not only to GET the value, but also to CHANGE the value:

In [41]:
scores = np.array([ [1,  2,   3,  4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])

# let's change some items
scores[0,0] = 10
scores[0,1] = 20
scores[1,3] += 30
scores[-1,-1] += 40
print(scores)

[[10 20  3  4]
 [11 12 13 44]
 [21 22 23 64]]


In [42]:
# Indeed you could omit one of the dimensions, and replace everything in a row or column
# use colon : to indicate 'everything'
scores = np.array([ [1,  2,   3,  4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])
 
scores[0,:] = 10 
print(scores)
# first dimension: value 0, second dimension: all values
# note you could also use the simpler version scores[0] = 10

[[10 10 10 10]
 [11 12 13 14]
 [21 22 23 24]]


In [43]:
scores = np.array([ [1,  2,   3,  4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])
 
scores[:,2] = 30  
print(scores)
# first dimension: all values, second dimension: index 2
# note that here you cannot simplify it to scores[,2] = 30

[[ 1  2 30  4]
 [11 12 30 14]
 [21 22 30 24]]
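
The colon works for reading values too, not only for changing them. A minimal sketch, using made-up variable names `third_column` and `second_row`:

```python
import numpy as np

scores = np.array([[1, 2, 3, 4],
                   [11, 12, 13, 14],
                   [21, 22, 23, 24]])

# all rows, column with index 2
third_column = scores[:, 2]

# row with index 1, all columns
second_row = scores[1, :]

print(third_column)
print(second_row)
```

With a List of Lists, getting a whole column like this would need a loop or a list comprehension.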


You can request some information about your arrays, just like you could request a len() from a List

In [44]:
# you can request some info about a multi-level Array:
scores = np.array([ [1, 2, 3, 4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])

print("dimensions", scores.ndim)
print("shape", scores.shape)
print("size ", scores.size)

dimensions 2
shape (3, 4)
size  12
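
The same three properties work for any number of dimensions. The sketch below uses a small made-up three-dimensional Array called `blocks` and also shows how `len()` compares.

```python
import numpy as np

# 2 blocks, each with 2 rows, each row with 3 numbers
blocks = np.array([[[1, 2, 3], [4, 5, 6]],
                   [[7, 8, 9], [10, 11, 12]]])

print("dimensions", blocks.ndim)
print("shape", blocks.shape)
print("size ", blocks.size)   # 2 * 2 * 3 items in total

# len() still works, but only reports the size of the first dimension
print("len  ", len(blocks))
```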


### Creating Arrays full of values

You can create a new array already filled with a default value, and specify its type:

- full of zeros with .zeros()
- full of ones with .ones()
- full of some other value with .full(shape, some_value)
- full of random values

In [45]:
np.zeros( 4 )

array([0., 0., 0., 0.])

By default, created values are floats (numbers with decimal places), but you can specify data type with 'dtype':

In [46]:
np.zeros(5, dtype='int')

array([0, 0, 0, 0, 0])

In [47]:
np.zeros(5, dtype='float')

array([0., 0., 0., 0., 0.])

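You can also create multi-dimensional arrays by passing the sizes of all dimensions in a tuple:

(10) - array of 10 elements (strictly speaking `(10)` is just the number 10; a one-item tuple is written `(10,)`, and Numpy accepts both)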

(5,10) - 5 sets of 10 elements

(3,5,10) - 3 sets of 5 sets of 10 elements

In [48]:
np.zeros((10), dtype=float)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
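
A small detail about one-dimensional shapes: in Python, brackets around a single number do not make a tuple, a trailing comma does. The sketch below (variable names are made up) shows that Numpy is happy with either.

```python
import numpy as np

print(type((10)))    # brackets alone: still just an int
print(type((10,)))   # trailing comma: a tuple with one item

from_number = np.zeros((10))
from_tuple = np.zeros((10,))

# both give a one-dimensional Array with 10 elements
print(from_number.shape, from_tuple.shape)
```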

In [49]:
np.zeros((5,10), dtype=int)

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [50]:
np.zeros((3, 5,10), dtype=int)

array([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])

You can also specify values other than zero with `.ones( dimensions )` or with `.full(dimensions, value)`

In [51]:
np.ones((2,5), dtype=int)

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [52]:
np.full((2,5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [53]:
# so as you can see, those two do the same thing:
print(np.ones((2,5), dtype=int))
print(np.full((2,5), 1))

[[1 1 1 1 1]
 [1 1 1 1 1]]
[[1 1 1 1 1]
 [1 1 1 1 1]]


You can also create arrays

- from a range with `arange(start, top, jump)`
- with an even split between two values with `linspace(start, end, slices)`
- as an identity matrix with `eye(size)`
- full of random data with `np.random.randint(max_value, size=size_tuple)`

In [54]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [55]:
np.arange(5,15)

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [56]:
np.arange(5, 50, 10) # arange has start, top (not included), jump

array([ 5, 15, 25, 35, 45])

In [57]:
np.linspace(0, 1, 5) # from_value, to_value, how_many_slices

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [58]:
np.linspace(0, 100, 3) # from_value, to_value, how_many_slices

array([  0.,  50., 100.])
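
One difference between the two is easy to miss: `arange` stops before its top value, while `linspace` includes its end value. A short sketch with made-up variable names:

```python
import numpy as np

# top value 100 is NOT included, and the third number is the jump
steps = np.arange(0, 100, 25)

# end value 100 IS included, and the third number is how many values you want
slices = np.linspace(0, 100, 5)

print(steps)
print(slices)
```

So `arange` gives four whole numbers here, while `linspace` gives five decimal numbers.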

In [59]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [60]:
# There's also a way to repeat a pattern with .tile
# the number (or tuple) given in tile describes size/times of output matrix 
pattern = np.array([0, 1, 2])

np.tile(pattern, 2)
# repeat 2 times

array([0, 1, 2, 0, 1, 2])

In [61]:
np.tile(pattern, (3,2))
# repeat 2 times in one dimension, and 3 times in another dimension 

array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])

In [62]:
np.tile(pattern, (4,3,2))
# and so on ...

array([[[0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2]],

       [[0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2]],

       [[0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2]],

       [[0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2]]])

In [63]:
# random numbers
print(np.random.randint(10, size=6))


[2 6 1 5 5 6]


## Extra computer-sciency bits about randomness. Feel free to skip it if not interested.

### This is really computer-sciency, but you might enjoy it: 

### Randomness in computers is pseudo-random (not really 'random')

This will be a bit Computer-Science-Heavy, but it's ok if you only understand some of it.

Mind you, this is a **huge simplification**. For this course we will not worry about seeding our randomness, but be aware that your random numbers might repeat, and the seed is to blame.

In computers getting actual real randomness is very difficult, because everything requires a cause and effect. 

One of the ways to get a computer to create 'random' numbers is to pick some 'noise' and interpret it as numbers. For example you could pick some unoccupied place in memory (sort of like a rubbish pile, full of variable leftovers of old files) and start reading it as if these were purposeful numbers. 

These data would make no sense, and not follow any obvious pattern, so would be in some way 'random'. But also they would be guessable (with a lot of effort) and if by any chance you were to start reading your rubbish data at the EXACT SAME POINT, you would end up with the EXACT SAME RANDOM NUMBERS... which would make them not a very useful randomness. Another problem could be that if you somehow dug up a very repetitive file (eg. an .mp3 music file with a lot of silence) you might end up with a lot of repeated numbers and some numbers missing completely (eg. if mp3 files use 0 to describe silence you would get disproportionately many 0 in your randomness).

On some level you could think of that first place from which you started reading your 'noise' as a SEED from which the random numbers will grow. One way to avoid the 'repetitive noise' like in the above example of the mp3 file is to chain the values: use some SEED as the address of RANDOM_VALUE_1, and then, instead of reading the next piece of noise, use the value of RANDOM_VALUE_1 as an address in the noise to find RANDOM_VALUE_2. Then use the 'random' RANDOM_VALUE_2 as the address to find RANDOM_VALUE_3 and so on. Python and Numpy do something similar in spirit: they do not read noise from memory, but calculate each new number from the previous state with a fixed formula (an algorithm called the Mersenne Twister), starting from the seed.

It's as if you had some 'noise' ```[3,6,5,8,7,3,1,9]``` and wanted random values with seed ```2```. 

- first random number would be value on the address/index of ```2``` ----> value ```5```
- next random number would be value on the address/index of ```5``` ----> value ```3```
- second next random number would be value on the address/index of ```3``` ----> value ```8```
- etc.
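
The chain above can be sketched in a few lines of plain Python. This is only a toy to illustrate the idea, not how Python or Numpy really generate random numbers. The function name `toy_random_numbers` is made up, and wrapping around when a value is bigger than the last index is an assumption added here so that the chain can keep going.

```python
def toy_random_numbers(noise, seed, how_many):
    """Follow the chain: each value found becomes the next index to read."""
    numbers = []
    index = seed
    for _ in range(how_many):
        # wrap around if the index is past the end of the noise (an assumption)
        value = noise[index % len(noise)]
        numbers.append(value)
        index = value
    return numbers

noise = [3, 6, 5, 8, 7, 3, 1, 9]

first_run = toy_random_numbers(noise, seed=2, how_many=5)
second_run = toy_random_numbers(noise, seed=2, how_many=5)
other_seed = toy_random_numbers(noise, seed=4, how_many=5)

print(first_run)    # [5, 3, 8, 3, 8]
print(second_run)   # exactly the same, because the seed is the same
print(other_seed)   # [7, 9, 6, 1, 6]
```

With such a tiny piece of noise the numbers start repeating almost immediately, which is one reason real generators are far more elaborate.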

The basic problem is: **IF YOU START AT THE SAME SEED, YOU WILL END UP WITH THE SAME SEQUENCE OF RANDOM NUMBERS**

That's why the best seeds are the ones which change all the time. A good example is the time in milliseconds.



In [67]:
# generate a few random numbers in python:
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))

#every time you run this cell, your random numbers will be different. 
# Try it. go back to this cell and run it again

[9 3 8 8 6 2]
[3 8 4 0 8 8]


In [70]:
# but if you specify a seed, your random numbers will be the same every time you run this cell!
np.random.seed(0) # plant the seed 0 - this could be any number
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))

print()

np.random.seed(0)
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))

# WHY WOULD ANYONE DO THIS? Well... it's great for debugging! You could write tests!

[5 0 3 3 7 9]
[3 5 2 4 7 6]
[8 8 1 6 7 7]

[5 0 3 3 7 9]
[3 5 2 4 7 6]
[8 8 1 6 7 7]
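
Newer versions of Numpy (1.17 and later) also offer a generator object that keeps its seed to itself instead of changing a global setting. This notebook does not use it anywhere else, so treat the sketch below as an optional extra. `np.random.default_rng` and `.integers` are real Numpy functions, the variable names are made up.

```python
import numpy as np

# two separate generators planted with the same seed
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)

draw_a = rng_a.integers(10, size=6)
draw_b = rng_b.integers(10, size=6)

# same seed, same sequence
print(draw_a)
print(draw_b)
print((draw_a == draw_b).all())
```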


In [73]:
# If you want different random numbers on every run, one thing you can do is
# to use the current time as the seed. Time changes constantly and never repeats, so it works well.
import time
time_now = int(time.time()) # time in seconds since 1 Jan 1970 (computers treat this as 'the beginning of time' :D )
print(time_now)
np.random.seed(time_now) # plant the seed time_now; this number changes every second

print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))

# Try it. go back to this cell and run it again. You should see different random numbers, since time changed

1716822642
[6 4 4 4 9 3]
[0 4 6 8 6 3]


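Indeed this is done so much, that numpy has a built-in shortcut for it: just use ```np.random.seed()``` with no argument and numpy will pick a fresh, unpredictable seed for you (taken from the operating system's own source of randomness when available, otherwise from the clock). This way you never need to worry about randomness.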

In [74]:
np.random.seed() # no argument: numpy picks a fresh seed (from the operating system, or from the clock)
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))
# Try it. go back to this cell and run it again.

[9 8 6 3 6 1]
[3 1 0 9 9 6]


How often should you do it? It does not really matter, but you could do it before generating some important random numbers.

### Creating Arrays full of Random values 

In [75]:
# ONE DIMENSION - with size 6
array1 = np.random.randint(100, size=6)
print(array1)
print(array1[0])

[80 75 26 94  3 64]
80


In [76]:
# TWO DIMENSIONS - with size 4 for first and 5 for the second
# these could be scores for four courses, each with 5 students
array2 = np.random.randint(100, size=(4,5))
print(array2)
print()

print(array2[3])
print(array2[3, 4])

[[34 28 59 64 28]
 [37 40 70 86  4]
 [72 22 12 20  0]
 [94 70 55 13 57]]

[94 70 55 13 57]
57


In [77]:
# THREE DIMENSIONS - with size 3 for first and 4 for the second and 5 for third
array3 = np.random.randint(100, size=(3,4,5))
print(array3)
print()

[[[83 25 48 96 39]
  [86 91 19 93 72]
  [85 47 35 45  2]
  [74 52 71 61 28]]

 [[91 26 21 81 36]
  [ 8 77 69 40 85]
  [34 14 74  6 39]
  [85 72 63 81 77]]

 [[90 83 75 88 92]
  [87 21 33 14 76]
  [60 46 87 95 99]
  [29  7  3 77 27]]]



In [78]:
# Indeed you can even specify a range of random numbers (low is included, high is not), great for 'making up' reasonable data
# these could be scores for three schools, each with four courses, with 5 students on each course
student_marks = np.random.randint(low = 45, high = 85, size=(3,4,5))
print(student_marks)
print()

[[[57 60 74 55 45]
  [64 55 52 54 81]
  [63 68 67 70 81]
  [58 71 76 54 53]]

 [[74 81 69 64 63]
  [82 59 79 48 63]
  [76 66 61 72 49]
  [57 57 65 52 72]]

 [[79 59 46 51 68]
  [70 68 68 59 53]
  [80 78 82 54 60]
  [57 81 63 77 54]]]
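
If you want to double check the limits of the made-up marks, a small sketch like this one can help. It only relies on `.min()`, `.max()` and `.shape`. The numbers are random, so only the limits are checked, not the individual marks.

```python
import numpy as np

student_marks = np.random.randint(low=45, high=85, size=(3, 4, 5))

# low is included, so no mark can be below 45
print(student_marks.min() >= 45)

# high is NOT included, so the best possible mark is 84
print(student_marks.max() <= 84)

# 3 schools, 4 courses, 5 students
print(student_marks.shape)
```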



### Slicing/Subsets of Arrays - just like in Lists

In [79]:
# my_array[start:ceiling] if something is not specified, it takes the extreme value
# my_array[:5] means from beginning till 5th, my_array[5:] means from 6th till end
digits = np.arange(10,20)
print(digits)

print(digits[:5])
print(digits[3:])
print(digits[3:5])

[10 11 12 13 14 15 16 17 18 19]
[10 11 12 13 14]
[13 14 15 16 17 18 19]
[13 14]


### the new index-range syntax:    my_array[ start_index : ceiling_index : jump]

There is also new syntax (that also works for Lists, but you might have never seen it before)

In [81]:
# my_array[ start_index : ceiling_index : jump]
digits = np.arange(10,20)
print(digits)
print(digits[2:7:2]) # from index 2, till index 7 (not included), taking every second item

[10 11 12 13 14 15 16 17 18 19]
[12 14 16]


In [82]:
print(digits[:7:2]) # from beginning, till index 7, jumping every 2

[10 12 14 16]


In [83]:
print(digits[2::2]) # from index 2, till end, jumping every 2

[12 14 16 18]


In [84]:
print(digits[::2]) # all, jumping every 2

[10 12 14 16 18]


In [85]:
# and to make it more interesting: when jump is negative, what will happen?
# the array gets reversed (draw it on a piece of paper to understand it better)
print(digits[::-1]) # all, but index counting down
print(digits[::-3]) # all, but index counting down every 3

[19 18 17 16 15 14 13 12 11 10]
[19 16 13 10]
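
Since the start : ceiling : jump syntax also works for Lists, here is the same slicing done on a plain List next to the Array, as a quick sketch with made-up variable names:

```python
import numpy as np

digits = np.arange(10, 20)
digits_list = list(range(10, 20))

every_second = digits_list[2:7:2]
backwards_by_three = digits_list[::-3]

print(every_second)
print(backwards_by_three)

# the Array gives the same numbers, only printed without commas
print(digits[2:7:2])
print(digits[::-3])
```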


### Reshaping - changing the dimensions of an Array

In [86]:
print( np.arange(12) ) 
print()

[ 0  1  2  3  4  5  6  7  8  9 10 11]



In [87]:
print( np.arange(12).reshape((2,6)))
print()

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]



In [88]:
print( np.arange(12).reshape((4,3)))
print()

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]



In [89]:
print( np.arange(12).reshape((3,2,2)))

[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]


In [90]:
# but what would this do? you can't split 12 numbers into 4 sets of 5 numbers!
print( np.arange(12).reshape((4,5)))

ValueError: cannot reshape array of size 12 into shape (4,5)
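
If you know only one of the sizes, reshape can work out the other one for you: put `-1` in place of the size you do not want to calculate yourself. A minimal sketch, with made-up variable names:

```python
import numpy as np

numbers = np.arange(12)

# 3 rows, and numpy works out that each row needs 4 items
three_rows = numbers.reshape((3, -1))
print(three_rows.shape)

# 2 columns, and numpy works out that 6 rows are needed
two_columns = numbers.reshape((-1, 2))
print(two_columns.shape)

# it still cannot do the impossible: 12 does not divide into 5 rows
try:
    numbers.reshape((5, -1))
except ValueError as error:
    print("ValueError:", error)
```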

### Concatenating/Flattening Arrays and Lists - removing one dimension

```np.concatenate()``` takes a List/Array of arrays as its main argument and combines all of its items into one Array. 

concatenate will remove one dimension:
 ```[[1,2,3], [4,5,6]]  ---> [1,2,3,4,5,6]```
 
You can think of concatenate as **JOIN THE ARRAYS IN THIS ARRAY**

In [92]:
array1 = np.array([10,20,30])
array2 = np.array([40,50,60])
parent_array = [array1, array2]

print(parent_array)
print( np.concatenate(parent_array) )

[array([10, 20, 30]), array([40, 50, 60])]
[10 20 30 40 50 60]


In [93]:
list1 = [70, 80,90]
list2 = [1, 2, 3]

print( np.concatenate([list1, list2]) )

[70 80 90  1  2  3]


In [94]:
# you can concatenate lists and arrays together. They really are very similar
array1 = np.array([10,20,30])
array2 = np.array([40,50,60])
list1 = [70, 80,90]
print( np.concatenate([array1, array2, list1]) )

[10 20 30 40 50 60 70 80 90]


In [95]:
# concatenation respects dimensions, it will flatten only the top dimension
two_dimention_array1 = np.array([ [1,2,3],    [4,5,6] ])
two_dimention_array2 = np.array([ [10,20,30], [40,50,60] ])

print(two_dimention_array1)
print(two_dimention_array2)

[[1 2 3]
 [4 5 6]]
[[10 20 30]
 [40 50 60]]


In [96]:
print( np.concatenate([two_dimention_array1,two_dimention_array2]) )

[[ 1  2  3]
 [ 4  5  6]
 [10 20 30]
 [40 50 60]]


In [97]:
# this starts being spaghetti code, but you could flatten something twice and remove two dimensions
print( np.concatenate( np.concatenate( [two_dimention_array1,two_dimention_array2] ) ))

[ 1  2  3  4  5  6 10 20 30 40 50 60]
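
If all you want is one flat Array, nesting concatenate calls is probably not the tidiest route. Arrays have a `.flatten()` method that removes all the extra dimensions in one go. A short sketch with made-up variable names:

```python
import numpy as np

two_dimension_array1 = np.array([[1, 2, 3], [4, 5, 6]])
two_dimension_array2 = np.array([[10, 20, 30], [40, 50, 60]])

# join the two Arrays first (still two dimensions)...
joined = np.concatenate([two_dimension_array1, two_dimension_array2])

# ...then flatten everything into one dimension
flat = joined.flatten()

print(joined.shape)
print(flat)
```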


Concatenate has an extra argument axis ```np.concatenate([arr1,arr2],axis=1)``` which by default is 0

- axis=0 (the default) - join along the first dimension - the rows of the second array are placed below the rows of the first
- axis=1 - join along the second dimension - each row of the second array is appended to the matching row of the first


In [98]:
two_dimention_array1 = np.array([ [1,2,3], [4,5,6] ])
two_dimention_array2 = np.array([ [10,20,30], [40,50,60] ])

print( np.concatenate([two_dimention_array1,two_dimention_array2], axis=0) )

[[ 1  2  3]
 [ 4  5  6]
 [10 20 30]
 [40 50 60]]


In [99]:
two_dimention_array1 = np.array([ [1,2,3], [4,5,6] ])
two_dimention_array2 = np.array([ [10,20,30], [40,50,60] ])

print( np.concatenate([two_dimention_array1,two_dimention_array2], axis=1) )

[[ 1  2  3 10 20 30]
 [ 4  5  6 40 50 60]]


### Horizontal and Vertical Stack -  add Arrays to each other without losing dimensions 

In [100]:
my_array = np.array([-7,-8,-9])
my_array_2d = np.array([ [1,2,3], [4,5,6] ])

print(np.vstack([my_array, my_array_2d]))

[[-7 -8 -9]
 [ 1  2  3]
 [ 4  5  6]]


In [101]:
my_array_2d_1 = np.array([[-1,-2],[-3, -4]])
my_array_2d_2 = np.array([ [1,2,3], [4,5,6] ])

print(np.hstack([my_array_2d_1, my_array_2d_2]))

[[-1 -2  1  2  3]
 [-3 -4  4  5  6]]
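
For two-dimensional Arrays, vstack and hstack give the same result as concatenate with axis=0 and axis=1. The sketch below checks this with `np.array_equal`; the variable names are made up.

```python
import numpy as np

top = np.array([[1, 2, 3], [4, 5, 6]])
bottom = np.array([[10, 20, 30], [40, 50, 60]])

# vstack: rows of the second Array go below the rows of the first
same_vertical = np.array_equal(np.vstack([top, bottom]),
                               np.concatenate([top, bottom], axis=0))

# hstack: rows of the second Array are glued to the right of the first
same_horizontal = np.array_equal(np.hstack([top, bottom]),
                                 np.concatenate([top, bottom], axis=1))

print(same_vertical, same_horizontal)
print(np.vstack([top, bottom]).shape, np.hstack([top, bottom]).shape)
```

The stack functions are mostly a matter of readability: the name says which direction you are joining in.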


### Split - split one Array into many Arrays using predefined indexes

In [102]:
digits = np.arange(1,10)
print(digits)
print()

[1 2 3 4 5 6 7 8 9]



In [103]:
three_sub_arrays = np.split(digits,[3,6])
print(three_sub_arrays)

[array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]


In [104]:
# you saw this syntax in the recap (multiple assignment).
# You can fill many variables in one line by assigning a List to them
a,b,c = [10,20,30]
print(a,b,c)

10 20 30


In [105]:
# so the split can be used as follows:
start, middle, end = np.split(digits,[3,6])
print(start, middle, end)

[1 2 3] [4 5 6] [7 8 9]


In [106]:
# Note: you could achieve the same effect with many lines of code with np.arange()
# but that requires much more thinking and gives more opportunities for bugs
first = np.arange(1,4)
second = np.arange(4,7)
third = np.arange(7,10)
print(first, second, third)

# but why do something the hard way if there is a proper syntax for it?

[1 2 3] [4 5 6] [7 8 9]


In [107]:
# putting it all together:
digits = np.arange(0,20).reshape(5,4)
print(digits)
print()

first, second, third = np.vsplit(digits,[2,3])

print(first)
print(second)
print(third)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]]
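
Two more variations you may find handy, shown as a sketch with made-up variable names: `np.hsplit` cuts between columns instead of rows, and `np.split` also accepts a plain number meaning 'this many equal parts'.

```python
import numpy as np

digits = np.arange(0, 20).reshape(5, 4)

# hsplit cuts between columns: everything before column index 1, then the rest
left, right = np.hsplit(digits, [1])
print(left)
print(right)

# a plain number instead of a list of indexes means: split into equal parts
equal_parts = np.split(np.arange(1, 10), 3)
print(equal_parts)
```

Splitting into equal parts only works when the Array divides evenly; otherwise numpy raises an error.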


## ⭐️⭐️⭐️💥 What you learned in this session: Three stars and a wish 
**In your own words** write in your Learn diary:

- 3 things you would like to remember from this badge
- 1 thing you wish to understand better in the future or a question you'd like to ask
