# CS682 Discussion Session 01: Slicing and Broadcasting in Python

## 1. Python List and Numpy Array (ndarray)

### 1.1 Difference between List and Numpy Array
- A List is an ordered collection of items. The items can be of mixed types: numbers, strings, other lists, Numpy Arrays, etc.  
- A Numpy Array is a grid of values that all share a single data type (dtype).

In [1]:
my_list = [1, '2', [3]]
my_list += [4.01]
print 'my_list:', my_list
print type(my_list)

l1 = [0, 1, 2, 3, 4]
s_l1 = [x**2 for x in l1 if x % 2 == 0]
print 's_l1:', s_l1

my_list: [1, '2', [3], 4.01]
<type 'list'>
s_l1: [0, 4, 16]


In [2]:
import numpy as np
my_arr = np.array(['hello', 1])
print 'my_arr:', my_arr
print type(my_arr)
print 'shape:', my_arr.shape
print 'dtype:', my_arr.dtype

my_arr: ['hello' '1']
<type 'numpy.ndarray'>
shape: (2,)
dtype: |S5
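
The contrast between the two containers can be sketched with a short example. This is only an illustrative sketch; the variable names are made up for the example, and it uses `print(...)` with a single argument so that it runs under Python 3 as well as the Python 2 used in these cells.

```python
import numpy as np

# A list keeps each item as-is, and + concatenates two lists.
nums = [1, 2] + [3, 4]

# An array has a single dtype, and + adds elementwise.
arr_sum = np.array([1, 2]) + np.array([3, 4])

# Mixing ints and floats upcasts the whole array to float.
mixed = np.array([1, 2.5])

print(nums)             # [1, 2, 3, 4]
print(arr_sum.tolist()) # [4, 6]
print(mixed.dtype)      # float64
```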


### 1.2 len, size, shape, indexing

In [3]:
l = [[0, 1, 2, 3], [4, 5, 6, 7]]
a = np.arange(8).reshape((2,4))
print l
print len(l)
print '----------'
print a
print len(a), a.size, a.shape

[[0, 1, 2, 3], [4, 5, 6, 7]]
2
----------
[[0 1 2 3]
 [4 5 6 7]]
2 8 (2, 4)


In [4]:
print a[1], type(a[1])
print a[1][2]
print a[1, 2]
print '----------'
print l[1][2]
print l[1, 2]  # raises TypeError: a list cannot be indexed with a tuple

[4 5 6 7] <type 'numpy.ndarray'>
6
6
----------
6


TypeError: list indices must be integers, not tuple

### 1.3 Converting between List and Numpy Array

- List to Numpy Array: a = np.array(l) or a = np.asarray(l)  
- Numpy Array to List: l = a.tolist()

In [None]:
l = [[0, 1, 2, 3], [4, 5, 6, 7]]
a = np.array(l)
a1 = np.asarray(l)
print 'a:', a, type(a)
print 'a1:', a1, type(a1)

print '--------'

l1 = list(a)
print 'l1:', type(l1), l1

l2 = a.tolist()
print 'l2:', type(l2), l2 
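
The conversion functions differ in ways that the printed values alone may not make obvious. The sketch below (variable names are illustrative) shows two of them: `np.array` copies its input by default while `np.asarray` passes an existing ndarray through, and `list(a)` converts only the outermost axis while `a.tolist()` converts every level.

```python
import numpy as np

a = np.arange(8).reshape((2, 4))

copied = np.array(a)     # np.array makes a copy by default
same = np.asarray(a)     # np.asarray returns the input if it is already an ndarray

rows = list(a)           # a list holding 2 one-dimensional arrays
nested = a.tolist()      # a nested list of plain Python ints

print(copied is a)                 # False
print(same is a)                   # True
print(type(rows[0]).__name__)      # ndarray
print(type(nested[0]).__name__)    # list
```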

## 2. Slicing

### 2.1 Basic usage

In [5]:
a = np.arange(12).reshape((3,4))
print a
print '-------'

row1 = a[1, :]    
row2 = a[1:2, :]
row3 = a[1]
row4 = a[1:]
print 'row1\n', row1, row1.shape 
print 'row2\n', row2, row2.shape
print 'row3\n', row3, row3.shape
print 'row4\n', row4, row4.shape
print '-------'

col1 = a[:, -2:]
col2 = a[:, 0:-1:2]
print 'col1\n', col1, col1.shape 
print 'col2\n', col2, col2.shape
print '-------'

a1 = a[:]
a2 = a[:, :]
print a1.shape
print a2.shape

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
-------
row1
[4 5 6 7] (4,)
row2
[[4 5 6 7]] (1, 4)
row3
[4 5 6 7] (4,)
row4
[[ 4  5  6  7]
 [ 8  9 10 11]] (2, 4)
-------
col1
[[ 2  3]
 [ 6  7]
 [10 11]] (3, 2)
col2
[[ 0  2]
 [ 4  6]
 [ 8 10]] (3, 2)
-------
(3, 4)
(3, 4)


### 2.2 Change of dimensions

In [None]:
a = np.arange(24).reshape((2, 3, 4))
print a
print 'a', a.shape

a1 = a[1:2, 0:3, 1:3]
print 'a1', a1.shape

a2 = a[1, 0:3, 1:3]
print 'a2', a2.shape

a3 = a[0:2, -2, -1:]
print 'a3', a3.shape
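
The pattern behind these shapes is that an integer index removes its axis while a slice keeps it, even when the slice has length 1. A minimal sketch of that rule, with illustrative variable names:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

kept = a[1:2, :, :]              # a slice of length 1 keeps the axis
dropped = a[1, :, :]             # an integer index removes the axis
restored = dropped[np.newaxis]   # np.newaxis adds a length-1 axis back

print(kept.shape)                # (1, 3, 4)
print(dropped.shape)             # (3, 4)
print(restored.shape)            # (1, 3, 4)
print(a[0:2, -2, -1:].shape)     # (2, 1)
```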

### 2.3 Modify values

In [None]:
l = range(5)
print 'original: '
print l

print 'exp1:'
m = l
print m is l
print m == l
m[0] = -1
print l

print 'exp2:'
m = l[:]
print m is l
print m == l
m[1] = -2
print l

print 'exp3:'
l[2] = -3
print l

In [None]:
a = np.arange(5)
print 'original: '
print a

print 'exp0:'
a[3] = -10
print a

print 'exp1:'
b = a
print a is b
print a == b
b[1] = -1
print a

print 'exp2:'
b = a[:]
print a is b
b[2] = -2
print a

print 'exp3:'
b = a[:]
b = np.array([5,4,3,2,1])
print a

In [None]:
# Skip this cell

a = np.arange(12).reshape((3,4))
print 'original: '
print a

print 'exp1:'
a[1, 2] = -1
print a 

print 'exp2:'
b = a[1, 2]
b = -2
print a 

print 'exp3:'
b = a
b[1,2] = -3
print a

print 'exp4:'
b = a[:]
b[1,2] = -4
print a

print 'exp5:'
b = a[2, :]
b[1] = -5
print a

print 'exp6:'
b = a[2]
b[1] = -6
print a
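
The experiments above come down to whether two names refer to the same underlying data. One way to check this directly, sketched here with illustrative names, is `np.shares_memory`: a basic slice of an array is a view onto the same data, whereas `.copy()` gives independent data. For a list, `[:]` makes a shallow copy instead.

```python
import numpy as np

a = np.arange(5)

view = a[:]        # basic slice: a new array object sharing a's data
copy = a.copy()    # independent data

view[0] = -1       # visible through a
copy[1] = -2       # not visible through a

print(a.tolist())                  # [-1, 1, 2, 3, 4]
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

# Contrast with a list, where [:] makes a shallow copy.
l = list(range(5))
m = l[:]
m[0] = -1
print(l)                           # [0, 1, 2, 3, 4]
```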

### 2.4 Other indexing tricks
- Indexing with boolean array
- Indexing with integer list / array

In [None]:
a = np.arange(12).reshape((3,4))
print 'original: '
print a

print '\nexp1:'
idx = (a % 2 == 0)
print idx
print type(idx)
print a[idx]

print '\nexp2:'
idx = (a[0] < 3)
print idx
print a[1:, idx]
print a[1:, idx.tolist()]

print '\nexp3:'
idx = (a[0] < 3)
b = a[1:, idx]
b[0, 0] = -10
print b
print a

In [None]:
a = np.arange(12).reshape((3,4))
print 'original: '
print a

print '\nexp1:'
print a[1, [2,3]]  # [a[1,2], a[1,3]]
print a[1, np.array([2,3])]

print '\nexp2:'
print a[[0, 1, 2], [1, 2, 3]]  # [a[0,1], a[1,2], a[2,3]]
print '***'
print a[[[0, 1, 2], [1, 2, 3]]]
print '***'
idx = np.array([[0, 1, 2], [1, 2, 0]])  # [[a[0], a[1], a[2]], [a[1], a[2], a[0]]]
print a[idx]
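
Unlike basic slicing, indexing with a boolean array or an integer list returns a copy, so writing to the result leaves the original untouched. Assigning through the index expression itself does modify the original. A small sketch of both cases, with illustrative names:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

picked = a[a % 2 == 0]   # boolean indexing returns a copy
picked[1] = -10          # changes picked only; a[0, 2] stays 2

a[a > 9] = 0             # assigning through a mask modifies a in place

rows = a[[0, 2]]         # integer-list indexing also returns a copy

print(a[0].tolist())                # [0, 1, 2, 3]
print(a[2].tolist())                # [8, 9, 0, 0]
print(np.shares_memory(a, picked))  # False
print(np.shares_memory(a, rows))    # False
```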

## 3. Broadcasting
### 3.1 The basic idea
- Universal functions: functions that apply elementwise on arrays  
    Examples: np.add, np.power, np.greater, np.log, np.absolute  
- Universal functions that take two input arrays:  
    - Simplest case: the two input arrays have the same shape  
    - Two inputs with different shapes? Broadcasting!  
        Conceptually, values are replicated so that the shapes match  
        The replication is virtual, so no redundant copies are made
        
**A simple example:**

In [None]:
a = np.arange(12).reshape((3,4))
b = 1.1
c = np.arange(4)

print a
print b
print c
print a * b
print (a * b) + c
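
To make the "no redundant copies" point concrete, the sketch below compares explicit replication with `np.tile` against `np.broadcast_to`, which returns a read-only view whose row stride is 0. The variable names `tiled` and `virtual` are illustrative only.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
c = np.arange(4)

# Explicit replication with np.tile builds a real 3x4 copy of c.
tiled = np.tile(c, (3, 1))

# np.broadcast_to gives a read-only 3x4 view of c without copying data.
virtual = np.broadcast_to(c, (3, 4))

print(np.array_equal(a + c, a + tiled))  # True
print(virtual.strides[0])                # 0: every row reuses the same memory
print(np.shares_memory(c, virtual))      # True
```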

### 3.2 The broadcasting rule
**Example:**  
Shape of A:   2 x 4 x 1 x 3  
Shape of B:   5 x 1  
Shape of A+B: 2 x 4 x 5 x 3

- If one array has fewer dimensions, prepend 1's to its shape
    - B: 5 x 1 --> 1 x 1 x 5 x 1
- Compare the shapes one dimension at a time, starting from the last dimension and working back to the first
- If one array has length 1 in the current dimension, replicate its values along that dimension
    - A: 2 x 4 x 1 x 3 --> 2 x 4 x 5 x 3  
    - B: 1 x 1 x 5 x 1 --> 2 x 4 x 5 x 3
- If neither array has length 1 in a dimension and the two lengths differ, an error is raised

In [None]:
A = np.arange(2*4*3).reshape((2,4,1,3))
B = np.arange(5).reshape((5,1)) * 0.1
C = A + B
print 'A\n', A
print '\nB\n', B
print '\nC', C.shape
print C
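
The rule can also be written out as a few lines of plain Python and compared against what NumPy actually produces. The helper `broadcast_shape` below is a hypothetical function written for this sketch, not part of NumPy.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the broadcasting rule to two shapes (hypothetical helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Prepend 1's to the shorter shape.
    shape_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    shape_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for m, n in zip(shape_a, shape_b):
        if m == n or n == 1:
            result.append(m)
        elif m == 1:
            result.append(n)
        else:
            # Neither length is 1 and the lengths differ.
            raise ValueError('shapes do not broadcast: %d vs %d' % (m, n))
    return tuple(result)

expected = broadcast_shape((2, 4, 1, 3), (5, 1))
actual = (np.zeros((2, 4, 1, 3)) + np.zeros((5, 1))).shape
print(expected)   # (2, 4, 5, 3)
print(actual)     # (2, 4, 5, 3)
```

Calling `broadcast_shape((3,), (4,))` raises a `ValueError`, matching the error NumPy reports for the same pair of shapes.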

## 4. Advice
- Keep track of the shapes of your variables:  
    - Write the expected shapes in comments
    - Print the actual shapes and check that they match
- Make up small examples and test your code on them
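
One possible way to follow this advice is sketched below: expected shapes go in comments, `assert` statements check the actual shapes, and a tiny hand-checkable array tests the logic. The arrays `x` and `small` are made up for the illustration.

```python
import numpy as np

x = np.random.rand(3, 100)                # 3x100
row_mean = x.mean(axis=1)                 # 3
assert row_mean.shape == (3,)

# keepdims=True keeps the reduced axis, so the result broadcasts against x.
centered = x - x.mean(axis=1, keepdims=True)   # 3x100
assert centered.shape == (3, 100)

# A small example that can be checked by hand: the row means are 2 and 4.
small = np.array([[1.0, 3.0], [2.0, 6.0]])
small_centered = small - small.mean(axis=1, keepdims=True)
print(small_centered.tolist())            # [[-1.0, 1.0], [-2.0, 2.0]]
```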

## 5. Practice Question
100 students are divided into 5 teams (team 0, 1, 2, 3, 4). There are 3 courses. Each student has a grade (between 0 and 1) for each course.  
A student is an "honor student" if, in every course, his / her grade is higher than the average grade of all the students who are not on the same team.  
For example, suppose the average grades of the students from teams 1, 2, 3 and 4 are 0.8, 0.85 and 0.9 for the three courses. An "honor student" from team 0 then needs grades higher than 0.8, 0.85 and 0.9 in the three courses, respectively.

teams = np.random.choice(5, size=100)  
grades = np.random.rand(3, 100)

Find out the number of honor students in each team.

In [None]:
import numpy as np
teams = np.random.choice(5, size=100)
grades = np.random.rand(3, 100)

print teams.shape
print grades.shape

In [None]:
team_mask = np.arange(5).reshape((5,1)) != teams  # 5x100
print team_mask.shape
print teams[:8]
print team_mask[:, :8]

sum_grades = grades.dot(team_mask.T)  # 3x5
print sum_grades

In [None]:
count_students = np.sum(team_mask, axis=1)  # 5
print count_students

ave_grades = sum_grades / count_students  # 3x5
print ave_grades

In [None]:
require_grades = ave_grades[:, teams]  # 3x100
print require_grades.shape
print require_grades[:, :4]

In [None]:
is_honor = np.all(grades > require_grades, axis=0)  # 100
print is_honor.shape
print grades[:, :4]
print is_honor[:4]

In [None]:
team_honor = (np.arange(5).reshape((5,1)) == teams) * is_honor  # 5x100
print team_honor.shape
print team_honor[:, :4]

honor_count = np.sum(team_honor, axis=1)  # 5
print honor_count.shape
print honor_count
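
Following the advice in Section 4, a vectorized answer like this can be cross-checked against a plain loop. The sketch below is one way to do that, under a few assumptions that are not in the cells above: a fixed seed via `np.random.RandomState(0)` for repeatability, and `np.bincount` in place of the mask product for the final count.

```python
import numpy as np

rng = np.random.RandomState(0)           # fixed seed, chosen only for repeatability
teams = rng.choice(5, size=100)          # 100
grades = rng.rand(3, 100)                # 3x100

# Vectorized count, following the same steps as the cells above.
team_mask = np.arange(5).reshape((5, 1)) != teams             # 5x100
ave_grades = grades.dot(team_mask.T) / team_mask.sum(axis=1)  # 3x5
is_honor = np.all(grades > ave_grades[:, teams], axis=0)      # 100
honor_count = np.bincount(teams[is_honor], minlength=5)       # 5

# Loop-based count for comparison.
loop_count = np.zeros(5, dtype=int)
for t in range(5):
    others = grades[:, teams != t]       # grades of students outside team t
    threshold = others.mean(axis=1)      # 3
    members = grades[:, teams == t]      # grades of students in team t
    loop_count[t] = np.sum(np.all(members > threshold[:, np.newaxis], axis=0))

print(honor_count)
print(np.array_equal(honor_count, loop_count))
```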