# 2. Arrays - Part 2

In [1]:
import numpy as np
np.random.seed(123)

## Array Indexing

For complete information about indexing, see
http://docs.scipy.org/doc/numpy/user/basics.indexing.html



### Matrix indexing

Matrix indexing is the regular form of indexing, with one index for each dimension. To access more than one element along a dimension, use the slice syntax `m:n`, which selects positions `m` up to, but not including, `n`.

In [2]:
# indexing in a 3-dimensional array
z = np.arange(24).reshape((2, 3, 4))
print(z)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


In [3]:
# slices
print(z[0:2, 1:3, 3])
print(z[:, 2, :])

[[ 7 11]
 [19 23]]
[[ 8  9 10 11]
 [20 21 22 23]]
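As a small sketch of how the two forms of index affect the result, the following compares the shapes of a few selections from the same array. The variable names `sub`, `plane` and `kept` are introduced here only for illustration.

```python
import numpy as np

z = np.arange(24).reshape((2, 3, 4))

# A slice keeps its axis; a single integer index removes that axis.
sub = z[0:2, 1:3, 3]   # axes 0 and 1 sliced, axis 2 indexed
plane = z[:, 2, :]     # axis 1 indexed
kept = z[:, 2:3, :]    # a one-element slice still keeps axis 1

print(sub.shape, plane.shape, kept.shape)
```

Using a one-element slice such as `2:3` instead of the integer `2` is one way to keep the number of dimensions unchanged.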


### Linear indexing

Linear indexing treats an n-dimensional array as a flattened 1-dimensional sequence. This linear index is what the `argmin` and `argmax` functions return when they are applied to an n-dimensional array.

To convert a linear index to a matrix index, use the function `numpy.unravel_index()`, where the first argument is the linear index and the second argument is the shape of the array for which you want to transform the index.

In [4]:
# linear indexing
linear_index = 10
print("\n For an array with dimensions (2, 3, 4), the linear index: ", linear_index, " is equal to \
multidimensional index: ", np.unravel_index(linear_index, z.shape))


 For an array with dimensions (2, 3, 4), the linear index:  10  is equal to multidimensional index:  (0, 2, 2)
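The conversion can also be run in the opposite direction. The sketch below assumes the default row-major ordering and uses `numpy.ravel_multi_index()`, which is not mentioned above, to go from a matrix index back to a linear index.

```python
import numpy as np

z = np.arange(24).reshape((2, 3, 4))
linear_index = 10

# Linear -> multidimensional index (row-major order by default)
multi_index = np.unravel_index(linear_index, z.shape)

# Multidimensional -> linear index, the inverse operation
back = np.ravel_multi_index(multi_index, z.shape)

# The flattened array at the linear index holds the same element
same = z.ravel()[linear_index] == z[multi_index]
print(multi_index, back, same)
```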


#### Exercise 2b.1

Create a $4\times3$ matrix of random numbers between $0$ and $1$. 
Find the row and column position of the minimum and the maximum value.

In [5]:
# 8<
A = np.random.uniform(0, 1, (4, 3))
print(A)

[[ 0.69646919  0.28613933  0.22685145]
 [ 0.55131477  0.71946897  0.42310646]
 [ 0.9807642   0.68482974  0.4809319 ]
 [ 0.39211752  0.34317802  0.72904971]]


In [6]:
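largest = np.argmax(A)
smallest = np.argmin(A)
print("Largest value is at linear position {}".format(largest))
row, col = np.unravel_index(largest, A.shape)
print("Which is row {} column {}".format(row, col))
print("Smallest value is at linear position {}".format(smallest))
row, col = np.unravel_index(smallest, A.shape)
print("Which is row {} column {}".format(row, col))


Largest value is at linear position 6
Which is row 2 column 0
Smallest value is at linear position 2
Which is row 0 column 2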


#### Exercise 2b.2

Complete the following code to print the years with the smallest number of hares, lynxes and carrots in the population dataset.

In [7]:
population = np.loadtxt("population.txt")
for col in [1, 2, 3]:
    # 8<---------------------------
    i = population[:, col].argmin()
    year = population[i, 0]
    print("Least # of species {} in year {}".format(col, year))

Least # of species 1 in year 1917.0
Least # of species 2 in year 1900.0
Least # of species 3 in year 1916.0
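The loop can also be avoided by asking `argmin` for one result per column. This is a minimal sketch on a toy array that stands in for the population data; it assumes the column order year, hare, lynx, carrot and copies only the first three rows, so the years it finds differ from the full dataset.

```python
import numpy as np

# Toy stand-in with columns: year, hare, lynx, carrot
data = np.array([[1900., 30000., 4000., 48300.],
                 [1901., 47200., 6100., 48200.],
                 [1902., 70200., 9800., 41500.]])

# argmin along axis 0 returns one row index per column
rows = data[:, 1:].argmin(axis=0)

# Use those row indices to look up the matching years
years = data[rows, 0]
print(rows, years)
```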


### Boolean indexing

A boolean index can be created directly, but most often it is built by specifying a certain condition.

The condition returns `True` or `False` for every position in the array, and only the elements at positions where it is `True` are retrieved.

In [8]:
# Boolean indexing
x = np.arange(1, 6)
y = np.array([True, False, True, False, True ])
print("Only elements of x for which the value in y is True: ", x[y])

# boolean indexing by using a condition
print("Only elements of x for which the condition is True: ", x[x>3])

Only elements of x for which the value in y is True:  [1 3 5]
Only elements of x for which the condition is True:  [4 5]
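Conditions can be combined, and a boolean index can also be used on the left-hand side of an assignment. The following is a short sketch; the names `middle`, `ends`, `not_big` and `clipped` are chosen here for illustration.

```python
import numpy as np

x = np.arange(1, 6)

# Combine conditions element-wise with & (and), | (or), ~ (not);
# the parentheses are required because of operator precedence.
middle = x[(x > 1) & (x < 5)]
ends = x[(x == 1) | (x == 5)]
not_big = x[~(x > 3)]

# A boolean index can also select the elements to overwrite.
clipped = x.copy()
clipped[clipped > 3] = 3

print(middle, ends, not_big, clipped)
```

Note that the Python keywords `and` and `or` do not work element-wise on arrays, which is why the operators `&` and `|` are used instead.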


#### Exercise 2b.3
Use the population data to

1. Select all the years in which there are more than 50000 lynxes;
2. Select all the years in which there are more lynxes than hares.

In [9]:
# 8<----------------------

# 1.

index = population[:, 2] > 50000
years = population[index, 0]
print(years)

[ 1904.  1915.]


In [10]:
#2.
index = population[:, 2] > population[:, 1]
years = population[index, 0]
print(years)

[ 1904.  1905.  1906.  1915.  1916.  1917.]


In [11]:
# 8<----------------------
# Solution with pandas
import pandas as pd
data = pd.read_csv("population.csv", sep='\t', index_col='year')
# 1.

print(data.loc[data['lynx'] > 50000])

         hare     lynx  carrot
year                          
1904  36300.0  59400.0   40600
1915  19500.0  51100.0   39000


In [12]:
# 8<----------------------
# Or to just print the years:
print(data.loc[data['lynx'] > 50000].index.values)

[1904 1915]


In [13]:
# 8<----------------------
# 2.
print(data.loc[data['lynx'] > data['hare']].index.values)

[1904 1905 1906 1915 1916 1917]


### Indexing with an array of indices

When we index with a separate array of integer indices, we retrieve exactly the elements of the array at those positions.

One advantage of this is that we can explicitly specify the order in which we want the values, and we can retrieve the value at a given position more than once.

In [14]:
x = np.arange(100, 111)
y = np.array([8, 3, 8, 4, 9, 3])
print("Array x: ", x)
print("Array with indices: ", y)
print("Indexing with an array of indices will give:", x[y])

Array x:  [100 101 102 103 104 105 106 107 108 109 110]
Array with indices:  [8 3 8 4 9 3]
Indexing with an array of indices will give: [108 103 108 104 109 103]
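The same idea extends to 2-dimensional arrays, where an index array can pick or reorder whole rows. The sketch below uses a small made-up table (the values are illustrative only) and `numpy.argsort()`, which returns the index array that would sort its input.

```python
import numpy as np

# Made-up two-column table: a label column and a value column
table = np.array([[1900, 30.0],
                  [1901, 47.2],
                  [1902, 70.2]])

# Select rows 2 and 0 (in that order), keeping every column
picked = table[[2, 0], :]

# argsort gives the indices that sort the value column in ascending order;
# reversing them and indexing reorders whole rows in descending order
order = np.argsort(table[:, 1])[::-1]
by_value = table[order, :]

print(picked)
print(by_value)
```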


#### Exercise 2b.4

Indexing with an array is often useful when we want to randomize the order of items in some data. Complete the following code, which creates a scrambled version of the population data.

In [15]:
population = np.loadtxt("population.txt")

# Create an index for the rows of population (0 up to, but not including, population.shape[0])
index = np.arange(0, population.shape[0])

# Shuffle the index
# 8<----------------
np.random.shuffle(index)
#print(index)
# Create a scrambled version
population_rand = population[index,:]
print(population[:, 0])
print(population_rand[:, 0])

[ 1900.  1901.  1902.  1903.  1904.  1905.  1906.  1907.  1908.  1909.
  1910.  1911.  1912.  1913.  1914.  1915.  1916.  1917.  1918.  1919.
  1920.]
[ 1909.  1908.  1920.  1912.  1906.  1905.  1917.  1910.  1911.  1913.
  1915.  1902.  1903.  1907.  1901.  1918.  1916.  1900.  1904.  1914.
  1919.]
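An alternative sketch, assuming a NumPy version that provides `numpy.random.default_rng()`: the generator's `permutation` method returns a shuffled index directly instead of shuffling an existing array in place. The small `data` array is a stand-in for the population data, and the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(123)        # seed value is arbitrary
data = np.arange(12).reshape((4, 3))    # stand-in for the population array

# permutation(n) returns the integers 0..n-1 in random order
index = rng.permutation(data.shape[0])
data_rand = data[index, :]

# Indexing with argsort(index) undoes the shuffle
restored = data_rand[np.argsort(index), :]
print(np.array_equal(restored, data))
```

Keeping the shuffled index around is what makes it possible to map the scrambled rows back to their original positions.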


## Vector concatenation and stacking

Sometimes we want to combine two or more vectors to create an array. 

We can concatenate all vectors in a list, or stack them into rows or columns:

In [16]:
x = np.arange(0, 5)                     
y = np.arange(5, 10)   
z = np.arange(10, 15)
print("Concatenate")
print( np.concatenate([x, y, z]))

print("Stack into rows")
print( np.stack([x, y, z], axis=0))

print("Stack into columns")
print( np.stack([x, y, z], axis=1))

Concatenate
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
Stack into rows
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
Stack into columns
[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]


In [17]:
np.stack([x,x,x], axis=0)

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
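NumPy also provides shorthand functions for these common cases. The sketch below checks that, for 1-dimensional inputs like the vectors above, `np.vstack`, `np.column_stack` and `np.hstack` give the same results as the `np.stack` and `np.concatenate` calls shown earlier.

```python
import numpy as np

x = np.arange(0, 5)
y = np.arange(5, 10)
z = np.arange(10, 15)

rows = np.vstack([x, y, z])         # vectors become rows
cols = np.column_stack([x, y, z])   # vectors become columns
flat = np.hstack([x, y, z])         # vectors joined end to end

print(np.array_equal(rows, np.stack([x, y, z], axis=0)))
print(np.array_equal(cols, np.stack([x, y, z], axis=1)))
print(np.array_equal(flat, np.concatenate([x, y, z])))
```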

#### Exercise 2b.5

Load the population data into an array and use `np.stack` to create a rearranged version of this data where the order of columns is as follows: year, carrot, hare, lynx.

In [19]:
# 8<------
population = np.loadtxt("population.txt")
population_out = np.stack([population[:, 0], population[:, 3], population[:, 1], population[:, 2]], axis=1)
print(population_out)

[[  1900.  48300.  30000.   4000.]
 [  1901.  48200.  47200.   6100.]
 [  1902.  41500.  70200.   9800.]
 [  1903.  38200.  77400.  35200.]
 [  1904.  40600.  36300.  59400.]
 [  1905.  39800.  20600.  41700.]
 [  1906.  38600.  18100.  19000.]
 [  1907.  42300.  21400.  13000.]
 [  1908.  44500.  22000.   8300.]
 [  1909.  42100.  25400.   9100.]
 [  1910.  46000.  27100.   7400.]
 [  1911.  46800.  40300.   8000.]
 [  1912.  43800.  57000.  12300.]
 [  1913.  40900.  76600.  19500.]
 [  1914.  39400.  52300.  45700.]
 [  1915.  39000.  19500.  51100.]
 [  1916.  36700.  11200.  29700.]
 [  1917.  41800.   7600.  15800.]
 [  1918.  43300.  14600.   9700.]
 [  1919.  41300.  16200.  10100.]
 [  1920.  47300.  24700.   8600.]]
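The same rearrangement can also be sketched with an array of column indices, as introduced in the section on indexing with an array of indices. The toy `data` array below holds only the first two rows of the dataset, in the original column order year, hare, lynx, carrot.

```python
import numpy as np

# First two rows of the population data: year, hare, lynx, carrot
data = np.array([[1900., 30000., 4000., 48300.],
                 [1901., 47200., 6100., 48200.]])

# Pick the columns in the new order: year, carrot, hare, lynx
reordered = data[:, [0, 3, 1, 2]]

# The stack-based version gives the same result
stacked = np.stack([data[:, 0], data[:, 3], data[:, 1], data[:, 2]], axis=1)
print(np.array_equal(reordered, stacked))
```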


### Save data set to file

To save a numpy array to a separate file, you specify the filename and the array you want to save. Use the following functions:
- `numpy.savetxt(filename, array)` : save an array to a text file. Some optional arguments and their defaults are: `delimiter=' '`, `newline='\n'`, `header=''`. http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.savetxt.html#numpy.savetxt
- `numpy.save(filename, array)` : save an array to a binary file in numpy `.npy` format. http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.save.html#numpy.save
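As a minimal sketch of the text-file variant, the following writes a small array with `numpy.savetxt` and reads it back with `numpy.loadtxt`. The file name `example.txt`, the header text and the `fmt` choice are assumptions made for this example; a temporary directory is used so that no file is left behind.

```python
import os
import tempfile
import numpy as np

data = np.array([[1900., 30000., 4000.],
                 [1901., 47200., 6100.]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")   # hypothetical file name
    # The header is written as a comment line starting with '# '
    np.savetxt(path, data, delimiter="\t", header="year\thare\tlynx", fmt="%.1f")
    # loadtxt skips lines starting with '#' by default
    loaded = np.loadtxt(path, delimiter="\t")

print(np.array_equal(data, loaded))
```

With a text format, the round trip is exact only if `fmt` keeps enough digits for the values being saved, which is the case for the whole numbers used here.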


#### Exercise 2b.6 

Save the population data to a `.npy` file. Figure out how to load it back into a numpy array. Check if the data is unchanged.

In [20]:
population = np.loadtxt("population.txt")
# 8<-----
np.save("population.npy", population)
loaded = np.load("population.npy")
print(np.all(population == loaded))

True
