### Nanjing University Computational Communication Course Series
***
***
# *Programming Foundations of Computational Communication*
***
***


王成军 

wangchengjun@nju.edu.cn

<img align="left" width = "500px" style="padding-right:10px;" src="figures/header2.png">


<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/PDSH-cover-small.png">
*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

<!--NAVIGATION-->
< [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) | [Contents](Index.ipynb) | [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) >

# The Basics of NumPy Arrays

- Data manipulation in Python is nearly synonymous with NumPy array manipulation.
- Even newer tools like Pandas ([Chapter 3](03.00-Introduction-to-Pandas.ipynb)) are built around the NumPy array.



This section shows how to use NumPy array manipulation to access data and subarrays, and to split, reshape, and join arrays. The basic array manipulations fall into these categories:

- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

## NumPy Array Attributes

### Example

We'll define three random arrays:
- a one-dimensional array
- a two-dimensional array
- a three-dimensional array


In [1]:
import numpy as np
# We'll use NumPy's random number generator
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

Each array has attributes:
- ``ndim`` (the number of dimensions)
- ``shape`` (the size of each dimension)
- ``size`` (the total size of the array)

In [5]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


Another useful attribute is ``dtype``, the data type of the array, which we discussed previously in [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb):

In [3]:
print("dtype:", x3.dtype)

dtype: int64


Other attributes include:
- ``itemsize``, which lists the size (in bytes) of each array element
- ``nbytes``, which lists the total size (in bytes) of the array

In [7]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
print("x3 size: ", x3.size)

itemsize: 8 bytes
nbytes: 480 bytes
x3 size:  60


In general, we expect that ``nbytes`` is equal to ``itemsize`` times ``size``.
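As a quick sanity check of this relationship, the minimal sketch below compares ``nbytes`` with ``itemsize * size`` for a few arrays; the particular dtypes and shapes are arbitrary choices for illustration.

```python
import numpy as np

# Arrays with different dtypes and shapes (arbitrary illustrative choices)
arrays = [
    np.zeros(6, dtype=np.int8),
    np.ones((3, 4), dtype=np.float32),
    np.arange(60, dtype=np.int64).reshape(3, 4, 5),
]

for a in arrays:
    # nbytes should equal the per-element size times the element count
    print(a.dtype, a.itemsize, a.size, a.nbytes, a.nbytes == a.itemsize * a.size)
```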

## Array Indexing: Accessing Single Elements

In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by **specifying the desired index** in square brackets, just as with Python lists:

In [8]:
x1

array([5, 0, 3, 3, 7, 9])

In [9]:
x1[0]

5

In [10]:
x1[4]

7

To index from the end of the array, you can use negative indices:

In [8]:
x1[-1]

9

In [9]:
x1[-2]

7

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [10]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [11]:
x2[0, 0]

3

In [12]:
x2[2, 0]

1

In [13]:
x2[2, -1]

7
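The comma-separated indices form an ordinary Python tuple, so a tuple stored in a variable works the same way. In this small sketch, ``x2`` is recreated with the values shown above (before any modification) so that it runs on its own, and ``idx`` is an illustrative name.

```python
import numpy as np

# Same values as x2 above, recreated so this snippet is self-contained
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

idx = (2, 0)         # a plain tuple of (row, column)
print(x2[idx])       # → 1, same element as x2[2, 0]
print(x2[2][0])      # → 1, chained indexing goes through an intermediate row
print(x2[-1, -1])    # → 7, negative indices count from the end along each axis
```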

Values can also be modified using any of the above index notation:

In [14]:
x2[0, 0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

### Notice

- Unlike Python lists, NumPy arrays have a fixed type.

This means, for example, that if you attempt to insert a floating-point value into an integer array, **the value will be silently truncated.**

### Don't be caught unaware by this behavior!

In [15]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 0, 3, 3, 7, 9])
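The truncation simply drops the fractional part, so it rounds toward zero for negative values as well. One way to keep fractional values, sketched below, is to convert the array to a floating-point type first with ``astype``; the arrays ``a`` and ``b`` are hypothetical examples.

```python
import numpy as np

a = np.array([1, 2, 3])   # integer array (hypothetical example)
a[0] = 3.9                # fractional part silently dropped
a[1] = -2.7               # truncation is toward zero, not downward
print(a)                  # → [ 3 -2  3]

b = a.astype(float)       # astype returns a new float64 array
b[2] = 3.14159            # fractional values are now kept
print(b)
```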

## Array Slicing: Accessing Subarrays

- Square brackets can access **individual array elements**.
- Square brackets can also access **subarrays** with the *slice* notation, marked by the colon (``:``) character.

To access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
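The ``start:stop:step`` notation builds a Python ``slice`` object behind the scenes. Its ``indices(n)`` method reports how the defaults resolve for a dimension of length ``n``, which makes the rule above easy to check; the length 10 below is an arbitrary choice for this sketch.

```python
import numpy as np

x = np.arange(10)

# x[start:stop:step] is shorthand for x[slice(start, stop, step)]
s = slice(1, 8, 3)
print(x[s], x[1:8:3])                    # both give [1 4 7]

# Unspecified parts are None; indices() shows what they resolve to for length 10
print(slice(None, None).indices(10))     # → (0, 10, 1)
print(slice(None, 5).indices(10))        # → (0, 5, 1)
print(slice(5, None, 2).indices(10))     # → (5, 10, 2)
```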


### One-dimensional subarrays

In [16]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [18]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [19]:
x[4:7]  # middle sub-array

array([4, 5, 6])

In [20]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [21]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

A potentially confusing case is when the ``step`` value is negative.
- In this case, the defaults for ``start`` and ``stop`` are swapped.
- This becomes a convenient way to reverse an array:

In [22]:
x[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [23]:
x[5::-2]  # reversed every other from index 5

array([5, 3, 1])
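With a negative step the default ``stop`` lies *before* index 0, a position no explicit integer can express, because ``-1`` means the last element. This sketch shows the resulting pitfall, and that ``slice.indices`` reports the swapped defaults.

```python
import numpy as np

x = np.arange(10)

print(slice(None, None, -1).indices(10))  # → (9, -1, -1): start at the end, stop before index 0
print(x[5::-1])                           # → [5 4 3 2 1 0]
print(x[5:-1:-1])                         # → [] because stop=-1 means index 9, not "before 0"
print(x[5:None:-1])                       # None restores the default stop → [5 4 3 2 1 0]
```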

### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.

In [11]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [12]:
x2[:2, :3]  # two rows, three columns

array([[12,  5,  2],
       [ 7,  6,  8]])

In [13]:
x2[:3, ::2]  # all rows, every other column

array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

Finally, subarray dimensions can even be reversed together:

In [27]:
x2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])

#### Accessing array rows and columns

- One commonly needed routine is accessing single rows or columns of an array.
- This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [28]:
print(x2[:, 0])  # first column of x2

[12  7  1]


In [29]:
print(x2[0, :])  # first row of x2

[12  5  2  4]


In the case of row access, the empty slice can be omitted for a more compact syntax:

In [30]:
print(x2[0])  # equivalent to x2[0, :]

[12  5  2  4]
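Note that ``x2[:, 0]`` drops the indexed axis and returns a one-dimensional array. If a column needs to stay two-dimensional (shape ``(3, 1)``), a length-one slice keeps the axis, as this small sketch shows; ``x2`` is recreated with the values printed above so the snippet runs on its own.

```python
import numpy as np

# Values of x2 as printed above (after x2[0, 0] = 12)
x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

col_1d = x2[:, 0]     # integer index removes the column axis
col_2d = x2[:, 0:1]   # length-one slice keeps it
print(col_1d.shape)   # → (3,)
print(col_2d.shape)   # → (3, 1)
print(x2[0].shape)    # → (4,), same as x2[0, :]
```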


### Subarrays as no-copy views

- NumPy array slices return *views* rather than *copies* of the array data.
- In contrast, slices of Python lists are copies.


In [31]:
print(x2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


Let's extract a $2 \times 2$ subarray from this:

In [32]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[12  5]
 [ 7  6]]


Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [33]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  5]
 [ 7  6]]


In [34]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]



> This default behavior is useful: when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
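The contrast with Python lists can be checked directly. This sketch slices a list and an array holding the same values and modifies each slice; ``np.shares_memory`` reports whether two arrays overlap in memory, and the names ``lst`` and ``arr`` are illustrative.

```python
import numpy as np

lst = [0, 1, 2, 3, 4]
arr = np.arange(5)

lst_part = lst[1:3]   # list slice: a new list (copy)
arr_part = arr[1:3]   # array slice: a view on arr's buffer

lst_part[0] = 99
arr_part[0] = 99

print(lst)                              # → [0, 1, 2, 3, 4]  (unchanged)
print(arr)                              # → [ 0 99  2  3  4]  (changed through the view)
print(np.shares_memory(arr, arr_part))  # → True
print(arr_part.base is arr)             # → True: the view refers back to arr
```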

### Creating copies of arrays

Explicitly copying the data within an array or a subarray is most easily done with the ``copy()`` method:

In [35]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


If we now modify this subarray, the original array is not touched:

In [36]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [37]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
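``np.shares_memory`` also offers a quick way to confirm the difference between a slice and an explicit copy. In this minimal sketch, the small array ``a`` is a hypothetical stand-in for ``x2``.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:2, :2]          # slice: shares a's data buffer
copy = a[:2, :2].copy()   # copy(): owns a separate buffer

print(np.shares_memory(a, view))  # → True
print(np.shares_memory(a, copy))  # → False

copy[0, 0] = 42           # modifying the copy does not affect a
print(a[0, 0])            # → 0
```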


## Reshaping of Arrays

Another useful type of operation is changing the shape of an array. The most flexible way of doing this is with the ``reshape`` method.

### Example

If you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [38]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


### Note
- The size of the initial array must match the size of the reshaped array.
    - Where possible, the ``reshape`` method will use a no-copy view of the initial array,
    - but with non-contiguous memory buffers this is not always the case.
- Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
    - This can be done with the ``reshape`` method,
    - or more easily by making use of the ``newaxis`` keyword within a slice operation.
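The first two points can be checked in a short sketch: a size mismatch raises a ``ValueError``, and ``np.shares_memory`` shows that reshaping a contiguous array gives a view, whereas flattening a transposed (non-contiguous) array forces a copy. The arrays here are arbitrary examples.

```python
import numpy as np

a = np.arange(6)

# Size must match: 6 elements cannot fill a 4 x 2 grid
try:
    a.reshape((4, 2))
except ValueError as err:
    print("ValueError:", err)

# Contiguous input: reshape can return a view
m = a.reshape((2, 3))
print(np.shares_memory(a, m))     # → True

# The transpose is a view with swapped strides, so it is not contiguous;
# flattening it in row-major order requires a copy
t = m.T                           # shape (3, 2)
flat = t.reshape(6)
print(flat)                       # → [0 3 1 4 2 5]
print(np.shares_memory(a, flat))  # → False
```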

In [14]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

array([[1, 2, 3]])

In [15]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [41]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [42]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

We will see this type of transformation often.
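One detail worth knowing: ``np.newaxis`` is simply an alias for ``None``, and each occurrence inserts a new axis of length one. This sketch checks that the two approaches above produce the same shapes and values.

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)                                   # → True
print(x[np.newaxis, :].shape, x[:, np.newaxis].shape)       # → (1, 3) (3, 1)
print(np.array_equal(x[np.newaxis, :], x.reshape((1, 3))))  # → True
print(np.array_equal(x[:, np.newaxis], x.reshape((3, 1))))  # → True
```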

## Array Concatenation and Splitting

It is also common to:
- combine multiple arrays into one, and
- conversely, split a single array into multiple arrays.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.



In [17]:
# np.concatenate takes a tuple or list of arrays
# as its first argument, as we can see here:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:

In [44]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


It can also be used for two-dimensional arrays:

In [45]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [46]:
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [47]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
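The resulting shapes follow a simple rule: sizes add up along the chosen ``axis``, while every other dimension must already agree. This sketch checks the shapes and shows that a mismatch in a non-concatenated dimension raises a ``ValueError``; the array ``other`` is an arbitrary example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])          # shape (2, 3)

print(np.concatenate([grid, grid]).shape)          # axis=0 → (4, 3)
print(np.concatenate([grid, grid], axis=1).shape)  # axis=1 → (2, 6)

other = np.zeros((2, 2), dtype=int)   # 2 columns instead of 3
print(np.concatenate([grid, other], axis=1).shape) # row counts agree → (2, 5)
try:
    np.concatenate([grid, other])     # axis=0 needs matching column counts
except ValueError as err:
    print("ValueError:", err)
```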

For working with arrays of mixed dimensions, it can be clearer to use:
- ``np.vstack`` (vertical stack)
- ``np.hstack`` (horizontal stack)

In [48]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [49]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
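For these two cases the stacking functions can be related back to ``np.concatenate``: ``np.vstack`` first treats the one-dimensional ``x`` as a single row of shape ``(1, 3)``, and ``np.hstack`` on two-dimensional inputs joins along ``axis=1``. This sketch checks both equivalences with the arrays used above.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
y = np.array([[99],
              [99]])

# vstack promotes the 1-D x to a (1, 3) row, then joins along axis 0
v = np.vstack([x, grid])
print(np.array_equal(v, np.concatenate([x[np.newaxis, :], grid], axis=0)))  # → True

# hstack on 2-D inputs joins along axis 1
h = np.hstack([grid, y])
print(np.array_equal(h, np.concatenate([grid, y], axis=1)))  # → True
```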

Similarly, ``np.dstack`` will stack arrays along the third axis.
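As a small sketch with arbitrarily chosen arrays, stacking two $2 \times 2$ arrays with ``np.dstack`` gives a $2 \times 2 \times 2$ result in which each position pairs up the corresponding entries; this matches concatenating along ``axis=2`` after adding a trailing axis to each input.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = a * 10

d = np.dstack([a, b])
print(d.shape)    # → (2, 2, 2)
print(d[0, 0])    # → [ 1 10]: entry (0, 0) of a, then of b

# Same result: give each 2-D input a trailing axis, then join along axis 2
same = np.concatenate([a[:, :, np.newaxis], b[:, :, np.newaxis]], axis=2)
print(np.array_equal(d, same))  # → True
```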

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.


In [18]:
# we can pass a list of indices giving the split points:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


### Notice

- *N* split points lead to *N + 1* subarrays.
- The related functions ``np.hsplit`` and ``np.vsplit`` are similar:
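Split points behave like slice boundaries: the indices ``[3, 5]`` produce the pieces ``x[:3]``, ``x[3:5]``, and ``x[5:]``. This sketch checks that for the data used above; ``points`` and ``bounds`` are illustrative names.

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
points = [3, 5]                  # N = 2 split points

parts = np.split(x, points)
print(len(parts))                # → 3, i.e. N + 1 subarrays

# Equivalent slices between consecutive boundaries
bounds = [0] + points + [len(x)]
slices = [x[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
print(all(np.array_equal(p, s) for p, s in zip(parts, slices)))  # → True
```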

In [51]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [52]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [53]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, ``np.dsplit`` will split arrays along the third axis.
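A brief sketch with an arbitrary $2 \times 2 \times 3$ array: ``np.dsplit`` cuts along the third axis, and ``np.dstack`` reverses the operation.

```python
import numpy as np

cube = np.arange(12).reshape((2, 2, 3))

front, back = np.dsplit(cube, [1])   # split the third axis at index 1
print(front.shape, back.shape)       # → (2, 2, 1) (2, 2, 2)

# Stacking the pieces along the third axis rebuilds the original array
print(np.array_equal(np.dstack([front, back]), cube))  # → True
```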
