### Creating Arrays from Lists

Before we can use the NumPy library we need to import `numpy`.

We could import it this way:

In [1]:
import numpy

This means that we will have to use the `numpy` prefix in our code everywhere we need to use this library.

To save on typing, it is common practice to alias the library - `np` is basically a standard everyone uses:

In [2]:
import numpy as np

Creating NumPy arrays from Python lists is very simple - the `array` function in NumPy can take a list (or a tuple) and create a NumPy array out of it:

In [3]:
a1 = np.array([1, 2, 3, 4])
a2 = np.array((0.1, 0.2, 0.3, 0.4))

In [4]:
a1

array([1, 2, 3, 4])

In [5]:
a2

array([0.1, 0.2, 0.3, 0.4])

The type of these array objects is `ndarray` (n-dimensional array):

In [6]:
type(a1)

numpy.ndarray

One of the properties of an `ndarray` is its data type (which we know is the data type of every element in the array).

We did not specify a data type for the elements of the arrays we just created - NumPy picked some default.

We can see that data type by using the `dtype` property:

In [7]:
a1.dtype

dtype('int64')

In [8]:
a2.dtype

dtype('float64')

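As you can see, NumPy picked `int64` and `float64` for our two arrays. (The default integer type can depend on the platform; for example, NumPy versions before 2.0 default to the 32-bit `int32` on Windows.)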
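Because the exact default integer width can vary, code that only cares about the *kind* of element type can test it with `np.issubdtype` instead of comparing against `int64` directly. A minimal sketch, recreating the two arrays from above:

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])
a2 = np.array((0.1, 0.2, 0.3, 0.4))

# Check the general kind of element type, whatever its exact bit width
print(np.issubdtype(a1.dtype, np.integer))   # True
print(np.issubdtype(a2.dtype, np.floating))  # True
print(np.issubdtype(a1.dtype, np.floating))  # False
```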

We can actually specify the data type of the elements if we want to - maybe in cases where we know we don't need a full 64-bit integer, or a 64-bit float.

We have to use the NumPy data types (which remember are basically the underlying C data types):

- `np.int8` / `np.uint8`
- `np.int16` / `np.uint16`
- `np.int32` / `np.uint32`
- `np.float32`
- `np.float64`
- etc.
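To see the range of values each of these types can hold, NumPy provides `np.iinfo` for integer types and `np.finfo` for floating-point types. A short sketch that prints the bounds of the integer types listed above and the bit widths of the float types:

```python
import numpy as np

# np.iinfo reports the smallest and largest value an integer type can hold
for t in (np.int8, np.uint8, np.int16, np.uint16, np.int32, np.uint32):
    info = np.iinfo(t)
    print(f"{np.dtype(t).name:>7}: [{info.min}, {info.max}]")

# np.finfo reports properties of floating-point types, such as bit width and max value
for t in (np.float32, np.float64):
    info = np.finfo(t)
    print(f"{np.dtype(t).name:>7}: {info.bits} bits, max {info.max}")
```

For example, `uint8` reports `[0, 255]` and `int8` reports `[-128, 127]`.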

In the case of the list `[1, 2, 3, 4]`, we could actually get away with unsigned 8-bit integers:

In [9]:
a = np.array([1, 2, 3, 4], dtype=np.uint8)

In [10]:
a.dtype

dtype('uint8')

And now our element data type will be unsigned 8-bit integers, which means we are bounded by the range `[0, 255]`.

Now we have to be a bit careful here - what happens if we use a number outside of that range when we create the array?

In [11]:
a = np.array([1, 2, 3, 300], dtype=np.uint8)

OverflowError: Python integer 300 out of bounds for uint8

In [16]:
np.array([127], dtype=np.int8)

array([127], dtype=int8)

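So, in recent versions of NumPy, an out-of-range value is rejected with an error when the array is created, while a value at the edge of the range, such as `127` for `int8` (range `[-128, 127]`), is accepted. Older versions wrapped the out-of-range value around instead (so `300` became `44`).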

So be careful and make sure you do not use a type that is too restrictive if you opt to specify the type explicitly.
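The creation-time check only applies to the values we pass in. Arithmetic on an existing fixed-width integer array behaves differently: results that fall outside the range silently wrap around (modulo 256 for 8-bit types). A small sketch, assuming a recent NumPy version:

```python
import numpy as np

small = np.array([250, 255], dtype=np.uint8)
# 250 + 10 = 260 and 255 + 10 = 265 do not fit in uint8, so they wrap modulo 256
wrapped = small + 10
print(wrapped)        # [4 9]
print(wrapped.dtype)  # uint8

signed = np.array([127], dtype=np.int8)
# 127 is the largest int8 value, so adding 1 wraps around to the smallest one
print(signed + 1)     # [-128]
```

No error is raised here, which is exactly why a too-restrictive type can be dangerous.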

So, why not always just let NumPy specify the type for us (which was `int64` and `float64`)?

Storage efficiency - 64 bits of memory per element versus 8 bits. When you have just a few numbers, the difference is trivial, but when you start working with very large datasets, it can add up to a significant amount of memory.
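We can measure this with the `itemsize` attribute (bytes per element) and the `nbytes` attribute (bytes used by all the elements). A sketch comparing one million elements stored as `int64` and as `uint8`, using `np.zeros` to create arrays filled with zeros:

```python
import numpy as np

n = 1_000_000
big = np.zeros(n, dtype=np.int64)    # 8 bytes per element
small = np.zeros(n, dtype=np.uint8)  # 1 byte per element

print(big.itemsize, big.nbytes)      # 8 8000000
print(small.itemsize, small.nbytes)  # 1 1000000
```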

Remember what we saw earlier - NumPy arrays are homogeneous - i.e. all the elements must be of the same type, unlike Python lists.

In [19]:
a = np.array([1, 2, 3.14])

In [20]:
a

array([1.  , 2.  , 3.14])

In [21]:
a.dtype

dtype('float64')

As you can see, NumPy opted for a float type, since one of the numbers in the list was a float.

We can, of course, override this by specifying the type we want, though this may result in some data loss:

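In [22]:
np.array([1, 2, 9.9, 9.1], dtype=np.int64)

array([1, 2, 9, 9])

As you can see, the floats were **truncated**: the fractional part was simply dropped, so `9.9` became `9` rather than being rounded to `10`.

If we mix numbers and strings, NumPy has to find a single type that can hold every element, and here that is a Unicode string type (`<U32`):

In [23]:
a = np.array([1, 3.14, 'x'])

In [24]:
a

array(['1', '3.14', 'x'], dtype='<U32')

If we force an integer type instead, the conversion fails, because `'x'` cannot be interpreted as an integer:

In [25]:
a = np.array([1, 3.14, 'x'], dtype=np.int64)

ValueError: invalid literal for int() with base 10: 'x'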
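Truncation drops the fractional part by moving toward zero, so negative numbers are affected too. If rounding is what we want, we can round first and then convert. A sketch using `np.round` and the `astype` method (which returns a copy of the array converted to another type):

```python
import numpy as np

values = np.array([9.9, 9.1, -9.9])

# Converting directly drops the fractional part (truncation toward zero)
truncated = values.astype(np.int64)
print(truncated.tolist())  # [9, 9, -9]

# Rounding first gives the nearest integer instead
rounded = np.round(values).astype(np.int64)
print(rounded.tolist())    # [10, 9, -10]
```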

#### Multi-Dimensional Arrays

We saw multidimensional lists in Python before - basically lists that contain other lists.

This nesting can occur to any depth, but in this course we'll stick to 2-dimensional arrays (matrices). Higher dimensions work the same way, but are harder to comprehend.

Let's see a list based version first:

In [26]:
m_py = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
]

In [27]:
m_py

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]

This is a 2-dimensional list - the outer list is a list of elements, each of which is a list.

We can think of this in terms of rows and columns - in this case we have three rows, and three columns.

We could create a ragged list (one where the rows do not all contain the same number of elements), but that won't work with NumPy arrays, which require every row to have the same number of columns.

We can transform this Python 2-dimensional list into a 2-dimensional array in the same way as before:

In [28]:
m1 = np.array(m_py, dtype=np.int16)

In [29]:
m1

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int16)

In [30]:
m1.dtype

dtype('int16')

In [31]:
len(m_py)

3

Again, the entire array is homogeneous, so in this case all elements are 16-bit signed integers.
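Homogeneity applies across all rows of a 2-D array: a single float anywhere in the nested list turns every element into a float. A short sketch, recreating `m1` alongside a matrix with one non-integer entry:

```python
import numpy as np

m_py = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
m1 = np.array(m_py, dtype=np.int16)
print(m1.dtype, m1.itemsize)   # int16 2  (2 bytes per element)

# One float in the middle row makes the whole matrix float64
m_mixed = np.array([[1, 0, 0], [0, 1.5, 0], [0, 0, 1]])
print(m_mixed.dtype)           # float64
```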

#### Array Properties

These `ndarray` objects have certain properties - we already saw the `dtype` property.

There is also a property to get the total number of elements in the array:

In [32]:
a = np.array([1, 2, 3])
m2 = np.array(
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12]
    ]
)

In [33]:
a

array([1, 2, 3])

In [34]:
m1

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int16)

In [35]:
m2

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [36]:
a.size

3

In [37]:
m1.size

9

In [38]:
m2.size

12
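For a multi-dimensional array, `size` counts every element, whereas Python's built-in `len` only returns the length of the first dimension (the number of rows in a 2-D array), just as `len(m_py)` counted the inner lists earlier. A sketch using the same `m2`:

```python
import numpy as np

m2 = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
])

print(len(m2))   # 4  (number of rows)
print(m2.size)   # 12 (total number of elements)
```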

When we look at arrays, they have a certain shape:
- `a`: 1 row, 3 columns (we can think of it as a row vector)
- `m1`: 3 rows, 3 columns
- `m2`: 4 rows, 3 columns

This is the information that the `shape` property of an `ndarray` can tell us:

In [39]:
a.shape

(3,)

In [40]:
m1.shape

(3, 3)

In [41]:
m2.shape

(4, 3)

As we can see, `shape` returns a tuple with one entry per dimension of the array, and each entry gives the size of that dimension.

`a` is a 1-D array, so its shape tuple contains a single number (the number of elements in that one dimension), whereas `m1` and `m2` are 2-D arrays, so their shape tuples contain two numbers.

In 2-D arrays, we consider the first dimension to be rows and the second to be columns, so for a 2-D array the shape tuple is essentially `(# rows, # columns)`.
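The shape tuple also ties the other properties together: its length is the number of dimensions (available directly as the `ndim` property), and the product of its entries is `size`. Since it is an ordinary tuple, we can also unpack it into named variables. A sketch, using `math.prod` from the standard library (Python 3.8+):

```python
import math
import numpy as np

m2 = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
])

rows, cols = m2.shape                   # unpack (# rows, # columns)
print(rows, cols)                       # 4 3
print(m2.ndim == len(m2.shape))         # True: one shape entry per dimension
print(m2.size == math.prod(m2.shape))   # True: 4 * 3 == 12

a = np.array([1, 2, 3])
print(a.ndim, a.shape)                  # 1 (3,)
```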