# 1. Working with NumPy arrays

Before creating our first array, we need to import NumPy.
By convention, NumPy is imported under the alias `np`, which is what the line below does.


In [1]:
import numpy as np

## Creating a NumPy array
NumPy provides several options to create arrays.
You can create an array full of zeros or full of ones.

In [2]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [3]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

If you don't specify the dtype explicitly, NumPy will use the default type `float64`.

In [4]:
np.ones((2, 3)).dtype

dtype('float64')

You can also create an array full of zeros or ones with a specific type.

In [5]:
np.zeros((1, 5), dtype=int)

array([[0, 0, 0, 0, 0]])

In [6]:
np.ones(3, dtype=np.uint8)

array([1, 1, 1], dtype=uint8)

In [7]:
# create an array full of ones of size 4 x 3


NumPy also provides a function to create an array with a range of values. The first value is included (and is 0 if omitted) and the last value is excluded.

In [8]:
np.arange(0, 5)

array([0, 1, 2, 3, 4])

In [9]:
np.arange(5)

array([0, 1, 2, 3, 4])
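
As a small complement to the two calls above, `np.arange` also accepts an optional third argument, the step between consecutive values. The sketch below is illustrative only; the variable name `evens` is not part of the original notebook.

```python
import numpy as np

# Optional third argument: the step between consecutive values.
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# The stop value is still excluded, even when the sequence would land on it.
print(np.arange(1, 10, 3))  # [1 4 7]
```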

In [10]:
# create an array with values from 2 to 10


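You can also create an array of random values, which is very useful for many machine learning algorithms. `np.random.rand` draws floats uniformly from [0, 1), and `np.random.randint` draws integers with the upper bound excluded.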

In [11]:
np.random.rand(3, 2)

array([[0.65036971, 0.52909328],
       [0.87379868, 0.14542104],
       [0.10756469, 0.34163232]])

In [12]:
np.random.randint(0, 10, (2, 3))

array([[0, 6, 9],
       [6, 1, 3]])
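
Random values change on every run, which can make results hard to reproduce. One possible approach, sketched here as an alternative to the calls above rather than what this notebook uses, is NumPy's `Generator` API with a fixed seed. The seed value `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=42)
floats = rng.random((3, 2))               # floats drawn uniformly from [0, 1)
ints = rng.integers(0, 10, size=(2, 3))   # integers from 0 to 9 (10 is excluded)

# A second generator with the same seed reproduces the same floats.
rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random((3, 2))
print(np.array_equal(floats, same_floats))  # True
```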

The last option is to create an empty array: memory of the requested size is allocated, but the values are not initialized, so the array contains whatever was already in that memory.
Remember that it is not easy to change the size of a NumPy array after it is created, so you should know in advance the size of the data you will be manipulating.
Most of the time, though, initializing your array with zeros works fine when you don't know the values yet.

In [13]:
np.empty((2, 3), dtype=int)

array([[4604033229077644034, 4602940868183708537, 4606045698389120462],
       [4594407365255368136, 4592415273141817280, 4599825920658358526]])
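
The arbitrary numbers above are why an empty array should be read only after every element has been written. A minimal sketch of that pattern follows; the name `result` and the values assigned are made up for illustration.

```python
import numpy as np

# np.empty skips initialization, so its initial contents are arbitrary.
result = np.empty((2, 3))

# Overwrite every row before reading anything back.
result[0] = np.arange(3)  # first row becomes 0., 1., 2.
result[1] = np.ones(3)    # second row becomes 1., 1., 1.

print(result)
```

If some elements might never be assigned, `np.zeros` is the safer starting point.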

## Creating an array from a list
It is also possible to create a NumPy array from a Python list.
It does not break the rule of not mixing Python with NumPy if you do it only once at the beginning of your program.
All the data will be copied from the Python data structure to the memory allocated with NumPy.

In [14]:
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(type(my_list), my_list)
print(type(my_array), my_array)


<class 'list'> [1, 2, 3, 4, 5]
<class 'numpy.ndarray'> [1 2 3 4 5]


Remember that all the elements of a NumPy array must be of the same type.
If the list mixes types that are compatible, NumPy will convert all elements to the most permissive type (for example, integers will be converted to floats).

In [15]:
integer_list = [0, 1, 2, 3, 4, 5]
integer_array = np.array(integer_list)
print(integer_array)
print(integer_array.dtype)

[0 1 2 3 4 5]
int64


In [16]:
mixed_list = [0.5, 1, 2, 3, 4, 5]
float_array = np.array(mixed_list)
print(float_array)
print(float_array.dtype)

[0.5 1.  2.  3.  4.  5. ]
float64


If one of the elements is a string, all the elements will be converted to strings.
This will limit the operations you can do with the array.

In [17]:
mixed_list2 = [0.5, 1, 2, 3, 4, "5"]
str_array = np.array(mixed_list2)
print(str_array)
print(str_array.dtype)

['0.5' '1' '2' '3' '4' '5']
<U32


It's a good idea to check the `dtype` of your array to make sure it is what you expect.
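
When that check reveals a string dtype but every element is actually a number written as text, one way to recover a numeric array is `astype`. This is a sketch assuming all strings parse as numbers; `astype(float)` raises a `ValueError` otherwise. The name `numeric_array` is illustrative.

```python
import numpy as np

# The string "5" forces every element to be stored as a string.
str_array = np.array([0.5, 1, 2, 3, 4, "5"])

# Convert back to floats so numeric operations are available again.
numeric_array = str_array.astype(float)
print(numeric_array.dtype)  # float64
print(numeric_array.sum())  # 15.5
```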

You can also create a 2D array from a list of lists. In that case, all the nested lists must have the same length, otherwise NumPy will raise an error.

In [18]:
nested_list = [[1, 2, 3], [4, 5, 6]]
array_2d = np.array(nested_list)
print(array_2d)
print(array_2d.dtype)

[[1 2 3]
 [4 5 6]]
int64


In [19]:
nested_list2 = [[1, 2, 3], [4, 5, 6], [7, 8]]
np.array(nested_list2)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

## Loading a file

You can load some data from disk thanks to various functions provided by NumPy.

For CSV files:

In [20]:
data = np.loadtxt(fname="../data/simple_array.csv", delimiter=",")

In [21]:
print(data)

[[ 0.5  1.5  2.5  3.5  4.5]
 [ 1.   2.   3.   4.   5. ]
 [11.  12.  13.  14.  15. ]
 [21.  22.  23.  24.  25. ]
 [31.  32.  33.  34.  35. ]
 [41.  42.  43.  44.  45. ]
 [51.  52.  53.  54.  55. ]
 [61.  62.  63.  64.  65. ]
 [71.  72.  73.  74.  75. ]
 [81.  82.  83.  84.  85. ]]


Here we pass two arguments to `np.loadtxt`: the path to the file you want to load, and the character that delimits values on the same line ("," for a CSV file). In this call, both arguments are strings.
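
`np.loadtxt` accepts further optional parameters, such as `skiprows` to ignore header lines and `dtype` to choose the element type. The sketch below uses an in-memory `io.StringIO` object in place of a file so that it runs on its own; the CSV content and variable names are invented for illustration.

```python
import io

import numpy as np

# A small CSV with a header line, held in memory instead of on disk.
csv_text = "x,y,z\n1,2,3\n4,5,6\n"

# skiprows=1 ignores the header; values are parsed as float64 by default.
values = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(values.shape)  # (2, 3)
print(values.dtype)  # float64

# dtype=int parses the same values as integers instead.
int_values = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=int)
print(int_values)
```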

We can check the `dtype` of the data we loaded, as well as the shape (the dimensions) of the array.

In [22]:
print(data.dtype)

float64


In [23]:
print(data.shape)

(10, 5)


CSV format is great because it's human-readable, but sometimes you care more about having something efficient.

For example, you might want to save an intermediate result and load it again later when you come back to the project. NumPy provides its own binary file format to store arrays efficiently on disk.

In [24]:
np.save("../data/simple_array.npy", data)

You can load the data again with `np.load`, passing the path to the `.npy` file as a parameter.

In [25]:
data_reloaded = np.load("../data/simple_array.npy")

Let's check that the data is still the same after being reloaded from disk.

In [26]:
print(data_reloaded)

[[ 0.5  1.5  2.5  3.5  4.5]
 [ 1.   2.   3.   4.   5. ]
 [11.  12.  13.  14.  15. ]
 [21.  22.  23.  24.  25. ]
 [31.  32.  33.  34.  35. ]
 [41.  42.  43.  44.  45. ]
 [51.  52.  53.  54.  55. ]
 [61.  62.  63.  64.  65. ]
 [71.  72.  73.  74.  75. ]
 [81.  82.  83.  84.  85. ]]


In [27]:
print(data_reloaded.dtype)

float64


In [28]:
print(data_reloaded.shape)

(10, 5)
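
Comparing printed output by eye does not scale to large arrays. A programmatic check is possible with `np.array_equal`, sketched here on a small stand-in array written to a temporary directory; the file name `roundtrip.npy` and the variable names are hypothetical.

```python
import os
import tempfile

import numpy as np

# Stand-in data for the round trip.
original = np.arange(10, dtype=float).reshape(5, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.npy")
    np.save(path, original)
    reloaded = np.load(path)

# True only if both shape and values match.
same = np.array_equal(original, reloaded)
print(same)  # True
```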


NumPy also provides options to save several arrays in the same file, and use compression if you are handling very large data, but we won't need that today.
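
For reference only, here is a minimal sketch of what that could look like with `np.savez_compressed`, which stores several named arrays in one compressed `.npz` file. The keyword names `a` and `b`, the file name, and the temporary directory are illustrative choices.

```python
import os
import tempfile

import numpy as np

a = np.zeros((100, 100))
b = np.arange(5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")
    # Each keyword argument becomes a named array inside the archive.
    np.savez_compressed(path, a=a, b=b)

    # np.load returns a dict-like archive; arrays are read by name.
    with np.load(path) as archive:
        names = sorted(archive.files)
        a_loaded = archive["a"]
        b_loaded = archive["b"]

print(names)  # ['a', 'b']
```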