# Video: Creating a NumPy Array from Python Sequences

This video gives examples of creating NumPy arrays from Python sequences.
This is the simplest way to take advantage of NumPy's efficiency in an existing Python program that already has data.
We will look at what data types and shapes NumPy picks based on different inputs, and discuss the implied reasons.


Script:
* Let's make some NumPy arrays now, and ask NumPy to make all the decisions about data types and shapes.
* We will do this by making our own sequences of data in Python, then passing them to NumPy functions to convert into arrays.
* We will start with the function numpy dot array and give it a list of integers.

In [None]:
import numpy as np

In [None]:
my_data_array = np.array([1, 2, 3])
my_data_array

array([1, 2, 3])

Script:
* We called numpy dot array with a list of three integers, and the result looks like an array of three integers.
* Let's double check the data type.

In [None]:
my_data_array.dtype

dtype('int64')

Script:
* And the integer type is confirmed.
* NumPy picked the signed 64 bit integer type.
* Let's check the shape now.

In [None]:
my_data_array.shape

(3,)

Script:
* The shape has one entry, so this array is 1-dimensional, and has length 3 along that dimension.
* So we should be able to index it with indexes 0, 1, and 2 to look at each entry individually.

In [None]:
my_data_array[0]

1

In [None]:
my_data_array[1]

2

In [None]:
my_data_array[2]

3

Script:
* We can also index backwards, like Python lists and tuples allow.

In [None]:
my_data_array[-1]

3

In [None]:
my_data_array[-2]

2

In [None]:
my_data_array[-3]

1

Script:
* Those are all the indexing choices with this small array.
* Let's try some different data now.
* What needs to change to make a floating point array?

In [None]:
my_data_array = np.array([1.0, 2.0, 3.0])
my_data_array

array([1., 2., 3.])

Script:
* I changed the list values to floating point numbers, and printing the array visibly changed to show decimal points.
* Let's check the type now.

In [None]:
my_data_array.dtype

dtype('float64')

Script:
* As expected, the data type changed to floating point.
* For both examples, the data type choice had 64 bits.
* That's probably a safe choice for the NumPy default, since the 64 bit types have at least as much range and precision as the smaller data types.
* The 64 bit versions will just take more memory than the smaller versions.
* NumPy lets us pick smaller, less expressive types ourselves, instead of guessing that we want one and surprising us with a calculation error.
* Let's look at one more example of how NumPy is picking types before moving on.
* This time, I will be tricky and pass mostly integers and hide the floating point number at the end.
* Do you think NumPy will notice and pick float64?
* Or will NumPy be lazy and just go with int64 after just checking the first number?


In [None]:
my_data_array = np.array([1, 2, 3.0])
my_data_array

array([1., 2., 3.])

Script:
* Those look like floats, that is, floating point numbers.

In [None]:
my_data_array.dtype

dtype('float64')
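
The three dtype results so far can be checked side by side. This is a minimal sketch: the helper name `inferred_kind` is made up for illustration, and it reports the dtype's `kind` code (`'i'` for signed integer, `'f'` for floating point) rather than the exact width, on the assumption that the default integer width may differ by platform and NumPy version.

```python
import numpy as np

def inferred_kind(values):
    """Return the kind code of the dtype NumPy infers for a sequence."""
    return np.array(values).dtype.kind

print(inferred_kind([1, 2, 3]))        # i
print(inferred_kind([1.0, 2.0, 3.0]))  # f
print(inferred_kind([1, 2, 3.0]))      # f, one float is enough
```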

Script:
* So NumPy did notice the floating point number at the end and picked float64.
* Let's move on to more interesting examples and try different shapes.
* I will try a list of three entries, each of which is a list of two numbers.
* What shape do you think this will be? 3 by 2? Or 2 by 3?

In [None]:
my_data_array = np.array([[1, 2], [3, 4], [5, 6]])
my_data_array

array([[1, 2],
       [3, 4],
       [5, 6]])

Script:
* Visually, that looks like 3 rows and 2 columns.
* Let's confirm that shape.

In [None]:
my_data_array.shape

(3, 2)

Script:
* Three by two is confirmed.
* One way you can think of this is that you can index the NumPy version the same way as the input sequence.
* Let's look at that input list again.

In [None]:
my_data_list = [[1, 2], [3, 4], [5, 6]]
my_data_list

[[1, 2], [3, 4], [5, 6]]

Script:
* We can index that input list with 0, 1, or 2 to get one of the inner lists.

In [None]:
my_data_list[0]

[1, 2]

In [None]:
my_data_list[1]

[3, 4]

In [None]:
my_data_list[2]

[5, 6]

Script:
* We can do the same with the NumPy array.
* NumPy has this neat feature where you can index one axis at a time, and it will select that part of the array.
* For a 2-dimensional array, indexing with just one number will select a 1-dimensional sub array.
* We will talk about how that works next week.

In [None]:
my_data_array[0]

array([1, 2])

In [None]:
my_data_array[1]

array([3, 4])

In [None]:
my_data_array[2]

array([5, 6])
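
To spot check that the array and the input list index the same way, a small sketch can compare them row by row. The variable names match the cells above; `rows_match` is a name introduced here only for illustration.

```python
import numpy as np

my_data_list = [[1, 2], [3, 4], [5, 6]]
my_data_array = np.array(my_data_list)

# Each row of the array should hold the same values as the matching inner list.
rows_match = all(
    np.array_equal(my_data_array[i], my_data_list[i])
    for i in range(len(my_data_list))
)
print(rows_match)             # True

# A single index on the 2-dimensional array gives a 1-dimensional result.
print(my_data_array[0].ndim)  # 1
```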

Script:
* So that sure looks like the first axis should have length 3.
* Returning to the question of how NumPy picks the shape from a sequence, the outside length determines the length of the first axis, the next layer inside determines the length of the second axis, and so on.
* What if we do something weird and the inside lengths do not match?


In [None]:
np.array([[1, 2], [3]])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
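
The shape rule can be sketched by walking down the first element of each nesting level and collecting lengths. The helper `nested_lengths` is hypothetical, written only for this illustration. It is not how NumPy implements shape discovery, and it assumes the input is a regular nested list. The last part assumes a recent NumPy that raises the ValueError shown above; older releases handled ragged input differently.

```python
import numpy as np

def nested_lengths(seq):
    """Collect the length at each nesting level, following the first element."""
    lengths = []
    while isinstance(seq, (list, tuple)):
        lengths.append(len(seq))
        if len(seq) == 0:
            break
        seq = seq[0]
    return tuple(lengths)

data = [[1, 2], [3, 4], [5, 6]]
print(nested_lengths(data))    # (3, 2)
print(np.array(data).shape)    # (3, 2)

# Ragged input: the inner lengths disagree, so no rectangular shape fits.
try:
    np.array([[1, 2], [3]])
except ValueError as err:
    print("rejected:", type(err).__name__)
```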

Script:
* See NumPy chastising me?
* Wrapping up, numpy dot array will take in a sequence and try to turn it into a new NumPy array.
* If all the values are integers, it will have data type int64.
* If any floating point values are found, it will have data type float64.
* The next function to convert Python data is numpy dot asarray.
* I've seen a lot more code using this function than numpy dot array.
* What does it do differently?

In [None]:
my_data_array = np.asarray([[1, 2], [3, 4], [5, 6]])
my_data_array

array([[1, 2],
       [3, 4],
       [5, 6]])

Script:
* That looks the same.
* There is a subtle hint in the name.
* numpy dot array and numpy dot asarray behave differently when you pass in a NumPy array.
* numpy dot array's default behavior is to always copy the data and make a fresh array.
* numpy dot asarray's default behavior is to just return the input array, because you already have the data as an array.
* Sorry, it's really the name, not my joke.
* We can spot check these behaviors with the id function to see if it is the same object or a copy that looks the same.

In [None]:
id(my_data_array)

136263992969712

In [None]:
my_data_array_2 = np.array(my_data_array)
id(my_data_array_2)

136263992878032

Script:
* The id changed when we called numpy dot array.
* There is an option to avoid copying the data when possible, but the default is to copy.

In [None]:
my_data_array_3 = np.asarray(my_data_array)
id(my_data_array_3)

136263992969712
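
Comparing id values works, but the `is` operator and `np.shares_memory` make the same check a little more directly. This is a sketch with a small 1-dimensional array; the names `original`, `copied`, and `same` are introduced here for illustration. The last lines show why the difference matters once you write to the result.

```python
import numpy as np

original = np.asarray([1, 2, 3])

copied = np.array(original)    # default: a fresh array with its own data
same = np.asarray(original)    # input is already an array: returned as is

print(copied is original)                  # False
print(same is original)                    # True
print(np.shares_memory(copied, original))  # False

# Writing through the asarray result changes the original; the copy is unaffected.
same[0] = 100
print(original[0], copied[0])              # 100 1
```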

Script:
* When we passed the original NumPy array to numpy dot asarray, we got the same array back.
* Both numpy dot array and numpy dot asarray will act up if you pass an iterator without a length.

In [None]:
np.array([1.2 for i in range(10)])

array([1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2])

Script:
* List comprehensions work, because they produce lists, and lists have lengths.

In [None]:
np.array(range(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Script:
* Range works too, because the range object supports len.

In [None]:
np.array(i for i in range(10))

array(<generator object <genexpr> at 0x7bee6eeea3b0>, dtype=object)

Script:
* If you pass a generator object by accident, you will get a zero-dimensional array with data type object, holding just the generator itself.
* That is almost certainly not what you wanted.
* If you need to pass a generator without a length, there is another function for this, numpy dot fromiter.
* You will need to pass it the dtype too; this function will not try to guess the type.


In [None]:
np.fromiter((i for i in range(10)), dtype=np.dtype("float64"))

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

Script:
* There is also an optional `count` argument to fromiter.
* This will limit how many values are read from the iterator, and the documentation says that it will make fromiter faster, since it can allocate memory once instead of repeatedly resizing as more values are read.
* So which one of these three functions should you use?
* I have seen numpy dot asarray used the most.
* Usually this is inside a function that other people call, and you want to support all the different kinds of array-like input.
* Most of NumPy's functions work like this; they just convert everything into arrays as they are passed in.
* Using numpy dot asarray on these inputs, NumPy arrays are unchanged and everything else is converted.
* Skipping the conversion for NumPy arrays is nice.
* What is not nice is writing all over the data of someone who nicely passed you an array.
* So, if you are going to write to the array, you probably should use numpy dot array and make your own copy to write to.
* What about fromiter?
* Save fromiter for cases where you are stuck with just a generator.
* And double check if you can get the count to avoid resizing the array as you build it.
* TLDR: use numpy dot array if you will write to the data, numpy dot asarray otherwise, and numpy dot fromiter only if you are stuck with a generator.
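
The advice above can be sketched as three small functions. The function names (`total`, `zero_first`, `first_n_squares`) are hypothetical and exist only for this illustration. The sketch assumes 1-dimensional numeric input.

```python
import numpy as np

def total(values):
    """Read-only use: asarray accepts lists, tuples, or arrays, and does not copy arrays."""
    arr = np.asarray(values)
    return arr.sum()

def zero_first(values):
    """Writes to the data, so take a private copy with np.array first."""
    arr = np.array(values)
    arr[0] = 0
    return arr

def first_n_squares(n):
    """Only a generator is available: use fromiter with an explicit dtype and count."""
    squares = (i * i for i in range(n))
    return np.fromiter(squares, dtype=np.int64, count=n)

caller_data = np.array([1, 2, 3])
print(total(caller_data))       # 6
print(zero_first(caller_data))  # [0 2 3]
print(caller_data)              # [1 2 3], the caller's array is untouched
print(first_n_squares(4))       # [0 1 4 9]
```

Passing `count` here is possible only because the number of values is known ahead of time; when it is not, leaving `count` out still works.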
