# Introduction to Numpy

Numpy is the most basic and one of the most powerful packages for working with data in python.

If you are going to work on data analysis or machine learning projects, then having a solid understanding of numpy is nearly mandatory.

This is because other packages for data analysis (like pandas) are built on top of numpy, and the scikit-learn package, which is used to build machine learning applications, works heavily with numpy as well.

So what does numpy provide?

At the core, numpy provides the excellent ndarray objects, short for n-dimensional arrays.

In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make numpy so convenient for performing math and data manipulations.

You might wonder, ‘I can store numbers and other objects in a python list itself and do all sorts of computations and manipulations through list comprehensions, for-loops etc. What do I need a numpy array for?’

Well, there are very significant advantages of using numpy arrays over lists.

To understand this, let’s first see how to create a numpy array.

### How to create a Numpy Array?

There are multiple ways to create a numpy array, most of which will be covered as you read this. However, one of the most common ways is to create one from a list or a list-like object by passing it to the np.array function.

In [2]:
# Importing Numpy 
import numpy as np

In [7]:
# Create a 1d array from a list
list_1 = [1,2,3,4,5]
list_to_array = np.array(list_1)
print(list_to_array)
print(type(list_to_array))

[1 2 3 4 5]
<class 'numpy.ndarray'>

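The "list-like object" mentioned above can be sketched with a tuple and a range. This is a minimal illustration; the names from_tuple and from_range are made up for this example.

```python
import numpy as np

# Any sequence-like object can be passed to np.array, not only a list
from_tuple = np.array((1, 2, 3))
from_range = np.array(range(5))

print(from_tuple)
print(type(from_tuple))
print(from_range)
```

In both cases the result is an ordinary ndarray, just as when a list is passed.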

The key difference between an array and a list is that arrays are designed to handle vectorized operations while a python list is not.

That means, if you apply a function, it is performed on every item in the array, rather than on the whole array object.

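Let’s suppose you want to add the number 2 to every item in the list. The intuitive way is to write list_1 + 2, but that raises a TypeError on a python list. On the array, the same expression works:

In [11]:
# list_1 + 2  # Error: a list and an int cannot be added
list_to_array + 2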
list_to_array + 2


array([3, 4, 5, 6, 7])
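To make the contrast concrete, here is a small sketch that puts the list and the array side by side. The names list_plus_2 and array_plus_2 are introduced only for this illustration.

```python
import numpy as np

list_1 = [1, 2, 3, 4, 5]
list_to_array = np.array(list_1)

# A plain list does not support element-wise arithmetic with a number
try:
    list_1 + 2
except TypeError as err:
    print("list + 2 failed:", err)

# With a list, each item has to be handled explicitly
list_plus_2 = [item + 2 for item in list_1]

# With an array, the operation is applied to every item at once
array_plus_2 = list_to_array + 2

print(list_plus_2)
print(array_plus_2)
```

Both approaches give the same numbers, but the array version needs no explicit loop.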

Another characteristic is that, once a numpy array is created, you cannot increase its size. To do so, you will have to create a new array. Extending the size in place, by contrast, is natural for a list.
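One way to see this is with np.append, which builds a new array rather than growing the existing one. The snippet below is only a sketch; arr, arr_extended and items are names chosen for the example.

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append does not grow arr in place; it returns a new, larger array
arr_extended = np.append(arr, [4, 5])

print(arr.shape)           # the original array keeps its size
print(arr_extended.shape)  # the new array holds the extra items

# A list, by contrast, grows in place
items = [1, 2, 3]
items.append(4)
print(items)
```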

Nevertheless, there are so many more advantages. Let’s find out.

So, that’s about the 1d array. You can also pass a list of lists to create a matrix-like 2d array.

In [14]:
# Create a 2d array from a list of lists
list_2 = [[0,1,2],[3,4,5],[6,7,8]]
list_2_to_2d_array = np.array(list_2)
print(list_2_to_2d_array)
type(list_2_to_2d_array)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


numpy.ndarray
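As a quick check that a list of lists really produces a 2d array, you can look at its ndim and shape attributes. This is a small sketch, with arr2d as a name chosen for the example.

```python
import numpy as np

list_2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list_2)

print(arr2d.ndim)   # number of dimensions
print(arr2d.shape)  # (rows, columns)
print(arr2d[1, 2])  # item in row 1, column 2 (counting from 0)
```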

You may also specify the datatype by setting the dtype argument. Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str' and 'object'.

To control the memory allocations you may choose to use one of ‘float32’, ‘float64’, ‘int8’, ‘int16’ or ‘int32’.
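The effect of the dtype on memory can be sketched by comparing the itemsize (bytes per item) and nbytes (total bytes of data) attributes. The array names below are made up for this illustration.

```python
import numpy as np

list_2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# The same nine values stored with three different dtypes
arr_f64 = np.array(list_2, dtype='float64')
arr_f32 = np.array(list_2, dtype='float32')
arr_i8 = np.array(list_2, dtype='int8')

for arr in (arr_f64, arr_f32, arr_i8):
    # itemsize is bytes per item, nbytes is bytes for all items
    print(arr.dtype, arr.itemsize, arr.nbytes)
```

A smaller dtype saves memory, but it also narrows the range of values the array can hold, so the choice depends on your data.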

In [15]:
# Create a 2D float array 

arr2d_f = np.array(list_2, dtype='float')
arr2d_f

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

The decimal point after each number is indicative of the float datatype. You can also convert it to a different datatype using the *astype* method.

In [18]:
# astype returns a new array; arr2d_f itself is left unchanged
arr2d_i = arr2d_f.astype('int')
print(arr2d_i)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [19]:
# Convert to int then to str datatype
arr2d_f.astype('int').astype('str')

array([['0', '1', '2'],
       ['3', '4', '5'],
       ['6', '7', '8']], dtype='<U11')
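Two details of astype are worth seeing in a small sketch: it returns a converted copy instead of changing the array in place, and converting floats to ints drops the fractional part rather than rounding. The names arr_f and arr_i are chosen only for this example.

```python
import numpy as np

arr_f = np.array([1.7, -1.7, 2.0])

# astype returns a converted copy; arr_f keeps its float dtype
arr_i = arr_f.astype('int')

print(arr_i)        # fractional parts are dropped, not rounded
print(arr_f.dtype)  # the original array is still float
```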

Unlike a list, a numpy array must have all its items of the same data type. This is another significant difference.

However, if you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the dtype as 'object'.

In [21]:
# Create a boolean array
arr2d_b = np.array([1, 0, 10], dtype='bool')
arr2d_b


array([ True, False,  True])

In [22]:
# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')
arr1d_obj


array([1, 'a'], dtype=object)
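To see why the 'object' dtype matters here, the sketch below shows what happens to mixed inputs when no dtype is given: numpy converts every item to one common type. The variable names are made up for this illustration.

```python
import numpy as np

# Integers mixed with a float: every item is upcast to float
mixed_num = np.array([1, 2, 3.5])
print(mixed_num.dtype)

# A number mixed with a string: every item becomes a string
mixed_str = np.array([1, 'a'])
print(mixed_str.dtype.kind)  # 'U' stands for a unicode string dtype
print(mixed_str)

# dtype='object' keeps the original python objects as they are
mixed_obj = np.array([1, 'a'], dtype='object')
print(type(mixed_obj[0]), type(mixed_obj[1]))
```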

In [24]:
# Convert an array back to a list (tolist returns a new list; the array is unchanged)
arr1d_as_list = arr1d_obj.tolist()
type(arr1d_as_list)

list
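The tolist method also works on 2d arrays, where it gives back a list of lists. Here is a minimal sketch, with arr2d and nested as names chosen for the example.

```python
import numpy as np

arr2d = np.array([[0, 1], [2, 3]])

# tolist returns nested python lists; the array itself is not modified
nested = arr2d.tolist()

print(nested)
print(type(nested), type(nested[0]))
print(type(arr2d))
```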