*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

## Introduction to NumPy and `ndarray`

This subtopic outlines techniques for effectively loading, storing, and manipulating in-memory data in Python.
Datasets can come from a wide range of sources and in a wide range of formats, such as collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.
Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example, digital images can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.
Sound clips can be thought of as one-dimensional arrays of intensity versus time.
Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words.

For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science, machine learning or scientific computing.

NumPy (short for *Numerical Python*) provides such an efficient interface to store and operate on numerical arrays.
In some ways, NumPy arrays are like Python's built-in ``list`` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
NumPy arrays form the core of many tools and modules in Python, so time spent learning to use NumPy effectively will be valuable no matter what programming tasks you attempt in the future.

If you're running Jupyter on your own Anaconda install, check that NumPy is installed before working through this notebook and the following one. If it is not, follow the instructions given in previous topics to install NumPy via the Anaconda Navigator. You can check that NumPy is installed by running the following code:

In [None]:
import numpy
numpy.__version__

By convention, you'll find that most people import NumPy using ``np`` as an alias. We've not discussed this previously, but you can use an alias for any module or package that you import using the `as` keyword. This is often useful if a package has a long name that you don't want to type out repeatedly!

In [None]:
import numpy as np

Throughout this topic, this is the way we will import and use NumPy. First, let us look back at something we saw in notebook 1.1.

## A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C.
This means that every Python object is simply a cleverly disguised C structure, which contains not only its value but other information as well. We saw back in topic 1 that an integer in Python is actually a pointer; it is in fact a pointer to this C structure. We don't want to go too far into the details here, but the diagram below shows the difference between a Python integer and an integer stored in C.

![Integer Memory Layout](figures/cint_vs_pyint.png)

Notice the difference here: a C integer is essentially a label for a position in memory whose bytes encode an integer value.
A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value.
This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically.
All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.
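
To make this overhead concrete, here is a minimal sketch using the standard library's `sys.getsizeof`, which reports the size in bytes of a whole Python object. The variable names are only illustrative, and the exact sizes depend on the Python build, so none are quoted here.

```python
import sys

# sys.getsizeof measures the full Python object, not just the bytes
# that encode the integer value.
small_int_size = sys.getsizeof(1)

# Python integers grow as needed, so a very large value takes more space.
big_int_size = sys.getsizeof(2**100)

print("size of 1 as a Python object:", small_int_size, "bytes")
print("size of 2**100 as a Python object:", big_int_size, "bytes")
```

For comparison, a 64-bit C integer occupies exactly 8 bytes, so the Python object for even a small integer is noticeably larger.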

## A Python List Is More Than Just a List

Let's consider now what happens when we use a Python data structure that holds many Python objects.
The standard mutable multi-element container in Python is the list.
We can create a list of integers as follows:

In [None]:
L = list(range(10))
L

Or we can create a list that contains elements of different datatypes.

In [None]:
L1 = [True, "2", 3.0, 4]
[type(item) for item in L1]

This flexibility comes at a cost: to allow these flexible types, each item in the list must carry its own type information, so each item must be a complete Python object.
In the special case that all elements are of the same type, much of this information is redundant: it can be much more efficient to store the data in a fixed-type array.
The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

![Array Memory Layout](figures/array_vs_list.png)

At the implementation level, the array essentially contains a single pointer to one contiguous block of data.
The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier.
Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type.
Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
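
The trade-off described above can be sketched roughly as follows. The list total is only an estimate (it adds the list's pointer block to the size of each integer object, and CPython shares some small integers), and the names `values` and `arr` are chosen just for this illustration.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# The array keeps one 8-byte value per element in a single contiguous buffer.
array_data_bytes = arr.nbytes

# The list keeps a block of pointers, and each integer is a separate object.
list_total_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print("array data:", array_data_bytes, "bytes")
print("list (rough estimate):", list_total_bytes, "bytes")

# The price of the fixed type: a float assigned into an integer array
# is converted to an integer, dropping the fractional part.
arr[0] = 3.9
print(arr[0], arr.dtype)
```

A list would simply have stored `3.9` as a new float object, which is the flexibility the fixed-type array gives up.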

## Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers.
The built-in ``array`` module (part of the Python standard library) can be used to create dense arrays of a uniform type:

In [None]:
import array
L = list(range(10))
A = array.array('i', L)
A

Here ``'i'`` is a type code indicating the contents are (signed) integers.

Much more useful, however, is the ``ndarray`` object of the NumPy package.
While Python's ``array`` object provides efficient storage of array-based data, NumPy adds to this efficient *operations* on that data.
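
One way to see the difference between efficient storage and efficient operations is to multiply each container by 2. This is a small sketch; the names `A`, `N`, `repeated`, and `doubled` are illustrative only.

```python
import array
import numpy as np

A = array.array('i', range(5))   # fixed-type storage from the standard library
N = np.array(A)                  # the same values as a NumPy ndarray

# For array.array, * means sequence repetition, just as it does for a list.
repeated = A * 2

# For an ndarray, * is applied element by element.
doubled = N * 2

print(list(repeated))
print(doubled.tolist())
```

With ``array.array``, doubling every value would need an explicit Python loop; the ``ndarray`` does it in a single expression.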

## NumPy Standard Data Types

NumPy arrays must contain values of a single type, so it is important to have some knowledge of those types and their limitations.
Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

When constructing an array, its data type can be specified using a string:

```python
np.zeros(10, dtype='int16')
```

Or using the associated NumPy object:

```python
np.zeros(10, dtype=np.int16)
```
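
As a quick check that the two spellings are interchangeable, the following sketch builds both arrays and compares their `dtype` attributes (the names `a`, `b`, and `same_dtype` are illustrative):

```python
import numpy as np

a = np.zeros(10, dtype='int16')    # type given as a string
b = np.zeros(10, dtype=np.int16)   # type given as a NumPy object

same_dtype = (a.dtype == b.dtype)
print(same_dtype)

# Each int16 element occupies 2 bytes, so 10 elements occupy 20 bytes.
print(a.itemsize, a.nbytes)
```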

The table below identifies some (though definitely not all) of the types available to you when constructing an `ndarray`.

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (this alias was removed in NumPy 2.0; use ``float64`` there).| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
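
The limits in this table do not need to be memorized. As a sketch, NumPy's `np.iinfo` and `np.finfo` helpers report them for integer and floating-point types respectively; the loops below simply print a few of the rows.

```python
import numpy as np

# Integer types: smallest and largest representable values.
for int_type in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Floating-point types: total bits and number of mantissa bits.
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.nmant)
```

Checking these limits before choosing a smaller type is a simple way to avoid values that do not fit.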


In the next notebook you will learn how to use this `ndarray` object, first looking at how to create a NumPy array and then taking a brief look at some of the vast number of functions offered to us by NumPy.