# <center> <div style="width: 370px;"> ![numpy title](pictures/numpy_tytle.jpg)

# <center> Optimizing Storage: NumPy Data Types

Now that you've gained some practical experience, it's time to revisit the theoretical aspect and delve into the world of data types. While data types may not be at the forefront of many Python code projects, they play a crucial role in how data is manipulated. In Python, numbers behave as expected, strings serve their purpose, Booleans determine truth or falsehood, and beyond that, you have the freedom to create your own objects and collections.

However, when it comes to NumPy, the story becomes more intricate. NumPy uses C code under the hood to optimize performance, and this optimization hinges on a critical requirement: all elements within an array must share the same data type. It is not enough for them to be the same Python type; they must have the same underlying C type, with the same size in bits and the same byte layout.

Python, by default, defines only one type for each data class, such as a single integer type and a single floating-point type. This simplicity can be advantageous in applications where you don't need to concern yourself with the various ways data can be represented within a computer. However, in the realm of scientific computing, greater control is often necessary.
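
As a small illustration of why this control matters for storage, the sketch below stores the same values at two integer widths and shows how mixed inputs are coerced to one common type. The variable names are chosen only for this example.

```python
import numpy as np

values = [1, 2, 3, 4]

# Same values, stored with two different integer widths.
small = np.array(values, dtype=np.int8)
large = np.array(values, dtype=np.int64)

print(small.itemsize, small.nbytes)  # bytes per element, total bytes
print(large.itemsize, large.nbytes)

# Mixing an int and a float yields a single common dtype for the whole array.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```

The 8-bit array needs one eighth of the memory of the 64-bit one, at the cost of a much smaller range of representable values.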

In NumPy, you'll encounter 24 new fundamental Python types designed to describe various scalar data types. These type descriptors are mostly based on the types available in the C language, in which CPython, the reference Python interpreter, is written. Additionally, NumPy introduces several extra types that align with Python's native types, giving you precise control over how data is represented in scientific computing.

## Data Type Objects (`dtype`)

In NumPy, a data type object describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted. These objects, instances of the `numpy.dtype` class, describe the following aspects of the data:

1. **Data Type**: They define the fundamental type of the data, whether it's an integer, a floating-point number, a Python object, or something else entirely.

2. **Data Size**: Data type objects specify the size in bytes occupied by the data. For instance, they reveal how many bytes constitute an integer or any other data type.

3. **Byte Order**: These objects define the byte order of the data, indicating whether it's little-endian or big-endian.

4. **Structured Data Types**: In the case of structured data types – an amalgamation of different data types that describe an array item, such as an integer combined with a float – data type objects unravel several crucial details:
   
   - Names of Fields: They delineate the names assigned to each "field" within the structure, providing a means to access them.
   
   - Data Type of Each Field: Data type objects unveil the specific data type associated with each field.

   - Memory Allocation: They specify the portion of the memory block allocated to accommodate each field of the structured data type.

5. **Sub-Arrays**: When dealing with sub-arrays, data type objects inform us about the shape and data type of these sub-arrays, offering essential insights into the hierarchical structure of data.
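
The properties listed above can be read directly from a `dtype` object. The following is a minimal sketch; the `record` layout and its field names are hypothetical and serve only to show the attributes.

```python
import numpy as np

# Hypothetical record layout, used only for illustration.
record = np.dtype([('id', '>i4'), ('weight', '<f8'), ('coords', '<f4', (3,))])

print(record.itemsize)   # total bytes per item (fields are packed by default)
print(record.names)      # field names, in order

# Each entry in .fields holds the field's dtype and its byte offset.
for name in record.names:
    field_dtype, offset = record.fields[name][:2]
    print(name, field_dtype, offset)

# Kind character of a single field.
print(record['id'].kind)

# Shape and element type of the sub-array field.
print(record['coords'].shape, record['coords'].base)
```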

In NumPy, the characterization of scalar data types is facilitated by a diverse range of built-in scalar types, catering to different levels of precision for integers, floating-point numbers, and more. When you extract an item from an array, such as through indexing, you retrieve a Python object. This Python object's type aligns seamlessly with the scalar type linked to the array's data type, ensuring a coherent and accurate representation of the data.

> **Important Note:** While scalar types are versatile and can serve as replacements for dtype objects in various NumPy scenarios requiring data type specifications, it's essential to recognize that scalar types themselves are distinct from dtype objects.
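
A short sketch of that distinction, assuming a small `int16` array created just for this example:

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int16)
item = arr[0]

# Indexing returns an array scalar whose type matches the dtype's scalar type.
print(type(item))
print(type(item) is arr.dtype.type)

# A scalar type is a class; a dtype is an instance of np.dtype.
print(isinstance(np.int16, np.dtype))
print(isinstance(arr.dtype, np.dtype))

# The scalar type is still accepted wherever a dtype is expected.
print(np.dtype(np.int16) == arr.dtype)
```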

Structured data types in NumPy are crafted by defining a data type that encompasses fields composed of other data types. Each field within this structure is given a name, providing a convenient means of access. The parent data type, within which these fields reside, must possess adequate size to accommodate all its constituent fields. Typically, the parent data type is based on the versatile void type, capable of accommodating items of arbitrary sizes. It's worth noting that structured data types can also include nested sub-array data types within their fields.

Additionally, a data type can delineate items that are themselves arrays composed of elements of another data type. However, these sub-arrays must adhere to a fixed size.

When you create an array using a data type that describes a sub-array, the dimensions of the sub-array are appended to the shape of the resulting array. Sub-arrays inside the fields of a structured type behave differently, as described in the [NumPy documentation on field access](https://numpy.org/doc/stable/reference/arrays.indexing.html#arrays-indexing-fields).

Furthermore, it's noteworthy that sub-arrays invariably adopt a C-contiguous memory layout.
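
The shape behavior of sub-array data types can be sketched as follows. The names `sub`, `arr`, and `rec` are illustrative only.

```python
import numpy as np

# A dtype describing a 2x2 sub-array of 32-bit integers.
sub = np.dtype((np.int32, (2, 2)))
print(sub.shape, sub.base)

# Used directly, the sub-array dimensions are appended to the array shape
# and the array holds the base type.
arr = np.zeros(3, dtype=sub)
print(arr.shape, arr.dtype)

# Inside a structured field, the sub-array stays part of each item.
rec = np.zeros(3, dtype=[('m', np.int32, (2, 2))])
print(rec.shape, rec['m'].shape)
```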


A simple data type containing a 32-bit big-endian integer:

In [1]:
import numpy as np

In [2]:
dt = np.dtype('>i4')

In [3]:
dt.byteorder

'>'

In [4]:
dt.itemsize

4

In [5]:
dt.name

'int32'

In [20]:
np.size(dt)

1

In [6]:
dt.type is np.int32

True

A structured data type containing a 16-character string (in field `name`) and a sub-array of two 64-bit floating-point numbers (in field `grades`):

In [7]:
# np.str_ is used here because the np.unicode_ alias was removed in NumPy 2.0
dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])

In [8]:
dt['name']

dtype('<U16')

In [9]:
dt['grades']

dtype(('<f8', (2,)))

Items of an array of this data type are wrapped in an array scalar type that also has two fields:

In [10]:
x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)

In [11]:
x[0]

('Sarah', [8., 7.])

In [12]:
x[1]['grades']

array([6., 7.])

In [13]:
type(x[1])

numpy.void

In [14]:
type(x[1]['grades'])

numpy.ndarray

## Specifying and Constructing Data Types

Whenever a data-type is required in a NumPy function or method, either a dtype object or something that can be converted to one can be supplied. Such conversions are done by the dtype constructor:

In [15]:
dt = np.dtype(np.int32)      # 32-bit integer

In [16]:
dt = np.dtype(np.complex128) # 128-bit complex floating-point number
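
The same conversion happens implicitly when a `dtype` argument is passed to an array-creation function. A minimal sketch with three equivalent specifications:

```python
import numpy as np

a = np.zeros(3, dtype=np.dtype('float32'))  # an explicit dtype object
b = np.zeros(3, dtype=np.float32)           # an array scalar type
c = np.zeros(3, dtype='f4')                 # a string code (native byte order)

# All three specifications resolve to the same data type.
print(a.dtype == b.dtype == c.dtype)
```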

Several Python types are equivalent to a corresponding array scalar when used to generate a dtype object:

|Built-in Python type|NumPy type|
|--|--|
|`int`|`int_`|
|`bool`|`bool_`|
|`float`|`float64`|
|`complex`|`complex128`|
|`bytes`|`bytes_`|
|`str`|`str_`|
|`memoryview`|`void`|
|all others|`object_`|
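
The mapping can be checked interactively. The loop below is only a sketch: the exact names printed (for example the width of the default integer) depend on the platform and NumPy version, and `list` stands in for "all others".

```python
import numpy as np

# Convert several built-in Python types to dtype objects and inspect them.
for py_type in (int, bool, float, complex, bytes, str, list):
    dt = np.dtype(py_type)
    print(f"{py_type.__name__:8} -> {dt.name:12} kind={dt.kind!r}")
```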

### More on Data Types

This segment of the tutorial has been carefully curated to equip you with the essential knowledge required to be productive with NumPy's data types. It offers insights into the inner workings of NumPy and helps you navigate common pitfalls. However, please be aware that it's not an exhaustive guide.

For those seeking deeper knowledge, the [NumPy documentation on `ndarrays`](https://numpy.org/doc/stable/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray) offers a wealth of additional resources and in-depth insights.

If you want to explore `dtype` objects further, including the different ways to construct and customize them, the [data type objects reference](https://numpy.org/doc/stable/reference/arrays.dtypes.html) is a good starting point, especially if you run into trouble loading your data into arrays.

Lastly, the NumPy `recarray` is a formidable tool in its own right, and our exploration here only scratches the surface of its capabilities. Delving into the [`recarray` documentation](https://numpy.org/doc/stable/reference/generated/numpy.recarray.html) will provide you with a comprehensive understanding of its potential. Additionally, NumPy offers other specialized array [subclasses](https://numpy.org/doc/stable/reference/arrays.classes.html) that are worth exploring to expand your proficiency in utilizing NumPy effectively.
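
As a brief taste of `recarray`, the sketch below views a structured array as a record array so that fields can be read as attributes. It reuses the `name`/`grades` layout from the earlier example; treat it as an illustration rather than a recommended pattern.

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)

# View the same data as a record array; no copy is made.
r = x.view(np.recarray)

# Fields are now reachable as attributes as well as by key.
print(r.name)
print(r.grades.shape)
```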