# Understanding Data Types in Python

Effective data-driven science and computation requires understanding how data is stored and manipulated. This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this. Understanding this difference is fundamental to understanding much of the material throughout the rest of the book.

Users of Python are often drawn in by its ease of use, one piece of which is **dynamic typing**. While a statically typed language like C or Java requires the type of each variable to be explicitly declared, a dynamically typed language like Python skips this specification.

For example, in C you might specify a particular operation as follows:

```c
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

While in Python the equivalent operation could be written this way:

```python
# Python code
result = 0
for i in range(100):
    result += i
```

Notice the main difference: in C, the data type of each variable is explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:

```python
# Python code
x = 4
x = "four"
```

Here we've switched the contents of `x` from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:

```c
/* C code */
int x = 4;
x = "four"; // FAILS
```

This sort of flexibility is part of what makes Python and other dynamically typed languages convenient and easy to use. Understanding how it works is important for learning to analyze data efficiently and effectively with Python. But this type flexibility also points to the fact that Python variables are more than just their values; they also carry extra information about the type of the value. We'll explore this more in the sections that follow.
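As a small illustration of the type traveling with the value, the sketch below rebinds the name `x` from the earlier example and asks Python for the type each time. The names `type_before` and `type_after` are only for illustration.

```python
# The type belongs to the object a name refers to, not to the name itself.
x = 4
type_before = type(x)    # the int type

x = "four"               # rebind the same name to a string object
type_after = type(x)     # the str type

print(type_before.__name__, "->", type_after.__name__)
```

Because the type information is stored with each object, Python can look it up at run time instead of requiring a declaration up front.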



## A Python Integer Is More Than Just an Integer

The standard Python implementation (CPython) is written in C. This means every Python object is a C structure that contains not only its value but also other information. When you define an integer in Python, like `x = 10000`, `x` is not a "raw" integer. It's a pointer to a C structure that looks something like this:

```c
struct _longobject {
    long ob_refcnt;       // Reference count for memory management
    PyTypeObject *ob_type; // Encodes the type of the variable
    size_t ob_size;       // Specifies the size of the data members
    long ob_digit[1];     // The actual integer value
};
```

![Visual example](./img/2-1.png)

A single Python integer contains four pieces:
*   `ob_refcnt`: A reference count that helps Python manage memory automatically.
*   `ob_type`: Encodes the variable's data type.
*   `ob_size`: Specifies the size of the following data members.
*   `ob_digit`: Contains the actual integer value.
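Some of these fields can be observed indirectly from Python itself. The following is a rough sketch using only the standard library, assuming CPython. The exact numbers printed depend on the interpreter version and platform, so none are shown here.

```python
import sys

x = 10000

# ob_type: the type pointer is what type() reports.
x_type = type(x)

# ob_refcnt: getrefcount() reports the reference count, including the
# temporary reference created by passing x as an argument.
x_refs = sys.getrefcount(x)

# Header fields plus digits: total size of the object in bytes.
x_size = sys.getsizeof(x)

# ob_size / ob_digit: a larger value needs more digits, so the object grows.
big_size = sys.getsizeof(10**100)

print(x_type, x_refs, x_size, big_size)
```

On a typical build, `x_size` is noticeably larger than the 4 or 8 bytes a plain C integer would occupy, which reflects the extra fields listed above.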

This is fundamentally different from a C integer, which is essentially a label for a position in memory whose bytes encode the integer value. A Python integer, by contrast, is a pointer to a position in memory containing all the Python object information, including the bytes that hold the value. This extra information enables Python's dynamic features, but it comes at a cost, especially in structures that combine many such objects.

In the CPython source, the reference count, type pointer, and size fields are not written out one by one; they are supplied by a header macro (`PyObject_HEAD`, or its variable-size counterpart `PyObject_VAR_HEAD`), which the struct above shows in expanded form. The exact layout varies between CPython versions, but the principle holds: every object carries this header, so the overhead becomes significant when dealing with large collections of objects.
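To get a feel for this overhead, the sketch below compares a list of 1,000 Python integers with the same values packed into the standard library's `array` type, which stores raw fixed-size integers. This is an approximate measurement, assuming CPython. `sys.getsizeof` counts only the object it is given, so the list's elements are summed separately.

```python
import sys
from array import array

values = list(range(10_000, 11_000))   # 1,000 separate Python int objects

# A list stores pointers; each element is its own full int object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("q") stores raw 8-byte signed integers in one contiguous buffer.
packed = array("q", values)
array_bytes = sys.getsizeof(packed)

print(f"list of ints: {list_bytes} bytes")
print(f"array('q'):   {array_bytes} bytes")
```

The list total should come out several times larger than the packed array, since every element pays for its own header in addition to the pointer stored in the list.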

