## Datatypes

### Topics covered:
- Types of dtype
- Size of dtypes
- Changing dtypes

A few important pointers:
- Every NumPy array is a grid of elements of the **same type**.
- NumPy provides a large set of numeric datatypes that you can use to construct arrays. NumPy **tries to guess a datatype when you create an array**, but functions that construct arrays usually also include an optional `dtype` argument to **explicitly specify the datatype**.


# Types of data types

## Integer Types

**Unsigned Integer Types and Their Ranges**
- **Unsigned data types** are numeric types that can represent only **non-negative integers (i.e., 0 and positive numbers)**.

```
Data Type    Description               Range
np.uint8     Unsigned 8-bit integer    0 to 255
np.uint16    Unsigned 16-bit integer   0 to 65,535
np.uint32    Unsigned 32-bit integer   0 to 4,294,967,295
np.uint64    Unsigned 64-bit integer   0 to 18,446,744,073,709,551,615
```
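The upper limits in the table matter in practice: array arithmetic that leaves the range of an unsigned type wraps around silently instead of raising an error. The following is a minimal sketch of that behaviour; the array values are made up for illustration.

```python
import numpy as np

# uint8 holds 0..255; array arithmetic that leaves this range wraps around
a = np.array([250, 255], dtype=np.uint8)
b = np.array([10, 1], dtype=np.uint8)

wrapped = a + b                      # computed modulo 256, no error is raised
print(wrapped)                       # values 4 and 0

# Widening one operand first keeps the mathematically expected result
widened = a.astype(np.uint16) + b
print(widened)                       # values 260 and 256
```

If the results of a calculation may exceed the range of the input dtype, widening before the operation is usually the simplest safeguard.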

**Signed Integer Types and Their Ranges**

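- **Signed data types** are numeric types that can represent **negative integers, zero, and positive integers**. One bit is used for the sign, so the positive range is roughly half that of the unsigned type with the same width.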

```
Data Type   Bits   Minimum Value                 Maximum Value
np.int8      8     -128                          127
np.int16    16     -32,768                       32,767
np.int32    32     -2,147,483,648                2,147,483,647
np.int64    64     -9,223,372,036,854,775,808    9,223,372,036,854,775,807
```
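Both tables follow from the bit width: an n-bit unsigned type covers 0 to 2^n - 1, and an n-bit signed type covers -2^(n-1) to 2^(n-1) - 1. As a quick check, the sketch below compares those formulas with what `np.iinfo` reports; the `ranges` dictionary is only an illustrative container.

```python
import numpy as np

ranges = {}
for bits in (8, 16, 32, 64):
    signed = np.iinfo(np.dtype(f"int{bits}"))
    unsigned = np.iinfo(np.dtype(f"uint{bits}"))

    # Compare the limits reported by NumPy with the bit-width formulas
    assert signed.min == -2 ** (bits - 1)
    assert signed.max == 2 ** (bits - 1) - 1
    assert unsigned.min == 0
    assert unsigned.max == 2 ** bits - 1

    ranges[bits] = (signed.min, signed.max, unsigned.max)
    print(f"int{bits}: {signed.min} to {signed.max} | uint{bits}: 0 to {unsigned.max}")
```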

## Float Types
```
float16: half precision (~3 decimal digits, not always supported on all CPUs)
float32: single precision (~7 decimal digits)
float64: double precision (~15 decimal digits, NumPy default)
float128: extended precision (platform dependent; may not be true 128-bit on all systems)
```
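The precision figures above can be inspected with `np.finfo`, whose `eps` attribute gives the gap between 1.0 and the next representable value. The sketch below also shows one practical consequence for `float32`: integers above 2^24 can no longer all be represented exactly. The chosen test value is just an illustration.

```python
import numpy as np

# Machine epsilon shrinks as the number of mantissa bits grows
for ftype in (np.float16, np.float32, np.float64):
    info = np.finfo(ftype)
    print(ftype.__name__, "eps =", info.eps, "max =", info.max)

# float32 has a 24-bit significand, so 2**24 + 1 cannot be stored exactly
exact = 2 ** 24 + 1                 # 16777217
as_float32 = np.float32(exact)
as_float64 = np.float64(exact)
print(int(as_float32))              # 16777216, the last digit is lost
print(int(as_float64))              # 16777217, still exact
```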

## Complex Number Types
```
complex64: real + imaginary, both float32
complex128: real + imaginary, both float64
complex256: extended precision (platform dependent)
```
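The number in a complex type name is the total width in bits, split evenly between the real and imaginary parts. A short sketch that checks this on a small example array:

```python
import numpy as np

z = np.array([1 + 2j, 3 + 4j], dtype=np.complex64)

# Each part of a complex64 value is stored as a float32
print(z.real.dtype, z.imag.dtype)   # float32 float32
print(z.itemsize)                   # 8 bytes = 2 x 4 bytes

z128 = z.astype(np.complex128)
print(z128.real.dtype, z128.itemsize)   # float64 16
```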

## Boolean Type
```
bool_: stores True or False (1 byte)
```

## Text and String Types
```
str_: fixed-size Unicode string
unicode_: alias for str_ (removed in NumPy 2.0; use str_ instead)
bytes_: fixed-size sequence of bytes
object_: can hold arbitrary Python objects (slower, less memory efficient)
```
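"Fixed-size" has a visible consequence: the width of a string array is set when the array is created, and longer strings assigned later are silently truncated. The sketch below illustrates this with made-up fruit names and shows that an `object_` array does not have the same limit.

```python
import numpy as np

names = np.array(["apple", "banana", "cherry"])
print(names.dtype)                  # <U6 on little-endian machines: up to 6 characters

# Assigning a longer string keeps only the first 6 characters
names[0] = "pineapple"
print(names[0])                     # pineap

# An object array stores references to Python strings, so nothing is cut off
flexible = np.array(["apple", "banana", "cherry"], dtype=np.object_)
flexible[0] = "pineapple"
print(flexible[0])                  # pineapple
```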

## Other Special Types
```
datetime64: date and time values with precision (Y, M, D, h, m, s, ms, us, ns)
timedelta64: differences between dates/times
void: used for raw data (records, structured dtypes)
```
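As a brief illustration of the date types, subtracting two `datetime64` values yields a `timedelta64`, and `np.arange` can generate a range of dates. The dates used here are arbitrary examples.

```python
import numpy as np

start = np.datetime64("2024-02-01")
end = np.datetime64("2024-03-01")

# Subtracting two dates gives a timedelta64 in the same unit (days here)
delta = end - start
print(delta, delta.dtype)
n_days = int(delta / np.timedelta64(1, "D"))   # 2024 is a leap year, so 29

# A range of consecutive days; the end date is excluded
days = np.arange("2024-02-01", "2024-02-04", dtype="datetime64[D]")
print(days)
```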

**Why Do We Have So Many Data Types in NumPy?**
- **Memory Efficiency**: Different data types require different amounts of memory. For example, an **int8 uses 1 byte**, **int64 uses 8 bytes**. By choosing the appropriate data type, you can save memory, especially when working with large datasets.

- **Performance**: Operations on smaller data types can be faster because they require less memory bandwidth and can fit more data into CPU caches. This can lead to improved performance in numerical computations.

- **Precision**: Different data types offer varying levels of precision. For example, float32 has less precision than float64. Depending on the requirements of your calculations, you may need to choose a data type that provides the necessary precision.
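To make the memory argument concrete, the following sketch compares the footprint of one million elements stored as `int8` and as `int64`. The array length is arbitrary.

```python
import numpy as np

n = 1_000_000
small = np.zeros(n, dtype=np.int8)
large = np.zeros(n, dtype=np.int64)

# nbytes is the size of the data buffer: number of elements x bytes per element
print("int8 :", small.nbytes, "bytes")    # 1000000
print("int64:", large.nbytes, "bytes")    # 8000000
print("ratio:", large.nbytes // small.nbytes)
```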


### Guidelines for Choosing a NumPy dtype: What kind of data do you have?

Whole numbers (integers) → Use int8, int16, int32, or int64 depending on the range (or uint8 to uint64 when the values are never negative).

Decimals (floating-point) → Use float32 or float64 depending on precision needed.

True/False values → Use bool_.

Text / categorical labels → Use str_, object_, or encode them to integers.

### Example
If your data is ages (0–120), np.uint8 (0–255) is enough.

If your data is population counts (millions), np.int32 is safer.

If you deal with financial values (billions with decimals), use np.float64.
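The range-based choice for integers can be automated. The sketch below uses a hypothetical helper, `smallest_int_dtype` (not part of NumPy), that walks through the integer types from narrowest to widest and returns the first one whose `np.iinfo` limits cover the data. It picks the tightest fit, so you may still prefer a wider type for headroom.

```python
import numpy as np

def smallest_int_dtype(low, high):
    """Hypothetical helper: narrowest NumPy integer dtype that covers [low, high]."""
    candidates = (np.uint8, np.int8, np.uint16, np.int16,
                  np.uint32, np.int32, np.uint64, np.int64)
    for candidate in candidates:
        info = np.iinfo(candidate)
        if info.min <= low and high <= info.max:
            return np.dtype(candidate)
    raise ValueError("range does not fit in any NumPy integer dtype")

print(smallest_int_dtype(0, 120))           # uint8, e.g. ages
print(smallest_int_dtype(-40, 50))          # int8, e.g. temperatures in Celsius
print(smallest_int_dtype(0, 50_000_000))    # uint32, e.g. population counts
```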

In [41]:
import numpy as np

In [42]:
x = np.array([1, 2, 5, 2])      # Let NumPy infer the datatype from integer values
print(x.dtype)                  # Platform dependent: int64 on most systems, int32 on Windows with NumPy < 2.0

x = np.array([1.0, 2.0, 2.0])   # Let NumPy infer the datatype from float values
print(x.dtype)

int32
float64


In [43]:
# Force a particular datatype to int8
x = np.array([1, 2, 5, 2], dtype=np.int8)   
print(x.dtype)                    

int8
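`np.array` is not the only constructor that takes a `dtype` argument. As a small sketch, the calls below pass it to a few other common constructors; the shapes and values are arbitrary.

```python
import numpy as np

zeros = np.zeros(3, dtype=np.int16)          # three zeros stored as int16
ones = np.ones((2, 2), dtype=np.float32)     # 2x2 array of ones stored as float32
steps = np.arange(5, dtype=np.uint8)         # 0..4 stored as uint8
filled = np.full(4, 7, dtype=np.int8)        # four sevens stored as int8

for arr in (zeros, ones, steps, filled):
    print(arr.dtype, arr)
```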


In [44]:
# Let's check how big/small your values can get.

print(np.iinfo(np.int8))
# print(np.iinfo(np.int16))
# print(np.iinfo(np.int32))
# print(np.iinfo(np.int64))

# print(np.finfo(np.float16))
# print(np.finfo(np.float32))
# print(np.finfo(np.float64))

Machine parameters for int8
---------------------------------------------------------------
min = -128
max = 127
---------------------------------------------------------------



In [45]:
# Let's see the available data types

# 1. Integer Types: int8, int16, int32, int64 
# Create an array of int8
a = np.array([1, 2, 3], dtype=np.int8)
print("Integer Array:", a)
print("Data Type:", a.dtype)

Integer Array: [1 2 3]
Data Type: int8


In [46]:
# 2. Unsigned Integer Types: uint8, uint16, uint32, uint64
# Create an array of uint8
a = np.array([1, 2, 3], dtype=np.uint8)
print("Unsigned Integer Array:", a)
print("Data Type:", a.dtype)

Unsigned Integer Array: [1 2 3]
Data Type: uint8


In [47]:
# 3. Floating Point Types: float16, float32, float64
# Create an array of float64
a = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print("Floating Point Array:", a)
print("Data Type:", a.dtype)

Floating Point Array: [1. 2. 3.]
Data Type: float64


In [48]:
# 4. Complex Types: complex64, complex128
# Create an array of complex64
a = np.array([1 + 2j, 3 + 4j], dtype=np.complex64)
print("Complex Array:", a)
print("Data Type:", a.dtype)

Complex Array: [1.+2.j 3.+4.j]
Data Type: complex64


In [49]:
# 5. Boolean Type: bool_
# Create a boolean array
a = np.array([True, False, True], dtype=np.bool_)
print("Boolean Array:", a)
print("Data Type:", a.dtype)

Boolean Array: [ True False  True]
Data Type: bool


In [50]:
# 6. String Type: str_
# Create an array of strings
a = np.array(['apple', 'banana', 'cherry'], dtype=np.str_)
print("String Array:", a)
print("Data Type:", a.dtype)

# 7. Object Type: object_
# Create an array of objects
a = np.array([1, 'apple', 3.14], dtype=np.object_)
print("Object Array:", a)
print("Data Type:", a.dtype)

String Array: ['apple' 'banana' 'cherry']
Data Type: <U6
Object Array: [1 'apple' 3.14]
Data Type: object


# Size of data types
### Demonstrating the size of different NumPy data types in bytes

In [51]:
# Integer Types
print("Size of int8 :", np.int8().itemsize,  "bytes")  
print("Size of int16:", np.int16().itemsize, "bytes") 
print("Size of int32:", np.int32().itemsize, "bytes") 
print("Size of int64:", np.int64().itemsize, "bytes")

Size of int8 : 1 bytes
Size of int16: 2 bytes
Size of int32: 4 bytes
Size of int64: 8 bytes


In [52]:
# Unsigned Integer Types
print("Size of uint8 :", np.uint8().itemsize,  "bytes")   
print("Size of uint16:", np.uint16().itemsize, "bytes") 
print("Size of uint32:", np.uint32().itemsize, "bytes") 
print("Size of uint64:", np.uint64().itemsize, "bytes") 

Size of uint8 : 1 bytes
Size of uint16: 2 bytes
Size of uint32: 4 bytes
Size of uint64: 8 bytes


In [53]:
# Floating Point Types
print("Size of float16:", np.float16().itemsize, "bytes") 
print("Size of float32:", np.float32().itemsize, "bytes") 
print("Size of float64:", np.float64().itemsize, "bytes") 

Size of float16: 2 bytes
Size of float32: 4 bytes
Size of float64: 8 bytes


In [54]:
# Complex Types
print("Size of complex64:",  np.complex64().itemsize,  "bytes") 
print("Size of complex128:", np.complex128().itemsize, "bytes") 

Size of complex64: 8 bytes
Size of complex128: 16 bytes


In [55]:
# Boolean Type
print("Size of bool:", np.bool_().itemsize, "bytes") 

# Object Type (itemsize is the size of the stored pointer, not of the Python object; 8 bytes on 64-bit builds)
print("Size of object:", np.dtype(np.object_).itemsize, "bytes") 

Size of bool: 1 bytes
Size of object: 8 bytes
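The same information is available without creating a scalar first, by asking the dtype object directly. The sketch below also covers string dtypes, whose item size depends on the declared width (NumPy stores 4 bytes per Unicode character), and checks that an array's total size is its element count times its item size.

```python
import numpy as np

# itemsize can be read from the dtype itself
print(np.dtype(np.float32).itemsize)    # 4
print(np.dtype("U6").itemsize)          # 24 = 6 characters x 4 bytes
print(np.dtype("S6").itemsize)          # 6  = 6 raw bytes

# Total buffer size = number of elements x bytes per element
fruits = np.array(["apple", "banana", "cherry"])
total = fruits.size * fruits.itemsize
print(fruits.nbytes, total)             # 72 72
```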


# Change data types using astype() method
- `.astype()` **returns a new array** by default; the original array is left unchanged.
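A quick way to see this is to convert an array and then inspect the original. The sketch below also shows the one exception worth knowing: with `copy=False`, NumPy may hand back the input array itself when no conversion is needed.

```python
import numpy as np

original = np.array([1, 2, 3], dtype=np.int64)
converted = original.astype(np.float32)

# The original keeps its dtype, and the two arrays do not share memory
print(original.dtype, converted.dtype)           # int64 float32
print(np.shares_memory(original, converted))     # False

# With copy=False and an unchanged dtype, the input array is returned as-is
same = original.astype(np.int64, copy=False)
print(same is original)                          # True
```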

In [61]:
# Change data type to uint8

# Create a NumPy array with default data type (int)
a = np.array([1, 2, 3, 4, 5])
print("Original:", a)
print(a.dtype)

a = a.astype(np.uint8)      # to uint8
# a = a.astype(np.float16)  # to float16
# a = a.astype(np.float32)  # to float32
# a = a.astype(np.str_)     # to string

print("New     :", a)
print(a.dtype)

Original: [1 2 3 4 5]
int32
New     : [1 2 3 4 5]
uint8
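Converting to a narrower type is only lossless when every value fits the target range. By default `astype` performs the cast anyway and out-of-range integers wrap around. The sketch below shows two ways to guard against that: `np.can_cast` and the `casting="safe"` argument. The sample values are chosen to fall outside the `uint8` range.

```python
import numpy as np

values = np.array([1, 300, -1], dtype=np.int64)

# int64 -> uint8 is not a safe cast in general
print(np.can_cast(values.dtype, np.uint8))    # False

# The default (unsafe) cast succeeds but wraps values modulo 256
wrapped = values.astype(np.uint8)
print(wrapped)                                # values 1, 44 and 255

# Requesting a safe cast raises TypeError instead of corrupting data
try:
    values.astype(np.uint8, casting="safe")
    safe_cast_failed = False
except TypeError as exc:
    safe_cast_failed = True
    print("Refused:", exc)
```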


In [57]:
# Change data type from float to integer (the fractional part is truncated toward zero, not rounded)

a = np.array([1.1, 2.2, 3.3])
print("Original:", a)
print(a.dtype)

a = a.astype(int)
print("New     :", a)
print(a.dtype)

Original: [1.1 2.2 3.3]
float64
New     : [1 2 3]
int32
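Truncation toward zero differs from rounding and from flooring, which is most noticeable for negative numbers. A minimal sketch comparing the three on a few sample values:

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.5, 1.7])

truncated = values.astype(np.int64)             # drops the fractional part
rounded = np.rint(values).astype(np.int64)      # nearest integer, ties go to the even value
floored = np.floor(values).astype(np.int64)     # always rounds down

print("truncated:", truncated)   # -1, 0, 0, 1
print("rounded  :", rounded)     # -2, 0, 0, 2
print("floored  :", floored)     # -2, -1, 0, 1
```

If you need rounding rather than truncation, apply `np.rint` (or `np.round`) before calling `astype`.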


In [58]:
# Change data type from string to integer

a = np.array(["1", "2", "3"])
print("Original:", a)
print(a.dtype)

a = a.astype(int)
print("New     :", a)
print(a.dtype)

Original: ['1' '2' '3']
<U1
New     : [1 2 3]
int32
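The string-to-integer conversion only works when every element is a valid integer literal. The sketch below shows what happens otherwise and one possible workaround for decimal strings, which is to go through a float dtype first. The sample strings are made up.

```python
import numpy as np

# A non-numeric element makes the whole conversion fail
try:
    np.array(["1", "two", "3"]).astype(np.int64)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("Could not convert:", exc)

# Decimal strings are not valid integer literals either;
# convert to float first, then truncate to integer
decimals = np.array(["1.5", "2.0", "3.9"])
as_float = decimals.astype(np.float64)
as_int = as_float.astype(np.int64)
print(as_float, as_int)
```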
