# NumPy Data Types
NumPy provides a range of integer and floating-point data types based on their bit representation. In this notebook, we explore each type and its use case, with a coding example and an explanation.

# **Creating a NumPy Array With DataType**

In [5]:
import numpy as np

# Create an array of integers
int_array = np.array([1, 2, 2, 2, 2], dtype='int8')
print("Integer Array:", int_array)
print("Data Type:", int_array.dtype)

Integer Array: [1 2 2 2 2]
Data Type: int8


### **General Type characters**

In [6]:
# Using general type character i
import numpy as np
# 'i' is the character code for a C int (np.intc), which is 32 bits on most platforms
arr_i = np.array([1, 2, 2, 2, 2], dtype='i')
print("Array with general type character 'i':", arr_i)
print("dtype of created array arr_i :",arr_i.dtype)



Array with general type character 'i': [1 2 2 2 2]
dtype of created array arr_i : int32


In [7]:
# Using general type character b
import numpy as np
# Note: 'b' is a signed byte (int8), not boolean; use '?' or 'bool' for booleans
arr_b = np.array([True, False, True], dtype='b')
print("Array with general type character 'b':", arr_b)
print("dtype of created array arr_b",arr_b.dtype)



Array with general type character 'b': [1 0 1]
dtype of created array arr_b int8


In [8]:
# Using specific type strings - int8

# signed 8-bit integer
arr_int8 = np.array([1, 2, 3], dtype='int8')
print("Array with specific type string 'int8':", arr_int8)
print("dtype of created array arr_int8:",arr_int8.dtype)


Array with specific type string 'int8': [1 2 3]
dtype of created array arr_int8: int8
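Character codes can also carry an explicit byte count, which removes the platform dependence of a bare `'i'`. The sketch below relies only on the `np.dtype` constructor and its `name`, `kind`, and `itemsize` attributes; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Sized codes: a kind letter followed by the size in bytes
sized_codes = ["i1", "i2", "i4", "i8", "u1", "f4", "f8"]

summary = {}
for code in sized_codes:
    dt = np.dtype(code)
    summary[code] = (dt.name, dt.kind, dt.itemsize)
    print(f"{code!r} -> name={dt.name}, kind={dt.kind}, itemsize={dt.itemsize} bytes")
```

Spelling out the size (`'i4'`, `'int32'`) tends to make notebooks behave the same way across operating systems.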


In [9]:
# # Using specific type strings
# arr_int8 = np.array([1, 2, 3], dtype='int8')  # signed 8-bit integer
# print("Array with specific type string 'int8':", arr_int8)

# arr_float32 = np.array([1.5, 2.5, 3.5], dtype='float32')  # 32-bit floating point
# print("Array with specific type string 'float32':", arr_float32)

# arr_str = np.array(['apple', 'banana', 'cherry'], dtype='str')  # string
# print("Array with specific type string 'str':", arr_str)


# NumPy Integer Data Types
NumPy provides various integer data types based on their bit representation. This section covers each signed integer type, its use case, and a coding example.

### int8
**Use Case**: int8 stands for "8-bit integer." It can represent whole numbers ranging from -128 to 127. This data type can be useful when you're trying to save memory and are certain that the numbers won't go outside this range.

In [10]:
import numpy as np
arr_int8 = np.array([-100, 0, 100], dtype='int8')
print(arr_int8)
print(arr_int8.dtype)

[-100    0  100]
int8


### int16
**Use Case**: int16 stands for "16-bit integer." It can represent whole numbers ranging from -32,768 to 32,767. It's suitable when you need a larger range than int8 but still want to be memory efficient.

In [11]:
arr_int16 = np.array([-30000, 0, 30000], dtype='int16')
print(arr_int16)
print(arr_int16.dtype)

[-30000      0  30000]
int16


### int32
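**Use Case**: int32 stands for "32-bit integer." It can represent whole numbers from -2,147,483,648 to 2,147,483,647 (roughly -2 billion to 2 billion). It is the default integer type on some platforms (for example, Windows with NumPy versions before 2.0), while most 64-bit systems default to int64. It is suitable for most general purposes.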

In [12]:
arr_int32 = np.array([-1000000000, 0, 1000000000], dtype='int32')
print(arr_int32)
print(arr_int32.dtype)

[-1000000000           0  1000000000]
int32


### int64
**Use Case**: int64 stands for "64-bit integer." It can represent whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. It's useful when working with big data or when exact large counts are required.

In [13]:
arr_int64 = np.array([-10000000000000, 0, 10000000000000], dtype='int64')
print(arr_int64)
print(arr_int64.dtype)

[-10000000000000               0  10000000000000]
int64
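The ranges quoted for the signed integer types can be checked directly rather than memorized. A minimal sketch using `np.iinfo`; the dictionary name `int_ranges` is only an illustration:

```python
import numpy as np

# Collect the minimum and maximum representable value for each signed type
int_ranges = {}
for name in ("int8", "int16", "int32", "int64"):
    info = np.iinfo(np.dtype(name))
    int_ranges[name] = (info.min, info.max)
    print(f"{name}: {info.min} to {info.max}")
```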


## (3) u - Unsigned Integer

NumPy provides various unsigned integer data types based on their bit representation, such as uint8, uint16, uint32, and uint64.

Using `dtype='u'` on its own might raise an error in some versions of NumPy because the character code 'u' for unsigned integers is ambiguous: it does not say how many bytes to use. Instead, it's better to specify the exact unsigned integer type you want, such as 'uint8', 'uint16', 'uint32', or 'uint64'.

**Use Case**: When you have non-negative whole numbers, like in image data where pixel values range from 0 to 255.

**Explanation**:
- We're possibly dealing with pixel values of an image.
- An unsigned dtype such as `'uint8'` specifies non-negative integers.

### uint8
**Range**: 0 to 255

**Use Case**: Used in image processing where pixel values typically range from 0 (black) to 255 (white).

In [14]:
# Using specific type strings uint8
import numpy as np
arr_uint8 = np.array([50, 100, 255], dtype='uint8')
print(arr_uint8)
print(arr_uint8.dtype)

[ 50 100 255]
uint8


### uint16
**Range**: 0 to 65,535

**Use Case**: Suitable for data that exceeds 255 but stays within the 65,535 limit, such as some medical images or high dynamic range photographs.

In [15]:
arr_uint16 = np.array([30000, 50000, 65535], dtype='uint16')
print(arr_uint16)
print(arr_uint16.dtype)

[30000 50000 65535]
uint16


### uint32
**Range**: 0 to 4,294,967,295

**Use Case**: Used in scenarios where large counts or indexes are needed, like in large databases or simulations.

In [16]:
arr_uint32 = np.array([1000000, 2000000, 4294967295], dtype='uint32')
print(arr_uint32)
print(arr_uint32.dtype)

[   1000000    2000000 4294967295]
uint32


### uint64
**Range**: 0 to 18,446,744,073,709,551,615

**Use Case**: Useful in very specific scenarios where extremely large counts are required, like in big data analytics or certain scientific simulations.

In [17]:
arr_uint64 = np.array([10000000000, 20000000000, 18446744073709551615], dtype='uint64')
print(arr_uint64)
print(arr_uint64.dtype)

[         10000000000          20000000000 18446744073709551615]
uint64
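Fixed-width integers do not grow to fit a result. As a small illustration (the array names are made up for this sketch), adding two `uint8` arrays wraps around once a sum passes 255, while widening to `uint16` first preserves the values:

```python
import numpy as np

pixels = np.array([250, 255], dtype=np.uint8)
offset = np.array([10, 10], dtype=np.uint8)

# uint8 + uint8 stays uint8, so sums above 255 wrap around modulo 256
wrapped = pixels + offset
print(wrapped, wrapped.dtype)

# Widening both operands first keeps the true sums
widened = pixels.astype(np.uint16) + offset.astype(np.uint16)
print(widened, widened.dtype)
```

This is why image code often converts to a wider type before arithmetic and clips back to the 0 to 255 range afterwards.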


## **(4) f - float**

NumPy provides various floating-point data types based on their bit representation, such as float16, float32, and float64.

**Use Case**: For continuous values, like measurements or calculations that result in decimal values.

**Explanation**:
- We're representing measurements that have decimal values.
- The `dtype='f'` indicates floating-point numbers; on its own, the character code `'f'` means `float32`.

### float16
**Use Case**: `float16`, or "half-precision float," is useful for scenarios where you need to save memory, such as deep learning models on GPUs. It's less accurate than other floating-point types.

In [18]:
arr_float16 = np.array([1.5, 2.5, 3.5], dtype='float16')
print(arr_float16)
print(arr_float16.dtype)

[1.5 2.5 3.5]
float16


### float32
**Use Case**: `float32`, or "single-precision float," offers a good balance between precision and memory usage. It's commonly used in machine learning and other computations that require floating-point numbers but don't need the precision of `float64`.

In [19]:
# Using specific type strings float32
arr_float32 = np.array([1.5, 2.5, 3.5], dtype='float32')
print(arr_float32)
print(arr_float32.dtype)

[1.5 2.5 3.5]
float32


### float64
**Use Case**: `float64`, or "double-precision float," is the standard for floating-point computation in Python and provides high precision. It's suitable for most general purposes.

In [20]:
arr_float64 = np.array([1.5, 2.5, 3.5], dtype='float64')
print(arr_float64)
print(arr_float64.dtype)

[1.5 2.5 3.5]
float64


### float128 (if available)
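**Use Case**: `float128` is NumPy's name for the platform's C `long double` where that type occupies 128 bits of storage. On x86-64 Linux builds this is typically 80-bit extended precision padded to 128 bits rather than true quad precision, and the type is generally not available on Windows. Where it exists, it provides more precision than `float64` and is used in specialized scenarios, such as simulations or numerical methods that are sensitive to rounding errors.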

In [21]:
# Note: float128 is not available on all systems (for example, Windows); np.longdouble is the portable name
arr_float128 = np.array([1.5, 2.5, 3.5], dtype='float128')
print(arr_float128)
print(arr_float128.dtype)

[1.5 2.5 3.5]
float128
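The precision differences between the floating-point types can be inspected with `np.finfo`. The following is a sketch rather than a benchmark; the names `float_info`, `third_16`, `third_64`, and `has_float128` are introduced here for illustration.

```python
import numpy as np

# Bit width and approximate number of reliable decimal digits per type
float_info = {}
for name in ("float16", "float32", "float64"):
    info = np.finfo(np.dtype(name))
    float_info[name] = (info.bits, info.precision)
    print(f"{name}: {info.bits} bits, ~{info.precision} decimal digits, eps={info.eps}")

# The same computation carried out at two precisions
third_16 = float(np.float16(1) / np.float16(3))
third_64 = float(np.float64(1) / np.float64(3))
print(third_16, third_64)

# float128 is platform dependent, so check for it before relying on it
has_float128 = hasattr(np, "float128")
print("float128 available:", has_float128)
```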


### (5) c - complex float
**Use Case**: In fields like electrical engineering or quantum physics where complex numbers are common.


In [22]:
## dtype complex (c)
import numpy as np
arr_complex = np.array([1+2j, 2+2j], dtype='complex128')
print(arr_complex)
print(arr_complex.dtype)

[1.+2.j 2.+2.j]
complex128


### (2) b - boolean
**Use Case**: For binary decisions, like filtering operations or masks.

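**Explanation**:
- We're creating an array that could represent a mask or filter.
- `dtype='bool'` (or the character code `'?'`) denotes boolean values. The character code `'b'` is a signed byte (`int8`), which is why the second example below prints `[1 0 1]`.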

In [23]:
##  dtype boolean (using dtype='bool')
import numpy as np
arr_bool   = np.array([True, False, True], dtype='bool')
print(arr_bool)
print(arr_bool.dtype)

[ True False  True]
bool


In [24]:
## dtype='b' is a signed byte (int8), not boolean, so True/False are stored as 1/0
import numpy as np
arr_bool   = np.array([True, False, True], dtype='b')
print(arr_bool)
print(arr_bool.dtype)

[1 0 1]
int8
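To make the difference between the two codes concrete, and to show the masking use case mentioned above, here is a minimal sketch. The `scores` data is invented for the example.

```python
import numpy as np

# 'b' resolves to a signed byte, '?' resolves to boolean
byte_dtype = np.dtype("b")
bool_dtype = np.dtype("?")
print(byte_dtype, bool_dtype)

# Comparisons produce boolean arrays that can be used as masks
scores = np.array([35, 80, 62, 91])
passed = scores >= 60
selected = scores[passed]
print(passed, passed.dtype)
print(selected)
```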


### (6) m - timedelta
**Use Case**: When representing differences between two times or dates, like durations.

**Explanation**:
- We're representing durations in days.
- The `dtype='m'` denotes time delta.

In [25]:
## dtype timedelta (m)
import numpy as np
arr = np.array([np.timedelta64(12, 'D'), np.timedelta64(6, 'D')], dtype='m')
print(arr)

[12  6]


In [26]:
# timedelta using  dtype='timedelta64'
import numpy as np
arr_timedelta = np.array([1,2,3], dtype='timedelta64[h]')
print(arr_timedelta)
print(arr_timedelta.dtype)


[1 2 3]
timedelta64[h]


In [27]:
# timedelta using dtype='m'
import numpy as np
arr_timedelta = np.array([5,3,1], dtype='m')
print(arr_timedelta)
print(arr_timedelta.dtype)


[5 3 1]
timedelta64


In [28]:
import numpy as np
date1 = np.datetime64('2022-01-29')
date2 = np.datetime64('2022-02-16')

delta = date2 - date1
print("difference between two dates is:",delta)

difference between two dates is: 18 days
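A duration can be re-expressed in another unit. As a sketch with the same two dates (the names `start`, `end`, `delta_hours`, and `days_as_float` are illustrative), `astype` changes the unit and dividing by a one-unit `timedelta64` yields a plain number:

```python
import numpy as np

start = np.datetime64("2022-01-29")
end = np.datetime64("2022-02-16")
delta = end - start  # timedelta64 measured in days

# Re-express the same duration in hours
delta_hours = delta.astype("timedelta64[h]")

# Dividing by a one-day timedelta gives a float count of days
days_as_float = delta / np.timedelta64(1, "D")

print(delta, delta_hours, days_as_float)
```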


### (7) M - datetime
**Use Case**: For timestamps or specific dates, like in time series data.

In [29]:
# Creating arrays with a specific date
import numpy as np
arr_date = np.array(['2022-06-28', '2023-09-24'], dtype='datetime64[D]')
print(arr_date)
print(arr_date.dtype)

['2022-06-28' '2023-09-24']
datetime64[D]


In [30]:
# Creating arrays with year precision (the month and day are dropped)
import numpy as np
arr_date = np.array(['2022-06-28', '2023-09-24'], dtype='datetime64[Y]')
print(arr_date)
print(arr_date.dtype)

['2022' '2023']
datetime64[Y]
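For time series work, `datetime64` arrays combine with `timedelta64` values and with `np.arange`. The example dates below are arbitrary and the variable names are only for illustration:

```python
import numpy as np

# A run of consecutive days (the end date is exclusive)
days = np.arange("2022-06-28", "2022-07-02", dtype="datetime64[D]")
print(days)

# Shift every date by one week
one_week_later = days + np.timedelta64(7, "D")
print(one_week_later)

# Reduce the precision from days to months
months = days.astype("datetime64[M]")
print(months)
```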


### (8) O - object
**Use Case**: When you need to store various types of Python objects in an array.

**Explanation**:
- We're storing diverse Python objects.
- The `dtype='O'` denotes object type.

In [31]:
## dtype Object (O)
import numpy as np
arr_object = np.array(['Mango', (1,2), {'key': 'value'}], dtype='O')
print(arr_object)
print(arr_object.dtype)


['Mango' (1, 2) {'key': 'value'}]
object


### (9) S - string
**Use Case**: Storing text data where Unicode is not required.

**Explanation**:
- We're storing string data.
- The `dtype='S'` denotes a byte string.

In [32]:
## dtype String (S)
import numpy as np
arr_s= np.array(['banana', 'orange', 'apple'], dtype='S')
print(arr_s)
print(arr_s.dtype)

[b'banana' b'orange' b'apple']
|S6


In [33]:
## dtype String (S)
import numpy as np
arr_s= np.array(['banana', 'orange', 'apple'], dtype='S5')
print(arr_s)
print(arr_s.dtype)

[b'banan' b'orang' b'apple']
|S5
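Byte strings and Unicode strings differ in how much memory each element takes. A small sketch, assuming ASCII-only input so that the conversion with `astype('U')` is straightforward; the array names are illustrative:

```python
import numpy as np

byte_fruits = np.array(["banana", "orange", "apple"], dtype="S")

# Convert the byte strings to Unicode strings
text_fruits = byte_fruits.astype("U")

print(byte_fruits, byte_fruits.dtype.itemsize, "bytes per element")
print(text_fruits, text_fruits.dtype.itemsize, "bytes per element")
```

Each `S6` element uses 6 bytes, while each `U6` element uses 24 bytes because NumPy stores Unicode text as 4 bytes per character.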


### (10) U - unicode string
**Use Case**: Whenever you're working with text that might contain non-ASCII characters, like names or descriptions in multiple languages.

**Explanation**:
- We're storing strings with possible Unicode characters.
- The `dtype='U'` denotes Unicode string.

In [34]:
## dtype Unicode (U)
import numpy as np
arr_unicode = np.array(['@pple', 'banaña','🙂'], dtype='U')
print(arr_unicode)
print(arr_unicode.dtype)

['@pple' 'banaña' '🙂']
<U6


### (11) V - void
**Use Case**: Less common for general users, but useful for low-level operations, interfacing with C libraries, or when memory layout is critical.

**Explanation**:
- We're creating an array with a void datatype, often used as a placeholder.
- The `dtype='V'` denotes void.

In [36]:
## dtype void
import numpy as np
arr_void = np.array([], dtype='V')
print(arr_void)
print(arr_void.dtype)

[]
|V8


In [37]:
# Create an array of void type with each element being 8 bytes
import numpy as np
arr_void = np.array([], dtype='V8')
print(arr_void)
print(arr_void.dtype)


[]
|V8


In [38]:
# Create an array of void type with each element being 8 bytes
import numpy as np
arr_void = np.array([b'\x01\x02\x03\x04\x05\x06\x07\x08', b'\x11\x12\x13\x14\x15\x16\x17\x18'], dtype='V8')
print(arr_void)
print("datatype:",arr_void.dtype)


[b'\x01\x02\x03\x04\x05\x06\x07\x08' b'\x11\x12\x13\x14\x15\x16\x17\x18']
datatype: |V8


# Struct type

In [39]:
# Creating a structured array in NumPy
import numpy as np
# Define the structured data type
employee_type = np.dtype([('name', 'S10'), ('age', 'int8'), ('salary', 'float32'), ('department', 'S10')])
# Create an array of employees
employees = np.array([('Mona', 25, 60000.98, 'HR'), ('Rohit', 30, 50000.00, 'IT')], dtype=employee_type)

# Printing the structured array
print("Employees array:",employees)
print("Employees array:",employees)


Employees array: [(b'Mona', 25, 60000.98, b'HR') (b'Rohit', 30, 50000.  , b'IT')]


In [40]:
#printing all employees name
print("Employees name:",employees['name'])

Employees name: [b'Mona' b'Rohit']
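Fields of a structured array behave like ordinary arrays, so they can be filtered and aggregated. The sketch below rebuilds the same employee records so that it runs on its own; `it_staff`, `mean_age`, and `record_size` are names introduced for this example.

```python
import numpy as np

employee_type = np.dtype([('name', 'S10'), ('age', 'int8'),
                          ('salary', 'float32'), ('department', 'S10')])
employees = np.array([('Mona', 25, 60000.98, 'HR'),
                      ('Rohit', 30, 50000.00, 'IT')], dtype=employee_type)

# Filter rows by comparing a field against a byte string
it_staff = employees[employees['department'] == b'IT']
print("IT staff:", it_staff['name'])

# Aggregate over a numeric field
mean_age = employees['age'].mean()
print("Mean age:", mean_age)

# Size of one record in bytes: 10 + 1 + 4 + 10 with the default packed layout
record_size = employee_type.itemsize
print("Bytes per record:", record_size)
```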


# Understanding Data Type Conversion in NumPy


Common data types in NumPy:
- **int**: Integer values (e.g., `int32`, `int64`)
- **float**: Floating point numbers (e.g., `float32`, `float64`)
- **complex**: Complex numbers
- **bool**: Boolean values (`True` or `False`)
- **str**: String values
- **object**: Python objects

In [41]:
# Convert String array to integer array
import numpy as np
string_array = np.array(['1', '2', '3'], dtype='S')
print("String array before conversion: ", string_array)
print("Data type of string array: ", string_array.dtype)

int_array = string_array.astype('int32')
print("Integer array after conversion: ", int_array)
print("Data type of integer array: ", int_array.dtype)


String array before conversion:  [b'1' b'2' b'3']
Data type of string array:  |S1
Integer array after conversion:  [1 2 3]
Data type of integer array:  int32


In [42]:
# Convert String array to float array
import numpy as np
string_array = np.array(['1', '2', '3'], dtype='S')
print("String array before conversion: ", string_array)
print("Data type of string array: ", string_array.dtype)

float_array = string_array.astype('float64')
print("Float array after conversion: ", float_array)
print("Data type of Float array: ", float_array.dtype)


String array before conversion:  [b'1' b'2' b'3']
Data type of string array:  |S1
Float array after conversion:  [1. 2. 3.]
Data type of Float array:  float64


In [43]:
# Convert integer array to float array
import numpy as np
original_arr_ints = np.array([1, 2, 3], dtype='int32')
print("Original array before conversion: ", original_arr_ints)
print("Data type of original array: ", original_arr_ints.dtype)

converted_arr_floats = original_arr_ints.astype(np.float64)
print("Converted array after conversion: ", converted_arr_floats)
print("Data type of converted array: ", converted_arr_floats.dtype)


Original array before conversion:  [1 2 3]
Data type of original array:  int32
Converted array after conversion:  [1. 2. 3.]
Data type of converted array:  float64


## Implications of Conversions
It's important to understand that not all conversions are safe. Some can result in a loss of data. For example, converting floats to integers drops the fractional part instead of rounding.

In [44]:
# Convert float array to integer array
original_floats = np.array([1.7, 2.8, 3.2])
converted_ints = original_floats.astype('int32')
print("Original array before conversion: ", original_floats)
print("Converted array after conversion: ", converted_ints)

Original array before conversion:  [1.7 2.8 3.2]
Converted array after conversion:  [1 2 3]
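Note that `astype` truncates toward zero, which is different from rounding. If rounding is what you want, one option is to round first with `np.rint` and then convert. The sample values are chosen only to illustrate the difference:

```python
import numpy as np

values = np.array([-1.7, 1.7, 2.5, 3.5])

# astype drops the fractional part (truncation toward zero)
truncated = values.astype("int32")

# np.rint rounds to the nearest integer, with halves going to the even neighbour
rounded = np.rint(values).astype("int32")

print("truncated:", truncated)
print("rounded:  ", rounded)
```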


## Memory Implications of Data Types
The choice of data type can impact the memory usage of your arrays, especially with large datasets.

In [45]:
int64_array = np.arange(1000, dtype=np.int64)
int32_array = int64_array.astype(np.int32)

int64_array.nbytes, int32_array.nbytes

(8000, 4000)
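The `nbytes` value is simply the number of elements multiplied by the size of one element. A quick comparison across several dtypes, using a hypothetical `memory_bytes` dictionary to collect the results:

```python
import numpy as np

n = 1000
memory_bytes = {}
for name in ("int8", "int32", "int64", "float32", "float64"):
    arr = np.zeros(n, dtype=name)
    # nbytes equals size * itemsize for every dtype
    assert arr.nbytes == arr.size * arr.itemsize
    memory_bytes[name] = arr.nbytes
    print(f"{name}: {arr.itemsize} bytes per element, {arr.nbytes} bytes total")
```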

## Safe Type Conversion
NumPy provides a utility, `np.can_cast`, to check whether a conversion is safe. This can prevent potential data loss.

In [46]:
#Checking if conversion from float64 to int32 is safe
import numpy as np
is_safe_float_to_int = np.can_cast(np.float64, np.int32, casting='safe')
print(is_safe_float_to_int)

False


In the above example, converting from `float64` to `int32` isn't considered safe due to potential loss of decimal information, but the reverse is safe.
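The same check can be run for several type pairs and casting rules at once. This is a sketch; the list of pairs and the `results` dictionary are chosen here for illustration.

```python
import numpy as np

checks = [
    (np.int32, np.float64, "safe"),        # widening int to float
    (np.float64, np.int32, "safe"),        # float to int loses the fraction
    (np.int64, np.int32, "safe"),          # narrowing may overflow
    (np.int64, np.int32, "same_kind"),     # allowed: both are integers
    (np.float64, np.int32, "same_kind"),   # not allowed: float to int changes kind
]

results = {}
for src, dst, rule in checks:
    allowed = np.can_cast(src, dst, casting=rule)
    results[(src.__name__, dst.__name__, rule)] = allowed
    print(f"{src.__name__} -> {dst.__name__} ({rule}): {allowed}")
```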

In [47]:
# Trying to convert float to int using casting='safe'
import numpy as np
arr = np.array([1.5, 2.5, 3.5])
# Converting to int32 with safe casting
result = arr.astype(np.int32, casting='safe')
print(result)


TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'safe'

In [50]:
# Trying to convert float to int with unsafe casting
import numpy as np
arr_float = np.array([1.5, 2.5, 3.5])
print("arr_float: ", arr_float)
print("Data type of arr_float: ", arr_float.dtype)

# Converting to int32 with unsafe casting
arr_result = arr_float.astype(np.int32, casting='unsafe')
print("arr_result: ", arr_result)
print("Data type of arr_result: ", arr_result.dtype)

arr_float:  [1.5 2.5 3.5]
Data type of arr_float:  float64
arr_result:  [1 2 3]
Data type of arr_result:  int32


In [48]:
# Converting int64 to int32 with same_kind casting
import numpy as np
arr_int64 = np.array([12,2,22], dtype='int64')
print("arr_int64: ", arr_int64)
print("Data type of arr_int64: ", arr_int64.dtype)

arr_result = arr_int64.astype(dtype='int32', casting='same_kind')
print("arr_result: ", arr_result)
print("Data type of arr_result: ", arr_result.dtype)

arr_int64:  [12  2 22]
Data type of arr_int64:  int64
arr_result:  [12  2 22]
Data type of arr_result:  int32


## Handling Errors during Conversion
Converting a string that doesn't represent a number will result in an error. Let's see an example:

In [49]:
# Handling errors when converting incompatible data types
try:
    arr_string = np.array(['a', 'b', 'c']).astype(float)
except ValueError as e:
    print(f"Error: {e}")

Error: could not convert string to float: 'a'
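When some entries are expected to be unparseable, one possible approach is to convert element by element and substitute NaN for failures. The helper `to_float_or_nan` below is hypothetical, written for this example; it is not a NumPy function.

```python
import numpy as np

def to_float_or_nan(values):
    """Hypothetical helper: convert each item to float, using NaN when parsing fails."""
    out = np.full(len(values), np.nan, dtype=np.float64)
    for i, item in enumerate(values):
        try:
            out[i] = float(item)
        except ValueError:
            pass  # leave NaN in place for entries that are not numbers
    return out

raw = np.array(["1.5", "a", "3"])
cleaned = to_float_or_nan(raw)
print(cleaned)
```

The Python-level loop is slow for large arrays, so this fits small inputs or one-off cleaning steps better than hot paths.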
