# dtypes

- Boolean and Integer Types:

    - bool_ (1 byte): Boolean values (True or False).
    - int_ (variable size): Default integer type (platform-dependent, typically int64 or int32).
    - intc (4 bytes): C integer (signed).
    - intp (variable size): Integer used for indexing (platform-dependent, typically int64 or int32).
    - uint8 (1 byte): Unsigned 8-bit integer.
    - int8 (1 byte): Signed 8-bit integer.
    - The remaining fixed-size integers follow the same pattern: int16/uint16, int32/uint32, and int64/uint64.

- Floating-Point Types:

    - float16 (2 bytes): Half-precision float (IEEE 754-2008).
    - float32 (4 bytes): Single-precision float (IEEE 754).
    - float64 (8 bytes): Double-precision float (IEEE 754).
    - longdouble (variable size): Extended-precision float (platform-dependent).

- String and Character Types:

    - str_ (variable size): Unicode string.
    - bytes_ (variable size): Byte string.
    - object_ (variable size): Python object (can store any type).

- Structured and Specialized Types:

    - Structured arrays: Combine different data types within a single array.
    - Timedelta and Datetime types: Represent durations and timestamps.
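
The byte sizes listed above can be checked from Python. The snippet below is a small sketch that reads `itemsize` from each dtype; the platform-dependent types are printed rather than assumed, since their size varies between systems.

```python
import numpy as np

# Fixed-width types: itemsize is the number of bytes used per element.
fixed = ["bool", "int8", "uint8", "int16", "int32", "int64",
         "float16", "float32", "float64"]
sizes = {name: np.dtype(name).itemsize for name in fixed}
print(sizes)

# Platform-dependent types: inspect them instead of assuming a size.
for t in (np.int_, np.intc, np.intp, np.longdouble):
    print(np.dtype(t).name, np.dtype(t).itemsize, "bytes")
```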



Every dtype also has a one-character kind code, available as `dtype.kind`:

- i - signed integer
- b - boolean
    - note that as a type string, 'b' means int8; the type string for boolean is '?'
- u - unsigned integer
- f - float
- c - complex float
- m - timedelta
- M - datetime
- O - Python object type, which can hold any Python object
- S - fixed-length byte string type (1 byte per character)
    - for example, to create a string dtype with length 10, use 'S10'
- U - fixed-length Unicode string type (4 bytes per character)
    - same length syntax as S, for example 'U10'
- V - fixed chunk of memory for other types (void)
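
As a quick check of the kind codes above, the following sketch builds one small array per kind and reads back `dtype.kind`. The sample values are arbitrary and only serve as illustration.

```python
import numpy as np

# One sample array per kind code; the dictionary key is the expected kind.
samples = {
    "i": np.array([1, 2], dtype=np.int32),
    "b": np.array([True, False]),
    "u": np.array([1, 2], dtype=np.uint8),
    "f": np.array([1.5]),
    "c": np.array([1 + 2j]),
    "m": np.array([1], dtype="timedelta64[D]"),
    "M": np.array(["2024-01-01"], dtype="datetime64[D]"),
    "O": np.array([{"a": 1}], dtype=object),
    "S": np.array([b"abc"]),
    "U": np.array(["abc"]),
}
kinds = {expected: arr.dtype.kind for expected, arr in samples.items()}
print(kinds)

# Fixed-length strings: the number after the code is the length in characters.
s10 = np.dtype("S10")   # 10 bytes per element
u10 = np.dtype("U10")   # 10 characters, 4 bytes each
void4 = np.dtype("V4")  # 4 raw bytes
print(s10.itemsize, u10.itemsize, void4.kind)
```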


__Each type can be declared in several ways. For example, this table lists the type strings for integers:__



<table >
  <tr>
    <th>Type Name</th>
    <th>Character Code</th>
    <th>Size (Bytes)</th>
    <th>Description</th>
    <th>Signed/Unsigned</th>
  </tr>
  <tr>
    <td>i1</td>
    <td>b</td>
    <td>1</td>
    <td>8-bit signed integer</td>
    <td>Signed</td>
  </tr>
  <tr>
    <td>i2</td>
    <td>i2</td>
    <td>2</td>
    <td>16-bit signed integer</td>
    <td>Signed</td>
  </tr>
  <tr>
    <td>i4</td>
    <td>i4</td>
    <td>4</td>
    <td>32-bit signed integer</td>
    <td>Signed</td>
  </tr>
  <tr>
    <td>i8</td>
    <td>i8</td>
    <td>8</td>
    <td>64-bit signed integer</td>
    <td>Signed</td>
  </tr>
  <tr>
    <td>u1</td>
    <td>B</td>
    <td>1</td>
    <td>8-bit unsigned integer</td>
    <td>Unsigned</td>
  </tr>
  <tr>
    <td>u2</td>
    <td>u2</td>
    <td>2</td>
    <td>16-bit unsigned integer</td>
    <td>Unsigned</td>
  </tr>
  <tr>
    <td>u4</td>
    <td>u4</td>
    <td>4</td>
    <td>32-bit unsigned integer</td>
    <td>Unsigned</td>
  </tr>
  <tr>
    <td>u8</td>
    <td>u8</td>
    <td>8</td>
    <td>64-bit unsigned integer</td>
    <td>Unsigned</td>
  </tr>
</table>
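
To illustrate that these spellings are interchangeable, here is a short sketch that creates the same 16-bit integer array in three ways and compares the resulting dtypes. It also checks the single-character codes 'b' and 'B', which NumPy maps to int8 and uint8.

```python
import numpy as np

a = np.zeros(3, dtype="i2")       # type string: kind code + size in bytes
b = np.zeros(3, dtype="int16")    # type name given as a string
c = np.zeros(3, dtype=np.int16)   # NumPy scalar type
same = a.dtype == b.dtype == c.dtype
print(same, a.dtype)

# Single-character codes for the 1-byte integers.
signed_byte = np.dtype("b")       # same as 'i1'
unsigned_byte = np.dtype("B")     # same as 'u1'
print(signed_byte, unsigned_byte)
```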

## signed vs unsigned:
The key difference between signed and unsigned data types in NumPy (and programming in general) lies in how they represent and interpret numbers:

- Signed:
  - Can store both positive and negative integer values.
  - Stored in two's complement form, where the most significant bit (MSB) indicates the sign:
    - 0 for non-negative numbers
    - 1 for negative numbers
  - The MSB carries a weight of -2^(N-1); the remaining bits carry their usual positive weights.

- Unsigned:
  - Can only store **non-negative integers** (zero or positive).
  - **All bits** are used to represent the magnitude of the number.
  - This gives them a **larger maximum value** than signed types with the same number of bits (the number of representable values is the same).



### Example:

Consider 8-bit integers (1 byte):

- Signed (two's complement):
  - Minimum value: -128 (bit pattern 10000000)
  - Maximum value: 127 (bit pattern 01111111)
- Unsigned:
  - Minimum value: 0 (bit pattern 00000000)
  - Maximum value: 255 (bit pattern 11111111)
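
The bit patterns for 8-bit integers can be inspected directly. This sketch reinterprets an int8 array as uint8 with `view`, which keeps the same bytes in memory and only changes how they are read, then formats each byte as 8 binary digits.

```python
import numpy as np

signed = np.array([-128, -1, 0, 127], dtype=np.int8)
raw = signed.view(np.uint8)   # same bytes, interpreted as unsigned

# Format every byte as an 8-digit binary string.
bits = [format(int(b), "08b") for b in raw]
for value, unsigned_value, pattern in zip(signed, raw, bits):
    print(int(value), int(unsigned_value), pattern)
```

The same byte `10000000` reads as -128 when signed and as 128 when unsigned.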

<table border="1">
  <tr>
    <th>Feature</th>
    <th>Signed</th>
    <th>Unsigned</th>
  </tr>
  <tr>
    <td>Range</td>
    <td>Negative to positive values</td>
    <td>Zero and positive values</td>
  </tr>
  <tr>
    <td>Sign bit</td>
    <td>Yes (MSB)</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Magnitude representation</td>
    <td>Remaining bits (two's complement)</td>
    <td>All bits</td>
  </tr>
  <tr>
    <td>Maximum value for N bits</td>
    <td>2^(N-1) - 1</td>
    <td>2^N - 1</td>
  </tr>
  <tr>
    <td>Example (8-bit integer)</td>
    <td>-128 to 127</td>
    <td>0 to 255</td>
  </tr>
</table>
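
The range formulas in the table can be compared against what NumPy itself reports through `np.iinfo`. The function `int_range` below is a hypothetical helper written for this illustration, not part of NumPy.

```python
import numpy as np

def int_range(bits, signed):
    """Return (min, max) for an integer with the given number of bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

names = ["int8", "uint8", "int16", "uint16",
         "int32", "uint32", "int64", "uint64"]
matches = {}
for name in names:
    info = np.iinfo(np.dtype(name))
    computed = int_range(info.bits, signed=(info.kind == "i"))
    matches[name] = computed == (int(info.min), int(info.max))
    print(name, computed)
```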

### When to use which:

- Use signed if you need to store negative values.
- Use unsigned if you know your values will always be non-negative and you need a larger maximum value.
- Be cautious when performing calculations between signed and unsigned types: NumPy promotes the result to a type that can hold both, which may be a larger integer type or even float64, and arithmetic within a single unsigned type wraps around instead of going negative.
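
A small sketch of what can happen when signed and unsigned arrays are combined. The values are arbitrary; the point is the dtype of each result.

```python
import numpy as np

i8 = np.array([100], dtype=np.int8)
u8 = np.array([200], dtype=np.uint8)
mixed_small = i8 + u8      # promoted to int16, which holds both ranges
print(mixed_small, mixed_small.dtype)

i64 = np.array([1], dtype=np.int64)
u64 = np.array([1], dtype=np.uint64)
mixed_large = i64 + u64    # no integer type holds both ranges, so float64
print(mixed_large, mixed_large.dtype)

# Staying within one unsigned type: 0 - 1 wraps around to 255.
wrapped = np.array([0], dtype=np.uint8) - np.array([1], dtype=np.uint8)
print(wrapped, wrapped.dtype)
```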

			

In [1]:
import numpy as np

In [2]:
np.array(range(10)).dtype

dtype('int64')

#### In the example above the dtype is int64, so each element occupies 64 bits (8 bytes) in memory

We can set the dtype when creating the array, or convert it afterwards with `astype`.

In [3]:
arr = np.array(range(10), 'i2')
print('Created with i2 (2 bytes, 16 bits), so the dtype is:', arr.dtype)
arr = arr.astype('i4')
print('After astype to i4 (4 bytes, 32 bits), the dtype is:', arr.dtype)


Created with i2 (2 bytes, 16 bits), so the dtype is: int16
After astype to i4 (4 bytes, 32 bits), the dtype is: int32
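
To see what changing the dtype means for memory, this sketch converts the same ten values to several integer widths and prints `itemsize` (bytes per element) and `nbytes` (bytes for the whole array). The starting dtype is set explicitly to int64 so the result does not depend on the platform default.

```python
import numpy as np

base = np.arange(10, dtype=np.int64)

usage = {}
for code in ("i8", "i4", "i2", "i1"):
    converted = base.astype(code)
    usage[code] = converted.nbytes
    print(code, converted.dtype, converted.itemsize, converted.nbytes)
```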


In [4]:
arr = np.array(['apple', 'banana', 'cherry'])
arr.dtype

dtype('<U6')

In [5]:
arr = np.array([b'apple', b'banana', b'cherry'])

arr.dtype

dtype('S6')

In [6]:
arr = np.array(['apple', 'banana', 'cherry'], dtype='S')

arr.dtype

dtype('S6')

In [7]:
arr = np.array(['apple', 'banana', 'cherry'], dtype='S2')   # 'S2' holds only 2 bytes per element, so each string is cut to its first two bytes
arr

array([b'ap', b'ba', b'ch'], dtype='|S2')
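
One way to avoid this kind of silent truncation is to measure the longest string before choosing the length. The sketch below uses `np.char.str_len` for that and assumes the strings are plain ASCII, so one byte per character is enough.

```python
import numpy as np

words = np.array(["apple", "banana", "cherry"])

# Length of the longest element decides the minimum safe width.
longest = int(np.char.str_len(words).max())
safe = words.astype(f"S{longest}")
truncated = words.astype("S2")   # too short: data is lost without a warning

print(longest, safe.dtype)
print(safe)
print(truncated)
```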

In [8]:
arr = np.array(range(125, 128)).astype('i1')
# i1 is enough for [125, 126, 127]
arr

array([125, 126, 127], dtype=int8)

In [9]:
arr = np.array(range(126, 129)).astype('i1')
# i1 is NOT enough for [126, 127, 128]
arr

array([ 126,  127, -128], dtype=int8)

In the example above, 128 does not fit in i1 (int8, maximum 127), so it silently wraps around to -128.

In [10]:
arr = np.array(range(126, 129)).astype('u1')
# u1 is enough for [126, 127, 128]

arr

array([126, 127, 128], dtype=uint8)

In [11]:
arr = np.array(range(254, 257)).astype('u1')
# u1 is NOT enough for [254, 255, 256]
arr

array([254, 255,   0], dtype=uint8)

In [12]:
arr = np.array(range(-5, 5)).astype('u1')
# u1 cannot represent negative values either: they wrap around (-5 becomes 251)
arr

array([251, 252, 253, 254, 255,   0,   1,   2,   3,   4], dtype=uint8)
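
The wrapped values in the last few cells follow a simple rule: for 8-bit types, the result is the original value modulo 256, shifted into the target range. The sketch below checks this against `astype` for the arrays used above. It describes the behavior observed with integer-to-integer casts, not a documented guarantee for every kind of conversion.

```python
import numpy as np

# Unsigned 8-bit: the value modulo 256.
values = np.arange(-5, 5)
as_u1 = values.astype("u1")
expected_u1 = values % 256
print(as_u1)

# Signed 8-bit: shift by 128, take modulo 256, shift back.
more = np.arange(126, 129)
as_i1 = more.astype("i1")
expected_i1 = (more + 128) % 256 - 128
print(as_i1)
```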

**So we should always be careful. For example, when a dataset stores people's weights as strings and we convert them to numbers:**
- we should choose a dtype that requires less memory
- we should be careful not to lose data by choosing the wrong dtype
    - like this ```arr = np.array(['apple', 'banana', 'cherry'], dtype='S2')```
    - or this ```arr = np.array(range(-5, 5)).astype('u1')```
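
Both goals can be combined by checking the value range before casting. The function `checked_astype` below is a hypothetical helper written for this illustration (it is not part of NumPy): it casts to a smaller integer dtype only when every value fits and raises an error otherwise.

```python
import numpy as np

def checked_astype(arr, dtype):
    """Cast to an integer dtype, refusing if any value would wrap around."""
    target = np.dtype(dtype)
    info = np.iinfo(target)
    if arr.size:
        # Compare as Python ints to avoid any overflow in the check itself.
        lo, hi = int(arr.min()), int(arr.max())
        if lo < info.min or hi > info.max:
            raise OverflowError(
                f"values in [{lo}, {hi}] do not fit in {target.name} "
                f"[{info.min}, {info.max}]"
            )
    return arr.astype(target)

weights = np.array([54, 72, 101, 88])    # example data, arbitrary values
compact = checked_astype(weights, "u1")  # fits in 0..255
print(compact.dtype, compact.nbytes)

try:
    checked_astype(np.arange(-5, 5), "u1")
    rejected = False
except OverflowError as exc:
    rejected = True
    print("refused:", exc)
```

NumPy also provides `np.min_scalar_type`, which suggests the smallest dtype able to hold a given scalar value.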
