
In [28]:
import numpy as np # Import the numpy library

In [29]:
arr = np.array([23,4,3,23]) # Create a NumPy array with integer elements

In [30]:
'''1. Common Data Types in NumPy
int32, int64: Integer types with different bit sizes.
float32, float64: Floating-point types with different precision.
bool: Boolean data type.
complex64, complex128: Complex number types.
object: For arbitrary Python objects (e.g., lists, dicts, variable-length strings).
You can check the dtype of a NumPy array using the .dtype attribute.'''
arr.dtype # Check the data type of the array

dtype('int64')
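
When no dtype is given, NumPy infers one from the input values. The sketch below shows a few inferred types. The variable names are illustrative, and the default integer type is platform-dependent (it was int64 in the run above), so it is left out here.

```python
import numpy as np

# NumPy infers the dtype from the values when none is specified.
inferred_float = np.array([1.5, 2.0, 3.25])      # Python floats -> float64
inferred_bool = np.array([True, False, True])    # Python bools -> bool
inferred_complex = np.array([1 + 2j, 3 - 4j])    # Python complex -> complex128
inferred_mixed = np.array([1, 2.5])              # int mixed with float -> upcast to float64

for name, a in [("inferred_float", inferred_float),
                ("inferred_bool", inferred_bool),
                ("inferred_complex", inferred_complex),
                ("inferred_mixed", inferred_mixed)]:
    # itemsize is the number of bytes used by one element
    print(name, a.dtype, a.itemsize)
```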

In [31]:
'''2. Changing Data Types
You can cast (convert) an array to another data type with the .astype() method, which returns a new array and leaves the original unchanged. This is useful when an operation requires a specific type or when you want to reduce memory usage.

Example: Changing Data Types'''
arr.astype(np.float64) # Convert the array to float64 data type

array([23.,  4.,  3., 23.])
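
Two details of astype() are easy to miss: it returns a new array rather than modifying the original, and casting floats to integers truncates toward zero instead of rounding. A minimal sketch, with illustrative values:

```python
import numpy as np

arr = np.array([23, 4, 3, 23])

# astype() returns a new array; the original keeps its dtype.
as_float = arr.astype(np.float64)
print(arr.dtype.kind)    # 'i' (signed integer; the exact width is platform-dependent)
print(as_float.dtype)    # float64

# Casting floats to integers truncates toward zero; it does not round.
values = np.array([1.7, -1.7, 2.5])
truncated = values.astype(np.int32)
print(truncated.tolist())    # [1, -1, 2]

# Round explicitly first if rounding is what you want.
# np.rint rounds halves to the nearest even integer, so 2.5 becomes 2.
rounded = np.rint(values).astype(np.int32)
print(rounded.tolist())      # [2, -2, 2]
```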

In [32]:
'''3. Why Data Types Matter in NumPy
The choice of data type affects:

Memory Usage: Smaller data types use less memory.
Performance: Operations on smaller data types are often faster because less data moves through memory and cache.
Precision: Smaller types hold fewer significant digits and a narrower range, so pick the smallest type that still represents your data accurately (e.g., float32 is fine when about 7 significant decimal digits suffice; otherwise use float64).
Example: Memory Usage'''
arr_int64 = np.array([1, 2, 3], dtype=np.int64)
arr_int32 = np.array([1, 2, 3], dtype=np.int32)

print(arr_int64.nbytes)  # Output: 24 bytes (3 elements * 8 bytes each)
print(arr_int32.nbytes)  # Output: 12 bytes (3 elements * 4 bytes each)

24
12
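
The same relationship holds for any dtype: nbytes equals the number of elements times itemsize. The sketch below tabulates it for a few common types; the array length is an arbitrary choice for illustration.

```python
import numpy as np

n = 1_000_000  # illustrative array length
sizes = {}
for dt in (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64):
    a = np.zeros(n, dtype=dt)
    # nbytes always equals size * itemsize
    assert a.nbytes == a.size * a.itemsize
    sizes[np.dtype(dt).name] = a.nbytes
    print(f"{np.dtype(dt).name:8s} {a.itemsize} bytes/element, {a.nbytes / 1e6:.0f} MB total")
```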


In [33]:
arr2 = np.array([23,4,3,23], dtype=np.float32) # Create a NumPy array with float32 data type
arr2 # Display the array

array([23.,  4.,  3., 23.], dtype=float32)

In [47]:
arr2.nbytes # Bytes consumed by arr2 (4 float32 elements * 4 bytes each)

16

In [51]:
arr.nbytes # Bytes consumed by the integer array arr (4 int64 elements * 8 bytes each)

32

In [37]:
'''4. Complex Numbers
NumPy also supports complex numbers, which consist of a real and imaginary part. You can store complex numbers using complex64 or complex128 data types.

Example: Complex Numbers'''
complex_arr = np.array([1 + 2j, 3 - 4j, 5 + 0j, 0 - 6j], dtype=np.complex128) # Create a NumPy array with complex128 data type
display(complex_arr) # Display the complex array

array([1.+2.j, 3.-4.j, 5.+0.j, 0.-6.j])
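
Complex arrays expose their parts through the .real and .imag attributes, and each element stores two floats. A short sketch, using illustrative values:

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j], dtype=np.complex128)

print(z.real)      # real parts as float64
print(z.imag)      # imaginary parts as float64
print(np.abs(z))   # magnitudes; the magnitude of 3-4j is 5.0
print(z.conj())    # complex conjugates

# complex128 stores two float64 values per element; complex64 stores two float32 values.
z64 = z.astype(np.complex64)
print(z.itemsize)      # 16
print(z64.itemsize)    # 8
```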

In [38]:
'''5. Object Data Type
If you need to store mixed values or arbitrary Python objects in one array, you can use dtype=object. The array then holds references to Python objects rather than raw values, which gives up most of NumPy's speed and memory advantages, so use it only when necessary.

Example: Object Data Type'''
object_arr = np.array([1, 'hello', [1, 2, 3], {'a': 1}], dtype=object) # Create a NumPy array with object data type
display(object_arr) # Display the object array

array([1, 'hello', list([1, 2, 3]), {'a': 1}], dtype=object)
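
The performance cost comes from how object arrays work: they hold references to ordinary Python objects, and operations are dispatched to each element's own Python operator. A minimal sketch of that behavior:

```python
import numpy as np

object_arr = np.array([1, 'hello', [1, 2, 3], {'a': 1}], dtype=object)

# Each element is an ordinary Python object and keeps its own type.
type_names = [type(x).__name__ for x in object_arr]
print(type_names)    # ['int', 'str', 'list', 'dict']

# The array stores references, so itemsize is the size of a pointer,
# not the size of the objects themselves.
print(object_arr.itemsize)

# Arithmetic is dispatched to each element's Python operator.
# Here `* 2` doubles the int but repeats the str and the list.
doubled = object_arr[:3] * 2
print(doubled.tolist())    # [2, 'hellohello', [1, 2, 3, 1, 2, 3]]
```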

In [39]:
'''6. String Data Type in NumPy
Although NumPy arrays typically store numerical data, they can also hold strings via dtype='str' or a fixed-width Unicode dtype such as 'U10'. Every element reserves space for the full declared width (4 bytes per character), and longer strings are silently truncated, so for general text processing plain Python lists of str are usually the better fit.

Example: String Array'''
str_arr = np.array(['apple', 'banana', 'cherry'], dtype='U10')  # Unicode string array ('U10' holds up to 10 characters per element); a separate name keeps the integer array arr intact
print(str_arr) # Print the string array

['apple' 'banana' 'cherry']
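
The fixed width has two practical consequences: every element occupies the full declared width, and strings that do not fit are cut off without a warning. A short sketch, with illustrative variable names:

```python
import numpy as np

# Without an explicit width, NumPy sizes the dtype to the longest string.
fruits = np.array(['apple', 'banana', 'cherry'])
print(fruits.dtype == np.dtype('U6'))    # True: the longest strings have 6 characters

# Each character takes 4 bytes, so a 'U10' element takes 40 bytes.
padded = np.array(['apple', 'banana', 'cherry'], dtype='U10')
print(padded.itemsize, padded.nbytes)    # 40 120

# Strings longer than the declared width are truncated without warning.
short = np.array(['apple', 'banana', 'cherry'], dtype='U3')
print(short.tolist())                    # ['app', 'ban', 'che']

# Assigning into an existing array truncates too.
fruits[0] = 'watermelon'
print(fruits[0])                         # waterm
```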


### Pros and Cons of NumPy Data Types

Choosing the appropriate data type in NumPy is crucial for optimizing memory usage, improving performance, and ensuring data integrity. Here's a breakdown of the pros and cons:

**Pros:**

*   **Memory Efficiency:** Using the correct data type can significantly reduce memory usage. For example, using `float32` instead of `float64` for floating-point numbers can halve the memory required. Similarly, using smaller integer types (`int8`, `int16`, `int32`) when appropriate can save memory.
*   **Performance:** Operations on arrays with smaller, fixed-size data types are often faster, mainly because less data has to move through memory and cache. The largest gap is between fixed-size numeric types and `object` arrays, whose operations fall back to per-element Python calls.
*   **Data Integrity:** Specifying a data type ensures that the array stores data in a consistent format, preventing unexpected behavior or errors that might arise from mixed data types.
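
One way to get the memory savings without risking out-of-range values is to check the data against the target type's limits before downcasting. The sketch below uses `np.iinfo` for that; `fits_in` and `readings` are hypothetical names introduced only for illustration, not part of NumPy.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every value lies within the range of the integer dtype.

    Hypothetical helper for illustration; not part of NumPy.
    """
    info = np.iinfo(dtype)
    return bool(values.min() >= info.min and values.max() <= info.max)

readings = np.array([12, 250, 97, 3], dtype=np.int64)  # illustrative data

print(fits_in(readings, np.int8))     # False: 250 exceeds the int8 maximum of 127
print(fits_in(readings, np.uint8))    # True: all values are within 0..255

# Downcast only after the range check has passed.
compact = readings.astype(np.uint8)
print(readings.nbytes, compact.nbytes)    # 32 4
```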

**Cons:**

*   **Data Loss/Precision Issues:** Using a data type with insufficient range or precision can lead to data loss or inaccurate results. For example, casting 300 to `int8` with `astype()` wraps around silently to 44, and array arithmetic that exceeds an integer type's range wraps the same way. `float32` carries only about 7 significant decimal digits, so it may be unsuitable for calculations that require high precision.
*   **Complexity with Mixed Data Types:** While NumPy arrays are designed to be homogeneous (contain elements of the same data type), the `object` data type allows for heterogeneous arrays. However, this comes at the cost of performance and memory efficiency, as NumPy cannot optimize operations on `object` arrays as effectively as it can for arrays with fixed-size data types.
*   **Type Casting Overhead:** Converting between data types using `astype()` can introduce some overhead, especially for large arrays. It's generally more efficient to create arrays with the desired data type from the beginning if possible.
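
The data-loss cases above are easy to reproduce. The following sketch shows integer wraparound and `float32` rounding, using values chosen for illustration:

```python
import numpy as np

# Casting out-of-range integers with astype() wraps around silently.
big = np.array([100, 200, 300])
wrapped = big.astype(np.int8)
print(wrapped.tolist())    # [100, -56, 44]

# Arithmetic on small integer arrays wraps as well.
near_limit = np.array([127], dtype=np.int8)
one = np.array([1], dtype=np.int8)
overflowed = near_limit + one
print(overflowed.tolist())    # [-128]

# float32 has a 24-bit significand, so not every integer above 2**24 is representable.
x64 = np.array([16_777_217], dtype=np.float64)
x32 = x64.astype(np.float32)
print(x64[0] == 16_777_217)    # True
print(x32[0] == 16_777_216)    # True: the value was rounded to the nearest float32
```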

In summary, selecting the data type that best suits the nature and range of your data while considering the trade-offs in terms of memory, performance, and precision is essential.