In [1]:
## Import statements
import numpy as np

## Custom Data Types ([Docs](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html))

Yes, you heard (or, more likely, read) that right. NumPy provides the `np.dtype()` constructor, which you can use to create your own data types. So you may be asking: how is this useful?

This is especially useful when you work with **structured data** in NumPy. Regular NumPy arrays are homogeneous: every element in the array has the same data type. This is restrictive when you're dealing with structured data that combines several kinds of information, such as strings, integers, and floats. This is where structured arrays with named fields come into play.
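As a quick illustration of that limitation, the sketch below mixes a product name and a price in an ordinary array. NumPy has to pick one common dtype for both values, so it falls back to a Unicode string type and the price ends up stored as text rather than as a number.

```python
import numpy as np

# An ordinary (non-structured) array mixing a string and a float
mixed = np.array(["Laptop", 1200.00])

# NumPy chooses one common dtype for every element: a Unicode string type ("U")
print(mixed.dtype.kind)  # U

# The price is now a string, so arithmetic needs an explicit conversion first
print(type(mixed[1]))
print(float(mixed[1]) * 2)  # 2400.0
```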

Structured arrays, whose element type is a structured data type built with `np.dtype()`, let you store values of different data types within the same array. Each element is a record, and each named field of that record can have its own data type. This lets you represent structured data effectively while still benefiting from NumPy's performance optimizations.
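Strictly speaking, a structured array still has a single dtype; that dtype is simply a compound one made of named fields. The sketch below builds the same two-field layout in two forms that `np.dtype()` accepts (a list of `(name, type)` tuples and a dict of `names`/`formats`) and then inspects it. The field names `name` and `price` are chosen here only for illustration.

```python
import numpy as np

# List-of-tuples form: (field name, field type) pairs
dt_list = np.dtype([("name", "U50"), ("price", "float64")])

# Equivalent dict form with parallel "names" and "formats" lists
dt_dict = np.dtype({"names": ["name", "price"], "formats": ["U50", "float64"]})

print(dt_list == dt_dict)  # True: both describe the same record layout
print(dt_list.names)       # ('name', 'price')
print(dt_list["price"])    # float64, the type of a single field
print(dt_list.itemsize)    # 208 bytes per record: 50 chars * 4 bytes + 8 bytes
```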

This can also be helpful in situations where the built-in data types provided by NumPy don't precisely meet your needs. Creating a new data type with `np.dtype()` can offer benefits like **memory optimization, improved data organization, and customized behavior** for your data. Below we briefly discuss these benefits with examples.
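For the memory-optimization point, here is a minimal sketch: the size of each record is the sum of its field sizes, so choosing narrower field types shrinks the whole array. The `id`/`score` fields are hypothetical, and the narrow layout is only appropriate if your values fit in 32-bit integers and `float32` precision is acceptable.

```python
import numpy as np

# Wide layout: 64-bit fields everywhere
wide = np.dtype([("id", "int64"), ("score", "float64")])
# Narrow layout: 32-bit fields (assumes ids fit in int32 and float32 precision is enough)
narrow = np.dtype([("id", "int32"), ("score", "float32")])

print(wide.itemsize, narrow.itemsize)  # 16 8

# One million records in the narrow layout
records = np.zeros(1_000_000, dtype=narrow)
print(records.nbytes)  # 8000000, half of what the wide layout would need

# Unicode string fields cost 4 bytes per character
print(np.dtype("U50").itemsize)  # 200
```

Because `"U"` fields store 4 bytes per character, long string fields often dominate the record size, so their maximum length is worth choosing carefully.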

### *Custom data types for data organization*

You can define structured data types that contain multiple fields with different data types. This is particularly useful when you have structured data, like records with multiple attributes.

Suppose you have a dataset of products, each with attributes like name, price, and category. By defining a structured data type, you can organize your data easily and effectively.

In [2]:
# Define a structured data type for product information
product_dtype = np.dtype(
    [
        ("name", "U50"),  # Name of the product (up to 50 characters)
        ("price", "float64"),  # Price of the product
        ("category", "U20"),  # Category of the product (up to 20 characters)
    ]
)

# Create an array of products using the custom data type
products = np.array(
    [
        ("Laptop", 1200.00, "Electronics"),
        ("T-shirt", 25.00, "Apparel"),
        ("Coffee Maker", 80.00, "Appliances"),
        ("Headphones", 150.00, "Electronics"),
    ],
    dtype=product_dtype,
)

In [3]:
products

array([('Laptop', 1200., 'Electronics'), ('T-shirt',   25., 'Apparel'),
       ('Coffee Maker',   80., 'Appliances'),
       ('Headphones',  150., 'Electronics')],
      dtype=[('name', '<U50'), ('price', '<f8'), ('category', '<U20')])

**Structured arrays let you access and manipulate data using field names as indices.** This allows you to filter, modify, and analyze data based on specific attributes.

In [4]:
products["category"]

array(['Electronics', 'Apparel', 'Appliances', 'Electronics'],
      dtype='<U20')
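Field names can also be combined with ordinary indexing. The sketch below rebuilds the `products` array so it runs on its own, then reads a single record, reads one field of that record, and selects a subset of fields at once.

```python
import numpy as np

# Same layout and data as product_dtype / products above
product_dtype = np.dtype([("name", "U50"), ("price", "float64"), ("category", "U20")])
products = np.array(
    [
        ("Laptop", 1200.00, "Electronics"),
        ("T-shirt", 25.00, "Apparel"),
        ("Coffee Maker", 80.00, "Appliances"),
        ("Headphones", 150.00, "Electronics"),
    ],
    dtype=product_dtype,
)

first = products[0]                   # a single record
print(first["name"], first["price"])  # Laptop 1200.0

# Selecting several fields at once keeps only those fields
name_price = products[["name", "price"]]
print(name_price.dtype.names)         # ('name', 'price')
```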

This ability to organize and access data by meaningful field names improves code readability and maintainability. It also lets you perform operations and calculations on specific fields, giving you fine-grained control over data manipulation.
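As a small sketch of such field-level operations, the code below applies a hypothetical 10% discount to electronics only, then sorts the records by price with `np.sort(..., order=...)`. Because `products["price"]` is a view into the array, assigning through it updates `products` in place.

```python
import numpy as np

product_dtype = np.dtype([("name", "U50"), ("price", "float64"), ("category", "U20")])
products = np.array(
    [
        ("Laptop", 1200.00, "Electronics"),
        ("T-shirt", 25.00, "Apparel"),
        ("Coffee Maker", 80.00, "Appliances"),
        ("Headphones", 150.00, "Electronics"),
    ],
    dtype=product_dtype,
)

# Hypothetical 10% discount on electronics; products["price"] is a view,
# so this modifies the prices stored in `products` itself
is_electronics = products["category"] == "Electronics"
products["price"][is_electronics] *= 0.9

# Sort whole records by one field
by_price = np.sort(products, order="price")
print(by_price["name"])  # ['T-shirt' 'Coffee Maker' 'Headphones' 'Laptop']
```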

In [5]:
# Filter products based on category
electronics_products = products[products["category"] == "Electronics"]

# Filter expensive products
expensive_products = products[products["price"] > 100.00]

# Calculate the average price of electronics products
average_electronics_price = np.mean(electronics_products["price"])

print("Electronics Products:")
print(electronics_products)
print("\nExpensive Products:")
print(expensive_products)
print("\nAverage Electronics Price:", average_electronics_price)

Electronics Products:
[('Laptop', 1200., 'Electronics') ('Headphones',  150., 'Electronics')]

Expensive Products:
[('Laptop', 1200., 'Electronics') ('Headphones',  150., 'Electronics')]

Average Electronics Price: 675.0


In this example, the structured data type `product_dtype` allows you to easily filter products based on their categories and prices. By accessing fields using their names (e.g., `products['category']`), you can perform queries and operations specifically on those fields.
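Conditions on different fields can also be combined into a single query with `&` (and) or `|` (or). Each comparison needs its own parentheses, because these operators bind more tightly than `==` and `<`. A minimal sketch, reusing the same product data:

```python
import numpy as np

product_dtype = np.dtype([("name", "U50"), ("price", "float64"), ("category", "U20")])
products = np.array(
    [
        ("Laptop", 1200.00, "Electronics"),
        ("T-shirt", 25.00, "Apparel"),
        ("Coffee Maker", 80.00, "Appliances"),
        ("Headphones", 150.00, "Electronics"),
    ],
    dtype=product_dtype,
)

# Electronics that cost less than 500
mask = (products["category"] == "Electronics") & (products["price"] < 500.00)
print(products[mask]["name"])  # ['Headphones']

# Count matching records directly from the boolean mask
print(np.count_nonzero(products["price"] > 100.00))  # 2
```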

This feature makes your data easier to organize, query, and analyze, especially in complex datasets. It also helps you avoid manually iterating through arrays to extract relevant information: a boolean mask over a field is evaluated by NumPy in compiled code, which is shorter to write and typically much faster than an explicit Python loop.
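To make the comparison concrete, the sketch below finds the names of products priced above 100 twice: once with an explicit Python loop over the records and once with a boolean mask on the `price` field. Both approaches return the same names; the mask version avoids per-record Python overhead.

```python
import numpy as np

product_dtype = np.dtype([("name", "U50"), ("price", "float64"), ("category", "U20")])
products = np.array(
    [
        ("Laptop", 1200.00, "Electronics"),
        ("T-shirt", 25.00, "Apparel"),
        ("Coffee Maker", 80.00, "Appliances"),
        ("Headphones", 150.00, "Electronics"),
    ],
    dtype=product_dtype,
)

# Manual approach: iterate over the records in Python
loop_result = [p["name"] for p in products if p["price"] > 100.00]

# Vectorized approach: one boolean mask over the price field
mask_result = products["name"][products["price"] > 100.00]

print(loop_result == list(mask_result))  # True
```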

#### **`Caution`**

While both NumPy structured arrays and pandas DataFrames provide ways to work with structured data, they serve different purposes. NumPy structured arrays focus on efficient numerical computation and compact memory layout, whereas pandas DataFrames offer a higher-level interface for data manipulation and analysis (grouping, joining, handling missing values, and so on), which makes them better suited to data exploration and complex operations. The right choice depends on your project's requirements: if you need extensive data manipulation and analysis capabilities, a library like pandas is likely the better fit.
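If you start with a structured array and later need pandas, moving between the two is straightforward. The sketch below assumes pandas is installed: `pd.DataFrame` accepts a structured array and uses its field names as column names, and `DataFrame.to_records(index=False)` converts back to a NumPy record array.

```python
import numpy as np
import pandas as pd  # assumption: pandas is available in your environment

product_dtype = np.dtype([("name", "U50"), ("price", "float64"), ("category", "U20")])
products = np.array(
    [
        ("Laptop", 1200.00, "Electronics"),
        ("T-shirt", 25.00, "Apparel"),
        ("Coffee Maker", 80.00, "Appliances"),
        ("Headphones", 150.00, "Electronics"),
    ],
    dtype=product_dtype,
)

# Structured array -> DataFrame: field names become column names
df = pd.DataFrame(products)

# A grouped aggregation that would take more manual work with plain NumPy
avg_by_category = df.groupby("category")["price"].mean()
print(avg_by_category)

# DataFrame -> NumPy record array, dropping the DataFrame index
back = df.to_records(index=False)
print(back.dtype.names)  # ('name', 'price', 'category')
```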