# Pandas Functions and Commands - Part 94

This notebook covers the pandas extension API (`ExtensionDtype` and `ExtensionArray`) and explains how to set up a pandas development environment.

## ExtensionDtype Attributes and Methods

The `ExtensionDtype` class provides a framework for creating custom data types in pandas. Here are its key attributes and methods:

In [None]:
import pandas as pd
import numpy as np
from pandas.api.extensions import ExtensionDtype, ExtensionArray

### Attributes of ExtensionDtype

- **kind**: A character code (one of 'biufcmMOSUV'); the default is 'O'. It should match the `kind` of the NumPy dtype that results when the array is converted to an ndarray.
- **na_value**: The default NA value for this type. This is the user-facing "boxed" version of the NA value.
- **name**: A string identifying the data type. It is displayed by `Series.dtype`.
- **names**: An ordered list of field names, or `None` if there are no fields. Provided for compatibility with NumPy arrays.
- **type**: The scalar type for the array. `ExtensionArray[item]` should return an instance of this type for scalar items.
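
To make these attributes concrete, the sketch below reads them from a built-in extension dtype, `pd.Int64Dtype`, rather than from a custom class. The choice of `Int64Dtype` is only for illustration; any extension dtype exposes the same attributes.

```python
import pandas as pd

# A built-in extension dtype, used here only to illustrate the attributes.
dtype = pd.Int64Dtype()

attributes = {
    "kind": dtype.kind,          # NumPy-style kind code
    "na_value": dtype.na_value,  # user-facing missing-value marker
    "name": dtype.name,          # string shown by Series.dtype
    "names": dtype.names,        # field names, None for non-structured types
    "type": dtype.type,          # scalar type returned by item access
}
for key, value in attributes.items():
    print(f"{key}: {value!r}")

# Scalar items taken from the array are instances of dtype.type.
scalar = pd.array([1, 2], dtype=dtype)[0]
print(isinstance(scalar, dtype.type))
```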

### Methods of ExtensionDtype

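- **construct_array_type()**: Return the `ExtensionArray` subclass associated with this dtype.
- **construct_from_string(string)**: Construct an instance of this dtype from a string such as its `name`; raise `TypeError` if the string cannot be parsed.
- **is_dtype(dtype)**: Check whether `dtype` (a dtype instance, a string alias, or an object with a `dtype` attribute) matches this type.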
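
The three methods can be tried on a built-in dtype before writing a custom one. The following sketch again uses `pd.Int64Dtype` as a stand-in; the keys of the `checks` dictionary are just labels chosen for this example.

```python
import numpy as np
import pandas as pd

# The array class that stores values of this dtype.
array_type = pd.Int64Dtype.construct_array_type()
print(array_type)

# Build a dtype instance from its string name.
from_string = pd.Int64Dtype.construct_from_string("Int64")
print(from_string == pd.Int64Dtype())

# is_dtype accepts strings, dtype instances, and objects with a .dtype attribute.
checks = {
    "string alias": pd.Int64Dtype.is_dtype("Int64"),
    "dtype instance": pd.Int64Dtype.is_dtype(pd.Int64Dtype()),
    "Series": pd.Int64Dtype.is_dtype(pd.Series([1, 2], dtype="Int64")),
    "NumPy int64": pd.Int64Dtype.is_dtype(np.dtype("int64")),
}
for label, result in checks.items():
    print(f"{label}: {result}")
```

Note that the NumPy `int64` dtype is not considered a match: the nullable `Int64` extension dtype and NumPy's `int64` are distinct types.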

### Example of a Custom ExtensionDtype

In [None]:
# Example of a simple custom ExtensionDtype for money values
class MoneyDtype(ExtensionDtype):
    name = 'money'
    kind = 'O'
    type = object
    na_value = None
    
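    # Attributes listed in _metadata define equality and hashing of instances;
    # pandas expects this to be a class-level tuple
    _metadata = ('currency',)

    def __init__(self, currency='USD'):
        self.currency = currency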
    
    @classmethod
    def construct_from_string(cls, string):
        # pandas expects a TypeError for anything that cannot be parsed
        if not isinstance(string, str):
            raise TypeError(f"Expected a string, got {type(string).__name__}")
        if string == cls.name:
            return cls()
        prefix = cls.name + '['
        if string.startswith(prefix) and string.endswith(']'):
            # Parse the 'money[EUR]' format
            return cls(currency=string[len(prefix):-1])
        raise TypeError(f"Cannot construct {cls.name} from {string}")
    
    @classmethod
    def construct_array_type(cls):
        # This would normally return the associated ExtensionArray class
        return MoneyArray

# This is a simplified example and would need more implementation
# to be fully functional as an ExtensionArray
class MoneyArray(ExtensionArray):
    pass

# Create and display information about our custom dtype
money_dtype = MoneyDtype(currency='EUR')
print(f"Name: {money_dtype.name}")
print(f"Currency: {money_dtype.currency}")
print(f"Kind: {money_dtype.kind}")
print(f"Type: {money_dtype.type}")
print(f"NA value: {money_dtype.na_value}")
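
A parameterized dtype normally lists its parameters in `_metadata`, which the `ExtensionDtype` base class uses to implement equality and hashing. The sketch below demonstrates this with a hypothetical `CurrencyDtype`, defined separately from `MoneyDtype` so that the block runs on its own.

```python
from pandas.api.extensions import ExtensionDtype


class CurrencyDtype(ExtensionDtype):
    """Hypothetical dtype used only to illustrate _metadata."""

    name = "currency"
    type = object
    # Instances compare equal when every attribute named here is equal.
    _metadata = ("currency",)

    def __init__(self, currency="USD"):
        self.currency = currency


usd = CurrencyDtype("USD")
eur = CurrencyDtype("EUR")

print(usd == CurrencyDtype("USD"))              # same parameters
print(usd == eur)                               # different parameters
print(hash(usd) == hash(CurrencyDtype("USD")))  # hashing follows _metadata
```

Because equal instances also hash equally, such dtypes can safely be used as dictionary keys or set members.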

## Pandas Development Environment Setup

This section provides instructions for setting up a pandas development environment. This is useful if you want to contribute to pandas or build pandas from source.

### Prerequisites

For building pandas from source, you need:

1. A C compiler
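2. A Python environment (3.6.1 or higher for the pandas version these instructions were written for; newer pandas releases require newer Python versions)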

#### Installing a C Compiler

**For Debian/Ubuntu:**
```bash
sudo apt install build-essential
```

**For Red Hat/RHEL/CentOS/Fedora:**
```bash
sudo yum groupinstall "Development Tools"
```
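
Before building, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the pandas tooling: the helper name `check_prerequisites` and the list of compiler executable names are assumptions made for this example.

```python
import shutil
import sys

# Minimum Python version quoted in the prerequisites above.
MIN_PYTHON = (3, 6, 1)

# Common compiler executable names; this list is an assumption, not exhaustive.
CANDIDATE_COMPILERS = ("cc", "gcc", "clang", "cl")


def check_prerequisites():
    """Return a dict describing whether the basic build prerequisites are met."""
    found = {name: shutil.which(name) for name in CANDIDATE_COMPILERS}
    return {
        "python_ok": sys.version_info[:3] >= MIN_PYTHON,
        # Keep only the compilers that were actually found on PATH.
        "compilers": {name: path for name, path in found.items() if path},
    }


report = check_prerequisites()
print(report)
```

An empty `compilers` entry suggests that no C compiler is on `PATH`, although a compiler installed under a different name would not be detected by this check.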

### Setting up with Conda

```bash
# Create and activate the build environment
conda env create -f environment.yml
conda activate pandas-dev

# Build and install pandas
python setup.py build_ext --inplace -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517
```

To verify the installation:
```python
import pandas
print(pandas.__version__)
```
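
The version string alone does not show which copy of pandas was imported. As a rough additional check, the sketch below prints the directory the package was loaded from; for an in-place development build this is expected to be inside your git checkout. The `site-packages` test is a heuristic, not an official verification step.

```python
import os

import pandas as pd

# Directory that the imported pandas package was loaded from.
package_dir = os.path.dirname(os.path.abspath(pd.__file__))
print(f"pandas {pd.__version__} loaded from {package_dir}")

# Heuristic: a regular (non-development) install usually lives in site-packages.
installed_in_site_packages = "site-packages" in package_dir
print(f"Loaded from site-packages: {installed_in_site_packages}")
```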

Conda environment management:
```bash
# View environments
conda info -e

# Return to the base (root) environment
conda deactivate
```

### Setting up with pip (Unix/Mac OS)

```bash
# Create a virtual environment
python3 -m venv ~/virtualenvs/pandas-dev

# Activate the virtualenv
. ~/virtualenvs/pandas-dev/bin/activate

# Install the build dependencies
python -m pip install -r requirements-dev.txt

# Build and install pandas
python setup.py build_ext --inplace -j 0
python -m pip install -e . --no-build-isolation --no-use-pep517
```

### Setting up with pip (Windows)

```powershell
# Create a virtual environment
python -m venv $env:USERPROFILE\virtualenvs\pandas-dev

# Activate the virtualenv
~\virtualenvs\pandas-dev\Scripts\Activate.ps1

# Install the build dependencies
python -m pip install -r requirements-dev.txt

# Build and install pandas
python setup.py build_ext --inplace -j 0
python -m pip install -e . --no-build-isolation --no-use-pep517
```

### Git Workflow for Contributing

When contributing to pandas, it's recommended to create a feature branch for your changes:

```bash
# Create and switch to a new branch
git branch shiny-new-feature
git checkout shiny-new-feature

# Or in one command
git checkout -b shiny-new-feature
```

## Practical Example: Using ExtensionDtype

Let's see how we might use a custom dtype in a practical scenario:

In [None]:
# Conceptual example only: it assumes a fully implemented MoneyArray class
# and will not run with the stub defined earlier, so it is kept inside a
# string literal. With a complete implementation, we could build a Series
# of money values like this:
'''
money_values = MoneyArray([10.50, 20.75, 30.00], dtype=MoneyDtype(currency='USD'))
s = pd.Series(money_values)

# The Series would display with currency information
print(s)
# 0    $10.50
# 1    $20.75
# 2    $30.00
# dtype: money[USD]

# And operations would respect the currency
print(s * 2)
# 0    $21.00
# 1    $41.50
# 2    $60.00
# dtype: money[USD]
'''
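
For comparison, here is a stripped-down sketch of an array that pandas can actually store in a Series. The names `SketchMoneyDtype` and `SketchMoneyArray` are hypothetical, values are kept as plain `float64` with NaN as the missing value, and currency formatting, arithmetic, and comparison operators are deliberately left out. It is an illustration of the required interface, not a complete implementation.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype, take


class SketchMoneyDtype(ExtensionDtype):
    """Hypothetical dtype for this sketch only."""

    name = "sketch_money"
    type = float
    kind = "f"
    na_value = np.nan

    @classmethod
    def construct_array_type(cls):
        return SketchMoneyArray


class SketchMoneyArray(ExtensionArray):
    """Hypothetical array backed by a float64 NumPy array."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(scalars)

    @classmethod
    def _from_factorized(cls, values, original):
        return cls(values)

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls(np.concatenate([arr._data for arr in to_concat]))

    @property
    def dtype(self):
        return SketchMoneyDtype()

    @property
    def nbytes(self):
        return self._data.nbytes

    def __len__(self):
        return len(self._data)

    def __getitem__(self, item):
        # Integer positions return a scalar; everything else returns an array.
        if isinstance(item, (int, np.integer)):
            return self._data[item]
        return type(self)(self._data[item])

    def isna(self):
        return np.isnan(self._data)

    def take(self, indices, *, allow_fill=False, fill_value=None):
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        result = take(self._data, indices, allow_fill=allow_fill, fill_value=fill_value)
        return type(self)(result)

    def copy(self):
        return type(self)(self._data.copy())


arr = SketchMoneyArray([10.50, 20.75, 30.00])
s = pd.Series(arr)
print(s.dtype)
print(s.iloc[1])

# With allow_fill=True, -1 marks a missing slot that is filled with na_value.
filled = arr.take([0, -1], allow_fill=True)
print(filled.isna())
```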

## Using Existing Extension Arrays

Writing a custom extension array is an advanced topic, but pandas ships with several built-in extension arrays that you can use directly:

In [None]:
# Categorical data
cat = pd.Categorical(['a', 'b', 'a', 'c'], categories=['a', 'b', 'c', 'd'])
s_cat = pd.Series(cat)
print("Categorical Series:")
print(s_cat)
print(f"dtype: {s_cat.dtype}\n")

# Integer with NA support
s_int = pd.Series([1, 2, None, 4], dtype="Int64")
print("Integer Series with NA:")
print(s_int)
print(f"dtype: {s_int.dtype}\n")

# Boolean with NA support
s_bool = pd.Series([True, False, None, True], dtype="boolean")
print("Boolean Series with NA:")
print(s_bool)
print(f"dtype: {s_bool.dtype}\n")

# String data
s_str = pd.Series(['a', 'b', None, 'd'], dtype="string")
print("String Series:")
print(s_str)
print(f"dtype: {s_str.dtype}")
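
One practical reason to prefer the nullable dtypes is that missing values do not force a change of dtype. The short sketch below contrasts the `Int64` extension dtype with the default NumPy-backed inference for the same data.

```python
import pandas as pd

# Nullable integer Series: the missing entry is stored as <NA>.
s = pd.Series([1, 2, None, 4], dtype="Int64")

total = s.sum()   # missing values are skipped by default
doubled = s * 2   # arithmetic keeps the Int64 dtype and the <NA> slot

# Without an explicit dtype, the same data is inferred as float64 with NaN.
as_float = pd.Series([1, 2, None, 4])

print(total)
print(doubled)
print(as_float.dtype)
```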