#### Pandas Development and Documentation - Part 95

This notebook covers pandas development practices, documentation standards, and type hinting.

##### Git Workflow for Contributing

When contributing to pandas, do your work on a feature branch and keep your local `master` branch in sync with the upstream repository:

```bash
# Create and switch to a feature branch
git checkout -b shiny-new-feature

# Keep your master branch up to date
git checkout master
git pull upstream master --ff-only

# Switch back to your feature branch
git checkout shiny-new-feature
```

##### Documentation Standards

The pandas documentation is written in reStructuredText (reST) and built with Sphinx. It consists of two parts:

1. Docstrings in the code itself
2. Documentation in the `doc/` folder (tutorials, overviews, etc.)

### Docstring Example

Pandas follows a convention based on the NumPy Docstring Standard. Here's an example of a properly formatted docstring:

In [None]:
def add(num1, num2):
    """
    Add up two integer numbers.

    This function simply wraps the `+` operator and does nothing
    else of interest; it exists only to illustrate what the
    docstring of a very simple function looks like.

    Parameters
    ----------
    num1 : int
        First number to add.
    num2 : int
        Second number to add.

    Returns
    -------
    int
        The sum of `num1` and `num2`.

    See Also
    --------
    subtract : Subtract one integer from another.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(25, 0)
    25
    >>> add(10, -10)
    0
    """
    return num1 + num2
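
Because the `Examples` section is written in doctest syntax, the examples can be executed to confirm that they still produce the documented output. The following is a minimal sketch using only the standard-library `doctest` module; the helper name `check_examples` is hypothetical, and this is not necessarily how pandas validates its own docstrings.

```python
import doctest


def add(num1, num2):
    """
    Add up two integer numbers.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(25, 0)
    25
    >>> add(10, -10)
    0
    """
    return num1 + num2


def check_examples(func):
    """Run the doctest examples in ``func.__doc__``; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Expose only the function under test to the examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = check_examples(add)
print(f"{attempted} examples run, {failed} failed")
```

If an example's output drifts from the implementation, the runner prints the mismatch and the failure count becomes non-zero.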

##### Type Hinting in Pandas

Pandas uses type hints to improve code readability and enable static type checking. Here are some best practices for type hinting in pandas:

### Using Standard Type Hints

In [None]:
from typing import List, Optional, Union

# Good practice
primes: List[int] = []

# Use Optional for values that might be None
# Instead of: maybe_primes: List[Union[int, None]] = []
maybe_primes: List[Optional[int]] = []
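
To illustrate what `Optional` tells a caller, here is a small sketch. The function `first_even` is hypothetical and exists only for this example; the point is that a static checker will expect the `None` case to be handled before the result is used as an `int`.

```python
from typing import List, Optional


def first_even(numbers: List[int]) -> Optional[int]:
    """Return the first even number in ``numbers``, or None if there is none."""
    for number in numbers:
        if number % 2 == 0:
            return number
    return None


result = first_even([3, 5, 8, 9])
# Handle the None case explicitly before treating the result as an int.
if result is not None:
    print(result + 1)
```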

### Handling Shadowed Builtins

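When a class attribute has the same name as a builtin, annotations inside the class body that mention that name become ambiguous. Create an unambiguous alias for the builtin and use it in the annotation: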

In [None]:
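from typing import Optional

# Create an alias for the builtin before the class shadows the name
str_type = str

class SomeClass:
    # The attribute may be None, so the annotation is Optional
    str: Optional[str_type] = None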
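
As a rough check that the alias keeps annotations unambiguous, the sketch below adds a second attribute and inspects the resolved hints with `typing.get_type_hints`. The `label` attribute is hypothetical and added only for illustration.

```python
from typing import Optional, get_type_hints

# Alias created at module level, before the class shadows the name.
str_type = str


class SomeClass:
    str: Optional[str_type] = None  # attribute named like the builtin
    label: str_type = ""            # annotated through the alias


hints = get_type_hints(SomeClass)
# The alias still resolves to the builtin str type.
print(hints["label"] is str)
```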

### Avoiding Type Casting

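Using `cast` from the `typing` module is discouraged, because a cast overrides the type checker rather than letting it verify the code. Where possible, refactor so that the checker can infer the type on its own: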

In [None]:
from typing import Union

# Discouraged approach
'''
from typing import cast
from pandas.core.dtypes.common import is_number

def cannot_infer_bad(obj: Union[str, int, float]):
    if is_number(obj):
        ...
    else:
        obj = cast(str, obj)  # Mypy complains without this!
        return obj.upper()
'''

# Preferred approach
def cannot_infer_good(obj: Union[str, int, float]):
    if isinstance(obj, str):
        return obj.upper()
    else:
        # Handle numeric types
        pass
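
A fuller sketch of the preferred pattern, with both branches implemented, is shown here. The function `describe` and its formatting choice are assumptions made for illustration; what matters is that `isinstance` narrows the type in each branch, so no `cast` is needed.

```python
from typing import Union


def describe(obj: Union[str, int, float]) -> str:
    """Upper-case strings and format numbers with two decimal places."""
    if isinstance(obj, str):
        # Narrowed to str: string methods type-check without a cast.
        return obj.upper()
    # Narrowed to Union[int, float]: numeric formatting is safe here.
    return f"{obj:.2f}"


print(describe("abc"))
print(describe(2.5))
```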

### Pandas-specific Types

pandas defines type aliases in the private `pandas._typing` module for patterns that recur across the code base:

In [None]:
# Example of using pandas-specific types
import pandas as pd
import numpy as np

# This is conceptual code showing how pandas types would be used
# The actual _typing module is private and may change

'''
from pandas._typing import Dtype

def as_type(dtype: Dtype):
    # This function accepts various dtype formats:
    # - String like "object"
    # - NumPy dtype like np.int64
    # - Pandas ExtensionDtype like pd.CategoricalDtype
    pass
'''

# Example of different dtype formats that would be accepted
string_dtype = "object"
numpy_dtype = np.int64
pandas_dtype = pd.CategoricalDtype()
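
Outside the pandas code base, a similar effect can be approximated with public APIs only. In the sketch below, `DtypeLike` is a hypothetical local alias standing in for the private `Dtype` type, and `normalize_dtype` is an illustrative helper built on the public `pd.api.types.pandas_dtype` function.

```python
from typing import Union

import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Hypothetical stand-in for the private pandas._typing.Dtype alias.
DtypeLike = Union[str, np.dtype, type, ExtensionDtype]


def normalize_dtype(dtype: DtypeLike):
    """Convert any accepted dtype spelling to a concrete dtype object."""
    return pd.api.types.pandas_dtype(dtype)


# Each of the three spellings resolves to a dtype instance.
for spec in ("object", np.int64, pd.CategoricalDtype()):
    print(repr(spec), "->", repr(normalize_dtype(spec)))
```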

### Validating Type Hints

Pandas uses mypy to statically analyze the code base and type hints. After making changes, you can validate your type hints by running:

```bash
mypy pandas
```

##### Continuous Integration

Pandas uses Travis-CI and Azure Pipelines for continuous integration testing. A pull request is considered for merging once every CI build is green, meaning that all tests have passed.

##### Practical Example: Creating a Function with Type Hints and Documentation

In [None]:
from typing import List, Union

import pandas as pd


def filter_dataframe(
    df: pd.DataFrame,
    column: str,
    values: List[Union[str, int, float]],
    keep_na: bool = False,  # a plain bool: None is not a meaningful value here
) -> pd.DataFrame:
    """
    Keep only the rows whose value in ``column`` is in ``values``.

    Parameters
    ----------
    df : DataFrame
        The DataFrame to filter.
    column : str
        Name of the column to filter on.
    values : list of str, int or float
        Values to keep.
    keep_na : bool, default False
        Whether to also keep rows where ``column`` is missing.

    Returns
    -------
    DataFrame
        The rows of ``df`` whose ``column`` value is in ``values``.

    Examples
    --------
    >>> df = pd.DataFrame({'A': [1, 2, 3, None], 'B': ['a', 'b', 'c', 'd']})
    >>> filter_dataframe(df, 'A', [1, 3])
         A  B
    0  1.0  a
    2  3.0  c

    >>> filter_dataframe(df, 'A', [1, 3], keep_na=True)
         A  B
    0  1.0  a
    2  3.0  c
    3  NaN  d
    """
    if keep_na:
        return df[df[column].isin(values) | df[column].isna()]
    else:
        return df[df[column].isin(values)]

# Example usage
df = pd.DataFrame({
    'A': [1, 2, 3, None], 
    'B': ['a', 'b', 'c', 'd']
})

print("Original DataFrame:")
print(df)
print("\nFiltered DataFrame (without NAs):")
print(filter_dataframe(df, 'A', [1, 3]))
print("\nFiltered DataFrame (with NAs):")
print(filter_dataframe(df, 'A', [1, 3], keep_na=True))
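
Printing results is useful for exploration, but a contribution would normally come with a test. The following is a minimal sketch of such a test using the public `pandas.testing.assert_frame_equal` helper; the test name is hypothetical, and `filter_dataframe` is repeated in condensed form only so that the block runs on its own.

```python
import pandas as pd
import pandas.testing as tm  # public testing helpers


def filter_dataframe(df, column, values, keep_na=False):
    """Condensed copy of the function above so this block is self-contained."""
    mask = df[column].isin(values)
    if keep_na:
        mask = mask | df[column].isna()
    return df[mask]


def test_filter_dataframe_keep_na():
    df = pd.DataFrame({"A": [1, 2, 3, None], "B": ["a", "b", "c", "d"]})
    result = filter_dataframe(df, "A", [1, 3], keep_na=True)
    # Build the expectation from the original frame so dtypes and index match.
    expected = df.loc[[0, 2, 3]]
    tm.assert_frame_equal(result, expected)


test_filter_dataframe_keep_na()
```

Deriving `expected` from the input frame avoids hard-coding the float dtype that the `None` entry forces on column `A`.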