# Pandas Tutorial - Part 34

This notebook covers:
- More on Pandas options and settings
- Enhancing performance with pandas.eval()

In [None]:
import pandas as pd
import numpy as np
import time

%matplotlib inline

## Pandas Options and Settings (Continued)

Continuing from Part 33, let's explore more about pandas options and settings.

In [None]:
# Reset all display options
pd.reset_option("^display")

### Using option_context

The `option_context` context manager runs a block of code with the given option values. The previous values are restored automatically when you exit the `with` block.

In [None]:
# Using option_context to temporarily change options
with pd.option_context("display.max_rows", 10, "display.max_columns", 5):
    print(pd.get_option("display.max_rows"))
    print(pd.get_option("display.max_columns"))

# Options are restored to their previous values
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))
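
Because `option_context` is a context manager, the previous values should also come back when the block exits through an exception. The following is a minimal sketch of that behavior, using a simulated `ValueError`; the variable names are illustrative only.

```python
import pandas as pd

before = pd.get_option("display.max_rows")

try:
    with pd.option_context("display.max_rows", 3):
        inside = pd.get_option("display.max_rows")  # temporary value
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

after = pd.get_option("display.max_rows")
print(f"inside={inside}, restored={after == before}")
```

This makes `option_context` a safer choice than paired `set_option`/`reset_option` calls when the code in between might fail.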

### Setting Startup Options

You can set startup options in a Python/IPython environment by creating a `.py` or `.ipy` script in the startup directory of the desired profile. For example, in the default IPython profile, the startup folder is `$IPYTHONDIR/profile_default/startup`.

An example startup script for pandas might look like:

```python
import pandas as pd
pd.set_option('display.max_rows', 999)
pd.set_option('display.precision', 5)
```

### Frequently Used Options

Let's explore some of the most frequently used display options.

#### display.max_rows and display.max_columns

These options set the maximum number of rows and columns displayed when a frame is pretty-printed. Truncated lines are replaced by an ellipsis.

In [None]:
# Create a sample DataFrame
df = pd.DataFrame(np.random.randn(7, 2))

# Set max_rows to 7 (show all rows)
pd.set_option('display.max_rows', 7)
df

In [None]:
# Set max_rows to 5 (truncate display)
pd.set_option('display.max_rows', 5)
df

In [None]:
# Reset to default
pd.reset_option('display.max_rows')
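
Truncation can also be checked programmatically by capturing the text representation with `repr()`. The sketch below assumes a small zero-filled frame with made-up column names `a` and `b`, and uses `option_context` so that no global option is left changed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 2)), columns=["a", "b"])

with pd.option_context("display.max_rows", 7):
    full_text = repr(df)       # 7 rows fit within the limit

with pd.option_context("display.max_rows", 5):
    truncated_text = repr(df)  # 7 rows exceed the limit

# An ellipsis row appears only in the truncated representation
print("..." in full_text, "..." in truncated_text)
```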

#### display.min_rows

Once `display.max_rows` is exceeded, the `display.min_rows` option determines how many rows are shown in the truncated representation.

In [None]:
# Set max_rows and min_rows
pd.set_option('display.max_rows', 8)
pd.set_option('display.min_rows', 4)

# Below max_rows -> all rows shown
df = pd.DataFrame(np.random.randn(7, 2))
df

In [None]:
# Above max_rows -> only min_rows (4) rows shown
df = pd.DataFrame(np.random.randn(9, 2))
df

In [None]:
# Reset options
pd.reset_option('display.max_rows')
pd.reset_option('display.min_rows')
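
To see which rows survive truncation, the index labels can be read back from the printed text. This is a rough sketch that parses the output of `repr()`; the parsing approach and the column names are assumptions made for illustration, not a pandas feature.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((9, 2)), columns=["a", "b"])

with pd.option_context("display.max_rows", 8, "display.min_rows", 4):
    text = repr(df)

# Collect the index labels that appear in the printed output
shown = [
    int(parts[0])
    for parts in (line.split() for line in text.splitlines())
    if parts and parts[0].isdigit()
]
print(shown)
```

With nine rows, a limit of 8, and `display.min_rows` set to 4, the output should contain the first two and the last two rows.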

#### display.expand_frame_repr

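When this option is `True`, the text representation of a wide DataFrame is wrapped into several blocks ("pages") of columns so that each block fits within `display.width`. When it is `False`, each row is printed on a single line.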

In [None]:
# Create a wider DataFrame
df = pd.DataFrame(np.random.randn(5, 10))

# Set expand_frame_repr to True
pd.set_option('display.expand_frame_repr', True)
df
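
The effect is easiest to see by comparing the text output under both settings. The sketch below pins `display.width` to 80 and `display.max_columns` to `None` so that the result does not depend on the terminal; those two values are assumptions chosen for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.standard_normal((5, 10)))

# Fix the width and disable column truncation for a reproducible comparison
common = ("display.width", 80, "display.max_columns", None)

with pd.option_context(*common, "display.expand_frame_repr", True):
    wrapped = repr(wide)

with pd.option_context(*common, "display.expand_frame_repr", False):
    unwrapped = repr(wide)

print(len(wrapped.splitlines()), len(unwrapped.splitlines()))
print(max(len(line) for line in unwrapped.splitlines()))
```

The wrapped version has more, shorter lines, while the unwrapped version keeps one line per row and runs past 80 characters.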

## Enhancing Performance with pandas.eval()

pandas provides the top-level `eval()` function, which evaluates a string describing operations on pandas objects. This can lead to improved performance for certain types of operations.

In [None]:
# Create some large DataFrames for demonstration
nrows, ncols = 20000, 100
df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols)) for _ in range(4)]

### Basic Usage of pandas.eval()

`pd.eval()` takes the expression as a string and looks up the names it contains in the calling namespace.

In [None]:
# Using variables in the current namespace
a, b = 1, 2
pd.eval('a + b')

In [None]:
# The @ prefix is not allowed in top-level eval calls
try:
    pd.eval('@a + b')
except SyntaxError as e:
    print(e)
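
By contrast, the `@` prefix is accepted by the `DataFrame.eval()` and `DataFrame.query()` methods, where bare names refer to columns and `@name` refers to a local Python variable. A small sketch, with the column name `a` and the variable `offset` made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
offset = 10

# 'a' is a column; '@offset' is the local variable defined above
shifted = df.eval("a + @offset")
big = df.query("a > @offset - 9")  # rows where a > 1

print(shifted.tolist(), big["a"].tolist())
```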

### pandas.eval() Parsers

There are two expression parsers you can choose from:
- The default 'pandas' parser allows a more intuitive syntax for expressing query-like operations
- The 'python' parser enforces strict Python semantics

In [None]:
# Using the 'python' parser with parentheses
expr = '(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)'
x = pd.eval(expr, parser='python')

# Using the 'pandas' parser without parentheses
expr_no_parens = 'df1 > 0 & df2 > 0 & df3 > 0 & df4 > 0'
y = pd.eval(expr_no_parens, parser='pandas')

# Check if results are the same
np.all(x == y)

In [None]:
# Using 'and' instead of '&'
expr = '(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)'
x = pd.eval(expr, parser='python')

expr_with_ands = 'df1 > 0 and df2 > 0 and df3 > 0 and df4 > 0'
y = pd.eval(expr_with_ands, parser='pandas')

# Check if results are the same
np.all(x == y)
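
The two parsers also differ in what they accept. As far as I can tell, the 'python' parser rejects `and`/`or` between pandas objects with a `NotImplementedError`, while the 'pandas' parser treats them like `&`/`|`. A sketch on two small, made-up Series:

```python
import pandas as pd

s1 = pd.Series([1, -1, 2])
s2 = pd.Series([3, 4, -5])

# The 'pandas' parser treats `and` like `&` for element-wise logic
both_positive = pd.eval("s1 > 0 and s2 > 0", parser="pandas")

# The 'python' parser is expected to reject the same expression
try:
    pd.eval("s1 > 0 and s2 > 0", parser="python")
    python_parser_accepts_and = True
except NotImplementedError as exc:
    python_parser_accepts_and = False
    print(f"python parser: {exc}")

print(both_positive.tolist())
```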

### pandas.eval() Backends

You can also make `eval()` behave identically to plain Python by selecting the 'python' engine. However, this generally provides no performance benefit and may even be slower.

In [None]:
# Compare performance
%timeit df1 + df2 + df3 + df4
%timeit pd.eval('df1 + df2 + df3 + df4', engine='python')
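
Which engine `pd.eval()` uses by default depends on whether the optional `numexpr` package is available. The sketch below checks for it and confirms that the default engine and the 'python' engine produce the same values on a tiny example; the frame names `left` and `right` are illustrative.

```python
import importlib.util

import pandas as pd

# pandas is understood to fall back to the 'python' engine when numexpr is missing
has_numexpr = importlib.util.find_spec("numexpr") is not None

left = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"x": [10.0, 20.0, 30.0]})

default_result = pd.eval("left + right")                  # engine chosen by pandas
python_result = pd.eval("left + right", engine="python")  # plain Python evaluation

print(f"numexpr installed: {has_numexpr}")
print(default_result.equals(python_result))
```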

### pandas.eval() Performance

`eval()` is intended to speed up certain kinds of operations, particularly those involving complex expressions with large DataFrame/Series objects. Let's compare the performance of regular Python operations versus using `eval()`.

In [None]:
# Regular Python operation
start = time.perf_counter()
result1 = df1 + df2 + df3 + df4
end = time.perf_counter()
print(f"Regular Python operation: {end - start:.6f} seconds")

# Using eval with the default engine ('numexpr' when it is installed)
start = time.perf_counter()
result2 = pd.eval('df1 + df2 + df3 + df4')
end = time.perf_counter()
print(f"eval with default engine: {end - start:.6f} seconds")

# Verify results are the same
print(f"Results are equal: {result1.equals(result2)}")
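
A single timed run is noisy, so outside IPython it can help to repeat the measurement and keep the best result. The following sketch uses the standard library's `timeit.repeat`; the helper `best_of` and the frame names `f1` to `f4` are hypothetical, and the numbers it prints will vary by machine and by whether `numexpr` is installed.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Sizes mirror the DataFrames above; shrink them if memory is tight
frames = {
    name: pd.DataFrame(rng.standard_normal((20000, 100)))
    for name in ("f1", "f2", "f3", "f4")
}

def plain_sum():
    return frames["f1"] + frames["f2"] + frames["f3"] + frames["f4"]

def eval_sum():
    # local_dict supplies the names used inside the expression
    return pd.eval("f1 + f2 + f3 + f4", local_dict=frames)

def best_of(func, repeat=3, number=5):
    """Return the fastest average time per call, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

plain_time = best_of(plain_sum)
eval_time = best_of(eval_sum)
print(f"plain: {plain_time:.6f} s, eval: {eval_time:.6f} s")
print(f"same values: {np.allclose(plain_sum(), eval_sum())}")
```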

## Conclusion

In this notebook, we've explored:

1. More pandas options and settings, including:
   - Using the `option_context` context manager
   - Setting startup options
   - Frequently used display options like `display.max_rows`, `display.min_rows`, and `display.expand_frame_repr`

2. Enhancing performance with `pandas.eval()`, including:
   - Basic usage
   - Different parsers ('pandas' vs 'python')
   - Different backends and their performance implications

These features provide powerful tools for customizing pandas behavior and improving performance for complex operations on large datasets.