#### Pandas Tutorial - Part 34

This notebook covers:
- More on Pandas options and settings
- Enhancing performance with pandas.eval()

In [1]:
import pandas as pd
import numpy as np
import time

%matplotlib inline

##### Pandas Options and Settings (Continued)

Continuing from Part 33, let's explore more about pandas options and settings.

In [2]:
# Reset all display options
pd.reset_option("^display")

### Using option_context

The `option_context` context manager allows you to execute code with given option values. Option values are restored automatically when you exit the with block.

In [3]:
# Using option_context to temporarily change options
with pd.option_context("display.max_rows", 10, "display.max_columns", 5):
    print(pd.get_option("display.max_rows"))
    print(pd.get_option("display.max_columns"))

# Options are restored to their previous values
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))

10
5
60
20
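
Because `option_context` is a context manager, the previous values should also come back when the block is left through an exception. The following is a minimal sketch of that behavior; the `RuntimeError` is raised purely for illustration.

```python
import pandas as pd

before = pd.get_option("display.max_rows")

try:
    with pd.option_context("display.max_rows", 5):
        # Inside the block the temporary value is active
        inside = pd.get_option("display.max_rows")
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

# After the block the previous value is back, even though an error occurred
after = pd.get_option("display.max_rows")
print(inside, after == before)
```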


### Setting Startup Options

You can set startup options in a Python/IPython environment by creating a `.py` or `.ipy` script in the startup directory of the desired profile. For example, in the default IPython profile, the startup folder is at `$IPYTHONDIR/profile_default/startup`.

An example startup script for pandas might look like:

```python
import pandas as pd
pd.set_option('display.max_rows', 999)
# Use the fully qualified name: a bare 'precision' can match more than one option
pd.set_option('display.precision', 5)
```
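
To see where such a script would go on your machine, the sketch below builds the startup path for the default profile. It assumes IPython's usual fallback of `~/.ipython` when `$IPYTHONDIR` is not set, and the file name `00-pandas-options.py` is a hypothetical example, not something IPython requires.

```python
import os
from pathlib import Path

# Assumption: IPython falls back to ~/.ipython when IPYTHONDIR is not set
ipython_dir = Path(os.environ.get("IPYTHONDIR", Path.home() / ".ipython"))
startup_dir = ipython_dir / "profile_default" / "startup"

# Hypothetical file name for the pandas startup script
script_path = startup_dir / "00-pandas-options.py"
print(script_path)
```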

### Frequently Used Options

Let's explore some of the most frequently used display options.

#### display.max_rows and display.max_columns

These options set the maximum number of rows and columns displayed when a frame is pretty-printed. Truncated lines are replaced by an ellipsis.

In [7]:
import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame(np.random.randn(7, 2))
print("Original DataFrame:")
print(df)

# Use the fully qualified option name
print("\nSetting display.max_rows to 7:")
pd.set_option('display.max_rows', 7)
print(df)

# Let's also demonstrate some other common display options
print("\nSetting display.max_columns to 2:")
pd.set_option('display.max_columns', 2)
print(df)

# Reset to defaults
print("\nResetting to defaults:")
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
print(df)

# Get current value of an option
print("\nCurrent value of display.max_rows:", pd.get_option('display.max_rows'))

# List some common display options manually
print("\nSome common display options:")
print("- display.max_rows: Maximum number of rows to display")
print("- display.max_columns: Maximum number of columns to display")
print("- display.precision: Floating point output precision in terms of number of places after the decimal")
print("- display.float_format: Formatter for floating point numbers")
print("- display.width: Width of the display in characters")
print("- display.expand_frame_repr: Whether to print dataframes to fill the display width")
print("- display.colheader_justify: Alignment of column headers")
print("- display.notebook_repr_html: Whether to use HTML representation for DataFrame in IPython notebook")

Original DataFrame:
          0         1
0 -0.927496  0.139893
1  1.574129  0.640052
2 -0.407168 -0.342235
3  0.019597  1.042983
4 -1.126250  0.768523
5  0.842572 -1.299073
6  1.449984  1.521035

Setting display.max_rows to 7:
          0         1
0 -0.927496  0.139893
1  1.574129  0.640052
2 -0.407168 -0.342235
3  0.019597  1.042983
4 -1.126250  0.768523
5  0.842572 -1.299073
6  1.449984  1.521035

Setting display.max_columns to 2:
          0         1
0 -0.927496  0.139893
1  1.574129  0.640052
2 -0.407168 -0.342235
3  0.019597  1.042983
4 -1.126250  0.768523
5  0.842572 -1.299073
6  1.449984  1.521035

Resetting to defaults:
          0         1
0 -0.927496  0.139893
1  1.574129  0.640052
2 -0.407168 -0.342235
3  0.019597  1.042983
4 -1.126250  0.768523
5  0.842572 -1.299073
6  1.449984  1.521035

Current value of display.max_rows: 60

Some common display options:
- display.max_rows: Maximum number of rows to display
- display.max_columns: Maximum number of columns to display
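- display.precision: Floating point output precision in terms of number of places after the decimal
- display.float_format: Formatter for floating point numbers
- display.width: Width of the display in characters
- display.expand_frame_repr: Whether to print dataframes to fill the display width
- display.colheader_justify: Alignment of column headers
- display.notebook_repr_html: Whether to use HTML representation for DataFrame in IPython notebook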
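
Rather than listing option descriptions by hand, you can ask pandas for them. This is a short sketch using `pd.describe_option`, which prints the description, default, and current value of every option matching the given name, alongside attribute-style access through `pd.options`.

```python
import pandas as pd

# Print the built-in documentation for a single option
pd.describe_option("display.max_rows")

# The same option can be read through the function or the attribute interface
via_function = pd.get_option("display.max_rows")
via_attribute = pd.options.display.max_rows
print(via_function == via_attribute)
```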

In [9]:
import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame(np.random.randn(7, 2))
print("Original DataFrame with 7 rows:")
print(df)

# Use the fully qualified option name
print("\nSetting display.max_rows to 5 (truncate display):")
pd.set_option('display.max_rows', 5)
print(df)

# Verify that the display is truncated
print("\nNotice that the middle row is replaced with '...' when display.max_rows=5")

# Reset to show all rows
print("\nResetting to show all rows:")
pd.set_option('display.max_rows', 7)
print(df)

# Set min_rows (no visible effect here: the frame has 7 rows, which does not exceed max_rows=7)
print("\nSetting display.min_rows to 3:")
pd.set_option('display.min_rows', 3)
print(df)

# Reset all options to defaults
print("\nResetting all options to defaults:")
pd.reset_option('all')
print(df)

Original DataFrame with 7 rows:
          0         1
0  0.793831 -2.044238
1 -0.441312 -1.527862
2 -2.451083  2.006875
3  0.875346  0.138094
4  0.851321 -0.043500
5 -1.240881  0.321155
6  0.024097 -0.361155

Setting display.max_rows to 5 (truncate display):
           0         1
0   0.793831 -2.044238
1  -0.441312 -1.527862
..       ...       ...
5  -1.240881  0.321155
6   0.024097 -0.361155

[7 rows x 2 columns]

Notice that the middle row is replaced with '...' when display.max_rows=5

Resetting to show all rows:
          0         1
0  0.793831 -2.044238
1 -0.441312 -1.527862
2 -2.451083  2.006875
3  0.875346  0.138094
4  0.851321 -0.043500
5 -1.240881  0.321155
6  0.024097 -0.361155

Setting display.min_rows to 3:
          0         1
0  0.793831 -2.044238
1 -0.441312 -1.527862
2 -2.451083  2.006875
3  0.875346  0.138094
4  0.851321 -0.043500
5 -1.240881  0.321155
6  0.024097 -0.361155

Resetting all options to defaults:
          0         1
0  0.793831 -2.044238
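1 -0.441312 -1.527862
2 -2.451083  2.006875
3  0.875346  0.138094
4  0.851321 -0.043500
5 -1.240881  0.321155
6  0.024097 -0.361155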



In [10]:
# Reset to default (the fully qualified name resets only display.max_rows)
pd.reset_option('display.max_rows')

#### display.min_rows

Once the `display.max_rows` is exceeded, the `display.min_rows` option determines how many rows are shown in the truncated representation.
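
The interaction between the two options can be checked without reading the printed frame by eye. The sketch below counts the lines of the text representation of a 10-row frame under two `min_rows` settings. The helper `repr_line_count` is a hypothetical name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df_large = pd.DataFrame(np.arange(20).reshape(10, 2))

def repr_line_count(frame, min_rows):
    """Number of lines in repr(frame) with max_rows=8 and the given min_rows."""
    with pd.option_context("display.max_rows", 8, "display.min_rows", min_rows):
        return len(repr(frame).splitlines())

# The frame has 10 rows, which exceeds max_rows=8, so the output is truncated
# and min_rows decides how many data rows remain visible
lines_min4 = repr_line_count(df_large, 4)
lines_min6 = repr_line_count(df_large, 6)
print(lines_min4, lines_min6)
```

A larger `min_rows` should give a longer truncated output, as long as it stays at or below `max_rows`.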

In [12]:
import pandas as pd
import numpy as np

# Set max_rows and min_rows using fully qualified option names
pd.set_option('display.max_rows', 8)
pd.set_option('display.min_rows', 4)

# Below max_rows -> all rows shown
df = pd.DataFrame(np.random.randn(7, 2))
print("DataFrame with 7 rows (below max_rows of 8, all rows shown):")
print(df)

# Above max_rows -> truncated display
df_large = pd.DataFrame(np.random.randn(10, 2))
print("\nDataFrame with 10 rows (above max_rows of 8, truncated display):")
print(df_large)

# Small DataFrame -> shown in full (min_rows only affects truncated output)
df_small = pd.DataFrame(np.random.randn(3, 2))
print("\nDataFrame with 3 rows (below max_rows of 8, all rows shown; min_rows has no effect):")
print(df_small)

# Reset options to defaults
print("\nResetting options to defaults:")
pd.reset_option('display.max_rows')
pd.reset_option('display.min_rows')

DataFrame with 7 rows (below max_rows of 8, all rows shown):
          0         1
0 -1.544378 -0.800139
1 -1.663644 -0.438397
2 -0.050639  0.809520
3 -0.462558 -1.415430
4  0.083647 -2.220747
5 -1.282211 -0.865768
6 -1.364048  0.195216

DataFrame with 10 rows (above max_rows of 8, truncated display):
           0         1
0   0.337388 -0.087717
1  -0.273936  0.749711
..       ...       ...
8  -0.725696  0.894212
9   1.768393  0.941977

[10 rows x 2 columns]

DataFrame with 3 rows (below max_rows of 8, all rows shown; min_rows has no effect):
          0         1
0  0.006476  0.098658
1  1.064371  2.522334
2  0.690075  0.371259

Resetting options to defaults:


In [13]:
# Options were reset at the end of the previous cell, so max_rows is back
# to its default (60) and all 9 rows are shown
df = pd.DataFrame(np.random.randn(9, 2))
df

          0         1
0 -0.243665 -1.689881
1  2.196061  0.596462
2 -0.876289  1.129470
3 -2.249422  0.616063
4 -1.337313 -0.433469
5  0.284723 -0.128869
6  1.953385  1.471074
7 -0.194825  0.639690
8  0.908440 -0.184461


In [14]:
# Reset options
pd.reset_option('display.max_rows')
pd.reset_option('display.min_rows')

#### display.expand_frame_repr

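This option controls whether the text representation of a wide DataFrame may stretch across multiple "pages". When it is `True` (the default), a frame wider than `display.width` is wrapped into several blocks of columns; when it is `False`, each row is printed on a single line.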

In [15]:
# Create a wider DataFrame
df = pd.DataFrame(np.random.randn(5, 10))

# Set expand_frame_repr to True (this is already the default)
pd.set_option('display.expand_frame_repr', True)
df

          0         1         2         3         4         5         6         7         8         9
0  0.711213  0.716578  0.348995 -0.107725  0.017254  0.959268 -0.315211  0.115822  0.049168  0.798565
1 -0.059231  0.358736  1.019495 -0.308093  1.776857 -1.087037 -0.918249 -0.316566 -1.261562 -0.673513
2  0.195051  1.027643 -0.876066  0.690749  1.130593 -0.749288  0.630534 -0.235007 -0.924213 -1.767366
3  1.296595 -0.449367  0.364136  1.122711  1.749654 -0.647843 -0.702419 -0.228711  2.391075  0.090017
4  0.521081 -0.661773  0.217504  0.403717 -1.348578  0.328468 -0.287249  1.333994 -1.123520  0.687911
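
The effect of `display.expand_frame_repr` is easiest to see in the plain-text representation. The sketch below renders the same wide frame with the option on and off. It pins `display.width` to 80 and `display.max_columns` to 20 so the comparison does not depend on the terminal; those two values are assumptions chosen for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.standard_normal((5, 10)))

# Fixed width and column limit so the result does not depend on the console
common = ("display.width", 80, "display.max_columns", 20)

with pd.option_context(*common, "display.expand_frame_repr", True):
    wrapped = repr(wide)      # columns are split into several blocks

with pd.option_context(*common, "display.expand_frame_repr", False):
    unwrapped = repr(wide)    # each row stays on one long line

print(wrapped)
print()
print(unwrapped)
```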


##### Enhancing Performance with pandas.eval()

Pandas provides the `eval()` function which allows you to evaluate a string describing operations on pandas objects. This can lead to improved performance for certain types of operations.

In [16]:
# Create some large DataFrames for demonstration
nrows, ncols = 20000, 100
df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols)) for _ in range(4)]

### Basic Usage of pandas.eval()

In its simplest form, `eval()` takes an expression string and looks up the names it contains in the calling scope.

In [17]:
# Using variables in the current namespace
a, b = 1, 2
pd.eval('a + b')

np.int64(3)

In [18]:
# The @ prefix is not allowed in top-level eval calls
try:
    pd.eval('@a + b')
except SyntaxError as e:
    print(e)

The '@' prefix is not allowed in top-level eval calls.
please refer to your variables by name without the '@' prefix.
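
The `@` prefix is meant for `DataFrame.query()` and `DataFrame.eval()`, where bare names refer to columns and `@name` refers to a variable in the surrounding Python scope. A minimal sketch with made-up data (the frame and the `threshold` variable are illustrative only):

```python
import pandas as pd

df_small = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
threshold = 1

# 'a' is a column; '@threshold' is the Python variable defined above
filtered = df_small.query("a > @threshold")

# The same prefix works inside DataFrame.eval expressions
total = df_small.eval("a + b + @threshold")

print(filtered)
print(total)
```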


### pandas.eval() Parsers

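There are two parsers you can choose from with the `parser` argument:
- The default 'pandas' parser allows a more intuitive syntax for expressing query-like operations. In particular, it gives `&` and `|` the same precedence as `and` and `or`, so comparisons combined with them do not need parentheses.
- The 'python' parser enforces strict Python semantics, so each comparison must be parenthesized when it is combined with `&` or `|`.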

In [19]:
# Using the 'python' parser with parentheses
expr = '(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)'
x = pd.eval(expr, parser='python')

# Using the 'pandas' parser without parentheses
expr_no_parens = 'df1 > 0 & df2 > 0 & df3 > 0 & df4 > 0'
y = pd.eval(expr_no_parens, parser='pandas')

# Check if results are the same
np.all(x == y)

np.True_

In [20]:
# Using 'and' instead of '&'
expr = '(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)'
x = pd.eval(expr, parser='python')

expr_with_ands = 'df1 > 0 and df2 > 0 and df3 > 0 and df4 > 0'
y = pd.eval(expr_with_ands, parser='pandas')

# Check if results are the same
np.all(x == y)

np.True_

### pandas.eval() Backends

You can also make `eval()` behave identically to plain Python by passing `engine='python'`. However, this generally provides no performance benefit and may even be slower.

In [21]:
# Compare performance
%timeit df1 + df2 + df3 + df4
%timeit pd.eval('df1 + df2 + df3 + df4', engine='python')

2.72 ms ± 191 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.92 ms ± 371 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
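
Which engine is available depends on whether the optional `numexpr` package is installed. The sketch below requests the `'numexpr'` engine explicitly and falls back to `'python'` if pandas reports that it cannot be used. The small frames `a` and `b` are placeholders for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.ones((3, 2)))
b = pd.DataFrame(np.ones((3, 2)))

try:
    # Raises ImportError when numexpr is missing or unsupported
    result = pd.eval("a + b", engine="numexpr")
    engine_used = "numexpr"
except ImportError:
    result = pd.eval("a + b", engine="python")
    engine_used = "python"

print(f"Engine used: {engine_used}")
```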


### pandas.eval() Performance

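`eval()` is intended to speed up certain kinds of operations, particularly complex expressions involving large DataFrame/Series objects. For small objects or simple expressions, the overhead of parsing the string can outweigh any gain; the pandas documentation suggests, as a rule of thumb, using it only for frames with more than roughly 10,000 rows. Let's compare the performance of regular Python operations versus using `eval()`.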

In [22]:
# Regular Python operation
start = time.time()
result1 = df1 + df2 + df3 + df4
end = time.time()
print(f"Regular Python operation: {end - start:.6f} seconds")

# Using eval with the default engine ('numexpr' when it is installed, otherwise 'python')
start = time.time()
result2 = pd.eval('df1 + df2 + df3 + df4')
end = time.time()
print(f"eval with 'numexpr' engine: {end - start:.6f} seconds")

# Verify results are the same
print(f"Results are equal: {result1.equals(result2)}")

Regular Python operation: 0.003755 seconds
eval with 'numexpr' engine: 0.003193 seconds
Results are equal: True
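
A single `time.time()` measurement is noisy, so small differences like the one above are hard to interpret. The sketch below repeats each measurement with the standard library's `timeit` and reports the best run. The frame size, the repeat counts, and the helper names `add_plain` and `add_eval` are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.standard_normal((2000, 50))) for _ in range(4)]

def add_plain(a, b, c, d):
    return a + b + c + d

def add_eval(a, b, c, d):
    # pd.eval resolves a, b, c, d from this function's local scope
    return pd.eval("a + b + c + d")

# Best of 5 repeats, each running the operation 10 times
best_plain = min(timeit.repeat(lambda: add_plain(*frames), repeat=5, number=10))
best_eval = min(timeit.repeat(lambda: add_eval(*frames), repeat=5, number=10))

print(f"plain: {best_plain / 10:.6f} s per call")
print(f"eval:  {best_eval / 10:.6f} s per call")
print("Results match:", np.allclose(add_plain(*frames), add_eval(*frames)))
```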


##### Conclusion

In this notebook, we've explored:

1. More pandas options and settings, including:
   - Using the `option_context` context manager
   - Setting startup options
   - Frequently used display options like `max_rows`, `min_rows`, and `expand_frame_repr`

2. Enhancing performance with `pandas.eval()`, including:
   - Basic usage
   - Different parsers ('pandas' vs 'python')
   - Different backends and their performance implications

Together, these features let you customize how pandas displays data and, for complex expressions on large datasets, can improve performance.