# Pandas Tutorial - Part 68: DataFrame Methods (to_timestamp and to_xarray)

This notebook covers two important DataFrame methods:
- `to_timestamp()` - Cast a PeriodIndex to a DatetimeIndex of timestamps
- `to_xarray()` - Convert a pandas DataFrame to an xarray Dataset

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

## 1. DataFrame.to_timestamp()

The `to_timestamp()` method casts a DataFrame with a PeriodIndex to a DataFrame with a DatetimeIndex of timestamps, at the beginning or end of the period.

This is useful when data are indexed by periods (months, quarters, years) but the next step, such as plotting, resampling, or joining with timestamped data, expects concrete points in time.
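
Each period in the index is converted on its own. The scalar `pd.Period.to_timestamp` method shows what `how='start'` and `how='end'` mean for a single period. This is a minimal sketch; the quarter `2020Q1` is only an illustrative value:

```python
import pandas as pd

# A single quarterly period (illustrative value)
q = pd.Period('2020Q1', freq='Q')

# how='start' gives the first instant of the period
start = q.to_timestamp(how='start')
# how='end' gives the last nanosecond of the period
end = q.to_timestamp(how='end')

print(start)  # 2020-01-01 00:00:00
print(end)    # 2020-03-31 23:59:59.999999999
```

`DataFrame.to_timestamp()` applies the same rule to every period in the index (or in the columns, with `axis=1`).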

In [None]:
# Create a DataFrame with a PeriodIndex
periods = pd.period_range('2020-01', '2020-12', freq='M')
df_period = pd.DataFrame({
    'value': np.random.randn(len(periods)),
    'category': np.random.choice(['A', 'B', 'C'], len(periods))
}, index=periods)

print("DataFrame with PeriodIndex:")
print(df_period.head())
print("\nIndex type:", type(df_period.index))

In [None]:
# Convert to DatetimeIndex with timestamps at the start of each period
df_timestamp_start = df_period.to_timestamp(how='start')

print("DataFrame with DatetimeIndex (start of period):")
print(df_timestamp_start.head())
print("\nIndex type:", type(df_timestamp_start.index))

In [None]:
# Convert to DatetimeIndex with timestamps at the end of each period
# (the end timestamp is the period's last nanosecond, e.g. 2020-01-31 23:59:59.999999999)
df_timestamp_end = df_period.to_timestamp(how='end')

print("DataFrame with DatetimeIndex (end of period):")
print(df_timestamp_end.head())
print("\nIndex type:", type(df_timestamp_end.index))
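
Because `how='end'` lands on the final nanosecond of each period, the printed index can look noisy. If you only need calendar dates, one option is to normalize the result to midnight. This is a sketch, not the only approach, and it uses small illustrative data. It also confirms that the end timestamps match the `PeriodIndex.end_time` attribute:

```python
import pandas as pd

# Small monthly PeriodIndex (illustrative data)
idx = pd.period_range('2020-01', periods=3, freq='M')
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=idx)

end_idx = df.to_timestamp(how='end').index

# The end timestamps are the same as the periods' end_time attribute
assert (end_idx == idx.end_time).all()

# normalize() drops the time of day, leaving the last calendar day of each month
end_dates = end_idx.normalize()
print(end_dates)
```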

In [None]:
# Create a DataFrame with a quarterly PeriodIndex
quarters = pd.period_range('2020Q1', '2021Q4', freq='Q')
df_quarter = pd.DataFrame({
    'sales': np.random.randint(100, 1000, len(quarters)),
    'expenses': np.random.randint(50, 500, len(quarters))
}, index=quarters)

print("DataFrame with quarterly PeriodIndex:")
print(df_quarter)

In [None]:
# Convert to DatetimeIndex with timestamps at the start of each quarter
df_quarter_start = df_quarter.to_timestamp(how='start')

print("DataFrame with DatetimeIndex (start of quarter):")
print(df_quarter_start)

In [None]:
# Convert to DatetimeIndex with timestamps at the end of each quarter
df_quarter_end = df_quarter.to_timestamp(how='end')

print("DataFrame with DatetimeIndex (end of quarter):")
print(df_quarter_end)

In [None]:
# `freq` sets the resolution used to convert each period; it does not resample.
# Each quarter still produces one row, timestamped at the start of its first month
df_quarter_monthly = df_quarter.to_timestamp(freq='M', how='start')

print("DataFrame converted with freq='M' (still one row per quarter):")
print(df_quarter_monthly)
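
`to_timestamp(freq=...)` keeps one row per period, so it cannot turn quarterly data into monthly rows. If you really need one row per month, you need a separate resampling step. Here is a minimal sketch. It assumes that forward-filling each quarter's value into its months suits your data; for flow quantities such as sales you might prefer to split the value instead. The frame `df_q` is illustrative:

```python
import pandas as pd

# Quarterly data on a PeriodIndex (illustrative values)
quarters = pd.period_range('2020Q1', '2020Q4', freq='Q')
df_q = pd.DataFrame({'sales': [300, 450, 390, 510]}, index=quarters)

# Convert to quarter-start timestamps, then upsample to month starts ('MS')
# and forward-fill each quarter's value into the following months
df_m = df_q.to_timestamp(how='start').resample('MS').ffill()

print(df_m)
```

The result ends at 2020-10-01 because resampling only spans the existing timestamps. Covering November and December would need an explicit reindex.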

In [None]:
# Create a DataFrame with a PeriodIndex in the columns
periods = pd.period_range('2020-01', '2020-06', freq='M')
df_period_cols = pd.DataFrame(np.random.randn(3, len(periods)), 
                             index=['A', 'B', 'C'],
                             columns=periods)

print("DataFrame with PeriodIndex in columns:")
print(df_period_cols)

In [None]:
# Convert column PeriodIndex to DatetimeIndex
df_timestamp_cols = df_period_cols.to_timestamp(axis=1)

print("DataFrame with DatetimeIndex in columns:")
print(df_timestamp_cols)
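
The conversion can be reversed with `DataFrame.to_period`, which accepts the same `axis` argument. This sketch rebuilds a frame like `df_period_cols` with random illustrative data, converts its columns to timestamps, and converts them back to monthly periods. It passes `freq='M'` explicitly rather than relying on frequency inference:

```python
import numpy as np
import pandas as pd

# Monthly periods in the columns (illustrative random data)
periods = pd.period_range('2020-01', '2020-06', freq='M')
df_cols = pd.DataFrame(np.random.randn(3, len(periods)),
                       index=['A', 'B', 'C'], columns=periods)

# PeriodIndex -> DatetimeIndex along the columns
df_ts = df_cols.to_timestamp(axis=1)

# DatetimeIndex -> PeriodIndex again, with an explicit monthly frequency
df_back = df_ts.to_period(freq='M', axis=1)

print(type(df_ts.columns).__name__, type(df_back.columns).__name__)
```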

## 2. DataFrame.to_xarray()

The `to_xarray()` method converts a pandas DataFrame to an xarray Dataset. Each column becomes a data variable, and the index (or each level of a MultiIndex) becomes a dimension with matching coordinates. xarray is a library for labeled multi-dimensional arrays, widely used for NetCDF data and other gridded datasets in geoscience and related scientific fields.

Note: This method requires the optional xarray package. If it is not installed, you can install it with `pip install xarray`.

In [None]:
# Try to import xarray
try:
    import xarray as xr
    xarray_available = True
except ImportError:
    print("xarray is not installed. Install it with 'pip install xarray' to use DataFrame.to_xarray()")
    xarray_available = False

In [None]:
# Create a sample DataFrame
df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
                   ('parrot', 'bird', 24.0, 2),
                   ('lion', 'mammal', 80.5, 4),
                   ('monkey', 'mammal', np.nan, 4)],
                  columns=['name', 'class', 'max_speed', 'num_legs'])

print("Sample DataFrame:")
print(df)

In [None]:
# Convert DataFrame to xarray Dataset
if xarray_available:
    ds = df.to_xarray()
    print("xarray Dataset:")
    print(ds)
else:
    print("xarray is not available. Skipping conversion.")
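
The Dataset above has a dimension called `index` because the DataFrame's default index has no name. xarray takes dimension names from index names, so setting a meaningful column as the index first gives a more useful dimension. This is a minimal sketch that re-creates the sample DataFrame and uses the same optional-import guard as the notebook:

```python
import numpy as np
import pandas as pd

try:
    import xarray as xr  # optional dependency
    xarray_available = True
except ImportError:
    xarray_available = False

df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
                   ('parrot', 'bird', 24.0, 2),
                   ('lion', 'mammal', 80.5, 4),
                   ('monkey', 'mammal', np.nan, 4)],
                  columns=['name', 'class', 'max_speed', 'num_legs'])

if xarray_available:
    # The index name becomes the dimension name, so index by 'name' first
    ds_named = df.set_index('name').to_xarray()
    print(ds_named)
    # Select one animal by its label along the 'name' dimension
    print(ds_named['max_speed'].sel(name='lion').item())
```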

In [None]:
# Convert a Series to xarray DataArray
if xarray_available:
    da = df['max_speed'].to_xarray()
    print("xarray DataArray from Series:")
    print(da)
else:
    print("xarray is not available. Skipping conversion.")
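
A DataArray can be turned back into a pandas Series with `DataArray.to_series()`. This sketch uses illustrative values copied from the sample DataFrame. The values and the Series name survive the round trip, but the originally unnamed index comes back named `index`, after the xarray dimension:

```python
import numpy as np
import pandas as pd

try:
    import xarray as xr  # optional dependency
    xarray_available = True
except ImportError:
    xarray_available = False

s = pd.Series([389.0, 24.0, 80.5, np.nan], name='max_speed')

if xarray_available:
    da = s.to_xarray()
    # DataArray -> Series; the unnamed index comes back named 'index'
    s_back = da.to_series()
    print(s_back)
```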

In [None]:
# Create a DataFrame with MultiIndex
dates = pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02'])
df_multiindex = pd.DataFrame({
    'date': dates,
    'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
    'speed': [350, 18, 361, 15]
})
df_multiindex = df_multiindex.set_index(['date', 'animal'])

print("DataFrame with MultiIndex:")
print(df_multiindex)

In [None]:
# Convert MultiIndex DataFrame to xarray Dataset
if xarray_available:
    ds_multi = df_multiindex.to_xarray()
    print("xarray Dataset from MultiIndex DataFrame:")
    print(ds_multi)
else:
    print("xarray is not available. Skipping conversion.")
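
With a MultiIndex, xarray builds the full grid of every level combination. This example has every (date, animal) pair, so the grid is complete. The sketch below drops one row to show what happens when a combination is missing: the gap is filled with NaN, and the integer `speed` column is promoted to floating point to hold it. The data are the notebook's own, re-created so the block runs on its own:

```python
import pandas as pd

try:
    import xarray as xr  # optional dependency
    xarray_available = True
except ImportError:
    xarray_available = False

dates = pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02'])
df_mi = pd.DataFrame({
    'date': dates,
    'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
    'speed': [350, 18, 361, 15],
}).set_index(['date', 'animal'])

# Drop the last row so the (date, animal) grid is no longer complete
df_partial = df_mi.iloc[:3]

if xarray_available:
    ds_partial = df_partial.to_xarray()
    # The missing (2018-01-02, parrot) cell is filled with NaN,
    # which also forces the integer column to a float dtype
    print(ds_partial['speed'])
```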

In [None]:
# Create a DataFrame with datetime index
dates = pd.date_range('2020-01-01', periods=6, freq='D')
df_time = pd.DataFrame({
    'temperature': np.random.uniform(15, 25, 6),
    'humidity': np.random.uniform(40, 70, 6)
}, index=dates)

print("DataFrame with DatetimeIndex:")
print(df_time)

In [None]:
# Convert DataFrame with DatetimeIndex to xarray Dataset
if xarray_available:
    ds_time = df_time.to_xarray()
    print("xarray Dataset from DataFrame with DatetimeIndex:")
    print(ds_time)
else:
    print("xarray is not available. Skipping conversion.")
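
Here too the time dimension is called `index`, because the DatetimeIndex has no name. Naming the index before converting, for example with `rename_axis`, gives the conventional `time` dimension, which you can then select by label. This is a sketch with illustrative, evenly spaced temperatures in place of the random values above:

```python
import numpy as np
import pandas as pd

try:
    import xarray as xr  # optional dependency
    xarray_available = True
except ImportError:
    xarray_available = False

dates = pd.date_range('2020-01-01', periods=6, freq='D')
df_t = pd.DataFrame({'temperature': np.linspace(15, 25, 6)}, index=dates)

if xarray_available:
    # Name the index 'time' so the Dataset dimension is 'time' instead of 'index'
    ds_t = df_t.rename_axis('time').to_xarray()
    # Label-based selection along the named dimension
    print(ds_t['temperature'].sel(time='2020-01-03').item())
```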

## Advantages of xarray

Here are some advantages of xarray over pandas for certain kinds of data:

1. **N-dimensional data**: xarray is designed for N-dimensional data, while pandas is built mainly around 1D (Series) and 2D (DataFrame) data.

2. **Named dimensions**: In xarray, every dimension has a name (such as `time` or `lat`), which makes it clear what each axis represents.

3. **Selection by dimension name**: xarray selects data by dimension name and coordinate label, for example `ds.sel(time='2020-01-01')`, instead of relying on axis positions.

4. **NetCDF compatibility**: xarray can read and write NetCDF files, a common format in scientific computing, through `xr.open_dataset` and `Dataset.to_netcdf` (given an installed backend such as netCDF4 or scipy).

5. **Operations along named dimensions**: Reductions such as `mean(dim='time')` and arithmetic between arrays are expressed by dimension name, and arrays are broadcast against each other by matching dimension names.

Let's demonstrate some of these advantages if xarray is available:

In [None]:
if xarray_available:
    # Create a 3D xarray Dataset
    # Dimensions: time, latitude, longitude
    times = pd.date_range('2020-01-01', periods=3)
    lats = [0, 1, 2]  # latitude values
    lons = [10, 20, 30]  # longitude values
    
    # Create random data with shape (time, lat, lon)
    data = np.random.rand(len(times), len(lats), len(lons))
    
    # Create xarray Dataset
    ds_3d = xr.Dataset(
        data_vars={
            'temperature': (['time', 'lat', 'lon'], data)
        },
        coords={
            'time': times,
            'lat': lats,
            'lon': lons
        }
    )
    
    print("3D xarray Dataset:")
    print(ds_3d)
    
    # Select data for a specific time
    print("\nData for 2020-01-01:")
    print(ds_3d.sel(time='2020-01-01'))
    
    # Calculate mean across time dimension
    print("\nMean temperature across time:")
    print(ds_3d.temperature.mean(dim='time'))
else:
    print("xarray is not available. Skipping demonstration.")
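
The last advantage in the list above, operations along named dimensions, also covers broadcasting. When two arrays with different dimensions are combined, xarray matches them by dimension name rather than by axis position. This is a minimal sketch with illustrative values:

```python
try:
    import xarray as xr  # optional dependency
    xarray_available = True
except ImportError:
    xarray_available = False

if xarray_available:
    # Two 1-D arrays that share no dimension (illustrative values)
    a = xr.DataArray([1.0, 2.0, 3.0], dims='time')
    b = xr.DataArray([10.0, 20.0], dims='lat')

    # xarray broadcasts by dimension name, not by axis position,
    # so the product is a 2-D array with dims ('time', 'lat')
    product = a * b
    print(product.dims, product.shape)
```

With plain NumPy arrays of shapes (3,) and (2,), the same multiplication would raise a broadcasting error unless you reshaped one of them by hand.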

## Summary

In this notebook, we've explored two important DataFrame methods:

1. **to_timestamp()**: Converts a DataFrame with a PeriodIndex to a DataFrame with a DatetimeIndex. This method is useful when you need to convert period-based data (like months, quarters, years) to specific timestamps. Key parameters include:
   - `freq`: Target frequency used when converting each period (by default daily for periods of a week or longer); it does not resample the data
   - `how`: Where in the period the timestamp falls (`'start'`/`'s'` or `'end'`/`'e'`)
   - `axis`: The axis to convert (0 for index, 1 for columns)
   - `copy`: Whether to copy the underlying data (deprecated in newer pandas versions, where Copy-on-Write makes it unnecessary)

2. **to_xarray()**: Converts a pandas DataFrame to an xarray Dataset or a pandas Series to an xarray DataArray. This method is particularly useful for working with multi-dimensional labeled data, especially in scientific computing and data analysis. xarray provides:
   - Support for N-dimensional data
   - Named dimensions
   - Coordinate-based indexing
   - NetCDF compatibility
   - Vectorized operations along named dimensions

These methods connect different representations of the same data. `to_timestamp()` converts between pandas' period-based and timestamp-based time indexes, while `to_xarray()` hands pandas data to xarray for labeled N-dimensional analysis.