# Part 30: Advanced GroupBy and Time Series Operations

In this notebook, we'll explore:
- Grouping with ordered factors
- Using pd.Grouper for time-based grouping
- Taking specific rows from each group
- Time series partial string indexing

## Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime

# Set the plotting style
plt.style.use('ggplot')

# Make plots appear in the notebook
%matplotlib inline

## 1. Grouping with Ordered Factors

Categorical variables, represented as instances of pandas' `Categorical` class, can be used as group keys. When they are, the result preserves the order of the categories:

In [None]:
# Create a Series of random data
data = pd.Series(np.random.randn(100))

# Create quartiles as an ordered categorical
factor = pd.qcut(data, [0, .25, .5, .75, 1.])

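# Group by the factor and compute the mean
# (observed=False is passed explicitly because its default changes in newer pandas releases)
data.groupby(factor, observed=False).mean()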
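To make the ordering easier to see than with random quartiles, here is a minimal sketch using a hand-built ordered categorical. The labels (`low`, `mid`, `high`) and the values are made up for illustration; the point is only that the result follows the declared category order, not the order in which labels appear in the data.

```python
import pandas as pd

# Hypothetical data: values and labels are illustrative only
values = pd.Series([10, 20, 30, 40, 50, 60])
sizes = pd.Series(pd.Categorical(
    ['high', 'low', 'mid', 'low', 'high', 'mid'],
    categories=['low', 'mid', 'high'],  # declared order of the levels
    ordered=True,
))

# The result index follows the category order: low, mid, high
means = values.groupby(sizes, observed=False).mean()
print(means)
```

Although `high` appears first in the data, it comes last in the result because it is the last declared category.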

## 2. Grouping with a Grouper Specification

Sometimes a column name alone is not enough to describe how to group. `pd.Grouper` lets you spell out the details for a single key, such as which column or index level to use and at what frequency, which is especially useful for time-based grouping.

In [None]:
# Create a DataFrame with date information
df = pd.DataFrame({
    'Branch': 'A A A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 1, 8, 1, 9, 3],
    'Date': [
        datetime.datetime(2013, 1, 1, 13, 0),
        datetime.datetime(2013, 1, 1, 13, 5),
        datetime.datetime(2013, 10, 1, 20, 0),
        datetime.datetime(2013, 10, 2, 10, 0),
        datetime.datetime(2013, 10, 1, 20, 0),
        datetime.datetime(2013, 10, 2, 10, 0),
        datetime.datetime(2013, 12, 2, 12, 0),
        datetime.datetime(2013, 12, 2, 14, 0)
    ]
})
df

### 2.1 Groupby a Specific Column with the Desired Frequency

This is similar to resampling, but within the groupby framework:

In [None]:
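# Group by month and buyer, summing only the numeric Quantity column
# (pandas 2.2+ deprecates the 'M' alias in favor of 'ME')
df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer'])[['Quantity']].sum()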
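The similarity to resampling can be sketched on a cut-down version of the data. The names `orders` and `month_end` are illustrative, and the frequency is passed as a `pd.offsets.MonthEnd()` object rather than a string alias, which is an assumption made here to avoid depending on alias spellings that differ between pandas versions.

```python
import datetime
import pandas as pd

# Cut-down, illustrative version of the orders table
orders = pd.DataFrame({
    'Quantity': [1, 3, 5, 1],
    'Date': [
        datetime.datetime(2013, 1, 1, 13, 0),
        datetime.datetime(2013, 1, 1, 13, 5),
        datetime.datetime(2013, 10, 1, 20, 0),
        datetime.datetime(2013, 10, 2, 10, 0),
    ],
})

month_end = pd.offsets.MonthEnd()

# Monthly totals through the groupby framework
by_grouper = orders.groupby(pd.Grouper(key='Date', freq=month_end))['Quantity'].sum()

# The same monthly totals through resample
by_resample = orders.resample(month_end, on='Date')['Quantity'].sum()

print(by_grouper[by_grouper > 0])
print(by_resample[by_resample > 0])
```

The advantage of `pd.Grouper` is that it can be combined with other keys, such as `'Buyer'`, in a single `groupby` call.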

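### 2.2 Working with a DatetimeIndex

When a DataFrame has both an index named `Date` and a column named `Date`, the grouping specification is ambiguous. `pd.Grouper` resolves this: `key` selects the column, and `level` selects the index level.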

In [None]:
# Set the Date column as index
df = df.set_index('Date')

# Create a new Date column that's the index plus 2 months
df['Date'] = df.index + pd.offsets.MonthEnd(2)
df

In [None]:
# Group by the Date column with a 6-month frequency
df.groupby([pd.Grouper(freq='6M', key='Date'), 'Buyer'])[['Quantity']].sum()

In [None]:
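# Group by the index with a 6-month frequency; select Quantity explicitly,
# because summing the datetime Date column raises a TypeError in pandas 2.x
df.groupby([pd.Grouper(freq='6M', level='Date'), 'Buyer'])[['Quantity']].sum()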
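As a rough illustration of how `key` and `level` pick different dates, the sketch below uses a two-row frame (the name `frame` and its values are hypothetical) with monthly bins, so the shift between the index and the column is visible in the bin labels.

```python
import pandas as pd

# Hypothetical two-row frame with an index named 'Date'
idx = pd.DatetimeIndex(['2013-01-15', '2013-02-15'], name='Date')
frame = pd.DataFrame({'Quantity': [1, 2]}, index=idx)

# A column with the same name, shifted forward with MonthEnd(2)
frame['Date'] = frame.index + pd.offsets.MonthEnd(2)

month_end = pd.offsets.MonthEnd()

# key='Date' groups on the column values
by_column = frame.groupby(pd.Grouper(key='Date', freq=month_end))['Quantity'].sum()

# level='Date' groups on the index values
by_index = frame.groupby(pd.Grouper(level='Date', freq=month_end))['Quantity'].sum()

print(by_column)
print(by_index)
```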

## 3. Taking the First Rows of Each Group

Just like for a DataFrame or Series, you can call `head` and `tail` on a groupby:

In [None]:
# Create a simple DataFrame
df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
df

In [None]:
# Group by column A
g = df.groupby('A')

# Get the first row of each group
g.head(1)

In [None]:
# Get the last row of each group
g.tail(1)
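One detail worth seeing in isolation: `head` and `tail` on a groupby act as filters, so the rows keep their original index and columns, whereas an aggregation such as `first()` is indexed by the group key. A small sketch on the same frame (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
g = df.groupby('A')

first_rows = g.head(1)  # filter: original row labels are kept
last_rows = g.tail(1)   # filter: original row labels are kept
first_agg = g.first()   # aggregation: index becomes the group key A

print(first_rows.index.tolist())
print(last_rows.index.tolist())
print(first_agg.index.tolist())
```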

## 4. Taking the nth Row of Each Group

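To select the nth row of each group, use `nth()`. If you pass an int for n, it returns at most one row per group (no row if the group is too short). Unlike `first()`, it selects purely by position and does not skip NaN values. Note that pandas 2.0 and later treat `nth()` as a filter that keeps the original row index, whereas earlier versions treated it as a reduction indexed by the group key: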

In [None]:
# Create a DataFrame with some NaN values
df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])
df

In [None]:
# Group by column A
g = df.groupby('A')

# Get the first row (index 0) of each group
g.nth(0)
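The NaN in this frame is there to show how positional selection differs from `first()`. The following sketch compares the two on column `B` only, which keeps the comparison independent of how a given pandas version indexes the `nth()` result; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])
g = df.groupby('A')

# nth(0) is positional: group A=1 yields the NaN from its first row
nth_b = g.nth(0)['B']

# first() returns the first non-null value in each group
first_b = g.first()['B']

print(nth_b.tolist())
print(first_b.tolist())
```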

## 5. Time Series Operations

Let's explore some time series operations, particularly partial string indexing.

In [None]:
# Create a time series on business month ends
# (pandas 2.2+ deprecates the 'BM' alias in favor of 'BME')
ts = pd.Series(np.random.randn(12),
              index=pd.date_range('1/31/2011', periods=12, freq='BM'))
ts

### 5.1 Partial String Indexing

Dates and strings that parse to timestamps can be passed as indexing parameters:

In [None]:
# Access by date string
ts['1/31/2011']

In [None]:
# Access by datetime object
ts[datetime.datetime(2011, 12, 25):]

In [None]:
# Access by date range
ts['10/31/2011':'12/31/2011']

For convenience with longer time series, you can also pass just the year, or the year and month, as a string:

In [None]:
# Access all data for 2011
ts['2011']

In [None]:
# Access data for June 2011
ts['2011-6']
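To check what a partial string actually selects, the sketch below rebuilds the series with integer values instead of random ones, so the selected rows are easy to identify. It uses `.loc` and a `pd.offsets.BMonthEnd()` object for the frequency; both are choices made for this illustration, not requirements.

```python
import pandas as pd

# Same twelve business-month-end dates as above, with values 0..11
ts = pd.Series(
    range(12),
    index=pd.date_range('2011-01-31', periods=12, freq=pd.offsets.BMonthEnd()),
)

# A year-month string selects every entry in that month (here, one row)
june = ts.loc['2011-06']

# A slice of partial strings includes everything through the end of December
q4 = ts.loc['2011-10':'2011-12']

print(june)
print(q4)
```

Because the string `'2011-06'` is coarser than the index resolution, the result is a Series rather than a scalar.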

### 5.2 Partial String Indexing with DataFrames

This type of slicing will work on a DataFrame with a DatetimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included.

In [None]:
# Create a DataFrame with a minute-frequency DatetimeIndex
# ('min' is used instead of the deprecated 'T' alias)
dft = pd.DataFrame(np.random.randn(100000, 1), columns=['A'],
                  index=pd.date_range('20130101', periods=100000, freq='min'))
dft.head()

In [None]:
# Access all data for 2013
# (use .loc: selecting rows with dft['2013'] raises a KeyError in pandas 2.0+)
dft.loc['2013'].head()

In [None]:
# This starts on the very first time in the month, and includes the last date and time for the month
dft['2013-1':'2013-2'].head()
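Since `.head()` only shows the start of the slice, the inclusive endpoint is easier to confirm by checking the first and last labels directly. This is a minimal sketch on a frame of the same shape, with `np.arange` values standing in for the random data:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2013-01-01', periods=100000, freq='min')
dft = pd.DataFrame({'A': np.arange(100000)}, index=idx)

# Slice from the first minute of January to the last minute of February
jan_feb = dft.loc['2013-01':'2013-02']

print(jan_feb.index[0])   # first minute of January
print(jan_feb.index[-1])  # last minute of February
print(len(jan_feb))       # (31 + 28) days * 1440 minutes
```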

## 6. Practical Examples

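Let's put these concepts together in some practical examples. The cells below use the `'M'` and `'Q'` frequency aliases; pandas 2.2 and later deprecate these in favor of `'ME'` and `'QE'`: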

In [None]:
# Create a DataFrame with sales data
dates = pd.date_range('2022-01-01', periods=365)
sales = pd.DataFrame({
    'date': dates,
    'product': np.random.choice(['A', 'B', 'C'], size=365),
    'store': np.random.choice(['North', 'South', 'East', 'West'], size=365),
    'sales': np.random.randint(100, 1000, size=365),
    'units': np.random.randint(1, 20, size=365)
})
sales.head()

In [None]:
# Set the date as index
sales = sales.set_index('date')
sales.head()

In [None]:
# Group by month and product
monthly_product_sales = sales.groupby([pd.Grouper(freq='M'), 'product']).agg({
    'sales': 'sum',
    'units': 'sum'
})
monthly_product_sales.head(10)

In [None]:
# Visualize monthly sales by product
monthly_sales_by_product = sales.groupby([pd.Grouper(freq='M'), 'product'])['sales'].sum().unstack()
monthly_sales_by_product.plot(figsize=(12, 6), title='Monthly Sales by Product')

In [None]:
# Get the top-selling store for each month
monthly_store_sales = sales.groupby([pd.Grouper(freq='M'), 'store'])['sales'].sum().reset_index()
top_stores = monthly_store_sales.sort_values(['date', 'sales'], ascending=[True, False]).groupby('date').head(1)
top_stores

In [None]:
# Find the best-selling product for each quarter
quarterly_product_sales = sales.groupby([pd.Grouper(freq='Q'), 'product'])['sales'].sum().reset_index()
top_products = quarterly_product_sales.sort_values(['date', 'sales'], ascending=[True, False]).groupby('date').head(1)
top_products
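The sort-then-`head(1)` pattern used above is one way to pick the top row per period. An alternative sketch uses `idxmax()` to look up the row label of the maximum in each group. The data below is small, deterministic, and made up for illustration, and the frequency is given as a `pd.offsets.MonthEnd()` object as an assumption to stay independent of alias spellings.

```python
import pandas as pd

# Hypothetical sales records (values are illustrative only)
sales = pd.DataFrame(
    {
        'store': ['North', 'South', 'North', 'South'],
        'sales': [100, 250, 400, 150],
    },
    index=pd.DatetimeIndex(
        ['2022-01-10', '2022-01-20', '2022-02-10', '2022-02-20'], name='date'
    ),
)

# Monthly totals per store, flattened to ordinary columns
monthly = (
    sales.groupby([pd.Grouper(freq=pd.offsets.MonthEnd()), 'store'])['sales']
    .sum()
    .reset_index()
)

# idxmax gives the row label of the largest total within each month
best = monthly.loc[monthly.groupby('date')['sales'].idxmax()]
print(best)
```

Unlike the sorting approach, this returns exactly one row per period even when there are ties, keeping the first maximum it encounters.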

## Summary

In this notebook, we've explored:

1. Grouping with ordered factors using categorical data
2. Using pd.Grouper for time-based grouping
   - Grouping by specific columns with desired frequencies
   - Working with DatetimeIndex
3. Taking specific rows from each group
   - Using head() and tail() on groupby objects
   - Using nth() to select specific rows
4. Time series partial string indexing
   - Accessing data by date strings
   - Accessing data by year or year-month
   - Working with DataFrames with DatetimeIndex
5. Practical examples combining these concepts

These techniques provide powerful tools for time-based analysis and grouping in pandas.