#### Pandas Tutorial - Part 57: DataFrame Methods (abs, add, astype, at_time, between_time)

This notebook covers five DataFrame methods:
- `abs()` - Get absolute values
- `add()` - Addition operation
- `astype()` - Convert data types
- `at_time()` - Select values at a particular time of day
- `between_time()` - Select values between particular times of the day

In [1]:
import pandas as pd

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

##### 1. DataFrame.abs()

The `abs()` method returns a DataFrame with the absolute value of each element.

In [2]:
# Create a sample DataFrame with positive and negative values
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})
print("Original DataFrame:")
df

Original DataFrame:


Unnamed: 0,a,b,c
0,4,10,100
1,5,20,50
2,6,30,-30
3,7,40,-50


In [3]:
# Get absolute values
print("DataFrame with absolute values:")
df.abs()

DataFrame with absolute values:


Unnamed: 0,a,b,c
0,4,10,100
1,5,20,50
2,6,30,30
3,7,40,50


In [4]:
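# Use abs() with argsort() to order rows by how close column c is to a reference value (43).
# argsort() returns integer positions; loc works here only because the index is the default 0..n-1.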
print("Sorting by proximity to 43:")
df.loc[(df.c - 43).abs().argsort()]

Sorting by proximity to 43:


Unnamed: 0,a,b,c
1,5,20,50
0,4,10,100
2,6,30,-30
3,7,40,-50
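The lookup above passes positions from `argsort()` to `loc`, which only lines up when the index is the default `RangeIndex`. The following is a minimal sketch of the same proximity sort on a frame with string labels; the name `df_labeled` and its index values are made up for illustration, and the `key` argument of `sort_values` assumes pandas 1.1 or later.

```python
import pandas as pd

# Same values as the example above, but with a hypothetical string index
df_labeled = pd.DataFrame(
    {'a': [4, 5, 6, 7], 'b': [10, 20, 30, 40], 'c': [100, 50, -30, -50]},
    index=['w', 'x', 'y', 'z'],
)

# argsort() yields integer positions, so pair it with iloc rather than loc
order = (df_labeled['c'] - 43).abs().argsort()
closest_first = df_labeled.iloc[order.to_numpy()]

# Alternative: let sort_values compute the distance through its key function
via_key = df_labeled.sort_values('c', key=lambda s: (s - 43).abs())

print(closest_first)
```

Both approaches order the rows by distance from 43 without depending on what the index labels look like.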


##### 2. DataFrame.add()

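The `add()` method adds another object (a scalar, Series, or DataFrame) to a DataFrame element-wise. It is equivalent to `df + other`, but it also accepts a `fill_value` to substitute for missing data when the two operands do not align.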

In [5]:
# Create a sample DataFrame
df = pd.DataFrame({
    'angles': [0, 3, 4],
    'degrees': [360, 180, 360]
}, index=['circle', 'triangle', 'rectangle'])
print("Original DataFrame:")
df

Original DataFrame:


Unnamed: 0,angles,degrees
circle,0,360
triangle,3,180
rectangle,4,360


In [6]:
# Add a scalar value
print("Adding 1 using operator:")
df + 1

Adding 1 using operator:


Unnamed: 0,angles,degrees
circle,1,361
triangle,4,181
rectangle,5,361


In [7]:
# Add a scalar using the add() method
print("Adding 1 using add() method:")
df.add(1)

Adding 1 using add() method:


Unnamed: 0,angles,degrees
circle,1,361
triangle,4,181
rectangle,5,361


In [8]:
# Create another DataFrame with some overlapping indices
df2 = pd.DataFrame({
    'angles': [1, 2],
    'degrees': [10, 20]
}, index=['circle', 'square'])
print("Second DataFrame:")
df2

Second DataFrame:


Unnamed: 0,angles,degrees
circle,1,10
square,2,20


In [9]:
# Add two DataFrames
print("Adding two DataFrames:")
df.add(df2)

Adding two DataFrames:


Unnamed: 0,angles,degrees
circle,1.0,370.0
rectangle,NaN,NaN
square,NaN,NaN
triangle,NaN,NaN


In [10]:
# Add with fill_value to handle NaN values
print("Adding with fill_value=0:")
df.add(df2, fill_value=0)

Adding with fill_value=0:


Unnamed: 0,angles,degrees
circle,1.0,370.0
rectangle,4.0,360.0
square,2.0,20.0
triangle,3.0,180.0
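The description of `add()` also mentions Series operands, which the cells above do not exercise. As a hedged sketch, the snippet below adds a Series once along the columns (the default alignment) and once along the rows using `axis='index'`; the names `df_shapes`, `per_column`, and `per_row` are illustrative only.

```python
import pandas as pd

df_shapes = pd.DataFrame(
    {'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
    index=['circle', 'triangle', 'rectangle'],
)

# A Series keyed by column name is aligned with the columns by default
per_column = pd.Series({'angles': 1, 'degrees': 10})
by_columns = df_shapes.add(per_column)

# A Series keyed by row label needs axis='index' to align with the rows
per_row = pd.Series({'circle': 100, 'triangle': 200, 'rectangle': 300})
by_rows = df_shapes.add(per_row, axis='index')

print(by_columns)
print(by_rows)
```

Without `axis='index'`, the row-labelled Series would be matched against the column names and produce an all-NaN result with extra columns.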


In [11]:
# The sibling arithmetic methods (sub, mul, div, ...) follow the same pattern; divide by a constant
print("Dividing by 10:")
df.div(10)

Dividing by 10:


Unnamed: 0,angles,degrees
circle,0.0,36.0
triangle,0.3,18.0
rectangle,0.4,36.0


In [12]:
# Reverse division (10/df)
print("Reverse division (10/df):")
df.rdiv(10)

Reverse division (10/df):


Unnamed: 0,angles,degrees
circle,inf,0.027778
triangle,3.333333,0.055556
rectangle,2.5,0.027778


##### 3. DataFrame.astype()

The `astype()` method is used to cast a pandas object to a specified dtype.

In [13]:
# Create a sample DataFrame
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
print("Original DataFrame data types:")
df.dtypes

Original DataFrame data types:


col1    int64
col2    int64
dtype: object

In [14]:
# Cast all columns to int32
print("Cast all columns to int32:")
df.astype('int32').dtypes

Cast all columns to int32:


col1    int32
col2    int32
dtype: object

In [15]:
# Cast only col1 to int32 using a dictionary
print("Cast only col1 to int32:")
df.astype({'col1': 'int32'}).dtypes

Cast only col1 to int32:


col1    int32
col2    int64
dtype: object

In [16]:
# Create a Series
ser = pd.Series([1, 2], dtype='int32')
print("Original Series:")
print(ser)
print("\nOriginal dtype:")
print(ser.dtype)

Original Series:
0    1
1    2
dtype: int32

Original dtype:
int32


In [17]:
# Convert to int64
print("Convert to int64:")
ser_int64 = ser.astype('int64')
print(ser_int64)
print("\nNew dtype:")
print(ser_int64.dtype)

Convert to int64:
0    1
1    2
dtype: int64

New dtype:
int64


In [18]:
# Convert to categorical type
print("Convert to categorical type:")
ser_cat = ser.astype('category')
print(ser_cat)
print("\nCategorical dtype info:")
print(ser_cat.dtype)

Convert to categorical type:
0    1
1    2
dtype: category
Categories (2, int32): [1, 2]

Categorical dtype info:
category


In [19]:
# Convert to ordered categorical type with custom ordering
print("Convert to ordered categorical with custom ordering:")
cat_dtype = pd.api.types.CategoricalDtype(categories=[2, 1], ordered=True)
ser_ordered = ser.astype(cat_dtype)
print(ser_ordered)
print("\nOrdered categorical dtype info:")
print(ser_ordered.dtype)

Convert to ordered categorical with custom ordering:
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Ordered categorical dtype info:
category


In [20]:
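# Demonstrate copy=False behavior: without Copy-on-Write, s2 may share its data with s1.
# (With Copy-on-Write enabled, the default from pandas 3.0, s1 would stay unchanged.)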
s1 = pd.Series([1, 2])
print("Original Series s1:")
print(s1)

# Convert with copy=False
s2 = s1.astype('int64', copy=False)
print("\nConverted Series s2:")
print(s2)

# Modify s2 and see if s1 changes
s2[0] = 10
print("\nAfter modifying s2, s1 is:")
print(s1)

Original Series s1:
0    1
1    2
dtype: int64

Converted Series s2:
0    1
1    2
dtype: int64

After modifying s2, s1 is:
0    10
1     2
dtype: int64
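The casts shown so far all start from clean integer data. As a rough sketch of two other common cases, the snippet below casts numeric strings to integers and casts a float column with a missing value to the nullable `Int64` extension dtype; the frame `raw` and its column names are invented for this example.

```python
import pandas as pd

# Hypothetical raw data: ids stored as strings, scores with a missing value
raw = pd.DataFrame({'id': ['1', '2', '3'], 'score': [1.0, None, 3.0]})

# NumPy int64 cannot represent a missing value, so this cast fails
try:
    raw['score'].astype('int64')
    plain_cast_failed = False
except ValueError:
    plain_cast_failed = True

# Per-column casts: strings to int64, floats to nullable Int64 (capital I)
typed = raw.astype({'id': 'int64', 'score': 'Int64'})

print(plain_cast_failed)
print(typed.dtypes)
```

The capitalised `'Int64'` keeps the missing entry as `<NA>` instead of forcing the column back to float.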


##### 4. DataFrame.at_time()

The `at_time()` method selects rows at a particular time of day (e.g., 9:30 AM). The object must have a `DatetimeIndex`; otherwise a `TypeError` is raised.

In [21]:
# Create a DataFrame with DatetimeIndex
i = pd.date_range('2018-04-09', periods=4, freq='12h')  # lowercase 'h'; the 'H' alias is deprecated since pandas 2.2
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
print("DataFrame with DatetimeIndex:")
ts

DataFrame with DatetimeIndex:


Unnamed: 0,A
2018-04-09 00:00:00,1
2018-04-09 12:00:00,2
2018-04-10 00:00:00,3
2018-04-10 12:00:00,4


In [22]:
# Select values at 12:00
print("Values at 12:00:")
ts.at_time('12:00')

Values at 12:00:


Unnamed: 0,A
2018-04-09 12:00:00,2
2018-04-10 12:00:00,4


In [23]:
# Create a more detailed DataFrame with different times
i = pd.date_range('2018-04-09', periods=10, freq='3h')  # lowercase 'h'; the 'H' alias is deprecated since pandas 2.2
ts2 = pd.DataFrame({'A': range(10)}, index=i)
print("More detailed DataFrame:")
ts2

More detailed DataFrame:


Unnamed: 0,A
2018-04-09 00:00:00,0
2018-04-09 03:00:00,1
2018-04-09 06:00:00,2
2018-04-09 09:00:00,3
2018-04-09 12:00:00,4
2018-04-09 15:00:00,5
2018-04-09 18:00:00,6
2018-04-09 21:00:00,7
2018-04-10 00:00:00,8
2018-04-10 03:00:00,9


In [24]:
# Select values at 00:00
print("Values at 00:00:")
ts2.at_time('00:00')

Values at 00:00:


Unnamed: 0,A
2018-04-09,0
2018-04-10,8


In [25]:
# Select values at 03:00
print("Values at 03:00:")
ts2.at_time('03:00')

Values at 03:00:


Unnamed: 0,A
2018-04-09 03:00:00,1
2018-04-10 03:00:00,9
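`at_time()` filters on the index, so timestamps stored in an ordinary column have to be moved into a `DatetimeIndex` first. The sketch below illustrates this with a made-up frame `log` whose `when` column holds timestamp strings; the data and names are assumptions for the example.

```python
import pandas as pd

# Hypothetical log data with timestamps kept in a regular column
log = pd.DataFrame({
    'when': ['2018-04-09 09:30', '2018-04-09 16:00', '2018-04-10 09:30'],
    'value': [1, 2, 3],
})

# With the default RangeIndex, at_time() has no times to match against
try:
    log.at_time('09:30')
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Parse the strings and move them into the index before filtering
log['when'] = pd.to_datetime(log['when'])
log_indexed = log.set_index('when')
at_open = log_indexed.at_time('09:30')

print(at_open)
```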


##### 5. DataFrame.between_time()

The `between_time()` method selects rows whose index time falls between two times of day (e.g., 9:00-9:30 AM). Both endpoints are included by default, and the index must be a `DatetimeIndex`.

In [26]:
# Using the same DataFrame as above
print("Original DataFrame:")
ts2

Original DataFrame:


Unnamed: 0,A
2018-04-09 00:00:00,0
2018-04-09 03:00:00,1
2018-04-09 06:00:00,2
2018-04-09 09:00:00,3
2018-04-09 12:00:00,4
2018-04-09 15:00:00,5
2018-04-09 18:00:00,6
2018-04-09 21:00:00,7
2018-04-10 00:00:00,8
2018-04-10 03:00:00,9


In [27]:
# Select values between 00:00 and 06:00
print("Values between 00:00 and 06:00:")
ts2.between_time('00:00', '06:00')

Values between 00:00 and 06:00:


Unnamed: 0,A
2018-04-09 00:00:00,0
2018-04-09 03:00:00,1
2018-04-09 06:00:00,2
2018-04-10 00:00:00,8
2018-04-10 03:00:00,9


In [28]:
# Select values between 09:00 and 18:00
print("Values between 09:00 and 18:00:")
ts2.between_time('09:00', '18:00')

Values between 09:00 and 18:00:


Unnamed: 0,A
2018-04-09 09:00:00,3
2018-04-09 12:00:00,4
2018-04-09 15:00:00,5
2018-04-09 18:00:00,6


In [29]:
# Reversing the order selects the window that wraps past midnight (18:00 to 09:00).
# Both endpoints are included by default, so the 09:00 and 18:00 rows still appear.
print("Values from 18:00 through 09:00:")
ts2.between_time('18:00', '09:00')

Values from 18:00 through 09:00:


Unnamed: 0,A
2018-04-09 00:00:00,0
2018-04-09 03:00:00,1
2018-04-09 06:00:00,2
2018-04-09 09:00:00,3
2018-04-09 18:00:00,6
2018-04-09 21:00:00,7
2018-04-10 00:00:00,8
2018-04-10 03:00:00,9
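Because both endpoints are included by default, the reversed call above is not an exact complement of the 09:00 to 18:00 window. As a hedged sketch, the `inclusive` parameter (assumed available, i.e. pandas 1.4 or later) can exclude the endpoints; the frame `ts_demo` below simply rebuilds the same data so the snippet runs on its own.

```python
import pandas as pd

idx = pd.date_range('2018-04-09', periods=10, freq='3h')
ts_demo = pd.DataFrame({'A': range(10)}, index=idx)

# Default window, endpoints included: 09:00, 12:00, 15:00, 18:00
inside = ts_demo.between_time('09:00', '18:00')

# Exclude both endpoints: only 12:00 and 15:00 remain
strictly_inside = ts_demo.between_time('09:00', '18:00', inclusive='neither')

# Reversed order without endpoints: the exact complement of `inside`
outside = ts_demo.between_time('18:00', '09:00', inclusive='neither')

print(strictly_inside)
print(outside)
```

Here `inside` and `outside` together cover every row exactly once.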


##### Summary

In this notebook, we've explored several important DataFrame methods:

1. **abs()**: Returns absolute values of DataFrame elements
2. **add()**: Performs element-wise addition with support for filling missing values
3. **astype()**: Converts DataFrame or Series to different data types
4. **at_time()**: Selects values at specific times of day from a time-indexed DataFrame
5. **between_time()**: Selects values between specific times of day from a time-indexed DataFrame

Together, these methods cover element-wise arithmetic, type conversion, and time-of-day filtering in pandas.