#### Pandas Tutorial - Part 61: DataFrame Methods (info, mask)

This notebook covers important DataFrame methods including:
- `info()` - Print a concise summary of a DataFrame
- `mask()` - Replace values where the condition is True

In [1]:
import pandas as pd
import numpy as np

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

##### 1. DataFrame.info()

The `info()` method prints a concise summary of a DataFrame: the index dtype and range, each column's dtype and non-null count, and the memory usage.

In [2]:
# Create a DataFrame with different data types
int_values = [1, 2, 3, 4, 5]
text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
df = pd.DataFrame({
    "int_col": int_values,
    "text_col": text_values,
    "float_col": float_values
})

print("DataFrame:")
df

DataFrame:


   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00


In [3]:
# Print information about the DataFrame
print("DataFrame info:")
df.info(verbose=True)

DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   int_col    5 non-null      int64  
 1   text_col   5 non-null      object 
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 252.0+ bytes
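
Because `info()` prints its report instead of returning it, the text has to be redirected if it should be logged or stored. A minimal sketch, assuming an in-memory text buffer is passed through the `buf` parameter (the names `buffer` and `summary` are illustrative):

```python
import io

import pandas as pd

df = pd.DataFrame({
    "int_col": [1, 2, 3, 4, 5],
    "text_col": ["alpha", "beta", "gamma", "delta", "epsilon"],
    "float_col": [0.0, 0.25, 0.5, 0.75, 1.0],
})

# info() returns None; it writes to sys.stdout unless buf is given
buffer = io.StringIO()
result = df.info(buf=buffer)

# The report is now an ordinary string that can be logged or saved
summary = buffer.getvalue()
print(summary)
```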


In [4]:
# Create a DataFrame with missing values
df_with_na = pd.DataFrame({
    "A": [1, 2, np.nan, 4, 5],
    "B": [np.nan, 2, 3, 4, 5],
    "C": [1, 2, 3, np.nan, np.nan]
})

print("DataFrame with missing values:")
df_with_na

DataFrame with missing values:


     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  3.0  3.0
3  4.0  4.0  NaN
4  5.0  5.0  NaN


In [5]:
# Print information about the DataFrame with missing values
print("DataFrame with missing values info:")
df_with_na.info()

DataFrame with missing values info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       4 non-null      float64
 1   B       4 non-null      float64
 2   C       3 non-null      float64
dtypes: float64(3)
memory usage: 252.0 bytes
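
The non-null counts in the report are only printed. When they are needed as values, `count()` and `isna().sum()` return them as Series. A short sketch on the same data (the column labels `non_null` and `missing` are illustrative):

```python
import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    "A": [1, 2, np.nan, 4, 5],
    "B": [np.nan, 2, 3, 4, 5],
    "C": [1, 2, 3, np.nan, np.nan],
})

# Non-null values per column, matching the "Non-Null Count" column of info()
non_null = df_with_na.count()

# Missing values per column
missing = df_with_na.isna().sum()

print(pd.DataFrame({"non_null": non_null, "missing": missing}))
```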


In [6]:
# Create a larger DataFrame
large_df = pd.DataFrame({
    f"col_{i}": np.random.rand(1000) for i in range(20)
})

print("Large DataFrame info:")
large_df.info()

Large DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 20 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col_0   1000 non-null   float64
 1   col_1   1000 non-null   float64
 2   col_2   1000 non-null   float64
 3   col_3   1000 non-null   float64
 4   col_4   1000 non-null   float64
 5   col_5   1000 non-null   float64
 6   col_6   1000 non-null   float64
 7   col_7   1000 non-null   float64
 8   col_8   1000 non-null   float64
 9   col_9   1000 non-null   float64
 10  col_10  1000 non-null   float64
 11  col_11  1000 non-null   float64
 12  col_12  1000 non-null   float64
 13  col_13  1000 non-null   float64
 14  col_14  1000 non-null   float64
 15  col_15  1000 non-null   float64
 16  col_16  1000 non-null   float64
 17  col_17  1000 non-null   float64
 18  col_18  1000 non-null   float64
 19  col_19  1000 non-null   float64
dtypes: float64(20)
memory usage: 156.4 KB


In [7]:
# Show memory usage with deep introspection
print("Memory usage with deep introspection:")
df.info(memory_usage='deep')

Memory usage with deep introspection:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   int_col    5 non-null      int64  
 1   text_col   5 non-null      object 
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 483.0 bytes
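
The `+` in the earlier `252.0+ bytes` marks a lower bound: without deep introspection, an `object` column is measured by its pointers only, not by the strings they refer to. `DataFrame.memory_usage()` reports the same figures per column. A sketch, with exact byte counts left out because they depend on the pandas version and platform:

```python
import pandas as pd

df = pd.DataFrame({
    "int_col": [1, 2, 3, 4, 5],
    "text_col": ["alpha", "beta", "gamma", "delta", "epsilon"],
    "float_col": [0.0, 0.25, 0.5, 0.75, 1.0],
})

# Bytes per column, with the index as its own entry
shallow = df.memory_usage()

# deep=True also measures the Python objects held by object columns
deep = df.memory_usage(deep=True)

print(pd.DataFrame({"shallow": shallow, "deep": deep}))
```

Numeric columns report the same size in both modes; only the text column is expected to grow.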


In [8]:
# Customize max_cols parameter
print("Info with max_cols=2:")
large_df.info(max_cols=2)

Info with max_cols=2:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Columns: 20 entries, col_0 to col_19
dtypes: float64(20)
memory usage: 156.4 KB
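
The same shortened report can be requested directly with `verbose=False`, whatever the number of columns. A small sketch that captures both reports as text for comparison (the helper `info_text` and the random seed are assumptions made for illustration):

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
large_df = pd.DataFrame({f"col_{i}": rng.random(1000) for i in range(20)})


def info_text(frame, **kwargs):
    """Return the text that frame.info(**kwargs) would print."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


short_by_limit = info_text(large_df, max_cols=2)    # too many columns for the limit
short_by_flag = info_text(large_df, verbose=False)  # short form requested explicitly

print(short_by_flag)
```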


##### 2. DataFrame.mask()

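The `mask()` method replaces values where the condition is True and keeps the rest. It is the inverse of the `where()` method, which keeps values where the condition is True and replaces the rest. If no replacement value is given, replaced entries become `NaN`.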

In [9]:
# Create a Series
s = pd.Series(range(5))
print("Original Series:")
print(s)

Original Series:
0    0
1    1
2    2
3    3
4    4
dtype: int64


In [10]:
# Using where() - keep values where condition is True
print("\nwhere(s > 0) - keep values where s > 0:")
print(s.where(s > 0))


where(s > 0) - keep values where s > 0:
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64


In [11]:
# Using mask() - replace values where condition is True
print("\nmask(s > 0) - replace values where s > 0:")
print(s.mask(s > 0))


mask(s > 0) - replace values where s > 0:
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
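
In both results the dtype changed from `int64` to `float64`, because the default replacement is `NaN`, which an `int64` Series cannot hold. One way to keep integers, assuming a pandas version that provides the nullable `Int64` extension dtype, is sketched below; replaced entries then show up as `<NA>`:

```python
import pandas as pd

s = pd.Series(range(5))

# Convert to the nullable integer dtype before masking
masked = s.astype("Int64").mask(s > 0)

# Masked positions are missing, and the dtype is still an integer dtype
print(masked)
```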


In [12]:
# Using where() with a replacement value
print("\nwhere(s > 1, 10) - replace values where s <= 1 with 10:")
print(s.where(s > 1, 10))


where(s > 1, 10) - replace values where s <= 1 with 10:
0    10
1    10
2     2
3     3
4     4
dtype: int64


In [13]:
# Using mask() with a replacement value
print("\nmask(s > 1, 10) - replace values where s > 1 with 10:")
print(s.mask(s > 1, 10))


mask(s > 1, 10) - replace values where s > 1 with 10:
0     0
1     1
2    10
3    10
4    10
dtype: int64
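
A common use of this pattern in data cleaning is to blank out sentinel values before computing statistics. The sketch below uses made-up readings in which `-999.0` is assumed to mean "no measurement":

```python
import pandas as pd

readings = pd.Series([12.5, -999.0, 13.0, -999.0, 12.0])

# Replace the sentinel with NaN, the default replacement of mask()
cleaned = readings.mask(readings == -999.0)

# mean() skips NaN by default, so only real measurements are averaged
print(cleaned.mean())
```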


In [14]:
# Create a DataFrame
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
print("Original DataFrame:")
print(df)

Original DataFrame:
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9


In [15]:
# Create a condition
m = df % 3 == 0
print("Condition (m = df % 3 == 0):")
print(m)

Condition (m = df % 3 == 0):
       A      B
0   True  False
1  False   True
2  False  False
3   True  False
4  False   True


In [16]:
# Using where() with the condition
print("\ndf.where(m, -df) - keep values where m is True, replace others with -df:")
print(df.where(m, -df))


df.where(m, -df) - keep values where m is True, replace others with -df:
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9


In [17]:
# Using mask() with the condition
print("\ndf.mask(m, -df) - replace values where m is True with -df:")
print(df.mask(m, -df))


df.mask(m, -df) - replace values where m is True with -df:
   A  B
0  0  1
1  2 -3
2  4  5
3 -6  7
4  8 -9


In [18]:
# Verify that where(m) is equivalent to mask(~m)
print("\nVerify that df.where(m, -df) == df.mask(~m, -df):")
print(df.where(m, -df) == df.mask(~m, -df))


Verify that df.where(m, -df) == df.mask(~m, -df):
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
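
The element-wise comparison yields a whole frame of booleans. When a single True/False answer is enough, `DataFrame.equals()` can be used instead, as in this sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])
m = df % 3 == 0

# equals() checks shape, values and dtypes and returns one boolean
same = df.where(m, -df).equals(df.mask(~m, -df))
print(same)
```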


In [19]:
# Using mask() with a callable for the condition
print("\nUsing a callable for the condition:")
print(df.mask(lambda x: x > 5, 0))


Using a callable for the condition:
   A  B
0  0  1
1  2  3
2  4  5
3  0  0
4  0  0
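
A callable condition is evaluated on the object that `mask()` is called on, which is convenient in a method chain where the intermediate frame has no name. A sketch with a hypothetical `total` column added for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])

capped = (
    df.assign(total=lambda d: d["A"] + d["B"])  # intermediate, unnamed frame
      .mask(lambda d: d > 10, 10)               # cap every value above 10
)
print(capped)
```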


In [20]:
# Using mask() with a callable for the replacement
print("\nUsing a callable for the replacement:")
print(df.mask(m, lambda x: x * 10))


Using a callable for the replacement:
    A   B
0   0   1
1   2  30
2   4   5
3  60   7
4   8  90


In [21]:
# Using mask() with inplace=True
df_copy = df.copy()
print("\nBefore mask() with inplace=True:")
print(df_copy)

df_copy.mask(m, -df, inplace=True)
print("\nAfter mask() with inplace=True:")
print(df_copy)


Before mask() with inplace=True:
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

After mask() with inplace=True:
   A  B
0  0  1
1  2 -3
2  4  5
3 -6  7
4  8 -9
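
The same result can be obtained without `inplace=True` by assigning the returned frame back to the name, a form that also works inside method chains. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])
m = df % 3 == 0

df_copy = df.copy()

# mask() returns a new DataFrame by default; rebind the name to keep it
df_copy = df_copy.mask(m, -df)
print(df_copy)
```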


##### Summary

In this notebook, we've explored two important DataFrame methods:

1. **info()**: Prints a concise summary of a DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage. This is useful for quickly checking the structure of a DataFrame and spotting missing values.

2. **mask()**: Replaces values where the condition is True. It's the opposite of the `where()` method, which keeps values where the condition is True. The `mask()` method is useful for data cleaning and transformation tasks.

These methods are essential for data exploration, understanding DataFrame structure, and data manipulation in pandas.