In [2]:
import numpy as np
import pandas as pd

### Boolean Comparisons

In [3]:
df = pd.DataFrame({
    'one': pd.Series(np.random.randn(3), index=['a','b','c']),
    'two': pd.Series(np.random.randn(4), index=['a','b','c','d']),
    'three': pd.Series(np.random.randn(3), index=['b','c','d'])
})

In [4]:
df2 = df.copy()

In [5]:
df.gt(df2)

     one    two  three
a  False  False  False
b  False  False  False
c  False  False  False
d  False  False  False


In [6]:
df2.ne(df)

     one    two  three
a  False  False   True
b  False  False  False
c  False  False  False
d   True  False  False


You can apply the reductions empty, any(), all(), and bool() to summarize a boolean result.

In [7]:
(df > 0).all()

one      False
two       True
three    False
dtype: bool

In [8]:
(df > 0).any()

one      True
two      True
three    True
dtype: bool

In [9]:
(df > 0).any().any()

True
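
The empty property is listed among the reductions above but is not demonstrated in the cells. A minimal sketch, assuming two small illustrative frames (the names no_rows and only_nan are hypothetical): a frame counts as empty only when it has no elements at all, so a frame that holds nothing but NaN is not empty.

```python
import numpy as np
import pandas as pd

# A frame with column labels but no rows has zero elements, so it is empty.
no_rows = pd.DataFrame(columns=list("ABC"))
print(no_rows.empty)   # True

# A frame holding only NaN still has one element, so it is not empty.
only_nan = pd.DataFrame({"A": [np.nan]})
print(only_nan.empty)  # False
```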

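To evaluate a single-element pandas object in a boolean context, use the bool() method. Note that bool() is deprecated as of pandas 2.1, so on recent versions an explicit alternative such as Series.item() is preferable: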

In [10]:
pd.Series([True]).bool()

True

In [11]:
pd.Series([False]).bool()

False

In [12]:
pd.DataFrame([[True]]).bool()

True

In [13]:
pd.DataFrame([[False]]).bool()

False
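
The reason these reductions matter is that a multi-element pandas object has no single truth value. The sketch below, which uses a hypothetical three-element Series named flags, shows the error raised by a bare `if` and the explicit alternatives. It also shows item(), which extracts the only element of a one-element Series.

```python
import pandas as pd

flags = pd.Series([True, False, True])

message = None
try:
    if flags:  # ambiguous: should this mean any() or all()?
        pass
except ValueError as err:
    message = str(err)
    print(message)

# State the intent explicitly instead.
print(flags.any())  # True
print(flags.all())  # False

# For a one-element Series, item() returns the single value as a scalar.
single = pd.Series([True])
print(single.item())  # True
```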

### Objects Comparison

You can conveniently perform element-wise comparisons between a pandas object and a scalar value or a same-length array-like object:

In [14]:
pd.Series(['foo', 'bar', 'baz']) == 'foo'

0     True
1    False
2    False
dtype: bool

In [15]:
pd.Series(['foo', 'bar', 'baz']) == pd.Index(['foo', 'bar', 'qux'])

0     True
1     True
2    False
dtype: bool

In [16]:
(df + df == df * 2).all()

one      False
two       True
three    False
dtype: bool

In [17]:
df + df == df * 2

     one   two  three
a   True  True  False
b   True  True   True
c   True  True   True
d  False  True   True


In [18]:
np.nan == np.nan

False

In [19]:
(df + df).equals(df * 2)

True
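
The cells above show that `==` reports False wherever both sides hold NaN, because NaN never compares equal to itself, while equals() treats NaNs in the same location as equal. A small sketch with hypothetical Series (left, right, by_order, reversed_order) makes the difference explicit and adds one caveat: equals() also requires the index to be in the same order.

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, np.nan, 3.0])
right = pd.Series([1.0, np.nan, 3.0])

# Element-wise ==: the NaN position compares unequal to itself.
print((left == right).all())   # False

# equals() treats NaNs in the same location as equal.
print(left.equals(right))      # True

# equals() also needs the index labels in the same order.
by_order = pd.Series([1, 2], index=["a", "b"])
reversed_order = pd.Series([2, 1], index=["b", "a"])
print(by_order.equals(reversed_order))               # False
print(by_order.equals(reversed_order.sort_index()))  # True
```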

In [20]:
df.mean(0)

one     -0.254121
two      0.586216
three   -0.096417
dtype: float64

In [21]:
df.mean(1)

a   -0.007413
b    0.368172
c    0.543112
d   -0.712888
dtype: float64

In [22]:
ts_stand = (df - df.mean()) / df.std()

In [23]:
ts_stand.std()

one      1.0
two      1.0
three    1.0
dtype: float64
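
The standardization above works column by column because subtracting a Series from a DataFrame aligns the Series labels with the column labels. As a sketch of the row-wise counterpart, assuming a small seeded random frame named data (not the df used above), the sub() and div() methods with axis=0 align on the row labels instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((4, 3)), columns=["one", "two", "three"])

# Column-wise: the column means and stds align with the column labels.
col_stand = (data - data.mean()) / data.std()

# Row-wise: the row statistics are indexed by row label, so align on axis=0.
row_stand = data.sub(data.mean(axis=1), axis=0).div(data.std(axis=1), axis=0)

print(col_stand.std())        # about 1.0 for every column
print(row_stand.std(axis=1))  # about 1.0 for every row
```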

### Describe

There is a convenient describe() function which computes a variety of summary statistics about a Series or about the columns of a DataFrame:

In [24]:
series = pd.Series(np.random.randn(1000))

In [25]:
series[::2] = np.nan

In [26]:
series.describe()

count    500.000000
mean       0.029692
std        1.011617
min       -2.883998
25%       -0.690335
50%        0.044806
75%        0.744250
max        2.930830
dtype: float64

In [27]:
frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])

In [28]:
frame.iloc[::2] = np.nan

In [29]:
frame.describe()

                a           b           c           d           e
count  500.000000  500.000000  500.000000  500.000000  500.000000
mean     0.011941    0.025750   -0.034337    0.018805   -0.080744
std      1.009164    0.979099    1.004943    0.961321    1.002245
min     -3.084964   -3.062232   -2.953820   -2.642159   -3.689509
25%     -0.668343   -0.682364   -0.727141   -0.600890   -0.735162
50%      0.002180    0.006121   -0.009139    0.005480   -0.111746
75%      0.648776    0.698074    0.714659    0.572420    0.632332
max      3.288266    3.737140    2.461162    3.235756    2.376609
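
The quartiles shown by describe() are only the default. As a minimal sketch, assuming a hypothetical Series named values that holds the numbers 1 to 100, the percentiles argument selects which percentiles appear in the summary.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1, 101, dtype=float))

# Ask for the 5th and 95th percentiles in addition to the quartiles.
summary = values.describe(percentiles=[0.05, 0.25, 0.75, 0.95])
print(summary)
```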


For a non-numerical Series object, describe() will give a simple summary of the number of unique values and the most frequently occurring values:

In [30]:
s = pd.Series(['a','a','b','a','a', np.nan, 'c','d','a'])

In [31]:
s.describe()

count     8
unique    4
top       a
freq      5
dtype: object
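
On a DataFrame with mixed column types, describe() summarizes only the numeric columns by default. The sketch below assumes a small hypothetical frame named mixed and uses include="all" to get both kinds of summary at once. Statistics that do not apply to a column show up as NaN.

```python
import pandas as pd

mixed = pd.DataFrame({"a": ["Yes", "Yes", "Yes", "No"], "b": range(4)})

# Default: only the numeric column "b" is summarized.
numeric_only = mixed.describe()
print(numeric_only)

# include="all" summarizes every column, numeric or not.
everything = mixed.describe(include="all")
print(everything)
```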

### Index of min/max values

The idxmin() and idxmax() functions on Series and DataFrame compute the index labels of the minimum and maximum values:

In [32]:
s1 = pd.Series(np.random.randn(5))

In [33]:
s1

0    0.849550
1   -0.715777
2   -0.097960
3    0.793169
4    0.272446
dtype: float64

In [34]:
s1.idxmin(), s1.idxmax()

(1, 0)

In [35]:
df1 = pd.DataFrame(np.random.randn(5,3), columns=['A','B','C'])

In [36]:
df1

          A         B         C
0  0.453303  0.042233 -0.143159
1 -0.304523  2.043620 -0.476466
2  0.357473  0.608878  1.287963
3  0.510554  1.694235 -0.713022
4 -1.264143 -0.227413  0.906641


In [38]:
df1.idxmin(axis=0)

A    4
B    4
C    3
dtype: int64

In [39]:
df1.idxmax(axis=1)

0    A
1    B
2    C
3    B
4    C
dtype: object
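
Two edge cases are worth knowing when using idxmin() and idxmax(): ties and missing values. The sketch below uses two hypothetical Series (tied and with_nan). When several entries share the extreme value, the label of the first one is returned, and NaN entries are skipped by default.

```python
import numpy as np
import pandas as pd

# Two entries share the minimum value 1; the first matching label wins.
tied = pd.Series([2, 1, 1, 3], index=["w", "x", "y", "z"])
print(tied.idxmin())      # x

# NaN is skipped by default (skipna=True).
with_nan = pd.Series([np.nan, 4.0, 0.5], index=["p", "q", "r"])
print(with_nan.idxmin())  # r
print(with_nan.idxmax())  # q
```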

### Iterations

The behavior of basic iteration over pandas objects depends on the type. A Series is regarded as array-like, so basic iteration produces its values. A DataFrame follows the dict-like convention of iterating over the keys of the object, which are its column labels.

In short, basic iteration (for i in object) produces:

* Series: Values
* DataFrame: Column Labels

In [40]:
df = pd.DataFrame({'col1': np.random.randn(3),
                   'col2': np.random.randn(3)}, index=['a','b','c'])

In [41]:
for col in df:
    print(col)

col1
col2
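
The DataFrame case is shown above. For the Series case, a short sketch with a hypothetical Series named ser shows that basic iteration yields the values, while the labels are reached through the index.

```python
import pandas as pd

ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Iterating over the Series itself yields the values.
values = [value for value in ser]
print(values)  # [10, 20, 30]

# The labels come from iterating over the index.
labels = [label for label in ser.index]
print(labels)  # ['a', 'b', 'c']
```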


#### To iterate over the columns or rows of a DataFrame, you can use the following methods:

* items(): Iterate over the (key, value) pairs. For a DataFrame these are (column label, Series) pairs, so this iterates over columns rather than rows.
* iterrows(): Iterate over the rows of a DataFrame as (index, Series) pairs. This converts the rows to Series objects, which can change the dtypes and has some performance implications.
* itertuples(): Iterate over the rows of a DataFrame as namedtuples of the values. This is a lot faster than iterrows() and is in most cases the preferable way to iterate over the values of a DataFrame.

### Items

Consistent with the dict-like interface, items() iterates through key-value pairs:
* Series: (index, scalar value) pairs
* DataFrame: (column, Series) pairs

For example:

In [42]:
df = pd.DataFrame({'a': [1,2,3], 'b': ['a','b','c']})

In [43]:
for label, ser in df.items():
    print(label)
    print(ser)

a
0    1
1    2
2    3
Name: a, dtype: int64
b
0    a
1    b
2    c
Name: b, dtype: object
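
The example above covers the DataFrame case. For a Series, items() yields (index, scalar value) pairs, as this short sketch with a hypothetical Series named ser shows.

```python
import pandas as pd

ser = pd.Series([1.5, 2.5], index=["x", "y"])

# Each step yields one (index label, value) pair.
pairs = list(ser.items())
for label, value in pairs:
    print(label, value)
```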


### Iterrows

iterrows() allows you to iterate through the rows of a DataFrame as Series objects. It returns an iterator yielding each index value along with a Series containing the data in each row:

In [44]:
for row_index, row in df.iterrows():
    print(row_index, row, sep='\n')

0
a    1
b    a
Name: 0, dtype: object
1
a    2
b    b
Name: 1, dtype: object
2
a    3
b    c
Name: 2, dtype: object
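
Because iterrows() builds a Series for each row, dtypes are not preserved across the row. In the output above every row has dtype object, since it mixes integers and strings. The sketch below, assuming a hypothetical one-row frame named df_orig, shows the same effect with numbers: an integer value is upcast to float once it shares a row Series with a float.

```python
import pandas as pd

df_orig = pd.DataFrame([[1, 1.5]], columns=["int", "float"])
print(df_orig.dtypes)  # the "int" column is int64, the "float" column is float64

# Take the first (index, row) pair from the iterator.
_, row = next(df_orig.iterrows())

# The row Series has a single dtype, so the integer became a float.
print(row.dtype)   # float64
print(row["int"])  # 1.0
```

When the original dtypes matter, itertuples() is usually the safer choice, since it returns the values of each row without going through a Series.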


### Itertuples

The itertuples() method returns an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple is the row's index value, and the remaining elements are the row values.

For example:

In [45]:
for row in df.itertuples():
    print(row)

Pandas(Index=0, a=1, b='a')
Pandas(Index=1, a=2, b='b')
Pandas(Index=2, a=3, b='c')
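
The namedtuple fields can be read by attribute, and two optional arguments control the shape of each tuple: index=False drops the index element and name=None yields plain tuples. A short sketch that rebuilds the same small frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

# Fields are accessible by attribute name.
for row in df.itertuples():
    print(row.Index, row.a, row.b)

# Drop the index and return plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```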


Completed Dec 2, 2021 - Jason Cardinal, Lighthouse Labs - Data Science