### Intro 
Pandas is a powerful library for data preprocessing and data analysis. It was originally developed for financial (stock market) data analysis.

Pandas is mainly used for descriptive, exploratory analysis rather than for statistical modelling or causal inference. For more complicated analysis it's better to use a library such as StatsModels.

### Pandas Methods 

- **pd.date_range( date_start, date_end )** - returns a DatetimeIndex covering the provided dates (dates can be strings or datetime-like objects)
- **pd.date_range( date, periods, freq )** - returns a DatetimeIndex of **periods** dates starting at the provided date with a certain frequency
- **pd.concat([ data frames ], axis = 1 )** - concatenates DataFrames (by columns or by rows) **Has axis parameter** **Duplicates are possible**
- **pd.concat([ data frames ], axis = 0)** is similar to **df.append** (which was removed in pandas 2.0)
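
The calls listed above can be tried on a small example. This is a minimal sketch: the frames `left` and `right` and their values are made up for illustration and do not come from the notebook's data.

```python
import pandas as pd

# Three consecutive days given explicit start and end dates
days = pd.date_range("2020-06-01", "2020-06-03")

# Four timestamps starting at one date, spaced seven days apart
weeks = pd.date_range("2020-06-01", periods=4, freq="7D")

# Two small illustrative frames sharing the same index (hypothetical data)
left = pd.DataFrame({"a": [1, 2, 3]}, index=days)
right = pd.DataFrame({"b": [10, 20, 30]}, index=days)

# axis=1 places the frames side by side, aligned on the index
side_by_side = pd.concat([left, right], axis=1)

# axis=0 stacks rows; index labels are kept, so duplicates can appear
stacked = pd.concat([left, left], axis=0)

print(side_by_side)
print(stacked.index.is_unique)  # False: every label now appears twice
```

Passing `ignore_index=True` to `pd.concat` is one way to avoid the duplicated labels when the original index carries no meaning.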

#### Data Discretization and Quantilization
Allows splitting a sequence of values into groups **(bins)**
- **pd.cut( data , num_of_bins, labels )** - splits the sequence of values into bins of equal width
- **pd.qcut( data, num_of_quantiles, labels )** - splits the sequence of values into quantile-based bins, each holding roughly the same number of values

**You can provide your own bin edges as a list, as well as labels**
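
The difference between equal-width bins and quantile-based bins is easiest to see on skewed data. A minimal sketch, assuming a made-up `ages` Series and illustrative label names:

```python
import pandas as pd

ages = pd.Series([1, 5, 12, 18, 25, 40, 63, 90])  # illustrative values

# Three equal-width bins over the range of the data
equal_width = pd.cut(ages, 3, labels=["low", "mid", "high"])

# Four quantile-based bins: each holds roughly the same number of values
quartiles = pd.qcut(ages, 4, labels=["q1", "q2", "q3", "q4"])

# Explicit bin edges with custom labels; intervals are right-closed by default
custom = pd.cut(ages, bins=[0, 17, 64, 120], labels=["child", "adult", "senior"])

print(equal_width.value_counts().sort_index())  # low: 5, mid: 1, high: 2
print(quartiles.value_counts().sort_index())    # 2 values in every quartile
print(custom.tolist())
```

With `pd.cut` most values fall into the first bin because the data are skewed, while `pd.qcut` moves the bin edges so that the groups are balanced.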


### Pandas Info 
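- **NaN** - often the result of data alignment in Pandas. When indexes don't match, NaN values are created for the labels that are missing on one side
- **Data Alignment** leads to a Cartesian product of matching labels when the indexes contain duplicates (for example, if a label appears **m** times in Series 1 and **n** times in Series 2, the result has **m** x **n** rows for that label). With unique labels the result is indexed by the union of both indexes
- **Series reindexing** with **.reindex( )** does not change the original Series; it returns a new object. The same holds for arithmetic operations (+,-,/,*). Assigning to **s.index** or to elements of the Series does change it **inplace**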
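
A short sketch of alignment, using made-up Series (`s1`, `s2`, `d1`, `d2` are illustrative names), shows where the NaN values come from and what happens with duplicate labels:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels are aligned before adding; 'a' and 'd' exist on one side only -> NaN
total = s1 + s2
print(total)

# The operands are untouched: arithmetic returns a new Series
print(s1.tolist())  # [1, 2, 3]

# With duplicate labels every matching pair is combined:
# 'x' appears twice on each side -> 2 x 2 = 4 rows, plus one NaN row for 'y'
d1 = pd.Series([1, 2], index=["x", "x"])
d2 = pd.Series([10, 20, 30], index=["x", "x", "y"])
dup = d1 + d2
print(dup)
```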

In [1]:
import pandas as pd 
import numpy as np

### Series Object 

Series is the main pandas object. It is similar to a NumPy array, except that it has an index, which makes looking up elements much more flexible.

A Series can have only one column!
#### Series creation 
- **s = pd.Series([1,2,3],index = list('abc'))**
- **dates = pd.date_range('2016/4/1','2016/4/3'); s = pd.Series([1,2,3], index = dates)**
- **s = pd.Series( list('abc') )** - By using List 
- **s = pd.Series( {'Programmer':'Alex','Teacher':'Max'} )** - By using Dictionary
- **s = pd.Series([2]*10)** - creates a Series with 10 identical values
- **s = pd.Series( np.arange(0,6) )** - By using NumPy
- **s = pd.Series( np.linspace(0,11,2) )** - By using NumPy
- **s = pd.Series( np.random.normal(size=5) )** - By using NumPy

In [17]:
# Series creation using a list
s = pd.Series(list('abc'),index = list('abc'))
print(s)

# Using a date range as index
dates = pd.date_range('2020/6/1','2020/6/3')
s = pd.Series([1,2,3],index=dates)
print('\n'+str(s))

# Using Dictionary 
s = pd.Series({'Programmer':'Alex','Teacher':'Max','Sportsman':'Vlad'})
print('\n'+str(s))

# Using NumPy
s = pd.Series(np.linspace(0,10,5))
print('\n'+str(s))

a    a
b    b
c    c
dtype: object

2020-06-01    1
2020-06-02    2
2020-06-03    3
Freq: D, dtype: int64

Programmer    Alex
Teacher        Max
Sportsman     Vlad
dtype: object

0     0.0
1     2.5
2     5.0
3     7.5
4    10.0
dtype: float64


#### Selection operators
- **s[ ]** - uses an index position or a label (can be confusing, thus not recommended for Series; better use .loc[] or .iloc[])
- **s.loc[ start:end:step ]** - uses index labels **the end label is included**
- **s.iloc[ start:end:step ]** - uses row positions (starting from 0) **the end position is excluded**

In [6]:
# Create Series to demonstrate the difference 
s = pd.Series(np.arange(4),index=[10,11,12,13])
print(s)

# [] operator
print('\n'+str(s[13])) # uses index label

# .iloc[] operator
print('\n'+str(s.iloc[[1,2]])) # uses a position, not value !!!

# .loc[] operator
print('\n'+str(s.loc[[12,13]])) # uses index value

10    0
11    1
12    2
13    3
dtype: int32

3

11    1
12    2
dtype: int32

12    2
13    3
dtype: int32


#### Series Slicing  
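Slicing has the following order: 

**series[ start:end:step ]**. By default **Step = 1**

- Start, end, step are optional, which makes slicing very flexible

**The end value is excluded** in positional slicing (**s[ ]** with integer positions and **.iloc[ ]**); label slicing with **.loc[ ]** includes the end label

For example, s[1:2] will return only one element instead of 2

**A slice can be a view on the original data, whereas selecting with a list of labels or positions returns a copy**. Thus, changes made through a slice may change the original Series (the exact behaviour depends on the pandas version and its Copy-on-Write setting); use **.copy( )** when an independent object is needed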

In [43]:
# Create Series
s = pd.Series(np.arange(100,110),index=np.arange(10,20))
print(s)

# Select values at positions 1 to 4
print('\n'+str(s[1:5])) # the end position (5) is excluded

# Select every second element from that range
print('\n'+str(s[1:5:2]))

# Alternative to .head()
print('\n'+str(s[:5]))

# Select every second element in reverse order, starting from label 15
print('\n'+str(s.loc[15::-2])) # a negative step reverses the order

# Select the last 4 values
print('\n'+str(s[-4:]))

# Select all values except for the last 4
print('\n'+str(s[:-4]))

10    100
11    101
12    102
13    103
14    104
15    105
16    106
17    107
18    108
19    109
dtype: int32

11    101
12    102
13    103
14    104
dtype: int32

11    101
13    103
dtype: int32

10    100
11    101
12    102
13    103
14    104
dtype: int32

15    105
13    103
11    101
dtype: int32

16    106
17    107
18    108
19    109
dtype: int32

10    100
11    101
12    102
13    103
14    104
15    105
dtype: int32


#### Series Logical Selection 
To select by a logical condition, put the condition inside [ ] of the Series object

To combine conditions (and/or), use (&,|) and wrap each condition in parentheses

- **.all( )** - checks whether all values meet a certain condition
- **.any( )** - checks whether at least one value meets a certain condition

In [50]:
# Select values that are >= 100 and <= 105
print(s[(s>=100) & (s<=105)])

# .all()
print('\n'+str((s>=100).all()))
# any()
print('\n'+str((s>105).any()))
# sum()
print('\n'+str((s>105).sum()))

10    100
11    101
12    102
13    103
14    104
15    105
dtype: int32

True

True

4


#### Series methods
- **s.head( )** - by default returns 5 first values of Series 
- **s.tail( )** - by default returns 5 last observations of Series
- **s.take( )** - returns elements according to provided positions 
- **s.values** - returns the values of the Series
- **s.index** - returns index and its type
- **s.size** - returns the size of Series 
- **s.shape** - returns shape or dimension of Series 
- **s.reindex( new_index, fill_value = 0, method = 'ffill'/'bfill' )** - conforms the Series to new index labels **(returns a new Series object)**; **fill_value** is inserted for labels that are missing from the original index; **method = 'ffill'** fills new labels with the last known value
- **s.copy( )** - creates a copy of the Series
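
The effect of the `fill_value` and `method` parameters of `.reindex()` can be sketched on a tiny Series with gaps in its index (the values are illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[0, 2, 4])

# New labels with no match get NaN by default
with_nan = s.reindex(range(6))

# fill_value replaces the NaN placeholder for missing labels
with_zero = s.reindex(range(6), fill_value=0)

# method='ffill' carries the last known value forward (the index must be sorted)
forward = s.reindex(range(6), method="ffill")

print(with_nan.tolist())
print(with_zero.tolist())  # [10, 0, 20, 0, 30, 0]
print(forward.tolist())    # [10, 10, 20, 20, 30, 30]
print(s.index.tolist())    # [0, 2, 4]: the original Series is unchanged
```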

#### Numerical and Statistical Methods


- **s.describe( )[param]** - returns descriptive statistics (param selects a single statistic, for example 'std' or 'mean')
- **s.min( ); s.max( ); s.mean( ); s.mode( ); s.sum( )** - quite obvious (a Series has only axis 0; **axis = 1** applies to DataFrame rows)
- **s.add( ); s.sub( ); s.mul( ); s.div( )** - method forms of +, -, *, /; unlike the operators, they accept **fill_value** for missing labels
- **s.count( )** - counts the number of values **distinct from NaN**
- **s.unique( )** - returns unique values (**NaN is unique, thus, will be included**)
- **s.nunique( )** - returns the number of unique values (**NaN is excluded**)
- **s.value_counts( )** - returns how often each value appears (**histogramming**) 
- **s.idxmin( ); s.idxmax( )** - get the index label of the min and max values 
- **s.nsmallest( ); s.nlargest( )** - returns the n smallest or n largest values 
- **s.cumsum( ); s.cumprod( )** - returns cumulative sum and product
- **s.var( )** - returns **variance** of the Series
- **s.rank( )** - returns the rank of each value (the largest value gets the highest rank and vice versa)
- **s.pct_change( )** - returns the percentage change relative to the previous element (the lag is set with **periods**)
- **s.rolling( window = n ).mean( )** - moving average 
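
A few of these methods are sketched below on made-up data, mainly to show how NaN is treated and what the time-series helpers return. The Series `s` and `prices` are illustrative, not taken from the notebook.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, 1.0, 3.0, np.nan, 8.0], index=list("abcde"))

print(s.count())               # 4: NaN is not counted
print(s.nunique())             # 3: number of distinct non-NaN values
print(len(s.unique()))         # 4: unique() keeps NaN as its own value
print(s.idxmin(), s.idxmax())  # b e
print(s.value_counts())        # 3.0 appears twice, 1.0 and 8.0 once

prices = pd.Series([100.0, 110.0, 99.0, 120.0])
print(prices.pct_change().round(3).tolist())     # first element is NaN
print(prices.rolling(window=2).mean().tolist())  # first element is NaN
print(prices.rank().tolist())                    # [2.0, 3.0, 1.0, 4.0]
```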

#### Values substitution and deletion in Series
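Arithmetic operations (*,/,+,-) return a new Series and leave the original unchanged, whereas assignment through [ ], .loc[ ], .iloc[ ] and **del** modify the Series in place. For this reason, it is recommended to create a copy of the Series object before manipulating the data.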

In [71]:
# del works in place and returns nothing, so its result cannot be assigned
del s[15]
s # the element with label 15 was dropped

# Assign through .loc on a copy so the original stays intact
new_s = s.copy()
new_s.loc[13] = 500
new_s

10    100
11    101
12    102
13    103
14    104
16    106
17    107
18    108
19    109
dtype: int32

In [53]:
print(s.head())
print(s.tail())
print(s.take([1,3,9]))

10    100
11    101
12    102
13    103
14    104
dtype: int32
15    105
16    106
17    107
18    108
19    109
dtype: int32
11    101
13    103
19    109
dtype: int32


### DataFrame object
A DataFrame consists of one or several Series objects. Basically, a DataFrame is a two-dimensional table where each column is represented by a Series object.
#### DataFrame creation
- **pd.DataFrame( np.arange(1,6) )** - by using NumPy
- **pd.DataFrame( np.array([ [a,b],[c,d] ]) )** - by using NumPy 
- **pd.DataFrame({'New York':[ 25,35,30 ],'Los Angeles':[ 35,36,39 ]})** - by using Dictionary (the most popular)
- **pd.read_csv( )** - by reading CSV files. Parameter **parse_dates** indicates which columns hold dates, **index_col** indicates which column to use as the index
- **pd.DataFrame( [ series_1, series_2, series_n ] )** - by using Series objects (each Series becomes a row)

In [160]:
# Using NumPy
df = pd.DataFrame(np.arange(4))
print(df)
df = pd.DataFrame(np.array([[23,29],[30,31]]), columns=['New_York','Los_Angeles'])
print(df)

# Using Dictionary
new_york_temp = [23,30,33]
los_ang_temp = [29,31,38]
df = pd.DataFrame({'New_York':new_york_temp,'Los_Angeles':los_ang_temp})
print(df)

# Using CSV - file
path = 'D:/ML/Books/Learning_Pandas_russian_translation-1-master/Notebooks/Data/sp500.csv'
df = pd.read_csv(path,index_col='Symbol',usecols=[0,2,3,7])
df.head()

   0
0  0
1  1
2  2
3  3
   New_York  Los_Angeles
0        23           29
1        30           31
   New_York  Los_Angeles
0        23           29
1        30           31
2        33           38


                        Sector   Price  Book Value
Symbol
MMM                Industrials  141.14      26.668
ABT                Health Care   39.60      15.573
ABBV               Health Care   53.95       2.954
ACN     Information Technology   79.79       8.326
ACE                 Financials  102.91      86.897


#### Selection operators
In a DataFrame, the [ ] operator selects columns instead of rows (unlike in a Series)

Logical selection is similar to Series
- **df[ ]** - returns the provided column or list of columns 
- **df.column_name** - attribute access (only if column_name is a valid Python identifier, e.g. has no spaces)
- **df.loc[ ]** - returns rows by label or list of labels
- **df.loc[ [ index_values ] ][ [ column_names ] ]** - simultaneous rows and columns selection (the single call **df.loc[ [ index_values ], [ column_names ] ]** does the same without chained indexing)
- **df.iloc[ ]** - returns rows by their position or list of positions
- **df.index.get_loc( index_value )** - returns the position of the provided row label
- **df.at[ index_value, column_name ]** - returns a scalar value by row label and column name (fast scalar access)
- **df.iat[ index_position, column_position ]** - returns a scalar value by row position and column position (fast scalar access)
- **df[ start : end : step ]** - slicing operator (slices rows)
- **df.columns[ start : end : step ]** - selects columns in any order; save the result in a variable and then apply it to the DataFrame
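
As a compact illustration of these operators, the sketch below builds a small stand-in frame (three rows with values like those in the sp500 output; it is not the real file) and selects rows, columns and scalars from it:

```python
import pandas as pd

# Small stand-in for the sp500 data (illustrative, not read from the CSV file)
df = pd.DataFrame(
    {"Sector": ["Industrials", "Health Care", "Information Technology"],
     "Price": [141.14, 39.60, 79.79]},
    index=pd.Index(["MMM", "ABT", "ACN"], name="Symbol"),
)

# Rows and columns in a single .loc call (avoids chained indexing)
subset = df.loc[["MMM", "ACN"], ["Price"]]

# Label slices include the end label; position slices exclude the end position
by_label = df.loc["MMM":"ABT"]
by_position = df.iloc[0:1]

# Scalar access by label and by position
price_label = df.at["ACN", "Price"]
price_pos = df.iat[df.index.get_loc("ACN"), df.columns.get_loc("Price")]

print(subset)
print(len(by_label), len(by_position))  # 2 1
print(price_label == price_pos)         # True
```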

In [141]:
df.columns[::-1]

Index(['Book Value', 'Price', 'Sector'], dtype='object')

In [161]:
# Operator []
print(df[['Sector','Book Value']].head())

# df.column_name
print('\n'+str(df.Sector.head()))

# df.loc[]
print('\n'+str(df.loc[['MMM','ACN','ACE']].head()))

# Simultaneous rows and columns selection
print('\n'+str(df.loc[['MMM','ACN','ACE']][['Sector','Price']].head()))

# df.iloc[]
print('\n'+str(df.iloc[[0,3,4]].head()))

# Let's return the position of a provided row
print('\n'+'Position is: '+str(df.index.get_loc('MMM'))) # get_loc takes a single label

# Scalar values by index label and by position
print('\n'+str(df.at['ACN','Price']))
print('\n'+str(df.iat[3,1]))

# Let's make reversed columns 
reversed_col = df.columns[::-1]
print('\n'+str(df[reversed_col].head()))

                        Sector  Book Value
Symbol                                    
MMM                Industrials      26.668
ABT                Health Care      15.573
ABBV               Health Care       2.954
ACN     Information Technology       8.326
ACE                 Financials      86.897

Symbol
MMM                Industrials
ABT                Health Care
ABBV               Health Care
ACN     Information Technology
ACE                 Financials
Name: Sector, dtype: object

                        Sector   Price  Book Value
Symbol                                            
MMM                Industrials  141.14      26.668
ACN     Information Technology   79.79       8.326
ACE                 Financials  102.91      86.897

                        Sector   Price
Symbol                                
MMM                Industrials  141.14
ACN     Information Technology   79.79
ACE                 Financials  102.91

                        Sector   Price  Book Value
Symb

#### DataFrame methods
- **df.shape** - returns the dimensions of the DataFrame (rows, columns)
- **df.size** - returns the total number of elements in the DataFrame (rows x columns)
- **df.index** - returns index 
- **df.head( )** - returns the first 5 rows of the DataFrame 
- **df.tail( )** - returns the last 5 rows of the DataFrame 
- **df.columns** - returns columns of DataFrame
- **df.rename( columns = { old_name : new_name })** - renames columns. Returns a renamed **copy** by default; **Has an Inplace parameter**
- **df.rename(str.lower,axis = 1)** - lowercases column names **(axis = 1 - > Columns, axis = 0 - > Rows)**
- **df.insert( column_position, column_name, new_values )** - inserts a new column into the DataFrame at the given position **Inplace Execution!**
- **df[ new_column_name ] = values** - adds a new column to the DataFrame (a position cannot be provided, so the column is added at the end). **Inplace Execution!**
- **df[ column ] = new value** - changes values in a column **Inplace Execution!**
- **df.loc[ :, column_name ] = new value** - similar to the previous method
- **del df[ column_name ]** - deletes a column **Inplace Execution!**
- **df.pop( column_name )** - removes a column from the DataFrame and returns it as a Series **Inplace Execution!**
- **df.drop( )** - deletes either columns or rows **Has axis parameter**
- **df.append( )** - appends new rows (index labels are not aligned) **Duplicates are possible**. Deprecated in pandas 1.4 and removed in 2.0; use **pd.concat( )** instead
- **df.set_index( )** - sets a column as the new index **Has an Inplace parameter**
- **df.reset_index( )** - resets the index to the default integer index; the old index becomes a column
- **df.reindex( [ new indexes ] )** - conforms rows or columns to the new labels (labels that are missing get NaN)
- **df[ col_name ].duplicated()** - marks duplicated values, if any exist
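
Which of these methods work in place and which return a new object is easy to mix up. The sketch below walks through a few of them on a two-row frame (the values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Price": [141.14, 39.60], "Book Value": [26.668, 15.573]},
                  index=["MMM", "ABT"])

# rename returns a new DataFrame unless inplace=True is passed
renamed = df.rename(columns={"Book Value": "book_value"}).rename(str.lower, axis=1)

# insert modifies the frame in place and lets you choose the position
renamed.insert(0, "rounded_price", renamed["price"].round())

# pop removes the column in place and returns it as a Series
popped = renamed.pop("rounded_price")

# drop returns a new frame; axis=0 targets rows, axis=1 targets columns
without_abt = renamed.drop("ABT", axis=0)

# reset_index moves the index into a column and restores 0..n-1
flat = renamed.reset_index()

print(df.columns.tolist())         # ['Price', 'Book Value']: original unchanged
print(renamed.columns.tolist())    # ['price', 'book_value']
print(popped.tolist())             # [141.0, 40.0]
print(without_abt.index.tolist())  # ['MMM']
print(flat.columns.tolist())       # ['index', 'price', 'book_value']
```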

#### Numerical and Statistical Methods
- **df.describe( include = 'all' ).loc[param]** - returns descriptive statistics (.loc[param] selects a single statistic, for example 'std' or 'mean'; plain [param] would select a column) 
- **df.min( ); df.max( ); df.mean( ); df.mode( ); df.sum( )** - quite obvious
- **df.add( ); df.sub( ); df.mul( ); df.div( )** - method forms of +, -, *, /; unlike the operators, they accept **fill_value** and **axis**
- **df.count( )** - counts the number of values **distinct from NaN** in each column
- **df[ col_name ].unique( )** - returns unique values of a column (**NaN is always unique**, thus, will be included). This is a Series method; a DataFrame has no **.unique( )**
- **df.nunique( )** - returns the number of unique values in each column (**NaN is excluded**)
- **df.idxmin( ); df.idxmax( )** - get the index label of the min and max values in each column
- **df.nsmallest( n, columns ); df.nlargest( n, columns )** - returns the n rows with the smallest or largest values in the given columns 
- **df.cumsum( ); df.cumprod( )** - returns cumulative sum and product
- **df.var( )** - returns the **variance** of each column
- **df[ variable_1 ].cov( df[ variable_2 ] )** - returns **covariance** between two variables 
- **df[ variable_1 ].corr( df[ variable_2 ] )** - returns **correlation** between two variables
- **df.rank( )** - returns the rank of each value within its column (the largest value gets the highest rank and vice versa)
- **df.pct_change( )** - returns the percentage change relative to the previous row
- **df.sample( sample_volume )** - returns random rows (**sample_volume can be a number of rows, n, or a fraction, frac**)
- **df.dtypes** - returns the data type of each column
- **df.col_name.dtype** - returns a data type of a column
- **df.select_dtypes( include = [' '], exclude = [' '])** - selects columns of the provided data types
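
Some of the methods above take arguments that the list only hints at. A minimal sketch on made-up data (the column names `price`, `volume` and `sector` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "volume": [40.0, 30.0, 20.0, 10.0],
    "sector": ["A", "B", "A", "B"],
})

# Correlation and covariance are computed between two columns (Series)
corr = df["price"].corr(df["volume"])  # perfectly inverse relation, about -1
cov = df["price"].cov(df["volume"])

# unique() is a Series method; nunique() also works on the whole frame
print(df["sector"].unique().tolist())  # ['A', 'B']
print(df.nunique().to_dict())          # {'price': 4, 'volume': 4, 'sector': 2}

# nlargest needs the number of rows and the column(s) to order by
top = df.nlargest(2, "price")
print(top["price"].tolist())           # [40.0, 30.0]

# Keep only numeric columns
numeric = df.select_dtypes(include=["number"])
print(numeric.columns.tolist())        # ['price', 'volume']

# sample by row count or by fraction; random_state makes the draw repeatable
by_n = df.sample(n=2, random_state=0)
by_frac = df.sample(frac=0.5, random_state=0)
print(len(by_n), len(by_frac))         # 2 2
```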

#### Data Cleaning
- **df.isnull( ).sum( )** - returns the number of missing values in each column
- **df.count( )** - counts the number of non-missing values in each column
- **df.notnull( ).sum( )** - returns the number of values that are not missing (same result as **df.count( )**)
- **df.dropna( how = 'all', axis = 0/1, thresh )** - drops rows/columns with NaN values (By default a row/column is dropped if it has **at least one NaN value**. To avoid this, use how = 'all' to delete a row/column only if all of its values are NaN). Parameter **thresh** sets the minimum number of non-NaN values a row/column must have to be kept **Has an inplace parameter**
- **NaN values are skipped by Pandas aggregations by default (skipna = True), whereas NumPy functions such as np.mean propagate them**
- **df.fillna( values )** - fills NaN values with provided values **Has an inplace parameter**. The **method = 'ffill'/'bfill'** argument is deprecated in recent pandas versions; **df.ffill( )** and **df.bfill( )** do the same
- **df.fillna( df.mean( ) )** - fills NaN values with the mean of each column
- **df.fillna( value = values )** where values have the following format **{'col_name': val_1, ...}**
- **df['col_name'].interpolate( method = 'time'/'values' )** - interpolates NaN values. A specific interpolation method can be chosen for a specific problem. For example, **Interpolation based on time** or **Interpolation based on indexes/labels**
- **df.duplicated( )** - returns a boolean Series that marks duplicated **rows**
- **df.drop_duplicates( keep = 'first'/'last' )** - drops duplicates. Parameter **keep** determines which row to keep: the first occurrence or the last 
- **df.replace( [ replaced_values ],[ new_values ] )** - replaces provided values with new values **Has an inplace parameter**
- **df.replace( { old_value_1: new_value, old_value_n: new_value } )** - replacement using a Dictionary
- **df.replace( {'col_name_1':[ replaced_values ], 'col_name_n':[ replaced_values ] }, new_value )** - replacement using a Dictionary
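
The behaviour of `how`, `thresh` and the different fill strategies can be checked on a small frame with missing values. This is a sketch on made-up data, not on the notebook's data set:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 8.0],
    "c": [1.0, np.nan, 1.0, 1.0],
})

print(df.isnull().sum().to_dict())  # {'a': 2, 'b': 2, 'c': 1}
print(df.count().to_dict())         # {'a': 2, 'b': 2, 'c': 3}

# Default: drop rows with at least one NaN; how='all' only drops fully empty rows
print(len(df.dropna()), len(df.dropna(how="all")))  # 1 3

# thresh is the minimum number of non-NaN values a row needs to be kept
print(len(df.dropna(thresh=2)))  # 3

# Fill with per-column means, or with a per-column dictionary
filled_mean = df.fillna(df.mean())
filled_dict = df.fillna({"a": 0, "b": -1})
print(filled_mean["a"].tolist())  # [1.0, 2.0, 3.0, 2.0]
print(filled_dict["b"].tolist())  # [-1.0, -1.0, 6.0, 8.0]

# Linear interpolation between known neighbours
interpolated = df["a"].interpolate()
print(interpolated.tolist())

# pandas skips NaN in aggregations, NumPy propagates it
print(df["a"].mean(), np.mean(df["a"].to_numpy()))  # 2.0 nan
```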

Functions can be **applied** to rows, columns or individual elements, which provides flexibility
- **df.apply( lambda function )** - applies a provided function to **columns if axis = 0** or **rows if axis = 1**
- **df.applymap( lambda function )** - applies a provided function to **each value** (deprecated since pandas 2.1 in favour of **df.map( )**)
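
The three levels (columns, rows, single values) can be compared side by side. A minimal sketch on a made-up frame; the element-wise step maps each column separately, which is an assumption chosen here because it runs on both older and newer pandas versions:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0 (default): the function receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise: Series.map applied to every column
squared = df.apply(lambda col: col.map(lambda x: x ** 2))

print(col_range.to_dict())      # {'a': 1, 'b': 10}
print(row_sum.tolist())         # [11, 22]
print(squared.values.tolist())  # [[1, 100], [4, 400]]
```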

In [162]:
# Shape of DataFrame 
print('Data Frame dimension is: '+str(df.shape))

# Number of values in DataFrame
print('Number of observation in data frame is : '+str(df.size))

# Obtain list of indexes in Data Frame 
print('Data Frame indexes are: '+'\n'+str(df.index))

# Obtain list of Data Frame columns
print('Data Frame Columns are:'+'\n'+str(df.columns))

Data Frame dimension is: (500, 3)
Number of observation in data frame is : 1500
Data Frame indexes are: 
Index(['MMM', 'ABT', 'ABBV', 'ACN', 'ACE', 'ACT', 'ADBE', 'AES', 'AET', 'AFL',
       ...
       'XEL', 'XRX', 'XLNX', 'XL', 'XYL', 'YHOO', 'YUM', 'ZMH', 'ZION', 'ZTS'],
      dtype='object', name='Symbol', length=500)
Data Frame Columns are:
Index(['Sector', 'Price', 'Book Value'], dtype='object')


In [163]:
# Rename a column by providing a dictionary
df.rename(columns={'Book Value':'Book_Value'},inplace=True)
print(df.head())

# Let's lower column_names
df.rename(str.lower, axis=1,inplace=True)
print('\n'+str(df.head()))

# Let's add a new column to the DataFrame
df['rounded_price'] = df.price.round()
print('\n'+str(df.head()))

# .insert() allows choosing the position
df.insert(3,'rounded_book_value',df.book_value.round())
print('\n'+str(df.head()))

# Using [] operator
df['Multiplication'] = df.price*df.book_value
print('\n'+str(df.head()))

                        Sector   Price  Book_Value
Symbol                                            
MMM                Industrials  141.14      26.668
ABT                Health Care   39.60      15.573
ABBV               Health Care   53.95       2.954
ACN     Information Technology   79.79       8.326
ACE                 Financials  102.91      86.897

                        sector   price  book_value
Symbol                                            
MMM                Industrials  141.14      26.668
ABT                Health Care   39.60      15.573
ABBV               Health Care   53.95       2.954
ACN     Information Technology   79.79       8.326
ACE                 Financials  102.91      86.897

                        sector   price  book_value  rounded_price
Symbol                                                           
MMM                Industrials  141.14      26.668          141.0
ABT                Health Care   39.60      15.573           40.0
ABBV               H

In [164]:
# Concatenation demonstration
rounded_price = pd.DataFrame({'rounded_price':df.price.round()})
price = df.price

# Concatenate by columns 
concatenated = pd.concat([price,rounded_price],axis=1)
concatenated.head()

         price  rounded_price
Symbol
MMM     141.14          141.0
ABT      39.60           40.0
ABBV     53.95           54.0
ACN      79.79           80.0
ACE     102.91          103.0


In [187]:
# Now let's choose some rows from the concatenated data frame
# To make the task more interesting we select only indexes that start with A and B 
# In the end we will concatenate them by rows

# Index selection with boolean masks (vectorized alternative to a Python loop)
a_mask = concatenated.index.str.startswith('A')
b_mask = concatenated.index.str.startswith('B')

df_a = concatenated[a_mask]
df_b = concatenated[b_mask]

# Concatenation
a_b_concatenated = pd.concat([df_a,df_b],axis=0)
a_b_concatenated

         price  rounded_price
Symbol
ABT      39.60           40.0
ABBV     53.95           54.0
ACN      79.79           80.0
ACE     102.91          103.0
ACT     213.77          214.0
...        ...            ...
BRCM     30.64           31.0
BF-B     91.42           91.0
BEN      55.00           55.0
BTU      17.22           17.0


In [192]:
# .pop()
popped_val = df.pop('rounded_book_value') # the removed column is now stored in popped_val
df.head()

                        sector   price  book_value  rounded_price  Multiplication
Symbol
MMM                Industrials  141.14      26.668          141.0      3763.92152
ABT                Health Care   39.60      15.573           40.0        616.6908
ABBV               Health Care   53.95       2.954           54.0        159.3683
ACN     Information Technology   79.79       8.326           80.0       664.33154
ACE                 Financials  102.91      86.897          103.0      8942.57027


In [193]:
# Let's delete a column with the help of the .drop() method
df.drop('Multiplication',axis=1,inplace=True)
print(df.head())

                        sector   price  book_value  rounded_price
Symbol                                                           
MMM                Industrials  141.14      26.668          141.0
ABT                Health Care   39.60      15.573           40.0
ABBV               Health Care   53.95       2.954           54.0
ACN     Information Technology   79.79       8.326           80.0
ACE                 Financials  102.91      86.897          103.0


In [194]:
# Appending rows (df.append() was removed in pandas 2.0; pd.concat() is the replacement)
df_1 = df.iloc[:3]
df_2 = df.iloc[[12,20,44]]
appended = pd.concat([df_1, df_2])
appended.head()

                        sector   price  book_value  rounded_price
Symbol
MMM                Industrials  141.14      26.668          141.0
ABT                Health Care   39.60      15.573           40.0
ABBV               Health Care   53.95       2.954           54.0
APD                  Materials  118.74      34.723          119.0
ADS     Information Technology  248.88      17.392          249.0


In [5]:
# Replace Implementation
np.random.seed(123)
df = pd.DataFrame(np.random.randint(0,50,50).reshape(10,5),
                  columns=['col_1','col_2','col_3','col_4','col_5'])
df.head()

   col_1  col_2  col_3  col_4  col_5
0     45      2     28     34     38
1     17     19     42     22     33
2     32     49     47      9     32
3     46     32     47     25     19
4     14     36     32     16      4


In [15]:
# First way
print(df.replace([45,2,42,49,46],[0,0,0,0,0]).head())

#Second way
print('\n'+str(df.replace({45:100,32:100,14:100}).head()))

# Third way
print('\n'+str(df.replace({'col_1':df.col_1.values,
                           'col_2':df.col_2.values},100).head()))

   col_1  col_2  col_3  col_4  col_5
0      0      0     28     34     38
1     17     19      0     22     33
2     32      0     47      9     32
3      0     32     47     25     19
4     14     36     32     16      4

   col_1  col_2  col_3  col_4  col_5
0    100      2     28     34     38
1     17     19     42     22     33
2    100     49     47      9    100
3     46    100     47     25     19
4    100     36    100     16      4

   col_1  col_2  col_3  col_4  col_5
0    100    100     28     34     38
1    100    100     42     22     33
2    100    100     47      9     32
3    100    100     47     25     19
4    100    100     32     16      4


In [12]:
# Apply examples
np.random.seed(5)
df = pd.DataFrame(np.random.randint(10,size=(5,5)),columns=list('abcde'))
print(df)

# Typical demonstration 
print_df_col = lambda col: print('\n'+str(col))
df.apply(print_df_col)

   a  b  c  d  e
0  3  6  6  0  9
1  8  4  7  0  0
2  7  1  5  7  0
3  1  4  6  2  9
4  9  9  9  1  2

0    3
1    8
2    7
3    1
4    9
Name: a, dtype: int32

0    6
1    4
2    1
3    4
4    9
Name: b, dtype: int32

0    6
1    7
2    5
3    6
4    9
Name: c, dtype: int32

0    0
1    0
2    7
3    2
4    1
Name: d, dtype: int32

0    9
1    0
2    0
3    9
4    2
Name: e, dtype: int32


a    None
b    None
c    None
d    None
e    None
dtype: object

In [73]:
# Using logical operations
lambda_func = lambda col: (col*100) if (col >= 1).all() else 0
new_df = df.apply(lambda_func,axis=0)
new_df

     a    b    c  d  e
0  300  600  600  0  0
1  800  400  700  0  0
2  700  100  500  0  0
3  100  400  600  0  0
4  900  900  900  0  0


In [74]:
# Sum of each column 
new_df.apply(sum,axis=0)

a    2800
b    2400
c    3300
d       0
e       0
dtype: int64