What is pandas in Python? (3 Examples)
 

This tutorial explains what the pandas library is and how to use it in the Python programming language.

Table of contents:

1) Definition of the pandas Library in Python
2) Example Data & Add-On Libraries
3) Example 1: Add New Column to pandas DataFrame
4) Example 2: Remove Row from pandas DataFrame
5) Example 3: Calculate Mean for pandas DataFrame Column
6) Video & Further Resources
Let’s move on to the definition…

 

Definition of the pandas Library in Python
pandas is a software library that was created by Wes McKinney for the Python programming language.

The pandas library is mainly used for data manipulation, i.e. to edit, modify, and adjust certain components of a DataFrame object.

However, pandas is very flexible and can also be used for other tasks, such as plotting data sets and handling time series values.

Like many other Python libraries, pandas is open source, i.e. freely available for usage, modification, and redistribution.

In the remaining part of this tutorial, I’ll show some example applications of the pandas library in practice.

So without too much talk, let’s dive into the example code!

 

Example Data & Add-On Libraries
We first have to import the pandas library into Python:

import pandas as pd                              # Import pandas library
Next, we can use the pd.DataFrame function to create some example data:

data = pd.DataFrame({"x1":range(5, 10),          # Create pandas DataFrame
                     "x2":["a", "b", "c", "d", "e"],
                     "x3":range(10, 5, - 1)})
print(data)                                      # Print pandas DataFrame
 

[Table 1: example pandas DataFrame]

 

Table 1 shows that our example DataFrame is composed of five rows and three columns.
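If you want to check the dimensions yourself, the following minimal sketch might help. It recreates the example data from above and reads the number of rows and columns from the shape attribute; the names n_rows and n_cols are only chosen for illustration.

```python
import pandas as pd

# Recreate the example DataFrame of this tutorial
data = pd.DataFrame({"x1": range(5, 10),
                     "x2": ["a", "b", "c", "d", "e"],
                     "x3": range(10, 5, -1)})

n_rows, n_cols = data.shape          # shape is a (rows, columns) tuple
print(n_rows, n_cols)                # 5 3
print(list(data.columns))            # ['x1', 'x2', 'x3']
```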

 

Example 1: Add New Column to pandas DataFrame
This example illustrates how to append a new variable to a pandas DataFrame.

For this task, we first have to create a list object that contains the values of our new column:

new_col = ["foo", "bar", "bar", "foo", "bar"]    # Create list
print(new_col)                                   # Print list
# ['foo', 'bar', 'bar', 'foo', 'bar']
Next, we can apply the assign function to add our list as a new column to our pandas DataFrame:

data_add = data.assign(new_col = new_col)        # Add new column
print(data_add)                                  # Print DataFrame with new column
 

[Table 2: pandas DataFrame with the additional column new_col]

 

As shown in Table 2, the previous code has created a new pandas DataFrame containing our input data plus our list object as a new variable.
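Note that assign returns a new DataFrame and leaves the input data unchanged. The sketch below illustrates this and shows bracket assignment on a copy as one possible alternative; the name data_copy is an assumption made only for this illustration.

```python
import pandas as pd

# Recreate the example DataFrame and the list of new values
data = pd.DataFrame({"x1": range(5, 10),
                     "x2": ["a", "b", "c", "d", "e"],
                     "x3": range(10, 5, -1)})
new_col = ["foo", "bar", "bar", "foo", "bar"]

data_add = data.assign(new_col=new_col)   # Returns a new DataFrame
print("new_col" in data.columns)          # False: the input data is unchanged
print("new_col" in data_add.columns)      # True

data_copy = data.copy()                   # Hypothetical copy for illustration
data_copy["new_col"] = new_col            # Bracket assignment modifies data_copy itself
print(list(data_copy.columns))            # ['x1', 'x2', 'x3', 'new_col']
```

Which variant fits better depends on whether you want to keep the original DataFrame untouched.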

 

Example 2: Remove Row from pandas DataFrame
Example 2 shows how to drop certain rows from a pandas DataFrame.

To achieve this, we can use a logical condition as illustrated below:

data_drop = data[data.x2 != "c"]                 # Drop row using logical condition
print(data_drop)                                 # Print DataFrame without row
 

[Table 3: pandas DataFrame without the row where x2 is equal to "c"]

 

Table 3 shows the output of the previous Python syntax: We have excluded the third row from our data set.
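The filtered DataFrame keeps the original index labels, so the label 2 is missing afterwards. The following sketch shows this, together with two optional follow-ups: dropping the same row by its index label with the drop method, and renumbering the index with reset_index. The names data_drop_label and data_reset are assumptions for illustration only.

```python
import pandas as pd

# Recreate the example DataFrame of this tutorial
data = pd.DataFrame({"x1": range(5, 10),
                     "x2": ["a", "b", "c", "d", "e"],
                     "x3": range(10, 5, -1)})

data_drop = data[data["x2"] != "c"]              # Drop row using logical condition
print(list(data_drop.index))                     # [0, 1, 3, 4]

data_drop_label = data.drop(index=2)             # Drop the same row by its index label
print(list(data_drop_label.index))               # [0, 1, 3, 4]

data_reset = data_drop.reset_index(drop=True)    # Renumber the index from 0
print(list(data_reset.index))                    # [0, 1, 2, 3]
```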

 

Example 3: Calculate Mean for pandas DataFrame Column
The pandas library can also be used to calculate certain descriptive statistics of the columns of a DataFrame.

In this specific example, we calculate the mean value of the variable x3:

data_mean = data["x3"].mean()                    # Calculate average
print(data_mean)                                 # Print average
# 8.0
The previous console output shows the mean value of our third column, i.e. 8.0.
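As a quick plausibility check, the mean can also be computed by hand as the sum of the values divided by their count. The sketch below does this for x3 and adds the median and maximum as further examples of descriptive statistics; the name manual_mean is only used for illustration.

```python
import pandas as pd

# Recreate the example DataFrame of this tutorial
data = pd.DataFrame({"x1": range(5, 10),
                     "x2": ["a", "b", "c", "d", "e"],
                     "x3": range(10, 5, -1)})

data_mean = data["x3"].mean()                       # Mean via pandas
manual_mean = data["x3"].sum() / len(data["x3"])    # (10 + 9 + 8 + 7 + 6) / 5
print(data_mean == manual_mean)                     # True

print(data["x3"].median())                          # 8.0
print(data["x3"].max())                             # 10
```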

 

Video & Further Resources
I have recently released a video on my YouTube channel, which explains the contents of this tutorial. You can find the video below.

 

The YouTube video will be added soon.

 

Furthermore, you may read the related articles on my website.

pandas Library Tutorial in Python
Change pandas DataFrames in Python
DataFrame Manipulation Using pandas in Python
Sort pandas DataFrame by Date in Python
Count Unique Values by Group in Column of pandas DataFrame
Insert Column at Specific Position of pandas DataFrame
Check If Any Value is NaN in pandas DataFrame in Python
Check if pandas DataFrame is Empty in Python
All Python Programming Tutorials
 

In summary: You have learned in this article how to apply the functions of the pandas library in the Python programming language. If you have any additional questions and/or comments, let me know in the comments section below.



In [1]:
import pandas as pd                              # Import pandas library

In [2]:
data = pd.DataFrame({"x1":range(5, 10),          # Create pandas DataFrame
                     "x2":["a", "b", "c", "d", "e"],
                     "x3":range(10, 5, - 1)})
print(data)                                      # Print pandas DataFrame

   x1 x2  x3
0   5  a  10
1   6  b   9
2   7  c   8
3   8  d   7
4   9  e   6


In [3]:
new_col = ["foo", "bar", "bar", "foo", "bar"]    # Create list
print(new_col)                                   # Print list
# ['foo', 'bar', 'bar', 'foo', 'bar']

['foo', 'bar', 'bar', 'foo', 'bar']


In [4]:
data_add = data.assign(new_col = new_col)        # Add new column
print(data_add)                                  # Print DataFrame with new column

   x1 x2  x3 new_col
0   5  a  10     foo
1   6  b   9     bar
2   7  c   8     bar
3   8  d   7     foo
4   9  e   6     bar


In [5]:
data_drop = data[data.x2 != "c"]                 # Drop row using logical condition
print(data_drop)                                 # Print DataFrame without row

   x1 x2  x3
0   5  a  10
1   6  b   9
3   8  d   7
4   9  e   6


In [6]:
data_mean = data["x3"].mean()                    # Calculate average
print(data_mean)                                 # Print average
# 8.0

8.0


At the top of the figure, two separate data sets are shown. At the bottom of the figure, you can see four different joined versions of these two data sets:

- Inner join: Keep only IDs that are contained in both data sets.
- Outer join: Keep all IDs.
- Left join: Keep only IDs that are contained in the first data set.
- Right join: Keep only IDs that are contained in the second data set.

All of these join types will be applied in the following programming examples.
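To make the four rules above more tangible, here is a minimal sketch based on two tiny, hypothetical data sets (left and right are made-up names and are not the data used in this tutorial). It collects the IDs that each join type keeps and, as an optional extra, uses the indicator argument of pd.merge to flag where each row of an outer join comes from.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
left = pd.DataFrame({"ID": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"ID": [2, 3, 4], "y": ["u", "v", "w"]})

join_ids = {}                                        # IDs kept by each join type
for how in ["inner", "outer", "left", "right"]:
    merged = pd.merge(left, right, on="ID", how=how)
    join_ids[how] = sorted(merged["ID"].tolist())
    print(how, join_ids[how])
# inner [2, 3]
# outer [1, 2, 3, 4]
# left [1, 2, 3]
# right [2, 3, 4]

outer_flagged = pd.merge(left, right, on="ID",       # Outer join with origin flag
                         how="outer", indicator=True)
print(outer_flagged[["ID", "_merge"]])               # left_only, both, or right_only
```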



In [7]:
data1 = pd.DataFrame({"ID":range(1001, 1007),    # Create first pandas DataFrame
                      "x1":range(1, 7),
                      "x2":["a", "b", "c", "d", "e", "f"],
                      "x3":range(16, 10, - 1)})
print(data1)                                     # Print first pandas DataFrame


     ID  x1 x2  x3
0  1001   1  a  16
1  1002   2  b  15
2  1003   3  c  14
3  1004   4  d  13
4  1005   5  e  12
5  1006   6  f  11


In [8]:
data2 = pd.DataFrame({"ID":range(1004, 1009),    # Create second pandas DataFrame
                      "y1":["x", "y", "x", "y", "x"],
                      "y2":range(10, 1, - 2)})
print(data2)                                     # Print second pandas DataFrame

     ID y1  y2
0  1004  x  10
1  1005  y   8
2  1006  x   6
3  1007  y   4
4  1008  x   2


In [9]:
data_inner = pd.merge(data1,                     # Inner join
                      data2,
                      on = "ID",
                      how = "inner")
print(data_inner)                                # Print merged DataFrame

     ID  x1 x2  x3 y1  y2
0  1004   4  d  13  x  10
1  1005   5  e  12  y   8
2  1006   6  f  11  x   6


In [10]:
data_outer = pd.merge(data1,                     # Outer join
                      data2,
                      on = "ID",
                      how = "outer")
print(data_outer)                                # Print merged DataFrame

     ID   x1   x2    x3   y1    y2
0  1001  1.0    a  16.0  NaN   NaN
1  1002  2.0    b  15.0  NaN   NaN
2  1003  3.0    c  14.0  NaN   NaN
3  1004  4.0    d  13.0    x  10.0
4  1005  5.0    e  12.0    y   8.0
5  1006  6.0    f  11.0    x   6.0
6  1007  NaN  NaN   NaN    y   4.0
7  1008  NaN  NaN   NaN    x   2.0


In [11]:
data_left = pd.merge(data1,                      # Left join
                     data2,
                     on = "ID",
                     how = "left")
print(data_left)                                 # Print merged DataFrame

     ID  x1 x2  x3   y1    y2
0  1001   1  a  16  NaN   NaN
1  1002   2  b  15  NaN   NaN
2  1003   3  c  14  NaN   NaN
3  1004   4  d  13    x  10.0
4  1005   5  e  12    y   8.0
5  1006   6  f  11    x   6.0


In [12]:
data_right = pd.merge(data1,                     # Right join
                      data2,
                      on = "ID",
                      how = "right")
print(data_right)                                # Print merged DataFrame

     ID   x1   x2    x3 y1  y2
0  1004  4.0    d  13.0  x  10
1  1005  5.0    e  12.0  y   8
2  1006  6.0    f  11.0  x   6
3  1007  NaN  NaN   NaN  y   4
4  1008  NaN  NaN   NaN  x   2


In [13]:
data = pd.DataFrame({'x1':[2, 7, 5, 7, 1, 5, 9],  # Create pandas DataFrame
                     'x2':range(1, 8),
                     'group':['A', 'B', 'A', 'A', 'C', 'B', 'A']})
print(data)                           

   x1  x2 group
0   2   1     A
1   7   2     B
2   5   3     A
3   7   4     A
4   1   5     C
5   5   6     B
6   9   7     A


In [14]:
print(data['x1'].mean())                          # Get mean of one column

5.142857142857143


In [15]:
print(data.mean(numeric_only = True))             # Get mean of all numeric columns

x1    5.142857
x2    4.000000
dtype: float64




In [16]:
print(data.describe())      

             x1        x2
count  7.000000  7.000000
mean   5.142857  4.000000
std    2.853569  2.160247
min    1.000000  1.000000
25%    3.500000  2.500000
50%    5.000000  4.000000
75%    7.000000  5.500000
max    9.000000  7.000000


In [17]:
print(data.groupby('group').mean())               # Get mean by group

         x1    x2
group            
A      5.75  3.75
B      6.00  4.00
C      1.00  5.00


In [19]:
print(data.groupby('group').describe())             # Get descriptive stats by group

         x1                                              x2                  \
      count  mean       std  min   25%  50%  75%  max count  mean       std   
group                                                                         
A       4.0  5.75  2.986079  2.0  4.25  6.0  7.5  9.0   4.0  3.75  2.500000   
B       2.0  6.00  1.414214  5.0  5.50  6.0  6.5  7.0   2.0  4.00  2.828427   
C       1.0  1.00       NaN  1.0  1.00  1.0  1.0  1.0   1.0  5.00       NaN   

                                 
       min  25%  50%   75%  max  
group                            
A      1.0  2.5  3.5  4.75  7.0  
B      2.0  3.0  4.0  5.00  6.0  
C      5.0  5.0  5.0  5.00  5.0  


In [20]:
data = pd.DataFrame({'x1':[6, 2, 7, 1, 9, 3, 4, 9],  # Create example DataFrame
                     'x2':[2, 5, 7, 1, 3, 1, 2, 3],
                     'x3':range(8, 0, - 1)})
print(data)                                          # Print example DataFrame

   x1  x2  x3
0   6   2   8
1   2   5   7
2   7   7   6
3   1   1   5
4   9   3   4
5   3   1   3
6   4   2   2
7   9   3   1


In [21]:
print(data.sum(axis = 1))                            # Get row sums

0    16
1    14
2    20
3     7
4    16
5     7
6     8
7    13
dtype: int64


In [22]:
data = pd.DataFrame({'x1':range(0, 8),       # Create example DataFrame
                     'x2':['C', 'A', 'B', 'C', 'A', 'A', 'B', 'A'],
                     'x3':['a', 'b', 'c', 'b', 'c', 'b', 'a', 'c']})
print(data)                                  # Print example DataFrame

   x1 x2 x3
0   0  C  a
1   1  A  b
2   2  B  c
3   3  C  b
4   4  A  c
5   5  A  b
6   6  B  a
7   7  A  c


In [23]:
data_new = data.sort_values(['x2', 'x3'])    # Sort DataFrame
print(data_new)                              # Print new DataFrame

   x1 x2 x3
1   1  A  b
5   5  A  b
4   4  A  c
7   7  A  c
6   6  B  a
2   2  B  c
0   0  C  a
3   3  C  b


In [24]:
data = pd.DataFrame({'dates':['02/25/2023','01/01/2024','17/11/2019','17/10/2022','02/02/2022'],  # Create example DataFrame
                     'values':range(1, 6)})
print(data)   

        dates  values
0  02/25/2023       1
1  01/01/2024       2
2  17/11/2019       3
3  17/10/2022       4
4  02/02/2022       5


In [25]:
data_new = data.copy()                                                   # Create duplicate


In [26]:
data_new['dates'] = pd.to_datetime(data_new.dates, format = 'mixed')     # Convert to date (mixed input formats; requires pandas >= 2.0)




In [27]:
data_new = data_new.sort_values(by = ['dates'])                          # Sort rows of data

In [28]:
print(data_new)                                                          # Print new data set

       dates  values
2 2019-11-17       3
4 2022-02-02       5
3 2022-10-17       4
0 2023-02-25       1
1 2024-01-01       2


In [29]:
data = pd.DataFrame({'x1':range(10, 18),    # Create pandas DataFrame
                     'x2':['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'],
                     'x3':range(27, 19, - 1),
                     'x4':['x', 'z', 'y', 'y', 'x', 'y', 'z', 'x']})
print(data)    


   x1 x2  x3 x4
0  10  a  27  x
1  11  b  26  z
2  12  b  25  y
3  13  c  24  y
4  14  d  23  x
5  15  a  22  y
6  16  b  21  z
7  17  d  20  x


In [30]:
split_point = 3                             # Define split point
print(split_point)                          # Print split point

3


In [31]:
data_upper = data.iloc[:split_point]        # Create upper data set
print(data_upper)                           # Print DataFrame of upper rows

   x1 x2  x3 x4
0  10  a  27  x
1  11  b  26  z
2  12  b  25  y


In [32]:
data_lower = data.iloc[split_point:]        # Create lower data set
print(data_lower)                           # Print DataFrame of lower rows

   x1 x2  x3 x4
3  13  c  24  y
4  14  d  23  x
5  15  a  22  y
6  16  b  21  z
7  17  d  20  x


In [35]:
data = pd.DataFrame({'x1':range(10, 15),    # Create pandas DataFrame
                     'x2':['a', 'b', 'c', 'd', 'e'],
                     'x3':range(10, 5, - 1)})
print(data)                                 # Print pandas DataFrame

   x1 x2  x3
0  10  a  10
1  11  b   9
2  12  c   8
3  13  d   7
4  14  e   6


In [42]:
data_new = data.set_index('x2')             # Apply set_index function
print(data_new)                             # Print updated DataFrame

    x1  x3
x2        
a   10  10
b   11   9
c   12   8
d   13   7
e   14   6


In [43]:
data1 = pd.DataFrame({'datetime':pd.to_datetime(['01-06-2018 23:15:00',   # Creating data
                                                 '02-09-2019 01:48:00',
                                                 '08-06-2020 13:20:00',
                                                 '07-03-2021 14:50:00']),
                      'values':range(0, 4)})
print(data1)                                         # Print example DataFrame

             datetime  values
0 2018-01-06 23:15:00       0
1 2019-02-09 01:48:00       1
2 2020-08-06 13:20:00       2
3 2021-07-03 14:50:00       3


In [44]:
data1_new = data1.set_index('datetime')           # Applying the set_index method
print(data1_new)       

                     values
datetime                   
2018-01-06 23:15:00       0
2019-02-09 01:48:00       1
2020-08-06 13:20:00       2
2021-07-03 14:50:00       3


In [45]:
data2 = pd.DataFrame({'date':['09-05-2019', '08-03-2020', '02-06-2021', '01-01-2022'],  # Creating example data
                      'time':['22:40:00', '03:46:00', '21:19:00', '17:35:00'],
                      'values':range(0, 4)})
print(data2)                      

         date      time  values
0  09-05-2019  22:40:00       0
1  08-03-2020  03:46:00       1
2  02-06-2021  21:19:00       2
3  01-01-2022  17:35:00       3


In [46]:
data2_new = data2.copy()                             # Create copy of pandas DataFrame
data2_new['datetime'] = pd.to_datetime(data2_new['date'] +  # Create datetime column using + operator
                                       ' ' +
                                       data2_new['time'])
print(data2_new)                                     # Print new DataFrame with datetime column

         date      time  values            datetime
0  09-05-2019  22:40:00       0 2019-09-05 22:40:00
1  08-03-2020  03:46:00       1 2020-08-03 03:46:00
2  02-06-2021  21:19:00       2 2021-02-06 21:19:00
3  01-01-2022  17:35:00       3 2022-01-01 17:35:00


In [47]:
data2_new = data2_new.set_index('datetime')          # Applying the set_index method
print(data2_new)                                     # Print new DataFrame with updated index

                           date      time  values
datetime                                         
2019-09-05 22:40:00  09-05-2019  22:40:00       0
2020-08-03 03:46:00  08-03-2020  03:46:00       1
2021-02-06 21:19:00  02-06-2021  21:19:00       2
2022-01-01 17:35:00  01-01-2022  17:35:00       3


In [48]:
del data2_new['date']                                # Remove unnecessary date column
del data2_new['time']                                # Remove unnecessary time column
print(data2_new)                                     # Print adapted DataFrame

                     values
datetime                   
2019-09-05 22:40:00       0
2020-08-03 03:46:00       1
2021-02-06 21:19:00       2
2022-01-01 17:35:00       3


In [49]:
data3_new = data2.copy()                               # Create copy of pandas DataFrame

In [50]:
data3_new['datetime'] = pd.to_datetime(data3_new['date'] +  # Create datetime column using the format parameter
                                       data3_new['time'],
                                       format = '%m-%d-%Y%H:%M:%S')
data3_new = data3_new.set_index('datetime')            # Applying the set_index method
print(data3_new)                                       # Print new DataFrame with updated index

                           date      time  values
datetime                                         
2019-09-05 22:40:00  09-05-2019  22:40:00       0
2020-08-03 03:46:00  08-03-2020  03:46:00       1
2021-02-06 21:19:00  02-06-2021  21:19:00       2
2022-01-01 17:35:00  01-01-2022  17:35:00       3


In [51]:
del data3_new['date']                                # Remove unnecessary date column
del data3_new['time']                                # Remove unnecessary time column
print(data3_new)  

                     values
datetime                   
2019-09-05 22:40:00       0
2020-08-03 03:46:00       1
2021-02-06 21:19:00       2
2022-01-01 17:35:00       3


In [55]:
data = pd.DataFrame({'x1':range(1, 6),    # Create example DataFrame
                     'x2':range(11, 16),
                     'x3':range(101, 106)})
print(data)                               # Print example DataFrame

   x1  x2   x3
0   1  11  101
1   2  12  102
2   3  13  103
3   4  14  104
4   5  15  105


In [56]:
print(data.iloc[[3]])

   x1  x2   x3
3   4  14  104


In [57]:
print(data.iloc[[1, 3, 4]])

   x1  x2   x3
1   2  12  102
3   4  14  104
4   5  15  105


In [58]:
data = pd.DataFrame({'x1':['a', 'b', 'c', 'd', 'e', 'f'],   # Create pandas DataFrame
                     'x2':range(7, 1, - 1),
                     'x3':[1, 2, 1, 2, 3, 1]})
print(data)  

  x1  x2  x3
0  a   7   1
1  b   6   2
2  c   5   1
3  d   4   2
4  e   3   3
5  f   2   1


In [59]:
data_sub1 = data.loc[data['x3'] == 1]                       # Get rows with particular value
print(data_sub1)                    

  x1  x2  x3
0  a   7   1
2  c   5   1
5  f   2   1


In [60]:
data_sub2 = data.loc[data['x3'] >= 2]                       # Get rows in range
print(data_sub2)                                            # Print DataFrame subset


  x1  x2  x3
1  b   6   2
3  d   4   2
4  e   3   3


In [61]:
data_sub3 = data.loc[data['x3'].isin([1, 3])]               # Get rows with set of values
print(data_sub3)                                            # Print DataFrame subset

  x1  x2  x3
0  a   7   1
2  c   5   1
4  e   3   3
5  f   2   1


In [62]:
data_sub4 = data.loc[(data['x2'] > 3) & (data['x3'] == 1)]  # Multiple conditions
print(data_sub4)                                            # Print DataFrame subset

  x1  x2  x3
0  a   7   1
2  c   5   1


In [63]:
data = pd.DataFrame({'x1':range(80, 73, - 1),    # Create pandas DataFrame
                     'x2':['a', 'b', 'c', 'a', 'c', 'c', 'b'],
                     'x3':range(27, 20, - 1)})
print(data)   

   x1 x2  x3
0  80  a  27
1  79  b  26
2  78  c  25
3  77  a  24
4  76  c  23
5  75  c  22
6  74  b  21


In [64]:
search_result_1 = data.isin(['b'])               # Create matrix of logical values
print(search_result_1)                           # Print output

      x1     x2     x3
0  False  False  False
1  False   True  False
2  False  False  False
3  False  False  False
4  False  False  False
5  False  False  False
6  False   True  False


In [67]:
search_result_2 = data.isin(['b']).any()         # Check by column
print(search_result_2)                           # Print output

x1    False
x2     True
x3    False
dtype: bool


In [68]:
data_rev1 = data[::-1]                             # Reverse order of rows
print(data_rev1)    

   x1 x2  x3
6  74  b  21
5  75  c  22
4  76  c  23
3  77  a  24
2  78  c  25
1  79  b  26
0  80  a  27


In [69]:
data_rev2 = data[::-1].reset_index(drop = True)    # Reset index
print(data_rev2)   

   x1 x2  x3
0  74  b  21
1  75  c  22
2  76  c  23
3  77  a  24
4  78  c  25
5  79  b  26
6  80  a  27


In [70]:
data_rev3 = data[data.columns[::-1]]               # Reverse order of columns
print(data_rev3)                                   # Print updated data

   x3 x2  x1
0  27  a  80
1  26  b  79
2  25  c  78
3  24  a  77
4  23  c  76
5  22  c  75
6  21  b  74


In [71]:
data = pd.DataFrame({'x1':range(1, 5),                  # Create example DataFrame
                     'x2':range(5, 1, - 1),
                     'x3':range(3, 7)})
print(data)            

   x1  x2  x3
0   1   5   3
1   2   4   4
2   3   3   5
3   4   2   6


In [72]:
data_new1 = data.copy()                                 # Create copy of DataFrame
data_new1.at[2, 'x1'] = 999                             # Replace values in DataFrame
print(data_new1)                                        # Print updated DataFrame


    x1  x2  x3
0    1   5   3
1    2   4   4
2  999   3   5
3    4   2   6


In [73]:
data_new2 = data.copy()                                 # Create copy of DataFrame
data_new2['x1'] = data_new2['x1'].replace([1, 3], 999)  # Replace values in DataFrame
print(data_new2)                                        # Print updated DataFrame


    x1  x2  x3
0  999   5   3
1    2   4   4
2  999   3   5
3    4   2   6


In [74]:
data_new3 = data.copy()                                 # Create copy of DataFrame
data_new3 = data_new3.replace(4, 999)                   # Replace values in DataFrame
print(data_new3)  

    x1   x2   x3
0    1    5    3
1    2  999  999
2    3    3    5
3  999    2    6


In [75]:
data = pd.DataFrame({'x1':[float('NaN'), 0, 1, float('NaN'), 1, 0],  # Create DataFrame
                     'x2':[1, 7, float('NaN'), 5, 3, 1],
                     'x3':[10, 11, 12, float('NaN'), float('NaN'), 13]})
print(data) 

    x1   x2    x3
0  NaN  1.0  10.0
1  0.0  7.0  11.0
2  1.0  NaN  12.0
3  NaN  5.0   NaN
4  1.0  3.0   NaN
5  0.0  1.0  13.0


In [76]:
data_new1 = data.fillna(0)                                 # Substitute NaN in all columns
print(data_new1) 

    x1   x2    x3
0  0.0  1.0  10.0
1  0.0  7.0  11.0
2  1.0  0.0  12.0
3  0.0  5.0   0.0
4  1.0  3.0   0.0
5  0.0  1.0  13.0


In [77]:
data_new2 = data.copy()                                    # Create copy of input DataFrame
data_new2['x1'] = data_new2['x1'].fillna(0)                # Substitute NaN in single column
print(data_new2)

    x1   x2    x3
0  0.0  1.0  10.0
1  0.0  7.0  11.0
2  1.0  NaN  12.0
3  0.0  5.0   NaN
4  1.0  3.0   NaN
5  0.0  1.0  13.0


In [78]:
data.index.names = ['index_name']                # Rename index
print(data)   

             x1   x2    x3
index_name                
0           NaN  1.0  10.0
1           0.0  7.0  11.0
2           1.0  NaN  12.0
3           NaN  5.0   NaN
4           1.0  3.0   NaN
5           0.0  1.0  13.0


In [79]:
data = data.set_index('x3')                      # Column as indices
print(data)           

       x1   x2
x3            
10.0  NaN  1.0
11.0  0.0  7.0
12.0  1.0  NaN
NaN   NaN  5.0
NaN   1.0  3.0
13.0  0.0  1.0


In [81]:
data = pd.DataFrame({"x1":range(7, 1, - 1),                 # Create pandas DataFrame
                     "x2":["a", "b", "c", "d", "e", "f"],
                     "x3":["X", "Y", "X", "X", "Y", "X"]})
print(data)

   x1 x2 x3
0   7  a  X
1   6  b  Y
2   5  c  X
3   4  d  X
4   3  e  Y
5   2  f  X


In [82]:
data_new1 = data.copy()                                     # Create copy of DataFrame
data_new1.columns = ["col1", "col2", "col3"]                # Using columns attribute
print(data_new1)  

   col1 col2 col3
0     7    a    X
1     6    b    Y
2     5    c    X
3     4    d    X
4     3    e    Y
5     2    f    X


In [83]:
data_new2 = data.copy()                                     # Create copy of DataFrame
data_new2 = data_new2.rename(columns = {"x1": "col1", "x3": "col3"})  # Using rename()
print(data_new2)  

   col1 x2 col3
0     7  a    X
1     6  b    Y
2     5  c    X
3     4  d    X
4     3  e    Y
5     2  f    X


In [84]:
data_new = data.rename(columns = {data.columns[2]: 'new'})  # Apply rename function
print(data_new)  

   x1 x2 new
0   7  a   X
1   6  b   Y
2   5  c   X
3   4  d   X
4   3  e   Y
5   2  f   X


In [85]:
data_cbind_1 = pd.DataFrame({"x1":range(10, 16),                   # Create first pandas DataFrame
                             "x2":range(30, 24, - 1),
                             "x3":["a", "b", "c", "d", "e", "f"],
                             "x4":range(48, 42, - 1)})
print(data_cbind_1)                                                # Print first pandas DataFrame
 

   x1  x2 x3  x4
0  10  30  a  48
1  11  29  b  47
2  12  28  c  46
3  13  27  d  45
4  14  26  e  44
5  15  25  f  43


In [86]:
data_cbind_2 = pd.DataFrame({"y1":["foo", "bar", "bar", "foo", "foo", "bar"], # Create second pandas DataFrame
                             "y2":["x", "y", "z", "x", "y", "z"],
                             "y3":range(18, 0, - 3)})
print(data_cbind_2)                                                # Print second pandas DataFrame


    y1 y2  y3
0  foo  x  18
1  bar  y  15
2  bar  z  12
3  foo  x   9
4  foo  y   6
5  bar  z   3


In [97]:
data_cbind_all = pd.concat([data_cbind_1.reset_index(drop = True), # Cbind DataFrames
                            data_cbind_2],
                           axis = 1)
print(data_cbind_all)                                              # Print combined DataFrame


   x1  x2 x3  x4   y1 y2  y3
0  10  30  a  48  foo  x  18
1  11  29  b  47  bar  y  15
2  12  28  c  46  bar  z  12
3  13  27  d  45  foo  x   9
4  14  26  e  44  foo  y   6
5  15  25  f  43  bar  z   3


In [93]:
data_merge_1 = pd.DataFrame({"ID":range(1, 5),                     # Create first pandas DataFrame
                             "x1":range(10, 14),
                             "x2":range(30, 26, - 1),
                             "x3":["a", "b", "c", "d"],
                             "x4":range(48, 44, - 1)})
print(data_merge_1)                    

   ID  x1  x2 x3  x4
0   1  10  30  a  48
1   2  11  29  b  47
2   3  12  28  c  46
3   4  13  27  d  45


In [94]:
data_merge_2 = pd.DataFrame({"ID":range(3, 9),                     # Create second pandas DataFrame
                             "y1":["foo", "bar", "bar", "foo", "foo", "bar"],
                             "y2":["x", "y", "z", "x", "y", "z"],
                             "y3":range(18, 0, - 3)})
print(data_merge_2)                                                # Print second pandas DataFrame
 

   ID   y1 y2  y3
0   3  foo  x  18
1   4  bar  y  15
2   5  bar  z  12
3   6  foo  x   9
4   7  foo  y   6
5   8  bar  z   3


In [95]:
data_merge_all = pd.merge(data_merge_1,                            # Merge DataFrames by ID
                          data_merge_2,
                          on = "ID",
                          how = "outer")
print(data_merge_all)                                              # Print combined DataFrame


   ID    x1    x2   x3    x4   y1   y2    y3
0   1  10.0  30.0    a  48.0  NaN  NaN   NaN
1   2  11.0  29.0    b  47.0  NaN  NaN   NaN
2   3  12.0  28.0    c  46.0  foo    x  18.0
3   4  13.0  27.0    d  45.0  bar    y  15.0
4   5   NaN   NaN  NaN   NaN  bar    z  12.0
5   6   NaN   NaN  NaN   NaN  foo    x   9.0
6   7   NaN   NaN  NaN   NaN  foo    y   6.0
7   8   NaN   NaN  NaN   NaN  bar    z   3.0


In [98]:
data_rbind_1 = pd.DataFrame({"x1":range(11, 16),                   # Create first pandas DataFrame
                             "x2":["a", "b", "c", "d", "e"],
                             "x3":range(30, 25, - 1),
                             "x4":range(30, 20, - 2)})
print(data_rbind_1) 

   x1 x2  x3  x4
0  11  a  30  30
1  12  b  29  28
2  13  c  28  26
3  14  d  27  24
4  15  e  26  22


In [99]:
data_rbind_2 = pd.DataFrame({"x1":range(3, 10),                    # Create second pandas DataFrame
                             "x2":["x", "y", "y", "y", "x", "x", "y"],
                             "x3":range(20, 6, - 2),
                             "x4":range(28, 21, - 1)})
print(data_rbind_2)

   x1 x2  x3  x4
0   3  x  20  28
1   4  y  18  27
2   5  y  16  26
3   6  y  14  25
4   7  x  12  24
5   8  x  10  23
6   9  y   8  22


In [100]:
data_rbind_all = pd.concat([data_rbind_1, data_rbind_2],           # Rbind DataFrames
                           ignore_index = True,
                           sort = False)
print(data_rbind_all)    

    x1 x2  x3  x4
0   11  a  30  30
1   12  b  29  28
2   13  c  28  26
3   14  d  27  24
4   15  e  26  22
5    3  x  20  28
6    4  y  18  27
7    5  y  16  26
8    6  y  14  25
9    7  x  12  24
10   8  x  10  23
11   9  y   8  22
