<a href="https://colab.research.google.com/github/drshahizan/Pandas_Numpy/blob/main/Pandas_Text_manipulation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pandas Working With Text Data
Series and Index objects come with a set of string-processing methods that operate on every element at once. Perhaps most importantly, these methods skip missing/NA values automatically and leave them as missing in the result. They are accessed through the str attribute and generally have the same names as Python's built-in (scalar) string methods.
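
As a minimal sketch of the NA handling described above, the cell below uses a small made-up Series (the names and values are hypothetical sample data) and shows that the same accessor is also available on an Index.

```python
import pandas as pd

# A small Series with one missing value (hypothetical sample data)
names = pd.Series(["Shahizan", None, "Aiman"])

# The missing entry is skipped and stays missing in the result
upper = names.str.upper()
print(upper)

# The same accessor is available on an Index
idx = pd.Index(["johor bahru", "melaka"])
titled = idx.str.title()
print(titled)
```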

## Lowercasing and Uppercasing Data
To lowercase data, use str.lower(), which converts all uppercase characters to lowercase. If a string contains no uppercase characters, it is returned unchanged. To uppercase data, use str.upper(), which converts all lowercase characters to uppercase. If a string contains no lowercase characters, it is returned unchanged.

In [None]:
# Import pandas package 
import pandas as pd 
   
# Define a dictionary containing employee data 
data = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
   
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data) 
   
# converting and overwriting values in column 
df["Name"]= df["Name"].str.lower()
 
print(df)

       Name  Age       Address Qualification
0  shahizan   27   Johor Bahru           Msc
1      liza   24        Melaka            MA
2     aiman   22  Kuala Lumpur           MCA
3    atiqah   32      Seremban           Phd


In [2]:
# importing pandas package 
import pandas as pd 
   
# making data frame from csv file 

url ='https://raw.githubusercontent.com/drshahizan/dataset/main/nba.csv'
data = pd.read_csv(url)
   
# converting and overwriting values in column 
data["Team"]= data["Team"].str.upper() 
   
# display 
data 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,BOSTON CELTICS,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,BOSTON CELTICS,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,BOSTON CELTICS,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,BOSTON CELTICS,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,BOSTON CELTICS,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,UTAH JAZZ,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,UTAH JAZZ,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,UTAH JAZZ,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,UTAH JAZZ,24.0,C,26.0,7-0,231.0,Kansas,947276.0


## Splitting and Replacing Data
Python's built-in split() returns a list of strings after breaking a string at the specified separator, but it works on one string at a time. The pandas str.split() method applies the same operation to every element of a Series. The .str accessor must come before the method name: a Series has no split() method of its own, so calling it without .str raises an AttributeError. To replace text, use str.replace(), which works like Python's replace() but on every element of a Series. Note that a Series also has its own replace() method, which by default replaces whole values rather than substrings; the example below uses that method, because Age is a numeric column.

In [3]:
# importing pandas module 
import pandas as pd
 
# reading csv file from url
url ='https://raw.githubusercontent.com/drshahizan/dataset/main/nba.csv'
data = pd.read_csv(url)
 
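# overwriting the Age column, replacing the value 25.0 with a string
# (Series.replace swaps whole values; str.replace needs string data)
data["Age"] = data["Age"].replace(25.0, "Twenty five")
 
# creating a boolean mask for the Age column
# where age == "Twenty five"
mask = data["Age"] == "Twenty five"
 
# displaying only the matching rows that have no missing values
data.where(mask).dropna()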

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,Twenty five,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,Twenty five,6-6,235.0,Marquette,6796117.0
7,Kelly Olynyk,Boston Celtics,41.0,C,Twenty five,7-0,238.0,Gonzaga,2165160.0
26,Thomas Robinson,Brooklyn Nets,41.0,PF,Twenty five,6-10,237.0,Kansas,981348.0
35,Cleanthony Early,New York Knicks,11.0,SF,Twenty five,6-8,210.0,Wichita State,845059.0
44,Derrick Williams,New York Knicks,23.0,PF,Twenty five,6-8,240.0,Arizona,4000000.0
47,Isaiah Canaan,Philadelphia 76ers,0.0,PG,Twenty five,6-0,201.0,Murray State,947276.0
48,Robert Covington,Philadelphia 76ers,33.0,SF,Twenty five,6-9,215.0,Tennessee State,1000000.0
59,Hollis Thompson,Philadelphia 76ers,31.0,SG,Twenty five,6-8,206.0,Georgetown,947276.0
71,Terrence Ross,Toronto Raptors,31.0,SF,Twenty five,6-7,195.0,Washington,3553917.0
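
The cell above relies on Series.replace(), so here is a short sketch of the two string-accessor methods this section describes. The values are hypothetical sample data written in the same feet-inches style as the Height column; the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same "feet-inches" style as the Height column
heights = pd.Series(["6-2", "6-6", "7-0"])

# Split every element on "-"; each element becomes a list of parts
parts = heights.str.split("-")
print(parts)

# expand=True spreads the parts across DataFrame columns instead
feet_inches = heights.str.split("-", expand=True)
print(feet_inches)

# Replace a substring inside every element (regex=False treats "-" literally)
readable = heights.str.replace("-", " ft ", regex=False)
print(readable)
```

Passing regex=False explicitly avoids depending on the default, which has differed between pandas versions.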


## Concatenation of Data
To concatenate strings, use str.cat(). It joins each string in the calling Series (or Index) with the corresponding element of the object passed to it, optionally separated by the string given in sep. The other object may hold different values, but a list-like without an index must have the same length as the caller, while a Series is aligned with the caller on its index. As with the other string methods, str.cat() is reached through the .str accessor.

In [None]:
# importing pandas module 
import pandas as pd 
   
# Define a dictionary containing employee data 
data = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data) 
 
# making copy of address column 
new = df["Address"].copy() 
   
# concatenating address with name column 
# overwriting name column 
df["Name"]= df["Name"].str.cat(new, sep =", ") 
   
# display 
print(df)

                    Name  Age       Address Qualification
0  Shahizan, Johor Bahru   27   Johor Bahru           Msc
1           Liza, Melaka   24        Melaka            MA
2    Aiman, Kuala Lumpur   22  Kuala Lumpur           MCA
3       Atiqah, Seremban   32      Seremban           Phd


In [4]:
# importing pandas module
import pandas as pd
 
# importing csv from link
url ='https://raw.githubusercontent.com/drshahizan/dataset/main/nba.csv'
data = pd.read_csv(url)
 
# making copy of team column
new = data["Team"].copy()
 
# concatenating team with name column
# overwriting name column
data["Name"]= data["Name"].str.cat(new, sep =", ")
 
# display
data

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,"Avery Bradley, Boston Celtics",Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,"Jae Crowder, Boston Celtics",Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,"John Holland, Boston Celtics",Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,"R.J. Hunter, Boston Celtics",Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,"Jonas Jerebko, Boston Celtics",Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,"Shelvin Mack, Utah Jazz",Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,"Raul Neto, Utah Jazz",Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,"Tibor Pleiss, Utah Jazz",Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,"Jeff Withey, Utah Jazz",Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
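
Some columns in the NBA data contain missing values, and str.cat() propagates them by default. The sketch below, on hypothetical two-row sample data, shows the default behaviour and the na_rep parameter that substitutes a placeholder.

```python
import pandas as pd

# Hypothetical sample data with one missing college
names = pd.Series(["Avery Bradley", "Jonas Jerebko"])
colleges = pd.Series(["Texas", None])

# By default a missing value on either side makes the result missing
default = names.str.cat(colleges, sep=", ")
print(default)

# na_rep substitutes a placeholder for missing values instead
filled = names.str.cat(colleges, sep=", ", na_rep="unknown")
print(filled)
```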


## Removing Whitespace from Data
To remove whitespace, use str.strip(), str.rstrip(), or str.lstrip(). These methods handle whitespace (including newlines) in text data. As the names suggest, str.lstrip() removes whitespace from the left side of each string, str.rstrip() removes it from the right side, and str.strip() removes it from both sides. Because these methods belong to the string accessor rather than to the Series itself, they must be called through .str.

In [None]:
# importing pandas module 
import pandas as pd 
   
# Define a dictionary containing employee data 
data = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru junction', 'Melaka junction', 'Kuala Lumpur junction', 'Seremban junction'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data)
   
# replacing address name and adding spaces in start and end 
new = df["Address"].replace("Melaka junction", "  Melaka junction  ").copy() 
   
# comparing against strings that still contain spaces: none match,
# because strip() removed the padding on both sides
print(new.str.strip()==" Melaka junction")
print(new.str.strip()=="Melaka junction ")
print(new.str.strip()==" Melaka junction ")

0    False
1    False
2    False
3    False
Name: Address, dtype: bool
0    False
1    False
2    False
3    False
Name: Address, dtype: bool
0    False
1    False
2    False
3    False
Name: Address, dtype: bool


In [5]:
# importing pandas module 
import pandas as pd 
   
# making data frame 
url ='https://raw.githubusercontent.com/drshahizan/dataset/main/nba.csv'
data = pd.read_csv(url)
   
# replacing team name and adding spaces in start and end 
new = data["Team"].replace("Boston Celtics", "  Boston Celtics  ").copy() 
   
# checking with custom removed space string 
new.str.lstrip()=="Boston Celtics  "

0       True
1       True
2       True
3       True
4       True
       ...  
453    False
454    False
455    False
456    False
457    False
Name: Team, Length: 458, dtype: bool
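
To see the three methods side by side, the following sketch pads one hypothetical team name with two spaces on each side and compares string lengths before and after trimming.

```python
import pandas as pd

# Hypothetical value padded with two spaces on each side
teams = pd.Series(["  Boston Celtics  "])

left_trimmed = teams.str.lstrip()    # removes leading spaces only
right_trimmed = teams.str.rstrip()   # removes trailing spaces only
both_trimmed = teams.str.strip()     # removes spaces on both sides

# Compare lengths to see which side was trimmed
print(teams.str.len()[0])
print(left_trimmed.str.len()[0])
print(right_trimmed.str.len()[0])
print(both_trimmed.str.len()[0])
```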

## Extracting Data
To extract parts of each string, use str.extract(), which accepts a regular expression with at least one capture group. A regular expression with more than one group returns a DataFrame with one column per group. Elements that do not match produce a row filled with NaN.

In [None]:
# importing pandas module 
import pandas as pd 
 
# creating a series 
s = pd.Series(['a1', 'b2', 'c3'])
 
# Extracting a data
n= s.str.extract(r'([ab])(\d)')
 
print(n)

     0    1
0    a    1
1    b    2
2  NaN  NaN


In [None]:
# importing pandas module 
import pandas as pd 
 
# creating a series 
s = pd.Series(['a1', 'b2', 'c3'])
 
# Extracting a data
n = s.str.extract(r'(?P<Column_1>[ab])(?P<Column_2>\d)')
 
print(n)

  Column_1 Column_2
0        a        1
1        b        2
2      NaN      NaN
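
With a single capture group, the expand parameter controls whether the result is a Series or a one-column DataFrame. The sketch below reuses the same three-element Series; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# One capture group with expand=False returns a Series
digits = s.str.extract(r'(\d)', expand=False)
print(type(digits).__name__)

# The default, expand=True, returns a one-column DataFrame
digits_df = s.str.extract(r'(\d)')
print(type(digits_df).__name__)
```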


# Pandas Merging, Joining, and Concatenating
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). DataFrames can be combined in several ways: the pd.concat() function stacks them along an axis, while the merge() and join() methods combine them on keys or indexes in the manner of database joins. Note that concat() is a top-level pandas function, not a DataFrame method.

To concatenate DataFrames, use the pd.concat() function. It can be applied in several ways:
* Concatenating DataFrame using .concat()
* Concatenating DataFrame by setting logic on axes
* Concatenating DataFrame using .append()
* Concatenating DataFrame by ignoring indexes
* Concatenating DataFrame with group keys
* Concatenating with mixed ndims

## Concatenating DataFrame using .concat()
The pd.concat() function takes a list of DataFrames, concatenates them, and returns a new DataFrame.

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
   
# Define a dictionary containing employee data 
data2 = {'Name':['Chong Wei', 'Ravi', 'James', 'Amy'], 
        'Age':[17, 14, 12, 52], 
        'Address':['Shah Alam', 'Muar', 'Tangkak', 'Mersing'], 
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
 
print(df, "\n\n", df1) 

       Name  Age       Address Qualification
0  Shahizan   27   Johor Bahru           Msc
1      Liza   24        Melaka            MA
2     Aiman   22  Kuala Lumpur           MCA
3    Atiqah   32      Seremban           Phd 

         Name  Age    Address Qualification
4  Chong Wei   17  Shah Alam         Btech
5       Ravi   14       Muar           B.A
6      James   12    Tangkak          Bcom
7        Amy   52    Mersing        B.hons


Now we apply pd.concat() to concatenate the two DataFrames.

In [None]:
# using a .concat() method
frames = [df, df1]
 
res1 = pd.concat(frames)
res1

Unnamed: 0,Name,Age,Address,Qualification
0,Shahizan,27,Johor Bahru,Msc
1,Liza,24,Melaka,MA
2,Aiman,22,Kuala Lumpur,MCA
3,Atiqah,32,Seremban,Phd
4,Chong Wei,17,Shah Alam,Btech
5,Ravi,14,Muar,B.A
6,James,12,Tangkak,Bcom
7,Amy,52,Mersing,B.hons


## Concatenating DataFrame by setting logic on axes
When concatenating DataFrames, you can choose how the labels on the other axis (the one not being concatenated along) are combined:

* Take the union of the labels with join='outer'. This is the default option, as it results in zero information loss.
* Take the intersection of the labels with join='inner'.
* Use a specific index. Older versions of pandas accepted a join_axes argument for this; it was removed in pandas 1.0, and reindexing the result has the same effect.

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd'],
        'Mobile No': [97, 91, 58, 76]} 
   
# Define a dictionary containing employee data 
data2 = {'Name':['Chong Wei', 'Ravi', 'James', 'Amy'], 
        'Age':[22, 32, 12, 52], 
        'Address':['Shah Alam', 'Muar', 'Tangkak', 'Mersing'], 
        'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons'],
        'Salary':[1000, 2000, 3000, 4000]} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=[2, 3, 6, 7]) 
 
print(df, "\n\n", df1) 

       Name  Age       Address Qualification  Mobile No
0  Shahizan   27   Johor Bahru           Msc         97
1      Liza   24        Melaka            MA         91
2     Aiman   22  Kuala Lumpur           MCA         58
3    Atiqah   32      Seremban           Phd         76 

         Name  Age    Address Qualification  Salary
2  Chong Wei   22  Shah Alam           MCA    1000
3       Ravi   32       Muar           Phd    2000
6      James   12    Tangkak          Bcom    3000
7        Amy   52    Mersing        B.hons    4000


Now we set join='inner' to keep only the row labels that appear in both DataFrames.

In [None]:
# applying concat with axes
# join = 'inner'
res2 = pd.concat([df, df1], axis=1, join='inner')
 
res2

Unnamed: 0,Name,Age,Address,Qualification,Mobile No,Name,Age,Address,Qualification,Salary
2,Aiman,22,Kuala Lumpur,MCA,58,Chong Wei,22,Shah Alam,MCA,1000
3,Atiqah,32,Seremban,Phd,76,Ravi,32,Muar,Phd,2000


Now we leave join at its default, 'outer', to take the union of the row labels.

In [None]:
# using a .concat for
# union of dataframe
res2 = pd.concat([df, df1], axis=1, sort=False)
 
res2

Unnamed: 0,Name,Age,Address,Qualification,Mobile No,Name,Age,Address,Qualification,Salary
0,Shahizan,27.0,Johor Bahru,Msc,97.0,,,,,
1,Liza,24.0,Melaka,MA,91.0,,,,,
2,Aiman,22.0,Kuala Lumpur,MCA,58.0,Chong Wei,22.0,Shah Alam,MCA,1000.0
3,Atiqah,32.0,Seremban,Phd,76.0,Ravi,32.0,Muar,Phd,2000.0
6,,,,,,James,12.0,Tangkak,Bcom,3000.0
7,,,,,,Amy,52.0,Mersing,B.hons,4000.0
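
The join_axes argument is not available in pandas 1.0 and later. One common substitute, sketched here under the assumption that you want to keep the row labels of the first frame, is to concatenate with the default outer join and then reindex. The two small frames are hypothetical sample data.

```python
import pandas as pd

# Hypothetical frames with partly overlapping row labels
left = pd.DataFrame({'Name': ['Shahizan', 'Liza', 'Aiman', 'Atiqah']},
                    index=[0, 1, 2, 3])
right = pd.DataFrame({'Salary': [1000, 2000, 3000, 4000]},
                     index=[2, 3, 6, 7])

# Outer concat along the columns, then keep only the left frame's row labels
aligned = pd.concat([left, right], axis=1).reindex(left.index)
print(aligned)
```

Rows 0 and 1 have no match in the right frame, so their Salary is NaN; rows 6 and 7 are dropped by the reindex.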


## Concatenating DataFrame using .append()
Older versions of pandas offered a DataFrame.append() method, which concatenates along axis=0, namely the index. It predates pd.concat(), was deprecated in pandas 1.4, and was removed in pandas 2.0; pd.concat() is the replacement.

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
   
# Define a dictionary containing employee data 
data2 = {'Name':['Chong Wei', 'Ravi', 'James', 'Amy'], 
        'Age':[22, 32, 12, 52], 
        'Address':['Shah Alam', 'Muar', 'Tangkak', 'Mersing'],  
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
 
print(df, "\n\n", df1) 

       Name  Age       Address Qualification
0  Shahizan   27   Johor Bahru           Msc
1      Liza   24        Melaka            MA
2     Aiman   22  Kuala Lumpur           MCA
3    Atiqah   32      Seremban           Phd 

         Name  Age    Address Qualification
4  Chong Wei   22  Shah Alam         Btech
5       Ravi   32       Muar           B.A
6      James   12    Tangkak          Bcom
7        Amy   52    Mersing        B.hons


Now we append the rows of df1 to df.

In [None]:
# df.append(df1) works only in pandas versions before 2.0;
# pd.concat() gives the same result in current versions
 
res = pd.concat([df, df1])
res

Unnamed: 0,Name,Age,Address,Qualification
0,Shahizan,27,Johor Bahru,Msc
1,Liza,24,Melaka,MA
2,Aiman,22,Kuala Lumpur,MCA
3,Atiqah,32,Seremban,Phd
4,Chong Wei,22,Shah Alam,Btech
5,Ravi,32,Muar,B.A
6,James,12,Tangkak,Bcom
7,Amy,52,Mersing,B.hons
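
A frequent use of append() was adding a single record. A minimal sketch of the same operation with pd.concat() follows; the frame and the new record are hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shahizan', 'Liza'], 'Age': [27, 24]})

# A single new record, wrapped in a one-row DataFrame (hypothetical data)
new_row = pd.DataFrame([{'Name': 'Aiman', 'Age': 22}])

# pd.concat() does the job that df.append(new_row) did in older pandas
res = pd.concat([df, new_row], ignore_index=True)
print(res)
```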


## Concatenating DataFrame by ignoring indexes
When the existing indexes carry no meaningful information, you may want to concatenate the DataFrames and disregard the fact that their indexes overlap. To do that, pass ignore_index=True; the result is given a fresh index 0, 1, ..., n-1.

In [None]:
# importing pandas module
import pandas as pd 
  
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd'],
        'Mobile No': [97, 91, 58, 76]} 
    
# Define a dictionary containing employee data 
data2 = {'Name':['Chong Wei', 'Ravi', 'James', 'Amy'], 
        'Age':[22, 32, 12, 52], 
        'Address':['Shah Alam', 'Muar', 'Tangkak', 'Mersing'],  
        'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons'],
        'Salary':[1000, 2000, 3000, 4000]} 
  
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
  
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=[2, 3, 6, 7]) 
  
  
print(df, "\n\n", df1) 

       Name  Age       Address Qualification  Mobile No
0  Shahizan   27   Johor Bahru           Msc         97
1      Liza   24        Melaka            MA         91
2     Aiman   22  Kuala Lumpur           MCA         58
3    Atiqah   32      Seremban           Phd         76 

         Name  Age    Address Qualification  Salary
2  Chong Wei   22  Shah Alam           MCA    1000
3       Ravi   32       Muar           Phd    2000
6      James   12    Tangkak          Bcom    3000
7        Amy   52    Mersing        B.hons    4000


Now we are going to apply ignore_index as an argument.

In [None]:
# using ignore_index
res = pd.concat([df, df1], ignore_index=True)
 
res

Unnamed: 0,Name,Age,Address,Qualification,Mobile No,Salary
0,Shahizan,27,Johor Bahru,Msc,97.0,
1,Liza,24,Melaka,MA,91.0,
2,Aiman,22,Kuala Lumpur,MCA,58.0,
3,Atiqah,32,Seremban,Phd,76.0,
4,Chong Wei,22,Shah Alam,MCA,,1000.0
5,Ravi,32,Muar,Phd,,2000.0
6,James,12,Tangkak,Bcom,,3000.0
7,Amy,52,Mersing,B.hons,,4000.0


## Concatenating DataFrame with group keys
To mark which rows came from which DataFrame, pass the keys argument. Each key labels one of the input objects and becomes the outermost level of a hierarchical index (MultiIndex) in the result. When Series are concatenated along axis=1, the keys instead override the column names of the new DataFrame.

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
   
# Define a dictionary containing employee data 
data2 = {'Name':['Chong Wei', 'Ravi', 'James', 'Amy'], 
        'Age':[22, 32, 12, 52], 
        'Address':['Shah Alam', 'Muar', 'Tangkak', 'Mersing'], 
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
 
print(df, "\n\n", df1)  

       Name  Age       Address Qualification
0  Shahizan   27   Johor Bahru           Msc
1      Liza   24        Melaka            MA
2     Aiman   22  Kuala Lumpur           MCA
3    Atiqah   32      Seremban           Phd 

         Name  Age    Address Qualification
4  Chong Wei   22  Shah Alam         Btech
5       Ravi   32       Muar           B.A
6      James   12    Tangkak          Bcom
7        Amy   52    Mersing        B.hons


Now we use keys as an argument.



In [None]:
# using keys 
frames = [df, df1 ]
 
res = pd.concat(frames, keys=['x', 'y'])
res

Unnamed: 0,Unnamed: 1,Name,Age,Address,Qualification
x,0,Shahizan,27,Johor Bahru,Msc
x,1,Liza,24,Melaka,MA
x,2,Aiman,22,Kuala Lumpur,MCA
x,3,Atiqah,32,Seremban,Phd
y,4,Chong Wei,22,Shah Alam,Btech
y,5,Ravi,32,Muar,B.A
y,6,James,12,Tangkak,Bcom
y,7,Amy,52,Mersing,B.hons
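
Once the keys are part of the index, they can be used for selection. The sketch below rebuilds a smaller version of the result on hypothetical two-row frames and picks rows out by key.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shahizan', 'Liza']}, index=[0, 1])
df1 = pd.DataFrame({'Name': ['Chong Wei', 'Ravi']}, index=[4, 5])

res = pd.concat([df, df1], keys=['x', 'y'])

# The keys form the outer level of a two-level MultiIndex
print(res.index.nlevels)

# Select everything that came from the second frame
from_y = res.loc['y']
print(from_y)

# Select one value by (key, original row label)
one_name = res.loc[('x', 1), 'Name']
print(one_name)
```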


## Concatenating with mixed ndims
You can concatenate a mix of Series and DataFrame objects. Each Series is converted to a single-column DataFrame whose column name is the name of the Series.

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
   
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
 
# creating a series
s1 = pd.Series([1000, 2000, 3000, 4000], name='Salary')
 
print(df, "\n\n", s1) 

       Name  Age       Address Qualification
0  Shahizan   27   Johor Bahru           Msc
1      Liza   24        Melaka            MA
2     Aiman   22  Kuala Lumpur           MCA
3    Atiqah   32      Seremban           Phd 

 0    1000
1    2000
2    3000
3    4000
Name: Salary, dtype: int64


Now we are going to mix Series and dataframe together



In [None]:
# combining series and dataframe
res = pd.concat([df, s1], axis=1)
 
res

Unnamed: 0,Name,Age,Address,Qualification,Salary
0,Shahizan,27,Johor Bahru,Msc,1000
1,Liza,24,Melaka,MA,2000
2,Aiman,22,Kuala Lumpur,MCA,3000
3,Atiqah,32,Seremban,Phd,4000
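
If the Series has no name, pandas falls back to a numeric column label. The following sketch, on hypothetical sample data, contrasts an unnamed Series with one that is renamed before concatenation.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shahizan', 'Liza']})

# A Series created without a name
unnamed = pd.Series([1000, 2000])

# The unnamed Series receives a numeric column label
res = pd.concat([df, unnamed], axis=1)
print(res.columns.tolist())

# Giving the Series a name first yields a readable column label
named = pd.concat([df, unnamed.rename('Salary')], axis=1)
print(named.columns.tolist())
```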


## Merging DataFrame
pandas provides high-performance, in-memory merge and join operations. When very large DataFrames need to be combined, joins are an efficient way to do it. A join operates on two DataFrames at a time, referred to as the left and right tables. The key is the common column on which the two DataFrames are joined. It is good practice to use keys whose values are unique throughout the column, to avoid unintended duplication of rows. pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.
There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.

Code #1 : Merging a dataframe with one unique key combination

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32],} 
   
# Define a dictionary containing employee data 
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1)
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2) 
  
 
print(df, "\n\n", df1) 

  key      Name  Age
0  K0  Shahizan   27
1  K1      Liza   24
2  K2     Aiman   22
3  K3    Atiqah   32 

   key       Address Qualification
0  K0   Johor Bahru         Btech
1  K1        Melaka           B.A
2  K2  Kuala Lumpur          Bcom
3  K3      Seremban        B.hons


Now we are using .merge() with one unique key combination

In [None]:
# using .merge() function
res = pd.merge(df, df1, on='key')
 
res

Unnamed: 0,key,Name,Age,Address,Qualification
0,K0,Shahizan,27,Johor Bahru,Btech
1,K1,Liza,24,Melaka,B.A
2,K2,Aiman,22,Kuala Lumpur,Bcom
3,K3,Atiqah,32,Seremban,B.hons
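
The advice about unique keys can be enforced with the validate parameter of merge(). The sketch below uses hypothetical tables in which one key is repeated on the right side, and shows both the silent duplication and the error raised when uniqueness is required.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1'], 'Name': ['Shahizan', 'Liza']})

# Hypothetical right table in which K0 appears twice
right = pd.DataFrame({'key': ['K0', 'K0', 'K1'],
                      'Address': ['Johor Bahru', 'Muar', 'Melaka']})

# Without a check, the K0 row from the left table is duplicated
merged = pd.merge(left, right, on='key')
print(merged)

# validate raises MergeError when the keys are not unique as declared
try:
    pd.merge(left, right, on='key', validate='one_to_one')
    rejected = False
except pd.errors.MergeError as err:
    rejected = True
    print("merge rejected:", err)
```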


Code #2: Merging dataframe using multiple join keys.

In [None]:
# importing pandas module
import pandas as pd 
 
# Define a dictionary containing employee data 
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K1', 'K0', 'K1'],
         'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32],} 
   
# Define a dictionary containing employee data 
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K0', 'K0', 'K0'],
         'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} 
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1)
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2) 
  
 
print(df, "\n\n", df1)

  key key1      Name  Age
0  K0   K0  Shahizan   27
1  K1   K1      Liza   24
2  K2   K0     Aiman   22
3  K3   K1    Atiqah   32 

   key key1       Address Qualification
0  K0   K0   Johor Bahru         Btech
1  K1   K0        Melaka           B.A
2  K2   K0  Kuala Lumpur          Bcom
3  K3   K0      Seremban        B.hons


In [None]:
# merging dataframe using multiple keys
res1 = pd.merge(df, df1, on=['key', 'key1'])
 
res1

Unnamed: 0,key,key1,Name,Age,Address,Qualification
0,K0,K0,Shahizan,27,Johor Bahru,Btech
1,K2,K0,Aiman,22,Kuala Lumpur,Bcom


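## Merging DataFrames using the how argument
The how argument to merge() specifies which keys are included in the resulting table. If a key combination does not appear in the left or the right table, the corresponding values in the joined table will be NA. The how options and their SQL equivalents are:

* left: LEFT OUTER JOIN, uses keys from the left frame only
* right: RIGHT OUTER JOIN, uses keys from the right frame only
* outer: FULL OUTER JOIN, uses the union of keys from both frames
* inner: INNER JOIN, uses the intersection of keys from both frames (the default)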

Now we set how = 'left' in order to use keys from left frame only.



In [None]:
# using keys from left frame
res = pd.merge(df, df1, how='left', on=['key', 'key1'])
 
res

Unnamed: 0,key,key1,Name,Age,Address,Qualification
0,K0,K0,Shahizan,27,Johor Bahru,Btech
1,K1,K1,Liza,24,,
2,K2,K0,Aiman,22,Kuala Lumpur,Bcom
3,K3,K1,Atiqah,32,,


Now we set how = 'right' in order to use keys from right frame only.

In [None]:
# using keys from right frame
res1 = pd.merge(df, df1, how='right', on=['key', 'key1'])
 
res1

Unnamed: 0,key,key1,Name,Age,Address,Qualification
0,K0,K0,Shahizan,27.0,Johor Bahru,Btech
1,K1,K0,,,Melaka,B.A
2,K2,K0,Aiman,22.0,Kuala Lumpur,Bcom
3,K3,K0,,,Seremban,B.hons


Now we set how = 'outer' in order to get union of keys from dataframes.

In [None]:
# getting union  of keys
res2 = pd.merge(df, df1, how='outer', on=['key', 'key1'])
 
res2

Unnamed: 0,key,key1,Name,Age,Address,Qualification
0,K0,K0,Shahizan,27.0,Johor Bahru,Btech
1,K1,K1,Liza,24.0,,
2,K2,K0,Aiman,22.0,Kuala Lumpur,Bcom
3,K3,K1,Atiqah,32.0,,
4,K1,K0,,,Melaka,B.A
5,K3,K0,,,Seremban,B.hons


Now we set how = 'inner' in order to get intersection of keys from dataframes.



In [None]:
# getting intersection of keys
res3 = pd.merge(df, df1, how='inner', on=['key', 'key1'])
 
res3

Unnamed: 0,key,key1,Name,Age,Address,Qualification
0,K0,K0,Shahizan,27,Johor Bahru,Btech
1,K2,K0,Aiman,22,Kuala Lumpur,Bcom
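
When comparing the four options, it can help to see where each row came from. The indicator parameter of merge() adds a _merge column for that purpose; the sketch below uses hypothetical two-row tables.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1'], 'Name': ['Shahizan', 'Liza']})
right = pd.DataFrame({'key': ['K0', 'K2'],
                      'Address': ['Johor Bahru', 'Kuala Lumpur']})

# indicator=True adds a _merge column: both, left_only, or right_only
res = pd.merge(left, right, how='outer', on='key', indicator=True)
print(res[['key', '_merge']])

# Map each key to its origin for a quick summary
origin = dict(zip(res['key'], res['_merge'].astype(str)))
print(origin)
```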


## Joining DataFrame
To join DataFrames, use the .join() method, which combines the columns of two potentially differently-indexed DataFrames into a single result DataFrame. By default it performs a left join on the indexes.

In [None]:
# importing pandas module
import pandas as pd 
  
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32]} 
    
# Define a dictionary containing employee data 
data2 = {'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} 
  
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3'])
  
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])
 
 
print(df, "\n\n", df1)  

        Name  Age
K0  Shahizan   27
K1      Liza   24
K2     Aiman   22
K3    Atiqah   32 

          Address Qualification
K0   Johor Bahru           MCA
K2        Melaka           Phd
K3  Kuala Lumpur          Bcom
K4      Seremban        B.hons


Now we use the .join() method to join the DataFrames.

In [None]:
# joining dataframe
res = df.join(df1)
 
res

Unnamed: 0,Name,Age,Address,Qualification
K0,Shahizan,27,Johor Bahru,MCA
K1,Liza,24,,
K2,Aiman,22,Melaka,Phd
K3,Atiqah,32,Kuala Lumpur,Bcom


Now we use how = 'outer' in order to get union

In [None]:
# getting union
res1 = df.join(df1, how='outer')
 
res1

Unnamed: 0,Name,Age,Address,Qualification
K0,Shahizan,27.0,Johor Bahru,MCA
K1,Liza,24.0,,
K2,Aiman,22.0,Melaka,Phd
K3,Atiqah,32.0,Kuala Lumpur,Bcom
K4,,,Seremban,B.hons
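
Since join() works on indexes, the same result can be written with merge() by asking it to use the index on both sides. A small sketch on hypothetical data, which checks that the two calls agree:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shahizan', 'Liza', 'Aiman']},
                  index=['K0', 'K1', 'K2'])
df1 = pd.DataFrame({'Address': ['Johor Bahru', 'Melaka']},
                   index=['K0', 'K2'])

# Default join: left join on the indexes
joined = df.join(df1)

# The same operation spelled out with merge()
merged = pd.merge(df, df1, left_index=True, right_index=True, how='left')

print(joined.equals(merged))
```

Which spelling to use is largely a matter of taste: join() is shorter for index-on-index joins, while merge() makes the join keys explicit.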


## Joining DataFrames using the on argument
join() takes an optional on argument, which may be a column name or a list of column names in the calling DataFrame. The values in that column are matched against the index of the passed DataFrame.

In [None]:
# importing pandas module
import pandas as pd 
  
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman', 'Atiqah'], 
        'Age':[27, 24, 22, 32],
        'Key':['K0', 'K1', 'K2', 'K3']} 
    
# Define a dictionary containing employee data 
data2 = {'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} 
  
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1)
  
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])
 
 
print(df, "\n\n", df1) 

       Name  Age Key
0  Shahizan   27  K0
1      Liza   24  K1
2     Aiman   22  K2
3    Atiqah   32  K3 

          Address Qualification
K0   Johor Bahru           MCA
K2        Melaka           Phd
K3  Kuala Lumpur          Bcom
K4      Seremban        B.hons


In [None]:
# using on argument in join
res2 = df.join(df1, on='Key')
 
res2

Unnamed: 0,Name,Age,Key,Address,Qualification
0,Shahizan,27,K0,Johor Bahru,MCA
1,Liza,24,K1,,
2,Aiman,22,K2,Melaka,Phd
3,Atiqah,32,K3,Kuala Lumpur,Bcom


## Joining singly-indexed DataFrame with multi-indexed DataFrame
When a singly-indexed DataFrame is joined with a multi-indexed DataFrame, pandas matches the name of the single index against the level names of the MultiIndex and joins on the level with the same name.



In [None]:
# importing pandas module
import pandas as pd 
  
# Define a dictionary containing employee data 
data1 = {'Name':['Shahizan', 'Liza', 'Aiman'], 
        'Age':[27, 24, 22]} 
    
# Define a dictionary containing employee data 
data2 = {'Address':['Johor Bahru', 'Melaka', 'Kuala Lumpur', 'Seremban'], 
        'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} 
  
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1, index=pd.Index(['K0', 'K1', 'K2'], name='key'))
 
index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                   ('K2', 'Y2'), ('K2', 'Y3')],
                                   names=['key', 'Y'])
  
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index= index)
 
 
print(df, "\n\n", df1)

         Name  Age
key               
K0   Shahizan   27
K1       Liza   24
K2      Aiman   22 

              Address Qualification
key Y                             
K0  Y0   Johor Bahru           MCA
K1  Y1        Melaka           Phd
K2  Y2  Kuala Lumpur          Bcom
    Y3      Seremban        B.hons


Now we join singly indexed dataframe with multi-indexed dataframe



In [None]:
# joining singly indexed with
# multi indexed
result = df.join(df1, how='inner')
 
result

key,Y,Name,Age,Address,Qualification
K0,Y0,Shahizan,27,Johor Bahru,MCA
K1,Y1,Liza,24,Melaka,Phd
K2,Y2,Aiman,22,Kuala Lumpur,Bcom
K2,Y3,Aiman,22,Seremban,B.hons
