# Pandas

In [1]:
import pandas as pd

In [None]:
Pandas is a powerful and popular open-source Python library used for data manipulation
and analysis. It provides data structures and functions that make working with structured
data both easy and intuitive. Pandas is widely used in data science, machine learning,
and other related fields for tasks such as data cleaning, data preparation, and data
analysis. Below are some key components and functionalities of Pandas:

#### Data Structures

Series: A one-dimensional labeled array capable of holding any data type.
DataFrame: A two-dimensional labeled data structure with columns of potentially
different types.

#### Key functionalities

Data reading and writing: Pandas can read data from various file formats such as CSV,
Excel, SQL databases, and more.

Data manipulation: It provides methods for filtering, selecting, transforming,
and cleaning data.

Data alignment and handling missing data: Pandas enables easy alignment of data, and
it provides functionalities to handle missing or NaN values.

Grouping and aggregating: It allows grouping data based on certain criteria and performing
various aggregate functions on grouped data.

Merging and joining: Pandas supports merging and joining different datasets based on
common columns or indices.

Time series functionality: Pandas has extensive support for time series data handling 
and manipulation.

Data visualization: While not its primary focus, Pandas can be used in conjunction with
other libraries like Matplotlib or Seaborn for data visualization.
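
Two of the functionalities listed above, grouping with aggregation and merging, can be sketched in a few lines. The `sales` and `regions` tables below are hypothetical data invented for illustration; they are not part of this notebook's dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "amount": [10, 20, 5, 15],
})
regions = pd.DataFrame({
    "city": ["Pune", "Delhi"],
    "region": ["West", "North"],
})

# Grouping and aggregating: total amount per city.
totals = sales.groupby("city", as_index=False)["amount"].sum()

# Merging: attach the region of each city using the shared "city" column.
merged = totals.merge(regions, on="city", how="left")
print(merged)
```

`groupby` sorts the group keys by default, so the result lists Delhi before Pune.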

### Importance of Pandas

In [None]:
The importance of Pandas stems from its ability to simplify and streamline the data
manipulation and analysis process in Python. Some key reasons why Pandas is essential 
in the data science and analysis ecosystem include:

Data Handling: Pandas provides easy-to-use data structures like DataFrame and Series
that handle structured, labeled data (and can also hold arbitrary Python objects),
making it easier to manage and manipulate datasets that fit in memory.

Data Cleaning: It offers a wide range of tools to clean messy data, handle missing values,
and deal with data inconsistencies, enabling users to preprocess data effectively before 
analysis.

Data Transformation: Pandas facilitates various data transformation operations, such as
reshaping, pivoting, and merging, which are crucial for preparing data for downstream
analysis or machine learning tasks.

    
Data Analysis: It supports various statistical and analytical operations, including grouping,
aggregation, and time series analysis, which are fundamental for deriving insights and making
data-driven decisions.

Integration with Other Libraries: Pandas seamlessly integrates with other libraries commonly 
used for data analysis and visualization, such as NumPy, Matplotlib, and Seaborn, enhancing
its capabilities and making it a powerful tool for data scientists.

Time Series Analysis: Pandas has robust support for handling time series data, which is 
essential in many real-world applications, including finance, economics, and other domains 
where time-based data analysis is crucial.

Data I/O Operations: Pandas supports data input/output operations for various file formats
such as CSV, Excel, SQL databases, and more, making it easy to read and write data to and 
from different sources.

Flexibility and Customization: It allows users to customize and tailor data analysis 
workflows to their specific requirements, enabling them to perform complex data operations 
with relative ease and efficiency.

In essence, Pandas is a fundamental tool for data manipulation and analysis in Python,
offering a high level of flexibility, efficiency, and ease of use. Its extensive 
functionalities and intuitive interface make it a crucial component in the toolkit
of any data scientist, analyst, or researcher dealing with data-centric tasks.
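
As a small illustration of the time series support mentioned above, the sketch below resamples a few timestamped readings into daily totals. The timestamps and values are hypothetical and chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical readings, invented for illustration only.
stamps = pd.to_datetime(
    ["2024-01-01 09:00", "2024-01-01 15:00", "2024-01-02 10:00"]
)
readings = pd.Series([2, 3, 7], index=stamps)

# Resample to one row per calendar day and sum the readings in each day.
daily = readings.resample("D").sum()
print(daily)
```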







### Types of Data Structures in Pandas
##### 1. Series
##### 2. DataFrame
##### 3. Panel (deprecated in pandas 0.20 and removed in 0.25)




### Series 

In [None]:
A Series is a one-dimensional labeled array that can hold data of any type 
(integers, strings, floating-point numbers, Python objects, etc.). It is similar 
to a one-dimensional NumPy array but comes with additional functionalities,
such as indexing by labels.

It has two main components: the data and the index. The index provides labels for 
each element in the Series, allowing for easy and efficient data access.
Series are often used to represent a single column of a DataFrame or as standalone
data structures for simpler tasks.
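
The two components described above, the data and the index, can be inspected directly. This is a minimal sketch; the labels `a`, `b`, `c` are arbitrary.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.index)       # the labels
print(s.to_numpy())  # the underlying data as a NumPy array
print(s.loc["b"])    # label-based access
print(s.iloc[1])     # position-based access to the same element
```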

In [3]:
data = [1,3,6,8,9,10,11]
x = pd.Series(data)
print(x)
print()
print(type(x))

0     1
1     3
2     6
3     8
4     9
5    10
6    11
dtype: int64

<class 'pandas.core.series.Series'>


In [4]:
data = [1,3,6,8,9,10,11]
x = pd.Series(data)
print(x)
print()
print(x[2])

0     1
1     3
2     6
3     8
4     9
5    10
6    11
dtype: int64

6


In [8]:
data = [1,3,6,8]
x = pd.Series(data, index = ['a','b','c','d'],dtype = 'f')
print(x)


a    1.0
b    3.0
c    6.0
d    8.0
dtype: float32


In [9]:
data = [1,3,6,8]
x = pd.Series(data, index = ['a','b','c','d'],dtype = 'f', name = "Pandas")
print(x)


a    1.0
b    3.0
c    6.0
d    8.0
Name: Pandas, dtype: float32


In [14]:
dic = {"Name":"Shahabaz","Age":21 }
data = pd.Series(dic)
print(data)

Name    Shahabaz
Age           21
dtype: object


In [15]:
sr = pd.Series(12)
print(sr)

0    12
dtype: int64


In [18]:
sr = pd.Series(12,index = [0,1,2,3,4,5,6])
print(sr)

0    12
1    12
2    12
3    12
4    12
5    12
6    12
dtype: int64


In [19]:
s1 = pd.Series(12, index = [1,2,3,4,5,6,7])
s2 = pd.Series(12, index = [1,2,3,4])
print(s1 + s2)

1    24.0
2    24.0
3    24.0
4    24.0
5     NaN
6     NaN
7     NaN
dtype: float64
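
The NaN values above appear because labels 5, 6 and 7 exist only in `s1`, and pandas aligns on the index before adding. If a missing label should count as zero instead, one option is `Series.add` with `fill_value`, sketched here with the same two Series:

```python
import pandas as pd

s1 = pd.Series(12, index=[1, 2, 3, 4, 5, 6, 7])
s2 = pd.Series(12, index=[1, 2, 3, 4])

# Treat a label that is missing from one Series as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
```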


### DataFrame 

In [22]:
lst = [1,2,3,4,5,6,7]
var = pd.DataFrame(lst)
print(var)

   0
0  1
1  2
2  3
3  4
4  5
5  6
6  7


In [24]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4,3]}
var = pd.DataFrame(dic)
print(var)

  language  Raking
0        C       1
1      C++       2
2   Python       4
3     Java       3


In [25]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4]}
# Raises ValueError: 'Raking' has 3 values but 'language' has 4

var = pd.DataFrame(dic)
print(var)

ValueError: All arrays must be of the same length

In [27]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4,3]}


var = pd.DataFrame(dic, columns = ['language'])
print(var)

  language
0        C
1      C++
2   Python
3     Java


In [28]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4,3]}


# DataFrame() has no 'row' parameter, so this call raises TypeError
var = pd.DataFrame(dic, row = [2])
print(var)

TypeError: DataFrame.__init__() got an unexpected keyword argument 'row'
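
The constructor cannot pick out a row, but a row can be selected after the DataFrame is built. Assuming the intent was to keep only the row labeled 2, a minimal sketch using `.loc` (label-based) and `.iloc` (position-based) is:

```python
import pandas as pd

dic = {'language': ['C', 'C++', 'Python', 'Java'], 'Raking': [1, 2, 4, 3]}
var = pd.DataFrame(dic)

# A list of labels keeps the result a DataFrame rather than a Series.
row_by_label = var.loc[[2]]
# With the default integer index, position 2 is the same row.
row_by_position = var.iloc[[2]]
print(row_by_label)
```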

In [29]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4,3],'year':[1974, 1981,1991,1993]}


var = pd.DataFrame(dic)
print(var)

  language  Raking  year
0        C       1  1974
1      C++       2  1981
2   Python       4  1991
3     Java       3  1993


In [33]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4,3],'year':[1974, 1981,1991,1993]}


var = pd.DataFrame(dic, index = ['0-a','1-b','2-c','3-d'])
print(var)

    language  Raking  year
0-a        C       1  1974
1-b      C++       2  1981
2-c   Python       4  1991
3-d     Java       3  1993


In [34]:
dic = {'language':['C', 'C++','Python','Java'],'Raking':[1,2,4,3],'year':[1974, 1981,1991,1993]}


var = pd.DataFrame(dic, index = ['0-a','1-b','2-c','3-d'], columns = ['language','year'])
print(var)

    language  year
0-a        C  1974
1-b      C++  1981
2-c   Python  1991
3-d     Java  1993


In [36]:
# Retrieve a single value from a DataFrame

dic = {'language':['C','C++','Python','Java'],'Raking':[1,2,4,3],'year':[1974, 1981,1991,1993]}

var = pd.DataFrame(dic)
print(var)
print('-------')
print(var['language'][2])

  language  Raking  year
0        C       1  1974
1      C++       2  1981
2   Python       4  1991
3     Java       3  1993
-------
Python


In [37]:
# Retrieve values from two columns of a DataFrame

dic = {'language':['C','C++','Python','Java'],'Raking':[1,2,4,3],'year':[1974, 1981,1991,1993]}

var = pd.DataFrame(dic)
print(var)
print('-------')
# Index each column through var; a bare ['year'][2] indexes a one-element list and raises IndexError
print(var['language'][2], var['year'][2])

  language  Raking  year
0        C       1  1974
1      C++       2  1981
2   Python       4  1991
3     Java       3  1993
-------
Python 1991

In [38]:
# Series from a list of lists
lst = [[1,2,3,4,5],[3,4,5,6,7]]

var = pd.Series(lst)
print(type(var))
print()
print(var)

<class 'pandas.core.series.Series'>

0    [1, 2, 3, 4, 5]
1    [3, 4, 5, 6, 7]
dtype: object


In [39]:
# Series from a list of lists of different lengths
lst = [[1,2,3,4,5],[3,4,5]]

var = pd.Series(lst)
print(type(var))
print()
print(var)

<class 'pandas.core.series.Series'>

0    [1, 2, 3, 4, 5]
1          [3, 4, 5]
dtype: object


In [44]:
# DataFrame from a dict of Series (shorter Series are padded with NaN)

sr = {'s':pd.Series([1,2,3,4,5]), 'h':pd.Series([1,3,4,5]), 'a':pd.Series([1,2,4,5])}

var = pd.DataFrame(sr)
print(type(var))
print()
print(var)

<class 'pandas.core.frame.DataFrame'>

   s    h    a
0  1  1.0  1.0
1  2  3.0  2.0
2  3  4.0  4.0
3  4  5.0  5.0
4  5  NaN  NaN


## Arithmetic Operations in Pandas

In [45]:
var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)

   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5


In [46]:
# Add

var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)
print()

var["C"] = var["A"] + var["B"]
print(var)

   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5

   A  B   C
0  1  0   1
1  4  3   7
2  7  6  13
3  2  3   5
4  9  5  14


In [47]:
# Sub

var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)
print()

var["C"] = var["A"] - var["B"]
print(var)

   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5

   A  B  C
0  1  0  1
1  4  3  1
2  7  6  1
3  2  3 -1
4  9  5  4


In [48]:
# Mul

var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)
print()

var["C"] = var["A"] * var["B"]
print(var)

   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5

   A  B   C
0  1  0   0
1  4  3  12
2  7  6  42
3  2  3   6
4  9  5  45


In [49]:
# Div

var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)
print()

var["C"] = var["A"] / var["B"]
print(var)

   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5

   A  B         C
0  1  0       inf
1  4  3  1.333333
2  7  6  1.166667
3  2  3  0.666667
4  9  5  1.800000


In [50]:
# Mod

var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)
print()

var["C"] = var["A"] % var["B"]
print(var)

   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5

   A  B    C
0  1  0  NaN
1  4  3  1.0
2  7  6  1.0
3  2  3  2.0
4  9  5  4.0
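
Division by zero produced `inf` and modulo by zero produced `NaN` in the outputs above. If both should be treated as missing values, one possible approach (an assumption about the desired behavior, not something this notebook prescribes) is to replace infinities with `NaN`:

```python
import numpy as np
import pandas as pd

var = pd.DataFrame({"A": [1, 4, 7, 2, 9], "B": [0, 3, 6, 3, 5]})

ratio = var["A"] / var["B"]  # the first element is inf because B is 0

# Replace positive and negative infinity with NaN so missing-value tools apply.
cleaned = ratio.replace([np.inf, -np.inf], np.nan)
print(cleaned)
```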


In [54]:
# Comparison operation: flag rows where A >= 5

var = pd.DataFrame({"A":[1,4,7,2,9],"B":[0,3,6,3,5]})
print(var)

var["CHECK"] = var["A"] >= 5
print()
print(var)


   A  B
0  1  0
1  4  3
2  7  6
3  2  3
4  9  5

   A  B  CHECK
0  1  0  False
1  4  3  False
2  7  6   True
3  2  3  False
4  9  5   True


In [56]:
# Comparison operation: flag rows where B <= 7

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)

var["CHECK"] = var["B"] <= 7
print()
print(var)


   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15

   A   B  CHECK
0  1   0   True
1  2   3   True
2  4   6   True
3  7   9  False
4  9  15  False
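
A boolean column like `CHECK` can also be used directly as a filter. The sketch below combines two comparisons with `&` and keeps only the matching rows; the thresholds are arbitrary.

```python
import pandas as pd

var = pd.DataFrame({"A": [1, 2, 4, 7, 9], "B": [0, 3, 6, 9, 15]})

# Each comparison needs its own parentheses because & binds tighter than >= and <=.
mask = (var["A"] >= 2) & (var["B"] <= 7)

selected = var[mask]
print(selected)
```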


### Delete & Insert in Pandas 

#### Insert 

In [60]:

var = pd.DataFrame({"A":[1,3,5,8,23,34],"B":[2,4,6,9,7,16]})
print(var)

    A   B
0   1   2
1   3   4
2   5   6
3   8   9
4  23   7
5  34  16


In [62]:
var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)
print()
var.insert(2,'INSERT',[12,13,14,15,16])
print(var)

   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15

   A   B  INSERT
0  1   0      12
1  2   3      13
2  4   6      14
3  7   9      15
4  9  15      16


In [63]:
# insert(loc, column, value): loc is the position of the new column

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)
print()

var.insert(1,'INSERT',[12,13,14,15,16])
print(var)

   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15

   A  INSERT   B
0  1      12   0
1  2      13   3
2  4      14   6
3  7      15   9
4  9      16  15


In [64]:
# ValueError: Length of values (4) does not match length of index (5) 

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)
print()

var.insert(1,'INSERT',[12,13,14,15])
print(var)

   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15



ValueError: Length of values (4) does not match length of index (5)
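
A plain list must have exactly one value per row. If fewer values are available, one option is to wrap them in a Series and reindex it to the DataFrame's index, so the missing positions become `NaN`. This is a sketch of that workaround, not the only way to handle it.

```python
import pandas as pd

var = pd.DataFrame({"A": [1, 2, 4, 7, 9], "B": [0, 3, 6, 9, 15]})

short = pd.Series([12, 13, 14, 15])  # one value fewer than var has rows

# Align the values to var's index; the unmatched label 4 becomes NaN.
padded = short.reindex(var.index)

var.insert(1, "INSERT", padded)
print(var)
```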

In [67]:
# Insert a copy of an existing column as a new column

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)
print()

var.insert(1,'INSERT',var["A"])
print(var)

   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15

   A  INSERT   B
0  1       1   0
1  2       2   3
2  4       4   6
3  7       7   9
4  9       9  15


In [68]:
# Assign only the first two values of column A; the remaining rows become NaN

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)
print()

var["DATA"] = var["A"][:2]

print(var)

   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15

   A   B  DATA
0  1   0   1.0
1  2   3   2.0
2  4   6   NaN
3  7   9   NaN
4  9  15   NaN


In [69]:
# Assign every second value of column A; the remaining rows become NaN

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15]})
print(var)
print()

var["DATA"] = var["A"][::2]

print(var)

   A   B
0  1   0
1  2   3
2  4   6
3  7   9
4  9  15

   A   B  DATA
0  1   0   1.0
1  2   3   NaN
2  4   6   4.0
3  7   9   NaN
4  9  15   9.0


In [70]:
print(var)

   A   B  DATA
0  1   0   1.0
1  2   3   NaN
2  4   6   4.0
3  7   9   NaN
4  9  15   9.0


In [82]:
# pop removes a column from the DataFrame and returns it as a Series

var = pd.DataFrame({"A":[1,2,4,7,9],"B":[0,3,6,9,15],"C":[4,6,2,4,56]})

var1 = var.pop("B")
print(var)
print(var1)


   A   C
0  1   4
1  2   6
2  4   2
3  7   4
4  9  56
0     0
1     3
2     6
3     9
4    15
Name: B, dtype: int64


In [83]:
# del removes a column in place
del var["A"]
print(var)

    C
0   4
1   6
2   2
3   4
4  56


## Create CSV file 

In [88]:
data = pd.DataFrame({"A":[1,3,5,7,9,11,13,15],"B":[0,2,4,6,8,10,12,14],"C":[1,2,4,6,3,8,9,12]})

# Write the DataFrame to a file; the index is written as the first column
data.to_csv("My_csv_data")

In [89]:
data = pd.DataFrame({"A":[1,3,5,7,9,11,13,15],"B":[0,2,4,6,8,10,12,14],"C":[1,2,4,6,3,8,9,12]})

# index=False leaves the index column out of the file
data.to_csv("My_csv_data1", index  = False)

In [91]:
data = pd.DataFrame({"A":[1,3,5,7,9,11,13,15],"B":[0,2,4,6,8,10,12,14],"C":[1,2,4,6,3,8,9,12]})

# header=[...] writes custom column names in place of A, B, C
data.to_csv("My_csv_data2",header = ['K',"L","M"], index  = False)
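
To check that a written file reads back as expected, a round trip through a temporary folder can be sketched as follows. The file name `example.csv` and the small DataFrame are hypothetical; the temporary directory keeps the example from leaving files behind.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({"A": [1, 3, 5], "B": [0, 2, 4]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.csv")  # hypothetical file name
    data.to_csv(path, index=False)              # write without the index column
    restored = pd.read_csv(path)                # read the file back

# True when values, dtypes, and labels all match.
print(restored.equals(data))
```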

## Read CSV File 

In [94]:
# Read the whole CSV file into a DataFrame
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
file_read


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [95]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",nrows = 2)
file_read


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1


In [96]:
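# read_csv() has no 'ncolumns' parameter, so this call raises TypeError; use usecols to choose columns
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",ncolumns = 2)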
file_read


TypeError: read_csv() got an unexpected keyword argument 'ncolumns'

In [99]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",usecols = ["JoiningYear"])
file_read


Unnamed: 0,JoiningYear
0,2017
1,2013
2,2014
3,2016
4,2017
...,...
4648,2013
4649,2013
4650,2018
4651,2012


In [100]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",usecols = ["JoiningYear","Age","Gender"])
file_read


Unnamed: 0,JoiningYear,Age,Gender
0,2017,34,Male
1,2013,28,Female
2,2014,38,Female
3,2016,27,Male
4,2017,24,Male
...,...,...,...
4648,2013,26,Female
4649,2013,37,Male
4650,2018,27,Male
4651,2012,30,Male


In [101]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",usecols = [0,2,4,5,7])
file_read


Unnamed: 0,Education,City,Age,Gender,ExperienceInCurrentDomain
0,Bachelors,Bangalore,34,Male,0
1,Bachelors,Pune,28,Female,3
2,Bachelors,New Delhi,38,Female,2
3,Masters,Bangalore,27,Male,5
4,Masters,Pune,24,Male,2
...,...,...,...,...,...
4648,Bachelors,Bangalore,26,Female,4
4649,Masters,Pune,37,Male,2
4650,Masters,New Delhi,27,Male,5
4651,Bachelors,Bangalore,30,Male,2


In [106]:
# Skip file lines 0 and 1 (the header line and the first data row),
# so the second data row is used as the header
file = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",skiprows = [0,1])

file

Unnamed: 0,Bachelors,2013,Pune,1,28,Female,No,3,1.1
0,Bachelors,2014,New Delhi,3,38,Female,No,2,0
1,Masters,2016,Bangalore,3,27,Male,No,5,1
2,Masters,2017,Pune,3,24,Male,Yes,2,1
3,Bachelors,2016,Bangalore,3,22,Male,No,0,0
4,Bachelors,2015,New Delhi,3,38,Male,No,0,0
...,...,...,...,...,...,...,...,...,...
4646,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4647,Masters,2013,Pune,2,37,Male,No,2,1
4648,Masters,2018,New Delhi,3,27,Male,No,5,1
4649,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [107]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
file_read

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [110]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",skiprows = [0])
file_read.head()


Unnamed: 0,Bachelors,2017,Bangalore,3,34,Male,No,0,0.1
0,Bachelors,2013,Pune,1,28,Female,No,3,1
1,Bachelors,2014,New Delhi,3,38,Female,No,2,0
2,Masters,2016,Bangalore,3,27,Male,No,5,1
3,Masters,2017,Pune,3,24,Male,Yes,2,1
4,Bachelors,2016,Bangalore,3,22,Male,No,0,0


In [116]:
fl = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",skiprows = [2])
fl

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2014,New Delhi,3,38,Female,No,2,0
2,Masters,2016,Bangalore,3,27,Male,No,5,1
3,Masters,2017,Pune,3,24,Male,Yes,2,1
4,Bachelors,2016,Bangalore,3,22,Male,No,0,0
...,...,...,...,...,...,...,...,...,...
4647,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4648,Masters,2013,Pune,2,37,Male,No,2,1
4649,Masters,2018,New Delhi,3,27,Male,No,5,1
4650,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [121]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
file_read
fd = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",skiprows = [2])
fd

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2014,New Delhi,3,38,Female,No,2,0
2,Masters,2016,Bangalore,3,27,Male,No,5,1
3,Masters,2017,Pune,3,24,Male,Yes,2,1
4,Bachelors,2016,Bangalore,3,22,Male,No,0,0
...,...,...,...,...,...,...,...,...,...
4647,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4648,Masters,2013,Pune,2,37,Male,No,2,1
4649,Masters,2018,New Delhi,3,27,Male,No,5,1
4650,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [130]:
file_read = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",skiprows = [1])
file_read

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2013,Pune,1,28,Female,No,3,1
1,Bachelors,2014,New Delhi,3,38,Female,No,2,0
2,Masters,2016,Bangalore,3,27,Male,No,5,1
3,Masters,2017,Pune,3,24,Male,Yes,2,1
4,Bachelors,2016,Bangalore,3,22,Male,No,0,0
...,...,...,...,...,...,...,...,...,...
4647,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4648,Masters,2013,Pune,2,37,Male,No,2,1
4649,Masters,2018,New Delhi,3,27,Male,No,5,1
4650,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [134]:
dl = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",skiprows = [4])
dl

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2017,Pune,3,24,Male,Yes,2,1
4,Bachelors,2016,Bangalore,3,22,Male,No,0,0
...,...,...,...,...,...,...,...,...,...
4647,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4648,Masters,2013,Pune,2,37,Male,No,2,1
4649,Masters,2018,New Delhi,3,27,Male,No,5,1
4650,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [135]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",header = None)
var

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
1,Bachelors,2017,Bangalore,3,34,Male,No,0,0
2,Bachelors,2013,Pune,1,28,Female,No,3,1
3,Bachelors,2014,New Delhi,3,38,Female,No,2,0
4,Masters,2016,Bangalore,3,27,Male,No,5,1
...,...,...,...,...,...,...,...,...,...
4649,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4650,Masters,2013,Pune,2,37,Male,No,2,1
4651,Masters,2018,New Delhi,3,27,Male,No,5,1
4652,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [146]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",header = None, add_prefix = "col")
# Raises TypeError: add_prefix is a DataFrame method, not a read_csv() argument
var


TypeError: read_csv() got an unexpected keyword argument 'add_prefix'

In [149]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv",header = None)
var = var.add_prefix('col_')
var


Unnamed: 0,col_0,col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8
0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
1,Bachelors,2017,Bangalore,3,34,Male,No,0,0
2,Bachelors,2013,Pune,1,28,Female,No,3,1
3,Bachelors,2014,New Delhi,3,38,Female,No,2,0
4,Masters,2016,Bangalore,3,27,Male,No,5,1
...,...,...,...,...,...,...,...,...,...
4649,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4650,Masters,2013,Pune,2,37,Male,No,2,1
4651,Masters,2018,New Delhi,3,27,Male,No,5,1
4652,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [151]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv", dtype ={'PaymentTier':"float"})

var


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3.0,34,Male,No,0,0
1,Bachelors,2013,Pune,1.0,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3.0,38,Female,No,2,0
3,Masters,2016,Bangalore,3.0,27,Male,No,5,1
4,Masters,2017,Pune,3.0,24,Male,Yes,2,1
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3.0,26,Female,No,4,0
4649,Masters,2013,Pune,2.0,37,Male,No,2,1
4650,Masters,2018,New Delhi,3.0,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3.0,30,Male,Yes,2,0


### Pandas Function 

In [152]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.index


RangeIndex(start=0, stop=4653, step=1)

In [154]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.columns


Index(['Education', 'JoiningYear', 'City', 'PaymentTier', 'Age', 'Gender',
       'EverBenched', 'ExperienceInCurrentDomain', 'LeaveOrNot'],
      dtype='object')

In [157]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

# DataFrame has no 'rows' attribute, so this raises AttributeError
var.rows


AttributeError: 'DataFrame' object has no attribute 'rows'
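
A DataFrame has no `rows` attribute, but row information is available in other ways. The sketch below uses a small hypothetical DataFrame instead of the employee file so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the employee data.
var = pd.DataFrame({"A": [1, 2, 4], "B": [0, 3, 6]})

n_rows, n_cols = var.shape  # (number of rows, number of columns)
print(n_rows, n_cols)
print(len(var))             # number of rows only

# Iterate over rows as (index label, Series) pairs.
for label, row in var.iterrows():
    print(label, row["A"], row["B"])
```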

In [159]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.describe()


Unnamed: 0,JoiningYear,PaymentTier,Age,ExperienceInCurrentDomain,LeaveOrNot
count,4653.0,4653.0,4653.0,4653.0,4653.0
mean,2015.06297,2.698259,29.393295,2.905652,0.343864
std,1.863377,0.561435,4.826087,1.55824,0.475047
min,2012.0,1.0,22.0,0.0,0.0
25%,2013.0,3.0,26.0,2.0,0.0
50%,2015.0,3.0,28.0,3.0,0.0
75%,2017.0,3.0,32.0,4.0,1.0
max,2018.0,3.0,41.0,7.0,1.0


In [160]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.head()


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1


In [161]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.tail()


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0
4652,Bachelors,2015,Bangalore,3,33,Male,Yes,4,0


In [162]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.head(3)


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0


In [163]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.tail(2)


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0
4652,Bachelors,2015,Bangalore,3,33,Male,Yes,4,0


In [165]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var[:4]


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1


In [166]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var[:10]


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
5,Bachelors,2016,Bangalore,3,22,Male,No,0,0
6,Bachelors,2015,New Delhi,3,38,Male,No,0,0
7,Bachelors,2016,Bangalore,3,34,Female,No,2,1
8,Bachelors,2016,Pune,3,23,Male,No,1,0
9,Masters,2017,New Delhi,2,37,Male,No,2,0


In [167]:
# Retrieve rows 2 through 10 with a slice
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var[2:11]


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
5,Bachelors,2016,Bangalore,3,22,Male,No,0,0
6,Bachelors,2015,New Delhi,3,38,Male,No,0,0
7,Bachelors,2016,Bangalore,3,34,Female,No,2,1
8,Bachelors,2016,Pune,3,23,Male,No,1,0
9,Masters,2017,New Delhi,2,37,Male,No,2,0
10,Masters,2012,Bangalore,3,27,Male,No,5,1


In [168]:
# type of var
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

type(var)


pandas.core.frame.DataFrame

In [169]:
# all index

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.index.array


<PandasArray>
[   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
 ...
 4643, 4644, 4645, 4646, 4647, 4648, 4649, 4650, 4651, 4652]
Length: 4653, dtype: int64

In [171]:
# Convert into Array

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.to_numpy()


array([['Bachelors', 2017, 'Bangalore', ..., 'No', 0, 0],
       ['Bachelors', 2013, 'Pune', ..., 'No', 3, 1],
       ['Bachelors', 2014, 'New Delhi', ..., 'No', 2, 0],
       ...,
       ['Masters', 2018, 'New Delhi', ..., 'No', 5, 1],
       ['Bachelors', 2012, 'Bangalore', ..., 'Yes', 2, 0],
       ['Bachelors', 2015, 'Bangalore', ..., 'Yes', 4, 0]], dtype=object)

In [174]:
# Convert into Array
import numpy as np
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

v = np.asarray(var)
v

array([['Bachelors', 2017, 'Bangalore', ..., 'No', 0, 0],
       ['Bachelors', 2013, 'Pune', ..., 'No', 3, 1],
       ['Bachelors', 2014, 'New Delhi', ..., 'No', 2, 0],
       ...,
       ['Masters', 2018, 'New Delhi', ..., 'No', 5, 1],
       ['Bachelors', 2012, 'Bangalore', ..., 'Yes', 2, 0],
       ['Bachelors', 2015, 'Bangalore', ..., 'Yes', 4, 0]], dtype=object)

In [175]:
# Sort by index in descending order (ascending = False)

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")

var.sort_index(axis = 0, ascending = False)


Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
4652,Bachelors,2015,Bangalore,3,33,Male,Yes,4,0
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4649,Masters,2013,Pune,2,37,Male,No,2,1
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
...,...,...,...,...,...,...,...,...,...
4,Masters,2017,Pune,3,24,Male,Yes,2,1
3,Masters,2016,Bangalore,3,27,Male,No,5,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
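
`sort_index` orders rows by their index labels. To order rows by the values in a column instead, `sort_values` can be used. The sketch below works on a small hypothetical DataFrame that borrows two column names from the employee file.

```python
import pandas as pd

# Hypothetical stand-in for the employee data.
var = pd.DataFrame({
    "Age": [34, 28, 38],
    "City": ["Bangalore", "Pune", "New Delhi"],
})

# Sort rows by the Age column, smallest first; the original index labels are kept.
by_age = var.sort_values("Age", ascending=True)
print(by_age)
```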


In [176]:
# Bad way to change a cell value: chained indexing triggers SettingWithCopyWarning
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
var["Education"][0] = "Carrer"

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  var["Education"][0] = "Carrer"


In [177]:
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Carrer,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [178]:
# Good way to change a cell value: a single .loc[row, column] assignment
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
var.loc[0,"Education"] = "Python"

In [179]:
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Python,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0


In [193]:
# All rows, specific columns
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
var.loc[:, ["Education", "JoiningYear"]]

Unnamed: 0,Education,JoiningYear
0,Bachelors,2017
1,Bachelors,2013
2,Bachelors,2014
3,Masters,2016
4,Masters,2017
...,...,...
4648,Bachelors,2013
4649,Masters,2013
4650,Masters,2018
4651,Bachelors,2012


In [200]:
# Specific rows, all columns

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
var.loc[[0,3], :]

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1


In [201]:
# iloc takes an integer row position and column position and returns the value at that location

var.iloc[0,1]

2017

In [203]:
# Drop a column; drop() returns a new DataFrame and leaves var unchanged
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
var.drop("Age", axis = 1)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,Male,No,0,0
1,Bachelors,2013,Pune,1,Female,No,3,1
2,Bachelors,2014,New Delhi,3,Female,No,2,0
3,Masters,2016,Bangalore,3,Male,No,5,1
4,Masters,2017,Pune,3,Male,Yes,2,1
...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,Female,No,4,0
4649,Masters,2013,Pune,2,Male,No,2,1
4650,Masters,2018,New Delhi,3,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,Male,Yes,2,0


In [204]:
# Drop a row by its index label with axis = 0 (returns a new DataFrame; var itself is not changed)

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee.csv")
var.drop(0, axis = 0)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
5,Bachelors,2016,Bangalore,3,22,Male,No,0,0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0
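
As a small sketch of the same idea, `drop` also accepts the keyword arguments `columns=` and `index=`, which some readers find clearer than `axis`. The frame below is made up for illustration and is not the Employee data.

```python
import pandas as pd

# Hypothetical two-row frame for illustration
df = pd.DataFrame(
    {
        "Education": ["Bachelors", "Masters"],
        "Age": [34, 27],
        "City": ["Bangalore", "Pune"],
    }
)

no_age = df.drop(columns="Age")   # same result as df.drop("Age", axis=1)
no_first = df.drop(index=0)       # same result as df.drop(0, axis=0)

# drop returns new objects, so the original frame still has every row and column
print(list(no_age.columns))
print(list(no_first.index))
print(df.shape)
```

To keep the result, assign it back, for example `df = df.drop(columns="Age")`.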


## Handling Missing Values
### ( dropna & fillna )

#### dropna 

In [207]:

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.head(20)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013,Pune,1.0,28.0,,No,3.0,1.0
2,Bachelors,2014,New Delhi,3.0,38.0,Female,No,2.0,0.0
3,Masters,2016,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016,Bangalore,3.0,22.0,Male,No,0.0,0.0
6,Bachelors,2015,New Delhi,3.0,38.0,Male,No,0.0,0.0
7,Bachelors,2016,Bangalore,3.0,34.0,Female,No,2.0,1.0
8,Bachelors,2016,Pune,3.0,23.0,Male,No,1.0,0.0
9,Masters,2017,New Delhi,2.0,37.0,,No,2.0,0.0


In [208]:

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna()

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3.0,34.0,Male,No,0.0,0.0
2,Bachelors,2014,New Delhi,3.0,38.0,Female,No,2.0,0.0
3,Masters,2016,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016,Bangalore,3.0,22.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [209]:

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna(axis = 1)

Unnamed: 0,Education,JoiningYear
0,Bachelors,2017
1,Bachelors,2013
2,Bachelors,2014
3,Masters,2016
4,Masters,2017
...,...,...
4648,Bachelors,2013
4649,Masters,2013
4650,Masters,2018
4651,Bachelors,2012


In [210]:
# Drop every row that contains a NaN value (axis = 0 is the default, so this matches dropna())
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna(axis = 0)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3.0,34.0,Male,No,0.0,0.0
2,Bachelors,2014,New Delhi,3.0,38.0,Female,No,2.0,0.0
3,Masters,2016,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016,Bangalore,3.0,22.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [211]:

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [212]:
# how = "any" drops a row if at least one of its values is null (this is the default)
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna(how = "any")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016.0,Bangalore,3.0,22.0,Male,No,0.0,0.0
6,Bachelors,2015.0,New Delhi,3.0,38.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [213]:
# how = "all" drops a row only if every value in it is null

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna(how = "all")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016.0,Bangalore,3.0,22.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [214]:
# Reload and display the original data for comparison (row 2 is entirely null)

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [215]:
# subset = [...] drops only the rows that have a null value in the listed columns
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna(subset = ["LeaveOrNot"])

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016.0,Bangalore,3.0,22.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [222]:
# inplace = True modifies var directly instead of returning a new DataFrame
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna(inplace = True)
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016.0,Bangalore,3.0,22.0,Male,No,0.0,0.0
6,Bachelors,2015.0,New Delhi,3.0,38.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [224]:
# thresh = N keeps only the rows that have at least N non-null values
# thresh = 1 therefore drops only rows that are entirely null; thresh = 2 would also drop rows with just one non-null value

var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.dropna( thresh = 1 )

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016.0,Bangalore,3.0,22.0,Male,No,0.0,0.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0
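
The `how`, `subset`, and `thresh` options above can be compared side by side on a tiny frame. This is a minimal sketch with made-up data (four rows, three columns), not the Employee file.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has one missing value, row 2 is entirely missing
df = pd.DataFrame(
    {
        "Education": ["Bachelors", "Bachelors", np.nan, "Masters"],
        "Gender": ["Male", np.nan, np.nan, "Male"],
        "Age": [34.0, 28.0, np.nan, 27.0],
    }
)

any_missing = df.dropna(how="any")      # drops rows 1 and 2
all_missing = df.dropna(how="all")      # drops only row 2
by_subset = df.dropna(subset=["Age"])   # drops rows where Age is missing (row 2)
at_least_1 = df.dropna(thresh=1)        # keeps rows with >= 1 non-null value
at_least_3 = df.dropna(thresh=3)        # keeps rows with >= 3 non-null values

for name, frame in [
    ("how=any", any_missing),
    ("how=all", all_missing),
    ("subset=Age", by_subset),
    ("thresh=1", at_least_1),
    ("thresh=3", at_least_3),
]:
    print(name, list(frame.index))
```

Note that `thresh` counts non-null values to keep, not null values to delete.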


In [225]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


#### fillna

In [226]:
# To fill every null value with a value of your choice, use fillna(value)
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.fillna("jadoooo")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,jadoooo,No,3.0,1.0
2,jadoooo,jadoooo,jadoooo,jadoooo,jadoooo,jadoooo,jadoooo,jadoooo,jadoooo
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [227]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [229]:
# To fill null values only in specific columns, pass a dict: fillna({"column": value, ...}); columns not listed keep their NaN
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.fillna({"JoiningYear":"2050","Education":"Diploma"})

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,Diploma,2050,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0
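
One detail worth checking with the dict form is the type of the fill value. The cell above fills `JoiningYear` with the string "2050"; the sketch below, on a made-up frame, fills with the number 2050 instead so the column stays numeric. Treat it as an illustration, not as the required approach.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one fully missing row
df = pd.DataFrame(
    {
        "Education": ["Bachelors", np.nan, "Masters"],
        "JoiningYear": [2017.0, np.nan, 2016.0],
        "City": ["Bangalore", np.nan, "Pune"],
    }
)

# Only the listed columns are filled; City is not in the dict and keeps its NaN
filled = df.fillna({"JoiningYear": 2050, "Education": "Diploma"})

print(filled)
print(filled["JoiningYear"].dtype)
```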


In [230]:
# To fill null values with the value from the previous row, use fillna(method = "ffill")
# (the method argument is deprecated since pandas 2.1; var.ffill() does the same thing)
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.fillna( method = "ffill")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
2,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [231]:
# To fill null values with the value from the next row, use fillna(method = "bfill")
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.fillna( method = "bfill")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
2,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [236]:
# axis = 0 (the default) fills down the rows, so this gives the same result as fillna(method = "ffill")
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.fillna( method = "ffill", axis = 0)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
2,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [237]:
# "pad" is an alias for "ffill": null values take the value from the previous row
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.fillna( method = "pad")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
2,Bachelors,2013.0,Pune,1.0,28.0,Male,No,3.0,1.0
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0
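
Since pandas 2.1 the `method` argument of `fillna` is deprecated, and the dedicated `ffill()` and `bfill()` methods are the suggested replacement. The following sketch shows them on a small made-up numeric frame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the middle row is missing
df = pd.DataFrame(
    {
        "JoiningYear": [2013.0, np.nan, 2016.0],
        "Age": [28.0, np.nan, 27.0],
    }
)

forward = df.ffill()    # row 1 takes its values from row 0
backward = df.bfill()   # row 1 takes its values from row 2

print(forward)
print(backward)
```

Both methods also accept `axis=1` to fill across columns instead of down rows.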


In [238]:
# Reload and display the original data for comparison
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [242]:
# axis = 1 fills along each row: fillna(method = "ffill", axis = 1) copies the value from the column to the left
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var1 = var.fillna( method ="ffill", axis = 1)
var1

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,28.0,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [243]:
# fillna(method = "bfill", axis = 1) copies the value from the column to the right
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var1 = var.fillna( method ="bfill", axis = 1)
var1

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,No,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [244]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [248]:
# limit = 1 fills at most one null value per column (the first one going down the rows); later nulls stay NaN
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var1 = var.fillna("sha🤪", limit = 1)
var1

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,sha🤪,No,3.0,1.0
2,sha🤪,sha🤪,sha🤪,sha🤪,sha🤪,,sha🤪,sha🤪,sha🤪
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0
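
The effect of `limit` is easiest to see on a frame where each column has more than one gap. This is a minimal sketch on made-up numeric data; the column names `A` and `B` are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each column has two missing values
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan],
        "B": [np.nan, 2.0, np.nan],
    }
)

# limit=1 fills only the first missing value found in each column
limited = df.fillna(0, limit=1)

print(limited)
```

This matches the output above, where `Gender` in row 2 stays empty because the single allowed fill for that column was already used in row 1.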


## Replace in Pandas

In [249]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace(to_replace = 2.0, value = 1000)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,1000.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,1000.0,37.0,Male,No,1000.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,1000.0,0.0


In [250]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace(to_replace = "Bangalore", value = "Noida")

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Noida,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Noida,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Noida,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Noida,3.0,30.0,Male,Yes,2.0,0.0


In [253]:
# Replace several values at once: every 1, 2, 3, 4, 5 or 6 in any column becomes 244 (not only in ExperienceInCurrentDomain)
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace([1,2,3,4,5,6],244)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,244.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,244.0,28.0,,No,244.0,244.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,244.0,27.0,Male,No,244.0,244.0
4,Masters,2017.0,Pune,244.0,24.0,Male,Yes,244.0,244.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,244.0,26.0,Female,No,244.0,0.0
4649,Masters,2013.0,Pune,244.0,37.0,Male,No,244.0,244.0
4650,Masters,2018.0,New Delhi,244.0,27.0,Male,No,244.0,244.0
4651,Bachelors,2012.0,Bangalore,244.0,30.0,Male,Yes,244.0,0.0


In [254]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace("[A-Z]","Shahbaz",regex = True)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Shahbazachelors,2017.0,Shahbazangalore,3.0,34.0,Shahbazale,Shahbazo,0.0,0.0
1,Shahbazachelors,2013.0,Shahbazune,1.0,28.0,,Shahbazo,3.0,1.0
2,,,,,,,,,
3,Shahbazasters,2016.0,Shahbazangalore,3.0,27.0,Shahbazale,Shahbazo,5.0,1.0
4,Shahbazasters,2017.0,Shahbazune,3.0,24.0,Shahbazale,Shahbazes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Shahbazachelors,2013.0,Shahbazangalore,3.0,26.0,Shahbazemale,Shahbazo,4.0,0.0
4649,Shahbazasters,2013.0,Shahbazune,2.0,37.0,Shahbazale,Shahbazo,2.0,1.0
4650,Shahbazasters,2018.0,Shahbazew Shahbazelhi,3.0,27.0,Shahbazale,Shahbazo,5.0,1.0
4651,Shahbazachelors,2012.0,Shahbazangalore,3.0,30.0,Shahbazale,Shahbazes,2.0,0.0


In [255]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace("[A-Za-z]","Shahbaz",regex = True)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,2017.0,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,3.0,34.0,ShahbazShahbazShahbazShahbaz,ShahbazShahbaz,0.0,0.0
1,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,2013.0,ShahbazShahbazShahbazShahbaz,1.0,28.0,,ShahbazShahbaz,3.0,1.0
2,,,,,,,,,
3,ShahbazShahbazShahbazShahbazShahbazShahbazShahbaz,2016.0,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,3.0,27.0,ShahbazShahbazShahbazShahbaz,ShahbazShahbaz,5.0,1.0
4,ShahbazShahbazShahbazShahbazShahbazShahbazShahbaz,2017.0,ShahbazShahbazShahbazShahbaz,3.0,24.0,ShahbazShahbazShahbazShahbaz,ShahbazShahbazShahbaz,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,2013.0,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,3.0,26.0,ShahbazShahbazShahbazShahbazShahbazShahbaz,ShahbazShahbaz,4.0,0.0
4649,ShahbazShahbazShahbazShahbazShahbazShahbazShahbaz,2013.0,ShahbazShahbazShahbazShahbaz,2.0,37.0,ShahbazShahbazShahbazShahbaz,ShahbazShahbaz,2.0,1.0
4650,ShahbazShahbazShahbazShahbazShahbazShahbazShahbaz,2018.0,ShahbazShahbazShahbaz ShahbazShahbazShahbazSha...,3.0,27.0,ShahbazShahbazShahbazShahbaz,ShahbazShahbaz,5.0,1.0
4651,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,2012.0,ShahbazShahbazShahbazShahbazShahbazShahbazShah...,3.0,30.0,ShahbazShahbazShahbazShahbaz,ShahbazShahbazShahbaz,2.0,0.0


In [256]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace({"Education":"[A-Z]"}, 'Jadoo', regex = True)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Jadooachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Jadooachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Jadooasters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Jadooasters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Jadooachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Jadooasters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Jadooasters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Jadooachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [257]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace({"Education":"[A-Za-z]"}, 'Jadoo', regex = True)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,JadooJadooJadooJadooJadooJadooJadooJadooJadoo,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,JadooJadooJadooJadooJadooJadooJadooJadooJadoo,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,JadooJadooJadooJadooJadooJadooJadoo,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,JadooJadooJadooJadooJadooJadooJadoo,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,JadooJadooJadooJadooJadooJadooJadooJadooJadoo,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,JadooJadooJadooJadooJadooJadooJadoo,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,JadooJadooJadooJadooJadooJadooJadoo,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,JadooJadooJadooJadooJadooJadooJadooJadooJadoo,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0
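
In the regex examples above, `[A-Z]` and `[A-Za-z]` match one letter at a time, so every matching letter is replaced separately. To replace a whole value, the pattern can be anchored with `^` and `$`. The sketch below uses a made-up frame, and the replacement strings are placeholders.

```python
import pandas as pd

# Hypothetical two-row frame for illustration
df = pd.DataFrame(
    {
        "Education": ["Bachelors", "Masters"],
        "City": ["Bangalore", "New Delhi"],
    }
)

# Each uppercase letter in Education is replaced on its own; City is untouched
per_letter = df.replace({"Education": "[A-Z]"}, "X", regex=True)

# An anchored pattern matches the full string, so the whole value is replaced
whole = df.replace({"Education": r"^Bachelors$"}, "Undergraduate", regex=True)

print(per_letter)
print(whole)
```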


In [258]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [264]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
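# Replace every 2.0 with the value from the previous row (the method argument of replace is deprecated since pandas 2.1)
var.replace(2.0, method = "ffill")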

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,5.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,3.0,37.0,Male,No,4.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,5.0,0.0
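
Because `replace(..., method=...)` is deprecated in newer pandas, a similar effect can be sketched with `mask()` followed by `ffill()`. This is an assumed alternative on a made-up Series, not the notebook's own approach, and it differs in one respect: missing values that were already in the data get filled too.

```python
import pandas as pd

# Hypothetical Series with no missing values to begin with
s = pd.Series([0.0, 3.0, 2.0, 5.0, 2.0])

# mask() turns every 2.0 into NaN; ffill() then copies the previous value down
replaced = s.mask(s == 2.0).ffill()

print(replaced.tolist())
```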


In [267]:
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace("Bachelors", method = "bfill", limit = 2)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Masters,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [268]:
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


In [294]:
# inplace = True changes the DataFrame var itself instead of returning a copy (the CSV file on disk is not modified)
var = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var.replace("Bachelors", method = "bfill",limit = 2, inplace = True)

In [270]:
var

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
...,...,...,...,...,...,...,...,...,...
4648,Masters,2013.0,Bangalore,3.0,26.0,Female,No,4.0,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.0,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.0,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.0,0.0


## Interpolate in Pandas

In [274]:

var.head(20)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,Bangalore,3.0,27.0,Male,No,5.0,1.0
4,Masters,2017.0,Pune,3.0,24.0,Male,Yes,2.0,1.0
5,Bachelors,2016.0,Bangalore,3.0,22.0,Male,No,0.0,0.0
6,Bachelors,2015.0,New Delhi,3.0,38.0,Male,No,0.0,0.0
7,Masters,2016.0,Bangalore,3.0,34.0,Female,No,2.0,1.0
8,Masters,2016.0,Pune,3.0,23.0,Male,No,1.0,0.0
9,Masters,2017.0,New Delhi,2.0,37.0,,No,2.0,0.0


In [288]:
import pandas as pd
var2 = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1.csv")
var2.head(20)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,,3.0,27.0,Male,No,,1.0
4,Masters,2017.0,,3.0,24.0,Male,Yes,,1.0
5,Bachelors,2016.0,,3.0,22.0,Male,No,,0.0
6,Bachelors,2015.0,,3.0,38.0,Male,No,,0.0
7,Bachelors,2016.0,,3.0,34.0,Female,No,,1.0
8,Bachelors,2016.0,,3.0,23.0,Male,No,,0.0
9,Masters,2017.0,,2.0,37.0,,No,,0.0


In [293]:
var3 = var2.interpolate()
var3.head(20)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,2014.5,,2.0,27.5,,,2.875,1.0
3,Masters,2016.0,,3.0,27.0,Male,No,2.75,1.0
4,Masters,2017.0,,3.0,24.0,Male,Yes,2.625,1.0
5,Bachelors,2016.0,,3.0,22.0,Male,No,2.5,0.0
6,Bachelors,2015.0,,3.0,38.0,Male,No,2.375,0.0
7,Bachelors,2016.0,,3.0,34.0,Female,No,2.25,1.0
8,Bachelors,2016.0,,3.0,23.0,Male,No,2.125,0.0
9,Masters,2017.0,,2.0,37.0,,No,2.0,0.0


In [297]:
sh = pd.read_csv("/Users/shahbazalam/Desktop/Employee copy1 copy.csv")
sh.head(20)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,,3.0,27.0,Male,No,,1.0
4,Masters,2017.0,,3.0,24.0,Male,Yes,,1.0
5,Bachelors,2016.0,,3.0,22.0,Male,No,,0.0
6,Bachelors,2015.0,,3.0,38.0,Male,No,,0.0
7,Bachelors,2016.0,,3.0,34.0,Female,No,,1.0
8,Bachelors,2016.0,,3.0,23.0,Male,No,,0.0
9,Masters,2017.0,,2.0,37.0,,No,,0.0


In [299]:
sh.interpolate(method="linear")  # "linear" is the default, so this matches interpolate() with no arguments

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.000,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.000,1.0
2,,2014.5,,2.0,27.5,,,2.875,1.0
3,Masters,2016.0,,3.0,27.0,Male,No,2.750,1.0
4,Masters,2017.0,,3.0,24.0,Male,Yes,2.625,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.000,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.000,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.000,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.000,0.0


In [304]:
import pandas as pd
jd = pd.read_csv("/Users/shahbazalam/Desktop/shahabz.csv")

jd.head(20)

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.0,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.0,1.0
2,,,,,,,,,
3,Masters,2016.0,,3.0,27.0,Male,No,,1.0
4,Masters,2017.0,,3.0,24.0,Male,Yes,,1.0
5,Bachelors,2016.0,,3.0,22.0,Male,No,,0.0
6,Bachelors,2015.0,,3.0,38.0,Male,No,,0.0
7,Bachelors,2016.0,,3.0,34.0,Female,No,,1.0
8,Bachelors,2016.0,,3.0,23.0,Male,No,,0.0
9,Masters,2017.0,,2.0,37.0,,No,,0.0


In [305]:
jd.interpolate()

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017.0,Bangalore,3.0,34.0,Male,No,0.000,0.0
1,Bachelors,2013.0,Pune,1.0,28.0,,No,3.000,1.0
2,,2014.5,,2.0,27.5,,,2.875,1.0
3,Masters,2016.0,,3.0,27.0,Male,No,2.750,1.0
4,Masters,2017.0,,3.0,24.0,Male,Yes,2.625,1.0
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013.0,Bangalore,3.0,26.0,Female,No,4.000,0.0
4649,Masters,2013.0,Pune,2.0,37.0,Male,No,2.000,1.0
4650,Masters,2018.0,New Delhi,3.0,27.0,Male,No,5.000,1.0
4651,Bachelors,2012.0,Bangalore,3.0,30.0,Male,Yes,2.000,0.0


## Merge

In [307]:
var1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[56,57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,4,5],"C":[11,22,33,44,55]})
pd.merge(var1, var2)  # no "on" given: the shared column "A" is used as the key

Unnamed: 0,A,B,C
0,1,56,11
1,2,57,22
2,3,49,33
3,4,39,44
4,5,78,55


In [308]:
var1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[56,57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,4,5],"C":[11,22,33,44,55]})
pd.merge(var1, var2, on="A")  # same result, with the key column named explicitly

Unnamed: 0,A,B,C
0,1,56,11
1,2,57,22
2,3,49,33
3,4,39,44
4,5,78,55


In [309]:
var1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[56,57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,4,5],"C":[11,22,33,44,55]})
pd.merge(var2, var1, on="A")  # swapping the frames changes the column order to A, C, B

Unnamed: 0,A,C,B
0,1,11,56
1,2,22,57
2,3,33,49
3,4,44,39
4,5,55,78


In [312]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"C":[22,33,44,55]})
pd.merge(var1, var2)  # default how="inner": only keys found in both frames (1, 2, 3) survive

Unnamed: 0,A,B,C
0,1,57,22
1,2,49,33
2,3,39,44


In [313]:
var1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[56,57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,4,5],"C":[11,22,33,44,55]})
pd.merge(var1, var2, how="inner")  # how="inner" is the default, written out explicitly here

Unnamed: 0,A,B,C
0,1,56,11
1,2,57,22
2,3,49,33
3,4,39,44
4,5,78,55


In [316]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"C":[22,33,44,55]})
pd.merge(var1, var2, how="left")  # keep every row of var1; C is NaN (and becomes float) where var2 has no match

Unnamed: 0,A,B,C
0,1,57,22.0
1,2,49,33.0
2,3,39,44.0
3,4,78,


In [317]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"C":[22,33,44,55]})
pd.merge(var1, var2, how="right")  # keep every row of var2; B is NaN where var1 has no match

Unnamed: 0,A,B,C
0,1,57.0,22
1,2,49.0,33
2,3,39.0,44
3,5,,55


In [318]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"C":[22,33,44,55]})
pd.merge(var1, var2, how="outer")  # keep the keys of both frames; gaps on either side become NaN

Unnamed: 0,A,B,C
0,1,57.0,22.0
1,2,49.0,33.0
2,3,39.0,44.0
3,4,78.0,
4,5,,55.0


In [319]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"C":[22,33,44,55]})
pd.merge(var1, var2, how="left", indicator=True)  # _merge records where each row came from

Unnamed: 0,A,B,C,_merge
0,1,57,22.0,both
1,2,49,33.0,both
2,3,39,44.0,both
3,4,78,,left_only


In [321]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"C":[22,33,44,55]})
pd.merge(var1, var2, how="right", indicator=True)  # _merge records where each row came from

Unnamed: 0,A,B,C,_merge
0,1,57.0,22,both
1,2,49.0,33,both
2,3,39.0,44,both
3,5,,55,right_only
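
One common use of the `_merge` column is to list the rows that failed to match. The sketch below is one way to do it, assuming an outer merge so that unmatched rows from both sides are present; the names `merged` and `unmatched` are illustrative.

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [57, 49, 39, 78]})
var2 = pd.DataFrame({"A": [1, 2, 3, 5], "C": [22, 33, 44, 55]})

# An outer merge keeps every key; the indicator says which side supplied it.
merged = pd.merge(var1, var2, how="outer", indicator=True)

# Keep only the rows that exist in one frame but not the other.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```

Here the unmatched rows are A = 4 (left_only) and A = 5 (right_only).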


In [323]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"B":[22,33,44,55]})
# Both frames share the columns A and B, so both are used as keys; no row matches on both
pd.merge(var1, var2)

Unnamed: 0,A,B
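
The result above is empty because, when `on` is not given, `pd.merge` joins on every column name the two frames share. Here that means A and B together, and no (A, B) pair appears in both frames. A minimal sketch of the difference, with the illustrative names `both_keys` and `on_a`:

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [57, 49, 39, 78]})
var2 = pd.DataFrame({"A": [1, 2, 3, 5], "B": [22, 33, 44, 55]})

# With no `on`, the keys are all shared column names: A and B.
both_keys = pd.merge(var1, var2)
print(both_keys.empty)

# Joining on A alone keeps both B columns, renamed with the default suffixes.
on_a = pd.merge(var1, var2, on="A")
print(on_a)
```

Joining on A alone returns the three matching keys, with the two B columns shown as B_x and B_y.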


In [326]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"B":[22,33,44,55]})
pd.merge(var1, var2, left_index=True, right_index=True)  # join on the row index; clashing names get _x and _y

Unnamed: 0,A_x,B_x,A_y,B_y
0,1,57,1,22
1,2,49,2,33
2,3,39,3,44
3,4,78,5,55


In [329]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"B":[22,33,44,55]})
pd.merge(var1, var2, left_index=True, right_index=True, suffixes=("_name", "_id"))  # custom suffixes replace _x and _y

Unnamed: 0,A_name,B_name,A_id,B_id
0,1,57,1,22
1,2,49,2,33
2,3,39,3,44
3,4,78,5,55
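
An index-on-index merge like the one above can also be written with `DataFrame.join`, which joins on the index by default. The sketch below compares the two; note that `join` defaults to a left join while `merge` defaults to an inner join, so they agree here only because both frames have the same index. The names `via_merge`, `via_join` and `same` are illustrative.

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [57, 49, 39, 78]})
var2 = pd.DataFrame({"A": [1, 2, 3, 5], "B": [22, 33, 44, 55]})

via_merge = pd.merge(var1, var2, left_index=True, right_index=True,
                     suffixes=("_name", "_id"))

# join() takes the suffixes as two separate arguments.
via_join = var1.join(var2, lsuffix="_name", rsuffix="_id")

# Compare column names and values of the two results.
same = via_merge.to_dict("list") == via_join.to_dict("list")
print(same)
```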


## Concat

In [330]:
s1 = pd.Series([1,2,3,4])
s2 = pd.Series([11,12,31,41])
pd.concat([s1, s2])  # appends s2 after s1; each Series keeps its own index labels

0     1
1     2
2     3
3     4
0    11
1    12
2    31
3    41
dtype: int64
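
The index labels 0 to 3 appear twice in the result above, so a label lookup returns two values. If a fresh 0 to n-1 index is wanted instead, `ignore_index=True` is one way to get it, as in this short sketch (the names `stacked` and `renumbered` are illustrative):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([11, 12, 31, 41])

stacked = pd.concat([s1, s2])
# Label 0 occurs twice, so selecting it returns two values.
print(stacked.loc[0])

# ignore_index=True discards the old labels and numbers the rows again.
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())
```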

In [333]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"B":[22,33,44,55]})
pd.concat([var1, var2])  # stacks the rows; the original index labels 0 to 3 repeat

Unnamed: 0,A,B
0,1,57
1,2,49
2,3,39
3,4,78
0,1,22
1,2,33
2,3,44
3,5,55


In [334]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"B":[22,33,44,55]})
pd.concat([var1, var2], axis=1)  # places the frames side by side, aligned on the index

Unnamed: 0,A,B,A,B
0,1,57,1,22
1,2,49,2,33
2,3,39,3,44
3,4,78,5,55


In [336]:
var1 = pd.DataFrame({"A":[1,2,3,4],"B":[57,49,39,78]})
var2 = pd.DataFrame({"A":[1,2,3,5],"B":[22,33,44,55]})
pd.concat([var1, var2], axis=0)  # axis=0 is the default: rows are stacked

Unnamed: 0,A,B
0,1,57
1,2,49
2,3,39
3,4,78
0,1,22
1,2,33
2,3,44
3,5,55


In [341]:
var1 = pd.DataFrame({"A":[1,2,3,5,7,8],"B":[49,39,78,22,11,33]})
var2 = pd.DataFrame({"A":[1,2],"B":[33,55]})
pd.concat([var1, var2], axis=1)  # var2 is shorter, so its columns are NaN for index 2 to 5

Unnamed: 0,A,B,A,B
0,1,49,1.0,33.0
1,2,39,2.0,55.0
2,3,78,,
3,5,22,,
4,7,11,,
5,8,33,,


In [342]:
var1 = pd.DataFrame({"A":[1,2,3,5,7,8],"B":[49,39,78,22,11,33]})
var2 = pd.DataFrame({"A":[1,2],"B":[33,55]})
pd.concat([var1, var2], axis=1, join="outer")  # join="outer" is the default: keep every index label

Unnamed: 0,A,B,A,B
0,1,49,1.0,33.0
1,2,39,2.0,55.0
2,3,78,,
3,5,22,,
4,7,11,,
5,8,33,,


In [344]:
var1 = pd.DataFrame({"A":[1,2,3,5,7,8],"B":[49,39,78,22,11,33]})
var2 = pd.DataFrame({"A":[1,2],"B":[33,55]})
pd.concat([var1, var2], axis=1, join="inner")  # keep only the index labels present in both frames (0 and 1)

Unnamed: 0,A,B,A,B
0,1,49,1,33
1,2,39,2,55
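
With `axis=1`, `pd.concat` lines the frames up by index label, not by row position, and `join` decides which labels are kept. The sketch below makes this visible with two made-up frames whose indexes only partly overlap (`left`, `right`, `outer` and `inner` are illustrative names):

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"C": [10, 20, 30]}, index=[1, 2, 3])

# Outer join: union of the labels (0, 1, 2, 3), with NaN where a frame has no row.
outer = pd.concat([left, right], axis=1)

# Inner join: only the labels both frames share (1 and 2).
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```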


In [346]:
var1 = pd.DataFrame({"A":[1,2,3,5,7],"B":[49,39,78,22,11]})
var2 = pd.DataFrame({"A":[1,2,4,7,9],"B":[33,55,66,77,88]})
pd.concat([var1, var2], axis=1, keys=["D1", "D2"])  # keys add an outer column level naming each source frame

Unnamed: 0_level_0,D1,D1,D2,D2
Unnamed: 0_level_1,A,B,A,B
0,1,49,1,33
1,2,39,2,55
2,3,78,4,66
3,5,22,7,77
4,7,11,9,88
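
The keys become the outer level of the column labels, so each source frame can be pulled back out by its key. A short sketch, with `wide`, `d2` and `col` as illustrative names:

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 5, 7], "B": [49, 39, 78, 22, 11]})
var2 = pd.DataFrame({"A": [1, 2, 4, 7, 9], "B": [33, 55, 66, 77, 88]})

wide = pd.concat([var1, var2], axis=1, keys=["D1", "D2"])

# Selecting a key returns that frame's columns.
d2 = wide["D2"]

# A (key, column) tuple selects a single column.
col = wide[("D2", "A")]
print(col.tolist())
```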


In [347]:
var1 = pd.DataFrame({"A":[1,2,3,5,7],"B":[49,39,78,22,11]})
var2 = pd.DataFrame({"A":[1,2,4,7,9],"B":[33,55,66,77,88]})
pd.concat([var1, var2], axis=0, keys=["D1", "D2"])  # keys add an outer index level naming each source frame

Unnamed: 0,Unnamed: 1,A,B
D1,0,1,49
D1,1,2,39
D1,2,3,78
D1,3,5,22
D1,4,7,11
D2,0,1,33
D2,1,2,55
D2,2,4,66
D2,3,7,77
D2,4,9,88
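
With `axis=0` the keys form the outer level of the row index, which avoids the ambiguity of repeated labels. The sketch below shows two ways of selecting from it; `stacked`, `second` and `one_row` are illustrative names.

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 5, 7], "B": [49, 39, 78, 22, 11]})
var2 = pd.DataFrame({"A": [1, 2, 4, 7, 9], "B": [33, 55, 66, 77, 88]})

stacked = pd.concat([var1, var2], axis=0, keys=["D1", "D2"])

# All rows that came from the second frame.
second = stacked.loc["D2"]

# One specific row: key "D1", original label 3.
one_row = stacked.loc[("D1", 3)]
print(one_row)
```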
