# Series

The first main data type we will look at in pandas is the Series. Let’s import
pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object).
What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning
it can be indexed by a label instead of just a numeric position. It also doesn’t need to hold numeric
data; it can hold any arbitrary Python object.
Let’s explore this concept through some examples:

-----------------------------------------------------------------------------
Series:
-----------------------------------------------------------------------------
A "Series" typically refers to a one-dimensional labeled array-like data structure.

It is part of the pandas library, which is widely used for data analysis and manipulation in Python.

A Series can hold various types of data (integers, floating-point numbers, strings, etc.).

Each value in a Series is associated with an index label, which can be used to access or manipulate the data.

-----------------------------------------------------------------------------
Array:
-----------------------------------------------------------------------------
An "Array" usually refers to a collection of elements of the same data type, arranged along one or more dimensions (a one-dimensional sequence, a two-dimensional grid of rows and columns, and so on).

In the context of Python, "array" can refer to the built-in array module, but it's more commonly associated with the NumPy library.

NumPy arrays are more efficient than plain Python lists for numerical computations and are widely used for scientific computing and mathematical operations.
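
As a quick illustration of the difference, the sketch below builds a NumPy array and a Series from the same values. The names `arr_demo`, `ser_demo`, and `mixed` are made up for this example and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Same values, once as a plain array and once as a labeled Series
arr_demo = np.array([10, 20, 30])
ser_demo = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_position = arr_demo[1]     # NumPy arrays are addressed by position only
by_label = ser_demo['b']      # a Series can also be addressed by label
by_iloc = ser_demo.iloc[1]    # explicit positional access on a Series

# A Series is not limited to numbers; mixed Python objects get dtype object
mixed = pd.Series([1, 'two', 3.0])
print(by_position, by_label, by_iloc, mixed.dtype)
```

All three lookups return the same element; only the way of addressing it differs.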

In [None]:
import numpy as np
import pandas as pd

__0.0.1 Creating a Series__

You can convert a list, a NumPy array, or a dictionary to a Series:

In [None]:
labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}

In [None]:
pd.Series()

Series([], dtype: object)

In [3]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [6]:
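# The index needs exactly one label per value; a length mismatch raises an error
pd.Series(data=[10, 20, 30], index=['a', 'b', 'c', 'd'])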

ValueError: Length of values (3) does not match length of index (4)

In [72]:
pd.Series(my_list)

0    10
1    20
2    30
dtype: int64

In [73]:
pd.Series(labels)

0    a
1    b
2    c
dtype: object

In [74]:
pd.Series(my_list,labels)

a    10
b    20
c    30
dtype: int64

In [75]:
pd.Series(labels,my_list)

10    a
20    b
30    c
dtype: object

__NumPy Arrays__

In [76]:
arr

array([10, 20, 30])

In [77]:
pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [78]:
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int64

__Dictionary__

In [79]:
d

{'a': 10, 'b': 20, 'c': 30}

In [80]:
pd.Series(d)

a    10
b    20
c    30
dtype: int64

__0.1 Using an Index__

The key to using a Series is understanding its index. Pandas uses these index names or
numbers for fast lookups of information (the index works like a hash table or dictionary).

Let’s see some examples of how to grab information from a Series. Let us create two Series,
ser1 and ser2:

In [81]:
ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [82]:
ser2 = pd.Series([1,2,5,4],index = ['Japan','USA', 'Italy','Germany' ])
ser2

Japan      1
USA        2
Italy      5
Germany    4
dtype: int64

In [83]:
ser1+ser2

Germany    6.0
Italy      NaN
Japan      5.0
USA        3.0
USSR       NaN
dtype: float64

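Values are added where the labels match: Germany is 2 + 4, Japan is 4 + 1, and USA is 1 + 2.
Italy appears only in ser2 and USSR only in ser1, so both results are NaN. Because NaN is a
floating-point value, the result has dtype float64 even though both inputs hold integers.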
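
The dictionary-like behaviour of the index can be sketched as follows. The Series below repeats the values of ser1 under the made-up name `ser` so the block runs on its own.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

usa = ser['USA']                      # single label -> scalar
pair = ser[['USA', 'Japan']]          # list of labels -> smaller Series
has_italy = 'Italy' in ser            # membership is tested against the index
italy_or_zero = ser.get('Italy', 0)   # like dict.get: a default instead of KeyError
print(usa, list(pair), has_italy, italy_or_zero)
```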


Let’s stop here for now and move on to DataFrames, which will expand on the concept of
Series! Great job!

# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same
index. Let’s use pandas to explore this topic!
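
To make the "Series sharing one index" picture concrete, here is a minimal sketch that assembles a small DataFrame from two Series. The names `w`, `x`, and `frame` are illustrative only.

```python
import pandas as pd

idx = ['A', 'B', 'C']
w = pd.Series([1, 5, 9], index=idx)
x = pd.Series([2, 6, 10], index=idx)

# Each Series becomes one column; the shared labels become the row index
frame = pd.DataFrame({'W': w, 'X': x})
col = frame['W']   # selecting a single column gives back a Series
print(frame)
```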

In [84]:
'A B C D E'.split()

['A', 'B', 'C', 'D', 'E']

In [85]:
data = [[1,2,3,4], [5,6,7,8], [9,10,11,12],[13,14,15,16],[17,18,19,20]]

In [86]:
df = pd.DataFrame(data, index='A B C D E'.split(), columns=['W', 'X', 'Y', 'Z'])

In [87]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16
E,17,18,19,20


__1.1 Selection and Indexing__

Let’s learn the various methods to grab data from a DataFrame:

In [88]:
df['X']

A     2
B     6
C    10
D    14
E    18
Name: X, dtype: int64

In [89]:
type(df['X'])

pandas.core.series.Series

In [90]:
list(df.index)

['A', 'B', 'C', 'D', 'E']

In [92]:
df.columns

Index(['W', 'X', 'Y', 'Z'], dtype='object')

In [93]:
list(df.columns)

['W', 'X', 'Y', 'Z']

In [94]:
df[['X','Y']]

Unnamed: 0,X,Y
A,2,3
B,6,7
C,10,11
D,14,15
E,18,19


In [95]:
df[['X','Z']]

Unnamed: 0,X,Z
A,2,4
B,6,8
C,10,12
D,14,16
E,18,20


In [96]:
# Pass a list of column names
test= df[['X','W','Z']]
test

Unnamed: 0,X,W,Z
A,2,1,4
B,6,5,8
C,10,9,12
D,14,13,16
E,18,17,20


Creating a new column:


In [97]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16
E,17,18,19,20


In [98]:
df['W'] + df['Y'] + df['X'] + df['Z']

A    10
B    26
C    42
D    58
E    74
dtype: int64

In [99]:
df['new'] = df['W'] + df['Y'] + df['X'] + df['Z']
df['new1'] = [9,8,7,6,4]

In [100]:
df

Unnamed: 0,W,X,Y,Z,new,new1
A,1,2,3,4,10,9
B,5,6,7,8,26,8
C,9,10,11,12,42,7
D,13,14,15,16,58,6
E,17,18,19,20,74,4


__Removing Columns__

In [102]:
df.drop('new',axis=1)

Unnamed: 0,W,X,Y,Z,new1
A,1,2,3,4,9
B,5,6,7,8,8
C,9,10,11,12,7
D,13,14,15,16,6
E,17,18,19,20,4


In [103]:
df

Unnamed: 0,W,X,Y,Z,new,new1
A,1,2,3,4,10,9
B,5,6,7,8,26,8
C,9,10,11,12,42,7
D,13,14,15,16,58,6
E,17,18,19,20,74,4


In [104]:
df.drop('new',axis=1,inplace=True)
df

Unnamed: 0,W,X,Y,Z,new1
A,1,2,3,4,9
B,5,6,7,8,8
C,9,10,11,12,7
D,13,14,15,16,6
E,17,18,19,20,4


In [105]:
df = df.drop('new1',axis=1)
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16
E,17,18,19,20


You can also drop rows this way:


In [106]:
df.drop('E',axis=0)

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [107]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16
E,17,18,19,20


In [108]:
df.drop('E',axis=0,inplace=True)
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16
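
The cells above show that drop leaves the original object untouched unless inplace=True is passed or the result is assigned back. A minimal sketch on a throwaway frame (the name `frame` and its values are made up here); it uses the `columns=` and `index=` keywords of drop, which some readers find easier to follow than `axis`:

```python
import pandas as pd

frame = pd.DataFrame({'W': [1, 5], 'X': [2, 6], 'new': [3, 11]}, index=['A', 'B'])

without_new = frame.drop(columns='new')   # same as frame.drop('new', axis=1)
without_a = frame.drop(index='A')         # same as frame.drop('A', axis=0)

# The original frame still has every row and column
print(list(frame.columns), list(without_new.columns), list(without_a.index))
```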


In [109]:
df['Y']

A     3
B     7
C    11
D    15
Name: Y, dtype: int64

__Selecting Rows__

In [110]:
df.loc['A']

W    1
X    2
Y    3
Z    4
Name: A, dtype: int64

Or select based on position instead of label:

In [111]:
df.iloc[0]

W    1
X    2
Y    3
Z    4
Name: A, dtype: int64
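
One difference between the two accessors is easy to miss when slicing: a label slice with loc includes its end label, while a position slice with iloc excludes its end position. A small sketch, using a made-up three-row frame:

```python
import pandas as pd

frame = pd.DataFrame([[1, 2], [5, 6], [9, 10]],
                     index=['A', 'B', 'C'], columns=['W', 'X'])

by_label = frame.loc['A':'B']    # label slice: the end label 'B' is included
by_position = frame.iloc[0:1]    # position slice: the end position 1 is excluded
print(list(by_label.index), list(by_position.index))
```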

__Selecting subset of rows and columns__

In [112]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [113]:
df.loc['B','Y']

np.int64(7)

In [114]:
df.loc[['A','B'],['W','Y']]

Unnamed: 0,W,Y
A,1,3
B,5,7


__1.1.1 Conditional Selection__

An important feature of pandas is conditional selection using bracket notation, very similar to
numpy:

In [115]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [116]:
df>10

Unnamed: 0,W,X,Y,Z
A,False,False,False,False
B,False,False,False,False
C,False,False,True,True
D,True,True,True,True


In [117]:
df[df>10]

Unnamed: 0,W,X,Y,Z
A,,,,
B,,,,
C,,,11.0,12.0
D,13.0,14.0,15.0,16.0


In [118]:
df['Y']>10

A    False
B    False
C     True
D     True
Name: Y, dtype: bool

In [119]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [120]:
df[df['Y']>10]['W']

C     9
D    13
Name: W, dtype: int64

In [122]:
df[df['Y']>10]['X']

C    10
D    14
Name: X, dtype: int64

In [123]:
df[df['Y']>10][['Y','X']]

Unnamed: 0,Y,X
C,11,10
D,15,14


In [124]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


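For two conditions you can use | (or) and & (and). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as >: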

In [125]:
df['Y']

A     3
B     7
C    11
D    15
Name: Y, dtype: int64

In [133]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [126]:
df[(df['W']>12) & (df['Y'] > 15)]

Unnamed: 0,W,X,Y,Z


In [127]:
df[(df['W']>12) | (df['Y'] >15)]

Unnamed: 0,W,X,Y,Z
D,13,14,15,16
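
The chained form df[mask][columns] used above works for reading. An alternative worth knowing is to pass the mask and the column list to loc in one step, which also works when assigning values. The sketch below rebuilds the four-row frame under the made-up name `frame` so it can run independently.

```python
import pandas as pd

frame = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
                     index=list('ABCD'), columns=list('WXYZ'))

mask = frame['Y'] > 10                   # boolean Series, one entry per row
picked = frame.loc[mask, ['Y', 'X']]     # rows and columns in a single step
both = frame.loc[(frame['W'] > 12) & (frame['Y'] >= 15)]
neither = frame.loc[~mask]               # ~ negates the mask
print(picked, both, neither, sep='\n\n')
```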


__1.2 More Index Details__

Let’s discuss some more features of indexing, including resetting the index or setting it to something
else. We’ll also talk about index hierarchy!

In [128]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [129]:
# Reset to default 0,1...n index
df.reset_index()

Unnamed: 0,index,W,X,Y,Z
0,A,1,2,3,4
1,B,5,6,7,8
2,C,9,10,11,12
3,D,13,14,15,16


In [130]:
df

Unnamed: 0,W,X,Y,Z
A,1,2,3,4
B,5,6,7,8
C,9,10,11,12
D,13,14,15,16


In [131]:
newind = 'CA NY WY OR'.split()
newind

['CA', 'NY', 'WY', 'OR']

In [132]:
# df has four rows left (row 'E' was dropped above), so the list needs four entries;
# a list of any other length raises "ValueError: Length of values ... does not match length of index"
df['States'] = newind
df

Unnamed: 0,W,X,Y,Z,States
A,1,2,3,4,CA
B,5,6,7,8,NY
C,9,10,11,12,WY
D,13,14,15,16,OR


In [None]:
df.set_index('States')

States,W,X,Y,Z
CA,1,2,3,4
NY,5,6,7,8
WY,9,10,11,12
OR,13,14,15,16


In [None]:
df.set_index('States',inplace=True)
df

States,W,X,Y,Z
CA,1,2,3,4
NY,5,6,7,8
WY,9,10,11,12
OR,13,14,15,16


__1.3 Multi-Index and Index Hierarchy__

Let us go over how to work with a MultiIndex. First we’ll create a quick example of what a multi-indexed DataFrame looks like:

In [None]:
# Index Levels
hier_index = [('G1', 1), ('G1', 2), ('G1', 3), ('G2', 4), ('G2', 5), ('G2', 6)]
#print(hier_index)
hier_index = pd.MultiIndex.from_tuples(hier_index)
print(hier_index)

MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 4),
            ('G2', 5),
            ('G2', 6)],
           )


In [None]:
data1 = [[1,2],[3,4],[5,6],[7,8],[9,10],[11,12]]

In [None]:
df = pd.DataFrame(data1,index=hier_index,columns=['A','B'])
df

Unnamed: 0,Unnamed: 1,A,B
G1,1,1,2
G1,2,3,4
G1,3,5,6
G2,4,7,8
G2,5,9,10
G2,6,11,12


Now let’s show how to index this! For an index hierarchy on the rows we use df.loc[]; if the
hierarchy were on the columns axis, you would just use normal bracket notation df[]. Calling one
level of the index returns the sub-DataFrame:

In [None]:
df.loc['G2']

Unnamed: 0,A,B
4,7,8
5,9,10
6,11,12


In [None]:
df.loc['G2'].loc[6]

A    11
B    12
Name: 6, dtype: int64

In [None]:
df.index

MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 4),
            ('G2', 5),
            ('G2', 6)],
           )

In [None]:
df.index.names= ["Group",'myindex']
df

Group,myindex,A,B
G1,1,1,2
G1,2,3,4
G1,3,5,6
G2,4,7,8
G2,5,9,10
G2,6,11,12


In [None]:
df.index.names = ['Group','Num']
df

Group,Num,A,B
G1,1,1,2
G1,2,3,4
G1,3,5,6
G2,4,7,8
G2,5,9,10
G2,6,11,12
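
Once the levels have names, a few other lookups become convenient. The sketch below is one possible way to do it, on a small made-up frame whose inner level repeats (1, 2 under both groups) so that a cross-section returns more than one row.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)], names=['Group', 'Num'])
frame = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=idx, columns=['A', 'B'])

row = frame.loc[('G2', 2)]         # a tuple addresses both levels at once
cell = frame.loc[('G2', 2), 'A']   # tuple for the row, label for the column
cross = frame.xs(1, level='Num')   # every row whose inner level equals 1
print(row, cell, cross, sep='\n\n')
```

Passing a tuple avoids the second .loc call used in df.loc['G2'].loc[6], and xs selects on the inner level without going through the outer one first.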


# Index preservation and Index alignment

__1 Index Preservation__

Pandas is designed to work with NumPy: any NumPy ufunc will work on pandas Series and
DataFrame objects.

Let’s start by defining a simple Series:

In [None]:
ser = pd.Series([8,8,5,6])
ser

0    8
1    8
2    5
3    6
dtype: int64

In [None]:
type(ser)

pandas.core.series.Series

Applying an operation to a Series

If we apply a NumPy ufunc to this object, the result will be another pandas object
with the indices preserved:

In [None]:
np.exp(ser)

0    2980.957987
1    2980.957987
2     148.413159
3     403.428793
dtype: float64
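
The same holds for non-default labels and for DataFrames. A brief sketch, with made-up names and values:

```python
import numpy as np
import pandas as pd

ser_demo = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])
frame_demo = pd.DataFrame({'x': [0.0, 16.0]}, index=['p', 'q'])

roots = np.sqrt(ser_demo)          # still a Series, still labeled a, b, c
frame_roots = np.sqrt(frame_demo)  # still a DataFrame with rows p, q
print(roots, frame_roots, sep='\n\n')
```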

__2 Index Alignment__

For binary operations on two Series or DataFrame objects, Pandas will align indices in the process
of performing the operation. This is very convenient when working with incomplete data, as we’ll
see in some of the examples below.

In [None]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,'New York': 19651127},name='population')

In [None]:
area

Alaska        1723337
Texas          695662
California     423967
Name: area, dtype: int64

In [None]:
population

California    38332521
Texas         26448193
New York      19651127
Name: population, dtype: int64

Let’s divide and see what happens:


In [None]:
ser1 = population / area
ser1

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

Any item for which one or the other does not have an entry is marked by NaN, or “Not a
Number”, which is how pandas marks missing data.

In [None]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

In [None]:
A

0    2
1    4
2    6
dtype: int64

In [None]:
B

1    1
2    3
3    5
dtype: int64

In [None]:
A+B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [None]:
A.add(B,fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
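
The other arithmetic methods accept fill_value in the same way. The sketch below repeats A and B and picks a neutral fill for each operation (0 for subtraction, 1 for multiplication); that choice is an assumption for illustration, not a rule.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

diff = A.sub(B, fill_value=0)   # a missing side counts as 0
prod = A.mul(B, fill_value=1)   # a missing side counts as 1
print(diff, prod, sep='\n\n')
```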

__DataFrame__

In [None]:
data3=[[3,4],[5,6]]
data4=[[1,5,5],[3,9,8],[8,3,4]]

In [None]:
A = pd.DataFrame(data3,columns=list('AB'))
A

Unnamed: 0,A,B
0,3,4
1,5,6


In [None]:
B = pd.DataFrame(data4,columns=list('BAC'))
B

Unnamed: 0,B,A,C
0,1,5,5
1,3,9,8
2,8,3,4


In [None]:
A + B

Unnamed: 0,A,B,C
0,8.0,5.0,
1,14.0,9.0,
2,,,


In [None]:
A.add(B,fill_value=0)

Unnamed: 0,A,B,C
0,8.0,5.0,5.0
1,14.0,9.0,8.0
2,3.0,8.0,4.0


In [None]:
A.add(B, fill_value=np.mean(A.values))

Unnamed: 0,A,B,C
0,8.0,5.0,9.5
1,14.0,9.0,12.5
2,7.5,12.5,8.5


# Missing Data

Let’s show a few convenient methods to deal with missing data in pandas.

The first sentinel value used by pandas is None, a Python singleton object that is often used for
missing data in Python code. Because it is a Python object, None cannot be used in any arbitrary
NumPy/pandas array, but only in arrays with data type 'object' (i.e., arrays of Python objects).
The second sentinel is NaN (np.nan), a special floating-point value, which the examples below use.
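
A short sketch of how the two sentinels behave in practice (the names `obj_arr` and `ser_demo` are illustrative):

```python
import numpy as np
import pandas as pd

obj_arr = np.array([1, None, 3])     # None forces the array to dtype object
ser_demo = pd.Series([1, None, 3])   # in numeric data pandas converts None to NaN
missing = ser_demo.isna()            # True where a value is missing
print(obj_arr.dtype, ser_demo.dtype, missing.tolist())
```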

__Dropping__

In [None]:
df = pd.DataFrame({'A':[1,2,np.nan],
                   'B':[5,np.nan,np.nan],
                   'C':[1,2,3]})
df

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3


In [None]:
df.dropna()

Unnamed: 0,A,B,C
0,1.0,5.0,1


In [None]:
df.dropna(axis=1)

Unnamed: 0,C
0,1
1,2
2,3


In [None]:
df.dropna(thresh=1)

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3


In [None]:
df.dropna(thresh=2)

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2


In [None]:
df.dropna(thresh=3)

Unnamed: 0,A,B,C
0,1.0,5.0,1


__Filling__

In [None]:
df

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3


In [None]:
df.fillna(value=0)

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,0.0,2
2,0.0,0.0,3
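
A constant is not the only option. One common choice, sketched here as an example rather than a recommendation, is to fill each column with its own mean. The frame is rebuilt under the made-up name `df_demo`.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, np.nan],
                        'B': [5, np.nan, np.nan],
                        'C': [1, 2, 3]})

# df_demo.mean() is a Series of per-column means; fillna matches it by column name
filled = df_demo.fillna(df_demo.mean())
print(filled)
```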


# Merging, Joining, and Concatenating

There are 3 main ways of combining DataFrames together: Merging, Joining and Concatenating.
In this lecture we will discuss these 3 methods with examples.

In [None]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])

In [None]:
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [None]:
df2

Unnamed: 0,A,B,C,D
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [None]:
df3

Unnamed: 0,A,B,C,D
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


__1.1 Concatenation__

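Concatenation basically glues DataFrames together. Along the axis you concatenate on, the pieces
are simply stacked; along the other axis pandas aligns the labels and fills in NaN where they do not
match, so the frames should normally share the same columns (or, for axis=1, the same index).
You can use pd.concat and pass in a list of DataFrames to concatenate together: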

In [None]:
pd.concat([df1,df2,df3])

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


In [None]:
pd.concat([df1,df2,df3],axis=1)

Unnamed: 0,A,B,C,D,A,B,C,D,A,B,C,D
0,A0,B0,C0,D0,,,,,,,,
1,A1,B1,C1,D1,,,,,,,,
2,A2,B2,C2,D2,,,,,,,,
3,A3,B3,C3,D3,,,,,,,,
4,,,,,A4,B4,C4,D4,,,,
5,,,,,A5,B5,C5,D5,,,,
6,,,,,A6,B6,C6,D6,,,,
7,,,,,A7,B7,C7,D7,,,,
8,,,,,,,,,A8,B8,C8,D8
9,,,,,,,,,A9,B9,C9,D9
10,,,,,,,,,A10,B10,C10,D10
11,,,,,,,,,A11,B11,C11,D11
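
When the pieces carry overlapping index labels, two pd.concat parameters are often useful: ignore_index renumbers the result, and keys records which piece each row came from. A minimal sketch with two made-up frames, `top` and `bottom`:

```python
import pandas as pd

top = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
bottom = pd.DataFrame({'A': ['A2', 'A3']}, index=[0, 1])

stacked = pd.concat([top, bottom])                          # labels repeat: 0, 1, 0, 1
renumbered = pd.concat([top, bottom], ignore_index=True)    # fresh labels 0..3
keyed = pd.concat([top, bottom], keys=['top', 'bottom'])    # outer level names the source
from_bottom = keyed.loc['bottom']
print(stacked, renumbered, keyed, sep='\n\n')
```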


__1.2 Example DataFrames__

In [None]:
left = pd.DataFrame({'key': ['K00', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K01', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

In [None]:
left

Unnamed: 0,key,A,B
0,K00,A0,B0
1,K1,A1,B1
2,K2,A2,B2
3,K3,A3,B3


In [None]:
right

Unnamed: 0,key,C,D
0,K01,C0,D0
1,K1,C1,D1
2,K2,C2,D2
3,K3,C3,D3


__1.3 Merging__

The merge function allows you to combine DataFrames using logic similar to joining
SQL tables. For example:

In [None]:
pd.merge(left,right,how='inner',on='key')

Unnamed: 0,key,A,B,C,D
0,K1,A1,B1,C1,D1
1,K2,A2,B2,C2,D2
2,K3,A3,B3,C3,D3


In [None]:
pd.merge(left,right,how='outer',on='key')

Unnamed: 0,key,A,B,C,D
0,K00,A0,B0,,
1,K1,A1,B1,C1,D1
2,K2,A2,B2,C2,D2
3,K3,A3,B3,C3,D3
4,K01,,,C0,D0


Or to show a more complicated example:


In [None]:
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

In [None]:
left

Unnamed: 0,key1,key2,A,B
0,K0,K0,A0,B0
1,K0,K1,A1,B1
2,K1,K0,A2,B2
3,K2,K1,A3,B3


In [None]:
right

Unnamed: 0,key1,key2,C,D
0,K0,K0,C0,D0
1,K1,K0,C1,D1
2,K1,K0,C2,D2
3,K2,K0,C3,D3


In [None]:
pd.merge(left, right, on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K1,K0,A2,B2,C1,D1
2,K1,K0,A2,B2,C2,D2


In [None]:
pd.merge(left, right, how='outer', on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K0,K1,A1,B1,,
2,K1,K0,A2,B2,C1,D1
3,K1,K0,A2,B2,C2,D2
4,K2,K1,A3,B3,,
5,K2,K0,,,C3,D3


In [None]:
left

Unnamed: 0,key1,key2,A,B
0,K0,K0,A0,B0
1,K0,K1,A1,B1
2,K1,K0,A2,B2
3,K2,K1,A3,B3


In [None]:
right

Unnamed: 0,key1,key2,C,D
0,K0,K0,C0,D0
1,K1,K0,C1,D1
2,K1,K0,C2,D2
3,K2,K0,C3,D3


In [None]:
pd.merge(left, right, how='right', on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K1,K0,A2,B2,C1,D1
2,K1,K0,A2,B2,C2,D2
3,K2,K0,,,C3,D3


In [None]:
pd.merge(left, right, how='left', on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K0,K1,A1,B1,,
2,K1,K0,A2,B2,C1,D1
3,K1,K0,A2,B2,C2,D2
4,K2,K1,A3,B3,,
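
To see which side each row of a merge came from, pd.merge accepts indicator=True, which adds a `_merge` column. The sketch below uses two small made-up frames (`left_demo`, `right_demo`) rather than the ones above.

```python
import pandas as pd

left_demo = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right_demo = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'C': ['C1', 'C2', 'C3']})

merged = pd.merge(left_demo, right_demo, on='key', how='outer', indicator=True)

# Map each key to 'left_only', 'right_only', or 'both'
origin = merged.set_index('key')['_merge'].astype(str).to_dict()
print(origin)
```

The dictionary is keyed by `key` so that the check does not depend on the row order of the outer merge.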


__1.4 Joining__

Joining is a convenient method for combining the columns of two potentially differently-indexed
DataFrames into a single result DataFrame.

In [None]:
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

In [None]:
left

Unnamed: 0,A,B
K0,A0,B0
K1,A1,B1
K2,A2,B2


In [None]:
right

Unnamed: 0,C,D
K0,C0,D0
K2,C2,D2
K3,C3,D3


In [None]:
left.join(right)

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2


In [None]:
left.join(right, how='outer')

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2
K3,,,C3,D3
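
A join on the index can also be written as a merge with left_index=True and right_index=True. The sketch below compares the two forms for an inner join, on made-up frames `left_demo` and `right_demo`.

```python
import pandas as pd

left_demo = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=['K0', 'K1', 'K2'])
right_demo = pd.DataFrame({'C': ['C0', 'C2', 'C3']}, index=['K0', 'K2', 'K3'])

joined = left_demo.join(right_demo, how='inner')
merged = pd.merge(left_demo, right_demo,
                  left_index=True, right_index=True, how='inner')
print(joined, merged, sep='\n\n')
```

Both keep only the labels present in both frames (K0 and K2 here).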


__2 Appending__

In [None]:
df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df

Unnamed: 0,A,B
0,1,2
1,3,4


In [None]:
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('BA'))
df2

Unnamed: 0,B,A
0,5,6
1,7,8


In [None]:
df

Unnamed: 0,A,B
0,1,2
1,3,4


In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df, df2])

Unnamed: 0,A,B
0,1,2
1,3,4
0,6,5
1,8,7


In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df, df2], ignore_index=True)

Unnamed: 0,A,B
0,1,2
1,3,4
2,6,5
3,8,7


In [None]:
df

Unnamed: 0,A,B
0,1,2
1,3,4


In [None]:
# to_csv writes the row index as an unnamed first column
df.to_csv('coimbatore.csv')

In [None]:
# read_csv reads that unnamed column back as an ordinary column called 'Unnamed: 0'
df = pd.read_csv('coimbatore.csv')

In [None]:
df.drop('A',axis=1)

Unnamed: 0.1,Unnamed: 0,B
0,0,2
1,1,4