**Topic 2: Python Pandas DataFrame**

A Pandas DataFrame is a widely used two-dimensional data structure with labeled axes (rows and columns). It is the standard way to store tabular data that has two indexes: a row index and a column index. It has the following properties:

The columns can be of different types, such as int, bool, and so on.

It can be seen as a dictionary of Series in which both the rows and the columns are labeled. The column labels are exposed as `columns` and the row labels as `index`.

**Parameter & Description:**

**data:** The values to store. It accepts several forms, such as an ndarray, Series, dict, constant, or list.

**index:** The row labels. If no index is passed, the default is a range index equivalent to np.arange(n).

**columns:** The column labels. If no columns are passed, the default is likewise equivalent to np.arange(n).

**dtype:** The data type to force on the data. If it is omitted, pandas infers a type for each column.

**copy:** Whether to copy the input data rather than reuse it.
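
As a rough sketch of how these constructor parameters fit together, the example below passes all five at once. The array values and the labels `row1`, `row2`, `a`, `b`, `c` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 2x3 array of integers
values = np.array([[1, 2, 3], [4, 5, 6]])

df = pd.DataFrame(
    data=values,               # the values to store
    index=['row1', 'row2'],    # row labels (the default would be 0, 1)
    columns=['a', 'b', 'c'],   # column labels (the default would be 0, 1, 2)
    dtype='float64',           # force every column to float64
    copy=True,                 # copy the input instead of reusing it
)

print(df)
print(df.dtypes)
```

Because `copy=True` is passed, later changes to `values` should not show up in `df`.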

**Create a DataFrame**

We can create a DataFrame in the following ways:

1. empty
2. list
3. dict
4. NumPy ndarrays
5. Series


**1: Create an empty DataFrame**

In [1]:
import pandas as pd
df = pd.DataFrame()
print (df)

Empty DataFrame
Columns: []
Index: []


This code creates an empty DataFrame with no columns and no rows. The printed output confirms this: both `Columns` and `Index` are empty lists.
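
Besides reading the printed output, emptiness can also be checked programmatically. A minimal sketch using the `empty` and `shape` attributes:

```python
import pandas as pd

df = pd.DataFrame()

print(df.empty)   # True when the DataFrame holds no elements
print(df.shape)   # (number of rows, number of columns)
```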

**2: Create a DataFrame using List:**

**Method 01: Creating the DataFrame**

In [2]:
import pandas as pd

names=['fahim' , 'Ali' , 'Muzamil']
df=pd.DataFrame(names)

print("DataFrame: \n" , df)

DataFrame: 
          0
0    fahim
1      Ali
2  Muzamil


In [3]:
import pandas as pd

a = ['zeeshan','rooshiii','khan','imran']
df = pd.DataFrame(a)

print("DataFrame: \n" , df)

DataFrame: 
           0
0   zeeshan
1  rooshiii
2      khan
3     imran


**Method 02: Creating the DataFrame**

In [4]:
names = ["John", "Jane", "Mary"]
ages = [25, 23, 20]
address = ['Lahore','Islamabad','bahawalpur']

df = pd.DataFrame(zip(names, ages,address))

print(df)

      0   1           2
0  John  25      Lahore
1  Jane  23   Islamabad
2  Mary  20  bahawalpur
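
In the output above the columns are labeled 0, 1, 2 because no names were given. A small variation, sketched below, passes `columns` so that each zipped list gets a readable label. The names `Name`, `Age`, and `City` are chosen here for illustration.

```python
import pandas as pd

names = ["John", "Jane", "Mary"]
ages = [25, 23, 20]
address = ['Lahore', 'Islamabad', 'bahawalpur']

# list(...) turns the zip object into a list of row tuples
rows = list(zip(names, ages, address))

# One label per position in each tuple
df = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

print(df)
```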


In [5]:
import pandas as pd

x = ['Python', 'Pandas', 'Numpy']
frame = {'Libraries': x}
df = pd.DataFrame(frame)

print(df)

  Libraries
0    Python
1    Pandas
2     Numpy


### Another Method

In [6]:
import pandas as pd

a = [11 , 22 , 33 , 44 , 55]
frame = {'values' : a}
df = pd.DataFrame(frame)
print(df)

   values
0      11
1      22
2      33
3      44
4      55


In [7]:
import pandas as pd

x = ['Python', 'Pandas', 'Numpy']
df = pd.DataFrame(x , columns=['Values'])

print(df)

   Values
0  Python
1  Pandas
2   Numpy


### Another Method

In [8]:
import pandas as pd
a = ['java','C++','C sharp','python']
df = pd.DataFrame(a , columns=['Languages'])
print(df)


  Languages
0      java
1       C++
2   C sharp
3    python


**3. Create a DataFrame from Dict of ndarrays/ Lists:**

In [9]:
import pandas as pd
info = {'name': ['Fahim', 'Ali', 'Abdullah'],
        'Department': ['B.Sc', 'B.Tech', 'M.Tech'],
        'ID': [101, 102, 103]}
df = pd.DataFrame(info)
print (df)

       name Department   ID
0     Fahim       B.Sc  101
1       Ali     B.Tech  102
2  Abdullah     M.Tech  103


In [10]:
import pandas as pd
a = {'name' : ['zeeshan','ali','rooshan','khan'] , 'class' : [8,9,11,12] , 'RollNo' : [1,2,3,4]}
frame = pd.DataFrame(a)
print(frame)

      name  class  RollNo
0  zeeshan      8       1
1      ali      9       2
2  rooshan     11       3
3     khan     12       4
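
Both examples above use lists of equal length. As a hedged illustration of why that matters, the sketch below passes lists of different lengths, which pandas rejects with a `ValueError`. The dictionary `bad` is invented for this demonstration.

```python
import pandas as pd

# The lists have different lengths (3 and 2), so no rectangular table can be built
bad = {'name': ['zeeshan', 'ali', 'rooshan'], 'class': [8, 9]}

try:
    pd.DataFrame(bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```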


**4. Create a DataFrame using NumPy ndarrays**

**Example 1: Creating a DataFrame with a single-dimensional ndarray**

In [11]:
import numpy as np
import pandas as pd

# Create a 1D NumPy ndarray with values
data = np.array([10, 20, 30, 40, 50])

# Create a DataFrame from the ndarray
df = pd.DataFrame(data,columns=['Values'])

print(df)


   Values
0      10
1      20
2      30
3      40
4      50


In [12]:
import numpy as np
import pandas as pd

# Create a 1D NumPy ndarray with values
data = np.array([10, 20, 30, 40, 50])
frame ={'y':data}
# Create a DataFrame from the ndarray
df = pd.DataFrame(frame)

print(df)

    y
0  10
1  20
2  30
3  40
4  50


**Example 2: Creating a DataFrame with multiple one-dimensional ndarrays**

In [13]:
import numpy as np
import pandas as pd
a = np.array([6,7,8])
b = np.array([11,12,13])
df = pd.DataFrame({'Col 1' : a , 'Col 2' : b})
print(df)

   Col 1  Col 2
0      6     11
1      7     12
2      8     13


**Example 3: Creating a DataFrame with a two-dimensional ndarray**

In [15]:
import numpy as np
import pandas as pd

# Create a 2D NumPy ndarray with values
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Create a DataFrame from the ndarray
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

print(df)


   Column1  Column2  Column3
0        1        2        3
1        4        5        6
2        7        8        9
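
A two-dimensional array maps directly onto rows and columns, so row labels can be supplied in the same call. The sketch below adds the illustrative labels `r1` to `r3` and converts the result back to an array with `to_numpy()`.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Each inner list becomes one row; index and columns label the two axes
df = pd.DataFrame(
    data,
    index=['r1', 'r2', 'r3'],
    columns=['Column1', 'Column2', 'Column3'],
)

print(df.shape)                   # (rows, columns)
print(df.loc['r2', 'Column3'])    # value at row 'r2', column 'Column3'

# Convert the DataFrame back to a NumPy array
back = df.to_numpy()
print(back)
```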


In [16]:
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
frame ={'Values' : data.flatten()}
df = pd.DataFrame(frame)
print(df)

   Values
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9


**Example 4: Creating a DataFrame with multiple two-dimensional ndarrays**

In [17]:
import numpy as np
import pandas as pd
a = np.array([[2, 3, 4], [7, 8, 9]])
b = np.array([[4, 7, 9], [11, 12, 13]])
df = pd.DataFrame({'Col 1': a.flatten(), 'Col 2': b.flatten()})
print(df)


   Col 1  Col 2
0      2      4
1      3      7
2      4      9
3      7     11
4      8     12
5      9     13


**5. Create a DataFrame from Dict of Series:**

**Example 01:  Create one Series with data**

In [18]:
import pandas as pd
import numpy as np

# A 2D ndarray; flatten() turns it into a single run of 10 values
data = np.array([[10, 20, 30, 40, 50], [1, 2, 3, 4, 5]])
df = pd.DataFrame({'Columns': data.flatten()}, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(df)

   Columns
a       10
b       20
c       30
d       40
e       50
f        1
g        2
h        3
i        4
j        5


In [19]:
import pandas as pd
import numpy as np

data = np.array([[10, 20, 30, 40, 50], [1, 2, 3, 4, 5]])
#index = [1, 2, 3, 4, 5,6,7,8,9,0]
df = pd.DataFrame({'Col': data.flatten()}, index=[1, 2, 3, 4, 5,6,7,8,9,0])
print(df)

   Col
1   10
2   20
3   30
4   40
5   50
6    1
7    2
8    3
9    4
0    5


In [20]:
import pandas as pd
series1 = pd.Series([10, 20, 30, 40, 50],index=[1, 2, 3, 4, 5])
df = pd.DataFrame(series1,columns=['Values'])
print(df)

   Values
1      10
2      20
3      30
4      40
5      50


**Example 02:  Create three Series with data**

In [21]:
import pandas as pd

Series1 = pd.Series(['Waqas','Mudasir','Shahzad'], index =['a', 'b', 'c'])
Series2 = pd.Series([27, 30, 40], index =['a', 'b', 'c'])
Series3 = pd.Series([1997,1993,1983], index =['a', 'b', 'c'])

df = pd.DataFrame({'Names':Series1,'Ages':Series2,'Years':Series3})
df

     Names  Ages  Years
a    Waqas    27   1997
b  Mudasir    30   1993
c  Shahzad    40   1983


**Example 3: Create a DataFrame from a Dictionary of Series**

In [22]:
# importing the pandas library
import pandas as pd

info = {'one' :pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
        'two' :pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}
df = pd.DataFrame(info)
print (df)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  6.0    6
g  NaN    7
h  NaN    8


In [23]:
import pandas as pd

info =pd.DataFrame({'one' :pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
        'two' :pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])})
print (info)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  6.0    6
g  NaN    7
h  NaN    8
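
The `NaN` entries above appear because pandas aligns the Series on their index labels and fills any label missing from a Series. The sketch below, using two small made-up Series, shows the alignment and its effect on the column dtype.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
two = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# The row index is the union of both Series' labels
df = pd.DataFrame({'one': one, 'two': two})

print(df)
print(df.dtypes)          # 'one' becomes float64 because it now holds NaN
print(df['one'].isna())   # True only for label 'd'
```

This is why column `one` prints as `1.0`, `2.0`, and so on, while column `two` stays an integer column.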


**Column Selection:**

In [24]:
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}
df = pd.DataFrame(info)
print(df)
print()
print(df['two'])

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  6.0    6
g  NaN    7
h  NaN    8

a    1
b    2
c    3
d    4
e    5
f    6
g    7
h    8
Name: two, dtype: int64
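
Selecting with a single label returns a Series, as shown above. Passing a list of labels instead returns a DataFrame. A minimal sketch with an illustrative three-column frame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

single = df['two']              # one label -> Series
subset = df[['one', 'three']]   # list of labels -> DataFrame

print(type(single).__name__)
print(type(subset).__name__)
print(subset)
```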


**Column Addition:**

In [25]:

import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print(df)

print()

print ("Add new column by passing series")
df['three']=pd.Series([20,40,60] , index=['a','b','c'])
print (df)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  NaN    6

Add new column by passing series
   one  two  three
a  1.0    1   20.0
b  2.0    2   40.0
c  3.0    3   60.0
d  4.0    4    NaN
e  5.0    5    NaN
f  NaN    6    NaN


In [26]:
print ("Add another new column by passing a Series")
df['four']=pd.Series([5 ,6, 7,8],index = ['a','b','c','d'] )
print (df)

Add another new column by passing a Series
   one  two  three  four
a  1.0    1   20.0   5.0
b  2.0    2   40.0   6.0
c  3.0    3   60.0   7.0
d  4.0    4    NaN   8.0
e  5.0    5    NaN   NaN
f  NaN    6    NaN   NaN


In [28]:
print ("Add new column using existing DataFrame columns")
df['five'] = df['three'] + df['four']
print (df)

Add new column using existing DataFrame columns
   one  two  three  four  five
a  1.0    1   20.0   5.0  25.0
b  2.0    2   40.0   6.0  46.0
c  3.0    3   60.0   7.0  67.0
d  4.0    4    NaN   8.0   NaN
e  5.0    5    NaN   NaN   NaN
f  NaN    6    NaN   NaN   NaN
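
A few other ways to add a column may be useful alongside the approaches above. The sketch below uses a small made-up frame; the column names `flag`, `id`, and `total` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})

# A scalar is broadcast to every row
df['flag'] = 0

# insert() places the new column at a chosen position (here: first)
df.insert(0, 'id', [101, 102, 103])

# assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(total=df['one'] + df['two'])

print(df)
print(df2)
```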


**Column Deletion:**

In [29]:
import pandas as pd

info = {'one' : pd.Series([1, 2], index= ['a', 'b']),'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)

# using the del statement
print ("Delete the first column:")
del df['one']
print (df)
# using the pop() method
print ("Delete the other column:")
df.pop('two')
print (df)

The DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  NaN    3
Delete the first column:
   two
a    1
b    2
c    3
Delete the other column:
Empty DataFrame
Columns: []
Index: [a, b, c]


In [30]:
import pandas as pd
data = {'one': pd.Series([1, 9], index=['a', 'b']),
        'two': pd.Series([2, 8], index=['c', 'd']),
        'three': pd.Series([3, 7], index=['e', 'f'])}
dataframe = pd.DataFrame(data)
print("DataFrame of data : \n", dataframe)
# delete the first column using the del statement
print("Delete the first column : \n")
del dataframe['one']
print(dataframe)
# delete another column using the pop() method
print("Delete the second column of the DataFrame")
dataframe.pop('two')
print(dataframe)


DataFrame of data : 
    one  two  three
a  1.0  NaN    NaN
b  9.0  NaN    NaN
c  NaN  2.0    NaN
d  NaN  8.0    NaN
e  NaN  NaN    3.0
f  NaN  NaN    7.0
Delete the first column : 

   two  three
a  NaN    NaN
b  NaN    NaN
c  2.0    NaN
d  8.0    NaN
e  NaN    3.0
f  NaN    7.0
Delete the second column of the DataFrame
   three
a    NaN
b    NaN
c    NaN
d    NaN
e    3.0
f    7.0


**Remove a Column using the drop() function**

In [31]:
import pandas as pd

df = pd.DataFrame({'Values': [1, 2, 3, 4]})
print(df)

df = df.drop('Values',axis=1)

print(df)

   Values
0       1
1       2
2       3
3       4
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
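
Unlike `del` and `pop()`, `drop()` returns a new DataFrame, which is why the example above assigns the result back to `df`. The sketch below, with made-up columns `A`, `B`, and `C`, drops two columns at once through the `columns` keyword and shows that the original frame is left intact.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# columns=[...] is an alternative to passing labels with axis=1
trimmed = df.drop(columns=['A', 'C'])

print(list(df.columns))        # the original still has all three columns
print(list(trimmed.columns))   # only 'B' remains in the result
```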


In [32]:
import pandas as pd

info = {'one' : pd.Series([1, 2], index = ['a', 'b']), 'two' : pd.Series([1, 2, 3], index = ['a', 'b', 'c'])}

df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)

# using the del statement
print ("Delete the first column:")
del df['one']
print (df)
print()
del df['two']
print (df)

The DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  NaN    3
Delete the first column:
   two
a    1
b    2
c    3

Empty DataFrame
Columns: []
Index: [a, b, c]


**Row Selection, Addition, and Deletion:**

**Row Selection:**

In [33]:
# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)
print(df)
print()
print (df.loc['f'])

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  NaN    6

one    NaN
two    6.0
Name: f, dtype: float64


In [34]:
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print(df)
print()
print("Selection by integer location ")  # iloc counts positions from 0, so 4 is the fifth row
print (df.iloc[4])

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  NaN    6

Selection by integer location 
one    5.0
two    5.0
Name: e, dtype: float64
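
`loc` and `iloc` can also return several rows at once. A short sketch on a made-up four-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [1, 2, 3, 4], 'two': [10, 20, 30, 40]},
    index=['a', 'b', 'c', 'd'],
)

by_label = df.loc[['a', 'c']]     # list of labels -> those rows
by_position = df.iloc[1:3]        # positions 1 and 2; the end is excluded
label_slice = df.loc['b':'c']     # label slices include both endpoints

print(by_label)
print(by_position)
print(label_slice)
```

Note the difference between the two slices: the positional slice excludes its end, while the label slice includes it.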


In [35]:
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print(df)
print()
print("Slice Rows")
# [] with an integer slice selects rows by position; the end (6) is excluded
print (df[2:6])

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  NaN    6

Slice Rows
   one  two
c  3.0    3
d  4.0    4
e  5.0    5
f  NaN    6


In [36]:
import pandas as pd

# Creating a DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],'Age': [25, 30, 35],'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)
print(df)
print()
# Deleting the first row (drop returns a new DataFrame; df itself is unchanged)
print(df.drop(0))
print()
# Deleting the second row
print(df.drop(1))


    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris

    Name  Age    City
1  Alice   30  London
2    Bob   35   Paris

   Name  Age      City
0  John   25  New York
2   Bob   35     Paris


## Follow My Github Account : https://github.com/ZeshanFareed