---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.6</h1>

## _Modifying Data in Dataframes.ipynb_

## Learning agenda of this notebook

1. How to add a row/column in a dataframe (unconditionally)
2. How to delete a row/column from a dataframe (unconditionally)
3. How to perform the above operations conditionally
4. How to iterate over a dataframe
5. How to write a dataframe to a file

## 1. How to add a row/column in a dataframe (unconditionally)

### a. Create a Simple Dataframe to perform the tasks

In [None]:
import pandas as pd
d1 = {
    'name': ['Arif', 'Hadeed', 'Mujahid'],
    'math': [80, 82, 90],
    'english': [70, 69, 74],
    'urdu': [87, 71, 66],
}
# Pass this dictionary of Python lists to pd.DataFrame()
df = pd.DataFrame(data=d1)
df

### b. Add a Column in the Dataframe

In [None]:
# Create a new column named 'Total', holding the sum of each student's marks in the three subjects
df['Total'] = df['math'] + df['english'] + df['urdu']
df
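
The same total can be computed without naming every subject in the sum, which helps when there are many columns. The following is a minimal sketch on a fresh copy of the marks data; the names `marks`, `subjects` and `with_avg` are chosen here for illustration only.

```python
import pandas as pd

marks = pd.DataFrame({
    'name': ['Arif', 'Hadeed', 'Mujahid'],
    'math': [80, 82, 90],
    'english': [70, 69, 74],
    'urdu': [87, 71, 66],
})

subjects = ['math', 'english', 'urdu']

# Row-wise sum over the selected subject columns (axis=1 means "across columns")
marks['Total'] = marks[subjects].sum(axis=1)

# assign() returns a new dataframe with the extra column and leaves `marks` unchanged
with_avg = marks.assign(Average=marks[subjects].mean(axis=1))
print(with_avg)
```

Assigning with `df['col'] = ...` modifies the dataframe in place, while `assign()` is handy when you want to keep the original untouched.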

### c. Add a Row in the Dataframe

In [None]:
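# To add a new row to the dataframe above, first build it as a one-row dataframe
my_dict = {
    "name" : 'Maaz', 
    "math" : 85, 
    "english" : 85,
    "urdu" : 85
}
new_row = pd.DataFrame([my_dict])
# DataFrame.append() was removed in pandas 2.0, so use pd.concat() instead
# 'Total' is not in my_dict, so it is NaN for the new row
df = pd.concat([df, new_row], ignore_index=True)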
df
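
For a single row, another option is to assign to a new label with `loc`. This is only a sketch and it assumes the dataframe still has its default integer index, so that `len(marks)` is the next unused label; `marks` is a small illustrative dataframe, not the one above.

```python
import pandas as pd

marks = pd.DataFrame({
    'name': ['Arif', 'Hadeed'],
    'math': [80, 82],
    'english': [70, 69],
    'urdu': [87, 71],
})

# Assigning to a label that does not exist yet enlarges the dataframe by one row.
# The list must have one value per column, in column order.
marks.loc[len(marks)] = ['Maaz', 85, 85, 85]
print(marks)
```

This is convenient for one or two rows. When adding many rows, it is usually better to collect them in a list and call `pd.concat()` once.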


## 2. How to delete a row/column from a dataframe (unconditionally)

### a. Delete a Column in the Dataframe

In [None]:
# Let us drop the Total Column from our dataframe
df.drop('Total', axis=1, inplace=True)
df

### b. Delete a Row in the Dataframe

In [None]:
# Drop the row whose index label is 3 (the 'Maaz' row added above)
df.drop(3, axis=0, inplace=True)
df
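
`drop()` can also be used without `inplace=True`, in which case it returns a new dataframe and leaves the original alone. The sketch below uses a small illustrative dataframe named `marks` (not the one above) and the `columns=` / `index=` keywords, which do the same job as the `axis` argument.

```python
import pandas as pd

marks = pd.DataFrame({
    'name': ['Arif', 'Hadeed', 'Mujahid'],
    'math': [80, 82, 90],
    'Total': [237, 222, 230],
})

# columns= is equivalent to passing the label with axis=1
no_total = marks.drop(columns='Total')

# index= is equivalent to passing the label with axis=0
no_last = marks.drop(index=2)

# errors='ignore' skips labels that do not exist instead of raising KeyError
same = marks.drop(columns='missing', errors='ignore')

print(no_total)
print(no_last)
```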

## 3. How to add/delete a row/column in a Dataframe (conditionally)

### a. Add a Column in the Dataframe Based on a Specific Condition

In [None]:
import pandas as pd
d1 = {
    'name': ['Arif', 'Hadeed', 'Mujahid'],
    'math': [80, 82, 90],
    'english': [70, 69, 74],
    'urdu': [87, 71, 66],
}
# Pass this dictionary of Python lists to pd.DataFrame()
df = pd.DataFrame(data=d1)
df['Total'] = df['math'] + df['english'] + df['urdu']
df

In [None]:
# Create a new column named 'Grade' that assigns each student an overall grade based on the total marks
df['Grade'] = ['A' if x >= 230 else 'B' if x >= 220 else 'F' for x in df['Total']]
df
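
The nested conditional expression above gets hard to read as the number of grades grows. One possible alternative is `numpy.select()`, which takes a list of conditions and a matching list of values. This is a sketch on illustrative data; the names `totals`, `conditions` and `choices` are not part of the notebook.

```python
import numpy as np
import pandas as pd

totals = pd.DataFrame({
    'name': ['Arif', 'Hadeed', 'Mujahid'],
    'Total': [237, 222, 230],
})

# Conditions are checked in order; the first one that is True decides the grade
conditions = [totals['Total'] >= 230, totals['Total'] >= 220]
choices = ['A', 'B']

# default is used for rows where no condition is True
totals['Grade'] = np.select(conditions, choices, default='F')
print(totals)
```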

### b. Add a Row in the Dataframe Based on a Specific Condition

In [None]:
# You can add a row at any position by slicing the dataframe and joining the pieces with pd.concat()

import pandas as pd

my_dict1 = {
    "name" : 'Mohid', 
    "math" : 80, 
    "english" : 80,
    "urdu" : 80
}

# Convert the dict into a one-row dataframe
df1 = pd.DataFrame([my_dict1])
# Place the new row at the desired location (here, position 1) using the slicing operator
# 'Total' and 'Grade' are missing from my_dict1, so they are NaN for this row
df = pd.concat([df[:1], df1, df[1:]], ignore_index = True)

df
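
The slice-and-concat idea can be wrapped in a small helper so the position becomes a parameter. `insert_row` is a hypothetical function written for this example, not a pandas method.

```python
import pandas as pd

def insert_row(frame, position, row):
    """Return a new dataframe with `row` (a dict) placed at integer `position`."""
    new = pd.DataFrame([row])
    # iloc slices by position, so this works whatever the index labels are
    return pd.concat(
        [frame.iloc[:position], new, frame.iloc[position:]],
        ignore_index=True,
    )

marks = pd.DataFrame({
    'name': ['Arif', 'Hadeed', 'Mujahid'],
    'math': [80, 82, 90],
})

result = insert_row(marks, 1, {'name': 'Mohid', 'math': 80})
print(result)
```

The helper returns a new dataframe, so `marks` itself still has three rows afterwards.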

### c. Delete a Column in the Dataframe Based on Specific Condition

In [None]:
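# Drop every column that contains more than 2 NaN values
# In the current dataframe, 'Total' and 'Grade' have only one NaN each (the 'Mohid' row),
# so no column is dropped; with the condition `> 0` both of them would be dropped
# drop() returns a new dataframe here; df itself is unchanged

df.drop(df.columns[df.apply(lambda col: col.isnull().sum() > 2)], axis=1)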
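
To see the condition actually remove a column, here is a sketch on a small illustrative dataframe (`data`, with made-up values) in which one column has three NaN values. It also shows `dropna()` with `thresh`, which expresses the same rule in terms of the minimum number of non-NaN values a column must have.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [np.nan, np.nan, np.nan, 4],   # 3 NaN values
    'c': [1, np.nan, 3, 4],             # 1 NaN value
})

max_nan = 2

# Boolean Series indexed by column name: True where the NaN count exceeds the limit
too_sparse = data.isna().sum() > max_nan
cleaned = data.loc[:, ~too_sparse]

# Same rule with dropna: keep columns having at least len(data) - max_nan non-NaN values
cleaned2 = data.dropna(axis=1, thresh=len(data) - max_nan)

print(cleaned)
```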

### d. Delete a Row in the Dataframe Based on Specific Condition

In [None]:
# Let us drop every row in which name is 'Mohid' (the row inserted above)
# Get the index labels of the rows where name == 'Mohid' using the .index attribute
idx = df[df['name'] == 'Mohid'].index

# Pass those labels to drop() to delete the rows
df.drop(idx, inplace = True)
df
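
Instead of collecting index labels and passing them to `drop()`, you can also keep only the rows that do not match the condition. A minimal sketch on illustrative data (`marks` here is a new dataframe, not the one above):

```python
import pandas as pd

marks = pd.DataFrame({
    'name': ['Arif', 'Maaz', 'Hadeed'],
    'math': [80, 85, 82],
})

# Keep the rows where the condition is True; reset_index renumbers the rows from 0
kept = marks[marks['name'] != 'Maaz'].reset_index(drop=True)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
high = marks[(marks['math'] > 80) & (marks['name'] != 'Maaz')]

print(kept)
print(high)
```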

## 4. How to Iterate a Dataframe

### a. Create a Dataframe

In [1]:
import pandas as pd
import numpy as np

# Let us create a dataframe of random integers with column labels of our choice
arr2 = np.random.randint(10,100, size= (6,5))
df = pd.DataFrame(data=arr2, columns=['Col1', 'Col2', 'Col3', 'Col4', 'Col5'])
df

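| | Col1 | Col2 | Col3 | Col4 | Col5 |
|---|---|---|---|---|---|
| 0 | 68 | 33 | 61 | 62 | 26 |
| 1 | 24 | 30 | 91 | 46 | 11 |
| 2 | 91 | 22 | 70 | 90 | 39 |
| 3 | 17 | 14 | 35 | 73 | 82 |
| 4 | 98 | 23 | 55 | 57 | 26 |
| 5 | 32 | 40 | 76 | 62 | 45 |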


### b. Use index and subscript operator

In [3]:
df.index

RangeIndex(start=0, stop=6, step=1)

In [5]:
# Iterate over all the rows of a dataframe using index and printing using [] operator
for index in df.index:
    print(df['Col1'][index])

68
24
91
17
98
32


In [6]:
for index in df.index:
    print(df['Col1'][index], df['Col4'][index])

68 62
24 46
91 90
17 73
98 57
32 62


### c. Use index and loc

In [8]:
# Iterate over all the rows of a dataframe using index labels and printing using the loc indexer
# loc is label-based, so loop over df.index rather than over positions
for index in df.index:
    print(df.loc[index, 'Col1'], df.loc[index, 'Col4'])

68 62
24 46
91 90
17 73
98 57
32 62


### d. Use index and iloc

In [10]:
# Iterate over all the rows of a dataframe using integer positions and printing using the iloc indexer
for index in range(len(df)):
    print(df.iloc[index, 0], df.iloc[index, 3])   # iloc takes positions: give the column position (starting from zero) instead of its name

68 62
24 46
91 90
17 73
98 57
32 62
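
With the default index (0, 1, 2, ...), labels and positions happen to be the same numbers, so `loc` and `iloc` look interchangeable. The sketch below uses an illustrative dataframe (`frame`) with a custom index to show the difference: `loc` looks up labels, `iloc` looks up positions.

```python
import pandas as pd

# Row labels are 10, 20, 30 while row positions are 0, 1, 2
frame = pd.DataFrame({'Col1': [68, 24, 91]}, index=[10, 20, 30])

# loc: iterate over the labels stored in frame.index
by_label = [frame.loc[label, 'Col1'] for label in frame.index]

# iloc: iterate over positions 0 .. len(frame) - 1
by_position = [frame.iloc[pos, 0] for pos in range(len(frame))]

print(by_label)
print(by_position)

# frame.loc[0, 'Col1'] would raise KeyError here, because no row has the label 0
```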


### e. Use iterrows() Function

In [12]:
# Iterate over all the rows of a dataframe using the iterrows() method
for index, row in df.iterrows():
    print(row['Col1'], row['Col2'], row['Col3'], row['Col4'], row['Col5'])

68 33 61 62 26
24 30 91 46 11
91 22 70 90 39
17 14 35 73 82
98 23 55 57 26
32 40 76 62 45


### f. Use itertuples() Function

In [13]:
# Iterate over all the rows of a dataframe using df.itertuples() and builtin getattr() function
for row in df.itertuples():
    print(getattr(row, 'Col1'), getattr(row,'Col4'))

68 62
24 46
91 90
17 73
98 57
32 62
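
The tuples returned by `itertuples()` are named tuples, so when the column names are valid Python identifiers the values can be read with dot notation instead of `getattr()`. A small sketch on illustrative data (`frame` and `pairs` are names used only here):

```python
import pandas as pd

frame = pd.DataFrame({'Col1': [68, 24], 'Col4': [62, 46]})

# By default the first field of each tuple is the row label, available as row.Index
first = next(frame.itertuples())
print(first.Index, first.Col1, first.Col4)

# index=False leaves the row label out of the tuple
pairs = []
for row in frame.itertuples(index=False):
    pairs.append((row.Col1, row.Col4))
print(pairs)
```

`getattr()` is still useful when the column name is held in a variable.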


### g. Use apply() Function

In [16]:
# apply() with axis=1 calls the function once per row and returns the results as a Series
print(df.apply(lambda row: row['Col1'], axis=1))

0    68
1    24
2    91
3    17
4    98
5    32
dtype: int64


In [55]:
#EX2: Iterating through the first two rows
# (df was re-created before this cell, so its random values differ from those shown earlier)
for index, row in df.iterrows():
    if index < 2:
        print("index = ",index)
        print(row, "\n")

index =  0
Col1    59
Col2    90
Col3    30
Col4    77
Col5    64
Name: 0, dtype: int64 

index =  1
Col1    52
Col2    80
Col3    57
Col4    70
Col5    81
Name: 1, dtype: int64 



In [56]:
#EX3: Print values of Col2 only
for index, row in df.iterrows():
    print(row['Col2'])

90
80
26
78
46
93


### h. Change the Values inside a Column Conditionally using iterrows() Function

In [57]:
#EX4: Change the values in Col2 conditionally
for index, row in df.iterrows():
    val = row['Col2']
    # Use df.at[] for the assignment: chained indexing such as df['Col2'].iloc[index] = 0
    # triggers SettingWithCopyWarning and may not modify df at all
    if val < 50:
        df.at[index, 'Col2'] = 0
    else:
        df.at[index, 'Col2'] = 1
df

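| | Col1 | Col2 | Col3 | Col4 | Col5 |
|---|---|---|---|---|---|
| 0 | 59 | 1 | 30 | 77 | 64 |
| 1 | 52 | 1 | 57 | 70 | 81 |
| 2 | 79 | 0 | 73 | 62 | 46 |
| 3 | 72 | 1 | 95 | 95 | 58 |
| 4 | 44 | 0 | 73 | 99 | 42 |
| 5 | 33 | 1 | 55 | 78 | 83 |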


### i. Change the Values inside a Column Conditionally using apply() Function

In [59]:
#EX5: Change the values in Col5 conditionally
# Another way to perform this kind of update is to pass a lambda function to the apply() method
# On a column (Series), apply() calls the function on every value and returns the results as a new Series
df['Col5'] = df['Col5'].apply(lambda x : 0 if x < 60 else 1)

df

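| | Col1 | Col2 | Col3 | Col4 | Col5 |
|---|---|---|---|---|---|
| 0 | 59 | 1 | 30 | 1 | 1 |
| 1 | 52 | 1 | 57 | 1 | 1 |
| 2 | 79 | 0 | 73 | 1 | 0 |
| 3 | 72 | 1 | 95 | 1 | 0 |
| 4 | 44 | 0 | 73 | 1 | 0 |
| 5 | 33 | 1 | 55 | 1 | 1 |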
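
Both `iterrows()` and `apply()` visit the values one at a time. For a simple threshold like this, a vectorized expression does the whole column in one step and is usually much faster on large dataframes. A minimal sketch, using the Col2 values from the example above in an illustrative dataframe `frame`; the column names `Flag` and `Flag2` are made up for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Col2': [90, 80, 26, 78, 46, 93]})

# np.where(condition, value_if_true, value_if_false) works on the whole column at once
frame['Flag'] = np.where(frame['Col2'] < 50, 0, 1)

# The same result from a boolean comparison converted to integers
frame['Flag2'] = (frame['Col2'] >= 50).astype(int)

print(frame)
```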


## 5. How to Write a Dataframe to a File

In [None]:
# Create a simple dataframe
import pandas as pd

d1 = {
    'name': ['Arif', 'Hadeed', 'Mujahid', 'Rauf', 'Maaz'],
    'math': [80, 82, 90, 81, 75],
    'english': [70, 69, 74, 80, 79],
    'urdu': [87, 71, 66, 85, 65],
}
# Pass this dictionary of Python lists to pd.DataFrame()
df = pd.DataFrame(data=d1)
df

### a. Writing Dataframe to a CSV file

In [None]:
# Write this data to a CSV file using the to_csv() method (the 'datasets' folder must already exist)
df.to_csv('datasets/mydata.csv', sep=',')

In [None]:
# Read the CSV file back; the row index was saved too, so it shows up as an 'Unnamed: 0' column
df1 = pd.read_csv('datasets/mydata.csv')
df1

In [None]:
# Pass index=False so the row index is not written, which prevents the 'Unnamed: 0' column
df.to_csv('datasets/mydata.csv', sep=',', index=False)

In [None]:
# reading the csv file again
df1 = pd.read_csv('datasets/mydata.csv')
df1
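
The effect of `index=False` can be checked without touching the disk: when `to_csv()` is called without a path it returns the CSV text as a string, and `io.StringIO` lets `read_csv()` read that string as if it were a file. This is only a sketch on illustrative data; `marks` and the other variable names are not part of the notebook.

```python
import io
import pandas as pd

marks = pd.DataFrame({'name': ['Arif', 'Hadeed'], 'math': [80, 82]})

# With no path argument, to_csv() returns the CSV content as a string
with_index = marks.to_csv()
without_index = marks.to_csv(index=False)

cols_a = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_b = pd.read_csv(io.StringIO(without_index)).columns.tolist()
print(cols_a)
print(cols_b)

# If the index was written, index_col=0 tells read_csv to use the first column as the index
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored)
```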

In [None]:
# You can also save the data tab-separated instead of comma-separated
df.to_csv('datasets/mydata.csv', sep='\t', index=False)

In [None]:
df1 = pd.read_csv('datasets/mydata.csv', sep='\t')
df1

### b. Writing Dataframe to Excel File

In [None]:
# Use the to_excel() method (writing .xlsx files needs an Excel engine such as openpyxl to be installed)
df.to_excel('datasets/mydata.xlsx', sheet_name = 'Students', index=False)

In [None]:
# Reading from the Excel file
df1 = pd.read_excel('datasets/mydata.xlsx', sheet_name = 'Students')
df1