### Groups of Functions in Pandas for Data Analysis

### A. Creating Series and DataFrames

* We met the list and dictionary data structures when learning Python. Now we will use both lists and dictionaries to create pandas Series and DataFrames.


**Creating a Pandas Series**

To do anything with pandas, first import the library. By convention it is imported under the alias `pd`.

* Importing the pandas package
```python
import pandas as pd
```

* Creating a pandas Series
```python
series = pd.Series(data)
```

* Creating a pandas DataFrame
```python
dataframe = pd.DataFrame(data)
```

In [2]:
# Let's create a pandas Series using a Python list

# Step 1: Import the pandas package

import pandas as pd

# Step 2: Define a list
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


# Step 3: Create the Series
series = pd.Series(data)

# Let's view the Series that we have created
series.head(10)

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

In [3]:
# Let's confirm that we created a pandas Series
type(series)

pandas.core.series.Series

In [4]:
# Let's create a Series from the same list, this time with our own labels.
# In pandas, these row labels are called the index.
series2 = pd.Series(data, index = ["a","b","c","d","e","f","g","h","i","j"])
series2.head(10)

a     1
b     2
c     3
d     4
e     5
f     6
g     7
h     8
i     9
j    10
dtype: int64

In [10]:
# Let's create a Series using a Python dictionary


# Let's create a Python dictionary
data2 = {'a': 10, 'b': 20, 'c': 30}

# Let's create the Series
series3 = pd.Series(data2)
series3.head()

a    10
b    20
c    30
dtype: int64
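
When a Series is built from a dictionary, the keys become the index and the values become the data. The short sketch below illustrates this and shows label-based versus position-based lookup. The variable names (`scores`, `labels`, `by_label`, and so on) are chosen only for this illustration.

```python
import pandas as pd

# Same dictionary as in the example above
scores = pd.Series({'a': 10, 'b': 20, 'c': 30})

labels = list(scores.index)    # index labels, taken from the dictionary keys
values = scores.tolist()       # data, taken from the dictionary values

by_label = scores['b']         # look up a value by its index label
by_position = scores.iloc[1]   # look up a value by its position (second element)

print(labels, values, by_label, by_position)
```

Both lookups return the same element here because the label `'b'` sits at position 1.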

**Hands on practice**:
1. Create a bucket list of 6 items. Convert the list to a pandas Series and define an index for it using letters of the alphabet.
2. Create a simple Python dictionary of your biodata with 5 keys and their corresponding values. Convert the dictionary into a pandas Series.

In [12]:
# Solution 1:
bucket_list = ['Laptop', 'Smartphone', 'Tablet', 'Smartwatch', 'WiFi Router', 'Charger']
series4 = pd.Series(bucket_list, index=['a', 'b', 'c', 'd', 'e', 'f'])
series4.head(6)

a         Laptop
b     Smartphone
c         Tablet
d     Smartwatch
e    WiFi Router
f        Charger
dtype: object

In [14]:
# Solution 2:
bio_data = {
    'Name :': 'Olasunkanmi',
    'Age :': 25,
    'Occupation :': 'AI Engineer',
    'Country :': 'Nigeria',
    'Hobbies :': ['Reading', 'Traveling', 'Coding']
}

series5 = pd.Series(bio_data)
series5.head()

Name :                           Olasunkanmi
Age :                                     25
Occupation :                     AI Engineer
Country :                            Nigeria
Hobbies :       [Reading, Traveling, Coding]
dtype: object

**Creating a DataFrame**

* Import pandas
```python
import pandas as pd
```
* Create your data as a list of lists or a dictionary
```python
data = []
# or
data = {}
```
* Create the DataFrame using this syntax
```python
df = pd.DataFrame(data)
```

In [15]:
# Let's create a DataFrame
# Step 1: Import pandas

import pandas as pd

# Step 2: Define the data using a dictionary whose values are lists.

data = {
    'Name': ['Chris', 'Ayo', 'Chisom'],
    'Age': [26, 24, 22],
    'Home_Town': ['Benin', 'Ibadan', 'Enugu']
}

# Step 3: Create the DataFrame, using "df" as short for DataFrame
df = pd.DataFrame(data)
df.head()

     Name  Age Home_Town
0   Chris   26     Benin
1     Ayo   24    Ibadan
2  Chisom   22     Enugu


In [16]:
# Let's do the same thing using a list of dictionaries
data2 = [
    {'Name': 'Chris', 'Age': 26, 'Home_Town': 'Benin'},
    {'Name': 'Ayo', 'Age': 24, 'Home_Town': 'Ibadan'},
    {'Name': 'Chisom', 'Age': 22, 'Home_Town': 'Enugu'}
]
# Let's define the DataFrame
df2 = pd.DataFrame(data2)
df2.head()

     Name  Age Home_Town
0   Chris   26     Benin
1     Ayo   24    Ibadan
2  Chisom   22     Enugu


In [17]:
# Let's do the same thing again using a list of lists

data3 = [
    ['Chris', 26, 'Benin'],
    ['Ayo', 24, 'Ibadan'],
    ['Chisom', 22, 'Enugu']
]
df3 = pd.DataFrame(data3, columns=['Name', 'Age', 'Home_Town'])
df3.head()

     Name  Age Home_Town
0   Chris   26     Benin
1     Ayo   24    Ibadan
2  Chisom   22     Enugu
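
The three constructions above (dictionary of lists, list of dictionaries, list of lists) should all describe the same table. As a quick check, the sketch below builds each one from a single hypothetical `rows` list and compares them with `DataFrame.equals`. The names `from_dict`, `from_records`, and `from_lists` are chosen only for this illustration.

```python
import pandas as pd

rows = [('Chris', 26, 'Benin'), ('Ayo', 24, 'Ibadan'), ('Chisom', 22, 'Enugu')]
columns = ['Name', 'Age', 'Home_Town']

# 1. Dictionary of lists: one key per column
from_dict = pd.DataFrame({col: [row[i] for row in rows] for i, col in enumerate(columns)})

# 2. List of dictionaries: one dictionary per row
from_records = pd.DataFrame([dict(zip(columns, row)) for row in rows])

# 3. List of lists: the column names are supplied separately
from_lists = pd.DataFrame([list(row) for row in rows], columns=columns)

# equals() compares values, labels, and dtypes
same = from_dict.equals(from_records) and from_dict.equals(from_lists)
print(same)
```

Which form to use mostly depends on how the data arrives: column by column, or row by row.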


In [18]:
# Let's print the types to confirm that we have created DataFrames
print(type(df))
print(type(df2))
print(type(df3))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


**Hands on practice**

* **Creating a dataset:**

Let's create a Google Sheet and make the link accessible so that everyone can enter the following information: First_Name, Last_Name, Gender, Seat_No, City, Course_Track, PC_make, PC_Os, and Feedback.


[Click here to respond](https://forms.gle/8VQgWmvqQyiPEifY8)

At the end of the collection, we will use the data to practice data manipulation.

---

**B. Reading and Saving Data**

**To read in datasets we use**
```python
pd.read_csv() # for csv files
```

```python
pd.read_excel() # for excel files
```
**Note**: pandas has many other readers for different file types. Other common formats include .json, .txt, .sql, and .html (for example, `pd.read_json()`, `pd.read_sql()`, and `pd.read_html()`). If you are curious, check them out.
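
To try the readers without a file on disk, the sketch below wraps small pieces of text in `io.StringIO`, which pandas readers accept in place of a path. The CSV and JSON contents are made up for this illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a .csv file
csv_text = "Name,Age\nChris,26\nAyo,24\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON text holding the same records
json_text = '[{"Name": "Chris", "Age": 26}, {"Name": "Ayo", "Age": 24}]'
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.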


**To save into a CSV file or an Excel file**

To save to CSV
```python
df.to_csv()

```
To save to Excel
```python
df.to_excel()
```
Use case example
```python
# df is a DataFrame; index=False leaves the row index out of the file
df.to_csv("bio_data.csv", index=False)
```
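
The effect of `index=False` is easiest to see by saving the same DataFrame both ways and reading the files back. The sketch below does this in a temporary folder; the file names and the small DataFrame are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'Name': ['Chris', 'Ayo'], 'Age': [26, 24]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / "with_index.csv"
    without_index = Path(tmp) / "without_index.csv"

    df.to_csv(with_index)                   # default: the index is written as the first column
    df.to_csv(without_index, index=False)   # the index is left out

    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)      # ['Unnamed: 0', 'Name', 'Age']
print(cols_without_index)   # ['Name', 'Age']
```

The extra `Unnamed: 0` column is the saved index coming back as data, which is why `index=False` is usually the safer choice when the index carries no information.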

**C. Data Inspection and Exploration**

To inspect our dataset, we will be using the following pandas methods and attributes
```python
df.head() # To view the first 5 rows
```

```python
df.tail() # To view the last 5 rows
```

```python
df.info() # To check the information about the data
```

```python
df.describe() # Statistical summary
```

```python
df.shape # Check the dimensions of the dataset (an attribute, so no parentheses)
```

```python
df.columns # Check the column names (an attribute, so no parentheses)
```
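
As a minimal sketch, the inspection tools listed above can be tried on a small made-up DataFrame. The names and ages below are hypothetical.

```python
import pandas as pd

# Hypothetical data, just large enough to show head() and tail()
df = pd.DataFrame({
    'Name': ['Chris', 'Ayo', 'Chisom', 'Bola', 'Emeka', 'Zainab'],
    'Age': [26, 24, 22, 30, 28, 25],
})

first_rows = df.head()          # first 5 rows by default
last_two = df.tail(2)           # pass a number to change how many rows are shown
n_rows, n_cols = df.shape       # (rows, columns)
column_names = list(df.columns)
summary = df.describe()         # for a mixed DataFrame, numeric columns only by default

df.info()                       # prints column names, non-null counts, and dtypes
print(summary)
```

`head()` and `tail()` accept the number of rows to return, while `shape` and `columns` are read without calling them.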

**D. Data Cleaning**

Data cleaning involves identifying and handling errors or inconsistencies in your dataset. Data cleaning is covered in detail later in this course.

Handling Missing Values

```python
df.isna() # Check for missing values (df.isnull() is an alias)
```

```python
df.isna().sum()  # Count the missing values in each column
```

```python
df.fillna() # Fill in missing values
```

```python
df.dropna() # Drop rows with missing values
```
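
The sketch below applies the missing-value methods above to a small made-up DataFrame. Filling the missing age with the column mean and the missing name with `'Unknown'` is only one possible choice, made here for illustration.

```python
import pandas as pd

# Hypothetical data with one missing name and one missing age
df = pd.DataFrame({
    'Name': ['Chris', 'Ayo', None],
    'Age': [26, float('nan'), 22],
})

missing_per_column = df.isna().sum()   # number of missing values in each column

# Fill each column with its own replacement value
filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})

# Keep only the rows that have no missing values
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Neither `fillna()` nor `dropna()` changes `df` itself; each returns a new DataFrame.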

Finding and Handling Duplicates

Duplicates are rows that repeat an earlier row.

```python
df.duplicated() # Marks each row that repeats an earlier row
```

```python
df.drop_duplicates() # Drops the duplicate rows
```

Correcting Data Types

Every column in a pandas DataFrame has a data type (dtype). Common ones include `int64`, `float64`, `bool`, `datetime64[ns]`, and `object` (typically text or mixed values).

You can check the data type of each column using
```python

df.dtypes # an attribute, so no parentheses
```

To convert a type (type casting), use

```python
df.astype(dtype) # dtype is the data type you want to convert to, e.g. int or str
```

When working with time or time series data, it is important to convert the time to a type pandas recognizes, using

```python
pd.to_datetime(column) # column is the data column to convert, e.g. df['Timestamp']
```
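
As a rough sketch of these cleaning steps working together, the example below removes a duplicate row, casts a text column to integers, and parses a date column. The DataFrame and its `Joined` column are hypothetical.

```python
import pandas as pd

# Hypothetical data: the last row repeats the second, and all columns are text
df = pd.DataFrame({
    'Name': ['Chris', 'Ayo', 'Ayo'],
    'Age': ['26', '24', '24'],
    'Joined': ['2025-09-11', '2025-09-12', '2025-09-12'],
})

duplicate_count = df.duplicated().sum()   # rows identical to an earlier row
deduped = df.drop_duplicates().copy()     # copy() so the new columns can be assigned safely

deduped['Age'] = deduped['Age'].astype(int)             # text to integer
deduped['Joined'] = pd.to_datetime(deduped['Joined'])   # text to datetime

print(duplicate_count)
print(deduped.dtypes)
```

Once `Joined` is a datetime column, parts of the date are available through the `.dt` accessor, for example `deduped['Joined'].dt.day`.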

In [10]:
import pandas as pd
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')
# set_option is a function: call it with the option name and the value
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 100)
pd.set_option("display.width", 10)


workspace = Path("workspace")
workspace.mkdir(exist_ok=True)
bio_data_df = workspace / "bio_data.csv"
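
Display options in pandas are changed by calling `pd.set_option` with an option name and a value. The sketch below shows this together with `pd.get_option`, `pd.option_context` (a temporary change), and `pd.reset_option`. The values used are arbitrary.

```python
import pandas as pd

pd.set_option("display.max_columns", 100)
current = pd.get_option("display.max_columns")   # read the value back

# option_context applies a setting only inside the "with" block
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

pd.reset_option("display.max_columns")           # restore the default

print(current, inside)
```

Using `option_context` keeps a one-off display tweak from affecting the rest of the notebook.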


In [11]:
df = pd.read_csv(bio_data_df)
df

Unnamed: 0,Timestamp,First Name,Last Name,Course Track,City,Gender,Seat Number,PC-Make,PC - OS,Feedback
0,2025/09/11 12:55:34 PM GMT+1,Peter,Okonmah,AI,Ogun,Male,28,MACBOOK,Mac OS,non
1,2025/09/11 12:56:11 PM GMT+1,Toyeebat,Nababa,AI,Abeokuta,Female,24,HP,Windows,Excellent
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa,AI,Lagos,Female,22,HP,Windows,Thank you so much for the opportunity.
3,2025/09/11 12:57:56 PM GMT+1,Mahfuz,Abdulhameed,AI,Abeokuta,Male,44,HP,Windows,Amazing Shit
4,2025/09/11 12:58:41 PM GMT+1,Divine,Gbadamosi,AI,Abeokuta,Male,35,DELL,Windows,Brain Racking
5,2025/09/11 12:58:55 PM GMT+1,Abdulmalik,Adedotun,AI,Abeokuta,Male,200,HP,Windows,Enjoying the course so far
6,2025/09/11 12:58:55 PM GMT+1,Naheemot,Adebiyi,AI,Abeokuta,Female,32,DELL,Windows,Grateful for the opportunity to be here.
7,2025/09/11 12:59:00 PM GMT+1,Kanyisola,Fagbayi,AI;Data Science,Lagos,Female,00082,HP,Windows,One chin chin for you for this form
8,2025/09/11 12:59:16 PM GMT+1,Blessing,James,Cyber Security,Nairobi,Female,45678,HP,Windows,Thanks for creating the form.
9,2025/09/11 12:59:28 PM GMT+1,Hannah,Tanimola,AI,Abeokuta,Male,30,HP,Windows,On God


In [12]:
# Renaming the columns
df.columns = ['Timestamp', 'First_Name', 'Last_Name', 'Course_Track', 'City', 'Gender', 'Seat_No', 'PC_Make', 'PC_OS', 'Feedback']
df

Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
0,2025/09/11 12:55:34 PM GMT+1,Peter,Okonmah,AI,Ogun,Male,28,MACBOOK,Mac OS,non
1,2025/09/11 12:56:11 PM GMT+1,Toyeebat,Nababa,AI,Abeokuta,Female,24,HP,Windows,Excellent
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa,AI,Lagos,Female,22,HP,Windows,Thank you so much for the opportunity.
3,2025/09/11 12:57:56 PM GMT+1,Mahfuz,Abdulhameed,AI,Abeokuta,Male,44,HP,Windows,Amazing Shit
4,2025/09/11 12:58:41 PM GMT+1,Divine,Gbadamosi,AI,Abeokuta,Male,35,DELL,Windows,Brain Racking
5,2025/09/11 12:58:55 PM GMT+1,Abdulmalik,Adedotun,AI,Abeokuta,Male,200,HP,Windows,Enjoying the course so far
6,2025/09/11 12:58:55 PM GMT+1,Naheemot,Adebiyi,AI,Abeokuta,Female,32,DELL,Windows,Grateful for the opportunity to be here.
7,2025/09/11 12:59:00 PM GMT+1,Kanyisola,Fagbayi,AI;Data Science,Lagos,Female,00082,HP,Windows,One chin chin for you for this form
8,2025/09/11 12:59:16 PM GMT+1,Blessing,James,Cyber Security,Nairobi,Female,45678,HP,Windows,Thanks for creating the form.
9,2025/09/11 12:59:28 PM GMT+1,Hannah,Tanimola,AI,Abeokuta,Male,30,HP,Windows,On God


In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34 entries, 0 to 33
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Timestamp     34 non-null     object
 1   First_Name    34 non-null     object
 2   Last_Name     34 non-null     object
 3   Course_Track  34 non-null     object
 4   City          34 non-null     object
 5   Gender        34 non-null     object
 6   Seat_No       34 non-null     object
 7   PC_Make       34 non-null     object
 8   PC_OS         34 non-null     object
 9   Feedback      34 non-null     object
dtypes: object(10)
memory usage: 2.8+ KB


In [42]:
df.describe()

Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
count,34,34,34,34,34,34,34,34,34,34
unique,33,33,34,7,11,2,31,10,3,32
top,2025/09/11 12:58:55 PM GMT+1,Samuel,Okonmah,AI,Abeokuta,Male,2,HP,Windows,None for now
freq,2,2,1,24,22,26,2,19,27,3


In [44]:
df.shape

(34, 10)

In [45]:
df.columns

Index(['Timestamp', 'First_Name', 'Last_Name', 'Course_Track', 'City',
       'Gender', 'Seat_No', 'PC_Make', 'PC_OS', 'Feedback'],
      dtype='object')

In [51]:
df.isnull().sum()

Timestamp       0
First_Name      0
Last_Name       0
Course_Track    0
City            0
Gender          0
Seat_No         0
PC_Make         0
PC_OS           0
Feedback        0
dtype: int64

In [53]:
df.duplicated().sum()

0

In [55]:
df.dtypes

Timestamp       object
First_Name      object
Last_Name       object
Course_Track    object
City            object
Gender          object
Seat_No         object
PC_Make         object
PC_OS           object
Feedback        object
dtype: object
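
Every column above is of type `object`, including `Timestamp` and `Seat_No`. The sketch below shows one possible way to convert values of this shape, using two made-up rows in the same format as the form responses. It assumes every timestamp ends with the same ` GMT+1` suffix, which is removed before parsing, so the result carries no timezone information.

```python
import pandas as pd

# Hypothetical rows in the same shape as the form responses
raw = pd.DataFrame({
    'Timestamp': ['2025/09/11 12:55:34 PM GMT+1', '2025/09/11 1:00:03 PM GMT+1'],
    'Seat_No': ['28', '00082'],
})

# Assumption: the suffix is always " GMT+1", so it can be dropped as plain text
stamp_text = raw['Timestamp'].str.replace(' GMT+1', '', regex=False)
raw['Timestamp'] = pd.to_datetime(stamp_text, format='%Y/%m/%d %I:%M:%S %p')

# errors='coerce' turns entries that are not numbers into NaN instead of raising an error
raw['Seat_No'] = pd.to_numeric(raw['Seat_No'], errors='coerce')

print(raw.dtypes)
print(raw)
```

`pd.to_numeric` is used here instead of `astype(int)` because free-text form fields may contain entries that are not numbers.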

In [62]:
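# Note: astype is a method and needs a target dtype, e.g. df.astype(str).
# Written without parentheses, as here, it only displays the bound method.
df.astype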

<bound method NDFrame.astype of                        Timestamp     First_Name    Last_Name  \
0   2025/09/11 12:55:34 PM GMT+1          Peter      Okonmah   
1   2025/09/11 12:56:11 PM GMT+1       Toyeebat       Nababa   
2   2025/09/11 12:57:08 PM GMT+1      Perpetual      Meninwa   
3   2025/09/11 12:57:56 PM GMT+1         Mahfuz  Abdulhameed   
4   2025/09/11 12:58:41 PM GMT+1         Divine    Gbadamosi   
5   2025/09/11 12:58:55 PM GMT+1     Abdulmalik     Adedotun   
6   2025/09/11 12:58:55 PM GMT+1       Naheemot      Adebiyi   
7   2025/09/11 12:59:00 PM GMT+1      Kanyisola      Fagbayi   
8   2025/09/11 12:59:16 PM GMT+1       Blessing        James   
9   2025/09/11 12:59:28 PM GMT+1         Hannah     Tanimola   
10  2025/09/11 12:59:41 PM GMT+1        Deborah     Adelegan   
11  2025/09/11 12:59:43 PM GMT+1         Esther       Kudoro   
12   2025/09/11 1:00:03 PM GMT+1        Opeyemi      Odejimi   
13   2025/09/11 1:00:13 PM GMT+1    Olasunkanmi        Rasak   
14   202

In [63]:
df['First_Name']

0             Peter
1          Toyeebat
2         Perpetual
3            Mahfuz
4            Divine
5        Abdulmalik
6          Naheemot
7         Kanyisola
8          Blessing
9            Hannah
10          Deborah
11           Esther
12          Opeyemi
13      Olasunkanmi
14           Saheed
15         Kehinde 
16          Oluwole
17           Samuel
18          Ademola
19           Victor
20           Sherif
21            Ayuba
22            Hamid
23          Olajide
24          Solomon
25    Oluwadamilare
26        Oluwaseyi
27           Adeoye
28        Babatunde
29           Samuel
30          Gabriel
31      Ridwanullah
32      Oluwapelumi
33          Michael
Name: First_Name, dtype: object

In [64]:
df.First_Name

0             Peter
1          Toyeebat
2         Perpetual
3            Mahfuz
4            Divine
5        Abdulmalik
6          Naheemot
7         Kanyisola
8          Blessing
9            Hannah
10          Deborah
11           Esther
12          Opeyemi
13      Olasunkanmi
14           Saheed
15         Kehinde 
16          Oluwole
17           Samuel
18          Ademola
19           Victor
20           Sherif
21            Ayuba
22            Hamid
23          Olajide
24          Solomon
25    Oluwadamilare
26        Oluwaseyi
27           Adeoye
28        Babatunde
29           Samuel
30          Gabriel
31      Ridwanullah
32      Oluwapelumi
33          Michael
Name: First_Name, dtype: object

In [65]:
df[['First_Name', 'Last_Name']]

Unnamed: 0,First_Name,Last_Name
0,Peter,Okonmah
1,Toyeebat,Nababa
2,Perpetual,Meninwa
3,Mahfuz,Abdulhameed
4,Divine,Gbadamosi
5,Abdulmalik,Adedotun
6,Naheemot,Adebiyi
7,Kanyisola,Fagbayi
8,Blessing,James
9,Hannah,Tanimola


In [66]:
df[['First_Name', 'Last_Name', 'City', 'Feedback']]

Unnamed: 0,First_Name,Last_Name,City,Feedback
0,Peter,Okonmah,Ogun,non
1,Toyeebat,Nababa,Abeokuta,Excellent
2,Perpetual,Meninwa,Lagos,Thank you so much for the opportunity.
3,Mahfuz,Abdulhameed,Abeokuta,Amazing Shit
4,Divine,Gbadamosi,Abeokuta,Brain Racking
5,Abdulmalik,Adedotun,Abeokuta,Enjoying the course so far
6,Naheemot,Adebiyi,Abeokuta,Grateful for the opportunity to be here.
7,Kanyisola,Fagbayi,Lagos,One chin chin for you for this form
8,Blessing,James,Nairobi,Thanks for creating the form.
9,Hannah,Tanimola,Abeokuta,On God


In [67]:
df['First_Name'][0]

'Peter'

In [68]:
df.at[0, 'First_Name']

'Peter'

In [73]:
df.iat[0, 0]

'2025/09/11 12:55:34 PM GMT+1'

In [74]:
df.iloc[0:5]

Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
0,2025/09/11 12:55:34 PM GMT+1,Peter,Okonmah,AI,Ogun,Male,28,MACBOOK,Mac OS,non
1,2025/09/11 12:56:11 PM GMT+1,Toyeebat,Nababa,AI,Abeokuta,Female,24,HP,Windows,Excellent
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa,AI,Lagos,Female,22,HP,Windows,Thank you so much for the opportunity.
3,2025/09/11 12:57:56 PM GMT+1,Mahfuz,Abdulhameed,AI,Abeokuta,Male,44,HP,Windows,Amazing Shit
4,2025/09/11 12:58:41 PM GMT+1,Divine,Gbadamosi,AI,Abeokuta,Male,35,DELL,Windows,Brain Racking


In [75]:
df.iloc[0:5, 0:3]

Unnamed: 0,Timestamp,First_Name,Last_Name
0,2025/09/11 12:55:34 PM GMT+1,Peter,Okonmah
1,2025/09/11 12:56:11 PM GMT+1,Toyeebat,Nababa
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa
3,2025/09/11 12:57:56 PM GMT+1,Mahfuz,Abdulhameed
4,2025/09/11 12:58:41 PM GMT+1,Divine,Gbadamosi


In [76]:
df.loc[0:5]

Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
0,2025/09/11 12:55:34 PM GMT+1,Peter,Okonmah,AI,Ogun,Male,28,MACBOOK,Mac OS,non
1,2025/09/11 12:56:11 PM GMT+1,Toyeebat,Nababa,AI,Abeokuta,Female,24,HP,Windows,Excellent
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa,AI,Lagos,Female,22,HP,Windows,Thank you so much for the opportunity.
3,2025/09/11 12:57:56 PM GMT+1,Mahfuz,Abdulhameed,AI,Abeokuta,Male,44,HP,Windows,Amazing Shit
4,2025/09/11 12:58:41 PM GMT+1,Divine,Gbadamosi,AI,Abeokuta,Male,35,DELL,Windows,Brain Racking
5,2025/09/11 12:58:55 PM GMT+1,Abdulmalik,Adedotun,AI,Abeokuta,Male,200,HP,Windows,Enjoying the course so far
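
Note that `df.iloc[0:5]` returned 5 rows while `df.loc[0:5]` returned 6. `iloc` slices by position and excludes the end, whereas `loc` slices by label and includes it. The sketch below makes the difference easier to see by using letter labels, which are an assumption made for this example, so that labels and positions cannot be confused.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'First_Name': ['Peter', 'Toyeebat', 'Perpetual', 'Mahfuz'],
        'City': ['Ogun', 'Abeokuta', 'Lagos', 'Abeokuta'],
    },
    index=['w', 'x', 'y', 'z'],   # hypothetical labels that differ from the positions
)

by_position = df.iloc[0:2]            # positions 0 and 1; the end is excluded
by_label = df.loc['w':'y']            # labels w, x, and y; the end is included

cell_by_label = df.at['x', 'City']    # single value by row label and column label
cell_by_position = df.iat[1, 1]       # single value by row position and column position

print(len(by_position), len(by_label), cell_by_label, cell_by_position)
```

With the default index of 0, 1, 2, ... the labels and positions happen to coincide, which is why the two accessors look so similar in the cells above.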


In [85]:
filtered_male = df[df['Gender'] == 'Male']
print("Rows where Gender is 'Male': ")
filtered_male

Rows where Gender is 'Male': 


Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
0,2025/09/11 12:55:34 PM GMT+1,Peter,Okonmah,AI,Ogun,Male,28,MACBOOK,Mac OS,non
3,2025/09/11 12:57:56 PM GMT+1,Mahfuz,Abdulhameed,AI,Abeokuta,Male,44,HP,Windows,Amazing Shit
4,2025/09/11 12:58:41 PM GMT+1,Divine,Gbadamosi,AI,Abeokuta,Male,35,DELL,Windows,Brain Racking
5,2025/09/11 12:58:55 PM GMT+1,Abdulmalik,Adedotun,AI,Abeokuta,Male,200,HP,Windows,Enjoying the course so far
9,2025/09/11 12:59:28 PM GMT+1,Hannah,Tanimola,AI,Abeokuta,Male,30,HP,Windows,On God
12,2025/09/11 1:00:03 PM GMT+1,Opeyemi,Odejimi,Cloud Computing,Abeokuta,Male,38,HP,Linux,Na wa
13,2025/09/11 1:00:13 PM GMT+1,Olasunkanmi,Rasak,AI,Kobape,Male,3,HP,Windows,My gratitude to the sponsor of this program an...
14,2025/09/11 1:00:27 PM GMT+1,Saheed,Olayinka,AI;Data Science;Web Dev,Abeokuta,Male,29,HP,Windows,None for now
15,2025/09/11 1:00:31 PM GMT+1,Kehinde,Akindele,Cloud Computing,Abeokuta,Male,54,Gateway,Windows,Great
16,2025/09/11 1:00:43 PM GMT+1,Oluwole,Oludayo,AI,Abeokuta,Male,09,HP,Windows,Good training to attend


In [86]:
filtered_female = df[df['Gender'] == 'Female']
print("Rows where Gender is 'Female': ")
filtered_female

Rows where Gender is 'Female': 


Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
1,2025/09/11 12:56:11 PM GMT+1,Toyeebat,Nababa,AI,Abeokuta,Female,24,HP,Windows,Excellent
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa,AI,Lagos,Female,22,HP,Windows,Thank you so much for the opportunity.
6,2025/09/11 12:58:55 PM GMT+1,Naheemot,Adebiyi,AI,Abeokuta,Female,32,DELL,Windows,Grateful for the opportunity to be here.
7,2025/09/11 12:59:00 PM GMT+1,Kanyisola,Fagbayi,AI;Data Science,Lagos,Female,82,HP,Windows,One chin chin for you for this form
8,2025/09/11 12:59:16 PM GMT+1,Blessing,James,Cyber Security,Nairobi,Female,45678,HP,Windows,Thanks for creating the form.
10,2025/09/11 12:59:41 PM GMT+1,Deborah,Adelegan,AI;Data Science,Abeokuta,Female,1,HP,Windows,None for now
11,2025/09/11 12:59:43 PM GMT+1,Esther,Kudoro,AI,Abeokuta,Female,1,HP,Windows,Chill
27,2025/09/11 1:03:12 PM GMT+1,Adeoye,Mary,AI,abeokuta,Female,15,LENOVO,Windows,Still processing


In [93]:
filtered_city = df[(df['City'] == 'Lagos') & (df['Course_Track'] == 'AI')]
print("Rows where City is 'Lagos' and Course_Track is 'AI': ")
filtered_city

Rows where City is 'Lagos' and Course_Track is 'AI': 


Unnamed: 0,Timestamp,First_Name,Last_Name,Course_Track,City,Gender,Seat_No,PC_Make,PC_OS,Feedback
2,2025/09/11 12:57:08 PM GMT+1,Perpetual,Meninwa,AI,Lagos,Female,22,HP,Windows,Thank you so much for the opportunity.
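
An exact comparison such as `df['Course_Track'] == 'AI'` skips respondents who chose several tracks (for example `AI;Data Science`), and `df['City'] == 'Abeokuta'` would skip an entry typed as `abeokuta`. The sketch below shows one possible way to handle both, using a few rows adapted from the table above. The helper names (`takes_ai`, `city_clean`) are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ['Perpetual', 'Kanyisola', 'Adeoye', 'Blessing'],
    'City': ['Lagos', 'Lagos', 'abeokuta', 'Nairobi'],
    'Course_Track': ['AI', 'AI;Data Science', 'AI', 'Cyber Security'],
})

# Exact match: only rows whose Course_Track is exactly 'AI'
exact = df[(df['City'] == 'Lagos') & (df['Course_Track'] == 'AI')]

# Split on ';' and check whether 'AI' is one of the chosen tracks
takes_ai = df['Course_Track'].str.split(';').apply(lambda tracks: 'AI' in tracks)
lagos_ai = df[(df['City'] == 'Lagos') & takes_ai]

# Normalize spacing and capitalization before comparing city names
city_clean = df['City'].str.strip().str.title()
abeokuta_or_lagos = df[city_clean.isin(['Abeokuta', 'Lagos'])]

# ~ negates a condition: everyone who did not choose AI
not_ai = df[~takes_ai]

print(exact['First_Name'].tolist())
print(lagos_ai['First_Name'].tolist())
print(abeokuta_or_lagos['First_Name'].tolist())
print(not_ai['First_Name'].tolist())
```

Each condition must be wrapped in parentheses when combined with `&` (and) or `|` (or), as in the filter cells above.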
