### Groups of Functions in Pandas for Data Analysis

### A. Creating Series and DataFrames
- You have already met the list and dictionary data structures while learning Python. Here we use both lists and dictionaries to create pandas Series and DataFrames.

**Creating a Pandas Series**

To do anything with pandas, first import the library, conventionally under the alias `pd`.

* Importing the pandas package
```python
import pandas as pd
```
* Creating a pandas Series
```python
series = pd.Series(data)
```
* Creating a pandas DataFrame
```python
dataframe = pd.DataFrame(data)
```

In [3]:
# Let's create a pandas Series from a Python list

# Step 1: Import the pandas package
import pandas as pd

# Step 2: Define a list
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


# Step 3: Create the Series
series = pd.Series(data)

# Let's view the Series we have created
series.head(10)

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

In [4]:
# Let's confirm that we created a pandas Series
type(series)

pandas.core.series.Series

In [5]:
# Let's create a Series from the same list, this time with our own labels instead of the default 0, 1, 2, ...
# In pandas these labels are called the index
series2 = pd.Series(data, index = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"])
series2.head(10)

a     1
b     2
c     3
d     4
e     5
f     6
g     7
h     8
i     9
j    10
dtype: int64

In [6]:
# Let's create a Series from a Python dictionary


# First, define the dictionary
data2 = {'a': 10, 'b': 20, 'c': 30}

# Then create the Series
series3 = pd.Series(data2)
series3.head()

a    10
b    20
c    30
dtype: int64
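
Once a Series has an index, a value can be looked up either by its label or by its position. The snippet below is a minimal sketch that reuses the `data2` dictionary from the cell above; the names `labels`, `by_label`, and `by_position` are only for illustration.

```python
import pandas as pd

data2 = {'a': 10, 'b': 20, 'c': 30}
series3 = pd.Series(data2)

# The dictionary keys become the index labels
labels = series3.index.tolist()

# Look up a value by its label
by_label = series3.loc['b']

# Look up a value by its position (0 is the first item)
by_position = series3.iloc[0]

print(labels, by_label, by_position)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup clearly separate, which helps when the index itself is made of numbers.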

**Hands-on practice:**
1. Create a bucket list of 6 items. Convert the list to a pandas Series and give it an alphabetical index.
2. Create a Python dictionary of your biodata with 5 keys and their corresponding values, then convert the dictionary into a pandas Series.

Depending on where you are viewing this notebook, either download it or make a copy before you start.

In [None]:
# 1.
bucket_list = ["Dubai", "Laptop", "Books", "Game pad", "Manchester", "Jotter"]

bucket_list_series = pd.Series(bucket_list, index=['a', 'b', 'c', 'd', 'e', 'f'])
bucket_list_series

a         Dubai
b        Laptop
c         Books
d      Game pad
e    Manchester
f        Jotter
dtype: object

In [9]:
# 2.
bio_data = {"Name": "Ademuyiwa", "Degree": "Software developer", "Gender": "Male", "Age": 15, "Interest": "AI Engineer"}

bio_data_series = pd.Series(bio_data)
bio_data_series

Name                 Ademuyiwa
Degree      Software developer
Gender                    Male
Age                         15
Interest           AI Engineer
dtype: object

**Creating a DataFrame**

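* Import pandas, then create your list of lists or dictionary
```python
import pandas as pd

data = []  # a list of lists (or a list of dictionaries)
# or
data = {}  # a dictionary whose values are lists
```

* Create the DataFrame using this syntax
```python
df = pd.DataFrame(data)
```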

In [10]:
# Let's create a DataFrame
# Step 1: pandas is already imported as pd

# Step 2: Define the data as a dictionary whose values are lists

data = {
    'Name': ['Chris', 'Ayo', 'Chisom'],
    'Age': [26, 24, 22],
    'Home_Town': ['Benin', 'Ibadan', 'Enugu']
}

# Step 3: Create the DataFrame ("df" is the usual short name for a DataFrame)
df = pd.DataFrame(data)
df.head()

|   | Name | Age | Home_Town |
|---|------|-----|-----------|
| 0 | Chris | 26 | Benin |
| 1 | Ayo | 24 | Ibadan |
| 2 | Chisom | 22 | Enugu |


In [12]:
# Let's build the same DataFrame from a list of dictionaries
data2 = [
    {'Name': 'Chris', 'Age': 26, 'Home_Town': 'Benin'},
    {'Name': 'Ayo', 'Age': 24, 'Home_Town': 'Ibadan'},
    {'Name': 'Chisom', 'Age': 22, 'Home_Town': 'Enugu'}
]
# Let's define the DataFrame
df2 = pd.DataFrame(data2)
df2.head()

|   | Name | Age | Home_Town |
|---|------|-----|-----------|
| 0 | Chris | 26 | Benin |
| 1 | Ayo | 24 | Ibadan |
| 2 | Chisom | 22 | Enugu |


In [14]:
# Let's do the same thing again using a list of lists

data3 = [
    ['Chris', 26, 'Benin'],
    ['Ayo', 24, 'Ibadan'],
    ['Chisom', 22, 'Enugu']
]
df3 = pd.DataFrame(data3, columns=['Name', 'Age', 'Home_Town'])
df3.head()

|   | Name | Age | Home_Town |
|---|------|-----|-----------|
| 0 | Chris | 26 | Benin |
| 1 | Ayo | 24 | Ibadan |
| 2 | Chisom | 22 | Enugu |


In [15]:
# Let's print the types to confirm that all three are DataFrames
print(type(df))
print(type(df2))
print(type(df3))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
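
The three cells above build the same table from three different inputs. As a quick check, the sketch below rebuilds each one and compares their column names and rows. The variable names (`from_dict_of_lists` and so on) are made up for this illustration.

```python
import pandas as pd

columns = ['Name', 'Age', 'Home_Town']

# 1. Dictionary of lists: each key is a column
from_dict_of_lists = pd.DataFrame({
    'Name': ['Chris', 'Ayo', 'Chisom'],
    'Age': [26, 24, 22],
    'Home_Town': ['Benin', 'Ibadan', 'Enugu'],
})

# 2. List of dictionaries: each dictionary is a row
from_list_of_dicts = pd.DataFrame([
    {'Name': 'Chris', 'Age': 26, 'Home_Town': 'Benin'},
    {'Name': 'Ayo', 'Age': 24, 'Home_Town': 'Ibadan'},
    {'Name': 'Chisom', 'Age': 22, 'Home_Town': 'Enugu'},
])

# 3. List of lists: each inner list is a row, so column names are passed separately
from_list_of_lists = pd.DataFrame([
    ['Chris', 26, 'Benin'],
    ['Ayo', 24, 'Ibadan'],
    ['Chisom', 22, 'Enugu'],
], columns=columns)

# Compare the column names and the row contents of the three DataFrames
same_columns = (
    list(from_dict_of_lists.columns)
    == list(from_list_of_dicts.columns)
    == list(from_list_of_lists.columns)
)
records = from_dict_of_lists.to_dict('records')
same_rows = (
    records
    == from_list_of_dicts.to_dict('records')
    == from_list_of_lists.to_dict('records')
)

print(same_columns, same_rows)
```

Note that only the list-of-lists form needs the `columns` argument, because plain lists carry no column names of their own.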


**Hands-on practice**

* **Creating a dataset:**

Let's create a Google Sheet and make the link accessible so that everyone can enter the following information: First_Name, Last_Name, Gender, Seat_No, City, Course_Track, PC_make, PC_Os, and Feedback.


[Click here to respond](https://forms.gle/8VQgWmvqQyiPEifY8)

At the end of the collection, we will use the data to practice data manipulation.

---

### B. Data Input and Output

**To read in datasets, we use**

```python
pd.read_csv()    # for CSV files

pd.read_excel()  # for Excel files
```

**Note**: pandas has readers for many other formats, for example `pd.read_json()` for JSON, `pd.read_html()` for HTML tables, and `pd.read_sql()` for SQL queries. Plain-text files such as `.txt` can usually be read with `pd.read_csv()` by passing the right separator. If you are curious, check them out.

**To save to a CSV file**

```python
df.to_csv()
```

**To save to an Excel file**
```python
df.to_excel()
```

Use-case example
```python
df.to_csv("bio_data.csv", index=False)  # index=False leaves the row index out of the file
```
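
To see what `index=False` changes, here is a small round-trip sketch: it saves a toy DataFrame twice, with and without the index, and reads both files back. The toy data, the file names, and the temporary folder are assumptions made for this illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_small = pd.DataFrame({'Name': ['Chris', 'Ayo'], 'Age': [26, 24]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / "with_index.csv"
    without_index = Path(tmp) / "without_index.csv"

    # Default: the row index is written as an extra first column with no name
    df_small.to_csv(with_index)

    # index=False: only the real columns are written
    df_small.to_csv(without_index, index=False)

    cols_with_index = pd.read_csv(with_index).columns.tolist()
    cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'Name', 'Age']
print(cols_without_index)  # ['Name', 'Age']
```

If a column called `Unnamed: 0` ever shows up after loading a CSV, the file was most likely saved without `index=False`.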

Here, we download the data we collected in CSV format and in Excel format, then load it using `pd.read_csv()`.

Then we inspect and explore the data.

In [21]:
from pathlib import Path
workspace = Path("workspace")
workspace.mkdir(exist_ok=True)

bio_data_path = workspace / "bio_data.csv"

In [33]:
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

In [34]:
# Let's get to work...
df = pd.read_csv(bio_data_path)
df

|   | Timestamp | First Name | Last Name | Course Track | City | Gender | Seat Number | PC-Make | PC - OS | Feedback |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025/09/11 12:55:34 PM GMT+1 | Peter | Okonmah | AI | Ogun | Male | 28 | MACBOOK | Mac OS | non |
| 1 | 2025/09/11 12:56:11 PM GMT+1 | Toyeebat | Nababa | AI | Abeokuta | Female | 24 | HP | Windows | Excellent |
| 2 | 2025/09/11 12:57:08 PM GMT+1 | Perpetual | Meninwa | AI | Lagos | Female | 22 | HP | Windows | Thank you so much for the opportunity. |
| 3 | 2025/09/11 12:57:56 PM GMT+1 | Mahfuz | Abdulhameed | AI | Abeokuta | Male | 44 | HP | Windows | Amazing Shit |
| 4 | 2025/09/11 12:58:41 PM GMT+1 | Divine | Gbadamosi | AI | Abeokuta | Male | 35 | DELL | Windows | Brain Racking |
| 5 | 2025/09/11 12:58:55 PM GMT+1 | Abdulmalik | Adedotun | AI | Abeokuta | Male | 200 | HP | Windows | Enjoying the course so far |
| 6 | 2025/09/11 12:58:55 PM GMT+1 | Naheemot | Adebiyi | AI | Abeokuta | Female | 32 | DELL | Windows | Grateful for the opportunity to be here. |
| 7 | 2025/09/11 12:59:00 PM GMT+1 | Kanyisola | Fagbayi | AI;Data Science | Lagos | Female | 00082 | HP | Windows | One chin chin for you for this form |
| 8 | 2025/09/11 12:59:16 PM GMT+1 | Blessing | James | Cyber Security | Nairobi | Female | 45678 | HP | Windows | Thanks for creating the form. |
| 9 | 2025/09/11 12:59:28 PM GMT+1 | Hannah | Tanimola | AI | Abeokuta | Male | 30 | HP | Windows | On God |


### C. Data Inspection and Exploration

To inspect our dataset, we will use the following pandas methods and attributes
```python
.head() # To view the first 5 rows
```

```python
.tail() # To view the last 5 rows
```

```python
.info() # To check the column names, non-null counts and data types
```

```python
.describe() # Statistical summary
```

```python
.shape # Check the dimensions of the dataset (rows, columns)
```

```python
.columns # For checking the column names
```
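
Before running these on the real dataset, it can help to try them on a tiny table where the answers are easy to check by eye. The sketch below uses a made-up `toy` DataFrame; it is an illustration and not the class dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Chris', 'Ayo', 'Chisom', 'Ayo'],
    'Age': [26, 24, 22, 24],
})

# head() and tail() accept the number of rows to show (the default is 5)
first_two = toy.head(2)
last_one = toy.tail(1)

# shape and columns are attributes, so they are used without parentheses
n_rows, n_cols = toy.shape
column_names = toy.columns.tolist()

# On a mixed table, describe() summarises the numeric columns by default
numeric_summary = toy.describe()

# On a text column, describe() reports count, unique, top and freq instead
text_summary = toy['Name'].describe()

print(n_rows, n_cols, column_names)
print(numeric_summary.loc['mean', 'Age'])
print(text_summary['top'], text_summary['freq'])
```

This also explains why `describe()` on a table made up only of text columns shows `count`, `unique`, `top`, and `freq` rather than means and percentiles.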

In [35]:
df.head()

|   | Timestamp | First Name | Last Name | Course Track | City | Gender | Seat Number | PC-Make | PC - OS | Feedback |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025/09/11 12:55:34 PM GMT+1 | Peter | Okonmah | AI | Ogun | Male | 28 | MACBOOK | Mac OS | non |
| 1 | 2025/09/11 12:56:11 PM GMT+1 | Toyeebat | Nababa | AI | Abeokuta | Female | 24 | HP | Windows | Excellent |
| 2 | 2025/09/11 12:57:08 PM GMT+1 | Perpetual | Meninwa | AI | Lagos | Female | 22 | HP | Windows | Thank you so much for the opportunity. |
| 3 | 2025/09/11 12:57:56 PM GMT+1 | Mahfuz | Abdulhameed | AI | Abeokuta | Male | 44 | HP | Windows | Amazing Shit |
| 4 | 2025/09/11 12:58:41 PM GMT+1 | Divine | Gbadamosi | AI | Abeokuta | Male | 35 | DELL | Windows | Brain Racking |


In [36]:
df.tail()

|   | Timestamp | First Name | Last Name | Course Track | City | Gender | Seat Number | PC-Make | PC - OS | Feedback |
|---|---|---|---|---|---|---|---|---|---|---|
| 29 | 2025/09/11 1:06:03 PM GMT+1 | Samuel | Dasaolu | AI | Abeokuta | Male | 100 | HP;MACBOOK | Linux | Good so far, i guess |
| 30 | 2025/09/11 1:06:48 PM GMT+1 | Gabriel | Bamgbose | AI | Abeokuta | Male | 2 | HP | Windows | Good |
| 31 | 2025/09/11 1:10:16 PM GMT+1 | Ridwanullah | Osho | AI;Cyber Security;Data Science | Abeokuta | Male | 45 | DELL | Windows | IT IS WHAT IT IS !!! |
| 32 | 2025/09/11 1:11:39 PM GMT+1 | Oluwapelumi | Adenuga | Web Dev | Abeokuta | Male | 36 | HP | Windows | live yours |
| 33 | 2025/09/11 1:18:15 PM GMT+1 | Michael | Osisami | AI | Abeokuta | Male | 12 | DELL;HP | Windows | Nil |


In [37]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34 entries, 0 to 33
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Timestamp     34 non-null     object
 1   First Name    34 non-null     object
 2   Last Name     34 non-null     object
 3   Course Track  34 non-null     object
 4   City          34 non-null     object
 5   Gender        34 non-null     object
 6   Seat Number   34 non-null     object
 7   PC-Make       34 non-null     object
 8   PC - OS       34 non-null     object
 9   Feedback      34 non-null     object
dtypes: object(10)
memory usage: 2.8+ KB


In [39]:
df.describe()

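|   | Timestamp | First Name | Last Name | Course Track | City | Gender | Seat Number | PC-Make | PC - OS | Feedback |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 34 | 34 | 34 | 34 | 34 | 34 | 34 | 34 | 34 | 34 |
| unique | 33 | 33 | 34 | 7 | 11 | 2 | 31 | 10 | 3 | 32 |
| top | 2025/09/11 12:58:55 PM GMT+1 | Samuel | Okonmah | AI | Abeokuta | Male | 2 | HP | Windows | None for now |
| freq | 2 | 2 | 1 | 24 | 22 | 26 | 2 | 19 | 27 | 3 |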


In [40]:
df.shape

(34, 10)

In [41]:
df.columns

Index(['Timestamp', 'First Name', 'Last Name', 'Course Track', 'City', 'Gender', 'Seat Number', 'PC-Make', 'PC - OS', 'Feedback'], dtype='object')

### D. Data Cleaning

Data cleaning involves identifying and handling errors or inconsistencies in your dataset. Later in this course, data cleaning will be covered in detail.

Handling Missing Values

```python
.isna()  # Check for missing values (.isnull() does the same thing)
```

```python
.isna().sum()  # Count the missing values in each column
```

```python
.fillna() # Fill in missing values
```

```python
.dropna() # Drop rows that contain missing values
```
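
Our class dataset has no missing values, so here is a minimal sketch on a made-up `toy` DataFrame that does. Filling `Age` with the column mean and `Name` with `'Unknown'` is just one possible choice for illustration, not a rule.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Chris', 'Ayo', None],
    'Age': [26, None, 22],
})

# Count the missing values in each column
missing_per_column = toy.isna().sum()

# Fill each column with its own replacement value
filled = toy.fillna({'Name': 'Unknown', 'Age': toy['Age'].mean()})

# Or drop every row that has at least one missing value
dropped = toy.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Both `fillna()` and `dropna()` return a new DataFrame and leave `toy` unchanged, so assign the result to a variable if you want to keep it.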

Correcting Data Types

pandas columns can hold several data types (dtypes). The most common are `int64` (whole numbers), `float64` (decimal numbers), `bool`, `datetime64[ns]` (dates and times), and `object` (usually text or mixed values).

You can check the data type of each column using
```python
df.dtypes
```

To convert a data type (type casting), you use

```python
df.astype() # this takes the data type you want to convert to as an argument
```

When working with dates or time series data, it is important to convert the column to a pandas datetime type using

```python
pd.to_datetime() # takes the data column as an argument
```
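
As a small illustration, the sketch below converts a text column of seat numbers to integers and a text column of timestamps to datetimes. The `toy` data is made up, and the timestamps use a simple `YYYY-MM-DD HH:MM:SS` layout on purpose; the timestamps in our form data (for example with `GMT+1` at the end) may need extra cleaning or a `format` argument before they parse.

```python
import pandas as pd

toy = pd.DataFrame({
    'Seat Number': ['28', '24', '00082'],
    'Timestamp': [
        '2025-09-11 12:55:34',
        '2025-09-11 12:56:11',
        '2025-09-11 12:59:00',
    ],
})

# Text -> integers (leading zeros are dropped, so '00082' becomes 82)
toy['Seat Number'] = toy['Seat Number'].astype('int64')

# Text -> pandas datetimes
toy['Timestamp'] = pd.to_datetime(toy['Timestamp'])

print(toy.dtypes)

# Datetime columns expose date parts through the .dt accessor
hours = toy['Timestamp'].dt.hour.tolist()
print(hours)
```

Once a column has the right type, numeric operations such as `.mean()` and date operations such as `.dt.hour` become available on it.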

In [43]:
# Do we have any missing values? If yes, let's fill them up

df.isna().sum()

Timestamp       0
First Name      0
Last Name       0
Course Track    0
City            0
Gender          0
Seat Number     0
PC-Make         0
PC - OS         0
Feedback        0
dtype: int64

In [45]:
df['First Name'].duplicated()

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29     True
30    False
31    False
32    False
33    False
Name: First Name, dtype: bool
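
Reading a long column of `True`/`False` values by eye is slow. A minimal sketch of some quicker ways to work with `duplicated()` follows; it uses a made-up list of first names rather than the class dataset.

```python
import pandas as pd

first_names = pd.Series(['Samuel', 'Ada', 'Bola', 'Samuel'], name='First Name')

# By default only the second and later occurrences are marked True
is_repeat = first_names.duplicated()

# Summing the booleans counts how many repeated entries there are
n_repeats = int(is_repeat.sum())

# keep=False marks every occurrence of a repeated value, including the first
all_copies = first_names[first_names.duplicated(keep=False)]

# drop_duplicates() keeps the first occurrence of each value
unique_names = first_names.drop_duplicates()

print(n_repeats)
print(all_copies)
print(unique_names.tolist())
```

A repeated first name is not necessarily an error, since two different people can share a name, so it is worth looking at the matching rows before dropping anything.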