### 2.1 Pandas

Pandas is a popular open-source data manipulation and analysis library for Python. It provides the data structures and functions needed to work efficiently with structured data, such as tables and time series.

**With Pandas you can:**
1. `Data Loading and Saving` (Load and save data in various formats, such as CSV, Excel, SQL databases, and JSON)
2. `Data Cleaning and Preprocessing` (Handle missing values by filling, dropping, or imputing them, and remove duplicates)
3. `Data Exploration and Analysis` (Perform statistical analysis on data, such as calculating mean, median, mode, and standard deviation, and generate summary statistics)
4. `Grouping and Aggregation` (Group data by one or more columns and apply aggregation functions such as sum and mean)
5. `Time Series Analysis` (Handle date and time data with specialized functions)
6. `Data Visualization` (Create simple line, bar, histogram, and scatter plots directly from Pandas)
7. `Advanced Data Operations` (Apply functions to data with map, applymap, and apply)
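
As a small illustration of the first item, the sketch below writes a DataFrame to CSV and reads it back. The `students` frame and the in-memory `buffer` are hypothetical stand-ins for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical sample data, used only for this illustration
students = pd.DataFrame({"Name": ["Emon", "Faruk"], "Roll": [220304, 220450]})

# Write CSV text to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
students.to_csv(buffer, index=False)  # index=False leaves the row labels out

# Rewind the buffer and load the same text back into a new DataFrame
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded)
```

With a real file, the same two calls take a path instead of a buffer, for example `students.to_csv("students.csv", index=False)`.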

**There are two main data structures in Pandas:**
1. `Series`: A one-dimensional labeled array, similar to a list or array but with labeled indices, which can hold any data type.
2. `DataFrame`: A two-dimensional, table-like structure with labeled axes (rows and columns). It is essentially a collection of Series objects, where each column in the DataFrame is a Series.
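
The relationship between the two structures can be sketched as follows. The names `marks` and `students` are made up for this example.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label
marks = pd.Series([78, 91, 64], index=["a", "b", "c"], name="Marks")

# A DataFrame: several columns that share one row index
students = pd.DataFrame({"Name": ["Emon", "Faruk", "Rakib"], "Marks": [78, 91, 64]})

# Selecting a single column of a DataFrame gives back a Series
name_column = students["Name"]
print(type(name_column).__name__)  # Series
print(marks["b"])                  # 91, looked up by label
```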

### 2.2 Creating a Pandas DataFrame from a Dictionary

In [1]:
import pandas as pd

In [7]:
# Creating Pandas dataframe by dictionary

# Roll numbers are stored as strings because they are identifiers, not quantities
info = {
    "Name": ["Emon Durjoy", "Faruk Ahmed", "Rakib Ahmed", "Shahidul Hossain"],
    "Email": ["emon@test.com", "Faruk@test.com", "Rakib@test.com", "Shahidul@test.com"],
    "Roll": ["220304", "220450", "220876", "2234583"],
    "Attendance": ["Present", "Present", "Present", "Absent"],
}

In [6]:
info

{'Name': ['Emon Durjoy', 'Faruk Ahmed', 'Rakib Ahmed', 'Shahidul Hossain'],
 'Email': ['emon@test.com',
  'Faruk@test.com',
  'Rakib@test.com',
  'Shahidul@test.com'],
 'Roll': ['220304', '220450', '220876', '2234583'],
 'Attendance': ['Present', 'Present', 'Present', 'Absent']}

In [8]:
df = pd.DataFrame(info)

In [9]:
df

| | Name | Email | Roll | Attendance |
|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent |


In [10]:
df.info() # print concise summary of DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        4 non-null      object
 1   Email       4 non-null      object
 2   Roll        4 non-null      object
 3   Attendance  4 non-null      object
dtypes: object(4)
memory usage: 256.0+ bytes


In [4]:
df["Name"]

0         Emon Durjoy
1         Faruk Ahmed
2         Rakib Ahmed
3    Shahidul Hossain
Name: Name, dtype: object

In [7]:
df.dtypes # print the data types of every column

Name          object
Email         object
Roll          object
Attendance    object
dtype: object

In [8]:
df.shape # print the shape of df as (rows, columns)

(4, 4)
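
When a column of digits is stored as text, numeric comparisons on it will not behave as expected until it is converted. A minimal sketch, assuming a hypothetical `rolls` frame:

```python
import pandas as pd

# Hypothetical column of roll numbers stored as text
rolls = pd.DataFrame({"Roll": ["220304", "220450", "220876"]})

# astype converts the whole column to 64-bit integers
rolls["Roll"] = rolls["Roll"].astype("int64")
print(rolls.dtypes)

# Numeric comparisons now work as expected
above = (rolls["Roll"] > 220400).sum()
print(above)  # 2
```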

### 2.3 Accessing Columns and Rows

#### 2.3.1 Accessing a Column

In [9]:
df["Name"] 

0         Emon Durjoy
1         Faruk Ahmed
2         Rakib Ahmed
3    Shahidul Hossain
Name: Name, dtype: object

In [10]:
df["Email"]

0        emon@test.com
1       Faruk@test.com
2       Rakib@test.com
3    Shahidul@test.com
Name: Email, dtype: object

#### 2.3.2 Accessing a Row

In [41]:
df.loc[1] 

Name             Faruk Ahmed
Email         Faruk@test.com
Roll                  220450
Attendance           Present
Name: 1, dtype: object

In [42]:
df.loc[1, ["Name", "Email"]] 

Name        Faruk Ahmed
Email    Faruk@test.com
Name: 1, dtype: object

In [13]:
df.loc[1, "Name"]

'Faruk Ahmed'

In [14]:
df.iloc[0]

Name            Emon Durjoy
Email         emon@test.com
Roll                 220304
Attendance          Present
Name: 0, dtype: object

In [15]:
df.iloc[3]

Name           Shahidul Hossain
Email         Shahidul@test.com
Roll                    2234583
Attendance               Absent
Name: 3, dtype: object
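
With the default 0, 1, 2, ... index, `loc` and `iloc` look interchangeable. The difference shows up once the index holds other labels. The sketch below uses a hypothetical `students` frame indexed by roll number.

```python
import pandas as pd

# Hypothetical frame indexed by roll number instead of 0, 1, 2, ...
students = pd.DataFrame(
    {"Name": ["Emon Durjoy", "Faruk Ahmed", "Rakib Ahmed"]},
    index=["220304", "220450", "220876"],
)

by_label = students.loc["220450", "Name"]  # look up by index label
by_position = students.iloc[1, 0]          # look up by position (row 1, column 0)
print(by_label, "|", by_position)

# Slices differ too: loc includes the end label, iloc excludes the end position
label_slice = students.loc["220304":"220450"]
position_slice = students.iloc[0:1]
print(len(label_slice), len(position_slice))  # 2 1
```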

### 2.4 Adding a New Column from an Existing Column

In [15]:
df["Name"].str.split(" ")

0         [Emon, Durjoy]
1         [Faruk, Ahmed]
2         [Rakib, Ahmed]
3    [Shahidul, Hossain]
Name: Name, dtype: object

In [16]:
df["First"] = df["Name"].str.split(" ").str[0]
df.head()

| | Name | Email | Roll | Attendance | First |
|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul |


In [17]:
df["Last"] = df["Name"].str.split(" ").str[1]
df.head()

| | Name | Email | Roll | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |
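
Both name columns can also be created in one step. This is a sketch rather than the approach used above: it relies on the `expand=True` option of `str.split`, and the `people` frame is a hypothetical example.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({"Name": ["Emon Durjoy", "Faruk Ahmed", "Shahidul Hossain"]})

# n=1 splits only at the first space; expand=True returns a DataFrame
# with one column per piece instead of a Series of lists
parts = people["Name"].str.split(" ", n=1, expand=True)

# Assign the two pieces to two new columns at once
people[["First", "Last"]] = parts
print(people)
```

Limiting the split with `n=1` keeps everything after the first space together, which matters for names with more than two words.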


### 2.5 Filtering 

In [43]:
df["Attendance"] == "Present"

0     True
1     True
2     True
3    False
Name: Attendance, dtype: bool

In [14]:
filt = df["Attendance"] == "Present"

df[filt]

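| | Name | Email | Roll | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |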


In [13]:
filt

0     True
1     True
2     True
3    False
Name: Attendance, dtype: bool

In [45]:
df.loc[filt, "Email"] # select rows with the mask and one column in a single step

0     emon@test.com
1    Faruk@test.com
2    Rakib@test.com
Name: Email, dtype: object

In [46]:
df.loc[filt, ["Email", "Roll"]]

| | Email | Roll |
|---|---|---|
| 0 | emon@test.com | 220304 |
| 1 | Faruk@test.com | 220450 |
| 2 | Rakib@test.com | 220876 |


In [47]:
df[df["Attendance"] == "Present"]

| | Name | Email | Roll | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |


In [48]:
filt2 = (df["Attendance"] == "Present") & (df["Roll"] == "220304")

In [49]:
df[filt2]

| | Name | Email | Roll | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |


In [51]:
filt3 = df["Last"] == "Ahmed"
df[filt3]

| | Name | Email | Roll | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
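
Masks can also be combined with `|` (or), negated with `~`, and built from a list of allowed values with `isin`. A minimal sketch on a hypothetical copy of the data:

```python
import pandas as pd

# Hypothetical copy of the student data used in this section
students = pd.DataFrame({
    "Name": ["Emon Durjoy", "Faruk Ahmed", "Rakib Ahmed", "Shahidul Hossain"],
    "Roll": ["220304", "220450", "220876", "2234583"],
    "Attendance": ["Present", "Present", "Present", "Absent"],
})

present = students["Attendance"] == "Present"

absent = students[~present]                                     # ~ negates a mask
either = students[~present | (students["Roll"] == "220304")]    # | means "or"
chosen = students[students["Roll"].isin(["220450", "220876"])]  # membership test

print(absent["Name"].tolist())
print(either["Name"].tolist())
print(chosen["Name"].tolist())
```

Each comparison needs its own parentheses when combined, because `&` and `|` bind more tightly than `==`.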


### 2.6 Changing Column Names

In [52]:
df.columns

Index(['Name', 'Email', 'Roll', 'Attendance', 'First', 'Last'], dtype='object')

In [20]:
[column.lower() for column in df.columns]

['name', 'email', 'roll', 'attendance', 'first', 'last']

In [21]:
df.columns = [column.upper() for column in df.columns]

In [22]:
df

| | NAME | EMAIL | ROLL | ATTENDANCE | FIRST | LAST |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [23]:
df.columns = [column.lower() for column in df.columns]

In [24]:
df

| | name | email | roll | attendance | first | last |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [25]:
df.columns = [column.title() for column in df.columns]

In [26]:
df

| | Name | Email | Roll | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [29]:
df.columns = ["Name", "Email", "Roll", "Attendance", "First Name", "Last Name"]

In [30]:
df.head()

| | Name | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [31]:
df.columns = df.columns.str.replace(" ", "_")

In [32]:
df

| | Name | Email | Roll | Attendance | First_Name | Last_Name |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [33]:
df.columns = df.columns.str.replace("_", " ")

In [34]:
df

| | Name | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Faruk Ahmed | Faruk@test.com | 220450 | Present | Faruk | Ahmed |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |
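
Assigning to `df.columns` replaces every name at once. To change only a few, `rename` with a mapping is an alternative. The sketch below uses a hypothetical one-row frame.

```python
import pandas as pd

# Hypothetical one-row frame for illustration
students = pd.DataFrame({"Name": ["Emon Durjoy"], "First": ["Emon"], "Last": ["Durjoy"]})

# Only the columns named in the mapping change; rename returns a new DataFrame
renamed = students.rename(columns={"First": "First Name", "Last": "Last Name"})

print(list(renamed.columns))   # ['Name', 'First Name', 'Last Name']
print(list(students.columns))  # ['Name', 'First', 'Last'], original is untouched
```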


### 2.7 Changing a Row

In [35]:
df.loc[1] = ["Mahfuz Mozumder", "mahfuz@company.com", "2230843", "Absent", "Mahfuz", "Mozumder"]

In [36]:
df

| | Name | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Mahfuz Mozumder | mahfuz@company.com | 2230843 | Absent | Mahfuz | Mozumder |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [74]:
df.loc[1, ["Email", "Attendance"]] = ["mozumder@company.com", "Present"]

In [75]:
df

| | Name | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Mahfuz Mozumder | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |
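
Rows can also be updated by condition rather than by index label: pass a boolean mask to `loc` in place of the row label. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
students = pd.DataFrame({
    "Name": ["Emon Durjoy", "Rakib Ahmed", "Shahidul Hossain"],
    "Attendance": ["Present", "Present", "Absent"],
})

# The mask picks the rows, the label picks the column, then the value is assigned
mask = students["Name"] == "Shahidul Hossain"
students.loc[mask, "Attendance"] = "Present"
print(students)
```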


### 2.8 Dropping or removing Columns

In [79]:
df.drop("Name") # raises KeyError: drop() searches the row labels (axis=0) by default

KeyError: "['Name'] not found in axis"

In [80]:
df.drop("Name", axis=1)

| | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [81]:
df

| | Name | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|---|
| 0 | Emon Durjoy | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | Mahfuz Mozumder | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder |
| 2 | Rakib Ahmed | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul Hossain | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [37]:
df1 = df.drop("Name", axis=1)

In [38]:
df1

| | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |

In [39]:
df.drop("Name", axis=1, inplace=True)

In [40]:
df

| | Email | Roll | Attendance | First Name | Last Name |
|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain |


In [41]:
df["Name"] = df["First Name"] + " " + df["Last Name"]

In [42]:
df

| | Email | Roll | Attendance | First Name | Last Name | Name |
|---|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy | Emon Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder | Mahfuz Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed | Rakib Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain | Shahidul Hossain |


### 2.9 Adding Rows

In [45]:
new_row = {"Email": "naim@yahoo.com", "Roll": "220343", "Attendance": "Absent", "First Name": "Naimur", "Last Name": "Rahman", "Name": "Naimur Rahman"}
# DataFrame.append was removed in pandas 2.0; pd.concat returns a new DataFrame and leaves df unchanged
pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

| | Email | Roll | Attendance | First Name | Last Name | Name |
|---|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy | Emon Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder | Mahfuz Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed | Rakib Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain | Shahidul Hossain |
| 4 | naim@yahoo.com | 220343 | Absent | Naimur | Rahman | Naimur Rahman |


In [46]:
df

| | Email | Roll | Attendance | First Name | Last Name | Name |
|---|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy | Emon Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder | Mahfuz Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed | Rakib Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain | Shahidul Hossain |


In [47]:
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True) # reassign to keep the new row

In [48]:
df

| | Email | Roll | Attendance | First Name | Last Name | Name |
|---|---|---|---|---|---|---|
| 0 | emon@test.com | 220304 | Present | Emon | Durjoy | Emon Durjoy |
| 1 | mozumder@company.com | 2230843 | Present | Mahfuz | Mozumder | Mahfuz Mozumder |
| 2 | Rakib@test.com | 220876 | Present | Rakib | Ahmed | Rakib Ahmed |
| 3 | Shahidul@test.com | 2234583 | Absent | Shahidul | Hossain | Shahidul Hossain |
| 4 | naim@yahoo.com | 220343 | Absent | Naimur | Rahman | Naimur Rahman |
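
Several rows can be added in one call by collecting them first and passing them to `pd.concat`. This is a sketch with hypothetical records and a cut-down two-column frame.

```python
import pandas as pd

# Hypothetical two-column frame for illustration
students = pd.DataFrame({"Name": ["Emon Durjoy"], "Roll": ["220304"]})

# Hypothetical new records, collected as a list of dictionaries
new_rows = [
    {"Name": "Naimur Rahman", "Roll": "220343"},
    {"Name": "Mahfuz Mozumder", "Roll": "2230843"},
]

# One concat call adds them all; ignore_index renumbers the rows from 0
students = pd.concat([students, pd.DataFrame(new_rows)], ignore_index=True)
print(students)
```

Collecting rows and concatenating once tends to be cheaper than growing a DataFrame one row at a time, since every concatenation builds a new object.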
