# Notebook 01: Pandas
- Pandas is an open-source Python library for data manipulation and analysis, providing easy-to-use data structures like DataFrame and Series for handling structured data efficiently. 

## Installation

In [1]:
pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


## Importing pandas
- By convention, pandas is imported under the alias pd so that later calls are shorter.

In [2]:
import pandas as pd 
import numpy as np

## Data Structures in Pandas
- There are two main data structures in pandas.
- Series : A one-dimensional labeled array.
- DataFrame : A two-dimensional table with labeled rows and columns.

In [3]:
# Creating Series with 1D list
arr = [1,2,3,4,5,6] 
sr = pd.Series(arr)
sr

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

In [4]:
# A 2D (nested) list still gives a 1D Series: each inner list becomes one element (dtype object)
arr = [[1,2,3],[4,5,6]] 
sr = pd.Series(arr)
sr

0    [1, 2, 3]
1    [4, 5, 6]
dtype: object
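
The labels of a Series do not have to be the default integers 0, 1, 2, and so on. The following is a minimal sketch, using hypothetical names and ages that are not part of the notebook's dataset, of a Series built with a custom index and then queried by label.

```python
import pandas as pd

# Hypothetical data for illustration: ages keyed by student name
ages = pd.Series([23, 21, 20], index=['Ali', 'Hamza', 'Ubaid'], name='Age')

# Label-based lookup returns the value stored under that label
hamza_age = ages['Hamza']

# The labels and the values can also be pulled out separately
labels = list(ages.index)
values = ages.tolist()

print(hamza_age)
print(labels)
print(values)
```

The same idea carries over to DataFrames, where each column behaves like a Series that shares the row labels.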

### Creating a DataFrame in Multiple Ways

In [5]:
# From a list of lists (each inner list is one row)
li = [[1,'Ali',23],[2,'Hamza',21],[3,'Ubaid',20]]
df = pd.DataFrame(li,columns=['ID','Names','Age'])
df

Unnamed: 0,ID,Names,Age
0,1,Ali,23
1,2,Hamza,21
2,3,Ubaid,20


In [6]:
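# From a dictionary of lists (each key becomes a column name)
# The variable is named data rather than dict to avoid shadowing Python's built-in dict
data = {
    'ID' : [1,2,3],
    'Names' : ['Ali','Hamza','Ubaid'],
    'Age' : [23,21,20]
}
df = pd.DataFrame(data)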
df

Unnamed: 0,ID,Names,Age
0,1,Ali,23
1,2,Hamza,21
2,3,Ubaid,20
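
Another common input shape is a list of dictionaries, where each dictionary is one row. The sketch below is an illustration rather than part of the original notebook; the variable names `records` and `df_records` are made up, and the last record deliberately omits `'Age'` to show that pandas fills the gap with NaN.

```python
import pandas as pd

# Each dictionary is one row; keys become column names
records = [
    {'ID': 1, 'Names': 'Ali', 'Age': 23},
    {'ID': 2, 'Names': 'Hamza', 'Age': 21},
    {'ID': 3, 'Names': 'Ubaid'},  # 'Age' deliberately left out
]
df_records = pd.DataFrame(records)

# The missing key shows up as NaN in the 'Age' column
print(df_records)
```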


In [7]:
# From a NumPy array (mixing numbers and strings makes NumPy store every element as a string)
arr = np.array([[1,'Ali',23],[2,'Hamza',21],[3,'Ubaid',20]])
df = pd.DataFrame(arr,columns = ['ID','Names','Age'])
df

Unnamed: 0,ID,Names,Age
0,1,Ali,23
1,2,Hamza,21
2,3,Ubaid,20
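
One caveat with the NumPy route: an array that mixes integers and strings is stored by NumPy as text, so the `ID` and `Age` columns above hold strings even though they print like numbers. The following sketch, which reuses the same data under the illustrative names `df_arr` and `df_fixed`, checks this and converts the numeric columns back with `astype`.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 'Ali', 23], [2, 'Hamza', 21], [3, 'Ubaid', 20]])
df_arr = pd.DataFrame(arr, columns=['ID', 'Names', 'Age'])

# Every element was stored as text, so this value is the string '23'
first_age = df_arr.loc[0, 'Age']
print(type(first_age), repr(first_age))

# Convert the numeric columns back to integers
df_fixed = df_arr.astype({'ID': int, 'Age': int})
print(df_fixed.dtypes)
```

Building the DataFrame from a list of lists or a dictionary avoids the conversion step, because pandas infers a type per column.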


In [8]:
# By reading a CSV file
students = pd.read_csv('./dataset/students_data.csv')
students

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact
0,S1001,Ali,Male,18.0,Computer Science,3.0,2020,9876543000.0
1,S1002,Umar,Male,19.0,Business,3.5,2021,9876543000.0
2,S1003,Faraz,Male,20.0,Mathematics,,2022,9876543000.0
3,S1004,Danish,Male,21.0,Physics,2.5,2023,
4,S1005,Laiba,Female,22.0,Engineering,3.0,2024,9876543000.0
5,S1006,Noor,Female,18.0,Computer Science,3.5,2020,9876543000.0
6,S1007,Neha,Female,19.0,Business,4.0,2021,9876543000.0
7,S1008,Aqib,Male,,Mathematics,2.5,2022,9876543000.0
8,S1009,Taha,Male,21.0,Physics,3.0,2023,9876543000.0
9,S1010,Sheri,Male,22.0,Engineering,3.5,2024,9876543000.0
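
In the output above, the `Contact` numbers appear as floats such as 9876543000.0, which is usually not what is wanted for phone numbers. The sketch below is an assumption-laden illustration: it uses a small in-memory CSV (hypothetical rows, read through `io.StringIO`) instead of the notebook's file, and passes `dtype` to `pd.read_csv` so that `Contact` is kept as text.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for the real students_data.csv
csv_text = """Student_ID,Name,Age,Contact
S1001,Ali,18,9876543210
S1004,Danish,21,
"""

# dtype={'Contact': str} keeps the digits exactly as written in the file
sample = pd.read_csv(io.StringIO(csv_text), dtype={'Contact': str})

print(sample)
print(sample.dtypes)
```

The empty `Contact` cell is still read as a missing value, so the text type does not hide gaps in the data.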


## Exploring DataFrames: Summary, Structure & Overview

- .head(n) : Displays the first n rows of the DataFrame (n defaults to 5).

In [9]:
students.head(5)

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact
0,S1001,Ali,Male,18.0,Computer Science,3.0,2020,9876543000.0
1,S1002,Umar,Male,19.0,Business,3.5,2021,9876543000.0
2,S1003,Faraz,Male,20.0,Mathematics,,2022,9876543000.0
3,S1004,Danish,Male,21.0,Physics,2.5,2023,
4,S1005,Laiba,Female,22.0,Engineering,3.0,2024,9876543000.0


- .tail(n) : Displays the last n rows of the DataFrame (n defaults to 5).

In [10]:
students.tail(3)

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact
22,S1023,Saifullah,Male,20.0,Mathematics,4.0,2022,9876543000.0
23,S1024,Mehwish,Female,21.0,Physics,2.5,2023,9876543000.0
24,S1025,Shayan,Male,22.0,Engineering,3.0,2024,9876543000.0


- shape : returns the dimensions of the DataFrame as a tuple (rows, columns).

In [11]:
students.shape

(25, 8)

- size : returns the total number of elements in the DataFrame (rows × columns).

In [12]:
students.size

200
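
The relationship between `shape`, `size`, and `len()` can be checked on a small frame. This is a minimal sketch with a hypothetical three-row DataFrame named `demo`, not the students dataset.

```python
import pandas as pd

# Hypothetical frame: 3 rows, 2 columns
demo = pd.DataFrame({'ID': [1, 2, 3], 'Age': [23, 21, 20]})

rows, cols = demo.shape   # tuple unpacking: (rows, columns)
total = demo.size         # rows * columns
n_rows = len(demo)        # len() counts rows only

print(rows, cols, total, n_rows)
```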

- info() : prints a concise summary of a DataFrame, including the column names, number of non-null values, data types, and memory usage.

In [13]:
students.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Student_ID       25 non-null     object 
 1   Name             25 non-null     object 
 2   Gender           25 non-null     object 
 3   Age              21 non-null     float64
 4   Department       25 non-null     object 
 5   GPA              21 non-null     float64
 6   Enrollment_Year  25 non-null     int64  
 7   Contact          23 non-null     float64
dtypes: float64(3), int64(1), object(4)
memory usage: 1.7+ KB


- index : returns the row labels (index) of the DataFrame.

In [14]:
students.index

RangeIndex(start=0, stop=25, step=1)

- columns : returns the column names of the DataFrame.

In [15]:
students.columns

Index(['Student_ID', 'Name', 'Gender', 'Age', 'Department', 'GPA',
       'Enrollment_Year', 'Contact'],
      dtype='object')

- describe() : provides summary statistics for the numeric columns: count, mean, standard deviation, min, quartiles (25%, 50%, 75%), and max.

In [16]:
students.describe()

Unnamed: 0,Age,GPA,Enrollment_Year,Contact
count,21.0,21.0,25.0,23.0
mean,19.857143,3.166667,2022.0,9876543000.0
std,1.49284,0.555278,1.443376,7.331776
min,18.0,2.5,2020.0,9876543000.0
25%,19.0,2.5,2021.0,9876543000.0
50%,20.0,3.0,2022.0,9876543000.0
75%,21.0,3.5,2023.0,9876543000.0
max,22.0,4.0,2024.0,9876543000.0
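
By default `describe()` skips text columns. The sketch below, on a hypothetical frame named `demo`, passes `include='all'` so that text columns are summarized too (with `unique`, `top`, and `freq` rows), while the numeric statistics are left blank for them.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column
demo = pd.DataFrame({
    'Department': ['Physics', 'Business', 'Physics'],
    'GPA': [3.0, None, 3.5],  # None becomes NaN
})

# include='all' summarizes every column, not only the numeric ones
summary = demo.describe(include='all')
print(summary)
```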


- .nunique() : Returns the number of unique values in each column (missing values are not counted by default).

In [17]:
students.nunique()

Student_ID         25
Name               25
Gender              2
Age                 5
Department          5
GPA                 4
Enrollment_Year     5
Contact            23
dtype: int64

- .count() : Returns the number of non-null values in each column.

In [18]:
students.count()

Student_ID         25
Name               25
Gender             25
Age                21
Department         25
GPA                21
Enrollment_Year    25
Contact            23
dtype: int64
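
Since `count()` reports non-null values, the number of missing values per column is the row count minus `count()`, which is also what `isna().sum()` returns. The following sketch uses a hypothetical frame named `demo` to show this, along with the `dropna` parameter of `nunique()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Age
demo = pd.DataFrame({
    'Age': [18.0, np.nan, 18.0, 20.0],
    'Dept': ['CS', 'CS', 'Math', 'Math'],
})

non_null = demo.count()        # non-null values per column
missing = demo.isna().sum()    # missing values per column

unique_default = demo.nunique()               # NaN is ignored
unique_with_nan = demo.nunique(dropna=False)  # NaN counts as a value

print(non_null, missing, unique_default, unique_with_nan, sep='\n\n')
```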

## Adding, Modifying, and Deleting Data

### Adding new columns Graduation_Year and Scholarship_Status

#### Method 01:

In [19]:
students["Graduation_Year"] = students['Enrollment_Year'] + 4
students["Scholarship_Status"] = np.where(students['GPA'] == 4,'Yes','No')
students.head(3)

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact,Graduation_Year,Scholarship_Status
0,S1001,Ali,Male,18.0,Computer Science,3.0,2020,9876543000.0,2024,No
1,S1002,Umar,Male,19.0,Business,3.5,2021,9876543000.0,2025,No
2,S1003,Faraz,Male,20.0,Mathematics,,2022,9876543000.0,2026,No
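
Note that in the output above Faraz has no GPA yet is labeled 'No': a comparison with NaN is always False, so `np.where` takes the second branch. If missing GPAs should be flagged separately, one option is to nest a second `np.where`. This is only a sketch on a hypothetical frame, and the 'Unknown' label is an assumption, not something the notebook defines.

```python
import numpy as np
import pandas as pd

# Hypothetical GPAs, including one missing value
demo = pd.DataFrame({'GPA': [4.0, 3.5, np.nan]})

# Same rule as in the notebook: NaN == 4 is False, so NaN maps to 'No'
simple = np.where(demo['GPA'] == 4, 'Yes', 'No')

# Variant: check for missing values first, then apply the rule
flagged = np.where(
    demo['GPA'].isna(),
    'Unknown',
    np.where(demo['GPA'] == 4, 'Yes', 'No'),
)

print(simple)
print(flagged)
```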


#### Method 02: using the insert function
- df.insert(loc, column, value) : loc = integer position at which to insert, column = name of the new column, value = data for the new column. The DataFrame is modified in place.

In [20]:
# Inserting between name and gender
students.insert(2,column='My_Column',value=np.arange(1,26,1))
students.head(2)

Unnamed: 0,Student_ID,Name,My_Column,Gender,Age,Department,GPA,Enrollment_Year,Contact,Graduation_Year,Scholarship_Status
0,S1001,Ali,1,Male,18.0,Computer Science,3.0,2020,9876543000.0,2024,No
1,S1002,Umar,2,Male,19.0,Business,3.5,2021,9876543000.0,2025,No
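
One detail worth knowing: `insert` refuses to add a column whose name already exists and raises a `ValueError` (unless `allow_duplicates=True` is passed). The sketch below demonstrates both the normal case and the rejected case on a hypothetical frame named `demo`.

```python
import pandas as pd

# Hypothetical two-column frame
demo = pd.DataFrame({'ID': [1, 2], 'Age': [23, 21]})

# Insert 'Names' at position 1, between 'ID' and 'Age' (modifies demo in place)
demo.insert(1, 'Names', ['Ali', 'Hamza'])

# Inserting a column name that already exists raises ValueError
try:
    demo.insert(0, 'Age', [0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(list(demo.columns), duplicate_rejected)
```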


### Deleting a Column

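- .drop(columns, inplace) : columns lists the columns to delete; inplace=True modifies the original DataFrame, while the default inplace=False returns a new DataFrame and leaves the original unchanged.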

In [21]:
students.drop(columns = ['My_Column'],inplace=False) # inplace=False (the default) returns a new DataFrame; students is unchanged
students.head(2)

Unnamed: 0,Student_ID,Name,My_Column,Gender,Age,Department,GPA,Enrollment_Year,Contact,Graduation_Year,Scholarship_Status
0,S1001,Ali,1,Male,18.0,Computer Science,3.0,2020,9876543000.0,2024,No
1,S1002,Umar,2,Male,19.0,Business,3.5,2021,9876543000.0,2025,No


In [22]:
students.drop(columns = ['My_Column'],inplace=True) # inplace=True removes the column from students itself
students.head(4)

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact,Graduation_Year,Scholarship_Status
0,S1001,Ali,Male,18.0,Computer Science,3.0,2020,9876543000.0,2024,No
1,S1002,Umar,Male,19.0,Business,3.5,2021,9876543000.0,2025,No
2,S1003,Faraz,Male,20.0,Mathematics,,2022,9876543000.0,2026,No
3,S1004,Danish,Male,21.0,Physics,2.5,2023,,2027,No
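
An alternative to `inplace=True` is to keep the default and assign the returned DataFrame to a name. The sketch below shows that pattern on a hypothetical frame, and also uses the `index` parameter of `drop` to remove a row by its label; the names `demo`, `trimmed`, and `without_first_row` are illustrative.

```python
import pandas as pd

# Hypothetical frame with a throwaway column
demo = pd.DataFrame({'ID': [1, 2, 3], 'Temp': [0, 0, 0], 'Age': [23, 21, 20]})

# drop returns a new DataFrame; demo keeps its 'Temp' column
trimmed = demo.drop(columns=['Temp'])

# Rows are dropped by label through the index parameter
without_first_row = trimmed.drop(index=[0])

print(list(demo.columns))
print(list(trimmed.columns))
print(list(without_first_row.index))
```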


### Modifying Data

- Suppose we want to modify the age of Faraz, who is in the row labeled 2.

- .at[ row_label , column_label ] : reads or sets a single value identified by its row and column labels.

In [23]:
students.at[2,'Age'] = 19 
students.head(4)

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact,Graduation_Year,Scholarship_Status
0,S1001,Ali,Male,18.0,Computer Science,3.0,2020,9876543000.0,2024,No
1,S1002,Umar,Male,19.0,Business,3.5,2021,9876543000.0,2025,No
2,S1003,Faraz,Male,19.0,Mathematics,,2022,9876543000.0,2026,No
3,S1004,Danish,Male,21.0,Physics,2.5,2023,,2027,No


In [24]:
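# .loc[row_label, column_label] can set a single value too, and also accepts slices and lists of labels
students.loc[2,'Name'] = 'Humam'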
students.head(3)

Unnamed: 0,Student_ID,Name,Gender,Age,Department,GPA,Enrollment_Year,Contact,Graduation_Year,Scholarship_Status
0,S1001,Ali,Male,18.0,Computer Science,3.0,2020,9876543000.0,2024,No
1,S1002,Umar,Male,19.0,Business,3.5,2021,9876543000.0,2025,No
2,S1003,Humam,Male,19.0,Mathematics,,2022,9876543000.0,2026,No
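
Both cells above rely on knowing that the target student sits in the row labeled 2. When the label is not known in advance, `.loc` also accepts a boolean condition in place of the row label. This is a minimal sketch on a hypothetical frame named `demo`, not the students dataset.

```python
import pandas as pd

# Hypothetical frame mirroring the first rows of the dataset
demo = pd.DataFrame({
    'Name': ['Ali', 'Umar', 'Faraz'],
    'Age': [18.0, 19.0, 20.0],
})

# Select rows by condition, then assign to the 'Age' column
demo.loc[demo['Name'] == 'Faraz', 'Age'] = 19

# .at reads the single updated value back
faraz_age = demo.at[2, 'Age']
print(faraz_age)
```

Note that the condition updates every matching row, so it changes more than one cell if several students share the same name.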


## Conclusion 🎯
In this notebook, we covered the fundamentals of Pandas, including:
- Installation
- Creating Series and DataFrames
- Exploring data with key functions (info(), head(), tail(), describe())
- Adding, deleting, and modifying data
<br>These concepts lay the foundation for efficient data manipulation in Pandas.
Next, we'll dive deeper into accessing, slicing, filtering, and handling missing data. Keep practicing and experimenting with different datasets! 💡