# Day 13 - Pandas
Pandas is a powerful open-source library widely used for data manipulation and analysis.  
It is well suited to working with tabular data such as spreadsheets or SQL tables.

What can we do with Pandas?
- Data Cleaning, Merging and Joining.
- Handling Missing Data.
- Column Insertion and Deletion.
- Group By Operations.
- Data Visualization.
- etc.
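
As a rough illustration of a few of the operations listed above, here is a minimal sketch on a small made-up table. The variable `demo`, the column names, and the values are assumptions invented for this example; they are not taken from the datasets used later.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "city": ["Dhaka", "Sylhet", "Dhaka", "Sylhet"],
    "marks": [80.0, None, 90.0, 70.0],
})

# Handling missing data: fill the missing mark with the column mean
demo["marks"] = demo["marks"].fillna(demo["marks"].mean())

# Column insertion: add a boolean column derived from an existing one
demo["passed"] = demo["marks"] >= 75

# Column deletion: drop a column that is no longer needed
demo = demo.drop(columns=["name"])

# Group-by operation: average marks per city
avg_by_city = demo.groupby("city")["marks"].mean()
print(avg_by_city)
```

Filling with the mean is only one possible strategy; dropping the incomplete rows with `dropna()` is another common choice.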

In [2]:
import pandas as pd

# DataFrame -> a 2D labeled data structure, like a table with rows and columns
# Series    -> 1D labeled data; a single row or a single column of a DataFrame is a Series

In [18]:
# load data into a DataFrame object:

data = {"calories": [420, 380, 390], 
        "duration": [50, 40, 45], 
        "weight": [67, 54, 75]}

df = pd.DataFrame(data)                 # whole 2D data

print(df)
print(type(df))

calories = df['calories']               # selecting one column returns a Series (1D data)

print(calories)
print(type(calories))

   calories  duration  weight
0       420        50      67
1       380        40      54
2       390        45      75
<class 'pandas.core.frame.DataFrame'>
0    420
1    380
2    390
Name: calories, dtype: int64
<class 'pandas.core.series.Series'>
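
Whether a selection comes back as a Series or a DataFrame depends on how it is written. The sketch below reuses the same `data` dictionary; the variable names `col`, `row`, and `sub` are chosen here only for illustration.

```python
import pandas as pd

data = {"calories": [420, 380, 390],
        "duration": [50, 40, 45],
        "weight": [67, 54, 75]}
df = pd.DataFrame(data)

col = df["calories"]      # single brackets  -> Series (one column)
row = df.loc[0]           # one index label  -> Series (one row)
sub = df[["calories"]]    # double brackets  -> DataFrame with one column

print(type(col).__name__)   # Series
print(type(row).__name__)   # Series
print(type(sub).__name__)   # DataFrame
print(sub.shape)            # (3, 1)
```

The double-bracket form is useful when later code expects a 2D object even though only one column is selected.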


### Loading different file types in pandas


In [20]:
# CSV (Comma-Separated Values) file        -> one of the most commonly used dataset formats

csv_data = pd.read_csv('D13-3_student_data.csv')
# print(csv_data)
print(type(csv_data))


# Excel file

excel_data = pd.read_excel('D13-2_student_marks.xlsx')
# print(excel_data)
print(type(excel_data))


# Parquet file        -> a columnar binary format, usually faster to read than CSV; mainly used for very large datasets

parquet_data = pd.read_parquet('D13-4_students.parquet')
# print(parquet_data)
print(type(parquet_data))


# pandas can also read JSON and many other file formats

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
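
The readers above need the files on disk. To try the same idea without any files, the following sketch reads CSV and JSON from in-memory text using `io.StringIO`. The two-row CSV string is a hypothetical excerpt modeled on the student data, not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row CSV text; the second student has no Python mark
csv_text = (
    "StudentID,FullName,Python Marks\n"
    "1001,Alif Rahman,88\n"
    "1002,Fatima Akhter,\n"
)

# read_csv accepts any file-like object, not only a path
csv_df = pd.read_csv(io.StringIO(csv_text))
print(type(csv_df))

# Convert the same table to JSON text and read it back with read_json
json_text = csv_df.to_json(orient="records")
json_df = pd.read_json(io.StringIO(json_text))
print(type(json_df))
```

Both readers return a DataFrame, so the rest of the code does not need to care which format the data came from.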


### Pandas basic functionalities

In [7]:
df = pd.read_csv('D13-3_student_data.csv')
# print(df)
df

|   | StudentID | FullName | Data Structure Marks | Algorithm Marks | Python Marks | CompletionStatus | EnrollmentDate | Instructor | Location |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | Alif Rahman | 85.0 | 85.0 | 88.0 | Completed | 2024-01-15 | Mr. Karim | Dhaka |
| 1 | 1002 | Fatima Akhter | 92.0 | 92.0 | NaN | In Progress | 2024-01-20 | Ms. Salma | Chattogram |
| 2 | 1003 | Imran Hossain | 88.0 | 88.0 | 85.0 | Completed | 2024-02-10 | Mr. Karim | Dhaka |
| 3 | 1004 | Jannatul Ferdous | 78.0 | 78.0 | 82.0 | Completed | 2024-02-12 | Ms. Salma | Sylhet |
| 4 | 1005 | Kamal Uddin | NaN | NaN | 95.0 | In Progress | 2024-03-05 | Mr. Karim | Chattogram |
| 5 | 1006 | Laila Begum | 75.0 | 75.0 | 78.0 | Completed | 2024-03-08 | Ms. Salma | Rajshahi |
| 6 | 1007 | Mahmudul Hasan | 80.0 | 80.0 | NaN | In Progress | 2024-04-01 | Mr. Karim | Dhaka |
| 7 | 1008 | Nadia Islam | 81.0 | 81.0 | 85.0 | Completed | 2024-04-22 | Ms. Salma | Chattogram |
| 8 | 1009 | Omar Faruq | 72.0 | 72.0 | 76.0 | Completed | 2024-05-16 | Mr. David | Dhaka |
| 9 | 1010 | Priya Sharma | 89.0 | 89.0 | 88.0 | Completed | 2024-05-20 | Ms. Salma | Sylhet |


In [20]:
# df.head(8)      # returns the first rows of the DataFrame; default is 5
# df.tail()       # returns the last rows of the DataFrame; default is 5
df.info()       # summary of the DataFrame, including data types and non-null counts.
# df.columns      # columns is not a function here; it is an attribute of the DataFrame
# df.shape        # shape is also an attribute, not a function; it gives (rows, columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   StudentID             20 non-null     int64  
 1   FullName              20 non-null     object 
 2   Data Structure Marks  16 non-null     float64
 3   Algorithm Marks       16 non-null     float64
 4   Python Marks          15 non-null     float64
 5   CompletionStatus      20 non-null     object 
 6   EnrollmentDate        20 non-null     object 
 7   Instructor            20 non-null     object 
 8   Location              20 non-null     object 
dtypes: float64(3), int64(1), object(5)
memory usage: 1.5+ KB
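
The commented-out calls above can be tried on a small table without loading the CSV. This is a minimal sketch; the three-row DataFrame `small` is a hypothetical stand-in that borrows a few column names from the student data.

```python
import pandas as pd

# Hypothetical three-row stand-in for the student data
small = pd.DataFrame({
    "StudentID": [1001, 1002, 1003],
    "FullName": ["Alif Rahman", "Fatima Akhter", "Imran Hossain"],
    "Python Marks": [88.0, None, 85.0],
})

first_two = small.head(2)      # method call: first 2 rows
last_one = small.tail(1)       # method call: last row
print(first_two)
print(last_one)

print(small.shape)             # attribute, no parentheses: (3, 3)
print(list(small.columns))     # attribute: the column labels

# Missing values per column, the same information info() reports as non-null counts
missing = small.isna().sum()
print(missing)
```

`head()` and `tail()` are methods and need parentheses, while `shape` and `columns` are attributes and are read without them.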
