# **Pandas & Data Analysis Bootcamp**
Your complete beginner‑to‑advanced guide with explanations, examples, and exercises.

# **Pandas Complete Lecture Course**
Welcome to your all‑in‑one Pandas course for your data science journey!
This notebook covers:
- Installation and Setup
- Series & DataFrames
- Indexing & Selection
- Importing & Exporting Data
- Handling Missing Values
- Filtering, Sorting, Grouping
- Merging, Joining, Concatenation
- DateTime Handling
- Useful Functions
- Real‑world Mini Projects


| Excel | Pandas |
|---|---|
| Worksheet | DataFrame |
| Column | Series |
| Row heading | Index |
| Row | Row |
| Empty cell | NaN |
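
To make the Excel comparison concrete, here is a minimal sketch on a made-up three-row "worksheet". The names `sheet`, `Item`, `Price` and the labels `r1` to `r3` are illustrative assumptions, not part of the course data.

```python
import numpy as np
import pandas as pd

# A small "worksheet": two columns, three labelled rows, one empty cell.
sheet = pd.DataFrame(
    {"Item": ["Pen", "Book", "Bag"], "Price": [1.5, np.nan, 20.0]},
    index=["r1", "r2", "r3"],
)

print(type(sheet).__name__)           # DataFrame (the worksheet)
print(type(sheet["Price"]).__name__)  # Series (one column)
print(list(sheet.index))              # ['r1', 'r2', 'r3'] (the row headings)
print(sheet.loc["r2"])                # one row, looked up by its label
print(sheet["Price"].isna().sum())    # 1 (the empty cell is stored as NaN)
```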

## **1. Importing Pandas and Numpy**

In [2]:
import pandas as pd
import numpy as np

# How to Create a Pandas DataFrame

You can create a pandas DataFrame from:

1. Arrays
2. Dictionaries
3. CSV Files

In [18]:
# Creating an array
data = np.array([[1,4],[2,5],[3,6]])
data

array([[1, 4],
       [2, 5],
       [3, 6]])

In [None]:
# Creating a DataFrame with custom row labels (index) and column names
df = pd.DataFrame(data, index=["Row 1", "Row 2", "Row 3"], columns=["col1", "col2"])
df

| | col1 | col2 |
|---|---|---|
| Row 1 | 1 | 4 |
| Row 2 | 2 | 5 |
| Row 3 | 3 | 6 |
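
As a small sketch of what the `index` and `columns` arguments change, the example below builds the same array twice, once without labels and once with them. The names `df_default` and `df_named` are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 4], [2, 5], [3, 6]])

# Without index/columns, pandas falls back to integer labels 0, 1, 2, ...
df_default = pd.DataFrame(data)
print(list(df_default.index))    # [0, 1, 2]
print(list(df_default.columns))  # [0, 1]

# With explicit labels, a single cell can be looked up by row and column name.
df_named = pd.DataFrame(
    data, index=["Row 1", "Row 2", "Row 3"], columns=["col1", "col2"]
)
print(df_named.loc["Row 2", "col2"])  # 5
```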


In [27]:
# creating a DataFrame from a dictionary

state = ["California", "Texas", "Florida", "New York"]
population = [39538223, 29145505, 21538187, 20201249]

dict_state = {"State": state, "Population": population}

In [28]:
# Creating the DataFrame from the dictionary
df_dict = pd.DataFrame(dict_state)
df_dict

| | State | Population |
|---|---|---|
| 0 | California | 39538223 |
| 1 | Texas | 29145505 |
| 2 | Florida | 21538187 |
| 3 | New York | 20201249 |
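
The dictionary keys become the column names, and every list must have the same length. As an optional extra step (an assumption here, not something the cells above do), one column can be promoted to the row labels with `set_index`; the name `by_state` is illustrative.

```python
import pandas as pd

dict_state = {
    "State": ["California", "Texas", "Florida", "New York"],
    "Population": [39538223, 29145505, 21538187, 20201249],
}
df_dict = pd.DataFrame(dict_state)
print(list(df_dict.columns))  # ['State', 'Population']

# Optional: use the State column as the row labels instead of 0..3.
by_state = df_dict.set_index("State")
print(by_state.loc["Texas", "Population"])  # 29145505
```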


## **2. Creating Series and DataFrames**

In [26]:
series = pd.Series([10, 20, 30], name='Scores', index =['a', 'b', 'c'])
series

a    10
b    20
c    30
Name: Scores, dtype: int64
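
A short sketch of what the custom index buys you: values in the Series can be read by label or by position, and arithmetic applies to every element at once.

```python
import pandas as pd

series = pd.Series([10, 20, 30], name="Scores", index=["a", "b", "c"])

print(series["b"])            # 20, looked up by label
print(series.iloc[0])         # 10, looked up by position
print(series.mean())          # 20.0
print((series * 2).tolist())  # [20, 40, 60]
```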

In [23]:

df = pd.DataFrame({
    'Name':['John','Mary','Alex'],
    'Age':[25,22,30],
    'Score':[88,92,95]
})
df

| | Name | Age | Score |
|---|---|---|---|
| 0 | John | 25 | 88 |
| 1 | Mary | 22 | 92 |
| 2 | Alex | 30 | 95 |


## **3. Reading & Writing Data**


Reading a CSV file gives you a DataFrame.

The CSV file should be in the same directory as your Jupyter notebook; otherwise, pass its full or relative path.

We use the pd.read_csv() function to read a CSV file into a DataFrame.
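
Writing works the other way round with `DataFrame.to_csv()`. The following is a minimal round-trip sketch; the file name `scores.csv` and the temporary directory are assumptions made so the example runs anywhere without the course dataset.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["John", "Mary", "Alex"], "Score": [88, 92, 95]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")  # hypothetical file name
    df.to_csv(path, index=False)  # index=False: do not write the 0, 1, 2 row labels
    df_back = pd.read_csv(path)   # read the file back into a new DataFrame

print(list(df_back.columns))     # ['Name', 'Score']
print(df_back["Score"].tolist())  # [88, 92, 95]
```

Leaving out `index=False` would write the row labels as an extra unnamed first column, which then shows up as `Unnamed: 0` when the file is read back.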

In [None]:
# reading data from a CSV file
df_csv = pd.read_csv("StudentsPerformance.csv")
# Displaying the DataFrame (long DataFrames are shown truncated, with "..." in the middle)
df_csv

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | female | group E | master's degree | standard | completed | 88 | 99 | 95 |
| 996 | male | group C | high school | free/reduced | none | 62 | 55 | 55 |
| 997 | female | group C | high school | free/reduced | completed | 59 | 71 | 65 |
| 998 | female | group D | some college | standard | completed | 68 | 78 | 77 |


In [31]:
# Showing the first 5 rows of the DataFrame
df_csv.head()

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |


In [32]:
# Showing the last 5 rows of the DataFrame
df_csv.tail()

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 995 | female | group E | master's degree | standard | completed | 88 | 99 | 95 |
| 996 | male | group C | high school | free/reduced | none | 62 | 55 | 55 |
| 997 | female | group C | high school | free/reduced | completed | 59 | 71 | 65 |
| 998 | female | group D | some college | standard | completed | 68 | 78 | 77 |
| 999 | female | group D | some college | free/reduced | none | 77 | 86 | 86 |


In [35]:
# To see the summary of the DataFrame
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64 
 6   reading score                1000 non-null   int64 
 7   writing score                1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
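
The Non-Null Count column is the quickest place to spot missing values. The dataset above has none, so here is a minimal sketch on a made-up frame with one missing score (the frame and its values are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Mary", "Alex"], "Score": [88.0, np.nan, 95.0]}
)

df.info()  # the Score row reports "2 non-null" out of 3 entries

# The same numbers are available programmatically.
print(int(df["Score"].count()))       # 2 non-missing values
print(int(df["Score"].isna().sum()))  # 1 missing value
print(df["Score"].dtype)              # float64
```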


In [39]:
# You can add an argument to either the head() or tail() methods to specify the number of rows to display. 
# For example, to see the first 10 rows, you can use df_csv.head(10).

df_csv.head(10)

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |
| 5 | female | group B | associate's degree | standard | none | 71 | 83 | 78 |
| 6 | female | group B | some college | standard | completed | 88 | 95 | 92 |
| 7 | male | group B | some college | free/reduced | none | 40 | 43 | 39 |
| 8 | male | group D | high school | free/reduced | completed | 64 | 64 | 67 |
| 9 | female | group B | high school | free/reduced | none | 38 | 60 | 50 |


In [40]:
# Showing the last 10 rows of the DataFrame
df_csv.tail(10)

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 990 | male | group E | high school | free/reduced | completed | 86 | 81 | 75 |
| 991 | female | group B | some high school | standard | completed | 65 | 82 | 78 |
| 992 | female | group D | associate's degree | free/reduced | none | 55 | 76 | 76 |
| 993 | female | group D | bachelor's degree | free/reduced | none | 62 | 72 | 74 |
| 994 | male | group A | high school | standard | none | 63 | 63 | 62 |
| 995 | female | group E | master's degree | standard | completed | 88 | 99 | 95 |
| 996 | male | group C | high school | free/reduced | none | 62 | 55 | 55 |
| 997 | female | group C | high school | free/reduced | completed | 59 | 71 | 65 |
| 998 | female | group D | some college | standard | completed | 68 | 78 | 77 |
| 999 | female | group D | some college | free/reduced | none | 77 | 86 | 86 |
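
To see the `head()` and `tail()` behaviour without the course dataset, here is a minimal sketch on a made-up ten-row frame (the frame and the column name `n` are assumptions).

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

print(df.head()["n"].tolist())   # [0, 1, 2, 3, 4]  (default is 5 rows)
print(df.head(3)["n"].tolist())  # [0, 1, 2]
print(df.tail(3)["n"].tolist())  # [7, 8, 9]
```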


In [42]:
# Using the shape attribute to get the dimensions of the DataFrame
# It tells you the number of rows and columns in the DataFrame as a tuple (rows, columns).
df_csv.shape

(1000, 8)
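
Because `shape` is an ordinary tuple, it can be unpacked into two variables. A small sketch on a made-up frame; `n_rows` and `n_cols` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Alex"],
    "Age": [25, 22, 30],
    "Score": [88, 92, 95],
})

n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 3
print(len(df))         # 3, the number of rows only
print(df.size)         # 9, rows multiplied by columns
```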

In [None]:
# Display all rows
# By default pandas truncates long DataFrames; setting display.max_rows to None shows every row:
pd.set_option('display.max_rows', None)
df_csv
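
Note that `pd.set_option` changes the setting for the rest of the session. If you only want the full display once, one possible approach is `pd.option_context`, sketched below on a made-up 100-row frame (an assumption standing in for the course dataset).

```python
import pandas as pd

df = pd.DataFrame({"n": range(100)})

# Temporarily lift the row limit; the previous setting is restored
# automatically when the with-block ends.
with pd.option_context("display.max_rows", None):
    print(df)  # prints all 100 rows

# Alternatively, undo an earlier pd.set_option call explicitly.
pd.reset_option("display.max_rows")
print(pd.get_option("display.max_rows"))  # 60, the pandas default
```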