## 🔹 1. What is Pandas?

Pandas is a Python library for data manipulation and analysis. Its central object is the DataFrame, a table with labeled rows and columns, and you’ll use it almost every time you work with data.

#### **🔧 Step 1: Install Pandas**

In [2]:
%pip install pandas

Note: you may need to restart the kernel to use updated packages.


#### **Step 2: Import Pandas & Create a Simple DataFrame**

In [3]:
import pandas as pd

# Simple data

data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Los Angeles', 'Chicago']
}

df  = pd.DataFrame(data)

print(df)


      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
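The dictionary above maps each column name to a list of values. A DataFrame can also be built row by row from a list of dictionaries. A minimal sketch, where the names `records` and `df_from_records` are only illustrative:

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names
records = [
    {"name": "Alice", "age": 25, "city": "New York"},
    {"name": "Bob", "age": 30, "city": "Los Angeles"},
    {"name": "Charlie", "age": 35, "city": "Chicago"},
]

df_from_records = pd.DataFrame(records)
print(df_from_records)
```

Both forms should produce the same table; which one reads better usually depends on whether your data arrives column-wise or row-wise.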


#### **Step 3: Inspect the Data**

In [4]:
print("* First 5 rows of the DataFrame:")
print(df.head())

print("\n* Summary: columns, types, nulls:\n")
df.info()  # info() prints its report itself and returns None

print("\n* Stats for numeric columns:\n")
print(df.describe())

print("\n* Column names:")
print(df.columns)

print("\n* (rows, columns):")
print(df.shape)

* First 5 rows of the DataFrame:
      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

* Summary: columns, types, nulls:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   age     3 non-null      int64 
 2   city    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes

* Stats for numeric columns:

        age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0

* Column names:
Index(['name', 'age', 'city'], dtype='object')

* (rows, columns):
(3, 3)


#### **🧪 Mini Exercise**

1. Create a DataFrame with 4 people, their names, salaries, and job titles.

2. Print:

    First 2 rows

    Column names

    Mean salary

In [5]:
data = {
    'name' : ['Alice', 'Bob', 'Charlie'],
    'salary' : [70000, 80000, 90000],
    'job' : ['Engineer', 'Doctor', 'Artist'],
}

df = pd.DataFrame(data)

In [6]:
print(df.head(2))

print(df.columns)

mean_salary = df['salary'].mean()
print("Mean salary:", round(mean_salary, 2))

    name  salary       job
0  Alice   70000  Engineer
1    Bob   80000    Doctor
Index(['name', 'salary', 'job'], dtype='object')
Mean salary: 80000.0


### **📘 Lesson 2: Selecting, Filtering & Sorting Data**

#### **🔹1. Selecting columns**

In [7]:
print("Salary:\n")
print(df['salary'])

print("name & job:\n")
print(df[['name', 'job']])

Salary:

0    70000
1    80000
2    90000
Name: salary, dtype: int64
name & job:

      name       job
0    Alice  Engineer
1      Bob    Doctor
2  Charlie    Artist


#### **🔹2. Selecting Rows by index**

In [8]:
print("Row with index 0:\n")
print(df.loc[0])

print("Row at position 1:\n")
print(df.iloc[1])

Row with index 0:

name         Alice
salary       70000
job       Engineer
Name: 0, dtype: object
Row at position 1:

name         Bob
salary     80000
job       Doctor
Name: 1, dtype: object


* loc[]: label-based (actual index)

* iloc[]: position-based (row number)
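With the default index 0, 1, 2 the two look identical, so the difference is easier to see when the labels are not numbers. A minimal sketch, assuming a hypothetical DataFrame `people` indexed by name (it is not the lesson's `df`):

```python
import pandas as pd

# Hypothetical data, indexed by name instead of the default 0, 1, 2
people = pd.DataFrame(
    {"salary": [70000, 80000, 90000], "job": ["Engineer", "Doctor", "Artist"]},
    index=["Alice", "Bob", "Charlie"],
)

# loc looks the row up by its index label
bob_by_label = people.loc["Bob"]

# iloc looks the row up by its 0-based position, whatever the labels are
bob_by_position = people.iloc[1]

print(bob_by_label.equals(bob_by_position))  # both select the same row

# Slices differ too: loc includes the end label, iloc excludes the end position
label_slice = people.loc["Alice":"Bob"]   # Alice and Bob
position_slice = people.iloc[0:1]         # Alice only
print(len(label_slice), len(position_slice))
```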

#### **🔹 3. Filtering Rows with Conditions**

In [9]:
print("People with salary > 75000:\n")
high_paid = df[df['salary'] > 75000]
print(high_paid)

print("Engineers:\n")
engineers = df[df['job'] == 'Engineer']
print(engineers)

People with salary > 75000:

      name  salary     job
1      Bob   80000  Doctor
2  Charlie   90000  Artist
Engineers:

    name  salary       job
0  Alice   70000  Engineer
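Conditions can be combined. The sketch below assumes a small stand-in DataFrame called `staff` (a hypothetical name) with the same columns as the lesson's data. It uses `&` for AND, `|` for OR, `~` for NOT, and `isin()` for membership. Each condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons such as `>`.

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "salary": [70000, 80000, 90000],
    "job": ["Engineer", "Doctor", "Artist"],
})

# AND: both conditions must hold
well_paid_doctors = staff[(staff["salary"] > 75000) & (staff["job"] == "Doctor")]

# OR: at least one condition must hold
engineers_or_top = staff[(staff["job"] == "Engineer") | (staff["salary"] >= 90000)]

# Membership: job is any of the listed values
artists_or_doctors = staff[staff["job"].isin(["Artist", "Doctor"])]

# NOT: invert a condition with ~
not_engineers = staff[~(staff["job"] == "Engineer")]

print(well_paid_doctors)
print(engineers_or_top)
```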


#### **🔹 4. Sorting Rows**

In [10]:
# Sort by salary:
print (df.sort_values('salary'))

print("\n")
# sort by name (descending):
print (df.sort_values('name', ascending=False))

      name  salary       job
0    Alice   70000  Engineer
1      Bob   80000    Doctor
2  Charlie   90000    Artist


      name  salary       job
2  Charlie   90000    Artist
1      Bob   80000    Doctor
0    Alice   70000  Engineer
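Two things about `sort_values` are worth a quick sketch: it can sort by several columns at once, and it returns a new DataFrame that keeps the original index labels. The `staff` DataFrame and the extra row "Dana" below are made up for illustration.

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana"],   # "Dana" is a made-up row
    "salary": [70000, 80000, 90000, 80000],
    "job": ["Engineer", "Doctor", "Artist", "Engineer"],
})

# Sort by job A-Z, then by salary (highest first) within each job
by_job_then_salary = staff.sort_values(["job", "salary"], ascending=[True, False])
print(by_job_then_salary)

# The sorted copy keeps the old index labels;
# reset_index(drop=True) renumbers the rows 0, 1, 2, ...
ranked = staff.sort_values("salary", ascending=False).reset_index(drop=True)
print(ranked)

# The original DataFrame is still in its original order
print(staff)
```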


#### **🧪 Exercise Time!**

1. Add one more person to the data (your choice).

2. Select only the 'name' and 'job' columns.

3. Filter people with salary over 75000.

4. Sort the DataFrame by salary in descending order.

In [11]:
data = {
    'name' : ['Alice', 'Bob', 'Charlie', 'Aminul'],
    'salary' : [70000, 80000, 90000, 100000],
    'job' : ['Engineer', 'Doctor', 'Artist', 'Engineer'],
}

df = pd.DataFrame(data)
print(df)

      name  salary       job
0    Alice   70000  Engineer
1      Bob   80000    Doctor
2  Charlie   90000    Artist
3   Aminul  100000  Engineer


In [12]:
print (df[['name','job']])

      name       job
0    Alice  Engineer
1      Bob    Doctor
2  Charlie    Artist
3   Aminul  Engineer


In [13]:
print(df[df['salary']> 75000])

      name  salary       job
1      Bob   80000    Doctor
2  Charlie   90000    Artist
3   Aminul  100000  Engineer


In [14]:
print(df.sort_values('salary', ascending = False))

      name  salary       job
3   Aminul  100000  Engineer
2  Charlie   90000    Artist
1      Bob   80000    Doctor
0    Alice   70000  Engineer


### **📘 Lesson 3: Modifying & Cleaning Data**

In [15]:
data = {
    'name' : ['Alice', 'Bob', 'Charlie', 'Aminul'],
    'salary' : [70000, 80000, 90000, 100000],
    'job' : ['Engineer', 'Doctor', 'Artist', 'Engineer'],
}

df = pd.DataFrame(data)

#### **🔹 1. Adding a New Column**

In [16]:
df['tax'] = df['salary'] * 0.2

print(df)

      name  salary       job      tax
0    Alice   70000  Engineer  14000.0
1      Bob   80000    Doctor  16000.0
2  Charlie   90000    Artist  18000.0
3   Aminul  100000  Engineer  20000.0
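Assigning to `df['tax']` changes `df` in place. If you would rather leave the original untouched, `assign()` returns a new DataFrame with the extra columns. A minimal sketch, where `staff`, `TAX_RATE` and `net_salary` are illustrative names and the 20% rate is the one used above:

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "salary": [70000, 80000],
})

TAX_RATE = 0.2  # flat rate, as in the cell above

# assign() returns a new DataFrame; `staff` itself is not modified.
# A lambda receives the DataFrame built so far, so it can use the new 'tax' column.
with_tax = staff.assign(
    tax=staff["salary"] * TAX_RATE,
    net_salary=lambda d: d["salary"] - d["tax"],
)

print(with_tax)
print(staff.columns)  # still only 'name' and 'salary'
```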


#### **🔹 2. Modifying Values (e.g. give Engineers a raise)**

In [17]:
df.loc[df['job'] == 'Engineer', 'salary'] += 5000
print(df)

      name  salary       job      tax
0    Alice   75000  Engineer  14000.0
1      Bob   80000    Doctor  16000.0
2  Charlie   90000    Artist  18000.0
3   Aminul  105000  Engineer  20000.0
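The expression inside `loc` has two parts: a boolean mask that picks the rows, and a column name that picks the cell in each of those rows. A minimal sketch that spells the mask out as its own variable (`staff`, `is_engineer` and `got_raise` are hypothetical names):

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "salary": [70000, 80000, 90000],
    "job": ["Engineer", "Doctor", "Artist"],
})

# Boolean Series with one True/False value per row
is_engineer = staff["job"] == "Engineer"
print(is_engineer)

# Rows where the mask is True, column 'salary': add 5000 in place
staff.loc[is_engineer, "salary"] += 5000

# The same mask can be reused, for example to record who got the raise
staff["got_raise"] = is_engineer
print(staff)
```

Doing the selection and the assignment in a single `loc` call is the reliable form; chaining two selections, as in `staff[is_engineer]["salary"] += 5000`, may modify a temporary copy instead of `staff`.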


#### **🔹 3. Handling Missing Values**

In [18]:
df.loc[1, 'job'] = None  # Remove Bob's job

# Check for null values:
print(df.isnull())

# Fill missing jobs with "Unknown". fillna() returns a new Series;
# df itself is unchanged unless the result is assigned back.
df['job'].fillna('Unknown')

    name  salary    job    tax
0  False   False  False  False
1  False   False   True  False
2  False   False  False  False
3  False   False  False  False


0    Engineer
1     Unknown
2      Artist
3    Engineer
Name: job, dtype: object
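Note that `fillna` returns a new Series rather than changing the DataFrame, which is why the cell above displays the filled values while a later `print(df)` still shows `None` for Bob. A minimal sketch of the difference, using a hypothetical stand-in DataFrame `staff`:

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "job": ["Engineer", None, "Artist"],
})

# Count missing values per column
missing_per_column = staff.isnull().sum()
print(missing_per_column)

# fillna() alone produces a filled copy; staff still has the gap
filled_copy = staff["job"].fillna("Unknown")
still_missing = staff["job"].isnull().sum()
print(still_missing)

# Assign the result back to keep the change
staff["job"] = staff["job"].fillna("Unknown")
print(staff)

# Alternative: drop the rows with a missing job instead of filling them
# staff = staff.dropna(subset=["job"])
```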

#### **🔹 4. Renaming and Dropping Columns**

In [19]:
# Rename 'salary' to 'annual_salary':
df = df.rename(columns={'salary': 'annual_salary'})

# Drop the 'tax' column:
df = df.drop(columns=['tax'])

print(df)

      name  annual_salary       job
0    Alice          75000  Engineer
1      Bob          80000      None
2  Charlie          90000    Artist
3   Aminul         105000  Engineer


#### **🧪 Exercise**

1. Add a new column called bonus = 15% of salary

2. Increase salary by 5000 if job is 'Artist'

3. Set Aminul’s job to None and fill it with 'Freelancer'

4. Rename column name to employee_name

5. Drop the bonus column

In [20]:
df['bonus'] = df['annual_salary'] * 0.15

df.loc[df['job'] == 'Artist', 'annual_salary'] += 5000

df.loc[3, 'job'] = None
df['job'] = df['job'].fillna('Freelancer')

df = df.rename(columns={'name': 'employee_name'})

df = df.drop(columns=['bonus'])

print(df)


  employee_name  annual_salary         job
0         Alice          75000    Engineer
1           Bob          80000  Freelancer
2       Charlie          95000      Artist
3        Aminul         105000  Freelancer


### **📘 Lesson 4: Real Dataset Practice (CSV)**

#### **🔧 Step 1: Get a Sample Dataset**
Let’s use a popular one: the Titanic passengers dataset.
It contains data like age, gender, class, and survival status.

🔽 Download it here:
Link: https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv

Save it as: titanic.csv

#### **🔹 Step 2: Load the CSV with Pandas**

In [21]:
df = pd.read_csv('titanic.csv')

df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
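`read_csv` takes a file path, a URL, or any file-like object, and it has options for choosing columns and the index. The sketch below reads a few inlined rows so it runs without the download. The first two rows are copied from the output above; the third ("Doe, Mr. John") is made up to show how an empty field is loaded.

```python
import io
import pandas as pd

# A few rows in the same layout as titanic.csv, inlined so the sketch runs offline
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22\n'
    '3,1,3,"Heikkinen, Miss. Laina",female,26\n'
    '999,0,3,"Doe, Mr. John",male,\n'   # made-up row with a missing age
)

sample = pd.read_csv(
    sample_csv,
    usecols=["PassengerId", "Survived", "Name", "Age"],  # load only these columns
    index_col="PassengerId",                             # use this column as the index
)

print(sample)
print(sample.dtypes)  # Age is float64 because the empty field becomes NaN
```

Quoted fields such as `"Braund, Mr. Owen Harris"` keep their internal comma, and the empty `Age` field is read as `NaN`.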


#### **🔹 Step 3: Basic Exploration**

In [22]:
print(df.shape)       # Row and column count
print(df.columns)     # Column names

print("Data types and nulls:\n")
df.info()             # Call the method; it prints its own report
print("Summary stats for numbers:\n")
print(df.describe())  # Summary statistics for numeric columns

(891, 12)
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
Data types and nulls:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
...

#### **🔹 Step 4: Practice Questions**
Try answering these using Pandas:

1. How many passengers are in the dataset?

2. How many survived?

3. What's the average age?

4. How many children (age < 12) were there?

5. What percentage of passengers were male?

In [23]:
# 1. Total passengers
passenger = df['PassengerId'].count()
print("Total passengers:", passenger)

# 2. Total survived
survived = df['Survived'].sum()
print("Total survived:", survived)

# 3. Average age
avg_age = df['Age'].mean()
print("Average age:", round(avg_age, 2))

# 4. Number of children
children = (df['Age'] < 12).sum()
print("Number of children (age < 12):", children)

# 5. Percentage of male passengers
male_percentage = (df['Sex'] == 'male').mean() * 100
print(f"Percentage of male passengers: {male_percentage:.2f}%")


Total passengers: 891
Total survived: 342
Average age: 29.7
Number of children (age < 12): 68
Percentage of male passengers: 64.76%
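Answers 4 and 5 rely on the same idea: a comparison produces one True/False value per row, so `sum()` counts the matches and `mean()` gives their share. A minimal sketch on a made-up sample with Titanic-style columns (these are not real passenger records), which also groups a survival rate by sex:

```python
import pandas as pd

# Made-up sample with Titanic-style columns (not real passenger records)
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Age": [22.0, 38.0, None, 4.0, 35.0, 9.0],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# True counts as 1, so sum() counts matching rows; a missing age compares as False
n_children = (sample["Age"] < 12).sum()

# mean() of a boolean Series is the fraction of True values
pct_male = (sample["Sex"] == "male").mean() * 100

# mean() skips missing ages by default
avg_age = sample["Age"].mean()

# Survival rate per group: mean of the 0/1 column within each sex
survival_by_sex = sample.groupby("Sex")["Survived"].mean()

print(n_children, pct_male, round(avg_age, 2))
print(survival_by_sex)
```

Because missing ages are skipped, the average age describes only the passengers whose age is recorded.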
