# 🐼 Pandas Basics for Data Analysis & ML


**Pandas** is a Python library for data analysis, data cleaning, and data manipulation.
It is especially useful in **Machine Learning** for loading and preparing structured (tabular) data before model training.


In [75]:
import pandas as pd

## 📌 1. Creating a Series

In [76]:

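# A Series is a one-dimensional labeled array; `index` supplies the labels.
numbers = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Mixed value types are allowed; the dtype is then reported as `object`.
person = ["Akshay", "Rao", 24, "ParvaM"]
series = pd.Series(person, index=['First Name', 'Last Name', 'Age', 'Company'])
print(series)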


First Name    Akshay
Last Name        Rao
Age               24
Company       ParvaM
dtype: object
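
Once a Series has labels, individual values can be read either by label or by position. The sketch below rebuilds the same Series under an illustrative name (`person_series`, not part of the original notebook) and shows three common lookups.

```python
import pandas as pd

# Same values as the Series above; `person_series` is an illustrative name.
person_series = pd.Series(
    ["Akshay", "Rao", 24, "ParvaM"],
    index=["First Name", "Last Name", "Age", "Company"],
)

first_name = person_series["First Name"]  # label-based lookup
age = person_series.loc["Age"]            # explicit label-based lookup
company = person_series.iloc[3]           # position-based lookup (0-indexed)

print(first_name, age, company)
```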


## 📦 2. Creating a DataFrame

In [77]:

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)


      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


## 🔍 3. Inspecting Data

In [78]:

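print("Head:\n", df.head())          # first 5 rows (all 3 rows here)
print("Info:\n")
df.info()                            # prints its report directly and returns None
print("Describe:\n", df.describe())  # summary statistics for numeric columns only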


Head:
       Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
Info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes
Describe:
         Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
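
A few attributes complement `head()`, `info()`, and `describe()` for a quick look at a DataFrame's structure. This is a minimal sketch that rebuilds the same table under an illustrative name (`df_inspect`).

```python
import pandas as pd

# Same data as the DataFrame above; `df_inspect` is an illustrative name.
df_inspect = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

n_rows, n_cols = df_inspect.shape          # (rows, columns) as a tuple
column_names = df_inspect.columns.tolist() # column labels as a plain list

print(n_rows, n_cols)
print(column_names)
print(df_inspect.dtypes)                   # one dtype per column
```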


## 🎯 4. Selecting Data (Rows / Columns)

In [79]:
print("Single column:\n", df['City'])
print("Multiple columns:\n", df[['Name', 'City']])
print("Row by index:\n", df.iloc[2])  # iloc selects by integer position (here the third row)


Single column:
 0       New York
1    Los Angeles
2        Chicago
Name: City, dtype: object
Multiple columns:
       Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago
Row by index:
 Name    Charlie
Age          35
City    Chicago
Name: 2, dtype: object
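
With the default `RangeIndex`, positions and labels coincide, so the difference between `iloc` (position) and `loc` (label) is easy to miss. The sketch below assumes a hypothetical string index (`x`, `y`, `z`) to make the distinction visible.

```python
import pandas as pd

# Same data as above, but with hypothetical string labels as the index.
df_sel = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    },
    index=["x", "y", "z"],
)

by_position = df_sel.iloc[2]                      # third row, by integer position
by_label = df_sel.loc["z"]                        # same row, by index label
cell = df_sel.loc["y", "City"]                    # single value: row label, column label
subset = df_sel.loc[["x", "z"], ["Name", "Age"]]  # selected rows and columns

print(by_label)
print(cell)
print(subset)
```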


## 🔍 5. Filtering Rows

In [80]:

# Filter rows where Age > 28
print(df[df['Age'] > 28])


      Name  Age         City
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
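
Filters can also be combined. Each condition needs its own parentheses, and `&` (and), `|` (or), and `~` (not) replace Python's `and`, `or`, and `not`. The following sketch uses an illustrative copy of the table (`df_filter`).

```python
import pandas as pd

# Same data as above; `df_filter` is an illustrative name.
df_filter = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Two conditions joined with &; each one is wrapped in parentheses.
older_not_chicago = df_filter[(df_filter["Age"] > 28) & (df_filter["City"] != "Chicago")]

# isin() keeps rows whose value appears in the given list.
in_cities = df_filter[df_filter["City"].isin(["New York", "Chicago"])]

# query() expresses the same kind of filter as a string.
queried = df_filter.query("Age > 28")

print(older_not_chicago)
print(in_cities)
print(queried)
```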


## ➕➖ 6. Adding / Removing Columns

In [81]:

df['Salary'] = [50000, 60000, 70000]  # Add new column
print("With Salary:\n", df)

df = df.drop('City', axis=1)          # Remove column (axis=1 means columns; drop returns a new DataFrame)
print("Without City:\n", df)


With Salary:
       Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   35      Chicago   70000
Without City:
       Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
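
New columns are often computed from existing ones rather than typed in by hand. As a small sketch, the hypothetical `Monthly` column below is derived from `Salary`, and `drop(columns=...)` removes several columns at once.

```python
import pandas as pd

# Illustrative table; `df_pay` and the `Monthly` column are not in the original notebook.
df_pay = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Derived column: element-wise arithmetic on an existing column.
df_pay["Monthly"] = df_pay["Salary"] / 12

# drop() returns a new DataFrame; df_pay itself is left unchanged.
trimmed = df_pay.drop(columns=["Age", "Monthly"])

print(df_pay)
print(trimmed)
```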


## ❓ 7. Handling Missing Data

In [82]:

df2 = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4, None, 6],
    'C': [5, 3, 4]
})
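print("Original:\n", df2)            # None is stored as NaN, so A and B become float columns
print("Fill NA:\n", df2.fillna(0))   # returns a copy with every NaN replaced by 0
print("Drop NA:\n", df2.dropna())    # returns a copy without rows that contain any NaN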

Original:
      A    B  C
0  1.0  4.0  5
1  2.0  NaN  3
2  NaN  6.0  4
Fill NA:
      A    B  C
0  1.0  4.0  5
1  2.0  0.0  3
2  0.0  6.0  4
Drop NA:
      A    B  C
0  1.0  4.0  5
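
Filling with a constant or dropping rows are not the only options. A common alternative in ML preprocessing is to count the gaps first and then fill each column with its own mean. The sketch below uses an illustrative copy of the data (`df_na`); whether mean imputation is appropriate depends on the dataset.

```python
import pandas as pd

# Same values as df2 above; `df_na` is an illustrative name.
df_na = pd.DataFrame({
    "A": [1, 2, None],
    "B": [4, None, 6],
    "C": [5, 3, 4],
})

# Count missing values in each column.
missing_per_column = df_na.isna().sum()

# Passing a Series to fillna() fills each column with the matching value,
# here the column's own mean (NaN is ignored when the mean is computed).
filled = df_na.fillna(df_na.mean())

print(missing_per_column)
print(filled)
```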


## 📁 8. Reading Data from CSV

In [83]:

# Example read (assumes data.csv exists in the current working directory)
# df_csv = pd.read_csv('data.csv')
# print(df_csv.head())
print("Use pd.read_csv('filename.csv') to read a dataset.")


Use pd.read_csv('filename.csv') to read a dataset.
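
To try `pd.read_csv` without a file on disk, the CSV text can be wrapped in `io.StringIO`, which `read_csv` accepts in place of a filename. The CSV content below is made up for illustration and mirrors the earlier table.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.head())
print(df_csv.dtypes)  # column types are inferred from the text
```

With a real file, the same call takes the path instead, for example `pd.read_csv('data.csv')`.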


## 📊 9. Grouping and Aggregation

In [84]:

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR'],
    'Salary': [60000, 50000, 70000, 52000]
})
# Split rows by Department, then average the Salary values within each group
print(df.groupby('Department')['Salary'].mean())


Department
HR    51000.0
IT    65000.0
Name: Salary, dtype: float64
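
Several statistics can be computed per group in one pass by handing `agg()` a list of function names. This sketch reuses the same department data under an illustrative name (`df_dept`).

```python
import pandas as pd

# Same data as above; `df_dept` is an illustrative name.
df_dept = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "HR"],
    "Salary": [60000, 50000, 70000, 52000],
})

# One row per department, one column per aggregation.
summary = df_dept.groupby("Department")["Salary"].agg(["mean", "min", "max", "count"])

print(summary)
```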
