Here’s a **simple guide to Pandas** with the most important points and examples:

---

## ✅ 1. What is Pandas?

* Python library used for **data manipulation and analysis**.
* Built on **NumPy**.
* Works with **tables** (rows & columns).

---

## ✅ 2. Key Data Structures

### 🔹 `Series` (1D):

```python
import pandas as pd

s = pd.Series([10, 20, 30])
print(s)
```

### 🔹 `DataFrame` (2D table):

```python
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
```

---

## ✅ 3. Reading/Writing Data

### 🔹 CSV File:

```python
df = pd.read_csv('data.csv')       # Read
df.to_csv('output.csv', index=False)  # Write
```

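A minimal round-trip sketch of the two calls above. It uses an in-memory buffer (`io.StringIO`) instead of a real file, so it runs anywhere. The column names are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write to an in-memory "file"; index=False leaves out the 0, 1, ... row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
# Name,Age
# Alice,25
# Bob,30

# Read it back: read_csv accepts any file-like object, not only a file path
buffer.seek(0)
df_back = pd.read_csv(buffer)
print(df_back.equals(df))  # True
```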
---

## ✅ 4. Basic Operations

### 🔹 Head and Tail:

```python
df.head()      # First 5 rows
df.tail(3)     # Last 3 rows
```

### 🔹 Shape, Columns, Info:

```python
df.shape       # (rows, columns)
df.columns     # Index of column names (list(df.columns) gives a plain list)
df.info()      # Prints column dtypes, non-null counts and memory usage
```

---

## ✅ 5. Selecting Data

### 🔹 By Column:

```python
df['Age']             # Single column
df[['Name', 'Age']]   # Multiple columns
```

### 🔹 By Row (Index):

```python
df.iloc[0]    # First row by position
df.loc[0]     # Row with index label 0
```

---

## ✅ 6. Filtering (Conditions)

```python
df[df['Age'] > 25]     # Rows where Age > 25
```

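To combine conditions, pandas uses `&` (and), `|` (or) and `~` (not); Python's `and`/`or` raise an error on a Series. A small sketch with a made-up `City` column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Age": [25, 30, 35, 40],
    "City": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Each condition needs its own parentheses: & and | bind tighter than > and ==
older_in_pune = df[(df["Age"] > 25) & (df["City"] == "Pune")]
print(older_in_pune)        # only Cara

young_or_mumbai = df[(df["Age"] < 30) | (df["City"] == "Mumbai")]
print(young_or_mumbai)      # Alice and Dan

# ~ negates a condition; isin() tests membership in a list
not_delhi = df[~df["City"].isin(["Delhi"])]
print(not_delhi["Name"].tolist())  # ['Alice', 'Cara', 'Dan']
```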
---

## ✅ 7. Adding / Removing Columns

### 🔹 Add:

```python
df['Salary'] = [50000, 60000]
```

### 🔹 Remove:

```python
df.drop('Salary', axis=1, inplace=True)
```

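A short sketch of what `inplace=True` changes. Without it, `drop()` returns a new DataFrame and leaves the original alone; `columns="Salary"` is an equivalent spelling of `'Salary', axis=1`.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df["Salary"] = [50000, 60000]   # the list length must equal the number of rows

# Without inplace=True, drop() returns a NEW DataFrame and df is unchanged
smaller = df.drop(columns="Salary")
print(list(df.columns))       # ['Name', 'Age', 'Salary']
print(list(smaller.columns))  # ['Name', 'Age']

# A common idiom is to reassign instead of using inplace=True
df = df.drop(columns="Salary")
print(list(df.columns))       # ['Name', 'Age']
```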
---

## ✅ 8. Sorting

```python
df.sort_values('Age')                 # Ascending
df.sort_values('Age', ascending=False)  # Descending
```

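`sort_values` also accepts a list of columns, with one `ascending` flag per column. A sketch with a made-up `Dept` column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Dept": ["IT", "HR", "IT", "HR"],
    "Age": [25, 30, 35, 28],
})

# Sort by Dept (A to Z), then by Age (oldest first) inside each Dept
result = df.sort_values(["Dept", "Age"], ascending=[True, False])
print(result)

# sort_values keeps the old row labels; reset_index renumbers them 0, 1, 2, ...
result = result.reset_index(drop=True)
print(result["Name"].tolist())  # ['Bob', 'Dan', 'Cara', 'Alice']
```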
---

## ✅ 9. Aggregations

```python
df['Age'].mean()     # Average
df['Age'].sum()      # Total
df['Age'].max()      # Max
df.describe()        # All stats
```

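`agg()` computes several statistics in one call, and on a DataFrame a dict lets you choose a statistic per column. A sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35], "Salary": [50000, 60000, 70000]})

# Several statistics for one column
stats = df["Age"].agg(["mean", "min", "max"])
print(stats)    # mean 30, min 25, max 35

# A different statistic for each column
summary = df.agg({"Age": "mean", "Salary": "sum"})
print(summary)  # Age 30, Salary 180000
```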
---

## ✅ 10. Handling Missing Data

```python
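df.isnull()              # True wherever a value is missing (NaN)
df.fillna(0)             # New DataFrame with NaN replaced by 0 (df is unchanged)
df.dropna()              # New DataFrame without the rows that contain NaN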
```

---

## ✅ 11. Grouping

```python
df.groupby('Gender')['Age'].mean()   # mean Age for each value in a 'Gender' column
```

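The line above assumes a DataFrame that has a `Gender` column, which the earlier examples don't have. A self-contained sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Gender": ["F", "M", "F", "M"],
    "Age": [25, 30, 35, 40],
})

# Split rows by Gender, then take the mean Age of each group
means = df.groupby("Gender")["Age"].mean()
print(means)   # F 30.0, M 35.0

# Several statistics per group at once
print(df.groupby("Gender")["Age"].agg(["mean", "count"]))
```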
---

## ✅ 12. Merging & Joining

```python
pd.merge(df1, df2, on='ID')    # Merge on a common column (inner join by default: only IDs in both)
```

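A runnable sketch of the merge above, with two small made-up tables. It also shows `how="left"`, which keeps every row of the first table.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Cara"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Salary": [50000, 60000, 70000]})

# Default how="inner": keep only IDs present in BOTH tables (1 and 2)
inner = pd.merge(df1, df2, on="ID")
print(inner)

# how="left": keep every row of df1; Salary is NaN where df2 has no match (ID 3)
left = pd.merge(df1, df2, on="ID", how="left")
print(left)
```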
---

## ✅ 13. Apply Function

```python
df['Age'].apply(lambda x: x * 2)
```

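For plain arithmetic like `x * 2`, a vectorized expression gives the same result as `apply()` and is usually faster; `apply()` earns its place when the logic isn't simple math. A sketch:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35]})

# apply() calls the Python function once per value
doubled_apply = df["Age"].apply(lambda x: x * 2)

# The same result with vectorized arithmetic
doubled_vec = df["Age"] * 2
print(doubled_apply.equals(doubled_vec))  # True

# apply() is handy for logic that is not plain arithmetic
groups = df["Age"].apply(lambda x: "under 30" if x < 30 else "30+")
print(groups.tolist())  # ['under 30', '30+', '30+']
```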
---

## ✅ 14. Export to Excel, JSON

```python
df.to_excel("file.xlsx", index=False)   # needs an Excel engine such as openpyxl installed
df.to_json("file.json")
```

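When `to_json()` gets no path, it returns the JSON text as a string, which is handy for checking the layout. The default `orient` for a DataFrame is `"columns"`; this sketch uses `orient="records"` (one JSON object per row) with made-up data:

```python
import json

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# No path given, so the JSON comes back as a string
text = df.to_json(orient="records")
print(text)

# Parse it with the standard json module to inspect the structure
records = json.loads(text)
print(records[1]["Name"])  # Bob
```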
---

## ✅ 15. Convert to NumPy

```python
df.to_numpy()  # Converts to a NumPy array (df.values also works, but to_numpy() is recommended)
```

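A small sketch of the conversion. When columns have different types, NumPy needs one common dtype for the whole array, so mixed text and numbers come back as `object`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df.to_numpy()
print(arr)
# [[1 3]
#  [2 4]]
print(type(arr) is np.ndarray, arr.shape)  # True (2, 2)

# Mixed column types fall back to a common dtype (here: object)
mixed = pd.DataFrame({"name": ["Alice"], "age": [25]}).to_numpy()
print(mixed.dtype)  # object
```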
---



In [1]:
import pandas as pd

In [3]:
data=[10,20,30,40,50]
ser=pd.Series(data)
print(ser)

0    10
1    20
2    30
3    40
4    50
dtype: int64


In [4]:
# create a Series from a dict (the keys become the index labels)
data1={'a':10,'b':20,'c':30}
ser1=pd.Series(data1)
print(ser1)

a    10
b    20
c    30
dtype: int64


In [5]:
# create a Series from a list with a custom index (the labels play the role of dict keys)
data2=[10,20,30,40,50]
index=['a','b','c','d','e']
ser2=pd.Series(data2,index=index)
print(ser2)

a    10
b    20
c    30
d    40
e    50
dtype: int64

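With a custom index, a Series value can be reached by label or by position. Label slices include the end label, while position slices stop before the end. A sketch using the same `ser2` data:

```python
import pandas as pd

ser2 = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(ser2["b"])                 # 20, by label
print(ser2.iloc[1])              # 20, by position
print(ser2["b":"d"].tolist())    # [20, 30, 40]: label slice includes the end label
print(ser2.iloc[1:3].tolist())   # [20, 30]: position slice excludes the end
```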

In [None]:
# create a DataFrame from a dict of lists (each key becomes a column)
data={
  'name':["burhan","juzer","hozefa","mehlam"],
  'age':[19,18,20,19],
  'Girl-friend':['no','yes','yes','yes']
}
df=pd.DataFrame(data)
df

|   | name | age | Girl-friend |
|---|------|-----|-------------|
| 0 | burhan | 19 | no |
| 1 | juzer | 18 | yes |
| 2 | hozefa | 20 | yes |
| 3 | mehlam | 19 | yes |


In [10]:
# create a DataFrame from a list of dicts (each dict becomes a row)
data=[
  {'name':'burhan','age':19,'gender':'male'},
  {'name':'ABDE','age':20,'gender':'male'}
]
df=pd.DataFrame(data)
df

|   | name | age | gender |
|---|------|-----|--------|
| 0 | burhan | 19 | male |
| 1 | ABDE | 20 | male |


What is a CSV File?

CSV stands for Comma-Separated Values. It is a plain text file that stores tabular data (the kind of table you would open in Excel), where:

* Each row is one line in the file.
* The values in a row are separated by commas (`,`).
* The first line usually holds the column names (the header).

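To see those rules in action, this sketch parses a tiny CSV written as a string. The content is hypothetical. Note that an empty field becomes `NaN`, which also turns the `age` column into floats.

```python
import io

import pandas as pd

# Hypothetical CSV content: a header line, then one line per row
csv_text = """name,age,city
burhan,19,Pune
juzer,,Mumbai
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df["age"].isnull().tolist())  # [False, True]: the empty field is NaN
print(df["age"].dtype)              # float64, because NaN is a float
```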
In [6]:
import pandas as pd

df=pd.read_csv("sample_people.csv")
df.head(3)   # not displayed: a notebook cell shows only its last expression
df.tail()    # last 5 rows (shown below)

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist |
| 99 | 100 | b8D0aD3490FC7e1 | Mariah | Bernard | Male | pcopeland@example.org | (341)594-6554x44657 | 2016-11-15 | IT sales professional |


In [13]:
#accessing data from dataframe
df

|   | name | age | Girl-friend |
|---|------|-----|-------------|
| 0 | burhan | 19 | no |
| 1 | juzer | 18 | yes |
| 2 | hozefa | 20 | yes |
| 3 | mehlam | 19 | yes |


Perfect! You created a DataFrame like this:

```python
import pandas as pd

data = {
  'name': ["burhan", "juzer", "hozefa", "mehlam"],
  'age': [19, 18, 20, 19],
  'Girl-friend': ['no', 'yes', 'yes', 'yes']
}

df = pd.DataFrame(data)
print(df)
```

The output will be:

```
     name  age Girl-friend
0  burhan   19          no
1   juzer   18         yes
2  hozefa   20         yes
3  mehlam   19         yes
```

---

### 🔹 Now use `loc` and `iloc` with this

#### ✅ `iloc`: Use **index numbers** (0, 1, 2,...)

```python
df.iloc[0]         # Get 1st row (burhan's data)
df.iloc[1, 2]      # Get Girl-friend of juzer → 'yes'
```

✔ Output:

```python
df.iloc[0]
# name            burhan
# age                 19
# Girl-friend         no
```

```python
df.iloc[1, 2]
# 'yes'
```

---

#### ✅ `loc`: Use **row labels**

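Right now the row labels are 0, 1, 2, 3, the same as the positions, so `loc` selects the same rows as `iloc`. Two differences still apply: `loc` takes column **names** instead of column positions, and a `loc` slice includes its end label (`df.loc[0:2]` returns 3 rows, while `df.iloc[0:2]` returns 2):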

```python
df.loc[2]        # Get hozefa's row
df.loc[3, 'name']  # Get name from row 3 → 'mehlam'
```

✔ Output:

```python
df.loc[2]
# name            hozefa
# age                 20
# Girl-friend        yes
```

```python
df.loc[3, 'name']
# 'mehlam'
```

---

### ✨ Summary (for your DataFrame):

| Task                    | Code                        |
| ----------------------- | --------------------------- |
| Get all info of juzer   | `df.iloc[1]` or `df.loc[1]` |
| Get hozefa's girlfriend | `df.loc[2, 'Girl-friend']`  |
| Get 3rd row             | `df.iloc[2]`                |

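With custom row labels, `loc` and `iloc` stop being interchangeable. A sketch using hypothetical labels `'a'` to `'d'` on the same data:

```python
import pandas as pd

data = {
    "name": ["burhan", "juzer", "hozefa", "mehlam"],
    "age": [19, 18, 20, 19],
}
# Hypothetical row labels instead of the default 0, 1, 2, 3
df = pd.DataFrame(data, index=["a", "b", "c", "d"])

print(df.loc["c", "name"])   # hozefa (by label)
print(df.iloc[2, 0])         # hozefa (by position)

# df.loc[2] would now raise KeyError, because 2 is no longer a row label
print(df.loc["b":"c", "age"].tolist())  # [18, 20] (end label included)
print(df.iloc[1:3, 1].tolist())         # [18, 20] (end position excluded)
```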
---



In [14]:
df['name']

0    burhan
1     juzer
2    hozefa
3    mehlam
Name: name, dtype: object

In [5]:
import pandas as pd
data={
  'name':["burhan","juzer","hozefa","mehlam"],
  'age':[19,18,20,19],
  'Girl-friend':['no','yes','yes','yes']
}
df=pd.DataFrame(data)
df

|   | name | age | Girl-friend |
|---|------|-----|-------------|
| 0 | burhan | 19 | no |
| 1 | juzer | 18 | yes |
| 2 | hozefa | 20 | yes |
| 3 | mehlam | 19 | yes |


In [None]:
# iloc[row position, column position]
a=df.iloc[1,1]
print(a)

18


In [None]:
# loc[row label, column name]
b=df.loc[2,'name']
print(b)

hozefa


In [17]:
# access a single element by label with at (row 1, column 'name')
c=df.at[1,'name']
print(c)

juzer


In [19]:
# access a single element by position with iat (row 1, column 1)
d=df.iat[1,1]
print(d)

18


✅ **Yes, you're right**: the **result is the same**, but the **use case is slightly different**.

Let me explain in **super simple points**:

---

### ✅ `loc` vs `at`

| Point | `loc`                          | `at`                      |
| ----- | ------------------------------ | ------------------------- |
| ✔     | Works with **labels**          | Works with **labels**     |
| 🔁    | Can return **multiple values** | Only for **single value** |
| ⚡     | Slower (more general)          | Faster (optimized for 1)  |

#### Example:

```python
df.loc[1, 'age']   # ✔ works
df.at[1, 'age']    # ✔ same result, faster
```

---

### ✅ `iloc` vs `iat`

| Point | `iloc`                         | `iat`                     |
| ----- | ------------------------------ | ------------------------- |
| ✔     | Uses **index numbers**         | Uses **index numbers**    |
| 🔁    | Can return **multiple values** | Only for **single value** |
| ⚡     | Slower                         | Faster for single value   |

#### Example:

```python
df.iloc[1, 1]     # ✔ gives 18
df.iat[1, 1]      # ✔ same output, faster
```

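If you want to check the speed claim yourself, here is a rough sketch with the standard `timeit` module. The absolute timings depend on your machine and pandas version, so compare the four lines against each other rather than trusting any fixed number.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"name": ["burhan", "juzer", "hozefa", "mehlam"],
                   "age": [19, 18, 20, 19]})

# All four return the same single value...
assert df.loc[1, "age"] == df.at[1, "age"] == 18
assert df.iloc[1, 1] == df.iat[1, 1] == 18

# ...so time 10,000 lookups with each accessor
for label, stmt in [("loc", lambda: df.loc[1, "age"]),
                    ("at", lambda: df.at[1, "age"]),
                    ("iloc", lambda: df.iloc[1, 1]),
                    ("iat", lambda: df.iat[1, 1])]:
    seconds = timeit.timeit(stmt, number=10_000)
    print(f"{label:>4}: {seconds:.4f} s for 10,000 lookups")
```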
---

### ✅ Final Summary

| You want to get...          | Use    |
| --------------------------- | ------ |
| 1 value by label            | `at`   |
| 1 value by position         | `iat`  |
| Multiple values by label    | `loc`  |
| Multiple values by position | `iloc` |

---

So yes:
🟢 **The result is the same**
⚡ **But `at`/`iat` are faster and return only one value**



In [20]:
#data manipulation with dataframe
df

|   | name | age | Girl-friend |
|---|------|-----|-------------|
| 0 | burhan | 19 | no |
| 1 | juzer | 18 | yes |
| 2 | hozefa | 20 | yes |
| 3 | mehlam | 19 | yes |


In [21]:
#add a column
df['salary']=[10000,20000,30000,40000]
df

|   | name | age | Girl-friend | salary |
|---|------|-----|-------------|--------|
| 0 | burhan | 19 | no | 10000 |
| 1 | juzer | 18 | yes | 20000 |
| 2 | hozefa | 20 | yes | 30000 |
| 3 | mehlam | 19 | yes | 40000 |


In [24]:
# remove a column
df.drop('salary',axis=1,inplace=True)
# axis=1 means "drop a column"; the default axis=0 looks for a row label instead.
# inplace=True changes df itself; without it, drop() returns a new DataFrame and df stays unchanged.

In [25]:
df

|   | name | age | Girl-friend |
|---|------|-----|-------------|
| 0 | burhan | 19 | no |
| 1 | juzer | 18 | yes |
| 2 | hozefa | 20 | yes |
| 3 | mehlam | 19 | yes |


In [27]:
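# add 5 to every value in the age column
# note: each re-run adds another 5; the output below (+10) shows this cell was run twice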
df['age']=df['age']+5
df

|   | name | age | Girl-friend |
|---|------|-----|-------------|
| 0 | burhan | 29 | no |
| 1 | juzer | 28 | yes |
| 2 | hozefa | 30 | yes |
| 3 | mehlam | 29 | yes |


describe() is a summary function that gives quick statistics for the numeric columns: count, mean, standard deviation (std), min, the 25%/50%/75% percentiles and max.

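A small sketch on the name/age data: only the numeric `age` column is summarized by default, and `include="all"` adds text columns too.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["burhan", "juzer", "hozefa", "mehlam"],
    "age": [19, 18, 20, 19],
})

# Only numeric columns are summarized by default, so "name" is skipped
stats = df.describe()
print(stats)
print(stats.loc["mean", "age"])   # 19.0
print(stats.loc["count", "age"])  # 4.0

# include="all" also summarizes text columns (count, unique, top, freq)
print(df.describe(include="all"))
```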
In [60]:
import pandas as pd

df=pd.read_csv("sample_people.csv")
df

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 88F7B33d2bcf9f5 | Shelby | Terrell | Male | elijah57@example.net | 001-084-906-7849x73518 | 1945-10-26 | Games developer |
| 1 | 2 | f90cD3E76f1A9b9 | Phillip | Summers | Female | bethany14@example.com | 214.112.6044x4913 | 1910-03-24 | Phytotherapist |
| 2 | 3 | DbeAb8CcdfeFC2c | 1992-07-02 | Homeopath |  |  |  |  |  |
| 3 | 4 | A31Bee3c201ef58 | Yesenia | Martinez | Male | kaitlinkaiser@example.com | 584.094.6111 | 2017-08-03 | Market researcher |
| 4 | 5 | 1bA7A3dc874da3c | Lori | Todd | Male | buchananmanuel@example.net | 689-207-3558x7233 | 1938-12-01 | Veterinary surgeon |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist |


In [43]:
df.describe()

|       | Index |
| ----- | ----- |
| count | 100.0 |
| mean | 50.5 |
| std | 29.011492 |
| min | 1.0 |
| 25% | 25.75 |
| 50% | 50.5 |
| 75% | 75.25 |
| max | 100.0 |


In [44]:
df.dtypes

Index             int64
User Id          object
First Name       object
Last Name        object
Sex              object
Email            object
Phone            object
Date of birth    object
Job Title        object
dtype: object

In [35]:
df['Email'].head(5)

0          elijah57@example.net
1         bethany14@example.com
2         bthompson@example.com
3     kaitlinkaiser@example.com
4    buchananmanuel@example.net
Name: Email, dtype: object

In [49]:
# handling missing values: isnull() is True wherever a value is missing (NaN)
df.isnull()

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | False | False | False | False | False | False | False | False | False |
| 96 | False | False | False | False | False | False | False | False | False |
| 97 | False | False | False | False | False | False | False | False | False |
| 98 | False | False | False | False | False | False | False | False | False |


In [50]:
df.isnull().any() # True for each column with at least one missing value; any(axis=1) checks each row instead

Index            False
User Id          False
First Name       False
Last Name        False
Sex              False
Email            False
Phone            False
Date of birth    False
Job Title         True
dtype: bool

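A sketch of the two directions of `any()` on a small hypothetical frame, plus a common use of `axis=1`: listing the rows that have at least one gap.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with two missing values
df = pd.DataFrame({
    "name": ["burhan", "juzer", "hozefa"],
    "phone": ["111", np.nan, "333"],
    "job": ["dev", "chef", np.nan],
})

print(df.isnull().any())          # per column (axis=0): name False, phone True, job True
print(df.isnull().any(axis=1))    # per row: False, True, True

# Keep only the rows that have at least one missing value
incomplete = df[df.isnull().any(axis=1)]
print(incomplete["name"].tolist())  # ['juzer', 'hozefa']
```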
In [63]:
df.isnull().sum() # number of missing values in each column

Index            0
User Id          0
First Name       0
Last Name        0
Sex              1
Email            1
Phone            1
Date of birth    1
Job Title        4
dtype: int64

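Because `True` counts as 1, `sum()` counts the missing cells and `mean()` gives the fraction missing. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sex": ["Male", np.nan, "Female", "Male"],
    "job": [np.nan, np.nan, "Barrister", "Chef"],
})

# sum() counts the True values (missing cells) in each column
counts = df.isnull().sum()
print(counts)            # sex 1, job 2

# mean() of True/False is the fraction missing; * 100 makes it a percentage
percent = df.isnull().mean() * 100
print(percent)           # sex 25.0, job 50.0

# Total number of missing cells in the whole DataFrame
total = int(df.isnull().sum().sum())
print(total)             # 3
```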
In [58]:
# restore pandas' default limits on how many rows and columns are displayed
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')


In [62]:
df

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 88F7B33d2bcf9f5 | Shelby | Terrell | Male | elijah57@example.net | 001-084-906-7849x73518 | 1945-10-26 | Games developer |
| 1 | 2 | f90cD3E76f1A9b9 | Phillip | Summers | Female | bethany14@example.com | 214.112.6044x4913 | 1910-03-24 | Phytotherapist |
| 2 | 3 | DbeAb8CcdfeFC2c | 1992-07-02 | Homeopath |  |  |  |  |  |
| 3 | 4 | A31Bee3c201ef58 | Yesenia | Martinez | Male | kaitlinkaiser@example.com | 584.094.6111 | 2017-08-03 | Market researcher |
| 4 | 5 | 1bA7A3dc874da3c | Lori | Todd | Male | buchananmanuel@example.net | 689-207-3558x7233 | 1938-12-01 | Veterinary surgeon |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist |


In [None]:
# fill missing values: fillna returns a new Series, saved here as a new column
df['new phone']=df['Phone'].fillna('unknown')
df

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title | new phone |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 88F7B33d2bcf9f5 | Shelby | Terrell | Male | elijah57@example.net | 001-084-906-7849x73518 | 1945-10-26 | Games developer | 001-084-906-7849x73518 |
| 1 | 2 | f90cD3E76f1A9b9 | Phillip | Summers | Female | bethany14@example.com | 214.112.6044x4913 | 1910-03-24 | Phytotherapist | 214.112.6044x4913 |
| 2 | 3 | DbeAb8CcdfeFC2c | 1992-07-02 | Homeopath |  |  |  |  |  | unknown |
| 3 | 4 | A31Bee3c201ef58 | Yesenia | Martinez | Male | kaitlinkaiser@example.com | 584.094.6111 | 2017-08-03 | Market researcher | 584.094.6111 |
| 4 | 5 | 1bA7A3dc874da3c | Lori | Todd | Male | buchananmanuel@example.net | 689-207-3558x7233 | 1938-12-01 | Veterinary surgeon | 689-207-3558x7233 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer | 001-095-524-2112x257 |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister | 001-865-478-5157 |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer | 995-542-3004x76800 |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist | 001-273-685-6932x092 |


In [68]:
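# for numeric columns, a common choice is to fill missing values with the column mean
# first, add a random practice 'salary' column (one value per row)
import numpy as np
df['salary']=np.random.randint(30000,80000,size=len(df))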
df

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title | new phone | salary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 88F7B33d2bcf9f5 | Shelby | Terrell | Male | elijah57@example.net | 001-084-906-7849x73518 | 1945-10-26 | Games developer | 001-084-906-7849x73518 | 63312 |
| 1 | 2 | f90cD3E76f1A9b9 | Phillip | Summers | Female | bethany14@example.com | 214.112.6044x4913 | 1910-03-24 | Phytotherapist | 214.112.6044x4913 | 53213 |
| 2 | 3 | DbeAb8CcdfeFC2c | 1992-07-02 | Homeopath |  |  |  |  |  | unknown | 75044 |
| 3 | 4 | A31Bee3c201ef58 | Yesenia | Martinez | Male | kaitlinkaiser@example.com | 584.094.6111 | 2017-08-03 | Market researcher | 584.094.6111 | 65902 |
| 4 | 5 | 1bA7A3dc874da3c | Lori | Todd | Male | buchananmanuel@example.net | 689-207-3558x7233 | 1938-12-01 | Veterinary surgeon | 689-207-3558x7233 | 59497 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer | 001-095-524-2112x257 | 49383 |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister | 001-865-478-5157 | 58627 |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer | 995-542-3004x76800 | 50645 |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist | 001-273-685-6932x092 | 49587 |


In [73]:
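# drop a leftover 'Salary' column from an earlier run of the fillna cell below;
# errors='ignore' avoids a KeyError when that column does not exist yet
df.drop('Salary',axis=1,inplace=True,errors='ignore')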

In [None]:
# make some values missing (loc can both read and modify values)
df.loc[[4, 12, 25, 40], 'salary'] = np.nan
df

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title | new phone | salary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 88F7B33d2bcf9f5 | Shelby | Terrell | Male | elijah57@example.net | 001-084-906-7849x73518 | 1945-10-26 | Games developer | 001-084-906-7849x73518 | 63312.0 |
| 1 | 2 | f90cD3E76f1A9b9 | Phillip | Summers | Female | bethany14@example.com | 214.112.6044x4913 | 1910-03-24 | Phytotherapist | 214.112.6044x4913 | 53213.0 |
| 2 | 3 | DbeAb8CcdfeFC2c | 1992-07-02 | Homeopath |  |  |  |  |  | unknown | 75044.0 |
| 3 | 4 | A31Bee3c201ef58 | Yesenia | Martinez | Male | kaitlinkaiser@example.com | 584.094.6111 | 2017-08-03 | Market researcher | 584.094.6111 | 65902.0 |
| 4 | 5 | 1bA7A3dc874da3c | Lori | Todd | Male | buchananmanuel@example.net | 689-207-3558x7233 | 1938-12-01 | Veterinary surgeon | 689-207-3558x7233 |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer | 001-095-524-2112x257 | 49383.0 |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister | 001-865-478-5157 | 58627.0 |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer | 995-542-3004x76800 | 50645.0 |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist | 001-273-685-6932x092 | 49587.0 |


In [76]:
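# fill the NaN salaries with the column mean; the result goes into a NEW column 'Salary'
# (capital S), so the original 'salary' column still shows the NaN values for comparison
df['Salary']=df['salary'].fillna(df['salary'].mean())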
df

|   | Index | User Id | First Name | Last Name | Sex | Email | Phone | Date of birth | Job Title | new phone | salary | Salary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 88F7B33d2bcf9f5 | Shelby | Terrell | Male | elijah57@example.net | 001-084-906-7849x73518 | 1945-10-26 | Games developer | 001-084-906-7849x73518 | 63312.0 | 63312.0 |
| 1 | 2 | f90cD3E76f1A9b9 | Phillip | Summers | Female | bethany14@example.com | 214.112.6044x4913 | 1910-03-24 | Phytotherapist | 214.112.6044x4913 | 53213.0 | 53213.0 |
| 2 | 3 | DbeAb8CcdfeFC2c | 1992-07-02 | Homeopath |  |  |  |  |  | unknown | 75044.0 | 75044.0 |
| 3 | 4 | A31Bee3c201ef58 | Yesenia | Martinez | Male | kaitlinkaiser@example.com | 584.094.6111 | 2017-08-03 | Market researcher | 584.094.6111 | 65902.0 | 65902.0 |
| 4 | 5 | 1bA7A3dc874da3c | Lori | Todd | Male | buchananmanuel@example.net | 689-207-3558x7233 | 1938-12-01 | Veterinary surgeon | 689-207-3558x7233 |  | 52640.8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 5eFda7caAeB260E | Dennis | Barnes | Female | bmartin@example.org | 001-095-524-2112x257 | 1954-07-30 | Software engineer | 001-095-524-2112x257 | 49383.0 | 49383.0 |
| 96 | 97 | CCbFce93d3720bE | Steve | Patterson | Female | latasha46@example.net | 001-865-478-5157 | 1932-04-29 | Barrister | 001-865-478-5157 | 58627.0 | 58627.0 |
| 97 | 98 | 2fEc528aFAF0b69 | Wesley | Bray | Male | regina11@example.org | 995-542-3004x76800 | 1994-12-28 | Police officer | 995-542-3004x76800 | 50645.0 | 50645.0 |
| 98 | 99 | Adc7ad9B6e4A1Fe | Summer | Oconnell | Female | alexiscantrell@example.org | 001-273-685-6932x092 | 2012-04-12 | Broadcast journalist | 001-273-685-6932x092 | 49587.0 | 49587.0 |

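The cell above keeps both columns for comparison. In everyday use you would usually overwrite the same column. This sketch does that on made-up numbers and also shows the median, a common alternative when a few very large values would pull the mean up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"salary": [40000, np.nan, 60000, np.nan]})

# mean() skips NaN: (40000 + 60000) / 2 = 50000
mean_salary = df["salary"].mean()
df["salary"] = df["salary"].fillna(mean_salary)   # overwrite the same column
print(df["salary"].tolist())   # [40000.0, 50000.0, 60000.0, 50000.0]

# The median is less affected by one very large value than the mean
df2 = pd.DataFrame({"salary": [30000, 35000, np.nan, 500000]})
df2["salary"] = df2["salary"].fillna(df2["salary"].median())
print(df2["salary"].tolist())  # [30000.0, 35000.0, 35000.0, 500000.0]
```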