---
# DataFrame
    - Two-dimensional data structure of rows and columns, like a table
    - A collection of Series
    - Each column is a Series
    - Syntax: pd.DataFrame({"Name":["Haroon","Ali","John"]},index=['a','b','c'])
    - Columns can have different dtypes
    - All columns must have the same length
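A minimal sketch of the properties listed above, assuming only that pandas is installed. It reuses the Name/Age example, checks that a column is a Series with its own dtype, and shows that columns of unequal length are rejected.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Haroon", "Ali", "John"], "Age": [20, 32, 12]},
                  index=["a", "b", "c"])

# Each column is a Series, and each column keeps its own dtype
print(type(df["Age"]))
print(df.dtypes)

# Columns of different lengths are rejected when the DataFrame is built
length_error = None
try:
    pd.DataFrame({"Name": ["Haroon", "Ali"], "Age": [20, 32, 12]})
except ValueError as err:
    length_error = err
print("ValueError:", length_error)
```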

In [None]:
import pandas as pd
import numpy as np

In [None]:
df=pd.DataFrame({ "Name":["Haroon","Ali","John"],"Age":[20,32,12]},index=['a','b','c'])
df


### Creating DataFrame using List

In [None]:
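# A flat list becomes a single column (labelled 0)
lst=[1,2,3,4]
print(pd.DataFrame(lst))

# A list of lists: each inner list becomes one row
lst=[[1,2,3,4,5],[6,7,8,9,10]]
pd.DataFrame(lst)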
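To tell the two results apart, this sketch compares their shapes and passes column names through the `columns` parameter. The labels "A" to "E" are placeholders chosen for illustration.

```python
import pandas as pd

flat = pd.DataFrame([1, 2, 3, 4])                    # 1-D list -> one column, four rows
nested = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
                      columns=["A", "B", "C", "D", "E"])  # hypothetical column labels

print(flat.shape)    # (4, 1)
print(nested.shape)  # (2, 5)
print(nested)
```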

In [None]:
type(df)

### Creating DataFrame from Dictionary 

In [None]:

dic={ "Name":["Haroon","Ali","John"],"Age":[20,32,12]}
# columns= keeps only the listed keys, so this DataFrame has just the Name column
df=pd.DataFrame(dic,columns=[ "Name"])
df
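With dictionary input, `columns` both selects and orders the keys. A sketch, assuming the same `dic` as above: listing a name that is not a key (here the illustrative "City") still creates that column, filled with NaN.

```python
import pandas as pd

dic = {"Name": ["Haroon", "Ali", "John"], "Age": [20, 32, 12]}

reordered = pd.DataFrame(dic, columns=["Age", "Name"])     # keys picked in this order
with_extra = pd.DataFrame(dic, columns=["Name", "City"])   # "City" is not a key in dic

print(reordered.columns.tolist())   # ['Age', 'Name']
print(with_extra)                   # City exists, but every value is NaN
```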


### Creating DataFrame from Series 

In [None]:

dicOfSeries={"s1":pd.Series([12,13,14]),"s2":pd.Series([15,16,17])}
pd.DataFrame(dicOfSeries)
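The same-length rule applies to plain lists. Series are different: they are aligned on their index labels. In this sketch the Series and their labels are made up for illustration. A shorter Series does not raise an error; its missing labels become NaN.

```python
import pandas as pd

s1 = pd.Series([12, 13, 14], index=["a", "b", "c"])
s2 = pd.Series([15, 16], index=["b", "c"])     # shorter, with no "a" label

df = pd.DataFrame({"s1": s1, "s2": s2})         # rows come from the union of both indexes
print(df)
# s2 has no value for "a", so that cell is NaN and s2 becomes float64
```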

### Get Individual Columns

In [None]:
df["Name"]

### Get Individual Row

In [None]:
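print(df.iloc[0])   # position-based: the first row
df.loc[0]           # label-based: the row labelled 0 (the same row here, because the index is 0, 1, 2)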
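`iloc` and `loc` only agree above because the default index labels happen to equal the positions. A sketch with the string index from the first example shows where they diverge.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Haroon", "Ali", "John"], "Age": [20, 32, 12]},
                  index=["a", "b", "c"])

by_position = df.iloc[0]     # first row, counted by position
by_label = df.loc["a"]       # row whose index label is "a"
print(by_position.equals(by_label))   # True: the same row, asked for two ways

label_error = None
try:
    df.loc[0]                # there is no row labelled 0 in this index
except KeyError as err:
    label_error = err
print("KeyError:", label_error)
```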

### Get Particular Cell 

In [None]:
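# df["Name"][2]       # chained: column first, then row label 2
# df.iloc[0]["Name"]  # chained: row first, then column
# Single step (preferred): row label 0, column "Name"
df.loc[0, "Name"]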
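For reading one cell, pandas also provides the scalar accessors `.at` (labels) and `.iat` (positions). A short sketch, assuming the default 0, 1, 2 index:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Haroon", "Ali", "John"], "Age": [20, 32, 12]})

by_loc = df.loc[2, "Name"]    # row label 2, column label "Name"
by_at = df.at[2, "Name"]      # .at: scalar access by labels
by_iat = df.iat[2, 0]         # .iat: scalar access by positions (row 2, column 0)
print(by_loc, by_at, by_iat)  # John John John
```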

### Get Multiple Columns

In [None]:
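# Pass a list of column names (note the double brackets).
# df now holds only "Name", so rebuild it from dic to have two columns to pick.
pd.DataFrame(dic)[["Name","Age"]]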

## Arithmetic Operations on a DataFrame

In [None]:
df=pd.DataFrame({ "Col1":[10,20,30,40],"Col2":[40,30,20,10]})
df

### Sum

In [None]:
df["Sum"]=df["Col1"]+df["Col2"]
df

### Difference

In [None]:
df["Difference"]=df["Col1"]-df["Col2"]
df

### Product

In [None]:
df["Product"]=df["Col1"]*df["Col2"]
df

### Division

In [None]:
df["Division"]=df["Col1"]/df["Col2"]
df
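Two details of column division are easy to miss: `/` always produces floats, and dividing by zero gives `inf` instead of raising an error as plain Python does. In this sketch, the last `Col2` value is set to 0 purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last Col2 value is 0 to show division by zero
df = pd.DataFrame({"Col1": [10, 20, 30, 40], "Col2": [40, 30, 20, 0]})
df["Division"] = df["Col1"] / df["Col2"]

print(df["Division"].dtype)   # float64, even though both inputs are integers
print(df)                     # the last row holds inf instead of raising ZeroDivisionError
```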

In [None]:
df=pd.DataFrame({ "Col1":[10,20,30,40,50,60],"Col2":[60,50,40,30,20,10]})
df

### Inserting a Column in DataFrame

In [None]:
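# df.insert(loc, column, value): put "Col3" at position 1, right after Col1
# df.insert(1,"Col3",[1,2,3,4,5,6])   # a list of the right length also works
df.insert(1,"Col3",df["Col1"])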
df
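A sketch on a smaller, illustrative frame. It shows where `loc` places the new column and that `insert` refuses to add a column whose name already exists.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 30], "Col2": [30, 20, 10]})
df.insert(1, "Col3", df["Col1"] * 2)     # loc=1: the new column lands between Col1 and Col2
print(df.columns.tolist())              # ['Col1', 'Col3', 'Col2']

duplicate_error = None
try:
    df.insert(0, "Col3", [0, 0, 0])     # a column called Col3 already exists
except ValueError as err:
    duplicate_error = err
print("ValueError:", duplicate_error)
```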

In [None]:
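# Using slicing: only the first 4 values are assigned.
# The Series aligns on index labels, so rows 4 and 5 get NaN (and Col4 becomes float64).
df["Col4"]=df["Col3"][:4]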
df
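The NaN rows appear because a Series carries index labels. A plain list does not, so pandas requires its length to match the number of rows. A sketch under the same 6-row setup:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 30, 40, 50, 60]})

df["Col4"] = df["Col1"][:4]          # a Series: matched to rows by index label
print(df["Col4"].isna().sum())       # 2, because rows 4 and 5 have no matching label

length_error = None
try:
    df["Col5"] = [1, 2, 3, 4]        # a plain list has no index, so its length must equal the row count
except ValueError as err:
    length_error = err
print("ValueError:", length_error)
```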

### Deleting a Column in DataFrame

#### Using del Keyword

In [None]:
del df["Col4"]
df

#### Using pop("ColName")

In [None]:
col3=df.pop("Col3")
col3

In [None]:
df
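`del` and `pop` both change the DataFrame in place. For comparison, `drop(columns=...)` returns a new DataFrame and leaves the original alone. This distinction between in-place changes and copies comes back in the practice problems. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "Col3": [5, 6]})

removed = df.pop("Col3")              # in place: df loses Col3, which comes back as a Series
trimmed = df.drop(columns=["Col2"])   # returns a new DataFrame; df itself still has Col2

print(df.columns.tolist())       # ['Col1', 'Col2']
print(trimmed.columns.tolist())  # ['Col1']
print(removed.tolist())          # [5, 6]
```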

----
*****
----
## Practice Problems 

In [None]:
df=pd.DataFrame({ "Col":[10,20,30,40,50,60,70,80]})
df[">40"]=df["Col"] > 40
df["<40"]=df["Col"] < 40
df
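Boolean columns like these work directly as row masks. A sketch on the same data also shows that 40 itself falls in neither column.

```python
import pandas as pd

df = pd.DataFrame({"Col": [10, 20, 30, 40, 50, 60, 70, 80]})
df[">40"] = df["Col"] > 40
df["<40"] = df["Col"] < 40

above = df.loc[df[">40"], "Col"].tolist()                  # boolean column used as a row mask
neither = df.loc[~df[">40"] & ~df["<40"], "Col"].tolist()  # ~ negates element-wise
print(above)                 # [50, 60, 70, 80]
print(int(df[">40"].sum()))  # 4, because True counts as 1
print(neither)               # [40], which is neither > 40 nor < 40
```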


In [None]:

df = pd.DataFrame({
    'Product': ['Pen', 'Book', 'Bag', 'Pencil'],
    'Price': [20, 150, 800, 10],
    'Stock': [100, 50, 20, 200]
})

----
🟢 Level 2: DataFrame Fundamentals

##### Q4: Selection

    1️⃣ Select the Price column
    2️⃣ Select Product and Stock
    3️⃣ Select the last two rows

In [None]:
# 1
df["Price"]

In [None]:
# 2
df[["Product","Stock"]]

In [None]:
# 3
df.tail(2)

##### Q5: Boolean Filtering

    1️⃣ Products with Price > 50
    2️⃣ Products with Stock >= 100
    3️⃣ Products with Price > 50 AND Stock < 100

    Use .loc at least once.

In [None]:
# 1
df[df["Price"]>50]

In [None]:
#2
df[df["Stock"]>=100]

In [None]:
#3
df.loc[(df["Price"]>50) & (df["Stock"]<100)]

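#### 🧠 Why did we use '&' instead of 'and'?

    * Python 'and': asks for a single True/False for each whole object. A Series holds one value per row, so pandas raises "ValueError: The truth value of a Series is ambiguous".

    * Pandas '&': does an element-wise comparison (row 1 vs row 1, row 2 vs row 2). Because '&' binds more tightly than '>' and '<', each condition must be wrapped in parentheses.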
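Both behaviours can be seen side by side in a minimal sketch that rebuilds the product table from above:

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Pen", "Book", "Bag", "Pencil"],
                   "Price": [20, 150, 800, 10],
                   "Stock": [100, 50, 20, 200]})

and_error = None
try:
    df[(df["Price"] > 50) and (df["Stock"] < 100)]   # 'and' needs one True/False for the whole Series
except ValueError as err:
    and_error = err
print("ValueError:", and_error)

mask = (df["Price"] > 50) & (df["Stock"] < 100)       # element-wise AND, one result per row
print(mask.tolist())                                  # [False, True, True, False]
print(df.loc[mask, "Product"].tolist())               # ['Book', 'Bag']
```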

🟡 Level 3: Shape & Type Awareness

##### Q6

    1️⃣ What is the type of result1?
    2️⃣ What is the type of result2?
    3️⃣ Why does this difference matter?

In [None]:
result1 = df['Price'] # Series
result2 = df[['Price']] # DataFrame

In [None]:
#1 
type(result1)

In [None]:
#2 
type(result2)
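For question 3, the two results print almost identically, but one is 1-D and the other is 2-D. That decides which attributes and methods are available next. For example, only the DataFrame has `.columns`. A sketch on the same table:

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Pen", "Book", "Bag", "Pencil"],
                   "Price": [20, 150, 800, 10],
                   "Stock": [100, 50, 20, 200]})

result1 = df["Price"]     # a single label  -> Series
result2 = df[["Price"]]   # a list of labels -> DataFrame

print(result1.shape, result1.ndim)   # (4,) 1
print(result2.shape, result2.ndim)   # (4, 1) 2
print(hasattr(result1, "columns"), hasattr(result2, "columns"))   # False True
```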


##### Q7: Destructive Thinking Test

    What happens here?
    df = df['Price'] > 50
    df.head()
    1️⃣ What is df now?
    2️⃣ Why is this dangerous?

In [None]:
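# 1: run the destructive line on a copy so df survives for the next questions
df_copy = df.copy()
df_copy = df_copy['Price'] > 50
df_copy   # now a boolean Series (True/False per row), no longer a DataFrame
# Fix: keep df and store the filtered rows under a new name
# expensive = df[df['Price'] > 50]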


**2️⃣ Why is this dangerous?**

Answer: You just replaced your entire dataset with a True/False mask.
```

By saying df = ..., you overwrote the variable df.
It now holds a boolean Series instead of the DataFrame.
The original columns (Product, Stock) are gone.
The original values are gone (Price is now just True/False).
If you try to run df['Stock'] in the next cell, it will raise a KeyError.
```

----
🔴 Level 4: Silent Pandas Traps

##### Q8: Set the Price to 999 for products whose Stock is less than 50

    Will this modify df correctly? Why or why not?

    df[df['Stock'] < 50]['Price'] = 999


In [None]:
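# Chained indexing: df[...] returns a copy, so 999 is written into that temporary copy and df is unchanged.
# pandas warns here (SettingWithCopyWarning, or ChainedAssignmentError under Copy-on-Write).
df[df['Stock']<50]['Price']= 999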
df

**Fix it properly.**

In [None]:
df.loc[df['Stock'] < 50, 'Price'] = 999
df
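To confirm the difference instead of eyeballing it, this sketch runs both versions on a fresh copy of the table and records `Price` after each. The warning filter only hides the chained-assignment warning for this demonstration.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"Product": ["Pen", "Book", "Bag", "Pencil"],
                   "Price": [20, 150, 800, 10],
                   "Stock": [100, 50, 20, 200]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")        # silence the chained-assignment warning for this demo only
    df[df["Stock"] < 50]["Price"] = 999     # writes into a temporary copy
after_chained = df["Price"].tolist()

df.loc[df["Stock"] < 50, "Price"] = 999     # one indexing step on df itself
after_loc = df["Price"].tolist()

print(after_chained)   # [20, 150, 800, 10]: unchanged
print(after_loc)       # [20, 150, 999, 10]
```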

In [None]:

# df.loc[df['Price'] > 50 and df['Stock'] < 100] # Error! ValueError: the truth value of a Series is ambiguous

df[(df['Price'] > 50) & (df['Stock'] < 100)]

df.loc[(df['Price'] > 50) & (df['Stock'] < 100)] # Fixed.


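```
Why is .loc better?
Just looking at data? df[...] is fine (and faster to type).
Modifying data or picking specific columns? Always use .loc.
df.loc[rows, columns] selects in a single step, so an assignment reaches df itself instead of a temporary copy.
```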

In [None]:
df[(df['Price'] > 50) & (df['Stock'] < 100)]["Product"]


In [None]:
df.loc[(df['Price'] > 50) & (df['Stock'] < 100),'Product'] 

----

### Boolean Conditioning Problems 

In [None]:
df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT'],
    'Age': [22, 29, 24, 31, 26, 23],
    'Salary': [45000, 52000, 48000, 60000, 50000, 47000],
    'Experience': [1, 5, 2, 7, 3, 1]
})
df

---
**🟢 SECTION A: Boolean Conditioning (SELECT-style)**
```
Q1

Select all employees:

from IT department

with Age < 25
```

In [None]:
df.loc[(df["Dept"]=='IT') & (df["Age"]<25)]

```
Q2

Select employees:

from HR OR Finance

with Salary ≥ 50000
```

In [None]:
hrAndFinanceEmp=df.loc[((df["Dept"]=='HR') | (df["Dept"]=='Finance')) & (df["Salary"]>=50000)]
hrAndFinanceEmp

```
Q3

Select employees:

NOT from IT

AND Experience > 3
```

In [None]:
df.loc[(df['Dept']!='IT')&(df['Experience']>3)]

```
Q4 (Tricky)

Select employees whose:

Department is IT

OR Salary is greater than 55000

AND Age is less than 30

(Use correct operator precedence)
```

In [None]:
df.loc[((df['Dept']=='IT') | (df['Salary']>55000)) & (df['Age'] < 30)]

---

🟡 SECTION B: Conditional Modification (UPDATE-style)
```
Q5

Increase Salary by 10,000 for:

employees in Finance
```

In [None]:
df.loc[df['Dept']=="Finance",'Salary']+=10000
df

```
Q6

Set Salary to 50000 for:

HR employees with Experience < 3
```

In [None]:
df.loc[(df['Dept']=='HR')&(df["Experience"]<3),'Salary']=50000
df

```
Q7

Decrease Salary by 5% for:

IT employees with Age > 23

(No loops. One line.)
```

In [None]:
df.loc[(df['Dept']=='IT')&(df['Age']>23),'Salary']*=0.95
df

```
Q8 (Overwrite Trap)

Set Experience to 0 for:

employees with Salary < 48000

(Do it safely.)
```

In [None]:
df.loc[df['Salary']<48000,'Experience']=0
df

----
🔴 SECTION C: Views vs Copies (The Mind-Breakers)

Q9
Predict:
```python
temp = df[df['Dept'] == 'IT']
temp['Salary'] = 99999
```

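```
1️⃣ Will this modify df? No.

2️⃣ Why? Boolean indexing (df[mask]) returns a new DataFrame, a copy of the matching rows.
   temp['Salary'] = 99999 changes only that copy; pandas may also emit a SettingWithCopyWarning.
```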
Q10

Fix Q9 so that:

Only IT employees’ Salary becomes 99999

Original df is modified

In [None]:
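# Single .loc call on the original df: select rows and column, then assign
df.loc[df['Dept']=='IT','Salary']=99999
# Alternative that keeps temp: work on an explicit copy, then write it back
# temp = df[df['Dept']=='IT'].copy()
# temp['Salary'] = 99999
# df.update(temp)
df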

Q11 (Silent Failure)

Explain what is wrong with this code:
```python
df[df['Age'] > 25]['Salary'] += 2000
```
```
❌ The "chained" way (bad)
You are asking Python to do two separate steps:
1. "Get me the data." df[df['Age'] > 25] returns a new object (a copy).
2. "Modify that data." += updates that new object, so df never changes.
```


Rewrite it correctly.

In [None]:
df.loc[df['Age'] > 25,'Salary'] += 2000
df

Q12 (Chain Indexing Detector)

Which of the following are unsafe and why?
```python
A) df['Salary'][df['Dept'] == 'HR'] = 45000  # unsafe: chained indexing; pandas cannot guarantee df is updated (it never is under Copy-on-Write)
B) df.loc[df['Dept'] == 'HR', 'Salary'] = 45000  # safe: a single .loc call on df
C) df[df['Dept'] == 'HR'].loc[:, 'Salary'] = 45000  # unsafe: writes into a copy (ChainedAssignmentError warning)
D) df.iloc[0:3]['Salary'] = 30  # unsafe: chained indexing (ChainedAssignmentError warning)
```

In [None]:
df

🧠 SECTION D: Think Like SQL (Hard Mode)

Q13
```
Write a single line Pandas query equivalent to:

UPDATE employees
SET Salary = Salary + 3000
WHERE Dept = 'IT' AND Experience >= 2;
```

In [None]:
df.loc[(df['Dept']=='IT') & (df['Experience']>=2),'Salary']+=3000
df

Q14
```
Equivalent of:

SELECT Name, Dept, Salary
FROM employees
WHERE Salary BETWEEN 48000 AND 55000
AND Dept != 'Finance';
```

In [None]:
# SQL BETWEEN includes both endpoints, so use .between() (inclusive by default) rather than strict > and <
df.loc[df['Salary'].between(48000, 55000) & (df['Dept']!='Finance'), ['Name','Dept','Salary']]
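The endpoint difference only shows up when a salary sits exactly on a bound. This sketch uses a small set of hypothetical salaries to make that visible.

```python
import pandas as pd

salary = pd.Series([45000, 48000, 50000, 55000, 60000])   # hypothetical salaries

inclusive = salary.between(48000, 55000)       # like SQL BETWEEN: both endpoints included
strict = (salary > 48000) & (salary < 55000)    # strict bounds drop the endpoints

print(salary[inclusive].tolist())   # [48000, 50000, 55000]
print(salary[strict].tolist())      # [50000]
```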

In [None]:
# Add 1 to Experience for employees earning more than 50000
df.loc[df['Salary'] > 50000,'Experience'] += 1
# Equivalent long form: df.loc[df['Salary'] > 50000,'Experience'] = df['Experience'] + 1
df

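# Panel
    - 3-dimensional data structure
    - Deprecated in pandas 0.20 and removed in pandas 0.25; use a DataFrame with a MultiIndex (or the xarray library) for 3-D data instead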