# Why We Use Pandas

# 📊 Pandas Overview  

**Pandas** is a powerful Python library used for **data manipulation and analysis**.  
It provides efficient and flexible data structures like **DataFrame** (2D) and **Series** (1D),  
making it easy to **clean, transform, and analyze data**.  

---

## 🔹 Key Features of Pandas  

- 📥 **Easy Data Import** → from CSV, Excel, SQL, etc.  
- 🧹 **Data Cleansing** → handle missing or incorrect values easily.  
- 📏 **Size Mutability** → add/delete columns and rows dynamically.  
- 🔄 **Reshaping and Pivoting** → transform datasets into desired formats.  
- ⚡ **Efficient Data Manipulation & Extraction** → fast filtering, selection, and aggregation.  
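
The features listed above can be seen together in one short pipeline. This is only a sketch: the CSV text, the column names (`city`, `year`, `sales`) and the derived `sales_k` column are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content, used only for illustration.
csv_text = """city,year,sales
Pune,2023,100
Pune,2024,
Delhi,2023,80
Delhi,2024,120
"""

# Easy data import: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Data cleansing: replace the missing sales value with 0.
df["sales"] = df["sales"].fillna(0)

# Size mutability: add a derived column.
df["sales_k"] = df["sales"] / 1000

# Reshaping and pivoting: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

# Filtering and aggregation: total sales per city, using rows with sales > 0.
totals = df[df["sales"] > 0].groupby("city")["sales"].sum()

print(wide)
print(totals)
```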


# There are mainly two Data Structures in Pandas  

## 1. **Series** → 1D  
## 2. **DataFrame** → 2D  

A **data structure** is a way of organizing values so that they can be stored, accessed, and modified efficiently.
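
To make the relationship between the two structures concrete, the sketch below builds a small DataFrame and pulls one column out of it. The names and values (`name`, `age`, `score`) are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"name": ["Asha", "Ravi"], "age": [31, 27], "score": [88.5, 92.0]}
)

# Each column carries its own dtype.
print(df.dtypes)

# Selecting a single column returns a Series that shares the row index.
ages = df["age"]
print(type(ages).__name__)
print(ages.index.equals(df.index))

# A DataFrame has two axes, a Series has one.
print(df.shape, ages.shape)
```

In other words, a DataFrame can be thought of as a set of Series that share one row index.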


| Feature                            | **Series** (1D)                                         | **DataFrame** (2D)                                               |
|------------------------------------|---------------------------------------------------------|-------------------------------------------------------------------|
| **Definition**                     | One-dimensional labeled array                           | Two-dimensional labeled table (rows + columns)                    |
| **Values mutability**              | **Mutable** — elements can be assigned/changed in-place | **Mutable** — cell values can be assigned/changed in-place        |
| **Structural mutability**          | Structural ops (drop/concat/reindex) **usually return a new object** | Structural ops (drop/concat/rename) **usually return a new object** |
| **Index / axes**                   | Single index                                            | Row index + column labels                                         |
| **Data types**                     | Typically homogeneous (backed by ndarray)               | Heterogeneous (different dtypes per column)                       |
| **Common creation**                | `pd.Series([1,2,3])`                                    | `pd.DataFrame({'a':[1,2],'b':[3,4]})`                             |
| **Interview tip**                  | Emphasize the difference: *values* can change in-place; *structure* changes often create copies. | The same distinction applies. |


In [28]:
import pandas as pd
#1. Values are mutable (in-place change works)
# Series Example
s = pd.Series([1, 2, 3])
print("Original Series:\n", s)

# Change element in-place
s[0] = 100
print("\nAfter s[0] = 100 (value mutation):\n", s)

# DataFrame Example
df = pd.DataFrame({"A": [10, 20], "B": [30, 40]})
print("\nOriginal DataFrame:\n", df)

# Change a single cell in-place
df.loc[0, "A"] = 999
print("\nAfter df.loc[0,'A'] = 999 (value mutation):\n", df)


Original Series:
 0    1
1    2
2    3
dtype: int64

After s[0] = 100 (value mutation):
 0    100
1      2
2      3
dtype: int64

Original DataFrame:
     A   B
0  10  30
1  20  40

After df.loc[0,'A'] = 999 (value mutation):
      A   B
0  999  30
1   20  40


In [29]:
#🔹 2. Structural changes return new objects (immutability-like behavior)
# Drop operation on Series
s2 = s.drop(1)   # removes index 1
print("\nResult of s.drop(1):\n", s2)
print("\nOriginal Series remains unchanged:\n", s)

# Drop operation on DataFrame
df2 = df.drop(columns="B")
print("\nResult of df.drop(columns='B'):\n", df2)
print("\nOriginal DataFrame remains unchanged:\n", df)


Result of s.drop(1):
 0    100
2      3
dtype: int64

Original Series remains unchanged:
 0    100
1      2
2      3
dtype: int64

Result of df.drop(columns='B'):
      A
0  999
1   20

Original DataFrame remains unchanged:
      A   B
0  999  30
1   20  40


## 🔑 Pandas Mutability (Interview Note)

- **Series & DataFrame are not fully immutable.**  
- **Values are mutable** → can update in-place (`s[0]=5`, `df.loc[0,'col']=7`).  
- **Structural changes** (add/remove rows/cols) → usually return a **new object** (`drop`, `concat`, `reindex`).  
- `inplace=True` exists but is discouraged (may still copy internally).  

👉 **Interview phrasing:**  
“Pandas allows in-place modification of values, but structural operations generally return new objects — so treat element mutation and structure mutation differently.”
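
The word "usually" in the notes above matters. The following sketch, using a small made-up DataFrame, shows the common pattern of rebinding the result of `drop` instead of using `inplace=True`, and one case where a structural change does modify the existing object.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# drop returns a new DataFrame; df itself still has column "B".
dropped = df.drop(columns="B")
print(list(df.columns), list(dropped.columns))

# To keep the result, rebind the name instead of relying on inplace=True.
df = df.drop(columns="B")

# Counter-example to the rule of thumb: column assignment adds a column
# to the existing object, so not every structural change returns a copy.
df["C"] = df["A"] * 10
print(list(df.columns))
```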


## 🔹 Homogeneous Nature of Pandas Series  

- A **Series is homogeneous** → all elements share the **same dtype**.  
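- If the values have mixed types, Pandas **upcasts** them to a dtype that can hold every value: for example, integers mixed with floats become `float64`, and integers mixed with strings become `object`.  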
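
A quick way to check the upcasting behaviour is to build a few small Series and compare their dtypes. The variable names below are illustrative only.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
with_float = pd.Series([1, 2.5, 3])      # int + float
with_missing = pd.Series([1, None, 3])   # int + missing value
with_text = pd.Series([1, "two", 3])     # int + str

for label, ser in [
    ("ints", ints),
    ("with_float", with_float),
    ("with_missing", with_missing),
    ("with_text", with_text),
]:
    print(label, ser.dtype)
```

The missing value case is worth remembering: `None` is stored as `NaN`, which is a float, so the whole Series becomes a float Series.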

### Example:




In [30]:
import pandas as pd

# Mixed types: int + string
s3 = pd.Series([10, 23, 43, 54, "abcs"])
print(s3)
print("Dtype:", s3.dtype)

# Pure integers
s2 = pd.Series([10, 23, 43, 54])
print(s2)
print("Dtype:", s2.dtype)

0      10
1      23
2      43
3      54
4    abcs
dtype: object
Dtype: object
0    10
1    23
2    43
3    54
dtype: int64
Dtype: int64


In [31]:
s=pd.Series([10,23,43,4,45,56,78])
s

0    10
1    23
2    43
3     4
4    45
5    56
6    78
dtype: int64

In [32]:
s.dtype

dtype('int64')

In [33]:
# s.name is None because we have not assigned a name to the Series yet
print(s.name)

None


In [34]:
# Now assign a name to the Series
s.name="numbers"

In [35]:
print(s)

0    10
1    23
2    43
3     4
4    45
5    56
6    78
Name: numbers, dtype: int64
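
One practical reason to name a Series is that the name is reused as the column label when the Series becomes part of a DataFrame. A minimal sketch, assuming the same `numbers` name as above:

```python
import pandas as pd

s = pd.Series([10, 23, 43], name="numbers")

# to_frame() turns the Series into a one-column DataFrame;
# the Series name becomes the column label.
frame = s.to_frame()
print(frame.columns.tolist())

# rename() with a scalar returns a new Series with a different name
# and leaves the original untouched.
renamed = s.rename("values")
print(s.name, renamed.name)
```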


# 📌 Indexing in Pandas Series  

A **Series** is like a 1D array with **labels (index)**. Indexing helps access, slice, and filter elements efficiently.  

---

## 🔹 Types of Indexing  

1. **Default Indexing**
   - Pandas assigns an integer index starting from 0.
   ```python
   s = pd.Series([10, 20, 30, 40])
   print(s[0])   # Access first element → 10
   ```


In [36]:
# 1. Default indexing
s4 = pd.Series([10, 20, 30, 40])
print(s4[0])
# To select several elements, use a slice: start (included) : stop (excluded) : step
# Note: `s` is still the Series named "numbers" from the cells above
s[0:2]


10


0    10
1    23
Name: numbers, dtype: int64
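
The `start:stop:step` slice form can be sketched as follows. The example uses `.iloc` so that the slice is always read as positions, regardless of what the index labels are; the data is the same list of numbers used above.

```python
import pandas as pd

s = pd.Series([10, 23, 43, 4, 45, 56, 78])

first_two = s.iloc[0:2]      # positions 0 and 1; stop is excluded
every_other = s.iloc[::2]    # positions 0, 2, 4, 6
last_three = s.iloc[-3:]     # the final three elements

print(first_two.tolist())
print(every_other.tolist())
print(last_three.tolist())
```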

### 2. Custom Indexing

You can define custom labels for the index.

In [None]:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])   # Access element with label 'b' → 20

### 3. Slicing

Works like Python list slicing (`start:stop`): `start` is included and `stop` is excluded.

In [None]:
s = pd.Series([10, 20, 30, 40, 50])
print(s[1:4])   # Elements at positions 1 to 3
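
Slicing behaves differently once the index has custom labels. The sketch below, with made-up labels `a` to `e`, contrasts a positional slice with a label slice.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[1:3]    # positions 1 and 2; stop is excluded
by_label = s.loc["b":"d"]    # labels b, c and d; stop is included

print(by_position.tolist())
print(by_label.tolist())
```

Label slices include the end label because there is no general way to say "the label just before `d`".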

### 4. Label-based Indexing (`.loc`)

Access elements using index labels.

In [None]:
s = pd.Series([100, 200, 300], index=['x', 'y', 'z'])
print(s.loc['y'])   # → 200
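
Beyond reading a single label, `.loc` also accepts a list of labels and can be used for assignment. A small sketch using the same Series; the label `"missing"` is deliberately one that does not exist.

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["x", "y", "z"])

subset = s.loc[["x", "z"]]   # a list of labels returns a Series
s.loc["y"] = 250             # .loc also works on the left side of an assignment

print(subset.tolist())
print(s.tolist())

# A label that is not in the index raises KeyError.
try:
    s.loc["missing"]
except KeyError as exc:
    print("KeyError:", exc)
```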

### 5. Position-based Indexing (`.iloc`)

Access elements using integer positions.

In [41]:
s = pd.Series([100, 200, 300], index=['x', 'y', 'z'])
print(s.iloc[1])   # → 200

200


In [43]:
s.iloc[[0,1,2]]

x    100
y    200
z    300
dtype: int64
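
Two more `.iloc` behaviours are worth knowing: negative positions count from the end, and a position outside the Series raises an error. A brief sketch on the same data:

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["x", "y", "z"])

last = s.iloc[-1]            # negative positions count from the end
picked = s.iloc[[2, 0]]      # a list of positions keeps the requested order

print(last)
print(picked.index.tolist(), picked.tolist())

# A position past the end raises IndexError.
try:
    s.iloc[5]
except IndexError as exc:
    print("IndexError:", exc)
```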


### 6. Boolean Indexing

Filter elements based on conditions.

In [44]:
s = pd.Series([10, 20, 30, 40, 50])
print(s[s > 25])   # Returns elements > 25

2    30
3    40
4    50
dtype: int64
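
Comparison operators are not the only way to build a filter. As a sketch, the Series methods `between` and `isin` also return Boolean masks that can be used the same way:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

in_range = s[s.between(20, 40)]   # inclusive on both ends by default
in_set = s[s.isin([10, 50])]      # keep values found in the given list

print(in_range.tolist())
print(in_set.tolist())
```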


# 📌 Creating a Pandas Series from a Dictionary  

- A **dictionary** in Python has **key-value pairs**.  
- In Pandas Series:
  - **Keys** → become **index labels**  
  - **Values** → become **data elements**  
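
The key-to-label mapping can be checked on a tiny dictionary first. The `prices` dictionary and the extra label `"lamp"` below are hypothetical; the second part shows what happens when an explicit `index=` is passed alongside the dictionary.

```python
import pandas as pd

# Hypothetical values, for illustration only.
prices = {"pen": 10, "book": 50, "bag": 300}

s = pd.Series(prices)
print(s.index.tolist())   # the keys, in insertion order

# Passing index= selects and reorders keys; a label that is not a key
# gets NaN, which also turns the dtype into float.
picked = pd.Series(prices, index=["bag", "pen", "lamp"])
print(picked)
```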

### Example:

In [46]:

# Dictionary of fruits and their protein content (grams per 100g)
fruits_protein = {
    "Apple": 0.3,
    "Banana": 1.1,
    "Orange": 0.9,
    "Mango": 0.8,
    "Papaya": 0.5,
    "Guava": 2.6,
    "Grapes": 0.6,
    "Pineapple": 0.5,
    "Strawberry": 0.8,
    "Blueberry": 0.7,
    "Blackberry": 1.4,
    "Raspberry": 1.2,
    "Kiwi": 1.1,
    "Pomegranate": 1.7,
    "Watermelon": 0.6,
    "Cantaloupe (Muskmelon)": 0.8,
    "Cherry": 1.0,
    "Peach": 0.9,
    "Pear": 0.4,
    "Plum": 0.7,
    "Apricot": 1.4,
    "Fig": 0.8,
    "Date": 1.8,
    "Jackfruit": 1.7,
    "Avocado": 2.0,
    "Dragon Fruit": 1.2,
    "Lychee": 0.8,
    "Coconut (fresh)": 3.3
}


# Create Series from dictionary
s = pd.Series(fruits_protein,name="protein")

print("Series from Dictionary:\n", s)
print("\nIndex:", s.index)
print("Values:", s.values)

Series from Dictionary:
 Apple                     0.3
Banana                    1.1
Orange                    0.9
Mango                     0.8
Papaya                    0.5
Guava                     2.6
Grapes                    0.6
Pineapple                 0.5
Strawberry                0.8
Blueberry                 0.7
Blackberry                1.4
Raspberry                 1.2
Kiwi                      1.1
Pomegranate               1.7
Watermelon                0.6
Cantaloupe (Muskmelon)    0.8
Cherry                    1.0
Peach                     0.9
Pear                      0.4
Plum                      0.7
Apricot                   1.4
Fig                       0.8
Date                      1.8
Jackfruit                 1.7
Avocado                   2.0
Dragon Fruit              1.2
Lychee                    0.8
Coconut (fresh)           3.3
Name: protein, dtype: float64

Index: Index(['Apple', 'Banana', 'Orange', 'Mango', 'Papaya', 'Guava', 'Grapes',
       'Pineapple', 'St

### ✅ Conditional Selection in Pandas

- **Definition:** Selecting rows/values based on a condition (Boolean indexing).
- Works like filtering in SQL/Excel.
- Returns only the data that matches the condition.



### 📌 Conditional Selection in Pandas (Series)

- **Definition:** Extract elements from a Series using conditions.
- Produces a Boolean mask (`True/False`) and returns matching values.

In [None]:
# Conditional selection: the comparison returns a Boolean mask
s > 1

In [None]:
s[s>1]
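
The two steps (build the mask, then apply it) can be seen separately in the sketch below. It uses four of the values from the fruit Series above so that the output stays short; the variable name `protein` is illustrative.

```python
import pandas as pd

# A few of the values from the fruit Series above.
protein = pd.Series({"Apple": 0.3, "Banana": 1.1, "Guava": 2.6, "Pear": 0.4})

mask = protein > 1    # Boolean Series with the same index as protein
print(mask)
print(mask.sum())     # True counts as 1, so this counts the matches
print(protein[mask])  # only the rows where the mask is True
```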

# Logical operators
## 1. and (`&`)
## 2. or (`|`)
## 3. not (`~`)
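
Python's `and`, `or` and `not` keywords do not work element-wise on a Series. Pandas uses the operators `&`, `|` and `~` instead, and each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on made-up numbers:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

both = s[(s > 15) & (s < 45)]      # and: both conditions hold
either = s[(s < 15) | (s > 45)]    # or: at least one condition holds
negated = s[~(s > 25)]             # not: invert the mask

print(both.tolist())
print(either.tolist())
print(negated.tolist())
```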