## Pandas Data Types

Pandas provides a variety of data types to store and manipulate different kinds of data efficiently. These data types are built on top of NumPy’s array types, which give Pandas its performance.

Here are the key data types you’ll encounter in Pandas:

---

### 1. **Object (String) Data Type**: 
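This has traditionally been the default data type for string data in Pandas (from pandas 3.0 on, text columns are inferred as a dedicated string dtype by default). An `object` column can store any Python object, but it is typically used for text (string) data.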

- **Example**: 


In [None]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie']}
df = pd.DataFrame(data)
print(df.dtypes)



  Here, the column `Name` contains string values, so Pandas assigns the `object` type.
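
Because an `object` column stores references to arbitrary Python objects, a single column can mix types. The following is a minimal sketch of that behavior; the name `mixed` is illustrative and not part of the example above.

```python
import pandas as pd

# A column that mixes a string, an integer, and a float falls back to object
mixed = pd.Series(['Alice', 42, 3.14])
print(mixed.dtype)

# Each element keeps its original Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```

This flexibility is also why arithmetic on `object` columns tends to be slower than on native numeric dtypes.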

---

### 2. **Integer Data Type (`int64`)**:
This data type is used to represent integer numbers (whole numbers). Pandas uses the `int64` type by default, which means it uses 64 bits to store integers.

- **Example**:


In [None]:
data = {'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df.dtypes)


  Here, the `Age` column contains integers, so it’s of type `int64`.
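
One consequence of the default `int64` type is that it cannot hold missing values. The sketch below (with illustrative names `ages_with_gap` and `ages_nullable`) shows how a single `None` changes the inferred dtype, and how the nullable `Int64` extension dtype avoids that.

```python
import pandas as pd

# A missing value forces the default integer column to float64
ages_with_gap = pd.Series([25, None, 35])
print(ages_with_gap.dtype)

# The nullable 'Int64' extension dtype keeps integers and marks the gap as <NA>
ages_nullable = pd.Series([25, None, 35], dtype='Int64')
print(ages_nullable.dtype)
print(ages_nullable)
```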

---

### 3. **Floating Point Data Type (`float64`)**:
This data type is used to represent decimal numbers or real numbers. Just like integers, Pandas uses the `float64` type by default, which stores 64-bit floating point numbers.

- **Example**:


In [None]:
data = {'Height': [5.5, 6.2, 5.8]}
df = pd.DataFrame(data)
print(df.dtypes)



  The `Height` column contains floating point numbers, so it is of type `float64`.

---

### 4. **Boolean Data Type (`bool`)**:
Pandas uses the `bool` data type to store boolean values, which can be either `True` or `False`.

- **Example**:

In [None]:
data = {'Is_Adult': [True, False, True]}
df = pd.DataFrame(data)
print(df.dtypes)



  The `Is_Adult` column contains boolean values, so it’s of type `bool`.

---

### 5. **Datetime Data Type (`datetime64`)**:
Datetime data type is used to represent time and date values. It is very important when working with time series data.

- **Example**:


In [None]:
data = {'Date': ['2024-01-01', '2024-02-01', '2024-03-01']}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)


  Here, `Date` has been converted to the `datetime64` data type.
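
Once a column is a datetime type, the `.dt` accessor exposes its components. A small sketch, using an illustrative Series named `dates`:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2024-01-01', '2024-02-01', '2024-03-01']))

# The .dt accessor exposes date components of a datetime column
print(dates.dt.year.tolist())        # [2024, 2024, 2024]
print(dates.dt.month.tolist())       # [1, 2, 3]
print(dates.dt.day_name().tolist())  # ['Monday', 'Thursday', 'Friday']
```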

---

### 6. **Categorical Data Type**:
This is a special data type used to store categorical values, i.e., values that belong to a fixed set of categories. When a column has few distinct values relative to its number of rows, it is usually more memory efficient than storing the same values as `object`.

- **Example**:


In [None]:
data = {'Category': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)
df['Category'] = df['Category'].astype('category')
print(df.dtypes)



  The `Category` column is now of type `category`.
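
To see where the memory saving comes from, the sketch below compares a plain string column with its categorical version. The names `labels`, `plain_bytes`, and `category_bytes` are illustrative, and the exact byte counts depend on the pandas version, so they are printed rather than stated.

```python
import pandas as pd

# 30,000 rows but only three distinct labels
labels = pd.Series(['A', 'B', 'C'] * 10_000)
as_category = labels.astype('category')

# deep=True counts the memory used by the string values themselves
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(plain_bytes, category_bytes)

# Internally, each row stores a small integer code that points into the categories
print(as_category.cat.categories.tolist())     # ['A', 'B', 'C']
print(as_category.cat.codes.head(3).tolist())  # [0, 1, 2]
```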

---

### 7. **Timedelta Data Type (`timedelta64`)**:
This data type is used to represent differences in dates and times, i.e., time durations.

- **Example**:


In [None]:
data = {'Time_Duration': ['1 days', '2 days', '3 days']}
df = pd.DataFrame(data)
df['Time_Duration'] = pd.to_timedelta(df['Time_Duration'])
print(df.dtypes)
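
Timedelta values also appear naturally when two datetime columns are subtracted. A minimal sketch, assuming two illustrative Series named `start` and `end`:

```python
import pandas as pd

start = pd.to_datetime(pd.Series(['2024-01-01', '2024-02-01']))
end = pd.to_datetime(pd.Series(['2024-01-15', '2024-03-01']))

# Subtracting two datetime columns yields a timedelta column
elapsed = end - start
print(elapsed)

# .dt.days extracts the whole-day component as integers
print(elapsed.dt.days.tolist())  # [14, 29]
```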


---

### Summary of Pandas Data Types:
| Data Type   | Description                                        | Example              |
|-------------|----------------------------------------------------|----------------------|
| `object`    | Typically used for text (string) data              | `"Hello"`            |
| `int64`     | Integer values (whole numbers)                     | `5`, `100`           |
| `float64`   | Floating-point numbers (decimal values)            | `3.14`, `2.71`       |
| `bool`      | Boolean values (`True` or `False`)                 | `True`, `False`      |
| `datetime64`| Date and time values                               | `2024-12-25`         |
| `category`  | Categorical data (fixed set of categories)         | `'A'`, `'B'`         |
| `timedelta64`| Time differences (duration between dates/times)  | `1 days`, `2 hours`  |

---

### Converting Data Types in Pandas
Pandas allows you to convert between data types using methods like:
- `.astype()`: Convert a column to a specific type.
- `pd.to_datetime()`: Convert to datetime.
- `pd.to_timedelta()`: Convert to timedelta.
- `pd.to_numeric()`: Convert to numeric data types.

For example:

In [None]:
data = {'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df['Age'] = df['Age'].astype('float64')  # Convert to float
print(df.dtypes)
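
`pd.to_numeric()` is most useful when a column contains numbers stored as text. The sketch below assumes an illustrative Series `raw` with one unparseable entry and shows the `errors='coerce'` option.

```python
import pandas as pd

# Numbers stored as text (object dtype), with one entry that cannot be parsed
raw = pd.Series(['25', '30', 'unknown'], dtype=object)

# errors='coerce' turns unparseable entries into NaN instead of raising
ages = pd.to_numeric(raw, errors='coerce')
print(ages)
print(ages.dtype)  # float64, because NaN is a float
```

By contrast, `raw.astype('int64')` would raise a `ValueError` on the `'unknown'` entry.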

## Practice: All Data Types in One Example

### Create a DataFrame with Different Data Types

In [None]:
import pandas as pd

# Sample data with different types
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],  # Integer type
    'Height': [5.5, 6.2, 5.8, 5.9, 5.6],  # Float type
    'Is_Adult': [True, True, True, True, False],  # Boolean type
    'Birthdate': ['1999-01-01', '1994-05-12', '1989-07-08', '1984-09-15', '2002-03-30']  # Date type
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert Birthdate to datetime type
df['Birthdate'] = pd.to_datetime(df['Birthdate'])

# Display DataFrame and check data types
print("DataFrame:")
print(df)
print("\nData Types:")
print(df.dtypes)

### Convert the `Birthdate` to string format (using `astype`):

In [None]:
df['Birthdate'] = df['Birthdate'].astype(str)
print("\nConverted Birthdate to string:")
print(df.dtypes)

### Convert `Name` to `category` data type (for efficiency):

In [None]:
df['Name'] = df['Name'].astype('category')
print("\nConverted Name to category:")
print(df.dtypes)

---

### Filter and Operate on Data Based on Data Types

Now that we have the data types set, let’s filter and perform operations on the columns based on their types.

#### 1. Filter rows where `Age` is greater than or equal to 30:

```python
adults = df[df['Age'] >= 30]
print("\nPeople aged 30 or older:")
print(adults)
```

#### 2. Filter rows where `Is_Adult` is `True`:

```python
adults_only = df[df['Is_Adult']]  # a boolean column can be used directly as a mask
print("\nOnly adults (Is_Adult=True):")
print(adults_only)
```

#### 3. Filter rows where `Height` is greater than or equal to 6.0:

```python
tall_people = df[df['Height'] >= 6.0]
print("\nPeople with Height >= 6.0:")
print(tall_people)
```

#### 4. Extract birth year from the `Birthdate` column:

```python
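# Birthdate holds strings at this point (converted with astype(str) earlier), so take the text before the first '-'
df['Birthyear'] = df['Birthdate'].apply(lambda x: x.split('-')[0])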
print("\nExtracted Birthyear:")
print(df[['Name', 'Birthyear']])
```
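
If the column is still a datetime type, the year can be read directly with `.dt.year` instead of splitting strings. The sketch below builds its own small frame (`people` is a hypothetical name) so that it does not depend on the string conversion performed earlier.

```python
import pandas as pd

# Hypothetical frame whose Birthdate column is still datetime64
people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Birthdate': pd.to_datetime(['1999-01-01', '1994-05-12']),
})

# .dt.year returns integers, so no string splitting or casting is needed
people['Birthyear'] = people['Birthdate'].dt.year
print(people[['Name', 'Birthyear']])
```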

### Null Value Functions `.isnull()`, `.dropna()`, and `.fillna()`

Let's create a **DataFrame** with some **null (NaN) values** and then detect, drop, and fill them.

### Step 1: Create DataFrame with Null Values

In [None]:
import pandas as pd
import numpy as np

# Create a DataFrame with NaN values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', None, 'Eva'],
    'Age': [25, None, 35, 40, None],
    'Height': [5.5, 6.2, None, 5.9, 5.6],
    'City': ['New York', None, 'San Francisco', 'Boston', 'Chicago']
}

df = pd.DataFrame(data)
print("Original DataFrame with Null Values:")
print(df)

### Step 2: Check for Null Values with `.isnull()`

In [None]:
# Check for null values
null_check = df.isnull()
print("\nCheck for Null Values:")
print(null_check)

The `.isnull()` function returns a DataFrame of the same shape as the original DataFrame, but with `True` for missing values and `False` for non-missing values.
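
The boolean mask becomes more useful when it is summarized. A short sketch, using a hypothetical frame `df_nulls`, that counts missing values per column and selects the rows that contain any gap:

```python
import pandas as pd

df_nulls = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None, 'Eva'],
    'Age': [25, None, 35, 40, None],
})

# True counts as 1, so summing the mask counts missing values per column
missing_per_column = df_nulls.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_gaps = df_nulls[df_nulls.isnull().any(axis=1)]
print(rows_with_gaps)
```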


### Step 3: Drop Rows with Null Values using `.dropna()`

We can remove any rows that contain **NaN** values using `.dropna()`. This can be done either row-wise (by default) or column-wise.

In [None]:
# Drop rows with any NaN values
df_dropped = df.dropna()
print("\nDataFrame After Dropping Rows with Null Values:")
print(df_dropped)

# Drop rows where all values are NaN
df_dropped_all = df.dropna(how='all')
print("\nDataFrame After Dropping Rows with All Null Values:")
print(df_dropped_all)
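
Beyond the defaults, `.dropna()` accepts `axis`, `subset`, and `thresh`. The following sketch illustrates the three options on a hypothetical frame `df_gaps`; the variable names are assumptions made for this example.

```python
import pandas as pd

df_gaps = pd.DataFrame({
    'Name': ['Alice', 'Bob', None],
    'Age': [25, None, None],
    'City': ['New York', 'Boston', 'Chicago'],
})

# axis=1 drops columns (instead of rows) that contain any NaN
by_column = df_gaps.dropna(axis=1)
print(by_column)

# subset limits the check to the listed columns
by_subset = df_gaps.dropna(subset=['Name'])
print(by_subset)

# thresh keeps rows that have at least this many non-null values
by_thresh = df_gaps.dropna(thresh=2)
print(by_thresh)
```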

### Step 4: Fill Null Values with `.fillna()`

You can fill **NaN** values with a specific value using `.fillna()`. For example, let's fill all the **NaN** values with `0` and with column-specific values.


In [None]:
# Fill NaN values with a constant value (e.g., 0)
df_filled_zero = df.fillna(0)
print("\nDataFrame After Filling Null Values with 0:")
print(df_filled_zero)

# Fill NaN values with different values per column
df_filled_custom = df.fillna({'Age': 30, 'Height': 5.5, 'City': 'Unknown'})
print("\nDataFrame After Filling Null Values with Custom Values:")
print(df_filled_custom)
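
A common variation is to fill gaps with a statistic computed from the column itself rather than a hand-picked constant. This is a sketch under the assumption that mean and median are sensible for the data; `df_stats` and the other names are illustrative.

```python
import pandas as pd

df_stats = pd.DataFrame({
    'Age': [25, None, 35, 40, None],
    'Height': [5.5, 6.2, None, 5.9, 5.6],
})

# mean() and median() skip NaN by default, so they use the known values only
age_mean = df_stats['Age'].mean()            # (25 + 35 + 40) / 3
height_median = df_stats['Height'].median()  # middle of 5.5, 5.6, 5.9, 6.2

df_imputed = df_stats.fillna({'Age': age_mean, 'Height': height_median})
print(df_imputed)
```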


### Step 5: Forward Fill and Backward Fill using `.ffill()` and `.bfill()`

You can also fill **NaN** values by propagating the previous or next valid value in the column using **forward fill** (`.ffill()`) and **backward fill** (`.bfill()`).


In [None]:
# Forward fill (propagate previous value)
df_ffilled = df.ffill()
print("\nDataFrame After Forward Fill:")
print(df_ffilled)

# Backward fill (propagate next value)
df_bfilled = df.bfill()
print("\nDataFrame After Backward Fill:")
print(df_bfilled)

---

### Full Example:

In [None]:
import pandas as pd
import numpy as np

# Create DataFrame with NaN values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', None, 'Eva'],
    'Age': [25, None, 35, 40, None],
    'Height': [5.5, 6.2, None, 5.9, 5.6],
    'City': ['New York', None, 'San Francisco', 'Boston', 'Chicago']
}

df = pd.DataFrame(data)
print("Original DataFrame with Null Values:")
print(df)

# Check for null values
null_check = df.isnull()
print("\nCheck for Null Values:")
print(null_check)

# Drop rows with any NaN values
df_dropped = df.dropna()
print("\nDataFrame After Dropping Rows with Null Values:")
print(df_dropped)


In [None]:
# Drop rows where all values are NaN
df_dropped_all = df.dropna(how='all')
print("\nDataFrame After Dropping Rows with All Null Values:")
print(df_dropped_all)

# Fill NaN values with a constant value (e.g., 0)
df_filled_zero = df.fillna(0)
print("\nDataFrame After Filling Null Values with 0:")
print(df_filled_zero)


In [None]:
# Fill NaN values with different values per column
df_filled_custom = df.fillna({'Age': 30, 'Height': 5.5, 'City': 'Unknown'})
print("\nDataFrame After Filling Null Values with Custom Values:")
print(df_filled_custom)


In [None]:
print("Original DataFrame with Null Values:")
print(df)

# Forward fill (propagate previous value)
df_ffilled = df.ffill()
print("\nDataFrame After Forward Fill:")
print(df_ffilled)

# Backward fill (propagate next value)
df_bfilled = df.bfill()
print("\nDataFrame After Backward Fill:")
print(df_bfilled)

### Pandas Slicing: Accessing Subsets of Data

Pandas slicing allows you to efficiently access specific subsets of data from a **Series** or **DataFrame** using various methods. Here’s a detailed guide on **slicing** in Pandas.

### 1. **Slicing in Pandas Series**

A **Series** is a one-dimensional array, and you can slice it using the same syntax as Python lists (i.e., `start:stop:step`).

#### 1.1. **Basic Indexing and Slicing**

- **Accessing a single element**: Use integer indexing to get a specific element.
- **Slicing a portion of a Series**: Use the same slice notation `start:stop:step`.


In [None]:
import pandas as pd

# Create a Series
s = pd.Series([10, 20, 30, 40, 50, 60])

# Accessing a single element (index 2)
print(s[2])  # Output: 30

# Slicing a portion of the Series (from index 1 to 4, excluding index 4)
print(s[1:4])  # Output: 1    20
               #         2    30
               #         3    40

# Slicing with a step (every 2nd element)
print(s[::2])  # Output: 0    10
               #         2    30
               #         4    50

#### 1.2. **Using Negative Indices**

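You can also use negative positions to access elements from the end of the Series. Because this Series has the default integer index, `s[-1]` is treated as a label lookup and raises a `KeyError`, so use `.iloc[]` for positional access.

```python
# Accessing the last element by position
print(s.iloc[-1])  # Output: 60

# Slicing from the second-to-last element to the end
print(s.iloc[-2:])  # Output: 4    50
                    #         5    60
```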

#### 1.3. **Slicing with Conditions**

You can apply conditions while slicing Series.

```python
# Select elements greater than 30
print(s[s > 30])  # Output: 3    40
                  #         4    50
                  #         5    60
```
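
Several conditions can be combined in one mask. A minimal sketch, recreating the same Series so the block runs on its own:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60])

# Combine conditions with & (and) and | (or); each condition needs parentheses
between = s[(s > 20) & (s < 60)]
print(between.tolist())  # [30, 40, 50]

# ~ negates a condition
not_large = s[~(s > 30)]
print(not_large.tolist())  # [10, 20, 30]
```

Python's `and`/`or` keywords do not work here because they cannot be applied element-wise to a Series.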

---

### 2. **Slicing in Pandas DataFrame**

A **DataFrame** is a two-dimensional table (rows and columns), and you can slice data by accessing specific rows, columns, or both.

#### 2.1. **Selecting Columns**

You can select columns by their name (as a string) or by passing a list of column names.

```python
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Selecting a single column
print(df['A'])  # Output: 0    1
                #         1    2
                #         2    3
                #         3    4

# Selecting multiple columns
print(df[['A', 'C']])  # Output:   A   C
                       #         0  1   9
                       #         1  2  10
                       #         2  3  11
                       #         3  4  12
```

#### 2.2. **Selecting Rows by Index Position**

You can slice rows using the `.iloc[]` method, which uses **integer-based indexing** (similar to Python’s list slicing).

```python
# Select rows at positions 1 through 3 (the stop position 4 is excluded)
print(df.iloc[1:4])  # Output:    A  B   C
                     #         1  2  6  10
                     #         2  3  7  11
                     #         3  4  8  12
```

#### 2.3. **Selecting Rows by Label**

You can also slice rows by their label index using the `.loc[]` method, which uses **label-based indexing** (inclusive of the end index).

```python
# Create DataFrame with custom indices
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}, index=['row1', 'row2', 'row3', 'row4'])

# Select rows by label
print(df.loc['row1':'row3'])  # Output:       A  B   C
                              #         row1  1  5   9
                              #         row2  2  6  10
                              #         row3  3  7  11
```
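
The difference in end-point handling between `.loc[]` and `.iloc[]` is easy to miss, so here is a side-by-side sketch on a hypothetical single-column frame `df_labeled`:

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'A': [1, 2, 3, 4]},
    index=['row1', 'row2', 'row3', 'row4'],
)

# .loc includes the end label 'row3'
by_label = df_labeled.loc['row1':'row3']
print(by_label.index.tolist())  # ['row1', 'row2', 'row3']

# .iloc excludes the stop position 2
by_position = df_labeled.iloc[0:2]
print(by_position.index.tolist())  # ['row1', 'row2']
```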

#### 2.4. **Selecting Rows and Columns Together**

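You can combine row and column selection by passing both a row slice and a list of column labels to `.loc[]`. Since this DataFrame uses string row labels, the row slice must use those labels rather than integers.

```python
# Select rows 'row2' to 'row4' (inclusive) and columns 'A' and 'C'
print(df.loc['row2':'row4', ['A', 'C']])  # Output:        A   C
                                          #         row2  2  10
                                          #         row3  3  11
                                          #         row4  4  12
```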

#### 2.5. **Slicing DataFrame Using Conditions**

Just like Series, you can slice DataFrame rows based on conditions.

```python
# Select rows where column 'A' is greater than 2
print(df[df['A'] > 2])  # Output:       A  B   C
                       #         row3  3  7  11
                       #         row4  4  8  12
```

#### 2.6. **Using `.iloc[]` for Both Rows and Columns by Position**

The `.iloc[]` method allows you to slice rows and columns based on **integer position** rather than labels.

```python
# Select rows 1 to 2 and columns 0 to 1 (indexing is zero-based)
print(df.iloc[1:3, 0:2])  # Output:    A  B
                          #         row2  2  6
                          #         row3  3  7
```

#### 2.7. **Using `.loc[]` for Both Rows and Columns by Label**

The `.loc[]` method is label-based, and it allows you to slice rows and columns by their label names.

```python
# Select rows 'row2' to 'row3' and columns 'A' and 'C'
print(df.loc['row2':'row3', ['A', 'C']])  # Output:     A   C
                                         #         row2  2  10
                                         #         row3  3  11
```

---

### Summary of Pandas Slicing

| Operation                     | Series Example | DataFrame Example                                         |
|-------------------------------|----------------|-----------------------------------------------------------|
| **Single Element Access**     | `s[2]`         | `df.loc[2, 'A']`                                          |
| **Slicing a Range**           | `s[1:4]`       | `df.iloc[1:4]`                                            |
| **Negative Indexing**         | `s.iloc[-1]`   | `df.iloc[-1]`                                             |
| **Conditional Slicing**       | `s[s > 30]`    | `df[df['A'] > 2]`                                         |
| **Selecting Columns**         | N/A            | `df['A']`, `df[['A', 'B']]`                               |
| **Row Selection by Label**    | `s.loc[1:3]`   | `df.loc['row1':'row3']`                                   |
| **Row and Column Selection**  | N/A            | `df.iloc[1:3, 0:2]`, `df.loc['row1':'row3', ['A', 'C']]`  |

---

### Practice Exercise

1. Create a DataFrame with some missing data (e.g., `NaN`).
2. Slice rows based on a condition (e.g., select rows where `Age` > 30).
3. Try different slicing techniques like `.iloc[]` and `.loc[]` to select specific rows and columns.


## Adding and Removing records in Pandas

### 1. **Adding Data to a DataFrame**

#### 1.1 **Adding a Row (Append)**

You can add a row to a DataFrame using `.loc[]` or `pd.concat()`. The older `DataFrame.append()` method was deprecated in pandas 1.4 and removed in pandas 2.0, so `pd.concat()` is the recommended replacement.

##### Method 1: Using `.loc[]` to Add a Row


In [None]:
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Adding a new row using loc (by index label)
df.loc[3] = ['David', 40, 'San Francisco']
print("DataFrame after adding a new row:")
print(df)

##### Method 2: Using `pd.concat()` (Recommended)


In [None]:
# Adding a new row using pd.concat
new_row = pd.DataFrame({'Name': ['Eva'], 'Age': [28], 'City': ['Miami']})
df = pd.concat([df, new_row], ignore_index=True)
print("\nDataFrame after using concat:")
print(df)



#### 1.2 **Adding a Column**

You can add a new column by directly assigning values to it.


In [None]:
# Adding a new column
df['Occupation'] = ['Engineer', 'Doctor', 'Artist', 'Nurse', 'Scientist']
print("\nDataFrame after adding a new column:")
print(df)



#### 1.3 **Adding Multiple Columns**

You can also add multiple columns at once using `pd.DataFrame` or `assign()`.



In [None]:
# Adding multiple columns
df[['Salary', 'Country']] = pd.DataFrame([[50000, 'USA'], [60000, 'USA'], [70000, 'USA'], [45000, 'USA'], [80000, 'USA']])
print("\nDataFrame after adding multiple columns:")
print(df)
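
The `assign()` route mentioned above is not shown in the cell, so here is a small sketch of it. The frame `staff` and the column values are made up for illustration.

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# assign() returns a new DataFrame and leaves the original untouched
staff_extended = staff.assign(
    Salary=[50000, 60000],
    Country='USA',  # a scalar is broadcast to every row
)
print(staff_extended)
print(staff.columns.tolist())  # ['Name', 'Age']
```

Because `assign()` does not modify the original, it fits well into method chains.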



---

### 2. **Removing Data from a DataFrame**

#### 2.1 **Removing a Row**

To remove rows, you can use the `.drop()` method. You can drop rows by **index** or using a **condition**.

##### Method 1: Drop by Index

In [None]:
# Dropping a row by index
df = df.drop(1)  # Drop row with index 1 (Bob)
print("\nDataFrame after dropping a row by index:")
print(df)

##### Method 2: Drop Multiple Rows by Index

In [None]:
# Dropping multiple rows by index
df = df.drop([0, 2])  # Drop rows with index 0 (Alice) and 2 (Charlie)
print("\nDataFrame after dropping multiple rows by index:")
print(df)



##### Method 3: Drop Rows by Condition


In [None]:
# Dropping rows based on a condition (e.g., Age > 30)
df = df[df['Age'] <= 30]
print("\nDataFrame after dropping rows where Age > 30:")
print(df)



#### 2.2 **Removing a Column**

You can remove columns using the `.drop()` method by specifying the `axis=1` argument.


In [None]:
# Dropping a column by name
df = df.drop('Occupation', axis=1)
print("\nDataFrame after dropping a column:")
print(df)



#### 2.3 **Removing Columns by Condition**

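You can also drop columns that meet a condition. For a custom condition, build the list of matching columns first and pass it to `.drop()`. For the common case of columns that are entirely **NaN**, `.dropna(axis=1, how='all')` does both steps at once.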


In [None]:
# Drop a column if its values are all NaN
df = df.dropna(axis=1, how='all')
print("\nDataFrame after dropping columns where all values are NaN:")
print(df)



#### 2.4 **Removing Rows with Missing Data (NaN)**

You can remove rows containing **NaN** values using `.dropna()`.


In [None]:
# Dropping rows with any NaN values
df = df.dropna()
print("\nDataFrame after dropping rows with NaN values:")
print(df)

#### 2.5 **Removing Duplicates**

You can remove duplicate rows using `.drop_duplicates()`.

In [None]:
# Adding a duplicate row (DataFrame.append() was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)
print("\nDataFrame with duplicate rows:")
print(df)

# Dropping duplicate rows
df = df.drop_duplicates()
print("\nDataFrame after dropping duplicates:")
print(df)
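
By default, `.drop_duplicates()` compares entire rows. The sketch below, on a hypothetical frame `visits`, shows the `subset` and `keep` parameters for cases where only some columns define a duplicate.

```python
import pandas as pd

visits = pd.DataFrame({
    'Name': ['Eva', 'Eva', 'Bob'],
    'City': ['Miami', 'Boston', 'Miami'],
})

# subset compares only the listed columns; keep='last' retains the final match
latest_per_name = visits.drop_duplicates(subset=['Name'], keep='last')
print(latest_per_name)
```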

## Grouping

Grouping is a powerful feature in **Pandas** that allows you to **split** the data into groups, **apply** a function (like aggregation or transformation), and **combine** the results back into a DataFrame. This is commonly used for **summarizing** data, such as computing averages, sums, counts, etc., for different categories or groups.

The basic steps for grouping are:

1. **Split**: Split the data based on some criteria (e.g., a column).
2. **Apply**: Perform an operation on each group.
3. **Combine**: Combine the results back into a DataFrame.

We use the `.groupby()` method for grouping in Pandas.

---

### 1. **Basic Grouping**

Let's create a simple dataset and perform basic grouping.

```python
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ian', 'Jack'],
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'Salary': [50000, 70000, 80000, 60000, 90000, 95000, 85000, 55000, 75000, 73000],
    'Experience': [5, 7, 8, 6, 10, 12, 9, 4, 15, 11]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
```

#### Output:
```
      Name Department  Salary  Experience
0    Alice         HR   50000           5
1      Bob         IT   70000           7
2  Charlie         IT   80000           8
3    David         HR   60000           6
4      Eva    Finance   90000          10
5    Frank    Finance   95000          12
6    Grace         IT   85000           9
7   Hannah         HR   55000           4
8      Ian    Finance   75000          15
9     Jack         IT   73000          11
```

### 2. **Grouping by a Single Column**

To group the data by **Department** and compute the **average salary** for each department, we can use the `.groupby()` method followed by `.agg()` or an aggregation function like `.mean()`.

```python
# Group by 'Department' and calculate the average 'Salary'
grouped = df.groupby('Department')['Salary'].mean()

print("\nAverage Salary per Department:")
print(grouped)
```

#### Output:
```
Department
Finance    86666.666667
HR         55000.000000
IT         77000.000000
Name: Salary, dtype: float64
```

**Explanation**:
- `df.groupby('Department')`: This splits the DataFrame into groups based on the values in the `Department` column.
- `['Salary']`: This selects the `Salary` column for aggregation.
- `.mean()`: This calculates the mean salary for each group.

---

### 3. **Grouping and Aggregating Multiple Columns**

You can also apply different aggregation functions to multiple columns at once. For example, to calculate both the **average salary** and the **sum of experience** for each department:

```python
# Group by 'Department' and calculate the mean salary and sum of experience
grouped_multi = df.groupby('Department').agg({
    'Salary': 'mean',       # Calculate mean salary
    'Experience': 'sum'     # Calculate sum of experience
})

print("\nMean Salary and Sum of Experience per Department:")
print(grouped_multi)
```

#### Output:
```
                  Salary  Experience
Department                          
Finance     86666.666667          37
HR          55000.000000          15
IT          77000.000000          35
```

**Explanation**:
- `.agg()` allows you to specify different aggregation functions for each column. In this case:
  - `'Salary': 'mean'`: Calculates the mean salary.
  - `'Experience': 'sum'`: Calculates the sum of experience.
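
When the same column needs more than one statistic, or when the output columns should have descriptive names, named aggregation is an alternative to the dictionary form. A minimal sketch on a smaller hypothetical frame `df_staff`; the output column names are chosen for illustration.

```python
import pandas as pd

df_staff = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR'],
    'Salary': [50000, 70000, 80000, 60000],
    'Experience': [5, 7, 8, 6],
})

# Named aggregation: output_column=(input_column, function)
summary = df_staff.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    total_experience=('Experience', 'sum'),
)
print(summary)
```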

---

### 4. **Grouping by Multiple Columns**

You can also group by more than one column. For example, to group by both **Department** and **Experience**:

```python
# Group by 'Department' and 'Experience' and calculate the average salary
grouped_multi_columns = df.groupby(['Department', 'Experience'])['Salary'].mean()

print("\nAverage Salary by Department and Experience:")
print(grouped_multi_columns)
```

#### Output:
```
Department  Experience
Finance     10            90000.0
            12            95000.0
            15            75000.0
HR          4             55000.0
            5             50000.0
            6             60000.0
IT          7             70000.0
            8             80000.0
            9             85000.0
            11            73000.0
Name: Salary, dtype: float64
```

**Explanation**:
- `groupby(['Department', 'Experience'])`: Groups by both `Department` and `Experience` columns.
- The result is a Series with a **MultiIndex**, with the first level being the department and the second level being the experience.
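
A grouped result with two index levels can be queried by its outer level or flattened back into ordinary columns. The sketch below uses a small hypothetical frame `df_small` to show both.

```python
import pandas as pd

df_small = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT'],
    'Experience': [5, 6, 7],
    'Salary': [50000, 60000, 70000],
})

grouped = df_small.groupby(['Department', 'Experience'])['Salary'].mean()

# Select one outer level of the index with .loc
hr_only = grouped.loc['HR']
print(hr_only)

# reset_index() turns the index levels back into ordinary columns
flat = grouped.reset_index()
print(flat)
```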

---

### 5. **Using `size()` to Count the Number of Records in Each Group**

To get the count of records in each group, you can use `.size()` instead of an aggregation function like `mean()`.

```python
# Group by 'Department' and count the number of records in each department
count_groups = df.groupby('Department').size()

print("\nNumber of records per Department:")
print(count_groups)
```

#### Output:
```
Department
Finance    3
HR         3
IT         4
dtype: int64
```

**Explanation**:
- `.size()` counts the number of records in each group.
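
`.size()` and `.count()` are easy to confuse; they differ once a group contains missing values. A short sketch on a hypothetical frame `df_team`:

```python
import pandas as pd
import numpy as np

df_team = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT'],
    'Salary': [50000, np.nan, 70000],
})

# size() counts every row in the group, including rows with NaN
group_sizes = df_team.groupby('Department').size()
print(group_sizes)

# count() counts only the non-null values of the selected column
salary_counts = df_team.groupby('Department')['Salary'].count()
print(salary_counts)
```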

---

### `apply()` and `applymap()` in Pandas
Pandas provides `apply()` and `applymap()` to apply functions to DataFrames and Series. These are useful for performing row-wise, column-wise, or element-wise operations on your data.

---

### **1. `apply()`**
The `apply()` function is used to apply a function along an **axis** (rows or columns) of a DataFrame or on elements of a Series.

#### Key Points:
- Works on both **Series** and **DataFrames**.
- Can apply the function row-wise (`axis=1`) or column-wise (`axis=0`) for DataFrames.
- Accepts a `lambda` or a custom function.

---

#### Example 1: Applying on a Series
```python
import pandas as pd

# Sample Series
s = pd.Series([1, 2, 3, 4, 5])

# Apply a lambda function to square each element
result = s.apply(lambda x: x**2)
print(result)
```

**Output:**
```
0     1
1     4
2     9
3    16
4    25
dtype: int64
```

---

#### Example 2: Applying on a DataFrame
```python
# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Apply a lambda function column-wise (default is axis=0)
column_sum = df.apply(lambda col: col.sum(), axis=0)
print("Column-wise sum:")
print(column_sum)
```

**Output:**
```
Column-wise sum:
A     6
B    15
C    24
dtype: int64
```

---

#### Example 3: Row-wise Operations
```python
# Apply a lambda function row-wise
row_sum = df.apply(lambda row: row.sum(), axis=1)
print("\nRow-wise sum:")
print(row_sum)
```

**Output:**
```
Row-wise sum:
0    12
1    15
2    18
dtype: int64
```

---

### **2. `applymap()`**
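The `applymap()` function is used to apply a function **element-wise** on a DataFrame. Unlike `apply()`, it works on individual elements of a DataFrame, not rows or columns. Note that `applymap()` is deprecated since pandas 2.1 in favor of the equivalent `DataFrame.map()`, so on recent versions the examples below emit a deprecation warning.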

#### Key Points:
- Works **only on DataFrames** (not Series).
- Applies the function to each individual element of the DataFrame.

---

#### Example 1: Element-wise Transformation
```python
# Apply a lambda function to square each element in the DataFrame
squared_df = df.applymap(lambda x: x**2)
print("\nSquared DataFrame:")
print(squared_df)
```

**Output:**
```
Squared DataFrame:
    A   B   C
0   1  16  49
1   4  25  64
2   9  36  81
```

---

#### Example 2: Formatting Data
You can use `applymap()` for tasks like formatting strings or modifying individual elements.

```python
# Example: Add a string prefix to all elements
formatted_df = df.applymap(lambda x: f"Value-{x}")
print("\nFormatted DataFrame:")
print(formatted_df)
```

**Output:**
```
Formatted DataFrame:
         A        B        C
0  Value-1  Value-4  Value-7
1  Value-2  Value-5  Value-8
2  Value-3  Value-6  Value-9
```
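
For code that has to run on both older and newer pandas versions, one option is to pick the element-wise method at runtime. This is only a sketch of that idea: it uses `DataFrame.map()` where it is available (pandas 2.1 and later) and falls back to `applymap()` otherwise. The names `df_numbers` and `elementwise` are illustrative.

```python
import pandas as pd

df_numbers = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Prefer DataFrame.map when it exists; otherwise fall back to applymap
elementwise = getattr(df_numbers, 'map', None) or df_numbers.applymap
squared = elementwise(lambda x: x ** 2)
print(squared)

# For plain arithmetic, a vectorized expression computes the same values
vectorized = df_numbers ** 2
print(vectorized)
```

For simple arithmetic like this, the vectorized form is usually the better choice, since it avoids calling a Python function once per element.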

---

### **When to Use `apply()` vs `applymap()`**
| Feature                | `apply()`                               | `applymap()`                           |
|------------------------|------------------------------------------|----------------------------------------|
| **Input**              | Series or DataFrame                     | Only DataFrame                        |
| **Output**             | Series or DataFrame                     | DataFrame                              |
| **Function Application** | Applies to rows/columns or individual elements | Applies only to individual elements    |
| **Example Use**        | Aggregating rows/columns, row-wise operations | Element-wise transformations          |

---

### Combined Example: `apply()` and `applymap()` with `lambda`

In [None]:
# Create a DataFrame
data = {
    "Product": ["Apple", "Banana", "Carrot"],
    "Price": [100, 30, 50],
    "Quantity": [3, 10, 5]
}

df = pd.DataFrame(data)

# 1. Apply row-wise (axis=1): Calculate total cost for each row
df["Total_Cost"] = df.apply(lambda row: row["Price"] * row["Quantity"], axis=1)
print("DataFrame with Total Cost:\n", df)

# 2. Applymap element-wise: Prefix every integer value (Price, Quantity, Total_Cost) with a currency symbol
formatted_df = df.applymap(lambda x: f"₹{x}" if isinstance(x, int) else x)
print("\nFormatted DataFrame:\n", formatted_df)