### 5. Data Transformation
- `df.apply()`: Apply a function along an axis of the DataFrame.
- `df.map()`: Apply a function elementwise to every value of the DataFrame.
- `df.rename()`: Rename columns or indices.
- `df.sort_values()`: Sort data by values.
- `df.sort_index()`: Sort data by index.

In [None]:
import pandas as pd 
import numpy as np

# pandas.DataFrame.apply

## DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, \*\*kwargs)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
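To see what the function actually receives, the sketch below records the label (`name`) and index of every Series that `apply()` passes in. The small DataFrame and the helper `describe` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Collect (Series name, Series index) for every call made by apply()
seen = []

def describe(s):
    # s is one column when axis=0, and one row when axis=1
    seen.append((s.name, list(s.index)))
    return s.sum()

col_sums = df.apply(describe, axis=0)  # each s is a column indexed by df.index
row_sums = df.apply(describe, axis=1)  # each s is a row indexed by df.columns

print(seen)
print(col_sums.to_dict())
print(row_sums.to_dict())
```

With `axis=0` the recorded names are the column labels and each index is the row index; with `axis=1` the names are the row labels and each index is the column labels.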

### Parameters:

- **func**: function  
  Function to apply to each column or row.

- **axis**: {0 or ‘index’, 1 or ‘columns’}, default 0  
  Axis along which the function is applied:
  - 0 or ‘index’: apply function to each column.
  - 1 or ‘columns’: apply function to each row.

- **raw**: bool, default False  
  Determines if row or column is passed as a Series or ndarray object:
  - False : passes each row or column as a Series to the function.
  - True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

- **result_type**: {‘expand’, ‘reduce’, ‘broadcast’, None}, default None  
  These only act when axis=1 (columns):
  - ‘expand’ : list-like results will be turned into columns.
  - ‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
  - ‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
  
  The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However, if the apply function returns a Series these are expanded to columns.

- **args**: tuple  
  Positional arguments to pass to func in addition to the array/series.

- **by_row**: False or “compat”, default “compat”  
  Only has an effect when func is a list-like or dict-like of funcs and the func isn’t a string. If “compat”, pandas will, if possible, first translate the func into pandas methods (e.g. Series().apply(np.sum) will be translated to Series().sum()). If that doesn’t work, it will try calling apply again with by_row=True, and if that fails, it will call apply again with by_row=False (backward compatible). If False, the funcs will be passed the whole Series at once.  
  Added in version 2.1.0.

- **engine**: {‘python’, ‘numba’}, default ‘python’  
  Choose between the python (default) engine or the numba engine in apply.
  
  The numba engine will attempt to JIT compile the passed function, which may result in speedups for large DataFrames. It also supports the following engine_kwargs:
  - nopython (compile the function in nopython mode)
  - nogil (release the GIL inside the JIT compiled function)
  - parallel (try to apply the function in parallel over the DataFrame)
  
  Note: Due to limitations within numba and in how pandas interfaces with numba, you should only use this engine with raw=True.  
  Note: The numba compiler only supports a subset of valid Python/numpy operations.  
  Please read more about the supported python features and supported numpy features in numba to learn what you can or cannot use in the passed function.  
  Added in version 2.2.0.

- **engine_kwargs**: dict  
  Pass keyword arguments to the engine. This is currently only used by the numba engine, see the documentation for the engine argument for more information.

- **\*\*kwargs**:  
  Additional keyword arguments to pass to func.
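A short sketch of `args`, `**kwargs` and `raw` together. The helper `scale_and_shift` and its parameters `factor` and `shift` are hypothetical names chosen for illustration; `args=(10,)` supplies `factor` positionally and `shift=1` is forwarded through `**kwargs`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def scale_and_shift(col, factor, shift=0):
    # factor arrives via args=..., shift via **kwargs
    return col * factor + shift

scaled = df.apply(scale_and_shift, args=(10,), shift=1)
print(scaled)

# raw=True passes each column as a plain ndarray instead of a Series
got_ndarray = df.apply(lambda arr: isinstance(arr, np.ndarray), raw=True)
print(got_ndarray)

# A NumPy reduction works directly on the raw arrays
col_means = df.apply(np.mean, raw=True)
print(col_means)
```

Skipping the Series wrapper is what makes `raw=True` attractive for plain NumPy reductions; the trade-off is that the function no longer sees index labels.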

### Returns:
- **Series** or **DataFrame**  
  Result of applying func along the given axis of the DataFrame.
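Whether a Series or a DataFrame comes back depends on what `func` returns and on `result_type`. The sketch below contrasts the options row-wise (`axis=1`) on the small example frame; the column names `total` and `diff` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def three_values(row):
    return [row['A'], row['B'], row['A'] + row['B']]

# Default (None): each returned list stays whole, giving a Series of lists
as_lists = df.apply(three_values, axis=1)

# 'expand': list elements become columns labelled 0, 1, 2
expanded = df.apply(three_values, axis=1, result_type='expand')

# Returning a Series expands to columns even with the default result_type
named = df.apply(lambda row: pd.Series({'total': row.sum(), 'diff': row['B'] - row['A']}), axis=1)

# 'broadcast': results keep the original shape, index and columns
shifted = df.apply(lambda row: row - row.min(), axis=1, result_type='broadcast')

print(as_lists, expanded, named, shifted, sep='\n\n')
```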

In [7]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.apply(lambda x: x.sum(), axis=0)  # or axis='index'


A     6
B    15
dtype: int64

In [8]:
df.apply(lambda x: x**3) 

|   | A | B |
|---|---|---|
| 0 | 1 | 64 |
| 1 | 8 | 125 |
| 2 | 27 | 216 |


In [9]:
df.apply(np.max,axis=0)

A    3
B    6
dtype: int64

# pandas.DataFrame.map

## DataFrame.map(func, na_action=None, \*\*kwargs)

Apply a function to a DataFrame elementwise.

**Added in version 2.1.0:** DataFrame.applymap was deprecated and renamed to DataFrame.map.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

### Parameters:

- **func**: callable  
  Python function that returns a single value from a single value.

- **na_action**: {None, ‘ignore’}, default None  
  If ‘ignore’, propagate NaN values, without passing them to func.

- **\*\*kwargs**:  
  Additional keyword arguments to pass to func.



In [13]:
df.map(lambda x: len(str(x)))


|   | A | B |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 1 |
| 2 | 1 | 1 |


In [15]:
df.iat[0,1]=np.nan

In [23]:
df.map(lambda x:x**2, na_action='ignore')

|   | A | B |
|---|---|---|
| 0 | 1 | NaN |
| 1 | 4 | 25.0 |
| 2 | 9 | 36.0 |


### pandas.DataFrame.rename

```python
DataFrame.rename(mapper=None, *, index=None, columns=None, axis=None, copy=None, inplace=False, level=None, errors='ignore')
```

Rename columns or index labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

#### Parameters:

- **mapper**: dict-like or function
  - Dict-like or function transformations to apply to that axis’ values. Use either `mapper` and `axis` to specify the axis to target with `mapper`, or `index` and `columns`.
  
- **index**: dict-like or function
  - Alternative to specifying `axis` (`mapper`, `axis=0` is equivalent to `index=mapper`).

- **columns**: dict-like or function
  - Alternative to specifying `axis` (`mapper`, `axis=1` is equivalent to `columns=mapper`).

- **axis**: {0 or ‘index’, 1 or ‘columns’}, default 0
  - Axis to target with `mapper`. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.

- **copy**: bool, default True
  - Also copy underlying data.
  
  Note: 
  - The `copy` keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a `copy` keyword will use a lazy copy mechanism to defer the copy and ignore the `copy` keyword. The `copy` keyword will be removed in a future version of pandas.
  - You can already get the future behavior and improvements through enabling copy on write `pd.options.mode.copy_on_write = True`.

- **inplace**: bool, default False
  - Whether to modify the DataFrame rather than creating a new one. If `True` then value of `copy` is ignored.

- **level**: int or level name, default None
  - In case of a MultiIndex, only rename labels in the specified level.

- **errors**: {‘ignore’, ‘raise’}, default ‘ignore’
  - If ‘raise’, raise a `KeyError` when a dict-like `mapper`, `index`, or `columns` contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.
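For `level`, a minimal sketch with a two-level column index. The `(department, metric)` labels and the numbers are hypothetical; only labels in level 1 are considered for renaming.

```python
import pandas as pd

# Hypothetical two-level columns: (department, metric)
cols = pd.MultiIndex.from_tuples([('dev', 'salary'), ('dev', 'years'), ('ops', 'salary')])
wide = pd.DataFrame([[80000, 5, 60000], [75000, 4, 62000]], columns=cols)

# Rename only within level 1; level 0 ('dev', 'ops') is untouched
renamed = wide.rename(columns={'salary': 'pay'}, level=1)
print(renamed.columns.tolist())
```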

#### Returns:
- **DataFrame or None**: DataFrame with the renamed axis labels or None if `inplace=True`.

#### Raises:
- **KeyError**: If any of the labels is not found in the selected axis and `errors='raise'`.
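The difference between `errors='ignore'` and `errors='raise'` can be sketched with a mapping that contains one label not present in the columns (`'Team'`, an intentionally missing label).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Role': ['Developer', 'Designer']})
mapping = {'Role': 'Job', 'Team': 'Squad'}  # 'Team' is not a column

# Default errors='ignore': existing labels are renamed, extra keys are skipped
renamed = df.rename(columns=mapping)
print(renamed.columns.tolist())

# errors='raise': the missing 'Team' label triggers a KeyError
raised = False
try:
    df.rename(columns=mapping, errors='raise')
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```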

In [24]:

# Data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ivan', 'Julia'],
    'Role': ['Developer', 'Designer', 'Manager', 'Developer', 'Designer', 'Manager', 'Tester', 'Developer', 'Tester',
             'Designer'],
    'Experience (Years)': [5, 3, 10, 4, 2, 8, 6, 3, 7, 1],
    'Salary ($)': [80000, 65000, 120000, 75000, 60000, 110000, 70000, 72000, 68000, 62000],
    'Location': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco', 'Los Angeles', 'New York',
                 'Chicago', 'Chicago', 'Los Angeles']
}

# Creating DataFrame
df = pd.DataFrame(data)

print(df)


      Name       Role  Experience (Years)  Salary ($)       Location
0    Alice  Developer                   5       80000       New York
1      Bob   Designer                   3       65000  San Francisco
2  Charlie    Manager                  10      120000    Los Angeles
3    David  Developer                   4       75000       New York
4      Eva   Designer                   2       60000  San Francisco
5    Frank    Manager                   8      110000    Los Angeles
6    Grace     Tester                   6       70000       New York
7   Hannah  Developer                   3       72000        Chicago
8     Ivan     Tester                   7       68000        Chicago
9    Julia   Designer                   1       62000    Los Angeles


In [29]:
df.rename(index=str).index

Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype='object')

In [31]:
df.rename(index={1:'a'})

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| 0 | Alice | Developer | 5 | 80000 | New York |
| a | Bob | Designer | 3 | 65000 | San Francisco |
| 2 | Charlie | Manager | 10 | 120000 | Los Angeles |
| 3 | David | Developer | 4 | 75000 | New York |
| 4 | Eva | Designer | 2 | 60000 | San Francisco |
| 5 | Frank | Manager | 8 | 110000 | Los Angeles |
| 6 | Grace | Tester | 6 | 70000 | New York |
| 7 | Hannah | Developer | 3 | 72000 | Chicago |
| 8 | Ivan | Tester | 7 | 68000 | Chicago |
| 9 | Julia | Designer | 1 | 62000 | Los Angeles |


In [33]:
df.rename(str.upper,axis=1)

|   | NAME | ROLE | EXPERIENCE (YEARS) | SALARY ($) | LOCATION |
|---|---|---|---|---|---|
| 0 | Alice | Developer | 5 | 80000 | New York |
| 1 | Bob | Designer | 3 | 65000 | San Francisco |
| 2 | Charlie | Manager | 10 | 120000 | Los Angeles |
| 3 | David | Developer | 4 | 75000 | New York |
| 4 | Eva | Designer | 2 | 60000 | San Francisco |
| 5 | Frank | Manager | 8 | 110000 | Los Angeles |
| 6 | Grace | Tester | 6 | 70000 | New York |
| 7 | Hannah | Developer | 3 | 72000 | Chicago |
| 8 | Ivan | Tester | 7 | 68000 | Chicago |
| 9 | Julia | Designer | 1 | 62000 | Los Angeles |


In [36]:
df = df.rename(index={0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j'})

In [37]:
df

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| a | Alice | Developer | 5 | 80000 | New York |
| b | Bob | Designer | 3 | 65000 | San Francisco |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| d | David | Developer | 4 | 75000 | New York |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| g | Grace | Tester | 6 | 70000 | New York |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| j | Julia | Designer | 1 | 62000 | Los Angeles |


In [38]:
df.rename(str.title,axis='index')

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| A | Alice | Developer | 5 | 80000 | New York |
| B | Bob | Designer | 3 | 65000 | San Francisco |
| C | Charlie | Manager | 10 | 120000 | Los Angeles |
| D | David | Developer | 4 | 75000 | New York |
| E | Eva | Designer | 2 | 60000 | San Francisco |
| F | Frank | Manager | 8 | 110000 | Los Angeles |
| G | Grace | Tester | 6 | 70000 | New York |
| H | Hannah | Developer | 3 | 72000 | Chicago |
| I | Ivan | Tester | 7 | 68000 | Chicago |
| J | Julia | Designer | 1 | 62000 | Los Angeles |


### pandas.DataFrame.sort_values

`DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)`

Sort by the values along either axis.

#### Parameters:

- **by**: `str or list of str`
  - Name or list of names to sort by.
  - If `axis` is 0 or 'index', then `by` may contain index levels and/or column labels.
  - If `axis` is 1 or ‘columns’, then `by` may contain column levels and/or index labels.

- **axis**: `{0 or ‘index’, 1 or ‘columns’}`, default 0
  - Axis to be sorted.

- **ascending**: `bool or list of bool`, default True
  - Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must match the length of `by`.

- **inplace**: `bool`, default False
  - If True, perform the operation in-place.

- **kind**: `{‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}`, default ‘quicksort’
  - Choice of sorting algorithm. See also numpy.sort() for more information. `mergesort` and `stable` are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.

- **na_position**: `{‘first’, ‘last’}`, default ‘last’
  - Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.

- **ignore_index**: `bool`, default False
  - If True, the resulting axis will be labeled 0, 1, …, n - 1.

- **key**: `callable`, optional
  - Apply the key function to the values before sorting. This is similar to the `key` argument in the builtin `sorted()` function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in `by` independently.
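Several parameters can be combined in one call. The sketch below uses a small hypothetical subset of the staff data with one experience value set to NaN: it sorts by role ascending, then by experience descending within each role, puts NaN first, and renumbers the result.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the staff data; Julia's experience is unknown
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Eva', 'Hannah', 'Julia'],
    'Role': ['Developer', 'Designer', 'Designer', 'Developer', 'Designer'],
    'Experience (Years)': [5, 3, 2, 3, np.nan],
})

ordered = staff.sort_values(
    by=['Role', 'Experience (Years)'],
    ascending=[True, False],   # one flag per entry in `by`
    na_position='first',       # NaN experience sorts before numbers
    ignore_index=True,         # result labelled 0, 1, ..., n - 1
)
print(ordered)
```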

In [43]:
df.sort_values('Name',ascending=False)

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| j | Julia | Designer | 1 | 62000 | Los Angeles |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| g | Grace | Tester | 6 | 70000 | New York |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| d | David | Developer | 4 | 75000 | New York |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| b | Bob | Designer | 3 | 65000 | San Francisco |
| a | Alice | Developer | 5 | 80000 | New York |


In [57]:
df.sort_values('Experience (Years)')

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| j | Julia | Designer | 1 | 62000 | Los Angeles |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| b | Bob | Designer | 3 | 65000 | San Francisco |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| d | David | Developer | 4 | 75000 | New York |
| a | Alice | Developer | 5 | 80000 | New York |
| g | Grace | Tester | 6 | 70000 | New York |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |


In [61]:
df.sort_values(by='Name',key=lambda col: col.str.upper())

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| a | Alice | Developer | 5 | 80000 | New York |
| b | Bob | Designer | 3 | 65000 | San Francisco |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| d | David | Developer | 4 | 75000 | New York |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| g | Grace | Tester | 6 | 70000 | New York |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| j | Julia | Designer | 1 | 62000 | Los Angeles |


In [63]:
df.iat[1,0]='bob'

In [64]:
df.sort_values('Name')

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| a | Alice | Developer | 5 | 80000 | New York |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| d | David | Developer | 4 | 75000 | New York |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| g | Grace | Tester | 6 | 70000 | New York |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| j | Julia | Designer | 1 | 62000 | Los Angeles |
| b | bob | Designer | 3 | 65000 | San Francisco |


In [66]:
df.sort_values('Name', key=lambda x:x.str.upper())

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| a | Alice | Developer | 5 | 80000 | New York |
| b | bob | Designer | 3 | 65000 | San Francisco |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| d | David | Developer | 4 | 75000 | New York |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| g | Grace | Tester | 6 | 70000 | New York |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| j | Julia | Designer | 1 | 62000 | Los Angeles |


# pandas.DataFrame.sort_index

`DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)`

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if `inplace` argument is False, otherwise updates the original DataFrame and returns None.
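The two return modes described above can be sketched on a tiny illustrative frame whose row labels are out of order.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob']}, index=['c', 'a', 'b'])

sorted_copy = df.sort_index()        # returns a new, sorted DataFrame
unchanged = df.index.tolist()        # df itself still has the original order

result = df.sort_index(inplace=True) # sorts df in place and returns None

print(sorted_copy.index.tolist(), unchanged, df.index.tolist(), result)
```

The same pattern (new object by default, `None` with `inplace=True`) applies to `rename()` and `sort_values()`.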

#### Parameters:

- **axis**: `{0 or ‘index’, 1 or ‘columns’}`, default 0
  - The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.

- **level**: `int or level name or list of ints or list of level names`
  - If not None, sort on values in specified index level(s).

- **ascending**: `bool or list-like of bools`, default True
  - Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.

- **inplace**: `bool`, default False
  - Whether to modify the DataFrame rather than creating a new one.

- **kind**: `{‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}`, default ‘quicksort’
  - Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.

- **na_position**: `{‘first’, ‘last’}`, default ‘last’
  - Puts NaNs at the beginning if `first`; `last` puts NaNs at the end. Not implemented for MultiIndex.

- **sort_remaining**: `bool`, default True
  - If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.

- **ignore_index**: `bool`, default False
  - If True, the resulting axis will be labeled 0, 1, …, n - 1.

- **key**: `callable, optional`
  - If not None, apply the key function to the index values before sorting. This is similar to the `key` argument in the builtin `sorted()` function, with the notable difference that this `key` function should be vectorized. It should expect an `Index` and return an `Index` of the same shape. For MultiIndex inputs, the key is applied per level.

#### Returns:

- **DataFrame or None**
  - The DataFrame sorted by its labels, or None if `inplace=True`.
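Two parameters not exercised by the notebook cells are `axis=1` and `key`. The sketch below uses a small hypothetical frame whose row labels mix upper and lower case, so the default byte-order sort and a case-insensitive `key` give different results.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Bob', 'Charlie', 'Alice'],
        'Salary ($)': [65000, 120000, 80000],
        'Location': ['San Francisco', 'Los Angeles', 'New York'],
    },
    index=['b', 'C', 'a'],
)

# axis=1 sorts the column labels instead of the row labels
by_column_name = df.sort_index(axis=1)

# Default sort: uppercase labels come before lowercase ones
plain = df.sort_index()

# key receives the whole Index and returns an Index of the same length;
# lower-casing makes the comparison case-insensitive
case_insensitive = df.sort_index(key=lambda idx: idx.str.lower())

print(by_column_name.columns.tolist(), plain.index.tolist(), case_insensitive.index.tolist())
```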

In [67]:
df.sort_index()

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| a | Alice | Developer | 5 | 80000 | New York |
| b | bob | Designer | 3 | 65000 | San Francisco |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| d | David | Developer | 4 | 75000 | New York |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| g | Grace | Tester | 6 | 70000 | New York |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| j | Julia | Designer | 1 | 62000 | Los Angeles |


In [68]:
df.sort_index(ascending=False)

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| j | Julia | Designer | 1 | 62000 | Los Angeles |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| g | Grace | Tester | 6 | 70000 | New York |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| d | David | Developer | 4 | 75000 | New York |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| b | bob | Designer | 3 | 65000 | San Francisco |
| a | Alice | Developer | 5 | 80000 | New York |


In [69]:
df.sort_index(level=0,ascending=True)

|   | Name | Role | Experience (Years) | Salary ($) | Location |
|---|---|---|---|---|---|
| a | Alice | Developer | 5 | 80000 | New York |
| b | bob | Designer | 3 | 65000 | San Francisco |
| c | Charlie | Manager | 10 | 120000 | Los Angeles |
| d | David | Developer | 4 | 75000 | New York |
| e | Eva | Designer | 2 | 60000 | San Francisco |
| f | Frank | Manager | 8 | 110000 | Los Angeles |
| g | Grace | Tester | 6 | 70000 | New York |
| h | Hannah | Developer | 3 | 72000 | Chicago |
| i | Ivan | Tester | 7 | 68000 | Chicago |
| j | Julia | Designer | 1 | 62000 | Los Angeles |
