### **Essential Functionality**

In [1]:
import numpy as np
import pandas as pd

### Reindexing
- `reindex` creates a new object whose values are rearranged to match a new index; labels that were not in the old index get missing values (`NaN`).


In [8]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
obj2 = obj.reindex(["d", "b", "a", "c","d","e"])
obj2

d    4.5
b    7.2
a   -5.3
c    3.6
d    4.5
e    NaN
dtype: float64
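Because a label missing from the old index (like `"e"` above) becomes `NaN`, reindexing an integer Series can silently upcast it to `float64`, while the original object is left untouched. A minimal sketch with a small hypothetical Series `s`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])  # integer data
s2 = s.reindex(["c", "a", "z"])  # "z" is not in the old index

print(s2)        # "z" is NaN
print(s2.dtype)  # float64: NaN cannot be stored in an integer Series
print(s.index.tolist())  # the original Series is unchanged
```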

In [14]:
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
obj3.reindex(np.arange(6),method='ffill')

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

The `ffill` *(forward fill)* method fills in missing values (i.e., for indices 1, 3, and 5) by propagating the last valid value forward.
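The opposite direction is `method="bfill"`, and `limit` caps how many consecutive new labels a fill may cover. A sketch reusing `obj3` plus a hypothetical Series `sparse` with a wider gap:

```python
import numpy as np
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# bfill: each new label takes the NEXT valid value; label 5 has none, so it stays NaN
back = obj3.reindex(np.arange(6), method="bfill")
print(back)

# limit=1: forward fill at most one consecutive missing label per gap
sparse = pd.Series(["blue", "purple"], index=[0, 3])
limited = sparse.reindex(np.arange(5), method="ffill", limit=1)
print(limited)  # label 2 stays NaN: it is the second label in the gap after 0
```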

In [5]:
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])
frame

| | Ohio | Texas | California |
|---|---|---|---|
| a | 0 | 1 | 2 |
| c | 3 | 4 | 5 |
| d | 6 | 7 | 8 |


In [6]:
# loc selects rows and columns by label, in the order given
frame.loc[['a','d'],['California','Texas']]

| | California | Texas |
|---|---|---|
| a | 2 | 1 |
| d | 8 | 7 |
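`loc` can only select labels that already exist, whereas `reindex` can also introduce new rows or columns. A sketch contrasting the two on the same `frame` (in recent pandas versions, a list passed to `loc` that contains a missing label raises `KeyError`):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])

# reindex rows: "b" is new, so its row is filled with NaN
frame2 = frame.reindex(index=["a", "b", "c", "d"])

# reindex columns: "Utah" is new (all NaN) and "Ohio" is dropped
frame3 = frame.reindex(columns=["Texas", "Utah", "California"])

# loc refuses labels that do not exist yet
try:
    frame.loc[["a", "b"], ["Ohio"]]
    loc_failed = False
except KeyError as err:
    loc_failed = True
    print("KeyError:", err)
```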


In [3]:
df = pd.DataFrame(np.arange(36).reshape((6,6)),
                  index=np.arange(6),
                  columns=np.arange(6))
df

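| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 6 | 7 | 8 | 9 | 10 | 11 |
| 2 | 12 | 13 | 14 | 15 | 16 | 17 |
| 3 | 18 | 19 | 20 | 21 | 22 | 23 |
| 4 | 24 | 25 | 26 | 27 | 28 | 29 |
| 5 | 30 | 31 | 32 | 33 | 34 | 35 |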


### **`reindex` function arguments**

| Argument     | Description                                                                                                                                          |
|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| `labels`     | New sequence to use as an index. Can be an `Index` instance or any other sequence-like Python data structure. An `Index` will be used exactly as is without any copying. |
| `index`      | Use the passed sequence as the new index labels.                                                                                                     |
| `columns`    | Use the passed sequence as the new column labels.                                                                                                    |
| `axis`       | The axis to reindex, whether "index" (rows) or "columns". The default is "index". You can alternately do `reindex(index=new_labels)` or `reindex(columns=new_labels)`. |
| `method`     | Interpolation (fill) method; `"ffill"` fills forward, while `"bfill"` fills backward.                                                                |
| `fill_value` | Substitute value to use when introducing missing data by reindexing. Defaults to `NaN`, so labels absent from the old index get null values unless you pass another value (e.g. `fill_value=0`). |
| `limit`      | When forward filling or backfilling, the maximum size gap (in number of elements) to fill.                                                           |
| `tolerance`  | When forward filling or backfilling, the maximum size gap (in absolute numeric distance) to fill for inexact matches.                                |
| `level`      | Broadcast across a level, matching a simple `Index` against the given level of a `MultiIndex`.                                                       |
| `copy`       | If `True`, always copy underlying data even if the new index is equivalent to the old index; if `False`, do not copy the data when the indexes are equivalent. |
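Two of these arguments in action: `fill_value` replaces the `NaN` that new labels would otherwise get, and `axis="columns"` makes the positional `labels` argument apply to the columns. A sketch on the `frame` defined earlier:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])

# new row "b" is filled with 0 instead of NaN
filled = frame.reindex(["a", "b", "c", "d"], fill_value=0)

# the same positional labels argument, applied to the columns instead of the rows
cols = frame.reindex(["California", "Ohio"], axis="columns")
```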


### **Drop Function**

In [18]:
obj = pd.DataFrame(np.arange(6),
                   index=['a','b','c','d','e','f'],
                   columns=['c1'])
obj2 = obj.drop(index=['b','c'])
obj2

| | c1 |
|---|---|
| a | 0 |
| d | 3 |
| e | 4 |
| f | 5 |
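`drop` removes columns just as easily, either with the `columns=` keyword or with `axis="columns"`, and it works the same way on a Series. Like `reindex`, it returns a new object. A sketch with a hypothetical frame `grid`:

```python
import numpy as np
import pandas as pd

grid = pd.DataFrame(np.arange(8).reshape((2, 4)),
                    index=["r1", "r2"],
                    columns=["one", "two", "three", "four"])

no_two = grid.drop(columns=["two"])                    # keyword form
no_two_four = grid.drop(["two", "four"], axis="columns")  # axis form

s = pd.Series(np.arange(3), index=["a", "b", "c"])
s_dropped = s.drop("b")  # s itself still has three entries
```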


In [26]:
obj3 = pd.Series(np.arange(6),index=['a','b','c','d','e','f'])
obj3.loc['a':'d'] = 0
obj3

a    0
b    0
c    0
d    0
e    4
f    5
dtype: int32
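Note that `obj3.loc['a':'d'] = 0` changed four values: slicing with labels includes the end label, unlike positional slicing with `iloc`, which follows Python's half-open convention. A quick check:

```python
import numpy as np
import pandas as pd

obj3 = pd.Series(np.arange(6), index=["a", "b", "c", "d", "e", "f"])

by_label = obj3.loc["a":"d"]   # label slice: includes the end label "d"
by_position = obj3.iloc[0:3]   # positional slice: excludes position 3
```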

In [35]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"])
data

| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [32]:
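data2 = data.copy()   # work on a copy so `data` keeps its original values for the cells below
data2[data2 > 0] = 5  # Boolean mask: set every value greater than 0 to 5
data2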

| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 5 | 5 | 5 |
| Colorado | 5 | 5 | 5 | 5 |
| Utah | 5 | 5 | 5 | 5 |
| New York | 5 | 5 | 5 | 5 |
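`data > 0` is a Boolean DataFrame of the same shape, and assigning through it overwrites every `True` position in place. The same result can be produced without mutating anything via `DataFrame.mask(cond, other)`, which uses `other` wherever `cond` is `True`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

# where data > 0 use 5, elsewhere keep the original value; returns a new DataFrame
masked = data.mask(data > 0, 5)

print(data.loc["Colorado", "one"])  # still 4: data was not modified
```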


### **Selection on DataFrame with loc and iloc**
- Selecting a single row with `loc` returns a Series whose index contains the DataFrame’s column labels:

In [36]:
data.loc['Colorado']

one      4
two      5
three    6
four     7
Name: Colorado, dtype: int32
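The same row can be selected by position with `iloc`, and passing both a row and a column label (or position) returns a single scalar. A sketch on a fresh copy of `data`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

row_by_pos = data.iloc[1]               # same row as data.loc["Colorado"]
value = data.loc["Colorado", "three"]   # one cell by labels
value_pos = data.iloc[1, 2]             # the same cell by positions
```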

In [38]:
data.loc[['Colorado','Utah'],['two','three']]

| | two | three |
|---|---|---|
| Colorado | 5 | 6 |
| Utah | 9 | 10 |


In [39]:
data.loc[:'Utah',['two']]

| | two |
|---|---|
| Ohio | 1 |
| Colorado | 5 |
| Utah | 9 |


In [41]:
data.iloc[:, :3][data.three > 5]
# iloc keeps the first three columns by position,
# then the Boolean Series keeps the rows where column "three" is greater than 5

| | one | two | three |
|---|---|---|---|
| Colorado | 4 | 5 | 6 |
| Utah | 8 | 9 | 10 |
| New York | 12 | 13 | 14 |


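- Plain `[]` indexing is ambiguous when the index contains integers: `ser[-1]` could mean the label `-1` or the last position. With an integer index, pandas treats integer keys as labels; with a non-integer index it has historically fallen back to positions (a fallback that recent pandas versions deprecate). To avoid this ambiguity, prefer `loc` for labels and `iloc` for positions.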
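A minimal sketch of the integer-index pitfall, using a hypothetical Series `ser` with the default integer index `0, 1, 2`:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.0))  # default integer index 0, 1, 2

# with an integer index, ser[-1] is a LABEL lookup, and -1 is not a label
try:
    ser[-1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

last_value = ser.iloc[-1]  # unambiguous: position -1, the last element
first_value = ser.loc[0]   # unambiguous: the label 0
```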

### **Arithmetic and Data Alignment**

In [2]:
df = pd.DataFrame(np.arange(12.).reshape((4, 3)),
    columns=list("abc"),
    index=["Utah", "Ohio", "Texas", "Oregon"])
df

| | a | b | c |
|---|---|---|---|
| Utah | 0.0 | 1.0 | 2.0 |
| Ohio | 3.0 | 4.0 | 5.0 |
| Texas | 6.0 | 7.0 | 8.0 |
| Oregon | 9.0 | 10.0 | 11.0 |


By default, arithmetic between DataFrame and Series matches the index of the Series on the columns of the DataFrame, broadcasting down the rows:

In [49]:
series = df.iloc[0]
series

a    0.0
b    1.0
c    2.0
Name: Utah, dtype: float64

In [50]:
df - series

| | a | b | c |
|---|---|---|---|
| Utah | 0.0 | 0.0 | 0.0 |
| Ohio | 3.0 | 3.0 | 3.0 |
| Texas | 6.0 | 6.0 | 6.0 |
| Oregon | 9.0 | 9.0 | 9.0 |


In [4]:
series = pd.Series(np.arange(3),index=['a','c','d'])
df + series

| | a | b | c | d |
|---|---|---|---|---|
| Utah | 0.0 | NaN | 3.0 | NaN |
| Ohio | 3.0 | NaN | 6.0 | NaN |
| Texas | 6.0 | NaN | 9.0 | NaN |
| Oregon | 9.0 | NaN | 12.0 | NaN |


To broadcast over the columns instead, matching on the rows, use an arithmetic method such as `sub` and pass `axis="index"` (or `axis=0`) so the Series is matched against the DataFrame's index.
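A sketch of that row-wise matching: subtracting column `b` from every column. For contrast, the plain `-` operator tries to match the Series' state-name index against the column labels `a`, `b`, `c`, finds no overlap, and returns an all-`NaN` frame with the union of both label sets as columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                  columns=list("abc"),
                  index=["Utah", "Ohio", "Texas", "Oregon"])

col = df["b"]  # a Series indexed by the DataFrame's row labels

# axis="index": align col with the rows, then broadcast across the columns
result = df.sub(col, axis="index")

# default behavior: col's index is matched against the columns, so nothing lines up
mismatched = df - col
```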

### **Function Application and Mapping**

In [5]:
def f1(x):
    # range (max - min) of a Series; apply() calls this once per column by default
    return x.max() - x.min()

In [10]:
frame = pd.DataFrame(np.random.standard_normal((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"])
frame

| | b | d | e |
|---|---|---|---|
| Utah | 2.452993 | 0.499371 | 0.468286 |
| Ohio | -0.886893 | 0.782552 | -1.557506 |
| Texas | 1.036456 | 0.220436 | 1.264421 |
| Oregon | -0.883887 | -0.366489 | 0.686345 |


In [11]:
np.abs(frame)

| | b | d | e |
|---|---|---|---|
| Utah | 2.452993 | 0.499371 | 0.468286 |
| Ohio | 0.886893 | 0.782552 | 1.557506 |
| Texas | 1.036456 | 0.220436 | 1.264421 |
| Oregon | 0.883887 | 0.366489 | 0.686345 |


In [12]:
frame.apply(f1)

b    3.339886
d    1.149041
e    2.821927
dtype: float64
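Passing `axis="columns"` makes `apply` call the function once per row instead, and a function that returns a Series produces a DataFrame. For element-wise formatting, `Series.map` applies a function to each value. The sketch below assumes a seeded generator (`default_rng(12345)`, an illustrative choice) in place of the unseeded data above, so its numbers differ from the tables shown; `f2` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)  # assumed seed, for reproducible data
frame = pd.DataFrame(rng.standard_normal((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

def f1(x):
    return x.max() - x.min()

# one call per ROW: the result is indexed by state
row_ranges = frame.apply(f1, axis="columns")

def f2(x):
    # hypothetical helper: returning a Series makes apply() build a DataFrame
    return pd.Series([x.min(), x.max()], index=["min", "max"])

summary = frame.apply(f2)  # rows "min"/"max", one column per original column

# element-wise: format each value of column "e" with two decimals
formatted_e = frame["e"].map(lambda x: f"{x:.2f}")
```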