## <mark> Sorting
    
    df.sort_values(
        by,
        axis: 'Axis' = 0,
        ascending=True,
        inplace: 'bool' = False,
        kind: 'str' = 'quicksort',
        na_position: 'str' = 'last',
        ignore_index: 'bool' = False,
        key: 'ValueKeyFunc' = None,
    )

In [1]:
import pandas as pd

df = pd.read_csv('diabetes.csv')  # read_csv already returns a DataFrame

In [2]:
# BMI descending, then Age ascending within equal BMI values
df.sort_values(by=['BMI', 'Age'], ascending=[False, True], inplace=True)

In [25]:
df["Glucose"].sort_values()

342      0
182      0
349      0
75       0
502      0
      ... 
228    197
408    197
579    197
561    198
661    199
Name: Glucose, Length: 768, dtype: int64
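
The remaining `sort_values` parameters (`na_position`, `ignore_index`, `key`) are not exercised above. A minimal sketch follows, assuming a small made-up frame (`toy` and its `Name` column are hypothetical and not part of `diabetes.csv`):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({
    "Name": ["bob", "Alice", "carol", "Dave"],
    "BMI": [31.2, None, 24.5, 31.2],
    "Age": [45, 33, 29, 38],
})

# BMI descending, Age ascending; rows with a missing BMI are placed first,
# and ignore_index=True renumbers the result 0..n-1.
by_bmi = toy.sort_values(by=["BMI", "Age"], ascending=[False, True],
                         na_position="first", ignore_index=True)

# key receives the whole column as a Series and must return a Series of the
# same length; here it makes the sort case-insensitive.
by_name = toy.sort_values(by="Name", key=lambda col: col.str.lower())

print(by_bmi)
print(by_name)
```

Without `key`, uppercase names would sort before lowercase ones.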

### <mark> Filtering DataFrame

#### LOOKUP LOGIC - loc, iloc, at

`loc`, `iloc`, `at` and `iat` are indexers **used for slicing** data out of a Pandas DataFrame. They select rows and columns by label or by position, and `loc` and `iloc` can also **filter the data according to a condition.**

loc[] : Access a group of rows and columns by label(s).

    loc[] is a label-based indexer, so we pass the labels of the rows or columns we want to select, in square brackets rather than parentheses.
    A label slice includes its last element, unlike iloc[].
    loc[] also accepts a boolean array or a boolean Series aligned on the index.

iloc[] : Access a group of rows and columns by integer position(s).

    iloc[] is a position-based indexer, so we pass integer positions to select specific rows/columns.
    A position slice excludes its last element, unlike loc[].
    iloc[] accepts a boolean array or list, but not a boolean Series; convert the mask with .to_numpy() first.
    
df.at[] : Access a single value for a row/column pair by label.
    
    Similar to loc[], in that both provide label-based lookups. Use at[] if you only need to get or set a single value in a DataFrame or Series.
    
df.iat[] : Access a single value for a row/column pair by integer position.
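
The differences between the four indexers are easier to see on a tiny frame. This is only a sketch: `toy` and its string index are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with string labels so labels and positions differ.
toy = pd.DataFrame(
    {"BMI": [33.6, 26.6, 23.3, 28.1], "Age": [50, 31, 32, 21]},
    index=["a", "b", "c", "d"],
)

by_label = toy.loc["a":"c", "BMI"]   # label slice includes "c": 3 rows
by_position = toy.iloc[0:2, 0]       # position slice excludes 2: 2 rows

mask = toy["Age"] > 30               # boolean Series aligned on the index
older = toy.loc[mask, ["BMI", "Age"]]
older_by_pos = toy.iloc[mask.to_numpy()]  # iloc wants a plain boolean array

single = toy.at["b", "Age"]          # one value, looked up by label
same_single = toy.iat[1, 1]          # the same value, looked up by position

print(older)
```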

### <mark> Handling Missing data
    
    - df.dropna(how='any', subset=['col1', 'col2'], inplace=True)
    - df.fillna(df['col1'].mode()[0], inplace=True)
    - df.drop_duplicates(inplace=True)
    - df.duplicated()

In [None]:
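df.dropna(
    axis=0,                 # 0 or 'index' drops rows; 1 or 'columns' drops columns
    how='any',              # 'any' or 'all'
    # thresh=None,          # require that many non-NA values; cannot be combined with how
    subset=['Age', 'BMI'],  # only look for missing values in these columns
    inplace=True,
)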

In [None]:
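# drop rows where Age is missing, operating on df itself rather than on a temporary column
df.dropna(subset=["Age"], inplace=True)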

In [None]:
df.fillna(
    value=df["Age"].mean(),  # scalar, dict, Series, or DataFrame
    # method=None,           # 'backfill', 'bfill', 'pad' or 'ffill'; cannot be combined with value
    axis=None,
    inplace=True,
    limit=None,
    # downcast=None,
)

In [21]:
# assign the result back instead of calling inplace on a selected column
df["Age"] = df["Age"].fillna(df["Age"].mean())

In [19]:
df.duplicated()

177    False
445    False
673    False
125    False
120    False
       ...  
426    False
522    False
706    False
9      False
684    False
Length: 768, dtype: bool

In [20]:
df.drop_duplicates(inplace=True)

In [38]:
df['Age'].isna()

177    False
445    False
673    False
125    False
120    False
       ...  
426    False
522    False
706    False
9      False
684    False
Name: Age, Length: 768, dtype: bool
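
The outputs above contain no missing values, so the effect of these calls is hard to see. A small sketch on made-up data (the frame `toy` and its `City` column are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in every column.
toy = pd.DataFrame({
    "Age": [50.0, np.nan, 32.0, 32.0, np.nan],
    "BMI": [33.6, 26.6, np.nan, np.nan, np.nan],
    "City": ["Pune", "Pune", None, None, None],
})

n_missing = toy.isna().sum()            # missing values per column
no_empty_rows = toy.dropna(how="all")   # drop rows where every value is missing
at_least_two = toy.dropna(thresh=2)     # keep rows with at least 2 non-missing values

filled = toy.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
# mode() returns a Series, so take its first entry to get a scalar.
filled["City"] = filled["City"].fillna(filled["City"].mode()[0])

# Rows 2 and 3 are now identical (NaN counts as equal here), so one is dropped.
deduped = filled.drop_duplicates()

print(n_missing)
print(deduped)
```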

### <mark> Indexing and Multi-indexing

In [28]:
df.index

Int64Index([177, 445, 673, 125, 120, 303, 247, 193, 155,  99,
            ...
            145, 371,  81, 494,  49, 426, 522, 706,   9, 684],
           dtype='int64', length=768)

In [27]:
new_df = df.sort_index(axis=0, ascending=False)

In [32]:
new_df.set_index('BMI', inplace=True)
new_df.index

Float64Index([30.4, 30.1, 26.2, 36.8, 32.9, 22.5, 44.0, 28.4, 35.5, 37.5,
              ...
               0.0, 30.5, 35.3, 31.0, 25.6, 43.1, 28.1, 23.3, 26.6, 33.6],
             dtype='float64', name='BMI', length=768)

In [35]:
new_df.reset_index(inplace=True)
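
The cells above only use a single-level index. For the multi-index case, one possible sketch passes a list of columns to `set_index`; the frame `toy` and the `AgeGroup` column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame; AgeGroup is an invented column.
toy = pd.DataFrame({
    "Outcome": [1, 0, 1, 0],
    "AgeGroup": ["30s", "30s", "40s", "40s"],
    "BMI": [33.6, 26.6, 23.3, 28.1],
})

# A list of columns produces a two-level MultiIndex; sorting it keeps lookups fast.
multi = toy.set_index(["Outcome", "AgeGroup"]).sort_index()

n_levels = multi.index.nlevels            # number of index levels
one_value = multi.loc[(1, "40s"), "BMI"]  # full key as a tuple: a single value
outcome_1 = multi.loc[1]                  # partial key: all rows with Outcome == 1
flat = multi.reset_index()                # index levels become ordinary columns again

print(multi)
```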

### <mark> FUNCTIONS: apply, map, applymap, replace

**apply - Apply a function along an axis of the DataFrame.**

    df['Location'].apply(len)
    df['Location'].apply(lower_location_function)
    df['Location'].apply(lambda x: x.lower())
    df.apply(lambda x: x.min())

**map - Works only with Series.**

    df['gender'].map({"M":'Male',"F":'Female'})

**applymap - Works only with DataFrames. Applies a function to every element of the DataFrame.**

    df.applymap(len)
    df.applymap(str.lower)
    df.applymap(lower_string_function)

**replace() - To replace values of series/df by specified values**
    
    s.replace(1, 5)
    s.replace([3, 4], [3333, 4444])
    df['name'].replace({'Lea':'Lea_andre','Kevin':'Kevin_Peterson'})
    df.replace(regex='^l', value='LONDON')
    df.replace({'name': 'Patrick', 'Location': 'london'}, None)

In [39]:
import pandas as pd

df = pd.read_csv('weather.csv')  # read_csv already returns a DataFrame

In [40]:
data = {'name': ['Patrick', 'Lea', 'Kevin'],
       'Location': ['London', 'NewYork', 'Berlin'],
       'gender': ['M', 'F', 'M']
       }
df = pd.DataFrame(data)
df

      name Location gender
0  Patrick   London      M
1      Lea  NewYork      F
2    Kevin   Berlin      M


In [41]:
df.sample()

    name Location gender
2  Kevin   Berlin      M


### <mark> Apply
    
    df.apply(
        func: 'AggFuncType',
        axis: 'Axis' = 0,
    )

In [42]:
df["Location"].apply(len)

0    6
1    7
2    6
Name: Location, dtype: int64

In [43]:
def update_location(locations):
    return locations.upper()

df['Location'] = df['Location'].apply(update_location)
df

      name Location gender
0  Patrick   LONDON      M
1      Lea  NEWYORK      F
2    Kevin   BERLIN      M


In [44]:
df['Location'] = df['Location'].apply(lambda x: x.lower())
df

      name Location gender
0  Patrick   london      M
1      Lea  newyork      F
2    Kevin   berlin      M


In [45]:
df['Location'].apply(len)

0    6
1    7
2    6
Name: Location, dtype: int64

In [46]:
df.apply(len)

name        3
Location    3
gender      3
dtype: int64

In [47]:
df.apply(len, axis='columns')

0    3
1    3
2    3
dtype: int64

In [48]:
df.apply(lambda x: x.min())

name         Kevin
Location    berlin
gender           F
dtype: object
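
What the function receives depends on `axis`: a whole column by default, a whole row with `axis=1`. A short sketch using the same three-person data (the name `people` is chosen here to avoid clobbering `df`):

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Patrick", "Lea", "Kevin"],
    "Location": ["London", "NewYork", "Berlin"],
    "gender": ["M", "F", "M"],
})

# axis=0 (default): the function is called once per column with a Series.
longest_per_column = people.apply(lambda col: col.str.len().max())

# axis=1: the function is called once per row; the row is a Series
# indexed by column name.
labels = people.apply(lambda row: f"{row['name']} ({row['Location']})", axis=1)

print(longest_per_column)
print(labels)
```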

### <mark> applymap
    
Apply a function to a DataFrame elementwise.

This method applies a function that accepts and returns a scalar
to every element of a DataFrame.
    
    df.applymap(
        func,
        na_action=None,  # None or 'ignore'; 'ignore' leaves NaN as is without passing it to func
    )

In [49]:
df.applymap(len)

   name  Location  gender
0     7         6       1
1     3         7       1
2     5         6       1


In [50]:
df.applymap(str.lower)

      name Location gender
0  patrick   london      m
1      lea  newyork      f
2    kevin   berlin      m
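
Note that `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`, which does the same thing. One way to write code that runs on either side of that change is sketched below; the `hasattr` check is just one possible approach, and `people` is a stand-in frame.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Patrick", "Lea", "Kevin"],
    "Location": ["London", "NewYork", "Berlin"],
})

# Use whichever elementwise method this pandas version provides.
if hasattr(pd.DataFrame, "map"):
    lengths = people.map(len)        # pandas 2.1 and later
else:
    lengths = people.applymap(len)   # older pandas

print(lengths)
```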


### <mark> map

Works only with Series. Values that are missing from the mapping become NaN.

In [51]:
df['name'].map({'Lea':'lea_poche','Kevin':'kevin_anderson'},
              na_action=None)

0               NaN
1         lea_poche
2    kevin_anderson
Name: name, dtype: object

In [52]:
df['gender'].map({"M":'Male',"F":'Female'})

0      Male
1    Female
2      Male
Name: gender, dtype: object
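
`map` also accepts a function, and `na_action='ignore'` keeps missing values away from it. A sketch on a made-up Series (the extra `"X"` value is hypothetical, added to show an unmapped key):

```python
import numpy as np
import pandas as pd

gender = pd.Series(["M", "F", np.nan, "X"])
mapping = {"M": "Male", "F": "Female"}

# Keys absent from the dict ("X") and missing values both come out as NaN.
mapped = gender.map(mapping)

# Fall back to the original value where the mapping had no entry.
kept = gender.map(mapping).fillna(gender)

# With na_action="ignore" the NaN is passed through instead of being
# handed to str.lower, which would fail on a float.
lowered = gender.map(str.lower, na_action="ignore")

print(mapped)
print(kept)
print(lowered)
```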

### <mark> replace
    
Works with both Series and DataFrames.
    
Replace values given in `to_replace` with `value`.

Values of the DataFrame are replaced with other values dynamically.

This differs from updating with ``.loc`` or ``.iloc``, which require
you to specify a location to update with some value.

In [53]:
df['name'].replace({'Lea':'Lea_andre','Kevin':'Kevin_Peterson'},
                  inplace=True)

In [54]:
df.replace({'newyork':'NYC', 'F':'Female'})

             name Location  gender
0         Patrick   london       M
1       Lea_andre      NYC  Female
2  Kevin_Peterson   berlin       M


In [55]:
# regex replace: only the matched text (the leading 'l') is replaced
df.replace(regex='^l', value='LONDON')

             name     Location gender
0         Patrick  LONDONondon      M
1       Lea_andre      newyork      F
2  Kevin_Peterson       berlin      M
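
The `LONDONondon` result comes from the pattern `^l` matching a single character. If the intent is to replace the whole value, the pattern has to match the whole value. A sketch, assuming a fresh copy of the data named `places` (a hypothetical name):

```python
import pandas as pd

places = pd.DataFrame({
    "name": ["Patrick", "Lea", "Kevin"],
    "Location": ["london", "newyork", "berlin"],
})

# '^l' matches only the first character, so only that character is swapped.
prefix_only = places.replace(regex=r"^l", value="LONDON")

# '^l.*$' matches the entire string, so the entire value is replaced.
whole_value = places.replace(regex=r"^l.*$", value="LONDON")

# Dicts keyed by column name restrict the regex to that column.
one_column = places.replace({"Location": r"^new"}, {"Location": "New "}, regex=True)

print(prefix_only)
print(whole_value)
print(one_column)
```

`Lea` is untouched in every case because the pattern is case-sensitive.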


In [56]:
# replace values from select columns
df.replace({'name': 'Patrick', 'Location': 'london'}, None)

             name Location gender
0                               M
1       Lea_andre  newyork      F
2  Kevin_Peterson   berlin      M


In [57]:
s = pd.Series([1, 2, 3, 4, 5])
s.replace(1, 5)

0    5
1    2
2    3
3    4
4    5
dtype: int64

In [58]:
# list like replace
s.replace([3, 4], [3333, 4444])

0       1
1       2
2    3333
3    4444
4       5
dtype: int64