.values: A two-dimensional NumPy array of values.

.columns: An index of columns: the column names.

.index: An index for the rows: either row numbers or row names.
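
A quick sketch of the three attributes on a tiny made-up DataFrame (the data and the row labels 'a' and 'b' are assumptions for illustration only):

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame(
    {"name": ["Andi", "Budi"], "berat": [62, 58]},
    index=["a", "b"],
)

values = df.values    # 2-D NumPy array holding the cell values
columns = df.columns  # Index of column names
index = df.index      # Index of row labels

print(values.shape)
print(list(columns))
print(list(index))
```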

### _Sorting Values_

one column : .sort_values('COL_Name')
    
multiple columns : .sort_values(['COL_Name', 'COL_Name'])
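
A minimal sketch of both forms, assuming a made-up DataFrame with 'warna' and 'berat' columns. The ascending list is an optional extra that sets the direction per column:

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "warna": ["merah", "biru", "merah", "biru"],
    "berat": [60, 72, 55, 68],
})

# Sort by one column (ascending by default)
by_weight = df.sort_values("berat")

# Sort by color A-Z, then by weight from heaviest to lightest
by_color_then_weight = df.sort_values(["warna", "berat"], ascending=[True, False])

print(by_weight)
print(by_color_then_weight)
```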

### _Subsetting Columns_

To select one column : df['COL_Name']

To select multiple columns : df[['COL_Name', 'COL_Name']]
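
A short sketch on made-up data. Worth noticing: single brackets return a Series, double brackets return a DataFrame:

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "name": ["Andi", "Budi"],
    "berat": [62, 58],
    "warna": ["merah", "biru"],
})

one = df["berat"]             # single brackets -> Series
many = df[["name", "berat"]]  # list of names inside brackets -> DataFrame

print(type(one).__name__)
print(list(many.columns))
```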

### _Subsetting Rows_

Also known as filtering rows or selecting rows.

filter on one condition : df[df['berat'] > 60]

filter on multiple conditions : df[(df["berat"] > 60) & (df["warna"] == "merah")]
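
A minimal sketch of both filters on made-up data. Each condition needs its own parentheses because & binds more tightly than the comparison operators:

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "name": ["Andi", "Budi", "Citra"],
    "berat": [62, 58, 71],
    "warna": ["merah", "merah", "biru"],
})

# The comparison produces a boolean Series, which is then used as a row mask
heavy = df[df["berat"] > 60]

# Combine conditions with & (and) or | (or)
heavy_red = df[(df["berat"] > 60) & (df["warna"] == "merah")]

print(heavy)
print(heavy_red)
```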

#### _Subsetting Rows by categorical variables_

colors = ["brown", "black", "tan"]
condition = dogs["color"].isin(colors)
dogs[condition]

canu = ["California", "Arizona", "Nevada", "Utah"]
mojave_homelessness = homelessness[homelessness["state"].isin(canu)]

#### Import NumPy and create custom IQR function
import numpy as np
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

#### Update to print IQR and median of temperature_c, fuel_price_usd_per_l, & unemployment
print(sales[["temperature_c", "fuel_price_usd_per_l", "unemployment"]].agg([iqr, np.median]))
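
A runnable sketch of the same idea, assuming a small made-up sales table (the numbers are invented). It passes the string "median" instead of np.median; both pick the median, and the string form avoids needing NumPy here:

```python
import pandas as pd

def iqr(column):
    # Interquartile range: 75th percentile minus 25th percentile
    return column.quantile(0.75) - column.quantile(0.25)

# Made-up sample data, for illustration only
sales = pd.DataFrame({
    "temperature_c": [10.0, 12.0, 14.0, 16.0, 18.0],
    "unemployment": [8.0, 8.0, 9.0, 7.0, 8.0],
})

# One row per statistic, one column per variable
summary = sales[["temperature_c", "unemployment"]].agg([iqr, "median"])
print(summary)
```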

### _Cumulative_

.cumsum() : running total of a column, accumulated row by row

.cummax() : running maximum

.cummin() : running minimum

.cumprod() : running product
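
A small sketch of the four cumulative methods on one made-up column (the column name 'sales' is an assumption):

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({"sales": [3, 1, 4, 2]})

running_total = df["sales"].cumsum()
running_max = df["sales"].cummax()
running_min = df["sales"].cummin()
running_product = df["sales"].cumprod()

print(running_total.tolist())
print(running_max.tolist())
print(running_min.tolist())
print(running_product.tolist())
```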

### _Dropping Duplicate Data_

df.drop_duplicates(subset='Col_Name') : drop rows whose value in the chosen column repeats an earlier row (the first occurrence is kept)
1|Andi|
2|Budi|
3|Andi| -> this one will be deleted

df.drop_duplicates(subset=['Col_Name', 'Col_Name']) : drop rows only when the values in all listed columns repeat an earlier row
1|Andi|Anjing
2|Budi|Kucing
3|Andi|Kucing -> this one will not be deleted because the pair of values is new
4|Budi|Kucing -> this one will be deleted
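
The two tables above can be reproduced with a sketch like this (the column names 'name' and 'pet' are assumptions):

```python
import pandas as pd

# Same rows as the second table above; column names are made up
df = pd.DataFrame({
    "name": ["Andi", "Budi", "Andi", "Budi"],
    "pet": ["Anjing", "Kucing", "Kucing", "Kucing"],
})

# Deduplicate on one column: keeps the first Andi and the first Budi
by_name = df.drop_duplicates(subset="name")

# Deduplicate on the pair: only the repeated (Budi, Kucing) row is dropped
by_pair = df.drop_duplicates(subset=["name", "pet"])

print(by_name)
print(by_pair)
```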

### _Value Count_
df['Col_Name'].value_counts()

df['Col_Name'].value_counts(sort=True) -> most frequent first (this is the default)

df['Col_Name'].value_counts(normalize=True) -> as proportions (fractions that sum to 1) instead of counts
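
A minimal sketch on a made-up column, showing counts next to normalized output:

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({"warna": ["merah", "biru", "merah", "merah"]})

counts = df["warna"].value_counts()                # raw counts, largest first
shares = df["warna"].value_counts(normalize=True)  # fractions of the total

print(counts)
print(shares)
```

Multiply the normalized result by 100 to get percentages.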

### _Grouped Summaries_

df.groupby('color')['berat'].mean() : group the rows by 'color', then compute the mean of 'berat' within each group

df.groupby('color')['berat'].agg(['min', 'max', 'sum'])

df.groupby(['color', 'breed'])['berat'].mean()

df.groupby(['color', 'breed'])[['berat', 'tinggi']].mean()
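
A sketch of the grouped summaries on a made-up dogs table (all values are invented). The aggregations are passed as strings here, which is one accepted way to name them:

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "color": ["brown", "brown", "black", "black"],
    "breed": ["Poodle", "Chihuahua", "Poodle", "Poodle"],
    "berat": [20, 4, 24, 26],
    "tinggi": [50, 20, 56, 60],
})

# Mean weight per color
mean_by_color = df.groupby("color")["berat"].mean()

# Several statistics per color at once
stats = df.groupby("color")["berat"].agg(["min", "max", "sum"])

# Group by two columns and summarize two columns
mean_by_pair = df.groupby(["color", "breed"])[["berat", "tinggi"]].mean()

print(mean_by_color)
print(stats)
print(mean_by_pair)
```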

### _Group by Pivot Tables_
dogs.pivot_table(values='berat', index='color', aggfunc=[np.mean, np.median])

#### _Pivot 2 variables_
dogs.pivot_table(values='berat', index='color', columns='breed')

dogs.pivot_table(values='berat', index='color', columns='breed', fill_value=0) : fill_value=0 replaces any NaN (a color/breed combination with no data) with 0

dogs.pivot_table(values='berat', index='color', columns='breed', fill_value=0, margins=True) : margins=True adds an 'All' row and column holding the summary over all the data
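
A small sketch of the two-variable pivot on made-up data. The default aggregation is the mean, and there is no black Chihuahua in this sample, so that cell is where fill_value applies:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "color": ["brown", "brown", "black", "black"],
    "breed": ["Poodle", "Chihuahua", "Poodle", "Poodle"],
    "berat": [20, 4, 24, 26],
})

pivot = dogs.pivot_table(
    values="berat",
    index="color",
    columns="breed",
    fill_value=0,   # empty color/breed cells become 0 instead of NaN
    margins=True,   # adds the 'All' row and column
)

print(pivot)
```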

### _Set Index_
dogs.set_index('breed')

### _Remove Index_
dogs.reset_index()
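
A minimal round trip on made-up data. Both methods return a new DataFrame, so the result has to be assigned. The drop=True variant is an extra shown for comparison: it discards the index instead of moving it back to a column:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "name": ["Andi", "Budi"],
    "breed": ["Poodle", "Chihuahua"],
})

indexed = dogs.set_index("breed")         # 'breed' moves from a column to the index
restored = indexed.reset_index()          # 'breed' comes back as the first column
dropped = indexed.reset_index(drop=True)  # 'breed' is thrown away

print(indexed)
print(restored)
print(dropped)
```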

### _Subsetting with index_
dogs[dogs['name'].isin(['Andi', 'Budi'])]

dogs.loc[['Andi', 'Budi']] -> set 'name' as index first
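
A sketch comparing the two approaches on made-up data. Both select the same rows; the .loc version is shorter once 'name' is the index:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "name": ["Andi", "Budi", "Citra"],
    "berat": [62, 58, 71],
})

# Without an index: build a boolean mask with isin
via_isin = dogs[dogs["name"].isin(["Andi", "Budi"])]

# With 'name' as the index: look the labels up directly
via_loc = dogs.set_index("name").loc[["Andi", "Budi"]]

print(via_isin)
print(via_loc)
```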

### _Sort Index_
dogs.sort_index(level='Index_Name')

dogs.sort_index(level=['Index_Name', 'Index_Name'], ascending=[True, False])
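
A sketch of sorting a two-level index, assuming made-up data with 'breed' as the outer level and 'color' as the inner level:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "breed": ["Poodle", "Chihuahua", "Poodle", "Chihuahua"],
    "color": ["black", "tan", "brown", "brown"],
    "berat": [24, 4, 20, 3],
}).set_index(["breed", "color"])

# Default: every level ascending, outer level first
sorted_all = dogs.sort_index()

# Breed A-Z, and within each breed color Z-A
sorted_mixed = dogs.sort_index(level=["breed", "color"], ascending=[True, False])

print(sorted_all)
print(sorted_mixed)
```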

### _Slicing Index_
#### _Slicing the outer index_
dogs.loc['Chihuahua':'Poodle'] -> get all rows from index 'Chihuahua' through 'Poodle' (both ends are included; sort the index first)

#### _Slicing the inner index_
dogs.loc[('Chihuahua', 'brown'):('Poodle', 'black')]

### _Slicing Column_
dogs.loc[:, 'Col_Name':'Col_Name']

### _Slice Twice_
dogs.loc[('Chihuahua', 'brown'):('Poodle', 'black'), 'Col_Name':'Col_Name']
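
A sketch of the four slicing patterns on a made-up table with a sorted (breed, color) index. Unlike ordinary Python slices, .loc slices include the end label:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "name": ["Andi", "Budi", "Citra", "Dewi", "Eka"],
    "breed": ["Poodle", "Chihuahua", "Labrador", "Chihuahua", "Poodle"],
    "color": ["brown", "tan", "black", "brown", "black"],
    "berat": [20, 4, 30, 3, 24],
    "tinggi": [50, 20, 58, 18, 56],
}).set_index(["breed", "color"]).sort_index()  # label slicing needs a sorted index

# Outer level only: every Chihuahua and Labrador row
outer = dogs.loc["Chihuahua":"Labrador"]

# Inner level: pass (outer, inner) tuples as the start and end
inner = dogs.loc[("Chihuahua", "tan"):("Poodle", "black")]

# Columns: slice by column name, keeping all rows
cols = dogs.loc[:, "berat":"tinggi"]

# Rows and columns in one call
both = dogs.loc[("Chihuahua", "tan"):("Poodle", "black"), "berat":"tinggi"]

print(outer)
print(inner)
print(cols)
print(both)
```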

### _Set Date as Index_
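dogs = dogs.set_index('date of birth').sort_index()
dogs.loc['2014-10-10':'2018-10-10']

dogs.loc['2014':'2016'] -> partial dates are passed as strings: all rows from the start of 2014 through the end of 2016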
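
A sketch of date slicing on made-up birth dates. It assumes the column has been converted with pd.to_datetime, which is what makes year-only strings work as slice bounds:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "name": ["Andi", "Budi", "Citra", "Dewi"],
    "date of birth": pd.to_datetime(
        ["2013-07-01", "2014-12-25", "2016-03-03", "2019-01-15"]
    ),
})

# set_index returns a new DataFrame, so assign the result
dogs = dogs.set_index("date of birth").sort_index()

# Full dates as bounds (both ends included)
exact = dogs.loc["2014-10-10":"2018-10-10"]

# Year-only bounds: start of 2014 through end of 2016
by_year = dogs.loc["2014":"2016"]

print(exact)
print(by_year)
```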

### _Subsetting by Row/Column Number_
dogs.iloc[1:10, 2:4]
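
A small sketch of position-based selection on made-up data. Unlike .loc, .iloc follows normal Python slicing: positions start at 0 and the end position is excluded:

```python
import pandas as pd

# Made-up sample data, for illustration only
dogs = pd.DataFrame({
    "name": ["Andi", "Budi", "Citra", "Dewi"],
    "berat": [62, 58, 71, 66],
    "tinggi": [150, 148, 160, 155],
})

# Rows at positions 1 and 2, columns at positions 0 and 1
subset = dogs.iloc[1:3, 0:2]

print(subset)
```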

### _Detecting Missing Value_

df.isna()

df.isna().any()

df.isna().sum()

### _Replace Missing Value_

df.fillna(0)
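
A sketch tying the missing-value methods together on made-up data with a few gaps:

```python
import pandas as pd

# Made-up sample data, for illustration only; NaN marks a missing value
nan = float("nan")
df = pd.DataFrame({
    "name": ["Andi", "Budi", "Citra"],
    "berat": [62.0, nan, 71.0],
    "tinggi": [nan, nan, 160.0],
})

mask = df.isna()               # True/False for every cell
any_missing = df.isna().any()  # per column: is anything missing?
n_missing = df.isna().sum()    # per column: how many values are missing?
filled = df.fillna(0)          # new DataFrame with every NaN replaced by 0

print(any_missing)
print(n_missing)
print(filled)
```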