# Pandas Operations (Calculations)

  - This is not an exhaustive tutorial on Pandas operations; it presents the most common ones.

### Setting up the environment

In [59]:
import pandas as pd
import numpy as np
from random import choices, seed, sample, shuffle
from string import ascii_letters, ascii_uppercase

#### Note: 

I will write __df.some_method_name__, where __df__ is a __pandas DataFrame object__, to explain methods. The documentation lists the same methods as __pd.DataFrame.some_method_name__ (or __pd.Series.some_method_name__ for Series methods).

### DataFrame Example: 

I use two functions from the random module to create an example DataFrame to work with: __choices__, which samples with replacement and can therefore return duplicate numbers, and __sample__, which samples without replacement and returns unique numbers.
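
The difference between the two functions can be checked in isolation. The following is a minimal sketch; the seed value and the variable names are arbitrary choices made for illustration, not part of the original notebook.

```python
from random import choices, sample, seed

seed(0)  # arbitrary seed, used only to make the draw repeatable

population = range(20, 30)

# choices draws WITH replacement, so a value may appear more than once
with_replacement = choices(population, k=6)

# sample draws WITHOUT replacement, so all six values are distinct
without_replacement = sample(population, k=6)

print(with_replacement)
print(without_replacement)
print(len(set(without_replacement)) == 6)  # True for any seed
```

This is why `VAR_2` (built with `choices`) can contain a repeated value while `VAR_3` (built with `sample`) cannot.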

In [60]:
seed(10111)
df = pd.DataFrame({'VAR_1': range(1, 7),
                  'VAR_2': choices(range(20, 30), k = 6),
                  'VAR_3': sample(range(10, 200, 14), k = 6)})
df

Unnamed: 0,VAR_1,VAR_2,VAR_3
0,1,29,178
1,2,22,150
2,3,26,38
3,4,29,136
4,5,25,10
5,6,27,24


### The Number of Non Missing Rows or Columns

   - Checking for missing data points is one of the very first steps after importing the data, because we need to know whether the data is complete. The __count__ method reports how many values are present.
   
#### count method

 - We can count the non-missing values for each column (the default) or for each row using the __df.count()__ method.
 
```python
# Counts are generated for each column (default)
df.count(axis = 0)

# Counts are generated for each row
df.count(axis = 1)

# Check the docs (IPython/Jupyter help syntax)
df.count?
```

### Count method example

1. The number of values in each column. 

In [61]:
df.count()

VAR_1    6
VAR_2    6
VAR_3    6
dtype: int64

This means six values for each variable

2. The number of values in each row 

In [62]:
df.count(axis = 1)

0    3
1    3
2    3
3    3
4    3
5    3
dtype: int64

**All rows have three values, so there are no missing values**
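
Because the example DataFrame is complete, the counts above are all equal. A small sketch with a made-up frame (the name `gaps` and its values are assumptions for illustration) shows how `count` reacts to missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing entries
gaps = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                     'B': [4.0, 5.0, np.nan],
                     'C': [7.0, 8.0, 9.0]})

per_column = gaps.count()               # non-missing values in each column
per_row = gaps.count(axis=1)            # non-missing values in each row
missing_per_column = gaps.isna().sum()  # the complementary view

print(per_column)
print(per_row)
print(missing_per_column)
```

Columns `A` and `B` each report 2 values instead of 3, which signals one missing entry in each.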

### Uniques and Duplicates

#### 1. Unique Numbers 

  - In data analysis, we often need to know which distinct values a column contains. We check this with the __unique__ method.
  
  
  - The __unique__ method returns an array with only the **unique values**, in order of first appearance.
  
  
  - The __unique__ method is a Series method, so it works on one variable (column) of a DataFrame at a time.
  
Here is the syntax:

```python

   df['VAR'].unique() 
    
# For the documentation 
pd.unique? 
```

### Unique Values Example

In [63]:
df['VAR_2'].unique()

array([29, 22, 26, 25, 27])

### Number of Unique values

   - Knowing about the unique values can give us an idea about the variable as well. 
   
   - The number of unique values can be obtained with the built-in function __len__ or with the Pandas __nunique__ method (the latter is preferred).


Here is the syntax:
```python 
# Count the entries of the array returned by unique()
len(df['var'].unique())

# The better way is to use the nunique() method
df['var'].nunique()

# For the docs, run
pd.Series.nunique?
```

### Number of unique values example

In [64]:
df['VAR_2'].nunique()

5
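
The two approaches agree here, but they can differ when a column contains missing values: `unique` keeps `NaN` in its result, while `nunique` skips it unless `dropna=False` is passed. A short sketch with a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a repeated value and two missing entries
s = pd.Series([29, 22, np.nan, 29, np.nan])

n_from_len = len(s.unique())          # NaN is counted as one of the unique values
n_default = s.nunique()               # NaN is ignored by default
n_with_nan = s.nunique(dropna=False)  # NaN is counted again

print(n_from_len, n_default, n_with_nan)
```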

###  Value Counts or (Table)
  
   - Often we want a table of __unique values and how many times each has appeared__. Pandas has a method for this called __value_counts__ (pay attention to the "s" at the end).
   
   
   - __df['var'].value_counts()__ is similar to the __table function in the R__ language.
   
   
   - Count tables are convenient for __categorical or nominal variables__ but not for __numeric or continuous variables__.
   
Here is the syntax:
   
```python 
df['var'].value_counts()

# Check the docs using
pd.Series.value_counts?
```

### Value counts example

In [65]:
df['VAR_2'].value_counts()

29    2
22    1
26    1
25    1
27    1
Name: VAR_2, dtype: int64
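
Counts can also be expressed as proportions through the `normalize` parameter. The sketch below rebuilds the `VAR_2` values shown above as a standalone Series so that it runs on its own:

```python
import pandas as pd

# Same values as VAR_2 in the example DataFrame
var_2 = pd.Series([29, 22, 26, 29, 25, 27], name='VAR_2')

counts = var_2.value_counts()                # absolute frequencies
shares = var_2.value_counts(normalize=True)  # relative frequencies, summing to 1

print(counts.loc[29])  # 29 appears twice
print(shares.loc[29])  # 2 out of 6 values
```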

### Duplicate Values

  - We can check for complete duplicates (rows that repeat across all variables) using the __df.duplicated__ method.
  
  
- If nothing is passed to the method, all the variables are considered.


- This method returns a Series of booleans.


- By default (**keep='first'**), the __first occurrence is not considered a duplicate__.


- We can change the default behavior by passing **keep='last'**, which means the __last occurrence is not considered a duplicate__, or **keep=False**, which marks every occurrence of a repeated value as a duplicate.

Here is the syntax:

```python

df.duplicated()

# Check the docs
df.duplicated?
```

### Duplicated Values Example

In [66]:
df.duplicated()

0    False
1    False
2    False
3    False
4    False
5    False
dtype: bool

We see a Series of **Falses**, which means no row is a complete duplicate. 

We can check a single column for duplicates as well.

In [67]:
df['VAR_2'].duplicated()

0    False
1    False
2    False
3     True
4    False
5    False
Name: VAR_2, dtype: bool

We might want to know how many duplicates there are. Chaining the __sum()__ method gives us just that, because each True counts as 1.

In [68]:
df['VAR_2'].duplicated().sum()

1
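
The effect of the three `keep` settings can be compared side by side. This sketch uses a standalone copy of the `VAR_2` values, where 29 appears at positions 0 and 3:

```python
import pandas as pd

var_2 = pd.Series([29, 22, 26, 29, 25, 27])

mark_later = var_2.duplicated()               # keep='first': flags position 3
mark_earlier = var_2.duplicated(keep='last')  # flags position 0 instead
mark_all = var_2.duplicated(keep=False)       # flags both positions

print(pd.DataFrame({'first': mark_later,
                    'last': mark_earlier,
                    'False': mark_all}))
```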

### Drop Duplicate Values: 

  - Dropping complete duplicate rows is an essential data cleaning step, because we do not want redundant information.
  
  - To drop duplicates, use the __df.drop_duplicates__ method.
  
  - We have to decide which row to keep. To retain the first occurrence, set __keep = 'first'__ (the default); to retain the last occurrence of the duplicate row, set __keep = 'last'__.
  
  - Setting __keep = False__ drops every row that has a duplicate, so no copy is retained.
  
  - Dropping duplicates does not affect the original data unless we set __inplace=True__. (Before doing this, keep a copy of the original data.)
  
Here is the syntax:

```python
df.drop_duplicates()

# For one variable
df['VAR-Name'].drop_duplicates()

# For docs
df.drop_duplicates?
```

### Dropping the Duplicate Example

In [69]:
df.drop_duplicates()

Unnamed: 0,VAR_1,VAR_2,VAR_3
0,1,29,178
1,2,22,150
2,3,26,38
3,4,29,136
4,5,25,10
5,6,27,24


This DataFrame does not have any complete duplicates.

### Example of Dropping duplicates for one variable

In [70]:
df['VAR_2']

0    29
1    22
2    26
3    29
4    25
5    27
Name: VAR_2, dtype: int64

In [71]:
df['VAR_2'].drop_duplicates(keep = 'first')

0    29
1    22
2    26
4    25
5    27
Name: VAR_2, dtype: int64

In [72]:
df['VAR_2'].drop_duplicates(keep = 'last')

1    22
2    26
3    29
4    25
5    27
Name: VAR_2, dtype: int64

In [73]:
df['VAR_2'].drop_duplicates(keep = False)

1    22
2    26
4    25
5    27
Name: VAR_2, dtype: int64

In the last example, we did not have a complete duplicate. To clarify the idea, I am going to create another example DataFrame and then count and drop its duplicates.

In [74]:
dup_df = pd.DataFrame({'Col_1': [3, 2, 5, 3, 5, 1], 
                      'Col_2': [3, 1, 4, 3, 4, 2],
                      "Col_3":[3, 2, 5, 3, 5, 3]})
dup_df

Unnamed: 0,Col_1,Col_2,Col_3
0,3,3,3
1,2,1,2
2,5,4,5
3,3,3,3
4,5,4,5
5,1,2,3


Now, we can see that the first and the fourth rows, and the third and the fifth rows, are complete duplicates. The __duplicated__ method shows this: __True__ means a duplicate while __False__ means not.

In [75]:
dup_df.duplicated()

0    False
1    False
2    False
3     True
4     True
5    False
dtype: bool

Applying the sum function will return the number of duplicates.

In [76]:
dup_df.duplicated().sum()

2

When we have complete duplicates, the solution is to drop them from the DataFrame. But you have to decide which row to keep: the __first, the last or none__.

In [77]:
dup_df.drop_duplicates(keep = 'first')

Unnamed: 0,Col_1,Col_2,Col_3
0,3,3,3
1,2,1,2
2,5,4,5
5,1,2,3


The rows indexed 3 and 4 are dropped.

In [78]:
dup_df.drop_duplicates(keep = 'last')

Unnamed: 0,Col_1,Col_2,Col_3
1,2,1,2
3,3,3,3
4,5,4,5
5,1,2,3


The rows indexed 0 and 2 are dropped.

In [79]:
dup_df.drop_duplicates(keep = False)

Unnamed: 0,Col_1,Col_2,Col_3
1,2,1,2
5,1,2,3


Only two rows are left; every row that had a duplicate is dropped.
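
Two further options are often useful in practice: restricting the comparison to some columns with `subset`, and renumbering the rows afterwards with `reset_index`. A minimal sketch on the same data (the result names are illustrative):

```python
import pandas as pd

dup_df = pd.DataFrame({'Col_1': [3, 2, 5, 3, 5, 1],
                       'Col_2': [3, 1, 4, 3, 4, 2],
                       'Col_3': [3, 2, 5, 3, 5, 3]})

# Compare rows on Col_3 only: the last row now counts as a duplicate too
by_col_3 = dup_df.drop_duplicates(subset=['Col_3'])

# Drop complete duplicates, then renumber the remaining rows from 0
clean = dup_df.drop_duplicates().reset_index(drop=True)

print(by_col_3)
print(clean)
```

Without `reset_index`, the surviving rows keep their original labels (0, 1, 2, 5), as in the outputs above.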

# Pandas Apply Function

   - One of the most elegant functionalities in Pandas is the __apply function__.
   
   
   - The __apply function__ follows the same concept as the __apply family in the R language__.
   
   
   - The __apply function__ becomes more powerful when combined with a __lambda function__ (an anonymous function).
   
   
   - The __apply function__ runs another function over __DataFrame columns__, that is, it applies the function to each variable in the DataFrame.

### Review of lambda function

   - **A lambda expression returns a function object**
   
   
   - **lambda does not assign a name to the function, which is why it is called anonymous (unnamed)**
   
   
   - lambda functions are used as inline function definitions with the map(), filter(), and apply() functions.

   
   - **A lambda's body is a single expression, not a block of statements**

### Lambda function syntax

```python

lambda arg1, arg2, ..., argN: expression using args
    
```

### Lambda function examples

In [80]:
add_numbers = lambda x, y: x + y
add_numbers(12,10)

22

In [81]:
# Add bangs to strings
add_bangs = lambda x : x +'!!!'

In [82]:
# Repeating words
rep_word = lambda word, rep: word * rep
rep_word('Python ', 5)

'Python Python Python Python Python '
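
To tie the review points together, the sketch below shows that a lambda and a `def` function behave the same way, and how a lambda can be passed inline to `map` and `filter`. The names `add_bangs_def` and `words` are made up for this illustration.

```python
# A lambda bound to a name ...
add_bangs = lambda x: x + '!!!'

# ... is equivalent to this ordinary named function
def add_bangs_def(x):
    return x + '!!!'

words = ['hello', 'pandas', 'apply']

# Passed inline to map: transform every element
shouted = list(map(add_bangs, words))

# Passed inline to filter: keep elements for which the lambda is truthy
long_words = list(filter(lambda w: len(w) > 5, words))

print(add_bangs('Python') == add_bangs_def('Python'))
print(shouted)
print(long_words)
```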

### Scenarios of Applying Apply Function

1. To do a calculation on a variable, we take two steps: 

    1. Create a function (a user-defined function) or choose a built-in function.
    
    2. Pass that function as an argument to the __apply function__.
    
Here is the syntax:

```python 
# Pass a user-defined function (placeholder name)
df['var'].apply(user_defined_function)

# or pass a built-in function (placeholder name)
df['var'].apply(built_in_function)
```

### Using Apply function: Example 01 

  - Square the second variable
  - First, you write a function.

In [83]:
def square(x):
    return (x**2)

- Second, apply the __apply function__ using the previously created function.

In [84]:
df['VAR_2'].apply(square)

0    841
1    484
2    676
3    841
4    625
5    729
Name: VAR_2, dtype: int64
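
For simple arithmetic like this, pandas also supports operating on the whole column directly, which usually avoids the per-element function call. The sketch below, using a standalone copy of the `VAR_2` values, checks that both routes give the same numbers:

```python
import pandas as pd

var_2 = pd.Series([29, 22, 26, 29, 25, 27], name='VAR_2')

def square(x):
    return x ** 2

via_apply = var_2.apply(square)  # calls square once per element
vectorized = var_2 ** 2          # operates on the whole column at once

print((via_apply == vectorized).all())
```

`apply` remains the more general tool, since it accepts any function of one element.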

2. Using built-in functions with the apply function

  - Passing __built-in functions__ to apply is very common.


#### Using Apply function: Example 02

  - Suppose we want to convert the values of a column into strings. We pass the __str__ function to apply.

In [85]:
df['VAR_3'].apply(str)

0    178
1    150
2     38
3    136
4     10
5     24
Name: VAR_3, dtype: object

- What if no built-in function does what we need, and we don't want to write a named function that we will use only __once__?

    --> A __lambda__ function is the solution in this situation.

### Using  Apply  with Lambda function Example

   - We want to multiply the second variable by 5 and divide by 9. 
   
Here is the syntax:
   
```python 
df['var'].apply(lambda x: x*5 / 9) 
```

In [86]:
df['VAR_2'].apply(lambda v: v*5/9)

0    16.111111
1    12.222222
2    14.444444
3    16.111111
4    13.888889
5    15.000000
Name: VAR_2, dtype: float64

**Thanks to lambda, defining the function and applying it are combined into one step.**

#### Note:

**We are not restricted to one variable; two or more are possible. Here is an example with two variables.**

In [87]:
df[['VAR_2', 'VAR_3']].apply(lambda v: v*5/9)

Unnamed: 0,VAR_2,VAR_3
0,16.111111,98.888889
1,12.222222,83.333333
2,14.444444,21.111111
3,16.111111,75.555556
4,13.888889,5.555556
5,15.0,13.333333
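
When apply is called on a DataFrame, the function receives one whole column at a time by default (`axis=0`); with `axis=1` it receives one row at a time instead. A minimal sketch with the first three rows of `VAR_2` and `VAR_3` (the frame name `small` is illustrative):

```python
import pandas as pd

small = pd.DataFrame({'VAR_2': [29, 22, 26],
                      'VAR_3': [178, 150, 38]})

# axis=0 (default): the lambda sees each column as a Series
col_max = small.apply(lambda col: col.max())

# axis=1: the lambda sees each row as a Series indexed by column name
row_sum = small.apply(lambda row: row['VAR_2'] + row['VAR_3'], axis=1)

print(col_max)
print(row_sum)
```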


#### Note: 

 - Chaining apply calls is possible. Here is an example: first we do a calculation, then we format the results as strings.

In [88]:
df['VAR_2'].apply(lambda v: v*5/9).apply(lambda x: '%2.2f' % x)

0    16.11
1    12.22
2    14.44
3    16.11
4    13.89
5    15.00
Name: VAR_2, dtype: object

### Element-wise applymap function

   - To perform element-wise operations on a DataFrame object, use the __applymap__ function. Series objects already have a __map__ function, which is why the pandas developers named the DataFrame version applymap: to distinguish between the Series and DataFrame functions.

We are going to string-format the previous example.

In [89]:
res = df.apply(lambda v: v*5/9)
res

Unnamed: 0,VAR_1,VAR_2,VAR_3
0,0.555556,16.111111,98.888889
1,1.111111,12.222222,83.333333
2,1.666667,14.444444,21.111111
3,2.222222,16.111111,75.555556
4,2.777778,13.888889,5.555556
5,3.333333,15.0,13.333333


In [90]:
fmt = lambda x: '%.2f' % x

In [91]:
res.applymap(fmt)

Unnamed: 0,VAR_1,VAR_2,VAR_3
0,0.56,16.11,98.89
1,1.11,12.22,83.33
2,1.67,14.44,21.11
3,2.22,16.11,75.56
4,2.78,13.89,5.56
5,3.33,15.00,13.33


We can do it in one step, which is the purpose of using a lambda function. But if you plan to use the function many times in the code, it is best to save it under a name.

In [92]:
res.applymap(lambda x: '%.2f' % x)

Unnamed: 0,VAR_1,VAR_2,VAR_3
0,0.56,16.11,98.89
1,1.11,12.22,83.33
2,1.67,14.44,21.11
3,2.22,16.11,75.56
4,2.78,13.89,5.56
5,3.33,15.00,13.33
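
A note on versions: since pandas 2.1, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, which does the same element-wise job. The following is a sketch of one way to write code that runs on both older and newer versions; the two-row frame is a made-up stand-in for `res`.

```python
import pandas as pd

res = pd.DataFrame({'VAR_1': [0.555556, 1.111111],
                    'VAR_2': [16.111111, 15.0]})

fmt = lambda x: '%.2f' % x

# DataFrame.map exists from pandas 2.1 on; fall back to applymap otherwise
if hasattr(pd.DataFrame, 'map'):
    formatted = res.map(fmt)
else:
    formatted = res.applymap(fmt)

print(formatted)
```

Either branch returns a DataFrame of strings, so a value such as 15.0 is displayed as `15.00`.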


# Sorting DataFrames

Because DataFrames are indexed objects, sorting can be done either by index or by values. 

## Index Sorting

 - First we consider sorting a DataFrame object by __index__.
 
 - We perform sorting by index using the __sort_index__ method.
 
 - The __sort_index__ method has several options: 
 
     - **axis**: __axis = 0__ (the default) sorts by row index; set __axis = 1__ to sort by column index.
     
     - **ascending**: True by default; set __ascending = False__ for a descending sort.
     - **inplace**: False by default; set __inplace = True__ for a permanent change. 
     
Here is the syntax:
```python
df.sort_index()                      # default: sort by row index, ascending
df.sort_index(axis=1)                # sort by column index
df.sort_index(ascending = False)     # descending sort
df.sort_index(inplace=True)          # permanent change (returns None)

# For docs
df.sort_index?
```

### Sorting by index example

In [93]:
ind = list(ascii_letters)[:7]
shuffle(ind)

cols = list(ascii_uppercase)[:4]
shuffle(cols)

dat = np.array(choices(range(100), k = 28)).reshape(7, 4)
ind_df = pd.DataFrame(dat, index = ind, columns = cols)
ind_df

Unnamed: 0,D,C,B,A
e,75,24,22,32
f,53,51,77,42
a,24,34,26,40
c,47,84,2,74
b,75,52,61,80
g,3,58,28,43
d,99,34,50,38


### 1. Sorting by index by row 

In [94]:
# The default behavior
ind_df.sort_index()

Unnamed: 0,D,C,B,A
a,24,34,26,40
b,75,52,61,80
c,47,84,2,74
d,99,34,50,38
e,75,24,22,32
f,53,51,77,42
g,3,58,28,43


### 2. Sorting by index by columns

In [95]:
# input axis=1
ind_df.sort_index(axis = 1)

Unnamed: 0,A,B,C,D
e,32,22,24,75
f,42,77,51,53
a,40,26,34,24
c,74,2,84,47
b,80,61,52,75
g,43,28,58,3
d,38,50,34,99


### 3. Sorting by index by rows or columns in descending order

In [96]:
# By row index in descending order
ind_df.sort_index(ascending =False)

Unnamed: 0,D,C,B,A
g,3,58,28,43
f,53,51,77,42
e,75,24,22,32
d,99,34,50,38
c,47,84,2,74
b,75,52,61,80
a,24,34,26,40


In [97]:
# By columns index in descending order
ind_df.sort_index(axis =1, ascending =False)

Unnamed: 0,D,C,B,A
e,75,24,22,32
f,53,51,77,42
a,24,34,26,40
c,47,84,2,74
b,75,52,61,80
g,3,58,28,43
d,99,34,50,38


### 4. By row index and column index and inplace

In [98]:
ind_df.sort_index(axis = 0, inplace = True)
ind_df.sort_index(axis = 1, inplace = True)
ind_df

Unnamed: 0,A,B,C,D
a,40,26,34,24
b,80,61,52,75
c,74,2,84,47
d,38,50,34,99
e,32,22,24,75
f,42,77,51,53
g,43,28,58,3
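
Without `inplace=True`, `sort_index` returns a new sorted object and leaves the original untouched, so the two sorts can also be chained. A minimal sketch with a made-up three-row frame:

```python
import pandas as pd

small = pd.DataFrame({'B': [1, 2, 3], 'A': [4, 5, 6]},
                     index=['c', 'a', 'b'])

# Sort rows, then columns, without modifying `small`
sorted_both = small.sort_index().sort_index(axis=1)

print(small.index.tolist())          # original order is preserved
print(sorted_both.index.tolist())
print(sorted_both.columns.tolist())
```

Chaining avoids the side effects of in-place changes and keeps the original available for comparison.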


###  Values Sorting

  - Another common task in data analysis is sorting values in ascending or descending order.
  
  - Sorting values is performed using the __sort_values__ method.
  
  - Sorting the values of a DataFrame requires one or more variables to sort by.
  
  - __sort_values__ has several options:
  
      - **by**: a column or a list of columns to sort the data by.
      - **axis**: by default (__axis = 0__) the rows are reordered according to the values in the chosen columns; set __axis = 1__ to reorder the columns according to the values in a chosen row.
      - **ascending**: sorting is in ascending order by default; set __ascending = False__ to perform descending sorting.
      
      - **na_position**: missing values are put last by default; set __na_position = 'first'__ to put them in the first rows.
      
      - **inplace**: False by default; setting it to True (__inplace=True__) overwrites the original data.

Here is the syntax:
```python
df.sort_values(by = 'col_name')                         # sort by one column

df.sort_values(by = ['col_1', 'col_2'])                 # sort by several columns

df.sort_values(by = 'col_name', ascending = False)      # descending order

df.sort_values(by = 'col_name', inplace = True)         # permanent change

df.sort_values(by = 'col_name', na_position = 'first')  # missing values first

# Change the original data to be sorted in descending order by one column
df.sort_values(by = 'col_name', ascending = False, inplace = True)
```

### Sorting by values examples

  We reuse the DataFrame from the sorting-by-index example.

In [99]:
vsort_df = ind_df.copy()  # copy, so in-place sorting below does not also change ind_df
vsort_df.head()

Unnamed: 0,A,B,C,D
a,40,26,34,24
b,80,61,52,75
c,74,2,84,47
d,38,50,34,99
e,32,22,24,75


### Sorting values by one variable

1. Default values sorting: ascending

In [100]:
vsort_df.sort_values(by ='A')

Unnamed: 0,A,B,C,D
e,32,22,24,75
d,38,50,34,99
a,40,26,34,24
f,42,77,51,53
g,43,28,58,3
c,74,2,84,47
b,80,61,52,75


2. Sorting in descending order

In [101]:
vsort_df.sort_values(by = 'A', ascending = False)

Unnamed: 0,A,B,C,D
b,80,61,52,75
c,74,2,84,47
g,43,28,58,3
f,42,77,51,53
a,40,26,34,24
d,38,50,34,99
e,32,22,24,75


### Sorting by more variables in descending order

In [102]:
vsort_df.sort_values(by =['A', 'B', 'C', 'D'], ascending=False)

Unnamed: 0,A,B,C,D
b,80,61,52,75
c,74,2,84,47
g,43,28,58,3
f,42,77,51,53
a,40,26,34,24
d,38,50,34,99
e,32,22,24,75


### Sorting and changing the data

In [103]:
vsort_df.sort_values(by =['A', 'B', 'C', 'D'], ascending=False, inplace=True)
vsort_df

Unnamed: 0,A,B,C,D
b,80,61,52,75
c,74,2,84,47
g,43,28,58,3
f,42,77,51,53
a,40,26,34,24
d,38,50,34,99
e,32,22,24,75
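
In the example above column A has no repeated values, so the additional sort keys never come into play and the result matches sorting by A alone. Later keys only matter when earlier ones tie. The sketch below uses a made-up frame with ties and also passes a list to `ascending` to set the direction for each key:

```python
import pandas as pd

scores = pd.DataFrame({'team': ['x', 'y', 'x', 'y'],
                       'points': [10, 30, 20, 5]})

# team ascending; within each team, points descending
ordered = scores.sort_values(by=['team', 'points'],
                             ascending=[True, False])

print(ordered)
```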


### Sorting data containing NaNs

In [104]:
df_nan = pd.DataFrame({"A": [np.nan, 3, 11, 7, np.nan],
                      "D": [4, np.nan, np.nan, 12, 22]})
df_nan

Unnamed: 0,A,D
0,,4.0
1,3.0,
2,11.0,
3,7.0,12.0
4,,22.0


### NaNs last example

In [105]:
df_nan.sort_values(by ='A',na_position='last')

Unnamed: 0,A,D
1,3.0,
3,7.0,12.0
2,11.0,
0,,4.0
4,,22.0


### NaNs first

In [106]:
df_nan.sort_values(by ='A',na_position='first')

Unnamed: 0,A,D
0,,4.0
4,,22.0
1,3.0,
3,7.0,12.0
2,11.0,



## Ranking

- Ranking means giving ranks to numerical data from one through the number of valid data points in an array. 

- By default, equal values are assigned a rank that is the average of the ranks of those values

- Ranking a DataFrame is carried out using the __rank__ function. 

- __rank__ has several options

  - **axis**: set to 0 by default, which ranks the values within each column; set __axis = 1__ to rank the values across each row.
  - **Method**: When having equal values of a numerical variable, we have five options:
     - **average**: rank takes the average of the position of equal values.
     - **min**: rank takes the lowest position.
     - **max**: rank takes the highest position.
     - **first**: ranks assigned in order they appear in the array.
     - **dense**: like ‘min’, but rank always increases by 1 between groups.
     
  - **numeric_only**: rank only numeric columns if set to True. 
  - **na_option** : to decide how to rank the missing values, it has three options:

    - **keep**: assign NaN rank to NaN values
    - **top**: assign lowest rank to NaN values
    - **bottom**: assign highest rank to NaN values

  - **ascending** : Whether or not the elements should be ranked in ascending order. True by default.
  - **pct** : Whether or not to display the returned rankings in percentile form, False by default.
  
## How rank works

   1. Sort the numerical values in ascending order
   2. Use a method to compute the rank where there are equal values (average by default); the method works on the positions, not on the values themselves.
   3. Assign each value its corresponding rank.
   4. For percentiles, rank divides the present rank by the number of non-missing values, which equals the highest rank when no values tie for the top position: $\frac{\text{present rank}}{\text{number of non-missing values}}$
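
The first three steps can be reproduced by hand. The following is a sketch under the default settings (average method, NaN kept as NaN); the variable names are illustrative, and the values match the grades used in the ranking example.

```python
import numpy as np
import pandas as pd

grades = pd.Series([11, 12, 18, 7, np.nan, 12],
                   index=['Ahmad', 'Nabil', 'Ramy', 'Zaki', 'Bilal', 'Nacer'])

# Step 1: sort the valid values in ascending order
ordered = grades.dropna().sort_values()

# Positions 1..n in the sorted order
positions = pd.Series(range(1, len(ordered) + 1), index=ordered.index)

# Step 2: equal values share the average of their positions
manual = positions.groupby(ordered).transform('mean')

# Step 3: put the ranks back in the original order; missing values stay NaN
manual = manual.reindex(grades.index)

print(manual)
print(grades.rank())
```

Both printouts should show the same ranks, with 3.5 for the two tied grades of 12.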
   

## Ranking Example

In [108]:
grades = pd.DataFrame({"Subjects": ["math", "computing", "stats", "calculus", 
                                   "linear_algebra", "economics"],
                      "Grades": [11, 12, 18, 7, np.nan, 12]},
                     index = ["Ahmad", "Nabil", "Ramy", "Zaki", 
                             "Bilal", "Nacer"])
grades

Unnamed: 0,Subjects,Grades
Ahmad,math,11.0
Nabil,computing,12.0
Ramy,stats,18.0
Zaki,calculus,7.0
Bilal,linear_algebra,
Nacer,economics,12.0


### Ranking using the default options

In [109]:
def_grades = grades.rank(numeric_only=True)
def_grades

Unnamed: 0,Grades
Ahmad,2.0
Nabil,3.5
Ramy,5.0
Zaki,1.0
Bilal,
Nacer,3.5


Here, we will replicate the result manually.

In [110]:
grades.sort_values(by = 'Grades')

Unnamed: 0,Subjects,Grades
Zaki,calculus,7.0
Ahmad,math,11.0
Nabil,computing,12.0
Nacer,economics,12.0
Ramy,stats,18.0
Bilal,linear_algebra,


- We can see that Zaki is in first position; that is why the rank function gave him 1. 

- Nabil and Nacer have the same grade, and they are in positions 3 and 4. Thus, we average the two positions to find the rank: $(3 + 4) / 2 = 3.5$. This is the 3.5 in the result, and so on. 

Here is the full result


In [111]:
grades['default_rank'] = grades['Grades'].rank()
grades

Unnamed: 0,Subjects,Grades,default_rank
Ahmad,math,11.0,2.0
Nabil,computing,12.0,3.5
Ramy,stats,18.0,5.0
Zaki,calculus,7.0,1.0
Bilal,linear_algebra,,
Nacer,economics,12.0,3.5


### Ranking with max method

   - As shown above, Nabil is in third position, and Nacer in the fourth. 
   
   - By setting __method__ to **max**, both Nacer and Nabil will be ranked 4.

In [112]:
grades['max_rank'] = grades['Grades'].rank(method='max')
grades

Unnamed: 0,Subjects,Grades,default_rank,max_rank
Ahmad,math,11.0,2.0,2.0
Nabil,computing,12.0,3.5,4.0
Ramy,stats,18.0,5.0,5.0
Zaki,calculus,7.0,1.0,1.0
Bilal,linear_algebra,,,
Nacer,economics,12.0,3.5,4.0


### Ranking with min method

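   - By setting __method__ to **min**, both Nacer and Nabil will be ranked 3. Note that the cell below stores the result in the existing __max_rank__ column, so from here on that column holds the min ranks.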

In [113]:
grades['max_rank'] = grades['Grades'].rank(method='min')
grades

Unnamed: 0,Subjects,Grades,default_rank,max_rank
Ahmad,math,11.0,2.0,2.0
Nabil,computing,12.0,3.5,3.0
Ramy,stats,18.0,5.0,5.0
Zaki,calculus,7.0,1.0,1.0
Bilal,linear_algebra,,,
Nacer,economics,12.0,3.5,3.0
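
To see all five tie-breaking methods at once, the sketch below ranks the five valid grades with each method and collects the results in one table. The Series `g` and the table layout are illustrative choices.

```python
import pandas as pd

# The five non-missing grades, in their original order
g = pd.Series([11, 12, 18, 7, 12],
              index=['Ahmad', 'Nabil', 'Ramy', 'Zaki', 'Nacer'])

methods = ['average', 'min', 'max', 'first', 'dense']

# One column of ranks per method
table = pd.DataFrame({m: g.rank(method=m) for m in methods})

print(table)
```

The methods differ only for the tied pair (Nabil and Nacer) and, with `dense`, for every value above the tie, because `dense` leaves no gap after a tied group.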


### Ranking with  NaN at the bottom 

   - By setting __na_option__ to **'bottom'**, the NaN will be ranked the highest. In this example it will be ranked 6. 

In [114]:
grades['NA_bottom'] = grades['Grades'].rank(na_option='bottom')
grades

Unnamed: 0,Subjects,Grades,default_rank,max_rank,NA_bottom
Ahmad,math,11.0,2.0,2.0,2.0
Nabil,computing,12.0,3.5,3.0,3.5
Ramy,stats,18.0,5.0,5.0,5.0
Zaki,calculus,7.0,1.0,1.0,1.0
Bilal,linear_algebra,,,,6.0
Nacer,economics,12.0,3.5,3.0,3.5


### Computing the percentiles  

   - By setting __pct__ to **True**, the rankings will be returned in percentile form.

In [115]:
grades['pct_rank'] = grades['Grades'].rank(pct=True)
grades

Unnamed: 0,Subjects,Grades,default_rank,max_rank,NA_bottom,pct_rank
Ahmad,math,11.0,2.0,2.0,2.0,0.4
Nabil,computing,12.0,3.5,3.0,3.5,0.7
Ramy,stats,18.0,5.0,5.0,5.0,1.0
Zaki,calculus,7.0,1.0,1.0,1.0,0.2
Bilal,linear_algebra,,,,6.0,
Nacer,economics,12.0,3.5,3.0,3.5,0.7


To understand how pct_rank is calculated:

  - First, the NaN value keeps a NaN rank (the default), so only five values are ranked and 5 is the __highest rank__.
  
  - The present rank of __Ahmad__ is 2.
  
  - Divide 2 by 5.
  
  - The result is 0.4.

For Nabil: 

  - 3.5 / 5 = 0.7
  
and so on.
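
The same arithmetic can be checked for every row at once. This sketch divides the default ranks by the number of non-missing grades, which is 5 here and coincides with the highest rank, and compares the result with `rank(pct=True)`:

```python
import numpy as np
import pandas as pd

grades = pd.Series([11, 12, 18, 7, np.nan, 12],
                   index=['Ahmad', 'Nabil', 'Ramy', 'Zaki', 'Bilal', 'Nacer'])

ranks = grades.rank()             # default ranks, NaN stays NaN
manual = ranks / grades.count()   # count() ignores the missing grade
pct = grades.rank(pct=True)

print(pd.DataFrame({'rank': ranks, 'manual': manual, 'pct': pct}))
```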

## The Final Ranked Table

In [116]:
grades

Unnamed: 0,Subjects,Grades,default_rank,max_rank,NA_bottom,pct_rank
Ahmad,math,11.0,2.0,2.0,2.0,0.4
Nabil,computing,12.0,3.5,3.0,3.5,0.7
Ramy,stats,18.0,5.0,5.0,5.0,1.0
Zaki,calculus,7.0,1.0,1.0,1.0,0.2
Bilal,linear_algebra,,,,6.0,
Nacer,economics,12.0,3.5,3.0,3.5,0.7


# Conclusion 

  Counting values, checking for duplicates, sorting, and ranking are necessary steps for exploring and preparing data for further analysis. 