
# Introduction to `rank()`, `nlargest()`, and `nsmallest()` in Pandas

1. `rank()`:
  The `rank()` method in Pandas assigns a rank to each value in a column based on its sorted position. It can rank in ascending or descending order, and it offers several methods for handling ties.

  - By default, `rank()` ranks values in ascending order, so the smallest value gets a rank of 1.
  - Set `ascending=False` to rank values in **descending order, so the largest value gets a rank of 1**.
  - The `ascending` parameter controls the direction of ranking, just as it does in `sort_values()`.

  > Ties occur when multiple rows have the same value. Pandas provides different methods to handle ties when ranking values:
  - `'average'` (default): Assigns each tied value the average of the ranks the group would otherwise occupy.
  - `'min'`: Assigns the lowest rank in the group to every tied value.
  - `'max'`: Assigns the highest rank in the group to every tied value.
  - `'first'`: Breaks ties by order of appearance in the DataFrame, so no two values share a rank.
  - `'dense'`: Like `'min'`, but the rank always increases by 1 from one group to the next.

  Example:
  ```python
  df['Rank'] = df['ColumnA'].rank(ascending=False, method='min')
  ```

2. `nlargest()`:
   - The `nlargest()` method returns the n rows with the largest values in a specific column.
   - On a DataFrame, it takes `n` (number of rows to return) and `columns` (the column name, or list of names, to order by) as arguments.
   - It returns a new DataFrame containing those n rows, sorted in descending order of the specified column.

3. `nsmallest()`:
   - The `nsmallest()` method returns the n rows with the smallest values in a specific column.
   - On a DataFrame, it takes `n` (number of rows to return) and `columns` (the column name, or list of names, to order by) as arguments.
   - It returns a new DataFrame containing those n rows, sorted in ascending order of the specified column.
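
To make the tie-handling methods concrete, here is a minimal sketch on a toy Series. The values and index labels are made up for illustration and are not taken from the NBA data.

```python
import pandas as pd

# Toy data (hypothetical, not from nba.csv): the two 20s are tied
s = pd.Series([10, 20, 20, 30], index=["a", "b", "c", "d"])

# Compare the tie-handling methods side by side (ascending order)
ranks = pd.DataFrame({
    "value": s,
    "average": s.rank(method="average"),  # tied values share rank 2.5
    "min": s.rank(method="min"),          # tied values share rank 2
    "max": s.rank(method="max"),          # tied values share rank 3
    "first": s.rank(method="first"),      # ranks 2 and 3 by order of appearance
})
print(ranks)

# Descending order: the largest value gets rank 1
desc_min = s.rank(ascending=False, method="min")
print(desc_min)
```

With `method='min'` and `ascending=False`, the tied values both get rank 2 and the next value gets rank 4, which is the familiar "competition" style of ranking.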

# NBA data


---


Let's try these functions on NBA player salary information. The `nba.csv` dataset contains:

- **Name**: The name of the player (string)
- **Team**: The team the player belongs to (string)
- **Number**: The player's jersey number (float)
- **Position**: The player's position (string)
- **Age**: The player's age (float)
- **Height**: The player's height (string)
- **Weight**: The player's weight (float)
- **College**: The college the player attended (string)
- **Salary**: The player's salary (float)







## NBA.csv dataset

Load the data from this URL:
```
"https://raw.githubusercontent.com/MonkeyWrenchGang/PythonBootcamp/main/day_4/data/nba.csv"
```

In [1]:
import warnings
warnings.filterwarnings('ignore')
# ------------------------------------------------------------------
import pandas as pd


# ------------------------------------------------------------------
pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [2]:
nba = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/PythonBootcamp/main/day_4/data/nba.csv")
nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0


# Rank the players based on their salaries in descending order.

1. Create a new column `salary_rank`.
2. Sort the data by rank.
3. Use `head()` to display the top 5 records.

> Expected Output: A DataFrame with a new column `salary_rank` indicating the rank of each player's salary



In [8]:
nba["salary_rank"] = nba['Salary'].rank(ascending=False)
nba = nba.sort_values(['salary_rank'], ascending=True)
nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
109,Kobe Bryant,Los Angeles Lakers,24.0,SF,37.0,6-6,212.0,,25000000.0,1.0
169,LeBron James,Cleveland Cavaliers,23.0,SF,31.0,6-8,250.0,,22970500.0,2.0
33,Carmelo Anthony,New York Knicks,7.0,SF,32.0,6-8,240.0,Syracuse,22875000.0,3.0
251,Dwight Howard,Houston Rockets,12.0,C,30.0,6-11,265.0,,22359364.0,4.0
339,Chris Bosh,Miami Heat,1.0,PF,32.0,6-11,235.0,Georgia Tech,22192730.0,5.0


## Bottom 3 Players


---

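Using the `salary_rank` column and `sort_values()`, get the bottom three players by salary. The data is already sorted by rank, so `tail(3)` returns the last three rows. Players with a missing `Salary` have a `NaN` rank and sort to the end, so they need to be filtered out first.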


In [10]:
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
397,Axel Toupane,Denver Nuggets,6.0,SG,23.0,6-7,210.0,,,
409,Greg Smith,Minnesota Timberwolves,4.0,PF,25.0,6-10,250.0,Fresno State,,
457,,,,,,,,,,


In [12]:
nba.query("salary_rank.notnull()").tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
291,Orlando Johnson,New Orleans Pelicans,0.0,SG,27.0,6-5,220.0,UC Santa Barbara,55722.0,444.5
130,Phil Pressey,Phoenix Suns,25.0,PG,25.0,5-11,175.0,Missouri,55722.0,444.5
32,Thanasis Antetokounmpo,New York Knicks,43.0,SF,23.0,6-7,205.0,,30888.0,446.0
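
The output above shows two effects of the default `rank()` settings: tied salaries share an averaged rank (444.5), and missing salaries get no rank at all. The sketch below reproduces both on a toy Series. The salary values are hypothetical, and `na_option='bottom'` is shown only as one possible alternative to filtering out the missing rows.

```python
import pandas as pd

# Toy salaries (hypothetical, not from nba.csv): one tie and one missing value
salary = pd.Series([500.0, 100.0, 100.0, None, 50.0])

# Default: ties get the average rank, missing values stay NaN
default_rank = salary.rank(ascending=False)

# method='min': tied values share the lowest rank in their group
min_rank = salary.rank(ascending=False, method="min")

# na_option='bottom': missing values are ranked last instead of left as NaN
bottom_rank = salary.rank(ascending=False, na_option="bottom")

print(pd.DataFrame({
    "salary": salary,
    "default": default_rank,
    "min": min_rank,
    "na_bottom": bottom_rank,
}))
```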


## Heaviest Players with nlargest()

---
Use `nlargest()` to get the top 5 heaviest players by `Weight`.



In [13]:
nba.nlargest(5,["Weight"])

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
405,Nikola Pekovic,Minnesota Timberwolves,14.0,C,30.0,6-11,307.0,,12100000.0,54.0
302,Boban Marjanovic,San Antonio Spurs,40.0,C,27.0,7-3,290.0,,1200000.0,314.0
330,Al Jefferson,Charlotte Hornets,25.0,C,31.0,6-10,289.0,,13500000.0,43.0
395,Jusuf Nurkic,Denver Nuggets,23.0,C,21.0,7-0,280.0,,1842000.0,270.5
188,Andre Drummond,Detroit Pistons,0.0,C,22.0,6-11,279.0,Connecticut,3272091.0,199.0
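
For a single column, `nlargest()` gives the same rows as sorting in descending order and taking the head. The following sketch checks that on toy data. The player names and weights are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical players and weights, not from nba.csv)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Weight": [210.0, 250.0, None, 230.0],
})

# The two heaviest players
top2 = toy.nlargest(2, "Weight")

# Equivalent approach: sort descending (NaN goes last by default), then take the head
same = toy.sort_values("Weight", ascending=False).head(2)

print(top2)
print(top2.equals(same))
```

`nlargest()` is the more direct way to say "top n", and it avoids sorting the whole DataFrame when only a few rows are needed.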


# Who are the 5 oldest players?

In [14]:
nba.nlargest(5,["Age"])

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
400,Kevin Garnett,Minnesota Timberwolves,21.0,PF,40.0,6-11,240.0,,8500000.0,83.0
298,Tim Duncan,San Antonio Spurs,21.0,C,40.0,6-11,250.0,Wake Forest,5250000.0,135.0
304,Andre Miller,San Antonio Spurs,24.0,PG,40.0,6-3,200.0,Utah,250750.0,428.0
261,Vince Carter,Memphis Grizzlies,15.0,SG,39.0,6-6,220.0,North Carolina,4088019.0,168.0
102,Pablo Prigioni,Los Angeles Clippers,9.0,PG,39.0,6-3,185.0,,947726.0,351.0


# Who are the 5 youngest players?

---

Use `nsmallest()` to find them.

In [17]:
nba.nsmallest(5,["Age"])

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
122,Devin Booker,Phoenix Suns,1.0,SG,19.0,6-6,206.0,Kentucky,2127840.0,257.0
226,Rashad Vaughn,Milwaukee Bucks,20.0,SG,19.0,6-6,202.0,UNLV,1733040.0,275.0
410,Karl-Anthony Towns,Minnesota Timberwolves,32.0,C,20.0,7-0,244.0,Kentucky,5703600.0,124.0
116,D'Angelo Russell,Los Angeles Lakers,1.0,PG,20.0,6-5,195.0,Ohio State,5103120.0,142.0
56,Jahlil Okafor,Philadelphia 76ers,8.0,C,20.0,6-11,275.0,Duke,4582680.0,155.0


# nlargest() & nsmallest() TIES!

When several rows are tied at the cutoff, `nlargest()` and `nsmallest()` decide which of them to return based on the `keep=` option.

- keep='first': This is the default option. Among rows tied at the cutoff, it keeps the ones that appear first in the DataFrame, so the result is limited to `n` rows.

- keep='last': Among rows tied at the cutoff, it keeps the ones that appear last in the DataFrame, so the result is limited to `n` rows.

- keep='all': It keeps every row tied at the cutoff, so the result can contain more than `n` rows.



In [22]:
nba.nsmallest(5, 'Age', keep='all')


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,salary_rank
122,Devin Booker,Phoenix Suns,1.0,SG,19.0,6-6,206.0,Kentucky,2127840.0,257.0
226,Rashad Vaughn,Milwaukee Bucks,20.0,SG,19.0,6-6,202.0,UNLV,1733040.0,275.0
410,Karl-Anthony Towns,Minnesota Timberwolves,32.0,C,20.0,7-0,244.0,Kentucky,5703600.0,124.0
116,D'Angelo Russell,Los Angeles Lakers,1.0,PG,20.0,6-5,195.0,Ohio State,5103120.0,142.0
56,Jahlil Okafor,Philadelphia 76ers,8.0,C,20.0,6-11,275.0,Duke,4582680.0,155.0
356,Aaron Gordon,Orlando Magic,0.0,PF,20.0,6-9,220.0,Arizona,4171680.0,166.0
40,Kristaps Porzingis,New York Knicks,6.0,PF,20.0,7-3,240.0,,4131720.0,167.0
445,Dante Exum,Utah Jazz,11.0,PG,20.0,6-6,190.0,,3777720.0,181.0
393,Emmanuel Mudiay,Denver Nuggets,0.0,PG,20.0,6-5,200.0,,3102240.0,205.0
192,Stanley Johnson,Detroit Pistons,3.0,SF,20.0,6-7,245.0,Arizona,2841960.0,223.0
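
The three `keep=` options are easier to compare on a small example. This sketch uses a toy DataFrame with hypothetical names and ages, where three players are tied at the cutoff age of 20.

```python
import pandas as pd

# Toy data (hypothetical, not from nba.csv): B, C and D are tied at age 20
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [19, 20, 20, 20, 25],
})

first = toy.nsmallest(2, "Age")                 # default keep='first': A and B
last = toy.nsmallest(2, "Age", keep="last")     # A and D, the last of the tied rows
all_ties = toy.nsmallest(2, "Age", keep="all")  # A plus every row tied at age 20

print(first)
print(last)
print(all_ties)
```

Asking for 2 rows with `keep='all'` returns 4 here, which is the same reason the NBA query above returns 10 rows for `n=5`.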


## Exercises with rank(), nlargest(), and nsmallest() on NBA Dataset

1. **Exercise**: Compute the rank of players based on their salaries so that the largest salary is ranked No. 1.

2. **Exercise**: Find the top 10 players with the highest salaries.

3. **Exercise**: Find the 5 players with the lowest salaries.

4. **Exercise**: Compute the rank of players based on their ages in descending order.

5. **Exercise**: Find the player with the lowest weight.

6. **Exercise**: Find the 3 players with the lowest player `Number`.

7. **Challenge!** Find the tallest and shortest players. Use the following code to create a height-in-inches column:

```python
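def convert_height(height):
    # Heights are strings like '6-2' (feet-inches); keep missing values as NaN
    if pd.isna(height):
        return float('nan')
    feet, inches = height.split('-')
    total_inches = int(feet) * 12 + int(inches)
    return total_inches

nba['Height (inches)'] = nba['Height'].apply(convert_height)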
```
  7.1 - Who are the 5 tallest players?
  7.2 - Who are the 5 shortest players?
  7.3 - Create a BMI column, then find the players with the 5 highest and 5 lowest BMI values:
  ```python
  nba['BMI'] = (nba['Weight'] / (nba['Height (inches)'] ** 2)) * 703
  ```
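
As a quick sanity check of the height conversion and the BMI formula (weight in pounds, height in inches), here is a minimal sketch on toy values. The heights, weights, and the helper name `height_to_inches` are hypothetical and exist only for this illustration.

```python
import pandas as pd

def height_to_inches(height):
    """Hypothetical helper: convert 'feet-inches' text such as '6-2' to inches."""
    if pd.isna(height):
        return float("nan")  # keep missing heights as NaN
    feet, inches = height.split("-")
    return int(feet) * 12 + int(inches)

# Toy data (hypothetical, not from nba.csv), including one missing row
toy = pd.DataFrame({
    "Height": ["6-2", "5-11", None],
    "Weight": [180.0, 175.0, None],
})

toy["Height (inches)"] = toy["Height"].apply(height_to_inches)

# BMI = weight (lb) / height (in)^2 * 703
toy["BMI"] = toy["Weight"] / toy["Height (inches)"] ** 2 * 703

print(toy)
```

A 6-2 player is 74 inches tall, so at 180 lb the BMI works out to roughly 23.1. The row with missing data stays `NaN` instead of raising an error.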


**Exercise 1 Solution:**
```python
# Exercise 1: rank salaries so the largest salary gets rank 1
nba['salary_rank'] = nba['Salary'].rank(ascending=False)
```