# DataFrames I

In [2]:
import pandas as pd

ModuleNotFoundError: No module named 'pandas'

In [None]:
pip install pandas

## Methods and Attributes between Series and DataFrames
- A **DataFrame** is a 2-dimensional table consisting of rows and columns.
- Pandas uses a `NaN` designation for cells that have a missing value. It is short for "not a number". Most operations on `NaN` values will produce `NaN` values.
- Like with a **Series**, Pandas assigns an index position/label to each **DataFrame** row.
- The **DataFrame** and **Series** have common and exclusive methods/attributes.
- The `hasnans` attribute exists only on a **Series**. The `columns` attribute exists only on a **DataFrame**.
- Some methods/attributes will return different types of data.
- The `info` method returns a summary of the pandas object.
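The shared and exclusive attributes listed above can be checked on a tiny table. This is only a sketch: the `players` DataFrame and its values are made up for illustration and are not part of `nba.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row table; the names and numbers are illustrative only.
players = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cy"],
        "Weight": [215.0, np.nan, 195.0],
    }
)
weights = players["Weight"]  # a single column is a Series

print(players.shape)                 # (rows, columns) for a DataFrame
print(weights.shape)                 # (rows,) for a Series
print(weights.hasnans)               # Series-only attribute
print(list(players.columns))         # DataFrame-only attribute
print(hasattr(players, "hasnans"))   # the DataFrame has no such attribute
print(hasattr(weights, "columns"))   # the Series has no such attribute
```

`shape` exists on both objects but returns tuples of different lengths, which is one example of a shared attribute returning different data.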

In [6]:
nba = pd.read_csv("nba.csv")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [8]:
nba.isna().sum()

Name          1
Team          1
Position      8
Height        7
Weight        8
College      14
Salary      104
dtype: int64

In [9]:
nba.values

array([['Saddiq Bey', 'Atlanta Hawks', 'F', ..., 215.0, 'Villanova',
        4556983.0],
       ['Bogdan Bogdanovic', 'Atlanta Hawks', 'G', ..., 225.0,
        'Fenerbahce', 18700000.0],
       ['Kobe Bufkin', 'Atlanta Hawks', 'G', ..., 195.0, 'Michigan',
        4094244.0],
       ...,
       ['Tristan Vukcevic', 'Washington Wizards', 'F', ..., 220.0,
        'Real Madrid', nan],
       ['Delon Wright', 'Washington Wizards', 'G', ..., 185.0, 'Utah',
        8195122.0],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=object)

In [10]:
nba.index

RangeIndex(start=0, stop=592, step=1)

In [11]:
nba.shape

(592, 7)

In [12]:
nba.dtypes

Name         object
Team         object
Position     object
Height       object
Weight      float64
College      object
Salary      float64
dtype: object

In [13]:
nba.axes

[RangeIndex(start=0, stop=592, step=1),
 Index(['Name', 'Team', 'Position', 'Height', 'Weight', 'College', 'Salary'], dtype='object')]

In [14]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 592 entries, 0 to 591
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      591 non-null    object 
 1   Team      591 non-null    object 
 2   Position  584 non-null    object 
 3   Height    585 non-null    object 
 4   Weight    584 non-null    float64
 5   College   578 non-null    object 
 6   Salary    488 non-null    float64
dtypes: float64(2), object(5)
memory usage: 32.5+ KB


## Differences between Shared Methods
- The `sum` method adds a **Series's** values.
- On a **DataFrame**, the `sum` method defaults to adding down the rows (`axis=0`, i.e. along the index), which returns one total per column.
- The `axis` parameter customizes the direction that we add across. Pass `"columns"` or `1` to add "across" the columns, which returns one total per row.
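A minimal sketch of the two directions, using a hypothetical `sales` table with made-up numbers rather than `revenue.csv`:

```python
import pandas as pd

# Hypothetical revenue figures for two days and two cities.
sales = pd.DataFrame(
    {"New York": [10, 20], "Miami": [1, 2]},
    index=["day 1", "day 2"],
)

per_city = sales.sum()               # default axis=0: one total per column
per_day = sales.sum(axis="columns")  # same as axis=1: one total per row

print(per_city)
print(per_day)
```

Both results are **Series**; only the labels differ (column names for `per_city`, index labels for `per_day`).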

In [15]:
revenue=pd.read_csv("revenue.csv", index_col = "Date")
revenue

Date,New York,Los Angeles,Miami
1/1/26,985,122,499
1/2/26,738,788,534
1/3/26,14,20,933
1/4/26,730,904,885
1/5/26,114,71,253
1/6/26,936,502,497
1/7/26,123,996,115
1/8/26,935,492,886
1/9/26,846,954,823
1/10/26,54,285,216


In [16]:
revenue.sum()

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

In [17]:
revenue.sum(axis=1)

Date
1/1/26     1606
1/2/26     2060
1/3/26      967
1/4/26     2519
1/5/26      438
1/6/26     1935
1/7/26     1234
1/8/26     2313
1/9/26     2623
1/10/26     555
dtype: int64

In [18]:
revenue.sum(axis=0)

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

## Select One Column from a DataFrame
- We can use attribute syntax (`df.column_name`) to select a column from a **DataFrame**. The syntax will not work if the column name has spaces.
- We can also use square bracket syntax (`df["column name"]`) which will work for any column name.
- Pandas extracts a column from a **DataFrame** as a **Series**.
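- Depending on the pandas version, the **Series** may be a view of the **DataFrame's** data. In versions without Copy-on-Write enabled, changes to the **Series** can affect the **DataFrame**.
- In those versions, pandas may display a warning (`SettingWithCopyWarning`) if you mutate the **Series**. Use the `copy` method to create an independent duplicate.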
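The two selection syntaxes and the `copy` method can be sketched as follows. The `team_info` table and its `"Home City"` column are hypothetical, chosen so that one column name contains a space.

```python
import pandas as pd

# Made-up data; "Home City" is an illustrative column name with a space.
team_info = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Home City": ["Atlanta", "Boston"]}
)

names = team_info.Name            # attribute syntax: fine for simple names
cities = team_info["Home City"]   # brackets are needed when the name has a space

# Work on an independent copy so the original DataFrame stays untouched.
names_copy = team_info["Name"].copy()
names_copy.iloc[0] = "CHANGED"

print(type(cities).__name__)
print(team_info.loc[0, "Name"])   # the original value is still in place
```

Calling `copy` first makes the outcome independent of whether the installed pandas version returns a view or not.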

In [19]:
df =pd.read_csv("nba.csv")
df

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [20]:
Name = df["Name"].copy()
Name

0             Saddiq Bey
1      Bogdan Bogdanovic
2            Kobe Bufkin
3           Clint Capela
4         Bruno Fernando
             ...        
587         Ryan Rollins
588        Landry Shamet
589     Tristan Vukcevic
590         Delon Wright
591                  NaN
Name: Name, Length: 592, dtype: object

In [21]:
Name.iloc[0] = "TESLIM"

In [22]:
Name

0                 TESLIM
1      Bogdan Bogdanovic
2            Kobe Bufkin
3           Clint Capela
4         Bruno Fernando
             ...        
587         Ryan Rollins
588        Landry Shamet
589     Tristan Vukcevic
590         Delon Wright
591                  NaN
Name: Name, Length: 592, dtype: object

In [23]:
df.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0


In [34]:
nba.Team

0           Atlanta Hawks
1           Atlanta Hawks
2           Atlanta Hawks
3           Atlanta Hawks
4           Atlanta Hawks
              ...        
587    Washington Wizards
588    Washington Wizards
589    Washington Wizards
590    Washington Wizards
591                   NaN
Name: Team, Length: 592, dtype: object

## Select Multiple Columns from a DataFrame
- Use square brackets with a list of names to extract multiple **DataFrame** columns.
- Pandas stores the result in a new **DataFrame** (a copy).
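As a small illustration with a made-up `roster` table, the list inside the brackets decides both which columns are returned and in what order:

```python
import pandas as pd

# Hypothetical data for illustration only.
roster = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Team": ["Hawks", "Nets"], "Weight": [215, 190]}
)

# The list controls both which columns come back and their order.
subset = roster[["Team", "Name"]]

print(type(subset).__name__)
print(list(subset.columns))
print(list(roster.columns))   # the original DataFrame keeps all its columns
```

Note the double brackets: the outer pair is the selection syntax and the inner pair is a regular Python list.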

In [24]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [25]:
nba[['Name', 'Team', 'Position']]

Unnamed: 0,Name,Team,Position
0,Saddiq Bey,Atlanta Hawks,F
1,Bogdan Bogdanovic,Atlanta Hawks,G
2,Kobe Bufkin,Atlanta Hawks,G
3,Clint Capela,Atlanta Hawks,C
4,Bruno Fernando,Atlanta Hawks,F-C
...,...,...,...
587,Ryan Rollins,Washington Wizards,G
588,Landry Shamet,Washington Wizards,G
589,Tristan Vukcevic,Washington Wizards,F
590,Delon Wright,Washington Wizards,G


## Add New Column to DataFrame
- Use square bracket extraction syntax with an equal sign to add a new **Series** to a **DataFrame**.
- The `insert` method allows us to insert a new column at a specific column position.
- On the right-hand side, we can reference an existing **DataFrame** column and perform a broadcasting operation on it to create the new **Series**.
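The three approaches can be sketched side by side. The `roster` table and the new column names (`"League"`, `"Sport"`, `"Double Weight"`) are illustrative choices.

```python
import pandas as pd

# Hypothetical two-row table.
roster = pd.DataFrame({"Name": ["Ann", "Bob"], "Weight": [215.0, 190.0]})

# Bracket assignment appends the column at the end; a scalar is repeated per row.
roster["League"] = "NBA"

# insert places the new column at a chosen position (0-based).
roster.insert(loc=1, column="Sport", value="Basketball")

# Broadcasting: the multiplication is applied to every value in the column.
roster["Double Weight"] = roster["Weight"] * 2

print(list(roster.columns))
```

Unlike bracket assignment, `insert` modifies the **DataFrame** in place and returns `None`, so its result should not be assigned back to the variable.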

In [26]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [27]:
nba["locaiion"] = "USA"

In [28]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary,locaiion
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0,USA
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0,USA
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0,USA
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0,USA
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0,USA
...,...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0,USA
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0,USA
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,,USA
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0,USA


In [29]:
nba.insert(loc=3, column="Sport", value="Home Basketball", allow_duplicates=True)
nba

Unnamed: 0,Name,Team,Position,Sport,Height,Weight,College,Salary,locaiion
0,Saddiq Bey,Atlanta Hawks,F,Home Basketball,6-7,215.0,Villanova,4556983.0,USA
1,Bogdan Bogdanovic,Atlanta Hawks,G,Home Basketball,6-5,225.0,Fenerbahce,18700000.0,USA
2,Kobe Bufkin,Atlanta Hawks,G,Home Basketball,6-5,195.0,Michigan,4094244.0,USA
3,Clint Capela,Atlanta Hawks,C,Home Basketball,6-10,256.0,Elan Chalon,20616000.0,USA
4,Bruno Fernando,Atlanta Hawks,F-C,Home Basketball,6-10,240.0,Maryland,2581522.0,USA
...,...,...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,Home Basketball,6-3,180.0,Toledo,1719864.0,USA
588,Landry Shamet,Washington Wizards,G,Home Basketball,6-4,190.0,Wichita State,10250000.0,USA
589,Tristan Vukcevic,Washington Wizards,F,Home Basketball,6-10,220.0,Real Madrid,,USA
590,Delon Wright,Washington Wizards,G,Home Basketball,6-5,185.0,Utah,8195122.0,USA


In [30]:
nba["Double Weight"] = nba["Weight"] * 2
nba

Unnamed: 0,Name,Team,Position,Sport,Height,Weight,College,Salary,locaiion,Double Weight
0,Saddiq Bey,Atlanta Hawks,F,Home Basketball,6-7,215.0,Villanova,4556983.0,USA,430.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,Home Basketball,6-5,225.0,Fenerbahce,18700000.0,USA,450.0
2,Kobe Bufkin,Atlanta Hawks,G,Home Basketball,6-5,195.0,Michigan,4094244.0,USA,390.0
3,Clint Capela,Atlanta Hawks,C,Home Basketball,6-10,256.0,Elan Chalon,20616000.0,USA,512.0
4,Bruno Fernando,Atlanta Hawks,F-C,Home Basketball,6-10,240.0,Maryland,2581522.0,USA,480.0
...,...,...,...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,Home Basketball,6-3,180.0,Toledo,1719864.0,USA,360.0
588,Landry Shamet,Washington Wizards,G,Home Basketball,6-4,190.0,Wichita State,10250000.0,USA,380.0
589,Tristan Vukcevic,Washington Wizards,F,Home Basketball,6-10,220.0,Real Madrid,,USA,440.0
590,Delon Wright,Washington Wizards,G,Home Basketball,6-5,185.0,Utah,8195122.0,USA,370.0


In [31]:
print(nba["Salary"].add)

<bound method Series.add of 0       4556983.0
1      18700000.0
2       4094244.0
3      20616000.0
4       2581522.0
          ...    
587     1719864.0
588    10250000.0
589           NaN
590     8195122.0
591           NaN
Name: Salary, Length: 592, dtype: float64>


In [35]:
nba

Unnamed: 0,Name,Team,Position,Sport,Height,Weight,College,Salary,locaiion,Double Weight
0,Saddiq Bey,Atlanta Hawks,F,Home Basketball,6-7,215.0,Villanova,4556983.0,USA,430.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,Home Basketball,6-5,225.0,Fenerbahce,18700000.0,USA,450.0
2,Kobe Bufkin,Atlanta Hawks,G,Home Basketball,6-5,195.0,Michigan,4094244.0,USA,390.0
3,Clint Capela,Atlanta Hawks,C,Home Basketball,6-10,256.0,Elan Chalon,20616000.0,USA,512.0
4,Bruno Fernando,Atlanta Hawks,F-C,Home Basketball,6-10,240.0,Maryland,2581522.0,USA,480.0
...,...,...,...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,Home Basketball,6-3,180.0,Toledo,1719864.0,USA,360.0
588,Landry Shamet,Washington Wizards,G,Home Basketball,6-4,190.0,Wichita State,10250000.0,USA,380.0
589,Tristan Vukcevic,Washington Wizards,F,Home Basketball,6-10,220.0,Real Madrid,,USA,440.0
590,Delon Wright,Washington Wizards,G,Home Basketball,6-5,185.0,Utah,8195122.0,USA,370.0


In [53]:
info_columns = ["Weight", "Salary"]
info = nba[info_columns]*4
info

Unnamed: 0,Weight,Salary
0,860.0,18227932.0
1,900.0,74800000.0
2,780.0,16376976.0
3,1024.0,82464000.0
4,960.0,10326088.0
...,...,...
587,720.0,6879456.0
588,760.0,41000000.0
589,880.0,
590,740.0,32780488.0


In [54]:
nba[["info_1", "info_2"]] = info
nba

Unnamed: 0,Name,Team,Position,Sport,Height,Weight,College,Salary,locaiion,Double Weight,info_1,info_2
0,Saddiq Bey,Atlanta Hawks,F,Home Basketball,6-7,215.0,Villanova,4556983.0,USA,430.0,860.0,18227932.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,Home Basketball,6-5,225.0,Fenerbahce,18700000.0,USA,450.0,900.0,74800000.0
2,Kobe Bufkin,Atlanta Hawks,G,Home Basketball,6-5,195.0,Michigan,4094244.0,USA,390.0,780.0,16376976.0
3,Clint Capela,Atlanta Hawks,C,Home Basketball,6-10,256.0,Elan Chalon,20616000.0,USA,512.0,1024.0,82464000.0
4,Bruno Fernando,Atlanta Hawks,F-C,Home Basketball,6-10,240.0,Maryland,2581522.0,USA,480.0,960.0,10326088.0
...,...,...,...,...,...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,Home Basketball,6-3,180.0,Toledo,1719864.0,USA,360.0,720.0,6879456.0
588,Landry Shamet,Washington Wizards,G,Home Basketball,6-4,190.0,Wichita State,10250000.0,USA,380.0,760.0,41000000.0
589,Tristan Vukcevic,Washington Wizards,F,Home Basketball,6-10,220.0,Real Madrid,,USA,440.0,880.0,
590,Delon Wright,Washington Wizards,G,Home Basketball,6-5,185.0,Utah,8195122.0,USA,370.0,740.0,32780488.0


## A Review of the value_counts Method
- The `value_counts` method counts the number of times that each unique value occurs in a **Series**.
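A short sketch on a made-up **Series** shows the plain counts, the `normalize` option used in the cell that follows, and the `dropna` option for including missing values:

```python
import pandas as pd

# Hypothetical team labels with one missing entry.
teams = pd.Series(["Hawks", "Nets", "Hawks", "Heat", "Hawks", None])

counts = teams.value_counts()                    # missing values are skipped by default
shares = teams.value_counts(normalize=True)      # fractions of the non-missing values
with_missing = teams.value_counts(dropna=False)  # counts the missing entry as well

print(counts)
print(shares * 100)   # multiply by 100 to read the fractions as percentages
```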

In [56]:
nba["Team"].value_counts(normalize=True)*100

Team
Dallas Mavericks          3.891709
Miami Heat                3.722504
Denver Nuggets            3.722504
Milwaukee Bucks           3.722504
Memphis Grizzlies         3.722504
Indiana Pacers            3.553299
Utah Jazz                 3.553299
Toronto Raptors           3.553299
Philadelphia 76ers        3.553299
Oklahoma City Thunder     3.553299
New York Knicks           3.553299
Washington Wizards        3.553299
Phoenix Suns              3.384095
Houston Rockets           3.384095
Charlotte Hornets         3.384095
San Antonio Spurs         3.384095
Los Angeles Clippers      3.214890
Minnesota Timberwolves    3.214890
Detroit Pistons           3.214890
Cleveland Cavaliers       3.214890
Los Angeles Lakers        3.214890
Chicago Bulls             3.214890
Sacramento Kings          3.045685
Orlando Magic             3.045685
Boston Celtics            3.045685
Atlanta Hawks             3.045685
Portland Trail Blazers    2.876481
Golden State Warriors     2.876481
Brooklyn Nets  

In [33]:
data = pd.read_csv(r'/Users/teslim/OneDrive/adult_WS2.csv')

FileNotFoundError: [Errno 2] No such file or directory: '/Users/teslim/OneDrive/adult_WS2.csv'

In [38]:
data["education"].value_counts()

education
HS-grad         15784
Some-college    10878
Bachelors        8025
Masters          2657
Assoc-voc        2061
11th             1812
Assoc-acdm       1601
10th             1389
7th-8th           955
Prof-school       834
9th               756
12th              657
Doctorate         594
5th-6th           509
1st-4th           247
Preschool          83
Name: count, dtype: int64

## Drop Rows with Missing Values
- Pandas uses a `NaN` designation for cells that have a missing value.
- The `dropna` method deletes rows with missing values. Its default behavior is to remove a row if it has *any* missing values.
- Pass the `how` parameter an argument of `"all"` to delete only the rows where all the values are `NaN`.
- The `subset` parameter customizes/limits the columns that pandas will use to drop rows with missing values.
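The three behaviours can be compared on a small made-up table in which each row is missing a different amount of data:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: complete, one gap, fully empty, salary missing.
roster = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", np.nan, "Dee"],
        "College": ["Duke", np.nan, np.nan, "Utah"],
        "Salary": [100.0, 200.0, np.nan, np.nan],
    }
)

any_missing = roster.dropna()                    # how="any" is the default
all_missing = roster.dropna(how="all")           # drops only the fully empty row
salary_known = roster.dropna(subset=["Salary"])  # looks at the Salary column only

print(list(any_missing.index))
print(list(all_missing.index))
print(list(salary_known.index))
```

Each call returns a new **DataFrame**; `roster` itself still has all four rows.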

In [58]:
nba.dropna(how="any")

Unnamed: 0,Name,Team,Position,Sport,Height,Weight,College,Salary,locaiion,Double Weight,info_1,info_2
0,Saddiq Bey,Atlanta Hawks,F,Home Basketball,6-7,215.0,Villanova,4556983.0,USA,430.0,860.0,18227932.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,Home Basketball,6-5,225.0,Fenerbahce,18700000.0,USA,450.0,900.0,74800000.0
2,Kobe Bufkin,Atlanta Hawks,G,Home Basketball,6-5,195.0,Michigan,4094244.0,USA,390.0,780.0,16376976.0
3,Clint Capela,Atlanta Hawks,C,Home Basketball,6-10,256.0,Elan Chalon,20616000.0,USA,512.0,1024.0,82464000.0
4,Bruno Fernando,Atlanta Hawks,F-C,Home Basketball,6-10,240.0,Maryland,2581522.0,USA,480.0,960.0,10326088.0
...,...,...,...,...,...,...,...,...,...,...,...,...
585,Eugene Omoruyi,Washington Wizards,F,Home Basketball,6-6,235.0,Oregon,559782.0,USA,470.0,940.0,2239128.0
586,Jordan Poole,Washington Wizards,G,Home Basketball,6-4,194.0,Michigan,27955357.0,USA,388.0,776.0,111821428.0
587,Ryan Rollins,Washington Wizards,G,Home Basketball,6-3,180.0,Toledo,1719864.0,USA,360.0,720.0,6879456.0
588,Landry Shamet,Washington Wizards,G,Home Basketball,6-4,190.0,Wichita State,10250000.0,USA,380.0,760.0,41000000.0


In [None]:
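# Drop rows only when one of the listed columns is missing (the column choice is illustrative)
nba.dropna(subset=["College", "Salary"])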

## Fill in Missing Values with the fillna Method
- The `fillna` method replaces missing `NaN` values with its argument.
- The `fillna` method is available on both **DataFrames** and **Series**.
- The `fillna` method returns a new object rather than changing the original, so assign the result back to the column to keep the change.
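A minimal sketch of both points, using made-up data. Passing a dictionary to `fillna` on a **DataFrame** (one fill value per column) is an extra option not used elsewhere in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one gap in each column.
roster = pd.DataFrame(
    {"College": ["Duke", np.nan], "Salary": [100.0, np.nan]}
)

filled = roster["Salary"].fillna(0)                 # returns a new Series
still_missing = int(roster["Salary"].isna().sum())  # the original column is unchanged

# Assign the result back to keep it; a dict gives each column its own fill value.
roster = roster.fillna({"College": "unknown", "Salary": 0})

print(still_missing)
print(roster)
```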

In [3]:
import pandas as pd
nba = pd.read_csv("nba.csv")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [4]:
nba.fillna(0)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [6]:
nba["Salary"].fillna(100)

0       4556983.0
1      18700000.0
2       4094244.0
3      20616000.0
4       2581522.0
          ...    
587     1719864.0
588    10250000.0
589         100.0
590     8195122.0
591         100.0
Name: Salary, Length: 592, dtype: float64

In [11]:
nba["Salary"] = nba["Salary"].fillna(100)

In [12]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,100.0
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [13]:
nba["College"] = nba.College.fillna(value="unknown")

In [14]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,100.0
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


## The astype Method I
- The `astype` method converts a **Series's** values to a specified type.
- Pass in the specified type as either a string or the core Python data type.
- Pandas cannot convert `NaN` values to a standard integer type (`NaN` is itself a float), so we need to eliminate/replace them before we perform the conversion.
- The `dtypes` attribute returns a **Series** with the **DataFrame's** columns and their types.
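The sketch below, on a made-up **Series**, shows the failure when `NaN` is still present and the two equivalent ways of naming the target type once it has been replaced:

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one missing value.
salaries = pd.Series([4556983.0, np.nan, 2581522.0])

try:
    salaries.astype(int)
    failed = False
except ValueError:
    failed = True   # NaN has no integer representation

as_int = salaries.fillna(0).astype(int)        # core Python type
as_int64 = salaries.fillna(0).astype("int64")  # type given as a string

print(failed)
print(as_int.dtype, as_int64.dtype)
```

Filling with `0` is only one possible choice; whether a zero salary is an acceptable stand-in for "unknown" depends on the analysis.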

In [4]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [5]:
nba.dtypes

Name         object
Team         object
Position     object
Height       object
Weight      float64
College      object
Salary      float64
dtype: object

In [6]:
# Replace the NaN values first, before converting the column to int.
nba["Salary"] = nba["Salary"].fillna(0)
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [7]:
# Convert the Salary column to int and assign it back to the column.
nba["Salary"] = nba["Salary"].astype(int)

In [21]:
# checking the column to confirm the changes. 
nba.dtypes

Name         object
Team         object
Position     object
Height       object
Weight      float64
College      object
Salary        int64
dtype: object

In [8]:
# Apply the same principles to the Weight column.

# Fill any missing values in the column with zero.
nba["Weight"] = nba["Weight"].fillna(0)

# Change the column's data type to int.
nba["Weight"] = nba["Weight"].astype(int)

In [9]:
# checking the column to confirm the changes. 
nba.dtypes

Name        object
Team        object
Position    object
Height      object
Weight       int64
College     object
Salary       int64
dtype: object

## The astype Method II
- The `category` type is ideal for columns with a limited number of unique values.
- The `nunique` method will return a **Series** with the number of unique values in each column.
- With categories, pandas does not create a separate value in memory for each "cell". Rather, the cells point to a single copy for each unique value.
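To see the memory effect in isolation, the sketch below builds a made-up **Series** with many rows but only three distinct values, then compares `memory_usage(deep=True)` before and after the conversion. The exact byte counts depend on the pandas version and platform, so none are quoted here.

```python
import pandas as pd

# 3,000 rows but only three distinct team names (made-up data).
teams = pd.Series(["Hawks", "Nets", "Heat"] * 1000)

print(teams.nunique())   # number of distinct values

as_category = teams.astype("category")

before = teams.memory_usage(deep=True)
after = as_category.memory_usage(deep=True)
print(before, after)

# The unique labels are stored once; each row holds a small integer code.
print(list(as_category.cat.categories))
print(as_category.cat.codes.head(3).tolist())
```

The saving grows with the ratio of rows to unique values, which is why a column like `Name` (almost all unique) is a poor candidate.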

In [62]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215,Villanova,4556983
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225,Fenerbahce,18700000
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195,Michigan,4094244
3,Clint Capela,Atlanta Hawks,C,6-10,256,Elan Chalon,20616000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240,Maryland,2581522
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180,Toledo,1719864
588,Landry Shamet,Washington Wizards,G,6-4,190,Wichita State,10250000
589,Tristan Vukcevic,Washington Wizards,F,6-10,220,Real Madrid,0
590,Delon Wright,Washington Wizards,G,6-5,185,Utah,8195122


In [63]:
# This is the long way to count unique values; nunique achieves the same result.
nba["Team"].value_counts().count()

30

In [64]:
# Count the unique values in a single column.
nba["Team"].nunique()

30

In [65]:
# Count the unique values in every column of the DataFrame.
nba.nunique()

Name        591
Team         30
Position      7
Height       20
Weight       94
College     182
Salary      299
dtype: int64

In [66]:
# Check the memory usage before the category conversion.
nba.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 592 entries, 0 to 591
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Name      591 non-null    object  
 1   Team      591 non-null    category
 2   Position  584 non-null    object  
 3   Height    585 non-null    object  
 4   Weight    592 non-null    int64   
 5   College   578 non-null    object  
 6   Salary    592 non-null    int64   
dtypes: category(1), int64(2), object(4)
memory usage: 29.7+ KB


In [60]:
nba["Team"] = nba["Team"].astype("category")

In [68]:
nba["Position"]=nba["Position"].astype("category")

In [69]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 592 entries, 0 to 591
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Name      591 non-null    object  
 1   Team      591 non-null    category
 2   Position  584 non-null    category
 3   Height    585 non-null    object  
 4   Weight    592 non-null    int64   
 5   College   578 non-null    object  
 6   Salary    592 non-null    int64   
dtypes: category(2), int64(2), object(3)
memory usage: 26.0+ KB


In [11]:
# Import a second dataset to practice the same steps.
file = r'/Users/teslim/OneDrive/12_console_dca.xlsx'
dca = pd.read_excel(file)
dca.head()

Unnamed: 0,systemloanId,DCA 2,Assignment_Date,Assignment_Type,Amount_Delinquent,Repayment_Amount,mobilePhone1,emailAddress,BankName,loanAmount,Value_Date,Posting_Date,transactionId,BalanceOutstanding
0,305401649499,,"Jul 13, 2020",61-90,,38000.0,2348023133719,ajibolaibrahim2@gmail.com,First Bank,153333.34,"Aug 1, 2021",44409,24999906,25426.66
1,305919711003,Stephen Udoh,"Oct 7, 2020",91-120,14983.33,2300.0,2348186023049,peresinde68@gmail.com,GT Bank,23000.0,"Aug 1, 2021",44409,25011145,4683.33
2,305380284878,,"Jul 6, 2020",91-120,,10500.0,2348062507954,oghenekarojeremiah@gmail.com,Zenith Bank,61000.0,"Aug 1, 2021",44409,24998061,19497.24
3,305380284878,,"Jul 6, 2020",91-120,,7000.0,2348062507954,oghenekarojeremiah@gmail.com,Zenith Bank,61000.0,"Aug 1, 2021",44409,24998528,19497.24
4,305996407259,,"Oct 7, 2020",91-120,30693.4,3000.0,2348050295205,ajohnsonadewumi@gmail.com,GT Bank,41500.0,"Aug 1, 2021",44409,25001640,21693.4


In [13]:
# Count the missing values in each column.
dca.isna().sum()

systemloanId            0
DCA 2                 205
Assignment_Date         0
Assignment_Type         0
Amount_Delinquent     298
Repayment_Amount        0
mobilePhone1            0
emailAddress            0
BankName                1
loanAmount              0
Value_Date              0
Posting_Date            0
transactionId           0
BalanceOutstanding      0
dtype: int64

In [14]:

dca["DCA 2"] = dca["DCA 2"].fillna(value="un-assigned agent")
dca

Unnamed: 0,systemloanId,DCA 2,Assignment_Date,Assignment_Type,Amount_Delinquent,Repayment_Amount,mobilePhone1,emailAddress,BankName,loanAmount,Value_Date,Posting_Date,transactionId,BalanceOutstanding
0,305401649499,un-assigned agent,"Jul 13, 2020",61-90,,38000.00,2348023133719,ajibolaibrahim2@gmail.com,First Bank,153333.34,"Aug 1, 2021",44409,24999906,25426.66
1,305919711003,Stephen Udoh,"Oct 7, 2020",91-120,14983.33,2300.00,2348186023049,peresinde68@gmail.com,GT Bank,23000.00,"Aug 1, 2021",44409,25011145,4683.33
2,305380284878,un-assigned agent,"Jul 6, 2020",91-120,,10500.00,2348062507954,oghenekarojeremiah@gmail.com,Zenith Bank,61000.00,"Aug 1, 2021",44409,24998061,19497.24
3,305380284878,un-assigned agent,"Jul 6, 2020",91-120,,7000.00,2348062507954,oghenekarojeremiah@gmail.com,Zenith Bank,61000.00,"Aug 1, 2021",44409,24998528,19497.24
4,305996407259,un-assigned agent,"Oct 7, 2020",91-120,30693.40,3000.00,2348050295205,ajohnsonadewumi@gmail.com,GT Bank,41500.00,"Aug 1, 2021",44409,25001640,21693.40
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
445,306757417283,un-assigned agent,"Jan 12, 2021",91-120,,5000.00,2348178124314,samvelsirseunayobams@gmail.com,First Bank,20000.00,"Jul 31, 2021",44408,24967687,9329.37
446,306636404787,Olalekan Olaokun,"Jan 12, 2021",91-120,,50.00,2348143758790,ezughaalexander@gmail.com,Diamond Bank,20000.00,"Jul 31, 2021",44408,24974774,22858.16
447,305645708159,Olalekan Olaokun,"Jul 6, 2020",91-120,,50064.34,2348027181436,rebeccaotaru@gmail.com,EcoBank,80000.00,"Jul 31, 2021",44408,24968763,0.00
448,305785433039,Ridwan Akeeb,"Oct 7, 2020",91-120,8051.38,2528.25,2348023253702,elderags@yahoo.com,Stanbic IBTC,24500.00,"Jul 31, 2021",44408,24967669,2000.00


In [15]:
dca.nunique()

systemloanId          309
DCA 2                  11
Assignment_Date         7
Assignment_Type         3
Amount_Delinquent     105
Repayment_Amount      228
mobilePhone1          309
emailAddress          309
BankName               17
loanAmount            176
Value_Date             52
Posting_Date           31
transactionId         450
BalanceOutstanding    270
dtype: int64

In [16]:
dca.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 450 entries, 0 to 449
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   systemloanId        450 non-null    object 
 1   DCA 2               450 non-null    object 
 2   Assignment_Date     450 non-null    object 
 3   Assignment_Type     450 non-null    object 
 4   Amount_Delinquent   152 non-null    float64
 5   Repayment_Amount    450 non-null    float64
 6   mobilePhone1        450 non-null    int64  
 7   emailAddress        450 non-null    object 
 8   BankName            449 non-null    object 
 9   loanAmount          450 non-null    float64
 10  Value_Date          450 non-null    object 
 11  Posting_Date        450 non-null    int64  
 12  transactionId       450 non-null    int64  
 13  BalanceOutstanding  450 non-null    float64
dtypes: float64(4), int64(3), object(7)
memory usage: 49.3+ KB


In [24]:
dca['BankName'] = dca['BankName'].astype('category')

## Sort a DataFrame with the sort_values Method I
- The `sort_values` method sorts a **DataFrame** by the values in one or more columns. The default sort is an ascending one (alphabetical for strings).
- The first parameter (`by`) expects the column(s) to sort by.
- If sorting by a single column, pass a string with its name.
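- The `ascending` parameter customizes the sort order. It defaults to `True`; pass `False` to sort in descending order.
- The `na_position` parameter customizes where pandas places `NaN` values. It accepts `"first"` or `"last"` (the default).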
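
The parameters described above can be tried on a small frame. This is a minimal sketch: the `players` DataFrame and its values are made up for illustration and are not taken from `nba.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the nba DataFrame
players = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cal", "Dee"],
    "Salary": [3_000_000.0, np.nan, 1_000_000.0, 2_000_000.0],
})

# Ascending is the default, and NaN rows are placed last by default
lowest_first = players.sort_values("Salary")

# Descending sort, with the missing salary moved to the top
highest_first = players.sort_values("Salary", ascending=False, na_position="first")

print(lowest_first)
print(highest_first)
```

Note that `sort_values` returns a new DataFrame; `players` itself keeps its original row order.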

In [70]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215,Villanova,4556983
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225,Fenerbahce,18700000
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195,Michigan,4094244
3,Clint Capela,Atlanta Hawks,C,6-10,256,Elan Chalon,20616000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240,Maryland,2581522
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180,Toledo,1719864
588,Landry Shamet,Washington Wizards,G,6-4,190,Wichita State,10250000
589,Tristan Vukcevic,Washington Wizards,F,6-10,220,Real Madrid,0
590,Delon Wright,Washington Wizards,G,6-5,185,Utah,8195122


In [74]:
# Sort the players by salary, lowest first
nba.sort_values("Salary", ascending=True)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
591,,,,,0,,0
441,Louis King,Philadelphia 76ers,F,6-7,205,Oregon,0
440,Danuel House Jr.,Philadelphia 76ers,F-G,6-6,220,Texas A&M,0
132,Christian Wood,Dallas Mavericks,F,6-9,214,UNLV,0
133,McKinley Wright IV,Dallas Mavericks,G,5-11,192,Colorado,0
...,...,...,...,...,...,...,...
145,Nikola Jokic,Denver Nuggets,C,6-11,284,Mega Basket,47607350
261,LeBron James,Los Angeles Lakers,F,6-9,250,St. Vincent-St. Mary HS (OH),47607350
436,Joel Embiid,Philadelphia 76ers,C-F,7-0,280,Kansas,47607350
461,Kevin Durant,Phoenix Suns,F,6-10,240,Texas,47649433


In [76]:
wd.isna().sum()

systemloanId            0
DCA 2                   0
Assignment_Date         0
Assignment_Type         0
Amount_Delinquent     298
Repayment_Amount        0
mobilePhone1            0
emailAddress            0
BankName                1
loanAmount              0
Value_Date              0
Posting_Date            0
transactionId           0
BalanceOutstanding      0
dtype: int64

In [1]:
# na_position accepts "first" or "last" (the default); "first" lists the missing values at the top
wd.sort_values("Amount_Delinquent", na_position="first")

## Sort a DataFrame with the sort_values Method II
- To sort by multiple columns, pass the `by` parameter a list of column names. Pandas sorts by the first column, then uses each later column to break ties among equal values.
- Pass the `ascending` parameter a Boolean to sort all columns in a consistent order (all ascending or all descending).
- Pass `ascending` a list to customize the sort order *per* column. The `ascending` list length must match the `by` list.
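
A short sketch of both forms of `ascending` with a multi-column sort. The `players` frame below is hypothetical data chosen so that the tie-breaking is easy to follow.

```python
import pandas as pd

# Hypothetical data: two teams with two players each
players = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cal", "Dee"],
    "Team": ["Hawks", "Bulls", "Hawks", "Bulls"],
    "Salary": [3, 5, 4, 1],
})

# One Boolean applies to every column: Team A-Z, then Salary low to high
both_ascending = players.sort_values(by=["Team", "Salary"], ascending=True)

# A list sets the order per column: Team A-Z, then Salary high to low
mixed = players.sort_values(by=["Team", "Salary"], ascending=[True, False])

print(both_ascending)
print(mixed)
```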

In [51]:
dca.sort_values(by='Repayment_Amount', ascending=True)

Unnamed: 0,systemloanId,DCA 2,Assignment_Date,Assignment_Type,Amount_Delinquent,Repayment_Amount,mobilePhone1,emailAddress,BankName,loanAmount,Value_Date,Posting_Date,transactionId,BalanceOutstanding
90,306199059397,Pelumi Fatoki,"Oct 7, 2020",91-120,47.0,-62804.67,2348035324129,madu2525@yahoo.com,Zenith Bank,60000.00,"Aug 6, 2021",44414,25123176,32804.67
303,305415063205,un-assigned agent,"Oct 7, 2020",91-120,16.0,-60000.00,2348032632702,oolayode@gmail.com,UBA,181500.00,"Oct 22, 2020",44432,25451611,0.00
347,305029882099,Kehinde Abraham,"Jun 24, 2020",91-120,,-40000.00,2348023176166,peaceenikele@yahoo.com,First Bank,664000.00,"Jul 23, 2021",44403,24855076,126122.62
311,305415063205,un-assigned agent,"Oct 7, 2020",91-120,16.0,-20000.00,2348032632702,oolayode@gmail.com,UBA,181500.00,"Nov 30, 2020",44432,25451610,0.00
297,305415063205,un-assigned agent,"Oct 7, 2020",91-120,16.0,-15520.00,2348032632702,oolayode@gmail.com,UBA,181500.00,"Feb 1, 2021",44432,25451609,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
235,306555065356,Ridwan Akeeb,"Aug 11, 2021",120+,42.0,72066.05,2348065325506,joeaboh961@gmail.com,Standard Chartered,155500.00,"Aug 19, 2021",44427,25366028,0.00
183,303208867,un-assigned agent,"Oct 7, 2020",91-120,35.0,74500.00,2348125712809,segkaa@gmail.com,Zenith Bank,67500.00,"Aug 15, 2021",44423,25289017,0.00
386,305848498565,Stephen Udoh,"Jul 6, 2020",61-90,,76663.88,2347036088491,panycool@yahoo.com,First Bank,76588.34,"Jul 28, 2021",44405,24909724,0.00
208,305114297293,un-assigned agent,"Jul 13, 2020",61-90,,82180.00,2348092000006,yemijewels@gmail.com,Sterling Bank,477000.00,"Aug 18, 2021",44426,25346436,0.00


In [50]:
dca.fillna(0)

TypeError: Cannot setitem on a Categorical with a new category (0), set the categories first
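
The `TypeError` above comes from the `BankName` column, which was converted to a categorical earlier: `0` is not one of its categories, so pandas refuses to insert it. One way around this, sketched here on a hypothetical `loans` frame rather than the real `dca` data, is to fill columns individually, and to register a new category before filling a categorical column. The `"Unknown"` label is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dca, with a categorical column that has a missing value
loans = pd.DataFrame({
    "BankName": pd.Series(["UBA", None, "GT Bank"], dtype="category"),
    "Amount_Delinquent": [16.0, np.nan, 81.0],
})

# Fill only the numeric column; the categorical column is left untouched
filled = loans.fillna({"Amount_Delinquent": 0})

# To fill the categorical column, add the new category first, then fill
loans["BankName"] = loans["BankName"].cat.add_categories("Unknown").fillna("Unknown")

print(filled)
print(loans)
```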

In [44]:
dca['Amount_Delinquent']= dca['Amount_Delinquent'].rank(ascending=False).astype(int)
dca

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
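
The `IntCastingNaNError` above arises because `rank` leaves `NaN` in place for missing inputs, and NumPy's `int` type cannot represent `NaN`. Two possible workarounds are sketched below on a hypothetical Series: cast to pandas' nullable `Int64` dtype, or drop the missing values before ranking. Both use `method="min"` so that tied ranks stay whole numbers; that choice is an assumption, not something the notebook prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry and one tie
delinquent = pd.Series([16.0, np.nan, 47.0, 16.0], name="Amount_Delinquent")

# method="min" gives tied values the lowest rank in their group (no .5 ranks)
ranks = delinquent.rank(ascending=False, method="min")

# Option 1: the nullable Int64 dtype can hold missing values as <NA>
as_nullable_int = ranks.astype("Int64")

# Option 2: drop missing values first, then a plain int cast is safe
as_plain_int = delinquent.dropna().rank(ascending=False, method="min").astype(int)

print(as_nullable_int)
print(as_plain_int)
```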

In [36]:
# Sort by Salary (ascending), breaking ties by Weight (descending)
nba.sort_values(['Salary', 'Weight'], ascending=[True, False])

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
284,Kenneth Lofton Jr.,Memphis Grizzlies,F,6-6,275,Louisiana Tech,0
418,Wendell Carter Jr.,Orlando Magic,C-F,6-10,270,Duke,0
327,Meyers Leonard,Milwaukee Bucks,F-C,7-0,260,Illinois,0
457,Bismack Biyombo,Phoenix Suns,C,6-8,255,Baloncesto Fuenlabrada,0
268,Tristan Thompson,Los Angeles Lakers,C-F,6-9,254,Texas-Austin,0
...,...,...,...,...,...,...,...
145,Nikola Jokic,Denver Nuggets,C,6-11,284,Mega Basket,47607350
436,Joel Embiid,Philadelphia 76ers,C-F,7-0,280,Kansas,47607350
261,LeBron James,Los Angeles Lakers,F,6-9,250,St. Vincent-St. Mary HS (OH),47607350
461,Kevin Durant,Phoenix Suns,F,6-10,240,Texas,47649433


## Sort a DataFrame by its Index
- The `sort_index` method sorts the **DataFrame** by its index positions/labels.
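
A minimal sketch of `sort_index`, using a hypothetical frame whose index labels are out of order:

```python
import pandas as pd

# Hypothetical frame with string index labels in arbitrary order
players = pd.DataFrame(
    {"Salary": [3, 1, 2]},
    index=["Cal", "Ana", "Ben"],
)

# Sort rows by index label, A-Z (ascending is the default)
by_label = players.sort_index()

# Reverse the order with ascending=False
by_label_desc = players.sort_index(ascending=False)

print(by_label)
print(by_label_desc)
```

The same call is a convenient way to restore the original row order after a `sort_values` call, provided the index has not been reset in between.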

## Rank Values with the rank Method
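- The `rank` method assigns a numeric ranking to each **Series** value. By default the smallest value receives rank 1; pass `ascending=False` to rank the largest value first.
- Equal values receive the same rank. With the default `method="average"`, that rank is the mean of the positions the tied values occupy, which leaves a "gap" in the sequence of ranks (two values tied for first both get `1.5`, and the next value gets `3.0`).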
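
To see how ties affect the ranks, here is a small sketch comparing three values of the `method` parameter of `rank`. The `repayments` Series is hypothetical data.

```python
import pandas as pd

# Hypothetical repayments with a tie for the largest value
repayments = pd.Series([500.0, 200.0, 500.0, 100.0])

# Default method="average": tied values share the mean of their positions
average_ranks = repayments.rank(ascending=False)

# method="min": tied values share the lowest position; the next rank is skipped
min_ranks = repayments.rank(ascending=False, method="min")

# method="dense": tied values share a rank and no rank is skipped
dense_ranks = repayments.rank(ascending=False, method="dense")

print(average_ranks.tolist())  # [1.5, 3.0, 1.5, 4.0]
print(min_ranks.tolist())      # [1.0, 3.0, 1.0, 4.0]
print(dense_ranks.tolist())    # [1.0, 2.0, 1.0, 3.0]
```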

In [52]:
pca = pd.read_excel(file)

In [73]:
pca = pca.dropna()
pca.isnull().sum()

systemloanId          0
DCA 2                 0
Assignment_Date       0
Assignment_Type       0
Amount_Delinquent     0
Repayment_Amount      0
mobilePhone1          0
emailAddress          0
BankName              0
loanAmount            0
Value_Date            0
Posting_Date          0
transactionId         0
BalanceOutstanding    0
dtype: int64

In [75]:
pca['Amount_Delinquent'] = pca['Amount_Delinquent'].astype(int)
pca

Unnamed: 0,systemloanId,DCA 2,Assignment_Date,Assignment_Type,Amount_Delinquent,Repayment_Amount,mobilePhone1,emailAddress,BankName,loanAmount,Value_Date,Posting_Date,transactionId,BalanceOutstanding
1,305919711003,Stephen Udoh,"Oct 7, 2020",91-120,182,2300.00,2348186023049,peresinde68@gmail.com,GT Bank,23000.0,"Aug 1, 2021",44409,25011145,4683.33
5,305420813733,Stephen Udoh,"Jun 24, 2020",91-120,81,1000.00,2348062411335,gracemercymfb@gmail.com,First Bank,8500.0,"Aug 2, 2021",44410,25025917,4380.88
6,305573486175,Stephen Udoh,"Jun 24, 2020",91-120,81,2472.50,2348034914546,mytwinangels2014@yahoo.com,UBA,2500.0,"Aug 2, 2021",44410,25027496,0.00
8,305852396706,Kehinde Abraham,"Jul 6, 2020",61-90,81,2000.00,2348034381543,enobong77@gmail.com,UBA,20000.0,"Aug 2, 2021",44410,25023455,17490.66
9,305831623826,Pelumi Fatoki,"Jul 6, 2020",61-90,81,2000.00,2348022159679,masterchoice36@gmail.com,GT Bank,49500.0,"Aug 2, 2021",44410,25028476,57624.86
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,305580766309,Stephen Udoh,"Jun 24, 2020",91-120,81,1286.25,2348142961129,oshydamiagbolade@gmail.com,FCMB,4000.0,"Jul 31, 2021",44408,24972290,0.00
442,305935346782,Taiwo Abraham,"Jul 6, 2020",61-90,81,2081.00,2348035323478,sjegede495@gmail.com,GT Bank,7000.0,"Jul 31, 2021",44408,24967787,2006.66
446,306636404787,Olalekan Olaokun,"Jan 12, 2021",91-120,81,50.00,2348143758790,ezughaalexander@gmail.com,Diamond Bank,20000.0,"Jul 31, 2021",44408,24974774,22858.16
447,305645708159,Olalekan Olaokun,"Jul 6, 2020",91-120,81,50064.34,2348027181436,rebeccaotaru@gmail.com,EcoBank,80000.0,"Jul 31, 2021",44408,24968763,0.00


In [82]:
pca['Repayment_Amount'].rank(ascending=False)

1      161.5
5      205.0
6      157.0
8      176.0
9      176.0
       ...  
441    196.0
442    166.0
446    231.5
447      4.0
448    154.0
Name: Repayment_Amount, Length: 244, dtype: float64

In [84]:
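# Rank repayments from largest to smallest; astype(int) truncates averaged tie ranks (e.g. 161.5 becomes 161)
pca['Ranking'] = pca['Repayment_Amount'].rank(ascending=False).astype(int)
pca.sort_values('Repayment_Amount', ascending=False)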

Unnamed: 0,systemloanId,DCA 2,Assignment_Date,Assignment_Type,Amount_Delinquent,Repayment_Amount,mobilePhone1,emailAddress,BankName,loanAmount,Value_Date,Posting_Date,transactionId,BalanceOutstanding,Ranking
386,305848498565,Stephen Udoh,"Jul 6, 2020",61-90,81,76663.88,2347036088491,panycool@yahoo.com,First Bank,76588.34,"Jul 28, 2021",44405,24909724,0.00,1
235,306555065356,Ridwan Akeeb,"Aug 11, 2021",120+,230,72066.05,2348065325506,joeaboh961@gmail.com,Standard Chartered,155500.00,"Aug 19, 2021",44427,25366028,0.00,2
88,306199059397,Pelumi Fatoki,"Oct 7, 2020",91-120,226,62804.67,2348035324129,madu2525@yahoo.com,Zenith Bank,60000.00,"Aug 6, 2021",44414,25123170,32804.67,3
447,305645708159,Olalekan Olaokun,"Jul 6, 2020",91-120,81,50064.34,2348027181436,rebeccaotaru@gmail.com,EcoBank,80000.00,"Jul 31, 2021",44408,24968763,0.00,4
153,306603605157,Michael Egwuatu,"Aug 11, 2021",120+,242,50000.00,2348039410977,ogungbeifeoluwa@gmail.com,EcoBank,273000.00,"Aug 12, 2021",44420,25240931,66299.07,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
414,304553375730,Michael Egwuatu,"Jul 6, 2020",61-90,81,-5000.00,2347038175188,kotilaadeyinka@yahoo.com,GT Bank,156000.00,"Mar 19, 2021",44407,24948780,5000.00,239
420,304553375730,Michael Egwuatu,"Jul 6, 2020",61-90,81,-6000.00,2347038175188,kotilaadeyinka@yahoo.com,GT Bank,156000.00,"May 13, 2021",44407,24948778,5000.00,241
410,304553375730,Michael Egwuatu,"Jul 6, 2020",61-90,81,-10000.00,2347038175188,kotilaadeyinka@yahoo.com,GT Bank,156000.00,"Mar 22, 2021",44407,24948779,5000.00,242
347,305029882099,Kehinde Abraham,"Jun 24, 2020",91-120,81,-40000.00,2348023176166,peaceenikele@yahoo.com,First Bank,664000.00,"Jul 23, 2021",44403,24855076,126122.62,243


In [69]:
pca.sort_values(by='Amount_Delinquent', ascending=True)

Unnamed: 0,systemloanId,DCA 2,Assignment_Date,Assignment_Type,Amount_Delinquent,Repayment_Amount,mobilePhone1,emailAddress,BankName,loanAmount,Value_Date,Posting_Date,transactionId,BalanceOutstanding
171,306603605157,Michael Egwuatu,"Aug 11, 2021",120+,3,36000.00,2348039410977,ogungbeifeoluwa@gmail.com,EcoBank,273000.0,"Aug 13, 2021",44421,25251188,0.00
153,306603605157,Michael Egwuatu,"Aug 11, 2021",120+,3,50000.00,2348039410977,ogungbeifeoluwa@gmail.com,EcoBank,273000.0,"Aug 12, 2021",44420,25240931,66299.07
173,306603605157,Michael Egwuatu,"Aug 11, 2021",120+,3,18299.08,2348039410977,ogungbeifeoluwa@gmail.com,EcoBank,273000.0,"Aug 13, 2021",44421,25256796,0.00
154,306603605157,Michael Egwuatu,"Aug 11, 2021",120+,3,35000.00,2348039410977,ogungbeifeoluwa@gmail.com,EcoBank,273000.0,"Aug 12, 2021",44420,25240104,66299.07
170,306603605157,Michael Egwuatu,"Aug 11, 2021",120+,3,12000.00,2348039410977,ogungbeifeoluwa@gmail.com,EcoBank,273000.0,"Aug 13, 2021",44421,25255422,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
246,305702635922,Michael Egwuatu,"Jul 6, 2020",61-90,163,200.00,2347036668542,kanmare19@yahoo.com,GT Bank,50000.0,"Aug 20, 2021",44428,25376799,49679.04
247,305837670352,Kehinde Abraham,"Jul 6, 2020",61-90,163,2180.00,2348063816356,sundayoluwafemiomitowoju@gmail.com,First Bank,36500.0,"Aug 20, 2021",44428,25374811,28051.16
251,305903669625,Michael Egwuatu,"Jul 13, 2020",61-90,163,5000.00,2347068497568,rolandmario2@gmail.com,First Bank,27000.0,"Aug 20, 2021",44428,25386851,7546.26
80,305650928119,Pelumi Fatoki,"Jul 6, 2020",91-120,163,5000.00,2348037036726,afnanareef@gmail.com,GT Bank,75000.0,"Aug 6, 2021",44414,25124613,72700.83
