# Top 3 Pandas Functions You Don't Know About (Probably)
https://towardsdatascience.com/top-3-pandas-functions-you-dont-know-about-probably-5ae9e1c964c8

https://habr.com/ru/company/ruvds/blog/479276/

```python
import numpy as np
import pandas as pd
```

## idxmin() and idxmax()
In a nutshell, these functions return the index label of the smallest (idxmin()) or largest (idxmax()) entry. Let's say that I create the following Pandas Series:

```python
x = pd.Series([1, 3, 2, 8, 124, 4, 2, 1])
```

Now I want to find the index location of the smallest and largest items. Of course, that isn't hard to figure out just by looking at it, but you will never (and I mean never) have so few data points in your projects.
That's where idxmin() and idxmax() come in. Let's see how they work:

```python
x.idxmin()  # 0
x.idxmax()  # 4
```

Just keep in mind that both functions return the label of the first occurrence of the smallest/largest value. In x, the minimum value 1 appears at labels 0 and 7, and idxmin() returns 0.
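Because these functions return index labels rather than integer positions, the two only coincide with the default 0 to n-1 index. The sketch below reuses the same values with made-up string labels 'a' to 'h' (an assumption for illustration) and contrasts idxmin()/idxmax() with argmin()/argmax(), which return positions:

```python
import pandas as pd

# Same values as x above, but with illustrative string labels instead of 0..7
s = pd.Series([1, 3, 2, 8, 124, 4, 2, 1], index=list("abcdefgh"))

print(s.idxmin())  # a  (label of the first minimum; 1 also appears at 'h')
print(s.idxmax())  # e  (label of the maximum, 124)
print(s.argmin())  # 0  (integer position of the first minimum)
print(s.argmax())  # 4  (integer position of the maximum)
```

With labels like dates or names, idxmin()/idxmax() are what you pass to .loc, while argmin()/argmax() are what you pass to .iloc.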

## ne()
This one was a big revelation to me. A while back I was handling some time-series data at work and had the problem that the first n observations were 0.
For simplicity's sake, think of buying something but not using it for a while. The item is in your possession, but since you're not using it, the consumption on those dates is 0. As I'm only interested in the usage once you actually start using the damn thing, ne() was the function that saved the day.

Let's consider the following scenario. You have a Pandas DataFrame whose first few observations are 0:

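```python
df = pd.DataFrame({'X': [0, 0, 0, 0, 0, 0, 1, 3, 2, 4, 3, 12, 7]})
df
```

|    | X  |
|---:|---:|
| 0  | 0  |
| 1  | 0  |
| 2  | 0  |
| 3  | 0  |
| 4  | 0  |
| 5  | 0  |
| 6  | 1  |
| 7  | 3  |
| 8  | 2  |
| 9  | 4  |
| 10 | 3  |
| 11 | 12 |
| 12 | 7  |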


What ne() does is return True wherever the value is not equal to the one you specify (here, 0), and False otherwise:

```python
df['X'].ne(0)
```

```
0     False
1     False
2     False
3     False
4     False
5     False
6      True
7      True
8      True
9      True
10     True
11     True
12     True
Name: X, dtype: bool
```
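For reference, ne() is the method form of the != operator (its siblings are eq(), lt(), le(), gt() and ge()), so the mask above can be produced either way; the method form just chains more naturally. A small sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': [0, 0, 0, 0, 0, 0, 1, 3, 2, 4, 3, 12, 7]})

mask_method = df['X'].ne(0)    # method form
mask_operator = df['X'] != 0   # operator form

# Both produce the same boolean Series named 'X'
print(mask_method.equals(mask_operator))  # True

# The method form reads left to right when chained
first_nonzero = df['X'].ne(0).idxmax()
print(first_nonzero)  # 6

# Sibling comparison methods work the same way, e.g. "greater than 2"
print(df['X'].gt(2).sum())  # 5
```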

By itself this isn't very useful. This is where idxmax() comes in: since True is greater than False, chaining idxmax() onto the mask returns the label of the first True value, i.e. the first non-zero observation:

```python
df['X'].ne(0).idxmax()  # 6
```

So the first non-zero observation is at index 6. On its own, this still doesn't give you much. The good thing is that we can use this label to subset the DataFrame with .loc, keeping only the rows from the moment the item was first used:

```python
df.loc[df['X'].ne(0).idxmax():]
```

|    | X  |
|---:|---:|
| 6  | 1  |
| 7  | 3  |
| 8  | 2  |
| 9  | 4  |
| 10 | 3  |
| 11 | 12 |
| 12 | 7  |


This comes in handy all the time when you're dealing with time-series data.
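One edge case worth guarding against: if every value is 0, the mask is all False, idxmax() returns the first label, and the slice keeps every row. A minimal sketch of a wrapper that handles this; trim_leading_zeros is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

def trim_leading_zeros(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows before the first non-zero value in `col`."""
    nonzero = df[col].ne(0)
    if not nonzero.any():
        # On an all-False mask, idxmax() would return the first label and the
        # slice would keep every row, so return an empty frame instead.
        return df.iloc[0:0]
    return df.loc[nonzero.idxmax():]

df = pd.DataFrame({'X': [0, 0, 0, 0, 0, 0, 1, 3, 2, 4, 3, 12, 7]})
print(trim_leading_zeros(df, 'X'))        # rows 6 to 12

all_zero = pd.DataFrame({'X': [0, 0, 0]})
print(trim_leading_zeros(all_zero, 'X'))  # empty DataFrame
```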

```python
# number of non-zero elements:
df['X'].ne(0).sum()  # 7
```
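Since True counts as 1 and False as 0, the same mask also gives the share of non-zero observations via mean(). A short sketch on the same data, cross-checked against NumPy's count_nonzero():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [0, 0, 0, 0, 0, 0, 1, 3, 2, 4, 3, 12, 7]})
mask = df['X'].ne(0)

n_nonzero = mask.sum()       # True counts as 1, False as 0
share_nonzero = mask.mean()  # fraction of observations that are non-zero

print(n_nonzero)                # 7
print(round(share_nonzero, 3))  # 0.538

# Cross-check against NumPy on the underlying array
print(np.count_nonzero(df['X'].to_numpy()))  # 7
```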

## nsmallest() and nlargest()
You can probably guess from the names what these two functions do. Let's say that I create the following DataFrame object:

```python
df = pd.DataFrame({
    'Name': ['Bob', 'Mark', 'Steph', 'Jess', 'Becky'],
    'Points': [55, 98, 46, 77, 81]
})
df
```

|   | Name  | Points |
|--:|:------|-------:|
| 0 | Bob   | 55     |
| 1 | Mark  | 98     |
| 2 | Steph | 46     |
| 3 | Jess  | 77     |
| 4 | Becky | 81     |


Just for fun, let's say these are the points five students scored on a test. You want to find out which 3 students performed the worst:

```python
df.nsmallest(3, 'Points')
```

|   | Name  | Points |
|--:|:------|-------:|
| 2 | Steph | 46     |
| 0 | Bob   | 55     |
| 3 | Jess  | 77     |


Or which 3 students performed the best:

```python
df.nlargest(3, 'Points')
```

|   | Name  | Points |
|--:|:------|-------:|
| 1 | Mark  | 98     |
| 4 | Becky | 81     |
| 3 | Jess  | 77     |
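When scores tie, the keep parameter decides which rows make the cut. The sketch below adds a hypothetical sixth student, Tom, with the same 77 points as Jess; the default keep='first' keeps the tied row that appears first, while keep='all' returns every tied row, even if that means more than n rows:

```python
import pandas as pd

# 'Tom' is a hypothetical sixth student added only to create a tie at 77 points
df = pd.DataFrame({
    'Name': ['Bob', 'Mark', 'Steph', 'Jess', 'Becky', 'Tom'],
    'Points': [55, 98, 46, 77, 81, 77]
})

# Default keep='first': Jess (index 3) takes the last slot ahead of Tom (index 5)
top_first = df.nlargest(3, 'Points')
print(top_first)

# keep='all': both students tied at 77 are kept, so 4 rows come back
top_all = df.nlargest(3, 'Points', keep='all')
print(top_all)
```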


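These two functions are a nifty substitute for sorting with sort_values() and then taking head(n): according to the pandas documentation, df.nlargest(n, columns) is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.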
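A quick sketch checking that, on this small DataFrame with no tied scores, nlargest() and nsmallest() return exactly the same rows as sorting with sort_values() and taking head(3):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Bob', 'Mark', 'Steph', 'Jess', 'Becky'],
    'Points': [55, 98, 46, 77, 81]
})

top3 = df.nlargest(3, 'Points')
top3_sorted = df.sort_values('Points', ascending=False).head(3)

bottom3 = df.nsmallest(3, 'Points')
bottom3_sorted = df.sort_values('Points').head(3)

# Same rows, same order, same index labels
print(top3.equals(top3_sorted))        # True
print(bottom3.equals(bottom3_sorted))  # True
```

With ties the two approaches can differ in which tied rows they return, which is where the keep parameter of nlargest()/nsmallest() matters.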