**Row Indexes**

Both the rows and the columns of a DataFrame have labels, called indexes. In the vehicles DataFrame:

- rows are labelled by numbers (0, 1, 2, etc.)
- columns are labelled with text (id, model, year, etc.)

If we talk about “the index” of a DataFrame, we’ll always be referring to the row index.

**Default Index**

By default, DataFrames come with a RangeIndex where the first row is labelled 0, the second row is labelled 1, and so on.

To confirm this, we can access the .index attribute of a DataFrame:

In [115]:
import pandas as pd
dataframe = pd.read_csv('filename.csv')
dataframe.index

RangeIndex(start=0, stop=10, step=1)

The above output indicates that the row labels: 

- start at 0
- increase by 1 for each row
- stop before reaching 10 (stop=10 is exclusive and equals the number of rows), so the last label is 9
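The same reading of start, stop and step can be checked on any small DataFrame. The sketch below uses a hypothetical three-row DataFrame named players (not the CSV loaded above), so stop is 3 rather than 10.

```python
import pandas as pd

# Hypothetical three-row DataFrame standing in for the CSV loaded above.
players = pd.DataFrame({"Games Played": [60, 53, 67]})

idx = players.index
print(idx)        # RangeIndex(start=0, stop=3, step=1)
print(list(idx))  # [0, 1, 2]

# stop equals the number of rows, so the last label is stop - 1.
print(idx.stop == len(players))  # True
```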

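An index can be reset to the default RangeIndex with the .reset_index() method. By default the old index is kept as a new column; passing drop=True discards it instead.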
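As a minimal sketch of .reset_index(), the example below builds a hypothetical DataFrame whose row labels are deliberately out of sequence (the names and labels are illustrative, not taken from the dataset) and resets it both ways.

```python
import pandas as pd

# Hypothetical data with non-default row labels (7, 3, 5).
players = pd.DataFrame(
    {
        "Player Name": ["Player 1", "Player 2", "Player 3"],
        "Games Played": [60, 53, 67],
    },
    index=[7, 3, 5],
)

# Default behaviour: the old labels move into a new column named "index".
kept = players.reset_index()
print(list(kept.columns))  # ['index', 'Player Name', 'Games Played']

# drop=True discards the old labels entirely.
dropped = players.reset_index(drop=True)
print(dropped.index)  # RangeIndex(start=0, stop=3, step=1)
```

Note that .reset_index() returns a new DataFrame; players itself keeps its original labels unless the result is assigned back.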

**Sorting Rows**

Exploring a dataset is easier if its rows have a predictable structure. For example, a dataset would be simpler to inspect if we knew that the models were listed in chronological order. To sort a DataFrame by a particular column in pandas, we use the .sort_values() method, which returns a new, sorted DataFrame:

In [116]:
df = dataframe.sort_values(
    by='Games Played',   # name of the column to sort by
    # ascending=False,   # uncomment to sort in descending order instead
)

df

Unnamed: 0,Player Name,Games Played,Points Per Game,Assists Per Game,Rebounds Per Game,Steals Per Game,Field Goal Percentage
1,Player 2,53,28.8,8.7,11.0,1.0,46.0
6,Player 7,54,24.7,10.0,11.4,2.8,51.3
4,Player 5,58,18.0,9.4,11.5,2.5,44.6
0,Player 1,60,19.6,9.0,2.5,1.9,58.6
2,Player 3,67,27.0,9.3,3.3,1.1,42.0
7,Player 8,67,11.6,5.6,10.0,1.8,54.0
8,Player 9,78,10.3,6.7,2.2,1.3,40.3
3,Player 4,79,27.6,1.6,11.9,1.8,57.5
5,Player 6,81,28.3,1.7,9.4,1.7,53.5
9,Player 10,81,5.5,1.0,10.7,2.5,53.9
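The output above contains ties in Games Played (two players at 67 and two at 81). A hedged sketch of how such ties could be broken: sort_values also accepts a list of columns and a matching list of ascending flags. The four rows below are copied from the output above; the variable names are illustrative.

```python
import pandas as pd

# Four rows copied from the output above; the DataFrame name is illustrative.
players = pd.DataFrame({
    "Player Name": ["Player 3", "Player 8", "Player 6", "Player 10"],
    "Games Played": [67, 67, 81, 81],
    "Points Per Game": [27.0, 11.6, 28.3, 5.5],
})

# Descending sort on a single column.
by_games_desc = players.sort_values(by="Games Played", ascending=False)

# Most games first; within equal games, lowest points per game first.
ranked = players.sort_values(
    by=["Games Played", "Points Per Game"],
    ascending=[False, True],
)
print(ranked["Player Name"].tolist())
# ['Player 10', 'Player 6', 'Player 8', 'Player 3']
```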


**Comparison Operators in Pandas**

Pandas can perform comparisons for an entire column at once using the comparison operators (==, !=, >, <, >=, <=). Each comparison returns a Boolean Series with one True or False value per row. For example, to determine which rows of the Games Played column contain 81, we would use the code:

In [117]:
dataframe['Games Played'] == 81

0    False
1    False
2    False
3    False
4    False
5     True
6    False
7    False
8    False
9     True
Name: Games Played, dtype: bool

What we usually want is to filter the DataFrame down to only those rows. We do this by placing the comparison inside square brackets:

In [118]:
dataframe[dataframe['Games Played'] == 81]  # it can be clearer to assign the condition inside the brackets to a variable first

Unnamed: 0,Player Name,Games Played,Points Per Game,Assists Per Game,Rebounds Per Game,Steals Per Game,Field Goal Percentage
5,Player 6,81,28.3,1.7,9.4,1.7,53.5
9,Player 10,81,5.5,1.0,10.7,2.5,53.9
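Following the suggestion in the code comment, here is a minimal sketch that stores the condition in a variable before filtering. The three rows are copied from the outputs above; the names played_81, full_rows and subset are hypothetical. The same Boolean Series can also be passed to .loc to select specific columns at the same time.

```python
import pandas as pd

# Three rows copied from the outputs above; variable names are illustrative.
players = pd.DataFrame({
    "Player Name": ["Player 4", "Player 6", "Player 10"],
    "Games Played": [79, 81, 81],
    "Points Per Game": [27.6, 28.3, 5.5],
})

# Store the Boolean Series once, then reuse it.
played_81 = players["Games Played"] == 81

# Keep every column for the matching rows.
full_rows = players[played_81]

# Keep only selected columns for the matching rows.
subset = players.loc[played_81, ["Player Name", "Points Per Game"]]
print(subset)
```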


We can also combine multiple comparisons to filter further. Each comparison goes in its own parentheses, and & keeps only the rows where both are True:

In [119]:
dataframe[(dataframe['Games Played'] == 81) & (dataframe['Points Per Game'] > 6)]

Unnamed: 0,Player Name,Games Played,Points Per Game,Assists Per Game,Rebounds Per Game,Steals Per Game,Field Goal Percentage
5,Player 6,81,28.3,1.7,9.4,1.7,53.5


Note that pandas uses | instead of or (and & instead of and) for combining comparisons:

In [120]:
dataframe[(dataframe['Assists Per Game'] > 9) | (dataframe['Points Per Game'] > 27)]

Unnamed: 0,Player Name,Games Played,Points Per Game,Assists Per Game,Rebounds Per Game,Steals Per Game,Field Goal Percentage
1,Player 2,53,28.8,8.7,11.0,1.0,46.0
2,Player 3,67,27.0,9.3,3.3,1.1,42.0
3,Player 4,79,27.6,1.6,11.9,1.8,57.5
4,Player 5,58,18.0,9.4,11.5,2.5,44.6
5,Player 6,81,28.3,1.7,9.4,1.7,53.5
6,Player 7,54,24.7,10.0,11.4,2.8,51.3
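To illustrate why | is needed, the sketch below shows that Python's or keyword raises a ValueError when applied to two Boolean Series, while | combines them row by row. The three rows are copied from the dataset shown above; the variable names are hypothetical.

```python
import pandas as pd

# Three rows copied from the dataset above; variable names are illustrative.
players = pd.DataFrame({
    "Player Name": ["Player 1", "Player 2", "Player 3"],
    "Assists Per Game": [9.0, 8.7, 9.3],
    "Points Per Game": [19.6, 28.8, 27.0],
})

many_assists = players["Assists Per Game"] > 9
high_scorer = players["Points Per Game"] > 27

# `or` asks for a single truth value of a whole Series, which pandas refuses.
try:
    many_assists or high_scorer
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# `|` compares the two Series element by element.
either = many_assists | high_scorer
print(either.tolist())  # [False, True, True]
```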


We can use ~ to negate a Boolean Series, turning every True into False and vice versa. If we have already stored a condition in a variable, we can put ~ in front of it to get the opposite result:

In [121]:
is_elite = dataframe['Points Per Game'] > 27

is_elite

0    False
1     True
2    False
3     True
4    False
5     True
6    False
7    False
8    False
9    False
Name: Points Per Game, dtype: bool

In [122]:
~is_elite

0     True
1    False
2     True
3    False
4     True
5    False
6     True
7     True
8     True
9     True
Name: Points Per Game, dtype: bool
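A negated condition can be used for filtering in the same way as any other Boolean Series. The sketch below uses four rows copied from the dataset above; non_elite is a hypothetical variable name.

```python
import pandas as pd

# Four rows copied from the dataset above; variable names are illustrative.
players = pd.DataFrame({
    "Player Name": ["Player 1", "Player 2", "Player 4", "Player 10"],
    "Points Per Game": [19.6, 28.8, 27.6, 5.5],
})

is_elite = players["Points Per Game"] > 27

# Keep only the rows where is_elite is False.
non_elite = players[~is_elite]
print(non_elite["Player Name"].tolist())  # ['Player 1', 'Player 10']
```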