- **Indexing in Pandas**
- Indexing in pandas means selecting particular rows and columns of data from a DataFrame.
- Indexing is also known as **Subset Selection**.
- There are many ways to pull elements, rows, and columns from a DataFrame. pandas provides three main indexers for multi-axis selection, and a fourth, .ix, appears in older code:
  1. **DataFrame[]**: the indexing operator.
  2. **DataFrame.loc[]**: label-based selection.
  3. **DataFrame.iloc[]**: integer position-based selection.
  4. **DataFrame.ix[]**: mixed label and integer selection. It was deprecated in pandas 0.20 and removed in pandas 1.0, so use .loc[] or .iloc[] instead.
- Collectively, they are called the **indexers**.
- The indexers are used to get elements, rows, and columns from a DataFrame.
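
As a rough sketch of how the indexers differ, the snippet below builds a small DataFrame by hand instead of reading nba.csv. The three rows are a hypothetical stand-in that reuses values from the NBA data in this section. It also shows one possible way to write a mixed label and position lookup without .ix.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: three rows, four columns
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 3,
        "Number": [0.0, 99.0, 28.0],
        "Position": ["PG", "SF", "SG"],
        "Age": [25.0, 25.0, 22.0],
    },
    index=pd.Index(["Avery Bradley", "Jae Crowder", "R.J. Hunter"], name="Name"),
)

ages = data["Age"]                      # indexing operator: column by name
row_by_label = data.loc["Jae Crowder"]  # .loc: row by label
row_by_pos = data.iloc[1]               # .iloc: row by integer position

# Mixing a position with a label (what .ix used to allow) can be spelled
# explicitly by translating one side first.
age_of_second = data.loc[data.index[1], "Age"]           # position -> label
same_value = data.iloc[1, data.columns.get_loc("Age")]   # label -> position

print(row_by_label.name, row_by_pos.name)
print(age_of_second, same_value)
```

Here row_by_label and row_by_pos refer to the same row, because 'Jae Crowder' is the label at position 1.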

- **Indexing a DataFrame using indexing operator []:**
- The term indexing operator refers to the square brackets that follow an object. The .loc and .iloc indexers also use square brackets to make selections.

- **Selecting a single column**
- To select a single column, put the name of the column between the brackets.

In [2]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col='Name')

# retrieving a column with the indexing operator
first = data['Age']
print(first)

Name
Avery Bradley    25.0
Jae Crowder      25.0
John Holland     27.0
R.J. Hunter      22.0
Jonas Jerebko    29.0
                 ... 
Shelvin Mack     26.0
Raul Neto        24.0
Tibor Pleiss     26.0
Jeff Withey      26.0
NaN               NaN
Name: Age, Length: 458, dtype: float64


- **Selecting multiple columns**
- To select multiple columns, pass a list of column names to the indexing operator.

In [5]:
#Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')

# Retrieving multiple columns with the indexing operator
first = data[['Age','College','Salary']]
print(first)

                Age            College     Salary
Name                                             
Avery Bradley  25.0              Texas  7730337.0
Jae Crowder    25.0          Marquette  6796117.0
John Holland   27.0  Boston University        NaN
R.J. Hunter    22.0      Georgia State  1148640.0
Jonas Jerebko  29.0                NaN  5000000.0
...             ...                ...        ...
Shelvin Mack   26.0             Butler  2433333.0
Raul Neto      24.0                NaN   900000.0
Tibor Pleiss   26.0                NaN  2900000.0
Jeff Withey    26.0             Kansas   947276.0
NaN             NaN                NaN        NaN

[458 rows x 3 columns]
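
One detail worth noting from the two examples above: a string key returns a Series, while a list of names returns a DataFrame, even when the list holds a single name. The sketch below illustrates this on a small hand-built frame (a hypothetical stand-in for nba.csv that reuses three rows from the output above).

```python
import pandas as pd

# Hypothetical stand-in for nba.csv
data = pd.DataFrame(
    {
        "Age": [25.0, 25.0, 27.0],
        "College": ["Texas", "Marquette", "Boston University"],
    },
    index=pd.Index(["Avery Bradley", "Jae Crowder", "John Holland"], name="Name"),
)

as_series = data["Age"]    # string key -> one-dimensional Series
as_frame = data[["Age"]]   # one-element list -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The distinction matters when later code expects a two-dimensional object, for example when the result is passed to something that reads its columns attribute.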


- **Indexing a DataFrame using .loc[]:**
- The .loc indexer selects data by the **label** of the rows and columns.
- It selects data in a different way than the plain indexing operator: the first entry inside the brackets refers to row labels, and an optional second entry refers to column labels.
- It can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.
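
A minimal sketch of selecting on both axes at once, assuming a small hand-built frame in place of nba.csv (the four rows are a hypothetical stand-in that reuses values from the NBA data in this section). It also shows that label slices in .loc include both endpoints.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: four rows, five columns
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 4,
        "Number": [0.0, 99.0, 30.0, 28.0],
        "Position": ["PG", "SF", "SG", "SG"],
        "Age": [25.0, 25.0, 27.0, 22.0],
        "Salary": [7730337.0, 6796117.0, float("nan"), 1148640.0],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter"],
        name="Name",
    ),
)

# A row label plus a column label returns a single value.
salary = data.loc["Avery Bradley", "Salary"]

# Label slices include BOTH endpoints, unlike ordinary Python slices.
subset = data.loc["Avery Bradley":"John Holland", "Team":"Position"]

print(salary)
print(subset)
```

The slice returns three rows and three columns, because 'John Holland' and 'Position' are both included.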

- **Selecting a single row**
- To select a single row using .loc[], put a single row label inside the brackets.

In [6]:
# Example
import pandas as pd

# making dataframe from csv file
data = pd.read_csv('nba.csv',index_col = 'Name')

# retrieving row by loc method
first = data.loc['Avery Bradley']
second = data.loc['R.J. Hunter']

print(first, '\n\n\n', second)

Team        Boston Celtics
Number                 0.0
Position                PG
Age                   25.0
Height                 6-2
Weight               180.0
College              Texas
Salary           7730337.0
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


- **Selecting multiple rows**
- To select multiple rows, put the row labels in a list and pass that list to .loc[].

In [7]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')

# retrieving multiple rows by loc method
first = data.loc[['Avery Bradley', 'R.J. Hunter']]

print(first)

                         Team  Number Position   Age Height  Weight  \
Name                                                                  
Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
R.J. Hunter    Boston Celtics    28.0       SG  22.0    6-5   185.0   

                     College     Salary  
Name                                     
Avery Bradley          Texas  7730337.0  
R.J. Hunter    Georgia State  1148640.0  


- **Selecting multiple rows and multiple columns**
- To select multiple rows and columns, put the row labels and the column labels of interest in two separate lists. For instance:
   - DataFrame.loc[['row1', 'row2', ..., 'rowN'], ['column1', 'column2', ..., 'columnN']]


In [10]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')
# retrieving two rows and three columns by loc method
first = data.loc[['Avery Bradley', 'R.J. Hunter'], ['Team','Number','Position']]

print(first)

                         Team  Number Position
Name                                          
Avery Bradley  Boston Celtics     0.0       PG
R.J. Hunter    Boston Celtics    28.0       SG


- **Selecting all of the rows and some columns**
- To select all of the rows and some columns, use a single colon (:) for the rows and a list of the columns to keep, i.e.
  - DataFrame.loc[:, ['column1', 'column2', ..., 'columnN']]

In [11]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')

# retrieving all rows and some columns by loc method
first = data.loc[:,['Team','Number','Position']]

print(first)

                         Team  Number Position
Name                                          
Avery Bradley  Boston Celtics     0.0       PG
Jae Crowder    Boston Celtics    99.0       SF
John Holland   Boston Celtics    30.0       SG
R.J. Hunter    Boston Celtics    28.0       SG
Jonas Jerebko  Boston Celtics     8.0       PF
...                       ...     ...      ...
Shelvin Mack        Utah Jazz     8.0       PG
Raul Neto           Utah Jazz    25.0       PG
Tibor Pleiss        Utah Jazz    21.0        C
Jeff Withey         Utah Jazz    24.0        C
NaN                       NaN     NaN      NaN

[458 rows x 3 columns]


- **Indexing a DataFrame using .iloc[]:**
- The .iloc indexer allows us to retrieve rows and columns by integer position, counting from 0.
- In order to do that, we'll need to specify the positions of the rows that we want, and the positions of the columns that we want as well.
- The df.iloc indexer is very similar to df.loc but only uses integer locations to make its selections.
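
To make the contrast with .loc concrete, here is a short sketch on a hand-built frame (a hypothetical stand-in for nba.csv that reuses values from the NBA data in this section). It assumes nothing beyond standard .iloc behavior: positions start at 0, negative positions count from the end, and slices exclude the end position.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: three rows, three columns
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 3,
        "Number": [0.0, 99.0, 28.0],
        "Position": ["PG", "SF", "SG"],
    },
    index=pd.Index(["Avery Bradley", "Jae Crowder", "R.J. Hunter"], name="Name"),
)

first_row = data.iloc[0]    # position 0 is the first row
last_row = data.iloc[-1]    # negative positions count from the end

# Position slices EXCLUDE the end: rows 0 and 1, columns 0 and 1.
block = data.iloc[0:2, 0:2]

# The same selection with .loc, where label slices INCLUDE the end.
by_label = data.loc["Avery Bradley":"Jae Crowder", "Team":"Number"]

print(first_row.name, last_row.name)
print(block.equals(by_label))
```

Both selections pick the same two rows and two columns, even though the slice bounds are written differently.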

- **Selecting a single row**
- To select a single row using .iloc[], pass a single integer position inside the brackets.

In [14]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col  = 'Name')

# retrieving the row at position 3 (the fourth row) with iloc
row2 = data.iloc[3]

print(row2)

Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


- **Selecting multiple rows**
- To select multiple rows, pass a list of integer positions to .iloc[].

In [15]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')

# retrieving multiple rows by iloc method
row2 = data.iloc[[3,5,7]]

row2

                        Team  Number Position   Age Height  Weight        College      Salary
Name
R.J. Hunter   Boston Celtics    28.0       SG  22.0    6-5   185.0  Georgia State   1148640.0
Amir Johnson  Boston Celtics    90.0       PF  29.0    6-9   240.0            NaN  12000000.0
Kelly Olynyk  Boston Celtics    41.0        C  25.0    7-0   238.0        Gonzaga   2165160.0


- **Selecting two rows and two columns**
- To select two rows and two columns, create a list of two integer positions for the rows and a list of two integer positions for the columns, then pass both to .iloc[].

In [18]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')

# retrieving two rows and two columns by iloc method
row2 = data.iloc[[3,4],[1,2]]

print(row2)

               Number Position
Name                          
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF


- **Selecting all the rows and some columns**
- To select all rows and some columns, use a single colon (:) for the rows and a list of integer positions for the columns, then pass both to .iloc[].

In [21]:
# Example
import pandas as pd

data = pd.read_csv('nba.csv', index_col = 'Name')

# retrieving all rows and some columns by iloc method
row2 = data.iloc[:,[1,2]]

print(row2)

               Number Position
Name                          
Avery Bradley     0.0       PG
Jae Crowder      99.0       SF
John Holland     30.0       SG
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF
...               ...      ...
Shelvin Mack      8.0       PG
Raul Neto        25.0       PG
Tibor Pleiss     21.0        C
Jeff Withey      24.0        C
NaN               NaN      NaN

[458 rows x 2 columns]


**Pandas functions and their descriptions**
- **DataFrame.head():** Returns the first n rows of a DataFrame (5 by default).
- **DataFrame.tail():** Returns the last n rows of a DataFrame (5 by default).
- **DataFrame.at[]:** Accesses a single value for a row/column pair by label.
- **DataFrame.iat[]:** Accesses a single value for a row/column pair by integer position.
- **DataFrame.lookup():** Label-based 'fancy indexing' function for DataFrame. It was deprecated in pandas 1.2 and removed in pandas 2.0.
- **DataFrame.pop():** Returns a column and drops it from the frame.
- **DataFrame.xs():** Returns a cross-section (row(s) or column(s)) from the DataFrame.
- **DataFrame.get():** Gets an item from the object for a given key (for a DataFrame, a column), returning a default value if the key is not found.
- **DataFrame.isin():** Returns a boolean DataFrame showing whether each element in the DataFrame is contained in values.
- **DataFrame.where():** Returns an object of the same shape as the original whose entries are kept where cond is True and otherwise taken from other (NaN by default).
- **DataFrame.mask():** Returns an object of the same shape as the original whose entries are kept where cond is False and otherwise taken from other (NaN by default).
- **DataFrame.query():** Queries the columns of a frame with a boolean expression.
- **DataFrame.insert():** Inserts a column into the DataFrame at the specified location, modifying the DataFrame in place.
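
The sketch below exercises most of the entries in this list on a small hand-built frame (a hypothetical stand-in for nba.csv that reuses values from the NBA data in this section). DataFrame.lookup() is left out because it no longer exists in pandas 2.0 and later. The 'Rookie' column and the age thresholds are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: four rows, three columns
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics", "Boston Celtics", "Boston Celtics", "Utah Jazz"],
        "Position": ["PG", "SF", "SG", "PG"],
        "Age": [25.0, 25.0, 22.0, 26.0],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "R.J. Hunter", "Shelvin Mack"],
        name="Name",
    ),
)

top2 = data.head(2)       # first two rows
bottom1 = data.tail(1)    # last row

age_by_label = data.at["R.J. Hunter", "Age"]   # single value by label
age_by_pos = data.iat[2, 2]                    # single value by position

row = data.xs("Jae Crowder")                   # cross-section: one row as a Series
missing = data.get("College", default=None)    # no such column here -> default

# Boolean frame: True only where a Position value is in the given list.
guard_mask = data.isin({"Position": ["PG", "SG"]})

ages = data[["Age"]]
kept = ages.where(ages >= 25)     # keep values where cond is True, else NaN
hidden = ages.mask(ages >= 25)    # replace values where cond is True with NaN

veterans = data.query("Age >= 25 and Team == 'Boston Celtics'")

# insert() and pop() modify the frame in place, so work on a copy.
roster = data.copy()
roster.insert(0, "Rookie", roster["Age"] < 23)   # new first column
positions = roster.pop("Position")               # returned and removed

print(veterans)
print(roster)
```

Note that where() and mask() are mirror images of each other: a cell that survives one is blanked by the other.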
  