## DataFrame.loc[]
In pandas, `DataFrame` provides the `loc[]` property for selecting a __subset of a DataFrame based on row and column names/labels__. It can select single or multiple rows and columns.

Syntax:
```
DataFrame.loc[row_segment, column_segment]
DataFrame.loc[row_segment]
```

###### The `column_segment` argument is optional. If it is not provided, `loc[]` selects the subset of the DataFrame based on the `row_segment` argument only.

In [2]:
import pandas as pd

# List of Tuples
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',   'US'),
            ('Mike',  17, 'las vegas',  'US')]

# Create a DataFrame from list of tuples
df = pd.DataFrame( students,
                   columns=['Name', 'Age', 'City', 'Country'],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])
                   
print(df)

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US


### Row_segment
* It contains information about the rows to be selected. Its value can be:
  * A single label like `'A'` or `7`.
    * In this case, it selects the single row with the given label.
    * For example, if only `'B'` is given, then only the row with label `'B'` is selected from the DataFrame.
  * A list/array of labels like `['B', 'E', 'H']`.
    * In this case, multiple rows are selected based on the row labels given in the list.
    * For example, if `['B', 'E', 'H']` is given as the row segment, then the rows with labels `'B'`, `'E'` and `'H'` are selected.
  * A slice object with labels like `'a':'e'`.
    * This selects multiple rows, from the row with label `a` up to and including the row with label `e`. Unlike ordinary Python slices, a label slice in `loc[]` includes both endpoints.
    * For example, if `'B':'E'` is provided in the row segment of `loc[]`, it selects the range of rows from label `'B'` through label `'E'`.
    * To select all rows, provide the value `:`.
  * A boolean sequence of the same size as the number of rows.
    * In this case, it selects only those rows for which the corresponding value in the boolean array/list is True.
  * A callable function.
    * It can be a lambda function or a regular function that accepts the calling DataFrame as an argument and returns a valid selector in any one of the formats mentioned above.
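
The row segment forms listed above can be sketched in one place. This is a minimal illustration that rebuilds the sample DataFrame from earlier; the variable names (`single`, `several`, and so on) are made up for this example.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier
df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

single = df.loc['b']                    # single label -> Series
several = df.loc[['b', 'e']]            # list of labels -> DataFrame
label_range = df.loc['b':'e']           # label slice, both ends included
everything = df.loc[:]                  # all rows
masked = df.loc[[True, False, True, False, True, False]]  # boolean sequence
by_callable = df.loc[lambda d: d['Age'] > 30]  # callable returning a boolean Series

print(list(label_range.index))   # ['b', 'c', 'd', 'e']
print(list(by_callable.index))   # ['a', 'c', 'd']
```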

### Column_segment
* It is optional.
* It contains information about the columns to be selected. Its value can be:
  * A single label like `'A'` or `7`.
    * In this case, it selects the single column with the given label.
    * For example, if only `'Age'` is given, then only the column with label `'Age'` is selected from the DataFrame.
  * A list/array of labels like `['Name', 'Age', 'City']`.
    * In this case, multiple columns are selected based on the column labels given in the list.
    * For example, if `['Name', 'Age', 'City']` is given as the column segment, then the columns with labels `'Name'`, `'Age'` and `'City'` are selected.
  * A slice object with labels like `'a':'e'`.
    * This selects multiple columns, from the column with label `a` up to and including the column with label `e`. Both endpoints are included.
    * For example, if `'Name':'City'` is provided in the column segment of `loc[]`, it selects the range of columns from label `'Name'` through label `'City'`.
    * To select all columns, provide the value `:`.
  * A boolean sequence of the same size as the number of columns.
    * In this case, it selects only those columns for which the corresponding value in the boolean array/list is True.
  * A callable function.
    * It can be a lambda function or a regular function that accepts the calling DataFrame as an argument and returns a valid selector in any one of the formats mentioned above.
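
The column segment accepts the same forms. The sketch below uses the same sample data; the callable that picks columns whose name starts with `C` is only an illustrative choice, not something the tutorial prescribes.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

# Label slice on columns: 'City' is included in the result
name_to_city = df.loc[:, 'Name':'City']

# Callable on the column axis: it receives the DataFrame and returns
# a boolean array with one entry per column
c_columns = df.loc[:, lambda d: d.columns.str.startswith('C')]

print(list(name_to_city.columns))  # ['Name', 'Age', 'City']
print(list(c_columns.columns))     # ['City', 'Country']
```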

### Returns
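It returns the selected subset of the DataFrame based on the provided row and column labels. Depending on the selection, the result is a single value (one row label and one column label), a Series (a single row or a single column), or a DataFrame.  
* If `column_segment` is not provided, it returns the subset of the `DataFrame` containing only the rows selected by the `row_segment` argument.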
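
As a rough illustration of how the return type depends on the selectors, the following sketch checks the type of a few selections on the sample data. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

cell = df.loc['c', 'Age']          # single row + single column -> one value
row = df.loc['c']                  # single row -> Series indexed by column names
column = df.loc[:, 'Age']          # single column -> Series indexed by row labels
one_row_frame = df.loc[['c']]      # list with one label -> DataFrame with one row
block = df.loc[['a', 'c'], ['Name', 'Age']]  # lists on both axes -> DataFrame

for name, value in [('row', row), ('column', column),
                    ('one_row_frame', one_row_frame), ('block', block)]:
    print(name, type(value).__name__)
```

Wrapping a single label in a list (`['c']`) is a handy way to keep the result as a DataFrame when later code expects one.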

### Error Scenarios
`DataFrame.loc[row_segment, column_segment]` raises a `KeyError` if any label provided does not exist in the DataFrame.
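
The sketch below triggers the `KeyError` in three ways and then shows one possible way to avoid it by filtering the requested labels first. It assumes pandas 1.0 or later, where a list containing any missing label raises; the labels `'z'` and `'Salary'` are deliberately invalid examples.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c'])

errors = []

# Missing row label, alone or inside a list
for selector in ('z', ['a', 'z']):
    try:
        df.loc[selector]
    except KeyError as err:
        errors.append(type(err).__name__)

# Missing column label
try:
    df.loc[:, 'Salary']
except KeyError as err:
    errors.append(type(err).__name__)

print(errors)

# One way to avoid the exception: keep only the labels that exist
wanted = ['a', 'z']
present = [label for label in wanted if label in df.index]
safe = df.loc[present]
print(safe)
```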

## Select a Few Rows from DataFrame: Include All Column Values


### Select a Single Row of the Dataframe
To select a row from the DataFrame, pass its label to `loc[]`. For example,

In [None]:
# Select row with label name 'c'
row = df.loc['c']

print(row)

###### `loc[]` returned the row with label `c` from the DataFrame as a Series object, indexed by the column names.

### Select Multiple Rows from DataFrame Based on List of Names
You can pass a list of row label names to the `row_segment` of `loc[]`. It will return a subset of the DataFrame containing **only** mentioned rows.

In [None]:
# Select multiple rows from Dataframe by label names
subsetDf = df.loc[ ['c', 'f', 'a']]

print(subsetDf)

###### It returned a subset of the Dataframe containing only three rows with labels ‘c’, ‘f’ and ‘a’.



### Select Multiple Rows from DataFrame Based on Name Range
Pass a label range -> `start:end` in the row segment of `loc[]`. It will return a subset of the DataFrame containing **only** the rows from label `start` through label `end`, with both endpoints included.


In [None]:
# Select rows of Dataframe based on row label range
subsetDf = df.loc[ 'b' : 'f']

print(subsetDf)
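
To make the endpoint behaviour concrete, this small sketch compares a label slice in `loc[]` with a positional slice in `iloc[]` on the same sample data. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = df.loc['b':'f']     # label slice: 'f' is included
by_position = df.iloc[1:5]     # positional slice: position 5 is excluded

print(list(by_label.index))     # ['b', 'c', 'd', 'e', 'f']
print(list(by_position.index))  # ['b', 'c', 'd', 'e']
```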

### Select Rows of DataFrame Based on Boolean Array
Pass a `boolean array/list` in the row segment of `loc[]`. It will return a subset of the Dataframe containing only the rows for which the corresponding value in the boolean array/list is `True`.

In [3]:
# Select rows of Dataframe based on bool array
subsetDf = df.loc[ [True, False, True, False, True, False]]

print(subsetDf)

    Name  Age      City    Country
a   jack   34    Sydeny  Australia
c  Vikas   31    Mumbai      India
e   John   16  New York         US


### Select Rows of Dataframe Based on Callable Function
Create a lambda function that accepts a `DataFrame` as an argument, applies a condition on a *column* (e.g. Age), and returns a `bool list`. This bool list contains True **only** for those rows where the condition holds. If you pass that lambda function to `loc[]`, only the rows marked True in the list are selected.

For example, select only those rows where column `‘Age’` has a value of more than `25`:


In [None]:
# Select rows of Dataframe based on callable function
subsetDf = df.loc[ lambda x : (x['Age'] > 25).tolist()]

print(subsetDf)
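
The `.tolist()` call is not strictly required: a callable may also return the boolean Series itself. The sketch below shows that variant and one situation where a callable is arguably more convenient than a plain mask, namely in a method chain where the intermediate DataFrame has no name. The chain with `sort_values` is just an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

# Callable returning a boolean Series (no conversion to a list)
with_callable = df.loc[lambda x: x['Age'] > 25]

# The same selection with the boolean Series passed directly
with_mask = df.loc[df['Age'] > 25]

# In a chain, the callable receives the already sorted DataFrame
chained = df.sort_values('Age').loc[lambda x: x['Age'] > 25]

print(list(with_callable.index))  # ['a', 'b', 'c', 'd']
print(list(chained.index))        # ['b', 'c', 'd', 'a']
```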

## Select Multiple Columns from a DataFrame
Pass `:` as the *row* segment of `DataFrame.loc[]` to select all rows (which is what we want), and name only the columns you need (e.g. Age) in the column segment.


### Select a Single Column of a DataFrame
To select a column from the DataFrame, pass the column name in the column segment of `loc[]`.

In [None]:
# Select single column from Dataframe by column name
column = df.loc[:, 'Age']

print(column) # Returns a single column as a series object

### Select Multiple Columns from DataFrame Based on List/Range of Names
Pass a list of `column names` to the column segment of `loc[]`. It will return a subset of the Dataframe containing only mentioned columns. 


In [None]:
# Select multiple columns from DataFrame based on a list of names
subsetDf = df.loc[:, ['Age', 'City', 'Name']]

print(subsetDf)

# Select multiple columns from DataFrame based on a range of names (both ends included)
subsetDf = df.loc[:, 'Name' : 'City']
print(subsetDf)

### Select Columns of DataFrame Based on Bool Array
Pass a boolean array/list in the column segment of `loc[]`. It will return a subset of the DataFrame containing **only** the columns for which the corresponding value in the boolean array/list is True.

###### Note: we still include the `':'` in the row segment to make sure all the rows are selected (think of the two parameters as x (row) and y (column) coordinates)


In [None]:
# Select columns of Dataframe based on bool array
subsetDf = df.loc[:, [True, True, False, False]]

print(subsetDf)

It is important to remember that **the boolean list must have the same length as the number of rows or columns** (depending on which segment of `loc[]` you pass it to).
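
The following sketch shows what happens when the boolean list is too short, and one way to build a correctly sized mask from the column labels instead of typing it by hand. Recent pandas versions raise an `IndexError` here; the `except` clause also lists `ValueError` as a cautious assumption in case other versions differ.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c'])

# Two flags for four columns: the lengths do not match
length_error = None
try:
    df.loc[:, [True, False]]
except (IndexError, ValueError) as err:
    length_error = err
    print('rejected:', err)

# Deriving the mask from the columns guarantees the right length
mask = df.columns.isin(['Name', 'Age'])
subset = df.loc[:, mask]
print(list(subset.columns))  # ['Name', 'Age']
```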

## Select a Subset of DataFrame
Provide the row *and* column segment arguments of the `DataFrame.loc[]`. It will return a subset of a DataFrame based on the row and column names provided in row and column segments of `loc[]`.



### Select a Cell value from Dataframe
To select a single cell value from the dataframe, just pass the row and column name in the row and column segment of `loc[]`. 

In [None]:
# Select a Cell value from Dataframe by row and column name
cellValue = df.loc['c','Name']

print(cellValue)

### Select a Subset of a DataFrame Based on row/column Names in List
You can provide lists of row and column labels in `loc[]` to retrieve a subset from the DataFrame.


In [None]:
# Select subset of DataFrame based on row/column labels in lists
subsetDf = df.loc[['b', 'd', 'f'],['Name', 'City']]

print(subsetDf)
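
The two segments do not have to use the same kind of selector. As a sketch, the example below combines a boolean condition in the row segment with a list of labels in the column segment; the condition on `Country` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

# Rows where Country is 'India', restricted to the Name and City columns
india = df.loc[df['Country'] == 'India', ['Name', 'City']]

print(india)
```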

## Changing the Values of a DataFrame using `loc[]`
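Assigning through `loc[]` writes directly into the original DataFrame, so **any value assigned to a `loc[]` selection is reflected in the original DataFrame object**. Note that a subset *returned* by `loc[]` and stored in a variable may be a copy rather than a view, so make changes by assigning to `df.loc[...]` itself.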

In [None]:
print(df)

# Change every value in row 'c' to 0
df.loc['c'] = 0

print(df)
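
Assignment can also target a single cell or only the rows that satisfy a condition. The sketch below shows both, and contrasts them with editing an explicit copy of a subset, which leaves the original untouched. The replacement values (`35`, `'USA'`) are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydeny', 'Australia'),
     ('Riti', 30, 'Delhi', 'India'),
     ('Vikas', 31, 'Mumbai', 'India'),
     ('Neelu', 32, 'Bangalore', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Mike', 17, 'las vegas', 'US')],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'])

# Change a single cell
df.loc['c', 'Age'] = 35

# Change one column for the rows that match a condition
df.loc[df['Age'] < 18, 'Country'] = 'USA'

# Editing an explicit copy does not affect the original DataFrame
subset = df.loc[['a', 'b']].copy()
subset.loc['a', 'Age'] = 0

print(df)
print(subset)
```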