In this lecture, you'll learn about indexing, selecting, and assigning data in pandas, the most popular Python library for data analysis.
This lecture covers:
1. Native accessors
2. Indexing in Pandas
3. Manipulating the index
4. Conditional selection
5. Assigning data
___

# Pre-lecture

Almost every data operation begins with selecting specific values. For this reason, quickly and effectively choosing the right data points from a pandas DataFrame or Series is a vital first skill to learn.

#### Assignment 0
Write the code to read the **winemag-data-130k-v2.csv** file into a DataFrame named **reviews**.

In [None]:
#Write your code below:

Solution:

In [None]:
import pandas as pd
reviews = pd.read_csv("Datasets/winemag-data-130k-v2.csv", index_col=0)

<br>
<br>
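The solution passes `index_col=0` because the first column of the CSV holds unnamed row numbers. The sketch below uses a small hypothetical CSV held in a string (not the real dataset) to show what that argument changes.

```python
import io

import pandas as pd

# Hypothetical CSV text whose first, unnamed column is a row number
csv_text = """,country,points
0,Italy,87
1,Portugal,87
"""

# Without index_col, the row numbers become an extra column called 'Unnamed: 0'
without_index = pd.read_csv(io.StringIO(csv_text))
print(without_index.columns.tolist())  # ['Unnamed: 0', 'country', 'points']

# With index_col=0, the first column is used as the row labels instead
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.columns.tolist())  # ['country', 'points']
print(with_index.index.tolist())  # [0, 1]
```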

In [None]:
#Run this code to preview the dataset we are going to work with:
reviews

___

# Native accessors

In Python, we can access a property of an object as an attribute. A DataFrame exposes each of its columns this way.

In [None]:
#To access the country property of reviews we can use
reviews.country

There is another way to do the same thing.<br>
Just as you use the indexing operator [] to access values in a Python dictionary, you can use it to select columns in a DataFrame.

In [None]:
#Using []
reviews['country']

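**Note:** The indexing operator [] has the advantage that it can handle column names containing spaces or other characters that are not valid in a Python identifier. For example, a hypothetical column named `wine name` could be selected with `reviews['wine name']`, but not with attribute access.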
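A minimal sketch on a small hypothetical DataFrame (the column names are made up for illustration): one name contains a space, and the other, `count`, clashes with the existing `DataFrame.count` method, so attribute access returns the method rather than the column.

```python
import pandas as pd

# Hypothetical columns: one name has a space, one shadows a DataFrame method
df = pd.DataFrame({"wine name": ["Nervosa", "Avidagos"], "count": [3, 5]})

# df.wine name would be a syntax error, but [] accepts any string label
print(df["wine name"])

# df.count is the DataFrame.count method, not the column
print(callable(df.count))  # True
# [] always returns the column
print(df["count"].tolist())  # [3, 5]
```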

To access a single specific value, simply apply the indexing operator ([]) a second time.

In [None]:
#To access the 'country' of the first entry in the dataset
reviews['country'][0]

___

# Indexing in Pandas

The indexing operator and attribute selection are convenient because they work just as they do in the rest of the Python ecosystem, which makes them easy for beginners to pick up and use. However, pandas has its own accessor operators, **loc** and **iloc**. For more advanced operations, these are the ones you should use.

#### Index-based selection
Index-based selection picks data based on its numerical position in the DataFrame.<br> 
*iloc* follows this model.<br>
*iloc* is row-first, column-second.<br>
Positions are counted from 0.
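A minimal sketch of row-first, column-second positional access on a small hypothetical DataFrame. The index labels are deliberately set to 10, 20, 30 so it is clear that *iloc* ignores them and counts positions from 0.

```python
import pandas as pd

# Hypothetical toy data standing in for reviews
df = pd.DataFrame(
    {"country": ["Italy", "Portugal", "US"], "points": [87, 85, 90]},
    index=[10, 20, 30],  # labels deliberately differ from positions
)

# iloc[row_position, column_position], both counted from 0
print(df.iloc[0, 1])  # 87: first row, second column
print(df.iloc[2, 0])  # US: third row, first column
```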

In [None]:
#To get the first row of the dataset 'reviews':
reviews.iloc[0]

In [None]:
#To get the first column, we use the following code:
reviews.iloc[:, 0]

**Note:** The colon ':' works like Python's native slice syntax: it indicates a range of values and can be used for both rows and columns.<br> 
*.iloc[]* uses integer positions (0-based) for indexing and slicing, similar to standard Python lists and NumPy arrays. The slice start:stop includes the start position but excludes the stop position. 
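A minimal sketch, using a hypothetical four-row DataFrame, showing that an *iloc* slice behaves exactly like a slice of a plain Python list: the stop position is excluded.

```python
import pandas as pd

countries = ["Italy", "Portugal", "US", "Spain"]  # hypothetical values
df = pd.DataFrame({"country": countries})

# Positions 0 and 1 are returned; position 2 (the stop) is excluded
print(df.iloc[0:2, 0].tolist())  # ['Italy', 'Portugal']
# Same result as slicing the plain list
print(countries[0:2])  # ['Italy', 'Portugal']
```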

In [None]:
#To get first three row values from the column 'country' (column no. 0) in reviews:
reviews.iloc[:3, 0]

#### Assignment 1
Write the code to get the second and third countries from the DataFrame named **reviews**.

In [None]:
#Write your code below:

In [None]:
#Solution:
reviews.iloc[1:3, 0]

We can also pass a list of positions as an argument:

In [None]:
#To get first three row values from the column 'country' (column no. 0) in 'reviews':
reviews.iloc[[0, 1, 2], 0]

Observe that we achieved the same result with `reviews.iloc[:3, 0]`.
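A list is more flexible than a slice: it can pick positions that are not adjacent, in any order. A minimal sketch on a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US", "Spain"]})  # hypothetical values

# Positions 3 and 0, returned in the order listed
print(df.iloc[[3, 0], 0].tolist())  # ['Spain', 'Italy']
```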

Lastly, we can also use negative numbers for selection; they count backwards from the end of the values.

In [None]:
#The following code returns the last three rows of the DataFrame
reviews.iloc[-3:]
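A minimal sketch of negative positions on a hypothetical five-row DataFrame: `-1` is the last row, and `-3:` is the last three rows.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US", "Spain", "France"]})  # hypothetical values

print(df.iloc[-1, 0])  # France: the last row
print(df.iloc[-3:, 0].tolist())  # ['US', 'Spain', 'France']: the last three rows
```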

#### Label-based selection
The second model of attribute selection is that of the *loc* operator. In this model, what matters is the index value (label) of the data, not its position.<br>
*loc* adheres to this philosophy.<br>
*loc* is also row-first, column-second.<br>
Unlike *iloc*, it does not count positions from 0; it matches the labels in the index.
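A minimal sketch contrasting label-based and position-based access on a hypothetical DataFrame whose row labels ('a', 'b', 'c') are not positions. Both calls reach the same cell.

```python
import pandas as pd

# Hypothetical frame whose labels do not match positions
df = pd.DataFrame(
    {"country": ["Italy", "Portugal", "US"], "points": [87, 85, 90]},
    index=["a", "b", "c"],
)

# loc takes a row label and a column label
print(df.loc["b", "points"])  # 85
# iloc reaches the same cell by position
print(df.iloc[1, 1])  # 85
```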

In [None]:
#To get the rows labelled 0 through 2 (here, the first three rows) from the column 'country' in 'reviews'
reviews.loc[:2, 'country']

**Note:** *.loc[]* uses labels for indexing and slicing. Unlike *.iloc[]*, its slice start:stop includes both the start and stop labels in the result, which is why `:2` above returns three rows.<br>
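A minimal sketch of the inclusive-stop difference on a hypothetical four-row DataFrame with the default labels 0 to 3, where labels and positions happen to coincide:

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US", "Spain"]})  # default labels 0..3

print(len(df.iloc[0:2]))  # 2: positions 0 and 1, stop excluded
print(len(df.loc[0:2]))   # 3: labels 0, 1 and 2, stop included
```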

___

# Manipulating the Index

Label-based selection derives its power from the labels in the index. Crucially, the index is not fixed: we can change it to whatever suits our analysis. For example, `set_index()` turns an existing column into the index. Note that it returns a new DataFrame and leaves `reviews` unchanged.

In [None]:
reviews.set_index('title')
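A minimal sketch, with hypothetical wine titles, of using the new index for label-based selection and confirming that the original DataFrame is not modified:

```python
import pandas as pd

# Hypothetical titles and scores
df = pd.DataFrame({"title": ["Wine A", "Wine B"], "points": [87, 90]})

by_title = df.set_index("title")
# Titles are now row labels, so loc can select by them
print(by_title.loc["Wine B", "points"])  # 90

# set_index returned a new DataFrame; df still has 'title' as a column
print(df.columns.tolist())  # ['title', 'points']
```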

___

# Conditional Selection

So far, we have indexed data using the structural properties of the DataFrame itself: positions and labels. To do more interesting things with the data, however, we often need to ask questions based on conditions.

In [None]:
#To check, for each wine, whether it is Italian
reviews.country == 'Italy'


This operation produced a Series of True/False booleans based on the country of each record. <br>This result can then be used inside loc to select the relevant data:

In [None]:
reviews.loc[reviews.country == 'Italy']
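A minimal sketch of the two steps on a hypothetical three-row DataFrame: build the boolean mask, then pass it to *loc*, which keeps only the rows where the mask is True.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "Italy"], "points": [87, 90, 91]})  # hypothetical values

mask = df.country == "Italy"
print(mask.tolist())  # [True, False, True]
print(df.loc[mask].index.tolist())  # [0, 2]: only the rows where mask is True
```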

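**For cases when we need multiple conditions to be true:**<br>
For multiple conditions, we need to use the ampersand ('&'). Each condition must be wrapped in parentheses, because & binds more tightly than comparison operators such as == and >=.<br>
Syntax: reviews.loc[(cond1) & (cond2)]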

Example:<br>
We can also fetch all wines that are better than average. <br>
Wines are reviewed on an 80-to-100 point scale, so we can treat wines that scored at least 90 points as better than average.<br>
To fetch all Italian wines that are better than average:

In [None]:
reviews.loc[(reviews.country == 'Italy') & (reviews.points>=90)]
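A minimal sketch of combining two conditions with `&` on a hypothetical DataFrame. Only the row that is both Italian and scored at least 90 survives.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "Italy"], "points": [87, 92, 91]})  # hypothetical values

# Parentheses are required: & binds more tightly than == and >=
both = df.loc[(df.country == "Italy") & (df.points >= 90)]
print(both.index.tolist())  # [2]
```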

**For cases when we need at least one of several conditions to be true:** <br>
For such situations, we need to use the pipe ('|'). As with '&', each condition must be wrapped in parentheses.<br>
Syntax: reviews.loc[(cond1) | (cond2)]

Example:<br>
To fetch all wines that are either Italian or better than average:

In [None]:
reviews.loc[(reviews.country == 'Italy') | (reviews.points>=90)]
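A minimal sketch of `|` on a hypothetical DataFrame: a row is kept if it is Italian, or scored at least 90, or both.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "France"], "points": [87, 92, 85]})  # hypothetical values

# Row 0 is Italian, row 1 scored 92, row 2 meets neither condition
either = df.loc[(df.country == "Italy") | (df.points >= 90)]
print(either.index.tolist())  # [0, 1]
```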

#### Built-in conditional selectors

Pandas comes with a few built-in conditional selectors, three of which we will highlight here.

**isin:** <br>
isin lets you select data whose value "is in" a list of values.<br> 
For example, here's how we can use it to select wines only from Italy or France:

In [None]:
reviews.loc[reviews.country.isin(['Italy', 'France'])]
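`isin` is a compact way of writing several equality checks joined by `|`. A minimal sketch on a hypothetical DataFrame confirming that both forms select the same rows:

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "France", "Spain"]})  # hypothetical values

via_isin = df.loc[df.country.isin(["Italy", "France"])]
via_or = df.loc[(df.country == "Italy") | (df.country == "France")]

print(via_isin.index.tolist())  # [0, 2]
print(via_isin.equals(via_or))  # True
```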

**isnull:** <br>
This method lets you select values that are empty (NaN).
For example, to select the wines in the dataset that lack a price tag, here's what we would do:

In [None]:
reviews.loc[reviews.price.isnull()]

**notnull:** <br>
This method lets you select values that are not empty (not NaN).
For example, to select only the wines in the dataset that have a price tag, here's what we would do:

In [None]:
reviews.loc[reviews.price.notnull()]
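A minimal sketch, with hypothetical prices, showing that `isnull` and `notnull` are complements: every row lands in exactly one of the two selections.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [15.0, np.nan, 65.0, np.nan]})  # hypothetical prices

missing = df.loc[df.price.isnull()]
priced = df.loc[df.price.notnull()]

print(missing.index.tolist())  # [1, 3]
print(priced.index.tolist())   # [0, 2]
print(len(missing) + len(priced) == len(df))  # True
```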

___

# Assigning Data

Assigning data to a DataFrame is easy.<br> You can assign either a constant value:

In [None]:
reviews['critic'] = 'everyone'
reviews
#Observe that a new column 'critic' is added at the end

Or an iterable of values, which must contain exactly one value per row:

In [None]:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews
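A minimal sketch of both kinds of assignment on a hypothetical three-row DataFrame. A scalar is repeated on every row, an iterable supplies one value per row, and an iterable of the wrong length raises a `ValueError` without adding the column.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "France"]})  # hypothetical values

df["critic"] = "everyone"                      # scalar: repeated on every row
df["index_backwards"] = range(len(df), 0, -1)  # iterable: one value per row
print(df["index_backwards"].tolist())  # [3, 2, 1]

try:
    df["too_short"] = [1, 2]  # 2 values for 3 rows
except ValueError as err:
    print("ValueError:", err)
```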
#Observe that a new column 'index_backwards' is added at the end