## Notes from Kaggle's "Pandas" microcourse
### Topic: Indexing, Selecting, and Assigning

Native Python objects provide accessible ways of indexing data, and pandas carries these over. 

Consider the DataFrame `reviews` with a `country` column. We can access this column either as an attribute, `reviews.country`, or with the indexing operator, `reviews['country']`. Either way, we are selecting a Series from the DataFrame. 

If we want one entry from the `country` column, we can do: `reviews['country'][0]`. 
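
As a minimal sketch of the accessors above: the tiny `reviews` DataFrame below is a hypothetical stand-in for the course dataset (its values are made up for illustration), and it keeps the default integer index so that label `0` is the first row.

```python
import pandas as pd

# Hypothetical stand-in for the course's `reviews` DataFrame
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
})

# Attribute access and the indexing operator return the same Series
by_attribute = reviews.country
by_key = reviews["country"]

# Indexing that Series with the label 0 gives a single value
first_country = reviews["country"][0]
print(first_country)  # Italy
```

The bracket form is the safer habit when a column name contains spaces or other characters that are not valid in a Python attribute name.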

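Pandas also has its own accessor operators, `iloc` and `loc`. Both follow the "row-first, column-second" rule, which is the opposite of the native-style access above (`reviews['country'][0]` is column-first).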

In [None]:
#Index-based selection (iloc)

#Selecting the first row of data in a DataFrame:
reviews.iloc[0]

#Selecting the first column of data in a DataFrame: 
reviews.iloc[:, 0]

#Selecting the first three rows of the first column (country): 
reviews.iloc[:3, 0]
#Note: this selects rows 0-2 (up to, but excluding, row 3)

#Select only the second and third entries of the first column (country): 
reviews.iloc[1:3, 0] 

#or we can pass a list of positions: 
reviews.iloc[[1, 2], 0]

#Negative numbers can be used. 
#They count backwards from the *end* of the values. 
reviews.iloc[-5:] #Last five elements (rows) of the dataset

In [None]:
#Label-based selection (loc). Uses the index label values, not positions. 
#Easier when you know a column's name but not its position. 

#First entry in country:
reviews.loc[0, 'country']

#Select columns with given header names
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]    
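
The difference between position and label is easier to see when the index is not the default `0, 1, 2, ...`. The sketch below uses a hypothetical `wines` DataFrame with string index labels (both the name and the values are assumptions for illustration, not part of the course data).

```python
import pandas as pd

# Hypothetical data; the string index labels are chosen only for illustration
wines = pd.DataFrame(
    {"country": ["Italy", "France", "US"], "points": [87, 92, 90]},
    index=["a", "b", "c"],
)

# Second row, first column, addressed by integer position
by_position = wines.iloc[1, 0]

# The same cell, addressed by index label and column name
by_label = wines.loc["b", "country"]

print(by_position, by_label)  # France France
```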

### Subtleties to keep in mind

* `iloc`: Uses the Python stdlib indexing scheme
    * First element of range is included, and last is excluded, so `0:10` selects entries `0,...,9`
* `loc`: Indexes inclusively instead, so `0:10` selects entries `0,...,10`. 
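
A small sketch of this difference, assuming a hypothetical five-row `reviews` DataFrame with the default integer index (the values are made up):

```python
import pandas as pd

# Hypothetical stand-in for `reviews`, with the default index 0..4
reviews = pd.DataFrame(
    {"country": ["Italy", "Portugal", "US", "France", "Spain"]}
)

# iloc: the end of the slice is excluded, so this covers positions 0, 1, 2
positional = reviews.iloc[0:3, 0]

# loc: the end of the slice is included, so this covers labels 0, 1, 2, 3
labelled = reviews.loc[0:3, "country"]

print(len(positional))  # 3
print(len(labelled))    # 4
```

The same slice text therefore returns one more row with `loc` than with `iloc`, which is an easy source of off-by-one errors when switching between the two.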

#### Conditionals (and, or, isin, notnull):

In [None]:
#Using conditional selection to filter
reviews.country == 'Italy'
#Returns a boolean Series: True where country is 'Italy', False otherwise

#Only return rows where country == 'Italy'
reviews.loc[reviews.country == 'Italy']

# AND / OR CONDITIONALS 
#Conditions can be combined with AND (&) / OR (|); wrap each condition in parentheses
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]
reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]

# BUILT-IN CONDITIONALS 
# IS IN? Selects rows whose value is in a list; equivalent to chaining conditions with OR (|)
reviews.loc[reviews.country.isin(['Italy', 'France'])]

# NOT EMPTY: notnull
reviews.loc[reviews.price.notnull()]
#Returns rows whose price entry is not empty (not NaN)
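
To check these filters end to end, here is a self-contained sketch on a hypothetical four-row `reviews` DataFrame (the values, including the missing price, are made up for illustration). It also uses `isnull`, the companion of `notnull`, which does not appear in the cell above.

```python
import pandas as pd

# Hypothetical stand-in for `reviews`; the first row has a missing price
reviews = pd.DataFrame({
    "country": ["Italy", "France", "US", "Italy"],
    "points": [87, 92, 90, 91],
    "price": [None, 15.0, 14.0, 20.0],
})

# AND: both conditions must hold
italian_and_good = reviews.loc[
    (reviews.country == "Italy") & (reviews.points >= 90)
]

# isin gives the same rows as chaining equality checks with OR
with_isin = reviews.loc[reviews.country.isin(["Italy", "France"])]
with_or = reviews.loc[
    (reviews.country == "Italy") | (reviews.country == "France")
]

# notnull keeps rows with a price; isnull keeps rows without one
priced = reviews.loc[reviews.price.notnull()]
unpriced = reviews.loc[reviews.price.isnull()]

print(list(italian_and_good.index))  # [3]
print(with_isin.equals(with_or))     # True
print(list(priced.index))            # [1, 2, 3]
```

The parentheses around each condition matter because `&` and `|` bind more tightly than `==` and `>=` in Python, so leaving them out changes how the expression is grouped.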