# Indexing, Selecting & Assigning

In [1]:
import pandas as pd
titanic = pd.read_csv("../Datasets/titanic.csv")
health = pd.read_csv("../Datasets/Health_heart_experimental.csv")

## Native accessors

Native Python objects provide good ways of indexing data. Pandas carries all of these over, which helps make it easy to start with.

In [None]:
health

A DataFrame exposes each of its columns as an attribute, so we can access a column the same way we would access a property of any Python object.

In [None]:
health.age

We can also access a column using the indexing ([]) operator, as we would with a dictionary:

In [None]:
health['BMI']

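The key advantage of the indexing operator is that it can handle column names containing whitespace (or other characters that are not valid in a Python identifier), which attribute access cannot.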
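A minimal sketch of the difference, using a toy DataFrame with made-up values and a hypothetical column name containing spaces. Because "df.resting heart rate" is not valid Python syntax, the sketch compiles that expression to show it is rejected.

```python
import pandas as pd

# Toy data (made-up values); one column name contains spaces
df = pd.DataFrame({"BMI": [22.5, 31.0], "resting heart rate": [64, 80]})

# The indexing operator accepts any column label, spaces included
rates = df["resting heart rate"]
print(rates.tolist())  # [64, 80]

# Attribute access and [] return the same column for identifier-like names
print(df.BMI.equals(df["BMI"]))  # True

# Attribute syntax cannot even be written for a name with spaces
try:
    compile("df.resting heart rate", "<example>", "eval")
    attribute_ok = True
except SyntaxError:
    attribute_ok = False
print(attribute_ok)  # False
```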

To select a specific value, index into the resulting column again:

In [None]:
health['BMI'][0]

## Indexing in pandas

Pandas has two accessor operators: loc and iloc. iloc performs index-based selection, which picks data by its numerical position:

In [None]:
health.iloc[0]

Both loc and iloc are row-first, column-second. This is the opposite of the native [] indexing used above, which is column-first, row-second.
To get a column with iloc, we can do the following:

In [None]:
health.iloc[:,0]
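To make the row-first versus column-first ordering concrete, here is a small sketch on a toy DataFrame (made-up values, not the real health dataset). All three expressions reach the same cell, just with the row and column arguments in a different order.

```python
import pandas as pd

# Toy data (made-up values), not the real health dataset
df = pd.DataFrame({"age": [63, 37, 41], "BMI": [22.5, 31.0, 27.4]})

# [] chaining is column-first: column "BMI", then row label 0
a = df["BMI"][0]
# iloc is row-first: row position 0, column position 1 ("BMI")
b = df.iloc[0, 1]
# loc is also row-first: row label 0, column label "BMI"
c = df.loc[0, "BMI"]
print(a, b, c)  # 22.5 22.5 22.5

# ":" in the row slot keeps every row, so this is the whole first column
first_col = df.iloc[:, 0]
print(first_col.equals(df["age"]))  # True
```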

On its own, the : operator, which also comes from native Python, means "everything". When combined with other selectors, however, it can be used to indicate a range of values. For example, to select the first column from just the first, second, and third rows, we would do:

In [None]:
titanic.iloc[:3,0]

Or, to select just the second and third entries of the fifth column, we would do:

In [None]:
titanic.iloc[1:3,4]
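These iloc slices follow Python's usual half-open convention: the start position is included and the stop position is excluded. A quick sketch on a toy single-column DataFrame (made-up values):

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

# Start is included, stop is excluded
first_three = df.iloc[:3, 0]  # positions 0, 1, 2
middle = df.iloc[1:3, 0]      # positions 1, 2
print(first_three.tolist())   # [10, 20, 30]
print(middle.tolist())        # [20, 30]
```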

It's also possible to pass a list of positions:

In [None]:
titanic.iloc[[3,4,5],3]
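A list of positions does not have to be contiguous or sorted; iloc returns the rows in the order given and keeps their original index labels. A sketch on toy data (made-up values):

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"x": [10, 20, 30, 40, 50, 60]})

# Rows at positions 3, 4 and 5 of column position 0
picked = df.iloc[[3, 4, 5], 0]
print(picked.tolist())  # [40, 50, 60]

# Any order works; the original index labels travel with the rows
reordered = df.iloc[[5, 0, 2], 0]
print(reordered.tolist())        # [60, 10, 30]
print(reordered.index.tolist())  # [5, 0, 2]
```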

Finally, it's worth knowing that negative numbers can be used in selection. They count backwards from the end of the values, so the following selects the last five rows:

In [None]:
titanic.iloc[-5:]
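As a sanity check, a negative slice like this gives the same rows as the tail() method, and a single negative position picks one value counted from the end. A sketch on a toy DataFrame (made-up values):

```python
import pandas as pd

# Toy data: a single column holding 0..9
df = pd.DataFrame({"x": range(10)})

last_five = df.iloc[-5:]              # positions 5..9
print(last_five["x"].tolist())        # [5, 6, 7, 8, 9]
print(last_five.equals(df.tail(5)))   # True

# -1 is the last row, column position 0
last_value = df.iloc[-1, 0]
print(last_value)  # 9
```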

#### Label-based selection

The second paradigm for attribute selection is the one followed by the loc operator: label-based selection. In this paradigm, it's the data index value, not its position, which matters.

In [None]:
titanic.loc[0,'Name']

iloc is conceptually simpler than loc because it ignores the dataset's indices. When we use iloc we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. loc, by contrast, uses the information in the indices to do its work. Since datasets often have meaningful indices, it's usually easier to do things using loc instead. For example, here's one operation that's much easier using loc:

In [None]:
titanic.loc[:,['Name','Cabin']]
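The difference between labels and positions shows up clearly when the index is not simply 0, 1, 2, .... The sketch below uses a toy DataFrame with made-up names, cabins and index labels. One detail worth noting: a loc slice includes its stop label, while an iloc slice excludes its stop position.

```python
import pandas as pd

# Toy data (made-up values) whose index labels differ from positions
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid", "Dee"], "Cabin": ["C85", "B28", "E46", "C23"]},
    index=[10, 20, 30, 40],
)

# loc looks up the row LABEL 10; iloc looks up the row POSITION 0
print(df.loc[10, "Name"])  # Ann
print(df.iloc[0, 0])       # Ann

# A loc slice INCLUDES its stop label (30);
# an iloc slice EXCLUDES its stop position (3)
by_label = df.loc[20:30, "Name"]
by_position = df.iloc[1:3, 0]
print(by_label.tolist())     # ['Bob', 'Cid']
print(by_position.tolist())  # ['Bob', 'Cid']

# Selecting several columns by name
subset = df.loc[:, ["Name", "Cabin"]]
print(subset.shape)  # (4, 2)
```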

## Manipulating the index

set_index turns an existing column into the DataFrame's index. This is useful when a column provides a more meaningful index than the default integer one.

In [None]:
titanic.set_index('Name')
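Note that set_index returns a new DataFrame rather than changing the original, so the result has to be assigned to a variable if we want to keep it. A sketch on toy data (made-up names and ages), showing that the new index can then be used with loc:

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [22.0, 38.0]})

# set_index returns a NEW DataFrame; df itself is unchanged
by_name = df.set_index("Name")
print(df.index.tolist())       # [0, 1]
print(by_name.index.tolist())  # ['Ann', 'Bob']

# The former column values now act as row labels for loc
print(by_name.loc["Bob", "Age"])  # 38.0
```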

## Conditional selection

For example, to check whether each passenger in the titanic dataset is male:

In [None]:
titanic.Sex == 'male'

This returns a Series of booleans: True where the passenger's Sex is male and False otherwise.
That Series can be passed to loc to select only the matching rows:

In [None]:
titanic.loc[titanic.Sex == 'male']
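A small sketch of the same pattern on toy data (made-up passengers): the comparison builds a boolean mask, and loc keeps the rows where the mask is True. Since True counts as 1, summing the mask also counts the matches.

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"Sex": ["male", "female", "male"], "Age": [22.0, 38.0, 4.0]})

# Element-wise comparison produces a boolean Series
mask = df.Sex == "male"
print(mask.tolist())  # [True, False, True]

# loc keeps only the rows where the mask is True
males = df.loc[mask]
print(males.index.tolist())  # [0, 2]

# Summing the mask counts the matching rows
print(int(mask.sum()))  # 2
```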

Conditions can be combined with the logical operators & (AND) and | (OR), wrapping each condition in parentheses.
For example, to select only male passengers older than 25:

In [None]:
titanic.loc[(titanic.Sex == 'male') & (titanic.Age > 25.0)]

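Similarly, | keeps the rows that satisfy either condition, here passengers who are male or younger than 10:

In [None]:
titanic.loc[(titanic.Sex == 'male') | (titanic.Age < 10)]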
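The parentheses matter: in Python, & and | bind more tightly than comparison operators such as == and >, so leaving them out changes how the expression is parsed and typically raises an error. Python's own and/or keywords cannot be used on a whole Series either. A sketch on toy data (made-up passengers):

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame(
    {"Sex": ["male", "female", "male", "female"], "Age": [30.0, 38.0, 4.0, 8.0]}
)

# AND: male AND older than 25 (each condition in its own parentheses)
older_men = df.loc[(df.Sex == "male") & (df.Age > 25.0)]
print(older_men.index.tolist())  # [0]

# OR: male OR younger than 10
men_or_kids = df.loc[(df.Sex == "male") | (df.Age < 10)]
print(men_or_kids.index.tolist())  # [0, 2, 3]
```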

Pandas comes with some built-in conditional selectors.
The first is isin, which lets you select data whose value "is in" a list of values.

In [None]:
titanic.loc[titanic.Age.isin([20.0, 19.0])]
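isin is a compact replacement for chaining several equality checks with |. The sketch below, on a toy column of made-up ages, checks that both forms select the same rows.

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"Age": [19.0, 20.0, 21.0, 19.0]})

# Rows whose Age is 20.0 or 19.0
picked = df.loc[df.Age.isin([20.0, 19.0])]
print(picked.index.tolist())  # [0, 1, 3]

# Equivalent, but more verbose, with | and ==
same = df.loc[(df.Age == 20.0) | (df.Age == 19.0)]
print(picked.equals(same))  # True
```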

The second is isnull (and its companion notnull). These methods let you highlight values which are (or are not) empty (NaN).

In [None]:
titanic.loc[titanic.Age.isnull()]
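These methods are needed because NaN never compares equal to anything, including itself, so a test like == np.nan matches nothing. A sketch on a toy Age column (made-up values with two gaps):

```python
import numpy as np
import pandas as pd

# Toy data (made-up values) with two missing ages
df = pd.DataFrame({"Age": [22.0, np.nan, 4.0, np.nan]})

missing = df.loc[df.Age.isnull()]
present = df.loc[df.Age.notnull()]
print(missing.index.tolist())  # [1, 3]
print(present.index.tolist())  # [0, 2]

# Comparing with == np.nan finds nothing, because NaN != NaN
print((df.Age == np.nan).any())  # False
```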

## Assigning data

We can assign either a constant value or an iterable of values to a column.

In [None]:
# Broadcast the constant 'Male' to every row, overwriting the existing Sex column
titanic['Sex'] = 'Male'
titanic['Sex']
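A scalar on the right-hand side is broadcast to every row. Assigning it to a new column name adds a column, while assigning it to an existing name replaces all of that column's values, as happened to Sex above. A sketch on toy data (made-up values; the "Dataset" column name is hypothetical):

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Sex": ["female", "male", "male"]})

# New column: the scalar is repeated for every row ("Dataset" is hypothetical)
df["Dataset"] = "titanic"
print(df["Dataset"].tolist())  # ['titanic', 'titanic', 'titanic']

# Existing column: every original value is replaced
df["Sex"] = "Male"
print(df["Sex"].unique().tolist())  # ['Male']
```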

In [None]:
# Replace PassengerId with consecutive integers 0, 1, ..., len(titanic) - 1
titanic['PassengerId'] = range(len(titanic))
titanic['PassengerId']
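When assigning an iterable instead of a constant, it must supply exactly one value per row; otherwise pandas raises a ValueError instead of silently padding or truncating. A sketch on toy data (made-up names; the "Bad" column name is hypothetical):

```python
import pandas as pd

# Toy data (made-up values)
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"]})

# One value per row: accepted
df["PassengerId"] = range(len(df))
print(df["PassengerId"].tolist())  # [0, 1, 2]

# Too few values for three rows: rejected
try:
    df["Bad"] = [1, 2]
    length_error = False
except ValueError as err:
    length_error = True
    print("ValueError:", err)
```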

---