### Introduction to Data Indexing and Selection



#### **Basic Indexing and Slicing**
- A `Series` is a one-dimensional array-like object that holds a sequence of values and an associated array of labels, called its *index*.
- You can select data by label (like a dictionary key) or by integer position (like a list index).
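pandas also provides explicit accessors for the two styles listed above: `.loc` for labels and `.iloc` for integer positions. The following minimal sketch uses the same Series as the example below. The variable names are illustrative only.

```python
import pandas as pd

# Same Series as the example in the next section
data = pd.Series([0.25, 0.45, 0.32, 1.58], index=['a', 'b', 'c', 'd'])

by_label = data.loc['b']      # label-based, like a dictionary key
by_position = data.iloc[1]    # position-based, like a list index

print(by_label, by_position)  # 0.45 0.45
print('b' in data)            # True: `in` checks index labels, not values
```

Using `.loc` and `.iloc` makes the intent unambiguous, which matters most when the index itself contains integers.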

## Data Selection in Series

```python
import pandas as pd
data = pd.Series([0.25, 0.45, 0.32, 1.58], index=['a', 'b', 'c', 'd'])
print(data)
```

```
a    0.25
b    0.45
c    0.32
d    1.58
dtype: float64
```


**1. Indexing**

Pass a label inside `[]` to retrieve a single value, as you would with a dictionary key:

```python
print(data['c'])
```

```
0.32
```
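The dictionary analogy also applies to missing labels. The sketch below uses `z` as a hypothetical label that is not in the index. `[]` raises `KeyError` for it, while `Series.get` returns a default value.

```python
import pandas as pd

data = pd.Series([0.25, 0.45, 0.32, 1.58], index=['a', 'b', 'c', 'd'])

# Like a dict, [] raises KeyError for a label that is not in the index...
try:
    data['z']
except KeyError:
    print("'z' is not in the index")

# ...while .get() returns a default instead of raising
print(data.get('z', 0.0))  # 0.0
print(data.get('c'))       # 0.32
```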


**2. Slicing**

Use integer positions to slice data. As with Python lists, the stop position is excluded:

```python
print(data[1:3])
```

```
b    0.45
c    0.32
dtype: float64
```
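Slicing by position and slicing by label handle the endpoint differently. A positional slice excludes the stop position. A label slice includes the stop label. The sketch below shows both, using `.iloc` and `.loc` to make each style explicit.

```python
import pandas as pd

data = pd.Series([0.25, 0.45, 0.32, 1.58], index=['a', 'b', 'c', 'd'])

# Positional slice: stop position 3 is excluded -> b, c
positional = data.iloc[1:3]

# Label slice: stop label 'c' is included -> a, b, c
by_label = data.loc['a':'c']

print(positional.index.tolist())  # ['b', 'c']
print(by_label.index.tolist())    # ['a', 'b', 'c']
```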


**3. Masking**

Masking selects the entries for which a Boolean condition is `True`. In the example below, every value exceeds 0.2, so all four entries are returned:

```python
print(data[data > 0.2])
```

```
a    0.25
b    0.45
c    0.32
d    1.58
dtype: float64
```
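You can combine conditions with the element-wise operators `&` (and) and `|` (or). Python's `and`/`or` do not work here, because they try to reduce a whole Series to a single truth value. Each condition also needs its own parentheses, since `&` binds more tightly than the comparison operators. The thresholds below are arbitrary and chosen only for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.45, 0.32, 1.58], index=['a', 'b', 'c', 'd'])

# Element-wise AND of two conditions; the parentheses are required
mask = (data > 0.3) & (data < 1.0)
print(mask.tolist())  # [False, True, True, False]
print(data[mask])     # keeps only b and c
```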


**4. Fancy Indexing**

Fancy indexing passes a list of labels to select several entries at once:

```python
print(data[['a', 'c']])
```

```
a    0.25
c    0.32
dtype: float64
```
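The result of fancy indexing follows the order of the list you pass, not the order of the original Series. With `.iloc`, the same technique takes a list of integer positions instead of labels. A short sketch:

```python
import pandas as pd

data = pd.Series([0.25, 0.45, 0.32, 1.58], index=['a', 'b', 'c', 'd'])

# Output order follows the list, not the original order
reordered = data[['d', 'a']]
# With .iloc, fancy indexing takes integer positions
by_position = data.iloc[[0, 2]]

print(reordered.index.tolist())    # ['d', 'a']
print(by_position.index.tolist())  # ['a', 'c']
```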


## Data Selection in DataFrame

**DataFrame Introduction**
  
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

```python
import pandas as pd

data = pd.DataFrame({'area': ['California', 'Texas', 'New York'],
                     'population': [38332521, 26448193, 19651127]})
print(data)
```

```
         area  population
0  California    38332521
1       Texas    26448193
2    New York    19651127
```
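You can inspect both labeled axes directly. When no `index` is passed, as in the example above, pandas assigns default integer row labels starting at 0. A minimal sketch:

```python
import pandas as pd

data = pd.DataFrame({'area': ['California', 'Texas', 'New York'],
                     'population': [38332521, 26448193, 19651127]})

print(data.columns.tolist())  # ['area', 'population']  (column labels)
print(data.index.tolist())    # [0, 1, 2]  (default integer row labels)
print(data.shape)             # (3, 2)  (rows, columns)
```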


#### **Column Selection**
  
Select columns by name

```python
print(data['area'])
```

```
0    California
1         Texas
2      New York
Name: area, dtype: object
```
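The kind of key determines the type of the result. A single column name returns a `Series`, while a list of names (even a one-element list) returns a `DataFrame`. pandas also allows attribute-style access for column names that are valid Python identifiers. A short sketch:

```python
import pandas as pd

data = pd.DataFrame({'area': ['California', 'Texas', 'New York'],
                     'population': [38332521, 26448193, 19651127]})

col = data['area']       # single name -> Series
sub = data[['area']]     # list of names -> DataFrame
attr = data.population   # attribute access, same as data['population']

print(type(col).__name__, type(sub).__name__)  # Series DataFrame
```

Attribute access is only a convenience. It does not work for column names that contain spaces. For names that clash with `DataFrame` methods, it returns the method instead of the column. For example, `data.pop` returns the `DataFrame.pop` method, not a column named `pop`.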


#### **Row Selection**

Use `.iloc` to select a row by integer position. The row is returned as a `Series` indexed by the column names:

```python
print(data.iloc[0])
```

```
area          California
population      38332521
Name: 0, dtype: object
```
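`.iloc` also accepts slices and (row, column) position pairs. `.loc` selects by row label instead. With the default integer index, `data.loc[1]` means "the row labeled 1", which only coincides with position 1 here because the labels happen to be 0, 1, 2. A minimal sketch:

```python
import pandas as pd

data = pd.DataFrame({'area': ['California', 'Texas', 'New York'],
                     'population': [38332521, 26448193, 19651127]})

first_two = data.iloc[0:2]   # positions 0 and 1; stop is excluded
row_label_1 = data.loc[1]    # the row whose label is 1 (Texas)
cell = data.iloc[2, 1]       # row position 2, column position 1

print(first_two['area'].tolist())  # ['California', 'Texas']
print(row_label_1['area'])         # Texas
print(cell)                        # 19651127
```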


#### **Conditional Selection**

Pass a Boolean condition on a column to keep only the rows where it is `True`:

```python
print(data[data['population'] > 20000000])
```

```
         area  population
0  California    38332521
1       Texas    26448193
```
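Row conditions can be combined with `&` and `|`, as with a Series. `Series.isin` builds a mask from membership in a list. The particular filters below (excluding Texas, selecting two named states) are arbitrary examples.

```python
import pandas as pd

data = pd.DataFrame({'area': ['California', 'Texas', 'New York'],
                     'population': [38332521, 26448193, 19651127]})

# More than 20 million people, excluding Texas
mask = (data['population'] > 20_000_000) & (data['area'] != 'Texas')
large_not_texas = data[mask]

# Rows whose area appears in a given list
selected = data[data['area'].isin(['Texas', 'New York'])]

print(large_not_texas)
print(selected)
```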


#### **Combined Indexing**

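`.loc` combines masking (to choose rows) and fancy indexing (to choose columns) in a single call. This example uses the state names as the row index: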

```python
data = pd.DataFrame({'pop': [38332521, 26448193, 19651127],
                     'density': [241.7, 105.2, 411.2]},
                    index=['California', 'Texas', 'New York'])

print(data.loc[data.density > 100, ['pop', 'density']])
```

```
                 pop  density
California  38332521    241.7
Texas       26448193    105.2
New York    19651127    411.2
```
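Every state passes the `density > 100` test above, so all rows come back. A stricter threshold (200, chosen here only for illustration) shows the row mask actually filtering. The column part of `.loc` also accepts a label slice, just like the row part. A sketch:

```python
import pandas as pd

data = pd.DataFrame({'pop': [38332521, 26448193, 19651127],
                     'density': [241.7, 105.2, 411.2]},
                    index=['California', 'Texas', 'New York'])

# Row mask plus a column list: only states denser than 200
dense = data.loc[data.density > 200, ['pop']]

# Row label slice (from 'Texas' to the end) plus a single column
tail_density = data.loc['Texas':, 'density']

print(dense)
print(tail_density)
```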


#### **Setting Values**

Assign through `.iloc` to modify a value in place. Here `[0, 1]` addresses row 0, column 1, which is California's `density`:

```python
data.iloc[0, 1] = 250.0
print(data)
```

```
                 pop  density
California  38332521    250.0
Texas       26448193    105.2
New York    19651127    411.2
```
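`.loc` supports the same assignment by label. It can also assign to every row selected by a mask in one step. The cap of 400 below is a hypothetical value used only to show masked assignment. It is safest to assign through a single `.loc`/`.iloc` call. Chained assignment such as `data['density']['California'] = 250.0` may not modify `data` at all, and it never does under pandas' Copy-on-Write mode.

```python
import pandas as pd

data = pd.DataFrame({'pop': [38332521, 26448193, 19651127],
                     'density': [241.7, 105.2, 411.2]},
                    index=['California', 'Texas', 'New York'])

# Label-based equivalent of data.iloc[0, 1] = 250.0
data.loc['California', 'density'] = 250.0

# Hypothetical cap: set density to 400.0 wherever it exceeds 400
data.loc[data.density > 400, 'density'] = 400.0

print(data['density'].tolist())  # [250.0, 105.2, 400.0]
```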



#### **Additional Indexing Conventions**
- Slicing with `[]` selects rows. With labels, both endpoints are included:

```python
print(data['California':'New York'])
```

```
                 pop  density
California  38332521    250.0
Texas       26448193    105.2
New York    19651127    411.2
```
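A label slice includes its stop label, while a positional slice via `.iloc` excludes its stop position. The sketch below rebuilds the DataFrame in its state after the assignment above, so it can run on its own:

```python
import pandas as pd

data = pd.DataFrame({'pop': [38332521, 26448193, 19651127],
                     'density': [250.0, 105.2, 411.2]},
                    index=['California', 'Texas', 'New York'])

by_label = data['California':'Texas']  # label slice: 'Texas' is included
by_position = data.iloc[0:1]           # positional slice: stop 1 is excluded

print(by_label.index.tolist())     # ['California', 'Texas']
print(by_position.index.tolist())  # ['California']
```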



- Masking with `[]` also filters rows:


```python
print(data[data.density > 100])
```

```
                 pop  density
California  38332521    250.0
Texas       26448193    105.2
New York    19651127    411.2
```
