# Data Indexing and Selection

Data indexing and selection are the operations used to access specific parts of a structured object such as a pandas Series or DataFrame. They determine how you retrieve, view, or modify particular values in your dataset.

## Key points about data indexing and selection:

1. It allows you to access specific elements, rows, columns, or subsets of your data.

2. You can select data based on labels (like column names or row indices), numerical positions, or conditions.

3. It's a fundamental operation in data analysis, as it lets you focus on relevant parts of your data for further processing or analysis.

4. In pandas, there are various methods for indexing and selection, including:
   * Using square brackets `[]`
   * Using `.loc[]` for label-based selection
   * Using `.iloc[]` for integer-based selection
   * Boolean indexing for condition-based selection

5. These methods can be combined for more complex data retrieval operations.

In essence, data indexing and selection is about how you "ask" your data structure (like a DataFrame) to give you specific pieces of information. It's similar to how you might look up information in a spreadsheet by specifying row and column coordinates, but with more flexible and powerful options.

**1. Series (one-dimensional data)**

A Series is like a single column of data, similar to a list or a one-dimensional array.

In [16]:
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
s

a    1
b    2
c    3
d    4
e    5
dtype: int64

**You can access elements in a Series in several ways:**

a. By label (index)

In [20]:
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s['b'])  

2


b. By position

In [22]:
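s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.iloc[1])  # .iloc always selects by position, whatever the index labels are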

2
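With an integer index, labels and positions can disagree, so it helps to state explicitly which one is meant. The following is a minimal sketch; the Series `s_int` and its shuffled index are made up for illustration.

```python
import pandas as pd

# Hypothetical Series whose integer labels are not in positional order
s_int = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s_int.loc[0]      # the element whose label is 0
by_position = s_int.iloc[0]  # the first element, regardless of its label

print(by_label, by_position)  # 20 10
```

Using `.loc` and `.iloc` keeps the intent readable even when the index later changes.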


c. By slicing

In [24]:
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.iloc[1:3])  # positions 1 and 2; the stop position is excluded

b    2
c    3
dtype: int64
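Slices behave differently depending on whether they are positional or label-based. As a short sketch on the same example Series, a positional slice excludes its stop position, while a label slice with `.loc` includes its stop label:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

positional = s.iloc[1:3]   # positions 1 and 2; position 3 is excluded
by_label = s.loc['b':'d']  # labels 'b', 'c', and 'd'; the stop label is included

print(list(positional.index))  # ['b', 'c']
print(list(by_label.index))    # ['b', 'c', 'd']
```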


d. Boolean Indexing

In [28]:
print(s[s > 2]) 

c    3
d    4
e    5
dtype: int64
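Conditions can also be combined. The sketch below assumes the same example Series and uses `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` or `<`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

middle = s[(s > 1) & (s < 5)]      # both conditions must hold
extremes = s[(s == 1) | (s == 5)]  # either condition may hold
not_small = s[~(s < 3)]            # negate a condition

print(middle.tolist())     # [2, 3, 4]
print(extremes.tolist())   # [1, 5]
print(not_small.tolist())  # [3, 4, 5]
```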


**2. DataFrame (two-dimensional data)**

A DataFrame is like a table with rows and columns, similar to a spreadsheet.

In [34]:
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})
df


      name  age      city
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London


**You can access data in a DataFrame in several ways:**

a) Selecting a column

In [46]:
print(df['name'])  # Returns the 'name' column as a Series

0      Alice
1        Bob
2    Charlie
Name: name, dtype: object
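The shape of the result depends on what goes inside the brackets. As a small sketch on the same example data, a single column name returns a Series, while a list of names returns a DataFrame, even when the list has only one entry:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})

name_series = df['name']      # single label: a Series
name_frame = df[['name']]     # one-element list: a one-column DataFrame
subset = df[['name', 'age']]  # several columns, in the order requested

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(subset.shape)                # (3, 2)
```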


b) Selecting a row by label using `.loc`

In [49]:
print(df.loc[0])  # Returns the row labeled 0 (the first row under the default index)

name       Alice
age           25
city    New York
Name: 0, dtype: object


c) Selecting a row by position using `.iloc`

In [52]:
print(df.iloc[1])  # Returns the second row

name      Bob
age        30
city    Paris
Name: 1, dtype: object
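With the default index, labels and positions coincide, so `.loc` and `.iloc` look interchangeable. The sketch below makes the difference visible by assuming the `name` column is promoted to the index with `set_index`; the variable `people` is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})

# Use the names as row labels instead of 0, 1, 2
people = df.set_index('name')

bob_by_label = people.loc['Bob']  # look up the row labeled 'Bob'
bob_by_position = people.iloc[1]  # take the second row, whatever its label

print(bob_by_label['age'])                   # 30
print(bob_by_label.equals(bob_by_position))  # True
```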


d) Selecting specific cells

In [55]:
print(df.loc[1, 'age'])  # Returns Bob's age (30)

30
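For single values, pandas also provides the scalar accessors `.at` (label-based) and `.iat` (position-based). They are not used elsewhere in this section; the sketch below only shows that, on the same example data, they return the same cell as `.loc`:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})

age_loc = df.loc[1, 'age']  # row label 1, column label 'age'
age_at = df.at[1, 'age']    # same cell through the scalar accessor
age_iat = df.iat[1, 1]      # second row, second column, by position

print(age_loc, age_at, age_iat)  # 30 30 30
```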


e) Slicing

In [58]:
print(df.loc[0:2, ['name', 'city']])  # Rows labeled 0 through 2 (.loc slices include the end label), 'name' and 'city' columns

      name      city
0    Alice  New York
1      Bob     Paris
2  Charlie    London


f) Boolean Indexing

In [61]:
print(df[df['age'] > 30])  # Returns rows where age is greater than 30

      name  age    city
2  Charlie   35  London
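A boolean mask can be combined with column selection by passing both to `.loc`, and the same pattern can modify the matching cells. This is a minimal sketch on the example data; the threshold of 26 and the new age of 31 are arbitrary values chosen for illustration, and a copy is used so the original `df` stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})

# Rows where age > 26, restricted to two columns
over_26 = df.loc[df['age'] > 26, ['name', 'city']]
print(over_26['name'].tolist())  # ['Bob', 'Charlie']

# Assign through .loc on a copy: set the age for rows whose city is Paris
updated = df.copy()
updated.loc[updated['city'] == 'Paris', 'age'] = 31
print(updated.loc[1, 'age'])  # 31
```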


## Key points to remember:

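1. In a Series, you can select by label or by integer position. Prefer `.loc[]` and `.iloc[]` to make the choice explicit, because plain `[]` with an integer is treated as a label lookup when the index holds integers.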

2. In a DataFrame, use column names to select columns, and `.loc[]` or `.iloc[]` to select rows.

3. `.loc[]` selects by index label and its slices include the end label, while `.iloc[]` selects by integer position and its slices exclude the end position.

4. You can combine these methods for more complex selections.

5. Boolean indexing works similarly to NumPy, allowing you to select data based on conditions.
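To illustrate the comparison with NumPy, the sketch below applies the same condition to an array and to a Series built from it. The main difference is that the Series result keeps its index labels.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])
s = pd.Series(arr, index=['a', 'b', 'c', 'd', 'e'])

arr_selected = arr[arr > 2]  # NumPy: boolean mask over the array
s_selected = s[s > 2]        # pandas: same idea, labels are preserved

print(arr_selected.tolist())     # [3, 4, 5]
print(list(s_selected.index))    # ['c', 'd', 'e']
```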