# Tutorial 3: Slicing and Extracting Data in Pandas

In this tutorial, we'll explore different ways to subset, filter, and isolate data in your DataFrames using Pandas.

## Isolating One Column Using `[]`

You can isolate a single column by using square brackets `[]` with the column name. The output is a Pandas `Series` object.

```python
df['Outcome']
```
This isolates the 'Outcome' column from the DataFrame.

In [3]:
import pandas as pd
df = pd.read_csv("diabetes.csv")
df['Age']
# Isolating one column in Pandas

0      50
1      31
2      32
3      21
4      33
       ..
763    63
764    27
765    30
766    47
767    23
Name: Age, Length: 768, dtype: int64
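
To make the `Series` point concrete, here is a minimal sketch on a small stand-in DataFrame. The name `toy` and its values are made up for illustration and are not taken from `diabetes.csv`. It contrasts single brackets with a one-element list inside the brackets, which returns a one-column DataFrame instead.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative values only).
toy = pd.DataFrame({"Age": [50, 31, 32], "Outcome": [1, 0, 1]})

single = toy["Age"]    # single brackets with a name -> Series
framed = toy[["Age"]]  # a list with one name -> one-column DataFrame

print(type(single).__name__, single.shape)  # Series (3,)
print(type(framed).__name__, framed.shape)  # DataFrame (3, 1)
```

Which form is preferable depends on what the next step expects: many plotting and modeling functions want a two-dimensional DataFrame, while arithmetic on one column is usually simpler on a Series.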

## Isolating Two or More Columns Using `[[]]`

You can provide a list of column names inside the square brackets to fetch more than one column.

```python
df[['Pregnancies', 'Outcome']]
```
This isolates the 'Pregnancies' and 'Outcome' columns.

In [5]:
df[['Pregnancies', 'Age', 'Outcome']]
# Isolating several columns in Pandas

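| | Pregnancies | Age | Outcome |
|---|---|---|---|
| 0 | 6 | 50 | 1 |
| 1 | 1 | 31 | 0 |
| 2 | 8 | 32 | 1 |
| 3 | 1 | 21 | 0 |
| 4 | 0 | 33 | 1 |
| ... | ... | ... | ... |
| 763 | 10 | 63 | 0 |
| 764 | 2 | 27 | 0 |
| 765 | 5 | 30 | 0 |
| 766 | 1 | 47 | 1 |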
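
As a small sketch of how the list form behaves, the example below uses a made-up DataFrame (`toy` and `wanted` are hypothetical names). It shows that the result follows the order of the list rather than the order of the original columns, and that asking for a column that does not exist raises a `KeyError`.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative values only).
toy = pd.DataFrame({"Pregnancies": [6, 1], "Age": [50, 31], "Outcome": [1, 0]})

# The list can live in a variable; the result uses the list's order.
wanted = ["Outcome", "Pregnancies"]
subset = toy[wanted]
print(list(subset.columns))  # ['Outcome', 'Pregnancies']

# A name that is not a column raises KeyError.
missing_raises = False
try:
    toy[["Age", "Missing"]]
except KeyError:
    missing_raises = True
print(missing_raises)  # True
```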


## Isolating One Row Using `[]`

A single row can be fetched by passing a boolean mask that contains exactly one `True` value. The result is a one-row DataFrame.

```python
df[df.index == 1]
```
This isolates the row with index 1.

In [6]:
df.head(n=5)

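| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | NaN | 35 | NaN | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | NaN | 29 | 0.0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64.0 | 0 | 0.0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66.0 | 23 | 94.0 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40.0 | 35 | 168.0 | 43.1 | 2.288 | 33 | 1 |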


In [3]:
df[df.index == 1]
# Isolating one row in Pandas

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 85 | NaN | 29 | 0.0 | 26.6 | 0.351 | 31 | 0 |
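
The following sketch, on a made-up DataFrame named `toy`, makes the intermediate mask visible. Comparing the index with a value produces one boolean per row, and indexing with that mask keeps only the rows marked `True`.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative values only).
toy = pd.DataFrame({"Glucose": [148, 85, 183], "Outcome": [1, 0, 1]})

mask = toy.index == 1  # one boolean per row
one_row = toy[mask]    # keeps only the rows where the mask is True

print(mask.tolist())           # [False, True, False]
print(type(one_row).__name__)  # DataFrame
print(one_row.shape)           # (1, 2)
```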


## Isolating Two or More Rows Using `[]`

You can return two or more rows using the `.isin()` method.

```python
df[df.index.isin(range(2, 10))]
```
This isolates the rows with index labels 2 through 9, because `range(2, 10)` excludes its endpoint 10.

In [8]:
df[df.index.isin(range(2, 10))]
# Isolating specific rows in Pandas

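| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 8 | 183 | 64.0 | 0 | 0.0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66.0 | 23 | 94.0 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40.0 | 35 | 168.0 | 43.1 | 2.288 | 33 | 1 |
| 5 | 5 | 116 | 74.0 | 0 | 0.0 | 25.6 | 0.201 | 30 | 0 |
| 6 | 3 | 78 | 50.0 | 32 | NaN | 31.0 | 0.248 | 26 | 1 |
| 7 | 10 | 115 | 0.0 | 0 | 0.0 | 35.3 | 0.134 | 29 | 0 |
| 8 | 2 | 197 | 70.0 | 45 | 543.0 | 30.5 | 0.158 | 53 | 1 |
| 9 | 8 | 125 | 96.0 | 0 | 0.0 | 0.0 | 0.232 | 54 | 1 |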
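
`.isin()` is not limited to a contiguous range. The sketch below, on a made-up DataFrame named `toy`, passes an arbitrary list of labels and also shows that labels absent from the index are silently ignored rather than raising an error.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative values only).
toy = pd.DataFrame({"Glucose": [148, 85, 183, 89, 137]})

# Non-contiguous labels work just as well as a range.
picked = toy[toy.index.isin([0, 3, 4])]
print(picked.index.tolist())  # [0, 3, 4]

# Labels that are not in the index (99 here) simply match nothing.
partial = toy[toy.index.isin([3, 99])]
print(partial.index.tolist())  # [3]
```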


## Using `.loc[]` and `.iloc[]` to Fetch Rows

You can fetch specific rows with `.loc[]` and `.iloc[]` ("location" and "integer location"). `.loc[]` selects by index label or by a boolean condition, while `.iloc[]` selects by integer position.

```python
df.loc[1]
```
This fetches the row with label 1.

```python
df.iloc[1]
```
This fetches the second row (position 1, counting from 0).

In [9]:
df.loc[1]
# Fetching a row by label with .loc[]

Pregnancies                  1.000
Glucose                     85.000
BloodPressure                  NaN
SkinThickness               29.000
Insulin                      0.000
BMI                         26.600
DiabetesPedigreeFunction     0.351
Age                         31.000
Outcome                      0.000
Name: 1, dtype: float64

In [11]:
df[df.index == 1]
# For comparison: boolean indexing returns a one-row DataFrame, while .loc[1] returns a Series

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 85 | NaN | 29 | 0.0 | 26.6 | 0.351 | 31 | 0 |


In [10]:
df.iloc[1]
# Fetching a row by position with .iloc[]

Pregnancies                  1.000
Glucose                     85.000
BloodPressure                  NaN
SkinThickness               29.000
Insulin                      0.000
BMI                         26.600
DiabetesPedigreeFunction     0.351
Age                         31.000
Outcome                      0.000
Name: 1, dtype: float64
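
`df.loc[1]` and `df.iloc[1]` return the same row here only because the DataFrame has the default index 0, 1, 2, and so on, so label and position coincide. The sketch below uses a made-up DataFrame (`toy`, with hypothetical labels 10, 20, 30) to show how the two diverge once the index differs from the row positions.

```python
import pandas as pd

# Toy frame with a non-default index (labels and values are illustrative only).
toy = pd.DataFrame({"Age": [50, 31, 32]}, index=[10, 20, 30])

by_position = toy.iloc[1]  # second row, whatever its label is
by_label = toy.loc[20]     # the row whose label is 20

print(by_position["Age"], by_label["Age"])  # 31 31

# There is no row labeled 1, so .loc[1] raises KeyError.
label_missing = False
try:
    toy.loc[1]
except KeyError:
    label_missing = True
print(label_missing)  # True
```

This situation arises often in practice, for example after filtering or sorting, because the surviving rows keep their original labels.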

## Fetching Multiple Rows with `.loc[]` and `.iloc[]`

You can also fetch multiple rows by providing a slice.

```python
df.loc[100:110]
```
This fetches the rows labeled 100 through 110. A `.loc[]` slice is label-based and includes both endpoints.

```python
df.iloc[100:110]
```
This fetches the rows from position 100 to 109. An `.iloc[]` slice excludes its end position, like a regular Python slice.

In [12]:
df.loc[100:110]
# Fetching multiple rows with .loc[]

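| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 163 | 72.0 | 0 | 0.0 | 39.0 | 1.222 | 33 | 1 |
| 101 | 1 | 151 | 60.0 | 0 | 0.0 | 26.1 | 0.179 | 22 | 0 |
| 102 | 0 | 125 | 96.0 | 0 | 0.0 | 22.5 | 0.262 | 21 | 0 |
| 103 | 1 | 81 | 72.0 | 18 | 40.0 | 26.6 | 0.283 | 24 | 0 |
| 104 | 2 | 85 | 65.0 | 0 | 0.0 | 39.6 | 0.93 | 27 | 0 |
| 105 | 1 | 126 | 56.0 | 29 | 152.0 | 28.7 | 0.801 | 21 | 0 |
| 106 | 1 | 96 | 122.0 | 0 | 0.0 | 22.4 | 0.207 | 27 | 0 |
| 107 | 4 | 144 | 58.0 | 28 | 140.0 | 29.5 | 0.287 | 37 | 0 |
| 108 | 3 | 83 | 58.0 | 31 | 18.0 | 34.3 | 0.336 | 25 | 0 |
| 109 | 0 | 95 | 85.0 | 25 | 36.0 | 37.4 | 0.247 | 24 | 1 |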


In [8]:
df.iloc[100:110]
# Fetching multiple rows with .iloc[]

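| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 163 | 72.0 | 0 | 0.0 | 39.0 | 1.222 | 33 | 1 |
| 101 | 1 | 151 | 60.0 | 0 | 0.0 | 26.1 | 0.179 | 22 | 0 |
| 102 | 0 | 125 | 96.0 | 0 | 0.0 | 22.5 | 0.262 | 21 | 0 |
| 103 | 1 | 81 | 72.0 | 18 | 40.0 | 26.6 | 0.283 | 24 | 0 |
| 104 | 2 | 85 | 65.0 | 0 | 0.0 | 39.6 | 0.93 | 27 | 0 |
| 105 | 1 | 126 | 56.0 | 29 | 152.0 | 28.7 | 0.801 | 21 | 0 |
| 106 | 1 | 96 | 122.0 | 0 | 0.0 | 22.4 | 0.207 | 27 | 0 |
| 107 | 4 | 144 | 58.0 | 28 | 140.0 | 29.5 | 0.287 | 37 | 0 |
| 108 | 3 | 83 | 58.0 | 31 | 18.0 | 34.3 | 0.336 | 25 | 0 |
| 109 | 0 | 95 | 85.0 | 25 | 36.0 | 37.4 | 0.247 | 24 | 1 |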
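
The two slices treat their endpoints differently, which is easy to check by counting rows. The sketch below uses a made-up 200-row DataFrame (`toy`, with a hypothetical `value` column) that has the default integer index.

```python
import pandas as pd

# Toy frame with the default index 0..199 (illustrative values only).
toy = pd.DataFrame({"value": range(200)})

label_slice = toy.loc[100:110]      # label-based, both endpoints included
position_slice = toy.iloc[100:110]  # position-based, end excluded

print(len(label_slice))     # 11
print(len(position_slice))  # 10
print(label_slice.index[-1], position_slice.index[-1])  # 110 109
```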


## Selecting Specific Columns Along with Rows

You can select specific columns along with rows using `.loc[]` and `.iloc[]`.

```python
df.loc[100:110, ['Pregnancies', 'Glucose', 'BloodPressure']]
```
This fetches the rows labeled 100 through 110 (both endpoints included) for the specified columns.

```python
df.iloc[100:110, :3]
```
This fetches rows from position 100 to 109 and the first three columns.

In [9]:
df.loc[100:110, ['Pregnancies', 'Glucose', 'BloodPressure']]
# Fetching specific columns with .loc[]

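| | Pregnancies | Glucose | BloodPressure |
|---|---|---|---|
| 100 | 1 | 163 | 72.0 |
| 101 | 1 | 151 | 60.0 |
| 102 | 0 | 125 | 96.0 |
| 103 | 1 | 81 | 72.0 |
| 104 | 2 | 85 | 65.0 |
| 105 | 1 | 126 | 56.0 |
| 106 | 1 | 96 | 122.0 |
| 107 | 4 | 144 | 58.0 |
| 108 | 3 | 83 | 58.0 |
| 109 | 0 | 95 | 85.0 |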


In [None]:
df.iloc[100:110, :3]
# Fetching specific columns with .iloc[]
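
Columns can be sliced as well as listed. As a rough sketch on a made-up DataFrame (`toy`, with illustrative values), the example below selects the same block twice: once with label slices in `.loc[]`, where both endpoints are included, and once with position slices in `.iloc[]`, where the end is excluded.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative values only).
toy = pd.DataFrame({
    "Pregnancies": [1, 1, 0],
    "Glucose": [163, 151, 125],
    "BloodPressure": [72.0, 60.0, 96.0],
    "Age": [33, 22, 21],
})

# Label slices include both ends: rows 0..1, columns Pregnancies..BloodPressure.
by_label = toy.loc[0:1, "Pregnancies":"BloodPressure"]

# Position slices exclude the end: rows 0..1, columns 0..2.
by_position = toy.iloc[0:2, :3]

print(list(by_label.columns))        # ['Pregnancies', 'Glucose', 'BloodPressure']
print(by_label.equals(by_position))  # True
```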

## Conditional Slicing

Pandas lets you filter rows with boolean conditions on column values.

```python
df[df.BloodPressure == 122]
```
This selects rows where `BloodPressure` is exactly 122.

```python
df.loc[df['BloodPressure'] > 100, ['Pregnancies', 'Glucose', 'BloodPressure']]
```
This fetches `Pregnancies`, `Glucose`, and `BloodPressure` for all records with `BloodPressure` greater than 100.

In [15]:
df.loc[df['BloodPressure'] > 100, ['BloodPressure', 'Glucose', 'BMI']]
# Isolating rows based on a condition in Pandas

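| | BloodPressure | Glucose | BMI |
|---|---|---|---|
| 43 | 110.0 | 171 | 45.4 |
| 84 | 108.0 | 137 | 48.8 |
| 106 | 122.0 | 96 | 22.4 |
| 177 | 110.0 | 129 | 67.1 |
| 207 | 104.0 | 162 | 37.7 |
| 362 | 108.0 | 103 | 39.2 |
| 369 | 102.0 | 133 | 32.8 |
| 440 | 104.0 | 189 | 34.3 |
| 549 | 110.0 | 189 | 28.5 |
| 658 | 106.0 | 127 | 39.0 |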


In [11]:
df.loc[df['BloodPressure'] > 100, ['Pregnancies', 'Glucose', 'BloodPressure']]
# Isolating rows and columns based on a condition in Pandas

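| | Pregnancies | Glucose | BloodPressure |
|---|---|---|---|
| 43 | 9 | 171 | 110.0 |
| 84 | 5 | 137 | 108.0 |
| 106 | 1 | 96 | 122.0 |
| 177 | 0 | 129 | 110.0 |
| 207 | 5 | 162 | 104.0 |
| 362 | 5 | 103 | 108.0 |
| 369 | 1 | 133 | 102.0 |
| 440 | 0 | 189 | 104.0 |
| 549 | 4 | 189 | 110.0 |
| 658 | 11 | 127 | 106.0 |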
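
Conditions can also be combined. The sketch below uses a made-up DataFrame (`toy`, with illustrative values including one missing `BloodPressure`) and the element-wise operators `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons, and a comparison against `NaN` evaluates to `False`, so rows with missing values drop out of a `>` filter.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative values only).
toy = pd.DataFrame({
    "Glucose": [171, 137, 96, 129],
    "BloodPressure": [110.0, 108.0, 122.0, None],  # None is stored as NaN
})

high_bp = toy["BloodPressure"] > 100  # NaN > 100 is False
high_glucose = toy["Glucose"] > 130

both = toy[(toy["BloodPressure"] > 100) & (toy["Glucose"] > 130)]
either = toy[high_bp | high_glucose]

print(both.index.tolist())    # [0, 1]
print(either.index.tolist())  # [0, 1, 2]
```

Python's `and` and `or` keywords do not work here, because they try to reduce a whole Series to a single truth value and pandas raises a `ValueError`.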
