**Master data selection in Pandas by grabbing specific columns, rows (by position and label), and filtering rows based on conditions.**

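***(1) df['column_name'] to get a single column (returns a Series)***

df[['col1', 'col2']] to get multiple columns (returns a DataFrame). Note the double brackets: you pass a list of column names inside the indexing brackets.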



***(2) .loc[] (label-based selection)*** This selects data based on its index label (the row name).

df.loc[0] = gets the row with label 0

df.loc[0, 'column_name'] = gets the value at row label 0 in column_name
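The cells further down demonstrate `.iloc[]` but not `.loc[]`, so here is a minimal sketch of label-based selection. It assumes a small stand-in frame (`df_small`, a hypothetical name) built from a few iris rows printed in this notebook, so it runs without downloading the CSV.

```python
import pandas as pd

# Stand-in frame: three iris rows, keeping their original index labels
df_small = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.1],
        "petal_length": [1.4, 1.4, 5.9],
        "species": ["setosa", "setosa", "virginica"],
    },
    index=[0, 1, 102],
)

# One row by its label (returns a Series)
row = df_small.loc[102]

# One value by row label and column name
value = df_small.loc[0, "species"]

# A label slice plus a list of columns; with .loc the end label is included
subset = df_small.loc[0:1, ["sepal_length", "species"]]

print(row)
print(value)
print(subset)
```

Unlike Python slicing and `.iloc[]`, a `.loc[]` slice such as `0:1` includes the end label, so `subset` holds two rows.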

***(3) .iloc[] (integer-based selection)*** This selects data based on its integer position (its order: 0, 1, 2...).

df.iloc[0] = gets the first row (position 0)

df.iloc[0, 0] = gets the value at the first row, first column (position 0, 0)
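On the full iris frame, labels and positions are both 0, 1, 2..., so `.loc[0]` and `.iloc[0]` look identical. The difference shows up once the index no longer starts at 0, for example after filtering. The sketch below assumes a hypothetical frame `flowers` holding three of the virginica rows from this notebook with their original labels.

```python
import pandas as pd

# Three rows that kept their original labels (102, 105, 107)
flowers = pd.DataFrame(
    {
        "sepal_length": [7.1, 7.6, 7.3],
        "species": ["virginica", "virginica", "virginica"],
    },
    index=[102, 105, 107],
)

# Position 0 and label 102 point at the same row
by_position = flowers.iloc[0]
by_label = flowers.loc[102]

# .iloc slices exclude the end position, like regular Python slicing
first_two = flowers.iloc[0:2]

# There is no row labelled 0 here, so .loc[0] raises a KeyError
try:
    flowers.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_position.equals(by_label))
print(first_two)
print(label_zero_exists)
```

A rule of thumb: use `.iloc[]` when you mean "the n-th row", and `.loc[]` when you mean "the row named n".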

***(4) Boolean Indexing (Filtering)***

This is the exact same concept as NumPy! You create a "mask" of True/False values and pass it into the [] brackets.

mask = df['petal_length'] > 5 builds the mask; df[mask] then returns only the rows where the mask is True
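To make the mask idea concrete, here is a small sketch that builds the mask, inspects it, and applies it. It assumes a stand-in frame (`df_small`, a hypothetical name) made of four iris rows shown in this notebook rather than the full dataset.

```python
import pandas as pd

# Stand-in frame with four iris rows
df_small = pd.DataFrame(
    {
        "petal_length": [1.4, 5.9, 5.0, 6.6],
        "species": ["setosa", "virginica", "virginica", "virginica"],
    }
)

# Step 1: the comparison returns a Series of True/False, one value per row
mask = df_small["petal_length"] > 5

# Step 2: passing the mask into [] keeps only the rows where it is True
long_petals = df_small[mask]

# True counts as 1, so summing the mask counts the matching rows
match_count = int(mask.sum())

print(mask)
print(long_petals)
print(match_count)
```

The row with `petal_length` of exactly 5.0 is dropped because the comparison is strictly greater than 5.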


In [2]:
import pandas as pd

url= "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
df = pd.read_csv(url)
print(df)

     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]


In [3]:
print(df.head())

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [6]:
# 1. Select a single column (this returns a Series)

species_col = df['species']
print(species_col.head())

0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
Name: species, dtype: object


In [10]:
#2. Select multiple columns (this returns a DataFrame)

measurements = df[['sepal_length', 'sepal_width']]
print(measurements.head())

   sepal_length  sepal_width
0           5.1          3.5
1           4.9          3.0
2           4.7          3.2
3           4.6          3.1
4           5.0          3.6


# Selecting Rows with .iloc[]

In [18]:
# 1. Get the very first row (position 0)

first_row = df.iloc[0]
print(f"First Row :\n{first_row}\n")

First Row :
sepal_length       5.1
sepal_width        3.5
petal_length       1.4
petal_width        0.2
species         setosa
Name: 0, dtype: object



In [17]:
# 2. Get the first 5 rows (slicing)

first_five_rows = df.iloc[0:5]
print(f"First 5 Rows : \n{first_five_rows}")

First 5 Rows : 
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [20]:
# 3. Get the data at row 0, column 0  [first element]

row_col = df.iloc[0,0]
print(f"First Element : {row_col}")

First Element : 5.1


# Filtering with Boolean Indexing

In [23]:
# 1. Find all flowers of the 'setosa' species

setosa = df[df['species'] == 'setosa']
print(f" Setosa Flowers List :\n{setosa}")

 Setosa Flowers List :
    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
1            4.9          3.0           1.4          0.2  setosa
2            4.7          3.2           1.3          0.2  setosa
3            4.6          3.1           1.5          0.2  setosa
4            5.0          3.6           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
6            4.6          3.4           1.4          0.3  setosa
7            5.0          3.4           1.5          0.2  setosa
8            4.4          2.9           1.4          0.2  setosa
9            4.9          3.1           1.5          0.1  setosa
10           5.4          3.7           1.5          0.2  setosa
11           4.8          3.4           1.6          0.2  setosa
12           4.8          3.0           1.4          0.1  setosa
13           4.3          3.0           1.1          0.1  setosa
14

In [32]:
# 2. Find all flowers with a sepal length > 7.0

long_sepals = df[df['sepal_length']> 7.0]
print(f"Long Sepals : \n{long_sepals}")

Long Sepals : 
     sepal_length  sepal_width  petal_length  petal_width    species
102           7.1          3.0           5.9          2.1  virginica
105           7.6          3.0           6.6          2.1  virginica
107           7.3          2.9           6.3          1.8  virginica
109           7.2          3.6           6.1          2.5  virginica
117           7.7          3.8           6.7          2.2  virginica
118           7.7          2.6           6.9          2.3  virginica
122           7.7          2.8           6.7          2.0  virginica
125           7.2          3.2           6.0          1.8  virginica
129           7.2          3.0           5.8          1.6  virginica
130           7.4          2.8           6.1          1.9  virginica
131           7.9          3.8           6.4          2.0  virginica
135           7.7          3.0           6.1          2.3  virginica


# Table formats (tablefmt values) that tabulate accepts
"plain", "simple", "github", "grid", "fancy_grid", "pipe","orgtbl",
"rst", "mediawiki", "latex", "latex_raw","latex_booktabs", "tsv",
"jira", "presto", "youtrack","outline"


In [27]:
# 3. Print the whole DataFrame as a table (without the index)

from tabulate import tabulate

print(tabulate(df, headers='keys', tablefmt='github', showindex=False))


|   sepal_length |   sepal_width |   petal_length |   petal_width | species    |
|----------------|---------------|----------------|---------------|------------|
|            5.1 |           3.5 |            1.4 |           0.2 | setosa     |
|            4.9 |           3   |            1.4 |           0.2 | setosa     |
|            4.7 |           3.2 |            1.3 |           0.2 | setosa     |
|            4.6 |           3.1 |            1.5 |           0.2 | setosa     |
|            5   |           3.6 |            1.4 |           0.2 | setosa     |
|            5.4 |           3.9 |            1.7 |           0.4 | setosa     |
|            4.6 |           3.4 |            1.4 |           0.3 | setosa     |
|            5   |           3.4 |            1.5 |           0.2 | setosa     |
|            4.4 |           2.9 |            1.4 |           0.2 | setosa     |
|            4.9 |           3.1 |            1.5 |           0.1 | setosa     |
|            5.4 |          

In [28]:
# 4. Print the whole DataFrame with the index: just change showindex from False to True

from tabulate import tabulate

print(tabulate(df, headers='keys', tablefmt='github', showindex=True))

|     |   sepal_length |   sepal_width |   petal_length |   petal_width | species    |
|-----|----------------|---------------|----------------|---------------|------------|
|   0 |            5.1 |           3.5 |            1.4 |           0.2 | setosa     |
|   1 |            4.9 |           3   |            1.4 |           0.2 | setosa     |
|   2 |            4.7 |           3.2 |            1.3 |           0.2 | setosa     |
|   3 |            4.6 |           3.1 |            1.5 |           0.2 | setosa     |
|   4 |            5   |           3.6 |            1.4 |           0.2 | setosa     |
|   5 |            5.4 |           3.9 |            1.7 |           0.4 | setosa     |
|   6 |            4.6 |           3.4 |            1.4 |           0.3 | setosa     |
|   7 |            5   |           3.4 |            1.5 |           0.2 | setosa     |
|   8 |            4.4 |           2.9 |            1.4 |           0.2 | setosa     |
|   9 |            4.9 |           3.1 |   

# Mini-Challenge (Combining Conditions)

Find all 'setosa' flowers that also have a sepal width > 3.5.

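Remember the NumPy syntax: wrap each condition in its own (), then combine the conditions with an operator. Python's and / or keywords do not work on a Series; they raise a ValueError.

Use & (ampersand) for and

Use | (pipe) for or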
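The sketch below shows the operators side by side, plus `~` for "not", which the notes above do not mention. It assumes a stand-in frame (`df_small`, a hypothetical name) built from four iris rows that appear in this notebook's output.

```python
import pandas as pd

# Stand-in frame with four iris rows
df_small = pd.DataFrame(
    {
        "sepal_length": [5.0, 5.4, 4.9, 7.1],
        "sepal_width": [3.6, 3.9, 3.0, 3.0],
        "species": ["setosa", "setosa", "setosa", "virginica"],
    }
)

# AND: both conditions must be True for a row to be kept
both = df_small[(df_small["species"] == "setosa") & (df_small["sepal_width"] > 3.5)]

# OR: a row is kept if at least one condition is True
either = df_small[(df_small["sepal_length"] > 7.0) | (df_small["sepal_width"] > 3.5)]

# NOT: ~ flips every True/False value in the mask
not_setosa = df_small[~(df_small["species"] == "setosa")]

print(both)
print(either)
print(not_setosa)
```

The parentheses matter because `&` and `|` bind more tightly than `==` and `>`, so leaving them out changes what gets compared.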

In [42]:
challenge = df[(df['species'] == 'setosa') & (df['sepal_width'] > 3.5)]
print(f"Output for the given challenge is : \n{challenge}")

Output for the given challenge is : 
    sepal_length  sepal_width  petal_length  petal_width species
4            5.0          3.6           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
10           5.4          3.7           1.5          0.2  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
16           5.4          3.9           1.3          0.4  setosa
18           5.7          3.8           1.7          0.3  setosa
19           5.1          3.8           1.5          0.3  setosa
21           5.1          3.7           1.5          0.4  setosa
22           4.6          3.6           1.0          0.2  setosa
32           5.2          4.1           1.5          0.1  setosa
33           5.5          4.2           1.4          0.2  setosa
37           4.9          3.6           1.4          0.1  setosa
44           5.1          3.8           1.9          