# <font color="green">Selecting Rows and Columns</font>

----------------------------------



## Columns: Selecting and Working with Columns from a DataFrame

### Selecting Columns
- Select a single column with `df['column1']` and several columns at once with `df[['column1', 'column2']]`.
- Selecting columns is a fundamental skill because it is often the first step in many data manipulation tasks.

### Working with Columns
- Once columns are selected, they can be used for various operations such as computation, aggregation, and combination with other data.
- Tasks such as renaming columns, handling missing values, or changing the data type of columns are essential for further analysis.

## Rows: Understanding the Difference Between `loc` and `iloc` Methods, and Selecting Rows from a DataFrame

### Difference Between `loc` and `iloc`
- **`loc`**: Label-based. Accesses a group of rows and columns by their labels or by a boolean array.
- **`iloc`**: Position-based. Accesses rows and columns by their integer positions (0-based), regardless of what the labels are.
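
The distinction is easiest to see when the index labels are not simply 0, 1, 2. A minimal sketch, assuming a small made-up frame (the `toy` DataFrame and its values are illustrative and not part of the penguins data):

```python
import pandas as pd

# Hypothetical toy frame whose index labels (10, 20, 30) differ from the positions (0, 1, 2)
toy = pd.DataFrame(
    {"species": ["Adelie", "Chinstrap", "Gentoo"], "mass_g": [3750.0, 3500.0, 5200.0]},
    index=[10, 20, 30],
)

by_label = toy.loc[10, "species"]   # row labelled 10, column labelled 'species'
by_position = toy.iloc[0, 0]        # first row, first column

label_slice = toy.loc[10:20]        # label slice: the end label 20 is included
position_slice = toy.iloc[0:1]      # position slice: the end position 1 is excluded
```

Both lookups return the same cell here, but the two slices differ in length because only `loc` includes its end point.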

### Selecting Rows
- Selecting rows is a process akin to selecting columns, typically accomplished using `loc` or `iloc`.
- This is especially useful for filtering data based on some condition or when dealing with a specific range of indices.

Understanding and mastering the manipulation of rows and columns in Pandas is pivotal for effective data cleaning, preparation, and analysis. These skills are integral to performing a broad spectrum of data analysis tasks with efficiency and precision.


------------------

### Key Concepts
Being able to select specific parts of a dataset is an essential part of working with data. For example, it comes in handy when you need to fill in missing data in one or more particular columns. You should therefore be very comfortable with the syntax for selecting rows and columns.

| Command                       | Description                                           |
|-------------------------------|-------------------------------------------------------|
| df[col]                       | select one column as a Series                         |
| df[[col]]                     | select one column as a DataFrame                      |
| df[[col1, col2, ... ]]        | select 2+ columns as a DataFrame                      |
| df['column_name'] = new_values| assign new values to the column                        |
| df.drop()                     | drop specified rows or columns                        |
| df['column'].astype()         | cast a pandas column to a specified dtype             |
| df.loc[row]                   | select one row as a Series by index                   |
| df.loc[[row1, row2]]          | select 1+ rows as a DataFrame by index                |
| df.loc[[row], [col]]          | select rows and columns as a DataFrame by index       |
| df.iloc[a:b, c:d]             | select rows/columns by integer-location               |
| df.set_index()                | set selected column as index                           |


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('./data/penguins_simple.csv', sep=";")

In [3]:
# look at the dataframe

df

Unnamed: 0,Species,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex
0,Adelie,39.1,18.7,181.0,3750.0,MALE
1,Adelie,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,40.3,18.0,195.0,3250.0,FEMALE
3,Adelie,36.7,19.3,193.0,3450.0,FEMALE
4,Adelie,39.3,20.6,190.0,3650.0,MALE
...,...,...,...,...,...,...
328,Gentoo,47.2,13.7,214.0,4925.0,FEMALE
329,Gentoo,46.8,14.3,215.0,4850.0,FEMALE
330,Gentoo,50.4,15.7,222.0,5750.0,MALE
331,Gentoo,45.2,14.8,212.0,5200.0,FEMALE


In [5]:
df.columns

Index(['Species', 'Culmen Length (mm)', 'Culmen Depth (mm)',
       'Flipper Length (mm)', 'Body Mass (g)', 'Sex'],
      dtype='object')

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Species              333 non-null    object 
 1   Culmen Length (mm)   333 non-null    float64
 2   Culmen Depth (mm)    333 non-null    float64
 3   Flipper Length (mm)  333 non-null    float64
 4   Body Mass (g)        333 non-null    float64
 5   Sex                  333 non-null    object 
dtypes: float64(4), object(2)
memory usage: 15.7+ KB


## 1. Columns

### 1.1 Renaming the columns

In [7]:
df.columns = ['species', 'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_gg', 'sex']

In [8]:
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex
0,Adelie,39.1,18.7,181.0,3750.0,MALE
1,Adelie,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,40.3,18.0,195.0,3250.0,FEMALE
3,Adelie,36.7,19.3,193.0,3450.0,FEMALE
4,Adelie,39.3,20.6,190.0,3650.0,MALE
...,...,...,...,...,...,...
328,Gentoo,47.2,13.7,214.0,4925.0,FEMALE
329,Gentoo,46.8,14.3,215.0,4850.0,FEMALE
330,Gentoo,50.4,15.7,222.0,5750.0,MALE
331,Gentoo,45.2,14.8,212.0,5200.0,FEMALE
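
Assigning a full list to `df.columns` only works if every column is listed, in the right order. As an alternative sketch, `DataFrame.rename` takes a mapping from old to new names and leaves unmentioned columns untouched. The two-row frame below is made up for illustration:

```python
import pandas as pd

# Hypothetical two-row frame with original-style column names
toy = pd.DataFrame({"Species": ["Adelie", "Gentoo"], "Body Mass (g)": [3750.0, 5200.0]})

# rename() returns a new DataFrame; only the columns named in the mapping change
renamed = toy.rename(columns={"Body Mass (g)": "body_mass_g"})

# A callable is applied to every column name, e.g. to lower-case them all
lowered = toy.rename(columns=str.lower)
```

Like `drop`, `rename` leaves the original frame unchanged unless the result is assigned back.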


### 1.2 Selecting a single column

In [9]:
# using single square brackets, we select a single column as a series
df['sex']

0        MALE
1      FEMALE
2      FEMALE
3      FEMALE
4        MALE
        ...  
328    FEMALE
329    FEMALE
330      MALE
331    FEMALE
332      MALE
Name: sex, Length: 333, dtype: object

This will extract the column as a pd.Series.

In [10]:
type(df['sex'])  # as we saw in the pandas encounter, a single column is a pandas Series

pandas.core.series.Series

In [11]:
# using double square brackets, we select a single column as a dataframe
df[['sex']]

Unnamed: 0,sex
0,MALE
1,FEMALE
2,FEMALE
3,FEMALE
4,MALE
...,...
328,FEMALE
329,FEMALE
330,MALE
331,FEMALE


In [12]:
type(df[['sex']])

pandas.core.frame.DataFrame

### 1.3 Selecting Multiple Columns

In [13]:
# when selecting multiple columns, we HAVE to use double square brackets
# and we get a dataframe back
df[['species','culmen_length_mm','sex']]

Unnamed: 0,species,culmen_length_mm,sex
0,Adelie,39.1,MALE
1,Adelie,39.5,FEMALE
2,Adelie,40.3,FEMALE
3,Adelie,36.7,FEMALE
4,Adelie,39.3,MALE
...,...,...,...
328,Gentoo,47.2,FEMALE
329,Gentoo,46.8,FEMALE
330,Gentoo,50.4,MALE
331,Gentoo,45.2,FEMALE


### 1.4 Changing column types

In [14]:
df.dtypes  # note the plural: each column of a dataframe can have its own data type

species               object
culmen_length_mm     float64
culmen_depth_mm      float64
flipper_length_mm    float64
body_mass_gg         float64
sex                   object
dtype: object

In [15]:
# we sometimes need to change the data type of a column; this is usually part of data cleaning
# note: astype(int) drops the decimals, it does not round

df['culmen_length_mm'] = df['culmen_length_mm'].astype(int)

In [16]:
# Let's check the datatypes again
df.dtypes

species               object
culmen_length_mm       int64
culmen_depth_mm      float64
flipper_length_mm    float64
body_mass_gg         float64
sex                   object
dtype: object
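
Because `astype(int)` truncates, a value such as 39.5 becomes 39 and 36.7 becomes 36. If rounding to the nearest integer is what you want, one option is to round first. A small sketch on a made-up Series that mirrors a few of the culmen lengths:

```python
import pandas as pd

# Made-up values for illustration
lengths = pd.Series([39.1, 39.5, 36.7])

truncated = lengths.astype(int)          # drops the fractional part
rounded = lengths.round().astype(int)    # rounds to the nearest integer first
```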

###  1.5 Creating columns

In [17]:
# Adding a new column
# For example, let's add a 'year_collected' column with a constant value
df['year_collected'] = 2024

How to round numbers in Python:
`round(<number_to_round>, <number_of_decimal_points>)`
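
As a quick illustration with made-up numbers, the built-in works on a single value, and a pandas Series offers a `.round()` method that applies the same idea to every element:

```python
import pandas as pd

# Built-in round: the second argument is the number of decimal places
pi_2dp = round(3.14159, 2)

# Series.round rounds every element of the Series
rounded_series = pd.Series([3.14159, 2.71828]).round(2)
```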

In [18]:
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,year_collected
0,Adelie,39,18.7,181.0,3750.0,MALE,2024
1,Adelie,39,17.4,186.0,3800.0,FEMALE,2024
2,Adelie,40,18.0,195.0,3250.0,FEMALE,2024
3,Adelie,36,19.3,193.0,3450.0,FEMALE,2024
4,Adelie,39,20.6,190.0,3650.0,MALE,2024
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,2024
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,2024
330,Gentoo,50,15.7,222.0,5750.0,MALE,2024
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,2024


In [20]:
df.dtypes

species               object
culmen_length_mm       int64
culmen_depth_mm      float64
flipper_length_mm    float64
body_mass_gg         float64
sex                   object
year_collected         int64
dtype: object

In [19]:
df.describe()

Unnamed: 0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,year_collected
count,333.0,333.0,333.0,333.0,333.0
mean,43.561562,17.164865,200.966967,4207.057057,2024.0
std,5.478872,1.969235,14.015765,805.215802,0.0
min,32.0,13.1,172.0,2700.0,2024.0
25%,39.0,15.6,190.0,3550.0,2024.0
50%,44.0,17.3,197.0,4050.0,2024.0
75%,48.0,18.7,213.0,4775.0,2024.0
max,59.0,21.5,231.0,6300.0,2024.0


In [21]:
# Convert the body mass (column 'body_mass_gg') from grams to kilograms, round to 2 decimal places, and store it in a new column

df['body_mass_kg'] = (df['body_mass_gg'] / 1000.0).round(2)

In [22]:
# Now, df includes a new column 'body_mass_kg' with the body mass in kilograms, rounded to 2 decimal places
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,year_collected,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,2024,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,2024,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,2024,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,2024,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,2024,3.65
...,...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,2024,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,2024,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,2024,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,2024,5.20


### 1.6 Dropping columns

In [26]:
# dropping a single column

df.drop('year_collected', axis='columns')
#df.drop('year_collected', axis=1)

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [27]:
# note that the dataframe did not change!
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,year_collected,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,2024,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,2024,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,2024,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,2024,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,2024,3.65
...,...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,2024,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,2024,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,2024,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,2024,5.20


Uh-oh! Dropping returned a (changed) copy of the dataframe, but didn't change the original!

To make the changes stick, you can:
* assign the result to another dataframe
* use the `inplace=True` parameter

In [28]:
# assign the result to another dataframe
df_new = df.drop('year_collected', axis=1)  # axis=1 is equivalent to axis='columns'

In [29]:
df_new

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [30]:
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,year_collected,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,2024,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,2024,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,2024,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,2024,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,2024,3.65
...,...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,2024,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,2024,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,2024,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,2024,5.20


In [31]:
# to change the original dataframe directly, we pass the inplace=True parameter
df.drop('year_collected', axis='columns', inplace=True)

In [32]:
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20
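
`drop` also accepts a list of labels, and it can remove rows as well as columns. A minimal sketch on a made-up frame, using the `columns=` and `index=` keywords as an alternative to `axis`:

```python
import pandas as pd

# Hypothetical frame for illustration
toy = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo"],
        "sex": ["MALE", "FEMALE", "MALE"],
        "year": [2024, 2024, 2024],
    }
)

# Drop several columns at once; equivalent to toy.drop(['sex', 'year'], axis='columns')
fewer_columns = toy.drop(columns=["sex", "year"])

# Drop rows by index label; equivalent to toy.drop([0, 2], axis='index')
fewer_rows = toy.drop(index=[0, 2])
```

As before, both calls return new frames and `toy` itself keeps all three rows and columns.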


## 2. Selecting rows (and columns): `.loc[]` and `.iloc[]` methods

A brief slicing recap:

In [33]:
a = [1,2,3,4,5,6]

In [34]:
a

[1, 2, 3, 4, 5, 6]

Reminder: slicing syntax is `a[start:end:step]`
* If not specified, `start` is the beginning of the list.
* If not specified, `end` is the end of the list.
* You can use minus to count from the back, e.g. the second element from the back is `a[-2]`
* If not specified, `step` is 1.

In [35]:
# select first 4 numbers
a[:4]

[1, 2, 3, 4]

In [36]:
a[1:4]

[2, 3, 4]

In [37]:
# what does this do?
a[::3]
# it gives us every third element in the list

[1, 4]
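
A few more slices on the same list, covering negative indices and a negative step (a small illustrative sketch):

```python
a = [1, 2, 3, 4, 5, 6]

second_from_back = a[-2]   # 5
last_two = a[-2:]          # [5, 6]
every_second = a[1:5:2]    # [2, 4]
reversed_copy = a[::-1]    # [6, 5, 4, 3, 2, 1]
```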

## Selecting rows with .loc[row_index]

### Here's a quick guide on how to use `loc[]`:

`loc[]` is a versatile selection method in Pandas that allows you to specify rows and columns to access based on their labels.

- **Select a single row:** `df.loc['index_label']`
  
  Accesses the row with the specified label.

- **Select multiple rows:** `df.loc[['label1', 'label2']]`
  
  Retrieves multiple rows with the given labels.

- **Select rows by range of labels:** `df.loc['label1':'label3']`
  
  Selects all rows between and including the specified label range.

- **Conditional selection:** `df.loc[df['column'] > value]`
  
  Filters rows based on a condition applied to column values.

- **Select specific rows and columns:** `df.loc[['row1', 'row2'], ['column1', 'column2']]`
  
  Selects particular rows and columns by specifying their labels.

#### Important Notes:
- `loc[]` operates on the DataFrame's index labels and column names. It is particularly well-suited for DataFrames where these labels are meaningful.
- It's often used when the operations are based on the data's content rather than its position.
- When using `loc[]` to select rows by a range of labels, the end label is included in the output, which is different from typical Python slicing.

Mastering `loc[]` will enhance your data manipulation capabilities, allowing you to efficiently access and modify your data based on its labels.
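
The patterns from the guide can be tried out together. A hedged sketch on a made-up frame with string row labels (the labels `p1` to `p4` and all values are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Chinstrap", "Gentoo"],
        "culmen_length_mm": [39.1, 41.0, 46.5, 50.4],
        "sex": ["MALE", "FEMALE", "FEMALE", "MALE"],
    },
    index=["p1", "p2", "p3", "p4"],
)

one_row = toy.loc["p1"]                               # a single row as a Series
label_range = toy.loc["p1":"p3"]                      # p1, p2 and p3: the end label is included
long_culmen = toy.loc[toy["culmen_length_mm"] > 40]   # rows where the condition is True
subset = toy.loc[["p1", "p4"], ["species", "sex"]]    # chosen rows and chosen columns

# A condition and a column label can be combined in one call
long_culmen_species = toy.loc[toy["culmen_length_mm"] > 40, "species"]
```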




In [38]:
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [47]:
# Example 1: Select the first row
df.loc[0, :]

species              Adelie
culmen_length_mm         39
culmen_depth_mm        18.7
flipper_length_mm     181.0
body_mass_gg         3750.0
sex                    MALE
body_mass_kg           3.75
Name: 0, dtype: object

In [41]:
# Example 2: Select first two rows
df.loc[0:1,:]

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.8


In [42]:
df.loc[0:1]

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.8


In [43]:
df

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [44]:
df.loc[df['culmen_length_mm'] > 40] 

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
7,Adelie,41,17.6,182.0,3200.0,FEMALE,3.20
12,Adelie,42,20.7,197.0,4500.0,MALE,4.50
14,Adelie,46,21.5,194.0,4200.0,MALE,4.20
32,Adelie,42,18.5,180.0,3550.0,FEMALE,3.55
38,Adelie,44,19.7,196.0,4400.0,MALE,4.40
...,...,...,...,...,...,...,...
328,Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
329,Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
330,Gentoo,50,15.7,222.0,5750.0,MALE,5.75
331,Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [48]:
# If you want to select all rows where the species is "Adelie", you can use:

adelie_penguins = df.loc[df['species'] == 'Adelie']
adelie_penguins

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
0,Adelie,39,18.7,181.0,3750.0,MALE,3.75
1,Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
2,Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
3,Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
4,Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...,...
141,Adelie,36,18.4,184.0,3475.0,FEMALE,3.48
142,Adelie,36,17.8,195.0,3450.0,FEMALE,3.45
143,Adelie,37,18.1,193.0,3750.0,MALE,3.75
144,Adelie,36,17.1,187.0,3700.0,FEMALE,3.70


In [49]:
chinstrap = df.loc[df['species'] == 'Chinstrap']
chinstrap

Unnamed: 0,species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
146,Chinstrap,46,17.9,192.0,3500.0,FEMALE,3.50
147,Chinstrap,50,19.5,196.0,3900.0,MALE,3.90
148,Chinstrap,51,19.2,193.0,3650.0,MALE,3.65
149,Chinstrap,45,18.7,188.0,3525.0,FEMALE,3.52
150,Chinstrap,52,19.8,197.0,3725.0,MALE,3.72
...,...,...,...,...,...,...,...
209,Chinstrap,55,19.8,207.0,4000.0,MALE,4.00
210,Chinstrap,43,18.1,202.0,3400.0,FEMALE,3.40
211,Chinstrap,49,18.2,193.0,3775.0,MALE,3.78
212,Chinstrap,50,19.0,210.0,4100.0,MALE,4.10


**_How does loc[ ] work?_**

- Notice that `loc[0:1]` returns both row 0 and row 1: with a label slice, the end label is part of the result.
- `loc` uses the row labels, not the row positions. Let's change the index to see this more clearly.

In [50]:
# changing the index column of the dataframe
df.set_index('species', inplace=True) 

# It is also possible to assign a column as index, when we are reading the data
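
As a sketch of the point in that last comment: `pd.read_csv` has an `index_col` parameter. The example reads from an in-memory string instead of the real file, so the data shown here is made up:

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk (semicolon-separated, like the penguins file)
csv_text = "Species;Sex\nAdelie;MALE\nGentoo;FEMALE\n"

# index_col names the column to use as row labels while reading
indexed = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="Species")
```

With a real file, the first argument would be the file path instead of the `StringIO` object.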


In [51]:
df.head()

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.8
Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Adelie,39,20.6,190.0,3650.0,MALE,3.65


In [55]:
df.columns

Index(['culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm',
       'body_mass_gg', 'sex', 'body_mass_kg'],
      dtype='object')

In [52]:
df.head(2)

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.8


In [53]:
df.tail()

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
Gentoo,50,15.7,222.0,5750.0,MALE,5.75
Gentoo,45,14.8,212.0,5200.0,FEMALE,5.2
Gentoo,49,16.1,213.0,5400.0,MALE,5.4


In [54]:
df.tail(3)

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Gentoo,50,15.7,222.0,5750.0,MALE,5.75
Gentoo,45,14.8,212.0,5200.0,FEMALE,5.2
Gentoo,49,16.1,213.0,5400.0,MALE,5.4


In [56]:
# Example 1: Try to select the row labelled 1

# This now raises a KeyError: the index labels are species names, so the label 1 no longer exists.
# loc needs one of the new labels instead.
df.loc[1, :]

KeyError: 1

In [58]:
df

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...
Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
Gentoo,50,15.7,222.0,5750.0,MALE,5.75
Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [57]:
df.loc['Adelie', :]  # the index labels are now the species names, and loc selects by label

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...
Adelie,36,18.4,184.0,3475.0,FEMALE,3.48
Adelie,36,17.8,195.0,3450.0,FEMALE,3.45
Adelie,37,18.1,193.0,3750.0,MALE,3.75
Adelie,36,17.1,187.0,3700.0,FEMALE,3.70


In [59]:
# Example 2: Select all rows from the first 'Adelie' label through the last 'Chinstrap' label
df.loc['Adelie':'Chinstrap', :]

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...
Chinstrap,55,19.8,207.0,4000.0,MALE,4.00
Chinstrap,43,18.1,202.0,3400.0,FEMALE,3.40
Chinstrap,49,18.2,193.0,3775.0,MALE,3.78
Chinstrap,50,19.0,210.0,4100.0,MALE,4.10


In [60]:
df.loc['Adelie':'Gentoo', :]

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...
Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
Gentoo,50,15.7,222.0,5750.0,MALE,5.75
Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20
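
Slicing between two labels works here because the rows are already grouped by species, so the index is sorted. With repeated labels in an unsorted index, a label slice can raise a `KeyError`. Two alternatives that do not depend on row order, sketched on a made-up frame:

```python
import pandas as pd

# Hypothetical frame with a repeated, unsorted index
toy = pd.DataFrame(
    {"mass_g": [3750.0, 5200.0, 3500.0, 3800.0]},
    index=pd.Index(["Adelie", "Gentoo", "Chinstrap", "Adelie"], name="species"),
)

# A list of labels returns every row carrying one of those labels
by_list = toy.loc[["Adelie", "Chinstrap"]]

# A boolean mask built from the index keeps the original row order
by_mask = toy.loc[toy.index.isin(["Adelie", "Chinstrap"])]

# reset_index() turns the index back into an ordinary column with a fresh 0..n-1 index
restored = toy.reset_index()
```

`reset_index()` is also a convenient way to get back to integer labels after experimenting with `set_index`.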


## Selecting rows with .iloc[integer_position]

+ The `iloc[]` indexer in pandas performs integer-location based indexing, which means it selects rows and columns by their integer positions. The guide below covers some common scenarios for `iloc[]`.

## iloc[] Usage Guide

`iloc[]` provides flexibility in data selection within pandas DataFrames through integer indexing. Below are some common ways to utilize `iloc[]`:

- **Select a Single Row**: `df.iloc[5]`
  - This accesses the sixth row in the DataFrame since indexing starts at 0.

- **Select Multiple Rows**: `df.iloc[5:10]`
  - Retrieves the sixth through tenth rows (positions 5 to 9); the start position is included and the end position is excluded.

- **Select Rows and Specific Columns**: `df.iloc[5:10, 0:2]`
  - Selects the sixth through tenth rows (positions 5 to 9) and the first two columns (positions 0 and 1).

- **Select Specific Rows and All Columns**: `df.iloc[[1, 3, 7], :]`
  - Accesses the second, fourth, and eighth rows (positions 1, 3, and 7) across all columns.

## Additional Tips

- Remember that `iloc[]` uses zero-based indexing, similar to indexing in native Python lists or numpy arrays.

- The end index in a range is excluded, aligning with standard Python slicing notation.

- With `iloc[]`, negative indices count from the end, so `df.iloc[-1]` is the last row. A negative step such as `df.iloc[::-1]` returns the rows in reverse order.
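
These tips can be checked on a small made-up frame (the `toy` data is invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {"species": ["Adelie", "Chinstrap", "Gentoo"], "mass_g": [3750.0, 3500.0, 5200.0]}
)

last_row = toy.iloc[-1]          # the last row as a Series
last_two_rows = toy.iloc[-2:]    # the last two rows
last_column = toy.iloc[:, -1]    # the last column as a Series
reversed_rows = toy.iloc[::-1]   # all rows in reverse order
```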


In [61]:
df

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.80
Adelie,40,18.0,195.0,3250.0,FEMALE,3.25
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Adelie,39,20.6,190.0,3650.0,MALE,3.65
...,...,...,...,...,...,...
Gentoo,47,13.7,214.0,4925.0,FEMALE,4.92
Gentoo,46,14.3,215.0,4850.0,FEMALE,4.85
Gentoo,50,15.7,222.0,5750.0,MALE,5.75
Gentoo,45,14.8,212.0,5200.0,FEMALE,5.20


In [62]:

df.iloc[5] #(selects the row at position 5)

culmen_length_mm         38
culmen_depth_mm        17.8
flipper_length_mm     181.0
body_mass_gg         3625.0
sex                  FEMALE
body_mass_kg           3.62
Name: Adelie, dtype: object

In [64]:
df.iloc[0:2]

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,18.7,181.0,3750.0,MALE,3.75
Adelie,39,17.4,186.0,3800.0,FEMALE,3.8


In [65]:
df.iloc['Adelie':'Gentoo', :]

TypeError: cannot do positional indexing on Index with these indexers [Adelie] of type str

In [63]:
df.iloc[5:10] #(selects rows at positions 5 to 9)

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,38,17.8,181.0,3625.0,FEMALE,3.62
Adelie,39,19.6,195.0,4675.0,MALE,4.68
Adelie,41,17.6,182.0,3200.0,FEMALE,3.2
Adelie,38,21.2,191.0,3800.0,MALE,3.8
Adelie,34,21.1,198.0,4400.0,MALE,4.4


In [66]:
df.iloc[5:10, 0:2]  #(selects rows at positions 5 to 9 and columns at positions 0 to 1)

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm
species,Unnamed: 1_level_1,Unnamed: 2_level_1
Adelie,38,17.8
Adelie,39,19.6
Adelie,41,17.6
Adelie,38,21.2
Adelie,34,21.1


In [69]:
df.iloc[[1, 3, 300], :] #(selects rows at positions 1, 3, and 300, along with all columns)

Unnamed: 0_level_0,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_gg,sex,body_mass_kg
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelie,39,17.4,186.0,3800.0,FEMALE,3.8
Adelie,36,19.3,193.0,3450.0,FEMALE,3.45
Gentoo,47,14.0,212.0,4875.0,FEMALE,4.88


# Recap: .loc vs .iloc in Pandas

Pandas offers two powerful methods for data selection within DataFrames, `.loc` and `.iloc`, each catering to different selection criteria:

## .loc Method (Label-based Selection)

- **Label-based Selection**: Utilizes row and column **labels** for data access.
- **Syntax**: `df.loc[row_label, column_label]`
  - You specify the rows and columns by using their labels.
  - This method is ideal when you know the exact labels of the rows and columns you're interested in.

## .iloc Method (Integer Position-based Selection)

- **Integer Position-based Selection**: Relies on **integer position values** (0-based) for selecting rows and columns.
- **Syntax**: `df.iloc[row_position, column_position]`
  - Rows and columns are specified by their integer index positions.
  - This approach is useful when you want to access data based on its position in the DataFrame, similar to indexing in Python lists or arrays.


## lambda

In [70]:
def adding():
    a=int(input("give me a number"))
    b=int(input("give me another number"))
    return a+b

In [71]:
adding()

give me a number 3
give me another number 5


8

In [72]:
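# the same addition written as an anonymous (lambda) function:
# the arguments are passed in directly instead of being read with input()
h = lambda a, b: a + b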

In [73]:
h(3,2)

5
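
Lambdas often appear in pandas code as short throwaway functions passed to methods such as `Series.apply`. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up values for illustration
sex = pd.Series(["MALE", "FEMALE", "MALE"])

# apply() calls the lambda once per element
lowercase = sex.apply(lambda value: value.lower())

# The same lambda syntax works as a sort key in plain Python
pairs = [("Gentoo", 5200), ("Adelie", 3750)]
lightest_first = sorted(pairs, key=lambda pair: pair[1])
```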