# Jupyter Notebook: 03_pandas_basics.ipynb

# Pandas Basics

Welcome! Now that you know NumPy, let's learn **Pandas**, the library for data manipulation and analysis.

---

## Table of Contents

1. What is Pandas?
2. Series and DataFrames
3. Reading and Writing Data
4. Selecting and Filtering Data
5. Operations on Data
6. Mini-Exercises

---

# 1. What is Pandas?

**Pandas** is a Python library that makes it easy to work with structured data (tables).

First, let's import it:

```python
import pandas as pd
```

---

# 2. Series and DataFrames

## Series
A **Series** is a one-dimensional labeled array.

```python
# Create a Series
s = pd.Series([10, 20, 30, 40])
print(s)
```
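
The "labeled" part is easier to see with explicit labels. Here is a minimal sketch; the labels `a` to `d` and the name `labeled` are made up for illustration:

```python
import pandas as pd

# Same values as above, but with explicit (hypothetical) labels instead of 0..3
labeled = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Access a value by its label
print(labeled["b"])

# Access a value by its integer position
print(labeled.iloc[1])
```

Both lookups return the same element, once by label and once by position.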

## DataFrames
A **DataFrame** is a two-dimensional labeled data structure (like a spreadsheet).

```python
# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}

df = pd.DataFrame(data)
print(df)
```

---

# 3. Reading and Writing Data

## Reading CSV files

```python
# Read a CSV file
df = pd.read_csv('path/to/your/file.csv')
print(df.head())
```

> (We'll use built-in small examples for now. No need for a real file yet.)

## Writing CSV files

```python
# Save a DataFrame to CSV
df.to_csv('my_data.csv', index=False)
```
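
If you want to try reading and writing without touching the disk, one option is to keep the CSV text in memory. This is only a sketch: `csv_text`, `people`, and `round_trip` are illustrative names, and the data is made up.

```python
import io

import pandas as pd

# A tiny CSV held in memory (hypothetical contents)
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,Paris\n"

# read_csv accepts a file-like object as well as a path
people = pd.read_csv(io.StringIO(csv_text))
print(people.head())

# When no path is given, to_csv returns the CSV text as a string
round_trip = people.to_csv(index=False)
print(round_trip)
```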

---

# 4. Selecting and Filtering Data

## Selecting Columns

```python
# Select one column
print(df['Name'])

# Select multiple columns
print(df[['Name', 'City']])
```

## Selecting Rows

```python
# Select by integer position
print(df.iloc[0])  # First row

# Select by label (requires a set index)
# df.set_index('Name', inplace=True)
# print(df.loc['Alice'])
```
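
To see label-based selection in action without modifying `df`, here is a small sketch that rebuilds the same table under the hypothetical name `people` and uses `set_index` without `inplace`:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Paris", "London"],
})

# set_index returns a new DataFrame; `people` keeps its 0..2 index
by_name = people.set_index("Name")

# .loc looks rows up by label
print(by_name.loc["Alice"])

# .iloc still selects by position, whatever the labels are
print(by_name.iloc[0])
```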

## Filtering Rows

```python
# People older than 28
older_than_28 = df[df['Age'] > 28]
print(older_than_28)
```
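
Filters can also be combined. The sketch below assumes the same small table, rebuilt here as `people` so the example runs on its own:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Paris", "London"],
})

# Both conditions must hold (&); each condition needs its own parentheses
older_not_paris = people[(people["Age"] > 28) & (people["City"] != "Paris")]
print(older_not_paris)

# At least one condition must hold (|)
young_or_london = people[(people["Age"] < 28) | (people["City"] == "London")]
print(young_or_london)
```

Python's `and` / `or` keywords do not work element by element on a Series, which is why `&` and `|` are used here.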

---

# 5. Operations on Data

## Basic Operations

```python
# Mean age
print(df['Age'].mean())
```

## Adding Columns

```python
# Add a new column, for example the age in 5 years
df['Age_in_5_years'] = df['Age'] + 5
print(df)
```

## Grouping Data

```python
# Group by city and get average age
print(df.groupby('City')['Age'].mean())
```
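
In the small table above every city appears only once, so each group contains a single row. A sketch with repeated cities shows the effect of grouping more clearly (the `visits` data is made up):

```python
import pandas as pd

# Hypothetical data in which cities repeat
visits = pd.DataFrame({
    "City": ["Paris", "Paris", "London", "London", "London"],
    "Age": [20, 30, 40, 50, 60],
})

# One average per distinct city
avg_age = visits.groupby("City")["Age"].mean()
print(avg_age)
```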

---

# 6. Mini-Exercises

### 6.1 Create a DataFrame with your own data (3 columns, 5 rows)

```python
# Your code here
data = {
    'Animal': ['Dog', 'Cat', 'Rabbit', 'Hamster', 'Bird'],
    'Age': [5, 3, 2, 1, 4],
    'Type': ['Mammal', 'Mammal', 'Mammal', 'Mammal', 'Bird']
}

pets = pd.DataFrame(data)
print(pets)
```

### 6.2 Select only the animals older than 2 years

```python
# Your code here
older_pets = pets[pets['Age'] > 2]
print(older_pets)
```

### 6.3 Group your data by type and calculate the average age

```python
# Your code here
avg_age_by_type = pets.groupby('Type')['Age'].mean()
print(avg_age_by_type)
```

---

# Congratulations! 🎉

You've learned the basics of **Pandas**!

Next, we'll move on to **visualizing data** with **Matplotlib**!

---

# Quick Recap
- **Series** = 1D data; **DataFrame** = 2D data.
- Load data with `read_csv()`, save with `to_csv()`.
- Select columns and rows easily.
- Filter, group, and calculate statistics.

See you in the next notebook!


# Pandas

The name Pandas comes from "panel data" (so not the animal 🐼). It is a more user-friendly way to work with tabular data than plain NumPy arrays.

If you want to learn more about it, visit [https://pandas.pydata.org/](https://pandas.pydata.org/).

In [1]:
# Import dependencies
import pandas as pd

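In [72]:
# Under the hood, the values of a DataFrame are stored as a NumPy array
# (`palmer` is the penguins DataFrame loaded in the "Additional features" section)
type(palmer[["bill_length_mm"]].values)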

numpy.ndarray

# Creating a pandas DataFrame

In [9]:
# Create a Series
series = pd.Series(data=[10, 20, 30, 40, 180])
print(series)

0     10
1     20
2     30
3     40
4    180
dtype: int64


In [10]:
# Skewness of the values: the outlier 180 gives a strong positive (right) skew
series.skew()

np.float64(2.095594125462855)

In [12]:
# Create a small dictionary
# (you should know how to do this by now)
penguins_dico = {
    "bill_length_mm": [39.1, 39.5, 40.3],
    "bill_depth_mm": [18.7, 17.4, 18.0]
    }

print(penguins_dico)

{'bill_length_mm': [39.1, 39.5, 40.3], 'bill_depth_mm': [18.7, 17.4, 18.0]}


In [14]:
penguins_df = pd.DataFrame(data=penguins_dico)
print(penguins_df)

   bill_length_mm  bill_depth_mm
0            39.1           18.7
1            39.5           17.4
2            40.3           18.0


In [18]:
# Have a look at the values attribute
print(penguins_df.values)

# Print a separator line
print("-" * 50)

# Have a look at the type of object it is
print(type(penguins_df.values))
print(penguins_df.shape)
print(penguins_df.dtypes)

[[39.1 18.7]
 [39.5 17.4]
 [40.3 18. ]]
--------------------------------------------------
<class 'numpy.ndarray'>
(3, 2)
bill_length_mm    float64
bill_depth_mm     float64
dtype: object


In [19]:
# So you still have access to familiar methods
penguins_df.sum(axis=0)

bill_length_mm    118.9
bill_depth_mm      54.1
dtype: float64
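
The `axis` argument works as it does in NumPy. A short sketch on the same three penguins (the names `column_means` and `row_sums` are only illustrative):

```python
import pandas as pd

penguins_df = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3],
    "bill_depth_mm": [18.7, 17.4, 18.0],
})

# axis=0 collapses the rows: one result per column
column_means = penguins_df.mean(axis=0)
print(column_means)

# axis=1 collapses the columns: one result per row
row_sums = penguins_df.sum(axis=1)
print(row_sums)
```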

# Better indexing

Because Pandas has column indices, it is capable of selecting data by column name rather than by numeric position.

In [140]:
# Access the column directly
print(palmer["bill_length_mm"])
# Uncomment the next line to try another column
#print(palmer["island"])

0      39.1
1      39.5
2      40.3
3       NaN
4      36.7
       ... 
339    55.8
340    43.5
341    49.6
342    50.8
343    50.2
Name: bill_length_mm, Length: 344, dtype: float64


In [141]:
# You can access several columns by giving their names as a list
print(palmer[["body_mass_g", "bill_length_mm"]])

     body_mass_g  bill_length_mm
0         3750.0            39.1
1         3800.0            39.5
2         3250.0            40.3
3            NaN             NaN
4         3450.0            36.7
..           ...             ...
339       4000.0            55.8
340       3400.0            43.5
341       3775.0            49.6
342       4100.0            50.8
343       3775.0            50.2

[344 rows x 2 columns]


In [142]:
# Accessing both rows and columns is (usually) done with .loc[]
# Note that a label slice includes its end label, so 0:3 returns four rows
palmer.loc[0:3, ["bill_length_mm", "body_mass_g"]]

   bill_length_mm  body_mass_g
0            39.1       3750.0
1            39.5       3800.0
2            40.3       3250.0
3             NaN          NaN


In [145]:
# You can select columns and keep only the rows that match a condition
palmer.loc[(palmer["sex"] == "male"), ["bill_length_mm", "body_mass_g"]]

     bill_length_mm  body_mass_g
0              39.1       3750.0
5              39.3       3650.0
7              39.2       4675.0
13             38.6       3800.0
14             34.6       4400.0
..              ...          ...
334            50.2       3800.0
336            51.9       3950.0
339            55.8       4000.0
341            49.6       3775.0


In [68]:
# Several conditions can be combined with & (each one in its own parentheses)
palmer.loc[(palmer["sex"] == "male") & (palmer["island"] == "Dream"), ["bill_length_mm", "body_mass_g"]]

     bill_length_mm  body_mass_g
31             37.2       3900.0
33             40.9       3900.0
35             39.2       4150.0
36             38.8       3950.0
39             39.8       4650.0
..              ...          ...
334            50.2       3800.0
336            51.9       3950.0
339            55.8       4000.0
341            49.6       3775.0


In [None]:
# Another method exists: .iloc[] selects by integer location
# It follows NumPy conventions: counting starts at 0 and the end of a slice is excluded
palmer.iloc[0:3, 0:2]

   species     island
0   Adelie  Torgersen
1   Adelie  Torgersen
2   Adelie  Torgersen
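
The two indexers treat the end of a slice differently, which is easy to miss. A minimal sketch, using a made-up five-row frame called `small` in place of the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the penguins table
small = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, None, 36.7],
})

# .loc slices by label and includes the end label: rows 0, 1, 2 and 3
by_label = small.loc[0:3]

# .iloc slices by position and excludes the end: rows 0, 1 and 2
by_position = small.iloc[0:3]

print(len(by_label), len(by_position))
```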


# Additional features

Pandas is more than just syntactic sugar on top of NumPy.

In [20]:
# Read the entire Palmer penguins dataset
palmer = pd.read_csv(filepath_or_buffer="../data/penguins.csv")

# Print the DataFrame, its type, and the type of its underlying values
print(palmer)
print(type(palmer))
print(type(palmer.values))

       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
0       Adelie  Torgersen            39.1           18.7              181.0   
1       Adelie  Torgersen            39.5           17.4              186.0   
2       Adelie  Torgersen            40.3           18.0              195.0   
3       Adelie  Torgersen             NaN            NaN                NaN   
4       Adelie  Torgersen            36.7           19.3              193.0   
..         ...        ...             ...            ...                ...   
339  Chinstrap      Dream            55.8           19.8              207.0   
340  Chinstrap      Dream            43.5           18.1              202.0   
341  Chinstrap      Dream            49.6           18.2              193.0   
342  Chinstrap      Dream            50.8           19.0              210.0   
343  Chinstrap      Dream            50.2           18.7              198.0   

     body_mass_g     sex  year  
0         3750.0  

In [22]:
# You can check the column names with the .columns attribute
print(palmer.columns)

Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',
       'flipper_length_mm', 'body_mass_g', 'sex', 'year'],
      dtype='object')


In [26]:
# Columns are also available as attributes of the DataFrame
# This means that you can select individual columns by their name with dot notation rather than with brackets
palmer.species
#palmer["island"]

0         Adelie
1         Adelie
2         Adelie
3         Adelie
4         Adelie
         ...    
339    Chinstrap
340    Chinstrap
341    Chinstrap
342    Chinstrap
343    Chinstrap
Name: species, Length: 344, dtype: object
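
Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. A small sketch with a made-up frame (`frame` and its column names are hypothetical):

```python
import pandas as pd

# One name contains a space, the other clashes with the DataFrame.count method
frame = pd.DataFrame({"bill length": [39.1, 39.5], "count": [1, 2]})

# Bracket notation always returns the column
print(frame["bill length"])
print(frame["count"])

# Dot notation finds the method here, not the "count" column
print(callable(frame.count))
```

Bracket notation is therefore the safer default; dot notation is best kept for quick interactive exploration.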

In [27]:
# You can also get quick summaries of variables
print(palmer.describe(include="number"))

       bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  \
count      342.000000     342.000000         342.000000   342.000000   
mean        43.921930      17.151170         200.915205  4201.754386   
std          5.459584       1.974793          14.061714   801.954536   
min         32.100000      13.100000         172.000000  2700.000000   
25%         39.225000      15.600000         190.000000  3550.000000   
50%         44.450000      17.300000         197.000000  4050.000000   
75%         48.500000      18.700000         213.000000  4750.000000   
max         59.600000      21.500000         231.000000  6300.000000   

              year  
count   344.000000  
mean   2008.029070  
std       0.818356  
min    2007.000000  
25%    2007.000000  
50%    2008.000000  
75%    2009.000000  
max    2009.000000  


In [29]:
# Qualitative variables use the argument "object"
print(palmer.describe(include="object"))

       species  island   sex
count      344     344   333
unique       3       3     2
top     Adelie  Biscoe  male
freq       152     168   168


In [None]:
# You can use the argument "all" to get it for all variables,
# but the resulting table is ugly and somewhat useless
print(palmer.describe(include="all"))

       species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
count      344     344      342.000000     342.000000         342.000000   
unique       3       3             NaN            NaN                NaN   
top     Adelie  Biscoe             NaN            NaN                NaN   
freq       152     168             NaN            NaN                NaN   
mean       NaN     NaN       43.921930      17.151170         200.915205   
std        NaN     NaN        5.459584       1.974793          14.061714   
min        NaN     NaN       32.100000      13.100000         172.000000   
25%        NaN     NaN       39.225000      15.600000         190.000000   
50%        NaN     NaN       44.450000      17.300000         197.000000   
75%        NaN     NaN       48.500000      18.700000         213.000000   
max        NaN     NaN       59.600000      21.500000         231.000000   

        body_mass_g   sex         year  
count    342.000000   333   344.000000  
uniqu

# Data cleaning

Data cleaning is efficient: you can remove rows that contain NaN values with `.dropna()`.

In [32]:
palmer.head()

   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007


In [35]:
palmer.dropna(inplace=False).head()

   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
5   Adelie  Torgersen            39.3           20.6              190.0       3650.0    male  2007


In [36]:
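# Other ways to inspect or remove missing values (uncomment one line to try it)
#palmer.dropna(inplace=False)                  # drop rows with any NaN
#palmer.dropna(inplace=False, subset=["sex"])  # drop rows where "sex" is NaN
#palmer[palmer.isna().any(axis=1)]             # show rows with at least one NaN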

In [39]:
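# Removing NaN on only a particular variable
# By default, .dropna() removes every row that contains at least one NaN,
# so first take a small piece of the data to see what happens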
palmer_piece = palmer.loc[267:272, :].copy()
#
print(palmer_piece)

    species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
267  Gentoo  Biscoe            55.1           16.0              230.0   
268  Gentoo  Biscoe            44.5           15.7              217.0   
269  Gentoo  Biscoe            48.8           16.2              222.0   
270  Gentoo  Biscoe            47.2           13.7              214.0   
271  Gentoo  Biscoe             NaN            NaN                NaN   
272  Gentoo  Biscoe            46.8           14.3              215.0   

     body_mass_g     sex  year  
267       5850.0    male  2009  
268       4875.0     NaN  2009  
269       6000.0    male  2009  
270       4925.0  female  2009  
271          NaN     NaN  2009  
272       4850.0  female  2009  


In [None]:
# If you use .dropna(), you will remove both rows 268 and 271
#
print(palmer_piece.dropna())


--------------------------------------------------
    species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
267  Gentoo  Biscoe            55.1           16.0              230.0   
269  Gentoo  Biscoe            48.8           16.2              222.0   
270  Gentoo  Biscoe            47.2           13.7              214.0   
272  Gentoo  Biscoe            46.8           14.3              215.0   

     body_mass_g     sex  year  
267       5850.0    male  2009  
269       6000.0    male  2009  
270       4925.0  female  2009  
272       4850.0  female  2009  


In [43]:
# But you can specify on which columns you want to drop rows if they contain NaN values
#
print(palmer_piece.dropna(subset=["bill_depth_mm"]))

    species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
267  Gentoo  Biscoe            55.1           16.0              230.0   
268  Gentoo  Biscoe            44.5           15.7              217.0   
269  Gentoo  Biscoe            48.8           16.2              222.0   
270  Gentoo  Biscoe            47.2           13.7              214.0   
272  Gentoo  Biscoe            46.8           14.3              215.0   

     body_mass_g     sex  year  
267       5850.0    male  2009  
268       4875.0     NaN  2009  
269       6000.0    male  2009  
270       4925.0  female  2009  
272       4850.0  female  2009  
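
Before dropping anything, it can help to count how many values are missing in each column. The sketch below uses a made-up three-row frame (`piece`) rather than the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one complete, one missing "sex", one missing both values
piece = pd.DataFrame({
    "bill_depth_mm": [16.0, 15.7, np.nan],
    "sex": ["male", np.nan, np.nan],
})

# Number of missing values per column
missing_per_column = piece.isna().sum()
print(missing_per_column)

# Drop rows with any NaN, or only rows where bill_depth_mm is NaN
any_missing_dropped = piece.dropna()
depth_only = piece.dropna(subset=["bill_depth_mm"])
print(len(any_missing_dropped), len(depth_only))
```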


In [44]:
# You can build new variables
print(palmer["bill_length_mm"] / palmer["bill_depth_mm"])

# Adding a column is done simply by assigning the result to a new column name
palmer["bill_ratio"] = palmer["bill_length_mm"] / palmer["bill_depth_mm"]

0      2.090909
1      2.270115
2      2.238889
3           NaN
4      1.901554
         ...   
339    2.818182
340    2.403315
341    2.725275
342    2.673684
343    2.684492
Length: 344, dtype: float64


In [45]:
# You can check to make sure it is there.
print(palmer.columns)

Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',
       'flipper_length_mm', 'body_mass_g', 'sex', 'year', 'bill_ratio'],
      dtype='object')
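
If you would rather not modify the original DataFrame, `.assign()` is one alternative: it returns a new DataFrame with the extra column. A sketch on made-up data (`bills` and `with_ratio` are illustrative names):

```python
import pandas as pd

bills = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5],
    "bill_depth_mm": [18.7, 17.4],
})

# assign() leaves `bills` untouched and returns a copy with the new column
with_ratio = bills.assign(
    bill_ratio=bills["bill_length_mm"] / bills["bill_depth_mm"]
)

print(with_ratio.columns)
print(bills.columns)
```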


# Summarizing variables using .groupby

In [66]:
# You can use the .groupby() method to group dataframes by a qualitative variable
print(palmer.groupby(by="island"))
print(type(palmer.groupby(by="island")))

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001CE2BEAB590>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>


In [73]:
# Calling an aggregation method on the grouped object applies it to each group
# .count()
# .mean()
# .min()
# .max()
print("Number of non-missing values per column, by island")
(palmer.groupby(by="island").count())

Number of non-missing values per column, by island


           species  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex  year  bill_ratio
island
Biscoe         168             167            167                167          167  163   168         167
Dream          124             124            124                124          124  123   124         124
Torgersen       52              51             51                 51           51   47    52          51


In [None]:
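# The following code will fail in recent pandas versions, because .mean()
# cannot average the text columns (species, sex). Look at the error message to
# understand what went wrong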
print("Average penguin bill length by island")
print(palmer.groupby(by="island").mean())
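
Two common ways around this are sketched below on a made-up frame (`birds` is hypothetical): ask `.mean()` to skip non-numeric columns, or select the numeric columns yourself before aggregating.

```python
import pandas as pd

# Hypothetical miniature of the penguins table, with one text column
birds = pd.DataFrame({
    "island": ["Dream", "Dream", "Biscoe"],
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [39.0, 49.0, 47.0],
})

# Option 1: only average the numeric columns
means_numeric = birds.groupby("island").mean(numeric_only=True)
print(means_numeric)

# Option 2: pick the columns to average explicitly
means_selected = birds.groupby("island")[["bill_length_mm"]].mean()
print(means_selected)
```

The second form is more explicit, which helps when a table contains numeric columns that should not be averaged (a year or an ID, for example).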

In [52]:
# You can group by several variables by giving their names as a list
print("Number of non-missing values per column, by island and by species")
print(palmer.groupby(by=["island", "species"]).count())

Number of non-missing values per column, by island and by species
                     bill_length_mm  bill_depth_mm  flipper_length_mm  \
island    species                                                       
Biscoe    Adelie                 44             44                 44   
          Gentoo                123            123                123   
Dream     Adelie                 56             56                 56   
          Chinstrap              68             68                 68   
Torgersen Adelie                 51             51                 51   

                     body_mass_g  sex  year  bill_ratio  
island    species                                        
Biscoe    Adelie              44   44    44          44  
          Gentoo             123  119   124         123  
Dream     Adelie              56   55    56          56  
          Chinstrap           68   68    68          68  
Torgersen Adelie              51   47    52          51  


In [74]:
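# Grouping by three variables works the same way
# Rows where a grouping variable (here sex) is NaN are left out by default
(palmer.groupby(by=["island", "species", "sex"]).count())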

                            bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  year  bill_ratio
island    species   sex
Biscoe    Adelie    female              22             22                 22           22    22          22
                    male                22             22                 22           22    22          22
          Gentoo    female              58             58                 58           58    58          58
                    male                61             61                 61           61    61          61
Dream     Adelie    female              27             27                 27           27    27          27
                    male                28             28                 28           28    28          28
          Chinstrap female              34             34                 34           34    34          34
                    male                34             34                 34           34    34          34
Torgersen Adelie    female              24             24                 24           24    24          24
                    male                23             23                 23           23    23          23


In [None]:
# You can choose a particular variable
print("Average bill ratio by species")
print(palmer.dropna().groupby("species")["bill_ratio"].mean())

Average bill ratio by species
species
Adelie       2.121478
Chinstrap    2.653756
Gentoo       3.176602
Name: bill_ratio, dtype: float64


In [None]:
palmer.dropna().groupby("species")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001CE2BEA9490>

In [61]:
palmer.dropna().groupby("species")["bill_ratio"]

<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001CE2BE3B9B0>

In [131]:
# You can also select several variables
# Note that their names need to be given as a list.
print("Average body mass and bill ratio by species")
print(palmer.dropna().groupby("species")[["body_mass_g", "bill_ratio"]].mean())

Average body mass and bill ratio by species
           body_mass_g  bill_ratio
species                           
Adelie     3706.164384    2.121478
Chinstrap  3733.088235    2.653756
Gentoo     5092.436975    3.176602


In [None]:
# You can go further and compute several aggregations at once with .agg()
palmer.groupby(["island", "species", "sex"])[["body_mass_g", "bill_ratio"]].agg(["min", "median", "max"])

                            body_mass_g                  bill_ratio
                                    min  median     max         min  median   max
island    species   sex
Biscoe    Adelie    female       2850.0  3375.0  3900.0        1.87    2.16  2.36
                    male         3550.0  4000.0  4775.0        1.89    2.15  2.33
          Gentoo    female       3950.0  4700.0  5200.0        2.84    3.22  3.49
                    male         4750.0  5500.0  6300.0        2.57    3.13  3.61
Dream     Adelie    female       2900.0  3400.0  3700.0        1.95    2.07  2.37
                    male         3425.0  3987.5  4650.0        1.86    2.13  2.39
          Chinstrap female       2700.0  3550.0  4150.0        2.35    2.63  3.26
                    male         3250.0  3950.0  4800.0        2.48    2.67  2.87
Torgersen Adelie    female       2900.0  3400.0  3800.0        1.76    2.17  2.43
                    male         3325.0  4000.0  4700.0        1.64    2.14  2.45
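
When the two-level column header becomes hard to read, named aggregation is one way to get flat, self-describing column names. The sketch below uses made-up data (`birds`); the output names `lightest` and `heaviest` are arbitrary choices:

```python
import pandas as pd

# Hypothetical miniature of the penguins table
birds = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5200.0],
})

# Each keyword is a new column name; the tuple is (source column, function)
summary = birds.groupby("species").agg(
    lightest=("body_mass_g", "min"),
    heaviest=("body_mass_g", "max"),
)
print(summary)
```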


In [107]:
# You can compute several summaries of the same variable
print("Average penguin bill length by island")
print(palmer.dropna().groupby("island")["bill_length_mm"].mean())
#
print("\nCounts of penguin bill length per island")
palmer.dropna().groupby("island")["bill_length_mm"].count()

Average penguin bill length by island
island
Biscoe       45.248466
Dream        44.221951
Torgersen    39.038298
Name: bill_length_mm, dtype: float64

Counts of penguin bill length per island


island
Biscoe       163
Dream        123
Torgersen     47
Name: bill_length_mm, dtype: int64

In [38]:
# You can do the same with several grouping variables
print("Average penguin bill length by island, then by species, then by sex")
print(palmer.groupby(["island", "species", "sex"])["bill_length_mm"].mean())
#
print("\nCounts of penguin bill length per island, then by species then by sex")
palmer.groupby(["island", "species", "sex"])["bill_length_mm"].size()

Average penguin bill length by island, then by species, then by sex
island     species    sex   
Biscoe     Adelie     female    37.359091
                      male      40.590909
           Gentoo     female    45.563793
                      male      49.473770
Dream      Adelie     female    36.911111
                      male      40.071429
           Chinstrap  female    46.573529
                      male      51.094118
Torgersen  Adelie     female    37.554167
                      male      40.586957
Name: bill_length_mm, dtype: float64

Counts of penguin bill length per island, then by species then by sex


island     species    sex   
Biscoe     Adelie     female    22
                      male      22
           Gentoo     female    58
                      male      61
Dream      Adelie     female    27
                      male      28
           Chinstrap  female    34
                      male      34
Torgersen  Adelie     female    24
                      male      23
Name: bill_length_mm, dtype: int64
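
The cells above use both `.count()` and `.size()`. They differ in how they treat missing values, as this sketch on made-up data (`birds`) illustrates:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of the two Dream measurements is missing
birds = pd.DataFrame({
    "island": ["Dream", "Dream", "Biscoe"],
    "bill_length_mm": [39.0, np.nan, 47.0],
})

grouped = birds.groupby("island")["bill_length_mm"]

# size() counts the rows in each group, NaN included
sizes = grouped.size()
print(sizes)

# count() counts only the non-missing values in each group
counts = grouped.count()
print(counts)
```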

In [None]:
# You can go further and compute several aggregations at once with .agg()
palmer.groupby(["island", "species", "sex"])[["bill_length_mm", "bill_depth_mm"]].agg(["min","max"])

                            bill_length_mm        bill_depth_mm
                                       min   max            min   max
island    species   sex
Biscoe    Adelie    female            34.5  40.5           16.0  20.7
                    male              37.6  45.6           17.2  21.1
          Gentoo    female            40.9  50.5           13.1  15.5
                    male              44.4  59.6           14.1  17.3
Dream     Adelie    female            32.1  42.2           15.5  19.3
                    male              36.3  44.1           17.0  21.2
          Chinstrap female            40.9  58.0           16.4  19.4
                    male              48.5  55.8           17.5  20.8
Torgersen Adelie    female            33.5  41.1           15.9  19.3
                    male              34.6  46.0           17.6  21.5


In [69]:
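# Save the aggregated table to a CSV file, then read it back
toto = palmer.groupby(["island", "species", "sex"])[["bill_length_mm", "bill_depth_mm"]].agg(["min","max"])
toto.to_csv(path_or_buf="toto.csv")
# header=2 uses the third line of the file (the index names) as the header,
# so the four value columns come back without their names
dodo = pd.read_csv("toto.csv", header=2)
dodo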

      island    species     sex  Unnamed: 3  Unnamed: 4  Unnamed: 5  Unnamed: 6
0     Biscoe     Adelie  female        34.5        40.5        16.0        20.7
1     Biscoe     Adelie    male        37.6        45.6        17.2        21.1
2     Biscoe     Gentoo  female        40.9        50.5        13.1        15.5
3     Biscoe     Gentoo    male        44.4        59.6        14.1        17.3
4      Dream     Adelie  female        32.1        42.2        15.5        19.3
5      Dream     Adelie    male        36.3        44.1        17.0        21.2
6      Dream  Chinstrap  female        40.9        58.0        16.4        19.4
7      Dream  Chinstrap    male        48.5        55.8        17.5        20.8
8  Torgersen     Adelie  female        33.5        41.1        15.9        19.3
9  Torgersen     Adelie    male        34.6        46.0        17.6        21.5
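
To keep both header levels and the grouping index when reading such a file back, one possible approach is to tell `read_csv` which lines are headers and which columns form the index. This is a sketch on made-up data that writes to an in-memory string instead of a file; `birds`, `summary`, and `restored` are illustrative names:

```python
import io

import pandas as pd

# Hypothetical miniature of the penguins table
birds = pd.DataFrame({
    "island": ["Dream", "Dream", "Biscoe", "Biscoe"],
    "sex": ["female", "male", "female", "male"],
    "bill_length_mm": [36.9, 40.1, 37.4, 40.6],
    "bill_depth_mm": [17.6, 18.8, 17.7, 19.0],
})

summary = birds.groupby(["island", "sex"])[
    ["bill_length_mm", "bill_depth_mm"]
].agg(["min", "max"])

# to_csv without a path returns the CSV text
csv_text = summary.to_csv()

# The first two lines hold the column levels, the first two columns the index
restored = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=[0, 1])
print(restored)
```

With a real file, the same `header` and `index_col` arguments would apply; for the three grouping variables used above, the index would span the first three columns.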
