# Practicing Series Vectorized Operations with Penguins Data

Now it's time to put your knowledge of vectorized operations on pandas Series to the test. In this lab, we work with a dataset that describes penguins. Each penguin is described by its species, island, culmen length, culmen depth, flipper length, body mass, and sex.

The activities focus on two skills: performing arithmetic operations on Series and applying mathematical functions to Series.
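As a quick warm-up before loading the real data, here is a minimal sketch of both ideas on a toy Series. The values and the variable names (`masses_g`, `masses_kg`, `sqrt_masses`) are made up for illustration and do not come from the penguins dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the penguins dataset)
masses_g = pd.Series([2500.0, 3600.0, 4900.0])

# Arithmetic with a scalar is applied to every element at once
masses_kg = masses_g / 1000

# NumPy universal functions also work element-wise on a Series
sqrt_masses = np.sqrt(masses_g)

print(masses_kg.tolist())    # [2.5, 3.6, 4.9]
print(sqrt_masses.tolist())  # [50.0, 60.0, 70.0]
```

Neither operation needs an explicit loop; pandas applies it to the whole Series in one step.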

In addition to individual topic-based activities, there will also be mixed-topic activities that require you to combine different operations to achieve a specific outcome. These activities will test your ability to apply multiple concepts simultaneously.

By the end of this lab, you will have gained a solid understanding of vectorized operations on Pandas series and be able to manipulate and analyze data efficiently using these techniques.

Let's dive into the activities and explore the power of vectorized operations on Pandas series!

## Look at the dataset

In [1]:
import pandas as pd

In [6]:
# Read the dataset into a DataFrame
df = pd.read_csv('/content/drive/MyDrive/Projetos/DataWars/penguins.csv')
df

Unnamed: 0,species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
3,Adelie,Torgersen,,,,,
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE
...,...,...,...,...,...,...,...
339,Gentoo,Biscoe,,,,,
340,Gentoo,Biscoe,46.8,14.3,215.0,4850.0,FEMALE
341,Gentoo,Biscoe,50.4,15.7,222.0,5750.0,MALE
342,Gentoo,Biscoe,45.2,14.8,212.0,5200.0,FEMALE


In [8]:
# Count the missing values in each column
df.isna().sum()

species               0
island                0
culmen_length_mm      2
culmen_depth_mm       2
flipper_length_mm     2
body_mass_g           2
sex                  10
dtype: int64
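The count above works because `isna()` returns a table of booleans and `sum()` treats each `True` as 1. A small sketch on a toy DataFrame (hypothetical values and names) may make that clearer:

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (hypothetical values)
toy = pd.DataFrame({
    "body_mass_g": [3750.0, np.nan, 3250.0],
    "sex": ["MALE", None, None],
})

# isna() yields True wherever a value is missing
mask = toy.isna()

# Summing booleans column by column counts the missing values
missing_per_column = mask.sum()

# any(axis=1) flags rows that have at least one missing value
rows_with_gaps = mask.any(axis=1).sum()

print(int(missing_per_column["body_mass_g"]))  # 1
print(int(missing_per_column["sex"]))          # 2
print(int(rows_with_gaps))                     # 2
```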

In [9]:
# Show the rows that contain at least one missing value
df.loc[df.isna().any(axis=1)]

Unnamed: 0,species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
3,Adelie,Torgersen,,,,,
8,Adelie,Torgersen,34.1,18.1,193.0,3475.0,
9,Adelie,Torgersen,42.0,20.2,190.0,4250.0,
10,Adelie,Torgersen,37.8,17.1,186.0,3300.0,
11,Adelie,Torgersen,37.8,17.3,180.0,3700.0,
47,Adelie,Dream,37.5,18.9,179.0,2975.0,
246,Gentoo,Biscoe,44.5,14.3,216.0,4100.0,
286,Gentoo,Biscoe,46.2,14.4,214.0,4650.0,
324,Gentoo,Biscoe,47.3,13.8,216.0,4725.0,
339,Gentoo,Biscoe,,,,,


In [11]:
# Count the duplicated rows
df.duplicated().sum()

0

In [12]:
# There are no duplicates, so I only drop the rows with missing values
penguins = df.dropna()
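Note that `dropna()` returns a new DataFrame and keeps the original index labels, which is why labels such as 3 are absent from the Series printed later. The sketch below illustrates this on a toy frame. The data is hypothetical, and `reset_index(drop=True)` is shown only as an optional step that this notebook does not use.

```python
import numpy as np
import pandas as pd

# Toy frame where the middle row is incomplete (hypothetical values)
toy = pd.DataFrame({
    "culmen_length_mm": [39.1, np.nan, 40.3],
    "sex": ["MALE", None, "FEMALE"],
})

# dropna() returns a new frame; the original is left untouched
cleaned = toy.dropna()
print(len(toy), len(cleaned))      # 3 2

# The surviving rows keep their original labels, leaving a gap
print(cleaned.index.tolist())      # [0, 2]

# Optional: renumber the rows from 0 if the gaps are unwanted
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1]
```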

In [13]:
# Extract each column as a pandas Series
species = penguins['species']
island = penguins['island']
culmen_length_mm = penguins['culmen_length_mm']
culmen_depth_mm = penguins['culmen_depth_mm']
flipper_length_mm = penguins['flipper_length_mm']
body_mass_g = penguins['body_mass_g']
gender = penguins['sex']

In [14]:
print("Species: ", species)

Species:  0      Adelie
1      Adelie
2      Adelie
4      Adelie
5      Adelie
        ...  
338    Gentoo
340    Gentoo
341    Gentoo
342    Gentoo
343    Gentoo
Name: species, Length: 334, dtype: object


In [15]:
print("Island: ", island)

Island:  0      Torgersen
1      Torgersen
2      Torgersen
4      Torgersen
5      Torgersen
         ...    
338       Biscoe
340       Biscoe
341       Biscoe
342       Biscoe
343       Biscoe
Name: island, Length: 334, dtype: object


In [16]:
print("Culmen Length (mm): ", culmen_length_mm)

Culmen Length (mm):  0      39.1
1      39.5
2      40.3
4      36.7
5      39.3
       ... 
338    47.2
340    46.8
341    50.4
342    45.2
343    49.9
Name: culmen_length_mm, Length: 334, dtype: float64


In [17]:
print("Culmen Depth (mm): ", culmen_depth_mm)

Culmen Depth (mm):  0      18.7
1      17.4
2      18.0
4      19.3
5      20.6
       ... 
338    13.7
340    14.3
341    15.7
342    14.8
343    16.1
Name: culmen_depth_mm, Length: 334, dtype: float64


In [18]:
print("Flipper Length (mm): ", flipper_length_mm)

Flipper Length (mm):  0      181.0
1      186.0
2      195.0
4      193.0
5      190.0
       ...  
338    214.0
340    215.0
341    222.0
342    212.0
343    213.0
Name: flipper_length_mm, Length: 334, dtype: float64


In [19]:
print("Body Mass (g): ", body_mass_g)

Body Mass (g):  0      3750.0
1      3800.0
2      3250.0
4      3450.0
5      3650.0
        ...  
338    4925.0
340    4850.0
341    5750.0
342    5200.0
343    5400.0
Name: body_mass_g, Length: 334, dtype: float64


In [20]:
print("Gender: ", gender)

Gender:  0        MALE
1      FEMALE
2      FEMALE
4      FEMALE
5        MALE
        ...  
338    FEMALE
340    FEMALE
341      MALE
342    FEMALE
343      MALE
Name: sex, Length: 334, dtype: object


#### 1. Add a constant value of 100 to the 'body_mass_g' series


In [21]:
body_mass_g_plus_100 = body_mass_g + 100
body_mass_g_plus_100

0      3850.0
1      3900.0
2      3350.0
4      3550.0
5      3750.0
        ...  
338    5025.0
340    4950.0
341    5850.0
342    5300.0
343    5500.0
Name: body_mass_g, Length: 334, dtype: float64

#### 2. Subtract the 'culmen_length_mm' series from the 'flipper_length_mm' series

In [22]:
length_difference = flipper_length_mm - culmen_length_mm
length_difference

0      141.9
1      146.5
2      154.7
4      156.3
5      150.7
       ...  
338    166.8
340    168.2
341    171.6
342    166.8
343    163.1
Length: 334, dtype: float64
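The result above has no `Name` because the two operands have different names. Arithmetic between two Series also matches values by index label rather than by position. The following sketch shows both behaviours on toy data; the Series `a` and `b` and their values are hypothetical.

```python
import pandas as pd

# Two toy Series with different names and partly different labels
a = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2], name="flipper")
b = pd.Series([1.0, 2.0, 3.0], index=[2, 1, 3], name="culmen")

diff = a - b

# Values are paired by label, not by position
print(diff.loc[1])  # 18.0  (20.0 - 2.0)
print(diff.loc[2])  # 29.0  (30.0 - 1.0)

# Labels present in only one operand produce NaN
print(diff.isna().sum())  # 2  (labels 0 and 3)

# Differing operand names leave the result unnamed
print(diff.name)  # None

# An operation with a scalar keeps the original name
print((a + 100).name)  # flipper
```

In the notebook both Series come from the same cleaned DataFrame, so their labels match exactly and no NaN values appear.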

#### 3. Multiply the 'culmen_depth_mm' series by 2

In [23]:
double_culmen_depth_mm = culmen_depth_mm * 2
double_culmen_depth_mm

0      37.4
1      34.8
2      36.0
4      38.6
5      41.2
       ... 
338    27.4
340    28.6
341    31.4
342    29.6
343    32.2
Name: culmen_depth_mm, Length: 334, dtype: float64

#### 4. Raise the 'flipper_length_mm' series to the power of 2

In [24]:
flipper_length_mm_squared = flipper_length_mm ** 2
flipper_length_mm_squared

0      32761.0
1      34596.0
2      38025.0
4      37249.0
5      36100.0
        ...   
338    45796.0
340    46225.0
341    49284.0
342    44944.0
343    45369.0
Name: flipper_length_mm, Length: 334, dtype: float64

#### 5. Calculate the mean of the 'culmen_length_mm' series and subtract it from each value in the series

In [26]:
culmen_length_mm_mean_centered = culmen_length_mm - culmen_length_mm.mean()
culmen_length_mm_mean_centered

0     -4.894311
1     -4.494311
2     -3.694311
4     -7.294311
5     -4.694311
         ...   
338    3.205689
340    2.805689
341    6.405689
342    1.205689
343    5.905689
Name: culmen_length_mm, Length: 334, dtype: float64
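A mean-centered Series has a mean of zero, up to floating-point rounding. The sketch below checks this on toy values and, as an optional extension that is not part of the lab, divides by the standard deviation to obtain z-scores. All names and values are illustrative.

```python
import pandas as pd

# Toy values (hypothetical)
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# .mean() returns a single scalar, which is then broadcast
# across every element of the Series
centered = values - values.mean()
print(centered.tolist())  # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(centered.mean())    # 0.0

# Optional extension: divide by the sample standard deviation
# (pandas uses ddof=1 by default) to get z-scores
z_scores = centered / values.std()
```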

#### 6. Concatenate the 'species' and 'gender' series

In [27]:
species_and_gender = species + '-' + gender
species_and_gender

0        Adelie-MALE
1      Adelie-FEMALE
2      Adelie-FEMALE
4      Adelie-FEMALE
5        Adelie-MALE
           ...      
338    Gentoo-FEMALE
340    Gentoo-FEMALE
341      Gentoo-MALE
342    Gentoo-FEMALE
343      Gentoo-MALE
Length: 334, dtype: object
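The `+` operator concatenates string Series element by element. pandas also provides `Series.str.cat`, which should give the same result here. The sketch below compares the two on toy data (`species_toy` and `sex_toy` are hypothetical names).

```python
import pandas as pd

# Toy string Series (hypothetical values)
species_toy = pd.Series(["Adelie", "Gentoo"])
sex_toy = pd.Series(["MALE", "FEMALE"])

# Operator form, as used in the cell above
with_plus = species_toy + "-" + sex_toy

# Equivalent method form with an explicit separator
with_cat = species_toy.str.cat(sex_toy, sep="-")

print(with_cat.tolist())           # ['Adelie-MALE', 'Gentoo-FEMALE']
print(with_plus.equals(with_cat))  # True
```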

#### 7. Perform element-wise addition of the 'culmen_length_mm' and 'culmen_depth_mm' series

In [28]:
culmen_length_plus_depth_mm = culmen_length_mm + culmen_depth_mm
culmen_length_plus_depth_mm

0      57.8
1      56.9
2      58.3
4      56.0
5      59.9
       ... 
338    60.9
340    61.1
341    66.1
342    60.0
343    66.0
Length: 334, dtype: float64

#### 8. Sort `culmen_length_mm` in descending order



In [29]:
culmen_length_mm_sorted = culmen_length_mm.sort_values(ascending=False)
culmen_length_mm_sorted

253    59.6
169    58.0
321    55.9
215    55.8
335    55.1
       ... 
18     34.4
92     34.0
70     33.5
98     33.1
142    32.1
Name: culmen_length_mm, Length: 334, dtype: float64
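As the output shows, `sort_values` reorders the rows but each value keeps its original index label. The sketch below repeats this on toy data and shows `nlargest` as one possible shortcut when only the top few values are needed. The values and labels are hypothetical.

```python
import pandas as pd

# Toy Series with non-consecutive labels (hypothetical values)
lengths = pd.Series([39.1, 59.6, 32.1, 46.8], index=[0, 253, 142, 340])

# Sorting moves the labels along with their values
ordered = lengths.sort_values(ascending=False)
print(ordered.index.tolist())  # [253, 340, 0, 142]

# nlargest(n) returns only the n largest values, already sorted
top_two = lengths.nlargest(2)
print(top_two.tolist())        # [59.6, 46.8]
```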

#### 9. Divide `flipper_length_mm` by `culmen_length_mm`


In [30]:
length_ratio = flipper_length_mm / culmen_length_mm
length_ratio

0      4.629156
1      4.708861
2      4.838710
4      5.258856
5      4.834606
         ...   
338    4.533898
340    4.594017
341    4.404762
342    4.690265
343    4.268537
Length: 334, dtype: float64
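One caveat about vectorized division: with floating-point Series, pandas does not raise an error when a denominator is zero but returns `inf` instead. That does not occur in this dataset, since every culmen length is positive. The sketch below only illustrates the behaviour on made-up values.

```python
import pandas as pd

# Toy numerator and denominator (hypothetical values)
numerator = pd.Series([10.0, 5.0])
denominator = pd.Series([2.0, 0.0])

# Element-wise division; the zero denominator yields inf, not an error
ratio = numerator / denominator
print(ratio.tolist())  # [5.0, inf]
```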