<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Exercise.png"  style="display: block; margin-left: auto; margin-right: auto;"/>
</div>

# Exercise: Working with DataFrames


In this exercise, we explore ways to manipulate the data stored in a Pandas DataFrame.

## Learning objectives

By the end of this exercise, you should be able to:
* Sort, filter, and group a Pandas DataFrame.
* Create and delete columns in a Pandas DataFrame.
* Transform a Pandas DataFrame.

## Exercises

### Import libraries and dataset

In [1]:
import pandas as pd
import numpy as np

We are using a dataset named `Animals.csv` that stores information about a variety of animal species.

In [2]:
animal_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/Python/Animals.csv')

### Exercise 1

Filter the dataset to show only species that have an `Average Lifespan (Years)` **greater than 20** and are classified as `Vulnerable` in `Conservation Status`.

In [4]:
# Your solution here...

# Keep species with Average Lifespan (Years) > 20 and Conservation Status == 'Vulnerable'
filtered_df = animal_df[(animal_df['Average Lifespan (Years)'] > 20) & (animal_df['Conservation Status'] == 'Vulnerable')]

filtered_df



Unnamed: 0,Species,Average Lifespan (Years),Habitat,Conservation Status
0,African Elephant,60,Grasslands,Vulnerable
4,Komodo Dragon,30,Islands,Vulnerable
5,Polar Bear,25,Arctic,Vulnerable
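
Each comparison in the filter is wrapped in parentheses because `&` binds more tightly than `>` and `==` in Python. As a minimal sketch, the same filter can be written with named masks. The sample below is built by hand from rows shown in this notebook, and `sample_df`, `is_long_lived`, `is_vulnerable`, and `sample_filtered` are illustrative names only:

```python
import pandas as pd

# Small sample built by hand from rows shown in this notebook.
sample_df = pd.DataFrame({
    'Species': ['African Elephant', 'Bengal Tiger', 'Giant Panda', 'Komodo Dragon'],
    'Average Lifespan (Years)': [60, 15, 20, 30],
    'Conservation Status': ['Vulnerable', 'Endangered', 'Vulnerable', 'Vulnerable'],
})

# Name each condition so the combined filter reads like the task statement.
is_long_lived = sample_df['Average Lifespan (Years)'] > 20
is_vulnerable = sample_df['Conservation Status'] == 'Vulnerable'

# .loc with a boolean mask keeps the rows where both conditions are True.
sample_filtered = sample_df.loc[is_long_lived & is_vulnerable]
print(sample_filtered)
```

Note that the Giant Panda is excluded here: its lifespan is exactly 20, and the condition is strictly greater than 20.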


### Exercise 2

**a)** Create a new column `Lifespan in Months` by converting the `Average Lifespan (Years)` to months.


In [14]:
# Your solution here...


animal_df['Lifespan in Months'] = animal_df['Average Lifespan (Years)'] * 12

animal_df

Unnamed: 0,Species,Average Lifespan (Years),Habitat,Conservation Status,Lifespan in Months
0,African Elephant,60,Grasslands,Vulnerable,720
1,Bengal Tiger,15,Forests,Endangered,180
2,Blue Whale,80,Ocean,Endangered,960
3,Giant Panda,20,Temperate Forest,Vulnerable,240
4,Komodo Dragon,30,Islands,Vulnerable,360
5,Polar Bear,25,Arctic,Vulnerable,300
6,Red Fox,5,Various,Least Concern,60
7,Emperor Penguin,20,Antarctic,Near Threatened,240
8,Koala,13,Eucalyptus Forests,Vulnerable,156
9,Orangutan,35,Rainforests,Critically Endangered,420


**b)** Delete the `Lifespan in Months` column from the DataFrame.

In [15]:
# Your solution here...

animal_df.drop(columns=['Lifespan in Months'], inplace=True)
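
One practical point when working in a notebook: a cell that drops a column with `inplace=True` raises a `KeyError` if it is run a second time, because the column is already gone. A minimal sketch of two ways around this, using a hand-built sample (`sample_df`, `trimmed`, and `trimmed_again` are illustrative names):

```python
import pandas as pd

# Hand-built sample that already contains the derived column.
sample_df = pd.DataFrame({
    'Species': ['Red Fox', 'Koala'],
    'Average Lifespan (Years)': [5, 13],
    'Lifespan in Months': [60, 156],
})

# Without inplace=True, drop returns a new DataFrame and leaves sample_df unchanged.
trimmed = sample_df.drop(columns=['Lifespan in Months'])

# errors='ignore' makes the call safe to repeat when the column is already gone.
trimmed_again = trimmed.drop(columns=['Lifespan in Months'], errors='ignore')
print(trimmed_again.columns.tolist())
```

Returning a new DataFrame keeps the original available for comparison, at the cost of holding both in memory.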

### Exercise 3

**a)** Filter the DataFrame `animal_df` to include only species with an `Average Lifespan (Years)` **greater than 50 years**. Store the result in a new DataFrame `long_lived_species`.

In [17]:
# Your solution here...

long_lived_species = animal_df[animal_df['Average Lifespan (Years)'] > 50]

long_lived_species

Unnamed: 0,Species,Average Lifespan (Years),Habitat,Conservation Status
0,African Elephant,60,Grasslands,Vulnerable
2,Blue Whale,80,Ocean,Endangered
13,Green Sea Turtle,80,"Oceans, Beaches",Endangered


**b)** Using `apply` with a `lambda` function, convert the `Average Lifespan (Years)` values in `long_lived_species` to the float data type.

In [20]:
# Your solution here...

# Work on an explicit copy so the assignment does not trigger a SettingWithCopyWarning
long_lived_species = long_lived_species.copy()
long_lived_species['Average Lifespan (Years)'] = long_lived_species['Average Lifespan (Years)'].apply(lambda x: float(x))

# Display the updated DataFrame
long_lived_species

Unnamed: 0,Species,Average Lifespan (Years),Habitat,Conservation Status
0,African Elephant,60.0,Grasslands,Vulnerable
2,Blue Whale,80.0,Ocean,Endangered
13,Green Sea Turtle,80.0,"Oceans, Beaches",Endangered
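
The exercise asks for `apply` with a `lambda`, but `astype` performs the same conversion in a single vectorised call. A minimal sketch on a hand-built sample, assuming the column holds integers as in the outputs above (`sample_df` and `sample_long_lived` are illustrative names):

```python
import pandas as pd

# Hand-built sample using values shown in this notebook.
sample_df = pd.DataFrame({
    'Species': ['African Elephant', 'Blue Whale', 'Red Fox'],
    'Average Lifespan (Years)': [60, 80, 5],
})

# .copy() gives an independent DataFrame, so the assignment below
# modifies it directly instead of a slice of sample_df.
sample_long_lived = sample_df[sample_df['Average Lifespan (Years)'] > 50].copy()

# astype converts the whole column at once, without a Python-level lambda.
sample_long_lived['Average Lifespan (Years)'] = (
    sample_long_lived['Average Lifespan (Years)'].astype(float)
)
print(sample_long_lived.dtypes)
```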


### Exercise 4

List all columns for species that have the string "`Endangered`" as part of their conservation status.

In [26]:
# Your solution here...

endangered_df = animal_df[animal_df['Conservation Status'].str.contains('Endangered')]

endangered_df

Unnamed: 0,Species,Average Lifespan (Years),Habitat,Conservation Status
1,Bengal Tiger,15,Forests,Endangered
2,Blue Whale,80,Ocean,Endangered
9,Orangutan,35,Rainforests,Critically Endangered
13,Green Sea Turtle,80,"Oceans, Beaches",Endangered
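
`str.contains` is case-sensitive by default and returns a missing value for rows where the status itself is missing, which cannot be used directly as a filter. The sketch below shows the `case` and `na` parameters on a hand-built sample. The `Unknown Species` row with a missing status is hypothetical and does not come from the dataset:

```python
import pandas as pd

# Hand-built sample; the last row is a hypothetical entry with a missing status.
sample_df = pd.DataFrame({
    'Species': ['Bengal Tiger', 'Orangutan', 'Red Fox', 'Unknown Species'],
    'Conservation Status': ['Endangered', 'Critically Endangered', 'Least Concern', None],
})

# case=False ignores capitalisation; na=False treats missing values as non-matches,
# so the mask stays strictly boolean and can be used for indexing.
mask = sample_df['Conservation Status'].str.contains('endangered', case=False, na=False)
sample_endangered = sample_df[mask]
print(sample_endangered)
```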


### Exercise 5

List the mean `Average Lifespan (Years)` of species for each `Habitat` in descending order.

In [31]:
# Your solution here...

animal_df.groupby('Habitat')['Average Lifespan (Years)'].mean().sort_values(ascending=False)

Habitat
Oceans, Beaches       80.0
Ocean                 80.0
Grasslands            60.0
Rainforests           35.0
Islands               30.0
Arctic                25.0
Antarctic             20.0
Temperate Forest      20.0
Forests, Seacoasts    20.0
Forests               15.0
Mountain Ranges       15.0
Savanna               14.0
Eucalyptus Forests    13.0
Forests, Tundra        8.0
Various                5.0
Name: Average Lifespan (Years), dtype: float64
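
Most habitats above appear to contain a single species, so it can help to see the group size next to each mean. A minimal sketch using named aggregation follows. The sample is hand-built, `Species A` and `Species B` and their values are hypothetical rows added so that each habitat has two members, and `mean_lifespan` and `species_count` are illustrative column names:

```python
import pandas as pd

# Hand-built sample; 'Species A' and 'Species B' are hypothetical rows.
sample_df = pd.DataFrame({
    'Species': ['Blue Whale', 'Bengal Tiger', 'Species A', 'Species B'],
    'Average Lifespan (Years)': [80, 15, 10, 30],
    'Habitat': ['Ocean', 'Forests', 'Forests', 'Ocean'],
})

# Named aggregation: each keyword becomes an output column,
# defined as a (source column, aggregation function) pair.
habitat_summary = (
    sample_df
    .groupby('Habitat')
    .agg(
        mean_lifespan=('Average Lifespan (Years)', 'mean'),
        species_count=('Species', 'count'),
    )
    .sort_values('mean_lifespan', ascending=False)
)
print(habitat_summary)
```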

### Exercise 6

Create a column `Threat Level` that categorises species into `High`, `Medium`, or `Low` based on their `Conservation Status`: 
- **High** for `Critically Endangered` and `Endangered` 
- **Medium** for `Vulnerable` and `Near Threatened`
- **Low** for `Least Concern`

In [30]:
# Your solution here...

# Function to categorise species based on their conservation status.
def categorise_threat(status):
    if status in ['Critically Endangered', 'Endangered']:
        return 'High'
    elif status in ['Vulnerable', 'Near Threatened']:
        return 'Medium'
    else:
        return 'Low'

# Apply the 'categorise_threat' function to the 'Conservation Status' column
animal_df['Threat Level'] = animal_df['Conservation Status'].apply(categorise_threat)

animal_df

Unnamed: 0,Species,Average Lifespan (Years),Habitat,Conservation Status,Threat Level
0,African Elephant,60,Grasslands,Vulnerable,Medium
1,Bengal Tiger,15,Forests,Endangered,High
2,Blue Whale,80,Ocean,Endangered,High
3,Giant Panda,20,Temperate Forest,Vulnerable,Medium
4,Komodo Dragon,30,Islands,Vulnerable,Medium
5,Polar Bear,25,Arctic,Vulnerable,Medium
6,Red Fox,5,Various,Least Concern,Low
7,Emperor Penguin,20,Antarctic,Near Threatened,Medium
8,Koala,13,Eucalyptus Forests,Vulnerable,Medium
9,Orangutan,35,Rainforests,Critically Endangered,High
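
An alternative to a function with `apply` is a dictionary lookup with `map`. A minimal sketch on a hand-built sample (`sample_df` and `threat_levels` are illustrative names). One difference worth noting: the `else` branch in the function above labels any unrecognised status as `Low`, whereas `map` leaves it as `NaN`, which makes unexpected values easier to spot.

```python
import pandas as pd

# Hand-built sample using statuses shown in this notebook.
sample_df = pd.DataFrame({
    'Species': ['Orangutan', 'Polar Bear', 'Red Fox', 'Emperor Penguin'],
    'Conservation Status': ['Critically Endangered', 'Vulnerable',
                            'Least Concern', 'Near Threatened'],
})

# One entry per conservation status named in the exercise.
threat_levels = {
    'Critically Endangered': 'High',
    'Endangered': 'High',
    'Vulnerable': 'Medium',
    'Near Threatened': 'Medium',
    'Least Concern': 'Low',
}

# map looks each status up in the dictionary; a status missing from the
# dictionary becomes NaN rather than silently falling into 'Low'.
sample_df['Threat Level'] = sample_df['Conservation Status'].map(threat_levels)
print(sample_df)
```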


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png"  style="width:140px"/>
</div>