<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Exercise.png"  style="display: block; margin-left: auto; margin-right: auto;"/>
</div>

# Exercise: Working with DataFrames
© ExploreAI Academy

In this exercise, we'll be exploring some ways in which we can manipulate the data stored within a Pandas DataFrame.

## Learning objectives

By the end of this exercise, you should be able to:
* Sort, filter, and group a Pandas DataFrame.
* Create and delete columns in a Pandas DataFrame.
* Transform a Pandas DataFrame.

## Exercises

### Import libraries and dataset

In [None]:
import pandas as pd
import numpy as np

We are using a dataset named `Animals.csv` that stores information about a variety of animal species.

In [None]:
animal_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/Python/Animals.csv')

### Exercise 1

Filter the dataset to show only species that have an `Average Lifespan (Years)` **greater than 20** and are classified as `Vulnerable` in `Conservation Status`.

In [None]:
# Your solution here...

### Exercise 2

**a)** Create a new column `Lifespan in Months` by converting the `Average Lifespan (Years)` to months.


In [None]:
# Your solution here...

**b)** Delete the `Lifespan in Months` column from the DataFrame.

In [None]:
# Your solution here...

### Exercise 3

**a)** Filter the DataFrame `animal_df` to include only species with an `Average Lifespan (Years)` **greater than 50 years**. Store the result in a new DataFrame `Long_lived_species`.

In [None]:
# Your solution here...

**b)** Using the `apply` method with a `lambda` function, convert the `Average Lifespan (Years)` values of all species in `Long_lived_species` to the float data type.

In [None]:
# Your solution here...

### Exercise 4

List all columns for species that have the string "`Endangered`" as part of their conservation status.

In [None]:
# Your solution here...

### Exercise 5

List the mean `Average Lifespan (Years)` of species for each `Habitat` in descending order.

In [None]:
# Your solution here...

### Exercise 6

Create a column `Threat Level` that categorises species into `High`, `Medium`, or `Low` based on their `Conservation Status`: 
- **High** for `Critically Endangered` and `Endangered` 
- **Medium** for `Vulnerable` and `Near Threatened`
- **Low** for `Least Concern`

In [None]:
# Your solution here...

## Solutions

### Exercise 1

In [None]:
# Filter the DataFrame based on two conditions
filtered_df = animal_df[(animal_df['Average Lifespan (Years)'] > 20) & (animal_df['Conservation Status'] == 'Vulnerable')]

# Display the filtered DataFrame
filtered_df

We filter our DataFrame based on two conditions that are combined using the `&` operator:
- The first part of the condition, `(animal_df['Average Lifespan (Years)'] > 20)`, selects rows where the average lifespan is over 20 years.
- The second part, `(animal_df['Conservation Status'] == 'Vulnerable')`, selects rows where the conservation status is equal to 'Vulnerable'.
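Each condition is wrapped in parentheses because `&` binds more tightly than `>` and `==` in Python. One way to make this easier to read is to give each condition its own name before combining them. The sketch below uses a small made-up DataFrame (`toy_df` and its rows are illustrative, not taken from `Animals.csv`).

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real Animals.csv data
toy_df = pd.DataFrame({
    'Species': ['Species A', 'Species B', 'Species C', 'Species D'],
    'Average Lifespan (Years)': [60, 15, 25, 40],
    'Conservation Status': ['Vulnerable', 'Vulnerable', 'Endangered', 'Vulnerable'],
})

# Each condition produces a boolean Series with one True/False per row
lives_over_20 = toy_df['Average Lifespan (Years)'] > 20
is_vulnerable = toy_df['Conservation Status'] == 'Vulnerable'

# & keeps only the rows where both conditions are True
toy_filtered = toy_df[lives_over_20 & is_vulnerable]
print(toy_filtered)
```

Naming the masks avoids the nested parentheses and lets us inspect each condition on its own.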

### Exercise 2

**a)**

In [None]:
# Creating 'Lifespan in Months'
animal_df['Lifespan in Months'] = animal_df['Average Lifespan (Years)'] * 12
animal_df

We add a new column named `Lifespan in Months` calculated by multiplying each value in `Average Lifespan (Years)` by 12 to convert from years to months.

**b)**

In [None]:
# Deleting 'Lifespan in Months'
animal_df.drop('Lifespan in Months', axis=1, inplace=True)
animal_df

We remove the newly created `Lifespan in Months` from the DataFrame using the `drop` method. `axis=1` indicates the deletion of a column while `inplace=True` is used to ensure that the change is applied directly to `animal_df`.
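As a point of comparison, `drop` can also be called without `inplace=True`, in which case it returns a new DataFrame and leaves the original untouched. The sketch below shows this on a small made-up DataFrame (`toy_df` is illustrative, not the real dataset) and uses the `columns=` keyword, which is equivalent to passing `axis=1`.

```python
import pandas as pd

# Hypothetical rows for illustration only
toy_df = pd.DataFrame({
    'Species': ['Species A', 'Species B'],
    'Average Lifespan (Years)': [10, 2],
})
toy_df['Lifespan in Months'] = toy_df['Average Lifespan (Years)'] * 12

# Without inplace=True, drop returns a new DataFrame and toy_df keeps its column
toy_without_months = toy_df.drop(columns='Lifespan in Months')

print(toy_without_months.columns.tolist())
print(toy_df.columns.tolist())
```

Reassigning the result (`toy_df = toy_df.drop(...)`) has the same effect as `inplace=True` for this exercise.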

### Exercise 3

**a)**

In [None]:
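# Filtering to find long-lived species whose 'Average Lifespan (Years)' is greater than 50 years.
# .copy() makes the result an independent DataFrame, so assigning to its columns later
# does not trigger a SettingWithCopyWarning.
Long_lived_species = animal_df[animal_df['Average Lifespan (Years)'] > 50].copy()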

# Display the filtered DataFrame
Long_lived_species

**b)**

In [None]:
# Convert the 'Average Lifespan (Years)' column to floats
Long_lived_species['Average Lifespan (Years)'] = Long_lived_species['Average Lifespan (Years)'].apply(lambda x: float(x))

# Display the updated DataFrame 
Long_lived_species

We use the **apply** method with a **lambda** function to convert every value in the `Average Lifespan (Years)` column of the `Long_lived_species` DataFrame to a float.
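The exercise asks for `apply` and `lambda`, but for a plain type conversion pandas also offers `astype`, which converts the whole column in one call. The sketch below compares the two on a made-up DataFrame (`toy_long_lived` is illustrative, not the real data).

```python
import pandas as pd

# Hypothetical rows for illustration only
toy_long_lived = pd.DataFrame({
    'Species': ['Species A', 'Species B'],
    'Average Lifespan (Years)': [70, 100],
})

# Element-by-element conversion, as in the exercise
via_apply = toy_long_lived['Average Lifespan (Years)'].apply(lambda x: float(x))

# Whole-column conversion
via_astype = toy_long_lived['Average Lifespan (Years)'].astype(float)

print(via_apply.dtype, via_astype.dtype)
print(via_apply.equals(via_astype))
```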

### Exercise 4

In [None]:
# Filtering to find species with 'Endangered' in their conservation status
Endangered_species = animal_df[animal_df['Conservation Status'].str.contains('Endangered')]

# Display the filtered DataFrame
Endangered_species

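We use the method `.str.contains('Endangered')` to identify species whose `Conservation Status` contains the substring `Endangered`. Because this is a substring match, it selects both `Endangered` and `Critically Endangered` species.
The result is stored in a new DataFrame, `Endangered_species`.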
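To see how the substring match behaves, the sketch below runs `.str.contains` on a small made-up Series (`statuses` is illustrative). It assumes, purely for demonstration, that a column might contain a missing value or different capitalisation; `na=False` treats missing values as non-matches, and `case=False` makes the match case-insensitive.

```python
import pandas as pd

# Hypothetical values for illustration only
statuses = pd.Series(['Endangered', 'Critically Endangered', 'Vulnerable', 'endangered', None])

# Case-sensitive substring match; missing values count as False
mask = statuses.str.contains('Endangered', na=False)

# Case-insensitive variant
mask_any_case = statuses.str.contains('endangered', case=False, na=False)

print(mask.tolist())
print(mask_any_case.tolist())
```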

### Exercise 5

In [None]:
Mean_lifespan_by_habitat = animal_df.groupby('Habitat')['Average Lifespan (Years)'].mean().sort_values(ascending=False)

Mean_lifespan_by_habitat

The `groupby('Habitat')` call groups the rows of `animal_df` so that each group corresponds to a unique habitat.

We then select the `Average Lifespan (Years)` column and apply the `.mean()` method to find the average lifespan of the species within each habitat.

The `sort_values(ascending=False)` method then sorts these average values in descending order.
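The three steps can be traced by hand on a small made-up DataFrame (`toy_df` and its values are illustrative, not taken from `Animals.csv`):

```python
import pandas as pd

# Hypothetical rows for illustration only
toy_df = pd.DataFrame({
    'Habitat': ['Forest', 'Ocean', 'Forest', 'Ocean'],
    'Average Lifespan (Years)': [10, 40, 20, 60],
})

# Forest: (10 + 20) / 2 = 15, Ocean: (40 + 60) / 2 = 50
toy_means = (
    toy_df.groupby('Habitat')['Average Lifespan (Years)']
    .mean()
    .sort_values(ascending=False)
)
print(toy_means)
```

The result is a Series indexed by habitat, with the highest mean first.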

### Exercise 6

In [None]:
# Function to categorise species based on their conservation status.
def categorise_threat(status):
    if status in ['Critically Endangered', 'Endangered']:
        return 'High'
    elif status in ['Vulnerable', 'Near Threatened']:
        return 'Medium'
    else:
        return 'Low'

# Apply the 'categorise_threat' function to the 'Conservation Status' column
animal_df['Threat Level'] = animal_df['Conservation Status'].apply(categorise_threat)

animal_df

We define the `categorise_threat` function which uses conditional statements to categorise species based on their conservation status.

We then use the `apply` method to apply the function to each value in the `Conservation Status` column.

The results are stored in a new column, `Threat Level`.
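An alternative sketch uses a dictionary with `map` instead of a function. One difference worth noting: the `else` branch in `categorise_threat` labels any unlisted status as `Low`, whereas `map` leaves unlisted statuses as missing values. The Series below is made up, and `Data Deficient` is a hypothetical status included only to show that behaviour.

```python
import pandas as pd

# Mapping from conservation status to threat level, as described in the exercise
threat_levels = {
    'Critically Endangered': 'High',
    'Endangered': 'High',
    'Vulnerable': 'Medium',
    'Near Threatened': 'Medium',
    'Least Concern': 'Low',
}

# Hypothetical values for illustration only
toy_status = pd.Series(['Endangered', 'Near Threatened', 'Least Concern', 'Data Deficient'])

# Statuses missing from the dictionary become NaN rather than 'Low'
toy_threat = toy_status.map(threat_levels)
print(toy_threat)
```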

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px"/>
</div>