<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Exercise.png"  style="display: block; margin-left: auto; margin-right: auto;";/>
</div>

# Exercise: Introduction to Pandas 
© ExploreAI Academy

In this exercise, we test our understanding of basic Pandas functions.


## Learning objectives

By the end of this exercise, you should be able to:
* Create and display the data stored in a DataFrame.


## Exercises

### Import libraries

In [1]:
import pandas as pd
import numpy as np

### Exercise 1

Create a Pandas DataFrame from a dictionary. The dictionary keys should be `Crop`, `Yield_per_Acre`, and `Country`, and the values should be lists, as shown in the data below:

- Crop: `['Wheat', 'Corn', 'Rice', 'Soybean']`
- Yields per Acre (in tons): `[3, 4.5, 5, 2.8]`
- Country: `['USA', 'China', 'India', 'Brazil']`

In [10]:
# Your solution here...
dict_df = dict(Crop=['Wheat', 'Corn', 'Rice', 'Soybean'], Yield_per_Acre=[3, 4.5, 5, 2.8], Country=['USA', 'China', 'India', 'Brazil'])
df = pd.DataFrame(dict_df)
df.head()

| | Crop | Yield_per_Acre | Country |
|---|---|---|---|
| 0 | Wheat | 3.0 | USA |
| 1 | Corn | 4.5 | China |
| 2 | Rice | 5.0 | India |
| 3 | Soybean | 2.8 | Brazil |


### Exercise 2

Using the DataFrame we created in **Exercise 1**, return only observations for index values 1 and 2 from the dataset.

In [12]:
# Your solution here...
df.iloc[1:3]

| | Crop | Yield_per_Acre | Country |
|---|---|---|---|
| 1 | Corn | 4.5 | China |
| 2 | Rice | 5.0 | India |


### Exercise 3

- Using the NumPy library, create a 3x2 array named `tree_array` to store information for different tree species as follows:

`[[30, 25], [60, 30], [25, 20]]`

- Use the array to create a DataFrame with the columns `Max_Height_m` and `Growth_Rate_cm_per_year`, and the index `Maple`, `Pine`, and `Birch`.

In [14]:
# Your solution here...
tree_array = np.array([[30, 25], [60, 30], [25, 20]])
columns = ['Max_Height_m', 'Growth_Rate_cm_per_year']
index = ['Maple', 'Pine', 'Birch']

df = pd.DataFrame(data=tree_array, columns=columns, index=index)
df

| | Max_Height_m | Growth_Rate_cm_per_year |
|---|---|---|
| Maple | 30 | 25 |
| Pine | 60 | 30 |
| Birch | 25 | 20 |


### Exercise 4

Using the DataFrame we created in **Exercise 3**, display the `Max_Height_m` and `Growth_Rate_cm_per_year` columns for the `Pine` tree.

In [20]:
# Your solution here...
df.loc['Pine']

Max_Height_m               60
Growth_Rate_cm_per_year    30
Name: Pine, dtype: int32

### Exercise 5

- We have a dataset named `Animals.csv` that stores information about a variety of animal species. Use Pandas to load the CSV file into a DataFrame called `animals_df`. 

- Display the first 5 rows of the DataFrame.

**NB:** Use this link to access the dataset: https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/Python/Animals.csv

In [21]:
# Your solution here...
animals_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/Python/Animals.csv')
animals_df.head()

| | Species | Average Lifespan (Years) | Habitat | Conservation Status |
|---|---|---|---|---|
| 0 | African Elephant | 60 | Grasslands | Vulnerable |
| 1 | Bengal Tiger | 15 | Forests | Endangered |
| 2 | Blue Whale | 80 | Ocean | Endangered |
| 3 | Giant Panda | 20 | Temperate Forest | Vulnerable |
| 4 | Komodo Dragon | 30 | Islands | Vulnerable |


### Exercise 6

Using the DataFrame we have created in **Exercise 5**, access and display information on animals that are listed as `Critically Endangered` in the `Conservation Status` column.

In [22]:
# Your solution here...
animals_df[animals_df['Conservation Status'] == 'Critically Endangered']

| | Species | Average Lifespan (Years) | Habitat | Conservation Status |
|---|---|---|---|---|
| 9 | Orangutan | 35 | Rainforests | Critically Endangered |


## Solutions

### Exercise 1

In [9]:
# Creating a dictionary to hold agriculture data
agri_data = {
    'Crop': ['Wheat', 'Corn', 'Rice', 'Soybean'],
    'Yield_per_Acre': [3, 4.5, 5, 2.8],
    'Country': ['USA', 'China', 'India', 'Brazil']
}

# Creating a DataFrame from the agriculture data dictionary
agri_df = pd.DataFrame(agri_data)

# Printing the created DataFrame
print(agri_df)

      Crop  Yield_per_Acre Country
0    Wheat             3.0     USA
1     Corn             4.5   China
2     Rice             5.0   India
3  Soybean             2.8  Brazil


### Exercise 2

In [13]:
agri_df.iloc[1:3]

| | Crop | Yield_per_Acre | Country |
|---|---|---|---|
| 1 | Corn | 4.5 | China |
| 2 | Rice | 5.0 | India |


In the solution above, we use `.iloc` for **position-based indexing**, which selects rows by their integer position rather than by their index label.

The slice `1:3` starts at position 1 and stops before position 3, so the end position is not included. This returns the 2nd and 3rd rows of data, which correspond to the `Corn` and `Rice` observations.
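
As a rough illustration of the end-exclusive slice, the sketch below rebuilds the Exercise 1 data and compares a position slice with `.iloc` against a label slice with `.loc`. The names `demo_df`, `by_position`, and `by_label` are ours and do not appear in the exercise.

```python
import pandas as pd

# Rebuild the Exercise 1 data (demo_df is an illustrative name)
demo_df = pd.DataFrame({
    'Crop': ['Wheat', 'Corn', 'Rice', 'Soybean'],
    'Yield_per_Acre': [3, 4.5, 5, 2.8],
    'Country': ['USA', 'China', 'India', 'Brazil'],
})

# Position-based slice: the stop position (3) is excluded
by_position = demo_df.iloc[1:3]

# Label-based slice: both endpoint labels (1 and 2) are included
by_label = demo_df.loc[1:2]

# Both selections contain the Corn and Rice rows
print(by_position.equals(by_label))
```

With the default integer index, positions and labels happen to coincide, which is why the two slices select the same rows here even though their end points are written differently.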

### Exercise 3

In [18]:
# Creating a NumPy array to represent data for different tree species
tree_array = np.array([[30, 25], [60, 30], [25, 20]])

# Defining the column names for the DataFrame
columns = ['Max_Height_m', 'Growth_Rate_cm_per_year']

# Defining the index for the DataFrame, which represents the names of the tree species
index = ['Maple', 'Pine', 'Birch']

# Creating a DataFrame from the NumPy array above
tree_df = pd.DataFrame(data=tree_array, index=index, columns=columns)

# Printing the created DataFrame
print(tree_df)

       Max_Height_m  Growth_Rate_cm_per_year
Maple            30                       25
Pine             60                       30
Birch            25                       20


### Exercise 4

In [19]:
tree_df.loc['Pine', ['Max_Height_m', 'Growth_Rate_cm_per_year']]

Max_Height_m               60
Growth_Rate_cm_per_year    30
Name: Pine, dtype: int32

In the above solution, we have used the `.loc` indexer for **label-based indexing**.

The label `Pine` is used to select the row corresponding to the Pine tree. From this row, only the specified columns (`Max_Height_m` and `Growth_Rate_cm_per_year`) are selected and displayed. 
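
A minimal sketch of label-based selection with `.loc`, passing a row label and column labels in a single call. It rebuilds the Exercise 3 data under the illustrative name `tree_demo`; `pine` and `pine_height` are likewise our own names.

```python
import numpy as np
import pandas as pd

# Rebuild the Exercise 3 data (tree_demo is an illustrative name)
tree_demo = pd.DataFrame(
    data=np.array([[30, 25], [60, 30], [25, 20]]),
    index=['Maple', 'Pine', 'Birch'],
    columns=['Max_Height_m', 'Growth_Rate_cm_per_year'],
)

# Row label first, then a list of column labels: returns a Series
pine = tree_demo.loc['Pine', ['Max_Height_m', 'Growth_Rate_cm_per_year']]

# One row label and one column label: returns a single value
pine_height = tree_demo.loc['Pine', 'Max_Height_m']

print(pine)
print(pine_height)
```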

### Exercise 5

In [None]:
# Loading data from a CSV file named 'Animals.csv' into a DataFrame
animals_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/Python/Animals.csv')

# Displaying the first 5 rows of the DataFrame
animals_df.head()

The `.head(n)` method returns the first `n` rows, with the default being 5. This is useful for getting a quick view of the dataset.
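
To make the default concrete, here is a small sketch on a hypothetical ten-row DataFrame (`numbers_df` is not part of the exercise). It also shows `.tail(n)`, the counterpart that returns the last `n` rows.

```python
import pandas as pd

# Hypothetical ten-row DataFrame used only for illustration
numbers_df = pd.DataFrame({'value': range(10)})

first_five = numbers_df.head()    # no argument: first 5 rows
first_three = numbers_df.head(3)  # explicit n: first 3 rows
last_two = numbers_df.tail(2)     # counterpart: last 2 rows

print(len(first_five), len(first_three), len(last_two))
```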

### Exercise 6

In [23]:
# Create new DataFrame based on a condition
Critically_endangered = animals_df[animals_df['Conservation Status'] == 'Critically Endangered']

# Displaying the newly created DataFrame
Critically_endangered

| | Species | Average Lifespan (Years) | Habitat | Conservation Status |
|---|---|---|---|---|
| 9 | Orangutan | 35 | Rainforests | Critically Endangered |


**Conditional selection** is performed on the `animals_df` DataFrame. The condition `(animals_df['Conservation Status'] == 'Critically Endangered')` checks each row to see if the `Conservation Status` column's value is `Critically Endangered`. The rows that meet this condition are then used to create a new DataFrame named `Critically_endangered`.
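
The two steps described above, building the condition and then filtering with it, can be sketched separately. The example uses three rows copied from the outputs shown earlier instead of downloading the CSV; `status_demo`, `mask`, and `selected` are illustrative names.

```python
import pandas as pd

# Three rows taken from the Animals.csv output above, so no download is needed
status_demo = pd.DataFrame({
    'Species': ['Bengal Tiger', 'Orangutan', 'Giant Panda'],
    'Conservation Status': ['Endangered', 'Critically Endangered', 'Vulnerable'],
})

# Step 1: the comparison yields a boolean Series, one value per row
mask = status_demo['Conservation Status'] == 'Critically Endangered'
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows where it is True
selected = status_demo[mask]
print(selected['Species'].tolist())
```

Keeping the mask in its own variable makes it easy to inspect or to combine with other conditions later.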

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>