# Pandas Exercises with Tree Data

This notebook provides a series of exercises using a dataset of trees in Paris. The exercises focus on basic data cleaning, exploration, and filtering.

In [1]:
import pandas as pd

# Load the dataset
url = "https://raw.githubusercontent.com/SkatAI/efrei-ml/refs/heads/master/data/les_arbres_upload_1k.csv"
df = pd.read_csv(url)

## Exercise 1: Basic Inspection

1.  Display the first 5 rows of the DataFrame using `.head()`.
2.  Display the last 5 rows of the DataFrame using `.tail()`.
3.  Get and print the shape of the DataFrame (number of rows and columns) using `.shape`.
4.  Get and print the column names using `.columns`.
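
As a warm-up, here is a minimal sketch of the same four inspection calls on a small made-up DataFrame. The name `toy` and its contents are hypothetical and are not part of the tree dataset.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the Paris tree dataset
toy = pd.DataFrame({
    "name": ["oak", "pine", "elm", "ash", "fir", "yew"],
    "height": [12, 20, 15, 9, 30, 7],
})

print(toy.head(2))        # first 2 rows (the default is 5)
print(toy.tail(2))        # last 2 rows (the default is 5)
print(toy.shape)          # tuple: (number of rows, number of columns)
print(list(toy.columns))  # column labels as a plain list
```

Note that `.shape` and `.columns` are attributes, not methods, so they take no parentheses.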


In [None]:
# Exercise 1 Solution Here


## Exercise 2: Handling Missing Data

1. Check for missing values in each column using `.isna().sum()`, and print the result.
2. Fill the missing values in the `variety` column with 'Unknown' using `.fillna()`, then confirm that no missing values remain in that column by calling `.isna().sum()` again.
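
The two steps above can be sketched on made-up data as follows. The DataFrame `toy` and its values are hypothetical; only the `.isna().sum()` and `.fillna()` calls carry over to the exercise.

```python
import pandas as pd

# Hypothetical toy data with a few missing entries
toy = pd.DataFrame({
    "species": ["oak", "pine", None, "elm"],
    "variety": [None, "alba", None, "rubra"],
})

# Count missing values per column
missing_before = toy.isna().sum()
print(missing_before)

# Replace missing values in one column, assigning the result back
toy["variety"] = toy["variety"].fillna("Unknown")

# Count again for that column only
missing_after = toy["variety"].isna().sum()
print(missing_after)
```

Assigning the result back to the column, rather than relying on `inplace=True`, keeps it explicit which object is being changed.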

In [None]:
# Exercise 2 Solution Here

## Exercise 3: Value Counts

1.  Calculate and print the value counts for the `location_type` column.
2.  Calculate and print the value counts for the `arrondissement` column.
3.  Calculate and print the value counts for the `species` column. 

For each of the three columns:

- make sure to include NaNs in the counts
- display the first 10 rows of the result with `.head(10)`
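
By default `.value_counts()` drops NaNs; passing `dropna=False` keeps them as their own category. A minimal sketch on a made-up Series (the name `toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy Series with two missing entries
toy = pd.Series(["a", "b", "a", None, "a", None], name="kind")

# Default behaviour: NaNs are left out of the counts
counts_without_nan = toy.value_counts()

# dropna=False adds a row for the missing values
counts_with_nan = toy.value_counts(dropna=False)

# Keep only the 10 most frequent categories
print(counts_with_nan.head(10))
```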

In [None]:
# Exercise 3 Solution Here

## Exercise 4: Data Type Conversion

1. Convert the `circumference` column to integer type using `.astype(int)`. 
2. Convert the `height` column to float type using `.astype(float)`. 
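
One thing to watch for: `.astype(int)` raises an error when the column contains NaNs, because NaN has no integer representation. Whether that applies to the tree columns is something to check with the Exercise 2 tools. The sketch below uses a made-up Series `s`, and filling with 0 is only one possible choice, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical float Series with one missing value
s = pd.Series([10.0, 25.0, np.nan])

# Converting directly fails because NaN cannot be cast to int
failed = False
try:
    s.astype(int)
except ValueError:
    failed = True
print("direct conversion failed:", failed)

# Filling the gaps first makes the conversion possible
clean = s.fillna(0).astype(int)
print(clean.dtype)
```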

In [None]:
# Exercise 4 Solution Here

## Exercise 5: Filtering and Subsetting

1. Create a new DataFrame `df_tall_trees` that contains only the trees with a `height` greater than 20.
2. Create a new DataFrame `df_platanes` that contains only the trees of `species` equal to 'x hispanica'.
3. Create a new DataFrame `df_old_trees` that contains only the trees of `stage` equal to 'Mature'.
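
All three filters follow the boolean-mask pattern `df[condition]`. A minimal sketch on made-up data (the names `toy`, `tall`, and `oaks` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "name": ["oak", "pine", "oak", "elm"],
    "height": [12, 25, 8, 30],
})

# A comparison on a column yields a boolean Series (the mask)
mask = toy["height"] > 10

# Indexing with the mask keeps only the rows where it is True
tall = toy[mask]

# The mask can also be written inline
oaks = toy[toy["name"] == "oak"]

print(tall)
print(oaks)
```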

In [None]:
# Exercise 5 Solution Here

Now try to modify the `df_tall_trees` DataFrame. For instance, replace the NaNs in `variety` with the empty string using `.fillna()`:

`.fillna(value={'variety': ''}, inplace=True)`

Depending on your pandas version, you will likely see a `SettingWithCopyWarning`:

        A value is trying to be set on a copy of a slice from a DataFrame

How do you get rid of that warning?
Think of mutable objects in Python.

Once you've found the trick, recreate the `df_tall_trees` DataFrame and make sure you can modify it without seeing the same warning.
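
As a reminder of how mutable objects behave in plain Python, here is a short sketch with lists. It is only an analogy for the hint above, not the pandas solution itself; all variable names are made up.

```python
# A list is mutable: assigning it to a new name does not duplicate it
original = [1, 2, 3]
alias = original            # both names point to the same list object
alias.append(4)             # so this change is visible through `original`
print(original)

# An explicit copy is a separate object
independent = original.copy()
independent.append(5)       # `original` is not affected this time
print(original)
print(independent)
```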

In [None]:
# solution

## Exercise 6: Creating a new Column with Lambda

1. Create a new column `diameter` by calculating the diameter from the `circumference` column. 

Use a lambda function with `.apply()`, where diameter = circumference / π.

The pattern to use is:

`df['new column'] = df['column'].apply(lambda x: <some transformation on x>)`
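
Here is a minimal sketch of that pattern with a different transformation, so the exercise itself stays open. The DataFrame `toy` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: lengths in centimetres
toy = pd.DataFrame({"length_cm": [150, 200, 75]})

# The lambda receives one value of the column at a time
toy["length_m"] = toy["length_cm"].apply(lambda x: x / 100)

print(toy)
```

For simple arithmetic like this, the vectorized form `toy["length_cm"] / 100` gives the same result and is usually faster; `.apply()` is mostly useful when the transformation cannot be written as a column-wide operation.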


In [None]:
# Exercise 6 Solution Here

Now use the same pattern to create a new column from two existing columns.

Calculate the ratio `height / max(circumference, 1)` and store it in a new column named `ratio`.

Hint: you need to use `df.apply(<lambda pattern>, axis=1)`
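
With `axis=1`, the lambda receives a whole row, and the individual columns are accessed by name. A minimal sketch on made-up data with a different formula (the names `toy`, `width`, `length`, and `area_floor` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"width": [2, 0, 5], "length": [10, 4, 20]})

# axis=1 passes each row to the lambda as a Series;
# max(..., 1) replaces widths below 1 with 1
toy["area_floor"] = toy.apply(
    lambda row: row["length"] * max(row["width"], 1),
    axis=1,
)

print(toy)
```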


## Exercise 7: Exporting Data

1. Calculate the value counts of the `species` column again and store the result in a new DataFrame `species_counts` using `.value_counts().reset_index()`.
2. Export the `species_counts` DataFrame to a CSV file named `species_counts.csv` without the index using `.to_csv(index=False)`.
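
The same two steps can be sketched on made-up data as follows. The names `toy`, `kind_counts`, and the temporary file path are hypothetical; writing to a temporary directory is only a choice made for this illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"kind": ["a", "b", "a", "c", "a", "b"]})

# value_counts() returns a Series; reset_index() turns it into
# a two-column DataFrame (category, count)
kind_counts = toy["kind"].value_counts().reset_index()
print(kind_counts)

# Write to a temporary location without the row index
path = os.path.join(tempfile.mkdtemp(), "kind_counts.csv")
kind_counts.to_csv(path, index=False)

# Read the file back to check what was written
reloaded = pd.read_csv(path)
print(reloaded)
```

The column names produced by `.reset_index()` differ between pandas versions, so it is worth printing the result before relying on them.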

In [None]:
# Exercise 7 Solution Here

## Exercise 8: Plotting a Histogram

1.  Create a histogram of the `circumference` column with 100 bins using `.hist(bins=100)`.

In [None]:
# Exercise 8 Solution Here