# Damaged Logging

## Introduction

A project analyzing statistics on damage-related wood logging in Germany using Python.

This is my individual project for the module **Research Software Engineering** in SS24.
The task was to analyze a dataset from [genesis.destatis](https://www-genesis.destatis.de/genesis/online?operation=abruftabelleBearbeiten&levelindex=1&levelid=1713202276894&auswahloperation=abruftabelleAuspraegungAuswaehlen&auswahlverzeichnis=ordnungsstruktur&auswahlziel=werteabruf&code=41261-0003&auswahltext=&werteabruf=starten)
using Python and to find interesting aspects and potential questions that could be explored using this data.

## What is the dataset about?

The dataset contains statistics on forest wood harvesting due to various types of damage in Germany,
listed by year, wood species group, and forest ownership type.

Each entry specifies the volume of wood harvested (in cubic meters) due to different causes
such as wind/storm, snow/ice damage, insects, drought, and other reasons.
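To illustrate how such a table can be organized for analysis, here is a minimal pandas sketch that assumes a long format with one row per year, species group, owner type, and reason. The column names (`year`, `species`, `owner`, `reason`, `volume`) and all numbers are hypothetical, not taken from the actual CSV:

```python
import pandas as pd

# Hypothetical long-format rows (made-up numbers, NOT values from the dataset):
# one row per year, species group, owner type and reason
rows = [
    (2018, "Eiche und Roteiche", "Insgesamt", "Wind/ Sturm", 10.0),
    (2018, "Eiche und Roteiche", "Insgesamt", "Insekten", 2.0),
    (2018, "Eiche und Roteiche", "Privatwald", "Wind/ Sturm", 4.0),
    (2019, "Eiche und Roteiche", "Insgesamt", "Wind/ Sturm", 3.0),
    (2019, "Eiche und Roteiche", "Insgesamt", "Insekten", 5.0),
]
df = pd.DataFrame(rows, columns=["year", "species", "owner", "reason", "volume"])

# Select a single time series: one species group, one reason, all owner types ("Insgesamt")
mask = (
    (df["species"] == "Eiche und Roteiche")
    & (df["reason"] == "Wind/ Sturm")
    & (df["owner"] == "Insgesamt")
)
series = df[mask].set_index("year")["volume"]
print(series.to_dict())
```

Each plot below corresponds to one such series (or a group of them).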

In [1]:
import sys
import os

# Get the current notebook directory
notebook_dir = os.getcwd()

# Construct the absolute path to the src directory
src_dir = os.path.abspath(os.path.join(notebook_dir, '..', 'src'))

# Add the src directory to the Python path
sys.path.append(src_dir)

# src_dir itself is on the path, so the package is imported without the "src." prefix
from damagedlogginganalyzer.DamagedLoggingAnalyzer import DamagedLoggingAnalyzer

Please set the output directory where the plots will be saved.

In [2]:
out_dir = '../path/to/your/out/dir'

## Temporal Trends

**Question**:
How has damage-related wood harvesting changed over the years?

I created individual plots of the volume of wood harvested over the years for every combination of wood species group, reason (drought, wind/storm, snow, insects, miscellaneous, total), and owner type.
Here are some examples; the other plots can be found in the [plots](https://github.com/HokageM/DamagedLoggingAnalyzer/tree/main/plots) directory under `<species>/<reason>/<owner>/plot.png`, or can be generated with:



In [3]:
with DamagedLoggingAnalyzer(out_dir) as analyzer:
    analyzer.read_in_csv('../data/DamagedLoggingWoodFixTable.csv')
    analyzer.analyze(plot_temporal_dependencies_all=True)
    print("Plotting Done")

Plotting Temporal Dependencies (all specie, reason and owner combinations)...
Plots saved in: ../path/to/your/out/dir
Plotting Done
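The plot paths used in this notebook follow a `<species>/<reason>/<owner>/plot.png` pattern, with spaces and slashes in the German labels replaced by underscores (e.g. `Wind/ Sturm` becomes `Wind__Sturm`). The helper `plot_path` below is hypothetical: a sketch that reproduces this pattern, inferred from the image paths in this notebook rather than taken from the analyzer's code:

```python
from pathlib import Path


def plot_path(out_dir: str, species: str, reason: str, owner: str) -> Path:
    """Hypothetical helper: build <out_dir>/<species>/<reason>/<owner>/plot.png."""

    def to_dir_name(label: str) -> str:
        # "Wind/ Sturm" -> "Wind_ Sturm" -> "Wind__Sturm"
        return label.replace("/", "_").replace(" ", "_")

    return (
        Path(out_dir)
        / to_dir_name(species)
        / to_dir_name(reason)
        / to_dir_name(owner)
        / "plot.png"
    )


print(plot_path("plots", "Eiche und Roteiche", "Wind/ Sturm", "Insgesamt").as_posix())
# plots/Eiche_und_Roteiche/Wind__Sturm/Insgesamt/plot.png
```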


Deaths of Oak and Red Oak caused by insects, all ownership types (`Insgesamt`), over the years in Germany:

<img src="../plots/Eiche_und_Roteiche/Insekten/Insgesamt/plot.png" width="500">


Deaths of Pine and Larch caused by insects, all ownership types (`Insgesamt`), over the years in Germany:

<img src="../plots/Kiefer_und_Lärche/Insekten/Insgesamt/plot.png" width="500">


Additionally, I created combined plots that show all reasons together for each wood species group.
**Note:** In the following, I only show the combined plots for all ownership types (`Insgesamt`). The other plots can be found in the [plots](https://github.com/HokageM/DamagedLoggingAnalyzer/tree/main/plots) directory under `<species>/all_reasons/<owner>/plot.png`, or can be generated with the following:


In [4]:
with DamagedLoggingAnalyzer(out_dir) as analyzer:
    analyzer.read_in_csv('../data/DamagedLoggingWoodFixTable.csv')
    analyzer.analyze(plot_reason_dependencies=True)
    print("Plotting Done")

Plotting Reason Dependencies ...
Plots saved in: ../path/to/your/out/dir
Plotting Done
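A combined plot needs one column per reason. Starting from long-format data, this is a single pivot. A minimal pandas sketch with hypothetical column names and made-up numbers (not the analyzer's code):

```python
import pandas as pd

# Made-up long-format data (NOT from the dataset) for one species group and owner type
df = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019],
    "reason": ["Wind/ Sturm", "Insekten", "Wind/ Sturm", "Insekten"],
    "volume": [10.0, 2.0, 3.0, 5.0],
})

# One row per year, one column per reason: each column becomes one line in a combined plot
wide = df.pivot(index="year", columns="reason", values="volume")
print(wide)
# With matplotlib installed, wide.plot() draws all reasons in a single figure
```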


Total Oak and Red Oak deaths over the years in Germany:

<img src="../plots/Eiche_und_Roteiche/all_reasons/Insgesamt/plot.png" width="500">

Total Beech and other hardwood deaths over the years in Germany:

<img src="../plots/Buche_und_sonstiges_Laubholz/all_reasons/Insgesamt/plot.png" width="500">

Total Spruce, Fir, Douglas Fir and other softwood deaths over the years in Germany:

<img src="../plots/Fichte_und_Tanne_und_Douglasie_und_sonstiges_Nadelholz/all_reasons/Insgesamt/plot.png" width="500">

Total Pine and Larch deaths over the years in Germany:

<img src="../plots/Kiefer_und_Lärche/all_reasons/Insgesamt/plot.png" width="500">

Total tree deaths over the years in Germany:

<img src="../plots/Insgesamt/all_reasons/Insgesamt/plot.png" width="500">

Overall, for most reasons the deaths of all species vary strongly from year to year, fluctuating between high and low values.
However, the total deaths of all species increase over the years, especially for the reason `Sonstiges` (miscellaneous), which could include fires, diseases, or other causes.
The dataset does not define `Sonstiges` precisely.

**Question**:
Are there increasing trends in certain types of damage like drought or insects, possibly linked to climate change?

Overall, the deaths of all species due to drought, snow, and insects can be modeled as linear (nearly constant) functions.
See [Prediction_2024](https://github.com/HokageM/DamagedLoggingAnalyzer/tree/main/plots/Prediction_2024) for the fitted functions.
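A quick way to check whether a series is "linear (nearly constant)" is to fit a straight line and compare its slope with the mean level of the series. A minimal NumPy sketch on made-up data; it is not the method used to produce the plots in Prediction_2024:

```python
import numpy as np

# Made-up yearly volumes (NOT from the dataset): noisy around a constant level, no trend
years = np.arange(2006, 2024)
rng = np.random.default_rng(0)
volumes = 50.0 + rng.normal(0.0, 5.0, size=years.size)

# Fit a straight line on centered years; np.polyfit returns the highest degree first
slope, intercept = np.polyfit(years - years.mean(), volumes, deg=1)

# A slope that is small relative to the mean level indicates a "nearly constant" series
relative_change_per_year = slope / volumes.mean()
print(f"slope: {slope:.3f} per year ({relative_change_per_year:.2%} of the mean level)")
```

Centering the years makes the intercept equal to the mean volume, so the two coefficients are easy to interpret.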

The deaths of all species due to wind/storm depend on the year and fluctuate between high and low values,
with particularly high values in 2006 and 2018.

The deaths of all species due to `Sonstiges` (miscellaneous) can be modeled quite well with a polynomial function and increase over the years.

The total deaths of all species are increasing over the years.


**Question**:
What will 2024 look like in terms of the volume of wood harvested due to different causes?

I used polynomial regression with k-fold cross validation to predict the volume of wood harvested due to different causes in the year 2024.
**Note:** The prediction is based on the data from 2006 to 2023. All plots can be found in the [Prediction_2024](https://github.com/HokageM/DamagedLoggingAnalyzer/tree/main/plots/Prediction_2024) directory or can be generated with the following command:


In [5]:
with DamagedLoggingAnalyzer(out_dir) as analyzer:
    analyzer.read_in_csv('../data/DamagedLoggingWoodFixTable.csv')
    analyzer.analyze(predict_temporal_dependencies=True)
    print("Prediction Done")

Minimum value: 64.54089118567248 at index 0
Best degree: 1
Train R² score (1 is best) for the best model: 0.015957323629618325

Eiche und Roteiche, Einschlagsursache: Wind/ Sturm, Staatswald (Bundes- und Landeswald) in 2024: [44.47058824]
Minimum value: 1.6523611290688178 at index 0
Best degree: 1
Train R² score (1 is best) for the best model: 0.0005065264945637304

Eiche und Roteiche, Einschlagsursache: Wind/ Sturm, Körperschaftswald in 2024: [25.77777778]
Minimum value: 508.0141264795974 at index 0
Best degree: 1
Train R² score (1 is best) for the best model: 0.20342007093337378

Eiche und Roteiche, Einschlagsursache: Wind/ Sturm, Privatwald in 2024: [82.1503268]
Minimum value: 15.3667500929319 at index 0
Best degree: 1
Train R² score (1 is best) for the best model: 0.05879223673737988

Eiche und Roteiche, Einschlagsursache: Wind/ Sturm, Insgesamt in 2024: [152.60130719]
Minimum value: 2.7867310407869885 at index 0
Best degree: 1
Train R² score (1 is best) for the best model: 0.14519
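The degree selection described above (polynomial regression with k-fold cross validation) can be sketched with NumPy alone. This is an illustrative reimplementation on made-up data, not the analyzer's code; the function name `select_degree_kfold`, the maximum degree of 4, and k = 5 are assumptions. The years are centered before fitting to keep higher-degree polynomials numerically well conditioned:

```python
import numpy as np


def select_degree_kfold(x, y, max_degree=4, k=5, seed=0):
    """Return the polynomial degree with the lowest mean k-fold CV error (MSE).

    Hypothetical helper: a NumPy-only sketch, not the analyzer's implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()  # centered years keep higher-degree fits well conditioned
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), k)

    cv_errors = []
    for degree in range(1, max_degree + 1):
        fold_errors = []
        for test_idx in folds:
            train = np.ones(len(x), dtype=bool)
            train[test_idx] = False
            coeffs = np.polyfit(x_c[train], y[train], deg=degree)
            pred = np.polyval(coeffs, x_c[test_idx])
            fold_errors.append(np.mean((y[test_idx] - pred) ** 2))
        cv_errors.append(float(np.mean(fold_errors)))

    best_degree = int(np.argmin(cv_errors)) + 1
    return best_degree, cv_errors


# Made-up data with a quadratic trend (NOT from the dataset), years 2006-2023
years = np.arange(2006, 2024)
volumes = 0.5 * (years - 2006) ** 2 + 3.0

best_degree, cv_errors = select_degree_kfold(years, volumes)

# Refit on all years with the selected degree and extrapolate to 2024
center = years.mean()
coeffs = np.polyfit(years - center, volumes, deg=best_degree)
prediction_2024 = float(np.polyval(coeffs, 2024 - center))

# Train R^2 (1 is best): 1 - residual sum of squares / total sum of squares
fitted = np.polyval(coeffs, years - center)
train_r2 = 1 - np.sum((volumes - fitted) ** 2) / np.sum((volumes - volumes.mean()) ** 2)
print(best_degree, round(prediction_2024, 2), round(float(train_r2), 4))
```

With exact quadratic data, every degree of 2 or more fits perfectly, so `argmin` may pick any of them; on real, noisy data the cross-validated error is what penalizes overly flexible polynomials. A train R² close to 0, as in several outputs above, means the fitted curve explains little of the year-to-year variation.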

The deaths of all species due to `Sonstiges` (miscellaneous) can be modeled quite well with a polynomial function, e.g. for the Beech and other hardwood species group:

<img src="../plots/Prediction_2024/Buche_und_sonstiges_Laubholz/Sonstiges/Insgesamt/plot.png" width="500">

**In some cases prediction does not make sense**, because the deaths do not follow a polynomial trend and depend on other factors, e.g. deaths caused by insects:

<img src="../plots/Prediction_2024/Buche_und_sonstiges_Laubholz/Insekten/Insgesamt/plot.png" width="500">


Deaths due to natural causes such as wind/storm, snow, and drought can be modeled as linear functions, e.g. for the Beech and other hardwood species group.
**Note**: Outliers in the data need to be handled, e.g. the deaths in 2018 for the Beech and other hardwood species group due to wind/storm.
This could be done with a robust regression model (e.g. Huber regression); a Ridge Regression model only regularizes the coefficients and still uses a squared-error loss, so on its own it does not make the fit robust to outliers.
Such outliers come from special events like storms, which the current model cannot predict.

<img src="../plots/Prediction_2024/Eiche_und_Roteiche/Wind__Sturm/Insgesamt/plot.png" width="500">
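As an illustration of outlier-robust line fitting, here is a minimal NumPy sketch of the Theil-Sen estimator, which uses the median of all pairwise slopes, compared with an ordinary least-squares line. The data is made up, with one storm-like outlier in the last year; ready-made implementations exist, e.g. `scipy.stats.theilslopes` or scikit-learn's `TheilSenRegressor` and `HuberRegressor`:

```python
import itertools

import numpy as np


def theil_sen(x, y):
    """Robust line fit: slope = median of all pairwise slopes, intercept = median residual."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = float(np.median(slopes))
    intercept = float(np.median(y - slope * x))
    return slope, intercept


# Made-up data: a linear trend y = 2x + 1 with one storm-like outlier in the last year
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[-1] += 100.0

ols_slope, ols_intercept = np.polyfit(x, y, deg=1)
ts_slope, ts_intercept = theil_sen(x, y)
print(f"least squares slope: {ols_slope:.2f}, Theil-Sen slope: {ts_slope:.2f}")
# least squares slope: 7.45, Theil-Sen slope: 2.00
```

The single outlier pulls the least-squares slope far from the underlying trend, while the median-based estimate ignores it.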


## Damage Types 
**Question**:
Which type of damage causes the most wood harvesting? 

This question can be answered by calculating the total volume of wood harvested due to different reasons for each type of wood species group:

In [6]:
with DamagedLoggingAnalyzer(out_dir) as analyzer:
    analyzer.read_in_csv('../data/DamagedLoggingWoodFixTable.csv')
    analyzer.analyze(calculate_most_dangerous_reasons=True)

Most dangerous reasons for Eiche und Roteiche:
Einschlagsursache: Wind/ Sturm: 2048.0
Einschlagsursache: Sonstiges: 725.0
Einschlagsursache: Schnee/ Duft: 98.0
Einschlagsursache: Insgesamt: 74.0
Einschlagsursache: Trockenheit: 72.0
Einschlagsursache: Insekten: 46.0
Most dangerous reasons for Buche und sonstiges Laubholz:
Einschlagsursache: Wind/ Sturm: 9124.0
Einschlagsursache: Sonstiges: 1343.0
Einschlagsursache: Insekten: 499.0
Einschlagsursache: Insgesamt: 91.0
Einschlagsursache: Trockenheit: 87.0
Einschlagsursache: Schnee/ Duft: 86.0
Most dangerous reasons for Kiefer und Lärche:
Einschlagsursache: Wind/ Sturm: 19806.0
Einschlagsursache: Sonstiges: 7364.0
Einschlagsursache: Insekten: 3467.0
Einschlagsursache: Schnee/ Duft: 116.0
Einschlagsursache: Trockenheit: 84.0
Einschlagsursache: Insgesamt: 80.0
Most dangerous reasons for Fichte und Tanne und Douglasie und sonstiges Nadelholz:
Einschlagsursache: Sonstiges: 208725.0
Einschlagsursache: Wind/ Sturm: 104206.0
Einschlagsursache: Inse

The most damaging reason for each wood species group is:

| Species                                                | Reason      | Amount |
|--------------------------------------------------------|-------------|--------|
| Eiche und Roteiche                                     | Wind/ Sturm | 2048   |
| Buche und sonstiges Laubholz                           | Wind/ Sturm | 9124   |
| Kiefer und Lärche                                      | Wind/ Sturm | 19806  |
| Fichte und Tanne und Douglasie und sonstiges Nadelholz | Sonstiges   | 208725 |
| Insgesamt                                              | Sonstiges   | 218181 |
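The ranking itself is a group-by-and-sum followed by an arg-max. A minimal pandas sketch on made-up data, assuming hypothetical long-format columns (`species`, `year`, `reason`, `volume`); whether the analyzer sums over years or uses another aggregate is an assumption here:

```python
import pandas as pd

# Made-up data (NOT from the dataset)
df = pd.DataFrame({
    "species": ["Eiche und Roteiche"] * 4 + ["Kiefer und Lärche"] * 4,
    "year": [2018, 2018, 2019, 2019] * 2,
    "reason": ["Wind/ Sturm", "Insekten"] * 4,
    "volume": [10.0, 2.0, 3.0, 5.0, 40.0, 30.0, 5.0, 20.0],
})

# Drop the aggregate reason "Insgesamt" (if present) so it cannot win the ranking
df = df[df["reason"] != "Insgesamt"]

# Total volume per species group and reason, summed over all years
totals = df.groupby(["species", "reason"])["volume"].sum()

# For each species group, the reason with the largest total volume
most_damaging = totals.unstack("reason").idxmax(axis=1)
print(most_damaging)
```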

**Question**:
How do different types of forests compare in their vulnerability to specific damage types?

See the results above.

The plots show that `Buche und sonstiges Laubholz` and `Eiche und Roteiche` have fewer deaths for every reason
compared to `Kiefer und Lärche` and `Fichte und Tanne und Douglasie und sonstiges Nadelholz`.
The death counts are up to 10 times higher for `Kiefer und Lärche` and `Fichte und Tanne und Douglasie und sonstiges Nadelholz`
compared to `Buche und sonstiges Laubholz` and `Eiche und Roteiche`.
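To quantify such a comparison, one can group the species into broadleaf (`Eiche und Roteiche`, `Buche und sonstiges Laubholz`) and conifer groups and compute the ratio of their volumes per reason. A minimal pandas sketch; the totals below are made-up placeholders, not results from the dataset:

```python
import pandas as pd

# Made-up totals per species group and reason (NOT from the dataset)
totals = pd.DataFrame(
    {"Wind/ Sturm": [100.0, 200.0, 900.0, 1500.0], "Insekten": [10.0, 20.0, 300.0, 500.0]},
    index=[
        "Eiche und Roteiche",
        "Buche und sonstiges Laubholz",
        "Kiefer und Lärche",
        "Fichte und Tanne und Douglasie und sonstiges Nadelholz",
    ],
)
broadleaf = ["Eiche und Roteiche", "Buche und sonstiges Laubholz"]
conifer = ["Kiefer und Lärche", "Fichte und Tanne und Douglasie und sonstiges Nadelholz"]

# Ratio of conifer to broadleaf volume for each reason
ratio = totals.loc[conifer].sum() / totals.loc[broadleaf].sum()
print(ratio)
```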

## Forest Management
**Question**:
Are there noticeable differences in wood harvesting due to damage across different forest ownership types 
(e.g., state-owned vs. privately-owned forests)? This could reflect different management practices and their effectiveness.

This question can be answered by plotting the total volume of wood harvested due to different reasons for each owner type combined:

In [7]:
with DamagedLoggingAnalyzer(out_dir) as analyzer:
    analyzer.read_in_csv('../data/DamagedLoggingWoodFixTable.csv')
    analyzer.analyze(plot_owner_dependencies=True)
    print("Plotting Done")

Plotting Owner Dependencies ...
Plots saved in: ../path/to/your/out/dir
Plotting Done


The plots show that the deaths due to different reasons are similar for the different owner types, so ownership does not seem to have a large impact on the death count.
Here are some examples; the other plots can be found in the [plots](https://github.com/HokageM/DamagedLoggingAnalyzer/tree/main/plots) directory under `<species>/<reason>/all_owners/plot.png`:

Deaths of Oak and Red Oak caused by insects, by ownership type, over the years in Germany:

<img src="../plots/Eiche_und_Roteiche/Insekten/all_owners/plot.png" width="500">

Deaths of Pine and Larch caused by insects, by ownership type, over the years in Germany:

<img src="../plots/Kiefer_und_Lärche/Insekten/all_owners/plot.png" width="500">
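One way to make the owner comparison more precise is to look at each owner type's share of the yearly volume: shares that stay stable over the years suggest that the owner types are affected similarly. A minimal pandas sketch on made-up numbers; note that absolute volumes also depend on how much forest each owner type manages, which this sketch does not account for:

```python
import pandas as pd

# Made-up data (NOT from the dataset); the aggregate owner "Insgesamt" is left out
df = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019, 2019],
    "owner": ["Staatswald (Bundes- und Landeswald)", "Körperschaftswald", "Privatwald"] * 2,
    "volume": [30.0, 20.0, 50.0, 10.0, 10.0, 20.0],
})

# One row per year, one column per owner type
yearly = df.pivot(index="year", columns="owner", values="volume")

# Divide each row by its yearly total to get each owner type's share
shares = yearly.div(yearly.sum(axis=1), axis=0)
print(shares.round(2))
```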