Supporting Google Colab document for the Medium tutorial ["AI for sustainability (#1): A tool for analyzing company transition plans"](https://medium.com/@schimanski.tobi/ai-for-sustainability-1-a-tool-for-analyzing-company-transition-plans-7d75853f933b?source=friends_link&sk=d7c5aaf0af36d4618d26fdc1c34abf01). Please have a read to understand how we got here. Also feel free to read the [paper for this tool](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4826207).

Let's start with the code. It takes only three steps to analyse a full report along 64 indicators.

This file resets every time you open it, so you might want to download your results or store them somewhere else.

## 1. Clone the GitHub Repository

Execute this code:

In [None]:
!git clone https://github.com/tobischimanski/transition_NLP.git
# move to folder and show content
%cd transition_NLP/
%ls

If you click on the folder icon on the left, you'll see that some data and code have been downloaded into the folder "transition_NLP". Among other things, you can already see examples of what the output will look like, the test data and the code.

## 2. Install the requirements

We need openai and llama-index to run the code.

In [None]:
!pip install -r requirements.txt
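
If you want to confirm that the installation worked before spending API credits, a small check like the following can help. This is only a sketch: the helper name `missing_packages` is made up for this notebook, and it assumes the two packages are importable as `openai` and `llama_index` (the usual import names for the pip packages openai and llama-index).

```python
import importlib.util
from typing import Iterable, List


def missing_packages(module_names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names for the two packages mentioned above.
print(missing_packages(["openai", "llama_index"]))
```

An empty list means both packages can be imported; any name that is printed still needs to be installed.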

## 3. Run the code

**IMPORTANT**: To run the code, you will need your own OpenAI API key. Please follow [this tutorial](https://www.merge.dev/blog/chatgpt-api-key) to create an API key. The API key will look something like "sk-...".

The code requires four inputs:

- The OpenAI API key you've just created. It should look something like "sk-…".
- The path to the file you want to analyze. In this example, we use the Volkswagen 2022 sustainability report that is already in the folder "Test_Data".
- The model you want to use. In the example, we use gpt-4o. I would recommend gpt-3.5-turbo for playing around (marginal cost) and gpt-4o for production these days (see all models [here](https://platform.openai.com/docs/models/)).
- Optional: For testing, it might make sense not to run all 64 indicators. By adding a number k at the end (e.g., 4), you limit the analysis to the first k indicators.

In [None]:
# Usage pattern: python transition_analysis.py api_key report model [num indicators]
# Example
# FILL IN YOUR API KEY HERE
!python transition_analysis.py sk-... ./Test_Data/CSR_VOW3_2022.pdf gpt-4o
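
To make the usage pattern a bit more tangible, here is a minimal sketch that assembles the same command from its four inputs. The helper `build_command` is hypothetical and not part of the repository, and the range check on the number of indicators is my own assumption based on the 64 indicators mentioned above (the script itself may behave differently).

```python
import shlex
from typing import List, Optional


def build_command(api_key: str, report: str, model: str,
                  num_indicators: Optional[int] = None) -> List[str]:
    """Assemble the arguments in the order: api_key report model [num indicators]."""
    # Assumption: only values between 1 and 64 make sense for the optional k.
    if num_indicators is not None and not 1 <= num_indicators <= 64:
        raise ValueError("num_indicators must be between 1 and 64")
    cmd = ["python", "transition_analysis.py", api_key, report, model]
    if num_indicators is not None:
        cmd.append(str(num_indicators))
    return cmd


# "sk-..." is a placeholder; replace it with your own API key.
cmd = build_command("sk-...", "./Test_Data/CSR_VOW3_2022.pdf", "gpt-4o", num_indicators=4)
print(shlex.join(cmd))
```

The printed line can be pasted into a cell with a leading `!`. With `num_indicators=4`, only the first four indicators are analysed, which keeps test runs cheap.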

## Explore Output

If you have run this code, you will find a new Excel file in the folder "Excel_Output" with the name "CSR_VOW3_2022_gpt-4o_topk8_paramsall.xlsx". The name encodes the report, the model used for answering, and "paramsall", which shows that you used all 64 indicators (not just a subset). "topk8" is a topic for further tutorials.
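
If you collect several output files, it can be handy to split the file name back into its parts. The sketch below does that with a regular expression. The pattern is inferred only from the example name above, the helper `parse_output_name` is hypothetical, and it assumes that model names contain no underscore.

```python
import re
from typing import Dict, Optional

# Pattern inferred from "CSR_VOW3_2022_gpt-4o_topk8_paramsall.xlsx" (an assumption).
PATTERN = re.compile(
    r"^(?P<report>.+)_(?P<model>[^_]+)_topk(?P<topk>\d+)_params(?P<params>[^_.]+)\.xlsx$"
)


def parse_output_name(filename: str) -> Optional[Dict[str, str]]:
    """Split an output file name into report, model, topk and params parts."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


print(parse_output_name("CSR_VOW3_2022_gpt-4o_topk8_paramsall.xlsx"))
```

File names that do not follow the pattern return `None`, so the helper can also be used to filter a folder listing.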

You can download the Excel file and look at it, or explore it in code. Since not everyone may be able to run this because of the API key, I'll explore the output that is already there, i.e., "CSR_VOW3_2022_gpt-4_topk8_paramsall.xlsx", which was created with GPT-4 instead of GPT-4o.

In [None]:
# load already existing data
import pandas as pd
output_gpt4 = pd.read_excel("./Excel_Output/CSR_VOW3_2022_gpt-4_topk8_paramsall.xlsx", index_col=0)

In [None]:
# show first five rows
output_gpt4.head()
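
Beyond the first five rows, a quick structural summary helps to get a feeling for the table. The following is a minimal sketch that makes no assumption about the real column names: the helper `describe_output` is made up for this notebook, and the toy DataFrame with the columns "question" and "answer" is purely illustrative.

```python
import pandas as pd


def describe_output(df: pd.DataFrame) -> dict:
    """Summarise row count, column names and missing values of a results table."""
    return {
        "n_rows": len(df),
        "columns": list(df.columns),
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
    }


# Toy stand-in for the real table; these column names are invented.
toy = pd.DataFrame({"question": ["q1", "q2", "q3"], "answer": ["yes", None, "no"]})
print(describe_output(toy))
```

To apply it to the real data, call `describe_output(output_gpt4)` after the loading cell above has been run.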