<center><img src="../images/logo.png" width="500" /></center>
<h3><center> Using Generative AI in Python: a compact, self-contained overview </center></h3>

<div class="alert alert-block alert-danger"> Please note that an individual API key is required to run this notebook. </div>

### Introduction: Generative Machine Learning

####  _Basic idea_

Imagine rolling a die many times, recording the outcomes, and then building a probability distribution from their frequencies.
By sampling from that distribution, you can generate "realistic" numbers without using the die anymore.
You can also combine the sampled results with arithmetic operations, so that "new" numbers appear, even ones that are not pictured on
any face of the die.
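The "fit, then sample" idea above can be sketched in a few lines of plain Python. This is a toy illustration under our own assumptions: a fair six-sided die simulated with the standard `random` module and 1000 "training" rolls. The names `fit_empirical` and `sample` are hypothetical helpers written for this sketch, not part of any library.

```python
import random
from collections import Counter


def fit_empirical(rolls):
    """Estimate a probability distribution from observed outcomes (relative frequencies)."""
    counts = Counter(rolls)
    total = len(rolls)
    return {face: n / total for face, n in sorted(counts.items())}


def sample(dist, k, rng):
    """Draw k synthetic outcomes from the estimated distribution, without touching the die."""
    faces = list(dist)
    weights = [dist[face] for face in faces]
    return rng.choices(faces, weights=weights, k=k)


rng = random.Random(0)  # fixed seed so the run is reproducible
observed = [rng.randint(1, 6) for _ in range(1000)]  # "training data": 1000 simulated rolls
dist = fit_empirical(observed)
generated = sample(dist, 5, rng)  # five "realistic" rolls produced by the fitted model
combined = generated[0] + generated[1]  # a sum can exceed 6, i.e. a "new" number
print(dist)
print(generated, combined)
```

With more rolls the estimated frequencies approach 1/6 each. Generative models replace the frequency table with a far richer model (e.g. a neural network), but the analogy in this section rests on the same two steps: estimate a distribution from data, then sample from it.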

Despite its simplicity, this idea is essential. Replace "numbers" with "pictures/text/..."
(so that the "dice outcomes" become general multimedia datasets),
and replace naive "frequencies" with more advanced probabilistic models (e.g. deep convolutional neural networks):
the result is a model capable of producing "creative" multimedia content, just as the die
could produce "new" numbers.

*Generative Machine Learning is therefore like rolling a die: instead of numbers, you get content mirroring
your training data.*

#### _Why today_

Nowadays we have both the computational power to run massive scientific
simulations and enough theoretical understanding (from academic research) of the fundamentals.
Industrial applications are the natural consequence.

On the one hand, you can in principle build a model tailored to your needs.
For instance, the [PyTorch](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html) DCGAN tutorial provides a good starting point.
On the other hand, you can download a model already trained by someone else, so that you only have to "roll the dice".
The following table gives a quick overview of several options available today (November 2024).

| Provider | User | Source code | Content Trained for | Remarks |
| --- | --- | --- | --- | --- |
| [OpenAI-GPT](https://openai.com/api/pricing/) | Business |Closed, partially Open | General | Famous for ChatGPT |
| [BloombergGPT](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/) | Business |Closed | Finance | Possibly the best performance in Finance |
| [FinGPT](https://github.com/AI4Finance-Foundation/FinGPT) | Academic | Open | Finance | Can be partially retrained *with your PC!* |
| [ColossalAI](https://colossalai.org/) | Academic | Open | General | Offers business support |
| [ClaudeAI](https://www.anthropic.com/news/claude-3-5-sonnet) | Business | Closed | General | Emphasizes ethics; trained to respect a "Constitution" of principles |

The table is of course not comprehensive. The [Hugging Face](https://huggingface.co/models) website offers a huge collection of alternative solutions.

### Use Cases and Privacy Remarks

Because the technology is so recent, it is hard to compile a _precise_ list of companies using Generative AI,
but a broad landscape of more than 400 Generative AI startups is available on the [dealroom website](https://app.dealroom.co/lists/33530/).
Notable examples come from the entertainment field, such as
[videogames](https://www.forbes.com/sites/bernardmarr/2024/04/18/the-role-of-generative-ai-in-video-game-development/) or
[video editing](https://openai.com/index/waymark/).
Other examples include support for software development (e.g. [GitHub Copilot](https://visualstudio.microsoft.com/de/github-copilot/)
and [Turintech](https://www.turintech.ai/)),
or chatbots for customer assistance ([booking.com, Electronic Arts, ...](https://aws.amazon.com/de/ai/generative-ai/use-cases/chatbots-and-virtual-assistants/#:~:text=New%20generative%20AI%20features%20for,functional%20chatbot%20in%20just%20minutes)).


Examples in finance include the [Bundesbank](https://www.bundesbank.de/de/aufgaben/themen/bundesbank-startet-textbasierte-intelligente-assistenten--856178), which is testing this technology for text analysis, [Morgan Stanley](https://openai.com/index/morgan-stanley/), which uses it for more efficient
document search, and Jane Street, which appears in the [list of Claude AI customers](https://www.anthropic.com/customers)
(without any additional information).

Inspired by the very informative articles from the [Turing Institute](https://www.turing.ac.uk/sites/default/files/2024-06/the_impact_of_large_language_models_in_finance_-_towards_trustworthy_adoption_1.pdf) and [McKinsey](https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/how-generative-ai-can-help-banks-manage-risk-and-compliance), we propose three simple and concrete use-case prototypes:

 - Search for **global information** from the web more efficiently;
 - Find and extract **local information** from a given document;
 - Generate simple source code to support **learning by example**.

When using online Generative AI, we recommend that you **never provide sensitive information** (unless you have subscribed to
services that give guarantees in that regard; see the privacy remarks below). We also recommend always
**generating output in a way that can be verified**, and respecting the
[European AI regulation](https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence)
on the matter.


Concerning the very important aspect of user privacy, most providers do not state exactly how (and whether) user data are stored.
The service [ClaudeAI](https://www.anthropic.com/news/claude-3-5-sonnet) explicitly states that
at least your data are **not used as training material**, meaning that if another user is using the same service,
it should not be possible for them to retrieve information about your input.
OpenAI seems [to be less explicit](https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance) on this point,
while BloombergGPT does not provide any additional information on its website.
The topic is also not covered by FinGPT or ColossalAI, but since their models are open source
and can in principle be run and trained on your own servers, keeping your data in-house should be possible.
We recommend always contacting the corresponding customer service for clarification on this topic.

In our examples we use the OpenAI Python API because of the author's familiarity with it,
without any intention of advertising it. To run the examples, **you need to provide an OpenAI API key** with access to the gpt-4o-mini model, as explained [here](https://platform.openai.com/docs/quickstart) (an OpenAI account is required). Finally, please note that some web links suggested as references in the model's answers might be dead, since the model generates them from its training data rather than browsing the web in real time.

#### Case 1: _Performing an Efficient Web Search_

In this section we use the OpenAI library to build an interactive "search" button that summarizes the **key points of a topic of choice**, including appropriate references from the web.

In [1]:
import os
# Modules for the interactive widgets
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interactive_output, interact
from ipywidgets import HBox, VBox
# Enable support for Word files: temporarily disabled
# import docx2txt
# Enable import from PDF files, for later
from pypdf import PdfReader
# OpenAI API modules
import openai
from openai import OpenAI

<div class="alert alert-block alert-info"> <b>Note:</b> Configure your OpenAI API key. </div>

In [2]:
try:
    the_key = os.environ["OPENAI_API_KEY"]
except KeyError:
    print("It seems that you have no API key installed on your system")
    the_key = input("Please enter the API Key manually:")

In [3]:
openai.api_key = the_key
client = OpenAI(api_key=the_key)
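If you want to try the functions and widgets below without an API key (or without spending credits), one option is to swap in a stand-in object that exposes the same attribute path used in this notebook, `client.chat.completions.create(...)`, and returns something with `.choices[0].message.content`. The classes `FakeClient` and `FakeCompletions` are hypothetical test doubles written for this sketch; they are not part of the `openai` package and only mimic the small part of the response that the notebook reads.

```python
from types import SimpleNamespace


class FakeCompletions:
    """Hypothetical stand-in for client.chat.completions: returns a canned reply."""

    def __init__(self, reply):
        self.reply = reply
        self.calls = []  # record every request so the prompts can be inspected

    def create(self, model, messages, **kwargs):
        self.calls.append({"model": model, "messages": messages, **kwargs})
        message = SimpleNamespace(role="assistant", content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class FakeClient:
    """Hypothetical offline client with the same .chat.completions attribute path."""

    def __init__(self, reply="This is a canned offline reply."):
        self.chat = SimpleNamespace(completions=FakeCompletions(reply))


fake_client = FakeClient()
completion = fake_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain Value at Risk to me."}],
    temperature=0.1,
)
print(completion.choices[0].message.content)
print(fake_client.chat.completions.calls[0]["temperature"])
```

Because the functions below look up the global name `client`, assigning `client = FakeClient()` would route their calls to the stand-in; assigning `client = OpenAI(api_key=the_key)` again restores the real service.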

In [4]:
def user_request(topic):
    '''
    Ask the model to explain a specific topic in a few words
    and to provide three external resources.
    '''
    # The f-string keeps the spaces around the topic in the prompt
    msg = f"Explain {topic} to me in fewer than 100 words."
    msg = msg + " Add three reliable website links that you used as references."
    completion = client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[{"role": "user", "content": msg}]
                    )
    return completion.choices[0].message.content
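Since the links in the answer are produced by the model and may be dead, it can help to pull them out of the text so they can be opened and checked one by one. A minimal sketch, assuming plain `http(s)://` URLs: the function name `extract_links` and the regular expression are our own choices, and the pattern will not catch every URL format. The sample answer uses placeholder `example.com`/`example.org` addresses.

```python
import re

# http(s) URL up to whitespace or a closing bracket/quote (common in Markdown links)
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_links(answer):
    """Return the unique URLs in a model answer, in order of appearance."""
    links = []
    for match in URL_PATTERN.findall(answer):
        url = match.rstrip(".,;:")  # drop trailing punctuation picked up from the prose
        if url not in links:
            links.append(url)
    return links


sample_answer = (
    "VaR summarizes tail risk. See [a primer](https://example.com/var-primer), "
    "https://example.org/risk/var. and again https://example.org/risk/var"
)
print(extract_links(sample_answer))
```

For example, `print(extract_links(user_request("Value at Risk")))` would list the references of a live answer for manual checking.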

In [5]:
def wrap_f(topic, go):
    '''
    Wrapper for the function user_request
    to be combined with Jupyter Widgets.
    '''
    if go and topic:
        print("*** Click the button at any moment to STOP ***")
        print(f"-> Searching for {topic}...")
        print(user_request(topic))
        print("*** CLICK THE BUTTON TO RESET ***")
    else:
        print("")

In [6]:
# Jupyter interactive widgets: creation
w_topic = widgets.Text(description='Topic:')
w_go = widgets.ToggleButton(description='WEB SEARCH/STOP', button_style='success')
out = interactive_output(wrap_f, {"topic" : w_topic, "go" : w_go})

# Jupyter interactive widgets: User Interface
display(HBox([w_topic, w_go]))
display(out)


#### Case 2: _Extracting Information From a Document_

In this example we build a button that extracts the key points from a given document.
You also provide a question; the model answers it and points out the **sentence where
the information is found**. The user can (and should) then check that claim: experience shows
that the model can fail and return unexpected answers. Despite this limitation, such a tool may provide a productivity boost
in common working environments.
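Checking the quoted sentence by reading the document remains the safest option, but a quick automated first pass is to confirm that the sentence the model quotes really occurs in the source text. A minimal sketch, assuming the quote has already been copied out of the model's answer as a plain string (parsing the answer is left out). Matching ignores case and whitespace differences, which PDF extraction often introduces. The helper names `normalize` and `quote_in_source` and the sample text are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse runs of whitespace (line breaks from PDF extraction, double spaces)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_source(quote, source):
    """True if the quoted sentence occurs in the source text, ignoring case and whitespace."""
    cleaned = quote.strip(" \"\u201c\u201d")  # drop surrounding straight or curly quotes
    return normalize(cleaned) in normalize(source)


document = "The capital requirement is computed yearly.\nIt is reviewed\nby the supervisor."
print(quote_in_source("\u201cIt is reviewed by the supervisor.\u201d", document))
print(quote_in_source("The requirement is abolished.", document))
```

A `False` result does not necessarily mean the answer is wrong (the model may paraphrase), but it is a clear signal to check that passage by hand.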

In [7]:
def sum_up(filename, question):
    '''
    Summarize the given file (txt/pdf) and answer a question about it.
    Depending on your API licence, it is advisable to use only small files.
    '''
    # splitext also handles names containing several dots, e.g. "report.v2.pdf"
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    complete_path = os.path.join("documents", filename)
    if extension == "docx":
        message = "ERROR: Since the package docx2txt is causing incompatibilities,"
        message = message + " reading Word files is temporarily not possible.\n"
        message = message + "Please use only .txt or .pdf"
        return message
        #text = docx2txt.process(filename)
    elif extension == "txt":
        with open(complete_path, "r", encoding="utf-8") as f:
            text = f.read()
    elif extension == "pdf":
        reader = PdfReader(complete_path)
        # Read just the first page of the file
        page = reader.pages[0]
        text = page.extract_text()
    else:
        return f"Extension {extension} not supported."

    print(f"Reading file {filename} from folder 'documents'...")
    # Build the prompt step by step, appending (not overwriting) each instruction
    msg = "After the description of your task, you will be given a text. "
    msg = msg + "Summarize the text in a list of no more than 5 key points. "
    msg = msg + "Then, try to answer the following question using only information available in the text. "
    msg = msg + "The question is:\n" + question
    msg = msg + "\nTo elaborate your answer, use only information from the text. Do not use any external knowledge. "
    msg = msg + "If no answer is found in the text, and you are sure about it, tell me. "
    msg = msg + "In this case, do not answer the question.\n"
    msg = msg + "Conversely, if you found an answer, tell me in which sentence in the text you found the answer. "
    msg = msg + "Write this sentence. Motivate your answer. "
    msg = msg + "The description of your task ends here. The text to analyze is the following:\n" + text
    completion = client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[{"role": "user", "content": msg}],
                    temperature=0.01
                    )
    return completion.choices[0].message.content
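`sum_up` reads only the first page of a PDF, and its docstring advises using only small files; long inputs must also fit within the model's context window. One common workaround, not used in this notebook, is to split a long text into chunks, send each chunk through a prompt like the one above, and merge the partial answers in a final call. The sketch below shows only the splitting step. `split_into_chunks` is a hypothetical helper, and measuring the budget in characters rather than tokens is a simplifying assumption.

```python
def split_into_chunks(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are kept together when they fit;
    a single paragraph longer than max_chars is cut into fixed-size slices.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Slice oversized paragraphs first, flushing any pending chunk
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # the paragraph still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


text = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
print([len(chunk) for chunk in split_into_chunks(text, max_chars=25)])
```

Note that cutting inside an oversized paragraph can separate an answer from its context, so chunk-level answers still need the same manual verification.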

In [8]:
def wrap_su(filename, question, go):
    '''
    Wrapper for the function sum_up
    to be combined with Jupyter Widgets.
    '''
    if go and filename:
        print("*** Click the button at any moment to STOP ***")
        print(sum_up(filename, question))
        print("*** CLICK THE BUTTON AGAIN TO RESET ***")
    else:
        print("")

In [9]:
# List of all the files contained in the documents folder
available_files = os.listdir("documents")

In [10]:
# Source: https://www.nbb.be/doc/cp/eng/2015/20150117_eu_2015_35.pdf
print(available_files)

['python_tutorial_6.txt', 'risk_regulation_sample.pdf', 'water.txt']


In [11]:
# Jupyter interactive widgets: creation
# Available local files are: python_tutorial_6.txt, risk_regulation_sample.pdf, water.txt

w_wordfile = widgets.Dropdown(options=available_files, value="risk_regulation_sample.pdf", description='Document:')
w_question = widgets.Text(description='Question:', placeholder="Do penguins drink?", value="What is the capital requirement for spread risk?")
w_go2 = widgets.ToggleButton(description='ANALYZE TEXT', button_style='success')
out2 = interactive_output(wrap_su, {"filename" : w_wordfile, "question" : w_question, "go" : w_go2})

# Jupyter interactive widgets: User Interface
display(VBox([w_wordfile, w_question, w_go2]))
display(out2)


#### Case 3: _Generating Coding Examples_

In this section we explore automated code generation, imagining the following scenario.
The user needs a Python program to perform a certain simple task, but is unsure about the right syntax.
A similar case is when the user is learning a topic and would like to **first try some examples**
before studying the theory from an appropriate source.

In [12]:
def generate_code(task):
    '''
    Generate Python Code
    '''
    msg = "Write a Python script solving the task mentioned below. "
    msg = msg + "The code must be well commented and formatted, so that I can learn and understand. "
    msg = msg + "Give me three websites where I can learn the programming techniques that you used and more on the task itself. "
    msg = msg + "The task is:\n" + task
    completion = client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[{"role": "user", "content": msg}],
                    temperature = 0.1
                    )
    return completion.choices[0].message.content
#---
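Generated code should never be executed blindly. A lightweight first check is to pull the fenced code blocks out of the model's answer and confirm that they at least parse. This sketch assumes the model formats code as Markdown fences (common, but not guaranteed); `extract_code_blocks` and `is_valid_python` are hypothetical helpers, and `ast.parse` only checks syntax, not correctness or safety.

```python
import ast
import re

# Three backticks, built programmatically so this example can sit inside a Markdown fence
TICKS = "`" * 3
# Optional "python"/"py" tag after the opening fence, then everything up to the closing fence
FENCE = re.compile(r"`{3}(?:python|py)?[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_code_blocks(answer):
    """Return the bodies of the fenced code blocks found in a model answer."""
    return [block.strip() for block in FENCE.findall(answer)]


def is_valid_python(source):
    """True if the code parses; this does NOT prove that it is correct or safe to run."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


answer = (
    f"Here is the script:\n{TICKS}python\nimport math\nprint(math.sqrt(2))\n{TICKS}\n"
    f"And a broken one:\n{TICKS}\nprint('x'\n{TICKS}\n"
)
blocks = extract_code_blocks(answer)
print([is_valid_python(b) for b in blocks])  # [True, False]
```

Applied to a live answer, `extract_code_blocks(generate_code("Example with Value At Risk."))` would return the scripts for review before anything is run.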

In [13]:
def wrap_generate_code(task, go):
    '''
    Wrapper for the function generate_code
    to be combined with Jupyter Widgets.
    '''
    if go and task:
        print("*** Click the button at any moment to STOP ***")
        print(generate_code(task))
        print("*** Click the button to reset ***")
        print("*** Do not type until the button has been reset! ***")
    else:
        print("")

In [14]:
# Jupyter interactive widgets: creation
w_task = widgets.Text(description='Task:', placeholder="Example with Value At Risk.", value="Example with Value At Risk.")
w_go3 = widgets.ToggleButton(description='GENERATE CODE', button_style='success')
out3 = interactive_output(wrap_generate_code, {"task" : w_task, "go" : w_go3})

# Jupyter interactive widgets: User Interface
display(HBox([w_task, w_go3]))
display(out3)
