# Recreating Code with AI

## Introduction


The **aim** of this project is to determine how well an AI$^{1}$ can **recreate** code written by human professionals.

This experiment was carried out by three students: *Mathis Jeroncic*, *Thomas Velard* and *Luka Bolješić*. The *Jožef Stefan Institute* asked us to determine how **good AI is** at **code generation**. Our task was to prepare suitable prompts$^{2}$, use them to generate code, and assess the quality of the result.

The code we tried to reproduce was taken from the [*Valence project*](https://github.com/VALENCEML/eBOOK), a collection of Jupyter notebooks$^{3}$ written to teach **machine learning** and **other algorithms** to high school **students**. All the code is written in **Python**$^{4}$. Thanks to its large number of **libraries** and its **simplicity**, Python is one of the most popular programming languages for developing **artificial intelligence**.

You can find out more about our results [*here*](https://github.com/MathisJeroncic/Valence_with_AI/tree/main). 

## Generating the Code

First, we **chose the code** segments from *Valence*. We selected different types of algorithms that require **different methods**, in order to see how artificial intelligence would handle **various problems**.

To keep the results as comparable as possible, we decided to use the **same starting prompts**. The follow-up prompts could not be identical, because each one depends on the code produced in response to the previous one. However, we agreed not to give too much **detail** about the actual code in the prompts, as the aim was to see whether someone with **no prior knowledge** could generate useful code.
For example, the first prompt described only what we wanted the AI to generate, not exactly how we wanted it done. If that did not work, we could specify some libraries or other useful information. If there was an error, we simply told the AI where it occurred or what the error message said.

We decided to use at most **five** prompts for each part: if the code did not work after five prompts, it most likely would not work **at all**, and a fixed limit also makes the results easier to compare.
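The prompting procedure described above can be summarised as a small loop. The following is only a minimal sketch, not the tooling we actually used (the prompts were sent to the chatbots by hand): `refine_with_prompts`, `syntax_problem` and `scripted_model` are hypothetical names introduced for illustration, the scripted model merely replays canned replies instead of calling a real chatbot, and the syntax check is a stand-in for the manual inspection of each result.

```python
from typing import Callable, Iterable, Optional, Tuple

MAX_PROMPTS = 5  # the limit per code segment described in the text


def refine_with_prompts(
    first_prompt: str,
    ask_model: Callable[[str], str],
    find_problem: Callable[[str], Optional[str]],
    max_prompts: int = MAX_PROMPTS,
) -> Tuple[Optional[str], int]:
    """Send prompts until the code is accepted or the prompt budget runs out.

    Returns (accepted_code_or_None, number_of_prompts_used).
    """
    prompt = first_prompt
    for used in range(1, max_prompts + 1):
        code = ask_model(prompt)
        problem = find_problem(code)
        if problem is None:
            return code, used
        # Follow-up prompts only report what went wrong, not how to fix it.
        prompt = f"The previous code did not work: {problem}"
    return None, max_prompts


def syntax_problem(code: str) -> Optional[str]:
    """Hypothetical checker: report a syntax error, or None if the code compiles."""
    try:
        compile(code, "<generated>", "exec")
    except SyntaxError as err:
        return f"SyntaxError: {err.msg}"
    return None


def scripted_model(replies: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a chatbot: returns the canned replies one after another."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


# Illustration only: the first reply has an unclosed parenthesis, the second is valid.
model = scripted_model(["print('hello'", "print('hello')"])
code, used = refine_with_prompts("Write code that prints hello.", model, syntax_problem)
print(used, code)
```

Passing the model and the checker in as functions keeps the loop independent of any particular chatbot, which mirrors how the same procedure was applied to each of the three tools.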

AI tools **change** rapidly, and it is hard to tell which one is the best. That is why we decided to experiment with three different AI chatbots, to get a general idea of which one is the most useful for code generation. *Luka Bolješić* used *ChatGPT*$^{5}$, *Thomas Velard* worked with *Bard* and *Mathis Jeroncic* used *Auto-GPT*.

## Interpretation of the Results

After generating the code, the next step was to **grade** each result. We put together this **system**:

- **5**: The code works perfectly and meets our expectations.
- **4**: A detail is missing, the output is almost perfect.
- **3**: The output is coherent, but it is not exactly what we asked for.
- **2**: There is an output, but it is not at all what was expected.
- **1**: There is no output, or there is an error.
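For anyone who wants to record grades programmatically, the scale above could be encoded as follows. This is a sketch under our own assumptions: the member names of `Grade` and the helper `average_grade` are hypothetical, and the grades in the example are placeholder values, not results from the experiment.

```python
from enum import IntEnum
from statistics import mean
from typing import Iterable


class Grade(IntEnum):
    """The five-point scale from the list above (member names are our own)."""

    ERROR_OR_NO_OUTPUT = 1       # no output, or an error
    UNEXPECTED_OUTPUT = 2        # output, but not at all what was expected
    COHERENT_BUT_OFF_TARGET = 3  # coherent, but not exactly what was asked for
    ALMOST_PERFECT = 4           # a detail is missing
    PERFECT = 5                  # works and meets expectations


def average_grade(grades: Iterable[Grade]) -> float:
    """Average the grades given to one chatbot across several code segments."""
    return mean(int(grade) for grade in grades)


# Placeholder grades for illustration only, not measured results.
example_grades = [Grade.PERFECT, Grade.COHERENT_BUT_OFF_TARGET, Grade.ERROR_OR_NO_OUTPUT]
print(average_grade(example_grades))
```

Using `IntEnum` keeps the numeric value of each grade available for averaging while giving every level a readable name.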

## Glossary

$^{1}$ **AI**: Artificial intelligence, a program capable of imitating intelligent human behaviour.


$^{2}$ **Prompt**: A text describing what you want the AI to generate.


$^{3}$ **Notebook**: A Jupyter notebook is a type of document that can contain both text and code. It is connected to a kernel, which executes the code so that the results appear inside the document. It is mostly used with Python.


$^{4}$ **Python**: Python is a high-level, general-purpose programming language.


$^{5}$ **GPT**: Generative pre-trained transformers are a type of large language model and a prominent framework for generative artificial intelligence.