# What is Generative AI

Generative AI is a branch of artificial intelligence that creates new content, from written text to images and computer-generated music. Rather than only analyzing or classifying existing data, generative models produce original output.

## Technical Terms Explained:
* **Text Generation:** This involves making computers write text that makes sense and is relevant to the topic, akin to an automatic storyteller.

* **Image Generation:** This allows computers to make new pictures or change existing ones, like a digital artist using a virtual paintbrush.

* **Code Generation:** This is generative AI applied to programming, where the model helps write new code.

* **Audio Generation:** Computers can also create sounds or music, a bit like a robot composer coming up with its own tunes.

* **ChatGPT:** A chatbot built on a language model developed by OpenAI. It generates human-like conversational responses by repeatedly predicting the next word (more precisely, the next token) in a sequence based on context.

* **DALL·E:** An AI program by OpenAI that produces images from textual descriptions, mimicking creativity in visual art.

* **GitHub Copilot:** A coding assistant tool that suggests code snippets and completes code lines to help developers write more efficiently and with fewer errors.

* **Contextual Suggestions:** Recommendations provided by AI tools, like Copilot, which are relevant to the current task or context within which a user is working.

Generative AI is applied across many fields, including **content creation, product design, scientific inquiry, data enhancement, and personalized experiences**. Its strength lies in producing and refining candidate ideas quickly, which speeds up the search for new solutions.

## Resource Links
[Millions of New Materials Discovered with Deep Learning](https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/) - Google DeepMind Research

[Reinventing the wheel? “FelGAN” inspires new rim designs with AI](https://www.audi-mediacenter.com/en/press-releases/reinventing-the-wheel-felgan-inspires-new-rim-designs-with-ai-15097) - Audi

[May the force of text data analysis be with you: Unleashing the power of generative AI for social psychology research](https://www.sciencedirect.com/science/article/pii/S2949882123000063) - Salah, Halbusi, and Abdelfattah (Computers in Human Behavior: Artificial Humans)

For a deeper dive into synthetic data techniques, check out Udacity's course on [Small Datasets in Machine Learning](https://www.udacity.com/course/small-data--cd12528).

The AI and machine learning timeline is a journey of technological breakthroughs starting with early advances like the perceptron in the 1950s, and moving through various challenges and innovations that have led to recent breakthroughs in generative AI. This timeline shows us the perseverance and ingenuity involved in the evolution of AI, highlighting how each decade built upon the last to reach today's exciting capabilities.

<img src="img/img_00.png">

## Technical Terms Explained:

* **Perceptron:** An early type of neural network component that can decide whether or not an input, represented by numerical values, belongs to a specific class.

* **Neural Networks:** Computer systems loosely inspired by the human brain that learn from data by adjusting the strength of the connections between artificial neurons.

* **Backpropagation:** A method for training neural networks that computes the gradient of the loss function with respect to each weight, so that the weights can be adjusted to reduce the error.

* **Statistical Machine Learning:** A method of data analysis that automates analytical model building using algorithms that learn from data without being explicitly programmed.

* **Deep Learning:** A subset of machine learning composed of algorithms that permit software to train itself to perform tasks by exposing multilayered neural networks to vast amounts of data.

* **Generative Adversarial Networks (GANs):** A class of machine learning models where two networks, a generator and a discriminator, are trained simultaneously in a zero-sum game framework.

* **Transformer:** A type of deep learning model that handles sequential data and is particularly noted for its high performance in natural language processing tasks.
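
As a rough illustration of the perceptron described in the list above, here is a minimal sketch in plain Python. The toy dataset (the logical AND function), the learning rate, and the epoch count are assumptions chosen for this example and are not part of the course material.

```python
# Minimal perceptron sketch (illustrative; data and settings are assumptions).

def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Perceptron learning rule: adjust weights only when a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Toy dataset: the logical AND function, which a single perceptron can separate.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```

The perceptron decides class membership with a single weighted sum and a threshold, which is why it can only separate classes that a straight line (or plane) can divide.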

# Exercise: Familiarize Yourself with a Commercial Large Language Model (LLM)

Large Language Models (LLMs) are revolutionizing the AI domain with their ability to parse and generate human language. From content creation to personal assistants, their applications are vast and varied.

In this exercise, you will familiarize yourself with a commercial LLM. You will explore its capabilities and think about how you could use it in your own work or personal projects.

First, choose one of many commercial LLMs with a chat interface. Some include:

* [ChatGPT](https://chat.openai.com/) from OpenAI
* [Bing](https://www.bing.com/) from Microsoft
* [Gemini](https://gemini.google.com/) from Google

Second, create an account with the LLM provider if you haven't already. You may need to provide a credit card number, but you can use the free trial period to explore the LLM's capabilities.

# Training Generative AI Models

Training generative AI models means teaching computers to create new content, such as text or images, by learning from huge datasets. Through training, a model learns the complex patterns found in human language and visual art well enough to reproduce them. The process is intricate but rewarding, and it yields models that can generate strikingly realistic outputs.

## Technical Terms Explained:

* **Large Language Models (LLMs)**: These are AI models specifically designed to understand and generate human language by being trained on a vast amount of text data.

* **Variational Autoencoders (VAEs)**: A type of AI model that can be used to create new images. It has two main parts: the encoder reduces data to a simpler form, and the decoder expands it back to generate new content.

* **Latent Space**: A compressed representation of data that the autoencoder creates in a simpler, smaller form, which captures the most important features needed to reconstruct or generate new data.

* **Parameters**: Parameters are the variables that the model learns during training. They are internal to the model and are adjusted through the learning process. In the context of neural networks, parameters typically include weights and biases.

* **Weights**: Weights are coefficients for the input data. They are used in calculations to determine the importance or influence of input variables on the model's output. In a neural network, each connection between neurons has an associated weight.

* **Biases**: Biases are additional constants attached to neurons and are added to the weighted input before the activation function is applied. Biases ensure that even when all the inputs are zero, there can still be a non-zero output.

* **Hyperparameters**: Hyperparameters, unlike parameters, are not learned from the data. They are settings that control the learning process, such as the learning rate or the number of layers. They are chosen before training begins and typically stay fixed while the model trains.
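
To make the distinction between parameters and hyperparameters concrete, here is a minimal sketch that trains a single "neuron" with one weight and one bias using gradient descent. The toy data, the learning rate, and the number of epochs are assumptions picked for illustration only.

```python
# Sketch: parameters (learned) versus hyperparameters (chosen beforehand).
# The data, learning rate, and epoch count below are illustrative assumptions.

# Hyperparameters: set before training and not learned from the data.
LEARNING_RATE = 0.05
EPOCHS = 2000

# Toy data generated from y = 2x + 1, so we know what the model should learn.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# Parameters: a single weight and a single bias, adjusted during training.
weight, bias = 0.0, 0.0

for _ in range(EPOCHS):
    # Gradients of the mean squared error with respect to the weight and the bias.
    grad_w = sum(2 * (weight * x + bias - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (weight * x + bias - y) for x, y in zip(xs, ys)) / len(xs)
    weight -= LEARNING_RATE * grad_w
    bias -= LEARNING_RATE * grad_b

print(f"weight={weight:.3f}, bias={bias:.3f}")
```

After training, the weight approaches 2 and the bias approaches 1. The bias is what lets the model output 1 when the input is 0, matching the description of biases above.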

Generation algorithms are what let a trained model produce text and images that seem human-made. They create new content by building on the patterns the model has learned, much like filling in missing puzzle pieces.

## Technical Terms Explained:

* **Autoregressive text generation**: Autoregressive text generation is like a game where the computer guesses the next word in a sentence based on the words that came before it. It keeps doing this to make full sentences.

* **Latent space decoding**: Imagine if you had a map of all the possible images you could create, with each point on the map being a different image. Latent space decoding is like picking a point on that map and bringing the image at that point to life.

* **Diffusion models**: Diffusion models start with a picture that's full of random dots like TV static, and then they slowly clean it up, adding bits of the actual picture until it looks just like a real photo or painting.
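
Diffusion models are trained to undo a gradual noising process. The sketch below shows only that forward (noising) direction on a toy one-dimensional signal; the reverse, denoising direction requires a trained network and is not shown. The function name `add_noise`, the `keep_fraction` parameter, and the sine-wave signal are all assumptions made for illustration.

```python
# Forward (noising) half of a diffusion process on a toy 1-D signal.
# This is a simplified sketch; it does not train or run a denoising model.
import math
import random


def add_noise(signal, keep_fraction, rng):
    """Blend a clean signal with Gaussian noise.

    keep_fraction=1.0 returns the signal unchanged; 0.0 returns pure noise.
    """
    signal_scale = math.sqrt(keep_fraction)
    noise_scale = math.sqrt(1.0 - keep_fraction)
    return [signal_scale * value + noise_scale * rng.gauss(0.0, 1.0) for value in signal]


rng = random.Random(0)  # Seeded so repeated runs give the same noise.
clean = [math.sin(i / 4) for i in range(16)]

for keep_fraction in (1.0, 0.5, 0.0):
    noisy = add_noise(clean, keep_fraction, rng)
    difference = sum((a - b) ** 2 for a, b in zip(clean, noisy)) / len(clean)
    print(f"keep_fraction={keep_fraction:.1f}  mean squared difference={difference:.3f}")
```

Generation runs this in the opposite direction: it starts from pure noise (the "TV static" above) and removes a little noise at each step.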

# Exercise: Generating one token at a time

Generative AI models can create text by predicting one token at a time, with each token representing a piece of a word or sometimes a whole word. This process involves using probabilities to determine the most likely next piece of text, which helps the model construct coherent sentences.

In this exercise, we will get to understand how an LLM generates text--one token at a time, using the previous tokens to predict the following ones.
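
Before working with a real model, it may help to see the same idea at toy scale. The sketch below is a simple bigram model, not an LLM: it only looks at the single previous word, whereas an LLM uses all the previous tokens. The tiny corpus and the `generate` helper are assumptions made up for this example.

```python
# Toy autoregressive generation with a bigram model (illustrative only).
# Real LLMs condition on the whole preceding context, not just the previous word.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat sat on the rug .".split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for current_word, following_word in zip(corpus, corpus[1:]):
    next_word_counts[current_word][following_word] += 1


def generate(start_word, max_words=6):
    """Greedily append the most frequent next word, one word at a time."""
    words = [start_word]
    for _ in range(max_words - 1):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break  # No known continuation for this word.
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


print(generate("the"))  # the cat sat on the cat
```

Each new word is chosen using only what has already been generated, which is the defining property of autoregressive generation. Always taking the most likely word also makes the toy model loop back on itself.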


## Step 1. Load a tokenizer and a model

First we load a tokenizer and a model from Hugging Face's `transformers` library. A tokenizer splits a string into tokens and maps each token to a number (a token ID) that the model can process.

In this exercise, all the code will be written for you. All you need to do is follow along!

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# To load a pretrained model and a tokenizer using HuggingFace, we only need two lines of code!
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# We create a partial sentence and tokenize it.
text = "Udacity is the best place to learn about generative"
inputs = tokenizer(text, return_tensors="pt")

# Show the tokens as numbers, i.e. "input_ids"
inputs["input_ids"]
```

```text
tensor([[  52,   67, 4355,  318,  262, 1266, 1295,  284, 2193,  546, 1152,  876]])
```

### Examine the tokenization

Let's explore what these tokens mean!

```python
# Show how the sentence is tokenized
import pandas as pd


def show_tokenization(inputs):
    return pd.DataFrame(
        [(id, tokenizer.decode(id)) for id in inputs["input_ids"][0]],
        columns=["id", "token"],
    )


show_tokenization(inputs)
```

|  | id | token |
|---|---|---|
| 0 | tensor(52) | U |
| 1 | tensor(67) | d |
| 2 | tensor(4355) | acity |
| 3 | tensor(318) | is |
| 4 | tensor(262) | the |
| 5 | tensor(1266) | best |
| 6 | tensor(1295) | place |
| 7 | tensor(284) | to |
| 8 | tensor(2193) | learn |
| 9 | tensor(546) | about |


### Subword tokenization

The interesting thing is that tokens in this case are neither just letters nor just words. Sometimes shorter words are represented by a single token, but other times a single token represents a part of a word, or even a single letter. This is called subword tokenization.
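
To get a feel for how a word can be split into subword pieces, here is a toy tokenizer that greedily matches the longest piece it can find in a small vocabulary. This is a simplified stand-in: GPT-2 itself uses byte-pair encoding (BPE), and the vocabulary below is hypothetical, invented for this example.

```python
# Toy subword tokenizer: greedy longest-match against a small vocabulary.
# The vocabulary is hypothetical; GPT-2 itself uses byte-pair encoding (BPE).

VOCAB = {"U", "d", "acity", "gener", "ative", "learn", "ing", "is"}


def tokenize(word, vocab=VOCAB):
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # Unknown character: emit it on its own so we always make progress.
            tokens.append(word[start])
            start += 1
    return tokens


print(tokenize("Udacity"))     # ['U', 'd', 'acity']
print(tokenize("generative"))  # ['gener', 'ative']
print(tokenize("learning"))    # ['learn', 'ing']
```

Common fragments get their own token, while rare words fall back to smaller pieces or single letters, so any input can be represented with a fixed-size vocabulary.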

## Step 2. Calculate the probability of the next token

Now let's use PyTorch to calculate the probability of the next token given the previous ones.

```python
# Calculate the probabilities for the next token for all possible choices. We show the
# top 5 choices and the corresponding words or subwords for these tokens.

import torch

with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]
    probabilities = torch.nn.functional.softmax(logits[0], dim=-1)


def show_next_token_choices(probabilities, top_n=5):
    return pd.DataFrame(
        [
            (id, tokenizer.decode(id), p.item())
            for id, p in enumerate(probabilities)
            if p.item()
        ],
        columns=["id", "token", "p"],
    ).sort_values("p", ascending=False)[:top_n]


show_next_token_choices(probabilities)
```

|  | id | token | p |
|---|---|---|---|
| 8300 | 8300 | programming | 0.157589 |
| 4673 | 4673 | learning | 0.148414 |
| 4981 | 4981 | models | 0.048506 |
| 17219 | 17219 | biology | 0.046481 |
| 16113 | 16113 | algorithms | 0.027796 |


Interesting! The model thinks that the most likely next word is "programming", followed closely by "learning".
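
The `softmax` call in the cell above is what turns the model's raw scores (logits) into probabilities. As a small sketch of what that function computes, here is a plain-Python version. The five logit values are made up for illustration and are not the model's actual scores.

```python
# Softmax sketch in plain Python; the logit values are made up for illustration.
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.9, 0.8, 0.7, 0.2]  # Hypothetical scores for five candidate tokens.
probabilities = softmax(logits)

for logit, p in zip(logits, probabilities):
    print(f"logit={logit:.1f} -> p={p:.3f}")
```

Larger logits receive larger probabilities, and two logits that are close together (like the first two here) end up with similar probabilities.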

```python
# Obtain the token id for the most probable next token
next_token_id = torch.argmax(probabilities).item()

print(f"Next token id: {next_token_id}")
print(f"Next token: {tokenizer.decode(next_token_id)}")
```

```text
Next token id: 8300
Next token:  programming
```


```python
# We append the most likely token to the text, reusing the id found above
# instead of hard-coding it.
text = text + tokenizer.decode(next_token_id)
text
```

```text
'Udacity is the best place to learn about generative programming'
```

## Step 3. Generate some more tokens

The following cell will take `text`, show the most probable tokens to follow, and append the most likely token to text. Run the cell over and over to see it in action!

```python
# Press ctrl + enter to run this cell again and again to see how the text is generated.

from IPython.display import Markdown, display

# Show the text
print(text)

# Convert to tokens
inputs = tokenizer(text, return_tensors="pt")

# Calculate the probabilities for the next token and show the top 5 choices
with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]
    probabilities = torch.nn.functional.softmax(logits[0], dim=-1)

display(Markdown("**Next token probabilities:**"))
display(show_next_token_choices(probabilities))

# Choose the most likely token id and add it to the text
next_token_id = torch.argmax(probabilities).item()
text = text + tokenizer.decode(next_token_id)
```

```text
Udacity is the best place to learn about generative programming
```

**Next token probabilities:**

|  | id | token | p |
|---|---|---|---|
| 13 | 13 | . | 0.352227 |
| 11 | 11 | , | 0.135983 |
| 290 | 290 | and | 0.109372 |
| 287 | 287 | in | 0.069529 |
| 8950 | 8950 | languages | 0.058288 |


## Step 4. Use the `generate` method

```python
from IPython.display import Markdown, display

# Start with some text and tokenize it
text = "Once upon a time, generative models"
inputs = tokenizer(text, return_tensors="pt")

# Use the `generate` method to generate lots of text
output = model.generate(**inputs, max_length=100, pad_token_id=tokenizer.eos_token_id)

# Show the generated text
display(Markdown(tokenizer.decode(output[0])))
```

Once upon a time, generative models of the human brain were used to study the neural correlates of cognitive function. In the present study, we used a novel model of the human brain to investigate the neural correlates of cognitive function. We used a novel model of the human brain to investigate the neural correlates of cognitive function. We used a novel model of the human brain to investigate the neural correlates of cognitive function. We used a novel model of the human brain to investigate the neural correlates of cognitive function.

### That's interesting...

You'll notice that GPT-2 is not nearly as sophisticated as later models like GPT-4, which you may have experience using. It often repeats itself and doesn't always make much sense. But it's still pretty impressive that it can generate text that looks like English.

## Congrats on completing the exercise! 🎉

Give yourself a hand. And please take a break if you need to. We'll be here when you're refreshed and ready to learn more!

# More Generative AI Architectures

Generative AI uses several architectures to create new content by learning and reproducing patterns. These architectures, such as GANs, RNNs, and Transformers, produce novel images, text, and sounds based on what they learned during training. They expand what is possible in creative and technical work.

## Technical Terms Explained:

* **Generative Adversarial Networks (GANs)**: A system where two neural networks, one to generate data and one to judge it, work against each other. This competition helps improve the quality of the generated results.

* **Recurrent Neural Networks (RNNs)**: A network that's really good at handling sequences, like sentences or melodies, because it processes one piece at a time and remembers what it saw before.

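* **Transformer-based models**: A more recent architecture that uses attention to look at a whole sequence at once instead of one piece at a time. This allows training to run in parallel and helps the model relate distant parts of a sequence, which suits tasks like writing sentences or translating languages.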

* **Sequential Data**: Data that is connected in a specific order, like words in a sentence or steps in a dance routine.
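
The idea of looking at a whole sequence at once can be sketched with scaled dot-product self-attention, the core operation inside Transformers. The version below is a simplification in plain Python: it uses the input vectors directly as queries, keys, and values, whereas real Transformers first apply learned linear projections. The three example vectors are hypothetical.

```python
# Sketch of scaled dot-product self-attention in plain Python.
# For brevity, queries, keys, and values are the raw input vectors; real
# Transformers first pass the inputs through learned linear projections.
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Return one output vector per input, each a weighted mix of all inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score the query against every position in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Three hypothetical 2-dimensional token embeddings.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(sequence):
    print([round(x, 3) for x in row])
```

Each output depends on every position in the sequence, and no output has to wait for a previous one to be computed. An RNN, by contrast, must process the positions one after another.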

# Challenges in Generative AI

Generative AI presents exciting opportunities but also involves some challenges that need thoughtful consideration. While these technologies show great promise in creativity and efficiency, it's important to address issues such as misleading information, job displacement, the originality of art, and environmental impacts.

## Technical Terms Explained:

* **Deepfakes**: Highly realistic fake videos or images created by AI, which can make it seem like people are saying or doing things they never actually did.

* **Automation**: The use of technology to perform tasks without human intervention, which can increase efficiency but also may replace jobs done by people.

* **Copyright Issues**: Legal problems that arise when someone uses work without permission, potentially impacting the original creator's rights.

* **Carbon Footprint**: The total amount of greenhouse gases produced directly or indirectly by activities or entities, like running large-scale AI models.