# Generative AI Optimization Examples
This notebook demonstrates several prompt optimization techniques using the OpenAI API. You'll need an API key to run these examples.

> ⚠️ Make sure you have installed the `openai` Python package (version 1.0 or later, which provides the `OpenAI` client class) and set the `OPENAI_API_KEY` environment variable, or pass the key to the client manually in the notebook.

In [5]:
# Setup
import os

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

# Optionally pass your API key explicitly if not using environment variables
# client = OpenAI(api_key='your-api-key-here')
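
A missing key otherwise only surfaces when the first request fails, so it can help to check for it up front. The following is a minimal sketch, not part of the original notebook; the helper name `get_api_key` is hypothetical, and it assumes the key lives in the `OPENAI_API_KEY` environment variable mentioned above.

```python
import os


def get_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the given mapping, or raise a clear error.

    `get_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or pass api_key=... to OpenAI()."
        )
    return key
```

If the check passes, the returned value can be handed to the client explicitly with `OpenAI(api_key=get_api_key())`.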

## Example 1: Prompt Compression
Remove unnecessary instructions from the input prompt to cut prompt tokens.
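
Compression can be done by hand, as in the cells that follow, or roughly automated. The sketch below strips a few boilerplate framing phrases with regular expressions. The pattern list and the function name `compress_prompt` are assumptions made for illustration, not something the notebook or the OpenAI library provides, and a rule-based pass like this only removes framing; it will not rephrase the question itself.

```python
import re

# Hypothetical filler patterns; adjust these for your own prompts.
FILLER_PATTERNS = [
    r"you(?:'re| are) an? [\w\s,]*?assistant\.",      # role-play preamble
    r"please help me answer this question[^.]*\.",    # politeness and style wrapper
    r"the question is:",                              # redundant lead-in
]


def compress_prompt(prompt):
    """Remove known filler phrases and collapse whitespace."""
    text = prompt
    for pattern in FILLER_PATTERNS:
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    return " ".join(text.split())


verbose = """
You're an intelligent and helpful assistant. Please help me answer this question in a clear, concise, and professional manner.
The question is: How can I reduce the latency of my GPT-4 application in production?
"""
print(compress_prompt(verbose))
```

Removing style instructions such as "clear, concise, and professional" can change the tone of the answer, so it is worth comparing outputs before adopting a rule like this.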

In [10]:
# Verbose prompt
verbose_prompt = '''
You're an intelligent and helpful assistant. Please help me answer this question in a clear, concise, and professional manner.
The question is: How can I reduce the latency of my GPT-4 application in production?
'''

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": verbose_prompt}],
    temperature=0.7,
)
print(response.choices[0].message.content)
print("\n--- Token Usage ---")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

To reduce the latency of your GPT-4 application in production, consider the following steps:

1. Optimize the Model: Use quantization or pruning to reduce the size of the model. This not only reduces memory usage but also speeds up model inference.

2. Use GPUs: GPUs are designed to handle massive amounts of data in parallel, which can significantly reduce the latency during model inference.

3. Optimize Batch Size: Adjusting the batch size depending on your system's specification might help in reducing latency. Be mindful that a larger batch size can maximize computational efficiency but may also increase latency.

4. Use Caching: Cache the responses for repeated queries. If certain requests are made often, caching the responses can significantly reduce latency.

5. Optimize Pre-processing and Post-processing: Ensure the data preprocessing and post-processing steps are as efficient as they can be. 

6. Use Edge Computing: If the application is latency-sensitive and the model isn’t too

In [11]:
# Compressed prompt
compressed_prompt = "How can I reduce GPT-4 latency in production?"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": compressed_prompt}],
    temperature=0.7,
)
print(response.choices[0].message.content)
print("\n--- Token Usage ---")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

As of the time of writing this, GPT-4 has not been released by OpenAI. Therefore, it's hard to provide specific advice on how to reduce its latency in production. However, the following general tips might be useful for optimizing AI models like GPT-3 and potentially GPT-4:

1. Model Optimization: Use techniques like quantization, pruning, and knowledge distillation to reduce the complexity of the model. This could potentially reduce the time it takes to make predictions.

2. Hardware Acceleration: Use GPUs or other hardware accelerators, which are often designed to efficiently run AI workloads.

3. Efficient Coding: Optimize your code to reduce unnecessary computations. Make sure your implementation is as efficient as possible.

4. Use Edge Computing: If possible, deploy the model closer to the user to reduce network latency. This could mean using edge computing solutions.

5. Parallel Processing: If the model needs to make multiple predictions, try to parallelize these predictions to 

## Example 2: Output Compression
Limit the size and verbosity of the model's response.
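
Two levers are commonly combined for this: an instruction in the system message that asks for a short format, and a hard `max_tokens` cap. The sketch below builds the request arguments in one place and checks whether a reply was cut off by the cap. The helper names `build_constrained_request` and `was_truncated` are hypothetical; `max_tokens` and the `"length"` finish reason are the fields exposed by the Chat Completions API.

```python
def build_constrained_request(prompt, bullets=3, max_tokens=100,
                              model="gpt-3.5-turbo"):
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [
            # The system message shapes the format of the answer.
            {"role": "system",
             "content": f"Answer in {bullets} short bullet points."},
            {"role": "user", "content": prompt},
        ],
        # Hard upper bound on completion tokens.
        "max_tokens": max_tokens,
    }


def was_truncated(finish_reason):
    """True when generation stopped because it hit the max_tokens cap."""
    return finish_reason == "length"
```

With a client in scope, this would be used as `response = client.chat.completions.create(**build_constrained_request("Tell me about photosynthesis."))`, followed by `was_truncated(response.choices[0].finish_reason)`. The cap alone only cuts the text off; the system instruction is what makes the model aim for a short answer.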

In [12]:
# Unconstrained response
prompt = "Tell me about photosynthesis."
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
print("\n--- Token Usage ---")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

Photosynthesis is the process by which green plants, algae, and some bacteria convert light energy, usually from the sun, into chemical energy stored in glucose molecules. This process is essential for the survival of plants, as it is how they produce the food they need to grow and carry out their metabolic functions.

During photosynthesis, plants use carbon dioxide from the air and water from the soil to produce glucose and oxygen. This process takes place in the chloroplasts of plant cells, which contain the pigment chlorophyll that absorbs light energy. The light energy is used to split water molecules into oxygen and hydrogen ions, and the hydrogen ions are then used to convert carbon dioxide into glucose.

Overall, the equation for photosynthesis is:
6CO2 + 6H2O + light energy → C6H12O6 + 6O2

In addition to providing plants with the energy they need to grow and reproduce, photosynthesis also plays a critical role in the global carbon cycle by removing carbon dioxide from the atm

In [13]:
# Constrained response
prompt = "Tell me about photosynthesis."
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer in 3 short bullet points."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)
print("\n--- Token Usage ---")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

- Process by which plants and other organisms convert sunlight into energy
- Involves absorption of light by chlorophyll in plant cells
- Produces oxygen as a byproduct

--- Token Usage ---
Prompt tokens: 25
Completion tokens: 36
Total tokens: 61
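
Every cell in this notebook repeats the same four `print` calls for token usage. One possible refactor, sketched below, moves them into a helper. The name `format_usage` is hypothetical, and the `SimpleNamespace` object is only a stand-in for `response.usage` so the example runs without an API call; it uses the figures printed above.

```python
from types import SimpleNamespace


def format_usage(usage):
    """Build the token usage report from an object shaped like response.usage."""
    return "\n".join([
        "--- Token Usage ---",
        f"Prompt tokens: {usage.prompt_tokens}",
        f"Completion tokens: {usage.completion_tokens}",
        f"Total tokens: {usage.total_tokens}",
    ])


# Stand-in for response.usage (assumption: same attribute names as the API object).
usage = SimpleNamespace(prompt_tokens=25, completion_tokens=36, total_tokens=61)
print(format_usage(usage))
```

In a cell, the call would become `print(format_usage(response.usage))`.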
