# OpenAI Quickstart

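# Overview  
"Large language models are functions that map text to text. Given an input string of text, a large language model tries to predict the text that will come next" (1). This "quickstart" notebook introduces high-level LLM concepts, the core package requirements for getting started with AML, a soft introduction to prompt design, and several short examples of different use cases.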

For more quickstart examples, please refer to the official [Azure OpenAI Quickstart documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quickstart?pivots=programming-language-studio).

## Table of Contents  

[Overview](#overview)  
[How to use OpenAI Service](#how-to-use-openai-service)  
[1. Creating your OpenAI Service](#1.-creating-your-openai-service)  
[2. Installation](#2.-installation)    
[3. Credentials](#3.-credentials)  

[Use Cases](#use-cases)    
[1. Summarize Text](#1.-summarize-text)  
[2. Classify Text](#2.-classify-text)  
[3. Generate New Product Names](#3.-generate-new-product-names)  
[4. Fine Tune a Classifier](#4.-fine-tune-a-classifier)  
[5. Embeddings!](#5.-embeddings!)

[References](#references)

### Getting started with Azure OpenAI Service

New customers need to [apply for access](https://aka.ms/oai/access) to Azure OpenAI Service.
Once approved, customers can sign in to the Azure portal, create an Azure OpenAI Service resource, and start experimenting with models through the studio.

[Great resource for getting started quickly](https://techcommunity.microsoft.com/t5/educator-developer-blog/azure-openai-is-now-generally-available/ba-p/3719177)


### Build your first prompt  
This short exercise gives a basic introduction to submitting prompts to an OpenAI model for a simple task: summarization.

![](images/generative-AI-models-reduced.jpg)  


**Steps**:  
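1. Install the OpenAI library in your Python environment
2. Load the standard helper libraries and set the security credentials for the OpenAI Service you created
3. Choose a model for your task
4. Create a simple prompt for the model
5. Submit your request to the model API!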

### 1. Install OpenAI

In [1]:
!pip install -r ../requirements.txt
# pip install openai python-dotenv

Defaulting to user installation because normal site-packages is not writeable


### 2. Import helper libraries and instantiate credentials

In [2]:
import os
import openai
from dotenv import load_dotenv
load_dotenv()

openai.api_type = "azure"
openai.api_version = "2023-06-01-preview"

API_KEY = os.getenv("OPENAI_API_KEY","").strip()
assert API_KEY, "ERROR: Azure OpenAI Key is missing"
openai.api_key = API_KEY

RESOURCE_ENDPOINT = os.getenv("OPENAI_API_BASE","").strip()
assert RESOURCE_ENDPOINT, "ERROR: Azure OpenAI Endpoint is missing"
assert "openai.azure.com" in RESOURCE_ENDPOINT.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"
openai.api_base = RESOURCE_ENDPOINT
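
The credential checks above can also be wrapped in a small function that takes the environment as a mapping, which makes them easy to exercise without a real key. This is a minimal sketch: `read_azure_openai_settings` is a hypothetical helper name (it is not part of the `openai` package), and it assumes the same `OPENAI_API_KEY` and `OPENAI_API_BASE` variable names used in the cell above.

```python
from typing import Mapping, Tuple


def read_azure_openai_settings(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (api_key, endpoint) from an environment-like mapping.

    Hypothetical helper: it mirrors the assertions in the cell above but
    raises ValueError so the caller can decide how to react.
    """
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("Azure OpenAI key is missing")

    endpoint = env.get("OPENAI_API_BASE", "").strip()
    if not endpoint:
        raise ValueError("Azure OpenAI endpoint is missing")
    if "openai.azure.com" not in endpoint.lower():
        raise ValueError(
            "Azure OpenAI endpoint should look like "
            "<your unique endpoint identifier>.openai.azure.com"
        )
    return api_key, endpoint
```

In the notebook it could be called as `api_key, endpoint = read_azure_openai_settings(os.environ)` after `load_dotenv()`.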

### 3. Finding the right model  
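GPT-3 models can understand and generate natural language. The service offers four model capabilities, each with a different level of power and speed. Davinci is the most capable model, while Ada is the fastest. The following list shows the latest versions of the GPT-3 models, ordered by increasing capability (1).
* ~~text-ada-001~~
* ~~text-babbage-001~~
* ~~text-curie-001~~
* ~~text-davinci-003~~

These models have since been retired, so this notebook uses a different model.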

[Azure OpenAI models](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)  
![](images/a-b-c-d-models-reduced.jpg)  



### Model Taxonomy  
Let's choose a general text GPT-3 model, using the second most powerful model (Curie)

**Model taxonomy**: {family} - {capability} - {input-type} - {identifier}  

{family}     --> text   (general text GPT-3 model)  
{capability} --> curie  (curie is second most powerful in ada-babbage-curie-davinci family)  
{input-type} --> n/a    (only specified for search models)  
{identifier} --> 001    (version 001)  

model = "text-curie-001"
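
To make the taxonomy concrete, the sketch below splits a legacy model name into the four fields. It is an illustration only: `parse_model_name` is a hypothetical helper, it assumes the capability is always one of the four tiers named in this notebook, and it does not apply to newer names such as `gpt-35-turbo`.

```python
from typing import Dict, Optional

# Capability tiers named in this notebook, ordered from fastest to most capable.
CAPABILITIES = ("ada", "babbage", "curie", "davinci")


def parse_model_name(name: str) -> Dict[str, Optional[str]]:
    """Split a legacy GPT-3 model name into the four taxonomy fields."""
    parts = name.split("-")
    for position, part in enumerate(parts):
        if part in CAPABILITIES:
            break
    else:
        raise ValueError(f"no known capability in model name: {name!r}")

    family = "-".join(parts[:position])
    rest = parts[position + 1:]
    if not family or not rest:
        raise ValueError(f"model name does not follow the taxonomy: {name!r}")

    # The identifier comes last; anything between capability and identifier
    # is treated as the input type (only present for search models).
    identifier = rest[-1]
    input_type = "-".join(rest[:-1]) or None
    return {
        "family": family,
        "capability": parts[position],
        "input_type": input_type,
        "identifier": identifier,
    }


print(parse_model_name("text-curie-001"))
```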

In [3]:
# Use the name of your chat model deployment (the legacy curie model is no longer available)
model = "gpt-35-turbo"

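### 4. Prompt Design  

"The magic of large language models is that by being trained to minimize this prediction error over vast quantities of text, the models end up learning concepts useful for these predictions. For example, they learn concepts like" (1):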

* how to spell
* how grammar works
* how to paraphrase
* how to answer questions
* how to hold a conversation
* how to write in many languages
* how to code
* etc.

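#### How to control a large language model  
"Of all the inputs to a large language model, by far the most influential is the text prompt" (1).

Large language models can be prompted to produce output in a few ways:

* **Instruction**: Tell the model what you want
* **Completion**: Induce the model to complete the beginning of what you want
* **Demonstration**: Show the model what you want, with either:
  * A few examples in the prompt
  * Many hundreds or thousands of examples in a fine-tuning training dataset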
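
The three prompting styles can be sketched for one small task. Everything below is illustrative: the review text, the labels, and the `few_shot_messages` helper are made up for this example, and only the `role`/`content` message layout is taken from the chat calls used later in this notebook.

```python
from typing import Dict, List, Sequence, Tuple

review = "The dumplings were delicious and arrived hot."

# 1. Instruction: say what you want.
instruction_prompt = (
    "Classify the sentiment of this review as Positive or Negative.\n\n"
    f"Review: {review}\nSentiment:"
)

# 2. Completion: start the text and let the model continue it.
completion_prompt = f'The review "{review}" expresses a sentiment that is'


# 3. Demonstration: show a few worked examples before the real input.
def few_shot_messages(
    examples: Sequence[Tuple[str, str]], new_review: str
) -> List[Dict[str, str]]:
    """Build a chat message list with (review, label) pairs as examples."""
    messages = [
        {"role": "system", "content": "Label each review as Positive or Negative."}
    ]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": new_review})
    return messages


demo = few_shot_messages(
    [("Great food, will order again.", "Positive"), ("Cold and late.", "Negative")],
    review,
)
print(len(demo), "messages")
```

The resulting list could be passed as the `messages` argument of the chat call shown in the Submit step.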



#### There are three basic guidelines to creating prompts:

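**Show and tell**. Make it clear what you want through instructions, examples, or a combination of the two. If you want the model to rank a list of items in alphabetical order or to classify a paragraph by sentiment, show it that this is what you want.

**Provide quality data**. If you are trying to build a classifier or get the model to follow a pattern, make sure there are enough examples. Proofread your examples: the model is usually smart enough to see through basic spelling mistakes and still give you a response, but it may also assume the mistakes are intentional, which can affect the response.

**Check your settings.** The temperature and top_p settings control how deterministic the model is when generating a response. If you are asking for a response where there is only one right answer, set them lower. If you are looking for more diverse responses, set them higher. The most common mistake people make with these settings is assuming that they are "cleverness" or "creativity" controls.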


Source: https://github.com/Azure/OpenAI/blob/main/How%20to/Completions.md
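
To see why temperature and top_p behave as determinism controls, the toy sketch below applies the usual textbook definitions to a made-up next-token distribution: temperature divides the logits before the softmax, and top_p keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold. The logits and token names are invented, and this is not a description of the service's internal sampler.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by the temperature (must be > 0 here)."""
    if temperature <= 0:
        # As temperature approaches 0 all probability moves to the top token.
        raise ValueError("temperature must be positive in this sketch")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def apply_top_p(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize


toy_logits = {"Hardware": 2.0, "Software": 1.0, "Pricing": 0.1}
for t in (0.2, 1.0, 2.0):
    print(t, apply_temperature(toy_logits, t))
print(apply_top_p(apply_temperature(toy_logits, 1.0), 0.8))
```

Lower temperatures concentrate probability on the top token, and lower top_p values drop the unlikely tail, so both push the output toward a single repeatable answer.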

![](images/prompt_design.jpg)
The image above shows how to create your first text prompt.

### 5. Submit!

In [4]:
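# Create your first prompt.
# The system prompt (in Korean) asks the model to act as a review rater: judge each
# review line as positive (긍정) or negative (부정), give a score for each, and answer in Korean.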
system_prompt = """
너는 가게 리뷰를 보고 긍정, 부정을 판별할 수 있는 리뷰 감정사야. 각 리뷰를 보고 긍정과 부정의 정도를 나타내는 점수도 판별할 수 있어.
이제 아래 ```에 리뷰 데이터를 줄거야. 한줄당 긍정, 부정을 판별해서 각각의 점수를 알려줘. 형식은 "긍정{점수}", "부정{점수}" 형태로 알려줘.
한글로 번역해서 알려줘.
"""

text_prompt = """
```
1	맛있게 잘먹었어요 탕수육이 고기가 엄청 두툼하고 튀김옷이 얇아서 맛있어요 마라탕도 즐겨먹진 않는데 먹을수록 더 당기네요
2	탕수육전문점에게 항상 아쉬웠던게 짬뽕이 없다는 거였는데, 여긴 짬뽕 팔아서 너무 좋더라구요. 첨엔 면이 없어서 전화하려다 자세히보니 짬뽕이 아니라 짬뽕탕 였어요 ㅎㅎ 면사리 넣어 냄비에 한번 더 끊여 먹으니 맛있더라구요. 근데 오늘은 짬뽕보다 저번에 먹었던 만두가 너무 맛있길래 만두땜에 재주문 했습니다. 만두 꼭 시켜 드세요. 맛있어요. 멘보샤도 작긴 하지만 새우향 싹 올라오면서 가격에 비해 엄청 혜자메뉴 입니다. 탕수육 전문점이니 탕수육은 기본으로 맛있어요 젤 작은거 시켰는데, 꽉찬 고기 고려해서 보니 정말 저렴하게 파시는듯. 메뉴 시키는거마다 맛있고 가성비가 좋아요.
3	사진 못 찍어서 먹다가 찍었네용 ㅜ 잘 먹었습니다...!
4	김피탕이 생각나 시켜먹었는데 넘모맛있어요 고기가 토실토실해요덕분에 제 배도 토실토실해요
```
"""

In [5]:
# Simple API Call
openai.ChatCompletion.create(
    engine=model,
    max_tokens=8191,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text_prompt}
    ]
)

InvalidRequestError: The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.

### Repeat the same call, how do the results compare?

In [None]:
openai.ChatCompletion.create(
    engine=model,
    messages=[
        {"role": "user", "content": text_prompt},
    ],
    max_tokens=60
)

<OpenAIObject chat.completion id=chatcmpl-7bT3OiYIJkjK891VXQkb6j2OstkRG at 0x7f8de8435a90> JSON: {
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "1. The sweet and sour pork was delicious. The meat was thick and the coating was thin and crispy. I don't usually enjoy spicy hot pot, but I kept craving it more and more as I ate.\n\n2. I always felt disappointed that the pork sweet and sour restaurants didn't offer spicy",
        "role": "assistant"
      }
    }
  ],
  "created": 1689164174,
  "id": "chatcmpl-7bT3OiYIJkjK891VXQkb6j2

## Summarize Text  
#### Challenge  
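Summarize text by adding a 'tl;dr:' to the end of a text passage. Notice how the model understands how to perform a number of tasks with no additional instructions. You can experiment with more descriptive prompts than tl;dr to modify the model's behavior and customize the summary you receive (3).

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.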

Tl;dr
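
The pattern above (passage, blank line, summarization cue) is easy to capture in a helper so that different cues can be compared. This is a small sketch; `make_summary_prompt` and the alternative cue text are hypothetical and not part of any library.

```python
def make_summary_prompt(passage: str, cue: str = "Tl;dr") -> str:
    """Append a summarization cue to a passage, separated by a blank line."""
    return f"{passage.strip()}\n\n{cue}"


short_passage = "Scaling up language models greatly improves few-shot performance."
print(make_summary_prompt(short_passage))
print(make_summary_prompt(short_passage, cue="Summarize the passage above in one sentence:"))
```

Swapping the cue is one way to experiment with more descriptive prompts than tl;dr, as the challenge suggests.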

# Exercises for several use cases  
1. Summarize Text  
2. Classify Text  
3. Generate New Product Names
4. Embeddings
5. Fine tune a classifier

In [None]:
prompt = "Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.\n\nTl;dr"

model = "first"

In [None]:
#Setting a few additional, typical parameters during API Call
response = openai.ChatCompletion.create(
  engine=model,
  messages=[
    {"role": "user", "content": prompt},
  ],
  temperature=0.7,
  max_tokens=60,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)

print(response)

{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "Pre-training language models on large text corpora followed by fine-tuning on specific tasks has shown substantial gains in NLP tasks. However, humans can perform new language tasks with few examples or simple instructions, which current NLP systems struggle to do. Scaling up language models can improve task-agnostic few",
        "role": "assistant"
      }
    }
  ],
  "created": 1689164175,
  "id": "chatcmpl-7bT3PSmq5fSao0KcVw6YGT6LHTDJ9",
  "model": "gpt-35-turbo",
  "object": 

In [None]:
response["choices"][0]["message"]["content"]

'Pre-training language models on large text corpora followed by fine-tuning on specific tasks has shown substantial gains in NLP tasks. However, humans can perform new language tasks with few examples or simple instructions, which current NLP systems struggle to do. Scaling up language models can improve task-agnostic few'
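
Note that the summary above stops mid-sentence and the response reports `"finish_reason": "length"`, which means generation stopped because the `max_tokens` limit was reached. The sketch below reads the first choice and flags that case. `read_first_choice` is a hypothetical helper, and `sample` is a hand-written dictionary shaped like the output shown above, not a real API response.

```python
from typing import Any, Mapping, Tuple


def read_first_choice(response: Mapping[str, Any]) -> Tuple[str, bool]:
    """Return (text, was_truncated) for the first choice of a chat response."""
    choice = response["choices"][0]
    text = choice["message"]["content"]
    # "length" means the reply was cut off by the token limit.
    was_truncated = choice.get("finish_reason") == "length"
    return text, was_truncated


sample = {
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Scaling up language models can improve task-agnostic few",
            },
        }
    ]
}

text, was_truncated = read_first_choice(sample)
if was_truncated:
    print("Reply was cut off; consider raising max_tokens.")
print(text)
```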

## Classify Text  
#### Challenge  
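Classify items into categories provided at inference time. In the following example, we provide both the categories and the text to classify in the prompt (*playground_reference).

Customer inquiry: Hello, one of the keys on my laptop keyboard broke recently and I'll need a replacement:

Classified category: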

In [None]:
prompt = "Classify the following inquiry into one of the following: categories: [Pricing, Hardware Support, Software Support]\n\ninquiry: Hello, one of the keys on my laptop keyboard broke recently and I'll need a replacement:\n\nClassified category:"

model = "first"

In [None]:
response = openai.ChatCompletion.create(
  engine=model,
  messages=[
    {"role": "user", "content": prompt},
  ],
  temperature=0,
  max_tokens=60,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)

print(response["choices"][0]["message"]["content"])

Hardware Support
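
Because the categories are supplied at inference time, it can help to build the prompt from a list and to map the model's free-text reply back onto that list. The following is a sketch under that assumption; both function names are hypothetical, and the matching rules (exact match first, then substring) are one simple choice among many.

```python
from typing import List, Optional


def build_classification_prompt(categories: List[str], inquiry: str) -> str:
    """Assemble a prompt in the same layout as the cell above."""
    return (
        "Classify the following inquiry into one of the following: "
        f"categories: [{', '.join(categories)}]\n\n"
        f"inquiry: {inquiry}\n\nClassified category:"
    )


def match_category(reply: str, categories: List[str]) -> Optional[str]:
    """Map a free-text reply onto one of the allowed categories, if possible."""
    cleaned = reply.strip().strip(".").lower()
    for category in categories:
        if cleaned == category.lower():
            return category
    # Fall back to a substring match, e.g. "Category: Hardware Support".
    for category in categories:
        if category.lower() in cleaned:
            return category
    return None


categories = ["Pricing", "Hardware Support", "Software Support"]
print(match_category("Hardware Support\n", categories))
```

Returning `None` for an unrecognized reply lets the caller retry or route the inquiry to a person instead of trusting an unexpected label.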


## Generate New Product Names
#### Challenge
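Create product names from example words. The prompt includes information about the product we want names for, along with a similar example that shows the pattern we would like to receive. We also set the temperature value high to increase randomness and encourage more innovative responses.

Product description: A home milkshake maker
Seed words: fast, healthy, compact.
Product names: HomeShaker, Fit Shaker, QuickShake, Shake Maker

Product description: A pair of shoes that can fit any foot size.
Seed words: adaptable, fit, omni-fit.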
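
The demonstration pattern above can be generated from structured examples, which makes it easy to add or swap examples. This is a sketch: `build_name_prompt` is a hypothetical helper, and ending the prompt with an open "Product names:" cue is a design choice of this example rather than something the notebook's own prompt does.

```python
from typing import List, Sequence, Tuple

# (description, seed words, product names)
Example = Tuple[str, Sequence[str], Sequence[str]]


def build_name_prompt(
    examples: List[Example], description: str, seed_words: Sequence[str]
) -> str:
    """Build a few-shot prompt that ends where the model should continue."""
    blocks = []
    for ex_description, ex_seeds, ex_names in examples:
        blocks.append(
            f"Product description: {ex_description}\n"
            f"Seed words: {', '.join(ex_seeds)}.\n"
            f"Product names: {', '.join(ex_names)}"
        )
    # Leave the last "Product names:" open so the model fills it in.
    blocks.append(
        f"Product description: {description}\n"
        f"Seed words: {', '.join(seed_words)}.\n"
        "Product names:"
    )
    return "\n\n".join(blocks)


name_prompt = build_name_prompt(
    [(
        "A home milkshake maker",
        ["fast", "healthy", "compact"],
        ["HomeShaker", "Fit Shaker", "QuickShake", "Shake Maker"],
    )],
    "A pair of shoes that can fit any foot size.",
    ["adaptable", "fit", "omni-fit"],
)
print(name_prompt)
```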

In [None]:
prompt = "Product description: A home milkshake maker\nSeed words: fast, healthy, compact.\nProduct names: HomeShaker, Fit Shaker, QuickShake, Shake Maker\n\nProduct description: A pair of shoes that can fit any foot size.\nSeed words: adaptable, fit, omni-fit."
model = "gpt-35-turbo-0613"

In [None]:
response = openai.ChatCompletion.create(
  engine=model,
  messages=[
    {"role": "user", "content": prompt},
  ],
  temperature=0.8,
  max_tokens=60,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)

print(response["choices"][0]["message"]["content"])

InvalidRequestError: The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.

## Embeddings!  
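This section shows how to retrieve embeddings and find similarities between words, sentences, and documents.

### Model Taxonomy - Choosing a similarity model
~~Let's choose a similarity model, using the most powerful model (Davinci).~~
That was the original plan, but the model has since moved to legacy status, so we choose a different one.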

**Model taxonomy**: {family} - {capability} - {input-type} - {identifier}  

{family}     --> text-similarity  (text similarity embedding model)  
{capability} --> davinci          (davinci is the most powerful in the ada-babbage-curie-davinci family)  
{input-type} --> n/a              (only specified for search models)  
{identifier} --> 001              (version 001)  

## ~~model = 'text-similarity-davinci-001'~~ --> no longer available

In [None]:
# Ensure core libraries are installed
!pip install plotly scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.9 MB)
Collecting scipy>=1.5.0 (from scikit-learn)
  Downloading scipy-1.11.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.5 MB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn)
  Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Installing collected packages: threadpoolctl, scipy, scikit-learn
Successfully installed scikit-learn-1.3.0 scipy-1.11.1 threadpoolctl-3.1.0


In [None]:
# Dependencies for embeddings_utils
!pip install matplotlib
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Using cached pandas-2.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1 (from pandas)
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.0.3 pytz-2023.3 tzdata-2023.3


In [None]:
from openai.embeddings_utils import get_embedding, cosine_similarity

In [None]:
text = 'the quick brown fox jumped over the lazy dog'

model = 'text-embedding-ada-002'

In [None]:
openai.Embedding.create(
    input=text, engine=model
)["data"][0]["embedding"]

[-0.004474656656384468,
 0.00978652760386467,
 -0.014904950745403767,
 -0.006424985360354185,
 -0.01135313231498003,
 0.015513833612203598,
 -0.02372107096016407,
 -0.016414472833275795,
 -0.0158182755112648,
 -0.029632311314344406,
 0.021298224106431007,
 0.021095262840390205,
 0.018570933490991592,
 0.004170214757323265,
 -0.0007155169150792062,
 -0.007579326163977385,
 0.02521790750324726,
 -0.004214612767100334,
 0.011175542138516903,
 -0.008587788790464401,
 -0.009513798169791698,
 0.021577294915914536,
 -0.005993693135678768,
 -0.008257976733148098,
 0.006041261833161116,
 0.013040246441960335,
 0.007439790293574333,
 -0.0035169341135770082,
 -0.008955655619502068,
 0.0011939817341044545,
 0.00666600139811635,
 0.0038657733239233494,
 -0.039272960275411606,
 -0.002559211803600192,
 -0.012761174701154232,
 -0.0217422004789114,
 -0.0037072100676596165,
 -0.010458835400640965,
 0.02597901225090027,
 -0.0456916019320488,
 0.009399632923305035,
 0.015653369948267937,
 -0.0226174704730

In [None]:
# compare several words
automobile_embedding    = openai.Embedding.create(input='automobile', engine=model)["data"][0]["embedding"]
vehicle_embedding       = openai.Embedding.create(input='vehicle', engine=model)["data"][0]["embedding"]
dinosaur_embedding      = openai.Embedding.create(input='dinosaur', engine=model)["data"][0]["embedding"]
stick_embedding         = openai.Embedding.create(input='stick', engine=model)["data"][0]["embedding"]

print(cosine_similarity(automobile_embedding, vehicle_embedding))
print(cosine_similarity(automobile_embedding, dinosaur_embedding))
print(cosine_similarity(automobile_embedding, stick_embedding))

0.9161762609368851
0.8334429695093547
0.7820358471285385
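
The scores above are cosine similarities: the dot product of two vectors divided by the product of their lengths, so vectors pointing in the same direction score close to 1. The sketch below writes out that standard formula in plain Python on tiny made-up vectors. It illustrates the math only and is not the source of the `cosine_similarity` function imported earlier.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine([1.0, 2.0], [2.0, 4.0]))  # parallel, different length
```

Because the measure ignores vector length, only the direction of an embedding matters when comparing two texts this way.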


## Comparing articles from the CNN / Daily Mail dataset
source: https://huggingface.co/datasets/cnn_dailymail


In [None]:
import pandas as pd
cnn_daily_articles = ['BREMEN, Germany -- Carlos Alberto, who scored in FC Porto\'s Champions League final victory against Monaco in 2004, has joined Bundesliga club Werder Bremen for a club record fee of 7.8 million euros ($10.7 million). Carlos Alberto enjoyed success at FC Porto under Jose Mourinho. "I\'m here to win titles with Werder," the 22-year-old said after his first training session with his new club. "I like Bremen and would only have wanted to come here." Carlos Alberto started his career with Fluminense, and helped them to lift the Campeonato Carioca in 2002. In January 2004 he moved on to FC Porto, who were coached by José Mourinho, and the club won the Portuguese title as well as the Champions League. Early in 2005, he moved to Corinthians, where he impressed as they won the Brasileirão,but in 2006 Corinthians had a poor season and Carlos Alberto found himself at odds with manager, Emerson Leão. Their poor relationship came to a climax at a Copa Sul-Americana game against Club Atlético Lanús, and Carlos Alberto declared that he would not play for Corinthians again while Leão remained as manager. Since January this year he has been on loan with his first club Fluminense. Bundesliga champions VfB Stuttgart said on Sunday that they would sign a loan agreement with Real Zaragoza on Monday for Ewerthon, the third top Brazilian player to join the German league in three days. A VfB spokesman said Ewerthon, who played in the Bundesliga for Borussia Dortmund from 2001 to 2005, was expected to join the club for their pre-season training in Austria on Monday. On Friday, Ailton returned to Germany where he was the league\'s top scorer in 2004, signing a one-year deal with Duisburg on a transfer from Red Star Belgrade. E-mail to a friend .',
                        '(CNN) -- Football superstar, celebrity, fashion icon, multimillion-dollar heartthrob. Now, David Beckham is headed for the Hollywood Hills as he takes his game to U.S. Major League Soccer. CNN looks at how Bekham fulfilled his dream of playing for Manchester United, and his time playing for England. The world\'s famous footballer has begun a five-year contract with the Los Angeles Galaxy team, and on Friday Beckham will meet the press and reveal his new shirt number. This week, we take an in depth look at the life and times of Beckham, as CNN\'s very own "Becks," Becky Anderson, sets out to examine what makes the man tick -- as footballer, fashion icon and global phenomenon. It\'s a long way from the streets of east London to the Hollywood Hills and Becky charts Beckham\'s incredible rise to football stardom, a journey that has seen his skills grace the greatest stages in world soccer. She goes in pursuit of the current hottest property on the sports/celebrity circuit in the U.S. and along the way explores exactly what\'s behind the man with the golden boot. CNN will look back at the life of Beckham, the wonderfully talented youngster who fulfilled his dream of playing for Manchester United, his marriage to pop star Victoria, and the trials and tribulations of playing for England. We\'ll look at the highs (scoring against Greece), the lows (being sent off during the World Cup), the Man. U departure for the Galacticos of Madrid -- and now the Home Depot stadium in L.A. We\'ll ask how Beckham and his family will adapt to life in Los Angeles -- the people, the places to see and be seen and the celebrity endorsement. Beckham is no stranger to exposure. He has teamed with Reggie Bush in an Adidas commercial, is the face of Motorola, is the face on a PlayStation game and doesn\'t need fashion tips as he has his own international clothing line. But what does the star couple need to do to become an accepted part of Tinseltown\'s glitterati? 
The road to major league football in the U.S.A. is a well-worn route for some of the world\'s greatest players. We talk to some of the former greats who came before him and examine what impact these overseas stars had on U.S. soccer and look at what is different now. We also get a rare glimpse inside the David Beckham academy in L.A, find out what drives the kids and who are their heroes. The perception that in the U.S.A. soccer is a "game for girls" after the teenage years is changing. More and more young kids are choosing the European game over the traditional U.S. sports. E-mail to a friend .',
                        'LOS ANGELES, California (CNN) -- Youssif, the 5-year-old burned Iraqi boy, rounded the corner at Universal Studios when suddenly the little boy hero met his favorite superhero. Youssif has always been a huge Spider-Man fan. Meeting him was "my favorite thing," he said. Spider-Man was right smack dab in front of him, riding a four-wheeler amid a convoy of other superheroes. The legendary climber of buildings and fighter of evil dismounted, walked over to Youssif and introduced himself. Spidey then gave the boy from a far-away land a gentle hug, embracing him in his iconic blue and red tights. He showed Youssif a few tricks, like how to shoot a web from his wrist. Only this time, no web was spun. "All right Youssif!" Spider-Man said after the boy mimicked his wrist movement. Other superheroes crowded around to get a closer look. Even the Green Goblin stopped his villainous ways to tell the boy hi. Youssif remained unfazed. He didn\'t take a liking to Spider-Man\'s nemesis. Spidey was just too cool. "It was my favorite thing," the boy said later. "I want to see him again." He then felt compelled to add: "I know it\'s not the real Spider-Man." This was the day of dreams when the boy\'s nightmares were, at least temporarily, forgotten. He met SpongeBob, Lassie and a 3-year-old orangutan named Archie. The hairy, brownish-red primate took to the boy, grabbing his hand and holding it. Even when Youssif pulled away, Archie would inch his hand back toward the boy\'s and then snatch it. See Youssif enjoy being a boy again » . The boy giggled inside a play area where sponge-like balls shot out of toy guns. It was a far different artillery than what he was used to seeing in central Baghdad, as recently as a week ago. He squealed with delight and raced around the room collecting as many balls as he could. He rode a tram through the back stages at Universal Studios. At one point, the car shook. 
Fire and smoke filled the air, debris cascaded down and a big rig skidded toward the vehicle. The boy and his family survived the pretend earthquake unscathed. "Even I was scared," the dad said. "Well, I wasn\'t," Youssif replied. The father and mother grinned from ear to ear throughout the day. Youssif pushed his 14-month-old sister, Ayaa, in a stroller. "Did you even need to ask us if we were interested in coming here?" Youssif\'s father said in amazement. "Other than my wedding day, this is the happiest day of my life," he said. Just a day earlier, the mother and father talked about their journey out of Iraq and to the United States. They also discussed that day nine months ago when masked men grabbed their son outside the family home, doused him in gas and set him on fire. His mother heard her boy screaming from inside. The father sought help for his boy across Baghdad, but no one listened. He remembers his son\'s two months of hospitalization. The doctors didn\'t use anesthetics. He could hear his boy\'s piercing screams from the other side of the hospital. Watch Youssif meet his doctor and play with his little sister » . The father knew that speaking to CNN would put his family\'s lives in jeopardy. The possibility of being killed was better than seeing his son suffer, he said. "Anything for Youssif," he said. "We had to do it." They described a life of utter chaos in Baghdad. Neighbors had recently given birth to a baby girl. Shortly afterward, the father was kidnapped and killed. Then, there was the time when some girls wore tanktops and jeans. They were snatched off the street by gunmen. The stories can be even more gruesome. The couple said they had heard reports that a young girl was kidnapped and beheaded --and her killers sewed a dog\'s head on the corpse and delivered it to her family\'s doorstep. "These are just some of the stories," said Youssif\'s mother, Zainab. Under Saddam Hussein, there was more security and stability, they said. 
There was running water and electricity most of the time. But still life was tough under the dictator, like the time when Zainab\'s uncle disappeared and was never heard from again after he read a "religious book," she said. Sitting in the parking lot of a Target in suburban Los Angeles, Youssif\'s father watched as husbands and wives, boyfriends and girlfriends, parents and their children, came and went. Some held hands. Others smiled and laughed. "Iraq finished," he said in what few English words he knows. He elaborated in Arabic: His homeland won\'t be enjoying such freedoms anytime soon. It\'s just not possible. Too much violence. Too many killings. His two children have only seen war. But this week, the family has seen a much different side of America -- an outpouring of generosity and a peaceful nation at home. "It\'s been a dream," the father said. He used to do a lot of volunteer work back in Baghdad. "Maybe that\'s why I\'m being helped now," the father said. At Universal Studios, he looked out across the valley below. The sun glistened off treetops and buildings. It was a picturesque sight fit for a Hollywood movie. "Good America, good America," he said in English. E-mail to a friend . CNN\'s Arwa Damon contributed to this report.'
]

cnn_daily_article_highlights = ['Werder Bremen pay a club record $10.7 million for Carlos Alberto .\nThe Brazilian midfielder won the Champions League with FC Porto in 2004 .\nSince January he has been on loan with his first club, Fluminense .',
                                'Beckham has agreed to a five-year contract with Los Angeles Galaxy .\nNew contract took effect July 1, 2007 .\nFormer English captain to meet press, unveil new shirt number Friday .\nCNN to look at Beckham as footballer, fashion icon and global phenomenon .',
                                'Boy on meeting Spider-Man: "It was my favorite thing"\nYoussif also met SpongeBob, Lassie and an orangutan at Universal Studios .\nDad: "Other than my wedding day, this is the happiest day of my life"' 
]

cnn_df = pd.DataFrame({"articles":cnn_daily_articles, "highligths":cnn_daily_article_highlights})

cnn_df.head()                      

Unnamed: 0,articles,highligths
0,"BREMEN, Germany -- Carlos Alberto, who scored ...",Werder Bremen pay a club record $10.7 million ...
1,"(CNN) -- Football superstar, celebrity, fashio...",Beckham has agreed to a five-year contract wit...
2,"LOS ANGELES, California (CNN) -- Youssif, the ...","Boy on meeting Spider-Man: ""It was my favorite..."


In [None]:
article1_embedding    = openai.Embedding.create(input=cnn_df.articles.iloc[0], engine=model)["data"][0]["embedding"]
article2_embedding    = openai.Embedding.create(input=cnn_df.articles.iloc[1], engine=model)["data"][0]["embedding"]
article3_embedding    = openai.Embedding.create(input=cnn_df.articles.iloc[2], engine=model)["data"][0]["embedding"]

highligth1_embedding  = openai.Embedding.create(input=cnn_df.highligths.iloc[0], engine=model)["data"][0]["embedding"]
highligth2_embedding  = openai.Embedding.create(input=cnn_df.highligths.iloc[1], engine=model)["data"][0]["embedding"]
highligth3_embedding  = openai.Embedding.create(input=cnn_df.highligths.iloc[2], engine=model)["data"][0]["embedding"]

print(cosine_similarity(article1_embedding, article2_embedding))
print(cosine_similarity(article1_embedding, article3_embedding))

print(cosine_similarity(highligth1_embedding, highligth3_embedding))
print(cosine_similarity(article1_embedding, highligth3_embedding))


0.7621254342360414
0.7103234824922888
0.6754791700917779
0.6913856161284869
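
A natural follow-up to the pairwise comparisons above is to rank several candidates against one query, for example to check which highlight sits closest to a given article. The sketch below does this with tiny made-up 3-dimensional vectors standing in for real embeddings. `rank_by_similarity` and the vector values are hypothetical, and the cosine function is repeated so the block runs on its own.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, vector)) for name, vector in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration; real embeddings are much longer.
article_vector = [0.9, 0.1, 0.0]
highlight_vectors = {
    "highlight_a": [0.8, 0.2, 0.1],
    "highlight_b": [0.1, 0.9, 0.2],
    "highlight_c": [0.0, 0.2, 0.9],
}

ranking = rank_by_similarity(article_vector, highlight_vectors)
for name, score in ranking:
    print(name, round(score, 3))
```

With real data, the query and candidate vectors would come from the embedding calls in the cell above.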


# References  
* Azure Reference Documentation
* Azure OpenAI GitHub Repo
* Cookbooks
* OpenAI website

1 - [OpenAI Cookbook](https://github.com/openai/openai-cookbook)  
2 - [Azure Documentation - Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)  
3 - [OpenAI Studio Examples](https://oai.azure.com/portal)  
4 - [[PUBLIC] Best practices for fine-tuning GPT-3 to classify text](https://docs.google.com/document/d/1rqj7dkuvl7Byd5KQPUJRxc19BJt8wo0yHNwK84KfU3Q/edit#)

# For More Help  
[OpenAI Commercialization Team](mailto:AzureOpenAITeam@microsoft.com)  
AI Specialized CSAs [aka.ms/airangers](https://aka.ms/airangers)

# Contributors
* Brandon Cowen
* Ashish Chauhun
* Louis Li  
