<a href="https://colab.research.google.com/github/aianytyme/gpt3-for-capital-markets/blob/main/gpt3_simple_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tuning GPT-3 For Capital Markets**

References:

1.   [Fine-tuning GPT3 model for github repo data](https://www.youtube.com/watch?v=Cf8m3Bflfuc)
2.   [Analyzing SEC filings with Transformers for fun and profit](https://www.youtube.com/watch?v=SU1L6f0N6iw&list=PLcSRBAICQBoo0k7FR6MPAJbSWGx0WPyOI)





# Trial Playground

In [None]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1,2,3), (4,5,6)])
c = np.array([[1], [2], [3]])

print('a: dim:{}, shape:{}'.format(a.ndim, a.shape))
print('b: dim:{}, shape:{}'.format(b.ndim, b.shape))
print('c: dim:{}, shape:{}'.format(c.ndim, c.shape))

a: dim:1, shape:(3,)
b: dim:2, shape:(2, 3)
c: dim:2, shape:(3, 1)


# Setup

**Install Libraries**

In [1]:
!pip install PyGithub
!pip install python-dotenv
!pip install --upgrade jsonlines
!pip install --upgrade openai
!pip install wandb

Collecting PyGithub
  Downloading PyGithub-1.55-py3-none-any.whl (291 kB)
Collecting deprecated
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting pynacl>=1.4.0
  Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)
Collecting pyjwt>=2.0
  Downloading PyJWT-2.3.0-py3-none-any.whl (16 kB)
Installing collected packages: pynacl, pyjwt, deprecated, PyGithub
Successfully installed PyGithub-1.55 deprecated-1.2.13 pyjwt-2.3.0 pynacl-1.5.0
Collecting python-dotenv
  Downloading python_dotenv-0.20.0-py3-none-any.whl (17 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-0.20.0
Collecting jsonlines
  Downloading jsonlines-3.0.0-py3-none-any.whl (8.5 kB)
Installing collected packages: jsonlines
Successfully installed jsonlines-3.0.0
Collecting openai
  Down

**Mount Drive**

In [2]:
from google.colab import drive
drive.mount('/content/drive/')
PATH = r'/content/drive/MyDrive/Projects/GPT-3/'
print("Path:{}".format(PATH))

Mounted at /content/drive/
Path:/content/drive/MyDrive/Projects/GPT-3/


**Load Environment Variables**

**Please note: this Jupyter notebook expects the environment variables to be in a `.env` file in Google Drive.**
* Location: /content/drive/MyDrive/Projects/GPT-3/
* Contents:
<p>FLASK_APP=app</p>
<p>FLASK_ENV=development</p>
<p>OPENAI_API_KEY=[YOUR OPENAI API KEY]</p>
<p>GITHUB_ACCESS_TOKEN=[AIANYTYME GITHUB ACCESS TOKEN]</p>








In [3]:
from os import environ
from dotenv import load_dotenv
load_dotenv(PATH+'.env')
print("Env File to load:{}".format(PATH+'.env'))



Env File to load:/content/drive/MyDrive/Projects/GPT-3/.env
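
`load_dotenv` does not raise an error when the file or a key is missing, so a quick check can catch a misconfigured `.env` before any API call is made. The sketch below is one way to do that. `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, not part of the original notebook, and the list assumes that only the OpenAI and GitHub entries from the `.env` description are needed here.

```python
import os

# Assumption: the FLASK_* entries in the .env file are not needed by this notebook,
# so only the two credentials are checked.
REQUIRED_VARS = ["OPENAI_API_KEY", "GITHUB_ACCESS_TOKEN"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    # Report only the variable names, never their values.
    print("Missing environment variables: {}".format(", ".join(missing)))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.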


# Import Libraries

In [4]:
from github import Github
import os
import jsonlines

# Get data from Github Repositories

In [5]:
# Read the token from the environment. Do not print it: notebook output is saved with the file.
g = Github(os.getenv("GITHUB_ACCESS_TOKEN"))
repos = g.get_organization("aianytyme").get_repos()


In [7]:
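# Build one training example per repository: repo name as prompt, README text as completion.
jsons = []
for repo in repos:
  try:
    print("repo.full_name:{}".format(repo.full_name))
    file_content = repo.get_contents("README.md")
    jsons.append({"prompt": repo.full_name, "completion": file_content.decoded_content.decode()})
  except Exception as e:
    # Skip repositories whose README.md cannot be fetched, but report them instead of failing silently.
    print("Skipping {}: {}".format(repo.full_name, e))

print(len(jsons))
for j in jsons:
  print(j)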

repo.full_name:aianytyme/openai-quickstart-python
repo.full_name:aianytyme/DS-Tutorials-Notebooks
repo.full_name:aianytyme/gpt3-for-capital-markets
repo.full_name:aianytyme/gpt-3
repo.full_name:aianytyme/sagemaker-huggingface-inference-toolkit
repo.full_name:aianytyme/NLCA_Question_Generator
6
{'prompt': 'aianytyme/openai-quickstart-python', 'completion': '# OpenAI API Quickstart - Python example app\n\nThis is an example pet name generator app used in the OpenAI API [quickstart tutorial](https://beta.openai.com/docs/quickstart). It uses the [Flask](https://flask.palletsprojects.com/en/2.0.x/) web framework. Check out the tutorial or follow the instructions below to get set up.\n\n## Setup\n\n1. If you don’t have Python installed, [install it from here](https://www.python.org/downloads/)\n\n2. Clone this repository\n\n3. Navigate into the project directory\n\n   ```bash\n   $ cd openai-quickstart-python\n   ```\n\n4. Create a new virtual environment\n\n   ```bash\n   $ python -m venv v

**Write data in JSONL file**

In [8]:
with jsonlines.open(PATH+'data/test.jsonl','w') as writer:
  writer.write_all(jsons)
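
The cell above writes the repo names and README texts unchanged. OpenAI's legacy fine-tuning guide recommended ending every prompt with a fixed separator, starting every completion with a space, and ending it with a fixed stop sequence. The sketch below shows one possible way to apply that convention before writing. The separator and stop strings, the helper names `format_record` and `to_jsonl`, and the sample record are all illustrative assumptions, not values taken from this notebook.

```python
import json

# Assumed markers; any fixed strings that do not occur in the data would do.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"

def format_record(record):
    """Return a copy of a prompt/completion pair with separator, leading space, and stop sequence."""
    prompt = record["prompt"].strip() + PROMPT_SEPARATOR
    completion = " " + record["completion"].strip() + STOP_SEQUENCE
    return {"prompt": prompt, "completion": completion}

def to_jsonl(records):
    """Serialize formatted records as JSON Lines text, one JSON object per line."""
    return "\n".join(json.dumps(format_record(r)) for r in records)

# Hypothetical record in the same shape as the entries of `jsons`.
sample = [{"prompt": "aianytyme/gpt-3", "completion": "# gpt-3\nExample README text.\n"}]
print(to_jsonl(sample))
```

If this convention is used for training, prompts sent to the fine-tuned model would need to end with the same separator, and the stop sequence would be passed as the stop parameter at completion time.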

# Tune the OpenAI model

**Create a new model using the JSONL file**

In [9]:
!echo $PATH
!openai api fine_tunes.create -t $PATH/data/test.jsonl -m ada --suffix "gpt3-for-capital-markets"

/content/drive/MyDrive/Projects/GPT-3/
Upload progress: 100% 20.3k/20.3k [00:00<00:00, 21.5Mit/s]
Uploaded file from /content/drive/MyDrive/Projects/GPT-3//data/test.jsonl: file-zs2dFKBcEArAiFCUm6MHrCAU
Created fine-tune: ft-eeVEzN2ToyLLh4iUYrYTlArf
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-05-02 17:33:02] Created fine-tune: ft-eeVEzN2ToyLLh4iUYrYTlArf
[2022-05-02 17:33:09] Fine-tune costs $0.01
[2022-05-02 17:33:09] Fine-tune enqueued. Queue number: 0
[2022-05-02 17:33:12] Fine-tune started
[2022-05-02 17:33:27] Completed epoch 1/4
[2022-05-02 17:33:28] Completed epoch 2/4
[2022-05-02 17:33:29] Completed epoch 3/4
[2022-05-02 17:33:30] Completed epoch 4/4
[2022-05-02 17:33:47] Uploaded model: ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-05-02-17-33-45
[2022-05-02 17:33:50] Uploaded result file: file-d3WC3LTCIfJnaD1TddbR5289
[2022-05-02 17:33:50] Fine-tune succeeded

Job complete! Status: succeede

**Test the new model**

In [10]:
!openai api completions.create -m ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-29-13-36-35 -p "The Hugging Face Inference Toolkit allows"

The Hugging Face Inference Toolkit allows you to apply machine learning to facial expressions and ask labelling questions on them.

**Delete the model(s)**

In [17]:
!openai api models.delete -i ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-29-13-36-35
#!openai api models.delete -i ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-25-02-54-08
#!openai api models.delete -i ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-25-11-47-03
#!openai api models.delete -i ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-26-02-33-18
#!openai api models.delete -i ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-05-02-17-33-45


{
  "deleted": true,
  "id": "ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-29-13-36-35",
  "object": "model"
}
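
Instead of commenting lines in and out, the delete commands can be generated from a list. This is a minimal sketch, assuming the model names below (copied from the commented-out lines in the cell above) are the ones to remove. `delete_command` is a hypothetical helper; it only builds the command string and does not call the API.

```python
import shlex

# Model names copied from the commented-out lines in the delete cell.
MODELS_TO_DELETE = [
    "ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-25-02-54-08",
    "ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-25-11-47-03",
    "ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-26-02-33-18",
    "ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-05-02-17-33-45",
]

def delete_command(model_id):
    """Build the CLI command used in this notebook to delete one fine-tuned model."""
    return "openai api models.delete -i {}".format(shlex.quote(model_id))

for model_id in MODELS_TO_DELETE:
    print(delete_command(model_id))
```

Each printed line can then be run in a notebook cell with a leading `!`.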
