<a href="https://colab.research.google.com/github/aianytyme/gpt3-for-capital-markets/blob/main/GPT_3_For_Capital_Markets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tuning GPT-3 For Capital Markets**

References:

1.   [Fine-tuning GPT3 model for github repo data](https://www.youtube.com/watch?v=Cf8m3Bflfuc)
2.   [Analyzing SEC filings with Transformers for fun and profit](https://www.youtube.com/watch?v=SU1L6f0N6iw&list=PLcSRBAICQBoo0k7FR6MPAJbSWGx0WPyOI)
3.   [What is the SageMaker JumpStart Industry Python SDK](https://sagemaker-jumpstart-industry-pack.readthedocs.io/en/latest/what-is.html)


# Trial Playground

In [None]:
import numpy as np

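a = np.array([1, 2, 3])                 # 1-D array with 3 elements
b = np.array([(1, 2, 3), (4, 5, 6)])    # 2-D array: 2 rows, 3 columns
c = np.array([[1], [2], [3]])           # 2-D array: 3 rows, 1 column

# Print the number of dimensions and the shape of each array.
print('a: dim:{}, shape:{}'.format(a.ndim, a.shape))
print('b: dim:{}, shape:{}'.format(b.ndim, b.shape))
print('c: dim:{}, shape:{}'.format(c.ndim, c.shape))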

a: dim:1, shape:(3,)
b: dim:2, shape:(2, 3)
c: dim:2, shape:(3, 1)


# Setup

**Install Libraries**

In [24]:
!pip install PyGithub
!pip install python-dotenv
!pip install --upgrade jsonlines
!pip install --upgrade openai
!pip install wandb

Collecting wandb
  Downloading wandb-0.12.15-py2.py3-none-any.whl (1.8 MB)
Collecting sentry-sdk>=1.0.0
  Downloading sentry_sdk-1.5.10-py2.py3-none-any.whl (144 kB)
Collecting shortuuid>=0.5.0
  Downloading shortuuid-1.0.8-py3-none-any.whl (9.5 kB)
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting setproctitle
  Downloading setproctitle-1.2.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29 kB)
Collecting GitPython>=1.0.0
  Downloading GitPython-3.1.27-py3-none-any.whl (181 kB)
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)
Collecting smm

**Mount Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive/')
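# Base folder on Drive. In later `!` shell commands, `$PATH` expands to this
# Python variable rather than to the shell's own PATH.
PATH = r'/content/drive/MyDrive/Projects/GPT-3/'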
print("Path:{}".format(PATH))

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
Path:/content/drive/MyDrive/Projects/GPT-3/


**Load Environment Variables**

In [9]:
from os import environ
from dotenv import load_dotenv
load_dotenv(PATH+'.env')
print("Env File to load:{}".format(PATH+'.env'))


Env File to load:/content/drive/MyDrive/Projects/GPT-3/.env
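
`load_dotenv` does not complain when a variable is absent, so a quick check before going further can save a confusing failure later. The following is a minimal sketch, not part of the original notebook. `OPENAI_API_KEY` is the variable the `openai` package reads; `GITHUB_TOKEN` is an assumed name for the GitHub token and should be changed to match the `.env` file. The helper `missing_env_vars` is hypothetical.

```python
import os

# OPENAI_API_KEY is read by the openai package; GITHUB_TOKEN is an assumed
# name for the GitHub token and must match whatever the .env file defines.
REQUIRED_VARS = ("OPENAI_API_KEY", "GITHUB_TOKEN")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: {}".format(", ".join(missing)))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.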


# Import Libraries

In [10]:
from github import Github
import os
import jsonlines

# Get data from Github Repositories

In [19]:
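# Read the GitHub token from the environment instead of hard-coding it.
# GITHUB_TOKEN is an assumed variable name; it must match the entry in the .env file.
g = Github(os.getenv("GITHUB_TOKEN"))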
repos = g.get_organization("aianytyme").get_repos()

In [20]:
jsons = []
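for repo in repos:
  try:
    print("repo.full_name:{}".format(repo.full_name))
    file_content = repo.get_contents("README.md")
    # Use the repository name as the prompt and its README as the completion.
    jsons.append({"prompt": repo.full_name, "completion": file_content.decoded_content.decode()})
  except Exception as e:
    # Skip repositories without a readable README.md, but report why.
    print("Skipping {}: {}".format(repo.full_name, e))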

print(len(jsons))
for j in jsons:
  print(j)

repo.full_name:aianytyme/openai-quickstart-python
repo.full_name:aianytyme/DS-Tutorials-Notebooks
repo.full_name:aianytyme/gpt3-for-capital-markets
repo.full_name:aianytyme/gpt-3
4
{'prompt': 'aianytyme/openai-quickstart-python', 'completion': '# OpenAI API Quickstart - Python example app\n\nThis is an example pet name generator app used in the OpenAI API [quickstart tutorial](https://beta.openai.com/docs/quickstart). It uses the [Flask](https://flask.palletsprojects.com/en/2.0.x/) web framework. Check out the tutorial or follow the instructions below to get set up.\n\n## Setup\n\n1. If you don’t have Python installed, [install it from here](https://www.python.org/downloads/)\n\n2. Clone this repository\n\n3. Navigate into the project directory\n\n   ```bash\n   $ cd openai-quickstart-python\n   ```\n\n4. Create a new virtual environment\n\n   ```bash\n   $ python -m venv venv\n   $ . venv/bin/activate\n   ```\n\n5. Install the requirements\n\n   ```bash\n   $ pip install -r requiremen
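
The records above use the bare repository name as the prompt and the raw README as the completion. The legacy fine-tuning guidance suggested ending every prompt with a fixed separator and every completion with a stop sequence, so the model can tell where one ends and the other begins. The sketch below shows one way to do that. The separator, the stop sequence, the helper `format_record`, and the sample README text are all assumptions for illustration and do not come from the notebook.

```python
# Assumed conventions, not taken from the notebook: a fixed separator that
# ends every prompt and a stop sequence that ends every completion.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"


def format_record(prompt, completion):
    """Return a prompt/completion pair with separator and stop sequence added."""
    return {
        "prompt": prompt.strip() + PROMPT_SEPARATOR,
        # The completion starts with a single space and ends with the stop sequence.
        "completion": " " + completion.strip() + STOP_SEQUENCE,
    }


# Made-up README text, used only to show the resulting shape.
example = format_record("aianytyme/gpt-3", "# gpt-3\nExample README text.")
print(example)
```

If this convention is used for training, the same separator should be appended to prompts at inference time, and the stop sequence passed as the stop parameter.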

**Write data in JSONL file**

In [21]:
with jsonlines.open(PATH+'test.jsonl','w') as writer:
  writer.write_all(jsons)
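
Before uploading, it can help to read the file back and confirm that every line is a JSON object with exactly the `prompt` and `completion` keys. This is a minimal sketch using only the standard library. The helper `validate_jsonl` is hypothetical, and the demonstration runs on a throwaway file with made-up records rather than on `test.jsonl`.

```python
import json
import os
import tempfile


def validate_jsonl(path):
    """Return the number of valid records; raise ValueError on a bad line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)  # JSONDecodeError is a ValueError
            if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
                raise ValueError("line {}: unexpected keys".format(line_number))
            count += 1
    return count


# Demonstration on a temporary file with made-up records.
sample = [
    {"prompt": "org/repo-a", "completion": "README A"},
    {"prompt": "org/repo-b", "completion": "README B"},
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jsonl")
    with open(sample_path, "w", encoding="utf-8") as handle:
        for record in sample:
            handle.write(json.dumps(record) + "\n")
    n_records = validate_jsonl(sample_path)

print(n_records)
```

In the notebook, the same function could be called as `validate_jsonl(PATH+'test.jsonl')`.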

# Tune the OpenAI model

**Create a new model using the JSONL file**

In [27]:
!echo $PATH
!openai api fine_tunes.create -t $PATH/test.jsonl -m ada --suffix "gpt3-for-capital-markets"

/content/drive/MyDrive/Projects/GPT-3/
Upload progress: 100% 4.98k/4.98k [00:00<00:00, 6.51Mit/s]
Uploaded file from /content/drive/MyDrive/Projects/GPT-3//test.jsonl: file-m3zRE9CdRm3a28tafQHFwz76
Created fine-tune: ft-ZLvJ8IWObAfNEpNIWHmPAaRx
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-04-23 00:55:10] Created fine-tune: ft-ZLvJ8IWObAfNEpNIWHmPAaRx
[2022-04-23 00:55:33] Fine-tune costs $0.00
[2022-04-23 00:55:34] Fine-tune enqueued. Queue number: 0
[2022-04-23 00:55:36] Fine-tune started
[2022-04-23 00:55:51] Completed epoch 1/4
[2022-04-23 00:55:52] Completed epoch 2/4
[2022-04-23 00:55:53] Completed epoch 3/4
[2022-04-23 00:55:54] Completed epoch 4/4
[2022-04-23 00:56:11] Uploaded model: ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-23-00-56-09
[2022-04-23 00:56:14] Uploaded result file: file-KM3Ny1tArd2TWF1sRxdsGb72
[2022-04-23 00:56:14] Fine-tune succeeded

Job complete! Status: succeeded 🎉
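
The fine-tuned model name appears in the `Uploaded model:` event and is needed for the test step. Rather than copying it by hand, it can be pulled out of the captured log text. The sketch below assumes the log has been saved as a string; the helper `extract_model_name` is hypothetical, and the sample lines are copied from the output above.

```python
import re

# Matches the event line that announces the fine-tuned model name.
MODEL_LINE = re.compile(r"Uploaded model:\s*(\S+)")


def extract_model_name(log_text):
    """Return the fine-tuned model name from the event log, or None if absent."""
    match = MODEL_LINE.search(log_text)
    return match.group(1) if match else None


# Lines copied from the fine-tune output above.
log = """[2022-04-23 00:55:54] Completed epoch 4/4
[2022-04-23 00:56:11] Uploaded model: ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-23-00-56-09
[2022-04-23 00:56:14] Fine-tune succeeded"""

model_name = extract_model_name(log)
print(model_name)
```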

**Test the new model**

In [30]:
!openai api completions.create -m ada:ft-the-orange-pencil:gpt3-for-capital-markets-2022-04-23-00-56-09 -p "How do I tune GPT-3 model for capital markets?"

How do I tune GPT-3 model for capital markets?

If you are experiencing a card not working:

Press the NO