### PIB CoPilot
PIBs (public information books) are also used to build a pitchbook by assessing a company's strategy, competitive positioning, financial statements, industry dynamics, and industry trends.

- News releases: Analysts review news that may affect a company's stock price or growth prospects, particularly over a 6-12 month time horizon.
- SEC filings: Public companies are required to file Form 10-K and Form 10-Q with the SEC on an ongoing basis. Form 10-K is the annual report: a financial overview and management commentary covering the last fiscal year. It is available on SEC EDGAR and usually also on the company's website. Form 10-Q is similar to Form 10-K but covers the most recent quarter instead of the full year.
- Equity research reports: Review key forecasts for metrics such as revenue, EBITDA, and EPS for the company and its competitors to form a consensus estimate.
- Investor presentations: Companies provide historical information that serves as the foundation for forecasts and helps identify the key forecasting drivers.
- Press releases: Found in the investor relations section of most companies' websites, they contain the financial statements that also appear in Forms 10-K and 10-Q.
- Conference calls: On the same day a company issues its quarterly press release, it typically also holds a conference call, where analysts often learn details about management guidance. These calls are transcribed by several service providers, and the transcripts are available to subscribers of large financial data providers.

In [1]:
import os  
import json  
import openai
from Utilities.envVars import *

# Search index name, read from the environment by Utilities.envVars
indexName = SearchIndex

# Set OpenAI API key and endpoint (module-level settings of the legacy openai<1.0 SDK)
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
# Validate the service name itself: the f-string below is never empty and always
# contains "openai.azure.com", so asserting on the result cannot catch a missing setting
assert OpenAiService, "ERROR: Azure OpenAI service name is missing"
openAiEndPoint = f"https://{OpenAiService}.openai.azure.com"
openai.api_base = openAiEndPoint
davincimodel = OpenAiDavinci
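Checking the formatted URL cannot catch a missing setting, because `f"https://{OpenAiService}.openai.azure.com"` is never empty and always contains `openai.azure.com`. A minimal sketch of a validating helper follows; `build_azure_openai_endpoint` is a hypothetical name (not part of `Utilities.envVars`), and the rule that resource subdomains use only lowercase letters, digits, and inner hyphens is an assumption.

```python
import re

def build_azure_openai_endpoint(service_name: str) -> str:
    """Return https://<service>.openai.azure.com after validating the service name.

    Hypothetical helper: it checks the input rather than the formatted URL,
    because the formatted URL is never empty.
    """
    if not service_name or not service_name.strip():
        raise ValueError("Azure OpenAI service name is missing")
    name = service_name.strip().lower()
    # Assumption: subdomains use lowercase letters, digits and hyphens, not at either end
    if not re.fullmatch(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", name):
        raise ValueError(f"Invalid Azure OpenAI service name: {service_name!r}")
    return f"https://{name}.openai.azure.com"


print(build_azure_openai_endpoint("my-openai-resource"))
# https://my-openai-resource.openai.azure.com
```

Raising `ValueError` instead of using `assert` also keeps the check active when Python runs with `-O`, which strips assertions.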


In [2]:
import typing
from Utilities.fmp import *
apikey = FmpKey
symbol: str = "AAPL"
symbols: typing.List[str] = ["AAPL", "CSCO", "QQQQ"]
exchange: str = "NYSE"
exchanges: typing.List[str] = ["NYSE", "NASDAQ"]
query: str = "AA"
limit: int = 3
period: str = "quarter"
download: bool = True
filing_type: str = "10-K"

In [3]:
from datetime import datetime
from pytz import timezone
from dateutil.relativedelta import relativedelta
from datetime import timedelta
from Utilities.cogSearch import createEarningCallIndex, indexDocs, createPressReleaseIndex, createStockNewsIndex

# Three-year look-back window ending today (US Central time)
central = timezone('US/Central')
today = datetime.now(central)
currentYear = today.year
historicalDate = today - relativedelta(years=3)
historicalYear = historicalDate.year
historicalDate = historicalDate.strftime("%Y-%m-%d")
totalYears = currentYear - historicalYear
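Because the window is exactly three years, `historicalYear` is always `currentYear - 3` and `totalYears` is always 3. The only subtle date is February 29. Below is a stdlib-only sketch of the same subtraction; `years_back` is a hypothetical helper that stands in for `today - relativedelta(years=3)`, which likewise clamps the day to the end of the month.

```python
from datetime import date

def years_back(d: date, years: int) -> date:
    """Same calendar day `years` years earlier, clamping Feb 29 to Feb 28.

    Stdlib-only stand-in for `d - relativedelta(years=years)`.
    """
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Only Feb 29 can be out of range in the target year
        return d.replace(year=d.year - years, day=28)


as_of = date(2024, 2, 29)
start = years_back(as_of, 3)
print(start.strftime("%Y-%m-%d"))  # 2021-02-28
print(as_of.year - start.year)     # 3
```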

#### Process the SEC filings stored in JSON format

In [31]:
from Utilities.azureBlob import upsertMetadata, getBlob, getAllBlobs

def GetAllFiles():
    # List every blob in the SEC document container (getAllBlobs creates the Blob Storage client)
    blobList = getAllBlobs(OpenAiDocConnStr, SecDocContainer)
    files = []
    for file in blobList:
        # Blobs without metadata, or without an "embedded" key, have not been indexed yet
        metadata = file.metadata or {}
        files.append({
            "filename": file.name,
            "embedded": metadata.get("embedded", "false"),
        })
    print(f"Found {len(files)} files in the container")
    return files
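The "embedded" flag logic can be exercised without Azure. In this minimal sketch, `types.SimpleNamespace` stands in for the blob items returned by `getAllBlobs` (assumed, as the cell does, to expose `.name` and `.metadata`); `embedded_flag` and `pending_files` are hypothetical helper names. Blob metadata values are strings, which is why the flag is compared with `"false"` rather than `False`.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional

def embedded_flag(metadata: Optional[dict]) -> str:
    # Missing metadata (None or {}) or a missing "embedded" key both mean "not embedded yet"
    return (metadata or {}).get("embedded", "false")

def pending_files(blobs: Iterable) -> List[str]:
    # Names of blobs whose flag is still "false", i.e. the ones left to index
    return [b.name for b in blobs if embedded_flag(b.metadata) == "false"]


blobs = [
    SimpleNamespace(name="a.json", metadata=None),
    SimpleNamespace(name="b.json", metadata={"embedded": "true", "indexName": "secfilings"}),
    SimpleNamespace(name="c.json", metadata={"indexType": "cogsearchvs"}),
]
print(pending_files(blobs))  # ['a.json', 'c.json']
```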

In [32]:
filesData = GetAllFiles()
# Keep only files that have not been embedded yet, dropping the "embedded" flag
filesData = [{'filename': x['filename']} for x in filesData if x['embedded'] == "false"]
print(f"Found {len(filesData)} files to embed")

Found 101477 files in the container
Found 53575 files to embed
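The blob names in this container appear to follow the pattern `<cik>/<cik>_<form>_<year>_<accession>.json`, as seen in the indexing log (for example `24545/24545_10K_2000_0000024545-01-500005.json`). A minimal sketch that parses them so files can be filtered by company, form, or year before indexing; `parse_filing_name` and `FilingName` are hypothetical names, and the layout is inferred from the log rather than documented. The accession number follows the SEC format `0000000000-YY-NNNNNN`; its `YY` is the filing year, which can differ from the `<year>` field (which looks like the report year).

```python
import re
from typing import NamedTuple

class FilingName(NamedTuple):
    cik: str
    form: str
    year: int
    accession: str

# Assumed layout, inferred from the log: "<cik>/<cik>_<form>_<year>_<accession>.json"
_PATTERN = re.compile(r"^(\d+)/\1_([0-9A-Z]+)_(\d{4})_(\d{10}-\d{2}-\d{6})\.json$")

def parse_filing_name(blob_name: str) -> FilingName:
    m = _PATTERN.match(blob_name)
    if m is None:
        raise ValueError(f"Unexpected blob name: {blob_name!r}")
    cik, form, year, accession = m.groups()
    return FilingName(cik, form, int(year), accession)


log_names = [
    "24545/24545_10K_2000_0000024545-01-500005.json",
    "24741/24741_10K405_2000_0000912057-01-503245.json",
    "2488/2488_10K_1998_0001012870-99-000892.json",
]
parsed = [parse_filing_name(n) for n in log_names]
print([(p.cik, p.form, p.year) for p in parsed])
# [('24545', '10K', 2000), ('24741', '10K405', 2000), ('2488', '10K', 1998)]
```

With this, `[f for f in filesData if parse_filing_name(f['filename']).year >= historicalYear]` would restrict indexing to the look-back window defined earlier.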


In [33]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import *
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from itertools import islice
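`islice` is imported here but not used in the cells shown, while `chunkAndEmbed` sends one document per `upload_documents` call. Batching uploads cuts round trips. Below is a minimal sketch of a batching helper; `batched` is a hypothetical name, and the batch size of 1000 assumes Azure AI Search's per-request indexing limit of 1,000 documents (there is also a payload size cap).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, in order.

    Python 3.12 ships itertools.batched (which yields tuples); this version
    also runs on older interpreters.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    # islice pulls the next `size` items; an empty list means the input is exhausted
    while batch := list(islice(it, size)):
        yield batch


docs = [{"id": str(i)} for i in range(2500)]
print([len(b) for b in batched(docs, 1000)])  # [1000, 1000, 500]
# With a real SearchClient this would become:
#     for batch in batched(docs, 1000):
#         searchClient.upload_documents(documents=batch)
```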

In [35]:
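def chunkAndEmbed(indexName, secDoc, fullPath):
    # Index one 10-K as a single search document. Despite the name, the text is not
    # chunked, and embedding generation is currently commented out below.
    fullData = []
    # Join Business (1), Risk Factors (1A), MD&A (7) and Market Risk (7A) with spaces
    # so the last word of one section does not fuse with the first word of the next
    text = " ".join([secDoc['item_1'], secDoc['item_1A'], secDoc['item_7'], secDoc['item_7A']])
    text = text.replace("\n", " ")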

    secCommonData = {
            # Search document keys allow only letters, digits, '_', '-' and '=', so map these separators to '_'
            "id": fullPath.translate(str.maketrans(". :/,&", "______")),
            "cik": secDoc['cik'],
            "company": secDoc['company'],
            "filing_type": secDoc['filing_type'],
            "filing_date": secDoc['filing_date'],
            "period_of_report": secDoc['period_of_report'],
            "sic": secDoc['sic'],
            "state_of_inc": secDoc['state_of_inc'],
            "state_location": secDoc['state_location'],
            "fiscal_year_end": secDoc['fiscal_year_end'],
            "filing_html_index": secDoc['filing_html_index'],
            "htm_filing_link": secDoc['htm_filing_link'],
            "complete_text_filing_link": secDoc['complete_text_filing_link'],
            "filename": secDoc['filename'],
            "item_1": secDoc['item_1'],
            "item_1A": secDoc['item_1A'],
            "item_1B": secDoc['item_1B'],
            "item_2": secDoc['item_2'],
            "item_3": secDoc['item_3'],
            "item_4": secDoc['item_4'],
            "item_5": secDoc['item_5'],
            "item_6": secDoc['item_6'],
            "item_7": secDoc['item_7'],
            "item_7A": secDoc['item_7A'],
            "item_8": secDoc['item_8'],
            "item_9": secDoc['item_9'],
            "item_9A": secDoc['item_9A'],
            "item_9B": secDoc['item_9B'],
            "item_10": secDoc['item_10'],
            "item_11": secDoc['item_11'],
            "item_12": secDoc['item_12'],
            "item_13": secDoc['item_13'],
            "item_14": secDoc['item_14'],
            "item_15": secDoc['item_15'],
            "content": text,
            #"contentVector": [],
            "metadata" : json.dumps({"cik": secDoc['cik'], "source": secDoc['filename'], "filingType": secDoc['filing_type'], "reportDate": secDoc['period_of_report']}),
            "sourcefile": fullPath
        }
    # Comment for now on not generating embeddings
    #secCommonData['contentVector'] = generateEmbeddings(embeddingModelType, OpenAiEmbedding, text)
    fullData.append(secCommonData)

    searchClient = SearchClient(endpoint=f"https://{SearchService}.search.windows.net/",
                                index_name=indexName,
                                credential=AzureKeyCredential(SearchKey))
    results = searchClient.upload_documents(documents=fullData)
    succeeded = sum(1 for r in results if r.succeeded)
    if succeeded < len(results):
        print(f"\t{fullPath}: {len(results) - succeeded} of {len(results)} documents failed to index")

    return None
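The document key and the `content` field are the two derived values in `chunkAndEmbed`. Here is a slightly more defensive sketch of both: a regex whitelist for the key, assuming Azure AI Search's key rule (letters, digits, underscore, dash, equal sign), and a join that tolerates missing or empty sections. `make_document_key` and `build_content` are hypothetical helper names. For the filenames in this container, the key matches what the original character-replacement chain produces.

```python
import re
from typing import Iterable, Mapping, Optional

def make_document_key(path: str) -> str:
    # Assumed key rule: only letters, digits, '_', '-' and '=' are allowed; anything else becomes '_'
    return re.sub(r"[^A-Za-z0-9_\-=]", "_", path)

def build_content(sec_doc: Mapping[str, Optional[str]],
                  items: Iterable[str] = ("item_1", "item_1A", "item_7", "item_7A")) -> str:
    # Skip missing or empty sections, join the rest with a space, then flatten newlines
    parts = [sec_doc.get(item) or "" for item in items]
    return " ".join(p for p in parts if p).replace("\n", " ")


key = make_document_key("24545_10K_2000_0000024545-01-500005.json")
print(key)  # 24545_10K_2000_0000024545-01-500005_json
print(build_content({"item_1": "Business.\nOverview", "item_1A": "Risk", "item_7": "", "item_7A": None}))
# Business. Overview Risk
```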

In [36]:
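import asyncio
import functools

def background(f):
    # Run f on the event loop's default thread-pool executor and return an asyncio
    # Future immediately, so each call is fire-and-forget
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        # run_in_executor() forwards only positional arguments, so bind kwargs with partial
        return asyncio.get_event_loop().run_in_executor(None, functools.partial(f, *args, **kwargs))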

    return wrapped
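The fire-and-forget decorator returns futures that the indexing loop never awaits, so results and exceptions are dropped. This self-contained sketch shows the same pattern with the futures collected and awaited. `run_in_background` and `slow_square` are hypothetical names, and `asyncio.run` assumes a plain script: in a Jupyter cell a loop is already running, so you would `await asyncio.gather(...)` directly instead.

```python
import asyncio
import functools
import time

def run_in_background(f):
    """Decorator: calling f(...) schedules it on the default thread pool and returns a Future."""
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        loop = asyncio.get_running_loop()  # must be called while an event loop is running
        # run_in_executor() forwards positional args only, so bind everything with partial
        return loop.run_in_executor(None, functools.partial(f, *args, **kwargs))
    return wrapped

@run_in_background
def slow_square(x, *, delay=0.01):
    time.sleep(delay)  # stands in for blocking I/O such as a blob download
    return x * x

async def main():
    futures = [slow_square(i, delay=0.01) for i in range(5)]
    # Awaiting the futures surfaces results and, with return_exceptions=True, any errors
    return await asyncio.gather(*futures, return_exceptions=True)

results = asyncio.run(main())
print(results)  # [0, 1, 4, 9, 16]
```

`asyncio.gather` returns results in the order the futures were passed, even though the threads may finish in any order.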

In [37]:
@background
def indexDocuments(file):
    fileName = file['filename']
    print(f"Indexing {fileName}")
    # Download the filing JSON from Blob Storage and push it to the search index
    readBytes = getBlob(OpenAiDocConnStr, SecDocContainer, fileName)
    secDoc = json.loads(readBytes.decode("utf-8"))
    #createSearchIndex(indexName)
    chunkAndEmbed(indexName, secDoc, os.path.basename(fileName))
    # Flag the blob as embedded so the GetAllFiles filter skips it on the next run
    metadata = {'embedded': 'true', 'indexType': "cogsearchvs", "indexName": indexName}
    upsertMetadata(OpenAiDocConnStr, SecDocContainer, fileName, metadata)
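Because `indexDocuments` is fire-and-forget, an exception in any file (for example malformed JSON or an indexing error) ends up in a future that nobody inspects. Since `upsertMetadata` only runs after indexing succeeds, a failed file keeps `"embedded": "false"` and is picked up again on the next run. A minimal alternative sketch runs the work on a bounded thread pool and collects failures explicitly. `index_all` and `fake_index` are hypothetical names; in the notebook, `index_one` would be an undecorated version of `indexDocuments`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def index_all(files, index_one, max_workers=8):
    """Call index_one(file) for each file on a bounded thread pool.

    Returns (succeeded, failed): a sorted list of filenames and a dict mapping
    each failed filename to the exception it raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(index_one, f): f["filename"] for f in files}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                fut.result()  # re-raises any exception from the worker thread
                succeeded.append(name)
            except Exception as exc:
                failed[name] = exc  # record it and keep going
    return sorted(succeeded), failed


def fake_index(file):
    # Stand-in for an undecorated indexDocuments: fails on one file
    if file["filename"] == "bad.json":
        raise ValueError("malformed JSON")

ok, bad = index_all([{"filename": "a.json"}, {"filename": "bad.json"}, {"filename": "b.json"}], fake_index)
print(ok)         # ['a.json', 'b.json']
print(list(bad))  # ['bad.json']
```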

In [38]:
res = filesData[:10000]

In [39]:
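indexName = 'secfilings'
# Each call returns a Future immediately; the work runs on the default thread pool,
# which is why the log lines below can interleave
for file in res:
    indexDocuments(file)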

Indexing 24545/24545_10K_2000_0000024545-01-500005.json
Indexing 24545/24545_10K_2015_0000024545-16-000054.json
Indexing 24545/24545_10K_2016_0000024545-17-000005.json
Indexing 24545/24545_10K_2017_0000024545-18-000009.json
Indexing 24545/24545_10K_2018_0000024545-19-000007.json
Indexing 24545/24545_10K_2019_0000024545-20-000005.json
Indexing 24545/24545_10K_2020_0000024545-21-000004.json
Indexing 2457/2457_10K_1994_0000950124-95-000996.json
Indexing 24654/24654_10K_1994_0000024654-94-000007.json
Indexing 24654/24654_10K_1995_0000024654-95-000005.json
Indexing 24741/24741_10K405_2000_0000912057-01-503245.json
Indexing 24741/24741_10K_1994_0000024741-94-000033.json
Indexing 24741/24741_10K_1995_0000024741-95-000026.json
Indexing 24741/24741_10K_1995_0000024741-96-000017.json
Indexing 24741/24741_10K_1996_0000950146-97-000367.json
Indexing 24741/24741_10K_1997_0000950146-98-000390.json
Indexing 24741/24741_10K_1998_0000950146-99-000306.json
Indexing 24741/24741_10K_1999_0000912057-00-010188.json
Indexing 24741/24741_10K_2015_0000024741-16-000077.json
Indexing 24741/24741_10K_2016_0000024741-17-000011.json
Indexing 24741/24741_10K_2017_0000024741-18-000010.json
Indexing 24741/24741_10K_2018_0000024741-19-000016.json
Indexing 24741/24741_10K_2019_0000024741-20-000014.json
Indexing 24741/24741_10K_2020_0001562762-21-000023.json
Indexing 2488/2488_10K_1993_0000891618-94-000068.json
Indexing 2488/2488_10K_1994_0000950131-95-000530.json
Indexing 2488/2488_10K_1995_0000898430-96-000902.json
Indexing 2488/2488_10K_1996_0001012870-97-000533.json
Indexing 2488/2488_10K_1997_0001012870-98-000595.json
Indexing 2488/2488_10K_1998_0001012870-99-000892.json
Inde