**Development of a Structured Database and Text Extraction System for Finance Professional Development Resources**

**Overview:** You are an enterprise experimenting with model-as-a-service APIs to build intelligent applications for knowledge retrieval and Q/A tasks.

In this assignment, we use the Pinecone and OpenAI APIs to:
1. Create knowledge summaries using OpenAI’s GPT models
2. Generate a knowledge base (Q/A) that provides context
3. Use a vector database to find and answer questions
4. Use the knowledge summaries from step 1 to answer questions

We use the dataset that we scraped from the CFA website with BeautifulSoup in Assignment 1.

CFA Website: https://www.cfainstitute.org/en/membership/professional-development/refresher-readings#sort=%40refreadingcurriculumyear%20descending

As part of our assignment, each of the four group members filters a different topic out of the scraped data:

- Akshita Pathania: Market-Based Valuation: Price and Enterprise Value Multiples (2018)
- Osborne Lopes: Residual Income Valuation (2018)
- Smithi Parthiban: Industry and Company Analysis (2018)
- Manimanya Reddy: Free Cash Flow Valuation (2018)

In [1]:
pip install 'pinecone-client[grpc]'


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install openai

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install tiktoken


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install langchain

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [5]:
import openai

from typing import List, Iterator
import pandas as pd
import numpy as np
import os
from ast import literal_eval

# Pinecone's client library for Python
import pinecone

# I've set this to our new embeddings model, this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"

# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings

warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)



In [6]:
article_df = pd.read_csv('/Users/akshitapathania/Desktop/Assign5_CSV.csv')
article_df.head()


| Unnamed: 0 | Title | Year | Level | Introduction Summary | Learning Outcomes | Link to Summary Page | Link to PDF File |
|---|---|---|---|---|---|---|---|
| 0 | Industry and Company Analysis (2018) | 2018\n Curriculum | Level II | This reading explores industry and company ana... | compare top-down, bottom-up, and hybrid approa... | https://www.cfainstitute.org/membership/profes... | No PDF link found |
| 1 | Free Cash Flow Valuation (2018) | 2018\n Curriculum | Level II | Discounted cash flow (DCF) valuation views the... | compare the free cash flow to the firm (FCFF) ... | https://www.cfainstitute.org/membership/profes... | No PDF link found |
| 2 | Residual Income Valuation (2018) | 2018\n Curriculum | Level II | Residual income models of equity value have be... | calculate and interpret residual income, econo... | https://www.cfainstitute.org/membership/profes... | No PDF link found |
| 3 | Market-Based Valuation: Price and Enterprise V... | 2018\n Curriculum | Level II | Among the most familiar and widely used valuat... | distinguish between the method of comparables ... | https://www.cfainstitute.org/membership/profes... | No PDF link found |


In [7]:
data = article_df['Learning Outcomes']

In [8]:
data[0]

'compare top-down, bottom-up, and hybrid approaches for developing inputs to equity valuation models; compare “growth relative to GDP growth” and “market growth and market share” approaches to forecasting revenue; evaluate whether economies of scale are present in an industry by analyzing operating margins and sales levels; forecast the following costs: cost of goods sold, selling general and administrative costs, financing costs, and income taxes; describe approaches to balance sheet modeling; describe the relationship between return on invested capital and competitive advantage; explain how competitive factors affect prices and costs; judge the competitive position of a company based on a Porter’s five forces analysis; explain how to forecast industry and company sales and costs when they are subject to price inflation or deflation; evaluate the effects of technological developments on demand, selling prices, costs, and margins; explain considerations in the choice of an explicit for

In [9]:
data[1]

'compare the free cash flow to the firm (FCFF) and free cash flow\nto equity (FCFE) approaches to valuation; explain the ownership perspective implicit in the FCFE approach; explain the appropriate adjustments to net income, earnings\nbefore interest and taxes (EBIT), earnings before interest, taxes,\ndepreciation, and amortization (EBITDA), and cash flow from\noperations (CFO) to calculate FCFF and FCFE; calculate FCFF and FCFE; describe approaches for forecasting FCFF and FCFE; compare the FCFE model and dividend discount models; explain how dividends, share repurchases, share issues, and\nchanges in leverage may affect future FCFF and FCFE; evaluate the use of net income and EBITDA as proxies for cash\nflow in valuation; explain the single-stage (stable-growth), two-stage, and\nthree-stage FCFF and FCFE models and select and justify the\nappropriate model given a company’s characteristics; estimate a company’s value using the appropriate free cash flow\nmodel(s); explain the use of 
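The two outputs above show that each `Learning Outcomes` entry is a single long string in which the individual statements are separated by semicolons and, in some rows, broken by stray newlines. A minimal sketch of how such a string could be normalised into a list of statements is given below. The helper name `split_outcomes` and the shortened sample string are illustrative assumptions and are not part of the notebook.

```python
def split_outcomes(raw: str) -> list[str]:
    """Split a semicolon-separated outcomes string into clean statements."""
    statements = []
    for part in raw.split(";"):
        # Collapse embedded newlines and repeated spaces into single spaces
        cleaned = " ".join(part.split())
        if cleaned:
            statements.append(cleaned)
    return statements


# Shortened sample modelled on the data[1] output (assumed, not the full row)
sample = (
    "compare the free cash flow to the firm (FCFF) and free cash flow\n"
    "to equity (FCFE) approaches to valuation; calculate FCFF and FCFE; "
)
outcomes = split_outcomes(sample)
print(outcomes)
```

Working with one statement per list item can make it easier to inspect what each chunk will contain before it is embedded.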

In [10]:
import tiktoken

tokenizer = tiktoken.get_encoding('p50k_base')

# create the length function
def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=20,
    length_function=tiktoken_len,
    separators=["\n\n", "\n", ";", " ",""]
)
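To illustrate what the separator list in the cell above is for, here is a simplified sketch of recursive splitting: separators are tried in order, and a piece that is still too long is split again with the next separator. This is not LangChain's implementation. It measures length in characters rather than tokens and ignores `chunk_overlap`, and the function name `recursive_split` is a hypothetical one chosen for this example.

```python
def recursive_split(text, max_len, separators):
    """Split text into pieces of at most max_len characters.

    Separators are tried in order; a piece that is still too long after
    splitting on one separator is split again with the remaining ones.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, remaining = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, max_len, remaining)

    pieces, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate  # keep merging small neighbouring parts
            continue
        if current:
            pieces.append(current)
        if len(part) <= max_len:
            current = part
        else:
            # The part alone is too long: fall back to finer separators
            pieces.extend(recursive_split(part, max_len, remaining))
            current = ""
    if current:
        pieces.append(current)
    return [p for p in pieces if p.strip()]


sample = (
    "describe approaches to balance sheet modeling; "
    "explain how competitive factors affect prices and costs"
)
chunks = recursive_split(sample, 60, ["\n\n", "\n", ";", " ", ""])
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Placing `";"` ahead of `" "` in the list means that, where possible, the text is cut between learning outcome statements rather than in the middle of one.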

In [12]:
from uuid import uuid4
from tqdm.auto import tqdm

# 'Learning Outcomes' is the column of article_df that holds the text to split
chunks = []

for idx, text in tqdm(article_df['Learning Outcomes'].items(), total=article_df['Learning Outcomes'].shape[0]):
    # text_splitter (defined in the previous cell) splits the text into token-bounded chunks
    texts = text_splitter.split_text(text)
    chunks.extend([{
        'id': str(uuid4()),
        'text': chunk_text,
        'chunk': i,
        'title': article_df.at[idx, 'Title']  
    } for i, chunk_text in enumerate(texts)])



100%|██████████| 4/4 [00:00<00:00, 540.99it/s]


### Initialise the embedding model

In [13]:

# initialize openai API key
from openai import OpenAI
api_key = os.getenv("api_key")


embed_model = "text-embedding-3-small"

client = OpenAI(api_key=api_key)
res = client.embeddings.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ], model = embed_model
)

In [14]:
print(res)

CreateEmbeddingResponse(data=[Embedding(embedding=[-0.0006736477953381836, 0.01784864068031311, 0.028474807739257812, -0.016548041254281998, -0.044690780341625214, -0.03376021981239319, 0.024296287447214127, -0.015496493317186832, 0.014804685488343239, -0.00604294054210186, 0.034147631376981735, -0.010307935066521168, 0.004617816768586636, -0.0065825507044792175, 0.06502992659807205, 0.07250145077705383, -0.0044102743268013, 0.013344971463084221, -0.026772959157824516, 0.03685951605439186, 0.0047181290574371815, 0.04671085998415947, -0.021349187940359116, 0.03442435339093208, 0.012708508409559727, -0.021874960511922836, -0.040512263774871826, -0.024891242384910583, 0.05113843083381653, -0.06253942102193832, -0.018512776121497154, -0.041204068809747696, 0.01910773105919361, -0.03038419596850872, -0.040152523666620255, 0.054680485278367996, 0.06558337807655334, 0.004022862296551466, -0.04975481331348419, -0.04817749187350273, -0.00958153698593378, -0.012507883831858635, 0.054846517741680

In [15]:
len(res.data[0].embedding), len(res.data[1].embedding)

(1536, 1536)

### Initialising the index

In [16]:
from pinecone import Pinecone
papi_key = os.getenv("papi_key")
pinecone = Pinecone(api_key=papi_key)

In [17]:
from pinecone import PodSpec

index_name = 'gpt-35-retrival'

# Create an index whose dimension matches the embedding size (1536 here)
pinecone.create_index(name=index_name, dimension=len(res.data[0].embedding), spec=PodSpec(environment="gcp-starter"))
index = pinecone.Index(name=index_name)

# Confirm our index was created
pinecone.list_indexes()


{'indexes': [{'dimension': 1536,
              'host': 'gpt-35-retrival-67tsuqg.svc.gcp-starter.pinecone.io',
              'metric': 'cosine',
              'name': 'gpt-35-retrival',
              'spec': {'pod': {'environment': 'gcp-starter',
                               'pod_type': 'starter',
                               'pods': 1,
                               'replicas': 1,
                               'shards': 1}},
              'status': {'ready': True, 'state': 'Ready'}}]}
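The index description above reports `'metric': 'cosine'`, so query results are ranked by cosine similarity between the query vector and the stored vectors. The sketch below shows that ranking on a tiny in-memory store using only the standard library. It is a brute-force illustration of the metric, not how Pinecone searches internally, and the three-dimensional toy vectors and the helper names `cosine_similarity` and `top_k` are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; the real embeddings have 1536 dimensions
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.1, 0.0], toy_vectors, k=2)
print(matches)
```

Because cosine similarity depends only on direction, scaling a vector does not change its ranking.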

In [18]:
from tqdm.auto import tqdm
import datetime
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once

# Function to create embeddings with retry logic
def create_embeddings_with_retry(texts, model, max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            return client.embeddings.create(input=texts, model=model)
        except Exception as e:
            print(f"Attempt {retries + 1}: Encountered an error - {e}")
            sleep(5)  # wait before retrying
            retries += 1
    raise Exception(f"Failed to create embeddings after {max_retries} retries.")

# Process the data in batches
for i in tqdm(range(0, len(article_df), batch_size)):
    i_end = min(len(article_df), i + batch_size)
    batch = article_df.iloc[i:i_end]

    ids_batch = [str(uuid4()) for _ in range(len(batch))]
    texts = batch['Learning Outcomes'].tolist()

    res = create_embeddings_with_retry(texts, embed_model)

    embeds = [record.embedding for record in res.data]

    # Prepare data for Pinecone upsert
    for idx, (text, embed) in enumerate(zip(texts, embeds)):
        # Derive the namespace from the article title
        title = batch.iloc[idx]['Title']

        # Keep only the ASCII characters of the title
        title_cleaned = ''.join(char for char in title if char.isascii())
        namespace = f"{title_cleaned}_LOS"  # Build the namespace from the cleaned title
        namespace = ''.join(char for char in namespace if char.isalnum())  # Remove non-alphanumeric characters
        
        to_upsert = [(ids_batch[idx], embed, {'text': text, 'chunk': idx, 'url': 'default_url'})]
        index.upsert(vectors=to_upsert, namespace=namespace)


100%|██████████| 1/1 [00:01<00:00,  1.50s/it]
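The retry helper in the cell above waits a fixed 5 seconds between attempts. A common variation is exponential backoff, where the wait doubles after each failure. The sketch below shows that variation in a generic form. It is an alternative under stated assumptions, not the notebook's method: `call_with_backoff`, `flaky`, and the injectable `sleep` parameter are hypothetical names introduced so the example can run without network access or real delays.

```python
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing waits."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise RuntimeError(f"Failed after {max_retries} attempts") from exc
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)


# Hypothetical flaky call that fails twice and then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []  # record the requested delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

In the notebook, `func` could be a small lambda wrapping `client.embeddings.create(input=texts, model=model)`.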


In [19]:
from openai import OpenAI


# Initialize OpenAI API key
api_key = os.getenv("api_key")

query = " create a technical note that summarizes the key Learning outcome statement (LOS). Note: Be sure to include tables, figures and equations as you see fit"
embed_model = "text-embedding-3-small"  # or another model suitable for your task

# Generate the embedding for the query
client = OpenAI(api_key=api_key)
res = client.embeddings.create(
    model=embed_model,
    input=query
)

# Retrieve the embedding from the response
embedding_vector = res.data[0].embedding



Now we ask the question

In [20]:
from pinecone import Pinecone
papi_key = os.getenv("papi_key")
pinecone = Pinecone(api_key=papi_key)

### Generating a technical note for the LOS of Free Cash Flow Valuation 2018

In [21]:
from IPython.display import Markdown

primer = """You are a Q&A bot, designed to provide intelligent answers based on the information provided. If the information is not available, you truthfully say, "I don't know"."""

# Initialize the list of augmented queries
augmented_queries_FreeCashFlowValuation2018LOS = []

# Perform the query using the embedding vector
query_result_FreeCashFlowValuation2018LOS = index.query(vector=embedding_vector, top_k=5, namespace="FreeCashFlowValuation2018LOS", include_metadata=True)

# Get list of retrieved text from the query result
contexts_FreeCashFlowValuation2018LOS = [item['metadata']['text'] for item in query_result_FreeCashFlowValuation2018LOS['matches']]

# Send each retrieved context to the chat model and collect its response
for idx, context in enumerate(contexts_FreeCashFlowValuation2018LOS):
    for attempt in range(3):  # Try up to three times to get a satisfactory response
        messages = [{"role": "system", "content": primer}, {"role": "user", "content": context}]
        res = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

        # If a satisfactory response is obtained, use it
        if "too extensive" not in res.choices[0].message.content.lower():
            augmented_queries_FreeCashFlowValuation2018LOS.append(res.choices[0].message.content)
            break


# Display the augmented queries for "FreeCashFlowValuation2018LOS"
for idx, query in enumerate(augmented_queries_FreeCashFlowValuation2018LOS):
    display(Markdown(f"Query {idx + 1} for FreeCashFlowValuation2018LOS:\n\n{query}"))
    print("=" * 50)


Query 1 for FreeCashFlowValuation2018LOS:

I can provide a general overview of the concepts you mentioned:

1. **Comparison of FCFF and FCFE approaches**: 
   - FCFF represents the cash flow available to all the firm's investors (both debt and equity holders) whereas FCFE represents the cash flow available only to equity investors after meeting all obligations.
   - Ownership perspective in FCFE approach: FCFE considers that equity holders have a claim on the company's free cash flow after all other stakeholders have been paid.

2. **Adjustments to calculate FCFF and FCFE**:
   - Add back interest expense (net of tax) to net income for FCFF.
   - Subtract net borrowing for FCFF.
   - Subtract interest expense (net of tax) from net income for FCFE.
   - Add net borrowing for FCFE.
   
3. **Forecasting FCFF and FCFE**:
   - Use historical financial data and industry analysis to forecast future cash flows and make assumptions about growth rates and economic conditions.
   
4. **Comparing FCFE model and Dividend Discount Models**:
   - FCFE model considers all cash flows available to equity holders, including dividends and share repurchases, while dividend discount models focus only on dividends.

5. **Impact of dividends, share repurchases, share issues, and changes in leverage on FCFF and FCFE**:
   - Dividends and share repurchases decrease FCFE.
   - Share issues increase FCFE and FCFF.
   - Changes in leverage affect the cost of capital, influencing FCFF and FCFE.

6. **Use of net income and EBITDA as proxies for cash flow**:
   - Net income and EBITDA are not accurate measures of cash flow due to non-cash items and accounting conventions.

7. **Single-stage, two-stage, and three-stage FCFF and FCFE models**:
   - Single-stage assumes constant growth, two-stage incorporates a high growth phase followed by a stable phase, and three-stage includes multiple growth phases.

8. **Estimating a company's value using free cash flow models**:
   - Calculate FCFF/FCFE based on the model chosen and discount it to present value using the appropriate discount rate.

9. **Sensitivity analysis in FCFF and FCFE valuations**:
   - Conduct sensitivity analysis to see how changes in key assumptions impact the valuation.

10. **Calculating terminal value in a multistage valuation model**:
    - Use perpetuity growth method or exit multiple method to calculate terminal value.

11. **Evaluating stock valuation**:
    - Compare the calculated value from the FCFF/FCFE model with the current market price to determine if a stock is overvalued, fairly valued, or undervalued.

For specific calculations and detailed numerical examples, you may need to provide more context or specific data related to a company's financials.
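The same retrieve-then-generate loop is repeated for each of the four namespaces in this notebook. A minimal sketch of how the generation part could be factored into one reusable function is shown below. The function name `summarise_contexts` and the `generate` callable are hypothetical: in the notebook, `generate` would wrap `client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)` and return `res.choices[0].message.content`. Here a fake generator stands in so the example runs offline.

```python
def summarise_contexts(contexts, generate, primer, max_attempts=3,
                       reject_phrase="too extensive"):
    """Return one response per context, retrying when a reply contains reject_phrase.

    As in the notebook cells, a context whose replies are rejected on every
    attempt is skipped silently.
    """
    responses = []
    for context in contexts:
        for _ in range(max_attempts):
            messages = [
                {"role": "system", "content": primer},
                {"role": "user", "content": context},
            ]
            reply = generate(messages)
            if reject_phrase not in reply.lower():
                responses.append(reply)
                break
    return responses


# Fake generator (assumption): refuses once, then answers
attempts = {"count": 0}


def fake_generate(messages):
    attempts["count"] += 1
    if attempts["count"] == 1:
        return "That list is too extensive to cover."
    return "Summary of: " + messages[1]["content"]


primer = "You are a Q&A bot."
responses = summarise_contexts(["calculate FCFF and FCFE"], fake_generate, primer)
print(responses)
```

With a helper of this shape, each topic would need only its namespace and its retrieved contexts, instead of a full copy of the cell.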



### Generating technical summary for Industry and Company Analysis 2018

In [22]:
from IPython.display import Markdown

primer = """You are a Q&A bot, designed to provide intelligent answers based on the information provided. If the information is not available, you truthfully say, "I don't know"."""

# Initialize the list of augmented queries
augmented_queries_IndustryandCompanyAnalysis2018LOS = []

# Perform the query using the embedding vector
query_result_IndustryandCompanyAnalysis2018LOS = index.query(vector=embedding_vector, top_k=5, namespace="IndustryandCompanyAnalysis2018LOS", include_metadata=True)

# Get list of retrieved text from the query result
contexts_IndustryandCompanyAnalysis2018LOS = [item['metadata']['text'] for item in query_result_IndustryandCompanyAnalysis2018LOS['matches']]

# Send each retrieved context to the chat model and collect its response
for idx, context in enumerate(contexts_IndustryandCompanyAnalysis2018LOS):
    for attempt in range(3):  # Try up to three times to get a satisfactory response
        messages = [{"role": "system", "content": primer}, {"role": "user", "content": context}]
        res = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

        # If a satisfactory response is obtained, use it
        if "too extensive" not in res.choices[0].message.content.lower():
            augmented_queries_IndustryandCompanyAnalysis2018LOS.append(res.choices[0].message.content)
            break

# Display the augmented queries for "IndustryandCompanyAnalysis2018LOS"
for idx, query in enumerate(augmented_queries_IndustryandCompanyAnalysis2018LOS):
    display(Markdown(f"Query {idx + 1} for IndustryandCompanyAnalysis2018LOS:\n\n{query}"))
    print("=" * 50)


Query 1 for IndustryandCompanyAnalysis2018LOS:

I'm sorry, but that is a very broad and extensive list of topics to cover in just one response. Each of these questions requires in-depth explanations and analyses. Let's break down the topics into manageable sections. 

1. **Approaches to developing inputs to equity valuation models**:
   - Top-Down Approach: Begins with the analysis of the overall economy and market conditions then narrows down to specific companies within that market.
   - Bottom-Up Approach: Focuses on analyzing individual companies first to determine their value and then aggregates this information to assess the overall market.
   - Hybrid Approach: Combines elements of both top-down and bottom-up approaches for a comprehensive analysis.

2. **Approaches to forecasting revenue**:
   - "Growth relative to GDP growth": Compares a company's revenue growth rate to the GDP growth rate, considering the company's exposure to the overall economy.
   - "Market growth and market share": Considers the growth potential of the specific market the company operates in and its ability to capture market share.

3. **Economies of scale in an industry**:
   - Operating margins and sales levels can indicate economies of scale. If operating margins increase as sales levels grow, it suggests economies of scale are present.

Feel free to ask about a specific topic you'd like to delve into further, and I'd be happy to provide more detailed explanations and analyses.



### Generating technical summary for Market Based Valuation Price and Enterprise Value Multiples 2018

In [23]:
from IPython.display import Markdown

primer = """You are a Q&A bot, designed to provide intelligent answers based on the information provided. If the information is not available, you truthfully say, "I don't know"."""

# Initialize the list of augmented queries
augmented_queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS = []

# Perform the query using the embedding vector
query_result_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS = index.query(vector=embedding_vector, top_k=5, namespace="MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS", include_metadata=True)

# Get list of retrieved text from the query result
contexts_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS = [item['metadata']['text'] for item in query_result_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS['matches']]

# Send each retrieved context to the chat model and collect its response
for idx, context in enumerate(contexts_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS):
    for attempt in range(3):  # Try up to three times to get a satisfactory response
        messages = [{"role": "system", "content": primer}, {"role": "user", "content": context}]
        res = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

        # If a satisfactory response is obtained, use it
        if "too extensive" not in res.choices[0].message.content.lower():
            augmented_queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS.append(res.choices[0].message.content)
            break

# Display the augmented queries for "MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS"
for idx, query in enumerate(augmented_queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS):
    display(Markdown(f"Query {idx + 1} for MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS:\n\n{query}"))
    print("=" * 50)


Query 1 for MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS:

I can provide individual answers to the questions you've listed about valuation methodologies, price multiples, and fundamental analysis. Let's start with the first question:

### Distinguishing Between the Method of Comparables and the Method Based on Forecasted Fundamentals

**Method of Comparables:**
- **Approach:** This method involves comparing the target company with similar publicly traded companies in the same industry or sector.
- **Economic Rationale:** It leverages the market efficiency assumption that similar companies trading at different multiples should converge toward a more appropriate valuation. It helps in understanding how the market values comparable firms and applies that knowledge to the target company.
  
**Method Based on Forecasted Fundamentals:**
- **Approach:** This method focuses on deriving price multiples based on the fundamental characteristics and projected financial performance of the target company.
- **Economic Rationale:** It incorporates future expectations into the valuation process, providing a forward-looking view that is not solely based on historical or current market prices. It allows for a more customized and potentially accurate valuation of the company based on its unique fundamentals.

### Calculating a Justified Price Multiple:

To calculate a justified price multiple, you typically forecast the fundamental drivers of the multiple (e.g., earnings, sales, cash flow), and then divide the current stock price by the forecasted fundamental to arrive at the justified multiple. 

For example, if you forecast that the company's earnings per share (EPS) will be $5 and the stock price is $50, the justified P/E multiple would be 50 / 5 = 10x. 

Interpreting a justified multiple involves comparing it to industry averages, historical levels, and understanding whether the forecasted fundamentals support the calculated multiple.

If you'd like to proceed with the next question or need clarification on anything, feel free to ask!



### Generating technical summary for Residual Income Valuation 2018 

In [26]:
from IPython.display import Markdown


primer = """You are a Q&A bot, designed to provide intelligent answers based on the information provided. If the information is not available, you truthfully say, "I don't know"."""

# Initialize the list of augmented queries
augmented_queries_ResidualIncomeValuation2018LOS = []

# Perform the query using the embedding vector
query_result_ResidualIncomeValuation2018LOS = index.query(vector=embedding_vector, top_k=5, namespace="ResidualIncomeValuation2018LOS", include_metadata=True)

# Get list of retrieved text from the query result
contexts_ResidualIncomeValuation2018LOS = [item['metadata']['text'] for item in query_result_ResidualIncomeValuation2018LOS['matches']]

# Send each retrieved context to the chat model and collect its response
for idx, context in enumerate(contexts_ResidualIncomeValuation2018LOS):
    for attempt in range(3):  # Try up to three times to get a satisfactory response
        messages = [{"role": "system", "content": primer}, {"role": "user", "content": context}]
        res = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

        # If a satisfactory response is obtained, use it
        if "too extensive" not in res.choices[0].message.content.lower():
            augmented_queries_ResidualIncomeValuation2018LOS.append(res.choices[0].message.content)
            break
 

# Display the augmented queries for "ResidualIncomeValuation2018LOS"
for idx, query in enumerate(augmented_queries_ResidualIncomeValuation2018LOS):
    display(Markdown(f"Query {idx + 1} for ResidualIncomeValuation2018LOS:\n\n{query}"))
    print("=" * 50)
 


Query 1 for ResidualIncomeValuation2018LOS:

I can provide an overview and cover the main points of the topics you mentioned. Let's break it down into sections:

1. **Residual Income (RI)**: 
Residual income is calculated by subtracting the equity charge from net income, where the equity charge is the cost of equity capital multiplied by the average book value of equity. It represents the economic profits of a company after accounting for the cost of equity capital. It helps assess a company's performance in excess of the required return on equity.

2. **Economic Value Added (EVA)**:
Economic Value Added is similar to residual income but uses the concept of economic profit, which subtracts the cost of all capital, including debt and equity, from net operating profit after tax (NOPAT). EVA is used to evaluate how effectively a company utilizes its capital to generate returns.

3. **Market Value Added (MVA)**:
Market Value Added is the difference between the current market value of a company and the total capital invested in it. It indicates whether a company has created value for its shareholders based on market expectations.

4. **Uses of Residual Income Models**:
Residual income models are used to value a company's common stock by estimating the future cash flows generated above the cost of equity capital. These models provide insights into a company's value beyond traditional valuation methods and account for the cost of equity explicitly.

5. **Determinants of Residual Income**:
Fundamental determinants of residual income include the expected future earnings of a company, the cost of equity capital, and the book value of equity. Changes in these factors will impact the residual income generated.

6. **Relation to Justified Price-to-Book Ratio**:
Residual income valuation is related to the justified price-to-book ratio as it reflects the company's ability to generate excess returns. A higher forecasted residual income would justify a higher price-to-book ratio assuming the company's fundamentals support the growth prospects.

7. **Continuing Residual Income**:
Continuing residual income represents the residual income expected beyond the forecast horizon. It relies on assumptions about the company's future prospects and industry conditions to estimate the ongoing profitability.

8. **Comparison to Other Valuation Models**:
Residual income models differ from dividend discount and free cash flow models by explicitly considering the cost of equity capital. They focus on accounting for the true economic profit generated by a company and can provide a unique perspective on valuation.

9. **Strengths and Weaknesses**:
Strengths of residual income models include their focus on economic profit, ability to incorporate future expectations, and consideration of the cost of equity. Weaknesses may include sensitivity to assumptions and potential complexity in implementation.

10. **Accounting Issues**:
Applying residual income models may involve challenges such as determining the appropriate cost of equity, forecasting future earnings accurately, and adjusting for accounting distortions that could impact the calculations.

11. **Evaluating Stock Valuation**:
A company's stock can be evaluated as overvalued, undervalued, or fairly valued using a residual income model by comparing the calculated intrinsic value with the current market price. A stock may be undervalued if its intrinsic value exceeds the market price and vice versa.

I hope this overview helps! If you have any specific questions or need further details on any of these topics, feel free to ask.



In [27]:
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 4e-05,
 'namespaces': {'FreeCashFlowValuation2018LOS': {'vector_count': 1},
                'IndustryandCompanyAnalysis2018LOS': {'vector_count': 1},
                'MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS': {'vector_count': 1},
                'ResidualIncomeValuation2018LOS': {'vector_count': 1}},
 'total_vector_count': 4}
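The namespace names in the stats above come from the title-cleaning steps in the upsert loop: non-ASCII characters are dropped, `_LOS` is appended, and every non-alphanumeric character is then removed. The sketch below repeats those steps as a standalone helper so the mapping from title to namespace can be checked directly. The function name `title_to_namespace` is a hypothetical one chosen for this example.

```python
def title_to_namespace(title: str, suffix: str = "_LOS") -> str:
    """Build a namespace from a title using the same cleaning steps as the upsert loop."""
    # Step 1: keep only ASCII characters of the title
    ascii_only = "".join(ch for ch in title if ch.isascii())
    # Step 2: append the suffix, then drop everything that is not a letter or digit
    return "".join(ch for ch in ascii_only + suffix if ch.isalnum())


titles = [
    "Free Cash Flow Valuation (2018)",
    "Industry and Company Analysis (2018)",
    "Market-Based Valuation: Price and Enterprise Value Multiples (2018)",
    "Residual Income Valuation (2018)",
]
namespaces = [title_to_namespace(t) for t in titles]
for name in namespaces:
    print(name)
```

Note that the underscore in `_LOS` is itself removed by the second step, which is why the namespaces end in `2018LOS`.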

### Saving the query output of Free Cash Flow Valuation 2018 LOS to Pinecone

In [28]:
import uuid


# Function to create embeddings with retry logic (same definition as in cell 18; uses `client` and `sleep` from earlier cells)
def create_embeddings_with_retry(texts, model, max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            return client.embeddings.create(input=texts, model=model)
        except Exception as e:
            print(f"Attempt {retries + 1}: Encountered an error - {e}")
            sleep(5)  # wait before retrying
            retries += 1
    raise Exception(f"Failed to create embeddings after {max_retries} retries.")

# Generate embeddings for the augmented queries
embed_model = "text-embedding-3-small"  # Use the appropriate model for your embeddings
augmented_queries_embeddings = create_embeddings_with_retry(augmented_queries_FreeCashFlowValuation2018LOS, embed_model)

# Prepare data for Pinecone upsert
upsert_data = []
for idx, query in enumerate(augmented_queries_FreeCashFlowValuation2018LOS):
    # Generate a unique ID for each query
    query_id = str(uuid.uuid4())
    
    # Extract embedding for the query
    embed = augmented_queries_embeddings.data[idx].embedding
    
    # Prepare the upsert data with metadata if needed
    upsert_data.append((query_id, embed, {'text': query}))

# Perform the upsert operation to Pinecone
namespace = "Gen_Queries_FreeCashFlowValuation2018LOS"  
index.upsert(vectors=upsert_data, namespace=namespace)

print(f"Upserted {len(upsert_data)} queries into Pinecone index under namespace '{namespace}'.")

Upserted 1 queries into Pinecone index under namespace 'Gen_Queries_FreeCashFlowValuation2018LOS'.
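The cell above pairs each generated text with its embedding and a fresh UUID before upserting, and the following cells repeat the same steps for the other topics. A minimal sketch of that pairing step as a reusable helper is shown below. The name `build_upsert_records` and the toy three-dimensional embeddings are assumptions for illustration (the real vectors have 1536 dimensions), and no Pinecone call is made here.

```python
from uuid import uuid4


def build_upsert_records(texts, embeddings):
    """Pair each text with its embedding as (id, vector, metadata) tuples."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    return [
        (str(uuid4()), embedding, {"text": text})
        for text, embedding in zip(texts, embeddings)
    ]


# Toy inputs (assumed); in the notebook these would be the generated notes
# and the vectors returned by create_embeddings_with_retry
toy_texts = ["first generated note", "second generated note"]
toy_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
records = build_upsert_records(toy_texts, toy_embeddings)
print(len(records), records[0][2])
```

The resulting list has the same shape as `upsert_data` in the cell above, so it could be passed to `index.upsert(vectors=records, namespace=namespace)`.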


### Saving the query output of Residual Income Valuation 2018 LOS to Pinecone

In [29]:
import uuid


# Function to create embeddings with retry logic (same definition as in cell 18; uses `client` and `sleep` from earlier cells)
def create_embeddings_with_retry(texts, model, max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            return client.embeddings.create(input=texts, model=model)
        except Exception as e:
            print(f"Attempt {retries + 1}: Encountered an error - {e}")
            sleep(5)  # wait before retrying
            retries += 1
    raise Exception(f"Failed to create embeddings after {max_retries} retries.")

# Generate embeddings for the augmented queries
embed_model = "text-embedding-3-small"  
augmented_queries_embeddings = create_embeddings_with_retry(augmented_queries_ResidualIncomeValuation2018LOS, embed_model)

# Prepare data for Pinecone upsert
upsert_data = []
for idx, query in enumerate(augmented_queries_ResidualIncomeValuation2018LOS):
    # Generate a unique ID for each query
    query_id = str(uuid.uuid4())
    
    # Extract embedding for the query
    embed = augmented_queries_embeddings.data[idx].embedding
    
    # Prepare the upsert data with metadata if needed
    upsert_data.append((query_id, embed, {'text': query}))

# Perform the upsert operation to Pinecone
namespace = "Gen_Queries_ResidualIncomeValuation2018LOS"  
index.upsert(vectors=upsert_data, namespace=namespace)

print(f"Upserted {len(upsert_data)} queries into Pinecone index under namespace '{namespace}'.")

Upserted 1 queries into Pinecone index under namespace 'Gen_Queries_ResidualIncomeValuation2018LOS'.


### SAVING THE QUERY OUTPUT OF Market Based Valuation Price and Enterprise Value Multiples 2018 LOS TO PINECONE

In [30]:
import uuid
from time import sleep  # sleep() is used by the retry loop below


# Retry wrapper for the embeddings call; assumes `client` (the OpenAI client) was created in an earlier cell
def create_embeddings_with_retry(texts, model, max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            return client.embeddings.create(input=texts, model=model)
        except Exception as e:
            print(f"Attempt {retries + 1}: Encountered an error - {e}")
            sleep(5)  # wait before retrying
            retries += 1
    raise Exception(f"Failed to create embeddings after {max_retries} retries.")

# Generate embeddings for the augmented queries
embed_model = "text-embedding-3-small" 
augmented_queries_embeddings = create_embeddings_with_retry(augmented_queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS, embed_model)

# Prepare data for Pinecone upsert
upsert_data = []
for idx, query in enumerate(augmented_queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS):
    # Generate a unique ID for each query
    query_id = str(uuid.uuid4())
    
    # Extract embedding for the query
    embed = augmented_queries_embeddings.data[idx].embedding
    
    # Prepare the upsert data with metadata if needed
    upsert_data.append((query_id, embed, {'text': query}))

# Perform the upsert operation to Pinecone
namespace = "Gen_Queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS"  
index.upsert(vectors=upsert_data, namespace=namespace)

print(f"Upserted {len(upsert_data)} queries into Pinecone index under namespace '{namespace}'.")

Upserted 1 queries into Pinecone index under namespace 'Gen_Queries_MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS'.


### SAVING THE QUERY OUTPUT OF Industry and Company Analysis 2018 LOS TO PINECONE

In [31]:
import uuid
from time import sleep  # sleep() is used by the retry loop below


# Retry wrapper for the embeddings call; assumes `client` (the OpenAI client) was created in an earlier cell
def create_embeddings_with_retry(texts, model, max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            return client.embeddings.create(input=texts, model=model)
        except Exception as e:
            print(f"Attempt {retries + 1}: Encountered an error - {e}")
            sleep(5)  # wait before retrying
            retries += 1
    raise Exception(f"Failed to create embeddings after {max_retries} retries.")

# Generate embeddings for the augmented queries
embed_model = "text-embedding-3-small"  
augmented_queries_embeddings = create_embeddings_with_retry(augmented_queries_IndustryandCompanyAnalysis2018LOS, embed_model)

# Prepare data for Pinecone upsert
upsert_data = []
for idx, query in enumerate(augmented_queries_IndustryandCompanyAnalysis2018LOS):
    # Generate a unique ID for each query
    query_id = str(uuid.uuid4())
    
    # Extract embedding for the query
    embed = augmented_queries_embeddings.data[idx].embedding
    
    # Prepare the upsert data with metadata if needed
    upsert_data.append((query_id, embed, {'text': query}))

# Perform the upsert operation to Pinecone
namespace = "Gen_Queries_IndustryandCompanyAnalysis2018LOS"  
index.upsert(vectors=upsert_data, namespace=namespace)

print(f"Upserted {len(upsert_data)} queries into Pinecone index under namespace '{namespace}'.")

Upserted 1 queries into Pinecone index under namespace 'Gen_Queries_IndustryandCompanyAnalysis2018LOS'.
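
The four cells above repeat the same steps for each topic: embed the queries, pair each one with a fresh UUID and its text as metadata, and upsert into a `Gen_Queries_...` namespace. A minimal sketch of how the pairing step could be factored out is shown below. The helper names `build_upsert_records` and `chunked`, and the batch size of 100, are illustrative assumptions rather than anything defined by Pinecone.

```python
import uuid


def build_upsert_records(queries, embeddings):
    """Pair each query with its embedding as an (id, vector, metadata) tuple."""
    if len(queries) != len(embeddings):
        raise ValueError("queries and embeddings must have the same length")
    return [
        (str(uuid.uuid4()), vector, {"text": query})
        for query, vector in zip(queries, embeddings)
    ]


def chunked(records, batch_size=100):
    """Yield successive slices of records, each at most batch_size long."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Hypothetical usage with the notebook's own objects (client, index, embed_model):
# topics = {
#     "FreeCashFlowValuation2018LOS": augmented_queries_FreeCashFlowValuation2018LOS,
#     "ResidualIncomeValuation2018LOS": augmented_queries_ResidualIncomeValuation2018LOS,
# }
# for topic, queries in topics.items():
#     response = create_embeddings_with_retry(queries, embed_model)
#     records = build_upsert_records(queries, [d.embedding for d in response.data])
#     for batch in chunked(records):
#         index.upsert(vectors=batch, namespace=f"Gen_Queries_{topic}")
```

Keeping the record-building step free of API calls makes it easy to check on its own, and the length check guards against an embeddings response that does not line up with the input list.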


### Retrieval Augmented Generation
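
In retrieval augmented generation, the text retrieved from the vector database is placed in the prompt alongside the user's question so the model can ground its answer in it. The sketch below shows one way such a prompt might be assembled from a list of retrieved contexts. The function name `build_rag_prompt`, the character budget, and the prompt wording are all illustrative assumptions, not the prompt used elsewhere in this notebook.

```python
def build_rag_prompt(question, contexts, max_chars=6000):
    """Join retrieved contexts under a character budget and append the question."""
    kept, used = [], 0
    for context in contexts:
        # Stop adding contexts once the (assumed) character budget would be exceeded
        if used + len(context) > max_chars:
            break
        kept.append(context)
        used += len(context)
    joined = "\n\n---\n\n".join(kept)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

A character budget is only a rough stand-in for a token limit; a tokenizer such as `tiktoken` could be used for a tighter bound.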


In [32]:
# One list of retrieved contexts per namespace.
# Note: despite its name, augmented_queries holds retrieved text, not queries.
augmented_queries = []

# Namespaces to search, one per topic
namespaces = [
    "FreeCashFlowValuation2018LOS",
    "IndustryandCompanyAnalysis2018LOS",
    "MarketBasedValuationPriceandEnterpriseValueMultiples2018LOS",
    "ResidualIncomeValuation2018LOS",
]

# Query each namespace with the same embedding vector (assumed to be defined in an earlier cell)
for namespace_name in namespaces:
    # Retrieve the 5 nearest matches along with their metadata
    query_result = index.query(vector=embedding_vector, top_k=5, namespace=namespace_name, include_metadata=True)

    # Extract the stored text from each match
    contexts = [item['metadata']['text'] for item in query_result['matches']]

    # Keep the contexts for this namespace together
    augmented_queries.append(contexts)

# Print the retrieved contexts, grouped by namespace
for idx, contexts in enumerate(augmented_queries):
    print(f"Context {idx + 1} :")
    for context in contexts:
        print(context)
    print("\n" + "=" * 50 + "\n")


Context 1 :
compare the free cash flow to the firm (FCFF) and free cash flow
to equity (FCFE) approaches to valuation; explain the ownership perspective implicit in the FCFE approach; explain the appropriate adjustments to net income, earnings
before interest and taxes (EBIT), earnings before interest, taxes,
depreciation, and amortization (EBITDA), and cash flow from
operations (CFO) to calculate FCFF and FCFE; calculate FCFF and FCFE; describe approaches for forecasting FCFF and FCFE; compare the FCFE model and dividend discount models; explain how dividends, share repurchases, share issues, and
changes in leverage may affect future FCFF and FCFE; evaluate the use of net income and EBITDA as proxies for cash
flow in valuation; explain the single-stage (stable-growth), two-stage, and
three-stage FCFF and FCFE models and select and justify the
appropriate model given a company’s characteristics; estimate a company’s value using the appropriate free cash flow
model(s); explain the use o