# **Financial Docu Summarizer**

## Tutorial

**Video Tutorials**
1. Setup: [Tutorial_API_Key.mp4](https://csciitd-my.sharepoint.com/:v:/g/personal/ph1230116_iitd_ac_in/EWYUHVBmZ9ZNvo2f9S_6BokBkjJwuZhjdxAeXEIMPPer_A?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=D96hbu)
2. File Upload: [Tutorial_File_Upload.mp4](https://csciitd-my.sharepoint.com/:v:/g/personal/ph1230116_iitd_ac_in/EX_DeHmvDl9KtpEC-DzMbZoB057CR4trmJh9LmQ0nNA2Ew?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=Zf7Z8d)
3. Usage: [Tutorial_Usage.mp4](https://csciitd-my.sharepoint.com/:v:/g/personal/ph1230116_iitd_ac_in/EbRXjNUpMRlAs5qwgt5DHeMBT-Ssu8SbeiRjlNg3rqOxzA?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=6Rx0lF)

---

**Setup**
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey/) and click the button labelled "Create API key"
2. Select any option in the "Search Google Cloud Projects" box; the API key will then appear in a table on the page
3. Copy the API key (from the API key column); you may now close the AI Studio tab
4. Click the key icon in the left sidebar (labelled "Secrets"), then click "+ Add New Secret"
5. Name it `GOOGLE_API_KEY` (left textbox) and paste your API key into the Value (right) textbox
6. Make sure to check the "Notebook Access" radio button (left of the Name field)

These steps are needed only once. After the initial setup, you can skip them on later runs.

---

**File Upload**
1. Click the file folder in the left sidebar (below the key/secrets icon)
2. Click the upload file button (left-most icon in the menu bar of the newly opened "Files" tab) (If you have a URL, save the PDF to your device first)
3. Select & upload the file from the file explorer pop-up menu
4. After the file uploads (may take several seconds), right click it in the Files tab and rename it to `report.pdf` (if you don't see the PDF, click the refresh button, right of the Upload File button)

---

**Usage**
1. In the menu bar, under the Runtime menu, click "Run All" (or just press Ctrl+F9)
2. Scroll to the bottom; the response text will be displayed under the 2nd-to-last cell (may take up to a minute to generate)
3. Ask further questions in the input textbox visible at the end of the last cell
4. End the conversation by inputting `QUIT` in the chat textbox

---


## Back-End Code
_collapse this section or scroll past it_

In [None]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 232.6/232.6 kB 4.4 MB/s eta 0:00:00
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [None]:
# Imports

# PDF pre-processing
from PyPDF2 import PdfReader as reader
from tqdm   import tqdm

# Gen AI API
import google.generativeai as genai

# Colab stuff
from google.colab    import userdata
from IPython.display import display, Markdown

In [None]:
# API key extraction
api_key=userdata.get("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

In [None]:
# Document reader
def read_docu(pdf_path: str = 'report.pdf', give_token_estm: bool = True):
    '''Converts PDF to python string
    pdf_path : path to the PDF to be scanned
    give_token_estm : whether to print an estimate for the token count'''

    with open(pdf_path, 'rb') as pdf_file:
        reader_obj = reader(pdf_file)
        pages_obj  = reader_obj.pages

        docu_text  = ''

        for page in pages_obj:
            docu_text += page.extract_text().strip()
            docu_text += '\n'

    if give_token_estm:
        word_count_estm = docu_text.count(' ') + docu_text.count('\n') + 10
        print(f"Input token count estimate : {round(1.48*word_count_estm)}\n")

    return docu_text
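
The token estimate printed by `read_docu` treats every space and newline as a word boundary and multiplies the resulting word count by 1.48. The sketch below isolates that heuristic in a helper so it can be tried on any string. The name `estimate_tokens` and its parameters are hypothetical; the ratio and the `+ 10` padding are taken from the cell above. It is only a rough guide: in the sample run later in this notebook the estimate was 38703, while the API reported 36773 prompt tokens.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.48, padding: int = 10) -> int:
    """Rough token estimate: approximate word count times a fixed ratio."""
    # Spaces and newlines stand in for word boundaries, as in read_docu.
    word_count_estm = text.count(' ') + text.count('\n') + padding
    return round(tokens_per_word * word_count_estm)


sample = "Revenue grew 12% year over year.\nEBITDA margin held steady."
print(f"Input token count estimate : {estimate_tokens(sample)}")
```

Because the count depends only on whitespace, PDFs whose extracted text has unusual spacing may push the estimate noticeably off.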

In [None]:
# API Caller/Summarizer
def summarize(input_text: str, ai_model: genai.GenerativeModel, give_tokens_used: bool = True):
    '''Summarizes input_text by calling ai_model'''

    summary = ai_model.generate_content(input_text)

    if give_tokens_used:
        print(f"Tokens consumed : {summary.usage_metadata.prompt_token_count}")
        print(f"Summary length  : {len(summary.text.split())} words\n")

    return summary.text

In [None]:
# Model Config
model_sys_prompt = '''\
You shall be given the transcript of a company's financial document. The document can be a third-party report, an annual company filing, or a questionnaire.
Start with a title to your summary, followed by one line about the nature of the document.

Your primary task is to summarize it, providing details about the following topics:
1. Market Performance - Any key financial figures (e.g. EBITDA, revenue, CAGR, etc.)
2. Key Charts/Tables - The important tables/graphics. Do not copy-paste the data, simply mention the heading & give a sentence about it.
3. Company Info - Background info about the history or nature of the company. Do not give excessive technical details here.
4. Business Model - A quick summary of the company's business model.
5. Strategic Initiatives - Major moves that the company has made recently, or has planned.

Omit sections when details pertaining to them are unavailable in the document.
Also note that the first and last few pages might be largely irrelevant.

Your secondary task is to add an "Analysis" section to your report with the following topics:
1. Risk Analysis - Point out the risks with the company's strategic initiatives.
2. Competitive Analysis - Point out how you think the company would fare in the current market (give a high level overview of the market if info unavailable).

DO NOT make up information.'''

model_config = genai.GenerationConfig(temperature=0.2, max_output_tokens=5000, response_mime_type='text/plain')

model_name  = 'gemini-1.5-pro'

## Main

This section consists of 2 cells:
- The first generates a summary of your uploaded document
- The second allows you to further question the AI
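
The summary cell below also switches from `gemini-1.5-pro` to `gemini-1.5-flash` when the token estimate reaches 32000. A minimal sketch of that rule as a standalone function follows. The name `pick_model` and its parameters are hypothetical; the threshold and model names are the ones used in this notebook.

```python
def pick_model(token_estimate: int,
               threshold: int = 32000,
               default: str = 'gemini-1.5-pro',
               fallback: str = 'gemini-1.5-flash') -> str:
    """Return the fallback model name once the estimate reaches the threshold."""
    return fallback if token_estimate >= threshold else default


# Try the rule on a small and a large estimate.
for estimate in (12000, 38703):
    print(estimate, '->', pick_model(estimate))
```

Keeping the rule in one function makes it easy to adjust the threshold without touching the rest of the cell.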

In [None]:
# Summary generation

# Pre-processing
docu_text = read_docu('report.pdf')
token_count_estm = round(1.48*(docu_text.count(' ') + docu_text.count('\n') + 10))

if token_count_estm >= 32000:
    model_name = 'gemini-1.5-flash'


# Model Setup
model = genai.GenerativeModel(  model_name=f'models/{model_name}',
                                generation_config=model_config,
                                system_instruction=model_sys_prompt )


# Execution
summary = summarize(docu_text, model)

display(Markdown(summary))

Input token count estimate : 38703

Tokens consumed : 36773
Summary length  : 922 words



## Generative AI: Too Much Spend, Too Little Benefit?

This is a Goldman Sachs Global Investment Research report exploring the economic impact of generative AI and the potential for returns on the massive investments being made in the technology. 

**Market Performance:**

* **AI Capex:** Tech giants and other companies are expected to spend over $1 trillion on AI capex in the coming years.
* **Nvidia Stock:** The stock of Nvidia, the company currently reaping the most benefits from AI, has sharply corrected.
* **Productivity & GDP Growth:** Estimates for the impact of AI on productivity and GDP growth vary widely. Goldman Sachs estimates a 9% increase in productivity and 6.1% increase in GDP over the next decade, while MIT's Daron Acemoglu forecasts a much smaller impact of 0.5% and 0.9% respectively.

**Key Charts/Tables:**

* **AI Adoption Rates:** Charts showing the share of US firms using AI by sector, highlighting the higher adoption rates in technology industries and other digitally-enabled fields.
* **AI-Related Investment:** Charts showing the log index of AI-related investment in software and hardware, indicating that these investments are not yet visible in national accounts data.
* **US Power Demand Growth:** Charts showing the projected growth in US power demand, with data centers accounting for a significant portion of the increase.
* **EU-27 Power Demand:** Charts showing the projected growth in EU-27 power demand, with AI data centers and electrification expected to boost demand by over 40%.
* **TSMC's CoWoS Capacity:** Chart showing the projected growth in TSMC's CoWoS capacity, indicating that the packaging bottleneck is expected to ease in the coming years.
* **Utilities Sector PEG Ratio:** Chart showing the Utilities sector PEG ratio, which is well below its historical average.

**Company Info:**

* **Goldman Sachs:** A global investment banking, investment management, and brokerage firm.
* **Nvidia:** A leading manufacturer of graphics processing units (GPUs) that are essential for AI systems.
* **Microsoft:** A tech giant that is investing heavily in AI and cloud computing.
* **Amazon:** A tech giant that is investing heavily in AI and cloud computing.
* **Alphabet:** The parent company of Google, which is investing heavily in AI.
* **Salesforce:** A cloud-based software company that is investing heavily in AI.
* **ServiceNow:** A digital workflow software company that has reported significant efficiency gains from AI technology.
* **Cloverleaf Infrastructure:** A company that develops strategies to help utilities unlock new grid capacity.
* **AEP:** One of the largest US electric utility companies.
* **EirGrid:** A state-owned power operator in Ireland.
* **Talen Energy:** A company that operates unregulated nuclear plants.
* **Meta:** A tech giant that is investing heavily in AI.
* **TSMC:** A leading semiconductor manufacturer.
* **AMD:** A leading manufacturer of CPUs and GPUs.

**Business Model:**

* **Goldman Sachs:** Provides investment banking, investment management, and brokerage services.
* **Nvidia:** Manufactures GPUs that are essential for AI systems.
* **Microsoft, Amazon, Alphabet:** Provide cloud computing services and are investing heavily in AI.
* **Salesforce, ServiceNow:** Provide cloud-based software solutions and are incorporating AI into their products.
* **Cloverleaf Infrastructure:** Develops strategies to help utilities unlock new grid capacity.
* **AEP:** Provides electricity to customers in the US.
* **EirGrid:** Operates the electricity grid in Ireland.
* **Talen Energy:** Operates unregulated nuclear plants.
* **Meta:** Operates social media platforms and is investing heavily in AI.
* **TSMC:** Manufactures semiconductors.
* **AMD:** Manufactures CPUs and GPUs.

**Strategic Initiatives:**

* **AI Investment:** Tech giants and other companies are investing heavily in AI infrastructure, including data centers, chips, and other AI-related technologies.
* **Power Grid Expansion:** Utilities are investing in expanding the power grid to meet the growing demand from AI and other sources.
* **Renewables Growth:** Europe is expected to add nearly 800 gigawatts of wind and solar power over the next 10-15 years.
* **Electrification Compounders:** Utilities that focus on power grids and renewables are expected to benefit from the growing demand for power.

**Analysis:**

**Risk Analysis:**

* **AI Hype:** The current enthusiasm around AI could lead to overinvestment in the technology, with some projects failing to deliver on their promises.
* **Power Crunch:** The US power grid is not prepared for the coming surge in demand from AI and other sources, which could lead to a painful power crunch.
* **Chip Shortages:** Shortages of key components, such as HBM and CoWoS, could constrain AI growth in the coming years.
* **Valuation Risk:** The valuations of AI beneficiaries are already high, and any slowdown in economic growth or disappointment in AI's performance could lead to a decline in stock prices.

**Competitive Analysis:**

* **AI Arms Race:** Tech giants are engaged in an AI arms race, investing heavily in the technology to gain a competitive advantage.
* **Data Center Growth:** The expansion of data centers is expected to drive significant growth in power demand, particularly in regions with cheap and abundant baseload power.
* **Electrification:** The electrification of transportation, buildings, and other sectors is expected to further increase power demand.
* **Utilities:** Utilities are well-positioned to benefit from the growing demand for power, but they face challenges in securing funding for grid expansion and dealing with regulatory constraints.

**Overall, the report highlights the significant potential of generative AI, but also points out the risks associated with the massive investments being made in the technology. The report suggests that AI's ability to deliver on its promises will depend on a number of factors, including the development of killer applications, the ability to overcome chip shortages and power constraints, and the willingness of investors to continue supporting the AI arms race.** 


In [None]:
# Further Prompting

chat_sys_prompt = (
    'In a previous conversation, you were given a big financial document and were asked to produce a summary.'
    ' You came up with this:\n'
    f'{summary}'
    '\nYou will now be asked follow up questions about this summary.'
    '\nDo not use more than 1 emoji per message.'
)

chat_model = genai.GenerativeModel( model_name='models/gemini-1.5-pro',
                                    generation_config=model_config,
                                    system_instruction=chat_sys_prompt  )

chat = chat_model.start_chat()


display(Markdown("Type `QUIT` to end the conversation\n"))
display(Markdown("## Cross-Questioning Regarding The Generated Summary"))

print()
msg = input('Message: ')

while msg.upper() != 'QUIT':

    print("\nAI Response:")

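    try:
        response = chat.send_message(msg)
    except Exception as err:
        # A failed call leaves no fresh response, so nothing is displayed in this branch
        print('The Gemini API free quota is 2 API calls/min; kindly wait a bit before sending messages.')
        print('You may try resending your previous message.')
        print(f'(Error details: {err})')
    else:
        display(Markdown(response.text))

    msg = input('\nMessage: ')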

print("Conversation terminated. Re-run cell to initiate new conversation.")

Type `QUIT` to end the conversation


## Cross-Questioning Regarding The Generated Summary


Message: quit
Conversation terminated. Re-run cell to initiate new conversation.
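
The chat loop above asks the user to wait and resend when an API call fails. One possible alternative is to retry automatically with an increasing delay. The sketch below is an assumption, not part of the notebook: `send_with_retry`, its parameters, and the 30-second starting delay (chosen to fit the 2 calls/min quota mentioned in the error message) are all hypothetical. The demo uses a stand-in sender, so it runs without an API key.

```python
import time


def send_with_retry(send, msg, retries=3, base_delay=30.0, sleep=time.sleep):
    """Call send(msg); on failure wait and retry, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, retries + 1):
        try:
            return send(msg)
        except Exception as err:
            if attempt == retries:
                raise  # out of attempts: let the caller handle the error
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
            delay *= 2


# Demo: a stand-in sender that fails twice, with sleeping disabled.
calls = {'n': 0}


def flaky_send(msg):
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('quota exceeded')
    return f'echo: {msg}'


print(send_with_retry(flaky_send, 'hello', sleep=lambda s: None))
```

In the notebook, the call would look like `response = send_with_retry(chat.send_message, msg)`, with the default `sleep` left in place so the waits actually happen.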
