# Generate Insights Based on Document and Internet Search Using AI21 Labs Models

A key challenge frequently faced by enterprise companies is generating insights from documents. These insights frequently come in two forms.

1. **Summarization of the underlying document**. This tends to focus on summarizing the insights of the document itself.
2. **Key insights based on other pertinent information**. This is more challenging, and involves taking the text in the context of a corpus of other knowledge, such as proprietary datasets (e.g. in Amazon Kendra, databases, etc.) or a more general knowledge of the field.


This notebook shows how you can use AI21 Labs models both to produce a summary of content and to combine it with other knowledge. In this notebook, we will also use AI21 models to generate queries, which are passed to an internet search.

An outline of the approach for this is shown below:

![below](./img/1.png)

In many production use cases, multiple data sources (e.g. internet search, private corpora) would be searched; in this notebook, we will only query the public internet.

The end result is a report that includes both a summary of the original text and key insights from other data.
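The overall flow can be sketched as a small orchestration function. This is only an illustrative outline, not the notebook's actual code: `run_pipeline` and its parameters are hypothetical names, and each stage is passed in as a plain callable so that extra data sources (for example a private corpus) could be added next to the internet search.

```python
def run_pipeline(article, summarize, make_queries, search_sources, synthesize):
    """Hypothetical outline of the approach: summarize, query, search, synthesize.

    All arguments except `article` are callables supplied by the caller;
    none of these names come from the AI21 SDK.
    """
    # Step 1: summarize the input document.
    summary = summarize(article)

    # Step 2: turn the summary into search queries.
    queries = make_queries(summary)

    # Step 3: run every query against every configured source.
    snippets = []
    for query in queries:
        for search in search_sources:
            snippets.extend(search(query))

    # Step 4: combine the summary and the snippets into an insight.
    insight = synthesize(summary, snippets)
    return {"summary": summary, "insight": insight}
```

In this notebook, `search_sources` would contain a single internet search function; the remaining cells implement each stage in turn.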




### Input data
First we will get an input article about issues facing the banking industry. The text of the article is from [here](https://internationalbanker.com/banking/key-risks-shaping-the-banking-industry/).

In [35]:
#
article='''
KEY RISKS SHAPING THE BANKING INDUSTRY
November 16, 2023

By Lee Doyle, Global Head Banking Industry, Ashurst LLP

 

An accelerated period of change has long been on the horizon, and institutions are now facing an inflection point in the banking industry’s evolution. The digital revolution in retail banking has largely happened, and the interest is now in how consumers engage with this technology. This is not the case in corporate banking—digitisation is continuing at pace and will continue to do so over the next few years. Several megatrends are driving these developments, which will converge across retail and corporate banking in the coming years.

Retail-banking developments are clear for everyone to see on a day-to-day basis. The scale of big banking changes, however, is likely to only become evident within the next 12 to 18 months. As advisors to the world’s largest financial institutions, we at Ashurst are fortunate to work alongside them in dealing with these issues. We’re lucky to be behind the curtain. Goldman Sachs’ ground-breaking Digital Asset Platform (GS DAP) exemplifies what’s coming. In time, these fundamental changes will mean that corporates can access capital markets without layers of processes and bureaucracy—just as consumers now connect with retail markets.

Investor considerations are being matched by other factors, including the increasing focus on ESG (environmental, social and governance) compliance, the “fight” for talent and reshaping of the workforce, and the emergence of generative AI (artificial intelligence) and the fundamental changes it brings to our preconceptions and certainties of processes, controls and work-allocation norms.

These step changes in developments and potential uses of AI alongside human thought can’t be understated. One topic we certainly didn’t think would be driving so much change 18 months ago was the sector’s AI adoption. The genie is well and truly out of the bottle, though, and regulators and legislators will be playing catch-up for some time. The European Union (EU) is trying to legislate but is struggling to define the scope and breadth of AI rules, while the United Kingdom is attempting a sector-by-sector approach. The United States is currently seeking to define its approach to regulation. AI will perhaps move us towards more rather than less legislation.

Despite the multiple challenges, those banks that can successfully implement AI technology will benefit from large competitive advantages in terms of time and cost savings. My Risk Advisory colleagues always remind me that navigating the risk profiles that come with this is a daunting challenge. The black box of AI must be opened, and the recent Senior Managers and Certification Regime (SM&CR) in the UK has further complicated this.

With liability now landing on the shoulders of those at the top, senior executives must be able to not only understand how AI is being used in their businesses but also clearly explain its implications—a major challenge in a fast-moving area in which bank leaders are often far removed from the cutting edge of technology development.

Although AI is where the most far-reaching changes will come, ESG remains the number one boardroom megatrend for banks—now with the added complexity for global financial institutions of mixed messages from some political leaders and, in the US, a full-blown backlash against many ESG policies. The differences in approach in different countries and regions are the biggest challenges today and in the future. From London to Texas to Hong Kong, banks must not only comply with local legislations but also build effective strategies to cater to the firms’ global ESG objectives.

Sometimes, in the rhetoric, the views of investors are overlooked, and they are the banks’ ultimate stakeholders when you consider that their main purpose is to generate returns. Investors are not a homogenous group. However, understanding investors at both corporate and retail levels is a puzzle to solve. Some will take a longer-term view when ESG concerns are a priority, while others will target short-term financial gains. Balancing these perspectives must be a key focus for leaders.

The risk of failing to accommodate investors’ appetites for sound ESG credentials is substantial. The financial-services sector has been hit by a stream of greenwashing allegations over the past two years, with banks accounting for 70 percent of greenwashing, according to RepRisk, a firm specialising in ESG data. Avoiding these accusations and ensuring that products advertised as environmentally friendly meet this standard imposes a major new burden on banks, as complex supply chains must be carefully scrutinised—for example, ensuring that the investments advertised are genuinely sustainable.

Closely connected to ESG is another of the megatrends to which the banking industry has had to adapt rapidly: the net-zero transition. From finding sufficient investments to fund renewable energy worldwide and preparing for the impacts of climate change, such as rising sea levels and more frequent extreme weather events, the banking industry will need to adapt radically to the “E” (environment) in ESG. Industry leaders must make complex decisions, balancing numerous commercial and regulatory concerns, if they are to play the part that governments, and increasingly shareholders, demand in financing the net-zero transition.

Among the risks are complex supply chains, making vetting a product’s environmental credentials challenging and producing potential competition issues if firms coordinate their approaches to green products. The former will burden banks’ compliance professionals, especially as national and transnational jurisdictions become increasingly strict about how environment-friendly investments are defined. For example, would using steel produced with petroleum- or coal-based needle coke to make an electric vehicle count against its green credentials? Would a bank offering this as part of a carbon-neutral investment be required to include a carbon-negative investment to balance this out? These questions still have no clear answers.

Competition, too, presents a formidable challenge for banks. Competition regulations and enforcements have arguably not kept up with the need for companies to collaborate to ease the transition to net zero. Other industries have fallen foul of competition authorities over climate-related collaboration, as seen in a 2021 European Commission (EC) decision, which fined several auto manufacturers $875 million over their alleged collusion in developing emissions-cleaning technology for diesel cars. This willingness to probe competition issues, combined with the array of subsidies that Western national governments offer environment-focused companies, makes green financing an area in which banks must rapidly adjust to changing government regulations and competition enforcement.

Never have bank senior executives needed such an array of skills and abilities to deal with these fundamental issues, and never have they had to operate in a market so influenced by governments and regulators. The Global Financial Crisis (GFC) brought in necessary regulatory oversight, and subsequent conduct and liquidity issues in some areas have led to further regulatory scrutiny. Already one of the most regulated sectors globally, banking is unlikely to see another wave of regulation. Rather, one should observe to where the regulatory focus shifts. Shadow banking and private credit have grown enormously in recent years, and there are signs in both Europe and the US that, along with new technology, this is where the eyes of regulators and legislators are beginning to turn. It is unlikely this will occur without a fight.
'''

## Install Libraries as needed.

In [None]:
!pip install ai21
!pip install --upgrade duckduckgo_search
# Force a reinstall of duckduckgo_search if there are any issues
!pip install --upgrade --force-reinstall duckduckgo_search
!pip install python-docx
!pip install docx2pdf



### Generate Summary of Article

The following section processes the article titled "KEY RISKS SHAPING THE BANKING INDUSTRY" using AI models. The goal is to generate a concise summary that captures the essence of the article, focusing on aspects relevant to the banking sector and AI's impact on it. This uses the AI21 summarization model.

Note that when invoking AI21 models, you may put the API key in plain text in the notebook. While this is acceptable for testing your notebooks, in a production setting API keys should be stored as secure secrets, such as with [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/).
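As a rough sketch of keeping the key out of the notebook, the helper below looks for the key in an environment variable first and only then falls back to AWS Secrets Manager. The function name `load_api_key` and the `secret_id` value are hypothetical; the environment variable name `AI21_API_KEY` is assumed to be the one the SDK reads by default, and the fallback assumes `boto3` is installed and AWS credentials are configured.

```python
import os

def load_api_key(env_var="AI21_API_KEY", secret_id=None):
    """Return an API key from the environment, or from AWS Secrets Manager.

    `env_var` and `secret_id` are illustrative defaults, not values
    taken from the notebook.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    if secret_id is not None:
        # Imported lazily so the helper works without boto3 when the
        # environment variable is set.
        import boto3
        secrets = boto3.client("secretsmanager")
        # Assumes the secret was stored as a plain string.
        return secrets.get_secret_value(SecretId=secret_id)["SecretString"]
    raise RuntimeError(f"No API key found in ${env_var} and no secret_id given")
```

The result could then be passed as `AI21Client(api_key=load_api_key())` instead of typing the key into a cell.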

In [37]:
# Get AI21 API key from environment variable or plain text; depending on where you store it.
#ai21.api_key=''
import json
import os
os.environ["AI21_LOG_LEVEL"] = "DEBUG"

from ai21 import AI21Client
client = AI21Client()
#uncomment the following if you are not using environment variables for API key.
#client = AI21Client(api_key='')


In [38]:
response = client.summarize.create(
  source=article,
  source_type="TEXT" 
)

summary=response.summary

_Note:_ If you are using AI21 SaaS, you can pass URLs as well as text, and the API will do all the parsing as part of the summarization process.

In [39]:
'''
response = client.summarize.create(
  source="https://internationalbanker.com/banking/key-risks-shaping-the-banking-industry",
  source_type="URL" 
)
summary=response.summary
'''


## Create Search Queries
Building on the article summary, we now use AI21 to generate specific search queries. These queries are tailored to explore the article's implications for the United States Federal Reserve Bank. This step is crucial for gathering focused insights and understanding the potential impact on the Federal Reserve's operations and strategies.

Note that this step serves two essential purposes: (1) finding new relevant topics and content outside the article, and (2) making the insights specific to the United States Federal Reserve, rather than generic to the banking industry.

In [40]:
prompt_2=f'''
You are to take on the persona of a business analyst who works for United States Federal Reserve Bank, who can interact with a search engine. 
Given the following article summary, come up with 5 different search terms/queries about how this may impact United States Federal Reserve bank. Print each search term/query on a newline. 
Each query is to be surrounded by {{}}
Based on the summary below, generate search queries that are relevant specifically for United States Federal Reserve Bank. 
You must ensure your queries are United States Federal Reserve specific, but also relate to the summary presented below.

Some examples of relevant United States Federal Reserve queries are:
United States Federal Reserve Bank profit forecasts.

Summary:
{summary}

Remember, you want to tailor these queries to be relevant about United States Federal Reserve.
'''

response = client.completion.create(
    model="j2-ultra",  # You can choose from various models like j2-light, j2-mid, j2-ultra
    prompt=prompt_2,
    max_tokens=300,
    temperature=0.7
)

#print(response)
generated_text = response.completions[0].data.text

In [41]:
# Show the created queries.
generated_text_l = generated_text.split("\n")
# Strip the braces and quotes around each query, and skip empty lines.
search_queries = [i.replace('{', '').replace('}', '').replace('"', "").strip() for i in generated_text_l if i.strip()]
print(search_queries)

['digital banking revolution United States Federal Reserve Bank', 'AI technology banks United States Federal Reserve Bank', 'Esg banking United States Federal Reserve Bank', 'competition banks United States Federal Reserve Bank', 'shadow banking United States Federal Reserve Bank']
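Because the prompt asks for each query to be wrapped in braces, a slightly more defensive parser can pick out only the braced text and ignore any extra commentary the model adds. The following is a minimal sketch under that assumption; `extract_queries` is a hypothetical helper, not part of the AI21 SDK, and it falls back to one query per line when no braces are found.

```python
import re

def extract_queries(generated_text):
    """Pull search queries out of model output like '{query one}\n{query two}'."""
    # Match text between one or more opening and closing braces on a line.
    matches = re.findall(r"\{+\s*(.*?)\s*\}+", generated_text)
    if not matches:
        # Fallback: treat every line as a candidate query.
        matches = generated_text.splitlines()
    # Drop surrounding quotes and whitespace, then discard empty entries.
    cleaned = [m.strip().strip("\"'").strip() for m in matches]
    return [q for q in cleaned if q]
```

With this helper, `search_queries = extract_queries(generated_text)` could stand in for the split-and-replace logic above.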


### Perform Internet Search

With our AI-generated queries, we now turn to internet research using DuckDuckGo. This step aims to enrich our analysis with external data, offering a broader perspective on how the summarized content relates to the United States Federal Reserve Bank's business environment and strategic planning.

Note that in this small example, we will only use the snippets from the DuckDuckGo search, not the full text of the articles.

**Note**: If the DuckDuckGo API call fails due to an HTTP error, try reloading the notebook and reinstalling the `duckduckgo_search` library.
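Transient HTTP errors can often be absorbed by retrying with a growing delay before resorting to a reinstall. The helper below is one possible sketch of that idea; `with_retries` and its parameters are hypothetical names, and the injectable `sleep` argument exists only to make the helper easy to test.

```python
import time

def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A search call could then be wrapped as `with_retries(lambda: search_duckduckgo(query))`, where `search_duckduckgo` is the function defined in the next cell.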



In [None]:
from duckduckgo_search import DDGS
import time
def search_duckduckgo(query):
    ddgs=DDGS()
    results = ddgs.text(query, max_results=5)
    return results

# Run each generated query, retrying up to 3 times if the search fails.
all_results_l = []
for query in search_queries:
    for attempt in range(3):
        try:
            search_results = search_duckduckgo(query)
            for result in search_results:
                all_results_l.append(f'''Result_Snippet\n{result['body']}\n''')
            break
        except Exception as e:
            print(f"Error searching ({e}). Retrying")
            time.sleep(3)


In [43]:
all_search_results_string="".join(all_results_l)
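Since several queries can return overlapping results, and the prompt has limited room, it may help to deduplicate and cap the snippets before joining them. The sketch below is one way to do that; `build_snippet_context` is a hypothetical helper, the 6000-character budget is an arbitrary assumption rather than a documented model limit, and it expects raw snippet bodies rather than the already-formatted entries in `all_results_l`.

```python
def build_snippet_context(snippets, max_chars=6000):
    """Join unique snippets into one prompt string, stopping at a character budget."""
    seen = set()
    parts = []
    used = 0
    for snippet in snippets:
        # Collapse runs of whitespace so near-identical snippets compare equal.
        text = " ".join(snippet.split())
        key = text.lower()
        if not text or key in seen:
            continue
        block = f"Result_Snippet\n{text}\n"
        # Stop once adding another snippet would exceed the budget.
        if used + len(block) > max_chars:
            break
        seen.add(key)
        parts.append(block)
        used += len(block)
    return "".join(parts)
```

The returned string has the same `Result_Snippet` layout as `all_search_results_string`, so it could be dropped into the synthesis prompt unchanged.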

### Synthesize Insight Report
Now we will take both the summary of the original article and the key insights from search, and synthesize them.

In [44]:
import ai21
prompt_2=f'''
You are to take on the persona of a business analyst who works for United States Federal Reserve Bank, who can interact with a search engine. Given the following Summary of an article, as well
as information from search results, you are to produce a Key Risks section of about 2 paragraphs that highlights any potential pitfalls or risks specifically to United States Federal Reserve Bank.

Please keep in mind that not all of the result snippets will be relevant.

Article Summary:
{summary}

Search Engine Snippets:
{all_search_results_string}
'''

response = client.completion.create(
    model="j2-ultra",  # You can choose from various models like j2-light, j2-mid, j2-ultra
    prompt=prompt_2,
    max_tokens=300,
    temperature=0.7,
)

#print(response)
generated_insight_text = response.completions[0].data.text

### Export Insights
We will export the insights as an easy-to-read .docx file, as well as a Markdown file.

In [45]:
from docx import Document
from docx.shared import Pt

def write_to_docx(filename, lines):
    # Create a new Document
    doc = Document()

    # Add the report title and the 'Article Summary' heading as styled paragraphs
    p1 = doc.add_paragraph()
    run1 = p1.add_run("Banking Risks Analysis")
    run1.font.size = Pt(18)
    p1 = doc.add_paragraph()
    run1 = p1.add_run("Article Summary")
    run1.font.size = Pt(14)
    # Add the first line
    doc.add_paragraph(lines[0])

    # Add 'Key Insight' as a styled paragraph
    p2 = doc.add_paragraph()
    run2 = p2.add_run("Key Insight")
    run2.font.size = Pt(14)
    # Add the second line
    doc.add_paragraph(lines[1])

    # Save the document
    doc.save(filename)


# Example usage
file_name = 'Banking_Report.docx'
lines_to_write = [summary,generated_insight_text]
write_to_docx(file_name, lines_to_write)

def write_to_markdown(filename, lines):
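    # Use UTF-8 explicitly, since the text may contain non-ASCII characters such as curly quotes
    with open(filename, 'w', encoding='utf-8') as md_file: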
        # Add 'Banking Risks Analysis' as a heading
        md_file.write("# Banking Risks Analysis\n\n")

        # Add 'Article Summary' as a subheading
        md_file.write("## Article Summary\n")
        # Add the first line
        md_file.write(lines[0] + "\n\n")

        # Add 'Key Insight' as a subheading
        md_file.write("## Key Insight\n")
        # Add the second line
        md_file.write(lines[1] + "\n")

# Example usage
file_name = 'Banking_Report.md'
write_to_markdown(file_name, lines_to_write)

#print(f"File '{file_name}' has been created with the provided lines.")




The output may differ from run to run. Below is a sample output.
Note that the `Summary` is a summary of the article itself, while the `Key Insights` are much more specific to why the content is relevant to the Federal Reserve of the United States.
![below](./img/2.png)