In [1]:
# Reload modules before executing user code
%load_ext autoreload
%autoreload 2

In [3]:
import sys
!{sys.executable} -m pip install -r ../requirements.txt



In [4]:
# Optional: ignore unclosed SSL socket warnings if they appear
import warnings

# Unclosed sockets are reported as ResourceWarning, so filter on that category
warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Laying the foundations

### Storage

We'll use Redis as the database for both the document contents and the vector embeddings. You need the full Redis Stack because it includes RediSearch, the module that enables semantic search; more detail is in the [docs for Redis Stack](https://redis.io/docs/stack/get-started/install/docker/).

To set this up locally, you will need to install Docker and then run the following command: ```docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest```.

The code used here draws heavily on [this repo](https://github.com/RedisAI/vecsim-demo).

After setting up the Docker instance of Redis Stack, follow the instructions below to open a Redis connection and create a Hierarchical Navigable Small World (HNSW) index for semantic search.
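For orientation, here is a minimal sketch of what creating an HNSW vector index looks like at the RediSearch command level. It is not the code in this project's `database` or `importer` modules, which may build the index differently. The index name `demo_index`, the key prefix `doc:`, the field names `text` and `vector`, and the dimension 1536 are all assumptions chosen for illustration.

```python
def build_hnsw_create_command(index_name, key_prefix, dim, distance_metric="COSINE"):
    """Build the argument list for an FT.CREATE command with one HNSW vector field.

    All names passed in are illustrative; adjust them to match your own schema.
    """
    # Attribute/value pairs for the vector field; FT.CREATE expects their count first
    vector_params = [
        "TYPE", "FLOAT32",
        "DIM", str(dim),
        "DISTANCE_METRIC", distance_metric,
    ]
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",                # index Redis hashes
        "PREFIX", "1", key_prefix,   # only keys starting with this prefix
        "SCHEMA",
        "text", "TEXT",              # hypothetical full-text field
        "vector", "VECTOR", "HNSW", str(len(vector_params)), *vector_params,
    ]


# Hypothetical values: 1536 is assumed here as the embedding dimension
command = build_hnsw_create_command("demo_index", "doc:", dim=1536)
print(" ".join(command))

# With a live connection, the command could then be sent like this:
# redis_client.execute_command(*command)
```

Building the command as a plain list keeps the sketch independent of any particular client version; the same list can be pasted into `redis-cli` to experiment with the schema.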

In [5]:
# Is Redis set up and running?
from database import get_redis_connection

redis_client = get_redis_connection()

redis_client.ping()

True

In [5]:
# Optional step to drop the indexes if they already exist
from importer import NOTION_INDEX_NAME, WEB_SCRAPE_INDEX_NAME

redis_client.ft(NOTION_INDEX_NAME).dropindex()
redis_client.ft(WEB_SCRAPE_INDEX_NAME).dropindex()

ResponseError: Unknown Index name
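The `ResponseError` above appears when an index does not exist yet, which is expected on a first run. One way to make the cell safe to re-run is to treat that specific error as "nothing to drop". The helper below is a hypothetical sketch, not part of the `importer` module; the "unknown index name" text comes from the error shown above, while the "no such index" variant is an assumption about how other server versions might word it.

```python
def drop_index_if_exists(client, index_name):
    """Drop a search index, returning False instead of raising if it is absent.

    `client` is expected to behave like a redis-py client, i.e. expose
    `client.ft(index_name).dropindex()`.
    """
    try:
        client.ft(index_name).dropindex()
    except Exception as exc:  # redis-py raises redis.exceptions.ResponseError here
        message = str(exc).lower()
        # Only swallow the "index is missing" case; re-raise anything else
        if "unknown index name" in message or "no such index" in message:
            return False
        raise
    return True


# Usage with the names from the cell above:
# drop_index_if_exists(redis_client, NOTION_INDEX_NAME)
# drop_index_if_exists(redis_client, WEB_SCRAPE_INDEX_NAME)
```

Matching on the message rather than importing the exception class keeps the sketch free of dependencies; in real code, catching `redis.exceptions.ResponseError` directly would be the narrower choice.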

### Ingestion

We'll load our Notion pages into documents.

In [8]:
import logging
import sys

# Send INFO-level logs to stdout. force=True replaces any handlers already on the
# root logger, so each message is printed once rather than duplicated.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, force=True)

In [1]:
from importer import import_notion_data

notion_index = import_notion_data()

[Document(text='🧪\xa0\nHypothesis\n\tAt large companies, teams find it challenging to locate the resources they need to build products quickly and effectively. Navigating complex legacy infrastructure and enterprise documentation is time-consuming, confusing , and just plain hard. Knowledge often resides with a single subject matter expert.\n\tBuilding software does not have to be complicated.  An AI Enterprise Knowledge Hub will speed up development time by bridging communication gaps and breaking down knowledge silos. \n\tAI Enterprise Knowledge Hub will aggregate internal company information, paired with data available publicly, to provide faster access to information and thus faster development times. \n\t\tPossible\n\t\t \n\t\tOutcomes\n\t\tReduce the presence of the “lone SME” or “bus count of 1”\n\t\tReduce duplicate efforts \n\t\tReduce the time it takes to bring new products to market\n\t\tBuild and deliver more modern product

In [2]:
# Optional: confirm that the Redis database contains data

from importer import number_of_stored_notion_docs
print(number_of_stored_notion_docs())

134
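If you want to check the stored document count without the project helper, counting keys by prefix is one option. This is a sketch under assumptions: it is not how `number_of_stored_notion_docs` is implemented, and the key prefix you pass in is hypothetical and depends on how the importer names its keys.

```python
def count_keys_with_prefix(client, key_prefix):
    """Count keys whose names start with key_prefix.

    Uses redis-py's scan_iter, which walks the keyspace incrementally with SCAN
    instead of blocking the server the way KEYS would.
    """
    return sum(1 for _ in client.scan_iter(match=f"{key_prefix}*"))


# Usage with a live connection (the prefix is a placeholder, not a real key name):
# print(count_keys_with_prefix(redis_client, "your_prefix_here"))
```

Note that the prefix is used as a glob pattern, so characters such as `*`, `?`, or `[` in it would need escaping.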


In [3]:
# Set the logging level to DEBUG for more detailed output
query_engine = notion_index.as_query_engine()
response = query_engine.query("Why are we building the AI Enterprise Knowledge Hub?")
response.response

'\nWe are building the AI Enterprise Knowledge Hub to reduce the presence of the lone SME or bus count of 1, reduce duplicate efforts, reduce the time it takes to bring new products to market, build and deliver more modern products and experiences, and reduce onboarding time for new developers. The AI Enterprise Knowledge Hub will aggregate internal company information, paired with data available publicly, to provide faster access to information and thus faster development times.'

Next, we'll add the web-scraped data to the index.

In [8]:
from importer import import_web_scrape_data

web_scrape_index = import_web_scrape_data()

[Document(text='\n\nA digital transformation partner focused on software delivery\n\n\n\n      var show = localStorage.getItem(\'show\');\n      if(show === \'true\'){\n        document.documentElement.classList.add(\'dark\');\n      } \n    \n\nhsjQuery = window[\'jQuery\'];\n\n\n\n\n\na.cta_button{-moz-box-sizing:content-box !important;-webkit-box-sizing:content-box !important;box-sizing:content-box !important;vertical-align:middle}.hs-breadcrumb-menu{list-style-type:none;margin:0px 0px 0px 0px;padding:0px 0px 0px 0px}.hs-breadcrumb-menu-item{float:left;padding:10px 0px 10px 10px}.hs-breadcrumb-menu-divider:before{content:\'›\';padding-left:10px}.hs-featured-image-link{border:0}.hs-featured-image{float:right;margin:0 0 20px 20px;max-width:50%}@media (max-width: 568px){.hs-featured-image{float:none;margin:0;width:100%;max-width:100%}}.hs-screen-reader-text{clip:rect(1px, 1px, 1px, 1px);height:1px;overflow:hidden;position:absolute !important;width:1px}\n\n\n\n\n\n\n\n  \n  .cards_galle

In [9]:
# Optional: confirm that the Redis database contains data

from importer import number_of_stored_web_scrape_docs
print(number_of_stored_web_scrape_docs())

184


In [12]:
query_engine = web_scrape_index.as_query_engine()
response = query_engine.query("What are the values at Focused Labs?")
response.response

"\nThe values at Focused Labs are: delivering best-in-class technology, finding the best practical solutions, continuously delivering software, collecting data, earning teams' trust, proving the value of their approach, building the solutions needed now, and creating a DevOps culture."

In [1]:
from importer import compose_graph

graph = compose_graph()

In [6]:
# Optional: confirm that the graph was built by querying it

query_engine = graph.as_query_engine()
response = query_engine.query("What are the Focused Labs values?")

print(str(response))
print(response.get_formatted_sources())


The Focused Labs values are: agility, user-centered design, lean product, continuous delivery, data collection, trust, DevOps culture, Love your Craft, Listen First, Learn Why, and Empathy & Collaboration.
> Source (Doc id: 7a39395f-509c-4333-9a07-b5d40b393a92): 
The values at Focused Labs are: agility, user-centered design, lean product, continuous delivery...

> Source (Doc id: 8856048e-6810-478d-861e-c18f5a1a0c97): 
The Focused Labs values are: Love your Craft, Listen First, Learn Why, and Empathy & Collaborati...

> Source (Doc id: notionfocusedlabsdocs_06d64fc8-cff7-4f90-8186-5f575a85a036): page_id: 76d816d82434423d8fbec83a3979d245

 2;     nThe values at Focused Labs are: agility, user...

> Source (Doc id: notionfocusedlabsdocs_b225dcd7-462b-4160-8263-636e75fad9b0): page_id: 76d816d82434423d8fbec83a3979d245

Focused Labs;     3;     nThere are three software eng...

> Source (Doc id: webscrapefocusedlabsdocs_7df97571-7248-4b83-b8d5-371b417acd55): URL: https://focusedlabs.io/abo

## Creating a LangChain agent

In [2]:
from agent_with_tools import create_agent

agent = create_agent()

<llama_index.indices.composability.graph.ComposableGraph object at 0x173a4d810>


In [20]:

agent.run(input="Why are we building the AI Enterprise Knowledge Hub?")



> Entering new AgentExecutor chain...
AI: We are building the AI Enterprise Knowledge Hub to provide a comprehensive and up-to-date source of information for businesses and organizations. The Hub will provide access to a wide range of data sources, including SEC 10-K documents, news articles, and other sources of information. Additionally, the Hub will provide AI-powered tools to help businesses and organizations analyze and interpret the data, allowing them to make more informed decisions.

> Finished chain.


'We are building the AI Enterprise Knowledge Hub to provide a comprehensive and up-to-date source of information for businesses and organizations. The Hub will provide access to a wide range of data sources, including SEC 10-K documents, news articles, and other sources of information. Additionally, the Hub will provide AI-powered tools to help businesses and organizations analyze and interpret the data, allowing them to make more informed decisions.'