# OpenAI-driven search of ICWA Legislation
This notebook uses the Western Australian [Strata Titles Act 1985](https://www.legislation.wa.gov.au/legislation/statutes.nsf/main_mrtitle_938_homepage.html).


# Initialisation

Load the `.env` file so the API keys come from the environment rather than being hard-coded in the notebook. If the file is not found automatically, pass the full path to the `.env` file. If this works it returns `True`.

In [1]:
from dotenv import load_dotenv
load_dotenv() #get API keys

True
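
A quick check that the keys actually arrived can save a confusing failure later. The sketch below is only an illustration: `PINECONE_API_KEY`, `PINECONE_ENV` and `INDEX` are the names read further down in this notebook, `OPENAI_API_KEY` is the variable the OpenAI wrappers look up by default, and the helper `missing_keys` is a hypothetical name introduced here.

```python
import os

# Variable names used later in this notebook, plus the OpenAI key.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV", "INDEX")

def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the required names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_keys()
if missing:
    print("Missing from environment:", ", ".join(missing))
else:
    print("All API keys found")
```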

# Prepare legislation

The legislation is a Word document that can be readily manipulated using the `docx` module (from the `python-docx` package).

In [2]:
import docx
import re
legislation_path = r'documents/Strata Titles Act 1985.docx'

The cell below shows that the document uses four levels of headings:
- Heading 2: The Parts of the legislation.
- Heading 3: Divisions.
- Heading 4: Subdivisions.
- Heading 5: The numbered sections, which act as sub-headings under the Part, Division and Subdivision headings.

The cover page and TOC come first in the document and are discarded.

In [25]:
headings = tuple(set( paragraph.style.name for paragraph in docx.Document(legislation_path).paragraphs if paragraph.style.name.startswith("Heading")))
print(sorted(headings))
toc = [": ".join([paragraph.style.name, re.sub(r"\s+", ' ', paragraph.text)]) for paragraph in docx.Document(legislation_path).paragraphs if paragraph.style.name.startswith("Heading")]
toc

['Heading 2', 'Heading 3', 'Heading 4', 'Heading 5']


['Heading 2: Part 1 — Preliminary',
 'Heading 5: 1. Short title',
 'Heading 5: 2. Commencement',
 'Heading 5: 3. Terms used',
 'Heading 5: 4. Notes and examples not part of Act',
 'Heading 5: 5. Act binds Crown',
 'Heading 2: Part 2 — Strata titles schemes',
 'Heading 5: 6. Legislative framework',
 'Heading 5: 7. Strata titles schemes',
 'Heading 5: 8. Freehold schemes and leasehold schemes',
 'Heading 5: 9. Lots — strata schemes and surveystrata schemes',
 'Heading 5: 10. Common property',
 'Heading 5: 11. Subdivision of land by strata titles scheme',
 'Heading 5: 12. Registration of strata titles scheme',
 'Heading 5: 13. Strata titles',
 'Heading 5: 14. Strata company',
 'Heading 2: Part 3 — Planning and development',
 'Heading 3: Division 1 — Planning approvals',
 'Heading 4: Subdivision 1 — Strata schemes',
 'Heading 5: 15. Subdivision approval of strata scheme',
 'Heading 5: 16. Application of Planning and Development Act',
 'Heading 4: Subdivision 2 — Surveystrata schemes',
 'He
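
A listing like the one above can be summarised by counting entries per heading style. This is a minimal sketch on a handful of entries copied from the output; `count_by_style` is a hypothetical helper, and in the notebook it would be called on the `toc` list instead of `sample_toc`.

```python
from collections import Counter

def count_by_style(toc_lines):
    """Tally 'Style: text' entries by the style name before the first ': '."""
    return Counter(line.split(": ", 1)[0] for line in toc_lines)

# A few entries copied from the listing above.
sample_toc = [
    "Heading 2: Part 1 — Preliminary",
    "Heading 5: 1. Short title",
    "Heading 5: 2. Commencement",
    "Heading 2: Part 3 — Planning and development",
    "Heading 3: Division 1 — Planning approvals",
    "Heading 4: Subdivision 1 — Strata schemes",
]

for style, count in sorted(count_by_style(sample_toc).items()):
    print(style, count)
```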

## Chunk up the legislation

('Heading 1', 'Heading 2', 'Heading 3', 'Heading 4', 'Heading 5', 'Schedule ')

In [57]:
import re

def read_document_sections(file_path, n=5):
    '''Break document at headings up to level n (default 5) and return a list
       of sections, each with its paragraphs separated by two newlines'''
    
    doc = docx.Document(file_path)
    skip_toc=True
    sections = []
    current_section = {'heading': "Document", 'level': 0, 'content': ""}

    for paragraph in doc.paragraphs:
        text = re.sub(r"\s+", ' ', paragraph.text)
        if paragraph.style.name.startswith(tuple(f"Heading {i+1}" for i in range(n))) or \
            paragraph.text.startswith(("Schedule", "Notes", "Defined terms")) or \
                re.search(r'^\d+\.', paragraph.text):
            #save old section 
            if current_section['heading'] or current_section['content']:
                sections.append(current_section)
            
            # and start a new section
            current_section = {'heading' : text,
                               #'level'   : int(re.search("Heading (\d+)", paragraph.style.name).group(1)),
                               'content' : text
                            }
        else:
            # join this paragraph text to prior ones in this section
            current_section['content'] = "\n\n".join([current_section['content'], text])

    # Add the last section
    if  current_section['heading'] or current_section['content']:
        sections.append(current_section)

    #Return list of sections
    return sections 

from langchain.schema import Document
def makeDocs(n):
    '''Break legislation by headings down to level n. This chunks up the
       document to sizes ChatGPT can digest while ensuring the clauses in
       the legislation are kept together'''

    return [Document(page_content = section['content'], metadata = {'title':section['heading']}) 
                for section in read_document_sections(legislation_path, n) ]


def counts(texts):
  '''Create some basic statistics on the corpus'''

  if len(texts) == 0:
    print("No texts")
    return

  charCounts = [len(text.page_content) for text in texts]
  wordCounts = [len(text.page_content.split()) for text in texts]
  print(f"There are {len(texts)} chunks\nAverage character count {sum(charCounts)/len(charCounts):.0f}\nAverage word count {sum(wordCounts)/len(wordCounts):.0f}")
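
The boundary test inside `read_document_sections` is easier to reason about when pulled out as a pure function. The sketch below mirrors that condition on `(style name, text)` pairs without needing the Word file; `starts_new_section` is a hypothetical helper, and the `"Normal"` style names in the examples are assumptions, not values read from the document.

```python
import re

def starts_new_section(style_name, text, n=5):
    """Mirror the section-boundary test used in read_document_sections."""
    heading_styles = tuple(f"Heading {i + 1}" for i in range(n))
    return (
        style_name.startswith(heading_styles)
        or text.startswith(("Schedule", "Notes", "Defined terms"))
        or re.search(r"^\d+\.", text) is not None
    )

examples = [
    ("Heading 5", "3. Terms used"),                      # heading style
    ("Normal", "Schedule 1"),                            # keyword prefix (assumed text)
    ("Normal", "13. Notice of alteration to lot"),       # numbered clause
    ("Normal", "(1) An application for the approval"),   # body text, no new section
]
for style, text in examples:
    print(f"{starts_new_section(style, text)!s:5} {style}: {text}")
```

Because the numbered-clause pattern is applied to every paragraph, body text that happens to begin with digits and a full stop would also open a new section.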
  

In [68]:
chunk_H5 = makeDocs(5)[325:] #drop toc and title
counts(chunk_H5)

There are 380 chunks
Average character count 1343
Average word count 235
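
The offset `325` above is specific to this copy of the document. One possible alternative, sketched here under the assumption that the TOC repeats the body headings so that the second section starting with a given marker is where the body begins, is to look the offset up. `body_start` and the sample sections are hypothetical.

```python
def body_start(sections, marker, occurrence=2):
    """Index of the n-th section whose heading starts with marker."""
    hits = [i for i, s in enumerate(sections) if s["heading"].startswith(marker)]
    if len(hits) < occurrence:
        raise ValueError(f"fewer than {occurrence} headings start with {marker!r}")
    return hits[occurrence - 1]

# Toy stand-in for the output of read_document_sections().
sample_sections = [
    {"heading": "Document"},
    {"heading": "Part 1 — Preliminary"},   # TOC entry
    {"heading": "1. Short title"},         # TOC entry
    {"heading": "Part 1 — Preliminary"},   # body starts here
    {"heading": "1. Short title"},
]

start = body_start(sample_sections, "Part 1 ")
print(start, [s["heading"] for s in sample_sections[start:]])
```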


In [None]:

from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
chunk_H5_split = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=300).split_documents(chunk_H5)
counts(chunk_H5_split)
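
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraph breaks); it only illustrates how consecutive chunks share `chunk_overlap` characters. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size=5000, chunk_overlap=300):
    """Cut text into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

# Small numbers so the overlap is visible.
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With an average chunk of about 1,343 characters, a 5,000-character limit would only touch the longest sections.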

## Create the Pinecone database
Initialise the Pinecone instance based on the API keys in `.env`.

Use the existing index if there is one, otherwise create it; then clear the namespace and repopulate it from the documents. The "similarity" document retriever built on this database is created in the next section.

In [71]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
import pinecone 
import os

pinecone.init(
    api_key= os.environ.get('PINECONE_API_KEY') ,  # find at app.pinecone.io
    environment=os.environ.get('PINECONE_ENV')     # next to api key in console
)

def create_namespace(namespace, documents, embeddings):
    
    INDEX = os.environ.get('INDEX')
    
    if INDEX not in pinecone.list_indexes():
        print(f"Creating new index {INDEX}")
        pinecone.create_index(INDEX, dimension=1536)
    
    pinecone.Index(INDEX).delete(namespace=namespace, deleteAll=True)
    
    return Pinecone.from_documents(documents, 
                                   index_name=os.environ.get('INDEX'), 
                                   namespace=namespace, 
                                   embedding=embeddings)

db5 = create_namespace("SCA_H5", chunk_H5, 
                       embeddings= OpenAIEmbeddings())

# Create and test the Alice

Define a Q&A chain that 'stuffs' the retrieved chunks into the prompt to provide context. It uses OpenAI's `gpt-3.5-turbo` model with `temperature=0` so that answers are as repeatable as possible. According to OpenAI, `gpt-3.5-turbo` is the
> Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003.

In [72]:
import pinecone
import os

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

pinecone.init(
    api_key= os.environ.get('PINECONE_API_KEY') ,  # find at app.pinecone.io
    environment=os.environ.get('PINECONE_ENV')     # next to api key in console
)

db = Pinecone.from_existing_index(index_name=os.environ.get('INDEX'), 
                                   namespace='SCA_H5', 
                                   embedding=OpenAIEmbeddings())

retriever=db.as_retriever(search_type="similarity", 
                            search_kwargs={"k":3})

#retriever=db.as_retriever(search_type="similarity_score_threshold", 
#                          search_kwargs={"k":3, "score_threshold":0.5})

qa = RetrievalQA.from_chain_type(
                    llm=ChatOpenAI(temperature=0), # defaults to 'gpt-3.5-turbo', cheaper than text-davinci-003
                    chain_type="stuff", 
                    retriever=retriever, 
                    return_source_documents=True)
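
What the "similarity" retriever with `k=3` does can be sketched in plain Python: score every stored vector against the query vector and keep the three best. This is a toy illustration assuming cosine similarity; the two-dimensional vectors and the ids are made up, and the real lookup happens inside Pinecone on 1536-dimensional embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, vectors, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Hypothetical embeddings, for illustration only.
toy_vectors = {
    "s89": [1.0, 0.1],
    "bylaw13": [1.0, 0.5],
    "s85": [0.5, 1.0],
    "s1": [0.0, 1.0],
}
for doc_id, score in top_k([1.0, 0.0], toy_vectors, k=3):
    print(f"{doc_id}: {score:.3f}")
```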

In [73]:
from IPython.display import display, Markdown

import textwrap

def wrap_text_preserve_newlines(text, width=110):

    lines = text.split('\n')
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    wrapped_text = '\n'.join(wrapped_lines)
    return wrapped_text

def process_llm_response(llm_response, sources=True, content=False):
    display(Markdown(wrap_text_preserve_newlines(llm_response['result'])))
    if sources:
      display(Markdown('\n\nSources:'))
      for source in llm_response["source_documents"]:
        display(Markdown(f"{source.metadata.get('title')} ({source.metadata.get('score')})"))
        if content:
          display(Markdown(f'{wrap_text_preserve_newlines(source.page_content)}'))

def Simon(query, sources=True, content=False):
  
  instructions = '''You are an expert in Western Australia "Strata Titles Act" 
                    answering questions from a citizen. Only use information provided to you from the 
                    legislation below. If you do not know say "I do not know"'''
  result = qa({"query": f'{instructions} \n\n {query}'})
  process_llm_response(result, sources=sources, content=content)
  return (result)
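
The "stuff" strategy amounts to concatenating the retrieved chunks and the question into a single prompt. The sketch below shows the idea only; the layout of the prompt is an assumption made for illustration and is not LangChain's actual template, and `build_stuffed_prompt` is a hypothetical helper.

```python
def build_stuffed_prompt(question, chunks, instructions=""):
    """Join all retrieved chunks into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"

prompt = build_stuffed_prompt(
    "Do I need approval to renovate?",
    ["89. Approvals and objections to structural alterations",
     "13. Notice of alteration to lot"],
    instructions='If you do not know say "I do not know"',
)
print(prompt)
```

Since `Simon` prepends the instructions to the query string itself, the instructions are probably also part of the text that is embedded for retrieval, not just part of the final prompt.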

In [74]:
result = Simon("What does the legislation cover", sources=True, content=False)

The legislation being referred to is the Strata Titles Act 1985 in Western Australia. This legislation covers
the ownership and management of strata-titled properties, which are properties that are divided into
individual units or lots that are owned by different individuals or entities. The Act sets out the legal
framework for the creation, registration, and management of strata schemes, including the rights and
obligations of owners, strata companies, and other parties involved in the management of strata-titled
properties.



Sources:

Notes (None)

39. Strata Titles Act 1985 (None)

63. Strata Titles Act 1985 (None)

In [75]:
result = Simon("I am an owner in a 250 lot complex. I want to renovate. Do I need approval? How long do I need to wait", 
               sources=True, content=True)

Yes, you need approval to renovate your lot. You must not alter the structure of the lot without giving to the
strata company, not later than 14 days before commencement of the alteration, a written notice describing the
proposed alteration. If an application is made to a strata company for approval of the structural alteration
of a lot, voting on the application must open within 35 days after the application is received. If voting on
the application does not open as required, the applicant may convene a general meeting, in the same manner as
nearly as possible as that in which meetings are to be convened by the council, and submit the application to
that meeting.



Sources:

89. Approvals and objections to structural alterations (None)

89. Approvals and objections to structural alterations

 (1) An application for the approval of the structural alteration of a lot must set out details of the
proposal and such other information as may be prescribed.

 (2) If an application is made to a strata company under subsection (1), voting on the application must open
within 35 days after the application is received (the allowed period).

 (3) If voting on the application does not open as required by subsection (2), the applicant may convene a
general meeting, in the same manner as nearly as possible as that in which meetings are to be convened by the
council, and submit the application to that meeting.

 (4) Despite subsection (2), a council may submit an application to a general meeting convened by the council
after the allowed period if that meeting is held before a meeting is convened by the applicant under
subsection (3).

 (5) The owner of a lot or the owner of a leasehold scheme is taken to have approved the structural alteration
of a lot as set out in an application for approval served on the owner if —

 (a) the owner serves on the applicant written consent to the alteration; or

 (b) the owner has not, at the end of 42 days after being given the application, made a written objection to
the alteration; or

 (c) for a strata scheme, the owner has made such an objection but the objection does not specify the grounds
of the objection or the grounds specified are not grounds on which the owner may object under section 87.

 (6) A strata company is taken to have approved the structural alteration of a lot as set out in an
application for approval served on the strata company if —

 (a) the strata company serves on the applicant written consent to the alteration expressed by resolution
without dissent; or

 (b) despite section 87(2) —

 (i) the strata company has not, at the end of 77 days after being given the application, made a written
objection to the alteration; or

 (ii) for a strata scheme, the strata company has made such an objection but the objection does not specify
the grounds of the objection or the grounds specified are not grounds on which members of the strata company
may object under section 87.

 [(7) deleted]

 [Section 89, formerly section 7B, inserted: No. 58 of 1995 s. 13; amended, renumbered as section 89 and
relocated: No. 30 of 2018 s. 11 and 84.]

13. Notice of alteration to lot (None)

13. Notice of alteration to lot

 An owner of a lot must not alter or permit the alteration of the structure of the lot except as may be
permitted and provided for under the Act and the bylaws and in any event must not alter the structure of the
lot without giving to the strata company, not later than 14 days before commencement of the alteration, a
written notice describing the proposed alteration.

 [Bylaw 13 inserted: No. 58 of 1995 s. 88(5); amended: No. 30 of 2018 s. 111.]

85. Person to act for lot owner in certain circumstances (None)

85. Person to act for lot owner in certain circumstances

 (1) If the owner of a lot in a strata titles scheme cannot be located after reasonable enquiry or the owner
lacks the capacity to vote or consent to a matter under this Act, an application for an order under this
section may be made to the Tribunal by the strata company or a person who the Tribunal considers has a proper
interest in the matter.

 (2) The Tribunal may, on an application under this section, by order —

 (a) dispense with the requirement for the owner to vote or consent on a particular matter; or

 (b) authorise the Public Trustee under the Public Trustee Act 1941 or another specified person (with that
person’s consent) to exercise all or specified powers of the person under this Act as the owner of a lot.

 [Section 85 inserted: No. 30 of 2018 s. 83.]

Division 2 — Structural alteration of lots

 Note for this Division:

 This Division does not derogate from the requirement for subdivision approval if the definition of a lot is
modified.

 [Heading and Note inserted: No. 30 of 2018 s. 83.]

In [None]:
result = Simon("I am an owner in a 250 lot complex. I want to own a pet. Do I need approval? How long do I need to wait", 
               sources=True, content=True)

# Experimental 

This section is not yet working.

In [None]:

from langchain.chains.question_answering import load_qa_chain
from langchain.chains import AnalyzeDocumentChain
from langchain.chat_models import ChatOpenAI


qa_chain = load_qa_chain(llm=ChatOpenAI(temperature=0.0), # defaults to 'gpt-3.5-turbo'
                         chain_type="map_reduce")

qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)
doc =  docx.Document(legislation_path)
text_doc = "\n\n".join([para.text for para in doc.paragraphs])
qa_document_chain.run(input_document=text_doc, question="What is the purpose of this legislation")