In [1]:
import chromadb
import langchain
import ollama
import os
import streamlit
import textract # appears broken for pdf on windows due to shell call
import pypdf



In [2]:
# set variables to point at documents
document_1_location = "C:/Users/justi/Documents/GitHub/SmallProjects/documents/Scotland's Wild Deer_ A National Approach 2015-2020 Priorities.pdf"
document_2_location = "C:/Users/justi/Documents/GitHub/SmallProjects/documents/deer-management-on-scotlands-national-forest-estate.pdf"

In [3]:
def extract_text(document_location: str) -> str:
    extension = os.path.splitext(document_location)[-1].lower()
    if extension == '.txt':
        with open(document_location) as document:
            full_text = document.read()
    elif extension == '.pdf':
        # https://pypdf.readthedocs.io/en/stable/user/extract-text.html
        reader = pypdf.PdfReader(document_location)
        number_of_pages = len(reader.pages)
        full_text = ""
        # extract text page by page
        for page in reader.pages:
            text = page.extract_text()
            full_text = full_text + text
        # remove linebreaks
        full_text = full_text.replace('\n', ' ')
    else:
        # try to use textract - warning may be buggy - certainly fails pdfs
        # https://textract.readthedocs.io/en/stable/
        full_text = textract.process(document_location)
    return full_text 

In [4]:
# set up our ChromaDB NON-PERSISTENT database
# https://docs.trychroma.com/getting-started
chroma_client = chromadb.Client()
# while we do not intend to persist this chrmoadb, lets use a careful creation function
collection = chroma_client.get_or_create_collection(name="comparison_db")

In [5]:
# extract our documents to plain text
text_1 = extract_text(document_1_location)
text_2 = extract_text(document_2_location)

In [6]:
# chunk our text to keep it inside the context window
# https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter/
text_splitter = langchain.text_splitter.RecursiveCharacterTextSplitter(
    chunk_size=2000, # characters not words
    chunk_overlap=100,
    length_function=len,
#     is_separator_regex=False,
)
text_1_chunks = text_splitter.split_text(text_1)
text_2_chunks = text_splitter.split_text(text_2)


In [7]:
# store the chunks of one document in the database
for index, chunk in enumerate(text_1_chunks):
    collection.add(
        documents=[chunk],
        ids=[str(index)]
    )
    print(index, chunk)
    
# stores a tarball at C:\Users\justi\.cache\chroma\onnx_models\all-MiniLM-L6-v2\onnx.tar.gz

0 Photography: Hamza Yassin – front cover; Laurie Campbell – composite logo image, p2, p6,  p14 /15, p39; Peter Cairns – composite logo image, p11, p35; Alastair MacGugan/SNH – composite logo image, p12, p17; Neil McIntyre – p2, p4, p10; Graham Downing – composite logo image, p16; Pete Moore/SNH – p19, p32; Kieran Dodds – p28; Lorne Gill/SNH – P36; Ian MacLeod – p34; Will Boyd-Wallis – p20.Scotland’s  Wild Deer A National Approach   Including 2015 – 2020 PrioritiesForeword Scotland’s Wild Deer bring benefits to a wide range of people. They support  jobs, are part of Scotland’s biodiversity and provide us with healthy meat and recreational and sporting opportunities. Wild deer have a special hold on the public and are the animal most frequently associated with Scotland. It is therefore no wonder that how deer are managed in Scotland elicits strong feelings. What is clear, and I trust agreed among all those involved in deer management, is that a healthy, diverse and robust environment is

7 The Code of Practice on Deer Management (Deer Code), developed by SNH in consultation with relevant interests, came into effect in 2012. The Deer Code allocates deer management responsibilities and recognises the public interest in deer management. Regulatory actions have been further clarified and simplified. Changes to the way we view, use and manage deer and the land are  likely to continue over the next five years. Diminished resources, greater recognition of the contribution made by nature to Scotland’s economy and a stronger focus on climate change will undoubtedly bring new challenges. Deer management, through its place in delivering better integrated land use, will play a pivotal role in society’s response to these challenges. Integrated Land Use –   Scottish Landuse Strategy WDNA   All organisations   whose activities have a bearing on deer  and their management Deer Code   Land Managers Direction for practical delivery BPG Deer Managers:   Specific technical and   practical

14 and deer welfare.  Conserve and enhance the cultural and historic environment   and the distinct identity, diverse character and special    qualities of Scotland’s landscapes  Deer management will contribute positively to the appearance and  condition of the landscape. Deer management will also contribute to the cultural and historic environment (and people’s enjoyment of it) through managing grazing and trampling impacts. d) e) f) 17 Contributing to sustainable economic development The following objectives seek to ensure that deer management contributes  to successful rural businesses and the socio-economic development of communities. Increase the economic opportunities associated with wild   deer There is a range of opportunities to add value to deer-related products  and activities, and to broaden the economic benefits associated with the deer resource. These include further developing the markets for stalking, photography and wildlife watching and the further development and app

21 and that it is followed by all relevant parties; • Raise awareness of the need for effective deer management:  Social and media perceptions about wild deer are varied.   Increasing knowledge and understanding among all involved in   deer management and the wider public will help address this; • Establish a shared, trusted and high quality knowledge base associated with wild deer: Wild deer have been the focus  of much research. However this is not always easy to put into practice. We need to do more to develop knowledge which is collectively owned, supported and trusted and ensure this is used to inform local action.  What needs to be done? Build on work to develop conflict management tools • Develop	conflict	management	support	for:   Open range red deer;   Woodland expansion;   Complex pattern of low ground landholdings. Ensure robust deer management planning and  implementation • Improve representation and membership of Deer Management   Groups at a local level; 14 All of these wi

28 Increase awareness of the interactions of all species of wild   deer with access and recreation in urban, woodland and open land settings. Improve understanding of deer impacts on agriculture  and forestry • Examine the capacity to effectively manage deer impacts in   woodlands and forests through competent and cost-effective deer management actions; • Develop a better understanding of the interactions of deer   on other land uses, including agriculture and forestry.32 Indicators • Number of reported deer poaching incidents; • Number of FTE in employment in the deer sector;  • Value of venison to the Scottish economy;  • Value of deer stalking to the Scottish Economy;  • Number	of	deer	related	road	traffic	collisions. 33 Training and wild deer welfare21  Challenges: • Ensure a strong skill base in deer management: There is a need  to	ensure	sufficient	capacity	to	manage	wild	deer,	especially	in the lowlands. Continuing to develop a culture of on-going  professional and personal deve

36 centre of our approach to nature. It helps us to recognise that our actions today can affect future generations. 2020 Challenge for Scotland’s Biodiversity The 2020 Challenge sets out how Scotland will meet its Biodiversity obligations. The principal aim of the 2020 Challenge is for Scotland to halt biodiversity loss. The 2020 Challenge for Scotland’s Biodiversity aims to: • Protect and restore biodiversity on land and in our seas,    and to support healthier ecosystems;  • Connect people with the natural world, for their health and    well-being and to involve them more in decisions about     their environment;  • Maximise	the	benefits	for	Scotland	of	a	diverse	natural			  environment and the services it provides, contributing to    sustainable economic growth. 45 Scottish Forest Strategy (including the Woodland  Expansion Target) The Scottish Forest Strategy (SFS) is a framework for taking forestry forward  into the future. It is built around a number of key themes and objectives.

In [13]:
# now lets perform a comparison between chunks document 2 and the most similar chunks of document 1 as found in the database
answers = []
for chunk in text_2_chunks:
    print("CHUNK+++++++++++++++++++++++++++++++++++++++++++++++++")
    print (chunk)
    # retrieve the most similar chunk from the database
    results = collection.query(
        query_texts=[chunk],
        n_results=1
    )
    print(results)
    # join the top 2 results?
    result = results['documents'][0][0] #+ " " + results[1]
    print("MATCHED-------------------------------------------")
    print(result)
    query = f"""
        Please describe 3 similarities and 3 differences between the following two passages. 
        Provide these answers as two numbered lists labelled SIMILARITIES and DIFFERENCES.
        
        Passage 1:
        {chunk}
        
        passage 2:
        {result}
        """
    response = ollama.chat(model='llama3:8b', messages=[
      {
        'role': 'user',
        'content': query,
      },
    ])
    answer = response['message']['content']
    print("ANSWER-------------------------------------------")
    print(answer)
    answers.append(answer)
    
    

CHUNK+++++++++++++++++++++++++++++++++++++++++++++++++
Deer Management   on the National Forest Estate Current Practice and Future Directions 1 April 2014 to 31 March 20172 Roe buck in an agricultural / woodland environment Front cover image credit: Kenny Muir - Glen AffricAn overview Distribution of deer on the National Forest Estate   The significance of deer on the National Forest Estate Protecting and enhancing the environmentBiodiversity and the natural heritageSupporting social well-beingSupporting sustainable economic development Integrated land management  Working with others Deer management groupsWorking with neighbours and stakeholdersLeading by exampleAn integrated and collaborative approach to deliveryLandscape-scale deer management - a partnership approachEcosystem services Our approach to professional standards The FES deer management teamProfessional standardsOperational guidanceForestry Commission firearms advisory groupIndustry best practiceDeer management qualificatio

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages mention the importance of managing deer populations and their impacts on the environment.
2. They both recognize the value of deer in ecosystem management and highlight the need to balance deer management with other environmental goals.
3. Both passages emphasize the need for a national approach to wild deer management, with Passage 1 referring to the National Forest Estate (NFE) and Passage 2 using the term "Scotland's Wild Deer: A National Approach".

**DIFFERENCES**

1. Tone: Passage 1 has a more formal and official tone, reflecting its role as a strategy document from the Forestry Commission Scotland. Passage 2 has a slightly more conversational tone.
2. Focus: Passage 1 focuses on deer management within the context of forestry and the National Forest Estate, while Passage 2 takes a broader approach, highlighting the benefits of w

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages mention the four main species of deer found in Scotland: Red Deer, Roe Deer, Sika Deer, and Fallow Deer.
2. They both provide information on the population sizes of these species, with estimates ranging from 360,000 to 777,000.
3. The passages discuss the importance of deer management in Scotland, highlighting the need to balance public and private interests.

**DIFFERENCES**

1. Passage 1 provides more specific information on the distribution of the different deer species, mentioning specific locations and Forest Districts.
2. Passage 2 focuses more on the ecology of the deer species, providing links to further guides for more information.
3. Passage 1 emphasizes the need to limit the spread and population build-up of Sika Deer, while Passage 2 presents a more neutral view, acknowledging the importance of balancing private and public

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss the importance of deer management in Scotland.
2. They both mention the need for a balance between deer impact and environmental condition, with high deer impacts being detrimental to certain ecosystems.
3. Both passages emphasize the importance of biodiversity conservation and maintaining healthy ecosystems.

**DIFFERENCES**

1. Tone: Passage 1 has a more technical tone, using phrases like "Monument Management Plan" and "Scotland's 'Land Use Strategy'", while Passage 2 is more general and focuses on public interest in deer management.
2. Focus: Passage 1 primarily discusses the impact of deer on woodland regeneration and habitat diversity, while Passage 2 focuses on the broader ecosystem context and the role of deer in maintaining healthy ecosystems.
3. Methodology: Passage 1 mentions specific methods for monitoring deer impa

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss wild deer management in Scotland, highlighting their importance to the country's ecosystem and economy.
2. They both mention the need for sustainable deer management practices that balance public safety, welfare of deer, and environmental concerns.
3. The passages agree on the significance of community involvement and partnerships in planning and implementing deer management strategies.

**DIFFERENCES**

1. Tone: Passage 1 has a more practical and solution-focused tone, discussing specific initiatives and programs implemented by the NFE to manage deer populations. Passage 2 has a more analytical and policy-oriented tone, focusing on the broader context of deer management in Scotland.
2. Focus: While both passages cover deer management issues, Passage 1 places greater emphasis on the National Forest Estate's (NFE) specific init

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages mention the importance of managing deer populations and their impacts on the environment.
2. They both highlight the need to consider the welfare of wild deer species in management interventions.
3. Both passages touch on the theme of climate change and its effects on ecosystems, with Passage 1 mentioning the impact of deer on woodland regeneration and Passage 2 emphasizing the importance of maintaining vegetation cover.

**DIFFERENCES**

1. Passage 1 focuses primarily on the National Forest Estate (NFE) and its forestry operations, while Passage 2 discusses broader ecological issues such as climate change, grazing, and non-native species.
2. Passage 1 emphasizes the economic benefits of forestry management, citing timber production figures and the importance of maintaining productivity, whereas Passage 2 does not explicitly mention e

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management and conservation.
2. They both mention the importance of collaboration and cooperation among various parties involved in deer management.
3. Both passages highlight the need for effective planning and implementation to achieve sustainable deer management.

**DIFFERENCES**

1. Passage 1 focuses on the work of Deer Management Groups (DMGs) and the Association of Deer Management Groups (ADMG), while Passage 2 discusses broader objectives and indicators for wildlife conservation in Scotland.
2. Passage 1 provides specific examples of FES's representation on DMG committees and participation in projects, whereas Passage 2 outlines more general goals and strategies for deer management planning and implementation.
3. The tone of the two passages differs; Passage 1 appears to be a report or update on FES's activities, w

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management and conservation, with a focus on Scotland.
2. They both mention the importance of sustainable deer management practices.
3. Both passages highlight the need for research, development, and training in deer management.

**DIFFERENCES**

1. Tone: Passage 1 has a more formal and objective tone, while Passage 2 has a slightly more casual and promotional tone.
2. Focus: Passage 1 focuses on the Forestry Commission Scotland's (FCS) role in deer management and its commitment to sustainable practices, while Passage 2 focuses on the Wildlife Deer Network Association's (WDNA) objectives and strategies for deer management.
3. Content: Passage 1 provides more specific details about FCS's activities and commitments, such as financial support for initiatives and partnerships with other organizations. Passage 2 outlines WDNA'

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages mention the Code of Practice on Deer Management and its importance in deer management.
2. They both discuss the impact of roe deer expanding their numbers and distribution, particularly in urban areas.
3. The passages share a similar structure, starting with an introduction to deer management followed by a discussion of the challenges and opportunities related to deer management.

**DIFFERENCES**

1. Passage 1 focuses on the role of Wildlife Management teams and contractors in protecting the NFE's biological and cultural assets from damaging deer impacts.
2. Passage 2 shifts the focus to the Code of Practice on Deer Management and its implications for deer management, highlighting the need for integrated land use and a response to challenges like climate change.
3. While both passages mention roe deer adapting to new habitats and envi

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management in Scotland.
2. They both mention the importance of integrated land use and collaboration between organizations.
3. Deer population numbers and density are discussed in both passages.

**DIFFERENCES**

1. Passage 1 focuses on a specific project, The Great Trossachs Forest (TGTF) project, while Passage 2 provides a broader overview of deer management in Scotland.
2. Passage 1 emphasizes the partnership approach and the importance of best practice in deer management, while Passage 2 highlights the role of the Deer Code and regulatory actions.
3. Passage 1 discusses habitat condition assessments and regular cycles for assessing deer movement and density, whereas Passage 2 does not provide specific details on these topics.
CHUNK+++++++++++++++++++++++++++++++++++++++++++++++++
is employed only when every other opti

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages are related to deer management in Scotland, specifically discussing practices and initiatives for managing wildlife.
2. They both mention the importance of professional standards and skill bases in deer management.
3. Both passages highlight the need for ongoing development and training in deer management, as well as ensuring deer welfare.

**DIFFERENCES**

1. Tone: Passage 1 has a more formal tone, discussing organizational structure and processes, whereas Passage 2 is more focused on specific goals and challenges in deer management.
2. Scope: Passage 1 provides an overview of the Forest Enterprise Scotland (FES) deer management team and their activities, while Passage 2 focuses on broader issues related to deer management, such as education, training, and welfare.
3. Content: While both passages touch on topics like deer management 

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management, highlighting the importance of training, professional development, and guidance in achieving effective deer management.
2. They both mention the need for collaboration and cooperation with various stakeholders, including community groups, land management interests, and organizations.
3. Both passages emphasize the importance of scientific knowledge and research in informing deer management decisions.

**DIFFERENCES**

1. Tone: Passage 1 has a more formal tone, focusing on internal guidance and training programs within the FES Wildlife Ranger team. Passage 2 has a more collaborative tone, emphasizing cooperation and shared knowledge between different stakeholders.
2. Focus: Passage 1 is primarily focused on the FES Wildlife Ranger team's activities and training programs, while Passage 2 discusses broader deer m

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management and its importance in Scotland.
2. They both mention the need for awareness and education about deer management, with Passage 1 highlighting the opportunity to raise awareness nationally.
3. Both passages touch on health and safety considerations in relation to deer management activities.

**DIFFERENCES**

1. Tone: Passage 1 has a more formal tone, discussing organizational structures and requirements, while Passage 2 is more informal and conversational.
2. Focus: Passage 1 focuses on the specific organization (FCS) and its role in deer management, whereas Passage 2 takes a broader view, discussing the national approach to Scotland's wild deer.
3. Content: Passage 1 provides details about DMQ assessments, DSC certifications, and the organization's internal verifier, while Passage 2 discusses the importance of e

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management, specifically highlighting the importance of planning, education, and professional development in this field.
2. They both mention the need for a strong skill base in deer management, as well as promoting and delivering wild deer welfare.
3. Both passages emphasize the importance of considering deer welfare in all management planning and activities affecting wild deer.

**DIFFERENCES**

1. Passage 1 provides more specific details about the Forestry Enterprise Scotland's (FES) deer management initiatives, including funding, personnel, and equipment used. In contrast, passage 2 focuses on a broader overview of deer management, highlighting challenges and opportunities in the field.
2. Passage 1 primarily discusses the practical aspects of deer management, such as planning, surveys, and fencing, whereas passage 2 

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management and conservation.
2. They both mention the importance of monitoring and assessing deer populations and habitats to inform management decisions.
3. Both passages highlight the need for data-driven approaches to manage deer populations effectively.

**DIFFERENCES**

1. Passage 1 focuses on the Northumberland Forest Estate (NFE) and its specific challenges, while Passage 2 discusses deer management in a broader context, including lowland and urban areas.
2. Passage 1 emphasizes the importance of culling deer to achieve management objectives, whereas Passage 2 mentions deer management planning and control measures without explicitly mentioning culling.
3. Passage 1 provides more detail on specific methods for monitoring and assessing deer populations and habitats (e.g., dung counts, thermal imaging), while Passage 

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management and its relationship with forestry, ecology, and conservation.
2. They mention the use of fencing as a tool in deer management, including internal fences, perimeter fences, and electric fencing.
3. Both passages emphasize the importance of minimizing negative impacts on landscapes and ecosystems.

**DIFFERENCES**

1. Passage 1 is more focused on the specific approach taken by FES (Forestry England and Scotland) to deer management, including their use of fencing, culling, and planting of palatable tree species.
2. Passage 2 provides a broader overview of deer management and its relationship with conservation, ecology, and climate change resilience. It also defines key terms like biodiversity, carbon sequestration, and ecosystem approach.
3. Passage 1 includes more specific details about FES's efforts, such as th

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss wild deer in Scotland, specifically mentioning Red Deer, Roe Deer, Sika Deer, and Fallow Deer.
2. The first passage provides data on the culling of deer on the NFE (National Forest Estate) for 2010/11 to 2014/15, while the second passage discusses the management of wild deer in Scotland, including population estimates and land manager roles.
3. Both passages mention the importance of managing deer populations to mitigate environmental impacts.

**DIFFERENCES**

1. The first passage focuses on specific data and statistics related to deer culling, while the second passage takes a more general approach to discussing deer management in Scotland.
2. The tone of the two passages is different, with the first passage presenting factual information and the second passage providing context and guidance for managing wild deer.
3. The sec

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management in Scotland.
2. They both mention different species of deer (red deer, roe deer, Sika deer, fallow deer).
3. Both passages emphasize the importance of monitoring and managing deer populations to minimize damage impacts.

**DIFFERENCES**

1. Tone: Passage 1 has a more formal tone, as it is likely an official report or document, while Passage 2 appears to be a more general educational text.
2. Focus: Passage 1 focuses on the management practices and regulations related to out-of-season and night shooting of deer in Scotland, whereas Passage 2 provides background information on deer ecology, population estimates, and the legal framework for deer management.
3. Scope: Passage 1 is specific to a particular area (the NFE) and discusses the culling of deer as part of deer management practices, while Passage 2 takes a 

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management in Scotland, specifically highlighting challenges and concerns related to deer-vehicle collisions.
2. They both mention the importance of humane treatment of injured or dying deer.
3. The passages share a concern for public safety, mentioning the risk of accidents caused by deer on roads.

**DIFFERENCES**

1. Passage 1 focuses more on specific management plans and protocols for dealing with winter incursions and road traffic accidents involving deer. Passage 2 provides a broader overview of the importance of deer in Scotland's urban and natural environments.
2. While both passages touch on deer welfare, Passage 1 goes into greater detail about the methods and options for humane despatch, whereas Passage 2 mentions specific welfare problems faced by deer, such as entanglement in wire fences and choking on plasti

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management in Scotland, specifically mentioning Sika deer, Red deer, Feral boar, and Grey squirrels.
2. They both mention the importance of managing deer populations to reduce impacts on the environment, agriculture, and social interests.
3. The passages share a common goal of reducing damage caused by deer, with Passage 1 stating that FES aims to reduce expansion of Sika deer and Passage 2 mentioning the need to manage deer to meet a wide range of objectives.

**DIFFERENCES**

1. Tone: Passage 1 has a more formal and technical tone, while Passage 2 is written in a more conversational style.
2. Focus: Passage 1 focuses on specific issues related to Sika deer and Feral boar management on the National Forest Estate (NFE), whereas Passage 2 provides an overview of deer management in Scotland, including the ecology of differe

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages mention deer management and conservation efforts.
2. They both discuss the importance of recreational stalking and its contribution to the local economy.
3. Both passages highlight the need for competent and cost-effective deer management actions.

**DIFFERENCES**

1. Passage 1 is specific to a particular organization (FES) and provides details on their forest protection activities, whereas Passage 2 appears to be more general in scope and focus.
2. Passage 1 provides concrete numbers and statistics regarding recreational stalking, such as the number of trips per year, while Passage 2 does not provide similar data.
3. Passage 2 has a broader focus, discussing deer management in various settings (urban, woodland, open land) and its impacts on agriculture and forestry, whereas Passage 1 is primarily concerned with forest protection and 

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss the management and supply of venison (deer meat) in Scotland.
2. They both mention the importance of sustainability, particularly in terms of ensuring a healthy venison industry and promoting environmental objectives.
3. Both passages highlight the need for collaboration and partnership working, with Passage 1 emphasizing its work with a game dealer to promote Scottish wild venison, while Passage 2 encourages greater consideration between those exercising access rights and those undertaking deer management.

**DIFFERENCES**

1. Tone: Passage 1 is more focused on the practical aspects of supplying venison, providing statistics and details about carcass supply and marketing efforts. Passage 2 has a broader focus on deer management and conservation, with a stronger emphasis on environmental and social issues.
2. Scope: Passage 1 

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management and its importance.
2. They both mention the need to balance deer management with environmental, social, and economic considerations.
3. Both passages emphasize the importance of engaging with local communities and relevant organizations in deer management planning.

**DIFFERENCES**

1. Passage 1 focuses more on the practical aspects of deer management, such as culling methods, scheduling, and monitoring, while Passage 2 takes a more holistic approach, emphasizing the need to diversify products, increase opportunities for people to engage with deer, and raise awareness about road safety issues.
2. Passage 1 places a stronger emphasis on the role of the Forestry Commission (FC) in deer management, mentioning Operational Guidance and industry best practice. Passage 2 does not mention the FC specifically, but rath

ANSWER-------------------------------------------
Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss deer management in Scotland, highlighting its importance for maintaining a healthy environment.
2. They both mention the need to balance deer populations with other public interests, such as economic development and social well-being.
3. Both passages emphasize the importance of considering the role of deer in ecosystems and their impacts on biodiversity.

**DIFFERENCES**

1. Passage 1 focuses more on the practical aspects of deer management, including the production of wild venison, timber, and ecosystem services, whereas Passage 2 takes a broader approach, discussing the public interest in deer management.
2. Passage 1 highlights the importance of an integrated land management approach, whereas Passage 2 emphasizes the need to consider biodiversity objectives and the role of deer in ecosystems.
3. The tone of Passage 1 is mo

In [12]:
answers

['Here are 3 similarities and 3 differences between the two passages:\n\n**Similarities:**\n\n1. Both passages discuss deer management, specifically in a Scottish context (although Passage 2 mentions "lowland and urban deer" without specifying Scotland).\n2. They both highlight the importance of collaboration and integrated approaches to deer management, with Passage 1 emphasizing the need for an "integrated and collaborative approach" and Passage 2 stating that deer management planning should be "co-ordinated, make available and use current data".\n3. Both passages touch on the theme of wildlife crime and disturbance, with Passage 1 mentioning "Wildlife crime and disturbance" in its glossary and Passage 2 not explicitly mentioning it but discussing public perception of urban and lowland deer.\n\n**Differences:**\n\n1. Tone and scope: Passage 1 has a more formal and detailed tone, outlining specific strategies and plans for deer management on the National Forest Estate (NFE). Passage 2

In [56]:
def recursive_summariser(
    text: str,
    model: str ='llama3:8b',
    input_chunk_tokens: int = 5000, # characters not tokens
    chunk_overlap: int = 20,
    compression_ratio: float = 5.0,
    max_summary_length: int = 5000, # characters not tokens
):
    input_length = len(text)
    print("Input_text_length: " + str(len(text)))
    # see if we can achieve our desired compressions ratio with a less agressive compression ratio
    required_compression = input_length / max_summary_length
    if required_compression < compression_ratio:
        # this could be a higher compression ration than desired, but unlikely
        # it should ensure sufficient compression on late rounds to make the last round of compression close to optimal
        compression_ratio = required_compression * 1.3 
        print(f"changed compressions ratio to : {compression_ratio}")
    # split the passage based on the token length
    text_splitter = langchain.text_splitter.RecursiveCharacterTextSplitter(
        chunk_size=input_chunk_tokens, # characters not words
        chunk_overlap=chunk_overlap,
        # add full stops and commas to the separators to try to get the most sensible splitting 
        separators=["\n\n", "\n", ".", ",", " ", ""],
        length_function=len,
    #     is_separator_regex=False,
    )
#     text_token_count = text_splitter.count_tokens(text)
#     print("Starting tokens for this recursion:" + str(text_token_count))
    text_chunks = text_splitter.split_text(text)
    # handle the edge case that we are already down to a chunk smaller than the chunk size
    if len(text_chunks) <= 1:
        return text
    # summarise each chunk in turn
    responses = []
    for chunk in text_chunks:
        word_count = len(chunk.split(" "))
        desired_word_count = int(word_count//compression_ratio)
        print(f"chunk length: {len(chunk)}, word count: {word_count}, desired word count: {desired_word_count}")
        query = f"""
You are a professional document summariser
Please summarise the following text from {word_count} words, down to {desired_word_count} words. 
Be careful to retain as much of the overall meaning of the text as possible in your summary. 
Include nothing but the summary in your reply. Do not say how many words it is summarised to or mention that it is a summary.

Text:
{chunk}
"""
        response = ollama.chat(model='llama3:8b', messages=[
          {
            'role': 'user',
            'content': query,
          },
        ])
        response_text = response['message']['content']
        responses.append(response_text)
    
    # concatenate the chunks
    summary = "\n".join(responses)
    print("Summary length in characters: " + str(len(summary)))
    
    # check summary length
    summary_splitter = langchain.text_splitter.RecursiveCharacterTextSplitter(
        chunk_size=max_summary_length, # characters not words
        chunk_overlap=0,
        length_function=len,
    #     is_separator_regex=False,
    )
#     summary_token_count = summary_splitter.count_tokens(summary)
    summary_chunks = text_splitter.split_text(summary)
    # if summary is short enough return it
    if len(summary_chunks) <= 1:
        return summary
    # otherwise call recursive_summariser on the summary
    else:
        return recursive_summariser(summary)
    
def tidy_text(text: str):
    query = f"""
You are a professional copy editor
Please edit the following text ensuring that it maintains a consistent grammatical style throughout.
Remove any references to summarisation or word counts.
Ensure you preserve all the factual meaning, but remove any incoherent text.
Break the text into meaningful paragraphs.
Your edited text should be approximately the same number of words as the original.
Your edited text must be written in British rather than American English.
Include nothing but the edited text in your reply. Do not mention that it is edited.

Text:
{text}
            """
    edited_summary = ollama.chat(model='llama3:8b', messages=[
      {
        'role': 'user',
        'content': query,
      },
    ])
    return edited_summary['message']['content']

In [57]:
raw_summary_1 = recursive_summariser(text_1)

Input_text_length: 70352
chunk length: 5000, word count: 816, desired word count: 163
chunk length: 5000, word count: 809, desired word count: 161
chunk length: 5000, word count: 826, desired word count: 165
chunk length: 5000, word count: 807, desired word count: 161
chunk length: 4999, word count: 862, desired word count: 172
chunk length: 5000, word count: 782, desired word count: 156
chunk length: 5000, word count: 798, desired word count: 159
chunk length: 4999, word count: 807, desired word count: 161
chunk length: 5000, word count: 784, desired word count: 156
chunk length: 4998, word count: 759, desired word count: 151
chunk length: 5000, word count: 749, desired word count: 149
chunk length: 4999, word count: 791, desired word count: 158
chunk length: 5000, word count: 765, desired word count: 153
chunk length: 5000, word count: 748, desired word count: 149
chunk length: 632, word count: 95, desired word count: 19
Summary length in characters: 12013
Input_text_length: 12013
ch

In [58]:
print(raw_summary_1)

Scotland's Wild Deer: A National Approach aims to balance public benefits with private objectives in deer management, prioritizing healthy ecosystems, biodiversity, and recreational opportunities. The approach sets out challenges and priorities for the next five years, including adopting higher standards of deer management planning and delivery. Deer play an important role in Scotland's economy, providing food, recreational opportunities, and ecosystem services, but can have negative impacts on the environment, forestry, agriculture, and public safety if not managed sustainably. The approach focuses on balancing economic, environmental, and social benefits while maintaining healthy ecosystems, with land managers considering ecosystem services and deer impacts to achieve sustainable economic growth.
Deer management in Scotland aims to balance ecological, cultural, and economic needs while ensuring public welfare. The plan focuses on minimizing non-native deer species spread, safeguardin

In [59]:
print(len(raw_summary_1))

3160


In [60]:
tidy_summary_1 = tidy_text(raw_summary_1)

In [61]:
print(tidy_summary_1)

Scotland's Wild Deer: A National Approach aims to balance public benefits with private objectives in deer management, prioritising healthy ecosystems, biodiversity, and recreational opportunities. The approach sets out challenges and priorities for the next five years, including adopting higher standards of deer management planning and delivery. Deer play an important role in Scotland's economy, providing food, recreational opportunities, and ecosystem services, but can have negative impacts on the environment, forestry, agriculture, and public safety if not managed sustainably.

The approach focuses on balancing economic, environmental, and social benefits while maintaining healthy ecosystems, with land managers considering ecosystem services and deer impacts to achieve sustainable economic growth. Deer management in Scotland aims to balance ecological, cultural, and economic needs while ensuring public welfare. The plan prioritises minimising non-native deer species spread, safeguard

In [62]:
print(len(tidy_summary_1))

3015


In [63]:
print(len(text_1))

70352


In [64]:
raw_summary_2 = recursive_summariser(text_2)

Input_text_length: 88149
chunk length: 5000, word count: 723, desired word count: 144
chunk length: 5000, word count: 887, desired word count: 177
chunk length: 5000, word count: 752, desired word count: 150
chunk length: 5000, word count: 795, desired word count: 159
chunk length: 4999, word count: 749, desired word count: 149
chunk length: 5000, word count: 791, desired word count: 158
chunk length: 5000, word count: 777, desired word count: 155
chunk length: 4999, word count: 771, desired word count: 154
chunk length: 5000, word count: 808, desired word count: 161
chunk length: 4999, word count: 780, desired word count: 156
chunk length: 5000, word count: 795, desired word count: 159
chunk length: 5000, word count: 814, desired word count: 162
chunk length: 4999, word count: 862, desired word count: 172
chunk length: 5000, word count: 807, desired word count: 161
chunk length: 5000, word count: 776, desired word count: 155
chunk length: 5000, word count: 804, desired word count: 160

In [65]:
print(raw_summary_2)

The National Forest Estate manages deer populations to protect the environment, biodiversity, and natural heritage. The Forestry Commission Scotland's Forest Enterprise Scotland agency works with stakeholders to manage deer and their impacts through an integrated approach. The goal is to maintain healthy wild deer populations, manage deer impacts across the NFE, and contribute to Scotland's Wild Deer: a National Approach. The Scottish deer population consists of two native species (Red and Roe) and two introduced species (Sika and Fallow), with an estimated total population of around 777,000. The organization aims to protect and enhance the environment by managing deer populations, ensuring long-term sustainability of Scotland's natural resources. Deer management practices balance environmental, social, and economic benefits, prioritizing conservation of Scotland's natural heritage while balancing competing demands for land use.
The Forestry Commission's deer management strategy involv

In [66]:
print(len(raw_summary_2))

3645


In [67]:
tidy_summary_2 = tidy_text(raw_summary_2)

In [68]:
def compare_summaries(
    summary_1: str,
    summary_2: str,
    model:str = 'llama3:8b',
    max_length:int = 10000,
):
    query = f"""
Please describe the similarities and differences between the following two passages. 
Focus on similarites and differences of content and emphasis, rather than style or phrasing.
Provide your answers as two numbered lists labelled SIMILARITIES and DIFFERENCES.

Passage 1:
{summary_1}

passage 2:
{summary_2}
"""
    if len(query) > max_length:
        print("Warning query may exceed context window")
    response = ollama.chat(model=model, messages=[
      {
        'role': 'user',
        'content': query,
      },
    ])
    answer = response['message']['content']
    return(answer)

In [70]:
comparison = compare_summaries(tidy_summary_1, tidy_summary_2)
print(comparison)

Here are the similarities and differences between the two passages:

**SIMILARITIES**

1. Both passages discuss the management of wild deer in Scotland, with a focus on balancing environmental, social, and economic benefits.
2. They both emphasize the importance of sustainable deer management practices to protect Scotland's natural heritage and ecosystems.
3. Both passages mention the need for effective communication, training, and financial support to deliver successful deer management strategies.

**DIFFERENCES**

1. Passage 1 focuses on a national approach to deer management in Scotland, while Passage 2 describes the specific actions of Forestry Commission Scotland (FCS) and Forest Enterprise Scotland (FES) in managing deer populations.
2. Passage 1 covers a broader range of topics, including the importance of ecosystem services, cultural heritage, and human disease risks, whereas Passage 2 is more focused on FCS/FES's role in deer management and its specific strategies for achievin

In [71]:
def compare_strategies(
    summary_1: str,
    summary_2: str,
    model:str = 'llama3:8b',
    max_length:int = 10000,
):
    query = f"""
You are an enviromental strategy expert.
Please describe the similarities and differences in the strategies outlined by the following passages. 
Focus on similarites and differences of strategy, rather than style, tone or phrasing.
Provide your answers as two numbered lists labelled SIMILARITIES and DIFFERENCES.

Passage 1:
{summary_1}

passage 2:
{summary_2}
"""
    if len(query) > max_length:
        print("Warning query may exceed context window")
    response = ollama.chat(model=model, messages=[
      {
        'role': 'user',
        'content': query,
      },
    ])
    answer = response['message']['content']
    return(answer)

In [72]:
strategy_comparison = compare_strategies(tidy_summary, tidy_summary_2)
print(strategy_comparison)

Here are the similarities and differences in strategy between the two passages:

**SIMILARITIES**

1. Both strategies prioritize balancing environmental, social, and economic benefits.
2. They both aim to manage deer populations sustainably, considering ecosystem services, biodiversity, and human well-being.
3. Both strategies emphasize the importance of conservation of Scotland's natural heritage and ecosystems.
4. They both recognize the need for collaboration with stakeholders and partnerships to achieve sustainable deer management.
5. Both strategies prioritize deer welfare, including minimizing non-native species spread and ensuring strong skill bases in management.

**DIFFERENCES**

1. Passage 1 focuses on a national approach to deer management, while Passage 2 is specific to Forestry Commission Scotland's Forest Enterprise Scotland (FES) agency and its deer management strategy.
2. Passage 1 places more emphasis on promoting sustainable economic development, recreational opportun