# Large Language Model Task

This report was compiled after meeting with 12 members of council staff from housing teams within the London Borough of Tower Hamlets. LLMs are employed to summarise the meeting content and extract keywords. This notebook primarily explores the use of artificial intelligence and automated processes as an intervention in our user research.

ADD TABLE HERE (llm excel)

## Software

**Python** is the programming language used in this report to run the various models. Python's built-in features and community-developed packages let us run the models with a few lines of code. Before using the code and modules, we complete some preparatory steps: downloading all the required packages and importing them into the Python session.

## Preparation

### Download Libraries

In [1]:
!pip install bert-extractive-summarizer
!pip install python-docx
!pip install datasets
!pip install sentencepiece
!pip install Rouge

!pip install sentence-transformers
!pip install typing-extensions

!pip install yake
!pip install rake-nltk
!pip install keybert
!pip install summa


### Import Libraries

In [20]:
#Basic
from docx import Document
import pandas as pd
import matplotlib.pyplot as plt
import datasets
import numpy as np

#Extractive Summarisation
import re
import nltk
import sentencepiece as spm
from summarizer import Summarizer
from nltk.tokenize import sent_tokenize
from transformers import pipeline, set_seed
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers import PegasusTokenizer, PegasusForConditionalGeneration
from rouge import Rouge
from summa import keywords

# Abstractive Summarisation
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Keywords
import yake
from keybert import KeyBERT
from summa import keywords
from sklearn.feature_extraction.text import TfidfVectorizer

## Key Definition

A **Large Language Model (LLM)** is a type of language processing technology used for text-related tasks such as summarisation, text extraction, text classification, and dialogue. LLMs can therefore serve our purpose of extracting information from the meeting transcripts.

There are two types of summarisation with LLMs:

1. **Extractive Summarisation**: this approach extracts the most important phrases and lines from the documents.

2. **Abstractive Summarisation**: this approach generates new phrases and sentences that differ from the original wording, while aiming to retain the key content of the original document.
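To make the extractive idea concrete, the following is a minimal sketch, not the method used later in this report. It assumes a crude word-frequency score and a naive regex sentence splitter; the function name `extractive_summary` and the toy text are hypothetical. The point it illustrates is that an extractive summary only ever returns sentences copied verbatim from the source.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive split on ., ! or ? followed by whitespace (an assumption; real
    # transcripts would need a proper tokenizer such as nltk.sent_tokenize).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole text act as a crude importance signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Rank sentence indices by score, keep the top ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]


# Hypothetical toy text, not taken from the meeting transcripts.
toy_text = (
    "We use housing data at trigger points. "
    "The weather was nice. "
    "Housing data feeds our housing forecasts."
)
summary = extractive_summary(toy_text, num_sentences=2)
print(summary)
```

An abstractive model, by contrast, may produce sentences that appear nowhere in the input, which is why its output needs checking against the source.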

This case tests both of the types listed above. Given the variety of popular LLMs available, this report first compares several models to determine which is the most suitable for the task. The most suitable model is then used to produce the summaries from which insights are derived. The testing process for both types of summary is set out below.

## Summarisation

### Extractive Summarisation

#### **Model Comparison**

Four widely used models are considered for this task: Pegasus, BART, T5 and GPT-4.

| **Model**      | **Underlying Principle**                                                                                                                                                                                                                                                                           | **Advantages for Summarisation**                                                                                                                                                                                                                                                                              | **Disadvantages for Summarisation**                                                                                                                                                                                                                       |
|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Pegasus**    | - Pre-training with **Gap-Sentence Generation (GSG)**: Sentences are masked, and the model learns to generate missing sentences, which simulates summarisation tasks.                                                                       | - **Specifically designed** for summarisation, with pre-training objective closely aligned to the task.<br>- **Handles long documents** well.                                             | - **Specialised** for summarisation, so less flexible for other tasks.                                 |
| **BART**       | - **Denoising autoencoder** trained to recover original text from corrupted input. | - Pre-trained on **large datasets**, providing a strong base for summarisation tasks. | - Summaries might occasionally lack **coherence** or **factual accuracy** due to the free-form generation style. |
| **T5**         | - **Text-to-Text Transfer Transformer**: Frames all tasks (classification, summarisation, etc.) as a **text-to-text** problem.<br>- Pre-trained on the **Colossal Clean Crawled Corpus (C4)** dataset using a denoising objective.                                                                  | - Extremely **flexible** for many LLM tasks, including summarisation. | - May not perform as well on **highly specialised summarisation tasks** compared to models specifically tuned for summarisation.<br>- Requires careful prompt design for optimal summarisation results. |
| **GPT-4**      | - **Large language model** based on a **transformer architecture**.<br>- Pre-trained with **unsupervised learning** on vast amounts of diverse text data. | - Handles **open-domain** and **complex texts** well.<br>- Delivers excellent thematic analysis. | - Requires **API** access and is computationally demanding.<br>- Cannot be run locally. |



Because GPT-4 is an online model, which carries data privacy risks, this study selects **Pegasus, BART and T5** for comparison.

To test the performance of Pegasus, BART, and T5 models on Extractive Summarisation, this study uses content from a randomly selected speaker, Matthew Pullen, as a test sample.  

**Everything after this point would be moved to results- separate the above and add in methodology**

## LLM Processing and Results

#### **Code Tab**

##### **Doc Processing**

In [3]:
def extract_text_from_docx(docx_file):
    doc = Document(docx_file)
    full_text = []
    for para in doc.paragraphs:
        if para.text.strip():
            full_text.append(para.text)
    return '\n'.join(full_text)

#########################################

def extract_speaker_content(text, speakers):
    speaker_contents = {speaker: "" for speaker in speakers}
    paragraphs = text.split('\n')
    
    current_speaker = None

    for paragraph in paragraphs:
        for speaker in speakers:
            if paragraph.startswith(speaker + ":"):
                current_speaker = speaker
                content = paragraph[len(speaker) + 1:].strip()
                speaker_contents[speaker] += content + "\n"
                break
        else:
            if current_speaker:
                speaker_contents[current_speaker] += paragraph.strip() + "\n"

    return speaker_contents

#########################################

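# Extract each speaker's speech content by name and store it in a string.
# Note: this redefinition replaces the version above, so paragraphs that do
# not start with a speaker's name are skipped rather than appended.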
def extract_speaker_content(text, speakers):
    speaker_contents = {speaker: '' for speaker in speakers}  # Initialize with empty strings instead of lists
    paragraphs = text.split('\n')
    
    for paragraph in paragraphs:
        # Matches paragraphs that begin with the speaker's name
        for speaker in speakers:
            if paragraph.startswith(speaker + ":"):
                # Extract the speaker's speech and remove the name part
                content = paragraph[len(speaker) + 1:].strip()  # Remove the name and colon
                speaker_contents[speaker] += content + ' '  # Append the content to the corresponding speaker's string
    
    return speaker_contents

#########################################

if __name__ == "__main__":
    meeting_text = extract_text_from_docx('Group_1.docx')  # Path to the file provided
    
    # Identify speakers who need attention
    target_speakers = ['Matthew Pullen', 'Matt Newby', 'Camelia Smith', 'Megan Rourke']
    
    # Extract the speech content of each speaker and store it in a string field
    speaker_contents = extract_speaker_content(meeting_text, target_speakers)
    
    # Output the speech content of each speaker
    for speaker, content in speaker_contents.items():
        print(f"{speaker}'s content:\n{content.strip()}\n")
        print("-" * 50)
    
    # Store the speeches as strings
    matthew_pullen_content = speaker_contents['Matthew Pullen'].strip()
    matt_newby_content = speaker_contents['Matt Newby'].strip()
    camelia_smith_content = speaker_contents['Camelia Smith'].strip()
    megan_rourke_content = speaker_contents['Megan Rourke'].strip()

Matthew Pullen's content:
Shall I start? I guess there's different ways of looking at it. We use housing data at different trigger points, maybe it's the place to start so all the way from maybe we want to know when something is in pre-app, in an application, it's got a decision, so both if it's got a resolution to grant through planning committee and if it's got a final decision with the section 106. And then moving further through the process, bigger points at when they're when they're starting to look at conditions when they're having conditions signed off, including specific conditions within permission then commencement on site, commencement can mean a different thing depending on. We we've got plenty. But yeah, we'd be happy to share with you and take you through. I mean, look, we do stuff using Excel spreadsheets because we haven't had anything more sophisticated than. I don't think it hopefully this can be of help to you and it can fit into the project rather than being awkward
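The cell above defines `extract_speaker_content` twice, and the later definition is the one that runs. The following sketch illustrates how the two behaviours differ on a toy transcript. It is an assumption-laden illustration, not the notebook's code: the function name `split_by_speaker`, the `keep_continuations` flag, and the speakers "Alice" and "Bob" are all hypothetical.

```python
def split_by_speaker(text, speakers, keep_continuations=True):
    """Group transcript paragraphs by speaker.

    keep_continuations=True mirrors the first definition above (unlabelled
    paragraphs are attributed to the most recent tracked speaker);
    keep_continuations=False mirrors the second (unlabelled paragraphs are skipped).
    """
    contents = {speaker: "" for speaker in speakers}
    current = None
    for paragraph in text.split("\n"):
        # Find the tracked speaker, if any, whose "Name:" prefix starts the paragraph.
        matched = next((s for s in speakers if paragraph.startswith(s + ":")), None)
        if matched is not None:
            current = matched
            contents[matched] += paragraph[len(matched) + 1:].strip() + " "
        elif keep_continuations and current is not None and paragraph.strip():
            contents[current] += paragraph.strip() + " "
    return {s: c.strip() for s, c in contents.items()}


# Hypothetical toy transcript with one unlabelled continuation paragraph.
toy = "Alice: First point.\nStill Alice talking.\nBob: A reply.\nAlice: Second point."

with_cont = split_by_speaker(toy, ["Alice", "Bob"], keep_continuations=True)
without_cont = split_by_speaker(toy, ["Alice", "Bob"], keep_continuations=False)
print(with_cont["Alice"])
print(without_cont["Alice"])
```

Neither variant is safe by default: keeping continuations can attribute an untracked speaker's paragraph to the previous tracked speaker, while dropping them can lose multi-paragraph answers. Which is appropriate depends on how the transcript was exported.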

##### **BART**

In [4]:
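# Summarizer comes from bert-extractive-summarizer; called with no arguments it
# loads a BERT model rather than BART (see the Ref link below), so the "BART"
# label used in this report refers to this BERT-based extractive summarizer.
model = Summarizer()
result = model(matthew_pullen_content, min_length=60, num_sentences=5)
BART = ''.join(result)
print(BART)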


We use housing data at different trigger points, maybe it's the place to start so all the way from maybe we want to know when something is in pre-app, in an application, it's got a decision, so both if it's got a resolution to grant through planning committee and if it's got a final decision with the section 106. With different size, different types of units as well and we've done that off of reality rather than using planning policies because planning policy doesn't always get adhered to. And so, we've kind of had to prioritise where we put our time and energy. This tool really if you're if you're trying to forecast housing completions over the next three years, well, you only need stuff that's in the planning system. I was going to say certainly on some things we've done, we've had to kind of do something that works for most and then like I said, we've taken off the little ones and just put them in one windfall like, similarly, we've had to be a bit bespoke for some of the really big

Ref: https://github.com/dmmiller612/bert-extractive-summarizer

##### **T5**

In [5]:
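pipe = pipeline('summarization', model='t5-large', min_length=100)
T5 = pipe(matthew_pullen_content)  # list holding one dict: [{'summary_text': ...}]
# Note: str(item) serialises the whole dict, so `t5` also contains the
# "{'summary_text': ...}" wrapper; T5[0]['summary_text'] would give the bare text.
t5 = ', '.join(str(item) for item in T5)
T5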


[{'summary_text': "if you're trying to forecast housing completions over the next three years, it's going to be a bit more complicated . we've got a lot of assumptions that we use, but they're not necessarily based on huge amounts of data . it would be useful to have that information in a system that's searchable and can be fed into other models . if we can do that, then we'll be able to do a better job of forecasting. I think we're"}]
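The output above is a list containing one dictionary. The sketch below uses a shortened mock of that output (not a real model call) to show the difference between converting each item with `str()` and reading the `summary_text` field directly. The variable names `mock_output`, `wrapped` and `bare` are hypothetical.

```python
# Mock of the list-of-dicts shape shown in the T5 output above (text shortened).
mock_output = [{"summary_text": "we've got a lot of assumptions that we use ."}]

# str(item) keeps the dictionary syntax, including the key name and quotes.
wrapped = ", ".join(str(item) for item in mock_output)

# Reading the field directly yields only the summary text.
bare = " ".join(item["summary_text"] for item in mock_output)

print(wrapped)
print(bare)
```

The distinction matters when the string is passed to an overlap metric, because the extra `summary_text` token and punctuation in the wrapped form would be counted as part of the summary.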

##### **Pegasus**

In [6]:
# Pegasus model and SentencePiece tokenizer
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-cnn_dailymail")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-cnn_dailymail")

# Tokenize the text
tokens = tokenizer(matthew_pullen_content, truncation=True, padding="longest", return_tensors="pt")

# Generate summary
summary_ids = model.generate(tokens["input_ids"], max_length=1024, min_length=100, num_beams=4, early_stopping=True)

# Decode and output the summary
PEGASUS = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(PEGASUS)


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-cnn_dailymail and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



We use housing data at different trigger points, maybe it's the place to start .<n>We layer in assumptions about when things might get permission .<n>We tend to layer in assumptions on how long it would take to build something that's under 200 units .<n>There's definitely a hole in our data for our CIL forecasting when it comes to commercial .<n>But we haven't got anything anywhere near as sophisticated as we've got for this holes in our housing approach .


Ref: https://github.com/rohan-paul/MachineLearning-DeepLearning-Code-for-my-YouTube-Channel/blob/master/NLP/Text_Summarization_%20BART%20_T5_Pegasus.ipynb
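The Pegasus output above contains `<n>` markers and a space before each full stop. Assuming the `<n>` markers act as sentence separators (which is how they appear in this output), a small post-processing step can turn the raw string into a clean list of sentences. The helper name `tidy_pegasus` is hypothetical, and the sample string is a shortened copy of the output above.

```python
import re


def tidy_pegasus(text):
    """Split on '<n>' markers and remove the space before punctuation."""
    sentences = [s.strip() for s in text.split("<n>") if s.strip()]
    # "points ." -> "points."
    return [re.sub(r"\s+([.,!?])", r"\1", s) for s in sentences]


raw = (
    "We use housing data at different trigger points .<n>"
    "We layer in assumptions about when things might get permission ."
)
tidy = tidy_pegasus(raw)
for sentence in tidy:
    print("-", sentence)
```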

#### **Results**

**BART identified the following as the five most important sentences from Matthew's speech:**

*"We use housing data at different trigger points, maybe it's the place to start so all the way from maybe we want to know when something is in pre-app, in an application, it's got a decision, so both if it's got a resolution to grant through planning committee and if it's got a final decision with the section 106. With different size, different types of units as well and we've done that off of reality rather than using planning policies because planning policy doesn't always get adhered to. And so, we've kind of had to prioritise where we put our time and energy. This tool really if you're if you're trying to forecast housing completions over the next three years, well, you only need stuff that's in the planning system. I was going to say certainly on some things we've done, we've had to kind of do something that works for most and then like I said, we've taken off the little ones and just put them in one windfall like, similarly, we've had to be a bit bespoke for some of the really big ones as well, and so something as complex as Lemon road or Wood Wharf."*

**T5 generated the following summary of Matthew's speech:**

*"if you're trying to forecast housing completions over the next three years, it's going to be a bit more complicated . we've got a lot of assumptions that we use, but they're not necessarily based on huge amounts of data . it would be useful to have that information in a system that's searchable and can be fed into other models . if we can do that, then we'll be able to do a better job of forecasting. I think we're."*

**Pegasus generated the following five-sentence summary of Matthew's speech:**

*"We use housing data at different trigger points, maybe it's the place to start. We layer in assumptions about when things might get permission. We tend to layer in assumptions on how long it would take to build something that's under 200 units. There's definitely a hole in our data for our CIL forecasting when it comes to commercial. But we haven't got anything anywhere near as sophisticated as we've got for this holes in our housing approach."*

#### **Evaluation with Rouge**

To verify the quality of the results, this report uses the **ROUGE** metric, which is designed for assessing text summarisation tasks. ROUGE scores range from 0 to 1; a score closer to 1 indicates a higher degree of overlap between the model-generated summary and the manually produced reference summary.

For the Rouge evaluation, a reference summary is necessary. This report has set the following five sentences as the reference summary, each representing one of the five critical aspects of the overall requirements of the Housing Data Flow project.

1. **Trigger points of housing data needed**: *"We use housing data at different <u>trigger points</u> maybe it's the place to start so all the way from maybe we want to know when something is in <u>pre-app</u> in an application it's got a <u>decision</u> so both if it's got a resolution to grant through <u>planning committee</u> and if it's got a <u>final decision</u> with the <u>section 106</u>."*

2. **Ongoing monitoring of housing data**: *"And through the build process as they're making progress with building and <u>occupation</u>, we then use the points through <u>Section 106 monitoring</u> which go on forever but possibly there's not a need to pick them up in here because we have a section 106 monitoring system that might do that."*

3. **The role of data - prediction**: *"In the <u>future</u> we'll also use it to <u>forecast workload</u> as well and as we're expecting developments to move through the process we know when workload is going to be coming in too."*

4. **Platform and dashboard**: *"We've got plenty of <u>spreadsheets</u> for forecasting development moving through the system at the moment so we can do what I was mentioning before."*

5. **Key data**: *"Really the ones that I've just mentioned depending on what it is sometimes we need to look at this by <u>unit</u>, sometimes we need to look at it by <u>square metre</u>, and there is variation depending on the type of home it is as well that's both by <u>affordable and market</u> but also by the <u>number of bedrooms</u> as well."*

##### **Code Tab**

In [7]:
refer = "We use housing data at different trigger points maybe it's the place to start so all the way from maybe we want to know when something is in pre-app in an application it's got a decision so both if it's got a resolution to grant through planning committee and if it's got a final decision with the section 106. And through the build process as they're making progress with building and occupation, we then use the points through Section 106 monitoring which go on forever but possibly there's not a need to pick them up in here because we have a section 106 monitoring system that might do that. In the future we'll also use it to forecast workload as well and as we're expecting developments to move through the process we know when workload is going to be coming in too. We've got plenty of spreadsheets for forecasting development moving through the system at the moment so we can do what I was mentioning before. Really the ones that I've just mentioned depending on what it is sometimes we need to look at this by unit, sometimes we need to look at it by square metre, and there is variation depending on the type of home it is as well that's both by affordable and market but also by the number of bedrooms as well."

In [10]:
rouge = Rouge()

# Compute ROUGE scores
t5_scores = rouge.get_scores(t5, refer, avg=True)
bart_scores = rouge.get_scores(BART, refer, avg=True)
pegasus_scores = rouge.get_scores(PEGASUS, refer, avg=True)

# Print scores
print("T5 ROUGE Scores:", t5_scores)
print("BART ROUGE Scores:", bart_scores)
print("PEGASUS ROUGE Scores:", pegasus_scores)

T5 ROUGE Scores: {'rouge-1': {'r': 0.23622047244094488, 'p': 0.46875, 'f': 0.3141361211984321}, 'rouge-2': {'r': 0.02926829268292683, 'p': 0.07692307692307693, 'f': 0.042402822862066335}, 'rouge-l': {'r': 0.1889763779527559, 'p': 0.375, 'f': 0.2513088960675421}}
BART ROUGE Scores: {'rouge-1': {'r': 0.48031496062992124, 'p': 0.4728682170542636, 'f': 0.4765624950003052}, 'rouge-2': {'r': 0.24878048780487805, 'p': 0.2712765957446808, 'f': 0.25954197974218035}, 'rouge-l': {'r': 0.41732283464566927, 'p': 0.4108527131782946, 'f': 0.4140624950003052}}
PEGASUS ROUGE Scores: {'rouge-1': {'r': 0.2204724409448819, 'p': 0.4666666666666667, 'f': 0.29946523628356553}, 'rouge-2': {'r': 0.05365853658536585, 'p': 0.14864864864864866, 'f': 0.07885304269729339}, 'rouge-l': {'r': 0.1968503937007874, 'p': 0.4166666666666667, 'f': 0.26737967478623925}}


The ROUGE scores for each model are as follows:

| Model   | Metric   | Recall    | Precision | F-Score   |
|---------|----------|-----------|-----------|-----------|
| **T5**  | ROUGE-1  | 0.23622   | 0.46875   | 0.31414   |
|         | ROUGE-2  | 0.02927   | 0.07692   | 0.04240   |
|         | ROUGE-L  | 0.18898   | 0.37500   | 0.25131   |
| **BART**| ROUGE-1  | 0.48031   | 0.47287   | 0.47656   |
|         | ROUGE-2  | 0.24878   | 0.27128   | 0.25954   |
|         | ROUGE-L  | 0.41732   | 0.41085   | 0.41406   |
| **PEGASUS** | ROUGE-1 | 0.22047 | 0.46667   | 0.29947   |
|         | ROUGE-2  | 0.05366   | 0.14865   | 0.07885   |
|         | ROUGE-L  | 0.19685   | 0.41667   | 0.26738   |

The ROUGE variants and scores are explained as follows:
* ROUGE-N: Similarity based on overlapping sequences of N consecutive words (N-grams)
* ROUGE-L: Similarity based on the longest common subsequence, which rewards words that appear in the same order
* R (recall): Proportion of N-grams in the reference summary that also appear in the generated summary
* P (precision): Proportion of N-grams in the generated summary that also appear in the reference summary
* F1: Harmonic mean of recall and precision
_______________

* R Score: Number of overlapping N-grams / Total number of N-grams in the reference summary
* P Score: Number of overlapping N-grams / Total number of N-grams in the generated summary
* F1 Score = 2 × (Precision × Recall)/(Precision + Recall)
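The three formulas above can be worked through by hand on a short example. The sketch below is a simplified, count-based version for illustration only; it assumes whitespace tokenisation and lower-casing, and the `rouge` package used in this report may tokenise and count N-grams differently, so its numbers need not match. The function name `rouge_n` and the two toy strings are hypothetical.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: recall, precision and F1 from N-gram overlap."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps, for each N-gram, the smaller of the two counts.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"r": recall, "p": precision, "f": f1}


# 4 candidate words, 7 reference words, 4 overlapping unigrams.
scores = rouge_n("we use housing data", "we use housing data at trigger points")
print(scores)
```

Here recall is 4/7 (four of the seven reference words are covered), precision is 4/4, and F1 is 8/11, which shows how a short summary can score high precision with modest recall.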


The results indicate that **BART** achieved the highest recall and F-score on ROUGE-1, ROUGE-2 and ROUGE-L; Pegasus was marginally higher only on ROUGE-L precision (0.41667 against 0.41085). Consequently, this report now uses the BART model to perform Extractive Summarisation on the dialogue content of the 12 participants, highlighting five key points for each member.
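The comparison can also be checked programmatically. The sketch below copies the scores from the table above (rounded to five decimal places) into a nested dictionary and reports which model leads on each metric. The variable names are hypothetical.

```python
# Scores copied from the ROUGE table above, rounded to five decimal places.
scores = {
    "T5": {
        "rouge-1": {"r": 0.23622, "p": 0.46875, "f": 0.31414},
        "rouge-2": {"r": 0.02927, "p": 0.07692, "f": 0.04240},
        "rouge-l": {"r": 0.18898, "p": 0.37500, "f": 0.25131},
    },
    "BART": {
        "rouge-1": {"r": 0.48031, "p": 0.47287, "f": 0.47656},
        "rouge-2": {"r": 0.24878, "p": 0.27128, "f": 0.25954},
        "rouge-l": {"r": 0.41732, "p": 0.41085, "f": 0.41406},
    },
    "PEGASUS": {
        "rouge-1": {"r": 0.22047, "p": 0.46667, "f": 0.29947},
        "rouge-2": {"r": 0.05366, "p": 0.14865, "f": 0.07885},
        "rouge-l": {"r": 0.19685, "p": 0.41667, "f": 0.26738},
    },
}

# For every (metric, statistic) pair, find the model with the highest value.
leaders = {}
for metric in ("rouge-1", "rouge-2", "rouge-l"):
    for stat in ("r", "p", "f"):
        leaders[(metric, stat)] = max(scores, key=lambda m: scores[m][metric][stat])

for (metric, stat), model in leaders.items():
    print(f"{metric} {stat}: {model}")
```

With these figures, one model leads on eight of the nine entries, and the exception is ROUGE-L precision.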

#### **Extractive Summarisation Result - BART**

| Speaker            | Contribution |
|---------------------|--------------|
| **Matthew Pullen**  | - We use housing data at different trigger points, maybe it's the place to start so all the way from maybe we want to know when something is in pre-app, in an application, it's got a decision, so both if it's got a resolution to grant through planning committee and if it's got a final decision with the section 106.<br>- With different size, different types of units as well and we've done that off of reality rather than using planning policies because planning policy doesn't always get adhered to.<br>- And so, we've kind of had to prioritise where we put our time and energy.<br>- This tool really if you're if you're trying to forecast housing completions over the next three years, well, you only need stuff that's in the planning system.<br>- I was going to say certainly on some things we've done, we've had to kind of do something that works for most and then like I said, we've taken off the little ones and just put them in one windfall like, similarly, we've had to be a bit bespoke for some of the really big ones as well, and so something as complex as Lemon road or Wood Wharf. |
| **Matt Newby**      | - So, I sort of sit in the planning service as corporate lead, but guess where we use the data or I use the data quite a lot is aligned sort of more with the Plan Making team.<br>- They need to assess how things, how the policies are performing.<br>- So as like an indicator framework at the back which kind of sets out kind of the kind of key bits that we will assess the plan and provide commentary on and some of that is obviously things that housing delivery.<br>- So, it's a useful kind of exercise sort of breakdown the kind of spread of housing delivery across the borough.<br>- I think there is a kind of component of kind of like promotion of this across sort of parts beyond well maybe just with planning as well but beyond that as well I think it would be really beneficial to sort of for people to at least know it exists. |
| **Camelia Smith**   | - So for example, the GLA has a dashboard where they've got the different types of housing, so they've got flat student accommodation, Houses, bungalows, C2, student, shelters, housing blocks.<br>- So, the ones don't make it into the site allocations, the ones that don't go forward, the ones that you've projecting, Considered for a later date Two years later the site is unlocked for various reasons and it's able to be advanced.<br>- You might still find that your three plots come ahead and bring in 800 homes or something.<br>- What's been inherited, and that data stored on that system and that format?<br>- So doctors, GPS and I think having a positive aspects apart from residential would help to balance out the story in terms of development because often it's objections about lack of infrastructure and if we could show that there we are actually increasing the capacity of social infrastructure. |
| **Megan Rourke**    | - So, we're coming from, I guess like a housing and regional client-side perspective.<br>- So, it'd be really great to have like a sort of overall map which shows those things and has adds add out sort of photographs.<br>- It wasn't so much a question, it was just a response to your question.<br>- So it's almost, I don't know, I'd love it if you could click on a site and then it would say this, this, this is this policy will be applicable.<br>- Yeah, I was just gonna say the frustrations that I get, I'll just the sheer amount of human error because we have so many different Excel spreadsheets that unless you are all synonymously updating your spreadsheets, it means that one Excel spread is saying one thing and I want to say one thing and or it's something that's borrowed in the council, which you've got to ask X person about and they appear to be the repository for all of that information. |
| **Steven Heywood**  | - So, I mean for us in terms of using housing data, it's largely about just understanding what kind of delivery we've had in the past and what delivery we're likely to have in the future.<br>- So, the tenure which I think was mentioned on the slide before this one.<br>- And then for things that are further in the future, so sort of beyond five years, again we're just kind of having to make assumptions really and talk to developers as we put them into the site allocations and ask them kind of what do they think the timelines are that they're working on.<br>- No, I don't remember, from my perspective, this has got most of the stuff that we would usually ask for anyway when doing, you know, like I say, housing delivery and housing projections.<br>- I mean, I think basically kind of what I said earlier really in that I think it would just be very useful for us to use the kind of monitoring purposes for demonstrating what we've delivered and how the council is delivering housing. |
| **Hannah Horton**   | - I'm in the development coordination team, so we work closely with developers, but also colleagues and particularly highways and environmental health sort of supporting that construction phase.<br>- My team is a very basic one, just sort of an Excel spreadsheet that pulls data from a few places and but it is things like how many site, there's how many walkabouts with developers we've done how many construction forums.<br>- So it'd be worth you talking to either Sarah Wilkes or Jonathan Morrison.<br>- So usually through section 106, they might be delivering something for the community, such as a new school.<br>- So it'd be interesting to look at something like this and see when the key construction timelines are for some of those major developments and it might be that we sort of try and time these construction forums around them. |
| **Sam Happe**       | - As you can see from behind, I'm Sam, happy from the Council tax valuation part.<br>- And we do work heavily with numbering, so even though we work directly with the developers, we know what the development is called, the development name, as soon as we've got an order, we will then list it as what they will be known as, rather than a development, and we then have to send all that information over to the Inland Revenue Valuation Office because they are the people that band each property and then once they've done that, they will send the information back to us to upload.<br>- So it is really important that we get everything into the list when it should be.<br>- So we, we've got inspectors who are out on, you know, over Tower Hamlets.<br>- And so we get it at an early stage so we can also attach to the other information we give you on a weekly basis and to help with that, but it is well we're searching for it. |
| **Crissi Russo**    | - So I was gonna say Elika, just from a housing perspective, I think much of the information is, is probably quite similar that we're all looking for, but just in different guises.<br>- So for us, it's important to understand where we've received planning approvals when we expect starts on site, when we expect the schemes to be delivered and then for us, we have direct engagement with the RPs so should there be any deeds of variation then that's captured because in the planning sometimes that might not be captured whilst those negotiations are being had so for us it's I think it's quite similar information that that we're looking for because we're often asked to report in terms of what what's coming up in the pipeline and particularly looking at tenures as well.<br>- So I think it's probably quite similar information that we're all sort of looking for.<br>- So it'd be good to kind of understand this is this is only going to work for our own regeneration site.<br>- So it would be good to just have one set of data that is shared by all Council members and so and then we can philtre it in depending on our audience, but it means that the core data is the same for everybody. |
| **Melissa Spearman**| - I'm assuming what you mean is what information about housing completion like commencements or completions do we use?<br>- We do that kind of incidentally, to an extent with section 106, because we're always wanting to try and figure out, like, particularly if maybe they've done all their commencement obligations, but we're trying to get a sense of when the future ones might get triggered, will often try and ask for like a do you have a timeline?<br>- And so because, I mean, I don't really know loads about building control, but because CIL’s often one of the first things to get triggered, it's a really good indicator that something is starting on site, although they could also do enough to start CIL and give you know effect to their permission, but then they might stall on site and they might not continue for a long time so.<br>- But I think it's, yeah, it's all it would just be better to have like a source of truth that everyone is relying for the data.<br>- I guess to like by way of example like even the way that the CIL team monitor kind of follow that process through is different. |
| **Chris Hancox**    | - Can I just add on to that a little bit just from the data sources that we use is Mel says there's quite a few different areas that we get information from.<br>- So that might be an additional data source depending on how it goes to help us identify commencements.<br>- The only thing that I've noticed is missing from the information is CIL information and section 106 that doesn't have a column to say whether CIL has been paid or whether there's been a commencement notice issued to the CIL team.<br>- And that can just come out in the spreadsheet that gets sent to you periodically.<br>- I know this is a bit pie in the sky kind of thing, but it'll be handy when all this information is available to have, like an automated bots in the background that search the information and then inform the relevant teams according to criteria that they've set out for themselves. |
| **Natalya Palit**   | - I'm Natalya Palette, I joined last month leading the plan making team.<br>- What we did there was basically to set up an MS Teams form that was like a template.<br>- And I guess potentially I don't, I haven't looked at, I don't know what the boundaries are for East and West, but if you needed a kind of proxy like the way that it's divided up in the local plan, I don't know if you could kind of add together for example like city fringe is more on the West team and I don't know which you could add to that maybe not, but whether that would be possible.<br>- So I don't know if it's an issue here, but I remember coming up against in Southwark, they had a certain amount per I don't know however much square metres of children's play space that you didn't provide and you had to provide a kind of offsite contribution if for whatever shortfall.<br>- I don't know if it's the same supplier, I don't know how like if the back ends different. |
| **Paul Buckenham**  | - My name's Paul Buckenham and I'm head of development management.<br>- This is quite interesting to me because I think traditionally my service wouldn't necessarily get involved so much with the housing data side of things.<br>- So, we know what's completed now, how do we work out on the sites where there's still under construction?<br>- So I think I think actually there's probably a whole host of different development department management kind of activities that it would that it could potentially feed into probably like one of those ones that we don't necessarily quite understand all the implications until you actually see it work and start to use it and then you pick up on ideas for how you could you know.<br>- And the enforcement side of things as well, compliance to ensure that what's actually been built is what should have been built so. |


### Abstractive Summarisation

Models for abstractive summarisation are diverse. This report experiments with and compares Llama 3.2 and Copilot, two widely used LLM tools. Brief introductions to the two are as follows:

| Aspect                        | Llama 3.2 (Meta)                                    | Microsoft Copilot (Microsoft Teams)                 |
|-------------------------------|-----------------------------------------------------|-----------------------------------------------------|
| **Underlying Principle**      | LLM: Uses transformer models trained on diverse texts for deep natural language processing, capable of running locally to ensure data security. | Integration of LLM and NLP: Utilizes AI tailored for Teams, focusing on extracting actionable insights from communications, operates online using Microsoft's cloud security protocols. |
| **Advantages**                | - Generates detailed, context-aware summaries adaptable to various styles.<br>- "A multimodal large language model (LLM) capable of visual reasoning, image captioning, and answering questions about images." | - Seamless integration with Teams enhances productivity.<br>- Summaries include explicit citations to original texts, improving traceability. |
| **Disadvantages**             | - High computational costs.<br>- Potential bias in summaries due to training data.<br>- Summaries lack specific citations to sources, which might affect transparency. | - Limited to summarizing content within Microsoft Teams.<br>- May overlook complex nuances in discussions.<br>- Slower response times due to reliance on internet connectivity and cloud processing. |
| **Security**                  | - Can be operated locally, providing control over data security and faster response times. | - Relies on Microsoft's robust online security measures: "Copilot searches, processes and uses organisations’ data to provide the responses, which are specific to your organisation and user based on what they can access." |

https://www.fordway.com/chat-gpt-bard-llama-and-microsoft-copilot

https://medium.com/@rahultiwari065/unlocking-the-power-of-llama-3-2-vision-a-comprehensive-guide-to-metas-multimodal-ai-b200da72b685

The two models differ in their advantages and principles. To determine which model is more suitable for the User Research Meeting text, this report will test both models.

#### **Model Comparison**

This report continues to use Matthew Pullen's dialogue to test the two models. The prompt used for testing is *"Summarise the following content into a five-sentence paragraph: ..."*. The outputs are as follows:

##### **Llama 3.2 output**

*"Matthew Pullen expressed concerns about the usefulness of population projections for urban planning and development. They questioned the accuracy and reliability of existing models, citing a lack of consideration for overcrowding and assumptions based solely on policy. The speaker suggested that incorporating solid housing forecast data into these models could improve their accuracy. However, they noted that developing new systems to support population projections can be complex and time-consuming, requiring significant resources and effort. Despite the challenges, the speaker believes that having digital versions of planning documents and data, such as those provided by Exocom's system for section 106 agreements, can help streamline urban planning processes and improve decision-making."*

##### **Copilot output**

*"Matthew Pullen discussed the various ways housing data is used at different stages in the development process, from the pre-application stage to the final decision, including planning committee resolutions and section 106 agreements. He highlighted the importance of different commencement points, which vary depending on the context, such as building control, planning, CIL, and section 106. Matthew also explained that housing data is crucial for forecasting housing completions, CIL income, and section 106 income, which supports various teams' jobs. Additionally, he emphasized the need for a comprehensive system that includes both residential and non-residential data, and considers different types of housing, such as student housing and co-living. Finally, he discussed the challenges of identifying site boundaries and capturing data for sites not yet in the planning system."*
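
Because the prompt asks for a five-sentence paragraph, each output can be checked against that constraint before it is scored. The following is a minimal sketch of such a check; `count_sentences` is a hypothetical helper, the splitter is deliberately naive (abbreviations such as "e.g." followed by a space would be over-counted), and the sample text is invented for illustration.

```python
import re

def count_sentences(text: str) -> int:
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments so an empty string counts as zero sentences.
    return len([part for part in parts if part])

# Invented sample text (not taken from the meeting transcript).
sample = (
    "He described how housing data is used. "
    "He listed several commencement points. "
    "He explained the forecasting of CIL income."
)
print(count_sentences(sample))
```

Passing a model output string to `count_sentences` gives a quick indication of whether the model respected the requested length.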

#### **Evaluation with SemScore**

Considering that abstractive summarisation reconstructs sentences and generates entirely new text for summaries, the previously used ROUGE metric, which only detects identical words, is not suitable for evaluating abstractive summaries. In contrast, **SemScore** is more appropriate for scenarios involving abstractive summaries.
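
To illustrate why word-overlap metrics penalise paraphrase, the sketch below computes a unigram-overlap score in the spirit of ROUGE-1 recall. It is a simplified stand-in written for illustration, not the `Rouge` package installed earlier; `unigram_recall` is a hypothetical helper and the example sentences are invented.

```python
import re
from collections import Counter

def unigram_recall(reference: str, candidate: str) -> float:
    """Share of reference words that also appear in the candidate (clipped counts)."""
    ref_counts = Counter(re.findall(r"[a-z0-9']+", reference.lower()))
    cand_counts = Counter(re.findall(r"[a-z0-9']+", candidate.lower()))
    overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

# Invented sentences: the paraphrase keeps the idea but shares no words.
reference = "The team needs one shared source of housing data."
verbatim = "The team needs one shared source of housing data."
paraphrase = "Staff want a single common dataset for homes."

print(unigram_recall(reference, verbatim))    # identical wording
print(unigram_recall(reference, paraphrase))  # same idea, different words
```

The paraphrase conveys much the same idea as the reference but shares none of its words, so an overlap-based score treats it as unrelated.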

SemScore measures the similarity between a model-generated summary and a reference summary. It focuses on the semantic content of a model’s output using embeddings. This approach <u>converts text into mathematical vectors to represent its semantics</u>. Then, by comparing these embedding vectors, typically with cosine similarity, SemScore assesses how closely the semantic content of the model's output aligns with the reference. Cosine similarity can in principle range from -1 to 1, but scores for related texts typically fall between 0 and 1, with scores closer to 1 indicating greater similarity to the reference summary. This report employs SemScore to test and compare the summaries produced by Copilot and Llama 3.2.
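
As a small worked example of the comparison step, the sketch below computes cosine similarity for toy three-dimensional vectors using only the standard library. The vectors are made up for illustration; real sentence embeddings have hundreds of dimensions, and `cosine` is a hypothetical helper rather than part of any SemScore implementation.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings" chosen to show the three characteristic cases.
reference = [1.0, 2.0, 0.0]
similar = [2.0, 4.0, 0.0]     # same direction, different length
unrelated = [0.0, 0.0, 3.0]   # orthogonal to the reference
opposite = [-1.0, -2.0, 0.0]  # opposite direction

print(cosine(reference, similar))    # close to 1
print(cosine(reference, unrelated))  # 0.0
print(cosine(reference, opposite))   # close to -1
```

Only the direction of the vectors matters, not their length, which is why the metric suits comparing texts of different sizes.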

https://medium.com/@geronimo7/semscore-evaluating-llms-with-semantic-similarity-2abf5c2fadb9

A reference summary of Matthew Pullen's remarks during the meeting has been provided:

*"Matt Pullen understands the teams are using housing data at different trigger points depending on their work throughout the development stages. His team is using mostly Exacom and Excel spreadsheets for their CIL/S106 projection of income and workload (development progress) based on their assumption rules. Variables for their assumption is adjusted by development such as sqm, unit types and tenures. As planning could be different from reality, he’d like to have forecasting not solely based on policies. He understands the importance of having access to reliable data, considering the council teams forecast and align their services through the development’s sequence. He’d like to have a consistent forecasting data furthering it to comprehensive (commercial inclusive) and be able to convert the house data into people (demographic) data."*

When the reference summary is compared with the summaries generated by Copilot and Llama 3.2, the two scores are very close.

##### **Code Tab**

In [11]:
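# Imports (can be skipped if already loaded in an earlier cell)
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-mpnet-base-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-mpnet-base-v2')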

# Function to calculate embeddings
def get_embeddings(text):
    encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        model_output = model(**encoded_input)
    return model_output.pooler_output

# Function to calculate cosine similarity
def cosine_similarity(embedding1, embedding2):
    # Ensure the embeddings are 2D (1, N)
    embedding1 = embedding1.unsqueeze(0) if embedding1.dim() == 1 else embedding1
    embedding2 = embedding2.unsqueeze(0) if embedding2.dim() == 1 else embedding2
    # Compute cosine similarity and reduce to a single number
    similarity = F.cosine_similarity(embedding1, embedding2, dim=1)
    return similarity.mean()


# Texts to compare
reference_summary = """Matt Pullen understands the teams are using housing data at different trigger points depending on their work throughout the development stages. His team is using mostly Exacom and Excel spreadsheets for their CIL/S106 projection of income and workload (development progress) based on their assumption rules. Variables for their assumption is adjusted by development such as sqm, unit types and tenures. As planning could be different from reality, he’d like to have forecasting not solely based on policies. He understands the importance of having access to reliable data, considering the council teams forecast and align their services through the development’s sequence. He’d like to have a consistent forecasting data furthering it to comprehensive (commercial inclusive) and be able to convert the house data into people (demographic) data. 
"""
summaryC = """Matthew Pullen discussed the various ways housing data is used at different stages in the development process, from the pre-application stage to the final decision, including planning committee resolutions and section 106 agreements. He highlighted the importance of different commencement points, which vary depending on the context, such as building control, planning, CIL, and section 106. Matthew also explained that housing data is crucial for forecasting housing completions, CIL income, and section 106 income, which supports various teams' jobs. Additionally, he emphasized the need for a comprehensive system that includes both residential and non-residential data, and considers different types of housing, such as student housing and co-living. Finally, he discussed the challenges of identifying site boundaries and capturing data for sites not yet in the planning system.
"""
summaryL = """Matthew Pullen expressed concerns about the usefulness of population projections for urban planning and development. They questioned the accuracy and reliability of existing models, citing a lack of consideration for overcrowding and assumptions based solely on policy. The speaker suggested that incorporating solid housing forecast data into these models could improve their accuracy. However, they noted that developing new systems to support population projections can be complex and time-consuming, requiring significant resources and effort. Despite the challenges, the speaker believes that having digital versions of planning documents and data, such as those provided by Exocom's system for section 106 agreements, can help streamline urban planning processes and improve decision-making.
"""

# Get embeddings
embeddings_ref = get_embeddings(reference_summary)
embeddings_1 = get_embeddings(summaryC)
embeddings_2 = get_embeddings(summaryL)

# Calculate similarities
similarity1 = cosine_similarity(embeddings_ref, embeddings_1)
similarity2 = cosine_similarity(embeddings_ref, embeddings_2)

print(f"Similarity between Reference and Summary C: {similarity1.item()}")
print(f"Similarity between Reference and Summary L: {similarity2.item()}")

Similarity between Reference and Summary C: 0.7885517477989197
Similarity between Reference and Summary L: 0.7846035361289978
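
The cell above takes `pooler_output` as the sentence embedding. The usage example on the `sentence-transformers/all-mpnet-base-v2` model card instead averages the token embeddings, weighted by the attention mask, so scores obtained with the two approaches may differ. The sketch below shows what that masked mean pooling computes, using plain Python lists and made-up numbers rather than real model output; `mean_pool` is a hypothetical helper name.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-zero mask to avoid dividing by zero.
    return [s / max(kept, 1) for s in summed]

# Made-up token vectors; the last row stands for a padding token.
tokens = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
```

The padding row is excluded from the average, so the large values in it do not distort the resulting sentence vector.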


|    Model Name    |    SemScore    |
|------------------|----------------|
|    Copilot    |    0.7885517477989197    |
|    Llama 3.2    |    0.7846035361289978    |

**Copilot** scored **0.7886**, while **Llama 3.2** scored **0.7846**. This indicates that both Copilot and Llama 3.2 produce summaries that are semantically close to the reference for this speaker, with Copilot ahead by only about 0.004. It is important to note that although Copilot and Llama 3.2 perform similarly under the same prompt, loosening the prompt, for example to 'Summarise Following Content', leads the two models to focus on different points.

When the prompt used for testing is 'Summarise Following Content: ...', the results are as follows:

##### **Llama 3.2 General Output**

This appears to be a transcript of a meeting between Ambrey (likely an employee or representative from the Greater London Authority) and other individuals discussing infrastructure planning in London, specifically regarding population projections, development, and the use of technology to improve processes.

Some key points mentioned include:
1. Population projection: The group is concerned that current population projections may not be accurate, as they don't take into account factors such as birth rates, death rates, and migration.
2. Assumptions vs. reality: The group worries that policies are based on assumptions rather than solid data, which can lead to inaccuracies in planning.
3. Need for better infrastructure planning: The group recognizes the importance of having accurate and up-to-date information to inform infrastructure planning decisions.

Some specific examples and ideas mentioned include:
1. Using Excel spreadsheets to create a more comprehensive and accessible plan.
2. Utilizing the Exocom system for section 106 agreements, which allows for easier searching and analysis of contracts.
3. Leveraging digital technologies to improve processes and make information more available.
4. Comparing London's infrastructure planning process to that of Lewisham, where they had a more comprehensive and effective system in place.

Overall, the conversation highlights the challenges and complexities involved in infrastructure planning, as well as the need for innovative solutions and improved data management to support informed decision-making.

##### **Copilot General Output**

Matthew Pullen discussed the various ways housing data is used at different trigger points in the development process. He mentioned that housing data is used from the pre-application stage through to the final decision, including the resolution to grant through planning committee and the final decision with section 106. He highlighted the importance of different commencement points, which vary depending on the context, such as building control, planning, CIL, and section 106.

Matthew also explained that housing data is used for forecasting housing completions and CIL income, as well as section 106 income, which supports various teams' jobs. He mentioned that this data is crucial for forecasting workload and planning services.

Additionally, Matthew emphasized the importance of having a comprehensive system that includes both residential and non-residential data, as well as the need to consider different types of housing, such as student housing and co-living. He also discussed the challenges of identifying site boundaries and capturing data for sites that are not yet in the planning system.

Apart from producing more text because there is no sentence limit, the two models focus on different points once the five-sentence paragraph restriction is removed. Copilot gives a macro-level summary that tries to cover the broad topics, while Llama 3.2 focuses on micro-level elements, categorising the key content and commenting on its significance. This variation may relate to differences in their training corpora and underlying design. In future work, either Copilot or Llama 3.2 can be chosen for abstractive summarisation based on specific job requirements.

#### **Abstractive Summarisation Result**

Below are the output results from both models when the prompt used is 'Summarise the following content into a five-sentence paragraph: ...' for all speakers:

| Name             | Llama 3.2 Summary                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | Copilot Summary                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Matthew Pullen   | Matthew Pullen expressed concerns about the usefulness of population projections for urban planning and development. They questioned the accuracy and reliability of existing models, citing a lack of consideration for overcrowding and assumptions based solely on policy. The speaker suggested that incorporating solid housing forecast data into these models could improve their accuracy. However, they noted that developing new systems to support population projections can be complex and time-consuming, requiring significant resources and effort. Despite the challenges, the speaker believes that having digital versions of planning documents and data, such as those provided by Exocom's system for section 106 agreements, can help streamline urban planning processes and improve decision-making. | Matthew Pullen discussed the various ways housing data is used at different stages in the development process, from the pre-application stage to the final decision, including planning committee resolutions and section 106 agreements. He highlighted the importance of different commencement points, which vary depending on the context, such as building control, planning, CIL, and section 106. Matthew also explained that housing data is crucial for forecasting housing completions, CIL income, and section 106 income, which supports various teams' jobs. Additionally, he emphasized the need for a comprehensive system that includes both residential and non-residential data, and considers different types of housing, such as student housing and co-living. Finally, he discussed the challenges of identifying site boundaries and capturing data for sites not yet in the planning system. |
| Matt Newby       | Matt Newby, corporate lead in the planning service, discussed the use of data in urban planning and development. He emphasized the importance of analyzing tenure accessibility, wheelchair accessible homes, starts, completions, and approvals by sub area to assess policy performance and deliverables. Matt highlighted that the London plan's requirements for wheelchair accessible homes are difficult to report on at a unit level, and he expressed interest in having better data to support policy assessments. He also mentioned that commercial data is currently challenging to obtain and analyze, particularly with regards to permitted use classes and flexibilities within those classes. Newby suggested promoting a dashboard that integrates planning policy layers, including site allocations, employment land, and town centres, to improve analysis and decision-making in urban planning. | Matt Newby, a corporate lead in the planning service, discussed the use of housing data in various contexts, including Plan Making, authority monitoring reports, and corporate performance to the mayor's annual delivery plan. He emphasized the importance of having up-to-date data for quality checks and mentioned that the data is used to assess policy performance and provide commentary on housing delivery. Matt highlighted the need for a comprehensive system that includes both residential and commercial data, and considers different types of housing, such as student housing and co-living. Additionally, he discussed the challenges of identifying site boundaries and capturing data for sites not yet in the planning system. Finally, he mentioned the importance of having a system that can provide a complete picture of housing and commercial developments. |
| Camelia Smith    | Camelia Smith discussed the importance of disaggregating housing types to improve analysis and decision-making in urban planning. She mentioned that the GLA has a dashboard with various types of housing, including flats, houses, bungalows, and more, which could be useful for planners. The conversation also touched on the challenges of data collection and accuracy, particularly when it comes to counting gross versus net dwellings and ensuring consistency across departments. Smith suggested using a flagging system to track changes in unit numbers and monitor updates, as well as incorporating unit mix data to inform development decisions. Additionally, she highlighted the importance of contextual planning information, such as conservation areas and listed properties, which could be overlaid on policy maps to provide a more comprehensive understanding of site contexts. | Camelia Smith emphasized the importance of understanding site feasibility and context when developing sites and submitting planning applications. She mentioned that knowing the density, number of units, and any precedents set in the locality helps in making sound justifications for development proposals. Camelia also highlighted the need for a comprehensive view of planning consents and constraints to avoid overlooking critical information. She suggested that integrating policy layers into the dataset would help in designing projects by providing relevant policy information for each site. Lastly, she noted that the GLA has a dashboard that includes different types of housing, which could be useful to include in their own dataset. |
| Megan Rourke     | Megan Rourke discussed the importance of having a centralized dashboard for housing delivery projects, highlighting the need for an overview of site constraints, photos, and other relevant information. She emphasized the value of creating a map that displays these factors, allowing for quick assessments and prioritization of sites. Megan also touched on the challenges of managing data across multiple Excel spreadsheets and departments, advocating for a single database where all stakeholders can access and update information simultaneously. The conversation also centered around the need for clearer policy guidance and visualization tools, such as highlighting applicable policies on site maps or planning briefs. Additionally, Megan mentioned the frustration of human error in tracking changes to data, particularly with regards to resident wait lists and housing register updates, which can be updated frequently due to changing circumstances. | Megan Rourke discussed the use of a dashboard for housing delivery, which includes site address, ward, and whether the land is HRA or general funds. She mentioned the importance of having an overall map to show site constraints and photographs, which would help in presenting information to the mayor regarding project targets and unit numbers. Megan also highlighted the need for quick site assessments for site prioritization, referencing previous feasibility studies, flood zones, and anticipated unit mix. She emphasized the importance of having a comprehensive view of planning consents and constraints to avoid overlooking critical information. Lastly, Megan suggested that integrating policy layers into the dataset would help in designing projects by providing relevant policy information for each site. |
| Steven Heywood   | The speaker from the planning policy team discussed their challenges in accessing and utilizing existing housing data. They highlighted the need for a centralized platform that can provide aggregate numbers on housing completions, tenure, and affordability, as well as individual scheme-level information such as construction timelines. The team currently relies on spreadsheets and manual data collection to gather this information, which can be time-consuming and inefficient. The speaker expressed interest in a mapping-based database that could display housing data spatially by area or sub-area of the borough, with features such as filtering by bedroom count and open space square meterage. They believe that such a system would greatly improve their ability to track progress and inform future local plan development, making it more efficient and effective for monitoring and projecting housing delivery. | Steven Heywood, from the planning policy team, discussed the importance of housing data for understanding past and future delivery, focusing on aggregate numbers and completion data. He mentioned the challenges of disentangling data from various sources and the usefulness of a centralized system. Steven also highlighted the need for annual monitoring reports and the significance of having accessible data for housing delivery projections. He expressed the desire for a map-based visualization of data to see housing completions by sub-areas of the borough. Lastly, he emphasized the potential benefits of visualizing data spatially for easier understanding and reporting. |
| Hannah Horton    | Hannah from the development coordination team expressed interest in having a centralized platform to access housing data, particularly for tracking commencement dates and infrastructure delivery. She believes that such a system would enable her team to better support developers, collect funding, and forecast future resource needs. Currently, her team relies on an Excel spreadsheet to gather data, which she thinks could be improved with a more robust and user-friendly platform like Exocom. Hannah is familiar with the CIL team's calculations and notes that extracting data from Exocom into a dashboard would be beneficial for tracking development timelines and planning conditions. She also hopes that such a system could help her team time construction forums around key developments, improving their ability to coordinate with infrastructure planning teams. | Hannah Horton, from the development coordination team, works closely with developers, highways, and environmental health to support the construction phase of developments. She is particularly interested in commencement dates to ensure proper support and funding collection for the Council. Hannah's team uses a basic Excel spreadsheet to track internal data such as site visits and construction forums. She emphasized the importance of having a centralized and up-to-date data dashboard to improve efficiency and reduce the need for manual data collection. Additionally, she mentioned the potential benefits of including information on infrastructure being delivered as part of development schemes, such as schools or healthcare centers. |
| Sam Happe        | Sam from the Council tax valuation department discussed the importance of a centralized data platform for managing new developments and their associated information. He mentioned that currently, they rely on various sources, including developers, planning teams, and the Inland Revenue Valuation Office, to gather and update this information. Sam highlighted the need for consistency and accuracy in reporting, particularly when it comes to terms like "re-provided" and "additionality", which are used in regeneration schemes. He also noted that the current system can be prone to errors and inaccuracies, especially when trying to match orders with planning permissions or identifying missing information. Sam believes that a standardized platform would greatly benefit their department, enabling them to report on the same data in different ways and reducing errors in their processes. | Sam Happe, from the Council tax valuation part, discussed the importance of forward planning and providing projection figures for the next few years. Sam emphasized the need for accurate and timely data to ensure that new properties are listed correctly and that the government funding, known as the new homes bonus, is received. They also mentioned the importance of working closely with developers and the Inland Revenue Valuation Office to ensure that properties are banded correctly and that all information is up-to-date. Sam highlighted the usefulness of a centralized data platform for managing this information and ensuring that nothing is missed. Additionally, Sam pointed out the challenges of keeping track of new developments and the need for a tool that can help with this process. |
| Crissi Russo     | The speaker from the housing association emphasized the importance of having a centralized system to access data, as similar information is needed by various teams across the organization. They highlighted discrepancies in figures and reporting requirements between building control and housing departments, which would be addressed with a standardized platform. The speaker expressed concern about who inputs the data and how it is collated, suggesting that individuals from different departments should populate the system to ensure accuracy. They also noted the need for clear definitions of terms such as "re-provided" and "additionality" in regeneration schemes, which would help ensure consistency in reporting. Ultimately, the goal is to have a shared set of core data that can be filtered by department, enabling teams to report on the same information in different ways. | Crissi Russo mentioned that from a housing perspective, it is important to understand where planning approvals have been received, when starts on site are expected, and when schemes are expected to be delivered. They emphasized the need for direct engagement with RPs (Registered Providers) to capture any deeds of variation that might not be reflected in planning. Crissi also highlighted the discrepancies in figures between different teams, such as building control and housing associations, due to issues like unresolved leaks. They pointed out the importance of understanding whether regeneration schemes are re-providing existing units or delivering additional units. Lastly, Crissi suggested that it would be beneficial to have a centralized system that pulls data from different sources to ensure accuracy and consistency. |
| Melissa Spearman | The speaker is discussing the challenges of accessing information on completed construction projects in London. They mention that building control has been able to gather some data through their records, but more accurate and up-to-date information would be beneficial for monitoring progress and ensuring compliance with regulations. The speaker hopes to create a "source of truth" system that can pull together data from various sources, including permits, section 106 agreements, and construction plans. This would make it easier to track the completion of projects and ensure that developers are meeting their obligations, particularly in terms of infrastructure delivery. By having access to this information, the speaker's team could better inform decision-making and planning for future developments, and ultimately help deliver more effective development control policies. | Melissa Spearman, who works in section 106, explained that their team focuses on housing commencements and affordable housing delivery, using multiple data sources like building control commencements, CIL commencement information, and site visits. They collaborate with the housing team to get updates on affordable housing progress. Melissa mentioned that they do not rely on a single data source but rather gather information from various teams within the Council. Additionally, she highlighted the manual nature of their work, which involves checking with individual developers to get timelines and updates. Lastly, Melissa noted that their monitoring system, Exocom, lacks dashboard capabilities but provides detailed information on affordable housing delivery. |
| Chris Hancox     | The discussion centered around improving the functionality of the planning application tracking system on Accolade. One suggestion was to include a field for CIL commencement, which would provide a clear indicator of when development had begun. Another proposal was to incorporate completion certificates from independent inspectors and building control into the system, making it easier to track progress and identify potential issues. The group also brainstormed ideas for automated systems that could alert teams automatically when certain criteria were met, such as occupation or discharge of conditions. Additionally, the conversation touched on the use of levelling up legislation to encourage developers to move stalled projects forward by providing a completion notice and requiring them to disclose their time scales and reasons for delays. | Chris Hancox discussed various data sources and their importance in monitoring housing commencements and affordable housing delivery. He mentioned that one of the new data sources will be the commencement notices required under the Levelling-up and Regeneration Act (LURA) 2023 regulations. Chris highlighted the use of Exocom, their main system, which includes various spreadsheets to monitor housing data. He emphasized the importance of accurate data and mentioned that they are working on filling in detailed information about affordable housing, including building control reference numbers and the status of units. Additionally, he pointed out the need for a column to indicate whether CIL (Community Infrastructure Levy) has been paid or if a commencement notice has been issued. |
| Natalya Palit    | Natalya Palette introduced herself as the new plan making team leader and discussed the housing data used for the annual monitoring report, led by Matt Newby. She emphasized the need to prepare a detailed commentary on housing supply that supports local plans, focusing on completions, consented developments, and demographic projections. Natalya shared her experience with similar issues in consultancy work, including the importance of children's play space requirements and off-site contributions, as seen in Southwark's adopted plan. She mentioned that colleagues at Enfield council have been grappling with similar issues, using Exacom and exploring ways to link it with Council tax systems. Natalya suggested that a dashboard would be beneficial for the team during examination and hearing sessions, providing quick access to up-to-date information and facilitating faster response times. | Natalya Palit, who joined last month as the leader of the plan making team, discussed the housing data they use, which feeds into the annual monitoring report led by Matt Newby. She emphasized the need for detailed commentary on housing supply to support local plans, focusing on overall completions, consents, dwelling size mix, and tenure mix. Natalya shared an approach from her previous experience in Enfield, where they used an MS Teams form to streamline the process of updating annual monitoring reports by collecting data from developers. She suggested that having sub-area data within the borough, divided into City fringe, Central, Isle of Dogs, and the east side, would be helpful for their monitoring indicators. Additionally, she mentioned the importance of understanding the quantum of floor space and the net versus gross gain for mixed-use developments. |
| Paul Buckenham   | The discussion revolves around the complexities of managing multiple planning permissions on a single site, and how it can be challenging to track their progress and compliance. A potential solution is to break down the process into individual properties within a development unit, allowing for better monitoring and tracking of their status. The developer's actual building plans and timelines may not align with their original permission, highlighting the need for ongoing monitoring and assessment. The discussion also touches on the importance of planning discharge and conditions, as well as enforcement of compliance to ensure that approved developments are built as intended. However, challenges arise from potential confidentiality around certain aspects of site management, such as affordable housing and commercial-sensitive information, which may limit the ability to gather accurate data. | Paul Buckenham, head of development management, discussed the importance of accurate housing data for the mayor's pledge to deliver 4000 affordable homes by spring 2026. He emphasized the need to join the dots between planning permissions, commencements, and completions to build a trajectory of potential completions. Paul mentioned that his service traditionally wouldn't get involved with housing data but now needs to help the mayor understand how close they are to the pledge. He highlighted the challenge of forecasting completions and the necessity of contacting developers to get accurate timelines. Paul also noted the importance of having a consistent methodology for measuring housing data and the potential benefits of a single point of truth for the Council.|





It is important to note that due to inevitable errors in speech-to-text transcription, the model outputs can be affected. For instance, in Llama 3.2’s output for Matthew Pullen’s text, it incorrectly mentions an “Ambrey” employee, which was not a focal point of the meeting, nor was any speaker named “Ambrey”. In this regard, Copilot demonstrates better robustness.

### Extractive Summarisation vs Abstractive Summarisation

The two approaches differ in their characteristics, advantages, disadvantages, cautions, and applicable scenarios:


| Category        | Extractive Summary                                                                                                                                                                                | Abstractive Summary                                                                                                                                         |
|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Characteristics** | 1. Selects key sentences or phrases directly from the original text.<br>2. Focuses on preserving the most important or information-rich parts without altering the wording.                        | Generates new, concise summaries by understanding and rephrasing the original content.                                                                       |
| **Advantages**     | 1. Lower time and computational costs.<br>2. The model is transparent and can be run locally, which helps ensure data security. | 1. Uses natural language processing to interpret and convey the core meaning in a new form, providing smoother, more coherent, and more human-like summaries.<br>2. High quality and readability of outputs. |
| **Disadvantages**  | The extracted sentences may not read coherently because they are lifted directly from the text, which can obscure their context. | 1. Higher computational costs compared to extractive summarisation.<br>2. The prompt is not the only factor affecting the model's output; many models still operate as "black boxes." |
| **Cautions**       | 1. How sensitive the output is to parameter values is unclear, and settings such as the number of tokens used might vary with different texts, implying longer debugging times and uncertain outputs. | 1. Current LLM technology can still fabricate content from nothing.<br>2. Summaries might be affected by spelling errors in the original text.<br>3. Some models still require online access, e.g., Copilot. Check user agreements for data security before use. |
| **Applicability**  | Used when the key points of the original text need to be directly viewed and cited.                                                                                                               | Suitable for rephrasing long, complex information into concise, easily understandable forms.                                                                  |



### Insight

Both the extractive and the abstractive summaries highlighted several key points:
1. Data Sets: Nearly all participants emphasised the importance of data sets for their work, especially in <u>annual monitoring reports, demographic forecasting, land analysis, workflow processes, and the local plans</u>.
2. Data Platforms: Some participants suggested considering <u>new platforms</u> to replace older ones like Excel to enhance office efficiency.
3. Data Requirements: Some participants mentioned what the data should include, such as <u>types of housing, stages of construction, and affordable housing</u>.
4. Data Updates: There are currently inconsistencies in data due to a <u>lack of synchronised updates</u> between different departments or teams. Members suggested establishing a centralised and reliable data source.

## Keywords

In addition to summarisation, this paper also compares different models for extracting keywords from meetings. Keywords help people identify the essential themes. For user research meetings in particular, extracting keywords makes it possible to quickly understand participants' areas of interest and the improvements they expect in their work. This information can guide future digital products. Using familiar keywords as titles or focal points on websites can also enhance content accessibility, allowing individuals with reading disabilities to find and use information more easily.

This paper experiments with six approaches to keyword extraction: **YAKE, KeyBERT, Summa, TF-IDF, Copilot, and Llama 3.2**. YAKE, KeyBERT, Summa, and TF-IDF are tools specifically designed for keyword extraction, employing methods tailored for this purpose, usually <u>based on rule-based, statistical, or embedding-driven techniques</u>. In contrast, Copilot and Llama 3.2 perform keyword extraction within <u>a broader range of language processing tasks using their general language understanding capabilities</u>, benefiting from their ability to integrate multiple data sources and user-specific contexts.

Copilot and Llama 3.2 are general-purpose language models whose keyword choices cannot be traced to a simple, published scoring rule, so they are set aside here. The principles, features, and unique aspects of the other four methods are as follows:

| **Method**   | **Theory**                                                                 | **Features the Model Considers**                                           | **Unique Aspects**                                     |
|--------------|----------------------------------------------------------------------------|---------------------------------------------------------------------------|--------------------------------------------------------|
| **YAKE**     | Statistical, unsupervised. Relies on **local document properties**.            | - Word frequency within the document<br>- Word position<br>- Word co-occurrence<br>- Dispersion across the text | Does not require external corpus or training. Fast and lightweight. |
| **KeyBERT**  | Semantic, based on BERT embeddings. Measures **contextual similarity** between document and words. | - Deep contextual embeddings (BERT)<br>- Cosine similarity between document and keyword embeddings | Contextual understanding of text. Captures the meaning behind words and phrases. |
| **Summa**    | Graph-based ranking algorithm (TextRank), inspired by Google's PageRank.   | - Word co-occurrence<br>- Graph-based ranking of words by importance in the document | Captures relationships between words (**connection between words**), ideal for key phrase extraction. |
| **TF-IDF**   | Statistical, unsupervised. Combines **term frequency** and **inverse document frequency**. | - Word frequency in the document<br>- Rarity of words across the corpus | Simple and efficient. Weighs document-specific importance vs. rarity across a corpus. |

### Model Comparison

The top ten keywords that each of the six models extracted from Matthew Pullen's text are as follows:

| YAKE              | KeyBERT         | Summa          | TF-IDF          | Copilot                | Llama 3.2        |
|-------------------|-----------------|----------------|-----------------|------------------------|------------------|
| assumptions       | commencement    | planned        | assumptions     | Housing data           | Infrastructure   |
| planning          | planning        | plan           | section         | Trigger points         | Planning         |
| section           | plans           | housing data   | permission      | Pre-application        | Population       |
| data              | commence        | space          | planning        | Section 106            | Projections      |
| permission        | planned         | excel          | data            | Commencement           | Technology       |
| system            | triggers        | place          | 106             | Forecasting            | Data             |
| housing           | application     | permission     | system          | CIL income             | Analysis         |
| planning system   | assumptions     | application    | commercial      | Workload               | Development      |
| commercial        | agreements      | section        | commencement    | Non-residential data   | Growth           |
| applications      | development     | sectional      | metre           | Student housing        | London           |


#### **Code Tab**

##### **Set Stop Words**

In [12]:
custom_stop_words = [
    "a", "about", "actually", "almost", "also", "although", "always", "am", "an", "and",
    "any", "are", "as", "at", "be", "became", "become", "but", "by", "can", "could",
    "did", "do", "does", "each", "either", "else", "for", "from", "had", "has", "have",
    "hence", "how", "i", "if", "in", "is", "it", "its", "just", "may", "maybe", "me",
    "might", "mine", "must", "my", "neither", "nor", "not", "of", "oh", "ok", "on",
    "once", "one", "only", "or", "other", "our", "ours", "out", "own", "people", "put",
    "re", "really", "s", "same", "she", "should", "so", "some", "such", "t", "than",
    "that", "the", "their", "theirs", "them", "then", "there", "these", "they", "this",
    "those", "through", "to", "too", "under", "until", "up", "very", "was", "we", "were",
    "what", "when", "where", "which", "while", "who", "whom", "why", "will", "with",
    "you", "your", "yours", "yourself", "yourselves", "ve", "ll", "yeah", "want", "know",
    "think", "well", "going", "get", "got", "mean", "need", "point", "something", "stuff",
    "make", "makes", "making", "things","because", "would","into","all", "more","different",
    "years","things","many","like","layer","guess","based","thing","take","whether","site",
    "over","come","look","lot","kind","use","used","useful","say","don","time","sort","square",
    "off","down","quite","probably","bit","start","two","big","go","done","even","good","way",
    "long"
]

NLTK's list of English stopwords: https://gist.github.com/sebleier/554280

75 Stop Words That Are Common: https://blog.hubspot.com/marketing/stop-words-seo

##### **Yake**

YAKE does not rely on external data, corpora, or pre-trained models; it is based on the document itself (**Local Statistical Features**). YAKE considers the following features when analysing a document:
1. **Term Frequency**: how many times a word appears in the document (excluding stop words).
2. **Position in the Document**: YAKE assumes that the title, introduction, or first paragraphs are more relevant to the overall topic, so it gives words that appear there more weight in the calculation.
3. **Word Dispersion**: how evenly a word is distributed through the document. YAKE treats words that appear across different paragraphs as more important than words that appear in only a few sections.
4. **Co-occurrence**: which words appear together, so that different words on the same topic (for example, "plan" and "planning") are considered in relation to one another.
5. **Uniqueness**: to avoid over-representing words, words that are distinct or rare within the document (but not necessarily rare in the language as a whole) can be considered more important.
6. **Contextual Relationship**: how closely words appear to one another, or how they influence one another in their surrounding context.
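
To make a few of these signals concrete, the sketch below computes term frequency, first position, and dispersion for each word of a short text. It is a toy illustration only: the function `word_signals`, the sample text, and the way dispersion is measured (share of sentences containing the word) are assumptions made for this example and are not YAKE's actual scoring formula.

```python
import re
from collections import defaultdict

def word_signals(text, stop_words=()):
    """Toy per-word signals: frequency, first sentence index, dispersion."""
    # Split into sentences on ., ! or ? and drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    stats = defaultdict(lambda: {"frequency": 0, "first": None, "seen_in": set()})
    for idx, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z0-9]+", sentence):
            if word in stop_words:
                continue
            entry = stats[word]
            entry["frequency"] += 1
            if entry["first"] is None:
                entry["first"] = idx  # earlier sentences suggest higher relevance
            entry["seen_in"].add(idx)
    return {
        word: {
            "frequency": s["frequency"],
            "first_sentence": s["first"],
            # share of sentences that contain the word (1.0 = appears everywhere)
            "dispersion": len(s["seen_in"]) / len(sentences),
        }
        for word, s in stats.items()
    }

# Hypothetical sample text, invented for illustration
sample = "Housing data drives planning. Planning needs housing data. The dashboard shows data."
signals = word_signals(sample, stop_words={"the"})
for word in ("data", "planning", "dashboard"):
    print(word, signals[word])
```

In this sample, "data" appears in every sentence, so it has the highest frequency and dispersion, while "dashboard" appears once, late in the text. YAKE combines signals of this kind into a single score per candidate, and in YAKE's output a lower score indicates a more relevant keyword.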

In [13]:
import yake

# Set the content for analysis
text = matthew_pullen_content

kw_extractor = yake.KeywordExtractor(
    # language of the text: English
    lan="en",
    # each keyword contains at most 3 words
    n=3,
    # deduplication threshold (0.9), to avoid near-duplicate keywords in the result
    dedupLim=0.9,
    # return the top 10 keywords
    top=10,
    stopwords=custom_stop_words
)

# Extract the keywords from "text"
keywords = kw_extractor.extract_keywords(text)

# Print the keywords (in YAKE, a lower score means a more relevant keyword)
print("Top keywords and their scores:")
for kw, score in keywords:
    print(f"{kw}: {score}")

# Keep the keywords only, for later use
yake_list = [kw for kw, score in keywords]

Top keywords and their scores:
assumptions: 0.01910470469030072
planning: 0.02714639930086541
section: 0.030576054957495272
data: 0.033984039463766776
permission: 0.049955091044590755
system: 0.06216639122715094
housing: 0.06271764373002733
planning system: 0.08258617466765696
commercial: 0.08264724501909056
applications: 0.09507326540820295


##### **KeyBERT**

The KeyBERT library is built on the BERT model and is used to extract keywords. Its features are as follows:
1. **Semantic Similarity**: the similarity between the document embedding and the keyword embedding. The cosine similarity between two vectors $A$ and $B$ is calculated as:

$$
\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \cdot \|B\|}
$$

- $A \cdot B$ represents the dot product of vectors $A$ and $B$,
- $\|A\|$ and $\|B\|$ represent the magnitudes (norms) of vectors $A$ and $B$.

2. **Contextual Understanding**: BERT is trained to understand words in their context by looking at both the preceding and following words. This means that KeyBERT does not simply count word frequencies but instead considers how each word or phrase fits into the meaning of the document.
3. **BERT Embeddings**: KeyBERT compares these embeddings to rank candidate keywords. The embeddings are based on pre-trained knowledge from large corpora (e.g., Wikipedia), so they encode a deep understanding of language structure and meaning.
4. **No Dependence on Frequency**: KeyBERT focuses solely on semantic relevance. Even if a word appears only a few times in the document, it can still be identified as a key concept if its meaning aligns closely with the document.
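
As a worked example of the cosine similarity formula, the sketch below scores two candidate words against a document vector. The three-dimensional vectors are invented for illustration; real sentence embeddings have hundreds of dimensions, and `cosine_similarity` here is a plain-Python helper written for this example, not part of KeyBERT.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "embeddings", invented for illustration only
doc_vec = [1.0, 2.0, 2.0]
candidate_vecs = {
    "planning": [2.0, 4.0, 4.0],  # points in the same direction as the document
    "weather": [2.0, -1.0, 0.0],  # orthogonal to the document
}

similarities = {
    word: cosine_similarity(doc_vec, vec) for word, vec in candidate_vecs.items()
}
for word, sim in similarities.items():
    print(f"{word}: {sim:.4f}")
```

Because cosine similarity depends only on direction, the "planning" vector scores 1 even though it is twice as long as the document vector, while the orthogonal "weather" vector scores 0. Candidates are then ranked by this value.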

In [14]:
from keybert import KeyBERT

doc = matthew_pullen_content

kw_model = KeyBERT()
keywords = kw_model.extract_keywords(
    doc,
    # number of words per keyword (min, max)
    # (1, 1) was chosen after testing (1, 1), (1, 2) and (1, 3); (1, 1) performed best.
    keyphrase_ngram_range=(1, 1),
    # words to ignore
    stop_words=custom_stop_words,
    # number of keywords to return
    top_n=10)

# Print each keyword and its score
for keyword, score in keywords:
    print(f"{keyword}: {score}")

# Keep the keywords only, for later use
keybert_list = [keyword for keyword, score in keywords]

commencement: 0.4556
planning: 0.4397
plans: 0.3788
commence: 0.3247
planned: 0.3244
triggers: 0.3092
application: 0.3047
assumptions: 0.3038
agreements: 0.2909
development: 0.2906


Ref: https://github.com/MaartenGr/KeyBERT

##### **Summa**

Summa relies on graph-based ranking and co-occurrence relationships within the document (based on the TextRank algorithm). Its features are as follows:
1. **Graph-based Ranking**: Each word is treated as a node in a graph. The <u>connections</u> between nodes are one of the criteria for judging their importance.
2. **<u>Connection</u>** (Word Co-occurrence): Two words are considered related if they appear very close to each other in the text (within a predefined <u>window size</u>). The more often two words co-occur, the stronger their connection in the graph.
3. **<u>Window size</u>**: The window size is a parameter that defines how close two words need to be to create an edge between them (summa.keywords does not allow the window to be resized; the default is 2).
4. **PageRank-style Voting**: Summa uses a variation of Google's PageRank algorithm, which is commonly used to rank web pages. Here, instead of web pages, the words in the document vote for each other based on their relationships (co-occurrence).
5. **No Dependence on Frequency**
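
The graph-and-voting idea above can be sketched in a few lines. This is a simplified illustration, not Summa's implementation: the helper names `build_graph` and `pagerank`, the token list, the damping factor of 0.85, and the unweighted edges are all assumptions made for this example. A window of 2 is interpreted here as linking adjacent words.

```python
from collections import defaultdict

def build_graph(words, window=2):
    """Link each word to the words that fall inside the same window."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration: each word shares its score equally among its neighbours."""
    nodes = list(graph)
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            votes = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) / len(nodes) + damping * votes
        scores = updated
    return scores

# Hypothetical token sequence (stop words already removed), invented for illustration
tokens = "housing data platform housing data dashboard housing completions".split()
scores = pagerank(build_graph(tokens))
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {score:.4f}")
```

In this sample, "housing" is adjacent to the largest number of distinct words, so it collects the most votes and ranks first. A word's rank therefore reflects how well connected it is, rather than its raw count alone.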

In [15]:
# Extend the stop-word list with plural ("-s") forms
def add_plural_words(word_list):
    pluralized_words = []
    for word in word_list:
        if word.endswith('y'):
            # "-y" becomes "-ies"
            pluralized_words.append(word[:-1] + 'ies')
        elif word.endswith(('s', 'sh', 'ch', 'x')):
            # add "-es"
            pluralized_words.append(word + 'es')
        else:
            # add "-s"
            pluralized_words.append(word + 's')
    return word_list + pluralized_words

# Build a list containing the original stop words plus their plural forms
custom_stop_words_s = add_plural_words(custom_stop_words)

# Print the first 100 words to inspect the list
print(custom_stop_words_s[:100])

['a', 'about', 'actually', 'almost', 'also', 'although', 'always', 'am', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'became', 'become', 'but', 'by', 'can', 'could', 'did', 'do', 'does', 'each', 'either', 'else', 'for', 'from', 'had', 'has', 'have', 'hence', 'how', 'i', 'if', 'in', 'is', 'it', 'its', 'just', 'may', 'maybe', 'me', 'might', 'mine', 'must', 'my', 'neither', 'nor', 'not', 'of', 'oh', 'ok', 'on', 'once', 'one', 'only', 'or', 'other', 'our', 'ours', 'out', 'own', 'people', 'put', 're', 'really', 's', 'same', 'she', 'should', 'so', 'some', 'such', 't', 'than', 'that', 'the', 'their', 'theirs', 'them', 'then', 'there', 'these', 'they', 'this', 'those', 'through', 'to', 'too', 'under', 'until', 'up', 'very', 'was', 'we', 'were', 'what', 'when']


In [21]:
# Import the module under an alias, because earlier cells assign a variable named `keywords`
from summa import keywords as summa_keywords

text = matthew_pullen_content

# Use TextRank to extract the keywords
extracted_keywords = summa_keywords.keywords(text,
                                             # Return the keywords as a list instead of a single string
                                             split=True,
                                             # Return each keyword with its score. The higher the score, the higher the relevance between the keyword and the document.
                                             scores=True,
                                             # Stop words (including plural forms)
                                             additional_stopwords=custom_stop_words_s)

# Drop keywords ending in "s" or "ing"
filtered_keywords = [(kw, score) for kw, score in extracted_keywords if not kw.endswith('s') and not kw.endswith('ing')]

# Keep the top 10 keywords
filtered_keywords = filtered_keywords[:10]

# Print the keywords and their scores
print("Filtered Extracted Keywords (Top 10):")
for kw, score in filtered_keywords:
    print(f"{kw}: {score}")

# Create 'summa_list' to hold the result for later use
summa_list = [kw for kw, score in filtered_keywords]

Filtered Extracted Keywords (Top 10):
planned: 0.2899959537626859
plan: 0.2899959537626859
housing data: 0.22222526545957466
space: 0.15200036162569006
excel: 0.1340973732843077
place: 0.13409737328430651
permission: 0.13409737328430602
application: 0.13376710876071438
section: 0.1094423366358233
sectional: 0.1094423366358233


##### **TF-IDF**

TF-IDF (Term Frequency - Inverse Document Frequency)

Its features are as follows:
1. **Term frequency** measures how often a word appears in a document.
Term Frequency (TF) is calculated as:

$$
\text{TF}(t, d) = \frac{\text{count of term } t \text{ in document } d}{\text{total number of terms in document } d}
$$

2. **Inverse document frequency** measures the rarity of a word in the entire collection of documents (corpus). If a word appears frequently in many documents, it is a common word and may not be important; if a word appears in only a few documents, it is more discriminative and may be more important.
Inverse Document Frequency (IDF) is calculated as:

$$
\text{IDF}(t) = \log \left( \frac{N}{\text{count of documents containing term } t} \right)
$$

Where:
- **N** is the total number of documents.
- **Count of documents containing term** is the number of documents in which the term $t$ appears.

3. Combining TF and IDF means that words that appear frequently in one document but are relatively rare across the whole document collection are treated as important keywords.
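
The two formulas can be checked by hand on a tiny corpus. The sketch below implements them exactly as written above; the three short documents and the helper names `tf`, `idf`, and `tf_idf` are invented for illustration.

```python
import math

def tf(term, doc_tokens):
    """Share of the document's tokens that are `term`."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """log(N / number of documents containing `term`); the term must occur somewhere."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical three-document corpus, invented for illustration
corpus = [
    "housing data supports planning".split(),
    "housing completions data".split(),
    "council tax data".split(),
]

first_doc = corpus[0]
tfidf_scores = {term: tf_idf(term, first_doc, corpus) for term in first_doc}
for term, score in sorted(tfidf_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{term}: {score:.4f}")
```

Here "data" scores 0 because it appears in every document, while "planning" and "supports", which occur only in the first document, score highest. Note that scikit-learn's `TfidfVectorizer`, with its default settings, uses raw term counts, a smoothed IDF of $\ln\frac{1+N}{1+\text{df}(t)} + 1$, and L2 normalisation of each document vector, so its scores will not match these textbook formulas exactly, although the ranking logic is the same.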

In [22]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Add a collection of documents (corpus)
documents = [
    matthew_pullen_content,
    matt_newby_content,
    camelia_smith_content,
    megan_rourke_content
]

# Create a TF-IDF Vectorizer and use the custom stopwords list
tfidf_vectorizer = TfidfVectorizer(stop_words=custom_stop_words, ngram_range=(1, 2))

# Fit and transform the documents into a TF-IDF matrix
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Get the feature names (words and two-word phrases)
words = tfidf_vectorizer.get_feature_names_out()

# Get the TF-IDF values for each word in the specific document (first document in the list)
tfidf_scores = tfidf_matrix[0].toarray().flatten()  # This retrieves the TF-IDF values for matthew_pullen_content

# Combine the words with their corresponding TF-IDF scores
word_scores = dict(zip(words, tfidf_scores))

# Sort the keywords by their TF-IDF scores in descending order
sorted_word_scores = sorted(word_scores.items(), key=lambda x: x[1], reverse=True)

# Print out the top 10 keywords
print("Top 10 keywords:")
for word, score in sorted_word_scores[:10]:
    print(f"{word}: {score}")

# Save the top 10 keywords in a list for later use
tfidf_list = [word for word, score in sorted_word_scores[:10]]

Top 10 keywords:
assumptions: 0.3324501160825692
section: 0.16381718130209472
permission: 0.145446925786124
planning: 0.1301146965408906
data: 0.11927180516248306
106: 0.1146720269114663
section 106: 0.1146720269114663
system: 0.1146720269114663
commercial: 0.09829030878125683
commencement: 0.0831125290206423


### Evaluation with Rouge

To evaluate the quality of each keyword model, we first set a gold standard manually. The gold-standard keywords are:

| **Keyword**           | **Reason**                                                                                                 |
|-----------------------|------------------------------------------------------------------------------------------------------------|
| **housing data**       | Central to the discussion. Housing data is used for planning, forecasting, and decision-making processes.   |
| **planning**           | Refers to the housing planning process, including applications, approvals, and development stages.          |
| **section 106**        | Refers to legal agreements related to housing processes, important for financial and legal management.   |
| **commercial**         | Highlights the discussion on commercial developments and how they fit within mixed-use projects.            |
| **tenure mix**         | Discusses the variety and distribution of housing types, crucial for planning and resource allocation.      |
| **permission**         | Determines when housing projects can be approved and commence.                  |
| **commencement**       | A key stage in the application process.                     |
| **CIL** | Important for calculating the financial contributions towards infrastructure in developments.|
| **forecasting**        | Discusses the process of predicting housing completions, which is key for long-term planning and prediction. |
| **infrastructure**     | Concerns the necessary services and structures (e.g., schools, parks) to support new housing developments.  |

The gold-standard keywords are then compared with the keywords produced by the six models.

#### **Code Tab**

In [25]:
# Define the gold standard and models' outputs
gold_standard_keywords = {"housing data", "planning", "section 106", "commercial", 
                          "tenure mix", "permission", "commencement", "CIL", "forecasting", 
                          "infrastructure"}

model_outputs = {
    "Yake": {"assumptions", "planning", "section", "data", "permission", "system", 
             "housing", "planning system", "commercial", "applications"},
    "KeyBERT": {"commencement", "planning", "plans", "commence", "planned", "triggers", 
                "application", "assumptions", "agreements", "development"},
    "Summa": {"planned", "plan", "housing data", "space", "excel", "place", "permission", 
              "application", "section", "sectional"},
    "TF-IDF": {"assumptions", "section", "permission", "planning", "data", "106", 
               "system", "commercial", "commencement", "metre"},
    "Copilot": {"housing data", "trigger points", "pre-application", "section 106", "commencement", 
                "forecasting", "CIL income", "workload", "non-residential data", "student housing"},
    "Llama 3.2": {"infrastructure", "planning", "population", "projections", "technology", 
                  "data", "analysis", "development", "growth", "london"}
}

# Normalize and calculate metrics for each model
def normalize_and_metrics(models, gold_standard):
    results = []
    for model_name, keywords in models.items():
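        # Normalize keywords: keep only the first word of each keyword, lower-cased.
        # The gold standard is left as-is, so its multi-word or upper-case entries cannot match.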
        normalized_keywords = set(k.split()[0].lower() for k in keywords)
        # Calculate intersection with gold standard
        intersection = gold_standard.intersection(normalized_keywords)
        # Calculate precision, recall, and F1 score
        precision = len(intersection) / len(normalized_keywords) if normalized_keywords else 0
        recall = len(intersection) / len(gold_standard) if gold_standard else 0
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0
        results.append({
            "Model": model_name,
            "Precision": precision,
            "Recall": recall,
            "F1 Score": f1
        })
    return results

# Calculate and prepare results in a DataFrame
metrics_results = normalize_and_metrics(model_outputs, gold_standard_keywords)
results_df = pd.DataFrame(metrics_results)

# Display results
print("The results are as follows:")
results_df

The results are as follows:


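|   | Model     | Precision | Recall | F1 Score |
|---|-----------|-----------|--------|----------|
| 0 | Yake      | 0.333333  | 0.3    | 0.315789 |
| 1 | KeyBERT   | 0.2       | 0.2    | 0.2      |
| 2 | Summa     | 0.1       | 0.1    | 0.1      |
| 3 | TF-IDF    | 0.4       | 0.4    | 0.4      |
| 4 | Copilot   | 0.2       | 0.2    | 0.2      |
| 5 | Llama 3.2 | 0.2       | 0.2    | 0.2      |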
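
One limitation of the evaluation cell is that model keywords are reduced to their first word and lower-cased, whereas the gold standard keeps full phrases and original casing, so entries such as "housing data" or "CIL" can never match. A minimal sketch of a phrase-level alternative follows, assuming both sides are lower-cased and compared as whole phrases. The function name and the example sets are hypothetical, and scores computed this way would not be comparable with the results reported above.

```python
def phrase_level_scores(predicted, gold):
    """Precision, recall and F1 on whole, case-insensitive keyword phrases."""
    # Lower-case and collapse repeated whitespace on BOTH sides before comparing.
    predicted_norm = {" ".join(k.lower().split()) for k in predicted}
    gold_norm = {" ".join(k.lower().split()) for k in gold}
    matched = predicted_norm & gold_norm
    precision = len(matched) / len(predicted_norm) if predicted_norm else 0.0
    recall = len(matched) / len(gold_norm) if gold_norm else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"Precision": precision, "Recall": recall, "F1 Score": f1}

# Illustrative sets only, not the study's data.
example_gold = {"housing data", "section 106", "CIL", "tenure mix"}
example_predicted = {"Housing data", "Section 106", "workload", "cil"}
print(phrase_level_scores(example_predicted, example_gold))
```

Exact phrase matching is stricter than first-word matching for partial overlaps (for example, "CIL income" would not match "CIL"), so a partial-credit scheme may be worth considering if this approach is adopted.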


According to the results, the TF-IDF model achieved the highest scores (precision, recall, and F1 of 0.4). The top 10 TF-IDF keywords for each of the 12 participants are as follows:

| Name             | Top 10 TF-IDF Keywords                                         |
|------------------|----------------------------------------------------------------|
| Matthew Pullen   | assumptions, section, permission, planning, data, 106, system, commercial, commencement, metre |
| Matt Newby       | data, industrial, commercial, planning, component, economic, sub, live, slightly, plan |
| Camelia Smith    | terms, net, housing, planning, dwellings, see, been, section, types, data |
| Megan Rourke     | planning, gonna, great, lewisham, pre, 2024, click, constraints, presenting, website |
| Steven Heywood   | housing, delivery, delivered, data, past, similarly, uses, areas, borough, future |
| Hannah Horton    | construction, 106, section, infrastructure, team, data, commencement, details, interesting, major |
| Sam Happe        | information, order, developers, planning, directly, list, orders, numbering, development, street |
| Crissi Russo     | information, received, regeneration, similar, understand, perspective, additionality, expectation, handed, rps |
| Melissa Spearman | section, housing, information, 106, affordable, original, work, permission, given, getting |
| Chris Hancox     | information, add, commenced, system, cil, commencement, completion, control, data, 106 |
| Natalya Palit    | plan, helpful, enfield, local, around, terms, certainly, possible, questions, east |
| Paul Buckenham   | Planning, permission, Planning permission, housing, development, affordable housing, granted Planning permission, asking, issues, affordable |


Although TF-IDF performed best in terms of Rouge scores, its ability to identify **bi-gram keywords was relatively weak**, even though the vectoriser was configured to include bi-grams (`ngram_range=(1, 2)`). To address this, the results from Copilot, which is more sensitive to bi-gram keywords, are included as a supplement. The results are as follows:

| Name             | Keywords                                                                                                                                                                  |
|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Matthew Pullen   | Housing data, Trigger points, Pre-application, Section 106, Commencement, Forecasting, CIL income, Workload, Non-residential data, Student housing                        |
| Matt Newby       | Plan Making, Authority monitoring reports, Corporate performance, Tenure, Accessibility, Starts, Completions, Approvals, Sub-area, Local plan                             |
| Megan Rourke     | Housing, Regional, Client-side, Dashboard, Site constraints, Planning, Targets, Unit numbers, Risks, Photographs                            |
| Camelia Smith    | Housing types, GLA dashboard, Student accommodation, Shelters, Housing blocks, Planning applications, Density, Unit mix, Feasibility studies, Justification                    |
| Steven Heywood   | Housing data, Planning policy, Housing delivery, Aggregate numbers, Completion data, Tenure, Annual monitoring report, Centralized system, Housing projections, Spatial visualization |
| Hannah Horton    | Development coordination, Construction phase, Commencement dates, Construction forums, Funding collection, Resource forecasting, Highways, Environmental health, Excel spreadsheet, Internal data |
| Sam Happe        | Forward planning, Projection figures, Property listing, Inland Revenue Valuation Office, Property banding, New homes bonus, Inspectors, Planning permissions, Commencement dates, Completion dates |
| Crissi Russo     | Planning approvals, Start dates, Delivery timelines, Registered Providers (RPs), Deeds of variation, Data accuracy, Discrepancies, Housing schemes, Building control, Housing association |
| Melissa Spearman | Section 106, Housing data, Commencements, Affordable housing, Building control, CIL commencement, Site visits, Data sources, Council collaboration, Housing team           |
| Chris Hancox     | Commencement notices, Levelling-up and Regeneration Act (LURA) 2023, Exocom, Affordable housing, Building control, Data accuracy, Data consistency, Standardized methodology, Single source of truth, Housing development |
| Natalya Palit    | Plan making team, Annual monitoring report, Housing supply, Local plans, Completions, Consents, Dwelling size mix, Tenure mix, MS Teams form, Sub-area data                |
| Paul Buckenham   | Affordable homes, Planning permission, Housing developments, Forecasting, Developers, Timelines, Data accuracy, Commencement notices, Building control, Standardized methodology |


It is important to note that although Copilot can generate bi-gram keywords, it also has clear disadvantages. Firstly, its Rouge scores are not particularly high, so its keyword accuracy is questionable. Secondly, it may generate keywords that do not appear in the text or miss important ones. Finally, its output is not highly reproducible, even when the same prompt is used.
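
Reproducibility of this kind can be quantified rather than judged by eye. One option, sketched below, is the mean pairwise Jaccard similarity between the keyword sets returned by repeated runs of the same prompt. The runs shown are hypothetical placeholders, not Copilot output from this study, and the function names are illustrative.

```python
from itertools import combinations

def jaccard(first, second):
    """Size of the intersection divided by size of the union, case-insensitive."""
    first = {k.lower() for k in first}
    second = {k.lower() for k in second}
    if not first and not second:
        return 1.0  # two empty sets are treated as identical
    return len(first & second) / len(first | second)

def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity over every pair of runs (1.0 means identical)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two runs are needed")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical keyword sets from three runs of the same prompt.
hypothetical_runs = [
    {"housing data", "section 106", "forecasting"},
    {"housing data", "section 106", "workload"},
    {"housing data", "CIL income", "forecasting"},
]
print(mean_pairwise_jaccard(hypothetical_runs))
```

A value close to 1 would indicate stable output across runs, while a value close to 0 would indicate that each run returns largely different keywords.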

The keywords generated by the TF-IDF model and Copilot highlight several focal points of the meeting:

1. **Emphasis on Regulations and Stages**: Keywords such as "planning," "permission," and "Section 106" indicate that a significant part of the discussion revolved around regulatory compliance and planning permissions.
2. **Focus on Planning Processes**: Keywords like "application" and "commencement" suggest that the discussions likely involved the formal planning process.
3. **Data-Centric Approach**: Keywords such as "data," "Excel," and "assumptions" indicate that data processing, accuracy, and analysis were central themes. This likely reflects how data is currently used in planning and how it might be better utilised or managed in future workflows.
4. **Discussion on Current and Future Workflows**: Both sets of keywords reflect a focus on the current practices of managing authority, planning, and project initiation. They also point to discussion of how to optimise these processes or adapt them to meet future needs.

These results are helpful in determining the direction of future work to address the areas of concern identified by participants.

## Conclusion

**This report explores the potential of integrating language processing technologies into work practices, highlighting several significant points:**

### Achievement
1. **Performance of BART**: The BART model excels in extractive summarisation. For future work, the BART model can be directly employed to extract key sentences from texts, permitting easy viewing and referencing of the main original content.
2. **Capabilities of Copilot and Llama 3.2**: Both models are efficient at abstractive summarisation, with Copilot scoring highest for macro-level summaries. It also requires shorter prompts to complete summaries, facilitating quick comprehension of the entire text's gist. Llama 3.2 shows good summarisation abilities for specific examples. The choice of model can be tailored to specific work requirements.
3. **Keyword Extraction**: TF-IDF performs well with uni-gram keywords, while Copilot performs well with multi-word (n-gram) keywords.

### Comprehensive insights
1. **Data Quality**: Attendees consistently valued the <u>coherence, real-time nature, reliability, and centralisation</u> of housing data. These factors are crucial for enhancing work efficiency and the accuracy of urban and regional planning decisions.
2. **Automation in Housing Digital Products**: There is a strong interest in future housing digital products having automated data collection and management features.
3. **Data Visualisation Platforms**: Data visualisation platforms represent a viable trend in the future digitalisation of housing.

### Future directions for language processing technology
1. **Innovation in Copilot and Llama 3.2**: These models continue to evolve and to integrate broader language processing capabilities (such as image-language processing), and Copilot in particular is expanding into a diverse range of applications. As more applications incorporate Copilot, it may facilitate an <u>"Internet of Application"</u>, improving user experience through seamless and convenient integration. This is worth exploring for potential uses in housing and planning digital products.
2. **Validation with Rouge and SemScore**: The current validation results are based on reference summaries and keywords drafted by one individual. For greater reliability, future Rouge tests should use reference summaries and keywords provided by multiple individuals.
3. **Text Processing Considerations**: The texts used in this study were prepared by collating all statements from the same individual before feeding them into the model, which might cause discrepancies between the model's understanding of the context and the actual context. Although no significant impact on abstractive summarisation has been observed to date, this requires attention in subsequent tests.
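
The multi-annotator validation proposed in point 2 could be scored as in the minimal sketch below, which compares one model's keywords against several reference sets and reports both the average and the best F1. The annotator sets are hypothetical and the function names are illustrative. Whether to average or to take the best match is a design choice that this report does not settle.

```python
from statistics import mean

def keyword_f1(predicted, reference):
    """F1 overlap between two keyword sets, compared case-insensitively."""
    predicted = {k.lower() for k in predicted}
    reference = {k.lower() for k in reference}
    matched = len(predicted & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)

def multi_reference_f1(predicted, references):
    """Score one keyword set against several annotators' reference sets."""
    if not references:
        raise ValueError("at least one reference set is needed")
    scores = [keyword_f1(predicted, reference) for reference in references]
    return {"mean_f1": mean(scores), "best_f1": max(scores)}

# Hypothetical reference sets from two annotators, plus one model's output.
annotator_references = [
    {"planning", "section 106", "commencement"},
    {"planning", "housing data", "forecasting"},
]
model_keywords = {"planning", "commencement", "data"}
print(multi_reference_f1(model_keywords, annotator_references))
```

Reporting the spread between the mean and the best score also gives a rough indication of how much the annotators disagree with one another.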