(1)=
# 1 Understanding Context

(1.1)=
## 1.1 Background

The wider project aims to develop the tools needed to build situational awareness systems for the urban context. The work is carried out as a research project within the Geospatial Systems CDT at Newcastle University and is funded by DSTL.

The table below summarises the decision latency levels relevant to urban situational awareness, from fully automated responses to long-term planning.

| Latency Level  | Description                                                                                                                        |
|----------------|------------------------------------------------------------------------------------------------------------------------------------|
| Machine-speed real-time | Minimal human intervention. Decision-making at machine speed for rapid, automated responses, such as immediate traffic routing, congestion control, utility management, and overcrowding management. |
| Near real-time | A human coordinates a response from a predefined set of options. Decision-making for rapid response planning, such as short-term public transportation adjustments, law enforcement dispatching, and emergency response coordination. |
| Short latency | Human-machine collaboration. Decisions made within a few hours to a day, such as daily route planning for waste management, minor infrastructure repair prioritization, and park maintenance scheduling. |
| Medium latency | Automated suggestions with human review. Decision-making over a few days to a week, such as neighborhood resource allocation, parking regulations adjustment, and weekly public service planning. |
| Long latency | Human-led with machine assistance. Decisions over weeks to months, such as public space redesign, road construction planning, and seasonal urban farming planning. |
| Extended latency | Mainly human decision-making. Decisions made over urban planning periods of several years, such as urban development planning, zoning law changes, and public transportation network redesign. |
| Archive latency | No automation; a human creates a bespoke model for a specific planning problem. Long-term historical data is used for retrospective studies and long-term planning, such as demographic changes, urban growth patterns, and infrastructure ageing patterns. |

The project uses data made available by the Urban Observatory (UO) in Newcastle upon Tyne, specifically pedestrian counts generated by object recognition algorithms applied to CCTV camera footage {cite}`chen2021estimating`. The challenge is to gain insight into the *temporal dynamics* and *spatial interactions* present in the UO data.

(1.2)=
## 1.2 Project Objectives

1. **Can we detect anomalous results in the training data?**
    * This requires creating a model that identifies some of the existing crowding events in the data as anomalies.
2. **Do the patterns of anomaly emergence vary at different time periods?**
    * This requires creating a model that can generalise across the different time periods in which anomalies occur.
3. **What frequency of data is required to make accurate decisions?** This relates to response requirements and data latency, and depends on the answers to the following questions:
    * How far in advance can a model reliably predict an anomaly using the existing IoT data (the prediction horizon)?
    * Do the prediction horizon and the existing data frequency provide enough time to coordinate a response?
4. **How does randomly introducing data gaps affect the results?**
    * This involves measuring the change in model accuracy when data is randomly removed from the training/testing set and the model is then validated on the removed data.
5. **How do models perform when they are trained and tested on certain time periods but not on others?**
    * This involves measuring the change in model accuracy when data is systematically removed, e.g., removing all July data from the training/testing set and then validating on the July data.
6. **How much weight is assigned to additional features when they are included in the model?**
    * This involves introducing additional features into the training/testing of a model, such as university term dates or match days encoded as Boolean values.
7. **What is the spatial autocorrelation (SAC) between sensors at different distances?**
    * This involves calculating spatial autocorrelation statistics for the dataset of each sensor and computing the pedestrian network-constrained distance between the sensors. Regression analysis can then be carried out to quantify any relationship between SAC and distance.
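
Objectives 4 and 5 contrast two ways of withholding data: at random, and systematically by time period. The sketch below illustrates the difference using only the Python standard library. The synthetic hourly records, the function names `random_gap_split` and `systematic_gap_split`, and the 10% gap fraction are assumptions made for illustration and are not taken from the project's actual pipeline.

```python
import random
from datetime import datetime, timedelta


def random_gap_split(records, gap_fraction, seed=0):
    """Remove a random fraction of records; return (kept, removed)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_removed = round(len(records) * gap_fraction)
    removed_idx = set(rng.sample(range(len(records)), n_removed))
    kept = [r for i, r in enumerate(records) if i not in removed_idx]
    removed = [r for i, r in enumerate(records) if i in removed_idx]
    return kept, removed


def systematic_gap_split(records, month):
    """Remove every record from one calendar month; return (kept, removed)."""
    kept = [r for r in records if r[0].month != month]
    removed = [r for r in records if r[0].month == month]
    return kept, removed


# Hypothetical data: one (timestamp, pedestrian_count) record per hour of 2022.
start = datetime(2022, 1, 1)
records = [
    (start + timedelta(hours=h), 100 + (h % 24)) for h in range(365 * 24)
]

# Objective 4: random gaps, validate on the removed records.
random_kept, random_removed = random_gap_split(records, gap_fraction=0.1)

# Objective 5: systematic gap, validate on all July records.
july_kept, july_removed = systematic_gap_split(records, month=7)

print(len(random_kept), len(random_removed))
print(len(july_kept), len(july_removed))
```

In both cases the `kept` records would feed training/testing and the `removed` records would form the validation set, so the change in accuracy between the two schemes can be compared on equal terms.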

(1.3)=
## 1.3 Project Success Criteria

1. Gain insight into how the sensor data is clustered.
2. Identify the signals within the sensor data (which signals have the highest power).
3. Evaluate how SAC between data from different sensors decays with distance.
    * For this experiment, we will compare data from the same time period for each sensor and measure the SAC between several sensors in a small area.
    * We will measure using LISA (Local Indicators of Spatial Autocorrelation).
4. Assess how SAC changes with temporal aggregation. 
    * For this experiment, we will repeat the steps in the previous criterion, but this time use a variety of temporal aggregations.
    * We will identify which temporal aggregations show the greatest spatial autocorrelation, again using LISA.
5. Predict the time of the year and the sensor location from which a specific time-series window originated.
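
Criteria 3 and 4 both rely on LISA. As a minimal sketch, the code below computes local Moran's I, one commonly used LISA statistic, alongside global Moran's I. It makes several assumptions for illustration: the sensor coordinates and counts are hypothetical, neighbours are defined by a straight-line distance band rather than the pedestrian network-constrained distance, and the weights are row-standardised.

```python
from math import hypot


def row_standardised_weights(coords, threshold):
    """Binary distance-band weights, scaled so each non-empty row sums to 1."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        neighbours = [
            j for j, (xj, yj) in enumerate(coords)
            if j != i and hypot(xi - xj, yi - yj) <= threshold
        ]
        for j in neighbours:
            weights[i][j] = 1.0 / len(neighbours)
    return weights


def local_morans_i(values, weights):
    """Local Moran's I per sensor: I_i = (z_i / m2) * sum_j w_ij * z_j."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # zero if all values are identical
    return [
        z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n))
        for i in range(n)
    ]


def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    numerator = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * numerator / sum(d * d for d in z)


# Hypothetical sensors along a street, with counts for one time window.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
counts = [10, 12, 30, 33]

weights = row_standardised_weights(coords, threshold=1.5)
local_i = local_morans_i(counts, weights)
global_i = global_morans_i(counts, weights)

print(local_i)
print(global_i)
```

With row-standardised weights, the global statistic equals the mean of the local values, which gives a quick consistency check. Repeating the calculation on counts aggregated to different time windows would give the comparison described in criterion 4.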

(1.4)=
## 1.4 Data Mining Objectives

In the first iteration of this report, we will focus on the first objective, aiming to at least partly deliver the first two project success criteria.

(1.5)=
## 1.5 Data Mining Success Criteria

- [x] Identify patterns or trends in the pedestrian count data that can be used for further analysis.
- [x] Establish an effective data preprocessing and cleaning method to improve the quality of data for ML models.
- [ ] Develop and train a predictive model that can accurately classify unseen data points by datetime and sensor.
- [ ] Evaluate the model's performance using appropriate metrics, such as accuracy, precision, recall, or F1-score.
- [ ] Document the insights gained from the data mining process that can contribute to the overall project objectives.
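
The metrics named in the fourth criterion can be computed directly from true and predicted labels. The sketch below is a plain-Python illustration for a multi-class problem such as predicting the originating sensor. The sensor labels `A`, `B`, `C`, the example predictions, and the choice of macro-averaging for F1 are assumptions, not results or decisions from this project.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Precision, recall and F1 for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so every sensor counts equally."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


# Hypothetical true and predicted sensor labels for six time-series windows.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

scores = per_class_scores(y_true, y_pred)
print(accuracy(y_true, y_pred))
print(scores)
print(macro_f1(scores))
```

Macro-averaging treats each sensor equally regardless of how many windows it contributes; if the classes are heavily imbalanced, a weighted average may be a more appropriate choice.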

## References

```{bibliography}
:filter: docname in docnames
```