# TM351 Data Management & Analysis


## TMA02 Preparation Tutorial

In [53]:
# This cell imports the libraries needed for the tutorial

import pandas as pd
import folium

# for JSON data
import requests
import json

# for MongoDB
import pymongo

## TMA02 Review

Review of what you need to do for TMA02

## Logistics reminder

### Preparation

Before you start the TMA:  
* download and unzip `2025J_TMA02.zip`
* rename the folder 2025J_TMA02 by prefixing it with your OU student PI (personal identifier). That is: *yourPI_2025J_TMA02*
* create two subdirectories: `images` and `data`. If you have used OpenRefine these should hold any cleaned data that your notebooks rely on and screenshots that show your working.

For example, before:

!["File structure - before changes"](images/TMA02-25b-before.jpg)


After, assuming my PI is *mg123*:

!["File structure - after"](images/TMA02-25b-after.jpg)
 

Always remember to keep backups of your work! For instance, if you are using the local VCE, upload your work to the remote VCE as a backup.

**Deadline**
If you are unable to submit the TMA by 12th March, let your tutor know beforehand to discuss your options.

Some of the questions should be answered in a notebook, some in a solution document called: `yourPI_TMA02_solution.doc (or .docx)`. Each question will guide you as to where you should include your answer.


**Using Generative AI**

*The OU guidance document "Generative AI for students" (https://about.open.ac.uk/policies-and-reports/policies-and-statements/generative-ai-learning-teaching-and-assessment-ou-0) defines the acceptable use policy for using Generative AI to support your studies. Using the framework defined in that document, TMA 02 is classed as a Category 2 activity, which means you may use Generative AI to assist you in completing an assessment piece as long as you acknowledge and report on its use.*

*You **must** complete and submit the Generative AI template supplied:* (https://learn2.open.ac.uk/mod/oucontent/olink.php?id=2548802&targetdoc=TM351+TMA02+Generative+AI+template) 

## Question One - 45 Marks

Police Crimes and Outcomes (8 + 5 + 10 + 10 + 6 + 6 = 45 marks total)

This should be answered in the notebook `yourPI_q1_2025.ipynb`

There are several parts to this question (a-f):

**a) Getting a feel for the Data (8 marks)**

*The police forces publish a wide range of data relating to their operations [ https://data.police.uk/about/ ]. The police publish information on accessing their data via a landing page [ https://data.police.uk/ ].*

*For the purposes of this assessment, you will work with data published by the police on crimes and outcomes in London.*


There are several bits to this part. Do make sure you answer them all:

- *under what terms is the data licensed?*
- *by what methods can copies of the data be obtained from the Police website, and in what file or data format(s) is the data made available?*
- *what data is published for a crime?*
- *identify one good aspect and one poor or weak aspect of how the police make their data available.*

__*8 marks*__

**b) Inspecting the data (5 marks)**

*Outcomes of crimes for an area of central London can be found in JSON format at: [https://data.police.uk/api/outcomes-at-location?date=2024-01&poly=51.575,0.248:51.575,0.001:51.475,0.001:51.475,0.248]*

*Outcomes of crimes for the London borough of Tower Hamlets can be found in JSON format at: [https://data.police.uk/api/outcomes-at-location?date=2024-01&poly=51.536508,-0.021129:51.507804,0.005502:51.508336,0.004562:51.505108,-0.005711:51.487091,-0.008352:51.487045,-0.010779:51.489810,-0.024926:51.502368,-0.029234:51.509395,-0.045493:51.502902,-0.061207:51.506404,-0.073998:51.505460,-0.074663:51.521512,-0.078529:51.535532,-0.062434:51.543330,-0.016550]*

*The following request for a larger area of central London returns a 503 error: [https://data.police.uk/api/outcomes-at-location?date=2024-01&poly=51.575,0.248:51.575,0.001:51.449,0.001:51.449,0.248]*

### Previewing the data in a *pandas* DataFrame

*Download and extract the establishment data from Tower Hamlets into a *pandas* DataFrame, with one establishment data record per row.*

```{admonition} Problems accessing the online data file
If you have problems accessing the online JSON file, a copy retrieved in April 2025 is available in the sample data folder (`2025J_TMA02_data/tma02_police/tower-hamlets-outcomes.json`).
```
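A request can fail (as with the 503 error mentioned above), so it can help to handle that case explicitly. The following is a minimal sketch rather than the required TMA approach: `load_json` is a hypothetical helper name, and the URL and fallback path you pass in are whatever applies to your own copy of the data. It tries the online source first and reads a local copy if the request fails.

```python
import json
from pathlib import Path

import requests


def load_json(url, fallback_path, timeout=30):
    """Return parsed JSON from url, or from fallback_path if the request fails.

    load_json is a hypothetical helper written for this tutorial.
    """
    try:
        resp = requests.get(url, timeout=timeout)
        # raise_for_status() turns 4xx/5xx responses (such as a 503) into exceptions
        resp.raise_for_status()
        return resp.json()
    except (requests.RequestException, ValueError) as err:
        print(f"Request failed ({err}); reading the local copy instead")
        return json.loads(Path(fallback_path).read_text(encoding="utf-8"))
```

Whichever source is used, check the type and length of what comes back before converting it to a DataFrame.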

**JSON Data**

A non-proprietary format is needed when exchanging data between different systems. Comma Separated Values (CSV) files are commonly used, since they are widely understood and most systems can generate and import them. CSV files are limited in that the data carries no semantics: the best you can hope for is a header row containing meaningful column names.

JSON is an example of semi-structured data, which contains {key:value} pairs, adding some semantics to the data.

When examining JSON data, the OS `head` command is less useful: because of the hierarchical structure and the way the file was generated, all the data may be stored on the first line. Even limiting the number of lines returned can show too much data, making it hard to evaluate whether there are any issues.

In [7]:
# uncomment the following line if you want to see the raw JSON data
#!head -1 data/2025J_TMA02_data/tma02_police/tower-hamlets-outcomes-260112.json
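As an alternative to `head`, you can summarise the parsed JSON instead of printing it all. This is a small sketch using only the standard library; `describe_json` is a hypothetical helper and the sample records are invented for illustration.

```python
import json


def describe_json(data, max_keys=10):
    """Return a one-line summary of a parsed JSON value (hypothetical helper)."""
    if isinstance(data, list):
        first = type(data[0]).__name__ if data else "n/a"
        return f"list of {len(data)} items; first item is a {first}"
    if isinstance(data, dict):
        keys = list(data)[:max_keys]
        return f"dict with {len(data)} keys; first keys: {keys}"
    return f"{type(data).__name__}: {data!r}"


# Invented sample, standing in for the contents of a JSON file
sample = json.loads('[{"id": 1, "place": {"name": "A"}}, {"id": 2, "place": {"name": "B"}}]')
print(describe_json(sample))     # summary of the top-level list
print(describe_json(sample[0]))  # summary of the first record
```

Applying the same helper to a sub-document (for example `sample[0]["place"]`) lets you walk down the hierarchy one level at a time.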

Do view the data in a browser, or text editor to check if it appears to be complete.

For example, this is the Central London data downloaded on the 12th January 2026:

!["Central London JSON data"](images/central-london-a.jpg)

This does not look very helpful if you just want to view the data, so click on the `Raw Data` tab to see the JSON data:

!["Central London JSON data"](images/central-london-b.jpg)

Click on: `Pretty Print` if your browser supports it:

!["Central London JSON data"](images/central-london-c.jpg)

In this case it is more appropriate to import the data and examine it in a dataframe.

The module notebook: `02.2.2 Data file formats - JSON.ipynb` includes examples of how to read the data.

Some of the examples used in this notebook use data directly from the BBC's iPlayer library. For example, have a look at BBC's Animal Park:

In [54]:
bbc_url = "http://www.bbc.co.uk/programmes/m0021r4h.json"
bbc_resp = requests.get(bbc_url)

aProgramme = bbc_resp.json()
aProgramme

{'programme': {'type': 'episode',
  'pid': 'm0021r4h',
  'expected_child_count': None,
  'position': 9,
  'image': {'pid': 'p0jj0dwm'},
  'media_type': 'audio_video',
  'title': 'Episode 9',
  'short_synopsis': 'The keepers rally around Ghost, an ageing lion who is suddenly losing weight.',
  'medium_synopsis': 'The keepers rally around Ghost, an ageing lion who is suddenly losing weight.',
  'long_synopsis': 'Along with feeding all the animals at Longleat Safari Park, the keepers need to keep an eye on their mood, behaviour, love life and, of course, their health. And that’s before doing any training or coming up with enrichment ideas. It’s a never-ending job, and they must be ready for anything. \n\nKate Humble’s first job today is creating a new toy for the magnificent colobus monkeys. There are seven boys that live in the troop on ‘gorilla island’, and keeper Carys has attached three bottles to a broom handle and stacked them full of monkey treats. Each bottle has different-sized h

Check what sort of data type is returned, since this can affect what you do with the results.

In [55]:
type(aProgramme)

dict

In [56]:
# Convert to a dataframe
bbc_df = pd.DataFrame(aProgramme)
bbc_df.head(10)

Unnamed: 0,programme
type,episode
pid,m0021r4h
expected_child_count,
position,9
image,{'pid': 'p0jj0dwm'}
media_type,audio_video
title,Episode 9
short_synopsis,"The keepers rally around Ghost, an ageing lion..."
medium_synopsis,"The keepers rally around Ghost, an ageing lion..."
long_synopsis,Along with feeding all the animals at Longleat...


In [57]:
# flattening the data might make it more readable
df_programme = pd.json_normalize(aProgramme["programme"])
df_programme

Unnamed: 0,type,pid,expected_child_count,position,media_type,title,short_synopsis,medium_synopsis,long_synopsis,first_broadcast_date,...,peers.previous.title,peers.previous.first_broadcast_date,peers.previous.position,peers.previous.media_type,peers.next.type,peers.next.pid,peers.next.title,peers.next.first_broadcast_date,peers.next.position,peers.next.media_type
0,episode,m0021r4h,,9,audio_video,Episode 9,"The keepers rally around Ghost, an ageing lion...","The keepers rally around Ghost, an ageing lion...",Along with feeding all the animals at Longleat...,2024-08-22T09:30:00+01:00,...,Episode 8,2024-08-21T09:30:00+01:00,8,audio_video,episode,m0021r4j,Episode 10,2024-08-23T09:30:00+01:00,10,audio_video
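The same function also works on a list of records, giving one row per record. A minimal sketch, assuming invented records (the field names `id`, `place`, `name` and `lat` are not taken from any real dataset):

```python
import pandas as pd

# Invented records for illustration only
records = [
    {"id": 1, "place": {"name": "North Road", "lat": "51.51"}},
    {"id": 2, "place": {"name": "South Road", "lat": "51.49"}},
]

# Each nested key becomes its own column; sep controls how the names are joined
flat_df = pd.json_normalize(records, sep="_")
print(flat_df.columns.tolist())
print(flat_df.shape)
```

Using `sep="_"` rather than the default `.` can make the column names easier to use later, for instance with `DataFrame.query()`.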


Do remember that JSON data is schema-less. One approach is to look at the keys to find out what the structure is.

In [60]:
# find out what keys the data has
aProgramme["programme"].keys(), df_programme.columns

(dict_keys(['type', 'pid', 'expected_child_count', 'position', 'image', 'media_type', 'title', 'short_synopsis', 'medium_synopsis', 'long_synopsis', 'first_broadcast_date', 'display_title', 'ownership', 'parent', 'peers', 'versions', 'links', 'supporting_content_items', 'categories']),
 Index(['type', 'pid', 'expected_child_count', 'position', 'media_type',
        'title', 'short_synopsis', 'medium_synopsis', 'long_synopsis',
        'first_broadcast_date', 'versions', 'links', 'supporting_content_items',
        'categories', 'image.pid', 'display_title.title',
        'display_title.subtitle', 'ownership.service.type',
        'ownership.service.id', 'ownership.service.key',
        'ownership.service.title', 'parent.programme.type',
        'parent.programme.pid', 'parent.programme.title',
        'parent.programme.short_synopsis', 'parent.programme.media_type',
        'parent.programme.position', 'parent.programme.image.pid',
        'parent.programme.expected_child_count',
     

As you can see, the data is not as structured as the CSV and Excel data you saw in TMA01.
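Because there is no schema, different records may not share the same keys. One way to check, sketched here with invented records and a hypothetical helper `key_counts`, is to count how often each key appears:

```python
from collections import Counter


def key_counts(records):
    """Count how many records contain each top-level key (hypothetical helper)."""
    counts = Counter()
    for rec in records:
        counts.update(rec.keys())
    return counts


# Invented records: the second one has no 'position' key
records = [
    {"pid": "a1", "title": "Episode 1", "position": 1},
    {"pid": "a2", "title": "Episode 2"},
]

counts = key_counts(records)
# Keys that are absent from at least one record, with the number of records missing them
missing = {key: len(records) - n for key, n in counts.items() if n < len(records)}
print(missing)
```

Any key that shows up in `missing` is worth a closer look before you decide how to store the data.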

An alternative is to store the data in some sort of permanent storage. The next two parts look at relational and document databases.

**c) Representing the Data in a Relational Database (10 marks)**

*Describe how you would represent the establishment data using a relational database. For each table in your database design, you should **identify the following** and **explain your reasoning** in each case:*

*1. what entity is represented by the table;*

*2. what columns would be used in the table and, if it is not clear, what data those columns would contain;*

*3. the table's primary and, if required, foreign keys;*

*4. any constraints that should be applied (including any constraints on keys).*

*Note: this question is primarily about the __design__ of the relational database, rather than its implementation. __You do not need to build the relational database when answering this question.__*

*You do not need to formally normalise the database, but thinking about how the database could be normalised may give you some insights into how to split the dataset into smaller tables.*

Do note the part about not needing to implement the schema.

`Part 8 Introduction to relational databases`, `Part 9 Relational data modelling` and `Part 10 Normalisation` will help with this part.

The goal of a relational database is to avoid repeating information, such as storing your name and address for every module you take at the OU. This means the data is normally split over several tables, each of which should be meaningful and hold one kind of data.

This can be seen in the Hospital example in Part 9:

![](images/tm351_pt09_f07.eps.jpg)

There is nothing to stop you putting all the information in one big table, but imagine the amount of data that would be duplicated.


For the TMA you need to examine the police data and think about how the data could be stored in more than one table. Look for fields that contain data that could be repeated in different records, such as the details of a location.

JSON data is hierarchical, so another thing to look for is any sub-documents, which could indicate a potential table. 

Remember to include the primary and foreign keys too. 

- Each table has one primary key, which must be unique and not null.
- Tables are linked implicitly using foreign keys. Each foreign key is matched to a primary key in the table it links to. For instance, in the hospital example above, the Prescription table contains three separate foreign keys to link it to the Patient, Doctor and Drug tables.
- Foreign key field(s) may include duplicate values and can be null if the relationship is optional.
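To see these rules in action, here is a minimal sketch using Python's built-in `sqlite3` module, chosen only because it needs no server. The `series` and `episode` tables and their columns are invented for illustration, and remember that the TMA question asks for a design, not an implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
CREATE TABLE series (
    pid   TEXT PRIMARY KEY NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE episode (
    pid        TEXT PRIMARY KEY NOT NULL,
    title      TEXT NOT NULL,
    position   INTEGER CHECK (position > 0),
    series_pid TEXT REFERENCES series (pid)  -- foreign key; may be NULL (optional link)
);
""")

conn.execute("INSERT INTO series VALUES ('s1', 'Example Series')")
conn.execute("INSERT INTO episode VALUES ('e1', 'Episode 1', 1, 's1')")
conn.execute("INSERT INTO episode VALUES ('e2', 'Episode 2', 2, NULL)")  # allowed: optional relationship

# A foreign key value with no matching primary key should be rejected
try:
    conn.execute("INSERT INTO episode VALUES ('e3', 'Episode 3', 3, 'missing')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(fk_rejected)
```

The `NOT NULL` and `CHECK` clauses are examples of the kind of constraint that part 4 of the question asks you to identify and justify.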

Let's look at some further BBC data and think about what tables could be generated from it.

In [61]:
# Traitors Series 4, Episode 6 - 09/01/26 (BBC Sounds/iPlayer can help find the unique identifier for a programme)
bbc_url = "http://www.bbc.co.uk/programmes/m002pl3w.json"
bbc_resp = requests.get(bbc_url)

aProgramme2 = bbc_resp.json()
aProgramme2

{'programme': {'type': 'episode',
  'pid': 'm002pl3w',
  'expected_child_count': None,
  'position': 6,
  'image': {'pid': 'p0mrr2ml'},
  'media_type': 'audio_video',
  'title': 'Episode 6',
  'short_synopsis': 'Are the players inching closer to uncovering the Traitors?',
  'medium_synopsis': 'As the game reaches the half-way point, are the players inching closer to uncovering the Traitors? The mission encourages everyone to reflect on their time in the castle.',
  'long_synopsis': 'As the game reaches the half-way point, are the players inching closer to uncovering the Traitors? The mission encourages everyone to reflect on their time in the castle, and money for the prize pot is not the only tempting thing on offer. As numbers dwindle, tension rises at the Round Table. The evening brings the Faithful a unique opportunity that could change the course of the game - if played correctly.',
  'first_broadcast_date': '2026-01-09T20:00:00Z',
  'display_title': {'title': 'The Traitors',
   '

In this case there seem to be some sub-documents that are worth investigating.

There is some general programme information at the start, then information about the parent programme and peers. These could all be potential tables. Look for potential unique keys in the data that could become primary keys too.
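As a rough sketch of that idea, the snippet below splits nested records into two sets of rows, with the sub-document's identifier acting as a foreign key. The records are invented and deliberately simpler than the real BBC structure, and the name `series_pid` is hypothetical.

```python
# Invented records: each episode repeats the details of its parent series
episodes = [
    {"pid": "e1", "title": "Episode 1", "parent": {"pid": "s1", "title": "Example Series"}},
    {"pid": "e2", "title": "Episode 2", "parent": {"pid": "s1", "title": "Example Series"}},
]

series_rows = {}   # keyed by series pid, so each series is stored only once
episode_rows = []  # one row per episode, pointing at its series

for ep in episodes:
    parent = ep["parent"]
    series_rows[parent["pid"]] = {"pid": parent["pid"], "title": parent["title"]}
    episode_rows.append(
        {"pid": ep["pid"], "title": ep["title"], "series_pid": parent["pid"]}
    )

print(len(series_rows), len(episode_rows))
```

The repeated series details collapse to a single row, which is exactly the duplication a relational design tries to remove.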

**d) Using a MongoDB database (10 marks)**

This part involves several steps:

### Setting up the database

*An alternative solution to using a relational database is to use a document database. In this question, you will load the data into a MongoDB document database and run some queries on it.*

*Your data should be stored in a MongoDB database named police and a collection called outcomes.*

*Each document in the collection should correspond to the establishment data for a single establishment.*

*Check that all the data have been loaded by checking that the size of the collection matches the number of records listed in the original data. Also, display an example record retrieved from the collection.*

**4 marks**

The above data is semi-structured and probably better suited to a document database, such as MongoDB.

Let's set up a MongoDB connection and store our two BBC programmes in a collection.

In [66]:
# Set up a MongoDB connection
MONGO_CONNECTION_STRING = "mongodb://localhost:27017/"
print(f"MONGO_CONNECTION_STRING = {MONGO_CONNECTION_STRING}")

MONGO_CONNECTION_STRING = mongodb://localhost:27017/


In [67]:
# set up a client
from pymongo import MongoClient
mongo_client = MongoClient(MONGO_CONNECTION_STRING)

In [68]:
# I'm likely to run this several times, so will drop my previous version
mongo_client.drop_database("bbc_db")

In [69]:
# Create database
db = mongo_client["bbc_db"]

# Create collection
bbc_collection = db["bbc_collection"]

In [70]:
# insert our two records
# use insert_one() here, since each response holds a single document
# use insert_many() if you have a list of documents
bbc_collection.insert_one(aProgramme["programme"])

InsertOneResult(ObjectId('696538e188c9481b924c1d73'), acknowledged=True)

In [71]:
bbc_collection.insert_one(aProgramme2["programme"])

InsertOneResult(ObjectId('696538e288c9481b924c1d74'), acknowledged=True)

In [72]:
bbc_collection.find_one()

{'_id': ObjectId('696538e188c9481b924c1d73'),
 'type': 'episode',
 'pid': 'm0021r4h',
 'expected_child_count': None,
 'position': 9,
 'image': {'pid': 'p0jj0dwm'},
 'media_type': 'audio_video',
 'title': 'Episode 9',
 'short_synopsis': 'The keepers rally around Ghost, an ageing lion who is suddenly losing weight.',
 'medium_synopsis': 'The keepers rally around Ghost, an ageing lion who is suddenly losing weight.',
 'long_synopsis': 'Along with feeding all the animals at Longleat Safari Park, the keepers need to keep an eye on their mood, behaviour, love life and, of course, their health. And that’s before doing any training or coming up with enrichment ideas. It’s a never-ending job, and they must be ready for anything. \n\nKate Humble’s first job today is creating a new toy for the magnificent colobus monkeys. There are seven boys that live in the troop on ‘gorilla island’, and keeper Carys has attached three bottles to a broom handle and stacked them full of monkey treats. Each bottl

In [73]:
# how many documents
bbc_collection.count_documents({})

2
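When the source is a list of records, the insert and the size check can be combined. This is a sketch under the assumption that `collection` is a pymongo collection that was empty beforehand; `load_documents` is a hypothetical helper name.

```python
def load_documents(collection, documents):
    """Insert a list of dicts and report (number inserted, number now stored).

    Hypothetical helper: `collection` is assumed to be a pymongo Collection.
    """
    result = collection.insert_many(documents)
    inserted = len(result.inserted_ids)       # one ObjectId per inserted document
    stored = collection.count_documents({})   # an empty filter counts every document
    return inserted, stored
```

For the two BBC programmes this could be called as `load_documents(bbc_collection, [aProgramme["programme"], aProgramme2["programme"]])`. If the two numbers differ, the collection probably still held documents from an earlier run.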

In [74]:
# parameters are usually provided as key: value pairs
# for example, to search for a particular title 
bbc_collection.find_one({"title": "Episode 6"})

{'_id': ObjectId('696538e288c9481b924c1d74'),
 'type': 'episode',
 'pid': 'm002pl3w',
 'expected_child_count': None,
 'position': 6,
 'image': {'pid': 'p0mrr2ml'},
 'media_type': 'audio_video',
 'title': 'Episode 6',
 'short_synopsis': 'Are the players inching closer to uncovering the Traitors?',
 'medium_synopsis': 'As the game reaches the half-way point, are the players inching closer to uncovering the Traitors? The mission encourages everyone to reflect on their time in the castle.',
 'long_synopsis': 'As the game reaches the half-way point, are the players inching closer to uncovering the Traitors? The mission encourages everyone to reflect on their time in the castle, and money for the prize pot is not the only tempting thing on offer. As numbers dwindle, tension rises at the Round Table. The evening brings the Faithful a unique opportunity that could change the course of the game - if played correctly.',
 'first_broadcast_date': '2026-01-09T20:00:00Z',
 'display_title': {'title':

In [75]:
# you can use the dot notation to search sub-documents
bbc_collection.find_one({"display_title.title": "Animal Park"})

{'_id': ObjectId('696538e188c9481b924c1d73'),
 'type': 'episode',
 'pid': 'm0021r4h',
 'expected_child_count': None,
 'position': 9,
 'image': {'pid': 'p0jj0dwm'},
 'media_type': 'audio_video',
 'title': 'Episode 9',
 'short_synopsis': 'The keepers rally around Ghost, an ageing lion who is suddenly losing weight.',
 'medium_synopsis': 'The keepers rally around Ghost, an ageing lion who is suddenly losing weight.',
 'long_synopsis': 'Along with feeding all the animals at Longleat Safari Park, the keepers need to keep an eye on their mood, behaviour, love life and, of course, their health. And that’s before doing any training or coming up with enrichment ideas. It’s a never-ending job, and they must be ready for anything. \n\nKate Humble’s first job today is creating a new toy for the magnificent colobus monkeys. There are seven boys that live in the troop on ‘gorilla island’, and keeper Carys has attached three bottles to a broom handle and stacked them full of monkey treats. Each bottl

If you are interested in document databases and want to see further examples of how to use a MongoDB database, see this notebook: `MongoDB.ipynb`. 

Do note, the examples in the MongoDB notebook go beyond what is needed for TMA02.

### Data Validation

Once you have imported the police data you are also asked to validate it. You are given a `partial_validation_schema` to check your data against.

When creating your collection, do check what other options are available; these will help you decide what to do with this schema:

https://www.mongodb.com/docs/manual/reference/method/db.createCollection
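As a generic illustration using the BBC fields (not the police data or the supplied `partial_validation_schema`), a validator is just a dictionary wrapped in `$jsonSchema`, and pymongo's `create_collection()` accepts it through the `validator` keyword. The helper names, the collection name and the rules below are all assumptions made for this sketch.

```python
def build_validator(required, properties):
    """Wrap field rules in the $jsonSchema form MongoDB expects (hypothetical helper)."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": list(required),
            "properties": dict(properties),
        }
    }


def create_validated_collection(db, name, validator):
    """Create `name` in a pymongo Database with the validator attached."""
    return db.create_collection(name, validator=validator)


# Invented rules for the BBC programme documents
bbc_validator = build_validator(
    required=["pid", "title"],
    properties={
        "pid": {"bsonType": "string"},
        "title": {"bsonType": "string"},
        "position": {"bsonType": "int", "minimum": 1},
    },
)
```

With a live connection, `create_validated_collection(db, "bbc_validated", bbc_validator)` would create the collection, and an insert that breaks the rules would then be expected to raise a `pymongo.errors.WriteError`.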

To do:
* create a collection `outcomes_cleaner` with the `partial_validation_schema` and say what errors were raised, if any (2 marks)
* clean the data so it passes the validation tests and discuss the advantages of using a MongoDB pipeline (2 marks)
* explain how to extend the partial validation schema to ensure that the ID contains an integer or a null value (2 marks)

**6 marks**
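The to-do list above mentions a MongoDB pipeline. As a generic illustration on the BBC collection (not the police data), an aggregation pipeline is a list of stages that the server runs in order. `count_by_field` is a hypothetical helper and `collection` is assumed to be a pymongo collection.

```python
def count_by_field(collection, field):
    """Return {value: count} for `field` using an aggregation pipeline."""
    pipeline = [
        # $group produces one output document per distinct value of the field
        {"$group": {"_id": f"${field}", "n": {"$sum": 1}}},
        # $sort puts the most common values first
        {"$sort": {"n": -1}},
    ]
    return {doc["_id"]: doc["n"] for doc in collection.aggregate(pipeline)}
```

For example, `count_by_field(bbc_collection, "media_type")` would count the stored programmes by media type. The work is done inside the database, so only the small summary is returned to the notebook.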

**e) Using the Geographic Data (3 + 3 = 6 marks)**

Two images are required:

*i) Using the folium package, generate a map that uses markers with pop-up labels to identify petrol and fuel stations in the area based on the data in the outcomes_cleaner collection. To what extent do you think your data query reliably identifies all of the petrol and fuel stations in the area?*

Note:
*You may use the pandas DataFrame or the MongoDB database as the source of your data, although you may find the cleaned and validated data easier to work with.*

__3 marks__

*ii) As well as using markers to locate individual establishments, we can also use choropleth maps to visualise aggregated data values within area boundaries.*

*An area often used for government analysis in England is the Lower Layer Super Output Area (LSOA).* 

*Using the data from the `outcomes_cleaner` collection, plot a choropleth map that depicts the number of crime outcomes in that collection in each LSOA.*

__3 marks__

Examples of creating choropleth maps for part ii can be found in `Notebook 05.2 Getting started with maps - folium`. `Notebook 05.X Optional notes on Geo data formats` has some examples of adding a marker to a map.

**f) Database models (6 marks)**

*Having explored the data, you need to write a short memo explaining whether a relational database (such as PostgreSQL) or a document database (such as MongoDB) would be more suitable for storing the Crime Outcome data.*

*Give two advantages and one disadvantage of each solution for this particular dataset. State which database you think would be more appropriate for an actual implementation. Your justification should be made with specific reference to the scenario.*

*Write no more than 250 words.*

***Marks will be capped at 2 out of 6 for a generic answer that does not reference the specific dataset used in this scenario.***

__*6 marks*__


This part asks you to think about which database system is best suited to the crime data. Do tailor your answer to what you discovered in parts c and d, otherwise your marks will be capped.

## Question Two - 55 Marks

This question gives you practice in what is expected in the EMA.

**Question 2 Overview**

*In Question 2, you will conduct a concise data investigation using the Police Crimes & Outcomes dataset for London. Your output is a report for a non-technical audience that you identify. Your audience should be aware of the crime/policing domain but not data analysis methods. Ensure that you clearly state the audience and why this may be of interest to them.*

*The recommended 1000-word limit is a soft cap for the main body of the report (executive summary, tables, and figures count; appendices/references do not, but markers will not award credit for material only in appendices). There is a 1250-word limit, which is a hard cap, after which your tutor will not mark any included text. Captions and titles are included in the word count, which may be checked by tutors by selecting the text in Microsoft Word. Include an appendix listing the notebooks and (if needed) the order to run them.*


*Formulate one practical question about the Police Crimes & Outcomes data from question 1, explain why it matters to your stated audience, and then answer it using appropriate analysis in a notebook.*

*Your report will be based on your analysis in a notebook, and this notebook is required as part of your submission to show you have completed the analysis. It should reproduce the results in the report and run without intervention.*

*For the avoidance of doubt by ‘without intervention’ we mean that when all cells are re-run in your notebook(s), they are all executed. If a cell produces an error, it is acceptable if it was a part of your analysis and does not cause the remaining cells not to be executed. In this case, your notebook should contain a cell explaining why the cell raised an error, and how it is handled in the subsequent analysis. All the results shown in your report should be successfully recreated.*

*In your report, be explicit about assumptions and limits, what your findings do and do not justify, and show how your methods address the question.*

Overall, what you need to do is:

* decide on a question and target audience
* carry out the investigative work in the notebook following the data pipeline. Look at the seven steps required for Q2 in TMA01 as guidance on what to do. However, you do not need to repeat a description of the data cleansing carried out for Q1; instead, refer to the relevant notebook
* make sure you include comments to say what you are doing and why throughout your notebook
* make sure you produce at least one Folium map and one other appropriate visualisation that is not a map, such as a bar or scatter graph. These should be different to those produced for Q1
* produce a report on the investigation

**Preparing for the EMA**

*To prepare for the EMA, use your Question 2 exploration to choose one additional dataset for later use and to pose two future questions (but do not answer these yet!). These should be included in the conclusions section of your report.*

Note, you are provided with two additional datasets and you only need to pick one of them for the questions:
1. Police data relating to Stop and Search (ema_police)
2. Data from the 2021 and 2011 censuses concerning the London labour market (ema_census)


So open the datasets and explore them.

Think about how you will create a Folium map.

Any metadata provided can help you understand them.

Do remember you only need to pick **one** of the two new datasets provided (Stop and Search or Census data).

Do note, if you opt to investigate a larger dataset you will need to consider how to download and store the data so that it can be submitted successfully for the TMA and EMA, where file sizes for submission are limited. You will likely need to clean the data and discard anything you are not interested in before storing it in a database for submission.

Remember: *The directory 2025J_TMA02_data can be deleted provided it does not contain any cleaned datafiles (but these should be in the data/ directory)*. It is probably a good idea to put any downloaded files into the data directory too.

**This is not a question to do last minute!!**

**Report structure and style**

This question is practice for writing your EMA report. For this part you should write your answer in a Word document: `yourPI_TMA02_solution.doc (or .docx)`.

*Write a tightly focused narrative that a non technical, decision oriented reader can follow:*

1. *Executive summary (must not exceed 200 words). State objectives, outline the analysis approach, summarise key findings with caveats, and recommend next steps.*
2. *Aims and Objectives. Use literature/sources to show why the questions matter (to the audience), and how they fit into the wider context.*
3. *Background to the investigation. Use literature sources to explain why the questions are significant and worth asking.*
4. *Scope and sources of data. Provenance, licences (with brief quotation), compliance, and any privacy/ethical notes.*
5. *The analysis pipeline. What you did and why those techniques are appropriate, any data cleaning or transformations?*
6. *Findings. Visualisations (folium map + at least one other visualisation), numerical statements, where helpful, and a balanced interpretation including assumptions, limitations and confounding factors.*
7. *Conclusions and preparing for the EMA. Provide recommendations for next steps, including the selection of a dataset and questions for the EMA (see next section).*
8. *References. Use Cite Them Right (Harvard) for intext citations and the list.*
9. *Appendix: Notebooks and data. A list of the notebooks and data provided in your solution.*
10. *A final word count of report sections 1-7.*


Writing a report can be difficult if you are out of practice.

`Part 5 Presentation: telling the story` can be useful for understanding how to report your findings.

In the discussion of `Exercise 5.6 Exploratory` you can find a file, `TM351_Report_Outline.docx`, which, as the name suggests, is a general-purpose reporting structure and can be useful for getting started.

Do also look at the links given under the `Sources of help` section of the TMA.

When discussing your results, things to think about commenting on:
- the range: how much the measurement varies
- the outliers: the maximum and minimum values (Are these unexpected? Do they represent errors in the data?)
- any trends: do the values increase, decrease or oscillate over time?
- any patterns: does anything stand out as a regular pattern?
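The first two points in the list above can be checked numerically before you write about them. This is a small sketch with invented values; the 1.5 × IQR rule used to flag outliers is a common rule of thumb, not a requirement of the TMA.

```python
import pandas as pd

# Invented monthly counts, for illustration only
values = pd.Series([12, 15, 14, 13, 16, 15, 48])

# The range: how much the measurement varies
value_range = values.max() - values.min()

# Flag values more than 1.5 * IQR below the lower or above the upper quartile
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

print(value_range)
print(outliers.tolist())
```

A flagged value is only a prompt to investigate: it may be a data error, or it may be the most interesting finding in your report.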


**Marking Guide**

Do make sure you look at the `Criteria` section. This will be how Question 2 is marked and is similar to how the EMA is marked too.

Each criterion has a set of indicative content for each score, but to gain the highest mark do take note of the **All the above, plus:** items too!

For example, for the first one, **1. Provide an executive summary outlining the context, analysis, findings and recommendations**, you cannot gain a score of 5-6 by doing just d. and e. without also carrying out a-c.

A summary can be found in the `TMA02 Q2 Marking Criteria 25J.docx` document provided in the tutorial. Do always check the website for any changes before the hand-in date.

## Reminders

**News**
- see the news from the 22nd December regarding a TMA02 typo 

**Deadlines**
- If you are unable to submit the TMA by the 12th March get in touch with your Tutor
- Bear in mind there are no extensions allowed for the EMA

**iCMAs**
- iCMA44 is due by the 29th January
- you need to get at least 30% in five of them

## Wrap Up

- Any questions?


### Data Sources

The police data used here is from the TM351 TMA02-25J assessment and can be found in the `2025J_TMA02_data` folder.

The data is licensed; details can be found on the data.police.uk website: https://data.police.uk/about/
