# Data Organisation and Good Practice

```{card}

<div class="alert alert-block" style="background-color: #ffffff; border: 1px solid #ccc; padding: 1em; border-radius: 8px;">

**Author:** Dorothea Seiler Vellame (GitHub: [dorotheavellame](https://github.com/dorotheavellame))

**License:** Creative Commons Attribution-ShareAlike 4.0 International license ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)).

</div>

```

## Course Objectives

- **Understand why good data organisation is important**
    Data organisation can help us manage projects, especially when the number of files is high and the project is collaborative.

- **Learn what best practice is**
    Clear file names and folder structure are the main contributors to good data organisation.

- **Differentiate between better and worse practice**
    Test your knowledge with a quiz.


## Why is good data organisation important? 

Data is the cornerstone of research.
At the start of a project, data organisation can seem simple enough: you know what your data is called and where it is. Over time, however, things can snowball. New data is added, and new scripts are saved with Final and FinalFINAL in the name.

Here are a few reasons why good data organisation can help your research:
- It makes data sharing, now often a requirement from funders, easier.
- It can protect against data loss, destruction, or corruption.
- It enables compliance with ethical codes, data protection laws, journal requirements, and funder or institutional policy.
- It can minimise future work and the confusion of having to work out what's what (_"Your primary collaborator is you in 6 months, and your past self doesn't answer emails"_).


Now that we know that research productivity can be improved by good practice, how do we implement it?

## What is best practice in data organisation?

There are two main components to best practice when it comes to data organisation: **file names** and **folder structure**.


### File naming conventions
Here are the general standards to follow when naming files:
1. Make file names brief and descriptive
2. If date or version is relevant, begin file name with the date, formatted YYYYMMDD or YYYY_MM_DD. This helps with version control and searching.
3. File names should not contain spaces; use underscores (_) instead.
4. File names should not contain special characters as they may break some systems.
5. Order elements from general to specific.
6. Use meaningful abbreviations only.
7. Use the correct file extensions.
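
Several of the conventions above (no spaces, no special characters, a file extension, a well-formed leading date) can be checked automatically. The following is a minimal Python sketch, not a definitive validator: the function name `check_file_name`, the set of allowed characters, and the rule that any eight leading digits are treated as a date are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumption: a leading date is written as YYYYMMDD or YYYY_MM_DD.
DATE_PREFIX = re.compile(r"^(\d{8}|\d{4}_\d{2}_\d{2})")
# Assumption: letters, digits, underscores and hyphens are "safe".
# Spaces are excluded here because they are reported separately.
SPECIAL_CHARACTERS = re.compile(r"[^A-Za-z0-9_\- ]")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []

    # Split on the last dot to separate the stem from the extension.
    stem, dot, extension = name.rpartition(".")
    if not dot or not stem or not extension:
        problems.append("missing file extension")
        stem = name

    if " " in name:
        problems.append("contains spaces")

    if SPECIAL_CHARACTERS.search(stem):
        problems.append("contains special characters")

    # If the name starts with something that looks like a date, check it is real.
    date_match = DATE_PREFIX.match(stem)
    if date_match:
        digits = date_match.group().replace("_", "")
        try:
            datetime.strptime(digits, "%Y%m%d")
        except ValueError:
            problems.append("leading date is not a valid YYYYMMDD date")

    return problems
```

For example, `check_file_name("my results (final).csv")` reports both the spaces and the special characters, while `check_file_name("20251225_nice_list.csv")` returns an empty list. Rules such as "brief and descriptive" or "meaningful abbreviations only" still need human judgement.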



### Folder structure
An organised folder structure makes files easier to find. Folder names should follow the same conventions as file names. The exact structure will depend on the research in question, but it should aim to build on the following:

```text
Research_Project_Name/
│
├── README.md (a README describing the research project and what is contained)
│
├── data/
│   ├── README_data.md (a README describing the data in each folder)
│   │
│   ├── raw/
│   │   └── <raw data files>
│   │
│   ├── processed/
│   │   └── <processed data files>
│   │
│   └── metadata/
│       └── <metadata files>
│
├── scripts/
│   └── <scripts to carry out any data analysis>
│
└── results/
    └── <results figures or tables> (could be split into file type folders)
```

Subfolders within a research project are broadly split by type (e.g. data files or scripts) and by use (e.g. raw vs processed data).
Avoid having too many layers in your folder hierarchy, as deep nesting makes it more difficult to navigate.
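
If you start new projects often, the layout above can be created with a short script rather than by hand. The following is a minimal Python sketch using the standard `pathlib` module. The function name `create_project_skeleton` is hypothetical, the folder and README names are copied from the tree above, and the README files are created empty for you to fill in.

```python
from pathlib import Path

# Folder and file names taken from the suggested layout; adjust to your project.
SUBFOLDERS = ["data/raw", "data/processed", "data/metadata", "scripts", "results"]
README_FILES = ["README.md", "data/README_data.md"]


def create_project_skeleton(root: Path) -> None:
    """Create the suggested folders and empty README files under root."""
    for folder in SUBFOLDERS:
        # parents=True creates intermediate folders such as data/;
        # exist_ok=True makes the function safe to run more than once.
        (root / folder).mkdir(parents=True, exist_ok=True)
    for readme in README_FILES:
        # touch() creates the file if missing and leaves existing content alone.
        (root / readme).touch(exist_ok=True)
```

Calling `create_project_skeleton(Path("Research_Project_Name"))` creates the folders relative to the current working directory. Existing folders and files are left untouched.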

### README files
In addition to good folder and file naming, README files can help keep a project organised. They are usually plain text files that describe the contents of a folder. The project-level README could include a description of the project, whereas the data README could describe the files in more detail than is possible in a file name, such as where the raw data was generated.

README files should be formatted in a consistent way:
- Use plain text to write them.
- Format the files in an understandable way.
- Include important information about each file:
    - `<Filename>.<format>`
    - `<Dataset unique ID>`
    - `<Date created in YYYYMMDD>`
    - `<File owner> <contact information>`
    - `<Data description>`
    - `<Methodology or scripts used>`

**Optional**: you could include a README changelog to track how your files have changed.
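
One way to keep README entries consistent is to generate them from a fixed template. The following is a minimal Python sketch of that idea. The class name `ReadmeEntry`, its field names, and the output labels are illustrative assumptions rather than a required format. The fields themselves mirror the per-file information listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReadmeEntry:
    """One file's entry in a data README (field names are illustrative)."""

    filename: str
    dataset_id: str
    created: date
    owner: str
    contact: str
    description: str
    methodology: str

    def to_text(self) -> str:
        """Render the entry as plain text, one field per line."""
        lines = [
            f"File: {self.filename}",
            f"Dataset ID: {self.dataset_id}",
            # Dates are written as YYYYMMDD, matching the file naming convention.
            f"Date created: {self.created:%Y%m%d}",
            f"Owner: {self.owner} ({self.contact})",
            f"Description: {self.description}",
            f"Methodology: {self.methodology}",
        ]
        return "\n".join(lines) + "\n"
```

The text returned by `to_text()` can be appended to a README file, with one blank line between entries, so that every file is described in the same order and format.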


## Quiz
1. Which of these filenames does **not** follow best practice?
 ```text
    a) 20251225nice_list.csv
    b) NorthSeaFishLengths.csv
    c) WYCTMCTSimportant.txt
    d) 2026_02_17PancakeDayPresentation.pptx
```
=> c. It uses an unexplained acronym and is the only file whose contents are not immediately clear.

2. Which of these could be included in a data README.txt?
 ```text
    a) Filename
    b) Unique dataset ID
    c) Data description
    d) All of the above
```
=> d. The filename, a unique dataset ID, and a data description are all part of the information a data README should include for each file.


3.

 ```text
    a) 
    b) 
    c) 
    d) 
```
