# `YAML`

> `YAML` is a data serialisation language. Its simplicity and ease of integration make it a common choice for configuration files and for exchanging data between languages with different data structures. At its core, `YAML` is designed to be intuitive and flexible, with a minimal syntax that can nonetheless handle complex nested data structures. It uses indentation instead of tags to represent nested structures, making it particularly easy for humans to read. The name `YAML` is a recursive acronym that stands for "`YAML` Ain't Markup Language".


## Why Use `YAML`?


`YAML` is popular for a number of reasons:

- **Human Readable**: It uses indentation instead of tags to represent nested structures, giving it a minimal format that is easy for humans to read and interpret, compared to alternatives such as `JSON` and `XML`

- **Language Agnostic**: It can be used in conjunction with various programming languages like Python, Java, and Ruby

- **Used by a Wide Range of Tools**: Various popular tools in data science, data engineering, DevOps and cloud engineering use `YAML` for configurations. Examples include Docker, Kubernetes, `PyTorch` and Conda.

- **Minimalism and Simplicity**: `YAML` deliberately keeps its syntax minimal and avoids complex structures. This helps readability, and it also makes files easier to write correctly, which reduces the time spent troubleshooting and debugging.


Common `YAML` use cases include:


- **Configuration Files**: Due to its readability and ease of use, `YAML` is widely utilised to write configuration files for applications, systems, and development tools. Developers can easily discern settings and modify configurations as per requirements without steep learning curves.

- **Data Interchange**: `YAML` is commonly used for data interchange between languages with different data structures or syntax, providing a cross-language environment for sharing data structures while maintaining human readability

- **Metadata Storage**: In projects, `YAML` is often employed to store metadata that requires regular updates or reviews by humans, ensuring that the contextual information is easily accessible and modifiable





## Structure and Syntax

### `YAML` File Extension

`YAML` typically uses one of two file extensions: `.yaml` and `.yml`. There is no distinction between these file extensions and they can be used interchangeably. `YAML` itself does not depend on the file extension used, and you can write `YAML` in a file with any name, but it is wise to check the documentation of any tool you are using in conjunction with `YAML` to see whether it imposes any constraints on the file extension.





### Key-Value Pairs 

In `YAML`, data is usually represented with key-value pairs, making it straightforward to model dictionaries, hashes, or maps from various programming languages. The key is followed by a colon and a space, and then the value is specified:

```yaml
first_name: John
last_name: Doe
email: john@example.com
```
Here, `first_name`, `last_name`, and `email` are keys, each associated with their respective values.


### Indentation and Separation

`YAML` relies on indentation to represent nested structures. Indentation must use spaces: the `YAML` specification does not allow tab characters for indentation. Two spaces per level is a widely used convention, but the essential point is consistency. Entries at the same level must share the same indentation, because inconsistent spacing changes or breaks the data structure.

```yaml
parent:
  child1: 
    grandchild1: value1
    grandchild2: value2
  child2: 
    grandchild3: value3
```
In the above example, each indentation level (using two spaces) delineates a nested layer of the data hierarchy. The keys `grandchild1` and `grandchild2` are nested within `child1`, which is in turn nested within `parent`.

### Comments 

Comments in `YAML` are designated by the `#` symbol. Anything following this symbol on a line is treated as a comment, providing a useful way to annotate your `YAML` files with descriptive information and usage details without affecting the data structure.

```yaml
# This is a single-line comment
key: value  # This is an inline comment
```
In this snippet, the first line is purely a comment, while the second line demonstrates how comments can be placed inline with data.





### Sequences (Lists)

Sequences in `YAML` are denoted by using hyphens `-` followed by a space. These represent list-like structures and can include different scalar types or even nested sequences:

```yaml
fruits:
  - Apple
  - Banana
  - Cherry
  
```
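
As a small sketch of the mixed and nested cases mentioned above (the key names `mixed`, `matrix` and `matrix_flow` are made up for illustration), a sequence can hold items of different types, and its items can themselves be sequences. The last line shows the same nested data in the more compact flow style:

```yaml
# Items of different scalar types in one sequence
mixed:
  - 42          # integer
  - 3.14        # float
  - true        # boolean
  - plain text  # string

# A sequence whose items are themselves sequences
matrix:
  - - 1
    - 2
  - - 3
    - 4

# The same nested sequence written in flow style
matrix_flow: [[1, 2], [3, 4]]
```

Flow style is convenient for short lists, while the block style with hyphens tends to stay more readable as the data grows.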




## `YAML` Data Types

> In `YAML`, single values such as strings, numbers and booleans are known as `scalars`. There is no need to specify their data types explicitly; the parser infers them from the syntax.

### Integers and Floats

```yaml
integer: 42
float: 42.0
```

In this example, `integer` is mapped to an integer value, while `float` is mapped to a floating-point number. As previously stated, there is no need to specify types; `YAML` discerns them based on syntax.
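
A few other numeric notations are sketched below; the key names are illustrative. These forms are widely supported, but notation support varies between parsers and `YAML` versions (octal literals, for instance, are written differently in `YAML` 1.1 and 1.2), so it is worth checking how your own parser loads them:

```yaml
negative: -17
hexadecimal: 0x1F     # the integer 31
scientific: 1.5e+3    # the float 1500.0
infinity: .inf
not_a_number: .nan
```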


### Strings

Strings in `YAML` can typically be written without quotation marks, but using them can be beneficial in certain scenarios, especially to handle special characters or to enforce string interpretation.

```yaml
simple_string: Hello, YAML
string_with_special_characters: "Hello, YAML: # See the comment"
string_version_of_an_integer: "5"
```

In the second example, the quotation marks are required: without them, the colon followed by a space would be read as a key-value separator and everything after the ` #` would be treated as a comment. In the third example, the quotes ensure that `5` is interpreted as a string rather than an integer.
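
The choice between single and double quotes also matters. As a brief illustration (the key names and the path are invented for this example), double-quoted strings process backslash escape sequences, while single-quoted strings take every character literally:

```yaml
double_quoted: "Line one\nLine two"   # \n becomes a real line break
single_quoted: 'Line one\nLine two'   # the backslash and the n stay as two literal characters
apostrophe: 'It''s escaped by doubling the single quote'
windows_path: 'C:\Users\john'         # single quotes avoid having to escape backslashes
```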



### Booleans

Booleans in `YAML` can be represented in several ways, allowing for a degree of flexibility. 

```yaml
standard_boolean: true
alternative_boolean: yes
```

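Under the older `YAML` 1.1 specification, the following values are defined as meaning `True`:

`true`, `True`, `TRUE`, `yes`, `Yes`, `YES`, `y`, `Y`, `on`, `On`, and `ON`.

Conversely, the following are defined as `False`:

`false`, `False`, `FALSE`, `no`, `No`, `NO`, `n`, `N`, `off`, `Off`, and `OFF`.

`YAML` 1.2 narrows this to `true` and `false` (and their capitalised forms), and parsers differ in which version they follow, so the same file can load differently in different tools. Writing `true` and `false` is therefore the safest choice, and any `yes`, `no`, `on` or `off` that is meant as text should be quoted.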


## Advanced Concepts in `YAML`



### Nested Data Structures

`YAML`’s indentation-based syntax allows for easy creation of nested data structures like lists within dictionaries, dictionaries within lists, and other combinations, enabling a hierarchical organisation of complex data:

```yaml
employee:
  name: John Doe
  skills:
    - Python
    - Data Analysis
  projects:
    - name: Project A
      status: Completed
    - name: Project B
      status: Ongoing
```

Here, `employee` is a dictionary containing strings, lists (`skills`), and a list of dictionaries (`projects`), showcasing a versatile nested structure to encapsulate varied and detailed data seamlessly.



### Multiline Strings

`YAML` supports multiline strings, known as block scalars, in two styles: literal, which preserves line breaks, and folded, which joins lines together.

```yaml
literal_block_scalar: |
  This is a multiline string.
  Line breaks will be preserved.
  
folded_block_scalar: >
  This is a multiline string.
  Line breaks will be replaced by spaces.
```

The `|` character (literal style) maintains line breaks, while `>` (folded style) replaces each single line break with a space, so that consecutive lines are joined into one.
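
Folding does not remove every line break: a blank line inside a folded scalar is kept as a line break, which makes it possible to write paragraphs. The sketch below uses made-up key names and shows the folded value next to an equivalent double-quoted string:

```yaml
folded_paragraphs: >
  These two lines
  are joined by a space.

  A blank line starts a new paragraph.

# The value above is equivalent to this double-quoted string:
same_value: "These two lines are joined by a space.\nA blank line starts a new paragraph.\n"
```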

#### Block Scalar Header Options

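The header of a block scalar accepts optional indicators: a chomping indicator (`-` or `+`) controls what happens to trailing line breaks, and an indentation indicator (a digit) states the indentation of the content explicitly. The examples below show the chomping indicators.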

```yaml
chomping: |
  This string will have its final newline preserved.

strip_chomping: |-
  This string will not have its final newline preserved.

keep_all_trailing_newlines: |+
  This string will keep all final newlines.
```

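In these examples, the standard `|` keeps a single final line break and drops any further trailing blank lines, `|-` removes the final line break as well, and `|+` keeps the final line break together with all trailing blank lines, offering versatile control over your string formatting.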
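
The indentation indicator is needed less often. One case where it helps is a block whose first line starts with extra spaces that should be part of the value. The following is a minimal sketch with illustrative key names; the digit `2` declares that the content is indented by two spaces relative to its parent:

```yaml
# Without the "2", the parser would take the first line's four spaces
# as the block's indentation and reject the less-indented second line.
indented: |2
    starts with two spaces
  starts at the margin

# Equivalent double-quoted form:
same_value: "  starts with two spaces\nstarts at the margin\n"

# Indicators can be combined, here with the strip indicator:
combined: |2-
    no trailing newline
```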


### Directives and Tags

In `YAML`, *directives* and *tags* act as guiding signals for processors, ensuring proper parsing and interpretation by explicitly communicating metadata or format specifications.

#### Directives

**Directives**, identified by a percentage sign `%`, set overarching rules for the YAML processor, influencing the parsing of the entire document. 

```yaml
%YAML 1.2
---
example: 12345
```

Here, `%YAML 1.2` declares that the document follows the `YAML 1.2` specification. **Directives** must be followed by a line of three hyphens `---`, which marks the start of a `YAML` document. Declaring the version makes the intended parsing rules explicit, although how strictly the declaration is honoured depends on the processor.
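
The `---` marker is also useful without directives, because it lets a single file hold several documents. As an illustrative sketch (the keys and values are invented), the file below contains two separate documents:

```yaml
---
environment: staging
replicas: 1
---
environment: production
replicas: 3
```

Tools such as Kubernetes accept multi-document files like this, and many parsers expose a dedicated function for loading all documents from one stream.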

#### Tags

**Tags**, on the other hand, inform the `YAML` processor about the desired data type for a particular scalar, assuring it is interpreted and converted as intended. 

```yaml
example: !!str 12345
```

In the above code, `!!str` acts as a **tag** that tells the processor to interpret `12345` as a string, despite its numerical appearance. Without the `!!str` **tag**, the processor would infer an integer from the numeric characters. **Tags** hence grant control over scalar interpretation, ensuring data is treated as desired even when its form might suggest otherwise.
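
The same mechanism works for other built-in types. The sketch below uses hypothetical key names and the standard `!!str` and `!!float` tags; the comments describe the result most parsers should produce:

```yaml
zip_code: !!str 90210       # the string "90210" rather than the integer 90210
flag_text: !!str true       # the string "true" rather than a boolean
whole_as_float: !!float 3   # the float 3.0 rather than the integer 3
```

For strings, quoting the value achieves the same result and is usually easier to read; explicit tags are most useful when the target type is not a string.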

### Nodes and Anchors

`YAML` allows the use of *nodes* and *anchors* for reusing existing data, which prevents redundancy and maintains the DRY (Don’t Repeat Yourself) principle.

> **Nodes** are individual pieces of data. A node might be a scalar (such as a string or an integer), a sequence (an ordered list of items), or a mapping (a set of key-value pairs). 

> **Anchors** (`&`) and **aliases** (`*`) let you name a node and refer back to it: an **anchor** attaches a name to a node, and an **alias** reuses that node elsewhere in the document, fostering data consistency and reducing redundancy.

#### Defining an **Anchor**

**Anchors** are defined using the ampersand `&` followed by a name. This name then allows the associated **node** to be referenced elsewhere in the document.

```yaml
default_address: &default_address
  city: NullTown
  zipcode: "00000"
```

Here, `default_address` is a named **anchor** associated with a mapping **node**. The `&default_address` syntax defines the **anchor**, enabling this **node** to be referenced elsewhere.

#### Referencing an **Anchor**

To reference an anchored **node**, use the asterisk `*` followed by the **anchor** name. 

```yaml
john:
  <<: *default_address
  name: John Doe
  
jane:
  <<: *default_address
  name: Jane Doe
```

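In these examples, `*default_address` is an alias pointing to the anchored `default_address` **node**. The `<<` merge key, covered in the next section, copies the key-value pairs of that node into `john` and `jane`.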
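
An alias can also be used on its own, without the merge key, in which case the whole anchored node becomes the value. A minimal sketch, with `billing_address` and `shipping_address` as made-up keys:

```yaml
default_address: &default_address
  city: NullTown
  zipcode: "00000"

# Each key receives the complete anchored mapping as its value
billing_address: *default_address
shipping_address: *default_address
```

Here `billing_address` and `shipping_address` both hold the same `city` and `zipcode` mapping, so a change to `default_address` is reflected in both.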

#### Merging

The `<<` **merge** key is used to merge **nodes**, often in conjunction with **anchors** and **aliases** to inject existing data structures into new ones, providing a powerful way to reuse and combine data:

```yaml
default: &DEFAULT
  color: blue
  size: M
  
shirt_A:
  <<: *DEFAULT
  price: $20
  
shirt_B:
  <<: *DEFAULT
  price: $30
  size: L
```

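In the above example, `shirt_A` inherits `color` and `size` from `default` via the anchor-alias mechanism and adds its own `price`. `shirt_B` demonstrates merging with overrides: it inherits the same properties but replaces `size`, because keys written explicitly take precedence over merged ones. Note that the merge key is an optional extension from `YAML` 1.1 rather than part of the core `YAML` 1.2 specification, so check that your parser supports it.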
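
Parsers that support the merge key generally also accept a list of aliases, which merges several mappings at once. The following sketch assumes such a parser; the anchors `COLOURS` and `SIZING` and the key `shirt_C` are invented for illustration:

```yaml
colours: &COLOURS
  color: blue
  trim: white

sizing: &SIZING
  size: M
  fit: regular

shirt_C:
  <<: [*COLOURS, *SIZING]   # merge both mappings
  size: S                   # explicit key overrides the merged value
```

With this input, `shirt_C` ends up with `color`, `trim` and `fit` from the anchored mappings, and `size` set to `S`.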


## Common Pitfalls and How to Avoid Them

YAML is straightforward once you are accustomed to its nuances, but a few issues come up repeatedly. This section covers the most common ones and how to avoid them.

### Syntax Issues

YAML is notably strict with its syntax to preserve its simplicity and readability. Common syntax pitfalls include incorrect indentation, mishandling scalars, or misplacing dashes.

- **Indentation**: Use spaces, not tabs, for indentation and ensure consistency to avoid misinterpretation.
  
  ```yaml
  # Correct
  person:
    name: Jane

  # Incorrect: this still parses, but name becomes a sibling of person rather than its child
  person:
  name: Jane
  ```
- **Scalars**: Be careful with a colon followed by a space (`: `) inside a scalar, or a hyphen followed by a space (`- `) at its start, as these are read as key-value and list indicators respectively. Quote the scalar to keep them as text.
  
  ```yaml
  # Correct
  sentence: "Hello: World"

  # Incorrect
  sentence: Hello: World
  ```
  
### Data Type Mismatches

Incorrect or unexpected data types can result in erroneous processing of your YAML file. The lack of explicit type definitions in YAML may sometimes lead to unintentional type inference.

- **Quoting**: Quote scalars that might otherwise be inferred as a different data type.
  
  ```yaml
  # Correct
  version: "1.2"

  # Incorrect if a string is intended: parsed as the float 1.2
  version: 1.2
  ```
- **Boolean Representations**: Beware of values like `yes`, `no`, `true`, and `false`, which many parsers read as booleans unless quoted.

  ```yaml
  # Correct
  answer: "yes"

  # Incorrect
  answer: yes
  ```
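
A few more values that are commonly caught by type inference are sketched below, each quoted so that it stays a string. The keys and values are illustrative, and the exact behaviour of the unquoted forms depends on the parser and the YAML version it follows:

```yaml
version: "1.10"              # unquoted, 1.10 is read as the float 1.1
zip_code: "01234"            # unquoted, leading zeros may be dropped or read as octal
release_date: "2024-01-15"   # unquoted, some parsers convert this to a date
middle_name: "null"          # unquoted, null (or ~) means "no value"
```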
  
### Debugging YAML Files

Given that issues within YAML files can sometimes be tricky to spot, employing a systematic approach towards debugging can be useful.

- **Manual Review**: Regularly reviewing the file, especially focusing on indentations and scalar values, can prevent many common issues

- **Comments**: Make use of comments (`#`) to annotate your code, providing insights for yourself and collaborators on the structure and purpose of your data



### `YAML` Linters

A *`YAML` Linter* is a tool that analyses `YAML` files to detect syntax errors, stylistic problems, and suspicious constructs, helping you ensure that your file is syntactically correct and adheres to best practices. A good **linter** doesn't just flag errors but also suggests corrections and occasionally corrects identified issues automatically.

When working with technologies such as Kubernetes or GitHub Actions, having clean, error-free `YAML` files is of high importance. **Linters** can help identify issues before they create problems in a live environment.

#### Why Use a `YAML` Linter?

- **Syntax Verification**: Ensure that your file adheres to `YAML`’s strict syntax rules
- **Troubleshooting**: Identify and rectify issues before they affect your workflows
- **Conformity**: Assure your `YAML` documents conform to best practices and are consistently structured
- **Collaboration**: Help teams adhere to a common standard, improving readability and reducing the chance of errors

#### Examples of `YAML` Linters

Here are a few examples of `YAML` **linters** that you might find useful:

- **Online Linters**: 
  - [YAML Lint](http://www.yamllint.com/): A straightforward online tool that validates your `YAML` syntax and indicates where errors lie
  - [OnlineYAMLParser](https://codebeautify.org/yaml-parser): Another online tool that parses `YAML` and even converts it to JSON, providing a visual tree structure for better understanding

- **Command-Line Linters**:
  - [yamllint](https://github.com/adrienverge/yamllint): A **linter** that not only checks for syntax validity but also looks for cosmetic improvements, ensuring that files are consistent and easy to read. Installable and usable from your command line
  
    ```bash
    pip install yamllint
    yamllint your_file.yaml
    ```

Using a **linter**, whether online or via your command line, helps you catch and correct issues in your `YAML` files before they cause errors at runtime.
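
`yamllint` can be tuned with a configuration file, which is itself written in `YAML`. The sketch below assumes a file named `.yamllint` in the project root; the rule names come from `yamllint`, while the chosen values (such as the 120-character limit) are only examples:

```yaml
# Start from yamllint's default rule set and adjust a few rules
extends: default

rules:
  line-length:
    max: 120                           # example limit, adjust to your team's preference
  indentation:
    spaces: 2                          # require two-space indentation
  truthy:
    allowed-values: ["true", "false"]  # flag yes/no/on/off used as booleans
  document-start: disable              # do not require a leading ---
```

Restricting the `truthy` rule in this way is one option for catching the boolean ambiguities described earlier before they reach a parser.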

## Key Takeaways

- `YAML` is a data serialisation language with a minimal, human-readable syntax
- It is popular in a variety of use cases, including configuration files and storing data or parameters
- Single values, known as **scalars**, have their data types inferred by the `YAML` parser and do not need to be specified explicitly
- `YAML` supports key-value pairs, lists, and nested data structures, which are represented via indentation
- Strings in `YAML` do not require quotes, but quotes can be used to prevent misinterpretation of reserved characters like `:`
- The `|`, `>`, `-` and `+` indicators can be used to alter the interpretation of multi-line strings
- **Directives**, indicated by the `%` character, relay instructions to the `YAML` parser
- **Anchors** and **merging** can be used to adhere to the DRY principle
- `YAML` **linters** can be used to catch and handle syntax issues
