# Parsing Metadata from Filenames: Working with Strings, Lists, and Dicts

This section applies Python's core data structures (dictionaries, lists, and strings) to a practical task: extracting and managing metadata. As neuroscience grad students, you will often work with experimental data that needs to be organized systematically. Metadata, or data about data, comes in many formats, and filenames are often a rich source of it. This section shows how to extract and manipulate that information so that your data analysis is more structured and less error-prone.



## Key-Value Mappings: Dictionaries

As neuroscience researchers, we often need to associate specific values with unique identifiers, whether those are experimental conditions, subject details, or measurement parameters. Python's dictionary is a versatile data structure for exactly this purpose: it stores and retrieves data as key-value pairs:

```python
{"name": "Emma",   "Date": "2022-07-23"}
#--Key----Value-   -Key-------Value----
```

You start with basic dictionary operations: creating a dictionary, adding elements, and accessing them. The exercises reflect realistic use cases, such as storing and accessing metadata from experimental recordings.



| Code | Description |
| :-- | :-- |
| `data = {}` | Makes an empty dict |
| `data = {'a': 3, 'b': 5}` | Makes a dict with two items: `'a'` and `'b'` |
| `data['a']` | Accesses the value associated with key `'a'` |
| `data['c'] = 7` | Adds a new key-value pair `'c': 7` to the dict |
| `list(data.keys())` | Retrieves a list of all keys in the dict |
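
The operations in the table can be tried together in one short sketch. The keys and values here are just the placeholder values from the table, not real data, and `value_a` and `keys` are names chosen for illustration.

```python
# Start with an empty dict, then fill it one item at a time
data = {}
data['a'] = 3
data['b'] = 5

# Access a value by its key
value_a = data['a']
print(value_a)  # 3

# Add a new key-value pair
data['c'] = 7

# Keys come back in the order they were inserted
keys = list(data.keys())
print(keys)  # ['a', 'b', 'c']
```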

**Exercises**

The `image` dict describes how researcher Tom's recording is formatted:

In [1]:
image = {'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}
image

{'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}

**Example**: Write the code to print out the width of the image, by accessing the `"width"` key:

In [3]:
image['width']

1080

What is the height of the image?

In [2]:
image['height']

1920

How are the pixel data in the image formatted?

In [4]:
image['format']

'RGB'

What does the error message say if you use the same syntax to find out which key has the value `1080`? What does this tell you about how key-value maps like dictionaries are designed to be used?

In [5]:
image[1080]  # Values are looked up by key, not the other way around.

KeyError: 1080
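
Because lookups go from key to value, finding the key that holds a given value takes a little extra work. One possible approach, sketched here with the `image` dict from above, is to loop over `image.items()`. The names `keys_with_1080`, `has_1080_key`, and `missing` are chosen just for this illustration. The `in` operator and the `.get()` method are also handy for checking a key without triggering a `KeyError`.

```python
image = {'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}

# Collect every key whose value equals 1080 (there could be more than one)
keys_with_1080 = [key for key, value in image.items() if value == 1080]
print(keys_with_1080)  # ['width']

# `in` checks the keys, not the values
has_1080_key = 1080 in image
print(has_1080_key)  # False

# .get() returns None (or a default you choose) instead of raising KeyError
missing = image.get(1080)
print(missing)  # None
```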

Make a dictionary: Reorganize the code below: tell Python that the three variables below all belong together by putting them into a dictionary called `session`.

In [7]:
subject = "Josie"
date = "2023-07-23"
group = "control"

In [8]:
session = {'subject': subject, 'date': date, 'group': group}
session

{'subject': 'Josie', 'date': '2023-07-23', 'group': 'control'}

Check that the dictionary is constructed properly by getting the subject from it. It should show `'Josie'`.

In [9]:
session['subject']

'Josie'
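
In practice, values like these are often embedded in a filename rather than typed in by hand. As a rough sketch, assuming a hypothetical filename that follows a `subject_date_group.csv` pattern (this naming scheme is an assumption for illustration, not one defined in this section), the same `session` dict could be built with string methods:

```python
# Hypothetical filename; the subject_date_group pattern is assumed
filename = "Josie_2023-07-23_control.csv"

# Drop the extension by splitting once at the last dot
stem = filename.rsplit(".", 1)[0]

# Split the remaining text on underscores into its three parts
subject, date, group = stem.split("_")

# Bundle the parts into a dictionary
session = {'subject': subject, 'date': date, 'group': group}
print(session)  # {'subject': 'Josie', 'date': '2023-07-23', 'group': 'control'}
```

Note that this only works if no field contains an underscore itself; a filename with a different number of parts would make the unpacking step raise a `ValueError`.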

## Analysing Data Stored in Dicts

The challenge with analyzing `dict` data is that dicts are not sequences, and neither are the `dict_keys` and `dict_values` objects returned by `.keys()` and `.values()`. Before passing the values to a statistics function, first turn them into a `list` using the `list()` function. For example:

```python
>>> import numpy as np

>>> data = {'x': 1, 'y': 2}

>>> data.values()
dict_values([1, 2])

>>> list(data.values())
[1, 2]

>>> np.mean(list(data.values()))
np.float64(1.5)
```
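
To see what "not a sequence" means in practice, the sketch below tries to index a `dict_values` object directly, which raises a `TypeError`, and then does the same on the converted list. It reuses the `data` dict from the example above; `first` and `values_list` are illustrative names.

```python
data = {'x': 1, 'y': 2}
values = data.values()

# A dict_values view supports len() and looping, but not indexing
try:
    first = values[0]
except TypeError as error:
    first = None
    print("Indexing failed:", error)

# After converting to a list, indexing works as usual
values_list = list(values)
print(values_list[0])  # 1
```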

Useful functions for the exercises below:

| Function | Example | Description |
| :----  | :----   | :---- |
| `len()` | `len(the_dict)` | The total number of items |
| `np.mean()` | `np.mean(list(the_dict.values()))` | The mean of the dict's values |
| `np.min()` | `np.min(list(the_dict.values()))` | The minimum of the dict's values |

**Exercises**: Let's practice querying dicts and calculating statistics on their values using NumPy.

In [11]:
# %pip install numpy

In [12]:
import numpy as np

Using the following dict, calculate the average number of hours of sleep that our friends got last night:

In [13]:
hours_of_sleep = {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8}

In [14]:
np.mean(list(hours_of_sleep.values()))

np.float64(7.0)

How many people in total were in our sleep study, according to the following dataset?

In [15]:
hours_of_sleep = {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8}

In [16]:
len(hours_of_sleep)

5

What was the average amount of sleep on day 2 in the following dataset?

In [17]:
hours_of_sleep = {
    'Day1': [5, 7, 3, 3, 4, 6, 8, 9],
    'Day2': [5, 7, 8, 5, 6, 7, 8, 4],
}

In [19]:
np.mean(hours_of_sleep['Day2'])

np.float64(6.25)

Use the following dataset to answer the questions below.

*Tip*: you can index multiple times (e.g. `data['Monday']['Morning']` or `data['Monday'].keys()`)

In [20]:
hours_of_sleep = {
    'Day1': {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8},
    'Day2': {'Billy': 10, 'Kimberly': 7, 'Trini': 7, 'Jason': 4},
    'Day3': {'Trini': 8, 'Zack': 6, 'Jason': 9, 'Billy': 9},
}

*Example*: How many hours of sleep did Trini get on Day 2?

In [21]:
hours_of_sleep['Day2']['Trini']

7

How many hours of sleep did Billy get on Day 1?

In [22]:
hours_of_sleep['Day1']['Billy']

7

How much sleep did Zack get on Day 3?

In [23]:
hours_of_sleep['Day3']['Zack']

6

How many people were in the study on Day 1?

In [25]:
len(hours_of_sleep['Day1'])

5

How many people were still in the study on Day 3?

In [26]:
len(hours_of_sleep['Day3'])

4
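
To find out who dropped out, rather than just how many people remain, one option is to compare the keys of two days. This sketch relies on the fact that `dict.keys()` views support set operations such as `-`; the variable name `dropped_out` is just illustrative.

```python
hours_of_sleep = {
    'Day1': {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8},
    'Day2': {'Billy': 10, 'Kimberly': 7, 'Trini': 7, 'Jason': 4},
    'Day3': {'Trini': 8, 'Zack': 6, 'Jason': 9, 'Billy': 9},
}

# Names present on Day 1 but missing on Day 3 (the result is a set)
dropped_out = hours_of_sleep['Day1'].keys() - hours_of_sleep['Day3'].keys()
print(dropped_out)  # {'Kimberly'}
```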

Was the average amount of sleep higher on day one or day three?

In [32]:
np.mean(list(hours_of_sleep['Day1'].values())), np.mean(list(hours_of_sleep['Day3'].values()))

(np.float64(7.0), np.float64(8.0))

In [None]:
# It was higher on day 3
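
When there are more than a couple of days to compare, a loop over `.items()` avoids repeating the same expression for each day. This is a minimal sketch that uses the built-in `sum()` and `len()` instead of `np.mean()` so that it runs without NumPy; `np.mean(list(sleep.values()))` would give the same numbers. The names `mean_sleep` and `best_day` are illustrative.

```python
hours_of_sleep = {
    'Day1': {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8},
    'Day2': {'Billy': 10, 'Kimberly': 7, 'Trini': 7, 'Jason': 4},
    'Day3': {'Trini': 8, 'Zack': 6, 'Jason': 9, 'Billy': 9},
}

# Build a new dict that maps each day to its mean hours of sleep
mean_sleep = {}
for day, sleep in hours_of_sleep.items():
    mean_sleep[day] = sum(sleep.values()) / len(sleep)
print(mean_sleep)  # {'Day1': 7.0, 'Day2': 7.0, 'Day3': 8.0}

# Find the day with the highest mean by comparing the dict's values
best_day = max(mean_sleep, key=mean_sleep.get)
print(best_day)  # Day3
```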