# Session 05

[![Open and Execute in Google Colaboratory](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/astrojuanlu/ie-mbd-python-data-analysis-i/blob/main/sessions/Session%2005.ipynb)

- Dictionaries, properties and methods
- Basic import statements
- Loading JSON data

## Dictionaries

After covering tuples and lists, the last essential data structure is the dictionary. Dictionaries map keys to values:

In [None]:
my_dictionary = {
    "key_1": "A",
    "key_2": "B",
    "key_3": "C",
}
my_dictionary

The keys and the values of the dictionary can be extracted separately:

In [None]:
my_dictionary.keys()

In [None]:
my_dictionary.values()

And `.items()` can be used to extract pairs of `(key, value)`:

In [None]:
my_dictionary.items()
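
A common use of `.items()` is looping over a dictionary and unpacking each pair. A minimal sketch, using a hypothetical `capitals` dictionary that is not part of the session's data:

```python
# Hypothetical example data, used only for illustration
capitals = {"Spain": "Madrid", "France": "Paris"}

lines = []
for country, capital in capitals.items():
    # Each item is a (key, value) tuple, unpacked here into two names
    lines.append(f"{capital} is the capital of {country}")

print(lines)
```

Dictionaries keep insertion order, so the pairs come out in the order in which the keys were added.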

While sequences are indexed by position, dictionaries are indexed by key:

In [None]:
my_dictionary["key_1"]
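
Indexing with a key that does not exist raises a `KeyError`. The sketch below shows two ways to handle that, the `in` operator and the `.get()` method, on a hypothetical `letters` dictionary that mirrors the one above:

```python
# Hypothetical dictionary, mirroring the one defined earlier
letters = {"key_1": "A", "key_2": "B"}

has_key = "key_9" in letters          # membership test looks at the keys
fallback = letters.get("key_9", "?")  # returns the default instead of raising

try:
    letters["key_9"]
except KeyError as error:
    missing = error.args[0]  # the key that was not found

print(has_key, fallback, missing)
```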

Dictionaries are mutable, which means that you can add new keys or replace the values of existing ones:

In [None]:
my_dictionary["key_4"] = "D"
my_dictionary["key_1"] = "a"
my_dictionary
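
Mutability also covers removing keys and changing several of them at once. A short sketch, again on a hypothetical `letters` dictionary rather than `my_dictionary`, so the cells above are not affected:

```python
# Hypothetical dictionary, used only for illustration
letters = {"key_1": "A", "key_2": "B", "key_3": "C"}

removed = letters.pop("key_3")   # removes the key and returns its value
del letters["key_2"]             # removes the key without returning anything
letters.update({"key_1": "a", "key_4": "D"})  # overwrite and add in one call

print(removed, letters)
```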

Only immutable (strictly speaking, [hashable](https://docs.python.org/3/glossary.html#term-hashable)) objects can be keys, so using a list as a key raises a `TypeError`:

In [None]:
my_dictionary[[0]] = "Zero"  # TypeError: lists are not hashable
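
By contrast, a tuple of hashable elements is itself hashable and works as a key. A small sketch with a hypothetical `cities` dictionary (the coordinates are rounded and only illustrative):

```python
# Hypothetical coordinates-to-city mapping, for illustration only
cities = {}
cities[(40.4, -3.7)] = "Madrid"  # a tuple of floats is hashable

try:
    cities[[40.4, -3.7]] = "Madrid"  # a list is not hashable
except TypeError as error:
    message = str(error)  # exact wording depends on the Python version

print(cities)
print(message)
```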

Dictionaries, lists, and tuples can be arbitrarily nested. For example, this is an excerpt of a tweet represented as JSON data (see `data/twitter_data.json`):

In [None]:
data = [
    {
        "created_at": "Wed Apr 22 06:04:57 +0000 2020",
        "entities": {
            "hashtags": [
                {
                    "text": "balboaisland",
                    "indices": [
                        21,
                        34
                    ]
                },
                {
                    "text": "newportbeach",
                    "indices": [
                        35,
                        48
                    ]
                },
            ]
        }
    }
]

In [None]:
data[0]["entities"]["hashtags"][0]["indices"]

Reading from the inside out, that is: a list, inside a dictionary, inside a list, inside a dictionary, inside a dictionary, inside a list.
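
Nested structures like this are usually explored one level at a time. A minimal sketch, using a trimmed copy of the excerpt under the hypothetical name `sample` so that `data` is left untouched:

```python
# Trimmed copy of the tweet excerpt; `sample` is a hypothetical name
sample = [
    {
        "entities": {
            "hashtags": [
                {"text": "balboaisland", "indices": [21, 34]},
                {"text": "newportbeach", "indices": [35, 48]},
            ]
        }
    }
]

tweet = sample[0]                        # list -> dictionary
hashtags = tweet["entities"]["hashtags"]  # dictionary -> dictionary -> list

texts = []
for hashtag in hashtags:
    texts.append(hashtag["text"])        # each element is a dictionary

start, end = hashtags[0]["indices"]      # unpack a two-element list

print(texts, start, end)
```

Naming the intermediate levels (`tweet`, `hashtags`) tends to be easier to read and debug than one long chain of square brackets.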

## Basic import statements

In [None]:
from statistics import mean  # belongs to the standard library (stdlib)

In [None]:
mean([1, 3, 4])

In [None]:
import requests  # Not part of the stdlib, requires installing a package

And potentially, you could `import` your own Python modules (more on that in a few weeks).
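
For reference, the most common forms of the `import` statement can be sketched with the `statistics` module from the standard library. The alias `st` is an arbitrary choice for this example, not an established convention:

```python
import statistics                    # import the whole module
import statistics as st              # same module, under a shorter alias
from statistics import mean, median  # import specific names only

values = [1, 3, 4]

# All three styles reach the same functions
results = (
    statistics.mean(values),
    st.median(values),
    mean(values),
    median(values),
)
print(results)
```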

## Exercises

### Twitter data

The file `twitter_data.json` contains a subset of real tweets obtained from http://covid19research.site/geo-tagged_twitter_datasets/, with full metadata as retrieved by the Twitter API. These are the first 10 lines:

```
[
  {
    "created_at": "Wed Apr 22 06:04:57 +0000 2020",
    "id": 1252840795737997317,
    "id_str": "1252840795737997317",
    "text": "Tennis a la Balboa.\n\n#balboaisland #newportbeach #tennis #covid_19 #coronavirus #orangecounty #california\u2026 https://t.co/px1GCH1bgZ",
    "truncated": true,
    "entities": {
      "hashtags": [
        {
```

You can load it as a list of dictionaries using the code below.

In [None]:
import requests

DATA_URL = (
    "https://github.com/astrojuanlu/ie-mbd-python-data-analysis-i/"
    "raw/main/data/twitter_data.json"
)

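response = requests.get(DATA_URL)
response.raise_for_status()  # stop here if the download failed
data = response.json()  # parse the JSON body into Python objects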
print(type(data), len(data))
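
The `.json()` method parses the body of the response as JSON. The same conversion is available in the standard library `json` module, which is useful when the data is already a string or a local file. A minimal sketch, using a short hypothetical JSON string instead of the real file:

```python
import json

# Hypothetical JSON text, shaped like a tiny piece of the tweet data
raw = '[{"id_str": "1252840795737997317", "truncated": true, "entities": {"hashtags": []}}]'

tweets = json.loads(raw)  # JSON array -> list, JSON object -> dict
first = tweets[0]

# JSON `true` becomes the Python value `True`
print(type(tweets), type(first), first["truncated"])

# Going the other way: Python objects back to JSON text
text = json.dumps(first, indent=2)
print(text)
```

For a file on disk, `json.load(f)` plays the same role as `json.loads`, taking an open file object instead of a string.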