# Tutorial 02-01 - Data Structures

Sometimes we're given data whose structure or quality we're unsure about.  It may come from a colleague, or we might get it from an API.  In many cases (especially when we're working with web services and APIs), the data we're given is JSON.

In this tutorial, we'll go over some basics of working with lists and dictionaries in Python and see how that enables us to explore an unknown JSON file Pythonically.

## List Basics

Let's walk through some basics of lists and how we can work with them. It's worth having these concepts at hand when you're trying to figure out the structure of some new unknown data.

#### 1.  Create a list.

First, you'll create a list.  The easiest way to create your own list in Python is to write it yourself.  This also gives you a great idea of how lists are structured.  Start by assigning a variable name to your new list.  Then all you need to do is put some values in between square brackets with commas to separate them.

In [28]:
# create a list
list_1 = [1, 2, 3, 5, 7, 8]

Now that you've made a list, you can print that list or use functions on that list.

In [29]:
print(list_1)

[1, 2, 3, 5, 7, 8]


In [30]:
len(list_1)

6

NOTE - The function `len()` is a built-in Python function that will tell you the length of something.  It works with lists, tuples, strings, and many other data types.
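
As a quick illustration of that note, here is a minimal sketch that calls `len()` on a few different data types.  The variable names and values are made up for this example.

```python
# len() works on any object that defines a length
numbers = [1, 2, 3, 5, 7, 8]            # list: counts the items
point = (-122.4, 37.8)                  # tuple: counts the items
name = "FeatureCollection"              # string: counts the characters
record = {"type": "Feature", "id": 1}   # dictionary: counts the keys

lengths = [len(numbers), len(point), len(name), len(record)]
print(lengths)
```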

#### 2.  Use an index to access values in the list.

Now that you've created a list, you can access the items in that list using a special property of lists.  The *index* values of the list items represent each item in order.  The index starts at 0 which represents the first item in the list.  The list index increases by 1 for each subsequent value, so the second item in the list has an index of 1.

You can access an item in a certain position in the list by using square brackets with your list variable like in this example below.

In [31]:
list_1[0]

1

In [32]:
list_1[1]

2

Try this with other indexes up until the end of the list.
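
If you go one step past the end of the list, Python raises an `IndexError`.  The sketch below shows where the valid indexes stop; the `went_past_end` flag is only there to make the result visible.

```python
list_1 = [1, 2, 3, 5, 7, 8]

# Valid indexes run from 0 to len(list_1) - 1
last_index = len(list_1) - 1
print(list_1[last_index])

# Asking for an index past the end raises IndexError
try:
    list_1[len(list_1)]
    went_past_end = False
except IndexError:
    went_past_end = True

print(went_past_end)
```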

#### 3.  Use an index to access the last value in the list.

The index starts at the beginning of a list, so counting up from 0 is always an option.  Sometimes you'd be more interested in the last item in the list.  In that case, you can use -1 to start at the end.

In [33]:
list_1[-1]

8

You can also count backwards from the end by using larger negative numbers.  Using -2 gives you the second to last item in the list.

In [34]:
list_1[-2]

7

#### 4.  Use indexes to slice a list.

In the same way you access a single value in a list, you can also pull out multiple values.  This is often called slicing the list.  Using square brackets, you can provide two index numbers separated by a colon.  This will retrieve all the list items starting at the first index and up to (but not including) the second index.  Negative indexes work here too: the example below starts at index 1 and stops before index -2, the second to last item.

In [35]:
list_1[1:-2]

[2, 3, 5]
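
Slices don't always need both numbers.  The following sketch shows a few common variations on the same list; the variable names are just for illustration.

```python
list_1 = [1, 2, 3, 5, 7, 8]

first_three = list_1[:3]    # start omitted: begin at index 0
from_third = list_1[2:]     # stop omitted: run to the end of the list
last_two = list_1[-2:]      # negative start: count from the end
every_other = list_1[::2]   # an optional third number sets the step

print(first_three, from_third, last_two, every_other)
```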

#### 5. Save the slice as a new list.

You can always save the values returned by indexing or slicing into the list as new variables.

In [36]:
list_2 = list_1[1:-2]
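
One detail worth knowing: a slice produces a new list, while plain assignment just gives the same list a second name.  This sketch uses hypothetical names (`original`, `sliced`, `alias`) to show the difference.

```python
original = [1, 2, 3, 5, 7, 8]

# A slice is a new list, so changing it leaves the original alone
sliced = original[1:-2]
sliced[0] = 99
print(original)   # still the six starting values

# Plain assignment is only a second name for the same list
alias = original
alias[0] = 100
print(original)   # the first item changed through the alias
```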

#### 6.  Add a new item to a list.

One of the great things about lists is that they are mutable, meaning that they can be changed.  You can use the `.append()` method to add items to the end of a list.

In [37]:
list_2.append("a new value")

Note that we previously had a list of all numbers, but now we've added a string.  Python is very permissive in allowing us to put whatever we want in a list.  This flexibility can be really handy, but it can also cause problems.  Imagine trying to take the sum or max of a list of numbers, only to find out that there's a string included.

In [38]:
list_2

[2, 3, 5, 'a new value']

#### 7. Sum the values in a list.

Because `list_1` contains only numbers, we can use the `sum()` function to add them all up.  It's worth noting that this function only works on lists of numbers.  If you try to use it on a list that contains strings or other non-numeric data types (like `list_2` right now), you'll get an error.

In [39]:
sum(list_1)

26

In [40]:
sum(list_2)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
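
When you can't be sure a list is all numbers, one option is to filter it before summing.  The sketch below is one possible approach, not the only one; it uses a list comprehension with `isinstance()` to keep just the integers and floats.

```python
mixed = [2, 3, 5, "a new value"]

# Keep only the items that are ints or floats
numbers_only = [item for item in mixed if isinstance(item, (int, float))]

total = sum(numbers_only)
print(numbers_only, total)
```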

#### 8.  Remove an item from a list.

Just as we can add items to a list, we can also take them out.  Let's use the list method `.pop()` to remove an item.  You provide the index of the item you'd like to remove, and `.pop()` removes that item from the list and returns it.

In [None]:
list_2.pop(-1)

'a new value'

In [None]:
list_2

[2, 3, 5]

Note that now `list_2` does not contain a string.

In [None]:
sum(list_2)

10
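
There are a few other ways to remove items.  This sketch, using a throwaway list called `values`, shows `.pop()` with and without an index alongside `.remove()`, which deletes by value rather than by position.

```python
values = [2, 3, 5, "a new value"]

last = values.pop()     # no index: removes and returns the last item
first = values.pop(0)   # index 0: removes and returns the first item
values.remove(5)        # removes the first item equal to 5; returns None

print(last, first, values)
```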

## Dictionary Basics

Lists are great, but sometimes you need a little more structure in your data.  Dictionaries can be very helpful in these situations.

#### 1.  Create a dictionary.

You can create your own dictionary by using curly brackets.  If you just provide an open and closed curly bracket (`{}`), you'll end up with an empty dictionary.  You can also create a dictionary populated with values by providing comma separated key-value pairs.

In [None]:
dict_1 = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}

Each key and its corresponding value are separated by a colon.  Each pair is separated by a comma.

#### 2.  Print the keys and values.

One of the ways you can get a good idea of what's in a dictionary is by looking at the keys.  You can do this by calling the `.keys()` method on the dictionary.

In [None]:
dict_1.keys()

dict_keys(['key1', 'key2', 'key3'])

You can also do this with the values using the `.values()` method.

In [None]:
dict_1.values()

dict_values(['value1', 'value2', 'value3'])
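
If you want the keys and values together, the `.items()` method yields them as pairs.  Here is a minimal sketch that loops over a fresh copy of the starting dictionary.

```python
dict_1 = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}

# .items() yields (key, value) pairs in insertion order
pairs = list(dict_1.items())

for key, value in dict_1.items():
    print(f"{key} -> {value}")
```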

#### 3.  Access a specific key/value pair.

Once you understand what the keys in a dictionary are, you can use a key to retrieve its corresponding value from the dictionary.  You can do this by providing the key to the dictionary with square brackets.

In [None]:
dict_1['key1']

'value1'
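
With unknown data, you can't always be sure a key exists.  The sketch below shows what happens with a missing key and two gentler alternatives, `.get()` and the `in` operator.  The key name `'missing key'` is made up for the example.

```python
dict_1 = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

# Square brackets raise KeyError when the key is missing
try:
    dict_1['missing key']
    raised = False
except KeyError:
    raised = True

# .get() returns None (or a default you choose) instead of raising
no_default = dict_1.get('missing key')
with_default = dict_1.get('missing key', 'n/a')

# The `in` operator tests for a key without fetching its value
has_key1 = 'key1' in dict_1

print(raised, no_default, with_default, has_key1)
```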

#### 4.  Update the value for a key.

In addition to using a key to access a value, you can also use that key to replace the corresponding value.

In [None]:
dict_1['key1'] = 'updates!'
dict_1['key1']

'updates!'

#### 5.  Add a new key/value pair.

You can use the same syntax to add new key/value pairs to a dictionary.

In [None]:
dict_1['a new key'] = 'a new value'

At this point, we've worked with all string values for keys and values, but you can use whatever data types you'd like.  In the following example, we'll add a list as a value to the dictionary.

In [None]:
dict_1['a list'] = list_2

#### 6.  Combine two dictionaries.

Occasionally, you'll need to combine two dictionaries.  You could iterate through each key from one dictionary and add it to another, but there's a handy built-in method on the dictionary class called `.update()` that does all that for you.

In [None]:
# create a second dictionary
dict_2 = {
    'key3': 'new value3',
    'key4': 'value4',
    'key5': 'value5'
}


In [None]:
# combine two dictionaries
dict_1.update(dict_2)

In [None]:
dict_1

{'key1': 'updates!',
 'key2': 'value2',
 'key3': 'new value3',
 'a new key': 'a new value',
 'a list': [2, 3, 5],
 'key4': 'value4',
 'key5': 'value5'}

Note that the keys from dict_2 that didn't exist in dict_1 were added.  The one key that existed in both dictionaries was updated with the value from dict_2.
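
Keep in mind that `.update()` changes the dictionary it's called on.  If you'd rather leave both inputs untouched, one alternative is to build a new dictionary by unpacking both into it.  The names `dict_a`, `dict_b`, and `merged` below are hypothetical.

```python
dict_a = {'key1': 'value1', 'key3': 'value3'}
dict_b = {'key3': 'new value3', 'key4': 'value4'}

# Unpack both dictionaries into a new one; later values win on shared keys
merged = {**dict_a, **dict_b}

# On Python 3.9 and newer, `dict_a | dict_b` gives the same result

print(merged)
print(dict_a)   # unchanged
```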

## Parse an Unknown GeoJSON Structure

It's great practice to understand the fundamentals of lists and dictionaries.  They're handy for gathering and organizing your own data as you write a script.

These same tools are just as useful for exploring JSON datasets.  The arrays and objects that make up JSON are represented in Python as lists and dictionaries.  It's common in the working world to be given JSON data without context.  Being able to do some quick exploration of the data can give you a really good idea of what the data looks like without having to open the whole file (which could be quite large).

#### 1.  Load JSON data.

First, we'll load some data.  We'll need to import the **json** module from Python's standard library to handle our conversion between JSON and Python data types.

In [None]:
import json

We'll use a context manager to open the file.  There's a longer discussion about context managers in the chapter on ArcPy and cursors.  In short, though, the context manager is represented by the `with` statement in the code below.  It closes the file for us automatically when the block ends, so we don't have to remember to do it ourselves.

Once we have a file object created using the context manager, we can use the `json.load()` function to convert the file's contents into Python data types.

In [None]:
with open('data.geojson') as f:
    data = json.load(f)
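
To see the conversion without needing a file on disk, here is a small sketch that parses a JSON string with `json.loads()`, the string-based counterpart of `json.load()`.  The JSON text is a made-up, minimal example and is not taken from `data.geojson`.

```python
import json

# A tiny, hypothetical GeoJSON-like string for illustration
raw = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": null, "properties": {"name": "example"}}
  ]
}
"""

data = json.loads(raw)

print(type(data))                        # JSON object -> dict
print(type(data["features"]))            # JSON array  -> list
print(data["features"][0]["geometry"])   # JSON null   -> None
```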

#### 2.  Check the type of the loaded data.

The JSON data you get could end up being an array of objects, which would be represented as a list in Python.  It could also be one large object, which would be represented as a dictionary.  Before you go exploring the data, you'll have to know how to access it.  Checking the type of the data is a great place to start.

In [None]:
type(data)

dict
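
If you find yourself repeating this check at every level, you could wrap it in a small helper.  The `describe()` function below is a hypothetical convenience written for this tutorial, not part of any library.

```python
def describe(obj):
    """Return a one-line summary of a piece of parsed JSON."""
    if isinstance(obj, dict):
        return f"dict with keys {list(obj)}"
    if isinstance(obj, list):
        return f"list with {len(obj)} items"
    # Anything else: show the type name and the value itself
    return f"{type(obj).__name__}: {obj!r}"


print(describe({"type": "FeatureCollection", "features": []}))
print(describe([1, 2, 3]))
print(describe("Feature"))
```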

#### 3.  Check the keys of the data.

Now that you know the data is one large dictionary, you can check the keys to get an idea of what the data looks like.

In [None]:
data.keys()

dict_keys(['type', 'features', 'crs'])

There are three keys shown in this data.  Given that we know it's probably GeoJSON, we can guess that the *features* key will probably contain the features and data we're interested in.  We can check the other two keys at this level though and get some descriptive information about our data.

In [None]:
data['type']

'FeatureCollection'

In [None]:
data['crs']

{'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}}

Now we know this data represents a FeatureCollection.  We also know that the spatial reference of the data is WGS 84 (the `CRS84` identifier means coordinates are stored in longitude, latitude order).

#### 4.  Check the type of the features key.

Based on context so far, we can make a guess at what's in the *features* key, but we should confirm.

In [None]:
type(data['features'])

list

We know now that the *features* key is a list.  Now we can treat it as such and try to find some more information about the dataset.  We can start by finding out how many features there are.

In [None]:
len(data['features'])

10000
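
With that many features, it helps to summarize rather than read them one by one.  As a rough sketch, the code below counts geometry types with `collections.Counter`.  It runs on a small hypothetical `sample` dictionary, not the real dataset, and it allows for features whose geometry is `None`, which GeoJSON permits.

```python
from collections import Counter

# Hypothetical stand-in for the loaded data
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [-122.42, 37.76]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [-122.41, 37.77]}},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
}

# Tally the geometry type of every feature, using None for missing geometry
geometry_types = Counter(
    feature["geometry"]["type"] if feature["geometry"] else None
    for feature in sample["features"]
)

print(geometry_types)
```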

#### 5.  Explore the first feature.

Now that we know this is a list of features, we can explore the first feature. 

In [None]:
type(data['features'][0])

dict

It looks like the first feature is a dictionary.  We can look at the keys if we'd like.  At this point, it might be good to just represent that feature as its own variable for readability.

In [None]:
first_feature = data['features'][0]

In [None]:
first_feature.keys()

dict_keys(['type', 'geometry', 'properties'])

Similarly to how we explored the entire dataset, we can start looking at these keys.

In [None]:
first_feature['type']

'Feature'

In [None]:
first_feature['geometry']

{'type': 'Point', 'coordinates': [-122.42710769, 37.75976521]}

These two keys show that this dictionary represents a feature and the feature has a "Point" geometry type. We can also see that the feature has a properties key that contains more information.

In [None]:
first_feature['properties']

{'status_description': 'Closed',
 'bos_2012': '8.00000',
 'source': 'Mobile/Open311',
 'updated_datetime': '2023-01-01T01:58:00.000',
 'police_district': 'MISSION',
 'agency_responsible': 'RPD NSA Queue',
 'media_url': None,
 'neighborhoods_sffind_boundaries': 'Mission Dolores',
 'requested_datetime': '2023-01-01T00:11:54.000',
 'service_request_id': '16240855',
 'service_name': 'Noise Report',
 'data_loaded_at': '2024-02-20T15:21:23.000',
 'long': '-122.427107691765',
 'status_notes': 'Comment Noted',
 'point_geom': {'type': 'Point',
  'coordinates': [-122.427107692, 37.759765211]},
 'address': 'Mission Dolores Park, , SAN FRANCISCO, CA, 94114',
 'street': 'Mission Dolores Park',
 'supervisor_district': '8.00000',
 'service_details': 'Other',
 'service_subtype': 'Noise Issue',
 'data_as_of': '2024-02-20T10:35:55.000',
 'closed_date': '2023-01-01T01:58:00.000',
 'lat': '37.759765210592'}
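
Notice that values such as `lat` and `long` are stored as strings.  The sketch below shows one way you might summarize a property across features and convert coordinate strings to numbers.  The `features` list is a trimmed, hypothetical sample modeled on the record above; the second and third entries are invented.

```python
from collections import Counter

# Trimmed, hypothetical sample; only the first entry mirrors the record above
features = [
    {"properties": {"service_name": "Noise Report",
                    "lat": "37.759765210592", "long": "-122.427107691765"}},
    {"properties": {"service_name": "Noise Report",
                    "lat": "37.76", "long": "-122.42"}},
    {"properties": {"service_name": "Example Service",
                    "lat": "37.77", "long": "-122.41"}},
]

# Count how often each service name appears
service_counts = Counter(f["properties"]["service_name"] for f in features)

# Convert the string coordinates to floats as (longitude, latitude) pairs
coords = [
    (float(f["properties"]["long"]), float(f["properties"]["lat"]))
    for f in features
]

print(service_counts)
print(coords[0])
```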