# List Comprehensions & Lambda Functions

In this mission, we'll learn some tips and syntax shortcuts we can use on top of everything we've learned, including:
<ul>
    <li>Creating list comprehensions to replace loops with a single line of code.</li>
    <li>Creating single-use functions called lambda functions.</li>
    </ul>
    
The data set we'll use in this mission is in a format called JavaScript Object Notation (JSON). As the name indicates, JSON originated from the JavaScript language, but has now become a language-independent format.

From a Python perspective, JSON can be thought of as a collection of Python objects nested inside each other.

![ ](images\lcl.png)

The JSON above is a list, where each element in the list is a dictionary. Each of the dictionaries has the same keys, and one of the values in each dictionary is itself a list.

The Python <b>json</b> module contains a number of functions to make working with JSON data easier. We can use the <b>json.loads()</b> function to convert JSON data contained in a string to the equivalent set of Python objects:

In [1]:
json_string = """
[
  {
    "name": "Sabine",
    "age": 36,
    "favorite_foods": ["Pumpkin", "Oatmeal"]
  },
  {
    "name": "Zoe",
    "age": 40,
    "favorite_foods": ["Chicken", "Pizza", "Chocolate"]
  },
  {
    "name": "Heidi",
    "age": 40,
    "favorite_foods": ["Caesar Salad"]
  }
]
"""

import json
json_obj = json.loads(json_string)
print(type(json_obj))

<class 'list'>


We can see that the JSON in <b>json_string</b> has been converted into a list, which we stored in <b>json_obj</b>. Let's take a look at the values in the list:

In [2]:
print(json_obj)

[{'name': 'Sabine', 'age': 36, 'favorite_foods': ['Pumpkin', 'Oatmeal']}, {'name': 'Zoe', 'age': 40, 'favorite_foods': ['Chicken', 'Pizza', 'Chocolate']}, {'name': 'Heidi', 'age': 40, 'favorite_foods': ['Caesar Salad']}]


We can observe a few things:
<ul>
<li>The formatting from our original string is gone. This is because Python prints lists and dictionaries in its own compact, single-line representation.</li>
<li>The order of the keys in each dictionary matches the original string. Since Python 3.7, dictionaries keep their insertion order (in CPython 3.6 this was an implementation detail); in earlier versions, the keys could appear in a different order.</li>
    </ul>
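
As a quick check of how key order behaves, the sketch below parses a one-dictionary JSON string and lists the keys of the result. The `sample` string and the variable names are made up for illustration; on Python 3.7 or later the keys should come back in the order they appear in the string.

```python
import json

# hypothetical sample string, not part of the mission's data set
sample = '{"name": "Sabine", "age": 36, "favorite_foods": ["Pumpkin", "Oatmeal"]}'

parsed = json.loads(sample)

# on Python 3.7+ dictionaries keep insertion order, so the keys
# are listed in the same order as in the JSON string
keys = list(parsed.keys())
print(keys)
```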

In [3]:
world_cup_str = """
[
    {
        "team_1": "France",
        "team_2": "Croatia",
        "game_type": "Final",
        "score" : [4, 2]
    },
    {
        "team_1": "Belgium",
        "team_2": "England",
        "game_type": "3rd/4th Playoff",
        "score" : [2, 0]
    }
]
"""
import json
world_cup_obj = json.loads(world_cup_str)
print(type(world_cup_obj))

<class 'list'>


In [4]:
world_cup_obj

[{'team_1': 'France',
  'team_2': 'Croatia',
  'game_type': 'Final',
  'score': [4, 2]},
 {'team_1': 'Belgium',
  'team_2': 'England',
  'game_type': '3rd/4th Playoff',
  'score': [2, 0]}]

One of the places where the JSON format is commonly used is in the results returned by an <b>application programming interface (API)</b>. APIs are interfaces that can be used to send and receive data between different computer systems. We'll learn about how to work with APIs in a later course.

The data set for this mission, <b>hn_2014.json</b>, was downloaded from the Hacker News API. It's a different set of data from the CSV we've been using in the previous two missions, and it contains data about stories posted to Hacker News in 2014.

To read JSON from a file, we use the <b>json.load()</b> function. Note that the function is <b>json.load()</b>, without an "s" at the end.

The <b>json.loads()</b> function is used for loading JSON data from a string ("loads" is short for "load string"), whereas the <b>json.load()</b> function is used to load from a <b>file object</b>. Let's look at how we would read in our data:

In [6]:
import json
# open the file in a with block so it is closed once loading finishes
with open("hn_2014.json") as file:
    hn = json.load(file)

print(type(hn))

<class 'list'>
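
To make the difference between the two functions concrete, here is a minimal sketch that parses the same JSON text both ways. The `json_text` value is invented for illustration, and `io.StringIO` is used only as an in-memory stand-in for an open file, so the example runs without `hn_2014.json`.

```python
import io
import json

# hypothetical JSON text standing in for the contents of a file
json_text = '[{"title": "Example story", "points": 2}]'

# json.loads() parses JSON held in a string
from_string = json.loads(json_text)

# json.load() parses JSON read from a file-like object;
# io.StringIO imitates an open text file in memory
fake_file = io.StringIO(json_text)
from_file = json.load(fake_file)

# both routes should produce equal Python objects
print(from_string == from_file)
```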


Our <b>hn</b> variable is a list. Let's find out how many objects are in the list, and the type of the first object (which will almost always be the type of every object in the list in JSON data):

In [11]:
print(len(hn))

35806


In [10]:
print(type(hn[0]))

<class 'dict'>
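
Rather than assuming every element has the same type as the first, we could check all of them. The sketch below does this on a small made-up list called `stories`, a hypothetical stand-in for <b>hn</b>; the same expression should work on the full list.

```python
# hypothetical stand-in for the hn list loaded above
stories = [
    {"title": "First story", "points": 2},
    {"title": "Second story", "points": 1},
]

# True only if every element of the list is a dictionary
all_dicts = all(isinstance(story, dict) for story in stories)
print(all_dicts)
```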


In [13]:
#keys of the first dictionary
hn[0].keys()

dict_keys(['author', 'numComments', 'points', 'url', 'storyText', 'createdAt', 'tags', 'createdAtI', 'title', 'objectId'])

If we recall the data set we used in the previous two missions, we can see some similarities. There are keys representing the title, URL, points, number of comments, and date, as well as some others that are less familiar to us. Here is a summary of the keys and the data that they contain:
<ul>
    <li><b>author</b>: The username of the person who submitted the story.</li>
    <li><b>createdAt</b>: The date and time at which the story was created.</li>
    <li><b>createdAtI</b>: An integer value representing the date and time at which the story was created.</li>
    <li><b>numComments</b>: The number of comments that were made on the story.</li>
    <li><b>objectId</b>: The unique identifier from Hacker News for the story.</li>
    <li><b>points</b>: The number of points the story acquired, calculated as the total number of upvotes minus the total number of downvotes.</li>
    <li><b>storyText</b>: The text of the story (if the story contains text).</li>
    <li><b>tags</b>: A list of tags associated with the story.</li>
    <li><b>title</b>: The title of the story.</li>
    <li><b>url</b>: The URL that the story links to (if the story links to a URL).</li>
    </ul>

Let's look at the first dictionary in full. To make it easier to understand, we'll create a function that prints a JSON object with indentation and sorted keys.

The function will use the <b>json.dumps()</b> function ("dump string"), which does the opposite of the <b>json.loads()</b> function: it takes a Python object and returns a JSON-formatted string version of it.

The <b>json.dumps()</b> function accepts arguments that can specify formatting for the string, which we'll use to make things easier to read:

In [21]:
def jprint(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
    
first_story = hn[0]
jprint(first_story)

{
    "author": "dragongraphics",
    "createdAt": "2014-05-29T08:07:50Z",
    "createdAtI": 1401350870,
    "numComments": 0,
    "objectId": "7815238",
    "points": 2,
    "storyText": "",
    "tags": [
        "story",
        "author_dragongraphics",
        "story_7815238"
    ],
    "title": "Are we getting too Sassy? Weighing up micro-optimisation vs. maintainability",
    "url": "http://ashleynolan.co.uk/blog/are-we-getting-too-sassy"
}
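
To see what the two formatting arguments change, the sketch below dumps a small dictionary with and without them, then parses the formatted string back. The `story` dictionary is a hypothetical example with the same general shape as our data, not a record from the data set.

```python
import json

# hypothetical story, invented for this illustration
story = {"title": "Example story", "points": 2, "tags": ["story", "example"]}

# default output: a single line, keys in insertion order
compact = json.dumps(story)

# formatted output: keys sorted alphabetically, indented by 4 spaces
pretty = json.dumps(story, sort_keys=True, indent=4)

# formatting only affects the string; parsing it returns an equal dictionary
round_trip = json.loads(pretty)

print(compact)
print(pretty)
print(round_trip == story)
```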


You may notice that the <b>createdAt</b> and <b>createdAtI</b> keys both store the story's date and time, in two different formats. Because the format of <b>createdAt</b> is much easier to understand, let's do some data cleaning by deleting the <b>createdAtI</b> key from every dictionary.
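
For the first story, the integer in <b>createdAtI</b> appears to be a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC). That interpretation is an assumption based on this one record rather than something the data set documents; the sketch below checks it using the two values printed above.

```python
from datetime import datetime, timezone

# values copied from the first story printed above
created_at = "2014-05-29T08:07:50Z"
created_at_i = 1401350870

# assumption: the integer counts seconds since 1970-01-01 00:00:00 UTC
as_datetime = datetime.fromtimestamp(created_at_i, tz=timezone.utc)

# format the result the same way as the createdAt string
formatted = as_datetime.strftime("%Y-%m-%dT%H:%M:%SZ")

print(formatted)
print(formatted == created_at)
```

If the two values match, as they should here, deleting <b>createdAtI</b> loses no information for that story.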

To delete a key from a dictionary, we can use the <b>del</b> statement. Let's learn the syntax by looking at a simple example:

In [22]:
d = {'a': 1, 'b': 2, 'c': 3}
del d['a']
print(d)

{'b': 2, 'c': 3}
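
One detail worth knowing: <b>del</b> raises a `KeyError` if the key isn't in the dictionary. The sketch below shows that behavior alongside `dict.pop()` with a default value, which is one possible alternative when a key may be missing. The flag name `missing_key_raised` is just for illustration.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# del raises KeyError when the key is not in the dictionary
try:
    del d['z']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

# dict.pop() with a default removes the key if it is present
# and returns the default, without raising, if it is not
removed = d.pop('z', None)

print(missing_key_raised, removed, d)
```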


We can create a function using <b>del</b> that will return a copy of our dictionary with the key removed:

In [23]:
def del_key(dict_, key):
    # create a copy so we don't
    # modify the original dict
    modified_dict = dict_.copy()
    del modified_dict[key]
    return modified_dict
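
The sketch below checks that the original dictionary is left untouched. It also shows a caveat: `dict.copy()` makes a shallow copy, so nested objects such as lists are shared between the original and the copy. The `story` dictionary here is hypothetical, and the function is repeated only so the example runs on its own.

```python
def del_key(dict_, key):
    # same function as above: copy first, then delete from the copy
    modified_dict = dict_.copy()
    del modified_dict[key]
    return modified_dict

# hypothetical story used only for this illustration
story = {"title": "Example", "createdAtI": 1, "tags": ["story"]}
cleaned = del_key(story, "createdAtI")

# the original still has its key; only the copy lost it
print("createdAtI" in story, "createdAtI" in cleaned)

# the copy is shallow: the nested list is the same object in both
print(cleaned["tags"] is story["tags"])
```

For our purposes the shallow copy is enough, since we only delete a top-level key and don't change the nested values.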

Let's use this function to delete the <b>createdAtI</b> key from <b>first_story</b>:

In [24]:
first_story = del_key(first_story, 'createdAtI')
jprint(first_story)

{
    "author": "dragongraphics",
    "createdAt": "2014-05-29T08:07:50Z",
    "numComments": 0,
    "objectId": "7815238",
    "points": 2,
    "storyText": "",
    "tags": [
        "story",
        "author_dragongraphics",
        "story_7815238"
    ],
    "title": "Are we getting too Sassy? Weighing up micro-optimisation vs. maintainability",
    "url": "http://ashleynolan.co.uk/blog/are-we-getting-too-sassy"
}


The dictionary returned by the function no longer includes the <b>createdAtI</b> key.

Let's use a loop and the <b>del_key()</b> function to remove the <b>createdAtI</b> key from every story in our Hacker News data set:

In [28]:
def del_key(dict_, key):
    # create a copy so we don't
    # modify the original dict
    modified_dict = dict_.copy()
    del modified_dict[key]
    return modified_dict

hn_clean = []

for j in hn:
    new_d = del_key(j, "createdAtI")
    hn_clean.append(new_d)

In [37]:
hn_clean[2]

{'author': 'callum85',
 'numComments': 0,
 'points': 1,
 'url': 'http://online.wsj.com/articles/apple-to-buy-beats-1401308971',
 'storyText': '',
 'createdAt': '2014-05-29T08:05:06Z',
 'tags': ['story', 'author_callum85', 'story_7815230'],
 'title': 'Apple Agrees to Buy Beats for $3 Billion',
 'objectId': '7815230'}
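
Since this mission is about list comprehensions, here is a rough preview of how the same loop could be written in a single line. The two-item `stories` list is a hypothetical stand-in for <b>hn</b>, and the function is repeated so the sketch runs by itself.

```python
def del_key(dict_, key):
    # copy the dictionary, then delete the key from the copy
    modified_dict = dict_.copy()
    del modified_dict[key]
    return modified_dict

# hypothetical stand-in for the hn list
stories = [
    {"title": "First story", "createdAtI": 1401350870},
    {"title": "Second story", "createdAtI": 1401350706},
]

# loop version, as used above
cleaned_loop = []
for story in stories:
    cleaned_loop.append(del_key(story, "createdAtI"))

# list comprehension version: the same transformation in one expression
cleaned_comp = [del_key(story, "createdAtI") for story in stories]

print(cleaned_loop == cleaned_comp)
```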