<a href="https://colab.research.google.com/github/Rossel/DataQuest_Courses/blob/master/036__List_Comprehensions_and_Lambda_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# COURSE 5/6: DATA CLEANING IN PYTHON: ADVANCED

# MISSION 3: List Comprehensions and Lambda Functions

Learn techniques to turbo-charge working with data in Python

## 1. The JSON Format

So far, we've learned how to use regular expressions to make cleaning and analyzing text data easier.

In this mission, we'll learn some tips and syntax shortcuts we can use on top of everything we've learned, including:

- Creating list comprehensions to replace loops with a single line of code.
- Creating single-use functions called lambda functions.

The data set we'll use in this mission is in a format called [JavaScript Object Notation](https://www.json.org/) (JSON). As the name indicates, JSON originated from the JavaScript language, but has now become a language-independent format.

From a Python perspective, JSON can be thought of as a collection of Python objects nested inside each other.

![img](https://s3.amazonaws.com/dq-content/355/json.svg)

The JSON above is a list, where each element in the list is a dictionary. Each of the dictionaries has the same keys, and one of the values in each dictionary is itself a list.

The Python `json` [module](https://docs.python.org/3.7/library/json.html#module-json) contains a number of functions to make working with JSON objects easier. We can use the `json.loads()` [function](https://docs.python.org/3.7/library/json.html#json.loads) to convert JSON data contained in a string to the equivalent set of Python objects:

In [6]:
json_string = """
[
  {
    "name": "Sabine",
    "age": 36,
    "favorite_foods": ["Pumpkin", "Oatmeal"]
  },
  {
    "name": "Zoe",
    "age": 40,
    "favorite_foods": ["Chicken", "Pizza", "Chocolate"]
  },
  {
    "name": "Heidi",
    "age": 40,
    "favorite_foods": ["Caesar Salad"]
  }
]
"""

import json
json_obj = json.loads(json_string)
print(type(json_obj))

<class 'list'>


We can see that `json.loads()` has converted the data in `json_string` into a list. Let's take a look at the values in the list:



In [7]:
print(json_obj)

[{'name': 'Sabine', 'age': 36, 'favorite_foods': ['Pumpkin', 'Oatmeal']}, {'name': 'Zoe', 'age': 40, 'favorite_foods': ['Chicken', 'Pizza', 'Chocolate']}, {'name': 'Heidi', 'age': 40, 'favorite_foods': ['Caesar Salad']}]


We can observe a few things:

- The formatting from our original string is gone. This is because Python prints lists and dictionaries using its own simple, single-line representation.
- The order of the keys in each dictionary matches the order in the original string. From Python 3.7 onward, dictionaries preserve insertion order; in versions before 3.6, dictionaries had no fixed order, so the keys could appear in a different order.
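
If the original layout matters for readability, the `json` module can also go the other way. The sketch below is a minimal illustration (the `people` and `pretty` names are ours, and the sample is a shortened version of the data above): it uses `json.dumps()` with the `indent` parameter to print Python objects as an indented JSON string.

```python
import json

# A shortened sample in the same shape as the data above.
people = json.loads(
    '[{"name": "Sabine", "age": 36, "favorite_foods": ["Pumpkin", "Oatmeal"]}]'
)

# json.dumps() serializes Python objects back to a JSON string;
# indent=2 spreads nested structures over multiple lines.
pretty = json.dumps(people, indent=2)
print(pretty)
```

Note that `json.dumps()` returns a string, so this only affects how the data is displayed; `people` itself is still a list of dictionaries.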

Let's practice using `json.loads()` to convert JSON data from a string to Python objects!



**Instructions:**

We have created a JSON string, `world_cup_str`, which contains data about games from the 2018 Football World Cup.

1. Import the `json` module.
2. Use `json.loads()` to convert `world_cup_str` to a Python object. Assign the result to `world_cup_obj`.
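
The `world_cup_str` variable is not defined in this notebook, so the sketch below substitutes a short placeholder string. The field names and values are invented purely for illustration and are not the real World Cup data; only the two steps from the instructions are the point.

```python
import json

# Hypothetical stand-in for `world_cup_str`; the real string may use
# different fields and will contain many more games.
world_cup_str = """
[
  {"team_1": "Team A", "team_2": "Team B", "goals": [2, 1]},
  {"team_1": "Team C", "team_2": "Team D", "goals": [0, 0]}
]
"""

# Convert the JSON string into the equivalent Python objects.
world_cup_obj = json.loads(world_cup_str)

print(type(world_cup_obj))
print(len(world_cup_obj))
```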

In [8]:
# Set up PyDrive so the JSON data file can be downloaded from Google Drive into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [9]:
# Once you have completed verification, go to the data file in Google Drive, right-click on it, select “Get shareable link”, and copy the unique ID from the link.
# https://drive.google.com/file/d/1J3gYozddULmvWfprsTXK9Ga3V74UpaJe/view?usp=sharing
# Named file_id rather than id to avoid shadowing Python's built-in id() function.
file_id = "1J3gYozddULmvWfprsTXK9Ga3V74UpaJe"

In [10]:
# Download the dataset and save it locally as hn_2014.json
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('hn_2014.json')

## 2. Reading a JSON file

One of the places where the JSON format is commonly used is in the results returned by an [application programming interface](https://en.wikipedia.org/wiki/Application_programming_interface) (**API**). APIs are interfaces that can be used to send and receive data between different computer systems. We'll learn about how to work with APIs in a later course.

The data set for this mission, `hn_2014.json`, was downloaded from the Hacker News API. It's a different set of data from the CSV we've been using in the previous two missions, and it contains data about stories from Hacker News in 2014.

To read a file in JSON format, we use the `json.load()` [function](https://docs.python.org/3.7/library/json.html#json.load). Note that the function is `json.load()` without an "s" at the end. The `json.loads()` function is used for loading JSON data from a string ("loads" is short for "load string"), whereas the `json.load()` function is used to load from a file object. Let's look at how we would read in our data:



In [11]:
import json

# Open the file in a with block so it is closed automatically after loading.
with open("hn_2014.json") as file:
    hn = json.load(file)

print(type(hn))

<class 'list'>
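
To make the difference between `json.load()` and `json.loads()` concrete, here is a small self-contained sketch. It writes a tiny sample (our own placeholder data, not `hn_2014.json`) to a temporary file, then reads it back both ways; the variable names are illustrative.

```python
import json
import os
import tempfile

records = [{"name": "Sabine", "age": 36}]

# Write a small JSON file to a temporary location so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(records, tmp)
    path = tmp.name

# json.load() takes an open file object ...
with open(path) as f:
    from_file = json.load(f)

# ... while json.loads() takes a string that has already been read.
with open(path) as f:
    from_string = json.loads(f.read())

# Clean up the temporary file.
os.remove(path)

print(from_file == from_string == records)
```

Both calls produce the same Python objects; the only difference is whether the input is a file object or a string.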


## 3. Deleting Dictionary Keys

## 4. Writing List Comprehensions

## 5. Using List Comprehensions to Transform and Create Lists

## 6. Using List Comprehensions to Reduce a List

## 7. Passing Functions as Arguments

## 8. Lambda Functions

## 9. Using Lambda Functions to Analyze JSON data

## 10. Reading JSON files into pandas

## 11. Exploring Tags Using the Apply Function

## 12. Extracting Tags Using Apply with a Lambda Function