## Process JSON String

Let us understand how to process JSON strings using Python as the programming language. Later we will see different ways of storing JSON data in files.

We will see the following examples of processing JSON strings.
* Single JSON document.
* Multiple JSON documents, with one JSON document per line.
* Multiple JSON documents as an array under one attribute. Most REST APIs that return multiple elements follow this approach.

Here are a few points to keep in mind.
* We can process JSON strings using either the `json` module or `pandas`.
* When developing the backend for web or mobile applications, we use `json` or a high-level wrapper around it. For bulk data processing, we typically fall back on modules such as `pandas`.
* You should be familiar with both. For now, we will focus on `json`.
* We should first import the `json` module to process JSON strings with it.
* The `json` module has a function called `loads` which takes a JSON document as a string and returns the corresponding Python object, which is a `dict` for a JSON object.
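As a small illustration of what `json.loads` returns, the sketch below parses one sample string per JSON value type and prints the resulting Python type. The sample strings are made up for this illustration and are not part of the data used in the rest of this section.

```python
import json

# One made-up sample per JSON value type, mapped to the Python type we expect back.
samples = {
    '{"id": 1}': dict,       # JSON object
    '[1, 2, 3]': list,       # JSON array
    '"Frasco"': str,         # JSON string
    '42': int,               # JSON number without a fraction
    '3.14': float,           # JSON number with a fraction
    'true': bool,            # JSON true / false
    'null': type(None),      # JSON null
}

for text, expected_type in samples.items():
    value = json.loads(text)
    print(f'{text!r:15} -> {type(value).__name__}')
```

This is why a top-level JSON object gives us a `dict`, while a top-level JSON array gives us a `list`.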

### Single JSON document

Let us go through the details of processing a single JSON document.
* Import the `json` module.
* Create a JSON string.
* Pass the string to `json.loads`. It will return a `dict`.
* Assign it to a variable and use it further.

In [1]:
import json

In [2]:
person = '{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"}'

In [3]:
type(person)

str

In [4]:
json.loads?

Signature:
json.loads(
    s,
    *,
    encoding=None,
    cls=None,
    object_hook=None,
    parse_float=None,
    parse_int=None,
    parse_constant=None,
    object_pairs_hook=None,
    **kw,
)
Docstring:
Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
containing a JSON document)

In [5]:
person_dict = json.loads(person)

In [6]:
type(person_dict)

dict

In [7]:
print(person_dict)

{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}


In [8]:
person_dict['id']

1

In [9]:
person_dict['first_name']

'Frasco'

In [10]:
person_dict.keys()

dict_keys(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address'])

In [11]:
person_dict.items()

dict_items([('id', 1), ('first_name', 'Frasco'), ('last_name', 'Necolds'), ('email', 'fnecolds0@vk.com'), ('gender', 'Male'), ('ip_address', '243.67.63.34')])
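The walkthrough above assumes the string is valid JSON and that every key is present. Here is a minimal sketch of how we might guard against both problems. The helper name `parse_json_or_none` and the `phone` key are hypothetical and used only for illustration.

```python
import json

def parse_json_or_none(text):
    """Return the parsed value, or None when text is not valid JSON.

    Hypothetical helper for illustration. Note that the valid JSON
    document 'null' also yields None.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        # JSONDecodeError reports where parsing failed.
        print(f'Invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}')
        return None

valid = parse_json_or_none('{"id": 1, "first_name": "Frasco"}')
invalid = parse_json_or_none("{'id': 1}")  # single quotes are not valid JSON

# dict.get returns a default instead of raising KeyError for a missing key.
phone = valid.get('phone', 'n/a')
print(valid, invalid, phone)
```

Whether to return `None`, raise, or log on invalid input depends on the application. This sketch only shows where the error surfaces.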

* Here is an example of a single JSON document written as a string that spans multiple lines.

In [12]:
import json

In [13]:
person = '''{
    "id":1,
    "first_name":"Frasco",
    "last_name":"Necolds",
    "email":"fnecolds0@vk.com",
    "gender":"Male",
    "ip_address":"243.67.63.34"
}'''

In [14]:
type(person)

str

In [15]:
person_dict = json.loads(person)

In [16]:
type(person_dict)

dict

In [17]:
print(person_dict)

{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
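Both forms parse to the same `dict` because whitespace between JSON tokens is not significant. The sketch below checks this on a shortened, made-up version of the record and uses `json.dumps` with `indent` to go from the compact form back to the multi-line form.

```python
import json

# Shortened sample record, in compact and multi-line form.
compact = '{"id":1,"first_name":"Frasco"}'
pretty = '''{
    "id": 1,
    "first_name": "Frasco"
}'''

# Whitespace between tokens does not change the parsed result.
same = json.loads(compact) == json.loads(pretty)
print(same)

# json.dumps with indent produces a multi-line string from the dict.
formatted = json.dumps(json.loads(compact), indent=4)
print(formatted)
```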


### Multiple JSON Documents - One per line

Let us go through the steps involved in processing a string which contains one JSON document per line.
* We should convert the string into a list of JSON strings and then use `json.loads` to process each of them.
* Import the `json` module.
* Split the string into one string per line, using the new line character (`\n`) as delimiter. Strings have a method called `splitlines` which we can leverage.
* Use a `for` loop, a list comprehension, or the `map` function to convert the list of JSON strings into a list of dicts. We should use `json.loads` to convert each JSON string into a dict.

In [18]:
persons = '''{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"}
{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"}
{"id":3,"first_name":"Prissie","last_name":"Tebbett","email":"ptebbett2@infoseek.co.jp","gender":"Genderfluid","ip_address":"22.21.162.56"}
{"id":4,"first_name":"Schuyler","last_name":"Coppledike","email":"scoppledike3@gnu.org","gender":"Agender","ip_address":"120.35.186.161"}
{"id":5,"first_name":"Leopold","last_name":"Jarred","email":"ljarred4@wp.com","gender":"Agender","ip_address":"30.119.34.4"}
{"id":6,"first_name":"Joanna","last_name":"Teager","email":"jteager5@apache.org","gender":"Bigender","ip_address":"245.221.176.34"}
{"id":7,"first_name":"Lion","last_name":"Beere","email":"lbeere6@bloomberg.com","gender":"Polygender","ip_address":"105.54.139.46"}
{"id":8,"first_name":"Marabel","last_name":"Wornum","email":"mwornum7@posterous.com","gender":"Polygender","ip_address":"247.229.14.25"}
{"id":9,"first_name":"Helenka","last_name":"Mullender","email":"hmullender8@cloudflare.com","gender":"Non-binary","ip_address":"133.216.118.88"}
{"id":10,"first_name":"Christine","last_name":"Swane","email":"cswane9@shop-pro.jp","gender":"Polygender","ip_address":"86.16.210.164"}'''

In [19]:
type(persons)

str

In [20]:
persons.splitlines?

Docstring:
S.splitlines([keepends]) -> list of strings

Return a list of the lines in S, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends
is given and true.
Type:      builtin_function_or_method


In [21]:
# Using for loop
import json

In [22]:
persons_list = persons.splitlines()

In [23]:
type(persons_list)

list

In [24]:
type(persons_list[0])

str

In [25]:
persons_list[1]

'{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"}'

In [53]:
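json.loads(persons_list[0])

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}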

* Converting the list of strings to a list of dicts using a conventional loop.

In [29]:
persons_dict_list = []

for person in persons_list:
    persons_dict_list.append(json.loads(person))

In [30]:
type(persons_dict_list)

list

In [31]:
type(persons_dict_list[0])

dict

In [32]:
persons_dict_list[0]

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}

In [33]:
persons_dict_list[0]['first_name']

'Frasco'

* Converting the list of strings to a list of dicts using a list comprehension.

In [34]:
persons_dict_list = [json.loads(person) for person in persons_list]

In [35]:
type(persons_dict_list)

list

In [36]:
type(persons_dict_list[0])

dict

In [37]:
persons_dict_list[0]

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}

* Converting the list of strings to a list of dicts using the `map` function.

In [38]:
persons_dict_list = list(map(json.loads, persons_list))

In [39]:
type(persons_dict_list)

list

In [40]:
type(persons_dict_list[0])

dict

In [41]:
persons_dict_list[0]

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}

In [42]:
list(map(lambda person: person['first_name'], persons_dict_list))

['Frasco',
 'Dulce',
 'Prissie',
 'Schuyler',
 'Leopold',
 'Joanna',
 'Lion',
 'Marabel',
 'Helenka',
 'Christine']

In [43]:
list(filter(lambda person: person['gender'] == 'Female', persons_dict_list))

[{'id': 2,
  'first_name': 'Dulce',
  'last_name': 'Santos',
  'email': 'dsantos1@mashable.com',
  'gender': 'Female',
  'ip_address': '60.30.246.227'}]
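The examples above work because every line holds a JSON document. Real line-delimited data often has blank lines or a trailing newline, and `json.loads` raises `json.JSONDecodeError` on an empty string. Here is a minimal sketch that skips such lines. The input string `lines_text` is made up for illustration.

```python
import json

# Made-up input with a blank line in the middle and a trailing newline.
lines_text = '''{"id":1,"first_name":"Frasco"}

{"id":2,"first_name":"Dulce"}
'''

# Keep only lines that contain something other than whitespace.
records = [
    json.loads(line)
    for line in lines_text.splitlines()
    if line.strip()
]
print(records)
```

The same filter can be combined with `map`, for example `map(json.loads, filter(str.strip, lines_text.splitlines()))`.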

### Multiple JSON Documents - Array

Let us go through the details of processing multiple JSON documents as an array.
* We can use `json.loads`. For the string below, it will return a Python list.
* The steps are the same as for processing a single JSON document.
  * Import the `json` module.
  * Use `json.loads` to convert the string to a Python list.
  * Process the list using Python capabilities.

In [48]:
persons = '''[{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"},
{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"},
{"id":3,"first_name":"Prissie","last_name":"Tebbett","email":"ptebbett2@infoseek.co.jp","gender":"Genderfluid","ip_address":"22.21.162.56"},
{"id":4,"first_name":"Schuyler","last_name":"Coppledike","email":"scoppledike3@gnu.org","gender":"Agender","ip_address":"120.35.186.161"},
{"id":5,"first_name":"Leopold","last_name":"Jarred","email":"ljarred4@wp.com","gender":"Agender","ip_address":"30.119.34.4"},
{"id":6,"first_name":"Joanna","last_name":"Teager","email":"jteager5@apache.org","gender":"Bigender","ip_address":"245.221.176.34"},
{"id":7,"first_name":"Lion","last_name":"Beere","email":"lbeere6@bloomberg.com","gender":"Polygender","ip_address":"105.54.139.46"},
{"id":8,"first_name":"Marabel","last_name":"Wornum","email":"mwornum7@posterous.com","gender":"Polygender","ip_address":"247.229.14.25"},
{"id":9,"first_name":"Helenka","last_name":"Mullender","email":"hmullender8@cloudflare.com","gender":"Non-binary","ip_address":"133.216.118.88"},
{"id":10,"first_name":"Christine","last_name":"Swane","email":"cswane9@shop-pro.jp","gender":"Polygender","ip_address":"86.16.210.164"}]'''

In [49]:
persons_dict_list = json.loads(persons)

In [50]:
type(persons_dict_list)

list

In [47]:
type(persons_dict_list[0])

dict

In [54]:
persons_dict_list[0]

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}
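Once the array is a Python list of dicts, the usual collection tools apply. As one possible illustration, the sketch below counts persons by gender and sorts them by first name. It uses a shortened copy of three records under the assumed name `sample_persons` so that it does not overwrite the variables used in this section.

```python
import json
from collections import Counter

# Shortened copy of three records, kept separate from the notebook variables.
sample_persons = '''[{"id":1,"first_name":"Frasco","gender":"Male"},
{"id":4,"first_name":"Schuyler","gender":"Agender"},
{"id":5,"first_name":"Leopold","gender":"Agender"}]'''

sample_list = json.loads(sample_persons)

# Count how many persons fall under each gender value.
gender_counts = Counter(map(lambda person: person['gender'], sample_list))

# Sort the dicts by first name.
by_name = sorted(sample_list, key=lambda person: person['first_name'])

print(gender_counts)
print([person['first_name'] for person in by_name])
```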

* When we use `json.loads` on the string below, it will create a dict in which the value of `results` is a list.

In [181]:
persons = '''{
    "results": [
        {
            "id": 1,
            "first_name": "Frasco",
            "last_name": "Necolds",
            "email": "fnecolds0@vk.com",
            "gender": "Male",
            "ip_address": "243.67.63.34"
        },
        {
            "id": 2,
            "first_name": "Dulce",
            "last_name": "Santos",
            "email": "dsantos1@mashable.com",
            "gender": "Female",
            "ip_address": "60.30.246.227"
        },
        {
            "id": 3,
            "first_name": "Prissie",
            "last_name": "Tebbett",
            "email": "ptebbett2@infoseek.co.jp",
            "gender": "Genderfluid",
            "ip_address": "22.21.162.56"
        },
        {
            "id": 4,
            "first_name": "Schuyler",
            "last_name": "Coppledike",
            "email": "scoppledike3@gnu.org",
            "gender": "Agender",
            "ip_address": "120.35.186.161"
        },
        {
            "id": 5,
            "first_name": "Leopold",
            "last_name": "Jarred",
            "email": "ljarred4@wp.com",
            "gender": "Agender",
            "ip_address": "30.119.34.4"
        },
        {
            "id": 6,
            "first_name": "Joanna",
            "last_name": "Teager",
            "email": "jteager5@apache.org",
            "gender": "Bigender",
            "ip_address": "245.221.176.34"
        },
        {
            "id": 7,
            "first_name": "Lion",
            "last_name": "Beere",
            "email": "lbeere6@bloomberg.com",
            "gender": "Polygender",
            "ip_address": "105.54.139.46"
        },
        {
            "id": 8,
            "first_name": "Marabel",
            "last_name": "Wornum",
            "email": "mwornum7@posterous.com",
            "gender": "Polygender",
            "ip_address": "247.229.14.25"
        },
        {
            "id": 9,
            "first_name": "Helenka",
            "last_name": "Mullender",
            "email": "hmullender8@cloudflare.com",
            "gender": "Non-binary",
            "ip_address": "133.216.118.88"
        },
        {
            "id": 10,
            "first_name": "Christine",
            "last_name": "Swane",
            "email": "cswane9@shop-pro.jp",
            "gender": "Polygender",
            "ip_address": "86.16.210.164"
        }
    ]
}'''

In [182]:
import json

In [183]:
person_results = json.loads(persons)

In [67]:
type(person_results)

dict

In [59]:
person_results.keys()

dict_keys(['results'])

In [60]:
type(person_results['results'])

list

In [61]:
type(person_results['results'][0])

dict

In [62]:
person_results['results'][0]

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}

In [63]:
results = person_results['results']

In [64]:
type(results)

list

In [65]:
results[0]

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}
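Since a payload may arrive either as a top-level array or as an array under one attribute, it can help to normalize both shapes to a list of dicts. The following is a minimal sketch under that assumption. The function name `extract_records` and its `key` parameter are hypothetical, and `results` is used as the default only because it is the attribute name in the example above.

```python
import json

def extract_records(text, key='results'):
    """Return the list of records from either supported shape.

    Hypothetical helper: accepts a top-level JSON array, or a JSON
    object whose `key` attribute holds the array.
    """
    data = json.loads(text)
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and isinstance(data.get(key), list):
        return data[key]
    raise ValueError(f'Expected a JSON array or an object with a list under {key!r}')

as_array = '[{"id": 1}, {"id": 2}]'
as_attribute = '{"results": [{"id": 1}, {"id": 2}]}'

print(extract_records(as_array))
print(extract_records(as_attribute))
```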

### Exercise on processing collections

Take `person_results` and get a list of dicts where each dict contains `id`, `first_name` and `email`. We would like to email an offer to all the persons.
* You should use the `map` function for this.
* Do not use loops.

In [184]:
person_results

{'results': [{'id': 1,
   'first_name': 'Frasco',
   'last_name': 'Necolds',
   'email': 'fnecolds0@vk.com',
   'gender': 'Male',
   'ip_address': '243.67.63.34'},
  {'id': 2,
   'first_name': 'Dulce',
   'last_name': 'Santos',
   'email': 'dsantos1@mashable.com',
   'gender': 'Female',
   'ip_address': '60.30.246.227'},
  {'id': 3,
   'first_name': 'Prissie',
   'last_name': 'Tebbett',
   'email': 'ptebbett2@infoseek.co.jp',
   'gender': 'Genderfluid',
   'ip_address': '22.21.162.56'},
  {'id': 4,
   'first_name': 'Schuyler',
   'last_name': 'Coppledike',
   'email': 'scoppledike3@gnu.org',
   'gender': 'Agender',
   'ip_address': '120.35.186.161'},
  {'id': 5,
   'first_name': 'Leopold',
   'last_name': 'Jarred',
   'email': 'ljarred4@wp.com',
   'gender': 'Agender',
   'ip_address': '30.119.34.4'},
  {'id': 6,
   'first_name': 'Joanna',
   'last_name': 'Teager',
   'email': 'jteager5@apache.org',
   'gender': 'Bigender',
   'ip_address': '245.221.176.34'},
  {'id': 7,
   'first_name

In [192]:
offers_list=list(
    map(
        lambda element: {'id':element['id'],'first_name':element['first_name'],'email':element['email']},
        person_results['results']
    )
)
offers_list

[{'id': 1, 'first_name': 'Frasco', 'email': 'fnecolds0@vk.com'},
 {'id': 2, 'first_name': 'Dulce', 'email': 'dsantos1@mashable.com'},
 {'id': 3, 'first_name': 'Prissie', 'email': 'ptebbett2@infoseek.co.jp'},
 {'id': 4, 'first_name': 'Schuyler', 'email': 'scoppledike3@gnu.org'},
 {'id': 5, 'first_name': 'Leopold', 'email': 'ljarred4@wp.com'},
 {'id': 6, 'first_name': 'Joanna', 'email': 'jteager5@apache.org'},
 {'id': 7, 'first_name': 'Lion', 'email': 'lbeere6@bloomberg.com'},
 {'id': 8, 'first_name': 'Marabel', 'email': 'mwornum7@posterous.com'},
 {'id': 9, 'first_name': 'Helenka', 'email': 'hmullender8@cloudflare.com'},
 {'id': 10, 'first_name': 'Christine', 'email': 'cswane9@shop-pro.jp'}]
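As a possible variation on the solution above, the field names can be kept in one tuple so that the lambda does not repeat each key. This sketch uses `operator.itemgetter` and `zip`, still without explicit loops. The names `fields`, `pick` and `sample_results` are assumptions for illustration, and `sample_results` holds only two shortened records.

```python
from operator import itemgetter

# Two shortened records standing in for person_results['results'].
sample_results = [
    {'id': 1, 'first_name': 'Frasco', 'email': 'fnecolds0@vk.com', 'gender': 'Male'},
    {'id': 2, 'first_name': 'Dulce', 'email': 'dsantos1@mashable.com', 'gender': 'Female'},
]

fields = ('id', 'first_name', 'email')

# itemgetter with several keys returns a tuple of the matching values.
pick = itemgetter(*fields)

# Pair each field name with its value and build a new dict per person.
offers = list(map(lambda person: dict(zip(fields, pick(person))), sample_results))
print(offers)
```

Changing the tuple `fields` is then enough to select a different set of attributes.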