# `Parse List of Lists of Dictionaries:` Solving 1 problem multiple ways! 🤠

# <font color=red size =6>Mr Fugu Data Science</font>

# (◕‿◕✿)

# Purpose & Outcome:

+ Start small: parse one example (multiple ways)
+ Then work out parsing and iterating over lists of dictionaries
+ Finally, move on to parsing a JSON object whose values are lists of dictionaries

In [1]:
import json 
import pandas as pd
from collections import defaultdict
from collections import ChainMap

In [2]:
# NY Times API Raw Data:
df_to_parse=pd.read_json('nytimes_api.json')
df_to_parse['media']

0     [{'type': 'image', 'subtype': 'photo', 'captio...
1     [{'type': 'image', 'subtype': 'photo', 'captio...
2     [{'type': 'image', 'subtype': 'photo', 'captio...
3     [{'type': 'image', 'subtype': 'photo', 'captio...
4     [{'type': 'image', 'subtype': 'photo', 'captio...
5     [{'type': 'image', 'subtype': 'photo', 'captio...
6     [{'type': 'image', 'subtype': 'photo', 'captio...
7                                                    []
8     [{'type': 'image', 'subtype': 'photo', 'captio...
9     [{'type': 'image', 'subtype': 'photo', 'captio...
10    [{'type': 'image', 'subtype': 'photo', 'captio...
11    [{'type': 'image', 'subtype': 'photo', 'captio...
12    [{'type': 'image', 'subtype': 'photo', 'captio...
13                                                   []
14    [{'type': 'image', 'subtype': 'photo', 'captio...
15                                                   []
16    [{'type': 'image', 'subtype': 'photo', 'captio...
17    [{'type': 'image', 'subtype': 'photo', 'ca

# list of dictionaries: Single Entry

In [3]:
df_to_parse['media'][0] # single entry to parse first as practice


[{'type': 'image',
  'subtype': 'photo',
  'caption': 'Michael Reinoehl was killed by a federally led fugitive task force in Lacey, Wash., on Thursday. He was being investigated in a fatal shooting at a Portland protest.',
  'copyright': 'Joshua Bessex for The New York Times',
  'approved_for_syndication': 1,
  'media-metadata': [{'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-thumbStandard.jpg',
    'format': 'Standard Thumbnail',
    'height': 75,
    'width': 75},
   {'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo210.jpg',
    'format': 'mediumThreeByTwo210',
    'height': 140,
    'width': 210},
   {'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo440.jpg',
    'format': 'mediumThreeByTwo440',
    'height': 293,
    'width': 440}]}]

# Inside `df_to_parse['media']` there is another list of dictionaries, stored as one of the values (*further nesting*)


`'media-metadata': [{'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-thumbStandard.jpg',
    'format': 'Standard Thumbnail',
    'height': 75,
    'width': 75},
   {'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo210.jpg',
    'format': 'mediumThreeByTwo210',
    'height': 140,
    'width': 210},
   {'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo440.jpg',
    'format': 'mediumThreeByTwo440',
    'height': 293,
    'width': 440}]}]`
 
`----------------------------------`

**We will work backwards to get an idea of how to deal with this**

+ To clarify: this is a single entry of `media-metadata`, so pay attention. In the end, each entry should hold one list of *URLs* instead of one dictionary per URL.
    + Thus, further formatting is needed.

In [4]:
media_meta={'media-metadata': [{'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-thumbStandard.jpg',
    'format': 'Standard Thumbnail',
    'height': 75,
    'width': 75},
   {'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo210.jpg',
    'format': 'mediumThreeByTwo210',
    'height': 140,
    'width': 210},
   {'url': 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo440.jpg',
    'format': 'mediumThreeByTwo440',
    'height': 293,
    'width': 440}]}
    
    
pd.DataFrame(media_meta['media-metadata'])

| | url | format | height | width |
|---|---|---|---|---|
| 0 | https://static01.nyt.com/images/2020/10/03/us/... | Standard Thumbnail | 75 | 75 |
| 1 | https://static01.nyt.com/images/2020/10/03/us/... | mediumThreeByTwo210 | 140 | 210 |
| 2 | https://static01.nyt.com/images/2020/10/03/us/... | mediumThreeByTwo440 | 293 | 440 |


# In theory, this is how the *URLs* should appear *`when done parsing and formatting`*

+ `Notice:` the cell above had several *URLs*, but each URL sat in its own dictionary.
+ In this cell they have been collected into a single list per entry, which is the form I am aiming for here.
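
A minimal sketch of that conversion, using a shortened, made-up sample instead of the real NYT URLs. The `sample` list and the `to_dict_of_lists` helper are hypothetical names used only for illustration:

```python
from collections import defaultdict

# Hypothetical, shortened stand-in for one 'media-metadata' entry
sample = [
    {'url': 'a-thumb.jpg', 'format': 'Standard Thumbnail', 'height': 75, 'width': 75},
    {'url': 'a-210.jpg', 'format': 'mediumThreeByTwo210', 'height': 140, 'width': 210},
    {'url': 'a-440.jpg', 'format': 'mediumThreeByTwo440', 'height': 293, 'width': 440},
]

def to_dict_of_lists(list_of_dicts):
    """Collapse a list of dictionaries into one dictionary of lists."""
    collected = defaultdict(list)
    for d in list_of_dicts:
        for key, value in d.items():
            collected[key].append(value)
    return dict(collected)

one_row = to_dict_of_lists(sample)
print(one_row['url'])     # every url for this entry, in the original order
print(one_row['height'])  # every height for this entry
```

Each entry then becomes one row whose cells are lists, which is the shape the next cell builds by hand.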

In [5]:


two_entries=[{'url': ['https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-thumbStandard.jpg',
              'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo210.jpg',
              'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo440.jpg']},
{'url': ['https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-thumbStandard.jpg',
              'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo210.jpg',
              'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo440.jpg']}]
pd.DataFrame(two_entries)

| | url |
|---|---|
| 0 | [https://static01.nyt.com/images/2020/10/03/us... |
| 1 | [https://static01.nyt.com/images/2020/10/03/us... |


# `Example 01`:  Use `Collections ChainMap`

+ `ChainMap` groups multiple dictionaries into a single, updatable view; wrapping it in `dict()` gives one plain dictionary. Lookups search the mappings in order, so the first one wins on duplicate keys. Please look at the documentation for further explanation.

https://docs.python.org/3/library/collections.html
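
A small sketch of how `ChainMap` resolves duplicate keys, and why reversing the list changes which dictionary wins. The dictionaries `first` and `second` are made up for illustration:

```python
from collections import ChainMap

# Hypothetical dictionaries; 'type' appears in both on purpose
first = {'type': 'image', 'subtype': 'photo'}
second = {'type': 'video', 'caption': 'example caption'}

# The star unpacks the list so each dictionary becomes its own argument
merged = dict(ChainMap(*[first, second]))
# ChainMap searches left to right, so the first mapping wins on duplicates
print(merged['type'])

# Reversing the list makes the last dictionary win instead,
# which matches what dict.update() would do
merged_reversed = dict(ChainMap(*[first, second][::-1]))
print(merged_reversed['type'])
```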

In [6]:

multiDct_to_single=[]
for i in range(len(df_to_parse['media'])):
    # the star unpacks the list of dicts into separate arguments; [::-1] lets later dicts win on duplicate keys
    qq=dict(ChainMap(*df_to_parse['media'][i][::-1]))
    multiDct_to_single.append(qq)


pd.DataFrame(multiDct_to_single).head()

| | type | subtype | caption | copyright | approved_for_syndication | media-metadata |
|---|---|---|---|---|---|---|
| 0 | image | photo | Michael Reinoehl was killed by a federally led... | Joshua Bessex for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 1 | image | photo | The actor Chadwick Boseman in 2018. He was 35 ... | Axel Koester for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 2 | image | photo | I can’t get in to edit the digital column, Sho... | Mason Trinca for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 3 | image | photo | Tests authorized by the F.D.A. provide only a ... | Johnny Milano for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 4 | image | photo | Claudia Perez, left, and Carmen Quiñones appea... | Republican National Convention | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |


# `Example 02`: List comprehension, followed by iteration

In [7]:
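def merge_(dicts):
    # Merge a list of dictionaries into a single dictionary.
    # Works for any number of dictionaries; later ones win on duplicate keys.
    return {k: v for d in dicts for k, v in d.items()}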

In [8]:
func_w_iter=[]
for i in range(len(df_to_parse['media'])):
    a=merge_(df_to_parse['media'][i])
    func_w_iter.append(a)

pd.DataFrame(func_w_iter).head()

| | type | subtype | caption | copyright | approved_for_syndication | media-metadata |
|---|---|---|---|---|---|---|
| 0 | image | photo | Michael Reinoehl was killed by a federally led... | Joshua Bessex for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 1 | image | photo | The actor Chadwick Boseman in 2018. He was 35 ... | Axel Koester for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 2 | image | photo | I can’t get in to edit the digital column, Sho... | Mason Trinca for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 3 | image | photo | Tests authorized by the F.D.A. provide only a ... | Johnny Milano for The New York Times | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 4 | image | photo | Claudia Perez, left, and Carmen Quiñones appea... | Republican National Convention | 1.0 | [{'url': 'https://static01.nyt.com/images/2020... |


# Example 03: `Parse List of Lists of Dictionaries`

<font size=6>`Souppa Crazy Version: Wowzas`</font>

#  <font size=9>🤯</font>

+ Don't get scared, it won't bite. Let's walk through it.

In [9]:

h=defaultdict(list)
d=[]
for i in range(len(df_to_parse['media'])):
    
    if len(df_to_parse['media'][i]) >0: # each list is either empty (len 0) or holds one dictionary (len 1)
        
        for j in df_to_parse['media'][i][0].items(): # iterate over the (key, value) tuples
            d.append([i,j]) # store [row position, (key, value)]

    else: # empty row: pair every key from the first row with the placeholder 'Sorry Nope'
        
        w=tuple(zip(df_to_parse['media'][0][0].keys(),
                  ['Sorry Nope']*len(df_to_parse['media'][0][0].keys())))
        for ii in w:
            d.append([i,ii])

for jk in d:
    h[jk[1][0]].append(jk[1][1]) # jk is [row position, (key, value)]: group the values by key

    
new_df=pd.DataFrame(h)
new_df

| | type | subtype | caption | copyright | approved_for_syndication | media-metadata |
|---|---|---|---|---|---|---|
| 0 | image | photo | Michael Reinoehl was killed by a federally led... | Joshua Bessex for The New York Times | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 1 | image | photo | The actor Chadwick Boseman in 2018. He was 35 ... | Axel Koester for The New York Times | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 2 | image | photo | I can’t get in to edit the digital column, Sho... | Mason Trinca for The New York Times | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 3 | image | photo | Tests authorized by the F.D.A. provide only a ... | Johnny Milano for The New York Times | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 4 | image | photo | Claudia Perez, left, and Carmen Quiñones appea... | Republican National Convention | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 5 | image | photo | Protesters gathered at the site where Daniel P... | Joshua Rashaad McFadden for The New York Times | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 6 | image | photo | | Brendan Gutenschwager, via Storyful | 0 | [{'url': 'https://static01.nyt.com/images/2020... |
| 7 | Sorry Nope | Sorry Nope | Sorry Nope | Sorry Nope | Sorry Nope | Sorry Nope |
| 8 | image | photo | A queue for an open house in Belleville, a New... | Karsten Moran for The New York Times | 1 | [{'url': 'https://static01.nyt.com/images/2020... |
| 9 | image | photo | Curtis Flowers, center, as he exited the Winst... | Rogelio V. Solis/Associated Press | 1 | [{'url': 'https://static01.nyt.com/images/2020... |


# I created this to illustrate how I made the tuples based on the keys for our dictionary.

+ **`Side Note`**: if your dictionaries do not all share the same keys (varying lengths), then you have to do one more step.
    + You would first collect all unique keys, for example with a `set`. Then you would fill in a placeholder (with `if/else` or `dict.get`) wherever a key is missing, so that every row lines up.
    
`This example didn't have this problem`

In [10]:
tuple(zip(df_to_parse['media'][0][0].keys(),\
                  ['Nope']*len(df_to_parse['media'][0][0].keys())))

(('type', 'Nope'),
 ('subtype', 'Nope'),
 ('caption', 'Nope'),
 ('copyright', 'Nope'),
 ('approved_for_syndication', 'Nope'),
 ('media-metadata', 'Nope'))
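
For the side-note case, where the dictionaries do not all share the same keys, a rough sketch of the extra step could look like this. The `records` data is made up, and `all_keys`, `columns` and `aligned` are hypothetical names:

```python
# Hypothetical records with different key sets
records = [
    {'type': 'image', 'caption': 'first'},
    {'type': 'image', 'copyright': 'someone'},
    {},
]

# Union of every key that appears in any record
all_keys = set()
for record in records:
    all_keys.update(record)
columns = sorted(all_keys)  # sorted only to get a stable column order

# dict.get supplies the placeholder wherever a key is missing
aligned = [{key: record.get(key, 'Nope') for key in columns} for record in records]
print(aligned)
```

Every dictionary in `aligned` now has the same keys, so the rows line up when handed to `pd.DataFrame`.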

# `Example 04`: Innermost list of lists of dictionaries `['media-metadata']`

In [11]:
'''
func(): takes one 'media-metadata' entry (a list of dictionaries) and collects the
values for each key into a defaultdict(list). Strings are skipped, since they are
the placeholders I inserted for rows that had no media.

The loop below walks the DataFrame column. Rows that are not strings are passed to
func() and the result is appended to the list t_. In the else branch, placeholder rows
get the same keys with the string 'Nope' as value, so that every row has the same
shape when the list of dicts is read into a DataFrame.

'''
def func(dict_):
    h=defaultdict(list)
    for j in dict_:
        if type(j)!=str:
            for i in j.items():
                h[i[0]].append(i[1])
    return h


t_=[]
g=defaultdict(list) # created once, so every placeholder row below shares (and appends to) this same object
for i in range(len(new_df['media-metadata'])):
    
    if type(new_df['media-metadata'][i])!=str:
        t_.append(func(new_df['media-metadata'][i]))
    
    else:
        tups=tuple(zip(new_df['media-metadata'][0][0].keys() ,
                               ['Nope']*len(new_df['media-metadata'][0][0].keys())))
        for n in tups:
            g[n[0]].append(n[1])
        t_.append(g)
        

# verify that each row is correct and not using the same thing repeated
print(pd.DataFrame(t_).url[0])
print('--------------------')
print(pd.DataFrame(t_).url[1])

#
url_expand_=pd.DataFrame(t_)
url_expand_

['https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-thumbStandard.jpg', 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo210.jpg', 'https://static01.nyt.com/images/2020/10/03/us/03portland-suspect-alt/03portland-suspect-alt-mediumThreeByTwo440.jpg']
--------------------
['https://static01.nyt.com/images/2020/08/29/multimedia/29xp-boseman1/29xp-boseman1-thumbStandard.jpg', 'https://static01.nyt.com/images/2020/08/29/multimedia/29xp-boseman1/29xp-boseman1-mediumThreeByTwo210.jpg', 'https://static01.nyt.com/images/2020/08/29/multimedia/29xp-boseman1/29xp-boseman1-mediumThreeByTwo440.jpg']


| | url | format | height | width |
|---|---|---|---|---|
| 0 | [https://static01.nyt.com/images/2020/10/03/us... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 1 | [https://static01.nyt.com/images/2020/08/29/mu... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 2 | [https://static01.nyt.com/images/2020/08/30/us... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 3 | [https://static01.nyt.com/images/2020/08/30/sc... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 4 | [https://static01.nyt.com/images/2020/08/28/ny... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 5 | [https://static01.nyt.com/images/2020/09/05/ny... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 6 | [https://static01.nyt.com/images/2020/08/28/vi... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 7 | [Nope, Nope, Nope] | [Nope, Nope, Nope] | [Nope, Nope, Nope] | [Nope, Nope, Nope] |
| 8 | [https://static01.nyt.com/images/2020/08/28/ny... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
| 9 | [https://static01.nyt.com/images/2020/09/04/us... | [Standard Thumbnail, mediumThreeByTwo210, medi... | [75, 140, 293] | [75, 210, 440] |
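
One detail worth watching in a loop like the one above: a placeholder dictionary created once outside the loop is shared by every empty row, so each append shows up in all of them. Below is a minimal sketch of building a fresh placeholder per row instead. It assumes made-up sample data, and `rows`, `keys` and `parsed` are hypothetical names, not part of the notebook:

```python
from collections import defaultdict

# Hypothetical column: two parsed rows and two placeholder strings
rows = [
    [{'url': 'a.jpg', 'height': 75}, {'url': 'b.jpg', 'height': 140}],
    'Sorry Nope',
    [{'url': 'c.jpg', 'height': 75}, {'url': 'd.jpg', 'height': 140}],
    'Sorry Nope',
]
keys = list(rows[0][0].keys())

parsed = []
for row in rows:
    if isinstance(row, str):
        # Build a new placeholder dict for this row only
        parsed.append({key: ['Nope'] for key in keys})
    else:
        collected = defaultdict(list)
        for d in row:
            for key, value in d.items():
                collected[key].append(value)
        parsed.append(dict(collected))

print(parsed[1])
print(parsed[1] is parsed[3])  # separate objects, one per placeholder row
```

With this variant each placeholder row holds a single `'Nope'` per key, regardless of how many empty rows the column contains.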


# Combine All Data into 1 DF:

In [12]:

pd.concat([df_to_parse,new_df,url_expand_],axis=1)

Unnamed: 0,uri,url,id,asset_id,source,published_date,updated,section,subsection,nytdsection,...,type,subtype,caption,copyright,approved_for_syndication,media-metadata,url.1,format,height,width
0,nyt://article/f0510da8-1ef8-5442-a909-8af53b7d...,https://www.nytimes.com/2020/09/03/us/michael-...,100000007321101,100000007321101,New York Times,2020-09-03,2020-09-05 10:04:00,U.S.,,u.s.,...,image,photo,Michael Reinoehl was killed by a federally led...,Joshua Bessex for The New York Times,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/10/03/us...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
1,nyt://article/607123ea-14ba-5f9c-ab43-7d8b6c7a...,https://www.nytimes.com/2020/08/28/movies/chad...,100000007314593,100000007314593,New York Times,2020-08-28,2020-08-31 10:07:14,Movies,,movies,...,image,photo,The actor Chadwick Boseman in 2018. He was 35 ...,Axel Koester for The New York Times,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/08/29/mu...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
2,nyt://article/6bff4972-07cc-5b20-bd16-39f9cf19...,https://www.nytimes.com/2020/08/30/us/portland...,100000007315198,100000007315198,New York Times,2020-08-30,2020-09-05 10:05:01,U.S.,,u.s.,...,image,photo,"I can’t get in to edit the digital column, Sho...",Mason Trinca for The New York Times,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/08/30/us...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
3,nyt://article/0487a919-ec10-5bf5-8f65-449c7a78...,https://www.nytimes.com/2020/08/29/health/coro...,100000007294406,100000007294406,New York Times,2020-08-29,2020-09-01 21:09:22,Health,,health,...,image,photo,Tests authorized by the F.D.A. provide only a ...,Johnny Milano for The New York Times,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/08/30/sc...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
4,nyt://article/7e66f291-6167-5d78-b942-4937278f...,https://www.nytimes.com/2020/08/28/nyregion/ny...,100000007313944,100000007313944,New York Times,2020-08-28,2020-09-03 11:03:02,New York,,new york,...,image,photo,"Claudia Perez, left, and Carmen Quiñones appea...",Republican National Convention,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/08/28/ny...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
5,nyt://article/93323728-2566-5176-9909-d107ce96...,https://www.nytimes.com/2020/09/03/nyregion/da...,100000007323110,100000007323110,New York Times,2020-09-03,2020-09-04 19:21:30,New York,,new york,...,image,photo,Protesters gathered at the site where Daniel P...,Joshua Rashaad McFadden for The New York Times,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/09/05/ny...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
6,nyt://article/fa102828-c20a-5f7c-8685-de98792a...,https://www.nytimes.com/2020/08/27/us/kyle-rit...,100000007309185,100000007309185,New York Times,2020-08-27,2020-09-03 14:36:58,U.S.,,u.s.,...,image,photo,,"Brendan Gutenschwager, via Storyful",0,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/08/28/vi...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
7,nyt://article/b1dd8dce-004c-556c-8043-ae7f9216...,https://www.nytimes.com/2020/08/28/world/europ...,100000007312969,100000007312969,New York Times,2020-08-28,2020-08-29 12:40:00,World,Europe,world,...,Sorry Nope,Sorry Nope,Sorry Nope,Sorry Nope,Sorry Nope,Sorry Nope,"[Nope, Nope, Nope]","[Nope, Nope, Nope]","[Nope, Nope, Nope]","[Nope, Nope, Nope]"
8,nyt://article/024a4b57-909b-538d-91f4-fd8522c9...,https://www.nytimes.com/2020/08/30/nyregion/ny...,100000007292267,100000007292267,New York Times,2020-08-30,2020-08-31 03:14:52,New York,,new york,...,image,photo,"A queue for an open house in Belleville, a New...",Karsten Moran for The New York Times,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/08/28/ny...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"
9,nyt://article/587ec74f-656f-5d4b-b850-87b83536...,https://www.nytimes.com/2020/09/04/us/after-6-...,100000007325874,100000007325874,New York Times,2020-09-04,2020-09-05 00:33:44,U.S.,,u.s.,...,image,photo,"Curtis Flowers, center, as he exited the Winst...",Rogelio V. Solis/Associated Press,1,[{'url': 'https://static01.nyt.com/images/2020...,[https://static01.nyt.com/images/2020/09/04/us...,"[Standard Thumbnail, mediumThreeByTwo210, medi...","[75, 140, 293]","[75, 210, 440]"


# Last Thoughts:

There are always several ways to solve a problem, depending on skill level, experience and the time/memory tradeoff.

+ If you want a speed-up, try list comprehensions instead of explicit loops where you can.
+ There are times when index values MATTER, as they did in this video. In that case preserving order matters, and you have to adjust your code.
+ Finally, you can write more elegant code over time, but elegant does not always mean it is the best or the fastest way to solve the problem.
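
As a rough illustration of the first two points, here is a list comprehension that flattens a `media`-style column while keeping one output row per input row, so the positions still match the source. The `media` sample and the `keys` list are made up for the sketch:

```python
# Hypothetical 'media'-style column: each row is a list holding zero or one dictionary
media = [
    [{'type': 'image', 'subtype': 'photo'}],
    [],
    [{'type': 'image', 'subtype': 'graphic'}],
]
keys = ['type', 'subtype']

# Empty rows get a placeholder dictionary instead of being dropped,
# so row i of the result still corresponds to row i of the input
flat = [row[0] if row else dict.fromkeys(keys, 'Sorry Nope') for row in media]
print(flat)
```

Whether this is actually faster than the explicit loop depends on the data, so it is worth timing both on your own input.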

# <font color=red>LIKE</font>, Share &

# <font color=red>SUB</font>scribe

`--------------------------------`

# Citations & Help:

# <font size=8 > ◔̯◔</font>

https://stackoverflow.com/questions/10756427/loop-through-all-nested-dictionary-values

https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/

https://florimond.dev/blog/articles/2018/07/a-practical-usage-of-chainmap-in-python/