# Soviet Posters dataset analysis 

This dataset contains the complete descriptive metadata for the Soviet posters (http://digital.nls.uk/soviet-posters) from the Woodburn Collection, a digitised collection of posters issued in the Soviet Union between 1919 and 1930. The posters relate to the Russian Civil War and to the economic and social issues of the 1920s. They were brought back from the Soviet Union by Scottish Labour MP Arthur Woodburn after his visit there in 1932.

All items are in JPEG format, and are electronic reproductions of original works. 

Owner: National Library of Scotland

Creator: National Library of Scotland

Date created: 08/01/2016

Date updated: 23/08/2019

License: Creative Commons CC-0

In [1]:
import pandas as pd
import xml.etree.ElementTree as etree

In [2]:
def parse_item(i, prefix="{http://purl.org/dc/elements/1.1/}"):
    """Parse a Dublin Core record and return a dictionary in the format:
    {"element": ["items"]}
    """

    # Set up the dictionary
    item_dict = {}

    # Loop over all the children of the input node
    for child in i:

        # Skip the boilerplate "Metadata context" note; keep every other element
        if child.text != "Metadata context: metadata created as part of normal Library activities.":
            # Strip the namespace prefix from the tag to get the key, then use setdefault
            # to create an empty list at that key if needed and append the child's text
            item_dict.setdefault(child.tag[len(prefix):], []).append(child.text)

    return item_dict
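
To make the `setdefault` pattern above concrete, here is a minimal sketch run against a tiny in-memory record. The XML snippet, the `<metadata>`/`<record>` element names, and the `parse_record` helper are hypothetical and only illustrate the idea; the real file may be structured differently.

```python
import xml.etree.ElementTree as etree

DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical sample record (assumed layout, not copied from the dataset file)
SAMPLE_XML = """
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:identifier>74506106</dc:identifier>
    <dc:identifier>http://digital.nls.uk/74506106</dc:identifier>
    <dc:title>Example title</dc:title>
  </record>
</metadata>
"""

def parse_record(record, prefix=DC_PREFIX):
    """Group the text of each child element under its tag name, minus the namespace."""
    item = {}
    for child in record:
        # Repeated tags (two identifiers here) end up in the same list
        item.setdefault(child.tag[len(prefix):], []).append(child.text)
    return item

sample_root = etree.fromstring(SAMPLE_XML.strip())
sample_items = [parse_record(rec) for rec in sample_root]
print(sample_items)
```

Because repeated elements are appended to one list per tag, every cell of the resulting DataFrame holds a list rather than a single string.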

In [3]:
# Parse the XML file into an element tree
tree = etree.parse("Soviet-Posters-Datasets-DC.xml")

# Every tag starts with the Dublin Core namespace {http://purl.org/dc/elements/1.1/};
# parse_item strips this prefix by slicing child.tag
prefix = "{http://purl.org/dc/elements/1.1/}"

# Get the root of the tree
root = tree.getroot()
# Parse each record under the root into a dictionary and collect them in a list
parsed_items = [parse_item(desc) for desc in root]

# Build a DataFrame from the parsed items
df = pd.DataFrame(parsed_items)

In [4]:
df

Unnamed: 0,identifier,title,creator,format,extent,publisher,date,language,description,isVersionOf,subject,rights
0,"[74506106, http://digital.nls.uk/74506106]",[Primernye chasy kormleniia grud'iu ot 6 do 12...,[Gosudarstvennoe izdatel'stvo medit︠s︡inskoi l...,"[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Primernye chasy k...,"[Mothers, Infants, Clocks, Color prints (print...",[Available under Creative Commons license; Att...
1,"[74506108, http://digital.nls.uk/74506108]",[Frontovye chastushki [Translation: Frontline ...,"[Woodburn, Arthur]","[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1921...,[Electronic reproduction of: Frontovye chastus...,"[Lenin, Vladimir Il´ich, 1870-1924, Trotsky, L...",[Available under Creative Commons license; Att...
2,"[74506110, http://digital.nls.uk/74506110]",[Chto dolzhna znat' kazhdaia zhenshchina [Tran...,"[Ioffe, A. A. (Adolʹf Abramovich), 1883-1927, ...","[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Chto dolzhna znat...,"[Color prints (prints), Collections (object gr...",[Available under Creative Commons license; Att...
3,"[74506112, http://digital.nls.uk/74506112]",[Sopostavlenie fondov po otrasliam v nachale i...,"[Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Wo...","[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Sopostavlenie fon...,"[Color prints (prints), Collections (object gr...",[Available under Creative Commons license; Att...
4,"[74506114, http://digital.nls.uk/74506114]",[Zhivotnovodstvo [Translation: Animal husbandr...,"[Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Wo...","[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Zhivotnovodstvo [...,"[Color prints (prints), Collections (object gr...",[Available under Creative Commons license; Att...
...,...,...,...,...,...,...,...,...,...,...,...,...
67,"[74506240, http://digital.nls.uk/74506240]",[Kto-kogo ? Dognat' i peregnat' [Translation: ...,[Assot︠s︡iat︠s︡ii︠a︡ khudozhnikov revoli︠u︡t︠s...,"[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Kto-kogo ? Dognat...,"[Graphs, Locomotives, Stars, Red (color), Coll...",[Available under Creative Commons license; Att...
68,"[74506242, http://digital.nls.uk/74506242]",[Kul'turnoe stroitel'stvo : neot'emlemaia chas...,[Assot︠s︡iat︠s︡ii︠a︡ khudozhnikov revoli︠u︡t︠s...,"[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Kul'turnoe stroit...,"[Collections (object groupings), Propaganda, C...",[Available under Creative Commons license; Att...
69,"[74506244, http://digital.nls.uk/74506244]","[V 1905, 1906, i 1907 gg. krest'ianstvo gromil...","[Krasnaia nov', Woodburn, Arthur]","[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,"[Electronic reproduction of: V 1905, 1906, i 1...","[Color prints (prints), Comic strips, Peasants...",[Available under Creative Commons license; Att...
70,"[74506246, http://digital.nls.uk/74506246]",[Transport [Translation: Transport].],"[Woodburn, Arthur, Gosudarstvennoe izdatelʹstv...","[still image, JPEG]",[1 online resource],"[National Library of Scotland, ]",[2005.],[None],[Date of event depicted: between 1917 and 1936...,[Electronic reproduction of: Transport [Transl...,"[Collections (object groupings), Propaganda, T...",[Available under Creative Commons license; Att...


In [5]:
# Delete unwanted columns
df = df.drop(columns=["format", "extent", "publisher", "date", "language", "isVersionOf", "rights"])

In [6]:
# Show the full contents of each cell instead of truncating wide columns
pd.set_option('display.max_colwidth', None)

In [15]:
# Replace the column names with more descriptive ones
df.rename(
    columns={
        "identifier": "Photo Number + URL",
        "title": "Caption + Translation",
        "description": "Description",
        "subject": "Subject"
    },
    inplace=True
)
# Show every row and column when the DataFrame is displayed
pd.set_option("display.max_rows", None, "display.max_columns", None)
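
The "Photo Number + URL" column holds a two-item list per row. If separate columns are easier to work with, one possible approach is sketched below on a small stand-in DataFrame. The `demo` frame and the new column names "Photo Number" and "URL" are assumptions for illustration, and the sketch assumes every list has exactly two entries.

```python
import pandas as pd

# Stand-in for two rows of the renamed DataFrame (values as shown in the output above)
demo = pd.DataFrame({
    "Photo Number + URL": [
        ["74506106", "http://digital.nls.uk/74506106"],
        ["74506108", "http://digital.nls.uk/74506108"],
    ]
})

# Expand each two-item list into two new columns, keeping the original index
demo[["Photo Number", "URL"]] = pd.DataFrame(
    demo["Photo Number + URL"].tolist(), index=demo.index
)

print(demo[["Photo Number", "URL"]])
```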

In [16]:
df

Unnamed: 0,Photo Number + URL,Caption + Translation,creator,Description,Subject
0,"[74506106, http://digital.nls.uk/74506106]",[Primernye chasy kormleniia grud'iu ot 6 do 12 mesiatsev [Translation: Approximate times for breast-feeding from 6 to 12 months].],"[Gosudarstvennoe izdatel'stvo medit︠s︡inskoi literatury, Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured poster showing breast-feeding times for infants. Line at bottom of poster says : ""Precise feeding times for infants of different ages is determined by consultation with the doctor""..]","[Mothers, Infants, Clocks, Color prints (prints), Collections (object groupings), Propaganda, Asia, Russia, Rossiya republic, Moskva autonomous city, Moscow (inhabited place), Asia, Soviet Union (former nation/state/empire)]"
1,"[74506108, http://digital.nls.uk/74506108]",[Frontovye chastushki [Translation: Frontline verses].],"[Woodburn, Arthur]","[Date of event depicted: between 1917 and 1921, Date of event depicted: between 1917 and 1936, Coloured comic strip style poster showing scenes from the Russian Civil War. Notable for iconic depiction of Trotsky with Lenin..]","[Lenin, Vladimir Il´ich, 1870-1924, Trotsky, Leon, 1879-1940, Russian S.F.S.R. Revoli︠u︡t︠s︡ionnyĭ voennyĭ sovet Respubliki, Comic strips, Color prints (prints), Collections (object groupings), Soldiers, Politicians, Leaders (people), Bourgeoisie, Gold (metal), Money, Generals, Admirals, Civil wars, Propaganda, Asia, Russia, Rossiya republic, Moskva autonomous city, Moscow (inhabited place), Asia, Soviet Union (former nation/state/empire)]"
2,"[74506110, http://digital.nls.uk/74506110]",[Chto dolzhna znat' kazhdaia zhenshchina [Translation: What every woman should know].],"[Ioffe, A. A. (Adolʹf Abramovich), 1883-1927, Grun, O., Woodburn, Arthur, Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.)]","[Date of event depicted: between 1917 and 1936, Coloured poster consisting of four frames showing dangers of pregnancy and advising women to use the services of doctors and midwives rather than relying on ignorant old women..]","[Color prints (prints), Collections (object groupings), Pipes (smoking equipment), Physicians, Propaganda, Asia, Russia, Rossiya republic, Sankt-Peterburg autonomous city, Saint Petersburg (inhabited place), Asia, Soviet Union (former nation/state/empire)]"
3,"[74506112, http://digital.nls.uk/74506112]",[Sopostavlenie fondov po otrasliam v nachale i v kontse piatiletki [Translation: A comparison of resources devoted to the main branches of the economy at the beginning and end of the Five Year Plan].],"[Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured photomontage statistical poster..]","[Color prints (prints), Collections (object groupings), Statistics, Economic development, Propaganda, Asia, Russia, Rossiya republic, Sankt-Peterburg autonomous city, Saint Petersburg (inhabited place), Asia, Soviet Union (former nation/state/empire), Asia, Soviet Union (former nation/state/empire)]"
4,"[74506114, http://digital.nls.uk/74506114]",[Zhivotnovodstvo [Translation: Animal husbandry].],"[Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured photomontage poster showing the projected inrease in the production of livestock products during the Five Year Plan..]","[Color prints (prints), Collections (object groupings), Propaganda, Animal husbandry, Livestock, Development (function), Plans (reports), Asia, Russia, Rossiya republic, Sankt-Peterburg autonomous city, Saint Petersburg (inhabited place), Asia, Soviet Union (former nation/state/empire), Asia, Soviet Union (former nation/state/empire)]"
5,"[74506116, http://digital.nls.uk/74506116]",[Sovetskaia repa [Translation: The Soviet turnip].],"[Russian S.F.S.R. Revoli︠u︡t︠s︡ionnyĭ voennyĭ sovet Respubliki. Politicheskoe upravlenie. Literaturno-izdatelʹskiĭ otdel, Moor, Dmitriĭ Stakhievich, 1883-1946, Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured strip cartoon style poster which shows the efforts of capitalism and reaction failing to uproot the red Soviet turnip, here depicted as the headgear of a Red Army soldier..]","[Color prints (prints), Collections (object groupings), Propaganda, Capitalism, Asia, Russia, Rossiya republic, Moskva autonomous city, Moscow (inhabited place), Asia, Soviet Union (former nation/state/empire)]"
6,"[74506118, http://digital.nls.uk/74506118]",[Raspredelenie vlozhenii po sektoram narodnogo khoziaistva [Translation: The allocation of investments among sectors of the national economy].],"[Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured photomontage poster showing projected changes in investment allocations at different stages of the Five Year Plan..]","[Color prints (prints), Collections (object groupings), Propaganda, Plans (reports), Investment (function), Asia, Russia, Rossiya republic, Sankt-Peterburg autonomous city, Saint Petersburg (inhabited place), Asia, Soviet Union (former nation/state/empire), Asia, Soviet Union (former nation/state/empire)]"
7,"[74506120, http://digital.nls.uk/74506120]",[Transport [Translation: Transport].],"[Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured photomontage poster showing projected increases in construction in the various sectors of the transport industry during the Five Year Plan..]","[Color prints (prints), Collections (object groupings), Propaganda, Construction (assembling), Plans (reports), Transportation, Asia, Russia, Rossiya republic, Sankt-Peterburg autonomous city, Saint Petersburg (inhabited place), Asia, Soviet Union (former nation/state/empire), Asia, Soviet Union (former nation/state/empire)]"
8,"[74506122, http://digital.nls.uk/74506122]",[Trud [Translation: Labour].],"[Moor, Dmitriĭ Stakhievich, 1883-1946, Gosudarstvennoe izdatelʹstvo (R.S.F.S.R.), Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, Coloured strip cartoon style poster showing how labour is exploited under capitalism and calling on workers to join the Red Army and rid themselves of their oppressors..]","[Soviet Union. Raboche-Krestʹi︠a︡nskai︠a︡ Krasnai︠a︡ Armii︠a︡, Color prints (prints), Collections (object groupings), Propaganda, Constructivist, Soldiers, Capitalism, Workers, Asia, Russia, Rossiya republic, Moskva autonomous city, Moscow (inhabited place), Asia, Soviet Union (former nation/state/empire)]"
9,"[74506124, http://digital.nls.uk/74506124]",[V glazakh dvoitsia a v karmane nichego [Translation: You're seeing two of everything but in your pocket there's nothing].],"[Gosudarstvennoe izdatel'stvo medit︠s︡inskoi literatury, Woodburn, Arthur]","[Date of event depicted: between 1917 and 1936, An anti-alcohol campaign poster..]","[Collections (object groupings), Propaganda, Alcoholism, Sight (sense), Poverty, Asia, Russia, Rossiya republic, Moskva autonomous city, Moscow (inhabited place), Asia, Soviet Union (former nation/state/empire)]"


In [17]:
# Shows the full data in a more text-friendly way
print(df.to_string())

Photo Number + URL    Caption + Translation    creator    Description
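
Since each cell is a list, the `to_string()` output is very wide and full of brackets. A minimal sketch of one way to get a more readable text view is to join each list into a single string first. The `demo` frame, the `join_values` helper, and the `"; "` separator are hypothetical choices, not part of the original notebook.

```python
import pandas as pd

# Stand-in data: list-valued cells, with one missing creator
demo = pd.DataFrame({
    "Subject": [["Mothers", "Infants", "Clocks"], ["Propaganda"]],
    "creator": [["Woodburn, Arthur"], None],
})

def join_values(cell, sep="; "):
    """Join a list cell into one string; return an empty string for missing cells."""
    if isinstance(cell, list):
        return sep.join(str(v) for v in cell if v is not None)
    return ""

# Work on a copy so the list-valued original stays available
flat = demo.copy()
for col in flat.columns:
    flat[col] = flat[col].apply(join_values)

print(flat.to_string())
```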