# Manipulating the items on the archive.org website
`archive.org` is arguably the largest community-contributed collection of `items` such as books, movies, audio, images, and even code. The snippets in this notebook can be used to automate your interactions with the `archive.org` servers.

To make these interactions smoother, the Internet Archive provides a Python library named `internetarchive`, with several methods that let you interact with their servers seamlessly.

Keep in mind that you need to configure your system before you can start interacting with the servers, especially if you plan to change items you have submitted to the `archive.org` website.

## Configuring your system
The first step is to configure the system to use the library. It ships with a small command-line script, `ia`, that helps you configure and secure your account.

TODO: Write the steps to use `ia`.

In [69]:
from internetarchive import upload, get_item, modify_metadata, search_items

## Reading the Metadata of an existing item
An item on archive.org represents a single entry on the server's catalogue. Each item has two parts: its actual file(s), as well as its description. The description of an item is known as the metadata. You can access all of an item's metadata via the `item` object. 

In [105]:
# To get the details of an item, you'd need the identifier
item = get_item('vilpattukaltest1980kssp')
item.item_metadata['metadata']

{'identifier': 'vilpattukaltest1980kssp',
 'mediatype': 'texts',
 'collection': ['kssp-archives', 'kerala-archives', 'additional_collections'],
 'creator': 'Kerala Sasthra Sahithya Parishad',
 'date': '1980',
 'description': 'വിൽപ്പാട്ടുകൾ',
 'language': 'mal',
 'licenseurl': 'https://creativecommons.org/licenses/by-sa/4.0/',
 'scanner': 'Internet Archive HTML5 Uploader 1.6.4',
 'title': 'വിൽപ്പാട്ടുകൾ - ശാസ്ത്രകലാജാഥ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്',
 'uploader': 'shijualexonline@gmail.com',
 'publicdate': '2020-03-16 16:06:06',
 'addeddate': '2020-03-16 16:06:06',
 'identifier-access': 'http://archive.org/details/vilpattukaltest1980kssp',
 'identifier-ark': 'ark:/13960/t0wq8k30h',
 'imagecount': '20',
 'ppi': '600',
 'ocr': 'language not currently OCRable',
 'year': '1980',
 'subject': 'Kilipattu;  1982'}

## Upload an item
It's fairly easy to establish a connection, create an item on archive.org, and upload files to it. Remember that we need to upload both the actual files and the metadata when we create a new item.

> If the item already has a file with the same filename, the existing file within the item will be overwritten.
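
Because an upload replaces same-named files without asking, it can help to compare local filenames against what the item already holds before calling `upload`. The sketch below shows one way to do that. `find_conflicts` is a hypothetical helper, not part of `internetarchive`, and it assumes the existing names were collected beforehand (for example from the `name` field of each entry in `get_item(identifier).files`) and that files are stored under their base filename.

```python
import os

def find_conflicts(existing_names, local_paths):
    """Return the local paths whose base filename already exists in the item."""
    existing = set(existing_names)
    return [p for p in local_paths if os.path.basename(p) in existing]

# Hypothetical data: names already in the item, and files about to be uploaded
existing_names = ['test.txt', 'cover.jpg']
local_paths = ['test.txt', 'scans/page1.png']
conflicts = find_conflicts(existing_names, local_paths)
print(conflicts)
```

An empty result means no existing file would be overwritten under those assumptions.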


In [114]:
# Metadata is submitted to archive.org as a dictionary. Remember that the subject element must be a semicolon-separated string
md = dict(title='Title of the item', mediatype='movies', subject='test; magazines')


In [115]:
r = upload('ia-test-upload', files=['test.txt'], metadata=md)

In [116]:
r[0].status_code

200

**Success**

This has created the above item on the archive.org site. 
[Title of the item](https://archive.org/details/ia-test-upload)

## Modify the metadata

In [74]:
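item_id = "kalivela2002kssp"
item = get_item(item_id)
cur_sub = item.item_metadata['metadata']['subject']
print(cur_sub)
# The subject may be a list or a single string; work on a list either way
if isinstance(cur_sub, str):
    cur_sub = cur_sub.split(';')
# Assumption: the entry to drop is the one removed in the cells that follow
rem_sub = 'Kerala Archives'
# Strip each entry, drop the unwanted one, and join with "; "
new_sub = "; ".join(sub.strip() for sub in cur_sub if sub.strip() != rem_sub)
if item_id and new_sub:
    r = modify_metadata(item_id, metadata=dict(subject=new_sub))
    print(r.status_code)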

In [78]:
item_id = "kalivela2002kssp"
item = get_item(item_id)
cur_sub = item.item_metadata['metadata']['subject']

In [79]:
rem_sub = 'Kerala Archives'
print(cur_sub)
# Remove the unwanted entry in place (raises ValueError if it is absent)
cur_sub.remove(rem_sub)
print(cur_sub)

['KSSP leaflets', 'Kerala Archives']
['KSSP leaflets']
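
The cell above works only when the subject is a list that contains the entry. Since the stored subject is sometimes a list and sometimes a string, a small helper that accepts both forms may be more convenient. This is a minimal sketch; `remove_subject` is a hypothetical name, and it assumes string subjects are separated by semicolons.

```python
def remove_subject(subject, unwanted):
    """Return the subject entries with `unwanted` removed, as a new list."""
    if isinstance(subject, str):
        entries = subject.split(';')
    else:
        entries = list(subject)
    cleaned = [e.strip() for e in entries]
    # Drop empty entries and the unwanted one; a missing entry is not an error
    return [e for e in cleaned if e and e != unwanted]

print(remove_subject(['KSSP leaflets', 'Kerala Archives'], 'Kerala Archives'))
print(remove_subject('KSSP leaflets; Kerala Archives', 'Kerala Archives'))
```

Unlike `list.remove`, this version leaves the input untouched and does not raise when the entry is missing.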


In [84]:
kssp = list(search_items('kssp-archives'))


In [184]:
for book in kssp:
    print(book['identifier'])

1966octsasthraga0000kssp
1969decsasthrake0000kssp
1969novsasthrake0000kssp
1969sathrakerala0000kssp
1970augeureka0000kssp
1970deceureka0000kssp
1970febsasthrake0000kssp
1970jansasthrake0000kssp
1970marsasthrake0000kssp
1970maysasthrake0000kssp
1970noveureka0000kssp
1970sepeureka0000kssp
1970sepeureka0000kssp_v4f2
1971apreureka0000kssp
1971febeureka0000kssp
1971janeureka0000kssp
1971mareureka0000kssp
1971mayeureka0000kssp
1983arogyarekha0000kssp
1983mannan0000rajm
1983nammudearogy0000ekba
1983pradhamasusr0000jaya
1984marunnuvyvas0000ekba
1984urjamchandra0000kssp
1985arationalstu0000jami
1985raionalityst0000shis
1986vaidyuthipra0000kssp
1986vaithyuthipr0000kssp
1986vanamvellam0000kssp
1986vyavasayaval0000kssp
1986vyavasayaval0000unse
1987janakeeyaaro0000kssp
1988vaidyuthikan0000mppa
1988vaidyuthinir0000unse
1989aksharathiln0000kssp
1989arogyasurvey0000kssp
1989deseeyavanit0000kssp
1989parishathums0000kssp
1989parishathums0000kssp_z0l1
1989randayiraman0000kssp
1989rogaprathiro0000kssp
198

In [93]:
for item_id in kssp:
    item = get_item(item_id['identifier'])
    # Direct indexing raises KeyError for items that have no subject
    cur_sub = item.item_metadata['metadata']['subject']
    print(cur_sub, type(cur_sub))
#     try:
#         cur_sub.pop(cur_sub.index('Kerala Archives'))
#     except AttributeError:
#         cur_sub = cur_sub.replace('Kerala Archives;', "")
#     print(cur_sub)
#     r = modify_metadata(item, metadata=dict(subject=cur_sub))
#     if r.status_code == 200:
#         print("Successfully updated the subject of {} as {}".format(item, cur_sub))
#     else:
#         print("Something went wrong while updating the subject of {} as {}".format(item, cur_sub))

Sasthragathi <class 'str'>
Sasthra Keralam Magazine <class 'str'>
Sasthra Keralam Magazine, KSSP Science Magazine <class 'str'>
Sasthra Keralam Magazine, Malayalam Science Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Sasthra Keralam Magazine, KSSP Science Magazine <class 'str'>
Sasthra Keralam Magazine, KSSP Science Magazine <class 'str'>
Sasthra Keralam Magazine, KSSP Science Magazine <class 'str'>
Sasthra Keralam Magazin; Malayalam Science Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine, Malayalam Children's Magazine <class 'str'>
Eureka Magazine <class 'str'>
KSSP 

KeyError: 'subject'
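
The `KeyError` comes from items that have no `subject` key at all. One way to avoid it is to chain `dict.get` calls so that a missing key yields a default instead. The sketch below runs on plain dictionaries shaped like `item.item_metadata`; `get_subject` is a hypothetical helper and the sample dictionaries are made up for illustration.

```python
def get_subject(item_metadata, default=""):
    """Look up metadata -> subject without raising when either key is absent."""
    return item_metadata.get('metadata', {}).get('subject', default)

# Made-up sample dictionaries, shaped like item.item_metadata
with_subject = {'metadata': {'identifier': 'example-item', 'subject': 'Sasthragathi'}}
without_subject = {'metadata': {'identifier': 'example-item'}}

print(repr(get_subject(with_subject)))
print(repr(get_subject(without_subject)))
```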

In [98]:
for item_id in kssp[107:]:
    item = get_item(item_id['identifier'])
    try:
        cur_sub = item.item_metadata['metadata']['subject']
    except KeyError as e:
        # No subject on this item; show the missing key instead
        cur_sub = str(e)
    print(cur_sub, type(cur_sub))

Malayalam Science Education Books <class 'str'>
Malayalam Science Education Books <class 'str'>
['KSSP leaflets', 'Kerala Archives'] <class 'list'>
['Parishath Geology Books, Rock', 'KSSP Books'] <class 'list'>
Astronomy, Cosmology <class 'str'>
['Bharat Gyanvigyan Samithi Kerala, Street Theatre in Kerala, Literacy, Gender', 'Kerala Sasthra Sahithya Parishad', 'Kerala Archives'] <class 'list'>
Kerala Coconut Industry; KSSP leaflets <class 'str'>
['Dunkal Draft, New Economic Policy, Agriculture', 'Kerala Swashraya Samithy'] <class 'list'>
Kala Jatha, Street Theatre <class 'str'>
Kala Jatha, Ecology <class 'str'>
Kerala Panchayathraj <class 'str'>
Education <class 'str'>
KSSP General Documents <class 'str'>
KSSP leaflets;KSSP Health Books;Kerala Health Books <class 'str'>
['Development, Decentralization', 'Kerala Archives'] <class 'list'>
['Bamboo', 'Kerala Environment'] <class 'list'>
Kala Jatha, Street Theatre <class 'str'>
Kala Jatha, Street Theatre <class 'str'>
news kssp-inaguration

In [182]:
for item_id in kssp:
    item = get_item(item_id['identifier'])
    cur_sub = False
    try:
        cur_sub = item.item_metadata['metadata']['subject']
    except KeyError as e:
        # Items without a subject keep cur_sub = False
        print(str(e))
    item_title = item.item_metadata['metadata']['title']
    # Columns: identifier, title, current subject, normalized subject
    print("{}\t{}\t{}\t{}".format(item_id['identifier'], item_title, cur_sub, normalize_subject(cur_sub)))

1966octsasthraga0000kssp	ശാസ്ത്രഗതി ലക്കം 1 - 1966 ഒക്ടോബർ - കേരള ശാസ്ത്രസാഹിത്യപരിഷത്ത്	Sasthragathi	Sasthragathi
1969decsasthrake0000kssp	ശാസ്ത്രകേരളം - സയൻസു മാസിക - 1969 ഡിസംബർ	Sasthra Keralam Magazine	Sasthra Keralam Magazine
1969novsasthrake0000kssp	ശാസ്ത്രകേരളം - സയൻസു മാസിക - 1969 നവം‌ബർ	Sasthra Keralam Magazine, KSSP Science Magazine	Sasthra Keralam Magazine; KSSP Science Magazine
1969sathrakerala0000kssp	ശാസ്ത്രകേരളം - സയൻസു മാസിക - 1969 ആഗസ്റ്റ്	Sasthra Keralam Magazine, Malayalam Science Magazine	Sasthra Keralam Magazine; Malayalam Science Magazine
1970augeureka0000kssp	യുറീക്ക - കുട്ടികളുടെ ശാസ്ത്രമാസിക - 1970 ആഗസ്റ്റ് - വാല്യം 1 ലക്കം 3	Eureka Magazine, Malayalam Children's Magazine	Eureka Magazine; Malayalam Children's Magazine
1970deceureka0000kssp	യുറീക്ക - കുട്ടികളുടെ ശാസ്ത്രമാസിക - 1970 ഡിസമ്പർ - വാല്യം 1 ലക്കം 7	Eureka Magazine, Malayalam Children's Magazine	Eureka Magazine; Malayalam Children's Magazine
1970febsasthrake0000kssp	ശാസ്ത്രകേരളം - സയൻസു മാസിക - 1970 ഫെബ

1993sthreekalude0000para	1993 - സ്ത്രീകളുടെ ആരോഗ്യപ്രശ്നങ്ങൾ - ഡോ: സി.എൻ. പരമേശ്വരൻ	KSSP leaflets; Women's Health; KSSP Books about Health; KSSP books about women	KSSP leaflets; Women's Health; KSSP Books about Health; KSSP books about women
1993vanithasasth0000kssp	1993 - വനിതാ ശാസ്ത്രസംഗമം - കൈപുസ്തകം	KSSP leaflets;KSSP Gender Books;Gender Bias in Kerala	KSSP leaflets; KSSP Gender Books; Gender Bias in Kerala
1994elippaniyump0000kssp	1994 - എലിപ്പനിയും പ്ലേഗും	KSSP leaflets;KSSP Health Books;Leptospirosis;Plague	KSSP leaflets; KSSP Health Books; Leptospirosis; Plague
1994urjarekha0000unse	1994 - ഊർജ്ജരേഖ	KSSP leaflets;Power Problem of Kerala, KSSP Books about Power	KSSP leaflets; Power Problem of Kerala; KSSP Books about Power
1995arogyarangat0000kssp	1995 - ആരോഗ്യരംഗത്തെ പ്രശ്നങ്ങൾ	KSSP leaflets;KSSP Health Books	KSSP leaflets; KSSP Health Books
1995bhashasamska0000kssp	1995 - ജനകീയ വിദ്യാഭ്യാസ നിഷേധം കേരളത്തിൽ	KSSP leaflets;Kerala Public Education;KSSP Education Books	KSSP leaflets

indiayerakshikkan1992kssp	ഇന്ത്യയെ രക്ഷിക്കാൻ – കേരള സ്വാശ്രയ സമിതി	['Globalization, Dunkal Draft, Indian Economy', 'Kerala Swashraya Samithy']	Globalization; Dunkal Draft; Indian Economy; Kerala Swashraya Samithy
irupathonnamnutandu1986kssp	ഇരുപത്തൊന്നാം നൂറ്റാണ്ടിലേക്ക് - ഗാനങ്ങളും സംഗീതശില്പങ്ങളും നാടകങ്ങളും - ശാസ്ത്രസാംസ്കാരികജാഥ 1986 - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Sasthra Kala Jatha, Street Theatre	Sasthra Kala Jatha; Street Theatre
isrocharacase1995kssp	ഇന്ത്യൻ ബഹിരാകാശ ഗവേഷണം ചാരവൃത്തിക്കു മുമ്പും പിമ്പും - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	['ISRO; KSSP leaflets', 'Kerala Archives']	ISRO; KSSP leaflets; Kerala Archives
janakeeyasasthraprasthanam1992kssp	ജനകീയ ശാസ്ത്രപ്രസ്ഥാനം വിശാലമാക്കുന്ന പ്രവർത്തന മേഖലകൾ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	People's Science Movement in Kerala	People's Science Movement in Kerala
janakeeyasuthranavivadam2003kssp	ജനകീയാസൂത്രണ വിവാദം - ശാസ്ത്രസാഹിത്യ പരിഷത്തിന് പറയാനുള്ളത്	People's Planning	People's Planning
janakiyasuthranam2002kssp	ജനകീയാസൂത്രണം - അനുഭവങ്

njansthree1990kssp	ഞാൻ സ്ത്രീ - വനിതാകലാജാഥാ സ്ക്രിപ്റ്റുകൾ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Kala Jatha, Street Theatre	Kala Jatha; Street Theatre
onnaampadam1995kssp	ഒന്നാം പാഠം - വിദ്യാഭ്യാസ ജാഥാ സ്ക്രിപ്റ്റുകൾ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Kala Jatha, Street Theatre, Kerala Education	Kala Jatha; Street Theatre; Kerala Education
orudheeraswapnam1994kssp	ഒരു ധീര സ്വപ്നം -  ശാസ്ത്രകലാജാഥാ സ്ക്രിപ്റ്റുകൾ - ശാസ്ത്ര സാംസ്കാരിക ജാഥ 1994 - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Sasthra Kala Jatha, Street Theatre	Sasthra Kala Jatha; Street Theatre
orukunjujanikkunnu0000kssp	ഒരു കുഞ്ഞ് ജനിക്കുന്നു - ഗ്രാമശാസ്ത്രജാഥ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Reproduction, Sex Education	Reproduction; Sex Education
orumaramoruvaram1987kssp	ഒരു മരം ഒരു വരം - പാവനാടകങ്ങൾ - ബാലോത്സവജാഥ 1987 - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Balalsava Jatha, Street Theatre	Balalsava Jatha; Street Theatre
othukali1984kssp	ഒത്തുകളി - ദേശീയ കലാരൂപങ്ങൾ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Kala Jatha, Street Theatre	Kala Jatha; Street Theatre
ottanthul

souraaduppu0000pgpa	1981 - സൗര അടുപ്പ് - പി.ജി. പത്മനാഭൻ	KSSP leaflets, KSSP energy Books	KSSP leaflets; KSSP energy Books
spartacs1981kssp	സ്‌പാർട്ടക്കസ് (കഥാപ്രസംഗം) - ശാസ്ത്രകലാജാഥ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	['Sasthra Kala Jatha,', 'Kerala Archives']	Sasthra Kala Jatha; ; Kerala Archives
stateconventiono0000kssp	State Convention of Modern Medical Doctors - Draft Manifesto for Discussion	KSSP leaflets;KSSP Health Books;Modern Medical Doctors	KSSP leaflets; KSSP Health Books; Modern Medical Doctors
sthreekalumpinth0000sama	സ്ത്രീകളും പിന്തുടർച്ചാവകാശവും - സമതാവേദി	KSSP leaflets; Samatha leaflets, Gender Bias in Kerala	KSSP leaflets; Samatha leaflets; Gender Bias in Kerala
sthreekalumsaksh0000kssp	സ്ത്രീകളും സാക്ഷരതയും (ചർച്ചയ്ക്കുള്ള കുറിപ്പ്)	KSSP leaflets;KSSP Gender Books;Gender Bias in Kerala	KSSP leaflets; KSSP Gender Books; Gender Bias in Kerala
suryanteathmakadha1979kssp	സൂര്യന്റെ ആത്മകഥ - വി.കെ. ദാമോദരൻ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്	Sun, Energy	Sun; Energy
susthiravikasan

In [183]:
len(kssp)

236
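
Many rows in the table above already have a subject that matches its normalized form, so only some of the 236 items need a metadata update. As a rough sketch, the rows can be filtered before anything is sent to the server. `rows_needing_update` is a hypothetical helper; it assumes each row is an `(identifier, current, normalized)` tuple like the columns printed above, and it skips items whose normalized subject is empty.

```python
def rows_needing_update(rows):
    """Return (identifier, normalized) pairs where the stored subject differs."""
    pending = []
    for identifier, current, normalized in rows:
        # Skip items with nothing to write and items that are already normalized
        if normalized and current != normalized:
            pending.append((identifier, normalized))
    return pending

# Two rows taken from the output above
rows = [
    ('1966octsasthraga0000kssp', 'Sasthragathi', 'Sasthragathi'),
    ('suryanteathmakadha1979kssp', 'Sun, Energy', 'Sun; Energy'),
]
print(rows_needing_update(rows))
```

Reviewing this shorter list first is a cheap dry run before calling `modify_metadata` on each identifier.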

## Normalizer
This function normalizes an item's current subject into the format expected by the archive.org website. The existing items contain several non-standard forms.

Here are a few examples:
```
'Sasthra Kala Jatha, Street Theatre' # 'str', but comma separated
['Kerala Swasraya Samithi, Indian Agriculture, Globalization', 'Kerala Swashraya Samithy'] # 'list', comma separated
'KSSP leaflets;KSSP Health Books;Modern Medical Doctors' # 'str', but no space between entries
```

So before updating the items with new entries, we need to make sure that the subject lines follow the standard format expected by the `archive.org` website.

In [141]:
subject_text = 'KSSP leaflets;KSSP Health Books;Modern Medical Doctors'
normalize_subject(subject_text)

KSSP leaflets; KSSP Health Books; Modern Medical Doctors


In [179]:
import re

def normalize_subject(subject_text):
    """Return the subject as a single '; '-separated string."""
    if subject_text:
        if isinstance(subject_text, list):
            # Normalize each list entry, then join the entries themselves
            subject_text = ["; ".join(y.strip() for y in re.split(r'[;,]', x)) for x in subject_text]
            return "; ".join(subject_text)
        else:
            subject_text = re.split(r'[;,]', subject_text)
            subject_text = [x.strip() for x in subject_text]
            return "; ".join(subject_text)
    else:
        return ""
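
One of the outputs recorded earlier, `Sasthra Kala Jatha; ; Kerala Archives`, suggests that a trailing comma leaves an empty entry behind. A stricter variant could drop empty entries and repeated terms as well. The following is only a sketch of that idea, not a replacement the notebook actually ran; `normalize_subject_strict` is a hypothetical name. Like the original, it splits on every comma, so a subject that legitimately contains a comma would be broken apart.

```python
import re

def normalize_subject_strict(subject):
    """Split on ';' or ',', strip whitespace, drop empties and repeats, join with '; '."""
    if not subject:
        return ""
    parts = [subject] if isinstance(subject, str) else subject
    seen = []
    for part in parts:
        for entry in re.split(r'[;,]', part):
            entry = entry.strip()
            # Keep the first occurrence of each non-empty term, in order
            if entry and entry not in seen:
                seen.append(entry)
    return "; ".join(seen)

print(normalize_subject_strict(['Sasthra Kala Jatha,', 'Kerala Archives']))
print(normalize_subject_strict('KSSP leaflets;KSSP Health Books;KSSP leaflets'))
```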

In [47]:
shiju_list = []
for i in search_items('uploader:shijualexonline@gmail.com'):
    shiju_list.append(i["identifier"])


In [48]:
shiju_list

['1606_Synodo_Diocesano_Da_Igreia_E_Bispado',
 '1732_Grammatica_Grandonica_Ernst_Hanxleden',
 '1745_Historia_Ecclesiae_Malabarica_Cum_Diamper',
 '1772AlphabetumGrandonicoMalabaricum',
 '1787SaggioPraticoDelleLingue',
 '1790_Siddharupam_Paulinus',
 '1791_AlphabetaIndica_Paulinus',
 '1791_Centum_Adagia_Malabarica_Paulinus',
 '1797_Systema_Brahmanicum_Liturgicum_Mythologi_Paulinus',
 '1799_Grammar_Of_The_MalabarLanguage',
 '1811RambanBibleMalayalam',
 '1813CMSMissionaryRegister',
 '1814CMSMissionaryRegister',
 '1815CMSMissionaryRegister',
 '1815DissertationTheSecondOnTheMalayalmaLanguage',
 '1816CMSMissionaryRegister',
 '1817CMSMissionaryRegister',
 '1818CMSMissionaryRegister',
 '1819CMSMissionaryRegister',
 '1820CMSMissionaryRegister',
 '1821CMSMissionaryRegister',
 '1822CMSMissionaryRegister',
 '1823CMSMissionaryRegister',
 '1824CMSMissionaryRegister',
 '1824_Cheru_Paithangal',
 '1825-Mathayiyude-evangeliyon',
 '1825CMSMissionaryRegister',
 '1826CMSMissionaryRegister',
 '1827CMSMissiona

In [175]:
len(shiju_list)

1094

In [None]:
for item in shiju_list[:3]:
    pass  # TODO: the loop body has not been written yet

In [67]:
"https://archive.org/details/kssp-archives?and[]=subject%3A%22Kerala+Archives%22"

'https://archive.org/details/kssp-archives?and[]=subject%3A%22Kerala+Archives%22'

In [155]:
subject_list = '''Sasthragathi\nSasthra Keralam Magazine\nSasthra Keralam Magazine, KSSP Science Magazine\nSasthra Keralam Magazine, Malayalam Science Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine, Malayalam Children's Magazine\nSasthra Keralam Magazine, KSSP Science Magazine\nSasthra Keralam Magazine, KSSP Science Magazine\nSasthra Keralam Magazine, KSSP Science Magazine\nSasthra Keralam Magazin; Malayalam Science Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;KSSP Health Books, Measles\nKSSP leaflets; Kerala Health\nKSSP leaflets; First aid, KSSP Health Books\nKSSP leaflets; KSSP Health Books\nKSSP leaflets;KSSP Books about Power\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;Kerala Energy Problem;KSSP Energy Books\nKSSP leaflets;KSSP Energy Books;Power Problex in Kerala\nKSSP leaflets;Kerala Energy;KSSP Ecology Books\nKSSP leaflets; KSSP Development Books; Kerala Development\nKSSP leaflets; KSSP Development Books; Kerala Development\nKSSP leaflets;KSSP Health Books;Health Survey\nKSSP leaflets;Kerala Energy Problem;KSSP Energy Books\nKSSP leaflets;Kerala Power Problem\nKSSP leaflets;KSSP Health Books;Kerala Health Problem\nKSSP leaflets;KSSP Health Books;Health Survey\nKSSP leaflets;KSSP Gender Books\nKSSP leaflets about Gender\nKSSP leaflets;KSSP Gender Books\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;KSSP Health Books;Health Survey\nKSSP leaflets; KSSP Books about Gender Bias\nKSSP leaflets;KSSP Gender Books\nKSSP leaflets;KSSP Gender Books\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;Kerala Energy Problem;KSSP Energy Books\nKSSP leaflets;KSSP Gender Books\nKSSP 
leaflets;KSSP Gender Books;Civil Code and Gender Bias\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;KSSP Health Books;Medicine Price Hike\nKSSP leaflets;KSSP Gender Books;Gender Bias\nKSSP leaflets; Women's Health; KSSP Books about Health; KSSP books about women\nKSSP leaflets;KSSP Gender Books;Gender Bias in Kerala\nKSSP leaflets;KSSP Health Books;Leptospirosis;Plague\nKSSP leaflets;Power Problem of Kerala, KSSP Books about Power\nKSSP leaflets;KSSP Health Books\nKSSP leaflets;Kerala Public Education;KSSP Education Books\nKSSP leaflets;Kerala Public Education;KSSP Education Books; Vidyabhyasa Jadha-95\nKSSP leaflets;Kerala Public Education;KSSP Education Books;Vidyabhyasa Jadha-95\nKSSP leaflets;Kerala Public Education, Vidyabhyasa Jadha-95\nKSSP leaflets;Kerala Public Education;KSSP Education Books;KSSP and Education Debates\nKSSP leaflets; Kerala Education; KSSP Books about Education; Vidyabhyasa Jadha-95\n['Kerala Electricity', 'KSSP leaflets', 'Kerala Energy']\n['Planning, evelopment', 'Kerala Archives']\n['Decentralization, Planning, Democracy', 'Kerala Archives']\nKala Jatha, Street Theatre\n['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']\nErnakulam District Total Literarcy Programme\n['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']\nKilikkkoottam Jadha, Street Theatre\nStreet Theatre\nHealth Education, Asthma\nKSSP leaflets;KSSP Health Books\nKala Jatha, Street Theatre\n['Economics, Badjet', 'Kerala Swashraya Samithy']\nBalavedi, Science History\nSasthra Kala Jatha, Street Theatre\nBalalsava Jatha, Street Theatre\nBalolsava Songs\nSasthra Kala Jatha, Street Theatre\nkssp-science books, P R Madhavappanikkar, Malayalam Physics Books\nPainting, Art, History\nScience Education, Biography, C V Raman\n['Dunkal Draft', 'Kerala Archives']\n['New Economic Policy of India, Kerala Swasraya Samithi', 'Kerala Sasthra Sahithya Parishad']\n['kssp-science books', 'P R Madhavappanikkar', 'Malayalam 
Physics Books']\n['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']\nScience Education\nEureka Magazine, Malayalam Children's Magazine\nEureka Magazine\nKerala Development, Express Highway, Jalanidhi\n['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']\n['Ernakulam District Total Literarcy Programme; Mathematics Hand Book', 'Kerala Literacy']\n['Globalization, Gatt Agreement', 'Kerala Swashraya Samithy']\n['Sasthra Kala Jatha', 'Kerala Archives']\nKerala Economy, Rural Development\nAstronomy, Halley's comet\n['Globalization, Dunkal Draft, Indian Economy', 'Kerala Swashraya Samithy']\nSasthra Kala Jatha, Street Theatre\n['ISRO; KSSP leaflets', 'Kerala Archives']\nPeople's Science Movement in Kerala\nPeople's Planning\nPeople's Planning; Participative Democracy, Decentralization\n['Education, Literacy', 'Kerala Literacy']\nBhopal Gas Tragedy\nScience Education, Adulteration\nMalayalam Science Education Books'''.split("\n")

In [172]:
subject_list = [x.replace('"', '') for x in subject_list]

In [173]:
sl = []
for subject_text in subject_list:
    sl.append(normalize_subject(subject_text))

In [167]:
sl = [normalize_subject(x) for x in sl]
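
Once the subjects are normalized, counting the individual terms can surface spellings that occur only once, which are worth a manual look before they are written back. This is a minimal sketch using `collections.Counter`; `count_terms` is a hypothetical helper, and the sample holds three normalized subjects in the `'; '`-separated form produced by the normalizer.

```python
from collections import Counter

def count_terms(normalized_subjects):
    """Count how often each '; '-separated term appears across all subjects."""
    counts = Counter()
    for subject in normalized_subjects:
        counts.update(term for term in subject.split('; ') if term)
    return counts

sample = [
    'Sasthra Keralam Magazine; KSSP Science Magazine',
    'Sasthra Keralam Magazin; Malayalam Science Magazine',
    'Sasthra Keralam Magazine; Malayalam Science Magazine',
]
counts = count_terms(sample)
# Terms seen exactly once are candidates for review, not necessarily errors
rare = sorted(term for term, n in counts.items() if n == 1)
print(rare)
```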

In [174]:
sl

['Sasthragathi',
 'Sasthra Keralam Magazine',
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazine; Malayalam Science Magazine',
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazin; Malayalam Science Magazine',
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 'Eureka Magazine',
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 'Eureka Magazine',
 'KSSP leaflets; KSSP Health Books',
 'KSSP leaflets; KSSP Health Books; Measles',
 'KSSP leaflets; Kerala Health',
 'KSSP leaflets; First aid; KSSP Health Books',
 'KSSP leaflets; KSSP