# Manipulating the items on the archive.org website
`archive.org` is arguably the largest collection of community-contributed `items`, such as books, movies, audio, images and even code. The snippets in this notebook can be used to automate your interactions with the `archive.org` servers.

To make these interactions smoother, the Internet Archive has released a Python library named `internetarchive`. Its methods let you interact seamlessly with their servers.

Keep in mind that you need to configure your system before you can start interacting with the servers, especially if you plan to make changes to items you've submitted to the `archive.org` website.

## Configuring your system
The first step is to configure your system to use the library. The library comes with a small command-line tool, `ia`, that helps you configure and secure your account.

TODO: Write the steps to use `ia`.

In [69]:
from internetarchive import upload, get_item, modify_metadata, search_items

## Reading the metadata of an existing item
An item on archive.org represents a single entry in the server's catalogue. Each item has two parts: its actual file(s) and its description. The description of an item is known as its metadata. You can access all of an item's metadata through the `Item` object returned by `get_item`.

In [105]:
# To get the details of an item, you'd need the identifier
item = get_item('vilpattukaltest1980kssp')
item.item_metadata['metadata']

{'identifier': 'vilpattukaltest1980kssp',
 'mediatype': 'texts',
 'collection': ['kssp-archives', 'kerala-archives', 'additional_collections'],
 'creator': 'Kerala Sasthra Sahithya Parishad',
 'date': '1980',
 'description': 'വിൽപ്പാട്ടുകൾ',
 'language': 'mal',
 'licenseurl': 'https://creativecommons.org/licenses/by-sa/4.0/',
 'scanner': 'Internet Archive HTML5 Uploader 1.6.4',
 'title': 'വിൽപ്പാട്ടുകൾ - ശാസ്ത്രകലാജാഥ - കേരള ശാസ്ത്രസാഹിത്യ പരിഷത്ത്',
 'uploader': 'shijualexonline@gmail.com',
 'publicdate': '2020-03-16 16:06:06',
 'addeddate': '2020-03-16 16:06:06',
 'identifier-access': 'http://archive.org/details/vilpattukaltest1980kssp',
 'identifier-ark': 'ark:/13960/t0wq8k30h',
 'imagecount': '20',
 'ppi': '600',
 'ocr': 'language not currently OCRable',
 'year': '1980',
 'subject': 'Kilipattu;  1982'}
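The `metadata` dictionary above is a plain Python `dict`, so fields can be read with ordinary key access. Not every item carries every field, though (an item may have no `subject` at all). Some fields, such as `collection`, come back as a list while others are strings. Below is a minimal sketch of a reader that tolerates both cases. `get_metadata_field` is a hypothetical helper name, not part of the `internetarchive` library, and the sample dict is a subset of the output shown above.

```python
def get_metadata_field(metadata, field, default=""):
    """Return `field` from an item's metadata dict as a string.

    Missing fields yield `default`; list values (e.g. `collection`)
    are joined with '; ' so callers always get a single string.
    """
    value = metadata.get(field, default)
    if isinstance(value, list):
        return "; ".join(value)
    return value


# A subset of the metadata printed above
sample_md = {
    'identifier': 'vilpattukaltest1980kssp',
    'collection': ['kssp-archives', 'kerala-archives', 'additional_collections'],
    'subject': 'Kilipattu;  1982',
}
print(get_metadata_field(sample_md, 'collection'))
print(get_metadata_field(sample_md, 'publisher', default='(none)'))
```

With a live item, the equivalent call would be `get_metadata_field(item.item_metadata['metadata'], 'subject')`.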

## Upload an item
It's fairly easy to create an item on archive.org and upload files to it. Remember that we need to upload both the actual files and the metadata when we create a new item.

> If the item already has a file with the same filename, the existing file within the item will be overwritten.


In [114]:
# Metadata is submitted to archive.org as a dictionary. Remember that the subject element must be a semicolon-separated string
md = dict(title='Title of the item', mediatype='movies', subject='test; magazines')


In [115]:
r = upload('ia-test-upload', files=['test.txt'], metadata=md)

In [116]:
r[0].status_code

200

**Success**

This created the item on the archive.org site:
[Title of the item](https://archive.org/details/ia-test-upload)
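`upload` returns a list of responses, one per uploaded file, which is why the cell above indexes `r[0]`. When uploading several files, it is worth checking all of them rather than only the first. Below is a minimal sketch that assumes each response exposes `status_code` as `requests.Response` does. `failed_uploads` is a hypothetical helper, and the stand-in responses are built with `SimpleNamespace` so the cell runs without network access.

```python
from types import SimpleNamespace


def failed_uploads(responses):
    """Return the responses whose HTTP status is not 200."""
    return [r for r in responses if r.status_code != 200]


# Stand-in responses for illustration; with the real library you would pass
# the list returned by upload('ia-test-upload', files=[...], metadata=md)
responses = [SimpleNamespace(status_code=200), SimpleNamespace(status_code=503)]
bad = failed_uploads(responses)
print(len(bad), bad[0].status_code)
```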

## Modify the metadata

In [184]:
# `kssp` holds the search results from the `search_items` cell further below
for book in kssp:
    print(book['identifier'])

1966octsasthraga0000kssp
1969decsasthrake0000kssp
1969novsasthrake0000kssp
1969sathrakerala0000kssp
1970augeureka0000kssp
1970deceureka0000kssp
1970febsasthrake0000kssp
1970jansasthrake0000kssp
1970marsasthrake0000kssp
1970maysasthrake0000kssp
1970noveureka0000kssp
1970sepeureka0000kssp
1970sepeureka0000kssp_v4f2
1971apreureka0000kssp
1971febeureka0000kssp
1971janeureka0000kssp
1971mareureka0000kssp
1971mayeureka0000kssp
1983arogyarekha0000kssp
1983mannan0000rajm
1983nammudearogy0000ekba
1983pradhamasusr0000jaya
1984marunnuvyvas0000ekba
1984urjamchandra0000kssp
1985arationalstu0000jami
1985raionalityst0000shis
1986vaidyuthipra0000kssp
1986vaithyuthipr0000kssp
1986vanamvellam0000kssp
1986vyavasayaval0000kssp
1986vyavasayaval0000unse
1987janakeeyaaro0000kssp
1988vaidyuthikan0000mppa
1988vaidyuthinir0000unse
1989aksharathiln0000kssp
1989arogyasurvey0000kssp
1989deseeyavanit0000kssp
1989parishathums0000kssp
1989parishathums0000kssp_z0l1
1989randayiraman0000kssp
1989rogaprathiro0000kssp
198

## Normalizer
This function normalizes an item's current subject into the format expected by the archive.org website. Currently, items use several non-standard forms.

Here are a few examples:
```
'Sasthra Kala Jatha, Street Theatre' # 'str', but comma-separated
['Kerala Swasraya Samithi, Indian Agriculture, Globalization', 'Kerala Swashraya Samithy'] # 'list', comma separated
'KSSP leaflets;KSSP Health Books;Modern Medical Doctors' # 'str', but no space between entries
```

So before adding new entries, we need to make sure the existing subject lines follow the standard format expected by the `archive.org` website.

In [141]:
subject_text = 'KSSP leaflets;KSSP Health Books;Modern Medical Doctors'
normalize_subject(subject_text)

KSSP leaflets; KSSP Health Books; Modern Medical Doctors


In [179]:
import re

def normalize_subject(subject_text):
    """Return subjects as a single '; '-separated string.

    Accepts a string or a list of strings; commas and semicolons are both
    treated as separators, and surrounding whitespace is stripped.
    """
    if not subject_text:
        return ""
    if isinstance(subject_text, list):
        parts = [p for x in subject_text for p in re.split(r'[;,]', x)]
    else:
        parts = re.split(r'[;,]', subject_text)
    # Drop empty entries left by trailing or doubled separators
    return "; ".join(p.strip() for p in parts if p.strip())

### Fetch the metadata of an entire set of books

In [194]:
topic_name = 'kssp-archives'
kssp = list(search_items(topic_name))  # fetch all the items within a particular topic
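`search_items` takes an archive.org search query string. A bare term such as `'kssp-archives'` may match more broadly than a field-qualified query. The later cells in this notebook restrict results with clauses like `collection:(...)` and `uploader:(...)` joined by `AND`. Below is a small sketch that assembles such a query from keyword arguments. `build_query` is a hypothetical helper, and the only field names it is shown with are the ones this notebook already uses.

```python
def build_query(**fields):
    """Join field:(value) clauses with AND, e.g. collection:(x) AND uploader:(y)."""
    return " AND ".join("{}:({})".format(name, value) for name, value in fields.items())


query = build_query(collection="digitallibraryindia", uploader="shijualexonline@gmail.com")
print(query)
# With the library: items = [r["identifier"] for r in search_items(query)]
```

Keyword arguments keep their order in Python 3.7+, so the clauses appear in the order they are passed.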

### Kerala Missionary Documents

In [240]:
# Append "Kerala Missionary Documents" to each item's subjects
# (the first two items in the list are skipped in this run)
for item_id in kerala_missionary_documents[2:]:
    item = get_item(item_id)
    # Items without a subject field start from an empty subject
    cur_sub = normalize_subject(item.item_metadata['metadata'].get('subject', ''))
    if cur_sub:
        new_sub = cur_sub + "; Kerala Missionary Documents"
    else:
        new_sub = "Kerala Missionary Documents"
    print("{}\t{}".format(item_id, new_sub))
    r = modify_metadata(item_id, metadata=dict(subject=new_sub))
    if r.status_code != 200:
        print("Update failed for {}: HTTP {}".format(item_id, r.status_code))


1815CMSMissionaryRegister	Church Missionary Society; Kerala Missionary Documents
1816CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1817CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1818CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1819CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1820CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1821CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1822CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1823CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1824CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1825CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1826CMSMissionaryRegister	CMS Missionary Register; Kerala Missionary Documents
1827CMSMissionaryRegister	CMS Missionary Register;

1877TheChurchMissionaryIntelligencer	The Church Missionary Intelligencer; Kerala Missionary Documents
1878TheChurchMissionaryGleaner	The Church Missionary Gleaner; Kerala Missionary Documents
1878TheChurchMissionaryIntelligencer	The Church Missionary Intelligencer; Kerala Missionary Documents
1879TheChurchMissionaryGleaner	The Church Missionary Gleaner; Kerala Missionary Documents
1879TheChurchMissionaryIntelligencer	The Church Missionary Intelligencer; Kerala Missionary Documents
1880TheChurchMissionaryGleaner	The Church Missionary Gleaner; Kerala Missionary Documents
1880TheChurchMissionaryIntelligencer	The Church Missionary Intelligencer; Kerala Missionary Documents
1881TheChurchMissionaryGleaner	The Church Missionary Gleaner; Kerala Missionary Documents
1881TheChurchMissionaryIntelligencer	The Church Missionary Intelligencer; Kerala Missionary Documents
1882TheChurchMissionaryGleaner	The Church Missionary Gleaner; Kerala Missionary Documents
1882TheChurchMissionaryIntelligencer	The
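The loop above appends the new topic without checking whether the item already has it. Re-running it would therefore add `Kerala Missionary Documents` a second time. Below is a minimal sketch of an idempotent variant. It assumes subjects are stored as a `'; '`-separated string, as elsewhere in this notebook. `add_subject` is a hypothetical helper that re-implements the separator rules of `normalize_subject` so the cell is self-contained.

```python
import re


def add_subject(current, topic):
    """Append `topic` to a subject string unless it is already present.

    `current` may be a string, a list of strings, or a falsy value;
    commas and semicolons are both treated as separators.
    """
    if isinstance(current, list):
        current = ";".join(current)
    parts = [p.strip() for p in re.split(r'[;,]', current or "") if p.strip()]
    if topic not in parts:
        parts.append(topic)
    return "; ".join(parts)


print(add_subject("CMS Missionary Register", "Kerala Missionary Documents"))
print(add_subject("CMS Missionary Register; Kerala Missionary Documents",
                  "Kerala Missionary Documents"))
print(add_subject(False, "Kerala Missionary Documents"))
```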

### Malankara Edavaka Pathrika

In [249]:
my_collection = []
for i in search_items('collection:(MalankaraEdavakaPathrika)'):
    my_collection.append(i["identifier"])

print("{} items found in this collection".format(len(my_collection)))

#### Add Malankara Edavaka Pathrika as a topic

In [251]:
total_items = len(my_collection)
for i, item_id in enumerate(my_collection, start=1):
    # Note: this replaces any existing subjects with the single topic below
    new_sub = "Malankara Edavaka Pathrika"
    print("{}/{}\t{}\t{}".format(i, total_items, item_id, new_sub))
    r = modify_metadata(item_id, metadata=dict(subject=new_sub))

1/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_10	Malankara Edavaka Pathrika
2/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_11	Malankara Edavaka Pathrika
3/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_12	Malankara Edavaka Pathrika
4/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_2	Malankara Edavaka Pathrika
5/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_3	Malankara Edavaka Pathrika
6/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_4	Malankara Edavaka Pathrika
7/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_5	Malankara Edavaka Pathrika
8/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_6	Malankara Edavaka Pathrika
9/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_7	Malankara Edavaka Pathrika
10/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_8	Malankara Edavaka Pathrika
11/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_9	Malankara Edavaka Pathrika
12/162	1893MalankaraEdavakaPathrikaVolume02Issue01	Malankara Edavaka Pathrika
13/162	1893_Mal

98/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_05	Malankara Edavaka Pathrika
99/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_06	Malankara Edavaka Pathrika
100/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_07	Malankara Edavaka Pathrika
101/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_08	Malankara Edavaka Pathrika
102/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_09	Malankara Edavaka Pathrika
103/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_10	Malankara Edavaka Pathrika
104/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_11	Malankara Edavaka Pathrika
105/162	1904_Malankara_Edavaka_Pathrika_Volume_13_Issue_12	Malankara Edavaka Pathrika
106/162	1905_Malankara_Edavaka_Pathrika_Volume_14_Issue_01	Malankara Edavaka Pathrika
107/162	1905_Malankara_Edavaka_Pathrika_Volume_14_Issue_02	Malankara Edavaka Pathrika
108/162	1905_Malankara_Edavaka_Pathrika_Volume_14_Issue_03	Malankara Edavaka Pathrika
109/162	1905_Malankara_Edavaka_Pathrika_Volume_14_Issue_
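The loop above calls `modify_metadata` as soon as each item is processed. A mistake in the new value is therefore written to every item before it can be noticed. One way to guard against that is a dry-run pass that only prints the planned changes. Below is a minimal sketch. `batch_update` and its `dry_run` flag are hypothetical. The update function is passed in so the sketch runs without network access; with the library you would pass `modify_metadata`.

```python
def batch_update(identifiers, new_metadata, update_fn=None, dry_run=True):
    """Apply `new_metadata` to every identifier, or only report it when dry_run is True.

    `update_fn` is expected to behave like internetarchive.modify_metadata:
    update_fn(identifier, metadata=...) returning an object with `status_code`.
    Returns the identifiers whose update did not return HTTP 200.
    """
    failed = []
    total = len(identifiers)
    for i, identifier in enumerate(identifiers, start=1):
        print("{}/{}\t{}\t{}".format(i, total, identifier, new_metadata))
        if dry_run:
            continue  # report only, no request is sent
        r = update_fn(identifier, metadata=new_metadata)
        if r.status_code != 200:
            failed.append(identifier)
    return failed


ids = ["1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_10",
       "1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_11"]
batch_update(ids, dict(subject="Malankara Edavaka Pathrika"))  # prints the plan only
```

Once the printed plan looks right, the same call with `update_fn=modify_metadata, dry_run=False` performs the writes and returns any identifiers that failed.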

#### Update collection info
Remove the existing collections and add `kerala-archives` as the new collection.

In [255]:
total_items = len(my_collection)
# Start from the second item; the first one is skipped in this run
for i, item_id in enumerate(my_collection[1:], start=2):
    item = get_item(item_id)
    # Current collections, normalized to a '; '-separated string for display
    cur_col = normalize_subject(item.item_metadata['metadata'].get('collection', ''))
    print("{}/{}\t{}\t{}".format(i, total_items, item_id, cur_col))
    # Replace all existing collections with `kerala-archives`
    r = modify_metadata(item_id, metadata=dict(collection=["kerala-archives"]))

2/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_11	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
3/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_12	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
4/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_2	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
5/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_3	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
6/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_4	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
7/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_5	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
8/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_6	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
9/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_7	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
10/162	1892_Malankara_Edavaka_Pathrika_Volume_1_Issue_8	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
11/162	1892_Malankara_Eda

77/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue05	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
78/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_02	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
79/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_03	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
80/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_04	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
81/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_06	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
82/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_08	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
83/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_09	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
84/162	1902_Malankara_Edavaka_Pathrika_Volume_11_Issue_10	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
85/162	1903_Malankara_Edavaka_Pathrika_Volume_12_Issue_04	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
86

151/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_05	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
152/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_06	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
153/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_07	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
154/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_08	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
155/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_09	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
156/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_10	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
157/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_11	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
158/162	1909_Malankara_Edavaka_Pathrika_Volume_18_Issue_12	MalankaraEdavakaPathrika; MalayalamHeritage; JaiGyan
159/162	1910_Malankara_Edavaka_Pathrika_Volume_19_Issue_01	MalankaraEdavakaPathrika; MalayalamHeritage; 
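The loop above passes a one-element `collection` list, which replaces the item's whole collection list. The printed output is effectively a record of which collections each item lost. Below is a minimal sketch of a helper that makes that difference explicit before any write. `collection_change` is hypothetical. Items in several collections return `collection` as a list (as in the metadata at the top of this notebook). An item in a single collection may return a plain string, so both forms are accepted.

```python
def collection_change(current, target):
    """Return (removed, added) collections when replacing `current` with `target`."""
    if isinstance(current, str):
        current = [current]
    removed = [c for c in current if c not in target]
    added = [c for c in target if c not in current]
    return removed, added


print(collection_change(["MalankaraEdavakaPathrika", "MalayalamHeritage", "JaiGyan"],
                        ["kerala-archives"]))
```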

In [258]:
!git commit -m "add collection manipulator"

[master 5d215c3] add collection manipulator
 1 file changed, 449 insertions(+), 2 deletions(-)


## Upload the updated topic list

In [203]:
with open("kssp.tab", mode='r', encoding='utf-8') as f:
    fc = f.read()

In [211]:
for line in fc.split("\n"):
    fields = line.split("\t")
    new_item_id = fields[0]
    new_sub = fields[-1].split("; ")
    if 'Kerala Archives' in new_sub:
        # Drop the 'Kerala Archives' subject and upload the remaining ones
        new_sub.remove('Kerala Archives')
        r = modify_metadata(new_item_id, metadata=dict(subject=new_sub))
        print("; ".join(new_sub))
    else:
        # Nothing to remove: show the subjects unchanged
        print(new_sub)

['Sasthragathi']
['Sasthra Keralam Magazine']
['Sasthra Keralam Magazine', 'KSSP Science Magazine']
['Sasthra Keralam Magazine', 'Malayalam Science Magazine']
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine', "Malayalam Children's Magazine"]
['Sasthra Keralam Magazine', 'KSSP Science Magazine']
['Sasthra Keralam Magazine', 'KSSP Science Magazine']
['Sasthra Keralam Magazine', 'KSSP Science Magazine']
['Sasthra Keralam Magazin', 'Malayalam Science Magazine']
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine']
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine', "Malayalam Children's Magazine"]
['Eureka Magazine']
['KSSP leaflets', 'KSSP Health Books']
['KSSP leaflets', 'KSSP Health Books', 'Measles']
['KSSP leaflets', 'Kerala Health']
['KSSP leaflets', 'First aid', 'K
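The loop above splits each line of `kssp.tab` by hand. If the file ends with a newline, that also produces an empty row at the end. Below is a minimal sketch that parses the same layout into `(identifier, subjects)` pairs and skips blank lines. The identifier is in the first column and the `'; '`-separated subjects are in the last. `parse_topic_rows` is a hypothetical helper, and the sample rows use placeholder identifiers for illustration.

```python
def parse_topic_rows(text, drop_subject=None):
    """Parse tab-separated lines into (identifier, [subjects]) pairs.

    The identifier is the first column and the subjects are the last column,
    separated by '; '. Blank lines are skipped. If `drop_subject` is given,
    it is removed from each subject list when present.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Ignore empty entries left by a trailing separator
        subjects = [s for s in fields[-1].split("; ") if s]
        if drop_subject in subjects:
            subjects.remove(drop_subject)
        rows.append((fields[0], subjects))
    return rows


sample = ("example-item-1\tEureka Magazine; Kerala Archives\n"
          "example-item-2\tKSSP leaflets; KSSP Health Books\n")
for identifier, subjects in parse_topic_rows(sample, drop_subject="Kerala Archives"):
    print(identifier, subjects)
```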

In [222]:
shiju_list = []
for i in search_items('collection:(digitallibraryindia) AND uploader:(shijualexonline@gmail.com)'):
    shiju_list.append(i["identifier"])


In [223]:
len(shiju_list)

136

In [175]:
len(shiju_list)

1094

In [224]:
kerala_missionary_documents = []
for i in search_items('collection:(digitallibraryindia) AND uploader:(shijualexonline@gmail.com)'):
    kerala_missionary_documents.append(i["identifier"])


In [239]:
for book_item in kerala_missionary_documents:
    print(book_item)
    

1813CMSMissionaryRegister
1814CMSMissionaryRegister
1815CMSMissionaryRegister
1816CMSMissionaryRegister
1817CMSMissionaryRegister
1818CMSMissionaryRegister
1819CMSMissionaryRegister
1820CMSMissionaryRegister
1821CMSMissionaryRegister
1822CMSMissionaryRegister
1823CMSMissionaryRegister
1824CMSMissionaryRegister
1825CMSMissionaryRegister
1826CMSMissionaryRegister
1827CMSMissionaryRegister
1828CMSMissionaryRegister
1829CMSMissionaryRegister
1830CMSMissionaryRegister
1831CMSMissionaryRegister
1832CMSMissionaryRegister
1833CMSMissionaryRegister
1835CMSMissionaryRegister
1836CMSMissionaryRegister
1837CMSMissionaryRegister
1838CMSMissionaryRegister
1839CMSMissionaryRegister
1840CMSMissionaryRegister
1841CMSMissionaryRegister
1842CMSMissionaryRegister
1842ChurchMissionaryGleanerVol2
1843CMSMissionaryRegister
1843ChurchMissionaryGleanerVol3
1844CMSMissionaryRegister
1845CMSMissionaryRegister
1846CMSMissionaryRegister
1846ChurchMissionaryGleanerVol6
1847CMSMissionaryRegister
1848CMSMissionaryReg

In [67]:
"https://archive.org/details/kssp-archives?and[]=subject%3A%22Kerala+Archives%22"

'https://archive.org/details/kssp-archives?and[]=subject%3A%22Kerala+Archives%22'
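The URL above opens the `kssp-archives` collection page, filtered to items whose subject is `Kerala Archives`. The filter is a URL-encoded `subject:"..."` clause, so the standard library can generate it for any collection and subject. Below is a minimal sketch. `collection_subject_url` is a hypothetical helper that reproduces the URL format shown above; it is not a documented archive.org API.

```python
from urllib.parse import quote_plus


def collection_subject_url(collection, subject):
    """Build an archive.org collection URL filtered by a single subject."""
    # quote_plus encodes ':' as %3A, '"' as %22 and spaces as '+'
    clause = quote_plus('subject:"{}"'.format(subject))
    return "https://archive.org/details/{}?and[]={}".format(collection, clause)


print(collection_subject_url("kssp-archives", "Kerala Archives"))
```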

In [155]:
subject_list = '''Sasthragathi
Sasthra Keralam Magazine
Sasthra Keralam Magazine, KSSP Science Magazine
Sasthra Keralam Magazine, Malayalam Science Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine, Malayalam Children's Magazine
Sasthra Keralam Magazine, KSSP Science Magazine
Sasthra Keralam Magazine, KSSP Science Magazine
Sasthra Keralam Magazine, KSSP Science Magazine
Sasthra Keralam Magazin; Malayalam Science Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine
KSSP leaflets;KSSP Health Books
KSSP leaflets;KSSP Health Books, Measles
KSSP leaflets; Kerala Health
KSSP leaflets; First aid, KSSP Health Books
KSSP leaflets; KSSP Health Books
KSSP leaflets;KSSP Books about Power
KSSP leaflets;KSSP Health Books
KSSP leaflets;KSSP Health Books
KSSP leaflets;Kerala Energy Problem;KSSP Energy Books
KSSP leaflets;KSSP Energy Books;Power Problex in Kerala
KSSP leaflets;Kerala Energy;KSSP Ecology Books
KSSP leaflets; KSSP Development Books; Kerala Development
KSSP leaflets; KSSP Development Books; Kerala Development
KSSP leaflets;KSSP Health Books;Health Survey
KSSP leaflets;Kerala Energy Problem;KSSP Energy Books
KSSP leaflets;Kerala Power Problem
KSSP leaflets;KSSP Health Books;Kerala Health Problem
KSSP leaflets;KSSP Health Books;Health Survey
KSSP leaflets;KSSP Gender Books
KSSP leaflets about Gender
KSSP leaflets;KSSP Gender Books
KSSP leaflets;KSSP Health Books
KSSP leaflets;KSSP Health Books;Health Survey
KSSP leaflets; KSSP Books about Gender Bias
KSSP leaflets;KSSP Gender Books
KSSP leaflets;KSSP Gender Books
KSSP leaflets;KSSP Health Books
KSSP leaflets;Kerala Energy Problem;KSSP Energy Books
KSSP leaflets;KSSP Gender Books
KSSP leaflets;KSSP Gender Books;Civil Code and Gender Bias
KSSP leaflets;KSSP Health Books
KSSP leaflets;KSSP Health Books
KSSP leaflets;KSSP Health Books;Medicine Price Hike
KSSP leaflets;KSSP Gender Books;Gender Bias
KSSP leaflets; Women's Health; KSSP Books about Health; KSSP books about women
KSSP leaflets;KSSP Gender Books;Gender Bias in Kerala
KSSP leaflets;KSSP Health Books;Leptospirosis;Plague
KSSP leaflets;Power Problem of Kerala, KSSP Books about Power
KSSP leaflets;KSSP Health Books
KSSP leaflets;Kerala Public Education;KSSP Education Books
KSSP leaflets;Kerala Public Education;KSSP Education Books; Vidyabhyasa Jadha-95
KSSP leaflets;Kerala Public Education;KSSP Education Books;Vidyabhyasa Jadha-95
KSSP leaflets;Kerala Public Education, Vidyabhyasa Jadha-95
KSSP leaflets;Kerala Public Education;KSSP Education Books;KSSP and Education Debates
KSSP leaflets; Kerala Education; KSSP Books about Education; Vidyabhyasa Jadha-95
['Kerala Electricity', 'KSSP leaflets', 'Kerala Energy']
['Planning, evelopment', 'Kerala Archives']
['Decentralization, Planning, Democracy', 'Kerala Archives']
Kala Jatha, Street Theatre
['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']
Ernakulam District Total Literarcy Programme
['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']
Kilikkkoottam Jadha, Street Theatre
Street Theatre
Health Education, Asthma
KSSP leaflets;KSSP Health Books
Kala Jatha, Street Theatre
['Economics, Badjet', 'Kerala Swashraya Samithy']
Balavedi, Science History
Sasthra Kala Jatha, Street Theatre
Balalsava Jatha, Street Theatre
Balolsava Songs
Sasthra Kala Jatha, Street Theatre
kssp-science books, P R Madhavappanikkar, Malayalam Physics Books
Painting, Art, History
Science Education, Biography, C V Raman
['Dunkal Draft', 'Kerala Archives']
['New Economic Policy of India, Kerala Swasraya Samithi', 'Kerala Sasthra Sahithya Parishad']
['kssp-science books', 'P R Madhavappanikkar', 'Malayalam Physics Books']
['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']
Science Education
Eureka Magazine, Malayalam Children's Magazine
Eureka Magazine
Kerala Development, Express Highway, Jalanidhi
['Ernakulam District Total Literarcy Programme', 'Kerala Literacy']
['Ernakulam District Total Literarcy Programme; Mathematics Hand Book', 'Kerala Literacy']
['Globalization, Gatt Agreement', 'Kerala Swashraya Samithy']
['Sasthra Kala Jatha', 'Kerala Archives']
Kerala Economy, Rural Development
Astronomy, Halley's comet
['Globalization, Dunkal Draft, Indian Economy', 'Kerala Swashraya Samithy']
Sasthra Kala Jatha, Street Theatre
['ISRO; KSSP leaflets', 'Kerala Archives']
People's Science Movement in Kerala
People's Planning
People's Planning; Participative Democracy, Decentralization
['Education, Literacy', 'Kerala Literacy']
Bhopal Gas Tragedy
Science Education, Adulteration
Malayalam Science Education Books'''.split("\n")

In [172]:
subject_list = [x.replace('"', '') for x in subject_list]

In [173]:
sl = []
for subject_text in subject_list:
    sl.append(normalize_subject(subject_text))

In [167]:
sl = [normalize_subject(x) for x in sl]

In [174]:
sl

['Sasthragathi',
 'Sasthra Keralam Magazine',
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazine; Malayalam Science Magazine',
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazine; KSSP Science Magazine',
 'Sasthra Keralam Magazin; Malayalam Science Magazine',
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 'Eureka Magazine',
 "Eureka Magazine; Malayalam Children's Magazine",
 "Eureka Magazine; Malayalam Children's Magazine",
 'Eureka Magazine',
 'KSSP leaflets; KSSP Health Books',
 'KSSP leaflets; KSSP Health Books; Measles',
 'KSSP leaflets; Kerala Health',
 'KSSP leaflets; First aid; KSSP Health Books',
 'KSSP leaflets; KSSP