## Extract DOI info from PDF

In this notebook, we use a keyword-search algorithm to perform exploratory data analysis (EDA) on our dataset.
We also extract the DOI of each paper and save the results to a JSON file. That file is later merged into the
Data Extraction process, because the DOI is one of the unique identification keys in our project.

We also use the keyword-search algorithm to split the data into training and test sets.

Dataset: all research papers were downloaded automatically through the Elsevier and ACS APIs.

In [1]:
# Import the required libraries.
import re
from pathlib import Path

import numpy as np
import pandas as pd
import PyPDF2

In [2]:
pdf_dir = "../AllData"

In [3]:
dois = []
filenames = []
# Raw string with an escaped dot: "10.", registrant digits, "/", then a suffix
# that runs until whitespace, a quote, an angle bracket, or a parenthesis.
doi_regex = re.compile(r"10\.\d+/[^\s\"<>()]+")

for index, path in enumerate(Path(pdf_dir).iterdir()):
    if path.name.endswith('.pdf'):
        try:
            # Use a context manager so the file handle is always closed.
            # Note: PdfFileReader/getPage/extractText are the PyPDF2 < 3.0 names.
            with open(path, 'rb') as obj:
                pdfReader = PyPDF2.PdfFileReader(obj)
                text = pdfReader.getPage(0).extractText()
                subject = str(pdfReader.getDocumentInfo().subject)
            t_doi = doi_regex.search(text)
            # First choice: the DOI in the document info reported by PyPDF2.
            doi = re.findall("doi.*", subject)
            # Otherwise, fall back to the DOI found in the first page's text.
            if not doi:
                if t_doi:
                    doi = [t_doi.group(0)]
                # If neither source has a DOI, record an empty list.
                else:
                    doi = []

            filenames.append(path.name)
            dois.append(doi)

            print("[{0}] path: {1}, doi: {2}".format(index, path.name, doi))

        except Exception as e:
            # Report unreadable PDFs and continue with the next file.
            print(e)



[0] path: 1-s2.0-S0196890413007474-main.pdf, doi: ['doi:10.1016/j.enconman.2013.11.026']
[1] path: 1-s2.0-S2211926416307834-main.pdf, doi: ['10.1016/j.algal.2017.03.017']
[2] path: 1-s2.0-S0961953412000451-main.pdf, doi: ['doi:10.1016/j.biombioe.2012.01.035']
[3] path: bbb.1974.pdf, doi: ['10.1002/bbb.1974;']
[4] path: gcbb.12359.pdf, doi: ['10.1111/gcbb.12359']
[5] path: 1-s2.0-S0960148119304562-main.pdf, doi: ['doi:10.1016/j.renene.2019.03.135']
[6] path: 1-s2.0-S0960852413011917-main.pdf, doi: ['doi:10.1016/j.biortech.2013.07.123']
[7] path: 1-s2.0-S0016236116311632-main.pdf, doi: ['doi:10.1016/j.fuel.2016.11.059']
[8] path: 1-s2.0-S0196890417311330-main.pdf, doi: ['doi:10.1016/j.enconman.2017.11.081']
[9] path: ef3020298.pdf, doi: ['10.1021/ef3020298|EnergyFuels']
[10] path: s13068-016-0628-5.pdf, doi: ['doi:10.1186/s13068-016-0628-5']
[11] path: 1-s2.0-S0960852414014230-main.pdf, doi: ['doi:10.1016/j.biortech.2014.10.011']
[12] path: ef402371h.pdf, doi: ['10.1021/ef402371h|EnergyF

[100] path: ijfs.14543.pdf, doi: ['10.1111/ijfs.14543']
[101] path: processes-07-00286.pdf, doi: ['10.3390/pr7050286']
[102] path: 1-s2.0-S2211926414000502-main.pdf, doi: ['10.1016/j.algal.2014.05.007']
[103] path: 1-s2.0-S0165237014000825-main.pdf, doi: ['doi:10.1016/j.jaap.2014.04.006']
[104] path: energies-03-00155.pdf, doi: ['10.3390/en3020155']
[105] path: 1-s2.0-S0378382013000131-main.pdf, doi: ['doi:10.1016/j.fuproc.2013.01.010']
[106] path: 1-s2.0-S0961953416302458-main.pdf, doi: ['doi:10.1016/j.biombioe.2016.07.010']
[107] path: acssuschemeng.9b05308.pdf, doi: ['10.1021/acssuschemeng.9b05308ACSSustainableChem.Eng.']
[108] path: 1-s2.0-S0961953417300120-main.pdf, doi: ['doi:10.1016/j.biombioe.2017.01.012']
[109] path: 1-s2.0-S0960852415011499-main.pdf, doi: ['doi:10.1016/j.biortech.2015.08.043']
[110] path: 1-s2.0-S096085241400025X-main.pdf, doi: ['doi:10.1016/j.biortech.2014.01.010']
[111] path: 1-s2.0-S0360319917300198-main.pdf, doi: ['doi:10.1016/j.ijhydene.2016.12.153']
[11

[196] path: 1-s2.0-S221192641730454X-main (1).pdf, doi: ['doi:10.1016/j.algal.2017.08.007']
[197] path: 1-s2.0-S0960852416305533-main.pdf, doi: ['doi:10.1016/j.biortech.2016.04.066']
[198] path: acs.est.5b01931.pdf, doi: ['10.1021/acs.est.5b01931']
[199] path: 1-s2.0-S0016236104001887-main.pdf, doi: ['10.1016/j.fuel.2004.06.023']
[200] path: Low temperature AOP studies for spent reverse osmosis module components.pdf, doi: []
[201] path: 1-s2.0-S0961953413004133-main.pdf, doi: ['doi:10.1016/j.biombioe.2013.09.005']
[202] path: 1-s2.0-S0896844614002836-main.pdf, doi: ['doi:10.1016/j.supflu.2014.09.008']
[203] path: acs.energyfuels.9b00954.pdf, doi: ['10.1021/acs.energyfuels.9b00954EnergyFuels']
[204] path: 1-s2.0-S0960852411004263-main.pdf, doi: ['doi:10.1016/j.biortech.2011.03.069']
[205] path: apj.2317.pdf, doi: ['10.1002/apj.2317']
[206] path: 1-s2.0-S0016236116312819-main.pdf, doi: ['doi:10.1016/j.fuel.2016.12.060']
[207] path: 1-s2.0-S0961953412003042-main.pdf, doi: ['doi:10.1016/j.

[291] path: sc4004983.pdf, doi: ['10.1021/sc4004983|ACSSustainableChem.Eng.']
[292] path: acs.iecr.6b03846.pdf, doi: ['10.1021/acs.iecr.6b03846Ind.Eng.Chem.Res.']
[293] path: 1-s2.0-S0960852419319480-main.pdf, doi: ['doi:10.1016/j.biortech.2019.122719']
[294] path: Hydrothermal liquefaction of a wastewater native Chlorella sp bacteria consortium biocrude production and characterization.pdf, doi: ['doi:10.1080/17597269.2016.1168027']
[295] path: 1-s2.0-S0960852416304138-main.pdf, doi: ['doi:10.1016/j.biortech.2016.03.110']
[296] path: 1-s2.0-S2211926412000057-main.pdf, doi: ['10.1016/j.algal.2012.02.002Contentslistsavailableat']
[297] path: 1-s2.0-S0378382018308063-main.pdf, doi: ['doi:10.1016/j.fuproc.2018.07.013']
[299] path: 1-s2.0-S016777991300022X-main.pdf, doi: ['10.1016/j.tibtech.2013.01.010']
[300] path: 1-s2.0-S2211926418304806-main.pdf, doi: ['doi:10.1016/j.algal.2018.101399']
[301] path: 1-s2.0-S0961953412003789-main.pdf, doi: ['doi:10.1016/j.biombioe.2012.09.038']
[302] path

[386] path: 1-s2.0-S0016236119319908-main.pdf, doi: ['doi:10.1016/j.fuel.2019.116636']
[387] path: 1-s2.0-S2452223616300153-main.pdf, doi: ['doi:10.1016/j.cogsc.2016.08.005']
[388] path: 1-s2.0-S0306261914012677-main.pdf, doi: ['doi:10.1016/j.apenergy.2014.12.018']
[389] path: processes-08-00015-v3 (1).pdf, doi: []
[390] path: 1-s2.0-S0165237019305753-main.pdf, doi: ['doi:10.1016/j.jaap.2019.104758']
[391] path: 1-s2.0-S0960852413015794-main.pdf, doi: ['doi:10.1016/j.biortech.2013.09.137']
[392] path: 1-s2.0-S0960852414015211-main.pdf, doi: ['doi:10.1016/j.biortech.2014.10.089']
[393] path: 1-s2.0-S0009250914000463-main.pdf, doi: ['doi:10.1016/j.ces.2014.01.036']
[394] path: 1-s2.0-S0165237017300244-main.pdf, doi: ['doi:10.1016/j.jaap.2017.06.004']
[395] path: C7RA08311D.pdf, doi: ['doi:10.1039/C7RA08311D']
[396] path: cssc.201702362.pdf, doi: ['10.1002/cssc.201702362']
[397] path: acs.iecr.6b04086.pdf, doi: ['10.1021/acs.iecr.6b04086Ind.Eng.Chem.Res.']
[398] path: acscatal.8b04143.pdf

[482] path: sc500686j.pdf, doi: ['10.1021/sc500686j|ACSSustainableChem.Eng.']
[483] path: Katsimpouras2016_Article_AceticAcid-catalyzedHydrotherm.pdf, doi: ['doi:10.1007/s00449-016-1618-5']
[484] path: ep.12713.pdf, doi: ['10.1002/ep.12713']
[485] path: 1-s2.0-S1364032116311637-main.pdf, doi: ['doi:10.1016/j.rser.2016.12.110']
[486] path: ef101232t.pdf, doi: []
[487] path: Almeida2017_Article_CharacterizationAndPyrolysisOf.pdf, doi: ['doi:10.1007/s11356-017-9009-2']
[488] path: 1-s2.0-S2213343717304517-main.pdf, doi: ['doi:10.1016/j.jece.2017.09.013']
[489] path: 1-s2.0-S1364032115008473-main.pdf, doi: ['doi:10.1016/j.rser.2015.08.005']
[490] path: 1-s2.0-S2211926416301898-main.pdf, doi: ['10.1016/j.algal.2016.05.033']
[491] path: 1-s2.0-S014139101630026X-main.pdf, doi: ['doi:10.1016/j.polymdegradstab.2016.02.003']
[492] path: Brady2014_Article_CorrosionConsiderationsForTher.pdf, doi: ['10.1007/s11837-014-1201-y']
[493] path: s13068-017-0830-0.pdf, doi: ['doi:10.1186/s13068-017-0830-0'

[584] path: C6RA21663C.pdf, doi: ['doi:10.1039/C6RA21663C']
[585] path: 1-s2.0-S096085241731578X-main.pdf, doi: ['doi:10.1016/j.biortech.2017.09.030']
[586] path: 1-s2.0-S004896971830593X-main.pdf, doi: ['10.1016/j.scitotenv.2018.02.194']
[587] path: 1-s2.0-S0360544218319558-main.pdf, doi: ['doi:10.1016/j.energy.2018.09.182']
[588] path: 1-s2.0-S1004954117304305-main.pdf, doi: ['10.1016/j.cjche.2017.08.010']
[589] path: jctb.5003.pdf, doi: ['10.1002/jctb.5003']
[590] path: 1-s2.0-S0165237013000788-main.pdf, doi: ['doi:10.1016/j.jaap.2013.04.002']
[591] path: 1-s2.0-S0960852419311149-main.pdf, doi: ['doi:10.1016/j.biortech.2019.121884']
[592] path: 1-s2.0-S1385894705000227-main.pdf, doi: ['10.1016/j.cej.2005.01.007']
[593] path: 1-s2.0-S0960852415004526-main.pdf, doi: ['doi:10.1016/j.biortech.2015.03.120']
[594] path: 1-s2.0-S0301479717303808-main.pdf, doi: ['doi:10.1016/j.jenvman.2017.04.032']
[595] path: 1-s2.0-S0960852419304353-main.pdf, doi: ['doi:10.1016/j.biortech.2019.03.076']
[5

[682] path: 1-s2.0-S0196890417303448-main.pdf, doi: ['doi:10.1016/j.enconman.2017.04.034']
[683] path: 1-s2.0-S0960852417316413-main.pdf, doi: ['doi:10.1016/j.biortech.2017.09.076']
[684] path: 1-s2.0-S2211926419300712-main.pdf, doi: ['doi:10.1016/j.algal.2019.101658']
[685] path: c9ra07150d.pdf, doi: ['doi:10.1039/C9RA07150D']
[686] path: 1-s2.0-S0960852419305103-main.pdf, doi: ['doi:10.1016/j.biortech.2019.03.136']
[687] path: 1-s2.0-S2095809917306860-main.pdf, doi: ['doi:10.1016/j.eng.2018.05.012']
[688] path: 1-s2.0-S0378382015001563-main.pdf, doi: ['10.1016/j.fuproc.2015.04.009']
[690] path: 1-s2.0-S2213343718306250-main.pdf, doi: ['doi:10.1016/j.jece.2018.10.017']
[691] path: Zhang2019_Article_EmergingTechniquesForCellDisru.pdf, doi: ['doi:10.1007/s00449-018-2038-5']
[692] path: Angell2017_Article_AComparisonOfProtocolsForIsola.pdf, doi: ['doi:10.1007/s10811-016-0972-7']
[693] path: 1-s2.0-S0378382016301096-main.pdf, doi: ['10.1016/j.fuproc.2016.03.006']
[694] path: 1-s2.0-S01968

[780] path: acssuschemeng.7b02226.pdf, doi: ['10.1021/acssuschemeng.7b02226ACSSustainableChem.Eng.']
[781] path: processes-05-00048.pdf, doi: ['10.3390/pr5030048']
[782] path: ef301925d.pdf, doi: ['10.1021/ef301925d|EnergyFuels']
[783] path: acs.energyfuels.7b02322.pdf, doi: ['10.1021/acs.energyfuels.7b02322EnergyFuels']
[784] path: ie9008293.pdf, doi: ['10.1021/ie90082932010AmericanChemicalSociety']
[785] path: 1-s2.0-S0960852419300963-main.pdf, doi: ['doi:10.1016/j.biortech.2019.01.076']
[786] path: 1-s2.0-S0960852416317692-main.pdf, doi: ['doi:10.1016/j.biortech.2016.12.091']
[787] path: 1-s2.0-S0926337320301053-main.pdf, doi: ['doi:10.1016/j.apcatb.2020.118690']
[788] path: 1-s2.0-S0378382016300595-main.pdf, doi: ['10.1016/j.fuproc.2016.02.011']
[789] path: acs.iecr.6b03414.pdf, doi: ['10.1021/acs.iecr.6b03414Ind.Eng.Chem.Res.']
[790] path: 1-s2.0-S0960852412015222-main.pdf, doi: ['doi:10.1016/j.biortech.2012.10.020']
[791] path: 1-s2.0-S0926337311003766-main.pdf, doi: ['10.1016/j.

[880] path: jctb.3933.pdf, doi: ['10.1002/jctb.3933']
[881] path: 1-s2.0-S0921344914000974-main.pdf, doi: ['doi:10.1016/j.resconrec.2014.04.011']
[882] path: 3-s2.0-B9780444641922000159-main.pdf, doi: ['doi:10.1016/B978-0-444-64192-2.00015-9']
[883] path: 1-s2.0-S0360544218308648-main.pdf, doi: ['doi:10.1016/j.energy.2018.05.044']
[884] path: 2018TC005267.pdf, doi: ['10.1029/2018TC005267KeyPoints:']
[885] path: 1-s2.0-S0959652619316634-main.pdf, doi: ['doi:10.1016/j.jclepro.2019.05.137']
[886] path: c3ra41582a.pdf, doi: ['10.1039/c3ra41582a']
[887] path: acs.energyfuels.7b02080.pdf, doi: ['10.1021/acs.energyfuels.7b02080EnergyFuels']
[888] path: 1-s2.0-S0196890418309257-main.pdf, doi: ['doi:10.1016/j.enconman.2018.08.058']
[889] path: 1-s2.0-S0960852413013631-main.pdf, doi: ['doi:10.1016/j.biortech.2013.08.112']
[890] path: 1-s2.0-S0016236118321835-main.pdf, doi: ['doi:10.1016/j.fuel.2018.12.115']
[891] path: 1-s2.0-S096085241401390X-main.pdf, doi: ['doi:10.1016/j.biortech.2014.09.131'

[983] path: 1-s2.0-S0196890419301797-main.pdf, doi: ['doi:10.1016/j.enconman.2019.01.111']
[984] path: 1-s2.0-S0378382018323282-main.pdf, doi: ['doi:10.1016/j.fuproc.2019.03.031']
[985] path: 1-s2.0-S0956053X18303301-main.pdf, doi: ['doi:10.1016/j.wasman.2018.05.033']
[986] path: 1-s2.0-S0378382015302022-main.pdf, doi: ['10.1016/j.fuproc.2015.10.015']
[987] path: 1-s2.0-S0360128515000246-main.pdf, doi: ['doi:10.1016/j.pecs.2015.01.003']
[988] path: Akia2017_Chapter_AnOverviewOfTheRecentAdvancesI.pdf, doi: ['10.1007/978-3-319-45459-7_12']
[989] path: materials-12-01030.pdf, doi: ['10.3390/ma12071030']
[990] path: 1-s2.0-S0960852410010989-main.pdf, doi: ['doi:10.1016/j.biortech.2010.06.097']
[991] path: Sandquist2019_Article_HydrothermalLiquefactionOfOrga.pdf, doi: ['doi:10.1007/s00253-018-9507-2']
[992] path: C8RA04668A.pdf, doi: ['doi:10.1039/C8RA04668A']
[993] path: 1-s2.0-S0960852419311691-main.pdf, doi: ['doi:10.1016/j.biortech.2019.121939']
[994] path: 1-s2.0-S0960852418317528-main

[1078] path: acs.energyfuels.7b03144.pdf, doi: ['10.1021/acs.energyfuels.7b03144EnergyFuels']
[1079] path: 1-s2.0-S0960852413016982-main.pdf, doi: ['doi:10.1016/j.biortech.2013.10.111']
[1080] path: acssuschemeng.9b06480.pdf, doi: ['10.1021/acssuschemeng.9b06480ACSSustainableChem.Eng.']
[1081] path: Onwudili2014_Chapter_HydrothermalGasificationOfBiom.pdf, doi: ['10.1007/978-3-642-54458-3_10,']
[1082] path: 1-s2.0-S0360544218306649-main.pdf, doi: ['doi:10.1016/j.energy.2018.04.057']
[1083] path: 1-s2.0-S1743967117301071-main.pdf, doi: ['doi:10.1016/j.joei.2017.05.009']
[1084] path: ef2004046.pdf, doi: []
[1085] path: Investigation of cornstalk cellulose liquefaction in supercritical acetone by FT TR and GC MS methods.pdf, doi: []
[1086] path: 1-s2.0-S1364032117314144-main.pdf, doi: ['doi:10.1016/j.rser.2017.10.033']
[1087] path: 1-s2.0-S0960852417313536-main.pdf, doi: ['doi:10.1016/j.biortech.2017.08.048']
[1088] path: 1-s2.0-S0960852415015448-main.pdf, doi: ['doi:10.1016/j.biortech.201

[1170] path: acs.est.7b01049.pdf, doi: ['10.1021/acs.est.7b01049Environ.Sci.Technol.']
[1171] path: er.3473.pdf, doi: ['10.1002/er.3473']
[1172] path: 1-s2.0-S0360544219322388-main.pdf, doi: ['doi:10.1016/j.energy.2019.116543']
[1173] path: acssuschemeng.9b02191.pdf, doi: ['10.1021/acssuschemeng.9b02191ACSSustainableChem.Eng.']
[1174] path: 1-s2.0-S2211926416301989-main.pdf, doi: ['10.1016/j.algal.2016.06.009']
[1175] path: C6GC03294J.pdf, doi: ['10.1039/c6gc03294j']
[1176] path: 3-s2.0-B9780128123607000197-main.pdf, doi: ['10.1016/B978-0-12-812360-7.00019-7']
[1177] path: 1-s2.0-S0956053X17304403-main.pdf, doi: ['doi:10.1016/j.wasman.2017.06.002']
[1178] path: 1-s2.0-S0960852419304997-main.pdf, doi: ['doi:10.1016/j.biortech.2019.03.125']
[1179] path: 1-s2.0-S0960852409017234-main.pdf, doi: ['10.1016/j.biortech.2009.12.058*Correspondingauthor.Address:MEB,2500UniversityDrive,NW,Calgary,']
[1180] path: Tian2017_Chapter_HydrothermalLiquefactionHTLAPr.pdf, doi: ['10.1007/978-3-319-51010-1_

[1270] path: 1-s2.0-S0165237017311233-main.pdf, doi: ['doi:10.1016/j.jaap.2018.03.013']
[1271] path: 1-s2.0-S096085241300326X-main.pdf, doi: ['doi:10.1016/j.biortech.2013.02.091']
[1272] path: 1-s2.0-S1364032115003196-main.pdf, doi: ['doi:10.1016/j.rser.2015.04.049']
[1273] path: 1-s2.0-S221501611930322X-main.pdf, doi: ['doi:10.1016/j.mex.2019.11.019']
[1274] path: 1-s2.0-S0960852419300410-main.pdf, doi: ['doi:10.1016/j.biortech.2019.01.030']
[1275] path: s12866-017-1144-x.pdf, doi: ['doi:10.1186/s12866-017-1144-x']
[1276] path: 1-s2.0-S0960852417308544-main.pdf, doi: ['doi:10.1016/j.biortech.2017.05.186']
[1277] path: 1.5043227.pdf, doi: []
[1278] path: Grigorenko2019_Article_HydrothermalLiquefactionOfArth.pdf, doi: []
[1279] path: 1-s2.0-S0045653518320605-main.pdf, doi: ['doi:10.1016/j.chemosphere.2018.10.189']
[1280] path: 1-s2.0-S0378382017319185-main.pdf, doi: ['doi:10.1016/j.fuproc.2018.02.028']
[1281] path: 1-s2.0-S096195341400141X-main.pdf, doi: ['doi:10.1016/j.biombioe.2014.03

[1370] path: 1-s2.0-S0016236119323282-main.pdf, doi: ['doi:10.1016/j.fuel.2019.116935']
[1371] path: acs.energyfuels.9b02524.pdf, doi: ['10.1021/acs.energyfuels.9b02524EnergyFuels']
[1372] path: 1-s2.0-S2468823119305024-main.pdf, doi: ['doi:10.1016/j.mcat.2019.110648']
[1373] path: 1-s2.0-S0378382011003262-main.pdf, doi: ['10.1016/j.fuproc.2011.09.010Contentslistsavailableat']
[1374] path: c2ra21594b.pdf, doi: ['10.1039/c2ra21594b']
[1375] path: 1-s2.0-S0037073805002733-main.pdf, doi: ['10.1016/j.sedgeo.2005.08.004']
[1376] path: 1-s2.0-S2211926417306732-main.pdf, doi: ['doi:10.1016/j.algal.2017.11.007']
[1377] path: 1-s2.0-S0378382016312553-main.pdf, doi: ['10.1016/j.fuproc.2017.05.006']
[1378] path: 1-s2.0-S0896844617309117-main.pdf, doi: ['doi:10.1016/j.supflu.2018.04.011']
[1379] path: Williams2016_Article_SourcesOfBiomassFeedstockVaria.pdf, doi: ['10.1007/s12155-015-9694-y']
[1380] path: 1-s2.0-S0960852417309896-main.pdf, doi: ['doi:10.1016/j.biortech.2017.06.085']
[1381] path: 1-

[1470] path: 1-s2.0-S0896844615301753-main.pdf, doi: ['doi:10.1016/j.supflu.2015.11.002']
[1471] path: ente.201500367.pdf, doi: []
[1472] path: s13068-019-1465-0.pdf, doi: ['doi.org/10.1186/s13068-019-1465-0']
[1473] path: acs.energyfuels.8b00068.pdf, doi: ['10.1021/acs.energyfuels.8b00068EnergyFuels']
[1474] path: 1-s2.0-S0926337316306713-main.pdf, doi: ['doi:10.1016/j.apcatb.2016.08.063']
[1475] path: acssuschemeng.7b02854.pdf, doi: ['10.1021/acssuschemeng.7b02854ACSSustainableChem.Eng.']
[1476] path: ef402075e.pdf, doi: ['10.1021/ef402075e|EnergyFuels']
[1477] path: acs.energyfuels.7b00160.pdf, doi: ['10.1021/acs.energyfuels.7b00160EnergyFuels']
[1478] path: Bio processing of algal bio refinery a review on current advances and future perspectives.pdf, doi: ['doi:10.1080/21655979.2019.1679697']
[1479] path: 1-s2.0-S0960852417318771-main.pdf, doi: ['doi:10.1016/j.biortech.2017.10.048']
[1480] path: acssuschemeng.5b00153.pdf, doi: ['10.1021/acssuschemeng.5b00153']
[1481] path: 1-s2.0-S

[1565] path: 1-s2.0-S0961953420300118-main.pdf, doi: ['doi:10.1016/j.biombioe.2020.105477']
[1566] path: 1-s2.0-S0961953417300399-main.pdf, doi: ['doi:10.1016/j.biombioe.2017.01.022']
[1567] path: 1-s2.0-S136403211500876X-main.pdf, doi: ['doi:10.1016/j.rser.2015.08.030']
[1568] path: DeFariasSilva-Bertucco2017_Article_DiluteAcidHydrolysisOfMicroalg.pdf, doi: ['doi:10.1007/s11144-017-1271-2']
[1569] path: 1-s2.0-S0960148119317331-main.pdf, doi: ['doi:10.1016/j.renene.2019.11.046']
[1570] path: 1-s2.0-S001623611731520X-main.pdf, doi: ['doi:10.1016/j.fuel.2017.11.108']
[1571] path: 1-s2.0-S0165237013000521-main.pdf, doi: ['doi:10.1016/j.jaap.2013.03.001']
[1572] path: 1-s2.0-S0896844616301450-main.pdf, doi: ['doi:10.1016/j.supflu.2016.05.044']
[1573] path: 1-s2.0-S0960852413008663-main.pdf, doi: ['doi:10.1016/j.biortech.2013.05.098']
[1574] path: acs.est.7b02137.pdf, doi: ['10.1021/acs.est.7b02137Environ.Sci.Technol.']
[1575] path: 1-s2.0-S0048969719332164-main.pdf, doi: ['10.1016/j.scito

[1656] path: acssuschemeng.7b04359.pdf, doi: ['10.1021/acssuschemeng.7b04359ACSSustainableChem.Eng.']
[1657] path: Minarick2011_Article_ProductAndEconomicAnalysisOfDi.pdf, doi: ['10.1007/s12155-011-9157-z']
[1658] path: 1-s2.0-S0956053X19306993-main.pdf, doi: ['doi:10.1016/j.wasman.2019.11.003']
[1659] path: acssuschemeng.6b02367.pdf, doi: ['10.1021/acssuschemeng.6b02367ACSSustainableChem.Eng.']
[1660] path: acs.iecr.9b02442.pdf, doi: ['10.1021/acs.iecr.9b02442Ind.Eng.Chem.Res.']
[1661] path: 1-s2.0-S0960308518300105-main.pdf, doi: ['doi:10.1016/j.fbp.2018.02.002']
[1662] path: 1-s2.0-S2211926418302078-main.pdf, doi: ['doi:10.1016/j.algal.2018.08.011']
[1663] path: Fields2014_Article_SourcesAndResourcesImportanceO.pdf, doi: ['10.1007/s00253-014-5694-7']
[1664] path: acs.energyfuels.8b00669.pdf, doi: ['10.1021/acs.energyfuels.8b00669EnergyFuels']
[1665] path: c3ra46607h.pdf, doi: ['10.1039/c3ra46607h']
[1666] path: 184.pdf, doi: ['10.3301/IJG.2018.35']
[1667] path: 1-s2.0-S1876107018301

[1751] path: 1-s2.0-S0926669019304285-main.pdf, doi: ['doi:10.1016/j.indcrop.2019.05.075']
[1752] path: 1-s2.0-S0168165613003167-main.pdf, doi: ['doi:10.1016/j.jbiotec.2013.07.020']
[1753] path: 1-s2.0-S1364032117308353-main.pdf, doi: ['doi:10.1016/j.rser.2017.05.197']
[1754] path: C6GC02746F.pdf, doi: ['10.1039/c6gc02746f']
[1755] path: Raheem2017_Chapter_PotentialApplicationsOfNanotec.pdf, doi: ['10.1007/978-3-319-45459-7_5']
[1756] path: app.35655.pdf, doi: ['10.1002/app.35655']
[1757] path: 1-s2.0-S0960148119313230-main.pdf, doi: ['doi:10.1016/j.renene.2019.08.136']
[1758] path: 1-s2.0-S0960308519306224-main.pdf, doi: ['doi:10.1016/j.fbp.2019.12.010']
[1759] path: 1-s2.0-S0896844617305715-main.pdf, doi: ['doi:10.1016/j.supflu.2017.10.020']
Stream has ended unexpectedly
[1761] path: acs.energyfuels.6b03022.pdf, doi: ['10.1021/acs.energyfuels.6b03022EnergyFuels']
[1762] path: 1-s2.0-S2211926416300650-main.pdf, doi: ['10.1016/j.algal.2016.02.026']
[1763] path: 1-s2.0-S0960852414009328

[1850] path: 1-s2.0-S016523700800123X-main.pdf, doi: ['10.1016/j.jaap.2008.09.005']
[1851] path: 1-s2.0-S0960852416314651-main.pdf, doi: ['doi:10.1016/j.biortech.2016.10.059']
[1852] path: 1-s2.0-S2211926416308062-main.pdf, doi: ['doi:10.1016/j.algal.2017.05.010']
[1853] path: C5RA10503J.pdf, doi: ['doi:10.1039/C5RA10503J']
[1854] path: 1-s2.0-S0360319908007398-main.pdf, doi: ['10.1016/j.ijhydene.2008.06.024']
[1855] path: 1-s2.0-S0016236115011084-main.pdf, doi: ['doi:10.1016/j.fuel.2015.10.094']
[1856] path: c8ra08971j.pdf, doi: ['doi:10.1039/C8RA08971J']
[1857] path: 1-s2.0-S0960852416312196-main.pdf, doi: ['doi:10.1016/j.biortech.2016.08.091']
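
Several entries in the log above carry a `doi:` or `doi.org/` label, a trailing `;` or `,`, or a `|EnergyFuels`-style suffix, so the raw strings are not yet uniform keys. The following is a minimal sketch of one possible clean-up step. `normalize_doi` is a hypothetical helper, not part of the original notebook, and the pattern it uses is an assumption about what a DOI looks like in this dataset. It does not try to remove journal names fused directly onto the DOI (for example `...9b00954EnergyFuels`), which would likely need publisher-specific rules.

```python
import re

# Hypothetical helper, not part of the original notebook.
# Assumed DOI shape: "10.", 4 to 9 registrant digits, "/", then non-space characters.
_DOI_CORE = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(raw):
    """Return the DOI without a leading label or obvious trailing debris, or None."""
    match = _DOI_CORE.search(raw)
    if match is None:
        return None
    doi = match.group(0)
    # Cut at a '|' separator, as seen in entries such as '...|EnergyFuels'.
    doi = doi.split("|", 1)[0]
    # Drop trailing punctuation such as ';' or ','.
    return doi.rstrip(";,.")


print(normalize_doi("doi:10.1016/j.enconman.2013.11.026"))
print(normalize_doi("10.1021/ef3020298|EnergyFuels"))
```

Applying a helper like this before saving would make the `doi:`-prefixed and bare forms compare equal when the JSON is merged on DOI.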


In [4]:
# Build a pandas DataFrame of filenames and DOIs, indexed by filename, for the JSON export.
df = pd.DataFrame({'Filename': filenames, 'Doi': dois})
df.set_index('Filename', inplace=True)
df

Filename,Doi
1-s2.0-S0196890413007474-main.pdf,[doi:10.1016/j.enconman.2013.11.026]
1-s2.0-S2211926416307834-main.pdf,[10.1016/j.algal.2017.03.017]
1-s2.0-S0961953412000451-main.pdf,[doi:10.1016/j.biombioe.2012.01.035]
bbb.1974.pdf,[10.1002/bbb.1974;]
gcbb.12359.pdf,[10.1111/gcbb.12359]
...,...
C5RA10503J.pdf,[doi:10.1039/C5RA10503J]
1-s2.0-S0360319908007398-main.pdf,[10.1016/j.ijhydene.2008.06.024]
1-s2.0-S0016236115011084-main.pdf,[doi:10.1016/j.fuel.2015.10.094]
c8ra08971j.pdf,[doi:10.1039/C8RA08971J]
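
Some files in the extraction log end up with an empty DOI list (`doi: []`). A small sketch of how those files could be listed for manual follow-up is shown here. `files_missing_doi` is a hypothetical helper, and the sample rows are copied from the log output rather than computed from the full dataset.

```python
def files_missing_doi(filenames, dois):
    """Return the filenames whose extracted DOI list is empty."""
    return [name for name, doi in zip(filenames, dois) if not doi]


# Tiny illustration with three rows taken from the extraction log.
sample_files = ["ef101232t.pdf", "ep.12713.pdf", "1.5043227.pdf"]
sample_dois = [[], ["10.1002/ep.12713"], []]
print(files_missing_doi(sample_files, sample_dois))
# prints ['ef101232t.pdf', '1.5043227.pdf']
```

In the notebook itself, the same function could be called with the `filenames` and `dois` lists built during extraction.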


In [5]:
# Save the DOI data as JSON.
df.to_json("../data/doi_data.json")
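
Because `to_json` is called here with its default `orient`, the file stores the frame as `{column: {index: value}}`, which for this frame means `{"Doi": {filename: [doi, ...]}}`. The sketch below shows one way a downstream step might read it back into a plain filename-to-DOI mapping using only the standard library. `load_doi_map` is a hypothetical helper, and taking the first list element as the DOI is an assumption.

```python
import json


def load_doi_map(path):
    """Read the saved JSON and return {filename: first DOI string or None}."""
    with open(path) as fh:
        payload = json.load(fh)
    # Default DataFrame.to_json layout: {"Doi": {filename: [doi, ...]}}.
    return {
        fname: (values[0] if values else None)
        for fname, values in payload["Doi"].items()
    }


# Example call, using the path written by the cell above:
# doi_map = load_doi_map("../data/doi_data.json")
```

Files for which no DOI was found map to `None`, so they can be filtered out before the merge.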