# Make dataset

This notebook contains the code to analyse the content of the PubMed Central Author Manuscript Collection. \
See: https://www.ncbi.nlm.nih.gov/pmc/about/mscollection/

Files can be downloaded here: https://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/ \
**Please ensure** that the files are downloaded into the `pmc_dataset` folder before proceeding.

The resulting tables will be created in the `dataset_with_refs` folder.
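As a quick sanity check before running the cells, the folder layout described above can be verified and the output folder created. This is only a minimal sketch: the helper name `prepare_folders` is hypothetical, and only the two folder names and `filelist.txt` come from this notebook.

```python
import os

def prepare_folders(input_dir="pmc_dataset", output_dir="dataset_with_refs"):
    """Check that the downloaded file list is present and create the output folder."""
    filelist = os.path.join(input_dir, "filelist.txt")
    if not os.path.isfile(filelist):
        raise FileNotFoundError(f"expected {filelist}; download the collection first")
    # exist_ok=True makes the call safe to repeat when the notebook is re-run.
    os.makedirs(output_dir, exist_ok=True)
    return filelist
```

Calling `prepare_folders()` once at the top of the notebook would fail early with a readable message instead of partway through the processing loop.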

In [1]:
dict_articles = {}

# These files should be downloaded from https://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/
with open("pmc_dataset/filelist.txt", 'r') as f:
    for line in f:
        filename, pmcid, pmid, mid = line.split()
        if filename == 'File':
            continue
        dict_articles[pmid] = filename

In [2]:
print(list(dict_articles.items())[:10])

[('15950352', 'PMC0012XXXXX/PMC1249490.xml'), ('15898963', 'PMC0012XXXXX/PMC1249491.xml'), ('14566027', 'PMC0012XXXXX/PMC1249508.xml'), ('15200711', 'PMC0012XXXXX/PMC1266050.xml'), ('15871598', 'PMC0012XXXXX/PMC1266051.xml'), ('15855276', 'PMC0012XXXXX/PMC1274277.xml'), ('16013438', 'PMC0012XXXXX/PMC1282457.xml'), ('15356073', 'PMC0012XXXXX/PMC1283128.xml'), ('15284285', 'PMC0012XXXXX/PMC1283142.xml'), ('15460925', 'PMC0013XXXXX/PMC1314973.xml')]
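The loop in cell 1 unpacks every line into exactly four fields, so a line with a missing value would raise `ValueError`. The sketch below shows one more defensive way to build the same PMID-to-filename mapping. `parse_filelist` is a hypothetical helper, the four-column layout is assumed from the unpacking in cell 1 rather than taken from NCBI documentation, and the header text and MID value in the sample are placeholders.

```python
import io

def parse_filelist(lines):
    """Map PMID -> relative XML path, skipping the header and malformed lines."""
    articles = {}
    for line in lines:
        fields = line.split()
        # Assumed layout: File, PMCID, PMID, MID (inferred from the unpacking in cell 1).
        if len(fields) != 4 or fields[0] == "File":
            continue
        filename, _pmcid, pmid, _mid = fields
        articles[pmid] = filename
    return articles

# Made-up sample: the first data row reuses a filename and PMID printed above,
# the MID is a placeholder, and the last line is deliberately malformed.
sample = io.StringIO(
    "File\tPMCID\tPMID\tMID\n"
    "PMC0012XXXXX/PMC1249490.xml\tPMC1249490\t15950352\tMID_PLACEHOLDER\n"
    "broken line\n"
)
print(parse_filelist(sample))  # {'15950352': 'PMC0012XXXXX/PMC1249490.xml'}
```

Silently skipping malformed lines is a design choice; counting them and reporting the total may be preferable when the size of the dataset matters.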


In [4]:
from lxml import etree

tree = etree.parse("pmc_dataset/" + list(dict_articles.values())[0])

In [5]:
dir(tree.getroot())

['__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_init',
 'addnext',
 'addprevious',
 'append',
 'attrib',
 'base',
 'clear',
 'cssselect',
 'extend',
 'find',
 'findall',
 'findtext',
 'get',
 'getchildren',
 'getiterator',
 'getnext',
 'getparent',
 'getprevious',
 'getroottree',
 'index',
 'insert',
 'items',
 'iter',
 'iterancestors',
 'iterchildren',
 'iterdescendants',
 'iterfind',
 'itersiblings',
 'itertext',
 'keys',
 'makeelement',
 'nsmap',
 'prefix',
 'remove',
 'replace',
 'set',
 'sourceline',
 'tag',
 'tail',
 'text',
 'values',
 'xpath']
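`dir()` lists the methods of the element object rather than the content of the document. To see how an article is organised, it can be more informative to list the tags of the root's children. The sketch below uses the standard-library `xml.etree.ElementTree` on a tiny made-up document so that it runs without the dataset; iterating over the `lxml` root and reading `tag` should work the same way.

```python
import xml.etree.ElementTree as ET

# Made-up miniature article; real files are far larger.
xml_text = """<article>
  <front><article-meta><abstract><p>Short abstract.</p></abstract></article-meta></front>
  <body><sec><title>Discussion</title><p>Text.</p></sec></body>
  <back><ref-list/></back>
</article>"""

root = ET.fromstring(xml_text)
top_level = [child.tag for child in root]
print(top_level)  # ['front', 'body', 'back']
```

These three elements are the ones the functions in the following cells look up with `find('front')`, `find('body')` and `find('back')`.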

In [6]:
import nltk

def split_text(text):
    sents = nltk.tokenize.sent_tokenize(text)
    res_sents = []
    i = 0
    while i < len(sents):
        check = False
        if i + 1 < len(sents):
            check = sents[i + 1].strip()[0].islower() or sents[i + 1].strip()[0].isdigit()
        made = sents[i]
        while i + 1 < len(sents) and (made.endswith('Fig.') or check):
            made += " " + " ".join(sents[i + 1].strip().split())
            i += 1
            if i + 1 < len(sents):
                check = sents[i + 1].strip()[0].islower() or sents[i + 1].strip()[0].isdigit()
        res_sents.append(" ".join(made.strip().split()))
        i += 1
    return res_sents

def get_sentences(node):
    def helper(node, is_disc):
        if node.tag == 'xref':
            # Replace the citation link with a placeholder token, e.g. ` xref_bibr_R28 `.
            ntail = node.tail or ''
            res = f' xref_{node.get("ref-type")}_{node.get("rid")} ' + ntail
            if is_disc:
                return '', res
            return res, ''
        if node.tag == 'title':
            if node.tail is None:
                return '', ''
            if is_disc:
                return '', node.tail
            return node.tail, ''
        if not is_disc and node.find('title') is not None:
            title = "".join(node.find('title').itertext()).lower()
            if 'discussion' in title:
                is_disc = True
        st_text = ''
        if node.text is not None:
            st_text = node.text
        if is_disc:
            n_disc = st_text
            n_gen = ""
        else:
            n_gen = st_text
            n_disc = ""
        for ch in node.getchildren():
            gen, disc = helper(ch, is_disc)
            n_gen += gen
            n_disc += disc
        tail = ""
        if node.tail is not None:
            tail = node.tail
        if is_disc:
            n_disc += tail
        else:
            n_gen += tail
        return n_gen, n_disc
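    # NOTE: `node.find('body')` is None for articles without a <body>; helper() then
    # raises AttributeError on `node.tag`, and the calling loop skips the article.
    gen_res, disc_res = helper(node.find('body'), False)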
    gen_res = split_text(gen_res)
    disc_res = split_text(disc_res)
    
    abstract = ""
    
    try:
        abstract = "".join(node.find('front').find('article-meta').find('abstract').itertext())
        abstract = " ".join(abstract.strip().split())
    except Exception:
        pass
    return gen_res, disc_res, abstract
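
The merging rule inside `split_text` can be tried without NLTK by feeding it sentences that are already split. The sketch below is a simplified restatement, not the function above: `merge_fragments` is a hypothetical name, the input list is made up, and empty fragments are skipped here (the original would raise `IndexError` on one).

```python
def merge_fragments(sents):
    """Glue a fragment onto the previous sentence when the split looks spurious."""
    merged = []
    for sent in sents:
        sent = " ".join(sent.split())  # normalise internal whitespace
        if not sent:
            continue
        # A fragment starting with a lowercase letter or a digit is unlikely
        # to be the start of a real sentence.
        starts_low = sent[0].islower() or sent[0].isdigit()
        if merged and (merged[-1].endswith("Fig.") or starts_low):
            merged[-1] += " " + sent
        else:
            merged.append(sent)
    return merged

parts = [
    "Expression rose sharply (Fig.",
    "3).",
    "see also the supplement.",
    "A new sentence starts here.",
]
print(merge_fragments(parts))
# ['Expression rose sharply (Fig. 3). see also the supplement.', 'A new sentence starts here.']
```

The heuristic trades recall for simplicity: a genuine sentence that begins with a number is also merged into its predecessor.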

In [7]:
tree = etree.parse("pmc_dataset/PMC0020XXXXX/PMC2000292.xml")

In [8]:
sents = get_sentences(tree.getroot())

In [9]:
sents

(['Autism and attention deficit hyperactivity disorder (ADHD) are two common, largely genetic, childhood-onset psychiatric disorders affecting key fronto-striatal and fronto-parietal circuits that are important for executive function ( xref_bibr_R28 ; xref_bibr_R35 ; xref_bibr_R69 ; xref_bibr_R85 ).',
  'These two disorders differ substantially in symptom presentation, but they also share a number of important features ( xref_bibr_R95 ).',
  'Despite the exclusion of one disorder in the formal diagnosis of the other, there appears to be a degree of comorbidity (or a sharing of symptoms) between the two disorders ( xref_bibr_R44 ; xref_bibr_R46 ; xref_bibr_R93 ).',
  'Both disorders have a strong genetic component to their aetiology, with heritability estimates of 0.9 for autism and 0.7 for ADHD ( xref_bibr_R4 ; xref_bibr_R33 ); indeed there is preliminary evidence of genetic linkage in autism and ADHD at chromosomal locations 2q24 and 16p13 ( xref_bibr_R36 ; xref_bibr_R47 ).',
  'Execu

In [10]:
print(len(sents[0]), len(sents[1]), len(sents[2]))

182 73 1763
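
Note that the three numbers are not directly comparable: the first two are sentence counts (general and discussion), while the third is the length of the abstract string in characters. To illustrate how citations end up as `xref_<ref-type>_<rid>` tokens in those sentences, here is a minimal sketch on a made-up paragraph using the standard-library `xml.etree.ElementTree`. `flatten` is a hypothetical helper that mirrors only the `xref` branch of `helper`, not the title or discussion handling.

```python
import xml.etree.ElementTree as ET

def flatten(node):
    """Return the text of `node`, replacing each <xref> with a placeholder token."""
    if node.tag == "xref":
        # The link text itself is dropped; only the reference type and target id are kept.
        return f' xref_{node.get("ref-type")}_{node.get("rid")} ' + (node.tail or "")
    parts = [node.text or ""]
    for child in node:
        parts.append(flatten(child))
    parts.append(node.tail or "")
    return "".join(parts)

# Made-up paragraph with a single bibliographic cross-reference.
p = ET.fromstring(
    '<p>Both disorders are heritable (<xref ref-type="bibr" rid="R4">4</xref>).</p>'
)
print(" ".join(flatten(p).split()))
# Both disorders are heritable ( xref_bibr_R4 ).
```

The spaces around the token explain the `( xref_bibr_R28 ; ...)` spacing visible in the output of cell 9.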


In [11]:
def get_all_refs(node):
    
    def get_cit_id_type(node):
        if node.find('element-citation') is None:
            return None
        if node.find('element-citation').find('pub-id') is None:
            return None
        return node.find('element-citation').find('pub-id').get('pub-id-type')
        
    
    def get_citation_info(node):
        if node is None:
            return {}
        res = {}
        for ch in node.getchildren():
            if ch.tag == 'ref':
                id_type = get_cit_id_type(ch)
                if id_type == 'pmid':
                    res[ch.get('id')] = {
                        'publication-type': ch.find('element-citation').get('publication-type'),
                        'pmid': ch.find('element-citation').find('pub-id').text
                    }
        return res
    def get_figs_info(node):
        if node is None:
            return {}
        res = {}
        for ch in node.getchildren():
            if ch.tag == 'fig' and ch.find('caption') is not None:
                res[ch.get('id')] = " ".join(''.join(ch.find('caption').itertext()).strip().split())
        return res
    def get_tables_info(node):
        if node is None:
            return {}
        res = {}
        for ch in node.getchildren():
            if ch.tag == 'table-wrap' and ch.find('caption') is not None:
                res[ch.get('id')] = " ".join(''.join(ch.find('caption').itertext()).strip().split())
        return res
        
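    # NOTE: articles without a <back> element raise AttributeError here
    # ('NoneType' object has no attribute 'find') and are skipped by the caller.
    citations = get_citation_info(node.find('back').find('ref-list'))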
    figs = get_figs_info(node.find('floats-group'))
    tables = get_tables_info(node.find('floats-group'))
    return citations, figs, tables

In [12]:
get_all_refs(tree.getroot())

({'R1': {'publication-type': 'journal', 'pmid': '10501551'},
  'R3': {'publication-type': 'journal', 'pmid': '7480451'},
  'R5': {'publication-type': 'journal', 'pmid': '17131588'},
  'R6': {'publication-type': 'journal', 'pmid': '16168728'},
  'R7': {'publication-type': 'journal', 'pmid': '15381021'},
  'R8': {'publication-type': 'journal', 'pmid': '15037868'},
  'R9': {'publication-type': 'journal', 'pmid': '10560028'},
  'R10': {'publication-type': 'journal', 'pmid': '15660647'},
  'R12': {'publication-type': 'journal', 'pmid': '1592761'},
  'R13': {'publication-type': 'journal', 'pmid': '10376114'},
  'R14': {'publication-type': 'journal', 'pmid': '15949999'},
  'R15': {'publication-type': 'journal', 'pmid': '10734014'},
  'R16': {'publication-type': 'journal', 'pmid': '15652870'},
  'R17': {'publication-type': 'journal', 'pmid': '7977887'},
  'R18': {'publication-type': 'journal', 'pmid': '8660127'},
  'R19': {'publication-type': 'journal', 'pmid': '12365958'},
  'R20': {'publicat
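
Some reference ids (for example `R2` and `R4`) are absent from the dictionary above because only references whose first `pub-id` is a PMID are kept. The sketch below applies the same filter to a two-entry reference list to make that concrete. The XML is made up for illustration (only the `R1` values are copied from the output above), and the standard-library parser stands in for `lxml`.

```python
import xml.etree.ElementTree as ET

ref_list = ET.fromstring("""<ref-list>
  <ref id="R1"><element-citation publication-type="journal">
    <pub-id pub-id-type="pmid">10501551</pub-id></element-citation></ref>
  <ref id="R2"><element-citation publication-type="book"/></ref>
</ref-list>""")

citations = {}
for ref in ref_list.findall("ref"):
    cit = ref.find("element-citation")
    pub_id = cit.find("pub-id") if cit is not None else None
    # References without a PMID (here R2) are dropped, as in get_citation_info.
    if pub_id is not None and pub_id.get("pub-id-type") == "pmid":
        citations[ref.get("id")] = {
            "publication-type": cit.get("publication-type"),
            "pmid": pub_id.text,
        }
print(citations)  # {'R1': {'publication-type': 'journal', 'pmid': '10501551'}}
```

A consequence is that `xref_bibr_` tokens in the sentences can point at ids that have no row in the citation table.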

In [13]:
import re

# Raw string avoids invalid-escape warnings; \w already matches digits.
pattern = re.compile(r"(?<=xref_bibr_)\w+")

def count_reverse(sents_gen, sents_disc, pmid):
    result = []
    for i, sent in enumerate(sents_gen):
        result.extend((pmid, 'general', str(i), ref) for ref in pattern.findall(sent))
    for i, sent in enumerate(sents_disc):
        result.extend((pmid, 'discussion', str(i), ref) for ref in pattern.findall(sent))
    return result

In [14]:
gen_sents, disc_sents, abst = get_sentences(tree.getroot())
count_reverse(gen_sents, disc_sents, '2000292')

[('2000292', 'general', '0', 'R28'),
 ('2000292', 'general', '0', 'R35'),
 ('2000292', 'general', '0', 'R69'),
 ('2000292', 'general', '0', 'R85'),
 ('2000292', 'general', '1', 'R95'),
 ('2000292', 'general', '2', 'R44'),
 ('2000292', 'general', '2', 'R46'),
 ('2000292', 'general', '2', 'R93'),
 ('2000292', 'general', '3', 'R4'),
 ('2000292', 'general', '3', 'R33'),
 ('2000292', 'general', '3', 'R36'),
 ('2000292', 'general', '3', 'R47'),
 ('2000292', 'general', '4', 'R40'),
 ('2000292', 'general', '4', 'R104'),
 ('2000292', 'general', '5', 'R19'),
 ('2000292', 'general', '5', 'R53'),
 ('2000292', 'general', '6', 'R7'),
 ('2000292', 'general', '6', 'R60'),
 ('2000292', 'general', '6', 'R97'),
 ('2000292', 'general', '10', 'R82'),
 ('2000292', 'general', '12', 'R25'),
 ('2000292', 'general', '12', 'R26'),
 ('2000292', 'general', '12', 'R34'),
 ('2000292', 'general', '12', 'R72'),
 ('2000292', 'general', '12', 'R78'),
 ('2000292', 'general', '13', 'R11'),
 ('2000292', 'general', '13', 'R
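
The lookbehind pattern can be checked in isolation on one of the sentences shown earlier. In this sketch the pattern is written as a raw string with `\w+`, which already covers digits. The second input string is made up to show that tokens with another reference type are ignored.

```python
import re

xref_pattern = re.compile(r"(?<=xref_bibr_)\w+")

sentence = ("These two disorders differ substantially in symptom presentation, "
            "but they also share a number of important features ( xref_bibr_R95 ).")
print(xref_pattern.findall(sentence))  # ['R95']

# Made-up string: the figure token is skipped, both bibliography tokens are found.
mixed = "see xref_fig_F1 and xref_bibr_R4 ; xref_bibr_R33"
print(xref_pattern.findall(mixed))  # ['R4', 'R33']
```

Because the lookbehind is not part of the match, `findall` returns the bare reference ids, which is what `count_reverse` stores in the last tuple field.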

In [15]:
root_dir = "dataset_with_refs"

In [16]:
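import os
import traceback

# The files below are opened in append mode, so the folder must exist beforehand;
# re-running this cell appends duplicate rows unless the old files are deleted first.
os.makedirs(root_dir, exist_ok=True)

num_id = 0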

for id, filename in list(dict_articles.items()):
    print(f'\r{num_id} {filename}', end='')
    num_id += 1
    try:
        tree = etree.parse("pmc_dataset/" + filename).getroot()
        gen_sents, disc_sents, abstract = get_sentences(tree)
        cits, figs, tables = get_all_refs(tree)
    except Exception as e:
        print("\rsomething went wrong", id, filename, e)
        continue
    with open(f'{root_dir}/sentences.csv', 'a') as f:
        for i, sent in enumerate(gen_sents):
            print('\t'.join([id, str(i), 'general', sent]), file=f)
        for i, sent in enumerate(disc_sents):
            print('\t'.join([id, str(i), 'discussion', sent]), file=f)
    if abstract != '':
        with open(f'{root_dir}/abstracts.csv', 'a') as f:
            print('\t'.join([id, abstract]), file=f)
    with open(f'{root_dir}/citations.csv', 'a') as f:
        for i, dic in cits.items():
            print('\t'.join([id, str(i), dic['publication-type'], dic['pmid']]), file=f)
    with open(f'{root_dir}/figures.csv', 'a') as f:
        for i, text in figs.items():
            print('\t'.join([id, i, text]), file=f)
    with open(f'{root_dir}/tables.csv', 'a') as f:
        for i, text in tables.items():
            print('\t'.join([id, i, text]), file=f)
    with open(f'{root_dir}/reverse_ref.csv', 'a') as f:
        res = count_reverse(gen_sents, disc_sents, id)
        for row in res:
            print('\t'.join(row), file=f)
    
    

something went wrong 16454893 PMC0019XXXXX/PMC1914215.xml 'NoneType' object has no attribute 'find'
something went wrong 19030111 PMC0025XXXXX/PMC2585368.xml 'NoneType' object has no attribute 'find'
something went wrong 19012720 PMC0026XXXXX/PMC2610361.xml 'NoneType' object has no attribute 'find'
something went wrong 19194517 PMC0026XXXXX/PMC2633698.xml 'NoneType' object has no attribute 'find'
something went wrong 18504525 PMC0026XXXXX/PMC2659409.xml 'NoneType' object has no attribute 'find'
something went wrong 18941897 PMC0026XXXXX/PMC2673715.xml 'NoneType' object has no attribute 'find'
something went wrong 19498953 PMC0026XXXXX/PMC2690060.xml 'NoneType' object has no attribute 'find'
something went wrong 18491029 PMC0026XXXXX/PMC2692295.xml 'NoneType' object has no attribute 'find'
something went wrong 19028819 PMC0027XXXXX/PMC2708002.xml 'NoneType' object has no attribute 'find'
something went wrong 18949557 PMC0027XXXXX/PMC2710583.xml 'NoneType' object has no attribute 'find'


something went wrong 23356976 PMC0035XXXXX/PMC3577066.xml 'NoneType' object has no attribute 'find'
something went wrong 23505327 PMC0035XXXXX/PMC3598601.xml 'NoneType' object has no attribute 'find'
something went wrong 11063928 PMC0036XXXXX/PMC3604808.xml 'NoneType' object has no attribute 'find'
something went wrong 23541219 PMC0036XXXXX/PMC3668554.xml 'NoneType' object has no attribute 'find'
something went wrong 23578147 PMC0036XXXXX/PMC3670800.xml 'NoneType' object has no attribute 'find'
something went wrong 18952577 PMC0036XXXXX/PMC3674821.xml 'NoneType' object has no attribute 'find'
something went wrong 19689313 PMC0036XXXXX/PMC3690776.xml 'NoneType' object has no attribute 'find'
something went wrong 19571662 PMC0036XXXXX/PMC3693554.xml 'NoneType' object has no attribute 'find'
something went wrong 19745084 PMC0036XXXXX/PMC3695641.xml 'NoneType' object has no attribute 'find'
something went wrong 19186705 PMC0037XXXXX/PMC3700418.xml 'NoneType' object has no attribute 'find'


something went wrong 19425012 PMC0041XXXXX/PMC4110958.xml 'NoneType' object has no attribute 'find'
something went wrong 25089046 PMC0041XXXXX/PMC4115454.xml 'NoneType' object has no attribute 'find'
something went wrong 23993379 PMC0041XXXXX/PMC4116427.xml 'NoneType' object has no attribute 'find'
something went wrong 25101147 PMC0041XXXXX/PMC4120492.xml 'NoneType' object has no attribute 'find'
something went wrong 24986555 PMC0041XXXXX/PMC4120661.xml 'NoneType' object has no attribute 'find'
something went wrong 22170532 PMC0041XXXXX/PMC4124935.xml 'NoneType' object has no attribute 'find'
something went wrong 20560524 PMC0041XXXXX/PMC4138042.xml 'NoneType' object has no attribute 'find'
something went wrong 23209036 PMC0041XXXXX/PMC4139095.xml 'NoneType' object has no attribute 'find'
something went wrong 25152699 PMC0041XXXXX/PMC4140215.xml 'NoneType' object has no attribute 'find'
something went wrong 26580390 PMC0041XXXXX/PMC4140216.xml 'NoneType' object has no attribute 'find'


something went wrong 25448597 PMC0044XXXXX/PMC4418787.xml 'NoneType' object has no attribute 'find'
something went wrong 25210891 PMC0044XXXXX/PMC4427039.xml 'NoneType' object has no attribute 'find'
something went wrong 23196840 PMC0044XXXXX/PMC4429872.xml 'NoneType' object has no attribute 'find'
something went wrong 19673811 PMC0044XXXXX/PMC4430085.xml 'NoneType' object has no attribute 'find'
something went wrong 23764702 PMC0044XXXXX/PMC4435688.xml 'NoneType' object has no attribute 'tag'
something went wrong 25237734 PMC0044XXXXX/PMC4436587.xml 'NoneType' object has no attribute 'find'
something went wrong 25953364 PMC0044XXXXX/PMC4442048.xml 'NoneType' object has no attribute 'find'
something went wrong 26023301 PMC0044XXXXX/PMC4443689.xml 'NoneType' object has no attribute 'find'
something went wrong 25282314 PMC0044XXXXX/PMC4445144.xml 'NoneType' object has no attribute 'find'
something went wrong 18763214 PMC0044XXXXX/PMC4445826.xml 'NoneType' object has no attribute 'find'

something went wrong 20006785 PMC0050XXXXX/PMC5003407.xml 'NoneType' object has no attribute 'find'
something went wrong 27454593 PMC0050XXXXX/PMC5014707.xml 'NoneType' object has no attribute 'tag'
something went wrong 25972072 PMC0050XXXXX/PMC5021303.xml 'NoneType' object has no attribute 'find'
something went wrong 27278223 PMC0050XXXXX/PMC5029785.xml 'NoneType' object has no attribute 'find'
something went wrong 25202864 PMC0050XXXXX/PMC5036449.xml 'NoneType' object has no attribute 'find'
something went wrong 27695670 PMC0050XXXXX/PMC5040067.xml 'NoneType' object has no attribute 'tag'
something went wrong 27799659 PMC0050XXXXX/PMC5051632.xml 'NoneType' object has no attribute 'tag'
something went wrong 27746663 PMC0050XXXXX/PMC5057184.xml 'NoneType' object has no attribute 'find'
something went wrong 25539589 PMC0050XXXXX/PMC5058355.xml 'NoneType' object has no attribute 'find'
something went wrong 27648707 PMC0050XXXXX/PMC5067956.xml 'NoneType' object has no attribute 'tag'

something went wrong 27890308 PMC0057XXXXX/PMC5714310.xml 'NoneType' object has no attribute 'find'
something went wrong 29112854 PMC0057XXXXX/PMC5716339.xml 'NoneType' object has no attribute 'find'
something went wrong 29141231 PMC0057XXXXX/PMC5716830.xml 'NoneType' object has no attribute 'find'
something went wrong 29076915 PMC0057XXXXX/PMC5716866.xml 'NoneType' object has no attribute 'tag'
something went wrong 23099219 PMC0057XXXXX/PMC5718191.xml 'NoneType' object has no attribute 'find'
something went wrong 29226028 PMC0057XXXXX/PMC5722461.xml 'NoneType' object has no attribute 'find'
something went wrong 28131468 PMC0057XXXXX/PMC5723153.xml 'NoneType' object has no attribute 'find'
something went wrong 28281920 PMC0057XXXXX/PMC5723158.xml 'NoneType' object has no attribute 'find'
something went wrong 28777787 PMC0057XXXXX/PMC5724517.xml 'NoneType' object has no attribute 'find'
something went wrong 28869186 PMC0057XXXXX/PMC5724971.xml 'NoneType' object has no attribute 'find'

something went wrong 29689200 PMC0059XXXXX/PMC5922980.xml 'NoneType' object has no attribute 'find'
something went wrong 23424150 PMC0059XXXXX/PMC5927555.xml 'NoneType' object has no attribute 'find'
something went wrong 29098892 PMC0059XXXXX/PMC5929121.xml 'NoneType' object has no attribute 'find'
something went wrong 24472370 PMC0059XXXXX/PMC5930028.xml 'NoneType' object has no attribute 'find'
something went wrong 28386122 PMC0059XXXXX/PMC5931716.xml 'NoneType' object has no attribute 'find'
something went wrong 27646117 PMC0059XXXXX/PMC5937994.xml 'NoneType' object has no attribute 'find'
something went wrong 29523483 PMC0059XXXXX/PMC5938626.xml 'NoneType' object has no attribute 'find'
something went wrong 28358067 PMC0059XXXXX/PMC5938735.xml 'NoneType' object has no attribute 'find'
something went wrong 29727685 PMC0059XXXXX/PMC5940330.xml 'NoneType' object has no attribute 'find'
something went wrong 24796522 PMC0059XXXXX/PMC5942205.xml 'NoneType' object has no attribute 'find'


something went wrong 30184511 PMC0061XXXXX/PMC6186214.xml 'NoneType' object has no attribute 'find'
something went wrong 29772204 PMC0061XXXXX/PMC6192252.xml 'NoneType' object has no attribute 'find'
something went wrong 29782874 PMC0061XXXXX/PMC6192534.xml 'NoneType' object has no attribute 'find'
something went wrong 30017134 PMC0061XXXXX/PMC6192674.xml 'NoneType' object has no attribute 'find'
something went wrong 30220351 PMC0061XXXXX/PMC6193200.xml 'NoneType' object has no attribute 'find'
something went wrong 29274015 PMC0061XXXXX/PMC6197804.xml 'NoneType' object has no attribute 'find'
something went wrong 30369966 PMC0062XXXXX/PMC6202057.xml 'NoneType' object has no attribute 'find'
something went wrong 29690938 PMC0062XXXXX/PMC6202274.xml 'NoneType' object has no attribute 'find'
something went wrong 30119917 PMC0062XXXXX/PMC6204312.xml 'NoneType' object has no attribute 'find'
something went wrong 27838960 PMC0062XXXXX/PMC6205227.xml 'NoneType' object has no attribute 'find'


something went wrong 30613709 PMC0063XXXXX/PMC6319661.xml 'NoneType' object has no attribute 'find'
something went wrong 30613694 PMC0063XXXXX/PMC6319662.xml 'NoneType' object has no attribute 'find'
something went wrong 30613725 PMC0063XXXXX/PMC6319816.xml 'NoneType' object has no attribute 'find'
something went wrong 30613726 PMC0063XXXXX/PMC6319817.xml 'NoneType' object has no attribute 'find'
something went wrong 30613727 PMC0063XXXXX/PMC6319869.xml 'NoneType' object has no attribute 'find'
something went wrong 30613728 PMC0063XXXXX/PMC6319870.xml 'NoneType' object has no attribute 'find'
something went wrong 30613729 PMC0063XXXXX/PMC6319874.xml 'NoneType' object has no attribute 'find'
something went wrong 30613730 PMC0063XXXXX/PMC6319875.xml 'NoneType' object has no attribute 'find'
something went wrong 30613695 PMC0063XXXXX/PMC6319879.xml 'NoneType' object has no attribute 'find'
something went wrong 30613731 PMC0063XXXXX/PMC6319880.xml 'NoneType' object has no attribute 'find'


something went wrong 30097107 PMC0064XXXXX/PMC6457340.xml 'NoneType' object has no attribute 'find'
something went wrong 30620898 PMC0064XXXXX/PMC6467221.xml 'NoneType' object has no attribute 'find'
something went wrong 29544099 PMC0064XXXXX/PMC6469393.xml 'NoneType' object has no attribute 'find'
something went wrong 30609041 PMC0064XXXXX/PMC6469857.xml 'NoneType' object has no attribute 'tag'
something went wrong 24820197 PMC0064XXXXX/PMC6476335.xml 'NoneType' object has no attribute 'find'
something went wrong 30253877 PMC0064XXXXX/PMC6477677.xml 'NoneType' object has no attribute 'find'
something went wrong 30904180 PMC0064XXXXX/PMC6478522.xml 'NoneType' object has no attribute 'find'
something went wrong 30794793 PMC0064XXXXX/PMC6481290.xml 'NoneType' object has no attribute 'find'
something went wrong 30526941 PMC0064XXXXX/PMC6481297.xml 'NoneType' object has no attribute 'find'
something went wrong 30522628 PMC0064XXXXX/PMC6482942.xml 'NoneType' object has no attribute 'find'

something went wrong 30688738 PMC0066XXXXX/PMC6687392.xml 'NoneType' object has no attribute 'tag'
something went wrong 27132076 PMC0066XXXXX/PMC6688604.xml 'NoneType' object has no attribute 'find'
something went wrong 30076050 PMC0066XXXXX/PMC6690053.xml 'NoneType' object has no attribute 'find'
something went wrong 31365878 PMC0066XXXXX/PMC6690723.xml 'NoneType' object has no attribute 'find'
something went wrong 31378536 PMC0066XXXXX/PMC6697045.xml 'NoneType' object has no attribute 'find'
something went wrong 30715990 PMC0066XXXXX/PMC6697159.xml 'NoneType' object has no attribute 'find'
something went wrong 30904132 PMC0066XXXXX/PMC6699622.xml 'NoneType' object has no attribute 'find'
something went wrong 31299203 PMC0067XXXXX/PMC6707723.xml 'NoneType' object has no attribute 'find'
something went wrong 29744623 PMC0067XXXXX/PMC6711376.xml 'NoneType' object has no attribute 'find'
something went wrong 31445666 PMC0067XXXXX/PMC6713268.xml 'NoneType' object has no attribute 'find'
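
The two messages in the log are consistent with two missing elements: `'NoneType' object has no attribute 'tag'` is what `get_sentences` raises when an article has no `<body>`, and `'NoneType' object has no attribute 'find'` is what `get_all_refs` raises when there is no `<back>`. A small check like the following could label skipped files more precisely. It is only a sketch: `missing_parts` is a hypothetical helper, the XML strings are made up, and the standard-library parser stands in for `lxml`.

```python
import xml.etree.ElementTree as ET

def missing_parts(root):
    """Return the names of the expected top-level elements that are absent."""
    # `is None` is used on purpose: an element without children is falsy.
    return [name for name in ("body", "back") if root.find(name) is None]

print(missing_parts(ET.fromstring("<article><front/><body/></article>")))  # ['back']
print(missing_parts(ET.fromstring("<article><front/><back/></article>")))  # ['body']
```

Logging the result of such a check in the `except` branch would make it possible to count how many articles are dropped for each reason.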