## Read PubMed text export

Steps to get the relevant PubMed export:
+ Go to [PubMed](https://pubmed.ncbi.nlm.nih.gov/)
+ Type in your (advanced) search
+ Click 'Save'
+ In 'Selection' select 'All results'
+ In 'Format' select 'PubMed'
+ Click 'Create file'

You should now have a text file in your downloads folder. Move it somewhere this notebook can read it, for example the folder that contains the notebook.

In [None]:
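import pandas as pd

filename = 'pubmed-NLPTitleAb-set.txt' # Change filename to match your file!

# Read the export line by line (read_csv with sep='\n' fails in recent pandas versions).
# Blank lines between records are skipped.
with open(filename, encoding='utf-8') as f:
    lines = [line.rstrip('\n') for line in f if line.strip()]

df = pd.DataFrame(lines)
df.head()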
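
For orientation: each line in the PubMed format starts with a field tag padded to four characters, followed by `- ` and the value, and wrapped lines continue with six spaces of indentation. The sketch below splits a line into tag and value. The sample lines and the helper name `split_tag` are made up for illustration and are not part of the export or of this notebook's code.

```python
def split_tag(line):
    """Split one PubMed-format line into (tag, value).

    Continuation lines (indented, no tag) return an empty tag.
    """
    if line.startswith('      '):
        return '', line[6:]
    return line[0:4].strip(), line[6:]


# Hypothetical sample lines in the PubMed export layout
sample = [
    'PMID- 12345678',
    'TI  - An example title',
    'AB  - First line of the abstract that is',
    '      wrapped onto a second line.',
    'OT  - *natural language processing',
]

parsed = [split_tag(line) for line in sample]
for tag, value in parsed:
    print(repr(tag), value)
```

The extraction loop in the next section relies on the same layout: it looks at the first characters of a line to recognise the tag and takes the value from position 6 onwards.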

## Loop through records and extract abstract

In [None]:
# Keep track of the keywords and abstract of every record
keyword_list = []
abstract_list = []


def store_record(keywords, abstract):
    """Append one finished record; records without an abstract become 'EMPTY'."""
    if len(abstract) == 0:
        keyword_list.append('EMPTY')
        abstract_list.append('EMPTY')
    else:
        # Drop the leading ' | ' separator from the keyword string
        keyword_list.append(keywords[3:])
        abstract_list.append(abstract)


lines = df[0].tolist()
keywords = ''
abstract = ''
for index, element in enumerate(lines):
    if element[0:4] == 'AB  ':
        # First line of the abstract; the text starts after the 'AB  - ' tag
        abstract += element[6:]
        # Continuation lines are indented with spaces instead of a tag
        counter = 1
        while index + counter < len(lines):
            new_element = lines[index + counter]
            if new_element[0:4] != '    ':
                break
            # Join wrapped lines with a space so words do not run together
            abstract += ' ' + new_element[6:]
            counter += 1

    elif element[0:3] == 'OT ':
        # Keyword text starts at position 6, or 7 if it is marked with '*'
        if element[6:7] == '*':
            keywords += ' | ' + element[7:]
        else:
            keywords += ' | ' + element[6:]

    elif element[0:4] == 'PMID' and index > 0:
        # A new publication starts: store the previous record and reset
        store_record(keywords, abstract)
        keywords = ''
        abstract = ''

# Store the last record, using the same rules as for all other records
store_record(keywords, abstract)

print('Looped through records and found ' + str(len(keyword_list)) + ' records.')

Check whether the count printed above matches the number of results in your PubMed search. Papers without an abstract are still counted, so a mismatch suggests that something went wrong while parsing the file.
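
One way to make this check concrete is to count the lines that start a record and compare that number with the length of the result lists. This is a minimal sketch, assuming every record begins with a `PMID` line; `count_records` is a hypothetical helper, and the sample lines stand in for `df[0]`.

```python
def count_records(lines):
    """Count the records in a PubMed export by counting 'PMID' lines."""
    return sum(1 for line in lines if line[0:4] == 'PMID')


# Hypothetical sample; in the notebook you would pass df[0] instead
sample_lines = [
    'PMID- 11111111',
    'AB  - Abstract of the first record.',
    'PMID- 22222222',
    'TI  - A record without an abstract',
]

expected = count_records(sample_lines)
print('Records in file:', expected)
```

In the notebook, `count_records(df[0])` should equal `len(abstract_list)`, provided the first line of the file is a `PMID` line.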

## Export Abstract & Keywords to CSV

In [None]:
result_df = pd.DataFrame({'abstract': abstract_list, 'keywords': keyword_list})

result_df.to_csv('PubMedQueryAdditions.csv', index=False, encoding='utf-8')
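
To confirm that the export can be read back unchanged, a round trip through pandas is a quick test. The sketch below uses a small made-up DataFrame (`demo_df`) and an in-memory buffer instead of the real file, so it runs on its own; both are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for result_df
demo_df = pd.DataFrame({
    'abstract': ['An abstract, with a comma.', 'EMPTY'],
    'keywords': ['nlp | text mining', 'EMPTY'],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
demo_df.to_csv(buffer, index=False)
buffer.seek(0)

# keep_default_na=False stops pandas from turning empty strings into NaN
restored = pd.read_csv(buffer, keep_default_na=False)
print(restored.head())
```

Commas inside an abstract are not a problem here, because `to_csv` quotes such fields and `read_csv` unquotes them again.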

You now have the abstracts of your search results available and are ready for title and abstract screening!