# Frontiers Summarizer Scratch Pad

### Content
1) Packages and Modules
2) Extracting Relevant Text and Image URLs
3) Involving OpenAI
4) OpenAI Model Comparison

### 1) Packages and Modules

In [1]:
#Creating connection with websites:
import requests

#Regex operations for data cleaning:
import re

#Extracting relevant items from html:
from bs4 import BeautifulSoup

#Connecting with OpenAI:
import openai

#Importing api_key:
from open_ai_key import OPENAI_API_KEY

### 2) Extracting Relevant Text and Image URLs

In [2]:
class Extract_Clean:
    
    def __init__(self, url):
        """TODO: add further settings such as type of paper, rating, views, etc."""
        self.url = url
    
    def request(self):
        response = requests.get(self.url)
        if response.status_code!=200:
            # Raise instead of printing and returning None, which would only surface later as an AttributeError.
            raise ConnectionError("The request to the url could not be made.\nPlease check if your link is correct and it works in your browser.")
        print("Successfully retrieved information from the url.")
        soup = BeautifulSoup(response.text, 'html.parser')
        return soup
        
    def extract_main_body(self):
        main_text = self.request().find_all('p', ['mb15','mb0'])
        return main_text
    
    def extract_abstract(self):
        # Note: each self.request() call re-downloads the page.
        try:
            if self.request().find('div','JournalAbstract').find('p') is None:
                abstract = self.request().find('div','JournalAbstract').find('div','abstracttext')
            else:
                abstract = self.request().find('div','JournalAbstract').find('p')
            return abstract
        except AttributeError:
            # Raised when the 'JournalAbstract' div is missing and find() returned None.
            print("This article does not have an abstract.")
            
    def extract_article_type(self):
        article_type = self.request().find('div','header-bar-one').find('h2')
        return article_type
    
    def clean_main_body(self):
        # Raw string, so the regex escapes are not read as (invalid) string escapes.
        clean_text = re.sub(r"<([;\s\w\"\=\,\:\./\~\{\}\?\!\-\%\&\#\$\^\(\)]*?)>", "", str(self.extract_main_body()))
        clean_text = clean_text.replace("[", "").replace("]","")
        return clean_text
        
    def extract_image_links(self):
        img_links = []
        for i in self.request().find_all('img', 'lazy'):
            img_links.append(i.get('data-src'))
            
        return img_links  

#### Example: Extracting Image Links

In [3]:
r = Extract_Clean("https://www.frontiersin.org/articles/10.3389/fnhum.2019.00008/full")

In [4]:
liss = r.extract_image_links()

Successfully retrieved information from the url.


In [159]:
liss

['https://www.frontiersin.org/files/Articles/420567/fnhum-13-00008-HTML/image_m/fnhum-13-00008-g001.jpg',
 'https://www.frontiersin.org/files/Articles/420567/fnhum-13-00008-HTML/image_m/fnhum-13-00008-g002.jpg',
 'https://www.frontiersin.org/files/Articles/420567/fnhum-13-00008-HTML/image_m/fnhum-13-00008-g003.jpg',
 'https://www.frontiersin.org/files/Articles/420567/fnhum-13-00008-HTML/image_m/fnhum-13-00008-g004.jpg',
 'https://www.frontiersin.org/files/Articles/420567/fnhum-13-00008-HTML/image_m/fnhum-13-00008-g005.jpg',
 'https://www.frontiersin.org/files/Articles/420567/fnhum-13-00008-HTML/image_m/fnhum-13-00008-g006.jpg']

Note: In this article, all 'img' HTML tags with class='lazy' are figures. This must be verified against many more Frontiers articles before relying on it.

#### Example: Extracting the Abstract

In [5]:
abstract = r.extract_abstract()

Successfully retrieved information from the url.
Successfully retrieved information from the url.


In [6]:
abstract

<p>Repeated pairing of electrical stimulation of a peripheral nerve with transcranial magnetic stimulation (TMS) over the primary motor cortex (M1) representation for a target muscle can induce neuroplastic adaptations in the human brain related to motor learning. The extent to which the motor state during this form of paired associative stimulation (PAS) influences the degree and mechanisms of neuroplasticity or motor learning is unclear. Here, we investigated the effect of volitional muscle contraction during PAS on: (1) measures of general corticomotor excitability and intracortical circuit excitability; and (2) motor performance and learning. We assessed measures of corticomotor excitability using TMS and motor skill performance during a serial reaction time task (SRTT) at baseline and at 0, 30, 60 min post-PAS. Participants completed a SRTT retention test 1 week following the first two PAS sessions. Following the PAS intervention where the hand muscle maintained an active muscle c

#### Example: Extracting the Article Type

In [7]:
article_type = r.extract_article_type()

Successfully retrieved information from the url.


In [8]:
article_type

<h2>
                                ORIGINAL RESEARCH article
                            </h2>

#### Example: Extracting Main Body Text Data

Example 1

In [35]:
req = Extract_Clean(url="https://www.frontiersin.org/articles/10.3389/fncom.2016.00003/full")

In [38]:
cl = req.clean_main_body()

Successfully retrieved information from the url.


In [39]:
cl

'Beginning with the theoretical foundations of cybernetics and information theory by Wiener (1948) and Shannon (1948), the field of theoretical neuroscience started to develop in the direction of neural information processing. At that time, scientists were inspired by the idea that the same theoretical ideas can be employed both in technological developments and in the understanding of biological, environmental or even sociological systems. Many new concepts in the areas of control, pattern recognition, sensory and motor physiology, neurology, and brain research in general were invented, and it often remains obscure whether the first inspiration comes from biological or neurological observations on one hand or from cybernetical or engineering inventions on the other. An example is the idea of the “receptive field” of neurons in the visual system and of edge or line detectors, called “simple cells”, in particular, that were substantiated by the Nobel prize winning research of the neurop

Example 2

In [40]:
req_2 = Extract_Clean(url="https://www.frontiersin.org/articles/10.3389/fpsyt.2020.578706/full")


In [41]:
cl1 = req_2.clean_main_body()

Successfully retrieved information from the url.


In [42]:
cl1

"The understanding of psychiatric disorders is based on multiple interacting levels, from genetics to cells, neural circuits, cognition, behavior, and the surrounding environment. In addition to efforts to accumulate large-scale biological and psychological evidence such as the ENIGMA (1) and COCORO (2) projects, recent advances in computational methodology have helped to handle high-dimensional data to understand psychiatric disorders (3, 4) Bringing mathematical methodologies and theoretical frameworks to psychiatric research is expected to have a central role in the development of treatments and preventive strategies; the use of such methods constitutes an area of research called computational psychiatry (CPSY), which has come to be treated as an important discipline of psychiatry. The development of this field has been reviewed in many papers from the perspective of clinical and computational neuroscience (5–10). This is a highly multidisciplinary field that requires researchers fr

Example 3

In [112]:
req_3 = Extract_Clean(url="https://www.frontiersin.org/articles/10.3389/fnhum.2019.00008/full")

In [113]:
cl2 = req_3.clean_main_body()

Successfully retrieved information from the url.


In [114]:
cl2

'Triggered by a variety of internal and external environmental stimuli, neural networks have a remarkable ability to modify their structure and function to learn new behaviors (Kandel, 2001; Cooper, 2005). Neural plasticity is an underlying mechanism for motor learning in both the neurologically intact and injured brain (Kleim and Jones, 2008). To improve motor function, various types of noninvasive brain stimulation to induce positive neural plasticity in the injured brain and as a primer for rehabilitation have been studied (Player et al., 2012; Carson and Kennedy, 2013). One type, paired associative stimulation (PAS), involves the repetitive close pairing of an electrical stimulus of a peripheral nerve with transcranial magnetic stimulation (TMS) of the contralateral primary motor cortex (M1). Through spike-timing dependent plasticity (STDP) and the induction of long-term potentiation (LTP)-like processes, changes in corticomotor excitability can be induced with this repeating pairi

### 3) Involving OpenAI
Here we compare how well different OpenAI models summarize the extracted article text. Summarization is the only task covered for now.

Notes:
1) Expose more control parameters to the user, e.g. temperature.
2) Improve the prompt.
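
One possible shape for the first note, as a minimal sketch: a small settings object that holds the user-facing controls and builds the request arguments. `SummarySettings`, `DEFAULT_RULES`, `system_prompt` and `request_kwargs` are hypothetical names, not part of this notebook's classes. The rule text and the model name are copied from the prompts used in this section.

```python
from dataclasses import dataclass

# Rule text copied from the prompts used in this notebook.
DEFAULT_RULES = (
    "Focus on results and conclusions.",
    "Complete all your sentences.",
    "Provide statistical information with context if contained in the text.",
)


@dataclass
class SummarySettings:
    """Hypothetical container for user-facing controls (an assumption, not existing code)."""
    temperature: float = 0.0
    token_limit: int = 150
    rules: tuple = DEFAULT_RULES

    def __post_init__(self):
        # The OpenAI API reference documents temperature as a value between 0 and 2.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")

    def system_prompt(self):
        # Number the rules the same way the existing prompts do: 1), 2), 3).
        numbered = "\n".join(f"{i}) {rule}" for i, rule in enumerate(self.rules, start=1))
        return "You will summarize scientific text. You need to follow these rules:\n" + numbered

    def request_kwargs(self, text):
        # Keyword arguments in the shape expected by a chat-style completion call.
        return {
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "system", "content": self.system_prompt()},
                {"role": "user", "content": "Summarize the following text: " + text},
            ],
            "temperature": self.temperature,
            "max_tokens": self.token_limit,
        }
```

The returned dictionary could then be unpacked into the API call, e.g. `openai.ChatCompletion.create(**settings.request_kwargs(cl))`. Keeping the rules in a tuple makes it easy to try prompt variants for the second note.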

#### A) "text-davinci-003"

In [77]:
#References: 
#1) Tutorial of use: https://www.geeksforgeeks.org/how-to-use-chatgpt-api-in-python/
#2) OpenAI API Documentation: https://platform.openai.com/docs/api-reference

class OpenAI_Davinci:
    
    def __init__(self, api_key, text, token_limit):
        self.api_key = api_key
        self.text = text
        self.token_limit = int(token_limit)
        
    def generate_response(self):
        
        openai.api_key = self.api_key
        prompt = '''You will summarize scientific text. You need to follow these rules:
        1) Focus on results and conclusions.
        2) Complete all your sentences.
        3) Provide statistical information with context if contained in the text.
        Summarize the following text in words: '''
        #text = self.text  
        
        if self.text:
            chat = openai.Completion.create(
                model="text-davinci-003",
                prompt=prompt+self.text,
                temperature=0.0,
                max_tokens=self.token_limit)
            
            response = chat.choices[0].text
            return response
        else:
            print("No message provided. Please provide a message to generate a response.")

In [78]:
a = OpenAI_Davinci(api_key=OPENAI_API_KEY, text=cl, token_limit=150)


In [79]:
reply = a.generate_response()

In [80]:
reply

'\n\nBeginning with the theoretical foundations of cybernetics and information theory by Wiener (1948) and Shannon (1948), the field of theoretical neuroscience has developed in the direction of neural information processing. This has led to the invention of many new concepts in the areas of control, pattern recognition, sensory and motor physiology, neurology, and brain research in general. An example is the idea of the “receptive field” of neurons in the visual system and of edge or line detectors, called “simple cells”, which was substantiated by the Nobel prize winning research of the neurophysiologists Hubel and Wiesel (e.g., Hubel and Wiesel, 1968; Hubel et al., 1977'
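
The reply above stops mid-citation ("Hubel et al., 1977") and repeats long stretches of the input. Two rough checks, sketched here as assumptions rather than as an established metric, could make such observations measurable: whether a reply ends in a complete sentence (rule 2 of the prompt), and what fraction of its word windows appear verbatim in the source. The function names and the window size `n=8` are hypothetical choices.

```python
import re


def ends_with_complete_sentence(summary):
    # Proxy for rule 2: the reply should end in terminal punctuation, optionally
    # followed by a closing quote or bracket. A reply cut off by max_tokens usually does not.
    return re.search(r"[.!?][\"'”’)\]]*$", summary.strip()) is not None


def _normalize(text):
    # Lowercase and collapse all whitespace so line breaks do not hide matches.
    return " ".join(text.lower().split())


def copied_fraction(summary, source, n=8):
    """Fraction of n-word windows in the summary that appear verbatim in the source."""
    words = _normalize(summary).split()
    padded_source = " " + _normalize(source) + " "
    windows = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not windows:
        return 0.0
    # Pad with spaces so a window only matches on whole-word boundaries.
    hits = sum((" " + window + " ") in padded_source for window in windows)
    return hits / len(windows)
```

For instance, `ends_with_complete_sentence(reply)` returns `False` for the reply above, because it ends in "1977" without terminal punctuation. `copied_fraction(reply, cl)` gives a value between 0 and 1, where values near 1 suggest copying rather than summarizing.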

#### B) "gpt-3.5-turbo"

In [105]:
class OpenAI_GPT_Turbo:
    
    def __init__(self, api_key, text, token_limit):
        self.api_key = api_key
        self.text = text
        self.token_limit = int(token_limit)
        
    def generate_response(self):
        
        openai.api_key = self.api_key
        
        prompt = '''You will summarize scientific text. You need to follow these rules:
        1) Focus on results and conclusions.
        2) Complete all your sentences.
        3) Provide statistical information with context if contained in the text.'''
        
        
        text = "Summarize the following text: "+self.text  
        
        # Check the raw input: `text` is never empty because of the prefix added above.
        if self.text:
            chat = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                {"role": "system", 
                 "content": prompt},
                {"role": "user", 
                 "content": text}],
                temperature=0.0,
                max_tokens=self.token_limit)
            
            response = chat.choices[0].message.content
            return response
        else:
            print("No message provided. Please provide a message to generate a response.")

In [106]:
b = OpenAI_GPT_Turbo(api_key=OPENAI_API_KEY, text=cl, token_limit=150)


In [107]:
reply_b = b.generate_response()

In [108]:
reply_b

'The field of theoretical neuroscience developed from the theoretical foundations of cybernetics and information theory. It has been influenced by both technological developments and the understanding of biological systems. The field has split into technology-oriented research and neuroscience-oriented research. Technological advancements in learning systems have become more sophisticated, but have lost some appeal to neuroscience. On the other hand, neuroscience has made advancements in neural modeling, particularly in the spatio-temporal integration of activity in dendrites and synaptic transmission. Computational neuroscience has been established as a discipline, using information theory to evaluate single neuron responses and neural networks. Information theory has also been used in neurocomputing and learning theory. However, there is a lack of integration in current computational neuroscience when it comes to complex cognitive tasks.'

In [109]:
c = OpenAI_GPT_Turbo(api_key=OPENAI_API_KEY, text=cl1, token_limit=150)


In [110]:
reply_c = c.generate_response()

In [111]:
reply_c

'The text discusses the field of computational psychiatry (CPSY) and the need for better integration between neuroscience, psychiatry, and computational models. The authors introduce a database called CPSYMAP, which organizes and visualizes research papers in the field of CPSY. The database allows users to search, register, and obtain an overview of CPSY research. The authors describe the different perspectives of neuroscientific, psychiatric, and computational methods and how they are implemented in the database. They also provide preliminary analysis of the registered papers in the database, highlighting the distribution of papers based on cognitive functions, mental disorders, and computational frameworks. The authors acknowledge the limitations of the database, including the limited number of registered papers and potential credibility issues with tagging.'

In [115]:
d = OpenAI_GPT_Turbo(api_key=OPENAI_API_KEY, text=cl2, token_limit=150)

In [116]:
reply_d = d.generate_response()

InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 8137 tokens. Please reduce the length of the messages.

#### Resources for Overcoming the Token Limit
1) https://uxplanet.org/how-to-overcome-gpt-token-limit-721c30a18d55
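
One common way around the context limit, sketched here under stated assumptions, is to split the cleaned text into chunks, summarize each chunk, and then summarize the combined partial summaries. The helper names (`estimate_tokens`, `split_into_chunks`, `summarize_long_text`) and the 3000-token budget are hypothetical. The 4-characters-per-token estimate is only a rough heuristic for English text, and an exact count would need a tokenizer matched to the model.

```python
import re


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token for English text.
    return len(text) // 4 + 1


def split_into_chunks(text, max_tokens):
    """Group whole sentences into chunks whose estimated size stays under max_tokens.

    A single sentence longer than the budget still becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and estimate_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_long_text(text, summarize, max_tokens=3000):
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is any callable that maps a string to a summary string.
    """
    chunks = split_into_chunks(text, max_tokens)
    if len(chunks) <= 1:
        return summarize(text)
    partial = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial))
```

With the class from section B, `summarize` could be `lambda t: OpenAI_GPT_Turbo(api_key=OPENAI_API_KEY, text=t, token_limit=150).generate_response()`. The budget is set below 4097 to leave room for the system prompt and the reply. If the joined partial summaries are still too long, the same function could be applied to them again.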

### 4) OpenAI Model Comparison
"text-davinci-003" appears to copy the source text, whereas "gpt-3.5-turbo" produces a genuine summary. Open question: do the models actually follow the rules given in the prompt?