# Spock demonstration

This notebook walks through Spock's main features and shows how to use them.
We first import Spock.

In [1]:
from spock_literature.spock import Spock
import pprint as pp
import os



### Download PDFs

Sometimes, we might find it easier to just give the URL to a scientific paper and have the PDF downloaded. Spock can do this for you. 

Spock first inspects the HTML of the given URL. If it finds a PDF link, it downloads the PDF. If not, it reads the page text and asks an LLM to judge whether that text is a complete scientific paper that can undergo further processing. If the text is not a complete paper, Spock returns an error and asks the user to download the paper manually and process it as a local PDF.

Example:


In [2]:
# From preprints

spock_arxiv = Spock(model='gpt-4o', publication_url="https://www.biorxiv.org/content/10.1101/2024.11.11.622734v1", papers_download_path=os.getcwd()+"/papers")
spock_arxiv.download_pdf()

# From journals

spock_journal = Spock(model='gpt-4o', publication_url="https://www.nature.com/articles/s41467-023-44599-9")
# No PDF link is found here, but the LLM judges the article text to be complete,
# so its content is stored in the `paper` attribute and the analysis can continue.
spock_journal.download_pdf()
assert spock_journal.paper != ""


# From PDFs locally

spock = Spock(model='gpt-4o', paper="data-sample.pdf")


INFO:spock_literature.utils.Url_downloader:Found PDF link: /content/10.1101/2024.11.11.622734v1.full.pdf
INFO:spock_literature.utils.Url_downloader:https://www.biorxiv.org/content/10.1101/2024.11.11.622734v1.full.pdf
INFO:spock_literature.utils.Url_downloader:PDF downloaded successfully to /gpfs/fs0/scratch/m/mehrad/brikiyou/spock/examples/papers
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:spock_literature.utils.Url_downloader:Document is a complete scientific paper: True
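
The first step of the download logic, finding a PDF link in the page HTML and resolving it against the page URL, can be sketched with the standard library alone. This is an illustrative sketch, not Spock's actual implementation: `PdfLinkFinder` and `find_pdf_link` are hypothetical names, and treating any `href` that ends in `.pdf` as a PDF link is an assumption.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collects href values that look like PDF links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: a PDF link is any href whose path ends in ".pdf".
            if name == "href" and value and value.lower().split("?")[0].endswith(".pdf"):
                self.links.append(value)


def find_pdf_link(html: str, page_url: str) -> Optional[str]:
    """Return the first PDF link as an absolute URL, or None if there is none."""
    finder = PdfLinkFinder()
    finder.feed(html)
    if not finder.links:
        return None  # a caller would fall back to the LLM completeness check
    # Relative links such as "/content/....full.pdf" are resolved against the page URL.
    return urljoin(page_url, finder.links[0])


page_url = "https://www.biorxiv.org/content/10.1101/2024.11.11.622734v1"
html = '<a href="/content/10.1101/2024.11.11.622734v1.full.pdf">Download PDF</a>'
print(find_pdf_link(html, page_url))
```

The relative link and the resolved URL mirror the two `Found PDF link` log lines above. When the function returns `None`, the flow would move on to the LLM check described earlier.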


### Summarize PDFs

Spock can summarize PDFs for you. It uses a map-reduce chain to summarize the text in the PDF, which gives better results but might take a bit longer.

Because summarizing a PDF takes time, we use llama3.2:3b or GPT-3.5 Turbo for this task.

In [3]:

spock.summarize()
print(spock.paper_summary)

  map_chain = LLMChain(llm=llm, prompt=map_prompt)
  combine_documents_chain = StuffDocumentsChain(
  reduce_documents_chain = ReduceDocumentsChain(
  map_reduce_chain = MapReduceDocumentsChain(
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTT

The main themes of the documents listed include the application of deep learning and machine learning models in cheminformatics, bioinformatics, and cancer research. Topics such as predictive modeling of peptides, quantitative structure-activity relationship studies, protein solubility prediction, hemolytic peptide activity prediction, and antioxidant peptide design are explored. Additionally, the use of web-based tools, serverless computing, and open science principles are highlighted, along with considerations for model transparency, ethical implications, and reproducibility in research. The documents also touch on the integration of omics data for biomarker discovery, the development of tools for survival analysis in cancer, and the use of artificial intelligence in antibiotic discovery and protein structure prediction. These themes collectively contribute to advancements in predictive modeling, biomarker discovery, and the ethical and transparent use of AI in various scientific fie
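
The chain names in the log above (`MapReduceDocumentsChain`, `ReduceDocumentsChain`) point to a map-reduce pattern: summarize each chunk on its own, then summarize the combined partial summaries. Below is a minimal sketch of that pattern in plain Python. It is not Spock's code: the function names are hypothetical, chunking by character count is an assumption, and `first_sentence` is a trivial stand-in for the LLM call.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summarize(
    text: str, summarize: Callable[[str], str], chunk_size: int = 1000
) -> str:
    # Map step: summarize each chunk independently.
    partial_summaries = [summarize(chunk) for chunk in split_into_chunks(text, chunk_size)]
    # Reduce step: join the partial summaries and summarize them once more.
    return summarize("\n".join(partial_summaries))


def first_sentence(text: str) -> str:
    """Stand-in for an LLM call (assumption): keep only the first sentence."""
    return text.split(".")[0].strip() + "."


text = "Spock downloads papers. It then reads them. Spock summarizes papers. It uses an LLM."
print(map_reduce_summarize(text, first_sentence, chunk_size=44))
```

Each chunk costs one model call and the reduce step adds at least one more, which is consistent with the run of chat-completion requests in the log and explains why summarizing takes longer than a single prompt.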

### Getting topics from PDFs

Spock can also extract topics from PDFs. It derives the topics from the PDF's summary.

In [4]:
spock.get_topics()
print(spock.topics)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Deep Learning/Machine Learning/Cheminformatics/Bioinformatics/Cancer Research/Predictive Modeling/Quantitative Structure-Activity Relationship/Protein Solubility Prediction/Hemolytic Peptide Activity Prediction/Antioxidant Peptide Design/Web-Based Tools/Serverless Computing/Open Science/Model Transparency/Ethical Implications/Reproducibility/Omics Data/Biomarker Discovery/Survival Analysis/Artificial Intelligence/Antibiotic Discovery/Protein Structure Prediction
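
The topics are printed as a single slash-separated string, so a small helper can turn them into a list for filtering or counting. This is a sketch under the assumption that `spock.topics` is a plain string in the form shown above. `parse_topics` is a hypothetical helper, not part of Spock, and the sample string is a shortened copy of the printed output.

```python
from typing import List


def parse_topics(topics: str) -> List[str]:
    """Split a slash-separated topic string into a list of clean topic names."""
    return [topic.strip() for topic in topics.split("/") if topic.strip()]


# Shortened copy of the output printed above.
topics = "Deep Learning/Machine Learning/Cheminformatics/Bioinformatics/Cancer Research"
topic_list = parse_topics(topics)
print(topic_list)
print("Bioinformatics" in topic_list)
```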


### Adding custom questions

Spock can also answer custom questions about the PDF. It uses an LLM to extract the topic of each question so that the question can be formatted properly.

In [5]:
# Custom questions can also be passed as a parameter to the constructor.
spock.custom_questions = ["What is the main conclusion of the paper?", "What are the main results of the paper?"]
spock.add_custom_questions()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
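
The questions dictionary printed at the end of this notebook stores each question under a short topic key, with the question text and an `output` holding a `response` and a `sentence`. The sketch below shows how custom questions could be filed into that shape. It is an assumption-laden illustration, not Spock's implementation: `build_question_entries` and `naive_topic` are hypothetical, the empty placeholder values are an assumption, and `naive_topic` uses simple string trimming where Spock uses an LLM.

```python
from typing import Callable, Dict, List


def naive_topic(question: str) -> str:
    """Stand-in for the LLM topic extraction (assumption): trim common framing words."""
    topic = question.rstrip("?")
    for prefix in ("What is the ", "What are the "):
        if topic.startswith(prefix):
            topic = topic[len(prefix):]
    suffix = " of the paper"
    if topic.endswith(suffix):
        topic = topic[: -len(suffix)]
    return topic


def build_question_entries(
    questions: List[str], extract_topic: Callable[[str], str]
) -> Dict[str, dict]:
    """File each question under its topic, with an empty output to fill in later."""
    entries = {}
    for question in questions:
        entries[extract_topic(question)] = {
            "question": question,
            # Placeholder values (assumption); a later scan would fill these in.
            "output": {"response": "", "sentence": ""},
        }
    return entries


custom_questions = [
    "What is the main conclusion of the paper?",
    "What are the main results of the paper?",
]
entries = build_question_entries(custom_questions, naive_topic)
print(list(entries))
```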


### Scan PDFs for metrics

Spock can also scan PDFs for metrics. The same scan answers the custom questions about the PDF.

In [6]:
spock.scan_pdf()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https

### Format output
Format the output response as JSON to make it easier to read and work with.

In [7]:
pp.pprint(spock.format_output())

('📄 Summary of the Publication\n'
 'The main themes of the documents listed include the application of deep '
 'learning and machine learning models in cheminformatics, bioinformatics, and '
 'cancer research. Topics such as predictive modeling of peptides, '
 'quantitative structure-activity relationship studies, protein solubility '
 'prediction, hemolytic peptide activity prediction, and antioxidant peptide '
 'design are explored. Additionally, the use of web-based tools, serverless '
 'computing, and open science principles are highlighted, along with '
 'considerations for model transparency, ethical implications, and '
 'reproducibility in research. The documents also touch on the integration of '
 'omics data for biomarker discovery, the development of tools for survival '
 'analysis in cancer, and the use of artificial intelligence in antibiotic '
 'discovery and protein structure prediction. These themes collectively '
 'contribute to advancements in predictive modeling, biom

### Or just call the instance

Spock can also be called directly to run all the tasks at once. The `__call__` special method is implemented for this purpose.
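
For readers unfamiliar with callable instances, here is a minimal sketch of the pattern. The `Pipeline` class is a hypothetical stand-in, not Spock's implementation: the step methods only record their names, and the defaults and the way the settings gate each step are assumptions modeled on the `settings` dictionary used in the next cell.

```python
class Pipeline:
    """Hypothetical stand-in for a class whose instances run every step when called."""

    def __init__(self, settings=None):
        # Assumption: every step is enabled by default and settings can switch steps off.
        self.settings = {"Summary": True, "Questions": True}
        self.settings.update(settings or {})
        self.steps_run = []

    def summarize(self):
        self.steps_run.append("summarize")

    def scan(self):
        self.steps_run.append("scan")

    def __call__(self):
        # Calling the instance runs the enabled steps in a fixed order.
        if self.settings["Summary"]:
            self.summarize()
        if self.settings["Questions"]:
            self.scan()
        return self.steps_run


pipeline = Pipeline(settings={"Summary": False, "Questions": True})
print(pipeline())
```

Defining `__call__` lets `pipeline()` replace a sequence of explicit method calls while the individual methods remain available for step-by-step use.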

In [8]:
spock = Spock(model='gpt-4o', paper="data-sample.pdf", settings={"Binary Response": False, "Summary": False, "Questions": True})
spock()

pp.pprint(spock.questions)
pp.pprint(spock.format_output())

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  new materials


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  screening algorithms


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  experimental methodology


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request

An error occured while scanning the PDF for the question:  drug formulations explored


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  lead small-molecule drug candidates


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  clinical trials


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  main conclusion


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


An error occured while scanning the PDF for the question:  main results
{'ML algirothms': {'output': {'response': 'No', 'sentence': 'None'},
                   'question': 'Does the document mention the development of '
                               'any new machine learning and deep learning '
                               'algorithm                                            '
                               'or AI '
                               'model/architecutre?                                        '
                               'Examples sentences for new machine learning or '
                               'deep learning algorithms '
                               ':                                            '
                               '1. In this study, we developed and optimized a '
                               'novel and reliable hybrid machine '
                               'learning                                            '
                             