# Spock demonstration

This notebook walks through Spock's main features and shows how to use them.
We start by importing Spock.

In [1]:
from spock_literature.spock import Spock
import pprint as pp
import os

### Download PDFs

Sometimes it is easier to provide only the URL of a scientific paper and have the PDF downloaded automatically. Spock can do this for you.

Spock first inspects the HTML of the given URL. If it finds a PDF link, it downloads the PDF. If not, it reads the page text and asks an LLM to judge whether that text is a complete scientific paper that can undergo further processing. If it is not, Spock returns an error and asks the user to download the paper manually and process it as a regular PDF.

Example:


In [2]:
# From preprints

spock_arxiv = Spock(model='gpt-4o', publication_url="https://www.biorxiv.org/content/10.1101/2024.11.11.622734v1", papers_download_path=os.getcwd()+"/papers")
spock_arxiv.download_pdf()

# From journals

spock_journal = Spock(model='gpt-4o', publication_url="https://www.nature.com/articles/s41467-023-44599-9")
spock_journal.download_pdf()  # No PDF link is found here, but the LLM judges the article to be complete, so its content is stored in the `paper` attribute for further analysis
assert spock_journal.paper != ""



INFO:spock_literature.utils.Url_downloader:Found PDF link: /content/10.1101/2024.11.11.622734v1.full.pdf
INFO:spock_literature.utils.Url_downloader:https://www.biorxiv.org/content/10.1101/2024.11.11.622734v1.full.pdf
INFO:spock_literature.utils.Url_downloader:PDF downloaded successfully to /home/youssef/clone/spock/examples/papers
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:spock_literature.utils.Url_downloader:Document is a complete scientific paper: True
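
The link-discovery step described above can be sketched with the standard library alone. This is a minimal illustration, not Spock's implementation: `PdfLinkFinder` and `find_pdf_url` are hypothetical names, and the rule that a PDF link is any `href` ending in `.pdf` is an assumption. It shows how a relative link such as the one in the log resolves to an absolute URL.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collects href values that look like PDF links (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: a PDF link is any href whose path ends in ".pdf".
            if name == "href" and value and value.split("?")[0].lower().endswith(".pdf"):
                self.pdf_links.append(value)


def find_pdf_url(page_url: str, html: str) -> Optional[str]:
    """Return the absolute URL of the first PDF link in `html`, or None."""
    finder = PdfLinkFinder()
    finder.feed(html)
    if not finder.pdf_links:
        return None
    # Relative links are resolved against the page URL.
    return urljoin(page_url, finder.pdf_links[0])


page = "https://www.biorxiv.org/content/10.1101/2024.11.11.622734v1"
html = '<a href="/content/10.1101/2024.11.11.622734v1.full.pdf">Download PDF</a>'
print(find_pdf_url(page, html))
```

When no link is found, the function returns `None`, which is the point where a fallback such as the LLM completeness check would take over.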


### Summarize PDFs

Spock can summarize PDFs for you. It uses a map-reduce chain to summarize the text in the PDF, which gives better results but can take a bit longer.

Because summarizing a PDF takes time, we use llama3.2:3b or GPT-3.5 Turbo for this task.

In [None]:
spock_journal.summarize()
print(spock_journal.paper_summary)

  map_chain = LLMChain(llm=llm, prompt=map_prompt)
  combine_documents_chain = StuffDocumentsChain(
  reduce_documents_chain = ReduceDocumentsChain(
  map_reduce_chain = MapReduceDocumentsChain(
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTT
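
The map-reduce idea behind the summary can be sketched without any LLM library. This is a simplified illustration under stated assumptions, not the chain Spock builds: `split_into_chunks`, `map_reduce_summarize`, and the stand-in `first_sentence` summarizer are hypothetical, and a real setup would replace `first_sentence` with a model call. One call per chunk plus a final combining call would be consistent with the run of requests in the log above.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summarize(
    text: str, summarize: Callable[[str], str], chunk_size: int = 1000
) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    chunks = split_into_chunks(text, chunk_size)
    if not chunks:
        return ""
    # Map step: one summarizer call per chunk.
    partial = [summarize(chunk) for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    # Reduce step: one final call over the combined partial summaries.
    return summarize("\n".join(partial))


def first_sentence(text: str) -> str:
    """Stand-in for an LLM call: keeps only the first sentence of its input."""
    return text.split(".")[0].strip() + "."


document = "Peptides matter. They fold. " * 3
print(map_reduce_summarize(document, first_sentence, chunk_size=28))
```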

### Getting topics from PDFs

Spock can also extract topics from a PDF. It derives them from the PDF's summary.

In [4]:
spock_journal.get_topics()
print(spock_journal.topics)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Scientific Research/Publishing/Author Services/Editorial Policies/Open Access/Research Data/Professional Development/Privacy Policies/Legal Notices/Accessibility/Language Editing/Funding/Editorial Values/Metrics/Social Media/Alerts/Search Functionality
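
The topics are printed as a single string with `/` between entries. If a list is more convenient, a small helper can split it. This sketch assumes the value is a plain string as printed above; `parse_topics` is a hypothetical helper, not part of Spock, and it treats a missing value as an empty list.

```python
from typing import List, Optional


def parse_topics(topics: Optional[str]) -> List[str]:
    """Split a '/'-separated topics string into a list of clean topic names."""
    if not topics:
        return []
    return [part.strip() for part in topics.split("/") if part.strip()]


topics = "Scientific Research/Publishing/Author Services/Open Access"
print(parse_topics(topics))
```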


### Adding custom questions

Spock can also answer custom questions about the PDF. It uses an LLM to extract the topic of each question so that the question can be formatted properly.

In [5]:
spock_journal.custom_questions = ["What is the main conclusion of the paper?", "What are the main results of the paper?"]  # Can also be passed as a constructor parameter
spock_journal.add_custom_questions()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


### Scan PDFs for metrics

Spock can also scan PDFs for metrics. The same scan answers the custom questions added above.

In [6]:
spock_journal.scan_pdf()

Not a PDF file


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https
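
The log alternates between embeddings and chat-completion requests, which would be consistent with retrieving relevant passages before answering each question. That reading is an inference, not something this notebook states. Under that assumption, the retrieval step could look like the toy sketch below, where `embed` is a word-count stand-in for a real embedding model and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


# Made-up example passages, for illustration only.
chunks = [
    "The model predicts peptide solubility with a recurrent neural network.",
    "Funding was provided by several agencies.",
]
print(top_chunks("How is peptide solubility predicted?", chunks))
```

The selected chunks would then be passed to the chat model together with the question.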

### Format output

Format the output response as JSON to make it easier to read and work with.

In [7]:
pp.pprint(spock_journal.format_output())

('📄 Summary of the Publication\n'
 'The main themes of the documents revolve around scientific research, '
 'publishing, author services, editorial policies, and partnerships in the '
 'field of science. Topics include journals, articles, research data, language '
 'editing, professional development, privacy policies, legal notices, and '
 'accessibility statements. The documents also cover open access fees, '
 'funding, calls for papers, editorial values, metrics, and highlights, as '
 'well as social media presence, alerts for updates, and search functionality '
 'for articles.\n'
 '━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n'
 '📝 Topics Covered in the Publication\n'
 'Scientific Research/Publishing/Author Services/Editorial Policies/Open '
 'Access/Research Data/Professional Development/Privacy Policies/Legal '
 'Notices/Accessibility/Language Editing/Funding/Editorial '
 'Values/Metrics/Social Media/Alerts/Search Functionality\n'
 '━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n'
 '❓ Question: new materials\n'
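
The pretty-printed value above is one string whose sections are separated by lines of `━` characters. If the sections are needed individually, they can be split on that separator. This is a sketch based only on the output shown here; `split_sections` is a hypothetical helper and the sample text is made up.

```python
import re

# A separator is a line made up only of "━" characters.
SEPARATOR = re.compile(r"^━+$", re.MULTILINE)


def split_sections(formatted: str) -> dict:
    """Map each section's first line (its heading) to the rest of the section."""
    sections = {}
    for block in SEPARATOR.split(formatted):
        block = block.strip()
        if not block:
            continue
        heading, _, body = block.partition("\n")
        sections[heading] = body.strip()
    return sections


sample = (
    "📄 Summary of the Publication\n"
    "A short summary.\n"
    "━━━━━━━━\n"
    "📝 Topics Covered in the Publication\n"
    "Topic A/Topic B\n"
    "━━━━━━━━\n"
)
print(split_sections(sample))
```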


### Or just call the instance

A Spock instance can also be called directly to run all the tasks at once. The `__call__` special method is implemented for this purpose.
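
As a reminder of how `__call__` makes an instance callable, here is a generic sketch. `Pipeline` and its placeholder steps are hypothetical and are not Spock's code; which steps Spock runs, and in what order, is not specified in this notebook.

```python
class Pipeline:
    """Hypothetical stand-in showing how __call__ can chain steps."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.summary = ""
        self.topics = ""

    def summarize(self) -> None:
        # Placeholder summary: the first sentence of the text.
        self.summary = self.text.split(".")[0].strip() + "."

    def get_topics(self) -> None:
        # Placeholder topics: capitalized words of the summary, joined with "/".
        words = [word.strip(".,") for word in self.summary.split()]
        self.topics = "/".join(word for word in words if word[:1].isupper())

    def __call__(self) -> "Pipeline":
        # Calling the instance runs every step in order.
        self.summarize()
        self.get_topics()
        return self


pipeline = Pipeline("Peptides fold into Helices. More text follows.")
pipeline()
print(pipeline.summary, pipeline.topics)
```

The actual Spock call appears in the next cell.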

In [8]:
spock = Spock(model='gpt-4o', paper="ansari-white-2023-serverless-prediction-of-peptide-properties-with-recurrent-neural-networks.pdf", papers_download_path=os.getcwd()+"/papers")
spock()
pp.pprint(spock.format_output())

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https

('📄 Summary of the Publication\n'
 'The main themes across the provided summaries revolve around the application '
 'of deep learning and machine learning algorithms in bioinformatics, '
 'specifically focusing on peptide and protein properties prediction. These '
 'themes include the prediction of hemolytic activity, solubility, '
 'antimicrobial properties, and antioxidant activity of peptides. The use of '
 'recurrent neural networks, LSTM models, and deep learning frameworks like '
 'Keras and TensorFlow is highlighted, along with the importance of model '
 'transparency, evaluation metrics, and protein engineering for drug '
 'discovery. Additionally, the summaries touch on the utilization of '
 'databases, resources, and predictive models for various biological '
 'activities, showcasing the advancements in computational methods for '
 'bioinformatics research.\n'
 '━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n'
 '📝 Topics Covered in the Publication\n'
 'None\n'
 '━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n