In [1]:
# Run the first four code blocks, then run whichever test case block you want.
import weaviate
import json

In [2]:
client = weaviate.Client(
    url="http://localhost:8080"
)

In [3]:
# Verifies that the database is connected. Should print a schema containing the class Paper if using the project's database.
print(client.schema.get())

{'classes': [{'class': 'Paper', 'description': 'Articles', 'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2}, 'cleanupIntervalSeconds': 60, 'stopwords': {'additions': None, 'preset': 'en', 'removals': None}}, 'moduleConfig': {'text2vec-transformers': {'poolingStrategy': 'masked_mean', 'vectorizeClassName': True}}, 'properties': [{'dataType': ['string'], 'description': 'The id', 'indexInverted': False, 'moduleConfig': {'text2vec-transformers': {'skip': True, 'vectorizePropertyName': False}}, 'name': 'pdfId', 'tokenization': 'word'}, {'dataType': ['text'], 'description': 'The abstract', 'indexInverted': True, 'moduleConfig': {'text2vec-transformers': {'skip': False, 'vectorizePropertyName': False}}, 'name': 'abstract', 'tokenization': 'word'}, {'dataType': ['string[]'], 'description': 'The categories', 'indexInverted': False, 'moduleConfig': {'text2vec-transformers': {'options': {'useCache': True, 'useGPU': True, 'waitForModel': True}, 'skip': True, 'vectorizePropertyName': False}},
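Reading the printed schema by eye is easy to get wrong, so the check for the Paper class can also be done in code. The following is a minimal sketch, not part of the original notebook: `has_class` is a hypothetical helper, and `example_schema` is a trimmed stand-in for the dictionary returned by `client.schema.get()`.

```python
def has_class(schema, class_name):
    """Return True if the schema dict lists a class with the given name."""
    return any(c.get("class") == class_name for c in schema.get("classes", []))


# Stand-in for client.schema.get(), trimmed to the fields used here (assumption).
example_schema = {"classes": [{"class": "Paper", "description": "Articles"}]}

print(has_class(example_schema, "Paper"))  # True
```

With a live connection, the same helper could be called as `has_class(client.schema.get(), "Paper")`.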

In [4]:
def prettyPrintResults(queryResults):
    # Prints nicer-looking results than outputting the raw JSON

    bold = '\033[1m'
    end = '\033[0m'

    results = queryResults["data"]["Get"]["Paper"]

    for paper in results:

        title = bold + "Title" + end + "\n\n" + paper["title"] + "\n"
        certainty = bold + "Certainty" + end + "\n\n" + str(paper["_additional"]["certainty"]) + "\n"
        abstract = bold + "Abstract" + end + "\n\n" + paper["abstract"] + "\n"

        resultPdfLink = paper["pdfId"]

        # IDs without "/" have "." stored as "-", so restore the dots.
        # IDs that contain "/" are left unchanged.
        if "/" not in resultPdfLink:
            resultPdfLink = resultPdfLink.replace("-", ".")

        resultPdfLink = "https://arxiv.org/pdf/" + resultPdfLink + ".pdf"

        link = bold + "Link" + end + "\n\n" + resultPdfLink + "\n"

        print(title)
        print(certainty)
        print(abstract)
        print(link)
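The link-building step inside prettyPrintResults can be pulled out into a small function so it can be checked without a database. This is a sketch only: `arxiv_pdf_url` is a hypothetical name, and the stored ID format (for example `1404-4666`) is inferred from the replace logic above rather than confirmed by the schema. The old-style ID below is only an illustration of the format.

```python
def arxiv_pdf_url(pdf_id):
    """Build an arXiv PDF link from a stored pdfId.

    IDs without "/" are assumed to store "." as "-" and are converted back.
    IDs with "/" (old-style arXiv identifiers) are used unchanged.
    """
    if "/" not in pdf_id:
        pdf_id = pdf_id.replace("-", ".")
    return "https://arxiv.org/pdf/" + pdf_id + ".pdf"


print(arxiv_pdf_url("1404-4666"))       # https://arxiv.org/pdf/1404.4666.pdf
print(arxiv_pdf_url("hep-th/9901001"))  # https://arxiv.org/pdf/hep-th/9901001.pdf
```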

In [6]:
# test case one

testArgument = {
    "concepts": ["Topic modeling can help in optimizing the search process. In this article, we will be discussing Latent Dirichlet Allocation, a topic modeling process."]
}

data = client.query.get("Paper", ["pdfId title abstract _additional {certainty}"]).with_near_text(testArgument).with_limit(3).do()

prettyPrintResults(data)

[1mTitle[0m

Latent Dirichlet Allocation (LDA) and Topic modeling: models,   applications, a survey

[1mCertainty[0m

0.8717871308326721

[1mAbstract[0m

  Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents. Researchers have published many articles in the field of topic modeling and applied in various fields such as software engineering, political science, medical and linguistic science, etc. There are various methods for topic modeling, which Latent Dirichlet allocation (LDA) is one of the most popular methods in this field. Researchers have proposed various models based on the LDA in topic modeling. According to previous work, this paper can be very useful and valuable for introducing LDA approaches in topic modeling. In this paper, we investigated scholarly articles highly (between 2003 to 2016) related to Topic Modeling based on LDA to discover the research developme
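The certainty values printed by these test cases can be related to cosine similarity. The sketch below assumes the Paper class uses cosine distance, in which case Weaviate defines certainty as (1 + cosine similarity) / 2. The vector index settings are not visible in the truncated schema output, so this is an assumption, and both function names are hypothetical.

```python
def certainty_to_cosine(certainty):
    """Convert a Weaviate certainty in [0, 1] to cosine similarity in [-1, 1].

    Assumes the class is indexed with cosine distance.
    """
    return 2.0 * certainty - 1.0


def cosine_to_certainty(cosine):
    """Inverse mapping: cosine similarity in [-1, 1] to certainty in [0, 1]."""
    return (1.0 + cosine) / 2.0


# Certainty of the first result in test case one; the cosine is roughly 0.74.
print(certainty_to_cosine(0.8717871308326721))
```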

In [7]:
# test case two

testArgument = {
    "concepts": ["Parallelism is essential to effective use of accelerators because they contain many independent processing elements that are capable of executing code in parallel. There are three ways to develop parallel code."]
}

data = client.query.get("Paper", ["pdfId title abstract _additional {certainty}"]).with_near_text(testArgument).with_limit(3).do()

prettyPrintResults(data)

[1mTitle[0m

Object-Oriented Parallel Programming

[1mCertainty[0m

0.8802706003189087

[1mAbstract[0m

  We introduce an object-oriented framework for parallel programming, which is based on the observation that programming objects can be naturally interpreted as processes. A parallel program consists of a collection of persistent processes that communicate by executing remote methods. We discuss code parallelization and process persistence, and explain the main ideas in the context of computations with very large data objects. 

[1mLink[0m

https://arxiv.org/pdf/1404.4666.pdf

[1mTitle[0m

Effective Parallelisation for Machine Learning

[1mCertainty[0m

0.8790191411972046

[1mAbstract[0m

  We present a novel parallelisation scheme that simplifies the adaptation of learning algorithms to growing amounts of data as well as growing needs for accurate and confident predictions in critical applications. In contrast to other parallelisation techniques, it can be applied to a

In [9]:
# test case three

testArgument = {
    "concepts": ["Stem cells are the body's raw materials \u2014 cells from which all other cells with specialized functions are generated. Under the right conditions in the body or a laboratory, stem cells divide to form more cells called daughter cells."]
}

data = client.query.get("Paper", ["pdfId title abstract _additional {certainty}"]).with_near_text(testArgument).with_limit(3).do()

prettyPrintResults(data)

[1mTitle[0m

Intrinsic cell factors that influence tumourigenicity in cancer stem   cells - towards hallmarks of cancer stem cells

[1mCertainty[0m

0.7861415147781372

[1mAbstract[0m

  Since the discovery of a cancer initiating side population in solid tumours, studies focussing on the role of so-called cancer stem cells in cancer initiation and progression have abounded. The biological interrogation of these cells has yielded volumes of information about their behaviour, but there has, as of yet, not been many actionable generalised theoretical conclusions. To address this point, we have created a hybrid, discrete/continuous computational cellular automaton model of a generalised stem-cell driven tissue and explored the phenotypic traits inherent in the inciting cell and the resultant tissue growth. We identify the regions in phenotype parameter space where these initiating cells are able to cause a disruption in homeostasis, leading to tissue overgrowth and tumour formation. 
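The three test cases repeat the same query and differ only in the input text. One way to remove the duplication is a small wrapper. This is a sketch, not part of the original notebook: `run_test_case` is a hypothetical helper, and it uses only the client calls already shown in the test cells.

```python
def run_test_case(client, text, limit=3):
    """Run a nearText query against the Paper class and return the raw response.

    `client` is expected to be the weaviate.Client created earlier in the notebook.
    """
    near_text = {"concepts": [text]}
    return (
        client.query
        .get("Paper", ["pdfId title abstract _additional {certainty}"])
        .with_near_text(near_text)
        .with_limit(limit)
        .do()
    )


# Usage with the notebook's client (needs a running Weaviate instance):
# prettyPrintResults(run_test_case(client, "Topic modeling can help in optimizing the search process."))
```

Keeping the limit as a parameter makes it easy to compare the top 3 results with a longer list for the same text.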