# Article Graph Example

This notebook contains a quick overview of the `article_graph` module together with the `topic_modeling`, `similarity` and `ner` modules.

In [1]:
# List of mock papers to be used in this example

papers = [
    {
        'title': 'Title 1',
        'abstract': 'The universe is a vast expanse of space containing countless galaxies, stars, planets, and other celestial objects.',
        'release_date': '2024-01-24',
        'acknowledgements': '''
            We thank the referees for their constructive feedback, which has helped us to improve the quality of this manuscript.
            This work is based on spectropolarimetric observations obtained at the TBL, AATand 3.6-m ESO telescope.
            We thank the technical staff at each of these facilities for their time and data.
            We also acknowledge the use of the PolarBase database, which makes TBL observations publicly available,
            and is operated by the Centre National de la Recherche Scientifique of France (CNRS), Observatoire''',
    },
    {
        'title': 'Title 2',
        'abstract': 'Ancient civilizations such as the Egyptians, Greeks, and Romans have left behind rich legacies of art, architecture, and knowledge.',
        'release_date': '2024-02-15',
        'acknowledgements': '''
            Acknowledgements. We are grateful to our referee, Nicolas Cowan.
            We gratefully acknowledge the open source software which made this work possible:
                astropy (Astropy Collaboration et al. 2013Collaboration et al. , 2018Collaboration et al. , 2022)),
                ipython (Pérez &amp; Granger 2007),
                numpy (Harris et al. 2020),
                scipy (Virtanen et al. 2020),
                matplotlib (Hunter 2007),
                JAX (Bradbury et al. 2018),
                arviz (Kumar et al. 2019),
                numpyro (Phan et al. 2019),
                FastChem (Stock et al. 2018(Stock et al. , 2022;;Kitzmann et al. 2023),
                LX-MIE (Kitzmann &amp; Heng 2018b),
                celerite2 (Foreman-Mackey et al. 2017; Foreman-Mackey 2018) exoplanet (Foreman-Mackey et al. 2021b),
                lightkurve (Lightkurve Collaboration et al. 2018),
                corner (Foreman-Mackey 2016),
                kelp (Morris et al. 2022).

            This research has made use of the SVO Filter Profile Service (http://svo2.cab.inta-csic.es/theory/fps/) supported
            from the Spanish MINECO through grant AYA2017-84089.'''
    },
    {
        'title': 'Title 3',
        'abstract': 'Climate change is a pressing global issue that requires urgent action to mitigate its impacts on the environment and human societies.',
        'release_date': '2024-03-10',
        'acknowledgements': '''Acknowledgements. This work is based on data from eROSITA, the soft X-ray instrument aboard SRG,
        a joint Russian-German science mission supported by the Russian Space Agency (Roskosmos),
        in the interests of the Russian Academy of Sciences represented by its Space Research Institute (IKI),
        and the Deutsches Zentrum für Luft-und Raumfahrt (DLR).'''
    },
]

In [2]:
# Add the parent directory to sys.path so the project modules
# (article_graph, topic_modeling, similarity, ner) can be imported

import sys
import os
sys.path.append(os.path.dirname(os.getcwd()))

## Adding the Papers to the Graph

In this section, we will be adding all the papers to the graph!

In [None]:
from article_graph.article_graph import ArticleGraph

# We create the graph
g = ArticleGraph()

# We add the documents to the graph
for paper_id, paper_info in enumerate(papers):
    g.add_paper(paper_id=paper_id,
                title=paper_info['title'],
                abstract=paper_info['abstract'],
                release_date=paper_info['release_date'])
    
# Explore the graph by printing the titles of the papers
for s, p, o in g.graph.triples((None, g.ns.title, None)):
    print(s, p, o)
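
The `triples((None, g.ns.title, None))` call above uses `None` as a wildcard, so it returns every triple whose predicate is the title. Here is a minimal pure-Python sketch of that matching rule, assuming `g.graph` behaves like an rdflib-style triple store. The `match` helper and the `title` and `release_date` predicate URIs are hypothetical and are not part of the `article_graph` module.

```python
# Hypothetical in-memory triple store: a list of (subject, predicate, object) tuples.
BASE = "http://open_science.com/"
triples = [
    (BASE + "paper#0", BASE + "title", "Title 1"),
    (BASE + "paper#0", BASE + "release_date", "2024-01-24"),
    (BASE + "paper#1", BASE + "title", "Title 2"),
]

def match(store, pattern):
    """Yield triples matching the pattern; None matches anything in that position."""
    for triple in store:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple

# Same shape of query as in the cell above: any subject, fixed predicate, any object.
titles = list(match(triples, (None, BASE + "title", None)))
for s, p, o in titles:
    print(s, p, o)
```

Fixing the subject instead, for example `match(triples, (BASE + "paper#0", None, None))`, would list everything stored about a single paper.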

## Topic Modeling

In this section, we will be exploring the use of topic modeling inside the **Article Graph**!

### Generating the Topics

In this subsection, we will be extracting topics from the papers' abstracts using the `topic_modeling` module!

In [4]:
from topic_modeling.lda import LDA

# Create the LDA model for Topic Modeling
# We need to specify the number of topics and the number of words per topic
lda_model = LDA(corpus=[paper['abstract'] for paper in papers],
                num_topics=3,
                num_words=7)
lda_model.fit()

# Display the generated topics
for i, topic in enumerate(lda_model.topics):
    print(f'Topic {i}: {topic}')

Topic 0: ['the', 'is', 'issue', 'on', 'change', 'climate', 'environment']
Topic 1: ['and', 'the', 'of', 'behind', 'knowledge', 'legacies', 'civilizations']
Topic 2: ['the', 'is', 'of', 'vast', 'planets', 'galaxies', 'countless']
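
The topics above are dominated by function words such as 'the', 'is' and 'of'. One common remedy is to remove stop words from each abstract before fitting. The sketch below is illustrative only: the `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical, and this notebook does not show whether the `LDA` class has a stop-word option of its own.

```python
import re

# Small hand-picked stop-word list for illustration (hypothetical, not exhaustive).
STOP_WORDS = {"the", "is", "a", "of", "and", "on", "to", "its", "that",
              "such", "as", "have", "other"}

def remove_stop_words(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(tok for tok in tokens if tok not in STOP_WORDS)

abstract = "Climate change is a pressing global issue that requires urgent action."
cleaned = remove_stop_words(abstract)
print(cleaned)
```

The cleaned strings could then be passed as the `corpus` argument in place of the raw abstracts.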


### Adding Topics to Graph

In this subsection, we will be adding the generated topics to the graph!

In [5]:
# We add the topics to the graph
for topic_id, keywords in enumerate(lda_model.topics):
    g.add_topic(topic_id, keywords)

# We explore the graph by printing the keywords of each topic
for s, p, o in g.graph.triples((None, g.ns.keyword, None)):
    print(s, p, o)

http://open_science.com/topic#0 http://open_science.com/keyword the
http://open_science.com/topic#1 http://open_science.com/keyword the
http://open_science.com/topic#2 http://open_science.com/keyword the
http://open_science.com/topic#0 http://open_science.com/keyword is
http://open_science.com/topic#2 http://open_science.com/keyword is
http://open_science.com/topic#0 http://open_science.com/keyword issue
http://open_science.com/topic#0 http://open_science.com/keyword on
http://open_science.com/topic#0 http://open_science.com/keyword change
http://open_science.com/topic#0 http://open_science.com/keyword climate
http://open_science.com/topic#0 http://open_science.com/keyword environment
http://open_science.com/topic#1 http://open_science.com/keyword and
http://open_science.com/topic#1 http://open_science.com/keyword of
http://open_science.com/topic#2 http://open_science.com/keyword of
http://open_science.com/topic#1 http://open_science.com/keyword behind
http://open_science.com/topic#1 h

### Adding TopicBelongings to Graph

In this subsection, we will be adding the topic belonging relationships to the graph! Each relationship stores the degree to which a paper belongs to a topic, so together they represent each paper's distribution over all the topics in the graph.

In [6]:
# We predict the topic distributions for each paper to all the topics
lda_model.predict_all()

# We add the topic belonging for each topic and paper storing the degree of belonging
for paper_id, paper_info in enumerate(lda_model.topic_distributions):
    for topic_id, topic_dist in paper_info.items():
        g.add_topic_belonging(paper_id, topic_id, topic_dist)

# We explore the graph by printing the topic belonging for each paper to all the topics
for s, p, o in g.graph.triples((None, g.ns.belongs_to_topic, None)):
    for _, p1, o1 in g.graph.triples((o, g.ns.degree, None)):
        print(s, p, o, p1, o1)

http://open_science.com/paper#0 http://open_science.com/belongs_to_topic http://open_science.com/topic_belonging#00 http://open_science.com/degree 0.020183839800524628
http://open_science.com/paper#0 http://open_science.com/belongs_to_topic http://open_science.com/topic_belonging#01 http://open_science.com/degree 0.020414235129847587
http://open_science.com/paper#0 http://open_science.com/belongs_to_topic http://open_science.com/topic_belonging#02 http://open_science.com/degree 0.9594019250696278
http://open_science.com/paper#1 http://open_science.com/belongs_to_topic http://open_science.com/topic_belonging#10 http://open_science.com/degree 0.016995018976179124
http://open_science.com/paper#1 http://open_science.com/belongs_to_topic http://open_science.com/topic_belonging#11 http://open_science.com/degree 0.965817619919958
http://open_science.com/paper#1 http://open_science.com/belongs_to_topic http://open_science.com/topic_belonging#12 http://open_science.com/degree 0.0171873611038628
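
Since each paper gets a degree for every topic, a natural follow-up is to pick the topic with the highest degree per paper. The sketch below assumes `lda_model.topic_distributions` is a list of dicts mapping topic id to degree, as the loop above suggests. The literal values are rounded from the output above, and `dominant_topic` is a hypothetical helper.

```python
# Rounded copies of the degrees printed above for papers 0 and 1.
topic_distributions = [
    {0: 0.0202, 1: 0.0204, 2: 0.9594},
    {0: 0.0170, 1: 0.9658, 2: 0.0172},
]

def dominant_topic(distribution):
    """Return the (topic_id, degree) pair with the highest degree."""
    return max(distribution.items(), key=lambda item: item[1])

dominant = [dominant_topic(dist) for dist in topic_distributions]
for paper_id, (topic_id, degree) in enumerate(dominant):
    print(f"Paper {paper_id}: topic {topic_id} (degree {degree:.4f})")
```

With these values, paper 0 (the abstract about the universe) lands in topic 2 and paper 1 (ancient civilizations) lands in topic 1, which agrees with the keywords generated earlier.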

## Named Entity Recognition

In this section, we will be exploring the use of named entity recognition inside the **Article Graph**!

In [7]:
# TODO

## Similarity

In this section, we will be exploring the use of similarity inside the **Article Graph**!

### Calculating similarity

In this subsection, we will be calculating the similarity between the papers' abstracts using the `similarity` module!

In [3]:
# pip install -U sentence-transformers
from similarity.Model import Model

# Name of the SentenceTransformer model to use
model_name = 'sentence-transformers/all-mpnet-base-v2'

# Create an instance of the class
model = Model([paper['abstract'] for paper in papers], model_name)

# Calculate similarity and retrieve the results
similarity_results = model.calculate_similarity()

# Print similarity results
print("Similarity results:")
for result in similarity_results:
    print(result)

Similarity results:
{'text_id1': 0, 'text_id2': 1, 'similarity': 0.13612501}
{'text_id1': 0, 'text_id2': 2, 'similarity': 0.0900681}
{'text_id1': 1, 'text_id2': 2, 'similarity': 0.06453505}
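
Results of this shape, one dict per unordered pair of texts, can be produced by comparing embedding vectors pairwise. The sketch below assumes cosine similarity, which is a common choice for sentence embeddings. The notebook does not show what `Model.calculate_similarity` does internally, and the toy vectors and the `cosine` and `pairwise_similarity` helpers are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_similarity(embeddings):
    """Return one result dict per unordered pair of embeddings."""
    return [
        {"text_id1": i, "text_id2": j,
         "similarity": cosine(embeddings[i], embeddings[j])}
        for i, j in combinations(range(len(embeddings)), 2)
    ]

# Toy 2-D vectors standing in for real sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = pairwise_similarity(toy_embeddings)
for result in results:
    print(result)
```

Only pairs with `text_id1 < text_id2` are generated, matching the three results shown above for three papers.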


### Adding similarity to Graph

In this subsection, we will be adding the calculated similarity to the graph!

In [None]:
# Iterate over the similarity results and add them to the graph
for result in similarity_results:
    text_id1 = result['text_id1']
    text_id2 = result['text_id2']
    similarity_score = result['similarity']
    
    # Add the similarity to the graph using the add_similarity method
    g.add_similarity(text_id1, text_id2, similarity_score)

# Explore the graph by printing the similarity between papers
for s, p, o in g.graph.triples((None, g.ns.similar_to, None)):
    for _, p1, o1 in g.graph.triples((s, g.ns.degree, None)):
        print(f"Paper 1: {s}, Paper 2: {o}, Similarity Score: {o1}")

for s, p, o in g.graph.triples((None, g.ns.similar_from, None)):
    for _, p1, o1 in g.graph.triples((s, g.ns.degree, None)):
        # Print the related paper (o), not the score, as the first paper
        print(f"Paper 1: {o}, Paper 2: {s}, Similarity Score: {o1}")

 