Implement OL (ontology learning) using FCA (Formal Concept Analysis):

- Python Library for FCA

  [https://pypi.org/project/concepts/](https://pypi.org/project/concepts/)

- The choice of both the "concepts" (the objects of the formal context) and the features (its attributes) is up to you.
  - Example features: semantic properties, related words, synsets, occurrence in documents, etc.
  
Example (useful, but you are not limited to it):
- Given two input languages (e.g., Italian and English):
  - Concepts: terms (from both languages).
  - Features: membership synsets.
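In FCA terms the terms are the *objects* of a formal context and the synsets are its *attributes*. A formal concept is a pair (A, B) where A (the extent) is exactly the set of objects sharing every attribute in B (the intent), and B is exactly the set of attributes shared by every object in A. The sketch below is a library-independent illustration of that definition, not how the `concepts` package computes its lattice. The toy context is hypothetical (loosely modelled on the English/Italian table later in this notebook), and the brute-force enumeration over all object subsets is only practical for tiny contexts.

```python
from itertools import combinations
from typing import Iterable

# Hypothetical toy context (object -> attributes), shaped like the word/synset table.
CONTEXT: dict[str, set[str]] = {
    "dog": {"noun", "verb", "dog.n.01"},
    "cane": {"noun", "dog.n.01"},
    "cat": {"noun", "verb", "cat.n.01"},
    "gatto": {"noun", "cat.n.01"},
}


def common_attributes(objects: Iterable[str], context: dict[str, set[str]]) -> set[str]:
    """Derivation A -> A': the attributes shared by every object in A."""
    shared = set().union(*context.values())  # A' of the empty set is all attributes
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes: set[str], context: dict[str, set[str]]) -> set[str]:
    """Derivation B -> B': the objects that have every attribute in B."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context: dict[str, set[str]]) -> set[tuple[frozenset[str], frozenset[str]]]:
    """Brute force: close every subset A of objects into the concept (A'', A')."""
    objects = list(context)
    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            found.add((frozenset(extent), frozenset(intent)))
    return found
```

On this toy context the enumeration finds 7 concepts; one of them is ({dog, cane}, {noun, dog.n.01}), which is exactly the kind of cross-lingual grouping the example above is after.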

### Functions

In our implementation we use two different types of data: first, two lists of words (one English, one Italian); second, a topic retrieved from Wikipedia through its API.

#### `get_info(syns)`

This function takes a list of synsets (WordNet word senses) as input and returns a list of strings: the distinct part-of-speech labels found among them, followed by the name of the first synset. It does the following:

1. Initializes an empty set `pos_set` to keep track of the parts of speech already seen.
2. Initializes an empty list `info` to collect the output strings.
3. Iterates through the synsets:
   - Retrieves the part of speech using `syn.pos()`.
   - If that part of speech has already been seen, skips to the next synset.
   - Otherwise adds it to `pos_set` and appends the matching label to `info` ('noun' for 'n', 'adjective' for 'a', 'verb' for 'v'). Other WordNet codes, such as 's' (adjective satellite) and 'r' (adverb), are recorded in `pos_set` but get no label.
4. After the loop, if the list of synsets is not empty, appends the name of the first synset (e.g. `dog.n.01`) to `info`.
5. Returns `info`: the unique part-of-speech labels followed by the first synset's name.
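The steps above can be condensed into a lookup table. The sketch below is a hypothetical restatement: `pos_summary`, `POS_LABELS` and `StubSynset` are illustrative names, not part of the notebook or of NLTK. `StubSynset` only mimics the `pos()` and `name()` methods of an NLTK synset so that the logic can run without WordNet data; it also makes visible that adjective satellites ('s') receive no label.

```python
from dataclasses import dataclass

# WordNet POS codes that receive a label; other codes (e.g. 's', 'r') are skipped.
POS_LABELS = {"n": "noun", "a": "adjective", "v": "verb"}


@dataclass(frozen=True)
class StubSynset:
    """Hypothetical stand-in exposing only the two Synset methods used here."""
    synset_name: str
    synset_pos: str

    def name(self) -> str:
        return self.synset_name

    def pos(self) -> str:
        return self.synset_pos


def pos_summary(syns) -> list[str]:
    """Unique POS labels in first-seen order, then the first synset's name."""
    seen: set[str] = set()
    info: list[str] = []
    for syn in syns:
        pos = syn.pos()
        if pos in seen:
            continue
        seen.add(pos)
        if pos in POS_LABELS:
            info.append(POS_LABELS[pos])
    if syns:
        info.append(syns[0].name())
    return info
```

For example, `pos_summary([StubSynset('dog.n.01', 'n'), StubSynset('dog.n.02', 'n'), StubSynset('dog.v.01', 'v')])` returns `['noun', 'verb', 'dog.n.01']`, the same shape as the `dog` row in the printed context (the stub names are made up); with real data one would pass `wn.synsets(word)` instead.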


#### `get_data()`
The `get_data()` function sets a user agent string, creates an English `wikipediaapi.Wikipedia` client, and retrieves the summary of each page listed in `titles` (here only 'Mariano Rivera'). The summaries are joined with spaces and returned as a single string.
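Because `titles` is hard-coded inside `get_data()`, changing the topic means editing the function. A minimal sketch, assuming one would rather pass the titles in: `get_summaries` and `wikipedia_summary_fetcher` are hypothetical names, and the fetcher is injected so that the joining logic can be tested offline. Only `wikipediaapi.Wikipedia(user_agent, language)`, `.page()`, `.exists()` and `.summary` come from the `wikipediaapi` package; skipping missing pages is an added design choice.

```python
from typing import Callable, Iterable, Optional


def wikipedia_summary_fetcher(
    user_agent: str = "university_project/1.0", language: str = "en"
) -> Callable[[str], str]:
    """Build a title -> summary function backed by the wikipediaapi package."""
    import wikipediaapi  # imported lazily so offline use does not require it

    wiki = wikipediaapi.Wikipedia(user_agent, language)

    def fetch(title: str) -> str:
        page = wiki.page(title)
        # Assumption: a missing page should contribute nothing rather than fail.
        return page.summary if page.exists() else ""

    return fetch


def get_summaries(
    titles: Iterable[str], fetch_summary: Optional[Callable[[str], str]] = None
) -> str:
    """Join the non-empty summaries of `titles` with single spaces."""
    if fetch_summary is None:
        fetch_summary = wikipedia_summary_fetcher()
    summaries = (fetch_summary(title) for title in titles)
    return " ".join(summary for summary in summaries if summary)
```

With network access, `get_summaries(['Mariano Rivera'])` should give the same text the notebook feeds to `extract_concepts()`.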


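#### `extract_concepts(content)`
The `extract_concepts()` function tokenizes the content with `word_tokenize`, lowercases the tokens, drops English stopwords and non-alphabetic tokens, tags the remaining tokens with `pos_tag`, and keeps those whose tag starts with 'N' (nouns). These nouns are the concepts; duplicates are not removed.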
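Since `extract_concepts()` keeps duplicates, slicing its output (as the Wikipedia version does with `words[:10]`) can yield fewer distinct objects than expected; the run further down shows 9 objects for 10 words, which suggests a repeated noun was merged. A hedged sketch of the filtering rule with order-preserving de-duplication follows. It takes already tagged `(token, tag)` pairs such as those returned by `pos_tag`; `noun_concepts` and its `limit` parameter are hypothetical. Filtering after tagging, rather than before as the notebook does, lets the tagger see the original casing; that is a deliberate deviation.

```python
from typing import Optional


def noun_concepts(
    tagged_tokens: list[tuple[str, str]],
    stop_words: set[str],
    limit: Optional[int] = None,
) -> list[str]:
    """Lower-cased, alphabetic, non-stopword nouns in first-seen order, without repeats."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this acts as an ordered set
    for token, tag in tagged_tokens:
        word = token.lower()
        if word in stop_words or not word.isalpha():
            continue
        if tag.startswith("N"):  # Penn Treebank noun tags: NN, NNS, NNP, NNPS
            seen.setdefault(word, None)
    concepts = list(seen)
    return concepts if limit is None else concepts[:limit]
```

In the notebook this could be called as `noun_concepts(pos_tag(word_tokenize(content)), set(stopwords.words('english')), limit=10)`.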

### Creating Definitions using Synsets

The English/Italian code block builds the formal context for the two word lists, using WordNet synsets as features.

1. Creates a `Definition` object named `d` to hold the cross table (objects and their properties).
2. Iterates through each word in the concatenated list of English and Italian words:
   - Determines the language of the word ('eng' for English, 'ita' for Italian).
   - Retrieves the word's synsets with `wn.synsets(word, lang=lang)`; Italian lookups go through the Open Multilingual Wordnet.
   - Calls `get_info()` to turn the synsets into properties (part-of-speech labels and the first synset's name).
   - Adds the word and its properties to `d` with `d.add_object()`.
3. Creates a `Context` named `c` from the populated `Definition` object `d` with `Context(*d)`.
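Inverting the cross table shows the cross-lingual alignment directly: for each attribute, the list of words carrying it is that attribute's extent in FCA terms. The rows below are copied from the context printed in the English/Italian version. `words_by_attribute` and `shared_synsets` are hypothetical helper names, and treating any attribute containing a dot as a synset name is an assumption that matches the output format of `get_info()`.

```python
from collections import defaultdict

# Rows copied from the English/Italian context printed in this notebook
# (word -> properties produced by get_info()).
CROSS_TABLE: dict[str, list[str]] = {
    "dog": ["noun", "verb", "dog.n.01"],
    "cat": ["noun", "verb", "cat.n.01"],
    "car": ["noun", "car.n.01"],
    "door": ["noun", "door.n.01"],
    "bird": ["noun", "verb", "bird.n.01"],
    "cane": ["noun", "dog.n.01"],
    "gatto": ["noun", "cat.n.01"],
    "auto": ["noun", "car.n.01"],
    "porta": ["noun", "door.n.01"],
    "uccello": ["noun", "bird.n.01"],
}


def words_by_attribute(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert word -> attributes into attribute -> words, keeping row order."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for word, attributes in table.items():
        for attribute in attributes:
            index[attribute].append(word)
    return dict(index)


def shared_synsets(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Synset attributes (assumed: names containing '.') carried by 2+ words."""
    return {
        attribute: words
        for attribute, words in words_by_attribute(table).items()
        if "." in attribute and len(words) > 1
    }
```

Here `shared_synsets(CROSS_TABLE)` maps each of the five synsets to one English/Italian pair, e.g. `'car.n.01'` to `['car', 'auto']`.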

### Visualizing the Lattice and Printing the Context

The code finishes by visualizing the lattice of the context and printing the context itself.

1. Calls `c.lattice.graphviz(view=True)` to draw the concept lattice with Graphviz and open it in the system viewer (this needs the `graphviz` Python package and the Graphviz software).
2. Prints `c`, i.e. the cross table of words (rows) against properties (columns), with an `X` wherever a word has a property.
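`graphviz()` draws the Hasse diagram of the concept lattice: one node per concept, and an edge only between a concept and its immediate superconcepts (extent strictly larger, with no concept in between). As a library-independent sketch of what that diagram encodes, the code below computes those covering pairs for a small hand-written lattice and emits Graphviz DOT text. The concept list is hypothetical, and `hasse_edges` and `to_dot` are illustrative names, not the `concepts` API.

```python
Concept = tuple[frozenset[str], frozenset[str]]  # (extent, intent)


def concept(extent: list[str], intent: list[str]) -> Concept:
    return (frozenset(extent), frozenset(intent))


# Hypothetical lattice of the context {dog, cane: noun + dog.n.01; cat: noun + cat.n.01}.
CONCEPTS = [
    concept([], ["noun", "dog.n.01", "cat.n.01"]),
    concept(["dog", "cane"], ["noun", "dog.n.01"]),
    concept(["cat"], ["noun", "cat.n.01"]),
    concept(["dog", "cane", "cat"], ["noun"]),
]


def hasse_edges(concepts: list[Concept]) -> list[tuple[Concept, Concept]]:
    """Covering pairs (lower, upper): proper extent inclusion with nothing in between."""
    edges = []
    for low in concepts:
        for high in concepts:
            if low[0] < high[0] and not any(low[0] < mid[0] < high[0] for mid in concepts):
                edges.append((low, high))
    return edges


def to_dot(concepts: list[Concept]) -> str:
    """Graphviz DOT source for the Hasse diagram, larger extents drawn higher."""
    ordered = sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))
    ids = {c: f"c{i}" for i, c in enumerate(ordered)}
    lines = ["digraph lattice {", "  rankdir=BT;"]  # edges point from bottom to top
    for c in ordered:
        label = "{" + ", ".join(sorted(c[0])) + "} / {" + ", ".join(sorted(c[1])) + "}"
        lines.append(f'  {ids[c]} [shape=box, label="{label}"];')
    for low, high in hasse_edges(ordered):
        lines.append(f"  {ids[low]} -> {ids[high]};")
    lines.append("}")
    return "\n".join(lines)
```

The DOT string can be saved to a file and rendered with Graphviz's command-line tool, e.g. `dot -Tsvg lattice.dot -o lattice.svg`. Note that the bottom and top concepts are not joined directly, because both middle concepts lie between them.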



### IMPORT

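```python
import nltk
import wikipediaapi
from concepts import Context, Definition
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize

# NLTK data packages used by the notebook
nltk.download('words')
nltk.download('punkt')
nltk.download('wordnet')
# Also needed: stopword lists and the POS tagger (used by extract_concepts)
# and the Open Multilingual Wordnet (used for the Italian lookups).
# Recent NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead.
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
nltk.download('omw-1.4')
```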

```
[nltk_data] Downloading package words to /home/palius/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package punkt to /home/palius/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /home/palius/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!

True
```

```python
def get_info(syns) -> list[str]:
    """
    Returns the part-of-speech (POS) labels of the given synsets, followed by
    the name of the first synset.
    :param syns: A list of WordNet synsets
    :return: Unique POS labels ('noun', 'adjective', 'verb') plus the first synset's name
    """
    pos_set = set()
    info = []
    for syn in syns:
        pos = syn.pos()
        if pos in pos_set:
            continue
        pos_set.add(pos)
        if pos == 'n':
            info.append('noun')
        elif pos == 'a':
            info.append('adjective')
        elif pos == 'v':
            info.append('verb')

    if syns:  # Check if syns list is not empty
        info.append(syns[0].name())

    return info

def get_data():
    """
    Retrieves the Wikipedia summaries of the pages listed in `titles`.
    :return: A string containing the joined summaries of the Wikipedia pages
    """
    user_agent = "university_project/1.0"
    wiki_wiki = wikipediaapi.Wikipedia(user_agent, 'en')

    titles = [
        'Mariano Rivera'
    ]
    summaries = [wiki_wiki.page(title).summary for title in titles]

    return ' '.join(summaries)

def extract_concepts(content):
    """
    Extracts noun keywords (the concepts) from the given content.
    :param content: The content to extract keywords from
    :return: A list of lower-cased noun keywords (duplicates included)
    """
    tokens = word_tokenize(content)
    stop_words = set(stopwords.words('english'))
    lowered = (word.lower() for word in tokens)
    filtered_tokens = [word for word in lowered if word not in stop_words and word.isalpha()]
    tagged_tokens = pos_tag(filtered_tokens)
    noun_keywords = [word for word, pos in tagged_tokens if pos.startswith('N')]  # concepts are nouns
    return noun_keywords
```


### VERSION WITH ENGLISH AND ITALIAN WORDS

```python
# English words
en_words = ["dog", "cat", "car", "door", "bird"]

# Italian translations
it_words = ["cane", "gatto", "auto", "porta", "uccello"]

# Create a Definition object (the cross table of objects and properties)
d = Definition()

# Iterate over both English and Italian words
for word in en_words + it_words:
    # Determine the language of the word
    lang = 'eng' if word in en_words else 'ita'

    # Get the synsets (senses) of the word in that language
    syns = wn.synsets(word, lang=lang)

    # Turn the synsets into properties: POS labels plus the first synset's name
    w_info = get_info(syns)

    # Add the word and its properties to the Definition object
    d.add_object(word, w_info)

# Create a Context object from the Definition object
c = Context(*d)

# Render the concept lattice with Graphviz and open it in a viewer
c.lattice.graphviz(view=True)

# Print the Context (its cross table)
print(c)
```

```
<Context object mapping 10 objects to 7 properties [82009819] at 0x7f527ebfffd0>
           |noun|verb|dog.n.01|cat.n.01|car.n.01|door.n.01|bird.n.01|
    dog    |X   |X   |X       |        |        |         |         |
    cat    |X   |X   |        |X       |        |         |         |
    car    |X   |    |        |        |X       |         |         |
    door   |X   |    |        |        |        |X        |         |
    bird   |X   |X   |        |        |        |         |X        |
    cane   |X   |    |X       |        |        |         |         |
    gatto  |X   |    |        |X       |        |         |         |
    auto   |X   |    |        |        |X       |         |         |
    porta  |X   |    |        |        |        |X        |         |
    uccello|X   |    |        |        |        |         |X        |
```


### VERSION WITH DATA RETRIEVED FROM THE WIKIPEDIA API

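```python
# Nouns extracted from the Wikipedia summary; only the first 10 are used
words = extract_concepts(get_data())

d = Definition()
for word in words[:10]:
    # English WordNet lookup (lang defaults to 'eng')
    syns = wn.synsets(word)
    w_info = get_info(syns)
    d.add_object(word, w_info)

c = Context(*d)
c.lattice.graphviz(view=True)
print(c)
```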

```
<Context object mapping 9 objects to 10 properties [65422e3e] at 0x7f527b2d7160>
            |noun|rivera.n.01|baseball.n.01|pitcher.n.01|verb|season.n.01|league.n.01|york.n.01|yankee.n.01|career.n.01|
    mariano |    |           |             |            |    |           |           |         |           |           |
    rivera  |X   |X          |             |            |    |           |           |         |           |           |
    baseball|X   |           |X            |            |    |           |           |         |           |           |
    pitcher |X   |           |             |X           |    |           |           |         |           |           |
    seasons |X   |           |             |            |X   |X          |           |         |           |           |
    league  |X   |           |             |            |X   |           |X          |         |           |           |
    york    |X   |           |             |            |    |          
```

