You can use the Stanza library in Python to lemmatize Latin text and then perform topic modeling on the lemmatized text. Here is an example of how to do it:

Install the Stanza library by running **pip install stanza** in your terminal (or **!pip install stanza** in a notebook cell, as below), then import the necessary modules:

In [1]:
!pip install stanza

In [47]:
!pip3 install plotly

In [28]:
import stanza
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


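Download the Latin language model with stanza.download('la'), then load it and process the text:

In [29]:
# Download the Latin models (only needed once) and build the pipeline
stanza.download('la')
nlp = stanza.Pipeline('la')
doc = nlp("Quo usque tandem abutere, Catilina, patientia nostra? Quam diu etiam furor iste tuus nos eludet? Quem ad finem sese effrenata iactabit audacia?")

# LDA needs more than one document, so treat each sentence as a document:
# join its lemmas into one string, skipping words that have no lemma
lemmatized_text = [' '.join(word.lemma for word in sent.words if word.lemma) for sent in doc.sentences]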


INFO:stanza:Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


INFO:stanza:Loading these models for language: la (Latin):
| Processor | Package |
-----------------------
| tokenize  | ittb    |
| pos       | ittb    |
| lemma     | ittb    |
| depparse  | ittb    |

INFO:stanza:Use device: cpu
INFO:stanza:Loading: tokenize
INFO:stanza:Loading: pos
INFO:stanza:Loading: lemma
INFO:stanza:Loading: depparse
INFO:stanza:Done loading processors!
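
To see why lemmatization helps topic modeling, the sketch below counts a few inflected Latin forms with and without lemmatization. The LEMMAS dictionary is a hypothetical stand-in for the lemmas Stanza would assign; only the standard library is used.

```python
from collections import Counter

# Hypothetical surface-form -> lemma mapping, standing in for what a
# lemmatizer such as Stanza would return for each word.
LEMMAS = {
    "patientia": "patientia",
    "patientiam": "patientia",
    "patientiae": "patientia",
    "furor": "furor",
    "furorem": "furor",
}

tokens = ["patientia", "patientiam", "furor", "patientiae", "furorem"]

# Without lemmatization every inflected form is a separate feature
surface_counts = Counter(tokens)

# With lemmatization the forms collapse onto their dictionary headword
lemma_counts = Counter(LEMMAS.get(t, t) for t in tokens)

print(surface_counts)
print(lemma_counts)
```

Five distinct surface forms become two lemma features, so the topic model sees "patientia" three times instead of three unrelated words once each.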


Perform topic modeling using Latent Dirichlet Allocation (LDA). CountVectorizer treats each string in the list as one document:

In [39]:
vectorizer = CountVectorizer(analyzer='word', lowercase=False)
X = vectorizer.fit_transform(lemmatized_text)

lda = LatentDirichletAllocation(n_components=5, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)
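
CountVectorizer turns the list of strings into a document-term matrix X, with one row per string. The pure-Python sketch below mimics its default behavior (the default token pattern (?u)\b\w\w+\b, optional lowercasing, an alphabetically sorted vocabulary) to show what X contains; count_matrix is a hypothetical helper for illustration, not scikit-learn's implementation.

```python
import re

# Default token pattern used by scikit-learn's CountVectorizer:
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_matrix(docs, lowercase=True):
    """Return (vocabulary, rows): a sorted vocabulary and one count row per document."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower() if lowercase else doc) for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocabulary)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            row[index[tok]] += 1
        rows.append(row)
    return vocabulary, rows


# One string per document: three short lemmatized "documents"
vocab, rows = count_matrix(["quo usque tandem abutor", "quam diu etiam furor", "furor tandem"])
print(vocab)
print(rows)

# A list of single lemmas gives one-word documents, which LDA cannot learn from
_, word_rows = count_matrix(["quo", "usque", "tandem"])
print(len(word_rows))
```

This is why the lemmas should be joined into one string per document before vectorizing: a flat list of lemmas would be read as many one-word documents.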


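Note that this is a basic example: a single short passage is far too little text for LDA to find meaningful topics. For real use, run it on many documents, tune the LDA parameters (for example n_components and max_iter), or try another topic modeling algorithm that better suits your data.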
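
One way to check whether the topics make sense is to list each topic's highest-weighted words. The helper below, top_words (a hypothetical name), sorts each row of a topic-word matrix; it works on plain lists and, assuming the fitted model above, on lda.components_ together with vectorizer.get_feature_names_out(). The numbers in the example are made up.

```python
def top_words(components, feature_names, n=3):
    """Return the n highest-weighted words for each topic row."""
    result = []
    for weights in components:
        # Column indices ordered from largest to smallest weight
        ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
        result.append([feature_names[j] for j in ranked[:n]])
    return result


# Toy topic-word weights (made up): 2 topics x 4 words
components = [
    [0.1, 2.0, 0.5, 1.2],
    [1.5, 0.2, 0.9, 0.1],
]
feature_names = ["audacia", "furor", "patientia", "tandem"]
print(top_words(components, feature_names, n=2))
# [['furor', 'tandem'], ['audacia', 'patientia']]
```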

Here is the full code to visualize the results of topic modeling using a radar chart with the plotly library in Python:

In [49]:
import plotly
import plotly.graph_objs as go
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Vectorize the text data (one string per document)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(lemmatized_text)

# Fit the LDA model
lda = LatentDirichletAllocation(n_components=5, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)

# Get the topic-word matrix (one row per topic, one column per word)
topic_word = lda.components_

# Scale each topic's word weights to [0, 1] so all topics fit the radial axis range
topic_word_scaled = topic_word / topic_word.max(axis=1, keepdims=True)

# Get the feature names (the words, in the same column order as topic_word)
feature_names = vectorizer.get_feature_names_out()

# Create the radar chart: one polygon per topic, one spoke per word
data = []
for i in range(topic_word_scaled.shape[0]):
    topic = topic_word_scaled[i,:]
    data.append(go.Scatterpolar(
        r = topic,
        theta = feature_names,
        fill = 'toself',
        name = 'Topic {}'.format(i)
    ))
    
layout = go.Layout(
    polar = dict(
        radialaxis = dict(
            visible = True,
            range = [0, 1]
        )
    ),
    showlegend = True
)

fig = go.Figure(data=data, layout=layout)
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot(fig)


Topic modeling with Radar Chart
Here is an example of how you can use the plotly library in Python to visualize the results of topic modeling using a radar chart, this time with the necessary imports:

In [33]:
!pip install -U scikit-learn


In [45]:
import plotly
import plotly.graph_objs as go
from sklearn.decomposition import LatentDirichletAllocation

# Fit the LDA model (reuses X and vectorizer from the previous cells)
lda = LatentDirichletAllocation(n_components=10, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)

# Get the topic-word matrix and scale each topic's word weights to [0, 1]
topic_word = lda.components_
topic_word_scaled = topic_word / topic_word.max(axis=1, keepdims=True)

# Create the radar chart: one polygon per topic, one spoke per word.
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = vectorizer.get_feature_names_out()
data = []
for i in range(topic_word_scaled.shape[0]):
    topic = topic_word_scaled[i,:]
    data.append(go.Scatterpolar(
        r = topic,
        theta = feature_names,
        fill = 'toself',
        name = 'Topic {}'.format(i)
    ))

layout = go.Layout(
    polar = dict(
        radialaxis = dict(
            visible = True,
            range = [0, 1]
        )
    ),
    showlegend = True
)

fig = go.Figure(data=data, layout=layout)
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot(fig)

In [44]:
import plotly
import plotly.graph_objs as go
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Prepare the data
text_data = ["This is a text about topic A", "This text is about topic B", "Another text about topic A", "Yet another text about topic B"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(text_data)

# Fit the LDA model
lda = LatentDirichletAllocation(n_components=5, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)

# Get the topic-word matrix and scale each topic's word weights to [0, 1]
topic_word = lda.components_
topic_word_scaled = topic_word / topic_word.max(axis=1, keepdims=True)

# Create the radar chart: one polygon per topic, one spoke per word
data = []
feature_names = vectorizer.get_feature_names_out()
for i in range(topic_word_scaled.shape[0]):
    topic = topic_word_scaled[i,:]
    data.append(go.Scatterpolar(
        r = topic,
        theta = feature_names,
        fill = 'toself',
        name = 'Topic {}'.format(i)
    ))
    
layout = go.Layout(
    polar = dict(
        radialaxis = dict(
            visible = True,
            range = [0, 1]
        )
    ),
    showlegend = True
)

fig = go.Figure(data=data, layout=layout)
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot(fig)

Calling plotly.offline.init_notebook_mode(connected=True) before plotly.offline.iplot(fig) renders the chart inline in a Jupyter notebook.
This code displays a radar chart in which each topic is a polygon with one spoke per word: the farther a vertex lies from the center, the higher that word's weight in the topic. You can use the chart to compare topics and spot the words they share or that set them apart.
Note that plotly must be installed to run this code. Install it with !pip install plotly (or !pip3 install plotly if several Python versions are installed on your system), or with conda install -c plotly plotly if you use the Anaconda distribution.
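
LDA's components_ are unnormalized word weights (scikit-learn describes them as pseudo-counts), so they generally exceed the radar chart's [0, 1] radial range. One option, assumed here, is to divide each topic's row by its largest weight so that every topic's strongest word sits on the outer ring. The plain-Python sketch below shows that per-row scaling; scale_rows_to_unit is a hypothetical helper, and the input numbers are made up.

```python
def scale_rows_to_unit(matrix):
    """Divide each row by its maximum so every row's largest value becomes 1.0."""
    scaled = []
    for row in matrix:
        peak = max(row)
        # Leave all-zero (or non-positive) rows unchanged to avoid dividing by zero
        scaled.append([value / peak for value in row] if peak > 0 else list(row))
    return scaled


topic_word = [[2.0, 1.0, 0.5], [0.4, 0.8, 0.2]]
print(scale_rows_to_unit(topic_word))
# [[1.0, 0.5, 0.25], [0.5, 1.0, 0.25]]
```

Scaling by the row maximum keeps topics comparable in shape; normalizing each row to sum to 1 instead would show word probabilities, but the values would then be too small to fill a [0, 1] axis.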

Here is an example of how you can use the pyLDAvis library in Python to visualize the results of topic modeling:

In [8]:
!pip install joblib scikit-learn pandas


In [11]:
!pip install -U scikit-learn


In [12]:
!pip3 install pyLDAvis

In [41]:
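import pyLDAvis
from sklearn.decomposition import LatentDirichletAllocation

# pyLDAvis 3.4+ renamed the scikit-learn helper module from pyLDAvis.sklearn to
# pyLDAvis.lda_model. Older releases call vectorizer.get_feature_names(), which
# scikit-learn 1.2 removed, so upgrade pyLDAvis if the fallback import fails later
try:
    import pyLDAvis.lda_model as pyldavis_sklearn
except ImportError:
    import pyLDAvis.sklearn as pyldavis_sklearn

# Fit the LDA model (reuses X and vectorizer from the previous cells)
lda = LatentDirichletAllocation(n_components=3, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)

# Create the data to be visualized
vis_data = pyldavis_sklearn.prepare(lda, X, vectorizer)

# Display the visualization inline in the notebook
pyLDAvis.display(vis_data)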

This code displays an interactive visualization that you can use to explore the topics, compare them, and see which words they share.

The left panel places each topic as a circle on an intertopic distance map, where the circle's area reflects how prevalent the topic is in the corpus. The right panel lists the most relevant terms for the selected topic as a bar chart, and a slider for the λ parameter adjusts how term relevance is computed. Hover over or click a topic circle to update the term list.
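
The λ slider ranks terms by the relevance measure of Sievert and Shirley (2014), which pyLDAvis implements: relevance(w, t) = λ · log p(w | t) + (1 − λ) · log(p(w | t) / p(w)). With λ = 1 terms are ranked by their probability in the topic; with λ = 0 they are ranked by lift, favoring topic-specific words. The sketch below computes this in plain Python on made-up probabilities; relevance and rank_terms are hypothetical helper names.

```python
import math


def relevance(p_word_given_topic, p_word, lam):
    """Relevance of one term for one topic, following Sievert and Shirley (2014)."""
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(p_word_given_topic / p_word)


def rank_terms(topic_probs, corpus_probs, lam):
    """Sort term names by relevance for one topic, highest first."""
    scores = {
        term: relevance(topic_probs[term], corpus_probs[term], lam)
        for term in topic_probs
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy probabilities (made up): 'sum' is frequent everywhere, 'Catilina' is topic-specific
topic_probs = {"sum": 0.30, "Catilina": 0.20, "senatus": 0.10}
corpus_probs = {"sum": 0.25, "Catilina": 0.02, "senatus": 0.05}

print(rank_terms(topic_probs, corpus_probs, lam=1.0))  # ['sum', 'Catilina', 'senatus']
print(rank_terms(topic_probs, corpus_probs, lam=0.0))  # ['Catilina', 'senatus', 'sum']
```

Lowering λ in the visualization is therefore a quick way to push common lemmas such as sum down the list and surface the words that characterize a topic.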

To perform topic modeling on multiple documents using Stanza for lemmatization in Python, you can follow these steps:

Install the Stanza library by running !pip install stanza or !pip3 install stanza.

Prepare your documents by reading them into a list of strings, where each element of the list corresponds to one document.

Use the Stanza library to lemmatize the documents: create a pipeline object with stanza.Pipeline() and use it to process each document.

In [None]:
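import stanza

# Download the English models (once) and initialize the Stanza pipeline
stanza.download('en')
nlp = stanza.Pipeline('en')

# Process the documents (`documents` is the list of strings prepared above).
# Lemmas are stored on Word objects, not Token objects; join each document's
# lemmas into one string so that CountVectorizer can tokenize it
lemmatized_docs = []
for doc in documents:
    lemmas = [word.lemma for sent in nlp(doc).sentences for word in sent.words if word.lemma]
    lemmatized_docs.append(' '.join(lemmas))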


Use the lemmatized documents to perform topic modeling using a library such as scikit-learn's Latent Dirichlet Allocation (LDA) algorithm.

In [None]:
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Vectorize the lemmatized documents
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(lemmatized_docs)

# Fit the LDA model
lda = LatentDirichletAllocation(n_components=5, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)


You can then use the transform() method to obtain each document's topic distribution: one row per document, one weight per topic.

In [None]:
document_topics = lda.transform(X)
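
Each row of document_topics is a document's topic distribution (scikit-learn normalizes the rows to sum to 1). A common next step is to label each document with its highest-weighted topic. The sketch below does this in plain Python on a made-up matrix; dominant_topics is a hypothetical helper, and with NumPy you could use document_topics.argmax(axis=1) instead.

```python
def dominant_topics(doc_topic):
    """Return (topic index, weight) of the largest entry in each document row."""
    result = []
    for row in doc_topic:
        best = max(range(len(row)), key=lambda k: row[k])
        result.append((best, row[best]))
    return result


# Toy document-topic matrix (made-up values); each row sums to 1
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
]
print(dominant_topics(doc_topic))
# [(0, 0.7), (2, 0.8)]
```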


# How do I perform topic modeling on multiple Latin .txt files in a folder with Stanza lemmatization?

To perform topic modeling on multiple txt documents in Latin in a folder using Stanza for lemmatization in Python, you can follow these steps:

Install the Stanza library by running !pip install stanza or !pip3 install stanza.

Prepare your documents by reading them into a list of strings, where each element of the list corresponds to one document. You can use the os library to get a list of all the files in a folder and read each txt file into a string.

In [None]:
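import os

path = 'path/to/folder'
documents = []
# Read every .txt file in the folder (sorted for a stable order),
# assuming the files are UTF-8 encoded
for filename in sorted(os.listdir(path)):
    if filename.endswith('.txt'):
        with open(os.path.join(path, filename), 'r', encoding='utf-8') as f:
            documents.append(f.read())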


Use the Stanza library to lemmatize the documents: create a pipeline object with stanza.Pipeline() and use it to process each document.

In [None]:
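import stanza

# Download the Latin models (once) and initialize the Stanza pipeline
stanza.download('la')
nlp = stanza.Pipeline('la')

# Process the documents. Lemmas are stored on Word objects, not Token objects;
# join each document's lemmas into one string so CountVectorizer can tokenize it
lemmatized_docs = []
for doc in documents:
    lemmas = [word.lemma for sent in nlp(doc).sentences for word in sent.words if word.lemma]
    lemmatized_docs.append(' '.join(lemmas))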


Use the lemmatized documents to perform topic modeling using a library such as scikit-learn's Latent Dirichlet Allocation (LDA) algorithm.

In [None]:
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Vectorize the lemmatized documents
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(lemmatized_docs)

# Fit the LDA model
lda = LatentDirichletAllocation(n_components=5, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)


You can then use the transform() method to obtain each document's topic distribution: one row per document, one weight per topic.

In [None]:
document_topics = lda.transform(X)


# How do I perform topic modeling on multiple Latin documents in a CSV file with Stanza lemmatization?

To perform topic modeling on multiple documents in Latin in a CSV file using Stanza for lemmatization in Python, you can follow these steps:

Install the Stanza library by running !pip install stanza or !pip3 install stanza.

Prepare your documents by reading them into a list of strings, where each element of the list corresponds to one document. You can use the pandas library to read the CSV file and extract the text data from it.

In [None]:
import pandas as pd

# Read the CSV file
df = pd.read_csv('path/to/file.csv')

# Extract the text data from the CSV file
documents = df['column_name'].tolist()
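
If pandas is not available, the standard csv module can extract the same column. The sketch below also skips empty cells, which would otherwise be passed to Stanza as empty documents. read_column is a hypothetical helper, and 'text' is a placeholder column name (like 'column_name' above); the sample data is made up.

```python
import csv
import io


def read_column(file_obj, column):
    """Collect the non-empty values of one CSV column as a list of strings."""
    reader = csv.DictReader(file_obj)
    return [row[column] for row in reader if (row.get(column) or "").strip()]


# 'text' is a placeholder column name; replace it with the column holding your Latin texts
sample = io.StringIO("id,text\n1,Gallia est omnis divisa in partes tres\n2,\n3,Arma virumque cano\n")
documents = read_column(sample, "text")
print(documents)
```

With a real file, open it with newline='' as the csv documentation recommends, for example: with open('path/to/file.csv', newline='', encoding='utf-8') as f: documents = read_column(f, 'column_name').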


Use the Stanza library to lemmatize the documents. You can use the stanza.Pipeline() function to create a pipeline object and use it to process the documents.

In [None]:
import stanza

# Initialize the Stanza pipeline
stanza.download('la')
nlp = stanza.Pipeline('la')

# Process the documents
lemmatized_docs = []
for doc in documents:
    lemmatized_docs.append([token.lemma for sent in nlp(doc).sentences for token in sent.tokens])


Use the lemmatized documents to perform topic modeling using a library such as scikit-learn's Latent Dirichlet Allocation (LDA) algorithm.

In [None]:
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Vectorize the lemmatized documents
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(lemmatized_docs)

# Fit the LDA model
lda = LatentDirichletAllocation(n_components=5, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(X)
