* **Topic modeling**
    
    Topic modeling is a type of statistical model for discovering the abstract topics that occur in a collection of documents. It is a text mining tool frequently used to uncover hidden semantic structures in a body of text.

* **LDA**

    LDA (Latent Dirichlet Allocation) is a technique that tries to discover topics in text documents using probability distributions: each document is treated as a mixture of topics, and each topic as a distribution over words.
* **Topics:** Decompose the objects into shared attributes.
* **Clusters:** Gather a given number of objects that share the same characteristics.
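
To make the role of probability distributions concrete, the sketch below follows the generative story that LDA assumes: a document mixes topics, and each topic is a distribution over words. The two topics, their word probabilities and the document's topic mixture are invented for illustration. Real LDA works in the opposite direction and infers these distributions from the observed words.

```python
import random

# Hypothetical topics: each one is a probability distribution over words.
topics = {
    "crime": {"police": 0.5, "court": 0.3, "charged": 0.2},
    "sport": {"cup": 0.4, "final": 0.4, "win": 0.2},
}

# Hypothetical document: a probability distribution over topics.
doc_topic_mix = {"crime": 0.8, "sport": 0.2}


def generate_document(num_words, rng):
    """Sample words by first picking a topic, then a word from that topic."""
    words = []
    for _ in range(num_words):
        topic = rng.choices(list(doc_topic_mix), weights=list(doc_topic_mix.values()))[0]
        word_dist = topics[topic]
        word = rng.choices(list(word_dist), weights=list(word_dist.values()))[0]
        words.append(word)
    return words


rng = random.Random(2021)
headline = generate_document(6, rng)
print(headline)
```

Because the document leans 80% towards the hypothetical "crime" topic, most sampled words tend to come from that topic's vocabulary.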

**Application**

In [None]:
import pandas as pd

news_data = pd.read_csv("abcnews-date-text.csv")
print(news_data.shape)

NUM_SAMPLES = 20000
sample_df = news_data.sample(NUM_SAMPLES, replace=False).reset_index(drop=True)

sample_df.sample(5) 

(1244184, 2)


Unnamed: 0,publish_date,headline_text
361,20050906,abuse allegation against former principal upheld
6471,20200514,will australia have a second wave of coronavir...
14270,20140729,alleged rebels bikies arrested over 20m drug b...
19987,20041020,smoking bill clears tas upper house
3463,20120427,iron ore heir michael wright dies


In [None]:
# Import the CountVectorizer module
from sklearn.feature_extraction.text import CountVectorizer

# Create Count Vectorizer instance and Document X Term matrix (dtm)
cv = CountVectorizer(max_df=0.95, min_df=3, stop_words="english")
dtm = cv.fit_transform(sample_df['headline_text'])

# Show the dtm in compressed sparse row format
dtm

<20000x6376 sparse matrix of type '<class 'numpy.int64'>'
	with 89573 stored elements in Compressed Sparse Row format>

The output above shows that we have 20,000 documents and 6,376 distinct words.
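
As a rough illustration of what a document-term matrix contains, here is a minimal pure-Python sketch on three made-up headlines. It only mimics the `min_df` filter and whitespace tokenization; `CountVectorizer` does more (lowercasing, its own token pattern, stop words, sparse storage), so treat the names `toy_docs`, `toy_vocabulary` and `toy_dtm` as illustrative only.

```python
from collections import Counter

toy_docs = [
    "police charge man over crash",
    "man charged after car crash",
    "council approves water plan",
]
MIN_DF = 2  # keep only words that appear in at least 2 documents

tokenized = [doc.split() for doc in toy_docs]

# Document frequency: in how many documents does each word appear?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

# Vocabulary: words that pass the min_df filter, in alphabetical order.
toy_vocabulary = sorted(word for word, df in doc_freq.items() if df >= MIN_DF)

# One row per document, one column per vocabulary word.
toy_dtm = [[Counter(tokens)[word] for word in toy_vocabulary] for tokens in tokenized]

print(toy_vocabulary)
for row in toy_dtm:
    print(row)
```

The third headline shares no frequent word with the others, so its row is all zeros. This is also why the real matrix is stored in a sparse format.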

In [None]:
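# Get all the words/features
# (scikit-learn >= 1.0 offers cv.get_feature_names_out(); get_feature_names() was removed in 1.2)
feature_names = cv.get_feature_names()

# Show the words from index 6440 onward (empty, because the vocabulary has only 6376 terms)
feature_names[6440:]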



[]

**Building the LDA model**

From the DTM we can build an LDA model to extract topics from the underlying texts. The number of topics to extract is a hyperparameter; here we use 7 topics.
Because LDA is an iterative algorithm, we run 30 iterations in this case, whereas the default is 10. The random state can be any integer; fixing it makes the results reproducible.

In [None]:
# Import the LDA module from sklearn
from sklearn.decomposition import LatentDirichletAllocation

# Set the number of topics
NB_TOPICS = 7 

# Create the model
LDA_model = LatentDirichletAllocation(n_components = NB_TOPICS, 
                                      max_iter = 30, random_state = 2021)

# Fit the model on the dtm
LDA_model.fit(dtm)

LatentDirichletAllocation(max_iter=30, n_components=7, random_state=2021)

In [None]:
for i, topic in enumerate(LDA_model.components_):
    print("THE TOP {} WORDS FOR TOPIC #{}".format(10, i))
    print([cv.get_feature_names()[index] for index in topic.argsort()[-10:]])
    print("\n")

THE TOP 10 WORDS FOR TOPIC #0
['blaze', 'hill', 'trump', 'national', 'new', 'deal', 'east', 'north', 'news', 'interview']


THE TOP 10 WORDS FOR TOPIC #1
['country', 'decision', 'new', 'help', 'report', 'urged', 'government', 'council', 'water', 'govt']


THE TOP 10 WORDS FOR TOPIC #2
['drought', 'covid', 'return', 'south', 'house', 'dead', 'win', 'queensland', 'new', 'says']


THE TOP 10 WORDS FOR TOPIC #3
['coronavirus', 'farm', 'home', 'china', 'sydney', 'final', 'world', 'cup', 'australia', 'day']


THE TOP 10 WORDS FOR TOPIC #4
['killed', 'murder', 'car', 'woman', 'death', 'charged', 'crash', 'court', 'man', 'police']


THE TOP 10 WORDS FOR TOPIC #5
['centre', 'wa', 'indigenous', 'changes', 'opposition', 'minister', 'union', 'new', 'australia', 'health']


THE TOP 10 WORDS FOR TOPIC #6
['war', 'season', 'end', 'plan', 'australian', 'council', 'canberra', 'market', 'gold', 'coast']
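
Note that `argsort()` orders indices from the lowest to the highest weight, so in each list above the last word is the one with the largest weight in that topic. The sketch below reproduces that ordering in plain Python with invented weights; `toy_words` and `toy_weights` are hypothetical.

```python
# Hypothetical topic-word weights, invented for illustration.
toy_words = ["police", "court", "man", "crash", "water"]
toy_weights = [9.0, 4.0, 7.5, 2.0, 0.5]
TOP_K = 3

# Same idea as argsort(): indices ordered from lowest to highest weight.
ascending = sorted(range(len(toy_weights)), key=lambda i: toy_weights[i])

# The last TOP_K indices are the heaviest words, still in ascending order.
top_ascending = [toy_words[i] for i in ascending[-TOP_K:]]

# Reversing puts the most important word first, which is often easier to read.
top_descending = top_ascending[::-1]

print(top_ascending)
print(top_descending)
```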




In [None]:
# Link documents to topics
final_topics = LDA_model.transform(dtm)

# Show the shape of the object 
print(final_topics.shape)

# Add a column with the most probable topic of each document
sample_df["Topic N°"] = final_topics.argmax(axis=1)

# Show 5 first documents and their associated topics
sample_df.head()

(20000, 7)


Unnamed: 0,publish_date,headline_text,Topic N°
0,20061107,water opener am1,6
1,20161023,somali pirates release hostages after four years,0
2,20041214,furniture workers sacked after company collapse,3
3,20170407,naplan online test pilot queensland pulls out ...,2
4,20071008,lifeguard cameras promise safety boost,1
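
Each row of `final_topics` holds one document's weights over the 7 topics, and `argmax(axis=1)` keeps only the index of the largest one. The sketch below shows the same step on invented numbers, and also why it can be worth keeping the maximum weight next to the label. All names and values are hypothetical.

```python
# Hypothetical document-topic probabilities (each row sums to 1).
toy_doc_topics = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.34, 0.33, 0.33],  # almost a tie: the hard label hides this uncertainty
]


def dominant_topic(row):
    """Return the index of the largest probability in the row."""
    return max(range(len(row)), key=lambda k: row[k])


toy_labels = [dominant_topic(row) for row in toy_doc_topics]
toy_confidence = [max(row) for row in toy_doc_topics]

print(toy_labels)
print(toy_confidence)
```

The third document receives topic 0 even though the three topics are nearly equally likely, so a hard label alone can overstate how well a short headline fits its topic.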


In [None]:
# Install the pyLDAvis library
!pip install pyldavis





In [None]:
import pyLDAvis.sklearn  # in pyLDAvis 3.4 and later this module is named pyLDAvis.lda_model

# Enable the visualization on the notebook
pyLDAvis.enable_notebook()

# Create the panel for the visualization
panel = pyLDAvis.sklearn.prepare(LDA_model, dtm, cv, mds='tsne') 

# Show the panel
panel


**Topic modeling - application**

In [None]:
news_data

Unnamed: 0,publish_date,headline_text
0,20030219,aba decides against community broadcasting lic...
1,20030219,act fire witnesses must be aware of defamation
2,20030219,a g calls for infrastructure protection summit
3,20030219,air nz staff in aust strike for pay rise
4,20030219,air nz strike to affect australian travellers
...,...,...
1244179,20211231,two aged care residents die as state records 2...
1244180,20211231,victoria records 5;919 new cases and seven deaths
1244181,20211231,wa delays adopting new close contact definition
1244182,20211231,western ringtail possums found badly dehydrate...


Splitting each headline into a list of words with the split function

In [None]:
news_data = news_data.headline_text.str.split(" ")
news_data

0          [aba, decides, against, community, broadcastin...
1          [act, fire, witnesses, must, be, aware, of, de...
2          [a, g, calls, for, infrastructure, protection,...
3          [air, nz, staff, in, aust, strike, for, pay, r...
4          [air, nz, strike, to, affect, australian, trav...
                                 ...                        
1244179    [two, aged, care, residents, die, as, state, r...
1244180    [victoria, records, 5;919, new, cases, and, se...
1244181    [wa, delays, adopting, new, close, contact, de...
1244182    [western, ringtail, possums, found, badly, deh...
1244183    [what, makes, you, a, close, covid, contact, h...
Name: headline_text, Length: 1244184, dtype: object

Preprocessing with WordNetLemmatizer

In [None]:
import nltk
from nltk.stem import WordNetLemmatizer

In [None]:
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [None]:
# Instantiate the lemmatizer once and reuse it for every word
retira_instancia = WordNetLemmatizer()

In [None]:
news_data = news_data.apply(lambda doc: [retira_instancia.lemmatize(word) for word in doc])
news_data

0          [aba, decides, against, community, broadcastin...
1          [act, fire, witness, must, be, aware, of, defa...
2          [a, g, call, for, infrastructure, protection, ...
3          [air, nz, staff, in, aust, strike, for, pay, r...
4          [air, nz, strike, to, affect, australian, trav...
                                 ...                        
1244179    [two, aged, care, resident, die, a, state, rec...
1244180    [victoria, record, 5;919, new, case, and, seve...
1244181    [wa, delay, adopting, new, close, contact, def...
1244182    [western, ringtail, possum, found, badly, dehy...
1244183    [what, make, you, a, close, covid, contact, he...
Name: headline_text, Length: 1244184, dtype: object

Stop word removal

In [None]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
from nltk.corpus import stopwords

In [None]:
lista_palavras = stopwords.words('english')
lista_palavras

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [None]:
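# A set gives constant-time membership tests, which matters with 1.2 million headlines
stop_words_set = set(lista_palavras)
news_data = news_data.apply(lambda doc: [word for word in doc if word not in stop_words_set])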
news_data

0           [aba, decides, community, broadcasting, licence]
1              [act, fire, witness, must, aware, defamation]
2              [g, call, infrastructure, protection, summit]
3                  [air, nz, staff, aust, strike, pay, rise]
4           [air, nz, strike, affect, australian, traveller]
                                 ...                        
1244179    [two, aged, care, resident, die, state, record...
1244180    [victoria, record, 5;919, new, case, seven, de...
1244181    [wa, delay, adopting, new, close, contact, def...
1244182    [western, ringtail, possum, found, badly, dehy...
1244183             [make, close, covid, contact, new, rule]
Name: headline_text, Length: 1244184, dtype: object

In [None]:
from gensim.corpora.dictionary import Dictionary

Converting the words into numbers (integer ids), which is a more suitable representation for text processing

In [None]:
documento_palavras = Dictionary(news_data)

Removing extreme words: tokens that appear in fewer than 5 documents or in more than 50% of the documents

In [None]:
documento_palavras.filter_extremes(no_below = 5, no_above = 0.5)
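
The filtering rule can be sketched in plain Python on a toy corpus. This is only an approximation of `filter_extremes`, under the assumption that a token is kept when its document frequency lies between `no_below` and `no_above` times the number of documents. It ignores the `keep_n` cap that gensim also applies, and `toy_docs`, `NO_BELOW` and `NO_ABOVE` are illustrative names.

```python
from collections import Counter

toy_docs = [
    ["police", "charge", "man"],
    ["police", "find", "man"],
    ["police", "water", "plan"],
    ["police", "water", "council"],
]
NO_BELOW = 2    # keep tokens found in at least 2 documents
NO_ABOVE = 0.5  # keep tokens found in at most 50% of the documents

# Count in how many documents each token appears.
doc_freq = Counter(token for doc in toy_docs for token in set(doc))
max_docs = NO_ABOVE * len(toy_docs)

kept = sorted(
    token for token, df in doc_freq.items()
    if NO_BELOW <= df <= max_docs
)
print(kept)
```

Here "police" is dropped for being too common (it appears in every document), while "charge", "find", "plan" and "council" are dropped for being too rare.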

In [None]:
len(documento_palavras.token2id)

35685

In [None]:
repete_id = news_data.apply(documento_palavras.doc2bow)
repete_id

0                   [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
1          [(5, 1), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
2              [(11, 1), (12, 1), (13, 1), (14, 1), (15, 1)]
3          [(16, 1), (17, 1), (18, 1), (19, 1), (20, 1), ...
4          [(16, 1), (18, 1), (22, 1), (23, 1), (24, 1), ...
                                 ...                        
1244179    [(35, 1), (77, 1), (1338, 1), (1478, 1), (1733...
1244180    [(35, 1), (167, 1), (348, 1), (446, 1), (1352,...
1244181    [(289, 1), (446, 1), (1240, 1), (1665, 1), (53...
1244182    [(398, 1), (794, 1), (1634, 1), (8734, 1), (10...
1244183    [(446, 1), (501, 1), (866, 1), (1240, 1), (532...
Name: headline_text, Length: 1244184, dtype: object
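
The `(id, count)` pairs above are the bag-of-words encoding of each headline. A minimal sketch of the idea follows. The order in which ids are assigned here (first appearance) is an assumption made for readability and may differ from gensim's, and `toy_token2id` and `to_bow` are hypothetical names.

```python
from collections import Counter

toy_docs = [
    ["air", "nz", "strike", "air"],
    ["nz", "staff", "strike"],
]

# Assign an integer id to every distinct token, in order of first appearance.
toy_token2id = {}
for doc in toy_docs:
    for token in doc:
        toy_token2id.setdefault(token, len(toy_token2id))


def to_bow(doc):
    """Return sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(toy_token2id[t] for t in doc if t in toy_token2id)
    return sorted(counts.items())


toy_bows = [to_bow(doc) for doc in toy_docs]
print(toy_token2id)
print(toy_bows)
```

Tokens removed from the dictionary (for example by the extremes filter) simply do not appear in the encoded document, which is why some headlines end up with fewer pairs than words.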

Applying LDA

In [None]:
from gensim.models import LdaModel

In [None]:
lda = LdaModel(repete_id, num_topics = 10, id2word = documento_palavras)

Printing the weight of each word in each topic

In [None]:
for i in range(10):
  print("Tópicos" + str(i), lda.print_topic(i))
  print("")

Tópicos0 0.032*"case" + 0.022*"election" + 0.019*"donald" + 0.019*"coronavirus" + 0.017*"record" + 0.015*"child" + 0.015*"new" + 0.015*"home" + 0.013*"quarantine" + 0.012*"house"

Tópicos1 0.032*"government" + 0.029*"police" + 0.025*"woman" + 0.018*"people" + 0.016*"morrison" + 0.015*"found" + 0.014*"case" + 0.012*"man" + 0.010*"missing" + 0.010*"joe"

Tópicos2 0.036*"queensland" + 0.026*"wa" + 0.018*"news" + 0.017*"new" + 0.016*"first" + 0.015*"australia" + 0.015*"coronavirus" + 0.014*"world" + 0.011*"hospital" + 0.010*"andrew"

Tópicos3 0.027*"coronavirus" + 0.026*"australia" + 0.026*"vaccine" + 0.025*"u" + 0.022*"china" + 0.019*"south" + 0.019*"border" + 0.014*"national" + 0.014*"australian" + 0.014*"scott"

Tópicos4 0.031*"australian" + 0.025*"coronavirus" + 0.018*"two" + 0.014*"crash" + 0.014*"tasmania" + 0.012*"perth" + 0.012*"australia" + 0.011*"uk" + 0.010*"resident" + 0.010*"travel"

Tópicos5 0.075*"covid" + 0.048*"19" + 0.024*"nsw" + 0.020*"health" + 0.018*"restriction" + 0.0

**BERT**

BERT applies attention mechanisms to gather information about the relevant context of a given word and then encodes that context into a rich vector that represents the word.

* Released by Google in 2018
* Achieved state-of-the-art results on several NLP benchmarks when it was released
* A model with a better understanding of words and sentences in context
* Used in Google's search engine

Each position of the vector sequence corresponds to a token whose representation takes the sentence, the context and the meaning into account, so every token carries semantic meaning.
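
To give a feel for how attention builds a context-aware vector, here is a minimal pure-Python sketch of scaled dot-product attention for a single token. It is a simplification under stated assumptions: the 2-dimensional vectors are invented, and the same raw vectors serve as queries, keys and values, whereas BERT uses learned projections, several attention heads and many layers.

```python
import math

# Hypothetical 2-dimensional vectors for the tokens of "bank of the river".
tokens = ["bank", "of", "the", "river"]
vectors = [[1.0, 0.0], [0.1, 0.1], [0.1, 0.0], [0.9, 0.4]]


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, context


# Contextual vector for "bank": a weighted mix of every token in the sentence.
weights, context = attend(vectors[0], vectors, vectors)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
print(context)
```

In this toy setup "river" receives more weight than "of" or "the", so the resulting vector for "bank" is pulled towards the river sense. This is the intuition behind representing a word according to its context.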

In [None]:
# Install the transformers library
!pip install transformers



In [None]:
# Libraries
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import BertForSequenceClassification
from transformers import pipeline
from transformers import BertTokenizer
from sklearn import preprocessing
from tqdm import tqdm


In [None]:
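# Hyperparameters
treino = 0.7          # fraction of the data used for training
teste = 0.1           # fraction used for testing
validacao = 0.2       # fraction used for validation
tamanho_max = 512     # maximum sequence length in tokens
tamanho_batch = 16    # batch size
dispositivo = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")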

In [None]:
print("Conferindo a unidade de processamento:", dispositivo)

Conferindo a unidade de processamento: cpu


In [None]:
arquivo = pd.read_csv("imdb-reviews-pt-br.csv")

In [None]:
arquivo.shape

(49459, 4)

In [None]:
arquivo

Unnamed: 0,id,text_en,text_pt,sentiment
0,1,Once again Mr. Costner has dragged out a movie...,"Mais uma vez, o Sr. Costner arrumou um filme p...",neg
1,2,This is an example of why the majority of acti...,Este é um exemplo do motivo pelo qual a maiori...,neg
2,3,"First of all I hate those moronic rappers, who...","Primeiro de tudo eu odeio esses raps imbecis, ...",neg
3,4,Not even the Beatles could write songs everyon...,Nem mesmo os Beatles puderam escrever músicas ...,neg
4,5,Brass pictures movies is not a fitting word fo...,Filmes de fotos de latão não é uma palavra apr...,neg
...,...,...,...,...
49454,49456,"Seeing as the vote average was pretty low, and...","Como a média de votos era muito baixa, e o fat...",pos
49455,49457,"The plot had some wretched, unbelievable twist...",O enredo teve algumas reviravoltas infelizes e...,pos
49456,49458,I am amazed at how this movieand most others h...,Estou espantado com a forma como este filme e ...,pos
49457,49459,A Christmas Together actually came before my t...,A Christmas Together realmente veio antes do m...,pos


In [None]:
arquivo["sentiment"].value_counts()

neg    24765
pos    24694
Name: sentiment, dtype: int64

**Demonstration of the task in English**

In [None]:
bert_en = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [None]:
instancia_teste = 100

arquivo["text_en"][instancia_teste], bert_en(arquivo["text_en"][instancia_teste])

('Imagine every stereotypical, overacted cliche from every movie and TV show set on the streets of Brooklyn between 1930 and 1980. Populate it with a cast of interchangeable caricatures instead of actual characters. Throw in a mix of "period" music and wailing electric guitars during the "rumble" scenes. Then pass the time trying to figure out or care which of the Deuces is going to be killed in the anticlimactic final rumble.Ill give this movie points for not being just another romantic comedy, teen slasher, explosive action movie, teen sex comedy, kiddie musical, or Oscar-nomination vehicle. But bringing something new or interesting to the street-gang tragedy genre mightve been nice.',
 [{'label': 'NEGATIVE', 'score': 0.997478187084198}])

### **For PT-BR**
**Tokenization**

In [None]:
tokenizer = BertTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased")


In [None]:
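# Tokenize the Portuguese reviews into input ids and attention masks.
# Note: no max_length is passed, so nothing is actually truncated (see the warning below);
# BERT accepts at most 512 tokens, so max_length = tamanho_max would be needed before training.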
arquivo_tokenize = tokenizer.batch_encode_plus(arquivo["text_pt"], return_tensors = "pt", padding = True,
                                               truncation = True)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


In [None]:
print(arquivo_tokenize["input_ids"].shape, arquivo_tokenize["attention_mask"].shape)

torch.Size([49459, 1432]) torch.Size([49459, 1432])
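
The shape above means every review was padded to the length of the longest one (1432 tokens), with the attention mask marking which positions are real tokens (1) and which are padding (0). The sketch below shows the idea on made-up token ids. `PAD_ID`, `MAX_LENGTH` and `pad_and_truncate` are hypothetical, and the truncation is naive: a real tokenizer also preserves the closing special token when it cuts a sequence.

```python
PAD_ID = 0  # assumed padding id for this sketch

# Hypothetical token ids for three reviews of different lengths.
toy_sequences = [[101, 7, 8, 102], [101, 5, 102], [101, 9, 4, 6, 3, 102]]
MAX_LENGTH = 5  # hypothetical limit; sequences beyond it are cut


def pad_and_truncate(sequences, max_length):
    """Cut long sequences, then pad all of them to the same length."""
    cut = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in cut)
    input_ids = [seq + [PAD_ID] * (width - len(seq)) for seq in cut]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in cut]
    return input_ids, attention_mask


toy_ids, toy_mask = pad_and_truncate(toy_sequences, MAX_LENGTH)
for ids, mask in zip(toy_ids, toy_mask):
    print(ids, mask)
```

With a length limit in place, the second dimension of both tensors can never exceed that limit, which keeps memory use bounded and stays within what the model can accept.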


**Defining X and Y**

In [None]:
# Stack input ids and attention masks into one tensor of shape (2, num_examples, seq_len)
X = torch.stack((arquivo_tokenize["input_ids"], arquivo_tokenize["attention_mask"]), dim = 0)

In [None]:
# Encode the labels as integers: neg -> 0, pos -> 1
arquivo["sentiment"] = arquivo["sentiment"].apply(lambda x: 0 if x == 'neg' else 1)
Y = torch.tensor(arquivo["sentiment"].to_numpy())

### **Dataloader**

In [None]:
class TextDataset(Dataset):
  def __init__(self, X, Y):
    self.X = X
    self.X = self.X.to(dispositivo)

    self.Y = Y
    self.Y = self.Y.to(dispositivo)

    self.len = len(Y)

  def __len__ (self):
    return self.len

  def __getitem__(self, idx):
    return self.X[:, idx], self.Y[idx]

In [None]:
dataset = TextDataset(X, Y)

In [None]:
# np.int was deprecated in NumPy 1.20 and later removed; the built-in int does the same job
num_treino_instancia = int(np.round(dataset.len * treino))
num_validacao_instancia = int(np.round(dataset.len * validacao))
num_teste_instancia = int(np.round(dataset.len * teste))

print(f"treino: {num_treino_instancia}, Validação: {num_validacao_instancia} e Teste: {num_teste_instancia}")

treino: 34621, Validação: 9892 e Teste: 4946
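
The three sizes happen to add up to 49,459 here, but rounding each fraction independently does not guarantee that in general, and `random_split` needs the sizes to sum to the dataset length. One possible safeguard, sketched below with a hypothetical helper `split_sizes`, is to compute the last split as the remainder.

```python
def split_sizes(total, train_frac, val_frac):
    """Return (train, validation, test) sizes that always sum to total."""
    n_train = round(total * train_frac)
    n_val = round(total * val_frac)
    n_test = total - n_train - n_val  # the remainder goes to the test split
    return n_train, n_val, n_test


sizes = split_sizes(49459, 0.7, 0.2)
print(sizes, sum(sizes))

# Independent rounding can miss the total: three thirds of 10 round to 3 + 3 + 3.
naive = [round(10 * frac) for frac in (1 / 3, 1 / 3, 1 / 3)]
print(naive, sum(naive))
```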




In [None]:
treino_split, validacao_split, teste_split = torch.utils.data.random_split(dataset, [num_treino_instancia, num_validacao_instancia, num_teste_instancia]) 

In [None]:
treino_loader = torch.utils.data.DataLoader(treino_split, shuffle = True)
validacao_loader = torch.utils.data.DataLoader(validacao_split, shuffle = True)
teste_loader = torch.utils.data.DataLoader(teste_split, shuffle = True)

### **Training**

In [None]:
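epocas = 40  # number of training epochs

passo_por_epoca = 200  # steps per epoch
validacao_epoca = 50 

# Load the pretrained Portuguese BERT with a sequence classification head
modelo = BertForSequenceClassification.from_pretrained("neuralmind/bert-base-portuguese-cased").to(dispositivo)

# Keep every BERT layer trainable (full fine-tuning)
for parametro in modelo.base_model.parameters():
  parametro.requires_grad = True

funcao_perda = torch.nn.CrossEntropyLoss()
optmi = torch.optim.Adam(modelo.parameters())

# Count the correct predictions in a batch
saida_modelo = lambda output, labels: (labels == output.argmax(axis = 1)).sum()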


Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the

In [None]:
epoca_metadata = []
