# <p align="center">**Preprocessing and Vectorization** 🧬</p>
 
##### <p align="center">**Authors:**  Glauber Nascimento & Lorena Ribeiro</p>
##### <p align="center">**Advisor:**  James Moraes de Almeida  </p>
 

<div style="background-color: lightblue; font-size: 18px; padding: 10px;">
<div style="text-align: justify"><strong>Objective:</strong> Understand and apply language preprocessing and vectorization to abstracts on nanotoxicology.</div>
</div>

## 🗣️ **Introduction**

In [58]:
# !pip install nltk

In [1]:
import pprint
import pandas as pd
import numpy as np
import spacy

from scipy import spatial
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import nltk
from nltk.tokenize import TreebankWordTokenizer

In [60]:
df_nanotox = pd.read_csv('nanotox_completo.xls')

#### Pre-processing

Collecting the abstracts from the dataset obtained from Web of Science

In [61]:
abstracts = df_nanotox['Abstract']
abstracts

0       The contamination of coastal marine environmen...
1       The present aimed to characterize the toxicity...
2       Microplastics, small pieces of plastic derived...
3       Ingestion and transdermal delivery are two com...
4       Various analytical methods have been employed ...
                              ...                        
3013    Human health is increasingly affected by chron...
3014    IntroductionDue to the increasing resistance o...
3015    Background Nano-sized drug delivery system has...
3016    Background and objectivesThe administration of...
3017    Significance Optical tweezers have revolutioni...
Name: Abstract, Length: 3018, dtype: object

In [62]:
abstract_chosen = abstracts[23]
print(abstract_chosen)

Hybrid nanosystems have useful properties for preparing therapeutic systems. Among the most commonly used inorganic components in hybrid nanosystems are gold nanoparticles (AuNP). The design of these nanosystems may require AuNP of hydrophilic or hydrophobic nature. Upon irradiation of AuNP, reactive oxygen species (ROS) are formed, and the temperature of the surrounding medium rises, depending on the size, shape and structure of the nanoparticle. The aim of this work is to evaluate whether irradiating 5 nm spherical gold nanoparticles both 'bare' (AuNP) and functionalized with dodecanethiol (AuNPf) with a Nd:YAG pulsed laser (30 ps, and 10 Hz) at wavelengths of 532 nm (0.031 J cm(-2)) and 1064 nm (1.91 J cm(-2)) produces ROS and heat sufficiently to induce cytotoxicity, or to demonstrate whether functionalization significantly influences such processes. It was verified by UV-vis spectrophotometry with ABMA and DCPIP that AuNP and AuNPf in solution induced ROS formation. They also prod

In [63]:
df_nanotox['Article Title'][23]


'Characterization of the absorption properties of 5 nm spherical gold nanoparticles functionalized with dodecanothiol and without functionalization with potential therapeutic applications'

Now, let's load the NLP pipeline that ships with spaCy: en_core_web_sm

In [64]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)

In [65]:
nlp = spacy.load('en_core_web_sm')

#### Tokenization using spaCy

In [66]:
corpus = []

# Tokenize every abstract with spaCy, printing the index as a progress indicator
for idx, abstract in enumerate(abstracts):
    print(idx)
    doc = nlp(abstract)
    doc_tokenizado = [t.text for t in doc]
    corpus.append(doc_tokenizado)

0
1
2
...

In [67]:
corpus

[['The',
  'contamination',
  'of',
  'coastal',
  'marine',
  'environments',
  'by',
  'plastics',
  'of',
  'sizes',
  'ranging',
  'from',
  'mm',
  'down',
  'to',
  'the',
  'nanoscale',
  '(',
  'nm',
  ')',
  'could',
  'pose',
  'a',
  'threat',
  'to',
  'aquatic',
  'organisms',
  '.',
  'The',
  'purpose',
  'of',
  'this',
  'study',
  'was',
  'to',
  'examine',
  'the',
  'toxicity',
  'of',
  'poly',
  '-',
  'styrene',
  'nanoparticles',
  '(',
  'PsNP',
  ')',
  'of',
  'various',
  'sizes',
  '(',
  '50',
  ',',
  '100',
  'and',
  '1000',
  'nm',
  ')',
  'to',
  'the',
  'marine',
  'clams',
  'Mya',
  'arenaria',
  '.',
  'Clams',
  'were',
  'exposed',
  'to',
  'concentrations',
  'of',
  'PsPP',
  'for',
  '7',
  'days',
  'at',
  '15',
  'degrees',
  'C',
  'and',
  'analyzed',
  'for',
  'uptake',
  '/',
  'transformation',
  ',',
  'changes',
  'in',
  'energy',
  'metabolism',
  ',',
  'oxidative',
  'stress',
  ',',
  'genotoxicity',
  'and',
  'circadian'

### Removing stop words from our corpus

In [68]:
stop_words = nlp.Defaults.stop_words
print(stop_words)

{'sixty', 'or', 'upon', 'therefore', 'some', "'re", 'may', 'hers', 'as', 'sometimes', 'whence', 'were', 'three', 'made', 'unless', 'below', 'already', 'through', 'hereafter', 'fifty', 'keep', 'such', 'once', 'about', 'now', 'another', 'becoming', 're', 'whither', '’s', 'again', 'by', 'hereby', 'do', 'although', 'put', 'your', 'behind', 'around', 'five', 'please', 'wherein', 'just', 'so', 'have', 'seems', 'yours', 'our', 'give', 'further', 'serious', 'is', "'m", 'whereafter', 'those', 'seemed', 'becomes', 'namely', 'it', 'ourselves', 'across', 'above', 'formerly', 'am', 'full', 'during', 'mostly', 'n‘t', '‘m', '‘re', 'we', 'was', 'very', 'beside', 'must', 'towards', 'meanwhile', 'too', 'this', "'s", 'elsewhere', 'beforehand', 'ca', 'almost', 'doing', 'that', 'everywhere', 'whether', '’re', 'somewhere', 'something', 'hereupon', 'seeming', 'are', 'from', 'yet', 'us', 'someone', "'d", 'per', 'afterwards', 'besides', 'neither', 'without', 'more', 'did', 'before', 'show', 'take', 'but', '’ll

In [69]:
corpus_without_stop_words = []

for tokens in corpus:
    # The membership test is case-sensitive, so capitalized stop words such as "The" are kept
    tokens_without_stop_words = [word for word in tokens if word not in stop_words]
    # Join the remaining tokens into one string and store it as a single-item list
    string_unica = " ".join(tokens_without_stop_words)
    corpus_without_stop_words.append([string_unica])

In [70]:
corpus_without_stop_words

[['The contamination coastal marine environments plastics sizes ranging mm nanoscale ( nm ) pose threat aquatic organisms . The purpose study examine toxicity poly - styrene nanoparticles ( PsNP ) sizes ( 50 , 100 1000 nm ) marine clams Mya arenaria . Clams exposed concentrations PsPP 7 days 15 degrees C analyzed uptake / transformation , changes energy metabolism , oxidative stress , genotoxicity circadian neural activity . The results revealed PsNP accumulated digestive gland 50 nm > 100 nm > 1000 nm . All sized increased oxidative stress follows : 50 nm ( peroxidase , antioxidant potential LPO ) , 100 nm ( LPO antioxidant potential ) 1000 nm ( LPO ) . Tissue damage size dependent increasing genotoxicity . The 100 nm PsPP altered levels circadian metabolite melatonin . We conclude toxicity plastics size dependent clams .'],
 ['The present aimed characterize toxicity silica nanoparticles Sprague Dawley rats determine dose levels repeated - dose toxicity study . Silica nanoparticles ( 
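The output above still begins with "The" because each token is compared case-sensitively against a lower-case stop list. The sketch below shows one possible case-insensitive filter. The small `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical stand-ins (the notebook itself uses `nlp.Defaults.stop_words`), so this is an illustration, not the notebook's method.

```python
# Hypothetical mini stop list; the notebook uses nlp.Defaults.stop_words instead.
STOP_WORDS = {"the", "of", "by", "to", "a", "from", "down", "could"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token whose lower-cased form is in the stop list."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


tokens = ["The", "contamination", "of", "coastal", "marine", "environments"]
filtered = remove_stop_words(tokens)
print(" ".join(filtered))  # contamination coastal marine environments
```

Lower-casing only for the comparison keeps the original casing of the surviving tokens, which matters for names such as `PsNP`.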

### Vectorization

In [71]:
vectorizer = CountVectorizer()

In [72]:
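# Use a new name so the tokenized `corpus` list is not overwritten,
# and let NumPy infer the number of rows instead of hard-coding 3018.
corpus_df = pd.DataFrame(np.array(corpus_without_stop_words).reshape(-1, 1))
corpus_nanotox = corpus_df[0]
corpus_nanotox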


0       The contamination coastal marine environments ...
1       The present aimed characterize toxicity silica...
2       Microplastics , small pieces plastic derived p...
3       Ingestion transdermal delivery common routes n...
4       Various analytical methods employed assess nan...
                              ...                        
3013    Human health increasingly affected chronic inf...
3014    IntroductionDue increasing resistance bacteria...
3015    Background Nano - sized drug delivery system w...
3016    Background objectivesThe administration 5 - FU...
3017    Significance Optical tweezers revolutionized f...
Name: 0, Length: 3018, dtype: object

In [73]:
bow = vectorizer.fit_transform(corpus_nanotox)

In [74]:
# df_teste = pd.DataFrame({"Tokens": corpus_without_stop_words})

In [75]:
# bow = vectorizer.fit_transform(df_teste['Tokens'][0])


In [76]:
# View features (tokens).
print(len(vectorizer.get_feature_names_out()))
print(vectorizer.get_feature_names_out())


# View vocabulary dictionary.
vectorizer.vocabulary_

24570
['00' '000' '0001' ... 'zwitterionic' 'zymomonas' 'zymosan']


{'the': 22474,
 'contamination': 6177,
 'coastal': 5639,
 'marine': 13904,
 'environments': 8719,
 'plastics': 17970,
 'sizes': 20981,
 'ranging': 19292,
 'mm': 14674,
 'nanoscale': 15546,
 'nm': 15981,
 'pose': 18306,
 'threat': 22648,
 'aquatic': 2927,
 'organisms': 16609,
 'purpose': 19012,
 'study': 21702,
 'examine': 9011,
 'toxicity': 22908,
 'poly': 18135,
 'styrene': 21709,
 'nanoparticles': 15485,
 'psnp': 18899,
 '50': 1030,
 '100': 159,
 '1000': 160,
 'clams': 5454,
 'mya': 15185,
 'arenaria': 2966,
 'exposed': 9180,
 'concentrations': 5951,
 'pspp': 18908,
 'days': 6905,
 '15': 317,
 'degrees': 7078,
 'analyzed': 2565,
 'uptake': 23690,
 'transformation': 23020,
 'changes': 5116,
 'energy': 8623,
 'metabolism': 14224,
 'oxidative': 16820,
 'stress': 21653,
 'genotoxicity': 10277,
 'circadian': 5405,
 'neural': 15805,
 'activity': 1804,
 'results': 19862,
 'revealed': 19929,
 'accumulated': 1676,
 'digestive': 7532,
 'gland': 10364,
 'all': 2295,
 'sized': 20979,
 'increased
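To make the vocabulary above easier to read, here is a pure-Python sketch of what a bag-of-words count matrix contains. It assumes `CountVectorizer`'s default settings (lower-casing, tokens of two or more word characters, vocabulary indices assigned in sorted order); `build_bow` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
import re
from collections import Counter

# Same idea as CountVectorizer's default token pattern: 2+ word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def build_bow(docs):
    """Return (vocabulary, count rows) for a list of strings."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    vocabulary = {term: idx for idx, term in enumerate(terms)}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts.get(term, 0) for term in terms])
    return vocabulary, rows


docs = ["The 50 nm PsNP", "50 nm and 100 nm"]
vocabulary, rows = build_bow(docs)
print(vocabulary)  # {'100': 0, '50': 1, 'and': 2, 'nm': 3, 'psnp': 4, 'the': 5}
print(rows)        # [[0, 1, 0, 1, 1, 1], [1, 1, 1, 2, 0, 0]]
```

Under these defaults, single-character tokens such as `C` or `7` never reach the vocabulary, and `PsNP` is stored as `psnp`.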

## Cosine Similarity

In [77]:
# The cosine method expects array_like inputs, so we need to generate
# arrays from our sparse matrix.
doc1_vs_doc2 = 1 - spatial.distance.cosine(bow[0].toarray()[0], bow[1].toarray()[0])
doc1_vs_doc3 = 1 - spatial.distance.cosine(bow[0].toarray()[0], bow[2].toarray()[0])
doc1_vs_doc4 = 1 - spatial.distance.cosine(bow[0].toarray()[0], bow[3].toarray()[0])

print(corpus_nanotox)

print(f"Doc 1 vs Doc 2: {doc1_vs_doc2}")
print(f"Doc 1 vs Doc 3: {doc1_vs_doc3}")
print(f"Doc 1 vs Doc 4: {doc1_vs_doc4}")

0       The contamination coastal marine environments ...
1       The present aimed characterize toxicity silica...
2       Microplastics , small pieces plastic derived p...
3       Ingestion transdermal delivery common routes n...
4       Various analytical methods employed assess nan...
                              ...                        
3013    Human health increasingly affected chronic inf...
3014    IntroductionDue increasing resistance bacteria...
3015    Background Nano - sized drug delivery system w...
3016    Background objectivesThe administration 5 - FU...
3017    Significance Optical tweezers revolutionized f...
Name: 0, Length: 3018, dtype: object
Doc 1 vs Doc 2: 0.2993257386100998
Doc 1 vs Doc 3: 0.3390980453430015
Doc 1 vs Doc 4: 0.27982357298344906
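The scores above are `1 - cosine distance`, that is, the dot product of two count vectors divided by the product of their Euclidean norms. A minimal standard-library sketch of that formula follows; `cosine_similarity_counts` is a hypothetical helper name, and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity_counts(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: an empty document is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = [1, 1, 0]  # toy counts over a three-term vocabulary
doc_b = [1, 0, 1]
print(cosine_similarity_counts(doc_a, doc_b))  # close to 0.5
print(cosine_similarity_counts(doc_a, doc_a))  # close to 1.0
```

Because count vectors are non-negative, the score always falls between 0 (no shared terms) and 1 (same term proportions).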


In [78]:
# Count the abstracts that contain the token "IC50".
# `corpus_nanotox` holds one space-joined string per abstract, so split() recovers its tokens.
contagem = sum("IC50" in abstract.split() for abstract in corpus_nanotox)
print(contagem)

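The number of abstracts that mention IC50 and the number of times the token occurs are different quantities. The sketch below computes both on a hypothetical toy corpus of token lists; `count_term` is an illustrative helper, not code from the notebook.

```python
def count_term(tokenized_docs, term):
    """Return (documents containing term, total occurrences of term)."""
    docs_with_term = sum(1 for tokens in tokenized_docs if term in tokens)
    total_occurrences = sum(tokens.count(term) for tokens in tokenized_docs)
    return docs_with_term, total_occurrences


# Hypothetical toy corpus: one list of tokens per abstract.
toy_corpus = [
    ["IC50", "was", "measured", ";", "IC50", "dropped"],
    ["no", "value", "reported"],
    ["the", "IC50", "increased"],
]
docs_with_ic50, ic50_occurrences = count_term(toy_corpus, "IC50")
print(docs_with_ic50, ic50_occurrences)  # 2 3
```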

## Using regex

Regular expressions for finding patterns in text

In [2]:
# !pip install regex
import regex as re
import pandas as pd

## Running only on the validation data

In [3]:
dados_validacao = pd.read_excel('Dados_validação_nlp.xlsx')
corpus_validacao = dados_validacao['Abstract']
corpus_validacao

0     Collagen nanoparticles (collagen-NPs) are prom...
1     Collagen nanoparticles (collagen-NPs) are prom...
2     Collagen nanoparticles (collagen-NPs) are prom...
3     Introduction Gold nanoparticles (Au-NPs) hold ...
4     The present study aimed to environmentally fri...
5     Doxorubicin hydrochloride (DOX) is an anthracy...
6     This research successfully demonstrated the gr...
7     This research successfully demonstrated the gr...
8     In the present study, lead oxide nanoparticles...
9     Dictyota ciliolata is a brown alga rich in bio...
10    Lung cancer is the second most common cancer d...
11    Objective: Cobalt nanoparticles (NPs) when rel...
12    Objective: Cobalt nanoparticles (NPs) when rel...
13    The limitations of both inorganic and organic ...
14    The limitations of both inorganic and organic ...
15    Silver nanoparticles (AgNPs) exhibit concentra...
16    Photothermal therapy (PTT) utilizes near-infra...
17    Long-term antibiotic treatment results in 

In [4]:
sentence = corpus_validacao[0]
sentence

"Collagen nanoparticles (collagen-NPs) are promising biopolymeric nanoparticles due to their superior biodegradability and biocompatibility. The low immunogenicity and non-toxicity of collagen-NPs makes it preferable for a wide range of applications. A total of eight morphologically distinct actinomycetes strains were newly isolated from various soil samples in Egypt. The cell-free supernatants of these strains were tested for their ability. These strains' cell-free supernatants were tested for their ability to synthesize collagen-NPs. Five isolates had the ability to biosynthesize collagen-NPs. Among these, a potential culture, Streptomyces sp. NEAA-1, was chosen and identified as Streptomyces xinghaiensis NEAA-1 based on 16S rRNA sequence analysis as well as morphological, cultural and physiological properties. The sequence data has been deposited at the GenBank database under the accession No. OQ652077.1. Face-centered central composite design (FCCD) has been conducted to maximize c

In [5]:
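# Raw string avoids invalid-escape warnings; the new name avoids shadowing the built-in `property`
sizes_nm = re.findall(r"\d+\snm", sentence)
print(sizes_nm)
print(sizes_nm[0])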

['59 nm', '5 nm']
59 nm
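The simple pattern above only captures integer sizes. The sketch below applies the more permissive size pattern used later in the notebook (decimals, ranges, and `+/-` or `±` uncertainties) to a made-up sentence. It uses the standard-library `re` module, which accepts this pattern as well; the sample sentence is hypothetical.

```python
import re

# Size pattern as written in the extraction cells of this notebook.
SIZE_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?"                        # number, optionally decimal
    r"(?:\s*[-–]\s*\d+(?:\.\d+)?)?"           # optional range, e.g. 250-300
    r"(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?"     # optional uncertainty
    r"\s*(?:nm|µm|μm)\b"                      # unit
)

# Hypothetical sentence, written only to exercise each branch of the pattern.
sample = "Particles of 250-300 nm and 15.56 +/- 9.22 nm were compared with 5 nm gold."
sizes = SIZE_PATTERN.findall(sample)
print(sizes)  # ['250-300 nm', '15.56 +/- 9.22 nm', '5 nm']
```

Since every group is non-capturing, `findall` returns the whole matched text rather than tuples of groups.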


In [6]:
nlp = spacy.load('en_core_web_sm')

In [7]:

dados_para_df_val = []

for abstract in corpus_validacao:
    doc = nlp(str(abstract))

    all_sizes, all_compositions, all_cells, all_toxics, all_sentences = [], [], [], [], []

    for sentence in doc.sents:
        sent_text = str(sentence)

        size = re.findall(r"\b\d+(?:\.\d+)?(?:\s*[-–]\s*\d+(?:\.\d+)?)?(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?\s*(?:nm|µm|μm)\b", sent_text)
        compositions = re.findall(r"\b(?:[A-Z][a-z]?\d*)+(?:-(?:[A-Z][a-z]?\d*)+)*|[A-Z]{2,10}\d*\s*(?:NPs?|NP|nanoparticles?)\b", sent_text)
        cell = re.findall(r"\b(?:[A-Z]{2,5}\d{0,3}(?:-[0-9A-Z]+)?)\b|\b(?:[A-Z][a-z0-9-]+(?:\s+[a-z0-9-]+)*\s*(?:cells?|cell lines?))\b", sent_text)
        toxic = re.findall(r"\b\d+(?:\.\d+)?(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?\s*(?:mu|µ|ug|mg)\s*[gL]?(?:/mL|\s*mL\(-1\)|/ml)?\b", sent_text)

        # accumulate the results
        all_sizes.extend(size)
        all_compositions.extend(compositions)
        all_cells.extend(cell)
        all_toxics.extend(toxic)
        all_sentences.append(sent_text)

    # after going through every sentence of the abstract, save a single row
    dados_para_df_val.append({
        "Composição": all_compositions,
        "Toxicidade": all_toxics,
        "Tamanho": all_sizes,
        "Tipo celular": all_cells,
        "Sentenças": all_sentences
    })


In [9]:
result_regex_val = pd.DataFrame(dados_para_df_val)

In [14]:
result_regex_val.to_excel("regex_nanotox_validacao.xlsx", index=False)

## Running on all abstracts

In [10]:
df_nanotox = pd.read_csv('nanotox_completo.xls')
corpus_nanotox = df_nanotox['Abstract']
corpus_nanotox

0       The contamination of coastal marine environmen...
1       The present aimed to characterize the toxicity...
2       Microplastics, small pieces of plastic derived...
3       Ingestion and transdermal delivery are two com...
4       Various analytical methods have been employed ...
                              ...                        
3013    Human health is increasingly affected by chron...
3014    IntroductionDue to the increasing resistance o...
3015    Background Nano-sized drug delivery system has...
3016    Background and objectivesThe administration of...
3017    Significance Optical tweezers have revolutioni...
Name: Abstract, Length: 3018, dtype: object

Testing on several abstracts

In [11]:
nlp = spacy.load('en_core_web_sm')

In [12]:

dados_para_df_comp = []

for abstract in df_nanotox['Abstract']:
    doc = nlp(str(abstract))

    all_sizes, all_compositions, all_cells, all_toxics, all_sentences = [], [], [], [], []

    for sentence in doc.sents:
        sent_text = str(sentence)

        size = re.findall(r"\b\d+(?:\.\d+)?(?:\s*[-–]\s*\d+(?:\.\d+)?)?(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?\s*(?:nm|µm|μm)\b", sent_text)
        compositions = re.findall(r"\b(?:[A-Z][a-z]?\d*)+(?:-(?:[A-Z][a-z]?\d*)+)*|[A-Z]{2,10}\d*\s*(?:NPs?|NP|nanoparticles?)\b", sent_text)
        cell = re.findall(r"\b(?:[A-Z]{2,5}\d{0,3}(?:-[0-9A-Z]+)?)\b|\b(?:[A-Z][a-z0-9-]+(?:\s+[a-z0-9-]+)*\s*(?:cells?|cell lines?))\b", sent_text)
        toxic = re.findall(r"\b\d+(?:\.\d+)?(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?\s*(?:mu|µ|ug|mg)\s*[gL]?(?:/mL|\s*mL\(-1\)|/ml)?\b", sent_text)

        # accumulate the results
        all_sizes.extend(size)
        all_compositions.extend(compositions)
        all_cells.extend(cell)
        all_toxics.extend(toxic)
        all_sentences.append(sent_text)

    # after going through every sentence of the abstract, save a single row
    dados_para_df_comp.append({
        "Composição": all_compositions,
        "Toxicidade": all_toxics,
        "Tamanho": all_sizes,
        "Tipo celular": all_cells,
        "Sentenças": all_sentences
    })
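The loop above repeats the validation-set loop line for line. One way to avoid the duplication is to move the per-abstract logic into a function, sketched below with the standard-library `re` module. `extract_from_sentences` and `PATTERNS` are hypothetical names, only the size and dose patterns are included for brevity, and the sentences are assumed to be already split (by spaCy in the notebook).

```python
import re

# Patterns copied from the cell above; only two of the four are shown.
PATTERNS = {
    "Tamanho": re.compile(
        r"\b\d+(?:\.\d+)?(?:\s*[-–]\s*\d+(?:\.\d+)?)?"
        r"(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?\s*(?:nm|µm|μm)\b"
    ),
    "Toxicidade": re.compile(
        r"\b\d+(?:\.\d+)?(?:\s*(?:\+/-|±)\s*\d+(?:\.\d+)?)?"
        r"\s*(?:mu|µ|ug|mg)\s*[gL]?(?:/mL|\s*mL\(-1\)|/ml)?\b"
    ),
}


def extract_from_sentences(sentences, patterns=PATTERNS):
    """Collect every match of each pattern across the sentences of one abstract."""
    row = {name: [] for name in patterns}
    for sent_text in sentences:
        for name, pattern in patterns.items():
            row[name].extend(pattern.findall(sent_text))
    row["Sentenças"] = list(sentences)
    return row


# Hypothetical sentences standing in for `doc.sents`.
sentences = [
    "Silica nanoparticles of 20 nm and 50 nm were tested.",
    "Doses of 200 mu g/mL and 0.02 mg/mL were used.",
]
row = extract_from_sentences(sentences)
print(row["Tamanho"])     # ['20 nm', '50 nm']
print(row["Toxicidade"])  # ['200 mu g/mL', '0.02 mg/mL']
```

With such a helper, both the validation run and the full run reduce to one list comprehension over the abstracts.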


In [13]:
result_regex_completo = pd.DataFrame(dados_para_df_comp)

result_regex_completo

| | Composição | Toxicidade | Tamanho | Tipo celular | Sentenças |
|---|---|---|---|---|---|
| 0 | [Th, Th, PsNP, My, Cl, PsPP, C, Th, PsNP, Al, ... | [] | [1000 nm, 50 nm, 100 nm, 1000 nm, 50 nm, 100 n... | [LPO, LPO, LPO] | [The contamination of coastal marine environme... |
| 1 | [Th, Sp, Da, Si, SiO2, SiO2, Ea, We, Ra, SiO2,... | [200 mu g/mL, 400 mu g/mL, 200 mu g/mL, 400 mu... | [20 nm, 50 nm, 20 nm, 50 nm, 50 nm, 50 nm, 20 ... | [] | [The present aimed to characterize the toxicit... |
| 2 | [Mi, In, PS-NPs, PS-NPs, PS-NPs, Be, EEG, PS-N... | [] | [50 nm, 100 nm, 50 nm, 50 nm, 100 nm, 50 nm, 1... | [PS, PS, PS, EEG, PS, EEG, PS, PS, PS, BBB, PS... | [Microplastics, small pieces of plastic derive... |
| 3 | [In, NP, In, AuNPs, PEG, AuNPs, PEG, AuNPs, PE... | [] | [14 nm, 20 nm, 14 nm, 14 nm, 14 nm, 14 nm, 14 ... | [NP, PEG, PEG, PEG, SAM, Cytotoxicity assessed... | [Ingestion and transdermal delivery are two co... |
| 4 | [Va, NP, In, SP-ICP-MS, AgNP, U937, Fo, AgNPs,... | [] | [40 nm, 70 nm, 40 nm, 70 nm, 40 nm, 70 nm, 40 ... | [NP, SP-ICP, MS, U937 cells, After cells, Ag c... | [Various analytical methods have been employed... |
| ... | ... | ... | ... | ... | ... |
| 3013 | [Hu, Tr, NSAIDs, COX, Wh, COX, NSAIDs, Mo, Pa,... | [0.02 mg/mL, 0.07 mg/mL] | [250-300 nm] | [COX-1, COX-2, COX-2, PDA, PDA, PDA, PDA, COX-... | [Human health is increasingly affected by chro... |
| 3014 | [In, In, AgNPs, Me, Fu, JTW1, Tr, El, Mi, TEM,... | [512 mu g, 0.125 mu g, 512 mu g, 0.125 mu g, 0... | [15.56 +/- 9.22 nm] | [JTW1, TEM, XRD, FTIR, NTA, DLS, MIC, MBC, FIC... | [IntroductionDue to the increasing resistance ... |
| 3015 | [Ba, Na, Th, Me, To, MSC, As, LNPs, Su, MSC, F... | [1 mu L, 1 mu L, 1 mu L, 6.6 mu , 230.7 mu ] | [100-120 nm] | [MSC, MSC, SMAC-P, FRRG-DOX, SMAC-P, FRRG-DOX,... | [Background Nano-sized drug delivery system ha... |
| 3016 | [Ba, FU, On, FU, In, Fe3O4, PLA-HA, Th, HCT116... | [] | [235 nm] | [FU, FU, PLA-HA, HCT116, PLA, PLA-HA, FU, NMR,... | [Background and objectivesThe administration o... |
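Entries such as `Th`, `My` and `Cl` in the composition column are the first letters of ordinary capitalized words ("The", "Mya", "Clams"): the first branch of the composition pattern has no boundary at its end, so it can stop in the middle of a word. The sketch below compares that branch with a hypothetical stricter variant that adds a negative lookahead. The variant is an assumption offered for illustration and has not been validated on the dataset.

```python
import re

# First branch of the notebook's composition pattern (no trailing boundary).
ORIGINAL_BRANCH = re.compile(r"\b(?:[A-Z][a-z]?\d*)+(?:-(?:[A-Z][a-z]?\d*)+)*")

# Hypothetical variant: refuse matches that are immediately followed by a letter.
STRICTER_BRANCH = re.compile(
    r"\b(?:[A-Z][a-z]?\d*)+(?:-(?:[A-Z][a-z]?\d*)+)*(?![A-Za-z])"
)

# Made-up sentence mixing formulas with ordinary capitalized words.
sample = "The toxicity of SiO2 and AuNPs in Mya arenaria"

original_hits = ORIGINAL_BRANCH.findall(sample)
stricter_hits = STRICTER_BRANCH.findall(sample)
print(original_hits)  # ['Th', 'SiO2', 'AuNPs', 'My']
print(stricter_hits)  # ['SiO2', 'AuNPs']
```

The lookahead still accepts standalone capitals such as the `C` in "15 degrees C", so a further filter (for example, a list of element symbols) may be needed in practice.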


In [102]:
result_regex_completo.to_excel("regex_nanotox_completo.xlsx", index=False)