
Grammar doesn't work in English in the Windows version #119

Open
ebarbot opened this issue Sep 17, 2021 · 6 comments
Labels: bug, Windows (Windows related issues)

Comments

@ebarbot

ebarbot commented Sep 17, 2021

Everything is tagged as a noun.

[Screenshot: test_joseph_I]

@kleag
Contributor

kleag commented Sep 17, 2021

You're right, it is completely wrong. But there is not enough information to debug it. Could you please try the command-line version (analyzeText) and paste (or attach) here both the input text and the full console output?
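One portable way to capture the complete console output into a file that can be attached is a small Python wrapper. This is only a sketch: it assumes `analyzeText` is an executable on the `PATH`, and the function name `capture_console` and the log file name are hypothetical. It redirects stderr into the same pipe as stdout so that warnings end up in the same file as the analysis.

```python
import subprocess
from pathlib import Path
from typing import List


def capture_console(cmd: List[str], log_path: Path) -> int:
    """Run `cmd`, write its combined stdout and stderr to `log_path`, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr into the same pipe as stdout
        text=True,
        encoding="utf-8",
        errors="replace",  # don't crash on console bytes that are not valid UTF-8
    )
    log_path.write_text(result.stdout, encoding="utf-8")
    return result.returncode
```

For example, `capture_console(["analyzeText", "-l", "eng", "joseph_I.txt"], Path("analyzeText_output.txt"))` runs the English analysis of `joseph_I.txt` and saves everything it prints. In a plain `cmd.exe` prompt, `analyzeText -l eng joseph_I.txt > analyzeText_output.txt 2>&1` achieves the same without Python.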

@kleag
Contributor

kleag commented Sep 17, 2021

As you can see, the results are much better under Linux:

# sent_id = 1
# text = February 23 - A revolt against the government of King Joseph I of Portugal takes place in the city of Oporto.
1	February	February	PROPN	_	NUMBER=SING	_	_	_	NE=DateTime.DATE|Pos=1|Len=8
2	23	23	NUM	_	_	_	_	_	NE=DateTime.DATE|Pos=10|Len=2
3	-	-	COLON	_	_	3	Dummy	_	Pos=13|Len=1
4	A	a	DET	_	_	4	det	_	Pos=15|Len=1
5	revolt	revolt	NOUN	_	NUMBER=SING	13	SUJ_V	_	Pos=17|Len=6
6	against	against	ADP	_	_	7	PREPSUB	_	Pos=24|Len=7
7	the	the	DET	_	_	7	det	_	Pos=32|Len=3
8	government	government	NOUN	_	NUMBER=SING	4	COMPDUNOM	_	Pos=36|Len=10
9	of	of	ADP	_	_	10	PREPSUB	_	Pos=47|Len=2
10	King	king	NOUN	_	NUMBER=SING	10	ADJPRENSUB	_	Pos=50|Len=4
11	Joseph	Joseph	PROPN	_	NUMBER=SING	_	_	_	NE=Person.PERSON|Pos=55|Len=6
12	I	I	PRON	_	_	_	_	_	NE=Person.PERSON|Pos=62|Len=1
13-14	joseph	_	_	_	_	_	_	_	_
13	of	of	ADP	_	_	12	PREPSUB	_	Pos=64|Len=2
14	Portugal	Portugal	PROPN	_	NUMBER=SING	_	_	_	NE=Location.LOCATION|Pos=67|Len=8
15	takes	take	VERB	_	_	0	_	_	Pos=76|Len=5
16	place	place	NOUN	_	NUMBER=SING	13	COD_V	_	Pos=82|Len=5
17	in	in	ADP	_	_	17	PREPSUB	_	Pos=88|Len=2
18	the	the	DET	_	_	17	det	_	Pos=91|Len=3
19	city	city	NOUN	_	NUMBER=SING	14	COMPDUNOM	_	Pos=95|Len=4
20	of	of	ADP	_	_	19	PREPSUB	_	Pos=100|Len=2
21	Oporto	Oporto	PROPN	_	NUMBER=SING	_	_	_	NE=Location.LOCATION|Pos=103|Len=6
22	.	.	SENT	_	_	0	_	_	Pos=109|Len=1

We need more information to understand what happens under Windows.

@ebarbot
Author

ebarbot commented Sep 17, 2021

I get the output below. I don't know whether I am supposed to set something to print more logs.

[Screenshot: test_josephI_output]

H:\test_lima_windows>analyzeText -l eng joseph_I.txt
Analyzing 1/1 (100.00%) 'joseph_I.txt'
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = 1
# text = February 23 - A revolt against the government of King Joseph I of Portugal takes place in the city of Oporto.
1 February February PROPN _ NUMBER=SING _ _ _ NE=DateTime.DATE|Pos=1|Len=8
2 23 23 NUM _ _ _ _ _ NE=DateTime.DATE|Pos=10|Len=2
3 - - COMMA _ _ 3 Dummy _ Pos=13|Len=1
4 A A PROPN _ NUMBER=SING 4 ADJPRENSUB _ Pos=15|Len=1
5 revolt revolt NOUN _ NUMBER=SING 5 ADJPRENSUB _ Pos=17|Len=6
6 against against NOUN _ NUMBER=SING 6 ADJPRENSUB _ Pos=24|Len=7
7 the the NOUN _ NUMBER=SING 7 ADJPRENSUB _ Pos=32|Len=3
8 government government NOUN _ NUMBER=SING 8 ADJPRENSUB _ Pos=36|Len=10
9 of of NOUN _ NUMBER=SING 10 ADJPRENSUB _ Pos=47|Len=2
10 King King PROPN _ NUMBER=SING 10 SUBSUBJUX _ Pos=50|Len=4
11 Joseph Joseph PROPN _ NUMBER=SING _ _ _ NE=Person.PERSON|Pos=55|Len=6
12 I i NUM _ NUMBER=SING _ _ _ NE=Person.PERSON|Pos=62|Len=1
13 of of NOUN _ NUMBER=SING 12 ADJPRENSUB _ Pos=64|Len=2
14 Portugal Portugal PROPN _ NUMBER=SING _ _ _ NE=Location.LOCATION|Pos=67|Len=8
15 takes takes NOUN _ NUMBER=SING 14 ADJPRENSUB _ Pos=76|Len=5
16 place place NOUN _ NUMBER=SING 15 ADJPRENSUB _ Pos=82|Len=5
17 in in NOUN _ NUMBER=SING 16 ADJPRENSUB _ Pos=88|Len=2
18 the the NOUN _ NUMBER=SING 17 ADJPRENSUB _ Pos=91|Len=3
19 city city NOUN _ NUMBER=SING 18 ADJPRENSUB _ Pos=95|Len=4
20 of of NOUN _ NUMBER=SING 19 ADJPRENSUB _ Pos=100|Len=2
21 Oporto Oporto PROPN _ NUMBER=SING _ _ _ NE=Location.LOCATION|Pos=103|Len=6
22 . . SENT _ _ 0 _ _ Pos=109|Len=1
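Comparing the Linux and Windows outputs token by token makes the difference explicit. A minimal Python sketch, assuming each output has been saved as plain text; `parse_conllu` and `diff_outputs` are hypothetical helper names, not part of LIMA. It splits on any whitespace because the Windows output above was pasted with spaces rather than tabs (this only works as long as no FORM or LEMMA contains a space), and it skips comment lines, console messages and multiword ranges such as `13-14`.

```python
from typing import Dict, List, Tuple

# (FORM, LEMMA, UPOS) for one syntactic word.
Token = Tuple[str, str, str]


def parse_conllu(text: str) -> Dict[int, Token]:
    """Collect ID -> (FORM, LEMMA, UPOS) from CoNLL-U-like text.

    Lines whose first field is not a plain integer (comments, prompts,
    progress messages, multiword ranges like "13-14") are ignored.
    """
    tokens: Dict[int, Token] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        tokens[int(fields[0])] = (fields[1], fields[2], fields[3])
    return tokens


def diff_outputs(reference: str, candidate: str) -> List[str]:
    """List tokens whose LEMMA or UPOS differ between two analyses of the same text."""
    ref, cand = parse_conllu(reference), parse_conllu(candidate)
    report = []
    for tid in sorted(ref.keys() & cand.keys()):
        (form, lemma_r, upos_r), (_, lemma_c, upos_c) = ref[tid], cand[tid]
        if (lemma_r, upos_r) != (lemma_c, upos_c):
            report.append(f"{tid}\t{form}\t{lemma_r}/{upos_r} -> {lemma_c}/{upos_c}")
    return report
```

Calling `diff_outputs(linux_text, windows_text)` on the two outputs yields one line per disagreeing token, e.g. `takes` going from `take/VERB` to `takes/NOUN`.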

kleag self-assigned this on Sep 17, 2021
kleag added the bug and Windows (Windows related issues) labels on Sep 17, 2021
@kleag
Contributor

kleag commented Sep 17, 2021

@victorbocharov, you are the last developer to have produced a successful Windows build. Have you noticed problems like this?

@victorbocharov
Contributor

No, I haven't. Moreover, I don't have a Windows computer, so I can't reproduce this. I can only offer a few guesses:

  • PoS tags seem to come from tokenization rules alone: starts with a capital letter => PROPN, digits => NUM, ...
  • lemmatization doesn't work (takes -> takes)
  • NER works
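The first guess can be made concrete by encoding it as rules and measuring how well it predicts each output. A minimal Python sketch; the rule set is only a paraphrase of the guess (capital letter => PROPN, digits => NUM, anything else => NOUN), not LIMA's actual fallback code, and the function names are hypothetical. The input is a list of (FORM, UPOS) pairs, i.e. columns 2 and 4 of the outputs.

```python
from typing import List, Tuple


def guessed_fallback_upos(form: str) -> str:
    """Hypothetical paraphrase of the guessed rules, not LIMA code."""
    if form[:1].isupper():
        return "PROPN"
    if form.isdigit():
        return "NUM"
    return "NOUN"


def fallback_agreement(rows: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (matches, word_tokens) for the guessed rules.

    Pure punctuation tokens are skipped because the guess says nothing about them.
    """
    words = [(form, upos) for form, upos in rows if any(ch.isalnum() for ch in form)]
    matches = sum(1 for form, upos in words if guessed_fallback_upos(form) == upos)
    return matches, len(words)
```

Applied by hand to the numbered word lines of the two outputs (punctuation excluded), these rules reproduce 19 of the 20 Windows tags, the only exception being `I` tagged NUM, versus 9 of the 20 Linux tags. That fits the "..." in the guess: at least one further rule seems to be involved.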

It looks like the English dictionary isn't being used, or is empty. @kleag: how can we check this?
@ebarbot: Is the "main" pipeline unchanged?
@ebarbot: Which version of LIMA are you using?

@ebarbot
Author

ebarbot commented Sep 20, 2021

I downloaded version 3.0.0.20210912222206-0c3404de, and if I explicitly run analyzeText -l eng -p main joseph_I.txt, I get the same result.
