[TOKENIZATION] #11 #35

Closed
skripov-ds-ai opened this issue Jan 14, 2020 · 0 comments
Labels
tokenization Suspected tokenization error

Comments


skripov-ds-ai commented Jan 14, 2020

Update: the examples were originally taken from old data; they now come from the current repository data.

Filepath

train/t6_sociology_1_101.deft

Content

Madame	 data/source_txt/t6_sociology_mkaplan_101.txt	 15507	 15513	 O	 -1	 -1	 0
Jeanne	 data/source_txt/t6_sociology_mkaplan_101.txt	 15514	 15520	 O	 -1	 -1	 0
Calment	 data/source_txt/t6_sociology_mkaplan_101.txt	 15521	 15528	 O	 -1	 -1	 0
of	 data/source_txt/t6_sociology_mkaplan_101.txt	 15529	 15531	 O	 -1	 -1	 0
France	 data/source_txt/t6_sociology_mkaplan_101.txt	 15532	 15538	 O	 -1	 -1	 0
was	 data/source_txt/t6_sociology_mkaplan_101.txt	 15539	 15542	 O	 -1	 -1	 0
the	 data/source_txt/t6_sociology_mkaplan_101.txt	 15543	 15546	 O	 -1	 -1	 0
world	 data/source_txt/t6_sociology_mkaplan_101.txt	 15547	 15552	 O	 -1	 -1	 0
's	 data/source_txt/t6_sociology_mkaplan_101.txt	 15552	 15554	 O	 -1	 -1	 0
oldest	 data/source_txt/t6_sociology_mkaplan_101.txt	 15555	 15561	 O	 -1	 -1	 0
living	 data/source_txt/t6_sociology_mkaplan_101.txt	 15563	 15569	 O	 -1	 -1	 0
person	 data/source_txt/t6_sociology_mkaplan_101.txt	 15570	 15576	 O	 -1	 -1	 0
until	 data/source_txt/t6_sociology_mkaplan_101.txt	 15577	 15582	 O	 -1	 -1	 0
she	 data/source_txt/t6_sociology_mkaplan_101.txt	 15583	 15586	 O	 -1	 -1	 0
died	 data/source_txt/t6_sociology_mkaplan_101.txt	 15587	 15591	 O	 -1	 -1	 0
at	 data/source_txt/t6_sociology_mkaplan_101.txt	 15592	 15594	 O	 -1	 -1	 0
122	 data/source_txt/t6_sociology_mkaplan_101.txt	 15595	 15598	 O	 -1	 -1	 0
years	 data/source_txt/t6_sociology_mkaplan_101.txt	 15599	 15604	 O	 -1	 -1	 0
old	 data/source_txt/t6_sociology_mkaplan_101.txt	 15605	 15608	 O	 -1	 -1	 0
;	 data/source_txt/t6_sociology_mkaplan_101.txt	 15608	 15609	 O	 -1	 -1	 0
there	 data/source_txt/t6_sociology_mkaplan_101.txt	 15610	 15615	 O	 -1	 -1	 0
are	 data/source_txt/t6_sociology_mkaplan_101.txt	 15616	 15619	 O	 -1	 -1	 0
currently	 data/source_txt/t6_sociology_mkaplan_101.txt	 15620	 15629	 O	 -1	 -1	 0
six	 data/source_txt/t6_sociology_mkaplan_101.txt	 15630	 15633	 O	 -1	 -1	 0
women	 data/source_txt/t6_sociology_mkaplan_101.txt	 15634	 15639	 O	 -1	 -1	 0
in	 data/source_txt/t6_sociology_mkaplan_101.txt	 15640	 15642	 O	 -1	 -1	 0
the	 data/source_txt/t6_sociology_mkaplan_101.txt	 15643	 15646	 O	 -1	 -1	 0
world	 data/source_txt/t6_sociology_mkaplan_101.txt	 15647	 15652	 O	 -1	 -1	 0
whose	 data/source_txt/t6_sociology_mkaplan_101.txt	 15653	 15658	 O	 -1	 -1	 0
ages	 data/source_txt/t6_sociology_mkaplan_101.txt	 15659	 15663	 O	 -1	 -1	 0
are	 data/source_txt/t6_sociology_mkaplan_101.txt	 15664	 15667	 O	 -1	 -1	 0
well	 data/source_txt/t6_sociology_mkaplan_101.txt	 15668	 15672	 O	 -1	 -1	 0
documented	 data/source_txt/t6_sociology_mkaplan_101.txt	 15673	 15683	 O	 -1	 -1	 0
as	 data/source_txt/t6_sociology_mkaplan_101.txt	 15684	 15686	 O	 -1	 -1	 0
115	 data/source_txt/t6_sociology_mkaplan_101.txt	 15687	 15690	 O	 -1	 -1	 0
years	 data/source_txt/t6_sociology_mkaplan_101.txt	 15691	 15696	 O	 -1	 -1	 0
or	 data/source_txt/t6_sociology_mkaplan_101.txt	 15697	 15699	 O	 -1	 -1	 0
older	 data/source_txt/t6_sociology_mkaplan_101.txt	 15700	 15705	 O	 -1	 -1	 0
(	 data/source_txt/t6_sociology_mkaplan_101.txt	 15706	 15707	 O	 -1	 -1	 0
Diebel	 data/source_txt/t6_sociology_mkaplan_101.txt	 15707	 15713	 O	 -1	 -1	 0
2014)	 data/source_txt/t6_sociology_mkaplan_101.txt	 15714	 15719	 O	 -1	 -1	 0
.	 data/source_txt/t6_sociology_mkaplan_101.txt	 15719	 15720	 O	 -1	 -1	 0
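To inspect rows like the ones above programmatically, here is a minimal parsing sketch. It assumes each row is tab-separated with the token, source file, start offset and end offset in the first four columns, and treats the remaining columns (`O  -1  -1  0` here) as opaque label fields. `DeftToken`, `parse_deft_line` and `offset_mismatch` are hypothetical names, not part of the corpus tooling.

```python
from dataclasses import dataclass


@dataclass
class DeftToken:
    token: str
    source: str
    start: int
    end: int
    rest: list[str]  # remaining label columns, kept verbatim (assumed layout)


def parse_deft_line(line: str) -> DeftToken:
    # Columns are tab-separated and some values carry a leading space, so strip each one.
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    return DeftToken(
        token=fields[0],
        source=fields[1],
        start=int(fields[2]),
        end=int(fields[3]),
        rest=fields[4:],
    )


def offset_mismatch(tok: DeftToken) -> bool:
    # A well-formed row should have a character span exactly as long as its token.
    return tok.end - tok.start != len(tok.token)
```

Note that the offsets of `2014)` are internally consistent (15719 - 15714 = 5 characters), so an offset check alone does not catch this kind of fused token.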

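Lines 2489-2530. Error in line 2529: the token `2014)` fuses the year with the closing parenthesis. It should presumably be split into `2014` (15714-15718) and `)` (15718-15719), matching how the opening `(` is tokenized separately in line 2527.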
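One possible way to repair such rows is sketched below, assuming the fix is simply to split off a trailing closing bracket and recompute the character offsets. `split_trailing_closer` is a hypothetical helper, not something the repository provides; it splits at most one trailing `)`, `]` or `}` per call.

```python
import re

# Matches a token of two or more characters whose last character is a closing bracket.
FUSED_CLOSER = re.compile(r"^(.+?)([)\]}])$")


def split_trailing_closer(token: str, start: int, end: int) -> list[tuple[str, int, int]]:
    """Return (token, start, end) triples; an unaffected token comes back as a single triple."""
    m = FUSED_CLOSER.match(token)
    if m is None:
        return [(token, start, end)]
    body, closer = m.group(1), m.group(2)
    cut = start + len(body)  # the closer begins right where the body ends
    return [(body, start, cut), (closer, cut, end)]
```

For line 2529, `split_trailing_closer("2014)", 15714, 15719)` yields `("2014", 15714, 15718)` and `(")", 15718, 15719)`; both new rows would keep the original label columns (`O  -1  -1  0`).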

@skripov-ds-ai skripov-ds-ai added the tokenization Suspected tokenization error label Jan 14, 2020