Integrate MyStem (Russian stemmer) #1308

Closed
Horsmann opened this issue Nov 8, 2018 · 4 comments

Member

Horsmann commented Nov 8, 2018

Integration of a Russian stemmer https://tech.yandex.ru/mystem/
Closed-source, distributed as pre-compiled fat binaries that seem to include the model.

Non-profit/research use is permitted; commercial use is restricted. The website is in Russian only.

Horsmann assigned Horsmann and unassigned Horsmann Nov 8, 2018
Horsmann closed this as completed Nov 8, 2018
Member Author

Horsmann commented Nov 9, 2018

@reckart The stemmer is essentially complete, but I will definitely not replace tabs with spaces. Can this Checkstyle check be turned off?

Member

reckart commented Nov 9, 2018

Our style guidelines have long specified spaces instead of tabs. Checkstyle just enforces that for a better experience. The problem with tabs is that different viewers use different tab widths (2, 4, 8, whatever), which makes the code look very different from one viewer to the next. Spaces do not have this problem. Please install the DKPro Core Style file for Eclipse or configure whatever IDE you are using accordingly.
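
To illustrate the tab-width point, here is a minimal Python sketch. It is not taken from the DKPro Core code base; the sample lines and the helper names `render` and `indent_of` are made up for illustration. It uses the built-in `str.expandtabs` to mimic how viewers with different tab widths would display the same continuation line, once indented with a tab plus spaces and once with spaces only.

```python
def render(line: str, tab_width: int) -> str:
    """Render a line the way a viewer with the given tab width would show it."""
    return line.expandtabs(tab_width)


def indent_of(rendered: str) -> int:
    """Count the leading space columns of an already rendered line."""
    return len(rendered) - len(rendered.lstrip(" "))


# Hypothetical continuation line, aligned by hand in an editor with tab width 4:
# one tab followed by four spaces. The second variant uses eight spaces only.
tab_line = "\t    arg2);"
space_line = "        arg2);"

for width in (2, 4, 8):
    # The tab-indented line shifts with the viewer setting; the space-indented one does not.
    print(width, indent_of(render(tab_line, width)), indent_of(render(space_line, width)))
```

The tab-indented line lands on a different column for each tab width, while the space-indented line stays put, which is the inconsistency the style rule is meant to avoid.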

Member

reckart commented Nov 9, 2018

Note that the style file does not cover XML files, but they should also be formatted with spaces, using 2 space characters per indentation level.

Horsmann reopened this Nov 9, 2018
Member Author

Horsmann commented Nov 9, 2018

OK, thanks.
This tool is only available for 64-bit systems. They dropped support for 32-bit systems in the last iteration, so I only added the 64-bit binaries.

reckart added this to the 1.11.0 milestone Jan 12, 2019
reckart added the ⭐️ Enhancement label Jan 12, 2019
reckart modified the milestones: 1.11.0, 1.12.0 Feb 12, 2019
Horsmann added a commit to Horsmann/dkpro-core that referenced this issue Mar 4, 2019
Horsmann added a commit that referenced this issue Mar 4, 2019
reckart modified the milestones: 1.12.0, 1.11.0 Mar 4, 2019
reckart added a commit that referenced this issue Mar 4, 2019
reckart closed this as completed Mar 4, 2019
reckart added a commit that referenced this issue Mar 19, 2019
* master:
  #1322 - Upgrade to OpenNLP 1.9.1
  #1308 - integrate mystem
  #1327 - Update LIF support
  #1327 - Update LIF support
  #1329 - Span annotations with slot features may disappear from WebAnno TSV
  #1329 - Span annotations with slot features may disappear from WebAnno TSV
  #1329 - Span annotations with slot features may disappear from WebAnno TSV
  #1327 - Update LIF support
  #1323 - File extension generated by BinaryCasWriter does not contain dot
  #858 - Out-of-tagset tags should map to the generic type
  #1239 - Rename NYTCollectionReader to NitfReader
  #858 - Out-of-tagset tags should map to the generic type
  #1317 - Standard parameter to disable type mapping
  No issue. If a DKProTextContext is available, then TestRunner generates an XMI file from the processed data and stores it in the test output folder.
  No issue - Log names of files with license issues to the console.
  #1160 - Better support for CoNLL-U v2 (1.11.0)

% Conflicts:
%	dkpro-core-asl/pom.xml
reckart added a commit to tilmanbeck/dkpro-core that referenced this issue Apr 19, 2019
* master:
  dkpro#1325 - Avoid datasets being extracted outside their target directory
  dkpro#1325 - Avoid datasets being extracted outside their target directory
  dkpro#1325 - Avoid datasets being extracted outside their target directory
  dkpro#1338 - Factor CAS <-> brat conversion code into Pojos
  dkpro#1338 - Factor CAS <-> brat conversion code into Pojos
  dkpro#1322 - Upgrade to OpenNLP 1.9.1
  dkpro#1308 - integrate mystem
  dkpro#1327 - Update LIF support
  dkpro#1327 - Update LIF support
  dkpro#1329 - Span annotations with slot features may disappear from WebAnno TSV
  dkpro#1329 - Span annotations with slot features may disappear from WebAnno TSV
  dkpro#1329 - Span annotations with slot features may disappear from WebAnno TSV
  dkpro#1327 - Update LIF support
  dkpro#1325 - Avoid datasets being extracted outside their target directory
  dkpro#1325 - Avoid datasets being extracted outside their target directory
  dkpro#1323 - File extension generated by BinaryCasWriter does not contain dot
  dkpro#858 - Out-of-tagset tags should map to the generic type
  dkpro#858 - Out-of-tagset tags should map to the generic type
reckart added a commit that referenced this issue Apr 26, 2019
* master: (21 commits)
  #1305 - Update TreeTagger models in build.xml
  #1325 - Avoid datasets being extracted outside their target directory
  #1325 - Avoid datasets being extracted outside their target directory
  #1325 - Avoid datasets being extracted outside their target directory
  #1338 - Factor CAS <-> brat conversion code into Pojos
  #1338 - Factor CAS <-> brat conversion code into Pojos
  #1322 - Upgrade to OpenNLP 1.9.1
  #1308 - integrate mystem
  #1327 - Update LIF support
  #1327 - Update LIF support
  #1329 - Span annotations with slot features may disappear from WebAnno TSV
  #1329 - Span annotations with slot features may disappear from WebAnno TSV
  #1329 - Span annotations with slot features may disappear from WebAnno TSV
  #1327 - Update LIF support
  #1325 - Avoid datasets being extracted outside their target directory
  #1325 - Avoid datasets being extracted outside their target directory
  #1323 - File extension generated by BinaryCasWriter does not contain dot
  #858 - Out-of-tagset tags should map to the generic type
  #1239 - Rename NYTCollectionReader to NitfReader
  #858 - Out-of-tagset tags should map to the generic type
  ...

% Conflicts:
%	dkpro-core-asl/pom.xml
%	dkpro-core-io-lif-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/lif/LifReaderWriterTest.java
%	dkpro-core-io-lif-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/lif/LifWriterTest.java