Latin characters in hashtags break entity extraction #16

Closed
sagar opened this Issue Feb 1, 2013 · 0 comments

sagar commented Feb 1, 2013

text = "El caso #Bárcenas en el New York Times http://goo.gl/e5Mio #MarcaEspaña"
extractor = Extractor(text)
hts = extractor.extract_hashtags_with_indices(False)
print hts

Output (each hashtag is truncated at its first non-ASCII character):
[{'indices': (8, 10), 'hashtag': u'B'}, {'indices': (60, 70), 'hashtag': u'MarcaEspa'}]
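
For comparison, the result one would presumably expect here keeps the accented letters: `Bárcenas` and `MarcaEspaña`. Below is a minimal sketch in Python 3 that uses only the standard `re` module. The pattern `HASHTAG_RE` and this standalone `extract_hashtags_with_indices` function are illustrative assumptions, not the library's implementation or its actual regex. The last lines also show that the reported start index 60 matches a UTF-8 byte offset rather than a character offset, which may hint at where the truncation comes from.

```python
import re

# Hypothetical pattern for illustration only; not the library's actual regex.
# For str patterns in Python 3, \w matches Unicode letters such as "á" and "ñ".
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags_with_indices(text):
    """Return each hashtag with (start, end) character offsets, '#' included."""
    return [
        {"hashtag": m.group(1), "indices": (m.start(), m.end())}
        for m in HASHTAG_RE.finditer(text)
    ]


text = "El caso #Bárcenas en el New York Times http://goo.gl/e5Mio #MarcaEspaña"
hashtags = extract_hashtags_with_indices(text)
print(hashtags)

# Character offset of the second '#' versus its offset in the UTF-8 encoding.
char_offset = text.index("#MarcaEspaña")
byte_offset = len(text[:char_offset].encode("utf-8"))
print(char_offset, byte_offset)  # 59 60
```

In this sketch the offsets count characters, so the second hashtag starts at 59. The 60 in the output above is what the same position becomes once `á` is counted as two bytes.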

dryan referenced this issue May 16, 2013

Merged

Upgrade to 2.0 #17

6 of 6 tasks complete

dryan closed this May 16, 2013
