This repository has been archived by the owner on Jun 14, 2020. It is now read-only.

Invalid Answers #36

Open
ghost opened this issue May 18, 2019 · 14 comments
Assignees
Labels
contributors welcome Contributor Friendly docs Documentation

Comments

@ghost

ghost commented May 18, 2019

When asking: "Who was the first president of the United States?"

The answer is:

Normally vice presidents hold some power and special responsibilities below that of the president. The amendment also specifies that if any eligible person serves as president or acting president for more than two years of a term for which some other eligible person was elected president, the former can only be elected president once. Mitt Romney for president. Perhaps the best known sub-national presidents are the borough presidents of the Five Boroughs of New York City. The president fulfills various ceremonial duties.

@5hirish
Owner

5hirish commented May 22, 2019

@infosisio Currently I am unable to maintain this project. I have identified some issues and shortcomings with it; if you are interested in contributing, I will share them with you.

@idoroiengel
Contributor

@5hirish Hey, I am interested in helping out, please let me know what needs to be done, and I'll try to do something about it :)

@5hirish
Owner

5hirish commented Jun 1, 2019

@idoroiengel That's great to hear. When I started the project, the basic outline I chalked out was a Question Answering system: you ask a question, and it performs basic NLP operations on the question such as tokenisation, stemming, POS tagging, and dependency extraction. It then tries to extract all the relevant keywords from the question, which can be used to construct a query to search any knowledge source. After searching a knowledge source, it gets the raw data, tries to filter out irrelevant information or summarize it, generates candidate answers, and ranks them.
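The outline above can be illustrated with a minimal sketch. This is not the project's code: the stop-word list, the function names, and the overlap-based scoring are all assumptions chosen to keep the example dependency-free, and stemming, POS tagging, and dependency extraction are left out. It only shows the shape of the pipeline (tokenise, extract keywords, rank candidate sentences).

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {
    "a", "an", "the", "of", "was", "is", "who", "what", "when",
    "where", "which", "how", "in", "on", "to", "for", "and",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(question):
    """Keep the tokens that are not stop words, preserving their order."""
    return [token for token in tokenize(question) if token not in STOP_WORDS]


def rank_candidates(question, sentences):
    """Score each sentence by the fraction of question keywords it contains.

    Returns (score, sentence) pairs sorted from best to worst.
    """
    keywords = set(extract_keywords(question))
    scored = []
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        score = len(keywords & tokens) / len(keywords) if keywords else 0.0
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    question = "Who was the first president of the United States?"
    candidates = [
        "The president fulfills various ceremonial duties.",
        "George Washington was the first president of the United States.",
    ]
    for score, sentence in rank_candidates(question, candidates):
        print(f"{score:.2f}  {sentence}")
```

A sentence that matches only the keyword "president" still gets a non-zero score here, which hints at how a query built from a single broad keyword can pull in off-topic text like the answer reported at the top of this issue.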

@5hirish
Owner

5hirish commented Jun 1, 2019

Since then, my understanding of this problem statement and of the different ways to solve it has changed a lot. There are many constructs in the current system that can work against it and produce irrelevant answers such as the one above. To understand the current state of the system, I would point you to the /docs folder of the repo, where there is an architecture diagram and a white paper of the system. I will also note down a couple of issues I am aware of here in this issue. Also, the build on Travis is failing; I will look into that and try to fix it. In the meantime, you can reach out to me at my email address if you need any help with the project, with understanding its codebase, or with setting it up.

@5hirish
Owner

5hirish commented Jun 1, 2019

I compiled this list a long time ago, so I have forgotten the specifics, but it should still be a good start.

  1. Issues with the keywords being searched on Wikipedia [Selective Search]: irrelevant keywords are searched on the knowledge source, which adds noise to the extracted knowledge.
  2. Improve the keyword extraction: work on a keyword extraction algorithm so that the current rule-based keyword extraction can be deprecated in favor of an unsupervised method. We can look into the dependency relations of each token and take its other grammatical features into account to identify the keywords in the question.
  3. Search on the structured info: a lot of tabular and structured information is extracted from Wikipedia. Work on an algorithm that searches nested JSON data to identify the relevant keys and get their values.
  4. Question classification: revisit the question classification model (Support Vector Machine) and tweak it if necessary. Try to include the classified label in the keyword extraction or query construction phase to improve that phase.
  5. Information retrieval: revisit the information extraction phase (Vector Space Model). Can we improve it, maybe with an LSTM?
  6. Can we leverage Elasticsearch more in the project?
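Item 3 in the list above (searching nested JSON for relevant keys) could be approached roughly as follows. This is only a sketch under assumptions: the `find_keys` helper, the substring matching rule, and the sample `infobox` structure are hypothetical and not taken from the project, whose extracted data may be shaped differently.

```python
def find_keys(data, keyword, path=()):
    """Recursively yield (path, value) for every dict key containing `keyword`.

    Matching is a case-insensitive substring test; `path` is a tuple of the
    dict keys and list indexes that lead to the matching key.
    """
    if isinstance(data, dict):
        for key, value in data.items():
            current = path + (key,)
            if keyword.lower() in str(key).lower():
                yield current, value
            # Keep descending: relevant keys may be nested deeper.
            yield from find_keys(value, keyword, current)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_keys(item, keyword, path + (index,))


# Hypothetical sample resembling structured data extracted from an article.
infobox = {
    "title": "George Washington",
    "offices": [
        {
            "office": "President of the United States",
            "term_start": "1789",
            "term_end": "1797",
        }
    ],
}

if __name__ == "__main__":
    for key_path, value in find_keys(infobox, "term"):
        print(key_path, value)
```

Returning the full path rather than only the value keeps enough context to tell which nested record a match came from, which matters when the same key appears in several places.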

@5hirish
Owner

5hirish commented Jun 1, 2019

@idoroiengel Maybe the easiest thing to start with is upgrading the dependencies, like spaCy. I would be glad if we can revive this project, and I will try to take this up more regularly!

@5hirish 5hirish self-assigned this Jun 1, 2019
@5hirish 5hirish added contributors welcome Contributor Friendly docs Documentation labels Jun 1, 2019
@5hirish
Owner

5hirish commented Jun 1, 2019

Fixed build issues with Travis CI

@TharunAts

What is know_corp in Corpus, and how will it affect the model?

@idoroiengel
Contributor

@5hirish Sounds good. I have already glanced at some of the docs, and I think I got the basics. I work mostly on Android, but since I'm an MA Linguistics graduate I want to do some NLP coding. I can take a look at the dependencies this week. I built the project successfully with the current dependencies on my local machine and ran it a few times with several queries to test the system.

@idoroiengel
Contributor

@5hirish Do you have any specific notes on the project's branches that I should be aware of? Also, should we continue this discussion in a different conversation?

@5hirish
Owner

5hirish commented Jun 2, 2019

@idoroiengel Currently all the branches are stale and no feature is under development, so master is the stable branch. Yes, let us carry on this conversation by mail (mail@5hirish.com), or on Gitter or maybe Slack.

Also, in December I was thinking of trying to implement some of the SQuAD 2.0 approaches. I think this would be a good way to kickstart the project again: going through some of the approaches from this competition and implementing one that suits our project and the problem we are trying to solve.

@5hirish
Owner

5hirish commented Jun 2, 2019

@TharunAts That is an intermediate storage file that holds the knowledge extracted from Wikipedia, which is later processed and ranked. Not proud of how I approached this problem at the time 😅

@ghost
Author

ghost commented Jun 2, 2019 via email

@5hirish
Owner

5hirish commented Jun 2, 2019

@infosisio @idoroiengel I have created a Gitter chat for the project, which will be much more convenient for project discussions, since broad conversations are awkward to carry out on a single issue. Feel free to join the Gitter chat.

Also, I created a Kanban project board here on GitHub when I was looking at the SQuAD competition, and I have documented my initial findings there.

@5hirish 5hirish pinned this issue Jun 3, 2019

3 participants