Technex Hackathon

Round 1 Final Submission

Rule Based NLU – Pattern Matching Algorithm

NOTE: All data files which were initially provided are kept in ./resources/ directory. The final output.txt is present at ./resources/Testing/ directory.

Assumptions/Edits in initial resources

The given data had some errors. We modified the given data at some places:

The keyword 'alram' in the file ./resources/Grammar/CreateAlarmWithTime.txt was changed to 'alarm'.
A space character was inserted in between '{please_concept}{tell_concept}' so that we can have two different tokens. The files changed in this case were ./resources/Grammar/WeatherInCity.txt and /resources/Grammar/WeatherTimeCity.txt.
The BookTicket.txt was moved from /resources/Concept/ to /resources/Grammar/ directory.
The keyword 'thunderstrom' was changed to 'thunderstorm' in file ./resources/Concept/weather_activity.txt.

Our Approach to the problem

The two major task involved in our solution:

Conversion of the input sentence into the Grammar compatible form.
Matching of above converted sentence to all grammars available.

First task:

We target day, date, time, datetime placeholders in the text by using various regex patterns(see custom_utils/DateTimePlaceholder.java).
We target contact names and places by first extracting every token from the given data, and then matching them with our input sentence. Here, we ensure that a fuzzy matching also takes place between tokens and input sentence. For computing such similarity scores, we use Jaro Winkler Distance algorithm(See custom_utils/PlaceholderMatch.java). This algorithm is good matching algo for smaller length strings(like names).
We identify concepts in the input sentence by first building a general purpose matrix. This matrix is of shape NUM_CONCEPTS X VOCABULARY, and its element M[i][j] is a boolean value describing whether the jth word in the vocabulary will belong to concept i. We use this matrix to deduce concepts in the input sentence(See custom_utils/Concept_parser.java).

After the above steps, we move to our next task.

Note: consider the transformed sentence as S from now on.

Second task

First of all, we preprocess the given grammar txt files. We observed that some tokens were exclusive to some specfic grammar rules(for e.g 'alarm') and some tokens were spread across all grammar rules(e.g. '<please_concept>'). We decided to find some weights for each token, which can help us in matching S with grammar rules. We adopted the TF-IDF method for finding such weights(See See custom_utils/GrammarWeight.java).
Now the next task was to actually match the transformed sentence S with grammar rules. Given a grammar file, we choose a rule from that file, and then match our S with the rule. S and the rule are converted to a new form first (based on TF-IDF weights) and then matched with a algorithm. We chose Levenshtein Distance algorithm first(see custom_utils/LevenshteinDistance.java), but we got better results with Jaro Distance algorithm.
For this task, please see custom_utils/GrammarAnalyser.java.

After this, we just chose our final answer as the grammar rule which has maximum similarity score among all others. And this completes the complete problem.

Running the program

$ javac samsungtest.java
$ java samsungtest

Contributors

Aakash Mallik Github @AakashMallik
Utsav Krishnan Github @ketankr9
Divesh Pandey Github @pandeydivesh15

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
custom_utils		custom_utils
resources		resources
resources_old		resources_old
.gitignore		.gitignore
README.md		README.md
a		a
directoryStructure.txt		directoryStructure.txt
readme.txt		readme.txt
samsungtest.java		samsungtest.java

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Technex Hackathon

Round 1 Final Submission

Rule Based NLU – Pattern Matching Algorithm

Assumptions/Edits in initial resources

Our Approach to the problem

First task:

Second task

Running the program

Contributors

About

Releases

Packages

Contributors 3

Languages

Pipe-Runner/hackathon-technex

Folders and files

Latest commit

History

Repository files navigation

Technex Hackathon

Round 1 Final Submission

Rule Based NLU – Pattern Matching Algorithm

Assumptions/Edits in initial resources

Our Approach to the problem

First task:

Second task

Running the program

Contributors

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages