Meeting Minutes Jan 26 18
| Location | Time | Duration |
|---|---|---|
| CSC-262 | 12:00 - 14:00 | 2hr |
- Questions to answer Eleni's emails
- Do you have a good idea of what you will be delivering?
  - A search page with a well-defined search syntax and a very minimalist design
  - Should have some predefined questions with associated pre-made searches that people will find interesting
  - Visualizations: we are not sure yet
  - We take Excel files, parse them into a database, and deliver database queries using the SPARQL language
- We need a GitHub repo
- Need a record of all meetings
- There is a research component to this project
  - You also need to be able to take text and extract entities from it to place in the database
  - E.g. "Bill Gates founded Microsoft": extract Bill Gates, Microsoft, and the founding relationship
- How do you think the system will look?
  - Our definition given
  - It's not very clear: WE NEED TO SCHEDULE A MEETING WITH ELENI
- What technologies will you be using?
  - Denilson is going to tell you which libraries to use to extract the entities
- Repo
  - Need use cases (tentative assignment: Julienne and Chris)
  - Mockups and navigation diagram (tentative assignment: Austin)
  - 2 UML diagrams (component and class), plus the high-level diagram (tentative assignment: Cecilia)
    - The diagrams need more descriptions and notes than normal so Diego understands
  - A Gantt chart, with people assigned to each task
  - Glossary, list of similar products, description (tentative assignment: Vuk)
  - MUST BE USING GitHub Issues
- SharePoint data is transformed into PDFs
- There is extra information attached to the PDFs after they are generated
- Take note of the UML diagram that was sent; it describes the Excel dump we were sent
- First step: bring up PDFs that mention a phrase/keyword
- Second step: bring up only the relevant information from each PDF, what it is describing, and what other sections it links to, with some sort of visualization
- Third step: take the text, analyze its syntax structure, and auto-label everything semantically (nouns, relations: e.g. Stroulia said "blah"). The job is to take the text from the PDFs, take all the people we have in the knowledge graph from the spreadsheets, and then extract knowledge from the PDFs.
- We could potentially solve this basically with Solr: find the appropriate PDFs to scan
- SPARQL is used to query triples, e.g. ("Eleni", is, "Human"), which store relationships
- We have 2 options: we either go with a relational database, or we go SPARQL style
  - SPARQL + knowledge graphs
    - We need to get information from the Excel files and build knowledge graphs by defining all the data as triples
    - Then use NLP to extract more information and add more triples
  - Relational
    - Take spreadsheets -> SQL
    - Use Solr to index text, and get information that matches the diagram of entities
    - Then do NLP to get more information and add more data
- d3.js was recommended; it will implement many of the visualizations
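
The triple idea from the notes above can be sketched in plain Python. This is only an illustration under stated assumptions: `TripleStore`, `match`, and `naive_extract` are hypothetical names, the regular expression is a toy stand-in for the entity-extraction libraries Denilson will recommend, and a real system would most likely keep the triples in an RDF store and query them with SPARQL.

```python
import re
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object), e.g. ("Eleni", "is", "Human").
Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory stand-in for a real triple store."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Toy pattern (an assumption, not a real NLP pipeline):
# capitalized name, one lowercase verb, capitalized name.
_NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
_PATTERN = re.compile(rf"({_NAME}) ([a-z]+) ({_NAME})\.?")


def naive_extract(sentence: str) -> Optional[Triple]:
    """Extract one (subject, verb, object) triple, or None if no match."""
    m = _PATTERN.fullmatch(sentence.strip())
    if m is None:
        return None
    return (m.group(1), m.group(2), m.group(3))


store = TripleStore()

# Triple that would come from the spreadsheet data.
store.add("Eleni", "is", "Human")

# Triple extracted from free text, then added to the same store.
extracted = naive_extract("Bill Gates founded Microsoft")
if extracted is not None:
    store.add(*extracted)

print(store.match(predicate="founded"))
print(store.match(subject="Eleni"))
```

Passing `None` for a position in `match` plays roughly the role a variable plays in a SPARQL pattern: it matches anything in that slot.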