Meeting Minutes Jan 26 18

2018-01-26

Location	Time	Duration
CSC-262	12:00 - 14:00	2hr

Questions to answer Eleni's emails
Do you have a good idea of what you will be delivering? -- A search page with a well-defined search syntax and a very minimalist design -- It should have some predefined questions with associated pre-made searches that people will find interesting -- Visualizations: we are not sure yet -- We take Excel files, parse them into a database, and deliver database queries using the SPARQL language

We need a GitHub repo
Need record of all meetings
There is a research component to this project -- You also need to be able to take text and extract entities from it to place in the database -- E.g. "Bill Gates founded Microsoft": extract Bill Gates, Microsoft, and the founding relationship
How do you think the system will look? -- Our definition given -- It's not very clear -- WE NEED TO SCHEDULE A MEETING WITH ELENI
What technologies will you be using -- Denilson is going to tell you which libraries to use to extract the entities
Repo -- Need use cases (tentative assignment: Julienne and Chris) -- Mockups and navigation diagram (tentative assignment: Austin) -- 2 UML diagrams (component and class), plus the high-level diagram (tentative assignment: Cecilia) -- The diagrams need more descriptions and notes than normal so Diego understands -- A Gantt chart, with people assigned to the tasks -- Glossary, list of similar products, description (tentative assignment: Vuk) -- MUST BE USING GitHub Issues

SharePoint data is transformed into PDFs
There is extra information attached onto the PDFs after they are generated
Take note of the UML diagram that was sent; it describes the Excel dump we were sent
First step: bring up PDFs that mention a phrase/keyword
Second step: bring up only the relevant information from each PDF, what it is describing, and what other sections it links to, some sort of visualization
Third step: Take the text, analyze its syntactic structure, and automatically label everything semantically (nouns, relations: e.g. Stroulia said "blah"). The job is to take the text from the PDFs, take all the people we have in the knowledge graph built from the spreadsheets, and then extract knowledge from the PDFs.
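A rough sketch of the third step, assuming the PDF text has already been extracted to plain strings and the list of known people has already been pulled from the spreadsheets. The names KNOWN_PEOPLE, find_mentions and extract_said are made up for illustration, and the regex matching is only a stand-in for whatever NLP library ends up being recommended:

```python
import re

# Hypothetical list; in the project this would come from the spreadsheet data.
KNOWN_PEOPLE = ["Stroulia", "Denilson"]


def find_mentions(text, names):
    """Return (name, offset) pairs for each known name found in the text."""
    mentions = []
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        for match in pattern.finditer(text):
            mentions.append((name, match.start()))
    # Sort by position so mentions come back in reading order.
    return sorted(mentions, key=lambda m: m[1])


def extract_said(text, names):
    """Naive relation extraction for the pattern: <known name> said "<quote>"."""
    triples = []
    for name in names:
        pattern = re.compile(re.escape(name) + r'\s+said\s+"([^"]*)"')
        for match in pattern.finditer(text):
            triples.append((name, "said", match.group(1)))
    return triples


# Made-up sentence standing in for text pulled out of a PDF.
sample = 'At the review, Stroulia said "blah". Denilson agreed.'
mentioned = [name for name, _ in find_mentions(sample, KNOWN_PEOPLE)]
said = extract_said(sample, KNOWN_PEOPLE)
```

This only catches one hard-coded sentence shape; real syntax analysis would be needed for anything beyond that.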
We could potentially handle most of this with Solr, using it to find the appropriate PDFs to scan
SPARQL is used to query triples, e.g. ("Eleni", is, "Human"); a set of triples is a store of relationships
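To make the triple idea concrete, here is a toy in-memory version in plain Python. This is not SPARQL and not a real triple store, just an illustration of pattern matching over (subject, predicate, object) facts; the match helper is hypothetical. The query at the bottom is roughly analogous to a SPARQL pattern like `?who is Human`, with None playing the role of the variable:

```python
# Each fact is a (subject, predicate, object) tuple, using examples from these notes.
TRIPLES = {
    ("Eleni", "is", "Human"),
    ("Bill Gates", "is", "Human"),
    ("Bill Gates", "founded", "Microsoft"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Who is a Human?"
humans = [s for s, _, _ in match(TRIPLES, predicate="is", obj="Human")]

# "What did Bill Gates found?"
founded = [o for _, _, o in match(TRIPLES, subject="Bill Gates", predicate="founded")]
```

Adding knowledge later (e.g. from NLP over the PDFs) is just adding more tuples to the set, which is the main appeal of the triple approach.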
We have 2 options
We either go relational database, or we go SPARQL style
SPARQL + knowledge graphs -- We need to get information from the excel, build knowledge graphs by defining all the data as triples -- Then use NLP to extract more information and add more triples
Relational
Take spreadsheets -> SQL
Use Solr to index the text and get information that matches the entity diagram
Then do NLP to get more information and add more data
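A minimal sketch of the "spreadsheets -> SQL" step for the relational option, assuming the Excel sheets are first exported to CSV. The table name, column names and rows below are invented for illustration (the real layout is whatever the UML diagram of the Excel dump describes), and SQLite is used only because it ships with Python:

```python
import csv
import io
import sqlite3

# Made-up CSV export of one sheet; real column names are unknown.
SHEET = """name,department
Alice Example,Engineering
Bob Example,Planning
"""


def load_sheet(conn, csv_text):
    """Create the table if needed and load every CSV row into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person (name TEXT PRIMARY KEY, department TEXT)"
    )
    rows = [(r["name"], r["department"]) for r in csv.DictReader(io.StringIO(csv_text))]
    # Parameterized insert; re-loading the same sheet overwrites rather than duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO person (name, department) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_sheet(conn, SHEET)
engineers = conn.execute(
    "SELECT name FROM person WHERE department = ? ORDER BY name", ("Engineering",)
).fetchall()
```

Data extracted later by NLP would need its own tables or columns here, whereas in the triple option it would just be more triples.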
Recommended: d3.js, which can implement many of the visualizations