# A deep dive into our NLP solution

In this notebook we will see how our model behaves in a concrete situation. For that, we will use a single PDF and some raw text.\
Feel free to play with our model ;)

First of all, let's import our packages:
- `os`: Miscellaneous operating system interfaces
- `sys`: System-specific parameters and functions
- `metaData`: Fetches information from raw text and PDFs
- `dataExtraction`: Handles PDF and raw data extraction
- `TrQuestions`: Works with a transformer for QnA
- `TrSentymentAnalysis`: Works with a transformer for sentiment analysis
- `numpy`: Fundamental package for scientific computing with Python
- `pyplot`: Collection of functions that make matplotlib work like MATLAB

In [1]:
import os
import sys

import metaData
import dataExtraction
import TrQuestions
import TrSentymentAnalysis

import numpy as np
import matplotlib.pyplot as plt

**Now let's load and parse our report!**
- `dataExtraction.PDFToText(path)`: converts a PDF to raw text
- `metaData.getInfo(data, from_text=bool)`: parses raw text to extract useful information

In [2]:
report = dataExtraction.PDFToText("exemple_report.pdf")
reportInfos = metaData.getInfo(report, from_text=True)

Let's play with the report's data!\
First, let's see what information we fetched for the *Pupils' achievements* section:

In [3]:
print(reportInfos[4])

The learning achievements of pupilsThe overall learning achievements of the pupils are very good. They are in receipt of 
a  broad  and  balanced  curriculum  and  are  highly  motivatedinterested  and 
enthusiasticReading levels in English are very good and literacy overall is successfully integrated 
across  the  curriculumIn  order  to  match  the  pupilsvery  good  levels  in  readinga 
more cohesive whole-school approach to children’s writing across a range of genres 
is recommended. Some very good examples of pupil competence and confidence in 
oral  language  were  also  observed  during  the  evaluationPupils  at  all  levels 
demonstrate very good understanding of discrete mathematical conceptsTá tusicint agus stór focail maith ag na páistí sa Ghaeilge agus tá scileanna reasúnta 
maith á bhaint amach acu sa Ghaeilge labhartha. Tá réimse maith dánta, rannta agus 
amhráin ar eolas acu. Chun muinín na ndaltaí agus a gcumas cainte a fhorbairt, ba 
chóir  clár  céimniúil  agus  ábh

We can see that our data fetching worked pretty well. Let's play with it by asking some questions.\
What about "What is the quality of pupils"? Pretty straightforward, right?

- `pipeline('question-answering')`: loads the QnA Transformer model
- `nlp_qa(context=text, question=text)`: asks the model a question about the given context

In [4]:
from transformers import pipeline
nlp_qa = pipeline('question-answering')
nlp_qa(context=reportInfos[4], question='What is the quality of pupils')

{'score': 0.6283722519874573, 'start': 87, 'end': 96, 'answer': 'very good'}
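
The `start` and `end` fields in the result above are character offsets into the context string. The snippet below is a minimal sketch of how they map back to the answer. It does not load the model: `context` is just the first two sentences of the report text printed earlier, and `result` is copied from the output above.

```python
# Opening of the report text printed earlier (copied verbatim, including
# the missing space between "pupils" and "The").
context = (
    "The learning achievements of pupilsThe overall learning achievements "
    "of the pupils are very good."
)

# Result dictionary copied from the notebook output above.
result = {'score': 0.6283722519874573, 'start': 87, 'end': 96, 'answer': 'very good'}

# 'start' is inclusive and 'end' is exclusive, like a regular Python slice.
span = context[result['start']:result['end']]
print(span)
```

Slicing the context this way is handy when you want to highlight the answer in the original text or show a few characters of surrounding context.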

Impressive! Our model answered the question correctly, but the question was a bit too specific.\
How does it handle a very vague question, such as "Who ?"

In [5]:
nlp_qa(context=reportInfos[4], question='Who ?')

{'score': 0.6767464280128479, 'start': 1063, 'end': 1069, 'answer': 'pupils'}

Well, it succeeded once again!\
Let's make things harder one more time by testing our own custom text.

**PS: Feel free to test with your own text/questions**

In [6]:
text = """
PoC is a Student Innovation Center, currently based at EPITECH, which aims to promote Innovation and Open Source through its projects and events.
We work in innovation through three axes: 
- Our internal projects: Funded by PoC and carried out by our members, in partnership with foundations and research actors.
- Our services: For innovative companies in all sectors
- Our events: Workshops, talks or hackathons on the theme of technological innovation
"""

In [7]:
print(nlp_qa(context=text, question='What is PoC ?'))
print(nlp_qa(context=text, question='What is PoC main goal ?'))

{'score': 0.6073386669158936, 'start': 10, 'end': 35, 'answer': 'Student Innovation Center'}
{'score': 0.4512992203235626, 'start': 76, 'end': 113, 'answer': 'to promote Innovation and Open Source'}
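
The two scores above differ noticeably, so it can be useful to treat low-scoring answers with caution. Here is a minimal sketch of one way to do that. The helper `confident_answer` and the `0.5` threshold are assumptions made for illustration and are not part of our modules or of `transformers`. The result dictionaries are copied from the output above.

```python
def confident_answer(result, threshold=0.5):
    """Return the answer text if its score reaches the threshold, else None.

    Hypothetical helper: the threshold is an arbitrary choice and should be
    tuned on your own data.
    """
    if result['score'] >= threshold:
        return result['answer']
    return None


# Results copied from the notebook output above.
results = [
    {'score': 0.6073386669158936, 'start': 10, 'end': 35,
     'answer': 'Student Innovation Center'},
    {'score': 0.4512992203235626, 'start': 76, 'end': 113,
     'answer': 'to promote Innovation and Open Source'},
]

for r in results:
    print(confident_answer(r))
```

With this threshold the second answer is rejected even though it is correct, which shows that the score is only a rough indicator and that a fixed cut-off is a trade-off.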


OK, now we know our model is confident and pretty efficient at QnA, but what about sentiment analysis?\
Let's check with our report once again ;)
- `pipeline('sentiment-analysis')`: loads the sentiment analysis Transformer model
- `nlp_sentence_classif(text)`: runs sentiment analysis on the provided text

In [8]:
nlp_sentence_classif = pipeline('sentiment-analysis')
nlp_sentence_classif(reportInfos[4])

[{'label': 'POSITIVE', 'score': 0.9994074702262878}]

For the last test, we will use a custom text. Once again, feel free to try with your own text.

In [9]:
text = "I'm soo glad to be here !"
TrSentymentAnalysis.getSentiment(text)

[{'label': 'POSITIVE', 'score': 0.9994363188743591}]
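
If you want to compare or plot sentiment across several texts, a single signed number is often easier to work with than a label plus a score. The snippet below is a minimal sketch of that conversion. The helper `signed_score` is hypothetical, and it assumes the only labels are `POSITIVE` and `NEGATIVE`, as in the outputs shown in this notebook.

```python
def signed_score(prediction):
    """Map a {'label', 'score'} prediction to a value in [-1, 1].

    Hypothetical helper: assumes the label is either 'POSITIVE' or 'NEGATIVE'.
    """
    sign = 1.0 if prediction['label'] == 'POSITIVE' else -1.0
    return sign * prediction['score']


# Predictions copied from the two sentiment outputs shown in this notebook.
predictions = [
    {'label': 'POSITIVE', 'score': 0.9994074702262878},
    {'label': 'POSITIVE', 'score': 0.9994363188743591},
]

values = [signed_score(p) for p in predictions]
print(values)
```

A negative prediction with the same confidence would map to the same magnitude with a minus sign, so the values can be averaged or plotted directly.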