# Writing Analytics Taster

*[Andrew Gibson](http://staff.qut.edu.au/staff/andrew.gibson) - EUN673 Presentation - 15 June 2022*

The purpose of this notebook is to provide you with a bit of a taster of *Writing Analytics* as a sub-field of *Learning Analytics*, and in particular highlight some of the key approaches that are taken and some of the significant challenges that are encountered.

### Check out the Handbook

[The Handbook of Learning Analytics - Online](https://www.solaresearch.org/publications/hla-22/) 

**PDF Downloads:**

[The Handbook of Learning Analytics.](https://solaresearch.org/wp-content/uploads/hla22/HLA22.pdf) 2nd Ed. Charles Lang, George Siemens, Alyssa Friend Wise, Dragan Gašević, and Agathe Merceron. (Eds.) Vancouver, Canada: SoLAR.


Gibson, A., & Shibani, A. (2022). [Natural Language Processing-Writing Analytics.](https://solaresearch.org/wp-content/uploads/hla22/HLA22_Chapter_10_Gibson.pdf) In Charles Lang, George Siemens, Alyssa Friend Wise, Dragan Gašević, and Agathe Merceron (Eds.) *The Handbook of Learning Analytics.* 2nd Ed. Vancouver, Canada: SoLAR, 96-104.

#### How do we approach WA?

- Linguistic orientation
- Domain orientation

#### What is the intention for using WA?

- Descriptive WA
- Evaluative WA

#### How do our orientations and intentions influence the efficacy of the WA?

## >> Three *opinionated* principles...

### 1 - WA *should be* pragmatic

Gibson, Andrew & Lang, Charles. (2018) [The pragmatic maxim as learning analytics research method.](https://eprints.qut.edu.au/118262/22/118262.pdf) In Ochoa, X, Ferguson, R, Merceron, A, & Buckingham Shum, S (Eds.) Proceedings of the 8th International Conference on Learning Analytics and Knowledge. Association for Computing Machinery, United States of America, pp. 461-465.

> Consider what effects, that might conceivably have practical bearings, we conceive the object of our conception to have. Then, our conception of these effects is the whole of our conception of the object. (Charles Sanders Peirce, 1878, How to Make Our Ideas Clear)




#### Pragmatic Inquiry for Learning Analytics Research (PILAR)

1. **Contextualise** irritation or doubt within a clearly designed learning situation
2. **Clarify** the analytics to investigate based on practical learning effects
3. **Hypothesise** how the analytics will result in the practical effects
4. **Apply** the hypothesis by putting it to the test in the learning context
5. **Evaluate** the extent to which the hypothesis is true and practical effects are realised

### 2 - WA *should be* socio-technical

Knight, S., Gibson, A., & Shibani, A. (2020). [Implementing Learning Analytics for Learning Impact: Taking Tools to Task.](https://eprints.qut.edu.au/197662/1/Social_and_Technical_Infrastructure_ORO.pdf) Internet and Higher Education. https://doi.org/10.1016/j.iheduc.2020.100729

<div>
<img src="https://github.com/andrewresearch/presentations/blob/79d0e1a4cc448b7a6db8132d2255aaa0b80735f2/EUN673-WritingAnalyticsTaster-220615/WAlayers.png?raw=true" width="500"/>
</div>

### 3 - WA *should* transcend accurate measurement

Kitto, K., Buckingham Shum, S., & Gibson, A. (2018). [Embracing imperfection in learning analytics.](https://opus.lib.uts.edu.au/bitstream/10453/133000/1/p451-kitto.pdf) In Proceedings of the 8th international conference on learning analytics and knowledge (pp. 451-460).

- What is the learning activity in which this WA should be used?
- What model is used to link low level data to the WA?
- What form of feedback is provided to learners?
- In what way does the feedback contribute to learning gain?


## >> Exploring implementation

## Example 1: Towards the discovery of metacognition from reflective writing

Gibson, A., Kitto, K., and Bruza, P. (2016) Towards the Discovery of Learner Metacognition From Reflective Writing. Journal of Learning Analytics, 3(2), 22-36. doi: [http://dx.doi.org/10.18608/jla.2016.32.3](http://dx.doi.org/10.18608/jla.2016.32.3)

<div>
<img src="http://nlytx.io/metacognition/assets/images/ReflectionMetacognitionSpectrum.jpg" width="500"/>
</div>

[Experiment by clicking on the live demo: http://nlytx.io/metacognition/](http://nlytx.io/metacognition/index.html)

## Example 2: Computer features ≠ Human concepts

*Derived from LASI2022 Writing Analytics Tutorial*

We can use Natural Language Processing (NLP) to perform low-level analysis of text. However, this is not particularly useful to most people (except perhaps linguists). To make the analysis useful for feedback on writing, we need to transform computational analysis into human-understandable features that have a clear meaning for the writer.

To illustrate, let's look at a short demo reflection I wrote for one of my data analytics classes:

> Yesterday I went to a seminar on deep learning that really challenged me to consider whether I am taking full advantage of this new technology in my research. Jeremy and Rachel presented about why deep learning is not over-hyped, and how it is going to be significant in the longer term. Initially, I was underwhelmed by this idea, and was thinking that the seminar might not hold much value for me. However, some of the material they presented was surprising and challenged me to re-think where I was at. Jeremy suggested that deep learning technologies were similar to electricity and that often the immediate applications are not obvious, but as time goes on, the technology is seen to be more valuable in many areas of society. He suggested that this could also be the case for deep learning technologies. I certainly hadn’t thought much about this with respect to my own research. However, this idea suggested that perhaps I need to reconsider whether it may be a valuable tech to add to my current technologies that I use.
 I have tended to avoid using the technology in my own research, for many reasons, and Jeremy and Rachel highlighted some of these reasons as common reasons why people avoid the technology. They then presented counter examples to challenge that thinking. This certainly challenged me in some ways, and I at times I felt a bit uncomfortable in seeing how flimsy some of my reasoning was. 
Certainly, moving forward I should probably be more open to how some of these new developments might be able to aid the advancement of my work in reflective writing analytics. I think that although there is still good reason to be careful, and aware of poor uses of this technology, I have tended to use this as bit of an excuse to avoid taking the extra effort to try and apply the technology in my own work.
Moving forward, I need to take some time to look carefully at which applications might advance my research, and give them a go. Although I need to balance the use of deep learning with my other techniques, I should be careful not to favour those techniques that have served me well in the past, and be blinded to the potential of new (and potentially better) techniques. I should at least be open to the improvements that might bring to my work.

#### Strings

In order for the computer to work with this reflection, we need to use code, or a computer language (in this case, Python), to manipulate the text.

In [None]:
# Assign text (in the form of a multiline string) to a variable 'reflection'

reflection = """
Yesterday I went to a seminar on deep learning that really challenged me to consider whether I am taking full advantage of this new technology in my research. Jeremy and Rachel presented about why deep learning is not over-hyped, and how it is going to be significant in the longer term. Initially, I was underwhelmed by this idea, and was thinking that the seminar might not hold much value for me. However, some of the material they presented was surprising and challenged me to re-think where I was at. Jeremy suggested that deep learning technologies were similar to electricity and that often the immediate applications are not obvious, but as time goes on, the technology is seen to be more valuable in many areas of society. He suggested that this could also be the case for deep learning technologies. I certainly hadn’t thought much about this with respect to my own research. However, this idea suggested that perhaps I need to reconsider whether it may be a valuable tech to add to my current technologies that I use.
 I have tended to avoid using the technology in my own research, for many reasons, and Jeremy and Rachel highlighted some of these reasons as common reasons why people avoid the technology. They then presented counter examples to challenge that thinking. This certainly challenged me in some ways, and I at times I felt a bit uncomfortable in seeing how flimsy some of my reasoning was. 
Certainly, moving forward I should probably be more open to how some of these new developments might be able to aid the advancement of my work in reflective writing analytics. I think that although there is still good reason to be careful, and aware of poor uses of this technology, I have tended to use this as bit of an excuse to avoid taking the extra effort to try and apply the technology in my own work.
Moving forward, I need to take some time to look carefully at which applications might advance my research, and give them a go. Although I need to balance the use of deep learning with my other techniques, I should be careful not to favour those techniques that have served me well in the past, and be blinded to the potential of new (and potentially better) techniques. I should at least be open to the improvements that might bring to my work.
""".strip()

The variable `reflection` contains one long **string** of characters (each character is stored in the computer's memory as a number).
Notice that it doesn't include any of the normal formatting that we associate with text. The computer doesn't even know where the words are!
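
As a rough illustration of what "just a string" means, the sketch below takes one sentence from the reflection (copied into a new variable, `sample`, which is not part of the original notebook) and treats it purely as characters. Splitting on whitespace only approximates words, which is one reason NLP libraries are used later on.

```python
# One sentence from the reflection, used here as a small stand-alone sample
sample = "Initially, I was underwhelmed by this idea."

# A string is just an ordered sequence of characters
print(len(sample))        # number of characters, including spaces and punctuation
print(list(sample[:9]))   # the first nine characters, one by one

# Splitting on whitespace gives a rough approximation of words,
# but punctuation stays attached (e.g. "Initially," and "idea.")
naive_words = sample.split()
print(naive_words)
```

Notice that the "words" still carry commas and full stops: the computer has only cut the string wherever it found a space.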

In [None]:
print(reflection)

#### Rendering

To make it more readable, we can display the text as `HTML` (the markup language that web browsers render). The browser will parse the HTML in a way that makes it easier to see the whole text. This is basic text visualisation: the browser is doing some basic work in translating raw strings into formats that humans are used to reading. We're not doing any analysis yet!

In [None]:
# import display.HTML and use it to display the text as HTML

from IPython import display

display.HTML(reflection)

#### More human requires more code!

We could make it more readable still by turning the text into a list of paragraphs and displaying each with a space between.

In [None]:
# create a list of paragraphs

reflection_paras = reflection.split('\n')

# wrap each paragraph in HTML <p> tags and separate with a <br> tag

html_reflection = '<br>'.join(f'<p>{para}</p>' for para in reflection_paras)
display.HTML(html_reflection)
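
One thing to watch when turning writing into HTML is that characters such as `<` or `&` in the text could be read by the browser as markup. The following is a minimal sketch, assuming we want to escape those characters first; the helper name `paragraphs_to_html` and the sample text are made up for illustration and are not part of the original notebook.

```python
from html import escape


def paragraphs_to_html(text):
    """Wrap each non-empty line of text in <p> tags, escaping HTML characters."""
    # Drop blank lines and surrounding whitespace so each paragraph is clean
    paragraphs = [line.strip() for line in text.split("\n") if line.strip()]
    # escape() converts characters like <, > and & into safe HTML entities
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


# Hypothetical sample text containing characters that HTML treats specially
sample = "Deep learning <may> help.\n\n I should try it & see."
print(paragraphs_to_html(sample))
```

The resulting string could then be passed to `display.HTML` in the same way as the cell above.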

#### Analysing with Natural Language Processing (NLP)

To perform analysis on text, we generally make use of NLP libraries. Two of the most common libraries for the Python language are `spaCy` and `NLTK`. We will use spaCy ([spacy.io](http://spacy.io)) for our analysis...

In [None]:
# We may need to install the spacy 'library' or 'package' first before we can use it
! pip install spacy
! python -m spacy download en_core_web_sm

In [None]:
# Load the spacy library and a pre-trained language model

import spacy
nlp = spacy.load("en_core_web_sm")
spacy.info() # Check version info

We're now ready to process our text with spaCy. For this exercise, we'll just do a simple analysis.

In [None]:
# process the reflection as a spacy Doc object

doc = nlp(reflection)

# view some of the information stored in the Doc object

for token in doc[:10]:
  print(token.idx, token.text, token.lemma_, token.pos_)


In [None]:
# Output the doc in JSON format to get an overview of all of the features

doc.to_json()

#### Meaningful features

Some of the NLP features look meaningful straight away:

In [None]:
# spacy also identified named entities

doc.ents

In [None]:
# these can be visualised

from spacy import displacy
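# jupyter=False makes render() return the markup as a string rather than displaying it itself
ent_render = displacy.render(doc, style="ent", jupyter=False)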
display.HTML(ent_render)

#### Useful features

Other features don't appear meaningful, but perhaps they could be useful?

In [None]:
# We can also visualise the dependency tree

sentences = list(doc.sents)
# jupyter=False makes render() return the markup as a string rather than displaying it itself
deptree = displacy.render(sentences[0], style='dep', jupyter=False)
display.HTML(deptree)

## Example 3: Using NLP (towards WA)

How do we decide what might be useful out of this low level analysis, and how can we use it to construct features that are more aligned with how humans think about the writing?

One approach is to think about meaningful patterns in the text and use the NLP software to extract them...

#### Pattern matching

In [None]:
# Match a pattern in the text

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)

# Find a personal pronoun followed by a modal verb, any number of adverbs, and then any two tokens
pattern = [{"TAG": "PRP"},{"TAG": "MD"},{"POS": "ADV", "OP": "*"},{},{}]
matcher.add("example", [pattern])

matches = matcher(doc)
for match_id, start, end in matches:
    span = doc[start:end]  # The matched span
    print(start, end, span.text)

matcher.remove("example")

In [None]:
# We can turn this into a function to allow us to experiment

def findPattern(match_pattern):
    """Return the text of every span in `doc` that matches `match_pattern`."""
    matcher.add("experiment", [match_pattern])
    matches = matcher(doc)
    matcher.remove("experiment")
    return [doc[start:end].text for match_id, start, end in matches]

# Try it with pattern above

findPattern(pattern)
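
To make the logic of that pattern more concrete, here is a simplified plain-Python sketch of the same idea: a personal pronoun, then a modal verb, then any adverbs, then two further tokens. The tokens and their tags are hand-written for illustration (they are not output from the spaCy model), and the function `find_pronoun_modal` is a hypothetical helper. It is also greedy about adverbs, whereas spaCy's `Matcher` may additionally report shorter alternatives.

```python
# Hand-tagged tokens (text, fine-grained tag, coarse POS) for illustration only;
# these are NOT output from the spaCy model.
tokens = [
    ("I", "PRP", "PRON"),
    ("should", "MD", "AUX"),
    ("probably", "RB", "ADV"),
    ("be", "VB", "AUX"),
    ("more", "RBR", "ADV"),
    ("open", "JJ", "ADJ"),
    (".", ".", "PUNCT"),
]


def find_pronoun_modal(tokens):
    """Return text spans: pronoun, modal, any adverbs, then two more tokens."""
    spans = []
    for start in range(len(tokens) - 1):
        # The span must open with a personal pronoun followed by a modal verb
        if tokens[start][1] != "PRP" or tokens[start + 1][1] != "MD":
            continue
        end = start + 2
        # Skip over zero or more adverbs (greedy simplification of OP "*")
        while end < len(tokens) and tokens[end][2] == "ADV":
            end += 1
        # Each empty dict {} in the spaCy pattern matches any one token
        if end + 2 <= len(tokens):
            spans.append(" ".join(text for text, _, _ in tokens[start:end + 2]))
    return spans


print(find_pronoun_modal(tokens))
```

Phrases such as "I should ..." or "I need ..." are closer to something a writer would recognise than a raw list of tags, which is the point of building patterns on top of the low-level analysis.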

#### Finding patterns that are useful

Experiment with matching patterns using the [Rule Based Matcher Explorer](https://explosion.ai/demos/matcher) and then try the resulting pattern with the `findPattern` function in the cell below:


In [None]:
# Try your pattern out here

my_pattern = [{'POS': 'PRON', 'OP': '?'},
           {'POS': 'ADV', 'OP': '?'},
           {'POS': 'VERB', 'OP': '?'},
           {'TAG': 'PRP', 'OP': '?'}]

findPattern(my_pattern)

## Example 4: Reflexive Expressions (WIP)

