slides online
Andreas Blätte authored and Andreas Blätte committed Dec 14, 2023
1 parent eb4ae8d commit 44955c2
Showing 25 changed files with 4,795 additions and 21 deletions.
28 changes: 16 additions & 12 deletions cqp.Rmd
@@ -1,6 +1,6 @@
---
-title: "Cooking with GermaParl"
-date: "2023-06-01"
+title: "Cooking with GermaParl: Spicy Queries"
+date: "2023-11-23"
output:
  xaringan::moon_reader:
    css: ["./css/default.css", "./css/metropolis.css", "./css/robot-fonts.css", './css/polminify.css']
@@ -14,11 +14,14 @@ editor_options:
library(DiagrammeR)
library(xml2)
library(kableExtra)
library(polmineR)
library(dplyr)
library(purrr)
```

```{r load_polmineR, echo = FALSE, message = FALSE, warning = FALSE}
library(polmineR)
```

# Data formats and structures

* TEI-inspired XML: <br/>https://github.com/PolMine/GermaParlTEI
@@ -30,7 +33,7 @@ See the [landing page at Zenodo](https://zenodo.org/record/7949074) for further

---

-# Data Preparation Workflow
+# GermaParl: Data Preparation Workflow

```{r make_diagrammeR, echo = FALSE, fig.width = 11}
grViz("
@@ -68,7 +71,7 @@ PARSE -> XML -> ENRICH-> XML2 -> ANNOTATE -> VRT -> ENCODE -> CWB
# XML/TEI

* structural annotation (metadata on document and speaker level)
-* quasi standardization, inspired by the standards of the Text Encoding Initiative
+* quasi standardization, inspired by the standards of the Text Encoding Initiative (TEI)

```{r read_xml, echo = FALSE}
xml2::read_xml(x = "~/Lab/github/GermaParlTEI/01/BT_01_003.xml") %>%
@@ -252,6 +255,7 @@ corpus("GERMAPARL2") |>
```{r, eval = FALSE}
x <- corpus("GERMAPARL2") %>%
subset(ne_type = "PERSON") %>%
+split(s_attribute = "ne_type") %>%
get_token_stream(p_attribute = "word") %>%
table()
```
@@ -271,16 +275,16 @@ corpus("GERMAPARL2") %>%
dispersion(query = look_up, cqp = TRUE, s_attribute = "protocol_date") %>%
as_tibble() %>%
mutate(date = as.Date(protocol_date)) %>%
-mutate(month = floor_date(date, unit = "quarter")) %>%
-filter(!is.na(month)) %>%
-select(count, month) %>%
-group_by(month) %>%
+mutate(year = floor_date(date, unit = "year")) %>%
+filter(!is.na(year)) %>%
+select(count, year) %>%
+group_by(year) %>%
summarise(sum = sum(count)) %>%
-as.xts(x = .$sum, order.by = .$month) %>%
+as.xts(x = .$sum, order.by = .$year) %>%
plot(
main = "'FDGO' in Bundestag Plenary Debates (N/quarter)",
-xlab = "total per quarter",
-ylim = c(0, 40), type = "h", col = "darkblue", main.timespan = FALSE
+xlab = "total per year",
+ylim = c(0, 100), col = "darkblue", main.timespan = FALSE
)
```
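The hunk above switches the 'FDGO' time series from quarterly to yearly bins. A minimal sketch of that binning step, run on hypothetical counts (the `counts` tibble and its values are made up and merely stand in for the output of `dispersion()`; `floor_date()` is assumed to come from lubridate), shows what `floor_date(unit = "year")` does before the sums are plotted:

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for dispersion() output:
# one row per protocol date with the number of query matches.
counts <- tibble(
  protocol_date = c("1998-03-12", "1998-11-05", "1999-06-17", NA),
  count = c(2L, 3L, 4L, 1L)
)

per_year <- counts %>%
  mutate(date = as.Date(protocol_date)) %>%
  # floor_date() maps every date to the first day of its year
  mutate(year = floor_date(date, unit = "year")) %>%
  # rows without a parseable date are dropped
  filter(!is.na(year)) %>%
  group_by(year) %>%
  summarise(sum = sum(count))

per_year
```

Passing `unit = "quarter"` instead reproduces the quarterly bins of the previous version.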

558 changes: 558 additions & 0 deletions docs/cqp.html

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions docs/cqp_files/DiagrammeR-styles-0.2/styles.css
@@ -0,0 +1,14 @@
.DiagrammeR,.grViz pre {
white-space: pre-wrap; /* CSS 3 */
white-space: -moz-pre-wrap; /* Mozilla, since 1999 */
white-space: -pre-wrap; /* Opera 4-6 */
white-space: -o-pre-wrap; /* Opera 7 */
word-wrap: break-word; /* Internet Explorer 5.5+ */
}

.DiagrammeR g .label {
font-family: Helvetica;
font-size: 14px;
color: #333333;
}

Binary file added docs/cqp_files/figure-html/unnamed-chunk-3-1.png
91 changes: 91 additions & 0 deletions docs/cqp_files/grViz-binding-1.0.10/grViz.js
@@ -0,0 +1,91 @@
HTMLWidgets.widget({

name: 'grViz',

type: 'output',

initialize: function(el, width, height) {

return {
// TODO: add instance fields as required
};
},

renderValue: function(el, x, instance) {
// Use this to sort of make our diagram responsive
// or at a minimum fit within the bounds set by htmlwidgets
// for the parent container
function makeResponsive(el){
var svg = el.getElementsByTagName("svg")[0];
if (svg) {
if (svg.width) {svg.removeAttribute("width")}
if (svg.height) {svg.removeAttribute("height")}
svg.style.width = "100%";
svg.style.height = "100%";
}
}

if (x.diagram !== "") {

if (typeof x.config === "undefined"){
x.config = {};
x.config.engine = "dot";
x.config.options = {};
}

try {

el.innerHTML = Viz(x.diagram, format="svg", engine=x.config.engine, options=x.config.options);

makeResponsive(el);

if (HTMLWidgets.shinyMode) {
// Get widget id
var id = el.id;

$("#" + id + " .node").click(function(e) {
// Get node id
var nodeid = e.currentTarget.id;
// Get node text object and make an array
var node_texts = $("#" + id + " #" + nodeid + " text");
//var node_path = $("#" + nodeid + " path")[0];
var text_array = node_texts.map(function() {return $(this).text(); }).toArray();
// Build return object *obj* with node-id, node text values and node fill
var obj = {
id: nodeid,
//fill: node_path.attributes.fill.nodeValue,
//outerHMTL: node_path.outerHTML,
nodeValues: text_array
};
// Send *obj* to Shiny's inputs (input$[id]+_click e.g.: input$vtree_click))
Shiny.setInputValue(id + "_click", obj, {priority: "event"});
});
}

// set up a container for tasks to perform after completion
// one example would be add callbacks for event handling
// styling
if (typeof x.tasks !== "undefined") {
if ((typeof x.tasks.length === "undefined") ||
(typeof x.tasks === "function")) {
// handle a function not enclosed in array
// should be able to remove once using jsonlite
x.tasks = [x.tasks];
}
x.tasks.map(function(t){
// for each tasks add it to the mermaid.tasks with el
t.call(el);
});
}
} catch(e){
var p = document.createElement("pre");
p.innerText = e;
el.appendChild(p);
}
}

},

resize: function(el, width, height, instance) {
}
});
12 changes: 12 additions & 0 deletions docs/cqp_files/header-attrs-2.22/header-attrs.js
@@ -0,0 +1,12 @@
// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
// be compatible with the behavior of Pandoc < 2.8).
document.addEventListener('DOMContentLoaded', function(e) {
var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
var i, h, a;
for (i = 0; i < hs.length; i++) {
h = hs[i];
if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6
a = h.attributes;
while (a.length > 0) h.removeAttribute(a[0].name);
}
});
12 changes: 12 additions & 0 deletions docs/cqp_files/header-attrs-2.25/header-attrs.js
@@ -0,0 +1,12 @@
// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
// be compatible with the behavior of Pandoc < 2.8).
document.addEventListener('DOMContentLoaded', function(e) {
var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
var i, h, a;
for (i = 0; i < hs.length; i++) {
h = hs[i];
if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6
a = h.attributes;
while (a.length > 0) h.removeAttribute(a[0].name);
}
});