Explorer revamp #428

sal-uva · 2024-04-23T14:56:25Z

This PR revamps how the Explorer works and looks. It specifically does the following:

Adds a new OPTION_DATASOURCES_TABLE user input that creates a table with dynamic columns for each enabled dataset. Input fields per row can be text, dropdown, or checkbox.
Uses this table to create a new Settings page where the Explorer can be enabled per data source (more table options can be added later).
Simplifies how custom data source templates for the Explorer are handled: they are now composed of CSS files (in static/css/explorer/) and Jinja2 templates (in webtool/templates/explorer/datasource-templates/) instead of CSS and JSON files in the data source folders that need to be verified and parsed.
Integrates the Explorer with the UI of 4CAT.
Makes the Explorer use iterate_items.
Re-integrates sorting so that all dataset columns can be used for sorting in the Explorer.
Enables new annotation columns for sorting, filtering, and other features.
Adds Explorer templates for Twitter and Instagram (other data sources will soon follow).
Deletes much unnecessary code.

Note that some unused code is still present for future updates with respect to 4CAT scrapers and database-accessible data sources generally.

dale-wahl

I looked over the backend code and only noticed one real issue (in an edge case). I ran this version and tested out the Explorer on a number of datasets (instagram, custom, telegram, tumblr, tiktok, youtube, reddit). It looks good! Sort works well. Reddit was missing the "subject" field (it's probably the only dataset that uses subject anymore). Telegram has an issue which I will post separately.

I tested saving annotations and writing them to datasets. This worked for me (and broke one with my edge case 😬; see comment). I did notice that the new fields show up in the Dataset preview view, but the values saved to the database do not show up in preview. The values do show up after you have run "write annotations".

Deactivating and activating settings seems to work fine. There is an explorerflask settings group that could probably be merged with the Explorer group.

If you want to merge now, I would deactivate Telegram as a default (till addressed) and consider how to address my comment re: field names for annotations.

dale-wahl · 2024-04-23T18:34:25Z

backend/lib/processor.py

@@ -418,7 +418,7 @@ def add_field_to_parent(self, field_name, new_data, which_parent=source_dataset,
 		parent_path = which_parent.get_results_path()

 		if len(new_data) != which_parent.num_rows:
-			raise ProcessorException('Must have new data point for each record: parent dataset: %i, new data points: %i' % (which_parent.num_rows, len(new_data)))
+			self.dataset.update_status('The amount of new data points and existing records don\'t match; data may be misaligned (parent dataset: %i, new data points: %i)' % (which_parent.num_rows, len(new_data)))


If we neither return nor raise here, the code will go on to add data to the original dataset. This may be the intent: if the list always starts at the first item, the result would be fine. But if you fed it a list that starts somewhere else, those records will be updated incorrectly.

This is intended; this method could not be used with different data lengths before, but we do need that now because num_rows does not account for items on which map_item() fails, which produces a CSV that is shorter than the NDJSON. It didn't seem to cause any problems when I made it a warning instead of an exception!
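The trade-off in this thread can be illustrated with a minimal sketch. It is not 4CAT's add_field_to_parent; the helper name `add_field_by_position`, its parameters, and the choice to fill missing values with an empty string are all assumptions made for illustration. It shows why a warning is only safe when the new values line up with the records from the first item onwards.

```python
def add_field_by_position(rows, field_name, new_data, log=print):
    """Hypothetical helper: attach new_data to rows by position.

    Warns instead of raising when the lengths differ. Values are matched
    from the first row onwards, so a list that starts at a later record
    would be misaligned without any error being raised.
    """
    if len(new_data) != len(rows):
        log("Number of new data points and existing records don't match; "
            "data may be misaligned (records: %i, new data points: %i)"
            % (len(rows), len(new_data)))

    updated = []
    for index, row in enumerate(rows):
        # Assumption: rows without a matching value get an empty string
        value = new_data[index] if index < len(new_data) else ""
        updated.append({**row, field_name: value})
    return updated


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
result = add_field_by_position(rows, "label", ["a", "b"])
```

In this sketch the third record simply ends up with an empty label. If the two values had belonged to the second and third records instead, the output would look equally valid, which is the misalignment risk raised above.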

dale-wahl · 2024-04-23T18:40:32Z

datasources/ninegag/search_9gag.py

@@ -8,6 +8,7 @@

 from backend.lib.search import Search
 from common.lib.item_mapping import MappedItem
+from common.lib.helpers import UserInput


We're importing UserInput in a few datasources unnecessarily. Probably an oversight and otherwise has no effect.

True, we should do some cleanup.

dale-wahl · 2024-04-23T18:43:06Z

processors/filtering/write_annotations.py

@@ -101,7 +102,7 @@ def process(self):

 		# Write to top dataset
 		for label, values in new_data.items():
-			self.add_field_to_parent("annotation_" + label, values, which_parent=self.source_dataset, update_existing=True)
+			self.add_field_to_parent(label, values, which_parent=self.source_dataset, update_existing=True)


The add_field_to_parent function does not check for existing fields. If a user creates a field called "username", they will overwrite an existing field with the same name. If I recall correctly, I could not figure out how to check whether an existing column already had that name, because add_field_to_parent needs to be able to update existing annotation fields. This is just a bit dangerous.

I tested this on a dataset by creating a field called "author", adding some values, and writing them to the dataset. I was able to overwrite the original "author" field (which in my case was actually a dictionary of author-related data, which caused map_item to break). I recommend reverting this change. We could even add a prefix like 4CAT_annotation_ so that it would be virtually impossible for raw data to contain that field name.

This is indeed an oversight for now, though I would like annotation fields to have the option of a 'clean' name; long names quickly become unreadable in spreadsheet software. This can be resolved by first checking whether an annotation field key already exists in the dataset columns or, when an annotated dataset is filtered into a new dataset, whether the key is missing from the fields registered in the annotations table for the parent dataset.

This is a sort-of edge case for now, but I'll try to resolve this next week!

Fixed this with a back-end and front-end check.
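A minimal sketch of what such a back-end check could look like is given here. This is not the code added in the PR; the function name `validate_annotation_label` and its arguments are hypothetical. It assumes the caller can supply both the dataset's column names and the annotation fields already registered for that dataset, so that updating an existing annotation field stays possible while raw data columns are protected.

```python
def validate_annotation_label(label, dataset_columns, annotation_fields):
    """Hypothetical check: return an error message, or None if the label is OK.

    dataset_columns: column names currently present in the dataset
    annotation_fields: labels already registered as annotation fields
    """
    label = label.strip()
    if not label:
        return "Annotation field name cannot be empty"

    # A known annotation field may be updated, even though it also shows up
    # as a dataset column once annotations have been written to the dataset
    if label in annotation_fields:
        return None

    # Anything else that matches a column would overwrite original data
    if label in dataset_columns:
        return "Field name '%s' is already used by a dataset column" % label

    return None


columns = {"author", "body", "sentiment"}
registered = {"sentiment"}
error = validate_annotation_label("author", columns, registered)
```

Checking the registered annotation fields before the dataset columns is the design choice that matters in this sketch: it keeps 'clean' names usable without allowing a field such as "author" to be overwritten.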

dale-wahl · 2024-04-24T09:01:22Z

And this is the issue with a Telegram dataset I ran into:

File "/opt/venv/lib/python3.8/site-packages/flask/templating.py", line 151, in render_template
2024-04-23 23:46:05     return _render(app, template, context)
2024-04-23 23:46:05   File "/opt/venv/lib/python3.8/site-packages/flask/templating.py", line 132, in _render
2024-04-23 23:46:05     rv = template.render(context)
2024-04-23 23:46:05   File "/opt/venv/lib/python3.8/site-packages/jinja2/environment.py", line 1301, in render
2024-04-23 23:46:05     self.environment.handle_exception()
2024-04-23 23:46:05   File "/opt/venv/lib/python3.8/site-packages/jinja2/environment.py", line 936, in handle_exception
2024-04-23 23:46:05     raise rewrite_traceback_stack(source=source)
2024-04-23 23:46:05   File "/usr/src/app/webtool/templates/explorer/explorer.html", line 1, in top-level template code
2024-04-23 23:46:05     {% extends "layout.html" %}
2024-04-23 23:46:05   File "/usr/src/app/webtool/templates/layout.html", line 71, in top-level template code
2024-04-23 23:46:05     {% block body %}
2024-04-23 23:46:05   File "/usr/src/app/webtool/templates/explorer/explorer.html", line 40, in block 'body'
2024-04-23 23:46:05     {% include "explorer/post.html" %}
2024-04-23 23:46:05   File "/usr/src/app/webtool/templates/explorer/post.html", line 12, in top-level template code
2024-04-23 23:46:05     {% include "explorer/datasource-templates/generic.html" %}
2024-04-23 23:46:05   File "/usr/src/app/webtool/templates/explorer/datasource-templates/generic.html", line 122, in top-level template code
2024-04-23 23:46:05     <i class="fa-solid fa-comment"></i> {{ fields.comments | commafy }}
2024-04-23 23:46:05   File "/usr/src/app/webtool/lib/template_filters.py", line 64, in _jinja2_filter_commafy
2024-04-23 23:46:05     number = int(number)
2024-04-23 23:46:05 ValueError: invalid literal for int() with base 10: '👍👍👍👍👍❤🏆🆒'

Looks like perhaps the emojis are killing the template. Telegram, I think, is the only datasource using them.
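One defensive option, sketched here under assumptions, is a filter that leaves non-numeric input untouched rather than raising. The traceback only shows that the real filter calls int(number); the rest of its body is not visible here, so `safe_commafy` is a hypothetical stand-in and not 4CAT's implementation.

```python
def safe_commafy(value):
    """Hypothetical filter: add thousands separators where possible.

    Values that cannot be parsed as an integer (for example a string of
    emoji reactions) are returned unchanged instead of raising ValueError.
    """
    try:
        return f"{int(value):,}"
    except (TypeError, ValueError):
        return value


formatted = safe_commafy("1234567")
unchanged = safe_commafy("👍👍👍👍👍❤🏆🆒")
```

A filter like this only hides the symptom. If the field holds reactions rather than a comment count, the template arguably should not pass it through a number formatter at all.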

sal-uva · 2024-04-29T09:25:06Z

Since merging to master is no longer of immediate concern, I will address the above and make further improvements; will notify when it's time for a review!

sal-uva added 23 commits April 8, 2024 12:46

First commit :)

1d3f587

Use regular iterate_items method when looping through dataset + minor changes

c4a4606

Change wording in Explorer settings

cac644e

Allow Explorer CSS to be inserted and changed in Settings

8b78452

Move around Explorer CSS files

0fe3ea6

Edit custom Explorer CSS options

e06760a

Forgot to save these

a921967

Typozzz

e37ebc9

First setup for dynamic Explorer options in Settings

a7668f0

First steps to datasource table user input

59e33b0

Merge branch 'explorer-improvements' of https://github.com/digitalmethodsinitiative/4cat into explorer-improvements # Conflicts: # common/lib/config_definition.py

712aff1

Add basic UserInput.DATASOURCES_TABLE functionality, and use in Explorer settings page

46628c6

Simplify config setting name

340d1ff

Only show Explorer when enabled per data source

dfbe5f3

First steps in integrating the Explorer more with the main interface

28abb42

First steps in bringing back sorting

70d00b1

More sorting stuff

e937362

Fix and simplify sorting, control box styling

11eaaf9

Style and fix annotation field editor, enable config settings for CSS

c33fd72

Fix annotation saving, improve CSS inclusions

00993ed

Make sure annotations are kept in NDJSON and CSV, change custom fields functionality, start Instagram template

7149a6d

Improve Instagram template, add location fields to Instagram search

93460ba

Simplify template settings, add Twitter and Instagram template

6929986

sal-uva requested a review from dale-wahl April 23, 2024 14:59

sal-uva added 3 commits April 23, 2024 17:57

Merge remote-tracking branch 'origin/master' into explorer-improvements

7d81a79

Remove prints

08735b7

Don't prepend 'annotations'

759b36a

dale-wahl reviewed Apr 24, 2024

Don't commafy post body in generic Explorer template

0cf2ccd

sal-uva added 4 commits April 24, 2024 11:52

Leftover string in config definition

bc386b3

No user input needed for 9GAG

7bf4ed8

Remove old files

4c66f41

Rudimentary TikTok template

6744736

sal-uva added 12 commits April 29, 2024 17:35

Make invalid fields have a red border

afa2d3a

Make sure that annotation fields do not use existing column names in front- and back-end

1716fe4

Get rid of non-necessary code and libraries

bdfa308

Move save annotation functions to dataset.py

a7f7375

Merge branch 'explorer-improvements' of https://github.com/digitalmethodsinitiative/4cat into explorer-improvements # Conflicts: # webtool/views/views_explorer.py

6715c7e

Add 'social_mediafy' template filter to add links to URLs, hashtags, and @-mentions

fcad68b

Improve social_mediafy regexes and add to templates

7a7af83

Allow reverse-sorting by dataset order

e1dc2f2

Add animations

19b4639

There needs to be something to save!

67a8298

Add coauhtors to instagram map_item and add to Explorer template

5a2862b

Better social_mediafy regexes, implement per platform

67f0746
