Explain when (not) to use OpenRefine #103

Open
bencomp opened this issue Feb 21, 2022 · 4 comments
Labels
help wanted Looking for Contributors type:clarification Suggest change for make lesson clearer type:discussion Discussion or feedback about the lesson type:teaching example PR showing how lesson was modified in a workshop

Comments

bencomp commented Feb 21, 2022

I have taught the OpenRefine lesson a few times; most recently today. Even though I always try to explain when you could choose OpenRefine for a problem, and how to compare OpenRefine to spreadsheets and writing a script, students keep asking for more explanation and comparisons.
In our workshop, the OpenRefine lesson sits between Data organisation in spreadsheets and Introduction to R, and that is also how I tried to frame OpenRefine: it shows your data like a spreadsheet application, but it has the power of a programming environment.

Seeing how I keep struggling to explain it well, even with years of experience with OR, we should probably improve the lesson materials.

Helpers suggested that it might help to refer back, later in the lesson, to my framing of OR as sitting between spreadsheets and programming. For that to work, the introduction episode should provide more context first.

@bencomp bencomp added help wanted Looking for Contributors type:clarification Suggest change for make lesson clearer type:discussion Discussion or feedback about the lesson type:teaching example PR showing how lesson was modified in a workshop labels Feb 21, 2022

bencomp commented Feb 21, 2022

I do realise now that this has been mentioned in part in #79 and also relates to #56 and #38.

bencomp commented Jun 23, 2022

I think we should look at the Library Carpentry lesson on OpenRefine for clearer use cases in the introduction episode: splitting data elements into different columns, normalising date formats and maybe matching/enhancing. These use cases would replace the Motivations section, which (I feel) is currently written for potential instructors.

Let's replace the Features and Getting help sections with "How is OR different from spreadsheet applications?" and "When would you write a script instead of using OR?".

Spreadsheets

  • OR is not meant for creating data and doesn't handle cell colours, formulae, comments and the like
  • undoing and redoing actions is easier in OR, and so is applying the same actions to different files
  • find and replace in a spreadsheet works on everything or on one thing at a time; OR lets you select rows and works on one column at a time
  • OR has clustering, so you don't need to know in advance which variations exist; in a spreadsheet you do
  • you can load spreadsheets into OR, but the data need to be tabular
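
To make the clustering point concrete for learners, here is a minimal Python sketch of key-collision clustering. It is a simplified approximation, not OpenRefine's actual implementation: the `fingerprint` and `cluster` helpers and the sample place names are made up for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified key: trim, lowercase, drop ASCII punctuation, sort unique tokens."""
    cleaned = value.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a key; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample column with inconsistent spellings.
places = ["New York", "new york ", "York, New", "Boston", "boston.", "Chicago"]
clusters = cluster(places)
for key, variants in clusters.items():
    print(key, "->", variants)
```

The point for the lesson is that the variants are discovered by the tool; nobody has to list them up front, which is what find and replace in a spreadsheet would require.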

Scripts

  • OR is better suited to exploratory cleaning; scripts are more useful when you already know what to fix
  • use a script when you have too much data for OR
  • use a script when OR cannot do what you want, such as fixing data with a machine-learning model
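
As an illustration of the "you already know what to fix" case, a script for normalising dates could look roughly like the sketch below. It assumes the set of date formats is already known (for example after exploring the data in OR); `KNOWN_FORMATS` and `normalise_date` are hypothetical names, and the sample values are made up.

```python
from datetime import datetime

# Assumption: these are the only formats present in the column.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


raw = ["2022-02-21", "21/02/2022", " 21.02.2022", "Feb 2022"]
cleaned = [normalise_date(value) for value in raw]
print(cleaned)
```

Values that match none of the known formats come back as `None`, so they can be inspected separately instead of being silently changed.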

bencomp commented Oct 13, 2022

From #37:

  • Change learning objective "describe use cases" to "understand how OpenRefine compares to spreadsheet apps and scripting".

bencomp commented Oct 25, 2022

Perhaps it's also useful to distinguish OR from using SQL with a relational database. SQL also allows you to select rows and create derived columns. The cross function lets you join data from different projects, much like JOIN in SQL. (cross is not currently part of the lesson, but I have used it myself.)
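
For comparison, the SQL side could be shown with a small sketch like this one, using Python's built-in `sqlite3` module. The tables, column names and values are invented for illustration; the query combines row selection (`WHERE`), a derived column and a `JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE households (id INTEGER, village TEXT, members INTEGER, rooms INTEGER);
CREATE TABLE villages (name TEXT, district TEXT);
INSERT INTO households VALUES (1, 'Alpha', 3, 1), (2, 'Beta', 8, 2), (3, 'Alpha', 6, 3);
INSERT INTO villages VALUES ('Alpha', 'North'), ('Beta', 'South');
""")

rows = conn.execute("""
SELECT h.id,
       CAST(h.members AS REAL) / h.rooms AS members_per_room,  -- derived column
       v.district                                              -- looked up via JOIN
FROM households AS h
JOIN villages AS v ON v.name = h.village
WHERE h.members >= 5                                           -- row selection
ORDER BY h.id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

In OR terms, the `WHERE` clause roughly corresponds to filtering or faceting rows, the derived column to adding a column based on an existing one, and the `JOIN` to what cross does across projects.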
