Difference between data cleaning and data organization? #56

Open
maneesha opened this issue Jan 19, 2020 · 8 comments
Labels
help wanted (Looking for Contributors), type:clarification (Suggest change for make lesson clearer), type:discussion (Discussion or feedback about the lesson)

Comments

@maneesha
Contributor

The Introduction episode notes that one objective is:

  • Differentiate data cleaning from data organization.

However, this is not covered in the Introduction episode.

@brownsarahm added the "high priority" (Need to be addressed ASAP) and "help wanted" (Looking for Contributors) labels and removed the "high priority" label on Aug 6, 2020
@brownsarahm
Contributor

Thanks @maneesha this is a good point that should be addressed.

@BhaswatiRoy

Hello,
I would like to work on this issue.
It would be great if I could be assigned.
Thanks

@bencomp
Contributor

bencomp commented Jan 11, 2023

Thanks for your offer, @BhaswatiRoy! Your contribution to this lesson would be much appreciated. I don't think anyone else is currently working on this, so feel free to start. I can still assign the issue to you.

Perhaps you can share your ideas for change(s) here first, so that we can help you go in the right direction with a PR? If you need any help, please let us know.

@BhaswatiRoy

Hello @bencomp
After looking through the given link, I believe the "Motivations for the OpenRefine Lesson" section gives insights into the issue.
So the detailed differences between data cleaning & organization can be added under this section.
Or maybe a separate section can be created for explaining in depth.

@bencomp
Contributor

bencomp commented Jan 18, 2023

I think that section needs an update anyway (I mentioned it in a comment in issue #103 as well), so maybe it would be good to add a new short section explaining the differences. Do you have an idea of what you would add? I feel it does not need to be very detailed, because understanding this difference is probably not necessary to follow the rest of the lesson.

@BhaswatiRoy

We can add the meaning of both terms along with a short example to explain how both of them actually work.

@bencomp
Contributor

bencomp commented Jan 24, 2023

That sounds good. I think the core of this issue is the question of what the actual differences are, so any suggestions for what to write are especially welcome.

@bencomp
Contributor

bencomp commented May 31, 2023

Let me make some suggestions for what data organisation and data cleaning entail. This is up for discussion.

Data organisation:

  • combining data from multiple files
  • reordering columns (supported! see Introduce menu for column reordering #83)
  • moving table contents from a random position in a spreadsheet (not starting in cell A1) to the top left of the spreadsheet
  • splitting or combining values into or from multiple columns
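
As a rough illustration of the organisation steps listed above, here is a plain Python sketch (not OpenRefine code). The sample rows, the column names and the helper names `combine`, `split_column` and `reorder_columns` are all made up for this example.

```python
# Hypothetical rows, standing in for two files that share the same columns.
file_a = [{"name": "Ada Lovelace", "year": "1843"}]
file_b = [{"name": "Alan Turing", "year": "1936"}]


def combine(*tables):
    """Combine the rows of several tables into one list of rows."""
    return [dict(row) for table in tables for row in table]


def split_column(rows, column, new_columns, sep=" "):
    """Split the values of one column into several new columns."""
    result = []
    for row in rows:
        # Split at most len(new_columns) - 1 times so extra separators
        # end up in the last new column.
        parts = row[column].split(sep, len(new_columns) - 1)
        new_row = {key: value for key, value in row.items() if key != column}
        new_row.update(zip(new_columns, parts))
        result.append(new_row)
    return result


def reorder_columns(rows, order):
    """Return the rows with their columns in the given order."""
    return [{key: row[key] for key in order} for row in rows]


combined = combine(file_a, file_b)
split = split_column(combined, "name", ["first_name", "last_name"])
organised = reorder_columns(split, ["last_name", "first_name", "year"])
print(organised)
```

None of these steps changes a value's meaning; they only change where the values sit in the table, which is roughly what "organisation" means in the list above.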

Data cleaning:

  • removing unwanted variations, like typos, word order differences, date formats (e.g. ".", "/" or "-" as separators)
  • normalising capitalisation
  • dealing with outliers
  • normalising missing values
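
To make some of the cleaning steps listed above concrete, here is a small plain Python sketch (not OpenRefine code). The sample rows, the set of "missing" markers and the function names are assumptions made up for this example. Typos and outliers are left out because they need judgement about the specific data.

```python
import re

# Hypothetical spellings of "no value"; adjust to the data at hand.
MISSING_MARKERS = {"", "n/a", "na", "none", "null", "-"}


def normalise_missing(value):
    """Map the different spellings of 'no value' to None."""
    stripped = value.strip()
    return None if stripped.lower() in MISSING_MARKERS else stripped


def normalise_capitalisation(value):
    """Capitalise the first letter of each word, lowercase the rest."""
    return value.title()


def normalise_date_separators(value):
    """Replace '.' or '/' with '-'; the order of the date parts is unchanged."""
    return re.sub(r"[./]", "-", value)


# Made-up rows with the kinds of variation mentioned in the list.
raw = [
    {"city": "  amsterdam", "date": "2020/01/19"},
    {"city": "UTRECHT", "date": "2020.01.19"},
    {"city": "N/A", "date": "2020-01-19"},
]

cleaned = []
for row in raw:
    city = normalise_missing(row["city"])
    cleaned.append({
        "city": normalise_capitalisation(city) if city is not None else None,
        "date": normalise_date_separators(row["date"]),
    })

print(cleaned)
```

Here the table layout stays the same and only the values change, which is roughly what "cleaning" means in the list above.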

Following these example definitions, which I really just made up, OpenRefine supports data organisation as well as data cleaning, so the two cannot be told apart by what the tool does. Maybe we should remove the objective? We can (or should) still add examples like these to better introduce what OpenRefine can and cannot do (as discussed in #86 and #103).
