03. Assignment: Clean dataset
I was given an assignment to clean part of a 'dirty' dataset.
The dataset came from a survey that was filled in by students.
The survey asked students for their hair color in hex code format. Since not every student answered the question in the right format, the data had to be cleaned.
To load the CSV file into a JavaScript project, I used the JavaScript library D3. Because I only needed one part of D3, I imported just its csv function with JavaScript's import
statement like this:
import { csv } from 'd3';
Next, I wanted to make sure that the survey's CSV data was loaded before doing anything with it.
D3's csv() function loads the file asynchronously and returns a JavaScript Promise, so the code that uses the data has to go inside .then():
csv("../data/enquete.csv")
.then(data => {
});
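Why the callback matters can be shown without a real file. The sketch below assumes a hypothetical fakeCsv() stand-in for D3's csv() that resolves with row objects, one property per column. Code written after the call runs before the data arrives, so anything that needs the rows belongs inside .then().

```javascript
// Hypothetical stand-in for d3's csv(): returns a Promise that resolves
// with an array of row objects keyed by column name (path is ignored here).
function fakeCsv(path) {
  return Promise.resolve([
    { "Kleur haar (HEX code)": "#4b3621" },
    { "Kleur haar (HEX code)": "" },
  ]);
}

let rows; // filled in by the callback below

fakeCsv("../data/enquete.csv").then(data => {
  rows = data; // only inside this callback is the data guaranteed to be there
  console.log("loaded", data.length, "rows");
});

// This line runs first, before the .then callback, so rows is still undefined.
console.log("rows right after the call:", rows);
```

Running it prints the "rows right after the call: undefined" line before "loaded 2 rows".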
I wanted to clean the data step by step, so I created a variable for every step, each one holding the result of a single operation, until all the data was clean. For example:
csv("../data/enquete.csv")
.then(data => {
let items = [];
for(let i = 0; i < data.length; i++) {
items.push(data[i]["Kleur haar (HEX code)"].toUpperCase());
}
let filterUndefined = items.filter(item => item !== '' && item !== "0");
});
The complete code looked like this:
csv("../data/enquete.csv")
.then(data => {
let items = [];
for(let i = 0; i < data.length; i++) {
items.push(data[i]["Kleur haar (HEX code)"].toUpperCase());
}
//remove empty answers and answers equal to "0"
let filterUndefined = items.filter(item => item !== '' && item !== "0");
//regex for testing values that start with a hashtag
let regExHashtag = /^#.*/;
//regex for testing values that do not start with a hashtag
let regExNoHashtag = /^(?!#).*/;
//keep the values that already start with a hashtag
let filterHashtag = filterUndefined.filter(item => item.match(regExHashtag));
//keep the values that do not start with a hashtag
let filterNoHashtag = filterUndefined.filter(item => item.match(regExNoHashtag));
//filter out color names that are not hexcodes
let filterKleurNaam = filterNoHashtag.filter(item => !item.includes("BLOND") && !item.includes("BRUIN"));
//add # to the items without a hashtag
let listWithHashtag = filterKleurNaam.map(item => "#" + item);
//make one array of the items that already had a hashtag and the ones that just got one
let cleanList = filterHashtag.concat(listWithHashtag);
});
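To see what this pipeline produces, the same steps can be run on a small array of hypothetical answers (made up for illustration, not taken from the survey), starting from the uppercased values that items would hold. The sketch uses shortened variable names of its own. Note the order of the result: the answers that already had a # come first because of the concat at the end, so the cleaned list no longer follows the order of the survey.

```javascript
// Hypothetical answers, already uppercased as in the items array.
const items = ["A52A2A", "#4B3621", "", "0", "BLOND", "#B55239", "BRUIN"];

// Same steps as the complete code above.
const answered = items.filter(item => item !== "" && item !== "0");
const withHashtag = answered.filter(item => /^#/.test(item));
const withoutHashtag = answered.filter(item => !/^#/.test(item));
const hexWithoutHashtag = withoutHashtag.filter(
  item => !item.includes("BLOND") && !item.includes("BRUIN")
);
const cleanList = withHashtag.concat(hexWithoutHashtag.map(item => "#" + item));

console.log(cleanList); // [ '#4B3621', '#B55239', '#A52A2A' ]
```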
After I showed my assignment to the teacher, I received the following feedback: "You need to make the code functional".
So I started refactoring my code to follow functional programming principles: each step became a small function that takes an array and returns a new one, without changing any variables outside the function. Refactored in a functional style, the same code looks like this:
import { csv } from 'd3';
csv("../data/enquete.csv")
.then(data => makeArray(data))
.then(data => filterUndefined(data))
.then(data => filterKleurNaam(data))
.then(data => console.log(addHashtag(data)));
function makeArray(items) {
return items.map(item => item["Kleur haar (HEX code)"].toUpperCase());
}
function filterUndefined(items) {
return items.filter(item => item !== '' && item !== "0");
}
function filterKleurNaam(items) {
return items.filter(item => !item.includes("BLOND") && !item.includes("BRUIN"));
}
function addHashtag(items) {
return items.map(item => item[0] !== "#" ? "#" + item : item);
}
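This function takes the rows received from the dataset, reads the value of the column "Kleur haar (HEX code)" (Dutch for "hair color") from each row, converts it to uppercase, and returns these values in a new array using the .map() method. Uppercasing makes the later checks case-insensitive, so "blond" and "Blond" both match "BLOND".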
function makeArray(items) {
return items.map(item => item["Kleur haar (HEX code)"].toUpperCase());
}
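A quick check with two hypothetical rows, shaped like the objects D3 produces for each CSV line (one property per column). Because .map() builds a new array, the input rows are left unchanged, which is what makes this a pure function.

```javascript
function makeArray(items) {
  return items.map(item => item["Kleur haar (HEX code)"].toUpperCase());
}

// Hypothetical rows; real rows would also hold the survey's other columns.
const rows = [
  { "Kleur haar (HEX code)": "#4b3621", "Other column": "x" },
  { "Kleur haar (HEX code)": "blond", "Other column": "y" },
];

console.log(makeArray(rows)); // [ '#4B3621', 'BLOND' ]
console.log(rows[0]["Kleur haar (HEX code)"]); // '#4b3621' (input not modified)
```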
This function removes empty answers (empty strings) and answers equal to "0". Despite its name, it does not check for the value undefined itself.
function filterUndefined(items) {
return items.filter(item => item !== '' && item !== "0");
}
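A small check with hypothetical answers. D3 parses every CSV value as a string, which is why the comparison is with the string "0" and not the number 0.

```javascript
function filterUndefined(items) {
  return items.filter(item => item !== '' && item !== "0");
}

// Hypothetical uppercased answers: two unanswered, one "0", two real values.
const answers = ["#4B3621", "", "0", "A52A2A", ""];

console.log(filterUndefined(answers)); // [ '#4B3621', 'A52A2A' ]
```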
This function removes answers that were written as color names instead of hex codes, namely values containing "BLOND" or "BRUIN" (Dutch for "brown").
function filterKleurNaam(items) {
return items.filter(item => !item.includes("BLOND") && !item.includes("BRUIN"));
}
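filterKleurNaam only removes the two color names it checks for, so any other free-text answer (for example a hypothetical "ZWART", Dutch for black) still passes. A stricter alternative, sketched here as a hypothetical filterHexCodes, keeps only values that look like a six-digit hex code with or without a leading #. This regex check is a swapped-in technique, not part of the original assignment.

```javascript
function filterKleurNaam(items) {
  return items.filter(item => !item.includes("BLOND") && !item.includes("BRUIN"));
}

// Hypothetical stricter filter: an optional "#" followed by exactly six hex digits.
// Assumes the input is already uppercased (makeArray does that), so A-F suffices.
function filterHexCodes(items) {
  return items.filter(item => /^#?[0-9A-F]{6}$/.test(item));
}

// Hypothetical uppercased answers.
const answers = ["#4B3621", "BLOND", "A52A2A", "ZWART", "DONKERBRUIN"];

console.log(filterKleurNaam(answers)); // [ '#4B3621', 'A52A2A', 'ZWART' ]
console.log(filterHexCodes(answers)); // [ '#4B3621', 'A52A2A' ]
```

The stricter pattern also rejects three-digit shorthand such as #FFF; if those answers should count, the pattern would need a {3} alternative as well.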
This function adds a hashtag (#) to the hex codes that did not start with one.
function addHashtag(items) {
return items.map(item => item[0] !== "#" ? "#" + item : item);
}
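A quick check with hypothetical values. Since .map() handles each element where it stands, the survey order is preserved here, unlike an approach that splits the list into "with #" and "without #" and concatenates the parts.

```javascript
function addHashtag(items) {
  return items.map(item => item[0] !== "#" ? "#" + item : item);
}

// Hypothetical cleaned answers in survey order, with and without "#".
const hexCodes = ["A52A2A", "#4B3621", "B55239"];

console.log(addHashtag(hexCodes)); // [ '#A52A2A', '#4B3621', '#B55239' ]
```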
To make sure all the functions get executed in the right order, I used promise chaining, where each .then() receives the value returned by the previous step:
csv("../data/enquete.csv")
.then(data => makeArray(data))
.then(data => filterUndefined(data))
.then(data => filterKleurNaam(data))
.then(data => addHashtag(data));
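Once the file is loaded, the four steps are ordinary synchronous functions, so they can also be composed into one function and passed to a single .then(). A minimal sketch with a hypothetical pipe helper (built on Array.prototype.reduce, not part of D3), run on hypothetical rows instead of the CSV file:

```javascript
function makeArray(items) {
  return items.map(item => item["Kleur haar (HEX code)"].toUpperCase());
}
function filterUndefined(items) {
  return items.filter(item => item !== "" && item !== "0");
}
function filterKleurNaam(items) {
  return items.filter(item => !item.includes("BLOND") && !item.includes("BRUIN"));
}
function addHashtag(items) {
  return items.map(item => (item[0] !== "#" ? "#" + item : item));
}

// Hypothetical helper: returns a function that passes its input through
// each function from left to right, in the same order as the .then() chain.
const pipe = (...fns) => input => fns.reduce((value, fn) => fn(value), input);

const cleanHairColors = pipe(makeArray, filterUndefined, filterKleurNaam, addHashtag);

// Hypothetical rows; with D3 this could be csv("../data/enquete.csv").then(cleanHairColors)
const rows = [
  { "Kleur haar (HEX code)": "a52a2a" },
  { "Kleur haar (HEX code)": "" },
  { "Kleur haar (HEX code)": "blond" },
  { "Kleur haar (HEX code)": "#4b3621" },
];

console.log(cleanHairColors(rows)); // [ '#A52A2A', '#4B3621' ]
```

Composing the steps this way keeps the cleaning logic testable on plain arrays, with the Promise only needed for loading the file.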