Data-Preprocessing

Data pre-processing is a data mining technique that transforms raw data into an understandable format. Ever more data is available for research and analysis, but real-world data is often incomplete or inconsistent and therefore not ready to be used directly. Common problems include data spread across multiple spreadsheets, missing values, typos, numbers stored as text and unnecessary columns. Data without adequate preparation will deliver poor or misleading findings, a point captured by the pithy data science phrase ‘GIGO’, which stands for ‘Garbage In, Garbage Out’.
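To make these problems concrete, here is a minimal sketch of one clean-up pass. It assumes two small tables that share an `id` column. The data, the column names and the `read_rows` and `preprocess` helpers are all invented for illustration and are not taken from the course files. The sketch uses only the Python standard library so that it runs without extra installs; the course notebook may take a different approach.

```python
import csv
import io
from statistics import median

# Hypothetical toy data, invented for illustration (not the course datasets).
FEATURES_CSV = """id,name,age,fare,notes
1,Alice,29,7.25,n/a
2,Bob,,71.28,n/a
3,Carol,41,8.05,n/a
"""
SURVIVAL_CSV = """id,survived
1,yes
2,No
3,YES
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(features_text, survival_text):
    """Merge the two tables and tidy the values (illustrative only)."""
    features = read_rows(features_text)
    # Index the second table by id so the two sources can be joined.
    survival = {row["id"]: row["survived"] for row in read_rows(survival_text)}

    # Median of the ages that are present, used to fill the gaps.
    known_ages = [int(r["age"]) for r in features if r["age"].strip()]
    fill_age = median(known_ages)

    cleaned = []
    for row in features:
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"],
            # Missing value: fall back to the median age.
            "age": int(row["age"]) if row["age"].strip() else fill_age,
            # Number stored as text: convert to a float.
            "fare": float(row["fare"]),
            # Inconsistent spellings ("yes", "No", "YES") become a boolean.
            "survived": survival[row["id"]].strip().lower() == "yes",
            # The unnecessary "notes" column is dropped by not copying it.
        })
    return cleaned


rows = preprocess(FEATURES_CSV, SURVIVAL_CSV)
for row in rows:
    print(row)
```

Filling the missing age with the median is only one possible choice. Depending on the analysis, dropping the row or flagging the value as missing may be more appropriate.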

Binder Link

Click this link to launch a virtual environment in your browser. This allows you to code along without downloading or installing anything on your computer! However, the virtual environment may not work well if lots of people are trying to use it at the same time.

Further Information

To access learning materials from the wider Computational Social Science training series: [Training Materials]
To keep up to date with upcoming and past training events: [Events]
To get in contact with feedback, ideas or to seek assistance: [Help]

Thank you and good luck on your journey exploring new forms of data!
