Fix data processing error in tutorial #51

kklein · 2022-08-12T11:34:16Z

Rendered docs:
https://datajudge--51.org.readthedocs.build/en/51/

kklein · 2022-08-12T11:35:13Z

docs/source/examples/example_twitch.rst

  constraints. The ``VarCharRegex`` constraint compared the columns' values to a regular
  expression. The ``UniquesEquality`` constraint expected the unique values of the
  ``language`` column to not have changed between version 1 and version 2.
-* The failing ``KolmogorovSminrnov`` constraint tells us that we shouldn't assume the


This test only failed due to a bug in the data processing.

codecov · 2022-08-12T11:35:16Z

Codecov Report

Merging #51 (55821b2) into main (7a5d603) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #51   +/-   ##
=======================================
  Coverage   93.90%   93.90%           
=======================================
  Files          15       15           
  Lines        1607     1607           
=======================================
  Hits         1509     1509           
  Misses         98       98

| Impacted Files | Coverage Δ |
|---|---|
| src/datajudge/__init__.py | `77.77% <ø> (ø)` |


kklein · 2022-08-12T11:36:51Z

docs/source/examples/twitch_process.py

+    df_v2[fluctuating_column] = ((1 + change) * df_v1[fluctuating_column]).astype(int)
+
+# Make old version not have data about all channels from current version.
+df_v1 = df_v1.sample(frac=0.85, random_state=SEED)


Previously, this subsampling happened after `df_v1` was copied into `df_v2` but before the numeric perturbations.

Hence, `df_v2` only received updated values for the subsampled rows/indices. The remaining rows/indices were assigned NA values.
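
The effect described above can be reproduced with a small sketch. The toy frame, the `followers` column, the 10% change and `frac=0.5` are assumptions chosen for illustration and are not taken from `twitch_process.py`; only the order of operations mirrors the description.

```python
import pandas as pd

SEED = 0  # hypothetical seed, for reproducibility of the sketch only

# Toy stand-in for the tutorial data (assumed columns and values).
df_v1 = pd.DataFrame(
    {"channel": list("abcdef"), "followers": [10, 20, 30, 40, 50, 60]}
)
df_v2 = df_v1.copy()

# Buggy order: subsample the old version first, then perturb.
df_v1_sub = df_v1.sample(frac=0.5, random_state=SEED)
buggy = df_v2.copy()
# Column assignment aligns on the index, so rows of `buggy` that are
# missing from `df_v1_sub` end up as NaN.
buggy["followers"] = (1.1 * df_v1_sub["followers"]).astype(int)
n_missing_buggy = int(buggy["followers"].isna().sum())

# Fixed order: perturb using the full old version, subsample afterwards.
fixed = df_v2.copy()
fixed["followers"] = (1.1 * df_v1["followers"]).astype(int)
df_v1_final = df_v1.sample(frac=0.5, random_state=SEED)
n_missing_fixed = int(fixed["followers"].isna().sum())

print(n_missing_buggy, n_missing_fixed)
```

In the buggy order, half of the rows in the new version carry NA values; in the fixed order, none do.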

kklein · 2022-08-12T11:38:04Z

docs/source/examples/twitch_upload.py

-# Introduce a data error.
-index = (~df_v2["channel"].isin(df_v1["channel"])).idxmax()
-df_v2.loc[index, "language"] = "Sw3d1zh"
+df_v1 = pd.read_csv("twitch_version1.csv")


Dump the processed data to the file system so that the separate upload script can be run without running the processing oneself.
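
A minimal sketch of that hand-off between the two scripts, assuming a toy frame and a temporary directory. Only the file name `twitch_version1.csv` comes from the diff above; the columns and values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the processed data (assumed columns and values).
df_v1 = pd.DataFrame(
    {"channel": ["a", "b", "c"], "language": ["English", "Korean", "English"]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "twitch_version1.csv")

    # Processing side: write the frame without the index column.
    df_v1.to_csv(path, index=False)

    # Upload side: read the file back, independent of the processing step.
    restored = pd.read_csv(path)

roundtrip_ok = restored.equals(df_v1)
print(roundtrip_ok)
```

Writing with `index=False` keeps the pandas index out of the file, so the upload side sees only the data columns.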

kklein added 5 commits August 12, 2022 12:53

Adapt line length.

a560680

Fix phrasing.

ba09629

Replace broken htmlpreview link with simple reference to file.

bec8f04

Lower-case 'datajudge'.

bd7e4b5

Fix bug in data processing.

55821b2

kklein commented Aug 12, 2022


kklein changed the title ~~More docs cleanup~~ Fix data processing error in tutorial Aug 15, 2022

kklein requested a review from ivergara August 15, 2022 09:16

ivergara approved these changes Aug 15, 2022


kklein marked this pull request as ready for review August 15, 2022 09:30

kklein merged commit e0ea007 into main Aug 15, 2022

kklein deleted the more_docs_cleanup branch August 15, 2022 09:31
