
Do not modify input astropy table when converting to pandas DataFrame (fixes #268) #273

Open
zonca wants to merge 1 commit into datacarpentry:main from zonca:fix-268-no-modify-input-table

@zonca zonca commented May 18, 2026

Summary

Fixes #268

The step-by-step walkthrough in episode 03 added GD-1 coordinate columns (phi1, phi2, pm_phi1, pm_phi2) directly to the input astropy Table (polygon_results) before calling to_pandas(). This modified the input table, which is bad practice: a function should not silently modify its input unless that is its explicit purpose.

The columns were originally added to the table as a workaround for a pandas 1.3.0 bug (#74) in which adding Quantity columns to a DataFrame caused describe() to fail. Using .value to extract plain numerical values (without units) when adding columns to the DataFrame avoids this bug entirely.

Changes

episodes/03-transform.md

  • Removed the step that added GD-1 columns to the astropy table (polygon_results['phi1'] = skycoord_gd1.phi1)
  • Reordered so to_pandas() is called first (producing a 6-column DataFrame), then GD-1 columns are added with .value (producing the final 10-column DataFrame)
  • Updated expected outputs: shape now shows (140339, 6) before GD-1 columns and (140339, 10) after
  • Added a "Why .value?" callout explaining why we use .value to strip units
  • Moved the "Pandas DataFrames versus Astropy Tables" and proper_motion callouts to flow naturally with the restructured section

student_download/episode_functions.py

  • Updated make_dataframe() docstring to match the episode (the function code was already correct, since it already used .value)

Testing

I executed all 30 Python code blocks from the updated episode 03 sequentially and confirmed they all pass (1 block containing %matplotlib inline is a Jupyter magic and was skipped as expected).

A test script that extracts and runs all code blocks with additional verification asserts is available as a public gist:

https://gist.github.com/zonca/71774b5e3cd00480558dfc3825c6ed2b
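For readers who want a feel for this kind of check without opening the gist, here is a minimal sketch of one way to run a Markdown file's Python blocks in order. It is not the linked script: the function names, the regular expression, and the assumption that blocks are fenced with a `python` info string are all illustrative.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block

# Matches a fenced block whose info string starts with "python" (an assumption
# about how the episode marks its code blocks).
BLOCK_RE = re.compile(
    rf"^{FENCE}python[^\n]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_python_blocks(markdown_text):
    """Return the source of every fenced python block, in document order."""
    return [match.group(1) for match in BLOCK_RE.finditer(markdown_text)]


def is_jupyter_magic(block):
    """True if any line starts with '%', which plain Python cannot execute."""
    return any(line.lstrip().startswith("%") for line in block.splitlines())


def run_blocks(markdown_text):
    """Execute blocks sequentially in one shared namespace, skipping magics."""
    namespace = {}
    ran, skipped = 0, 0
    for block in extract_python_blocks(markdown_text):
        if is_jupyter_magic(block):
            skipped += 1
            continue
        exec(block, namespace)  # any exception here fails the whole run
        ran += 1
    return namespace, ran, skipped


# Tiny hypothetical document with three blocks, one of them a Jupyter magic.
sample = "\n".join([
    "Some prose.",
    FENCE + "python", "x = 2", FENCE,
    FENCE + "python", "%matplotlib inline", FENCE,
    FENCE + "python", "y = x * 21", FENCE,
])
namespace, ran, skipped = run_blocks(sample)
print(ran, skipped, namespace["y"])
```

Sharing one namespace across blocks matters here, since later blocks in a lesson typically rely on variables defined by earlier ones.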

The assert statements in the test script (clearly marked "ASSERT: not part of the lesson") verify:

  • All code blocks execute without errors
  • polygon_results is NOT modified after the full episode flow (only has original 6 columns)
  • results_df has 10 columns with float dtype (not Quantity)
  • make_dataframe() produces identical results to the step-by-step approach
  • describe() works correctly (no pandas Quantity bug)

Fixes datacarpentry#268

Previously the step-by-step walkthrough in episode 03 added GD-1
coordinate columns (phi1, phi2, pm_phi1, pm_phi2) directly to the
input astropy Table before calling to_pandas(). This modified the
function's input, which is bad practice.

Now the episode converts to a pandas DataFrame first, then adds
the GD-1 columns using .value to extract plain numerical values
without units. The make_dataframe() function already used this
approach and is unchanged except for a docstring update.

Tested by executing all 30 Python code blocks from the updated
episode sequentially — all pass (1 Jupyter magic skipped).
See test script: https://gist.github.com/zonca/71774b5e3cd00480558dfc3825c6ed2b
@github-actions

🆗 Pre-flight checks passed 😃

This pull request has been checked and contains no modified workflow files, spoofing, or invalid commits.

It should be safe to Approve and Run the workflows that need maintainer approval.

Development

Successfully merging this pull request may close these issues.

Do not modify the input astropy table when converting to pandas dataframe
