Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intermediate - migrate beyond tutorials google sheet to v3 format #20855

Merged
merged 8 commits into from Feb 26, 2018

Conversation

sureshc
Copy link
Contributor

@sureshc sureshc commented Feb 23, 2018

First Intermediate Step to migrate Beyond Tutorials Sheet

https://trello.com/c/WgUqM812/1-upgrade-data-gsheets-to-v3-gsheets

Currently three Pegasus Google Sheets are imported using a legacy rake seed task. Add configuration to import one of these legacy sheets - HocBeyondTutorials - in the newer v3 format where column name suffixes indicate datatype with a new name (cdo_beyond_tutorials) as an intermediate step. A future Pull Request will modify the Pegasus view/controller logic to present data from the new sheet / table and will archive the old sheet / table.

… suffixes indicate type, using new option to strip column data type suffixes
@caleybrock
Copy link
Contributor

👀

@caleybrock
Copy link
Contributor

screen shot 2018-02-23 at 9 48 57 am

Test error definitely looks related.

@@ -28,7 +28,8 @@ call,call (a function),This is the piece of code that you add to a program to in
click,click,Press the mouse button.,,,,,,,,,,,,
code,code,The language that programmers create and use to tell a computer what to do.,Course 1: Stage 2,Course 1: Stage 2,,,,,,,,,,
command,command,An instruction for the computer. Many commands put together make up algorithms and computer programs.,Course 1: Stage 2,Course 1: Stage 2,,,,,,,,,,
computational thinking,computational thinking,"Mental processes and strategies that include: decomposition, pattern matching, abstraction, algorithms (decomposing problems into smaller, more manageable problems, finding repeating patterns, abstracting specific differences to make one solution work for multiple problems, and creating step-by-step algorithms).",Course 3: Stage 1,Course 3: Stage 1,,,,,,,,,,
computational thinking,computational thinking,"Modifying a problem in such a way that it can be modeled or solved using a computer or machine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't have changes to these csv's in PR, they'll get overwritten by gsheet changes once merged. Let's not add them to commits.

@sureshc
Copy link
Contributor Author

sureshc commented Feb 23, 2018

TODO: add screen shots of spreadsheet and Pegasus UI, include deployment and cutover plan, resolve test that is failing in Circle

@sureshc
Copy link
Contributor Author

sureshc commented Feb 23, 2018

OUTDATED - see latest comments

Cutover Deployment Plan

We don't have a separate Test Google Drive environment, so this change was tested locally by temporarily configuring the beyond_tutorials.gsheet file to point to a Test folder in the code.org Google Drive Pegasus folder. Here is the plan for committing and deploying this change to production:

10AM PST

  1. Move GoogleDrive://Pegasus/Data/HocBeyondTutorials to GoogleDrive://Pegasus/v3 and rename to beyond_tutorials
  2. Copy new column headers with datatype suffixes from Test Sheet (GoogleDrive://Pegasus/TestSuresh/beyond_tutorials) and paste into new production sheet (GoogleDrive://Pegasus/v3/beyond_tutorials)
  3. Backup and Delete test sheet - GoogleDrive://Pegasus/TestSuresh/beyond_tutorials

10:30AM PST

  1. Commit update to /pegasus/data/beyond_tutorials.gsheet to point path to /v3/beyond_tutorials instead of test sheet
  2. Merge this branch to staging branchso that within 5 minutes bin/cron/import_google_sheets will download the new beyond_tutorials sheet in the v3 format and covert to CSV on the staging environment

12:00 PM PST

Daily bin/cron/commit_content job on staging commits pegasus/data/beyond_tutorials.csv that was downloaded and converted to CSV using the new import logic

@sureshc
Copy link
Contributor Author

sureshc commented Feb 23, 2018

@caleybrock I'm developer of the day on Monday, so that will be good timing to merge this change and carefully carry out the cutover.

@sureshc sureshc changed the title [Work In Progress] migrate beyond tutorials google sheet to v3 format migrate beyond tutorials google sheet to v3 format Feb 23, 2018
@caleybrock
Copy link
Contributor

Thanks for writing up the deployment plan. I'm a bit concerned that we might get an error when we delete the current gsheet, but hopefully it's resolved quickly.
I also think we should add to the deployment plan that we should do an out of bounds content scoop of that gsheet to make sure we deploy all the changes together.

@breville
Copy link
Member

From memory this gsheet only drives one page. Would it keep things simpler to simply update that page to use the v3 column naming?

@breville
Copy link
Member

breville commented Feb 25, 2018

I haven't fully reasoned through the cutover plan, but it's quite common during deploys that we have new frontend files running against an older database for a while on our machines. (I think? @caleybrock knows this well.)

Would it make sense to do a cutover by introducing a new v3 gsheet that puts data into a new table alongside the legacy one, and actually deploying that a day earlier to make sure that the database contains the necessary data, and then updating the view? Then when we deploy, we start serving the new view as the frontends are updated, but fortunately the data is already ready?

@caleybrock
Copy link
Contributor

I think Brendan's suggestion would work. Duplicating the data temporarily doesn't seem bad, and then they are all imported consistently.

One of the benefits Suresh mentioned about this approach is that the database doesn't have column names with types in the name. Although there may be something to be said about consistency of all the gsheets? That is why we are trying to update them.

Does either approach make the next upgrade of HocTutorials simpler?

@breville
Copy link
Member

I'm somewhat conservative when it comes to changing this kind of foundational code, especially when it doesn't have unit tests, and especially when a change introduces multiple code paths which remain untested. If we want to add more test coverage here, I'd feel better about the multiple code paths, but I feel like the facility for having columns both with and without the suffix is kind of a band aid solution that we wouldn't need if we were willing to change the consumers of the data (or, as a thought experiment, we'd have written to consume the existing v3 naming convention in the first place). (Could this new approach also lead to some difficult situations with columns that are named identically apart from their suffix?)

Certainly there will be more columns in the HocTutorials migration, and dealing with that at the endpoint of consumption would involve changes to the TutorialExplorer JS code. However, we could also put a small transformation function in https://github.com/code-dot-org/code-dot-org/blob/staging/pegasus/src/database.rb which would convert from the native v3 format to the format used by the TutorialExplorer, localising code changes to a small function that could be easily verified.

I'm also fairly conservative about changes to the live site, because there are a variety of deployment and caching timing issues. Just as we have often built new version of pages by building a new page alongside the old one, I'm in favour of building out this new data source alongside the old one and making the transition once we're confident we can do an instant flip (and safely back again if anything goes wrong).

@sureshc
Copy link
Contributor Author

sureshc commented Feb 26, 2018

I was thinking of two benefits to optionally dropping the datatype suffixes during import:

  • eliminate the need to update controller and view logic that consumes this data

  • provide the option to not use datatype suffixes in database column names, because they are redundant, and likely only used for type safety during the import process. At some point in the distant future, Google Sheets won't be used as the system of record for this type of data and we'll want to stop using datatype suffixes in column / variable names (although in the olden days type indicators were common).

Given how important the CSV-to-database logic is and that there aren't any unit tests on this logic, the additional code path to optionally drop the datatype suffix is too risky. I'll create the new table in v3 format as an intermediate step.

@sureshc sureshc changed the title migrate beyond tutorials google sheet to v3 format intermediate - migrate beyond tutorials google sheet to v3 format Feb 26, 2018
@sureshc sureshc removed the request for review from joshlory February 26, 2018 19:25
@caleybrock
Copy link
Contributor

Discussed offline: moving the ghseet to /v3 with proper naming convention.

@@ -0,0 +1,2 @@
TestSuresh/cdo-beyond-tutorials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestSuresh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the new sheet in a temporary folder to ensure content editors would not try to use it. It's now in the correct location (v3).

@sureshc sureshc merged commit a60b749 into staging Feb 26, 2018
@sureshc sureshc deleted the migrate-beyondtutorials-googlesheet-to-v3 branch July 23, 2018 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants