New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
intermediate - migrate beyond tutorials google sheet to v3 format #20855
Conversation
… suffixes indicate type, using new option to strip column data type suffixes
👀 |
pegasus/data/cdo-glossary-csf.csv
Outdated
@@ -28,7 +28,8 @@ call,call (a function),This is the piece of code that you add to a program to in | |||
click,click,Press the mouse button.,,,,,,,,,,,, | |||
code,code,The language that programmers create and use to tell a computer what to do.,Course 1: Stage 2,Course 1: Stage 2,,,,,,,,,, | |||
command,command,An instruction for the computer. Many commands put together make up algorithms and computer programs.,Course 1: Stage 2,Course 1: Stage 2,,,,,,,,,, | |||
computational thinking,computational thinking,"Mental processes and strategies that include: decomposition, pattern matching, abstraction, algorithms (decomposing problems into smaller, more manageable problems, finding repeating patterns, abstracting specific differences to make one solution work for multiple problems, and creating step-by-step algorithms).",Course 3: Stage 1,Course 3: Stage 1,,,,,,,,,, | |||
computational thinking,computational thinking,"Modifying a problem in such a way that it can be modeled or solved using a computer or machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't have changes to these csv's in PR, they'll get overwritten by gsheet changes once merged. Let's not add them to commits.
TODO: add screen shots of spreadsheet and Pegasus UI, include deployment and cutover plan, resolve test that is failing in Circle |
OUTDATED - see latest comments Cutover Deployment PlanWe don't have a separate Test Google Drive environment, so this change was tested locally by temporarily configuring the beyond_tutorials.gsheet file to point to a Test folder in the code.org Google Drive Pegasus folder. Here is the plan for committing and deploying this change to production: 10AM PST
10:30AM PST
12:00 PM PSTDaily |
@caleybrock I'm developer of the day on Monday, so that will be good timing to merge this change and carefully carry out the cutover. |
Thanks for writing up the deployment plan. I'm a bit concerned that we might get an error when we delete the current gsheet, but hopefully it's resolved quickly. |
From memory this gsheet only drives one page. Would it keep things simpler to simply update that page to use the v3 column naming? |
I haven't fully reasoned through the cutover plan, but it's quite common during deploys that we have new frontend files running against an older database for a while on our machines. (I think? @caleybrock knows this well.) Would it make sense to do a cutover by introducing a new v3 gsheet that puts data into a new table alongside the legacy one, and actually deploying that a day earlier to make sure that the database contains the necessary data, and then updating the view? Then when we deploy, we start serving the new view as the frontends are updated, but fortunately the data is already ready? |
I think Brendan's suggestion would work. Duplicating the data temporarily doesn't seem bad, and then they are all imported consistently. One of the benefits Suresh mentioned about this approach is that the database doesn't have column names with types in the name. Although there may be something to be said about consistency of all the gsheets? That is why we are trying to update them. Does either approach make the next upgrade of HocTutorials simpler? |
I'm somewhat conservative when it comes to changing this kind of foundational code, especially when it doesn't have unit tests, and especially when a change introduces multiple code paths which remain untested. If we want to add more test coverage here, I'd feel better about the multiple code paths, but I feel like the facility for having columns both with and without the suffix is kind of a band aid solution that we wouldn't need if we were willing to change the consumers of the data (or, as a thought experiment, we'd have written to consume the existing v3 naming convention in the first place). (Could this new approach also lead to some difficult situations with columns that are named identically apart from their suffix?) Certainly there will be more columns in the HocTutorials migration, and dealing with that at the endpoint of consumption would involve changes to the TutorialExplorer JS code. However, we could also put a small transformation function in https://github.com/code-dot-org/code-dot-org/blob/staging/pegasus/src/database.rb which would convert from the native v3 format to the format used by the TutorialExplorer, localising code changes to a small function that could be easily verified. I'm also fairly conservative about changes to the live site, because there are a variety of deployment and caching timing issues. Just as we have often built new version of pages by building a new page alongside the old one, I'm in favour of building out this new data source alongside the old one and making the transition once we're confident we can do an instant flip (and safely back again if anything goes wrong). |
I was thinking of two benefits to optionally dropping the datatype suffixes during import:
Given how important the CSV-to-database logic is and that there aren't any unit tests on this logic, the additional code path to optionally drop the datatype suffix is too risky. I'll create the new table in v3 format as an intermediate step. |
Discussed offline: moving the ghseet to /v3 with proper naming convention. |
@@ -0,0 +1,2 @@ | |||
TestSuresh/cdo-beyond-tutorials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestSuresh
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the new sheet in a temporary folder to ensure content editors would not try to use it. It's now in the correct location (v3
).
First Intermediate Step to migrate Beyond Tutorials Sheet
https://trello.com/c/WgUqM812/1-upgrade-data-gsheets-to-v3-gsheets
Currently three Pegasus Google Sheets are imported using a legacy rake seed task. Add configuration to import one of these legacy sheets - HocBeyondTutorials - in the newer v3 format where column name suffixes indicate datatype with a new name (
cdo_beyond_tutorials
) as an intermediate step. A future Pull Request will modify the Pegasus view/controller logic to present data from the new sheet / table and will archive the old sheet / table.