feat(script): add content seeding script and sample data #324

kelsk · 2022-02-11T09:03:32Z

Add python script to inject sample data into a Firestore database
Add sample data in json format
Update website and content-api READMEs to include instructions on seeding the database

Fixes #316 & #315

ace-n

Mostly formatting nits.

The one blocker I see is telling people how to {determine, specify} EMBLEM_URL.

content-api/README.md

website/README.md

engelke

LGTM

Co-authored-by: Ace Nassri <anassri@google.com>

iennae

LGTM! Great job.

grayside

This is a great foundation for content seeding! Thanks Kelsey.

I've left a number of feedback points on this PR. While there are a number of things I'd like to see fixed, pretty much everything can be deferred to a follow-up. I'd appreciate it if you could file an issue or issues for anything you don't think is worth tackling as part of this PR.

The only blocker here is the open question on Wikimedia image copyright.

grayside · 2022-02-11T22:38:52Z

website/README.md

+To run the website locally, use the `flask run` command. By default, the website will run on port `8080`.
+
+## Seed Database


Rather than maintain duplicate seeding instructions, shouldn't this be removed here and assumed to be done as part of properly setting up the Content API?

grayside · 2022-02-11T22:39:39Z

content-api/data/seed_database.py

+    client = firestore.Client(project)
+    print("Adding content to the database, this may take a few minutes...")
+    for item in content:
+        doc_ref = client.collection(item["collection"]).document(item["id"])


Content API uses an optional test prefix on the collection name:

emblem/content-api/main_approver_test.py

Line 26 in dcba905

os.environ["EMBLEM_DB_ENVIRONMENT"] = "TEST"

This script should respect that, so the seeding can be part of a test environment.
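
A minimal sketch of how the seeding script could pick up the same setting. The helper name `collection_name` and the `<env>_<collection>` naming format are assumptions for illustration only; the real format should match whatever the Content API does with `EMBLEM_DB_ENVIRONMENT`.

```python
import os


def collection_name(base, environ=os.environ):
    """Return the collection name, prefixed when a DB environment is set.

    The "<env>_<collection>" format is an assumption for illustration;
    match the naming the Content API actually uses.
    """
    env = environ.get("EMBLEM_DB_ENVIRONMENT", "")
    return f"{env.lower()}_{base}" if env else base


# In the seeding loop, the lookup would then become something like:
#   client.collection(collection_name(item["collection"])).document(item["id"])
```

With the variable unset the helper returns the name unchanged, so the default behavior of the script stays the same.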

grayside · 2022-02-11T22:40:39Z

content-api/data/seed_database.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from google.cloud import firestore


As a standalone script, please include some basic inline docs, such as what the script does and how to run it in a basic way.

grayside · 2022-02-11T22:40:53Z

content-api/data/seed_database.py

+import json
+
+
+project = os.getenv("PROJECT_ID")


PROJECT_ID is a pretty vague variable name. Is this meant to override for this script, or set as a general shell variable? The convention is GOOGLE_CLOUD_PROJECT.

grayside · 2022-02-11T22:41:18Z

content-api/data/seed_database.py

+
+def seed_database(content):
+
+    client = firestore.Client(project)


The way I tend to set the project: if the GOOGLE_CLOUD_PROJECT variable is set, use that; otherwise default to no project so the client can retrieve the project from the metadata server. Using the metadata server is probably how we'd prefer to run this on Cloud Build.
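
Roughly, that pattern could look like the sketch below. The helper names `resolve_project` and `make_client` are hypothetical; the only library call used is `firestore.Client(project=...)`, which accepts `None` and then lets the client infer the project from its environment.

```python
import os


def resolve_project(environ=os.environ):
    """Return GOOGLE_CLOUD_PROJECT if set and non-empty, else None."""
    return environ.get("GOOGLE_CLOUD_PROJECT") or None


def make_client(environ=os.environ):
    """Create a Firestore client, deferring to project inference when unset."""
    # Imported lazily so resolve_project() works without the library installed.
    from google.cloud import firestore

    # project=None leaves project detection to the client library.
    return firestore.Client(project=resolve_project(environ))
```

Treating an empty string the same as an unset variable avoids passing `""` as a project ID by accident.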

grayside · 2022-02-11T22:43:18Z

content-api/data/seed_database.py

+
+
+with open("sample_data.json", "r") as f:
+    seed_content = json.load(f)


Future: We should probably have a linter configured to check the JSON doc is well-formed as a PR check. It's pretty easy to break JSON structure, especially through things like git conflict resolution.
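
As a starting point, such a check could be a small script run in CI. This is a sketch, not a proposal for a specific linter: the function name `check_seed_json` is hypothetical, and the required keys (`collection`, `id`, `data`) are assumed from the fields visible in the seed script and sample data.

```python
import json

# Keys each seed item is assumed to carry, based on the script and sample data.
REQUIRED_KEYS = {"collection", "id", "data"}


def check_seed_json(text):
    """Return a list of problems found in a seed JSON document (empty if none)."""
    try:
        content = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(content, list):
        return ["top-level value must be a list"]
    problems = []
    for index, item in enumerate(content):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not an object")
            continue
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append(f"item {index}: missing {sorted(missing)}")
    return problems
```

A PR check could read `sample_data.json`, pass its text to this function, and fail when the returned list is non-empty.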

grayside · 2022-02-11T23:09:51Z

content-api/data/sample_data.json

+    "data": {
+      "name": "Careers for Fish",
+      "description": "This cause provides careers for gifted fish.",
+      "imageUrl": "https://upload.wikimedia.org/wikipedia/commons/2/23/Georgia_Aquarium_-_Giant_Grouper_edit.jpg",


Wikimedia recommends against hotlinking, but supports it.

Either way, copyright is left to the user. https://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia/technical

Do you have a read on the copyrights of the images? Do we need to add a general copyright link or a per image attribution?

kelsk · 2022-02-12

Each image I added had a CC0, CC BY, or CC BY-SA license, all of which permit free commercial use of the images.

However, I missed the requirement on CC BY and CC BY-SA to include attribution (the creator's name and a link to the license).

Instead of adding attribution for each image, I'll replace any CC BY/BY-SA licensed images with ones that have a CC0 license or that are in the public domain.

(And instead of hotlinking, we should upload the images to a storage bucket, yes?)

grayside · 2022-02-11T23:11:38Z

content-api/data/sample_data.json

@@ -0,0 +1,3052 @@
+[


future: This file is pretty large for human-managed JSON. We may want to think about partitioning the data into separate files and running the seed script multiple times.
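
One variation on this idea, sketched below: instead of invoking the script once per file, the script could merge every JSON file in a directory before seeding. The function name `load_seed_files` and the one-list-per-file layout are assumptions for illustration.

```python
import json
from pathlib import Path


def load_seed_files(directory):
    """Load and concatenate every *.json file in a directory, in name order."""
    content = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open("r", encoding="utf-8") as f:
            # Each file is assumed to hold a JSON list of seed items.
            content.extend(json.load(f))
    return content
```

Sorting by file name keeps the load order deterministic, which helps if some collections need to be seeded before others.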

grayside · 2022-02-11T23:12:24Z

content-api/data/seed_database.py

@@ -0,0 +1,42 @@
+# Copyright 2021 Google LLC


Copyright block should use the year the file is created.

Suggested change

-# Copyright 2021 Google LLC
+# Copyright 2022 Google LLC

grayside · 2022-02-11T23:22:41Z

content-api/data/seed_database.py

+
+    client = firestore.Client(project)
+    print("Adding content to the database, this may take a few minutes...")
+    for item in content:


future: consider parallelization, we could probably speed up seeding time.
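
A rough sketch of one way to do this with a thread pool from the standard library. The function name `seed_in_parallel` and the `write_item` callable are hypothetical; in the seed script, `write_item` would wrap the per-document Firestore write, and it is worth confirming that sharing one client across threads is acceptable before relying on this.

```python
from concurrent.futures import ThreadPoolExecutor


def seed_in_parallel(content, write_item, max_workers=8):
    """Call write_item(item) for every item using a thread pool.

    Returns the number of items written; the first write error is re-raised.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so exceptions raised in workers surface here.
        return len(list(pool.map(write_item, content)))


# Hypothetical usage inside the seed script:
#   seed_in_parallel(
#       content,
#       lambda item: client.collection(item["collection"])
#                          .document(item["id"])
#                          .set(item["data"]),
#   )
```

Firestore batched writes are another option for reducing round trips; threads are used here only because they need no changes to the per-item write.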

grayside · 2022-02-12T00:10:46Z

Follow-up for the blocker in #325.

grayside

With blocker moved out, switching to approve. Still good to raid other comments here for future improvements.

kelsk added 4 commits February 11, 2022 00:31

add content seeding script

cb45a1f

add generated sample data

4033387

change env from PROJECT to PROJECT_ID

56ada6d

update website & content-api READMEs

b1dda5d

kelsk added the `component: content-api` and `component: demo services` labels Feb 11, 2022

kelsk requested a review from a team as a code owner February 11, 2022 09:03

Merge branch 'main' into content-seeding

6b94f0e

ace-n suggested changes Feb 11, 2022

engelke approved these changes Feb 11, 2022

kelsk and others added 5 commits February 11, 2022 13:42

add instructions to set PROJECT_ID and EMBLEM_API_URL

0fc4848

Update content-api/README.md

aa35e76

Co-authored-by: Ace Nassri <anassri@google.com>

Update content-api/README.md

a2ef36e

Co-authored-by: Ace Nassri <anassri@google.com>

Update website/README.md

0885fb4

Co-authored-by: Ace Nassri <anassri@google.com>

Update website/README.md

c985706

Co-authored-by: Ace Nassri <anassri@google.com>

kelsk requested a review from ace-n February 11, 2022 21:43

ace-n approved these changes Feb 11, 2022

grayside added this to the v0.6.0 milestone Feb 11, 2022

iennae approved these changes Feb 11, 2022

grayside suggested changes Feb 11, 2022

grayside mentioned this pull request Feb 12, 2022

demo services: Image copyright in content seeding #325

Open

grayside approved these changes Feb 12, 2022

grayside merged commit a403a87 into main Feb 12, 2022

grayside deleted the content-seeding branch February 12, 2022 00:11

github-actions bot mentioned this pull request Jan 26, 2024

chore(main): release emblem 1.0.0 #732

Open
