Add a User service, with database for user, feed info, drop planet feed list #1642

humphd · 2021-02-03T14:19:23Z

What would you like to be added:

Add a new database to our system to hold user, feed, and other related data that needs to exist long term. Our new data design would be:

- database: long-term storage of users and all information related to them, for example their Seneca email, preferred display name, RSS feeds, and URLs to blogs, social media, GitHub, etc.
- redis: ephemeral storage (e.g., cache) for posts, shared session data, etc.
- elasticsearch: indexing of data for search, for example posts.

Why would you like this to be added:

We currently use the CDOT Planet Feed List as our primary data source. We've augmented that with Redis, but we don't have a great backup strategy or dedicated DevOps team, so it's possible to lose our data (#1365).

It should be possible to use this new database and replace the planet feed list dependency completely.

Considerations:

I'm not sure which database we should use. Here are some thoughts:

- Students coming to this course have all worked with MongoDB in the web stream. Because it's a database that people know, it might be a logical choice.
- It would be easy to add whatever database we choose to our network of apps (e.g., run it in Docker). But should we consider running this in a cloud we don't manage? For example, MongoDB Atlas has a free tier, but it's not really designed for production use. There are also Fauna, Firebase, and lots more that have a free tier. Our data needs are always going to be modest, since we only need to store user info and a few URLs for hundreds of users. Maybe we can fit in a free-tier db and then not worry about data loss?
- Whatever db we choose, it would be nice if it was really easy for a dev to either mock it locally or sign up for a free instance of their own to use in dev.
- If we do decide to add the db to our network and run it ourselves, we need to consider how to do backups.

PedroFonsecaDEV · 2021-02-04T09:11:29Z

@humphd, so do we need to find a cloud service to hold our long-term storage? Or would a SQL database running on our own machines be enough to survive a power outage?

chrispinkney · 2021-02-04T19:39:35Z

Let's just ask Seneca to provide us with a backup diesel generator in case of power-outages. 😄

I'm using Firestore (NoSQL) in my capstone project; it's great and extremely generous with its free tier (50k reads per day, 20k writes/deletes per day). Would love to discuss this further sometime.

PedroFonsecaDEV · 2021-02-04T20:19:48Z

I used Firebase in a React project; I agree with you @chrispinkney it's a good one.

chrispinkney · 2021-02-05T20:55:23Z

So here are my initial ideas based on the conversation we all had in both Teams and Slack:

DB

This (NoSQL) DB will be provided by Google's Cloud Firestore.
- It will contain the current CDOT Planet Feed List for online backup purposes.
  - It'll contain user information such as GitHub User URL, Avatar URL(?) (In addition to the user's name and rss/atom feed url.)
- It needs to be testable (emulateable) using Jest.
  - Create a service in Telescope that hides the details of how Firebase works, and make it able to swap between the real implementation and a testing implementation based on the environment (env).
- Two versions need to exist for both prod and staging servers to use.
We need to migrate the current data feed to Firestore (I can probably whip up a script in Python for just that).
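A minimal sketch of the "swap implementations based on env" idea above. It is written in Python purely for illustration, and every name in it (`UserStore`, `InMemoryUserStore`, `FirestoreUserStore`, `make_user_store`, and the `USER_STORE` variable) is hypothetical rather than anything Telescope defines. The Firestore-backed class is deliberately left as a stub, since the thread does not settle how the real client would be wired up.

```python
import os
from typing import Dict, Mapping, Optional, Protocol


class UserStore(Protocol):
    """Hypothetical interface that the rest of the app would depend on."""

    def get(self, user_id: str) -> Optional[dict]: ...

    def put(self, user_id: str, data: dict) -> None: ...


class InMemoryUserStore:
    """Testing version: keeps users in a dict, no network or credentials."""

    def __init__(self) -> None:
        self._users: Dict[str, dict] = {}

    def get(self, user_id: str) -> Optional[dict]:
        return self._users.get(user_id)

    def put(self, user_id: str, data: dict) -> None:
        # Store a copy so callers cannot mutate the stored record by accident.
        self._users[user_id] = dict(data)


class FirestoreUserStore:
    """Real version (stub): the actual Firestore calls are intentionally omitted."""

    def get(self, user_id: str) -> Optional[dict]:
        raise NotImplementedError("wire up the Firestore client here")

    def put(self, user_id: str, data: dict) -> None:
        raise NotImplementedError("wire up the Firestore client here")


def make_user_store(env: Optional[Mapping[str, str]] = None) -> UserStore:
    """Choose an implementation from the environment.

    USER_STORE is a made-up variable name; it defaults to the in-memory store.
    """
    env = os.environ if env is None else env
    if env.get("USER_STORE", "memory") == "firestore":
        return FirestoreUserStore()
    return InMemoryUserStore()
```

With this shape, tests would call `make_user_store({"USER_STORE": "memory"})` and never touch a real database, while prod and staging would set the variable to select the real backend.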

User "Schema":

The following information needs to be present per user:
- Name
- Blog RSS/Atom Feed URL
- GitHub Username (and profile URL?)
- Avatar URL (?)
When a user logs in to Telescope, a form can prompt them for any information the schema still needs before their blog is added to the master planet feed.

Potential Issues / Blockers

We need a way for developers to access this DB. At the moment, AFAIK, you can only access it with a login and API key provided by the DB admin.

humphd · 2021-02-06T00:12:43Z

Some further details.

When a user signs up for Telescope, they will first authenticate with Seneca's SSO provider, which returns data that looks like this:

{
  "issuer": "https://sts.windows.net/...idp-uuid.../",
  "inResponseTo": "_851...",
  "sessionIndex": "_dfa...",
  "nameID": "username@seneca-domain.ca",
  "nameIDFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
  "http://schemas.microsoft.com/identity/claims/tenantid": "...app-uuid...",
  "http://schemas.microsoft.com/identity/claims/objectidentifier": "...uuid...",
  "http://schemas.microsoft.com/identity/claims/displayname": "Full Name",
  "http://schemas.microsoft.com/identity/claims/identityprovider": "https://sts.windows.net/...app-uuid.../",
  "http://schemas.microsoft.com/claims/authnmethodsreferences": "http://schemas.microsoft.com/ws/2008/06/identity/authenticationmethod/password",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname": "Firstname",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname": "Lastname",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "username@seneca-domain.ca",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name": "username@seneca-domain.ca",
  "sAMAccountName": "username"
}

We don't care about a lot of this, but we need the following (see what we currently use):

- id: we hash the nameID and get a 10-character hash (i.e., we won't store the real Seneca username/email)
- name: user's first and last name. We could store first/last as well, but probably a name is enough? Many students don't go by the name that Seneca uses, and often the first/last name portions Seneca has are not correct.
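As a rough sketch of the two fields above: the thread says only that the nameID is hashed down to 10 characters, so the use of SHA-256, hex encoding, and lower-casing before hashing are all assumptions here, as are the function names. The claim keys are copied from the sample SSO response.

```python
import hashlib

# Claim keys taken from the sample SSO response above.
CLAIMS = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/"
DISPLAY_NAME = "http://schemas.microsoft.com/identity/claims/displayname"


def hash_name_id(name_id: str, length: int = 10) -> str:
    """Return a short, stable id for a nameID without storing the email itself.

    Assumption: SHA-256, hex-encoded, truncated to `length` characters.
    The input is trimmed and lower-cased so the same account always maps
    to the same id regardless of how the email was typed.
    """
    normalized = name_id.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


def user_from_sso(profile: dict) -> dict:
    """Keep only the parts of the SSO profile that we care about."""
    name = profile.get(DISPLAY_NAME)
    if not name:
        # Fall back to given name + surname if no display name is present.
        parts = (profile.get(CLAIMS + "givenname"), profile.get(CLAIMS + "surname"))
        name = " ".join(part for part in parts if part)
    return {"id": hash_name_id(profile["nameID"]), "name": name}
```

Truncating a hash to 10 hex characters makes collisions possible in principle; for a few hundred users that is very unlikely, but a real implementation would still want to decide what happens if two ids clash.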

We also want to augment this with new data:

- githubUsername: the user's GitHub username. Probably storing this is enough, but we might want to get the id that GitHub uses, in case it's useful for API calls later? Things like their avatar can be constructed at runtime from this username, so we don't need to store it.
- feeds: an Array of one or more feeds belonging to this user. Feeds are a bit more complex than just storing URLs. We also need to keep track of info like whether the feed is flagged (e.g., don't show it), invalid (e.g., don't bother parsing it), etc. We probably need to think about storing these in a separate table and just putting the id here? Not sure.
- Users might want to have other URLs they keep here too: a home page, a Twitter account, a YouTube channel, Twitch stream? Probably we can collect these as a list of URLs, and have some intelligence in the back-end/front-end to identify special URLs (e.g., we might treat Twitter specially or Twitch, etc).
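To make the feed and URL ideas above a little more concrete, here is a small sketch. The `Feed` shape, the `classify_url` helper, and the list of "special" hosts are illustrative assumptions, not a decided design.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Feed:
    """A feed is more than a URL: it also carries moderation/parsing state."""

    url: str
    flagged: bool = False  # e.g., don't show it
    invalid: bool = False  # e.g., don't bother parsing it

    def should_process(self) -> bool:
        return not (self.flagged or self.invalid)


# Hosts we might want to treat specially (illustrative, not exhaustive).
KNOWN_HOSTS = {
    "twitter.com": "twitter",
    "twitch.tv": "twitch",
    "youtube.com": "youtube",
    "github.com": "github",
}


def classify_url(url: str) -> str:
    """Return a label for well-known sites, or "other" for everything else."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return KNOWN_HOSTS.get(host, "other")
```

Keeping the user's extra links as a plain list of URLs and classifying them at read time means no schema change is needed when another site should be treated specially.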

I suspect that what we'll want to do is create a microservice for this (cc @raygervais), which lets us get info about a user, a list of feeds, etc. This needs some thought, but we can probably build all of this at once vs. trying to do it for the current backend.

chrispinkney · 2021-02-10T18:35:55Z

After some discussion during Tuesday's meeting we have come to an agreement and also solved some concerns/questions:

We are going to be storing the User's information in the following 'schema':
Firestore Document ID (per user) : id (see: this for more info on id)
- firstName : string
- lastName : string
  - firstName, lastName will be pulled by Telescope's SSO.
- displayName : string
  - displayName will be entered by the user and preferred to be displayed on Telescope vs First Name and Last Name.
- github : object
  - username : string
  - avatarUrl : string (to be updated by GH microservice?)
- isAdmin : boolean (false)
- isFlagged : boolean (false)
- feeds : array (of user blog feed RSS/Atom URLs)
We will store two collections of users:
- Users Collection:
  - This collection will store the current users of Telescope.
- Legacy Collection:
  - This collection will store the current CDOT Planet Feed List for backup and migration (see below) purposes.
  - When a user logs in, if the user is not found in the Users collection, we'll check to see if the user is present in the Legacy collection.
    - If present in the Legacy collection, we'll migrate the user to the Users collection.
    - If not present, we'll create a new User and store them in the Users collection.
We do not need to create the front end right now; we should do this in a future PR.
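The schema and login/migration flow above can be sketched roughly as follows, using plain dicts in place of the two Firestore collections. This assumes Legacy records are keyed by the same id as Users and carry `name` and `feeds` fields; the thread does not specify either, and the `login` helper itself is hypothetical. The Legacy record is copied rather than removed, since that collection also serves as a backup.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class User:
    """Mirrors the 'schema' above; defaults follow the listed values."""

    firstName: str
    lastName: str
    displayName: str = ""
    github: Dict[str, str] = field(
        default_factory=lambda: {"username": "", "avatarUrl": ""}
    )
    isAdmin: bool = False
    isFlagged: bool = False
    feeds: List[str] = field(default_factory=list)


def login(user_id: str, first: str, last: str,
          users: Dict[str, dict], legacy: Dict[str, dict]) -> dict:
    """Return the user's record, migrating or creating it if needed."""
    if user_id in users:
        return users[user_id]

    if user_id in legacy:
        # Found in the Legacy collection: migrate into Users.
        record = legacy[user_id]
        user = User(first, last,
                    displayName=record.get("name", ""),
                    feeds=list(record.get("feeds", [])))
    else:
        # Unknown everywhere: create a brand new user.
        user = User(first, last)

    users[user_id] = asdict(user)
    return users[user_id]
```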

If the person reading this has any comments/concerns/feedback please let us know!

birtony · 2021-02-10T18:41:19Z

Thanks for summarizing, @chrispinkney! I thought we were going to pull the displayName from Telescope's SSO (fullName) at first as well? Another question that crossed my mind: which email address are we going to use to create the Firebase account? Should it be yours, @humphd?

humphd · 2021-02-10T18:43:50Z

I think isFlagged is probably better language than isDisabled.

Should we have a github key, and then it can have sub-fields?

github: {
  username: 'humphd',
  avatarUrl: 'https://avatars.githubusercontent.com/u/427398?u=faca4e4a8e16d7f7fc61fc9fd185df1671cb9adf&v=4'
}

And we can add more as we go if necessary.

chrispinkney · 2021-02-10T18:47:18Z

@birtony Yeah I believe he said he'll be the admin for the DB. Also displayName will be as specified by the user, so maybe we'll set it to firstName + lastName initially and allow the user to change it after?

@humphd Awesome feedback, agreed. I'll change it up.

I'm not too sure about how we'd get the GitHub Avatar URL + other info (maybe wait until Mo's GH PR is up and toy around with it then?), so maybe something like:

github: {
    username: <as specified by user>,
    avatarUrl: <to be updated by GH microservice, defaulted to empty string>
}

What do you think?

humphd · 2021-02-10T19:14:49Z

The GitHub API will let us get the following info for a user, based on their username:

{
  "login": "octocat",
  "id": 1,
  "node_id": "MDQ6VXNlcjE=",
  "avatar_url": "https://github.com/images/error/octocat_happy.gif",
  "gravatar_id": "",
  "url": "https://api.github.com/users/octocat",
  "html_url": "https://github.com/octocat",
  "followers_url": "https://api.github.com/users/octocat/followers",
  "following_url": "https://api.github.com/users/octocat/following{/other_user}",
  "gists_url": "https://api.github.com/users/octocat/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/octocat/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/octocat/subscriptions",
  "organizations_url": "https://api.github.com/users/octocat/orgs",
  "repos_url": "https://api.github.com/users/octocat/repos",
  "events_url": "https://api.github.com/users/octocat/events{/privacy}",
  "received_events_url": "https://api.github.com/users/octocat/received_events",
  "type": "User",
  "site_admin": false,
  "name": "monalisa octocat",
  "company": "GitHub",
  "blog": "https://github.com/blog",
  "location": "San Francisco",
  "email": "octocat@github.com",
  "hireable": false,
  "bio": "There once was...",
  "twitter_username": "monatheoctocat",
  "public_repos": 2,
  "public_gists": 1,
  "followers": 20,
  "following": 0,
  "created_at": "2008-01-14T04:33:35Z",
  "updated_at": "2008-01-14T04:33:35Z"
}

This info could be collected in the browser before submitting the data to the User web service and/or the User web service could get it based on the username. cc @Metropass

chrispinkney · 2021-02-10T19:25:48Z

Hm, I'm not too sure which approach is better. If it's just a simple GET request, then the front end could pull the avatar URL for us fairly easily with some sort of hook tied to the submit button, and that URL would then be saved to the db along with the username. The GH microservice could then use the username specified by the user to pull whatever other info it needs.

humphd · 2021-02-10T20:57:14Z

I think you can do stuff like this in this User service (i.e., we don't need a separate service to get the info).

If the browser sends us this data, we use it; if we just get a username, we use that to get it ourselves.
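A small sketch of that rule, assuming the submitted data uses the `username`/`avatarUrl` shape discussed earlier. The helper names are hypothetical; the endpoint is the `https://api.github.com/users/<username>` URL shown in the sample response, and `avatar_url` is the field from that same response. The fetcher is passed in as a parameter so the logic can be tested without touching the network.

```python
import json
from typing import Callable
from urllib.parse import quote
from urllib.request import Request, urlopen


def fetch_github_user(username: str) -> dict:
    """Look up a user's public profile from the GitHub API."""
    req = Request(
        f"https://api.github.com/users/{quote(username)}",
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "user-service-sketch",  # GitHub expects a User-Agent
        },
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def resolve_github(submitted: dict,
                   fetch: Callable[[str], dict] = fetch_github_user) -> dict:
    """Use what the browser sent; only call GitHub if the avatar is missing."""
    username = submitted["username"]
    avatar = submitted.get("avatarUrl")
    if not avatar:
        avatar = fetch(username).get("avatar_url", "")
    return {"username": username, "avatarUrl": avatar}
```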

Metropass · 2021-02-10T23:59:46Z

That's very useful. We also need to figure out a way to link someone's GitHub account to their blog posts on Telescope. This would be great for an automated "About this contributor" page or something similar.

humphd · 2021-02-13T19:39:13Z

Another cool thing we can do without having to invent new code is to use GitHub's automatic Atom feeds for the users we create in the system.

Try it with your account username:

curl -H "Accept: application/atom+xml" https://github.com/humphd

We can ingest this into our current feed parser, and figure out how to do the UI for it.
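For a sense of what ingesting that feed involves, here is a minimal sketch using only the Python standard library. It is not Telescope's feed parser; the function names are hypothetical, and `SAMPLE` is a made-up Atom document used to exercise the parsing without a network call. The request mirrors the curl command above.

```python
import xml.etree.ElementTree as ET
from typing import List, Union
from urllib.parse import quote
from urllib.request import Request, urlopen

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def fetch_github_atom(username: str) -> bytes:
    """Same request as the curl command: ask github.com for Atom instead of HTML."""
    req = Request(
        f"https://github.com/{quote(username)}",
        headers={"Accept": "application/atom+xml"},
    )
    with urlopen(req, timeout=10) as resp:
        return resp.read()


def parse_atom_entries(xml_data: Union[str, bytes]) -> List[dict]:
    """Pull id, title, updated, and the first link out of each Atom entry."""
    root = ET.fromstring(xml_data)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
            "link": link.get("href", "") if link is not None else "",
        })
    return entries


# Made-up sample document, only for trying the parser offline.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample activity</title>
  <entry>
    <id>tag:example.com,2021:1</id>
    <title>opened an issue</title>
    <updated>2021-02-13T19:39:13Z</updated>
    <link rel="alternate" href="https://example.com/1"/>
  </entry>
</feed>"""
```

Calling `parse_atom_entries(SAMPLE)` yields one dict per entry; against the live site, `parse_atom_entries(fetch_github_atom("humphd"))` would do the same for a real account.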

humphd · 2021-02-13T19:45:55Z

https://github.com/benwinding/example-jest-firestore-triggers is a useful source of info on how to do testing.

humphd added the type: enhancement (New feature or request), area: back-end, and area: deployment (Production or Staging deployment) labels Feb 3, 2021

humphd mentioned this issue Feb 3, 2021

Improve Telescope sign-up flow #1643

Closed

humphd assigned humphd and chrispinkney Feb 5, 2021

humphd mentioned this issue Feb 6, 2021

Explore using microservices in the back-end #1627

Closed

chrispinkney assigned humphd, chrispinkney and birtony and unassigned humphd and chrispinkney Feb 9, 2021

yuanLeeMidori added this to the 1.7 Release milestone Feb 9, 2021

HyperTHD mentioned this issue Feb 13, 2021

Creating a Post microservice #1735

Closed

humphd changed the title ~~Add a database for user and feed info, drop planet feed list~~ Add a User service, with database for user, feed info, drop planet feed list Feb 13, 2021

humphd mentioned this issue Feb 13, 2021

Create a "post parsing/sync" microservice #1736

Closed

huyxgit modified the milestones: 1.7 Release, 1.8 Release Feb 16, 2021

humphd mentioned this issue Feb 20, 2021

Authentication/Authorization microservice #1796

Merged

This was referenced Feb 27, 2021

Add Firebase emulator via Docker #1840

Merged

Post parsing/sync microservice #1828

Merged

[auth] Improve how we define admin accounts #1868

Closed

HyperTHD mentioned this issue Mar 6, 2021

Added Dependabot config for Docker, GitHub Actions, and more npm packages #1878

Merged

8 tasks

chrispinkney mentioned this issue Mar 12, 2021

Closes #1642 - Adds User Microservice #1915

Merged

8 tasks

chrispinkney modified the milestones: 1.8 Release, 1.9 Release Mar 16, 2021

chrispinkney closed this as completed in #1915 Mar 19, 2021

chrispinkney mentioned this issue Mar 25, 2021

Fixes issue-1929: Paginated get route #2022

Merged

8 tasks
