Add a User service, with database for user, feed info, drop planet feed list #1642

Closed
humphd opened this issue Feb 3, 2021 · 15 comments · Fixed by #1915
Labels
area: back-end area: deployment Production or Staging deployment type: enhancement New feature or request

Comments

@humphd
Contributor

humphd commented Feb 3, 2021

What would you like to be added:

Add a new database to our system to hold user, feed, and other related data that needs to exist long term. Our new data design would be:

  • database: long term storage of users and all information related to users, for example, their Seneca email, preferred display name, RSS feeds, URLs to blogs, social media, GitHub, etc.
  • redis: ephemeral storage (e.g., cache) for posts, shared session data, etc.
  • elasticsearch: indexing of data for search, for example posts.

Why would you like this to be added:

We currently use the CDOT Planet Feed List as our primary data source. We've augmented that with Redis, but we don't have a great backup strategy or dedicated DevOps team, so it's possible to lose our data (#1365).

It should be possible to use this new database and replace the planet feed list dependency completely.

Considerations:

I'm not sure which database we should use. Here are some thoughts:

  • students coming to this course have all worked with MongoDB in the web stream. Because it's a database that people know, it might be a logical choice.
  • it would be easy to add whatever database we choose to our network of apps (e.g., run it in docker). But should we consider running this in a cloud we don't manage? For example, MongoDB Atlas has a free tier, but it's not really designed for production use. There are also Fauna, Firebase, and lots more with a free tier. Our data needs are always going to be modest, since we only need to store user info and a few URLs for hundreds of users. Maybe we can fit into a free-tier db and then not worry about data loss?
  • whatever db we choose, it would be nice if it were easy for a dev either to mock it locally or to sign up for a free instance of their own to use in dev.
  • if we do decide to add the db to our network and run it ourselves, we need to consider how to do backups.
@humphd humphd added type: enhancement New feature or request area: back-end area: deployment Production or Staging deployment labels Feb 3, 2021
@PedroFonsecaDEV
Contributor

@humphd, so we need to find a cloud provider to hold our long-term storage? Or would a SQL database running on our own machines solve the problem of a power outage?

@chrispinkney
Contributor

Let's just ask Seneca to provide us with a backup diesel generator in case of power-outages. 😄

I'm using Firestore (NoSQL) in my capstone project; it's great and extremely generous with its free tier (50k reads per day, 20k writes/deletes per day). Would love to discuss this further sometime.

@PedroFonsecaDEV
Contributor

I used Firebase in a React project; I agree with you @chrispinkney it's a good one.

@chrispinkney
Contributor

So here are my initial ideas based on the conversation we all had in both Teams and Slack:

DB

  • This (NoSQL) DB will be provided by Google's Cloud Firestore.
    • It will contain the current CDOT Planet Feed List for online backup purposes.
      • It'll contain user information such as the GitHub user URL and avatar URL(?), in addition to the user's name and RSS/Atom feed URL.
    • It needs to be testable (e.g., via an emulator) using Jest.
      • Create a service in Telescope that hides the details of how Firebase works, and make it so that it can swap between the real implementation and a testing implementation based on the environment.
    • Two instances need to exist: one for the production server and one for the staging server.
  • We need to migrate the current data feed to Firestore (I can probably whip up a script in Python for just that)
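One way to read the "hide the details of how firebase works" point above is a tiny storage interface with two implementations. This is a hypothetical sketch: the class and function names are not from Telescope or Firebase. It assumes the real implementation receives an already-initialized Firestore instance (for example, the one `firebase-admin` returns from `admin.firestore()`), and it switches on `NODE_ENV`, which Jest sets to `test` when it is not already set.

```javascript
// Hypothetical storage layer: callers only see get()/set(), never Firestore.

// In-memory stand-in for tests; each collection is a Map of id -> document.
class MemoryUserStore {
  constructor() {
    this.collections = new Map();
  }

  collection(name) {
    if (!this.collections.has(name)) {
      this.collections.set(name, new Map());
    }
    return this.collections.get(name);
  }

  async get(collectionName, id) {
    const doc = this.collection(collectionName).get(id);
    // Return a shallow copy so callers cannot mutate stored state directly
    return doc ? { ...doc } : null;
  }

  async set(collectionName, id, data) {
    this.collection(collectionName).set(id, { ...data });
  }
}

// Thin wrapper over a Firestore instance (e.g. admin.firestore()).
class FirestoreUserStore {
  constructor(db) {
    this.db = db;
  }

  async get(collectionName, id) {
    const snapshot = await this.db.collection(collectionName).doc(id).get();
    return snapshot.exists ? snapshot.data() : null;
  }

  async set(collectionName, id, data) {
    await this.db.collection(collectionName).doc(id).set(data);
  }
}

// Choose the implementation from the environment: in-memory under Jest
// (NODE_ENV=test), the Firestore-backed store everywhere else.
function createUserStore(env = process.env, db = null) {
  if (env.NODE_ENV === 'test') {
    return new MemoryUserStore();
  }
  if (!db) {
    throw new Error('A Firestore instance is required outside of tests');
  }
  return new FirestoreUserStore(db);
}
```

Inside a Jest test, `createUserStore()` with no arguments would return the in-memory store, while production code passes its Firestore instance explicitly, so the rest of the service never needs to know which one it has.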

User "Schema":

  • The following information needs to be present per user:
    • Name
    • Blog RSS/Atom Feed URL
    • GitHub Username (and profile URL?)
    • Avatar URL (?)
  • When a user logs in to Telescope, we can present a form where they add any missing information the schema needs before their blog is added to the master planet feed.

Potential Issues / Blockers

  • We need a way for developers to access this DB. AFAIK, at the moment you can only access it with a login and API key provided by the DB admin.

@humphd
Contributor Author

humphd commented Feb 6, 2021

Some further details.

When a user signs up for Telescope, they will first authenticate with Seneca's SSO provider, which returns data that looks like this:

{
  "issuer": "https://sts.windows.net/...idp-uuid.../",
  "inResponseTo": "_851...",
  "sessionIndex": "_dfa...",
  "nameID": "username@seneca-domain.ca",
  "nameIDFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
  "http://schemas.microsoft.com/identity/claims/tenantid": "...app-uuid...",
  "http://schemas.microsoft.com/identity/claims/objectidentifier": "...uuid...",
  "http://schemas.microsoft.com/identity/claims/displayname": "Full Name",
  "http://schemas.microsoft.com/identity/claims/identityprovider": "https://sts.windows.net/...app-uuid.../",
  "http://schemas.microsoft.com/claims/authnmethodsreferences": "http://schemas.microsoft.com/ws/2008/06/identity/authenticationmethod/password",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname": "Firstname",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname": "Lastname",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "username@seneca-domain.ca",
  "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name": "username@seneca-domain.ca",
  "sAMAccountName": "username"
}

We don't care about a lot of this, but we need the following (see what we currently use):

  • id: we hash the nameID and get a 10-character hash (i.e., we won't store the real Seneca username/email)
  • name: user's first and last name. We could store first/last as well, but probably a name is enough? Many students don't go by the name that Seneca uses, and often the first/last name portions Seneca has are not correct.
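A minimal sketch of deriving these two fields from the SSO response above. The comment only says "a 10-character hash", so using SHA-256 rendered as hex and truncated to 10 characters is an assumption here, and the function names are hypothetical; the display-name claim URI is the one shown in the sample response.

```javascript
const { createHash } = require('crypto');

// Claim URI for the display name, taken from the SSO response above
const DISPLAY_NAME_CLAIM = 'http://schemas.microsoft.com/identity/claims/displayname';

// Derive a stable, 10-character id from the SSO nameID so the real Seneca
// username/email never has to be stored. SHA-256 is an assumption; any
// stable hash truncated to 10 hex characters behaves the same way.
function hashId(nameID) {
  return createHash('sha256').update(nameID).digest('hex').slice(0, 10);
}

// Keep only the fields we need from the SSO profile.
function toTelescopeUser(profile) {
  return {
    id: hashId(profile.nameID),
    name: profile[DISPLAY_NAME_CLAIM],
  };
}
```

Ten hex characters give 40 bits, which makes an accidental collision very unlikely at the scale of hundreds of users, and the same nameID always maps to the same id on every login.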

We also want to augment this with new data:

  • githubUsername: the user's github username. Probably storing this is enough, but we might want to get the id that GitHub uses, in case it's useful for API calls later? Things like their avatar can be constructed at runtime from this username, so we don't need to store it.
  • feeds: an Array of one or more feeds belonging to this user. Feeds are a bit more complex than just storing URLs. We also need to keep track of info like whether the feed is flagged (e.g., don't show it), invalid (e.g., don't bother parsing it), etc. We probably need to think about storing these in a separate table and just putting the id here? Not sure.
  • Users might want to have other URLs they keep here too: a home page, a Twitter account, a YouTube channel, Twitch stream? Probably we can collect these as a list of URLs, and have some intelligence in the back-end/front-end to identify special URLs (e.g., we might treat Twitter specially or Twitch, etc).
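A small sketch of the two runtime derivations mentioned above. It relies on GitHub redirecting `https://github.com/<username>.png` to the user's avatar; the host-to-kind table and both function names are hypothetical, a starting point rather than an agreed list.

```javascript
// Build the avatar URL at runtime from the username; nothing extra is stored.
function avatarUrlFor(githubUsername) {
  return `https://github.com/${encodeURIComponent(githubUsername)}.png`;
}

// Hypothetical table of hosts we might treat specially; extend as needed.
const SPECIAL_HOSTS = {
  'twitter.com': 'twitter',
  'youtube.com': 'youtube',
  'twitch.tv': 'twitch',
  'github.com': 'github',
};

// Classify a user-supplied URL so the front end can render it specially.
function classifyUrl(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return 'invalid';
  }
  // Treat subdomains like www.twitch.tv the same as twitch.tv
  for (const [domain, kind] of Object.entries(SPECIAL_HOSTS)) {
    if (host === domain || host.endsWith(`.${domain}`)) {
      return kind;
    }
  }
  return 'website';
}
```

Storing links as a plain list and classifying them on read keeps the schema flat: adding support for a new site only means adding a row to the table.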

I suspect that what we'll want to do is create a microservice for this (cc @raygervais), which lets us get info about a user, a list of feeds, etc. This needs some thought, but we can probably build all of this at once vs. trying to do it for the current backend.

@chrispinkney
Contributor

chrispinkney commented Feb 10, 2021

After some discussion during Tuesday's meeting, we have come to an agreement and also resolved some concerns/questions:

  • We are going to be storing the User's information in the following 'schema':

  • Firestore Document ID (per user) : id (see: this for more info on id)

    • firstName : string
    • lastName : string
      • firstName, lastName will be pulled by Telescope's SSO.
    • displayName : string
      • displayName will be entered by the user and preferred to be displayed on Telescope vs First Name and Last Name.
    • github: object {
      • username: string,
      • avatarUrl: string (to be updated by GH microservice?)
        }
    • isAdmin : boolean (false)
    • isFlagged : boolean (false)
    • feeds : array (of user blog feed RSS/Atom URLs)
  • We will store two collections of users:

    • Users Collection:
      • This collection will store the current users of Telescope.
    • Legacy Collection:
      • This collection will store the current CDOT Planet Feed List for backup and migration (see below) purposes.
      • When a user logs in, if the user is not found in the Users collection, we'll check to see if the user is present in the Legacy collection.
        • If present in the Legacy collection, we'll migrate the user to the Users collection.
        • If not present, we'll create a new User and store them in the Users collection.
  • We do not need to create the front end right now, we should do this in a future PR.
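A minimal sketch of the schema defaults and the Users → Legacy → new-user lookup described above. Several details are assumptions: plain `Map`s stand in for the two Firestore collections, Legacy entries are keyed by the same hashed id (the thread doesn't say how planet feed entries are matched to SSO users), `displayName` defaults to first + last name as suggested later in the thread, and the Legacy entry is left in place because that collection also serves as a backup. Function names are hypothetical.

```javascript
// Build a user document matching the agreed schema, with its defaults.
function createUserDoc({ firstName, lastName, displayName, github = {}, feeds = [] }) {
  return {
    firstName,
    lastName,
    // Default to "First Last" until the user picks a display name
    displayName: displayName || [firstName, lastName].filter(Boolean).join(' '),
    github: {
      username: github.username || '',
      avatarUrl: github.avatarUrl || '',
    },
    isAdmin: false,
    isFlagged: false,
    feeds: [...feeds],
  };
}

// Login flow: look in Users, then Legacy, otherwise create a new user.
// `users` and `legacy` are Maps keyed by hashed id (stand-ins for Firestore).
function findOrCreateUser(id, ssoProfile, users, legacy) {
  if (users.has(id)) {
    return { user: users.get(id), source: 'users' };
  }

  const legacyEntry = legacy.get(id);
  const user = createUserDoc({
    firstName: ssoProfile.firstName,
    lastName: ssoProfile.lastName,
    // Carry over feeds recorded in the planet feed list, if any
    feeds: legacyEntry ? legacyEntry.feeds : [],
  });

  // Legacy entry is kept (not deleted) since it doubles as a backup
  users.set(id, user);
  return { user, source: legacyEntry ? 'legacy' : 'new' };
}
```

The `source` field is only there to make the three branches easy to observe in tests and logs.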

If the person reading this has any comments/concerns/feedback please let us know!

@birtony
Contributor

birtony commented Feb 10, 2021

Thanks for summarizing, @chrispinkney! I thought we were going to pull the displayName from Telescope SSO (fullName) at first as well? Also, another question that crossed my mind: what email address are we going to use to create a Firebase account? Should it be yours, @humphd?

@humphd
Contributor Author

humphd commented Feb 10, 2021

I think isFlagged is probably better language than isDisabled.

Should we have a github key, and then it can have sub-fields?

github: {
  username: 'humphd',
  avatarUrl: 'https://avatars.githubusercontent.com/u/427398?u=faca4e4a8e16d7f7fc61fc9fd185df1671cb9adf&v=4'
}

And we can add more as we go if necessary.

@chrispinkney
Contributor

chrispinkney commented Feb 10, 2021

@birtony Yeah, I believe he said he'll be the admin for the DB. Also, displayName will be specified by the user, so maybe we'll set it to firstName + lastName initially and let the user change it afterwards?

@humphd Awesome feedback, agreed. I'll change it up.

I'm not too sure how we'd get the GitHub avatar URL and other info (maybe wait until Mo's GH PR is up and toy around with it then?), so maybe something like:

github: {
    username: '',  // as specified by the user
    avatarUrl: ''  // to be updated by the GH microservice; defaults to an empty string
}

What do you think?

@humphd
Contributor Author

humphd commented Feb 10, 2021

The GitHub API will let us get the following info for a user, based on their username:

{
  "login": "octocat",
  "id": 1,
  "node_id": "MDQ6VXNlcjE=",
  "avatar_url": "https://github.com/images/error/octocat_happy.gif",
  "gravatar_id": "",
  "url": "https://api.github.com/users/octocat",
  "html_url": "https://github.com/octocat",
  "followers_url": "https://api.github.com/users/octocat/followers",
  "following_url": "https://api.github.com/users/octocat/following{/other_user}",
  "gists_url": "https://api.github.com/users/octocat/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/octocat/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/octocat/subscriptions",
  "organizations_url": "https://api.github.com/users/octocat/orgs",
  "repos_url": "https://api.github.com/users/octocat/repos",
  "events_url": "https://api.github.com/users/octocat/events{/privacy}",
  "received_events_url": "https://api.github.com/users/octocat/received_events",
  "type": "User",
  "site_admin": false,
  "name": "monalisa octocat",
  "company": "GitHub",
  "blog": "https://github.com/blog",
  "location": "San Francisco",
  "email": "octocat@github.com",
  "hireable": false,
  "bio": "There once was...",
  "twitter_username": "monatheoctocat",
  "public_repos": 2,
  "public_gists": 1,
  "followers": 20,
  "following": 0,
  "created_at": "2008-01-14T04:33:35Z",
  "updated_at": "2008-01-14T04:33:35Z"
}

This info could be collected in the browser before submitting the data to the User web service and/or the User web service could get it based on the username. cc @Metropass
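If the User service ends up calling this endpoint itself, it only needs a few of these fields. A hypothetical helper (the name is illustrative) that maps a response like the one above onto the `github` object discussed earlier, optionally keeping GitHub's numeric `id`, which stays the same if the user later renames their account:

```javascript
// Keep only what we need from a GET https://api.github.com/users/<username>
// response; field names follow the sample response above.
function pickGitHubFields(apiUser) {
  return {
    username: apiUser.login,
    // Numeric id survives username changes; useful for later API calls
    id: apiUser.id,
    avatarUrl: apiUser.avatar_url,
  };
}
```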

@chrispinkney
Contributor

chrispinkney commented Feb 10, 2021

Hm, I'm not too sure which approach is better. If it's just a simple GET request, then yeah, I imagine the frontend could easily pull the avatar URL for us with some sort of hook tied to a submit button, and then save it to the DB along with the username. This way the GH microservice could use the username the user specified to pull whatever info it needs.

@humphd
Contributor Author

humphd commented Feb 10, 2021

I think you can do stuff like this in this User service (i.e., we don't need a separate service to get the info).

If the browser sends us this data, we use it; if we just get a username, we use that to get it ourselves.
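A minimal sketch of that "use it if sent, otherwise fetch it" rule. The function name is hypothetical; it assumes Node 18+ so a global `fetch` exists, and it takes `fetch` as a parameter so tests can inject a fake instead of hitting the GitHub API.

```javascript
// Resolve GitHub info for a user: prefer data the browser already sent,
// otherwise look it up by username via the GitHub REST API.
async function resolveGitHubInfo(github, fetchImpl = globalThis.fetch) {
  if (!github || !github.username) {
    throw new Error('A GitHub username is required');
  }

  // The browser already supplied the avatar, so skip the API call
  if (github.avatarUrl) {
    return { username: github.username, avatarUrl: github.avatarUrl };
  }

  const res = await fetchImpl(
    `https://api.github.com/users/${encodeURIComponent(github.username)}`,
    { headers: { Accept: 'application/vnd.github+json' } }
  );
  if (!res.ok) {
    throw new Error(`GitHub API returned ${res.status} for ${github.username}`);
  }

  const data = await res.json();
  return { username: data.login, avatarUrl: data.avatar_url };
}
```

Unauthenticated calls to the GitHub API are rate limited, so preferring browser-supplied data also keeps the service's own API usage low.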

@Metropass
Contributor

That's very useful. We also need to figure out a way to link someone's GitHub account to their blog posts on Telescope. This would be great for an automated "About this contributor" page or something similar.

@humphd
Contributor Author

humphd commented Feb 13, 2021

Another cool thing we can do without having to invent new code is to use GitHub's automatic Atom feeds for users we create in the system.

Try it with your account username:

curl -H "Accept: application/atom+xml" https://github.com/humphd

We can ingest this into our current feed parser, and figure out how to do the UI for it.
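GitHub also serves a user's public activity as an Atom feed at `https://github.com/<username>.atom`, which is easy to store as a plain feed URL. A hypothetical helper that appends it to a user's `feeds` array, assuming it should sit alongside blog feeds (the UI is still open, so keeping it in a separate field would be equally valid):

```javascript
// URL of a user's public GitHub activity feed
function githubAtomFeedUrl(username) {
  return `https://github.com/${encodeURIComponent(username)}.atom`;
}

// Return a copy of the user with their GitHub feed added (no duplicates).
// Users without a GitHub username are returned unchanged.
function withGitHubFeed(user) {
  const username = user.github && user.github.username;
  if (!username) {
    return user;
  }
  const feedUrl = githubAtomFeedUrl(username);
  if (user.feeds.includes(feedUrl)) {
    return user;
  }
  return { ...user, feeds: [...user.feeds, feedUrl] };
}
```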

@humphd humphd changed the title Add a database for user and feed info, drop planet feed list Add a User service, with database for user, feed info, drop planet feed list Feb 13, 2021
@humphd
Contributor Author

humphd commented Feb 13, 2021

https://github.com/benwinding/example-jest-firestore-triggers is a useful source of info on how to do testing.


7 participants