
Explore Google File Store as a replacement for NFS! #3898

Closed
3 tasks done
balajialg opened this issue Oct 31, 2022 · 37 comments
Labels
enhancement Issues around improving existing functionality

@balajialg
Contributor

balajialg commented Oct 31, 2022

We have been facing NFS issues due to a race condition in the Linux kernel that is hard to troubleshoot and has resulted in a few outages over the past month. During our last team meeting, some of us were interested in exploring Google Filestore (GFS) as an alternative to our self-managed NFS. @ericvd-ucb kindly agreed to do the outreach and used his contacts to start a conversation with the Google Filestore folks about GFS instead of NFS. Their point of contact outlined the following points in their response:

What you described is common when customers run NFS themselves. These consistency issues are hard to troubleshoot. We do have many customers running Filestore with multiple directories that in turn serve multiple users. The benefit of using a managed NFS solution like Filestore is that you don't have to manage NFS and simply get it out of the box. Filestore has multiple tiers (we recommended Filestore Enterprise to give you an HA solution by default), but you can also choose Basic if you like. You only pay for the storage you consume (as opposed to running NFS yourself, where you are probably consuming compute from VMs and storage from PDs). The one thing to watch out for with Filestore Enterprise (based on what we hear from other customers) is the entry point of 1 TiB. You can of course consume the space by placing the directories of multiple users in the same Filestore instance, driving up utilization. In case you want isolation between users, you can also use multishares that share the underlying Filestore instance and drive up utilization. Outside this specific concern of the minimum entry size (which you can work around with the solutions shared above), you get regionally backed storage, managed NFS, pay only for the storage consumed, and many customers use it at scale.

We need to evaluate whether what they proposed above is something we are interested in exploring from a technical standpoint.

From my limited understanding: I looked at our billing report for October 2022 and found that their Enterprise tier (~$600 per month for 10 TiB) is roughly on par, per TiB, with what we are spending on PD + snapshots (~$4100 per month for 70 TiB). I am assuming I didn't miss anything in this calculation, but please correct me if my interpretation is wrong.

[screenshot: October 2022 billing report]
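
For reference, the per-TiB arithmetic behind that comparison (a minimal sketch that uses only the two figures quoted above; no other pricing is assumed):

```python
# Back-of-the-envelope comparison using the figures quoted above.
filestore_enterprise_per_tib = 600 / 10    # ~$600/month for 10 TiB
pd_plus_snapshots_per_tib    = 4100 / 70   # ~$4100/month for 70 TiB

print(f"Filestore Enterprise: ${filestore_enterprise_per_tib:.0f} per TiB/month")
print(f"PD + snapshots:       ${pd_plus_snapshots_per_tib:.0f} per TiB/month")
# Both come out near $60 per TiB/month, which is why the two look "on par"
# per TiB even though the absolute monthly totals differ at current capacity.
```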

To Do

  • Evaluate whether GFS fits our needs
  • Deploy GFS on a pilot hub
  • If the pilot is successful, transition all hubs from NFS to GFS
@balajialg balajialg changed the title Explore Google File Store as a replacement for NFS Server! Explore Google File Store as a replacement for NFS! Oct 31, 2022
@ryanlovett
Collaborator

Thanks for looking into this @balajialg !

Are there published reports about real-world use of Filestore and its reliability? Our nodes would still talk NFS to the Filestore, and there could still be buggy NFS client behavior. In such cases, there would be no way to debug from the Filestore side.

Can we monitor the Filestore with Prometheus, or is there some other method? (Or is Filestore so reliable that we don't need to monitor it?)
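
(If native Prometheus scraping turns out not to be an option, one fallback would be pulling Filestore's Cloud Monitoring metrics and re-exposing them, e.g. via the stackdriver exporter. A minimal sketch with the Cloud Monitoring Python client follows; the metric type string and resource label are my assumptions and should be verified in Metrics Explorer.)

```python
# Sketch: read a Filestore metric from Cloud Monitoring.
# Assumptions: the project ID is a placeholder, and the metric type
# "file.googleapis.com/nfs/server/used_bytes" plus the "instance_name"
# resource label should be double-checked against Metrics Explorer.
import time
from google.cloud import monitoring_v3

project_id = "my-gcp-project"  # placeholder
client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)
series = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = "file.googleapis.com/nfs/server/used_bytes"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    print(ts.resource.labels.get("instance_name"), ts.points[0].value.int64_value)
```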

Would everything be moved to Filestore, or would some nodepools move to Filestore while others would be kept on self-managed NFS?

Should performance be tested before and after a node is moved to Filestore?

@yuvipanda @felder Why did we migrate away from Filestore originally? Cost?

Recommended Linux client mount options

@balajialg
Contributor Author

balajialg commented Nov 1, 2022

This is all I could find through a Google search for case studies - https://cloud.google.com/filestore#section-5. Great questions; I will let the experts answer them and suggest the way forward. @ericvd-ucb Can we consolidate these questions and share the relevant ones with the Filestore folks?

@ryanlovett Do you think it is the best use of our DevOps time to set up a conversation with them, or should we do a preliminary investigation on our end before deciding whether to engage with them further?

@ryanlovett
Collaborator

Do you think it is the best use of our DevOps time to set up a conversation with them, or should we do a preliminary investigation on our end before deciding whether to engage with them further?

That's probably a question for @shaneknapp . :)

@shaneknapp
Contributor

shaneknapp commented Nov 1, 2022 via email

@shaneknapp
Contributor

shaneknapp commented Nov 1, 2022 via email

@yuvipanda
Contributor

@shaneknapp 2i2c-org/infrastructure#764 has info on longer term fixes that are being investigated as well

@shaneknapp
Contributor

shaneknapp commented Nov 1, 2022 via email

@felder
Contributor

felder commented Nov 1, 2022

@ryanlovett yes I believe cost is the primary reason Google Filestore was not fully explored.

https://cloud.google.com/filestore/pricing
vs
https://cloud.google.com/compute/disks-image-pricing#disk

@balajialg balajialg added this to To do in 2022-11 Sprint Board Nov 2, 2022
@balajialg
Contributor Author

balajialg commented Nov 2, 2022

Apparently, Google Filestore was used for the Data 8x hub, and the move to NFS happened due to pandemic-related cost cuts in March 2020. For more details, check out this issue - #1374

@ryanlovett
Collaborator

@balajialg
Contributor Author

balajialg commented Nov 3, 2022

@ryanlovett awesome! It would be great if there is some billing-related info available for the period when those PRs were merged. I will work with @felder (if he has the time) to see if we can model Filestore costs based on our current usage.

@balajialg balajialg moved this from To do to In progress in 2022-11 Sprint Board Nov 3, 2022
@balajialg
Contributor Author

balajialg commented Nov 10, 2022

@shaneknapp @felder Any suggestions on the way forward with the Filestore exploration? Is this something we a) want to pursue and, b) if yes, consider a priority for this semester? I was thinking we could get back to the Google Filestore PM about where we stand before the end of this week. If you all need more time, let me know.

@shaneknapp
Contributor

shaneknapp commented Nov 10, 2022 via email

@balajialg
Contributor Author

@shaneknapp I think we can work on this in parallel! Correct me if I am wrong: we want to evaluate whether the Filestore solution is a) desirable and b) feasible. I can probably work with @felder, or anyone else who has the time, to figure out the feasibility side from a cost perspective. The more important question is whether this solution is desirable at all - whether we want to invest the effort in a pilot. I think you and @felder are best positioned to guide us on this decision (with support from Ryan and Yuvi).

@balajialg
Contributor Author

balajialg commented Nov 18, 2022

Updating with the latest conversation with @felder about Google Filestore. We definitely want to do a pilot implementation of Filestore and, based on that experience, decide whether to transition all our hubs. @shaneknapp has also given a thumbs up to exploring Filestore. We still need to figure out when to scope this work, which we can plan during the December sprint planning meeting.

https://docs.datahub.berkeley.edu/en/latest/admins/storage.html#nfs-client still states that we use Filestore for our Data 8x deployment. This needs to be corrected; I can spend some time updating it ASAP.

@balajialg
Contributor Author

balajialg commented Nov 22, 2022

Yet another update here, based on multiple discussions with the team. @felder and @shaneknapp will do a detailed analysis of Google Filestore and reconvene to discuss their learnings and the path forward. @ryanlovett is also doing his own research on Filestore, as he is considering moving at least the Stat 20 hub from the NFS server to Google Filestore and evaluating whether it resolves some of the NFS challenges. He also has a number of open questions for the team to think about, which he will add to this GitHub thread; they mostly concern the information in this doc - https://cloud.google.com/filestore/docs/creating-instances#instance_type

I have reserved the first half of our sprint planning meeting on Dec 8 to discuss and decide the path forward for moving our hubs to Google Filestore.

@ryanlovett
Collaborator

I'll discuss the Stat 20 aspect at the next meeting, but I definitely want to use Filestore for Spring '23. Some questions:

  1. Should there be per-hub instances, other aggregations like how hub disk/directories are configured now, or one big Filestore?
  2. What service level? Options are Basic HDD, Basic SSD, Enterprise, and High Scale. Basic HDD and Basic SSD are not limited in terms of size, but Enterprise is 1-10TB and High Scale is 10-100TB. Our largest disk consumers are 7-9TB, so Enterprise could be limiting if we preserve the current hub/volume mapping, and Enterprise is also 2x the cost of High Scale. However, High Scale only lets you resize in 2.5TB increments. That's about 10% of current utilization, so maybe those bumps aren't too painful in terms of headroom costs (a rough sketch of the headroom math follows this list). Basic SSD seems very flexible, but is its performance sufficient?
  3. It isn't clear what the reliability differences are between service levels. I'm guessing they use the same NFS versions/implementations so there's probably nothing much to mention.
  4. We should extract and aggregate on server R/W IOPS and R/W throughput from prometheus. Currently we're seeing client figures. Then we can compare apples to apples for their service offerings.
  5. Using Filestore is more expensive so overprovisioning is more painful. We should set reasonable defaults but we'll have to monitor and scale up as time goes on. How will this happen?
  6. Given the cost, we'll have to monitor usage for large consumers, and apply downward pressure. A disk usage policy is very important. Should scaling disk be automated?
  7. What do we do if our NFS clients emit high test_stateid ops even after switching? We would no longer be able to affect this on the server side. We could monitor clients, and choose a service level with a sufficiently high ops/s.
  8. Setting a timeline for switching is important. I'd want to have something in place by the first week of January. I'm fine with deploying this for just Stat 20 if that is too aggressive for the other hubs.
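
A rough sketch of the headroom math from item 2, using only the figures mentioned above (7-9 TB for the largest consumers, roughly 25 TB of total utilization, 2.5 TB resize increments, and a 10 TB High Scale minimum); treat it as illustrative rather than verified pricing or limits:

```python
import math

def high_scale_allocation(used_tb, increment_tb=2.5, minimum_tb=10.0):
    """High Scale resizes in 2.5 TB steps and starts at 10 TB (per this thread)."""
    return max(minimum_tb, math.ceil(used_tb / increment_tb) * increment_tb)

# 7-9 TB: largest single hubs; ~25 TB: rough total utilization implied above.
for used in (7, 9, 25):
    alloc = high_scale_allocation(used)
    headroom = alloc - used
    print(f"{used:>4.1f} TB used -> {alloc:>5.1f} TB allocated "
          f"({headroom:.1f} TB of paid headroom)")
```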

@ryanlovett
Collaborator

Regarding #3, @balajialg quoted the Google rep who said, "Filestore Enterprise [gives] you a HA solution by default." High Scale and Enterprise both support non-disruptive maintenance (instances remain available during maintenance events), while Basic HDD/SSD do not.

https://cloud.google.com/filestore/docs/service-tiers

Regarding #5, the Google rep said, "just pay for the storage consumed", so overprovisioning would not be painful.

@balajialg balajialg removed this from In progress in 2022-11 Sprint Board Nov 28, 2022
@balajialg balajialg added this to To Do in 2022-12 Sprint Board via automation Nov 28, 2022
@balajialg balajialg moved this from To Do to In Progress in 2022-12 Sprint Board Nov 28, 2022
@balajialg balajialg added the enhancement Issues around improving existing functionality label Nov 28, 2022
@felder
Contributor

felder commented Nov 30, 2022

@ryanlovett @balajialg @shaneknapp

Looking at:
https://cloud.google.com/filestore/docs/service-tiers
https://cloud.google.com/filestore/pricing
https://cloud.google.com/filestore/docs/backups
https://cloud.google.com/filestore/docs/snapshots

  1. I'm leaning toward one volume per hub.
  2. High Scale has a 10TB minimum and no support for backups. Enterprise might be good, but it's also the most expensive tier at $0.60/GB per month and is limited to a max of 10TB per volume; that's going to be a problem if we're not more aggressive about limiting/managing storage. Another consideration is the maximum recommended number of clients: our busiest customers exceed the recommended limits for all of the tiers. Lastly, pay special attention to the data recovery options. "Snapshots" look quite undesirable for our use case, and "backups" are only available for the Basic tiers. The naming here is somewhat odd: "backups" seem to function more like persistent disk snapshots, while Filestore "snapshots" do not. Of particular note, deleting a file captured in a "snapshot" does not free the space on the filesystem. I'd really like to talk to someone at Google about the ins and outs of these data recovery options.
  3. Enterprise has regional availability vs zonal for the other tiers, so I'd expect it to be more "available"
  4. True, but at the same time each tier has other features and costs that may be bigger factors in the decision. For example, the lack of data recovery options for the High Scale tier would make it a non-starter IMO.
  5. Good question, I'm more concerned with monitoring and storage size limits than the cost of provisioning.
  6. Agreed, managing storage consumption due to server tier limits and cost is definitely going to be of increased importance. IMO this is already a major, but currently overlooked, consideration for the datahub service in general.
  7. Google support contract? I say that seriously because once we move to a managed service we are essentially handing over control of storage management (and debugging) to them.

@ryanlovett according to the pricing page, you pay for storage that is allocated (not just consumed). If the Google rep said otherwise, that appears to be in conflict with the docs.
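
To put a rough number on the allocated-vs-consumed distinction, here is a quick illustration using the $0.60/GB-per-month Enterprise figure quoted above (reading it as per GiB is my assumption):

```python
# Illustrative only: price taken from the comment above, sizes from this thread.
PRICE_PER_GIB_MONTH = 0.60

def monthly_cost(tib):
    return tib * 1024 * PRICE_PER_GIB_MONTH

allocated_tib = 10   # an Enterprise instance sized at its 10 TiB ceiling
consumed_tib = 7     # roughly what our largest hubs actually use

print(f"Billed on allocation: ${monthly_cost(allocated_tib):,.0f}/month")
print(f"If billed on usage:   ${monthly_cost(consumed_tib):,.0f}/month")
```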

@felder
Contributor

felder commented Nov 30, 2022

There is also this service:
https://cloud.google.com/filestore/docs/multishares

@felder
Contributor

felder commented Dec 1, 2022

Also came across this:
https://cloud.google.com/community/tutorials/gke-filestore-dynamic-provisioning

Could this be used to provision per-student PVCs of a fixed size?
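
Possibly. As a thought experiment, creating a fixed-size, per-student claim against a Filestore-backed StorageClass could look roughly like the sketch below (the StorageClass name, namespace, and sizes are placeholders; whether a fixed size actually enforces a quota depends on the provisioner described in that tutorial):

```python
# Sketch: a fixed-size per-student PVC, assuming a Filestore-backed
# StorageClass named "filestore-sc" already exists (hypothetical name).
from kubernetes import client, config

def create_student_pvc(username: str, namespace: str, size: str = "10Gi"):
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name=f"home-{username}"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],        # NFS-style shared access
            storage_class_name="filestore-sc",     # hypothetical StorageClass
            resources=client.V1ResourceRequirements(requests={"storage": size}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace, pvc)

# e.g. create_student_pvc("jdoe", "datahub-staging")  # placeholder names
```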

@felder felder closed this as completed Dec 1, 2022
2022-12 Sprint Board automation moved this from In Progress to Done Dec 1, 2022
@felder felder reopened this Dec 1, 2022
2022-12 Sprint Board automation moved this from Done to In Progress Dec 1, 2022
@ericvd-ucb
Contributor

We could potentially talk to GCP folks about some of Ryan's and Jon's questions above, if it's helpful - maybe on Friday?

@balajialg
Contributor Author

balajialg commented Dec 12, 2022

Next Steps from Sprint Planning Meeting:

  • Ask the Filestore folks about the nature of backups for GFS services
  • Create a visual representation of GFS-managed services and of our decision to go with a specific plan
  • Set up Filestore for the a11y hub this week

Useful docs:

@balajialg
Contributor Author

balajialg commented Dec 19, 2022

  • Estimate the total Filestore cost and evaluate whether it aligns with leadership's expectations.

@shaneknapp
Contributor

shaneknapp commented Dec 19, 2022

```
224M	a11y
821G	astro
4.4T	biology
32G	cee
868K	cs194
163G	data101
240G	data102
347G	dlab
1.4T	eecs
2.7G	highschool
12G	julia
30G	prob140
30M	shiny
1.4T	stat159
281G	stat20
440K	stat89a
17G	workshop
408K	xfsconfig
```

@ryanlovett
Collaborator

@shaneknapp There's no stat89a deployment, and it looks like it won't be taught next spring either, so you can skip that one.

It might be possible to skip shiny as well if the R hub rebuild fixes shiny-related issues (potentially fixed by repo2docker update). It wasn't used much in Fall. (Cc @ericvd-ucb)

@balajialg
Contributor Author

balajialg commented Dec 20, 2022

Thanks, @shaneknapp, for the detailed storage report. Super insightful.

@shaneknapp @felder @ryanlovett I have a few questions related to our strategy for Filestore creation. The spirit of these questions is how we can be good stewards of RTL's extra $5k-per-month grant for our cloud usage. None of the points below apply to our major hubs: Datahub, R hub, I School, Stat 20, Biology, EECS, Public Health, Data 8, and Data 100.

Mini Filestore: Does it make sense to have a shared Filestore instance for all the small hubs (by storage), like a11y, CEE, CS 194, High School, Julia, Prob 140, Stat 89a, Shiny, and Workshop?
No Filestore: Do we even need a shared Filestore for hubs like a11y, Shiny, Julia, High School, and Workshop? They are not actively used and their usage is seasonal. Based on what I heard from the CEE, D-Lab, and Econ 140 instructors, most users of these occasionally used hubs had a good experience with Datahub this semester.
Medium Filestore: I understand that we want to isolate the major hubs from each other, but how about a shared Filestore for medium-storage hubs like Data 101, Data 102, and D-Lab, which each use less than 350 GB? What are the benefits and pitfalls of this approach? How risky would it be?
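
To make the grouping concrete, here is a quick bucketing sketch; the sizes are copied from the storage report above, and the 50 GB cut-off for "small" is an arbitrary threshold of mine, not policy:

```python
# Per-hub usage (GB) copied from the storage report earlier in this thread.
SIZES_GB = {
    "a11y": 0.22, "cee": 32, "cs194": 0.001, "highschool": 2.7, "julia": 12,
    "prob140": 30, "shiny": 0.03, "stat89a": 0.0004, "workshop": 17,
    "data101": 163, "data102": 240, "dlab": 347,
}

small = {h: s for h, s in SIZES_GB.items() if s < 50}          # "mini" candidates
medium = {h: s for h, s in SIZES_GB.items() if 50 <= s < 350}  # "medium" candidates

print(f"mini filestore:   {sorted(small)} (~{sum(small.values()):.0f} GB total)")
print(f"medium filestore: {sorted(medium)} (~{sum(medium.values()):.0f} GB total)")
```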

@ryanlovett
Collaborator

I'll defer to @shaneknapp and @felder about conserving filestore spend.

IMO, hubs that have a lot of users and/or I/O activity should be on separate Filestore instances regardless of how much space they're using; it is the I/O burden that we want to keep separate. I believe that on some storage tiers larger instances perform better, which would be one reason to commingle deployments on the same Filestore. If we had more first-hand experience and data on performance and reliability, I might change my mind.

@balajialg balajialg removed this from In Progress in 2022-12 Sprint Board Dec 22, 2022
@shaneknapp
Contributor

completed syncs: astro, biology, data100, eecs, stat159, stat20
currently syncing: data8, datahub, ischool
remaining to be synced: cee, data101, data102, dlab, prob140

@shaneknapp
Contributor

re conserving filestore spend. i'd much rather stick to the 1:1 ratio of course->hub->filestore. IO ops, compartmentalizing failures, etc.

@shaneknapp
Contributor

completed syncs: astro, biology, data100, eecs, stat159, stat20 currently syncing: data8, datahub, ischool remaining to be synced: cee, data101, data102, dlab, prob140

this is done, and i scaled up a bunch of instances.

see also https://docs.google.com/spreadsheets/d/1rj-iCpcHBcA_lUT7NXrOJaTpT2Le4fcb5Tank8D0ICQ/edit?usp=sharing

@balajialg
Contributor Author

balajialg commented Jan 4, 2023

Adding information about the courses using the smaller hubs and the student enrollment count to help with decision-making related to filestore allocation.

Smaller hubs (number of courses using the hub, total students enrolled in those courses, periodicity of usage):

  • Julia hub (No course, periodic)
  • Shiny hub (1 course, 350 students, periodic)
  • a11y hub (No course, periodic)
  • High school hub (1 program, 23 students, periodic)
  • Workshop hub (1 DSUS workshop, <100 participants, periodic)
  • CEE hub (1 course, 70 students, periodic)

@ryanlovett
Collaborator

The shiny hub was necessary when there was an issue with RStudio/R/the R graphics API/Shiny on the R hub; hopefully that will go away with various version toggling. Though it was created with a separate set of home directories, my opinion is that when it is backed by Filestore it should use the same Filestore instance and node pool as the R hub. export/homedirs-other-2020-07-29/shiny need not be copied anywhere.

Also, shiny hub was not used very much. Only a few people logged into it.

@balajialg
Contributor Author

Agreed, @ryanlovett! Most of the above-mentioned hubs had very few people logging in during Fall 2022.

@shaneknapp
Contributor

#4072
#4073
#4075

all identified courses should be migrated to their own nodepool + GFS instance (except R hub, which shares infra w/datahub)

@balajialg
Contributor Author

@shaneknapp Closing this issue as you all have completed the pending tasks. Please feel free to reopen if there is anything else to be tracked.
