
Explore Google File Store as a replacement for NFS! #3898

Closed
3 tasks done
balajialg opened this issue Oct 31, 2022 · 37 comments
Labels
enhancement Issues around improving existing functionality

@balajialg
Contributor

balajialg commented Oct 31, 2022

We have been facing NFS issues due to a race condition in the Linux kernel that is hard to troubleshoot and has resulted in a few outages over the past month. During our last team meeting, some of us were interested in exploring Google Filestore (GFS) as an alternative to our self-managed NFS. @ericvd-ucb kindly agreed to do the outreach and used his contacts to start a conversation with the Google Filestore folks about GFS instead of NFS. Their point of contact outlined the following points in their response:

What you described is common when customers run NFS themselves. These consistency issues are hard to troubleshoot. We do have many customers running Filestore with multiple directories that in turn serve multiple users. The benefit of using a managed NFS solution like Filestore is that you don't have to manage NFS and simply get it out of the box. Filestore has multiple tiers (we recommended Filestore Enterprise to give you an HA solution by default), but you can also choose Basic if you like. You only pay for the storage you consume (as opposed to running NFS yourself, where you are probably consuming compute from VMs and storage from PDs). The one thing to watch out for with Filestore Enterprise (based on what we hear from other customers) is the entry point of 1 TiB. You can of course consume the space by placing the directories of multiple users in the same Filestore instance, driving up utilization. In case you want isolation between users, you can also use multishares that share the underlying Filestore instance and drive up utilization. Outside this specific concern of the minimum entry size (which you can work around with the solutions shared above), you get regionally backed storage, managed NFS, pay only for the storage consumed, and many customers use it at scale.

We need to evaluate whether what they proposed above is something we are interested in exploring from a technical standpoint.

From my limited understanding: I looked at our billing report for October 2022 and found that their Enterprise tier (~$600 per month for 10 TiB) is roughly on par, per TiB, with what we are spending on PD + snapshots (~$4100 per month for 70 TiB). I am assuming I didn't miss anything in this calculation, but please correct me if my interpretation is wrong.

[screenshot: October 2022 billing report]
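
For reference, the per-TiB arithmetic behind that comparison (a minimal sketch that uses only the two figures quoted above; no other pricing is assumed):

```python
# Back-of-the-envelope comparison using the figures quoted above.
filestore_enterprise_per_tib = 600 / 10    # ~$600/month for 10 TiB
pd_plus_snapshots_per_tib    = 4100 / 70   # ~$4100/month for 70 TiB

print(f"Filestore Enterprise: ${filestore_enterprise_per_tib:.0f} per TiB/month")
print(f"PD + snapshots:       ${pd_plus_snapshots_per_tib:.0f} per TiB/month")
# Both come out near $60 per TiB/month, which is why the two look "on par"
# per TiB even though the absolute monthly totals differ at current capacity.
```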

To Do

  • Evaluate whether GFS fits our needs
  • Deploy GFS on a pilot hub
  • If the pilot is successful, transition all hubs from NFS to GFS
@balajialg balajialg changed the title Explore Google File Store as a replacement for NFS Server! Explore Google File Store as a replacement for NFS! Oct 31, 2022
@ryanlovett
Collaborator

Thanks for looking into this @balajialg !

Are there published reports about real-world use of Filestore and its reliability? Our nodes would still talk NFS to the Filestore, and there could still be buggy NFS client behavior. In such cases, there would be no way to debug from the Filestore side.

Can we monitor the Filestore with Prometheus, or is there some other method? (Or is Filestore so reliable that we don't need to monitor it?)
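
(If native Prometheus scraping turns out not to be an option, one fallback would be pulling Filestore's Cloud Monitoring metrics and re-exposing them, e.g. via the stackdriver exporter. A minimal sketch with the Cloud Monitoring Python client follows; the metric type string and resource label are my assumptions and should be verified in Metrics Explorer.)

```python
# Sketch: read a Filestore metric from Cloud Monitoring.
# Assumptions: the project ID is a placeholder, and the metric type
# "file.googleapis.com/nfs/server/used_bytes" plus the "instance_name"
# resource label should be double-checked against Metrics Explorer.
import time
from google.cloud import monitoring_v3

project_id = "my-gcp-project"  # placeholder
client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)
series = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = "file.googleapis.com/nfs/server/used_bytes"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    print(ts.resource.labels.get("instance_name"), ts.points[0].value.int64_value)
```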

Would everything be moved to Filestore, or would some nodepools move to Filestore while others would be kept on self-managed NFS?

Should performance be tested before and after a node is moved to Filestore?

@yuvipanda @felder Why did we migrate away from Filestore originally? Cost?

Recommended Linux client mount options

@balajialg
Contributor Author

balajialg commented Nov 1, 2022

This is all I could find through a Google search for case studies - https://cloud.google.com/filestore#section-5. Great questions; I will let the experts answer them and suggest the way forward. @ericvd-ucb Can we consolidate these questions and share the relevant ones with the Filestore folks?

@ryanlovett Do you think it is the best use of our DevOps time to set up a conversation with them, or should we do a preliminary investigation on our end before deciding whether to engage with them further?

@ryanlovett
Collaborator

Do you think it is the best use of our DevOps time to set up a conversation with them, or should we do a preliminary investigation on our end before deciding whether to engage with them further?

That's probably a question for @shaneknapp . :)

@shaneknapp
Contributor

shaneknapp commented Nov 1, 2022 via email

@shaneknapp
Contributor

shaneknapp commented Nov 1, 2022 via email

@yuvipanda
Contributor

@shaneknapp 2i2c-org/infrastructure#764 has info on longer term fixes that are being investigated as well

@shaneknapp
Contributor

shaneknapp commented Nov 1, 2022 via email

@felder
Contributor

felder commented Nov 1, 2022

@ryanlovett yes I believe cost is the primary reason Google Filestore was not fully explored.

https://cloud.google.com/filestore/pricing
vs
https://cloud.google.com/compute/disks-image-pricing#disk

@balajialg balajialg added this to To do in 2022-11 Sprint Board Nov 2, 2022
@balajialg
Contributor Author

balajialg commented Nov 2, 2022

Apparently, Google Filestore was used for the Data 8x hub, and the move to NFS happened due to pandemic-related cost cuts in March 2020. For more details, check out this issue - #1374

@ryanlovett
Collaborator

@balajialg
Contributor Author

balajialg commented Nov 3, 2022

@ryanlovett awesome! It would be great if there is some billing-related info available for the period when those PRs were merged. I will work with @felder (if he has the time) to see if we can model Filestore costs based on our current usage.

@balajialg balajialg moved this from To do to In progress in 2022-11 Sprint Board Nov 3, 2022
@balajialg
Contributor Author

balajialg commented Nov 10, 2022

@shaneknapp @felder Any suggestions on the way forward with the Filestore exploration? Is this something we a) want to pursue and, b) if yes, consider a priority for this semester? I was thinking we could get back to the Google Filestore PM about where we stand before the end of this week. If you all need more time, let me know.

@shaneknapp
Contributor

shaneknapp commented Nov 10, 2022 via email

@balajialg
Contributor Author

@shaneknapp I think we can work on this in parallel! Correct me if I am wrong: we want to evaluate whether the Filestore solution is a) desirable and b) feasible. I can probably work with @felder, or anyone else who has the time, to figure out the feasibility side from a cost perspective. The more important question is whether this solution is desirable at all - whether we want to invest the effort in a pilot. I think you and @felder are best positioned to guide us on this decision (with support from Ryan and Yuvi).

@balajialg
Contributor Author

balajialg commented Nov 18, 2022

Updating with the latest conversation with @felder about Google Filestore. We definitely want to do a pilot implementation of Filestore and, based on that experience, decide whether to transition all our hubs. @shaneknapp has also given a thumbs up to exploring Filestore. We still need to figure out when to scope this work, which we can plan during the December sprint planning meeting.

https://docs.datahub.berkeley.edu/en/latest/admins/storage.html#nfs-client still states that we use Filestore for our Data 8x deployment. This needs to be corrected; I can spend some time updating it ASAP.

@balajialg
Contributor Author

balajialg commented Nov 22, 2022

Yet another update here, based on multiple discussions with the team. @felder and @shaneknapp will do a detailed analysis of Google Filestore and reconvene to discuss their learnings and the path forward. @ryanlovett is also doing his own research on Filestore, as he is considering moving at least the Stat 20 hub from the NFS server to Google Filestore and evaluating whether it resolves some of the NFS challenges. He also has a number of open questions for the team to think about, which he will add to this GitHub thread; they mostly concern the information in this doc - https://cloud.google.com/filestore/docs/creating-instances#instance_type

I have reserved the first half of our sprint planning meeting on Dec 8 to discuss and decide the path forward for moving our hubs to Google Filestore.

@ryanlovett
Collaborator

I'll discuss the Stat 20 aspect at the next meeting, but I definitely want to use Filestore for Spring '23. Some questions:

  1. Should there be per-hub instances, other aggregations like how hub disk/directories are configured now, or one big Filestore?
  2. What service level? Options are Basic HDD, Basic SSD, Enterprise, and High Scale. Basic HDD and Basic SSD are not limited in terms of size, but Enterprise is 1-10TB and High Scale is 10-100TB. Our largest disk consumers are 7-9TB, so Enterprise could be limiting if we preserve the current hub/volume mapping, and Enterprise is also 2x the cost of High Scale. However, High Scale only lets you resize in 2.5TB increments. That's about 10% of current utilization, so maybe those bumps aren't too painful in terms of headroom costs (a rough sketch of the headroom math follows this list). Basic SSD seems very flexible, but is its performance sufficient?
  3. It isn't clear what the reliability differences are between service levels. I'm guessing they use the same NFS versions/implementations so there's probably nothing much to mention.
  4. We should extract and aggregate on server R/W IOPS and R/W throughput from prometheus. Currently we're seeing client figures. Then we can compare apples to apples for their service offerings.
  5. Using Filestore is more expensive so overprovisioning is more painful. We should set reasonable defaults but we'll have to monitor and scale up as time goes on. How will this happen?
  6. Given the cost, we'll have to monitor usage for large consumers, and apply downward pressure. A disk usage policy is very important. Should scaling disk be automated?
  7. What do we do if our NFS clients emit high test_stateid ops even after switching? We would no longer be able to affect this on the server side. We could monitor clients, and choose a service level with a sufficiently high ops/s.
  8. Setting a timeline for switching is important. I'd want to have something in place by the first week of January. I'm fine with deploying this for just Stat 20 if that is too aggressive for the other hubs.
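
A rough sketch of the headroom math from item 2, using only the figures mentioned above (7-9 TB for the largest consumers, roughly 25 TB of total utilization, 2.5 TB resize increments, and a 10 TB High Scale minimum); treat it as illustrative rather than verified pricing or limits:

```python
import math

def high_scale_allocation(used_tb, increment_tb=2.5, minimum_tb=10.0):
    """High Scale resizes in 2.5 TB steps and starts at 10 TB (per this thread)."""
    return max(minimum_tb, math.ceil(used_tb / increment_tb) * increment_tb)

# 7-9 TB: largest single hubs; ~25 TB: rough total utilization implied above.
for used in (7, 9, 25):
    alloc = high_scale_allocation(used)
    headroom = alloc - used
    print(f"{used:>4.1f} TB used -> {alloc:>5.1f} TB allocated "
          f"({headroom:.1f} TB of paid headroom)")
```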

@ryanlovett
Collaborator

Regarding #3, @balajialg quoted the Google rep who said, "Filestore Enterprise [gives] you a HA solution by default." High Scale and Enterprise both support non-disruptive maintenance (instances remain available during maintenance events), while Basic HDD/SSD do not.

https://cloud.google.com/filestore/docs/service-tiers

Regarding #5, the Google rep said, "just pay for the storage consumed", so overprovisioning would not be painful.

@balajialg balajialg removed this from In progress in 2022-11 Sprint Board Nov 28, 2022
@balajialg balajialg added this to To Do in 2022-12 Sprint Board via automation Nov 28, 2022
@balajialg balajialg moved this from To Do to In Progress in 2022-12 Sprint Board Nov 28, 2022
@balajialg balajialg added the enhancement Issues around improving existing functionality label Nov 28, 2022
@felder
Contributor

felder commented Nov 30, 2022

@ryanlovett @balajialg @shaneknapp

Looking at:
https://cloud.google.com/filestore/docs/service-tiers
https://cloud.google.com/filestore/pricing
https://cloud.google.com/filestore/docs/backups
https://cloud.google.com/filestore/docs/snapshots

  1. I'm leaning toward one volume per hub.
  2. High Scale has a 10TB minimum and no support for backups. Enterprise might be good, but it's also the most expensive tier at $0.60/GB per month and is limited to a max of 10TB per volume; that's going to be a problem if we're not more aggressive about limiting/managing storage. Another consideration is the maximum recommended number of clients: our busiest customers exceed the recommended limits for all of the tiers. Lastly, pay special attention to the data recovery options. "Snapshots" look quite undesirable for our use case, and "backups" are only available for the Basic tiers. The naming here is somewhat odd: "backups" seem to function more like persistent disk snapshots, while Filestore "snapshots" do not. Of particular note, deleting a file captured in a "snapshot" does not free the space on the filesystem. I'd really like to talk to someone at Google about the ins and outs of these data recovery options.
  3. Enterprise has regional availability vs zonal for the other tiers, so I'd expect it to be more "available"
  4. True, but at the same time each tier has other features and costs that may be bigger factors in the decision. For example, the lack of data recovery options for the High Scale tier would make it a non-starter IMO.
  5. Good question, I'm more concerned with monitoring and storage size limits than the cost of provisioning.
  6. Agreed, managing storage consumption due to server tier limits and cost is definitely going to be of increased importance. IMO this is already a major, but currently overlooked, consideration for the datahub service in general.
  7. Google support contract? I say that seriously because once we move to a managed service we are essentially handing over control of storage management (and debugging) to them.

@ryanlovett according to the pricing page, you pay for storage that is allocated (not just consumed). If the Google rep said otherwise, that appears to be in conflict with the docs.
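
To put a rough number on the allocated-vs-consumed distinction, here is a quick illustration using the $0.60/GB-per-month Enterprise figure quoted above (reading it as per GiB is my assumption):

```python
# Illustrative only: price taken from the comment above, sizes from this thread.
PRICE_PER_GIB_MONTH = 0.60

def monthly_cost(tib):
    return tib * 1024 * PRICE_PER_GIB_MONTH

allocated_tib = 10   # an Enterprise instance sized at its 10 TiB ceiling
consumed_tib = 7     # roughly what our largest hubs actually use

print(f"Billed on allocation: ${monthly_cost(allocated_tib):,.0f}/month")
print(f"If billed on usage:   ${monthly_cost(consumed_tib):,.0f}/month")
```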

@felder
Contributor

felder commented Nov 30, 2022

There is also this service:
https://cloud.google.com/filestore/docs/multishares

@felder
Contributor

felder commented Dec 1, 2022

Also came across this:
https://cloud.google.com/community/tutorials/gke-filestore-dynamic-provisioning

Could this be used to provision per-student PVCs of a fixed size?
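
Possibly. As a thought experiment, creating a fixed-size, per-student claim against a Filestore-backed StorageClass could look roughly like the sketch below (the StorageClass name, namespace, and sizes are placeholders; whether a fixed size actually enforces a quota depends on the provisioner described in that tutorial):

```python
# Sketch: a fixed-size per-student PVC, assuming a Filestore-backed
# StorageClass named "filestore-sc" already exists (hypothetical name).
from kubernetes import client, config

def create_student_pvc(username: str, namespace: str, size: str = "10Gi"):
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name=f"home-{username}"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],        # NFS-style shared access
            storage_class_name="filestore-sc",     # hypothetical StorageClass
            resources=client.V1ResourceRequirements(requests={"storage": size}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace, pvc)

# e.g. create_student_pvc("jdoe", "datahub-staging")  # placeholder names
```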

@felder felder closed this as completed Dec 1, 2022
2022-12 Sprint Board automation moved this from In Progress to Done Dec 1, 2022
@felder felder reopened this Dec 1, 2022
2022-12 Sprint Board automation moved this from Done to In Progress Dec 1, 2022
@ericvd-ucb
Contributor

We could potentially talk to GCP folks about some of Ryan's and Jon's questions above, if it's helpful - maybe on Friday?

@balajialg
Contributor Author

balajialg commented Dec 12, 2022

Next Steps from Sprint Planning Meeting:

  • Ask the Filestore folks about the nature of backups for GFS services
  • Create a visual representation of GFS-managed services and of our decision to go with a specific plan
  • Set up Filestore for the a11y hub this week

Useful docs:

@balajialg
Contributor Author

balajialg commented Dec 19, 2022

  • Estimate the total Filestore cost and evaluate whether it aligns with leadership's expectations.

@shaneknapp
Contributor

shaneknapp commented Dec 19, 2022

```
224M	a11y
821G	astro
4.4T	biology
32G	cee
868K	cs194
163G	data101
240G	data102
347G	dlab
1.4T	eecs
2.7G	highschool
12G	julia
30G	prob140
30M	shiny
1.4T	stat159
281G	stat20
440K	stat89a
17G	workshop
408K	xfsconfig
```

@ryanlovett
Collaborator

@shaneknapp There's no stat89a deployment, and it looks like it won't be taught next spring either, so you can skip that one.

It might be possible to skip shiny as well if the R hub rebuild fixes shiny-related issues (potentially fixed by repo2docker update). It wasn't used much in Fall. (Cc @ericvd-ucb)

@balajialg
Contributor Author

balajialg commented Dec 20, 2022

Thanks, @shaneknapp, for the detailed storage report. Super insightful.

@shaneknapp @felder @ryanlovett I have a few questions related to our strategy for Filestore creation. The spirit of these questions is how we can be good stewards of RTL's extra $5k-per-month grant for our cloud usage. None of the points below apply to our major hubs: Datahub, R hub, I School, Stat 20, Biology, EECS, Public Health, Data 8, and Data 100.

Mini Filestore: Does it make sense to have a shared Filestore instance for all the small hubs (by storage), like a11y, CEE, CS 194, High School, Julia, Prob 140, Stat 89a, Shiny, and Workshop?
No Filestore: Do we even need a shared Filestore for hubs like a11y, Shiny, Julia, High School, and Workshop? They are not actively used and their usage is seasonal. Based on what I heard from the CEE, D-Lab, and Econ 140 instructors, most users of these occasionally used hubs had a good experience with Datahub this semester.
Medium Filestore: I understand that we want to isolate the major hubs from each other, but how about a shared Filestore for medium-storage hubs like Data 101, Data 102, and D-Lab, which each use less than 350 GB? What are the benefits and pitfalls of this approach? How risky would it be?
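
To make the grouping concrete, here is a quick bucketing sketch; the sizes are copied from the storage report above, and the 50 GB cut-off for "small" is an arbitrary threshold of mine, not policy:

```python
# Per-hub usage (GB) copied from the storage report earlier in this thread.
SIZES_GB = {
    "a11y": 0.22, "cee": 32, "cs194": 0.001, "highschool": 2.7, "julia": 12,
    "prob140": 30, "shiny": 0.03, "stat89a": 0.0004, "workshop": 17,
    "data101": 163, "data102": 240, "dlab": 347,
}

small = {h: s for h, s in SIZES_GB.items() if s < 50}          # "mini" candidates
medium = {h: s for h, s in SIZES_GB.items() if 50 <= s < 350}  # "medium" candidates

print(f"mini filestore:   {sorted(small)} (~{sum(small.values()):.0f} GB total)")
print(f"medium filestore: {sorted(medium)} (~{sum(medium.values()):.0f} GB total)")
```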

@ryanlovett
Collaborator

I'll defer to @shaneknapp and @felder about conserving filestore spend.

IMO, hubs that have a lot of users and/or I/O activity should be on separate Filestore instances regardless of how much space they're using; it is the I/O burden that we want to keep separate. I believe that on some storage tiers larger instances perform better, which would be one reason to commingle deployments on the same Filestore. If we had more first-hand experience and data on performance and reliability, I might change my mind.

@balajialg balajialg removed this from In Progress in 2022-12 Sprint Board Dec 22, 2022
@shaneknapp
Contributor

completed syncs: astro, biology, data100, eecs, stat159, stat20
currently syncing: data8, datahub, ischool
remaining to be synced: cee, data101, data102, dlab, prob140

@shaneknapp
Contributor

re conserving filestore spend. i'd much rather stick to the 1:1 ratio of course->hub->filestore. IO ops, compartmentalizing failures, etc.

@shaneknapp
Contributor

completed syncs: astro, biology, data100, eecs, stat159, stat20 currently syncing: data8, datahub, ischool remaining to be synced: cee, data101, data102, dlab, prob140

this is done, and i scaled up a bunch of instances.

see also https://docs.google.com/spreadsheets/d/1rj-iCpcHBcA_lUT7NXrOJaTpT2Le4fcb5Tank8D0ICQ/edit?usp=sharing

@balajialg
Contributor Author

balajialg commented Jan 4, 2023

Adding information about the courses using the smaller hubs and the student enrollment count to help with decision-making related to filestore allocation.

Smaller hubs (number of courses using the hub, total students enrolled in those courses, periodicity of usage):

  • Julia hub (No course, periodic)
  • Shiny hub (1 course, 350 students, periodic)
  • a11y hub (No course, periodic)
  • High school hub (1 program, 23 students, periodic)
  • Workshop hub (1 DSUS workshop, <100 participants, periodic)
  • CEE hub (1 course, 70 students, periodic)

@ryanlovett
Collaborator

The shiny hub was necessary when there was an issue with RStudio/R/the R graphics API/Shiny on the R hub; hopefully that will go away with various version toggling. Though it was created with a separate set of home directories, my opinion is that when it is backed by Filestore it should use the same Filestore instance and node pool as the R hub. export/homedirs-other-2020-07-29/shiny need not be copied anywhere.

Also, shiny hub was not used very much. Only a few people logged into it.

@balajialg
Contributor Author

Agreed, @ryanlovett! Most of the above-mentioned hubs had very few people logging in during Fall 2022.

@shaneknapp
Contributor

#4072
#4073
#4075

all identified courses should be migrated to their own nodepool + GFS instance (except R hub, which shares infra w/datahub)

@balajialg
Contributor Author

@shaneknapp Closing this issue as you all have completed the pending tasks. Please feel free to reopen if there is anything else to be tracked.
