This repository has been archived by the owner on May 10, 2024. It is now read-only.

boto3 is not a replacement of boto (yet?) #3306

Open
vlcinsky opened this issue Aug 18, 2015 · 12 comments

Comments

@vlcinsky

vlcinsky commented Aug 18, 2015

I am a long-term boto user (my first boto issue was #343) and I also like learning new things. Seeing botocore and boto3 reach version 1.0 (boto/botocore#586), I felt I should celebrate.
Today I see that my expectations were far too high.
Even worse, I see that Python programmers are currently losing a usable AWS API.

boto status

Positive:

  • proven solution
  • large installed base (for years it was the only Python API for AWS)
  • very usable API
  • existing tutorials explaining most use cases

Negative:

  • last release in April (4 months ago)
  • a couple of serious problems remain unresolved (e.g. issue "Can't use bucket names with dots", #2836)
  • the number of ignored PRs is growing; the last one was handled in early July, over a month ago (as reported by "Any maintainers?", #3302)
  • README.rst states that we should move to boto3, as boto is planned to be phased out (soon)

boto3 status

Positive:

  • aims to support the complete AWS API
  • the great awscli is built on top of botocore, and boto3 is built on botocore too.

Negative:

  • too fresh (the fact that it claims to be at version 1.1.1 has no real value)
  • very few tutorials (no comparison with boto). E.g. I would expect a quick-start example showing how to read the content of an AWS S3 object. I did not find one (is it even present somewhere?)
  • some use cases are more complex than in boto, even in the tutorials. E.g. see https://boto3.readthedocs.org/en/latest/guide/migrations3.html#accessing-a-bucket - why should a boto3 client deal with the HTTP status codes of a HEAD request to check whether a bucket exists?
  • boto3 is not Pythonic (it can serve as an example of how a Python API should not look). See the boto3 issue "[Feedback] API is not really pythonic" (boto3#112).
    • almost no docstrings. As the code is mostly auto-generated from a JSON configuration file, one can understand why there are no docstrings, but that does not help at the moment one wants to use a command on the interactive console and is in doubt about how to do something.
    • abused *args and **kwargs: far too many functions use **kwargs. Using those functions often means that some essential argument must be passed via kwargs, but it is hard to find a working reference with a usable description.
    • too many ways to do basic tasks make usage far too complex. Having client and resource (combined with session) makes it difficult to decide which one to use. The proper approach would be to use resource as much as possible (as boto3 is a higher-level library which should hide lower-level complexities) and fall back to client only in corner cases.
    • many options expect a dictionary argument which must contain certain keys. Apart from this being an anti-pattern, the user is often left without instructions on which keys to pass in.
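As an illustration of the two S3 tasks mentioned in the list above (reading an object's content and checking whether a bucket exists), here is a minimal sketch using the low-level client. `get_object`, `head_bucket` and `botocore.exceptions.ClientError` are real boto3/botocore names; the helper names `read_object` and `bucket_exists`, the bucket and key names in the comments, and the stand-in `ClientError` class (used only when botocore is not installed) are assumptions made for this example.

```python
try:
    from botocore.exceptions import ClientError
except ImportError:
    # Stand-in so the sketch can run without botocore installed (assumption).
    class ClientError(Exception):
        def __init__(self, error_response, operation_name):
            super().__init__(operation_name)
            self.response = error_response
            self.operation_name = operation_name


def read_object(s3_client, bucket, key):
    """Return the full content of an S3 object as bytes."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # "Body" is a file-like streaming object; read() pulls it into memory.
    return response["Body"].read()


def bucket_exists(s3_client, bucket):
    """Return True if the bucket exists, False on a 404; re-raise anything else."""
    try:
        s3_client.head_bucket(Bucket=bucket)
    except ClientError as exc:
        code = exc.response.get("Error", {}).get("Code")
        if code in ("404", "NoSuchBucket"):
            return False
        raise  # e.g. 403 (access denied) is not the same as "missing"
    return True


# Hypothetical usage against real AWS (needs credentials and network):
#   import boto3
#   s3 = boto3.client("s3")
#   if bucket_exists(s3, "my-example-bucket"):
#       print(read_object(s3, "my-example-bucket", "hello.txt"))
```

Passing the client in as an argument keeps the helpers easy to exercise with a fake object, which is handy given how hard the generated code is to inspect.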

Python programmer perspective: losing API to AWS

My old boto based codebase seems to be endangered. Am I really supposed to rewrite it all into boto3?

Writing solutions in boto3 is very difficult: it takes several times longer to do rather simple tasks, and in some cases it is not even possible.

The cost of maintaining boto3 code is also higher: the developer is forced to understand rather complex rules and requirements, which is not the case with boto. The lack of easily accessible tutorials and documentation at my fingertips makes the work really hard and unpleasant.

Proposed actions for boto

  • accept the fact, boto3 is still in beta phase and is not ready for production.
  • consider giving higher priority to maintaining boto, for the following reasons:
    • boto3 is still not an option for production
    • even if boto3 reaches production readiness, the old codebase exists and deserves a longer period of support (I would say 2 years seems good).
  • modify README.rst in boto to reflect the changed perspective.

Proposed actions for boto3

(this is a bit out of scope of this issue as we are in boto issue tracker)

  • take the boto tutorial (included in the docs) and rewrite all examples for boto3.
  • find all cases where boto3 requires the use of client and treat them as gaps in the higher-level API. Rewrite boto3 to provide the required functionality from the resource point of view.
  • Resolve docstrings somehow. Docstrings should in most cases provide all information about required arguments without resorting to Google.

An alternative solution would be to stop enhancing boto3 and spend the saved effort on boto maintenance.

@kyleknap
Contributor

Thank you for your comments. We truly appreciate feedback from the users of our libraries. We will have a team discussion to come up with a plan to address your concerns and get back to you shortly.

@garnaat
Member

garnaat commented Aug 19, 2015

I want to thank @vlcinsky for the thoughtful criticism. I agree with most of your points.

I also want to make it clear that while I am still an active user of boto/botocore/boto3 I am not involved in the day to day development. That is now handled by the AWS team and I think they are doing a great job. So, what follows is simply my opinion, not any official statement from AWS.

<tl;dr>
I feel strongly that the majority of effort should be devoted to making boto3 better. The boto library should continue to be supported for the foreseeable future. Serious bugs should be fixed but the priority should be on improving the developer experience in boto3.
</tl;dr>

To understand why I feel this is the right prioritization, you have to understand the fundamental differences between boto and boto3.

boto

The boto library is hand coded. In the early days, the number of services available was small and I would get early access to the services from AWS. I would study the API guide and begin hand-coding the interface to the new service. I would then start using the service and develop a higher-level, more Pythonic interface to the service. This was a time-consuming process but resulted in (I think) a reasonably well-thought out Python library for early services like S3, EC2, SQS, etc.

However, as the number of services grew, this became more and more difficult to achieve. Plus, services became more and more specialized and it was impossible to be an expert in each of these services. So the higher-level interfaces would either be done by someone else in the community (thanks!) or it just wouldn't get done at all. So, some services (e.g. EC2) had lovingly crafted APIs while others had very rudimentary APIs (e.g. SNS).

Exacerbating the problem was the fact that existing services did not stand still. They evolved. Quickly. And those changes had to be merged into the boto code base. And the more customization involved in the boto module, the harder it was to merge changes, especially large changes, in the APIs. Hence, ugly warts emerged such as boto.rds and boto.rds2 and boto.dynamodb and boto.dynamodb2.

As the pace of innovation at AWS continues to increase, it is painfully clear to me that boto is collapsing under the weight of technical debt. Way too much code, way too much surface area. It's simply becoming unmaintainable. It hurts me to say that but it is the truth.

boto3

When we began development of the AWSCLI, we wanted to address this issue of technical debt and maintainability of the low-level code. We wanted to achieve this not only for AWSCLI but also for the nascent boto3 project.

To do this, we created botocore which is a data-driven library that provides a very uniform, low-level interface to all Amazon services. And the best part was that the low-level interface was driven directly from a JSON description of the service which was, in turn, generated directly from the canonical description of the service in AWS. So, we could automatically generate the JSON data, add this data to botocore, and we would magically get a completely up-to-date, low-level interface to the service. We never had to worry about being out of sync with the services or not supporting the latest version of the service.
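To make the data-driven idea concrete, here is a toy sketch of a client whose methods are generated from a JSON service description. It is not botocore's actual implementation or model format: the `SERVICE_MODEL` layout, `build_client`, `snake_case` and the `send` callback are all made up for illustration.

```python
import json
import re

# A made-up, heavily simplified service description (not botocore's real schema).
SERVICE_MODEL = json.loads("""
{
  "operations": {
    "ListQueues": {"required": [], "optional": ["QueueNamePrefix"]},
    "SendMessage": {"required": ["QueueUrl", "MessageBody"], "optional": ["DelaySeconds"]}
  }
}
""")


def snake_case(name):
    """Convert an operation name such as 'SendMessage' to 'send_message'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_client(model, send):
    """Create a client object with one method per operation in the model.

    `send(operation_name, params)` is a hypothetical transport callback.
    """
    def make_method(op_name, spec):
        allowed = set(spec["required"]) | set(spec["optional"])

        def method(self, **kwargs):
            # Validate the keyword arguments against the data, not against code.
            missing = [p for p in spec["required"] if p not in kwargs]
            unknown = sorted(set(kwargs) - allowed)
            if missing or unknown:
                raise TypeError(f"{op_name}: missing={missing} unknown={unknown}")
            return send(op_name, kwargs)

        method.__name__ = snake_case(op_name)
        return method

    namespace = {
        snake_case(op): make_method(op, spec)
        for op, spec in model["operations"].items()
    }
    return type("GeneratedClient", (object,), namespace)()


# Usage with a transport that just echoes what it would send.
client = build_client(SERVICE_MODEL, send=lambda op, params: (op, params))
print(client.send_message(QueueUrl="q", MessageBody="hi"))
```

In a design like this, supporting a new operation means adding data to the model rather than writing code. It also hints at where some of the complaints in this thread come from: every generated method takes `**kwargs`, and parameter names keep the service's CamelCase.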

It is impossible to overstate how much better this approach is than the boto approach. For library developers as well as for customers. A full low-level library driven directly from the canonical description of the service, always up to date, never out of sync. This is the future.

Of course, the low-level interface to the service usually isn't the best way to interact with the service in Python. You need a way to create a higher-level layer on top of that. We accomplished that in AWSCLI by providing a rich customization mechanism. For boto3, the team has created the resource layer which allows much of the customization to happen in JSON data files, as well.

You Can't Go Home Again

The original boto approach is simply not feasible going forward. We can best serve the Python community by focusing as much attention and effort as possible on improving the developer experience of boto3. I think the resource layer approach provides an opportunity for meaningful community involvement by allowing people who are heavily using a particular service to help define the best high-level interface for that service. There is a huge community of dedicated and awesome developers around boto. Let's leverage that to make boto3 the best AWS library on the planet.

@dstufft
Contributor

dstufft commented Aug 19, 2015

Just to chime in: when I was integrating boto3 into my project, I found its API for accessing S3 to be nicer and easier to understand. I also felt I had to spend less time checking each individual API's defaults for things like implicit API calls that I had to turn off. I do agree with the original poster that the documentation was lacking, and since the code is autogenerated I had a hard time even finding the source to figure out what I needed to pass where, but once I did, I liked the API.

@gtaylor
Contributor

gtaylor commented Aug 19, 2015

For what it's worth, I've been using Google Cloud for the last six months after many years with AWS (and boto). The Google Cloud Python SDK is auto-generated, much like boto3. The criticism mentioned above is almost true (verbatim) if you replace "AWS" with "Google Cloud".

This is a tough problem to solve. As far as the traps to avoid for boto3, I can mention a few things that have really frustrated me with Google's Python SDK:

  • The Python-specific documentation is MIA. You have to look at the JSON HTTP API docs and try to guess what functions/methods to call. You can't auto-generate tutorial/prose-style docs, and you shouldn't try to. Spend the time to get a good collection of these going.
  • Something as simple as authentication is difficult, since they point you to a very generic doc with 500 different options. Unlike AWS, they have a bunch of different kinds of credentials and keys. This probably won't apply as much to AWS, but make sure this is some of your best documentation!
  • The exceptions raised due to errors can be a bit vague.
  • It is incredibly difficult to go code diving, since the whole thing is so dynamically generated.

I haven't used boto3 yet, but consider this a cautionary tale. I think AWS has a much more mature approach to developer wellbeing, but it sounds like the beginnings of some of these problems I've seen with Google Cloud are starting to manifest here.

With that said, I do think auto-generated APIs like boto3 are the future. AWS is in a unique position to try to pull it off much better than the competition. Alternatively, it could continue the trend of auto-generated APIs being more complete but less usable (which would be unfortunate).

@kyleknap
Contributor

We really appreciate all the thoughtful comments on this issue. We had a lengthy discussion on this topic, and I'd like to share a bit of background as well as our plans and thoughts.

Boto
As @garnaat pointed out, we have experienced maintenance difficulties with Boto. As the surface area of AWS APIs grew, it became increasingly difficult and time-consuming to introduce new APIs as well as diagnose and fix issues. Issue #2836 is one such issue: its fix may break currently working behavior in other edge cases that we can't fully predict. Because of similar maintenance challenges, we made a very difficult but forward-looking decision to re-balance our resources and quickly get Boto3 to a stable 1.0, so that we can provide a path forward and start to migrate the community as soon as possible. This unfortunately meant that we had to dial back our maintenance efforts on Boto. Since Boto3 became stable, however, we have been putting more resources into addressing Boto PRs and issues. We plan to continue doing so for the foreseeable future and welcome any help we can get from the community as well, as our recently updated contribution guide describes. We will also be doing periodic releases as fixes and changes are merged in.

Boto3
Another consequence of our push to get Boto3 to 1.0 quickly was that Boto3 wouldn't initially feature as many high-level APIs as Boto. Our main focus was to build a strong foundation for high-level features and provide complete API coverage in a scalable manner, so that users will never be out of date with the latest AWS features. As a side note, the reason why Boto3 has a brand-new module name is to allow users to take advantage of new features missing in Boto without having to convert their entire code base to the new version.
We certainly hear @vlcinsky's comments about Boto3's APIs being lower-level than what existing Boto users expect. We are working hard to implement more higher-level abstractions and resource-level APIs for more services. Additionally, we'll be prioritizing various improvement projects for a smoother getting-started experience over the next few months. Authoring more guide-level documentation will be a significant part of that effort. On a similar note, we have just introduced docstrings in client objects and methods to help developers using Boto3 interactively. We will be doing the same for other interfaces including resources and waiters in the near future.

Again, we sincerely appreciate your feedback and would like to thank this wonderful community for the support we receive. We promise to always continue listening to the community's voice and stay relentlessly focused on delivering a great developer experience.

@hannes-ucsc

Love the idea of having a generated, but complete and correct base layer and a hand-coded convenience layer on top of it. This gives the community a chance to develop several different approaches to the convenience layer for a particular AWS service.

Have you thought about modularizing boto3 into separate PyPI distributions?

And what's the deal with those camel-case keyword arguments?

@vlcinsky
Author

vlcinsky commented Sep 3, 2015

Thanks all for insightful replies.

Maintaining boto - too complex

I accept that maintaining boto is not feasible due to the growing number of services and the fact that everything is "hand made".

It would be nice if the existing range of AWS services (or at least the core ones, like S3 and EC2) were maintained for a while. Currently I am unable to use the latest version of boto as I have buckets with dots in their names. It looks like I will have to rewrite my code for boto3 anyway.

Dynamically generated docstrings (now for botocore)

Usable docstrings on modules, functions and methods are implicitly expected functionality, which apparently was not implemented until now (and boto3 is not the only case, as @gtaylor showed for the Google Python SDK). Note that docstrings are used not only on the interactive console but also by many IDEs, which try to provide help on the packages in use.

The implementation of dynamically generated docstrings in botocore is really good news. I look forward to seeing it on a wider scale and in other automatically generated packages such as boto3.
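For readers wondering what dynamically generated docstrings can look like in principle, here is a small sketch that renders a docstring from a parameter description and attaches it to a generated function. It does not reproduce how botocore does it; `render_docstring`, `make_operation` and the `GET_OBJECT_SPEC` layout are hypothetical.

```python
def render_docstring(op_name, spec):
    """Build a docstring listing the parameters of one operation."""
    lines = [spec.get("doc", op_name), "", "Parameters:"]
    for name, info in spec["params"].items():
        flag = "required" if info["required"] else "optional"
        lines.append(f"  {name} ({info['type']}, {flag}): {info['doc']}")
    return "\n".join(lines)


def make_operation(op_name, spec):
    """Return a function for `op_name` whose __doc__ is generated from `spec`."""
    def operation(**kwargs):
        # Placeholder for a real service call.
        return {"operation": op_name, "params": kwargs}

    operation.__name__ = op_name
    operation.__doc__ = render_docstring(op_name, spec)
    return operation


# Hypothetical operation description (not the real S3 model).
GET_OBJECT_SPEC = {
    "doc": "Retrieve an object from a bucket.",
    "params": {
        "Bucket": {"type": "string", "required": True, "doc": "Name of the bucket."},
        "Key": {"type": "string", "required": True, "doc": "Key of the object."},
        "Range": {"type": "string", "required": False, "doc": "Byte range to fetch."},
    },
}

get_object = make_operation("get_object", GET_OBJECT_SPEC)
print(get_object.__doc__)  # help() and IDE tooltips read the same attribute
```

Because the text lives in `__doc__`, the same information is available to `help()` on the interactive console and to IDEs, which is the point made above.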

Tutorials can help a lot

It would be great if the existing boto samples were rewritten for boto3. I can imagine this even being an independent project.

In any case, a chapter called "hunting for arguments" or something like that should exist in the boto3 docs, explaining where to find the data structures to pass into many calls. It should show some code that has to pass in dictionaries, with a link to the JSON data from which one can derive what is expected there.

As for me, my concerns are covered and I do not expect more to happen on this issue. I might file more focused issues on boto3 later.

@dkarchmer

+1 on "Tutorials can help a lot"

I am guessing this is a higher-level prioritization issue within AWS rather than for the Boto3 team itself, as a lot of these tutorials (and application notes) should really be documented on http://docs.aws.amazon.com/ by the different AWS teams (rather than by the Boto3 team). It seems like newer services are only documented with JavaScript (e.g. Cognito, IoT).

Either 1) continue to support Boto or 2) get AWS teams to write application notes using Boto3 (especially for new services not supported by Boto). Otherwise, we will all have to just move to JS :-)

@vlcinsky
Author

vlcinsky commented Jan 4, 2016

There are several aspects to tutorials/documentation:

  • explain features, which are already available in older boto (show, how to do it in boto3)
  • explain new services and new features, as they are added

I can imagine the following approach to creating tutorials:

  • split the tutorials per service (S3, EC2, ...)
  • create (somewhere) an issue for each such tutorial (e.g. a tutorial for S3)
  • elaborate in this issue a list of expected features and tasks to be covered, focusing on what existing boto provides
  • write the tutorial itself

Adding new features should be a separate issue/task.

I am sure that authoring tutorials for things currently provided by boto will reveal a couple of issues, e.g. tasks which are not possible yet, others which are too complicated to do with boto3, and possibly some bugs.

I could propose a list of tutorial topics for S3 (the part of AWS I use most often), but I would first need to know that there are well-defined steps for completing such a tutorial (deciding which repository to use for it, etc.).

@mpdehaan

mpdehaan commented Jan 6, 2016

Figured I'd leave some thoughts here as I recently went through the exercise of a 2->3 port.

  • Much of the system is a "low level" API returning CamelCase dicts. Ideally we'd still have objects like in boto2, so methods could be called on the results of returns. Using both high level API and low level boto3 APIs in the same app can get confusing.
  • It would be nice if iteration across results (NextToken) were pythonic and used generators. This would prevent every user of the API from having to determine how these work and implement something. Similarly, this would avoid the problem of having to do it twice for the High Level and Low Level API
  • It would be nice if the whole API, including the aforementioned generators, supported automatic retry in throttling scenarios. The lack of it blocks casual use of the API in basic scripts, as API calls can fail at any point in a complex program.
  • In general, it feels like the API is code-generated from a Java version or something like that, and as a result it takes more work, and many attempts at doing Python-like things feel less natural than they could.
  • Generally everything is a botocore.exceptions.ClientError, which makes it hard to tell different types of errors apart. More fine-grained error handling would be useful.
  • There are some cases where certain APIs -- such as OpsWorks describe_layer appear to not work in the new version when they worked in the old with the same stack IDs.
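The pagination and throttling points in the list above can be sketched as a small generator plus a retry wrapper. This is an illustration under stated assumptions, not boto3's own mechanism: `iter_items`, `call_with_retry` and `THROTTLE_CODES` are hypothetical names, the set of throttling codes is an illustrative subset, and the broad `except Exception` only keeps the sketch free of imports (real code would catch `botocore.exceptions.ClientError`). boto3 clients also expose `get_paginator` for operations that support it.

```python
import time

# Illustrative subset of AWS error codes that signal throttling (assumption).
THROTTLE_CODES = {"Throttling", "ThrottlingException", "RequestLimitExceeded"}


def call_with_retry(func, *, max_attempts=5, base_delay=0.5, sleep=time.sleep, **kwargs):
    """Call func(**kwargs), retrying with exponential backoff on throttling errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(**kwargs)
        except Exception as exc:
            # botocore-style errors carry the service code in response["Error"]["Code"].
            response = getattr(exc, "response", None)
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            if code not in THROTTLE_CODES or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def iter_items(func, result_key, *, token_key="NextToken", retry=call_with_retry, **kwargs):
    """Yield items from every page of a token-paginated operation."""
    while True:
        page = retry(func, **kwargs)
        yield from page.get(result_key, [])
        token = page.get(token_key)
        if not token:
            return
        kwargs[token_key] = token  # ask for the next page


# Hypothetical usage against real AWS (needs credentials and network):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   for stack in iter_items(cfn.list_stacks, "StackSummaries"):
#       print(stack["StackName"])
```

Inspecting `response["Error"]["Code"]` is also the usual way to tell one `ClientError` from another, which is the workaround for the coarse exception type mentioned above. Note that some services name their token differently, hence the `token_key` parameter.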

The end result of the above is I'm keeping the script in question boto2 for now. I like how some things are easier, but I feel it trades some good things for some others, making it more "just different" than a clear successor.

In all I'm still very appreciative of boto3 existing, and like that in the docs in many cases boto2 examples were shown beside boto3 examples.

@deeTEEcee

deeTEEcee commented Sep 7, 2016

I don't think I fully understand this package's status, as I'm not a regular user of boto.

accept the fact, boto3 is still in beta phase and is not ready for production.

The older boto site says "Boto3, the next version of Boto, is now stable and recommended for general use", but at the same time it is supposedly not ready for production, so it's hard to decide whether to use it or not. What does it mean to be ready for production in this case?

@anthonyd-cd

April 2020 (almost 5 years after this thread started) and I still find a lot of what vlcinsky said about boto3 true to this day! Too many options for doing basic things make it too difficult, and the whole client/resource/session split makes it even more confusing to decide which one to use. Sometimes you need resource, sometimes client, and a lot of the time a mix of both to get a simple result. It expects arguments that I have yet to find a good source listing, and I feel left without instructions on how to pass them. I thought boto3 would make my tasks easier, but they have turned into hours spent on something the AWS CLI does with a simple command. Again, it's 2020 and I still find no real way to learn how to use it, and the documentation just confuses me more. I was trying to use ec2.instances.filter to get the instance ID and instance type, and it took me forever to realize I couldn't get the instance AZ that way because I needed a whole other filter with a whole other function. It's just too confusing. If someone can point me to a good course for learning it, I'd appreciate it.
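For the specific task described in this comment, one possible approach with the resource API is sketched below. `ec2.instances.filter` and the `id`, `instance_type` and `placement` attributes belong to boto3's EC2 Instance resource, and `availability-zone` is a standard EC2 filter name; the helper names `summarize_instances` and `zone_filter` and the zone value are made up for illustration.

```python
def summarize_instances(instances):
    """Return (id, type, availability zone) for each EC2 Instance resource."""
    return [
        # placement is a dict; the zone lives under "AvailabilityZone".
        (inst.id, inst.instance_type, inst.placement["AvailabilityZone"])
        for inst in instances
    ]


def zone_filter(zone):
    """Build the Filters entry that restricts results to one availability zone."""
    return [{"Name": "availability-zone", "Values": [zone]}]


# Hypothetical usage against real AWS (needs credentials and network):
#   import boto3
#   ec2 = boto3.resource("ec2")
#   running = ec2.instances.filter(
#       Filters=zone_filter("us-east-1a")
#       + [{"Name": "instance-state-name", "Values": ["running"]}]
#   )
#   for row in summarize_instances(running):
#       print(row)
```

The filter narrows which instances are returned on the server side, while the zone of each returned instance is read from its `placement` attribute.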
