[Translation] [105] Risks and challenges of using Copilot #7

cssmagic opened this issue Feb 28, 2024 · 0 comments
1.5 Risks and challenges of using Copilot

Now that we're all pumped up about getting Copilot to write code for us, we need to talk about the dangers inherent in using AI assistants. See references [2] and [3] for elaboration on some of these points.

Copyright. As we discussed above, Copilot is trained on human-written code. More specifically, it was trained on millions of GitHub repositories containing open-source code. One worry is that Copilot will “steal” that code and give it to us. In our experience, Copilot doesn't often suggest a large chunk of someone else's code, but the possibility is there. Even if the code that Copilot gives us is a melding and transformation of various bits of other people's code, there may still be licensing issues. For example, who owns the code produced by Copilot? There is currently no consensus on the answer.

The Copilot team is adding features to help; for example, Copilot will be able to tell you whether the code it produced is similar to already-existing code and what the license on that code is [4]. Learning and experimenting on your own is great, and we encourage that, but take the necessary care if you intend to use this code for purposes beyond your home. We're intentionally a bit vague here: it may take some time for laws to catch up to this new technology. It's best to play it safe while these debates play out within society.

Education. As instructors of introductory programming courses ourselves, we have seen first-hand how well Copilot does on the types of assignments we have historically given our students. In one study [5], Copilot was asked to solve 166 common introductory programming tasks. How well did it do? On its first attempt, it solved almost 50% of these problems. Give Copilot a little more information, and that number goes up to 80%. You have already seen for yourself how Copilot solves a standard introductory programming problem. Education needs to change in light of tools like Copilot, and instructors are currently discussing what these changes may look like. Will students be allowed to use Copilot, and in what ways? How can Copilot help students learn? And what will programming assignments look like now?

Code quality. We need to be careful not to trust Copilot blindly, especially with sensitive code or code that needs to be secure. Code written for medical devices, for example, or code that handles sensitive user data must always be thoroughly understood. It's tempting to ask Copilot for code, marvel at the code it produces, and accept it without scrutiny. But that code might be plain wrong. In this book we will be working on code that will not be deployed at large, so while we will focus on getting correct code, we will not worry about the implications of using it for broader purposes. We also start building the foundations you will need to independently determine whether code is correct.
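
To make the “plausible but wrong” trap concrete, here is a hypothetical sketch; the function and its bug are invented for illustration and are not actual Copilot output. The code reads cleanly and survives a casual glance, yet a single edge-case check exposes a crash:

```python
# Hypothetical example: a clean-looking function with a subtle bug.
# (Invented for illustration; this is not actual Copilot output.)

def average_above_threshold(numbers, threshold):
    """Return the average of the values in numbers that exceed threshold."""
    total = 0
    count = 0
    for n in numbers:
        if n > threshold:
            total += n
            count += 1
    return total / count  # Bug: ZeroDivisionError when no value qualifies

print(average_above_threshold([1, 5, 9], 4))   # 7.0 -- looks correct
print(average_above_threshold([1, 2, 3], 10))  # crashes: ZeroDivisionError
```

Scrutiny here means more than reading the suggestion: running it against boundary cases, like the empty result above, is the minimum bar before trusting it.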

Code security. As with code quality, code security is absolutely not assured when we get code from Copilot. For example, if we were working with user data, getting code from Copilot would not be enough. We would need to perform security audits and have the expertise to determine that the code is secure. Again, though, we will not be using code from Copilot in real-world scenarios, so we will not be focusing on security concerns.
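
As one hypothetical illustration of what a security audit looks for (the table and queries below are invented for this sketch), consider the classic difference between splicing user input into a SQL string and passing it as a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

def find_user_unsafe(name):
    # Vulnerable: user input is spliced into the SQL string, so a value
    # like "' OR '1'='1" rewrites the query (SQL injection).
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name):
    # Safer: a parameterized query treats the input purely as data.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # every row leaks out
print(find_user_safe("' OR '1'='1"))    # [] -- no such user
```

Both functions “work” on friendly input, which is exactly why code that handles user data needs an audit rather than a glance.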

Not an expert. One of the markers of being an expert is awareness of what one knows and, equally importantly, what one doesn't. Experts are also often able to state how confident they are in their response; and if they are not confident enough, they will learn further until they know that they know. Copilot, and LLMs more generally, do not do this. You ask them a question, and they answer, plain as that. They will confabulate if necessary, mixing bits of truth with bits of garbage into a plausible-sounding but overall nonsensical response. For example, we have seen LLMs fabricate obituaries for people who are alive, which doesn't make any sense, yet the “obituaries” do contain elements of truth about those people's lives. When asked why an abacus can perform math faster than a computer, we have seen LLMs come up with responses: something about abacuses being mechanical and therefore necessarily the fastest. There is ongoing work on enabling LLMs to say, “sorry, no, I don't know this,” but we are not there yet. They don't know what they don't know, and that means they need supervision.

Bias. LLMs will reproduce the biases present in the data on which they were trained. If you ask Copilot to generate a list of names, it will generate primarily English names. If you ask for a graph, it may produce one that doesn't account for perceptual differences among humans. And if you ask for code, it may produce code in a style reminiscent of how dominant groups write code. (After all, the dominant groups wrote most of the code in the world, and Copilot is trained on that code.) Computer science and software engineering have long suffered from a lack of diversity. We cannot afford to stifle diversity further, and indeed we need to reverse the trend. We need to let more people in and allow them to express themselves in their own ways. How this will be handled with tools like Copilot is currently being worked out and is of crucial importance for the future of programming. However, we believe Copilot has the potential to improve diversity by lowering barriers to entry into the field.
