ai-code-detector

An npm package that detects AI-generated code based on perplexity. It supports multiple programming languages and can be used through either a CLI or an API.

Installation

npm install ai-code-detector
# or
yarn add ai-code-detector
# or
pnpm add ai-code-detector

Quick start

CLI usage

# After a global install (npm install -g ai-code-detector), run it directly
ai-code-detector -i ./src/index.ts

# Or run it with npx
npx ai-code-detector -i ./src/index.ts

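CLI options

| Option | Description | Default |
| --- | --- | --- |
| `-i, --input <path>` | Path to the input file or directory | required |
| `-o, --output <path>` | Path of the report file to write | none (optional) |
| `-t, --threshold <num>` | Perplexity threshold | 15.0 |
| `-m, --model <name>` | Name of the model to use | Xenova/codebert-base |
| `-f, --format <type>` | Output format (text/json) | text |
| `-h, --help` | Show help | - |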

CLI examples

# Check a single file
ai-code-detector -i ./src/index.ts

# Check every code file in a directory
ai-code-detector -i ./src/

# Output JSON
ai-code-detector -i ./code.js -f json

# Use a custom threshold and save the report
ai-code-detector -i ./project/ -t 20 -o report.txt

API usage

import { detectAICode, analyzeCodeSegments, calculatePerplexity } from 'ai-code-detector';

// Sample input; replace with the source code you want to check
const codeString = 'function add(a, b) { return a + b; }';

// Full detection report
const report = await detectAICode(codeString, {
  threshold: 15.0,      // Perplexity threshold
  chunkSize: 512,       // Chunk size
  overlapSize: 50,      // Overlap size
});

console.log(report.summary);
console.log(`AI-generated: ${report.isAIGenerated}`);
console.log(`Overall score: ${report.overallScore}/100`);

// Analyze code segments only
const segments = await analyzeCodeSegments(codeString);
segments.forEach(seg => {
  console.log(`Lines ${seg.segment.startLine}-${seg.segment.endLine}: perplexity=${seg.perplexity.toFixed(2)}`);
});

// Compute perplexity only
const perplexity = await calculatePerplexity(codeString);
console.log(`Perplexity: ${perplexity}`);

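How detection works

Perplexity

Perplexity measures how "surprised" a language model is by a piece of text:

Low perplexity (< threshold): the code is more "predictable" and follows regular patterns, so it may be AI-generated
High perplexity (> threshold): the code is more "varied/complex", so it is more likely human-written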
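
To make the definition concrete, here is a minimal sketch of the standard perplexity formula, the exponential of the average negative log-probability per token. It is not taken from the package source: `perplexity`, `classify`, and the sample probabilities are hypothetical names and values used only for illustration.

```typescript
// Perplexity = exp(mean of -log p) over the observed tokens.
// `tokenProbs` is hypothetical input: the probability a model assigned to each token.
function perplexity(tokenProbs: number[]): number {
  if (tokenProbs.length === 0) {
    throw new Error("need at least one token probability");
  }
  let negLogSum = 0;
  for (const p of tokenProbs) {
    if (p <= 0 || p > 1) {
      throw new Error(`probability out of range: ${p}`);
    }
    negLogSum -= Math.log(p);
  }
  return Math.exp(negLogSum / tokenProbs.length);
}

// Apply the convention described above: below the threshold suggests AI-generated.
function classify(ppl: number, threshold = 15.0): "ai" | "human" {
  return ppl < threshold ? "ai" : "human";
}

// Tokens the model found easy to predict versus tokens it found surprising.
const predictable = perplexity([0.9, 0.8, 0.95, 0.85]);
const surprising = perplexity([0.05, 0.02, 0.1, 0.04]);
console.log(predictable.toFixed(2), classify(predictable));
console.log(surprising.toFixed(2), classify(surprising));
```

A sequence in which every token has probability 1/4 has a perplexity of 4, which is why perplexity is often read as the effective number of equally likely choices per token.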

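Detection pipeline

1. Code segmentation: identify logical blocks of code (functions, classes, conditionals, and so on)
2. Perplexity calculation: compute the perplexity of each segment with the CodeBERT model or a heuristic algorithm
3. Threshold comparison: compare each perplexity with the configured threshold to decide whether the segment is AI-generated
4. Report generation: aggregate the results into a detailed detection report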
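
The four steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: segmentation is simplified to splitting on blank lines, and `unigramPerplexity` is a hypothetical stand-in for the model or heuristic score. Its scale is not comparable to a model's perplexity, so the threshold of 5 in the demo is arbitrary.

```typescript
interface CodeSegment { startLine: number; endLine: number; text: string; }

// Step 1 (simplified assumption): treat blank-line-separated blocks as segments.
function splitSegments(code: string): CodeSegment[] {
  const lines = code.split("\n");
  const segments: CodeSegment[] = [];
  let start = -1;
  for (let i = 0; i <= lines.length; i++) {
    // Past the last line counts as blank so the final segment gets closed.
    const blank = (lines[i] ?? "").trim() === "";
    if (!blank && start === -1) start = i;
    if (blank && start !== -1) {
      segments.push({ startLine: start + 1, endLine: i, text: lines.slice(start, i).join("\n") });
      start = -1;
    }
  }
  return segments;
}

// Step 2 (hypothetical stand-in): exp of the entropy of the segment's own token frequencies.
function unigramPerplexity(text: string): number {
  const tokens = text.match(/[A-Za-z_]\w*|\d+|\S/g) ?? [];
  if (tokens.length === 0) return 1;
  const counts: Record<string, number> = {};
  for (const t of tokens) counts[t] = (counts[t] ?? 0) + 1;
  let entropy = 0;
  for (const key of Object.keys(counts)) {
    const p = counts[key] / tokens.length;
    entropy -= p * Math.log(p);
  }
  return Math.exp(entropy);
}

// Steps 3 and 4: compare each score with the threshold and aggregate the counts.
function detect(code: string, threshold: number) {
  const segments = splitSegments(code).map((segment) => {
    const perplexity = unigramPerplexity(segment.text);
    return { segment, perplexity, isAIGenerated: perplexity < threshold };
  });
  const aiGeneratedSegments = segments.filter((s) => s.isAIGenerated).length;
  return {
    totalSegments: segments.length,
    aiGeneratedSegments,
    humanWrittenSegments: segments.length - aiGeneratedSegments,
    segments,
  };
}

const sample = "x = x + x\n\nconst total = items.reduce((sum, item) => sum + item.price, 0);";
const demoReport = detect(sample, 5);
for (const s of demoReport.segments) {
  console.log(`Lines ${s.segment.startLine}-${s.segment.endLine}: perplexity=${s.perplexity.toFixed(2)} ai=${s.isAIGenerated}`);
}
```

The count fields reuse names from the README's `DetectionReport`; the per-segment `isAIGenerated` flag is an assumption made for this sketch.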

Supported languages

JavaScript / TypeScript
Python
Java
Go
Rust
C / C++
C#
Ruby
PHP
Swift
Kotlin
Scala
SQL

API reference

`detectAICode(code, options?)`

Runs the full AI code detection and returns a detection report.

interface DetectionOptions {
  model?: string;       // Model name, default 'Xenova/codebert-base'
  threshold?: number;   // Perplexity threshold, default 15.0
  chunkSize?: number;   // Chunk size, default 512
  overlapSize?: number; // Overlap size, default 50
  language?: string;    // Language to assume (optional)
}
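
The README does not state the unit of `chunkSize` and `overlapSize` (tokens or characters), so the sketch below works on a generic array. It shows one common way such a pair of options is applied, a sliding window whose consecutive chunks share `overlapSize` items. `chunkWithOverlap` is a hypothetical helper and not part of the package API.

```typescript
// Split `items` into windows of up to `chunkSize` items; consecutive windows
// share `overlapSize` items, so the window start advances by chunkSize - overlapSize.
function chunkWithOverlap<T>(items: T[], chunkSize: number, overlapSize: number): T[][] {
  if (chunkSize <= 0 || overlapSize < 0 || overlapSize >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlapSize < chunkSize");
  }
  const stride = chunkSize - overlapSize;
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += stride) {
    chunks.push(items.slice(start, start + chunkSize));
    if (start + chunkSize >= items.length) break; // this window already reached the end
  }
  return chunks;
}

const sampleTokens = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const sampleChunks = chunkWithOverlap(sampleTokens, 4, 1);
console.log(JSON.stringify(sampleChunks)); // [[0,1,2,3],[3,4,5,6],[6,7,8,9]]
```

With the documented defaults of 512 and 50, this scheme would advance 462 items per chunk, so an input of 1000 items would yield three chunks.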

interface DetectionReport {
  overallScore: number;         // Overall score (0-100)
  isAIGenerated: boolean;       // Whether the code is judged AI-generated
  totalSegments: number;        // Total number of code segments
  aiGeneratedSegments: number;  // Number of AI-generated segments
  humanWrittenSegments: number; // Number of human-written segments
  segments: SegmentResult[];    // Detailed per-segment results
  summary: string;              // Summary
  recommendations: string[];    // Recommendations
  metadata: {                   // Metadata
    model: string;
    threshold: number;
    timestamp: string;
    language: string;
  };
}

`analyzeCodeSegments(code, options?)`

Analyzes the code segment by segment and returns the detection result for each segment.

`calculatePerplexity(code, options?)`

Computes the perplexity of the code.

`ReportGenerator`

Report generator class that supports formatted output.

import { ReportGenerator } from 'ai-code-detector';

const generator = new ReportGenerator(options);
await generator.initialize();

const report = await generator.generateReport(code);
console.log(generator.formatReportAsText(report));
console.log(generator.formatReportAsJSON(report));

Sample output

============================================================
              AI Code Detection Report
============================================================

Detected at: 2024-01-15T10:30:00.000Z
Model: Xenova/codebert-base
Language: typescript
Threshold: 15

------------------------------------------------------------
                      Results
------------------------------------------------------------

Overall score: 35/100
Verdict: ⚠️ Possibly AI-generated
Total segments: 5
AI-generated segments: 4
Human-written segments: 1

------------------------------------------------------------
                       Summary
------------------------------------------------------------

The code shows clear signs of AI generation. About 80% of the code segments were classified as AI-generated,
with an average perplexity of 8.52, below the configured threshold.

------------------------------------------------------------
                      Recommendations
------------------------------------------------------------

1. All code segments show signs of AI generation; consider reviewing the code's originality
2. AI-generation confidence is high; consider adding more hand-written comments and custom logic

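Notes

Model loading: the model is downloaded automatically on first use, which may take some time
Threshold tuning: the default threshold of 15.0 suits most scenarios and can be adjusted as needed
Detection results: for reference only; they must not be the sole basis for judging where code came from
Performance: large code files may take a long time to process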

Development

# Clone the repository
git clone https://github.com/ccOfHome/ai-code-detector.git
cd ai-code-detector

# Install dependencies
npm install

# Build
npm run build

# Test
node dist/cli.js -i ./src/index.ts

License

MIT
