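This tutorial shows how to use CodeFuse-Query to analyze a Python project. You will use the command-line tool to turn a code repository into data, then use the Godel language to analyze that repository.

Check that the CLI is ready.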

In [1]:
!which sparrow

/sparrow-cli/sparrow
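
The same readiness check can be done from Python, which is convenient if you later script the steps of this tutorial. This is a minimal sketch using only the standard library; the helper name `find_cli` is hypothetical and not part of CodeFuse-Query.

```python
import shutil


def find_cli(name="sparrow"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_cli()
    if path is None:
        print("sparrow not found on PATH; install the CLI before continuing")
    else:
        print(f"sparrow found at {path}")
```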



STEP 0: Clone the repository to analyze. We use the Python project [requests](https://github.com/psf/requests.git) as an example.

In [2]:
!git clone https://github.com/psf/requests.git -q

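STEP 1: Turn the code into data. Use the `sparrow database create` command to create a db file, specifying the repository to analyze (the `requests` subdirectory of the current directory), the language to analyze (python), and where to store the db file (`./db/requests`, relative to the current directory). Running the command produces a db file that holds the repository's structured data; all later analysis runs against this data.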

In [3]:
!sparrow database create --source-root requests --data-language-type python --output ./db/requests --overwrite > /dev/null
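
If you prefer to drive this step from a script, the command above can be assembled as an argument list. The sketch below reuses only the flags shown in the cell above; the helper name `build_create_command` is hypothetical, and passing the list to `subprocess.run` assumes `sparrow` is on your PATH.

```python
import shlex


def build_create_command(source_root, language, output, overwrite=True):
    """Build the `sparrow database create` argument list used in this tutorial."""
    cmd = [
        "sparrow", "database", "create",
        "--source-root", source_root,
        "--data-language-type", language,
        "--output", output,
    ]
    if overwrite:
        # Replace any db left over from a previous run.
        cmd.append("--overwrite")
    return cmd


if __name__ == "__main__":
    cmd = build_create_command("requests", "python", "./db/requests")
    # Print the shell-quoted command so it can be reviewed before running.
    print(shlex.join(cmd))
```

To actually execute it, pass the list to `subprocess.run(cmd, check=True)`, which raises an exception if the command exits with a non-zero status.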

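STEP 2: Analyze the db file with the Godel analysis language. In this tutorial you can click the run button to the left of a code cell, or press `Shift+Enter`, to run the analysis script directly. The `%db /path/to/db` magic command sets the COREF db path; the kernel reads this value when it runs the query.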

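<b>Example</b> Query the cyclomatic complexity of the functions in [requests](https://github.com/psf/requests.git).

The first line uses the kernel magic command to set the db path to analyze; the Godel script that queries the cyclomatic complexity of each function follows.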

In [4]:
%db ./db/requests
// script
use coref::python::*

fn default_db() -> PythonDB {
    return PythonDB::load("coref_python_src.db")
}

/**
 * Get cyclomatic complexity of functions
 *
 * @param name   function name
 * @param value  cyclomatic complexity of function
 * @param path   path of file including this function
 * @param sline  function start line
 * @param eline  function end line
 */
fn getCyclomaticComplexity(
    name: string,
    value: int,
    path: string,
    sline: int,
    eline: int) -> bool {
    // get metric function
    for (c in MetricFunction(default_db())) {
        if (path = c.getLocation().getFile().getRelativePath() &&
            name = c.getQualifiedName() &&
            value = c.getCyclomaticComplexity() &&
            sline = c.getLocation().getStartLineNumber() &&
            eline = c.getLocation().getEndLineNumber()) {
            return true
        }
    }
}

fn main() {
    output(getCyclomaticComplexity())
}

/workspaces/CodeFuse-Query/tutorial/notebook/db/requests


Sparrow database is set to: /workspaces/CodeFuse-Query/tutorial/notebook/db/requests

2023-12-06 07:48:27,223 INFO: sparrow 2.0.0 will start
2023-12-06 07:48:27,223 INFO: database /workspaces/CodeFuse-Query/tutorial/notebook/db/requests/coref_python_src.db size: 5.99 MB
2023-12-06 07:48:27,224 INFO: execute : /sparrow-cli/godel-script/usr/bin/godel /tmp/godel-jupyter-9f9aj65w/query.gdl -p /sparrow-cli/lib-1.0 -o /tmp/tmp8tgaooo4.gdl
2023-12-06 07:48:27,288 INFO: godel-script compile time: 0.06s
2023-12-06 07:48:27,288 INFO: execute : /sparrow-cli/godel-1.0/usr/bin/godel /tmp/tmp8tgaooo4.gdl --run-souffle-directly --package-path /sparrow-cli/lib-1.0 --souffle-fact-dir /workspaces/CodeFuse-Query/tutorial/notebook/db/requests --souffle-output-format json --souffle-output-path /tmp/godel-jupyter-9f9aj65w/query.json
2023-12-06 07:48:29,410 INFO: Task /tmp/godel-jupyter-9f9aj65w/query.gdl is success, result is NOT-EMPTY, execution time is  2.19s.
2023-12-06 07:48:29,411 INFO: run success

Total results: 643


index,name,value,path,sline,eline
0,httpbin,1,tests/conftest.py,26,27
1,get_encodings_from_content,1,src/requests/utils.py,484,506
2,request,1,src/requests/api.py,14,59
3,cookiejar_from_dict,6,src/requests/cookies.py,521,539
4,consume_socket_content,4,tests/testserver/server.py,6,21
...,...,...,...,...,...
638,TestRequests.test_rewind_body_failed_seek.BadF...,1,tests/test_requests.py,1971,1972
639,TestSuperLen.test_super_len_handles_files_rais...,1,tests/test_utils.py,76,77
640,TestSuperLen.test_super_len_handles_files_rais...,1,tests/test_utils.py,79,80
641,TestSuperLen.test_super_len_with_no__len__.Len...,1,tests/test_utils.py,133,134


Save the result of the previous query to a JSON file.

In [5]:
%%save_to ./query.json

Query result saved to /workspaces/CodeFuse-Query/tutorial/notebook/query.json
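
Before analyzing the file, a quick sanity check that it loaded as expected can save time. The sketch below assumes `query.json` is a JSON array of objects carrying the `name`, `value`, `path`, `sline`, and `eline` fields returned by the query; that layout is an assumption, so adjust the check if your file differs. The helper name `load_records` is hypothetical.

```python
import json

# Fields returned by getCyclomaticComplexity in the query above.
EXPECTED_FIELDS = {"name", "value", "path", "sline", "eline"}


def load_records(path):
    """Load query results and check that every record has the expected fields."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, record in enumerate(records):
        missing = EXPECTED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    return records
```

For example, `len(load_records("./query.json"))` gives the number of functions the query returned.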


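STEP 3: You can now analyze the generated results further. For example, with the pandas library you can use the query.json just created to list the 10 functions with the highest cyclomatic complexity: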

In [6]:
%%python
import pandas as pd
data = pd.read_json('./query.json')
data.sort_values('value', ascending=False, inplace=True)
top_10 = data.head(10)
print(top_10)

                                       name  value  ... sline  eline
354      RequestEncodingMixin._encode_files     21  ...   137    203
572      HTTPDigestAuth.build_digest_header     19  ...   126    234
232                        HTTPAdapter.send     19  ...   433    537
145            PreparedRequest.prepare_body     17  ...   494    570
142             PreparedRequest.prepare_url     17  ...   409    481
26                    should_bypass_proxies     15  ...   760    818
345  SessionRedirectMixin.resolve_redirects     15  ...   159    280
239                 HTTPAdapter.cert_verify     14  ...   237    291
8                                 super_len     14  ...   133    196
68                           get_netrc_auth     12  ...   199    253

[10 rows x 5 columns]
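
The same records also support per-file views. As a sketch using only the standard library, the hypothetical helper `max_complexity_by_file` below reports the highest function complexity found in each file. The sample records are the first five rows of the query output shown earlier; the list-of-dicts layout is an assumption about how you load `query.json`.

```python
def max_complexity_by_file(records):
    """Map each file path to the highest cyclomatic complexity among its functions."""
    worst = {}
    for record in records:
        path, value = record["path"], record["value"]
        worst[path] = max(worst.get(path, value), value)
    return worst


# First five rows of the query output shown earlier in this tutorial.
sample = [
    {"name": "httpbin", "value": 1, "path": "tests/conftest.py", "sline": 26, "eline": 27},
    {"name": "get_encodings_from_content", "value": 1, "path": "src/requests/utils.py", "sline": 484, "eline": 506},
    {"name": "request", "value": 1, "path": "src/requests/api.py", "sline": 14, "eline": 59},
    {"name": "cookiejar_from_dict", "value": 6, "path": "src/requests/cookies.py", "sline": 521, "eline": 539},
    {"name": "consume_socket_content", "value": 4, "path": "tests/testserver/server.py", "sline": 6, "eline": 21},
]

# List files from most to least complex.
for path, value in sorted(max_complexity_by_file(sample).items(), key=lambda kv: -kv[1]):
    print(f"{value:>3}  {path}")
```

A per-file maximum is only one possible summary; a sum or mean per file would fit the same loop if that suits your review process better.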


Enjoy!