Research on the relationship between packages in Python.
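The statistics and analysis results are as follows.

Among the 1,000 repositories collected by the crawler, we counted the packages each one uses; the most frequently used packages are listed below. Modules from the Python standard library (os, sys, time) are used most often. Among third-party libraries, numpy is the most frequently used. Web frameworks, data visualization and data processing packages, and machine learning / deep learning packages also appear frequently.

No. | Package | Times used |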
---|---|---|
1 | os | 419 |
2 | sys | 366 |
3 | time | 183 |
4 | numpy | 180 |
5 | re | 136 |
6 | json | 118 |
7 | setuptools | 117 |
8 | datetime | 112 |
9 | django | 95 |
10 | random | 89 |
11 | logging | 89 |
12 | matplotlib | 83 |
13 | requests | 82 |
14 | argparse | 78 |
15 | __future__ | 76 |
16 | math | 69 |
17 | subprocess | 64 |
18 | pandas | 63 |
19 | unittest | 59 |
20 | urllib | 57 |
21 | collections | 53 |
22 | tensorflow | 48 |
23 | sklearn | 47 |
24 | flask | 44 |
25 | distutils | 41 |
26 | threading | 39 |
27 | scipy | 37 |
28 | shutil | 36 |
29 | csv | 35 |
30 | hashlib | 33 |
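
The counting behind the table above could be sketched as follows. This is only an illustration under assumptions the source does not state: usage is counted once per repository, imports are found by parsing each file with the standard `ast` module, relative imports are ignored, and files that fail to parse (for example Python 2 code) are skipped. The names `top_level_imports`, `count_package_usage`, and the toy `repos` data are hypothetical.

```python
import ast
from collections import Counter


def top_level_imports(source):
    """Return the set of top-level package names imported by one source string."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Assumption: files that cannot be parsed are simply skipped.
        return set()
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import of the repository's own modules.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def count_package_usage(repos):
    """Count, for each package, how many repositories import it at least once."""
    usage = Counter()
    for sources in repos.values():
        used = set()
        for source in sources:
            used |= top_level_imports(source)
        usage.update(used)
    return usage


# Hypothetical toy input: repository name -> list of file contents.
repos = {
    "repo_a": ["import os, sys\nimport numpy as np\n", "from os import path\n"],
    "repo_b": ["import os\nfrom . import helpers\n"],
}
usage = count_package_usage(repos)
print(usage["os"], usage["sys"], usage["numpy"])
```

Counting per repository rather than per file keeps one large project from dominating the ranking; whether the original statistics were computed this way is an assumption.
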
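
Among the same 1,000 crawled repositories, the pairs of packages most often used together in one project are listed below. System-related packages (e.g. os, time, sys) co-occur frequently, and packages for scientific computing and data processing (e.g. numpy, matplotlib, pandas) also often appear together.

No. | Package 1 | Package 2 | Co-occurrences |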
---|---|---|---|
1 | os | sys | 283 |
2 | os | time | 118 |
3 | os | re | 101 |
4 | time | sys | 97 |
5 | os | django | 93 |
6 | django | sys | 91 |
7 | numpy | matplotlib | 86 |
8 | re | sys | 86 |
9 | os | numpy | 82 |
10 | logging | os | 73 |
11 | os | datetime | 70 |
12 | os | json | 70 |
13 | os | setuptools | 66 |
14 | numpy | sys | 66 |
15 | datetime | sys | 60 |
16 | os | argparse | 60 |
17 | json | sys | 57 |
18 | os | __future__ | 57 |
19 | time | re | 55 |
20 | os | subprocess | 54 |
21 | numpy | sklearn | 53 |
22 | time | datetime | 52 |
23 | time | json | 51 |
24 | pandas | numpy | 50 |
25 | logging | sys | 49 |
26 | os | random | 49 |
27 | sys | subprocess | 48 |
28 | argparse | sys | 48 |
29 | __future__ | sys | 48 |
30 | sys | __future__ | 48 |
31 | sklearn | matplotlib | 46 |
32 | os | requests | 45 |
33 | numpy | tensorflow | 45 |
34 | time | numpy | 44 |
35 | logging | time | 43 |
36 | os | matplotlib | 42 |
37 | math | sys | 40 |
38 | setuptools | sys | 39 |
39 | re | json | 39 |
40 | datetime | json | 38 |
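
A minimal sketch of how such pair counts might be computed is shown below, assuming each repository has already been reduced to the set of packages it uses. Sorting each set before generating pairs gives every unordered pair a single canonical key; without that step the same pair can be reported twice in opposite order, as rows 29 and 30 of the table above do for `__future__` and `sys`. The function name `count_pairs` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_pairs(repo_packages):
    """Count how many repositories use both packages of each unordered pair."""
    pairs = Counter()
    for packages in repo_packages:
        # sorted(set(...)) removes duplicates and fixes the order inside each pair.
        pairs.update(combinations(sorted(set(packages)), 2))
    return pairs


# Hypothetical toy input: one set of package names per repository.
repo_packages = [
    {"os", "sys", "numpy"},
    {"os", "sys"},
    {"numpy", "matplotlib", "os"},
]
pairs = count_pairs(repo_packages)
print(pairs[("os", "sys")], pairs[("numpy", "os")])
```
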
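
Among the 16,800 repositories collected by the crawler, license usage is summarized below (repositories whose license field is None are excluded from the statistics). The MIT License is the most widely used, accounting for 38.90% of the counted repositories, which is more than one third.

License | Repositories | Percentage |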
---|---|---|
MIT License | 4736 | 38.90% |
Apache License 2.0 | 1558 | 12.80% |
Other | 1551 | 12.74% |
BSD 3-clause "New" or "Revised" License | 1478 | 12.14% |
GNU General Public License v3.0 | 1157 | 9.50% |
GNU General Public License v2.0 | 532 | 4.37% |
BSD 2-clause "Simplified" License | 433 | 3.56% |
GNU Affero General Public License v3.0 | 191 | 1.57% |
GNU Lesser General Public License v3.0 | 133 | 1.09% |
The Unlicense | 96 | 0.79% |
ISC License | 80 | 0.66% |
GNU Lesser General Public License v2.1 | 67 | 0.55% |
Do What The F*ck You Want To Public License | 48 | 0.39% |
Creative Commons Zero v1.0 Universal | 40 | 0.33% |
Mozilla Public License 2.0 | 37 | 0.30% |
SIL Open Font License 1.1 | 12 | 0.10% |
Eclipse Public License 1.0 | 6 | 0.05% |
Creative Commons Attribution Share Alike 4.0 | 5 | 0.04% |
zlib License | 4 | 0.03% |
Creative Commons Attribution 4.0 | 3 | 0.02% |
Artistic License 2.0 | 2 | 0.02% |
BSD 3-clause Clear License | 2 | 0.02% |
Open Software License 3.0 | 1 | 0.01% |
European Union Public License 1.1 | 1 | 0.01% |
PostgreSQL License | 1 | 0.01% |
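
The percentage column can be reproduced with a short calculation. The sketch below assumes the crawler yields one license name (or `None`) per repository and that percentages are taken over the repositories that have a license; `license_shares` and the toy list are hypothetical.

```python
from collections import Counter


def license_shares(licenses):
    """Return (license, count, percentage) rows, ignoring repositories without a license."""
    counts = Counter(name for name in licenses if name is not None)
    total = sum(counts.values())
    return [(name, n, round(100 * n / total, 2)) for name, n in counts.most_common()]


# Hypothetical toy input: one entry per repository.
licenses = ["MIT License", None, "MIT License", "Apache License 2.0", None, "Other"]
rows = license_shares(licenses)
for name, n, pct in rows:
    print(f"{name} | {n} | {pct:.2f}%")

# Check against the table: its counts sum to 12174 licensed repositories.
mit_share = round(100 * 4736 / 12174, 2)
```

The repository counts in the table add up to 12,174, so the MIT share is 4736 / 12174, which rounds to the 38.90% shown.
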
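
Among the 16,800 crawled repositories, the most frequently used tags are listed below. Tags related to deep learning (e.g. deep-learning, pytorch, tensorflow, neural-network, keras, gan) make up a large share, and web frameworks (e.g. django, flask) also appear many times.

No. | Tag | Occurrences |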
---|---|---|
1 | python | 2403 |
2 | deep-learning | 373 |
3 | tensorflow | 330 |
4 | machine-learning | 307 |
5 | django | 289 |
6 | python3 | 188 |
7 | pytorch | 151 |
8 | security | 139 |
9 | linux | 128 |
10 | flask | 124 |
11 | cli | 99 |
12 | keras | 97 |
13 | docker | 93 |
14 | nlp | 90 |
15 | data-science | 82 |
16 | natural-language-processing | 77 |
17 | computer-vision | 76 |
18 | python-3 | 72 |
19 | asyncio | 71 |
20 | neural-network | 69 |
21 | api | 68 |
22 | visualization | 64 |
23 | reinforcement-learning | 63 |
24 | terminal | 61 |
25 | windows | 60 |
26 | javascript | 56 |
27 | bot | 51 |
28 | python-library | 50 |
29 | raspberry-pi | 49 |
30 | gan | 49 |
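
To make the "large share" observation concrete, the tag counts from the table above can be summed by theme. The grouping below is only an illustrative assumption (the source names these tags as examples but defines no categories), and the totals are tag occurrences, not distinct repositories, because one repository may carry several of these tags.

```python
# Counts copied from the tag table above.
tag_counts = {
    "deep-learning": 373,
    "tensorflow": 330,
    "pytorch": 151,
    "keras": 97,
    "neural-network": 69,
    "gan": 49,
    "django": 289,
    "flask": 124,
}

# Hypothetical grouping, for illustration only.
groups = {
    "deep learning": ["deep-learning", "tensorflow", "pytorch", "keras", "neural-network", "gan"],
    "web frameworks": ["django", "flask"],
}

totals = {group: sum(tag_counts[tag] for tag in tags) for group, tags in groups.items()}
for group, total in totals.items():
    print(group, total)
```
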

No. | Package | Occurrences |
---|---|---|
1 | tensorflow | 330 |
2 | http | 42 |
3 | numpy | 39 |
4 | json | 37 |
5 | sqlalchemy | 36 |
6 | pandas | 33 |
7 | theano | 32 |
8 | jupyter | 29 |
9 | scikit-learn | 28 |
10 | html | 23 |
11 | ipython | 21 |
12 | | 20 |
13 | opencv | 20 |
14 | matplotlib | 20 |
15 | parser | 18 |
16 | logging | 18 |
17 | requests | 17 |
18 | twisted | 16 |
19 | csv | 16 |
20 | aiohttp | 16 |

No. | Package 1 | Package 2 | Occurrences |
---|---|---|---|
1 | theano | tensorflow | 14 |
2 | numpy | pandas | 9 |
3 | http | requests | 8 |
4 | jupyter | ipython | 7 |
5 | pandas | matplotlib | 6 |
6 | numpy | scipy | 5 |
7 | pyside | pyqt4 | 5 |
8 | pandas | scikit-learn | 5 |
9 | scipy | numpy | 5 |
10 | numpy | opencv | 4 |
11 | numpy | tensorflow | 4 |
12 | json | html | 4 |
13 | html | json | 4 |
14 | opencv | tensorflow | 4 |
15 | pip | virtualenv | 4 |
16 | tensorflow | numpy | 4 |
17 | numpy | scikit-learn | 3 |
18 | csv | json | 3 |
19 | threading | multiprocessing | 3 |
20 | json | http | 3 |

No. | Package | Occurrences |
---|---|---|
0 | numpy | 2229 |
1 | pandas | 1902 |
2 | matplotlib | 1584 |
3 | string | 797 |
4 | scipy | 654 |
5 | regex | 527 |
6 | tkinter | 478 |
7 | pip | 474 |
8 | json | 401 |
9 | opencv | 396 |
10 | csv | 386 |
11 | sqlalchemy | 378 |
12 | datetime | 359 |
13 | subprocess | 348 |
14 | multiprocessing | 345 |
15 | tensorflow | 340 |
16 | virtualenv | 297 |
17 | scikit-learn | 289 |
18 | pyqt | 276 |
19 | ipython | 264 |
20 | html | 261 |
21 | nltk | 223 |
22 | logging | 202 |
23 | lxml | 174 |
24 | math | 170 |
25 | pyqt4 | 167 |
26 | cython | 161 |
27 | pickle | 149 |
28 | pygame | 142 |
29 | random | 139 |
30 | argparse | 133 |
31 | ctypes | 132 |
32 | setuptools | 127 |
33 | wxpython | 124 |
34 | jinja2 | 115 |
35 | twisted | 115 |
36 | time | 107 |
37 | | 104 |
38 | tornado | 95 |
39 | ssl | 89 |
40 | distutils | 84 |
41 | py2exe | 82 |
42 | mysql-python | 81 |
43 | sqlite3 | 80 |
44 | networkx | 73 |
45 | io | 72 |
46 | pymongo | 69 |
47 | pywin32 | 63 |
48 | types | 62 |
49 | pyside | 62 |
50 | pygtk | 57 |
51 | kivy | 57 |
52 | queue | 54 |
53 | pyodbc | 51 |
54 | jupyter | 51 |
55 | itertools | 49 |
56 | cgi | 48 |
57 | pytz | 41 |
58 | nose | 41 |
59 | gevent | 41 |
60 | sympy | 40 |
61 | pdb | 39 |
62 | console | 38 |
63 | pyserial | 38 |
64 | bokeh | 38 |
65 | statsmodels | 37 |
66 | pillow | 35 |
67 | decimal | 34 |
68 | theano | 34 |
69 | copy | 32 |
70 | spyder | 31 |
71 | pytables | 30 |
72 | reportlab | 28 |
73 | numbers | 27 |
74 | rpy2 | 27 |
75 | zipfile | 26 |
76 | pyyaml | 25 |
77 | base64 | 24 |
78 | warnings | 24 |
79 | h5py | 24 |
80 | heatmap | 23 |
81 | pycurl | 23 |
82 | collections | 22 |
83 | cmd | 22 |
84 | pyparsing | 22 |
85 | gzip | 21 |
86 | struct | 21 |
87 | smtplib | 20 |
88 | keyword | 19 |
89 | simplejson | 19 |
90 | ftplib | 19 |
91 | configparser | 19 |
92 | readline | 19 |
93 | scikit-image | 18 |
94 | pyaudio | 18 |
95 | python-dateutil | 18 |
96 | numba | 18 |
97 | sys | 17 |
98 | select | 17 |
99 | gensim | 17 |
100 | mercurial | 17 |

No. | Package 1 | Package 2 | Occurrences |
---|---|---|---|
1 | numpy | scipy | 790 |
2 | numpy | pandas | 430 |
3 | matplotlib | numpy | 384 |
4 | matplotlib | pandas | 288 |
5 | matplotlib | scipy | 146 |
6 | pyqt4 | pyqt | 146 |
7 | pip | virtualenv | 138 |
8 | csv | pandas | 126 |
9 | datetime | pandas | 104 |
10 | regex | string | 92 |
11 | cython | numpy | 92 |
12 | scikit-learn | numpy | 90 |
13 | opencv | numpy | 86 |
14 | setuptools | pip | 64 |
15 | matplotlib | ipython | 62 |
16 | pyqt | pyside | 62 |
17 | setuptools | distutils | 60 |
18 | scikit-learn | pandas | 60 |
19 | numpy | math | 60 |
20 | pandas | ipython | 56 |
21 | jupyter | ipython | 54 |
22 | csv | numpy | 54 |
23 | datetime | time | 50 |
24 | pandas | scipy | 50 |
25 | tkinter | matplotlib | 44 |
26 | pip | numpy | 44 |
27 | random | numpy | 44 |
28 | multiprocessing | numpy | 42 |
29 | matplotlib | heatmap | 38 |
30 | datetime | pytz | 36 |
31 | numpy | tensorflow | 36 |
32 | json | pandas | 34 |
33 | queue | multiprocessing | 34 |
34 | scikit-learn | scipy | 34 |
35 | statsmodels | pandas | 30 |
36 | pytables | pandas | 30 |
37 | pandas | string | 30 |
38 | numba | numpy | 30 |
39 | multiprocessing | pickle | 28 |
40 | networkx | matplotlib | 28 |
41 | scipy | math | 28 |
42 | simplejson | json | 26 |
43 | h5py | numpy | 24 |
44 | pyqt | matplotlib | 24 |
45 | datetime | numpy | 24 |
46 | sympy | numpy | 24 |
47 | smtplib | | 22 |
48 | statsmodels | numpy | 20 |
49 | pip | distutils | 20 |
50 | pytables | numpy | 20 |
51 | pickle | numpy | 20 |
52 | pandas | sqlalchemy | 20 |
53 | distutils | cython | 20 |
54 | datetime | matplotlib | 20 |
55 | html | regex | 20 |
56 | ctypes | numpy | 18 |
57 | pyqt4 | pyside | 18 |
58 | csv | json | 16 |
59 | opencv | matplotlib | 16 |
60 | lxml | html | 16 |
61 | ode | scipy | 16 |
62 | sympy | scipy | 16 |
63 | matplotlib | wxpython | 16 |
64 | setuptools | virtualenv | 14 |
65 | pip | lxml | 14 |
66 | numpy | ipython | 14 |
67 | html | pandas | 14 |
68 | pip | matplotlib | 14 |
69 | json | pickle | 14 |
70 | json | string | 14 |
71 | virtualenv | matplotlib | 14 |
72 | pyqt4 | matplotlib | 14 |
73 | datetime | string | 14 |
74 | pymongo | json | 14 |
75 | html | jinja2 | 14 |
76 | statsmodels | scipy | 14 |
77 | pip | mysql-python | 12 |
78 | multiprocessing | subprocess | 12 |
79 | py2exe | pyqt | 12 |
80 | tkinter | wxpython | 12 |
81 | spyder | matplotlib | 12 |
82 | random | string | 12 |
83 | nltk | tokenize | 12 |
84 | json | html | 12 |
85 | pip | tensorflow | 12 |
86 | opencv | scikit-image | 12 |
87 | regex | pandas | 10 |
88 | csv | string | 10 |
89 | pip | pygame | 10 |
90 | nltk | regex | 10 |
91 | types | pandas | 10 |
92 | pandas | heatmap | 10 |
93 | numpy | string | 10 |
94 | py2exe | wxpython | 10 |
95 | ctypes | pywin32 | 10 |
96 | tkinter | pygame | 10 |
97 | sympy | matplotlib | 10 |
98 | io | pandas | 10 |
99 | scikit-learn | multiprocessing | 10 |
100 | pip | ipython | 10 |

No. | Package | Popularity |
---|---|---|
0 | string | 69064758 |
1 | pandas | 45328071 |
2 | numpy | 43093597 |
3 | matplotlib | 36833422 |
4 | pip | 27232593 |
5 | datetime | 19288259 |
6 | json | 17297712 |
7 | regex | 12370782 |
8 | subprocess | 10981344 |
9 | csv | 10966890 |
10 | scipy | 10668052 |
11 | tkinter | 8859804 |
12 | virtualenv | 8447119 |
13 | time | 7351270 |
14 | opencv | 6600929 |
15 | types | 6246745 |
16 | math | 6126282 |
17 | random | 6106164 |
18 | ipython | 4965214 |
19 | html | 4889697 |
20 | logging | 4587029 |
21 | sqlalchemy | 4286895 |
22 | multiprocessing | 4051885 |
23 | tensorflow | 3794396 |
24 | setuptools | 3371261 |
25 | scikit-learn | 2989051 |
26 | mysql-python | 2891747 |
27 | io | 2805214 |
28 | pyqt | 2791496 |
29 | nltk | 2733686 |
30 | copy | 2600492 |
31 | lxml | 2521582 |
32 | | 2394460 |
33 | pickle | 2291889 |
34 | argparse | 2197746 |
35 | ssl | 2039735 |
36 | pygame | 1924669 |
37 | numbers | 1911016 |
38 | jinja2 | 1880599 |
39 | keyword | 1864734 |
40 | pyqt4 | 1673453 |
41 | decimal | 1547834 |
42 | wxpython | 1525681 |
43 | sqlite3 | 1339759 |
44 | console | 1313286 |
45 | timeit | 1238865 |
46 | pyserial | 1228376 |
47 | ctypes | 1100857 |
48 | distutils | 1071212 |
49 | py2exe | 1052789 |