aws-haoyuli/Relationship-Analysis-between-Packages

Relationship-Analysis-between-Packages

Research on the relationship between packages in Python.

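The statistics and analysis results are as follows:

Package usage in GitHub repositories

Among the 1,000 repositories collected by a crawler, we counted the packages each one uses; the most frequently used packages are listed below. Modules that ship with Python (os, sys, time) are used most often; among third-party libraries, numpy is the most frequent; web frameworks, data visualization and data processing packages, and machine learning/deep learning packages also appear often.

No. Package Uses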
1 os 419
2 sys 366
3 time 183
4 numpy 180
5 re 136
6 json 118
7 setuptools 117
8 datetime 112
9 django 95
10 random 89
11 logging 89
12 matplotlib 83
13 requests 82
14 argparse 78
15 __future__ 76
16 math 69
17 subprocess 64
18 pandas 63
19 unittest 59
20 urllib 57
21 collections 53
22 tensorflow 48
23 sklearn 47
24 flask 44
25 distutils 41
26 threading 39
27 scipy 37
28 shutil 36
29 csv 35
30 hashlib 33
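
The source does not describe how the imports were extracted, so the following is only a minimal sketch of one way such a count could be produced. It assumes each repository is available as a list of Python source strings, and it counts a package once per repository; the function names `top_level_imports` and `count_package_usage` are hypothetical and not taken from the repository.

```python
import ast
from collections import Counter


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by one Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" is counted as "os"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from numpy.linalg import norm" is counted as "numpy";
            # relative imports (level > 0) are skipped
            names.add(node.module.split(".")[0])
    return names


def count_package_usage(repos: dict[str, list[str]]) -> Counter:
    """Count, per package, how many repositories import it at least once.

    Assumption: `repos` maps a repository name to the text of its .py files.
    """
    usage = Counter()
    for sources in repos.values():
        packages = set()
        for source in sources:
            try:
                packages |= top_level_imports(source)
            except SyntaxError:
                # Files that do not parse (e.g. Python 2 syntax) are skipped here
                continue
        usage.update(packages)
    return usage
```

Calling `count_package_usage(repos).most_common(30)` would then give a ranking in the same shape as the table above. Whether the original counts are per repository or per file is not stated, so the per-repository choice here is an assumption.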

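Packages that appear together in the same project

Among the 1,000 repositories collected by a crawler, we counted the packages each one uses; the package pairs most often used in the same project are listed below. System-related packages (e.g. os, time, sys) appear frequently; packages for scientific computing and data processing (e.g. numpy, matplotlib, pandas) also often appear together.

No. package-1 package-2 Co-occurrences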
1 os sys 283
2 os time 118
3 os re 101
4 time sys 97
5 os django 93
6 django sys 91
7 numpy matplotlib 86
8 re sys 86
9 os numpy 82
10 logging os 73
11 os datetime 70
12 os json 70
13 os setuptools 66
14 numpy sys 66
15 datetime sys 60
16 os argparse 60
17 json sys 57
18 os __future__ 57
19 time re 55
20 os subprocess 54
21 numpy sklearn 53
22 time datetime 52
23 time json 51
24 pandas numpy 50
25 logging sys 49
26 os random 49
27 sys subprocess 48
28 argparse sys 48
29 __future__ sys 48
30 sys __future__ 48
31 sklearn matplotlib 46
32 os requests 45
33 numpy tensorflow 45
34 time numpy 44
35 logging time 43
36 os matplotlib 42
37 math sys 40
38 setuptools sys 39
39 re json 39
40 datetime json 38
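
Rows 29 and 30 of the table above list the same pair (`__future__`, `sys`) in both orders with the same count. A minimal sketch of counting unordered pairs, so that each pair gets a single row, could look like the following. It assumes the set of packages per repository is already known; `count_co_occurrences` is a hypothetical name, and this is not necessarily how the original counts were produced.

```python
from collections import Counter
from itertools import combinations


def count_co_occurrences(repo_packages: dict[str, set[str]]) -> Counter:
    """Count how many repositories use each unordered pair of packages."""
    pairs = Counter()
    for packages in repo_packages.values():
        # Sorting gives every pair a canonical order, so (os, sys) and
        # (sys, os) end up under the same key
        pairs.update(combinations(sorted(packages), 2))
    return pairs
```

With this approach, `count_co_occurrences(repo_packages).most_common(40)` yields one entry per pair, keyed by a tuple in alphabetical order.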

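Licenses used by repositories on GitHub

Among the 16,800 repositories collected by a crawler, license usage is as follows (repositories whose license field is None are excluded from the statistics). The MIT License is the most used, accounting for more than one third of the total.

License Repositories Percentage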
MIT License 4736 38.90%
Apache License 2.0 1558 12.80%
Other 1551 12.74%
BSD 3-clause "New" or "Revised" License 1478 12.14%
GNU General Public License v3.0 1157 9.50%
GNU General Public License v2.0 532 4.37%
BSD 2-clause "Simplified" License 433 3.56%
GNU Affero General Public License v3.0 191 1.57%
GNU Lesser General Public License v3.0 133 1.09%
The Unlicense 96 0.79%
ISC License 80 0.66%
GNU Lesser General Public License v2.1 67 0.55%
Do What The F*ck You Want To Public License 48 0.39%
Creative Commons Zero v1.0 Universal 40 0.33%
Mozilla Public License 2.0 37 0.30%
SIL Open Font License 1.1 12 0.10%
Eclipse Public License 1.0 6 0.05%
Creative Commons Attribution Share Alike 4.0 5 0.04%
zlib License 4 0.03%
Creative Commons Attribution 4.0 3 0.02%
Artistic License 2.0 2 0.02%
BSD 3-clause Clear License 2 0.02%
Open Software License 3.0 1 0.01%
European Union Public License 1.1 1 0.01%
PostgreSQL License 1 0.01%
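
The repository counts in the table sum to 12,174, and 4736 / 12174 is about 38.90%, which is consistent with the percentages being computed over repositories whose license is not None. A minimal sketch of that calculation follows; it assumes the crawl yields one license name (or `None`) per repository, and `license_shares` is a hypothetical helper name.

```python
from collections import Counter
from typing import Optional


def license_shares(licenses: list[Optional[str]]) -> list[tuple[str, int, float]]:
    """Return (license, repository count, percentage) rows, most common first.

    Repositories whose license is None are ignored, so the percentages are
    relative to the repositories that declare a license.
    """
    counts = Counter(name for name in licenses if name is not None)
    total = sum(counts.values())
    return [
        (name, n, round(100 * n / total, 2))
        for name, n in counts.most_common()
    ]
```

For example, a sample with three MIT repositories, one Apache repository, and six repositories without a license gives 75% and 25%, not 30% and 10%.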

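Most common tags on GitHub

Among the 16,800 repositories collected by a crawler, the most frequently used tags are listed below. Deep learning tags (e.g. deep-learning, pytorch, tensorflow, neural-network, keras, gan) make up a large share; web frameworks (e.g. django, flask) also appear many times.

No. Tag Occurrences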
1 python 2403
2 deep-learning 373
3 tensorflow 330
4 machine-learning 307
5 django 289
6 python3 188
7 pytorch 151
8 security 139
9 linux 128
10 flask 124
11 cli 99
12 keras 97
13 docker 93
14 nlp 90
15 data-science 82
16 natural-language-processing 77
17 computer-vision 76
18 python-3 72
19 asyncio 71
20 neural-network 69
21 api 68
22 visualization 64
23 reinforcement-learning 63
24 terminal 61
25 windows 60
26 javascript 56
27 bot 51
28 python-library 50
29 raspberry-pi 49
30 gan 49

Most common packages among tags

No. Package Occurrences
1 tensorflow 330
2 http 42
3 numpy 39
4 json 37
5 sqlalchemy 36
6 pandas 33
7 theano 32
8 jupyter 29
9 scikit-learn 28
10 html 23
11 ipython 21
12 email 20
13 opencv 20
14 matplotlib 20
15 parser 18
16 logging 18
17 requests 17
18 twisted 16
19 csv 16
20 aiohttp 16
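
The tensorflow count here (330) equals its count in the tag table, which suggests that this table keeps only the tags that are also package names. The source does not say how that filtering was done; the sketch below assumes a set of known package names is available and intersects it with each repository's tags. `package_tag_counts` and `known_packages` are hypothetical names.

```python
from collections import Counter


def package_tag_counts(
    repo_tags: dict[str, list[str]], known_packages: set[str]
) -> Counter:
    """Count repositories whose tags include each known package name."""
    counts = Counter()
    for tags in repo_tags.values():
        # Lower-case and de-duplicate the tags so that a repository is
        # counted at most once per package
        counts.update({tag.lower() for tag in tags} & known_packages)
    return counts
```

The result depends heavily on the contents of `known_packages`; generic names such as http, html, or parser match both a standard-library module and an ordinary topic word.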

Package pairs that most often appear together in tags

No. package-1 package-2 Occurrences
1 theano tensorflow 14
2 numpy pandas 9
3 http requests 8
4 jupyter ipython 7
5 pandas matplotlib 6
6 numpy scipy 5
7 pyside pyqt4 5
8 pandas scikit-learn 5
9 scipy numpy 5
10 numpy opencv 4
11 numpy tensorflow 4
12 json html 4
13 html json 4
14 opencv tensorflow 4
15 pip virtualenv 4
16 tensorflow numpy 4
17 numpy scikit-learn 3
18 csv json 3
19 threading multiprocessing 3
20 json http 3

Package occurrences across 36k Stack Overflow questions

No. Package Occurrences
0 numpy 2229
1 pandas 1902
2 matplotlib 1584
3 string 797
4 scipy 654
5 regex 527
6 tkinter 478
7 pip 474
8 json 401
9 opencv 396
10 csv 386
11 sqlalchemy 378
12 datetime 359
13 subprocess 348
14 multiprocessing 345
15 tensorflow 340
16 virtualenv 297
17 scikit-learn 289
18 pyqt 276
19 ipython 264
20 html 261
21 nltk 223
22 logging 202
23 lxml 174
24 math 170
25 pyqt4 167
26 cython 161
27 pickle 149
28 pygame 142
29 random 139
30 argparse 133
31 ctypes 132
32 setuptools 127
33 wxpython 124
34 jinja2 115
35 twisted 115
36 time 107
37 email 104
38 tornado 95
39 ssl 89
40 distutils 84
41 py2exe 82
42 mysql-python 81
43 sqlite3 80
44 networkx 73
45 io 72
46 pymongo 69
47 pywin32 63
48 types 62
49 pyside 62
50 pygtk 57
51 kivy 57
52 queue 54
53 pyodbc 51
54 jupyter 51
55 itertools 49
56 cgi 48
57 pytz 41
58 nose 41
59 gevent 41
60 sympy 40
61 pdb 39
62 console 38
63 pyserial 38
64 bokeh 38
65 statsmodels 37
66 pillow 35
67 decimal 34
68 theano 34
69 copy 32
70 spyder 31
71 pytables 30
72 reportlab 28
73 numbers 27
74 rpy2 27
75 zipfile 26
76 pyyaml 25
77 base64 24
78 warnings 24
79 h5py 24
80 heatmap 23
81 pycurl 23
82 collections 22
83 cmd 22
84 pyparsing 22
85 gzip 21
86 struct 21
87 smtplib 20
88 keyword 19
89 simplejson 19
90 ftplib 19
91 configparser 19
92 readline 19
93 scikit-image 18
94 pyaudio 18
95 python-dateutil 18
96 numba 18
97 sys 17
98 select 17
99 gensim 17
100 mercurial 17

Package pairs appearing in the same Stack Overflow question:

No. package-1 package-2 Occurrences
1 numpy scipy 790
2 numpy pandas 430
3 matplotlib numpy 384
4 matplotlib pandas 288
5 matplotlib scipy 146
6 pyqt4 pyqt 146
7 pip virtualenv 138
8 csv pandas 126
9 datetime pandas 104
10 regex string 92
11 cython numpy 92
12 scikit-learn numpy 90
13 opencv numpy 86
14 setuptools pip 64
15 matplotlib ipython 62
16 pyqt pyside 62
17 setuptools distutils 60
18 scikit-learn pandas 60
19 numpy math 60
20 pandas ipython 56
21 jupyter ipython 54
22 csv numpy 54
23 datetime time 50
24 pandas scipy 50
25 tkinter matplotlib 44
26 pip numpy 44
27 random numpy 44
28 multiprocessing numpy 42
29 matplotlib heatmap 38
30 datetime pytz 36
31 numpy tensorflow 36
32 json pandas 34
33 queue multiprocessing 34
34 scikit-learn scipy 34
35 statsmodels pandas 30
36 pytables pandas 30
37 pandas string 30
38 numba numpy 30
39 multiprocessing pickle 28
40 networkx matplotlib 28
41 scipy math 28
42 simplejson json 26
43 h5py numpy 24
44 pyqt matplotlib 24
45 datetime numpy 24
46 sympy numpy 24
47 email smtplib 22
48 statsmodels numpy 20
49 pip distutils 20
50 pytables numpy 20
51 pickle numpy 20
52 pandas sqlalchemy 20
53 distutils cython 20
54 datetime matplotlib 20
55 html regex 20
56 ctypes numpy 18
57 pyqt4 pyside 18
58 csv json 16
59 opencv matplotlib 16
60 lxml html 16
61 ode scipy 16
62 sympy scipy 16
63 matplotlib wxpython 16
64 setuptools virtualenv 14
65 pip lxml 14
66 numpy ipython 14
67 html pandas 14
68 pip matplotlib 14
69 json pickle 14
70 json string 14
71 virtualenv matplotlib 14
72 pyqt4 matplotlib 14
73 datetime string 14
74 pymongo json 14
75 html jinja2 14
76 statsmodels scipy 14
77 pip mysql-python 12
78 multiprocessing subprocess 12
79 py2exe pyqt 12
80 tkinter wxpython 12
81 spyder matplotlib 12
82 random string 12
83 nltk tokenize 12
84 json html 12
85 pip tensorflow 12
86 opencv scikit-image 12
87 regex pandas 10
88 csv string 10
89 pip pygame 10
90 nltk regex 10
91 types pandas 10
92 pandas heatmap 10
93 numpy string 10
94 py2exe wxpython 10
95 ctypes pywin32 10
96 tkinter pygame 10
97 sympy matplotlib 10
98 io pandas 10
99 scikit-learn multiprocessing 10
100 pip ipython 10

Packages ranked by question popularity on Stack Overflow:

No. Package Popularity
0 string 69064758
1 pandas 45328071
2 numpy 43093597
3 matplotlib 36833422
4 pip 27232593
5 datetime 19288259
6 json 17297712
7 regex 12370782
8 subprocess 10981344
9 csv 10966890
10 scipy 10668052
11 tkinter 8859804
12 virtualenv 8447119
13 time 7351270
14 opencv 6600929
15 types 6246745
16 math 6126282
17 random 6106164
18 ipython 4965214
19 html 4889697
20 logging 4587029
21 sqlalchemy 4286895
22 multiprocessing 4051885
23 tensorflow 3794396
24 setuptools 3371261
25 scikit-learn 2989051
26 mysql-python 2891747
27 io 2805214
28 pyqt 2791496
29 nltk 2733686
30 copy 2600492
31 lxml 2521582
32 email 2394460
33 pickle 2291889
34 argparse 2197746
35 ssl 2039735
36 pygame 1924669
37 numbers 1911016
38 jinja2 1880599
39 keyword 1864734
40 pyqt4 1673453
41 decimal 1547834
42 wxpython 1525681
43 sqlite3 1339759
44 console 1313286
45 timeit 1238865
46 pyserial 1228376
47 ctypes 1100857
48 distutils 1071212
49 py2exe 1052789
