150k Python Dataset

We provide the source files used to obtain the py150 dataset (https://www.sri.inf.ethz.ch/py150). The archive contains the following files:

data.tar.gz -- Archive containing all the source files
python100k_train.txt -- List of files used in the training dataset.
python50k_eval.txt -- List of files used in the evaluation dataset.
github_repos.txt -- List of GitHub repositories and their revisions used to obtain the dataset.

Note that python100k_train.txt and python100k_train.json (containing the ASTs of the parsed files) are in the same order. That is, parsing the n-th file listed in python100k_train.txt produces the n-th AST in python100k_train.json.
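
The ordering guarantee above means a file path and its AST can be matched by line position. The following is a minimal sketch, not part of this repository: it assumes python100k_train.json stores one JSON-encoded AST per line, and the helper name `iter_path_ast_pairs` and its `limit` parameter are hypothetical.

```python
import json
from itertools import islice


def iter_path_ast_pairs(list_file, json_file, limit=None):
    """Yield (source_path, ast) pairs, matching line n of both files.

    Assumption: json_file holds one JSON-encoded AST per line, in the
    same order as the paths in list_file.
    """
    with open(list_file, encoding="utf-8") as paths, \
         open(json_file, encoding="utf-8") as asts:
        # zip stops at the shorter file; islice caps the number of pairs.
        for path_line, ast_line in islice(zip(paths, asts), limit):
            yield path_line.strip(), json.loads(ast_line)
```

For example, `next(iter_path_ast_pairs("python100k_train.txt", "python100k_train.json"))` would return the first path together with its parsed AST, without loading the whole JSON file into memory.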

How to use?

Run:

python preprocess.py --dirname <folder with python150k>

The extracted functions are then stored in:

| parsed
--> python150k_sequence.txt
--> python150k_docstrings.txt
--> python150k_ast.txt
--> python150k_comments.txt
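
After the run, it can be useful to confirm that all four output files were written. The sketch below is an illustration only: the helper `missing_outputs` is hypothetical, and the location of the `parsed` directory is passed in by the caller because this README does not state where it is created.

```python
from pathlib import Path

# Output file names as listed in this README.
EXPECTED_OUTPUTS = (
    "python150k_sequence.txt",
    "python150k_docstrings.txt",
    "python150k_ast.txt",
    "python150k_comments.txt",
)


def missing_outputs(parsed_dir):
    """Return the expected output files that are absent from parsed_dir."""
    parsed = Path(parsed_dir)
    return [name for name in EXPECTED_OUTPUTS if not (parsed / name).is_file()]
```

An empty list from `missing_outputs("<path to parsed>")` indicates that every expected file exists; it does not check the contents of the files.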
