get data from openAlex and do some data analysis
git clone https://github.com/XuanlongQ/HuluProject.git
pip install -r requirements.txt
cd HuluProject/
python3 pro/bin/start.py
sh start_py.sh
docs: all the universities' attributes
- docs/doi_ori_url.txt: doi url of OpenAlex (support)
- docs/doi.txt: universities' doi (support)
- docs/doi_university.txt: universities' doi and their name (support)
- docs/grids.txt:universities' iped,grid (support)
- docs/high_citedpaper_doi.txt: high cited papers' doi and openAlex url (support)
- docs/iped_grid.txt: universities' iped,grid (support)
- docs/ipeds.txt: universities' iped (support)
- docs/ror_grid.txt:universities' ror,grid (support)
- docs/school_type.txt: universities' type (support)
-
- docs/iped_grid_ror_backup.txt: universities' iped,grid,openalAlex ror (key file)
concepts.txt: all concepts level () conceptEntity.txt: parse concept - do not delete this file
pro: all the program code
- pro/bin:
- pro/bin/start.py : please confirm which script you want to apply
- pro/bin/r_clustring.R : r clustering algorithm
- pro/conf: config.yaml file for path config
- pro/lib: all code
pro/experimentdata : all the experiment data
- pro/experimentdata/r_image: r image
- pro/experimentdata/r_image_csv: csv data of r image
- pro/experimentdata/universities: original data of craweled data - sublevel
- pro/experimentdata/university_concepts: original data of craweled data - level 0
- pro/experimentdata/university_type: merged data
Using for crawl data from path (data["Path"]["iped_grid_name_ror"])
Using for transfer universities' concept from sublevel to level 0, it must run after script1
Using for transfer script2's result to 4 types
4.1 get high cited papers' original and destination concepts 4.2 get high cited paper' original and destination concepts - level0