This project is a slightly modified version of AnKushRR/Wikipedia-Text-Extractor. The getWikiPages.py script extracts plain text from the Wikipedia pages of about 110 game titles listed in the pageTitleList.txt file. The output contains the extracted page content for each game title. The AllGameDescriptions.txt file combines all extracted content into a single plain-text file, which is convenient to use as machine learning training data. The getWikiGamePlayDescription.py script extracts only the gameplay section of each Wikipedia page. An example of its output is GameplayPuzzles.txt.
- Python 3 or later
- Install the Wikipedia library with pip3 (pip3 install wikipedia)
- Download the repository
- You may edit pageTitleList.txt to add your own Wikipedia page titles. Make sure each page title is the same as in the respective Wikipedia URL.
- Run getWikiPages.py to extract all text.
- Run getWikiGamePlayDescription.py to extract the gameplay content only.
- A new file named AllGameDescriptions.txt will be created in the main directory.
- A new directory named Output will be created.
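
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual code of getWikiPages.py or getWikiGamePlayDescription.py: the function names (`read_titles`, `fetch_page_text`, `fetch_gameplay_text`, `extract_all`, `main`), the one-file-per-title naming inside Output, and the section heading "Gameplay" are all hypothetical. Only `wikipedia.page(...)`, its `.content` attribute, and its `.section(...)` method come from the Wikipedia library.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def read_titles(lines: Iterable[str]) -> list[str]:
    """Return the non-empty, stripped page titles, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def fetch_page_text(title: str) -> str:
    """Fetch the full plain-text content of one page (needs network access)."""
    import wikipedia  # third-party library: pip3 install wikipedia

    return wikipedia.page(title, auto_suggest=False).content


def fetch_gameplay_text(title: str) -> str:
    """Fetch only the section assumed to be headed 'Gameplay'."""
    import wikipedia

    # section() returns None when the page has no such heading.
    section = wikipedia.page(title, auto_suggest=False).section("Gameplay")
    return section or ""


def extract_all(
    titles: Iterable[str],
    fetch: Callable[[str], str],
    output_dir: Path,
    combined_path: Path,
) -> list[str]:
    """Write one text file per title plus one combined file.

    Returns the titles that were fetched successfully; failures are skipped.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with combined_path.open("w", encoding="utf-8") as combined:
        for title in titles:
            try:
                text = fetch(title)
            except Exception as exc:  # e.g. page not found or ambiguous title
                print(f"Skipping {title!r}: {exc}")
                continue
            # Assumed naming scheme: the title with '/' replaced, plus '.txt'.
            file_name = title.replace("/", "_") + ".txt"
            (output_dir / file_name).write_text(text, encoding="utf-8")
            combined.write(text + "\n\n")
            written.append(title)
    return written


def main() -> None:
    """Extract the full text of every title listed in pageTitleList.txt."""
    lines = Path("pageTitleList.txt").read_text(encoding="utf-8").splitlines()
    extract_all(
        read_titles(lines),
        fetch_page_text,
        Path("Output"),
        Path("AllGameDescriptions.txt"),
    )
```

Calling `main()` would run the full-text extraction. Passing `fetch_gameplay_text` instead of `fetch_page_text` to `extract_all` would collect only the gameplay sections. The fetch function is passed in as a parameter so that the file-writing logic can be tried without network access.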