In [2]:
%load_ext autoreload
%autoreload 2

# Download the PDFs
This package uses an endpoint from [en.1jour-1jeu](https://en.1jour-1jeu.com) to collect board game manuals.

In [4]:
# Import the necessary libraries
from src.data_extraction.extractor import Extractor

### First time 

In [31]:
# first we initialize the Extractor object
extractor = Extractor()

In [None]:
# The extractor queries each search-result page of an empty title search
# and extracts the title and the manual link for each game
extractor.get_all_game_infos(last_page_number=2)

print("GameInfos saved internally:", len(extractor.games_list))

-> Extracting games from page 1
Extracted kingdomino
Extracted battle sheep
Extracted dr eureka
Extracted codenames
Extracted when i dream
Extracted unlock escape adventures
Extracted terraforming mars
Extracted 7 wonders duel
Extracted exploding kittens
Extracted this war of mine the board game
Extracted queendomino
Extracted 7 wonders
-> Finished extracting games from page 1
------------------------------
-> Extracting games from page 2
Extracted zombicide
Extracted clank a deck building adventure
Extracted 7 wonders duel pantheon
Extracted scythe strateges des cieux
Extracted forbidden stars
Extracted 7 wonders leaders
Extracted 7 wonders cities anniversary pack
Extracted 7 wonders leaders anniversary pack
-> Finished extracting games from page 2
------------------------------
Successfully extracted 309 games
GameInfos saved internally: 309
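The page-by-page loop described in the cell comment can be pictured roughly as follows. This is a minimal sketch, not the package's code: `collect_games` and `fetch_page` are hypothetical names, and the fake fetcher stands in for the real HTTP request and HTML parsing.

```python
from typing import Callable, Dict, List

Game = Dict[str, str]


def collect_games(fetch_page: Callable[[int], List[Game]],
                  last_page_number: int) -> List[Game]:
    """Query pages 1..last_page_number and gather every game found."""
    games: List[Game] = []
    for page in range(1, last_page_number + 1):
        print(f"-> Extracting games from page {page}")
        games.extend(fetch_page(page))
    return games


# Hypothetical stand-in for the real request + parsing step.
def fake_fetch_page(page: int) -> List[Game]:
    return [
        {"title": f"game {page}-{i}", "link": f"https://example.com/{page}-{i}.pdf"}
        for i in range(3)
    ]


games = collect_games(fake_fetch_page, last_page_number=2)
print("Games collected:", len(games))
```

Passing the fetcher in as an argument keeps the loop testable without network access.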


We can also chain the call onto the constructor and assign the result directly:

In [None]:
ex_direct = Extractor().get_all_game_infos(last_page_number=1)

-> Extracting games from page 1
Extracted kingdomino
Extracted battle sheep
Extracted dr eureka
Extracted codenames
Extracted when i dream
Extracted unlock escape adventures
Extracted terraforming mars
Extracted 7 wonders duel
Extracted exploding kittens
Extracted this war of mine the board game
Extracted queendomino
Extracted 7 wonders
-> Finished extracting games from page 1
------------------------------
Successfully extracted 12 games
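Assigning the result of `Extractor().get_all_game_infos(...)` only yields a usable extractor if the method returns the instance. A minimal sketch of that pattern, assuming the method ends with `return self` (`SketchExtractor` is a hypothetical stand-in, not the real class):

```python
class SketchExtractor:
    def __init__(self):
        self.games_list = []

    def get_all_game_infos(self, last_page_number: int = 1):
        # Placeholder for the real extraction: one fake entry per page.
        self.games_list = [
            f"game from page {page}" for page in range(1, last_page_number + 1)
        ]
        return self  # returning self is what makes the chained call assignable


ex = SketchExtractor().get_all_game_infos(last_page_number=2)
print(type(ex).__name__, len(ex.games_list))
```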


Each game's metadata is stored as a `GameInfo` dataclass containing:
* `link` to manual
* `title` of the board game
* `language` of the manual

In [41]:
extractor.games_list[0]

GameInfo(link='https://cdn.1j1ju.com/medias/22/d4/b3-kingdomino-regle.pdf', title='kingdomino', language='english')
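For readers unfamiliar with dataclasses, a container with these three fields could look like the sketch below. `GameInfoSketch` is a hypothetical name, and the `__str__` method is an assumption modelled on the short form shown when a game is printed later in this notebook; the real `GameInfo` may differ.

```python
from dataclasses import dataclass


@dataclass
class GameInfoSketch:
    link: str      # URL of the manual PDF
    title: str     # title of the board game
    language: str  # language of the manual

    def __str__(self) -> str:
        # Assumed short form, e.g. "kingdomino, in (english)"
        return f"{self.title}, in ({self.language})"


game = GameInfoSketch(
    link="https://cdn.1j1ju.com/medias/22/d4/b3-kingdomino-regle.pdf",
    title="kingdomino",
    language="english",
)
print(repr(game))  # auto-generated field-by-field representation
print(game)        # short human-readable form
```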

We can __save__ the games list as a JSON file:

In [42]:
# path can be configured in config.py
extractor.save_games_list(file_name="first_two_pages")

Saving Games List to /home/knolli/code/Knolli14/game-manual-extractor/data/first_two_pages.json
Successfully saved Games List
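One plausible way to write a list of dataclass instances to JSON is `dataclasses.asdict` combined with `json.dump`. The sketch below assumes that approach; `GameInfoSketch`, this standalone `save_games_list` function, and the temporary output directory are illustrative, not the package's implementation.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class GameInfoSketch:
    link: str
    title: str
    language: str


def save_games_list(games, directory: Path, file_name: str) -> Path:
    """Write the games as a JSON array of objects and return the file path."""
    path = directory / f"{file_name}.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump([asdict(game) for game in games], fh, indent=2)
    return path


games = [GameInfoSketch("https://example.com/a.pdf", "game a", "english")]
out_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo
saved_path = save_games_list(games, out_dir, "demo_games")
print("Saved to", saved_path)
```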


Use `Extractor().download_manual()` to download the manual for a specific game:

In [None]:
# extract game from internal list
game = extractor.games_list[10]
print(game)

queendomino, in (english)


In [48]:
# or use this method for all infos
game._display_info()

('Title: queendomino',
 'Link: https://cdn.1j1ju.com/medias/56/5a/f3-queendomino-regle.pdf',
 'Language: English')

In [53]:
extractor.download_manual(game)

Downloading queendomino
Successfully downloaded to /home/knolli/code/Knolli14/game-manual-extractor/data/pdf/queendomino.pdf
Successfully downloaded queendomino
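The download step boils down to fetching the bytes behind `link` and writing them to `<title>.pdf`. A minimal standard-library sketch of that idea follows; `download_pdf` is a hypothetical helper, and the real method may use a different HTTP client and error handling. The demo uses a local `file://` URL so it runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path


def download_pdf(link: str, title: str, target_dir: Path) -> Path:
    """Fetch `link` and store it as <target_dir>/<title>.pdf."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{title}.pdf"
    with urllib.request.urlopen(link) as response:
        target.write_bytes(response.read())
    return target


# Offline demo: a local file:// URL stands in for the remote PDF link.
work_dir = Path(tempfile.mkdtemp())
source = work_dir / "source.pdf"
source.write_bytes(b"%PDF-1.4 demo")

saved = download_pdf(source.as_uri(), "demo game", work_dir / "pdf")
print("Downloaded to", saved)
```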


We can __download all__ manuals at once using `src.main`

In [59]:
!python main.py

-> Extracting games from page 1
Extracted kingdomino
Extracted battle sheep
Extracted dr eureka
Extracted codenames
Extracted when i dream
Extracted unlock escape adventures
Extracted terraforming mars
Extracted 7 wonders duel
Extracted exploding kittens
Extracted this war of mine the board game
Extracted queendomino
Extracted 7 wonders
-> Finished extracting games from page 1
------------------------------
Successfully extracted 12 games

Downloading kingdomino
-> Successfully downloaded to /home/knolli/code/Knolli14/game-manual-extractor/data/pdf/kingdomino.pdf

Downloading battle sheep
-> Successfully downloaded to /home/knolli/code/Knolli14/game-manual-extractor/data/pdf/battle sheep.pdf

Downloading dr eureka
-> Successfully downloaded to /home/knolli/code/Knolli14/game-manual-extractor/data/pdf/dr eureka.pdf

Downloading codenames
-> Successfully downloaded to /home/knolli/code/Knolli14/game-manual-extractor/data/pdf/codenames.pdf

Downloading when i dream
-> Successfully downloa

### Initialize from a saved JSON file

In [2]:
filename = 'first_two_pages'
ex_json = Extractor.from_json(filename)

Loading Games List from /home/knolli/code/Knolli14/game-manual-extractor/data/first_two_pages.json
Successfully loaded Games List into Extractor


In [3]:
len(ex_json.games_list)

309
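`from_json` is called on the class rather than on an instance, which suggests an alternate constructor. A sketch of that pattern with a `classmethod` is shown below. `SketchExtractor` and `GameInfoSketch` are hypothetical, and unlike the real method (which takes a file name and resolves the path itself) this version takes an explicit path.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GameInfoSketch:
    link: str
    title: str
    language: str


class SketchExtractor:
    def __init__(self, games_list=None):
        self.games_list = games_list or []

    @classmethod
    def from_json(cls, path: Path) -> "SketchExtractor":
        """Alternate constructor: rebuild the games list from a JSON file."""
        with path.open(encoding="utf-8") as fh:
            records = json.load(fh)
        return cls([GameInfoSketch(**record) for record in records])


# Demo file with the same three fields as GameInfo.
demo_path = Path(tempfile.mkdtemp()) / "demo_games.json"
demo_path.write_text(
    json.dumps([
        {"link": "https://example.com/a.pdf", "title": "game a", "language": "english"},
    ]),
    encoding="utf-8",
)

ex_demo = SketchExtractor.from_json(demo_path)
print(len(ex_demo.games_list), ex_demo.games_list[0].title)
```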