pywebber

Python Web Development Tools

Utilities

Link and word harvester: Ripper

parsers

html.parser, html5lib ($ pip install html5lib), lxml, lxml-xml

Text generator: LoremPysum

Installation

pip install pywebber --upgrade
pip install https://github.com/Parousiaic/pywebber/archive/master.zip

Usage

Ripper - harvest words and links on a static web page.

$ from pywebber import Ripper

Accessing words and links is easy

$ page = Ripper('http://python.org')
$ soup = page.soup # the raw BeautifulSoup4 object
$ uncleaned_links = page.raw_links # all raw <a> tags on the page, as bs4 objects
$ cleaned_links = page.links() # a generator of all links in the form `http://www.domain.location`
$ words = page.words() # a generator of the words between <p> tags
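To give a feel for what harvesting links and words from a static page involves, here is a minimal sketch that uses only the standard library. It is not how Ripper is implemented (Ripper is built on BeautifulSoup4); the `PageHarvester` class and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageHarvester(HTMLParser):
    """Hypothetical stand-in for Ripper: collects hrefs and <p> text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []        # raw href values, in document order
        self.paragraphs = []   # text found inside <p> elements
        self._p_depth = 0      # > 0 while inside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        elif tag == "p":
            self._p_depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._p_depth:
            self._p_depth -= 1

    def handle_data(self, data):
        if self._p_depth:
            self.paragraphs[-1] += data

    def links(self):
        """Yield absolute http(s) links, resolving relative hrefs."""
        for href in self.hrefs:
            absolute = urljoin(self.base_url, href)
            if absolute.startswith(("http://", "https://")):
                yield absolute

    def words(self):
        """Yield whitespace-separated words found between <p> tags."""
        for paragraph in self.paragraphs:
            yield from paragraph.split()


html = """
<html><body>
<p>Python is <a href="/about/">powerful</a> and fast.</p>
<a href="https://docs.python.org/3/">Docs</a>
<a href="mailto:someone@example.com">Mail</a>
</body></html>
"""

harvester = PageHarvester("http://python.org")
harvester.feed(html)
print(list(harvester.links()))
print(list(harvester.words()))
```

Like `page.links()` and `page.words()`, both methods in the sketch are generators, so wrap them in `list()` if you need to index or reuse the results.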

The following object creation options are available

url : Defaults to url="http://python.org".
parser : Defaults to parser="html.parser". To see the complete list of parsers, use object_instance.parsers.
refresh : Defaults to refresh=False. The first time Ripper hits a page, it saves a prettified BeautifulSoup4 object of the scraped page in a text file, and subsequent instantiations read from that file. If set to True, Ripper hits the site for fresh data every time it is called to construct the page object.
save_path : Defaults to save_path=None. In this case, Ripper creates a folder on your user desktop, named in the format domainName_extension. Every page scraped from that site is saved inside this folder. You can also set save_path=/some/other/path. The saved file name has the format page_url.txt.
split_string : Defaults to the characters in string.punctuation plus ["\n", " ", "://"]. You can supply a list to add to this set.
stop_words : Defaults to ['', '#', '\n', 'the', 'to', "but", "and"]. These are words that should not be included when object_instance.words() is called. You can supply a list to add to this set.
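The refresh and save_path options describe a simple read-through cache. The sketch below shows the idea with the standard library only. The helper names (`cache_file_for`, `load_page`, `fake_fetch`) and the exact file naming are assumptions for illustration, not Ripper's actual code, and a temporary directory stands in for the desktop folder.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def cache_file_for(url, save_path):
    """Return the cache file for `url` (the naming scheme here is a guess)."""
    parsed = urlparse(url)
    folder = parsed.netloc.replace(".", "_")          # python.org -> python_org
    name = (parsed.netloc + parsed.path).strip("/").replace("/", "_")
    return Path(save_path) / folder / (name + ".txt")


def load_page(url, fetch, save_path, refresh=False):
    """Return page text, reading the cached copy unless refresh is True."""
    cache_file = cache_file_for(url, save_path)
    if cache_file.exists() and not refresh:
        return cache_file.read_text(encoding="utf-8")
    text = fetch(url)                                  # hits the site
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text


calls = []


def fake_fetch(url):
    """Stand-in for a real HTTP request so the example runs offline."""
    calls.append(url)
    return "<html>version %d</html>" % len(calls)


with tempfile.TemporaryDirectory() as tmp:
    first = load_page("http://python.org/about", fake_fetch, tmp)
    second = load_page("http://python.org/about", fake_fetch, tmp)
    third = load_page("http://python.org/about", fake_fetch, tmp, refresh=True)

print(first == second)   # the second call was served from the saved file
print(len(calls))        # only the first and the refresh=True call fetched
```

With refresh=False the site is contacted once per page, which is kinder to the server while you experiment. Pass refresh=True when the page may have changed.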

LoremPysum - Generate random texts

$ from pywebber import LoremPysum

Create a single LoremPysum instance with default Lorem Ipsum text

$ p = LoremPysum(*args, domains=None, lorem=True)

You can also mix your own words with the standard lorem ipsum text. If you want only your own words, pass lorem=False:

$ p = LoremPysum(*args, domains=None, lorem=False)

*args is an optional list of files from which to get the words to be used. Pass any number of text files, as shown below.

$ p = LoremPysum("file1_path.txt", "file2_path.txt", domains=None, lorem=True)
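As a rough illustration of how file arguments and the lorem flag could combine into one word pool, here is a standard-library sketch. `build_word_pool` and `LOREM_WORDS` are hypothetical names, and the real class may read and store its words differently.

```python
import tempfile
from pathlib import Path

# A short stand-in for the bundled lorem ipsum vocabulary.
LOREM_WORDS = ["lorem", "ipsum", "dolor", "sit", "amet"]


def build_word_pool(*paths, lorem=True):
    """Collect words from text files, optionally adding the lorem words."""
    pool = list(LOREM_WORDS) if lorem else []
    for path in paths:
        pool.extend(Path(path).read_text(encoding="utf-8").split())
    return pool


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file1_path.txt"
    sample.write_text("alpha beta\ngamma", encoding="utf-8")
    mixed = build_word_pool(sample, lorem=True)      # lorem words plus yours
    own_only = build_word_pool(sample, lorem=False)  # your words only

print(mixed)
print(own_only)
```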

The following methods are defined

$ p.email() # return a single email address. You can pass in a file listing the domains; the defaults are `[".com", ".info", ".net", ".org"]`
$ p.name() # return a name in the form "firstname I. lastname".
$ p.sentence() # generate a single sentence.
$ p.paragraphs() # return a single paragraph of standard Lorem Ipsum text.
$ p.paragraphs(count=3) # return 3 paragraphs, the first of which is the standard text.
$ p.paragraphs(common=False) # return a single paragraph of random text instead of the standard text.
$ p.title() # generate a title-case string of 2 to n words, where n defaults to 5. Good for article titles.
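The sketch below shows one way generators like these can be built from a word list with the `random` module. It is an assumption-laden illustration rather than LoremPysum's code: the word list, the capitalisation of names, the sentence lengths, and returning paragraphs as a list are all choices made for the example.

```python
import random

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing"]
DOMAINS = [".com", ".info", ".net", ".org"]   # the defaults listed above
STANDARD = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."


def email(rng):
    """Build an address of the form word@word.tld from random words."""
    return "%s@%s%s" % (rng.choice(WORDS), rng.choice(WORDS), rng.choice(DOMAINS))


def name(rng):
    """Build a name in the form 'Firstname I. Lastname'."""
    first, last = rng.choice(WORDS), rng.choice(WORDS)
    initial = rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    return "%s %s. %s" % (first.capitalize(), initial, last.capitalize())


def title(rng, n=5):
    """Build a title-case string of 2 to n words."""
    count = rng.randint(2, n)
    return " ".join(rng.choice(WORDS) for _ in range(count)).title()


def sentence(rng):
    """Build one capitalised sentence ending with a full stop."""
    count = rng.randint(4, 10)
    text = " ".join(rng.choice(WORDS) for _ in range(count))
    return text.capitalize() + "."


def paragraphs(rng, count=1, common=True):
    """Return `count` paragraphs; the first is STANDARD when common is True."""
    result = []
    for index in range(count):
        if common and index == 0:
            result.append(STANDARD)
        else:
            size = rng.randint(2, 4)
            result.append(" ".join(sentence(rng) for _ in range(size)))
    return result


rng = random.Random(0)   # seeded so repeated runs give the same text
print(email(rng))
print(name(rng))
print(title(rng))
print(paragraphs(rng, count=2))
```

Passing a seeded `random.Random` instance keeps the output repeatable, which helps when the generated text feeds tests or fixtures.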

In case you want to look into the words used, the following instance attributes are defined:

$ p.common # A list of the first few words in the lorem ipsum text
$ p.words # A list of all the words in the lorem ipsum text.
$ p.standard # Standard lorem ipsum text. Usually the first third of a sample file.
$ p.domains # list of domain name endings

Code

Credits

Luca De Vitis for the inspiration and starter code for LoremPysum
BeautifulSoup documentation
