NativeExtractor module for Python

This is the official Python binding for the NativeExtractor project.

Installation

Requirements

Python >= 2.7 (Python 3 is highly recommended)
pip
build-essential (gcc, make)
libglib2.0, libglib2.0-dev, libpythonX-dev (where X matches your Python version)

We recommend using a virtual environment:

virtualenv myproject
source myproject/bin/activate

or, on Python 3, with the built-in venv module:

python -m venv myproject
source myproject/bin/activate

Instant PyPI solution

pip install pynativeextractor

Manual

1. Clone the repo: `git clone --recurse-submodules https://github.com/SpongeData-cz/pynativeextractor.git`
2. Install via pip or pip3:
```
pip install -e ./pynativeextractor/
```
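
After either installation route, it can be handy to confirm that the package is visible to the interpreter you are actually running, especially inside a virtual environment. A minimal sketch for Python 3 only, using the standard library; the helper name `is_installed` is hypothetical and not part of pynativeextractor:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(module_name) is not None


# Prints True when pynativeextractor is installed in the active environment
print(is_installed("pynativeextractor"))
```

If this prints False, check that the virtual environment is activated and that pip belongs to the same interpreter.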

Typical usage

import os
from pynativeextractor.extractor import BufferStream, Extractor, DEFAULT_MINERS_PATH

# Construct new Extractor instance
ex = Extractor()
# Add the miner named match_url from web_entities.so; it matches all URLs
ex.add_miner_so(os.path.join(DEFAULT_MINERS_PATH, 'web_entities.so'), 'match_url')
text = '{}'.format("https://spongedata.cz")

# Create a stream from the in-memory text (to stream from a file, use FileStream; mmap is used internally)
with BufferStream(text) as bf:
    # Initialize occurrences list as empty list
    occurrences = []
    # Set the stream to the extractor
    with ex.set_stream(bf):
        # Mine all occurrences of URLs
        while not ex.eof():
            # Accumulate the occurrences found in this batch
            occurrences += ex.next()

print(occurrences) # Prints [{'label': 'URL', 'value': 'https://spongedata.cz', 'pos': 0, 'len': 13, 'prob': 1.0}]
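
The result is a plain list of dictionaries, so it can be post-processed with ordinary Python. A minimal sketch, assuming each occurrence carries the 'label', 'value' and 'prob' keys shown in the printed output above; the helper name `group_by_label` and its `min_prob` parameter are hypothetical and not part of pynativeextractor:

```python
from collections import defaultdict


def group_by_label(occurrences, min_prob=0.0):
    """Group occurrence values by label, keeping those with prob >= min_prob."""
    grouped = defaultdict(list)
    for occ in occurrences:
        # Treat a missing 'prob' key as 0.0 (an assumption of this sketch)
        if occ.get('prob', 0.0) >= min_prob:
            grouped[occ['label']].append(occ['value'])
    return dict(grouped)


# Sample record shaped like the printed output, trimmed to the keys the helper reads
sample = [{'label': 'URL', 'value': 'https://spongedata.cz', 'prob': 1.0}]
print(group_by_label(sample))
```

Grouping by label becomes more useful once several miners are added to the same Extractor, since their occurrences arrive mixed in one list.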
