feat: Dynamic load of mapping modules
Remove obsolete mapping modules

Minor refactor mapping load

fix yml formatting errors

Minor formatting in docs
Sieboldianus committed Dec 11, 2020
1 parent 91f31f1 commit 4168509
Showing 11 changed files with 115 additions and 443 deletions.
2 changes: 1 addition & 1 deletion docs/input-types.md
@@ -1,4 +1,4 @@
-# Input type: file, url, or database?
+# Input type: File, URL, or Database?

 lbsntransform can read data from different common types of data sources:

11 changes: 6 additions & 5 deletions docs/output-mappings.md
@@ -100,11 +100,12 @@ For example:
 lbsntransform --include_lbsn_bases hashtag,place,date,community
 ```

-would fill/update entries of the hlldb structures:
-- topical.hashtag
-- spatial.place
-- temporal.date
-- social.community
+..would fill/update entries of the hlldb structures:
+
+- topical.hashtag
+- spatial.place
+- temporal.date
+- social.community

 This name refers to `schema.table`.
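The `--include_lbsn_bases` value is a comma-separated list, and each base ends up as a `schema.table` target. A minimal Python sketch of that expansion, assuming a base-to-schema lookup that contains only the four pairs from the example above (the real hlldb structure has more bases, and `expand_bases` is a hypothetical helper, not part of lbsntransform):

```python
from typing import List, Tuple

# Hypothetical lookup: only the four bases from the example above.
BASE_SCHEMAS = {
    "hashtag": "topical",
    "place": "spatial",
    "date": "temporal",
    "community": "social",
}


def expand_bases(arg: str) -> List[Tuple[str, str]]:
    """Split a comma-separated base list into (schema, table) pairs."""
    pairs = []
    for base in arg.split(","):
        base = base.strip()
        if not base:
            continue  # ignore empty items such as a trailing comma
        if base not in BASE_SCHEMAS:
            raise ValueError(f"unknown lbsn base: {base!r}")
        pairs.append((BASE_SCHEMAS[base], base))
    return pairs


# prints topical.hashtag, spatial.place, temporal.date, social.community
# (one per line)
for schema, table in expand_bases("hashtag,place,date,community"):
    print(f"{schema}.{table}")
```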

13 changes: 6 additions & 7 deletions docs/use-cases.md
@@ -2,21 +2,20 @@ If you're using the command line interface, a common usage of lbsntransform is t
 import/convert arbitrary social media data, e.g. from Flickr or Twitter, to a Postgres Database
 with the [common lbsn structure](https://lbsn.vgiscience.org/)

-The following use cases exist:
-
-1. importing lbsntransform as a package
+The following two primary use cases exist:
+
+1. **Importing lbsntransform as a package**
 Use this approach to convert data, such as individual posts
 retrieved from an API, on-the-fly (in-memory), in your own
 python package.

-2. using the command line interface (cli) to perform batch conversions
-
+2. **Using the command line interface (cli) to perform batch conversions**
 Use this approach if you want to convert batches of data stored as
 arbitrary json/csv files, or if you want to convert from a database
 with the raw lbsn structure to a database with the privacy-aware hll
 format.

 For any conversion,
-- the input type must be provided, see [input-types](input-types)
-- a mapping must exist, see [input-mappings](input-mappings)

+- the input type must be provided, see [input-types](/input-types)
+- a mapping must exist, see [input-mappings](/input-mappings)
18 changes: 9 additions & 9 deletions lbsntransform/__main__.py
@@ -36,9 +36,17 @@ def main():
     # Parse args
     config.parse_args()

+    # initialize mapping class
+    # depending on lbsn origin
+    # e.g. 1 = Instagram,
+    # 2 = Flickr, 2.1 = Flickr YFCC100m,
+    # 3 = Twitter)
+    importer = HF.load_importer_mapping_module(
+        config.origin, config.mappings_path)
+
     # initialize lbsntransform
     lbsntransform = LBSNTransform(
-        origin_id=config.origin,
+        importer=importer,
         logging_level=config.logging_level,
         is_local_input=config.is_local_input,
         transfer_count=config.transfer_count,
@@ -62,14 +70,6 @@ def main():
         dbserverport_hllworker=config.dbserverport_hllworker,
         include_lbsn_bases=config.include_lbsn_bases)

-    # initialize converter class
-    # depending on lbsn origin
-    # e.g. 1 = Instagram,
-    # 2 = Flickr, 2.1 = Flickr YFCC100m,
-    # 3 = Twitter)
-    importer = HF.load_importer_mapping_module(
-        config.origin)
-
     # initialize input reader
     input_data = LoadData(
         importer=importer,
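After this change the importer is resolved once in `main()` and the same object is passed to both `LBSNTransform` and `LoadData`, instead of `LBSNTransform` looking it up from `origin_id` itself. The sketch below reproduces that wiring with hypothetical stand-in classes (`DemoImporter`, `Transformer`, `Reader`) and a hypothetical `load_importer` function; only the `importer=` keyword is taken from the diff.

```python
class DemoImporter:
    """Hypothetical mapping class standing in for a real importer."""
    MAPPING_ID = 0


def load_importer(origin: int):
    """Hypothetical stand-in for HF.load_importer_mapping_module()."""
    if origin == DemoImporter.MAPPING_ID:
        return DemoImporter
    raise ValueError(f"{origin} not found. Input type not supported.")


class Transformer:
    """Stand-in for LBSNTransform: receives the importer, no origin lookup."""

    def __init__(self, importer):
        self.importer = importer


class Reader:
    """Stand-in for LoadData: consumes the same importer."""

    def __init__(self, importer):
        self.importer = importer


# Resolve the mapping once, then inject it into every consumer.
importer = load_importer(0)
transformer = Transformer(importer=importer)
reader = Reader(importer=importer)
print(transformer.importer is reader.importer)  # True
```

Injecting the importer keeps the lookup (and its `mappings_path` override) in one place, so the transformer and the reader cannot end up with different mappings.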
9 changes: 9 additions & 0 deletions lbsntransform/config/config.py
@@ -75,6 +75,7 @@ def __init__(self):
         self.include_lbsn_objects = []
         self.include_lbsn_bases = None
         self.override_lbsn_query_schema = None
+        self.mappings_path = None

         BaseConfig.set_options()

@@ -289,6 +290,12 @@ def parse_args(self):
                                    'types that will be ignored (e.g. to '
                                    'ignore certain bots etc.)',
                                    type=str)
+        settings_args.add_argument("--mappings_path",
+                                   help='Path to mappings folder. '
+                                   'Provide a path to a custom folder '
+                                   'that contains an __init__.py and '
+                                   'one or more mapping modules.',
+                                   type=str)
         settings_args.add_argument("--input_lbsn_type",
                                    help='Input type, e.g. "post", "profile", '
                                    '"friendslist", "followerslist" etc. '
@@ -523,6 +530,8 @@ def parse_args(self):
             self.zip_records = True
         if args.skip_until_record:
             self.skip_until_record = args.skip_until_record
+        if args.mappings_path:
+            self.mappings_path = Path(args.mappings_path)
         if args.min_geoaccuracy:
             self.min_geoaccuracy = self.check_geoaccuracy_input(
                 args.min_geoaccuracy)
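The new option follows the existing pattern in `config.py`: argparse stores the raw string, and `parse_args()` wraps it in a `pathlib.Path` only when the flag was supplied, so `mappings_path` stays `None` otherwise and the loader falls back to the package's own mappings. A standalone sketch of that behaviour (`parse_mappings_path` is a hypothetical helper, not part of lbsntransform):

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_mappings_path(argv: List[str]) -> Optional[Path]:
    """Hypothetical helper reproducing the --mappings_path handling."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mappings_path",
                        help='Path to mappings folder. '
                        'Provide a path to a custom folder '
                        'that contains an __init__.py and '
                        'one or more mapping modules.',
                        type=str)
    args = parser.parse_args(argv)
    mappings_path = None  # default: use the mappings shipped with the package
    if args.mappings_path:
        mappings_path = Path(args.mappings_path)
    return mappings_path


print(parse_mappings_path([]))                                      # None
print(parse_mappings_path(["--mappings_path", "custom_mappings"]))  # custom_mappings
```

Passing `type=Path` to `add_argument` would do the conversion in one step; keeping `type=str` matches the neighbouring options shown in the diff.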
5 changes: 3 additions & 2 deletions lbsntransform/input/mappings/field_mapping_lbsn.py
@@ -13,8 +13,9 @@
 from google.protobuf.duration_pb2 import Duration
 from shapely import wkb

-from ...tools.helper_functions import HelperFunctions as HF
+from lbsntransform.tools.helper_functions import HelperFunctions as HF

+MAPPING_ID = 0

 def parse_geom(geom_hex):
     """Parse Postgis hex WKB to geometry WKT"""
@@ -67,7 +68,7 @@ def set_lbsn_pkey(lbsn_obj_pkey, pkey_obj, pkey_val, origin_val):
         pkey_obj.pkey)


-class FieldMappingLBSN():
+class importer():
     """ Provides mapping function from LBSN (raw) endpoints to
     protobuf lbsnstructure
     """
5 changes: 2 additions & 3 deletions lbsntransform/lbsntransform_.py
@@ -49,7 +49,7 @@ class LBSNTransform():
     """

     def __init__(
-            self, origin_id=3, logging_level=None,
+            self, importer, logging_level=None,
             is_local_input: bool = False, transfer_count: int = 50000,
             csv_output: bool = True, csv_suppress_linebreaks: bool = True,
             dbuser_output=None, dbserveraddress_output=None, dbname_output=None,
@@ -74,8 +74,7 @@ def __init__(
         # init global settings

         self.transfer_count = transfer_count
-        self.importer = HF.load_importer_mapping_module(
-            origin_id)
+        self.importer = importer
         # get origin name and id from importer
         # e.g. yfcc100m dataset has origin id 21,
         # but is specified as general Flickr origin (2) in importer
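The comment above separates two ids: the one used to select a mapping (21 for YFCC100M) and the origin recorded in the output (2, general Flickr). A hypothetical illustration of that split; `ORIGIN_ID` and `ORIGIN_NAME` are assumed attribute names, not taken from lbsntransform, and `Transform` is a stand-in for `LBSNTransform`.

```python
class YFCC100MImporter:
    """Hypothetical importer; ORIGIN_ID/ORIGIN_NAME are assumed names."""
    MAPPING_ID = 21      # id used to select this mapping module
    ORIGIN_ID = 2        # origin written to the output: general Flickr
    ORIGIN_NAME = "Flickr"


class Transform:
    """Stand-in for LBSNTransform.__init__ after this commit."""

    def __init__(self, importer):
        self.importer = importer
        # origin name and id come from the importer, not from the CLI origin
        self.origin_id = importer.ORIGIN_ID
        self.origin_name = importer.ORIGIN_NAME


t = Transform(YFCC100MImporter)
print(YFCC100MImporter.MAPPING_ID, t.origin_id, t.origin_name)  # 21 2 Flickr
```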
75 changes: 56 additions & 19 deletions lbsntransform/tools/helper_functions.py
@@ -7,13 +7,17 @@

 import datetime as dt
 import json
+import importlib.util
 import logging
 # due to different protocol buffers implementations on Unix, MacOS and Windows
 # import types based on OS
 import platform
 import re
+import inspect
+import sys
 import regex
 import string
+from pathlib import Path
 from datetime import timezone
 from json import JSONDecodeError, JSONDecoder
 from typing import List, Set, Union, Optional
@@ -685,28 +689,61 @@ def map_to_dict(proto_map):
         return {}

     @staticmethod
-    def load_importer_mapping_module(origin: int):
+    def _get_file_list(path: Path, ext: str = "py"):
+        """Return file list in folder"""
+        return [file.stem for file in path.glob(f'*.{ext}')]
+
+    @staticmethod
+    def dynamic_get_mapping_module(
+            origin: int,
+            mappings_path: Path = None):
+        """Function to dynamically register input mappings
+        Args:
+            origin: The MAPPING_ID to identify the mapping module.
+            mappings_path: Override default path with user defined folder.
+        """
+        if mappings_path is None:
+            mappings_module_name = "lbsntransform.input.mappings"
+            if origin == 0:
+                from lbsntransform.input.mappings.field_mapping_lbsn import \
+                    importer as importer
+                return importer
+        else:
+            mapping_modules = HelperFunctions._get_file_list(
+                mappings_path)
+            init_file_str = "__init__"
+            mappings_module_name = mappings_path.name
+            for mapping_module in mapping_modules:
+                if mapping_module == init_file_str:
+                    continue
+                spec = importlib.util.spec_from_file_location(
+                    f"{mappings_path.name}.{mapping_module}",
+                    mappings_path / f'{mapping_module}.py')
+                module = importlib.util.module_from_spec(spec)
+                spec.loader.exec_module(module)
+                if hasattr(module, 'MAPPING_ID') and module.MAPPING_ID == origin:
+                    if hasattr(module, 'importer'):
+                        importer = module.importer
+                        return importer
+                    raise ImportError(f"importer missing in {module}")
+        raise ValueError(
+            f'{origin} not found in {mappings_module_name}. '
+            f'Input type not supported.')
+
+    @staticmethod
+    def load_module(package: str, name: str):
+        name = f"{package}.{name}"
+        __import__(name, fromlist=[''])
+
+    @staticmethod
+    def load_importer_mapping_module(
+            origin: int, mappings_path: Path = None):
         """ Switch import module based on origin input
         1 - Instagram, 2 - Flickr, 3 - Twitter, 4 - Facebook
         """
-        if origin == 0:
-            from lbsntransform.input.mappings.field_mapping_lbsn import \
-                FieldMappingLBSN as importer
-        elif origin == 2:
-            from lbsntransform.input.mappings.field_mapping_flickr import \
-                FieldMappingFlickr as importer
-        elif origin == 21:
-            # Flickr YFCC100M dataset
-            from lbsntransform.input.mappings.field_mapping_yfcc100m import \
-                FieldMappingYFCC100M as importer
-        elif origin == 3:
-            from lbsntransform.input.mappings.field_mapping_twitter import \
-                FieldMappingTwitter as importer
-        elif origin == 41:
-            from lbsntransform.input.mappings.field_mapping_fb import \
-                FieldMappingFBPlace as importer
-        else:
-            raise ValueError("Input type not supported")
+        importer = HelperFunctions.dynamic_get_mapping_module(
+            origin=origin, mappings_path=mappings_path)
         return importer

     @staticmethod
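The folder scan in `dynamic_get_mapping_module` relies only on standard `importlib.util` calls: `spec_from_file_location`, `module_from_spec` and `spec.loader.exec_module`. A self-contained sketch of the same lookup, using a throwaway folder in a temporary directory (the folder name, file name and id `99` are made up for the demo, and `find_importer` is a hypothetical name):

```python
import importlib.util
import tempfile
from pathlib import Path


def find_importer(origin: int, mappings_path: Path):
    """Return the `importer` class of the module whose MAPPING_ID == origin."""
    # sorted() makes the scan order deterministic; glob order is not guaranteed
    for file in sorted(mappings_path.glob("*.py")):
        if file.stem == "__init__":
            continue
        spec = importlib.util.spec_from_file_location(
            f"{mappings_path.name}.{file.stem}", str(file))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the module's top-level code
        if getattr(module, "MAPPING_ID", None) == origin:
            if hasattr(module, "importer"):
                return module.importer
            raise ImportError(f"importer missing in {file.stem}")
    raise ValueError(
        f"{origin} not found in {mappings_path.name}. "
        "Input type not supported.")


# Demo: a throwaway mappings folder with one mapping module.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my_mappings"
    folder.mkdir()
    (folder / "__init__.py").write_text("")
    (folder / "field_mapping_demo.py").write_text(
        "MAPPING_ID = 99\n"
        "class importer:\n"
        "    pass\n")
    demo = find_importer(99, folder)
    print(demo.__name__)  # importer
```

Every module is executed until a match is found, so top-level code in a mappings folder should stay free of side effects.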
46 changes: 23 additions & 23 deletions mkdocs.yml
@@ -15,28 +15,28 @@ theme:
- bash

markdown_extensions:
- toc:
permalink: true
- markdown_include.include:
base_path: docs
- admonition
- fenced_code
- sane_lists
- toc:
permalink: true
- markdown_include.include:
base_path: docs
- admonition
- fenced_code
- sane_lists

nav:
- Introduction: index.md
- User Guide:
- Quick Installation: quick-guide.md
- Use Cases: use-cases.md
- Input Types: input-types.md
- Mappings:
- Input Mappings: input-mappings.md
- Output Mappings: output-mappings.md
- Command Line Interface:
- Arguments: argparse/args.md
- Examples: argparse/examples.md
- Developers:
- Importing lbsntransform as a package: package.md
- API Reference (external): api/lbsntransform_.html
- Additional resources: resources.md
- About: about.md
- Introduction: index.md
- User Guide:
- Quick Installation: quick-guide.md
- Use Cases: use-cases.md
- Input Types: input-types.md
- Mappings:
- Input Mappings: input-mappings.md
- Output Mappings: output-mappings.md
- Command Line Interface:
- Arguments: argparse/args.md
- Examples: argparse/examples.md
- Developers:
- Importing lbsntransform as a package: package.md
- API Reference (external): api/lbsntransform_.html
- Additional resources: resources.md
- About: about.md
