
WebBeds/datalake-pylib


DataLake Python Library

Private repository for storing and maintaining the Python packages used across the Datalake repositories.


Requirements (Deprecated)

To use the packages in this repository, you need to install gpip with pip. The recommended version is 0.4.7:

pip3 install gpip==0.4.7

Stable packages

You can view the current stable and older versions of a package in this repository here.

Package versions follow the pattern package-version, for example etl-schema-0.1.

A version can be specified when installing with gpip by using the == operator, as in pip; gpip then checks out the specified tag.
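As a minimal sketch of the tag naming pattern, the hypothetical helpers below build a tag from a package name and version and split a tag back into its parts. They are illustrative only (make_tag and parse_tag are not part of gpip or this library) and assume the version itself never contains a hyphen, so a tag is split on its last hyphen.

```python
from typing import Tuple


def make_tag(package: str, version: str) -> str:
    """Build a tag following the package-version pattern (hypothetical helper)."""
    return f"{package}-{version}"


def parse_tag(tag: str) -> Tuple[str, str]:
    """Split a package-version tag into (package, version) (hypothetical helper).

    Package names may contain hyphens (e.g. "etl-schema"), so only the
    last hyphen is treated as the separator. Assumes versions have no hyphens.
    """
    package, sep, version = tag.rpartition("-")
    if not sep or not package or not version:
        raise ValueError(f"not a package-version tag: {tag!r}")
    return package, version


if __name__ == "__main__":
    tag = make_tag("etl-schema", "0.1")
    print(tag)             # etl-schema-0.1
    print(parse_tag(tag))  # ('etl-schema', '0.1')
```

Splitting on the last hyphen keeps multi-word package names such as etl-schema intact.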


Current packages

The packages are organized into groups.

ETL: library of utility packages for Datalake.
  • Normalize DataFrames.

  • Perform operations on pandas DataFrames, such as generating a report of the differences between two DataFrames.

  • Manage and operate on S3 data using pandas DataFrames.

  • Interact with AWS Athena or Postgres: send queries or retrieve results as DataFrames.

  • Useful AWS helpers that let your code adapt to its runtime environment, for example detecting when the code is running inside a Lambda function.

  • Miscellaneous utilities used in the ETL repository, for example sending alarms to Teams and a logging utility that produces richer output than plain prints.
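A minimal sketch of how Lambda detection can work, assuming the check relies on the AWS_LAMBDA_FUNCTION_NAME environment variable that the AWS Lambda runtime sets. The function name is_running_on_lambda is hypothetical and not necessarily what the ETL package exposes.

```python
import os


def is_running_on_lambda() -> bool:
    """Return True when running inside AWS Lambda (hypothetical helper).

    Assumption: detection is based on AWS_LAMBDA_FUNCTION_NAME, a variable
    the Lambda runtime defines for every function.
    """
    return bool(os.environ.get("AWS_LAMBDA_FUNCTION_NAME"))


if __name__ == "__main__":
    if is_running_on_lambda():
        print("Running inside AWS Lambda")
    else:
        print("Running outside AWS Lambda")
```

Checking an environment variable avoids any network call, so the check is cheap enough to run at import time.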


Example: installing a package

The following command installs the schema package from the etl group:

gpip get github.com/Webjet/datalake-pylib/etl/schema:datalake-etl-schema

Issues

If you encounter a problem, you can open an issue here.
