Few nice functions for every-day use

Download and Install


  • Load the devtools package
  • Install the package
  • Load the package
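
A minimal sketch of these steps; the GitHub path and package name below are placeholders, substitute the actual repository:

library(devtools)                # load the devtools package
install_github('username/repo')  # hypothetical repository path
library(yourpackage)             # hypothetical package name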



Concatenate strings

x %+% y
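
A quick illustration, assuming %+% wraps paste0-style concatenation:

x <- 'every'
y <- 'day'
x %+% y  # "everyday"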

Read SQL query from file

Only one query per file.

read_query(path) - reads a single query.
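
For example, assuming sql/users.sql exists and read_query returns its contents as a single string:

query <- read_query('sql/users.sql')
result <- dbGetQuery(db, query)  # db is an open DBI connection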

Read all queries from the sql/ directory and execute them:

library(RPostgreSQL)  # your database library
library(tidyverse)    # tibble, dplyr, purrr, stringr

queries <- list.files('sql/', full.names = T)
all_data <- tibble(queries) %>% 
    mutate(query  = map_chr(queries, read_query),                    # read all queries
           name   = str_sub(str_extract(queries, '/.+\\.'), 2, -2),  # extract file name as query name
           result = map(query, ~dbGetQuery(db, .x)))                 # execute each query (db is an open connection)


Kill all MySQL connections

killMySQLConnections() - prints all open connections and closes them.

Write data to a Redshift DB by chunks

write_data_redshift(con, df, name, chunk = 100000)

  • con - RPostgreSQL formal class database connection
  • df - dataframe with data to write
  • name - name of the target table (will be overwritten)
  • chunk - chunk-size, default 100000
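
A usage sketch; the connection details, data frame, and table name below are hypothetical placeholders:

library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), host = 'redshift-host', port = 5439,
                 dbname = 'analytics', user = 'user', password = 'password')
write_data_redshift(con, df = my_df, name = 'target_table', chunk = 100000)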

Read data from database by chunks

read_data(con, chunk = 100000, table, order_column = 'Id')

  • con - DBI formal class database connection (RMySQL, RPostgreSQL)
  • table - string with the name of the table to read
  • order_column - name of the primary key used to order rows when splitting into chunks; must be unique
  • chunk - chunk-size, default 100000
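
A usage sketch; the connection and table name are hypothetical:

events <- read_data(con, chunk = 100000, table = 'events', order_column = 'Id')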

Copy data from one DB to another by chunks

Copy data from a DBI-formal-class connection database (tested with MySQL) to Redshift by chunks

copy_data(redshift, mysql, from_table, to_table, chunk = 100000, order_column = 'Id', rewrite = T, test = F)

  • redshift - RPostgreSQL formal class database connection, the target database
  • mysql - DBI formal class database connection (RMySQL, RPostgreSQL), the source database
  • from_table - string with the source table name
  • to_table - string with the final table name
  • order_column - name of the primary key used to order rows when splitting into chunks; must be unique
  • chunk - chunk-size, default 100000
  • rewrite - if FALSE, data is appended to the existing table instead of overwriting it
  • test - if TRUE, the copy is limited to 3 iterations
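
A usage sketch; the connection details and table names are hypothetical placeholders:

library(RMySQL)
library(RPostgreSQL)
mysql    <- dbConnect(MySQL(), host = 'mysql-host', dbname = 'source_db',
                      user = 'user', password = 'password')
redshift <- dbConnect(PostgreSQL(), host = 'redshift-host', port = 5439,
                      dbname = 'analytics', user = 'user', password = 'password')
copy_data(redshift, mysql, from_table = 'events', to_table = 'events',
          chunk = 100000, order_column = 'Id', rewrite = T, test = F)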

Biathlon (financial pattern retentions)

groupby-summarise function

Thanks to lxii for the contribution!

Note: The dataset must be in tidy format, and months must be factor variables. If not, run something like this: mutate(x, month = factor(month, levels = c('Feb', 'Mar', 'Apr', 'May'), ordered = T))

Usage: getBiathlon(x, length)

  • x - the month or time period factor variable
  • length - the number of periods to build the biathlon for
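
A usage sketch; ds and its month column are hypothetical, prepared as in the note above:

biathlon <- getBiathlon(ds$month, length = 4)  # biathlon over four periods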

Add financial pattern column to dataset

Note: monthly data must be in columns

make_biatlon(dataset, columns, base_column)

  • dataset - input dataframe
  • columns - vector with string names of the monthly data columns; must not contain NAs
  • base_column - column to compare with.

Example: make_biatlon(ds, month_range$MYM[1:4], threshold)

Calculate retentions by financial pattern segments


  • biathlon - vector with all financial pattern data

Distribution of rows by groups:

Distribution by all available items

pool_ds consists of two columns:

  • user_id (key for the distributed row)
  • distribution_value - partially missing identifier (branch, region, sales rep, etc.)
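
A minimal sketch of such a dataset; the values are hypothetical, and NA marks rows that still need a distribution_value:

library(tibble)
pool_ds <- tibble(
  user_id = 1:6,
  distribution_value = c('north', NA, 'south', NA, 'north', NA)
)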


# Distribution
distribution <- distribute_honestly(pool_ds)

Gathering results:

# How many rows per one distribution_value
capacity <- distribution$capacity
# final dataset
distribution_result <- distribution$distribution_result

Distribution with precalculated volumes

distribution <- distribute(pool_ds, capacity = capacity)

Error logging

Log errors and warnings to a Slack channel.

Thanks to lxii for the contribution!

Usage: errorHandler(code, prefix)


library(slackr)
slackr_setup(config_file = '.slackr', echo = T)  # configure the slackr bot

errorHandler({
  # some code, e.g. a connection to the database
}, prefix = 'Connection to database is broken')